1 Introduction

Drought is a major natural hazard having severe consequences in regions all over the world. The range of drought impacts is related to drought occurring in different stages of the hydrologic cycle, and usually, different types of droughts are distinguished. The origin is a meteorological drought, which is defined as a deficit in precipitation. A meteorological drought can develop into a soil moisture drought, which may reduce agricultural production and increase the probability of forest fires. It can further develop into a hydrologic drought defined as a deficit in surface water and groundwater, for example, reducing water supply for drinking water, irrigation, industrial needs, and hydropower production, causing death of fish and hampering navigation in some countries (Yahiaoui et al. 2009). Temperatures, high winds, low relative humidity, timing and characteristics of rains, including distribution of rainy days during crop growing seasons, intensity and duration of rain, and onset and termination play a significant role in the occurrence of droughts (Mishra and Singh 2010). Natural hazards, such as drought, tsunami, hurricane, flood, wildfire, and earthquake, are likely to become ever more costly in human lives and economic development (World Bank Independent Evaluation Group 2006). The lessons learned from the 2004 Tsunami in the Indian Ocean and 2005 Hurricane Katrina in the southern United States indicate an urgent need to develop new means for early warning and response systems to enhance human collaborative capabilities in coping with large-scale natural hazards (UNDP 2005; The White House 2006). While modern remote sensing, spatial modeling, and geographic information technologies help users to detect, simulate, and forecast environmental changes, such technology is not yet well integrated with multilevel social cooperative responses. The new techniques such as artificial neural networks (ANN), fuzzy logic (FL), and ANFIS have been recently accepted as an efficient alternative tool for modeling of complex hydrologic systems and widely used for forecasting. For instance, some special applications of ANN in hydrology including modeling rainfall-runoff process (Jeong and Kim 2005; Kumar et al. 2005), hydrologic time series modeling (Jain and Kumar 2007), sediment concentration estimation (Nagy et al. 2002), estimation of heterogeneous aquifer parameters (Mantoglou 2003), runoff, and sediment yield modeling (Agarwal et al. 2006). Morid et al. (2007) examined the effectiveness of ANN approach for medium and long-term forecasting of both the probability of drought events and their severity.

The standardized precipitation index (SPI) is a tool which was developed principally for defining and monitoring drought. It allows a researcher to determine a drought at a given time scale (temporal resolution) of interest for any rainfall station with historic data. The SPI is not a drought prediction tool. McKee et al. (1993) developed the SPI to calculate the precipitation shortage for multiple time scales, reflecting the impact of precipitation deficiency on the accessibility of different water supplies. The SPI provides a quick and useful approach to drought analysis. Other advantages of this approach are its relative ease and minimal data requirements. Mishra and Desai (2006) applied the feed-forward recursive neural network and ARIMA models for drought forecasting using SPI series as a drought index. The results have demonstrated that neural network method can be successfully applied for drought forecasting. Wu et al. (2008) applied the neural network method to establish a risk evaluation model of heavy snow disaster using back-propagation artificial neural network (BP-ANN). On the other hand, several studies have also been carried out using FL in hydrology and water resources planning (Nayak et al. 2005; Altunkaynak et al. 2005). In recent years, adaptive neuro-fuzzy inference system (ANFIS), which is integration of ANN and FL methods, has been used in the modeling of nonlinear engineering and water resources problems (Sen and Altunkaynak 2006; Firat and Gungor 2007, 2008). Moreover, Chou and Chen (2007) have used the neuro-fuzzy computing technique for the development of drought early warning index. For this aim, an approach has been proposed to develop drought early warning index (DEWI) for Southern Taiwan to detect the drought in advance for setting up proper plans to mitigate the water shortage impacts. Hana et al. (2010) developed the AR (1) model for VTCI series in order to drought forecasting in the Guanzhong Plain. Several other studies were performed on drought indices, especially SPI such as Moreira et al. (2006, 2008), Canon et al. (2007), Mishra et al. (2007), Duggins et al. (2010), Kao and Govindaraju (2010), and Dupuis (2010). The main objective of this research is to develop Wavelet-ANFIS and Wavelet-ANN models to forecast 6-month SPI for different prediction periods. In this research, Wavelet-ANFIS and Wavelet-ANN models were developed and compared with ANFIS and ANN models in the Ajabshir Plain, Iran.

2 The adaptive neuro-fuzzy inference system (ANFIS)

The ANFIS is a universal estimator and is able to approximate any real continuous function on a compact set to any degree of accuracy (Jang et al. 1997). The basic structure of the type of fuzzy inference system could be seen as a model that maps input characteristics to input membership functions. Then, it maps input membership function to rules and rules to a set of output characteristics. Finally, it maps output characteristics to output membership functions and the output membership function to a single-valued output or a decision associated with the output (Jang et al. 1997). Each fuzzy system contains three main parts including fuzzifier, fuzzy database, and defuzzifier. Also, Fuzzy database consists of two main parts containing fuzzy rule base, and inference engine.

Figure 1 represents a typical ANFIS architecture. In layer one, every node is an adaptive node with a node function such as a generalized bell membership function or a Gaussian membership function. In layer two, every node is a fixed node representing the firing strength of each rule and is calculated by the fuzzy and connective of the ‘product’ of the incoming signals. In layer three, every node is a fixed node showing the normalized firing strength of each rule. The ith node calculates the ratio of the ith rule’s firing strength to the summation of two rules firing strengths. In layer four, every node is an adaptive node with a node function indicating the contribution of ith rule toward the overall output. In layer five, the single node is a fixed node indicating the overall output as the summation of all incoming signals (Jang and Sun 1995).

Fig. 1
figure 1

A typical ANFIS architecture (Jang 1993). Here, x and y are the inputs and z is the final output; A1, A2, B1, and B2 are the linguistic label (small, large, etc.) associated with this node function, and wi is the normalized firing strength that is the ratio of the ith rule’s firing strength (Wi) to the summation of the first and second rules’ firing strengths (W1 and W2) and Π is the node label

3 Artificial neural network (ANN)

ANNs are powerful nonlinear modeling approaches based on the function of the human brain. They can identify and learn correlated patterns between input data sets and target values. Neural networks can be described as a network of simple processing nodes or neurons, interconnected to each other in a specific order, performing simple numerical manipulations (See and Openshaw 1999). A common three-layered neural network is composed of several elements namely nodes. These networks are made up of an input layer consisting of nodes representing different input variables, the hidden layer consisting of many hidden nodes, and an output layer consisting of output variables (Haykin 1999).

Feed-forward neural networks are successfully applied in many different problems. This network architecture and the corresponding learning algorithm can be viewed as a generalization of the popular least-mean-square (LMS) algorithm (Haykin 1999). Fully recurrent networks, introduced by Elman (1988), feed the outputs of the hidden layer back to itself. Partially recurrent networks start with a fully recurrent net and add a feed-forward connection that bypasses the recurrence, effectively treating the recurrent part as a state memory. Recurrent networks are the state of the art in nonlinear time series prediction, system identification, and temporal pattern classification.

4 Wavelet transform

Wavelet transform is a time-dependent spectral analysis that decomposes time series in the time–frequency space and provides a time-scale illustration of processes and their relationships (Daubechies 1990). In this method, the data series are broken down by the transformation into its ‘wavelets’, a scaled and shifted version of the mother wavelet (Grossman and Morlet 1984). Wavelet analysis allows the use of long-time intervals for low-frequency information and shorter intervals for high-frequency information and is capable of revealing aspects of data like trends, breakdown points, and discontinuities that other signal analysis techniques might miss.

The discrete wavelet transform (DWT) is used to decompose the time series. The classical continuous wavelet transform (CWT) requires a significant amount of computation time and data (Partal 2009; Adamowski 2007). So, in this study, the discrete wavelet transform is preferred because it requires less computation time and data (Christopoulou et al. 2002). The DWT operates two sets of functions: high-pass and low-pass filters. The original time series is passed through high-pass and low-pass filters, and detailed coefficients and approximation series are obtained. In fact, it decomposes the signal into a group of functions (Cohen and Kovacevic 1996):

$$ \psi_{j,k} (x) = 2^{j/2} \psi_{j,k} (2^{j} x - k) $$
(1)

where ψ j,k (x) is produced from a mother wavelet ψ(x) which is expanded by j and translated by k. The discrete wavelet function of a signal f(x) can be calculated as follows:

$$ \begin{aligned} c_{j,k} & = \int\limits_{ - \infty }^{\infty } {f(x)\psi_{j,k}^{*} (x){\text{d}}x} \\ f(x) & = \sum\limits_{j,k} {c_{j,k} \psi_{j,k} (x)} \\ \end{aligned} $$
(2)

where c j,k is the approximate coefficient of a signal. The mother wavelet is formulated from the scaling function φ(x) as:

$$ \begin{aligned} \varphi (x) & = \sqrt 2 \sum {h_{0} (n)} \varphi (2x - n) \\ \psi (x) & = \sqrt 2 \sum {h_{1} (n)} \varphi (2x - n) \\ h_{1} (n) & = ( - 1)^{n} h_{0} (1 - n) \\ \end{aligned} $$
(3)

Different sets of coefficients h 0 (n) can be found corresponding to wavelet bases with various characteristics. In the DWT, coefficients h 0 (n) play a critical role (Rajaee 2011). In summary, although, several studies have been conducted on application of ANN and FL techniques to find out how accurately they can predict weather and climate processes, few researches have integrated ANN and FL as a new method so-called ANFIS so that previous studies have shown that ANFIS model has more advantages and less weaknesses than other computational intelligence techniques. So, the present study is programmed to evaluate the capability of this method for meteorological drought forecasting in the part of East Azerbaijan Province, Iran.

5 Materials and methods

5.1 Study area

The study site located in Tabriz, East Azerbaijan province, Iran, lies from 45 30′ to 46 30′ E longitude and from 37 to 38 N latitude (Fig. 2). The average of yearly precipitation and temperature of Ajabshir plain is 272.3 mm and 13.3 °C, respectively. In recent years, several drought events have caused many agricultural and social loses on account of most of people here are strongly dependent on agricultural productions. The climate of the study area is Mediterranean and most of rainfall occurs in half of year, and there are two wet and dry periods in each year. This is the reason that the 6-month SPI is used in this study.

Fig. 2
figure 2

Location of the study area in East Azerbaijan province, Iran

5.2 SPI

The time series data from 1952 to 2011 were used in this study (the first 40 years for training and the last 20 years for verification). The SPI index was calculated using the precipitation data with the DIP (Drought Indices Package) software based on the following equation:

In most cases, the distribution that best models observational precipitation data is the Gamma distribution. The density probability function for the gamma distribution is given by the expression:

$$ g(x) = \frac{1}{{\beta^{\alpha } \Upgamma (\alpha )}} x^{\alpha - 1} \hbox{e}^{{\frac{ - x}{\beta }}} \quad {\text{for}}\, x> 0 $$
(4)

where α > 0 is the shape parameter, β > 0 is the scale parameter, and x > 0 is the amount of precipitation. \( \Upgamma (\alpha ) \) is the value taken by the standard mathematical function known as the Gamma function defined by the integral:

$$ \Upgamma \left( \alpha \right) = \mathop {\lim }\limits_{n \to \infty } \mathop \prod \limits_{\upsilon = 0}^{n - 1} \frac{{n!n^{y - 1} }}{y + \upsilon } = \mathop \int \limits_{0}^{\infty } y^{\alpha - 1} e^{ - y} {\text{d}}y $$
(5)

In general, the gamma function is evaluated either numerically or using the values tabulated depending on the value taken by parameter α. In order to model the data observed with a gamma distributed density function, it is necessary to estimate appropriately parameters α and β. Different methods have been suggested in the literature for the estimate of these parameters, for example, in Edwards and McKee (1997), the Thom (1958) approximation is used for the maximum probability.

$$ \widehat{\alpha } = \frac{1}{4A}\left( {1 + \sqrt {1 + \frac{4A}{3}}\,4A} \right) $$
(6)
$$ \widehat{\beta } = \frac{{\overline{x} }}{{\widehat{\alpha }}} $$
(7)

where for n observations

$$ A = \ln \left( {\overline{x} } \right) - \frac{\mathop \sum \nolimits \ln (x)}{n} $$
(8)

The estimate of the parameters can be further improved by using the interactive approach suggested in Wilks (1995). After estimating coefficients α and β, the density of probability function g(x) is integrated with respect to x and we obtain an expression for cumulative probability G(x) that a certain amount of rain has been observed for a given month and for a specific time scale.

$$ G\left( x \right) = \int\limits_{0}^{\infty } g \left( x \right){\text{d}}x = \frac{1}{{\hat{\beta }^{{\hat{\alpha }}} \Gamma (\hat{\alpha })}}\int\limits_{0}^{x} {x^{{\hat{\alpha } - 1}} } e^{{ - x/\hat{\beta }}} {\text{d}}x $$
(9)

Taking \( t = x/\hat{\beta } \), this equation becomes the incomplete gamma function

$$ G\left( x \right) = \frac{1}{{\Upgamma (\hat{\alpha })}}\mathop \int \limits_{0}^{x} t^{{\hat{\alpha } - 1}} e^{ - t} {\text{d}}t $$
(10)

The gamma function is not defined by x = 0, and since there may be no precipitation, the cumulative probability becomes:

$$ H\left( x \right) = q + \left( {1 - q} \right)G(x) $$
(11)

where q is the probability of no precipitation. The cumulative probability is then transformed into a normal standardized distribution with null average and unit variance from which we obtain the SPI index. For the details, see Edwards and McKee (1997) or Lloyd-Hughes and Saunders (2002). The above approach, however, is neither practical nor numerically simple to use if there are many grid points or many stations on which to calculate the SPI index. In this case, an alternative method was described in Edwards and McKee (1997) using the technique of approximate conversion developed in Abramowitz and Stegun (1965) that converts the cumulative probability into a standard variable Z.

The SPI index is then defined as:

$$ Z = {\text{SPI}} = - \left( {t - \frac{{c_{0} + c_{1} t + c_{2} t^{2} }}{{1 + d_{1} t + d_{2} t^{2} + d_{3} t^{3} }}} \right)\quad {\text{for }}0 < H\left( x \right) < 0.5 $$
(12)
$$ Z = {\text{SPI}} = + \left( {t - \frac{{c_{0} + c_{1} t + c_{2} t^{2} }}{{1 + d_{1} t + d_{2} t^{2} + d_{3} t^{3} }}} \right)\quad {\text{for }}0.5 < H\left( x \right) < 1 $$
(13)
$$ t = \sqrt {\ln \left[ {\frac{1}{{\left( {H\left( x \right)} \right)^{2} }}} \right]} \quad {\text{for }}0 < H\left( x \right) < 0.5 $$
(14)
$$ t = \sqrt {\ln \left[ {\frac{1}{{\left( {1 - H\left( x \right)} \right)^{2} }}} \right]} \quad {\text{for }}0.5 < H\left( x \right) < 1 $$
(15)

where x is precipitation, H(x) is the cumulative probability of precipitation observed, and c0, c1, c2, d0, d1, d2 are constants with the following values:

$$ {\text{c}}^{0} = 2.515517,\,{\text{c}}^{1} = 0.802853,\,{\text{c}}^{3} = 0.010328 $$
$$ {\text{d}}^{0} = 1.432788,\,{\text{d}}^{1} = 0.189269 ,\,{\text{d}}^{3} = 0.001308 $$

5.3 ANN models

The ANN models for meteorological drought forecasting were developed using MATLAB R2010 software. In this paper, two networks were constructed including feed-forward neural network (FNN) and Elman or recurrent neural network (RNN). Also, 2, 3, 4, 5, 6, 7, 8, 9, and 10 nodes were tested as the number of nodes in the hidden layer. The numbers of nodes were selected using a common trial-and-error approach. There are several training algorithms for ANNs such as gradient descent with momentum and adaptive learning rate back-propagation (GDX), Levenberg–Marquardt (LM), and Bayesian regularization (BR) (as were used in this research). GDX uses back-propagation to calculate derivatives of performance cost function with respect to the weight and bias variables of the network. Each variable is adjusted according to the gradient descent with momentum. For each step of the optimization, if performance decreases the learning rate is increased. This is probably the simplest and most common way to train a network (Haykin 1999). Similarly, the LM method is a modification of the classic Newton algorithm for finding an optimum solution to a minimization problem. The BR is an algorithm that automatically sets optimum values for the parameters of the objective function. In the approach used, the weights and biases of the network are assumed to be random variables with specified distributions. In order to estimate regularization parameters, which are related to the unknown variances, statistical techniques are being used. The advantage of this algorithm is that whatever the size of the network, the function will not be over-fitted (Maier and Dandy 1998). The SPI data for the current time were imported as inputs, and the SPI for one, two, and three time steps (6 months) ahead were considered as target.

5.4 ANFIS models

This model was trained and tested using the data set same as those were used in ANN models. Four different membership functions (MFs) were tested for ANFIS models in this work, that is, Gaussian (MFgauss), bell-shaped (MFgbell), triangular (MFtri), and spline-based (MFpi), or Piduetoits shape. ANFIS models with different types of MF were run with 2, 3, 4, and 5 MFs and with 50, 100, 150, 200, 250, 300, and 400 iterations for each node of input data (Rajaee 2011). Training characteristics of the ANFIS model are presented in Table 1. The SPI data for the current time were imported as inputs, and the SPI for on, tow, and three time steps (12 months) ahead were considered as target.

Table 1 Training parameters of the ANFIS models

5.5 Wavelet-ANN and Wavelet-ANFIS

In order to build hybrid Wavelet-ANN and Wavelet-ANFIS model, sub-series elements which are derived from the use of the discrete wavelet transform on the original time series data were used as inputs for neural network models. Each sub-series element plays a unique role in the original time series, and the performance of each sub-series is distinct. In the first step, the original SPI data were decomposed into a series of details using discrete wavelet transform. The decomposition process was iterated with successive approximation signals being decomposed in turn, so that the original time series was broken down into many lower resolution components (Adamowski and Fung Chan 2011). All of the mentioned variables were decomposed to 1, 2, 3, and 4 levels by seven different kinds of wavelets, that is, db4, bior1.1, bior1.5, rboi1.1, rboi1.5, coif2, and coif4 wavelets.

5.6 Performance comparison of models

Coefficient of determination (R 2) and root mean squared error (RMSE) were used to compare the performance of models and select the best one.

$$ R^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{N} {(y_{o} - y_{e} )^{2} } }}{{\sum\nolimits_{i = 1}^{N} {(y_{o} - \bar{y}_{o} )^{2} } }} $$
(16)
$$ {\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {(y_{o} - y_{e} )^{2} } }}{N}} $$
(17)
$$ E = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {(y_{o} - y_{e} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {(y_{o} - \bar{y}_{o} )^{2} } }} $$
(18)

where y o , y e , and N are observed and estimated SPI, and number of data, respectively. In the Nash–Sutcliffe model efficiency coefficient, an efficiency of one corresponds to a perfect match of forecasted data to the observed data. An efficiency of zero indicates that the model predictions are as accurate as the mean of the observed data (Pulido-Calvo and Gutierrez-Estrada 2009).

6 Results and discussion

6.1 ANN models

In this study, several different networks were tested. Table 2 shows the R 2 and RMSE for different ANN models. The best number of neurons in the hidden layer, that is, was determined 4 neurons and for the different time step predictions. As shown in this table, FNN-BR (1 4 1) was the best network. This result is in contrast to the suggestion that was expressed by Kavzoglu and Mather (2003). They suggest that the number of neurons in hidden layer should be set as twice as the input nodes plus one. It confirms that the number of neurons in hidden layer should be determined using a trial-and-error method. All models forecasted the meteorological drought for one and two time steps ahead appropriately, but for 3 and 4 months ahead, the R 2 and Nash–Sutcliffe model efficiency coefficient decreased and RMSE increased meaningfully. In other words, more time step duration less accuracy is expected (Table 2).

Table 2 R 2 and RMSE for different ANN models with the best number of neurons in the hidden layer (4 neurons) and for the different predicted time steps (verification data set)

6.2 ANFIS models

There is not any basic rule to delineate the number of membership functions (MFs) of ANFIS models, and they usually determined by trial-and-error approach. To select the number of MFs, a modeler should avoid using a large number of membership functions or parameters to save time and calculation effort (Keskin et al. 2004). Four different types of membership function (MF) were tested in this study including Gaussian (MFgauss), Bell-shaped (MFgbell), Triangular (MFtri), and Spline-based (MFpi), or Piduetoits shape (Jang 1993). ANFIS models with different types of MF were run with 2, 3, 4, and 5 MFs and with 50, 100, 150, 200, 250, 300, and 400 iterations for each node of input data (Table 3).

In general, ANFIS performs more efficiently than ANN models on account of the effect of fuzzification of the input through membership functions.

Table 3 Results of ANFIS models with the best number of membership functions (2 MF), the best iteration (150), and different predicted time steps (verification data set)

6.3 Wavelet-ANN and Wavelet-ANFIS models

In this part, Wavelet-ANN and Wavelet-ANFIS models were tested. The best ANN model was selected to make hybrid Wavelet-ANN models. The data were grouped into 1, 2, 3, and 4 levels. The best Wavelet-ANN models were found to forecast SPI more accurately than both chosen ANN and ANFIS models.

Table 4 shows the results of Wavelet-ANN models with the best network (i.e. FNN-BR) and for different predicted time steps. Moreover, Table 5 shows the results of Wavelet-ANFIS models with the best number and type of MF (i.e. 4 membership function and bell-shape membership function) and for different predicted time steps.

Table 4 Results of Wavelet-ANN models for the different predicted time steps (verification data set)
Table 5 Results of Wavelet-ANFIS models for the different predicted time steps (verification data set)

According to the monthly time-scale resolution, in the subgroup level 2, there are 2 subsets (21 month mode which is nearly yearly mode and 22 month). In fact, the duration and time interval of data series should be considered in order to determine the appropriate number of decomposition levels. This result is in accordance with Nourani et al. (2011) who evaluated some artificial intelligence methods to model rainfall-runoff process. Generally, Wavelet-ANFIS and Wavelet-ANN models were found to provide more accurately predicted meteorological drought than ANN and ANFIS models. Also, it was reached that Wavelet-ANFIS hybrid model is able to forecast SPI more accurately than Wavelet-ANN. Since Wavelet-ANFIS models consisted of ANN, FL, and wavelet transform, they could model SPI more accurately. Figure 3 shows the best ANN, ANFIS, Wavelet-ANN, and Wavelet-ANFIS models that could predict SPI in a good agreement with observed ones for both subbasins. As graphically illustrated by this figure, all models often underestimated SPI. The order of models according to their accuracy is as following: Wavelet-ANFIS, Wavelet-ANN, ANFIS, and ANN, respectively.

Fig. 3
figure 3

Observed and simulated SPI using selected models

7 Conclusions

The capability of coupled Wavelet-ANFIS models in comparison with ANN, ANFIS, and Wavelet-ANN models for one, two and three months ahead of forecasted SPI was assessed in this study. Wholly, mentioned time series are characterized by high nonlinearity, nonstationary, and seasonality behavior. In this study, the wavelet transform, ANN, and ANFIS approaches were combined in order to develop two hybrid models to forecast SPI for different time steps. At first, ANN and ANFIS models were used without any pre-processing. Results showed that these models may be unable to cope with the nonlinearity and seasonality behavior of data. In the second step, wavelet transform was performed on the data and then, the pre-processed data were used as input for ANN and ANFIS models. This research demonstrated that pre-processed data can improve SPI forecasting. Also, results showed that Wavelet-ANFIS hybrid model had the best performance. Additionally, it was concluded that all modeling approaches are capable nearly of forecasting SPI accurately.