Introduction

Droughts have recently been encountered in many regions and climate zones of the world and are classified among the common and recurrent natural disasters. A clear definition of drought was provided at the United Nations Combat Desertification Symposium held in Italy in 1997, where drought was defined as “a natural event that causes hydrological balance to be disrupted and water and soil resources to be adversely affected due to precipitation falling significantly below the recorded normal levels” (Yu et al. 2022). Researchers classify drought according to several different criteria. Three types of drought are generally distinguished in the literature: meteorological, agricultural, and hydrological; some sources also mention socio-economic drought. A lack of precipitation is defined as “meteorological drought”; a decrease or lack of surface water or groundwater is defined as “hydrological drought”; the restriction of agricultural production due to a lack of precipitation and of surface water or groundwater is defined as “agricultural drought”; and a shortage of water that affects the production and consumption activities of society is defined as “socio-economic drought.” Alongside these drought types, various indices have been developed to designate the frequency, severity, duration, and geographic distribution of droughts. Drought indices such as the “standardized precipitation index” (SPI) (McKee et al. 1993), the “standardized precipitation evapotranspiration index” (SPEI) (Vicente-Serrano et al. 2010), and the “Palmer drought severity index” (PDSI) (Palmer 1965) have been used extensively for monitoring meteorological droughts around the world (Salimi et al. 2021). Current and future drought trends should be analyzed reliably so that realistic policies can be developed accordingly; this makes clear how important precipitation and drought predictions are.

Especially in recent years, artificial intelligence techniques have become popular tools in drought prediction studies. Mishra et al. (2007) proposed a hybrid of a linear stochastic model and a nonlinear neural network model for drought prediction in the Kansabati River Basin in India; the hybrid combined an autoregressive integrated moving average (ARIMA) time-series model with an artificial neural network (ANN) to capture complex autocorrelation structures. Performance criteria revealed that this hybrid model predicted droughts more accurately than either the stochastic model or the artificial neural network alone (Mishra et al. 2007). Belayneh et al. (2016) created five different models to predict long-term drought incidences in the Awash River Basin of Ethiopia; the five models comprised a conventional stochastic model (ARIMA), ANN, support vector machine (SVM), and their hybrids with wavelet transform. The hybrid versions with wavelet transform were superior to the stand-alone models in drought prediction (Belayneh et al. 2016). Başakın et al. (2019) predicted future Palmer drought severity index values using machine learning (ML) algorithms applied to 116 years of precipitation data from the Kayseri Region in Turkey; the accuracy of the predictions made by the SVM and K-nearest neighbors (KNN) algorithms was measured statistically, and ML technologies contributed significantly to the solution of hydrological problems (Başakın et al. 2019). Kaur and Sood (2020) stated that there was a need to establish an automatic system that works globally, because the current drought indices are not universal.
Therefore, they created a framework model for the evaluation of drought-causing parameters using ANN, ANN optimized with a genetic algorithm (GA), and deep neural network (DNN) capabilities. The support vector regression (SVR) method was used to predict drought incidences in three different climate zones and over three different time frames, and the SVR model showed high performance in terms of accuracy, sensitivity, and originality (Kaur and Sood 2020). Fadaei-Kermani and Ghaeini-Hessaroeyeh (2020) proposed a new drought-monitoring strategy based on a fuzzy-KNN model, presenting a method to anticipate the most likely drought situations using the standardized precipitation index and the fuzzy-KNN methodology. The model was applied to monitor droughts in Kerman, located in the southeast of Iran, a region that has experienced severe droughts and rainfall deficits in recent years (Fadaei-Kermani and Ghaeini-Hessaroeyeh 2020). Özger et al. (2020) used pre-processing techniques (empirical mode decomposition (EMD) and wavelet decomposition (WD)) to predict 1-, 3-, and 6-month self-calibrated Palmer drought severity index (sc-PDSI) values of the Adana and Antalya provinces of Turkey using stand-alone M5 model tree, adaptive neuro-fuzzy inference system (ANFIS), and SVM methods. The WD-based predictions were more accurate than the EMD-based forecasts, and the selection of a proper wavelet type had a substantial impact on the results (Özger et al. 2020).

This study focused on the prediction of short-term meteorological droughts at the Sakarya Meteorological Station, located in the northwest of Turkey, using hybrid prediction models applied to SPI drought values at 1-, 3-, and 6-month time scales calculated from monthly precipitation data for the 1960–2020 period. Five different ML methods were used in the stand-alone prediction models: ANN, ANFIS, Gaussian process regression (GPR), SVM, and KNN. Besides these stand-alone models, three different pre-processing techniques were employed: discrete wavelet transform (DWT), empirical mode decomposition (EMD), and variational mode decomposition (VMD). Fifteen hybrid versions were created by combining the three pre-processing techniques with the five stand-alone ML methods, and the prediction performances of the stand-alone and hybrid versions were compared with each other. The effects of the pre-processing techniques on the prediction models are discussed extensively. This is the first drought prediction study in which DWT, EMD, and VMD have been considered together as pre-processing techniques.

Materials and methods

Study area and meteorological data

In this study, SPI values calculated from the monthly precipitation data recorded between 1960 and 2020 at the Sakarya Meteorology Station, located in the northwest of Turkey, were used. The station lies at latitude 40.76° N and longitude 30.39° E; its geographic location is shown in Fig. 1. A Black Sea climate with hot summers and warm winters is dominant in the Sakarya region, and precipitation occurs in all seasons. January is the coldest month, and July is the hottest. The annual average temperature is 14.6 °C, and the monthly average precipitation is 71.10 mm. The highest daily total precipitation (127.7 mm) was measured on 26 June 1999 (Republic of Turkey 2022). The change in annual total precipitation from 1960 to 2020 is presented in Fig. 2.

Fig. 1 Study area

Fig. 2 Annual total precipitation from 1960 to 2020 for Sakarya Meteorology Station

The standardized precipitation index

SPI, proposed by McKee et al. (1993), calculates the meteorological drought index based only on the precipitation parameter. With the use of SPI, arid or humid conditions and anomalies can be determined at a certain time scale anywhere in the world from precipitation records. For each time scale, a functional and quantitative description of drought can be constructed using the SPI as the drought index. The index was initially formulated according to the gamma distribution, whose fitted probabilities are subsequently transformed to the normal distribution. It is calculated with the use of Eq. 1:

$$\mathrm{SPI}=\frac{{\mathrm{x}}_{\mathrm{i}}-\overline{\mathrm{x}}}{\upsigma }$$
(1)

where \({x}_{i}\) represents the precipitation value for the ith time step, \(\overline{x }\) the average of the precipitation values, and σ the standard deviation (McKee et al. 1993). The advantages of SPI are as follows: it depends only on precipitation; the beginning and end of a meteorological drought are easy to determine; it is purely probabilistic; it is very practical and easy to calculate; it provides early drought warning at different time scales; it is less complex than the Palmer drought index (PDI), since it conforms to the normal distribution; and humid periods can be followed as well as dry periods (Sevinc and Sen 2003). The classification of SPI accepted in the literature is provided in Table 1.

Table 1 The classification of SPI

The calculation and classification of SPI values rely on probability distribution functions. The shape and scale parameters of the gamma distribution are estimated from the precipitation records for each station and time scale. Once the gamma distribution has been fitted, the cumulative probability is computed, and the SPI drought values are obtained by transforming this cumulative probability into the standard normal distribution. The resulting SPI is a normal random variable with a mean of zero and a variance of one; its value is therefore a Z-score that can be positive or negative, above or below the mean.
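As an illustration, a minimal sketch of this gamma-to-normal transformation is given below, assuming SciPy and a strictly positive aggregated precipitation series; the operational procedure (e.g., in the DrinC software used later) additionally fits the distribution per calendar month and treats zero-precipitation cases separately.

```python
import numpy as np
from scipy import stats

def spi(precip, scale=3):
    """Sketch of SPI: aggregate precipitation over `scale` months,
    fit a gamma distribution, and map cumulative probabilities to
    standard-normal z-scores."""
    x = np.convolve(precip, np.ones(scale), mode="valid")  # rolling sums
    shape, loc, scl = stats.gamma.fit(x, floc=0)           # fit gamma (location fixed at 0)
    cdf = stats.gamma.cdf(x, shape, loc=loc, scale=scl)    # cumulative probability
    cdf = np.clip(cdf, 1e-6, 1 - 1e-6)                     # guard the inverse normal
    return stats.norm.ppf(cdf)                             # transform to z-scores
```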

Stand-alone models

Artificial neural networks (ANNs)

ANNs imitate nerve cells, the smallest processing units of the brain, the center of the human nervous system, and enable computers to gain the ability of human learning. ANNs are able to learn and underpin the concept of artificial intelligence. ANNs generalize through training on the input patterns presented to them and, based on this generalization, generate outputs for new inputs (Citakoglu 2017). ANNs are comparable to the neurons of the human brain in that they are parallel computing systems. Activation (transfer) functions, weights, and nodes are the three aspects that define them. Linear, logistic, and tangent activation functions are common in engineering applications. To produce an output, each neuron multiplies each input by its connection weight, sums the products, and then passes the sum through a transfer function. The transfer function is commonly a sigmoid function, a differentiable, monotonically increasing S-shaped curve. The output yj ranges from 0 to 1, whereas the inputs can take any value. The output yj of the jth neuron in a layer is calculated using the threshold function:

$${y}_{j}=f\left(\sum_{i}{w}_{ji}{x}_{i}\right)= \frac{1}{1+ {e}^{-\sum_{i}{w}_{ji}{x}_{i}}}$$
(2)

where xi is the value of the ith neuron in the previous layer, wji is the weight of the connection between the jth neuron in a layer and the ith neuron in the previous layer, and f() is the transfer function, i.e., the rule for mapping the neuron’s summed input to its output, which can introduce nonlinearity into the network design if chosen correctly (Haykin 1998).
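As a minimal illustration of Eq. 2, the following sketch (assuming NumPy; the input and weight values are arbitrary) computes the output of a single sigmoid neuron:

```python
import numpy as np

def neuron_output(x, w):
    """Sigmoid neuron: weighted sum of inputs passed through
    the logistic transfer function (Eq. 2)."""
    s = np.dot(w, x)              # sum_i w_ji * x_i
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([0.2, -0.5, 1.0])    # inputs from the previous layer
w = np.array([0.4, 0.1, -0.3])    # connection weights w_ji (arbitrary)
print(neuron_output(x, w))        # output value in (0, 1)
```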

Adaptive neuro-fuzzy inference system

ANFIS is a combination of artificial neural networks and fuzzy logic developed by Jang in the 1990s. ANFIS is based on the Takagi–Sugeno-type fuzzy inference system, which is used to predict chaotic time series through nonlinear functions. By identifying the input structure, ANFIS exploits the learning capacity of ANNs to describe the input–output relationship and builds fuzzy rules. System outputs are obtained using the reasoning capability of fuzzy logic. In this method, membership functions and “if–then” fuzzy rules are taken into account when computing output values for the input values presented to the adaptive network. The learning algorithm used by ANFIS to optimize the input and output values is a hybrid algorithm that combines the least-squares approach and the backpropagation learning algorithm (Jang 1993; Citakoglu 2015).

The Takagi–Sugeno-type if–then fuzzy rule-based ANFIS structure with x and y inputs and z outputs is mathematically shown as follows:

$$Rule\;1=If\; \left(x\;is\;{A}_{1}\right)\;and\;\left(y\;is\;{B}_{1}\right)\;Then\;\left({f}_{1}={p}_{1}x+{q}_{1}y+{r}_{1}\right)$$
(3)
$$Rule\;2=If\;\left(x\;is\;{A}_{2}\right)\;and\;\left(y\;is\;{B}_{2}\right)\;Then\;\left({f}_{2}={p}_{2}x+{q}_{2}y+{r}_{2}\right)$$
(4)

where x and y indicate the inputs, Ai and Bi symbolize the fuzzy sets, fi denotes the outputs within the fuzzy region specified by the fuzzy rule, and pi, qi, and ri denote the design parameters determined during the training phase. The six layers of the ANFIS model required to realize these two rules are briefly explained below:

Layer 1: this is the input layer, which just fixes the system input variable.

Layer 2: this is the fuzzification layer, which defines the membership grade of each input set using the fuzzy membership function \({\mu }_{{A}_{i}}(x)\) as follows:

$${\mu }_{{A}_{i}}\left(x\right)=exp\left[-\frac{1}{2}{\left(\frac{{x}_{1}-{c}_{i}}{{\sigma }_{i}}\right)}^{2}\right]$$
(5)

where the parameters \(\left\{{\sigma }_{i},{c}_{i}\right\}\) make up the premise parameter set, \({\mu }_{{A}_{i}}\left(x\right)\) is the fuzzy membership function of the ith fuzzy set \({A}_{i}\), and \({x}_{1}\) is the input.

Layer 3: this is the rule layer; its output nodes give the firing strength of each rule, stated as the product of the membership grades in the following way:

$${w}_{i}= {\mu }_{{A}_{i}}\left({x}_{1}\right)\times {\mu }_{{B}_{i}}\left({x}_{2}\right)$$
(6)

Layer 4: this layer contains the normalized firing strengths (\(\overline{{w }_{i}}\)). The output of the ith node equals the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths, as follows:

$$\overline{{w }_{i}}= \frac{{w}_{1}}{{w}_{1}+{w}_{2}+{w}_{3}+{w}_{4}}$$
(7)

where \({w}_{1}\), \({w}_{2}\), \({w}_{3}\), and \({w}_{4}\) are the firing strengths of the first, second, third, and fourth rules, respectively.

Layer 5: the ith rule’s weighted output value is computed as follows in this layer:

$${N}_{i}=\overline{{w }_{i}}{ f}_{i}=\overline{{w }_{i}}\left({p}_{i}{x}_{1}+{q}_{i}{x}_{2}+{r}_{i}\right)$$
(8)

where \(\left\{p, q, r\right\}\) is the set of consequent parameters, determined using the least-squares method, and \({x}_{1}\) and \({x}_{2}\) are the inputs.

Layer 6: this is the summation layer. This layer calculates the overall output by summing all of the incoming signals as described as follows:

$$y=\sum_{i}\overline{{w}_{i}}{f}_{i}$$
(9)
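To make the layer-by-layer flow concrete, the following is a minimal sketch of a two-input, two-rule Sugeno forward pass corresponding to Eqs. 5 to 9. It assumes NumPy, and the premise parameters (c, σ) and consequent parameters (p, q, r) are arbitrary illustrative values; in a trained ANFIS they would be learned by the hybrid algorithm described above.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function (Eq. 5)."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x1, x2):
    # Layer 2: membership grades of each input (arbitrary c, sigma)
    mu_A = [gauss_mf(x1, 0.0, 1.0), gauss_mf(x1, 1.0, 1.0)]
    mu_B = [gauss_mf(x2, 0.0, 1.0), gauss_mf(x2, 1.0, 1.0)]
    # Layer 3: rule firing strengths (Eq. 6)
    w = np.array([mu_A[0] * mu_B[0], mu_A[1] * mu_B[1]])
    # Layer 4: normalized firing strengths (Eq. 7)
    w_bar = w / w.sum()
    # Layer 5: weighted rule outputs with consequent parameters (Eq. 8)
    p, q, r = np.array([0.5, -0.2]), np.array([0.3, 0.8]), np.array([0.1, 0.0])
    f = p * x1 + q * x2 + r
    # Layer 6: overall output as the sum of incoming signals (Eq. 9)
    return np.sum(w_bar * f)

print(anfis_forward(0.4, 0.7))
```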

Gaussian process regression (GPR)

GPR is an ML method built on a collection of random variables, any finite subset of which has a multivariate Gaussian distribution. The GPR model, in which neighboring observations transmit information to each other, has been used frequently in prediction studies in recent years. GPR is grounded in probability theory; it can make predictions for unknown input data and provides a measure of prediction precision, which greatly increases the significance of the predictions (Sihag et al. 2018; Citakoglu 2021). Assuming observations xi and yi from a particular process with yi = f(xi), the GPR model is entirely specified by its mean function μ(xi) and covariance function kθ(xi, xj), defining a distribution over the function f(xi) as follows:

$${f(x}_{i}) \sim GP \left(\mu \left({x}_{i}\right), {k}_{\theta }\left({x}_{i},{x}_{j}\right)\right).$$
(10)

A vector of parameters θ = [θ1, …, θz] is usually used in covariance functions. The simplest way to optimize these parameters for a data set w = (x, y) is to maximize the log-marginal likelihood log p(y | x):

$$\theta =\underset{\theta }{\mathrm{arg\;max}}\left(-\frac{1}{2}{y}^{T}{\left({K}_{\theta }+{\sigma }^{2}I\right)}^{-1}y-\frac{1}{2}\mathrm{log}\left|{K}_{\theta }+{\sigma }^{2}I\right|-\frac{N}{2}\mathrm{log}\left(2\pi \right)\right)$$
(11)

where x = [x1, …,xN]T and y = [y1, …,yN]T are vectors with N observed data with Gaussian noise of variance σ2, Kθ = kθ (x, x) is a N × N covariance matrix of the training data set, and I is a N × N identity matrix.

The GPR model provides the predictive distribution of the corresponding output y* given the data set w and a new input x*. The Gaussian form of the predictive distribution of y* given w is

$$P({\mathrm{y}}_{*} \;|\; (\mathrm{x},\mathrm{ y}), {\mathrm{ x}}_{*})=\mathrm{N}({\upmu }_{*}\left({\mathrm{x}}_{*}\right), {\upsigma }_{*}^{2}({\mathrm{x}}_{*}))$$
(12)

with mean \({\upmu }_{*}\left({\mathrm{x}}_{*}\right)\) and variance \({\upsigma }_{*}^{2}\left({\mathrm{x}}_{*}\right)\) given by Rasmussen and Williams (2006):

$${\upmu }_{*}\left({\mathrm{x}}_{*}\right)= {k}_{*}^{T}({K}_{\theta }+{\sigma }^{2}I{)}^{-1}y$$
(13)
$${\sigma }_{*}^{2}\left({x}_{*}\right)= {k}_{**}-{k}_{*}^{T}({K}_{\theta }+{\sigma }^{2}I{)}^{-1}{k}_{*}$$
(14)

where \({k}_{*}={\left[k\left({x}_{1},{x}_{*}\right),\dots ,k\left({x}_{N},{x}_{*}\right)\right]}^{T}\) is an N × 1 covariance vector between the test and training data sets and \({k}_{**}=k\left({x}_{*},{x}_{*}\right)\) is the autocovariance of the test data set. The user must choose a suitable covariance function to obtain the predictive mean and covariance. Further details on GPR can be found in Rasmussen and Williams (2006).
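A minimal NumPy sketch of the predictive equations (Eqs. 13 and 14) is given below; the squared-exponential kernel and the hyperparameter values are illustrative assumptions, not the covariance functions tuned later in this study.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance k(a, b) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gpr_predict(x_train, y_train, x_test, noise=0.1):
    K = sq_exp_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_star = sq_exp_kernel(x_train, x_test)          # N x M covariances
    k_ss = sq_exp_kernel(x_test, x_test)             # M x M autocovariance
    K_inv = np.linalg.inv(K)
    mean = k_star.T @ K_inv @ y_train                # Eq. 13
    cov = k_ss - k_star.T @ K_inv @ k_star           # Eq. 14
    return mean, np.diag(cov)

x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.1 * np.random.randn(20)
mu, var = gpr_predict(x, y, np.array([2.5, 4.0]))
```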

Support vector machines

SVM, created by Cortes and Vapnik in 1995, is an ML technique based on structural risk minimization (Cortes and Vapnik 1995). SVM models are classified into two categories: (a) support vector regression models, used to tackle prediction problems, and (b) support vector machine classifier models, used to address data classification problems. The goal of regression is to find a hyperplane that fits the provided data, and the error of each data point is determined by its distance from this hyperplane. The least-squares approach is the standard method for linear regression; however, using the least-squares estimator in the presence of outlier data may be impractical, and the resulting model will perform poorly. A robust estimator that is insensitive to small model perturbations should therefore be built. To this end, a penalty function with insensitivity ε is defined as follows:

$${L}^{\varepsilon }\left(x,y,f\right)=\left\{\begin{array}{ll}0,& \mathrm{if}\;\left|y-f\left(x\right)\right|\le \varepsilon \\ \left|y-f\left(x\right)\right|-\varepsilon ,& \mathrm{otherwise}\end{array}\right.$$
(15)

The training data set is S = {(x1, y1), (x2, y2), …, (xn, yn)}, and the class of functions is f(x) = {wTx + b, w ∈ R, b ∈ R}. If a data point deviates from f(x) by more than ε, slack variables are introduced according to the size of the deviation. The minimization problem is then specified using the penalty function:

$$\mathrm{Minimize}\;\frac{1}{2}{\Vert w\Vert }^{2}+C\sum_{i=1}^{N}\left({\xi }_{i}+{\xi }_{i}^{*}\right)\;\mathrm{subject\;to}\;\left\{\begin{array}{l}f\left({x}_{i}\right)-{y}_{i}\le \varepsilon +{\xi }_{i}\\ {y}_{i}-f\left({x}_{i}\right)\le \varepsilon +{\xi }_{i}^{*}\\ {\xi }_{i},{\xi }_{i}^{*}\ge 0\end{array}\right.\;i=1, 2, 3,\dots ,n$$
(16)

where ||w||2 is the norm of the weight vector, \({\upxi }_{\mathrm{i}}\) and \({\upxi }_{\mathrm{i}}^{*}\) are auxiliary slack variables, and C is the regularization coefficient balancing model complexity against the number of points lying outside the ε-tube, obtained by trial and error (Kisi and Cimen 2011).
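For illustration, a minimal ε-SVR fit using scikit-learn is sketched below; the kernel and the C, gamma, and epsilon values are assumed placeholders rather than the parameters obtained by trial and error in this study.

```python
import numpy as np
from sklearn.svm import SVR

# Toy one-step-ahead setup: predict y from lagged inputs x
x = np.random.randn(100, 3)                 # three lagged predictors
y = x @ np.array([0.5, -0.2, 0.3]) + 0.05 * np.random.randn(100)

# RBF-kernel epsilon-SVR; C, gamma, epsilon would be tuned by trial and error
model = SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.01)
model.fit(x[:75], y[:75])                   # 75% training data
pred = model.predict(x[75:])                # 25% test data
```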

K-nearest neighbors

KNN is a non-parametric ML method for classification and regression problems. The KNN algorithm is largely based on data mining techniques and is thus particularly successful in classification problems. The model is fed a training set and uses it to categorize objects, classifying an unknown sample according to the known classification of its neighbors (Fadaei-Kermani et al. 2017). In general, three aspects are important when evaluating any method: ease of interpreting the outputs, computation time, and predictive power. The KNN classification output is the class held by the majority of an object’s neighbors; k is a positive integer, typically small, and if k = 1, the object is simply assigned to its nearest neighbor. Selecting the number of nearest neighbors has a large impact on the quality of the model: with too few neighbors, predictions become sensitive to noise, while too many neighbors can also degrade the modeling results (Kutyłowska 2018).

The first step in using this model is to define a measure of the distance between the training data and the testing data; this distance is commonly calculated as the Euclidean distance:

$$D\left(x, p\right)= \sqrt{(x-p{)}^{2}}$$
(17)

where x is the new point and p is the learning example. After the Euclidean distances are determined, the training data are arranged in ascending order of distance to the sample. The next step is to choose the number of neighbors (k). The efficiency of this method depends significantly on how well the closest samples in the reference data are selected (Akbar Jalilzadnezamabad 2019).
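A minimal KNN regression sketch, assuming scikit-learn, is given below; the value of k and the Euclidean metric are illustrative choices rather than the tuned settings of this study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

x = np.random.randn(100, 3)                      # lagged predictors
y = x.mean(axis=1) + 0.05 * np.random.randn(100)

# Euclidean distance; k would be chosen by trial and error
knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean")
knn.fit(x[:75], y[:75])                          # 75% training data
pred = knn.predict(x[75:])                       # average of the 5 nearest
```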

Pre-processing techniques

Discrete wavelet transform (DWT)

Wavelet transform (WT) is a pre-processing technique used to determine the periodic and characteristic structure of data. WT is a mathematical operation that decomposes a signal into a set of fundamental functions, or wavelets, giving a more concise description of the original data set without losing its features. WT is closely related to the Fourier transform, which carries a series from the time domain to the frequency domain; unlike the Fourier transform, however, WT allows the user to evaluate both the low- and high-frequency components of the signal at each time interval (Khanmohammadi et al. 2022). Wavelets are the most important parameters of the WT; Morlet, Haar, Coiflets, Mexican hat, and Daubechies wavelets are the main wavelets commonly used in wavelet analysis. For a function to be a wavelet, its mean value should be zero and its duration should be limited. Therefore, the mother wavelet must satisfy these two conditions:

$${\int }_{-\infty }^{+\infty }\uppsi \left(\mathrm{x}\right)\mathrm{dx}=0$$
(18)
$${\int }_{-\infty }^{+\infty }{\uppsi }^{2}\left(\mathrm{x}\right)\mathrm{dx}=1$$
(19)

Scanning the whole range of scales over a large stack of data increases the processing time. The discrete wavelet transform (DWT) was developed to reduce this redundancy by evaluating the transform only for certain scale groups. The DWT, also known as the Mallat algorithm, reduces the computational load through a simplified decomposition while still providing the necessary steps. The most common choice is the dyadic scheme: a fixed scale base a0 and a time step b0 giving the position of the wavelet on the time axis. When the values 2 and 1 are chosen in that order, the wavelet equation is

$${\Psi }_{a,b}\left(x\right)= \frac{1}{\sqrt{{a}_{0}^{m}}}\Psi \left(\frac{x-{a}_{0}{b}_{0}n}{{a}_{0}^{m}}\right)= \frac{1}{\sqrt{{2}^{m}}}\Psi \left({2}^{m}x-n\right)$$
(20)

For the DWT, the length of the time series must be a power of two, N = 2^M, where M is the number of decomposition steps of the x(t) series. A series that does not have the appropriate length can be extended (padded) if necessary. The transform is then

$$DWT\left(x\right)= \frac{1}{\sqrt{{2}^{m}}}{\sum }_{t=0}^{N-1}X\left(t\right) \Psi \left({2}^{m}x-n\right)$$
(21)

The largest wavelet scale is 2^M, the decomposition steps satisfy 1 ≤ m ≤ M, and a single transform coefficient is obtained by multiplying the whole series by the wavelet at the largest scale. At each subsequent step, the wavelet width is halved for each decreasing m value, and the number of coefficients representing the series is doubled. A signal subjected to DWT can be divided without any loss into approximation (A) and detail (D) components. The approximation signal (A) represents the low-frequency content, while the detail signal (D) represents the high-frequency content. The approximation signal (A) obtained as output at each step is taken as the input signal of the next step, where it is again divided into approximation and detail components. This procedure is repeated until the required resolution is achieved. The following equations represent the sequential decomposition process with DWT:

$$x\left(t\right)=A1+D1$$
(22)
$$x\left(t\right)=A2+D2+D1$$
(23)
$$x\left(t\right)= {A}_{n}+ \sum_{i=1}^{n}{D}_{i}$$
(24)

In each step, the signal is passed through a high-pass filter and a low-pass filter in parallel. The outputs (y) are the approximation signal (from the low-pass filter) and the detail signal (from the high-pass filter). The transform coefficients are obtained by convolving the signal with the high-pass filter h(k) and the low-pass filter l(k). Mathematically, this process is expressed as

$${y}_{high}= {\sum }_{t=0}^{N-1}X\left(t\right) h(2k-t)$$
(25)
$${y}_{low}= {\sum }_{t=0}^{N-1}X\left(t\right) l(2k-t)$$
(26)
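A minimal decomposition sketch assuming the PyWavelets package is given below; the dmey wavelet and the three-level decomposition mirror choices examined later in this study, and reconstructing each component back to the original length is one common way to obtain sub-series inputs (cf. Eq. 24).

```python
import numpy as np
import pywt

spi_series = np.random.randn(732)            # placeholder for an SPI series

# Three-level DWT with the discrete Meyer wavelet: one approximation
# component (A3) and three detail components (D3, D2, D1)
coeffs = pywt.wavedec(spi_series, "dmey", level=3)
A3, D3, D2, D1 = coeffs

# Reconstruct each sub-series at the original length so that
# x(t) = A3 + D3 + D2 + D1
subseries = []
for i in range(len(coeffs)):
    kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    subseries.append(pywt.waverec(kept, "dmey")[: len(spi_series)])
```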

Empirical mode decomposition

EMD is a signal decomposition approach described by Huang et al. (1998) that decomposes a time series entirely empirically and data-adaptively into multiple oscillatory modes with particular periodicities. EMD decomposes any given data into intrinsic mode functions (IMF), which are simple oscillatory components, using a sifting process (Huang et al. 1998). The sifting portion of the IMF algorithm consists of the following phases:

  i. Local maximum and minimum values of the data set are determined.

  ii. The upper (xupper(n)) and lower (xlower(n)) envelopes are created by interpolating the local maximum and minimum values, respectively.

  iii. The average of the upper and lower envelopes is calculated as follows:

    $$a\left(n\right)= \frac{{x}_{upper}\left(n\right)+{x}_{lower}\left(n\right)}{2}$$
    (27)

  iv. The estimated mean value is subtracted from the original data set:

    $$h\left(n\right)=x\left(n\right)- a\left(n\right)$$
    (28)

  v. If h(n) meets the criteria for being an IMF, the IMF is taken as φ(n) = h(n). If not, the process returns to step i and continues until the prerequisites are met.

This method is repeated until the required number of IMFs has been obtained. The residual is the data set that remains after the decomposition process is completed. As a result, the data set is divided into two types of components: IMFs and a residual (Kisi et al. 2014; Özger et al. 2020; Latifoğlu 2022).
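A minimal sketch assuming the third-party PyEMD package (distributed as EMD-signal) is given below; the number of IMFs extracted depends on the data.

```python
import numpy as np
from PyEMD import EMD

spi_series = np.random.randn(732)            # placeholder for an SPI series

emd = EMD()
emd.emd(spi_series)                          # run the sifting process
imfs, residue = emd.get_imfs_and_residue()   # IMF components plus residual
print(imfs.shape, residue.shape)             # (n_imfs, 732), (732,)
```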

Variational mode decomposition (VMD)

VMD is a non-recursive and adaptive time–frequency analysis method created by Dragomiretskiy and Zosso (2014). VMD is a generalization of the classical Wiener filter to multiple adaptive bands. By treating mode decomposition as an optimization problem, the VMD approach decomposes a one-dimensional input signal into a defined number of modes. The signal is entirely reconstructed by summing the K decomposition modes:

$$\mathrm{H}\left(\mathrm{t}\right)={\sum }_{k=1}^{K}{\mathrm{u}}_{\mathrm{k}}\left(\mathrm{t}\right)$$
(29)

where K is the total number of modes, k is the mode index, and uk(t) is the kth mode, an amplitude-modulated and frequency-modulated signal of the form:

$${u}_{k}\left(t\right)={A}_{k}\left(t\right)\mathrm{\;cos}\left({\varphi }_{k}\left(t\right)\right)$$
(30)

where Ak(t) and φk(t) are the time-dependent envelope and phase of the kth mode, respectively. The corresponding instantaneous frequency ωk(t) of the kth mode is non-negative and changes gradually with the phase. It is calculated using the following formula:

$${\omega }_{k}\left(t\right)= \frac{\partial {\varphi }_{k}\left(t\right)}{\partial t}$$
(31)

Decomposition of time series can be stated as a constrained variational problem with the following objective function:

$$\underset{\left\{{u}_{k}\right\},\left\{{\omega }_{k}\right\}}{\mathrm{min}}\left\{\sum_{k=1}^{K}{\left\Vert {\partial }_{t}\left[\left(\delta \left(t\right)+\frac{j}{\pi t}\right)*{u}_{k}\left(t\right)\right]{e}^{-j{\omega }_{k}t}\right\Vert }_{2}^{2}\right\},\;\mathrm{s.t.}\;\sum_{k=1}^{K}{u}_{k}=H\left(t\right)$$
(32)

where {uk} = {u1, …, uK} and {ωk} = {ω1, …, ωK} are the sets of all modes and their related center frequencies, δ(t) is the Dirac function, * represents convolution, and j = \(\sqrt{-1}\). Detailed theoretical information about VMD can be found in Dragomiretskiy and Zosso (2014).
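A minimal sketch assuming the third-party vmdpy package is given below; all parameter values are illustrative defaults rather than the band levels tuned in this study.

```python
import numpy as np
from vmdpy import VMD

spi_series = np.random.randn(732)     # placeholder for an SPI series

alpha = 2000      # bandwidth constraint (moderate)
tau = 0.0         # noise tolerance
K = 5             # number of modes (band level)
DC = 0            # no DC mode imposed
init = 1          # initialize center frequencies uniformly
tol = 1e-7        # convergence tolerance

# u: decomposed modes, u_hat: their spectra, omega: center frequencies
u, u_hat, omega = VMD(spi_series, alpha, tau, K, DC, init, tol)
```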

Comparison of model performances

Six different criteria were used to compare model performances in the prediction of SPI values: mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), overall index of model performance (OI), and determination coefficient (R2). MSE, RMSE, and MAE values close to 0 and an R2 value close to 1 indicate that the predicted values converge strongly to the original data. The NSE takes values between − ∞ and 1, with NSE = 1 being ideal, as it indicates a perfect fit. NSE values between 0.3 and 0.5 indicate low prediction success, values between 0.5 and 0.7 acceptable success, values between 0.7 and 0.9 very good success, and values between 0.9 and 1 outstanding success (Nash and Sutcliffe 1970). The OI criterion combines the normalized root mean square error and the model efficiency indicators; it can take values up to + 1, with 1 indicating a perfect model that predicts exactly the measured values (Citakoglu 2015). All these performance measures are calculated with the following equations:

$$\mathrm{MSE}=\frac{1}{N}\sum\limits_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{p}}-{\mathrm{SPI}}_{\mathrm{c}}\right)}^{2}$$
(33)
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum\limits_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{p}}-{\mathrm{SPI}}_{\mathrm{c}}\right)}^{2}}$$
(34)
$$\mathrm{MAE}=\frac{1}{N}\sum\limits_{n=1}^{N}\left|{\mathrm{SPI}}_{\mathrm{p}}-{\mathrm{SPI}}_{\mathrm{c}}\right|$$
(35)
$$\mathrm{NSE}=1-\frac{\sum_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{c}}-{\mathrm{SPI}}_{\mathrm{p}}\right)}^{2}}{\sum_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{c}}-\overline{{\mathrm{SPI}}_{\mathrm{c}}}\right)}^{2}}$$
(36)
$$OI= \frac{1}{2}\left(2-\frac{RMSE}{{SPI}_{c\,max}-{SPI}_{c\,min}}-\frac{\sum_{n=1}^{N}{\left({SPI}_{c}-{SPI}_{p}\right)}^{2}}{\sum_{n=1}^{N}{\left({SPI}_{c}-\overline{{SPI}_{c}}\right)}^{2}}\right)$$
(37)
$${\mathrm{R}}^{2}=1-\frac{\sum_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{c}}-{\mathrm{SPI}}_{\mathrm{p}}\right)}^{2}}{\sum_{n=1}^{N}{\left({\mathrm{SPI}}_{\mathrm{p}}-\overline{{\mathrm{SPI}}_{\mathrm{p}}}\right)}^{2}}$$
(38)

where \(N\) is the number of data points in the series, \({SPI}_{p}\) is the prediction obtained from the models, and \({SPI}_{c}\) is the drought value calculated from the monthly precipitation data. \(\overline{{\mathrm{SPI} }_{\mathrm{p}}}\) and \(\overline{{\mathrm{SPI} }_{\mathrm{c}}}\) are the averages of the predicted and calculated values, and \({SPI}_{c max}\) and \({SPI}_{c min}\) are the maximum and minimum values of the calculated series.
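These criteria are straightforward to compute; a sketch with NumPy is given below, where spi_c denotes the calculated values and spi_p the model predictions.

```python
import numpy as np

def performance(spi_c, spi_p):
    """MSE, RMSE, MAE, NSE, OI, and R2 as defined in Eqs. 33-38."""
    err = spi_p - spi_c
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    ss = np.sum((spi_c - spi_c.mean()) ** 2)        # variance term
    nse = 1 - np.sum(err ** 2) / ss                  # Eq. 36
    oi = 0.5 * (2 - rmse / (spi_c.max() - spi_c.min())
                - np.sum(err ** 2) / ss)             # Eq. 37
    r2 = 1 - np.sum(err ** 2) / np.sum((spi_p - spi_p.mean()) ** 2)  # Eq. 38
    return mse, rmse, mae, nse, oi, r2
```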

Model development

Models were developed in this study to predict short-term meteorological droughts at the Sakarya Meteorological Station, located in the northwest of Turkey, using monthly precipitation data for the 1960–2020 period and hybrid ML methods. Initially, drought index (SPI) values were calculated from the precipitation data with the Drought Indices Calculator (DrinC) software (Tigkas et al. 2015). In the prediction models, SPI drought values at different lag times (t, t − 1, t − 2, t − 3, etc.) for the 1-, 3-, and 6-month time scales were considered as input variables, and the value at lag time t + 1 as the output variable. Model success depends on the relationships between the lag times and the optimum number of input variables. Autocorrelation analysis was applied to the input variables to reveal the relationship between lag times and input variables; conceptually, autocorrelation refers to the correlation between the value of a series in any period and its value in the previous or next period (Başakın et al. 2022). To determine whether there is autocorrelation in the SPI drought time series, the graphic-method codes in the MATLAB 2021b software were used.

After examining the presence of autocorrelation, the original drought data were divided into sub-series with DWT to determine the optimum input variable, and these sub-series were used as training and test input data in the ANFIS prediction model. DWT can comprehensively decompose data on a time scale, offers a wide variety of wavelet families and band levels, and is used more commonly in the literature than EMD and VMD. Many different wavelet families and their variants are used when decomposing time series into sub-series with the wavelet transform, and it is difficult to form a definite opinion about the decomposition performance of the individual families. Compared with most studies in the literature, the analyses here therefore varied the wavelet family, the band level, and the input variables, the aim being to evaluate the optimum input variable more comprehensively and thereby increase the performance of the prediction models. ANFIS was paired with the DWT at this stage because it allows fast analyses. The Haar, symlets (sym3), coiflets (coif2), biorthogonal (bior1.3), reverse biorthogonal (rbio1.3), discrete approximation of Meyer (dmey), Fejer-Korovkin (fk4), and Daubechies (db40) wavelet families were considered, together with wavelet band levels of 3, 4, 5, 6, and 7 and lag times up to five (t, t − 1, t − 2, t − 3, t − 4, and t − 5 input variables). The decomposition with DWT was performed using the codes in the MATLAB 2021b software. Of the input data divided into sub-series, 75% (1960–2005) was used as training data and 25% (2006–2020) as test data. In the ANFIS prediction model, Gaussian (gaussmf) and triangular (trimf) functions were used as membership functions of the network, and constant and linear functions were used as output functions. Predictions were obtained with between 1 and 10 iterations, varying the number of membership functions between two and three in four different combinations. The input variable that gave the best results in the DWT-ANFIS hybrid prediction model was then used in all hybrid versions; the wavelet family and wavelet band levels to be used in the hybrid versions with DWT were also determined at this stage.
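As an illustration of the input-output arrangement described above, a minimal NumPy sketch of building lagged inputs (t − 3, …, t) and the t + 1 target from an SPI series is given below; the lag depth shown matches the optimum found later, but is otherwise an assumption.

```python
import numpy as np

def make_lagged(series, n_lags=3):
    """Build inputs [t - n_lags .. t] and the t + 1 target from a 1-D series."""
    X = np.column_stack([series[i : len(series) - n_lags - 1 + i]
                         for i in range(n_lags + 1)])
    y = series[n_lags + 1 :]
    return X, y
```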

Following the determination of the optimum input variable with the DWT-ANFIS model, the original SPI drought time series were pre-processed using the EMD and VMD techniques. In both methods, the original drought series were divided into four sets of sub-series (2D, 3D, 4D, 5D) using the codes in the MATLAB 2021b software, and the intrinsic mode function (IMF) and residual components specific to these techniques were obtained for each set. As with DWT, these components were used as input data in the ANFIS prediction model in order to determine the optimum EMD and VMD band levels.

Sub-series obtained with DWT, EMD, and VMD were fed as training and test input data into five different ML methods (ANFIS, ANN, SVMR, KNN, GPR). In addition, stand-alone prediction models were created using these ML methods without pre-processing the original drought data. A total of twenty prediction models, fifteen hybrid and five stand-alone, were thus created, and the results of all these models were compared with each other.
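To show how these pieces fit together, a sketch of one hybrid configuration (DWT sub-series, lagged as above, feeding a single GPR learner with a 75/25 split) is given below; the exact wiring, kernel, and settings used in this study may differ, so this is an assumed minimal pipeline.

```python
import numpy as np
import pywt
from sklearn.gaussian_process import GaussianProcessRegressor

def hybrid_inputs(series, wavelet="dmey", level=3, n_lags=3):
    """Decompose the series with DWT and stack lagged sub-series as features."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    subs = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        subs.append(pywt.waverec(kept, wavelet)[: len(series)])
    X = np.column_stack([s[i : len(series) - n_lags - 1 + i]
                         for s in subs for i in range(n_lags + 1)])
    y = series[n_lags + 1 :]                 # t + 1 target
    return X, y

spi = np.random.randn(732)                   # placeholder for an SPI series
X, y = hybrid_inputs(spi)
split = int(0.75 * len(y))                   # 75% training, 25% testing
model = GaussianProcessRegressor().fit(X[:split], y[:split])
pred = model.predict(X[split:])
```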

In this study, a multi-layer ANN prediction model was used. The Levenberg–Marquardt method, which has been widely used in recent years and is based on a standard numerical optimization technique, was used as the learning algorithm. Tansig and logsig were used as input functions, and tansig, logsig, and purelin functions were used as output functions. Predictions were obtained with one to ten neurons and with iterations varying between one and two hundred in six different combinations.

In the SVMR method, a prediction model was developed using kernel functions, including Gaussian, polynomial, RBF, and linear kernels. The gamma (γ), epsilon (ε), and penalty (C) parameters were obtained by trial and error. Predictions were obtained with standardization applied and with iterations varying between one and fifty.

In the GPR method, a prediction model was created using both kernel functions and basis functions to increase model performance. The ardmatern32, ardexponential, ardrationalquadratic, ardmatern52, ardsquaredexponential, matern32, exponential, matern52, squaredexponential, and rationalquadratic functions were used as kernel functions, and constant, linear, and pure-quadratic functions as basis functions. In the GPR model, the beta (β) and sigma (σ) parameters were calculated using the fully independent conditional approximation method. Predictions were obtained with standardization applied and with iterations varying between one and one hundred.

In the KNN method, the K-nearest neighbors algorithm was used as the learning algorithm, and the prediction model was developed using K-d tree and exhaustive search functions. Euclidean, cityblock, Minkowski, and Chebyshev distances were used with the K-d tree search, and Spearman, Jaccard, Mahalanobis, correlation, Hamming, cosine, and seuclidean distances with the exhaustive search. The coefficient k, which defines the neighborhood relationship, was determined by trial and error, and predictions were obtained with standardization applied.

The accuracy of the prediction models used in the study was evaluated according to MSE, RMSE, MAE, NSE, OI, and R2 performance criteria. Besides these criteria, Taylor diagrams, in which the statistical values of standard deviation, correlation, and centered root mean square difference (RMSD) can be evaluated simultaneously, were used, and the model results were compared. Radar charts showing all performance criteria together were used to more easily compare the errors of five hybrid prediction models that yielded the best results. Scatter plots of the stand-alone models and the hybrid methods that yielded the best results were also created. In addition, boxplot charts that provide statistical comparison of the errors of the models were also used.

Results and discussion

Before being used in the drought analysis, the raw data supplied by the Turkish State Meteorological Service (MGM) were subjected to homogeneity, independence, autocorrelation, and stationarity tests. The standard normal homogeneity test, the Mann–Whitney test, an autocorrelation test, the Phillips–Perron stationarity test, the augmented Dickey–Fuller test, and Von Neumann’s independence test were used to check the homogeneity, stationarity, and independence of the precipitation data. Details of all the tests used in this preliminary phase can be found in Haktanir and Citakoglu (2014, 2015), Yagbasan et al. (2020), Citakoglu and Minarecioglu (2021), and Demir (2022a). The test results revealed that the precipitation data of the Sakarya station were homogeneous, independent, and stationary (Table 2).

Table 2 Preliminary test results

SPI drought values for 1-, 3-, and 6-month time scales were calculated separately in the DrinC software using the monthly precipitation values of the station between the years 1960 and 2020. Statistical parameters of the calculated SPI drought time series are given in Table 3.

Table 3 Statistical parameters of SPI drought time series

The means of the 1-, 3-, and 6-month SPI drought values were found to be zero, and the standard deviations were around one. This is because the SPI conforms to the standard normal distribution with a mean of zero and a variance of one. The kurtosis coefficient of the SPI drought values at the 6-month time scale was found to be 2.23; according to this parameter, some deviation from the normal distribution was observed only at this time scale. The lowest SPI drought value was calculated as − 3.92 for the 1-month time scale, and the highest SPI drought value was 3.69 for the 6-month time scale.

Autocorrelation functions were calculated for 20 lag times, and their plots were created. Autocorrelation plots of SPI drought time series for 1-, 3-, and 6-month time scales are presented in Fig. 3.

Fig. 3 Autocorrelation plots of SPI drought indices: a 1-month, b 3-month, c 6-month

There was no significant autocorrelation in the series up to the 20th lag time for any time scale; all the lag times considered were within the confidence interval limits. Only in the SPI autocorrelation plot of the 1-month time scale did the 10th, 12th, and 14th lag times come close to the confidence interval limits. The autocorrelation plots revealed that lags up to the 20th lag time could be used as input variables in the prediction models at all time scales. A comparison of the results of the DWT-ANFIS hybrid model for the 1-month time scale according to the R2 and RMSE performance criteria is provided in Table 4.

Table 4 Comparison of DWT-ANFIS hybrid model results for 1-month time scale

As can be seen in Table 4, according to R2 and RMSE for the 1-month DWT-ANFIS hybrid model, the dmey wavelet family yielded the best prediction results, with the fk4 and db40 wavelet families coming quite close. Within the dmey wavelet family, the 3rd wavelet band level and the 3rd lag time (t, t − 1, t − 2, t − 3 input variables) yielded the best results. These analyses were performed for all of the 1-, 3-, and 6-month time scales. At all time scales, the dmey wavelet family and the 3rd lag time (t, t − 1, t − 2, t − 3 input variables) yielded the best results; the 3rd wavelet band level gave the best predictions for the 1- and 3-month time scales and the 4th wavelet band level for the 6-month time scale. Accordingly, in all hybrid models created with DWT, dmey was used as the wavelet family, with the 3rd band level for the 1- and 3-month time scales and the 4th band level for the 6-month time scale. In addition, t, t − 1, t − 2, and t − 3 were taken as the optimum lag times (input variables) in all hybrid models considered in this study.

Following the identification of the best wavelet family, wavelet band level, and optimum input variable with the DWT-ANFIS hybrid model, the EMD-ANFIS and VMD-ANFIS hybrid prediction models were created. A comparison of the results obtained with these models according to the R2 and RMSE performance criteria is given in Tables 5 and 6.

Table 5 Comparison of EMD-ANFIS hybrid model results
Table 6 Comparison of VMD-ANFIS hybrid model results

As can be seen in Tables 5 and 6, based on the test performances of the models, the 2nd band level yielded the best decomposition performance at all time scales in the EMD-ANFIS hybrid model, while in the VMD-ANFIS hybrid model the 5th band level was best for the 1- and 3-month time scales and the 4th band level for the 6-month time scale. Therefore, IMF and residual components decomposed at the 2nd band level were used in all EMD hybrid prediction models, and those at the 4th and 5th band levels in the VMD hybrid prediction models. These components were used as training and test input data in each ML method.

Following the completion of the pre-processing techniques, predictions were produced with all of the stand-alone and hybrid models. Comparisons of the prediction results for the training and test data according to the MSE, RMSE, MAE, NSE, OI, and R2 performance criteria are given in Tables 7, 8, and 9.

Table 7 Performance comparison of models for 1-month time scale
Table 8 Performance comparison of models for 3-month time scale
Table 9 Performance comparison of models for 6-month time scale

As can be seen in Tables 7, 8, and 9, the prediction performances on the test data were quite low in all stand-alone models not subjected to pre-processing, whereas the prediction success of the hybrid models created by incorporating the DWT, EMD, and VMD pre-processing techniques increased significantly. When the methods were evaluated within themselves, for the 1-month time scale, the VMD-ANFIS model among the ANFIS models, DWT-ANN among the ANN models, VMD-GPR among the GPR models, VMD-SVMR among the SVMR models, and EMD-KNN among the KNN models yielded the best results. For the 3-month time scale, VMD-ANFIS, VMD-ANN, VMD-GPR, VMD-SVMR, and EMD-KNN were the best within their respective methods. For the 6-month time scale, DWT-ANFIS, DWT-ANN, DWT-GPR, EMD-SVMR, and EMD-KNN were the best within their respective methods. Considering the best results of each method across all hybrid models, the ANFIS and GPR methods were most compatible with the VMD pre-processing method for the 1- and 3-month time scales and with the DWT pre-processing method for the 6-month time scale. The SVMR method was most compatible with VMD for the 1- and 3-month time scales and with EMD for the 6-month time scale. Thus, in the ANFIS, GPR, and SVMR methods, the best pre-processing method changed as the time scale grew. The KNN method was most compatible with the EMD pre-processing method at all time scales.

When all the hybrid models in Tables 7, 8, and 9 were examined, the hybrid models based on the KNN method were considerably weaker than the other hybrid models according to the performance criteria. In general, the DWT-KNN and VMD-KNN hybrid models yielded higher MSE, RMSE, and MAE values than the stand-alone models at all time scales, indicating that the hybrid KNN models obtained with the DWT and VMD pre-processing methods did not predict as effectively as the other hybrid models. According to the performance criteria, the EMD-KNN model was a slightly more successful hybrid model.

As can be seen in Tables 7, 8, and 9, the test data yielded better results than the training data in six of the twenty models at each time scale. In most ML prediction studies, however, the performance on training data is generally superior to that on test data. In this study, most of the models in which the test data were superior to the training data were hybrid models using VMD, because the VMD pre-processing technique removes the boundary effect (Zuo et al. 2020; Purohit et al. 2021; Latifoğlu 2022).

As seen in Table 7, although the VMD-ANFIS and VMD-GPR models yielded close results for the 1-month time scale, VMD-GPR was chosen as the best hybrid model because it had the lowest MSE and RMSE and high NSE, OI, and R2 values; its MSE, RMSE, NSE, OI, and R2 values were 0.063, 0.251, 0.9345, 0.9438, and 0.9367, respectively. As can be seen in Table 8, when all hybrid models were examined for the 3-month time scale, the VMD-GPR hybrid model yielded the best result according to all performance criteria, with MSE, RMSE, NSE, OI, MAE, and R2 values of 0.0582, 0.2413, 0.9528, 0.9559, 0.1962, and 0.9565, respectively. As seen in Table 9, the DWT-ANFIS and DWT-ANN models yielded close results for the 6-month time scale, and DWT-ANN was determined to be the best hybrid model since it yielded the lowest MSE, RMSE, and MAE and high NSE, OI, and R2 values.

In Figs. 4, 5, and 6, the calculated and predicted SPI drought values are given for the stand-alone model of each method together with the hybrid model that yielded the best results.

Fig. 4 Scatter plots of calculated and predicted SPI values for 1-month time scale

Fig. 5 Scatter plots of calculated and predicted SPI values for 3-month time scale

Fig. 6 Scatter plots of calculated and predicted SPI values for 6-month time scale

As seen in Figs. 4, 5, and 6, the points of most stand-alone models did not fall along the y = x (45°) line at any time scale; it is clear from these plots that the stand-alone models failed to predict drought. In the hybrid models, on the other hand, the relationship between the calculated and predicted SPI drought values lay close to the y = x (45°) line; in other words, the hybrid prediction models were successful. As can be seen in Fig. 4, all hybrid models except EMD-KNN were quite successful for the 1-month time scale; in the EMD-KNN hybrid model, the y = x (45°) line and the regression line did not coincide. As can be seen in Fig. 5, the y = x (45°) line and the regression line did not coincide in the VMD-SVMR, VMD-ANN, and EMD-KNN hybrid models for the 3-month time scale. As can be seen in Fig. 6, the y = x (45°) line and the regression line did not coincide in the EMD-SVMR and EMD-KNN hybrid models for the 6-month time scale.

Besides the scatter plots, Taylor diagrams bringing all the models together were created to compare the model results on the test data; these are presented in Fig. 7. A Taylor diagram summarizes the statistical errors of all models and their agreement with the reference data (Citakoglu 2021; Demir 2022b). As can be seen in Fig. 7, the VMD-ANFIS, VMD-GPR, and VMD-SVMR hybrid models yielded very close results for the 1-month time scale; the VMD-ANN and VMD-GPR hybrid models yielded quite close results for the 3-month time scale; and the DWT-ANN, DWT-GPR, and VMD-GPR hybrid models yielded similar results for the 6-month time scale. When the Taylor diagrams were examined in more detail, the VMD-GPR hybrid models yielded the best results for the 1- and 3-month time scales, and the DWT-ANN hybrid model for the 6-month time scale.

Fig. 7 Taylor diagrams of SPI test data for a 1-, b 3-, and c 6-month time scales

Radar diagrams with all performance criteria are presented in Fig. 8 to allow easier comparison of the best prediction-model results for each time scale. As can be seen in Fig. 8, the VMD-GPR hybrid models for the 1- and 3-month time scales and the DWT-ANN hybrid model for the 6-month time scale yielded the best prediction results according to all performance criteria.

Fig. 8 Radar diagrams of the models that yielded the best five results in the SPI test data for 1-, 3-, and 6-month time scales

Boxplot diagrams of the stand-alone and hybrid models and error boxplot diagrams of these models are presented in Figs. 9, 10, and 11. As can be seen there, the stand-alone models did not resemble the original SPI-1, SPI-3, and SPI-6 data, and according to the boxplot diagrams the highest error values were produced by the stand-alone models. As can be seen in Fig. 9, both the model and error boxplot diagrams of the VMD-ANFIS, VMD-GPR, and VMD-SVMR models were similar for the 1-month time scale. As can be seen in Fig. 10, those of the VMD-ANN and VMD-GPR hybrid models were similar for the 3-month time scale. As can be seen in Fig. 11, those of the DWT-ANN, DWT-GPR, and VMD-GPR hybrid models were similar for the 6-month time scale. These boxplot results overlap with the Taylor diagram results.

Fig. 9 Boxplot diagrams of all models and errors for 1-month time scale. a Boxplot diagram of all models. b Error boxplot diagram of models

Fig. 10 Boxplot diagrams of all models and errors for 3-month time scale. a Boxplot diagram of all models. b Error boxplot diagram of models

Fig. 11 Boxplot diagrams of all models and errors for 6-month time scale. a Boxplot diagram of all models. b Error boxplot diagram of models

Conclusion

In this study, twenty different prediction models were implemented to predict the future values of the SPI drought indices at the 1-, 3-, and 6-month time scales for the Sakarya Meteorology Station, located in the northwest of Turkey. The SPI drought indices were calculated from the precipitation data of the station for the 1960–2020 period. Fifteen hybrid models were created by incorporating the DWT, EMD, and VMD pre-processing techniques into five different ML methods (ANFIS, ANN, SVMR, KNN, GPR), and five stand-alone models were created without pre-processing. The primary objective of this study was to compare the effects of different pre-processing methods on different ML methods in the prediction of short-term SPI drought indices. The following conclusions were drawn from the present findings and analyses:

  • The dmey was the most appropriate discrete wavelet family in the prediction of SPI drought indices for three different time scales.

  • As for the discrete wavelet transform, the 3rd band level yielded the best results for the 1- and 3-month time scales and the 4th band level for the 6-month time scale; further increasing the band level did not improve the performance of the prediction models.

  • The DWT-ANFIS model revealed the t, t − 1, t − 2, and t − 3 lag times as the optimum input variables for the prediction of the SPI drought indices of the Sakarya Meteorology Station.

  • Models with pre-processing techniques performed better than stand-alone models.

  • Hybrid models with KNN were not as successful as the other hybrid models.

  • As the time scale increased, different pre-processing techniques yielded the best predictions.

  • According to the performance criteria, scatter plots, and Taylor and boxplot diagrams, the VMD pre-processing technique yielded better results than the DWT and EMD pre-processing techniques.

  • The VMD-GPR hybrid model for 1- and 3-month time scales and the DWT-ANN hybrid model for 6-month time scale yielded the best predictions for SPI drought indices.

The present findings are expected to provide important contributions to decision-makers regarding short-term drought incidents in Sakarya, one of the most important industrial provinces of Turkey.