Introduction

Drought remains one of the costliest climatic hazards facing many countries. In general terms, drought is a prolonged climatological period with below-average precipitation, which leads to deficits in water resources and to major problems for agriculture, the economy, ecosystems and human health (Wu et al. 2001; Quiring and Papakryiakou 2003; Belayneh et al. 2016; Wilhite and Buchanan-Smith 2005; Liu et al. 2021). Drought is often described as a creeping phenomenon: unlike other natural disasters such as floods, landslides and earthquakes, its damage accumulates gradually (Rossi 2000; Wilhite et al. 2007). Situated in the Middle East, Iran is generally classified as an arid to semi-arid country, with roughly two-thirds of its area being desert.

The definition of drought varies from one region to another, and its classification is region-specific. The American Meteorological Society (1997) classified drought into four main categories: meteorological (precipitation deficit), hydrological (streamflow and reservoir storage), agricultural (soil moisture deficit) and socio-economic. Drought is monitored using indices that detect drought conditions and trends based on precipitation deviations from normal levels (Paulo et al. 2012), soil moisture deficits and reductions in surface and groundwater flows (Zargar et al. 2011). The Palmer Drought Severity Index (Palmer 1968), the Standardized Precipitation Index (SPI) (McKee et al. 1993), the Standardized Precipitation Evapotranspiration Index (Vicente-Serrano et al. 2010), the Rainfall Anomaly Index (van Rooy 1965), the Crop Moisture Index (Palmer 1968) and the Surface Water Supply Index (Doesken and Garen 1991) are important examples of drought indices. SPI is among the best-known drought indicators (McKee et al. 1993) and has been adopted as a standard meteorological drought index at the global scale (Wardlow et al. 2012). SPI can be computed on different time scales, including 1, 3, 6, 9, 12, 18, 24 and 48 months, and is therefore used to track historical drought on short-, medium- and long-term scales. SPI also has several practical advantages; for instance, unlike soil moisture, it requires only precipitation data and can be estimated at a high confidence level. These benefits explain the popularity of this index in drought studies (see Hayes 1999; Szalai and Szinell 2000; Bordi and Sutera 2001; Lloyd-Hughes and Saunders 2002; Vicente-Serrano et al. 2004; Tsakiris et al. 2007; Shukla and Wood 2008; Raziei et al. 2009; Palchaudhuri and Biswas 2013; Portela et al. 2015; Ionita et al. 2016; Kadam et al. 2021).

Drought monitoring helps raise awareness of the onset of drought and identify its past magnitude and severity. More importantly, however, reliable drought forecasting is needed to gain insight into possible future droughts in a region and is essential for managing and mitigating their adverse effects. In recent years, researchers have shown great interest in regression models (Leilah and Al-Khateeb 2005), Auto-Regressive Integrated Moving Average (ARIMA) time series models (Han et al. 2010), probabilistic and analytical auto-covariance matrix models (Cancelliere et al. 2007), Artificial Neural Networks (ANN) (Morid et al. 2007; Barua et al. 2012), the Adaptive Neuro-Fuzzy Inference System (ANFIS) (Mokhtarzad et al. 2017; Kisi et al. 2019), extreme learning machines (Deo and Sahin 2015) and Support Vector Machines (SVM) (Khan et al. 2020) for predicting drought.

Mishra and Desai (2005) combined ANN and linear stochastic models based on SPI series in the Kansabati River Basin, India, and the resulting model predicted drought with high accuracy. Bacanli et al. (2009) investigated the efficiency of Feed Forward Neural Network (FFNN) and ANFIS models in predicting drought based on SPI series in Central Anatolia, Turkey, and found that the ANFIS model outperformed the FFNN model. Shirmohammadi et al. (2013) investigated the efficiency of ANN, ANFIS, Wavelet-ANFIS and Wavelet-ANN models in predicting drought based on SPI series in Azerbaijan, Iran. Their results demonstrated that all the aforementioned models could predict SPI; however, the hybrid Wavelet-ANFIS model outperformed the others. Mokhtarzad et al. (2017) compared the efficiencies of ANN, ANFIS and SVM in predicting drought based on SPI series using data from a meteorological station in Bojnourd, Iran, and reported that SVM outperformed ANN and ANFIS in terms of accuracy.

Kisi et al. (2019) investigated the precision of four evolutionary neuro-fuzzy methods, namely Adaptive Neuro-Fuzzy Inference System with Particle Swarm Optimization (ANFIS-PSO), ANFIS with Genetic Algorithm (ANFIS-GA), ANFIS with Ant Colony Algorithm (ANFIS-ACO) and ANFIS with Butterfly Optimization Algorithm (ANFIS-BOA). Then, they made a comparison between the precision of these methods and that of the classical ANFIS method in predicting SPI time series at Abbasabad and Biarjmand Stations, Semnan, Iran. According to the results obtained from Ebrahim-Abad Station, the ANFIS-PSO method exhibited the best prediction precision on different SPI time scales.

Iran, an arid to semi-arid country, receives a mean annual rainfall of about 250 mm, roughly a quarter of the world average (Mahdavi 2010). In recent years, Iran has experienced persistent and severe droughts that caused serious shortages of surface water and groundwater, with subsequent adverse environmental and agricultural effects. This has prompted further investigation and characterization of drought in different regions of Iran, as reflected in studies by Raziei et al. (2009), Zarch et al. (2011), Moradi et al. (2011), Mirabbasi et al. (2013), Saghafian and Mehdikhani (2014), Raziei et al. (2015) and Rezaei et al. (2016). Western Iran is a vital area holding a significant share of the country's water supply, as it is the source of three major rivers: the Karkheh, Dez and Karoon. The Karkheh basin is one of the regions frequently affected by drought (Byzedi et al. 2012; Ashraf Vaghefi et al. 2014; Kamali et al. 2015; Zamani et al. 2015; Kamali et al. 2017). The basin is shared by several Iranian provinces, including Hamedan, Kermanshah, Kurdistan, Ilam, Lorestan and Khuzestan. The Karkheh River, after the Karoon and Dez, is the third largest river in Iran and supplies water to many parts of the country, which is why droughts in this basin create major challenges for the agricultural and economic sectors of these provinces. Recently, tree-based models have attracted considerable attention worldwide, and several studies have reported that they are more effective and have higher predictive power than ANFIS, SVM and ANN models. Hussain and Khan (2020) found that a Random Forest (RF) model outperformed both ANN and SVM models in forecasting monthly river flow, and Shamshirband et al. (2020) reported that M5 model trees predicted the standardized streamflow index better than SVM and Gene Expression Programming.

In this regard, the present study pursues four main objectives: (a) to forecast drought at the Kermanshah synoptic station on time scales of 3, 6, 12 and 48 months using the SPI index, a standalone REPT model and its new integrations with Bagging (BA-REPT), Dagging (DA-REPT), Additive Regression (AR-REPT) and Random Committee (RC-REPT); (b) to compare the predictive power of meteorological variables against lag-time SPI (i.e., SPI(t−1), SPI(t−2) and so on) as inputs, which have so far been investigated only separately; (c) to determine which input scenario performs best in drought forecasting; and (d) to develop a predictive model that forecasts future drought from past-to-current data. The findings of this study can assist decision-makers and the Natural Resources Bureau in better managing the drought risk threatening the basin.

Study area

This research focuses on the Karkheh Watershed (Fig. 1), which covers 50,768 km² in the central and southwestern Zagros Mountains, between latitudes 30° 08ʹ and 35° 04ʹ N and longitudes 46° 06ʹ and 49° 10ʹ E. Nearly 55.5% of the basin lies in mountainous areas and the rest in plains and foothills. The Karkheh basin has a Mediterranean-type climate. Mean annual rainfall ranges from 150 mm in the south to more than 1000 mm in the northern and eastern parts of the basin, and mean annual air temperature ranges from less than 5 °C over the high mountains to 25 °C in the southern areas.

Fig. 1
figure 1

Kermanshah synoptic station location in Kermanshah province, Iran

Methodology

Dataset

A 30-year set of monthly records including maximum relative humidity (RHMax), minimum relative humidity (RHMin), maximum temperature (TMax), minimum temperature (TMin) and rainfall was compiled for the Kermanshah synoptic station. SPI was considered the target (output) variable, while the other variables and lag-time SPI (i.e., SPI(t−1), SPI(t−2), etc.) were used as inputs for predicting the target. The input and output data were divided into two subsets: 70% (January 1988 to December 2008) for model development (training) and 30% (January 2009 to December 2018) for model validation (testing). The 70:30 ratio is the most widely used split in ML modeling (Khosravi et al. 2018a, b; Khosravi et al. 2019; Venegas-Quiñones et al. 2020; Khosravi et al. 2021a, b, c; Kargar et al. 2021; Panahi et al. 2021). Table 1 presents the descriptive statistics of the training and testing datasets.

Table 1 Data characteristics of the training and testing datasets
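As an illustration of the data partitioning described above, the following Python sketch splits a monthly record chronologically into 70% training and 30% testing subsets. The file name and column names are assumptions for illustration only; the actual modeling in this study was carried out in WEKA and MATLAB.

```python
import pandas as pd

# Hypothetical file and column names; the real record is the Kermanshah station data.
df = pd.read_csv("kermanshah_monthly.csv", parse_dates=["date"])
df = df.sort_values("date").reset_index(drop=True)

split = int(len(df) * 0.7)              # first 70% of the record (1988-2008) for training
train, test = df.iloc[:split], df.iloc[split:]

features = ["RHMax", "RHMin", "TMax", "TMin", "Rainfall", "SPI_lag1"]  # assumed names
X_train, y_train = train[features], train["SPI"]
X_test, y_test = test[features], test["SPI"]
```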

In addition to the SPI, the input data include the monthly rainfall recorded at the Kermanshah synoptic station; specifically, the monthly precipitation dataset covers the 30-year record (30 × 12 = 360 months). The set of averaging periods, n = 3, 6, 12 and 48 months, represents typical time scales for precipitation deficits. For each month, a new value is determined from the previous n months. Each accumulated series is fitted to the Gamma function to define the relationship between probability and precipitation. The probability of each observed precipitation value is calculated and then transformed into the corresponding deviate of a normal distribution with zero mean and unit standard deviation; this value is the SPI for that precipitation data point.

Given that the available data are homogeneous, time series on the 3-, 6-, 12- and 48-month scales are constructed, and each series is fitted to the Gamma distribution. The probability density function is calculated as follows (Kisi et al. 2019):

$$g(x) = \frac{1}{{\beta^{\alpha } \Gamma (\alpha )}}x^{\alpha - 1} e^{ - x/\beta } \;\;{\text{for}}\;\,x > 0,$$
(1)

where \(\alpha\) and \(\beta\) represent the shape and scale parameters, respectively, and \(\Gamma (\alpha )\) is the Gamma function defined as follows (Kisi et al. 2019):

$$\Gamma (\alpha ) = \int\limits_{0}^{\infty } {x^{\alpha - 1} e^{ - x} {\text{d}}x},$$
(2)

where the parameters \(\alpha\) and \(\beta\) of the Gamma density function can be determined for each station, each time scale and every month of the year. McKee et al. (1993) estimated \(\alpha\) and \(\beta\) using the maximum likelihood method (Kisi et al. 2019):

$$\alpha = \frac{1}{4A}\left( {1 + \sqrt {1 + \frac{4A}{3}} } \right)$$
(3)
$$A = {\text{Ln}}(\overline{X}) - \frac{{\sum {{\text{Ln}}(X)} }}{n}$$
(4)
$$\beta = \frac{{\overline{X}}}{\alpha },$$
(5)

where n is the number of rainfall data samples (i.e., observations) and \(\overline{X}\) is the mean rainfall in the specific period. These parameters are then used to calculate the cumulative rainfall probability on a given time scale. With the substitution \(t = X/\beta\), the cumulative probability in Eq. 6 reduces to the incomplete Gamma function:

$$G(X) = \int\limits_{0}^{X} {g(x)\,{\text{d}}x}.$$
(6)

Since the Gamma function is undefined for X = 0 and a rainfall series may contain zero values, the cumulative probability in such cases is calculated as follows:

$$H\left( X \right) = q + \left( {1 - q} \right)G\left( X \right),$$
(7)

where q is the probability of zero rainfall (q = m/n) and m is the number of zero values in the rainfall time series. SPI is then computed through the following equations (Kisi et al. 2019):

$${\text{SPI}} = - \left[ {t - \frac{{c_{0} + c_{1} t + c_{2} t^{2} }}{{1 + d_{1} t + d_{2} t^{2} + d_{3} t^{3} }}} \right], \, 0 < H(X) \le 0.5$$
(8)
$${\text{SPI}} = + \left[ {t - \frac{{c_{0} + c_{1} t + c_{2} t^{2} }}{{1 + d_{1} t + d_{2} t^{2} + d_{3} t^{3} }}} \right], \, 0.5 < H(X) \le 1.0.$$
(9)

The intermediate variable t is calculated as follows (Kisi et al. 2019):

$$t = \sqrt {\ln \left( {\frac{1}{{H(X)^{2} }}} \right)} , \, 0 < H(X) \le 0.5$$
(10)
$$t = \sqrt {\ln \left( {\frac{1}{{(1 - H(X))^{2} }}} \right)} , \, 0.5 < H(X) \le 1.0.$$
(11)

Constant coefficient values in these equations are found in Table 2.

Table 2 Constant coefficient values in SPI equations (Kisi et al. 2019; McKee et al. 1993)
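For illustration, the SPI calculation outlined in Eqs. (1)–(11) can be sketched in Python with SciPy. This is a simplified version that fits a single Gamma distribution to the whole accumulated series, whereas a full implementation fits the parameters separately for each calendar month; the inverse-normal transform used below is equivalent to the polynomial approximation of Eqs. (8)–(11). All names are illustrative, and the synthetic rainfall stands in for the Kermanshah record.

```python
import numpy as np
from scipy import stats

def spi(monthly_rain, scale=3):
    """Minimal SPI sketch: n-month accumulation, Gamma fit, normal transform."""
    rain = np.asarray(monthly_rain, dtype=float)
    # n-month moving accumulation (averaging periods n = 3, 6, 12 or 48 months)
    acc = np.convolve(rain, np.ones(scale), mode="valid")

    nonzero = acc[acc > 0]
    # Maximum-likelihood Gamma fit: alpha = shape, beta = scale, location fixed at zero
    alpha, _, beta = stats.gamma.fit(nonzero, floc=0)

    q = (acc == 0).mean()                  # probability of zero accumulation, q = m/n
    G = stats.gamma.cdf(acc, alpha, loc=0, scale=beta)
    H = q + (1 - q) * G                    # Eq. (7)
    H = np.clip(H, 1e-6, 1 - 1e-6)         # guard against 0 and 1 before the inverse transform
    return stats.norm.ppf(H)               # equivalent to the approximation in Eqs. (8)-(11)

# Example with synthetic rainfall (mm); the real input is the Kermanshah monthly record.
rng = np.random.default_rng(0)
spi3 = spi(rng.gamma(2.0, 20.0, size=360), scale=3)
```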

Constructing input combinations

The importance of identifying the most effective input variables for the predictive power of the models cannot be overstated. First, the correlation coefficient (r) between each input variable and SPI on each time scale was computed, and these values were used as the basis for constructing different input combinations. The variable with the highest r value formed the first input combination; the variable with the second highest r was then added to form input combination No. 2, and this process continued until the variable with the lowest r value was added to form input combination No. 9. To identify the most effective combination, the models were applied to all inputs (Table 3), and the combination yielding the lowest Root Mean Square Error (RMSE) was selected as the best input scenario. This approach is a popular way of creating and examining input scenarios in ML modeling (Monteiro Junior et al. 2019; Nhu et al. 2020; Salih et al. 2020; Meshram et al. 2021).

Table 3 Various input combinations for SPI on different time scales
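The procedure described above can be sketched as follows: candidate inputs are ranked by their absolute correlation with SPI, nested combinations are built from the strongest variable downward, and each combination is scored by RMSE. The function and variable names are illustrative, and a scikit-learn regression tree stands in for the REPT-based models actually evaluated in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def nested_input_combinations(X_train, y_train, X_test, y_test):
    """Rank inputs by |r| with the target and score nested combinations by RMSE."""
    r = X_train.apply(lambda col: abs(np.corrcoef(col, y_train)[0, 1]))
    order = r.sort_values(ascending=False).index.tolist()

    results = {}
    for k in range(1, len(order) + 1):
        cols = order[:k]                      # input No. k: the k most correlated variables
        model = DecisionTreeRegressor(random_state=0).fit(X_train[cols], y_train)
        pred = model.predict(X_test[cols])
        results[f"input_{k}"] = mean_squared_error(y_test, pred) ** 0.5
    return results   # the combination with the lowest RMSE is the chosen scenario
```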

Determining optimum values

Another step that strongly affects predictive power is determining the optimum values of the model parameters. All the models except ANFIS and SVM (implemented in MATLAB) were developed in the Waikato Environment for Knowledge Analysis (WEKA 3.9) software. The optimum parameter values were obtained by trial and error: the models were first applied with default values, then larger and smaller values were tested and the models were reapplied, and this procedure was repeated until the optimum values were found (Khosravi et al. 2021a, b, c). As in the previous section, the RMSE criterion was used to identify the optimum values.
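The trial-and-error procedure can be illustrated as follows: starting from the defaults, values above and below each default are tested and the setting with the lowest RMSE on the validation data is retained. The grid and the base learner below are assumptions for illustration, not the parameters actually tuned in WEKA.

```python
from itertools import product
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def tune_by_trial_and_error(X_train, y_train, X_test, y_test):
    """Test values around the defaults and keep the setting with the lowest RMSE."""
    best = (float("inf"), None)
    for max_depth, min_leaf in product([None, 5, 10, 20], [1, 2, 5, 10]):
        model = DecisionTreeRegressor(max_depth=max_depth,
                                      min_samples_leaf=min_leaf,
                                      random_state=0).fit(X_train, y_train)
        rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
        if rmse < best[0]:
            best = (rmse, {"max_depth": max_depth, "min_samples_leaf": min_leaf})
    return best   # (lowest RMSE, corresponding parameter values)
```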

Descriptions of the models

Reduced error pruning tree (REPT)

REPT is a radical simplification of the Decision Tree (DT) based on "if–then" rules; it links a set of predictors (xi) to a predicted variable (y) and searches for suitable parameters among a large number of trees (Wang et al. 2020). The cumulative results of several iterations yield several trees, and the Mean Square Error (MSE) is used in Reduced Error Pruning (REP) to prune the unsuitable tree initially produced by the regression tree (Lalitha et al. 2020). The splitting criteria adopted by the REP Tree are the information gain ratio and minimization of the error variance (Saha et al. 2020). The major benefit of the REP Tree is its ability to reduce the complexity of the DT, which is widely regarded as one of the most significant deficiencies of DT approaches; in addition, the error due to variance is considerably reduced (Abdar et al. 2020; Murwendo et al. 2020). Because the DT contains a large number of nodes, the error at each node is computed and compared for each class, the total aggregate error is recorded, and the branches with the largest errors are pruned; this process is referred to as "divide and conquer" (Li et al. 2020).
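WEKA's REPTree is not directly available in Python, but the core idea of reduced-error pruning, growing a full tree and then pruning it back according to the error on held-out data, can be approximated with scikit-learn's cost-complexity pruning, as in the hedged sketch below. This is an analogue under stated assumptions, not the WEKA implementation used in this study.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def pruned_tree(X, y, seed=0):
    """Grow a full regression tree, then pick the pruning level that minimizes
    the error on a held-out fold (a stand-in for reduced-error pruning)."""
    X_grow, X_prune, y_grow, y_prune = train_test_split(
        X, y, test_size=0.25, shuffle=False)          # hold-out fold used only for pruning

    path = DecisionTreeRegressor(random_state=seed).cost_complexity_pruning_path(X_grow, y_grow)
    best_rmse, best_tree = float("inf"), None
    for alpha in path.ccp_alphas:
        tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=seed).fit(X_grow, y_grow)
        rmse = mean_squared_error(y_prune, tree.predict(X_prune)) ** 0.5
        if rmse < best_rmse:
            best_rmse, best_tree = rmse, tree
    return best_tree
```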

Bootstrap aggregation (bagging)

To enhance the accuracy of individual decision tree models, ensembles of models have been proposed, which greatly improve the accuracy, precision and robustness of decision trees. Bootstrap aggregating, also called Bagging (BA), is one of the best-known such algorithms; it generates multiple models and aggregates them into a single coherent predictor (Sánchez-Medina et al. 2020). BA is an ML ensemble meta-algorithm designed to enhance the accuracy of ML models in both classification and regression; it also reduces variance and helps avoid overfitting. A major contribution of ensemble methods is their ability to decrease the variance of regression and classification errors (Chen et al. 2020) and to overcome the overfitting problem encountered with a single tree (Lee et al. 2020). BA draws each training pattern through Bootstrap sampling, so that n training samples yield n different sets of Out-of-Bag instances (Liu and Chen 2020). A BA model is built in three steps: first, the training dataset is randomly re-sampled to provide a set of training subsets of the same size; second, an individual model is designed and trained on each subset; and finally, a coherent aggregated predictor is constructed by averaging (Chen et al. 2020). BA improves unstable procedures such as ANN, classification and regression trees and subset selection in Linear Regression, and it can improve preimage learning; however, it can mildly degrade the performance of stable methods such as K-nearest neighbors.
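The three bagging steps described above (bootstrap resampling, training one model per resample, averaging) map directly onto scikit-learn's BaggingRegressor, whose default base learner is a decision tree; the sketch below is therefore only an analogue of BA-REPT, with the number of estimators chosen arbitrarily for illustration.

```python
from sklearn.ensemble import BaggingRegressor

# The default base estimator is a decision tree regressor, standing in for REPT here.
bagged = BaggingRegressor(
    n_estimators=50,      # number of bootstrap resamples (illustrative value)
    bootstrap=True,       # sample with replacement; unused rows form the out-of-bag set
    random_state=0,
)
# Usage (with a train/test split as described in the Dataset section):
# bagged.fit(X_train, y_train)
# spi_pred = bagged.predict(X_test)
```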

Disjoint aggregating (Dagging)

Disjoint aggregating (DA, or Dagging) is a meta-algorithm first introduced by Ting and Witten (1997). This meta-classifier creates several disjoint, stratified folds of the data and feeds each chunk to a copy of the supplied base classifier. All the generated base classifiers are combined in a Vote meta-classifier, so predictions are obtained by averaging (Chen et al. 2022; Zhao et al. 2020). DA is suitable for base classifiers whose time behavior is quadratic or worse in the number of training instances. Given a training dataset of N patterns, DA builds M disjoint subsets so that no pattern is shared between subsets; an individual model is built for each subset, and the final prediction is obtained by a voting (averaging) strategy (Ting and Witten 1997).
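A minimal sketch of the Dagging idea is given below: the training data are split into disjoint folds, a copy of the base learner is fitted to each fold, and the fold-level predictions are averaged. The sketch omits the stratification step performed by the WEKA implementation, and a scikit-learn tree stands in for the base learner; the class name and fold count are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class DaggingRegressor:
    """Disjoint aggregating sketch: one base learner per disjoint fold, averaged."""

    def __init__(self, n_folds=10):
        self.n_folds = n_folds

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.models_ = []
        for idx in np.array_split(np.arange(len(y)), self.n_folds):   # disjoint chunks
            self.models_.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        return np.mean([m.predict(np.asarray(X)) for m in self.models_], axis=0)
```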

Additive regression (AR)

The AR model is a type of nonparametric regression related to the Alternating Conditional Expectations (ACE) algorithm first introduced by Friedman and Stuetzle (1981). ACE and AR are flexible because they suffer less from the curse of dimensionality and can therefore be used to predict much more complex phenomena. AR is a general (potentially nonlinear) regression model that includes linear regression as a special case. Suppose that the response variable \({Y}_{i} (i=1, 2, \dots ,n)\) is modeled through unrestricted functions \({f}_{j} (j=1, 2, \dots ,p)\) of the input variables \({X}_{i1}, {X}_{i2},\dots ,{X}_{ip}\), respectively. The model is written as (Xu and Lin 2017):

$$Y_{i} = \mathop \sum \limits_{j = 1}^{p} f_{j} \left( {X_{ij} } \right) + \mu_{i} , \, \mu_{i} \sim iid\left( {0,\sigma^{2} } \right),$$
(12)

where \({f}_{j}({X}_{ij})\) is the nonparametric function that fits the data. The random error term \(({\mu }_{i})\) has zero mean and variance of \({\sigma }^{2}\).
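As a rough illustration of additive regression as commonly implemented in meta-learners, the sketch below fits the base learner repeatedly to the residuals of the current ensemble and adds a shrunken copy at each stage; the number of stages, shrinkage value and base learner are arbitrary choices, not the settings used in this study. (The backfitting form of Eq. (12), which fits one smooth function per predictor, is a related but distinct formulation.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class AdditiveRegression:
    """Forward-stagewise additive regression sketch: each stage fits the residuals."""

    def __init__(self, n_stages=20, shrinkage=0.5):
        self.n_stages, self.shrinkage = n_stages, shrinkage

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y, dtype=float)
        self.mean_ = y.mean()
        residual = y - self.mean_
        self.stages_ = []
        for _ in range(self.n_stages):
            tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
            residual -= self.shrinkage * tree.predict(X)   # shrink toward the remaining error
            self.stages_.append(tree)
        return self

    def predict(self, X):
        X = np.asarray(X)
        pred = np.full(len(X), self.mean_)
        for tree in self.stages_:
            pred += self.shrinkage * tree.predict(X)
        return pred
```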

Random committee (RC)

Hybrid models produced by combining two or more artificial intelligence techniques are generally called Committee Machines (CM). The major advantage of CMs is their ability to form a robust model that compensates for the deficiencies of the individual models (Ghiasi-Freez et al. 2012). RC is a CM learning approach for both classification and regression problems and is considered one of the promising ensemble models (Niranjan et al. 2017). In RC, an ensemble of randomizable base regressors or classifiers is developed, in which each member is built on identical data but uses a different random number seed; the final response is obtained by averaging the predictions of the individual models (Witten and Frank 2005; Lira et al. 2007).
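The Random Committee scheme can be sketched as follows: every member is trained on identical data with a unique random seed, and the committee output is the average of the member predictions. A randomized scikit-learn tree (splitter='random') stands in here for the randomizable base learner; the class name and committee size are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class RandomCommittee:
    """Members see identical data but use different seeds; predictions are averaged."""

    def __init__(self, n_members=10):
        self.n_members = n_members

    def fit(self, X, y):
        self.members_ = [
            DecisionTreeRegressor(splitter="random", random_state=seed).fit(X, y)
            for seed in range(self.n_members)          # identical data, unique seeds
        ]
        return self

    def predict(self, X):
        return np.mean([m.predict(X) for m in self.members_], axis=0)
```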

Model performance evaluation

The present study uses the visual method of scatter plots together with several quantitative metrics: RMSE, Mean Absolute Error (MAE), Nash–Sutcliffe Efficiency (NSE), Percentage of BIAS (PBIAS), Coefficient of Persistence (CP) and the ratio of RMSE to the standard deviation of the observations (RSR). These metrics are computed as follows:

$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {({\text{SPI}}_{e} - {\text{SPI}}_{o} )^{2} } }$$
(13)
$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {{\text{SPI}}_{e} - {\text{SPI}}_{o} } \right|}$$
(14)
$${\text{NSE}} = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{e} - {\text{SPI}}_{o} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{o} - \overline{{{\text{SPI}}}}_{o} )^{2} } }}$$
(15)
$${\text{PBIAS}} = \left( {\frac{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{o} - {\text{SPI}}_{e} )} }}{{\sum\nolimits_{i = 1}^{n} {{\text{SPI}}_{o} } }}} \right) \times 100$$
(16)
$${\text{RSR}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{e} - {\text{SPI}}_{o} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{o} - \overline{{{\text{SPI}}}}_{o} )^{2} } }}}$$
(17)
$${\text{CP}} = 1 - \frac{{\sum {\left( {{\text{SPI}}_{o(i)} - {\text{SPI}}_{e(i)} } \right)^{2} } }}{{\sum {\left( {{\text{SPI}}_{o(i)} - {\text{SPI}}_{o(i - j)} } \right)^{2} } }}$$
(18)
$$R^{2} = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{o} - \overline{{{\text{SPI}}}}_{o} )({\text{SPI}}_{e} - \overline{{{\text{SPI}}}}_{e} )} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{o} - \overline{{{\text{SPI}}}}_{o} )^{2} } } \sqrt {\sum\nolimits_{i = 1}^{n} {({\text{SPI}}_{e} - \overline{{{\text{SPI}}}}_{e} )^{2} } } }}} \right]^{2},$$
(19)

where SPIo and SPIe are the observed and predicted values, respectively, \(\overline{{{\text{SPI}}}}_{e}\) and \(\overline{{{\text{SPI}}}}_{o}\) are the mean predicted and observed values, j is the prediction lead and n is the number of data points. The lower the RMSE and MAE, the better the model performance. NSE varies between −∞ and 1; the closer it is to 1, the better the model performance. In addition, the closer PBIAS and RSR are to zero, the higher the prediction power of the model. CP compares the performance of the model with that of a naive model that uses the previous observation as the prediction for the current time step; its maximum value of 1 indicates a perfect fit, whereas values below 0 suggest that simply taking the last observed SPI as the forecast is better than using the tested model. R2 varies between 0 and 1, and a model with R2 = 1 exhibits perfect performance.
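For reference, the evaluation metrics of Eqs. (13)–(18) can be computed directly from the observed and predicted SPI series, as in the sketch below (R2 is obtained here as the squared Pearson correlation). The function name and the default lead j = 1 are illustrative.

```python
import numpy as np

def evaluation_metrics(spi_obs, spi_pred, lead=1):
    """Compute RMSE, MAE, NSE, PBIAS, RSR, CP and R2 for observed vs. predicted SPI."""
    o = np.asarray(spi_obs, dtype=float)
    e = np.asarray(spi_pred, dtype=float)
    err = e - o
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    nse = 1 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2)
    pbias = 100 * np.sum(o - e) / np.sum(o)
    rsr = rmse / np.std(o)
    # Coefficient of persistence: compare against the naive "previous observation" forecast
    cp = 1 - np.sum((o[lead:] - e[lead:]) ** 2) / np.sum((o[lead:] - o[:-lead]) ** 2)
    r2 = np.corrcoef(o, e)[0, 1] ** 2
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "PBIAS": pbias,
            "RSR": rsr, "CP": cp, "R2": r2}
```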

Results and analysis

Effectiveness of input variables

According to the findings of this study, for SPI prediction on the three-month time scale, Tmax had the highest effect on the modeling process (r = 0.61), followed by Tmin (r = 0.60), RHmin (r = 0.51), RHmax (r = 0.43) and rainfall (r = 0.41) (Table 4). Table 5 presents the correlation coefficients between SPI on different time scales and their lags. The results show that as the time scale increases (from 3 to 48 months), the rainfall variable gains importance relative to the other inputs. For example, rainfall had the smallest effect on SPI prediction on the three-month time scale, but the largest effect on the 12-month scale and the second largest on the 48-month scale. Moreover, the effectiveness of each individual variable declined at longer time scales, because prediction becomes more complicated on these scales, especially given the erratic nature of atmospheric variables.

Table 4 Correlation coefficient between input and output variables
Table 5 Correlation coefficients between SPI on different time scales and their lags

Best input scenario

Figure 2 shows the results of the best input scenario selection. The best input scenario clearly differs between models because each model has its own structure. For example, for SPI prediction on the three-month time scale, Input 7 is the most effective scenario for the standalone REP Tree model, Input 9 is the best for the BA-REP Tree model, and Input 8 has the highest effect for the remaining models. For SPI on the 6-month time scale, Input 4 is the optimal scenario with the lowest RMSE, while for the 12- and 48-month time scales, Input 6 has the highest effect on the modeling process.

Fig. 2
figure 2

Selection of the best input scenario

Model evaluation and comparison

Once the most effective input scenario for each model had been determined, the models were developed for each time scale (Fig. 3). According to this figure, on the 3-month time scale, BA-REP Tree exhibits the highest performance (R2 = 0.856), followed by RC-REP Tree (R2 = 0.790), DA-REP Tree (R2 = 0.761), AR-REP Tree (R2 = 0.721) and REP Tree (R2 = 0.690). On the 6-month time scale, BA-REP Tree has the highest performance (R2 = 0.842), followed by DA-REP Tree (R2 = 0.830), RC-REP Tree (R2 = 0.824), AR-REP Tree (R2 = 0.770) and REP Tree (R2 = 0.703). On the 12-month time scale, RC-REP Tree shows the highest performance (R2 = 0.774), followed by BA-REP Tree (R2 = 0.763), DA-REP Tree (R2 = 0.752), AR-REP Tree (R2 = 0.745) and REP Tree (R2 = 0.720). Finally, on the 48-month time scale, BA-REP Tree exhibits the highest performance (R2 = 0.867), followed by DA-REP Tree (R2 = 0.855), AR-REP Tree (R2 = 0.852), RC-REP Tree (R2 = 0.832) and REP Tree (R2 = 0.821). Thus, the best-performing model differs across the 3- to 48-month time scales, reflecting both the data pattern and the model structure.

Fig. 3
figure 3

Scatter plot of the measured and predicted SPI values (blue, green, red and gray colors denote 3, 6, 12 and 48 month time scales)

The R2 metric indicates model performance, yet it has several drawbacks, such as high sensitivity to outliers and extreme values. Another drawback is that it reflects model precision rather than accuracy; for example, a model with a high R2 value may be precise yet perform very poorly (low accuracy). To overcome this problem, several other quantitative metrics were also employed (Table 6).

Table 6 Model evaluation and comparison in the testing period

The results revealed that on the three-month time scale, BA-REP Tree outperformed the other models (RMSE = 0.269, MAE = 0.207, NSE = 0.798 and RSR = 0.449), followed by RC-REP Tree (RMSE = 0.306, MAE = 0.212, NSE = 0.739 and RSR = 0.511), AR-REP Tree (RMSE = 0.348, MAE = 0.246, NSE = 0.662 and RSR = 0.581), DA-REP Tree (RMSE = 0.352, MAE = 0.277, NSE = 0.654 and RSR = 0.588) and REP Tree (RMSE = 0.369, MAE = 0.281, NSE = 0.621 and RSR = 0.616). According to the PBIAS metric, all the models underestimated the SPI values (positive PBIAS).

On the 6-month time scale, DA-REP Tree outperformed the other models (RMSE = 0.387, MAE = 0.313, NSE = 0.759 and RSR = 0.449), followed by AR-REP Tree (RMSE = 0.306, MAE = 0.212, NSE = 0.739 and RSR = 0.511), RC-REP Tree (RMSE = 0.399, MAE = 0.323, NSE = 0.744 and RSR = 0.506), BA-REP Tree (RMSE = 0.399, MAE = 0.332, NSE = 0.743 and RSR = 0.515) and REP Tree (RMSE = 0.468, MAE = 0.364, NSE = 0.649 and RSR = 0.592).

On the time scale of 12 months, RC-REP Tree outperformed the other models (RMSE = 0.313, MAE = 0.205, NSE = 0.745 and RSR = 0.505), followed by BA-REP Tree (RMSE = 0.316, MAE = 0.212, NSE = 0.740 and RSR = 0.510), DA-REP Tree (RMSE = 0.327, MAE = 0.207, NSE = 0.721 and RSR = 0.528), AR-REP Tree (RMSE = 0.332, MAE = 0.209, NSE = 0.714 and RSR = 0.535) and REP Tree (RMSE = 0.353, MAE = 0.243, NSE = 0.675 and RSR = 0.570).

On the 48-month time scale, DA-REP Tree outperformed the other models (RMSE = 0.411, MAE = 0.300, NSE = 0.750 and RSR = 0.500), followed by AR-REP Tree (RMSE = 0.413, MAE = 0.302, NSE = 0.738 and RSR = 0.502), BA-REP Tree (RMSE = 0.453, MAE = 0.346, NSE = 0.697 and RSR = 0.551), RC-REP Tree (RMSE = 0.474, MAE = 0.350, NSE = 0.669 and RSR = 0.575) and REP Tree (RMSE = 0.494, MAE = 0.385, NSE = 0.639 and RSR = 0.601). Furthermore, the PBIAS metric showed that all the developed models underestimated the SPI values (positive PBIAS).

The results demonstrated that all the hybrid algorithms enhanced the performance of the standalone REP Tree algorithm. On the 3-month time scale, the BA, RC, AR and DA models improved the performance of the REP Tree model by about 22.25%, 15.96%, 6.1% and 5.1%, respectively, based on the NSE metric. On the 6-month time scale, the corresponding improvements were about 12.65%, 12.76%, 13.2% and 14.5%; on the 12-month scale, about 8.7%, 9.4%, 5.5% and 6.3%; and on the 48-month scale, 8.3%, 4.5%, 13.4% and 14.8%, respectively. Overall, the BA algorithm provided the largest single enhancement (22.25%).

According to the NSE metric, the standalone REP Tree model exhibited a good performance in all cases (0.65 < NSE < 0.75), whereas in most cases the hybrid models exhibited an excellent performance (0.75 < NSE < 1). Furthermore, comparing the models across the 3-, 6-, 12- and 48-month time scales shows lower model performance at longer time scales; for example, the best models on the 3-, 6-, 12- and 48-month scales (BA-REP Tree, DA-REP Tree, RC-REP Tree and DA-REP Tree) achieved NSE values of 0.798, 0.759, 0.745 and 0.750, respectively.

According to the NSE metric, none of the hybrid models reached an NSE above 0.85, whereas for many other variables NSE values are considerably higher: about 0.980 for suspended sediment load predicted by BA-M5P in a glacierized Andean catchment in Chile (Khosravi et al. 2018b), about 0.94 for bed load transport rate predicted by BA-M5P (Khosravi et al. 2020a), about 0.98 for bridge pier scour depth predicted by the weighted instance handler wrapper (WIHW-Kstar) model (Khosravi et al. 2021a, b, c, d, e), about 0.94 for fluoride concentration predicted by an instance-based K-nearest neighbors model (Khosravi et al. 2020b), about 0.99 for river water salinity predicted by AR-M5P (Melesse et al. 2020), about 0.90 for shear stress distribution predicted by an RF model (Khozani et al. 2020) and about 0.94 for the water quality index predicted by BA-RT (Tien Bui et al. 2020). This indicates that atmosphere-related variables, and drought prediction in particular, are more erratic than other variables and their prediction involves greater uncertainty.

To benchmark the prediction power of the newly developed models, the widely used conventional ML models SVM and ANFIS were also applied (Fig. 4). Based on the results, the models developed in this study have higher predictive power than both the ANFIS and SVM models.

Fig. 4
figure 4

Scatter plot of the measured and predicted SPI values (blue, green, red and gray colors denote 3, 6, 12 and 48 month time scales) for ANFIS and SVM models

One important factor affecting the results is the length of the training and testing datasets. Although there is no standard for what percentage of the data should be used for training and testing, previous studies (Liu et al. 2020; Zhao et al. 2021) showed that the testing dataset should represent approximately 10–40% of the whole dataset, and the 70:30 ratio is often used for training and testing in machine learning models (He et al. 2020, 2021; Chen et al. 2021; Che and Wang 2021; Liang et al. 2022). Another factor with a strong effect on the results is the selection of the best input combination: some input variables have a null or even negative effect on the result and must therefore be identified and removed from the modeling. Although some methods, such as Principal Component Analysis (PCA), derive the best input scenario automatically, Khosravi et al. (2020b) showed that constructing and evaluating different input combinations is more effective than the PCA approach. In the literature, some studies used meteorological variables as inputs to predict SPI, whereas others considered lag-time SPI as the input; the findings of this study confirm that combining SPI lags and meteorological variables as inputs can significantly enhance modeling performance. Because each model has a different structure and each dataset a different pattern, the most accurate model can differ across time scales; overall, the BA, DA and RC models were more effective than the AR model.

Conclusion

This research attempted to predict drought using the SPI as a drought indicator on time scales of 3, 6, 12 and 48 months, using the standalone REP Tree model and its hybrids with the BA, DA, RC and AR algorithms. The overall findings of this study are summarized below.

  1. Meteorological variables alone failed to predict SPI accurately.

  2. SPI with lag time as an input was much more effective than the meteorological variables.

  3. Combining lag-time SPI and meteorological variables as inputs improved the modeling prediction power.

  4. The best input scenario varied across time scales.

  5. The most accurate model did not perform equally well on all time scales.

  6. The standalone model performed well; however, the hybrid models exhibited excellent performance in most cases.

  7. Modeling performance decreased as the time scale increased from 3 to 48 months.

  8. The BA, DA and RC models were all much more effective than the AR model.

  9. As the time scale increased from 3 to 48 months, the contribution of Tmax to SPI prediction decreased and that of rainfall increased (based on the correlation coefficient).

  10. All the newly developed models performed better than the conventional ANFIS and SVM models.