Introduction

Precipitation prediction is important for water resources management because of its highly precarious conditions in different climate conditions. Precipitation changes may alter underlying water resources conditions and increases the need for new water management programs and strategies especially in highly uncertain climate conditions within arid and semiarid regions (Choubin et al. 2017b).

Arid and semiarid regions span approximately one-third of the global land surface (Yatheendradas et al. 2008), one-third of Asia (Lemons 2003) and two-third of Iran. In such regions, rainfall occurs very rapidly and causes severe flash flooding. The resulting extreme events (runoff) are highly localized, heterogeneous and dominated by environmental gradients. These flash floods account for more of all flood-related deaths in Iran causing the highest human and economic impact (e.g., Sharifi et al. 2012).

Perhaps the most effective way to mitigate the risks of flash floods and predict a reliable forecast is through single-event-based hydrologic model. Typically, continuous hydrologic models require a highly reliable precipitation data. Although precipitation data may be available in recent years in semiarid watersheds but continuous prediction approaches require reliable and long-lead data at many climate stations. In Iran, historical precipitation records (daily, monthly and annual) and corresponded climate stations in semiarid watersheds have neither long-lead records and nor dense gauges distribution (data scarcity); hence, this approach yields a strong motivation to predict precipitation time series for watershed hydrology.

To effectively predict precipitation characteristics, forecasting models must accurately capture precipitation intensity, magnitude and storm intermittency. Indeed, requirements for such a highly complex system include an advanced model to accurately capture the highly nonlinear processes occurring in the climate regime of a semiarid region (Choubin et al. 2017b).

Many studies have been addressed the impact of large-scale signals on hydroclimatology studies in different climate regions (e.g., Gamiz-Fortis et al. 2010; Fallah-Ghalhary et al. 2010; Gaughan and Waylen 2012; Berg et al. 2013; Choubin et al. 2016a among others). Recent literature applied nonlinear models such as artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS) for precipitation prediction in different climate conditions. ANNs have been previously used to predict precipitation and proved promising in prediction accuracy and quantifying precipitation values (e.g., Afshin et al. 2011; Azadi and Sepaskhah 2012; Rezaeian-Zadeh et al. 2012; Sigaroodi et al. 2014; Choubin et al. 2016b). For example, El-Shafie et al. (2011) successfully applied the ANFIS model for precipitation prediction in Klang River, Malaysia. Sanikhani and Kisi (2012) employed two different ANFIS techniques (grid partition and sub-clustering) for the estimation of monthly streamflow in the Tigris-Euphrates Basin of Turkey. Choubin et al. (2014, 2017b) studied the relationship between large-scale signals with drought and precipitation in southwestern Iran and the results indicated that the ANFIS model could predict droughts and precipitation values with significant accuracy and precious. Some studies have been performed about the effect of ENSO on precipitation occurrence in Iran; nevertheless, the influences of climate indices on precipitation occurrence still need further investigation in other regions.

Predictions of climatic information in the arid and semiarid region tend to be a challenging task and mostly uncertain due to heterogeneity and nonlinearity of climate system behaviors. Our hypothesis is that precipitation amount is unconditional and has spatiotemporal errors associated with precipitation forecast and is negotiable at the application sites due to a limited areal extent.

The important overall question we attempt to address in this study is as follows: “how reliable is a precipitation forecast using large-scale climate signals for a semiarid precipitation data?” To this end, this research aims to make a better assessment of arid and semiarid precipitation forecast using classification and regression trees (CART) approach by employing various combinations of large-scale climate predictors.

Already, no study has investigated the performance of the CART model for precipitation forecasting. The current research is a new application of the CART model, as a data-mining algorithm, for forecasting the precipitation over a highly complex semiarid climate system using climate signals. This model is robust in case of outliers and auto-correlated input data (Loh 2011), which its splitting algorithm isolates outliers in individual nodes. Also, CART have no assumption on data distribution. Trees in this model are used for description and prediction of patterns (De'ath and Fabricius 2000; Sutton 2005; Choubin et al. 2018). Therefore, the main objectives of current research are to: (1) develop and apply a CART model for predicting the precipitation; and (2) compare CART model results with the two most commonly used approaches including ANFIS and time series models in predicting precipitation. The modeling approach conducted in this study can be helpful and critical for decision makers when precipitation forecast heralds a prolonged drought or flash flood.

Materials and methods

Study area

About two-third of Iran is categorized as arid and semiarid regions with a less annual precipitation amount. In most parts of the country, precipitation usually occurs in winter and spring while summer is hot. Kerman Province (54°20′ to 59°34′ E longitude and 26°29′ to 31°57′ N latitude) was selected as the study area situated in the south central part of Iran (Fig. 1). The study area is 180,725 km2 and the maximum and minimum heights are 4473 and 100 m in the central mountains and northeast of the area, respectively. Based on the statistical center of Iran, the population of Kerman is 2,938,988. So, predicting precipitation is important in this area.

Fig. 1
figure 1

Study area, synoptic stations and extended-De Martonne classification

Dataset

Precipitation

To predict precipitation, we collected monthly precipitation data from six stations, namely Bam, Fathabad, Kahnoj, Kerman, Shahdad and Sirjan (Fig. 1). These stations are located in different subclasses of climates in Kerman Province based on extended-De Martonne classification (Khalili 1997). Figure 1 indicates the study area along with extended-De Martonne classification and selected climate stations for each climate subclass of the province. Precipitation data for this research was provided by Iranian Meteorological Service, in September 2015. Then, seasonal precipitation time series and climate indices were calculated for each station during 1985–2014.

Climate signals

Climate signals are the global coupled ocean–atmosphere phenomenon to describe the state of a climate system. These signals are created in the various parts of the world and affect the global climate. Climate signals data (sea surface temperature; SST, and sea-level pressure; SLP) were obtained from the National Oceanic and Atmospheric Administration (NOAA) site (http://www.esrl.noaa.gov/psd) at the same time as well. Table 1 shows name and coordinates of the points (SST and SLP) used in this study.

Table 1 Name and coordinates of the points (SST and SLP) (Fallah-Ghalhary et al. 2010)

To determine the predictor of precipitation (climate signals), the correlation analysis was conducted between precipitation (predictant) and climate signals (predictor). The relationship between precipitation and previous climate signals was determined in seasonal time scale. Thus, effective climate signals were identified for prediction of precipitation.

Adaptive neuro-fuzzy inference system

ANFIS method was developed by Jang (1993), and the most important point in planning a neuro-fuzzy system is the selection of a suitable inference system. Figure 2 exhibits the structure of ANFIS model in which the nodes of the same layers has the same function.

Fig. 2
figure 2

ANFIS model, where x and y are inputs; A1, A2, B1 and B2 are fuzzy subsets; Π is the fixed nodes of Layer 2; Wi is the weight of a given fuzzy rule, fi; N is the fixed nodes of Layer 3; \( \overline{{{\text{w}}_{\text{i}} }} \) is the normalized weight; fi is the fuzzy rule; and f is the final output of the ANFIS model (Kurtulus and Razack 2010)

The functioning of ANFIS can be summarized as:

Layer 1:

Each node in this layer generates different degrees of membership of an input variable.

$$ OP_{i}^{1} = \mu_{{A_{i} }} (x) \leftarrow i = 1,2 $$
(1)
$$ OP_{i}^{1} = \mu_{{B_{i} - 2}} (y) \leftarrow i = 3,4 $$
(2)

where x (or y) is the input entered into the selected node and A i or (B i  − 2) is the fuzzy set associated with this node. For example, Gaussian membership function can be calculated as follows:

$$ OP_{i}^{1} = \mu_{{A_{i} }} (x) = \frac{1}{{1 + \left( {\frac{{x - c_{i} }}{{a_{i} }}} \right)^{{2b_{i} }} }} $$
(3)

where {a i , b i , c i } are set parameters and the maximum will be 1 and minimum is zero (Kisi et al. 2009).

Layer 2:

Each node in this layer multiply the input signals, and the output represents the animated power of the rule.

$$ OP_{i}^{2} = w_{i} = \mu_{{A_{i} }} (x)\mu_{{B_{i} }} (y) \leftarrow i = 1,2 $$
(4)

Layer 3:

The node number i which is named N calculates the normalized starting power:

$$ OP_{i}^{3} = \bar{w}_{i} = \frac{{w_{i} }}{{w_{1} + w_{2} }} \leftarrow i = 1,2 $$
(5)

Layer 4:

Node i calculates the contribution of rule i to the output of the model by using the following function.

$$ OP_{i}^{.4} = \bar{w}_{i} f_{i} = \bar{w}_{i} (p_{i} x + q_{i} y + r_{i} ) $$
(6)

where w is the output from the previous layer and {p i , q i , r i } are set of parameters.

Layer 5:

The only node in this layer calculates the final output of ANFIS by the following equation:

$$ OP_{i}^{5} = {\text{Final}}({\text{output}}) = \sum\limits_{i} {\bar{w}_{i} f_{i} = \frac{{\sum\nolimits_{i} {w_{i} f_{i} } }}{{\sum\nolimits_{i} {w_{i} } }}} $$
(7)

Hybrid learning algorithm that is a combination of least-squares and back propagation gradient descent method has been used in this study.

Time series modeling

In addition of ANFIS model, this study further implemented the ARIMA model (that only uses historical data for prediction) by using the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Modeling parameters were estimated using the method of maximum likelihood. For better verification of the selected models, corrected Akaike information criterion, AICC, (Akaike 1998; Hurvich and Tsai 1989; Choubin and Malekian 2017) was used to compare model predictions and observed values.

Classification and regression trees (CART)

Classification and regression trees (CART) is a recursive algorithm in data mining developed by Breiman et al. (1984). CART uses historical data to construct decision trees. Depending on dependent variable, classification tree (for categorical variable) or regression tree (for continuous variable) can be constructed (Breiman et al. 1984; Singh et al. 2014; Choubin et al. 2018). Constructed tree can then be applied for predicting (regression tree) and classifying (classification tree) the new observations. Classification tree makes classes of dependent variables by user or calculated according to some exogenous rule, while regression trees do not have predefined classes. Instead there is dependent variable which represents the response values for observations in independent variable matrix (Timofeev 2004). CART methodology includes three steps: (1) construction of maximum tree; (2) choice of the optimal tree size (3) classification or production of new data using constructed tree. In this study, regression tree was used to predict precipitation; hence, we describe steps of CART methodology for regression trees. First step (construction of maximum tree) is the most time-consuming. Splitting and construction of maximum tree in regression trees is based on the squared residuals minimization algorithm. The splitting creates the last observations in learning sample. Maximum tree, especially, in the case of regression tree may be very big, so pruning techniques are necessary for cutting off insignificant nodes (Timofeev 2004). Cross-validation and optimization by number of points in each node are two pruning algorithms which can be used to choice right tree size (second step). In algorithm of optimization by number of points in each node, splitting is stopped when number of observations are less than the minimum number of predefined required observations. Cross-validation procedure, which was used in this study, is based on optimal proportion between misclassification error and the tree complexity. With increasing the size of trees (tree complexity), misclassification error is decreasing. The complexity parameter (cp) is used to select the optimal size of the decision tree. Best cp was identified through trial and error. When regression tree is constructed, it can be used for new data (third step). The output of this step is a certain response value to each of the new observations.

Effective inputs to the CART and ANFIS models were determined using correlation analysis between seasonal precipitation and climate signals. Then, three input combinations based on correlation analysis were considered for precipitation forecasting (Table 2). In the first model, the climate signal that has the highest correlation with precipitation was selected as input. In the second model, two climate signals were selected as inputs based on the high correlations with precipitation. In the third model, all climate signals having significant correlations were used as inputs to the applied model.

Table 2 Input and output of the ANFIS and CART models

Data normalization

Climate data in a semiarid region have sparse and irregular distribution; therefore, the best way to improve the robustness of climate information would be data normalization (Choubin et al. 2017a). Thus, the series was normalized to the range of [0, 1] as follows:

$$ X_{\text{norm}} = \frac{{X_{r} - {\text{X}}_{ \hbox{min} } }}{{{\text{X}}_{ \hbox{max} } - {\text{X}}_{ \hbox{min} } }} $$
(8)

where Xnorm and X r are the normalized and the original inputs, and Xmin and Xmax are the minimum and maximum ranges of inputs, respectively.

Performance criteria

Some performance criteria were used in this research including; Nash–Sutcliffe efficiency (NSE) coefficient (Nash and Sutcliffe 1970), the ratio of the root mean square error (RMSE) to the standard deviation of measured data (RSR), coefficient of determination (R2) and BIAS (Eqs. 9, 10, 11 and 12, respectively). The NSE is a normalized statistic that determines the relative magnitude of the residual variance (“noise”) compared to the measured data variance (“information”) (Nash and Sutcliffe 1970). NSE indicates how well the predicted values represent the observed data and it is computed using Eq. 9. Singh et al. (2005) recommended RSR as a model evaluation statistic to evaluate the differences between model and observed data in hydrological subjects. RSR (Eq. 10) is calculated based on RMSE and standard deviation of measured data. Determination coefficient describes the degree of collinearity between predicted and observed data (Moriasi et al. 2007) and it is a linear relationship between measured and simulated data (Eq. 11). Bias calculates the average tendency of the simulated data to be larger or smaller than their measured counterparts. The optimal value of BIAS is 0.0, with low-magnitude values indicating accurate model simulation. Negative values indicate model underestimation bias, and positive values indicate model overestimation bias. BIAS is presented in Eq. 12.

$$ {\text{NSE}} = 1 - \frac{{\sum\nolimits_{i = 1}^{N} {(O_{i} - P_{i} )^{2} } }}{{\sum\nolimits_{i = 1}^{n} {(O_{i} - \overline{O} )^{2} } }} $$
(9)
$$ {\text{RSR}} = \frac{\text{RMSE}}{{{\text{STDEV}}_{\text{obs}} }} = \frac{{\left[ {\sqrt {\sum\nolimits_{i = 1}^{N} {(O_{i} - P_{i} )^{2} } } } \right]}}{{\left[ {\sqrt {\sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{O} )^{2} } } } \right]}} $$
(10)
$$ R^{2} = \left( {\frac{{\sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{O} )(P_{i} - \overline{P} )} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{O} )^{2} } \sum\nolimits_{i = 1}^{N} {(P_{i} - \overline{P} )^{2} } } }}} \right)^{2} $$
(11)
$$ {\text{BIAS}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {(P_{i} - O_{i} )} $$
(12)

where N is the number of data points, \( O_{i} \) and P i are the observed and predicted ith values, \( \bar{O} \) and \( \bar{P} \) are the mean of the observed and predicted values.

In this study, data were divided into 80 and 20 percent for training and testing sets, respectively.

Results

Determining the models inputs

Correlations between climate signals and precipitation were conducted to determine the model inputs. So, in each station, the same model inputs were selected for modeling methods. Table 3 shows the correlation matrix between fall precipitation (t) and summer climate signals (t − 1). It must be noted that only fall precipitation has satisfactory significant correlation with climate signals. So, fall results were just presented. In this regard, Nazemosadat and Cordey (2000) demonstrated a strong relationship of Iranian fall precipitation data with the El Niño–Southern Oscillation (ENSO) phenomenon. Therefore, the proposed approach in the current study generates a forecast for the fall precipitation values based on summer climate signals (previous season).

Table 3 Correlation matrix between fall precipitation (t) and summer climate signals (1 month ago, t − 1)

Results of selected climate signals are represented for each model in Table 4. The best signals with significant correlations are Black Sea SLP (0.264*), Northern Sea SST (0.373**), Arabian Sea SST (− 0.340**), Azores Sea SST (− 0.527**), Southern Persian Gulf SLP (0.424**) and Southern Persian Gulf SLP (0.538**) for the stations of Bam, Fathabad, Kerman, Kahnoj, Shahdad and Sirjan, respectively. According to Table 4, fall precipitation was predicted by using the input signals of models 1, 2 and 3.

Table 4 Selected climate signals for each model based on correlation analysis

ANFIS results

Various combinations of climate signals represented in Table 4 were used to predict precipitation. In this research, we used grid partition and subtractive fuzzy clustering algorithms to establish the rules based on the relationship between the input and output variables. Grid partition algorithm is used for a few input variables (less than 6, Wei et al. 2007), but in case higher number of input variables, grid partition cannot be used because the fuzzy rules would be too huge (Farokhnia et al. 2011). Therefore, subtractive fuzzy clustering algorithm was used for high number of input variables (higher than 6).

Types and numbers of membership functions were determined by trial-and-error fundamental method. Membership functions of trapezoidal-shaped (trapmf), triangular-shaped (trimf), generalized bell-shaped (gbellmf), gaussian (gaussmf), gaussian 2 (gauss2mf), Π-shaped (pimf), and difference between two sigmoidal functions (dsigmf) were considered for inputs, and linear model was selected as the membership function of output.

ANFIS results for models 1, 2 and 3 are represented in Table 5. Types of membership functions (MF) were determined by trial and error. For training and testing sets, evaluation measures were calculated, and performance rating was represented for the testing set based on Moriasi et al. (2007). The best performance is related to model 2 with good (for Kahnoj station) and satisfactory (for other stations) performance rating (Table 5). As a result, it is clear that the model 2 (two of the best climate signals) provided better results than the models 1 and 3. So, it can be said that the ANFIS model has a satisfactory performance in precipitation forecasting (0.50 < NSE < 0.65). Previous studies such as El-Shafie et al. (2011), Choubin et al. (2014, 2017b) also obtained satisfactory results from ANFIS in forecasting precipitation.

Table 5 ANFIS results for models 1, 2 and 3

CART results

Since response variable is continuous or numeric in current research, ANOVA method was used to grow the trees. Then, the optimal tree was determined based on the complexity parameter (cp) for each combination through trial and error. Determining the best cp help to save computing time through cutting off insignificant nodes. Pruning technique reduces overfitting through reducing the decision size and improves the prediction accuracy. CART results for models 1, 2 and 3 are presented in Table 6. Based on the performance ratings (Moriasi et al. 2007), CART results indicate that inputs of the Model 2 have a better performance than the other combinations in all stations (NSE > 0.75; performance rating is very good), while Model 1 is the worst choice of input variable (Table 6).

Table 6 CART results for models 1, 2 and 3

Time series modeling results

For prediction of precipitation values, this study further implemented the ARIMA model by using the autocorrelation (ACF) and partial autocorrelation (PACF) functions. The best model structures were estimated utilizing the method of maximum likelihood. For example, Fig. 3 shows variation of AICC criterion for determining the best model structure for Kerman station (due to lack of the space, only some of model structures were shown in this figure). As can been seen, ARIMA (0, 1, 1) has the best performance with AICC equal to about 145. Table 7 indicates the best model for each station which was selected by AICC criterion. Residual analysis based on Fig. 4, the residual autocorrelation function (ACF) and the residual partial autocorrelation function (PACF) of the predicted and observed data, confirm the ARIMA model.

Fig. 3
figure 3

Variation of AICC criterion for determining the best model structure in Kerman station

Table 7 Best model of time series for climate station
Fig. 4
figure 4

Autocorrelation and partial autocorrelation functions of the residuals (ACF and PACF) for the Bam station

In Table 8, testing results of the ARIMA are represented for each station. Based on Moriasi et al. (2007), performance rating for all stations was found to be unsatisfactory. This implies that the time series modeling is not successful in predicting precipitation.

Table 8 ARIMA results of each station for testing set

Models comparison

Seasonal forecasting results of precipitation

Since only fall precipitation, among the other seasons, had significant correlation with climate signals, so, we predicted only fall precipitation data in the study area. Figure 5 indicates the observed and predicted fall precipitation values using CART, ANFIS and ARIMA for Model 2 which has the best performance. As can be seen from Fig. 5, CART has closer fit to the corresponding observed precipitations compared to the ANFIS and ARIMA models. Significantly over- and underestimations of the time series model are obviously seen from the time variation graphs. ARIMA model cannot simulate the variations of precipitation data in the stations.

Fig. 5
figure 5

Comparison of observed and simulated (Model 2) fall precipitation data

Evaluation of spatial bias of precipitation

To evaluate the spatial variability of precipitation values, models’ bias was used. Figure 6 indicates the spatial biases of ARIMA, ANFIS and CART models for testing set. Also, it is clear that the bias in ARIMA varies between − 14.41 and 14.04 mm, whereas for ANFIS, the bias is lower (between − 8.07 and 9.82 mm). Lowest bias is related to CART prediction which ranges from − 1.10 to 5.50 mm. ARIMA has overestimations in west of region, on the other hand, it has underestimations in the center of the area. Whereas CART and ANFIS reveal approximately same variations where they have overestimations in south and southeast, and have underestimations in north, northeast and northwest of the Kerman Province.

Fig. 6
figure 6

Spatial bias for testing dataset

Discussion

The best correlated signals with precipitation in this study were the SPG, BS, NS, ARA, AZE and SPG, respectively, for the stations of Sirjan, Bam, Fathabad, Kerman, Kahnoj and Shahdad, respectively. In this regard, there are some studies such as Fallah-Ghalhary et al. (2010) and Ruigar and Golian (2016) which demonstrated relationships of these climate signals on precipitation of Iran, as well. Fallah-Ghalhary et al. (2010) indicated that sea surface temperature (SST) and sea-level pressure (SLP) of the EM, NPG, SPG, OS, ADE, BS, ARA, IO, ATL and NS have effects on climate of the Middle East and Iran. Ruigar and Golian (2016) confirmed the effects of the above-mentioned signals (and other signals used in this study) on north of Iran.

Results of this study indicated that CART performs better than the ANFIS and ARIMA in forecasting the fall precipitation. Structure of models can be a reason for differences between the models performance. Predictions are related with the computing power of the different algorithms (Loh 2011), which is caused differences in their outcome and response. The main important limitation of ARIMA is that it uses historical data for prediction and requires a long data series (Sen et al. 2016). The ANFIS model is rule-based technique, but it is facing with some limitations. ANFIS not have the capability of producing rules to explain its predictions and it is sensitive to learning datasets (Seera et al. 2012). Also, the ANFIS model can overlearn during training period, which it leads to reducing the performance during testing period (Choubin et al. 2014, 2016a).

Our results are agreement with Choubin et al. (2018), where they found the CART model has better performance compared with ANFIS for estimation of the river-suspended sediment load. The major advantages of the CART model which has better performance compared to others are: (1) it is nonparametric, so does not need specification of any functional form; (2) CART results are fixed to uniform transformations of its independent variables (Timofeev 2004); (3) the splitting algorithm of CART easily isolates the outliers in a separate node (Loh 2011; Timofeev 2004); (4) CART has no assumptions and it is fast in view of computation; (5) it is flexible and has an capability to regulate in time (Timofeev 2004); and (6) CART has the capability of explaining its prediction with rules and it is less sensitive to learning datasets (Seera et al. 2012).

The results of this study demonstrated that the predictions accomplished better performance by two of the best climate indices (Model2) in prediction of fall precipitation at time “t ” by using the climate signals at “t − 1.” During the summer (t − 1), large-scale oceanic and atmospheric information can provide important insight into next season climate conditions and fall extreme precipitation events. Specifically, global climate signals were shown to have a good statistical correlation with fall precipitation in the forecasting scheme of next season precipitation values. The CART model showed significant skill at fall precipitation forecast, thus providing crucial advance knowledge of precipitation characteristics which enables efficient water resources planning and management. The outputs of this research can provide useful insights into the skill of longer-lead climate forecasts and also can increase the skill of shorter-lead forecasts that rely on seasonal hydroclimate data. To this end, incorporating large-scale climate predictors into a semiarid forecast model allowed for a shorter-lead time forecast (beginning in September) and also contributed a significant skill during the fall through forecast of early winter extreme events.

In practice, CART model requires more sophisticated data transformation and screening of candidate predictor variables to predict precipitation values precisely. It is recognized that this step in the procedure still assumes a degree of some statistics objectives performance concerning the choice of most appropriate large-scale predictors. Furthermore, in prediction processes, precipitation can be divided into two classifications: dry and wet data, next CART framework could apply on wet class to obtain the precipitation amount. Further efforts are warranted to categorize wet data more finely in technique. Considering days of light, medium and heavy rain by constructing different models for each class can enhance precipitation forecast of a complex climate system. This approach can effectively reduce the number of invalid values calculated by the model. Moreover, it can reproduce precipitation values with less bias and improve the performance criteria. In order to evaluate whether model indeed decreases the entire projection envelope of seasonal climatic variables, it is necessary to implement an additional study on bias analysis and study additional series (here daily and monthly precipitation datasets), which could provide more insight into the climate system (Samadi et al. 2013). Therefore, future studies can apply other machine learning techniques such as self-organizing maps (SOM) and support vector machine (SVM) to compare the results with CART.

Conclusion

This study focuses on the application and evaluation of CART in prediction of seasonal precipitation. Accuracy of the CART model was compared with two most commonly used models (ANFIS and time series modeling) for prediction of precipitation. The results revealed that the CART produced more accurate fall precipitation values than the other models, and this was also confirmed by spatial bias analysis. The results of the CART, in addition, demonstrated that the predictions accomplished better performance by two of the best climate indices in prediction of fall precipitation at time “t” by using the climate signals at “t − 1.”

Findings of this research can be utilized to detect precipitation values in a semiarid region and can be also used as a basis for precipitation forecasting in south and southeast Iran with the same climate conditions. Having a skillful forecast of the upcoming fall precipitation during early summer is a significant importance to water managers and decision makers because fall precipitation usually causes flash floods in semiarid regions. Incorporating this forecast in a decision support model can be further investigated to provide promising alternatives for arid water resources management.