Introduction

Precipitation varies monthly and seasonally. Along with temperature, it is one of the major factors essential for climate classification and for assessing climate change and its consequences. It is the source of surface water, subsurface water, groundwater, lakes and reservoirs, and cryospheric water. Precipitation directly or indirectly affects all aspects of human life; therefore, its prediction is of fundamental significance.

Time series models have long been used for prediction in the hydrological and meteorological sciences (Aghelpour and Varshavian 2020; Aghelpour et al. 2019; Pandey et al. 2019; Dabral and Murry 2017; Yan and Ma 2016; Valipour et al. 2013). Tran Anh et al. (2019) used ARIMA to predict monthly precipitation in Vietnam. Wang et al. (2013) employed SARIMA for predicting monthly precipitation in Shouguang, China. Bari et al. (2015) and Mahmud et al. (2016) evaluated SARIMA for predicting monthly precipitation in Bangladesh. Several studies have reported on the accuracy of precipitation prediction by time series models at different scales: the annual scale (Nyatuame and Agodzo 2018), the monthly scale (Abdul-Aziz et al. 2013; Eni and Adeyeye 2015; Dwivedi et al. 2019; Wang et al. 2014), and the daily scale (Nanda et al. 2013), but not at the seasonal scale.

In recent years, machine learning (ML) models have also been used for meteorological and hydrological predictions (Mohammadi et al. 2020; Moazenzadeh and Mohammadi 2019; Aghelpour et al. 2019; Abbasi et al. 2019; Moazenzadeh et al. 2018). For example, the Multilayer Perceptron (MLP) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) have been used to predict wind speed (Maroufpoor et al. 2019; Deo et al. 2018); river flow (Aghelpour and Varshavian 2020; Poul et al. 2019; Parsaie et al. 2019); meteorological, hydrological, and agricultural drought indices (Aghelpour et al. 2020a; Aghelpour et al. 2020c; Malik et al. 2019; Kisi et al. 2019; Maca and Pech 2016); solar radiation (Jahani and Mohammadi 2018; Halabi et al. 2018; Khosravi et al. 2018); snow cover area (Aghelpour et al. 2020b); and monthly precipitation (Ghamariadyan et al. 2019; Tran Anh et al. 2019; Dwivedi et al. 2019; Nanda et al. 2013). Total seasonal precipitation has been predicted using ANFIS and MLP in Australia (Hossain et al. 2020; Mekanik et al. 2016; Mekanik et al. 2013), and total monthly precipitation in South Korea (Lee et al. 2018). In these predictions, time lags of teleconnection climatic signals, such as El Niño Southern Oscillation (ENSO), Indian Ocean Dipole (IOD), or East Atlantic Pattern (EA), were used as model inputs, but the accuracy of the ML models was not compared with that of time series models. Studies show that when climatic signals are not available, both ML and time series models can still predict precipitation. Therefore, the current research aims to use and compare some of these numerical models in predicting seasonal precipitation without teleconnection climatic signals. Moreover, a literature survey indicates that time series prediction of total seasonal precipitation does not seem to have been done for any type of Iranian climate. This study therefore addresses this gap in different climate regions of Iran (from per-humid to extra-arid areas), using stochastic time series models and artificial intelligence models, and compares these model types under similar input conditions. The impact of climate type on the prediction accuracy of seasonal precipitation by the different models is also investigated.

Materials and methods

Study area

Iran has considerable climatic diversity, ranging from the per-humid climates of the southern coast of the Caspian Sea and the semi-arid regions of the Zagros mountains to the arid and extra-arid regions in the central, southern, and southeastern parts of the country. Based on the Extended De-Martonne classification method (Rahimi et al. 2013), Iran contains all 28 climatic classes, from which 8 stations were selected for this study. These stations were selected because they have the longest records, are spread across the country, and differ the most from one another in the temperature and humidity of their climate classes. The stations’ locations are shown in Fig. 1.

Fig. 1 Stations’ position in Iran

Data

Total monthly precipitation data for the period of 1951–2018 were obtained from the Iran Meteorological Organization (IRIMO). Seasonal (3-month) precipitation was calculated by summing the monthly precipitation of the months in each season, thus yielding 4-season data sets. The characteristics of the stations and their precipitation data are shown in Table 1.

Table 1 Stations’ climate and coordinates with the statistics of seasonal precipitation data

The input for the predictive models consisted of time-lagged seasonal precipitation, with the lags selected using the autocorrelation function (ACF). The data were then divided into two parts: 75% (the first 51 years) for training the models and 25% (the last 17 years) for testing.
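The data preparation described here can be illustrated with a minimal Python sketch. It is not the workflow used in the study (which relied on Minitab and MATLAB); the function names, the synthetic monthly values, and the assumption that every three consecutive months form one season are all hypothetical and serve only to show the aggregation, the lagged input matrix, and the chronological 75%/25% split.

```python
from typing import List, Tuple


def monthly_to_seasonal(monthly: List[float]) -> List[float]:
    """Sum consecutive 3-month blocks into seasonal totals."""
    usable = len(monthly) - len(monthly) % 3  # drop an incomplete last season
    return [sum(monthly[i:i + 3]) for i in range(0, usable, 3)]


def make_lagged(series: List[float], lags: List[int]) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets); each input row holds the values at t - lag."""
    start = max(lags)
    inputs = [[series[t - lag] for lag in lags] for t in range(start, len(series))]
    targets = series[start:]
    return inputs, targets


def chronological_split(items: list, train_fraction: float = 0.75) -> Tuple[list, list]:
    """Split without shuffling, so the test block is the most recent part."""
    cut = int(round(len(items) * train_fraction))
    return items[:cut], items[cut:]


# Synthetic stand-in for 68 years of monthly totals (NOT the IRIMO data).
monthly = [float((m % 12) + 1) for m in range(68 * 12)]
seasonal = monthly_to_seasonal(monthly)          # 272 seasonal totals
train, test = chronological_split(seasonal)      # first 51 years / last 17 years
X, y = make_lagged(seasonal, lags=[2, 4])        # lags chosen only for illustration
```

With 68 years of data, the split yields 204 training seasons and 68 test seasons, matching the 51-year and 17-year blocks.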

Time series models

A time series model is a model fitted to data observed sequentially in time. An observed time series is considered to be one realization of a stochastic process. The simplest model proposed for simulating a time series is a process in which events take place at constant time intervals and each event is independent of the others (Salas et al. 1988). Time series models are based on calibrated regression coefficients multiplied by the time lags of the original series. In these models, the inputs are the time lags of the original series and the coefficients are optimized by the least squares (LS) algorithm.

The basic types of these models are Autoregressive (AR) and Moving Average (MA); the other models, such as Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA), are derived from these two. If the data contain a seasonal pattern, seasonal variants of time series models, such as PARMA (Periodic ARMA) and SARIMA (Seasonal ARIMA), can be used (Salas et al. 1988). This study used the SARIMA model. This model, which is mainly used to simulate the stochastic behavior of seasonal time series, is a linear parametric stochastic model denoted by SARIMA(p,d,q)×(P,D,Q)ω, where ω denotes the periodicity of the series; p, d, and q are the non-seasonal orders of the AR, integrated (differencing), and MA components; and P, D, and Q are the corresponding seasonal orders. The general relationship can be expressed as follows:

$$ {\varPhi}_P\left({B}^{\omega}\right)=\left(1-{\varPhi}_1{B}^{\omega \times 1}-\dots -{\varPhi}_P{B}^{\omega \times P}\right) $$
(1)
$$ {\phi}_p(B)=\left(1-{\phi}_1{B}^1-\dots -{\phi}_p{B}^p\right) $$
(2)
$$ {\nabla}_{\omega}^D={\left(1-{B}^{\omega}\right)}^D $$
(3)
$$ {\nabla}^d={\left(1-B\right)}^d $$
(4)
$$ {\varTheta}_Q\left({B}^{\omega}\right)=\left(1-{\varTheta}_1{B}^{\omega \times 1}-\dots -{\varTheta}_Q{B}^{\omega \times Q}\right) $$
(5)
$$ {\theta}_q(B)=\left(1-{\theta}_1{B}^1-\dots -{\theta}_q{B}^q\right) $$
(6)
$$ {\varPhi}_P\left({B}^{\omega}\right){\phi}_p(B){\nabla}_{\omega}^D{\nabla}^d{X}_t={\theta}_q(B){\varTheta}_Q\left({B}^{\omega}\right){\varepsilon}_t $$
(7)

Here, Xt stands for the stochastic variable and εt is a normal random variable with mean μ and variance \( {\sigma}_{\varepsilon}^2 \). Moreover, B is the backward shift operator, and Φ, ϕ, \( {\nabla}_{\omega}^D \), \( {\nabla}^d \), Θ, and θ represent the seasonal autoregressive, non-seasonal autoregressive, seasonal differencing, non-seasonal differencing, seasonal moving average, and non-seasonal moving average operators, respectively (Salas et al. 1988).
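To make the differencing operators of Eqs. 3 and 4 concrete, the following short Python sketch applies \( {\left(1-{B}^{\omega}\right)}^D \) and \( {\left(1-B\right)}^d \) to a series. The function name `difference` and the synthetic series are illustrative assumptions and are not part of the study.

```python
from typing import List


def difference(series: List[float], lag: int = 1, order: int = 1) -> List[float]:
    """Apply (1 - B^lag)^order, where B shifts the series back one step."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Synthetic series: a repeating 4-season pattern plus a slow linear rise.
pattern = [300.0, 120.0, 40.0, 250.0]
series = [pattern[t % 4] + 2.0 * t for t in range(16)]

seasonal_diff = difference(series, lag=4, order=1)   # (1 - B^4) X_t
both = difference(seasonal_diff, lag=1, order=1)     # (1 - B)(1 - B^4) X_t
```

In this toy case, one seasonal difference removes the repeating pattern and leaves only the constant yearly rise, and one further non-seasonal difference removes that rise as well.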

Multilayer Perceptron

The artificial neuron that underlies the perceptron was first introduced by McCulloch and Pitts (1943). A Multilayer Perceptron (MLP) network provides a nonlinear relationship between input and output vectors, which is accomplished by connecting the neurons of one layer to those of the previous or next layer. The output of each neuron is multiplied by weight coefficients and given as input to a nonlinear activation function. In the training phase, the training data are given to the network, and the network weights are adjusted either until the error between the target and the model output is minimized or until the number of training epochs reaches its preset value. Then, different inputs (which were not present in the training phase) are used for model validation. The training of these neural networks can be stated as an optimization problem with a large number of variables (Rumellhart 1986). For further details, one can refer to Rumellhart 1986; Haykin 1999; Jahani and Mohammadi 2018; and Aghelpour and Varshavian 2020.
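The training idea can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: one hidden layer with a tanh activation, a linear output neuron, plain stochastic gradient descent, and hand-picked initial weights and data. The class name `TinyMLP` is hypothetical, and the sketch does not reproduce the MATLAB networks or the training algorithm used in the study.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], float]


class TinyMLP:
    """One hidden tanh layer followed by a single linear output neuron."""

    def __init__(self, w1: List[List[float]], b1: List[float],
                 w2: List[float], b2: float) -> None:
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        # Weighted sum of the inputs passed through the nonlinear activation.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_step(self, x: List[float], target: float, lr: float) -> None:
        # One gradient-descent update of all weights for the squared error.
        hidden, output = self.forward(x)
        error = output - target
        for j, h in enumerate(hidden):
            grad_hidden = error * self.w2[j] * (1.0 - h * h)  # uses old w2[j]
            self.w2[j] -= lr * error * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_hidden * xi
            self.b1[j] -= lr * grad_hidden
        self.b2 -= lr * error


def mse(model: TinyMLP, data: List[Sample]) -> float:
    return sum((model.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


# Toy data and initial weights chosen by hand (purely illustrative).
data: List[Sample] = [([0.0, 0.0], 0.0), ([1.0, 0.0], 0.5),
                      ([0.0, 1.0], -0.25), ([1.0, 1.0], 0.25)]
model = TinyMLP(w1=[[0.1, -0.2], [0.3, 0.1]], b1=[0.0, 0.0],
                w2=[0.2, -0.1], b2=0.0)
mse_before = mse(model, data)
for _ in range(200):  # stop after a preset number of epochs
    for x, y in data:
        model.train_step(x, y, lr=0.1)
mse_after = mse(model, data)
```

The loop stops after a fixed number of epochs, which corresponds to the second stopping rule mentioned in the text; an error threshold could be used instead.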

Adaptive Neuro-Fuzzy Inference System

Adaptive Neuro-Fuzzy Inference System (ANFIS) is a model that combines a neural network learning algorithm with fuzzy logic to design nonlinear maps between input and output spaces. The model learns through the neural network, expresses the relationships between input and output variables as fuzzy rules, and subsequently creates the input structure of the system. ANFIS can use a variety of fuzzy clustering methods, the most powerful of which are Subtractive Clustering (SC) and Fuzzy Cluster Means (FCM). In this study, these two methods are used for ANFIS clustering, giving the ANFIS-SC and ANFIS-FCM models. These clustering methods are described below.

Subtractive Clustering

The subtractive clustering method assumes that each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define a cluster center on the basis of the density of the surrounding data points (Kisi et al. 2018). Consider a set of n data points {x1, x2, …, xn} in m-dimensional space, and assume that all data points have been normalized to lie within a unit hypercube. The density index Di corresponding to the data point xi can be expressed as:

$$ {D}_i=\sum \limits_{j=1}^n\mathit{\exp}\left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{{\left(\frac{r_a}{2}\right)}^2}\right) $$
(8)

Here, ra is a positive quantity called the cluster radius. A data point with many neighboring data points has a high density index. After the density of each data point has been measured, the data point with the highest density is selected as the first cluster center (Kisi et al. 2018; Hiremath et al. 2012). The influence of the first cluster center is then removed from its neighborhood, and the density of every data point is revised with the following formula:

$$ {D}_i={D}_i-{D}_{c1}\mathit{\exp}\left(-\frac{{\left\Vert {x}_i-{x}_{c1}\right\Vert}^2}{{\left(\frac{r_b}{2}\right)}^2}\right) $$
(9)

Here, xc1 and Dc1 are the selected first cluster center and its density index, respectively, and rb is a positive constant. To avoid closely spaced cluster centers, rb is normally larger than ra (rb is taken as 1.5ra). After the density of each data point has been revised, the next cluster center xc2 is selected and the densities of all data points are revised again. This process continues until a sufficient number of cluster centers has been produced (Kisi et al. 2018; Kisi et al. 2014; Aqil et al. 2007).
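A compact Python sketch of this procedure is given next. It uses the squared Euclidean distance in both the density step and the revision step, takes rb = 1.5ra, and, as a simplifying assumption, stops after a fixed number of centers (`n_centers`), since the text does not state a numerical stopping rule. The function names and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def subtractive_clustering(points: List[Point], ra: float = 0.5,
                           rb_factor: float = 1.5,
                           n_centers: int = 2) -> List[Point]:
    """Pick cluster centers by repeatedly taking the densest point."""
    rb = rb_factor * ra
    # Density index of every point (density step).
    density = [sum(math.exp(-sq_dist(xi, xj) / (ra / 2.0) ** 2) for xj in points)
               for xi in points]
    centers: List[Point] = []
    for _ in range(n_centers):  # assumed stopping rule: fixed number of centers
        c = max(range(len(points)), key=lambda i: density[i])
        centers.append(points[c])
        d_c = density[c]
        # Remove the influence of the new center (revision step).
        density = [d - d_c * math.exp(-sq_dist(x, points[c]) / (rb / 2.0) ** 2)
                   for d, x in zip(density, points)]
    return centers


# Two small groups of normalized points (illustrative values only).
points: List[Point] = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12),
                       (0.90, 0.90), (0.88, 0.90)]
centers = subtractive_clustering(points, ra=0.5, n_centers=2)
```

The first center falls in the denser group of three points; after its influence is subtracted, the second center is taken from the other group.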

Fuzzy Cluster Means

In fuzzy clustering, each pattern may belong to several clusters or segments. One of the most widely used clustering algorithms is the K-means algorithm. This unsupervised algorithm has some limitations and may not work properly on large datasets. To deal with this disadvantage, different clustering algorithms have been proposed, among which fuzzy cluster means is a suitable alternative (Kisi et al. 2018; Kisi and Zounemat-Kermani 2016). Fuzzy cluster means was developed by Dunn (1973) and improved by Bezdek (2013).

The Fuzzy Cluster Means (FCM) method partitions a set of N vectors xi, i = 1, …, N, into C fuzzy clusters, where each pattern belongs to a cluster with a degree specified by a membership grade wic between 0 and 1. The final objective of the FCM algorithm is to find C cluster centers that minimize a cost function based on a dissimilarity measure. The objective function to be minimized is defined as below:

$$ \mathrm{Min}\ {J}_{FCM}=\sum \limits_{c=1}^C\sum \limits_{i=1}^N{w}_{ic}^p{\left\Vert {x}_i-{v}_c\right\Vert}^2\kern1.25em \mathrm{s.t.}\ \sum \limits_{c=1}^C{w}_{ic}=1,\ i=1,2,\dots, N $$
(10)

where p (p > 1) is the fuzzifier exponent; N is the number of data points; C is the number of clusters; wic is the degree of membership of the ith data point in the cth cluster; vc is the center of the cth cluster; and xi is the ith input vector. The membership wic is calculated with the following formula (Kisi et al. 2018; Bezdek et al. 1984):

$$ {w}_{ic}=\frac{1}{\sum \limits_{L=1}^C{\left(\frac{d_{ic}^2}{d_{iL}^2}\right)}^{\left(\frac{1}{p-1}\right)}}\kern1em for\ i=1,2,\dots, N\kern0.5em and\kern0.5em c=1,2,\dots, C $$
(11)

where dic is the distance between the ith data point and the cth cluster center. The cluster centers are then updated by:

$$ {v}_c=\frac{\sum \limits_{j=1}^N{w}_{jc}^p{x}_j}{\sum \limits_{j=1}^N{w}_{jc}^p} $$
(12)

The FCM procedure alternates between Eqs. 11 and 12 until a convergence condition is achieved.
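The alternating updates can be sketched in Python as follows. Memberships are computed from the squared distances between each data point xi and every cluster center, and centers are recomputed as membership-weighted means (Eq. 12). Several details are assumptions made only for this illustration: the initial centers are supplied by the caller, convergence is declared when no center moves by more than a small tolerance, and a point that coincides with a center receives full membership in that cluster. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def memberships(points: List[Point], centers: List[Point],
                p: float = 2.0) -> List[List[float]]:
    """Membership of every point in every cluster from squared distances."""
    w = []
    for x in points:
        d2 = [sq_dist(x, v) for v in centers]
        if any(d == 0.0 for d in d2):
            # Point coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((d2[c] / d2[l]) ** (1.0 / (p - 1.0))
                             for l in range(len(centers)))
                   for c in range(len(centers))]
        w.append(row)
    return w


def update_centers(points: List[Point], w: List[List[float]],
                   p: float = 2.0) -> List[Point]:
    """Membership-weighted mean of the data for each cluster (Eq. 12)."""
    dim = len(points[0])
    centers = []
    for c in range(len(w[0])):
        weights = [row[c] ** p for row in w]
        total = sum(weights)
        centers.append(tuple(sum(wt * x[k] for wt, x in zip(weights, points)) / total
                             for k in range(dim)))
    return centers


def fcm(points: List[Point], init_centers: List[Point], p: float = 2.0,
        tol: float = 1e-10, max_iter: int = 100):
    centers = list(init_centers)
    for _ in range(max_iter):
        new_centers = update_centers(points, memberships(points, centers, p), p)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # assumed convergence condition
            break
    return centers, memberships(points, centers, p)


points: List[Point] = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centers, w = fcm(points, init_centers=[(1.0,), (5.0,)])
```

Each row of `w` sums to 1, which is the constraint attached to the objective function in Eq. 10.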

Measuring prediction accuracy

In the current study, seven criteria were used for evaluating the prediction accuracy: root mean squared error (RMSE), normalized root mean squared error (NRMSE), mean absolute error (MAE), Willmott Index (WI), coefficient of determination (R2), Akaike Information Criterion (AIC), and Schwarz Bayesian Information Criterion (BIC). They are defined as follows:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2} $$
(13)
$$ \mathrm{NRMSE}=\frac{\mathrm{RMSE}}{\max \limits_i\left({y}_i\right)-\min \limits_i\left({y}_i\right)} $$
(14)
$$ \mathrm{MAE}=\frac{1}{n}\sum \limits_{i=1}^n\left|{y}_i-{f}_i\right| $$
(15)
$$ WI=1-\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{\sum \limits_{i=1}^n{\left(\left|{f}_i-\overline{y}\right|+\left|{y}_i-\overline{y}\right|\right)}^2} $$
(16)
$$ {R}^2={\left[\frac{\sum_{i=1}^n\left({y}_i-\overline{y}\right)\left({f}_i-\overline{f}\right)}{\sqrt{\sum_{i=1}^n{\left({y}_i-\overline{y}\right)}^2}\ast \sqrt{\sum_{i=1}^n{\left({f}_i-\overline{f}\right)}^2}}\right]}^2 $$
(17)
$$ \mathrm{AIC}=n\ln \left(\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{n}\right)+2p $$
(18)
$$ \mathrm{BIC}=n\ln \left(\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{n}\right)+p\ln (n) $$
(19)

Here, yi is the observed precipitation; \( \overline{y} \) is the average value of observed precipitation; fi is the predicted precipitation; \( \overline{f} \) is the average value of predicted precipitation; n is the number of data points; and p is the number of model parameters. Predictions are better when the values of RMSE, NRMSE, and MAE are close to 0 and the values of WI and R2 are close to 1. Smaller AIC and BIC values also indicate better model performance.
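For readers who wish to reproduce these criteria, Eqs. 13 to 19 can be written as one short Python function. The function name `evaluate`, the dictionary keys, and the four sample values are illustrative assumptions; only the formulas come from the equations.

```python
import math
from typing import Dict, List


def evaluate(obs: List[float], pred: List[float], n_params: int) -> Dict[str, float]:
    """Compute the seven evaluation criteria of Eqs. 13 to 19."""
    n = len(obs)
    y_bar = sum(obs) / n
    f_bar = sum(pred) / n
    sse = sum((y - f) ** 2 for y, f in zip(obs, pred))

    rmse = math.sqrt(sse / n)
    nrmse = rmse / (max(obs) - min(obs))
    mae = sum(abs(y - f) for y, f in zip(obs, pred)) / n
    wi = 1.0 - sse / sum((abs(f - y_bar) + abs(y - y_bar)) ** 2
                         for y, f in zip(obs, pred))
    cov = sum((y - y_bar) * (f - f_bar) for y, f in zip(obs, pred))
    r2 = cov ** 2 / (sum((y - y_bar) ** 2 for y in obs)
                     * sum((f - f_bar) ** 2 for f in pred))
    aic = n * math.log(sse / n) + 2 * n_params
    bic = n * math.log(sse / n) + n_params * math.log(n)
    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae, "WI": wi,
            "R2": r2, "AIC": aic, "BIC": bic}


# Illustrative values only (mm per season), not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = evaluate(observed, predicted, n_params=2)
```

Because NRMSE divides RMSE by the range of the observations, it is the only one of the three error measures that is directly comparable between stations with very different precipitation amounts.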

Minitab was used to implement the time series model, and MATLAB was used to implement the MLP and ANFIS models. Graphs were produced with Excel and Minitab.

Results

Results of time series model

To find the appropriate input matrix, the Autocorrelation Function (ACF) was used. Four examples of ACF plots are shown in Fig. 2.

Because the ACF regularly crossed the significance limits (at lags of 2, 6, 10, 14, 18… in the negative direction and at lags of 4, 8, 12, 16, 20… in the positive direction), as shown in Fig. 2, it is clear that all stations had a period of 4 lags (4 seasons). This means that the precipitation data had a seasonal periodic pattern, suggesting that the applicable time series model is the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The model pattern SARIMA(p,d,q)(P,D,Q)ω therefore became SARIMA(p,d,q)(P,D,Q)4. The seasonal differencing degree of this pattern (D) was determined by differencing the data with a lag equal to the period (lag = 4). As an example, for Anzali station, seasonal differencing was applied up to four times, as shown in Fig. 3.

Fig. 2 Samples of ACF plot, for seasonal precipitation data

Fig. 3 Changes of ACF by four steps of seasonal differencing (Anzali station)

Figure 3 shows that the number of lags crossing the significance limits increased with increasing seasonal differencing degree. For example, at D = 1 (the ACF plot for one step of seasonal differencing), two lags fell outside the significance limits, whereas at D = 2, D = 3, and D = 4, the number of crossing lags was 3, 4, and 5, respectively. Thus, the fewest crossings belonged to a differencing degree of 1, and the model pattern became SARIMA(p,d,q)(P,1,Q)4. To determine the other seasonal and non-seasonal autoregressive and moving average orders (p, q, P, and Q), a trial-and-error approach was used. Orders from 0 to 5 were examined for each station, and the best one(s) were identified with the evaluation criteria (Table 2).
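The selection of D by counting ACF crossings can be sketched in Python. The sketch rests on several assumptions that are not taken from the study: the significance limits are approximated by ±1.96/√n (Minitab computes its limits differently), the series is synthetic, and the function names are hypothetical. It therefore illustrates the counting procedure only and is not expected to reproduce the counts reported for Anzali.

```python
import math
import random
from typing import Dict, List


def seasonal_difference(series: List[float], period: int, order: int) -> List[float]:
    """Apply (1 - B^period) to the series `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - period] for t in range(period, len(out))]
    return out


def acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def count_significant_lags(series: List[float], max_lag: int = 20) -> int:
    """Count lags whose ACF lies outside the assumed +/-1.96/sqrt(n) band."""
    bound = 1.96 / math.sqrt(len(series))
    return sum(1 for r in acf(series, max_lag) if abs(r) > bound)


def crossings_by_degree(series: List[float], period: int = 4,
                        max_degree: int = 4) -> Dict[int, int]:
    return {d: count_significant_lags(seasonal_difference(series, period, d))
            for d in range(1, max_degree + 1)}


# Synthetic 4-season pattern with Gaussian noise (NOT station data).
rng = random.Random(0)
pattern = [300.0, 120.0, 40.0, 250.0]
series = [pattern[t % 4] + rng.gauss(0.0, 10.0) for t in range(120)]

raw_acf = acf(series, 8)                     # ACF of the undifferenced series
crossings = crossings_by_degree(series)      # crossings for D = 1, 2, 3, 4
best_degree = min(crossings, key=crossings.get)
```

For the undifferenced synthetic series, the ACF is negative at lag 2 and strongly positive at lag 4, which mirrors the alternating pattern described for Fig. 2.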

Table 2 Assessment of SARIMA model (The bold rows show the best prediction performance of each station)

For some of the stations (Anzali, Kermanshah, Shiraz, and Bushehr), the best SARIMA model was clear, so only that single model is shown in Table 2. For the other stations (Babolsar, Shahroud, Isfahan, and Zahedan), more than one SARIMA model fitted well. In the test period for Babolsar, SARIMA(1,0,5)(0,1,2)4 and SARIMA(2,0,5)(1,1,1)4 were the best models, and their evaluation criteria differed only in the 2nd and 3rd decimal places; according to the principle of parsimony (Salas et al. 1988), SARIMA(1,0,5)(0,1,2)4 should be chosen because it has fewer parameters. By this principle, the chosen models for Shahroud, Isfahan, and Zahedan were SARIMA(0,0,5)(0,1,0)4, SARIMA(0,0,4)(0,1,2)4, and SARIMA(0,0,4)(0,1,4)4, respectively. In terms of RMSE, the lowest prediction error of SARIMA belonged to Zahedan station, which is located in an extra-arid moderate region, with RMSE = 17.726 mm per season, NRMSE = 0.205, WI = 0.743, AIC = 399.007, and BIC = 407.885. The highest RMSE belonged to Anzali station in the per-humid moderate area, with RMSE = 156.394 mm per season, NRMSE = 0.140, WI = 0.911, AIC = 695.705, and BIC = 704.597.

Results of ML models

For the selection of input for the ML models (MLP, ANFIS-SC, and ANFIS-FCM), ACF was used. To make a logical comparison between the time series and ML models, inputs should be similar. From the ACF plots (Fig. 2), the even time lags of seasonal precipitation had significant autocorrelations, so the time lags 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 were used as input for the ML models. The models were implemented and the evaluation results are shown in Table 3.

Table 3 Assessment of machine learning models (The bold rows show the best prediction performance of each station)

For the ML prediction, the model parameters were calibrated by trial and error. For the MLP model, the parameters were the number of hidden layers, the number of neurons in each hidden layer, and the type of transfer function; for ANFIS-SC, the only parameter was the cluster radius; and for ANFIS-FCM, the only parameter was the number of clusters. Results showed that the ML models had similar accuracy in predicting seasonal precipitation. The lowest error was obtained by the ANFIS-FCM model with 2 clusters for Zahedan station, with RMSE = 20.045 mm per season, NRMSE = 0.232, WI = 0.676, AIC = 415.722, and BIC = 424.600. The highest prediction error among the ML models was obtained by ANFIS-SC with a cluster radius of 0.55 for Anzali station, with RMSE = 172.456 mm per season, NRMSE = 0.155, WI = 0.899, AIC = 708.419, and BIC = 717.297.

Comparison between models

Models were compared by their mean absolute error (MAE), computed separately for under-estimations and over-estimations (Fig. 4). To draw the graph of Fig. 4, the observed precipitation data of all stations in the prediction (test) period were compared with the corresponding predictions of each model. The prediction errors were grouped by model, and the MAE was then calculated separately for under-estimated and over-estimated values.
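The separation of errors into under-estimation and over-estimation can be sketched in Python as follows. The function name `split_mae` and the sample values are hypothetical; exact matches between prediction and observation are assigned to neither group, which is an assumption, since the text does not say how such cases were handled.

```python
from typing import List, Tuple


def split_mae(obs: List[float], pred: List[float]) -> Tuple[float, float]:
    """Return (MAE of under-estimations, MAE of over-estimations)."""
    under = [y - f for y, f in zip(obs, pred) if f < y]  # prediction too low
    over = [f - y for y, f in zip(obs, pred) if f > y]   # prediction too high
    mae_under = sum(under) / len(under) if under else 0.0
    mae_over = sum(over) / len(over) if over else 0.0
    return mae_under, mae_over


# Illustrative seasonal totals in mm (not data from the study).
observed = [100.0, 50.0, 80.0, 20.0]
predicted = [70.0, 60.0, 80.0, 30.0]
mae_under, mae_over = split_mae(observed, predicted)
```

In this toy example, one season is under-estimated by 30 mm and two seasons are over-estimated by 10 mm each, so the two MAE values are 30 mm and 10 mm.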

Fig. 4 Investigating under- and over-estimation of the models, using MAE

Comparison between drought classes

At first glance, the models’ under-estimation error was larger than their over-estimation error. This may be due to the nature of precipitation data, whose time series contain irregularities and sudden jumps; the models usually cannot capture these jumps and other unusual precipitation events. Among the four models, SARIMA had the lowest MAE in both under-estimation (MAE = 43.45 mm) and over-estimation (MAE = 41.89 mm), and its difference between under-estimation and over-estimation was also the smallest, so it can be regarded as the best of the models. There did not seem to be a significant difference among the ML models, but according to their MAE, ANFIS-FCM can be ranked first (MAE of under-estimation = 53.13 mm, MAE of over-estimation = 45.51 mm), ANFIS-SC second (MAE of under-estimation = 53.46 mm, MAE of over-estimation = 46.67 mm), and MLP third (MAE of under-estimation = 54.74 mm, MAE of over-estimation = 47.01 mm), with minor differences.

After SARIMA was identified as the best predictor model, its accuracy was evaluated in different years from the perspective of meteorological drought classes. For this, the Standardized Precipitation Index (SPI) was used to determine drought classes at the annual scale, and the years were then separated into 3 classes: drought, normal, and wet years. The observed and predicted data of the test period for each station were investigated using scatter plots (Fig. 5), and the R2 values were calculated separately for each class.

Fig. 5 Regression plots of SARIMA outputs vs observed precipitation data in all stations, for different drought classes

At first glance, the best correlations for all classes were for Anzali station and the weakest for Isfahan station (judging by the closeness of their fitted regression lines to the 1:1 line and by their R2 values). For all of the stations, the fitted regression lines of the “wet” (blue dots) and “normal” (green dots) classes were located below the 1:1 line, which shows that the model under-predicted in these two classes. At Anzali, Babolsar, Shiraz, and Bushehr stations, the fitted regression lines of the drought years (red dots) lay entirely above the 1:1 line, and at Kermanshah, Shahroud, Isfahan, and Zahedan stations, the drought-year regression lines lay partly above and partly below the 1:1 line. So, in general, the model over-predicted precipitation in drought years. According to the R2 values, the model’s predictions were more strongly correlated with observations in wet and normal years than in drought years.

Comparison under different climatic classes

To compare the models under different climates, bar charts of the prediction errors for all stations were constructed (Fig. 6).

Fig. 6 Comparing the models’ accuracy in different climates

In Fig. 6, the stations are sorted by dryness from left (the most humid) to right (the driest), and the values of three criteria (RMSE, NRMSE, and WI) are shown for the test period. In the RMSE bar chart, the prediction error was the highest for Anzali station (about 155 mm per season), which is located in a per-humid moderate climate region (refer to Table 1). The model error decreased with decreasing humidity of the climate, so that the lowest error occurred in the extra-arid moderate climate region of Zahedan station, at about 18 mm per season. In the NRMSE bar chart, however, the trend was completely the opposite: the lowest normalized errors occurred for the humid climate stations and the highest for the extra-arid climate stations. For example, the NRMSE values of Anzali and Babolsar were about 0.14 and 0.11, respectively, while the highest values (about 0.20) were obtained for Isfahan and Zahedan stations. Because these two criteria disagreed, a third criterion, WI, was needed. The WI bar chart agreed with NRMSE rather than with RMSE. The best prediction belonged to the per-humid moderate climate region of Anzali station, with WI ≈ 0.91, and WI decreased to approximately 0.74 for the dry climate stations of Isfahan and Zahedan. RMSE was therefore not a good criterion for comparing modeling results across climates, because its ranking of the climates was opposite to that of the other criteria. This can also be confirmed by the R2 values in different climates, as shown in Fig. 5. In addition, Fig. 7 shows time series plots of the model predictions against the observed values for two sample stations, Anzali and Isfahan.

Fig. 7 Time series plot of observation vs output for two samples of the stations

Discussion

The SARIMA model has not been used so far to predict seasonal cumulative precipitation, so the current study can only be compared with monthly predictions. Dabral and Murry (2017) implemented SARIMA for the monthly precipitation of Doimukh station in India. They compared SARIMA’s predictions with observed data by presenting the average of each month (both observed and predicted) in tabular form (Tables 2 and 3 of their paper). In terms of cumulative annual precipitation, Anzali was the station most similar to Doimukh, so Anzali station is used for the comparison.

The monthly values in Tables 2 and 3 of Dabral and Murry (2017) were extracted. Their monthly averages were then converted to seasonal averages and compared with Anzali’s seasonal averages. The two criteria NRMSE and WI were calculated from the seasonal average precipitation of these two stations to compare SARIMA’s predictions in both the training and test periods (Fig. 8). Results showed that SARIMA performed better for Anzali than for Doimukh in both the test and training periods. This difference may be due to the difference in climate and in the annual precipitation regime. In Iran’s climates (especially in the cities on the margin of the Caspian Sea, such as Anzali and Babolsar), most of the annual precipitation occurs in autumn and winter and the least in summer, whereas in Indian climates, precipitation peaks in the monsoon season, which occurs in summer. It may also result from the period used in SARIMA, which was 12 in Dabral and Murry’s study and 4 in the current study. This suggests that the current method for SARIMA can yield a better seasonal precipitation prediction: in the test period, Doimukh’s NRMSE (0.044) was about 37% higher than that of Anzali station (0.032). Also, the NRMSE for Anzali in the training period was 0.023, less than half of Doimukh’s (0.054), although this may be related to the difference between their record lengths (68 years of data in the current research versus 26 years in the cited one). SARIMA has also been reported as a good predictor of monthly precipitation in other studies in Bangladesh (Mahmud et al. 2016; Bari et al. 2015) and Nigeria (Eni and Adeyeye 2015), which is in line with the current study. Predictions of seasonal precipitation in Australia using machine learning methods had similar results (Mekanik et al. 2013) or even weaker results (Hossain et al. 2018) in comparison with the current study (according to the available values of R and R2), even though they had the advantage of using climatic indexes as predictor inputs. However, an ML model with such inputs cannot logically be compared with a time series model, because the latter uses only lags of the precipitation series itself as input.

Fig. 8 Comparing the accuracy of SARIMA between Anzali station in this study, and Doimukh station in the study of Dabral and Murry (2017)

The difference in error between similar climates can also be related to physical and synoptic causes. For example, both Anzali and Babolsar stations are located in the humid climatic class but have different prediction results (see Fig. 6, NRMSE criterion). The Siberian high pressure brings more atmospheric stability to the eastern part of the Caspian Sea’s southern coast (Babolsar), while the western part of the coast (Anzali) is relatively more affected by western systems, such as those from the Black Sea and the Mediterranean Sea. Atmospheric instability can cause irregularities in the time series, reducing its autocorrelation and consequently the prediction accuracy; this may explain why Anzali, despite having a climate similar to Babolsar’s, has a lower prediction accuracy. Differences in prediction accuracy are also observed between the semi-arid stations (Kermanshah and Shiraz). The systems affecting the Kermanshah region include the Saudi Arabian low pressure, the Sudan low pressure, and Mediterranean fronts, which have severe impacts on atmospheric instability and consequently on the precipitation of Kermanshah, while southwestern Iran (Shiraz station) can be affected, more weakly, by only the two low-pressure systems of Saudi Arabia and Sudan. In addition, Shiraz’s position under the subtropical high-pressure belt may be another reason for the greater atmospheric stability in this area (29.53° latitude) than in Kermanshah (34.35° latitude). The same applies to the difference in prediction accuracy between the two arid climate stations of Bushehr (latitude 29°) and Shahroud (latitude 36.42°), with more accurate predictions in the Bushehr region (Fig. 6, NRMSE criterion). The difference between the two stations in the extra-arid climate is much smaller than in the other regions. Trade winds and monsoon systems sometimes affect the Sistan and Baluchestan area (Zahedan station) and cause atmospheric instability there, which may be a reason for the poorer prediction accuracy of Zahedan compared to Isfahan.

Conclusion

It was found that the linear SARIMA model predicted seasonal precipitation in Iranian climates better than the ML models did. The linear relation among the time lags of seasonal precipitation was stronger than the nonlinear relations in these areas, so SARIMA is recommended for Iran. Among the ML models, ANFIS was the best (especially with the FCM clustering method); it has the fewest parameters to optimize, whereas MLP has more parameters in its network structure. All of the models predicted better in wet and normal years than in drought years. With NRMSE values in the range of 0.1 to 0.2, SARIMA’s performance was not excellent but fell in the good and medium classes, so it has potential for predicting seasonal precipitation in other areas. Among the regions studied, the per-humid and humid climate regions, such as Anzali and Babolsar, can have more accurate predictions than the arid and extra-arid climate regions, such as Bushehr, Shahroud, Isfahan, and Zahedan. A significant result is that RMSE is a good criterion for comparing several models at one station, but not for comparing different climate regions. Because RMSE does not consider the variation range of the data, and this range (especially for precipitation data) changes greatly between climates, it is better to use its normalized form, NRMSE. It is suggested that future work use climatic indexes as predictor inputs for the ML models and optimize the ML models for precipitation prediction in Iran with more complex optimization algorithms to check their efficiency.