Time series prediction of seasonal precipitation in Iran, using data-driven models: a comparison under different climatic conditions

Aghelpour, Pouya; Singh, Vijay P.; Varshavian, Vahid

doi:10.1007/s12517-021-06910-0

Original Paper
Published: 19 March 2021

Volume 14, article number 551, (2021)
Arabian Journal of Geosciences

Abstract

Seasonal total precipitation is one of the important meteorological variables and its prediction is useful for the supply of water to different sectors. This study aims to compare Seasonal Autoregressive Integrated Moving Average (SARIMA), Multilayer Perceptron (MLP), Adaptive Neuro-Fuzzy Inference System-Subtractive Clustering (ANFIS-SC), and ANFIS-Fuzzy Cluster Means (ANFIS-FCM) for the prediction of seasonal precipitation. The precipitation data were obtained for the 1951–2018 period from 8 stations located in different climatic zones of Iran. The stations and their climates are Anzali (per-humid moderate climate), Babolsar (humid moderate climate), Kermanshah (semi-arid cold climate), Shiraz (semi-arid moderate climate), Bushehr (arid warm climate), Shahroud (arid cold climate), Isfahan (extra-arid cold climate), and Zahedan (extra-arid moderate climate). The time-lagged precipitation as input for all models was chosen using the autocorrelation function (ACF), and the data were divided into two periods: 1951–2001 for training (75%) and 2002–2018 for testing (25%). Based on the evaluation criteria (root mean squared error [RMSE], normalized root mean squared error [NRMSE], Wilmott Index [WI], Akaike Information Criterion [AIC], and Bayesian Information Criterion [BIC]), results showed that the SARIMA stochastic model was more accurate than the artificial intelligence methods and had the least over- and under-estimations. MLs exhibited good prediction accuracy, but ANFIS-FCM had a little higher accuracy. Consequently, due to the high accuracy and simplicity, the stochastic model is reported as the best predictor for seasonal precipitation in all climates. In terms of the R² values, the models showed better fitting in wet and normal years than in drought years. Further, the model predictions were more accurate in per-humid and humid areas than in arid and extra-arid climates. 
Also, the NRMSE values were in the range of 0.1 to 0.2, which indicated that SARIMA’s performance was medium to good. A significant result of this study was that the ranking of climates based on RMSE was completely opposite to that based on NRMSE, WI, and R². This contrast arose because the RMSE equation neglects the data range, so RMSE is not a good choice for comparing results across different climates; its normalized form, NRMSE, is preferable.

Introduction

Precipitation varies monthly and seasonally. Along with temperature, it is one of the major factors essential for climate classification and for assessing climate change and its consequences. It is the source of surface water, subsurface water, groundwater, lakes and reservoirs, and cryospheric water. Precipitation directly or indirectly affects all aspects of human life, and therefore its prediction is of fundamental significance.

Time series models have long been used for prediction, in hydrological and meteorological sciences (Aghelpour and Varshavian 2020; Aghelpour et al. 2019; Pandey et al. 2019; Dabral and Murry 2017; Yan and Ma 2016; Valipour et al. 2013). Tran Anh et al. (2019) used ARIMA to predict monthly precipitation in Vietnam. Wang et al. (2013) employed SARIMA for predicting monthly precipitation in Shouguang, China. Bari et al. (2015) and Mahmud et al. (2016) evaluated SARIMA for predicting monthly precipitation in Bangladesh. Several studies have reported on the accuracy of precipitation prediction by time series models at different scales: annual scale (Nyatuame and Agodzo 2018), monthly scale (Abdul-Aziz et al. 2013; Eni and Adeyeye 2015; Dwivedi et al. 2019; Wang et al. 2014), and daily scale (Nanda et al. 2013), but not at the seasonal scale.

In recent years, machine Learning (ML) models have also been used for meteorological and hydrological predictions (Mohammadi et al. 2020; Moazenzadeh and Mohammadi 2019; Aghelpour et al. 2019; Abbasi et al. 2019; Moazenzadeh et al. 2018). For example, MLP and ANFIS have been used to predict wind speed (Maroufpoor et al. 2019; Deo et al. 2018); river flow (Aghelpour and Varshavian 2020; Poul et al. 2019; Parsaie et al. 2019); meteorological, hydrological, and agricultural drought indices (Aghelpour et al. 2020a; Aghelpour et al. 2020c; Malik et al. 2019; Kisi et al. 2019; Maca and Pech 2016); solar radiation (Jahani and Mohammadi 2018; Halabi et al. 2018; Khosravi et al. 2018); snow cover area (Aghelpour et al. 2020b); and monthly precipitation (Ghamariadyan et al. 2019; Tran Anh et al. 2019; Dwivedi et al. 2019; Nanda et al. 2013). Total seasonal precipitation has been predicted using ANFIS and MLP in Australia (Hossain et al. 2020; Mekanik et al. 2016; Mekanik et al. 2013), and total monthly precipitation in South Korea (Lee et al. 2018). In these predictions, time lags of teleconnection climatic signals, such as El Niño Southern Oscillation (ENSO), Indian Ocean Dipole (IOD), or East Atlantic Pattern (EA), were used as model input but the accuracy of MLs was not compared with time series models. Studies show that if climatic signals are not available, both ML and time series-based models can do precipitation prediction. Therefore, the current research aims to use and compare some of these numerical models in predicting seasonal precipitation, without teleconnection climatic signals. Also, literature survey shows that the time series prediction of total seasonal precipitation does not seem to have been done for any type of Iranian climate. 
Therefore, this study addresses this issue in different climate regions of Iran (from per-humid to extra-arid areas), using a stochastic time series model and artificial intelligence models, and compares these model types under similar input conditions. The impact of climate type on the prediction accuracy of seasonal precipitation by the different models is also investigated.

Materials and methods

Study area

Iran has considerable climatic diversity, ranging from the per-humid climates on the southern side of the Caspian Sea and the semi-arid regions of the Zagros mountains to arid and extra-arid regions in the central, southern, and southeastern parts. Based on the Extended De-Martonne classification method (Rahimi et al. 2013), Iran contains all 28 climatic classes, and 8 stations were selected to represent this diversity. These stations were selected because they have the longest records, are spread out countrywide, and have the maximum differences between their climate classes in terms of temperature and humidity. The stations’ locations are shown in Fig. 1.

Data

Total monthly precipitation data for the period of 1951–2018 were obtained from the Iran Meteorological Organization (IRIMO). Seasonal (3-month) precipitation was calculated by summing the monthly precipitation of each season, thus yielding four seasonal values per year. The characteristics of the stations and their precipitation data are shown in Table 1.
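The aggregation from monthly to seasonal totals can be sketched as follows. This is only an illustration: it assumes that the monthly record starts at the first month of a season and that each season is a block of three consecutive months, which the text does not specify. The function name `monthly_to_seasonal` and the sample values are hypothetical.

```python
def monthly_to_seasonal(monthly):
    """Sum consecutive 3-month blocks of monthly precipitation totals (mm).

    Assumption: the record starts at the first month of a season.
    """
    if len(monthly) % 3 != 0:
        raise ValueError("expected a whole number of 3-month seasons")
    return [sum(monthly[i:i + 3]) for i in range(0, len(monthly), 3)]


# Hypothetical one-year record (mm per month), not data from the study.
one_year = [30.0, 20.0, 10.0, 5.0, 0.0, 0.0, 0.0, 2.0, 8.0, 25.0, 40.0, 35.0]
seasonal = monthly_to_seasonal(one_year)
print(seasonal)  # four seasonal totals for the year

# A 68-year monthly record (1951-2018) gives 68 * 4 = 272 seasonal values.
print(len(monthly_to_seasonal([1.0] * (68 * 12))))
```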

Table 1 Stations’ climate and coordinates with the statistics of seasonal precipitation data

The input for predictive models constituted time-lagged seasonal precipitation, which was selected using the autocorrelation function (ACF). Then, the data were divided into two parts: 75% (including first 51 years) for the training of models and 25% (last 17 years) for testing.
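A minimal sketch of the chronological split described above, assuming four seasonal values per year so that 51 training years correspond to 204 seasons and 17 test years to 68 seasons. The helper `chronological_split` and the placeholder series are illustrative, not the authors' code.

```python
SEASONS_PER_YEAR = 4


def chronological_split(series, train_years, seasons_per_year=SEASONS_PER_YEAR):
    """Split a seasonal series into an early training part and a late test part."""
    cut = train_years * seasons_per_year
    return series[:cut], series[cut:]


# Placeholder for 68 years (1951-2018) of seasonal totals.
series = list(range(68 * SEASONS_PER_YEAR))
train, test = chronological_split(series, train_years=51)

print(len(train), len(test))     # 204 and 68 seasons
print(len(train) / len(series))  # 0.75
```

The split is not shuffled: the test period always follows the training period, as is usual for time series prediction.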

Time series models

A time series model is a model used to describe and predict data recorded over time. An observed time series is considered to be one realization of a stochastic process. The simplest model proposed for simulating a time series consists of a process in which events take place at different times and at constant intervals, and each event is independent of the other values (Salas et al. 1988). Time series models are based on calibrated regression coefficients multiplied by the time lags of the original series. In these models, the inputs are the time lags of the original series, and the coefficients are optimized by the least squares (LS) algorithm.

The basic types of these models are Autoregressive (AR) and Moving Average (MA) and the rest of the models originate from these two models, such as Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA). If there is a seasonal trend in the data, seasonal variants of time series models, such as PARMA (Periodic ARMA) and SARIMA (Seasonal ARIMA), can be used (Salas et al. 1988). This study used the seasonal SARIMA model. This model, which is mainly used to simulate the stochastic behavior in seasonal time series, is a linear parametric stochastic model which is denoted by SARIMA(p,d,q)×(P,D,Q)_ω, where ω denotes the periodicity of the series; p, d, q, show the non-seasonal degrees of AR, Integrated & MA and P, D, and Q show the seasonal degrees of AR, Integrated & MA. A general relationship can be expressed as below:

$$ {\varPhi}_P\left({B}^{\omega}\right)=\left(1-{\varPhi}_1{B}^{\omega \times 1}-\dots -{\varPhi}_P{B}^{\omega \times P}\right) $$

(1)

$$ {\phi}_p(B)=\left(1-{\phi}_1{B}^1-\dots -{\phi}_p{B}^p\right) $$

(2)

$$ {\nabla}_{\omega}^D={\left(1-{B}^{\omega}\right)}^D $$

(3)

$$ {\nabla}^d={\left(1-B\right)}^d $$

(4)

$$ {\varTheta}_Q\left({B}^{\omega}\right)=\left(1-{\varTheta}_1{B}^{\omega \times 1}-\dots -{\varTheta}_Q{B}^{\omega \times Q}\right) $$

(5)

$$ {\theta}_q(B)=\left(1-{\theta}_1{B}^1-\dots -{\theta}_q{B}^q\right) $$

(6)

$$ {\varPhi}_P\left({B}^{\omega}\right){\phi}_p(B){\nabla}_{\omega}^D{\nabla}^d{X}_t={\theta}_q(B){\varTheta}_Q\left({B}^{\omega}\right){\varepsilon}_t $$

(7)

Here, X_t stands for the stochastic variable and ε_t is a normal random variable with mean μ and variance $ {\sigma}_{\varepsilon}^2 $. Moreover, B is the backward shift operator, and Φ, ϕ, $ {\nabla}_{\omega}^D $, ∇^d, Θ, and θ represent the seasonal autoregressive, non-seasonal autoregressive, seasonal differencing, non-seasonal differencing, seasonal moving average, and non-seasonal moving average operators, respectively (Salas et al. 1988).
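The differencing operators in Eqs. 3 and 4 can be illustrated with a short sketch. Since B^k X_t = X_{t−k}, applying (1 − B^ω) once replaces each value by its difference from the value ω steps earlier. The function `difference` below is a hypothetical helper written for illustration, and the sample series is invented.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a sequence.

    lag=1 gives the non-seasonal operator of Eq. 4;
    lag=omega gives the seasonal operator of Eq. 3.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# A perfectly periodic series with period 4 (hypothetical values).
periodic = [10, 20, 30, 40] * 3

# One seasonal difference with omega = 4 removes the periodic component.
print(difference(periodic, lag=4, order=1))

# Two non-seasonal differences of a quadratic sequence give a constant.
print(difference([1, 4, 9, 16], lag=1, order=2))
```

Each application of the operator shortens the series by `lag` values, which is one practical reason for keeping the differencing degrees small.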

Multilayer Perceptron

The concept of the perceptron was first introduced by McCulloch and Pitts (1943) as an artificial neuron. A Multilayer Perceptron (MLP) network provides a nonlinear relationship between input and output vectors, which is accomplished by connecting the neurons of one layer to those of the previous and next layers. The output of each neuron is multiplied by weight coefficients and given as input to a nonlinear activation function. In the training phase, the training data are given to the perceptron, and the network weights are adjusted until the error between the target and the model output is minimized or the number of training iterations reaches its preset limit. Then, different inputs (which were not present in the training phase) are used for model validation. The training of these neural networks can be stated as an optimization problem with a large number of variables (Rumellhart 1986). For further details, one can refer to Rumellhart (1986), Haykin (1999), Jahani and Mohammadi (2018), and Aghelpour and Varshavian (2020).
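The forward pass of such a network can be sketched as below. This is an illustration only: it assumes a single hidden layer with a hyperbolic tangent activation and a linear output neuron, whereas the study tuned the number of layers, neurons, and transfer function by trial and error. All weights shown are hypothetical, and training is not included.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with tanh hidden units."""
    # Each hidden neuron: weighted sum of the inputs, then the nonlinear activation.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output neuron: weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights for 2 inputs and 2 hidden neurons.
W = [[0.5, -0.2], [0.1, 0.3]]
b = [0.0, 0.0]
v = [1.0, -1.0]
c = 0.5

print(mlp_forward([0.0, 0.0], W, b, v, c))  # only the output bias remains
print(mlp_forward([1.0, 1.0], W, b, v, c))
```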

Adaptive Neuro-Fuzzy Inference System

Adaptive Neuro-Fuzzy Inference System (ANFIS) is a model that uses a neural network learning algorithm and fuzzy logic for designing nonlinear maps between input and output spaces. The model is capable of learning through the neural network, of defining and using the relationships between input and output variables by fuzzy rules, and subsequently of creating the input structure of the system. ANFIS can use a variety of methods for fuzzy clustering, two of the strongest of which are Subtractive Clustering (SC) and Fuzzy Cluster Means (FCM). In this study, these two methods are used for ANFIS clustering to provide ANFIS-SC and ANFIS-FCM. These clustering methods are described below.

Subtractive Clustering

The subtractive clustering method assumes that each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define the cluster center on the basis of the density of surrounding data points (Kisi et al. 2018). Consider a set of n data points {x₁, x₂, …, x_n} in m-dimensional space, and assume that all data points have been normalized to lie within a unit hypercube. The density index D_i corresponding to the data point x_i can then be expressed as:

$$ {D}_i=\sum \limits_{j=1}^n\exp \left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{{\left(\frac{r_a}{2}\right)}^2}\right) $$

(8)

Here, r_a is a positive quantity called the cluster radius. A data point with many neighboring data points has a high density. After the density of each data point has been measured, the data point with the highest density is selected as the first cluster center (Kisi et al. 2018; Hiremath et al. 2012). The influence of the region around the first cluster center is then removed, and the density of the other points is revised with the following formula:

$$ {D}_i={D}_i-{D}_{c1}\exp \left(-\frac{{\left\Vert {x}_i-{x}_{c1}\right\Vert}^2}{{\left(\frac{r_b}{2}\right)}^2}\right) $$

(9)

Here, x_c1 and D_c1 are the first selected cluster center and its density potential, respectively, and r_b is a positive constant. To avoid closely spaced cluster centers, the value of r_b is normally larger than r_a (r_b is taken as 1.5r_a). After the density of each data point has been revised, the next cluster center x_c2 is selected and the density of all data points is recalculated. This process continues until a sufficient number of cluster centers has been produced (Kisi et al. 2018; Kisi et al. 2014; Aqil et al. 2007).
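The procedure of Eqs. 8 and 9 can be sketched as follows. The sketch makes two assumptions that are not taken from the study: the squared Euclidean distance is used in both density formulas, and the loop stops after a fixed number of centers (`n_centers`) instead of using a density-based stopping rule. The function names and the sample points are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def subtractive_clustering(points, r_a, n_centers, squash=1.5):
    """Pick cluster centers by repeatedly taking the densest remaining point."""
    r_b = squash * r_a  # r_b = 1.5 r_a, as stated in the text
    alpha = 1.0 / (r_a / 2.0) ** 2
    beta = 1.0 / (r_b / 2.0) ** 2

    # Eq. 8: density index of every data point.
    density = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]

    centers = []
    for _ in range(n_centers):
        k = max(range(len(points)), key=lambda i: density[i])
        d_c = density[k]
        centers.append(points[k])
        # Eq. 9: subtract the influence of the newly chosen center.
        density = [
            d - d_c * math.exp(-beta * sq_dist(p, points[k]))
            for p, d in zip(points, density)
        ]
    return centers


# Hypothetical normalized points forming two groups in the unit square.
points = [
    (0.10, 0.10), (0.12, 0.11), (0.09, 0.12), (0.11, 0.09),
    (0.90, 0.90), (0.88, 0.91), (0.91, 0.89),
]
centers = subtractive_clustering(points, r_a=0.5, n_centers=2)
print(centers)
```

With these points the first center falls in the denser lower-left group and the second in the upper-right group, because the density of the first group is suppressed after its center has been chosen.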

Fuzzy Cluster Means

In fuzzy clustering, each pattern might belong to several clusters or segments. One of the most widely used clustering algorithms is the K-means algorithm. This unsupervised algorithm has some limitations and may not work properly on large datasets. To deal with this disadvantage, different clustering algorithms have been proposed. Among them, Fuzzy Cluster Means is used as a suitable alternative (Kisi et al. 2018; Kisi and Zounemat-Kermani 2016). Fuzzy Cluster Means was developed by Dunn (1973) and improved by Bezdek (2013).

The Fuzzy Cluster Means (FCM) method partitions a set of N vectors x_i, i = 1, …, N, into C fuzzy clusters, where each pattern belongs to a cluster with a degree specified by a membership grade w_ic between 0 and 1. The ultimate objective of the FCM algorithm is to find C cluster centers that minimize a cost function based on a dissimilarity measure. The objective function to be minimized is defined as:

$$ \min {J}_{FCM}=\sum \limits_{c=1}^C\sum \limits_{i=1}^N{w}_{ic}^p{\left\Vert {x}_i-{v}_c\right\Vert}^2\kern1.25em \mathrm{s.t.}\ \sum \limits_{c=1}^C{w}_{ic}=1,\kern0.5em i=1,2,\dots, N $$

(10)

where p (p > 1) is the fuzzifier exponent; N is the number of data points; C is the number of clusters; w_ic is the degree of membership of the i^th data point in the c^th cluster; v_c is the center of the c^th cluster; and x_i is the i^th input vector. The membership w_ic is calculated with the following formula, in which d_ic is the distance between x_i and v_c (Kisi et al. 2018; Bezdek et al. 1984):

$$ {w}_{ic}=\frac{1}{\sum \limits_{L=1}^{C}{\left(\frac{d_{ic}^2}{d_{iL}^2}\right)}^{\left(\frac{1}{p-1}\right)}}\kern1em \mathrm{for}\ i=1,2,\dots, N\kern0.5em \mathrm{and}\kern0.5em c=1,2,\dots, C $$

(11)

The cluster center vectors are calculated by:

$$ {v}_c=\frac{\sum \limits_{j=1}^N{w}_{jc}^p{x}_j}{\sum \limits_{j=1}^N{w}_{jc}^p} $$

(12)

The FCM procedure repeats these membership and center calculations until a convergence condition is achieved.
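A compact sketch of the FCM iteration is given below. It alternates the membership update (Eq. 11) and the center update (Eq. 12) until the centers stop moving. The starting centers are passed in explicitly to keep the example deterministic, which is an assumption of this sketch; implementations commonly start from random memberships instead. The names `fcm` and `sq_dist`, the tolerance, and the sample data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm(points, centers, p=2.0, tol=1e-9, max_iter=100):
    """Fuzzy clustering by alternating Eq. 11 and Eq. 12."""
    n_clusters = len(centers)
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Eq. 11: membership of every point in every cluster.
        memberships = []
        for x in points:
            # A small floor avoids division by zero when x sits on a center.
            d2 = [max(sq_dist(x, v), 1e-12) for v in centers]
            memberships.append([
                1.0 / sum((d2[c] / d2[l]) ** (1.0 / (p - 1.0))
                          for l in range(n_clusters))
                for c in range(n_clusters)
            ])
        # Eq. 12: centers as membership-weighted means of the points.
        new_centers = []
        for c in range(n_clusters):
            weights = [row[c] ** p for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[k] for w, x in zip(weights, points)) / total
                for k in range(dim)
            ))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # convergence condition: centers barely moved
            break
    return centers, memberships


# Hypothetical one-dimensional data with two obvious groups.
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centers, memberships = fcm(points, centers=[(0.0,), (6.0,)])
print(centers)
print([round(sum(row), 6) for row in memberships])  # each row sums to 1
```

The returned memberships are those computed in the last pass, that is, just before the final center update.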

Measuring prediction accuracy

In the current study, seven criteria were used to evaluate the prediction accuracy: root mean squared error (RMSE), normalized root mean squared error (NRMSE), mean absolute error (MAE), Wilmott Index (WI), coefficient of determination (R²), Akaike Information Criterion (AIC), and Schwarz Bayesian Information Criterion (BIC). They are defined as follows:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2} $$

(13)

$$ \mathrm{NRMSE}=\frac{\mathrm{RMSE}}{{\mathrm{Maximum}}_{y_i}-{\mathrm{Minimum}}_{y_i}} $$

(14)

$$ \mathrm{MAE}=\frac{1}{n}\sum \limits_{i=1}^n\left|{y}_i-{f}_i\right| $$

(15)

$$ WI=1-\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{\sum \limits_{i=1}^n{\left(\left|{f}_i-\overline{y}\right|+\left|{y}_i-\overline{y}\right|\right)}^2} $$

(16)

$$ {R}^2={\left[\frac{\sum_{i=1}^n\left({y}_i-\overline{y}\right)\left({f}_i-\overline{f}\right)}{\sqrt{\sum_{i=1}^n{\left({y}_i-\overline{y}\right)}^2}\ast \sqrt{\sum_{i=1}^n{\left({f}_i-\overline{f}\right)}^2}}\right]}^2 $$

(17)

$$ \mathrm{AIC}=n\ln \left(\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{n}\right)+2p $$

(18)

$$ \mathrm{BIC}=n\ln \left(\frac{\sum \limits_{i=1}^n{\left({y}_i-{f}_i\right)}^2}{n}\right)+p\ln (n) $$

(19)

Here, y_i is the observed precipitation; $ \overline{y} $ is the average value of observed precipitation; f_i is the predicted precipitation; $ \overline{f} $ is the average value of predicted precipitation; n is the number of data points; and p is the number of model parameters. Prediction results are better when the values of RMSE, NRMSE, and MAE are close to 0 and the values of WI and R² are close to 1. Also, smaller AIC and BIC values indicate better model performance.
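The seven criteria of Eqs. 13 to 19 can be computed together, as in the sketch below. The function `evaluate` is a hypothetical helper and the sample values are invented for illustration. The sketch assumes that the observations are not all equal and that the prediction is not perfect, since the range in Eq. 14 and the logarithm in Eqs. 18 and 19 would otherwise be undefined.

```python
import math


def evaluate(observed, predicted, n_params):
    """Return the evaluation criteria of Eqs. 13-19 as a dictionary."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    y_bar = sum(observed) / n
    f_bar = sum(predicted) / n
    sse = sum((y - f) ** 2 for y, f in pairs)

    rmse = math.sqrt(sse / n)                                  # Eq. 13
    nrmse = rmse / (max(observed) - min(observed))             # Eq. 14
    mae = sum(abs(y - f) for y, f in pairs) / n                # Eq. 15
    wi = 1.0 - sse / sum(
        (abs(f - y_bar) + abs(y - y_bar)) ** 2 for y, f in pairs
    )                                                          # Eq. 16
    cov = sum((y - y_bar) * (f - f_bar) for y, f in pairs)
    var_y = sum((y - y_bar) ** 2 for y in observed)
    var_f = sum((f - f_bar) ** 2 for f in predicted)
    r2 = cov ** 2 / (var_y * var_f)                            # Eq. 17
    aic = n * math.log(sse / n) + 2 * n_params                 # Eq. 18
    bic = n * math.log(sse / n) + n_params * math.log(n)       # Eq. 19

    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae, "WI": wi,
            "R2": r2, "AIC": aic, "BIC": bic}


# Hypothetical observed and predicted seasonal totals (mm).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = evaluate(observed, predicted, n_params=2)
for name, value in metrics.items():
    print(name, round(value, 4))
```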

The Minitab software was used to implement the time series model, and MATLAB was used to implement the MLP and ANFIS models. Graphs were produced with Excel and Minitab.

Results

Results of time series model

To find the appropriate input matrix, the Autocorrelation Function (ACF) was used. Four examples of ACF plots are shown in Fig. 2.

Because the ACF regularly crossed the significance lines (at lags of 2, 6, 10, 14, 18… in the negative direction and at lags of 4, 8, 12, 16, 20… in the positive direction), as shown in Fig. 2, it is clear that for all stations there was a return period of 4 lags (4 seasons). This means that there was a seasonal periodic pattern in the precipitation data, suggesting that the applicable time series model is the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The model pattern SARIMA(p,d,q)(P,D,Q)_ω thus became SARIMA(p,d,q)(P,D,Q)₄. The seasonal differencing degree of this pattern (D) was determined by differencing the data with a lag equal to the return period (lag = 4). As an example, for Anzali station, seasonal differencing was applied up to four times, as shown in Fig. 3.
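The way a period of four seasons shows up in the ACF can be reproduced with a small sketch. It uses the sample autocorrelation and the large-sample bounds ±1.96/√n as significance lines; these bounds are an assumption of this sketch and may differ from the limits drawn by Minitab. The series is a hypothetical, perfectly periodic one, not station data.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    return [
        (k, r) for k, r in enumerate(acf(series, max_lag), start=1)
        if abs(r) > bound
    ]


# Hypothetical seasonal totals repeating every 4 seasons for 10 years.
series = [100.0, 20.0, 5.0, 60.0] * 10
for lag, r in significant_lags(series, max_lag=8):
    print(lag, round(r, 3))
```

For this series only the even lags are significant, with negative values at lags 2 and 6 and positive values at lags 4 and 8, which is the same alternating pattern described for the stations.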

Fig. 3 shows that the number of lags crossing the significance lines increased with increasing seasonal differencing degree. For example, for D = 1 (ACF plot for 1 step of seasonal differencing), two lags fell outside the significance level, whereas for D = 2, D = 3, and D = 4 the numbers of crossing lags were 3, 4, and 5, respectively. Thus, the fewest crossings belonged to the differencing degree of 1, and the model pattern became SARIMA(p,d,q)(P,1,Q)₄. To determine the other degrees of the seasonal and non-seasonal autoregressive and moving average terms (p, q, P, and Q), a trial-and-error approach was used. Degrees from 0 to 5 were examined for each station, and the best one(s) were identified by the evaluation criteria (Table 2).

Table 2 Assessment of SARIMA model (The bold rows show the best prediction performance of each station)

For some of the stations (Anzali, Kermanshah, Shiraz, and Bushehr), the best SARIMA model was clear, so only the single best model is shown in Table 2 for these stations. For the other stations (including Babolsar, Shahroud, and Isfahan), more than one SARIMA model gave comparable results. In the test period for Babolsar, models SARIMA(1,0,5)(0,1,2)₄ and SARIMA(2,0,5)(1,1,1)₄ were the best, and their criteria differed only in the 2nd and 3rd decimal places; according to the principle of parsimony (Salas et al. 1988), SARIMA(1,0,5)(0,1,2)₄ should be chosen because it has fewer parameters. By the same principle, the chosen models for Shahroud, Isfahan, and Zahedan were SARIMA(0,0,5)(0,1,0)₄, SARIMA(0,0,4)(0,1,2)₄, and SARIMA(0,0,4)(0,1,4)₄, respectively. The lowest SARIMA prediction error in terms of RMSE belonged to Zahedan station, which is located in an extra-arid moderate region, with RMSE = 17.726 mm per season, NRMSE = 0.205, WI = 0.743, AIC = 399.007, and BIC = 407.885. The highest belonged to Anzali station in the per-humid moderate area, with RMSE = 156.394 mm per season, NRMSE = 0.140, WI = 0.911, AIC = 695.705, and BIC = 704.597.

Results of ML models

For the selection of input for the ML models (MLP, ANFIS-SC, and ANFIS-FCM), ACF was used. To make a logical comparison between the time series and ML models, inputs should be similar. From the ACF plots (Fig. 2), the even time lags of seasonal precipitation had significant autocorrelations, so the time lags 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 were used as input for the ML models. The models were implemented and the evaluation results are shown in Table 3.
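Building the input matrix from the even lags listed above can be sketched as follows. Each row holds the precipitation 2, 4, …, 20 seasons before the target season. The helper `build_lagged_dataset` and the placeholder series are illustrative and are not the authors' implementation.

```python
LAGS = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]


def build_lagged_dataset(series, lags):
    """Return (inputs, targets) where each input row holds the lagged values."""
    max_lag = max(lags)
    inputs = [
        [series[t - k] for k in lags] for t in range(max_lag, len(series))
    ]
    targets = [series[t] for t in range(max_lag, len(series))]
    return inputs, targets


# Placeholder series: the season index stands in for the precipitation value.
series = list(range(272))
inputs, targets = build_lagged_dataset(series, LAGS)

print(len(inputs), len(inputs[0]))  # 252 rows with 10 inputs each
print(targets[0], inputs[0])        # first usable target and its lagged inputs
```

Because the largest lag is 20, the first 20 seasons of the record cannot be used as targets.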

Table 3 Assessment of machine learning models (The bold rows show the best prediction performance of each station)

For the ML prediction, the parameters of the models were calibrated. For the MLP model, the parameters were the number of hidden layers, the number of neurons in each hidden layer, and the type of transfer function; for ANFIS-SC, the only parameter was the cluster radius; and for ANFIS-FCM, the only parameter was the number of clusters. All of these were optimized by trial and error. Results showed that the ML models had similar accuracy in predicting seasonal precipitation. The lowest error was obtained by the ANFIS-FCM model with 2 clusters for Zahedan station, with RMSE = 20.045 mm per season, NRMSE = 0.232, WI = 0.676, AIC = 415.722, and BIC = 424.600. The highest prediction error of the ML models was obtained by ANFIS-SC with a cluster radius of 0.55 for Anzali station, with RMSE = 172.456 mm per season, NRMSE = 0.155, WI = 0.899, AIC = 708.419, and BIC = 717.297.

Comparison between models

Models were compared by their mean absolute error (MAE), computed separately for under- and over-estimations (Fig. 4). To draw Fig. 4, the observed precipitation data of all stations in the prediction period (test period) were compared with the corresponding predictions made by the models. The comparisons were separated by model, and the MAE was then calculated for underestimation and overestimation separately.
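One way to compute the MAE separately for under- and over-estimation is sketched below. It assumes that a prediction below the observation counts as an underestimation, a prediction above it as an overestimation, and that exact matches are counted in neither group; the study does not spell out these details. The function name and the sample values are hypothetical.

```python
def split_mae(observed, predicted):
    """Return (MAE of underestimations, MAE of overestimations)."""
    under = [y - f for y, f in zip(observed, predicted) if f < y]
    over = [f - y for y, f in zip(observed, predicted) if f > y]
    mae_under = sum(under) / len(under) if under else 0.0
    mae_over = sum(over) / len(over) if over else 0.0
    return mae_under, mae_over


# Hypothetical seasonal totals (mm).
observed = [100.0, 50.0, 80.0, 20.0]
predicted = [70.0, 60.0, 80.0, 30.0]
mae_under, mae_over = split_mae(observed, predicted)
print(mae_under, mae_over)
```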

At first glance, the error of model under-estimation was larger than that of over-estimation. This may be due to the nature of precipitation data, which contain irregularities and sudden jumps in their time series that the models usually cannot distinguish from ordinary precipitation occurrences. Among the four models, SARIMA had the lowest MAE in both under-estimation (MAE = 43.45 mm) and over-estimation (MAE = 41.89 mm), and its difference between under-estimation and over-estimation was also the smallest, so it can be regarded as the best of the models. There did not seem to be a significant difference between the MLs, but according to their MAE, ANFIS-FCM can be selected as the best of the MLs (MAE of under-estimation = 53.13 mm, MAE of over-estimation = 45.51 mm), ANFIS-SC as the second (MAE of under-estimation = 53.46 mm, MAE of over-estimation = 46.67 mm), and MLP as the third (MAE of under-estimation = 54.74 mm, MAE of over-estimation = 47.01 mm), with minor differences.

Comparison between drought classes

After the best predictor model (SARIMA) had been determined, its accuracy was evaluated in different years from the perspective of meteorological drought classes. For this, the Standardized Precipitation Index (SPI) was used to determine drought classes at the annual scale, and the years were then separated into 3 classes: drought, normal, and wet years. The observed and predicted data of the test period for each station were investigated using scatter plots (Fig. 5), and the R² values were calculated for the different classes separately.

At first glance, the best correlations for all classes were found for Anzali station and the weakest for Isfahan station (judging by the closeness of their fitted regression lines to the 1:1 line and by the R² values). For all of the stations, the fitted regression lines of the “wet” (blue dots) and “normal” (green dots) classes were located below the 1:1 line, which showed that the model under-predicted in these two classes. At stations Anzali, Babolsar, Shiraz, and Bushehr, the fitted regression lines of drought years (red dots) were completely above the 1:1 line, and at stations Kermanshah, Shahroud, Isfahan, and Zahedan, part of each drought regression line was above and the rest below the 1:1 line. So, it can generally be said that the model over-predicted precipitation in drought years. From the R² values, the model’s predictions were more correlated with observations in wet and normal years than in drought years.

Comparison under different climatic classes

To compare under different climates, the bar charts (Fig. 6) were constructed to show the prediction errors for all stations.

In Fig. 6, the stations are sorted by dryness from left (the most humid) to right (the driest), and the values of three criteria (RMSE, NRMSE, and WI) were calculated for the test period. In the RMSE bar chart, the prediction error was the highest for Anzali station (about 155 mm per season), which is located in a per-humid moderate climate region (refer to Table 1). With decreasing humidity of the climate, the model error also decreased, so that the least error occurred in the extra-arid moderate climate region of Zahedan station, with about 18 mm per season. In the NRMSE bar chart, however, the trend was completely the opposite: the least normalized error occurred for the humid climate stations and the highest for the extra-arid climate stations. For example, the NRMSE values of Anzali and Babolsar were about 0.14 and 0.11, respectively, while the highest values occurred for Isfahan and Zahedan stations (about 0.20). To resolve this contradiction, it was necessary to refer to a third criterion, WI. The bar chart of WI showed results similar to those of NRMSE. The best prediction belonged to the per-humid moderate climate region of Anzali station with WI ≈ 0.91. The value of WI decreased for the dry climate stations of Isfahan and Zahedan, to approximately 0.74. It can be said that RMSE was not a good criterion for comparing modeling results across different climates, because it showed results opposite to those of the other criteria. This can also be confirmed by the R² values in different climates, as shown in Fig. 5. In addition, two samples are shown in Fig. 7 as time series plots of the model predictions against the observed values for stations Anzali and Isfahan.
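The reversal between RMSE and NRMSE can be checked with the SARIMA test-period values reported in the text for Anzali and Zahedan. Since NRMSE is RMSE divided by the data range (Eq. 14), the range implied by each pair of values is RMSE/NRMSE. The ranges printed below are back-calculated in this way for illustration; they are not values quoted from Table 1.

```python
# (RMSE in mm per season, NRMSE) as reported in the text for the SARIMA model.
stations = {
    "Anzali": (156.394, 0.140),   # per-humid moderate climate
    "Zahedan": (17.726, 0.205),   # extra-arid moderate climate
}

# Eq. 14 rearranged: range = RMSE / NRMSE (back-calculated, approximate).
implied_range = {name: rmse / nrmse for name, (rmse, nrmse) in stations.items()}

best_by_rmse = min(stations, key=lambda s: stations[s][0])
best_by_nrmse = min(stations, key=lambda s: stations[s][1])

for name, value in implied_range.items():
    print(name, round(value, 1), "mm")
print("lowest RMSE:", best_by_rmse)
print("lowest NRMSE:", best_by_nrmse)
```

The implied range at Anzali is more than ten times that at Zahedan, so the larger absolute error at Anzali is small relative to the variability of its precipitation, and the two criteria rank the stations in opposite order.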

Discussion

The SARIMA model has not been used to predict seasonal cumulative precipitation so far, so the current study can only be compared with monthly predictions. Dabral and Murry (2017) implemented SARIMA for the monthly precipitation of Doimukh station in India. They compared SARIMA’s predictions with observed data by showing the average data of each month (both observed and predicted) in tabular form (Tables 2 and 3 of their paper). In terms of cumulative annual precipitation, Anzali station was the closest and most similar to Doimukh station, so Anzali station is discussed.

The monthly values in Tables 2 and 3 of Dabral and Murry (2017) were extracted. Their monthly averages were then converted to seasonal averages and compared with Anzali’s seasonal averages. The two criteria NRMSE and WI were calculated for the seasonal average precipitation of these two stations to compare SARIMA’s predictions in both the training and test periods (shown in Fig. 8). Results showed that SARIMA performed better for Anzali than for Doimukh in both the test and training periods. This difference may be due to the difference in climate and in the annual precipitation regime. In Iran’s climates (especially in the cities on the margin of the Caspian Sea, such as Anzali and Babolsar), most of the annual precipitation occurs in autumn and winter and the least in summer, while in Indian climates the peak of precipitation is in the monsoon season, which occurs in summer. It may also result from the return period used in SARIMA: 12 in Dabral and Murry’s study and 4 in the current study. This suggests that the current method for SARIMA can yield a better seasonal precipitation prediction, as the NRMSE value of Anzali station (0.032) was about 37% better than Doimukh’s (0.044) in the test period. Also, the NRMSE value for Anzali in the training period was 0.023, which was less than half of Doimukh’s NRMSE = 0.054, but this may be related to the difference between their statistical periods (the data of the current research cover 68 years, whereas those of the mentioned research cover 26 years). SARIMA has also been reported as a good predictor of monthly precipitation in some other studies in Bangladesh (Mahmud et al. 2016), India (Bari et al. 2015), and Nigeria (Eni and Adeyeye 2015), which is in line with the current study. Predicting seasonal precipitation in Australia using machine learning methods gave similar results (Mekanik et al. 2013) and even weaker results (Hossain et al. 2018) in comparison with the current study (according to the available values of R and R²), even though those studies had the advantage of using climatic indexes as predictor inputs. However, an ML model with these inputs cannot be logically compared with the time series model, because the latter uses only lags of the same precipitation data as input.

The difference in error between stations with similar climates can also be related to physical and synoptic factors. For example, the Anzali and Babolsar stations are both located in the humid climatic class but have different prediction results (see Fig. 6 and the NRMSE criterion). The Siberian high pressure brings greater atmospheric stability to the eastern part of the Caspian Sea's southern coast (Babolsar), while the western coast of the Caspian Sea (Anzali) is more affected than the eastern coast by western systems, such as those of the Black Sea and the Mediterranean Sea. Atmospheric instability can cause irregularities in a time series, which reduces the autocorrelation of the series and consequently the prediction accuracy. This can lower the accuracy in an area such as Anzali compared to Babolsar, even though the two have similar climates. Differences in prediction accuracy are also observed for the semi-arid climatic stations (Kermanshah and Shiraz). The systems affecting the Kermanshah region include the Saudi Arabian low pressure, the Sudan low pressure, and Mediterranean fronts, which have severe impacts on atmospheric instability and consequently on the precipitation of Kermanshah, while southwestern Iran (Shiraz station) is affected only, and more weakly, by the two low-pressure systems of Saudi Arabia and Sudan. In addition, Shiraz's alignment with the subtropical high-pressure belt may be another reason for greater atmospheric stability in this area (latitude 29.53°) than in Kermanshah (latitude 34.35°). The same holds for the difference in prediction accuracy between the two arid climate stations, Bushehr (latitude 29°) and Shahroud (latitude 36.42°), where predictions were more accurate for Bushehr (Fig. 6 and NRMSE criterion). The difference between the two stations in the extra-arid climate is much smaller than in the other regions. Trade winds and monsoon systems sometimes affect the Sistan and Baluchestan area (Zahedan station) and cause atmospheric instability there, which may be a reason for the poorer prediction accuracy of Zahedan compared to Isfahan.
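
The link between irregularity and autocorrelation described above can be illustrated with a minimal Python sketch. It is not part of the study's method: the seasonal pattern, the noise levels, and the function names `lag_autocorrelation` and `synthetic_seasonal_series` are assumptions chosen for illustration, and the series are synthetic rather than station data. Adding stronger random disturbances to the same seasonal cycle lowers the autocorrelation at the seasonal lag (lag 4 for seasonal totals), which leaves a lag-based model with less usable information.

```python
import random


def lag_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def synthetic_seasonal_series(noise_std, years=68, seed=0):
    """Repeat a fixed four-season pattern and add Gaussian noise.

    The pattern values (mm) are hypothetical. Negative totals are not
    clipped, because only the correlation structure matters here.
    """
    rng = random.Random(seed)
    pattern = [300.0, 100.0, 50.0, 400.0]
    return [p + rng.gauss(0.0, noise_std) for _ in range(years) for p in pattern]


# Assumed noise levels: small for a stable regime, large for an unstable one.
regular = synthetic_seasonal_series(noise_std=20.0)
irregular = synthetic_seasonal_series(noise_std=200.0)

acf_regular = lag_autocorrelation(regular, lag=4)
acf_irregular = lag_autocorrelation(irregular, lag=4)
print(f"lag-4 autocorrelation, regular series:   {acf_regular:.2f}")
print(f"lag-4 autocorrelation, irregular series: {acf_irregular:.2f}")
```

In this sketch the irregular series has the lower lag-4 autocorrelation, which mirrors the argument that instability weakens the dependence on past seasons.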

Conclusion

The SARIMA linear model predicted seasonal precipitation in the climates of Iran better than the ML models did. In these areas, the linear relation among the time lags of seasonal precipitation was stronger than the nonlinear relations, so SARIMA is recommended for Iran. Among the ML models, ANFIS was the best (especially with the FCM clustering method); it has the fewest parameters to optimize, whereas MLP has more parameters in its network structure. All of the models predicted better in wet and normal years than in drought years. According to the NRMSE values, which were in the range of 0.1 to 0.2, SARIMA's performance was not excellent but fell in the good and medium classes, so it has potential for the prediction of seasonal precipitation in other areas. Among the regions studied, the per-humid and humid climate regions, such as Anzali and Babolsar, can have more accurate predictions than the arid and extra-arid climate regions, such as Bushehr, Shahroud, Isfahan, and Zahedan. A significant result is that RMSE is a good criterion for comparing several models at one station, but not for comparing different climate regions. RMSE does not consider the variation range of the data, and this range (especially for precipitation data) differs greatly among climates, so it is better to use the normalized form of RMSE, NRMSE. It is suggested to use climatic indexes as predictor inputs for the ML models, and to optimize the ML models for precipitation prediction in Iran using complex optimization algorithms to check their efficiency.
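
The point about RMSE and NRMSE can be made concrete with a short Python sketch. The numbers below are invented for illustration and are not results from this study. Normalizing RMSE by the mean of the observations is also an assumption here; the definition used in the paper is given in its methods section and may differ. The sketch shows that a dry station can have the smaller RMSE simply because its seasonal totals are small, while its error relative to a typical seasonal total is larger.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observations (assumed normalization)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical seasonal totals in mm for a humid and an arid station.
humid_obs = [400.0, 300.0, 150.0, 500.0]
humid_pred = [360.0, 330.0, 135.0, 450.0]
arid_obs = [40.0, 30.0, 15.0, 50.0]
arid_pred = [28.0, 39.0, 10.5, 35.0]

for name, obs, pred in [("humid", humid_obs, humid_pred), ("arid", arid_obs, arid_pred)]:
    print(f"{name}: RMSE = {rmse(obs, pred):.1f} mm, NRMSE = {nrmse(obs, pred):.2f}")
```

With these hypothetical values, RMSE ranks the arid station as better predicted, while NRMSE ranks the humid station as better predicted, which is the reversal that makes RMSE unsuitable for comparing climates.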

References

Abbasi A, Khalili K, Behmanesh J, Shirzad A (2019) Drought monitoring and prediction using SPEI index and gene expression programming model in the west of Urmia Lake. Theor Appl Climatol 138(1-2):553–567. https://doi.org/10.1007/s00704-019-02825-9
Abdul-Aziz AR, Anokye M, Kwame A, Munyakazi L, Nsowah-Nuamah NNN (2013) Modeling and forecasting rainfall pattern in Ghana as a seasonal ARIMA process: The case of Ashanti region. Int J Humanit Soc Sci 3(3):224–233
Aghelpour P, Varshavian V (2020) Evaluation of stochastic and artificial intelligence models in modeling and predicting of river daily flow time series. Stoch Env Res Risk A 34:33–50. https://doi.org/10.1007/s00477-019-01761-4
Aghelpour P, Mohammadi B, Biazar SM (2019) Long-term monthly average temperature forecasting in some climate types of Iran, using the models SARIMA, SVR, and SVR-FA. Theor Appl Climatol 138(3-4):1471–1480. https://doi.org/10.1007/s00704-019-02905-w
Aghelpour P, Bahrami-Pichaghchi H, Kisi O (2020a) Comparison of three different bio-inspired algorithms to improve ability of neuro fuzzy approach in prediction of agricultural drought, based on three different indexes. Comput Electron Agric 170:105279. https://doi.org/10.1016/j.compag.2020.105279
Aghelpour P, Guan Y, Bahrami-Pichaghchi H, Mohammadi B, Kisi O, Zhang D (2020b) Using the MODIS sensor for snow cover modeling and the assessment of drought effects on snow cover in a mountainous area. Remote Sens 12(20):3437. https://doi.org/10.3390/rs12203437
Aghelpour P, Mohammadi B, Biazar SM, Kisi O, Sourmirinezhad Z (2020c) A theoretical approach for forecasting different types of drought simultaneously, using entropy theory and machine-learning methods. ISPRS Int J Geo Inf 9(12):701. https://doi.org/10.3390/ijgi9120701
Aqil M, Kita I, Yano A, Nishiyama S (2007) A comparative study of artificial neural networks and neuro-fuzzy in continuous modeling of the daily and hourly behaviour of runoff. J Hydrol 337(1-2):22–34. https://doi.org/10.1016/j.jhydrol.2007.01.013
Bari SH, Rahman MT, Hussain MM, Ray S (2015) Forecasting monthly precipitation in Sylhet city using ARIMA model. Civil Environ Res 7(1):69–77
Bezdek JC (2013) Pattern recognition with fuzzy objective function algorithms. Springer Science & Business Media, Berlin. https://doi.org/10.1007/978-1-4757-0450-1
Bezdek JC, Ehrlich R, Full W (1984) FCM: The fuzzy C-means clustering algorithm. Comput Geosci 10(2–3):191–203. https://doi.org/10.1016/0098-3004(84)90020-7
Dabral PP, Murry MZ (2017) Modelling and forecasting of rainfall time series using SARIMA. Environ Proc 4(2):399–419. https://doi.org/10.1007/s40710-017-0226-y
Deo RC, Ghorbani MA, Samadianfard S, Maraseni T, Bilgili M, Biazar M (2018) Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renew Energy 116:309–323. https://doi.org/10.1016/j.renene.2017.09.078
Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046
Dwivedi DK, Kelaiya JH, Sharma GR (2019) Forecasting monthly rainfall using autoregressive integrated moving average model (ARIMA) and artificial neural network (ANN) model: A case study of Junagadh, Gujarat, India. J Appl Nat Sci 11(1):35–41. https://doi.org/10.31018/jans.v11i1.1951
Eni D, Adeyeye FJ (2015) Seasonal ARIMA modeling and forecasting of rainfall in Warri Town, Nigeria. J Geosci Environ Protec 3(06):91–98. https://doi.org/10.4236/gep.2015.36015
Ghamariadyan M, Imteaz MA, Mekanik F (2019) A hybrid wavelet neural network (HWNN) for forecasting rainfall using temperature and climate indices. In: IOP Conference Series: Earth and Environmental Science, vol 351. IOP Publishing, p 012003. https://doi.org/10.1088/1755-1315/351/1/012003
Halabi LM, Mekhilef S, Hossain M (2018) Performance evaluation of hybrid adaptive neuro-fuzzy inference system models for predicting monthly global solar radiation. Appl Energy 213:247–261. https://doi.org/10.1016/j.apenergy.2018.01.035
Haykin S (1999) Neural networks: a comprehensive foundation. MacMillan, New York
Hiremath SM, Patra SK, Mishra AK (2012) ANFIS with subtractive clustering-based extended data rate prediction for cognitive radio.
Hossain I, Rasel HM, Imteaz MA, Mekanik F (2018) Long-term seasonal rainfall forecasting: efficiency of linear modelling technique. Environ Earth Sci 77(7):1–10. https://doi.org/10.1007/s12665-018-7444-0
Hossain I, Rasel HM, Imteaz MA, Mekanik F (2020) Long-term seasonal rainfall forecasting using linear and non-linear modelling approaches: a case study for Western Australia. Meteorog Atmos Phys 132(1):131–141. https://doi.org/10.1007/s00703-019-00679-4
Jahani B, Mohammadi B (2018) A comparison between the application of empirical and ANN methods for estimation of daily global solar radiation in Iran. Theor Appl Climatol 137(1-2):1257–1269. https://doi.org/10.1007/s00704-018-2666-3
Khosravi A, Nunes RO, Assad MEH, Machado L (2018) Comparison of artificial intelligence methods in estimation of daily global solar radiation. J Clean Prod 194:342–358. https://doi.org/10.1016/j.jclepro.2018.05.147
Kisi O, Zounemat-Kermani M (2016) Suspended sediment modeling using neuro-fuzzy embedded fuzzy c-means clustering technique. Water Resour Manag 30(11):3979–3994. https://doi.org/10.1007/s11269-016-1405-8
Kisi O, Karimi S, Shiri J, Makarynskyy O, Yoon H (2014) Forecasting sea water levels at Mukho Station, South Korea using soft computing techniques. Int J Ocean Climate Syst 5(4):175–188. https://doi.org/10.1260/1759-3131.5.4.175
Kisi O, Shiri J, Karimi S, Adnan RM (2018) Three different adaptive neuro fuzzy computing techniques for forecasting long-period daily streamflows. In: Big data in engineering applications. Springer, Singapore, pp 303–321. https://doi.org/10.1007/978-981-10-8476-8_15
Kisi O, Gorgij AD, Zounemat-Kermani M, Mahdavi-Meymand A, Kim S (2019) Drought forecasting using novel heuristic methods in a semi-arid environment. J Hydrol 578:124053. https://doi.org/10.1016/j.jhydrol.2019.124053
Lee J, Kim CG, Lee JE, Kim NW, Kim H (2018) Application of artificial neural networks to rainfall forecasting in the Geum River basin, Korea. Water 10(10):1448. https://doi.org/10.3390/w10101448
Maca P, Pech P (2016) Forecasting SPEI and SPI drought indices using the integrated artificial neural networks. Comput Intellig Neurosci 2016:2016–2017. https://doi.org/10.1155/2016/3868519
Mahmud I, Bari SH, Rahman M (2016) Monthly rainfall forecast of Bangladesh using autoregressive integrated moving average method. Environ Eng Res 22(2):162–168. https://doi.org/10.4491/eer.2016.075
Malik A, Kumar A, Singh RP (2019) Application of heuristic approaches for prediction of hydrological drought using multi-scalar Streamflow drought index. Water Resour Manag 33(11):3985–4006. https://doi.org/10.1007/s11269-019-02350-4
Maroufpoor S, Sanikhani H, Kisi O, Deo RC, Yaseen ZM (2019) Long-term modelling of wind speeds using six different heuristic artificial intelligence approaches. Int J Climatol 39(8):3543–3557. https://doi.org/10.1002/joc.6037
McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115–133. https://doi.org/10.1007/BF02478259
Mekanik F, Imteaz MA, Gato-Trinidad S, Elmahdi A (2013) Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes. J Hydrol 503:11–21. https://doi.org/10.1016/j.jhydrol.2013.08.035
Mekanik F, Imteaz MA, Talei A (2016) Seasonal rainfall forecasting by adaptive network-based fuzzy inference system (ANFIS) using large scale climate signals. Clim Dyn 46(9-10):3097–3111. https://doi.org/10.1007/s00382-015-2755-2
Moazenzadeh R, Mohammadi B (2019) Assessment of bio-inspired metaheuristic optimisation algorithms for estimating soil temperature. Geoderma 353:152–171. https://doi.org/10.1016/j.geoderma.2019.06.028
Moazenzadeh R, Mohammadi B, Shamshirband S, Chau KW (2018) Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng Applic Comput Fluid Mechan 12(1):584–597. https://doi.org/10.1080/19942060.2018.1482476
Mohammadi B, Guan Y, Aghelpour P, Emamgholizadeh S, Pillco Zolá R, Zhang D (2020) Simulation of Titicaca Lake water level fluctuations using hybrid machine learning technique integrated with Grey Wolf Optimizer Algorithm. Water 12(11):3015. https://doi.org/10.3390/w12113015
Nanda SK, Tripathy DP, Nayak SK, Mohapatra S (2013) Prediction of rainfall in India using Artificial Neural Network (ANN) models. Int J Intellig Syst Appl 5(12):1–22. https://doi.org/10.5815/ijisa.2013.12.01
Nyatuame M, Agodzo SK (2018) Stochastic ARIMA model for annual rainfall and maximum temperature forecasting over Tordzie watershed in Ghana. Journal of Water Land Develop 37(1):127–140. https://doi.org/10.2478/jwld-2018-0032
Pandey PK, Tripura H, Pandey V (2019) Improving prediction accuracy of rainfall time series By Hybrid SARIMA–GARCH modeling. Nat Resour Res 28(3):1125–1138. https://doi.org/10.1007/s11053-018-9442-z
Parsaie A, Haghiabi AH, Moradinejad A (2019) Prediction of scour depth below river pipeline using support vector machine. KSCE J Civ Eng 23(6):2503–2513. https://doi.org/10.1007/s12205-019-1327-0
Poul AK, Shourian M, Ebrahimi H (2019) A comparative study of MLR, KNN, ANN and ANFIS models with wavelet transform in monthly stream flow prediction. Water Resour Manag 33(8):2907–2923. https://doi.org/10.1007/s11269-019-02273-0
Rahimi J, Ebrahimpour M, Khalili A (2013) Spatial changes of extended De Martonne climatic zones affected by climate change in Iran. Theor Appl Climatol 112(3-4):409–418. https://doi.org/10.1007/s00704-012-0741-8
Rumelhart DE (1986) Learning internal representations by error propagation. Parallel Distribut Proc 1:318–362
Salas JD, Delleur JW, Yevjevich V, Lane WL (1988) Applied modeling of hydrological time series. Water Resources Publication, Chicago, USA
Tran Anh D, Duc Dang T, Pham Van S (2019) Improved rainfall prediction using combined pre-processing methods and feed-forward neural networks. J Multidiscip Sci J 2(1):65–83. https://doi.org/10.3390/j2010006
Valipour M, Banihabib ME, Behbahani SMR (2013) Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir. J Hydrol 476:433–441. https://doi.org/10.1016/j.jhydrol.2012.11.017
Wang S, Feng J, Liu G (2013) Application of seasonal time series model in the precipitation forecast. Math Comput Model 58(3-4):677–683. https://doi.org/10.1016/j.mcm.2011.10.034
Wang HR, Wang C, Lin X, Kang J (2014) An improved ARIMA model for precipitation simulations. Nonlinear Process Geophys 21(6):1159–1168. https://doi.org/10.5194/npg-21-1159-2014
Yan Q, Ma C (2016) Application of integrated ARIMA and RBF network for groundwater level forecasting. Environ Earth Sci 75(5):396. https://doi.org/10.1007/s12665-015-5198-5

Acknowledgements

This study was supported by the Bu-Ali Sina University Deputy of Research and Technology (Grant no. 99-227). The authors thank the reviewers for their valuable comments and the Iran Meteorological Organization (IRIMO) for providing the data used in this study.

Author information

Authors and Affiliations

Department of Water Science and Engineering, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran
Pouya Aghelpour & Vahid Varshavian
Department of Biological and Agricultural Engineering & Zachry Department of Civil & Environmental Engineering, Texas A&M University, 321 Scoates Hall, 2117 TAMU, College Station, TX, 77843-2117, USA
Vijay P. Singh

Corresponding author

Correspondence to Vahid Varshavian.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Responsible Editor: Broder J. Merkel

About this article

Cite this article

Aghelpour, P., Singh, V.P. & Varshavian, V. Time series prediction of seasonal precipitation in Iran, using data-driven models: a comparison under different climatic conditions. Arab J Geosci 14, 551 (2021). https://doi.org/10.1007/s12517-021-06910-0

Received: 26 August 2020
Accepted: 05 March 2021
Published: 19 March 2021
DOI: https://doi.org/10.1007/s12517-021-06910-0

Introduction