Introduction

Since arid regions of the world suffer from inadequate water availability, planning for optimal use of current and future water resources is essential. Prediction of future precipitation is therefore one of the key requirements of water resource management in such regions, with several advantages and applications. For example, a flood warning system for fast-responding catchments may require a quantitative rainfall forecast to increase the lead time for warning. Similarly, a rainfall forecast system provides advance information for drought management. Providing an accurate quantitative rainfall forecast is thus an important challenge in many real-world applications such as catchment management (Luk et al. 2001), flood prediction (Yu et al. 2006; Kisi and Cimen 2011), and agriculture (Wei et al. 2005).

Seasonal variations in rainfall pattern may modify the hydrological cycle and environmental processes, in addition to the vegetation and the entire ecosystem (Lazaro et al. 2001; Ni and Zhang 2000). Although the scale of these variations differs among regions, rainfall generally shows greater spatial and temporal variation in arid zones than in more humid regions (Asadi Zarch et al. 2015). Consequently, long sequences with very little rain are common in such regions. Long dry periods may change the vegetation and the structure of the soil, especially at its surface, with considerable effects on infiltration and runoff production. Extreme variability of rainfall in time and space therefore not only complicates its prediction in arid and hyper-arid areas (Batisani and Yarnal 2010; Srivastava et al. 2010) but also leads to severe degradation of the ecosystem.

On the one hand, although numerous numerical prediction models have been developed, they are still unable to produce quantitative precipitation forecasts at suitable spatial and temporal resolutions (Dierer et al. 2009). Global climate model (GCM) projections, for example, require downscaling before they can be used in case studies. On the other hand, standard time series models such as the autoregressive moving average (ARMA) are also extensively used for hydrological time series forecasting. These models are fundamentally linear and assume that the data are stationary; they are therefore unable to capture the non-stationarity and non-linearity of hydrologic data.

Artificial intelligence approaches, with several advantages over numerical and classic time series models, have been developed to overcome this problem. Artificial neural networks (ANNs), a branch of artificial intelligence developed in the 1950s, emulate the biological neural system by distributing the computation process to small and simple processing units, i.e., neurons (nodes). ANNs are generally presented as systems of interconnected “neurons” that send messages to each other. The connections have numeric weights that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning. Although meaningful physical interpretations are not usually provided for the weights obtained from neural networks, many studies have been successfully carried out in different fields, e.g., Zhang and Govindaraju (2000), Birikundavyi et al. (2002), Jain and Indurthy (2003), Jain and Srinivasulu (2004), Daliakopoulos et al. (2005), Mohammadi et al. (2005), and Moosavi et al. (2013a, b).

Neural networks, which impose no restrictive assumptions on the data, are useful models for series with high variation and cyclicity. Since the difficulty of precipitation prediction in arid climates is mostly due to its high interannual variability, ANNs are therefore likely to provide accurate rainfall predictions over arid regions. The three-layer feed-forward neural network, the most common ANN model, is employed in the present study.

The simplest and most common types of neural networks are based on multilayer perceptrons (MLPs), which create static models where the input–output map depends only on the present input. Several studies applied multilayer perceptrons to a set of predictive variables carefully chosen to be related to rainfall, together with data from precipitation gauges (pluviometers), to obtain rainfall quantity (Hung et al. 2009). Hung et al. (2009) used a simple persistent model and a feed-forward neural network model for hourly rainfall data and concluded that a feed-forward neural network with a tangent hyperbolic transfer function produces the best rainfall forecasts. Moustris et al. (2011) compared ANNs with classical statistical models for forecasting precipitation and showed that neural networks can forecast precipitation satisfactorily. Adya and Collopy (1998) showed that neural networks produce more accurate predictions when they are effectively implemented and validated, confirming that such models outperformed linear regression, stepwise polynomial regression, multiple regression, discriminant analysis, logic models, and rule-based systems. McCullagh et al. (1995) used neural networks to estimate 6-h rainfall over the southeast coast of Tasmania and showed that ANN models can produce acceptable results.

To put it simply, rain falls when vapor in the air is saturated. However, rainfall is the end product of a number of complex atmospheric processes that differ considerably both spatially and temporally. Therefore, several climatic parameters (exogenous variables, or externals) play important roles in the rainfall process. Even if the rainfall processes could be described concisely and completely, the volume of calculations involved in this type of precipitation simulation and prediction may be prohibitive. Also, the data available to help define control variables for the process models, such as rainfall intensity, are usually limited in both the spatial and temporal dimensions (Luk et al. 2001). The accuracy of ANN outputs depends on the ability of the network to simulate the rainfall process; better training of the network therefore yields more precise predictions of precipitation. Accordingly, the aim of this research is to improve the networks’ training process. To this end, exogenous variables of precipitation, together with dedicated model structures, are supplied to the ANNs.

This paper is therefore motivated by the hypothesis that in arid and hyper-arid regions, under extreme spatial and temporal variation of precipitation occurrences, ANN outputs can be enhanced by providing more relevant input data. Selecting input variables is a crucial consideration in recognizing the best functional form of statistical models. This task is common to the development of all statistical models and largely depends on discovering relationships within the available data to identify suitable predictors of the model output.

This article discusses such an enhancement of the input data in one of the main strategic hyper-arid regions of Iran, Yazd province. To enhance the precision of ANN-predicted precipitation, different sets of inputs are tested. In the first test, only the precipitation time series is used as input. Then, several different exogenous inputs, such as temperature, relative humidity, etc., are imported step by step into the models to select the best set of input data.

Generally, the objective of this research is to predict precipitation in arid regions using feed-forward artificial neural networks with different input sets (with and without exogenous variables), lag times, hidden layer sizes, and training algorithms for different running sums. The rest of the paper is organized as follows: a description of the study area, implementation of the method, presentation and interpretation of the results, and conclusions.

Materials and methods

Study area

Iran is situated in the mid-latitude belt of arid and semi-arid areas of the Earth; arid and semi-arid regions cover more than 60 % of the country. In these regions, rains are highly variable in time, space, amount, and duration, and water is the most important restrictive factor for biological and agricultural activities. The study is performed in Yazd province, Iran. Yazd province is located beside the central mountains, adjacent to the kavir. The climate varies from cold steppic to semi-desert. Average annual precipitation of the study area ranges from 250 mm in Shirkouh Mountain to 80 mm at the margin of Kavir-e-Abarkouh. The minimum temperature is recorded in December (8 °C), while the highest temperature reaches +45 °C in June (Fattahi 1998). The city of Yazd, one of the more ancient cities of Iran, is the capital of Yazd province (Fig. 1). Yazd is located in a desert environment with an annual precipitation of 60–70 mm, while potential evapotranspiration is around 1750 mm (Asadi Zarch et al. 2011). Based on the UNESCO aridity index (Asadi Zarch et al. 2015), the climate of Yazd is classified as arid.

Fig. 1
figure 1

The study area of the research

In this study, monthly precipitation amounts for Yazd city are predicted using neural network models. Monthly time series data are obtained from the website of Iran’s meteorological organization (www.weather.ir) for the period 1952 to 2010. The total data set (59 years) is divided into three subsets: training, validation, and test sets. Training data (65 % of the total data) are presented to the network during training, and the network is adjusted according to its error. Validation data (15 %) are used to measure network generalization and to halt training when generalization stops improving. Testing data (20 %) have no effect on training and so provide an independent measure of network performance during and after training (MathWorks 2015).
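The chronological 65/15/20 split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code (the original study relies on MATLAB's toolbox to divide the data), and the function name is an assumption:

```python
def split_series(data, train_frac=0.65, val_frac=0.15):
    """Chronological train/validation/test split; the remainder (20 %) is the test set."""
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# 59 years of monthly values (59 * 12 = 708 data points, as in the study period)
months = list(range(708))
train, val, test = split_series(months)
```

Because the split is chronological rather than random, the test set covers the most recent years, which matches how such a model would be used operationally.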

Model implementation

Artificial neural networks (ANNs) have been successfully used as a tool for time series prediction and modeling in a variety of application domains. In particular, when the time series is noisy and the underlying dynamical system is non-linear and not easily approached through analytical means, ANN models frequently outperform standard techniques (Huon and Poo 2013). A neural network model is composed of many artificial nodes that are linked together; the objective is to transform the inputs into meaningful outputs. In this study, the most common ANN model, i.e., the three-layer feed-forward neural network trained with the backpropagation method (BPANN), is used for precipitation prediction. In this model, input values are first imported to the nodes in the input and hidden layers, then processed and passed to the next layer. The numbers of nodes in the input and output layers are set equal to the numbers of input and output parameters (Mar and Naing 2008). The feed-forward neural network architecture and the corresponding learning algorithm can be viewed as a generalization of the popular least mean square (LMS) algorithm (Haykin 1999).

Background of the model

All the selected parameters affect precipitation amount, though not in an entirely direct way. For example, when relative humidity (the amount of water vapor in the air as a percentage of its capacity at saturation) approaches its maximum, clouds form and precipitation may follow. Since warm air holds more moisture than cold air, relative humidity changes with air temperature. At an air temperature of 30 °C, for example, an air parcel may be saturated, holding no more molecules of water vapor, so its relative humidity is 100 %. If the parcel is warmed to 35 °C, it can hold additional water molecules before reaching saturation; the air temperature thus changes the relative humidity. Consequently, a cool, dry air mass may actually have a higher relative humidity than a warm, moist air mass, and relative humidity alone can be misleading when comparing atmospheric moisture conditions. Importing both temperature and humidity into the ANN network may therefore result in better performance.
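This temperature dependence can be illustrated numerically. The sketch below uses the Magnus approximation for saturation vapor pressure, which is an illustrative assumption and not part of the paper's methodology:

```python
import math

def saturation_vapor_pressure(temp_c):
    """Magnus approximation for saturation vapor pressure over water, in hPa
    (an illustrative formula; the paper does not specify one)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))

def relative_humidity(vapor_pressure_hpa, temp_c):
    """Relative humidity (%) given actual vapor pressure and air temperature."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure(temp_c)

# A parcel saturated at 30 degC holds this much vapor...
e = saturation_vapor_pressure(30.0)
# ...so at 30 degC its relative humidity is 100 %,
# but warming the same parcel to 35 degC drops RH to roughly 75 %.
rh_at_35 = relative_humidity(e, 35.0)
```

With the moisture content held fixed, warming the parcel lowers its relative humidity, which is the point the paragraph makes about cool, dry air masses.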

Moreover, on one hand, intensifying a variable like wind speed can accelerate evaporation and increase relative humidity; on the other hand, stronger wind may carry away the water vapor in the air and decrease the humidity. The chosen factors thus affect the amount of precipitation in different ways. It is therefore hypothesized that importing exogenous variables leads to better ANN performance, and that adding more of these external variables can help the model simulate the precipitation process better and predict precipitation more precisely. These assumptions are tested for verification. To this end, two ANN models are employed: one based on precipitation alone, and the other based on precipitation and its exogenous variables.

NAR model

Various non-linear dynamic models have been proposed in the literature, and the non-linear autoregressive (NAR) model is one of them. In this model, the future values of a time series y(t) are predicted only from the past values of that series; therefore, only one series is involved in training the network. The model can be written as follows:

$$ y(t)=f\left(y\left(t-1\right),\dots ,y\left(t-d\right)\right) $$
(1)

where d is the assigned delay (lag) time. In this approach, the following equation is used as the main function. After configuring the input parameters of the neural network models, the next step is to train them. In the present case, the models are trained using the following data format:

$$ {P}_t=f\left({P}_{t-1},\dots ,{P}_{t-n}\right) $$
(2)

where P is the precipitation, n is the number of inputs determined by the lag time (in months), and t is the time index. In this approach, the neural network is used as a time series model analogous to ARIMA: both inputs and outputs are precipitation time series. In other words, precipitation data before time t can be applied as inputs for the output precipitation at time t.
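Equation 2 amounts to a sliding-window regression. A minimal sketch of how the lagged input–target pairs could be assembled before training (illustrative only; the function name is an assumption):

```python
def lagged_design(series, n_lags):
    """Build (inputs, targets) pairs for Eq. (2): P_t = f(P_{t-1}, ..., P_{t-n}).

    Each input row holds the n_lags values preceding time t; the target is P_t.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # P_{t-n}, ..., P_{t-1}
        y.append(series[t])             # P_t
    return X, y
```

The resulting X and y matrices can then be fed to any regression model, here a feed-forward network.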

NARX model

The non-linear autoregressive network with exogenous inputs (NARX) is an important class of discrete-time non-linear systems (Huon and Poo 2013). It is a powerful class of models that has been demonstrated to be well suited for modeling non-linear systems and especially time series (Diaconescu 2008). The model predicts the current value of a time series based on its relation to past values of the series and to current and past values of the exogenous series (Safavieh et al. 2007). The defining equation for the NARX model is

$$ y(t)=f\left(y\left(t-1\right),y\left(t-2\right),\dots, y\left(t-{d}_y\right),u\left(t-1\right),u\left(t-2\right),\dots, u\left(t-{d}_u\right)\right) $$
(3)

where y is the output, u is the input, and d_u and d_y are the delays of the input and output, respectively. In this approach, more than one climatic parameter is used for precipitation prediction, and the network is trained using various monthly climate data: precipitation, average minimum temperature, average maximum temperature, mean temperature, relative humidity, average wind speed, number of days with storm, number of snowy days, and number of cloudy days. Table 1 lists the applied climatic data and their symbols, and Fig. 2 illustrates the NARX model used in this research. In the neural network models used here, the current-time precipitation (P t) is considered the output, and V t−1, V t−2, etc. (where V stands for variables such as precipitation, temperature, wind speed, etc.) are considered the inputs. Therefore, when the lag time increases, the number of inputs also increases.
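The NARX input construction can be sketched as follows; this illustrative reconstruction (names and structure are assumptions, not the authors' code) shows why a 2-month lag with 10 variables yields 20 inputs per output, as stated later in the text:

```python
def narx_inputs(variables, n_lags):
    """Build (inputs, targets) pairs for the NARX model of Eq. (3).

    variables: dict of equally long monthly series, precipitation "P" plus
    exogenous ones. Each input row holds every variable at lags 1..n_lags,
    so with k variables and n lags there are k * n inputs per output.
    """
    names = list(variables)
    length = len(variables["P"])
    X, y = [], []
    for t in range(n_lags, length):
        row = []
        for lag in range(1, n_lags + 1):
            row.extend(variables[name][t - lag] for name in names)
        X.append(row)
        y.append(variables["P"][t])  # current-time precipitation as target
    return X, y
```

With ten variables and a 12-month lag, each row would contain 120 inputs, matching the counts given in the "ANN architecture" subsection.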

Table 1 The climatic parameters used in the study and their symbols
Fig. 2
figure 2

NARX network used for this study with delay of 1 to 4 months

As mentioned before, precipitation in dry lands such as Yazd city has considerable variance and fluctuations, which reduce the accuracy of the predicted precipitation. Other relevant derived quantities are the moving average and the running sum of precipitation, and either could be used. Since the amount of precipitation is very low in this region (as in other arid zones), the running sum is selected as the output to present results more clearly.

Running sums of 3, 6, 9, 12, and 18 months were selected as outputs. Since the running sum treats the different monthly precipitation values uniformly, it was not normalized. Different lag times were used as inputs in the second approach: for the first lag time, each of the abovementioned variables at time t − 1 was considered as an input. Equation 4 shows the inputs for a 2-month lag time as an example.

$$ {P}_{\mathrm{SUM}_{M,t}}=f\left(\begin{array}{l}{P}_{t-1},T{\min}_{t-1},T{\max}_{t-1},T\mathrm{mean}_{t-1},R{H}_{t-1},V{P}_{t-1},W{S}_{t-1},D\mathrm{storm}_{t-1},D\mathrm{snow}_{t-1},C{D}_{t-1},\\ {}{P}_{t-2},T{\min}_{t-2},T{\max}_{t-2},T\mathrm{mean}_{t-2},R{H}_{t-2},V{P}_{t-2},W{S}_{t-2},D\mathrm{storm}_{t-2},D\mathrm{snow}_{t-2},C{D}_{t-2}\end{array}\right) $$
(4)
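The running-sum target on the left-hand side of Eq. 4 can be computed directly from the monthly series. A simple sketch (the function name is an assumption; window lengths follow the study: 3, 6, 9, 12, and 18 months):

```python
def running_sum(series, window):
    """M-month running sum of a monthly precipitation series.

    Returns one value per month once a full window is available, so the
    output is window - 1 elements shorter than the input.
    """
    return [sum(series[i - window + 1:i + 1]) for i in range(window - 1, len(series))]
```

The network is then trained to map lagged climatic inputs onto this smoothed target rather than onto the raw, highly irregular monthly amounts.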

ANN architecture

Choosing the right network architecture is an important task in ANN-based studies. A neural network generally consists of highly interconnected layers of neuron-like nodes: the input and output layers, with the hidden layers placed between them. The numbers of nodes in the input and output layers correspond to the input and output variables of the process, respectively (Dhussa et al. 2014). In this study, for each of the 3-, 6-, 9-, 12-, and 18-month running sums, the networks are trained with up to 12 months of lag time. Accordingly, 20 input variables are used for a 2-month lag time, 30 for a 3-month lag time, and 120 for a 12-month lag time, for each output.

The neurons in the hidden layer allow neural networks to detect patterns and to perform complex non-linear mapping between the input and output variables, and their number plays a very important role in many successful applications of neural networks. The number of hidden layers and the number of nodes in each of them are decided by the user and can vary from one to a finite number. It has been shown that a single hidden layer is enough for ANNs to approximate any complex non-linear function with any desired accuracy (Hornik et al. 1989). For the popular one-hidden-layer networks, several practical guidelines exist, including using “2n + 1” (Hecht-Nielsen 1990), “2n” (Wong 1991), and “n” (Tang and Fishwick 1993) hidden neurons for better forecasting accuracy, where n is the number of input nodes. As confirmed by Mishra and Desai (2006), Rahimikhoob (2014), Moosavi et al. (2013a, b), and Shirmohammadi et al. (2013), the optimum number of neurons in the hidden layer cannot always be determined by a specific formula and should be investigated by trial and error. For this study, hidden layer sizes of up to 40 neurons are tested to obtain the best ANN architecture. Figure 3 shows the applied feed-forward network with a hidden layer size of 3 and a 1-month lag time.
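The cited rules of thumb, and the exhaustive 1-to-40 sweep used in this study, can be sketched as follows (function names are illustrative assumptions):

```python
def heuristic_hidden_sizes(n_inputs):
    """Literature rules of thumb cited in the text: 2n+1 (Hecht-Nielsen 1990),
    2n (Wong 1991), and n (Tang and Fishwick 1993) hidden neurons."""
    return {"2n+1": 2 * n_inputs + 1, "2n": 2 * n_inputs, "n": n_inputs}

def trial_and_error_sizes(max_neurons=40):
    """The study instead sweeps every hidden size from 1 to max_neurons
    and keeps the best-performing one."""
    return list(range(1, max_neurons + 1))
```

For ten input variables the heuristics suggest 21, 20, or 10 neurons, but as the text notes, the best sizes found here (31 to 40) lie well outside those guidelines, which motivates the exhaustive search.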

Fig. 3
figure 3

The feed-forward network used for the study with lag time of 1 and hidden layer size of 3

Training methods

The models are trained using the backpropagation approach, with the aim of creating a network that gives optimal results. There are many variations of the backpropagation algorithm. For example, gradient descent with momentum and adaptive learning rate backpropagation (GDX) has widely been considered an effective backpropagation learning algorithm, and other backpropagation learning algorithms have been surveyed for different ANN architectures. In this study, to ensure that the best selected architecture is trained by the most efficient training method, nine popular training algorithms are applied to the data and the most capable method is ascertained. The applied training algorithms are (1) Levenberg-Marquardt (LM), (2) BFGS quasi-Newton (BFG), (3) resilient backpropagation (RP), (4) scaled conjugate gradient (SCG), (5) conjugate gradient with Powell/Beale restarts (CGB), (6) Fletcher-Powell conjugate gradient (CGF), (7) Polak-Ribiére conjugate gradient (CGP), (8) one-step secant (OSS), and (9) variable learning rate gradient descent (GDX).

Performance comparison of models

Correlation coefficient (R) and root mean squared error (RMSE) were used to compare the performance of models and to select the best one (Sreekanth et al. 2009).

$$ R=\left(\frac{{\displaystyle \sum_{i=1}^n\left({o}_i-\overline{o}\right)\left({e}_i-\overline{e}\right)}}{\sqrt{{\displaystyle \sum_{i=1}^n{\left({o}_i-\overline{o}\right)}^2}}\sqrt{{\displaystyle \sum_{i=1}^n{\left({e}_i-\overline{e}\right)}^2}}}\right) $$
(5)
$$ \mathrm{RMSE}=\sqrt{\frac{{\displaystyle \sum_{i=1}^n{\left({o}_i-{e}_i\right)}^2}}{n}} $$
(6)

where o, e, and n are the observed precipitation, the estimated precipitation, and the number of data points, respectively.
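Equations 5 and 6 translate directly into code. A self-contained sketch (function names are assumptions):

```python
import math

def correlation(obs, est):
    """Correlation coefficient R of Eq. (5) between observed and estimated values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    num = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    den = math.sqrt(sum((o - mean_o) ** 2 for o in obs)) * \
          math.sqrt(sum((e - mean_e) ** 2 for e in est))
    return num / den

def rmse(obs, est):
    """Root mean squared error of Eq. (6)."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))
```

Note that R is scale-invariant (a model that doubles every value still scores R = 1), so the two criteria are complementary: RMSE penalizes exactly the amplitude errors that R ignores.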

Results and discussion

The performance without and with importing the externals

As mentioned earlier, ten climatic variables (precipitation and nine other parameters as exogenous factors of precipitation) are used in this study. To compare the performance of the two models, NAR and NARX, precipitation is predicted by applying both of them. As explained before, NAR requires only precipitation data; its performance is described in the third row of Table 2. Given the nine exogenous variables, and to present the NARX results more clearly, the model is run nine times with nine different groups of data as input.

Table 2 Performance of the networks using different input sets for different time series

As presented in Table 2, in the first run, precipitation (as the target input) and minimum temperature (as an exogenous variable of the target input) are imported into the ANN network as inputs. In the second run, in addition to precipitation and minimum temperature, maximum temperature is also supplied to NARX as a second exogenous variable. Finally, in the ninth run, all nine externals are inserted into the model. As the table shows, based on both performance indexes (R 2 and RMSE), all nine NARX configurations perform better than NAR in all five time series. It should be noted that lower RMSE values represent better performance. Among the NARX models, performance rises clearly and consistently as the number of imported exogenous variables increases. Table 2 also shows that network performance improves as the time scale lengthens. Therefore, the networks with ten inputs and the 18-month time scale are the most accurate.

The most effective exogenous variable

As explained earlier, although all the selected variables affect the amount of precipitation in a region, some undoubtedly exert stronger control over the rainfall process than others. Knowing the more effective exogenous parameters has many benefits: it helps meteorologists better understand the rain-formation process and thus model rainfall more efficiently and accurately. Therefore, the efficiency of each of the nine considered exogenous parameters for rainfall forecasting in arid and hyper-arid areas is also determined in this research. The external whose inclusion yields the higher performance, as recognized by the performance criteria, has the stronger effect on precipitation patterns. To this end, the mean correlation for each exogenous variable in each of the five time series is computed and presented in Table 3.

Table 3 Mean R 2 values for different externals for the five time series

The table shows that in the 3-month time series, the temperature parameters (minimum, maximum, and mean) are equally the most effective exogenous variables for increasing the accuracy of precipitation prediction. For the 6-month time scale, in addition to the temperature parameters, importing wind speed also markedly improves rainfall forecasts. In the 9- and 12-month time series, employing wind speed as an exogenous parameter results in the highest correlation between observed and simulated values. Finally, in the 18-month series, the values predicted using maximum and mean temperatures are (equally) most highly correlated with the corresponding real precipitation amounts.

Figure 4 depicts the real observations and their corresponding ANN-simulated values for the 12-month time series in the ten input-data situations, following the order shown in Table 3. As the plots show, considering wind speed as an external results in a relatively good match between precipitation predictions and real values; however, this model is unable to simulate the peaks exactly. As the figure shows, taking some other variables (e.g., vapor pressure) into account as exogenous inputs can lead to better prediction of the peaks. This is important for studies dealing with extremes such as floods, droughts, and dam operation.

Fig. 4
figure 4

Efficiency of each external variable in 12-month precipitation forecasting (ordered based on the first column of Table 2). Blue and red lines represent real observations and ANN predictions, respectively

The architecture of the applied networks

As mentioned earlier, the 3-, 6-, 9-, 12-, and 18-month time series of precipitation and its nine external parameters are supplied to networks with lag times varying from 1 to 12 months and hidden layer sizes from 1 to 40, trained with nine popular algorithms. The performance of each network is calculated. The details of the best-performing network for each of the ten groups of inputs in each of the five time series are presented in Table 4. Note that the variables are imported into the networks in the order presented in Table 3. The italic rows show the best performance among the ten groups of data in each time series.

Table 4 Performance of the best networks and their architecture for different groups of exogenous variables in the five time series

As can be clearly seen from the table, the network with all ten input variables shows the highest performance in every time series. Note that in the 3-month time series, the correlations of the networks with nine and ten inputs are the same, at 0.89. In all the time series, performance generally increases from one input toward ten inputs, despite some fluctuations. The table shows that 1 month is the best lag time for the 6-, 9-, 12-, and 18-month series; for the 3-month series, both 1 and 12 months are the most efficient lags. This may reflect the high temporal variability of rainfall in arid and hyper-arid lands (Asadi Zarch et al. 2015): the high irregularity of precipitation amounts in these regions may make shorter lag times more efficient.

Regarding hidden layer size, all the best networks have a size between 31 and 40; larger hidden layers thus clearly result in better network performance. As mentioned before, nine training algorithms are used in this research. Table 4 shows that method 9 (GDX) performs far better than the other algorithms: in the 3-, 6-, 9-, and 12-month time series, the best networks are trained by GDX. GDX uses backpropagation to calculate the derivatives of the performance function with respect to the weight and bias variables of the network, and each variable is adjusted according to gradient descent with momentum. Only in the 18-month series does the CGF method show higher performance. It can therefore be concluded that the GDX training algorithm is a capable method for training networks to predict precipitation in arid and hyper-arid climates.

Figure 5 compares the predicted values and real observations for the best networks of the five time series (shown in Table 4). As the figure shows, the agreement between simulated and real values clearly improves from the 3-month toward the 18-month time series; the performance criteria (presented in Table 4) show the same trend. The figure also shows that in the 6- and 18-month time series, the networks are able to simulate the high and low points accurately. Therefore, if a short time scale prediction is desired, the 6-month time series can be employed; if a longer horizon is preferred, the 18-month series would be the best choice.

Fig. 5
figure 5

Comparison of real values of rainfall (blue lines) and predicted values (red lines) of the best network in different time series (presented in Table 4)

To better understand how hidden layer size affects the network, Fig. 6 illustrates how network performance responds to changes in the number of hidden neurons. To this end, all the performance values for each size from 1 to 40 (across lag times of 1 to 12 and the nine training algorithms, with all ten variables entered into the networks) are averaged and presented. The figure shows that in all the time series, performance rises significantly as the hidden layer size increases; a larger hidden layer thus yields higher performance. Increasing the hidden layer size beyond 40 might result in even higher accuracy; however, the computation time also increases as the number of neurons in the hidden layer rises.

Fig. 6
figure 6

Performance of different sizes of hidden layer in different time series, when all the externals are considered in the network

The performance of different lag times in the different time series for all ten input-data situations (presented in Table 2) is illustrated in Fig. 7. Note that the input delay and feedback delay are set to the same value. As mentioned previously, a 1-month lag time shows the highest accuracy in all the time series. Despite some fluctuations in the 3- and 6-month series, from 1 to 12 lags a generally decreasing and then increasing trend can be seen in almost all the time series. Although the performance of the 1-month lag is relatively high in all situations, the performance of the 12-month lag depends on the group of data considered as input: as the number of exogenous variables increases to ten, the performance of the 12-month lag grows remarkably.

Fig. 7
figure 7

Performance of 1 to 12 months of lag times in 3, 6, 9, 12, and 18 months time series

Conclusions

Given the high variability of rainfall in arid and hyper-arid regions, ANNs were assumed to be a capable tool for simulating and forecasting precipitation. In this paper, traditional feed-forward multilayer perceptron (MLP) networks were used. The aim of this research was to improve the accuracy of ANN precipitation simulations in arid climates by importing exogenous variables into the networks. Nine climatic parameters were employed as the factors most potentially affecting the rainfall process. Then, NARX and NAR (the models with and without externals, respectively) were employed. To obtain the best network architecture, lag times of 1 to 12 months, hidden layer sizes of 1 to 40, and nine training algorithms were also applied. The results showed that the NARX model performs far better than NAR, and NARX models with more externals presented higher performance. Shorter lag times, larger hidden layer sizes, and the GDX training algorithm also produced more accurate simulations in almost all the time series.

As shown before, the best prediction is produced by including all the exogenous parameters in the ANN network. Nevertheless, determining the most effective exogenous variable for rainfall is of significant importance, especially for better understanding rainfall patterns in arid and hyper-arid zones. To reveal the most effective exogenous variable in rainfall prediction, precipitation was predicted by separately adding each exogenous parameter (i.e., P and T min, P and T max, etc.) to the network for each time series of 3, 6, 9, 12, and 18 months; then, the correlation between the predicted values and real observations was estimated. The results showed that different externals are most effective at different time scales.