Introduction

Rainfall is a principal component of hydrological studies, yet it is a complex process that varies in both time and space, which makes spatio-temporal rainfall prediction important. Moreover, the behavior of rainfall time series can be analyzed at different scales of time and space. Many approaches have been used to predict meteorological parameters; for instance, Sharifi and Souri (2015) used a hybrid model to predict time series of precipitable water vapor derived from GPS measurements, and Haktanir et al. (2013) analyzed the maximum daily rainfall series recorded at two stations across the Mediterranean Sea using stochastic approaches. In this study, monthly rainfall was predicted in a catchment with an area of 44,491 km².

Several studies have proposed different approaches for rainfall prediction. The models currently applied to spatio-temporal data can be divided into two main categories: parametric and nonparametric methods. Autoregressive (AR) models (Yevjevich 1972), autoregressive moving-average (ARMA) models (Carlson et al. 1970), and space-time autoregressive integrated moving average (STARIMA) models (Pfeifer and Deutsch 1980) are examples of parametric methods. Artificial neural network (ANN) models (Moustris et al. 2011) and self-organizing maps (SOM) (Hsu and Li 2010) are examples of nonparametric methods in the field of spatio-temporal precipitation analysis. In this study, ANN models were used for spatial and temporal monthly rainfall prediction.

The wavelet is a popular tool in signal processing and was introduced by Grossmann and Morlet (1984). The wavelet transform has advantages over the Fourier transform because it can analyze signals at multiple time scales and offers good localization in both the time and scale domains (Xu et al. 2005). It is therefore well suited to analyzing irregularly distributed events and nonstationary time series (Guimaraes Santos et al. 2003). Guimaraes Santos et al. (2003) applied wavelet analysis to long monthly precipitation series from different regions of the world to characterize the distinct time-frequency precipitation variability in each area. Xu et al. (2005) used wavelet analysis for monthly and annual rainfall data of 1955–2000 in the Hebei Plain. Furthermore, Veitch (2005) studied wavelet neural networks and their usefulness for dynamical-systems applications such as prediction of chaotic time series and nonlinear noise reduction. Partal and Cigizoglu (2009) predicted daily rainfall in Turkey with a wavelet–neural network method that combines the discrete wavelet transform (DWT) and artificial neural networks (ANN). Ramana et al. (2013) applied wavelet and ANN models to monthly rainfall data of the Darjeeling rain gauge station for temporal analysis. In the present study, a model combining wavelet analysis and ANN has been applied to each synoptic station in the Qara-Qum catchment, located in the northeast of Iran; the results of the model therefore represent the temporal characteristics of the rainfall series at each synoptic station.

Several factors are associated with rainfall. The presence of moisture is closely related to rainfall, and dew point measurements can be used to characterize atmospheric moisture. Evaporation, driven by temperature, adds moisture to the air, and moisture can also vary as wind mixes air masses (Sivaramakrishnan and Meganathan 2012). Therefore, the ancillary data considered for the temporal analysis of rainfall are dew point, temperature, and wind speed.

Many interpolation techniques can be used to generate continuous surfaces from meteorological data. Among them, the ANN has received increasing interest because it imposes no limiting assumptions such as stationarity and can handle nonlinear relationships in spatial data (Openshaw and Openshaw 1997). Kajornrit et al. (2012) proposed modular artificial neural networks to estimate missing monthly precipitation data in the northeast region of Thailand. Paraskevas et al. (2014) used an ANN model to estimate the spatial distribution of mean annual rainfall over Achaia County, Greece. Furthermore, Shirin Manesh et al. (2014) successfully applied ANN-based mapping in Iran. In this study, a trial-and-error procedure is used to obtain the optimum network architecture, i.e., the one giving the best generalization while avoiding over-fitting. The results of the model therefore represent the spatial characteristics of precipitation over the study area within a geographic information system (GIS).

In addition, the spatial distribution of precipitation is strongly related to the shape of the underlying topography of a region (Sharples and Hutchinson 2005), so including predictors such as elevation is desirable when interpolating rainfall data. Elevation is therefore used, in addition to the 2D point coordinates, to interpolate the precipitation data over the study area and obtain accurate rainfall surfaces.

Materials and methods

Study area and rainfall data

The Qara-Qum catchment is one of the 38 main catchments of Iran. It is located in Razavi Khorasan Province in northeastern Iran and covers an area of approximately 44,491 km², with latitudes ranging from 34.35° (34° 21′ 0″) to 37.716° (37° 42′ 57.6″) and longitudes from 58.2° (58° 12′ 0″) to 61.183° (61° 10′ 58.8″). The catchment has two main rivers, the Kashafrud and the Jamrud, and two main mountain ranges, the Hezar Masjed and Binalood Mountains. The metropolitan city of Mashhad, the second largest and most populous city in Iran, is located in this catchment.

Five synoptic weather stations are located in the Qara-Qum catchment, and exhaustive quality control of precipitation data sets is needed in climate studies (Haktanir and Citakoglu 2014). Because quality, continuity, and homogeneity of the records are considered more important than the number of stations, this study uses the four synoptic stations that have good-quality, continuous records for the period 1996–2010. The spatial distribution of these four stations throughout the Qara-Qum catchment is considered satisfactory. The data, monthly rainfall values, are from the National Centre for Climatology (NCC) of Iran. The first 10 years (66.6 % of the data) were used for calibration of the model, and the remaining 5 years (33.3 %) were used for validation. The locations of the stations and the 30-m digital elevation model (USGS DEM) of the study area are shown in Fig. 1.

Fig. 1
figure 1

a Map of the study area, b the spatial distribution of synoptic stations, and c the digital elevation model of the study area

Temporal analysis

Wavelet transform

The Fourier transform is probably the most widely used mathematical transformation. However, wavelet analysis is a more effective tool than the Fourier transform for studying nonstationary time series, because the wavelet transform can analyze series containing nonstationary power at many different frequencies (Torrence and Compo 1998). The wavelet transform is a powerful tool that provides a time–frequency representation of a signal (Drago and Boxall 2002). The basic idea of the wavelet transform is that a signal is compared with a function called the mother wavelet. Commonly used wavelet families include the Haar, Meyer, Daubechies, Mexican Hat, Coiflet, and Least Asymmetric wavelets (Rao and Bopardikar 1998).

The discrete wavelet transform algorithm produces fine-scale coefficients that capture high-frequency information and coarse-scale coefficients that capture low-frequency information (Bunnoon et al. 2012). The DWT operates on a discretely sampled function or time series X(·), usually defined over a finite time span (Veitch 2005). For a mother wavelet ψ and a given signal f(t), the DWT can be expressed as (Bunnoon et al. 2012):

$$ f(t)=\sum_{k} c_{j_0,k}\,\phi_{j_0,k}(t)+\sum_{j>j_0}\sum_{k}\omega_{j,k}\,2^{j/2}\,\psi\!\left(2^{j}t-k\right) $$

where j is the level index, k is a translation index, φ_{j0,k}(t) is the scaling function at the coarsest level, c_{j0,k} are the coarse-scale (approximation) coefficients, ω_{j,k} are the detail coefficients, and the functions ψ(2^j t − k) are orthonormal (Bunnoon et al. 2012).

Wavelet processing has two stages: decomposition and reconstruction. In decomposition, the original signal is passed through a series of high-pass filters to analyze the low-scale, high-frequency components and through a series of low-pass filters to analyze the high-scale, low-frequency components. Wavelet decomposition breaks the signal into many lower-resolution components arranged in a wavelet decomposition tree, from which valuable information can be extracted (Fig. 2). A suitable number of levels can be selected based on the nature of the signal. The decomposed components can be assembled back into the original signal without loss of information through the reconstruction process (Fig. 3); that is, the signal is reconstructed by combining its wavelet coefficients (Pandey et al. 2010). Decomposition computes the convolution between the rainfall values and the high-pass/low-pass filters, while reconstruction computes the convolution with the inverse filters (Bunnoon et al. 2012).

Fig. 2
figure 2

Wavelet decomposition

Fig. 3
figure 3

Wavelet reconstruction

The choice of the mother wavelet depends on the time series to be analyzed. The Daubechies family is one of the most commonly used mother wavelets in many fields, and Daubechies wavelets exhibit a good trade-off between parsimony and information richness (Benaouda et al. 2006). Therefore, the Daubechies family was examined for the mother wavelet, and the Daubechies 2 (db2) wavelet was used for the filter coefficients. The reconstructed details and approximations are true constituents of the original signal (Bunnoon et al. 2012):

$$ S=A1+D1=A2+D2+D1 $$

where S is the original signal; at level one, D1 is the high-frequency detail component and A1 is the reconstructed (smoothed) approximation. The coefficient vectors from which A1 and D1 are built are produced by down-sampling and are half the length of the original signal S. The low-frequency content of the signal (approximation, A) represents the signal identity, while the high-frequency component (detail, D) represents nuance (Ramana et al. 2013). The decomposed approximation (A) and details (D) were taken as inputs to the neural network.
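For readers who want to reproduce this decomposition, the following is a minimal sketch assuming the PyWavelets library (the original study does not state which software was used); the synthetic rainfall series is a placeholder for one station's monthly record. It verifies the relation S = A2 + D2 + D1 for the db2 wavelet at level 2.

```python
# Minimal sketch of the level-2 db2 decomposition, assuming PyWavelets.
import numpy as np
import pywt

rng = np.random.default_rng(0)
rainfall = rng.gamma(2.0, 15.0, size=180)  # placeholder for a 15-year monthly record

# wavedec returns [cA2, cD2, cD1] for level=2.
cA2, cD2, cD1 = pywt.wavedec(rainfall, "db2", level=2)

# Reconstruct each sub-series at the original length so that S = A2 + D2 + D1.
n = len(rainfall)
A2 = pywt.waverec([cA2, np.zeros_like(cD2), np.zeros_like(cD1)], "db2")[:n]
D2 = pywt.waverec([np.zeros_like(cA2), cD2, np.zeros_like(cD1)], "db2")[:n]
D1 = pywt.waverec([np.zeros_like(cA2), np.zeros_like(cD2), cD1], "db2")[:n]

assert np.allclose(rainfall, A2 + D2 + D1)

# A2, D2, and D1 (in this study, their antecedent monthly values) would then
# be fed to the neural network as inputs.
```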

Artificial neural networks

An artificial neural network is a parallel distributed network of connected processing units called neurons (Veitch 2005). The four basic elements of the neuronal model are synapses or connecting links (each characterized by its own weight or strength), an adder for summing the input signals, an activation function, and an external bias (Csáji 2001). One of the most popular neural network architectures is the feed-forward network trained with the backpropagation learning algorithm.

Feed-forward neural networks consist of layers of neurons in which the input layer is connected to the output layer through one or more hidden layers (Minu et al. 2010). By adding one or more hidden layers, the network is able to extract higher-order statistics (Partal and Cigizoglu 2009).

The backpropagation algorithm is the best-known training algorithm for multilayer neural networks. It propagates the network error back from the output units to the input units and adjusts the network weights along the way (Lucio et al. 2007). The output of each node can be expressed as:

$$ Y=f\left(\sum WX+b\right) $$

where X is the input or hidden node value; Y is the output value of the hidden or output node; f is a transfer function, which is the rule for mapping the neuron's summed input to its output; W are the weights connecting the input to hidden or hidden to output nodes; and b is the bias of each node (Ramana et al. 2013). The backpropagation algorithm modifies the network weights by minimizing the error between the target and computed outputs (Rezaeian-Zadeh et al. 2010). Although backpropagation can be slow to reach the error minimum and sometimes does not find the best solution, it usually yields an acceptable result and requires less memory than most learning algorithms (Lucio et al. 2007). Thus, the conventional feed-forward backpropagation (FFBP) algorithm was used for the temporal analysis.
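To make the notation above concrete, the following is a minimal NumPy sketch of a single forward pass through a one-hidden-layer network with a log-sigmoid hidden layer and a linear output (the configuration described later in this study); the layer sizes and random weights are placeholders, not trained values.

```python
import numpy as np

def logsig(z):
    """Logarithmic sigmoid transfer function (hidden layer)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Apply Y = f(sum(W X) + b) layer by layer: log-sigmoid hidden, linear output."""
    hidden = logsig(W1 @ x + b1)
    return W2 @ hidden + b2

# Placeholder sizes: e.g. 9 inputs (wavelet sub-series plus ancillary data),
# 5 hidden neurons, 1 output (monthly rainfall); weights are random, not trained.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 9)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)
y = forward(rng.normal(size=9), W1, b1, W2, b2)
```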

The Levenberg–Marquardt (LM) backpropagation algorithm was also used for ANN training. LM adds a damping term to the approximated Hessian matrix J^T J to ensure that it is invertible (Yu and Wilamowski 2011):

$$ H\approx J^{T}J+\mu I $$

where J is the Jacobian matrix of the performance criterion to be minimized; μ is the combination coefficient (learning rate), which is always positive; and I is the identity matrix (Yu and Wilamowski 2011).

By combining the update rule of the Gauss–Newton algorithm and the previously mentioned equation, the update rule of the LM algorithm can be expressed as:

$$ W_{k+1}=W_k-\left(J_k^{T}J_k+\mu I\right)^{-1}J_k^{T}e_k $$

where W_k is the vector of neural network weights at iteration k and e_k is the error vector (Yu and Wilamowski 2011).

The LM algorithm switches between the steepest descent algorithm and the Gauss–Newton algorithm during training. When the coefficient μ is very small (nearly zero), the Gauss–Newton algorithm is used; when μ is very large, the steepest descent method is used (Yu and Wilamowski 2011). LM is popular in the ANN domain and is even considered the first choice for an unseen multilayer perceptron training task (Ghaffari et al. 2006), although some studies identify the scaled conjugate gradient as the best training algorithm (Rezaeian-Zadeh et al. 2013; Yonaba et al. 2010). LM is also faster and less easily trapped in local minima than other optimization algorithms (Ramana et al. 2013).
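The update rule can be written compactly in code. The sketch below shows a single LM weight update, assuming the Jacobian J of the residuals e with respect to the weights is available; the random J, e, and w are placeholders for illustration only.

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt update: w - (J^T J + mu I)^(-1) J^T e."""
    H = J.T @ J + mu * np.eye(J.shape[1])   # damped approximation of the Hessian
    return w - np.linalg.solve(H, J.T @ e)

# Small mu -> close to Gauss-Newton; large mu -> close to (scaled) steepest descent.
rng = np.random.default_rng(0)
J, e, w = rng.normal(size=(30, 10)), rng.normal(size=30), rng.normal(size=10)
w_next = lm_step(w, J, e, mu=0.01)
```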

During the development of the network, the optimum number of hidden layers (and hidden nodes) must be selected. If the network is too small, it cannot solve the learning task, while if it is too large, over-fitting occurs (Paraskevas et al. 2014). The developed network initially starts with two hidden neurons, and the number of hidden neurons is then increased gradually; the procedure ends when no significant improvement in the ANN's performance (the mean square error at the output layer) is achieved. To check for over-fitting during training, we applied the leave-one-out cross-validation technique. Leave-one-out cross-validation is the degenerate case of K-fold cross-validation in which K equals the total number of examples. Here, one year's observations from the original sample are taken as the testing data, while the remaining observations are used as training data. The true error is estimated as the mean error (ME) on the test examples. A logarithmic sigmoid function and a pure linear function were used as the activation functions for the hidden and output layers, respectively. More details on ANN can be found in Karunanithi et al. (1994).
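A schematic of this trial-and-error search combined with leave-one-year-out cross-validation is sketched below; `train_ann`, the data arrays, and the stopping tolerance are hypothetical stand-ins for the actual training routine and settings, which are not detailed in the text.

```python
import numpy as np

def select_hidden_neurons(X, y, years, train_ann, max_hidden=20, tol=1e-4):
    """Grow the hidden layer until the cross-validated error stops improving.

    X, y      : calibration inputs and targets (one row per month)
    years     : year label of each row, for leave-one-year-out splits
    train_ann : hypothetical helper that trains a network with n_hidden
                neurons and returns a predict(X) callable
    """
    best_err, best_n = np.inf, None
    for n_hidden in range(2, max_hidden + 1):      # start from two hidden neurons
        fold_errors = []
        for test_year in np.unique(years):
            train, test = years != test_year, years == test_year
            model = train_ann(X[train], y[train], n_hidden=n_hidden)
            fold_errors.append(np.mean(model(X[test]) - y[test]))  # ME on the left-out year
        err = np.mean(np.abs(fold_errors))
        if best_err - err < tol:                   # no significant improvement: stop
            break
        best_err, best_n = err, n_hidden
    return best_n, best_err
```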

Statistical performance criteria

Several statistical indices were used to assess the performance of the presented models during the calibration and validation processes: the mean error (ME), mean absolute error (MAE), root mean squared error (RMSE), standard deviation of residuals (SDR), index of agreement (IA), and correlation coefficient (R). They can be expressed as:

$$ \begin{aligned}
\mathrm{ME} &= \left\langle x_c-x_m\right\rangle = \left\langle x_c\right\rangle - \left\langle x_m\right\rangle\\
\mathrm{MAE} &= \left\langle \left|x_c-x_m\right|\right\rangle\\
\mathrm{RMSE} &= \sqrt{\left\langle \left(x_c-x_m\right)^2\right\rangle}\\
\mathrm{SDR} &= \sqrt{\left\langle \left[\left(x_c-x_m\right)-\left(\left\langle x_c\right\rangle-\left\langle x_m\right\rangle\right)\right]^2\right\rangle}\\
\mathrm{IA} &= 1-\frac{\left\langle \left(x_c-x_m\right)^2\right\rangle}{\left\langle \left(\left|x_c-\left\langle x_c\right\rangle\right|+\left|x_m-\left\langle x_m\right\rangle\right|\right)^2\right\rangle}\\
R &= \frac{\left\langle x_m x_c\right\rangle-\left\langle x_m\right\rangle\left\langle x_c\right\rangle}{\sqrt{\left\langle x_m^2\right\rangle-\left\langle x_m\right\rangle^2}\,\sqrt{\left\langle x_c^2\right\rangle-\left\langle x_c\right\rangle^2}}
\end{aligned} $$

where x_m and x_c are the measured and calculated values at an observation point, respectively, and ⟨·⟩ is the expectation (averaging) operator.

Smaller ME, MAE, RMSE, and SDR values indicate better agreement between measured and calculated values. For the ME indicator, positive and negative values indicate positively and negatively biased computed values, respectively. The IA indicator is a nondimensional, bounded measure, with values closer to 1 indicating better agreement (Willmott et al. 1985). A correlation coefficient of 1 indicates a perfect one-to-one linear relationship and −1 a perfect negative relationship. More details on these statistical indices can be found in Willmott et al. (1985).
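The six indices can be computed directly from the formulas above; the following NumPy transcription is a convenience sketch (x_c are the calculated and x_m the measured values).

```python
import numpy as np

def performance_indices(xc, xm):
    """ME, MAE, RMSE, SDR, IA, and R as defined above (<.> is the mean)."""
    xc, xm = np.asarray(xc, float), np.asarray(xm, float)
    d = xc - xm
    me = d.mean()
    mae = np.abs(d).mean()
    rmse = np.sqrt((d ** 2).mean())
    sdr = np.sqrt(((d - d.mean()) ** 2).mean())
    ia = 1.0 - (d ** 2).mean() / (
        ((np.abs(xc - xc.mean()) + np.abs(xm - xm.mean())) ** 2).mean())
    r = ((xm * xc).mean() - xm.mean() * xc.mean()) / (
        np.sqrt((xm ** 2).mean() - xm.mean() ** 2)
        * np.sqrt((xc ** 2).mean() - xc.mean() ** 2))
    return {"ME": me, "MAE": mae, "RMSE": rmse, "SDR": sdr, "IA": ia, "R": r}

# Usage: performance_indices(xc=calculated_series, xm=measured_series)
```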

Spatial analysis

ANNs have been applied successfully in hydrology and water resources (Kişi 2007; Kişi 2009; Rezaeian-Zadeh et al. 2015; Tabari et al. 2010). We used multilayer feed-forward neural networks to generate continuous surfaces and map the results of the temporal analysis in the spatial domain.

The observations of the four synoptic stations in the study area were used. The outputs of the validation process of the WTANN model with ancillary data (model 1), i.e., the calculated rainfall values of each month between 2006 and 2010, were used for the spatial analysis. The latitude, longitude, and altitude values (WGS 84 coordinate system) of each observation station were used as input variables, and the total monthly precipitation values (2006–2010) at each station, in mm, were used as target variables in the ANN model.

The backpropagation algorithm was used for training the ANN architecture. The optimum number of hidden neurons was obtained through a trial-and-error procedure: the network initially starts with a single hidden neuron, the number of hidden neurons is increased by one in each trial, and the procedure ends when no significant improvement in the mean square error at the output layer is achieved. To check for over-fitting during training, we again applied leave-one-out cross-validation, this time treating the observations of one station as the testing data set while the remaining stations form the training data set. The learning process terminates when the mean error calculated from the training data is reduced to a minimum or acceptable level. Finally, the optimum network architecture is obtained, and precipitation values at the nonsampled locations can be computed.

After completing the training process, we took the minimum bounding rectangle of the study area and, for ease of calculation, considered a regular grid with a spacing of 9 arc minutes (0.15 decimal degrees) (Fig. 4). The latitude, longitude, and elevation values (WGS 84 coordinate system) of each grid point were used as input variables, and the output consists of the precipitation values at the nonsampled locations.

Fig. 4
figure 4

The regular grid produced with a spatial step of 9 arc minutes (0.15 decimal degrees)

In summary, we used 12 separate networks, one for each month. The computed precipitation values at the nonsampled locations for each month were imported into ArcGIS 10, and a surface was fitted to them using the inverse distance weighted (IDW) interpolation tool. IDW was chosen because it is relatively fast and easy to compute and interpret. Finally, we obtained 12 maps for each year.
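A rough sketch of the grid construction and a basic IDW interpolator is given below; the bounding coordinates are taken from the study-area description, while the power parameter (2) and the vectorized implementation are assumptions, since the actual interpolation was carried out with the ArcGIS 10 IDW tool.

```python
import numpy as np

# Regular 9-arc-minute (0.15 degree) grid over the bounding rectangle of the catchment.
lat = np.arange(34.35, 37.716, 0.15)
lon = np.arange(58.2, 61.183, 0.15)
grid_lon, grid_lat = np.meshgrid(lon, lat)

def idw(xy_known, z_known, xy_query, power=2.0):
    """Basic inverse-distance-weighted interpolation (power=2 is an assumed default)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.where(d == 0, 1e-12, d)            # avoid division by zero at known points
    w = 1.0 / d ** power
    return (w @ z_known) / w.sum(axis=1)

# z_grid would hold the ANN-computed monthly rainfall at the grid nodes; the surface
# is then evaluated at any map location, e.g.:
# surface = idw(np.column_stack([grid_lon.ravel(), grid_lat.ravel()]),
#               z_grid.ravel(), query_points)
```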

Results

The decomposed approximation (A) and details (D) obtained through the wavelet transform play different roles in the original time series, and the behavior of each sub-time series is distinct (Ramana et al. 2013; Wang and Ding 2003). As explained previously, the Daubechies 2 wavelet was used as the mother wavelet at level 2.

We used 15 years of monthly rainfall data. There are different methods for splitting data between training and testing sets. Rezaeian-Zadeh et al. (2010) used a 5-year record and randomly extracted training and testing records in various proportions; they found that randomizing the data sets yields a more robust model. In this study, however, the first 10 years (1996–2005) were used for calibration and the remaining 5 years (2006–2010) for validation. The time series were also standardized to zero mean and unit variance and then normalized to the range 0 to 1. We selected the training and testing sets this way for the following reasons: first, we used a longer time series (15 years), so different precipitation patterns over recent years can be revealed; second, we selected 5 years of data as testing samples, and randomizing the data would make interpretation of the outputs more complicated; and finally, a cross-validation technique was used to avoid over-fitting.
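A small sketch of the two-step scaling (standardization followed by min–max normalization to [0, 1]) is given below; whether the scaling parameters were estimated on the calibration period only is not stated in the text, so the leakage-free variant shown here is an assumption, and the synthetic series are placeholders.

```python
import numpy as np

def scale_series(train, test):
    """Standardize to zero mean / unit variance, then rescale into [0, 1].

    Scaling parameters are estimated on the calibration years only and reused
    for the validation years; validation values may therefore fall slightly
    outside [0, 1] when they exceed the calibration range.
    """
    mu, sigma = train.mean(), train.std()
    z_train, z_test = (train - mu) / sigma, (test - mu) / sigma
    lo, hi = z_train.min(), z_train.max()
    return (z_train - lo) / (hi - lo), (z_test - lo) / (hi - lo)

# Example: 120 calibration months (1996-2005) and 60 validation months (2006-2010).
rng = np.random.default_rng(1)
train, test = rng.gamma(2.0, 15.0, 120), rng.gamma(2.0, 15.0, 60)
train_scaled, test_scaled = scale_series(train, test)
```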

We developed four models for the temporal prediction of the rainfall series. The ancillary data (the time series of dew point (Fig. 5), temperature (Fig. 6), and wind speed (Fig. 7)) and the decomposed approximations (A) and details (D) of the two antecedent rainfall values (Fig. 8) were taken as inputs to the developed models. The properties of the networks used in this study are summarized in Table 1. In this way, the results of the proposed WTANN model can be compared with those of the ANN model, and the effect of using ancillary data for rainfall prediction in the study area can also be investigated.

Fig. 5
figure 5

The time series of dew point from 1996 to 2010 at a Golmakan station, b Mashhad station, c Sarakhs station, and d Torbatejam station

Fig. 6
figure 6

The time series of temperature from 1996 to 2010 at a Golmakan station, b Mashhad station, c Sarakhs station, and d Torbatejam station

Fig. 7
figure 7

The time series of wind speed from 1996 to 2010 at a Golmakan station, b Mashhad station, c Sarakhs station, and d Torbatejam station

Fig. 8
figure 8

The decomposed wavelet sub-time series of rainfall from 1996 to 2010 at a Golmakan station, b Mashhad station, c Sarakhs station, and d Torbatejam station

Table 1 Structure of the neural networks used in this study and errors obtained from cross-validation of each model

Having defined the inputs and desired outputs, the weights of the neural network were adjusted by a feed-forward neural network with the backpropagation algorithm. The optimum number of hidden neurons for the WTANN and ANN models was determined through a trial-and-error procedure. Leave-one-out cross-validation was used as an estimate of the generalization performance of the models, and the average mean errors resulting from the cross-validation of each model are presented in Table 1. The table shows that all the developed models have relatively small average ME values and that model 1 shows the least over-fitting.

The performance of the models was evaluated through statistical indices (Table 2). Figures 9 and 10 illustrate the predicted and observed monthly rainfall time series at each station for the validation period with the two best models, i.e., the WTANN model with ancillary data (model 1) and the ANN model with ancillary data (model 3), respectively. In addition, Figs. 11 and 12 show the scatter plots between the observed and modeled monthly rainfall for model 1 at the Golmakan, Mashhad, Sarakhs, and Torbatejam stations during the calibration and validation periods, respectively. Figures 13 and 14 depict the same scatter plots for model 3.

Table 2 Goodness of fit statistics of the calibration and validation of the developed models
Fig. 9
figure 9

The predicted and observed monthly rainfall time series for the WTANN model with ancillary data (model 1) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations for the validation period

Fig. 10
figure 10

The predicted and observed monthly rainfall time series for the ANN model with ancillary data (model 3) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations for the validation period

Fig. 11
figure 11

Scatter plot between observed and modeled monthly rainfall for the WTANN model with ancillary data (model 1) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations during calibration period

Fig. 12
figure 12

Scatter plot between observed and modeled monthly rainfall for the WTANN model with ancillary data (model 1) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations during validation period

Fig. 13
figure 13

Scatter plot between observed and modeled monthly rainfall for the ANN model with ancillary data (model 3) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations during calibration period

Fig. 14
figure 14

Scatter plot between observed and modeled monthly rainfall for the ANN model with ancillary data (model 3) at a Golmakan, b Mashhad, c Sarakhs, and d Torbatejam stations during validation period

In the present study, a multilayer feed-forward backpropagation neural network was developed to interpolate the mean monthly rainfall values of the Qara-Qum catchment, Iran. By applying leave-one-out cross-validation to the neural network developed for each month, the generalization performance of each network was investigated; the average mean errors computed for each network are shown in Table 3. The results also highlight an important advantage of ANNs over other interpolation methods: they neither require specification of the form of the function being modeled (e.g., linearity) nor make any underlying statistical assumptions about the data (e.g., normality) (Paraskevas et al. 2014). The interpolated total monthly rainfall maps for the years 2006 to 2010 were prepared, and the maps for 2010 are presented in Fig. 15.

Table 3 Errors obtained from cross-validation of the neural networks for each month
Fig. 15
figure 15

ANN’s total monthly precipitation values for the year 2010

Discussion

Table 2 shows better agreement between the measured and calculated values for the WTANN models than for the ANN models: the WTANN models yielded smaller ME, MAE, RMSE, and SDR values, and their IA and R values were closer to 1. Thus, the statistical indices show that the WTANN models performed much better than the ANN models in predicting the precipitation values.

In more detail, at the Golmakan, Mashhad, and Torbatejam stations, all the statistical indices indicated that model 1 and model 4 had the best and worst performances, respectively. In addition, model 2 performed better than model 3; i.e., using the wavelet transform was more efficient than using ancillary data for precipitation prediction at these three stations. At Sarakhs station, the ME indicator confirmed this, while the other statistical indicators suggested that model 3 performed better than model 2. Thus, at Sarakhs station, using ancillary data to predict total monthly rainfall was more efficient than applying the wavelet transform to the rainfall series, even though the ME indicator showed that the predicted values of model 3 were more positively biased than those of model 2. In summary, the statistical indices indicated that the WTANN model with ancillary data (model 1) was the most efficient model, while using the wavelet transform alone (model 2) or the ancillary data alone (model 3) led to different results depending on the station.

The differences between the models can be better understood by comparing the plotted original rainfall values with the predicted values. From Table 2, Fig. 9, and Fig. 10, it is clear that model 1 outperformed model 3. Figure 9 shows that Mashhad station has the least over-estimation, while the Sarakhs and Torbatejam stations have the most over-estimated values. The ME values confirm this time-series comparison: in general, positive and negative ME values mean over- and under-estimation, respectively. For the validation period, the ME indicator at Mashhad station showed a small negative value (under-estimation), while it was positive at the other stations (over-estimation). Figure 10 also confirms that Golmakan station shows the least over-prediction (the smallest positive ME of all stations) with the simple ANN with ancillary data. The statistical comparison is thus in good agreement with the time-series comparison. Consequently, both models produced predictions closer to the observed values at the Golmakan and Mashhad stations than at the other two stations in the validation period.

Having drawn the scatter plots, the WTANN model proposed in this work was compared with the ANN model. The scatter plots show that the precipitation values forecasted with the WTANN model agree better with the observations than those forecasted with the ANN model, because the WTANN forecasts at all stations lie much closer to the y = x line. In conclusion, the scatter plots confirm the results of the statistical indicators.

Regarding the spatial analysis, the neural networks used to interpolate the results of the temporal analysis were cross-validated during the learning procedure. Table 3 shows that the smallest average ME belongs to December; in general, the predicted rainfall values at each station are most accurate in December, so the network trained for December predicts values closest to the true rainfall at the nonsampled locations. On the contrary, the predicted values for July and August, the months with the lowest rainfall, showed the lowest accuracy, because the networks trained for interpolation tend to over-predict the precipitation values in these months. In summary, Table 3 shows that the neural networks predicted less accurately in the months with lower precipitation, namely July, August, October, June, and September.

Having interpolated the predicted total monthly rainfall values, the spatial distribution of total monthly rainfall over the study region was investigated. Rainfall is relatively high in seven months: January, February, March, April, May, November, and December. Rainfall in January, February, and March is high in the middle part of the study area, whereas in April, May, and November it is higher in the northwest of the catchment, and in December it is high in the northeast and southwest of the Qara-Qum basin. Among the remaining months, precipitation in June and October is higher than in July, August, and September; in June rainfall is higher in the west of the catchment, while in October it is higher in the east of the study area. In July, August, and September, there are no significant differences in rainfall across the study area. Overall, the presented maps provide a useful visualization of the rainfall distribution over the catchment.

Figure 16 depicts the long-term average total annual rainfall distribution in the Qara-Qum catchment, provided by the National Centre for Climatology of Iran. Rainfall is higher in the northwest and west of the study area, and the lowest values occur in the southeast and east. This pattern appears in the produced monthly rainfall maps for the rainy months, which contribute most to the total annual rainfall; the long-term distribution is therefore generally compatible with the monthly rainfall maps obtained with the neural networks.

Fig. 16
figure 16

The average long-term total annual rainfall distribution in the study area, a rainfall isolines and b rainfall continuous map

Conclusion

In this study, mean monthly rainfall values measured at four synoptic stations over the Qara-Qum catchment were used to reveal the temporal behavior and spatial distribution of monthly precipitation. A hybrid wavelet-transform neural network (WTANN) model was used for the temporal analysis of the monthly rainfall time series in the Qara-Qum catchment, located in northeastern Iran, and an artificial neural network (ANN) model was used for comparison. Time series of dew point, temperature, and wind speed were taken as ancillary data in the modeling procedure. The analysis showed that the WTANN model was much better at modeling the rainfall time series in the study area, because the statistical indices showed better agreement between the measured and calculated values for the WTANN model than for the ANN model. It was also evident that using ancillary data improved the accuracy of the predicted rainfall. Furthermore, the scatter plots between the observed and calculated monthly rainfall for the WTANN model at all stations were very close to the y = x line. Finally, the results show that implementing the hybrid wavelet-transform neural network model with ancillary data can improve the accuracy of monthly rainfall time series forecasting.

A simple method was proposed to interpolate the predicted rainfall values in the Qara-Qum catchment using the available synoptic stations. The ANN used in this study has important advantages over other interpolation methods, since it neither requires the function being modeled to be linear nor makes any underlying statistical assumptions about the data. The generated continuous surfaces showed the spatial distribution of monthly rainfall in the Qara-Qum catchment. It can be concluded that using ANN models for spatial interpolation can improve the accuracy of spatial analyses in climate studies, making them useful tools for a range of applications. In future studies, including additional variables in the input seems appropriate for capturing the complexity of rainfall variability over the study area; therefore, further work on rainfall interpolation using neural networks should be carried out.