Introduction

Rainfall is a highly complex, nonlinear meteorological phenomenon that varies with time and place. Nevertheless, the literature shows that rainfall is predictable (Hung et al. 2009; Afshin et al. 2011; Wu and Chau 2013; Govinda and Hiremath 2014). Currently, the common prediction methods used by the meteorological organization of Iran are mainly based on the analysis of synoptic maps by expert meteorologists; however, these methods are limited by human error and low accuracy (Heydari 1996). In addition, methods such as the fifth-generation Mesoscale Model (MM5), which are based on the numerical solution of equations, need extensive input data and incur high computational costs (Haltiner and Williams 1980; Nayak et al. 2013). Therefore, numerical weather prediction models cannot be used effectively in practice at sites where the information needed to develop or apply them is unavailable. Recently, researchers have begun to apply artificial intelligence as a flexible and powerful tool for predicting nonlinear complex systems. Artificial neural networks (ANNs) are a well-known class of artificial intelligence methods inspired by the function of the brain and nervous system in biological organisms. They are able to learn, generalize, and make decisions in a way loosely similar to humans. ANN models often need fewer inputs and less computation than conventional models. To capture the relation between the inputs and outputs of a system under study, an ANN model needs to be trained with a suitable training method. After proper training, the resulting model can approximate the system outputs for new inputs. The theoretical studies of neural networks go back to the 1980s and 1990s. Later, ANN models were found to be suitable tools for nonlinear and complex problems such as atmospheric and climatological (Kisi 2009; Tabari et al. 2010; Rezaeian-Zadeh et al. 2012) and hydrological challenges (Michaelide 2001; Kisi 2007; Rezaeianzadeh et al. 2015).

Several ANN models have been developed for precipitation prediction. Ramirez et al. (2005) used a multi-layer feed-forward perceptron (MLFP) neural network for daily precipitation prediction in the region of São Paulo State, Brazil. In their research, potential temperature, the vertical component of the wind, specific humidity, air temperature, precipitable water, relative vorticity, and moisture divergence flux were used as input data for the training stage of the model. The results from this ANN model were much better than those obtained by linear regression models. Hung et al. (2009) used 4 years of hourly data from 75 rain gauge stations in Bangkok, Thailand, to develop an ANN model for rainfall forecasting. The results showed that a generalized feed-forward ANN model with hyperbolic tangent transfer functions achieved the best generalization of rainfall. Nanda et al. (2013) used an ARIMA (1,1,1) model and ANN models such as the multi-layer perceptron (MLP), the functional-link artificial neural network (FLANN), and the Legendre polynomial equation (LPE) to predict rainfall time series data in India. MLP, FLANN, and LPE gave very accurate results for the complex rainfall time series, with a minimal absolute average percentage error (AAPE). Compared to ARIMA models, ANN models such as FLANN gave better prediction results with a lower AAPE. In general, no single ANN model is suitable for all problems and all situations. To find a proper ANN model, one should try different ANN models and structures to obtain the one that works best for the particular case under study (Khalili 2006). However, according to the literature, MLFP networks give more reasonable outputs than other ANN types (Ninan et al. 2003; Ramirez et al. 2005; Vamsidhar et al. 2010; Kumar et al. 2012; Acharya et al. 2013; Singh and Borah 2013; Nayak et al. 2013).

In this paper, we use ANNs to obtain a prediction model for the monthly rainfall at Mashhad’s synoptic station in Iran. ANNs are chosen because they can extract the features of a system even when there is not enough information about the system dynamics. In such cases, ANNs can be used as black-box models to extract the model of a given system. ANN black-box models can also be used to predict the system’s behavior, provided that the collected data are predictable. Therefore, in this study, we use Hurst’s rescaled range (R/S) statistical analysis, which shows that the data collected at Mashhad station are predictable from past information. This allows us to use ANNs for rainfall prediction at this station. We have developed several black-box MLFP structures that use only previous monthly precipitation data. The prediction results of the developed ANN structures show the effectiveness of the method. This paper extends our preliminary results (Khalili et al. 2011), applies the techniques to monthly time frames, and details the procedure for choosing and tuning a proper MLFP ANN for predicting rainfall.

The significance of this work is the application of ANNs to the rainfall time series recorded at Mashhad synoptic weather station, a continental-climate case with challenging weather dynamics caused by the interaction of Polar continental, Maritime Tropical, and Sudanian air masses. This makes rainfall prediction extremely difficult. Nevertheless, the results of this study show a very reasonable rainfall prediction. Furthermore, there is no universal ANN model that works for all weather conditions. Therefore, this paper details the procedure for choosing and tuning a proper multi-layer feed-forward perceptron ANN for predicting rainfall. The results of this research can be used directly for predicting rainfall, developing warning systems for possible floods, and calculating the drought risk in Mashhad or at other synoptic stations with similar continental climates. In addition, the procedure for developing ANN models for rainfall prediction can be used for other climatic parameters.

The rest of this paper is organized as follows. First, the employed ANN method and other preliminaries are discussed in the “Material and method” section. The “Results and discussions” section describes the development of different ANN models for the monthly rainfall at Mashhad synoptic weather station. Finally, the “Conclusion” section concludes the paper.

Material and method

Monthly rainfall data were collected from Mashhad’s synoptic weather station, which is located in the northeast of Iran (Razavi-Khorasan province). Figure 1 shows the location of the Razavi-Khorasan province and Mashhad in Iran. The station is located at 36° 16′ N latitude and 59° 38′ E longitude, at an elevation of 999.2 m. Mashhad has a special geographical situation and a continental climate that results from the interaction of different air masses, including Polar continental, Maritime Tropical, and Sudanian air masses (Fig. 1). Overall, its climate is semi-arid, with hot, dry summers and cold, wet winters. The maximum annual temperature is about +35 °C and the minimum is about −15 °C. The annual average rainfall in Mashhad is about 253 mm. The long-term average of the total rainfall for each month is shown in Table 1.

Fig. 1 Location of Mashhad

Table 1 The long-term monthly total average of rainfall for each month in Mashhad station

In the literature, MLFP has been found to be a suitable type of ANN for meteorological predictions (Kumar et al. 2012; Singh and Borah 2013; Nayak et al. 2013). Therefore, in this study, we use this structure to construct a rainfall prediction system for Mashhad’s synoptic weather station. For this purpose, we use monthly rainfall data collected from 1953 to 2003 to train and validate the ANN models. Before developing the rainfall models, we apply Hurst’s rescaled range (R/S) statistical analysis (Gammel 1998) to assess the predictability of the data. The Hurst exponent captures the existence of a memory effect in the given data. The collected data are arranged in the form of a time series (Fig. 2). Table 2 shows the minimum, maximum, mean, and coefficient of variation (the ratio of the standard deviation to the mean) of the monthly rainfall data.

Fig. 2 The monthly rainfall time series in Mashhad

Table 2 Summary of statistical characteristics of data

Predictability of rainfall data

Hurst’s rescaled range statistical analysis (R/S analysis)

The R/S time series analysis test is based on the general statistical properties expected of an independent Gaussian process. It is particularly useful as a general tool for testing whether the given data exhibit long-run statistical dependence. R/S analysis is a non-parametric method, meaning that it makes no assumption about the shape or distribution of the underlying data set.

Hurst (1951) empirically found that, for many natural phenomena, the rescaled range of the cumulative deviations \( X\left(t,T\right)={\displaystyle \sum_{u=1}^t\left({x}_u-{\overline{x}}_T\right)} \) scales as a power of the record length. He introduced an exponent to capture the dependency in the given data. He showed that the Hurst exponent, H, satisfies the following relation (Hurst 1951; Qian and Rasheed 2004):

$$ R(T)/S(T)={T}^H $$
(1)

where H is the Hurst exponent, T is the number of samples, S(T) is the standard deviation, and R(T) is the self-adjusted range, defined as follows:

$$ R(T)=\underset{t=1}{\overset{T}{ \max }}\;X\left(t,T\right)-\underset{t=1}{\overset{T}{ \min }}\;X\left(t,T\right) $$
(2)

where X(t,T) is the cumulative deviation from the mean, which can be calculated by Eq. (3):

$$ X\left(t,T\right)={\displaystyle \sum_{u=1}^t\left({x}_u-{\overline{x}}_T\right)} $$
(3)

On the other hand, S (T) can be calculated as follows:

$$ S(T)={\left[\frac{1}{T}{\displaystyle \sum_{t=1}^T{\left({x}_t-{\overline{x}}_T\right)}^2}\right]}^{1/2} $$
(4)

Now, following the definition of Hurst exponent in Eq. (1), H can be obtained as follows:

$$ H= \log \left(R/S\right)/ \log (T) $$
(5)

Evaluating the above equations for a given data set yields the Hurst exponent. A Hurst exponent of H = 0.5 implies that the data come from an independent process. A Hurst exponent of 0 ≤ H < 0.5 indicates an anti-persistent time series. A Hurst exponent of 0.5 < H ≤ 1 implies a persistent time series characterized by long-memory effects and shows that the given data set is predictable.
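As a minimal sketch of Eqs. (1) to (5), the following Python function computes a single-window estimate of H for one series. The function name `hurst_rs` is an illustrative assumption and is not code from the study. The sketch assumes the series is not constant, since S(T) would then be zero.

```python
import math

def hurst_rs(x):
    """Single-window estimate H = log(R/S) / log(T), following Eqs. (1)-(5)."""
    T = len(x)
    mean = sum(x) / T
    # Eq. (3): cumulative deviations from the mean, X(t, T) for t = 1..T
    cum, X = 0.0, []
    for value in x:
        cum += value - mean
        X.append(cum)
    R = max(X) - min(X)                                  # Eq. (2): self-adjusted range
    S = math.sqrt(sum((v - mean) ** 2 for v in x) / T)   # Eq. (4): standard deviation
    return math.log(R / S) / math.log(T)                 # Eq. (5)
```

Eq. (5) uses one window covering the whole record. A common alternative, not used in this sketch, estimates H as the slope of log(R/S) against log(T) over several window lengths.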

Artificial neural networks (ANN)

Figure 3 shows an example of an ANN with three layers. In this ANN model, the synapse weights, w, the bias, b, the activation function in the hidden layer, f(x), and the output function in the output layer, g(x), specify the relation between the neuron inputs, \( x_i \), and their outputs, \( y_i \). This relation is expressed mathematically in Eq. (6):

$$ y=g\left[{\displaystyle \sum_{j=1}^n{w}_{kj}\,f\left({\displaystyle \sum_{i=1}^n{w}_{ji}{x}_i+b}\right)+b}\right] $$
(6)

Fig. 3 The artificial neural network structure

where \( x_1, x_2, \ldots, x_n \) are the inputs, \( w_{j1}, w_{j2}, \ldots, w_{jn} \) are the synapse weights in the hidden layer, \( w_{k1}, w_{k2}, \ldots, w_{kn} \) are the synapse weights in the output layer, b is the bias or external threshold, f(x) is the activation function in the hidden layer, g(x) is the activation function in the output layer, and y is the output.
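The forward computation of Eq. (6) can be sketched as follows for a network with one output. The function names, the use of a separate bias value for each hidden neuron, and the identity default for g are assumptions made for illustration; the text above uses a single symbol b for the bias.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as the hidden-layer function f."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out, f=sigmoid, g=lambda z: z):
    """Eq. (6) for one output: y = g(sum_j w_kj * f(sum_i w_ji * x_i + b) + b)."""
    # Hidden layer: one weighted sum plus bias per hidden neuron, passed through f
    hidden = [f(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of hidden activations plus bias, passed through g
    return g(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

With all hidden weights and biases set to zero, every hidden neuron outputs f(0) = 0.5, so the output reduces to half the sum of the output weights plus the output bias, which is a quick way to check an implementation.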

Training

It is assumed that there is no prior knowledge about the system and the given data. Therefore, the network must first be trained with sets of input samples. During the training process, the error between the actual output and the calculated output is propagated back through the network to update the parameters. Hence, this method is called the “back propagation algorithm” (Rumelhart et al. 1986). At the initial step of the training algorithm, all unknown weights in the ANN model are set to independent random values. Back propagation uses the gradient descent algorithm to minimize the total squared error E between the predicted output data and the actual output data. The value of E for the whole set of training data can be computed as given in Eq. (7):

$$ E={\displaystyle \sum_{m=1}^M{E}_m}={\displaystyle \sum_{m=1}^M{\displaystyle \sum_{n=1}^N{\left({y}_{mn}-{y}_{prd\;mn}\right)}^2}} $$
(7)

where E is the total error, M is the number of input sets, \( E_m \) is the error for the m-th input sample, N is the number of outputs, and \( y_{mn} \) and \( y_{prd\;mn} \) are the actual and predicted outputs, respectively.

The training algorithm is thus an auto-correction mechanism based on the comparison between the actual and the predicted output samples. This comparison is used to adjust the weights in the ANN model by small amounts \( \Delta_m W_{ji} \), calculated from the gradient descent rule as described in Eq. (8):

$$ {\triangle}_m{W}_{ji}=-\eta \frac{\partial {E}_m}{\partial {W}_{ji}} $$
(8)

where, η is the learning rate.

This procedure of the adjustment of ANN model weights will continue until the error is minimized, or it becomes less than a predefined threshold, or the desired number of training periods (epochs) is achieved.
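The training procedure described above can be sketched as follows for a network with one sigmoid hidden layer and a single linear output. This is an illustrative sketch rather than the code used in the study: the function name `train`, the initial weight range, the default learning rate, the tolerance, and the fixed seed are all assumptions.

```python
import math
import random

def train(samples, n_hidden=3, eta=0.1, epochs=1000, tol=1e-4, seed=0):
    """Per-sample gradient descent (Eq. 8) on E_m = (y - y_prd)^2, one output."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # All weights start as independent random values (range is an assumption).
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(epochs):
        E = 0.0
        for x, y in samples:
            # Forward pass: sigmoid hidden layer, linear output
            h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(W[j], x)) + b[j])))
                 for j in range(n_hidden)]
            y_prd = sum(v[j] * h[j] for j in range(n_hidden)) + c
            err = y_prd - y
            E += err ** 2                  # Eq. (7), accumulated over the samples
            d_out = 2.0 * err              # dE_m / dy_prd
            for j in range(n_hidden):
                # Error propagated back to hidden neuron j (uses v[j] before its update)
                d_hid = d_out * v[j] * h[j] * (1.0 - h[j])
                v[j] -= eta * d_out * h[j]
                b[j] -= eta * d_hid
                for i in range(n_in):
                    W[j][i] -= eta * d_hid * x[i]
            c -= eta * d_out
        history.append(E)
        if E < tol:                        # stop once the error is below the threshold
            break
    return (W, b, v, c), history
```

The loop stops either when the total error for an epoch drops below `tol` or when the requested number of epochs is reached, which mirrors the stopping conditions listed above.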

Structure of utilized ANNs

After the predictability of the available data was assessed (R/S analysis), different types of neural network models were tested and their performance was evaluated. Based on our experiments and on the literature, an MLFP neural network with the back propagation training method can provide proper results (Vamsidhar et al. 2010; Kumar et al. 2012; Singh and Borah 2013). The monthly data for the 51 years from 1953 to 2003 are arranged to form a time series of 612 monthly values. We then construct a matrix with m rows and n columns, in which each row corresponds to one data sample to be given to the ANN model. Each row is an augmented data set in which columns 1 to n-1 hold the information used to predict the target value in the last column. In other words, columns 1 to n-1 are the inputs of the ANN and the last column is the value to be estimated (target data).
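The construction of this matrix can be sketched with plain lagged values as inputs. The function name `build_samples` and the choice of raw lags are assumptions for illustration only; the input columns actually used in this study are described in the "Results and discussions" section.

```python
def build_samples(series, n_lags):
    """Return rows [x(t-n_lags), ..., x(t-1), x(t)]; the last column is the target."""
    return [series[t - n_lags:t + 1] for t in range(n_lags, len(series))]
```

For example, a series of five values with two lags gives three rows, each holding two inputs followed by the value to be predicted.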

Of the collected data samples, 550 are used for the training phase and the rest are used for the validation phase. The training data set can be selected in order; however, a more comprehensive approach is to select the augmented rows of the constructed matrix at random. In this case, the ANN model has a better chance of capturing the dynamics of the system, and the trained model is then able to respond to data given in any order. For the validation phase, we used a similar procedure: instead of validating the system with ordered data, we presented the validation data to the trained ANN model in random order. Therefore, after training, we expect the ANN model to be able to predict the rainfall for a desired month when given the necessary information about the history of that month. In this procedure, the number of hidden-layer neurons is chosen, after trying different MLFP structures, to achieve the best possible output. Here, the number of epochs was set to 1000 with η = 0.5. The activation functions selected for the hidden and output layers are given in Eqs. (9) and (10), respectively: a sigmoid function and, as Eq. (10) indicates, a linear (identity) function.

$$ f(x)=\frac{1}{1+{e}^{- ax}} $$
(9)

$$ a=f(n)=n $$
(10)
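The random selection of training and validation rows described above can be sketched as follows. The function name `split_rows` and the fixed seed are assumptions for illustration; the default of 550 training rows follows the text.

```python
import random

def split_rows(rows, n_train=550, seed=0):
    """Shuffle a copy of the rows, then split it into training and validation sets."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Because whole rows are shuffled, each input vector stays paired with its own target value, so the order of presentation changes while the samples themselves do not.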

In the next section, we describe different MLFP structures that we have tested for this case study. In particular, we will discuss the design of the input layers, which are related to rainfall data of the previous period, and the design of the output layers, which give the predicted monthly rainfall.

Results and discussions

The Hurst exponent of the monthly data was calculated as 0.9621. This value implies that the monthly rainfall time series at Mashhad synoptic station is predictable. Then, using MATLAB software, several MLFP structures were designed with different numbers of neurons in the input and hidden layers. Each structure is denoted Mijk, in which the indices i, j, and k stand for the number of neurons in the input layer, the hidden layer, and the output layer, respectively. Among the different structures, M531 and M741 gave the best performance.

M531 neural network

The input layer of the M531 network consists of five neurons for the last five bi-monthly rainfall moving averages. For this structure, three neurons in the hidden layer yield the best prediction performance. The structure of this network is shown in Fig. 4.

Fig. 4 M531 neural network structure
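The five M531 inputs can be sketched as follows. The paper does not define the bi-monthly moving average explicitly, so this sketch assumes it is the mean of two consecutive monthly totals; the function name and the ordering (most recent first) are also assumptions.

```python
def bimonthly_moving_averages(series, t, count=5):
    """Two-month means ending at months t-1, t-2, ..., t-count (most recent first).

    Requires t >= count + 1 so that every index stays inside the series.
    """
    return [(series[t - k] + series[t - k - 1]) / 2.0 for k in range(1, count + 1)]
```

Under this assumption, predicting month t uses only values up to month t-1, so the target itself never appears among the inputs.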

M741 neural network

In the M741 structure, seven neurons were used in the input layer. These inputs are the long-term average rainfall for the estimated month, the previous year's rainfall for the estimated month, and the last five bi-monthly rainfall moving averages, respectively. After trial and error to find the optimal number of neurons in the hidden layer, four neurons were selected for this layer. Moreover, for this structure, all input data are normalized first, and the network output is de-normalized correspondingly. Figure 5 shows the topology of this network.

Fig. 5 M741 neural network structure
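The assembly of the seven M741 inputs and the normalization step can be sketched as follows. Several details are assumptions, because the paper does not state them: the series is taken to start in the first calendar month of `monthly_means`, the bi-monthly moving average is taken as the mean of two consecutive months, and min-max scaling is used for normalization. All function and parameter names are hypothetical.

```python
def m741_inputs(series, t, monthly_means, count=5):
    """Seven inputs for month index t (0-based); requires t >= 12."""
    long_term = monthly_means[t % 12]       # long-term average for the target month
    last_year = series[t - 12]              # same calendar month, one year earlier
    averages = [(series[t - k] + series[t - k - 1]) / 2.0 for k in range(1, count + 1)]
    return [long_term, last_year] + averages

def normalize(values, lo, hi):
    """Min-max scaling to [0, 1] (the scaling scheme is an assumption)."""
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(values, lo, hi):
    """Inverse of normalize, used to map network outputs back to millimetres."""
    return [v * (hi - lo) + lo for v in values]
```

The same `lo` and `hi` must be used for normalizing the training data and for de-normalizing the network output; otherwise the predictions are returned on a different scale from the observations.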

The correlation coefficients between the input variables, as well as between the input and output variables, are shown in Tables 3 and 4. As these tables show, the important inputs are the previous month's rainfall (i), the long-term average rainfall for the estimated month (long-term ave\( _i \)), and the previous year's rainfall for the estimated month (i-12). The other inputs, which have small correlations, have small effects that can be used for fine-tuning the resulting ANN model. Through the training procedure, the ANN model automatically obtains the weights so that the more important factors have higher weights in the resulting model.

Table 3 Correlation coefficients between the input variables as well as between input and output variables for M531
Table 4 Correlation coefficients between the input variables as well as between input and output variables for M741

The results are shown in Table 5, which includes the correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), variance accounted for between two signals (VAF) (Trabelsi and Lafont 2004), and non-dimensional error index (NDEI) for monthly rainfall prediction using M531 and M741.

Table 5 Results of the best-performing ANN structures
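The criteria reported in Table 5 can be computed as sketched below. The paper does not give the formulas, so commonly used definitions are assumed here: VAF as one minus the ratio of the error variance to the variance of the actual values, expressed in percent, and NDEI as RMSE divided by the standard deviation of the actual values. The function name `metrics` is hypothetical, and the sketch assumes neither series is constant.

```python
import math

def metrics(actual, predicted):
    """R, RMSE, MAE, VAF (%), and NDEI for paired actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_e = sum(errors) / n
    var_a = sum((a - mean_a) ** 2 for a in actual) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return {
        "R": cov / math.sqrt(var_a * var_p),
        "RMSE": rmse,
        "MAE": sum(abs(e) for e in errors) / n,
        "VAF": (1.0 - var_e / var_a) * 100.0,
        "NDEI": rmse / math.sqrt(var_a),
    }
```

Two properties of these definitions are worth keeping in mind when reading such a table. VAF and R are insensitive to a constant bias in the predictions, so they should be read together with RMSE or MAE. In addition, RMSE is never smaller than MAE when both are computed on the same data in the same units.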

The regression equations between the predicted and actual rainfall for the validation phase of the proposed MLFP structures are shown in Figs. 6 and 7. For the M531 model, a strong correlation (\( R^2 \) = 0.82) is shown in Table 5. Figure 6 also shows that this model provides acceptable predictions of monthly rainfall for Mashhad’s synoptic station, although there is a slight tendency to underestimate the rainfall (Fig. 6). Figure 8 shows the actual and predicted rainfall in the validation phase for the M531 model. The results do not improve significantly when the structure of the hidden layer is changed or when the number of neurons in the input or hidden layer is increased. It should be highlighted that in this MLFP structure, only bi-monthly rainfall moving averages are used in the input layer, and this may not comprehensively capture the dynamics of the system. To improve the prediction performance, we used the M741 structure, in which a richer input dataset is fed to the network.

Fig. 6 Regression analysis between predicted and actual monthly rainfall for validation phase of M531

Fig. 7 Regression analysis between predicted and actual monthly rainfall for validation phase of M741

Fig. 8 Accordance between predicted and actual monthly rainfall for validation phase of M531
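A regression analysis of the kind shown in Figs. 6 and 7 can be sketched as an ordinary least-squares fit of the predicted values on the actual values. The function name `fit_line` is hypothetical and the sketch is not the procedure used to produce the figures; it assumes neither series is constant.

```python
def fit_line(actual, predicted):
    """Least-squares line predicted = slope * actual + intercept, with R^2."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sxx = sum((a - mean_a) ** 2 for a in actual)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    syy = sum((p - mean_p) ** 2 for p in predicted)
    slope = sxy / sxx
    intercept = mean_p - slope * mean_a
    r2 = (sxy * sxy) / (sxx * syy)          # squared correlation coefficient
    return slope, intercept, r2
```

A fitted line that lies below the 1:1 line over the observed range indicates a tendency to underestimate, and one that lies above it indicates a tendency to overestimate, independently of how high \( R^2 \) is.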

Figure 7 shows a stronger correlation (\( R^2 \) = 0.87) between the predicted and actual rainfall values for the validation phase of M741 than that of M531 shown in Fig. 6. As can be seen in Fig. 7, there is a tendency to overestimate the rainfall. Overall, although both M531 and M741 give satisfactory performance, M741 uses a richer input dataset (the long-term average rainfall and the previous year's rainfall for the estimated month, in addition to the last five bi-monthly rainfall moving averages), which leads to its better performance compared with M531. Figure 9 shows the satisfactory performance of M741 in the validation of monthly rainfall.

Fig. 9 Accordance between predicted and actual monthly rainfall for validation phase of M741

It is worth mentioning that other MLFP structures for monthly rainfall prediction, based on different input data and different numbers of hidden-layer neurons, were also examined, including M331, M431, M631, M731, M541, M641, M841, and M941. However, M531 and M741 performed relatively better.

Conclusion

The proposed method and discussed results on using MLFP networks for the rainfall prediction can be recapped as follows:

  1. There are different ANN model structures, training algorithms, activation functions, and numbers of epochs. This makes it difficult to find a proper model for a particular problem. An appropriate structure can be selected only through the designer's experience and a trial-and-error procedure.

  2. Regarding the training and validation phases for the rainfall data available in this study, the best structure achieved was a three-layer feed-forward perceptron with a back propagation algorithm in the form of M741. The inputs were selected from the monthly dataset at past time steps with a special pattern. For this ANN structure, the prediction performance was assessed by different statistical criteria such as R, RMSE, and MAE. For the achieved M741 model, these parameters were obtained as 0.93, 0.99, and 6.02 mm, respectively.

  3. Using the long-term average rainfall for the current month, along with the rainfall for the corresponding month in the previous year and the last five bi-monthly rainfall moving averages, results in a much better prediction performance.

  4. The achieved ANN model gave satisfactory prediction performance.

Although this paper shows that the black-box model is capable of predicting the rainfall, it is reasonable to employ the prior information in our rainfall model in the form of a gray-box ANN model to improve the prediction performance. Therefore, as our future work, we are going to use a gray-box model instead of the black-box to incorporate the prior information into our rainfall model.

Moreover, the accuracy of the prediction results for this synoptic station encourages us to investigate the method for other meteorological parameters at this station. Furthermore, to assess the generalizability of the proposed method, we plan to apply it to other stations with different meteorological characteristics to see whether the results are as satisfactory as those achieved for Mashhad synoptic station.