1 Introduction

Reference evapotranspiration (ET0) is a key parameter in water resources management and environmental assessment in general, and in irrigation scheduling in particular. A large number of methods have been developed for assessing ET0 from meteorological data. The Food and Agriculture Organization (FAO) recommends the FAO-56 Penman-Monteith (FAO PM) equation as the sole method for estimating ET0 wherever the required input data are available (Allen et al. 1998; Droogers and Allen 2002). This method is physically based and has been shown to estimate ET0 accurately against lysimeter data from a wide range of climate conditions (Allen et al. 1994; Itenfisu et al. 2000). It requires measurements of air temperature, relative humidity, solar radiation and wind speed. However, these climatic variables are not always measured at weather stations. Although temperature and humidity are routinely measured, solar radiation and wind speed data are rarely available, even in developed countries. Under these conditions, simplified models that do not require solar radiation and wind speed data should be considered. Estimating ET0 is a complex nonlinear problem because ET0 depends on several interacting climatic factors. More recently, alternative approaches based on artificial neural networks (ANNs) and M5 model trees have been suggested as reliable estimation models for various engineering applications. The main advantage of these approaches over conventional methods is that they do not need detailed information on the physical processes of the system.

ANNs are effective tools for modelling nonlinear systems and systems that are difficult to formalize. In recent years, neural network methods have been employed to estimate ET0 as a function of climatic variables. Some studies used the same climatic data required by the FAO PM method (Odhiambo et al. 2001; Kumar et al. 2002; Trajkovic et al. 2003). These researchers reported that the ANN can predict ET0 even better than the conventional FAO PM method. Sudheer et al. (2003) and Zanetti et al. (2007) reduced the number of input variables and estimated ET0 as a function of air temperature, extraterrestrial solar radiation and daylight hours, with satisfactory results. Chauhan and Shrivastava (2009) compared the performance of four climate-based methods and ANNs for estimating ET0 in India when the available climatic inputs are insufficient to apply the FAO PM method. They concluded that the ANN models performed better than the climate-based methods. In another study, Rahimikhoob (2010) applied the ANN technique to estimate ET0 from air temperature data under humid subtropical conditions on the southern coast of the Caspian Sea in the north of Iran. He showed that the ANN successfully estimated daily ET0 and simulated ET0 better than the conventional Hargreaves equation.

Recently, M5 model trees have been used successfully for flood forecasting (Solomatine and Xue 2004), water level-discharge relationships (Bhattacharya and Solomatine 2005), rainfall-runoff modeling (Solomatine and Dulal 2003), sedimentation modeling (Bhattacharya and Solomatine 2006), and estimation of ET0 (Pal and Deswal 2009). Pal and Deswal (2009) investigated the potential of an M5 model tree based regression approach to model daily ET0 using four inputs: solar radiation, average air temperature, average relative humidity, and average wind speed. Their results suggested that the M5 model tree could be employed successfully in modeling ET0. In other research, Sattari et al. (2013a) compared the performance of an M5 model tree and a support vector machine in predicting daily stream flows in the River Sohu, located within the municipal borders of Ankara, Turkey. They found that the M5 model tree performed better than the support vector machine. More recently, two studies investigated the ANN and M5 model tree techniques for the assessment of ET0 in two different countries, the first (Sattari et al. 2013b) in Ankara (Turkey) and the second (Sattari et al. 2013c) in Bonab (Northwestern Iran). In both cases, the ANN model estimated ET0 better than the M5 model tree. The M5 model tree was nevertheless considered appropriate because it provides simple linear relations.

The purpose of the research reported in this article was to compare an ANN model and M5 model trees for estimating monthly ET0 in an arid environment of Iran. Since maximum and minimum air temperature and relative humidity records are more readily available around the globe, these records, together with extraterrestrial radiation, are used as inputs to the above models for the estimation of ET0. Extraterrestrial radiation reflects the seasonality of ET0 and can be calculated theoretically as a function of the local latitude and Julian date, according to the equations presented by Allen et al. (1998). Thus, for the models proposed in this study, temperature and relative humidity are the only parameters that require observation. Here, the FAO PM method was used as a substitute for measured ET0 data, as this is the standard procedure when no measured lysimeter data are available (Irmak et al. 2003; Utset et al. 2004). Although in practice the best way to test the performance of the above-mentioned methods would be to compare them against lysimeter-measured data, this type of data set is not available in the study area.

2 Materials and Methods

2.1 Study Area and Climate Dataset

The area under study was Sistan and Baluchestan province, which lies between latitudes 25.0°N and 31.5°N and between longitudes 58.8°E and 63.3°E. The province is in the south-east of Iran, borders Pakistan, Afghanistan and the Oman Sea, and covers an area of 181,578 km2. On the basis of the Köppen climate classification, the climate is arid, with an average annual precipitation of about 112 mm.

2.2 Data Description

Monthly meteorological data were obtained from January 1998 through December 2007 (10 years, 120 months) from four weather stations in the study area with varying latitudes, longitudes and elevations. The annual average weather data of the meteorological stations are presented in Table 1. The stations belong to the meteorological organization of Iran, and their spatial distribution within the province is shown in Fig. 1. Five monthly meteorological variables were recorded: (1) mean maximum air temperature (Tx °C); (2) mean minimum air temperature (Tn °C); (3) mean wind speed (U m s−1); (4) mean relative humidity (RH %) and (5) bright sunshine hours (n h). Measurements were made at a height of 2 m (air temperature and relative humidity) and 10 m (wind speed) above the soil surface. Wind speed data at 2 m (U2) were obtained from those taken at 10 m using the log-wind profile equation. All measurements were made daily according to the Iran Meteorological Organization, with monthly data being averaged from daily data as appropriate. Mean measured monthly Tx, Tn, RH and U2 for the four meteorological stations used in the study, over 10 years, are presented in Figs. 2, 3, and 4.
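The height adjustment of the wind speed mentioned above can be sketched as follows. The text only states that a log-wind profile equation was used; the specific form below (the conversion given in FAO-56, u2 = uz × 4.87 / ln(67.8 z − 5.42)) is an assumption about which form was applied, and the function name and input value are illustrative only.

```python
import math

def wind_speed_at_2m(u_z: float, z: float = 10.0) -> float:
    """Convert wind speed u_z (m/s) measured at height z (m) to a 2 m equivalent.

    Assumes the FAO-56 logarithmic wind profile over a short grass surface;
    the study may have used a different form of the log-wind profile.
    """
    arg = 67.8 * z - 5.42
    if arg <= 1.0:  # logarithm would be zero or negative
        raise ValueError("measurement height too low for this profile")
    return u_z * 4.87 / math.log(arg)

# Illustrative value, not taken from the study's stations
print(round(wind_speed_at_2m(3.2, z=10.0), 2))
```

With this form, a 10 m reading is scaled by a factor of about 0.75, and a reading taken at 2 m is left essentially unchanged.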

Table 1 Mean annual meteorological parameters averaged over 10 years for weather stations used in this study

Fig. 1 Spatial distribution of the four meteorological stations used in the study (see Table 1 for weather station codes)

Fig. 2 Mean measured monthly maximum and minimum air temperature averaged over 10 years for weather stations used in this study

Fig. 3 Mean measured monthly relative humidity averaged over 10 years for weather stations used in this study

Fig. 4 Mean measured monthly wind speed averaged over 10 years for weather stations used in this study

In order to train the ANN and the M5 model tree, the whole data set of the four stations (480 patterns, from 1998 to 2007) was pooled into one group to produce a model with a higher regional capacity that could be applied to estimate ET0 for different locations in Sistan and Baluchestan province. This data set was divided into two parts: the first part (336 patterns, from 1998 to 2004) was used for training and the second part (144 patterns, from 2005 to 2007) was used for testing the trained model.
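The pattern counts follow directly from pooling four stations with 12 monthly records per year. A minimal sketch of such a chronological split is given below; the record layout and the station codes are placeholders, not the identifiers used in the study.

```python
from typing import List, Tuple

# (station code, year, month); this layout is hypothetical
Record = Tuple[str, int, int]

def split_by_year(records: List[Record], last_train_year: int = 2004):
    """Split pooled records chronologically into training and test sets."""
    train = [r for r in records if r[1] <= last_train_year]
    test = [r for r in records if r[1] > last_train_year]
    return train, test

stations = ["S1", "S2", "S3", "S4"]  # placeholder station codes
records = [(s, y, m)
           for s in stations
           for y in range(1998, 2008)   # 1998 to 2007 inclusive
           for m in range(1, 13)]

train, test = split_by_year(records)
print(len(records), len(train), len(test))  # 480 336 144
```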

2.3 The FAO PM Method

The following equation was applied for the FAO PM (Allen et al. 1998):

$$ {\mathrm{ET}}_0\mathrm{PM}=\frac{0.408\varDelta \left({\mathrm{R}}_{\mathrm{n}}-\mathrm{G}\right)+\gamma \frac{900}{{\mathrm{T}}_{\mathrm{a}}+273}{\mathrm{U}}_2\left({\mathrm{e}}_{\mathrm{s}}-{\mathrm{e}}_{\mathrm{a}}\right)}{\varDelta +\gamma \left(1+0.34{\mathrm{U}}_2\right)} $$
(1)

where ET0 PM is the reference crop evapotranspiration calculated using the FAO PM method (mm d−1), Rn is the daily net radiation (MJ m−2 d−1), G is the daily soil heat flux (MJ m−2 d−1), Ta is the mean daily air temperature at a height of 2 m (°C), U2 is the daily mean wind speed at a height of 2 m (m s−1), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), ∆ is the slope of the saturation vapor pressure versus air temperature curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). The two terms in the numerator on the right-hand side of the equation are the radiation term and the aerodynamic term, respectively.
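As a rough illustration of how Eq. 1 combines its two numerator terms, the sketch below evaluates it for one set of inputs. The function name and the numerical inputs are assumptions chosen for illustration; they are not data from the study stations, and the intermediate quantities (∆, Rn, es, ea) are taken as given rather than derived.

```python
def et0_penman_monteith(delta, rn, g, gamma, t_mean, u2, es, ea):
    """FAO-56 Penman-Monteith reference evapotranspiration (mm/day), Eq. 1.

    delta, gamma in kPa/degC; rn, g in MJ m-2 d-1; t_mean in degC;
    u2 in m/s; es, ea in kPa.
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))

# Illustrative inputs only (hypothetical warm, moderately windy month)
et0 = et0_penman_monteith(delta=0.246, rn=14.33, g=0.14, gamma=0.0674,
                          t_mean=30.2, u2=2.0, es=4.42, ea=2.85)
print(round(et0, 2))
```

Setting g to zero reproduces the simplification of neglecting the soil heat flux.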

In this study, the daily values of ∆, Rn, es and ea were calculated using the equations given by Allen et al. (1998). For Rn, an albedo of 0.23 (green vegetation surface) was used. Since G is usually small compared with Rn and is difficult to measure, it was assumed to be zero over the calculation time step period (daily and monthly) (Allen et al. 1998). The measured RH, Tx and Tn values were used to calculate ea and es. The daily solar or shortwave radiation (Rs) was calculated using the Angstrom formula, which relates solar radiation to extraterrestrial radiation (Ra) and relative sunshine duration. Eq. (39) in Allen et al. (1998) was used to calculate the net outgoing longwave radiation. Ra (MJ m−2 d−1), was calculated from the following equation (Allen et al. 1998):

$$ {\mathrm{R}}_{\mathrm{a}}=\frac{24(60)}{\pi }{G}_{SC}{d}_r\left[{\omega}_S \sin \left(\varphi \right) \sin \left(\delta \right)+ \cos \left(\varphi \right) \cos \left(\delta \right) \sin \left({\omega}_S\right)\right] $$
(2)

where GSC is the solar constant (0.0820 MJ m−2 min−1), dr is the inverse relative distance between Earth and Sun (Eq. 3), ωs is the sunset hour angle (Eq. 4; radians), φ is the latitude of the site (radians), and δ is the solar declination (Eq. 5; radians).

$$ {\mathrm{d}}_{\mathrm{r}}=1+0.033\kern0.5em \cos \left(2\pi /365\times \mathrm{J}\right) $$
(3)
$$ {\omega}_{\mathrm{s}}=\arccos \left[- \tan \left(\varphi \right) \tan \left(\delta \right)\right] $$
(4)
$$ \delta =0.409\kern0.5em \sin \left\{\left(2\pi /365\times \mathrm{J}\right)-1.39\right\} $$
(5)

where J is the number of the day in the year (1 for 1 January up to 365 or 366 for 31 December).
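Because Ra depends only on latitude and day of year, Eqs. 2 to 5 can be evaluated without any station measurements. The sketch below is one way to code them; the function name and the example latitude and day are illustrative assumptions, not values reported in the study.

```python
import math

G_SC = 0.0820  # solar constant, MJ m-2 min-1

def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1) from Eqs. 2 to 5.

    Valid where |tan(phi) * tan(delta)| <= 1, i.e. outside the polar
    day/night zones, which holds for the latitudes of the study area.
    """
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi / 365.0 * day_of_year
    d_r = 1.0 + 0.033 * math.cos(angle)                     # Eq. 3
    delta = 0.409 * math.sin(angle - 1.39)                  # Eq. 5
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))   # Eq. 4
    return (24.0 * 60.0 / math.pi) * G_SC * d_r * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )                                                       # Eq. 2

# Illustrative call: a latitude inside the province's range, mid-January
print(round(extraterrestrial_radiation(29.5, 15), 1))
```

For monthly work, one common choice is to evaluate Ra at the middle day of each month; whether the study did this or averaged daily values is not stated in the text.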

2.4 Artificial Neural Network (ANN)

In this study, an ANN of the multilayer perceptron (MLP) type with one input layer, one hidden layer and one output layer was used for estimating ET0 from the temperature, humidity and extraterrestrial radiation data. MLP networks consist of units (neurons) arranged in layers (input, hidden and output layer) with only forward connections to units in subsequent layers. The number of nodes in the input and the output layers depends on the number of input and output variables, respectively. The performance of the ANN depends on the number of nodes in the hidden layer. Because no specific guidelines exist for choosing the optimum number of hidden nodes for a given problem, this network parameter is often optimized using a combination of empirical rules and trial and error. Figure 2 shows the general layout of a three-layer neural network used in this study. In this structure, there are four neurons in the input layer (representing the Tx, Tn, RH and Ra variables), i neurons in a single hidden layer, and one neuron in the output layer (representing the ET0).

A neural network performs a particular function by adjusting the weights of the connections between its elements. Each connection has a corresponding weight. The processing element consists of two parts. The first part simply aggregates the weighted inputs and the bias; the second part is essentially a nonlinear filter, usually called the transfer function or activation function. The activation function acts as a squashing function, such that the output of a neuron is confined to a fixed range (usually 0 to 1, or −1 to 1). Mathematically, this process is described in Fig. 3. In this paper, the log-sigmoid activation function is used for both the hidden layer and the output layer. This function is among the most commonly used activation functions. It is a continuous function that varies gradually between two asymptotic values, 0 and 1, and is defined as follows:

$$ {\mathrm{y}}_{\mathrm{k}}=\frac{1}{1+ \exp \left(-{\nu}_{\mathrm{k}}\right)} $$
(6)

where νk and yk denote the weighted sum of inputs to the kth hidden neuron and the output from that neuron, respectively. The training of an MLP network involves finding values of the connection weights and biases that minimize an error function between the actual network output and the corresponding target values in the training set. In this study, a backpropagation (BP) algorithm was employed to train the MLP neural network. Levenberg–Marquardt (LM), a second-order nonlinear optimization technique, was chosen from the various BP training algorithms available. The LM algorithm is widely applied in many different domains and has been reported to be faster and to produce better results than other training methods (Hagan and Menhaj 1994; Tan and van Cauwenberghe 1999). In some cases, however, the BP algorithm may become trapped in a local minimum, and the initial values of the weights affect whether this happens. Thus, in this research the weights were reinitialized and the networks retrained several times to reduce the risk of settling in a poor local minimum.
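The forward computation of such a network (weighted sum plus bias, followed by the log-sigmoid of Eq. 6 in both the hidden and the output layer) can be sketched as below. The weights, biases and the two-neuron hidden layer are arbitrary illustrative values, not the trained network from this study, and training by Levenberg–Marquardt is not shown.

```python
import math
from typing import List, Sequence

def log_sigmoid(v: float) -> float:
    """Log-sigmoid activation of Eq. 6; output lies between 0 and 1.

    Inputs are assumed to be moderate in size (normalized data), so the
    exponential does not overflow.
    """
    return 1.0 / (1.0 + math.exp(-v))

def mlp_forward(x: Sequence[float],
                w_hidden: List[List[float]], b_hidden: List[float],
                w_out: List[float], b_out: float) -> float:
    """Forward pass of a one-hidden-layer MLP with a single output neuron."""
    hidden = [log_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return log_sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Four normalized inputs (Tx, Tn, RH, Ra) and two hidden neurons;
# every number here is an arbitrary placeholder.
x = [0.6, 0.4, 0.3, 0.8]
w_hidden = [[0.5, -0.2, 0.1, 0.7], [-0.3, 0.8, -0.5, 0.2]]
b_hidden = [0.1, -0.1]
w_out = [1.2, -0.7]
b_out = 0.05
print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```

Because the output neuron is also log-sigmoid, the network output lies between 0 and 1, which is why the target ET0 values have to be scaled into that range before training.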

Generalization is the quality of neural networks that is sought following supervised learning. It is the ability to provide accurate output values for input variables that have not been seen by the network (Atkinson and Tatnall 1997). Lack of generalization is caused by overfitting: the network has memorized the training examples, but it has not learned to generalize to new situations. The most common technique to circumvent overfitting is based on an early stopping criterion that halts training before convergence (Sarle 1995; Prechelt 1998). Here, the LM algorithm was used with an early stopping criterion to improve the network training speed and efficiency. The accuracy of the networks was evaluated at each training epoch through the mean squared error. For the criterion, all the data were divided into three sets (Coulibaly et al. 2000). The first set is the training set, used for determining the weights and biases of the network. The second set is the validation set, used for evaluating the weights and biases and for deciding when to stop training. The validation error normally decreases at the beginning of the training process. When the network starts to overfit the data, the validation error begins to increase. Training is stopped when the validation error begins to increase, and the weights and biases at the minimum validation error are retained. The last data set is used for testing the weights and biases, to verify the effectiveness of the stopping criterion and to estimate the expected network performance on new data sets.
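The stopping rule described above can be sketched independently of the training algorithm, given the validation error recorded at each epoch. The `patience` parameter (how many consecutive non-improving epochs are tolerated) is an assumption added for illustration; the text only says that training stops when the validation error begins to increase, which corresponds to a patience of 1.

```python
from typing import Sequence, Tuple

def early_stopping_epoch(val_errors: Sequence[float],
                         patience: int = 1) -> Tuple[int, float]:
    """Return (epoch, error) of the minimum validation error seen before stopping.

    Training is considered stopped once the validation error has failed to
    improve for `patience` consecutive epochs.
    """
    if not val_errors:
        raise ValueError("val_errors must not be empty")
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; keep weights from best_epoch
    return best_epoch, best_error

# Hypothetical validation MSE per epoch
print(early_stopping_epoch([0.5, 0.4, 0.3, 0.35, 0.36, 0.2], patience=2))  # (2, 0.3)
```

In the example, the later drop to 0.2 is never reached because training has already stopped, which illustrates the trade-off in choosing the patience.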

In order to reflect the seasonality of ET0, extraterrestrial radiation was selected as an input variable to the ANN. Therefore, in this study, maximum and minimum air temperature and relative humidity, together with extraterrestrial radiation, were employed as input variables. Ra was calculated as a function of the local latitude and Julian date, according to the equations presented by Allen et al. (1998). Thus, the proposed model only needs the measured values of maximum and minimum air temperature and relative humidity for estimating ET0. The data from 1998 to 2004 for each station were pooled into one set to train the network. This data set had a total of 336 patterns. It was divided at random, with 70 % being reserved to train the ANN and 30 % being used to validate the training. After the training process, the remaining data for each station (2005 to 2007) were used to test the network. The test data set had a total of 144 patterns that were not used for training. As the purpose of this study was the estimation of ET0, the ANN has only one output variable. The ET0 values computed from Eq. 1 were used as the target output.

The performance of the ANN depends on the number of hidden layers and the number of nodes in each hidden layer. In general, neural networks with one hidden layer containing a sufficiently large number of hidden nodes have been shown to be capable of providing accurate approximations to any continuous nonlinear function (Hornik et al. 1989).

However, neural networks with a large number of hidden nodes may lead to overfitting of data, resulting in network models with poor predictive capability. It is thus of great importance to select an appropriate number of hidden nodes. Because no specific guidelines exist for choosing the optimum number of hidden nodes for a given problem, this network parameter is often optimized according to some empirical rules combined with trial and error.

To keep all model variables on a consistent scale, all source data were normalized to the range 0.0–1.0 and then converted back to their original values after the simulation using:

$$ {\mathrm{X}}_{\mathrm{norm}}=\frac{\mathrm{X}-{\mathrm{X}}_{\min }}{{\mathrm{X}}_{\max }-{\mathrm{X}}_{\min }} $$
(7)

where Xnorm is the normalized value; X is the original value; Xmin and Xmax are the minimum and maximum of the original values.
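A minimal sketch of Eq. 7 and its inverse is given below. The function names and the example range are illustrative; in practice Xmin and Xmax would typically be taken from the training data and reused for the test data, which is an assumption here since the text does not say which subset defined them.

```python
def normalize(x: float, x_min: float, x_max: float) -> float:
    """Min-max normalization of Eq. 7, mapping [x_min, x_max] onto [0, 1]."""
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return (x - x_min) / (x_max - x_min)

def denormalize(x_norm: float, x_min: float, x_max: float) -> float:
    """Inverse of Eq. 7, returning a value in the original units."""
    return x_norm * (x_max - x_min) + x_min

# Hypothetical temperature range of 10 to 30 degC
t_norm = normalize(15.0, 10.0, 30.0)
print(t_norm, denormalize(t_norm, 10.0, 30.0))  # 0.25 15.0
```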

2.5 M5 Model Tree

The other method used in this study to estimate ET0 from temperature and relative humidity data is the M5 model tree, first presented by Quinlan (1992). The model is based on a binary decision tree with linear regression functions at the terminal (leaf) nodes, which develops a relationship between independent and dependent variables. Unlike a decision tree, which is used for categorical data, a model tree can also be used for quantitative data (Quinlan 1992; Mitchell 1997). M5 model tree generation requires two stages (Quinlan 1992; Solomatine and Xue 2004). The first stage involves splitting the data into subsets to create a decision tree. The splitting criterion treats the standard deviation of the class values that reach a node as a measure of the error at that node, and calculates the expected reduction in this error as a result of testing each attribute at that node. The standard deviation reduction (SDR) is defined as follows (Pal and Deswal 2009):

$$ \mathrm{SDR}=\mathrm{sd}\left(\mathrm{T}\right)-{\displaystyle \sum \frac{\left|{\ \mathrm{T}}_{\mathrm{i}}\right|}{\left|\ \mathrm{T}\ \right|}\mathrm{sd}\left({\mathrm{T}}_{\mathrm{i}}\right)} $$
(8)

where T denotes the set of examples that reaches the node; Ti denotes the subset of examples that have the ith outcome of the potential test; and sd denotes the standard deviation (Wang and Witten 1997). Due to the splitting process, the standard deviation of the data in child nodes (lower nodes) is less than that at the parent node. After examining all the possible splits, the one that maximizes the expected error reduction is chosen. However, this division often produces a large tree-like structure, which may cause overfitting or poor generalization. To overcome this problem, in the second stage the overgrown tree is pruned and the pruned sub-trees are replaced with linear regression functions. This technique of generating the model tree substantially increases the accuracy of estimation (Quinlan 1992). Figure 5a shows the splitting of the input space X1 × X2 (independent variables) into six subspaces (leaves) by the M5 model tree algorithm. A linear regression function is built at each leaf, labeled LM1 through LM6. Figure 5b shows the same relations in the form of a tree diagram, in which LM1 to LM6 are at the leaf level. Further details of the M5 model tree can be found in Quinlan (1992).
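The splitting criterion of Eq. 8 can be illustrated with a small sketch that scores every candidate threshold on a single attribute and keeps the one with the largest SDR. This is only the split-selection step, not the full M5 algorithm (pruning and leaf regressions are omitted); the use of the population standard deviation and the toy data are assumptions.

```python
from statistics import pstdev
from typing import List, Optional, Sequence, Tuple

def sdr(parent: Sequence[float], subsets: Sequence[Sequence[float]]) -> float:
    """Standard deviation reduction of Eq. 8 for one candidate split."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)

def best_split(xs: Sequence[float],
               ys: Sequence[float]) -> Tuple[Optional[float], float]:
    """Scan thresholds on one attribute; return (threshold, SDR) of the best."""
    pairs = sorted(zip(xs, ys))
    targets: List[float] = [y for _, y in pairs]
    best: Tuple[Optional[float], float] = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        gain = sdr(targets, [targets[:i], targets[i:]])
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy data with two clearly separated groups of target values
xs = [1, 2, 3, 10, 11, 12]
ys = [1, 2, 3, 10, 11, 12]
print(best_split(xs, ys))
```

For the toy data the best threshold falls between the two groups, where both child nodes have a much smaller spread than the parent.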

Fig. 5 Example of M5 model tree: (a) splitting the input space X1 × X2 by the M5 model tree algorithm; (b) diagram of a model tree with six linear regression models at the leaves

In order to compare the ANN and M5 model tree methods, the same climatic data used for the ANN were selected as input variables of the M5 model tree. Therefore, the maximum and minimum air temperatures, relative humidity and extraterrestrial radiation were adopted as input variables for the M5 model. The data used to train and test the neural network were also used to create and test the M5 model tree. Thus, the ET0 estimates of the M5 model tree (ET0 M5) can be compared with the ET0 estimates of the neural network (ET0 ANN). The Weka software (Witten and Frank 2005) was used to create the M5 model tree from the training data set.

The performance of the ANN and M5 models was checked with three statistical indices: determination coefficient (R2), mean bias error (MBE) and root mean square error (RMSE). To ease the comparison, both MBE and RMSE indices are normalized and expressed as percentages of the mean observed ET0 (calculated with the FAO PM method) value. These indices were defined as follows:

$$ {\mathrm{R}}^2=\frac{{\left[{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}\left({\mathrm{P}}_{\mathrm{i}}-\overline{\mathrm{P}}\right)\left({\mathrm{O}}_{\mathrm{i}}-\overline{\mathrm{O}}\right)}\right]}^2}{{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\left({\mathrm{P}}_{\mathrm{i}}-\overline{\mathrm{P}}\right)}^2{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\left({\mathrm{O}}_{\mathrm{i}}-\overline{O}\right)}^2}}} $$
(9)
$$ \mathrm{MBE}=\frac{{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}\left({\mathrm{P}}_{\mathrm{i}}-{\mathrm{O}}_{\mathrm{i}}\right)}}{\mathrm{N}\overline{\mathrm{O}}}\times 100 $$
(10)
$$ \mathrm{RMSE}=\frac{\sqrt{\frac{1}{\mathrm{N}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\left({\mathrm{P}}_{\mathrm{i}}-{\mathrm{O}}_{\mathrm{i}}\right)}^2}}}{\overline{O}}\times 100 $$
(11)

where N is the number of observations, Pi is the estimated ET0 (using the ANN and M5 methods), Oi is the observed ET0, and \( \overline{\mathrm{P}} \) and \( \overline{\mathrm{O}} \) are the average values of Pi and Oi, respectively.
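The three indices of Eqs. 9 to 11 can be computed as in the sketch below, with MBE and RMSE expressed as percentages of the mean observed value. The function name and the sample values are illustrative and are not results from the study.

```python
import math
from typing import Dict, Sequence

def evaluate(predicted: Sequence[float],
             observed: Sequence[float]) -> Dict[str, float]:
    """Return R2 (Eq. 9), MBE in % (Eq. 10) and RMSE in % (Eq. 11)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    p_mean = sum(predicted) / n
    o_mean = sum(observed) / n
    cov = sum((p - p_mean) * (o - o_mean) for p, o in zip(predicted, observed))
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    r2 = cov ** 2 / (ss_p * ss_o)
    mbe = sum(p - o for p, o in zip(predicted, observed)) / (n * o_mean) * 100.0
    rmse = math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    ) / o_mean * 100.0
    return {"R2": r2, "MBE_percent": mbe, "RMSE_percent": rmse}

# Hypothetical ET0 values in mm/day
print(evaluate(predicted=[2.5, 4.0, 6.5], observed=[2.0, 4.0, 6.0]))
```

A positive MBE indicates overestimation relative to the observed values, while RMSE also penalizes errors that cancel out in the bias.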

3 Results and Discussion

The weather parameters considered for the ANN models with four inputs were the monthly mean daily Tx, Tn, RH and Ra. The output was the monthly mean daily ET0 calculated using the FAO PM method. The optimal number of nodes in the hidden layer of the network was determined by trial and error, considering the MBE, RMSE and R2 values from a test sample. In this study, ten ANNs were trained with one to ten nodes in the hidden layer, and the aforementioned statistical parameters were calculated on the whole test data set after each training run. Based on the three statistical indices, the network with six nodes in the hidden layer provided the best results, with MBE, RMSE and R2 values of 0.7 %, 5.3 % and 0.99, respectively, for the testing data.

The data used for training the neural network were also used for creating the M5 model tree. The generated model tree has only two rules:

$$ \begin{array}{l}\mathrm{Rule}\kern0.5em 1:\mathrm{If}\kern0.5em \mathrm{Ra}<=33.163\kern0.5em \mathrm{then}\kern0.5em \mathrm{LM}1\hfill \\ {}\mathrm{Rule}\kern0.5em 2:\mathrm{If}\kern0.5em \mathrm{Ra}>33.163\kern0.5em \mathrm{then}\kern0.5em \mathrm{LM}2\hfill \end{array} $$

LM1 and LM2 are the linear models fitted by the M5 model tree to the training data set:

$$ \begin{array}{l}\mathrm{LM}1:{\mathrm{ET}}_0=0.0601*{\mathrm{T}}_{\mathrm{n}}-0.0108*{\mathrm{T}}_{\mathrm{x}}-0.0481*\mathrm{RH}+0.1528*\mathrm{Ra}+1.3661\hfill \\ {}\mathrm{LM}2:{\mathrm{ET}}_0=0.0907*{\mathrm{T}}_{\mathrm{n}}-0.0108*{\mathrm{T}}_{\mathrm{x}}-0.0959*\mathrm{RH}+0.2279*\mathrm{Ra}-0.418\hfill \end{array} $$
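The two rules and their linear models translate directly into a short function. The coefficients and the threshold are transcribed from the rules and equations above; the function name and the input values in the example are illustrative, and the inputs are assumed to be in the units used in the paper (Tn and Tx in °C, RH in %, Ra in MJ m−2 d−1).

```python
def et0_m5(tn: float, tx: float, rh: float, ra: float) -> float:
    """Estimate ET0 with the two-leaf M5 model tree reported in the text."""
    if ra <= 33.163:
        # Rule 1 -> LM1
        return 0.0601 * tn - 0.0108 * tx - 0.0481 * rh + 0.1528 * ra + 1.3661
    # Rule 2 -> LM2
    return 0.0907 * tn - 0.0108 * tx - 0.0959 * rh + 0.2279 * ra - 0.418

# Hypothetical cool-season and warm-season inputs
print(round(et0_m5(tn=10.0, tx=25.0, rh=30.0, ra=30.0), 2))
print(round(et0_m5(tn=25.0, tx=40.0, rh=20.0, ra=40.0), 2))
```

The single split on Ra separates low-radiation from high-radiation months, and each branch then applies its own linear relation.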

The developed ANN and M5 model tree were applied to the test data set, and the statistical summary of the ET0 estimates for all the locations is presented in Table 2. It is clear from Table 2 that the difference between the two models is quite small. The RMSEs for both methods are generally low, indicating that the estimation error is small for either method. From Table 2, the RMSE has a maximum of 7.8 and 9.4 % for the ANN and M5 models, respectively. The RMSE varies between 0.5 and 7.8 % for the ANN model and between 8.9 and 9.4 % for the M5 model tree. Overall, the results in Table 2 show that the ANN model offered an advantage over the M5 model tree for the data in the study area; however, the differences between the two approaches are small. The selected ANN model showed very good performance when compared with the values estimated by the FAO PM method. This model, with an R2 of 0.98, an RMSE of 5.6 % and an MBE of 0.8 %, produces a small overestimation. The M5 model tree also performs well compared with the FAO PM estimates, with a 2.1 % overestimation, an RMSE of 8.9 % and an R2 of 0.98.

Table 2 Statistical summary of ET0 estimates for four locations in Sistan and Baluchestan province

The ET0 estimates of the developed ANN and M5 model tree at the four weather stations for the test data set are illustrated in Fig. 6 in the form of scatterplots. In both cases, all ET0 data appear to be well distributed along the 1:1 line. A good correlation was observed for all sites in both cases, with R2 higher than 0.97. The selected ANN and M5 model tree models perform very well when compared with the FAO PM estimates. The slopes of the fitted straight lines in both models are close to one, and neither overestimation nor underestimation is apparent in the range of values studied. This indicates that the models can be used to estimate ET0 values for different days.

Fig. 6 Comparison between the values of ET0 calculated by the FAO PM method and those estimated by the two methods at four weather stations for the test data: (a) ANN; (b) M5 model tree

Figure 7 shows the comparison between the monthly mean daily ET0 values estimated by the FAO PM method and those calculated by the selected ANN model and the M5 model tree during the test period. It can be seen that neither model has a significant MBE. In both models, the evolution is similar and one line is practically superimposed on the other.

Fig. 7 Evolution of the ET0 calculated using the FAO PM method and the values estimated by the proposed ANN and M5 models during the test period: (a) ANN; (b) M5 model tree

4 Conclusions

The results showed that both the neural network and the M5 model provide quite good agreement with the ET0 obtained by the FAO PM method. They gave reliable estimates at all the locations. The study demonstrated that modelling ET0 with the ANN technique gave better estimates than the M5 model tree. However, the differences from the M5 model tree are small. The advantage of the M5 model tree over the ANN is that it is simple to compute, so the M5 model is recommended for estimating ET0. The overall results are of significant practical use because the temperature- and humidity-based model can be used when radiation and wind speed data are not available.

The results of this study are similar to those reported by Sattari et al. (2013b, 2013c), who compared ANN and M5 model tree approaches at different locations. Their results suggested a better performance by the ANN approach, but the M5 model tree, being analogous to piecewise linear functions, provides simple linear relations. On that basis, those studies also recommended the M5 model tree for estimating ET0.