1 Introduction

1.1 Background

Evapotranspiration (ETo), which is the combination of evaporation from soil and transpiration from plants, is a critical component of the energy budget and water cycle. The process is affected by many climatological parameters such as solar radiation, air temperature, air humidity and wind speed. Transpiration is the vaporization of liquid water contained in plant tissue, which occurs through the plant stomata. Transpiration, like evaporation, depends on many factors: energy supply, vapor pressure gradient, wind, radiation, air temperature, air humidity, soil water content, the ability of the soil to conduct water to the roots, waterlogging, soil water salinity, crop characteristics, environmental aspects and cultivation practices. Evaporation and transpiration (ET) occur at the same time and there is no easy way to separate them. For example, when the crop is small the main process of water loss is evaporation, but once the crop is fully developed and covers the soil, transpiration becomes the main process. Thus, an accurate estimation of ETo will greatly improve the monitoring of changes in the water cycle, which is an important part of on-going global climate change.

Numerous methods have been proposed for modeling evapotranspiration, as described by Brutsaert (1982). In recent decades, the neural network approach has been applied to many branches of science, and many studies have demonstrated the successful application of ANNs for estimating different phases of the hydrological cycle, such as rainfall-runoff modeling (Hsu et al. 1995; Tokar and Johnson 1999), streamflow prediction (Imrie et al. 2000), reservoir inflow forecasting (Jain et al. 1999; El-Shafie et al. 2007, 2009; Sulaiman et al. 2011) and prediction of water quality parameters (Maier and Dandy 1999). Recently, artificial neural networks have also been used successfully in modeling evapotranspiration. Trajkovic et al. (2003) applied a sequentially adaptive radial basis function network for forecasting mean monthly ETo and concluded that an ANN could predict monthly ETo with high accuracy. Sudheer et al. (2003) examined the potential of ANNs for estimating monthly actual crop evapotranspiration (ET) from limited climatic data; their research employed a radial-basis function (RBF) ANN and compared the results with lysimeter measurements, showing clearly that an ANN could successfully estimate ET with limited climate data. Awchi (2008) estimated evapotranspiration by means of an ANN and daily climatic data of temperature, relative humidity, sunshine hours, wind speed and rainfall, compared the results with Penman-Monteith (PM) values, and found that ANNs could successfully be utilized for estimating ETo. Zanetti et al. (2007) applied an ANN for estimating daily ETo as a function of maximum and minimum temperature, extraterrestrial radiation and daylight hours. Wang et al. (2008) used a feed-forward back-propagation ANN (FFBP) for estimating average decadal ETo from maximum and minimum temperature in Burkina Faso; according to their results, the ANN output has higher accuracy than the Hargreaves (HGS) and Blaney-Criddle (BCR) methods (Blaney and Criddle 1962). Khoob (2008) compared a pan evaporation (Ep) conversion method with an ANN utilizing maximum and minimum temperature, and posited that the ANN gives more accurate results than the conversion method. Chauhan and Shrivastava (2008) evaluated the performance of ANNs for estimating ETo and showed that ANN models perform better than climatic models such as Blaney-Criddle, Radiation and Modified Penman; they reported that the ANN model could provide ETo estimates relatively close to actual pan evaporation measurements, and their results also suggested that ANN models could estimate ETo from maximum and minimum temperature alone.

One of the main issues in machine learning research is that of generalization. Generalization refers to the predictive ability of a base learner (or learning machine). The better a predictor performs on unseen data, the better it is said to possess the ability to generalize (Chandra and Yao 2006).

1.2 Problem Statement and Objective

ANN clearly provides a viable and effective approach for developing input–output prediction models in situations that do not require modeling of all or part of the internal parameters affecting ETo. Although such models have proved to be efficient, their convergence tends to be very slow and they may yield sub-optimal solutions, which may not be suitable for dynamic, adaptive and accurate forecasting purposes. In fact, the major objective of training an ANN for prediction is to generalize, i.e. to have the network outputs approximate target values for inputs that were not in the training set. One of the major shortcomings is that the ANN model can experience over-fitting during the training session, which occurs when a neural network loses its ability to generalize. Based on the above assertion and according to Kumar et al. (2002), ETo depends on the interaction of many climatic parameters such as temperature, air humidity, wind speed, solar radiation, and the type and growth stage of the crop.

In our study the required data are available for both study areas, so ETo can be calculated with the PM method, which is considered the reference value of ETo. On the other hand, ETo can also be calculated with the method proposed in this study using limited data (maximum and minimum temperature), and with the competitor method; hence a comparative analysis can be carried out in order to evaluate and examine the performance of the proposed model against the other methods.

The main objective of this research is to introduce a modification of classical neural network modeling, namely the Ensemble Neural Network (ENN), to overcome the over-fitting problem. In addition, this manuscript investigates the potential of utilizing the ENN model to estimate and predict the monthly time series of daily ETo at Rasht City, Iran and Johor City, Malaysia. The proposed ENN uses only the daily time series of minimum and maximum temperature and solar radiation (Tmin, Tmax and Rs) as the input pattern. Since there is no lysimeter installed in the case-study areas, the PM method was accepted as the reference ETo in order to evaluate the proposed model. The anticipated impact of this model is that it can predict the monthly time series of daily ETo with a level of accuracy similar to that of the PM method, without the need to explicitly consider the internal hydrologic or climatic parameters.

2 Material and Data Collection

In order to achieve the objective of this study, daily minimum and maximum temperature (°C) were collected for two different data sets that represent two different climate conditions, humid and semi-humid, in Johor, Malaysia and Rasht city, Iran. The idea behind utilizing such different data sets is to validate the applicability of the proposed model in predicting evapotranspiration.

There are several rivers in Johor State, located in the south of Malaysia; the Johor River is considered one of the main rivers not only of Johor State but of Malaysia as a whole. The Johor River flows in a roughly north–south direction, originating from Mount Gemuruh and emptying into the Strait of Johor. It is the main water supply for several water demands (domestic, industrial and agricultural) of Johor State and for part of the neighboring country of Singapore. The river is 122.7 km long with a catchment of 2,636 km2; in other words, it drains almost 14 % of the Johor State of Peninsular Malaysia. Several tributaries are connected to the main stream of the river, the major ones being the Sayong, Linggui, Tiram and Lebam Rivers. Its banks are also known as the location of past capitals of Johor. The Sungai Johor Bridge, officially opened in June 2011, is the first bridge to span the river and is currently the longest river bridge in Malaysia.

Syarikat Air Johor, SAJ (or Johor Water Company) and the Public Utilities Board of Singapore (PUB) each draw about 250,000 cubic metres/day of water from the Johor River near Kota Tinggi. Both water supply schemes have been operational since the mid-1960s. In addition, the Linggui Dam, completed and impounded in 1993, supplements the water supply to both Johor and Singapore.

For the Rasht station (latitude: 37° 12′ N, longitude: 49° 39′ E, elevation 36.7 m above sea level), data were collected for 30 years (01.01.1975 to 31.12.2005). The northern part of Iran, near the Caspian Sea, has a special climate and plant cover that differ from other parts of Iran: the weather is cold and humid in winter and fall, and hot and humid in summer. This region receives the most precipitation compared to other parts of Iran, with a maximum annual precipitation of 1,400 mm per year recorded in Anzali City in Gilan province. The principal agricultural product of the region is rice, which has a high crop water requirement. The region contains three provinces: Gilan, Mazandaran and Golestan. The case study is Rasht, the capital city of Gilan province, located in the west of the region. The geographical extent of this province is from 36° 34′ to 38° 27′ north and from 48° 53′ to 50° 34′ east. Its area is 14,711 km2 with a population of 2.4 million.

3 Methodology

3.1 Estimation of Reference Evapotranspiration

The Penman-Monteith (PM) equation for computing reference evapotranspiration, proposed by Allen and Pruitt (1991), is as follows:

$$ E{T}_o=\frac{0.408\varDelta \left({R}_n-G\right)+\gamma \frac{900}{T+273}{u}_2\left({e}_s-{e}_a\right)}{\varDelta +\gamma \left(1+0.34{u}_2\right)} $$
(1)

where,

ETo is the reference evapotranspiration (mm day−1); Rn is the net radiation at the crop surface (MJ m−2 day−1); G is the soil heat flux density (MJ m−2 day−1); T is the mean daily air temperature at 2 m height (°C); u2 is the wind speed at 2 m height (m s−1); es is the saturation vapor pressure (kPa); ea is the actual vapor pressure (kPa); es − ea is the saturation vapor pressure deficit (kPa); Δ is the slope of the vapor pressure curve (kPa °C−1); γ is the psychrometric constant (kPa °C−1).
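
For readers who wish to reproduce the reference values, a minimal numerical sketch of Eq. 1 is given below; the function name and the sample inputs are illustrative only and are not measured data from either study site.

```python
def penman_monteith_eto(Rn, G, T, u2, es, ea, delta, gamma):
    """FAO-56 Penman-Monteith reference evapotranspiration, Eq. (1).

    Rn, G in MJ m-2 day-1; T in deg C; u2 in m s-1; es, ea in kPa;
    delta, gamma in kPa per deg C.  Returns ETo in mm day-1.
    """
    numerator = 0.408 * delta * (Rn - G) + gamma * (900.0 / (T + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Illustrative values only (not taken from the Rasht or Johor records).
print(penman_monteith_eto(Rn=13.3, G=0.14, T=16.9, u2=2.1,
                          es=1.997, ea=1.409, delta=0.122, gamma=0.066))
```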

In the case where only Tmax and Tmin are available, Hargreaves and Samani (1985) suggested that ETo can be calculated as:

$$ E{T}_o={C}_o{\left({T}_{\max }-{T}_{\min}\right)}^{0.5}\left({T}_{mean}+17.8\right){R}_a $$
(2)

Where ETo is the reference evapotranspiration calculated by the HGS method (mm/day); Co is an empirical coefficient (commonly taken as 0.0023); Tmin, Tmax and Tmean are the minimum, maximum and mean temperature, respectively (°C); and Ra is the extraterrestrial radiation (mm/day).

The input data for training and simulation are Tmax, Tmin and Rs. If only Tmax and Tmin are available, Rs can be calculated from Eq. (3):

$$ {R}_s={K}_{Rs}\sqrt{{T}_{\max }-{T}_{\min }}\;{R}_a $$
(3)

where Rs is the solar radiation (MJ m−2 d−1); KRs is an adjustment coefficient (0.16–0.19); Tmax and Tmin are the maximum and minimum air temperature, respectively (°C); and Ra is the extraterrestrial radiation (MJ m−2 d−1), as used in Eq. 2.
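
The limited-data route of Eqs. 2 and 3 can be sketched in the same way. In the snippet below, the default KRs = 0.17 and the Hargreaves coefficient Co = 0.0023 are assumed values (the paper does not fix them), and the sample inputs are illustrative only.

```python
import math

def solar_radiation_from_temperature(Tmax, Tmin, Ra, K_Rs=0.17):
    """Eq. (3): estimate Rs (MJ m-2 d-1) from the daily temperature range and
    extraterrestrial radiation Ra (MJ m-2 d-1).  K_Rs is the adjustment
    coefficient (0.16-0.19); 0.17 is an assumed default."""
    return K_Rs * math.sqrt(Tmax - Tmin) * Ra

def hargreaves_samani_eto(Tmax, Tmin, Ra_mm, Co=0.0023):
    """Eq. (2): Hargreaves-Samani ETo (mm/day).  Ra_mm is extraterrestrial
    radiation expressed in mm/day of equivalent evaporation; Co = 0.0023 is
    the commonly used coefficient (the paper does not state its value)."""
    Tmean = 0.5 * (Tmax + Tmin)
    return Co * math.sqrt(Tmax - Tmin) * (Tmean + 17.8) * Ra_mm

# Illustrative values only.
print(solar_radiation_from_temperature(Tmax=28.0, Tmin=19.0, Ra=38.0))
print(hargreaves_samani_eto(Tmax=28.0, Tmin=19.0, Ra_mm=15.5))
```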

3.2 Model Architecture

Actual daily data of Tmax, Tmin and Rs over the 30 years between 1975 and 2005 for Rasht City, Iran are used in this study, while the ETo was calculated with the PM method. For a similar period the same parameters were collected for Johor Bahru city, Malaysia, and these data were used to train, test and validate the proposed ANN model. In general, the model architecture is designed to predict the average daily evapotranspiration. However, the nature of the data and its stochastic behavior differ from one month to another; in this context, the model is developed for the average daily evapotranspiration of each month separately.

In general, the ANN modeling method learns from examples: it uses the preceding and recent behavior of a system to predict its future changes with respect to changes in the input parameters (Bishop 1996). The major advantage of the ANN method is its capability to mimic the features of the input pattern and their mapping to the corresponding output. Furthermore, the ANN method has the potential to predict the behavior of systems without analytical prediction rules. One of its major difficulties is the selection of the ANN architecture, in terms of the input–output pattern, that provides the best result for the desired output.

In this context, two different scenarios, I and II, for the model architecture are employed to predict the monthly ETo for both case studies. Scenario I is organized such that the prediction of ETo at a particular month (t) of year (n) is based on Tmax, Tmin and Rs at month (t-1) of year (n), as presented in the following equation

$$ E{T}_o\left(t,n\right)=f\left({T}_{\left(t-1,n\right)}^{\max },{T}_{\left(t-1,n\right)}^{\min },{R}_{\left(t-1,n\right)}^s\right) $$
(4)

On the other hand, Scenario II is structured such that the prediction of ETo at month (t) of year (n) is based on Tmax, Tmin and Rs of the same month in year (n-1)

$$ E{T}_o\left(t,n\right)=f\left({T}_{\left(t,n-1\right)}^{\max },{T}_{\left(t,n-1\right)}^{\min },{R}_{\left(t,n-1\right)}^s\right) $$
(5)

The architecture of the two scenarios applied in this research is a three-layer network including an input layer, an output layer and a hidden layer as shown in Fig. 1a, b.

Fig. 1 Exact architecture of the ANN for the proposed two scenarios: a scenario I; b scenario II

The architecture of the network consists of an input layer of three neurons (corresponding to the monitored Tmax, Tmin and Rs), an output layer of one neuron (corresponding to the predicted ETo) and one or more hidden layers with an arbitrary number of neurons in each. In order to achieve the desired forecasting accuracy, 12 ANN architectures were developed (one for each month). Daily Tmax, Tmin, Rs and ETo for the 25-year period from 1975 to 2000 were utilized to train the 12 networks. The performance and reliability of the ANN models were examined using the data monitored between 2000 and 2002, and the capabilities of the developed ANN models were further verified for the period between 2003 and 2005.
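
As an illustration of how the per-month input–output patterns of Eq. 5 can be assembled before training, the sketch below builds the Scenario II patterns for one calendar month and splits them by year; the synthetic series and array names are placeholders, and Scenario I would be built analogously by pairing each month with the previous month of the same year.

```python
import numpy as np

def scenario_ii_patterns(tmax, tmin, rs, eto):
    """Eq. (5): for a fixed calendar month t, the predictors of year n-1
    (Tmax, Tmin, Rs) are paired with the ETo of the same month in year n.
    All inputs are 1-D arrays indexed by year."""
    X = np.column_stack([tmax[:-1], tmin[:-1], rs[:-1]])  # year n-1
    y = np.asarray(eto)[1:]                               # year n
    return X, y

# Synthetic monthly-average series standing in for 26 years of one month.
rng = np.random.default_rng(0)
years = 26
tmax = rng.normal(28.0, 2.0, years)
tmin = rng.normal(18.0, 2.0, years)
rs = rng.normal(18.0, 3.0, years)
eto = rng.normal(4.0, 1.0, years)

X_train, y_train = scenario_ii_patterns(tmax[:21], tmin[:21], rs[:21], eto[:21])
X_test, y_test = scenario_ii_patterns(tmax[20:], tmin[20:], rs[20:], eto[20:])
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```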

One classical problem of neural networks is overfitting, which occurs especially with noisy data. It has been observed that excessive training results in decreased generalization: instead of extracting the general properties of the different input patterns that match a certain output, training brings the network ever closer to each individual training pattern, leaving it less tolerant when presented with new patterns. One solution is to evaluate network performance on a different set of patterns than the one used for training; hence, only networks that retain the ability to generalize are rated highly.
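
A minimal, framework-agnostic sketch of this validation-based stopping rule is given below; the two callables stand in for whatever network training and evaluation routines are used, and the toy error curve is purely illustrative.

```python
import numpy as np

def train_with_early_stopping(train_step, val_error, max_epochs=500, patience=20):
    """Generic early-stopping loop: keep training while the error measured on a
    pattern set not used for training improves, and stop once it has not
    improved for `patience` epochs.  `train_step()` performs one training pass
    and `val_error()` returns the current validation error; both callables are
    placeholders for whatever network implementation is used."""
    best_err, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        train_step()
        err = val_error()
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_err

# Toy demonstration with a synthetic error curve that starts rising after epoch 60.
errs = iter([1.0 / (e + 1) + max(0, e - 60) * 1e-3 for e in range(500)])
print(train_with_early_stopping(lambda: None, lambda: next(errs)))
```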

3.3 Ensemble Neural Network Procedure

In order to achieve this goal, we use a sequence of the previous behavior of the system as the training data, generating a sequence of inputs of proper length and their corresponding outputs from the first 90 % of the 25 years (training data), with respect to the best period size identified in the previous section. Subsequently, we construct a series of networks with an initial guess for the number of hidden-layer neurons and initialize their parameters randomly. For every network, the parameter vector eventually settles on a local minimum of its performance surface; up to this point, all of the networks are overfitted on the training set. Afterward, a simulated annealing process is applied to each network. To do this, the model is modified to generate a set of vectors named the noise vectors. The length of each noise vector is equal to the length of each network's parameter vector, and its components are random numbers uniformly distributed between −0.05 and +0.05. By adding the noise vectors to the network parameter vectors, a new set of network parameters is obtained. This causes relatively minor changes in the location of each network in its state space.
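
The weight-perturbation step can be sketched as follows; the parameter vector here is a random stand-in for a trained network's flattened weights, and the half-width of 0.05 follows the interval quoted above.

```python
import numpy as np

def perturb_weights(weight_vector, rng, half_width=0.05):
    """One annealing-style perturbation step: add a noise vector whose
    components are uniform in [-half_width, +half_width] to the flattened
    network parameter vector, nudging the network to a nearby point of its
    state space before it is retrained to a new local minimum."""
    noise = rng.uniform(-half_width, half_width, size=weight_vector.shape)
    return weight_vector + noise

rng = np.random.default_rng(42)
w = rng.normal(size=50)           # stand-in for a trained network's parameters
w_new = perturb_weights(w, rng)
print(np.max(np.abs(w_new - w)))  # never exceeds 0.05
```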

Furthermore, during the learning phase, a random vector of length N is generated, where N is the length of the sequence of the first 25 years of time series values. This vector is called the data noise vector and is given by the following equation:

$$ {V}_{dn}=M\times {10}^{-2}\times z\times rand\left(1,N\right) $$
(6)

In this equation z is the number of networks added to the ensemble of neural networks before this step, and rand(1, N) is a 1 × N vector whose components are uniformly distributed random numbers between −0.05 and +0.05. Also, M = Max − Min, where Max and Min are the maximum and minimum values of the time series of the system's behavior, respectively. Once again, we select the network that has the best generalization on this new training data set, but this time the number of neurons in the hidden layers of the networks is calculated using the following equations:

$$ {n}_1^{\prime }=\left|x\times \frac{n_1}{n_2}\right| $$
(7)
$$ {n}_2^{\prime }=x+{n}_2 $$
(8)

Where n1 and n2 are the initial numbers of neurons in the first and second hidden layers of the first set of networks, and n′1 and n′2 are the new values. The value of x is obtained through the following equation:

$$ x=\left\{\left|\frac{IN+1}{2}\right|\times \mod \left(\left(IN-1\right),2\right)\right\}-\left\{\left[\left|\frac{IN+1}{2}\right|\times \left|\frac{1+\operatorname{sign}\left({n}_2-\left|\frac{IN+1}{2}\right|\right)}{2}\right|+\left(\left(2\times {n}_2-\left|\frac{IN+1}{2}\right|\right)-1\right)\times \left|\frac{1-\operatorname{sign}\left({n}_2-\left|\frac{IN+1}{2}\right|\right)}{2}\right|+0.5\right]\times \mod \left(IN,2\right)\right\} $$
(9)

In this equation, IN is the iteration number. Following the initial step, if IN is even the networks are constructed with the previous structure but with more neurons, whereas if IN is odd the number of neurons is decreased until a lower limit of zero is met; at that point the process continues by increasing the number of neurons in the first hidden layer (n1). This enables us to find a more suitable number of neurons during completion of the ensemble if the initial guess was not accurate and the networks need more (or fewer) neurons to achieve good generalization.
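
A sketch of the data noise vector of Eq. 6 together with the neuron-count update is given below. The update follows the verbal description of Eqs. 7–9 (grow the second hidden layer on even iterations, shrink it on odd ones); the exact step size and rounding conventions of Eq. 9 are an interpretation, since the typeset formula leaves them ambiguous.

```python
import numpy as np

def data_noise_vector(series, z, rng):
    """Eq. (6): V_dn = M x 10^-2 x z x rand(1, N), where M = max - min of the
    training series, z is the number of networks already in the ensemble and
    rand(1, N) has components uniform in [-0.05, +0.05]."""
    N = len(series)
    M = np.max(series) - np.min(series)
    return M * 1e-2 * z * rng.uniform(-0.05, 0.05, size=N)

def updated_hidden_sizes(n1, n2, IN):
    """Eqs. (7)-(9), following the verbal description in the text: on even
    iterations the second hidden layer grows, on odd iterations it shrinks
    (clamped here at one neuron).  The step size floor((IN+1)/2) and the
    rounding in Eq. (7) are assumed interpretations."""
    step = (IN + 1) // 2
    x = step if IN % 2 == 0 else -min(step, n2 - 1)
    n2_new = n2 + x                                  # Eq. (8)
    n1_new = abs(round(x * n1 / n2))                 # Eq. (7)
    return max(n1_new, 1), n2_new

rng = np.random.default_rng(1)
series = rng.normal(4.0, 1.0, 300)    # stand-in for a training ETo series
print(data_noise_vector(series, z=3, rng=rng)[:5])
print(updated_hidden_sizes(n1=6, n2=4, IN=2))
print(updated_hidden_sizes(n1=6, n2=4, IN=3))
```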

Consider a single NN that has been trained on a given data set. Let x denote an input vector not seen before and let d denote the corresponding desired response; x and d represent realisations of the random vector X and the random variable D, respectively. Let F(x) denote the input–output function realised by the network. Networks are trained with these noisy parameters until another local minimum is achieved. Making noise vectors and retraining are repeated a number of times, and the outputs of these networks are compared on the remaining 10 % of the 25 years, which is not used during the training steps. The winner has the best generalization amongst all and is selected as the first member of an ensemble of neural networks. Then, in light of the material on the bias/variance dilemma, we decompose the MSE between F(x) and the conditional expectation E[D|X = x] into a (squared) bias term, (ED[F(x)] − E[D|X = x])2, and a variance term, ED[(F(x) − ED[F(x)])2].

The expectation, ED, is taken over the space D, defined as the space encompassing the distribution of all training sets (i.e. inputs and target outputs) and the distribution of all initial conditions. There are different ways of individually training the networks and also different ways of combining their outputs. Here, we consider the situation where the networks have an identical configuration but are trained starting from different initial conditions, by analogy with Eq. 9.

After finding the best network in each set, we compute the sum of absolute errors in the prediction of the last 10 % of data using the following equations:

$$ {e}_1=\frac{{\displaystyle \sum_{i=1}^z{\displaystyle \sum_{j=1}^k\left|{T}_s(j)- \Pr \left(i,j\right)\right|}}}{z} $$
(10)
$$ {e}_2=\frac{{\displaystyle \sum_{i=1}^{z+1}{\displaystyle \sum_{j=1}^k\left|{T}_s(j)- \Pr \left(i,j\right)\right|}}}{z+1} $$
(11)

In the above equations, z is the number of networks added to the ensemble before this step, Ts is the sequence of the last 10 % of time-series events in the training data set, and k is the size of Ts. Pr(i,j) is the value that the ith member of the ensemble predicts for the jth event in the last 10 % of the time-series events. If e1 > e2, adding the best network of this step to the ensemble improves the overall generalization of the ensemble; otherwise, we do not add the selected network to the ensemble and repeat this step with new noisy data sets and a new set of networks with different numbers of neurons in their hidden layers. The terminating condition is as follows: a predefined number of iterations (namely itr) is considered, and at the end of these iterations the improvement of the ensemble predictions (on the last 10 %) is measured; if this improvement is smaller than a predefined factor, the termination condition is met, otherwise the process is repeated. Figure 2 illustrates the learning phase of the proposed ensemble ANN model.
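
The acceptance test of Eqs. 10 and 11 can be written compactly as below; the arrays of member predictions and the held-out series Ts are synthetic stand-ins.

```python
import numpy as np

def should_add_to_ensemble(preds_existing, pred_candidate, Ts):
    """Acceptance test built on Eqs. (10)-(11).

    preds_existing : (z, k) array, predictions of the z current ensemble
                     members on the held-out last 10 % of the training series Ts.
    pred_candidate : (k,) array, predictions of the candidate network.
    Returns True when adding the candidate reduces the error, i.e. e1 > e2."""
    Ts = np.asarray(Ts)
    z = preds_existing.shape[0]
    e1 = np.abs(Ts - preds_existing).sum() / z          # Eq. (10)
    all_preds = np.vstack([preds_existing, pred_candidate])
    e2 = np.abs(Ts - all_preds).sum() / (z + 1)         # Eq. (11)
    return e1 > e2

# Toy check: a candidate closer to Ts than the current members is accepted.
rng = np.random.default_rng(7)
Ts = np.linspace(3.0, 5.0, 15)
members = Ts + rng.normal(0, 0.4, size=(3, 15))
candidate = Ts + rng.normal(0, 0.1, size=15)
print(should_add_to_ensemble(members, candidate, Ts))
```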

Fig. 2 Learning-phase process for the ensemble neural network

3.4 Model Performance

In order to evaluate the proposed ANN model with the ensemble procedure, various statistical indices could be used. In this study, four measures of model performance were adopted: the mean absolute relative error (MARE), the mean absolute error (MAE), the mean square error (MSE) and the correlation coefficient (R2). The models were evaluated by the following equations:

$$ MARE=\frac{1}{N}{\displaystyle \sum \frac{\left|y-y^{\prime}\right|}{y}}\ast 100 $$
(12)
$$ MAE=\frac{{\displaystyle \sum \left|y-y^{\prime}\right|}}{N}\ast 100 $$
(13)
$$ MSE=\frac{{\displaystyle \sum {\left(y-y\prime \right)}^2}}{N} $$
(14)

Where y is the ETo calculated by PM, y′ is the ETo predicted by the model and N is the total number of data. R2 was obtained from the linear correlation between the actual and simulated data.
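
For completeness, the indices of Eqs. 12–14 (plus R2 from the linear correlation) can be computed as in the following sketch; note that the ×100 factor in the MAE is kept exactly as written in Eq. 13, and the sample values are illustrative only.

```python
import numpy as np

def performance_indices(y, y_pred):
    """MARE (Eq. 12), MAE (Eq. 13, with the x100 factor as written in the
    paper), MSE (Eq. 14) and R^2 between PM-based and modelled ETo."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    N = y.size
    mare = np.sum(np.abs(y - y_pred) / y) / N * 100.0
    mae = np.sum(np.abs(y - y_pred)) / N * 100.0
    mse = np.sum((y - y_pred) ** 2) / N
    r2 = np.corrcoef(y, y_pred)[0, 1] ** 2
    return mare, mae, mse, r2

# Illustrative values only.
y_pm = np.array([3.2, 4.1, 5.0, 4.4, 3.8])
y_ann = np.array([3.0, 4.3, 4.8, 4.5, 3.9])
print(performance_indices(y_pm, y_ann))
```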

As the prediction accuracy of the peak and low ETo events is of particular interest to water resources management, especially in the field of irrigation, it is important to evaluate the model performance considering these extreme events. Therefore, two additional statistical indices, namely the Peak Value Criteria (PVC) and the Low Value Criteria (LVC), were recommended to evaluate the model performance at the extreme values; they can be computed by Eqs. 15 and 16

$$ PVC=\frac{{\left({\displaystyle \sum_{i=1}^{T_p}{\left(E{T_o}_m-E{T_o}_p\right)}^2\ast {\left(E{T_o}_m\right)}^2}\right)}^{0.25}}{{\left({\displaystyle \sum_{i=1}^{T_p}{\left(E{T_o}_m\right)}^2}\right)}^{0.5}} $$
(15)
$$ LVC=\frac{{\left({\displaystyle \sum_{i=1}^{T_i}{\left(E{T_o}_m-E{T_o}_p\right)}^2\ast {\left(E{T_o}_m\right)}^2}\right)}^{0.25}}{{\left({\displaystyle \sum_{i=1}^{T_i}{\left(E{T_o}_m\right)}^2}\right)}^{0.5}} $$
(16)

Where Tp is the number of peak evapotranspiration events greater than one third of the mean peak ETo observed, and Ti is the number of low ETo events lower than one third of the mean low evapotranspiration observed. El-Shafie et al. (2009) reported that both PVC and LVC provide better performance indicators for assessing prediction model performance during extreme events. The lower the PVC or LVC value, the better the model fit.
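
A sketch of Eqs. 15 and 16 is given below; because the paper's event-selection rule (one third of the mean peak or low ETo) leaves some room for interpretation, the peak and low masks in the example are assumed percentile-based stand-ins.

```python
import numpy as np

def extreme_value_criterion(eto_m, eto_p, mask):
    """Generic form of Eqs. (15)-(16): evaluated over the events selected by
    `mask` (peak events for PVC, low events for LVC).  eto_m are the PM-based
    (measured) values and eto_p the predicted values."""
    m = np.asarray(eto_m, float)[mask]
    p = np.asarray(eto_p, float)[mask]
    num = (np.sum((m - p) ** 2 * m ** 2)) ** 0.25
    den = (np.sum(m ** 2)) ** 0.5
    return num / den

# Synthetic series; the masks below are assumed stand-ins for the paper's
# peak/low event definitions.
rng = np.random.default_rng(3)
eto_m = rng.normal(4.0, 1.2, 150).clip(0.5)
eto_p = eto_m + rng.normal(0, 0.3, 150)
peak_mask = eto_m > np.percentile(eto_m, 90)
low_mask = eto_m < np.percentile(eto_m, 10)
print(extreme_value_criterion(eto_m, eto_p, peak_mask))   # PVC
print(extreme_value_criterion(eto_m, eto_p, low_mask))    # LVC
```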

4 Results and Discussion

The daily PM ETo was calculated with the FAO ETo calculator (version 3.1, January 2009). The time series input data were supplied to the software, 30 years of daily ETo were generated and then sorted into monthly time series. The 12 networks (one for each month) were successfully trained on the data for the period 1975 to 2000. Different Multi-Layer Perceptron Artificial Neural Network (MLP-ANN) architectures (while keeping three neurons in the input layer and only one neuron in the output layer) were tested to identify the best performance. In fact, there is no formal and/or mathematical method for determining the appropriate "optimal set" of the key parameters of a neural network (number of hidden layers, number of neurons within each hidden layer and the type of transfer function between two consecutive layers). Therefore, it was decided to perform this task by trial and error. Several sets were examined, with a maximum of three hidden layers and a maximum of ten neurons within each layer. Table 1 shows the optimal number of neurons in the hidden layers for each network. It was decided to use the same numbers of neurons when implementing the model with scenario II. Furthermore, the test session was carried out for the data of both Rasht and Johor cities using the years 2001 to 2005. Since the ETo was accurately calculated using the PM method over the 2001–2005 period, the performance of the proposed ANN-based architecture could be examined and evaluated.

Table 1 The ANN architecture for each month

Before examining the model performance in detail using the statistical indices presented in Section 3.4, it is essential to pre-evaluate the model using the error distribution. The error distribution equation is as follows:

$$ Error=\frac{y-y^{\prime }}{y}\ast 100 $$
(17)

As an example of the model performance with scenario I, Fig. 3a, b shows the results achieved for the selected months of April and September for Rasht city. It should be highlighted here that these two months represent seasonal turning points in terms of climate conditions for both cities. Figure 3 shows the error distribution of the daily evapotranspiration output by the proposed model versus the PM method for the testing period between 2001 and 2005 (5 years × 30 days = 150 values). On the other hand, Fig. 4a, b shows the error distribution of the proposed ANN model for Johor city. In general, it can be noticed that the ANN model provides relatively reliable accuracy; however, the maximum error is relatively high, which reflects a weakness of the proposed model architecture under scenario I. For example, the maximum error for September exceeds 40 % on some days for both cities, and exceeds 25 % for April. Such relatively high errors are due to two major reasons: i) the model architecture and ii) model overfitting.

Fig. 3 Error distribution for the ANN model, scenario I, for April and September in Rasht City, Iran

Fig. 4 Error distribution for the ANN model, scenario I, for April and September in Johor City, Malaysia

In order to improve the performance of the model architecture, scenario II has been implemented and examined, as shown in Figs. 5 and 6 for the same selected months for both cities, respectively. Apparently, the results achieved with scenario II are relatively better than those of scenario I. It can be seen from Fig. 5a, b that the maximum error decreased to 15 % for Rasht city for both April and September, while it remained relatively high (30 %) for September for Johor city, as shown in Fig. 6; the maximum error was reduced to 15 % for April for Johor city.

Fig. 5 Error distribution for the ANN model, scenario II, for April and September in Rasht City, Iran

Fig. 6 Error distribution for the ANN model, scenario II, for April and September in Johor City, Malaysia

To better understand the difference between the performance of the two scenarios, further analysis of the overall performance (not on a monthly basis) of both scenarios has been carried out. Tables 2 and 3 give complete details of the selected statistical indices used to evaluate the performance of both scenarios. In addition, these tables compare the performance of the ANN with the PM method as a reference and with HGS as the empirical "competitor" method. It can be observed that the ANN provides a significant improvement in the accuracy of predicting evapotranspiration compared with the HGS method. Furthermore, there is no remarkable difference between the results achieved by the two classical ANN scenarios. The ANN model with scenario I provides better accuracy for 4 months (January, March, April and September) for Rasht city and only 1 month (October) for Johor city, as highlighted in gray in the associated cells of Tables 2 and 3, whereas the ANN model with scenario II achieves a better accuracy level than scenario I for the remaining months. Accordingly, in order to compare the two ANN scenarios, the statistical indices were averaged over all months; Tables 2 and 3 present the average values of MAE, MARE, MSE and R2 in the last row. The results show that scenario II outperforms scenario I and provides relatively higher performance. For further detail, an accuracy improvement (AI) index for each of the above statistical indices, measuring the significance of the scenario II model over the scenario I model, can be expressed as follows

Table 2 MAE and MARE for ETo prediction using HGS method, ANN scenario I and scenario II
Table 3 MSE and R2 for ETo prediction using HGS method, ANN scenario I and scenario II
$$ AI\%=\left(\frac{E_{scenario\;I}-{E}_{scenario\;II}}{E_{scenario\;I}}\right)\ast 100 $$
(18)

Where EscenarioI is the value of the statistical error given by scenario I, while EscenarioII is the same statistical error given by the proposed scenario II. It should be noted that, with Eq. 18 defined in this way, positive AI values for MAE, MARE and MSE and negative AI values for R2 indicate that scenario II provides better results than scenario I. Table 4 introduces the AI% values calculated for the above-mentioned statistical indices. It can be observed from the AI% values that scenario II significantly improves the accuracy level for most of the months.
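
Eq. 18 itself is a one-line computation; the sketch below uses illustrative error values, not entries from Tables 2–4.

```python
def accuracy_improvement(e_scenario1, e_scenario2):
    """Eq. (18): percentage improvement of scenario II over scenario I for a
    given statistic (MAE, MARE, MSE or R^2)."""
    return (e_scenario1 - e_scenario2) / e_scenario1 * 100.0

# Illustrative values only: a positive result means scenario II has the lower error.
print(accuracy_improvement(0.42, 0.31))
```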

Table 4 Accuracy Improvement AI% for MAE, MARE, MSE and R2 associated with the output of both scenarios

The ensemble procedure described in Section 3.3 was applied to overcome overfitting and improve the generalization of the training of the 12 networks with the scenario II architecture. For example, the ensemble ANN model is applied to the same two months (June and October), which experienced relatively poor accuracy on the testing data set because of overfitting during training. Figure 7 shows that seven and eight networks are selected as ensemble members for the above-mentioned months, and ten and eight networks for Rasht city and Johor city, respectively, before the termination conditions are met. As shown in Fig. 7, the ensemble ANN method reduces the MSE significantly compared with the classical ANN method (overfitted network). The performance of the ensemble ANN model on the testing data (2001 to 2005) was examined and is presented in Table 5, which summarizes the results of the proposed ensemble ANN for each month and each statistical index over the testing period. In addition, Table 5 shows the AI% of all the statistical indices over the classical ANN model with scenario II. It can be seen that the ensemble ANN model provides a better and more consistent level of accuracy for all months and all statistical indices compared with the classical ANN scenario II shown in Tables 2 and 3.

Fig. 7 The effect of increasing the number of ensemble neural networks on the mean square error for July and December

Table 5 MAE, MARE, MSE and R2 associated with the output of Ensemble ANN and its Accuracy Improvement AI% over classical ANN Scenario II

For further assessment, it is important to examine the proposed model for predicting the extreme values of evapotranspiration, as these extreme values are essential for water resources managers. Figure 8 shows the actual values of evapotranspiration versus the values predicted by the proposed model: Fig. 8a shows the actual and predicted values for the month of April for Rasht city, and Fig. 8b the same month for Johor city. The month of April was selected because it experienced the most fluctuating evapotranspiration values. It can be noticed that the proposed model accurately predicts the extreme values. In addition, the ensemble ANN model is examined for the peak and low ETo events using the PVC and LVC statistics discussed earlier in Section 3.4 and presented in Eqs. 15 and 16; the results are given in Table 6. As presented in Table 6, the developed ensemble ANN model provides accurate predictions, with PVC values ranging between 1.45 and 5.3 % for January and July, respectively, while the LVC values show smaller prediction errors than the PVC, ranging between 1.2 and 5 % for November and March, respectively. On the other hand, the PVC and LVC values achieved by the classical ANN model are relatively higher than those of the ensemble ANN model, ranging between 2.2 and 9.53 for PVC and between 1.25 and 5.84 for LVC, for January, August, November and September, respectively. As a result, the proposed ensemble ANN model outperforms the classical ANN model and significantly reduces the prediction error, not only for the medium range of ETo but also for the ETo extreme values.

Fig. 8 Actual versus predicted evapotranspiration using the ensemble ANN model for the month of April: a Rasht; b Johor

Table 6 Classical ANN Scenario II and Ensemble ANN performance based on the peak and low flow error criteria

5 Conclusion

This study introduced a procedure for developing an Artificial Neural Network (ANN) model for predicting daily evapotranspiration on a monthly time-series basis using only the maximum and minimum temperature and solar radiation. The model was examined using actual data for two cities that experience different climatic conditions, namely Rasht in Iran and Johor in Malaysia. Two different scenarios for the input–output pattern of the ANN model architecture were developed in order to optimize the accuracy level. In addition, a generalization technique, namely the Ensemble ANN, was developed to overcome the over-fitting problem experienced while training the ANN model. The proposed ANN model was evaluated and compared with the traditional Hargreaves-Samani (HGS) model. The results showed that the classical ANN model outperformed the HGS model and achieved reliable performance, including low MAE, MARE and MSE and high R2. In addition, the proposed ANN with the ensemble procedure showed significant enhancement over the classical ANN. Furthermore, the ensemble ANN proved its reliability and a consistently high level of accuracy when examined for predicting the extreme values of ETo. This ability of the proposed ensemble ANN is extremely useful for predicting evapotranspiration, which is highly needed by water resources managers when the availability of data is limited.