1 Introduction

Accurate estimation of reference evapotranspiration (ET0) is needed for water resources management, farm irrigation scheduling, and environmental assessment. A large number of methods have been developed for estimating ET0 from meteorological data. The FAO recommends the Penman–Monteith (PM) method as the sole method for calculating reference evapotranspiration wherever the required input data are available (Allen et al. 1998). The PM method is a physically based approach that requires air temperature, relative humidity, solar radiation, and wind speed. Details of the PM equation are provided in FAO Irrigation and Drainage Paper No. 56 (FAO-56) (Allen et al. 1998). Unfortunately, even in developed countries only a limited number of meteorological stations measure all of these climatic variables accurately. Empirical ET0 models that require fewer variables exist, and in the past decade considerable attention has been focused on evaluating them. For example, Trajkovic and Kolakovic (2009) evaluated five ET0 estimation methods by comparing their estimates with those of the reference FAO-56 Penman–Monteith (FAO-PM) equation under humid conditions. They showed that Turc's method gave the best ET0 estimates and ranked first, followed, in decreasing order, by the Priestley–Taylor, Jensen–Haise, Thornthwaite, and Hargreaves equations. Tabari (2010) evaluated four simpler models based on their monthly performance for various climates in Iran and reported that the Makkink and Priestley–Taylor models estimated ET0 less accurately than the Turc and Hargreaves models for all climates. Chauhan and Shrivastava (2009) compared the performance of four climate-based methods and artificial neural networks (ANNs) for estimating ET0 in India when the available climatic parameters are insufficient to apply the FAO-PM method. They concluded that the ANN models performed better than the climate-based methods.

Evaporation pans (Class A pan, US Weather Bureau) are used extensively throughout the world to estimate ET0. Pan evaporation (Ep) reflects the combined effect of temperature, humidity, wind speed, and solar radiation, the same factors that drive reference crop evapotranspiration, and can therefore be used to estimate ET0 with reasonable accuracy (Irmak et al. 2002). Numerous studies have shown that Ep and ET0 are highly correlated when evaporation pans are properly maintained (Jensen et al. 1961; Doorenbos and Pruitt 1977; Irmak et al. 2002). Because the evaporation rate from an open pan differs from the ET0 rate from a vegetated surface, the conventional method multiplies Ep by a pan coefficient (Kp) to account for the differences between the grass reference surface and open water.

Doorenbos and Pruitt (1977) reported that Kp values range from 0.40 to 0.85, depending on the upwind fetch distance (F) and on climatic parameters such as the wind speed at 2 m height (U2) and the air relative humidity (RH). The fetch is the horizontal distance over which the wind blows across green vegetation or a dry surface before reaching the pan, so the ground cover around the station influences Kp. Two cases of evaporation pan siting are considered (Allen et al. 1998): 1) the pan is sited on a short green vegetation cover (green fetch) and surrounded by fallow soil, and 2) the pan is sited on fallow soil (dry fetch) and surrounded by a green crop.

Kp values were first published by Jensen (1974) and subsequently tabulated in FAO-24 (Doorenbos and Pruitt 1977). Doorenbos and Pruitt (1977) gave Kp values for the two cases of pan siting in tabular form for a number of fetch distances under different wind speed and relative humidity conditions. In their table, F is given quantitatively, whereas U2 and RH are given only as classes. When the Kp values were first reported, no computers were available. Later, when computers and data loggers were developed and electronic data transmission became possible, Ep could be converted to ET0 automatically without table look-up (Snyder 1992). Since then, several empirical equations for calculating daily Kp values have been developed from the Doorenbos and Pruitt (1977) table using linear, nonlinear, and indicator regression techniques (Frevert et al. 1983; Cuenca 1989; Allen and Pruitt 1991; Snyder 1992; Raghuwanshi and Wallender 1998).

The question of which equation predicts Kp most accurately has been considered in several studies. Irmak et al. (2002) evaluated the techniques of Frevert et al. (1983) and Snyder (1992) for converting Ep to ET0 in the humid climate of Gainesville, Florida. With the FAO-PM method as the reference for this climatic condition, ET0 calculated with daily Kp values from the equation of Frevert et al. (1983) gave more accurate daily, monthly, and annual totals than ET0 calculated with Kp values from the equation of Snyder (1992), which tended to overestimate FAO-PM ET0, especially in summer (Irmak et al. 2002). In another study, Sabziparvar et al. (2010) compared seven existing pan models for estimating Kp in two different climates of Iran. For the cold semi-arid climate, the Orang and Raghuwanshi–Wallender models ranked first and second, respectively, whereas for the warm arid climate the Snyder and Orang models ranked first and second, respectively. Trajkovic and Kolakovic (2010) evaluated the reliability of simplified pan-based approaches for estimating ET0: three pan-based equations (FAO-24 pan, Snyder ET0, and Ghare ET0) were compared against lysimeter measurements of grass evapotranspiration using daily data from Policoro, Italy. Based on summary statistics, the Snyder ET0 equation ranked first, with the lowest RMSE.

Most of the Kp equations above were developed for pans with a green fetch; only two equations have been presented for pans with a dry fetch. Allen and Pruitt (1991) developed a nonlinear Kp equation for a Class A pan surrounded by fallow soil, which was presented by Allen et al. (1998) in FAO-56. Abdel-Wahed and Snyder (2008) found the equation of Allen et al. (1998) somewhat complex and therefore proposed a simpler equation for calculating daily Kp values for a pan placed in a dry fallow area. At most weather stations in Iran, especially in arid and semi-arid environments, evaporation pans are placed in dry fallow areas, so it is important to select an appropriate Kp equation. The first objective of this study was therefore to compare the Allen et al. (1998) and Abdel-Wahed and Snyder (2008) equations for estimating ET0 against the FAO-PM method, using data collected in a semi-arid climate of Iran. Because no measured ET0 data were available at this location, the FAO-PM method, which the FAO has accepted as the standard method for estimating ET0 (Allen et al. 1998), was used as the reference for testing the accuracy of the Kp equations.

M5 model trees have recently been used successfully for flood forecasting (Solomatine and Xue 2004), modeling water level–discharge relationships (Bhattacharya and Solomatine 2005), rainfall–runoff modeling (Solomatine and Dulal 2003), sedimentation modeling (Bhattacharya and Solomatine 2006), and ET0 estimation (Pal and Deswal 2009). Pal and Deswal (2009) investigated the potential of an M5 model tree regression approach for modeling daily ET0 from four inputs: solar radiation, average air temperature, average relative humidity, and average wind speed. Their results suggested that M5 model trees can successfully be employed to model ET0. The second objective of this study was to examine the potential of this approach for converting Ep to ET0, and the final objective was to compare the conventional approach with the M5 model tree.

2 Materials and Methods

2.1 Study Area and Data

The study area was Khuzestan province, which lies between latitudes 29.95°N and 32.9°N and longitudes 47.6°E and 50.6°E. Khuzestan province is in south-western Iran, borders Iraq and the Persian Gulf, and covers an area of 63,238 km2. Under the Köppen climate classification, the province has a semi-arid climate. Average annual rainfall ranges from 320 mm in the east to 145 mm in the west and occasionally reaches 400 mm in the east. According to data from the meteorological stations, most of the annual rainfall occurs in winter and late fall. Air temperature reaches its maximum in August and its minimum in January. Over the past decade, average annual temperature across the province has ranged from 21.5 °C in the north to 25.3 °C in the south. The maximum temperature of the warmest month ranges from 38 °C to 47 °C, while the minimum temperature of the coldest month ranges from 1.5 °C to 4 °C. Daily mean relative humidity ranges from 13 to 92 %, with an annual average of 54 %. The highest wind speed, approximately 259 km day−1, usually occurs in December. Wind speed is usually lowest from June through September, ranging from 47 to 145 km day−1 and averaging 96 km day−1.

Measured weather data were obtained from eight weather stations across the study area with varying latitudes, longitudes, and elevations; their spatial distribution is shown in Fig. 1. The stations belong to the Meteorological Organization of Iran. Site information and mean annual values of the relevant weather variables are given in Table 1. The dataset consists of 12 years (1997–2008) of daily records of maximum and minimum air temperature, Tx and Tn (°C); relative humidity, RH (%); wind speed, U (m s−1); bright sunshine hours, n (h); and Class A pan evaporation, Ep (mm d−1). Monthly means of these daily data were used to estimate Kp and ET0 on a monthly basis. Air temperature and relative humidity were measured at 2 m and wind speed at 10 m above the soil surface. Wind speeds at 2 m (U2) were obtained from those measured at 10 m using the logarithmic wind profile equation. The Class A pan evaporimeters (USWB) were made of galvanized steel, 0.25 m deep and 1.21 m in diameter, and their bottoms were supported 0.15 m above ground level on open-frame wooden platforms. The water level in the pans was maintained between 5.0 and 7.5 cm below the rim. Ep was measured daily at the stations at 7:00 a.m. local time.
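The 10 m to 2 m wind conversion can be sketched as below. This assumes the log-wind profile equation used is the one given in FAO-56 (Allen et al. 1998, Eq. 47) for measurements over short grass; the function name `wind_at_2m` is illustrative, not from the study.

```python
import math


def wind_at_2m(u_z: float, z: float) -> float:
    """Convert wind speed u_z (m/s) measured at height z (m) to 2 m height.

    Assumes the FAO-56 logarithmic wind profile (Allen et al. 1998, Eq. 47):
        u2 = u_z * 4.87 / ln(67.8 * z - 5.42)
    """
    return u_z * 4.87 / math.log(67.8 * z - 5.42)


# At the 10 m mast height the conversion factor is roughly 0.75.
factor_10m = wind_at_2m(1.0, 10.0)
print(f"U2/U10 = {factor_10m:.3f}")
```

Because the relation is linear in u_z, a single factor per mast height can be applied to every daily record before monthly averaging.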

Fig. 1 Study area and location of the weather stations

Table 1 Summary of weather stations used in the study

2.2 Conventional Method of Estimating ET0

The basic form of the conventional method as described in FAO-24 (Doorenbos and Pruitt 1977) is ET0 = Kp × Ep. In this study, the two Kp equations proposed by Allen et al. (1998) and Abdel-Wahed and Snyder (2008) were evaluated (Table 2). The Kp equations are functions of daily mean relative humidity, RH (%), daily mean wind speed, U2 (m s−1), and fetch distance, F (m), as defined by Doorenbos and Pruitt (1977). All the stations used in this study are surrounded by dry fallow land, so F was taken as 1,000 m in the Kp calculations.
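A minimal sketch of the conventional conversion for a dry-fetch Class A pan is given below. It assumes that the Allen et al. (1998) equation in Table 2 is the FAO-56 "case B" (dry fetch) Class A pan expression as we recall it from FAO-56; the coefficients should be checked against Table 2 before use. The Abdel-Wahed and Snyder (2008) equation is not reproduced here, and the function names are illustrative.

```python
import math


def kp_dry_fetch(rh_mean: float, u2: float, fetch_m: float) -> float:
    """Pan coefficient for a Class A pan sited on dry fallow soil.

    ASSUMPTION: FAO-56 "case B" Class A pan expression (Allen et al. 1998);
    verify coefficients against Table 2.
    rh_mean: mean relative humidity (%); u2: wind speed at 2 m (m/s);
    fetch_m: fetch distance F (m).
    """
    ln_f = math.log(fetch_m)
    ln_u = math.log(86.4 * u2)  # 86.4 converts m/s to km/day
    return (0.61
            + 0.00341 * rh_mean
            - 0.000162 * u2 * rh_mean
            - 0.00000959 * u2 * fetch_m
            + 0.00327 * u2 * ln_f
            - 0.00289 * u2 * ln_u
            - 0.0106 * ln_u * ln_f
            + 0.00063 * ln_f ** 2 * ln_u)


def et0_from_pan(ep: float, kp: float) -> float:
    """Conventional conversion ET0 = Kp * Ep (FAO-24), ET0 in the units of Ep."""
    return kp * ep


# Illustrative monthly means (not study data): RH = 54 %, U2 = 2 m/s, F = 1,000 m.
kp = kp_dry_fetch(54.0, 2.0, 1000.0)
print(f"Kp = {kp:.3f}, ET0 = {et0_from_pan(8.0, kp):.2f} mm/day for Ep = 8 mm/day")
```

For these illustrative inputs the sketch gives Kp of about 0.55, which falls inside the 0.40 to 0.85 range quoted from Doorenbos and Pruitt (1977).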

Table 2 Kp equations in the evaluation analysis

2.3 M5 Model Tree

The M5 model tree was first presented by Quinlan (1992). It is a binary decision tree with linear regression functions at the terminal (leaf) nodes, which together relate the independent variables to the dependent variable. Unlike classification trees, which are used for categorical data, model trees can also be used for quantitative data (Quinlan 1992; Mitchell 1997). Generating an M5 model tree involves two stages (Quinlan 1992; Solomatine and Xue 2004). The first stage splits the data into subsets to create a decision tree. The splitting criterion treats the standard deviation of the target (class) values that reach a node as a measure of the error at that node and calculates the expected reduction in this error from testing each attribute at that node. The standard deviation reduction (SDR) is computed as follows (Pal and Deswal 2009):

$$ \mathrm{SDR}=\mathrm{sd}(T)-\sum_{i}\frac{|T_i|}{|T|}\,\mathrm{sd}(T_i) $$
(3)

where T denotes the set of examples that reaches the node, Ti denotes the subset of examples that have the ith outcome of the potential split, and sd denotes the standard deviation (Wang and Witten 1997). As a result of splitting, the data in the child (lower) nodes have a smaller standard deviation than those at the parent node. After all possible splits are examined, the one that maximizes the expected error reduction is chosen. However, this division often produces a large tree that may overfit the data and generalize poorly. To overcome this problem, in the second stage the overgrown tree is pruned and the pruned subtrees are replaced with linear regression functions. This technique of generating the model tree substantially increases the accuracy of estimation (Quinlan 1992). Figure 2a shows the input space X1 × X2 (independent variables) split into six subspaces (leaves) by the M5 model tree algorithm, with a linear regression function, labeled LM1 through LM6, built at each leaf. Figure 2b shows the same model as a tree diagram, with LM1 to LM6 at the leaf level. Further details of the M5 model tree can be found in Quinlan (1992).
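The first-stage splitting criterion can be illustrated with the short sketch below, which evaluates Eq. (3) and scans the candidate thresholds of one numeric attribute for the binary split with the largest SDR. It assumes population standard deviation (so single-element subsets are allowed) and midpoint thresholds; it is not the Weka implementation, and the function names are illustrative.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def sdr(parent: Sequence[float], subsets: Sequence[Sequence[float]]) -> float:
    """Eq. (3): sd(T) - sum_i |T_i| / |T| * sd(T_i)."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_threshold(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, SDR) of the best split x <= t versus x > t."""
    pairs = sorted(zip(x, y))
    targets: List[float] = [v for _, v in pairs]
    best_t, best_gain = float("nan"), float("-inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # equal attribute values cannot be separated
        t = (pairs[k - 1][0] + pairs[k][0]) / 2.0  # midpoint threshold
        gain = sdr(targets, [targets[:k], targets[k:]])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Toy data: the target jumps when the attribute passes 6.5.
print(best_threshold([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))
```

A split that separates the two target groups perfectly removes all within-subset variation, so its SDR equals sd(T) itself.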

Fig. 2 Example of M5 model tree: a splitting of the input space X1 × X2 by the M5 model tree algorithm; b diagram of the model tree with six linear regression models at the leaves

In this study, pan evaporation (mm d−1), relative humidity (%), and daily mean wind speed (m s−1) were selected as inputs to the M5 model tree for estimating reference evapotranspiration. All data from the Mahshahr, Ramhormoz, Izeh, and Bostan stations (1997–2008) were pooled into one training set, so that the resulting model tree would have greater regional applicability and could be used to estimate ET0 at other locations in Khuzestan. All data from the Aghajari, Behbahan, Masjedsoliman, and Shushtar stations (1997–2008) were then used to test the model.
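The station-wise training/testing partition described above can be sketched as follows. The record layout (keys `station`, `Ep`, `RH`, `U2`, `ET0`) and the demo values are hypothetical; only the station names come from the text.

```python
from typing import List, Tuple

# Station grouping as described in the text.
TRAIN_STATIONS = {"Mahshahr", "Ramhormoz", "Izeh", "Bostan"}
TEST_STATIONS = {"Aghajari", "Behbahan", "Masjedsoliman", "Shushtar"}


def split_by_station(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Pool all months of the training stations; hold out the other stations."""
    train = [r for r in records if r["station"] in TRAIN_STATIONS]
    test = [r for r in records if r["station"] in TEST_STATIONS]
    return train, test


def to_xy(records: List[dict]) -> Tuple[List[Tuple[float, float, float]], List[float]]:
    """Model-tree inputs (Ep, RH, U2) and FAO-PM ET0 target (hypothetical keys)."""
    x = [(float(r["Ep"]), float(r["RH"]), float(r["U2"])) for r in records]
    y = [float(r["ET0"]) for r in records]
    return x, y


# Made-up records for illustration only.
demo = [
    {"station": "Izeh", "Ep": 4.1, "RH": 60.0, "U2": 1.5, "ET0": 3.2},
    {"station": "Shushtar", "Ep": 9.8, "RH": 25.0, "U2": 2.4, "ET0": 7.9},
]
train, test = split_by_station(demo)
print(len(train), len(test))
```

Holding out whole stations, rather than random months, tests whether the tree transfers to locations it has never seen.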

2.4 The FAO Penman–Monteith Method (FAO-PM)

In this study, the performance of the empirical methods and the M5 model tree was evaluated against the standard FAO Penman–Monteith method. Ideally, empirical methods would be tested against lysimeter-measured data, but such data are not available in the study area. The following equation was applied for the PM method (Allen et al. 1998):

$$ \mathrm{ET}_0=\frac{0.408\,\Delta\left(R_n-G\right)+\gamma\,\frac{900}{T_a+273}\,U_2\left(e_s-e_a\right)}{\Delta+\gamma\left(1+0.34\,U_2\right)} $$
(4)

where ET0 is the reference crop evapotranspiration (mm d−1), Rn is the daily net radiation (MJ m−2 d−1), G is the daily soil heat flux (MJ m−2 d−1), Ta is the mean daily air temperature at a height of 2 m (°C), U2 is the daily mean wind speed at a height of 2 m (m s−1), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), ∆ is the slope of the saturation vapor pressure versus air temperature curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). The two terms in the numerator are the radiation term and the aerodynamic term, respectively.
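Eq. (4) translates directly into a function; the sketch below is a straightforward transcription with the units listed above. The input values in the usage line are illustrative only and are not data from this study.

```python
def fao_pm_et0(delta: float, rn: float, g: float, gamma: float,
               t_mean: float, u2: float, es: float, ea: float) -> float:
    """Reference evapotranspiration ET0 (mm/day) from Eq. (4).

    delta, gamma in kPa/degC; rn, g in MJ m^-2 d^-1; t_mean in degC;
    u2 in m/s; es, ea in kPa.
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return (radiation_term + aerodynamic_term) / denominator


# Illustrative inputs only; G = 0 as assumed in this study.
et0 = fao_pm_et0(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
                 t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
print(f"ET0 = {et0:.2f} mm/day")
```

Keeping the radiation and aerodynamic terms as separate variables makes it easy to inspect which term dominates at a given station.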

In this study, the daily values of ∆, Rn, es, and ea were calculated using the equations given by Allen et al. (1998). An albedo of 0.23 (green vegetation surface) was used for Rn. Because G is usually small compared with Rn and is difficult to measure, it was assumed to be zero for both the daily and monthly calculation time steps (Allen et al. 1998). The measured RH, Tx, and Tn values were used to calculate ea and es. The daily solar (shortwave) radiation (Rs) was calculated using the Angstrom formula, which relates solar radiation to extraterrestrial radiation and relative sunshine duration, and Eq. (39) in Allen et al. (1998) was used to calculate the net outgoing longwave radiation.
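The auxiliary quantities can be sketched from the standard FAO-56 expressions as below. Two points are assumptions not stated in the text: ea is taken from mean RH (FAO-56 Eq. 19), and the Angstrom coefficients a_s = 0.25 and b_s = 0.50 are the FAO-56 defaults for uncalibrated sites. For a monthly step, the day of year would typically be that of mid-month. Function names are illustrative.

```python
import math

SIGMA = 4.903e-9  # Stefan-Boltzmann constant, MJ K^-4 m^-2 d^-1
GSC = 0.0820      # solar constant, MJ m^-2 min^-1


def sat_vp(t: float) -> float:
    """Saturation vapour pressure e0(T), kPa."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def slope_svp(t_mean: float) -> float:
    """Slope Delta of the saturation vapour pressure curve, kPa/degC."""
    return 4098.0 * sat_vp(t_mean) / (t_mean + 237.3) ** 2


def psychrometric_constant(elev_m: float) -> float:
    """gamma (kPa/degC) from atmospheric pressure at the station elevation."""
    pressure = 101.3 * ((293.0 - 0.0065 * elev_m) / 293.0) ** 5.26
    return 0.665e-3 * pressure


def vapour_pressures(tmax: float, tmin: float, rh_mean: float):
    """es from Tx and Tn; ea from mean RH (ASSUMPTION: FAO-56 Eq. 19)."""
    es = (sat_vp(tmax) + sat_vp(tmin)) / 2.0
    return es, rh_mean / 100.0 * es


def net_radiation(tmax, tmin, ea, n_hours, lat_deg, doy, elev_m,
                  albedo=0.23, a_s=0.25, b_s=0.50):
    """Net radiation Rn = Rns - Rnl, MJ m^-2 d^-1 (a_s, b_s ASSUMED defaults)."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)
    ws = math.acos(-math.tan(phi) * math.tan(decl))  # sunset hour angle
    ra = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))
    rs = (a_s + b_s * n_hours / (24.0 / math.pi * ws)) * ra  # Angstrom formula
    rso = (0.75 + 2e-5 * elev_m) * ra                        # clear-sky radiation
    rns = (1.0 - albedo) * rs
    tk4 = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
    ratio = min(rs / rso, 1.0)                               # Rs/Rso capped at 1
    rnl = SIGMA * tk4 * (0.34 - 0.14 * math.sqrt(ea)) * (1.35 * ratio - 0.35)
    return rns - rnl


# Illustrative mid-July inputs for a site near 31 deg N (not study data).
es, ea = vapour_pressures(45.0, 28.0, 30.0)
print(round(net_radiation(45.0, 28.0, ea, 12.0, 31.0, 196, 20.0), 2))
```

With G = 0, these values feed Eq. (4) together with U2 and the mean temperature.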

2.5 Statistical Analysis

The models (M5 and the two Kp equations) were compared with the FAO-PM model using: (1) a least-squares linear regression, Y = mX + c, between ET0 computed by the FAO-PM equation and ET0 estimated by each of the three methods, where m and c are the slope and intercept of the regression line; (2) the coefficient of determination (R²); and (3) the root mean square error (RMSE). For a perfect correlation with no bias, m = 1, c = 0, R² = 1, and RMSE = 0.
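The three agreement statistics can be computed as in the sketch below. It assumes X is the FAO-PM ET0 and Y the estimate, takes R² as the squared Pearson correlation of the fitted simple regression, and computes RMSE between estimate and reference, not between estimate and fitted line. It requires non-constant series.

```python
import math
from typing import Sequence, Tuple


def agreement_stats(reference: Sequence[float],
                    estimate: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (m, c, R^2, RMSE) for the fit estimate = m * reference + c."""
    n = len(reference)
    mx = sum(reference) / n
    my = sum(estimate) / n
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in estimate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, estimate))
    m = sxy / sxx              # least-squares slope
    c = my - m * mx            # intercept
    r2 = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(reference, estimate)) / n)
    return m, c, r2, rmse


# A perfect, unbiased estimator gives m = 1, c = 0, R^2 = 1, RMSE = 0.
print(agreement_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
```

R² alone cannot detect a systematic bias (a constant offset still gives R² = 1), which is why the slope, intercept, and RMSE are reported alongside it.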

3 Results and Discussion

To assess the estimation capacity of the Kp equations and to express the interactions between the variables, a correlation matrix was prepared for the training and testing data sets (Table 3). Based on an F test at the 95 % level, nearly all variables are significantly intercorrelated. Table 3 shows that the linear correlation between Ep and ET0 is high (0.98 and 0.96 for the training and testing data sets, respectively), suggesting that a model built on Ep should be able to estimate ET0 satisfactorily. The Kp values from both equations are also significantly correlated with FAO-PM ET0. A model's accuracy can be improved by including, in addition to Ep, variables that account for aerodynamic effects on ET0, such as the humidity and wind speed used in the Kp equations. The correlation coefficients between the two Kp equations and FAO-PM ET0 are negative, indicating that ET0 increases as Kp decreases. This can be attributed to the fact that lower relative humidity and higher wind speed both decrease Kp and, by reducing the aerodynamic resistance to ET0, increase ET0. Of the two Kp equations, the Allen et al. (1998) equation shows the stronger correlation with FAO-PM ET0 (r = −0.92 for both data sets).
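A correlation matrix of this kind, with the F statistic for testing each coefficient, can be sketched as below. The form F = r²(n − 2)/(1 − r²) with (1, n − 2) degrees of freedom is the standard test of zero linear correlation; the text does not give the exact test used, so this is an assumption. The resulting F would be compared against the tabulated 95 % critical value.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def f_statistic(r: float, n: int) -> float:
    """F = r^2 (n - 2) / (1 - r^2) with (1, n - 2) df (ASSUMED test form)."""
    return r * r * (n - 2) / (1.0 - r * r)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Upper triangle of the correlation matrix, keyed by variable-name pairs."""
    return {(p, q): pearson_r(columns[p], columns[q])
            for p, q in combinations(columns, 2)}


# Toy columns (not study data): Kp falls as ET0 rises.
print(correlation_matrix({"ET0": [3.0, 5.0, 7.0], "Kp": [0.7, 0.6, 0.5]}))
```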

Table 3 Correlation matrix between ET0–PM, relative humidity (RH), wind velocity (U2), Kp equations, and pan evaporation (Ep) for the training and testing data sets

All monthly Kp values calculated from the two Kp equations were averaged over the 12 years to obtain mean monthly Kp. The monthly Kp values calculated with Eqs. 1 and 2 for all the stations are compared in Fig. 3. The seasonal pattern of monthly Kp was similar for both equations, but Eq. 1 gave lower Kp values than Eq. 2 in all months.
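Averaging the monthly Kp values over the 12 years amounts to grouping by calendar month, as in this sketch. The dictionary layout `{(year, month): Kp}` is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple


def mean_monthly(kp_by_year_month: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Map {(year, month): Kp} to {month: Kp averaged over all years}."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for (_year, month), kp in kp_by_year_month.items():
        totals[month] += kp
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Made-up values for two Januaries and one February.
print(mean_monthly({(1997, 1): 0.5, (1998, 1): 0.7, (1997, 2): 0.6}))
```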

Fig. 3 Calculated monthly Kp values using the Kp equations

The model tree was built from the training data set with the Weka software (Witten and Frank 2005). The model tree generated by the M5 algorithm is shown in Fig. 4; four rules (LM1 to LM4) were generated. Figure 5 shows a scatter plot of ET0 estimated by the FAO-PM method against ET0 estimated by the M5 model for the whole training data set. The fitted line (m = 1.0, c = −0.006, R² = 0.99) indicates very good agreement, with little scatter around the line.
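How a fitted model tree produces an estimate can be sketched as follows: an input is routed through the binary splits to one leaf, and that leaf's linear model is evaluated. The tree below is HYPOTHETICAL; its thresholds and coefficients are invented and are not the model of Fig. 4. Weka's M5P also smooths leaf predictions by default, as we understand it, which this unsmoothed sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Union


@dataclass
class Leaf:
    """Terminal node holding one linear model (cf. LM1 to LM4)."""
    intercept: float
    coefs: Dict[str, float]

    def predict(self, x: Mapping[str, float]) -> float:
        return self.intercept + sum(w * x[name] for name, w in self.coefs.items())


@dataclass
class Split:
    """Internal node: go left if x[attribute] <= threshold, else right."""
    attribute: str
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x: Mapping[str, float]) -> float:
        child = self.left if x[self.attribute] <= self.threshold else self.right
        return child.predict(x)


Node = Union[Leaf, Split]

# HYPOTHETICAL toy tree over the study's inputs (Ep, RH, U2); values invented.
toy_tree = Split(
    "Ep", 6.0,
    left=Leaf(0.2, {"Ep": 0.7}),
    right=Split("RH", 40.0,
                left=Leaf(0.5, {"Ep": 0.6, "U2": -0.1}),
                right=Leaf(0.3, {"Ep": 0.65})),
)

print(toy_tree.predict({"Ep": 5.0, "RH": 50.0, "U2": 2.0}))
```

Each leaf is an ordinary linear regression, so the tree as a whole is a piecewise-linear function of Ep, RH, and U2.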

Fig. 4 Linear models generated by M5 model tree

Fig. 5 Scatter plot of ET0 estimated by the FAO-PM method against ET0 estimated by the M5 model tree, using the training data set

The ET0 estimates of the developed M5 model tree and the conventional Kp equations for the test locations are shown as scatterplots in Fig. 6. The M5 estimates are clearly closer to the corresponding FAO-PM ET0 values than those of the two Kp equations: the m and c coefficients of the fitted lines for the M5 model are closer to 1 and 0, respectively, and its R² values are higher than those of the Kp equations. For each station, the slope of the M5 fitted line is close to one, i.e., the points lie near the 1:1 line; m varies between 0.98 and 1.13, with an average of 1.04, which shows that the M5 model performs well for estimating ET0. The Allen et al. (1998) and Abdel-Wahed and Snyder (2008) equations compared less favorably with the FAO-PM values than the M5 model did. The Kp equations overestimated ET0 at all locations, and this overestimation was consistent throughout the study area.

Fig. 6 Comparison between ET0 calculated by the FAO-PM method and by three methods at four test weather stations: a M5 model tree, b Allen et al. (1998), and c Abdel-Wahed and Snyder (2008)

The statistical results are reported in Table 4. According to these results, the M5 method appears to be the best of the three for calculating ET0 in the Khuzestan plain (semi-arid climate). Its coefficient of determination (R²) and slope are close to 1, and its RMSE of 0.50 mm d−1 can also be considered acceptable relative to the average ET0 of 5.35 mm d−1. In contrast, the conventional methods (Eqs. 1 and 2) performed poorly, with RMSE values of 1.90 and 1.1 mm d−1 for Allen et al. (1998) and Abdel-Wahed and Snyder (2008), respectively (see Table 4 for the other statistics).
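As a rough sense of scale, each RMSE above can be expressed relative to the mean FAO-PM ET0 of 5.35 mm d−1; this ratio is a simple derived quantity, not a statistic reported in Table 4:

$$ \frac{\mathrm{RMSE}}{\overline{\mathrm{ET}_0}}:\quad \text{M5: } \frac{0.50}{5.35}\approx 9.3\,\%,\qquad \text{Allen et al. (1998): } \frac{1.90}{5.35}\approx 35.5\,\%,\qquad \text{Abdel-Wahed and Snyder (2008): } \frac{1.1}{5.35}\approx 20.6\,\% $$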

Table 4 Statistical values of the comparison between ET0 estimated by FAO-PM method against those obtained by the three methods

4 Conclusions

This study investigated the ability of the M5 model tree to convert pan evaporation data to reference evapotranspiration under dry-fetch conditions in a semi-arid environment of Iran. The accuracy of the M5 model tree was compared with that of two common Kp equations (Allen et al. in FAO Irrigation and Drainage Paper No. 56, 1998; Abdel-Wahed and Snyder in J Irrig Drain Eng 134(4):425–429, 2008). Monthly climatic data from eight weather stations in Khuzestan were used for the model simulations, and the Penman–Monteith method as recommended by the FAO (Allen et al. 1998) was taken as the standard for evaluating the methods. The study demonstrated that reference evapotranspiration can be modeled from pan evaporation, relative humidity, wind speed, and extraterrestrial radiation data using the M5 model tree technique (RMSE of 0.4 to 0.6 mm d−1 for mean daily ET0 of 4.5 to 5.7 mm d−1). The comparison shows that the M5 model tree approach estimates reference evapotranspiration well compared with the conventional method based on Kp equations. However, studies in more humid climates and with other fetch distances are needed to confirm this result.