1 Introduction

Efficient management of agricultural water resources is crucial for mitigating water crises (Lu et al. 2023; Roy et al. 2023), particularly in arid and semi-arid regions. Iran allocates over 90% of its water resources to agriculture (Alizadeh and Keshavarz 2005; Fathi-Taperasht et al. 2022). Evapotranspiration (ET) plays a vital role in optimizing agricultural water demand, as more than 90% of the water used in agricultural ecosystems is lost through ET (Shan et al. 2020; Wang et al. 2019). ET also underpins many calculations in water resources management and in the design and operation of irrigation and drainage systems (Feng et al. 2017; Yan et al. 2023). Accurate field-level estimation of ET can greatly improve irrigation water planning, the determination of irrigation cycles, the estimation of the network hydromodule (crop water demand), and crop yield prediction (Allen et al. 1998; Bachour et al. 2016; Anderson et al. 2007; Teuling et al. 2009).

ET is a complex physical phenomenon comprising multiple nonlinear processes, and its variation is influenced by various factors (Jovic et al. 2018; Li et al. 2022; Amani and Shafizadeh-Moghadam 2023). Over the years, researchers have proposed two general groups of ET measurement techniques: point methods and regional methods. The lysimeter, a point method, measures ET directly without assumptions (Holmes 1984) and serves as a benchmark for calibrating other methods (Liu et al. 2017). Nevertheless, its limited availability, high cost, operational challenges, and environmental impact restrict its use (Fan et al. 2018; WMO 1963; Scanlon et al. 1997). As a result, mathematical models that estimate ET from meteorological data have gained popularity (Ferreira et al. 2019), and numerous indirect methods based on influential factors have been developed (Almorox et al. 2015). The Penman–Monteith (PM) model as modified by the Food and Agriculture Organization (FAO) is widely used as a reference for evaluating and calibrating other ET estimation models (Allen et al. 1998).

The FAO56-PM model requires a complete set of meteorological data, comprising air temperature (maximum temperature (Tmax), minimum temperature (Tmin), and average temperature (Tm)), relative air humidity (RH), net radiation (Rn), wind speed (Ws), atmospheric pressure, and soil heat flux (G). However, collecting these data is costly not only in developed countries (Chu et al. 2017) but especially in developing countries. Consequently, reliable data may not be consistently available over consecutive years (Bellido-Jiménez et al. 2021; De Paola and Giugni 2013; Eccel 2012). Alternative experimental methods are therefore often preferred over the FAO56-PM model. The most common are temperature-based methods that use Tmax and Tmin (Hargreaves and Samani 1985; Hargreaves et al. 1985; Blaney and Criddle 1962); radiation-based methods that use Rn, G, and the latent heat of vaporization (λ) (Abtew 1996; Irmak et al. 2003; Makkink 1957; Priestley and Taylor 1972); mass transfer-based methods that employ Dalton's law and the concept of water vapor flux transfer (Penman 1948; WMO 1963); and hybrid methods that combine parameters such as solar radiation (Rs), temperature (Tm, Tmax, Tmin), and RH (Doorenbos and Pruitt 1977; Valiantzas 2013a, b). These methods are often complex and nonlinear, are influenced by random factors, and rely on multiple assumptions. Each method was optimized for the specific characteristics and weather conditions of the area in which it was developed (Küçüktopcu et al. 2023). Experimental methods for estimating ET are, however, limited to field- or catchment-level applications. Furthermore, their results depend on time and location, which hinders generalization of the findings to other areas. The need to calibrate equation coefficients and the inherent uncertainty of these methods further limit their use (Islam and Alam 2021; Kisi et al. 2015).

Because meteorological variables are nonlinear and unstable, ET estimation is a complex problem, and developing a precise physics-based formula for accurate estimation is difficult. Researchers have therefore recently turned to machine learning (ML) as an alternative approach for ET estimation (Krishnashetty et al. 2021). Numerous studies have demonstrated that ML techniques such as artificial neural networks (ANNs), support vector machines (SVMs), and random forest (RF) outperform empirical and semi-empirical methods in estimating reference evapotranspiration (ET0). ML methods offer advantages such as fast computation, high accuracy, and strong generalization capability (Elbeltagi et al. 2021; Feng et al. 2016; Mousavi et al. 2015; Abd-Elaty et al. 2023). Kumar et al. (2002) introduced an ANN for calculating ET0 that achieved accuracy comparable to the FAO56-PM method. Shi et al. (2020) investigated daily ET0 in southeastern Australia and demonstrated the superior performance of RF over empirical equations. Rahimi Khoob (2008) developed an ANN model based on the Hargreaves method using monthly data from the Khuzestan Plain of Iran, and it outperformed the Hargreaves model. Tabari et al. (2012) simulated ET0 in Iran using several ML methods, all of which outperformed the Blaney-Criddle, Hargreaves, and Jensen-Haise models. Landeras et al. (2018) found that ANN models outperformed the Hargreaves method when using the same inputs. Rashid Niaghi et al. (2021) simulated ET0 in a semi-humid climate using gene expression programming (GEP), SVM, ML, and RF methods with empirical equations as inputs and found that combining radiation-based models with the RF model yielded the best performance across all stations.

Because data availability is critical in estimating ET, it is important to evaluate ML models that require fewer input variables. Wen et al. (2015) employed SVM and ANN to model ET0 using limited meteorological data in arid regions of China and compared their results with empirical models such as Priestley-Taylor and Hargreaves. They found that SVM performed best when using Tm, Rs, and Ws. Mohammadrezapour et al. (2018) investigated the performance of SVM, the adaptive neuro-fuzzy inference system (ANFIS), and GEP with five input combinations to simulate ET0 in southeastern Iran from 1970 to 2010 and found that SVM performed best with Tm, RH, Ws, and sunshine hours (Sshn) as inputs. Ferreira et al. (2019) evaluated ANN and SVM models for estimating ET0 across Brazil using either Tm and RH or temperature (Tmin, Tm, Tmax) alone; both models demonstrated acceptable accuracy. Bellido-Jiménez et al. (2021) developed various ML methods, including the multilayer perceptron (MLP), generalized regression neural network (GRNN), extreme learning machine (ELM), SVM, and RF, to estimate ET0 in southern Spain using temperature-based data as the only input. They concluded that ELM performed best in all scenarios and locations. In general, ML models with fewer inputs perform comparably to the FAO56-PM model and outperform experimental methods.

Although ML models excel at unraveling intricate relationships, their effectiveness as data-driven models depends on the careful selection of variables, data quality, and the optimization of model parameters. Determining these parameters, however, typically depends on user expertise and the nature of the input data. In ET estimation, one approach for selecting ML variables is to align them with the inputs used in experimental methods. Despite numerous studies having explored ET estimation using different variables, few have compared ML models to experimental methods for estimation accuracy, identification of important variables, and the generalizability and stability of results. In the current study, 13 experimental methods and four ML models were examined to estimate ET0 in a watershed located in southwestern Iran. The study objectives were: 1) to compare the accuracy of ML and experimental models with similar inputs, 2) to assess the accuracy of ML models compared to the FAO56-PM model using minimal input data, and 3) to identify the variables that influence ET.

2 Material and Methods

2.1 General Methodology

Figure 1 depicts a flowchart illustrating the primary steps of this study. Initially, annual precipitation was processed to identify wet, drought, and normal years. Next, meteorological data for these periods were gathered and utilized as input for estimating ET0 using FAO-56PM, experimental models, and ML models. The results were then assessed using three indices: R2 (coefficient of determination), RMSE (root mean square error), and MAE (mean absolute error).

Fig. 1 General methodology

2.2 Study Area

The current study considered the Karkheh Basin in southwestern Iran. Covering an area of 51,000 km2, the basin originates in the Zagros mountain range and drains into Horul Azim (Fig. 2), with elevations ranging from 3626 m a.s.l. in upstream regions to -8 m a.s.l. in downstream areas. The upper parts of the basin are semi-arid, while the southern part is arid. Average precipitation in the region is 474 mm, and daily Tm ranges from -13.7 to 45.9 °C. Dam construction and the expansion of agricultural land, particularly irrigated areas, have long characterized this basin.

Fig. 2 Study area and its location in Iran

2.3 Wet, Drought, and Normal Year Selection

Meteorological data from 15 stations within the Karkheh Basin were obtained from the National Meteorological Organization of Iran. Table S1 in the supplementary file provides the main characteristics of the data. Precipitation data for 2000 to 2021 were analyzed, and the mean and standard deviation (SD) of annual precipitation were calculated. Wet years were defined as those with annual precipitation exceeding the mean + 1SD, and drought years as those with annual precipitation below the mean - 1SD; years falling between these two thresholds were considered normal (Chow et al. 1971; McCuen 2016). Thirteen stations reported drought conditions in 2019, eight experienced normal conditions in 2020, and 12 encountered drought in 2021.
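The mean ± 1SD classification can be sketched for a single station as follows. This is a minimal illustration rather than the procedure actually used in the study: the text does not say whether the sample or population SD was used, so the sample SD (`ddof=1`) is assumed, and the function name `classify_years` and the precipitation values are hypothetical.

```python
import numpy as np


def classify_years(years, annual_precip):
    """Label each year 'wet', 'drought', or 'normal' using mean +/- 1 SD thresholds."""
    p = np.asarray(annual_precip, dtype=float)
    mean = p.mean()
    sd = p.std(ddof=1)  # sample SD (assumption; not specified in the text)
    upper, lower = mean + sd, mean - sd
    labels = {}
    for year, value in zip(years, p):
        if value > upper:
            labels[year] = "wet"
        elif value < lower:
            labels[year] = "drought"
        else:
            labels[year] = "normal"
    return labels


# Hypothetical annual totals (mm) for one station
print(classify_years([2000, 2001, 2002, 2003, 2004], [300, 400, 500, 600, 700]))
# {2000: 'drought', 2001: 'normal', 2002: 'normal', 2003: 'normal', 2004: 'wet'}
```

Here the mean is 500 mm and the sample SD is about 158 mm, so only the two extreme years fall outside the normal band.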

2.4 ET Estimation Models

2.4.1 FAO56-PM

The 13 experimental and four ML models were evaluated against the FAO56-PM model (Eq. 1), which served as the benchmark and was calculated as (Allen et al. 1998):

$${ET}_{0}=\frac{0.408\Delta \left({R}_{n}-G\right)+\gamma \frac{900}{{T}_{a}+273}{u}_{2}\left({e}_{s}-{e}_{a}\right)}{\Delta +\gamma \left(1+0.34{u}_{2}\right)}$$
(1)

where ET0 denotes reference evapotranspiration (mm/day), Rn is the net radiation at the crop surface (MJ m−2 d−1), G is the soil heat flux density (MJ m−2 d−1), which is usually assumed to be zero for daily time steps, Ta is the daily mean air temperature (°C), u2 is the wind speed at a height of 2 m (m s−1), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), obtained from maximum and minimum relative humidity, ∆ is the slope of the saturation vapor pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1).
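As an illustration of the benchmark, the sketch below implements the standard daily FAO-56 Penman-Monteith form (Allen et al. 1998) under stated assumptions: ea is computed from RHmax and RHmin, atmospheric pressure is derived from station elevation, and Rn is supplied directly rather than computed from sunshine hours. It is not the implementation used in this study, and the function names are illustrative.

```python
import math


def sat_vapour_pressure(t):
    """Saturation vapour pressure (kPa) at air temperature t (°C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def fao56_pm_daily(tmax, tmin, rh_max, rh_min, u2, rn, elevation, g=0.0):
    """Daily ET0 (mm/day) from the FAO-56 Penman-Monteith equation.

    tmax, tmin: air temperature (°C); rh_max, rh_min: relative humidity (%);
    u2: wind speed at 2 m (m/s); rn: net radiation (MJ m-2 d-1);
    elevation: station height (m); g: soil heat flux (MJ m-2 d-1), ~0 daily.
    """
    t_mean = (tmax + tmin) / 2.0
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0
    # Actual vapour pressure from RHmax (paired with Tmin) and RHmin (with Tmax)
    ea = (sat_vapour_pressure(tmin) * rh_max / 100.0
          + sat_vapour_pressure(tmax) * rh_min / 100.0) / 2.0
    # Slope of the saturation vapour pressure curve at t_mean (kPa/°C)
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure from elevation, then the psychrometric constant
    pressure = 101.3 * ((293.0 - 0.0065 * elevation) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Inputs resembling the FAO-56 daily worked example (Brussels, 6 July)
print(round(fao56_pm_daily(21.5, 12.3, 84, 63, 2.078, 13.28, 100.0), 1))  # 3.9
```

With these inputs the function returns about 3.88 mm/day, close to the 3.9 mm/day reported in the FAO-56 worked example.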

2.4.2 Experimental Methods

The 13 experimental models utilized to estimate ET0 were the Hargreaves-Samani and Blaney-Criddle (temperature-based); Penman and WMO (mass transfer-based); Makkink, Priestley-Taylor, Jensen-Haise, Abtew, and Irmak (radiation-based); and the Doorenbos-Pruitt, Valiantzas-1, Valiantzas-2, and Valiantzas-3 (combined approaches) models. These models were specifically developed to cater to diverse climatic conditions and geographical regions. Table 1 presents the equations and references for these experimental models.

Table 1 Experimental models for estimating reference evapotranspiration
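To make the temperature-based and radiation-based categories concrete, the sketch below implements two of the listed methods in their commonly published forms: Hargreaves-Samani as given in FAO-56 and Priestley-Taylor with the usual coefficient α = 1.26. These are assumptions for illustration; the exact coefficients and variants in Table 1 may differ, and the function names are hypothetical.

```python
import math

LAMBDA = 2.45  # latent heat of vaporization (MJ/kg), FAO-56 value near 20 °C


def hargreaves_samani(tmax, tmin, ra):
    """Temperature-based ET0 (mm/day); ra is extraterrestrial radiation (MJ m-2 d-1)."""
    t_mean = (tmax + tmin) / 2.0
    # 0.408 converts MJ m-2 d-1 to the equivalent evaporation depth in mm/day
    return 0.0023 * (t_mean + 17.8) * math.sqrt(tmax - tmin) * 0.408 * ra


def priestley_taylor(t_mean, rn, elevation, g=0.0, alpha=1.26):
    """Radiation-based ET0 (mm/day) from net radiation rn (MJ m-2 d-1)."""
    e_sat = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))
    delta = 4098.0 * e_sat / (t_mean + 237.3) ** 2  # kPa/°C
    pressure = 101.3 * ((293.0 - 0.0065 * elevation) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure  # psychrometric constant (kPa/°C)
    return alpha * delta / (delta + gamma) * (rn - g) / LAMBDA


print(round(hargreaves_samani(30.0, 16.0, 40.0), 2))  # 5.73
print(round(priestley_taylor(20.0, 12.0, 0.0), 2))    # 4.21
```

The contrast in inputs is the point: Hargreaves-Samani needs only Tmax, Tmin, and a computable Ra, whereas Priestley-Taylor requires measured or estimated Rn.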

2.4.3 Machine Learning Models

Random Forest

RF, a tree-based ensemble model introduced in 2001 (Breiman 2001), uses classification and regression trees (CART) as base learners, which can model nonlinear and complex patterns (Hastie et al. 2009). Because a single CART can yield substantially different trees after minor variations in the input data, RF uses bootstrap sampling to draw multiple samples with replacement from the original dataset. Each sample is used to train a tree, which at each split considers only a random subset of the input variables, and the final output is obtained by averaging the predictions of all trees. This ensemble approach produces more stable results than a single CART (Carter and Liang 2019).
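The bootstrap-and-average idea can be sketched with scikit-learn decision trees, assuming scikit-learn is available. This is a hand-rolled illustration of the mechanism (the function names are hypothetical); in practice `sklearn.ensemble.RandomForestRegressor` packages the same procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=100, seed=0):
    """Fit regression trees on bootstrap samples, random-forest style."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
        # max_features="sqrt": each split considers a random subset of inputs
        tree = DecisionTreeRegressor(max_features="sqrt", random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_bagged_trees(trees, X):
    """Average the predictions of all trees to get the ensemble output."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```

Averaging many de-correlated trees reduces the variance of a single tree, which is the stability gain described above.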

Artificial Neural Networks

Many ANN architectures have been developed for various applications; among them, the multilayer perceptron (MLP) is widely used. Regardless of the architecture, a learning algorithm is employed to discover the relationships between independent and dependent variables by adjusting the network weights to minimize prediction error. During training, the weights are optimized iteratively using backpropagation, in which the difference between the predicted and actual outputs of the network is propagated backward through the layers (Rumelhart et al. 1986). The direction and magnitude of each weight adjustment are determined by the partial derivative of the error with respect to that weight (Hecht-Nielsen 1992).
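The backpropagation loop can be illustrated with a one-hidden-layer MLP in NumPy. This is a minimal sketch under assumed settings (tanh hidden units, mean squared error, full-batch gradient descent, and hypothetical function names and hyperparameters), not the network configuration used in this study.

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a 1-hidden-layer MLP by backpropagation; return weights and loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y  # difference between predicted and actual output
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: partial derivatives of the MSE with respect to each weight
        d_out = 2.0 * err / n
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient-descent update: step against the gradient
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
```

The sign of each gradient sets the direction of the weight change, and its size (scaled by the learning rate `lr`) sets the magnitude.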

Support Vector Machine

SVM performs well when the available training data are limited (Mantero et al. 2005). The algorithm maps each data instance into an n-dimensional space whose dimensions are the features (independent variables) and then separates the instances with a hyperplane, i.e., a line in two dimensions or a plane in three (Cortes and Vapnik 1995). In some cases, separation is improved by mapping the samples into a higher-dimensional space using kernels; commonly used kernels include linear, polynomial, radial basis function (RBF), and sigmoid kernels. The optimal hyperplane maximizes the margin to the nearest training instances, the support vectors, which alone determine its position.
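For a continuous target such as ET0, the regression variant (support vector regression, SVR) is used: it fits a function within an ε-insensitive tube, and the training points on or outside the tube become the support vectors. The sketch below assumes scikit-learn and an RBF kernel; `make_svr`, its hyperparameter values, and the synthetic data are illustrative assumptions, not the settings of this study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_svr(C=10.0, epsilon=0.05):
    """RBF-kernel support vector regression with standardized inputs."""
    # Scaling matters because the RBF kernel depends on distances between samples
    return make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))


# Synthetic one-input example
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * np.sin(X[:, 0])
model = make_svr().fit(X, y)
n_support = model[-1].support_.size  # training points that define the fitted function
```

`C` trades flatness of the function against training errors, and `epsilon` sets the width of the tube inside which errors are ignored.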

Generalized Additive Model

GAM is suitable for situations in which the relationship between independent variables and the response variable is complex and non-linear, such as in environmental processes. GAM is a non-parametric extension of a generalized linear model (Hastie and Tibshirani 1990) that offers explicit insight into the relationships between variables. It allows the response curve to be determined by the observed data utilizing splines, i.e., mathematical functions that offer flexibility in fitting intricate curves to the data. Splines divide the curve into smaller, simpler segments, enabling the representation of the non-linear relationship between independent variables and the response variable (Hastie and Tibshirani 1990).

2.4.4 Variable Selection for ET Estimation

Collinearity among independent variables can affect the efficacy of ML models. In this research, collinearity among variables was examined at daily and monthly scales, and input variables were selected using variable clustering and the variance inflation factor (VIF). Variable clustering is advantageous in feature selection because it identifies representative variables within each cluster, which can then be chosen for subsequent analysis or modeling. VIF quantifies the degree of multicollinearity; a VIF of 1 indicates no collinearity, whereas a value exceeding 5 is considered to indicate high multicollinearity (O'Brien 2007).
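The VIF of variable j is 1/(1 − R²j), where R²j comes from regressing variable j on all other inputs (with an intercept). A minimal NumPy sketch follows; the function name and synthetic data are hypothetical, and it is not necessarily the software used in this study.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2) of that column regressed on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # other columns plus an intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(vifs)


# Synthetic check: a and c are nearly collinear, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)
vif = variance_inflation_factors(np.column_stack([a, b, c]))
```

In this example the independent column has a VIF near 1, while the two near-duplicates far exceed the threshold of 5.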

2.5 Model Evaluation

RMSE, MAE, and R2 (Eqs. 2–4) were used to assess the performance of both the experimental and ML models. RMSE provides an overall measure of error, MAE gives the average absolute error, and R2 measures the strength of the linear relationship between observed and predicted values. R2 should be as close to one as possible, and RMSE and MAE as close to zero as possible.

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({P}_{i}-{Q}_{i}\right)}^{2}}$$
(2)
$${R}^{2}=\frac{{\left[\sum_{i=1}^{N}\left({Q}_{i}-{Q}_{avg}\right)\left({P}_{i}-{P}_{avg}\right)\right]}^{2}}{\sum_{i=1}^{N}{\left({Q}_{i}-{Q}_{avg}\right)}^{2}\sum_{i=1}^{N}{\left({P}_{i}-{P}_{avg}\right)}^{2}}$$
(3)
$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|{P}_{i}-{Q}_{i}\right|$$
(4)

where Pi is the predicted value of ET0, Pavg is the mean predicted ET0, Qi is the observed value, Qavg is the mean observed ET0, and N is the number of data points.
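The three indices can be computed as in the sketch below. It assumes R² is the squared Pearson correlation between observed and predicted values; a 1 − SSE/SST definition would give different values for biased predictions. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return RMSE, MAE, and R2 (squared Pearson correlation) for ET0 series."""
    q = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - q
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))  # absolute value, so errors cannot cancel
    dq, dp = q - q.mean(), p - p.mean()
    r2 = float(np.sum(dq * dp) ** 2 / (np.sum(dq ** 2) * np.sum(dp ** 2)))
    return rmse, mae, r2


rmse, mae, r2 = evaluate([1, 2, 3, 4], [1.5, 2, 2.5, 4.5])
```

For this toy series the results are RMSE ≈ 0.433, MAE = 0.375, and R² ≈ 0.870.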

3 Results and Discussion

3.1 Evaluation of Experimental Models

The experimental models were assessed against FAO56-PM using daily and monthly data for normal, drought, and wet years. The combined models exhibited higher accuracy at both daily and monthly scales (Figs. 3 and 4), whereas the mass transfer-based models displayed low accuracy. The temperature-based Blaney-Criddle method showed superior accuracy at both scales, potentially because it incorporates Ws in the calculation of its constants a and b. Among the radiation-based models, the Abtew method (RMSE: 0.78, R2: 0.93, MAE: 0.57) and Priestley-Taylor (RMSE: 1.57, R2: 0.90, MAE: 1.17) achieved the highest and lowest accuracy, respectively, at the monthly scale. At the daily scale, Priestley-Taylor was the most accurate (RMSE: 1.41, R2: 0.79, MAE: 1.02), whereas the Jensen-Haise model was the least accurate (RMSE: 8.81, R2: 0.88, MAE: 6.85). Among the combined models, the Valiantzas-3 method demonstrated the highest accuracy and the Doorenbos-Pruitt method the lowest at both scales. Overall, the Valiantzas-3, Valiantzas-1, Abtew, Makkink, and Jensen-Haise models showed higher monthly accuracy than the other methods (Fig. 4), while the Valiantzas-3, Blaney-Criddle, Valiantzas-2, and Priestley-Taylor models were the most accurate at the daily scale (Fig. 3). The experimental models thus performed differently across time scales. Because Valiantzas-3 and Valiantzas-1 require additional inputs such as Ws or RH, the Abtew and Makkink models, which require the fewest inputs, were selected for the monthly scale. Likewise, because the Valiantzas-3 and Blaney-Criddle models require Ws, the Valiantzas-2 and Priestley-Taylor models were considered most suitable for the daily scale.

Fig. 3 Experimental models for estimating ET0 at the daily interval

Fig. 4 Experimental models for estimating ET0 at the monthly interval

Figure 3 compares the daily ET0 estimates of the experimental models with those of FAO56-PM. The Valiantzas-3, Blaney-Criddle, Valiantzas-2, and Priestley-Taylor models outperformed the other models for daily ET0 estimation; of these, the Valiantzas-2 and Priestley-Taylor models were considered the optimal choices. At the monthly scale (Fig. 4), the Valiantzas-3, Valiantzas-1, and Abtew models performed best. The Abtew model uses Rs and Tmax as inputs, whereas the Valiantzas-1 model uses Rs, Tm, and RH. Because it requires the fewest inputs and has a simpler equation, the Abtew model was deemed more suitable for monthly ET0 estimation.

3.2 Variable Selection for Machine Learning Models

Based on the VIF analysis, Ws, vapor-pressure deficit (VPD), and Sshn showed the least collinearity (Table 2); however, selecting variables on this basis alone would exclude temperature at both the daily and monthly scales, which is difficult to justify given its crucial role in ET. The results were therefore further investigated using variable clustering (Fig. 5), in which one variable should be chosen from each group falling below the 0.8 dashed line. Tm was selected from the temperature group (Tsoil, Tm, Tmax, Tmin) because Tmin and Tmax represent only specific times of the day and cannot adequately capture the temperature conditions governing water consumption throughout the day. Furthermore, measuring Tmin and Tmax may require specific instruments that are not universally available. Average relative humidity (RHm) was chosen from the relative humidity group (RHm, minimum relative humidity (RHmin), and maximum relative humidity (RHmax)) because it reflects how close the air is to saturation over the whole day; higher relative humidity indicates closer proximity to saturation and therefore lower ET. RHmax represents the highest relative humidity recorded during the day or month and would therefore lead to underestimation of ET, whereas RHmin would cause overestimation. VPD was selected from the group containing VPD and mean pressure (Pm) because it directly indicates the atmosphere's capacity to take up water vapor. Based on theoretical considerations and the results of both methods, Tm, RHm, VPD, Sshn, and Ws were chosen for modeling ET.

Table 2 Variable selection using VIF
Fig. 5 Variable selection using the variable clustering method

Apart from statistical selection, the availability, cost, and measurement accuracy of each variable must also be considered. Therefore, various input combinations were explored for ET estimation (Table 3). Furthermore, to compare the ML models with the experimental methods, the input variables of each experimental method were also used as inputs to the ML models.
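Screening input combinations can be sketched as an exhaustive loop over subsets scored by cross-validated RMSE. This is a hypothetical illustration assuming scikit-learn: the combinations actually listed in Table 3 may not be exhaustive, the generic column names and synthetic data are placeholders, and any of the four ML models could be passed as `model_factory`.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def score_input_combinations(X, y, names, model_factory, min_size=2, cv=5):
    """Cross-validated RMSE for every subset of input columns with >= min_size members."""
    results = {}
    for k in range(min_size, len(names) + 1):
        for combo in combinations(range(len(names)), k):
            cols = list(combo)
            scores = cross_val_score(model_factory(), X[:, cols], y, cv=cv,
                                     scoring="neg_root_mean_squared_error")
            results[tuple(names[i] for i in cols)] = -scores.mean()  # back to RMSE
    return results


# Synthetic stand-in data: only the first and last columns drive the target
rng = np.random.default_rng(0)
names = ["x1", "x2", "x3", "x4", "x5"]
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] + 3 * X[:, 4]
results = score_input_combinations(X, y, names, LinearRegression)
best = min(results, key=results.get)
```

With five candidate inputs and at least two per combination, this evaluates 26 subsets; cost grows quickly with more variables, which is one reason to prune candidates first with VIF and clustering.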

Table 3 Combinations of input variables for ET estimation using ML models

3.3 ML Models for ET Estimation

The current study assessed the use of RF, SVM, ANN, and GAM models for estimating ET using different combinations of input variables. The findings are presented in three sections: models utilizing input data similar to the FAO56-PM, models employing diverse input combinations, and models incorporating inputs similar to the experimental methods.

3.3.1 ML Models for ET Estimation Using the Same Input as FAO56-PM

Figure 6 compares the performance of the RF, SVM, ANN, and GAM models at daily and monthly scales using the same inputs as the FAO56-PM model. All models achieved high accuracy, with performance similar to that of the FAO56-PM model. The ML models did, however, require longer computation time than FAO56-PM calculations performed in software such as CropWat or Excel macros, although all ML models ran in less than a minute. The most accurate models at the monthly and daily scales were ANN and SVM, respectively.

Fig. 6 ET estimation using the ML models. A: ANN daily, B: GAM daily, C: RF daily, D: SVM daily, E: ANN monthly, F: GAM monthly, G: RF monthly, and H: SVM monthly

3.3.2 ML Models for ET Estimation Using Different Input Combinations

Figure 7 presents the results of the ML models for the input combinations listed in Table 3. Overall, both R2 and RMSE improved as more inputs were included, and the ANN model outperformed the other models for most combinations. At the monthly scale, accuracy decreased in the order ANN, GAM, RF, and SVM, whereas at the daily scale the order was ANN, SVM, RF, and GAM.

Fig. 7 Accuracy of ML models for ET estimation using different input combinations

Table 4 Accuracy of the ML models with variables similar to the experimental models

Among two-variable combinations, the ANN model with Tm and Ws achieved the highest accuracy for daily predictions, while for monthly predictions the SVM with Tm and Ws and the GAM with Ws and VPD were the most accurate. These findings suggest that Ws is important in estimating ET. Adding Sshn to Tm and Ws yielded accuracy very close to that of the models using all inputs, with SVM performing best at the monthly scale and ANN at the daily scale. Models using four variables achieved accuracy similar to that of models using all variables. Adding either RH or VPD to the combination of Tm, Ws, and Sshn had an equivalent effect, which is consistent with the findings of Mohammadrezapour et al. (2018).

In summary, ANN and SVM performed better with fewer variables, whereas RF performed better with more variables. Tm, Ws, and Sshn were identified as influential factors in enhancing the accuracy of the ML models; consequently, models using these three inputs can serve as a viable alternative to the FAO56-PM method. Similarly, Fan et al. (2019) found that including solar radiation further improved model accuracy, and Pandey et al. (2016) showed that models using Ws data achieved higher accuracy.

3.3.3 ML Models for ET Estimation Using the Same Input as Experimental Models

A comparison between the ML models and the experimental methods using the same input variables showed that the ML models surpassed all experimental models in accuracy at both daily and monthly scales (Table 5). Specifically, the ANN using the same inputs as Valiantzas-1 performed best at the monthly scale, while the SVM using the same inputs as Valiantzas-3 performed best at the daily scale. As noted in Section 3.1, the Valiantzas-2 and Priestley-Taylor models were found to be appropriate for daily estimation and the Abtew model for monthly estimation. Among the ML models given these reduced input sets, the SVM using the Priestley-Taylor inputs and the RF using the Abtew inputs were more accurate than the other models. Notably, both relied on radiation as a key input. These findings agree with those of Heramb et al. (2023), Ünes et al. (2020), and Pandey et al. (2016), in which radiation-based models consistently performed better.

Extensive research has demonstrated the superior performance of ML models over empirical methods, a trend also observed in the present study. For example, Salam et al. (2020) reported that various ML models outperformed empirical models (e.g., Ritchie, Thornthwaite, and Valiantzas) in predicting ET0. Mehdizadeh et al. (2017) showed that ML models (SVM, GEP, and multivariate adaptive regression splines (MARS)) consistently outperformed empirical methods in estimating ET0 across 44 meteorological stations in Iran. Additionally, Alazba et al. (2016) compared the temperature-based Hargreaves model and the radiation-based Priestley-Taylor model with an ML-based model for estimating ET0 from local meteorological data and found that the ML-based model yielded the most accurate results of all the approaches considered.

3.4 Variable Importance in ML Models for ET Estimation

Figure 8 shows the ten most important variables in the ML models for estimating daily and monthly ET0. In the ANN model, Tm was the most important variable at both scales, followed by Ws at the monthly scale and RHm at the daily scale. In the GAM model, Pm, VPD, and Tm had the greatest impact at both scales, with Ws being the third most influential factor at the daily scale. In the RF model, Ws was the most important variable at both scales, followed by Sshn at the daily scale and Pm at the monthly scale. As in the ANN model, temperature had the greatest effect in the SVM model at both scales. At the daily scale, its three main variables were Tm, Tmin, and Tmax, which aligns with the findings of Wu et al. (2019), who studied eight ML models using daily temperature and precipitation data from 14 weather stations in China and recommended SVM models with temperature data only for predicting daily ET0 throughout China. Additionally, Yunfei et al. (2023) identified temperature and humidity as the most important factors in estimating ET in arid regions.

Fig. 8 Variable importance for ET estimation using ML models

At the monthly scale, Tm appeared most frequently among the important variables (four times), followed by Ws and Pm (twice each), while Tmax, Sshn, and 24-hour precipitation (P24) each appeared once. At the daily scale, Tm was again the most frequent, followed by Tmax and Ws; Tmin, Sshn, and VPD were also important. Overall, Tm, Ws, VPD, and Sshn had the greatest influence on the predictions, highlighting their importance in ET estimation.
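The text does not state how the importance scores in Fig. 8 were computed. One model-agnostic option is permutation importance, sketched below under that assumption with scikit-learn: each input is shuffled in turn on held-out data, and the resulting drop in score measures how much the model relies on it. The function name `rank_inputs`, the synthetic data, and the generic column names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def rank_inputs(model, X, y, names, n_repeats=10, seed=0):
    """Rank inputs by the mean drop in score when each column is randomly shuffled."""
    result = permutation_importance(model, X, y, n_repeats=n_repeats, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]  # most important first
    return [(names[i], float(result.importances_mean[i])) for i in order]


# Synthetic example: x0 matters most, x1 less, and "noise" not at all
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(600, 3))
y = 3 * X[:, 0] + X[:, 1]
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
model.fit(X[:400], y[:400])
ranking = rank_inputs(model, X[400:], y[400:], ["x0", "x1", "noise"])
```

Because the same procedure applies to any fitted estimator, it allows rankings from ANN, GAM, RF, and SVM models to be compared on a common footing, as in the frequency tally above.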

4 Conclusion

The present study assessed the accuracy of 13 experimental methods and four ML models for daily and monthly ET0 estimation during drought, wet, and normal years, using the FAO56-PM method as the benchmark. It also aimed to identify the variables that most strongly affect ET0 estimation. The experimental models were grouped into four categories, among which the combined methods performed best, whereas the mass transfer-based methods were weaker. The performance of the experimental models varied across time scales: the Valiantzas-3, Valiantzas-1, Abtew, Makkink, and Jensen-Haise models were more suitable for monthly ET0 estimation, whereas the Valiantzas-3, Blaney-Criddle, Valiantzas-2, and Priestley-Taylor models were more appropriate for daily estimation. Where inputs are limited, the Abtew and Makkink models are recommended at the monthly scale and the Valiantzas-2 and Priestley-Taylor models at the daily scale. The ML models, by contrast, performed similarly to the FAO56-PM model, with comparable accuracy at both daily and monthly scales. Overall, SVM was more accurate at the monthly scale and ANN at the daily scale. Furthermore, ANN and SVM achieved better accuracy with fewer variables, whereas RF was more accurate with a larger number of variables.

In sum, our findings indicate that Tm, Ws, and Sshn improve the accuracy of ML models and that ML models can serve as an alternative to the FAO56-PM method. With similar inputs, ML models also outperformed the experimental methods at both daily and monthly scales. Overall, Tm, Ws, VPD, and Sshn had the greatest influence on ET0 prediction. Because this study was conducted in arid and semi-arid regions of Iran, replicating it under other climatic conditions is recommended to assess the applicability and performance of these models in diverse regions.