1 Introduction

Estimating crop water use (actual evapotranspiration, ETa) is crucial for irrigation scheduling and planning (Kisi 2016). ETa can be measured directly in the field using an eddy covariance flux tower (Samani et al. 2011) or weighing lysimeters (Ding et al. 2010). However, these methods are of limited use because of the high cost of instrumentation. Alternatively, ETa can be calculated from a crop coefficient (Kc) and the reference evapotranspiration (ET0). ET0 can be estimated using the FAO Penman–Monteith model (Allen et al. 1998), which is the most accurate and recommended method, but its requirement for several input climatic variables makes it difficult to apply (Mokari et al. 2021). Thus, models that need fewer input climatic data are highly desirable, particularly where climatic records are incomplete. Over the last decades, several investigators have developed simplified empirical ET0 models based on limited input climatic data (Hargreaves and Samani 1985; Romanenko 1961; Tabari et al. 2013). However, these models have proven less suitable for daily ET0 estimation (Torres et al. 2011). ET0 is a complex, non-linear process, and empirical models have difficulty accounting for this complexity (Fan et al. 2018).

Alternatively, machine learning (ML) models have proven to be powerful tools for estimating ET0 because they do not require specific knowledge of the internal variables of the process (Wang et al. 2017). Several ML models, including artificial neural networks (ANN), support vector machine (SVM), random forest (RF), extreme learning machine (ELM), and genetic programming (GP), have been investigated by various researchers to estimate ET0 (Fan et al. 2018; Feng et al. 2017a, 2017b; Gocic et al. 2016; Traore et al. 2016; Wen et al. 2015; Yin et al. 2017). Among these models, the SVM and ELM models have shown the best estimation accuracies (Abdullah et al. 2015; Fan et al. 2018; Feng et al. 2017b; Patil and Deka 2016; Yin et al. 2017, 2016). Patil and Deka (2016) evaluated three ML models, namely ELM, SVM, and ANN, and reported similar performance for the ELM and SVM models in estimating weekly ET0, with both outperforming the ANN model. Yin et al. (2017) forecasted ET0 variability with the aid of the ELM and SVM models in northwest China. Both ML models performed well, with the ELM model slightly better. Fan et al. (2018) compared the ELM, SVM, and four tree-based models for estimating daily ET0 across China. They found that the ELM model slightly outperformed the SVM model, and both were more accurate than the four tree-based models.

Although various ML models, the SVM and ELM models in particular, have frequently been used to estimate ET0, their potential has not been comprehensively investigated in regions with different climate zones. The related literature shows that most efforts to estimate ET0 with ML models have focused on a single climate zone, such as an arid climate zone (Shiri 2018; Wen et al. 2015), a semi-arid climate zone (Tabari et al. 2012), a Mediterranean climate zone (Kisi 2016), a warm and humid climate zone (Feng et al. 2017b), or a maritime climate zone (Shiri et al. 2012). Wen et al. (2015) estimated daily ET0 using the SVM model with four combinations of climatic data in the extremely arid regions of China and found that the SVM model estimated daily ET0 more accurately than the ANN and empirical models. Kisi (2016) investigated the performance of three ML models, namely SVM, multivariate adaptive regression splines (MARS), and M5 Model Tree (M5Tree), for estimating ET0 in a Mediterranean climate zone, and reported that the SVM model performed better than the MARS and M5Tree models.

To the best of our knowledge, few studies have evaluated the potential of ML models for estimating ET0 in a region with multiple climate zones. Yet it is important to know how robust ML models are when the climate zone changes. For example, Shiri et al. (2014) comprehensively assessed empirical, semi-empirical, and ML models for estimating ET0 across three climate zones in Iran; the most accurate results were obtained in the humid climate zone and the poorest ET0 estimates in the arid climate zone. New Mexico (NM) comprises eight climate zones, and its climatic data (particularly air temperature) vary widely between them. Thus, the objectives of the present study were to (1) comprehensively evaluate the potential of ML models, including SVR, ELM, GP, and RF, for estimating daily ET0 across different climate zones in NM during the 2009–2019 period, for which no studies on estimating daily ET0 with ML models are available; and (2) assess the effects of different input combinations of climatic data on the estimation accuracy of daily ET0 across the climate zones of NM.

2 Materials and methods

2.1 Study area

Based on topographic features, NM is divided into eight climate zones (Karl and Koss 1984), six of which (climate zones 1, 2, 3, 5, 7, and 8) were examined in this study (Fig. 1). The mean annual temperature ranges from 20 ℃ in climate zone 8 to 4.4 ℃ in climate zone 2, a northern zone of high mountains and valleys (NM climate center: Climate in New Mexico). In summer, although daytime temperatures can exceed 37 ℃ in climate zone 8, the average monthly maximum temperature in July, the warmest month, is only slightly over 32 ℃. Average annual rainfall ranges from less than 254 mm in southern parts such as climate zone 8 to more than 508 mm at higher elevations such as climate zone 2. Potential evaporation in the state is much higher than the average annual rainfall and can reach 1854 mm in southeastern parts such as climate zone 7 (NM climate center: Climate in New Mexico).

Fig. 1

The geographical locations of the six weather stations across different climate zones (Z1 to Z8) in NM state

2.2 Data collection and input scenarios

Continuous time series of daily climatic data, including maximum air temperature (Tmax), minimum air temperature (Tmin), maximum relative humidity (RHmax), minimum relative humidity (RHmin), wind speed at 2 m height (U2), and total solar radiation (RS), were collected for the 2009–2019 period from six weather stations across different climate zones of NM (Fig. 1). The data were screened for missing values and outliers, and days containing either were removed. The average relative humidity (RHave) was calculated from RHmax and RHmin. The FAO Penman–Monteith model (Allen et al. 1998), the most widely accepted method for estimating ET0, was applied as follows:

$${\mathrm{ET}}_{0}=\frac{0.408 \Delta \left({R}_{\mathrm{n}}-G\right)+\gamma \left(900/\left({T}_{\mathrm{m}}+273\right)\right){u}_{2}\left({e}_{\mathrm{s}}-{e}_{\mathrm{a}}\right)}{\Delta +\gamma \left(1+0.34{u}_{2}\right)}$$
(1)

where ET0 is the reference evapotranspiration (mm/day), Rn is the net radiation at the crop surface (MJ m−2 day−1), G is the soil heat flux density (MJ m−2 day−1), Tm is the mean daily air temperature at 2 m height (°C), u2 is the wind speed at 2 m height (m/s), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), es − ea is the saturation vapor pressure deficit (kPa), \(\Delta\) is the slope of the vapor pressure curve (kPa °C−1), and \(\gamma\) is the psychrometric constant (kPa °C−1).
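To make Eq. (1) concrete, the sketch below evaluates it for a single day. It assumes that net radiation Rn has already been derived from the measured RS (that step needs latitude and day of year and is omitted), takes G = 0 as is usual at the daily time step, and computes es, ea, Δ, and γ with the standard FAO-56 expressions; the function and argument names are illustrative, not the authors' code.

```python
import math


def sat_vapor_pressure(t):
    """Saturation vapor pressure e°(T) in kPa for air temperature t in °C."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def fao56_pm_daily(tmax, tmin, rh_max, rh_min, u2, rn, z, g=0.0):
    """Daily reference evapotranspiration ET0 (mm/day) from Eq. (1).

    Temperatures in °C, relative humidity in %, u2 in m/s, rn and g in
    MJ m-2 day-1, z = station elevation in m (used for the psychrometric constant).
    """
    t_mean = (tmax + tmin) / 2.0
    # Saturation and actual vapor pressure from the daily extremes (kPa).
    es = (sat_vapor_pressure(tmax) + sat_vapor_pressure(tmin)) / 2.0
    ea = (sat_vapor_pressure(tmin) * rh_max / 100.0
          + sat_vapor_pressure(tmax) * rh_min / 100.0) / 2.0
    # Slope of the vapor pressure curve at the mean temperature (kPa/°C).
    delta = 4098.0 * sat_vapor_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure (kPa) and psychrometric constant (kPa/°C).
    pressure = 101.3 * ((293.0 - 0.0065 * z) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs for one summer day at a station 100 m above sea level.
et0 = fao56_pm_daily(tmax=21.5, tmin=12.3, rh_max=84, rh_min=63,
                     u2=2.078, rn=13.28, z=100)
```

With these inputs the sketch returns roughly 3.88 mm/day. In the study, ET0 from Eq. (1) serves as the target that the ML models learn to reproduce.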

Four input scenarios were defined to assess the effects of different input combinations of climatic data on the estimation accuracy of daily ET0 across the climate zones: S1 (Tmax, Tmin, RHave, U2, RS), S2 (Tmax, Tmin, U2, RS), S3 (Tmax, Tmin, RS), and S4 (Tave, RS), where Tave is the average daily air temperature.
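The scenarios amount to selecting column subsets from the same daily records. A minimal sketch follows; it assumes that Tave and RHave are the simple means of the daily extremes (the text only says RHave was calculated from RHmax and RHmin), and the record layout and helper names are hypothetical.

```python
# Input scenarios S1 to S4 as lists of feature names.
SCENARIOS = {
    "S1": ["Tmax", "Tmin", "RHave", "U2", "RS"],
    "S2": ["Tmax", "Tmin", "U2", "RS"],
    "S3": ["Tmax", "Tmin", "RS"],
    "S4": ["Tave", "RS"],
}

RAW_FIELDS = ("Tmax", "Tmin", "RHmax", "RHmin", "U2", "RS")


def add_derived(record):
    """Return a copy of a daily record with Tave and RHave added.

    Assumption: both are simple means of the daily maximum and minimum.
    """
    out = dict(record)
    out["Tave"] = (record["Tmax"] + record["Tmin"]) / 2.0
    out["RHave"] = (record["RHmax"] + record["RHmin"]) / 2.0
    return out


def build_inputs(records, scenario):
    """Build the input matrix (list of rows) for one scenario."""
    columns = SCENARIOS[scenario]
    rows = []
    for record in records:
        # Skip days with any missing raw variable, mirroring the removal of
        # incomplete days described in Sect. 2.2.
        if any(record.get(name) is None for name in RAW_FIELDS):
            continue
        full = add_derived(record)
        rows.append([full[c] for c in columns])
    return rows


days = [
    {"Tmax": 30.0, "Tmin": 10.0, "RHmax": 80.0, "RHmin": 20.0, "U2": 2.0, "RS": 25.0},
    {"Tmax": None, "Tmin": 11.0, "RHmax": 75.0, "RHmin": 25.0, "U2": 1.5, "RS": 24.0},
]
s4_inputs = build_inputs(days, "S4")
```

Here the second day is dropped because Tmax is missing, and S4 keeps only Tave and RS for the first day.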

2.3 Applied machine learning (ML) models

2.3.1 Extreme learning machine (ELM)

ELM, an advanced form of the single-hidden-layer feed-forward neural network (SLFN), consists of an input layer, a single hidden layer, and an output layer; in this standard form it is classified as a type of ANN model. Training an ELM is faster than training a traditional ANN because the hidden-layer parameters are not tuned iteratively. In ELM, once the number of hidden nodes is chosen, the input-hidden weights and hidden biases are generated randomly. The hidden-layer outputs are then computed, and finally the hidden-output weights are determined analytically using the Moore–Penrose generalized inverse. More detailed information about this model can be found in Huang et al. (2006).
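The three ELM steps above (random hidden parameters, hidden-layer outputs, Moore–Penrose solution for the output weights) can be sketched in NumPy as follows. The uniform [-1, 1] range for the random weights and the synthetic data are assumptions for illustration; the sigmoid activation matches Sect. 2.4.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, y, n_hidden, rng):
    """Train a basic ELM regressor.

    Step 1: draw input-hidden weights and hidden biases at random.
    Step 2: compute the hidden-layer output matrix H.
    Step 3: solve for the hidden-output weights with the pseudo-inverse of H.
    """
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # input-hidden weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # hidden biases
    H = sigmoid(X @ W + b)                                     # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                               # Moore-Penrose solution
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return sigmoid(X @ W + b) @ beta


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2
model = elm_fit(X, y, n_hidden=50, rng=rng)
pred = elm_predict(model, X)
train_rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

Only step 3 involves fitting, and it is a single linear least-squares solve, which is why ELM training avoids the iterative backpropagation of a conventional ANN.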

2.3.2 Genetic programming (GP)

GP is a data-driven technique developed by Koza (1992) for finding a highly fit individual in the space of possible solutions. In this method, individuals are mathematical formulas built from combinations of functions, such as sin(α), and variables. GP applies evolutionary computation to find the individual with the best fitness value. Generally, GP follows five steps to find the fittest individual: (1) an initial random population of individuals composed of functions and variables is created; (2) the fitness of each individual in the population is evaluated with a problem-specific fitness function, and the most suitable individuals are selected as parents for the new population; (3) the selected parents produce offspring (a new generation) through genetic operators such as crossover and mutation; (4) the offspring are evaluated for fitness; and (5) steps (2) to (4) are repeated over several generations until an individual satisfies a given success criterion.
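A compact sketch of the five GP steps for symbolic regression of one input variable is given below. The function set (+, −, ×), the terminals, tournament selection, subtree crossover and mutation, the depth limit, and the mean-squared-error fitness are all illustrative assumptions; the study's actual GP configuration is not specified beyond its population size and number of generations.

```python
import operator
import random

# Function set and terminal set, kept tiny for illustration.
FUNCS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMS = ["x", 1.0, 2.0]
MAX_DEPTH = 6


def random_tree(depth, rng):
    """Grow a random expression tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    return (rng.choice(FUNCS), random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        (fn, _name), left, right = tree
        return fn(evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    donor = get_subtree(b, rng.choice(list(paths(b))))
    return replace_subtree(a, rng.choice(list(paths(a))), donor)


def mutate(a, rng):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    return replace_subtree(a, rng.choice(list(paths(a))), random_tree(2, rng))


def run_gp(xs, ys, pop_size=60, generations=30, seed=0):
    rng = random.Random(seed)

    def fitness(tree):  # mean squared error; lower is fitter
        return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def tournament(pop):
        return min(rng.sample(pop, 3), key=fitness)

    population = [random_tree(3, rng) for _ in range(pop_size)]   # step (1)
    history = []
    for _ in range(generations):                                  # step (5)
        population.sort(key=fitness)                              # steps (2) and (4)
        history.append(fitness(population[0]))
        offspring = [population[0]]                               # keep the best individual
        while len(offspring) < pop_size:                          # step (3)
            parent = tournament(population)
            if rng.random() < 0.8:
                child = crossover(parent, tournament(population), rng)
            else:
                child = mutate(parent, rng)
            offspring.append(child if depth(child) <= MAX_DEPTH else parent)
        population = offspring
    best = min(population, key=fitness)
    return best, fitness(best), history


xs = [i / 5 for i in range(-10, 11)]
ys = [x * x + x for x in xs]
best_tree, best_mse, mse_history = run_gp(xs, ys)
```

Because the best individual is always copied into the next generation, the best fitness in this sketch can never get worse from one generation to the next.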

2.3.3 Random forest (RF)

RF is an ensemble ML model that has been widely applied to many regression and classification problems (Breiman 2001). It consists of many randomized decision trees. Its core idea is to build a large collection of decorrelated regression trees, each grown on a bootstrap sample of the training data while considering random subsets of the features; for regression, the forest prediction is the average of the individual tree predictions. The model thus combines two main ingredients: randomness and ensemble learning. More details about this model can be found in Breiman (2001).
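The two RF ingredients can be seen directly in scikit-learn's `RandomForestRegressor`: `bootstrap=True` and `max_features` supply the randomness, and the ensemble prediction equals the mean of the fitted trees in `estimators_`. This is a sketch on synthetic data; the `max_features=0.5` setting is an arbitrary illustrative choice, not a parameter reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(
    n_estimators=100,   # number of trees (ntree)
    max_features=0.5,   # fraction of features considered at each split (assumed)
    bootstrap=True,     # each tree is trained on a bootstrap sample of the rows
    random_state=0,
).fit(X, y)

# Ensemble learning: the forest prediction is the mean of its trees' predictions.
tree_mean = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
forest_pred = rf.predict(X)
```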

2.3.4 Support vector regression (SVR)

The support vector machine (SVM) is a frequently used ML model for classification and regression (Cortes and Vapnik 1995), and SVR is its regression form. SVR employs the structural risk minimization (SRM) principle instead of the empirical risk minimization principle commonly used by ANN models. Under SRM, an upper bound on the generalization error is minimized instead of the training error, which results in an optimal network structure (Lin et al. 2006). SVR is based on a nonlinear mapping of the original input data into a higher-dimensional feature space, in which linear regression is performed; the kernel function evaluates this mapping implicitly. The radial basis function (RBF) kernel has been reported to perform best among the kernels used in SVR (Barzegar et al. 2017); therefore, the RBF kernel was applied in the present study.
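For reference, the textbook dual form of ε-insensitive SVR with an RBF kernel is written below for n training samples xi; it is the standard formulation and not necessarily the exact solver used in the study. Here γ is the RBF kernel parameter (the structural parameter tuned in Sect. 2.4), not the psychrometric constant of Eq. (1), and αi and αi* are Lagrange multipliers that are nonzero only for the support vectors. Errors smaller than ε are ignored, and C weights the penalty on larger errors.

$$f(\mathbf{x})=\sum_{i=1}^{n}\left(\alpha_{i}-\alpha_{i}^{*}\right)K\left(\mathbf{x}_{i},\mathbf{x}\right)+b,\qquad K\left(\mathbf{x}_{i},\mathbf{x}\right)=\exp\left(-\gamma{\left\|\mathbf{x}_{i}-\mathbf{x}\right\|}^{2}\right)$$

$$L_{\varepsilon}\left(y,f(\mathbf{x})\right)=\max\left(0,\left|y-f(\mathbf{x})\right|-\varepsilon\right),\qquad 0\le \alpha_{i},\alpha_{i}^{*}\le C$$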

2.4 Cross-validation and model parameterization

In this study, tenfold cross-validation was applied to the training period to determine the optimum parameters of the ML models. The optimized models were then used to estimate ET0 for the testing period. The data were also normalized, a standard data-preparation step for ML models, so that all input variables were on a consistent scale. Table 1 shows the optimized parameter values of the four ML models for the different input scenarios at the six climate zones.
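A minimal sketch of tenfold cross-validation with normalization is shown below. The normalization method is not stated in the text, so min-max scaling is assumed here, fitted on each training fold only so that no information leaks from the held-out fold. The ordinary-least-squares model is a hypothetical placeholder; any of the four ML models could be plugged in through the `fit` and `predict` arguments, and the candidate parameter with the lowest cross-validated RMSE would be kept.

```python
import numpy as np


def minmax_fit(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return lo, span


def minmax_apply(X, params):
    lo, span = params
    return (X - lo) / span


def kfold_rmse(X, y, fit, predict, k=10, seed=0):
    """Mean validation RMSE over k folds; scaling is fitted on each training fold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        params = minmax_fit(X[train])
        model = fit(minmax_apply(X[train], params), y[train])
        pred = predict(model, minmax_apply(X[val], params))
        scores.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
    return float(np.mean(scores))


# Hypothetical placeholder model: ordinary least squares with an intercept.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef
```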

Table 1 Optimized parameters of the four different ML models with different input scenarios at six different climate zones

In ELM, the weights and biases of the hidden layer are generated randomly, so networks with the same number of hidden neurons can produce different outputs depending on the initialization. To find the best weights, 1000 ELMs with the selected number of hidden neurons were trained over the training period, and the weights that minimized the objective function were retained. The selected structure with the optimized weights was then used to estimate ET0 in the testing period. A three-layer ELM model with a sigmoid activation function was developed, and the optimum number of hidden neurons was found using the tenfold cross-validation approach.
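The restart strategy can be sketched as repeated random initialization with the best network kept. The objective is not named in the text, so training RMSE is assumed here; in the study this inner loop would sit inside the tenfold cross-validation that selects the number of hidden neurons. Function names and the weight range are illustrative.

```python
import numpy as np


def train_one_elm(X, y, n_hidden, rng):
    """Train one sigmoid ELM and return (training RMSE, model)."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer
    beta = np.linalg.pinv(H) @ y
    rmse = float(np.sqrt(np.mean((H @ beta - y) ** 2)))
    return rmse, (W, b, beta)


def best_of_restarts(X, y, n_hidden, n_restarts=1000, seed=0):
    """Train n_restarts randomly initialized ELMs and keep the lowest-RMSE one."""
    rng = np.random.default_rng(seed)
    best_rmse, best_model = np.inf, None
    for _ in range(n_restarts):
        rmse, model = train_one_elm(X, y, n_hidden, rng)
        if rmse < best_rmse:
            best_rmse, best_model = rmse, model
    return best_rmse, best_model
```

Since the first restart always becomes the initial best, the retained network is never worse than a single random ELM on the chosen objective.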

In GP, two main parameters, the population size and the number of generations, need to be optimized to achieve good performance. Their optimum values were determined with the aid of the harmony search algorithm (Zong Woo et al. 2001) using tenfold cross-validation.
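Harmony search improvises new candidate solutions from a memory of good ones and replaces the worst memory entry whenever a candidate improves on it. The sketch below applies it to integer parameters such as population size and number of generations. The memory size, memory-considering rate (`hmcr`), pitch-adjusting rate (`par`), bandwidth, and the toy objective are all assumptions; in the study the objective would be the tenfold cross-validated error of GP for each parameter pair.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=5, iterations=200, seed=0):
    """Minimize `objective` over integer vectors; bounds is a list of (lo, hi)."""
    rng = random.Random(seed)
    # Initialize the harmony memory with random solutions.
    memory = [[rng.randint(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                 # memory consideration
                value = rng.choice(memory)[d]
                if rng.random() < par:              # pitch adjustment
                    value += rng.randint(-bandwidth, bandwidth)
                    value = min(max(value, lo), hi)
            else:                                   # random selection
                value = rng.randint(lo, hi)
            new.append(value)
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:                   # replace the worst harmony
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical objective standing in for the cross-validated GP error
# as a function of (population size, number of generations).
def toy_objective(h):
    return (h[0] - 120) ** 2 + (h[1] - 40) ** 2


best_params, best_score = harmony_search(toy_objective, [(10, 500), (5, 200)],
                                         iterations=500)
```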

For the RF model, the main parameter, ntree (the number of decision trees), was determined using tenfold cross-validation over the numerical ranges reported by Belgiu and Drăguţ (2016).
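Selecting ntree by tenfold cross-validation can be sketched with scikit-learn, where ntree corresponds to `n_estimators`. The candidate values below are hypothetical, since the ranges from Belgiu and Drăguţ (2016) are not reproduced in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def select_ntree(X, y, candidates=(50, 100, 200, 500), seed=0):
    """Return the ntree with the lowest tenfold CV RMSE, plus all CV RMSEs."""
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for n in candidates:
        rf = RandomForestRegressor(n_estimators=n, random_state=seed)
        # scikit-learn maximizes scores, so RMSE is returned as its negative.
        scores = cross_val_score(rf, X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
        results[n] = float(-scores.mean())
    best = min(results, key=results.get)
    return best, results
```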

SVR with the RBF kernel involves three main parameters: the structural parameter (γ), the penalty coefficient (C), and the tolerance threshold (ε). Twenty values of γ between 0.0001 and 10,000, twenty values of C between 0.0001 and 10,000, and ten values of ε between 0.001 and 1 were assessed, and the optimized values were determined (Zhang et al. 2018).
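The SVR parameter search can be sketched with scikit-learn's `SVR` and `GridSearchCV`. The mapping assumed here is γ to `gamma`, C to `C`, and ε to `epsilon` (the width of the ε-insensitive tube, not the solver stopping tolerance `tol`); logarithmic spacing of the grid values is also an assumption, since the text gives only the ranges and counts.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def svr_param_grid(n_gamma=20, n_c=20, n_eps=10):
    """Log-spaced grid over the ranges given in the text (spacing assumed)."""
    return {
        "gamma": np.logspace(-4, 4, n_gamma),   # 0.0001 to 10,000
        "C": np.logspace(-4, 4, n_c),           # 0.0001 to 10,000
        "epsilon": np.logspace(-3, 0, n_eps),   # 0.001 to 1
    }


def tune_svr(X, y, grid, seed=0):
    """Tenfold CV grid search; returns the best parameters and their CV RMSE."""
    search = GridSearchCV(
        SVR(kernel="rbf"),
        grid,
        cv=KFold(n_splits=10, shuffle=True, random_state=seed),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_params_, float(-search.best_score_)
```

The full 20 × 20 × 10 grid means 4000 candidates, or 40,000 fits with tenfold cross-validation, so a reduced grid is convenient for quick checks.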

2.5 Model performance evaluation

Three quantitative measures (Despotovic et al. 2015), the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE), were used to assess the performance of the ML models for estimating daily ET0:

$${R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({P}_{i}-{\mathrm{PM}}_{i}\right)}^{2}}{\sum_{i=1}^{N}{\left({\mathrm{PM}}_{i}-{\mathrm{PM}}_{\mathrm{ave}}\right)}^{2}}$$
(2)
$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{N}{\left({P}_{i}-{\mathrm{PM}}_{i}\right)}^{2}}{N}}$$
(3)
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|{P}_{i}-{\mathrm{PM}}_{i}\right|$$
(4)

where Pi is the ith value of estimated ET0 for ML models, PMi is the ith value of ET0 for the FAO Penman–Monteith model, PMave is the average of ET0 values for the FAO Penman–Monteith model, and N is the number of paired values.

Higher R2 values (closer to 1) indicate more efficient models, whereas lower RMSE and MAE values indicate better model performance.
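The three measures translate directly into code. This sketch computes R2 in the 1 − SSres/SStot form, which is consistent with the statement that values closer to 1 indicate better models; `pred` holds the ML estimates Pi and `ref` the FAO Penman–Monteith values PMi. The numbers in the usage line are made up for illustration.

```python
import math


def r2(pred, ref):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((p - r) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


def rmse(pred, ref):
    """Root mean square error, in the units of ET0 (mm/day)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def mae(pred, ref):
    """Mean absolute error, in the units of ET0 (mm/day)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


pm_et0 = [3.0, 4.0, 5.0, 6.0]   # illustrative FAO-PM values
ml_et0 = [3.5, 4.0, 4.5, 6.0]   # illustrative ML estimates
```

For these values R2 = 0.9, RMSE ≈ 0.354 mm/day, and MAE = 0.25 mm/day.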

3 Results and discussions

The statistical results of the four ML models under the different input scenarios for estimating daily ET0 across the climate zones of NM are shown in Tables 2, 3, 4, 5, 6, and 7. The accuracy of the estimated daily ET0 values varied considerably with ML model type and input scenario. For example, in climate zone 5 (Table 5), the RF model was the best ML model for all input scenarios during the training stage, with the highest R2 and the lowest RMSE and MAE. During the testing stage, the SVR and ELM models performed well for all input scenarios, although the GP and RF models were comparable to them. All ML models achieved their best estimation accuracy under the S1 and S2 scenarios (Table 5), with average RMSEs of 0.18 mm day−1 and 0.22 mm day−1 during the training and testing stages, respectively. Nevertheless, the ML models under the S4 scenario (only Tave and RS as input climatic data) were also able to estimate daily ET0 reasonably accurately, with average RMSEs of 0.48 mm day−1 and 0.56 mm day−1 during the training and testing stages, respectively (Table 5).

Table 2 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 1
Table 3 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 2
Table 4 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 3
Table 5 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 5
Table 6 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 7
Table 7 Statistical values of the four different ML models with different input scenarios during the training and testing at climate zone 8

All ML models showed different accuracies under the various input scenarios across the climate zones (Tables 2, 3, 4, 5, 6, and 7). As in climate zone 5, the ML models under the S1 and S2 scenarios had the best estimation accuracy during the testing stage in the other climate zones. The SVR and ELM models provided the best estimates of daily ET0 for all input scenarios across all studied climate zones during the testing stage, followed by the RF and GP models with acceptable accuracy (Tables 2, 3, 4, 5, 6, and 7). Based on the statistical indicators in these tables, the accuracy ranking is therefore SVR = ELM > RF > GP. Daily ET0 estimated by the ML models under the S1 and S2 scenarios agreed most closely with the daily ET0 values calculated by the FAO Penman–Monteith model across all studied climate zones. Where a complete dataset is lacking, however, the ML models under the S3 and S4 scenarios were preferable at climate zones 1, 5, and 8 (Tables 2, 3, 4, 5, 6, and 7).

Input scenarios strongly affected the estimation accuracy of the ML models. The ML models under the S1 scenario were more accurate than under the other scenarios, although the difference between the S1 and S2 scenarios was negligible for some climate zones. The estimation accuracy of the ML models decreased when RHave and U2 data were excluded from the input scenarios, and this reduction was largest in climate zone 7 (RMSE and MAE > 1.5 mm day−1). Therefore, RHave and U2 data played a key role in the estimation accuracy of daily ET0 using ML models across the climate zones of NM. Nevertheless, ML models based on the S4 scenario (only Tave and RS) gave acceptable ET0 estimates, particularly in climate zone 5, where RMSE varied between 0.5 and 0.6 mm day−1. These findings agree with previous studies showing that more input climatic data improve estimation accuracy, but that the contribution of each climatic variable to ET0 estimation varies across climate zones (Antonopoulos and Antonopoulos 2017; Fan et al. 2018).

Figure 2 shows scatter plots of the ET0 calculated by the FAO-PM model against the values estimated by the four ML models for the best scenario in each climate zone during the testing stage. The SVR model provided more scattered estimates for climate zones 1, 2, 3, 5, and 8, whereas the ELM model produced more scattered estimates for climate zone 7 (Fig. 2). Overall, the ET0 values estimated by the SVR and ELM models were closer to those calculated by the FAO-PM model (Fig. 2), indicating that these two models produced accurate estimates of daily ET0. These findings agree with previous studies. Fan et al. (2018) reported that the ELM and SVM models offered the best combination of estimation accuracy and stability for estimating ET0 in different climate zones of China. Wen et al. (2015) showed that the SVR model estimated daily ET0 more accurately than the ANN model in the extremely arid regions of China. Feng et al. (2017b) reported that the ELM model could be successfully used to estimate ET0 in southwest China.

Fig. 2

Scatter plots of the calculated reference evapotranspiration (ET0) by the FAO Penman–Monteith model (FAO-PM) and the estimated values by the four different ML models for the best scenario across various climate zones in the testing stage. ELM extreme learning machine, GP genetic programming, RF random forest, SVR support vector regression. Z1 climate zone 1, Z2 climate zone 2, Z3 climate zone 3, Z5 climate zone 5, Z7 climate zone 7, Z8 climate zone 8

Figure 3 shows the training and testing RMSE of the best ML model in each climate zone for the various input scenarios. SVR and ELM were the best ML models in all climate zones for all input scenarios (Fig. 3). Under the S1 and S2 scenarios, these models achieved the best estimation accuracy (RMSE < 0.5 mm day−1) in both the training and testing stages in all climate zones except climate zone 7 (Fig. 3). Under the S3 and S4 scenarios, they still showed acceptable estimation accuracy (RMSE < 1 mm day−1) (Fig. 3). Figure 3 also shows the percentage increase in testing RMSE over training RMSE for the best ML model in each climate zone under the various input scenarios. The SVR and ELM models were the most stable in the testing stage, showing either decreases or the smallest increases in RMSE (Fig. 3). Model stability is a key factor in estimating ET0 because it can significantly affect estimation accuracy. Fan et al. (2018) reported large percentage increases in testing RMSE when the RF and M5Tree models were used to estimate daily ET0 across China, whereas the SVM and ELM models were the most stable, with testing RMSE increases of less than 10.1%.
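The stability measure plotted in Fig. 3 is simple arithmetic: the change in RMSE from training to testing, relative to the training RMSE. As an illustration only, applying it to the model-averaged climate zone 5 values reported above (0.18 and 0.22 mm day−1 for S1 and S2, 0.48 and 0.56 mm day−1 for S4) gives increases of about 22.2% and 16.7%; these are not the best-model values shown in the figure.

```python
def pct_increase(train_rmse, test_rmse):
    """Percentage increase of testing RMSE over training RMSE.

    Negative values mean the testing RMSE is lower than the training RMSE,
    which corresponds to the 'decreases' noted for some models in Fig. 3.
    """
    return 100.0 * (test_rmse - train_rmse) / train_rmse


s1_s2_zone5 = pct_increase(0.18, 0.22)  # about 22.2 %
s4_zone5 = pct_increase(0.48, 0.56)     # about 16.7 %
```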

Fig. 3

Percentage increase in testing root mean square error (RMSE) over training RMSE for different input scenarios (S1, S2, S3, and S4) with the best ML model for each climate zone. ELM extreme learning machine, GP genetic programming, RF random forest, SVR support vector regression. Z1 climate zone 1, Z2 climate zone 2, Z3 climate zone 3, Z5 climate zone 5, Z7 climate zone 7, Z8 climate zone 8

4 Conclusion

The present study assessed the potential of ML models, including extreme learning machine (ELM), genetic programming (GP), random forest (RF), and support vector regression (SVR), for estimating daily ET0 using various input scenarios across different climate zones in NM during the 2009–2019 period. The estimation accuracy of daily ET0 depended on the ML model type and the input scenario in each climate zone. The SVR and ELM models provided the most accurate estimates of daily ET0 during the testing stage, followed by the RF and GP models with acceptable accuracy in all studied climate zones. Daily ET0 estimated by the ML models under the S1 and S2 scenarios agreed most closely with the daily ET0 values calculated by the FAO Penman–Monteith model across all studied climate zones. However, the ML models under the S3 and S4 scenarios were preferable at climate zones 1, 5, and 8. Input scenarios had significant effects on the estimation accuracy of the ML models, and the S1 and S2 scenarios gave better accuracies than the other scenarios. The estimation accuracy decreased when RHave and U2 data were missing from the input scenarios, and this reduction was largest in climate zone 7 (average RMSE of 1.5 mm day−1). Therefore, RHave and U2 data played a major role in the estimation accuracy of daily ET0 across the climate zones of NM. For the best input scenario in each climate zone, the SVR model showed more scattered estimates for climate zones 1, 2, 3, 5, and 8, whereas the ELM model produced more scattered estimates for climate zone 7. The SVR and ELM models were the most stable in the testing stage, showing either decreases or the smallest increases (less than 10%) in RMSE.

These findings provide guidelines for future investigators who need to study specific climate zones and identify appropriate ML models for them. The SVR model can be effectively applied to estimate ET0 in regions where the mean annual temperature ranges between 4 and 20 ℃. This model also has the potential to estimate ET0 in dry regions where the average monthly maximum temperature exceeds 32 ℃. Estimating ET0 in windy climate zones can bring additional challenges, and the results of this study suggest that the ELM model can be used for such regions. In addition, the results of the present study can be applied to forecast agricultural and rangeland productivity, which is crucial for agricultural planning. For example, ET0 estimated with the ML models in this study could be used to estimate rangeland aboveground biomass across New Mexico, which is vitally important for grazing management. Because rangeland production is directly affected by ET0, a model could be developed to relate ML-estimated ET0 to aboveground biomass. ML models are more convenient and comparatively faster to implement than other models, particularly when climate data are limited, as was the case in this study. Generally, ET0 estimated with ML models can be used as an input layer for a variety of decision-making models in precision agriculture.