Introduction

Evapotranspiration (ET) plays a key role in water resources management, agriculture, drought, climate change adaptation, and ecosystem productivity (Currie 1991). Various methods and models are available to estimate potential evapotranspiration (ETp), but most of them give precise outputs only for specific climate zones (Lu et al. 2005). ETp calculated under certain conditions can be regarded as the reference crop ET (ETc). ETc is usually estimated from reference evapotranspiration (ETo) together with crop and soil coefficients. The FAO and a working group of the International Commission on Irrigation and Drainage recommended the standardized Penman-Monteith reference evapotranspiration (ETo) as the potential evapotranspiration for a short grass or a tall reference crop (alfalfa) (Allen et al. 1998). This hypothetical evapotranspiration assumes a reference surface with a crop height of 0.12 m, a fixed surface resistance of 70 s/m, and an albedo of 0.23; the reference surface closely resembles an extensive surface of green grass of uniform height, actively growing, well-watered, and completely shading the ground. Calculating ETo is essential for computing crop irrigation water requirements in agriculture. Pan evaporation with empirical coefficients (relating it to ETo) has been widely used for periods of 10 days or longer (Allen et al. 1998). Furthermore, many empirical or physically based equations have been developed and used to estimate ETo under the climate regime of the country in which they were developed. Choosing the appropriate model among the many available ETo calculations is a major problem, and method selection under the climatic conditions of a research area is highly subjective unless certain techniques are used.
Generally, empirical ETo methods can be categorized into six groups: (1) combination (e.g., Shuttleworth); (2) radiation (e.g., Turc, Priestley and Taylor, Makking, Abtew); (3) temperature (e.g., Blaney and Criddle, Hargreaves–Samani, Thornthwaite, Hamon); (4) mass-transfer based (e.g., Penman, Dalton); (5) water budget methods (e.g., Guitjens); and (6) pan evaporation methods (e.g., Allen et al. 1998). Many studies have evaluated the reliability of these alternative empirical ETo methods for cases where the data needed to calculate ETo are lacking, taking the United Nations Food and Agriculture Organization (FAO) Penman–Monteith method (PM FAO-56) as the standard (Table 1). These studies addressed the effectiveness, improvement, and performance of PM FAO-56 at regional and global scales. The performance of these ETo models has also been evaluated under different climate conditions and land covers. Differences in assumptions and inputs are the main reasons why the methods give different results (Maes et al. 2019).

Table 1 Summary of various methods or models adopted by different authors for ETo estimation

In recent years, interest has grown in testing models for non-linear relationships. Statistical tests have been proposed in many studies to help analysts check for the presence of non-linearities in an observed time series. An alternative approach to ETo estimation is the application of data-driven models. Machine learning models can yield simpler equations and require fewer inputs than the PM FAO-56 method, which makes them potentially good alternatives for ETo calculation. As shown by numerous studies, machine learning approaches such as artificial neural networks (ANN) have been successfully applied in ETo research (Zanetti et al. 2007; Traore et al. 2010; Käfer et al. 2020). An ANN, which has a nonlinear mathematical structure, learns from the strength of the correlation between input and simulated variables by checking previous trends (Yurtseven and Zengin 2013).

Sudheer et al. (2003) used a radial basis function (RBF) network to simulate crop evapotranspiration (ETc) for rice crops. The simulated data were compared with lysimetric data, and the results showed that the RBF performed well in ET estimation (modeling efficiency of 98.2–99.0%). Trajkovic et al. (2003) used an RBF type of ANN and found that it gives accurate ETo estimates. Hashemi and Sepaskhah (2020) also reported the superiority of the multi-layer perceptron with sunshine hours and wind speed and of the radial basis function with sunshine hours. Zanetti et al. (2007) used the multi-layer perceptron to estimate ETo in Brazil using only maximum and minimum air temperature data.

Machine learning approaches using the support vector machine (SVM) have also been described and evaluated in many studies (Wen et al. 2015; Chia et al. 2020; Seifi and Riahi 2020). SVM is a useful estimator for practical applications: it uses robust mathematical relationships between dependent and independent variables to solve complex problems (Vapnik 1995). SVM has been a preferred approach because it reaches a global optimum rather than a local optimum, unlike the ANN method, and is less prone to overfitting. However, SVM models have had limited application in ETo estimation compared to ANN models. Wen et al. (2015) developed SVM models for ETo estimation and compared them with an ANN model and three empirical models (Priestley-Taylor, Hargreaves, and Ritchie). In that study, SVM performed relatively better than the ANN and the empirical equations in modeling ETo.

In recent years, the random forest (RF) model, an ensemble learning method for classification and regression, has become popular owing to advantages such as satisfactory performance, the ability to prevent overfitting, and user-defined parameter selection in both classification and regression problems (Feng et al. 2017). The relative importance of variables can also be determined by this method. Wang et al. (2019) conducted a comparative analysis of different model-based approaches (random forest, gene-expression programming) on daily climatic data recorded at 24 meteorological stations from 2010 to 2014 and concluded that random forest-based ETo models performed slightly better than the gene expression-based models.

Multiple linear regression (MLR) is a conventional model for estimating the value of one dependent variable from two or more independent variables with a linear relationship (Tabari et al. 2012). Many researchers have attempted to estimate evaporation from climatic variables with MLR. Yirga (2019) evaluated the performance of MLR in ETo estimation and stated that the model was successfully employed to estimate monthly reference evapotranspiration. da Silva et al. (2016) emphasized that such models can be regarded as an alternative for estimating ETo when the climatic variables are insufficient for other methods.

In this study, we tested some of the recent approaches to ETo estimation. Our study area was the Central Anatolian Region, the driest region in Turkey. Drought has become an important and prominent phenomenon in Turkey, and in Central Anatolia in particular, semi-humid (semi-dry) drought classes have shifted to semi-dry (dry) conditions. In addition, the Central Anatolian Region has strong potential for growing and marketing cereals (wheat, barley, oat, etc.). In recent years, less rain and more ET have led to crop failure and economic losses in the region. The spatial variability of precipitation regimes influenced by topography has been studied before (Türkeş and Tatlı 2011; Schemmel et al. 2013). The complex biotic and abiotic environment in different elevation zones makes it difficult to measure and estimate ET directly or indirectly. Because the climatic composition differs with elevation, the variation of ETo with elevation is quite complex. Elevation has a manifold effect on ETo at different locations since the dynamics of climatic parameters differ with altitude. For example, relative humidity is one of the most relevant meteorological factors for ETo, and it is inversely related to elevation. Moisture availability, which is affected by relative humidity and absolute vapor pressure, decreases with elevation (Duane et al. 2008). Therefore, elevation dominates the climatic parameters that affect ETo along elevation gradients. Furthermore, the spatial variation in ETo is also affected by the solar radiation (RS) received by the surface (Vicente-Serrano et al. 2007). Ma et al. (2019) reported that available energy (shortwave radiation and air temperature), which increased with elevation, is a more influential factor than water vapor. Sun et al. (2020) found that net RS, leaf area index, and air temperature have a strong relationship with ETo in mountainous regions. Wang et al. (2020) emphasized that the FAO Penman-Monteith (PM) and Hargreaves-Samani (HS) methods perform well as ETo estimation methods in high elevation zones. Understanding how topographic characteristics, especially elevation, control ETo and its variability in Central Anatolia is one of the scientific gaps in the climate research of Turkey. Furthermore, the interaction between elevation and ETo in the dry regions of Turkey is poorly characterized despite its obvious practical importance.

One of the main objectives of this study is to determine possible variations in ETo at different elevations and the performance of selected methods/models that can be used in estimation of ETo.

Other objectives were the following:

  • To calculate ETo using six different empirical methods (FAO-56 Penman-Monteith method-ETo, Hargreaves-ET_Har, Penman-ET_PM, Makking-ET_Mak, Thornthwaite-ET_Thor, and FAO-Blaney-Criddle-ET_BC) and to compare the five alternative methods with the FAO-56 Penman-Monteith method (ETo), both for the regional average values of 45 meteorological stations (representing the average of the Central Anatolian Region) and for four different elevation groups (650–850 m-G1, 850–1100 m-G2, 1100–1350 m-G3, and 1350–1600 m-G4).

  • To investigate the accuracy of data-driven models, namely two artificial neural network (ANN) techniques (multi-layer perceptrons, MLPs, and radial basis neural networks, RBNNs), support vector machine (SVM), random forest (RF), and multiple linear regression (MLR), in estimating long-term monthly ETo using data from the same 45 stations in the Central Anatolian Region of Turkey.

  • To statistically evaluate the outputs of all ETo approaches and the climatic parameters used in the assessment.

Material and method

Study work-flow

The study work-flow (Fig. 1) presents the research steps, organized into subdivisions. The methodology essentially examines whether different empirical ETo methods and the ANN (MLP and RBF), SVM, RF, and MLR models can serve as alternatives to the FAO-56 Penman-Monteith ETo. The flowchart illustrates the primary structure of the study: calculating ETo with five simple empirical methods and the PM FAO-56 method, and generating alternative ETo estimates using data-driven models (ANN, SVM, RF, and MLR). The study was carried out in two steps. In the first step, termed the “regional average,” a dataset was prepared by averaging the climate data of the 45 meteorological stations used in all analyses. At this stage, the 45 stations were treated as one station representing the entire Central Anatolian Region of Turkey. In the second step, termed the “elevation group,” the data were grouped according to the elevation of the meteorological stations. Thus, the dataset was divided along four elevation gradients (650–850 m, 850–1100 m, 1100–1350 m, 1350–1600 m) using the elevation of the 45 stations. All analyses were carried out separately for both data formations. The conceptual background of the study consists of two main parts. First, ETo was calculated with the equations of different researchers using the unnormalized climate data and compared with ETo (PM FAO-56). Second, the climate data were used in alternative data-driven models of ETo. Performance evaluation was used to determine the appropriate method or model for estimating ETo in the regional average and grouped data: the coefficient of determination (R2), mean absolute deviation (MAD), Nash–Sutcliffe efficiency (NSE), index of agreement (d), and percent bias (PBIAS) were used to identify the best method among the empirical methods and data-driven ETo models.

Fig. 1 Study work-flow
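The five performance measures named in the work-flow can be sketched as follows. This is only an illustration: the formulas are the commonly used definitions (squared Pearson correlation for R2, Willmott's index for d, and the observed-minus-simulated sign convention for PBIAS), which are assumptions here because the exact expressions are not given in this section, and the function name `evaluate` is hypothetical.

```python
def evaluate(obs, sim):
    """Return R2, MAD, NSE, d and PBIAS for paired reference and estimated ETo."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    # Sums of squares shared by several indices
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    r2 = cov ** 2 / (sst * var_s)                      # squared Pearson r
    mad = sum(abs(o - s) for o, s in zip(obs, sim)) / n
    nse = 1.0 - sse / sst                              # Nash-Sutcliffe efficiency
    d = 1.0 - sse / sum((abs(s - mean_o) + abs(o - mean_o)) ** 2
                        for o, s in zip(obs, sim))     # index of agreement (assumed Willmott form)
    pbias = 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)  # assumed sign convention
    return {"R2": r2, "MAD": mad, "NSE": nse, "d": d, "PBIAS": pbias}
```

With this convention, a negative PBIAS indicates that the tested method overestimates the reference ETo on average.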

Study area and data acquisition

According to multiple-year local assessments, Turkey is classified into seven geographical and eight ecological regions (ecozones) (Serengil 2018). This research was conducted in the Central Anatolia geographical region and the Central Anatolia Steppe ecozone. Climatological data from 47 synoptic stations located in the Central Anatolian Region of Turkey were obtained from the Climate Forecast System Reanalysis (CFSR) global meteorological dataset. The CFSR dataset consists of hourly weather forecasts generated by the National Weather Service’s NCEP Global Forecast System. Studies have shown that CFSR data used in hydrological models provide satisfactory results (Fuka et al. 2014; Dile and Srinivasan 2014). All stations, with 35 years of monthly meteorological data, were selected for analysis. The data covered the period between January 1979 and December 2013. The locations of the 47 stations are given in Fig. 2, and Table 2 shows some characteristics of these stations.

Fig. 2 Elevation (b) and precipitation zones (c) with meteorological stations in Central Anatolian Region of Turkey (a)

Table 2 Properties of 45 meteorological stations with long-term average climatic conditions: elevation of stations, elevation group, annual average temperature (Tavg), annual total precipitation (P), annual total evapotranspiration (ETo), UNEP aridity index (Middleton and Thomas 1997), and UNEP aridity index zone (Middleton and Thomas 1997)

The Central Anatolian Region of Turkey is a generally semiarid area according to the United Nations Environment Program (UNEP) aridity index (Middleton and Thomas 1997), with an area of about 151,000 km2, representing 21% of the country. The region is located between 31° 21′ and 38° 07′ E longitude and between 36° 59′ and 40° 55′ N latitude. It consists of low-precipitation plateaus with an average altitude of 1000 m and is bounded by Bolu-Köroğlu Mountain to the north, Sündiken and Uludağ Mountains to the west, Toros Mountain to the south, and Tecer Mountains to the east. As the region is surrounded by high mountains, the humid, mild sea air cannot easily penetrate into it. Therefore, the region has a continental climate with hot and dry summers and cold and snowy winters. The continental effect increases with altitude, and winter temperatures reach extremely low values towards the east. The annual average temperature of the region is 10–11 °C (Table 2). Annual precipitation averages about 418 mm, and the actual amount varies with elevation. The low precipitation in some areas of the region is not sufficient to satisfy the water needs of crops, especially during the summer months. In a dry period, it is thus necessary to irrigate the crops, while in average wet seasons, irrigation is not needed in agricultural areas. Low precipitation generally causes low productivity in agriculture, and drought necessitates fallow practice in grain agriculture. The natural vegetation is mostly composed of steppes since drought prevents forest growth.

In this study, data processing followed the raw data download and converted the data into a usable form. The monthly values of maximum temperature (Tmax), minimum temperature (Tmin), average temperature (Tavg), precipitation (P), average wind speed (U), average (RHavg), maximum (RHmax), and minimum (RHmin) relative humidity, and average solar radiation (RS) were obtained for 45 stations located in the Central Anatolian Region. The analysis had two steps. (1) The regional average of the Central Anatolian Region was calculated for each parameter by averaging the values obtained from the 45 stations; the stations were therefore not evaluated separately in this first step. (2) The 45 climate stations were divided into four elevation groups as follows: 650–850 m considered as “low elevation group-G1,” 850–1100 m considered as “moderate elevation group-G2,” 1100–1350 m considered as “high elevation group-G3,” and 1350–1600 m considered as “very high elevation group-G4.” The results of the five literature-based equations (ET_Har, ET_PM, ET_Mak, ET_Thor, and ET_BC) and of the ANN (MLP and RBF), SVM, RF, and MLR models for the four elevation groups were evaluated against the target output ETo. The objective was to compare the models across elevation groups located in different local climatic conditions.
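The two data formations can be illustrated with a minimal sketch. The treatment of the class boundaries (lower bound inclusive, 1600 m assigned to G4) is an assumption, since the text does not state it, and the names `elevation_group` and `group_means` as well as the sample values are hypothetical.

```python
# Elevation classes from the text: (label, lower bound, upper bound) in metres
GROUPS = [("G1", 650, 850), ("G2", 850, 1100), ("G3", 1100, 1350), ("G4", 1350, 1600)]

def elevation_group(elev_m):
    """Return the elevation group label of a station, or None if out of range."""
    for name, low, high in GROUPS:
        if low <= elev_m < high:          # assumed: lower bound inclusive
            return name
    return "G4" if elev_m == 1600 else None

def group_means(stations):
    """stations: (elevation_m, value) pairs for one variable and one month.

    Returns the regional average (step 1) and the group averages (step 2).
    """
    buckets = {"regional": []}
    for elev, value in stations:
        buckets["regional"].append(value)
        name = elevation_group(elev)
        if name is not None:
            buckets.setdefault(name, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items() if vals}

# Illustrative values only, not station data from the study
print(group_means([(700, 2.0), (900, 4.0), (1000, 6.0)]))
```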

Empirical ETo methods

The following ET methods have been chosen for the assessment:

  (a) FAO-56 Penman–Monteith method (ETo): This method is considered the most precise method to estimate ETo. The FAO Penman-Monteith method for calculating reference (potential) evapotranspiration ETo can be expressed as follows (Allen et al. 1998):

$$ \mathrm{ETo}=\frac{0.408\Delta \left({R}_n-G\right)+\gamma \frac{900}{T_a+273}{u}_2\left({e}_s-{e}_a\right)}{\Delta +\upgamma \left(1+0.34{u}_2\right)} $$
(1)

where ETo is the reference evapotranspiration (mm day−1); Δ is the slope of the saturation vapor pressure curve (kPa °C−1); Rn is the net radiation (MJ m−2 day−1); G is the soil heat flux density (MJ m−2 day−1), taken as zero for daily estimates; Ta is the daily mean air temperature (°C) at 2 m, based on the average of maximum and minimum temperatures (written T in the equations below); u2 is the average wind speed at 2 m height (m s−1); es is the saturation vapor pressure (kPa); ea is the actual vapor pressure (kPa); (es − ea) is the saturation vapor pressure deficit (Δe, kPa) at temperature Ta; and γ is the psychrometric constant (0.0677 kPa °C−1).

The following equations were recommended by Allen et al. (1998) to estimate Rn:

$$ {R}_n={R}_{\mathrm{ns}}-{R}_{\mathrm{nl}} $$
(2)
$$ {R}_{\mathrm{ns}}=0.77{R}_s $$
(3)
$$ {R}_{\mathrm{nl}}=\sigma \left(\frac{T_{\max, K}^4+{T}_{\min, K}^4}{2}\right)\left(0.34-0.14\sqrt{e_a}\right)\left(1.35\frac{R_s}{R_{\mathrm{so}}}-0.35\right) $$
(4)
$$ {R}_{\mathrm{so}}=0.75{R}_a $$
(5)

where Rns is the net shortwave radiation (MJ m−2 day−1); Rnl is the net longwave radiation (MJ m−2 day−1); Rs is the incoming solar radiation (MJ m−2 day−1); σ is the Stefan–Boltzmann constant (4.903 × 10−9 MJ K−4 m−2 day−1); TmaxK is the maximum temperature (K); TminK is the minimum temperature (K); Rs/Rso is the ratio of the incoming solar radiation to the clear-sky solar radiation Rso (MJ m−2 day−1), which is less than or equal to 1; and Ra is the extraterrestrial solar radiation (MJ m−2 day−1). The other parameters of the ETo equation were determined as follows:

$$ \varDelta =\frac{4098\left[0.6108\exp \left(17.27T/\left(T+237.3\right)\right)\right]}{{\left(T+237.3\right)}^2} $$
(6)
$$ {e}_s=\frac{\left[0.6108\exp \left(\frac{17.27{T}_{\max, c}}{T_{\max, c}+237.3}\right)\right]+\left[0.6108\exp \left(\frac{17.27{T}_{\min, c}}{T_{\min, c}+237.3}\right)\right]}{2} $$
(7)
$$ {e}_a=\frac{\mathrm{RH}}{100}{e}_s $$
(8)

where Tmaxc is the maximum temperature (°C); Tminc is the minimum temperature (°C); and RH is the mean daily relative humidity, calculated from maximum and minimum values.

The following equation, based on a logarithmic wind speed profile, was used to convert wind speed data obtained at a height of 10 m to the standard height of 2 m:

$$ {U}_2={U}_z\left[\frac{4.87}{\ln \left(67.8z-5.42\right)}\right] $$
(9)

where Uz is the wind speed measured at height z and z is the height of the wind speed measurement (= 10 m).
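Equations 1 to 9 can be chained as in the following minimal sketch. It assumes G = 0, radiation inputs in MJ m−2 day−1, the constant γ = 0.0677 kPa °C−1 quoted above, and a Kelvin offset of 273.16; the function and argument names are hypothetical, and the input values in the usage line are illustrative rather than station data.

```python
import math

SIGMA = 4.903e-9   # Stefan-Boltzmann constant, MJ K-4 m-2 day-1
GAMMA = 0.0677     # psychrometric constant, kPa/degC (value quoted in the text)

def sat_vp(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (degC)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def wind_at_2m(u_z, z=10.0):
    """Eq. 9: convert wind speed measured at height z (m) to 2 m."""
    return u_z * 4.87 / math.log(67.8 * z - 5.42)

def eto_pm(tmax, tmin, rh, rs, ra, u_z, z=10.0, g=0.0):
    """FAO-56 Penman-Monteith ETo (mm/day) following Eqs. 1-9."""
    t = (tmax + tmin) / 2.0
    delta = 4098.0 * sat_vp(t) / (t + 237.3) ** 2          # Eq. 6
    es = (sat_vp(tmax) + sat_vp(tmin)) / 2.0               # Eq. 7
    ea = rh / 100.0 * es                                   # Eq. 8
    rns = 0.77 * rs                                        # Eq. 3
    rso = 0.75 * ra                                        # Eq. 5
    tk4 = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
    rnl = SIGMA * tk4 * (0.34 - 0.14 * math.sqrt(ea)) * (1.35 * rs / rso - 0.35)  # Eq. 4
    rn = rns - rnl                                         # Eq. 2
    u2 = wind_at_2m(u_z, z)
    num = 0.408 * delta * (rn - g) + GAMMA * 900.0 / (t + 273.0) * u2 * (es - ea)
    return num / (delta + GAMMA * (1.0 + 0.34 * u2))       # Eq. 1

# Illustrative inputs only
print(round(eto_pm(tmax=30.0, tmin=15.0, rh=50.0, rs=25.0, ra=40.0, u_z=3.0), 2))
```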

  (b) Hargreaves method (ET_Har): The Hargreaves method (Hargreaves and Samani 1985), a temperature-based equation, estimates ETo (mm d−1) using only the maximum and minimum temperatures and is expressed by Eq. 10:

$$ \mathrm{ETo}={C}_0{R}_a{\left({T}_{\max, c}-{T}_{\min, c}\right)}^{0.5}\left(T+17.8\right) $$
(10)

where Ra is the extraterrestrial solar radiation, in mm day−1; T is the mean air temperature (°C); and C0 is the conversion parameter (= 0.0023).
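A minimal sketch of Eq. 10 follows, assuming the extraterrestrial radiation is already expressed as equivalent evaporation in mm day−1 (a value in MJ m−2 day−1 can be converted by multiplying by 0.408). The function and argument names are hypothetical, and the guard against a negative temperature range is an added safeguard, not part of the published equation.

```python
def eto_hargreaves(tmax, tmin, ra_mm, c0=0.0023):
    """Hargreaves ETo (mm/day) from daily max/min temperature (degC)
    and extraterrestrial radiation in mm/day of equivalent evaporation."""
    t_mean = (tmax + tmin) / 2.0
    t_range = max(tmax - tmin, 0.0)   # safeguard: avoid a negative square-root argument
    return c0 * ra_mm * t_range ** 0.5 * (t_mean + 17.8)

# Illustrative inputs only
print(round(eto_hargreaves(30.0, 15.0, 16.0), 2))
```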

  (c) Penman method (ET_PM): This mass-transfer-based method is still used to estimate free water surface evaporation E because of its simplicity and reasonable accuracy. Penman (1948) proposed the following equation:

$$ \mathrm{ETo}=0.35\left(1+\frac{0.98}{100}{U}_2\right)\left({e}_s-{e}_a\right) $$
(11)

where U2 is the wind speed at 2 m height in miles day−1; es is the saturation vapor pressure at the temperature of the water surface; and ea is the actual vapor pressure in the air.

  (d) Makking method (ET_Mak): To estimate potential evapotranspiration (mm d−1), Makking (1957) proposed the following equation:

$$ \mathrm{ETo}=0.61\frac{\Delta}{\Delta +\gamma}\frac{R_s}{\lambda }-0.12 $$
(12)

where Rs = the total solar radiation in cal cm−2 day−1; Δ = the slope of the saturation vapor pressure curve (in mb/°C); γ = the psychrometric constant (in mb/°C); λ = latent heat (in calories per gram); P = atmospheric pressure (in millibar).

  (e) Thornthwaite method (ET_Thor): The Thornthwaite method is a temperature-based method for calculating PET and can be expressed as follows (Thornthwaite 1948):

$$ \mathrm{ETo}=\begin{cases}0, & {T}_{\mathrm{avg}}<0\,{}^{\circ}\mathrm{C}\\ 16{\left(\frac{10\,{T}_{\mathrm{avg}}}{I}\right)}^a, & 0\,{}^{\circ}\mathrm{C}\le {T}_{\mathrm{avg}}\le 26.5\,{}^{\circ}\mathrm{C}\\ -0.43\,{T}_{\mathrm{avg}}^2+32.24\,{T}_{\mathrm{avg}}-415.85, & {T}_{\mathrm{avg}}>26.5\,{}^{\circ}\mathrm{C}\end{cases} $$
(13)
$$ I=\sum \limits_{k=1}^{12}{\left(0.2{T}_k\right)}^{1.514} $$
(14)
$$ a=0.000000675{I}^3-0.0000771{I}^2+0.01792I+0.49239 $$
(15)

where ETo = reference evapotranspiration estimated by Thornthwaite equation (mm month−1), Tavg = mean monthly air temperature (°C), I = thermal index imposed by the local normal climatic temperature regime, and a = exponent being a function of I. The value of a varies from 0 to 4.25, while the thermal index I varies from 0 to 160.
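Equations 13 to 15 translate directly into a short sketch. Two points are assumptions rather than statements from the text: months with a mean temperature at or below 0 °C are skipped in the thermal index (a negative base cannot be raised to the power 1.514), and no day-length correction is applied because none appears in Eq. 13. Function names are hypothetical and the temperatures are illustrative.

```python
def heat_index(monthly_t):
    """Eq. 14: thermal index I from 12 long-term monthly mean temperatures (degC)."""
    # Assumed: months with T <= 0 contribute nothing to I
    return sum((0.2 * t) ** 1.514 for t in monthly_t if t > 0)

def exponent_a(i):
    """Eq. 15: exponent a as a cubic function of the thermal index I."""
    return 6.75e-7 * i ** 3 - 7.71e-5 * i ** 2 + 0.01792 * i + 0.49239

def et_thornthwaite(t_avg, i):
    """Eq. 13: monthly ET (mm/month) for mean monthly temperature t_avg (degC)."""
    if t_avg < 0:
        return 0.0
    if t_avg <= 26.5:
        return 16.0 * (10.0 * t_avg / i) ** exponent_a(i)
    return -0.43 * t_avg ** 2 + 32.24 * t_avg - 415.85

# Illustrative climate: 10 degC in every month
I = heat_index([10.0] * 12)
print(round(I, 2), round(et_thornthwaite(10.0, I), 1))
```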

  (f) FAO Blaney-Criddle method (ET_BC): The Blaney-Criddle (BC) equation is simpler than the other empirical methods and uses only air temperature as input data. The equation calculates evapotranspiration for a “reference crop,” an actively growing green grass 8–15 cm high. Blaney and Criddle (1950) proposed a very simplified temperature-based equation:

$$ \mathrm{ETo}=k\,p\left(0.46{T}_a+8.13\right) $$
(16)

where ETo is the potential evapotranspiration from a reference crop, in mm, for the period in which p is expressed; Ta is the mean temperature in °C; p is the percentage of total daytime hours for the period used (daily or monthly) out of the total daytime hours of the year (365 × 12); and k is the monthly consumptive use coefficient, which depends on vegetation type, location, and season. For the growing season (May to October), k varies from 0.5 for orange trees to 1.2 for dense natural vegetation.

ANN method

Soft computing methods such as artificial neural networks (ANN) have been successfully employed to develop new estimation models. An ANN is an information processing system that consists of three main layers: input, hidden, and output. Information is passed through these layers by a series of processing elements, called neurons, that operate in parallel. Each neuron receives weighted inputs, either from the input variables or from the outputs of other neurons, adds a bias term, and passes the result through an activation function. The synaptic weights are adjusted during training. The basis of this process can be formulated in the following equations.

$$ {I}_j=\sum \limits_{i=1}^n{w}_{ij}{x}_i+{b}_i $$
(17)
$$ {y}_j=f\left({I}_j\right) $$
(18)
$$ f(x)=\frac{e^x-{e}^{-x}}{e^x+{e}^{-x}} $$
(19)

where Ij is the activation value of neuron j; wij is the weight between the ith input and neuron j of the layer; xi is the ith input value; bi is the bias term; yj is the output of neuron j; and f(x) is the activation function (the hyperbolic tangent in Eq. 19).
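A forward pass through Eqs. 17 to 19 can be sketched as below. The sketch assumes one bias per neuron and a linear output node, neither of which is stated in the text, and it stores the weights as `weights[j][i]` (weight from input i to neuron j). All names and weight values are hypothetical; no training is shown.

```python
import math

def layer_forward(x, weights, biases, activation=math.tanh):
    """Eqs. 17-18 for one layer: weighted sum plus bias, then activation."""
    return [activation(sum(w * xi for w, xi in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, biases)]

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> tanh hidden layer (Eq. 19) -> single linear output (assumed)."""
    hidden = layer_forward(x, w_hidden, b_hidden)
    return layer_forward(hidden, w_out, b_out, activation=lambda v: v)[0]

# Illustrative 3-5-1 network with arbitrary weights and normalized inputs
w_h = [[0.1, 0.2, 0.3]] * 5
print(mlp_forward([0.5, 0.4, 0.9], w_h, [0.0] * 5, [[1.0] * 5], [0.5]))
```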

In ANN and MLR, the variables in the dataset were normalized to increase model performance. Min-max feature scaling (unity-based normalization) was used to bring all values into the range between 0 and 1. The general form of the normalization used in this study is presented in Eq. 20:

$$ {X}_{\mathrm{norm}}=\frac{x-{x}_{\mathrm{min}}}{x_{\mathrm{max}}-{x}_{\mathrm{min}}} $$
(20)
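Eq. 20 amounts to the following sketch. The handling of a constant series (all values equal) is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
def min_max_scale(values):
    """Eq. 20: rescale a series to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # assumed convention for a constant series
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def inverse_scale(scaled, lo, hi):
    """Map normalized model outputs back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

print(min_max_scale([10.0, 20.0, 30.0]))
```

In practice the minimum and maximum would be taken from the training period and reused for the test period, so that the test data do not influence the scaling.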

In this study, two types of ANN, the multi-layer perceptron (MLP) and the radial basis function (RBF) network, were employed. The back-propagation learning algorithm was used in the MLP training process. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Figure 3 represents a three-layer MLP structure. Every neuron except those in the input layer uses a nonlinear activation function. The MLP is fully connected: the nodes are linked by weights wij from the input layer to the hidden layer and Kj from the hidden layer to the output layer.

Fig. 3 General architecture of the MLP

Radial basis function (RBF) networks compute a distance criterion with respect to a center, and the algorithm is constructed accordingly. Figure 4 represents an RBF structure consisting of three layers: (1) input layer, (2) hidden layer, and (3) output layer. The general construction resembles that of an MLP, but there are some differences. The most characteristic feature of the RBF network is the activation function of the hidden-layer neurons (Hp(x)), for which the Gaussian bell function is the most widely used choice (Fig. 4). This function is evaluated on the distance between the center of each hidden-layer neuron and the input vector. The final output is obtained as the weighted sum of the hidden-layer activations. It therefore describes how each unit responds to the total input.

Fig. 4 General architecture of the RBF
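The structure of an RBF network can be illustrated with a plain Gaussian hidden layer followed by a weighted sum. This is a sketch under assumptions: the width parameterization exp(−d²/2σ²), the centers, widths, and weights are all hypothetical, and no normalization of the hidden activations is applied.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian bell response of one hidden unit to input vector x."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist2 / (2.0 * width ** 2))

def rbf_forward(x, centers, widths, out_weights, out_bias=0.0):
    """Output = weighted sum of the hidden-unit activations plus a bias."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

# Illustrative network with two hidden units
print(rbf_forward([0.2, 0.4], [[0.0, 0.0], [1.0, 1.0]], [0.5, 0.5], [2.0, 3.0]))
```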

We selected the input-layer parameters according to their correlation with the reference evapotranspiration (Table 2). The monthly climate variables Tmax, RHavg, and RS were used in the input layer. The optimum number of hidden-layer nodes of the ANN models was obtained by trying different hidden-layer structures to minimize the errors. The optimum number of iterations was also determined by trial. The training of the ANN models was stopped at 250 iterations because the mean square error between the observed and estimated values decreased with increasing iterations only up to this number. The learning process of the MLP and RBF was carried out with daily data series extracted from the 45 selected locations between January 1979 and January 2004 (70% of the whole data set). The data series from January 2004 to July 2014 (30% of the whole data set) were used for testing. The hyperbolic tangent and SoftMax activation functions were used for the hidden nodes of the MLP and RBF models, respectively. The network structures 3-5-1 for MLP and 3-9-1 for RBF led to the best results, where 3-5-1 denotes an MLP model comprising 3 input, 5 hidden, and 1 output nodes.

MLR method

Regression analysis is a statistical tool that can be regarded as the process of fitting a model to data. In a linear regression model, linear functions are used to describe the relationships in the data, and the model parameters are estimated from the data. MLR uses two or more explanatory (independent) variables to estimate the outcome of a response (dependent) variable by fitting a linear equation. Each set of values of the independent variables x is associated with a value of the dependent variable y. A typical MLR model is expressed as in Eq. 21:

$$ \hat{Y}={a}_0+\sum \limits_{j=1}^m{a}_j{X}_j $$
(21)

where Ŷ is the model’s output, Xj (from X1 to Xm) are the independent input variables of the model, and aj (from a0 to am) are the partial regression coefficients. The magnitude of each regression coefficient (aj) in the MLR model shows the explanatory power of the relationship between the dependent and independent variables.
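The coefficients of Eq. 21 can be estimated by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python; it is one possible way to fit the model, not necessarily the procedure used in the study, and the function names and data are hypothetical. Perfectly collinear inputs would make the system singular and raise a division error.

```python
def fit_mlr(X, y):
    """Least-squares estimates [a0, a1, ..., am] for y = a0 + sum_j a_j X_j."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    m = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]

def predict_mlr(coeffs, x):
    """Evaluate Eq. 21 for one input vector x."""
    return coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))

# Synthetic data generated from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print([round(c, 6) for c in fit_mlr(X, y)])
```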

SVM method

Support vector machine (SVM) is a well-known machine-learning method based on classification and regression analysis theory, introduced by Vapnik (1995). The SVM network architecture is generated automatically from the optimal support vectors, whereas ANN architecture generally involves manual trial-and-error procedures. Kernel functions such as linear, sigmoid, polynomial, radial basis function, and multi-layer perceptron are successful in explaining complex data sets. In this study, a linear kernel function was used in the SVM models. The kernel function is similar to a two-layer perceptron model of the neural network. Unlike in a standard neural network, the weights of the network are found by solving a quadratic programming problem with linear constraints. In the general architecture of the SVM (Fig. 5), the hidden nodes connected to the final output are the support vectors (SVs) of the SVM, together with the weights of the SVM network.

Fig. 5 General architecture of the SVM

The relationship between a dependent variable (y) and a set of independent variables (x) is determined by f(x) in SVM for regression, according to the following equation:

$$ f(x)=\sum \limits_{k=1}^n{\overline{a}}_k\,K\left(x,{x}_k\right)+b $$
(22)

where \( {\overline{a}}_k \) are the Lagrange multipliers, b is a bias term, and K(x, xk) is the kernel function, which is based upon reproducing kernel Hilbert spaces. In this study, the input vectors (xk) refer to the daily records of Tmax, RHavg, and RS, while the target value (y) refers to the ETo values calculated using the FAO-56 PM. The SVM (100, 10) model has a regularization constant of 100 and an RBF kernel width of 10.
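Once training has produced the multipliers and the bias, Eq. 22 reduces to a kernel-weighted sum over the support vectors, as in this sketch. Only the prediction step is shown; the quadratic programming step is omitted. The support vectors, multipliers, bias, and the Gaussian form exp(−d²/2w²) assumed for the RBF kernel width are all hypothetical.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, width=10.0):
    """Gaussian kernel; this width parameterization is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * width ** 2))

def svr_predict(x, support_vectors, alphas, bias, kernel=linear_kernel):
    """Eq. 22: sum of multiplier-weighted kernel values plus the bias term."""
    return sum(a * kernel(x, sv) for a, sv in zip(alphas, support_vectors)) + bias

# Illustrative values only
print(svr_predict([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.25], 0.1))
```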

RF method

Random forest is a machine learning model that can be applied to both regression and classification problems. The algorithm builds decision trees with a CART-like procedure, each on a subset of observations drawn by the bootstrap approach (Tsangaratos and Ilia 2017). The decision tree is the basic building block of the model. Many individual trees are created by sampling the observations and variables in the data set, and random forest aims to provide better accuracy by combining these trees into a forest. At each node of a decision tree, a random subset of variables is generated and the node is split on the best variable of this subset. Because every tree uses the variables in this way, the contribution of each variable to explaining the variance in the dependent variable can be determined. Breiman (2001) described the trees in RF through a margin function that depends on a random vector (Θ), the indicator function (I), and the individual predictors hk(X). The margin function is given as follows (Breiman 2001):

$$ \mathrm{mg}\left(X,Y\right)={\mathrm{av}}_kI\left({h}_k(X)=Y\right)-\underset{j\ne Y}{\max }{\mathrm{av}}_kI\left({h}_k(X)=j\right) $$
(23)

The overall result is given as the average of the sub-results from each tree. The average generalization error of RF can be given as follows:

$$ {\mathrm{PE}}^{\ast}\left(\mathrm{forest}\right)={P}_{X,Y}\left(\mathrm{mg}\left(X,Y\right)<0\right) $$
(24)
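Equations 23 and 24 are written for the classification form of RF. A minimal sketch of how they could be evaluated empirically is given below; the tree votes and labels are hypothetical, and the fraction of samples with a negative margin is used only as a finite-sample stand-in for the probability in Eq. 24.

```python
from collections import Counter

def margin(votes, true_label):
    """mg(X, Y): vote share of the true class minus the largest share of any other class."""
    counts = Counter(votes)
    n = len(votes)
    share_true = counts.get(true_label, 0) / n
    share_other = max((c / n for label, c in counts.items() if label != true_label),
                      default=0.0)
    return share_true - share_other

def empirical_error(all_votes, labels):
    """Fraction of samples with a negative margin (empirical analogue of PE*)."""
    margins = [margin(v, y) for v, y in zip(all_votes, labels)]
    return sum(m < 0 for m in margins) / len(margins)

# Hypothetical votes from five trees for three samples.
all_votes = [
    ["a", "a", "b", "a", "c"],
    ["b", "b", "a", "b", "b"],
    ["a", "c", "c", "c", "b"],
]
labels = ["a", "b", "a"]

print([round(margin(v, y), 2) for v, y in zip(all_votes, labels)])
print(empirical_error(all_votes, labels))
```

A positive margin means the forest votes for the correct class; the larger the margin, the more confident the vote.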

Two theorems can be stated for the RF algorithm.

Theorem 1

As the number of trees increases, PE* converges almost surely to the following:

$$ {P}_{X,Y}\left({P}_{\uptheta}\left(h\left(X,\uptheta \right)=Y\right)-\underset{j\ne Y}{\max }{P}_{\uptheta}\left(h\left(X,\uptheta \right)=j\right)<0\right) $$
(25)

For regression, the average generalization error of a tree is as follows:

$$ {\mathrm{PE}}^{\ast}\left(\mathrm{tree}\right)={\mathrm{E}}_{\uptheta}{\mathrm{E}}_{X,Y}{\left(Y-h\left(X,\uptheta \right)\right)}^2 $$
(26)

Theorem 2

Suppose that EY = EXh(X, θ) for all θ; then:

$$ {\mathrm{PE}}^{\ast}\left(\mathrm{forest}\right)\leqslant \overline{\rho}\,{\mathrm{PE}}^{\ast}\left(\mathrm{tree}\right) $$
(27)

where \( \overline{\rho} \) is the weighted correlation between the residuals Y − h(X, θ) and Y − h(X, θ′), with θ and θ′ independent (Breiman 2001).

Overfitting, one of the biggest problems of decision trees, is reduced in the random forest model because the trees are trained on different data sets. In addition, the chance of finding an outlier in a subset created by the bootstrap method is reduced. The random forest training algorithm (for both classification and regression) applies bootstrap aggregating, or bagging, to tree learners. More details about random forest can be found in Breiman (2001). In this study, RF is used as a regression model to estimate ETo. The important tunable parameters are the number of trees (ntree) and the number of predictor variables in the random subset at each node (mtry). The default value of mtry (one-third of all predictor variables) was used in this study. The value of ntree, which affects the forecast performance, was chosen during parameter optimization to yield the minimum error, with an iterative evaluation and the out-of-bag error (mean squared error for regression problems) as the selection criteria. In general, RMSE decreased with increasing ntree, and r increased correspondingly. Two forests were considered, the first with 100 trees and the second with 30 trees. The random forest with 100 trees is not included in the results section because its results are very similar to those of RBF (ANN). Thus, the random forest with 30 trees was considered in the evaluation of the study.
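The role of bagging and the out-of-bag (OOB) error in choosing ntree can be sketched as follows. This is a deliberately simplified illustration, not the RF implementation used in the study: the base learner is a one-split regression tree on a single hypothetical predictor, so mtry plays no role, and the data are synthetic.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one predictor by minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: always predict the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def oob_mse(xs, ys, ntree, seed=0):
    """Bag ntree stumps and return the out-of-bag mean squared error."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw sample i vote for it
                sums[i] += predict_stump(stump, xs[i])
                counts[i] += 1
    errs = [(ys[i] - sums[i] / counts[i]) ** 2 for i in range(n) if counts[i] > 0]
    return sum(errs) / len(errs) if errs else float("nan")

# Synthetic monthly series: one predictor and a linear response (hypothetical values).
xs = [float(i) for i in range(24)]
ys = [5.0 * x + 10.0 for x in xs]

for ntree in (30, 100):
    print(ntree, oob_mse(xs, ys, ntree))
```

Comparing the OOB error across candidate ntree values mirrors the selection criterion described above without requiring a separate validation set.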

Performance criteria

Five performance criteria are used in this study to assess the goodness of fit of the models: R2, root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), the index of agreement (d), and percent bias (PBIAS), calculated with the following equations (Moriasi et al. 2015).

$$ {R}^2={\left[\frac{\sum \limits_{i=1}^n\left({O}_i-\overline{O}\right)\left({P}_i-\overline{P}\right)}{\sqrt{\sum \limits_{i=1}^n{\left({O}_i-\overline{O}\right)}^2}\sqrt{\sum \limits_{i=1}^n{\left({P}_i-\overline{P}\right)}^2}}\right]}^2 $$
(28)
$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({O}_i-{P}_i\right)}^2} $$
(29)
$$ \mathrm{NSE}=1-\frac{\sum \limits_{i=1}^n{\left({O}_i-{P}_i\right)}^2}{\sum \limits_{i=1}^n{\left({O}_i-\overline{O}\right)}^2} $$
(30)
$$ d=1-\frac{\sum \limits_{i=1}^n{\left({O}_i-{P}_i\right)}^2}{\sum \limits_{i=1}^n{\left(\left|{P}_i-\overline{O}\right|+\left|{O}_i-\overline{O}\right|\right)}^2} $$
(31)
$$ \mathrm{PBIAS}\left(\%\right)=\frac{\sum \limits_{i=1}^n\left({O}_i-{P}_i\right)}{\sum \limits_{i=1}^n{O}_i}\times 100 $$
(32)

where Oi is the reference ETo calculated with the FAO-56 PM equation (mm d−1); Pi is the ETo estimated by the method or model (mm d−1); \( \overline{O} \) and \( \overline{P} \) are the means of Oi and Pi, respectively; and n is the total number of data.
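Equations 28 to 32 can be computed together from paired series. The sketch below assumes that O is the FAO-56 PM reference and P the estimate, which is the convention implied by the sign of PBIAS (negative for overestimation); the function name and the sample values are hypothetical.

```python
import math

def performance_criteria(obs, pred):
    """Return R2, RMSE, NSE, d, and PBIAS (%) for observed and predicted series."""
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    potential_err = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                        for o, p in zip(obs, pred))
    return {
        "R2": (cov / math.sqrt(var_o * var_p)) ** 2,
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / var_o,
        "d": 1.0 - sq_err / potential_err,
        "PBIAS": 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs),
    }

# Hypothetical monthly values (mm/month): the estimate is 10 mm/month too high throughout.
obs = [100.0, 120.0, 140.0]
pred = [110.0, 130.0, 150.0]
print(performance_criteria(obs, pred))
```

In this example R2 equals 1 even though every estimate is biased, while RMSE, NSE, d, and PBIAS all register the offset, which is why R2 alone is not sufficient for ranking methods.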

Results and discussion

In the first step of the study, referred to as the regional average, all analyses were carried out on the average of the data from the 45 meteorological stations that represent the Central Anatolian Region. At this step, the stations were not compared with each other; the Central Anatolian Region was evaluated as a single station by averaging all stations. In the second step, the 45 stations were divided into 4 groups according to their elevations, and the meteorological data of the stations were averaged within each elevation group. Thus, the performance of the models and methods was analyzed for four different groups in the second step.

The results of first step (regional average)

The mean, minimum, maximum, standard deviation, coefficient of variation (CV), and skewness (SK) of the monthly regional average dataset for the entire time series are given in Table 3. The statistical parameters of the training, testing, and whole data are shown separately in the table. The performance of ANN models is affected by the skewness of the time series data (Zheng et al. 2018). ETo and all variables have quite low skewness values in the complete dataset. Precipitation shows a more skewed distribution than the other parameters in each period (see SK values in Table 3). The skewness values are roughly similar across the data sets, although the SK values of Tmin differ considerably from the others in each period. A greater CV, defined as the standard deviation divided by the mean, indicates a greater level of dispersion around the mean. The mean ETo in the testing period (131.70 mm/month) is considerably higher than the mean ETo in the training and whole data periods (119.27 and 127.96 mm/month, respectively). As can be seen from the R2 values for the whole series, Tmax (R2 = 0.84, p<0.05), RHavg (R2 = 0.68, p<0.05), and RS (R2 = 0.79, p<0.05) are closely correlated with ETo.
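The descriptive statistics reported in Table 3 can be reproduced for any monthly series with a few lines of code. The sketch below is an assumption about the exact estimators: it uses the sample standard deviation (n − 1) and the moment-based skewness coefficient, neither of which is specified in the text, and the input values are hypothetical.

```python
import math

def describe(values):
    """Mean, min, max, standard deviation, CV, and skewness of a series."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # sample std (assumed)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "std": std,
        "cv": std / mean,           # standard deviation divided by the mean
        "skewness": m3 / m2 ** 1.5, # moment-based skewness (assumed estimator)
    }

# Hypothetical right-skewed series, similar in shape to monthly precipitation.
print(describe([1.0, 2.0, 3.0, 4.0, 10.0]))
```

A positive skewness, as in this example, indicates a long right tail, which is the pattern described above for precipitation.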

Table 3 Statistical parameters of climatic data and ETo in the training period, testing period, and entire period

In this study, monthly ET was estimated using five different data-driven models: ANN (MLP and RBF), SVM, RF, and MLR. The ET_MLP, ET_RBF, ET_SVM, ET_RF, and ET_MLR models use the same input variables. The Penman-Monteith FAO-56 equation (ETo) was accepted as the reference equation, and the other empirical equations (Hargreaves-Samani, Penman, Makking, Thornthwaite, Blaney Criddle), data-driven models (ET_MLP, ET_RBF, ET_SVM, ET_RF), and statistical model (ET_MLR) were compared with ETo. Table 4 shows the results of all models and equations based on the MAD, RMSE, d, and NSE calculations in the training and testing periods. Overall, considering the high R2 and low MAD and RMSE values, the ET_RBF model and, among the equation methods, the ET_PM formula produced better results in the study area, while the worst performance belongs to ET_Mak. This result is similar to that reported for the Kingdom of Saudi Arabia, where the Makking equation performed worse than the other selected methods (Islam et al. 2020). It is clear from Table 4 that the ET_MLP and ET_SVM models outperformed all other models in terms of all performance criteria in the training period. The ET_RF and ET_MLR results are close to each other, based on their high R2 and low RMSE in the training period.

Table 4 The performance statistics of the models and equations in training and testing period

All of the methods and models performed well in the training and testing periods: the values of RMSE, d, and NSE differed very little between the two periods, and all R2 values were greater than 0.85. In the testing period, the MLP (R2=0.999, p<0.05) and SVM (R2=0.998, p<0.05) models were better than the others for ETo estimation (Table 4). Therefore, ET_MLP and ET_SVM were selected as the best-fit models for estimating ETo in the training and testing periods. Their performance on the testing dataset showed that the MLP and SVM models can provide accurate and reliable ETo estimations. Based on the results in Table 4, the Penman method (ET_PM), whose inputs were U and actual and saturation vapor pressure, had higher R2 (0.989; p<0.05), NSE (0.99), and d (0.99) values than the other empirical equations in the training period. ET_PM also performed clearly better than the other empirical methods in the testing period based on R2 (0.988), RMSE (20.74 mm/month), d (0.98), and NSE (0.98). In both periods, the ET_MLP model provided the best accuracy (R2=0.998), the highest d value (1.00), and the lowest RMSE value (2.02 mm/month) of all methods. Malik et al. (2017) reported better performance (RMSE = 0.214 mm/month) for a multi-layer perceptron neural network estimating monthly pan evaporation (EPm) in the Indian central Himalayas. This indicates that the accuracy of the models may vary according to the climate of the research site, the type of climatic data, and the sample size. In this study, the ET_RBF and ET_RF models have almost the same R2, and both performed worse than the ET_SVM and ET_MLP models in the testing period. As can be seen from Table 4, all performance statistics indicated better performance for the data-driven models than for the empirical methods.
These results are in line with previous studies (Karimaldini et al. 2011; Tabari and Talaee 2013), which indicate that data-driven models perform better than locally calibrated physical models or conventional methods. All data-driven models and the statistical method (ET_MLR) are rather simple in terms of input parameters, and they differ from the empirical methods in that they include RHavg among the inputs. The results show that the models requiring Tmax, RHavg, and RS performed well in ETo modeling and could be used with limited weather data. They also show that the presence or absence of a critical input significantly affected the performance of the equation methods. However, for the same input set, the performance of data-driven models can vary with model dynamics (number of hidden nodes, epoch values, type of activation function, etc.).

The ETo values calculated by FAO PM-56 and the values estimated by the empirical methods and data-driven models in the testing period are compared in Figs. 6 and 7 in the form of line graphs, scatter plots, and residual graphs. The slope of the regression lines ranged from 0.23 to 1.08 for the empirical methods and from 0.89 to 0.99 for the data-driven models. The values estimated by ET_MLP, ET_SVM, and ET_PM were close to the reference ETo values and followed the same trend. The figures clearly show that the ET_MLP, ET_SVM, and ET_PM estimates closely follow the corresponding ETo values and are less scattered than those of the other methods. Therefore, based on R2, these methods are considered the best alternatives for estimating monthly ETo. The slope of the regression line was <1.0 for every method except ET_PM and ET_RF, indicating that ET_PM and ET_RF had strong relationships with ETo among the empirical methods and data-driven models, respectively. However, in general, the empirical methods cannot capture the observed values and produce less accurate results than the data-driven methods (ANN, SVM, RF, and MLR) in the testing period based on R2. For example, the R2 of the ET_MLP, ET_RBF, ET_SVM, ET_RF, and ET_MLR models varies from 0.956 to 0.999 (Fig. 7), whereas the R2 of the ET_Har, ET_PM, ET_Mak, ET_Thor, and ET_BC methods is slightly lower and varies from 0.854 to 0.988 (Fig. 6) in the testing period. These results indicate that the type and number of input variables affect the efficiency of ETo estimation. Among the equation methods, the radiation-based model (ET_Mak) was not satisfactory compared with the other empirical methods, with an R2 value of 0.854. Fig. 6 also shows that the Hargreaves method has lower predictive accuracy for the peak values of estimated ETo.
An evaluation based only on R2 may not be sufficient, since R2 is oversensitive to extreme values and insensitive to both additive and proportional differences between observed and model-estimated values (Legates and McCabe 1999). Error-based goodness-of-fit indicators (d, RMSE, and MAD) are more suitable for model evaluation than R2 because they calculate the deviation or error between each pair of observed and estimated values based on the measurement uncertainty. Thus, d, RMSE, and MAD were also used to evaluate the performance of all techniques, and these values are shown in Table 4. Figs. 6 and 7 also show the temporal variation of observed and estimated monthly ETo values for the empirical methods and data-driven models during the testing period. In all series, ETo values were low in the cooler months and increased gradually toward the high-temperature months. The record shows marked fluctuations between winter and summer, which implies that changes in climatic conditions that alter evapotranspiration could easily affect the balance of, and interaction between, surface and subsurface water. However, Fig. 6 shows that the Hargreaves method (ET_Har) did not accurately estimate the evapotranspiration of the high-temperature months and was not good enough in forecasting peak ETo values. This could be because the study area has a semi-arid continental climate with mildly cold winters and hot, dry summers, where atmospheric conditions other than temperature and RS are more favorable to evaporation and transpiration. Therefore, the poor estimation of peak ETo values could be caused by the formulation used in the ET_Har method.
Likewise, the scatter of ET_PM, which is based on a combination technique using U and vapor pressure as input parameters, is less dispersed, and the method generally overestimates ETo with very low errors. In general, ET_PM indicates overestimation, while the values of the ET_Mak, ET_Thor, and ET_BC methods remain below the peak ETo values for the Central Anatolian Region. These results indicate the superiority of the combination-based ET_PM, which could be considered a reliable alternative for ETo estimation among the empirical methods. The Penman method uses vapor pressure deficit, actual vapor pressure, and an empirical U function. The ET_PM estimates differ slightly from ETo; Lee et al. (2004) reported that this difference derives from the empirical wind function used in the equation, which takes many different forms in the literature. The estimation results for the regional average values in the Central Anatolian Region revealed that the SVM and MLP models can achieve reliable estimates. The data-driven models also estimated peak ETo values more accurately. Data preprocessing, such as normalization of the data, enabled the data-driven models to capture peak magnitudes with high accuracy (Demirel et al. 2009). In particular, the SVM- and MLP-based models performed better than the RBF, RF, and MLR (statistical method) models; they increased the estimation accuracy by up to 98% in the regional average dataset. These results are in good agreement with previous studies (Sayyadi et al. 2009; Rahimikhoob 2010; Traore et al. 2010), all of which reported superior accuracy of the MLP model compared with other methods for ETo estimation in different climates.
The performance of the ET_SVM model was also quite satisfactory, and similar results were obtained by Tabari et al. (2012) and Mohammadrezapour et al. (2019) for semi-arid environments. The choice of kernel function affects the performance of the SVM model in estimating ETo (Seifi and Riahi 2020). As mentioned earlier, a very satisfactory performance was obtained by using the linear kernel function of the SVM model. However, Tabari et al. (2012) found RBF to be the best kernel function for SVM models. With regard to the overall performance of all empirical methods and data-driven models in the testing period, the ranking for the regional average in the Central Anatolian Region is: ET_MLP > ET_SVM > ET_PM > ET_MLR > ET_RF > ET_RBF > ET_BC > ET_Thor > ET_Har > ET_Mak.

Fig. 6

Relationship between results of empirical methods and reference evapotranspiration in testing period. a Scatter plot (left). b Time series (middle). c Temporal residual graph

Fig. 7

Relationship between results of data-driven models and reference evapotranspiration in testing period. a Scatter plot (left). b Time series (middle). c Temporal residual graph

The residuals of all regression models were also examined as a function of the observed ETo of the testing period and the month of the year (Figs. 6 and 7). These graphs show the vertical distance between the actual data point and the estimated point on the line. The figures show model residuals versus observed ETo for the regional average dataset. Seasonality can be seen in the residuals of all methods and models, and it is more pronounced for some of them, such as ET_Mak (Fig. 6) and ET_MLR (Fig. 7). The magnitude of the seasonality, which increases with increasing estimated ETo, is particularly pronounced in the residuals of the ET_RBF and ET_RF models. This can be explained by the fact that RHavg is a seasonally dynamic property; in other words, this parameter makes the seasonality more pronounced in the residuals. Besides, the seasonal amplitude of RHavg (the difference between its maximum and minimum values), expressed as a percentage, is considerably higher than that of the other parameters. Therefore, the residuals of all models using RHavg as an input parameter were higher than those of the methods that did not use it. For example, the residuals of the ET_Har model are not strongly related to ETo magnitude or month of year. The residuals also show that the models are relatively unbiased. According to the results of the equation methods in Fig. 6, the ET_Har, ET_PM, and ET_Mak methods tended to overestimate the observed ETo, while the ET_Thor method tended to underestimate it. For the ET_BC method, the residuals were generally balanced over the years. As can be seen from Fig. 8, the residuals of all methods were mostly positive after 2010 and 2011 (Fig. 8a, b). The graphs of cumulative average residuals clearly depict cumulative underestimation and overestimation (Fig. 8c, d).
Before 2009, the ET_MLP model, the best data-driven model, showed cumulative overestimation in 2006, while ET_PM, the best empirical method, tended to overestimate after 2011. As shown in Fig. 8c, d, cumulative residual plots may display a tendency to overestimate or underestimate in relation to wet and cold biases in the considered years in the study area.

Fig. 8

Residuals and cumulative residuals graphs for testing period. a Annual average residuals of empirical methods. b Annual average residuals of data-driven models. c Cumulative residuals graph of ET_MLP. d Cumulative residuals graph of ET_PM

The results of the second step (elevation groups)

The statistics in Table 5 illustrate the differences in selected data characteristics among the 4 elevation groups. In the whole data, Tmin shows a significantly greater level of dispersion around the mean (CV) than the other variables in all groups except G4. The parameter with the highest CV in the whole data of G4 is Tmax, and Table 5 shows that the CV values of the other parameters are close to that of Tmax in G4. Precipitation has a more skewed distribution in all groups, just as in the complete dataset shown in Table 3. Another important characteristic of the selected climate data is that the highest R2 is found between ETo and Tmax in the training period of all four groups, ranging between 0.84 and 0.86 (p<0.05), while the lowest R2, between ETo and U2, ranges between 0.00 and 0.04 (p>0.05) in the training period of all groups.

Table 5 Statistical parameters of climatic data and ETo in four elevation groups

The results of the different optimal data-driven models are provided for the training period (Table 6) and the testing period (Table 7) using long-term monthly data of the elevation-based groups. It is clear from Tables 6 and 7 that the RMSE values of the empirical methods in the training period are considerably higher than those in the testing period. For the ET_RBF, ET_RF, and ET_MLR models, the maximum RMSE values (15.12, 29.17, and 12.93 mm/month, respectively) were found for G1. The maximum R2 values of all models in G1 were found for ET_MLR (R2 = 0.998, p<0.05) and ET_SVM (R2 = 0.998, p<0.05). These models had the highest d values (1.00; 1.00) and NSE values of 1.00 and 0.99, respectively. For ET_Thor, however, the maximum RMSE value was 3.50 mm/month in G2. As can be clearly seen in Table 7, G2 performs better than the other groups on all performance criteria in the testing period. Therefore, the models used for ETo estimation in the Central Anatolian Region can be used most effectively at an altitude between 850 and 1000 m. The performance values are similar for all elevation groups in the training period (Table 6). Once again, the ET_Har method performed the worst in G3, due to significant underestimations, with an RMSE of 78.95 mm/month and an NSE of −0.11. The d, NSE, and R2 values in Table 7 indicate that ET_MLP was the best simple method for estimating ETo in G4 (R2 = 0.75). Table 6 clearly shows that the accuracy of ET_MLP is generally better than that of the other models in ETo estimation; the ET_MLP model has the best accuracy in all four groups. The ET_SVM and ET_PM models also performed well in all groups, while ET_Mak yielded the worst estimation in all groups in the testing period (Table 7). The ETo values estimated by the models have been lower than the observed ETo values since 2009.

Table 6 Comparison of performances of all techniques for ETo estimation of four elevation groups in training period
Table 7 Comparison of performances of all techniques for ETo estimation of four elevation groups in testing period

PBIAS (%) indicates whether a model overestimates (PBIAS < 0) or underestimates (PBIAS > 0) ETo, and PBIAS values nearer to 0 suggest a model or method with more predictive skill. Safeeq and Fares (2012) consider a PBIAS value of more than 15% and less than 25% an indicator of average performance, a value between 10% and 15% good performance, and a value less than 10% very good performance. Model efficiency in terms of PBIAS is higher for the data-driven models than for the empirical methods in both the training and testing periods of each group (Fig. 9). The performance of ET_MLP and ET_RBF over the entire training and testing periods is considered “very good”, since their PBIAS values vary between 0 and −2% (Fig. 9). Celestin et al. (2020) found that the World Meteorological Organization (WMO) and the Mahringer (MAHR) models performed well with monthly data compared to the PM FAO-56 model, with PBIAS of −2.5% and −2.6% after the calibration period, respectively. With regard to PBIAS, the ET_Thor method produced the highest PBIAS values in all groups in both periods. From the MAD, d, and NSE perspective, ET_Thor shows acceptable performance in both training and testing periods (Tables 4, 5, and 6). However, the maximum RMSE and MAD values are exhibited by the ET_Har method for the regional average (non-group) in both periods (Table 4). ET_BC, ET_Mak, ET_PM, ET_SVM, and ET_RBF underestimate the reference evapotranspiration for all elevation groups and the regional average (non-group) in the training and testing periods. ET_MLP for the regional average and ET_MLR in the G2 group are equally suitable, with 0% PBIAS, and they could also be used satisfactorily to estimate reference evapotranspiration for the study area.
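The PBIAS rating bands quoted from Safeeq and Fares (2012) can be applied mechanically, as in the sketch below. Two choices are assumptions not stated in the text: the bands are applied to the absolute value of PBIAS, and values of 25% or more are labeled "unrated" because no rating is given for them. The function name is hypothetical.

```python
def rate_pbias(pbias_percent):
    """Return (rating, direction) for a PBIAS value given in percent."""
    magnitude = abs(pbias_percent)  # assumption: bands apply to the magnitude
    if magnitude < 10:
        rating = "very good"
    elif magnitude <= 15:
        rating = "good"
    elif magnitude < 25:
        rating = "average"
    else:
        rating = "unrated"  # no band is quoted for values of 25% or more

    if pbias_percent < 0:
        direction = "overestimate"
    elif pbias_percent > 0:
        direction = "underestimate"
    else:
        direction = "unbiased"
    return rating, direction

for value in (-2.0, 0.0, 12.0, 20.0):
    print(value, rate_pbias(value))
```

With this rule, the PBIAS range of 0 to −2% reported for ET_MLP and ET_RBF falls in the "very good" band.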

Fig. 9

PBIAS (%) graphs. a Empirical methods in training period. b Data-driven models in training periods. c Empirical methods in testing period. d Data-driven models in testing period

For further analysis, the predictive models of reference evapotranspiration are examined with the Taylor diagram (Taylor 2001). The Taylor diagram summarizes the results of methods or models using the standard deviation and the R2 of observed and simulated data. The radial coordinate shows the standard deviation, the concentric semi-circles represent the magnitude of the standard deviation, and the angular coordinate indicates the value of R2. The methods and models whose estimated ETo agrees best with the observed ETo lie nearest to the point marked “reference” on the x-axis. Figures 10 and 11 display the standard deviation and R2 (with observed ETo) for the results of the equation methods and the data-driven models, respectively. The data-driven models (Fig. 11) are, in general, more accurate than the empirical methods (Fig. 10), with the latter having relatively low d and NSE values (Table 7). On the basis of the Taylor diagrams, the values for the four elevation groups, determined by concentric analysis, fall in the range of 78–86 with respect to the ideal model points of both empirical methods and data-driven models in the testing period. In the Taylor diagram, ET_PM is among the methods that perform well for estimated ETo (Fig. 10). Taylor diagram analysis reveals that ET_MLP has the highest R2 (between 0.997 and 0.999), the lowest standard deviation (between 74 and 80 mm/month), and the smallest RMSE (between 0.04 and 5.71), and captures the observations better than all other data-driven models in all groups.
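The three quantities plotted on a Taylor diagram are linked by a law-of-cosines relation (Taylor 2001): the squared centered RMS difference equals σo² + σe² − 2σoσe·r, where r is the correlation coefficient. The sketch below computes these statistics for hypothetical series and checks the relation numerically; it uses population standard deviations (division by n), which the identity requires, and the function name is an assumption.

```python
import math

def taylor_statistics(obs, est):
    """Return (std_obs, std_est, r, centered RMS difference) for two series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_e = math.sqrt(sum((e - mean_e) ** 2 for e in est) / n)
    r = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est)) / (n * std_o * std_e)
    crmsd = math.sqrt(
        sum(((e - mean_e) - (o - mean_o)) ** 2 for o, e in zip(obs, est)) / n
    )
    return std_o, std_e, r, crmsd

# Hypothetical monthly ETo values (mm/month): reference and one model estimate.
obs = [40.0, 90.0, 160.0, 200.0, 120.0, 50.0]
est = [45.0, 85.0, 150.0, 210.0, 130.0, 55.0]

std_o, std_e, r, crmsd = taylor_statistics(obs, est)
identity = std_o ** 2 + std_e ** 2 - 2.0 * std_o * std_e * r
print(std_o, std_e, r, crmsd)
print(abs(crmsd ** 2 - identity) < 1e-9)
```

Because of this relation, a model point that lies close to the reference point on the diagram has simultaneously a similar standard deviation, a high correlation, and a small centered RMS difference.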

Fig. 10

Taylor diagram of the correlation coefficient (r), the centered root mean square difference, and standard deviation between estimated ETo of different groups (a G1, b G2, c G3, and d G4) by using empirical methods and ETo values in testing period

Fig. 11

Taylor diagram of the correlation coefficient (r), the centered root mean square difference (RMSD), and standard deviation (STD) between estimated ETo of different groups (a G1, b G2, c G3, and d G4) by using data-driven methods and ETo values in testing period

Conclusion

The performances of five empirical methods (ET_Har, ET_PM, ET_Mak, ET_Thor, and ET_BC) and five data-driven models (ET_MLP, ET_RBF, ET_SVM, ET_RF, and ET_MLR) in estimating ETo were assessed for the Central Anatolian Region of Turkey using two main approaches (regional average and elevation groups). The performances of the empirical methods and data-driven models are reported to provide evidence for suitable techniques for estimating ETo values.

Monthly climatic data from 45 meteorological stations over a period of 35 years (1979–2013) were used in this study. The study was conducted in two stages of data preparation. In the first stage, the average of all parameter values from the 45 meteorological stations was evaluated. In the second stage, the data set was divided into 4 elevation groups. The correlation of the parameters with ETo was taken into account in the selection of input parameters. Across all stations, ETo is strongly and positively correlated with Tmax, RHavg, and RS, with R2 values of 0.84, 0.68, and 0.79, respectively. These three variables can therefore be effective in modeling evapotranspiration in a semi-arid region. They should be included in long-term monitoring programs, especially for agricultural planning and water resources management in semi-arid regions, because evapotranspiration is an essential factor that causes great change in the water budget, especially in fragile semi-arid ecosystems.

Based on the evaluation of the grouping results, the MLP and SVM models in G2 (850–1100 m) can be employed successfully in modeling the monthly mean ETo, because both approaches yield better estimates, with high R2 values, than the empirical methods, with MLP being slightly more successful than SVM. Therefore, this research suggests that reference evapotranspiration in a semi-arid region can be modeled using only a few input parameters with the help of simple but effective data-driven models. The Penman method achieved the highest accuracy in terms of all performance criteria among the empirical methods. It is suitable for estimating reference evapotranspiration and can be used reliably in semi-arid areas.

From this study, it can be concluded that when a single climatic variable such as U or sunshine duration is missing, the alternative models can be used to approximate the PM FAO-56 model accurately in semi-arid environments. The results are encouraging and suggest that data-driven models offer an easy-to-use and accurate alternative to empirical approaches for estimating reference evapotranspiration, because the advantage of soft computing methods lies in the possibility of improving the performance criteria by modifying the important tunable parameters.