Introduction

Evaporation is an essential element of the hydrological cycle, and its prediction is a significant issue for water resources, agricultural modeling and water management (Gundalia and Dholakia 2013; Malik and Kumar 2015; Feng et al. 2018). Modeling evaporation is therefore especially important in regions and basins where measurements are insufficient (Dalkiliç et al. 2014). The evaporation process is nonlinear and complex. It generally depends on heat energy and vapor pressure, which in turn depend on atmospheric pressure, solar radiation, air temperature, relative humidity, and wind speed. It is also affected by season, geographical location, and climate type (Vicente-Serrano et al. 2018).

Direct or indirect procedures are applied to calculate and predict evaporation. Direct approaches rely on pan evaporation (Ep) measurements, whereas indirect procedures include Penman–Monteith techniques and energy and water budgets. Because the evaporation process is nonlinear, it is not possible to represent it accurately with a mathematical model (Rezaie-Balf et al. 2019). However, many semi-empirical and empirical methods have been established to predict evaporation (Antonopoulos and Antonopoulos 2017; Lu et al. 2018). The biggest problem in using such prediction methods is that the meteorological variables are dynamic, nonlinear and stochastic (Yaseen et al. 2020).

Because the physically based and statistical methods employed in hydrological studies are time-consuming and subject to numerous limitations, machine learning (ML) algorithms have gained prominence. Prediction models developed with ML algorithms allow simpler implementation, faster results with fewer inputs, and excellent computing efficiency (Mosavi et al. 2018; Zounemat-Kermani et al. 2020; Dazzi et al. 2021).

The use of ML algorithms in hydrology has increased considerably in recent years. Gradient Boosting Machines (GBM) and signal decomposition algorithms such as Empirical Mode Decomposition (EMD), Variational Mode Decomposition (VMD), Ensemble Empirical Mode Decomposition (EEMD), and Robust Empirical Mode Decomposition (REMD) have been widely used. The following paragraphs present examples of ML applications to evaporation prediction.

Wang et al. (2017) investigated the performance of adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), fuzzy genetic (FG) and M5Tree models in predicting the monthly Ep values of stations. They found that FG models were successful in estimating Ep. Pammar and Deka (2017) used support vector machines (SVM) and the discrete wavelet transform (DWT) for pan evaporation prediction and concluded that the accuracy of the Ep predictions of the established models was very promising. Lu et al. (2018) predicted daily Ep using empirical models together with gradient boosting decision tree (GBDT), random forests (RFs), and M5 model tree (M5Tree). They found the GBDT model to be the most accurate and stable of all models in estimating daily Ep. Yaseen et al. (2020) employed the cascade correlation neural network (CCNN), gene expression programming (GEP), classification and regression tree (CART), and SVM to forecast evaporation using meteorological variables. They observed that all ML algorithms performed well in forecasting evaporation, with the SVM providing the best performance.

Rezaie-Balf et al. (2019) used EEMD combined with SVM and model tree (MT) to predict monthly Ep. They found that the MT and SVM algorithms combined with EEMD estimated monthly Ep better than the MT and SVM algorithms alone. Ali Ghorbani et al. (2018) used a multilayer perceptron (MLP) trained with quantum-behaved particle swarm optimization (QPSO) to predict daily evaporation rates and found that the hybrid MLP-QPSO algorithm outperformed the hybrid MLP-PSO and the standalone algorithm. Mohamadi et al. (2020) used the firefly algorithm (FFA) and shark algorithm (SA) to train MLP, ANFIS and radial basis function (RBF) algorithms for monthly evaporation prediction and reported that ANFIS-SA gave better results than the other algorithms. Wu et al. (2020) investigated the hybrid WOAELM and FPAELM algorithms to demonstrate the applicability of the flower pollination algorithm (FPA) and whale optimization algorithm (WOA) to the extreme learning machine (ELM) algorithm for monthly Ep prediction. They concluded that the hybrid FPAELM algorithm gave the best estimates and that both hybrids gave better results than the other models. Jasmine et al. (2022) used ANFIS and its hybridizations with the firefly algorithm (FFA), genetic algorithm (GA), and particle swarm optimization (PSO) to forecast the evaporation of agricultural areas and found that ANFIS–PSO and ANFIS gave better results than ANFIS–FFA and ANFIS–GA.

To predict monthly Ep, Al-Mukhtar (2021) investigated the performance of several ML models: multivariate adaptive regression splines (MARS), bagged multivariate adaptive regression splines (BaggedMARS), conditional random forest regression (CRFR), weighted K-nearest neighbor (KKNN), K-nearest neighbor (KNN), and model tree M5. He found that the weighted KNN algorithm gave the best results. Malik et al. (2022) investigated the effectiveness of GBM and deep learning (DL) algorithms in predicting evaporation at two stations from the maximum air temperature and found that the DL algorithm gave better results than the GBM algorithm.

To model pan evaporation under different climatic conditions in Iran, Dehghanipour et al. (2021) used an MLP-NN and a genetic algorithm. They used the best overall relationship from the first method as the main relationship in the second method and determined the climatic correction coefficients for the other climate types with the genetic algorithm optimization model. Their study found that both methods gave accurate results in modeling pan evaporation in Iran.

The literature review shows that evaporation research has mostly employed the best-known algorithms, namely ANFIS, SVM, and MLP (Wang et al. 2017; Pammar and Deka 2017; Ali Ghorbani et al. 2018). Studies that estimate evaporation with hybrid algorithms are few and limited to certain algorithms (Wu et al. 2020; Jasmine et al. 2022). Furthermore, evaporation estimates based on signal separation techniques are scarce and have generally used only one signal separation model (Rezaie-Balf et al. 2019).

Signal separation techniques help to understand the basic structure of a time series and reveal patterns or trends that may be present. They are therefore important for predicting future values and detecting unusual observations or anomalies. Decomposition techniques can also help express these patterns and reveal their effects.

Considering this gap in the literature, evaporation prediction was carried out by combining four signal decomposition techniques (REMD, EMD, EEMD and VMD) with the GBM model. In addition, the effectiveness of these four signal separation techniques, which had not previously been used together with the GBM model, was investigated.

Consequently, for the first time in the literature, monthly evaporation estimates for a region were investigated with the GBM technique hybridized with the REMD, EMD, EEMD and VMD signal separation techniques using eight parameters. The innovative contribution of this research is to show the current state of artificial intelligence algorithms in evaporation prediction and to determine the evaporation prediction performance of some hybrid models with parameter optimization. Additionally, the effect of various meteorological parameters on evaporation prediction was evaluated, which revealed the type of input combination that can predict evaporation successfully.

This study aims to forecast evaporation using various combinations of the following variables: monthly total precipitation (P), monthly average temperature (Tavg), monthly minimum temperature (Tmin), monthly maximum temperature (Tmax), monthly average wind speed (WS), monthly average actual pressure (AP), monthly average relative humidity (RH), and monthly total solar time (ST). The data were modeled using the GBM, EMD-GBM, REMD-GBM, EEMD-GBM, and VMD-GBM algorithms. The models underwent training and testing phases before being assessed using the NSE, MAE, RMSE and R2 measures together with radar plot and boxplot analysis. It is presumed that the models examined in the study can be used to forecast evaporation in various places.

The remainder of this article is organized as follows. The "Method and material" section presents the study area and the data used, introduces the machine learning and signal separation algorithms, and then gives the technical details of the model performance measures. In the "Results and discussion" section, the analyses of the models used for evaporation prediction and the results of the graphical analysis are compared. The performance of the best model is also evaluated and compared with similar studies. Finally, the research results are given in the "Conclusion" section.

Method and material

Study data and area

This study estimated monthly evaporation values in the Southeastern Anatolia Project (GAP) region using the GBM model and signal preprocessing techniques. For this purpose, the EMD, REMD, EEMD and VMD algorithms were used to improve the performance of the single-GBM algorithm. In the design of the algorithms, 80% of the data was used for training and 20% for testing. Additionally, tenfold cross-validation was applied to limit overfitting, which degrades forecast performance. The GAP area covers the Euphrates-Tigris Basin and the Upper Mesopotamia region and includes the provinces of Gaziantep, Adıyaman, Mardin, Diyarbakır, Kilis, Siirt, Şanlıurfa, Batman, and Şırnak. Utilizing the resources of the Southeastern Anatolia Region, raising the income level and standard of living of the local population, expanding employment opportunities in rural regions, and ensuring economic development are among the GAP's top priorities (Gap.gov 2022). This research used data from meteorology stations in Siirt, Adıyaman and Diyarbakır provinces. Figure 1 shows the locations of the meteorology stations.
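The 80/20 partition and tenfold cross-validation described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the text does not state whether the split was chronological or random, or how the folds were formed, so the chronological split, the interleaved folds, the function names `split_train_test` and `kfold_indices`, and the record length of 120 months are all assumptions.

```python
def split_train_test(n_samples, train_frac=0.8):
    """Split sample indices into a leading training block and a trailing test block.

    Assumption: a chronological (unshuffled) split; the paper does not specify.
    """
    cut = int(round(n_samples * train_frac))
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]


def kfold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Assumption: fold i holds every k-th index starting at position i.
    """
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical record length: 120 monthly values.
train_idx, test_idx = split_train_test(120)
folds = list(kfold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

Cross-validation is applied only to the training block here, so the test block stays untouched until the final evaluation.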

Fig. 1 Location of Siirt, Adıyaman, and Diyarbakır Meteorology Stations

Summary information about the meteorological stations used in the study, including their parameters, is presented in Table 1. There are approximately 10 meteorological observation stations in the study area. However, data were missing at some stations, and the neighboring stations did not have enough data to fill the gaps. Therefore, only the 3 most reliable stations were used, and the study area is limited to these 3 stations.

Table 1 Summary information of meteorological stations

The statistical parameters of the meteorological data used in the study are presented in Table 2. These are the mean, maximum and minimum values together with the distribution parameters (standard deviation, skewness and kurtosis), which provide essential statistical information about the training and test data.

Table 2 Statistical properties of meteorological data used in the study

Machine learning and decomposition models

The performance of the monthly evaporation prediction models was compared using the GBM, EMD-GBM, REMD-GBM, EEMD-GBM and VMD-GBM algorithms, which are widely used in the literature. In building the models, 80% of the data was used for training and 20% for testing; eight parameters were used as inputs and the evaporation data as the output. Fig. 2 shows the application steps of the study in order. First, the input combinations of the models were selected by subjecting the meteorological data to correlation analysis. Second, the lagged values of the variables highly correlated with evaporation were chosen for the model setup. Then, the selected inputs were split into sub-components by the various decomposition techniques. To create the hybrid models, each sub-component was presented to the GBM model. The GBM model was set up with 10,000 trees, an interaction depth of 1, a Gaussian distribution, and a shrinkage value of 0.01.
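The first two steps of this workflow (correlation screening and lagged inputs) might look like the following sketch. The Pearson coefficient is assumed as the correlation measure, since the text does not name one, and the function names and the toy monthly values are invented for illustration; they are not data or code from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_pairs(predictor, target, lag):
    """Align predictor values at month t - lag with target values at month t."""
    if lag == 0:
        return list(predictor), list(target)
    return list(predictor[:-lag]), list(target[lag:])


# Toy monthly values, invented for illustration only.
tavg = [3.0, 5.0, 9.0, 14.0, 20.0, 26.0, 30.0, 29.0, 24.0, 17.0, 10.0, 5.0]
evaporation = [0.0, 0.0, 40.0, 90.0, 160.0, 250.0, 310.0, 290.0, 200.0, 110.0, 40.0, 0.0]

for lag in (0, 1, 2):
    x, y = lagged_pairs(tavg, evaporation, lag)
    print(f"lag {lag}: r = {pearson(x, y):.3f}")
```

Variables and lags with the largest absolute correlation would then be retained as model inputs before decomposition.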

Fig. 2 Flowchart of the study

Empirical mode decomposition (EMD)

This model adaptively decomposes a non-stationary signal from high frequencies to low frequencies into a set of intrinsic-mode functions (IMFs), and the decomposed signal can be expressed as:

$$x\left( t \right) = \sum\limits_{i = 1}^{N} C_{i} \left( t \right) + r_{N} \left( t \right).$$
(1)

Here, \(r_{N}(t)\) is the residual of the signal and \(C_{i}(t)\) is the ith IMF of \(x(t)\) (Kedadouche et al. 2016).
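The additive structure of Eq. 1 can be illustrated with a deliberately simplified sifting routine. This sketch is not the EMD implementation used in the study: it assumes piecewise-linear envelopes pinned to the end points instead of the usual cubic splines, and a fixed number of sifting passes instead of a stopping criterion. The function names and the synthetic test signal are hypothetical.

```python
import math


def local_extrema(x):
    """Return indices of interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through x at idx, pinned to both end points."""
    knots = [0] + idx + [len(x) - 1]
    env, k = [], 0
    for t in range(len(x)):
        # Advance to the segment [knots[k], knots[k + 1]] that contains t.
        while k + 1 < len(knots) - 1 and t > knots[k + 1]:
            k += 1
        a, b = knots[k], knots[k + 1]
        w = (t - a) / (b - a)
        env.append((1 - w) * x[a] + w * x[b])
    return env


def emd(x, max_imfs=5, n_sift=10):
    """Decompose x into IMF-like components C_i and a residual r_N."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 2:
            break  # residual is (nearly) monotonic: stop extracting components
        h = list(residual)
        for _ in range(n_sift):
            maxima, minima = local_extrema(h)
            if not maxima or not minima:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            # Subtract the mean of the upper and lower envelopes.
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]
    return imfs, residual


# Synthetic signal: annual cycle + faster oscillation + trend (illustrative only).
signal = [math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 3) + 0.01 * t
          for t in range(120)]
imfs, residual = emd(signal)
print(len(imfs), "components extracted")
```

By construction, summing the extracted components and the residual recovers the original signal, which is the property Eq. 1 expresses.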

Ensemble empirical mode decomposition (EEMD)

Although EMD has been widely used in non-stationary and/or nonlinear signal analysis, its decomposition results suffer from mode mixing. The ensemble EMD method was created to fix this problem. The algorithm adds finite-amplitude white noise to the original signal and repeatedly decomposes the noisy signal into intrinsic mode functions (IMFs) using the EMD method. The ensemble means of the IMFs generated in the trials are taken as the IMFs of the EEMD method. In this way, EEMD alleviates the mode mixing problem (Zhang et al. 2010).

The steps of the EEMD algorithm are shown in Fig. 3 (Wu et al. 2009).

Fig. 3 EEMD algorithm steps

Variational mode decomposition (VMD)

This algorithm non-recursively decomposes a real-valued multicomponent signal f into quasi-orthogonal band-limited sub-signals (modes). Each mode is compact around a center frequency. The constrained variational problem can be written as:

$$\left\{ {\begin{array}{*{20}l} {\mathop {\min }\limits_{{\left\{ {u_{k} } \right\},\left\{ {\omega_{k} } \right\}}} \left\{ {\sum\limits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]e^{{ - j\omega_{k} t}} } \right\|_{2}^{2} } } \right\}} \hfill \\ {{\text{s}}.{\text{t}}.\;\sum\limits_{k = 1}^{K} {u_{k} } = f.} \hfill \\ \end{array} } \right.$$
(2)

Here, \(u_{k}\) is the kth decomposed band-limited IMF, \(\omega_{k}\) is the center frequency of the kth IMF, \(\delta(t)\) is the Dirac delta function, \(*\) denotes convolution, \(\left\{{\omega }_{k}\right\}=\left\{{\omega }_{1},{\omega }_{2},\dots ,{\omega }_{K}\right\}\) and \(\left\{{u}_{k}\right\}=\left\{{u}_{1},{u}_{2},\dots ,{u}_{K}\right\}\) (Wang and Markert 2016).

Robust empirical mode decomposition (REMD)

EMD is an efficient algorithm for extracting useful information from multicomponent and modulated signals. The Robust Empirical Mode Decomposition (REMD) algorithm is established by applying the elimination sifting stopping criterion (SSC) to the EMD method. As a result, REMD emerges as a method that can ease the mode mixing problem in signal decomposition and improve performance in signal demodulation (Liu et al. 2022).

Gradient-boosted machine (GBM)

This algorithm starts from an initial guess and then iteratively adds a decision tree at each stage until the optimal reduction in the loss function is reached. The GBM method can be expressed as follows:

$${f}_{k}\left(x\right)=\sum_{m=1}^{k}{\gamma }_{m}{h}_{m}(x).$$
(3)

Here, \(f_{k}(x)\) is the fitted function, \(\gamma_{m}\) is the weight of the mth decision tree, chosen to minimize the loss function, \(x\) is the input variable, \(k\) is the total number of decision trees, and \(h_{m}(x)\) is the output of the mth decision tree (Asante-Okyere et al. 2020).
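A minimal sketch of this additive scheme for a single input and squared-error (Gaussian) loss is given below. It is an illustration under stated assumptions, not the implementation used in the study: `fit_stump`, `gbm_fit` and `gbm_predict` are hypothetical names, each tree is a depth-1 stump (matching an interaction depth of 1), the tree weight is represented by the shrinkage applied to the leaf means, and far fewer trees and a larger shrinkage than the reported 10,000 and 0.01 are used to keep the pure-Python example fast. The toy data are invented.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold and two leaf means.

    Assumes xs contains at least two distinct values.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for j in range(1, len(order)):
        lo, hi = xs[order[j - 1]], xs[order[j]]
        if lo == hi:
            continue  # no valid split between identical values
        left = [residuals[i] for i in order[:j]]
        right = [residuals[i] for i in order[j:]]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - mean_l) ** 2 for r in left)
               + sum((r - mean_r) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (lo + hi), mean_l, mean_r)
    return best[1:]


def gbm_fit(xs, ys, n_trees=200, shrinkage=0.1):
    """Boost stumps on the residuals of the current ensemble (squared loss)."""
    f0 = sum(ys) / len(ys)  # initial guess: the mean of the target
    pred = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        thr, mean_l, mean_r = fit_stump(xs, residuals)
        stumps.append((thr, mean_l, mean_r))
        pred = [p + shrinkage * (mean_l if x <= thr else mean_r)
                for x, p in zip(xs, pred)]
    return f0, stumps, shrinkage


def gbm_predict(model, x):
    """Evaluate the initial guess plus the shrunken sum of all stump outputs."""
    f0, stumps, shrinkage = model
    return f0 + shrinkage * sum(ml if x <= thr else mr for thr, ml, mr in stumps)


# Toy data, invented for illustration: a linear temperature-evaporation relation.
temps = [float(t) for t in range(5, 30)]
evap = [10.0 * t - 40.0 for t in temps]
model = gbm_fit(temps, evap)
print(round(gbm_predict(model, 20.0), 1))
```

In a hybrid setup such as the one described in this paper, the decomposed sub-components of each input would replace the raw input series; the boosting loop itself is unchanged.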

Analysis of model performance

In this research, the accuracy of the proposed ML algorithms was evaluated using the R2, RMSE, NSE, and MAE performance measures.

The root-mean-square error (RMSE) is the square root of the mean squared difference between the predicted and observed values:

$$\mathrm{RMSE}=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({E}_{\mathrm{pi}}-{E}_{\mathrm{oi}}\right)}^{2}}{n}}$$
(4)

Here, Epi: predicted value, Eoi: observed value, n: number of data.

The Nash–Sutcliffe Efficiency (NSE) is calculated by subtracting from one the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean:

$$\mathrm{NSE}=1-\left[\frac{{\sum }_{i=1}^{n}{\left({E}_{\mathrm{pi}}-{E}_{\mathrm{oi}}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({E}_{\mathrm{oi}}-{E}_{\mathrm{om}}\right)}^{2}}\right],$$
(5)

where n: number of data, Epi: predicted value, Eoi: observed value, Eom: average of observed value (Başakın et al. 2021).

The determination coefficient (R2) is the square of the linear correlation coefficient between the predicted and observed values and is expressed by the formula below (Zare and Koch 2013).

$${R}^{2}={\left(\frac{{\sum }_{i=1}^{n}\left({E}_{\mathrm{oi}}-{E}_{\mathrm{om}}\right)\left({E}_{\mathrm{pi}}-{E}_{\mathrm{pm}}\right)}{\sqrt{{\sum }_{i=1}^{n}{\left({E}_{\mathrm{oi}}-{E}_{\mathrm{om}}\right)}^{2}{\sum }_{i=1}^{n}{\left({E}_{\mathrm{pi}}-{E}_{\mathrm{pm}}\right)}^{2}}}\right)}^{2}$$
(6)

Here, n: number of data, Eoi: Observed value, Epi: Predicted value, Eom: average of observed value, Epm: average of predicted value.

The Mean Absolute Error (MAE), shown in Eq. 7, is the mean of the absolute differences between the predicted and observed values (Chai and Draxler 2014).

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{E}_{\mathrm{pi}}-{E}_{\mathrm{oi}}\right|$$
(7)

Here, n: number of data, Eoi: observed value, Epi: predicted value.
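The four measures in Eqs. 4 to 7 translate directly into code. The sketch below is a plain-Python illustration; the function names and the four sample values are made up for the example and are not results from the study.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error, Eq. 4."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def nse(pred, obs):
    """Nash-Sutcliffe efficiency, Eq. 5."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for p, o in zip(pred, obs))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)


def r2(pred, obs):
    """Squared linear correlation between predictions and observations, Eq. 6."""
    mean_obs, mean_pred = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for p, o in zip(pred, obs))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (ss_obs * ss_pred)


def mae(pred, obs):
    """Mean absolute error, Eq. 7."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
for name, fn in (("RMSE", rmse), ("NSE", nse), ("R2", r2), ("MAE", mae)):
    print(name, round(fn(predicted, observed), 4))
```

RMSE and MAE carry the units of the evaporation data, while NSE and R2 are dimensionless; a perfect prediction gives RMSE = MAE = 0 and NSE = R2 = 1.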

Box plots provide information about the data distribution of the established models, including the maximum, minimum, quartile and median values (Nhu et al. 2020; Dehghani et al. 2022).

A radar plot is a graphic made up of concentric circles, each of which represents a certain error magnitude (Pardo et al. 2017).

Results and discussion

In the model setup, various combinations of monthly total precipitation (P), monthly average temperature (Tavg), monthly minimum temperature (Tmin), monthly maximum temperature (Tmax), monthly average wind speed (WS), monthly average actual pressure (AP), monthly average relative humidity (RH), and monthly total solar time (ST) were presented to the artificial intelligence models as inputs. Figures 4, 5 and 6 show the correlation matrices between the model inputs and outputs for the Adıyaman, Diyarbakır and Siirt meteorological stations selected for evaporation prediction. The correlation matrices, which also contain the correlation coefficients and scatter diagrams, show that all meteorological variables used have a significant relationship with evaporation (EVP). The RH, AP and P variables have an inverse relationship with EVP, whereas the Tavg, Tmax, WS and ST variables have a direct relationship with EVP. The variable with the highest correlation with EVP at all meteorology stations is Tavg.

Fig. 4 Correlation matrix of Adıyaman station

Fig. 5 Correlation matrix of Diyarbakır station

Fig. 6 Correlation matrix of Siirt station

Table 3 lists the model input combinations used in evaporation prediction. Based on the correlation matrices, the meteorological parameters with high correlation were presented as inputs to the models. The aim is to reveal which meteorological variables are more effective in evaporation prediction. As an example, the sub-signals of the Tavg values obtained with the various decomposition techniques at Adıyaman station are visualized (Fig. 7).

Table 3 Installed model combinations

Fig. 7 The sub-signals of the Tavg variable produced by the signal decomposition techniques used in Adıyaman

The established GBM model was created with n.trees = 10,000, interaction.depth = 1, distribution = Gaussian and shrinkage = 0.01. In the setup of the hybrid models, the inputs separated into sub-signals with the EMD, REMD, EEMD and VMD techniques were combined with the GBM algorithm. The IMF and residual components of the Tavg data at Adıyaman station obtained with the EMD, REMD, EEMD and VMD techniques are given as examples.

The test results of the established single-GBM model and of the hybrid GBM models combined with signal processing are shown in Table 4. The M1, M2 and M3 input combinations are prominent for evaporation prediction at Adıyaman meteorological station, the M3 and M7 input combinations at Diyarbakır station, and the M1 input combination at Siirt station. When all models are evaluated together, the most accurate estimates (RMSE: 3.2638, R2: 0.9989, MAE: 1.9455, NSE: 0.9986) were obtained with the M1 input combination and the EMD-GBM hybrid model at Adıyaman station. In addition, the prediction performances of the models are generally high and close to each other. It can therefore be deduced that all established models can predict evaporation effectively.

Table 4 Test results of installed single-GBM and hybrid GBM models

Boxplots of the test results of the evaporation prediction models established in the GAP region are shown in Fig. 8. Performance is evaluated by comparing the distribution of each model's predictions with the distribution of the real data. Accordingly, all models established for the Adıyaman and Siirt meteorology observation stations have a high forecasting success because their distributions are similar to that of the real data. On the other hand, the prediction performance at Diyarbakır station is relatively weak. When all boxplots are examined, the top model is the M1 input combination with the EMD-GBM hybrid model at Adıyaman station.

Fig. 8 Evaluation of model performances with boxplots a Adıyaman, b Diyarbakır, and c Siirt

Figure 9 shows scatter diagrams of the models established for evaporation prediction in the GAP region. The data for the Adıyaman and Siirt stations are distributed around the 45-degree line, which shows that the evaporation estimates at these stations are close to the observed values. At Diyarbakır station, however, evaporation values above 400 mm cause higher errors as they deviate from the 45-degree line. The most successful evaporation prediction was made with EMD-GBM at Adıyaman station. When the scatter diagrams are evaluated on a province basis, the M3 input combination in Adıyaman, M7 in Diyarbakır, and M1 in Siirt come to the fore. In line with these results, it was concluded that the meteorological parameters most effective for evaporation estimation were P, Tavg, Tmax and Tmin in Adıyaman; P, Tavg, Tmax, Tmin, WS, AP, RH and ST in Diyarbakır; and P and Tavg in Siirt.

Fig. 9 Scatter diagrams of test results of top models

Figure 10 shows the variation in the errors of the GBM and hybrid GBM models with radar plots. The most suitable model was selected on the basis of low error values. Accordingly, hybrid models built using signal separation techniques generally have lower errors than the single-GBM model. The most accurate evaporation values were obtained with the EMD-GBM models, which had the lowest error, and the lowest error value was seen at the Adıyaman meteorology station. When the radar plots of the errors are evaluated on a province basis, however, it is noteworthy that the highest error values occur in Adıyaman, while the lowest error values are observed in Siirt. Based on this, it can be said that the evaporation estimates in Siirt province are closer to the observed values than those at the other stations. When the models are compared according to their error values, the performances of the GBM and hybrid GBM models are quite similar. Nevertheless, at locations other than Siirt, decomposition techniques generally increase the success of the GBM model.

Fig. 10 Analysis of the variation of errors with radar plot a Adıyaman, b Diyarbakır, and c Siirt

Figure 11 shows the predicted and actual evaporation time series for the EMD-GBM model, which predicts monthly evaporation with the highest accuracy. The parallel progression and overlap of these curves indicate that the model produces realistic forecasts.

Fig. 11 Comparison of evaporation time series of the best model

This study aimed to forecast evaporation from eight parameters using the GBM model and the EMD-GBM, REMD-GBM, EEMD-GBM and VMD-GBM hybrid models, and analyzed the effect of signal preprocessing techniques on the performance of the machine learning algorithm. It was concluded that models combining the GBM algorithm with signal processing techniques predict evaporation effectively and reliably. The results of Rezaie-Balf et al. (2019), Wu et al. (2020), Mohamadi et al. (2020), Gümüş et al. (2016) and Gümüş et al. (2021) are compatible with the present study.

Rezaie-Balf et al. (2019) used EEMD combined with SVM and MT models to estimate monthly Ep and found that the MT and SVM algorithms combined with EEMD gave better results than the standalone MT and SVM algorithms. The present study agrees with their work in that hybrid models, such as the EEMD-SVM algorithm, gave better results than the other algorithms. Wu et al. (2020) investigated the hybrid WOAELM and FPAELM algorithms to demonstrate the applicability of the FPA and WOA algorithms to the ELM algorithm for monthly Ep prediction. They determined that the hybrid FPAELM algorithm gave the best estimates and that both hybrids gave better results than the other models. The present study likewise found that hybrid algorithms gave better results than the other algorithms. Mohamadi et al. (2020) used the FFA and SA algorithms to train MLP, ANFIS and RBF algorithms for monthly evaporation prediction and determined that ANFIS-SA gave better results than the other models. Their findings also coincide with those of this study in that the hybrid models gave better results than the other models.

Gümüş et al. (2021) predicted the monthly Ep of Diyarbakır and Adıyaman using wind speed, temperature, pressure, relative humidity, monthly clear days and sunshine intensity. They used ANN, ANFIS, and GEP algorithms with different input combinations and found that the GEP algorithm gave better results than the other algorithms. The present study is very similar to that of Gümüş et al. (2021) in terms of both the study region and the parameters used. Although the artificial intelligence models used to estimate evaporation are different, the present study is expected to be of great importance for estimating evaporation in the region.

Gümüş et al. (2016) used wind speed, monthly average temperature, humidity, pressure, and sunshine intensity and duration to estimate Adana's monthly average evaporation. ANFIS, ANN and GEP algorithms were used for modeling, and all methods gave good evaporation estimates, with the combination of 6 inputs in the ANFIS method giving the best results. Their study overlaps with the present one in terms of the parameters used and the proximity of the study area. Although the machine learning algorithms used are different, both studies provide essential information about evaporation in the area.

Dehghanipour et al. (2021) used an MLP-NN and a genetic algorithm model to predict pan evaporation under different climatic conditions in Iran from temperature (T), relative humidity (RH), wind speed (WS), and sunshine hours (SH). They employed the best overall relationship from the first method as the main relationship in the second method and determined the climatic correction coefficients for the other climate types with the genetic algorithm optimization model. They found that both methods gave accurate results in the study area. In the present study, 8 parameters were used to estimate evaporation for 3 regions, whereas Dehghanipour et al. (2021) used 4 parameters for 6 different regions. The ML algorithms used to create the models are completely different. The two studies are partially similar only in terms of the parameters used.

Conclusion

This study proposed a new hybrid model combining GBM and signal processing techniques to estimate monthly open surface evaporation values in the GAP region. In addition, the study analyzed the effect of signal preprocessing techniques on the performance of the ML algorithm. The study outputs constitute a resource for decision-makers and planners in managing water resources, irrigation planning, developing irrigation systems and construction of water structures in the GAP region. The main outputs of the research are summarized as follows:

  • According to the correlation analysis, EVP values show a negative relationship with the RH, AP and P variables and a positive relationship with the Tavg, Tmax, WS and ST variables.

  • In addition, evaporation values in the GAP region were found to have the highest correlation with average temperature values.

  • Evaporation values can be estimated with high accuracy using the single-GBM model.

  • When various signal separation techniques such as EMD, REMD, EEMD and VMD are combined with the GBM model, generally more accurate evaporation estimates can be made than the Single-GBM model.

  • The highest prediction success was obtained with the EMD-GBM hybrid model and the M1 input combination (P, Tavg) at Adıyaman meteorology station (RMSE: 3.2638, R2: 0.9989, MAE: 1.9455, NSE: 0.9986). Although the most successful results in the semi-arid study area were obtained with the EMD-GBM model and the P, Tavg inputs, it will be useful to evaluate all input combinations in different climatic regions and compare the results, because factors such as water resources, climate, altitude and pressure centers vary between regions.

In future studies, evaporation prediction accuracy can be investigated by combining the models with bio-inspired algorithms such as gray wolf optimizer, whale optimization, ant colony optimization, artificial bee colony, and chaos game optimization.