Introduction

The importance of water and how it affects human life is evident, as it is impossible to survive without water. Therefore, proper water resources’ management is essential and plays a crucial role in the future of human beings (Orimoloye et al., 2020; Weng et al., 2021). It is necessary to have sufficient knowledge and understanding of all the factors involved in the resources’ development and limitation (Orimoloye et al., 2021; Quan et al., 2021). Evaporation is one of the most critical factors that play an essential role in the hydrological cycle. Evaporation is an essential hydrological variable in the study, control, and management of water resources (Friedrich et al., 2018).

Increased evaporation is a remarkable indicator of global warming (Chen et al., 2018; Limjirakan & Limsakul, 2012). Monitoring changes in evaporation is of great importance for water resources’ monitoring and management (Kim et al., 2013; Wang, 2020). Water losses due to evaporation significantly affect the water budget of reservoirs and lakes, which in turn can remarkably decrease the water level. Consequently, water loss due to evaporation should be determined before designing irrigation systems and adopting water resource strategies (Allawi et al., 2019). Reliable evaporation prediction is a critical aspect of the hydrological considerations in water resources’ management, water balance, and water use improvement. The use of previously recorded information on these variables makes it possible to predict future developments, which is a key factor in the planning, design, and management of water resources (Owolabi et al., 2020). Therefore, to accurately predict the amount of evaporation, a relatively long record of previous data is required along with hydrological and meteorological information. This information should vary with time, because a variable whose values are constant over time cannot explain changes in evaporation.

The evaporation rate is affected by various climatic variables, such as temperature, lake surface area, and precipitation (Benzaghta et al., 2012). In other words, these factors create complex and nonlinear equations describing evaporation. The prediction of evaporation requires complicated nonlinear equations with several input variables (Sebbar et al., 2019). However, it is not practical and realistic to consider many physical variables and factors for predicting the evaporation rates (Rianna et al., 2018), although climatic variables and the inlet and outlet information of the dam are required to predict the evaporation from the reservoir dam correctly.

Direct and indirect methods, including water balance, energy balance, mass transfer, Penman, and evaporation pan, are used to predict evaporation (Wu et al., 2020). Among these methods, the evaporation pan method has been widely used due to its low cost and simple operation (Keshtegar et al., 2016). However, the installation and maintenance of the pan in some places are impossible, or daily reading of the evaporation rate is challenging (Kişi, 2006). In indirect methods, evaporation is estimated using meteorological data and energy volume and energy conservation relationships, which require calibration in regions with different climates. However, it has been proven that both these methods cannot provide reliable estimations of evaporation. Both methods’ unsatisfactory performance has led water scientists to test other approaches for evaporation prediction (Quinn et al., 2018).

According to the literature, machine learning has been successfully applied to water resource problems, for example, rainfall, runoff, sedimentation, river flow, water level, water quality, and reservoir operations (Adnan et al., 2021; He et al., 2018; Nhu et al., 2020). These methods are data-driven and do not require physical information from the study area. They identify patterns embedded in time series information and use these patterns to predict future scenarios. Recent studies have shown that these methods can achieve more accurate results than other models in hydrological applications (Arya Azar et al., 2021; Chu & Chang, 2009) and other fields of research (Jiang et al., 2017; He, 2020).

Machine learning methods have also been successfully used in evaporation studies (Ghorbani et al., 2018; Allawi et al., 2019; Wu et al., 2019). Wu et al. (2020) used machine learning models to predict monthly evaporation from the evaporation pan and reported on the acceptable performance of machine learning models in predicting monthly evaporation. Antonopoulos and Antonopoulos (2017) used artificial neural networks (ANN) with experimental methods for predicting daily evaporation data and reported that the ANN model provided better results in evaporation prediction. Goyal et al. (2014) utilized ANN, least-squares support vector regression (LS-SVR), and fuzzy inference system (FIS) to predict the daily evaporation of the pan in subtropical climates.

In previous studies, evaporation from dam reservoirs has rarely been discussed. Since the amount of evaporation from the reservoir surface is one of the essential parameters of the water balance, its correct prediction is essential in hydrological studies. Therefore, this study aimed to predict the amount of evaporation from the surface of dam reservoirs using two machine learning models and an evolutionary algorithm. First, the performance of LS-SVR and the adaptive neuro-fuzzy inference system (ANFIS) was evaluated in evaporation prediction. Then, to improve the ANFIS prediction performance, Harris hawks optimization (HHO; Heidari et al., 2019) was used to optimize the parameters of ANFIS. Afterward, the developed ANFIS-HHO hybrid model was utilized for predicting the monthly evaporation from the dam reservoir. Different scenarios of input variables were developed and incorporated into each model. The results of the models and scenarios were analyzed, and the best model with the most appropriate scenario was selected and proposed for predicting the evaporation from dam reservoirs.

Study area and the data used

The Doroudzan dam (29° 50′‒30° 15′ N, 51° 53′‒52° 22′ E) is located in the Tasht-e Bakhtegan watershed, 100 km northwest of Shiraz, central Iran, on the Kor River. Supplying about 760 MCM/year of water, this dam provides agricultural water for ca. 42,000 ha of Ramjerd and 34,000 ha of Korbal and Marvdasht. The watershed covers an area of 4,116 km². Figure 1 shows the geographical location of the study area.

Fig. 1 The location of the study area in Iran

Several variables, including temperature (T), inflow to the dam reservoir (Qin), the outflow from the dam reservoir (Qout), overflow from the dam reservoir (OF), lake area behind the dam (A), precipitation (P), and evaporation at previous month (EVO(n-1)), were used to predict the monthly evaporation (EVO). The statistical characteristics of these variables during the study period are shown in Table 1. The evaporation varied in the range of 0 to 74.5 mm per month. The lowest evaporation amounts were in the cold months of the year: January and February. Moreover, the highest amounts of evaporation were recorded in the hot months of the year. The amount of monthly precipitation during the study period varied from 0 to 730.5 mm per month. Moreover, when the amount of rainfall during the day increased, the amount of inflow to the reservoir naturally increased. The maximum temperature was about 11.93 °C, while its average value was about 5.25 °C.

Table 1 Statistical characteristics of the variables used in this study

A, lake area behind the dam; OF, overflow from the dam reservoir; T, temperature; Qout, the outflow from the dam reservoir; Qin, inflow to the dam reservoir; P, precipitation; EVO, evaporation.
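The input set described above includes the evaporation of the previous month, EVO(n-1), next to the variables of the current month. A minimal sketch of how such a lagged input can be aligned with the target is given below. The function name `make_dataset` and the toy numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def make_dataset(features, evo):
    """Align current-month features and previous-month evaporation with the target.

    features: array of shape (n_months, n_vars), one row per month
    evo:      array of shape (n_months,), monthly evaporation
    The first month is dropped because it has no previous-month value.
    """
    features = np.asarray(features, dtype=float)
    evo = np.asarray(evo, dtype=float)
    # Inputs for month n: features of month n plus EVO(n-1)
    X = np.column_stack([features[1:], evo[:-1]])
    # Target for month n: EVO(n)
    y = evo[1:]
    return X, y

# Toy example with two feature columns and five months (illustrative values only)
toy_features = np.arange(10.0).reshape(5, 2)
toy_evo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_toy, y_toy = make_dataset(toy_features, toy_evo)
```

Each row of `X_toy` then holds the variables of one month followed by the evaporation of the month before it.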

Methodology

Since this study aimed to provide a reliable and efficient model for predicting evaporation from dam reservoirs, the hydrological and meteorological variables affecting the evaporation were first determined. These variables included lake area behind the dam, precipitation, inflow to the reservoir, outflow from the reservoir, temperature, and overflow. Then, various input scenarios were defined and implemented with the machine learning techniques. The HHO evolutionary algorithm was used to optimize the ANFIS parameters to improve its prediction performance. According to the literature, the HHO algorithm has unique features that can significantly improve the ANFIS model. In this structure, the objective function is defined as minimizing the error of the values predicted by the model. The performance of the models and scenarios was investigated using statistical and graphical error evaluation criteria (i.e., RMSE, MAPE, NSE, Taylor’s diagram, and scatterplots). Finally, the most appropriate predictive model with its appropriate scenario was proposed (Fig. 2). In the following, the models used in this study are described in detail.

Fig. 2 The flowchart of this study

Least-squares support vector regression (LS-SVR)

Conventional SVR often fails to find the optimum solution of optimization problems, which can lower the performance of the machine. Therefore, LS-SVR is recommended for solving complex problems since it has lower computational complexity than SVR, resulting in more desirable performance (Arya Azar et al., 2021; Goyal et al., 2014).

Given a set of training data \({\{{x}_{k}, {y}_{k}\}}_{k=1}^{N}\), whose inputs and outputs are \({x}_{k}\in {R}^{N}\) and \({y}_{k}\in R\), respectively, Eq. (1) shows the nonlinear regression function in the primal weight space (Suykens and Vandewalle, 1999)

$$y\left(x\right)={W}^{T}\varphi \left(x\right)+b$$
(1)

where W, b, and T are the weight vector, regression bias, and transpose operator, respectively, and φ(x) maps the inputs into a high-dimensional feature space. This nonlinear regression can be solved by optimizing Eq. (2).

$$\min j\left(w, e\right)=\frac{1}{2}{W}^{T}W+\frac{1}{2}\gamma \sum_{k=1}^{N}{e}_{k}^{2}$$
(2)

Subject to:

$${y}_{k}={W}^{T}\varphi \left({x}_{k}\right)+b+{e}_{k}, \quad k=1,\dots ,N$$
(3)

where γ is the regularization parameter for the errors ek. γ controls the trade-off between the smoothness of the approximation function and the fitting error, so the larger the γ value, the more heavily the training errors are penalized. This problem is solved using the Lagrangian form of the objective function:

$$L\left(w, b, e, \alpha \right)=j\left(w,e\right)-\sum_{k=1}^{N}{\alpha }_{k}\{{W}^{T}\varphi \left({x}_{k}\right)+ b+ {e}_{k}- {y}_{k}\}$$
(4)

where αk is the Lagrange multiplier. Based on the Karush–Kuhn–Tucker conditions, the LS-SVR model is written for the approximation function as Eq. (5).

$$y\left(x\right)=\sum_{k=1}^{N}{\alpha }_{k}K\left(x, {x}_{k}\right)+b$$
(5)

where K (x, xk) is called kernel function. In this study, the Gaussian function (Eq. (6)) was used.

$$K\left(x, {x}_{k}\right)={exp}(-\frac{{\left|\left|x-{x}_{k}\right|\right|}^{2}}{{\sigma }^{2}})$$
(6)
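Eliminating W and ek from the Karush–Kuhn–Tucker conditions of Eq. (4) leaves a single linear system in b and αk, which is the standard way LS-SVR is solved. The following is a minimal NumPy sketch of that step with the Gaussian kernel of Eq. (6). The function names, the toy sine data, and the values of `gamma` and `sigma2` are illustrative assumptions and are not the settings or the implementation used in this study.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    # K(x, x_k) = exp(-||x - x_k||^2 / sigma^2), following Eq. (6)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def lssvr_fit(X, y, gamma, sigma2):
    # The KKT conditions reduce to the linear system
    #   [ 0   1^T           ] [ b     ]   [ 0 ]
    #   [ 1   K + I / gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma2):
    # y(x) = sum_k alpha_k K(x, x_k) + b, following Eq. (5)
    return gaussian_kernel(X_new, X_train, sigma2) @ alpha + b

# Toy regression problem (illustrative only): fit one period of a sine curve
X = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(X[:, 0])
b, alpha = lssvr_fit(X, y, gamma=100.0, sigma2=1.0)
y_hat = lssvr_predict(X, b, alpha, X, sigma2=1.0)
train_rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

Because training is one call to a linear solver, only the two hyperparameters γ and σ2 remain to be tuned.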

Adaptive neuro-fuzzy inference system (ANFIS)

Jang (1993) first developed the ANFIS model by combining ANN and fuzzy logic. ANFIS is intended to overcome limitations of ANN and FIS, such as overfitting and sensitivity to the definition of membership functions, and thus to perform better in prediction problems. The most common structure used in ANFIS is the Sugeno-type FIS, which uses a robust learning algorithm to determine the model’s parameters (Asefpour Vakilian & Massah, 2018). The ANFIS architecture generally includes five layers. In layer 1, the generalized Gaussian membership function µ produces a new output Out1i from the inputs x and y (Eq. (7)).

$$\begin{array}{c}{Out}_{1i}=\upmu {A}_{i}\left(x\right), i=1, 2 \\ {Out}_{1i}=\upmu {B}_{i-2}\left(y\right), i=3, 4\end{array}$$
(7)

where

$$\upmu \left(x\right)={e}^{-{\left(\frac{x-{p}_{i}}{{\sigma }_{i}}\right)}^{2}}$$
(8)

and Ai and Bi are the fuzzy sets associated with the membership function µ, while pi and σi are the parameters of the membership function. The output of each node in the second layer is obtained using Eq. (9)

$${Out}_{2i}={\omega }_{i}=\upmu {A}_{i}\left(x\right)\cdot \upmu {B}_{i}\left(y\right), i=1, 2$$
(9)

Then, the output of layer 2 is normalized in layer 3 (Eq. (10))

$${Out}_{3i}={\overline{w} }_{i}=\frac{{\omega }_{i}}{\sum_{j=1}^{2}{\omega }_{j}}$$
(10)

The output is then used in a linear combination equation

$${Out}_{4i}= {\overline{w}}_{i}{f}_{i} ={\overline{w}}_{i} ({p}_{i}x + {q}_{i}y + {r}_{i})$$
(11)

where pi, qi, and ri are the parameters defined for the ith node. The model’s output is obtained in the fifth layer using Eq. (12).

$${Out}_{5} =\sum_{i}{\overline{w}}_{i}{f}_{i}$$
(12)
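The five layers can be traced with a small forward pass for two inputs and two rules. The sketch below follows the layer equations of this section. The function names and all parameter values are hypothetical, and the membership centres are named `c` in the code to keep them apart from the consequent parameters pi, qi, and ri.

```python
import numpy as np

def gauss_mf(v, c, sigma):
    # Gaussian membership function of Eq. (8); c is the centre, sigma the width
    return np.exp(-((v - c) / sigma) ** 2)

def anfis_forward(x, y, premise_x, premise_y, consequent):
    """Forward pass of a two-input Sugeno-type ANFIS.

    premise_x, premise_y: lists of (c, sigma) pairs for the sets A_i and B_i
    consequent: array of shape (n_rules, 3) with rows (p_i, q_i, r_i)
    """
    consequent = np.asarray(consequent, dtype=float)
    # Layer 1: membership degrees of the inputs
    mu_a = np.array([gauss_mf(x, c, s) for c, s in premise_x])
    mu_b = np.array([gauss_mf(y, c, s) for c, s in premise_y])
    # Layer 2: firing strength of each rule (product of memberships)
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 4: weighted linear rule outputs
    f = consequent[:, 0] * x + consequent[:, 1] * y + consequent[:, 2]
    layer4 = w_bar * f
    # Layer 5: overall output as the sum of the weighted rule outputs
    return float(layer4.sum()), w_bar

# Hypothetical parameters: both rules fire equally for x = 1, y = 2
out, w_bar = anfis_forward(
    x=1.0,
    y=2.0,
    premise_x=[(0.0, 1.0), (2.0, 1.0)],
    premise_y=[(1.0, 1.0), (3.0, 1.0)],
    consequent=[[1.0, 1.0, 0.0], [0.0, 0.0, 5.0]],
)
```

With these values the two rule outputs are 3 and 5 and the normalized firing strengths are both 0.5, so the model output is 4. In a hybrid such as ANFIS-HHO, the premise and consequent parameters are the quantities an optimizer would adjust.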

Harris hawks optimization (HHO)

Introduced by Heidari et al. (2019), the HHO algorithm is inspired by the way Harris hawks hunt rabbits in nature. This algorithm involves two stages: the soft and the hard besiege of the rabbit. In the soft besiege, the rabbit still has enough energy and tries to escape with random misleading jumps. Harris hawks gently surround it to make the rabbit more tired. However, in the hard besiege, the prey is very tired and has little energy for escape. Finally, the hawks tightly encircle the rabbit to perform a surprise pounce.

In this algorithm, the Harris hawks move randomly to find prey. Their position is mathematically expressed as:

$$X(t+1)=\left\{\begin{array}{ll}{X}_{{rand}}(t)-{r}_{1}\left|{X}_{{rand}}(t)-2{r}_{2}X(t)\right|& q\ge 0.5\\ ({X}_{{rabbit}}(t)-{X}_{{m}}(t))-{r}_{3}({LB}+{r}_{4}({UB}-\rm{LB}))& q<0.5\end{array}\right.$$
(13)

where X(t) and X(t + 1) denote the position of hawks at iterations t and t + 1, respectively; Xrabbit(t) is the rabbit’s position; r1, r2, r3, r4, and q are random numbers, being updated in each iteration; UB and LB are the upper and lower limits of the variables; Xrand(t) is the position of a hawk randomly selected from the population; and Xm is the average position of the population, which is obtained using Eq. (14).

$${X}_{m}(t)=\frac{1}{N}\sum_{i=1}^{N}{X}_{i}(t)$$
(14)

where N is the total number of hawks and Xi(t) is the position of each hawk in iteration t. The prey’s energy decreases during the escape (Eq. (15))

$$E=2{E}_{0}(1-\frac{t}{T})$$
(15)

where T is the maximum iteration number, E is the prey’s energy, and E0 is the initial energy. The E parameter is utilized for enabling the algorithm to use soft and hard besiege processes to trap the prey. Soft and hard besieging occur when |E|≥ 0.5 and |E|< 0.5, respectively.

When |E|≥ 0.5, the prey still has enough energy and performs some random misleading jumps, but it finally cannot escape. The hawks encircle it softly to make the rabbit more exhausted and then perform the surprise pounce (Eqs. (16) and (17)).

$$X(t+1)=\Delta X(t)-E\left|J{X}_{\rm{rabbit}}(t)-X(t)\right|$$
(16)
$$\Delta X(t)={X}_{\mathrm{rabbit}}(t)-X(t)$$
(17)

where ΔX(t) is the difference between the prey’s position and the current position in iteration t, and J is a coefficient representing the strength of the prey’s random jumps. When |E|< 0.5, the rabbit has low escaping energy since it is exhausted, and at this time, the surprise pounce is performed by the hawks. Equation (18) shows the updates of current positions in hard besiege.

$$X(t+1)={X}_{\mathrm{rabbit}}(t)-E\left|\Delta X(t)\right|$$
(18)
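The update rules of Eqs. (13) to (18) can be combined into a compact search loop. The sketch below is a simplified illustration rather than the implementation used in this study. Three details are assumptions taken from the usual description of HHO and not from this section: E0 is drawn uniformly from (-1, 1) for each hawk, the jump strength is J = 2(1 - r5) with r5 random in (0, 1), and the exploration step of Eq. (13) is applied when |E| ≥ 1. The rapid-dive variants of the original algorithm are omitted, and the sphere test function is only a smoke test.

```python
import numpy as np

def hho_minimize(f, lb, ub, dim, n_hawks=30, max_iter=200, seed=0):
    """Simplified Harris hawks optimization for minimizing f on [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_hawks, dim))
    fitness = np.array([f(x) for x in X])
    rabbit = X[fitness.argmin()].copy()   # best position found so far
    rabbit_f = float(fitness.min())
    for t in range(max_iter):
        X_m = X.mean(axis=0)              # Eq. (14): mean position of the hawks
        for i in range(n_hawks):
            E0 = rng.uniform(-1.0, 1.0)   # assumed initial energy range
            E = 2.0 * E0 * (1.0 - t / max_iter)   # Eq. (15)
            if abs(E) >= 1.0:
                # Exploration, Eq. (13) (the |E| >= 1 condition is an assumption)
                q, r1, r2, r3, r4 = rng.random(5)
                if q >= 0.5:
                    X_rand = X[rng.integers(n_hawks)]
                    new = X_rand - r1 * np.abs(X_rand - 2.0 * r2 * X[i])
                else:
                    new = (rabbit - X_m) - r3 * (lb + r4 * (ub - lb))
            elif abs(E) >= 0.5:
                # Soft besiege, Eqs. (16) and (17)
                J = 2.0 * (1.0 - rng.random())    # assumed jump strength
                delta = rabbit - X[i]
                new = delta - E * np.abs(J * rabbit - X[i])
            else:
                # Hard besiege, Eq. (18)
                delta = rabbit - X[i]
                new = rabbit - E * np.abs(delta)
            X[i] = np.clip(new, lb, ub)   # keep the hawk inside the bounds
            value = float(f(X[i]))
            if value < rabbit_f:          # update the rabbit (best solution)
                rabbit_f, rabbit = value, X[i].copy()
    return rabbit, rabbit_f

def sphere(x):
    # Simple convex test function with its minimum at the origin
    return float(np.sum(x ** 2))

best, best_f = hho_minimize(sphere, lb=-5.0, ub=5.0, dim=2)
```

When the loop is used to tune a model, `f` would return the prediction error of the model for a candidate parameter vector, in line with the objective function described in the Methodology section.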

Input scenarios

LS-SVR, ANFIS, and ANFIS-HHO were used in this study for predicting monthly evaporation from the Doroudzan dam reservoir. For this purpose, various scenarios with different combinations of effective variables were developed. The correlation coefficient of each input variable with the output (EVO) is listed in Table 2. The variable with the highest correlation coefficient with the output was introduced as the first scenario. The second scenario was developed based on the two variables that achieved the highest correlation coefficients. The rest of the scenarios were defined by adding the other variables such that S7 included all input variables. The scenarios defined in this study to predict the evaporation from the dam are listed in Table 3. Each of these scenarios was implemented by each model, and its results were evaluated by the performance evaluation criteria.

Table 2 The correlation coefficients between output and inputs
Table 3 Scenarios defined in this study to predict the evaporation

In examining the correlation coefficients between the inputs and output, it is observed that evaporation at the previous month had a lower correlation than other parameters such as temperature, precipitation, and inflows and outflows. This shows that in predicting evaporation from dam reservoirs, parameters such as inflow and outflow of the dam can be more effective than evaporation values in previous months. The evaporation had the highest correlation with precipitation, inflows and outflows of the dam, and temperature, respectively, and the lowest correlation with the reservoir surface area.
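The scenario construction described in this section can be sketched as a ranking of the inputs by correlation with the output, followed by nested subsets of increasing size. The sketch below assumes that each scenario adds the next-ranked variable to the previous one, which is one reading of the description and is not confirmed by the text. The data are synthetic and the variable names are placeholders.

```python
import numpy as np

def build_scenarios(X, y, names):
    """Rank inputs by |Pearson correlation| with y and return nested scenarios."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))            # strongest correlation first
    ranked = [names[j] for j in order]
    # Scenario k uses the k most strongly correlated variables
    scenarios = [ranked[:k] for k in range(1, len(ranked) + 1)]
    return scenarios, dict(zip(names, r.tolist()))

# Synthetic data (illustrative only): the output depends strongly on var_a,
# weakly on var_b, and not at all on var_c
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
target = 3.0 * data[:, 0] + 1.0 * data[:, 1] + 0.1 * rng.normal(size=200)
scenarios, correlations = build_scenarios(data, target, ["var_a", "var_b", "var_c"])
```

The last scenario always contains every input, which mirrors the role of S7 in this study.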

Performance evaluation criteria

The dataset was randomly divided into two groups: 70% of the data were used for model training and the remaining 30% for testing. Root mean square error (RMSE), mean absolute percentage error (MAPE), Nash–Sutcliffe efficiency (NSE), and coefficient of determination (R2) were considered for evaluating the scenarios and machine learning methods (Hua et al., 2021; Weng et al., 2021).

$${RMSE}=\sqrt{\frac{\sum_{i=1}^{n}{\left({x}_{o,i}-{x}_{p,i}\right)}^{2}}{n}}$$
(19)
$$\mathrm{MAPE}=\frac{100 \%}{n} \sum_{i=1}^{n}\left|\frac{{x}_{o,i}-{x}_{p,i}}{{x}_{o,i}}\right|$$
(20)
$${NSE}=1-\frac{\sum_{i=1}^{n}{({O}_{i}-{S}_{i})}^{2}}{\sum_{i=1}^{n}{({O}_{i}-\overline{O })}^{2}}$$
(21)
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({x}_{p,i}-{x}_{o,i}\right)}^{2}}{\sum_{i=1}^{n}{\left({x}_{o,i}-{\overline{x} }_{o}\right)}^{2}}$$
(22)

where xo,i and Oi denote the ith observed (measured) value, xp,i and Si denote the ith predicted (simulated) value, an overbar denotes the mean of the observed values, and n is the number of samples. Lower RMSE and MAPE and higher NSE and R2 values indicate better model performance.
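For illustration, the first three criteria can be computed as in the following minimal sketch. The function names and the four sample values are hypothetical. Note that MAPE divides by the observed value, so months with zero observed evaporation need separate handling; how such months were treated is not stated in the text.

```python
import numpy as np

def rmse(obs, pred):
    # Root mean square error, Eq. (19)
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mape(obs, pred):
    # Mean absolute percentage error in percent, Eq. (20); undefined if any obs is 0
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))

def nse(obs, pred):
    # Nash-Sutcliffe efficiency, Eq. (21)
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hypothetical observed and predicted values
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
scores = {
    "RMSE": rmse(observed, predicted),   # sqrt(18 / 4), about 2.121
    "MAPE": mape(observed, predicted),   # 10.625
    "NSE": nse(observed, predicted),     # 1 - 18 / 500 = 0.964
}
```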

Results and discussion

To obtain the most suitable values for the parameters of each machine learning algorithm, it was necessary to run the algorithm several times with different parameter values. The results of this procedure are shown in Table 4 and indicate that the Gaussian function is the most appropriate fuzzy membership function for the ANFIS model. Since ANFIS uses the Sugeno-type method in its structure, the linear function was used for the model’s output equation. Some ANFIS parameters cannot be obtained by trial and error and require robust optimization algorithms to obtain their optimized values. Therefore, HHO was used to optimize the ANFIS model. The population size of HHO and its maximum iteration number were set to 30 and 2000, respectively. The Gaussian kernel in LS-SVR had two parameters, namely, σ2 and γ, the optimum values of which were found to be 5.365 and 136.03, respectively.

Table 4 Parameters and specifications of the models for the prediction of evaporation

Error evaluation criteria for the training and test data are presented in Table 5. The superior model is the one with the lowest RMSE and MAPE values and the highest NSE for both the training and test data. Scenario S6, which included all variables except the lake area behind the dam, was identified as the most suitable scenario for all three models. The best performance was obtained using the ANFIS-HHO model with RMSE, MAPE, and NSE of 1.55, 2.35, and 0.95, respectively. ANFIS had the lowest accuracy among the models, with RMSE, MAPE, and NSE of 3.85, 5.30, and 0.85, respectively. Therefore, the HHO algorithm improved the accuracy of ANFIS, reducing RMSE from 3.85 to 1.55 and MAPE from 5.30 to 2.35 and raising NSE from 0.85 to 0.95. The performance of LS-SVR was slightly lower than that of ANFIS-HHO, with RMSE, MAPE, and NSE values equal to 2.48, 2.77, and 0.93, respectively. This shows that although LS-SVR had relatively small prediction errors, ANFIS-HHO was the most suitable of the developed models for predicting evaporation from dam reservoirs.

Table 5 Performance evaluation of the developed models. The best performances are shown in bold

The first three scenarios (S1 to S3) had the lowest prediction accuracy in all three models, which indicates that the monthly evaporation from the dam reservoir is so complex and nonlinear that it is not possible to predict its amount by having only the inflow and outflow of the dam and evaporation at the previous month. Accurate prediction therefore requires more information, for example, the input variables of scenario S6: in addition to the information on the dam’s inflow and outflow, we need information about temperature, overflow, and evaporation in the previous month.

The scatterplots of the observed (measured) and predicted values (Fig. 3) showed that the ANFIS model had poor performance compared to the other methods (R2 = 0.898). In contrast, ANFIS-HHO achieved the highest performance among the models (R2 = 0.959). Moreover, the data are close to the bisector line in the ANFIS-HHO model, which reveals its small prediction error.

Fig. 3 Scatter plots of the measured and predicted data

For further evaluation, Taylor’s diagram was used to investigate the correlation between the predicted and observed evaporation, as well as their standard deviations (Fig. 4). The correlation coefficient for all three models ranged from 0.95 to 0.99, indicating the efficiency of all three models for evaporation prediction. ANFIS-HHO had the highest correlation coefficient of the three models. Furthermore, the root mean square deviation (RMSD) for ANFIS-HHO and LS-SVR was ca. 2, while its value for the ANFIS model was slightly higher than 3. Although all three models could predict evaporation, Taylor’s diagram showed that the ANFIS-HHO results were the closest to the observed values. Therefore, HHO was able to optimize the ANFIS parameters and thus increase the model performance.

Fig. 4 Taylor’s diagram for evaporation prediction

Figure 5 shows that ANFIS-HHO correctly detected the evaporation changes in almost all test steps. However, in several steps, such as 13, 28, and 65, the predicted values had significant errors when the ANFIS model was used for the prediction. In other words, ANFIS was unable to predict the minimum and maximum values of evaporation, probably because the ANFIS model was trapped in local optima. The results also show that the changes in evaporation were detected correctly by the LS-SVR model, although in some steps the predicted values involved remarkable errors. The ANFIS-HHO model had the highest ability to predict the evaporation data; the minimum and maximum values were predicted with the lowest errors.

Fig. 5 Time series of observed and predicted values for evaporation using a LS-SVR, b ANFIS, and c ANFIS-HHO

According to the results, to predict the evaporation properly, the use of meteorological data and some parameters related to the dam, such as its inflows and outflows and the area behind the dam, is required. Moreover, the correlation of the input parameters with the output showed that evaporation had a weak correlation with the evaporation at the previous month, indicating that only having various delays of the target parameter could not result in a reliable prediction for future months. Therefore, to have an accurate prediction, information about temperature, precipitation, and dam inflows and outflows is required. The results also showed that the presence of the water area behind the dam as an input parameter did not affect the prediction performance, so it could be omitted in the modeling. Of course, this result differs slightly from our understanding, since the lake surface area behind the dam is usually considered one of the effective parameters in evaporation losses. This is one of the disadvantages of machine learning models: they do not consider the nature and type of the data and only prefer inputs whose values change in line with the system output. Scenario S6, including all input parameters but the surface area behind the dam, was the selected scenario for all three models, indicating the necessity of the participation of the parameters investigated in this study. Moreover, the results of Taylor’s diagram, as well as the scatterplots, confirmed the results of the RMSE, MAPE, and NSE error evaluation criteria regarding the promising performance of the ANFIS-HHO model.

In this study, for the first time, meteorological parameters along with dam inlet and outlet parameters were used to improve the prediction performance, which included a higher number of input variables than Allawi et al. (2021). In fact, this study tries to use the parameters of the dam balance to measure the amount of evaporation change in the future according to the changes of each input variable, which can help better manage the dam allocation. Therefore, in addition to the accurate prediction of evaporation, this study aimed to contribute meteorological variables and flow continuity parameters in the dam with appropriate accuracy to predict the evaporation amount in the future. This prediction can help us to make the right decision in the future in reducing evaporation or proper allocation of the dam. Developing various input scenarios allows researchers and authorities to consider different input information to predict evaporation depending on the status of each region. For example, in some areas, temperature information might not be available. In this case, we can use a scenario that does not include this parameter and has a relatively good performance, such as the third scenario introduced in this study. On the other hand, the use of meteorological parameters along with dam inlet and outlet variables provides a more realistic situation during the modeling, leading to a better prediction of evaporation.

In general, the performance of all three models was appropriate in predicting the amount of evaporation. Among the two single models, LS-SVR performed better than ANFIS, which was consistent with the results of Razavi et al. (2019) in estimating thermal conductivity enhancement and Bemani et al. (2019) in estimating the acid solvent solubility in supercritical CO2 conditions. The use of the HHO algorithm improved the performance of ANFIS, which is consistent with the study of Arya Azar et al. (2021) in predicting the longitudinal dispersion coefficient of the river, Milan et al. (2021) in predicting optimal groundwater withdrawal, and Shehabeldeen et al. (2019) in predicting the friction process of welding. Since various algorithms are proposed daily by researchers to solve optimization problems (Bo et al., 2021), the application of other algorithms is recommended for improving the performance of weak single models (e.g., ANFIS) in hydrological problems.

One of the strengths of using machine learning models is that they can predict evaporation without special knowledge of geology or meteorology. However, the results showed that an efficient prediction requires the participation of effective input variables such as temperature, precipitation, and inflow to the dam. Therefore, more information than one input parameter, such as evaporation at the previous month, is required to predict evaporation at the current month, which is in line with the findings of Allawi et al. (2021). Moreover, one of the advantages of using ANFIS and its hybrid models is considering the uncertainties in the input information, which did not exist in the LS-SVR model. On the other hand, although the HHO algorithm was able to improve the performance of ANFIS and the hybrid model developed had better efficiency than the LS-SVR model, the development of an LS-SVR model is much simpler than that of the ANFIS-HHO hybrid model. Therefore, in developing hybrid models based on metaheuristic optimization methods, in addition to improving the prediction performance, one should also pay attention to their structure and complexities. Hence, expertise in machine learning is required to develop such hybrid models, whereas LS-SVR is much easier to implement.

The results showed that the performance of the models used strongly depends on their input variables. The variables with similar trends to the target parameter have a higher correlation coefficient with the output than other input variables and, therefore, are more important in determining the amount of output. This is a relatively fundamental weakness in the use of machine learning models because an input parameter might actually have a remarkable effect on evaporation but is not considered an important factor in machine learning models due to its trend of changes (e.g., area behind the dam in this study). Finally, it can be said that the trend of data is more valuable than the nature of the data for machine learning models.

Conclusions

The present study evaluated the performance of ANFIS and LS-SVR for the prediction of monthly evaporation from dam reservoirs. Seven scenarios that included different combinations of input variables were considered to evaluate the models’ performance. LS-SVR performed better than the ANFIS model. To improve the ANFIS performance, the HHO algorithm was used to optimize the ANFIS parameters. Among the input variables, precipitation, inflow to the dam, and temperature had the most significant effects on evaporation. The area of the lake behind the dam had the least impact compared to the other parameters. Two approaches were used to evaluate and select the appropriate model. In the first approach, error evaluation criteria (RMSE, MAPE, and NSE) were used to select the appropriate model and scenario, which showed that ANFIS-HHO is more accurate than the other two models. In the second approach, Taylor’s diagram and scatterplots were used to compare the models graphically. Taylor’s diagram reveals the correlation coefficient, standard deviation, and RMSD of the predicted and observed data for the predictive models. Taylor’s diagram showed that ANFIS-HHO resulted in the prediction values closest to the observed data. The approach introduced in this study can be used for prediction and management at dams that have similar conditions to the dam investigated in this study.

In general, the results showed that the use of machine learning models to predict evaporation from reservoir dams provides satisfactory results that can be used in hydrological studies and management strategies. Furthermore, to have an accurate prediction, the information about the inflow and outflow of the dam, precipitation, and temperature was more effective than parameters such as evaporation at the previous month and the water area behind the dam. Due to the wide range of available machine learning models, it is recommended to evaluate these models to achieve the highest performance in predicting the evaporation of reservoirs in daily and monthly time steps. Considering climate change and investigating the daily prediction of evaporation from reservoir dams and including the amount of daily and monthly evaporation changes in the dam outflow planning program can be performed in future research.