Introduction

Agriculture is the biggest consumer of the world's freshwater reserves. Inadequate water availability and allocation affect crop growth and harvest and can lead to reduced food production or even food scarcity (De Fraiture et al., 2010). The impacts of anthropogenic development, climate change, environmental degradation, and industrial water demand have placed enormous pressure on already depleting freshwater resources. Hence, water conservation has been a major concern for irrigated agriculture (Hamdy et al., 2003). Planning, management, and regulation of agricultural water resources require an accurate estimation of water use over agricultural areas (Wanniarachchi & Sarukkalige, 2022). Proper water resources management can help mitigate these issues and promote equitable access to water resources for all activities.

In agricultural water management and the hydrological cycle, one of the most influential parameters is evapotranspiration (ET). ET is the process by which water is transferred from the land surface and vegetation to the atmosphere through evaporation from soil and transpiration from plants (Chang et al., 2018). It plays a crucial role in regulating the water balance of ecosystems (Wang et al., 2023). In order to optimize the water management system, accurate estimation of ET values is crucial (Reyes-Gonzalez, 2017). Accurate estimation of ET is essential for crop growth modeling, climate change impact studies, ecosystem management, estimating crop water resource requirements, and subsequent irrigation scheduling (Hussain et al., 2020).

Direct point-based measurement of ET can be performed using a lysimeter, and estimates can be obtained using the Bowen ratio–energy balance approach and eddy covariance techniques (Todd et al., 2000). These methods have several inherent drawbacks, such as the high cost of installing and maintaining a lysimeter, the need for a large area of land with homogeneous conditions, and the need for specialized personnel to collect and process the data (Todd et al., 2000). ET can also be estimated indirectly from a set of climatological variables that define reference evapotranspiration (Zouzou & Citakoglu, 2023). Reference ET is commonly used in water management since it represents the amount of water that would be lost to the atmosphere by a hypothetical, well-watered grass reference crop that completely covers the soil surface and experiences minimal water stress under optimal conditions (Hargreaves, 1994; Cobaner et al., 2017). The FAO-56 Penman–Monteith (FAO-56 PM) method is one of the stable and well-established techniques for determining ET (Allen et al., 1998). It combines energy balance and aerodynamic principles to estimate ET from meteorological data such as temperature, humidity, wind speed, and solar radiation. This method has been validated against lysimeter measurements in different climates and can also be used to validate other ET computation methods (Landeras et al., 2008), yet proper calibration is necessary.

Alternatively, to save on costs and avoid tedious calibration and validation processes, machine learning algorithms can be used to estimate and forecast ET reliably (Roy, 2021; Bayram & Çıtakoğlu, 2023). Machine learning algorithms can identify complex relationships between meteorological parameters and ET values that are difficult to detect using traditional methods (Yamaç & Todorovic, 2020; Citakoglu et al., 2014). Hu et al. (2022) investigated ET in Pakistan during the period 2015-2021 using the Internet of Things (IoT) together with machine learning (ML) models including k-nearest neighbors (KNN), Gaussian Naive Bayes (GNB), artificial neural network (ANN), and support vector machine (SVM). The KNN model outperformed the other ML models, with an accuracy of 92%. Mehdizade et al. (2021) used the adaptive neuro-fuzzy inference system (ANFIS) algorithm in combination with the shuffled frog-leaping algorithm (SFLA) and the invasive weed optimization method (IWO) to estimate ET0. These models outperformed classic ANFIS as well as empirical models, with ANFIS-SFLA demonstrating the best performance (RMSE = 0.15 mm/day, R2 = 0.99). Hadadi et al. (2022) evaluated the performance of artificial intelligence models, including ANFIS and its hybrids with the SFLA and grey wolf optimization (GWO) algorithms, in determining monthly actual evapotranspiration (AET) in Iran. Combining optimization algorithms with ANFIS further enhanced accuracy, giving promising results for estimating AET in arid climates. Talebi et al. (2023) developed an algorithm based on multilayer perceptrons (MLP) and MLP optimized with stochastic gradient descent (MLP-SGD) for estimating daily ET0 in two different climates. The hybrid MLP-SGD model performed better than the single model with all input parameters.
In another study, Kushwaha et al. (2022) evaluated additive regression (AdR), random subspace (RSS), and the M5P tree models for modeling daily ET. Model performance improved as more input variables were added. Among the implemented models, the AdR6 model, which had all meteorological variables as input, performed better than the other models during the test period, with a coefficient of determination of 0.998. In a recent study, Sabanci et al. (2023) estimated ET0 using different machine learning models, as an improvement over the FAO-56 PM approach. The study focused on 12 stations located in the Central Anatolian Region (CAR), which had diverse climate characteristics. Model performance was evaluated with common metrics such as R2, MAE, RMSE, and PI. Among the models tested, LSTM, ANN, and multivariate adaptive regression splines (MARS) showed the best performance at eight, three, and one station(s), respectively. The selected models achieved R2 values ranging from 0.987 to 0.999, MAE from 1.948 to 4.567, RMSE from 2.671 to 6.659, and PI from 1.544 to 4.018. Overall, this research provided insights into accurately estimating ET0 for various climate conditions in the Central Anatolian Region. In addition, supervised machine learning algorithms from the artificial neural network (ANN) family, such as feedforward neural networks (FFNN) (Mahesh, 2020), are also very effective for ET forecasting and have been used to estimate ET in many studies. For predicting daily ET in northwest Algeria, Achite et al. (2022) compared the FFNN and Radial Basis Function Neural Network (RBFNN) and found that the FFNN model performed better than the Penman-Monteith model in predicting ET, with a coefficient of determination of 0.992.
In modeling and predicting water quality using the adaptive neuro-fuzzy inference system (ANFIS) and FFNN, Hmoud Al-Adhaileh and Waselallah Alsaade (2021) found that the FFNN model registered higher accuracy (100%) for water quality classification (WQC), while the ANFIS model had acceptable accuracy for predicting water quality index (WQI) values. The reviewed studies indicate that the FFNN is a reliable model. The FFNN model has four major advantages (Svozil et al., 1997): 1. Adaptability: it can achieve a high correspondence between observed and predicted values without user intervention. 2. Non-linearity: it can capture non-linear relationships between parameters. 3. Mapping between input and output: it learns an input-output mapping that reduces the difference between predicted and target outputs to the minimum possible value. 4. High strength (robustness): it reduces the effect of noise in the data.

With the emergence of advanced artificial intelligence techniques, deep learning (DL) approaches have been widely used recently. One such DL model, the Long Short-Term Memory (LSTM) network, is commonly used for time-series data analysis (Sun et al., 2019; Citakoglu, 2021; Uncuoglu et al., 2022). DL algorithms can process large amounts of data quickly and with high accuracy, reducing the time required for manual calculations. LSTM is particularly useful for modeling sequences of data where there are long-term dependencies between the inputs and outputs. During the training process, the LSTM model ‘learns’ to identify patterns in historical hydro-meteorological data that are associated with daily ET changes (Ferreira & da Cunha, 2020). The LSTM model has attracted the attention of scientists in recent years due to its accuracy (Demir & Citakoglu, 2023). In a recent study, Alibabaei et al. (2021) modeled ET and reference soil water content from climate data using deep learning methods. LSTM achieved the best performance, with a coefficient of determination of 0.9. Similarly, Chen et al. (2020) estimated daily ET based on limited meteorological data using deep learning and machine learning methods in the northeastern plain of China. When temperature-based features were available, the temporal convolution neural network (TCN) and LSTM models performed significantly better than empirical temperature-based models beyond the study regions.

Scholars have argued and shown that hybrid models tend to perform better than standalone models. Jia et al. (2023) introduced two hybrid models to predict ET at four climate stations in Shaanxi province, China. These models combined particle swarm optimization (PSO) with the LSTM network. To train the models, 40 years of historical data were utilized, with PSO optimizing the hyperparameters within the LSTM network. The resulting optimized models were then employed to predict daily ET0 in 2019, using different datasets. Predicting daily reference evapotranspiration with hybrid methods can thus be more cost-effective and precise than with classical models. Therefore, the primary objective of this study is to introduce an innovative hybrid model that integrates Feedforward Neural Networks with Long Short-Term Memory Networks for the accurate prediction of daily reference evapotranspiration (ET0). Even though coastal areas have a high potential for agriculture owing to their humid climates, high rainfall, and fertile soil, estimating ET in these humid climates is a challenge. Hence, the proposed model is tested at two distinct sites in the humid climates of Iran. To this end, the approach is based on meteorological parameters from one-, two-, and three-day antecedent periods. The next section of the paper presents the materials and methods, followed by the results and conclusions.

Materials and methods

Study area

Effective management of water resources in humid and extremely humid climates is critical to ensure sustainable and equitable utilization. Accordingly, two stations with different humid climates, Bandar Anzali and Babolsar in Iran, were selected as the study sites. Bandar Anzali experiences an average annual rainfall of 1733.9 mm and an average annual temperature of 16.4°C. In Babolsar, the average annual rainfall is lower at around 902.9 mm, and the average annual temperature is slightly higher at around 17.3°C (Fallah-Ghalhari & Shakeri, 2023). As the aim of the study was to develop modelling options for humid climatic conditions, the de Martonne aridity index was used to identify the stations. The de Martonne aridity index categorizes stations on the basis of mean annual precipitation and mean annual temperature, and it places the Bandar Anzali and Babolsar stations in extremely humid and humid climates, respectively (Mehdizadeh, 2020). In the present study, 70% of the data were used for the training phase and 30% for the test phase. Table 1 summarizes the geographical characteristics of the study area, while Fig. 1 illustrates the station locations.
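The climate classification of the two stations can be checked with a short calculation. The sketch below assumes the usual form of the de Martonne index, I = P / (T + 10), with P the mean annual precipitation in mm and T the mean annual temperature in °C. The class boundaries are commonly cited values and are an assumption here, since the paper does not list them; the function names are illustrative.

```python
def de_martonne_index(annual_precip_mm, mean_temp_c):
    """De Martonne aridity index, I = P / (T + 10)."""
    return annual_precip_mm / (mean_temp_c + 10.0)


# Upper bounds of commonly cited classes (assumed, not taken from the paper).
CLASSES = [
    (10.0, "arid"),
    (20.0, "semi-arid"),
    (24.0, "Mediterranean"),
    (28.0, "semi-humid"),
    (35.0, "humid"),
    (55.0, "very humid"),
    (float("inf"), "extremely humid"),
]


def classify(index):
    """Return the label of the first class whose upper bound exceeds the index."""
    for upper, label in CLASSES:
        if index < upper:
            return label
    return CLASSES[-1][1]


# Station values as reported in the text: (rainfall in mm, temperature in °C).
stations = {"Bandar Anzali": (1733.9, 16.4), "Babolsar": (902.9, 17.3)}
results = {}
for name, (precip, temp) in stations.items():
    index = de_martonne_index(precip, temp)
    results[name] = (index, classify(index))
    print(f"{name}: I = {index:.1f} ({results[name][1]})")
```

Under these assumed thresholds, Bandar Anzali falls in the extremely humid class and Babolsar in the humid class, which matches the classification reported above.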

Table 1 Geographical characteristics of the studied stations.
Fig. 1 The location of the studied stations

FAO Penman–Monteith (FAO-PM)

A standard method for estimating ET is the FAO-PM equation. FAO-PM was calculated using Eq. 1.

$$ET=\frac{0.408\Delta \left({R}_n-G\right)+\gamma \frac{900}{T+273}{U}_2\left({e}_s-{e}_a\right)}{\Delta +\gamma \left(1+0.34{U}_2\right)}$$
(1)

where ET is the reference evapotranspiration (mm/day), Δ is the slope of the saturation vapor pressure curve (kPa/°C) at the daily air temperature, Rn and G are the net radiation and soil heat flux density (MJ/m2/day), γ is the psychrometric constant (kPa/°C), T is the daily air temperature (°C), U2 is the wind speed at 2 m height (m/s), \(e_s\) is the saturation vapor pressure (kPa), and \(e_a\) is the actual vapor pressure (kPa) (Allen et al., 1998).
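Equation 1 can be evaluated directly once its inputs are known. The sketch below is a minimal illustration rather than the authors' implementation: it uses the daily air temperature T from the symbol list in the radiation-independent term, and the two helper functions use the standard FAO-56 expressions for saturation vapor pressure and for the slope Δ. The function names and the input values are illustrative assumptions, not data from the study stations.

```python
import math


def saturation_vapor_pressure(temp_c):
    """Saturation vapor pressure (kPa) at air temperature temp_c (°C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def slope_vapor_pressure_curve(temp_c):
    """Slope of the saturation vapor pressure curve, Delta (kPa/°C)."""
    return 4098.0 * saturation_vapor_pressure(temp_c) / (temp_c + 237.3) ** 2


def fao_pm_et(delta, rn, g, gamma, temp_c, u2, es, ea):
    """Reference evapotranspiration (mm/day) following Eq. 1."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (temp_c + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative daily inputs (assumed values for a mild summer day).
temp_c = 16.9
delta = slope_vapor_pressure_curve(temp_c)
et = fao_pm_et(delta=delta, rn=13.28, g=0.0, gamma=0.0666,
               temp_c=temp_c, u2=2.078, es=1.997, ea=1.409)
print(f"Delta = {delta:.3f} kPa/°C, ET = {et:.2f} mm/day")
```

Here `es` and `ea` are passed in separately because the saturation vapor pressure is usually averaged over the daily maximum and minimum temperatures rather than computed at the mean temperature.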

Equation 1 can only be evaluated once the values of parameters such as Rn, G, U2, \(e_s\), \(e_a\), and T are known, that is, at the end of the day. Effective decisions, however, require an ET estimate at the beginning of the day. Hence, this study estimates daily ET in advance from antecedent meteorological data.

Long short-term memory (LSTM) model

Long Short-Term Memory (LSTM) is designed to handle the vanishing gradient problem that exists in traditional recurrent neural networks (RNN) (Hochreiter & Schmidhuber, 1997). LSTM networks use a memory cell to retain data over long periods and selectively forget or remember data based on input signals (Coşkun & Citakoglu, 2023). Three gates govern the memory cell: the input gate, the forget gate, and the output gate (Gers et al., 2000). These gates are controlled by sigmoid activation functions, whose outputs range between 0 and 1. The input gate controls the amount of new data that should be added to the cell state and is calculated as follows:

$${i}_t=\sigma \left({W}_i\ast \left[{h}_{t-1},{x}_t\right]+{b}_i\right)$$
(2)

where \(i_t\) is the input gate vector at time t, σ is the sigmoid function, \(W_i\) is the weight matrix for the input gate, \(h_{t-1}\) is the previous hidden state, \(x_t\) is the current input vector, and \(b_i\) is the bias vector for the input gate.

The forget gate controls how much of the previous cell state is retained and is calculated as follows:

$${f}_t=\sigma \left({W}_f\ast \left[{h}_{t-1},{x}_t\right]+{b}_f\right)$$
(3)

where \(f_t\) is the forget gate vector at time t, \(W_f\) is the weight matrix for the forget gate, \(h_{t-1}\) is the previous hidden state, \(x_t\) is the current input vector, and \(b_f\) is the bias vector for the forget gate.

The output gate determines how much of the cell state is passed on to generate the new hidden state and is calculated as follows:

$${o}_t=\sigma \left({W}_o\ast \left[{h}_{t-1},{x}_t\right]+{b}_o\right)$$
(4)

where \(o_t\) is the output gate vector at time t, \(W_o\) is the weight matrix for the output gate, \(h_{t-1}\) is the previous hidden state, \(x_t\) is the current input vector, and \(b_o\) is the bias vector for the output gate.

The cell state (Ct) at time t can be updated using these gates as follows:

$${C}_t={f}_t\bullet {C}_{t-1}+{i}_t\bullet \mathit{\tanh}\left({W}_c\ast \left[{h}_{t-1},{x}_t\right]+{b}_c\right)$$
(5)

where • denotes element-wise multiplication and tanh represents the hyperbolic tangent function. \(W_c\) and \(b_c\) are the weight matrix and bias vector, respectively, for updating the cell state.

Finally, a new hidden state (ht) can be generated using this updated cell state as follows:

$${h}_t={o}_t\bullet \mathit{\tanh}\left({C}_t\right)$$
(6)
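Equations 2 to 6 can be traced in a few lines of NumPy. The following is a minimal sketch of a single LSTM time step, assuming that each weight matrix acts on the concatenation of the previous hidden state and the current input; the parameter dictionary, the function names, and the layer sizes are illustrative and not taken from the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. 2-6."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate, Eq. 2
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate, Eq. 3
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate, Eq. 4
    candidate = np.tanh(params["W_c"] @ z + params["b_c"])
    c_t = f_t * c_prev + i_t * candidate                  # cell state, Eq. 5
    h_t = o_t * np.tanh(c_t)                              # hidden state, Eq. 6
    return h_t, c_t


def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("i", "f", "o", "c"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_inputs))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params


# Run a three-day sequence of four meteorological inputs through the cell.
params = init_params(n_inputs=4, n_hidden=3)
h, c = np.zeros(3), np.zeros(3)
sequence = np.random.default_rng(1).normal(size=(3, 4))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of the hidden state stays strictly between -1 and 1.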

The structure of the LSTM model consists of multiple LSTM cells arranged sequentially. Figure 2 shows the structure of the standalone LSTM model. Each LSTM cell takes an input vector (x) and a hidden state (h) from its previous cell as inputs. The LSTM cell processes these inputs through its three gates (input gate “i”, forget gate “f”, and output gate “o”) to update its internal memory (cell state “c”). The updated memory then generates a new hidden state (output “h”). LSTM architecture can have multiple layers stacked on top of each other to form a deep LSTM network. The outputs from each layer are fed into subsequent layers terminating in the output layer that produces the final forecasts or classifications. LSTM architecture has been widely used in various applications such as speech recognition, natural language processing (NLP), image captioning, etc., where sequential data needs to be processed with long-term dependencies.

Fig. 2 Structure of a typical LSTM model

Furthermore, the number of neurons, number of epochs, batch size, and learning rate of the LSTM network were tuned through trial and error to enhance the overall accuracy. A dropout layer was used as a regularization approach to mitigate overfitting. During the training process, a random subset of neurons is deliberately excluded, or "dropped out", enabling the neural network to learn from diverse neuron configurations. In the present investigation, a dropout rate of 3% was implemented on the input layer. The Rectified Linear Unit (ReLU) is an activation function commonly used in the middle layers of neural networks. The ADAM optimizer was utilized to adjust the weights. The number of epochs, batch size, and learning rate were set to 400, 20, and 0.0075, respectively. These values were selected through testing to improve the accuracy of the network.
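To make the regularization step concrete, the sketch below applies dropout and ReLU to a batch of inputs using the settings reported above. It assumes the "inverted" dropout variant, in which the surviving activations are rescaled by 1 / (1 - rate) during training; the paper does not state which variant or framework was used, and the dictionary and function names are illustrative.

```python
import numpy as np

# Training settings as reported in the text.
HYPERPARAMS = {
    "epochs": 400,
    "batch_size": 20,
    "learning_rate": 0.0075,
    "dropout_rate": 0.03,
}


def relu(z):
    """Rectified Linear Unit: max(0, z) element-wise."""
    return np.maximum(0.0, z)


def dropout(x, rate, rng, training=True):
    """Inverted dropout (assumed variant): zero a random subset, rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_probability = 1.0 - rate
    mask = rng.random(x.shape) < keep_probability
    return x * mask / keep_probability


rng = np.random.default_rng(0)
batch = np.ones((HYPERPARAMS["batch_size"], 19))   # 19 inputs, as in Scenario 16
dropped = dropout(batch, HYPERPARAMS["dropout_rate"], rng)
print(f"fraction of inputs dropped: {np.mean(dropped == 0.0):.3f}")
```

At inference time (`training=False`) the inputs pass through unchanged, which is why the rescaling is applied during training only.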

Feedforward neural network (FFNN) model

FFNN is a type of artificial neural network that consists of several layers of interconnected nodes, in which each node performs a non-linear transformation on its inputs. The FFNN can learn complex patterns in the input data and extract useful features that can be used for prediction. Figure 3 displays the general structure of the FFNN model. Each FFNN model has an input layer, one or more hidden layers, and an output layer. A simple case with one hidden layer is presented in Fig. 3. In this study, after rigorous trial and error, one hidden layer was used.

Fig. 3 Structure of a one-hidden layer FFNN model

In the present study, each neuron in a layer was linked through weighted connections to all neurons in the subsequent layer, and training the neural network involved iteratively updating these weights. The neurons in this architecture do not retain any state between inputs; signals propagate in one direction only, because there are no internal feedback connections or return loops. Supervised learning was employed. The training strategies used to generate the predictive models include Levenberg-Marquardt Back Propagation (LMBP), Bayesian Regularization, and the Scaled Conjugate Gradient approach. During the training phase, weighted connections propagated the input layer features to the subsequent layer. The data were processed within the hidden layers and ultimately arrived at the output neuron. The discrepancy between the network output and the target was then computed and used to update the layers' weights by propagating the error backward. Training was deemed complete when no additional weight updates were observed. The Levenberg-Marquardt algorithm was employed to minimize the error.

Furthermore, the behaviour of the neural network was governed by the transfer function used in each layer. The transfer functions employed for the initial and secondary hidden layers were tan-sigmoid and log-sigmoid, respectively. In addition, the output layer also used the log-sigmoid transfer function.
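The forward pass and backward error propagation described in this section can be illustrated with a small NumPy network. This is a sketch under stated assumptions: one hidden layer with a tan-sigmoid transfer function, a log-sigmoid output neuron, targets scaled to the interval (0, 1), and plain gradient descent on the mean squared error in place of the Levenberg-Marquardt training used in the study. All names, sizes, and the synthetic data are illustrative.

```python
import numpy as np


def tansig(z):
    """Tan-sigmoid transfer function."""
    return np.tanh(z)


def logsig(z):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))


def init_ffnn(n_inputs, n_hidden, rng):
    return {
        "W1": rng.normal(scale=0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Propagate inputs through the hidden layer to the output neuron."""
    hidden = tansig(X @ params["W1"] + params["b1"])
    output = logsig(hidden @ params["W2"] + params["b2"])
    return hidden, output


def gradient_step(params, X, y, learning_rate):
    """One backpropagation update; returns the loss before the update."""
    hidden, output = forward(params, X)
    error = output - y
    loss = float(np.mean(error ** 2))
    d_output = (2.0 / len(X)) * error * output * (1.0 - output)
    d_hidden = (d_output @ params["W2"].T) * (1.0 - hidden ** 2)
    grads = {
        "W2": hidden.T @ d_output,
        "b2": d_output.sum(axis=0),
        "W1": X.T @ d_hidden,
        "b1": d_hidden.sum(axis=0),
    }
    for key in params:
        params[key] -= learning_rate * grads[key]
    return loss


# Synthetic example: four inputs, targets already scaled to (0, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = logsig(X @ np.array([[1.0], [-1.0], [0.5], [0.0]]))
params = init_ffnn(n_inputs=4, n_hidden=6, rng=rng)
losses = [gradient_step(params, X, y, learning_rate=0.2) for _ in range(300)]
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the log-sigmoid output is bounded between 0 and 1, an ET target in mm/day would have to be rescaled before training and mapped back afterwards.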

The proposed hybrid LSTM-FFNN model architecture

A combination of optimization algorithms improves the accuracy of hydrological time series estimation (Mohammadi, 2023). In this study, the long short-term memory (LSTM) network and the feedforward neural network (FFNN) are combined to develop the proposed hybrid LSTM-FFNN algorithm. The LSTM-FFNN algorithm first extracts the pertinent features from the input time series data using the LSTM model. These features are then fed into the FFNN. The two networks differ in architecture and application: LSTM is used for tasks that involve sequential data, such as speech recognition or language translation, whereas the FFNN is suitable for tasks that process static data, such as image recognition or text classification. Combining them yields a hybrid model that can handle both sequential and static data. This hybrid LSTM-FFNN algorithm is used here for time series estimation of daily ET values, which involves extracting pertinent features of a time-dependent variable and relating its past values to its future values. Figure 4 displays the flowchart of the proposed hybrid LSTM-FFNN algorithm. Initially, the predictors are fed into the algorithm and the structure of the LSTM model is determined. Then the training process begins. Once the minimum-error criterion on the training data is met, testing commences. After the structure of the FFNN is determined, the model is trained and tested again, and the prediction accuracy is then determined using the evaluation criteria. LSTM models and their combination with FFNNs have proven to be effective in a wide range of sequence-based tasks, such as natural language processing and time series forecasting. However, they do have several limitations: computational complexity, hyperparameter tuning, training time, and overfitting (Nguyen et al., 2020).
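The data flow of the hybrid can be sketched as two stages: an LSTM that condenses the antecedent days into a feature vector, and an FFNN that maps that vector to an ET estimate. The code below shows only this forward data flow with untrained random weights. The stacked gate layout, the layer sizes, the linear output neuron, and all function names are assumptions made for illustration and do not reproduce the authors' configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_features(X_seq, W, b, n_hidden):
    """Run an LSTM over X_seq (timesteps x inputs); return the last hidden state.

    W stacks the input, forget, output, and candidate weights row-wise.
    """
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in X_seq:
        z = W @ np.concatenate([h, x_t]) + b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def ffnn_head(features, W1, b1, W2, b2):
    """One-hidden-layer FFNN mapping LSTM features to a single estimate."""
    hidden = np.tanh(W1 @ features + b1)
    return float(W2 @ hidden + b2)      # linear output neuron (assumption)


rng = np.random.default_rng(0)
n_inputs, n_lstm, n_ffnn = 5, 8, 4      # T, S, RH, W, ET per antecedent day
W = rng.normal(scale=0.1, size=(4 * n_lstm, n_lstm + n_inputs))
b = np.zeros(4 * n_lstm)
W1 = rng.normal(scale=0.1, size=(n_ffnn, n_lstm))
b1 = np.zeros(n_ffnn)
W2 = rng.normal(scale=0.1, size=n_ffnn)
b2 = 0.0

# Three antecedent days ordered from t-3 to t-1 (synthetic standardized values).
X_seq = rng.normal(size=(3, n_inputs))
features = lstm_features(X_seq, W, b, n_lstm)
et_estimate = ffnn_head(features, W1, b1, W2, b2)
print(features.shape, et_estimate)
```

In a trained model both stages would be fitted to the observed ET series; the sketch only clarifies which output of the LSTM becomes the input of the FFNN.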

Fig. 4 Flowchart of hybrid LSTM-FFNN modelling framework

The scenario-based modelling approach

To test the pertinence of the LSTM-FFNN model, daily ET estimation was performed using one-, two-, and three-day antecedent meteorological parameters. The parameters used include average air temperature (T), sunshine hours (S), average relative humidity (RH), average wind speed (W), and reference evapotranspiration (ET), while the integers -1, -2, and -3 in the parameter subscripts indicate a delay of one, two, and three days, respectively. Historical data from 1990-2022 were used as predictor inputs in this study.
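Antecedent inputs of this kind are built by shifting each daily series by one, two, and three days and aligning the shifted copies with the target day. The sketch below shows one way to do this with NumPy, followed by the 70/30 training and test split mentioned in the study area description. The function names are hypothetical, and splitting in chronological order is an assumption, since the paper does not state how the split was made.

```python
import numpy as np


def make_lagged(series, lags=(1, 2, 3)):
    """Build a matrix of lagged predictors.

    series maps a variable name to a 1-D array of daily values of equal length.
    Row k of the result holds the values observed `lag` days before day
    max(lags) + k, so it lines up with target[max(lags):].
    """
    max_lag = max(lags)
    n_days = len(next(iter(series.values())))
    columns, names = [], []
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        for lag in lags:
            columns.append(values[max_lag - lag : n_days - lag])
            names.append(f"{name}_t-{lag}")
    return np.column_stack(columns), names


def chronological_split(X, y, train_fraction=0.7):
    """Keep the first part for training and the rest for testing (assumed order)."""
    n_train = int(round(train_fraction * len(X)))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Tiny synthetic example with two variables over ten days.
et = np.arange(10, dtype=float)
temp = 10.0 * np.arange(10, dtype=float)
X, names = make_lagged({"ET": et, "T": temp})
y = et[3:]                               # target: ET on day t
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(names)
print(X[0], y[0])
```

Restricting `lags` to `(1,)` with the ET and T series gives the two inputs of the first scenario, and adding further variables and lags extends the matrix toward the larger scenarios.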

Heat maps based on Pearson's correlation coefficient were used to study the probable impact of different input variables on ET at Babolsar and Anzali stations, as shown in Fig. 5. For Babolsar station, the comparison between input parameters demonstrated that temperature has the highest correlation with ET values. On the other hand, relative humidity was negatively correlated with ET and had the lowest correlation with it. For Anzali station, antecedent evapotranspiration and then temperature had the highest correlation with ET, followed by relative humidity, while wind speed had the lowest correlation with ET.
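The values shown in such a heat map come from a Pearson correlation matrix. A minimal sketch of that computation is given below, assuming the variables are available as equally long daily arrays; the numbers are synthetic and only illustrate a perfectly positive and a perfectly negative relationship, and the function name is illustrative.

```python
import numpy as np


def pearson_matrix(columns):
    """Return variable names and their Pearson correlation matrix."""
    names = list(columns)
    data = np.vstack([np.asarray(columns[name], dtype=float) for name in names])
    return names, np.corrcoef(data)      # each row of `data` is one variable


# Synthetic illustration (not station data).
names, corr = pearson_matrix({
    "ET": [1.0, 2.0, 3.0, 4.0, 5.0],
    "T": [2.0, 4.0, 6.0, 8.0, 10.0],     # rises with ET
    "RH": [5.0, 4.0, 3.0, 2.0, 1.0],     # falls as ET rises
})
for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```

The resulting matrix is symmetric with ones on the diagonal, and the sign of each entry indicates whether the input rises or falls with ET.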

Fig. 5 Evaluation of ET forecasting using Pearson correlation coefficient heat maps for Babolsar and Anzali stations

To gain a comprehensive understanding of the influence of antecedent inputs on ET, a scenario-based approach with incremental combinations of input series was used, as shown in Table 2. For instance, Scenario 1 used two input series, one-day antecedent ET (ETt-1) and one-day antecedent T (Tt-1), whereas Scenario 16 used a total of 19 inputs for daily ET estimation (Table 2).

Table 2 Scenarios used in the case study of daily evapotranspiration estimations

Model evaluation criteria

Several evaluation criteria were considered to assess the performance of the proposed models for ET estimation. The first criterion, the coefficient of determination (R2), varies from 0 to 1 and is defined as follows:

$${R}^2={\left(\frac{\sum_{i=1}^N\left({O}_i-\overline{O}\right)\left({P}_i-\overline{P}\right)}{\sqrt{\sum_{i=1}^N{\left({O}_i-\overline{O}\right)}^2\sum_{i=1}^N{\left({P}_i-\overline{P}\right)}^2}}\right)}^2$$
(7)

Another broadly used statistical parameter is the root mean square error (RMSE), and it can be determined as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^N{\left({P}_i-{O}_i\right)}^2}$$
(8)

The mean absolute error (MAE) is an index that is used to evaluate the model error and varies from 0 to ∞. It can be expressed as follows:

$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^N\left|{O}_i-{P}_i\right|$$
(9)

The Nash-Sutcliffe coefficient (NS) ranges from -∞ to 1, where NS = 1 indicates complete agreement (Nash & Sutcliffe, 1970). It is expressed as:

$$\mathrm{NS}=1-\left[\frac{\sum_{i=1}^N{\left({O}_i-{P}_i\right)}^2}{\sum_{i=1}^N{\left({O}_i-\overline{O}\right)}^2}\right]$$
(10)

Willmott's index of agreement (WI) ranges from 0 to 1. A value of 1 indicates perfect agreement, and 0 indicates no agreement between the observed and predicted values (Willmott, 1981). It is expressed as:

$$\mathrm{WI}=1-\left[\frac{\sum_{i=1}^N{\left({O}_i-{P}_i\right)}^2}{\sum_{i=1}^N{\left(\left|{P}_i-\overline{O}\right|+\left|{O}_i-\overline{O}\right|\right)}^2}\right]$$
(11)

In Eqs. 7-11, \(P_i\) and \(O_i\) are the estimated and observed ith values, respectively, N is the number of observations, and \(\overline{P}\) and \(\overline{O}\) are the means of the estimated and observed values, respectively.
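The five criteria can be computed together from paired observed and estimated series. The sketch below is one possible NumPy implementation, in which R2 is taken as the squared Pearson correlation between observed and estimated values; the function name and the sample numbers are illustrative and are not results from the study.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE, NS, and WI for paired observed/estimated values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_dev = o - o.mean()
    p_dev = p - p.mean()
    squared_error = np.sum((o - p) ** 2)
    r = np.sum(o_dev * p_dev) / np.sqrt(np.sum(o_dev ** 2) * np.sum(p_dev ** 2))
    return {
        "R2": r ** 2,
        "RMSE": np.sqrt(np.mean((p - o) ** 2)),
        "MAE": np.mean(np.abs(o - p)),
        "NS": 1.0 - squared_error / np.sum(o_dev ** 2),
        "WI": 1.0 - squared_error / np.sum((np.abs(p - o.mean()) + np.abs(o_dev)) ** 2),
    }


# Estimates that track the observations but are biased high by 1 mm/day.
scores = evaluate(observed=[1.0, 2.0, 3.0, 4.0], predicted=[2.0, 3.0, 4.0, 5.0])
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this example R2 equals 1 even though every estimate is off by 1 mm/day, while NS drops to 0.2; this is one reason error-based criteria are reported alongside R2.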

In addition, graphical evaluation of the model performances was also performed. The Taylor diagram, which is a graphical representation of the observed and predicted data, was used to check the accuracy of the used models (Taylor, 2001) together with the scatter and violin plot.
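A Taylor diagram places each model according to three statistics: the standard deviations of the observed and predicted series, their correlation coefficient, and the centered root-mean-square difference. The sketch below computes these quantities with NumPy, using population standard deviations; the function name and the sample values are illustrative assumptions rather than data from the study.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Statistics summarized by a Taylor diagram."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    centered_difference = (p - p.mean()) - (o - o.mean())
    return {
        "std_obs": o.std(),
        "std_pred": p.std(),
        "correlation": np.corrcoef(o, p)[0, 1],
        "crmsd": np.sqrt(np.mean(centered_difference ** 2)),
    }


# Synthetic daily ET values in mm/day (illustrative only).
stats = taylor_statistics(
    observed=[2.1, 3.4, 4.0, 5.2, 3.8, 2.9],
    predicted=[2.4, 3.1, 4.3, 4.8, 3.5, 3.2],
)
print({name: round(float(value), 3) for name, value in stats.items()})
```

The three quantities are linked by the law-of-cosines relation crmsd² = std_obs² + std_pred² − 2·std_obs·std_pred·correlation, which is what allows them to be shown on a single two-dimensional plot.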

Results

The performance of the hybrid LSTM-FFNN model is comprehensively evaluated for daily ET values at Babolsar and Anzali stations located in Iran using the inputs from 1990-2022. The performance of the hybrid LSTM-FFNN model is compared with the standalone LSTM model.

An evaluation of model performances on the basis of the R2, RMSE, MAE, NS, and WI criteria for Babolsar station is presented in Table 3. Based on the results, the models performed better during the training period than during the test period in all scenarios. In the first scenario, the LSTM and LSTM-FFNN models had the highest error compared to other scenarios: the RMSE values for the LSTM model are 1.62 mm/day during training and 1.65 mm/day during testing, while those for the LSTM-FFNN model are 1.22 mm/day during training and 1.26 mm/day during testing. In the second and third scenarios, all the evaluation criteria for the LSTM model were unchanged, which shows that both scenarios generated similar performance; for the LSTM-FFNN model, however, the RMSE increased from 1.21 to 1.23 mm/day while the other metrics were almost constant. This shows that the ETt-3 and Tt-3 parameters did not have much positive effect on the accuracy of either model. In the 4th scenario, the addition of the St-1 parameter to the inputs caused the error criterion to decrease by 3.7% in the standalone model and by 6.7% in the hybrid model. From the 5th to the 12th scenario, the accuracy of the LSTM model remained constant and the addition of parameters did not produce a noteworthy effect on the accuracy of the standalone model. This lack of change indicates that variables such as relative humidity and wind speed do not have much effect on the prediction accuracy of ET values for the standalone model. On the other hand, for the combined LSTM-FFNN model, the addition of the St-2 parameter in the 5th scenario decreased the model error from 1.15 to 1.02 mm/day, with a slight improvement in the NS and WI criteria. In the 6th and 7th scenarios, the addition of the St-3 and RHt-1 parameters did not increase the accuracy of the combined LSTM-FFNN model; instead, the RMSE values increased by about 3.9%, reducing model performance.
In the 8th scenario, as in the previous two scenarios, the error of the combined LSTM-FFNN model increased, from 1.07 to 1.11 mm/day, reducing the model's accuracy by 3.7%. In the 9th scenario, although the R2 and WI criteria remained constant, the RMSE and NS criteria improved by 8.5% and 5.3%, respectively. In contrast, in the 10th scenario, the addition of the Wt-1 input series increased the RMSE by 8.5% and decreased NS by 5.3%, although the R2 and WI criteria remained unchanged. Both the 11th and 12th scenarios showed improved accuracy of the LSTM-FFNN model. In the 11th scenario, the error reduced from 1.11 to 1.05 mm/day, and in the 12th scenario it reduced from 1.05 to 1 mm/day. For the 13th scenario, adding the Tt input series increased the RMSE value by 3.1% and 18.2% for the standalone LSTM and combined LSTM-FFNN models, respectively, and the other criteria also deteriorated. On the other hand, in the 14th scenario, the RMSE for the LSTM model decreased from 1.63 to 1.58 mm/day and that for the LSTM-FFNN model decreased from 1.2 to 1.02 mm/day, which can be attributed to the addition of the St input series. The 15th and 16th scenarios had similar accuracy, and in comparison to the 14th scenario, the error of both models decreased slightly.
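The percentage changes quoted between scenarios are relative changes in a criterion with respect to its value in the preceding scenario. As a worked check using two pairs of RMSE values reported above (the helper name is illustrative):

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (positive means an increase)."""
    return 100.0 * (new - old) / old


# LSTM-FFNN RMSE between the 7th and 8th scenarios at Babolsar (mm/day).
increase = percent_change(1.07, 1.11)
# Standalone LSTM RMSE between the 13th and 14th scenarios (mm/day).
decrease = percent_change(1.63, 1.58)
print(f"{increase:+.1f}%  {decrease:+.1f}%")
```

The first value reproduces the 3.7% rise in error reported for the 8th scenario. Because the tabulated RMSE values are rounded to two decimals, percentages recomputed this way may differ slightly from those in the text.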

Table 3 Statistical evaluation of model performances during the training and testing period at Babolsar station

Comparing the results at Babolsar station for the LSTM model also showed that the 5th, 8th, 11th, 15th, and 16th scenarios performed better than the other scenarios of the standalone model. The results of this study are in agreement with those of Roy (2021), who used long short-term memory networks to predict one-step-ahead evapotranspiration at Ghazipur station in Bangladesh during 2004-2019 and showed that the bi-directional LSTM (Bi-LSTM) model, with a correlation coefficient of 0.99, followed by the LSTM model, with a correlation coefficient of 0.69, had high accuracy in predicting evapotranspiration. Both the standalone and the hybrid LSTM-FFNN models showed that the 12th, 15th, and 16th scenarios had the best performance compared to the rest of the scenarios. The performance of the LSTM-FFNN model was higher in these scenarios, with R2 = 0.79, RMSE = 1 mm/day, MAE = 0.72 mm/day, NS = 0.79, and WI = 0.94.

In addition to the statistical evaluation, scatter plots for the best scenarios of both the LSTM and LSTM-FFNN models at Babolsar station are illustrated in Fig. 6. The scatter plots reveal that, for the LSTM model, the 16th scenario has a slightly higher correlation than the 5th, 8th, 11th, and 15th scenarios. In agreement with Table 3, the 16th scenario has slightly higher accuracy than the 15th and 12th scenarios for the LSTM-FFNN model (Fig. 6). Overall, the combined LSTM-FFNN model performed better than the standalone LSTM model.

Fig. 6 Scatter plots of the best scenarios in the testing period at Babolsar station

In the case of Bandar Anzali station, the performance evaluation of the proposed LSTM-FFNN and standalone LSTM models during the test phase is presented in Table 4. A comparison of the first and second scenarios for the LSTM model shows that the R2 criterion increased slightly while the other criteria remained unchanged. For the 3rd and 4th scenarios, the WI criterion was constant, while the RMSE decreased by 1.2%. On the other hand, the NS index increased by 4.1%, and the R2 also increased slightly. The WI index in the 6th scenario showed a slight increase compared to the 5th scenario, while the other parameters did not change. In the 7th and 8th scenarios, only the R2 increased slightly among all criteria. The comparison between the 9th and 10th scenarios also shows that the addition of the Wt-1 parameter did not improve the accuracy of the standalone LSTM model: the RMSE increased slightly and the R2 criterion decreased. Also, the examination of the 11th and 12th scenarios shows that the 12th scenario performed better, reducing the error to a small extent. For the 13th and 14th scenarios, the values of all criteria except NS improved by about 1% and the RMSE values reduced. These results persisted in the last three scenarios (Scenarios 14-16).

Table 4 Statistical evaluation of model performances during the training and testing period at Bandar Anzali station

For the proposed hybrid LSTM-FFNN model, a comparison of the first and second scenarios showed that adding the ETt-2 and Tt-2 parameters to the first scenario caused a slight increase in error, while the other metrics remained unchanged. In the 4th scenario, the error decreased relative to the 3rd scenario to 1.14 mm/day, and R2 increased slightly. The 6th scenario outperformed the 5th, with R2, NS, and WI increasing by 2.4%, 3.7%, and 1.1%, respectively, and RMSE and MAE decreasing by 6.7% and 15.8%, respectively. In contrast, the 8th scenario was less accurate than the 7th, with the error increasing by 9.4% and NS decreasing by 5%. The 10th scenario also showed reduced accuracy, with the RMSE increasing from 0.99 to 1.03 mm/day. The performance metrics of the 11th and 12th scenarios were unchanged except for the RMSE, which decreased very slightly in the 12th scenario. Compared to the 13th scenario, the 14th scenario was more accurate, with RMSE and MAE decreasing by 14.5% and 22.2%, respectively, and R2, NS, and WI increasing. The results of the 15th scenario were the same as those of the 14th. In the 16th scenario, however, the accuracy of the LSTM-FFNN model decreased, with the RMSE increasing by 20.6% compared to the 15th scenario.
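The scenario comparisons above rest on five criteria (RMSE, MAE, NS, WI, R2) and on relative changes between scenarios. As a minimal sketch, the following computes them using their standard textbook definitions, which are assumed here to match the definitions used in this study. The function names and the sample series are hypothetical and purely illustrative; only the 0.99 to 1.03 mm/day RMSE pair is taken from the text.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return RMSE, MAE, NS, WI and R2 for two paired series (standard definitions)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    sse = sum((p - o) ** 2 for o, p in pairs)           # sum of squared errors
    ss_obs = sum((o - mean_o) ** 2 for o in observed)   # spread of observations
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    # Denominator of Willmott's index of agreement
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)

    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(p - o) for o, p in pairs) / n,
        "NS": 1.0 - sse / ss_obs,             # Nash-Sutcliffe efficiency
        "WI": 1.0 - sse / potential,          # Willmott index
        "R2": cov * cov / (ss_obs * ss_pred), # squared Pearson correlation
    }


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Illustrative values only (not data from the study)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(evaluation_metrics(observed, predicted))

# RMSE rising from 0.99 to 1.03 mm/day, as quoted for the 10th scenario
print(round(percent_change(0.99, 1.03), 1))  # 4.0
```

Note that a constant bias, as in the illustrative series, leaves R2 at 1 while lowering NS and WI, which is one reason the criteria are read together rather than individually.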

Moreover, an evaluation of all scenarios for the LSTM and hybrid LSTM-FFNN models revealed that the 14th and 15th scenarios registered the highest accuracy with the lowest error (RMSE=0.96 mm/day) for the combined LSTM-FFNN model. For the standalone model, the 15th and 16th scenarios performed best (RMSE=1.66 mm/day). Granata and Di Nunno (2021) predicted evapotranspiration in different climates in the state of Florida using a set of recurrent neural networks including LSTM and nonlinear autoregressive networks with exogenous inputs (NARX). Their results showed that in the humid subtropical climate of South Florida, the LSTM-based model performed better than the NARX-based model and showed good accuracy with R2=0.81. The standalone LSTM model in this study, with R2=0.83, therefore agrees with the study by Granata and Di Nunno (2021). Our study further shows that a hybrid modelling approach such as the proposed LSTM-FFNN model can achieve better accuracy than the standalone one, with R2=0.85.

To further demonstrate the better accuracy of the hybrid LSTM-FFNN model, Fig. 7 shows the scatter plots of the best scenarios of the standalone LSTM and hybrid LSTM-FFNN models for Bandar Anzali station. Although the accuracy in both the 15th and 16th scenarios is almost the same for the standalone LSTM model, the observed and predicted values are more correlated with each other. This distinction can only be seen graphically. Also, for the hybrid LSTM-FFNN model, the scatter points in the 15th scenario lie closer to the 45° (y=x) line, with a gradient close to unity (m = 0.847), and therefore indicate higher accuracy.

Fig. 7

Scatter plots of the best scenarios in the testing period at Bandar Anzali station
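The gradient m quoted for the scatter plots can be obtained from a linear fit of predicted against observed values. The sketch below assumes an ordinary least-squares fit, which the text does not state explicitly. The function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(observed, predicted):
    """Ordinary least-squares fit: predicted = slope * observed + intercept."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only: a model that damps high ET values
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.9 * x + 0.3 for x in observed]
slope, intercept = fit_line(observed, predicted)
print(f"m = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope below 1 together with a positive intercept means that low values are overestimated and high values underestimated, so the points rotate away from the y=x line even when the correlation is high.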

Moreover, violin plots for the best scenarios of the respective models at Babolsar and Anzali stations are displayed in Fig. 8. In all the violin plots, the white squares and circles indicate the mean and median of the data series. For Babolsar station, the LSTM forecasts under all scenarios ranged between 1.5 and 2.5 mm/day, which does not match the distribution of the observed ET values. The comparison of the scenarios also shows that the predicted values were most numerous around ET=2 mm/day. For the FFNN-LSTM model, the estimated values in Scenario 16 closely followed the observed ET values, with a narrow tail (extreme values) and more values concentrated around ET=1 mm/day. This reveals the better performance of the FFNN-LSTM model.

Fig. 8

Violin plots for the best scenario in each model during the testing period

For the Anzali station, the plot for the standalone LSTM model showed that the LSTM-estimated values in the best scenarios were most concentrated around ET=2.1 mm/day, while the observed ET values were most concentrated around ET=1 mm/day. Also, even though the scenarios have a mean value almost the same as the observed value, all the scenarios have a higher median than the observed value. The combined FFNN-LSTM model again performed better and was able to predict the ET values with acceptable accuracy. The violin plots also revealed that the 14th and 15th scenarios were very similar and were closest to the distribution of observed ET values, providing the best prediction scenario.
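The quantities read off the violin plots (mean, median, range, and the ET value around which the data are most concentrated) can also be tabulated directly. The following is a minimal sketch using a simple histogram in place of the kernel density estimate a violin plot normally draws. The function name, the 0.5 mm/day bin width, and the sample series are assumptions for illustration, not values from the study.

```python
import math
import statistics
from collections import Counter


def distribution_summary(values, bin_width=0.5):
    """Summarise a series the way a violin plot is read: centre, range, densest region."""
    if not values:
        raise ValueError("values must not be empty")
    # Count values per bin; the fullest bin approximates the widest part of the violin
    counts = Counter(math.floor(v / bin_width) for v in values)
    densest_bin = max(sorted(counts), key=lambda b: counts[b])
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "densest_bin_center": (densest_bin + 0.5) * bin_width,
    }


# Illustrative values only (mm/day): skewed observations vs. a narrow-range forecast
observed = [0.6, 0.8, 0.9, 1.1, 1.2, 2.6, 3.8, 4.9]
narrow_forecast = [1.6, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4]

for name, series in (("observed", observed), ("forecast", narrow_forecast)):
    print(name, distribution_summary(series))
```

In this illustrative case the forecast spans a much narrower range than the observations and peaks at a different ET value, which is the kind of mismatch described for the standalone LSTM model.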

Discussions

Taylor diagrams (Fig. 9) are also well suited to evaluating the performance of the respective models. The Taylor diagrams in Fig. 9 present the best scenarios of both models during the testing period. For Anzali station, a comparison between the models clearly showed that LSTM-FFNN under Scenario-15 had the highest correlation and provided the best forecast, followed by LSTM-FFNN under Scenario-14 with a slightly lower correlation and standard deviation. Among the standalone model scenarios, LSTM under Scenario-15 had the best performance with the highest correlation. For Babolsar station, the comparison between the combined LSTM-FFNN models showed that although scenarios 12, 15, and 16 all have the same correlation coefficient, the 16th scenario performs better than the competing 15th and 12th scenarios. Among the standalone LSTM model scenarios, all scenarios had similar standard deviations, but the correlation coefficients of scenarios 5, 8, 11, 15, and 16 were slightly higher.

Fig. 9

Taylor diagrams for the best modelling Scenarios at both study sites
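A Taylor diagram places each model according to three statistics: the standard deviation of its predictions, its correlation with the observations, and the centred root-mean-square difference (CRMSD), which are linked by the relation CRMSD² = σ_o² + σ_p² − 2 σ_o σ_p R. The sketch below computes the three statistics and checks that relation numerically. The function name and sample values are hypothetical, and population (divide-by-n) standard deviations are assumed.

```python
import math


def taylor_statistics(observed, predicted):
    """Return the statistics plotted on a Taylor diagram for one model."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]

    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centred RMS difference: RMSE after removing the mean bias
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"std_obs": std_o, "std_pred": std_p, "corr": corr, "crmsd": crmsd}


# Illustrative values only (mm/day)
observed = [0.8, 1.5, 2.9, 4.1, 3.2, 1.1]
predicted = [1.2, 1.6, 2.4, 3.5, 3.0, 1.5]
stats = taylor_statistics(observed, predicted)
print(stats)

# Law-of-cosines relation underlying the diagram's geometry
lhs = stats["crmsd"] ** 2
rhs = (stats["std_obs"] ** 2 + stats["std_pred"] ** 2
       - 2 * stats["std_obs"] * stats["std_pred"] * stats["corr"])
print(abs(lhs - rhs) < 1e-12)
```

Because of this relation, two scenarios with the same correlation coefficient can still differ in skill: the one whose standard deviation is closer to that of the observations lies nearer the reference point.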

Comparing the accuracy of the scenarios for both stations showed that scenario 15 was the best scenario for both the LSTM and hybrid LSTM-FFNN models, registering the highest accuracy. The investigation of the effectiveness of the input variables for both stations also showed that sunshine hours were the most important variable, significantly increasing the accuracy of ET estimation. The importance of sunshine hours for daily ET estimation has been highlighted in other studies as well. Petković et al. (2015) determined the most influential weather parameters for reference evapotranspiration estimation with the adaptive neuro-fuzzy inference system (ANFIS) method, using weather data from 12 meteorological stations in Serbia between 1980 and 2010. Their results indicated that sunshine hours are the most influential single parameter for estimating evapotranspiration (RMSE = 0.4398 mm/day). In another study, Biazar et al. (2019) analyzed the sensitivity of crop evapotranspiration in a humid region in the north of Iran. Their results showed that the most important evapotranspiration parameter at Lahijan station was sunshine hours. One limitation of the current study was the unavailability of solar radiation data, which therefore was not used. It is suggested that solar radiation be used as one of the model inputs in further studies.

The Wilcoxon signed-rank test is a non-parametric statistical test that compares two paired samples from a single population (Taheri & Hesamian, 2013). It was conducted to compare the performance of the two models, LSTM and FFNN-LSTM, in estimating ET0 at two stations, Anzali (Table 5) and Babolsar (Table 6). In summary, the test results clearly indicate that the FFNN-LSTM model consistently and significantly outperforms the LSTM model in both positive and negative cases at both Anzali and Babolsar stations in all scenarios. These results highlight the effectiveness of the hybrid FFNN-LSTM approach in improving the accuracy of daily ET predictions in these regions. It is worth noting that the "Test statistics" values in the tables (W and Z) indicate the degree of significance, with values deviating significantly from zero implying stronger statistical significance in favor of the FFNN-LSTM model.

Table 5 Wilcoxon signed-rank test for Anzali station
Table 6 Wilcoxon signed-rank test for Babolsar station
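To make the W and Z statistics concrete, the following is a minimal sketch of the Wilcoxon signed-rank test with the normal approximation, without tie or continuity corrections. It assumes, hypothetically, that the paired samples are the absolute errors of the two models on the same days; the study does not state which series were paired. The function name and sample values are illustrative, and in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import math


def wilcoxon_signed_rank(sample_a, sample_b):
    """Signed-rank sums W+ and W-, normal-approximation Z, and two-sided p-value."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be paired")
    # Zero differences carry no sign information and are dropped
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"W+": w_plus, "W-": w_minus, "W": min(w_plus, w_minus),
            "Z": z, "p": p_two_sided}


# Illustrative paired absolute errors only (arbitrary units, not study data)
errors_lstm = [5, 7, 3, 9, 6, 8]
errors_hybrid = [4, 5, 6, 5, 1, 2]
print(wilcoxon_signed_rank(errors_lstm, errors_hybrid))
```

With so few pairs the normal approximation is rough; it is shown here only because Z is the statistic reported alongside W.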

Conclusions

The continuing depletion of water resources requires precise information for prudent water resource management decisions. ET provides valuable information that can aid tasks such as irrigation planning, ultimately contributing to more efficient and effective practices in the agricultural sector. Therefore, this study proposes and evaluates a new combined LSTM-FFNN method to predict daily ET in some coastal areas of Iran. The modelling is performed under 16 scenarios, and the hybrid LSTM-FFNN is benchmarked against the standalone LSTM method. The correlation analysis of meteorological parameters showed that air temperature had a high correlation with ET values. The outcomes reveal that at Babolsar station the models performed best in the 15th and 16th scenarios (RMSE = 1.57 and 1 mm/day), while at Bandar Anzali station the 15th scenario performed best for both the standalone LSTM and the hybrid LSTM-FFNN models (RMSE = 1.66 and 0.96 mm/day, respectively). The evaluation of the two models showed that the hybrid LSTM-FFNN model can significantly improve the prediction accuracy of ET in humid conditions. It is suggested that this hybrid LSTM-FFNN model be explored in further studies at varied sites to evaluate its performance comprehensively. The study has some limitations. It focuses on specific coastal areas in Iran, and the model's applicability to other regions with different climatic conditions has not been explored. Data limitations may affect the generalizability of the findings, since the accuracy of the models depends greatly on the quality and availability of meteorological data. The optimized hybrid models have significant potential to assist farmers and irrigation planners in making more informed and precise decisions, as accurate prediction of reference evapotranspiration supports the design of efficient irrigation systems.