1 Introduction

Agricultural production depends largely on the effective utilization of available water resources, especially in drought-prone, dry sub-humid and semi-arid climatic regions. For efficient water resource management, measurement or accurate estimation of evaporation losses is extremely important [1, 2]. Pan evaporation (EP) is considered a most valuable input for determining crop water requirements, irrigation scheduling, rainfall-runoff modeling, computation of water balance parameters, etc., to ensure judicious use of available water resources. Evaporation is a surface phenomenon in which liquid water is converted to a gaseous form below its boiling point. The state of climatic variables such as temperature, humidity, wind speed and sunshine surrounding the evaporating water surface strongly influences evaporation from water bodies. Higher temperature increases the kinetic energy of water molecules at the surface and widens the spacing between them; the force of attraction between surface molecules therefore decreases, and more of them escape into the gaseous phase. Humidity, a measure of the water vapor present in the air, acts in the opposite direction: the drier the air, the more additional vapor it can accommodate, so the rate of evaporation from the evaporating surface increases as air humidity decreases. Wind speed further accelerates the process: stronger wind carries away the vapor accumulated above the water surface, restoring the air's capacity to hold vapor and thereby raising the evaporation rate.

Evaporation is measured precisely using the Class A evaporation pan standardized by the National Weather Service of the USA. Installation and maintenance of such equipment for recording evaporation on a daily basis is a cumbersome task and requires a skilled workforce [3]. Alternatively, evaporation is often estimated empirically from the climatic variables that drive it. However, owing to the highly complex, physical and nonlinear nature of the evaporation process, it is difficult to model through empirical methods as well [4]. Moreover, an empirical model developed for one agro-climatic situation may not perform well in another and requires recalibration of its coefficients before implementation. Researchers have attempted to model the evaporation process and have developed several empirical formulae in the past, as discussed in the literature [5,6,7,8,9,10]. Among empirical methods, evaporation estimates obtained through the Penman equation are considered the most precise, and the method is therefore widely used and globally accepted. However, its application is limited because it requires additional climatic inputs such as net radiation and vapor pressure deficit.

Considering the limitations associated with both measurement and empirical approaches to evaporation estimation, researchers have in the recent past employed several data-driven computational intelligence and machine learning techniques with different optimization algorithms, providing alternative solutions based on different input combinations of available climatic variables such as temperature, humidity, wind speed, sunshine, solar radiation and vapor pressure [11,12,13,14,15,16,17,18,19,20,21,22]. A comprehensive review of the available literature has been carried out, and significant results of some recently published research articles are discussed briefly in this section. In a study carried out by Deo et al. [23], monthly evaporative losses were estimated using three machine learning techniques, namely relevance vector machine (RVM), extreme learning machine (ELM) and multivariate adaptive regression splines, with meteorological parameters as predictor variables; RVM was found to be the best predictor among these. Wang et al. [24] investigated the potential of multilayer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP) for estimating evaporation and compared the results with regression methods in different climates of China. They found that the heuristic techniques generally performed better than regression and empirical methods, and the MLP ranked first in accuracy among the complex nonlinear heuristic models considered. In another investigation by Wang et al. [25], daily EP was estimated using fuzzy genetic (FG), least square support vector regression (LSSVR), multivariate adaptive regression splines (MARS), M5 model tree (M5Tree) and multiple linear regression (MLR) for eight stations around the Dongting Lake basin in China; the results suggest that FG and LSSVR outperform the other machine learning techniques. Monthly EP was estimated by Malik et al. [26] in the Indian central Himalayas region, employing MLPNN, co-active neuro-fuzzy inference system (CANFIS), radial basis neural network (RBNN) and self-organizing map neural network (SOMNN); the gamma test was used to select the appropriate input combination, and CANFIS was reported superior to the other techniques. Tezel and Buyukyildiz [27] studied the applicability of MLP, RBFN and ε-support vector regression (SVR) using different training algorithms; both the ANNs and SVR with scaled conjugate gradient (SCG) learning performed better than empirical methods. Kisi et al. [28] explored the potential of decision tree-based machine learning methods, namely the Chi-square automatic interaction detector (CHAID) and classification and regression tree (CART), and compared them with a neural network model for daily EP estimation in Turkey; the comparison shows that the neural networks performed better than the other models in different scenarios. The conjugate gradient optimization method was employed by Keshtegar et al. [29] to calibrate three nonlinear mathematical models at a few locations in Iran; the results indicate that the proposed models ranked higher than the adaptive neuro-fuzzy inference system (ANFIS) and M5 model tree (M5Tree) models. Goyal et al. [30] examined the applicability of ANN, LSSVR, fuzzy logic (FL) and ANFIS techniques in estimating daily EP and compared the results with the empirical methods of Hargreaves and Samani (HGS) and Stephens–Stewart (SS); the investigation reveals that daily evaporation can be modeled successfully and more accurately by the FL and LSSVR techniques, which are superior to the traditional approaches. In addition, machine learning and evolutionary techniques have been successfully applied in various other fields, including prediction of protein secondary structure in biomedical science [31], load frequency controller design and renewable distributed generation [32,33,34], and the solution of second-order boundary value problems and fuzzy differential equations [35,36,37,38].

It is learnt from the literature review that, among the different machine learning methods applied so far, ANNs with an appropriate learning algorithm have proven capable of modeling the evaporation process in diverse locations and have often outperformed more complex structures. The prediction task is nonlinear in nature, and hence the adaptive model used for prediction should have nonlinear characteristics. Among the ANN structures reported in the literature, the Deep-LSTM is capable of capturing higher-order nonlinear features. The Deep-LSTM is a stack of LSTM units in which different orders of nonlinear feature representation are captured by the LSTM units at different depths, exploring the inherent features of a time series over a longer time period to attain improved prediction performance [39, 40]. Since nonlinear features are better suited to nonlinear prediction, the Deep-LSTM is a strong candidate for predicting daily EP and is therefore employed for the prediction task in this paper.

The methodology section provides a detailed description of the study locations, data sets, architecture and implementation of the Deep-LSTM neural network, the MLANN and the empirical methods (Blaney–Criddle and Hargreaves) considered in this paper. The simulation study and the results obtained are elaborated in the subsequent sections. Finally, the contributions of the study are summarized in the concluding section.

2 Methodology

2.1 Study area

The present investigation is carried out for three representative stations, Raipur, Jagdalpur and Ambikapur, from three distinct agro-climatic zones (ACZs) of Chhattisgarh state in east-central India (Fig. 1). An ACZ is a land unit defined in terms of its major climate and growing period, climatically suitable for a certain range of crops and cultivars. The climate of Chhattisgarh is dry and sub-humid in general, with potential evaporation losses exceeding the average annual rainfall of the state, which is about 1400 mm. Raipur is located in the Chhattisgarh plains ACZ with an average annual rainfall of about 1200 mm, whereas Jagdalpur and Ambikapur are located in the Bastar plateau and Northern hills ACZs with average annual rainfalls of 1400 mm and 1600 mm, respectively. Long-term daily weather data on maximum temperature (Tmax), minimum temperature (Tmin), morning and afternoon relative humidity (RHI and RHII), wind speed (WS), bright sunshine hours (BSS) and pan evaporation (EP) are collected from the meteorological observatories located at the respective stations. All these observatories are well maintained and certified by the India Meteorological Department, Govt. of India. Details of the data sets are given in Table 1, and their descriptive statistics are presented in Table 2.

Fig. 1 Location map of the study area

Table 1 Data sets used for the study
Table 2 Descriptive statistics of daily climatic variables of Raipur, Jagdalpur and Ambikapur

2.2 Deep-LSTM architecture

Deep neural networks are multilayer networks that can extract and learn features deeply embedded in the data. Deep networks are broadly categorized into two classes: classical and modern deep networks. Recently, deep learning techniques have been successfully employed in natural language processing [41], sequence learning [42] and time series predictions such as financial market and wind forecasting [43, 44]. Recurrent deep networks differ from plain feedforward networks in terms of the independence of the connecting nodes. In a traditional feedforward network, a node receives input from the previous layer only and is independent of all other nodes. In recurrent deep networks, the nodes are massively interdependent and share weights, which captures the idea of long-term dependencies: the current node receives input not only from the immediately preceding node but also from many earlier nodes. Some nodes have self-connection loops as well, representing interconnected hidden states of the same node across time. This long-term dependency on the input requires the network to keep previous states in memory. Conventional recurrent networks face the vanishing gradient problem when storing information about long-term inputs: the gradient propagated between hidden states at different time steps decreases exponentially. Long short-term memory (LSTM) networks are a class of recurrent networks that handle the vanishing gradient problem efficiently and have been successfully applied in natural language processing. In this study, we construct a deep recurrent network comprising layers of LSTM units, referred to hereafter as the Deep-LSTM network. The synthesized network suitably combines the advantages of the hugely successful deep networks and LSTM recurrent networks. Deep-LSTM networks [45] manage the vanishing gradient problem by incorporating memory cells: internal contextual state cells that act as long-term or short-term memory. The output of the Deep-LSTM network depends on the state of these cells, which assists prediction because such a task needs the historical context of the inputs rather than only the last input. The working mechanism of the Deep-LSTM network rests entirely on the memory cell, whose subunits and their objectives are shown in Fig. 2 and described briefly below.

Fig. 2 A block diagram of an LSTM network

The input node gt receives the input xt from the input layer of the deep network and the previous hidden state ht−1 of the node itself. The data to be predicted are nonlinear in nature; hence, the model used to predict a nonlinear output should contain nonlinear elements. The tanh is a nonlinear function and helps improve prediction accuracy. Therefore, the weighted sum of xt and ht−1 is passed through a tanh function, as given in Eq. 1.

$$g_{t} = \tanh\left( x_{t} \cdot W_{gx} + h_{t-1} W_{gh} + \text{bias}_{\text{input node}} \right)$$
(1)

The input gate (it) is similar to the input node in that it receives the same inputs, but it uses a sigmoidal activation function. It is termed the input gate because, when its value is zero, it blocks the flow of input into the current node, and when its value is one, it allows the input to pass through. Its operation is represented by Eq. 2.

$$i_{t} = \sigma\left( x_{t} \cdot W_{ix} + h_{t-1} W_{ih} + \text{bias}_{\text{input gate}} \right)$$
(2)

The internal state st is a node with a self-loop recurrent edge of unit weight and a linear activation function, which is updated using Eq. 3.

$$s_{t} = i_{t} \odot g_{t} + s_{t-1}$$
(3)

The forget gate (ft) is a subunit that allows the memory cell to reset (forget) its internal state and is formulated as Eq. 4.

$$f_{t} = \sigma\left( x_{t} \cdot W_{fx} + h_{t-1} W_{fh} + \text{bias}_{\text{forget gate}} \right)$$
(4)

Finally, the output gate Ot performs the task given in Eq. 5.

$$O_{t} = \sigma\left( x_{t} \cdot W_{ox} + h_{t-1} W_{oh} + \text{bias}_{\text{output gate}} \right)$$
(5)

The final output of the memory cell is computed using Eq. 6.

$$h_{t} = \tanh\left( s_{t} \right) \odot O_{t}$$
(6)

where \(s_{t} = g_{t} \odot i_{t} + s_{t-1} \odot f_{t}\) when the forget gate is included in the state update.
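To make the cell mechanics concrete, the following minimal NumPy sketch implements one forward step of the memory cell described by Eqs. 1–6; the weight and bias names mirror the equations, and the function signature and dimensions are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, s_prev, W, b):
    """One forward step of the LSTM memory cell (Eqs. 1-6).
    W and b hold the weights/biases of the input node (g), input gate (i),
    forget gate (f) and output gate (o)."""
    g_t = np.tanh(x_t @ W["gx"] + h_prev @ W["gh"] + b["g"])  # Eq. 1: input node
    i_t = sigmoid(x_t @ W["ix"] + h_prev @ W["ih"] + b["i"])  # Eq. 2: input gate
    f_t = sigmoid(x_t @ W["fx"] + h_prev @ W["fh"] + b["f"])  # Eq. 4: forget gate
    o_t = sigmoid(x_t @ W["ox"] + h_prev @ W["oh"] + b["o"])  # Eq. 5: output gate
    s_t = g_t * i_t + s_prev * f_t    # internal state update with forget gate
    h_t = np.tanh(s_t) * o_t          # Eq. 6: cell output
    return h_t, s_t
```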

The network architecture of the Deep-LSTM used for the simulation study is shown in Table 3. The output layer has only one node and hence is not listed in this table.

Table 3 Architecture of LSTM model

2.3 Multilayer artificial neural network (MLANN)

The MLANN, a commonly used neural network structure suggested by Haykin [46], consists of an input layer, one intermediate hidden layer and an output layer. An N-5-1 MLANN structure, with N input nodes, five neurons in the hidden layer and one neuron in the output layer, is considered in this study. The weights of the different layers are trained by the conventional backpropagation algorithm, which makes two passes through the network: a forward pass and a backward pass. The forward pass produces an estimated output; the output error term, in a modified form, is then backpropagated from the output layer to the input layer to adjust the connecting biases and weights of the different layers. The specifications of the MLANN structure used in this study are given in Table 4.
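The authors implemented the MLANN in MATLAB; as a rough illustration only, an equivalent N-5-1 structure could be declared in Keras as below, assuming sigmoid activations in both layers (the actual specifications are those of Table 4).

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_mlann(n_inputs):
    """N-5-1 network: N inputs, 5 hidden neurons, 1 output neuron."""
    model = Sequential([
        Dense(5, activation="sigmoid", input_shape=(n_inputs,)),  # hidden layer
        Dense(1, activation="sigmoid"),                           # output layer
    ])
    # Backpropagation via plain gradient descent on the mean squared error
    model.compile(optimizer="sgd", loss="mse")
    return model
```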

Table 4 Architecture of MLANN models used

2.4 Hargreaves method

Hargreaves et al. [47] suggested computing the potential atmospheric evaporative demand, termed reference evapotranspiration (ET0), from maximum and minimum temperatures as

$$ET_{0} = 0.0023\, R_{a}\, T_{d}^{0.5} \left( T_{m} + 17.8 \right)$$
(7)

where Ra = water equivalent of extra-terrestrial radiation (mm day−1), Td = difference between maximum and minimum temperatures (°C), and Tm = mean temperature (°C).
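For reference, a direct Python transcription of Eq. 7 is given below; the function and argument names are illustrative.

```python
def hargreaves_et0(r_a, t_max, t_min):
    """Hargreaves reference evapotranspiration (mm/day), Eq. 7.
    r_a: extra-terrestrial radiation as water equivalent (mm/day)
    t_max, t_min: daily maximum/minimum air temperature (deg C)"""
    t_d = t_max - t_min            # temperature difference Td
    t_m = (t_max + t_min) / 2.0    # mean temperature Tm
    return 0.0023 * r_a * t_d ** 0.5 * (t_m + 17.8)
```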

2.5 Blaney–Criddle method

The Blaney–Criddle empirical equation reported in FAO irrigation and drainage paper no. 24 [48] is used to compute reference evapotranspiration (ET0) from the available data on temperature, humidity, wind speed and sunshine hours. The empirical equation is given as

$$ET_{0} = a + b\left[ p\left( 0.46\,T + 8.13 \right) \right]$$
(8)

where a = 0.0043 RHmin − (n/N) − 1.41 and b = a0 + a1RHmin + a2(n/N) + a3Ud + a4RHmin(n/N) + a5RHminUd; ET0 = reference evapotranspiration (mm day−1); T = (Tmax + Tmin)/2 = mean daily temperature (°C); p = mean daily percentage of total annual daytime hours; n/N = ratio of actual to possible sunshine hours; RHmin = minimum daily relative humidity in percentage, taken here as the afternoon relative humidity (RHII); Ud = daytime wind speed at 2 m height (m s−1); and a0 = 0.81917, a1 = −0.0040922, a2 = 1.0705, a3 = 0.065649, a4 = −0.0059684 and a5 = −0.0005967.
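A Python transcription of Eq. 8 with the calibration coefficients listed above is sketched below, assuming (as in this study) that RHII is supplied in place of RHmin.

```python
def blaney_criddle_et0(t_mean, p, rh_min, n_over_N, u_day):
    """FAO-24 Blaney-Criddle reference evapotranspiration (mm/day), Eq. 8.
    t_mean: mean daily temperature (deg C)
    p: mean daily percentage of total annual daytime hours
    rh_min: minimum daily relative humidity (%), here approximated by RHII
    n_over_N: ratio of actual to possible sunshine hours
    u_day: daytime wind speed at 2 m height (m/s)"""
    a = 0.0043 * rh_min - n_over_N - 1.41
    b = (0.81917 - 0.0040922 * rh_min + 1.0705 * n_over_N + 0.065649 * u_day
         - 0.0059684 * rh_min * n_over_N - 0.0005967 * rh_min * u_day)
    return a + b * p * (0.46 * t_mean + 8.13)
```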

3 Performance evaluation criteria

The performance of the proposed prediction model is evaluated by computing root-mean-square error (RMSE), coefficient of determination (R2) and model efficiency factor (EF) [49] between desired and estimated values of evaporation for the data sets considered. These are defined as

$$RMSE = \sqrt {\frac{1}{T}\sum\limits_{i = 1}^{T} {({\text{Out}}_{\text{est}} - {\text{Out}}_{\text{obs}} )^{2} } }$$
(9)
$$R^{2} = \frac{{\left( {\sum\nolimits_{i = 1}^{T} {\left( {{\text{Out}}_{\text{obs}} - \overline{{{\text{Out}}_{\text{obs}} }} } \right)\left( {{\text{Out}}_{\text{est}} - \overline{{{\text{Out}}_{\text{est}} }} } \right)} } \right)^{2} }}{{\sum\nolimits_{i = 1}^{T} {\left( {{\text{Out}}_{\text{obs}} - \overline{{{\text{Out}}_{\text{obs}} }} } \right)^{2} \sum\limits_{i = 1}^{T} {\left( {{\text{Out}}_{\text{est}} - \overline{{{\text{Out}}_{\text{est}} }} } \right)^{2} } } }}$$
(10)
$$EF = 1 - \frac{{\sum\nolimits_{i = 1}^{T} {\left( {{\text{Out}}_{\text{est}} - {\text{Out}}_{\text{obs}} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{T} {({\text{Out}}_{\text{obs}} - \overline{{{\text{Out}}_{\text{obs}} }} )^{2} } }} \quad \left( { - \infty \le EF \le 1} \right)$$
(11)

where Outobs and Outest represent the desired (observed) and estimated evaporation values, respectively, T is the total number of input patterns, and i indexes the individual input patterns. The RMSE value should be close to 0, and the R2 and EF values should be close to 1.
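The three criteria of Eqs. 9–11 translate directly into a short NumPy routine; the function name is illustrative.

```python
import numpy as np

def evaluate(out_obs, out_est):
    """RMSE, R^2 and model efficiency EF (Eqs. 9-11)."""
    out_obs = np.asarray(out_obs, dtype=float)
    out_est = np.asarray(out_est, dtype=float)
    rmse = np.sqrt(np.mean((out_est - out_obs) ** 2))
    r2 = np.corrcoef(out_obs, out_est)[0, 1] ** 2   # squared Pearson correlation
    ef = 1.0 - np.sum((out_est - out_obs) ** 2) / np.sum((out_obs - out_obs.mean()) ** 2)
    return rmse, r2, ef
```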

4 Simulation study and results

To estimate daily evaporation from the data, the Deep-LSTM and MLANN models are simulated in Python and MATLAB, respectively, with the different input combinations shown in Table 5. The number following a model name denotes its number of input parameters. The availability of consistent long-term weather data has always been a significant constraint in deciding the input combination; hence, the correlation coefficients between the daily climatic factors influencing the evaporation process and the EP values (Table 2) form the basis for selecting the input combinations. The main focus of this study is to make effective use of the available climatic data and to model the daily evaporation process with the minimum number of input parameters at the highest possible accuracy.

Table 5 Input data combination used in Deep-LSTM, MLANN and empirical models

A significantly large number of daily input patterns, i.e., 17–35 years of daily weather data, are used in the simulation study for each of the data sets shown in Table 1. For consistency, each data set is normalized between 0 and 1 using Eq. 12 before being presented to the model for training and testing, and the outputs are then renormalized to their original units for the final comparison between actual and estimated values.

$$X_{\text{norm}} = \left( X_{k} - X_{\min} \right)/\left( X_{\max} - X_{\min} \right)$$
(12)

where Xk = kth sample value of the input parameter, Xmin = minimum of the input parameter, and Xmax = maximum of the input parameter.

Training of the proposed models with the desired input combination uses 80% of the available data for model development, and the remaining 20% are used to test model performance. To train the MLANN model, each training pattern is fed to the network in turn, and after the forward pass the estimated output is obtained at the output node. The output corresponding to each input pattern is compared with the desired output to produce an error term, and the change in weight for each path is calculated using the backpropagation learning algorithm and stored. Once all training patterns have been applied, the average change in weight for each path is computed and added to the weights; this constitutes one iteration, and the process is continued for 5000 iterations. The value of the convergence coefficient (µ) is fixed at 0.01 as it provides better training. This completes one experiment, and the same experiment is repeated ten independent times. For steady-state estimation of the weights, the root-mean-square error (RMSE) over all patterns is computed in each iteration, and training is stopped when the RMSE reaches its best attainable minimum. After training, the weights and biases of each layer are fixed at the values of the final iteration. To validate the prediction performance, the test patterns are fed sequentially; for each test pattern, the estimated output is compared with the desired value using the performance measures RMSE, R2 and EF for each model and data set. The model parameters are thus optimized during training to drive the RMSE toward zero and the R2 and EF toward one.
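A minimal sketch of the preprocessing described above (Eq. 12 normalization followed by the 80/20 split) is shown below; it assumes a chronological split and hypothetical arrays X and y holding the climatic inputs and observed EP.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# X: (n_patterns, n_features) climatic inputs; y: (n_patterns,) observed EP
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()   # Eq. 12, applied per column
X_norm = x_scaler.fit_transform(X)
y_norm = y_scaler.fit_transform(y.reshape(-1, 1))

split = int(0.8 * len(X_norm))                        # 80% training, 20% testing
X_train, X_test = X_norm[:split], X_norm[split:]
y_train, y_test = y_norm[:split], y_norm[split:]

# After prediction, renormalize back to mm/day for comparison, e.g.:
# ep_pred = y_scaler.inverse_transform(model.predict(X_test))
```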

Similarly, the Deep-LSTM is trained on the three data sets used for the study with the same basic configuration as given in Table 3. The network has one LSTM layer, two dense layers and one output layer. The hyperbolic tangent function (tanh) is used for activation and the hard sigmoid function for recurrent activation, both of which are the defaults in the LSTM layer. For the output layer, the sigmoid activation function is used. The default 'glorot_uniform' kernel initializer and 'zeros' bias initializer are used for the LSTM layer. For better convergence, the Adam optimizer [50] with the parameter settings β1 = 0.9, β2 = 0.999 and learning rate = 0.01 has been used. L1 or L2 regularization is not used, as previous studies have found that model performance does not improve with regularization for sequence learning problems [51, 52]. Dropout degraded the performance of the model and is therefore not used in any of the layers. All the architectures implemented for this study are realized using the open-source software library TensorFlow [53], the Keras high-level neural networks API [54] and scikit-learn [55] on a Dell PowerEdge T130 server set to CPU execution. The stopping criterion during the training phase is the attainment of a minimum and consistent root-mean-square error: training is stopped when the RMSE reaches its lowest value and then remains almost constant.
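Under these settings, the Deep-LSTM can be declared in Keras roughly as below. The unit counts are placeholders (the exact layer sizes are those of Table 3), and the dense-layer activations are assumptions, as they are not specified in the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

def build_deep_lstm(timesteps, n_features, lstm_units=32, dense_units=16):
    """One LSTM layer, two dense layers and a single-node output, as in the text."""
    model = Sequential([
        LSTM(lstm_units,
             activation="tanh",                    # stated activation
             recurrent_activation="hard_sigmoid",  # stated recurrent activation
             kernel_initializer="glorot_uniform",
             bias_initializer="zeros",
             input_shape=(timesteps, n_features)),
        Dense(dense_units, activation="relu"),     # assumed dense activation
        Dense(dense_units, activation="relu"),
        Dense(1, activation="sigmoid"),            # normalized EP in [0, 1]
    ])
    model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999),
                  loss="mse")
    return model
```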

Test performance based on the evaluation criteria for the proposed models is shown in Table 6. A marked improvement in RMSE, R2 and EF is observed with the proposed Deep-LSTM models over the conventional MLANN and empirical models for each of the data sets. At Raipur, the RMSE improved from 1.21 to 0.98 with the Deep-LSTM models, against 1.40 to 1.15 with the MLANN models, as the number of input features increased; the magnitude of improvement diminishes with additional input features. At the other two locations, Jagdalpur and Ambikapur, the RMSE values obtained with the Deep-LSTM models are only slightly better than those of the MLANN models for all four input feature combinations: at Jagdalpur the RMSE improved from 1.09 to 0.97 with the Deep-LSTM models compared with 1.08 to 1.03 with the MLANN models, and a similar trend is observed at Ambikapur, where it improved from 1.07 to 0.93 for the Deep-LSTM models compared with 1.12 to 0.96 for the MLANN models.

Table 6 Comparison of test performance of neural network (Deep-LSTM and MLANN) and empirical models (Blaney–Criddle and Hargreaves) for daily EP estimation at Raipur, Jagdalpur and Ambikapur

With regard to R2 and EF, both improved marginally with the Deep-LSTM models over the MLANN models; nevertheless, the high values of R2 and EF at Raipur, ranging from 0.865 to 0.915 and 0.826 to 0.915, respectively, with an increasing number of input combinations for the neural network models (Deep-LSTM and MLANN), are noticeable and encouraging. A similar trend is observed at Jagdalpur, with comparatively lower magnitudes of R2 and EF ranging from 0.727 to 0.769 and 0.708 to 0.768, respectively, with an increasing number of input features. At Ambikapur, the magnitudes of R2 and EF are lower still, although they improved from 0.670 to 0.716 and 0.481 to 0.638, respectively, for the different neural network models as the number of input features increased. The performance of both Deep-LSTM and MLANN models is superior to the empirical methods in terms of RMSE, R2 and EF at all three locations. The Deep-LSTM model ranks top among the models, performing best on all three criteria (RMSE, R2 and EF) in most cases and on at least two criteria in the remaining cases under the different input-combination scenarios. The difference in RMSE magnitudes among Raipur, Jagdalpur and Ambikapur arises mainly from the differences in the agro-climatic situations of the respective ACZs. The lower R2 and EF at Jagdalpur and Ambikapur compared to Raipur may be associated with variation in the correlation coefficients between the climatic factors and EP at the respective stations; the smaller number of input patterns available for model development may also contribute to the poorer predictive performance of the proposed models at these two stations.

Comparisons between observed and predicted daily EP for the Deep-LSTM-6, MLANN-6, Blaney–Criddle and Hargreaves models at Raipur, Jagdalpur and Ambikapur during the testing phase are shown in Figs. 3a–d, 4a–d and 5a–d, respectively. The relationships between observed and predicted values of daily EP for the Deep-LSTM-6, MLANN-6 and Blaney–Criddle models at the three stations are shown in Figs. 3e–h, 4e–h and 5e–h, respectively. Although it is difficult to differentiate the performance of Deep-LSTM-6 and MLANN-6 visually on a daily scale, the daily estimates obtained through both models are in closer agreement with the observed values than those of the empirical models at all stations. The empirical models either underestimate (at Raipur) or overestimate (at Jagdalpur and Ambikapur) daily EP and are unable to predict the peak evaporation rates of the summer season. Further, intercept values close to zero indicate that the proposed Deep-LSTM-6 estimates exhibit a closer relationship with observed evaporation than the other models in most cases.

Fig. 3 a–h Comparison of observed and estimated daily EP and their relationship for Deep-LSTM-6, MLANN-6 and empirical models (Blaney–Criddle and Hargreaves) at Raipur

Fig. 4 a–h Comparison of observed and estimated daily EP values and their relationship for Deep-LSTM-6, MLANN-6 and empirical models (Blaney–Criddle and Hargreaves) at Jagdalpur

Fig. 5 a–h Comparison of observed and estimated daily EP values and their relationship for Deep-LSTM-6, MLANN-6 and empirical models (Blaney–Criddle and Hargreaves) at Ambikapur

5 Statistical studies for model selection

To select the appropriate regression model among those under investigation, two statistical analyses, namely the paired t test and the Akaike information criterion (AIC), have been conducted, and the results obtained are discussed below.

5.1 Paired t test

To further examine the performance of the models under consideration, a paired t test is conducted on the null hypothesis that the pairwise differences between the squared errors obtained for two models have a mean equal to zero, i.e., that no significant difference exists between the estimated outputs of the compared models. The alternative hypothesis states that the difference between the estimated outputs of the compared models is statistically significant. The t test returns 'h' and 'p' values as its result: h = 0 with a corresponding p value greater than 0.05 fails to reject the null hypothesis, indicating that no statistically significant difference exists between the mean squared errors obtained with the Deep-LSTM and the compared model, whereas h = 1 with p < 0.05 rejects the null hypothesis at the 5% significance level, indicating a significant difference between the estimated outputs of the compared models. Comparative paired t test statistics (p and h values) for the Deep-LSTM against the equivalent (in terms of the number of input features) MLANN and empirical models are shown in Table 7. In most cases, an h value of 1 and a p value less than 0.05 confirm that the Deep-LSTM estimates differ significantly from those of the equivalent model. However, h = 0 and p > 0.05 at Jagdalpur indicate that no significant difference can be established between the Deep-LSTM-2 and MLANN-2 predictions; similarly, h = 0 and p > 0.05 at Jagdalpur and Ambikapur indicate hardly any difference between the performance of Deep-LSTM-4 and MLANN-4 at these locations.
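In Python, the same test can be run with SciPy on the per-pattern squared errors of the two compared models; the arrays named here are hypothetical.

```python
from scipy import stats

# se_lstm, se_other: per-pattern squared errors of the two compared models
t_stat, p_value = stats.ttest_rel(se_lstm, se_other)
h = int(p_value < 0.05)   # h = 1 rejects equal mean squared error at the 5% level
```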

Table 7 Paired t test statistics for Deep-LSTM with corresponding MLANN and empirical models for different data sets

5.2 Akaike information criterion (AIC)

The AIC is widely used for model selection in regression problems [56, 57]. The AIC values are computed from the mean squared error (MSE) between observed and estimated evaporation for each model using Eq. 13.

$$AIC = N \log\left( MSE \right) + 2k$$
(13)

where N = number of observations and k = number of parameters.
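Equation 13 translates directly into Python as follows, assuming the natural logarithm; the function name is illustrative.

```python
import numpy as np

def aic(out_obs, out_est, k):
    """Akaike information criterion from the MSE (Eq. 13); k = number of parameters."""
    n = len(out_obs)
    mse = np.mean((np.asarray(out_est) - np.asarray(out_obs)) ** 2)
    return n * np.log(mse) + 2 * k
```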

The AIC values for the Deep-LSTM and the corresponding MLANN and empirical models are shown in Table 8; lower AIC values indicate better models. In all cases, the AIC values obtained with the Deep-LSTM models are lower than those of the other models. The difference in magnitude among the data sets, i.e., Raipur, Jagdalpur and Ambikapur, is due to the difference in the number of observations used for testing the models.

Table 8 AIC for Deep-LSTM with corresponding MLANN and empirical models for different data sets

6 Conclusion

This study assesses the potential of the Deep-LSTM structure for estimating daily EP losses under different agro-climatic situations using the climatic data that influence the evaporation process. The investigation has led to the following conclusions:

  • Both Deep-LSTM and MLANN models are capable of estimating the daily EP with different input combinations.

  • Deep-LSTM models performed better than the MLANN and empirical models in all scenarios.

  • Statistical inferences based on paired t test and AIC also suggest that the Deep-LSTM models are superior to the MLANN and empirical models for different input combinations.

  • Depending on the availability of climatic data, an appropriate Deep-LSTM model can be adopted for estimating daily EP at stations where evaporation is not measured directly. In future, other deep learning-based neural network structures may be applied to predict nonlinear processes such as evaporation and reference evapotranspiration.