Introduction

Reference evapotranspiration (ETo) is a vital component of the hydrological cycle and plays an important role in the sustainable management of water resources because it accounts for more than two-thirds of rainfall losses (Shiri et al. 2013; Citakoglu et al. 2014; Shiri et al. 2014; Kisi et al. 2015). Irrigation managers and water researchers therefore need an accurate tool for estimating it. ETo can be measured directly with lysimeters or eddy-covariance systems. However, accurate estimation of ETo is difficult because it is affected by several factors, and lysimeters are expensive to build and maintain, especially in developing countries such as Algeria. Mathematical models (empirical and semi-empirical models based on meteorological variables) are therefore a more practical and far less costly approach (Ferreira et al. 2019; Shiri et al. 2019a). The Food and Agriculture Organization recommends the Penman-Monteith (FAO-56 PM) model for estimating ETo, defined for a reference grass surface with an assumed height of 0.12 m, an albedo of 0.23, and a surface resistance of 70 s m−1 (Allen et al. 1998). The main limitation of the FAO-56 PM model is its data requirement: it needs a large number of climatic variables (relative humidity, maximum and minimum air temperatures, solar radiation, and wind speed) as input. Unfortunately, in most developing countries these variables are often unavailable or incomplete, which impedes the computation of ETo with the FAO-56 PM model (Djaman et al. 2017; Banda et al. 2018; Ferreira et al. 2019).

In the last two decades, artificial intelligence (AI) models, both standalone and hybrid, have been applied in many fields of science and engineering to problems such as modeling, optimization, and prediction, owing to their ability to capture nonlinear relationships between variables (Nourani et al. 2014; Yaseen et al. 2015). AI models have been applied extensively to ETo prediction in the last decade (Shiri et al. 2013, 2014; Citakoglu et al. 2014; Kisi et al. 2015; Kisi 2016; Feng et al. 2017; Shiri 2017; Landeras et al. 2018; Tao et al. 2018; Wu et al. 2019; Mohammadrezapour et al. 2019; Malik et al. 2019a; Chia et al. 2020). AI models allow inputs to be identified and selected (added or removed) and an appropriate relationship to be found between the inputs and the target variable. This flexibility lets researchers formulate a model that uses the smallest possible amount of data while maintaining high performance. Kisi (2007) investigated artificial neural networks (ANNs) trained with the Levenberg–Marquardt algorithm for estimating ETo and concluded that they could be successfully used for modeling ETo from the available climatic data. Shiri et al. (2014) compared heuristic data-driven models (adaptive neuro-fuzzy inference system, ANNs, gene expression programming, and support vector machine) against empirical models (Hargreaves–Samani, Makkink, Priestley–Taylor, and Turc) for estimating daily ETo in Iran. The comparison showed that the heuristic data-driven models outperformed the empirical models. Shiri (2018) compared wavelet-random forest (WRF) models against mass transfer-based models for estimating daily ETo in southern Iran. The results showed that the WRF model outperformed the mass transfer-based models. Saggi and Jain (2019) examined the performance of the generalized linear model (GLM), gradient boosting machine (GBM), random forest (RF), and deep learning (DL) for ETo estimation in India. They stated that the best daily ETo estimates were obtained with the DL models.

Recently, AI models have been applied widely to ETo estimation (Shiri et al. 2019b; Tikhamarine et al. 2019a). Ferreira et al. (2019) examined ANNs and the support vector machine (SVM) for estimating daily ETo in Brazil. The comparison showed the superiority of the ANNs over the SVM model. Granata (2019) applied support vector regression, RF, and regression tree to model daily ETo in Florida and reported that the regression tree outperformed the other models in estimating ETo using six input variables, namely mean temperature, relative humidity, net solar radiation, wind speed, soil moisture content, and sensible heat flux. Keshtegar et al. (2019) examined multi-layer perceptron neural network, M5 tree, and polynomial chaos expansion models for estimating daily ETo in Turkey. According to their results, the polynomial chaos expansion model performed best in the study area. Nourani et al. (2019) applied AI models (support vector regression, adaptive neuro-fuzzy inference system, and feed-forward neural network) and empirical models (Hargreaves-Samani, modified Hargreaves-Samani, Makkink, Ritchie, multi-linear regression) to estimate daily ETo in Turkey, Cyprus, Iraq, Iran, and Libya. They found that the best estimates were given by the AI models. Shiri (2019) compared the performance of gene expression programming (GEP) with that of Hargreaves-Samani, Makkink, Trabert, Priestley-Taylor, Dalton, and Turc for daily ETo modeling in Iran. The comparison indicated that GEP was more accurate than the alternatives. Shiri et al. (2019a) used a data splitting strategy with gene expression programming to model daily ETo in northwestern Iran and reported that GEP gave good results under different scenarios.

For the Algerian sites, ETo estimation suffers from a lack of climatic data, in both quality and quantity and at both the spatial and temporal scales. It is therefore of interest to identify the methods that demand the least data while still giving good estimates. In this context, Heddam et al. (2018) proposed three methods (the on-line and off-line dynamic evolving neural fuzzy inference systems (DENFIS_ON/DENFIS_OF) and the evolving fuzzy neural network) for estimating daily ETo in the north of Algeria. The proposed models were compared with the FAO-56 PM method. The results showed that the proposed models reproduced the Penman-Monteith estimates with high accuracy and that DENFIS_OF outperformed the other models. Zakhrouf et al. (2019) examined the ability of the subtractive clustering adaptive neuro-fuzzy inference system (SC_ANFIS) and the ANFIS based on fuzzy C-means clustering (FC_ANFIS) to estimate ETo at Dar El Beida Station in Algiers, Algeria. The results revealed that the SC_ANFIS model significantly outperformed the multiple linear regression and FC_ANFIS models.

The main objective of this research is to find robust alternative methods that can estimate ETo accurately even when the available data are very limited, for example when only the easily measured minimum and maximum temperatures are available. The generalized Penman-Monteith model (FAO-56 PM) is the most reliable method for estimating ETo in agricultural and water resource research, as it fits well with evapotranspiration observations (Penman 1948). The FAO-56 PM model is recognized as the standard method for estimating ETo, although its application requires a large amount of climate data, which may not be available in certain locations, as is the case in developing countries.

In this context, the current study focuses on eliminating the inputs one by one across four proposed scenarios, to overcome the difficulty of obtaining all climate data while keeping the performance of the developed models as high as possible. The proposed models (support vector regression coupled with a recent heuristic algorithm) were examined for the four scenarios using different meteorological data and compared with the artificial neural network (ANN) and empirical methods for estimating ETo. The accuracy of support vector regression depends mostly on the correct identification of its parameters, which can be treated as an optimization problem. Therefore, to select the most suitable values of the internal parameters, an advanced nature-inspired optimization algorithm, the grey wolf optimizer (GWO), was integrated with the SVR model. All AI models have several internal parameters, so the selected optimization algorithm should be able to handle the scale and dimension of the optimization problem while avoiding the many local optima. The main objective of using new algorithms is to obtain the best solution (best parameters), one that achieves robust accuracy in the shortest possible time. For comparison, the genetic algorithm (GA) and particle swarm optimizer (PSO) were also integrated with the SVR model and compared with the proposed SVR-GWO model in all scenarios through visual inspection and statistical performance criteria.

Materials and methods

Study area and dataset

In the present study, three climatic stations located in the north of Algeria were chosen to test the developed models: Algiers station (latitude 36° 40’ 59” N, longitude 3° 13’ 1.2” E), located in the center of northern Algeria at an altitude of 25 m above sea level; Tlemcen (35° 1’ 1.2” N, 1° 15’ 9.7” E), located in the west of northern Algeria at 247 m above sea level; and Annaba (36° 49’ 58.8” N, 7° 49’ 1.2” E), located in the east of northern Algeria at 4 m above sea level. Figure 1 shows the location map of the study stations. The observed monthly data of relative humidity (RH), solar radiation (Rs), wind speed (Us), and maximum and minimum air temperatures (Tmax and Tmin) were obtained from the National Meteorological Office (ONM) in Algiers, Algeria. The climatic data cover 14 years (168 months), from January 2000 to December 2013, for all study stations. For each station, the data were split into two subsets: January 2000 to October 2009 for training (70%) and November 2009 to December 2013 for testing (30%). The descriptive statistics of the meteorological variables are presented in Table 1 for the Algiers, Tlemcen, and Annaba stations.
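The chronological split described above can be checked with a few lines of Python. This is only a sketch of the bookkeeping: the helper name `month_range` and the tuple representation of months are illustrative choices, not part of the original workflow.

```python
def month_range(start, end):
    """Return every (year, month) pair from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


all_months = month_range((2000, 1), (2013, 12))
split_point = (2009, 10)  # last training month stated in the text

# Chronological split: no shuffling, so the test period follows the training period.
train_months = [ym for ym in all_months if ym <= split_point]
test_months = [ym for ym in all_months if ym > split_point]

train_share = len(train_months) / len(all_months)
print(len(all_months), len(train_months), len(test_months), round(train_share, 3))
# 168 118 50 0.702
```

The counts (118 training months and 50 testing months) correspond to the approximately 70%/30% partition quoted in the text.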

Fig. 1 Location map of the studied stations

Table 1 Statistical parameters of the meteorological variables at study stations

Four scenarios were compared according to the following combinations of the climatic variables: (M1) Tmax, Tmin, RH, Us, and Rs; (M2) Tmax, Tmin, RH, and Rs; (M3) Tmax, Tmin, and Rs; and (M4) Tmax and Tmin. Consequently, SVR-GWO-1, SVR-PSO-1, SVR-GA-1, ANN-1, and Valiantzas-1 correspond to the first scenario (M1); SVR-GWO-2, SVR-PSO-2, SVR-GA-2, ANN-2, Turc, and Valiantzas-2 correspond to the second scenario (M2); SVR-GWO-3, SVR-PSO-3, SVR-GA-3, ANN-3, Ritchie, and Valiantzas-3 correspond to the third scenario (M3); and the last scenario (M4) involves SVR-GWO-4, SVR-PSO-4, SVR-GA-4, ANN-4, and Thornthwaite. Table 2 summarizes the input variables for the hybrid AI and traditional climate-based models.
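As an illustration of how the four input combinations could be organized in code, the sketch below stores each scenario as a tuple of variable names and extracts the matching inputs from one monthly record. The dictionary name `SCENARIOS`, the helper `select_inputs`, and the numerical values in `record` are hypothetical and serve only to show the structure.

```python
# Input variables of the four scenarios, in the order listed in the text.
SCENARIOS = {
    "M1": ("Tmax", "Tmin", "RH", "Us", "Rs"),
    "M2": ("Tmax", "Tmin", "RH", "Rs"),
    "M3": ("Tmax", "Tmin", "Rs"),
    "M4": ("Tmax", "Tmin"),
}


def select_inputs(record, scenario):
    """Return the input vector of one monthly record for the given scenario."""
    return [record[name] for name in SCENARIOS[scenario]]


# Hypothetical monthly record (values are placeholders, not station data).
record = {"Tmax": 24.1, "Tmin": 13.5, "RH": 68.0, "Us": 2.9, "Rs": 17.2}

for name in SCENARIOS:
    print(name, select_inputs(record, name))
```

Each scenario drops one variable from the previous one (Us, then RH, then Rs), so M4 retains only the two temperatures.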

Table 2 Input combinations for hybrid AI and empirical models

Empirical methods

Penman-Monteith equation (FAO-56 PM)

Because no measured reference evapotranspiration is available around the study stations, the FAO-56 PM model was used as the target for the empirical and artificial intelligence models. The Penman-Monteith method can be considered the most widely used equation in evapotranspiration studies, and it is accepted by the Food and Agriculture Organization of the United Nations (FAO) as the sole standard method for computing reference evapotranspiration (Allen et al. 1998). Equation (1) expresses the FAO-56 PM:

$$ {\mathrm{ET}}_o=\frac{0.408\ \Delta \ \left({R}_n-G\right)+\gamma \frac{900}{T+273}\mathrm{U}\left({e}_s-{e}_a\right)}{\Delta +\gamma \left(1+0.34\ \mathrm{U}\right)} $$
(1)

where ETo is the monthly reference evapotranspiration (mm/month), Rn is the net radiation (MJ/m2/month), Δ is the slope of the saturation vapor pressure curve (kPa °C−1), G is the soil heat flux (MJ/m2/month), es and ea are the saturation and actual vapor pressures (kPa), γ is the psychrometric constant (kPa °C−1), U is the monthly wind speed at 2 m height (m/s), and T is the mean air temperature (°C).
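Equation (1) translates directly into a short function. The sketch below is a minimal illustration rather than the implementation used in the study. The two helper functions use the standard FAO-56 expressions for the saturation vapor pressure and its slope (Allen et al. 1998), which are not written out in the text and are included here as an assumption about how Δ and es might be obtained. The numerical inputs are illustrative values only.

```python
import math


def saturation_vapor_pressure(t):
    """Saturation vapor pressure (kPa) at air temperature t (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))


def slope_vapor_pressure_curve(t):
    """Slope of the saturation vapor pressure curve (kPa per deg C), FAO-56 form."""
    return 4098.0 * saturation_vapor_pressure(t) / (t + 237.3) ** 2


def fao56_pm(delta, rn, g, gamma, t, u, es, ea):
    """Reference evapotranspiration from Eq. (1)."""
    numerator = 0.408 * delta * (rn - g) + gamma * 900.0 / (t + 273.0) * u * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u)
    return numerator / denominator


# Illustrative inputs (not taken from the study stations).
eto = fao56_pm(delta=0.122, rn=13.28, g=0.0, gamma=0.0666,
               t=16.9, u=2.078, es=1.997, ea=1.409)
print(round(eto, 2))
```

The result carries whatever time unit the radiation terms are expressed in, so the inputs must be supplied consistently with the units listed after Eq. (1).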

Valiantzas method

In 2013, Valiantzas proposed empirical equations for modeling ETo (Valiantzas 2013a, b). Valiantzas equations are empirical methods based on the simplification of the Penman FAO-56 equation. The three different versions of Valiantzas methods developed with and without a full set of climatic data can be expressed as:

Valiantzas method using all climatic data (Valiantzas-1)

$$ \mathrm{E}{\mathrm{T}}_0=0.0393{R}_s\sqrt{T+9.5}-0.19{R_s}^{0.6}{\phi}^{0.15}+0.048\left(T+20\right)\left(1-\frac{RH}{100}\right){U}^{0.7} $$
(2)

Valiantzas method without wind speed (Valiantzas-2)

$$ \mathrm{E}{\mathrm{T}}_0=0.0393{R}_s\sqrt{T+9.5}-0.19{R_s}^{0.6}{\phi}^{0.15}+0.078\left(T+20\right)\left(1-\frac{RH}{100}\right) $$
(3)

Valiantzas method without relative humidity and wind speed (Valiantzas-3)

$$ \mathrm{E}{\mathrm{T}}_0=0.0393{R}_s\sqrt{T+9.5}-0.19{R_s}^{0.6}{\phi}^{0.15}+0.0061\left(T+20\right){\left(1.12\ T-{T}_{\mathrm{min}}-2\right)}^{0.7} $$
(4)

where, Tmin: the minimum air temperature (°C), Rs: the solar radiation (MJ/m2/month), ϕ: latitude of station (rad), and RH: the mean relative humidity (%).
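Equations (2) to (4) share the same radiation term and differ only in the last, aerodynamic term. The following sketch writes them as three small functions; the function and argument names are illustrative, and the sample inputs are hypothetical.

```python
import math


def _radiation_term(rs, t, phi):
    """Part common to Eqs. (2)-(4): depends on Rs, mean temperature and latitude."""
    return 0.0393 * rs * math.sqrt(t + 9.5) - 0.19 * rs ** 0.6 * phi ** 0.15


def valiantzas_1(rs, t, rh, u, phi):
    """Eq. (2): all climatic data available."""
    return _radiation_term(rs, t, phi) + 0.048 * (t + 20.0) * (1.0 - rh / 100.0) * u ** 0.7


def valiantzas_2(rs, t, rh, phi):
    """Eq. (3): wind speed not available."""
    return _radiation_term(rs, t, phi) + 0.078 * (t + 20.0) * (1.0 - rh / 100.0)


def valiantzas_3(rs, t, tmin, phi):
    """Eq. (4): neither relative humidity nor wind speed available.

    The base (1.12*T - Tmin - 2) must be positive for the 0.7 power to be real.
    """
    return _radiation_term(rs, t, phi) + 0.0061 * (t + 20.0) * (1.12 * t - tmin - 2.0) ** 0.7


# Hypothetical inputs: Rs = 20, T = 20.5 deg C, Tmin = 15 deg C, RH = 60 %,
# U = 2 m/s, latitude = 0.64 rad.
print(round(valiantzas_1(20.0, 20.5, 60.0, 2.0, 0.64), 3))
print(round(valiantzas_2(20.0, 20.5, 60.0, 0.64), 3))
print(round(valiantzas_3(20.0, 20.5, 15.0, 0.64), 3))
```

When RH reaches 100%, the aerodynamic term of Eqs. (2) and (3) vanishes and both reduce to the radiation term alone.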

Turc method

The Turc method (Turc 1961) is a simplified version of the Makkink method (Makkink 1957), and demands mean air temperature, solar radiation, and relative humidity to calculate the reference evapotranspiration. The implementation of the Turc method is expressed as follows:

$$ {\mathrm{ET}}_o={a}_T\times 0.0133\times \frac{T}{T+15}\times \frac{23.89{R}_s+50}{\uplambda} $$
(5)
$$ {a}_T=1\kern1em \mathrm{if}\ \mathrm{RH}\ge 50\% $$
(6)
$$ {a}_T=1+\frac{50-\mathrm{RH}}{70}\kern1em \mathrm{if}\ \mathrm{RH}<50\% $$
(7)

where λ is the latent heat of the evaporation (MJ/kg).
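A minimal sketch of the Turc calculation follows. It uses the humidity correction in its commonly cited form, in which the factor a_T equals 1 for RH of 50% or more and grows for drier air. The default latent heat of 2.45 MJ/kg is an assumed typical value, since the text does not state which value was used.

```python
def turc(rs, t, rh, latent_heat=2.45):
    """Turc reference evapotranspiration, Eq. (5).

    rs: solar radiation, t: mean air temperature (deg C), rh: relative humidity (%).
    latent_heat (MJ/kg) defaults to 2.45, an assumed typical value.
    """
    if rh >= 50.0:
        a_t = 1.0
    else:
        a_t = 1.0 + (50.0 - rh) / 70.0  # correction for dry air
    return a_t * 0.0133 * t / (t + 15.0) * (23.89 * rs + 50.0) / latent_heat


# Hypothetical inputs: the drier month gets a larger estimate.
print(round(turc(20.0, 15.0, 60.0), 3))
print(round(turc(20.0, 15.0, 15.0), 3))
```

With RH = 15%, the correction factor is 1.5, so the second estimate is 1.5 times the first.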

Ritchie method

The Ritchie method, suggested by Jones and Ritchie (1990), calculates the potential atmospheric evaporative demand, termed reference evapotranspiration (ETo), as follows:

$$ \mathrm{E}{\mathrm{T}}_0={\upalpha}_1\left[3.87\times {10}^{-3}\ {R}_s\left(0.6{T}_{\mathrm{max}}+0.4{T}_{\mathrm{min}}+29\right)\right] $$
(8)
$$ {\upalpha}_1=1.1\kern1em \mathrm{if}\ 5{}^{\circ}\mathrm{C}<{T}_{\mathrm{max}}\le 35{}^{\circ}\mathrm{C} $$
(9)
$$ {\upalpha}_1=1.1+0.05\left({T}_{\mathrm{max}}-35\right)\kern1em \mathrm{if}\ {T}_{\mathrm{max}}>35{}^{\circ}\mathrm{C} $$
(10)
$$ {\upalpha}_1=0.01\exp \left[0.18\left({T}_{\mathrm{max}}+20\right)\right]\kern1em \mathrm{if}\ {T}_{\mathrm{max}}<5{}^{\circ}\mathrm{C} $$
(11)
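The piecewise coefficient of Eqs. (9) to (11) is easy to get wrong by hand, so a short sketch may help. The function names are illustrative. Because the equations leave the case Tmax = 5 °C unassigned, the sketch places it in the cold branch, which is an assumption.

```python
import math


def ritchie_alpha(tmax):
    """Coefficient alpha_1 from Eqs. (9)-(11)."""
    if tmax > 35.0:
        return 1.1 + 0.05 * (tmax - 35.0)
    if tmax > 5.0:
        return 1.1
    # Tmax <= 5 deg C (the boundary value itself is assigned here by assumption)
    return 0.01 * math.exp(0.18 * (tmax + 20.0))


def ritchie(rs, tmax, tmin):
    """Ritchie reference evapotranspiration, Eq. (8)."""
    return ritchie_alpha(tmax) * 3.87e-3 * rs * (0.6 * tmax + 0.4 * tmin + 29.0)


# Hypothetical inputs: Rs = 20, Tmax = 30 deg C, Tmin = 20 deg C.
print(round(ritchie(20.0, 30.0, 20.0), 4))  # 4.6827
```

For these inputs the bracket in Eq. (8) equals 3.87e-3 × 20 × 55 = 4.257, and multiplying by 1.1 gives 4.6827.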

Thornthwaite method

Thornthwaite (1948) developed a formula for estimating monthly ETo based only on the mean air temperature, given by:

$$ {\mathrm{ET}}_{\mathrm{tw}}=\left[1.6{\left(\frac{10{T}_a}{I}\right)}^a\right] $$
(12)
$$ I=\sum \limits_{n=1}^{12}{\left(0.20{T}_a\right)}^{1.514} $$
(13)
$$ a=0.49239+6.75\ast {10}^{-7}{I}^3-7.71\ast {10}^{-5}{I}^2+1.7912\ast {10}^{-2}I $$
(14)

where Ta is the monthly mean temperature (°C) and I is the annual heat index. In most cases, temperature-based methods overestimate or underestimate the ETo obtained with the FAO-56 PM method. Allen et al. (1994) recommended calibrating empirical models against the FAO-56 PM. The calibrated ETo is computed as follows:

$$ {{\mathrm{ET}}_{\mathrm{o}}}^{\mathrm{Target}}=\upalpha \ast {\mathrm{ET}}_{\mathrm{tw}} $$
(15)

where α is a calibration factor, EToTarget denotes the FAO-56 PM reference evapotranspiration and ETtw is the ETo estimated by the Thornthwaite formula (Eq. 12).
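Equations (12) to (15) can be sketched as below. Two points are assumptions rather than statements from the text: months with a mean temperature at or below 0 °C contribute nothing to the heat index and give zero ETo (a common convention that also avoids a fractional power of a negative number), and the calibration factor α is fitted by least squares through the origin, since the fitting procedure is not described.

```python
def heat_index(monthly_temps):
    """Annual heat index I, Eq. (13), from the 12 monthly mean temperatures."""
    return sum((0.20 * max(t, 0.0)) ** 1.514 for t in monthly_temps)


def thornthwaite_exponent(i):
    """Exponent a, Eq. (14)."""
    return 0.49239 + 6.75e-7 * i ** 3 - 7.71e-5 * i ** 2 + 1.7912e-2 * i


def thornthwaite(ta, i):
    """Uncalibrated Thornthwaite estimate, Eq. (12)."""
    if ta <= 0.0:
        return 0.0
    return 1.6 * (10.0 * ta / i) ** thornthwaite_exponent(i)


def calibration_factor(target, estimated):
    """Least-squares slope through the origin for Eq. (15) (assumed fitting rule)."""
    return sum(t * e for t, e in zip(target, estimated)) / sum(e * e for e in estimated)


# Hypothetical year with a constant monthly mean of 5 deg C, so that I = 12.
temps = [5.0] * 12
i_year = heat_index(temps)
et_tw = [thornthwaite(t, i_year) for t in temps]
print(i_year, round(et_tw[0], 3))
```

Multiplying each Thornthwaite estimate by the fitted α then gives the calibrated series of Eq. (15).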

Machine learning approaches

Artificial neural network (ANN)

The artificial neural network was introduced by McCulloch and Pitts (1943). The ANN is a numerical model that imitates the ability of the human brain to learn and can accurately capture complex relationships (Haykin 1998). The multi-layer perceptron (MLP) trained with the Levenberg-Marquardt (LM) algorithm was selected to estimate ETo in the present study. The MLP is a kind of ANN with three layers (input, hidden, and output) connected to each other by weights and biases. The MLP was selected because of its common use in the literature (Kisi 2007; Khatibi et al. 2017; Ghorbani et al. 2018). The LM algorithm, which is more effective than ordinary gradient descent (Kisi 2007), is used to optimize the network weights. Figure 2a shows the proposed ANN structure. The explicit expression for calculating the output value, ETo, using the MLP is as follows:

Fig. 2 a Simple ANN architecture. b Nonlinear support vector regression

$$ {\mathrm{ET}}_o={F}_2\left[{\sum}_{j=1}^m{W}_{kj}\times {F}_1\left({\sum}_{i=1}^n{X}_i{W}_{ji}+{b}_j\right)+{b}_o\right] $$
(16)

The weighted sum of the inputs entering hidden neuron j can be expressed as:

$$ {A}_j={\sum}_{i=1}^n{X}_i{W}_{ji}+{b}_j $$
(17)
$$ \mathrm{So};\kern0.5em {\mathrm{ET}}_o={F}_2\left[{\sum}_{j=1}^m{W}_{kj}\times {F}_1\left({A}_j\right)+{b}_o\right] $$
(18)
$$ {F}_1\left({A}_j\right)=\frac{1}{1+\exp \left(-{A}_j\right)} $$
(19)

where ETo is the reference evapotranspiration calculated using the ANN, Xi are the input-layer variables, Wji is the weight connecting input i to hidden neuron j, Wkj is the weight connecting hidden neuron j to output neuron k, and bj and bo are the biases of the hidden and output layers, respectively. F1 is the sigmoid activation function given by Eq. (19), and F2 is the activation function of the output neuron. To avoid a lengthy trial-and-error process, Eq. (20) is used to set the number of hidden neurons (Faris et al. 2016; Aljarah et al. 2018; Tikhamarine et al. 2020).

$$ m=2\ast n+1 $$
(20)

where n is the number of inputs and m is the number of neurons in the hidden layer.
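The forward pass of Eqs. (16) to (20) can be written in a few lines. This sketch assumes a linear (identity) output activation F2, which the text does not specify, and it does not cover Levenberg-Marquardt training. All names and the zero-weight example are illustrative.

```python
import math


def hidden_neurons(n_inputs):
    """Number of hidden neurons from Eq. (20)."""
    return 2 * n_inputs + 1


def sigmoid(a):
    """Hidden-layer activation F1, Eq. (19)."""
    return 1.0 / (1.0 + math.exp(-a))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single-output MLP, Eq. (18), with F2 taken as the identity (assumption).

    w_hidden[j][i] is the weight from input i to hidden neuron j.
    """
    hidden = [
        sigmoid(sum(xi * wji for xi, wji in zip(x, w_j)) + b_j)  # F1(A_j), Eq. (17)
        for w_j, b_j in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


n = 2                  # e.g. the two temperature inputs of scenario M4
m = hidden_neurons(n)  # 5 hidden neurons
# With all hidden weights and biases at zero, every hidden output is sigmoid(0) = 0.5.
y = mlp_forward([24.0, 13.0], [[0.0] * n] * m, [0.0] * m, [1.0] * m, 0.25)
print(m, y)  # 5 2.75
```

For the five inputs of scenario M1, Eq. (20) gives 11 hidden neurons.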

Support vector regression

The support vector machine (SVM) was first presented by Vapnik in 1995; it is based on statistical learning theory and the principle of structural risk minimization (Vapnik 1995). Support vector regression (SVR) is a type of regression model developed by Smola (1996). SVR models were developed to solve forecasting, prediction, and regression problems by combining regression functions with the SVM (Smola and Scholkopf 1998). The main aim of the SVR model is to find a function that deviates from the target values by at most ε for all training data points while being as flat as possible (Smola 1996). Figure 2b shows the structure of the SVR model. The SVR regression function is written as:

$$ f(x)=w\times \phi (x)+b $$
(21)

where w is the weight vector, b is the bias, and ϕ is the nonlinear mapping function. To minimize the regularized risk function and obtain an appropriate SVR function f(x), the regression problem can be expressed as follows:

$$ \operatorname{Minimize}\ \frac{1}{2}{\left\Vert w\right\Vert}^2+C{\sum}_{i=1}^N\left({\xi}_i+{\xi}_i^{\ast}\right) $$
(22)

subject to the conditions:

$$ \left\{\begin{array}{l}{y}_i-w\times \phi \left({x}_i\right)-b\le \varepsilon +{\xi}_i\\ {}w\times \phi \left({x}_i\right)+b-{y}_i\le \varepsilon +{\xi}_i^{\ast}\\ {}{\xi}_i,{\xi}_i^{\ast}\ge 0,\kern1em i=1,2,\dots, N\end{array}\right. $$
(23)

where C is a penalty parameter, and ξi and ξi* are the slack variables that measure deviations beyond the ε boundaries. After obtaining the optimality conditions with the Lagrangian, a nonlinear regression function can be written as:

$$ f(x)={\sum}_{i=1}^N\left({\alpha}_i-{\alpha}_i^{\ast}\right)K\left(x,{x}_i\right)+b $$
(24)

where αi, αi* ≥ 0 are the Lagrange multipliers and K(x, xi) denotes the kernel function. Several kernels are available, such as the linear, polynomial, radial basis function (RBF), and sigmoid kernels. According to the literature (Khan and Coulibaly 2006; Yin et al. 2017; Ferreira et al. 2019), the radial basis kernel gives the most accurate results. Therefore, the RBF kernel was selected as the kernel function in the current study. The RBF is defined as follows:

$$ K\left({x}_i,{x}_j\right)=\exp \left(-\gamma {\left\Vert {x}_i-{x}_j\right\Vert}^2\right) $$
(25)

More details regarding SVM and SVR approaches are available in the technical report: Support vector machines for classification and regression (Gunn 1998).
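To make the roles of γ, ε, and the dual coefficients concrete, the sketch below evaluates the RBF kernel, the kernel expansion of Eq. (24) for an already trained model, and the ε-insensitive error that the slack variables penalize. It does not solve the optimization problem of Eqs. (22) and (23). The support vectors and coefficients in the example are hypothetical.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """RBF kernel: exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel expansion of Eq. (24); dual_coefs[i] stands for (alpha_i - alpha_i*)."""
    return sum(c * rbf_kernel(x, sv, gamma)
               for c, sv in zip(dual_coefs, support_vectors)) + b


def epsilon_insensitive_loss(y, f, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y - f) - epsilon)


# Hypothetical trained model with two support vectors.
svs = [[24.0, 13.0], [30.0, 18.0]]
coefs = [2.0, -1.0]
pred = svr_predict([24.0, 13.0], svs, coefs, b=0.5, gamma=0.1)
print(round(pred, 4))
print(epsilon_insensitive_loss(1.0, 1.05, 0.1), round(epsilon_insensitive_loss(1.0, 1.3, 0.1), 2))
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood.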

Grey wolf optimizer

The grey wolf optimizer (GWO) is a swarm intelligence algorithm proposed by Mirjalili et al. (2014). The GWO algorithm has been applied effectively in numerous investigations (Faris et al. 2016; Aljarah et al. 2019; Maroufpoor et al. 2019; Tikhamarine et al. 2019b). The algorithm is inspired by the social hierarchy and hunting behavior of grey wolves in nature. To model the social hierarchy, the pack is divided into four classes: alpha (α) is the best solution, beta (β) the second best, delta (δ) the third best, and the rest of the population are called omega (ω). The hunting process (tracking, encircling, and attacking the prey) is modeled mathematically to build the optimizer. The social hierarchy and the position updating mechanism are illustrated in Fig. 3a and b. The encircling behavior can be expressed as:

Fig. 3 a The social hierarchy of grey wolves. b Illustration of position updating mechanism of ω wolves according to positions of α, β, and δ wolves (Faris et al. 2018)

$$ \overrightarrow{D}=\left|\overrightarrow{C}\ast {\overrightarrow{X}}_P(t)-\overrightarrow{X}(t)\right| $$
(26)
$$ \overrightarrow{X}\left(t+1\right)={\overrightarrow{X}}_P(t)-\overrightarrow{A}\ast \overrightarrow{D} $$
(27)
$$ \overrightarrow{A}=2\overrightarrow{a}\ast {\overrightarrow{r}}_1-\overrightarrow{a} $$
(28)
$$ \overrightarrow{C}=2\ast {\overrightarrow{r}}_2 $$
(29)

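where X is the position vector of a grey wolf, Xp is the position vector of the prey, D is the distance between X and Xp, t is the current iteration number, and A and C are coefficient vectors applied by component-wise multiplication. In the original formulation (Mirjalili et al. 2014), r1 and r2 are random vectors in [0, 1] and the components of a decrease linearly from 2 to 0 over the iterations.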

To imitate the hunting behavior of grey wolves, Eqs. (30–35) describe how the wolves update their positions according to the positions of the α, β, and δ wolves. The α, β, and δ wolves are assumed to be nearest to the prey and to draw the remaining ω wolves toward its location. To estimate the prey position, the grey wolf population uses the following equations:

$$ {\overrightarrow{D}}_{\alpha }=\left|{\overrightarrow{C}}_1\cdotp {\overrightarrow{X}}_{\alpha }(t)-\overrightarrow{X}\right| $$
(30)
$$ {\overrightarrow{D}}_{\beta }=\left|{\overrightarrow{C}}_2\cdotp {\overrightarrow{X}}_{\beta }(t)-\overrightarrow{X}\right| $$
(31)
$$ {\overrightarrow{D}}_{\delta }=\left|{\overrightarrow{C}}_3\cdotp {\overrightarrow{X}}_{\delta }(t)-\overrightarrow{X}\right| $$
(32)
$$ {\overrightarrow{X}}_1={\overrightarrow{X}}_{\alpha }(t)-{\overrightarrow{A}}_1\ast {\overrightarrow{D}}_{\alpha } $$
(33)
$$ {\overrightarrow{X}}_2={\overrightarrow{X}}_{\beta }(t)-{\overrightarrow{A}}_2\ast {\overrightarrow{D}}_{\beta } $$
(34)
$$ {\overrightarrow{X}}_3={\overrightarrow{X}}_{\delta }(t)-{\overrightarrow{A}}_3\ast {\overrightarrow{D}}_{\delta } $$
(35)

The obtained positions from Eqs. (33–35) are used to change the next position of wolves by Eq. (36):

$$ \overrightarrow{X}\left(t+1\right)=\frac{{\overrightarrow{X}}_1+{\overrightarrow{X}}_2+{\overrightarrow{X}}_3}{3} $$
(36)

where \( \overrightarrow{X}\left(t+1\right) \) is the position at the next iteration. Updating the positions with Eq. (36) forces the omega wolves to follow the leading wolves and converge toward the prey.
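The update rules of Eqs. (26) to (36) can be assembled into a compact minimizer. The following is a sketch under stated assumptions, not the implementation used in the study: the three leaders are kept as the best three solutions found so far, the parameter a decreases linearly from 2 to 0, positions are clipped to the search bounds, and the population size and iteration count are arbitrary defaults. The sphere function is used only as a toy objective.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimal grey wolf optimizer; returns (best_score, best_position)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three (score, position) pairs so far: alpha, beta, delta

    def refresh(current_leaders):
        scored = [(objective(w), list(w)) for w in wolves]
        return sorted(current_leaders + scored, key=lambda pair: pair[0])[:3]

    for t in range(n_iter):
        leaders = refresh(leaders)
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    big_a = 2.0 * a * rng.random() - a         # Eq. (28)
                    big_c = 2.0 * rng.random()                 # Eq. (29)
                    dist = abs(big_c * leader[d] - w[d])       # Eqs. (30)-(32)
                    estimate += leader[d] - big_a * dist       # Eqs. (33)-(35)
                lo, hi = bounds[d]
                w[d] = min(max(estimate / 3.0, lo), hi)        # Eq. (36), clipped
    leaders = refresh(leaders)
    return leaders[0]


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


best_score, best_position = gwo_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_score < 1e-3)
```

In a hybrid SVR-GWO setting, the objective would instead return the prediction error of an SVR trained with the candidate parameters encoded in the wolf position.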

Particle swarm optimization

The particle swarm optimization (PSO) algorithm is a swarm intelligence algorithm proposed by Kennedy and Eberhart (1995). PSO mimics the behavior of bird flocks in nature and is one of the most popular algorithms in the metaheuristics literature. Owing to its ability to find good solutions, PSO has been widely used for solving optimization problems. Each particle is described by two quantities: its position (x) and its velocity (V). The velocity is updated according to the following equation:

$$ {V}_i\left(k+1\right)=w{V}_i(k)+{r}_1{c}_1\left({x}_{{\mathrm{pbest}}_i}-{x}_i(k)\right)+{r}_2{c}_2\left({x}_{\mathrm{gbest}}-{x}_i(k)\right) $$
(37)

where pbesti is the best position found so far by the ith particle, and gbest is the best position found so far by the whole swarm.

Here xi(k) is the position of particle i at time step k, Vi(k) is the velocity of particle i at time step k, w is the inertia coefficient, r1 and r2 are random coefficients, c1 and c2 are the acceleration coefficients, and Vi(k + 1) is the updated velocity. The value of w can be determined as follows (Kennedy and Eberhart 1995):

$$ w={w}_{\mathrm{max}}-\frac{w_{\mathrm{max}}-{w}_{\mathrm{min}}}{{\mathrm{iter}}_{\mathrm{max}}}.\mathrm{iter} $$
(38)

where wmin is the minimum inertia weight, wmax is the maximum inertia weight, iter is the current iteration number, and itermax is the maximum number of iterations. The particles are then moved to their new locations using the following equation:

$$ {x}_i\left(k+1\right)={x}_i(k)+{V}_i\left(k+1\right) $$
(39)
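Equations (37) to (39) can be combined into a small minimizer, sketched below. The parameter values (inertia weight decreasing from 0.9 to 0.4, c1 = c2 = 1.5, 20 particles, 200 iterations) are assumptions chosen for illustration, not the settings of the study. Positions are clipped to the search bounds, which is also an added convention.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer; returns (best_score, best_position)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in x]
    pbest_score = [objective(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = list(pbest[g]), pbest_score[g]

    for it in range(n_iter):
        w = w_max - (w_max - w_min) / n_iter * it              # Eq. (38)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + r1 * c1 * (pbest[i][d] - x[i][d])
                           + r2 * c2 * (gbest[d] - x[i][d]))   # Eq. (37)
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)  # Eq. (39), clipped
            score = objective(x[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = list(x[i]), score
                if score < gbest_score:
                    gbest, gbest_score = list(x[i]), score
    return gbest_score, gbest


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(value * value for value in p)


best_score, best_position = pso_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best_score < 1e-3)
```

A large inertia weight early in the run favours exploration, while the smaller weight near the end lets the swarm settle around the best position found.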

Genetic algorithm

The genetic algorithm (GA) is an evolutionary algorithm based on a direct analogy with Darwinian natural selection and genetics in biological systems (Goldberg 1989; Holland 1992). The algorithm searches for solutions to a problem by applying three major operators (selection, crossover, and mutation) that recombine and modify the chromosomes in each generation. In crossover, each new individual combines genetic material from two parents. A mutation operator changes individual genes of a chromosome to maintain sufficient diversity. Crossover is performed according to the following relationships:

$$ {\mathrm{Pop}}_i^{\mathrm{new}}={\alpha \mathrm{Pop}}_i^{\mathrm{old}}+\left(1-\alpha \right){\mathrm{Pop}}_j^{\mathrm{old}} $$
(40)
$$ {\mathrm{Pop}}_j^{\mathrm{new}}={\alpha \mathrm{Pop}}_j^{\mathrm{old}}+\left(1-\alpha \right){\mathrm{Pop}}_i^{\mathrm{old}} $$
(41)

where α is a random value, \( {\mathrm{Pop}}_i^{\mathrm{new}} \) is the ith child, \( {\mathrm{Pop}}_i^{\mathrm{old}} \) is the ith parent, \( {\mathrm{Pop}}_j^{\mathrm{new}} \) is the jth child, and \( {\mathrm{Pop}}_j^{\mathrm{old}} \) is the jth parent. The mutation is based on Eq. (42):

$$ {\mathrm{Pop}}_{j,i}^{\mathrm{new}}={\mathrm{Var}}_{j,i}^{\mathrm{low}}+\beta \left({\mathrm{Var}}_{j,i}^{\mathrm{hi}}-{\mathrm{Var}}_{j,i}^{\mathrm{low}}\right) $$
(42)

where β is a random value from 0 to 1, \( {\mathrm{Pop}}_{j,i}^{\mathrm{new}} \) is the new ith gene in the jth chromosome, \( {\mathrm{Var}}_{j,i}^{\mathrm{hi}} \) is the upper limit of the ith gene in the jth chromosome, and \( {\mathrm{Var}}_{j,i}^{\mathrm{low}} \) is the lower limit of the ith gene in the jth chromosome.
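The crossover of Eqs. (40) and (41) and the mutation of Eq. (42) reduce to a few arithmetic operations per gene. The sketch below applies them to real-valued chromosomes; the function names and the example values are illustrative.

```python
def crossover(parent_i, parent_j, alpha):
    """Arithmetic crossover, Eqs. (40)-(41): returns the two children."""
    child_i = [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_i, parent_j)]
    child_j = [alpha * b + (1.0 - alpha) * a for a, b in zip(parent_i, parent_j)]
    return child_i, child_j


def mutate_gene(low, high, beta):
    """Mutation, Eq. (42): redraw one gene between its lower and upper limits."""
    return low + beta * (high - low)


# Two hypothetical parents with two genes each, and alpha = 0.25.
child_i, child_j = crossover([0.0, 0.0], [2.0, 4.0], 0.25)
print(child_i, child_j)            # [1.5, 3.0] [0.5, 1.0]
print(mutate_gene(1.0, 3.0, 0.5))  # 2.0
```

In practice α and β would be drawn at random; they are fixed here so that the arithmetic can be followed by hand. Note that the gene-wise sum of the two children equals that of the two parents.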

Hybrid SVR models

SVR models usually achieve high performance in modeling linear and nonlinear relationships. Nevertheless, the accuracy of support vector regression depends on the correct selection of its parameters (C, ε, and γ). These three parameters can vary over a very wide range and significantly affect the accuracy of the SVR. According to the literature, there is no fixed rule for selecting them; finding suitable values is computationally demanding and can be treated as an optimization problem. Therefore, optimization algorithms (GWO, PSO, and GA) were coupled with the SVR for this purpose. The initial parameters of these algorithms are presented in Table 3. In the current study, the SVR model was implemented using the LIBSVM software (version 3.23) developed by Chang and Lin (2011), with the ε-SVR formulation and the RBF kernel. The flowchart of the proposed hybrid SVR-GWO is shown in Fig. 4.
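The coupling between an optimizer and the SVR can be pictured as a fitness function that maps a candidate position to a prediction error. The sketch below is hypothetical throughout: it assumes the search is carried out on a log2 scale, that the fitness is the RMSE on a held-out validation set, and that a caller-supplied `train_fn` trains the SVR and returns a prediction function. None of these details is stated in the text.

```python
import math


def decode(position):
    """Map a position on a log2 scale to the SVR parameters (C, epsilon, gamma)."""
    log2_c, log2_eps, log2_gamma = position
    return 2.0 ** log2_c, 2.0 ** log2_eps, 2.0 ** log2_gamma


def make_fitness(train_fn, train_set, valid_set):
    """Build the objective to be minimized by GWO, PSO, or GA.

    train_fn(train_set, c, epsilon, gamma) is a hypothetical callable that
    trains an SVR and returns a function predicting y from x.
    """
    x_valid, y_valid = valid_set

    def fitness(position):
        c, epsilon, gamma = decode(position)
        predict = train_fn(train_set, c, epsilon, gamma)
        squared = [(predict(x) - y) ** 2 for x, y in zip(x_valid, y_valid)]
        return math.sqrt(sum(squared) / len(squared))

    return fitness


# Stand-in trainer used only to exercise the wrapper: it ignores the data
# and always predicts the value of C.
def dummy_train(train_set, c, epsilon, gamma):
    return lambda x: c


fitness = make_fitness(dummy_train, train_set=None,
                       valid_set=([[1.0], [2.0]], [4.0, 4.0]))
print(fitness([2.0, 0.0, 0.0]), fitness([0.0, 0.0, 0.0]))  # 0.0 3.0
```

In a real run, `train_fn` would wrap the ε-SVR training routine of the SVM library in use, and the optimizer would search the three-dimensional space of positions for the lowest fitness.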

Table 3 Initial parameters of the GWO, GA, and PSO
Fig. 4 The proposed architecture of grey wolf optimizer with SVR methodology

Performance evaluation indicators

The performance of the empirical methods and the developed artificial intelligence models was evaluated in this study with respect to the root mean squared error (RMSE), Nash–Sutcliffe efficiency (NSE), Pearson correlation coefficient (PCC), and Willmott Index (WI), which can be expressed as:

I. Root mean squared error (Malik and Kumar 2015; Malik et al. 2019b; Adnan et al. 2019):

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}{\sum}_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-{\mathrm{ET}}_{o\_\mathrm{est}}(i)\right)}^2} $$
(43)

II. Nash-Sutcliffe efficiency (Nash and Sutcliffe 1970):

$$ \mathrm{NSE}=1-\frac{\sum_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-{\mathrm{ET}}_{o\_\mathrm{est}}(i)\right)}^2}{\sum_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{obt}}}\right)}^2} $$
(44)

III. Pearson correlation coefficient (Malik et al. 2017a, b; Pham et al. 2019):

$$ \mathrm{PCC}=\frac{\frac{1}{n}{\sum}_{i=1}^n\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{obt}}}\right)\left({\mathrm{ET}}_{o\_\mathrm{est}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{est}}}\right)}{\sqrt{\frac{1}{n}{\sum}_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{obt}}}\right)}^2}\times \sqrt{\frac{1}{n}{\sum}_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{est}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{est}}}\right)}^2}} $$
(45)

IV. Willmott Index (Willmott 1981; Malik et al. 2019c):

$$ \mathrm{WI}=1-\left[\frac{\sum_{i=1}^n{\left({\mathrm{ET}}_{o\_\mathrm{obt}}(i)-{\mathrm{ET}}_{o\_\mathrm{est}}(i)\right)}^2}{\sum_{i=1}^n{\left(\left|{\mathrm{ET}}_{o\_\mathrm{est}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{obt}}}\right|+\left|{\mathrm{ET}}_{o\_\mathrm{obt}}(i)-\overline{{\mathrm{ET}}_{o\_\mathrm{obt}}}\right|\right)}^2}\right] $$
(46)

where ETo_obt is the monthly reference evapotranspiration obtained through the FAO-56 PM, ETo_est is the monthly reference evapotranspiration estimated by the other models, n is the number of data points, and \( \overline{{\mathrm{ET}}_{o\_\mathrm{obt}}} \) and \( \overline{{\mathrm{ET}}_{o\_\mathrm{est}}} \) are the mean values of the obtained and estimated ETo, respectively. The model with the lowest RMSE and the highest PCC, NSE, and WI is considered the best model for estimating ETo.
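The four indicators can be computed with the short functions sketched below. The Pearson coefficient is written with the product of the two standard deviations in the denominator, and Willmott's index follows its original 1981 form, in which both absolute deviations are taken about the mean of the target (FAO-56 PM) series. The three-point series in the example is hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, est):
    """Root mean squared error."""
    return math.sqrt(_mean([(o - e) ** 2 for o, e in zip(obs, est)]))


def nse(obs, est):
    """Nash-Sutcliffe efficiency."""
    mean_obs = _mean(obs)
    numerator = sum((o - e) ** 2 for o, e in zip(obs, est))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def pcc(obs, est):
    """Pearson correlation coefficient."""
    mean_obs, mean_est = _mean(obs), _mean(est)
    covariance = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    spread_est = math.sqrt(sum((e - mean_est) ** 2 for e in est))
    return covariance / (spread_obs * spread_est)


def willmott(obs, est):
    """Willmott index of agreement (1981 form, deviations about the observed mean)."""
    mean_obs = _mean(obs)
    numerator = sum((o - e) ** 2 for o, e in zip(obs, est))
    denominator = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, e in zip(obs, est))
    return 1.0 - numerator / denominator


# Hypothetical series: every estimate is one unit too high.
obs = [1.0, 2.0, 3.0]
est = [2.0, 3.0, 4.0]
print(rmse(obs, est), nse(obs, est), round(pcc(obs, est), 6), round(willmott(obs, est), 4))
```

The example shows why several indicators are reported together: a constant bias leaves the correlation at 1 while RMSE, NSE (here −0.5), and WI (here about 0.727) all reveal the error.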

Results and discussion

The developed hybrid SVR-GWO model was implemented with different combinations of input variables and compared with the SVR-PSO, SVR-GA, ANN, and traditional empirical climate-based models (Valiantzas-1, Valiantzas-2, Valiantzas-3, Turc, Ritchie, and Thornthwaite) using the training (January 2000 to October 2009) and testing (November 2009 to December 2013) datasets at the Algiers, Tlemcen, and Annaba stations. Tables 4, 5, and 6 summarize the performance of all applied models (hybrid AI and climate-based models) during the testing period for the estimation of monthly ETo at the three climatic stations. In the tables, “5-11-1” indicates a neural network structure with 5, 11, and 1 neurons in the input, hidden, and output layers, respectively. The comparison revealed that the SVR combined with the GWO algorithm outperformed the SVR-GA, SVR-PSO, ANN, and empirical models regardless of the station and the input variables used. In terms of the optimization algorithms, GWO therefore appears to be the most efficient, leading to the most accurate hybrid models, whereas the GA and PSO algorithms give almost the same performance as the empirical models. Relative to SVR-GA-1, SVR-PSO-1, ANN-1, and Valiantzas-1, the SVR-GWO-1 model reduced the RMSE by 49%, 54%, 43%, and 65% at Algiers Station, by 81%, 55%, 73%, and 49% at Tlemcen Station, and by 70%, 85%, 83%, and 86% at Annaba Station, respectively. Similarly, SVR-GWO with the other input combinations (models 2, 3, and 4) also surpassed the corresponding SVR-GA, SVR-PSO, ANN, and empirical models, producing the lowest RMSE and the highest NSE, PCC, and WI values during the testing period at all stations.

Table 4 RMSE, NSE, PCC, and WI values during testing period of SVR-GWO, SVR-GA, SVR-PSO, ANN, Valiantzas, Turc, Ritchie, and Thornthwaite models at Algiers Station
Table 5 RMSE, NSE, PCC, and WI values during testing period of SVR-GWO, SVR-GA, SVR-PSO, ANN, Valiantzas, Turc, Ritchie, and Thornthwaite models at Tlemcen Station
Table 6 RMSE, NSE, PCC, and WI values during testing period of SVR-GWO, SVR-GA, SVR-PSO, ANN, Valiantzas, Turc, Ritchie, and Thornthwaite models at Annaba Station

Furthermore, relative to the SVR-GWO hybrid models with five input variables, prediction accuracy (NSE) decreased on average by only about 0.06% for the models with four input variables and by about 4% for those with three input variables, which indicates that reliable ETo estimates can be obtained with only temperature and solar radiation as input variables to the SVR-GWO. It should also be noted that larger differences were obtained with the other hybrid AI models and the empirical models. This research complements the previous work carried out by the authors (Tikhamarine et al. 2019a), which emphasized the overall significance and practical importance of the GWO algorithm in enhancing the prediction capability of AI models.

The obtained results agree well with those of previous studies (Huang et al. 2019; Keshtegar et al. 2019; Nourani et al. 2019; Valipour et al. 2019; Wu et al. 2019; Malik et al. 2019a), all of which reported the application of hybrid AI models and their superior accuracy compared to empirical methods for ETo estimation in different climates across the globe. Another important finding that can be derived from Tables 4, 5, and 6 is that the accuracy of the models increases considerably when more variables are added; for example, at the Algiers, Tlemcen, and Annaba stations, the accuracy of the SVR-GWO model increases by 63%, 66%, and 58% when the Rs variable is included in the input (SVR-GWO-2), by 57%, 48%, and 90% when the RH variable is included (SVR-GWO-3), and by 43%, 61%, and 32% when the Us variable is included (SVR-GWO-4), respectively. The ETo estimates of the empirical models for the three stations have generally revealed that reliable estimates can be achieved by the radiation-based models, i.e., the Valiantzas, Ritchie, and Turc models. In fact, the radiation-based models used in this study performed better than the temperature-based model (Thornthwaite); they increased the prediction accuracy by up to 30% at all the stations. This could be because the stations are located in a semi-arid Mediterranean climate with mild cold winters and hot dry summers, where atmospheric conditions other than temperature are more favorable to evaporation and transpiration. On the other hand, the temperature-based Thornthwaite model gives more accurate estimates at the Annaba and Tlemcen stations than at the Algiers station. This result is probably due to the differences observed at the Algiers station between the training and testing conditions, especially in the temperatures, which influence the testing results. The lower correlation between Tmin/Tmax and ETo for Algiers (0.769/0.867) compared to Tlemcen (0.777/0.871) and Annaba (0.828/0.915) might be another reason for this difference.

Figures 5a–d, 6a–d, and 7a–d show the temporal distribution and scatter plots of the FAO-56 PM and estimated monthly ETo of the SVR-GWO, SVR-PSO, SVR-GA, ANN, Valiantzas, Turc, Ritchie, and Thornthwaite models during the testing period at the three stations. The figures show that the SVR-GWO-1 estimates closely follow the corresponding FAO-56 PM ETo values and are less scattered than those of the other methods. It is also clear that the empirical models cannot capture the target values and produce less accurate results than the hybrid AI models. For example, for the Algiers Station, the coefficient of determination (R2) of the SVR-GWO-1, SVR-PSO-1, SVR-GA-1, ANN-1, and Valiantzas-1 models varies from 0.9955 to 0.9884 (Fig. 5a); the R2 of the SVR-GWO-2, SVR-PSO-2, SVR-GA-2, ANN-2, Turc, and Valiantzas-2 models slightly decreases and varies from 0.9866 to 0.9744 (Fig. 5b); the R2 of the SVR-GWO-3, SVR-PSO-3, SVR-GA-3, ANN-3, Ritchie, and Valiantzas-3 models varies from 0.9290 to 0.8987 (Fig. 5c); and the R2 of the SVR-GWO-4, SVR-PSO-4, SVR-GA-4, ANN-4, and Thornthwaite models varies from 0.5287 to 0.5771 (Fig. 5d). These results indicate that more input variables provide better efficiency in the ETo estimation. The worst results are those with only temperatures (maximum and minimum) as inputs; however, with three input variables, i.e., adding only the solar radiation Rs to the minimum and maximum temperatures, the accuracy of the models increases considerably. This result agrees well with those obtained with the empirical methods, which indicate the superiority of the radiation-based models over the temperature-based one, so all hybrid AI models that include solar radiation gave the best results.

Fig. 5 Comparison of SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical models in estimating FAO-56 PM ETo at Algiers station during testing phase (a Model-1, b Model-2, c Model-3, and d Model-4)

Fig. 6 Comparison of SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical equations in estimating FAO-56 PM ETo in Tlemcen station (a Model-1, b Model-2, c Model-3, and d Model-4)

Fig. 7 Comparison of SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical equations in estimating FAO-56 PM ETo in Annaba station (a Model-1, b Model-2, c Model-3, and d Model-4)

With regard to the performance of the applied models, the ranking for Algiers Station is as follows: SVR-GWO-1 > ANN-1 > SVR-GA-1 > SVR-PSO-1 > Valiantzas-1; SVR-GWO-2 > SVR-GA-2 > ANN-2 > Valiantzas-2 > SVR-PSO-2 > Turc; SVR-GWO-3 > SVR-PSO-3 > SVR-GA-3 > ANN-3 > Ritchie > Valiantzas-3; and SVR-GWO-4 > SVR-PSO-4 > SVR-GA-4 > ANN-4 > Thornthwaite. This ranking differs slightly for the other two stations, but the hybrid SVR-GWO model remains markedly superior to the other hybrid AI models and the empirical models.

Taylor diagrams were used to evaluate the implemented models; they summarize multiple aspects of model performance, namely the standard deviation (SD), RMSE, and coefficient of correlation (COC), in a single polar plot. Figures 8a–d, 9a–d, and 10a–d show the Taylor diagrams of the observed and estimated monthly ETo values for the three climatic stations, obtained with the SVR-GWO, SVR-PSO, SVR-GA, ANN, Valiantzas, Turc, Ritchie, and Thornthwaite models during the testing period. Fig. 8a–d shows that the estimates of the SVR-GWO models are close to the observed ETo values, with the lowest RMSE and SD and the highest COC at Algiers station. The same behavior was observed for the other two stations (Tlemcen and Annaba).
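The statistics plotted in a Taylor diagram can be computed as in the minimal sketch below. It assumes the usual Taylor (2001) construction, in which the RMSE shown on the diagram is the centered root mean square difference (means removed), so that SD, COC, and RMSE are linked by the law of cosines. The function name and the sample values are hypothetical and are not taken from the study.

```python
import math

def taylor_statistics(obs, est):
    """Return (sd_obs, sd_est, corr, crmsd) for a Taylor diagram.

    crmsd is the centered RMS difference, i.e. the RMSE after removing
    the mean of each series (assumed convention).
    """
    n = len(obs)
    if n == 0 or n != len(est):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in est) / n)
    if sd_obs == 0 or sd_est == 0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est)) / n
    corr = cov / (sd_obs * sd_est)
    crmsd = math.sqrt(
        sum(((e - mean_est) - (o - mean_obs)) ** 2 for o, e in zip(obs, est)) / n
    )
    return sd_obs, sd_est, corr, crmsd

# Hypothetical monthly ETo values (mm/month), for illustration only
observed = [45.0, 60.0, 95.0, 130.0, 160.0, 150.0]
estimated = [48.0, 58.0, 99.0, 126.0, 155.0, 152.0]
sd_o, sd_e, r, e_prime = taylor_statistics(observed, estimated)

# Law of cosines behind the diagram: E'^2 = sd_o^2 + sd_e^2 - 2*sd_o*sd_e*r
assert abs(e_prime ** 2 - (sd_o ** 2 + sd_e ** 2 - 2 * sd_o * sd_e * r)) < 1e-9
```

On the diagram, a model point lies at radius sd_est and at an angle whose cosine is corr; its distance from the observed point on the horizontal axis equals crmsd, so a better model plots closer to the observed point.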

Fig. 8 Taylor diagram of observed and estimated monthly ETo values by SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical models during testing period at Algiers station (a Model-1, b Model-2, c Model-3, and d Model-4)

Fig. 9 Taylor diagram of observed and estimated monthly ETo values by SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical models during testing period at Tlemcen station (a Model-1, b Model-2, c Model-3, and d Model-4)

Fig. 10 Taylor diagram of observed and estimated monthly ETo values by SVR-GWO, SVR-GA, SVR-PSO, ANN, and empirical models during testing period at Annaba station (a Model-1, b Model-2, c Model-3, and d Model-4)

Feng et al. (2017) employed two machine learning methods, random forests (RF) and generalized regression neural networks (GRNN), to model ETo at two stations in China using maximum/minimum air temperature, relative humidity, solar radiation, and wind speed as inputs. They obtained NSE values for the RF and GRNN of 0.978 and 0.971 for the Chengdu station and 0.987 and 0.982 for the Nanchong station, respectively. Khosravi et al. (2019) estimated ETo at the Baghdad and Mosul stations in Iraq using nine models, including five data mining algorithms, i.e., M5P, random forest (RF), random tree (RT), reduced error pruning tree (REPT), and Kstar, and four neuro-fuzzy systems (ANFIS, ANFIS-differential evolution, ANFIS-genetic algorithm, and ANFIS-imperialistic competitive algorithm), with sunshine hours, maximum and minimum temperatures, and relative humidity as inputs. They obtained NSE values for the optimal M5P, RF, RT, REPT, Kstar, ANFIS, ANFIS-DE, ANFIS-GA, and ANFIS-ICA of 0.93, 0.94, 0.85, 0.91, 0.95, 0.90, 0.94, 0.95, and 0.94, respectively. In the current work, the NSE values of the best SVR-GWO model are 0.995, 0.999, and 0.999 for the Algiers, Tlemcen, and Annaba stations, respectively. This comparison supports the usefulness of the hybrid SVR-GWO method in estimating ETo.

Overall, the results indicate that the SVR-GWO model captures the relationship between ETo and the other climate variables better than the SVR-PSO, SVR-GA, ANN, and empirical methods. Search spaces such as these contain a large number of local optima, which makes the search more difficult because solutions tend to become trapped in local optima; the movement of the search agents toward the global optimum is therefore very important. A powerful algorithm is needed to avoid this problem and ultimately find the global optimum. An appropriate balance between exploration and exploitation, achieved by adopting an effective mechanism such as a decrease during iterations (Fig. 3), helps to avoid the large number of local solutions. This is one of the most important characteristics of the GWO algorithm; it allows the algorithm to reach the best global solution and the SVR-GWO model to perform better. Moreover, the GWO algorithm has only one parameter (A) that has to be tuned (Table 3), whereas PSO and GA have several parameters that need to be tuned (C1, C2, wmax, and wmin for PSO; mutation and crossover for GA). With a large number of algorithm parameters, the search space and the optimization process also become more difficult. The SVR-GWO can be considered an alternative tool for estimating reference evapotranspiration, depending on the availability of the data.
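The exploration and exploitation balance described above can be illustrated with a minimal sketch of a grey wolf optimizer. It follows the commonly published GWO update rules, in which a control parameter a decreases linearly from 2 toward 0 over the iterations and sets the coefficient A = 2·a·r1 − a; this is presumably the mechanism meant here, but the sketch is not the implementation used in this study. The function name, the population size, the iteration count, and the toy objective are assumptions made for illustration; in an SVR-GWO setting the objective would instead be a validation error evaluated over the SVR hyperparameters.

```python
import random

def grey_wolf_optimizer(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize objective over box bounds with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter):
        # The three best wolves (alpha, beta, delta) guide the rest of the pack
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_val = objective(alpha)
        if alpha_val < best_val:
            best_pos, best_val = alpha, alpha_val
        # a decreases linearly from 2 toward 0: exploration first, exploitation later
        a = 2.0 * (1.0 - t / n_iter)
        new_wolves = []
        for w in wolves:
            new_pos = []
            for d in range(dim):
                candidates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # |A| > 1 pushes away, |A| < 1 pulls in
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    candidates.append(leader[d] - A * D)
                x = sum(candidates) / 3.0
                lo, hi = bounds[d]
                new_pos.append(min(max(x, lo), hi))  # keep the wolf inside the bounds
            new_wolves.append(new_pos)
        wolves = new_wolves
    # Account for the final positions as well
    final = min(wolves, key=objective)
    final_val = objective(final)
    if final_val < best_val:
        best_pos, best_val = final, final_val
    return best_pos, best_val

# Toy objective (hypothetical): a shifted sphere with its minimum at (1, -2)
def shifted_sphere(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

position, value = grey_wolf_optimizer(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(position, value)
```

Apart from the population size and the number of iterations, this basic variant has no algorithm-specific constants to tune, which is consistent with the point made above about GWO requiring fewer settings than PSO and GA.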

Conclusion

The main purpose of this study was to investigate the potential of the hybrid SVR-GWO model for estimating monthly reference evapotranspiration (ETo) using climatic variables at three stations, Algiers, Tlemcen, and Annaba, located in the north of Algeria. The SVR-GWO performance was compared with those of the SVR-PSO, SVR-GA, ANN, and traditional climate-based models (Turc, Ritchie, Thornthwaite, and three versions of the Valiantzas method). The following conclusions were drawn from the obtained results:

  i. The artificial intelligence models (i.e., SVR-GWO, SVR-PSO, SVR-GA, ANN) exhibited better performance than the traditional empirical methods (i.e., Valiantzas, Turc, Ritchie, and Thornthwaite) at all studied stations.

  ii. The SVR-GWO model had better accuracy than the other models in all scenarios at the Algiers, Tlemcen, and Annaba stations.

  iii. The SVR-GWO with five input variables (Tmax, Tmin, Rs, Us, and RH) proved to be a feasible model for estimating ETo.

  iv. The efficiency of the GWO algorithm was also found to be better than that of the PSO and GA algorithms.

  v. The traditional empirical methods used in this study, except the Thornthwaite model, can provide reliable ETo estimates. Moreover, it should be highlighted that the Valiantzas methods performed better than the other traditional methods (Turc, Ritchie, and Thornthwaite) at the study stations and can be considered good alternatives for ETo estimation in these regions.

Three stations were used in this study; in future work, more stations from different locations around the world will be considered in order to draw generalized conclusions about the performance of hybrid artificial intelligence models.