Support-Vector-Machine-Based Models for Modeling Daily Reference Evapotranspiration With Limited Climatic Data in Extreme Arid Regions

Wen, Xiaohu; Si, Jianhua; He, Zhibin; Wu, Jun; Shao, Hongbo; Yu, Haijiao

doi:10.1007/s11269-015-0990-2

Published: 11 April 2015

Volume 29, pages 3195–3209, (2015)

Water Resources Management

Abstract

Evapotranspiration is a major factor controlling hydrological processes, and its accurate estimation provides valuable information for water resources planning and management, particularly in extremely arid regions. The objective of this research was to evaluate the use of a support vector machine (SVM) to model daily reference evapotranspiration (ET₀) using limited climatic data. For the SVM, four combinations of maximum air temperature (T_max), minimum air temperature (T_min), wind speed (U₂) and daily solar radiation (R_s) in the extremely arid region of the Ejina basin, China, were used as inputs, with T_max and T_min as the base data set. The SVM models were evaluated by comparing their output with the ET₀ calculated using the Penman–Monteith FAO 56 equation (PMF-56). We found that the ET₀ estimated using SVM with limited climatic data was in good agreement with that obtained using the conventional PMF-56 equation employing the full complement of meteorological data. In particular, three climatic parameters, T_max, T_min and R_s, were enough to predict the daily ET₀ satisfactorily. Moreover, the performance of the SVM method was compared with that of an artificial neural network (ANN) and three empirical models: Priestley–Taylor, Hargreaves and Ritchie. The results showed that the SVM method performed best among these models. This offers significant potential for more accurate estimation of ET₀ with scarce data in extreme arid regions.

1 Introduction

Evapotranspiration (ET) largely controls several hydrological processes, and its accurate estimation provides valuable information for water resources planning and management (Tabari et al. 2012), particularly in arid areas (Laaboudi et al. 2012).

The ET quantification, however, must be preceded by the determination of reference evapotranspiration (ET₀) (López-Urrea et al. 2006). A great number of empirical equations have been developed for estimating ET₀ using meteorological data. The Penman-Monteith FAO-56 combination equation (PMF-56) has been recommended by the Food and Agriculture Organization of the United Nations (FAO) as the standard equation for estimating ET₀. The PMF-56 equation is a physically based method, which requires a number of climatic parameters such as daily maximum temperature and minimum temperature, solar radiation, relative humidity, and wind speed. However, records for such weather variables are often incomplete or not always available for many locations so that the application of the PMF-56 model is limited (Cobaner 2011).

Evapotranspiration is an open, nonlinear, dynamic and complex system; therefore, it is difficult to derive an accurate formula that represents all the physical processes involved. As an alternative to traditional techniques, artificial neural networks (ANN) are well suited to modeling nonlinear processes. Many researchers have applied ANN to estimate ET₀ (Chauhan and Shrivastava 2008; Laaboudi et al. 2012; Citakoglu et al. 2013; Kisi and Cengiz 2013; El-Shafie et al. 2014; Rahimikhoob 2014). These studies revealed that ANN models were superior to conventional methods such as regression and empirical equations in estimating ET₀. However, ANN have some disadvantages, such as slow training, the need for a large amount of training data, and a tendency to get stuck in local minima (Principe et al. 2000). The support vector machine (SVM), a learning machine based on statistical learning theory and the structural risk minimization principle, can be used for nonlinear system modeling (Vapnik 1995). Compared with ANN, SVM provides more reliable and better performance under the same training conditions (He et al. 2014). In the last decade, SVM models have been extended to a wide range of hydrological problems (Raghavendra and Deka 2014).

Recently, some researchers have begun to use SVM for ET₀ modeling. Kisi and Cimen (2009) studied the potential of SVM in modeling ET₀ in central California, USA. Kisi (2012) examined the performance of the least square support vector machine (LSSVM) in modeling ET₀. Tabari et al. (2012) examined the potential of SVM for estimating ET₀ in a semi-arid highland environment in Iran. Lin et al. (2013) developed SVM models for daily pan evaporation estimation and compared them with ANN models. These studies showed that SVM could be used to estimate ET₀, with performance superior to that of ANN and empirical equations. Although SVM has excellent features, studies using SVM to model ET₀ are still limited, particularly in extremely arid regions with limited daily climatic data.

The Ejina basin, located in the lower reaches of the Heihe River, northwestern China (Fig. 1), is one of the most arid regions in the world. Water resources are a main controlling factor in economic development and ecological environment protection. However, water resources in the region are limited, with a mean annual precipitation of 42 mm. The Heihe River is the only river flowing through the area. In the 1950s, the annual discharge of the Heihe River into the Ejina basin was about 12 × 10⁸ m³; however, it was less than 7 × 10⁸ m³ in the 1990s. The accurate determination of ET₀ helps to understand the water balance in this extremely arid region and to determine the actual ecological water demand of the ecosystem in the Ejina basin, serving as a reference for future water needs (Hou et al. 2010). Hence, a well-performing model for daily ET₀ estimation is important for determining the actual ecological water demand and improving water use efficiency in the area (Hou et al. 2010). Moreover, because the area is still developing, it is difficult to collect sufficient daily meteorological data for ET₀ estimation in such an extreme region.

The main objective of this study was to investigate the accuracy of SVM models for estimating daily ET₀ using various combinations of daily meteorological data, including maximum air temperature (T_max), minimum air temperature (T_min), wind speed (U₂) and solar radiation (R_s), in the extremely arid environment of the Ejina basin, northwestern China. In addition, the performance of the SVM models was compared with that of ANN models and three empirical models (the Priestley–Taylor, Hargreaves and Ritchie equations) to further test the SVM performance.

2 Materials and Methods

2.1 Penman-Monteith FAO 56 (PMF-56) Equation

In this paper, PMF-56 was used to provide the SVM targets to train and test the SVM models. As the sole standard method for the computation of ET₀ when no measured lysimeter data are available, PMF-56 method is described by Allen et al. (1998):

$$ E{T}_{0-PMF-56}=\frac{0.408\varDelta \left({R}_n-G\right)+\gamma \frac{900}{T_{mean}+273}{U}_2\left({e}_s-{e}_a\right)}{\varDelta +\gamma \left(1+0.34{U}_2\right)} $$

(1)

where ET_0-PMF-56 is the reference evapotranspiration (mm day⁻¹); R_n is the net radiation (MJ m⁻² day⁻¹); G is the soil heat flux (MJ m⁻² day⁻¹); γ is the psychrometric constant (kPa °C⁻¹); e_s is the saturation vapor pressure (kPa); e_a is the actual vapor pressure (kPa); Δ is the slope of the saturation vapor pressure-temperature curve (kPa °C⁻¹); T_mean is the average daily air temperature (°C); and U₂ is the mean daily wind speed at 2 m (m s⁻¹). The computation of all data required for calculating ET₀ followed the method and procedure given in Chapter 3 of FAO-56 (Allen et al. 1998).
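As a minimal sketch, Eq. (1) can be written as a single function once the intermediate quantities (Δ, R_n, G, γ, e_s, e_a) have been computed following FAO-56. The function name `pmf56_et0` and the input values below are illustrative assumptions and are not measurements from the Ejina site.

```python
def pmf56_et0(delta, r_n, g, gamma, t_mean, u2, e_s, e_a):
    """Daily reference evapotranspiration (mm/day) following Eq. (1).

    delta    : slope of the saturation vapor pressure curve (kPa/degC)
    r_n, g   : net radiation and soil heat flux (MJ m^-2 day^-1)
    gamma    : psychrometric constant (kPa/degC)
    t_mean   : mean daily air temperature (degC)
    u2       : mean daily wind speed at 2 m (m/s)
    e_s, e_a : saturation and actual vapor pressure (kPa)
    """
    radiation_term = 0.408 * delta * (r_n - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs only (hypothetical warm, humid day).
et0 = pmf56_et0(delta=0.246, r_n=14.33, g=0.14, gamma=0.0674,
                t_mean=30.2, u2=2.0, e_s=4.42, e_a=2.85)
print(f"ET0 = {et0:.2f} mm/day")
```

Keeping the radiation and aerodynamic terms separate makes it easy to inspect which of the two dominates on a given day.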

2.2 Hargreaves Equation

Hargreaves and Samani (1985) presented a formula for the estimation of reference evapotranspiration when daily weather data is limited or missing. The equation has the form:

$$ E{T}_{0- Hargreaves}=0.0023{R}_a\left(\frac{T_{\max }+{T}_{\min }}{2}+17.8\right)\sqrt{T_{\max }-{T}_{\min }} $$

(2)

where ET_0-Hargreaves is the reference evapotranspiration (mm day⁻¹) and R_a is the water equivalent of the extraterrestrial radiation (mm day⁻¹), computed according to Allen et al. (1998).
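Eq. (2) needs only the two temperatures and R_a, so it translates directly into code. The sketch below assumes R_a has already been computed in mm day⁻¹; the function name `hargreaves_et0` and the example inputs are hypothetical.

```python
import math


def hargreaves_et0(r_a, t_max, t_min):
    """Reference evapotranspiration (mm/day) following Eq. (2).

    r_a          : extraterrestrial radiation as water equivalent (mm/day)
    t_max, t_min : daily maximum and minimum air temperature (degC)
    """
    if t_max < t_min:
        raise ValueError("t_max must not be smaller than t_min")
    t_mean = (t_max + t_min) / 2.0
    return 0.0023 * r_a * (t_mean + 17.8) * math.sqrt(t_max - t_min)


# Illustrative inputs only.
print(round(hargreaves_et0(r_a=15.0, t_max=30.0, t_min=15.0), 2))
```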

2.3 Ritchie Equation

The Ritchie equation was described by Jones and Ritchie (1990):

$$ E{T}_{0- Ritchie}={\alpha}_1\left[3.87\times {10}^{-3}{R}_s\left(0.6{T}_{\max }+0.4{T}_{\min }+29\right)\right] $$

(3)

where ET_0-Ritchie is the reference evapotranspiration (mm day⁻¹); R_s is the solar radiation (MJ m⁻² day⁻¹); and α₁ is defined as follows:

$$ {\alpha}_1=\begin{cases}1.1, & 5<{T}_{\max}\le 35\ {}^{\circ}\mathrm{C}\\ 1.1+0.05\left({T}_{\max }-35\right), & {T}_{\max }>35\ {}^{\circ}\mathrm{C}\\ 0.01\exp \left[0.18\left({T}_{\max }+20\right)\right], & {T}_{\max }<5\ {}^{\circ}\mathrm{C}\end{cases} $$
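The piecewise coefficient α₁ and Eq. (3) can be sketched as two small functions. The text leaves the case T_max = 5 °C unspecified; assigning it to the α₁ = 1.1 branch is an assumption of this sketch, as are the function names.

```python
import math


def ritchie_alpha1(t_max):
    """Temperature-dependent coefficient alpha_1 of the Ritchie equation."""
    if t_max > 35.0:
        return 1.1 + 0.05 * (t_max - 35.0)
    if t_max < 5.0:
        return 0.01 * math.exp(0.18 * (t_max + 20.0))
    # 5 <= t_max <= 35; including t_max == 5 here is an assumption.
    return 1.1


def ritchie_et0(r_s, t_max, t_min):
    """Reference evapotranspiration (mm/day) following Eq. (3).

    r_s          : solar radiation (MJ m^-2 day^-1)
    t_max, t_min : daily maximum and minimum air temperature (degC)
    """
    return ritchie_alpha1(t_max) * 3.87e-3 * r_s * (0.6 * t_max + 0.4 * t_min + 29.0)
```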

2.4 Priestley and Taylor Equation

Priestley and Taylor equation (Priestley and Taylor 1972) for computing ET₀ value is expressed as:

$$ E{T}_{0- Priestley- Taylor}=\frac{\alpha }{\lambda}\frac{\varDelta }{\varDelta +\gamma}\left({R}_n-G\right),\kern1em \alpha =1.26 $$

(4)

where ET_0-Priestley-Taylor is the reference evapotranspiration (mm day⁻¹); α is an empirical coefficient; and λ is the latent heat of evaporation (MJ kg⁻¹).
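Eq. (4) can be sketched in the same way. The default λ = 2.45 MJ kg⁻¹ is the value commonly used in FAO-56 and is an assumption here, since the text does not state which value was applied; the function name and example inputs are likewise hypothetical.

```python
def priestley_taylor_et0(delta, gamma, r_n, g, alpha=1.26, lam=2.45):
    """Reference evapotranspiration (mm/day) following Eq. (4).

    delta  : slope of the saturation vapor pressure curve (kPa/degC)
    gamma  : psychrometric constant (kPa/degC)
    r_n, g : net radiation and soil heat flux (MJ m^-2 day^-1)
    alpha  : empirical coefficient (1.26 in the text)
    lam    : latent heat of evaporation (MJ/kg); 2.45 is an assumed default
    """
    return (alpha / lam) * (delta / (delta + gamma)) * (r_n - g)


# Illustrative inputs only.
print(round(priestley_taylor_et0(delta=0.246, gamma=0.0674, r_n=14.33, g=0.14), 2))
```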

Empirical equations are usually developed using local data, so Allen et al. (1994) recommended that they be calibrated against the PMF-56 method. The calibrated ET₀ is calculated as

$$ E{T}_0=a+b\times E{T}_{method} $$

(5)

where ET₀ is the reference evapotranspiration defined by the PMF-56 method, ET_method is the evapotranspiration estimated by the empirical model being evaluated, and a and b are the regression constants.
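One plausible way to obtain a and b in Eq. (5) is ordinary least squares on the training period, regressing the PMF-56 values on the empirical estimates. The text only calls a and b regression constants, so the fitting method, the function names and the toy data below are assumptions.

```python
def fit_calibration(et_method, et_pmf56):
    """Least-squares intercept a and slope b for ET0 = a + b * ET_method."""
    n = len(et_method)
    mean_x = sum(et_method) / n
    mean_y = sum(et_pmf56) / n
    s_xx = sum((x - mean_x) ** 2 for x in et_method)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(et_method, et_pmf56))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def apply_calibration(et_method, a, b):
    """Calibrated ET0 values from raw empirical estimates."""
    return [a + b * x for x in et_method]


# Toy data constructed so that ET0 = 0.5 + 1.2 * ET_method exactly.
raw = [1.0, 2.0, 3.0, 4.0]
target = [1.7, 2.9, 4.1, 5.3]
a, b = fit_calibration(raw, target)
```

The coefficients are fitted on the training data only and then applied unchanged to the testing period, which mirrors how the empirical models are evaluated later in the paper.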

2.5 Support Vector Machine (SVM)

The support vector machine (SVM) is a supervised learning model based on statistical learning theory, introduced by Vapnik (1995). Regression with SVM is generally called support vector regression (SVR). Only a brief introduction to SVR is given here; detailed principles and algorithms of SVM can be found in Müller et al. (1997).

In SVM, the basic idea is to map the data x into a high-dimensional feature space via a nonlinear mapping $ \phi $ and to do linear regression in this space (Boser et al. 1992; Vapnik 1995).

Regression with SVR estimates a function from a given data set $ \{(x_i, y_i)\}_{i=1}^{n} $, where x_i denotes the input vector, y_i denotes the output value and n is the total number of data points.

In SVM, the regression function is approximated by the following function:

$$ f(x)=\omega \cdot \phi (x)+b $$

(6)

where ω is a weight vector and b is a bias. $ \phi $(x) denotes a nonlinear transfer function that maps the input vectors into a high-dimensional feature space, in which a simple linear regression can in theory cope with the complex nonlinear regression of the input space.

The coefficients ω and b can be estimated by minimizing the following regularized risk function:

$$ {R}_{reg}(f)=C\frac{1}{n}\sum_{i=1}^n{L}_{\varepsilon}\left(f\left({x}_i\right),{y}_i\right)+\frac{1}{2}{\left\Vert \omega \right\Vert}^2 $$

(7)

$$ {L}_{\varepsilon}\left(f(x),y\right)=\begin{cases}\left|f(x)-y\right|-\varepsilon, & \left|f(x)-y\right|\ge \varepsilon \\ 0, & \mathrm{otherwise}\end{cases} $$

(8)

where C is a positive constant called the penalty parameter; L_ε(f(x_i), y_i) is the ε-insensitive loss function, which measures the empirical risk of the training data; (1/2)||ω||² is the regularization term; and ε is the tube size of the SVM.
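To make the roles of C and ε concrete, the ε-insensitive loss and the regularized risk of Eqs. (7) and (8) can be sketched as follows. The function names are hypothetical, and the predictions are passed in directly rather than computed from ω and b.

```python
def eps_insensitive_loss(prediction, target, epsilon):
    """Eq. (8): zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(prediction - target) - epsilon)


def regularized_risk(predictions, targets, omega, c, epsilon):
    """Eq. (7): C times the mean epsilon-insensitive loss plus 0.5 * ||omega||^2."""
    n = len(targets)
    empirical = sum(eps_insensitive_loss(p, t, epsilon)
                    for p, t in zip(predictions, targets)) / n
    return c * empirical + 0.5 * sum(w * w for w in omega)
```

Errors smaller than ε cost nothing, so a wider tube yields a sparser solution, while a larger C penalizes the points that fall outside the tube more strongly relative to the regularization term.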

Finally, a nonlinear regression function is obtained using the following expression

$$ f(x)={\displaystyle \sum_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)k\left({x}_i,x\right)+b} $$

(9)

where α_i and α_i^* are the introduced Lagrange multipliers. By the Karush-Kuhn-Tucker (KKT) conditions, only a limited number of the coefficients α_i and α_i^* are nonzero; the associated data points are referred to as the support vectors. k(x_i, x) is the kernel function, which describes the inner product in the D-dimensional feature space:

$$ k\left({x}_i,x\right)=\sum_{j=1}^D{\phi}_j\left({x}_i\right){\phi}_j(x) $$

(10)

It can be shown that any symmetric kernel function k satisfying Mercer’s condition corresponds to a dot product in some feature space (Boser et al. 1992). In this paper, the radial basis function (RBF) is selected as the kernel function. The RBF is defined as follows:

$$ k\left({x}_i,x\right)= \exp \left(-\gamma {\left\Vert {x}_i-x\right\Vert}^2\right),\kern1em \gamma >0 $$

(11)
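The RBF kernel of Eq. (11) and the kernel expansion of Eq. (9) can be sketched together. The support vectors, the coefficients (α_i − α_i^*) and the bias are assumed to come from an already trained model; the function names and the numbers in the example are hypothetical.

```python
import math


def rbf_kernel(x_i, x, gamma):
    """Eq. (11): exp(-gamma * ||x_i - x||^2) with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Eq. (9): sum of (alpha_i - alpha_i^*) * k(x_i, x) plus the bias.

    dual_coefs[i] holds the difference alpha_i - alpha_i^* for support vector i.
    """
    return sum(coef * rbf_kernel(sv, x, gamma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias


# Hypothetical two-support-vector model evaluated at the origin.
y_hat = svr_predict([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], [2.0, -1.0], bias=0.5, gamma=1.0)
```

Only the support vectors enter the sum, which is why the number of nonzero coefficients determines the cost of a prediction.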

Three parameters must be set when using the RBF kernel: the penalty parameter C, the tube size ε and the kernel parameter γ. The general performance of SVM models depends on a proper setting of these parameters. In this study, C, ε and γ were determined through a grid-search algorithm with cross-validation as described by Hsu et al. (2010). The SVM algorithms were developed using the Matlab libsvm toolbox (Chang and Lin 2011).
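The grid search with cross-validation can be sketched generically: every parameter combination is scored by its cross-validated RMSE and the best one is kept. This is not the libsvm implementation used in the study. The number of folds, the contiguous fold layout and the helper names are assumptions, and `toy_model` is a deliberately trivial stand-in for SVR training so that the sketch runs with the standard library alone.

```python
import itertools


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(fit_predict, x, y, grid, k=5):
    """Return (best_params, best_rmse) over all combinations in grid.

    fit_predict(x_train, y_train, x_valid, **params) must return predictions
    for x_valid; in practice it would train an SVR with C, epsilon and gamma.
    """
    folds = k_fold_indices(len(x), k)
    names = sorted(grid)
    best_params, best_rmse = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        squared_error, count = 0.0, 0
        for fold in folds:
            held_out = set(fold)
            x_train = [x[i] for i in range(len(x)) if i not in held_out]
            y_train = [y[i] for i in range(len(y)) if i not in held_out]
            predictions = fit_predict(x_train, y_train, [x[i] for i in fold], **params)
            squared_error += sum((p - y[i]) ** 2 for p, i in zip(predictions, fold))
            count += len(fold)
        rmse = (squared_error / count) ** 0.5
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params, best_rmse


def toy_model(x_train, y_train, x_valid, shrink):
    """Stand-in model (not SVR): predicts a scaled mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return [shrink * mean for _ in x_valid]


x_demo = [[float(i)] for i in range(10)]
y_demo = [2.0] * 10
best_params, best_rmse = grid_search_cv(toy_model, x_demo, y_demo,
                                        {"shrink": [0.5, 1.0, 1.5]}, k=5)
```

For an actual SVR, the grid would hold candidate values of C, ε and γ, typically spaced exponentially, and `fit_predict` would wrap the training and prediction calls of the chosen SVM library.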

2.6 Artificial Neural Network (ANN)

An ANN is a massively parallel distributed information processing system that has certain performance characteristics resembling the biological neural networks of the human brain (Haykin 1999). A neural network is characterized by its architecture, which represents the pattern of connections between nodes, by its method of determining the connection weights, and by its activation function. The most commonly used neural network structure is the feed-forward hierarchical architecture. A typical three-layered feed-forward neural network is composed of multiple elements, also called nodes, and the connection pathways that link them (Haykin 1999). The nodes are the processing elements of the network and are normally known as neurons, reflecting the fact that the neural network method is modeled on the biological neural network of the human brain. A neuron receives an input signal, processes it, and transmits an output signal to other interconnected neurons.

In the hidden and output layers, the net input to unit i is of the form

$$ Z={\displaystyle \sum_{j=1}^k{w}_{ji}{y}_j+{\theta}_i} $$

(12)

where w_ji is the weight of the connection from unit j to unit i, and k is the number of neurons in the layer above the layer that includes unit i. y_j is the output from unit j, and θ_i is the bias of unit i. This weighted sum Z, which is called the incoming signal of unit i, is then passed through a transfer function f to yield the estimate ŷ_i for unit i. The sigmoid function is continuous, differentiable everywhere, and monotonically increasing. The sigmoid transfer function, f_i, of unit i is of the form

$$ {\widehat{y}}_i=\frac{1}{1+{e}^{-Z}} $$

(13)

A training algorithm is needed to solve a neural network problem. Since there are so many types of algorithms available for training a network, selection of an algorithm that provides the best fit to the data is required. In the current research, the ANN models were trained using the Levenberg–Marquardt training algorithm. The sigmoid and linear activation functions were used for the hidden and output node(s), respectively.
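The forward pass of such a three-layer network, with sigmoid hidden nodes (Eqs. 12 and 13) and a linear output node, can be sketched as below. Training with the Levenberg–Marquardt algorithm is not shown; the function names and the weights in the example are hypothetical.

```python
import math


def sigmoid(z):
    """Eq. (13): logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a network with one sigmoid hidden layer and a linear output.

    hidden_weights[i] is the list of weights feeding hidden unit i (Eq. 12).
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
              for weights, bias in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical 2-2-1 network with zero hidden weights: each hidden unit outputs 0.5.
y_hat = forward([0.3, 0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 2.0], 0.1)
```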

3 Case Study

3.1 Observation Data and Statistical Analysis

The climatic data at the site, located near Ejina City (101°09′17.69″E, 41°58′53.95″N, altitude 927.32 m), were observed during the growing season of Phragmites communis, from May 9th to October 1st, 2004 (Fig. 1), a total of 146 days. An automatic weather measurement system was installed in a flat field with a Phragmites stand to measure the primary climatic parameters, including net radiation, soil heat flux, air temperature, water vapor pressure, humidity, wind speed and direction, dew point temperature and solar radiation. The detailed measurement system and methods can be found in Si et al. (2005).

The daily climatic data employed in this study were T_max, T_min, U₂ and R_s. The first 102 records, from May 9th to August 18th (about 70 % of the total data), were used for training the models, and the remaining 44 records, from August 19th to October 1st (about 30 %), were used for testing. The statistical parameters of the daily climatic data are shown in Table 1. U₂ shows a skewed distribution. According to the statistical properties of the data sets, no statistically significant differences between the training and testing divisions were observed. The training data are therefore considered to contain sufficient information about the system behavior to build a model.

Table 1 Statistical parameters of climatic data and PMF-56 ET₀ in each data set

In order to eliminate differences in dimension, all the climatic data were scaled to [0, 1] before being input to the SVM model. The scaling is defined as follows:

$$ {x}_{new}=\frac{x-{x}_{\min }}{x_{\max }-{x}_{\min }} $$

(14)

where x_new is the normalized value, x_min is the minimum of the data and x_max is the maximum of the data.
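The chronological 102/44 split and the min-max scaling of Eq. (14) can be sketched as follows. Whether x_min and x_max were taken from the whole record or from the training period only is not stated in the text, so the function accepts them as optional arguments; all names are hypothetical.

```python
def min_max_scale(values, x_min=None, x_max=None):
    """Eq. (14): scale values to [0, 1] using the given or observed extremes."""
    x_min = min(values) if x_min is None else x_min
    x_max = max(values) if x_max is None else x_max
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return [(v - x_min) / (x_max - x_min) for v in values]


def chronological_split(records, n_train):
    """First n_train records for training, the remainder for testing."""
    return records[:n_train], records[n_train:]


# 146 daily records: 102 for training and 44 for testing, as in the text.
days = list(range(146))
train_days, test_days = chronological_split(days, 102)
```

Passing the training-period extremes explicitly when scaling the testing data keeps information from the testing period out of the model inputs, at the cost of occasional values slightly outside [0, 1].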

3.2 Model Development

Selecting appropriate input variables is important for SVM and ANN model development, since they provide the basic information about the system being modeled. Temperature is the predominant physical factor in the evaporation process, so T_max and T_min were selected as the base inputs. Some studies have reported that R_s and U₂ are also effective variables for estimating ET₀ in arid and semiarid zones (Cobaner 2011; Tabari et al. 2012). In the current study, the ET₀ estimated by the SVM and ANN models was compared with the daily PMF-56 ET₀. To this end, various combinations of the daily climatic data T_max, T_min, U₂ and R_s were used as inputs to the SVM and ANN models. The four input combinations evaluated were (1) T_max and T_min; (2) T_max, T_min and U₂; (3) T_max, T_min and R_s; (4) T_max, T_min, U₂ and R_s.

3.3 Models Performance Criteria

The performance of the models developed in this research was assessed using standard statistical criteria: the coefficient of correlation (r), the root mean squared error (RMSE) and the mean absolute error (MAE). r measures the degree to which two variables are linearly related. RMSE and MAE provide different types of information about the predictive capabilities of the model. The RMSE weights large errors more heavily and is therefore most sensitive to the goodness-of-fit at high ET₀ values, whereas the MAE gives a more balanced view of the goodness-of-fit across moderate values.

The following equations were used for the computation of the above parameters:

$$ r=\frac{\sum_{i=1}^n\left(E{T}_{0i}^p-\overline{E{T_0}^p}\right)\left(E{T}_{0i}^o-\overline{E{T_0}^o}\right)}{\sqrt{\sum_{i=1}^n{\left(E{T}_{0i}^p-\overline{E{T_0}^p}\right)}^2\;\sum_{i=1}^n{\left(E{T}_{0i}^o-\overline{E{T_0}^o}\right)}^2}} $$

(15)

$$ RMSE=\sqrt{\frac{\sum_{i=1}^n{\left(E{T}_{0i}^p-E{T}_{0i}^o\right)}^2}{n}} $$

(16)

$$ MAE=\frac{1}{n}{\displaystyle {\sum}_{i=1}^n\left|E{T_0}_i^p-E{T_0}_i^o\right|} $$

(17)

where ET_0i^p and ET_0i^o are the ith estimated and PMF-56 ET₀ values, respectively; $ \overline{E{T_0}^p} $ and $ \overline{E{T_0}^o} $ are the averages of ET_0i^p and ET_0i^o; and n is the total number of data points. The best fit between observed and calculated values would have r = 1, RMSE = 0 and MAE = 0.
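The three criteria can be computed with a few lines of code. The sketch below uses the standard Pearson form for r, in which the denominator is the square root of the product of the two sums of squared deviations; the function names and the toy data are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Coefficient of correlation between estimated and PMF-56 ET0 values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    return covariance / math.sqrt(ss_p * ss_o)


def rmse(predicted, observed):
    """Root mean squared error (same units as ET0)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mae(predicted, observed):
    """Mean absolute error (same units as ET0)."""
    n = len(observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / n


# Toy data: perfectly correlated but biased estimates.
pred_demo = [1.0, 2.0, 3.0]
obs_demo = [2.0, 4.0, 6.0]
```

The toy data illustrate why r alone is not sufficient: the estimates are perfectly correlated with the observations (r = 1) even though every estimate is only half the observed value.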

In order to test the robustness of the developed model, it is important to test the model using some other performance evaluation criteria such as relative error (RE) and threshold statistics (TS) (Jain and Indurthy 2003). The TS for a level of x% is a measure of the consistency in modeling errors from a particular model. The TS are represented as TS_x and expressed as a percentage. This criterion can be expressed for different levels of relative error (RE) from the model.

$$ RE=\frac{\left|E{T_0}_i^p-E{T_0}_i^o\right|}{E{T_0}_i^o} $$

(18)

$$ T{S}_x=\frac{n_x}{n}\times 100 $$

(19)

where n_x is the number of data points for which the RE is less than x % and n is the total number of data points computed. Clearly, higher n_x and TS_x values indicate better model performance.
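The relative error and the threshold statistic of Eqs. (18) and (19) can be sketched as follows. The function names and toy values are hypothetical, and RE is converted to a percentage before it is compared with the threshold x.

```python
def relative_errors(predicted, observed):
    """Eq. (18): absolute error relative to the PMF-56 value, as a fraction."""
    return [abs(p - o) / o for p, o in zip(predicted, observed)]


def threshold_statistic(predicted, observed, x_percent):
    """Eq. (19): percentage of points whose relative error is below x percent."""
    errors = relative_errors(predicted, observed)
    n_x = sum(1 for e in errors if e * 100.0 < x_percent)
    return 100.0 * n_x / len(errors)


# Toy data with relative errors of 2.5 %, 8 %, 0 % and 20 %.
obs_ts = [4.0, 5.0, 2.0, 10.0]
pred_ts = [4.1, 5.4, 2.0, 8.0]
```

With these toy values, three of four estimates fall below the 10 % level and two of four below the 5 % level, giving TS₁₀ = 75 % and TS₅ = 50 %.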

3.4 Results and Discussion

The performance of the SVM models for PMF-56 ET₀ and the parameters C, ε and γ of the optimum SVM models are given in Table 2. All of the models performed similarly in the training and testing periods: the values of RMSE and MAE do not vary significantly, and all r values are very close to unity. In the testing period, the SVM4 and SVM3 models were better than the SVM1 and SVM2 models for PMF-56 ET₀ estimation (Table 2). SVM4, whose inputs were T_max, T_min, U₂ and R_s, had the smallest RMSE (0.262 mm/day) and MAE (0.207 mm/day) and the highest r (0.950) of all the models in the testing period. Therefore, it was selected as the best-fit model for estimating the PMF-56 ET₀. The SVM3 model, whose inputs were T_max, T_min and R_s, provided the second-best PMF-56 ET₀ estimation, with an RMSE of 0.282 mm/day, an MAE of 0.228 mm/day and an r of 0.946. Comparison of the performance statistics showed that the SVM4 and SVM3 models performed similarly. For practical applications, both had good accuracy in PMF-56 ET₀ modeling, and the choice between them should depend on the available meteorological data. In particular, SVM3, which needs only T_max, T_min and R_s, performed well in PMF-56 ET₀ modeling and could be used in developing areas with limited weather data.

Table 2 Optimal SVM parameters and the performance statistics of SVM models during training and testing periods

The ET₀ values computed by the PMF-56 equation and the values estimated by the SVM4 and SVM3 models are compared in Fig. 2, in the form of line graphs and scatter plots. The ET₀ values estimated by the SVM models are close to those computed using the PMF-56 equation and follow the same trend. This consistency shows that the two models estimated the PMF-56 ET₀ accurately (Fig. 2).

In order to evaluate the ability of the SVM model relative to the ANN model, four ANN models were developed using the same variable combinations for ET₀ modeling. The optimal number of neurons in the hidden layer was identified by trial and error, varying the number of hidden neurons from 2 to 15, and the network architecture with the minimum mean squared error (MSE) was selected. The final ANN architecture and the performance statistics of each model are shown in Table 3. According to the testing period results, the ANN4 (4-2-1) model, with the input combination T_max, T_min, U₂ and R_s, performed best, with the smallest RMSE (0.322 mm/day) and MAE (0.268 mm/day) and the highest r (0.937). The ANN3 (3-3-1) model, whose inputs were T_max, T_min and R_s, ranked second in ET₀ estimation, with an RMSE of 0.337 mm/day, an MAE of 0.268 mm/day and an r of 0.923. However, a comparison of the performance criteria for the ANN models (Table 3) with those of the SVM models (Table 2) showed that all the SVM models performed better than the corresponding ANN models in modeling the PMF-56 ET₀.

Table 3 The structure and the performance statistics of ANN models during training and testing periods

It is important to evaluate not only the average estimation error but also the distribution of estimation errors when assessing the applicability of any model for ET₀. Comparing the best SVM model (SVM4) with the best ANN model (ANN4), SVM4 gave 28 estimates with a relative error below 10 % in the testing period, while ANN4 gave 23. Furthermore, SVM4 had 15 estimates with a relative error below 5 %, while ANN4 had 13. The SVM model thus yielded more accurate results than the ANN model.

The ET₀ estimated by the best ANN model, ANN4, is compared with the PMF-56 ET₀ in Fig. 3. Compared with Fig. 2 for the SVM4 model, it further confirms that, although the ANN and SVM models performed comparably overall, the SVM models provided more accurate ET₀ estimates than the ANN models during the more important independent testing stage. Overall, the results confirm the capability of SVM models for ET₀ estimation.

The performance of the SVM models was further compared with three empirical models: the Priestley–Taylor, Hargreaves and Ritchie equations. These empirical models were first applied to calculate evapotranspiration from the training data and then calibrated against the PMF-56 ET₀ using Eq. (5).

The Priestley–Taylor, Hargreaves and Ritchie equations were calibrated with the coefficients a and b. The performance statistics of each model in the testing period are given in Table 4. The Priestley–Taylor equation performed best among the empirical models, with the smallest RMSE and MAE and the highest r. The Ritchie equation performed second best, and the Hargreaves model performed worst in PMF-56 ET₀ estimation. Compared with SVM4 in Table 2 (r = 0.950), the Priestley–Taylor equation had a slightly higher r (0.951); r, however, only describes the linear dependence between observations and the corresponding estimates and does not always agree with error-based criteria such as the RMSE. In the present study the RMSE is the main performance criterion, and the best model was selected on that basis. From this viewpoint, the SVM4 model gave more accurate results than the empirical models in modeling PMF-56 ET₀.

Table 4 The calibration coefficients of the empirical models and performance statistics of the empirical models during testing periods

Regarding the distribution of estimation errors in the testing period, the Priestley–Taylor, Hargreaves and Ritchie methods had 16, 3 and 19 estimates with a relative error below 10 %, respectively, and 7, 0 and 6 estimates with a relative error below 5 %. From the viewpoint of relative error, the SVM4 model therefore still performed better than the empirical methods.

The ET₀ estimates of the empirical methods are illustrated in Fig. 4 in the form of line graphs and scatter plots. All of the empirical models underestimated the ET₀ values calculated by the PMF-56 model. The comparison shows that the SVM models performed better than the empirical equations.

The total PMF-56 ET₀ obtained from the estimated daily ET₀ values was also compared because of its importance in water balance calculation and in water resources planning and management. The total estimated ET₀ amounts in the testing period are given in Table 5. All models underestimated the total PMF-56 ET₀ in the testing period. The SVM4 and ANN4 models, whose inputs were T_max, T_min, U₂ and R_s, estimated the total PMF-56 ET₀ of 94.14 mm as 90.27 mm and 89.03 mm, underestimations of 4.1 % and 5.4 %, respectively. SVM3 and the Priestley–Taylor, Hargreaves and Ritchie equations estimated the total as 89.05 mm, 86.31 mm, 67.91 mm and 89.88 mm, underestimations of 5.4 %, 8.3 %, 27.9 % and 4.5 %, respectively. The totals estimated by SVM4, SVM3, ANN4 and the Ritchie equation were closest to the PMF-56 total. Among the models, SVM4 gave the best estimate (−4.1 %) and the Ritchie equation the second best (−4.5 %), while the Hargreaves equation gave the worst (−27.9 %).
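The underestimation percentages follow directly from the reported totals, as the short check below shows. The totals are those quoted in the text; the function and variable names are hypothetical.

```python
def underestimation_percent(total_estimated, total_reference):
    """Percentage by which an estimated total falls short of the reference total."""
    return 100.0 * (total_reference - total_estimated) / total_reference


PMF56_TOTAL = 94.14  # mm, total PMF-56 ET0 in the testing period

model_totals = {
    "SVM4": 90.27,
    "ANN4": 89.03,
    "SVM3": 89.05,
    "Priestley-Taylor": 86.31,
    "Hargreaves": 67.91,
    "Ritchie": 89.88,
}

shortfalls = {name: round(underestimation_percent(total, PMF56_TOTAL), 1)
              for name, total in model_totals.items()}
```

For SVM4, for example, (94.14 − 90.27) / 94.14 × 100 ≈ 4.1 %, which matches the reported value.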

Table 5 Total ET₀ values calculated by various models during testing period

As a whole, the findings of this study show that the SVM model was more adequate than the ANN and the Priestley–Taylor, Hargreaves and Ritchie equations for ET₀ modeling, and that it can be employed successfully for ET₀ estimation in extreme arid regions with limited climatic data. The SVM3 model, which needs only T_max, T_min and R_s, can be considered a simple model that offers significant potential for accurate estimation of daily ET₀. The SVM4 model, with T_max, T_min, U₂ and R_s as input variables, exhibited good daily ET₀ estimation ability and produced better results. Both are recommended when the meteorological data required by the PMF-56 equation are lacking in extreme arid regions.

There are a few limitations when the SVM model is applied in practice. For many data-driven techniques, the amount of data used to develop the model limits performance. Reliable forecasts normally require long-term weather data, but our computations show that the SVM model remains reliable in ET₀ modeling even when short-term weather records are used. The other limitation is that the model was developed using data from a single site. However, this should not be seen as a major problem, since the analysis can easily be widened if data from other stations become available. More data from different sources would allow the model to capture the patterns of data from a wider range of scenarios, thus increasing the geographical scope of its validity (Adeloye et al. 2012).

4 Conclusions

The accurate estimation of evapotranspiration is one of the most important issues in the management of water resources. This work investigated the applicability of SVM for daily ET₀ modeling using limited climatic data in the extremely arid region of the Ejina basin, northwestern China. Four models were developed using different combinations of four daily climatic variables: T_max, T_min, R_s and U₂. The developed SVM models were tested against the ET₀ calculated by PMF-56. The results demonstrated that SVM can be applied successfully for accurate and reliable PMF-56 ET₀ modeling. In particular, the SVM model whose inputs were T_max, T_min and R_s provided good ET₀ estimates, which is especially valuable in developing areas where reliable weather data sets are limited.

In the comparison of the SVM models with the ANN and with empirical models (the Priestley–Taylor, Hargreaves and Ritchie equations), the SVM gave more accurate estimates of PMF-56 ET₀. SVM can therefore be used successfully for modeling daily PMF-56 ET₀ when climatic data are limited in extreme arid regions.

References

Adeloye AJ, Rustum R, Kariyama ID (2012) Neural computing modeling of the reference crop evapotranspiration. Environ Model Softw 29(1):61–73
Allen RG, Smith M, Pereira LS (1994) An update for the definition of reference evapotranspiration. ICID Bull 43:1–34
Allen RG, Pereira LS, Raes D, Smith M (1998) Crop evapotranspiration-guidelines for computing crop water requirements. FAO irrigation and drainage paper no. 56. FAO, Rome
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM Press, pp 144–152
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chauhan S, Shrivastava RK (2008) Performance evaluation of reference evapotranspiration estimation using climate based methods and artificial neural networks. Water Resour Manag 23(5):825–837
Citakoglu H, Cobaner M, Haktanir T, Kisi O (2013) Estimation of monthly mean reference evapotranspiration in Turkey. Water Resour Manag 28(1):99–113
Cobaner M (2011) Evapotranspiration estimation by two different neuro-fuzzy inference systems. J Hydrol 398:292–302
El-Shafie A, Najah A, Alsulami HM, Jahanbani H (2014) Optimized neural network prediction model for potential evapotranspiration utilizing ensemble procedure. Water Resour Manag 28(4):947–967
Hargreaves GH, Samani ZA (1985) Reference crop evapotranspiration from temperature. Appl Eng Agric 1(2):96–99
Haykin S (1999) Neural network-a comprehensive foundation. Prentice-Hall, Englewood Cliffs
He Z, Wen X, Liu H, Du J (2014) A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region. J Hydrol 509:379–386
Hou LG, Xiao HL, Si JH, Xiao SC, Zhou MX, Yang YG (2010) Evapotranspiration and crop coefficient of Populus euphratica Oliv forest during the growing season in the extreme arid region northwest China. Agric Water Manag 97(2):351–356
Hsu CW, Chang CC, Lin CJ (2010) A practical guide to support vector classification. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Jain A, Indurthy SKVP (2003) Comparative analysis of event-based rainfall-runoff modeling techniques-deterministic, statistical, and artificial neural networks. J Hydrol Eng ASCE 8(2):93–98
Jones JW, Ritchie JT (1990) Crop growth models. In: Hoffman GJ, Howel TA, Solomon KH (eds) Management of farm irrigation systems. ASAE Monograph No. 9. ASAE, St. Joseph, Mich., pp 63–89
Kisi O (2012) Least squares support vector machine for modeling daily reference evapotranspiration. Irrig Sci. doi:10.1007/s00271-012-0336-2
Kisi O, Cengiz TM (2013) Fuzzy genetic approach for estimating reference evapotranspiration of Turkey: Mediterranean Region. Water Resour Manag 27(10):3541–3553
Kisi O, Cimen M (2009) Evapotranspiration modeling using support vector machines. Hydrol Sci J 54(5):918–928
Laaboudi A, Mouhouche B, Draoui B (2012) Neural network approach to reference evapotranspiration modeling from limited climatic data in arid regions. Int J Biometeorol 56(5):831–841
Lin GF, Lin HY, Wu MC (2013) Development of a support-vector-machine-based model for daily pan evaporation estimation. Hydrol Process 22:3115–3127
López-Urrea R, de Santa M, Olalla F, Fabeiro C, Moratalla A (2006) Testing evapotranspiration equations using lysimeter observations in a semi-arid climate. Agric Water Manag 85:15–26
Müller K, Smola A, Rätsch G, Schölkopf B, Kohlmorgen J, Vapnik VN (1997) Predicting time series with support vector machines. Artif Neural Networks—ICANN 97(1327):999–1004
Priestley CHB, Taylor RJ (1972) On the assessment of surface heat flux and evaporation using large scale parameters. Mon Weather Rev 100:81–92
Principe JC, Euliano NR, Lefebvre CW (2000) Neural and adaptive systems: fundamentals through simulations. Wiley, New York
Raghavendra NS, Deka PC (2014) Support vector machine applications in the field of hydrology: a review. Appl Soft Comput 19:372–386
Rahimikhoob A (2014) Comparison between M5 model tree and neural networks for estimating reference evapotranspiration in an arid environment. Water Resour Manag 28(3):657–669
Si JH, Feng Q, Zhang XY, Liu W, Su YH, Zhang YW (2005) Growing season evapotranspiration from Tamarix ramosissima stands under extreme arid conditions in northwest China. Environ Geol 48(7):861–870
Tabari H, Kisi O, Ezani A, Hosseinzadeh Talaee P (2012) SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J Hydrol 444–445:78–89
Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

Acknowledgments

This work was funded by the National Basic Research Program of China (2013CB429906). The authors also wish to thank the anonymous reviewers for their reading of the manuscript and for their suggestions and critical comments.

Author information

Authors and Affiliations

Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences, No. 320 Donggang West Road, Lanzhou, 730000, Gansu Province, China
Xiaohu Wen, Jianhua Si, Zhibin He & Haijiao Yu
Next Fuel Inc., 122 North Main Street, Sheridan, WY, 82801, USA
Jun Wu
Key Laboratory of Coastal Biology & Bioresources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, 264003, People’s Republic of China
Hongbo Shao
Institute of Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
Hongbo Shao

Corresponding authors

Correspondence to Xiaohu Wen or Hongbo Shao.

About this article

Cite this article

Wen, X., Si, J., He, Z. et al. Support-Vector-Machine-Based Models for Modeling Daily Reference Evapotranspiration With Limited Climatic Data in Extreme Arid Regions. Water Resour Manage 29, 3195–3209 (2015). https://doi.org/10.1007/s11269-015-0990-2

Received: 23 December 2014
Accepted: 24 March 2015
Published: 11 April 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11269-015-0990-2
