Introduction

In the simplest terms, evapotranspiration (ET) can be defined as the amount of water lost from a cropped surface to the surrounding atmosphere through transpiration from the leaves and evaporation from the soil surface. ET estimation is central to many agricultural and hydrological operations, ranging from irrigation water allocation and the subsequent design of conveyance systems to regional and global water balance studies.

Different crops have different ET characteristics, which also vary spatially and temporally. For instance, a paddy crop of a particular variety growing at a certain location requires varying amounts of water at different growth stages, and the same variety may demand a different irrigation regime at another location. Therefore, a single ET model may not be appropriate for the very same crop at two geographical locations. To address the complexity arising from such situations, the concept of a reference surface was introduced. The ET rate of a crop is related to the reference surface evapotranspiration, or reference evapotranspiration (ETO), by a multiplicative factor known as the crop coefficient. FAO-56 (Allen et al. 1998) defines the reference surface as “a hypothetical reference crop with an assumed crop height of 0.12 m, a fixed surface resistance of 70 s m−1 and an albedo of 0.23”.
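
Written out in the single crop coefficient form used by FAO-56, the multiplicative relationship described above reads as follows, where ETc is the crop evapotranspiration [mm day−1] and Kc is the dimensionless crop coefficient:

$$ET_{c}=K_{c}\,ET_{o}$$

As a purely illustrative case, a hypothetical Kc of 1.2 applied to an ETO of 4.0 mm day−1 would give ETc = 1.2 × 4.0 = 4.8 mm day−1.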

Prior to the 1990 FAO expert consultation, the Food and Agriculture Organization (FAO) endorsed the Blaney–Criddle, radiation, modified Penman, and pan evaporation methods. However, a performance evaluation of these and several other estimation procedures, carried out by the Committee on Irrigation Water Requirements of the American Society of Civil Engineers (ASCE), showed that the methods performed inconsistently across different climates. This led to the recommendation that the Penman–Monteith combination method be adopted as the new standard for ETO estimation, in the form that became known as the FAO-56 Penman–Monteith (FAO-56 PM) equation. The recommendation has been supported by multiple subsequent studies evaluating the performance of the FAO-56 PM equation (Kashyap and Panda 2001; Irmak et al. 2003a, b; Garcia et al. 2004; Temesgen et al. 2005; Allen et al. 2005, 2006; Jabloun and Sahli 2008). The equation is given by:

$${ET}_{o}= \frac{0.408\Delta \left({R}_{n}-G\right)+\gamma \frac{900}{T+273}{U}_{2}({e}_{s}-{e}_{a})}{\Delta +\gamma (1+0.34{U}_{2})},$$
(1)

where ETO is the reference evapotranspiration [mm day−1], Rn is the net radiation at the crop surface [MJ m−2 day−1], G is the soil heat flux density [MJ m−2 day−1], T [°C] and U2 [m s−1] are the mean air temperature and the wind speed, respectively, measured at 2 m above the ground surface, es and ea are the saturation and actual vapour pressures [kPa], respectively, es − ea is the saturation vapour pressure deficit [kPa], ∆ is the slope of the saturation vapour pressure curve [kPa °C−1], and γ is the psychrometric constant [kPa °C−1].
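
To make the structure of Eq. (1) concrete, the Python sketch below evaluates it term by term. The helpers for ∆, γ and the monthly soil heat flux are the standard FAO-56 expressions (Eqs. 7, 8, 11, 13 and 44); they are not taken from DSS_ET, and all function and argument names are illustrative. The usage lines plug in the intermediate daily values of the Brussels (6 July) worked example in FAO-56, for which the guideline reports ETO ≈ 3.9 mm/day.

```python
import math


def saturation_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure e°(T) [kPa] at air temperature t_c [°C] (FAO-56 Eq. 11)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_svp_curve(t_c: float) -> float:
    """Slope of the saturation vapour pressure curve, delta [kPa °C-1] (FAO-56 Eq. 13)."""
    return 4098.0 * saturation_vapour_pressure(t_c) / (t_c + 237.3) ** 2


def psychrometric_constant(altitude_m: float) -> float:
    """Psychrometric constant gamma [kPa °C-1] from station altitude (FAO-56 Eqs. 7 and 8)."""
    pressure_kpa = 101.3 * ((293.0 - 0.0065 * altitude_m) / 293.0) ** 5.26
    return 0.665e-3 * pressure_kpa


def soil_heat_flux_monthly(t_month: float, t_prev_month: float) -> float:
    """Monthly G [MJ m-2 day-1] from the previous month's mean temperature (FAO-56 Eq. 44)."""
    return 0.14 * (t_month - t_prev_month)


def fao56_pm(rn: float, g: float, t_mean: float, u2: float,
             vpd: float, delta: float, gamma: float) -> float:
    """Reference evapotranspiration ETo [mm day-1] from Eq. (1).

    rn, g  : net radiation and soil heat flux density [MJ m-2 day-1]
    t_mean : mean air temperature at 2 m [°C]
    u2     : wind speed at 2 m [m s-1]
    vpd    : saturation vapour pressure deficit es - ea [kPa]
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * vpd
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Daily inputs of the FAO-56 Brussels example (altitude 100 m); G is taken as 0 for a daily step.
eto = fao56_pm(rn=13.28, g=0.0, t_mean=16.9, u2=2.078, vpd=0.589,
               delta=slope_svp_curve(16.9), gamma=psychrometric_constant(100.0))
print(f"ETo = {eto:.2f} mm/day")  # roughly 3.9 mm/day, the value FAO-56 reports for this example
```

For a monthly time step such as the one used in this study, G is no longer negligible, which is why the sketch includes the FAO-56 monthly estimate based on the change in mean monthly temperature.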

Planners commonly rely on simpler temperature-based equations for calculating ET (Xu and Singh 2001). Because ET is sensitive to a number of variables, the resulting estimates can be substantially biased. In such circumstances, linear regression equations offer an alternative owing to their structural simplicity, and linear regression-based models have frequently been used to estimate ETO (Irmak et al. 2003b; Kovoor and Nandagiri 2007; Cristea et al. 2013; Yirga 2019). In this paper, we analyse the relationship between meteorological variables and ETO on a monthly time scale in the temperate Kashmir Valley.

Materials and methods

Study area and meteorological data

The study area for this analysis is the Kashmir Valley. With a mean altitude of 1545 m, the Valley is situated between 32° 22′ and 34° 43′ N latitude and 73° 52′ and 75° 42′ E longitude (Fig. 1). Bounded by the Pir-Panjal range and the Greater Himalayas, the Valley is famous the world over for its snow-clad mountains, pristine lakes, and gushing streams. The Valley has four distinct seasons: spring, summer, autumn, and winter. Its climate falls in the Dfb category of the Köppen climate classification.

Fig. 1 Meteorological stations considered for the study

The meteorological data required for the Srinagar, Qazigund, and Kupwara stations were acquired from the National Data Centre (NDC), Pune. The dataset comprised monthly values of maximum temperature (Tmax), minimum temperature (Tmin), wind speed (WS), relative humidity (RH), and sunshine hours (SSH) for a period of 20 years (2000–2019). Missing data, if any, were estimated following the procedure laid out in FAO-56 (Allen et al. 1998). Table 1 presents the location of each station and the annual average values of the primary meteorological variables.
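
The monthly Tmax, Tmin, RH and WS records enter Eq. (1) through es, ea and U2. The sketch below shows the standard FAO-56 way of deriving these from monthly data; it assumes that RH is the monthly mean relative humidity and that the anemometer height is known. The fallbacks for missing humidity and wind are FAO-56's general suggestions, included for illustration only: the paper does not detail which substitutes DSS_ET applied, and the names and sample values below are hypothetical.

```python
import math
from typing import Optional

DEFAULT_U2 = 2.0  # m s-1; FAO-56's temporary global average when wind data are missing


def svp(t_c: float) -> float:
    """Saturation vapour pressure e°(T) [kPa] (FAO-56 Eq. 11)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def mean_saturation_vp(t_max: float, t_min: float) -> float:
    """es [kPa]: mean of e°(Tmax) and e°(Tmin), not e° at the mean temperature (FAO-56 Eq. 12)."""
    return (svp(t_max) + svp(t_min)) / 2.0


def actual_vp(t_max: float, t_min: float, rh_mean: Optional[float] = None) -> float:
    """ea [kPa] from mean relative humidity in percent (FAO-56 Eq. 19).

    If RH is missing, fall back to ea = e°(Tmin), i.e. assume the dewpoint
    is close to Tmin, as FAO-56 suggests for well-watered reference sites.
    """
    if rh_mean is None:
        return svp(t_min)
    return rh_mean / 100.0 * mean_saturation_vp(t_max, t_min)


def wind_at_2m(u_z: float, z_m: float) -> float:
    """Convert wind speed measured at height z_m [m] to the 2 m height (FAO-56 Eq. 47)."""
    return u_z * 4.87 / math.log(67.8 * z_m - 5.42)


# Hypothetical monthly means, not NDC data.
es = mean_saturation_vp(24.5, 15.0)
ea = actual_vp(24.5, 15.0, rh_mean=60.0)
print(f"es = {es:.3f} kPa, ea = {ea:.3f} kPa, deficit = {es - ea:.3f} kPa")
```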

Table 1 Information regarding the location and the annual average values of the primary meteorological data

Reference evapotranspiration estimation

The FAO-56 PM equation (Allen et al. 1998) was used to calculate ETO for the Srinagar, Qazigund, and Kupwara stations over the 20-year period using DSS_ET, a decision support system for ET estimation (Bandyopadhyay et al. 2012). DSS_ET implements a variety of ET models and also provides features for estimating missing data, visualizing the results, and basic statistical analysis for the performance evaluation of ETO models, which makes it a useful tool for researchers.

Multiple linear regression

A generic equation for multiple linear regression (MLR) can be expressed as

$$Y= {\alpha }_{0}+{\mu }_{1}{X}_{1}+ {\mu }_{2}{X}_{2}+\dots + {\mu }_{n}{X}_{n}+\varepsilon ,$$
(2)

where \(Y\) is the dependent variable, \({\alpha }_{0}\) is the intercept, \({\mu }_{1}, {\mu }_{2}, \dots, {\mu }_{n}\) are the coefficients of the multiple linear regression equation, \({X}_{1}, {X}_{2}, \dots, {X}_{n}\) are the independent variables, and \(\varepsilon\) is the error term. Fitting the multiple linear regression equation is essentially a minimization problem: the coefficients are estimated so that the sum of the squared error terms is minimized (ordinary least squares).
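
The least-squares fit of Eq. (2) can be sketched in a few lines of NumPy. This is a generic illustration, not the software used in the study: the four predictor columns are synthetic stand-ins for variables such as Tmin, RH, WS and SSH, and the "true" coefficients are hypothetical values, not those of Table 2. The same R2 is what a plot of predicted against target values summarizes.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit of Eq. (2).

    X : (m, n) array of predictors; y : (m,) array of the dependent variable.
    Returns the intercept alpha_0, the coefficients mu_1..mu_n and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])     # leading column of ones gives the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # minimizes the sum of squared errors
    residuals = y - design @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1:], r2


# Synthetic example: 240 "months" (20 years x 12) with four stand-in predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(240, 4))
true_mu = np.array([0.12, -0.02, 0.25, 0.15])  # hypothetical, not from Table 2
y = 0.5 + X @ true_mu + rng.normal(scale=0.05, size=240)

alpha0, mu, r2 = fit_mlr(X, y)
print(f"alpha_0 = {alpha0:.3f}, mu = {np.round(mu, 3)}, R2 = {r2:.3f}")
```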

Taking the ETO values calculated with the FAO-56 PM equation as the dependent variable, multiple regression analysis was performed for each station to develop linear models for estimating monthly ETO from the meteorological variables (Tmax, Tmin, WS, RH, and SSH). The basic assumptions of multiple linear regression were checked, and the variables exhibiting multicollinearity were removed. In addition, Hargreaves and Samani (HAR) ETO values were calculated using DSS_ET and regressed against FAO-56 PM ETO, yielding a linear model for each station that estimates FAO-56 PM ETO in terms of HAR ETO.

Results and discussion

Spatiotemporal variation in the ETO values in the study area

Figure 2 presents the temporal variation of the average daily ETO values calculated by the FAO-56 PM equation for the stations under consideration. ETO peaks between May and August, with mean maximum values of 4.187, 4.112, and 3.920 mm/day for Srinagar, Kupwara, and Qazigund, respectively. The corresponding overall annual mean ETO values are 2.549, 2.409, and 2.424 mm/day.

Fig. 2 Temporal variation of ETO values for the stations

Modeling ETO using multiple linear regression

After eliminating the highly correlated explanatory variables that made the regression singular, linear relationships were identified between the independent variables Tmin, RH, WS, and SSH and the dependent variable ETO (Fig. 3a–l). Based on these relationships, linear models for reference evapotranspiration that were significant at the 0.05 level were developed for the Srinagar, Kupwara, and Qazigund stations. Brief results of the multiple linear regression are presented in Table 2, and the units of measurement of all the variables are given in Table 3. Figure 4a–c shows the R2 values obtained for the plots of predicted ETO against FAO-56 PM ETO. The normal probability plots (Fig. 5) are approximately straight lines, indicating no significant departure from the normal distribution, and the residual plots (Fig. 6a–l) show no signs of heteroscedasticity. Because meteorological variables are physically interrelated, some degree of multicollinearity is unavoidable. Some studies treat a variance inflation factor (VIF) above 5 as an indication of significant collinearity, while others set the limit at 10; some even argue that a VIF of the order of 40 or higher need not harm model performance (O’Brien 2007). In this study, all VIFs were below 5, the highest being 3.73.
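
The paper reports VIFs without showing how they were obtained; the sketch below uses the standard definition VIF_j = 1/(1 − R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The helper names, the threshold argument and the synthetic columns a, b and c are hypothetical; with real data the columns would be the candidate predictors (e.g. Tmin, RH, WS, SSH).

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2_j = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        # Grows without bound as column j approaches a linear combination of the others.
        vifs[j] = 1.0 / (1.0 - r2_j)
    return vifs


def flag_collinear(names, vifs, threshold=5.0):
    """Return the predictors whose VIF exceeds the chosen cut-off (5 here; some authors use 10)."""
    return [name for name, v in zip(names, vifs) if v > threshold]


# Synthetic predictors: b is moderately correlated with a, c is almost a copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=240)
b = 0.6 * a + 0.8 * rng.normal(size=240)
c = a + 0.05 * rng.normal(size=240)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 2), flag_collinear(["a", "b", "c"], vifs))
```

With only two predictors the VIF reduces to 1/(1 − r^2), where r is their Pearson correlation, which offers a quick sanity check of the implementation.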

Fig. 3 Correlations of the FAO-56 PM ETO with the predictor variables for Srinagar (a–d), Kupwara (e–h), and Qazigund (i–l)

Table 2 Multiple linear regression models developed for the stations

Table 3 Units of measurement of the variables in the developed regression models

Fig. 4 Plots of FAO-56 PM ETO against ETO predicted by the developed linear models for the Srinagar (a), Kupwara (b), and Qazigund (c) stations

Fig. 5 Normal probability plots of the developed linear models for the Srinagar (a), Kupwara (b), and Qazigund (c) stations

Fig. 6 Residual plots for the linear models developed for the Srinagar (a–d), Kupwara (e–h), and Qazigund (i–l) stations

Station-wise calibration of the Hargreaves and Samani equation

The Hargreaves and Samani (HAR) equation (Hargreaves and Samani 1985) is a very simple equation that determines ETO from records of maximum and minimum air temperature, together with extraterrestrial radiation computed from latitude and day of the year. It is frequently used in data-scarce regions, where temperature is often the only variable recorded, and in places where acquiring meteorological data is laborious and uneconomical. Even where data are available, field practitioners are often tempted to rely on this method because of its structural simplicity. Compared with the FAO-56 PM equation, the HAR equation significantly overestimated ETO for Srinagar (199 mm/year), Kupwara (266 mm/year), and Qazigund (209 mm/year). Therefore, a linear regression between the two models was set up, with HAR ETO as the independent variable and FAO-56 PM ETO as the dependent variable. The results of the regression analysis are presented in Table 4.
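
The sketch below illustrates both steps: computing HAR ETO in the form given in FAO-56 (Eq. 52, with Ra from Eqs. 21 to 25) and fitting the station-wise calibration line FAO-56 PM ETO = a + b × HAR ETO by least squares. It is not the DSS_ET implementation; the function names, the mid-month day-of-year rule and the sample inputs are assumptions for illustration, and the actual calibration coefficients are those in Table 4.

```python
import math

import numpy as np

GSC = 0.0820  # solar constant [MJ m-2 min-1]


def extraterrestrial_radiation(lat_deg: float, doy: int) -> float:
    """Ra [MJ m-2 day-1] for latitude lat_deg and day of year doy (FAO-56 Eqs. 21-25)."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)    # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # solar declination [rad]
    ws = math.acos(-math.tan(phi) * math.tan(decl))              # sunset hour angle (non-polar latitudes)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl) + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def hargreaves_samani(t_max: float, t_min: float, ra: float) -> float:
    """HAR ETo [mm day-1]; the factor 0.408 converts Ra from MJ m-2 day-1 to mm day-1."""
    t_mean = (t_max + t_min) / 2.0
    return 0.0023 * (t_mean + 17.8) * math.sqrt(t_max - t_min) * 0.408 * ra


def calibrate_against_pm(eto_har, eto_pm):
    """Least-squares line ETo_PM = a + b * ETo_HAR; returns (a, b)."""
    b, a = np.polyfit(np.asarray(eto_har, float), np.asarray(eto_pm, float), 1)
    return a, b


# Mid-July at roughly Srinagar's latitude; temperatures are hypothetical, not station data.
doy_mid_july = int(30.4 * 7 - 15)  # approximate mid-month day of year for monthly steps
ra = extraterrestrial_radiation(34.1, doy_mid_july)
print(f"Ra = {ra:.1f} MJ m-2 day-1, HAR ETo = {hargreaves_samani(31.0, 18.0, ra):.2f} mm/day")
```

Given paired monthly series of HAR and FAO-56 PM ETO for a station, `calibrate_against_pm` returns the intercept and slope of the correction line; applying it to new HAR estimates then removes the systematic overestimation described above.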

Table 4 Regression equation for FAO-56 ETO in terms of HAR ETO

Conclusions

Reference evapotranspiration (ETO) values calculated using the FAO-56 PM method were used to carry out multiple linear regression and develop linear models for the estimation of ETO at three stations in the Kashmir Valley. For all the stations, ETO correlated most strongly with minimum temperature (Tmin), followed by sunshine hours (SSH), relative humidity (RH), and wind speed (WS). The residual analysis showed no signs of heteroscedasticity, and the normal probability plots were also satisfactory. All the models performed very well, with R2 > 0.96 for each model. The developed models are structurally simple, require no complex intermediate calculations, and can therefore be used effectively in field practice. For cases in which the only measured data are maximum and minimum temperature, the Hargreaves and Samani equation was calibrated for each station, given its overestimation of ETO relative to the FAO-56 Penman–Monteith equation.