1 Introduction

In agriculture-dominated systems, agrochemicals are used extensively and are consequently the main sources of surface water and groundwater pollution (Almasri 2008). The major issue affecting these systems is their vulnerability to water and agrochemical losses from the surface soil profile, where the crops' effective rooting system is located. This vulnerability is governed by the combined effects of the soil's physical fertility, the crop type, and the agricultural practices (Kay et al. 2009). The most widely used method to determine water and agrochemical losses from agricultural land is process-based mathematical modeling, which describes water movement and the transport and transformation of dissolved species through the soil profile (Jarvis 1994; RZWQM team 1998; Jasson and Karlberg 2004; Simunek et al. 2008). Simpler models such as classification indices (vulnerability or risk indices), on the other hand, use fewer and more accessible data such as climatic data, topography, and general soil physical properties (Gogu and Dassargues 2000). These empirical models can be applied easily at large scales using Geographical Information Systems (GIS) and can efficiently support agricultural management through Decision Support Systems (Manos et al. 2010a, b; Voudouris et al. 2010). The most popular vulnerability indices that use weights and ratings to describe the intrinsic vulnerability of groundwater to contamination are the DRASTIC index (Aller et al. 1985), the SINTACS index (Civita and De Maio 1997), the RISKE index (Petelte-Giraude et al. 2000), COP (Zwahlen 2003), and MERLIN (Aveline et al. 2009).

When the intrinsic vulnerability of groundwater is studied, errors are introduced by the uncertainties in the field measurements used for calibration. It has been demonstrated that field measurements are affected by land use (especially by human-induced processes) and by the hydraulic gradients in the saturated zone (due either to the aquifer’s nature or to groundwater abstraction) (Barnes and Raymond 2010). In general, these factors lead to a misinterpretation of the intrinsic groundwater vulnerability to pollution. Stigter et al. (2006) note a limitation of the DRASTIC method, which ascribes great significance to the attenuation capacity of the hydrogeological parameters involved. A different approach to the classical DRASTIC index was presented by Leone et al. (2009), in which DRASTIC was determined separately for different land uses so that it acts as a risk index rather than as an index of intrinsic groundwater vulnerability. This approach was supported by additional results from the process-based model Groundwater Loading Effects of Agricultural Management Systems (GLEAMS) (Leonard et al. 1987). In many case studies that apply classification indices to agricultural land (Aller et al. 1985; Civita and De Maio 1997), additional errors may arise from the assumption that nitrate is a conservative substance. This assumption ignores the origin of nitrate in the nitrification of ammonia species from fertilizers and in organic matter mineralization, the denitrification that occurs in anaerobic environments, and the nitrogen immobilization that occurs when soil C/N ratios are high.

On the other hand, the direct use of process-based simulation models to predict nitrate leaching at the watershed scale is more valid but more difficult to handle because of the models' complexity and data requirements. Carey and Lloyd (1985) used a numerical distributed transport model with a primitive GIS in order to simulate nitrate concentrations in groundwater. Several process-based models combined with GIS, such as the NLEAP model (Pierce et al. 1991; Shaffer et al. 1995), NITS-SHETRAN (Birkinshaw and Ewen 2000), AgriFlux-IDRISI GIS (Lasserre et al. 1999), DAISY-MIKE SHE (Refsgaard et al. 1999), GLEAMS (De Paz and Ramos 2002), GIS NIT-1 (de Paz et al. 2009), and MT3D-MODFLOW (Almasri and Kaluarachchi 2007), have been used in the past to predict the spatial and temporal distribution of nitrate leaching and to assess nitrate contamination in groundwater.

To avoid both the complexity of the abovementioned process-based models, which require a considerable amount of data to describe the aquifer properties, and the subjectivity of indices based on weights and ratings, a set of calibrated indices is developed in this study to classify the intrinsic vulnerability of agricultural land to water and nitrogen losses. The indices are calibrated by multiple regression analysis, using as “observed values” the results of the GLEAMS v3.0 model for combinations of different soil properties, topography, and climatic conditions under a reference field crop. The calibrated indices are restricted to agricultural soils, since GLEAMS was developed specifically for this kind of soil and not, e.g., for forest soils.

2 Methodology

2.1 GLEAMS V3.0 Model Description

The GLEAMS V3.0 model is a computer program used to simulate water quality events on agricultural fields. GLEAMS has been used internationally and especially in the USA to evaluate the hydrologic and water quality response of many different scenarios considering different cropping systems, wetland conditions, subsurface drained fields, agricultural and municipal waste application, nutrient and pesticide applications, and different tillage systems.

In order to simulate the many events occurring in the field, the GLEAMS model is divided into three separate submodels: hydrology, erosion/sediment yield, and chemical transport/transformation (i.e., nutrients and pesticides) (Leonard et al. 1987; Knisel and Davis 2000). Regarding nitrogen, the chemical transport/transformation submodel of GLEAMS is adjusted to different soil and climate environments using reaction coefficients for nitrification, denitrification, ammonia volatilization, mineralization, and immobilization, which are self-calibrated functions of soil moisture, soil temperature, and other soil physicochemical characteristics. This model structure makes it possible to predict the response of each of these hydrobiogeochemical processes under different environments.

2.2 Reference Field Crop Characteristics

A uniform reference square area of 1 ha with homogeneous soil profile cropped with perennial grass (with extensive, uniform surface of dense actively growing coverage) was selected to perform different scenarios on different soils and climatologic conditions, using the assumption of Allen et al. (1998) for the reference crop evapotranspiration method. Model application was carried out for the top 30 cm of the soil profile, where most of the roots are included. GLEAMS model considers, as water losses, the losses of surface runoff and the losses due to percolation below the top 30 cm. The cases simulated via GLEAMS model are presented in the following subsections.

2.2.1 Topography

In the GLEAMS model, surface runoff is determined using the curve number method. Four slope cases (0%, 1%, 5%, and 10%) were linked to curve number using an equation adopted by Getter et al. (2007), which was determined for grass surfaces:

$$ {\text{CN}} = 82.904 + 2.3476 \cdot \ln \left( {S\% } \right),\quad {R^{2}} = 0.948 $$
(1)
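
As a minimal sketch, Eq. 1 can be evaluated as follows. The function name `curve_number_from_slope` is an illustrative assumption and is not part of GLEAMS. Because ln(S) is undefined at S = 0 and the text does not state how the 0% case was treated, the sketch rejects non-positive slopes rather than guessing a value.

```python
import math


def curve_number_from_slope(slope_percent: float) -> float:
    """Eq. 1: CN = 82.904 + 2.3476 * ln(S%), fitted for grass surfaces."""
    if slope_percent <= 0:
        # ln(S) is undefined here; the source does not say how the 0% slope
        # case was handled, so this sketch refuses instead of inventing a rule.
        raise ValueError("Eq. 1 is undefined for slopes <= 0%")
    return 82.904 + 2.3476 * math.log(slope_percent)


# The three positive slope cases used in the study.
for s in (1.0, 5.0, 10.0):
    print(f"S = {s:4.1f}%  ->  CN = {curve_number_from_slope(s):.2f}")
```

At S = 1% the logarithm vanishes, so the curve number equals the intercept 82.904.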

2.2.2 Soil Physical–Hydraulic Properties

Four cases were considered using the soil texture and the average values of hydraulic conductivity ranges adopted by the four hydrologic soil groups A, B, C, and D according to USDA-SCS classification (2007) (Table 1).

Table 1 Hydraulic conductivity, soil texture, and field capacity for the four hydrologic soil groups

Volumetric water content at saturation θ SAT and permanent wilting point θ PWP were assumed to be constant in all simulations at 0.5 and 0.1 cm3 cm−3, respectively. Water content at field capacity θ FC was determined using effective porosity Ø e (θ SAT –θ FC) from Franzmeier’s (1991) equation (Table 1), which has been determined using data for 15 lithomorphic classes:

$$ {K_s} = 1.95 \times {10^{{ - 3}}} \times \emptyset_e^{{2.67}},\quad {R^{{2}}} = 0.66 $$
(2)

where K s is the saturated hydraulic conductivity (meter per second) and Ø e is the effective porosity (cubic centimeter per cubic centimeter). Equation 2 was adjusted in order to give the water content at field capacity for the respective values of K s :

$$ {\theta_{\text{FC}}} = {\theta_{\text{SAT}}} - \sqrt[{2.67}]{{14.24 \times {{10}^{{ - 4}}}{K_s}}} $$
(3)

where K s is saturated hydraulic conductivity (centimeter per hour).
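
The step from Eq. 2 to Eq. 3 is a unit conversion followed by an inversion: K s in cm/h is divided by 360,000 to obtain m/s, and dividing again by 1.95 × 10−3 gives a factor of 1/702 ≈ 14.24 × 10−4. The sketch below checks this numerically. The function names and the example value K s = 1 cm/h are illustrative assumptions, not values from Table 1.

```python
THETA_SAT = 0.5  # cm3 cm-3, held constant in all simulations per the text


def effective_porosity(ks_cm_per_h: float) -> float:
    """Invert Eq. 2 after converting Ks from cm/h to m/s."""
    ks_m_per_s = ks_cm_per_h / 100.0 / 3600.0
    return (ks_m_per_s / 1.95e-3) ** (1.0 / 2.67)


def field_capacity(ks_cm_per_h: float, theta_sat: float = THETA_SAT) -> float:
    """Eq. 3: theta_FC = theta_SAT - (14.24e-4 * Ks)^(1/2.67), Ks in cm/h."""
    return theta_sat - (14.24e-4 * ks_cm_per_h) ** (1.0 / 2.67)


ks = 1.0  # cm/h, hypothetical example value
print(f"theta_FC from Eq. 3:          {field_capacity(ks):.4f}")
print(f"theta_SAT - porosity (Eq. 2): {THETA_SAT - effective_porosity(ks):.4f}")
```

The two routes agree to within the rounding of the constant 14.24 × 10−4.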

2.2.3 Soil Biochemical Properties

Three cases of organic matter were used in this study (0.5%, 2%, and 5%). According to Knisel and Davis (2000), the relationship between the organic carbon content OC (%) and the organic matter content OM (%) is given by:

$$ {\text{OC}}\% = 0.58 \times {\text{OM}}\% $$
(4)

Due to the large variation of soil C/N ratios in grassland soils (Hassink 1994), the simplified chemical composition of organic matter adopted by Barry et al. (2002) was used, which leads to the minimum C/N ratios, in order to diminish the nitrogen immobilization effects and to maximize the mineralization potential which are calculated by the model. These assumptions were adopted in order to calculate the maximum rates of nitrogen release, leading to maximum losses for a specific value of OM% using the GLEAMS model.

The organic matter cases used in the simulations do not include highly organic soils (peats), which have different characteristics and respond differently to the processes governing nitrogen fate because of their higher C/N ratios. These soils are characterized by low mineralization rates and by high immobilization, nitrification, and denitrification rates, leading to lower nitrogen leaching losses than soils with low organic matter but the same hydraulic properties (Verhoeven et al. 1990; van Beek et al. 2004; Yu and Ehrenfeld 2009; Pal et al. 2010).

2.2.4 Irrigation

Simulations were run with and without irrigation. Irrigation can be applied automatically by the GLEAMS model during the growing season. It was adjusted to keep the soil moisture between 20% (0.1 cm3 cm−3) and 100% (0.5 cm3 cm−3) of plant available water content in the soil profile. This range was chosen to give a realistic number of irrigation applications during the growing season and to preserve the effects of soil moisture variation on the processes governing the fate of nutrients.

2.2.5 Fertilization

A total of 180 kg N ha−1 and 90 kg P ha−1 from combined inorganic and organic fertilization of the reference crop was used as the basis for determining the nitrogen balance with the GLEAMS model. Fertilization was applied in two doses:

  (a) The first dose was 30 kg NO3–N ha−1, 30 kg NH4–N ha−1, and 40 kg P ha−1 from inorganic fertilization and 60 kg N ha−1 and 20 kg P ha−1 from organic fertilization using dairy cattle manure (1.25 t ha−1). Dairy cattle manure characteristics were obtained from Knisel and Davis (2000) (total nitrogen 4.8%, organic nitrogen 3.23%, NH4–N nitrogen 1.54%, NO3–N nitrogen 0.03%, total phosphorus 1.6%, organic phosphorus 1.52%, inorganic phosphorus 0.08%, organic matter 85%);

  (b) The second dose was 30 kg NO3–N ha−1, 30 kg NH4–N ha−1, and 30 kg P ha−1 from inorganic fertilization.

Nitrogen content in precipitation and in irrigation water was not accounted for in the model. Fertilization type and rate were constant in all simulations.
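
The stated totals can be cross-checked against the two doses and the manure composition. The following is only an arithmetic sketch of the figures quoted above. The variable names are illustrative assumptions and do not correspond to GLEAMS input fields.

```python
MANURE_KG_PER_HA = 1250.0   # 1.25 t/ha of dairy cattle manure
MANURE_TOTAL_N = 0.048      # total nitrogen fraction (4.8%)
MANURE_TOTAL_P = 0.016      # total phosphorus fraction (1.6%)

# Inorganic fertilizer in each dose, kg/ha.
dose_a = {"NO3-N": 30.0, "NH4-N": 30.0, "P": 40.0}
dose_b = {"NO3-N": 30.0, "NH4-N": 30.0, "P": 30.0}

# Organic contribution of the manure applied with the first dose, kg/ha.
manure_n = MANURE_KG_PER_HA * MANURE_TOTAL_N
manure_p = MANURE_KG_PER_HA * MANURE_TOTAL_P

total_n = sum(d["NO3-N"] + d["NH4-N"] for d in (dose_a, dose_b)) + manure_n
total_p = dose_a["P"] + dose_b["P"] + manure_p

print(f"manure N = {manure_n:.1f} kg/ha, manure P = {manure_p:.1f} kg/ha")
print(f"total N  = {total_n:.1f} kg/ha, total P  = {total_p:.1f} kg/ha")
```

The manure supplies 60 kg N ha−1 and 20 kg P ha−1, which together with the inorganic doses reproduces the totals of 180 kg N ha−1 and 90 kg P ha−1.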

2.2.6 Meteorological Conditions

Weather has a large influence on nitrate losses from agricultural land (Gibbons et al. 2005). One of the major problems in the conception of generalized empirical relationships is the climatic variability, and especially daily rainfall variability among regions. In order to develop realistic climatological scenarios, an attempt was carried out using four climatic cases in the model. The cases represent the meteorological parameters observed at four stations located in Sindos (Greece—Thessaloniki province), Mirabello (Italy—Emilia Romagna province), Allardt (USA—Tennessee state), and Oakland (USA—Iowa state) (Table 2). Simulations in GLEAMS were carried out for three successive years using daily precipitation data and monthly average values of minimum and maximum temperature, solar radiation, wind speed, and dew point temperature (determined as a function of relative humidity).

Table 2 Mean annual values of the meteorological parameters from four meteorological stations for the 3-year simulations in GLEAMS model

2.2.7 Scenarios

GLEAMS was applied using the reference field crop characteristics for 384 scenarios, which consist of combinations of four cases of saturated hydraulic conductivity, three cases of organic matter, four cases of soil surface slope, four cases of climatic conditions using the selected stations, and two cases for irrigation conditions (irrigated and non-irrigated).
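
The scenario count follows from the full factorial combination of the cases described above. As a sketch, the design can be enumerated as shown below. The hydraulic conductivity cases are represented only by their hydrologic soil group labels, since the numeric values belong to Table 1, and the tuple layout is an illustrative assumption.

```python
from itertools import product

soil_groups = ("A", "B", "C", "D")             # four Ks cases (Table 1)
organic_matter_pct = (0.5, 2.0, 5.0)           # three OM cases
slope_pct = (0.0, 1.0, 5.0, 10.0)              # four slope cases
stations = ("Sindos", "Mirabello", "Allardt", "Oakland")
irrigated = (False, True)                      # non-irrigated and irrigated

# Every combination of the five factors is one GLEAMS run.
scenarios = list(
    product(soil_groups, organic_matter_pct, slope_pct, stations, irrigated)
)

print(len(scenarios))  # 4 * 3 * 4 * 4 * 2 = 384
```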

2.3 LOS Indices

The results of the GLEAMS model from the 384 simulation scenarios were used as “observed values” in this study and concern (1) the annual losses of the percolated water beneath the root zone, (2) the annual losses of the surface runoff, (3) the annual losses of the nitrogen leaching beneath the root zone, and (4) the annual losses of nitrogen through the surface runoff. These results were used to calibrate the LOSW-P, LOSW-R, LOSN-PN, and LOSN-RN (LOS indices), which describe the respective annual losses of water and nitrogen from the agricultural land. Taking into account that the calculations were carried out under the reference field crop, the same nitrogen fertilization, and the same irrigation practice, the obtained LOS indices consequently describe the intrinsic vulnerability of agricultural land to water and nitrogen losses.

The general forms of the LOS indices for water (LOSW) and nitrogen (LOSN) losses were developed using multiple regression analysis. The independent variables of LOSW were hydraulic conductivity, surface slope, precipitation, potential evapotranspiration, and irrigation; for LOSN, organic matter and temperature were also included. The independent variables were square-root transformed in order to reduce errors from departures from normality. The dependent variables were (1) the annual losses of the percolated water beneath the root zone P, (2) the annual losses of the surface runoff R, (3) the annual losses of the nitrogen leaching beneath the root zone PN, and (4) the annual losses of nitrogen through the surface runoff RN, which were used to calibrate the indices LOSW-P, LOSW-R, LOSN-PN, and LOSN-RN, respectively. The dependent variables were square-root transformed as well. The general forms of the LOS indices for water and nitrogen losses are the following:

$$ \sqrt {{{\text{LOS}}{{\text{W}}_{\text{i}}}}} = {a_{{1}}}\sqrt {{{\text{K}}{{\text{s}}_{\text{i}}}}} + {a_{{2}}}\sqrt {{{{\text{S}}_{\text{i}}}}} + {a_{{3}}}\sqrt {{{\text{PC}}{{\text{P}}_{\text{i}}}}} + {a_{{4}}}\sqrt {{{\text{P}}{{\text{E}}_{\text{i}}}}} + {a_{{5}}}\sqrt {{{\text{I}}{{\text{R}}_{\text{i}}}}} + {e_i} $$
(5)
$$ \sqrt {{{\text{LOS}}{{\text{N}}_{\text{i}}}}} = {a_{{1}}}\sqrt {{{\text{O}}{{\text{M}}_{\text{i}}}}} + {a_{{2}}}\sqrt {{{T_{\text{i}}}}} + {a_{{3}}}\sqrt {{{\text{K}}{{\text{s}}_{\text{i}}}}} + {a_{{4}}}\sqrt {{{{\text{S}}_{\text{i}}}}} + {a_{{5}}}\sqrt {{{\text{PC}}{{\text{P}}_{\text{i}}}}} + {a_{{6}}}\sqrt {{{\text{P}}{{\text{E}}_{\text{i}}}}} + {a_{{7}}}\sqrt {{{\text{I}}{{\text{R}}_{\text{i}}}}} + {e_i} $$
(6)

where LOSW is the water loss (millimeters per year), LOSN is the nitrogen loss (kilograms per hectare per year), K s is the hydraulic conductivity (millimeters per day), S is the surface slope (%), PCP is the precipitation (millimeters per year), PE is the potential evapotranspiration (millimeters per year), IR is the irrigation applied by the model (millimeters per year), OM is the organic matter (%), T is the mean annual temperature (°C), e is the regression residual between predicted and simulated values, and i is the index of each simulation.

In the multiple regression method for the determination of the regression coefficients of Eqs. 5 and 6, the least square criterion was used in order to minimize the sum of squares of the residuals (difference between the predicted values of the regression models and the respective simulated results of the GLEAMS model).
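
The fitting step can be illustrated with a small, dependency-free sketch of ordinary least squares on square-root-transformed variables without an intercept, matching the form of Eqs. 5 and 6. This is not the software used in the study; the function name `fit_sqrt_model` and the synthetic data are assumptions made for illustration only.

```python
import math


def fit_sqrt_model(predictors, response):
    """Least-squares fit of sqrt(y) = sum_j a_j * sqrt(x_j), no intercept.

    predictors: list of rows, one row of non-negative x values per simulation.
    response:   list of non-negative y values (the "observed" model output).
    Returns the coefficient list [a_1, ..., a_k].
    """
    rows = [[math.sqrt(x) for x in row] for row in predictors]
    target = [math.sqrt(y) for y in response]
    k = len(rows[0])

    # Normal equations (X^T X) a = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, target))]
        for i in range(k)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (aug[i][k] - known) / aug[i][i]
    return coeffs


# Synthetic data generated from sqrt(y) = 2*sqrt(x1) + 3*sqrt(x2).
x = [[1, 4], [4, 1], [9, 16], [16, 9]]
y = [64, 49, 324, 289]
print(fit_sqrt_model(x, y))
```

With noise-free synthetic data the fit recovers the generating coefficients 2 and 3 up to floating-point error. For the 384 GLEAMS results, a numerical library would be the more practical choice.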

The regression coefficients of the LOSW and LOSN indices for each type of loss are given in Table 3. The values of P 1/2, R 1/2, PN 1/2, and RN 1/2 computed by GLEAMS are plotted against the predicted (LOSW-P)1/2, (LOSW-R)1/2, (LOSN-PN)1/2, and (LOSN-RN)1/2 values in Figs. 1 and 2. In addition to the squared correlation coefficient (R 2), further statistical criteria were used to evaluate model efficiency: the mean error ME (range −∞ to +∞, optimum value 0), the mean absolute error MAE (range 0 to +∞, optimum value 0), the coefficient of efficiency EF (range −∞ to 1, optimum value 1), the coefficient of residual mass CRM (range −∞ to 1, optimum value 0), and the root mean square error RMSE (range 0 to +∞, optimum value 0), which are given by the following equations (Antonopoulos and Wyseure 1999):

$$ {\text{ME}} = \frac{1}{N}\sum\limits_{{i = 1}}^N {\left( {{C_i} - {O_i}} \right)} $$
(7)
$$ {\text{MAE}} = \frac{1}{N}\sum\limits_{{i = 1}}^N {\left| {{C_i} - {O_i}} \right|} $$
(8)
$$ {\text{EF}} = 1 - \left\{ {\sum\limits_{{i = 1}}^N {{{\left( {{C_i} - {O_i}} \right)}^2}} /\sum\limits_{{i = 1}}^N {{{\left( {{O_i} - \bar{O}} \right)}^2}} } \right\} $$
(9)
$$ {\text{CRM}} = 1 - \left[ {\sum\limits_{{i = 1}}^N {{C_i}} /\sum\limits_{{i = 1}}^N {{O_i}} } \right] $$
(10)
$$ {\text{RMSE}} = \frac{{100}}{{\bar{O}}}\sqrt {{\frac{1}{N}\sum\limits_{{i = 1}}^N {{{\left( {{C_i} - {O_i}} \right)}^2}} }} $$
(11)

where C is the computed value, O is the observed value, Ō is the mean of the observed values, and N is the number of samples.
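
A direct transcription of Eqs. 7 to 11 is sketched below. The function name `fit_statistics` and the dictionary layout are illustrative assumptions. Note that RMSE as defined in Eq. 11 is normalized by the observed mean and expressed as a percentage.

```python
import math


def fit_statistics(computed, observed):
    """Return ME, MAE, EF, CRM and RMSE (%) as defined in Eqs. 7-11."""
    n = len(observed)
    mean_obs = sum(observed) / n
    diffs = [c - o for c, o in zip(computed, observed)]
    sq_err = sum(d * d for d in diffs)
    return {
        "ME": sum(diffs) / n,
        "MAE": sum(abs(d) for d in diffs) / n,
        "EF": 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in observed),
        "CRM": 1.0 - sum(computed) / sum(observed),
        "RMSE": 100.0 / mean_obs * math.sqrt(sq_err / n),
    }


# Hypothetical example: every computed value overshoots by one unit.
stats = fit_statistics(computed=[2.0, 4.0, 6.0], observed=[1.0, 3.0, 5.0])
for name, value in stats.items():
    print(f"{name:5s} {value: .4f}")
```

In this example the constant overshoot gives ME = MAE = 1 and a negative CRM, which indicates overestimation.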

Table 3 Statistics of the regression coefficients for the (LOSW-P)1/2, (LOSW-R)1/2, (LOSN-PN)1/2, and (LOSN-RN)1/2 determination using Eqs. 5 and 6

Fig. 1 Calibration of the LOSW indices using the GLEAMS model results

Fig. 2 Calibration of the LOSN indices using the GLEAMS model results

Regarding data processing in a GIS environment, multiple checks must be performed on the calculated raster files. First, determining slope from digital elevation models may introduce error, especially when the resolution is low, leading to unrealistic slope values for agricultural land, which in most cases lies on terraced surfaces at different altitudes. Second, when (LOSW-P)1/2, (LOSW-R)1/2, (LOSN-PN)1/2, and (LOSN-RN)1/2 are determined in a GIS environment, the raster files computed using the general Eqs. 5 and 6 and the regression coefficients of Table 3 must be checked for negative values. Negative values must be set to 0 before squaring to obtain LOSW-P, LOSW-R, LOSN-PN, and LOSN-RN.

Considering the above, the final formulas for determining the LOSW and LOSN indices are the following:

  • when (LOSW-P)1/2 ≥ 0, then

    $$ {\text{LOSW - P}} = {\left\{ {\begin{array}{*{20}{c}} {{0}{.0941}\sqrt {\text{Ks}} - 0.761\sqrt {\text{S}} + 0.4185\sqrt {\text{PCP}} } \hfill \\ { - {0}{.0487}\sqrt {\text{PE}} + {0}{.0903}\sqrt {\text{IR}} } \hfill \\ \end{array} } \right\}^{{2}}}{\text{else}}\;\left( {{\text{LOSW - P}}} \right) = 0 $$
    (12)
  • when (LOSW-R)1/2 ≥ 0, then

    $$ {\text{LOSW - R}} = {\left\{ {\begin{array}{*{20}{c}} { - {0}{.0856}\sqrt {\text{Ks}} + {1}{.8573}\sqrt {\text{S}} + 0.9966\sqrt {\text{PCP}} } \\ { - {0}{.5612}\sqrt {\text{PE}} + 0.2384\sqrt {\text{IR}} } \\ \end{array} } \right\}^2}{\text{else}}\,\left( {{\text{LOSW - R}}} \right) = 0 $$
    (13)
  • when (LOSN-PN)1/2 ≥ 0, then

    $$ {\text{LOSN}} - {\text{PN}} = {\left\{ {\begin{array}{*{20}{c}} { - {0}{.1536}\sqrt {\text{OM}} + {2}{.6981}\sqrt {\text{T}} + {0}{.0439}\sqrt {\text{Ks}} } \hfill \\ { - {0}{.2046}\sqrt {\text{S}} { + 0}{.0471}\sqrt {\text{PCP}} - { 0}{.2515}\sqrt {\text{PE}} } \hfill \\ { - 0.0116\sqrt {\text{IR}} } \hfill \\ \end{array} } \right\}^{{2}}}{\text{else}}\,\left( {{\text{LOSN}} - {\text{PN}}} \right) = 0 $$
    (14)
  • when (LOSN-RN)1/2 ≥ 0, then

    $$ {\text{LOSN - RN}} = {\left\{ {\begin{array}{*{20}{c}} {0.0121\sqrt {\text{OM}} { - 2}{.6559}\sqrt {\text{T}} { - 0}{.0228}\sqrt {\text{Ks}} } \hfill \\ { + 0.3785\sqrt {\text{S}} { + 0}{.1298}\sqrt {\text{PCP}} + 0.2923\sqrt {\text{PE}} } \hfill \\ { + 0.0047\sqrt {\text{IR}} } \hfill \\ \end{array} } \right\}^{{2}}}{\text{else}}\,\left( {{\text{LOSN - RN}}} \right) = 0 $$
    (15)

where LOSW-P is the annual water loss due to deep percolation beneath the root zone at 30 cm (millimeters per year), LOSW-R is the annual water loss due to surface runoff (millimeters per year), LOSN-PN is the annual nitrogen loss due to deep percolation beneath the root zone at 30 cm (kilograms per hectare per year), and LOSN-RN is the annual nitrogen loss due to surface runoff (kilograms per hectare per year). The total losses of water and nitrogen are given by the following:

$$ \left( {{\text{LOSW - PR}}} \right) = \left( {{\text{LOSW - P}}} \right) + \left( {{\text{LOSW - R}}} \right) $$
(16)
$$ \left( {{\text{LOSN - PRN}}} \right) = \left( {{\text{LOSN - PN}}} \right) + \left( {{\text{LOSN - RN}}} \right) $$
(17)

During the simulations of GLEAMS model, nitrogen losses were attributed to NO3–N, which was about 99.99% of the total losses through percolation or surface runoff.
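
Equations 12 to 17 can be sketched for a single cell as follows, including the rule that a negative square-root value is set to zero before squaring. The function and key names are illustrative assumptions, and the example inputs are hypothetical rather than taken from the study. Because every input is square-rooted, the sketch requires non-negative values, including a non-negative mean annual temperature.

```python
import math


def _square_or_zero(root_value: float) -> float:
    """Square the regression output, treating negative roots as zero loss."""
    return root_value ** 2 if root_value >= 0 else 0.0


def los_indices(ks, slope, pcp, pe, ir, om, temp):
    """Return the four LOS indices plus totals (Eqs. 12-17).

    ks: mm/day, slope: %, pcp/pe/ir: mm/year, om: %, temp: deg C.
    All inputs must be non-negative because they are square-rooted.
    """
    r_ks, r_s, r_pcp, r_pe, r_ir, r_om, r_t = (
        math.sqrt(v) for v in (ks, slope, pcp, pe, ir, om, temp)
    )
    losw_p = _square_or_zero(
        0.0941 * r_ks - 0.761 * r_s + 0.4185 * r_pcp
        - 0.0487 * r_pe + 0.0903 * r_ir
    )
    losw_r = _square_or_zero(
        -0.0856 * r_ks + 1.8573 * r_s + 0.9966 * r_pcp
        - 0.5612 * r_pe + 0.2384 * r_ir
    )
    losn_pn = _square_or_zero(
        -0.1536 * r_om + 2.6981 * r_t + 0.0439 * r_ks - 0.2046 * r_s
        + 0.0471 * r_pcp - 0.2515 * r_pe - 0.0116 * r_ir
    )
    losn_rn = _square_or_zero(
        0.0121 * r_om - 2.6559 * r_t - 0.0228 * r_ks + 0.3785 * r_s
        + 0.1298 * r_pcp + 0.2923 * r_pe + 0.0047 * r_ir
    )
    return {
        "LOSW-P": losw_p,              # mm/year, Eq. 12
        "LOSW-R": losw_r,              # mm/year, Eq. 13
        "LOSN-PN": losn_pn,            # kg/ha/year, Eq. 14
        "LOSN-RN": losn_rn,            # kg/ha/year, Eq. 15
        "LOSW-PR": losw_p + losw_r,    # Eq. 16
        "LOSN-PRN": losn_pn + losn_rn  # Eq. 17
    }


# Hypothetical inputs for one raster cell.
result = los_indices(ks=500, slope=2, pcp=600, pe=1000, ir=300, om=2, temp=15)
for name, value in result.items():
    print(f"{name:9s} {value:8.2f}")
```

In a GIS workflow the same arithmetic would be applied cell by cell to the input rasters, with the zero clamp applied to each square-root raster before squaring.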

2.4 Ground and Surface Water Pollution from Agricultural Land

In order to relate LOSW-P and LOSN-PN to groundwater vulnerability, their computed values are set equal to the mass of water and nitrogen reaching the groundwater under the following assumptions: (1) there is a direct connection of the root zone with the groundwater table in lowland soils (soils with a high water table), and (2) for upland soils, the water content, the agrochemical concentrations, and the thickness of the unsaturated zone are assumed to be in steady-state conditions. LOSN-PN can be used to determine the nitrogen concentration of the percolated water using the following equation:

$$ {\text{CPW}} = 100\frac{{\left( {{\text{LOSN - PN}}} \right)}}{{\left( {{\text{LOSW - P}}} \right)}} $$
(18)

where CPW is the nitrogen concentration of the percolated water under the root zone (milligrams per liter). The same rule can be followed for the surface water using the following equation:

$$ {\text{CRW}} = 100\frac{{\left( {{\text{LOSN - RN}}} \right)}}{{\left( {{\text{LOSW - R}}} \right)}} $$
(19)

where CRW is the nitrogen concentration of the runoff water (milligrams per liter).
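
The factor 100 in Eqs. 18 and 19 is a unit conversion: 1 mm of water over 1 ha is 10,000 L, and 1 kg is 10^6 mg, so kg ha−1 divided by mm gives mg L−1 after multiplying by 100. A minimal sketch follows. The function name and the choice to return `None` when there is no water flux are assumptions, since the text does not specify that case.

```python
def nitrogen_concentration_mg_per_l(n_loss_kg_ha, water_loss_mm):
    """Eqs. 18-19: concentration = 100 * N loss / water loss.

    1 mm over 1 ha = 10,000 L and 1 kg = 1e6 mg, hence the factor 100.
    Returns None when there is no water flux (assumption: the source does
    not define the concentration for a zero water loss).
    """
    if water_loss_mm <= 0:
        return None
    return 100.0 * n_loss_kg_ha / water_loss_mm


# Hypothetical example: 20 kg N/ha leached with 200 mm of percolation.
cpw = nitrogen_concentration_mg_per_l(20.0, 200.0)
print(cpw)  # 10.0 mg/L
```

The same function serves for CPW (using LOSN-PN and LOSW-P) and for CRW (using LOSN-RN and LOSW-R).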

Regarding the vulnerability of agricultural land to other agrochemical losses, LOSW can be used, following the assumption that their losses are proportional to the water losses.

In order to include the unsaturated zone, an additional equation gives the minimum transit time for water, and consequently for substance losses, to travel from the surface to the groundwater:

$$ {\text{TT}} = 1000\frac{\text{Depth}}{{{K_s}}} $$
(20)

where TT is the minimum transit time for losses from the soil surface to reach the groundwater table (days), K s is the saturated hydraulic conductivity (millimeters per day), and Depth is the distance of the groundwater table from the soil surface (meters).
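
Equation 20 converts the depth from meters to millimeters and divides by the saturated hydraulic conductivity. A short sketch with hypothetical input values is given below; the function name is an illustrative assumption.

```python
def min_transit_time_days(depth_m: float, ks_mm_per_day: float) -> float:
    """Eq. 20: TT = 1000 * Depth / Ks, with Depth in m and Ks in mm/day."""
    if ks_mm_per_day <= 0:
        raise ValueError("Ks must be positive")
    return 1000.0 * depth_m / ks_mm_per_day


# Hypothetical example: water table 5 m deep, Ks = 100 mm/day.
print(min_transit_time_days(5.0, 100.0))  # 50.0 days
```

Because the saturated conductivity is the fastest rate at which water can move through the profile, the result is a lower bound on the travel time, as the text states.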

3 Conclusions

A set of indices was developed to classify the vulnerability of agricultural land to water and nitrogen losses, setting a basis for integrated water resources management in agricultural systems. The water and nitrogen losses (LOS) indices derived in this study by calibration against the GLEAMS model have considerable advantages over vulnerability indices derived using weights and ratings. The ranking of the LOS indices has a physical meaning, since it is expressed in units of the amounts and concentrations of water and nitrogen losses; moreover, the indices originate from a process-based model and can be calibrated together with the model when experimental data exist. The LOS indices can also be introduced more easily into a GIS environment than process-based models, and they can be calibrated using fewer input parameters. They focus on the vulnerability of the pollution source (agricultural land) and not on the vulnerability of the pollution recipients (surface water and groundwater), which are described by more complex properties. Finally, an important advantage of the LOS indices is that they are comparable across regions and can assess the pollution potential not only for groundwater but also for surface waters.