1 Introduction

Researchers have demonstrated that increased infiltration is achievable by adopting management practices such as soil tillage (Strudley et al. 2008). Tillage practices usually fall into two categories: conventional tillage (CT) and no-till (NT) (Schomberg et al. 2009). CT consists of plowing to a certain soil depth and leaves less than 30 % of the surface covered with crop residue. NT means that the soil is no longer turned or loosened. It was reported that continuous NT systems enhanced the development of macropores, which are mostly created by soil organisms (worms) and plant roots, thus increasing the hydraulic conductivity and infiltration capacity of soil (Edwards et al. 1988; Tyler and Thomas 1977). In contrast, CT destroys these channels, reducing infiltration and creating more runoff. Additionally, when the soil is maintained under NT practices, the residues remaining on the ground increase the levels of organic carbon and water-stable soil aggregates, which also leads to increased infiltration (McGregor et al. 1975; Moldenhauer et al. 1983; West et al. 1992). Further, the amount of evaporation from stubble-covered NT fields is generally smaller than that from bare soils of CT areas, leaving more water available for crops (Blevins et al. 1983; Brun et al. 1986). According to the study of Golabi et al. (1988) in Georgia, long-term NT systems allowed more water infiltration into the soil than conventional treatments. McGee et al. (1997) determined that NT treatments resulted in more water storage for the crops than the CT practice in wheat fallow systems in Colorado. A similar conclusion was obtained by Blevins et al. (1983) on soils under 10 years of NT and CT, respectively, in Kentucky. Radcliffe et al. (1988) compared soils with 10 years of CT to those with NT histories and found that infiltration rates were more than doubled in NT fields compared to those under CT.

There are other advantages associated with NT farming. For instance, NT has been recommended as an effective management practice for reducing soil erosion by increasing surface residues and reducing surface runoff (Fu et al. 2006; Greer et al. 2006; McCool et al. 1997; Shelton et al. 1983). Additionally, in cold regions, the surface residues resulting from NT help insulate the soil surface and shorten the period the soil stays frozen in winter, thus increasing water infiltration and reducing runoff and erosion.

In summary, long-term NT tends to reduce the amount of surface runoff and increase infiltration compared to CT. The infiltrated water may contribute to soil water storage. A specific hypothesis of this research is that land management practice, such as NT, considerably enhances field infiltration, and therefore has the potential for increasing recharge to the subsurface storage. However, more specific questions need to be answered by both field- and watershed-scale studies: are the model parameters different between CT and NT? Do the statistically different parameters result in statistical differences in model outputs? Does statistical difference indicate substantial difference in model outputs at different temporal (monthly or yearly) scales? In this study, model parameters, different from state variables [e.g., soil water content (SWC) and ET], refer to the “constants” which stand for inherent properties of hydrological systems. For example, saturated hydraulic conductivity and available water capacity are soil hydraulic parameters (Sarris and Paleologos 2004; Wang and Xia 2010).

Models often contain many parameters. Analyzing each parameter separately is not sufficient for comparing different factors (e.g., field sites or tillage practices); a comprehensive comparison at the parameter-set level can be more informative. The multiple parameters in a model constitute a parameter set (i.e., a vector), and clustering is well suited to analyzing the relationships between vectors (Likas and Vlassis 2003).

In this paper, we conducted watershed-scale modeling based on field-scale observations at multiple sites in the Pataha Creek Watershed, WA, USA. We developed a simple but statistically-based uncertainty analysis method to quantify the uncertainty in parameters. The parameter sets for each field site generated by an auto-calibration algorithm were further filtered by a critical objective function value estimated from the minimum objective function value. The numbers of filtered parameter sets may be very different across all sites. In order to reduce the size of parameter sets in each group (site) and conduct a balanced statistical test on the differences between parameter sets under different tillage practices, the k-means clustering (Bradley et al. 2000) was used to achieve new groups of parameter sets with equal group size.

The scenario-based analysis is widely used to demonstrate likely responses of a system to various decisions by creating a set of possible alternatives (Wang et al. 2013). Two scenarios (i.e., croplands under CT or NT management practices) were developed to investigate the impacts of tillage practices on hydrological processes.

2 Materials and methods

2.1 Study area and data collection

The Pataha Creek Watershed (46°11′–46°34′N, 117°25′–118°00′W) is a typical agricultural watershed within the Inland Pacific Northwest region. It drains an area of 478 km2 and is mostly located in Garfield County, WA (Fig. 1). It is a main tributary of the Tucannon River located 18 km above the Tucannon’s confluence with the Snake River. In 1993, the Pataha Creek Watershed was selected as a “model watershed” by the Northwest Power Planning Council and the Bonneville Power Administration (Bartels 2003; Fu et al. 2006). Based on the data from 18 rain gauges in Garfield County, the mean annual precipitation is 408 mm (Pomeroy Conservation District 2009). The mean annual temperature is 10.5 °C according to the CLIGEN statistics for Pomeroy, WA (Nicks et al. 1995).

Fig. 1 The Pataha Creek Watershed and field experimental sites (DEM)

Field data were collected from January 2009 to June 2010. Two factors, tillage and climate, were considered in the field measurements of SWC. CT and NT (~5–15 years) management are both practiced in this area and precipitation varies from a relatively low northern zone to a relatively high southern precipitation zone. Two replicated sites were selected for each combination; thus, eight field sites were monitored in the experiment (see Fig. 1 and Table 1). SWC and temperature at three depths (25, 50, and 120 cm) were simultaneously monitored by an EC-TM sensor (Decagon Inc., Pullman, WA) at a time interval of 10 min. The data were recorded and stored in an EM-50R data-logger (Decagon Inc., Pullman, WA).

Table 1 Field sites for SWC measurements

At each SWC experimental site, eight soil columns (48.8 mm in diameter, 25.4 mm in height) were sampled during four field visits for the measurements of bulk density at the depth of 25 cm. Soil bulk density is calculated as the dry weight of soil divided by its volume (Wang et al. 2012). The soil texture was determined by soil particle analysis with hydrometer (Flury 2009) and classified according to the USDA texture system (Brown 2003). The analysis of variance (ANOVA) (Giraudoux 2011) was used to test the effects of factors (e.g., tillage and climate) on SWC and soil bulk density. The Kruskal–Wallis (KW) test (Giraudoux 2011) was employed to analyze the pattern of difference between multiple means of bulk density at the eight sites.
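The rank-based idea behind the KW test can be illustrated with a minimal Python sketch. It is not the routine used in this study (which relied on the cited software, Giraudoux 2011); the function names `average_ranks` and `kruskal_wallis_h` and the bulk-density values are hypothetical, and the tie correction is omitted for brevity.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical bulk densities (g cm-3); these are NOT the measured data.
ct = [1.15, 1.20, 1.18, 1.22]
nt = [1.30, 1.33, 1.28, 1.35]
h_stat = kruskal_wallis_h([ct, nt])

# For two groups, H is compared with the chi-square critical value
# for 1 degree of freedom at alpha = 0.05 (3.841).
significant = h_stat > 3.841
```

With more than two groups, the same H statistic is compared with the chi-square critical value for (number of groups − 1) degrees of freedom, and pairwise comparisons are then needed to identify which sites differ.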

Digital Elevation Model (DEM) and land cover (National Land Cover Database 2001) data were downloaded from USGS Seamless Data Warehouse (http://seamless.usgs.gov/index.php). Both DEM and land cover are raster data and have a spatial resolution of 30 m. Soil data in vector format was downloaded from the SSURGO database via Soil Data Mart (http://soildatamart.nrcs.usda.gov/). Weather data, including precipitation, air temperature, humidity, wind speed and direction, from four weather stations (see Fig. 1) located in the two precipitation zones were collected. One weather station (W24) was installed for this study near Site NT24 in the southern watershed. The data from the other three stations (Raws Alder Ridge, Geiger Hill, and Pomeroy Downtown) were obtained from Weather Underground (http://www.wunderground.com/).

2.2 SWAT model setup, calibration and verification

2.2.1 SWAT model setup

The Soil and Water Assessment Tool (SWAT) model has been widely used in a variety of investigations (Arnold and Fohrer 2005; Boskidis et al. 2012; Gassman et al. 2007). Streamflow originates from four sources: surface runoff, lateral subsurface flow, return flow (base flow), and pond/reservoir outflow. SWAT partitions ground water into two aquifer systems: a shallow aquifer and a deep aquifer. Water percolating past the bottom of the root zone becomes recharge for both aquifers. Base flow is contributed from the shallow aquifer. Water entering the deep aquifer (a confined aquifer) is considered to be lost from the watershed system (Neitsch et al. 2002).

Different from the SCS (Soil Conservation Service) curve number method developed by the USDA (United States Department of Agriculture), the Green–Ampt infiltration method considers rainfall intensity and duration (King et al. 1999) and has been incorporated into SWAT (Arnold and Fohrer 2005; Neitsch et al. 2002). In the SCS curve number method, the curve number (ranging from 0 to 100) is a function of the soil’s permeability, land use and antecedent soil water conditions. The curve number is used to calculate surface runoff with rainfall as input. The Green–Ampt method calculates infiltration as a function of the wetting front matric potential and effective hydraulic conductivity (Neitsch et al. 2002). Water that does not infiltrate becomes surface runoff. We selected the Green–Ampt method to determine infiltration and surface runoff in this study. The effective hydraulic conductivity in Green–Ampt method can be estimated from saturated hydraulic conductivity and curve number (Nearing et al. 1996; Neitsch et al. 2002):

$$ KE = \frac{{56.82 \cdot K_{sat}^{0.286} }}{1 + 0.051 \cdot \exp (0.062 \cdot CN)} - 2 $$
(1)

where CN is the curve number; KE is the effective hydraulic conductivity (mm/h); and \( K_{sat} \) is the saturated hydraulic conductivity (mm/h), which is a quantitative measure of a saturated soil’s ability to transmit water when subjected to a hydraulic gradient (NRCS 2013).
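To make the behavior of Eq. (1) concrete, the following Python sketch evaluates KE for a fixed saturated hydraulic conductivity and two curve numbers. The function name `effective_conductivity` and the input values are illustrative assumptions, not values calibrated in this study.

```python
import math


def effective_conductivity(k_sat, cn):
    """Eq. (1): effective hydraulic conductivity KE (mm/h).

    k_sat: saturated hydraulic conductivity (mm/h)
    cn:    curve number
    """
    return 56.82 * k_sat ** 0.286 / (1.0 + 0.051 * math.exp(0.062 * cn)) - 2.0


# Hypothetical inputs chosen only to show the trend.
ke_low_cn = effective_conductivity(10.0, 65.0)
ke_high_cn = effective_conductivity(10.0, 85.0)
```

For a given \( K_{sat} \), KE decreases as CN increases, so a higher curve number yields less infiltration and more surface runoff in the Green–Ampt calculation.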

The study area was delineated into 207 subbasins, with the elevation ranging from 260 to 1,772 m. The land cover was reclassified into 12 categories, where the cropland (distributed in 112 subbasins) occupies 47.2 % of the total watershed area. In addition, nine soil types and corresponding soil properties were initially assigned to the subbasins.

An integrated SWAT modeling system was developed for this research. AVSWAT (ArcView SWAT) interface was provided by the SWAT developers to generate input files for the SWAT model by using the GIS (Geographic Information System) software ArcView (Di Luzio et al. 2005). The input files organize input information according to the type of input at different levels (watershed, subbasin, or Hydrological Response Unit). Major inputs for SWAT include watershed configuration, subbasin delineation, soil type/land cover/plant growth database, management database, and climate inputs (Di Luzio et al. 2005). The Data Transformation Services (DTS) tool developed by Microsoft (Daniel et al. 2007) was used to transfer SWAT output data from text files to a Microsoft Access database. In addition, a SWAT calibration and output analysis interface was developed in C++ language (Wang and Xia 2010).

2.2.2 SWAT model calibration and verification

The 18-month experiment was divided into two periods. The first 12 months (January–December of 2009) were the calibration period and the later 6 months (January–June of 2010) were used for model verification. The model simulations were run at a daily time-step. Under NT with residue cover, the Manning’s roughness coefficient for overland flow (OV_N) and the biological soil mixing efficiency (BIOMIX) were set to 0.3 and 0.4 compared to 0.14 and 0.1 for CT, respectively (Arabi et al. 2008; Ullrich and Volk 2009).

Based on previous studies (Arnold and Fohrer 2005; Barlund et al. 2007; Neitsch et al. 2002; van Griensven and Bauwens 2003; Wang and Xia 2010), 11 parameters were selected for model calibration (Table 2). A stochastic parameter optimization algorithm, the SCE-UA (Shuffled Complex Evolution developed at the University of Arizona), was directly incorporated into the source code of SWAT2000 model for auto-calibration. The most important features for SCE-UA are the combination of competitive evolution and complex shuffling, which enhance the survival ability of offspring in the population (Duan et al. 1992; Wang and Xia 2010).

Table 2 Parameters for model calibration

The model was calibrated and verified by using the SWCs at the eight sites instead of runoff data, since the streamflow is greatly disturbed by agricultural irrigation. The simulated soil water (mm) of the whole soil profile from the SWAT outputs was converted to volumetric SWC (m3 H2O m−3 soil). The average observed SWCs over the three soil depths were used as observations.

The following objectives were used for optimization:

$$ {\text{Ratio of simulated to observed average SWC:}}\quad RSO ={\bar{\theta}}_{sim}/{\bar{\theta}}_{obs} $$
(2)
$$ {\text{Nash–Sutcliffe efficiency criterion (Wang et al. 2009):}}\quad NSEC = 1 - \frac{\sum [\theta_{obs}(i) - \theta_{sim}(i)]^{2}}{\sum [\theta_{obs}(i) - \bar{\theta}_{obs}]^{2}} $$
(3)

where \( \bar{\theta}_{sim} \) and \( \bar{\theta}_{obs} \) are simulated and observed average SWC (m3 m−3), respectively; \( \theta_{obs}(i) \) and \( \theta_{sim}(i) \) are observed and simulated SWC at time i, respectively; NSEC is the Nash–Sutcliffe efficiency criterion (Arnell and Reynard 1996; Wang et al. 2009), which is also called the Coefficient of Determination (Devore 2008). NSEC is the percent of the variation that can be explained by the model with the optimized parameter values. The ratio of simulated to observed average SWC (RSO) denotes the goodness-of-fit of mean SWC during the simulation period.

The overall objective function (J) was the combination of the above two objectives:

$$ J = w_{1} \cdot \left| {1 - NSEC} \right| + (1 - w_{1} ) \cdot \left| {1 - RSO} \right| $$
(4)

where \( w_{1} \) is a weighting factor, \( 0 \le w_{1} \le 1 \), usually \( w_{1} = 0.5 \), indicating that the two objectives are equally important. This objective function takes account of the goodness-of-fit of both the mean value and the temporal variation for SWC. An objective function value close to zero (i.e., both NSEC and RSO approach 1.0) indicates a good agreement between simulated and observed SWCs.
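Equations (2)–(4) can be sketched in a few lines of Python. The function names (`rso`, `nsec`, `objective`) and the short SWC series are hypothetical and serve only to show how the two criteria combine into J; they are not the C++ interface or the data used in this study.

```python
def rso(sim, obs):
    """Eq. (2): ratio of simulated to observed average SWC."""
    return (sum(sim) / len(sim)) / (sum(obs) / len(obs))


def nsec(sim, obs):
    """Eq. (3): Nash-Sutcliffe efficiency criterion."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def objective(sim, obs, w1=0.5):
    """Eq. (4): weighted combination of the two criteria (0 is a perfect fit)."""
    return w1 * abs(1.0 - nsec(sim, obs)) + (1.0 - w1) * abs(1.0 - rso(sim, obs))


# Hypothetical daily SWC values (m3 m-3); not observations from the field sites.
obs = [0.20, 0.25, 0.30, 0.28, 0.22]
sim = [0.21, 0.24, 0.29, 0.29, 0.23]
j_value = objective(sim, obs)
```

Because RSO only compares the means, a simulation can have RSO close to 1.0 while tracking the temporal pattern poorly; the NSEC term penalizes that case, which is why the two are combined.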

2.3 Uncertainty analysis based on SCE-UA optimization

In the model calibration process, we were more interested in obtaining a group of “relatively optimal” parameter sets than in pursuing a single “optimal” parameter set (Wang and Chen 2012). These parameter sets can be used to quantify the uncertainty in parameters (Shen et al. 2013; Wang and Chen 2013a). In this study, the optimal parameter set generated in each loop of the SCE-UA searching process was combined to form a feasible parameter space. The optimal parameter set in each searching loop is not the global optimum, but a local or a “relative” optimum during the optimization. Parameter uncertainty at each SWC site was evaluated by calculating the critical objective function values (Batstone et al. 2003):

$$ J_{cr} = J_{opt} \left( {1 + \frac{p}{n - p}F_{\alpha ,p,n - p} } \right) $$
(5)

where \( J_{cr} \) is the critical value that defines the parameter uncertainty region, \( J_{opt} \) is the optimum (minimum) objective function value that is calculated by Eq. (4), n is the number of measured data points, p is the number of parameters, and \( F_{\alpha ,p,n - p} \) is the value of the F distribution for α, p, and n − p. In this study, p = 11 and n = 365; we used α = 0.05, with \( F_{0.05,11,354} = 1.8157 \), to estimate the 95 % confidence parameter uncertainty regions. From Eq. (5), we know that \( J_{cr} > J_{opt} \). Those parameter sets (found by the SCE-UA algorithm) resulting in objective function values less than \( J_{cr} \) form the surface of the confidence space.
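As a minimal sketch of how Eq. (5) can be used as a filter, the Python snippet below computes the critical value from the reported F value and keeps the candidate parameter sets whose objective function values fall below it. The function names and the three candidate sets are hypothetical; only n, p, and the F value are taken from the text.

```python
F_CRIT = 1.8157  # F(0.05; 11, 354) as reported in the text


def critical_objective(j_opt, n=365, p=11, f_value=F_CRIT):
    """Eq. (5): critical objective function value J_cr."""
    return j_opt * (1.0 + p / (n - p) * f_value)


def filter_parameter_sets(candidates, n=365, p=11, f_value=F_CRIT):
    """Keep (J, parameters) pairs whose J is below J_cr.

    candidates: list of (objective value, parameter vector) tuples.
    """
    j_opt = min(j for j, _ in candidates)
    j_cr = critical_objective(j_opt, n, p, f_value)
    return [(j, params) for j, params in candidates if j < j_cr]


# Hypothetical candidates: (J, [two illustrative parameter values]).
candidates = [
    (0.100, [0.90, 0.25]),
    (0.104, [0.88, 0.27]),
    (0.120, [0.70, 0.30]),
]
kept = filter_parameter_sets(candidates)
```

With these settings \( J_{cr} \) is about 5.6 % above \( J_{opt} \), so only parameter sets that fit nearly as well as the best one are retained.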

2.4 Cluster analysis

The cluster analysis was used in this study to reduce the sizes of “relatively optimal” parameter sets for each site and compare parameter sets between tillage practices (CT and NT). The basic idea for k-means clustering is to minimize the clustering error, which is measured by the sum of squared distances of each parameter set from the corresponding cluster center (Likas and Vlassis 2003).

To reduce the size of the parameter sets at each site, the following procedure was developed: (i) Determine the number of clusters (i.e., k clusters) for the parameter sets of each site, based on the ratio of the between-group variance to the total variance and on a plot of “within groups sum of squares” against “number of clusters” (Likas and Vlassis 2003). (ii) Compute the centers (i.e., means) of these k clusters and use them as a sample of parameter values. (iii) Use these k parameter sets of each site for the subsequent scenario analysis of tillage impacts on hydrological processes.
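Steps (i) and (ii) can be sketched with a plain Lloyd-type k-means in Python. This is an illustration under stated assumptions, not the implementation used in this study: the function names, the seeded random initialization, the absence of parameter scaling, and the two-dimensional toy data are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, max_iter=100, seed=0):
    """Return (cluster centers, within-groups sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [mean_vec(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss


def between_to_total_ratio(points, wss):
    """Ratio of between-group to total sum of squares (1 - WSS/TSS)."""
    grand = mean_vec(points)
    tss = sum(sq_dist(p, grand) for p in points)
    return 1.0 - wss / tss


# Toy "parameter sets" with two obvious groups; not calibrated values.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
centers, wss = kmeans(points, k=2)
ratio = between_to_total_ratio(points, wss)
```

Repeating the call for increasing k and recording `wss` and `ratio` gives the two diagnostics named in step (i); the returned `centers` are the reduced sample of parameter sets in step (ii).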

The aforementioned procedure will generate k parameter sets for each of the study sites under CT or NT management practice. The differences in the parameter sets among these field sites were identified by both k-means and Ward Hierarchical Clustering in R language (R Development Core Team 2011). The Ward Hierarchical Clustering seeks to build a hierarchy of clusters by using the Ward agglomerative algorithm, i.e., each element (e.g., parameter sets grouped by field sites in this study) starts in its own cluster, and pairs of clusters (elements) are merged as one moves up the hierarchy (Murtagh and Legendre 2011).

2.5 Scenario analysis regarding tillage impacts

Two scenarios were designed to assess the impacts of tillage practices on the hydrological regime in the Pataha Creek Watershed. The scenarios set all the croplands (47.2 % of the watershed area) to either CT or NT management. In the absence of long-term observations, 50-year daily weather data generated by CLIGEN (Zhang and Garbrecht 2003) for Pomeroy, WA in SWAT were used to drive the model. The monthly average precipitation database for Pomeroy, WA in CLIGEN was modified using the observations at 18 rain gauges (Pomeroy Conservation District 2009). The clustered k parameter sets for each of the eight sites (four sites with CT and four with NT practice) were used to parameterize the model. Thus there were 4×k model runs (each model run is associated with a parameter set) for each of the two scenarios (CT and NT). Mean monthly and annual runoff and ET were analyzed.

3 Results

3.1 Measurements of soil texture, bulk density, and soil water content

The soil textures of the eight sites were classified as loam except Site NT24, which is silty loam (see Table 1). The average bulk density of each site ranges from 1.13 to 1.34 g cm−3 (Fig. 2). Statistical tests by ANOVA and KW test at α = 0.05 indicate: (i) in terms of tillage and climate factors, the impact of tillage on bulk density is significant (P < 0.001 by ANOVA), whereas neither climate (P = 0.26 by ANOVA) nor the interaction between tillage and climate (P = 0.22 by ANOVA) is significant. (ii) NT24 and NT27 have significantly higher bulk densities than the other sites, whereas CT26 has significantly lower bulk density. (iii) Bulk density under NT (1.31 ± 0.11 g cm−3) is significantly higher than under CT (1.20 ± 0.11 g cm−3). Higher bulk density under NT than under CT has been reported by many studies (e.g., Bhattacharyya et al. 2006; Heard et al. 1988; Roth et al. 1988).

Fig. 2 Comparison of soil bulk density; the numbers below site names denote the sample sizes (n); the error bars refer to standard errors; different letters on error bars indicate significantly different means at P < 0.05 according to the KW test

Monthly average SWC data at the three depths (25, 50, and 120 cm) were compared. Generally, the monthly average SWCs increased with depth. At each depth, the SWCs at the CT sites were usually lower than those at the NT sites. Further statistical analyses indicate that only tillage impact on SWC is significant at 25- and 50-cm depth, whereas both tillage and climate have significant influences on SWC at 120-cm depth.

3.2 Model calibration and validation

As stated before, the averaged SWCs from the three depths were used for the observed SWCs for the whole soil profile. Comparisons between the simulated and observed SWCs for each site during the calibration period (1/1/2009–12/31/2009) and the verification period (1/1/2010–6/21/2010) are shown in Fig. 3a–h. From model calibration results of the eight field sites (Table 3), all the RSO criteria were very close to 1.0, indicating that the simulated average SWCs were very close to the observed mean SWCs. The NSEC criteria ranged from 0.80–0.90 (average 0.86) and 0.62–0.90 (average 0.72) for the CT sites and NT sites, respectively, which showed good agreement between the simulated and observed SWCs at each site.

Fig. 3 Comparison between simulated and observed daily SWCs (calibration period: 1/2009–12/2009; verification: 1/2010–6/2010). a–h Sites NT23–CT30

Table 3 SWAT model calibration and verification using daily SWCs

Although model verifications were not as good as calibrations (Table 3), the trend predictions agreed well except for CT25 and CT28 (see Fig. 3). Regarding model verifications, the model underestimated the SWCs at three CT Sites, i.e., RSO = 0.77, 0.67, and 0.83 for CT25, CT28, and CT30, respectively. However, at CT30, the predicted SWCs agreed well with the observations during the last month of the verification period. The large discrepancy at CT25 during the verification period was attributed to the change in land surface condition, i.e., the soil surface was covered by plastic mulch for growing trees. In addition, the discrepancy between simulations and observations might be caused by the variations in local precipitation that were not captured by the raingauge observations.

3.3 Uncertainty and cluster analyses of parameter sets

According to Eq. (5), the parameter sets describing the parameter uncertainty region were determined by \( J_{cr} \) at the eight field sites. The number of selected parameter sets for each site ranged from 102 to 1,762 with an average of 935. As previously mentioned, we used cluster analysis to reduce the size of parameter sets. The plots of “within groups sum of squares” against “number of clusters” implied that the number of clusters can be set as k = 100 (see Fig. 4 for an example from NT27 with 1,762 parameter sets). In addition, with k = 100, the ratios of the between-group variance to the total variance reached 95–98 %, which also supported the selection of k = 100. Finally, 100 parameter sets (i.e., centers of the 100 clusters) were obtained for each of the eight sites by k-means clustering.

Fig. 4 Determination of number of clusters (an example from NT27)

The parameter means for the eight field sites and the means and standard deviations for CT and NT are summarized in Table 4. Compared with CT, the NT sites had higher values for parameters ESCO, EPCO, ALPHA_BF, CH_N2, CH_K2, SOL_AWC, SOL_K, and KE, but lower values for the other parameters. KW tests on one parameter at a time indicated that each of them was significantly different between CT and NT. Barplots of three parameters clearly showed the difference in ESCO (soil evaporation compensation factor), SOL_AWC (available water capacity), and KE (effective hydraulic conductivity) between tillage practices (Fig. 5).

Table 4 Parameter statistics for eight field sites and different tillage practices
Fig. 5 Barplots of parameter values for eight sites and two tillage practices, a ESCO: soil evaporation compensation factor, b SOL_AWC (mm H2O/mm soil): available water capacity, and c KE (mm/h): effective hydraulic conductivity in Green–Ampt method (see Eq. (1))

As for comparisons between parameter sets, the parameter vector consisting of ten parameters (ESCO, EPCO, ALPHA_BF, CH_N2, CH_K2, SURLAG, SOL_AWC, GW_DELAY, RCHRG_DP, and KE) was used as the study object, since KE is calculated from CN2 and SOL_K and is able to represent the two parameters. The cluster dendrogram generated by the Ward Hierarchical Clustering elucidated the “distance” between field sites (Fig. 6). Generally, the four CT sites and four NT sites are separated from each other and fall into two wards (clusters), which concurs with the result from k-means (with k = 2). For the CT sites, CT25 in the northern low precipitation zone and CT26 in the southern high precipitation zone had similar parameter sets, as did CT28 (north) and CT30 (south). For the NT sites, the parameter sets at NT27 (north) were close to those at NT24 (south).

Fig. 6 Ward Hierarchical Clustering of parameter sets from eight sites. Such notations as CT25 indicate the elements (e.g., parameter sets grouped by field sites); each element starts in its own cluster; “hclust (*, “ward”)” is a function in R language with the method of “ward” to implement hierarchical clustering; “distance” indicates the difference between each individual cluster (element); pairs of clusters are merged as one moves up the hierarchy noted by “height”

3.4 Scenario analysis

As previously mentioned, the 400 parameter sets from four CT sites and 400 parameter sets from four NT sites were regarded as parameter samples for CT and NT scenarios, respectively. The simulated 50-year weather data indicated an annual mean precipitation and potential ET of 410 mm (112 mm as snowfall) and 1,428 mm, respectively.

From the perspective of 50-year mean annual values, no significant differences in ET, total runoff (QT), lateral flow (QL), shallow aquifer return flow (QR), total aquifer recharge (RCT), deep aquifer recharge (RCD), or change in SWC (∆SWC) were found. However, the average SWCs under NT were significantly higher than SWCs under CT, where the difference was 14 mm at the watershed scale when SWCs were converted to water depth.

In terms of 50-year mean monthly values (Table 5), although statistical differences in runoff (QT) and lateral flow (QL) between CT and NT were found for some months, the practical differences were within 0.1 mm. Mean monthly ETs under NT were significantly different from ETs under CT except in August. ETs under NT in eight months (January, February, May–July, September, November, and December) were 0.03–0.78 mm (average 0.3 mm) lower than ETs under CT. For the other three months (March, April, and October), ETs under NT were 0.28–1.55 mm (average 0.7 mm) higher than ETs under CT.

Table 5 Comparison of mean monthly lateral flow, runoff, and ET between CT and NT by KW test (α = 0.05)

4 Discussions

4.1 Parameter uncertainty and comparison

The parameter space constrained by \( J_{cr} \) (critical objective function value) can quantitatively express the uncertainties in parameters. \( J_{cr} \) gives the upper bound for “good” parameter sets that generate “good” agreement between simulated and observed model outputs. Compared to a single “best” parameter set, multiple “good” parameter sets better reflect the fact that multiple choices of parameter sets or model structures may coexist for acceptable model simulations. This phenomenon has been called “equifinality” (Beven and Freer 2001; Wang and Chen 2013b). In terms of the coefficient of variation (CV), ESCO (soil evaporation compensation factor), SOL_AWC (available water capacity), and GW_DELAY (ground water delay) presented relatively low CVs under 15 %, whereas EPCO (plant uptake compensation factor), RCHRG_DP (deep aquifer percolation fraction), and KE (effective hydraulic conductivity) showed CVs as high as 32–44 % (Fig. 7). The high CVs for KE originated from high variations in SOL_K (saturated hydraulic conductivity). Relatively high correlations (absolute correlation coefficient >0.5) also occurred between ESCO and EPCO, between ESCO and SOL_AWC, and among SURLAG (surface runoff lag time), CH_K2 (channel effective hydraulic conductivity), and RCHRG_DP.

Fig. 7 CV of parameters under CT and NT management (KE is the effective hydraulic conductivity in the Green–Ampt method, see Table 2 for description of other parameters)

When multiple “good” parameter sets representing parameter uncertainty are employed to run a model, the uncertainty is also propagated through the model simulations. Thus the comparison between model outputs is not limited to two individual values (or data series), but is based on the statistics of model outputs. In this study, for example, we statistically compared hundreds of monthly and annual outputs generated by 800 parameter sets under CT and NT management.

Comparisons between CT and NT indicated differences in the parameters. Generally, KE (effective hydraulic conductivity) was 44 % higher at NT sites than at CT sites in this study. Higher hydraulic conductivity/infiltration rate under NT has been reported by many studies (Azooz and Arshad 1996; Benjamin 1993; Bhattacharyya et al. 2006; Gicheru et al. 2004; Mizuba and Hammel 2001; Singh et al. 1996). However, opposite conclusions have also been reported (Ferreras et al. 2000; Heard et al. 1988; Lipiec et al. 2006; Moreno et al. 1997). Moret and Arrúe (2007) even found lower hydraulic conductivity under NT than under CT and reduced tillage across the entire range of hydraulic head. When examining individual sites, we also found that KE at CT25 was higher than KEs at both NT27 and NT29, which implied that higher KE under CT was also possible because of the complexity and heterogeneity in soil properties.

Both ESCO and EPCO were lower under CT than for NT in this study. Greater ESCO means less soil evaporative demand (maximum evaporation), and greater EPCO means greater potential water uptake by plant (Neitsch et al. 2002). Thus the higher ESCO under NT might partly explain the lower ET under NT scenario in eight months, since ESCO affects the evaporative demand and actual ET is controlled by both potential ET and soil water conditions.

Higher SOL_AWC (available water capacity) for NT fields indicated that the NT soil had a higher capacity to hold water. Greater SOL_AWC with NT has been reported by many researchers (Bescansa et al. 2006; Gicheru et al. 2004; Jones et al. 1994; Lampurlanes et al. 2001; Mizuba and Hammel 2001; Moreno et al. 1997), whereas higher SOL_AWC under CT than under NT has also been observed (e.g., Miller et al. 1999). The mean values of SOL_AWC (0.26 and 0.33 for CT and NT, respectively) were higher than the SOL_AWC (0.24) reported by Neitsch et al. (2005) for loamy soils. However, Madsen et al. (1990) reported a SOL_AWC range of 0.2–0.3 for loam/silty loam soils. In addition, following Hudson’s regression method (Hudson 1994), the average SOL_AWC for silty loam soils with 5 % organic matter content (for this study) is 0.28. Therefore, the relatively high SOL_AWC values from our study are reasonable.

4.2 Impact of tillage practices on hydrological processes

The model simulation indicated that almost no surface runoff was generated under either CT or NT, which is consistent with the observations by Alvi and Chen (2003) in the same watershed. The scenario analyses regarding the impacts of CT and NT for croplands (47.2 % of the entire watershed area) on hydrological processes indicated no significant difference in mean annual values and slight differences in mean monthly runoff and ET. Some previous studies also found that tillage did not significantly impact the water budget (e.g., Chaplot et al. 2004). Higher SWCs under NT than under CT were observed and verified by scenario analyses in this study, which agreed with previous studies (Bescansa et al. 2006; Lampurlanes et al. 2001). The high SWCs under NT were mainly mediated by the high SOL_AWC for NT sites. One of the reasons for this might be that a greater percentage volume of larger pores (>30 μm) exists in soils under NT than under CT (Benjamin 1993; Ehlers 1975; Logsdon et al. 1990; Miller et al. 1999).

The results from the scenario analyses differed from our hypothesis and expectation, where we presumed that tillage practices would affect the hydrological regime if soil hydraulic properties and parameters were significantly different from each other. In both our study and the aforementioned field-scale studies, soil properties and model parameters responded to tillage practices. However, model simulations at the watershed scale seemed to eliminate the difference and produce a similar water budget from the mean annual perspective. Higher SOL_AWC for NT fields indicated that the NT soil had a higher capacity to hold water. Thus the mean net changes in SWC during a year were not significantly different between CT and NT. Our sub-watershed calibrations and watershed modeling disclosed important information on tillage impacts: (i) significant differences in soil properties and model parameters do not necessarily mean significant differences in hydrological response at the watershed scale; and (ii) sometimes a statistical difference does not indicate a substantial difference compared to the order of magnitude of a variable. The finding of differences that were statistically significant but practically negligible indicates that our sample size (i.e., 400 parameter sets for each scenario) was too large (Johnson 1999). Therefore, model simulations at the watershed scale are as important as field-scale experiments and modeling.

5 Conclusions

The conclusions with respect to tillage impacts are as follows. (i) The soil properties and model parameters were significantly different between CT and NT. In particular, higher bulk density, larger available water capacity, and higher effective hydraulic conductivity were found for NT than for CT. (ii) SWCs at three depths of the NT sites were significantly higher than those of CT sites. According to the statistical test, the differences in SWCs were dominated by the tillage management, not the climate conditions. (iii) The increased infiltration under NT did not result in significant changes in mean annual water budget at a watershed scale, because the NT soil of this study had a higher capacity to hold water. (iv) Although statistical differences in mean monthly runoff and ET (e.g., greater ET in eight months under CT than under NT) were found, the differences were not substantial compared to the amounts of monthly runoff and ET. In summary, the statistically different model parameters resulted in neither statistical differences in annual hydrological outputs nor practical differences in monthly outputs.