Introduction

Key aspects of regional hydrological assessments are accurate and reliable predictions of water fluxes and state variables such as runoff, evapotranspiration, groundwater recharge and sediment loads in watersheds. Distributed hydrological models are increasingly being used for this purpose, relying to a greater extent on computing power and remotely sensed information (Kumar et al. 2013). The spatial distribution of hydrological variables simulated with those models is achieved by accounting for the spatial variability of physical characteristics such as topography, land use/land cover and soil types, and of meteorological variables such as temperature and precipitation. Recurrent challenges in modelling medium- to large-scale watersheds (\(10^2\)–\(10^5\,\text{km}^2\)) are overparameterization, parameter non-identifiability, non-transferability of parameters across calibration scales, spatial scales and locations and, last but not least, increasing computing time (Beven 1993; Haddeland 2002; Samaniego et al. 2010; Kumar et al. 2013). Because distributed hydrological models are spatially complex and deal with large numbers of unknown parameters, parameterization techniques have to be applied. The most common technique is based on the hydrological response unit (HRU), in which complexity is reduced by grouping cells into homogeneous units using basin physical characteristics (Beven 1993; Abbaspour et al. 2007b; Arnold et al. 2012; Kumar et al. 2013). A further major challenge when applying distributed hydrological models is the limited transferability of model parameters across spatial resolutions, scales and locations. Several studies have shown that shifting model parameters across calibration scales generates bias in the simulation of water fluxes and state variables (Haddeland 2002; Liang et al. 2004; Samaniego et al. 2010). Similarly, discrepancies occur when parameters are transferred across locations (Merz and Blöschl 2004; Samaniego et al. 2010; Smith et al. 2012; Singh et al. 2012). However, relatively few researchers have attempted model parameter transfer so far and none, to our knowledge, has tried it in Ethiopia.

Numerous studies in the Ethiopian highlands have modelled discharge and soil erosion with SWAT (Ndomba et al. 2008; Mekonnen et al. 2009; Setegn et al. 2010; Easton et al. 2010; Betrie et al. 2011; Notter et al. 2012; Yesuf et al. 2016; Lemann et al. 2016), to cite only an incomplete list. All of them worked with limited measured data, and none attempted a model parameter transfer, for lack of appropriate opportunities. The setup in this study is therefore probably unique in Ethiopia.

Several studies, outside of Ethiopia, focussed on temporal transfers of model parameters (Bingner et al. 1997; Liew and Garbrecht 2003; Abbaspour et al. 2007b; Chaubey et al. 2010; Sheshukov et al. 2011; Douglas-Mankin et al. 2013; Seo et al. 2014) and others more on a spatial transfer (Vandewiele and Elias 1995; Santhi et al. 2001; Merz and Blöschl 2004; Parajuli et al. 2009; He et al. 2011; Kumar et al. 2013).

For example, Merz and Blöschl (2004) examined the performance of various methods of regionalizing parameters of a conceptual catchment model in 308 Austrian catchments. They concluded that the methods based on spatial proximity performed better than those based on physiographic catchment attributes. Similarly, Kumar et al. (2013) concluded that the similarity approach, where a complete set of parameters is transposed from a donor catchment that is most similar in physiographic terms, performed best. Kokkonen et al. (2003) transferred the complete parameter set from the catchment outlet, while McIntyre (2004) defined the most similar catchment in terms of area, precipitation and baseflow and Parajka et al. (2005) used the mean for elevation, stream network density and lake index to define similarity.

The aim of the present study is to analyse the effects of this parameter transfer technique on the simulation of water fluxes and sediment loads at multiple modelling scales and locations. We specifically investigate the model parameter transfer from one subcatchment to the entire watershed for sediment load and streamflow modelling.

Methodology

Study area

The Gerda watershed is located in the central Ethiopian Highlands of the Amhara Regional State (see Fig. 1; Table 3 for details). It is situated approximately 45 km northwest of Debre Markos and 230 km northwest of Addis Abeba and covers a drainage area of about 4860.4 ha. The watershed is characterized by gently sloping to undulating hills at the top of the catchment, a rugged and dissected topography with steep slopes in the middle, and a gently sloping bottom part. Elevations range from 1980 to 2600 m a.s.l. The Minchet river, referred to as the Gerda river downstream, flows in a south-westerly direction to the outflow at Yechereka. Climate is dominated by a unimodal rainfall regime with a long rainy season from June to September (Kremt) and a long dry season from October to May. The average annual precipitation is 1690 mm, and the mean annual temperature is \(16\,^{\circ }\hbox {C}\). Local land use is dominated by smallholder rain-fed farming systems, emphasizing grain production, ox-ploughing and uncontrolled grazing practices (SCRP 2000). The Gerda watershed has undergone no significant development or changes since the early 1980s and no mechanization has occurred.

Fig. 1 Overview of Gerda watershed and location

SWAT model configuration

The Soil and Water Assessment Tool (SWAT2012 rev. 620) was used to assess streamflow and sediment load prediction uncertainty through the ArcSWAT interface (Version 2012.10_1.14). SWAT is a physically based river basin or watershed-modelling tool, which is capable of continuous simulation over long time periods.

The SWAT model divides the watershed into subbasins for better representation of the spatial heterogeneity. The subbasins are further discretized into hydrological response units (HRUs), each of which is a unique combination of soil type, land use type and slope. For every single HRU, the soil water content, surface runoff, crop growth including management practice and sediment yield are computed and then aggregated to the subbasin level by a weighted average. For climate, SWAT calculates a centroid for each subbasin and uses the station nearest to that centroid. Runoff is predicted separately for each HRU and routed at subbasin level to obtain total runoff figures (Neitsch et al. 2011). Surface runoff is estimated using a modified SCS curve number method, which estimates the amount of runoff based on local land use, soil type and antecedent moisture condition. Watershed concentration time is estimated using Manning's formula for both overland and channel flows. Soil profiles are divided into multiple layers, which influence soil water processes such as plant water uptake, lateral flow and percolation to lower layers as well as infiltration and evaporation. Potential evapotranspiration can be modelled with the Penman-Monteith, the Priestley-Taylor or the Hargreaves method (Neitsch et al. 2011), depending on data availability.
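The curve number approach mentioned above can be illustrated with a minimal Python sketch of the basic SCS-CN runoff equation in its metric form. This is only an illustration: the function name `scs_cn_runoff` and the default initial abstraction ratio of 0.2 are assumptions made here, and the sketch omits the daily adjustment of the retention parameter with soil moisture that the modified method in SWAT applies.

```python
def scs_cn_runoff(precip_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Surface runoff depth (mm) for one rainfall depth, basic SCS-CN form.

    precip_mm : rainfall depth for the day (mm)
    cn        : curve number, 0 < cn <= 100
    ia_ratio  : initial abstraction as a fraction of S (0.2 is the
                conventional value; treated here as an assumption)
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    # Retention parameter S in mm (metric form of 1000/CN - 10 inches).
    retention = 25400.0 / cn - 254.0
    # Initial abstraction: interception, surface storage, early infiltration.
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0  # all rainfall is abstracted, no runoff
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


if __name__ == "__main__":
    # Hypothetical values: 50 mm of rain on an HRU with CN = 80.
    print(round(scs_cn_runoff(50.0, 80.0), 2))
```

A higher curve number gives a smaller retention parameter and therefore more runoff for the same rainfall, which is consistent with CN2 being reported as the most sensitive hydrological parameter later in the text.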

In this study, surface runoff was estimated using the Natural Resources Conservation Service Curve Number (SCS-CN) method (USDA-SCS 1972). Sediment loss for each HRU was calculated through the Modified Universal Soil Loss Equation (MUSLE), and routing in channels was estimated using stream power (Williams 1969). The Hargreaves method (Hargreaves and Za 1985) was used to estimate potential evapotranspiration, and the water balance in the watershed was simulated using the water balance equation given by Neitsch et al. (2011). Finally, sediment deposition in channels was calculated using fall velocity (Arnold et al. 2012). All equations and ensuing descriptions of elements can be found in the SWAT theoretical documentation Version 2009 (Neitsch et al. 2011).
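As an illustration of the HRU-level sediment calculation, the following Python sketch evaluates MUSLE in the form given in the SWAT theoretical documentation. The function and argument names are hypothetical, and the input values in the usage lines are invented for demonstration, not taken from this study.

```python
def musle_sediment_yield(
    runoff_mm: float,       # surface runoff volume for the day (mm)
    peak_flow_m3s: float,   # peak runoff rate (m3/s)
    area_ha: float,         # HRU area (ha)
    k_usle: float,          # soil erodibility factor
    c_usle: float,          # cover and management factor
    p_usle: float,          # support practice factor
    ls_usle: float,         # topographic (slope length and steepness) factor
    cfrg: float = 1.0,      # coarse fragment factor
) -> float:
    """Daily sediment yield of one HRU in metric tons (MUSLE)."""
    if min(runoff_mm, peak_flow_m3s, area_ha) < 0:
        raise ValueError("runoff, peak flow and area must be non-negative")
    # Runoff energy term replaces the rainfall erosivity of the USLE.
    energy = (runoff_mm * peak_flow_m3s * area_ha) ** 0.56
    return 11.8 * energy * k_usle * c_usle * p_usle * ls_usle * cfrg


if __name__ == "__main__":
    # Invented example: 10 mm runoff, 0.5 m3/s peak flow, 2 ha HRU.
    print(musle_sediment_yield(10.0, 0.5, 2.0, 0.3, 0.2, 1.0, 1.5))
```

Because the erosive energy is driven by runoff volume and peak flow rather than by rainfall, a day without surface runoff yields no sediment in this formulation.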

Model parametrization

A high-resolution (\(5\,\text{m}\,{\times }\,5\,\text{m}\)) digital elevation model (DEM) from the Advanced Land Observing Satellite Daichi [ALOS, of the Japan Aerospace Exploration Agency (JAXA)] was used to set up the SWAT model. Subbasin partitioning and stream networks were computed automatically through the ArcSWAT interface, with manual configuration of the outlet feature classes to include the Minchet catchment as a calibration feature at the top of the Gerda watershed (see Fig. 1 for details). A drainage area of 100 ha was chosen as the threshold for delineation of the catchment because it approximately corresponds to the Minchet subcatchment size.

Data on agricultural practices were obtained from the Water and Land Resource Centre [WLRC, formerly the Soil Conservation Research Programme (SCRP)] and from the authors’ fieldwork and interviews conducted in 2008, 2012 and 2014. The land use data were adapted from a land use map with a field-scale resolution and nine land use categories, which was recorded in 2014 (WLRC 2016). Tillage was implemented using heat units, and the results were cross-checked with the observed seasonal incidence and adapted as necessary based on planting and harvesting dates from field interviews (Ludi 2004; Roth 2010). In addition, the traditional Ethiopian ploughing tool called Maresha was added to the ArcSWAT management database. The Maresha was assigned a tillage depth of 20 cm and mixing efficiency of 0.3 (Temesgen et al. 2008; Dile and Srinivasan 2014).

The physical and chemical parameterization of the soil maps was adapted from the WLRC soil report (Belay 2014) and, where WLRC data were missing, from the doctoral dissertation of Zeleke (2000), from the SCRP's Soil Conservation Research Report 27 for the Minchet catchment (Kejela 1995), and from Hurni (1985). The land use and soil data contained 19 soil and 12 land use classes (see Fig. 2 for details). The model setup comprised 2349 HRUs within 12 subbasins. The model was created using a zero per cent threshold, meaning that all HRUs were accounted for in modelling. Daily precipitation records combined with minimum and maximum temperature records for the Minchet watershed were used to run the model. Weather station data from Yechereka were added for the years 2013 and 2014. Solar radiation, potential evapotranspiration and wind speed were generated by the ArcSWAT weather generator. Storm-based sediment concentrations measured at the Minchet and the Yechereka outlets were used for model calibration and validation. Flow observations were available for the entire year, while sediment data were only available during rainfall events. The sediment concentration in the Gerda watershed is measured only during the rainy season, which lasts from June to October, and is assumed to be negligible during the remaining months. This is a realistic assumption given the extremely low sediment concentration during the dry season (Easton et al. 2010; Betrie et al. 2011).

Fig. 2 Soil map (a) and land use map (b) of Gerda watershed including details about area distribution

Model evaluation

The ArcSWAT model was run on a daily time step for a period of 31 years (1984–2014), including a warm-up period of two years. The model was calibrated using SUFI-2 in SWAT-Cup (Version 5.1.6.2), using the objective function ‘\(bR^2\)’, where the coefficient of determination \(R^2\) is multiplied by the coefficient \(b\) (the slope) of the regression line between measured and simulated data (Abbaspour et al. 2015). Through this function, discrepancies between the magnitudes of the two signals as well as their dynamics are accounted for.

$$\begin{aligned} bR^2 = \bigg \{\begin{array}{ll} \vert b \vert R^2 \quad &{\text { if }}\ \vert b \vert < 1\\ \vert b \vert ^{-1} R^2 \quad &{\text { if }}\ \vert b \vert > 1 \end{array} \end{aligned}$$
(1)
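Equation (1) can be made concrete with a short Python sketch. It is an illustration only and not the SWAT-Cup implementation: the function name `br2` is hypothetical, and regressing the simulated on the observed series to obtain the slope \(b\) is an assumption made here.

```python
def br2(observed, simulated):
    """Slope-weighted coefficient of determination, following Eq. (1).

    The slope b is taken from the least-squares regression of the
    simulated on the observed values (an assumption of this sketch).
    """
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two series of equal length with n >= 2")
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_ss = sum((s - mean_sim) ** 2 for s in simulated)
    s_os = sum((o - mean_obs) * (s - mean_sim)
               for o, s in zip(observed, simulated))
    if s_oo == 0 or s_ss == 0:
        raise ValueError("both series must vary")
    slope = s_os / s_oo                 # regression coefficient b
    r_squared = s_os ** 2 / (s_oo * s_ss)
    abs_b = abs(slope)
    # For |b| = 1 both branches of Eq. (1) reduce to R^2.
    return abs_b * r_squared if abs_b <= 1 else r_squared / abs_b


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(br2(obs, [2.0 * q for q in obs]))  # perfectly correlated, doubled
```

The example shows why the weighting matters: a simulation that doubles every observed value still has \(R^2 = 1\), yet its \(bR^2\) is penalized because the magnitudes disagree.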

The threshold value of the objective function was set to 0.6, which is the minimum applicable value according to Faramarzi et al. (2013) and Schuol et al. (2008). The measured data were divided into two periods for calibration and validation. The calibration and validation periods were selected based on the availability of data and based on equally distributed years with similar amplitudes and seasonal occurrences of rainfall and discharge. Due to a prolonged gap in the Minchet catchment discharge data from SCRP/WLRC after the year 2000, the calibration period was set from 1984 to 2000 (without 1999) and the validation period was set from 2010 to 2014. Calibration was done for the Minchet catchment only. Subsequently, the model parameter ranges were transferred to the entire catchment, where discharge and sediment loads were validated with measured discharge and sediment load data from the outlet at Gerda.

In this study, model evaluation was first performed following the calibration technique of Abbaspour et al. (2015) and Arnold et al. (2012) for P-factor and R-factor before considering the model performance ratings suggested by Moriasi et al. (2007) for commonly applied statistical parameters: (1) the Nash-Sutcliffe efficiency (NSE), (2) the ratio of the root-mean-square error to the standard deviation of measured data (RSR), and (3) the percent bias (PBIAS). When using SUFI-2, the first evaluation aims at reaching reasonable results for P-factor and R-factor. The P-factor is the percentage of observed data enveloped by the modelling results (the 95 per cent prediction uncertainty, or 95PPU), while the R-factor is the relative thickness of the 95PPU envelope. Suggested values are a P-factor >0.70 for discharge and an R-factor around 1 (Abbaspour et al. 2015); if the measured data are of high quality, then the P-factor should be >0.80 and the R-factor <1. According to Schuol et al. (2008), for less stringent model quality requirements, the P-factor can be >0.60 and the R-factor <1.3.
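The two uncertainty measures can be sketched in Python as follows. The sketch assumes that the lower and upper 95PPU bounds are already available for each observation time step, and it takes the R-factor as the mean band width divided by the sample standard deviation of the observations. The function names and the use of the sample rather than the population standard deviation are assumptions of this illustration.

```python
import statistics


def p_factor(observed, lower, upper):
    """Fraction of observations bracketed by the 95PPU band."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("need three non-empty series of equal length")
    inside = sum(1 for o, lo, up in zip(observed, lower, upper) if lo <= o <= up)
    return inside / len(observed)


def r_factor(observed, lower, upper):
    """Mean 95PPU band width relative to the spread of the observations."""
    if not (len(observed) == len(lower) == len(upper)) or len(observed) < 2:
        raise ValueError("need three series of equal length with n >= 2")
    mean_width = sum(up - lo for lo, up in zip(lower, upper)) / len(observed)
    return mean_width / statistics.stdev(observed)


if __name__ == "__main__":
    # Invented numbers for demonstration only.
    obs = [1.0, 2.0, 3.0, 4.0]
    low = [0.5, 1.5, 3.5, 3.5]
    high = [1.5, 2.5, 4.5, 4.5]
    print(p_factor(obs, low, high), r_factor(obs, low, high))
```

The two measures pull in opposite directions: widening the band raises the P-factor but also the R-factor, so calibration looks for a band that brackets most observations while staying narrow.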

The NSE ranges from \(-\infty\) to 1, with 1 representing perfect concordance of modelled and observed data, 0 indicating that the model predicts no better than the mean of the observed data, and values below zero indicating unacceptable performance (Nash and Sutcliffe 1970).

$$\begin{aligned}{ \text {NSE}} = 1-\frac{\sum _{i=1}^{n}(Q_{\rm{obs}}^{i}-Q_{\rm{sim}}^{i})^2}{\sum _{i=1}^{n}(Q_{\rm{obs}}^{i}-Q_{\rm{obs}}^{\rm{mean}})^{2}} \end{aligned}$$
(2)

where \(Q_{\rm {obs}}^i\) and \(Q_{\rm {sim}}^i\) are the observed and simulated data at the ith time step, respectively. \(Q_{\rm {obs}}^{\rm {mean}}\) is the average of the observed data, and n is the total number of observations.

$$\begin{aligned} {\text {RSR}} = \frac{\text {RMSE}}{\text {STDEV}_{\rm{obs}}} = \frac{\sqrt{\sum _{i=1}^{n}(Q_{\rm{obs}}^{i}-Q_{\rm{sim}}^{i})^2}}{\sqrt{\sum _{i=1}^{n}(Q_{\rm{obs}}^{i}-Q_{\rm{obs}}^{\rm{mean}})^2}} \end{aligned}$$
(3)

The RSR is a standardized RMSE, calculated as the ratio of the RMSE to the standard deviation of measured data \((\text {STDEV}_{\rm {obs}})\). RSR incorporates the benefits of error index statistics and includes a scaling factor. RSR varies from the optimal value of 0, which indicates zero RMSE or residual variation and therefore perfect model simulation, to a large positive value (Moriasi et al. 2007).

$$\begin{aligned}{ \text {PBIAS}} = \frac{\sum _{i=1}^{n}(Q_{\rm{obs}}^{i}-Q_{\rm{sim}}^{i})\times 100}{\sum _{i=1}^{n}Q_{\rm{obs}}^{i}} \end{aligned}$$
(4)

The PBIAS measures the average tendency of the simulated values to be larger or smaller than their observed counterparts. The optimal value of PBIAS is zero. A positive PBIAS value indicates the model is underpredicting measured values, whereas negative values indicate overprediction of measured values.
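Equations (2) to (4) translate directly into code. The following Python sketch is a plain transcription for illustration; the function names are chosen here and the example values are invented.

```python
import math


def _check(observed, simulated):
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two series of equal length with n >= 2")


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency, Eq. (2)."""
    _check(observed, simulated)
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def rsr(observed, simulated):
    """RMSE divided by the standard deviation of observations, Eq. (3)."""
    _check(observed, simulated)
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(residual) / math.sqrt(variance)


def pbias(observed, simulated):
    """Percent bias, Eq. (4); positive means underprediction."""
    _check(observed, simulated)
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    sim = [2.0, 3.0, 4.0, 5.0, 6.0]  # every value overpredicted by 1
    print(nse(obs, sim), rsr(obs, sim), pbias(obs, sim))
```

With these definitions RSR and NSE carry the same information, since \(\text{RSR}^2 = 1 - \text{NSE}\), whereas PBIAS adds the direction of the error: the overpredicting series in the example gives a negative PBIAS.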

Moriasi et al. (2007) defined model performance ratings for evaluation divided into unsatisfactory, satisfactory, good and very good. For this study, we applied these recommendations strictly for hydrology and sediment loss.
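For orientation, the rating classes can be written as a small lookup in Python. The thresholds below are the general ratings that Moriasi et al. (2007) recommend for a monthly time step, as commonly cited, and they should be checked against the original table before use. The function names and the dictionary layout are choices made for this sketch only.

```python
def rate_nse(value: float) -> str:
    """Performance class for the Nash-Sutcliffe efficiency."""
    if value > 0.75:
        return "very good"
    if value > 0.65:
        return "good"
    if value > 0.50:
        return "satisfactory"
    return "unsatisfactory"


def rate_rsr(value: float) -> str:
    """Performance class for the RMSE-standard deviation ratio."""
    if value <= 0.50:
        return "very good"
    if value <= 0.60:
        return "good"
    if value <= 0.70:
        return "satisfactory"
    return "unsatisfactory"


# Upper limits of |PBIAS| in per cent for very good, good and satisfactory.
PBIAS_LIMITS = {
    "streamflow": (10.0, 15.0, 25.0),
    "sediment": (15.0, 30.0, 55.0),
}


def rate_pbias(value: float, variable: str) -> str:
    """Performance class for percent bias of streamflow or sediment."""
    very_good, good, satisfactory = PBIAS_LIMITS[variable]
    magnitude = abs(value)
    if magnitude < very_good:
        return "very good"
    if magnitude < good:
        return "good"
    if magnitude < satisfactory:
        return "satisfactory"
    return "unsatisfactory"


if __name__ == "__main__":
    print(rate_nse(0.92), rate_rsr(0.29), rate_pbias(-42.9, "streamflow"))
```

The limits for PBIAS are wider for sediment than for streamflow, reflecting the larger measurement uncertainty of sediment data.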

A model can be considered calibrated if NSE, RSR or PBIAS between the best simulation and the measured data are significant for a calibration and a test (validation) data set, while P-factor and R-factor are within the defined ranges (Abbaspour et al. 2007a; Moriasi et al. 2007).

Results and discussion

Sensitivity analysis and calibration

A sensitivity analysis for seventeen streamflow and sediment load parameters was carried out as a first step of calibration. These parameters were gathered from several articles (Abbaspour et al. 2007a, 2015; Talebizadeh et al. 2010; Arnold et al. 2012) and separated into two categories. The first category contained parameters that only affect hydrology, and the second category contained parameters that affect both hydrology and sediment load. First, the hydrology was calibrated to a satisfactory level before integrating the sediment loss parameters. In a second step, sediment loss was calibrated together with the hydrology, but the hydrological parameters were kept within the previously calibrated ranges. Both calibrations were performed in SWAT-Cup using SUFI-2 and were run with 500 iterations each. The final calibrated parameter ranges are presented in Table 1. Parameters were ranked according to their respective sensitivities. The curve number (CN2), followed by the groundwater revap coefficient (GW\(\_\)REVAP) and the available water capacity of the soil layer (SOL\(\_\)AWC), were most sensitive for the hydrology. For calibration, measured and simulated results were compared at the outlet of the Minchet catchment (Subbasin 1), while validation was carried out at the outlet of the Minchet catchment and at the outlet of the entire catchment at Gerda (Subbasin 11). The uncertainty of the calibrated model was assessed through the P-factor and the R-factor. The model bracketed 88\(\,\%\) of the observations within a very narrow 95PPU band of 0.57 (Table 2).
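SUFI-2 draws its parameter sets from the current parameter ranges by Latin hypercube sampling. The Python sketch below illustrates that sampling step only, not the full SUFI-2 procedure. The function name is hypothetical, and the ranges in the usage lines are invented placeholders, not the calibrated ranges of Table 1.

```python
import random


def latin_hypercube(ranges, n_samples, seed=None):
    """Draw n_samples parameter sets, one value per stratum and parameter.

    ranges : dict mapping parameter name to (low, high)
    Returns a list of dicts, each holding one complete parameter set.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        if high < low:
            raise ValueError(f"invalid range for {name}")
        width = (high - low) / n_samples
        # One random point inside each of the n equal-width strata.
        points = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


if __name__ == "__main__":
    # Placeholder ranges for illustration only.
    example_ranges = {
        "CN2": (-0.2, 0.2),
        "GW_REVAP": (0.02, 0.2),
        "SOL_AWC": (-0.2, 0.4),
    }
    for parameter_set in latin_hypercube(example_ranges, 5, seed=1):
        print(parameter_set)
```

Compared with plain random sampling, the stratification guarantees that every part of each parameter range is visited, which matters when only a few hundred model runs are affordable.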

Table 1 Streamflow and sediment load parameter ranges for calibration
Table 2 Calibration results for discharge and sediment loss modelling
Table 3 Description of study sites and main characteristics SCRP (2000)
Table 4 Data sources and data resolution
Table 5 Water balance ratios and sediment yield for average annual data

Statistical performance for the calibration of hydrology in the Minchet catchment quantified by RSR (0.29), NSE (0.92) and PBIAS (\(-14.9\)) was very good, although PBIAS indicated a slight overprediction. Measured and simulated hydrographs were plotted for visual comparison, including the calibration and validation periods for Minchet and Gerda and the visual distribution of the 95PPU band (see Figs. 3, 4 and 7 for details).

Fig. 3 Calibration and validation graphic in the Minchet catchment. On top the streamflow calibration and validation and at the bottom the same for sediment loss

Fig. 4 Validation results for model parameter transfer to the Gerda catchment. On top the streamflow validation and at the bottom the sediment load validation

Fig. 5 Year by year calibration and validation results for streamflow in the Minchet catchment

The hydrograph of the individual years (Fig. 5) shows that streamflow is adequately represented for each year and that, except for some minimal over-predictions, amplitudes and seasonal incidences were very well reflected (Table 3).

Sediment loss calibration performed fairly well with satisfactory results. The model could explain \(45\,\%\) of the observations within a reasonable 95PPU band (1.04), while statistical parameters yielded satisfactory results for RSR (0.65), NSE (0.57) and good results for PBIAS (10.1). PBIAS indicated a minor under-prediction of sediment loss modelling. The visual interpretation of sediment calibration in the Minchet catchment showed a satisfactory overall agreement. The model generally slightly under-predicted the sediment load and generated some minor unexplained peaks (see Figs. 6, 8, for details).

Fig. 6 Year by year calibration and validation results for sediment load in the Minchet catchment

Fig. 7 Observed streamflow vs simulated streamflow dot plot for calibration

Fig. 8 Observed sediment vs simulated sediment dot plot for calibration

The calibrated parameter ranges for hydrology and sediment loss were later used for the validation of the model for (1) the Minchet catchment and for (2) the uncalibrated Gerda catchment at the outlet downstream (see Fig. 1, for details).

Validation of streamflow and sediment load

Hydrological and sediment load responses during validation period

The calibrated parameter ranges were applied to the validation period from 2010 to 2014 in SWAT-Cup. Hydrology validation for the Minchet catchment performed very satisfactorily, with 73\(\,\%\) of all observations explained by the model within a very narrow 95PPU band (0.45). Statistical parameters were very good considering the performance ratings of Moriasi et al. (2007). RSR (0.32), NSE (0.90) and PBIAS (\(-13.7\)) were close to the calibration values, with PBIAS slightly better and RSR and NSE marginally lower than for calibration. These small differences could be related to differing general conditions between the calibration and the validation period, which could lead to differences in performance rating results for the respective periods, as proposed by Zhang et al. (2008).

The sediment validation for the period from 2010 to 2014 for the Minchet catchment bracketed 42\(\,\%\) of all observations with a 95PPU band of 1.09. Statistical parameters were good with RSR (0.59), NSE (0.65) and PBIAS (\(-19.5\)). These results were slightly less efficient than the ones achieved by Setegn et al. (2010) with very good RSR (0.29) and NSE (0.79) but with a less accurate PBIAS (0.30).

The hydrograph of this validation period (see Fig. 3) shows a close agreement for streamflow and for sediment loss. The main discrepancies arise for the peaks during the main rainy season, and for the duration and the extent of the dry season. Increased uncertainty, shown through larger 95PPU bands, follows the same logic and mainly arises at peak and low-flow levels (Table 4).

Hydrological and sediment loss validation for parameter transferred catchment

Validation was also carried out for the entire Gerda catchment to find out whether a model parameter transfer from a subcatchment to the larger catchment containing it is applicable and can be successfully achieved. For this, the calibrated parameter ranges from the Minchet catchment calibration were used to validate the model in the entire Gerda catchment, which is forty-six times larger. The hydrology validation yielded very good results in the performance rating proposed by Moriasi et al. (2007). With a P-factor showing that 68\(\,\%\) of all observations could be explained by the model within a 95PPU band of 0.71, and very good RSR (0.45) and NSE (0.79), the model validation was all in all satisfactory. Only PBIAS (\(-42.9\)) showed an unsatisfactory result, which can be explained by the fact that 2013 and 2014 were two climatically extreme years: 2013 had very high rainfall events, with the highest annual rainfall recorded in the Minchet catchment, while 2014 was a very low-rainfall year. Taking these facts into account, the validation of the model in the Gerda catchment through model parameter transfer alone yielded very good results.

Catchment water balance and general results

Besides comparing the statistical parameters, which showed a close agreement for streamflow and sediment loss, we chose to monitor the water balance for the catchment. The movement of water through the continuum of soil, vegetation and atmosphere is important for understanding the annual variability of water balance components (Neitsch et al. 2011) and for judging whether a model moves water realistically through a catchment. The water balance components, averaged over the entire simulation period and divided into calibration and validation, are shown in Table 5. The table includes precipitation (PCP), initial soil water content (SW), evapotranspiration (ET), surface runoff (SURQ), lateral flow (LATQ), groundwater (GWQ), percolation (PERC), water yield (WYLD) and sediment yield (SEDYLD). The simulated annual average baseflow to total discharge ratio was 0.77, while the annual average baseflow to total flow ratio obtained through digital filter methods from observed discharge averaged 0.71 (+8.4\(\,\%\)). The streamflow-to-precipitation ratio from the model output was 0.56, while the measured streamflow-to-precipitation ratio was 0.6 (\(-6.6\,\%\)).
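The text does not state which digital filter was applied to the observed discharge. As an illustration of how such a baseflow ratio can be obtained, the Python sketch below uses a single forward pass of the one-parameter recursive filter commonly attributed to Lyne and Hollick. The choice of filter, the filter parameter of 0.925 and the function name are all assumptions of this sketch, and the flow series in the usage lines is invented.

```python
def baseflow_index(flow, alpha=0.925):
    """Ratio of filtered baseflow to total flow for a discharge series.

    flow  : sequence of non-negative discharge values at a fixed time step
    alpha : filter parameter (0.925 is a commonly used value; assumed here)
    """
    if not flow:
        raise ValueError("flow series is empty")
    total = sum(flow)
    if total <= 0:
        raise ValueError("flow series must contain positive discharge")
    quick_prev = 0.0
    baseflow_sum = flow[0]  # first value is treated as pure baseflow
    for t in range(1, len(flow)):
        # Recursive high-pass filter isolating the quickflow signal.
        quick = alpha * quick_prev + 0.5 * (1.0 + alpha) * (flow[t] - flow[t - 1])
        # Quickflow is kept between zero and the total flow of the step.
        quick = min(max(quick, 0.0), flow[t])
        baseflow_sum += flow[t] - quick
        quick_prev = quick
    return baseflow_sum / total


if __name__ == "__main__":
    # Invented daily series with one storm peak.
    print(baseflow_index([1.0, 1.0, 5.0, 3.0, 2.0, 1.0, 1.0]))
```

Applying the same filter to simulated and observed discharge keeps the comparison of the two baseflow ratios consistent, whichever filter is chosen.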

We compared the modelled sediment yield results for the Minchet catchment to the sediment yield results compiled by WLRC and to other studies (Bosshardt 1997; Setegn et al. 2010; Guzman et al. 2013; Lemann et al. 2016), which report mean annual sediment yields from \(19.3\) to \(29.5\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\), resulting in an overall mean annual sediment yield of \(26.12\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\) for the period of 1984–1993. The long-term mean annual measured sediment yield from the WLRC grab samples for our study from 1984 to 2014 is \(20.65\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\), while the SWAT modelled annual mean was \(18.8\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\) (\(-8.95\,\%\)).

We then compared the modelled sediment yield results for the entire Gerda catchment to WLRC measured data. The SWAT modelled annual sediment yield was \(27.07\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\), while the measured mean annual sediment yield was \(30.35\,\rm {t}\,\rm {ha}^{-1}\,\rm {y}^{-1}\) (\(-8.7\,\%\)).

Conclusions

The overall aim of this study was (1) to evaluate the SWAT model performance in the Minchet catchment and (2) to evaluate a possible model parameter transfer from a subcatchment to a substantially larger watershed through validation alone. The results showed that the SWAT model could capture, with high agreement, the amounts and variations of both streamflow and sediment loss in the Minchet subcatchment. Monthly and annual mean discharge and sediment loss were well reproduced, while the catchment water balance was highly accurate and realistic.

Overall, the results of the SUFI-2 calibration with \(\hbox {b}R^2\) objective function in the Minchet subcatchment and the Gerda catchment produced reasonable outcomes for calibration and validation as well as for uncertainty analysis. The model parameter transfer from the calibrated subcatchment to the uncalibrated watershed resulted in reasonable goodness of fit ratings for hydrology and just below the satisfactory threshold for sediment without any prior calibration.

The results showed that the SWAT model was able to capture streamflow amounts and streamflow variability for both catchments without major deviations, and that the optimized parameter ranges produced better results at the monitoring site of the calibrated watershed.

The applied SUFI-2 optimization scheme produced reasonable outcomes for calibration, uncertainty analysis and validation of the SWAT model. This means that the model calibrated in the subwatershed could be used to model the entire watershed through model parameter transfer within a reasonable deviation of under \(10\,\%\) for both streamflow and sediment loss.