Introduction

The importance of the spatial variability of land surface characteristics for understanding physical, hydrological, biological, and other related processes in watersheds is widely recognized, and it is critical to account for this variability when modeling watershed hydrology (Beven 2001). Hydrological models have evolved into indispensable tools for understanding hydrologic processes at the watershed scale, and they are widely used for hydrologic prediction.

Various lumped and distributed (semi- and fully distributed) hydrological models have been used to simulate watershed hydrological processes around the world, such as Système Hydrologique Européen (Abbott et al. 1986), MIKE-SHE (Refsgaard et al. 1995), TOPMODEL (Beven and Kirkby 1979), WEAP (SEI 2007), and SWAT (Arnold et al. 1998). Lumped hydrologic models are typically of limited use in ungauged watersheds because of significant differences in watershed conditions (Sahoo et al. 2006). In contrast, a distributed hydrological model provides a comprehensive approach in which watershed spatial variability is characterized by specifying data and parameters for a network of grid points.

The SWAT hydrological model is one of the most widely used semi-distributed, continuous-time models (Arnold et al. 1998) and has been applied in a wide range of countries (Piniewski and Okruszko 2011; Ficklin et al. 2013; Gosain et al. 2006). In Ethiopia, as in other countries, the SWAT model has been satisfactorily calibrated and validated using single-site streamflow data (e.g., Setegn et al. 2010, 2011; Shawul et al. 2013; Serur and Sarma 2018) and has been successfully applied to various hydrological studies, including watershed and lake water balance studies, assessments of climate and land-use/land-cover change impacts on watershed hydrology, and the development of watershed management scenarios for different basins. However, because SWAT represents hydrological processes at three spatial scales (basin, sub-basin, and hydrological response unit), a multi-site calibration technique can improve the representation of the various basin hydrological processes (Migliaccio and Chaubey 2007). As a result, model calibration and validation at different basin locations is critical for the SWAT model, especially in spatially heterogeneous basins such as Ethiopia's upper Awash sub-basins.

Calibration and validation are a critical step in assessing how well a hydrologic model simulates watershed hydrology. Wheater et al. (2010) stated that calibrated and validated hydrological models make it possible to assess variables that are difficult to quantify in the field because of their spatiotemporal variation, and that such models can be used for various water resource management and development activities, particularly for 'what if' scenarios. For example, after successful calibration and validation, hydrological models can help answer basic research questions such as how hydrological processes will respond to future changes in land use and land cover, climate, economic development, and catchment management activities, to name just a few.

A common approach is to calibrate and validate hydrological models at a single confluence point in a basin (Wang et al. 2012). However, many scholars (Piniewski and Okruszko 2011; Daggupati et al. 2015; Niraula et al. 2015; Desai et al. 2021) recommend calibrating and validating hydrological models using multi-site streamflow data to achieve the required model performance and to detect spatial variability within basins. Furthermore, compared with single-site calibration, the multi-site approach provides additional parameter freedom, which improves model performance (Moriasi et al. 2007; Shrestha et al. 2016).

A single-site calibration is the most commonly used model calibration technique, in which streamflow from a single gauging location (usually at the basin outlet) is used (Bannwarth et al. 2015; Shi et al. 2013). However, because only the output at the basin outlet is used to constrain the model, the applicability of this technique to complex and spatially heterogeneous basins is questionable. Such a method can produce physically unrealistic parameter values that do not accurately represent the variability of the processes in the basin. Alternatively, a multi-site calibration technique (Niu et al. 2014; Piniewski and Okruszko 2011; Shope et al. 2014; Zhang et al. 2008) can be used to represent the spatial variability of a given basin. This method, which represents spatial variability with different parameter values, is expected to improve the performance of spatially distributed hydrologic models and to reduce the likelihood of optimizing the model to physically unrealistic parameter values. As a result, multi-site calibration may improve hydrologic partitioning in a spatially heterogeneous basin.

Previous studies suggested that a hydrologic model calibrated only against streamflow data at the watershed outlet cannot perform well in simulating internal variables (e.g., Freer et al. 2003; Moussa et al. 2007), which calls for rigorous calibration and validation using multi-site streamflow data. Researchers have long recognized the importance of multi-site, as opposed to single-site, calibration and validation for distributed hydrologic modeling (e.g., Andersen et al. 2001; Khu et al. 2008; Dai et al. 2010). Bergstrom et al. (2002) suggested that calibrating a model against multi-site streamflow data can greatly increase confidence in its physical relevance. Vazquez et al. (2008) adopted a multi-criteria protocol that included statistical, analytical, and visual criteria to calibrate the model, and also suggested that a multi-site calibration protocol enhances the physical consistency of model predictions. In general, in addition to reducing uncertainty and modeling bias (Dai et al. 2010), a multi-site calibration strategy is believed to better constrain the calibration process and to reduce the equifinality of distributed hydrological models to a certain degree.

Various hydrological models have been calibrated and validated using single-site streamflow data in Ethiopia (e.g., Legesse et al. 2003, 2004, 2010; Zeray et al. 2007; Desta and Lemma 2017). However, multi-site calibration and validation, which is critical for planners and decision-makers seeking to plan and implement sustainable water resource management strategies, has not been sufficiently considered (Serur and Adi 2022). In this study, therefore, the SWAT hydrologic model was calibrated and validated using multi-site streamflow data to test SWAT's ability to simulate watershed hydrology and to analyze the water balance in Ethiopia's upper Awash sub-basin.

Because of the preparation and encoding of input data, model structure, and the probabilistic nature of most hydrological parameters, hydrological models are frequently associated with uncertainty (Abbaspour et al. 2007; Mousavi et al. 2012), and estimating these uncertainties is a challenging task in hydrological modeling studies (Jiang et al. 2017). Sensitivity analysis, however, is a critical method for estimating system uncertainty and for defining the effects of model input parameters on model outputs (Srivastava et al. 2014; Paul and Negahban-Azar 2018).

Calibration and validation of hydrological models can be done manually, but the manual approach is subjective and time-consuming (Kannan et al. 2008; Mousavi et al. 2012). As a result, auto-calibration has become common in water resource management and development studies (Molina-Navarro et al. 2017). The SWAT-Calibration and Uncertainty Programs (SWAT-CUP) auto-calibration tool offers several approaches to uncertainty analysis, including Generalized Likelihood Uncertainty Estimation (GLUE), Particle Swarm Optimization (PSO), Markov Chain Monte Carlo (MCMC), Sequential Uncertainty Fitting (SUFI-2), and Parameter Solution (ParaSol). These algorithms differ in their assessment strategies and in how they estimate parameter ranges for a specific objective function (Kouchi et al. 2017). SUFI-2 is a semi-automatic optimization technique that employs Latin hypercube sampling, an efficient sampling scheme; it supports calibration and validation at multiple hydrometric stations and allows a variety of objective functions to be used (Wu and Chen 2015).

Several studies have evaluated the performance of the ParaSol, SUFI-2, and GLUE parameter optimization techniques in SWAT-CUP for estimating uncertainties in various river basins around the world (e.g., Uniyal et al. 2015; Wu and Chen 2015; Zhang et al. 2015; Kouchi et al. 2017). They concluded that SUFI-2 provides better results and parameter ranges with a shorter running time than GLUE. As a result, the SUFI-2 algorithm was used in this study to auto-calibrate the SWAT model.

Ethiopia's upper Awash sub-basins provide a number of benefits to the country in terms of water supply, power generation, irrigation, and agricultural development. However, the catchment has been plagued by frequent floods and droughts as a result of natural factors (mainly very rugged topography and climate change and variability) and anthropogenic factors (land-use/land-cover changes, rapid urbanization and industrialization, poor land management practices, population growth and consequent land degradation, and water quality problems, particularly in the upper portion of the catchment). In addition, the upper Awash river basin (UARB) faces many challenges, including ever-increasing water demand from competing sectors such as domestic supply, industry, public institutions, agriculture (irrigation and livestock), and environmental flows to maintain ecosystem health (Alemayehu et al. 2006; Ayenew 2007; Legesse and Ayenew 2006; Pascual-Ferrer et al. 2013, 2014). As a result, the first step toward managing and developing the basin's scarce water resources sustainably is to evaluate the performance of a hydrological model in simulating complex hydrological processes while taking into account the climatic and physiographic features of the specific basin. However, most hydrological models applied in Ethiopia are calibrated using single-site data (at the basin confluence point), which does not capture spatial variability within the basin.

In this study, therefore, the SWAT hydrological model was calibrated and validated using multi-site gauged streamflow data to detect spatial variability within the upper Awash sub-basin, which is a key driver for innovation and technological development and for formulating sustainable water resource management and development strategies in the sub-basin. In addition, the sub-basin's spatiotemporal variability and the quality of hydro-meteorological data are the most likely sources of model simulation uncertainty, and data availability and quality are crucial when using distributed hydrological models. This study therefore used longer hydro-meteorological records than previous studies of the area. The main objective was to assess the performance of the SWAT model with the SUFI-2 algorithm in the upper sub-basins of the Awash River basin in Ethiopia through multi-site calibration and validation against observed streamflow, sensitivity analysis to reduce model uncertainty by detecting spatial variability, and analysis of the water balance of the sub-basins.

Materials and methods

Description of the study area

Location

The Awash River Basin is one of the 12 Ethiopian river basins and drains the country's central and eastern highlands, covering approximately 110,000 km². The river flows from Ginchi, west of Addis Ababa, through the Rift Valley, and ends in Lake Abe, on the Ethiopia–Djibouti border. The elevation of the basin ranges between 250 and 3576 m above mean sea level. Based on agricultural activities, socioeconomic system, and climatological, physical, and water resource characteristics, the Awash River basin is divided into three valleys (upper, middle, and lower) (Edossa et al. 2010). According to the Awash Basin Authority (2017) report, the total annual water demand in the Awash Basin for irrigation, domestic water supply, livestock, and industry is estimated to be around 3.4 BCM (billion cubic meters).

The Upper Awash sub-basin (Fig. 1), which includes Ethiopia's capital city, is one of the most densely populated sub-basins in the Awash River basin's western highlands. Large mechanized and private irrigated agricultural farms, as well as rapidly expanding industries, are found in this sub-basin. Water consumption rates will rise in the future due to population growth and other factors (Edossa et al. 2010; Tadese et al. 2019). As a result, the spatiotemporal calibration and validation of a hydrological model for simulating streamflow and analyzing the sub-basin's water balance is economically and environmentally significant for the sub-basin's water resource management and development. The sub-basin lies between latitudes 8°10′57″ and 9°13′54″N and longitudes 37°57′ and 39°11′E. Its total area is approximately 11,232 km² (Fig. 2), and its elevation varies between 1584 and 3576 m above sea level (Fig. 3).

Fig. 1 Location map of the study area

Fig. 2 Hydrologic and meteorological stations and reaches, together with the three main sub-basins/watersheds in the UARB

Fig. 3 DEM of the study area

Climate

Rainfall, the major contributor to surface flow, falls in two distinct rainy periods that are primarily influenced by shifts in the Inter-Tropical Convergence Zone (ITCZ). The Awash basin's mean annual rainfall ranges from 100 to 1700 mm, with wide spatial and temporal variation. The distribution of annual rainfall in the basin is mostly bimodal in the lower and unimodal in the Awash part (Edossa et al. 2010), with the rainy (June–October) and dry (November–May) seasons accounting for 71 and 29% of the share, respectively (AwBA 2017). Because of these variations, the basin faces a critical water resource availability problem, as it is Ethiopia's most irrigated area and contains several larger urban areas, including the capital Addis Ababa (Tadese et al. 2019). The Awash basin is characterized by different surface and groundwater potentials due to the spatial and temporal variation of rainfall in different sub-basins. In the study sub-basin, mean annual rainfall ranges from 876.45 to 1420.92 mm (with a long-term mean value of 1365.03 mm) and mean annual temperature ranges from 15.70 to 26.45 °C.

Hydrology

The Awash basin is hydrologically divided into 21 sub-basins based on the hydrological boundaries of tributaries and unique hydrological regimes that contribute significantly to the basin's water system. The basin's average annual rainfall generates 10.3, 4.6, and 3.6 BCM of groundwater recharge, streamflow, and water stored in open water systems, respectively. Water harvesting and storage structures are being prioritized due to the temporal and spatial variation of available water. Furthermore, the average rate of streamflow in the basin is 1.4 L per second per square kilometer (AwBA 2017). The basin drains tributaries from highland areas, increasing flow and causing flooding in lower areas, especially during the rainy season. While some tributaries, such as Akaki, Mojo, Kasam, Kebena, Borkena, and Mile, are perennial, others, particularly lowland streams, contribute only during the rainy season.

Geology

Ethiopia is divided into four major physiographic regions: the western plateau, the southeastern plateau, the main rift, and the Afar depression. The Upper Awash River Basin is bounded by the north-central plateau, an escarpment, and a rift valley. The basin is underlain by a diverse range of high-grade metamorphic rocks, and its regional geology is divided into rock categories with varying ages of formation. The regional geologic formations of the study area comprise Mesozoic sedimentary successions, Tertiary and Quaternary acidic and basic volcanic rocks, and Quaternary lacustrine and alluvial deposits. According to the Oromia Water Work Design and Supervision Enterprise (OWWDSE 2017), the major geological formations found in the study area are Addis Ababa Ignimbrite (Nadl), Central volcanic of Wechecha, Fur, and Yerer (NcvTy), Tarmaber Basalt (PNtbB), Weliso Ambo basalt (QwaB), Entoto becho rhyolite (NebRy), and Akaki bas (Qal). The Main Ethiopian Rift (MER) is typically characterized by a sequence of faults belonging to a NE–SW trending system and a N–S to N–E trending system. The Upper Awash Basin is located at the intersection of two important structures, the NNE–SSW trending MER and the E–W trending Ababa–Nekemte volcanic lineament, which are subjected to NE–SW, E–W, and NW–SE large-area faulting systems.

Hydrogeology

The Upper Awash River sub-basin hosts a well-known aquifer system, the most developed and intensively used in Ethiopia, which supplies water to the capital city (Addis Ababa) and to special zones of Oromia (Sebata, Gelan, Holeta, Debrezeit, Mojo, Awash Kunture, Awash Melka, Tulu Bolo, and other small towns) (Dereje et al. 2015). The occurrence and distribution of groundwater are determined by the porosity, hydraulic conductivity (permeability), and transmissivity of the geological formation, as well as by the amount of recharge it receives. These are primarily influenced by the area's geology, the degree of weathering, geological structures, and geomorphology. The geometry of the faults, the dipping direction of the underlying lithology, and the porosity and permeability of the faulted lithology most likely control groundwater recharge and flow in the Upper Awash River Basin. The hydrogeological (aquifer) classes in the sub-basin are the alluvial aquifer (primary porosity aquifer), lower basalt aquifer, regional aquiclude, localized aquiclude, and upper Awash aquifer (Yitbarek et al. 2012).

Soil and land cover

The major physical catchment characteristics that govern runoff generation are soil and land cover, which are influenced by topography. The basin contains a variety of soil groups, the most common of which are leptosol, chromic, eutric, dystric, and vertic. The land covers of the area are natural vegetation (short grasses, savanna, trees/shrubs, and marshes), wasteland (desert and sand dunes), agricultural land, and lakes. Cultivated land and open shrublands dominate the basin (Taye et al. 2018).

Socioeconomic condition

The Awash River Basin is an important basin, with a population of 18.6 million people, 34.4 million livestock, 199,234 ha of irrigated land, and various commercial and industrial activities; it also includes the country's capital, Addis Ababa. It has been the most used river basin since modern agriculture was introduced in the 1950s (AwBA 2017). It is Ethiopia's most irrigated region, and it is currently experiencing critical water resource availability problems (Tadese et al. 2020). The Awash basin economy is highly vulnerable to hydrological and climate variability. Furthermore, rapid population growth has put significant pressure on the basin, which has caused conflict in recent years. As a result, frequent and improved assessments of water resource availability, forecasts, and supply–demand balance conditions using hydrologic models in the Awash basin are required for improved economic benefit, resilience, and decision-making in water allocation and investment policies (Vivid Economics 2016).

Input data used and their source

The main data needed to calibrate the SWAT hydrological model in a given catchment area are (i) daily weather data (rainfall, maximum and minimum temperatures, wind speed, solar radiation, and relative humidity), (ii) hydrologic data (in this case, streamflow), and (iii) spatial data (DEM, soil map, and LULC map).

Weather data

Daily weather data from eight weather stations within the catchment (Ginchi, Addisalem, Addisababa bole, Debrezeyit, Ejere, Ejersa lelle, Mojo, and Hombole) were collected for this study from the Ethiopian National Meteorological Agency (NMA) from 1988 to 2018.

Streamflow data

Daily streamflow data for the years 1988 to 2014 were collected from Ethiopia's Ministry of Water and Energy (MoWE) at three major gauged stations in the upper Awash river sub-catchment (Awash Melka Kunturi, Awash Melka Hombole, and Mojo Upstream of Koka). Figure 2 depicts the spatial distribution of hydrologic and meteorological stations within the sub-basin.

Digital elevation model (DEM)

This study used DEM data from the Shuttle Radar Topography Mission (SRTM) with a resolution of 30 m × 30 m. The DEM data (shape file) for this project were obtained from MoWE. The elevation of the study catchment ranges from 1584 to 3576 m above mean sea level, as shown in Fig. 3.

Soil data

MoWE provided a soil map (in the form of a shape file) as well as major soil physicochemical properties (depth of soil layer, soil texture, hydraulic conductivity, bulk density, and organic carbon content). As a result, 15 different soil groups have been identified, with Chromic Vertisols and Calcic Fluvisols being the two most common in the catchment. Figure 4 depicts the spatial distribution of soil types throughout the sub-basin.

Fig. 4 Soil map of the study area

Land-use and land-cover (LULC) data

The Ethiopian Mapping Agency (EMA) provided an LULC map in raster format for the year 2013. Agricultural land use is found to be the most dominant land use in the study catchment. Figure 5 shows the spatial distribution of LULC within the sub-basin.

Fig. 5 LULC map of the study area

Soil and Water Assessment Tool (SWAT)

SWAT is a physically based, comprehensive, semi-distributed, process-based river basin scale hydrological model developed by the Agricultural Research Service of the United States Department of Agriculture (USDA-ARS). It is a continuous-time model that runs at a daily, monthly, or annual time step through an ArcGIS interface and is used to quantify the impact of land management practices in large and complex watersheds (Arnold et al. 2012a). It can simulate "surface and subsurface flows, pesticides, and nutrient and sediment movement in the hydrologic cycle of a catchment." The model includes hydrological processes, such as evaporation, infiltration, percolation, plant uptake, lateral and groundwater flows, snowfall, and snowmelt (Neitsch et al. 2005). When modeling watershed hydrology with SWAT, the watershed is divided into sub-watersheds, which are then segmented into hydrological response units (HRUs). The HRUs depict the physical heterogeneity of the catchment, and each is based on a unique combination of land use, soil type, and slope. The soil water balance is calculated on an HRU basis using Eq. 1, and flow is routed from the HRUs to the sub-catchments and finally to the catchment outlet. In SWAT, each HRU's soil water balance is represented as follows (Arnold et al. 1998; Neitsch et al. 2011):

$${\text{SW}}_{{\text{t}}} = {\text{SW}}_{0} + \mathop \sum \limits_{i = 1}^{t} \left( {R_{{{\text{day}}}} - Q_{{{\text{surf}}}} - E_{{\text{a}}} - w_{{{\text{seep}}}} - Q_{{{\text{gw}}}} } \right),$$
(1)

where SWt represents the final soil water content (mm), SW0 represents the initial soil water content on day i (mm), t represents the time (days), Rday represents the amount of precipitation on day i (mm), Qsurf represents the amount of surface runoff on day i (mm), Ea represents the amount of evapotranspiration on day i (mm), wseep represents the amount of water entering the vadose zone from the soil profile on day i (mm), and Qgw represents the amount of return flow on day i (mm).
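The daily bookkeeping in Eq. 1 can be illustrated with a short sketch. It is not SWAT source code: the `DailyFluxes` container, the function name, and the three-day record are hypothetical and only show how the five flux terms accumulate onto the initial soil water content of a single HRU.

```python
from dataclasses import dataclass


@dataclass
class DailyFluxes:
    """Water fluxes for one HRU on one day, all in mm (hypothetical container)."""
    r_day: float   # precipitation
    q_surf: float  # surface runoff
    e_a: float     # evapotranspiration
    w_seep: float  # water entering the vadose zone from the soil profile
    q_gw: float    # return flow (Q_gw in Eq. 1)


def soil_water_series(sw0, fluxes):
    """Return the soil water content (mm) at the end of each day, following Eq. 1."""
    sw = sw0
    series = []
    for f in fluxes:
        # Add precipitation and subtract every loss term for this day.
        sw += f.r_day - f.q_surf - f.e_a - f.w_seep - f.q_gw
        series.append(sw)
    return series


# Hypothetical three-day record, for illustration only.
example = [
    DailyFluxes(12.0, 2.0, 3.0, 1.0, 0.5),
    DailyFluxes(0.0, 0.0, 3.5, 0.5, 0.5),
    DailyFluxes(5.0, 0.5, 3.0, 0.5, 0.5),
]
print(soil_water_series(100.0, example))
```

In SWAT itself each flux is computed by a dedicated process routine, so this sketch covers only the accumulation step.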

The SWAT model includes a weather generator that generates daily values of precipitation, temperature, solar radiation, wind speed, and relative humidity based on statistical parameters calculated from mean monthly values. SWAT estimates surface runoff volume using either the SCS curve number method or the Green and Ampt infiltration method, depending on data availability. To route flow through the channel, the model offers two methods (the variable storage coefficient method and Muskingum routing). The model also includes three methods for estimating potential evapotranspiration (Penman–Monteith, Priestley–Taylor, and Hargreaves) (Neitsch et al. 2011). In this study, the SCS curve number method (SCS 1972) and the Penman–Monteith method (Monteith 1965) were used to estimate surface runoff and potential evapotranspiration, respectively, during the SWAT simulation process, using available data in the upper Awash sub-basins. The ArcSWAT 2012.10_21 version, which is compatible with ArcGIS 10.4.1, was used in this study. A more detailed explanation of the equations used by SWAT can be found in Neitsch et al. (2011).

SWAT is a popular hydrological model in Ethiopia. Studies have shown that it is applicable in various watersheds across the country in general (e.g., Setegn et al. 2010, 2011; Shawul et al. 2013; Serur and Sarma 2018), and particularly in the Awash river basin (e.g., Biru and Kumar 2018; Worqlul et al. 2018; Bekele et al. 2019; Shawul et al. 2019; Daba and You 2020; Getahun et al. 2020; Musie et al. 2020).

SWAT-calibration and uncertainty programs (SWAT-CUP)

The SWAT hydrological model's input parameters are process-based and must be calibrated within a realistic uncertainty range (Arnold et al. 2012b). SWAT-CUP is an "auto-calibration tool developed by Abbaspour et al. (2007) as an interface to SWAT that can perform sensitivity analysis, calibration and validation, and uncertainty analysis." Several approaches to uncertainty analysis are available in SWAT-CUP, including PSO, MCMC, SUFI-2, GLUE, and ParaSol. Among these approaches, the SUFI-2 algorithm is the most computationally efficient and performs best in terms of the share of measurements bracketed by the prediction uncertainty band (P-factor) and the relative width of that band (R-factor) (Wu and Chen 2015; Khoi and Thom 2015; Paul and Negahban-Azar 2018). The P-factor is defined as the percentage of measured data bracketed by the 95% prediction uncertainty (95PPU) band, which is calculated at the 2.5% and 97.5% levels of the cumulative distribution of an output variable obtained using Latin hypercube sampling. The R-factor is the ratio of the average thickness of the 95PPU band to the standard deviation of the measured data (Abbaspour 2014). Abbaspour (2014) recommended a P-factor value of > 0.7 for streamflow simulation and an R-factor value of around 1.0, depending on the situation.
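The two uncertainty measures defined above can be computed from an ensemble of simulations as in the following sketch. It is an illustration and not SWAT-CUP code: the function name and array layout are hypothetical, the P-factor is returned as a fraction so that it can be compared directly with the 0.7 guideline, and the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np


def p_and_r_factor(observed, simulations):
    """Compute the P-factor and R-factor of a simulation ensemble.

    observed    : array of shape (n_time,)
    simulations : array of shape (n_runs, n_time), one row per parameter set
    """
    observed = np.asarray(observed, dtype=float)
    simulations = np.asarray(simulations, dtype=float)

    # 95PPU band: 2.5% and 97.5% levels of the simulated values at each time step.
    lower = np.percentile(simulations, 2.5, axis=0)
    upper = np.percentile(simulations, 97.5, axis=0)

    # P-factor: fraction of observations that fall inside the band.
    inside = (observed >= lower) & (observed <= upper)
    p_factor = inside.mean()

    # R-factor: mean band thickness over the standard deviation of the observations.
    # Using the sample standard deviation (ddof=1) is an assumption of this sketch.
    r_factor = (upper - lower).mean() / observed.std(ddof=1)
    return p_factor, r_factor


# Hypothetical monthly flows and a two-member ensemble, for illustration only.
obs = [1.0, 2.0, 3.0]
sims = [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]
print(p_and_r_factor(obs, sims))
```

A wide band raises the P-factor but also the R-factor, which is why the two measures are read together.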

Sensitivity analysis

Identifying the most sensitive parameters for a given catchment is the first step in the SWAT calibration and validation process (Arnold et al. 2012a). The global sensitivity analysis method in the SWAT-CUP software package (Abbaspour et al. 2007) was used for sensitivity analysis in this study. In this analysis, the larger the absolute value of the t-stat and the smaller the p value, the more sensitive the parameter, that is, the more strongly a change in the parameter affects simulated streamflow in the catchment.
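A regression-based global sensitivity ranking of this kind can be sketched as follows. The sketch assumes that sampled parameter values are regressed linearly on an objective function value and that each parameter's t-stat is its coefficient divided by its standard error. The function name, the generic parameter columns, and the synthetic data are hypothetical and do not reproduce SWAT-CUP output.

```python
import numpy as np


def regression_t_stats(params, objective):
    """Return one t-stat per parameter from an ordinary least-squares fit.

    params    : array of shape (n_runs, n_params), sampled parameter values
    objective : array of shape (n_runs,), objective function value of each run
    """
    params = np.asarray(params, dtype=float)
    objective = np.asarray(objective, dtype=float)
    n_runs, n_params = params.shape

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(n_runs), params])
    beta, *_ = np.linalg.lstsq(X, objective, rcond=None)

    # Residual variance and standard errors of the coefficients.
    residuals = objective - X @ beta
    dof = n_runs - (n_params + 1)
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    # Drop the intercept; t-stat = coefficient / standard error.
    return (beta / std_err)[1:]


# Synthetic example: column 0 matters most, column 1 a little, column 2 not at all.
rng = np.random.default_rng(0)
params = rng.uniform(size=(200, 3))
objective = 5.0 * params[:, 0] + 0.5 * params[:, 1] + rng.normal(scale=0.1, size=200)

t_stats = regression_t_stats(params, objective)
ranking = np.argsort(-np.abs(t_stats))  # most sensitive parameter first
print(ranking, t_stats)
```

A two-sided p value for each parameter could then be obtained from Student's t distribution with `dof` degrees of freedom, for example with `scipy.stats.t.sf`.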

Calibration and validation

In this study, the streamflow for each of the three gauged stations—Awash Melka Kunturi, Awash Melka Hombole, and Mojo Upstream of Koka—was taken into account when calibrating and validating the SWAT model. This was done using SWAT-CUP, which offers a decision-making framework that incorporates a semi-automated SUFI-2 (Arnold et al. 2012b). Iteratively changing the values of the most sensitive parameters within the permitted upper and lower ranges was used to calibrate the model until a satisfactory level of agreement between the measured and simulated streamflow was attained.
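SUFI-2 draws candidate parameter sets from the permitted ranges by Latin hypercube sampling. The following is a minimal, generic sketch of that sampling scheme and not the SWAT-CUP implementation. The function name and the two parameter ranges are hypothetical.

```python
import numpy as np


def latin_hypercube(ranges, n_samples, seed=None):
    """Draw n_samples parameter sets by Latin hypercube sampling.

    ranges : sequence of (lower, upper) bounds, one pair per parameter
    Returns an array of shape (n_samples, n_params).
    """
    rng = np.random.default_rng(seed)
    ranges = np.asarray(ranges, dtype=float)
    n_params = ranges.shape[0]

    # One random point inside each of n_samples equal-width strata of [0, 1).
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_params))) / n_samples

    # Shuffle the strata independently for every parameter.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])

    # Rescale from [0, 1) to the permitted range of each parameter.
    lower, upper = ranges[:, 0], ranges[:, 1]
    return lower + u * (upper - lower)


# Hypothetical permitted ranges for two parameters, for illustration only.
samples = latin_hypercube([(-0.2, 0.2), (0.0, 1.0)], n_samples=10, seed=1)
print(samples)
```

Because every stratum of every parameter is sampled exactly once, a modest number of model runs covers each range evenly, which is the efficiency argument usually made for this scheme.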

After successful calibration, the model was validated by running it with the parameters identified during calibration and comparing the predictions with observed data that were not used in the calibration. SUFI-2 was used for calibration and uncertainty analysis in this study, and all model input parameters were kept within a realistic uncertainty range in SWAT-CUP. This was accomplished by identifying the more sensitive parameters (Arnold et al. 2012b).

Calibration and validation were carried out by splitting the available observed streamflow data into separate periods for each of the three gauged stations. The availability of relatively consistent observed streamflow data influenced the choice of observed data periods for calibration and validation. First, the model was set to run from 1988 to 2014, with the first 2 years (1988–1989) serving as a model warm-up period, allowing the model to stabilize and fill the catchment depression for further simulations. The calibration period was 1990–2006 and the validation period was 2007–2014.
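The split-sample scheme described above can be expressed as a small helper. The period boundaries are those stated in the text. The function names and the three flow values are hypothetical.

```python
from datetime import date

# Simulation periods stated in the text (inclusive calendar years).
PERIODS = {
    "warm-up": (1988, 1989),
    "calibration": (1990, 2006),
    "validation": (2007, 2014),
}


def period_of(day):
    """Return the period name for a date, or None if it lies outside 1988-2014."""
    for name, (first_year, last_year) in PERIODS.items():
        if first_year <= day.year <= last_year:
            return name
    return None


def split_record(record):
    """Group (date, flow) pairs by period.

    Warm-up values are kept in their own group so that they can be
    excluded from the performance statistics.
    """
    groups = {name: [] for name in PERIODS}
    for day, flow in record:
        name = period_of(day)
        if name is not None:
            groups[name].append((day, flow))
    return groups


# Hypothetical flows (m3/s), for illustration only.
record = [
    (date(1989, 7, 1), 40.0),
    (date(1995, 7, 1), 55.0),
    (date(2010, 7, 1), 48.0),
]
groups = split_record(record)
print({name: len(values) for name, values in groups.items()})
```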

Model performance evaluation measures

In this study, the SWAT model's performance in simulating the catchment hydrology of the upper Awash sub-basin was evaluated using three statistical indices, the coefficient of determination (R², Eq. 2), the Nash–Sutcliffe efficiency (NSE, Eq. 3), and percent bias (PBIAS, Eq. 4), together with a visual comparison of the observed and simulated streamflow hydrographs at the three gauged stations. Model performance was tested at each gauged station after every simulation stage, using that stage's parameter values. For each gauged station, the parameter values were adjusted repeatedly within the permitted ranges until satisfactory agreement between observed and simulated streamflow was attained.

The statistical model performance evaluation indices chosen are based on the recommendations of Moriasi et al. (2007) for streamflow simulation and are given by Eqs. (2)–(4). Table 1 summarizes the overall recommended performance ratings for streamflow on a monthly basis.

$$R^{2} = \frac{{\left( {\mathop \sum \nolimits_{i = 1}^{n} \left[ {X_{i} - X_{{{\text{av}}}} } \right]\left[ {Y_{i} - Y_{{{\text{av}}}} } \right]} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left[ {X_{i} - X_{{{\text{av}}}} } \right]^{2} \mathop \sum \nolimits_{i = 1}^{n} \left[ {Y_{i} - Y_{{{\text{av}}}} } \right]^{2} }}$$
(2)
$${\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - Y_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {X_{i} - X_{{{\text{av}}}} } \right)^{2} }}$$
(3)
$${\text{PBIAS}} = 100\left( {\frac{{\sum\nolimits_{i = 1}^{n} {X_{i} } - \sum\nolimits_{i = 1}^{n} {Y_{i} } }}{{\sum\nolimits_{i = 1}^{n} {X_{i} } }}} \right),$$
(4)

where Xi represents the measured value, Xav represents the average measured value, Yi represents the simulated value, and Yav represents the average simulated value.

Table 1 Overall recommended performance ratings for streamflow on a monthly time step, adapted from Moriasi et al. (2007)
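The three indices of Eqs. (2)–(4) can be computed as in the following sketch, with X the measured and Y the simulated series. R² is taken here as the squared Pearson correlation, so its denominator is the product of the two sums of squares. The function names and the four-value series are hypothetical.

```python
import numpy as np


def r_squared(measured, simulated):
    """Coefficient of determination (Eq. 2), the squared Pearson correlation."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(simulated, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) ** 2 / (np.sum(dx ** 2) * np.sum(dy ** 2))


def nse(measured, simulated):
    """Nash-Sutcliffe efficiency (Eq. 3); 1 indicates a perfect fit."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)


def pbias(measured, simulated):
    """Percent bias (Eq. 4); positive values mean the model underestimates."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(simulated, dtype=float)
    return 100.0 * (np.sum(x) - np.sum(y)) / np.sum(x)


# Hypothetical monthly flows (m3/s), for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]
print(r_squared(measured, simulated), nse(measured, simulated), pbias(measured, simulated))
```

With the sign convention of Eq. 4, a simulated series that is uniformly lower than the measured one gives a positive PBIAS.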

Previous studies (Worqlul et al. 2018; Bekele et al. 2019; Getahun et al. 2020; Musie et al. 2020) used the above performance evaluation statistical indices to evaluate SWAT streamflow simulation under different climate change conditions.

Results and discussion

Sensitivity analysis

To identify the parameters that most strongly influence the model output, a sensitivity analysis was performed using mean monthly observed data in the SUFI-2 algorithm, which is linked with SWAT-CUP. To reduce the large number of candidate parameters for calibrating the ArcSWAT model, the parameters likely to be sensitive in Ethiopian catchments were first identified through an in-depth review of the existing literature (e.g., Legesse et al. 2003, 2004; Alemayehu et al. 2006; Legesse and Ayenew 2006; Ayenew 2007; Zeray et al. 2007; Setegn et al. 2010, 2011; Pascual-Ferrer et al. 2013, 2014; Shawul et al. 2013; Desta and Lemma 2017; Serur and Sarma 2018). SWAT-CUP ranks the sensitivity of parameters based on their t-stat and p value after running a series of simulations. The t-stat is the ratio of a parameter's regression coefficient to its standard error, so a larger absolute t-stat indicates a more sensitive parameter, while a lower p value gives stronger grounds for rejecting the null hypothesis that a change in the parameter has no significant effect on the response variable (Abbaspour et al. 2007). Figure 6 displays the catchment's fifteen most sensitive hydrologic flow parameters during calibration.

Fig. 6 Streamflow sensitive parameters
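The t-stat ranking can be illustrated with a small multiple regression of objective-function values on sampled parameter values, which is the general idea behind the global sensitivity analysis in SWAT-CUP. The sketch below is a simplified stand-in, not the SWAT-CUP code: the function names, the two generic parameters, and the synthetic samples are all assumptions, and the p value is omitted because it needs the Student t distribution, which the Python standard library does not provide.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_stats(samples, objective):
    """t-stat (coefficient / standard error) of each parameter, intercept excluded."""
    rows = [[1.0] + list(s) for s in samples]  # prepend the intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * g for r, g in zip(rows, objective)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((g - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, g in zip(rows, objective))
    s2 = sse / (n - p)  # residual variance
    return [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(1, p)]


# Synthetic example: the objective depends strongly on param_a, weakly on param_b.
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(50)]
objective = [10 * a + 0.5 * b + rng.gauss(0, 1) for a, b in samples]
t_a, t_b = t_stats(samples, objective)
ranking = sorted({"param_a": t_a, "param_b": t_b}.items(),
                 key=lambda kv: abs(kv[1]), reverse=True)
```

Sorting by absolute t-stat puts the parameter with the stronger influence on the objective function first, which mirrors how the sensitive parameters are ordered in a ranking such as Fig. 6.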

Model performance evaluation

Comparing the observed streamflow with that simulated by the model during the calibration and validation periods at the Awash Melka Kunturi, Awash Melka Hombole, and Mojo Upstream of Koka gauged stations revealed that the model captured the monthly flows well (Figs. 7–9 and Table 2).

Fig. 7 Hydrographs and scatter plots of observed and simulated streamflow at the Awash Melka Kunturi gauged station: (a) hydrograph during calibration, (b) scatter plot during calibration, (c) hydrograph during validation, (d) scatter plot during validation

Table 2 Calibration and validation statistics for monthly streamflow at three gauged stations

At the Awash Melka Kunturi gauged station, model performance during calibration was rated very good, with R2, NSE, and PBIAS values of 0.74, 0.66, and 14.9, respectively. During validation, performance was very good in terms of R2, good in terms of NSE, and satisfactory in terms of PBIAS (18.1) (Fig. 7 and Table 2).

At the Awash Melka Hombole gauged station, the model performed very well during calibration in terms of R2 (0.75) and PBIAS (1.30) and well in terms of NSE (0.66). During validation, it performed well in terms of R2 (0.66) but only satisfactorily in terms of NSE (0.62) and PBIAS (18.60) (Fig. 8 and Table 2).

Fig. 8 Hydrographs and scatter plots of observed and simulated streamflow at the Awash Melka Hombole gauged station: (a) hydrograph during calibration, (b) scatter plot during calibration, (c) hydrograph during validation, (d) scatter plot during validation

At the Mojo Upstream of Koka gauged station, the model performed very well during calibration in terms of R2 (0.80) and PBIAS (−3.2) and well in terms of NSE (0.74). During validation, it again performed very well in terms of R2 (0.74) and PBIAS (0.80) and well in terms of NSE (0.71) (Fig. 9 and Table 2). Based on the evaluation ratings for streamflow simulation provided by Moriasi et al. (2007), the SWAT model performance at all gauged stations is rated satisfactory to very good. Figures 7, 8, and 9 show the observed and simulated streamflow hydrographs and scatter plots with a 1:1 fitting line at the three gauged stations during the calibration and validation periods.

Fig. 9 Hydrographs and scatter plots of observed and simulated streamflow at the Mojo Upstream of Koka gauged station: (a) hydrograph during calibration, (b) scatter plot during calibration, (c) hydrograph during validation, (d) scatter plot during validation
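The NSE and PBIAS ratings quoted for each station can be reproduced with a simple lookup. The sketch below assumes the class limits commonly cited from Moriasi et al. (2007) for monthly streamflow; Table 1 of this study is described as adapted from that source, so its limits may differ. R2 is left out because no class limits for it are assumed here, and the function names are hypothetical.

```python
def rate_nse(nse):
    """Rating for monthly streamflow NSE (assumed Moriasi et al. 2007 limits)."""
    if nse > 0.75:
        return "very good"
    if nse > 0.65:
        return "good"
    if nse > 0.50:
        return "satisfactory"
    return "unsatisfactory"


def rate_pbias(pbias):
    """Rating for monthly streamflow PBIAS in percent, judged on its magnitude."""
    magnitude = abs(pbias)
    if magnitude < 10:
        return "very good"
    if magnitude < 15:
        return "good"
    if magnitude < 25:
        return "satisfactory"
    return "unsatisfactory"


# Values reported in the text for two station/period combinations.
mojo_calibration = (rate_nse(0.74), rate_pbias(-3.2))
hombole_validation = (rate_nse(0.62), rate_pbias(18.60))
```

Under these assumed limits, the Mojo calibration values map to good (NSE) and very good (PBIAS), and the Awash Melka Hombole validation values both map to satisfactory, in line with the ratings stated for those two cases.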

The R2, NSE, and PBIAS values obtained here differ from those reported in other SWAT streamflow simulations by Biru and Kumar (2018), Worqlul et al. (2018), Bekele et al. (2019), Shawul et al. (2019), Daba and You (2020), Getahun et al. (2020), and Musie et al. (2020), which could be due to the use of different hydro-meteorological data. Among these, Shawul et al. (2019) reported R2, NSE, and PBIAS values of 0.77, 0.76, and −11.8 for the monthly calibration period (1982–1987) and 0.78, 0.76, and 10.6 for the validation period (1991–1994) between measured and simulated streamflow at the Mojo gauged station in the upper Awash sub-basin. The difference from the present results could be attributed to the longer hydro-meteorological record used in this study (1988–2018). Serur and Adi (2022) developed the SWAT hydrologic model to assess the potential response of water balance components to land-use/land-cover change in a Rift Valley lake basin in Ethiopia, and the model demonstrated capability with R2 values ranging from 0.80 to 0.64 and 0.74 to 0.72 during calibration and validation periods, respectively. During the calibration and validation periods, NSE values ranged from 0.74 to 0.61 and 0.71 to 0.65, respectively, whereas PBIAS values ranged from 19.70 to −3.20 and 18.10 to 0.80, respectively.

Water balance of the sub-basins

To analyze the water balance components in the upper Awash River sub-basins, a long-term hydrologic simulation covering 1990–2014 (25 years) was conducted. Over this period, the mean annual streamflow was 745.77 mm. Streamflow varied considerably in space and time across the sub-basins. At Awash Melka Kunturi, Awash Melka Hombole, and Mojo Upstream of Koka, the mean seasonal streamflow was found to be 22.43, 4.95, and 75.69 m3/s, respectively, during the wet season (April–September), and 10.71, 2.40, and 35.72 m3/s, respectively, during the dry season (October–March) (Table 3).

Table 3 Multi-site seasonal streamflow variation at three gauged stations
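The seasonal means can be obtained from a monthly flow series by splitting it on the wet season (April–September) and dry season (October–March) defined in the text. The following is a minimal sketch; the function name and the one-year sample series are hypothetical and are not data from this study.

```python
WET_MONTHS = {4, 5, 6, 7, 8, 9}  # April-September, as defined in the text


def seasonal_means(monthly_flow):
    """Return (wet mean, dry mean) from a list of (month_number, flow_m3s) pairs."""
    wet = [q for month, q in monthly_flow if month in WET_MONTHS]
    dry = [q for month, q in monthly_flow if month not in WET_MONTHS]
    return sum(wet) / len(wet), sum(dry) / len(dry)


# Hypothetical one-year series (m3/s), for illustration only.
flows = [(1, 8.0), (2, 7.0), (3, 9.0), (4, 12.0), (5, 14.0), (6, 20.0),
         (7, 35.0), (8, 40.0), (9, 23.0), (10, 12.0), (11, 10.0), (12, 8.0)]
wet_mean, dry_mean = seasonal_means(flows)
```

Applying the same split to each gauged station's simulated series would give one wet-season and one dry-season value per station, which is the layout summarized in Table 3.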

Over the entire sub-basin, the mean annual rainfall was approximately 1365.03 mm over the 25-year period; of this amount, 11.61% (158.55 mm) left as surface runoff (SURFQ), 7.43% (101.37 mm) as lateral flow (LATQ), and about 35.47% (484.21 mm) as baseflow (GWQ), while 45.41% (619.80 mm) was lost as evapotranspiration. The catchment's average net annual water yield (WY), which includes SURFQ, LATQ, and GWQ, accounts for about 54.63% (745.77 mm) of the average annual rainfall.
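The percentages follow from dividing each component depth by the mean annual rainfall. The short sketch below repeats that arithmetic with the depths reported above; the variable and function names are illustrative only.

```python
RAINFALL_MM = 1365.03  # mean annual rainfall reported for 1990-2014

# Mean annual depths (mm) as reported in the text.
components_mm = {
    "SURFQ": 158.55,  # surface runoff
    "LATQ": 101.37,   # lateral flow
    "GWQ": 484.21,    # baseflow
    "ET": 619.80,     # evapotranspiration
    "WY": 745.77,     # net water yield
}


def share_of_rainfall(depth_mm, rainfall_mm=RAINFALL_MM):
    """Express a water balance component as a percentage of rainfall."""
    return 100 * depth_mm / rainfall_mm


shares = {name: share_of_rainfall(depth) for name, depth in components_mm.items()}
```

Each computed share agrees with the reported percentage to within about 0.01 percentage points.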

Previous studies in Ethiopia that used hydrological models to simulate catchment hydrological components showed that these components decrease during the dry season and increase during the wet season as a result of various natural and anthropogenic influences throughout the basin. According to studies by Choto and Fetene (2019), Shawul et al. (2019), Gashaw et al. (2018), and Kassa (2009), the expansion of agricultural land, bare land, and built-up area at the expense of forest and shrubland increases SURFQ and wet season streamflow while decreasing LATQ and GWQ. Consistent with these studies, the findings of this study indicate that LULC may have a significant impact on streamflow as well as water balance components in the sub-basins.

Most hydrological models applied in Ethiopia are calibrated using data from a single site (at the basin confluence point). Such calibration does not detect spatial variability within the catchment and cannot provide stakeholders and policy-makers with reliable information for formulating sustainable integrated water resources management strategies. In this study, the ArcSWAT hydrologic model was instead calibrated and validated at multiple gauged stations. The model showed very good to satisfactory performance in simulating catchment hydrology on a monthly basis and detected spatial variability within the sub-catchment, which is a key driver for innovation and technological development in formulating sustainable water resource management and development strategies in the upper part of the Awash River catchment in Ethiopia.

Model uncertainty

The catchment's spatiotemporal variability and the quality of the hydro-meteorological data are the most likely sources of uncertainty in the model simulation. Data availability and quality are crucial when using distributed hydrological models. The biggest obstacle in this study was locating reliable hydro-meteorological data in the catchment. Without adequate data, a model is difficult to implement and accurate results are very hard to obtain. Additionally, there are discrepancies among the data available from different sectors, and spatial data, such as soil and LULC data, may also introduce slight discrepancies into the model simulation. Therefore, the appropriate local, regional, and federal authorities should be involved in integrating and coordinating data collection and validation. Nevertheless, the graphical comparison of the observed and simulated streamflow hydrographs, as well as the statistical model performance measures listed in Table 2, meets the criteria of Moriasi et al. (2007).

Conclusions

Multi-site calibration and validation of the SWAT model was evaluated in the upper part of the Awash River catchment using SWAT-CUP for simulating historical streamflow on a monthly basis. The SUFI-2 algorithm embedded in SWAT-CUP was applied for sensitivity and uncertainty analysis and for calibration and validation of the SWAT model. The performance of the SWAT hydrologic model in simulating catchment hydrology was measured using the R2, NSE, and PBIAS statistical model performance evaluation indices, in addition to visual inspection of observed and simulated streamflow hydrographs at three gauged stations during the calibration and validation periods. The results revealed that the SWAT model can simulate monthly streamflows very well at three spatially dispersed gauged stations in the catchment. Across the stations, R2 varied from 0.80 to 0.74 during calibration and from 0.74 to 0.66 during validation, NSE varied from 0.74 to 0.66 and from 0.71 to 0.62, and PBIAS varied from −3.20 to 14.90 and from 18.60 to 8.00, respectively. The model performed comparatively better during the calibration period than during the validation period. A long-term hydrologic simulation for 1990–2014 (25 years) was performed to analyze the water balance components in the catchment, and it showed significant spatiotemporal variations in streamflow and water balance components. Over the entire sub-basin, the mean annual rainfall was approximately 1365.03 mm; of this amount, 11.61% left as surface runoff (SURFQ), 7.43% as lateral flow (LATQ), and about 35.47% as baseflow (GWQ), while 45.41% was lost as evapotranspiration. The sub-basin's average net annual water yield (WY), which includes SURFQ, LATQ, and GWQ, accounts for about 54.63% of the average annual rainfall.
The multi-site calibration and validation-based performance evaluation results indicated that the SWAT model can simulate catchment hydrology very well at all gauged stations in the upper Awash sub-basin, which is immensely useful for planning and designing proper water management strategies in the Awash River basin.

Limitations, future research directions, and recommendations

Based on the findings and the challenges encountered during the study, four major limitations, future research directions, and recommendations are identified.

(1) When using distributed hydrological models, data quality and availability are critical. The most difficult challenge in this study was locating high-quality hydro-meteorological data in the basin. Without accurate data, model implementation is extremely difficult. The respective local, regional, and federal authorities should therefore be involved in compiling integrated and coordinated data.

(2) The land-use information used in this study is approximately 10 years old. The basin is currently experiencing noticeable land-use change, such as rapid urbanization encroaching on cultivated lands, an increase in rural settlements, and the expansion of cultivated areas into shrub lands. At the same time, natural resource management practices are being promoted (catchment activities such as physical and biological soil and water conservation measures), and land degradation is still ongoing. These changes could affect runoff generation, infiltration, and evapotranspiration. Additional research should therefore be conducted using a recently developed land-use map.

(3) The calibrated SWAT hydrological model reproduces the historical water balance of the upper Awash sub-basins and is ready for use. It is therefore suggested for further simulations under changing environmental conditions, particularly changing climate and land use/land cover in the study area.

(4) Only the SWAT hydrological model was used in this study to simulate streamflow in the upper Awash River sub-basins of Ethiopia. It is well known that a single model structure cannot adequately represent all the processes governing a watershed system's response to hydrological events. To address structural uncertainty in model prediction, the results of multiple hydrological models that simulate streamflow in a catchment can be combined. Additional research should therefore compare various hydrological models in simulating the streamflow of the catchment to reduce the uncertainty caused by model structure.