Introduction

Water resources are a prime contributor to a developing economy, environmental protection, and sustainable development (Madolli et al., 2022). They help advance economic growth if managed and planned properly (Dhami et al., 2018). Shortage and misuse of freshwater pose a serious and growing threat to the protection, management, and sustainable development of water resources. Unless water and land resources are managed properly, industrial expansion, the natural ecosystems on which they depend, human health, social well-being, and sustainable food production are all in danger (ICWE, 1992). Only a small fraction (about 2.53%) of the estimated total volume of water available on the earth is freshwater (Water facts, 2020). A considerable portion of this freshwater is not available for use, as it lies in inaccessible deep aquifers or is frozen in polar regions. This makes it challenging to protect, manage, and develop water resources sustainably in the face of economic growth, climate change, and population increase (Amrit et al., 2019; Fan et al., 2022; Kumar et al., 2021a, 2021b; Shiklomanov, 1998; Swain et al., 2021). Hydrologic models have been widely used to assess and manage the sustainability of water resources (Paul et al., 2021).

Hydrologic modeling is an efficient way to conduct consistent long-term behavioral studies of hydrologic and climatic variables (Tanmoyee et al., 2015). Initially, hydrologic modeling focused on the development of theories, concepts, and models for a particular component of the hydrologic cycle, such as baseflow (Barnes, 1940), overland flow (Horton, 1939; Keulegan, 1944), channel flow (Manning, 1891), subsurface flow (Fair & Hatch, 1933; Jacob, 1943, 1944; Theis, 1935), depression storage (Horton, 1919; SCS-CN method, 1956), evapotranspiration (Cummings, 1935; Penman, 1948; Thornthwaite, 1948), infiltration (Green & Ampt, 1911) and interception (Horton, 1919). The first physical model capable of modeling the entire watershed with all hydrologic cycle components was most likely the Stanford Watershed Model (SWM), developed in 1966 (Crawford & Linsley, 1966). Subsequently, many hydrologic models were developed that take advantage of improved computational capabilities and algorithms together with newly available data sources from space technology, such as remote sensing satellite data, high-resolution digital elevation models (DEMs), and radar rainfall (Pandey et al., 2016). These hydrologic models vary widely in their capabilities and characteristics, such as the representation of processes, the spatial–temporal scales they account for, the algorithms used, input requirements, and the types of output they provide (Pandey et al., 2016; Paul et al., 2021).

Complex hydrological systems are commonly investigated using physically-based models that simulate major components such as streamflow and sediment yield (Himanshu et al., 2017, 2018a). Many studies have demonstrated the robustness of the SWAT (Aadhar et al., 2019; Dhami et al., 2018; Gupta et al., 2020; Murty et al., 2014; Pandey & Palmate, 2019; Swain et al., 2022) and VIC (Narendra et al., 2017; Oubeidillah et al., 2014; Srivastava et al., 2017) hydrologic models in the evaluation of water balance components. Kang and Sridhar (2018) found the SWAT and VIC models reliable for short-term drought forecasting in the contiguous USA. Alvarenga et al. (2020) compared the capabilities of the VIC and SWAT hydrologic models to simulate runoff in the Verde River Watershed, Brazil. They found both models suitable for streamflow simulation and suggested that the integration of the SWAT and VIC models can be useful in different water resource assessment studies.

With the development of advanced models and the availability of spatial–temporal data, modelers and stakeholders now depend broadly on information derived from hydrological models to make more sustainable choices. However, with the growing family of hydrological models and tools, it has become difficult for decision-makers to identify a plausible model for their intended application (Jajarmizadeh et al., 2012; Pandey et al., 2016). Several questions arise regarding a model's fitness for the intended application, its reliability, and the uncertainties associated with its results. Moreover, because each model has a different modeling concept, algorithms, and input requirements, each performs differently, and its performance may be non-unique in space and time. Further, to make an appropriate choice among various models, it is important to evaluate them with the quantity and quality of input data available in the catchment.

Therefore, a comparative evaluation of two commonly used hydrological models (SWAT and VIC) was performed. Although both models compute the catchment water balance, they differ in several characteristics. The Soil and Water Assessment Tool (SWAT) was primarily developed by USDA’s Agricultural Research Service (ARS) to assess the impacts of land management practices on water quantity, water quality, and sediment fluxes in a watershed (Arnold et al., 1998; Borah & Bera, 2003; Himanshu et al., 2019; Miller et al., 2007; Palmate & Pandey, 2021). It is a physically-based, continuous-time, long-term, semi-distributed watershed-scale hydrologic model (Arnold & Fohrer, 2005; Arnold et al., 1998; Garg et al., 2012; Pandey et al., 2016). On the other hand, the Variable Infiltration Capacity (VIC) model is a physically-based, semi-distributed macroscale model developed on the Land Surface Modelling scheme, primarily to link with climate models (Liang et al., 1994). The VIC model represents sub-grid level spatial heterogeneity, vegetation phenological changes, soil textures, and terrain characteristics at different spatial resolutions (Kimball et al., 1997). The model can simulate several hydrologic and climatic variables such as snow depth, snowmelt, ET, surface runoff, soil moisture, frozen soil, and streamflow (Tanmoyee et al., 2015). Unlike SWAT, the VIC model can simulate the energy balance in addition to the water balance at a sub-daily time step (Liang et al., 1994).

Hydrological models have their strengths and weaknesses in representing hydrological processes (Li et al., 2018). Owing to insufficient input data, model structure, and model output uncertainty in large-scale exercises, relying on a single hydrological model generally leads to simulation uncertainties (Dietrich et al., 2009; Kauffeldt et al., 2016; Li et al., 2018; Liu & Gupta, 2007; Palmate et al., 2021). To reduce uncertainty in modeling hydrological processes, several techniques have been used in the recent past (Liu et al., 2017; Kasiviswanathan & Sudheer, 2017; Gaur et al., 2022). Among these, the ensemble modeling technique has been gaining popularity in recent years in different sectors of water resources modeling (Doblas-Reyes et al., 2005; Gaur et al., 2021a, 2021b; Horan et al., 2021; Kumar & Nandagiri, 2015; Kumar et al., 2015; Li et al., 2018; Muhammad et al., 2018; Paul et al., 2021; Yadav et al., 2020). Multi-model ensembles generally outperform individual models and tend to perform better than single-model ensembles in weather prediction and streamflow simulation (Gaur et al., 2021a, 2021b; Kumar et al., 2015; Mendoza et al., 2014; Paul et al., 2021). Different approaches are used to combine simulated hydrological variables in the ensemble modeling technique. Among these, the mean ensemble approach, which averages the outputs of the individual hydrological models with equal weights, has been used in many studies to simulate hydrological variables and to reduce errors with optimal bias (Doblas-Reyes et al., 2005; Baker & Ellison, 2008; Kumar & Nandagiri, 2015; Muhammad et al., 2018; Li et al., 2018; Horan et al., 2021).

The SWAT and VIC models have their own strengths and weaknesses in representing hydrological processes. Previous comparisons of SWAT and VIC have shown that their results may vary and that, depending on the accuracy of inputs and parameters, the models may overestimate, underestimate, or contradict each other in assessing hydrometeorological variables such as discharge and evapotranspiration. In the seasonal simulations of the SWAT and VIC-3L models studied by Hu et al. (2007), runoff was underestimated relative to the observed values in spring and winter. The SWAT-simulated runoff values were higher than those of the VIC-3L simulation in summer but lower in winter. These hydrological models have also provided useful insights into the impact of climate and anthropogenic activities on regional water security (Veettil et al., 2022). Kang et al. (2022) assessed the impacts of climate change on conventional and flash drought conditions using these models. The SWAT-driven drought indices showed an overall increase in drought occurrence, whereas the VIC-driven drought indices showed a decrease. Dash et al. (2021) reported that the SWAT simulation-based standardized evapotranspiration drought index (SEDI) was consistent with the benchmark satellite (MOD16A2-ET)-derived estimates, whereas the VIC-3L simulation-based SEDI alternately overestimated and underestimated them. A study employing these models showed a remarkable decrease in drought predictions (36% for SWAT and 38% for VIC) due to uncertainties associated with the meteorological variables (Kang & Sridhar, 2018). An ensemble modeling approach can help offset uncertainty in input data as well as poor reservoir operation functionality, if any, within the models (Horan et al., 2021).

In a large watershed, the application of a single model can lead to simulation uncertainties if detailed data are not available. Ensemble modeling combines multiple model predictions into a single prediction that generally tends to perform better than any individual model. It can be used to better simulate components of the hydrologic cycle and to provide a range of possible outcomes and their uncertainty. In view of the above, this study explores the applicability of an ensemble modeling approach for hydrological variables (runoff and evapotranspiration) over the study area. The study could help in developing a finer spatial resolution modeling framework to simulate the hydrology of a watershed, which can contribute to policy and decision-making processes for sustainable water resource management. The SWAT and VIC models were selected to investigate the predictive capability of the individual models and the performance of the mean ensemble of these two models (EnSwaVi) in improving the accuracy of simulated runoff and evapotranspiration in the study area. To evaluate the models' ability to address spatial heterogeneity in soil, land cover, and topography in a semiarid region, the heterogeneous Marol watershed (5092 km2), India, was selected for the study.

Materials and Methods

Study Area

The Marol watershed is part of the upper Krishna River basin and covers a geographical area of about 5092 km2 between longitudes 74°48′30″ E and 75°36′38″ E and latitudes 14°05′18″ N and 15°07′48″ N. The Krishna River is an important east-flowing river in the peninsular region of India (Himanshu et al., 2018b). The study area lies along the Varada River, a sub-tributary of the Tungabhadra River, in the State of Karnataka, India (Fig. 1). The elevation of the watershed above mean sea level varies from 340 to 848 m. The average slope varies from 0 to 8.9%, and the watershed consists mainly of gently undulating plains. However, because of some hilly areas in the west, the maximum slope of the study area reaches 31%. Topographic elevation, land use/land cover, and soil textures of the watershed are given in Fig. 1. The average annual rainfall of the watershed is 1330 mm, and the temperature varies between 16 and 38 °C. The availability of observed hydrometeorological data, heterogeneous land use, and the absence of any large storage structure make the watershed appropriate for the present case study.

Fig. 1 Details of the Marol watershed: (a) location map, (b) land use/cover map, and (c) soil map

Data

Different types of datasets, including meteorological data, hydrological data, and thematic data, summarized in Table 1 were used in this study. Daily IMD gridded precipitation data available at a spatial resolution of 0.25° × 0.25° grid (Pai et al., 2014, 2015) were used as inputs to the model. The Marol watershed covers fifteen precipitation grid points. The daily IMD gridded temperature data available at 1° × 1° were also used in the present study (Srivastava et al., 2009). In addition to this, other important data variables, namely relative humidity, solar radiation, and wind speed, not available for the study area, were obtained from the Global Weather Database for SWAT (Dile & Srinivasan, 2014) website at 0.25° × 0.25° spatial resolution.

Table 1 Details of datasets used in the present study

Daily hydrologic data, i.e., streamflow measured at the Marol gauge and discharge (G&D) site, were obtained for the years 2000–2010 from the India Water Resources Information System (WRIS) WebGIS portal, Government of India. The Marol G&D site on the Varada River is located at a longitude of 75°36′38″ E and a latitude of 14°55′04″ N. The period between 2005 and 2007 was not considered in the evaluation because no discharge data were available. In addition, owing to inconsistency in the data for November to May, the model evaluation was performed only for the months from June to October.

The freely available digital elevation model (DEM) from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) at 30 m spatial resolution was used to delineate the watershed and sub-watershed boundaries and generate drainage networks. In this study, the soil data were procured from the “National Bureau of Soil Survey and Land Use Planning (NBSS & LUP), Government of India” (Shivaprasad et al., 1998). The study area covers seven soil textural classes, as presented in Fig. 1. The spatial land use/land cover map was procured from the “National Remote Sensing Centre (NRSC) Hyderabad, Government of India”. The study area covers ten regionally important land use/land cover classes (NRSC, 2014) (Fig. 1).

The vegetation parameters were defined in the models based on the land use/cover map. The ET and vegetation parameters, including Leaf Area Index (LAI) and albedo, were obtained from the MODIS 1 km 8-day composite product (MOD16A2). The MODIS sensors onboard the Aqua and Terra satellites provide reliable ET estimates at different spatial–temporal resolutions (Anderson et al., 2011; Senay et al., 2013). The downloaded MODIS products were pre-processed to filter out poor-quality pixels using the MODIS Quality Control (QC) band. Finally, the ET values were resampled to the model grid/HRU scale for the analysis. These pre-processing steps were performed on the MODIS data using the model builder and batch processing tools of ArcGIS. The study used the MODIS ET estimates at 8-day and monthly temporal resolutions to validate the model-simulated ET values.

Model Performance Evaluation

In this study, the model simulation performance was evaluated using four statistical measures, namely the coefficient of correlation (CC), the root-mean-square error (RMSE)-observations standard deviation ratio (RSR), percent bias (PBIAS), and the index of agreement (d-index). In addition, the Nash–Sutcliffe model efficiency (NSE) was used to evaluate the model for discharge simulation. The d-index, which varies between 0 (no agreement) and 1 (perfect agreement), measures the degree of model simulation error (Willmott et al., 1985) (Eq. 1). The CC, which ranges from −1 to +1, measures the direction and strength of the linear relationship between observed and simulated data (Eq. 2). A CC magnitude of 1 represents perfect correlation, while 0 represents no correlation; the − and + signs indicate negative and positive linear correlations between the observed and simulated values. The RSR, which ranges from the optimal value of 0 to a large positive value, is the ratio of the RMSE to the standard deviation of the measured data (Eq. 3). The PBIAS was used to assess systematic over- or under-prediction and varies between −100 and ∞ (Xu et al., 2010) (Eq. 4). A PBIAS value close to 0 indicates little systematic bias between observed and simulated data. The NSE is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance (Nash & Sutcliffe, 1970) (Eq. 5). NSE ranges between −∞ and 1.0, with NSE = 1.0 being the optimal value.

$$d - {\text{index}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {{\text{Y}}_{{\text{i}}}^{{{\text{sim}}}} - Y_{i}^{{{\text{obs}}}} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\left| {Y_{i}^{{{\text{sim}}}} - \overline{{Y^{{{\text{obs}}}} }} } \right| + \left| {Y_{i}^{{{\text{obs}}}} - \overline{{Y^{{{\text{obs}}}} }} } \right|} \right)^{2} }}$$
(1)
$${\text{CC}} = \left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - \overline{{Y^{{{\text{obs}}}} }} } \right)\left( {Y_{i}^{{{\text{sim}}}} - \overline{{Y^{{{\text{sim}}}} }} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - \overline{{Y^{{{\text{obs}}}} }} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{sim}}}} - \overline{{Y^{{{\text{sim}}}} }} } \right)^{2} } }}} \right]$$
(2)
$$RSR = \left[ {\frac{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - Y_{i}^{{{\text{sim}}}} } \right)^{2} } }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - \overline{{{\text{Y}}^{{{\text{obs}}}} }} } \right)^{2} } }}} \right]$$
(3)
$${\text{PBIAS}} = \left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\overline{{Y^{{{\text{sim}}}} }} - \overline{{Y^{{{\text{obs}}}} }} } \right)*\left( {100} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} \overline{{Y^{{{\text{obs}}}} }} }}} \right]$$
(4)
$${\text{NSE}} = 1 - \left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - Y_{i}^{{{\text{sim}}}} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{i}^{{{\text{obs}}}} - \overline{{{\text{Y}}^{{{\text{obs}}}} }} } \right)^{2} }}} \right]$$
(5)

where \({Y}_{i}^{sim}, {Y}_{i}^{obs}\),\(\overline{{Y }^{sim}}\) and \(\overline{{Y }^{obs}}\) are the simulated, observed, average simulated, and average observed values, respectively.
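As an illustration of Eqs. 1–5, the five measures can be sketched in plain Python. The function names and the list-based inputs are assumptions made for this example and are not part of the study's workflow.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def d_index(obs, sim):
    """Index of agreement (Eq. 1): 0 = no agreement, 1 = perfect agreement."""
    obs_mean = _mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - obs_mean) + abs(o - obs_mean)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def cc(obs, sim):
    """Coefficient of correlation (Eq. 2)."""
    obs_mean, sim_mean = _mean(obs), _mean(sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    var_sim = sum((s - sim_mean) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def rsr(obs, sim):
    """RMSE-observations standard deviation ratio (Eq. 3)."""
    obs_mean = _mean(obs)
    num = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)))
    den = math.sqrt(sum((o - obs_mean) ** 2 for o in obs))
    return num / den


def pbias(obs, sim):
    """Percent bias (Eq. 4); positive values indicate overestimation.

    Eq. 4 is written with mean values; summing over n makes it equal to
    sum(sim - obs) * 100 / sum(obs), which is what is computed here.
    """
    return (sum(sim) - sum(obs)) * 100.0 / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (Eq. 5)."""
    obs_mean = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den
```

With these definitions, NSE equals 1 − RSR², so the two measures rank simulations identically; they are reported separately only because their scales differ.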

SWAT Model Setup

The hydrological SWAT model is governed by water mass balance. The model processes depend on the discretized hydrological response units (HRUs) and are simulated at daily time steps using the following soil water balance equation (Eq. 6) (Neitsch et al., 2011).

$${\text{SW}}_{t} = {\text{SW}}_{o} + \mathop \sum \limits_{i = 1}^{n} \left( {R_{{{\text{day}}}} - Q_{{{\text{surf}}}} - E_{a} - w_{{{\text{seep}}}} - Q_{gw} } \right)$$
(6)

where SWt = final soil water content (mm); t = time (days); SWo = initial soil water content on day i (mm); Rday = amount of precipitation on day i (mm); Qsurf = amount of surface runoff on day i (mm); Ea = amount of ET on day i (mm); wseep = amount of percolation and bypass exiting the soil profile bottom on day i (mm); Qgw = amount of return flow on day i (mm).
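The running sum in Eq. 6 can be illustrated with a short sketch that accumulates the daily fluxes onto the initial soil water content. The function name, the tuple layout, and the sample numbers are hypothetical and serve only to show the bookkeeping; SWAT computes each flux internally.

```python
def soil_water_balance(sw_initial, daily_fluxes):
    """Accumulate Eq. 6 day by day.

    daily_fluxes: iterable of (r_day, q_surf, e_a, w_seep, q_gw) tuples in mm.
    Returns the soil water content (mm) at the end of each day.
    """
    sw = sw_initial
    series = []
    for r_day, q_surf, e_a, w_seep, q_gw in daily_fluxes:
        # Precipitation adds water; runoff, ET, seepage and return flow remove it.
        sw += r_day - q_surf - e_a - w_seep - q_gw
        series.append(sw)
    return series


# Hypothetical two-day example: one wet day followed by one dry day.
print(soil_water_balance(100.0, [(10.0, 2.0, 3.0, 1.0, 0.5),
                                 (0.0, 0.0, 4.0, 0.5, 0.5)]))
```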

The modified rational method and the modified Soil Conservation Service curve number (SCS-CN) method (USDA, 1972) are used to compute the peak runoff rate and surface runoff, respectively. The actual ET, as well as potential transpiration, is calculated using the Penman–Monteith method. In the present study, the Muskingum method (Cunge, 1969) was adopted for flood routing.
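For orientation, the standard curve number relation on which the SCS-CN method rests can be sketched as follows. This is the textbook metric form with an initial abstraction of 0.2S, which is an assumption here; the modified version used in SWAT additionally updates the retention parameter with soil moisture, which this sketch does not reproduce. The function name and the sample values are illustrative only.

```python
def scs_cn_runoff(precip_mm, curve_number, ia_ratio=0.2):
    """Surface runoff depth (mm) from the standard SCS curve number relation.

    S  = 25400 / CN - 254        (retention parameter, mm)
    Ia = ia_ratio * S            (initial abstraction, assumed 0.2 * S)
    Q  = (P - Ia)^2 / (P - Ia + S)   when P > Ia, otherwise 0
    """
    retention = 25400.0 / curve_number - 254.0
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical storm of 50 mm on a surface with CN = 80.
print(round(scs_cn_runoff(50.0, 80.0), 2))
```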

The required weather and spatial datasets were prepared using the ArcGIS interface. The whole Marol watershed was discretized into several smaller sub-watersheds, which were further sub-divided into HRUs representing homogeneous combinations of land use/land cover, soil texture, and slope class. In this study, a 1% threshold was applied to each of land use, soil, and slope, which generated 647 HRUs. The ASTER DEM data were used to delineate the watershed, sub-watersheds, and drainage networks. By specifying an initial threshold on the drainage area, the ArcSWAT interface allows the user to fix the number of sub-watersheds. This study used a threshold value of 8000 ha to delineate the drainage network and define outlet points, discretizing the Marol watershed into 31 sub-watersheds. This threshold was chosen so that each sub-watershed has a drainage area smaller than the precipitation grid area, which ensures minimum spatial degradation of the precipitation data for capturing temporal variations over the watershed. The minimum and maximum sub-watershed areas are 48.25 km2 and 311.35 km2, respectively, with an average area of 164.26 km2. The delineated sub-watersheds and reach map of the study area are presented in Fig. 2. The SWAT model was run at daily time steps in this study.

Fig. 2 (a) Delineated sub-watersheds and reach map of the study area, and (b) 3-min grids covering the Marol watershed

VIC Model Setup

The VIC model accounts for sub-grid scale land use fractions of transpiration from vegetation, canopy layer evaporation, and soil evaporation for partitioning the grid-scale ET (Liang et al., 1994). The VIC model computes the water balances and surface energy budgets within the specified grid using the water budget equation (Eq. 7) (Narendra et al., 2017). The budget balance equation for the canopy layer is expressed in Eq. (8).

$$\frac{\partial S}{{\partial t}} = {\text{PR}} - {\text{ET}} - {\text{RF}}$$
(7)
$$\frac{\partial Wi}{{\partial t}} = {\text{PR}} - {\text{Ec}} - {\text{Pt}}$$
(8)

where \(\frac{\partial S}{\partial t}\) = change in water storage, PR = precipitation, ET = evapotranspiration, RF = runoff, \(\frac{\partial Wi}{\partial t}\) = change in canopy intercepted water, Ec = evaporation from the canopy layer, and Pt = throughfall.

The total evapotranspiration is computed as the summation of canopy layer evaporation (Ec), evaporation from bare soil (Eb), and transpiration from vegetation (Et), as follows (Eq. 9) (Liang et al., 1994):

$${\text{ET}} = \mathop \sum \limits_{n = 1}^{N} C_{v} \left[ n \right] \cdot \left( {E_{c} \left[ n \right] + E_{t} \left[ n \right]} \right) + C_{v} \left[ {N + 1} \right] \cdot E_{b}$$
(9)

where \({C}_{v}\left[n\right]\) = fraction of vegetation cover for the nth surface cover class (vegetation tile), and \({C}_{v}\left[N+1\right]\) = fraction of area covered with bare soil.
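Eq. 9 is a cover-fraction-weighted sum over the vegetation tiles plus a bare-soil term. A minimal sketch is given below; the function name, the tuple layout, and the sample fractions are assumptions for illustration and are not the model's internal data structures.

```python
def grid_cell_et(veg_tiles, bare_fraction, bare_soil_evap):
    """Grid-cell ET following Eq. 9.

    veg_tiles: list of (cover_fraction, canopy_evap, transpiration) per tile,
               i.e. (Cv[n], Ec[n], Et[n]).
    bare_fraction: Cv[N+1], the fraction of the cell that is bare soil.
    bare_soil_evap: Eb, evaporation from bare soil.
    """
    vegetated = sum(cv * (ec + et) for cv, ec, et in veg_tiles)
    return vegetated + bare_fraction * bare_soil_evap


# Hypothetical cell: two vegetation tiles (50% and 25%) and 25% bare soil.
tiles = [(0.5, 1.0, 3.0), (0.25, 0.5, 2.5)]
print(grid_cell_et(tiles, 0.25, 2.0))
```

The cover fractions, including the bare-soil fraction, are expected to sum to 1 for each grid cell.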

To calculate runoff, the VIC model uses the variable infiltration curve, which accounts for spatial variation in infiltration capacity. Runoff is generated from the two upper soil layers when the precipitation received, together with the soil moisture at the start of the time step, exceeds the storage capacity of the soil. The VIC model computes the fluxes at each cell, consisting of the discharge, base flow, evapotranspiration, soil moisture, and other outputs. These outputs are routed using a separate routing model to obtain the discharge at the outlet locations. The routing model developed by Lohmann et al. (1998) was used in the present study. The routing model assumes that water flowing out of a grid cell does not flow back into the same grid cell, and that water entering the river channel no longer remains part of the water balance. The routing model comprises routing within the grid cell and routing in the channel (river routing). Routing within each grid cell uses an impulse response function, represented by a linear transfer function, whereas once streamflow reaches the channel, Saint–Venant equation-based channel routing is used to generate the discharge at the required outlet.
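The saturation-excess idea behind the variable infiltration curve can be sketched with the commonly cited form of the curve, i = i_m[1 − (1 − A)^(1/b)], where A is the saturated fraction of the cell and b is the infiltration shape parameter. The code below is a simplified, single-store illustration under that assumption; it is not the VIC source code, and the function name, argument names, and sample values are hypothetical.

```python
def vic_direct_runoff(precip, w0, w_max, b_infilt):
    """Direct runoff (mm) for one time step from the variable infiltration curve.

    precip:   precipitation reaching the soil during the step (mm)
    w0:       soil moisture stored at the start of the step (mm)
    w_max:    maximum soil moisture storage of the upper layers (mm)
    b_infilt: shape parameter of the infiltration capacity curve
    """
    i_max = (1.0 + b_infilt) * w_max  # maximum point infiltration capacity
    # Point infiltration capacity that corresponds to the current storage w0.
    i_0 = i_max * (1.0 - (1.0 - w0 / w_max) ** (1.0 / (1.0 + b_infilt)))
    if i_0 + precip >= i_max:
        # The whole cell saturates: everything beyond the free storage runs off.
        return precip - w_max + w0
    # Only part of the cell saturates.
    unsaturated = (1.0 - (i_0 + precip) / i_max) ** (1.0 + b_infilt)
    return precip - w_max + w0 + w_max * unsaturated


# Hypothetical cell: 20 mm of rain on a half-full 100 mm store.
for b in (0.1, 0.3):
    print(b, round(vic_direct_runoff(20.0, 50.0, 100.0, b), 2))
```

In this formulation a larger shape parameter produces more direct runoff for the same rainfall and antecedent storage.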

A 3′ × 3′ grid (~5.5 km) resolution was used to estimate the water balance components for the Marol watershed of Karnataka, India. The watershed is covered by 216 grid points. The average rainfall, ET, and runoff fluxes were calculated by the area-weighted method, based on the portion of each grid cell lying inside the watershed boundary. The dominant land-use type in the Marol watershed is agricultural land (77.56%, covering 168 grids), followed by deciduous forest (10.74%, covering 23 grids). The model was run in water balance computation mode from January 2000 to December 2010 using various geo-spatial datasets (Table 1). Soil hydrologic and hydraulic characteristics for the three soil layers were calculated from the NBSS & LUP soil map based on the USDA soil texture classification. The three soil layer depths were set at 0–15 cm, 15–35 cm, and 35–100 cm for the top, bottom, and deep layers, respectively. A soil texture ID was assigned to each grid cell for use in the VIC model. One of the strengths of the VIC model is its capability to compute variable infiltration through the definition of multiple soil layers. The fraction of each grid cell covered by a specific soil was supplied in the input file of the VIC model. The major surface soil types in the Marol watershed under the USDA textural classification are silty clay (31.11%, covering 68 grids), followed by sandy clay loam (28.01%, covering 61 grids). The delineated sub-watersheds and reach map of the study area and the 3-min grids covering the Marol watershed are presented in Fig. 2.
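The area-weighted averaging of gridded fluxes over the watershed can be sketched as below. The function name and the sample values are hypothetical; in practice the clipped area of each grid cell would come from a GIS overlay of the grid with the watershed boundary.

```python
def area_weighted_mean(cell_values, areas_inside):
    """Watershed-average flux from gridded values.

    cell_values:  flux value of each grid cell (e.g. ET in mm)
    areas_inside: area of each cell that lies inside the watershed boundary,
                  in any consistent unit (e.g. km2)
    """
    if len(cell_values) != len(areas_inside):
        raise ValueError("one clipped area is required per grid cell")
    total_area = sum(areas_inside)
    return sum(v * a for v, a in zip(cell_values, areas_inside)) / total_area


# Hypothetical example: a full interior cell and a partly clipped boundary cell.
print(area_weighted_mean([10.0, 20.0], [30.0, 10.0]))
```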

Ensemble of Model Outputs

Although the VIC and SWAT hydrological models provide satisfactory outputs, uncertainties caused by inadequate data and by the assumptions and simplifications made in modeling are evident. A variety of techniques have been developed to minimize modeling uncertainties, such as incorporating in situ data through data assimilation and merging the outputs of multiple models into ensembles. A multi-model ensemble can provide a more skillful and reliable system of hydrological simulation by combining the strengths of multiple models. A classic argument in support of a multi-model approach is that it allows “compensatory effects” that control the excess spread arising from individual model errors. However, the verification metrics used to compare the single best model with several multi-model configurations may strongly influence which approach appears preferable (Mendoza et al., 2014). There are several approaches to combining the outputs of different models; a simple and effective one is to take the arithmetic mean of the output variables. This approach has been used widely by researchers to simulate hydrological variables (Williams, 1969; Doblas-Reyes et al., 2005; Baker & Ellison, 2008; Kumar & Nandagiri, 2015; Muhammad et al., 2018; Li et al., 2018; Horan et al., 2021) and was also adopted in this study. The HRU-based SWAT-simulated runoff was compared with the grid-based VIC-simulated runoff at the watershed outlet. For the ET comparison, however, the averages of SWAT- and VIC-simulated ET over the watershed were considered. A schematic outlining the procedure used to generate the ensemble model is presented in Fig. 3.
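The equal-weight mean ensemble described above amounts to averaging the aligned output series of the models time step by time step. A minimal sketch, with a hypothetical function name and made-up values, is:

```python
def mean_ensemble(*model_series):
    """Equal-weight arithmetic mean of aligned model output series.

    Each argument is one model's simulated series (e.g. daily discharge),
    with all series covering the same time steps in the same order.
    """
    if not model_series:
        raise ValueError("at least one model series is required")
    length = len(model_series[0])
    if any(len(series) != length for series in model_series):
        raise ValueError("all model series must have the same length")
    n_models = len(model_series)
    return [sum(values) / n_models for values in zip(*model_series)]


# Hypothetical SWAT and VIC discharge values for three time steps.
swat = [12.0, 30.0, 18.0]
vic = [16.0, 40.0, 14.0]
print(mean_ensemble(swat, vic))
```

With two models, an overestimate by one model and an underestimate by the other partly cancel in the mean, which is the compensatory effect referred to in the text.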

Fig. 3 A schematic outlining the procedure used to generate the ensemble model

Results and Discussion

Sensitivity and Uncertainty Analysis

The sensitivity and uncertainty analyses of the SWAT model parameters were performed using the Sequential Uncertainty Fitting (SUFI-2) algorithm of the SWAT calibration and uncertainty program (SWAT-CUP) (Abbaspour et al., 2007). The analysis showed a p-factor between 0.6 and 1 and an r-factor between 0 and 0.3, i.e., the simulated values correspond well to the observed values. Hence, the uncertainty associated with the model simulation was considered low and acceptable. This study considered the 17 most sensitive parameters (Table 2). The analysis showed that streamflow is most sensitive to CH_N2 (Manning's ‘n’ value for the main channel), followed by CH_K2 (effective hydraulic conductivity in main channel alluvium).
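For readers unfamiliar with the two SUFI-2 statistics, the sketch below assumes their usual definitions: the p-factor is the fraction of observations bracketed by the 95% prediction uncertainty (95PPU) band, and the r-factor is the mean width of that band divided by the standard deviation of the observations. The function names are hypothetical, the use of the sample standard deviation is an assumption, and SWAT-CUP computes these values internally.

```python
import statistics


def p_factor(obs, lower, upper):
    """Fraction of observations falling inside the 95PPU band."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


def r_factor(obs, lower, upper):
    """Mean 95PPU band width relative to the spread of the observations."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(obs)
    # Sample standard deviation is an assumption made for this sketch.
    return mean_width / statistics.stdev(obs)


# Hypothetical observations with lower and upper 95PPU bounds.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.0, 1.5, 3.5, 3.0, 4.0]
upper = [2.0, 2.5, 4.5, 5.0, 6.0]
print(p_factor(obs, lower, upper), round(r_factor(obs, lower, upper), 3))
```

A p-factor near 1 combined with a small r-factor indicates that a narrow band captures most of the observations.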

Table 2 Sensitivity order and calibrated values of the SWAT model parameters for Marol Watershed

Further, to calibrate the VIC model, seven model parameters were considered, namely the infiltration parameter (b-infilt), the subsurface flow parameters (Ds, Dsmax, and Ws), and the depths of the three soil layers (d1, d2, and d3) (Table 3). The b-infilt parameter was varied between low and high values to match the observed peak flows: a lower value was used to lower the peak, and a higher value to increase it. The Dsmax and Ds parameters were adjusted to fit the baseflow, while the Ws parameter was adjusted to fit the soil moisture. Detailed information on the sensitivity and uncertainty analyses adopted for both the SWAT and VIC models can be found in the supplementary file.

Table 3 Sensitivity order of the VIC model parameters for Marol Watershed

Evaluation of the SWAT, VIC, and EnSwaVi Models

The SWAT, VIC, and EnSwaVi models were evaluated for the period from 2000 to 2010 on a daily and monthly basis for discharge simulation and on an 8-daily and monthly basis for ET simulation. The model results were evaluated using observed discharge data for the Varada River at the Marol G&D site in the state of Karnataka, India, and the reference ET dataset from MODIS. The available observed data series was divided into two parts, 2000–2004 for calibration and 2008–2010 for validation, with the year 2000 used as the model warm-up period. The performance evaluation of the SWAT, VIC, and EnSwaVi models for discharge and ET simulations is presented in Tables 4 and 5, respectively.
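The evaluation periods described in the text for discharge (warm-up in 2000, calibration in 2001–2004, validation in 2008–2010, no observations in 2005–2007, and only the months June to October) can be expressed as a simple date filter. The constant and function names are hypothetical and serve only to make the selection rule explicit.

```python
from datetime import date

CALIBRATION_YEARS = range(2001, 2005)  # 2001-2004; the year 2000 is warm-up
VALIDATION_YEARS = range(2008, 2011)   # 2008-2010
EVALUATION_MONTHS = range(6, 11)       # June to October


def evaluation_period(day):
    """Return 'calibration', 'validation', or None for a given date.

    None covers the warm-up year, the 2005-2007 gap in observed discharge,
    and the months from November to May.
    """
    if day.month not in EVALUATION_MONTHS:
        return None
    if day.year in CALIBRATION_YEARS:
        return "calibration"
    if day.year in VALIDATION_YEARS:
        return "validation"
    return None


print(evaluation_period(date(2003, 7, 15)))
print(evaluation_period(date(2006, 8, 1)))
```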

Table 4 Performance evaluation of the SWAT, VIC, and EnSwaVi for discharge simulation
Table 5 Performance evaluation of the SWAT, VIC, and EnSwaVi for evapotranspiration simulation

The PBIAS-based analysis of discharge simulation for the SWAT and VIC models indicates that the SWAT model captured the physical processes accurately on a daily basis during both the calibration and validation stages. However, it marginally underestimated the discharge values in the monthly simulation. The VIC model, on the other hand, overestimated discharge values during the calibration and validation stages at both daily and monthly time intervals (Table 4). Further, the coefficient of correlation (CC) for the SWAT-simulated discharge was consistently better than that for the VIC-simulated discharge, especially for daily simulations. Similarly, the RSR values for the SWAT-simulated discharge were better than those for the VIC-simulated discharge for both daily and monthly simulations. In general, the performance statistics (PBIAS, CC, RSR, d-index) indicated that the EnSwaVi model performed marginally better than the SWAT and VIC models for discharge simulation on daily and monthly scales.

Table 5 presents the performance evaluation of the SWAT and VIC models in predicting ET at 8-daily and monthly time intervals. Significant underestimation by the SWAT model (negative PBIAS) and overestimation by the VIC model (positive PBIAS) were observed in simulating evapotranspiration at 8-daily and monthly intervals during the calibration stage. In general, the VIC model performed better than the SWAT model, especially for monthly simulation (Table 5). Overall, the performance of both the SWAT and VIC models for ET simulation was good. However, the EnSwaVi-simulated ET values were considerably better than the SWAT- and VIC-simulated ET on the 8-daily and monthly timescales. Hence, it can be inferred that the ensemble model provides a better ET estimate than either individual model (Table 5).

Evaluation of Discharge

The observed and simulated daily discharges for the calibration and validation periods using the SWAT, VIC, and EnSwaVi models are presented in Fig. 4. Similarly, the observed and simulated monthly discharges for the calibration and validation periods are presented in Fig. 5. The scatter plots between observed and simulated discharge for daily and monthly calibration and validation using the SWAT, VIC, and EnSwaVi outputs are presented in supplementary Figs. 1 and 2. The graphical results show that the observed and SWAT-simulated discharges closely matched for the most part, except for some high-flow events, which were slightly underestimated. Similarly, a good agreement between the observed and simulated hydrographs was observed using the VIC model, although the high-flow events were generally overestimated. The SWAT model performed better for monthly than for daily discharge, which suggests that it is better suited to long-term simulation than to short-term or single-storm simulation; similar observations have been reported previously (Borah et al., 2007). In general, the EnSwaVi-simulated discharge values were found to be more accurate than the SWAT- and VIC-simulated discharge values on both daily and monthly scales, specifically over low and high flows. These results are also reflected in the performance statistics (Table 4).

Fig. 4. Comparison of the observed and simulated discharge for daily calibration (2001–2004) and validation (2008–2010) using the (a) SWAT model, (b) VIC model, and (c) EnSwaVi output

Fig. 5. Comparison of the observed and simulated discharge for monthly calibration (2001–2004) and validation (2008–2010) using the (a) SWAT model, (b) VIC model, and (c) ensemble (EnSwaVi) output

For the SWAT model, the coefficient of correlation (CC) values were 0.87 and 0.90 on a daily scale and 0.94 and 0.95 on a monthly scale during the calibration and validation periods, respectively. For the VIC model, the CC values were 0.78 and 0.83 on a daily scale and 0.96 and 0.95 on a monthly scale during the calibration and validation periods, respectively (Table 4). For the EnSwaVi-simulated discharge, the CC values were 0.89 and 0.90 on a daily scale and 0.98 and 0.96 on a monthly scale during the calibration and validation periods, respectively. The CC values of the EnSwaVi-simulated discharge were improved by approximately 10–15% on a daily scale and 5–6% on a monthly scale compared with the VIC-simulated discharge.
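
The CC values above can be read with the following minimal sketch in mind. It assumes that CC is the Pearson correlation coefficient, which the text does not state explicitly; the function name `pearson_cc` and the discharge values are hypothetical and serve only as an illustration.

```python
import math

def pearson_cc(obs, sim):
    """Pearson correlation between observed and simulated series (assumed CC definition)."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)

# Hypothetical discharge values (m^3/s), not taken from the study.
observed = [12.0, 30.0, 85.0, 40.0, 18.0]
# A simulation that overestimates every value by 30%.
overestimated = [1.3 * q for q in observed]

cc = pearson_cc(observed, overestimated)
print(f"CC = {cc:.3f}")
```

In this sketch the uniformly overestimated series still correlates perfectly with the observations, which is why CC is usually interpreted together with error- and bias-based statistics such as RSR and PBIAS.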

The RSR values for daily simulation were 0.51 and 0.49 using the SWAT model and 1.28 and 0.71 using the VIC model during the calibration and validation periods, respectively. Similarly, the RSR values for monthly simulation were 0.33 and 0.33 using the SWAT model and 0.58 and 0.45 using the VIC model during the calibration and validation periods, respectively (Table 4). The RSR value for daily simulation using the EnSwaVi model was 0.49 during both the calibration and validation periods, a clear improvement over the VIC-simulated discharge. A significant improvement over the VIC model was also seen for monthly simulation. However, the improvements over the SWAT-simulated discharge were not substantial.
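
As an aid to interpreting these values, the sketch below computes RSR under the common definition of Moriasi et al. (2007), i.e., the root mean square error divided by the standard deviation of the observations. This definition is an assumption here, and the function name `rsr` and the sample values are hypothetical.

```python
import math

def rsr(obs, sim):
    """RMSE-to-observed-standard-deviation ratio (assumed definition)."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(squared_error / squared_deviation)

# Hypothetical observations, not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
# A "model" that always predicts the observed mean (25.0).
mean_only = [25.0] * len(observed)

print(rsr(observed, observed))   # perfect simulation: 0.0
print(rsr(observed, mean_only))  # predicting the mean: 1.0
```

Under this definition, an RSR of 0 indicates a perfect fit and an RSR of 1 corresponds to a simulation no more accurate than the mean of the observations, so values above 1 indicate that the simulation error exceeds the variability of the observed series.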

The performance evaluation criteria showed good agreement between observed and simulated hydrographs on daily and monthly timescales, indicating good performance of the SWAT and VIC models (Moriasi et al., 2007). A PBIAS of 0.3 and 2.1 for daily calibration and validation, respectively, indicated that the SWAT model overestimated discharge by 0.3% and 2.1% on average (Fig. 4). Similarly, a PBIAS of 32.7 and 2.2 for daily calibration and validation, respectively, indicated that the VIC model overestimated discharge by 32.7% and 2.2% on average (Fig. 5). A similar trend was observed for monthly simulation: the VIC model overestimated the discharge during both the calibration and validation periods, whereas the SWAT model showed negligible over- or underestimation. The EnSwaVi model overestimated the discharge by 10.1% and 2.1% during the calibration and validation periods, respectively, on a daily scale, and by 15% and 4.5% on a monthly scale (Table 4). Based on the PBIAS values, it can be inferred that overestimation was lower in the EnSwaVi model than in the VIC model at both daily and monthly timescales.
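
The sign convention used above (positive PBIAS for overestimation, negative for underestimation) can be sketched as follows. The formula is inferred from the way the text interprets the values and is therefore an assumption; note that Moriasi et al. (2007) define PBIAS with the opposite sign. The function name `pbias` and the sample values are hypothetical.

```python
def pbias(obs, sim):
    """Percent bias; positive means the simulation overestimates (assumed convention)."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

# Hypothetical discharge values, not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
overestimated = [11.0, 22.0, 33.0, 44.0]   # every value 10% too high
underestimated = [9.0, 18.0, 27.0, 36.0]   # every value 10% too low

print(pbias(observed, overestimated))   # 10.0
print(pbias(observed, underestimated))  # -10.0
```

Because PBIAS sums the errors before normalizing, over- and underestimation in different parts of the record can offset each other, so a small PBIAS does not by itself imply a close match of individual peaks.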

Evaluation of Evapotranspiration

The observed and simulated 8-daily ET for the evaluation period using the SWAT, VIC, and EnSwaVi hydrologic models are presented in Fig. 6. Similarly, the observed and simulated monthly ET for the calibration and validation periods are presented in Fig. 7. Scatter plots of observed versus simulated evapotranspiration for 8-daily and monthly calibration and validation using the SWAT, VIC, and EnSwaVi outputs are presented in supplementary Figs. 3 and 4. The graphical results show that the observed and simulated ET mostly matched during the simulation period for both the SWAT and VIC models. However, the VIC simulation results matched the reference ET dataset better than the SWAT simulation results did. The EnSwaVi simulation results also matched the reference ET closely, and these estimates were better than the ET simulated by either the SWAT or the VIC model.

Fig. 6. Comparison of the observed and simulated evapotranspiration for 8-daily calibration (2000–2004) and validation (2008–2010) using the (a) SWAT model, (b) VIC model, and (c) ensemble results

Fig. 7. Comparison of the observed and simulated evapotranspiration for monthly calibration (2000–2004) and validation (2008–2010) using the (a) SWAT model, (b) VIC model, and (c) ensemble results

For the SWAT model, the CC values were 0.68 and 0.60 on an 8-daily scale and 0.77 and 0.62 on a monthly scale during the calibration and validation periods, respectively. For the VIC model, the CC values were 0.62 and 0.64 on an 8-daily scale and 0.70 and 0.69 on a monthly scale during the calibration and validation periods, respectively. For the EnSwaVi model, the CC values were 0.71 and 0.75 on an 8-daily scale and 0.82 and 0.71 on a monthly scale during the calibration and validation periods, respectively (Table 5). The RSR values for 8-daily simulation were 1.93 and 1.94 using the SWAT model and 1.04 and 1.24 using the VIC model during the calibration and validation periods, respectively. Similarly, the RSR values for monthly simulation were 1.08 and 1.14 using the SWAT model and 0.94 and 0.71 using the VIC model during the calibration and validation periods, respectively. For the EnSwaVi model, the RSR values were 0.75 and 0.84 on an 8-daily scale and 0.73 and 0.76 on a monthly scale during the calibration and validation periods, respectively (Table 5).

A PBIAS of −11.8 and −4.71 for 8-daily calibration and validation, respectively, indicated that the SWAT model underestimated ET by 11.8% during calibration and 4.71% during validation on average (Fig. 6). Similarly, a PBIAS of 12.3 and 14.9 for 8-daily calibration and validation, respectively, indicated that the VIC model overestimated ET by 12.3% during calibration and 14.9% during validation on average (Fig. 7). The performance of the VIC model for ET simulation was very good on a monthly timescale (PBIAS of 1.8 and 4.7 for calibration and validation, respectively), whereas the performance of the SWAT model was relatively poor (PBIAS of −6.8 and −8.2 for calibration and validation, respectively). For the EnSwaVi model, the PBIAS values for ET simulation were 1.2 and 5.1 on an 8-daily scale and 0.5 and 1.8 on a monthly scale for calibration and validation, respectively. Overall, the performance evaluation criteria showed relatively better performance of the VIC model compared to the SWAT model; however, the performance of the EnSwaVi model was marginally better than that of both the SWAT and VIC models.

Discussion on the Performance of the SWAT and VIC Models

Both the SWAT and VIC hydrologic models can be efficiently applied to water balance analysis and to the planning and management of water resources. The SWAT model simulated the discharge more accurately than the VIC model. These results are in conformity with Hu et al. (2007), Dash et al. (2021), and Kang et al. (2022). In general, the VIC model overestimated discharge during both the calibration and validation periods; this may be because upstream abstraction was not considered. However, the VIC model simulated ET more accurately than the SWAT model. This finding contradicts Dash et al. (2021), who reported that the SWAT simulation-based standardized evapotranspiration drought index (SEDI) was consistent with the benchmark satellite (MOD16A2-ET)-derived estimates, whereas the VIC-3L simulation-based SEDI was persistently over- or underestimated relative to them. In general, the SWAT model underestimated ET during the calibration and validation periods. The SWAT hydrologic model lumps the soil characteristics and land use in each grid cell without considering the sub-grid scale variability of LULC and soil moisture, resulting in more bias in the ET estimates (Rathjens et al., 2015). It computes water balance components over the HRUs and averages them for each sub-watershed. Conversely, the VIC modeling framework can be advantageous over the SWAT model, as it accounts for the sub-grid scale variability of soil types, soil moisture, and vegetation; hence, it can simulate ET closer to reality (Srivastava et al., 2017). The EnSwaVi-simulated discharges were considerably better than the VIC-simulated discharges; however, the improvement over the SWAT-simulated discharges was small. On the other hand, the EnSwaVi-simulated ET was superior to both the SWAT- and VIC-simulated ET.
These results show that the ensemble model performs better than the individual models for ET as well as discharge simulations. Our study outcomes are consistent with Horan et al. (2021) and Muhammad et al. (2018), and suggest that an ensemble reduces the noise, bias, and variance of simulations and can potentially provide a more in-depth understanding of the data. However, ensemble modeling results can suffer from a lack of interpretability and depend on the prediction accuracy of the ensemble members.
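
The bias-reducing effect of an equal-weight ensemble discussed above can be illustrated with a minimal sketch. It assumes that the ensemble is the arithmetic mean of the two model outputs at each time step and uses the sign convention in which positive PBIAS denotes overestimation. The function names and the ET values are hypothetical and are not taken from the study.

```python
def equal_weight_ensemble(series_a, series_b):
    """Average two simulated series time step by time step with equal weights."""
    if len(series_a) != len(series_b):
        raise ValueError("member series must have equal length")
    return [0.5 * (a + b) for a, b in zip(series_a, series_b)]

def pbias(obs, sim):
    """Percent bias; positive means overestimation (assumed convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

# Hypothetical ET values (mm), for illustration only.
observed = [100.0, 120.0, 90.0, 110.0]
swat_et = [90.0, 108.0, 81.0, 99.0]      # member that underestimates by 10%
vic_et = [120.0, 144.0, 108.0, 132.0]    # member that overestimates by 20%

ensemble_et = equal_weight_ensemble(swat_et, vic_et)

print(pbias(observed, swat_et))      # -10.0
print(pbias(observed, vic_et))       # 20.0
print(pbias(observed, ensemble_et))  # 5.0
```

In this sketch the ensemble PBIAS is the mean of the member PBIAS values, so biases of opposite sign partly cancel, whereas biases of the same sign do not. This also shows why the ensemble remains dependent on the accuracy of its members.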

When dividing the data into calibration and validation subsets, it is important to check that both subsets represent the same statistical population (Masters, 1993). A model performs better if it does not have to extrapolate beyond the range of the data used for its calibration (Tokar & Johnson, 1999). Although the calibration and validation datasets generally have similar statistical characteristics, the validation period contains a few higher peaks that may not have been well represented during calibration, which resulted in larger over- or underestimation of these peak values. The SWAT and VIC models were calibrated and validated using observed discharge data at the watershed outlet and the average reference ET at the watershed scale only. Although the calibration performance appears quite good for the calibrated gauging station, multi-site evaluation of the models should be carried out to achieve a better representation of the physical parameters and to improve the models' predictive ability. However, because observed data were available at the watershed outlet only, single-site calibration was carried out in this study. The models' simulation capability could also be improved by using standardized MODIS-derived ET estimates, since the MODIS-derived ET estimates are generally not free from bias.
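
A simple screening of the two subsets, in the spirit of the checks described above, could look like the following sketch. It only compares summary statistics and flags validation values outside the calibration range; it is not the procedure used in this study. All names and values are hypothetical.

```python
import statistics

def summarize(series):
    """Basic descriptive statistics for comparing data subsets."""
    return {
        "mean": statistics.mean(series),
        "stdev": statistics.stdev(series),
        "min": min(series),
        "max": max(series),
    }

def outside_calibration_range(calibration, validation):
    """Return validation values that fall outside the calibration range."""
    low, high = min(calibration), max(calibration)
    return [v for v in validation if v < low or v > high]

# Hypothetical discharge values (m^3/s), not taken from the study.
calibration = [5.0, 12.0, 40.0, 95.0, 22.0, 8.0]
validation = [6.0, 15.0, 130.0, 30.0, 9.0]

print(summarize(calibration))
print(summarize(validation))

extrapolated = outside_calibration_range(calibration, validation)
print(extrapolated)  # [130.0]
```

Values flagged in this way are those for which the calibrated model has to extrapolate, and larger errors can be expected for them.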

The average annual water balance was also estimated for all 31 sub-watersheds using the SWAT model. Of the annual average precipitation of 1330.90 mm, about 39.75% leaves the watershed as surface runoff. ET is also a major component and accounts for about 38.46% of the annual average precipitation falling over the area. In almost all the sub-watersheds, more than 25% of the annual precipitation leaves as surface runoff, indicating the need for suitable soil and water management programs that decrease the runoff volume by increasing in-watershed use of water and, in turn, minimize soil erosion.
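
The reported percentages can be converted to depths with the short calculation below. The precipitation and the two percentages are taken from the text; treating the remainder as a single lumped term is an assumption made only for this illustration, since the text does not partition it.

```python
# Values reported in the text.
annual_precipitation_mm = 1330.90
runoff_percent = 39.75
et_percent = 38.46

runoff_mm = annual_precipitation_mm * runoff_percent / 100.0
et_mm = annual_precipitation_mm * et_percent / 100.0

# Remainder not attributed to surface runoff or ET in the text
# (lumped here as an assumption; its components are not given).
remainder_percent = 100.0 - runoff_percent - et_percent
remainder_mm = annual_precipitation_mm * remainder_percent / 100.0

print(f"Surface runoff: {runoff_mm:.2f} mm")
print(f"ET:             {et_mm:.2f} mm")
print(f"Remainder:      {remainder_mm:.2f} mm ({remainder_percent:.2f}%)")
```

The calculation gives about 529 mm of surface runoff and about 512 mm of ET per year, leaving roughly 22% of the precipitation for the remaining water balance components.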

The SWAT and VIC hydrologic models are useful platforms that are extensively applied for water resource assessment and management worldwide. The ensemble of VIC and SWAT outputs, i.e., the EnSwaVi model, can be used by policymakers to make decisions regarding water resource management in the study area. However, in this study, the water balance analysis was carried out assuming that land use/land cover and other parameters remain constant with time. In reality, several parameters change with time or season. Therefore, a water balance study that incorporates the temporal and seasonal variability of these parameters in a GIS environment is a direction for future research. Further, additional studies should be conducted over other river basins and watersheds to evaluate the long-term capabilities of hydrologic models in simulating the water balance components.

Conclusions

In the present study, the SWAT and VIC hydrologic models were used to simulate the water balance components (runoff and ET) over an agriculture-based watershed. Further, the ensemble of VIC and SWAT outputs (EnSwaVi; the equal-weight average of the individual model-simulated datasets) was also evaluated for the hydrological variables to assess whether modeling uncertainties could be minimized. The following major conclusions were drawn from the present study:

1. Discharge was simulated well by both the SWAT model (d-index 0.93 and 0.94 during daily calibration and validation; d-index 0.97 during both monthly calibration and validation) and the VIC model (d-index 0.78 and 0.89 during daily calibration and validation; d-index 0.93 and 0.96 during monthly calibration and validation). However, the discharge simulated by the SWAT model was more accurate than that simulated by the VIC model.

2. The VIC model (d-index 0.72 and 0.66 during 8-daily calibration and validation; d-index 0.72 and 0.71 during monthly calibration and validation) simulated ET better than the SWAT model (d-index 0.67 and 0.63 during 8-daily calibration and validation; d-index 0.71 and 0.64 during monthly calibration and validation).

3. The EnSwaVi (ensemble of VIC and SWAT) model-simulated runoff (d-index 0.94 during both daily calibration and validation; d-index 0.98 and 0.97 during monthly calibration and validation) and ET (d-index 0.83 during both 8-daily calibration and validation; d-index 0.88 and 0.86 during monthly calibration and validation) were more accurate than the individual SWAT and VIC outputs.

4. Based on the results, it can be concluded that both the SWAT and VIC models can be efficiently applied to water balance analysis and to the planning and management of water resources. However, the EnSwaVi model could marginally improve the results.

5. Of the annual average precipitation of 1330.90 mm, about 39.75% leaves the watershed as surface runoff. ET is also a major component and accounts for about 38.46% of the annual average precipitation falling over the area.
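
For reference, the d-index values quoted in the conclusions can be understood through the sketch below. It assumes that the d-index is Willmott's index of agreement in its original squared-difference form, which the text does not state explicitly; the function name `d_index` and the sample values are hypothetical.

```python
def d_index(obs, sim):
    """Willmott's index of agreement (assumed definition of the d-index)."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    potential_error = sum(
        (abs(s - mean_obs) + abs(o - mean_obs)) ** 2 for o, s in zip(obs, sim)
    )
    return 1.0 - squared_error / potential_error

# Hypothetical observations, not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
mean_only = [25.0] * len(observed)  # a "model" that predicts the observed mean

print(d_index(observed, observed))   # perfect agreement: 1.0
print(d_index(observed, mean_only))  # no skill beyond the mean: 0.0
```

Under this definition the index ranges from 0 to 1, with values closer to 1 indicating better agreement between observed and simulated series.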