Introduction

The hydrological response of river basins changes continually with variations in precipitation, temperature, topography, lithology, vegetation, and other climatic and physiographic factors. Many sub-basins and catchments are currently water stressed, primarily owing to anthropogenic factors (including population and economic growth) and climatic factors (including the rise in global average temperature and extreme weather events such as heat waves and storms). Streamflow is one of the most important variables of the hydrological cycle and results from both atmospheric and topographic processes (Jimeno-Sáez et al. 2018). Streamflow simulation can be extremely challenging because of complex hydrological phenomena and nonlinear interactions between climate inputs and landscape characteristics (such as topography, geology, soils, and land cover) across a wide range of spatial and temporal scales (Yasmin and Sivakumar 2018; Wang et al. 2021). Hydrologists must therefore identify and understand the factors affecting the hydrological cycle in order to meet water needs sustainably and without disrupting the ecological balance. Realistic and robust streamflow simulation is thus of primary importance for effective water resources planning and management (Liu et al. 2017; Al-Sudani et al. 2019; Wang et al. 2021).

Over the last two decades, a growing number of studies have applied physically based and data-driven approaches to simulate streamflow (Sharma and Machiwal 2021). Recent years have witnessed rapid growth in hydrological simulation models aimed at quantifying forecast uncertainty and streamflow complexity, as water managers continually seek more accurate streamflow simulations and watershed information. Many studies using the Soil and Water Assessment Tool (SWAT) have focused predominantly on streamflow simulation. Abu-Allaban et al. (2015) used the SWAT model to evaluate the effects of climate change on water scarcity in the semi-arid Mujib Basin, Jordan, based on monthly and yearly base flow and surface runoff data for 1970 to 1997. Their findings suggest that SWAT generates dependable and accurate hydrological data for the basin. Koycegiz and Buyukyildiz (2019) employed a semi-distributed SWAT model to simulate daily streamflow over 13 years (2003 to 2015) at the headwater of the Çarşamba River, Turkey. The SWAT results were compared with those of artificial intelligence-based models, namely the radial-based neural network (RBNN) and support vector machines (SVM), and SWAT was found to simulate low flows better than peak flows. Singh and Saravanan (2020) applied the SWAT model to simulate the monthly streamflow of the Ib river watershed, India, for a period spanning 19 years, from 1990 to 2011. The reasonably good agreement between observed and simulated streamflow indicated that the model captured the variations in streamflow, and the reduced levels of uncertainty indicated a higher level of confidence in the model’s predictions.

Currently, the catchment of the Sot river, a tributary of the river Ganga, faces major water crises and acute water shortages, and an increasing frequency of extreme weather events, such as short-duration, high-intensity rainfall and long dry spells, raises the likelihood of floods and droughts. In watersheds where ground-based observations are scarce, inaccessible, or time-consuming and costly to obtain, process-based hydrological models can be the most effective and viable means to improve modeling accuracy and to quantify and simulate streamflow at suitable spatial and temporal resolutions with reliable data sets. They are also useful for analyzing, predicting, and estimating various catchment processes (Baffaut et al. 2015; Rahman et al. 2020). Given this constraint, the present study calibrated and validated streamflow for the Sot river catchment using the SWAT model.

The SWAT model is a physically based, semi-distributed, public domain model (Arnold et al. 1998) that operates on a daily time step and has been widely applied in hydrological and environmental studies because of its capacity to simulate streamflow under varying land use/land cover (LULC) and climate conditions (Dixon and Earls 2012; Shi et al. 2013; Noori et al. 2014; Fan and Shibata 2015; Glavan et al. 2015; Krysanova and Srinivasan 2015; Noori and Kalin 2016; Pradhan et al. 2020). However, the SWAT model requires numerous input parameters, including spatial and temporal data, that are sometimes hard to estimate (Makwana and Tiwari 2014), and the quality of its results depends on the quality of the input data and parameters. In recent years, numerous studies have simulated streamflow in various river basins across India. However, no studies have been reported specifically on the Sot river catchment, which makes it an unexplored area in terms of streamflow modeling. The primary objective of this study is therefore to simulate streamflow accurately, providing valuable insights for hydrologists and water managers to facilitate effective planning and management of water resources systems. A hydrological model was developed using ArcSWAT to simulate streamflow in the Sot river catchment at a daily timescale. The study also emphasized sensitivity analysis, model calibration, and validation using SWAT-CUP, and assessed the hydrological conditions of the catchment.

Study area

The present study was conducted for the catchment of the Sot river, a tributary of the Ganges in India. The river traverses expansive agricultural and industrial lands across several districts of Uttar Pradesh, including Jyotiba Phule Nagar, Moradabad, Budaun, Shahjahanpur, and Farrukhabad. Within these regions, the river is a crucial source of drinking water for local communities and a vital resource for irrigation. Water demand in the study area has surged over the past few decades owing to rapid urbanization, expanding industrialization, and the flourishing agricultural sector. Surface water, the most dynamic natural resource for meeting diverse needs, has been extensively exploited across the entire study region in recent years. Consequently, the Sot river suffers from severe water scarcity and numerous hydrological challenges, such as declining groundwater levels, frequent droughts, soil erosion, and desertification in certain areas. The catchment also experiences strong seasonal climatic variations, which are reflected in the monthly fluctuations of streamflow. The Sot river catchment has a drainage area of 3752.73 km², lying between 78° 30′ 00″ and 79° 30′ 00″ E longitude and 27° 30′ 00″ and 29° 00′ 00″ N latitude. The topography varies, with elevation ranging from 150 to 250 m above mean sea level (amsl) in the northern part (based on the Shuttle Radar Topography Mission digital elevation model). The higher elevation areas lie along the northern ridge of the watershed. Most of the watershed is flat, with only 5% of the area having slopes greater than 3%. The study area receives an average of 950 mm of rainfall annually and has hot summers and cold winters. The mean monthly minimum temperature ranges from 5 °C in January to 25 °C in June, while the mean monthly maximum temperature varies from 30 °C in January to 43 °C in May. The geographical location of the study area is depicted in Fig. 1.

Fig. 1 Index map of the Sot river catchment

Materials and methods

Data preparation and pre-processing

The input data required for SWAT modeling are a digital elevation model (DEM), soil type map, land use/land cover (LULC) map, and meteorological data. These inputs can be broadly classified into two categories: (1) spatial data and (2) temporal data. Details of the spatial and temporal data, procured from different organizations, are given in Table 1. The data were pre-processed to prepare the inputs used for building the ArcSWAT model. The basic statistical characteristics (mean, standard deviation, minimum, maximum, skewness, and kurtosis) of the rainfall and streamflow time series for the calibration and validation periods are summarized in Table 2. The step-by-step procedure of the SWAT modeling employed in this study is depicted in Fig. 2.

Table 1 Details of input data used for the SWAT model and their sources
Table 2 Basic statistical characteristics of rainfall and streamflow time series for the calibration and validation periods
Fig. 2 Framework for the proposed methodology in this study

Digital elevation model

The DEM was projected to the WGS 1984 UTM Zone 43 N coordinate system. It defines the topography of the study area by describing the elevation of any point in the catchment at a specific spatial resolution (see Fig. 3a). It is also used to delineate the stream network and sub-catchments and to derive parameters such as slope for the hydrological response units (HRUs).

Fig. 3 a Digital elevation model; b land use land cover map; c slope map; d soil map of Sot catchment

Land use/land cover

For preparing the LULC map, three satellite images were downloaded from the USGS Earth Explorer at 30-m resolution. The images, each comprising multiple bands, were processed and classified using the ENVI tool. Supervised classification was used in this study: the researcher specifies the land cover classes of interest as a signature dataset, which the software then uses automatically to create the spectral classes. The LULC map is mainly used to define the factors that affect streamflow, evapotranspiration, and surface erosion in a watershed. The LULC map of the Sot catchment for the year 2018 is shown in Fig. 3b.

Slope map

The slope is a crucial factor that significantly impacts the infiltration of surface water. The angle of the slope distinguishes between flat and steep terrain, with lower values indicating flatter areas and higher values indicating steeper terrain. The slope map of our study area was generated from SRTM-DEM data with a resolution of 90 meters by 90 meters using ArcGIS 10.4 tools. Fig. 3c illustrates that a significant portion of the study area is characterized by moderate to minor slopes (ranging from 0 to 4 degrees).

Soil map

The soil map, at a scale of 1:500,000, was obtained from the Food and Agriculture Organization (FAO) of the United Nations. The soils of the Sot catchment fall mainly into two soil groups, with clay soil covering more than 90% of the catchment area. The soil map of the Sot catchment is presented in Fig. 3d.

Streamflow simulation using Soil and Water Assessment Tool (SWAT) model

ArcSWAT model

In this study, the ArcGIS 10.4 interface of SWAT (version 2012), popularly known as ArcSWAT, was used for hydrological modeling. As a software tool for watershed modeling and hydrological analysis, ArcSWAT offers several advantages over artificial intelligence (AI) techniques. Chiefly, it provides a more explicit and transparent approach to modeling, allowing users to understand and interpret the underlying processes and assumptions, whereas AI techniques often involve complex black-box models that are challenging to interpret. The Soil and Water Assessment Tool (SWAT) is a physically based, semi-distributed, continuous-time hydrologic model used to simulate the impacts of land use, land management practices, and climate change on the hydrology and water quality of a watershed. It is free software developed at the USDA-ARS in the early 1990s. The hydrological component simulated by ArcSWAT is based on the water balance equation, which is expressed as follows:

$${SW}_t={SW}_0+\sum_{i=1}^{t}\left({R}_{{\textrm{day}}_i}-{Q}_{{\textrm{surf}}_i}-{E}_{a_i}-{W}_{{\textrm{seep}}_i}-{Q}_{{\textrm{gw}}_i}\right)$$
(1)

where \({SW}_t\) is the soil water content (mm) at time t, \({SW}_0\) is the initial soil water content (mm), t is the simulation period (days), \({R}_{{\textrm{day}}_i}\) is the amount of precipitation on the ith day (mm), \({Q}_{{\textrm{surf}}_i}\) is the amount of surface runoff on the ith day (mm), \({E}_{a_i}\) is the amount of evapotranspiration on the ith day (mm), \({W}_{{\textrm{seep}}_i}\) is the amount of water entering the vadose zone from the soil profile on the ith day (mm), and \({Q}_{{\textrm{gw}}_i}\) is the amount of base flow on the ith day (mm).
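The bookkeeping in Eq. (1) can be illustrated with a minimal Python sketch. It only accumulates the daily terms of the water balance; it is not SWAT's internal routine, and the class name, function name, and flux values below are hypothetical illustrations rather than data from the Sot catchment.

```python
from dataclasses import dataclass


@dataclass
class DailyFluxes:
    """Water fluxes for a single day, all in mm (illustrative values only)."""
    precipitation: float       # R_day
    surface_runoff: float      # Q_surf
    evapotranspiration: float  # E_a
    seepage: float             # W_seep
    base_flow: float           # Q_gw


def soil_water_content(sw_initial, fluxes):
    """Apply Eq. (1): SW_t = SW_0 + sum(R_day - Q_surf - E_a - W_seep - Q_gw)."""
    sw = sw_initial
    for day in fluxes:
        sw += (day.precipitation - day.surface_runoff
               - day.evapotranspiration - day.seepage - day.base_flow)
    return sw


# Hypothetical three-day sequence (mm); not observations from the study area.
example = [
    DailyFluxes(12.0, 2.0, 3.0, 1.0, 0.5),
    DailyFluxes(0.0, 0.0, 3.5, 0.5, 0.5),
    DailyFluxes(5.0, 0.5, 3.0, 0.5, 0.5),
]
final_sw = soil_water_content(100.0, example)
print(f"Soil water content after {len(example)} days: {final_sw:.1f} mm")
```

Each day adds precipitation and subtracts the four loss terms, so a dry day with ongoing evapotranspiration lowers the stored soil water.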

Model setup

Watershed delineation is the first step of model setup. The DEM was used as input to delineate the watershed into a number of hydrologically connected sub-watersheds. The next step is the hydrological response unit (HRU) analysis. HRUs are unique combinations of land cover, soil type, and topographic slope that represent the characteristics of the sub-catchments. SWAT divided the watershed into 29 sub-watersheds and 90 HRUs, represented by the dominant land use, soil, and slope within each sub-catchment. Figure 3 shows the delineated watershed, sub-catchments, reach, and main outlet of the Sot catchment. After delineating the watershed and defining the HRUs, surface runoff was calculated using the curve number method.
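For readers unfamiliar with the curve number method, the sketch below shows the standard SCS formulation in metric units. It assumes the conventional initial abstraction of 0.2 times the retention parameter; the function names and the curve number used in the example are hypothetical and are not the calibrated values of this study.

```python
def retention_parameter(curve_number):
    """Retention parameter S (mm) for an SCS curve number in (0, 100]."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in (0, 100]")
    return 25.4 * (1000.0 / curve_number - 10.0)


def surface_runoff(rain_mm, curve_number, ia_ratio=0.2):
    """Daily surface runoff (mm) from rainfall depth using the SCS-CN equation.

    ia_ratio is the assumed ratio of initial abstraction to S (0.2 by convention).
    """
    s = retention_parameter(curve_number)
    initial_abstraction = ia_ratio * s
    if rain_mm <= initial_abstraction:
        # All rainfall is absorbed by interception, storage, and infiltration.
        return 0.0
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + s)


# Hypothetical example: 50 mm of rain on a surface with CN = 80.
runoff = surface_runoff(50.0, 80)
print(f"Runoff: {runoff:.2f} mm")
```

A higher curve number gives a smaller retention parameter and therefore more runoff for the same rainfall, which is why land use and soil type enter the HRU definition.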

Sensitivity analysis

Sensitivity analysis identifies the parameters that most strongly influence the model results. SWAT-CUP, an automatic calibration tool, uses multiple regression analysis to find the most sensitive parameters. A Student’s t-test was then applied to obtain the p-value of each parameter. The smaller the p-value (i.e., <0.05), the more sensitive and significant the parameter (Abbaspour 2013).

SWAT calibration and validation

The SWAT model can be calibrated using two techniques: the conventional trial-and-error method and auto-calibration. Once the SWAT model had been constructed and the rainfall and temperature data uploaded, the model was calibrated and validated. SWAT-CUP (SWAT Calibration and Uncertainty Procedures), a computer program, was used for sensitivity analysis, calibration, validation, and uncertainty analysis. The program links several procedures to SWAT, including SUFI-2 (sequential uncertainty fitting), PSO (particle swarm optimization), MCMC (Markov chain Monte Carlo), GLUE (generalized likelihood uncertainty estimation), and ParaSol (parameter solution). The SUFI-2 algorithm was employed for calibration and validation against the streamflow data set. During calibration, the model parameters were adjusted and fine-tuned to best match the observed streamflow for the specified period; the goal is a close alignment between simulated and observed streamflow, so that the model represents the hydrological processes accurately. During validation, the parameter values determined in calibration are not modified (Moriasi et al. 2007). Instead, the model is applied to a separate time period within the same river basin to evaluate how well the calibrated model simulates the hydrological processes and replicates the observed streamflow. Table 3 lists some of the key parameters used for calibration with their minimum and maximum values.

Table 3 Parameters used for calibration with their minimum and maximum values

Statistical performance evaluation indices of SWAT model

The performance of the SWAT model was evaluated by comparing the simulated streamflow with the observed monthly streamflow during the calibration and validation periods. The statistical goodness-of-fit measures used were the correlation coefficient (R) (Sharma et al. 2018) and the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970).
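The two indices can be computed as in the following sketch, which uses the usual textbook definitions: NSE is one minus the ratio of the squared simulation error to the variance of the observations about their mean, and R is the Pearson correlation coefficient. The function names and sample series are hypothetical; they are not the streamflow data of this study.

```python
import math


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((O - S)^2) / sum((O - mean(O))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread


def correlation_coefficient(observed, simulated):
    """Pearson correlation coefficient R between observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    covariance = sum((o - mean_obs) * (s - mean_sim)
                     for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return covariance / math.sqrt(var_obs * var_sim)


# Hypothetical streamflow values (m3/s) for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.5, 2.5, 2.5, 4.5, 4.5]
nse = nash_sutcliffe(obs, sim)
r = correlation_coefficient(obs, sim)
print(f"NSE = {nse:.3f}, R = {r:.3f}")
```

NSE equals 1 for a perfect simulation and falls to 0 when the model is no better than the mean of the observations, so it penalizes bias that R alone does not detect.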

Results and discussion

Results of streamflow time series using SWAT model

SWAT model output

In this study, the SWAT model was run for the Sot catchment in the state of Uttar Pradesh. The estimated average annual rainfall of the catchment is 867.10 mm, with snowfall of 0 mm, surface runoff of 145.88 mm, lateral discharge of 0.54 mm, and groundwater discharge from shallow aquifers of 279.99 mm. The average total aquifer recharge is 294.04 mm, and the evapotranspiration is 426.40 mm. A pictorial representation of the SWAT output with the water balance components is shown in Fig. 4. Figure 4 shows that, on average, more than 45% of the total rainfall is lost to surface runoff and evapotranspiration; from the values above, the two components together amount to about 66% of the annual rainfall.
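The share of rainfall taken by each component follows directly from the reported annual values. The short sketch below repeats that arithmetic; the variable names are illustrative, and the numbers are the average annual values quoted in the paragraph above.

```python
# Average annual values reported for the Sot catchment (mm).
rainfall = 867.10
components = {
    "surface_runoff": 145.88,
    "lateral_discharge": 0.54,
    "groundwater_discharge": 279.99,
    "evapotranspiration": 426.40,
}

# Percentage of annual rainfall represented by each component.
shares = {name: 100.0 * value / rainfall for name, value in components.items()}
combined_loss = shares["surface_runoff"] + shares["evapotranspiration"]

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of rainfall")
print(f"surface runoff + evapotranspiration: {combined_loss:.1f}%")
```

Surface runoff accounts for roughly 17% and evapotranspiration for roughly 49% of the annual rainfall, which is consistent with the statement that their combined share exceeds 45%.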

Fig. 4 Pictorial representation of SWAT output

Calibration and validation results for the SWAT model

The entire data set from 2009 to 2016 was used for simulation, with a two-year (2009–2010) warm-up period. The data were then partitioned into two phases: a calibration phase covering 2011 to 2014 and a validation phase covering 2015 to 2016. The SWAT-CUP framework, coupled with the SUFI-2 algorithm, was employed to iteratively adjust the model parameters until a satisfactory fit was achieved between the observed and simulated streamflows (Abbaspour et al. 2007). The calibration and validation of the SWAT model parameters were based on streamflow data, following the approach used in previous studies (Shrestha et al. 2016; Singh and Saravanan 2020). To begin with, eight parameters were selected with their initial values, including minimum and maximum values, and updated in the par.info file. Table 4 presents the results of the parameter sensitivity analysis, which indicated that six of the eight input parameters were most sensitive for streamflow simulation at the outlet of the Sot river catchment. Table 4 lists these parameters in order of their p-values, from least to most sensitive.

Table 4 Most sensitive parameters for streamflow simulation in the Sot river catchment

The calibration process involves adjusting the selected model parameters, within the recommended range of each parameter, based on the characteristics of the watershed or catchment, so that the simulated values match the observed values. To obtain better results, calibration was run for 2000 simulations with a two-year warm-up period. Table 5 summarizes the minimum, maximum, and fitted values of the selected parameters obtained from the auto-calibration process. The parameters were adjusted both manually and automatically until the simulated values best matched the observed data.

Table 5 Results of auto-calibration values of fitted sensitive parameters

Both graphical approaches and quantitative statistics were used to compare observed and simulated streamflow. Table 6 provides a statistical summary of the relationship between observed and simulated streamflows over the calibration and validation periods based on the performance evaluation indices. The indices gave satisfactory results, with R values of 0.73 and 0.84 and NSE values of 0.49 and 0.63 during the calibration and validation periods, respectively. For both periods, the R value indicates a good correlation between observed and simulated streamflows. The SUFI-2 results gave p-factor values of 0.47 for the calibration period and 0.80 for the validation period. These values indicate that 47% and 80% of the measured data during the calibration and validation periods, respectively, were bracketed by the 95% prediction uncertainty (95PPU) band of the simulated streamflow, while the remainder fell outside it owing to errors in input data, such as variation in the rainfall and streamflow data. Understanding and addressing these errors, for example through data quality control measures, can improve the model’s performance and the accuracy of the simulated streamflow. The r-factor evaluates the uncertainty in the calibration and measures the thickness of the 95PPU envelope. The r-factor was 10.59 and 5.83 during the calibration and validation periods, respectively, which indicates less uncertainty and average performance of the model.
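The p-factor and r-factor can be sketched as follows, using the definitions commonly given for SUFI-2: the p-factor is the fraction of observations falling inside the 95PPU band, and the r-factor is the mean width of that band divided by the standard deviation of the observations. The function names, the use of the population standard deviation, and the sample values are assumptions for illustration; SWAT-CUP computes these quantities internally.

```python
import statistics


def p_factor(observed, lower, upper):
    """Fraction of observations bracketed by the 95PPU band [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


def r_factor(observed, lower, upper):
    """Mean 95PPU band width divided by the standard deviation of observations.

    The population standard deviation is used here as an assumption.
    """
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(observed)
    return mean_width / statistics.pstdev(observed)


# Hypothetical observations and 95PPU bounds (m3/s); not data from this study.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
band_lower = [0.5, 1.5, 3.5, 3.0, 5.5]
band_upper = [1.5, 2.5, 4.5, 5.0, 6.5]

p = p_factor(obs, band_lower, band_upper)
r = r_factor(obs, band_lower, band_upper)
print(f"p-factor = {p:.2f}, r-factor = {r:.2f}")
```

The two measures pull in opposite directions: widening the band raises the p-factor but also raises the r-factor, so both need to be read together when judging a calibration.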

Table 6 Statistical summary of daily observed and simulated streamflows

The hydrographs of daily streamflow over the calibration and validation periods are plotted for visual comparison in Fig. 5. Figure 6 shows the scatter plots of daily streamflow over the same periods, with coefficients of determination (R²) of 0.53 and 0.71 for the calibration and validation periods, respectively. The R² value indicates the goodness of fit between the observed and simulated data; a higher value signifies a stronger correlation and suggests that the model replicated the observed streamflow patterns well. In the calibration period, the R² value of 0.53 indicates that approximately 53% of the variability in the observed streamflow can be explained by the model’s simulated values. Although not a perfect fit, this demonstrates a significant level of agreement between the observed and simulated data during this period.
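As a quick consistency check, and assuming the scatter-plot coefficient of determination is the square of the correlation coefficient R reported for the same series (which holds for a simple linear fit between observed and simulated values), the reported R values of 0.73 and 0.84 reproduce the two R² values:

```latex
\begin{align*}
R^2_{\text{calibration}} &= 0.73^2 = 0.5329 \approx 0.53,\\
R^2_{\text{validation}} &= 0.84^2 = 0.7056 \approx 0.71.
\end{align*}
```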

Fig. 5 Hydrographs of the daily observed and simulated streamflow

Fig. 6 Scatter plots of the daily observed and simulated streamflow

During the validation period, the higher R² value of 0.71 suggests improved model performance: approximately 71% of the variability in the observed streamflow is accounted for by the model’s simulated values, indicating a stronger correlation between the observed and simulated streamflow than in the calibration period. Similar findings were reported by Singh and Saravanan (2020) for the Ib river watershed, India. The scatter plots in Fig. 6a and b also show that the model was unable to represent the high values of observed streamflow (> 2 m³/s) accurately, as their deviation from the trend line is much larger than that of the low values (< 2 m³/s). Difficulties in capturing peaks and extremes are common in hydrological modeling and can be attributed to the significant variability of rainfall, which leads to natural fluctuations in streamflow at various spatial and temporal scales. In addition, the river flow system itself exhibits natural variability, including chaotic disturbances, non-stationary patterns, and complex, non-linear behavior. Together with the inherent uncertainties in hydrological processes and the unpredictable nature of extreme events, these factors make it difficult to achieve a good fit between observed and simulated data during peak flow events. Sharma et al. (2021) reported similar results.

The variation in water yield for the sub-catchments simulated by the SWAT model is presented in Fig. 7. The results show that sub-catchment 16 contributes the most to the total streamflow at the catchment outlet, while sub-catchment 3 contributes the least. Sub-catchments 11, 13, 14, and 18–22 also contribute significantly. The observed increase in surface runoff can be attributed primarily to a decrease in the amount of water percolating into the soil: this reduction in infiltration leaves more water flowing over the land surface, resulting in increased surface runoff. Sharma et al. (2022), Shukla et al. (2020), Leng et al. (2020), and Paul et al. (2017) reported a similar trend of increased surface runoff in their respective study regions.

Fig. 7 Sub-catchment wise variation in water yield simulated by SWAT model

Conclusions

This paper used the physically based, semi-distributed ArcSWAT model to simulate the daily streamflow of the Sot river catchment over a period of 8 years (2009 to 2016). Model performance was assessed through sensitivity analysis, calibration, and validation of the daily streamflow time series using the SUFI-2 algorithm within the SWAT-CUP framework. Six parameters were found to be the most sensitive for the Sot river. The correlation coefficient (R) was 0.73 and 0.84, and the Nash-Sutcliffe efficiency (NSE) was 0.49 and 0.63, during the calibration and validation periods, respectively. These results indicate good agreement between the observed and simulated streamflow. The SWAT model performed relatively well in capturing the quantity and variability of the daily streamflow hydrograph during both the calibration and validation periods, and it provided reasonably acceptable results for simulating streamflow in the Sot river catchment. In conclusion, this research highlights the effectiveness of the SWAT model in simulating streamflow dynamics in the Sot river catchment, emphasizing its accuracy and suitability for hydrological modeling and water resource management applications.