Introduction

Parameter calibration of a hydrological model is the process of repeatedly adjusting parameter values so that the calculated flow matches the measured flow of the watershed as closely and consistently as possible (Duan et al. 1992; Rui 2017). From a mathematical point of view, parameter calibration is an optimization problem (Yapo et al. 1998; Fan et al. 2015), and a calibration method consists of an objective function and constraints (Yen et al. 2019). At present, most parameter calibration methods for hydrological models are single-objective procedures, such as the simplex method and the shuffled complex evolution algorithm (SCE-UA) (Daggupati et al. 2015). The sequential uncertainty fitting approach-version 2 (SUFI-2), generalized likelihood uncertainty estimation (GLUE), particle swarm optimization (PSO), parameter solution (ParaSol), and Markov chain Monte Carlo (MCMC) algorithms embedded in the SWAT calibration and uncertainty programs (SWAT-CUP) are also single-objective algorithms (Jiang et al. 2017; Liu et al. 2019). The fundamental challenge is to design multi-objective calibration strategies that find Pareto-optimal solutions in a single run and therefore eliminate the need to solve a sequence of single-objective optimization problems.

A calibration methodology based on a single optimization objective considers only one aspect of the complex hydrological process. It cannot fully exploit the other characteristic information contained in hydrological data (Chen et al. 2008; Li et al. 2010) and may not capture all aspects of the system response that the model is supposed to reproduce. Several criteria therefore need to be considered simultaneously, and multi-objective automatic calibration algorithms have been widely used with hydrological models (Yang and Wang 2010). For example, Cao (2010) proposed the multi-objective differential evolution adaptive metropolis (MODREAM) algorithm and applied it to the runoff simulation of the catchment moisture deficit loss equation-three parallel linear reservoirs (CMD-3PAR) model. The automatic calibration algorithm reflected the actual hydrological characteristics of the watershed better and showed higher simulation accuracy than the traditional single-objective algorithm. Guo (2013) proposed the multi-objective culture shuffled complex differential evolution (MOCSCDE) algorithm and analyzed its performance in calibrating the lumped Xin’anjiang model, the distributed Xin’anjiang model, and a support vector regression model. The MOCSCDE calibration method reflected the hydrological characteristics in different periods, effectively avoided the “homogenization effect” produced by the single-objective algorithm, and significantly improved the simulation performance.

The non-dominated sorting genetic algorithm II (NSGA-II), as implemented for SWAT calibration by Ercan and Goodall (2016), is a fast and efficient multi-objective genetic algorithm (MOGA) (Anderton et al. 2002). NSGA-II has been applied to the calibration of hydrological models and is an effective tool for the multi-objective calibration of watershed model parameters (Chen et al. 2018). For example, Bekele and Nicklow (2007) used NSGA-II to optimize the runoff parameters of several hydrological stations at the same time, which gave higher simulation performance than the single-objective scenarios. Guo et al. (2013) found that NSGA-II could obtain better simulation results for the Xin’anjiang model than the single-objective method when the type and number of objective functions were selected reasonably. Overall, multi-objective optimization is used to solve coupled inverse problems that extract information from multiple data sources, multiple sites, or both, in order to reduce the uncertainty of the inverse solution. Examples of such objectives include NSE, BIAS, and R2 calculated separately on low and high flows, timing errors, and the error in reproducing the water balance (Chilkoti et al. 2018). More importantly, the models that constitute the Pareto front in criteria space can be seen as the best models, since no other model is better than these on all criteria (Vallerio et al. 2015). This is critical for testing hypotheses about whether additional criteria are needed to determine a Pareto-optimal solution of the multi-objective optimization.
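The idea that no model is better than the Pareto models on all criteria can be stated formally. As a sketch, assume every criterion is written as a function to be minimized, \(f_{1},\dots ,f_{J}\) (for example \(1-\mathrm{NSE}\)). The symbols \(\theta\), \(f_{j}\), and \(J\) are introduced here for illustration only and are not notation used elsewhere in this study. A parameter set \(\theta_{a}\) is said to dominate \(\theta_{b}\) when

$$f_{j}(\theta_{a})\le f_{j}(\theta_{b})\ \text{for all}\ j\in \{1,\dots ,J\}\quad \text{and}\quad f_{j}(\theta_{a})<f_{j}(\theta_{b})\ \text{for at least one}\ j$$

A parameter set is Pareto-optimal if no other feasible parameter set dominates it, and the image of all such sets in criteria space is the Pareto front.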

The objectives of this study are to (i) compare the runoff calibration performance of the non-dominated sorting genetic algorithm-II (NSGA-II) and the single-objective SUFI-2 method, (ii) evaluate the efficiency of single and synchronous calibration of runoff and sediment by NSGA-II at different sites, and (iii) demonstrate the possible advantages of multi-objective synchronous calibration over single-site or single-element calibration for the SWAT model. The findings of this study suggest possible remedies for certain deficiencies of single-objective and single-element calibration.

Materials and method

Study region

The Yanhe River watershed (108°39′E ~ 110°29′E and 36°22′N ~ 37°20′N) is located in the north of Shaanxi Province and belongs to the loess hilly and gully region (Fig. 1). The total area of the watershed is about 7725 km2, and the altitude is about 471 ~ 1213 m. The terrain is high in the northwest and low in the southeast, with an average slope angle of 17°. From the headwater to the estuary, the Yanhe River is divided into upper, middle, and lower reaches, with Huaziping and Ganguyi as the dividing points. The upper, middle, and lower reaches are the loess hilly and gully area with dominated ridges and hills, the loess hilly and gully area with dominated hills, and the crushing table plateau, respectively. The interannual difference of precipitation in the Yanhe River watershed is obvious, and the distribution within a year is very uneven. The annual average temperature is about 9.3 ℃, and the multi-year average annual precipitation is 482.3 mm, which is mostly concentrated in June to August, accounting for 68.55% of the annual precipitation (Liu et al. 2010). Its runoff accounts for 99.95 ~ 100% of the annual runoff, and the scouring amount of sediment accounts for 94 ~ 100% of the whole year (Wu et al. 2020). The months with a monthly precipitation frequency greater than 50% are May to October, which can be considered the wet season, and the runoff in the wet season accounts for more than 90% of the whole year. Loessial soil is the most widely distributed soil type in the Yanhe River watershed, accounting for more than 85% of the total area. The loessial soil is loose, its particle size is mainly silt, and it is a cultivated soil. It is very easily eroded by rainfall runoff and is widely distributed on ridge and hill tops, slopes, tablelands, and ditches.

Fig. 1 Study region: (a) the relative location of the Yanhe River watershed in China, and (b) digital elevation model (DEM), longitude and latitude coordinates, and the upper reaches of Ganguyi and Yan’an hydrological stations in the Yanhe River watershed

The basic data used for the establishment of SWAT model of runoff and sediment mainly include digital elevation model (DEM), land use types, soil types, meteorological data, runoff, and sediment (Table 1).

Table 1 Data types, descriptions, and sources of the SWAT model in the Yanhe River watershed

Construction of SWAT model

In the modeling process using ArcSWAT 2012, the river network was generated from the DEM, and the catchment area threshold was set to the default value (15095 hm2). In addition, two hydrological stations, Ganguyi and Yan’an, were added to the river channel for hydrological calibration. After the total outlet of the watershed was set, a total of 27 sub-watersheds were obtained. In this study, the warm-up period of the model is 1989–1990, the calibration period is 1991–1995, and the validation period is either 1996–2000 or 1996–1997. First, the NSGA-II and SUFI-2 algorithms were adopted with SWAT to compare the calibration performance of runoff parameters (Wu et al. 2022a).

In this study, twelve parameters related to runoff and eight parameters related to sediment were selected on the basis of the relevant literature to study the calibration performance of runoff and sediment under different scenarios. The parameters, their ranges, their optimal values, and their physical meanings are shown in Table 2. The parameter ranges are based on the SWAT input and output manual (Abbaspour et al. 2007).

Table 2 Variable name and definition, minimum, maximum, and optimal parameters by the NSGA-II calibration method

Calibration scenario design

This study explores and evaluates the calibration strategy and the efficiency of NSGA-II through nine designed scenarios.

Firstly, scenarios S1, S2, and S3 (Table 3) were designed to calibrate and validate the runoff parameters of the SWAT model by NSGA-II and to compare the simulation results of single-site and synchronous two-site calibration in the Yanhe River watershed. The initial population was set to 1000, the offspring population size to 50, and the number of generations to 50. The objective functions of the single-site calibration were NSE (1) and PBIAS (2), and the objective functions of the two-site calibration were \(\overline{\mathrm{NSE}}\) (3) and \(\overline{\mathrm{PBIAS}}\) (4), respectively.

$$\mathrm{NSE}=1-\frac{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{si}}(m)-{Q}_{\mathrm{oi}}(m)\right)}^{2}}{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{oi}}(m)-\overline{{Q}_{\mathrm{o}}}(m)\right)}^{2}}$$
(1)
$$\mathrm{PBIAS}=\frac{{\sum }_{i=1}^{n}({Q}_{\mathrm{oi}}(m)-{Q}_{\mathrm{si}}(m))}{{\sum }_{i=1}^{n}{Q}_{\mathrm{oi}}(m)}\times 100$$
(2)
$$\overline{\mathrm{NSE}}=\frac{1}{m}(\mathrm{NSE}(1)+\mathrm{NSE}(2)+...+\mathrm{NSE}(m))$$
(3)
$$\overline{\mathrm{PBIAS}}=\frac{1}{m}(\mathrm{PBIAS}(1)+\mathrm{PBIAS}(2)+...+\mathrm{PBIAS}(m))$$
(4)

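where \({Q}_{\mathrm{si}}\) is the simulated value (m3/s), \({Q}_{\mathrm{oi}}\) is the observed value (m3/s), \(\overline{{Q}_{\mathrm{o}}}\) is the average observed value (m3/s), n is the number of paired simulated and observed values, and m is the site index. Two sites are used in this study, so the averages in Eqs. (3) and (4) are taken over m = 1, 2.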
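The four objective functions above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names, the dictionary layout, and the flow values are hypothetical, and the NSE follows the standard Nash–Sutcliffe form in which the denominator sums the squared deviations of the observed values from their mean.

```python
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(sim, obs):
    """Percent bias as in Eq. (2): positive when the model underestimates in total."""
    return 100.0 * sum(o - s for s, o in zip(sim, obs)) / sum(obs)


def station_means(series_by_station):
    """Eqs. (3)-(4): arithmetic means of NSE and PBIAS over all stations.

    series_by_station maps a station name to a (simulated, observed) pair.
    """
    nse_vals = [nse(s, o) for s, o in series_by_station.values()]
    pbias_vals = [pbias(s, o) for s, o in series_by_station.values()]
    return fmean(nse_vals), fmean(pbias_vals)


# Hypothetical flows (m3/s), invented purely for illustration.
data = {
    "station_a": ([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),  # perfect simulation
    "station_b": ([3.0, 3.0, 3.0], [2.0, 3.0, 4.0]),  # simulation equals the observed mean
}
mean_nse, mean_pbias = station_means(data)
```

In this toy case the first station has an NSE of 1 and the second an NSE of 0, so the station-averaged NSE is 0.5, while both stations have zero total bias.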

Table 3 Different scenario designs of single and synchronous calibration in runoff and sediment parameters of Ganguyi and Yan’an hydrological stations by NSGA-II

Secondly, scenarios S4, S5, and S6 (Table 3) were designed to calibrate and validate the runoff and sediment parameters of the SWAT model by NSGA-II and to compare the simulation results of single-element and synchronous calibration in the Yanhe River watershed. The initial population was set to 1000, the offspring population size to 50, and the number of generations to 50. The objective functions of S4 and S5 were NSE (5) and PBIAS (6), and the objective functions of S6 were \(\overline{\mathrm{NSE}}\) (7) and \(\overline{\mathrm{PBIAS}}\) (8), respectively.

$$\mathrm{NSE}=1-\frac{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{si}}(k)-{Q}_{\mathrm{oi}}(k)\right)}^{2}}{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{oi}}(k)-\overline{{Q}_{\mathrm{o}}}(k)\right)}^{2}}$$
(5)
$$\mathrm{PBIAS}=\frac{{\sum }_{i=1}^{n}({Q}_{\mathrm{oi}}(k)-{Q}_{\mathrm{si}}(k))}{{\sum }_{i=1}^{n}{Q}_{\mathrm{oi}}(k)}\times 100$$
(6)
$$\overline{\mathrm{NSE}}=\frac{1}{2}(\mathrm{NSE}(1)+\mathrm{NSE}(2))$$
(7)
$$\overline{\mathrm{PBIAS}}=\frac{1}{2}(\mathrm{PBIAS}(1)+\mathrm{PBIAS}(2))$$
(8)

where \({Q}_{\mathrm{si}}\) is the simulated value (runoff, m3/s; sediment, t), \({Q}_{\mathrm{oi}}\) is the observed value (runoff, m3/s; sediment, t), \(\overline{{Q}_{\mathrm{o}}}\) is the average observed value (runoff, m3/s; sediment, t), “\(k=1\)” indicates the runoff data, and “\(k=2\)” indicates the sediment data.

Thirdly, scenarios S7, S8, and S9 (Table 3) were designed to calibrate and validate the runoff and sediment parameters of the SWAT model by NSGA-II and to compare the simulation results of synchronous calibration at a single site and at two sites in the Yanhe River watershed. The initial population was set to 1000, the offspring population size to 50, and the number of generations to 50. The objective functions of S7 and S8 were \(\overline{\mathrm{NSE}}\) (9) and \(\overline{\mathrm{PBIAS}}\) (10), and the objective functions of S9 were \({\overline{\mathrm{NSE}}}_{2}\) (11) and \({\overline{\mathrm{PBIAS}}}_{2}\) (12), respectively.

$$\overline{\mathrm{NSE}}(k,m)=1-\frac{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{si}}(k,m)-{Q}_{\mathrm{oi}}(k,m)\right)}^{2}}{{\sum }_{i=1}^{n}{\left({Q}_{\mathrm{oi}}(k,m)-\overline{{Q}_{\mathrm{o}}}(k,m)\right)}^{2}}$$
(9)
$$\overline{\mathrm{PBIAS} }\left(k,m\right)=\frac{{\sum }_{i=1}^{n}\left({Q}_{\mathrm{oi}}\left(k,m\right)-{Q}_{\mathrm{si}}\left(k,m\right)\right)}{{\sum }_{i=1}^{n}{Q}_{\mathrm{oi}}\left(k,m\right)}\times 100$$
(10)
$${\overline{\mathrm{NSE}}}_{2}=\frac{1}{4}(\mathrm{NSE}(\mathrm{1,1})+\mathrm{NSE}(\mathrm{1,2})+\mathrm{NSE}(\mathrm{2,1})+\mathrm{NSE}(\mathrm{2,2}))$$
(11)
$${\overline{\mathrm{PBIAS}}}_{2}=\frac{1}{4}(\mathrm{PBIAS}(\mathrm{1,1})+\mathrm{PBIAS}(\mathrm{1,2})+\mathrm{PBIAS}(\mathrm{2,1})+\mathrm{PBIAS}(\mathrm{2,2}))$$
(12)

where \({Q}_{\mathrm{si}}\) is the simulated value (runoff, m3/s; sediment, t), \({Q}_{\mathrm{oi}}\) is the observed value (runoff, m3/s; sediment, t), \(\overline{{Q}_{\mathrm{o}}}\) is the average observed value (runoff, m3/s; sediment, t), “\(k=1\)” indicates the runoff data, and “\(k=2\)” indicates the sediment data. “\(m=1\)” indicates data from Yan’an station, and “\(m=2\)” indicates data from Ganguyi station.
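Equations (11) and (12) are plain arithmetic means over the four combinations of variable k and station m. As a hedged sketch, the averaging step could look as follows, assuming the per-pair NSE and PBIAS values have already been computed. The function name `s9_objectives` and all numbers are hypothetical.

```python
from statistics import fmean


def s9_objectives(nse_by_pair, pbias_by_pair):
    """Average NSE and PBIAS over the four (k, m) pairs, as in Eqs. (11)-(12).

    Keys are (k, m) tuples: k = 1 runoff, k = 2 sediment;
    m = 1 Yan'an station, m = 2 Ganguyi station.
    A missing pair raises KeyError, so incomplete input is not averaged silently.
    """
    pairs = [(k, m) for k in (1, 2) for m in (1, 2)]
    mean_nse = fmean(nse_by_pair[p] for p in pairs)
    mean_pbias = fmean(pbias_by_pair[p] for p in pairs)
    return mean_nse, mean_pbias


# Hypothetical per-pair scores, invented purely for illustration.
nse_scores = {(1, 1): 0.8, (1, 2): 0.6, (2, 1): 0.4, (2, 2): 0.6}
pbias_scores = {(1, 1): 10.0, (1, 2): -10.0, (2, 1): 5.0, (2, 2): -5.0}
mean_nse, mean_pbias = s9_objectives(nse_scores, pbias_scores)
```

Because Eq. (12) averages signed values, biases of opposite sign offset each other in this toy example and the mean PBIAS is zero even though no individual pair is unbiased.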

Results

Calibration and validation by NSGA-II and SUFI-2

The NSGA-II and SUFI-2 algorithms were each used to calibrate the runoff parameters of the Ganguyi and Yan’an catchments (Table 4). During the calibration period from 1991 to 1995, both R2 and NSE of the Ganguyi catchment exceeded 0.7 with PBIAS below 20%, and both R2 and NSE of the Yan’an catchment exceeded 0.65, again with PBIAS below 20%. These data indicate that the two calibration methodologies can meet the accuracy requirements of hydrological modeling in the Yanhe River watershed. Moreover, R2 and NSE at both sites are satisfactory in the validation period from 1996 to 1997, although they decreased significantly or even became negative in the validation period from 1998 to 2000. More significantly, the NSGA-II algorithm has an obvious advantage: its PBIAS is better than that of the SUFI-2 method in the calibration and validation periods at both sites.

Table 4 Comparison of runoff calibration and validation results between NSGA-II and SUFI-2 in different periods

Comparison of single-site and two-site calibration by NSGA-II

After the parameters of the S1, S2, and S3 scenarios were calibrated, the simulation performance at each station in the calibration and validation periods was summarized in Table 5. Generally, the NSGA-II method places more constraints on the parameter calibration process than SUFI-2, and the simulation results are more reasonable. However, the parameters corresponding to the S1 and S2 scenarios have obvious spatial limitations. For example, the NSE coefficients of S1 at Ganguyi station are 0.72 and 0.77 in the calibration and validation periods (91–95, 96–97), respectively, but the NSE coefficients at Yan’an station fall to − 1.04 and 0.42. The NSE coefficients of S2 at Yan’an station are 0.76 and 0.81 in the calibration and validation periods (91–95, 96–97), respectively, but at Ganguyi station they decrease to 0.60 and 0.45. This indicates that NSE and PBIAS will be poor in both the calibration and validation periods if the S1 parameters are applied to the Yan’an catchment or the S2 parameters are applied to the Ganguyi catchment. By contrast, the parameters corresponding to the S3 scenario clearly have better applicability. The NSE coefficients of S3 at Ganguyi station are 0.63 and 0.72 in the calibration and validation periods (91–95, 96–97), respectively, and those at Yan’an station are 0.75 and 0.87. Therefore, the simulation performance of S3 at the two hydrological stations is satisfactory in the calibration and validation periods (91–95, 96–97).

Table 5 Comparison of runoff calibration and validation results under single-site (S1, S2) and two-site (S3) scenarios

Comparison between single and synchronous calibration of runoff and sediment parameters by NSGA-II

After the parameters in the S4, S5, and S6 scenarios were calibrated, the simulation performance at each station in the calibration and validation periods was summarized in Table 6. The calibration and validation performance for runoff and sediment in the S6 scenario is relatively good, but there are obvious differences between the S4 and S5 scenarios. Firstly, the S4 scenario uses only runoff data for calibration, yet the calibration and validation performance for both runoff and sediment at the two stations is acceptable. The NSE coefficients of runoff at Ganguyi station are 0.72 and 0.77 in the calibration and validation periods (91–95, 96–97), respectively, and the NSE coefficients of sediment at Ganguyi station are 0.53 and 0.46. The NSE coefficients of runoff at Yan’an station are 0.76 and 0.81, respectively, and the NSE coefficients of sediment at Yan’an station are 0.51 and 0.56. Secondly, the S5 scenario uses only sediment data for calibration. Its calibration and validation performance for sediment is better than that of the S4 scenario, but its runoff performance is poor, with negative NSE values. Specifically, the NSE coefficients of S5 for sediment at Ganguyi station are 0.71 and 0.60 in the calibration and validation periods (91–95, 96–97), respectively, but the NSE coefficients of runoff at Ganguyi station are − 0.93 and − 3.26; the NSE coefficients of S5 for sediment at Yan’an station are 0.64 and 0.65, respectively, but the NSE coefficients of runoff at Yan’an station are − 14.83 and − 6.59.

Table 6 Comparison of single and synchronous calibration and validation results of runoff and sediment in different periods

Furthermore, the NSE coefficients of S6 scenario in the runoff parameters of Ganguyi station are 0.74 and 0.82, respectively, in the calibration and validation period (91–95, 96–97) and 0.56 and 0.58, respectively, in the sediment parameters of Ganguyi station. The NSE coefficients of S6 scenario in the runoff parameters of Yan’an station are 0.78 and 0.83, respectively, in the calibration and validation period (91–95, 96–97) and 0.56 and 0.60, respectively, in the sediment parameters of Yan’an station.

Synchronous calibration of runoff and sediment parameters in two sites

As in the S1–S3 scenarios, the parameters corresponding to the S7 and S8 scenarios still have obvious spatial limitations for runoff (Table 7). When the S7 parameters are applied to runoff in the Yan’an catchment, or the S8 parameters to runoff in the Ganguyi catchment, the simulation performance is clearly poor in the calibration and validation periods. For example, the NSE coefficients of S7 for runoff at Ganguyi station are 0.74 and 0.82 in the calibration and validation periods (91–95, 96–97), respectively, but the NSE coefficients at Yan’an station fall to − 0.87 and 0.37. The NSE coefficients of S8 for runoff at Yan’an station are 0.70 and 0.89 in the calibration and validation periods (91–95, 96–97), respectively, but the NSE coefficients at Ganguyi station fall to 0.67 and 0.52. These results indicate again that calibrating runoff with a single station leads to large deviations when the model is applied elsewhere. By contrast, S9 gives good runoff simulation in the calibration and validation periods (91–95, 96–97) at both Ganguyi and Yan’an stations, which indicates that the two-site method based on NSGA-II can improve the adaptability of the parameters in runoff simulation.

Table 7 Comparison of results of single and synchronous calibration of runoff and sediment at two stations

Discussion

Comparison of the NSGA-II and SUFI-2 algorithms

The monthly simulated runoff obtained by the NSGA-II and SUFI-2 algorithms is basically consistent with the measured runoff at the two stations (Fig. 2), indicating that the SWAT model has good adaptability in the runoff simulation of the Yanhe River watershed. Furthermore, the calibrated model performs significantly better in the wet season than in the dry season at both stations. Many scholars have reported similar differences between wet-season and dry-season performance when SWAT is applied to watersheds with extremely uneven intra-annual flow (Gao 2018). Firstly, the SWAT model uses the soil conservation service curve number (SCS-CN) method to calculate surface runoff, and dry-season factors are not fully considered in the equation (Liew and Garbrecht 2003). Secondly, objective functions such as NSE and PBIAS are more sensitive to peak values, because they mainly characterize the overall simulation performance and may not reflect changes in the dry season (Zhang et al. 2015).
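The sensitivity of NSE to peak values can be illustrated with a small, hedged sketch that scores the wet and dry seasons separately. The May to October wet season follows the description of the study region; the function names and the monthly flows are hypothetical and do not come from the Yanhe data.

```python
from statistics import fmean

WET_MONTHS = {5, 6, 7, 8, 9, 10}  # May-October, the wet season of the study region


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    mean_obs = fmean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def seasonal_nse(months, sim, obs):
    """Return (wet_nse, dry_nse) for monthly series labelled by calendar month."""
    wet = [(s, o) for m, s, o in zip(months, sim, obs) if m in WET_MONTHS]
    dry = [(s, o) for m, s, o in zip(months, sim, obs) if m not in WET_MONTHS]
    # zip(*pairs) turns a list of (sim, obs) pairs back into two sequences.
    return nse(*zip(*wet)), nse(*zip(*dry))


# One hypothetical year of monthly flows (m3/s), invented for illustration.
months = list(range(1, 13))
obs = [1.0, 1.0, 2.0, 2.0, 5.0, 10.0, 30.0, 40.0, 15.0, 8.0, 2.0, 1.0]
sim = [o + 1.0 for o in obs]  # the same absolute error of 1 m3/s in every month

overall = nse(sim, obs)
wet_nse, dry_nse = seasonal_nse(months, sim, obs)
```

With an identical absolute error in every month, the overall and wet-season NSE both stay above 0.99, while the dry-season NSE is negative because the observed dry-season variance is small. A high overall NSE therefore says little about low-flow performance.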

Fig. 2 Comparison of two calibration algorithms in different hydrological stations: (a) Ganguyi and (b) Yan’an

R2, NSE, and PBIAS during the validation period from 1996 to 2000 were partly worse than those of the calibration period, which was related to complex and severe human activities after 1996 (Chen et al. 2010). In particular, the decline in validation performance after 1998 may be related to the effective implementation of large-scale “Grain for Green” projects in the late 1990s in the Yanhe River watershed (Wu et al. 2019), such as the project of “beautiful mountains and rivers” implemented in 1997 and the project of “returning farmland to forest/grass” implemented in 1999 (Yu 2008). These soil conservation practices have greatly changed the underlying surface conditions of the watershed, but the model does not take this change into account and uses only the land use data of 1995, which may be another important reason for the poor simulation performance after 1998.

More importantly, PBIAS is the average deviation between the simulated and measured values. The small PBIAS obtained by the NSGA-II algorithm implies that this method can not only characterize the consistency of the average annual runoff between simulation and observation but also better weaken the impact of peak flow (Gebremariam et al. 2014). Therefore, the NSGA-II algorithm based on multiple objective functions can better constrain the parameter calibration process and make the calibrated model more consistent with the physical conditions of the watershed.

Calibration of single- and multi-hydrological elements

Using the NSGA-II algorithm to calibrate runoff or sediment alone can effectively improve the simulation performance of that element. According to the results of the S4 and S5 scenarios, however, calibrating model parameters with runoff data only also improves the sediment performance to a certain extent, whereas calibrating with sediment data only does not yield acceptable runoff performance. This result supports the practice of calibrating runoff first and sediment second, because runoff parameters affect not only runoff but also sediment processes (Abbaspour et al. 2007). Although the runoff simulation of the S5 scenario is relatively poor, its sediment simulation is significantly better than that of the S4 scenario, which can be attributed to the differential impact of some parameters on runoff and sediment simulation (Ghasemizade et al. 2017). The S6 scenario performs better than S4 in the runoff simulation at the two stations during the calibration and validation periods and is comparable to the S5 scenario in the sediment simulation at the two stations. Therefore, using both runoff and sediment data to calibrate hydrological parameters may be more reasonable than using runoff data only. It is feasible to use the NSGA-II methodology and two data sets to calibrate runoff and sediment synchronously, which can not only reduce the number of parameter iterations (Ercan and Goodall 2016) but also improve the simulation of runoff and sediment. More importantly, the synchronous calibration of runoff and sediment parameters avoids the cumbersome steps of calibrating runoff and sediment separately, makes full use of the information in the runoff and sediment data, and improves the efficiency of calibrating the hydrological parameters of the SWAT model.

Synchronous calibration of runoff and sediment at multi-sites

Single-site calibration is more reliable in small watersheds, but it ignores the spatial heterogeneity of parameters and reduces the effectiveness of the model when it is applied to a whole large-scale watershed (Anderton et al. 2002). Many studies have shown that multi-site calibration is an effective way to solve this problem (Bai et al. 2017; Molina-Navarro et al. 2017; Gong et al. 2012), because it makes full use of the existing data and reduces the uncertainty of hydrological models of medium- and large-scale watersheds with high spatial heterogeneity (Bekele and Nicklow 2007). Some researchers have shown that better simulation results can be obtained by using multi-site data in the calibration and validation processes (Zhang 2008; Ercan and Goodall 2016). For example, Zhang et al. (2013) proposed a multi-algorithm genetically adaptive multi-objective (AMALGAM) optimization algorithm, which not only effectively accounted for the spatial heterogeneity of parameters but also improved the model calibration efficiency and the reliability of the simulation results.

There are some differences between the synchronous calibration of runoff and sediment at a single site and at multiple sites (Table 7). The applicability of the calibrated parameters may be wider for sediment simulation than for runoff simulation in the Yanhe River watershed. For example, the NSE coefficients of S7 for sediment at Ganguyi station are 0.56 and 0.58 in the calibration and validation periods (91–95, 96–97), respectively, and are even better at Yan’an station, at 0.60 and 0.69; the NSE coefficients of S8 for sediment at Yan’an station are 0.56 and 0.60 in the calibration and validation periods (91–95, 96–97), respectively, and 0.50 and 0.48 at Ganguyi station. Sediment yield in a watershed is usually inseparable from runoff, so it is meaningful to assess the adaptability of parameters to sediment only when they first have good adaptability to runoff. Therefore, both the S7 and S8 scenarios still have spatial limitations under certain circumstances. This is because single-site calibration may produce local optimal solutions, and there are obvious adaptability problems when the parameters are applied at different watershed scales (Huo et al. 2020). The S9 scenario not only achieves wide applicability of the parameters, but its sediment simulation is also basically equivalent to that of S7 and S8. The NSE coefficients of monthly runoff at Ganguyi station are 0.67 and 0.52 in the calibration and validation periods, respectively, and 0.70 and 0.89 at Yan’an station; the NSE coefficients of monthly sediment at Ganguyi station are 0.67 and 0.64, respectively, and 0.56 and 0.55 at Yan’an station. The NSE coefficients of runoff and sediment at Ganguyi and Yan’an stations are all greater than 0.5, indicating that the synchronous calibration of runoff and sediment parameters at these two stations based on NSGA-II can determine a Pareto-optimal solution of the multi-objective optimization scheme in a single run and obtain a unique parameter set with wide applicability, which can not only improve the calibration performance of the model but also provide a robust model basis for watershed management.

The set of all noninferior solutions forms the Pareto front. The collection of all Pareto-optimal solutions often forms an L-shaped curve in the objective space (Jia and Ierapetritou 2007). Figure 3 shows the Pareto front of a bicriterion minimization problem, where the dots correspond to Pareto-optimal solutions. The parameter set with the best average NSE (0.65) is selected as the final solution; its average PBIAS is 0.19%. In practice, solving a bicriterion optimization problem is a process of making tradeoffs between Pareto-optimal solutions (Abdalla et al. 2023). By using the Pareto-solution set from multi-site calibration, one can generate a Pareto-ensemble of simulated outputs for different sites, so that the uncertainty in the model simulations due to different ways of trading off the model and data errors can be examined (Salazar and Rocco 2007).

Fig. 3 Pareto-optimal front of NSGA-II calibration methodology in objective function space
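The selection of a final solution from the Pareto front can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the NSGA-II library used here: the mapping of the two criteria to minimization form (1 − NSE and |PBIAS|), the function names, and the candidate scores are all hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points using a simple O(n^2) pairwise filter."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


def to_minimization(mean_nse, mean_pbias):
    """Assumed mapping: maximize NSE -> minimize 1 - NSE; PBIAS -> minimize |PBIAS|."""
    return (1.0 - mean_nse, abs(mean_pbias))


# Hypothetical (mean NSE, mean PBIAS in %) scores of five candidate parameter sets.
candidates = [(0.70, 8.0), (0.60, 3.0), (0.55, 0.5), (0.58, 6.0), (0.50, 4.0)]
objectives = [to_minimization(n, p) for n, p in candidates]

front = pareto_front(objectives)

# Pick the front member with the best mean NSE, i.e. the smallest 1 - NSE.
best = min(front, key=lambda f: f[0])
```

Three of the five hypothetical candidates are non-dominated. Choosing the one with the best mean NSE accepts a larger bias in exchange, which is the kind of tradeoff between Pareto-optimal solutions described above.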

The calibration and validation results of runoff and sediment for the S9 scenario are presented in Figs. 4 and 5, respectively. On the whole, the simulation performance in the wet season is better than that in the dry season, with NSE and R2 of 0.67 and 0.77 at Ganguyi station and 0.77 and 0.78 at Yan’an station, while the performance in the dry season is relatively poor. The reason is that runoff generation in the SWAT model is characterized by the SCS-CN method and a set of parameters associated with that structure, which primarily considers the precipitation and physical factors of the watershed (Liew and Garbrecht 2003). Furthermore, the validation performance for runoff at Yan’an station is better than that at Ganguyi station, but the sediment simulation at Yan’an station is relatively poor. This is because the same set of parameters cannot fully characterize the spatial heterogeneity of the actual underlying surfaces in different catchments (Wu et al. 2022b). The noninferior solution obtained by the NSGA-II method is not the optimal solution for every station and element: the influence of the parameters on hydrological elements at different stations is not completely consistent, and it is subject to a certain variability and uncertainty (Leta et al. 2017).

Fig. 4 Calibration (1991–1995) and validation (1996–1997) results of runoff corresponding to scenario S9 in (a) Ganguyi and (b) Yan’an hydrological stations

Fig. 5 Calibration (1991–1995) and validation (1996–1997) results of sediment corresponding to scenario S9 in (a) Ganguyi and (b) Yan’an hydrological stations

Conclusions

This study uses the NSGA-II algorithm to compare and analyze how nine designed scenarios, covering single- and two-site calibration and separate and synchronous calibration of runoff and sediment, affect the simulation performance of the SWAT model in an arid and semi-arid watershed. The results indicate that (i) both the NSGA-II and SUFI-2 algorithms can meet the accuracy requirements of parameter calibration in watershed modeling, but NSGA-II, being based on multiple objective functions, can better constrain the parameter calibration process and yield a hydrological model more consistent with the physical conditions of the watershed. (ii) The two-site calibration based on NSGA-II can effectively reduce the impact of spatial heterogeneity on model parameters and obtain an optimal parameter set with wide applicability. The synchronous calibration of runoff and sediment parameters based on NSGA-II can improve both the calibration performance and the calibration efficiency of the SWAT model. (iii) The multi-objective two-site synchronous calibration of runoff and sediment based on NSGA-II combines the advantages of two-site and two-element synchronous calibration. It can identify a Pareto region of the parameter space, obtain optimal parameters for different locations, and characterize the trade-offs between different “optimal” ways of constraining the model to be consistent with the data in the presence of model and data error. These findings provide insight into the practical implications of multi-objective optimization for model calibration and contribute to the elusive goal of finding a unique optimal parameter set more efficiently. It is recommended that multiple optimization criteria measuring different aspects of system behavior be used to investigate model uncertainty and performance before a watershed water quality model is applied broadly.