1 Introduction

Driven by rising greenhouse gas concentrations since pre-industrial times and the resulting global warming, the hydrological cycle has changed significantly (Milly et al. 2005). Meehl (2007) found that precipitation intensity is generally projected to increase, particularly at middle and high latitudes. Extreme hydrological events (e.g. floods, debris flows, and droughts) are likely to increase in frequency, duration and magnitude in climate-sensitive regions (Milly et al. 2005; Sivakumar 2011). Consequently, studies of extreme hydrological events in response to global climate change have grown rapidly in recent years (Baguis et al. 2010).

To date, a number of studies have addressed future floods or droughts in the context of global climate change. For instance, Dankers and Feyen (2008) evaluated the impacts of climate change on future flood hazards in Europe using multiple regional climate models (RCMs). They found marked inconsistency in the magnitude of Q100 (the flood corresponding to the 100-year return period) among the hydrological simulations for different climate experiments. This motivated the use of multi-model approaches to reduce uncertainty in subsequent climate impact assessments. Knowledge about climate impacts on future floods in Europe was subsequently enriched by Rutger and Luc (2009), Rojas et al. (2012) and Kay and Crooks (2014). Ngongondo et al. (2013) analyzed climate change impacts on floods in southern Africa by using outputs from three general circulation models (GCMs) to force the hydrological model WASMOD-D. Immerzeel et al. (2010) studied climate change impacts on the Asian water towers (i.e. the Indus, Ganges, Brahmaputra, Yangtze, and Yellow rivers) using a revised snowmelt runoff model and climate scenarios from five GCMs. Xu et al. (2011) projected intra-annual changes in flood discharge in the Yellow River basin based on the SWAT hydrological model and multiple GCMs. A similar study was conducted by Yang et al. (2014) in the headwater catchment of the Yellow River. In these studies, flood indicators are generally derived from annual maximum streamflow series (AMAX), streamflow percentiles and peak-over-threshold series (POT). The flood corresponding to the T-year return level based on AMAX is the more common indicator in flood projection studies. In some studies, POT series are also used to represent flood frequency, i.e. changes in the number of floods occurring each year.

Although similar scientific efforts have been made worldwide (Piao et al. 2010), modelling floods in response to climate change remains highly challenging for hydrologists because of the substantial uncertainty inherent in GCMs, downscaling methods and hydrological models when reproducing climate and floods. Furthermore, runoff processes on the Tibetan Plateau generally differ from those in humid zones (Wilby and Harris 2006): runoff is fed collectively by precipitation, snowmelt and glacial melt. Floods on the Tibetan Plateau are strongly affected by both precipitation and temperature, and are therefore more sensitive to climate change than floods in humid regions. Hence, flood modelling in response to climate change in such regions is more complex and challenging for the scientific community. Meanwhile, previous investigations focused mainly on the uncertainty induced by GCMs, RCMs and greenhouse gas scenarios in assessing climate change impacts (e.g. Dankers and Feyen 2008; Rojas et al. 2012). Since hydrological models are widely used to generate future streamflow scenarios in climate impact studies, the uncertainty introduced by the choice of hydrological model should also be adequately addressed.

The Tibetan Plateau is a high-elevation, cold region where hydrological and ecological processes are highly sensitive to climate change (Houghton 2001). A growing number of studies on climate change and its impacts in this region have therefore been reported. So far, most of them have focused on changes in local climate variables (e.g. Wang and Yang 2012) and in mean annual and monthly streamflow (e.g. Xu et al. 2009; Liu et al. 2011; Tang et al. 2008; Zhang et al. 2012, 2013a, b) during historical or future periods. Only a few studies of future floods (Yang et al. 2014; Xu et al. 2011) can be found. Moreover, neither of these two studies examined in detail how projected floods differ between hydrological models, so knowledge of the role of the hydrological model in estimating future climate impacts on floods remains limited. This study therefore addresses the potential impact of climate change on flood magnitude (based on AMAX) and flood frequency (based on POT series) in the headwater catchment of the Yellow River (part of the Tibetan Plateau) using a range of hydrological models. To this end, the article aims to: (1) test the skill of various hydrological models in reproducing floods; (2) evaluate and compare the changes in magnitude and frequency of future high-flow series generated by different hydrological models under climate change; and (3) quantify the uncertainty in flood quantile estimation by different hydrological models under future climate scenarios. The results are expected to improve our knowledge of the simulation and projection of hydrological extremes in response to climate change in high-elevation, cold mountainous regions, which is beneficial for policymakers and stakeholders involved in local water hazard mitigation.

2 Study area and data set

2.1 Study area

The source region of the Yellow River refers to the catchment between 96°–101°30′E and 33°45′–37°05′N, with a contributing area of 121,000 km² upstream of Tangnaihai station (Fig. 1). It belongs to the Tibetan Plateau (YRCC (Yellow River Conservancy Commission) 2002). The region is dominated by a semi-humid monsoon climate. Annual average air temperature ranges from approximately −4 to 2 °C (Xu et al. 2009), and annual average precipitation is about 450 mm. Precipitation in the flood season (July to October) exceeds 70 % of the annual total. Runoff processes are dominated by snowmelt and precipitation.

Fig. 1 Map of the headwater catchment of the Yellow River basin

The mean elevation of the basin is about 4000 m above sea level. Frozen areas at high altitude are highly sensitive to climate change: even slight climatic changes alter the permafrost and hence affect water cycling in the region. In recent years many environmental issues have emerged, including environmental degradation, a decrease in available water resources and accelerated soil erosion, owing to significant climate change and increasing pressure from economic development. Consequently, a growing number of studies on climate change and its impacts have been reported (e.g. Xu et al. 2009; Zhang et al. 2009).

2.2 Data

Observations from 11 meteorological stations (Table 1), including daily precipitation, mean temperature, sunshine duration, wind speed, relative humidity and pan evaporation (1961–2005), were collected from the National Climate Center. The vegetation data were obtained from the land cover classification map of China from the Chinese Academy of Sciences. Streamflow data for three hydrological stations (Jimai, Maqu and Tangnaihai) and a digital elevation model were obtained from the Hydrological Bureau of Yellow River.

Table 1 List of 11 meteorological gauges (1961–2005) in the headwater region of the Yellow River (Source of data: The National Center of Climate, China)

The 26 atmospheric predictor variables were derived from the following two datasets, both with a spatial resolution of 3.75° (longitude) × 2.5° (latitude): (1) the daily NCEP/NCAR reanalysis dataset for 1961–2003; (2) HadCM3 outputs under the A2 and B2 scenarios for 1961–2099. HadCM3 performs relatively well in modelling temperature and precipitation over East Asia (Xu et al. 2002); its outputs were therefore bias-corrected with a quantile–quantile mapping transformation (Boe et al. 2007; Themeßl et al. 2011) and used in this study.
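
The idea behind quantile–quantile mapping can be illustrated with a short sketch. It is only an empirical variant under simple assumptions, not necessarily the exact transformation of the cited references: the function names, the linear interpolation and the clamping outside the historical range are all hypothetical choices made for illustration.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation through (xs, ys); xs ascending, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # xs[j-1] <= x < xs[j]
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])

def quantile_map(value, model_hist, obs_hist):
    """Replace a model value by the observed value with the same empirical quantile."""
    m, o = sorted(model_hist), sorted(obs_hist)
    pm = [i / (len(m) - 1) for i in range(len(m))]  # quantile levels of the model sample
    po = [i / (len(o) - 1) for i in range(len(o))]  # quantile levels of the observed sample
    p = interp(value, m, pm)   # non-exceedance probability in the model climate
    return interp(p, po, o)    # same probability read off the observed climate

# Hypothetical toy samples: the model is biased low by a factor of two.
model_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
obs_hist = [2.0, 4.0, 6.0, 8.0, 10.0]
corrected = [quantile_map(v, model_hist, obs_hist) for v in (2.5, 3.0)]
```

In practice the mapping would be built from the overlapping historical period and then applied to the future GCM series, usually separately for each variable and season.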

3 Methodology

3.1 Downscaling method

3.1.1 The statistical downscaling method (SDSM)

The SDSM, developed by Wilby et al. (2003), is a hybrid of a stochastic weather generator and regression methods. It can apply a variety of data transformations (e.g. squares, cubes, fourth powers) to the predictor and/or predictand variables before the downscaling model is calibrated, yielding secondary series of the predictand and/or predictors that are more strongly correlated than the original series. In addition, lagged predictor variables can be generated by shifting a series forward or backward by any number of time steps. Because of its advantages over some other downscaling methods (Diaz-Nieto and Wilby 2005), SDSM has been recommended as an effective tool for climate impact studies in many regions worldwide (e.g. Kim et al. 2006; Hashmi et al. 2011; Tatsumi et al. 2014). HadCM3 and SDSM have also been widely applied together in recent years (e.g. Xu et al. 2009; Chu et al. 2010).

As a preliminary basis for this study, temperature and precipitation in the headwater catchment of the Yellow River were statistically downscaled from the HadCM3 GCM with SDSM (Wang and Yang 2012). In that work, the Nash–Sutcliffe efficiency (NSE), root-mean-square error (RMSE) and the ratio of the standard deviations of the modelled and observed indices (RS) were selected as criteria for performance assessment. The results indicated that model skill for temperature extremes is satisfactory (average NSE >0.95), while performance for precipitation is weaker but acceptable (average NSE >0.60). The details are described in Wang and Yang (2012) and are not repeated here.

3.1.2 The artificial neural network methodology (ANN)

Since the first simple neural network was proposed by McCulloch and Pitts (1943), many types of ANN have been developed. Their nonlinear nature makes ANNs efficient at identifying and representing relationships in noisy data (Hewitson and Crane 1996). The back-propagation (BP) neural network applied in this study is the most commonly used ANN model. A typical BP network is composed of an input layer, a hidden layer and an output layer. The model adopts a feed-forward configuration, and its learning process is based on the back-propagation method (Wasserman 1989). The algorithm repeatedly runs through the training data, comparing the simulated and observed values of the output variables. It has two parameters: the learning rate (η) and the momentum factor (α). In recent years, ANNs have been widely used for downscaling precipitation, where highly nonlinear processes are involved that other methods cannot capture well (e.g. Coulibaly et al. 2005; Schoof and Pryor 2001). In the present study, three indices (NSE, RMSE and RS) are selected as criteria for assessing the performance of the ANN in modelling climate variables.

3.2 Hydrological models

The river flow simulations in this study were carried out with the HBV model, the XAJ model and TOPMODEL; the XAJ model and TOPMODEL were extended with snow accumulation and melt components (see Sect. 3.2.4). At present, only about 0.1 % of the study area is covered by glaciers (e.g. YRCC (Yellow River Conservancy Commission) 2002). A number of previous studies of discharge change in the headwater catchments of major rivers on the Tibetan Plateau (e.g. Xu et al. 2009; Zhang et al. 2013b) found only minor differences in simulated monthly discharge between the VIC-glacier model and the VIC model in the headwater catchment of the Yellow River. Therefore, glacier-melt-induced streamflow is not taken into account in modelling the runoff processes in this study. The three models are calibrated with the Monte Carlo method. For a given hydrological model, the most skilful parameter set is obtained in three steps: (1) The range of each parameter included in the Monte Carlo simulation is specified (as described in our previous work, Chen et al. 2013), and each parameter value is then drawn uniformly and independently from its range (runs = 100,000). (2) The resulting 100,000 parameter sets, together with the other model input data, are used to drive the hydrological model, generating 100,000 flow sequences for the reference period (the calibration or validation period). (3) The most appropriate parameter set is selected: the ith parameter set is regarded as the most skilful if the goodness of fit between the ith simulated daily streamflow series and the observations in the reference period is the highest. In this study, 1961–1990 is used for calibration and 1991–2005 for validation.
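
The three calibration steps can be sketched as follows. This is a minimal illustration rather than the code used in the study: `toy_model` is a hypothetical single linear reservoir standing in for HBV, XAJ or TOPMODEL, the parameter ranges and the synthetic forcing are invented for the demonstration, NSE is assumed as the goodness-of-fit measure, and the number of runs is reduced from 100,000 to keep the example fast.

```python
import random

def nse(obs, sim):
    """Nash-Sutcliffe efficiency (standard definition; 1 is a perfect fit)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def toy_model(precip, k, c):
    """Hypothetical stand-in for a hydrological model: one linear reservoir."""
    storage, flow = 0.0, []
    for p in precip:
        storage += c * p      # effective rainfall enters storage
        q = k * storage       # outflow proportional to storage
        storage -= q
        flow.append(q)
    return flow

def monte_carlo_calibrate(precip, obs, ranges, runs, seed=0):
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(runs):
        # Step 1: draw each parameter uniformly and independently from its range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        # Step 2: drive the model with this parameter set.
        sim = toy_model(precip, **params)
        # Step 3: keep the set with the highest goodness of fit.
        score = nse(obs, sim)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Synthetic demonstration: "observations" are generated with known parameters.
demo_rng = random.Random(42)
precip = [demo_rng.expovariate(0.5) if demo_rng.random() < 0.3 else 0.0 for _ in range(365)]
obs = toy_model(precip, k=0.2, c=0.6)
ranges = {"k": (0.01, 0.9), "c": (0.1, 1.0)}   # hypothetical ranges
best_params, best_nse = monte_carlo_calibrate(precip, obs, ranges, runs=2000)
```

Uniform sampling is simple and embarrassingly parallel, which is presumably why a large number of runs is affordable; the selected set is only as good as the sampled ranges allow.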

3.2.1 HBV model

The HBV model (Bergström 1995) is a precipitation-runoff model developed at the Swedish Meteorological and Hydrological Institute. It has been successfully and widely applied in many countries (Bergström 1995; Seibert 2003; Krysanova et al. 1999). The HBV light version is used in this study. It is normally run with daily precipitation and air temperature and monthly potential evaporation estimated by the Penman formula (Penman 1948). The catchment was discretized into 10 elevation zones for the calculations. The model comprises three main modules: snow accumulation and melt, soil moisture routing, and river routing and response. Snowmelt is simulated using a degree-day approach. More details about the model are available in the above-mentioned references.

3.2.2 TOPMODEL

TOPMODEL (Beven and Kirkby 1979) is a variable-contributing-area conceptual model in which the topography of the basin is one of the predominant factors determining streamflow generation. The model has been widely applied across the world (e.g. Beven 1997; Huang and Jiang 2002). It was originally developed to simulate hydrological processes in humid catchments (Quinn and Beven 1993; Robson et al. 1993) and has since been extended to alpine regions (e.g. Schild et al. 1998; Volk 2000). A detailed description of the model can be found in the above-mentioned references and is therefore not presented here.

3.2.3 XAJ model

The XAJ model, developed by Zhao et al. (1980), is a well-known lumped watershed model. Its structure is described in many references (Li et al. 2009; Zhao 1992). In the evapotranspiration module the soil is treated as three vertical layers: an upper layer, a lower layer and a deep layer (Jayawardena and Zhou 2000). To account for partial-area runoff generation, a parabolic curve is used to express the spatially heterogeneous distribution of tension water storage capacity. Channel routing is estimated with the Muskingum scheme. Recently, this model has been used to project future runoff under climate change in various regions worldwide (e.g. Jiang et al. 2007; Ju et al. 2009).

3.2.4 Snow accumulation/melt models

In their standard form, TOPMODEL and the XAJ model have no snow accumulation and melt components. For this study the two models are extended as follows. Snow accumulation is simulated by using air temperature records to separate precipitation into snow and rainfall (Davies 1997). The well-known degree-day approach is chosen for estimating snowmelt; it has been successfully verified worldwide over a range of catchments (e.g. Davies 1997). The basic equation of the degree-day method is:

$$ M = C_{m} \left( {T - T_{melt} } \right) $$
(1)

where \( M \) is the daily snowmelt (mm/day), \( C_{m} \) is the melt rate factor (mm/°C per day), \( T \) is the mean daily temperature (°C), and \( T_{melt} \) is the critical temperature for melt to occur (°C). The degree-day factor is an empirical constant that accounts for all the physical factors not included in the model and varies with land cover.
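
A minimal sketch of such a snow routine, built around Eq. 1, is given below. Several details are assumptions not stated in the text: a single threshold `t_snow` separates rain from snow, melt is set to zero when T is below T_melt, melt cannot exceed the snow stored in the pack, and the default parameter values are purely illustrative.

```python
def partition_precip(precip, temp, t_snow=0.0):
    """Return (rain, snow): all precipitation is snow when temp <= t_snow (assumed rule)."""
    return (0.0, precip) if temp <= t_snow else (precip, 0.0)

def degree_day_melt(temp, c_m, t_melt):
    """Potential melt M = C_m (T - T_melt) in mm/day, truncated at zero (assumption)."""
    return max(0.0, c_m * (temp - t_melt))

def snow_routine(precip_series, temp_series, c_m=3.0, t_melt=0.0, t_snow=0.0):
    """Daily liquid water input (rain + melt) and the final snowpack, both in mm."""
    pack, liquid = 0.0, []
    for p, t in zip(precip_series, temp_series):
        rain, snow = partition_precip(p, t, t_snow)
        pack += snow
        melt = min(pack, degree_day_melt(t, c_m, t_melt))  # cannot melt more than is stored
        pack -= melt
        liquid.append(rain + melt)
    return liquid, pack

# Four hypothetical days: one snowfall, two melt days, one rain day.
liquid, pack = snow_routine([10.0, 0.0, 0.0, 5.0], [-2.0, 1.0, 3.0, 4.0])
```

The resulting liquid input series would then replace raw precipitation as the forcing of the runoff-generation modules of TOPMODEL or the XAJ model.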

3.2.5 Measures of model skills

For statistical measurements of hydrological model performances, Nash–Sutcliffe efficiency (NSE, Nash and Sutcliffe 1970), root-mean-square error (RMSE) and percent bias (PBIAS, Gupta and Sorooshian 1999) were selected to compare observed streamflow to the model simulations.

The differences (residuals) between simulated values by a model and the observed values can be measured by RMSE, defined as:

$$ RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {(O_{i} - S_{i} )^{2} } }}{n}} $$
(2)

where \( n \) is the number of time steps, \( O_{i} \) is the observation at time step i, and \( S_{i} \) is the simulation at time step i. A smaller RMSE value indicates better model performance.

The average tendency of the simulated data to be larger or smaller than their observed counterparts is measured by PBIAS, defined as:

$$ PBIAS = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {(O_{i} - S_{i} ) \times 100} }}{{\sum\nolimits_{i = 1}^{n} {O_{i} } }}} \right] $$
(3)

The optimal PBIAS value is 0. With the sign convention of Eq. 3 (observed minus simulated), a PBIAS value greater than 0 indicates a model bias toward underestimation, whereas a PBIAS value less than 0 indicates a bias toward overestimation.

The dispersion degree of variables (RS, Hundecha and Bardossy 2008) is applied to estimate downscaling model efficiency:

$$ RS = \frac{{S_{sim} }}{{S_{obs} }} $$
(4)

in which \( S_{sim} \) and \( S_{obs} \) are the standard deviations of the modelled and observed indices, respectively.
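
The skill measures can be computed as in the following sketch. RMSE, PBIAS and RS follow Eqs. 2–4; the NSE formula is not written out in the text, so its standard definition is assumed here, and the population standard deviation is an arbitrary choice for RS (the ratio is the same with the sample form when both series have equal length). The short series are invented test values.

```python
import math
import statistics

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((O-S)^2) / sum((O-mean(O))^2)."""
    mean_obs = sum(obs) / len(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mean_obs) ** 2 for o in obs)

def rmse(obs, sim):
    """Root-mean-square error, Eq. 2."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def pbias(obs, sim):
    """Percent bias, Eq. 3; positive when the simulation is too low on average."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def rs(obs, sim):
    """Ratio of simulated to observed standard deviation, Eq. 4."""
    return statistics.pstdev(sim) / statistics.pstdev(obs)

obs = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
sim = [1.0, 2.0, 3.0, 2.0]   # hypothetical simulation that misses the peak
scores = {"NSE": nse(obs, sim), "RMSE": rmse(obs, sim), "PBIAS": pbias(obs, sim), "RS": rs(obs, sim)}
```

In this toy case the simulation underestimates the peak, so PBIAS is positive and RS is below 1, which shows how the measures complement each other.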

3.3 Flood estimation

In this section, we examine two features of floods: frequency and magnitude. Following the approach of Gellens and Roulin (1998), flood frequency is defined as the number of days on which high flow exceeds the 5th percentile of the 1961–1990 baseline period. The flood magnitudes corresponding to the T-year return level are obtained by fitting an appropriate probability distribution to the annual maximum 7-day streamflow. Six probability distributions, with parameters estimated by the L-moments method, are used to estimate T-year return level floods and are screened for goodness of fit. The recommended goodness-of-fit criterion is the probability plot correlation coefficient (PPCC, Looney and Gulledge 1985; Vogel 1986, Eq. 5):

$$ PPCC = \frac{{\sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{O} )(S_{i} - \overline{S} )} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{O} )^{2} \sum\nolimits_{i = 1}^{N} {(S_{i} - \overline{S} )^{2} } } } }} $$
(5)

where \( N \) is the number of time steps, \( O_{i} \) is the observed discharge at time step i, \( S_{i} \) is the simulated discharge at time step i, and \( \overline{O} \) and \( \overline{S} \) are the means of \( O_{i} \) and \( S_{i} \), respectively. RMSE (Eq. 2) is used as an additional measure.
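
To make the procedure concrete, the sketch below fits a GEV distribution by L-moments, derives T-year return levels and evaluates the PPCC. It rests on several assumptions: the parameter estimators are Hosking's published approximation for the GEV (shape k in Hosking's sign convention), the PPCC is computed between the ordered sample and the fitted quantiles at Gringorten plotting positions (the plotting position is not specified in the text), and the 30 "annual maxima" are synthetic values generated only for the demonstration.

```python
import math

def fit_gev_lmom(data):
    """GEV parameters (xi, alpha, k) from sample L-moments (Hosking's approximation)."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        k = 1e-6  # nudge away from the Gumbel limit to avoid division by zero
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k

def gev_quantile(p, xi, alpha, k):
    """Value with non-exceedance probability p."""
    return xi + alpha * (1.0 - (-math.log(p)) ** k) / k

def return_level(T, params):
    """T-year return level, i.e. the quantile at p = 1 - 1/T."""
    return gev_quantile(1.0 - 1.0 / T, *params)

def ppcc(data, params):
    """Correlation between ordered data and fitted quantiles (Gringorten positions assumed)."""
    x = sorted(data)
    n = len(x)
    q = [gev_quantile((i - 0.44) / (n + 0.12), *params) for i in range(1, n + 1)]
    mx, mq = sum(x) / n, sum(q) / n
    num = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - mq) ** 2 for b in q))
    return num / den

# Synthetic 30-value series (m3/s) shaped like a Gumbel sample; not real data.
annual_max = [1000.0 - 300.0 * math.log(-math.log((i - 0.44) / 30.12)) for i in range(1, 31)]
params = fit_gev_lmom(annual_max)
levels = [return_level(T, params) for T in (2, 5, 10, 50)]
fit_quality = ppcc(annual_max, params)
```

Repeating the fit for each candidate distribution and comparing PPCC and RMSE would reproduce the screening step described above; only the GEV is shown here to keep the example short.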

3.4 Measure of uncertainty in flood quantile estimation

Uncertainty in hydrological modelling for quantile estimates is generally expressed by means of confidence intervals (Burn 2003). In this section, the method is used to estimate confidence intervals for the annual maximum 7-day streamflow. First, the best-fitting distribution is selected using the method in Sect. 3.3. Flood quantiles are then estimated with the L-moments method, and Monte Carlo simulations (runs = 2000 in this study) are performed to construct confidence intervals for the flood quantiles. Details of the procedure can be found in Hosking and Wallis (1997) and Yang et al. (2010).

The relative length of confidence interval (RL) is used to measure the uncertainty in flood quantile estimation:

$$ RL = \frac{{Limit_{Upper} - Limit_{Lower} }}{S} $$
(6)

where \( Limit_{Lower} \) and \( Limit_{Upper} \) are the lower and upper bounds of the 90 % confidence interval of the flood quantile, and \( S \) is the estimated flood quantile. High values of RL indicate large uncertainty in flood quantile estimation.
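
One way to obtain RL is a parametric bootstrap, sketched below. This is an assumed reading of the Monte Carlo procedure, whose exact form is given only in the cited references: samples of the original length are drawn from the fitted GEV, each is refitted by L-moments, and the 5 % and 95 % points of the resulting quantiles are taken as the limits of the 90 % interval. The input series is synthetic and all function names are hypothetical.

```python
import math
import random

def fit_gev_lmom(data):
    """GEV parameters (xi, alpha, k) from sample L-moments (Hosking's approximation)."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        k = 1e-6  # avoid division by zero in the Gumbel limit
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    return l1 - alpha * (1.0 - g) / k, alpha, k

def gev_quantile(p, xi, alpha, k):
    return xi + alpha * (1.0 - (-math.log(p)) ** k) / k

def relative_ci_length(data, T, runs=2000, level=0.90, seed=0):
    """RL = (upper - lower) / S for the T-year quantile, Eq. 6, via parametric bootstrap."""
    rng = random.Random(seed)
    p = 1.0 - 1.0 / T
    params = fit_gev_lmom(data)
    s = gev_quantile(p, *params)                      # estimated flood quantile S
    boot = []
    for _ in range(runs):
        # Draw a synthetic sample from the fitted GEV by inverse-transform sampling.
        sample = [gev_quantile(max(rng.random(), 1e-12), *params) for _ in data]
        boot.append(gev_quantile(p, *fit_gev_lmom(sample)))
    boot.sort()
    lower = boot[int((1.0 - level) / 2.0 * (runs - 1))]
    upper = boot[int((1.0 + level) / 2.0 * (runs - 1))]
    return (upper - lower) / s

# Synthetic 30-value series (m3/s); not real data.
annual_max = [1000.0 - 300.0 * math.log(-math.log((i - 0.44) / 30.12)) for i in range(1, 31)]
rl_2 = relative_ci_length(annual_max, 2)
rl_50 = relative_ci_length(annual_max, 50)
```

Because RL is normalised by the quantile itself, it allows the width of the confidence interval to be compared across return periods and models whose flood magnitudes differ.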

3.5 Construction of future flood scenarios

Future scenarios of rainfall, temperature and evaporation, downscaled by SDSM and ANN from HadCM3 outputs for three time periods (2020s, 2050s and 2080s), are used to force the three hydrological models (HBV, TOPMODEL and XAJ) and generate future flood scenarios. The uncertainties in flood magnitude and frequency estimated by the different hydrological models under current and future climate conditions are then compared.

4 Results and discussion

4.1 Calibration and validation

4.1.1 Validation of the artificial neural network downscaling model (ANN)

The predictors for the ANN model were presented in our previous work (Wang and Yang 2012). The indices for mean climate and extremes are listed in Table 2, comprising three temperature indices, two evaporation indices and two precipitation indices. Table 3 shows the performance of the ANN downscaling method for the indices of daily mean temperature, evaporation and precipitation in the calibration (1961–1990) and validation (1991–2000) periods. The method reproduces the temperature-related indices very well (NSE >0.95, 1.01 °C ≤ RMSE ≤ 2.02 °C). Its performance for the precipitation- and evaporation-related indices (NSE ≥0.65, 0.97 mm ≤ RMSE ≤ 6.03 mm) is acceptable. Table 3 also shows that the ANN model reproduces the temporal variability of the temperature indices well (RS ≥0.95), whereas the variability of the evaporation and precipitation indices is reproduced less well (0.75 ≤ RS ≤ 0.92).

Table 2 Statistical indices for mean temperature, evaporation and precipitation
Table 3 Performance assessment for the predictands in calibration (during 1961–1990) and validation (during 1991–2000) using ANN downscaling model

4.1.2 Inter-comparison of various hydrological models in reproducing historical streamflow

(1) Hydrological models

This section presents the performance of the three hydrological models in simulating observed daily streamflow and extremes during 1961–2005. Table 4 summarizes their performance in simulating daily streamflow in the headwater catchment of the Yellow River. The models simulate daily streamflow reasonably well at most stations during both the calibration and validation periods. In particular, at Tangnaihai station the NSE is higher than 0.70 and PBIAS is less than 4 %. Model performance is lower at the other stations, but the results are still acceptable.

Table 4 Skill scores for three hydrological models in calibration (1961–1990) and validation (1991–2005)

Additionally, the performance of the three hydrological models in simulating floods at Tangnaihai station is slightly lower than for daily streamflow: NSE ranges from 0.62 to 0.82. However, the trends in floods are modelled quite well. Figure 2 shows the trends of observed and simulated annual Q5 during 1961–2005. The downward trend (significant at the 0.1 level) in the observed Q5 series is matched well by the simulations of the three hydrological models. In general, the models reproduce floods with satisfactory statistical results and can therefore be used in the climate impact study.

Fig. 2 Observed and simulated high-flow (Q5) by (a) XAJ, (b) HBV and (c) TOPMODEL at Tangnaihai station. The linear fitted lines indicate the trend and the p values indicate the significance of the trend

(2) Flood return level estimation with six different probability distributions

Floods (maximum 7-day streamflow) for different return periods are assessed for the period 1961–1990. Six well-known probability distributions (the Weibull, Gumbel, Generalized Extreme Value, Log-Pearson Type III, Log-Normal and Pearson Type III distributions) are fitted to the floods. The results are listed in Table 5. The similarly high PPCC values for these distributions suggest that they are consistently skilful in the probability analysis of floods. However, the low RMSE of the Generalized Extreme Value distribution (RMSE = 721.74 m³/s) indicates that it is the appropriate choice. Simulations by the three hydrological models are compared with observations in Fig. 3, which shows the probability plots for floods corresponding to different return periods (2–50 years). Figure 3 suggests that the return levels of observed high flow generally agree with the simulations: the observations and the simulations by XAJ, TOPMODEL and HBV match well, with overlapping confidence intervals (Fig. 3a–c).

Table 5 Skill scores for frequency analysis of high-flows with six different probability distribution models in baseline period (1961–1990)
Fig. 3 Extreme high-flow from observations and simulations based on three hydrological models, (a) XAJ, (b) TOPMODEL and (c) HBV, during 1961–1990

4.2 Projected changes for future climate scenarios

4.2.1 Future scenarios of mean climate

Figure 4 shows the changes in seasonal precipitation, evaporation and mean temperature between the control period (1961–1990) and three future periods (2020s, 2050s, 2080s) for the SDSM and ANN downscaling methods. The projected changes in seasonal precipitation are not uniform. Precipitation increases markedly in winter and spring; the maximum increases, occurring in the 2080s, are up to 41.8 % in spring and 90.4 % in winter. In summer, there is a consistent decrease in the 2020s and 2050s. In the 2080s, summer precipitation projected by ANN increases by 39.1 % under the A2 scenario and 8.2 % under the B2 scenario, while the SDSM output shows a decrease. Autumn precipitation decreases in the 2020s and increases in the 2080s (Fig. 4a, d).

Fig. 4 Projected changes in seasonal (a) precipitation, (b) evaporation and (c) temperature based on SDSM downscaling outputs, and in seasonal (d) precipitation, (e) evaporation and (f) temperature based on ANN downscaling outputs, under A2 and B2 scenarios in three future periods

The pattern of change in mean temperature (Tmean) is less heterogeneous than that in seasonal precipitation. Tmean increases in all seasons (Fig. 4c, f), with more pronounced changes in summer and autumn. The projections also differ between scenarios. For instance, the increase in Tmean from the SDSM output in the 2080s (Fig. 4c) ranges from 4.6 °C (winter) to 7.2 °C (autumn) under the A2 scenario, and from 3.5 °C (winter) to 5.1 °C (summer) under the B2 scenario. Evaporation also generally increases, with the exception of the SDSM output in the 2020s (Fig. 4b, e). In the 2080s, the increase is sensitive to the choice of emissions scenario, especially for the ANN outputs in autumn.

4.2.2 Future scenarios of extreme climate

The changes in monthly maximum consecutive 5-day total precipitation (Px5d) over 2010–2099 from the two downscaling models are presented in Fig. 5. The pattern of Px5d is quite similar throughout the wet season (i.e. June–October, Fig. 5a–c): Px5d in the three future periods under the different scenarios is smaller than in the control period, with the exception of the ANN outputs for July and August in the 2080s (Fig. 5c). Moreover, the decreases projected by SDSM are more pronounced than those projected by ANN. In the dry period (i.e. December–February), Px5d shows a complex pattern of both decreases and increases, so it is uncertain how Px5d in the dry season will change in the future. Similar results are obtained for monthly maximum evaporation (Emax). The changes in Emax under the different scenarios in the flood season (i.e. June–August) show a common increasing trend in the 2050s and 2080s (Fig. 5e, f), and the pattern is the same for the two downscaling models. The increase ranges from 0.7 to 42.0 % in the 2050s and from 1.1 to 60.6 % in the 2080s. However, no consistent changes in Emax in the dry period (i.e. December–February) are found across the three future periods.

Fig. 5 Projected changes of monthly Px5d in the (a) 2020s, (b) 2050s and (c) 2080s, and changes of monthly Emax in the (d) 2020s, (e) 2050s and (f) 2080s, under A2 and B2 scenarios using SDSM and ANN downscaling methods

4.3 Scenarios of flood under climate change

4.3.1 Changes in flood frequency

Figure 6 shows the projected changes in flood frequency (i.e. the change in the number of days on which flow exceeds the 5th percentile) under the SRES scenarios (A2 and B2), expressed as the three future periods (2020s, 2050s, 2080s) minus the control simulation (1961–1990). The results from the three hydrological models (Fig. 6a–d) collectively suggest that the number of high-flow days will decrease in the future under most climate scenarios. In detail, the projected decrease in seasonal average high-flow days across the hydrological models ranges from 7.2 to 10.0 in summer and from 6.2 to 10.7 in autumn in the 2020s, and from 6.2 to 9.2 in summer and from 5.8 to 10.9 in autumn in the 2050s. In the 2080s, the HBV and XAJ projections driven by ANN outputs show increases in seasonal average flood frequency (Fig. 6c, d), while a significant decrease is found under the other climate scenarios. These results can be explained by the climate changes over the headwater catchment of the Yellow River. As Figs. 4 and 5 show, precipitation decreases in summer under most future climate scenarios, while temperature and evaporation increase markedly throughout the year. Together, these factors could account for the marked decrease in flood frequency in summer.
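
The frequency change discussed here can be computed along the following lines. The sketch assumes that the threshold is the flow exceeded on 5 % of the pooled baseline days (the usual reading of Q5) and that the change is the difference in mean annual exceedance days; both are interpretations of the text, and the input series are invented numbers, not catchment data.

```python
def exceedance_threshold(baseline_flows, exceed_fraction=0.05):
    """Flow exceeded on `exceed_fraction` of the baseline days (assumed meaning of Q5)."""
    ranked = sorted(baseline_flows, reverse=True)
    return ranked[int(exceed_fraction * len(ranked))]

def mean_annual_exceedance_days(flows_by_year, threshold):
    """Average number of days per year with flow strictly above the threshold."""
    counts = [sum(1 for q in year if q > threshold) for year in flows_by_year]
    return sum(counts) / len(counts)

def change_in_flood_days(baseline_by_year, future_by_year):
    """Future minus baseline exceedance days, using the baseline threshold for both."""
    pooled = [q for year in baseline_by_year for q in year]
    thr = exceedance_threshold(pooled)
    return (mean_annual_exceedance_days(future_by_year, thr)
            - mean_annual_exceedance_days(baseline_by_year, thr))

# Hypothetical series: two 20-day "years" per period; the future flows are halved.
baseline = [[float(q) for q in range(1, 21)], [float(q) for q in range(21, 41)]]
future = [[q / 2.0 for q in year] for year in baseline]
delta_days = change_in_flood_days(baseline, future)
```

Keeping the baseline threshold fixed for the future period is what makes the result a change in frequency rather than a change in the threshold itself; a seasonal version would simply restrict each year's series to the summer or autumn days.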

Fig. 6 Projected changes in high-flow frequency (change in the number of days on which high flow exceeds the 5th percentile) under SRES scenarios (A2 and B2) for three future periods (2020s, 2050s, 2080s) minus the control simulation (1961–1990) in (a) summer and (b) autumn, based on SDSM downscaling outputs, and in (c) summer and (d) autumn, based on ANN downscaling outputs

4.3.2 Changes in flood magnitude with different return periods

We now proceed to a flood analysis using extreme value distributions for the various hydrological models, downscaling methods and SRES scenarios in the headwater catchment of the Yellow River. In this section, we examine floods (maximum 7-day high flow) in the control and future periods corresponding to the 2-, 5-, 10- and 50-year return periods. Changes in floods at these return periods and the 90 % confidence bands are shown.

The three-parameter generalized extreme value (GEV) distribution is used to fit the annual maximum 7-day high flow (Table 5). The annual extremes are taken from 30 years of daily streamflow in the control and future scenarios; to keep the estimates reliable, we therefore do not estimate floods for return periods longer than 50 years. Confidence intervals are estimated by Monte Carlo simulation. The T-year return level extremes from the control and future scenario simulations are considered distinct if their 90 % confidence intervals do not overlap.

The percentage changes in the maximum 7-day high flow corresponding to the T-year return period are shown in Tables 6 and 7. The different models produce quite similar results for the future maximum 7-day high flow. Significant negative changes in floods are found for all return periods in the 21st century, with the exception of the XAJ and HBV simulations in the 2080s under the A2 scenario. The projections show decreases of 29.45–57.21 % in the 2020s and 24.62–56.28 % in the 2050s at the 5-year return period, and decreases of 30.15–58.02 % in the 2020s and 23.56–53.80 % in the 2050s at the 10-year return period. In the 2080s, although no significant changes are found in the HBV and XAJ projections under the A2 scenario, a significant decrease is found in the other model runs. Moreover, under most climate scenarios the decrease in the 2080s is smaller than in the 2020s and 2050s. For example, the reduction in maximum 7-day high flow corresponding to the 50-year return period is around 25.63–56.88 % in the 2020s, 16.27–51.83 % in the 2050s and 14.62–41.00 % in the 2080s.

Table 6 Changes (%) in projected maximum 7-day high-flow based on three hydrological models and SDSM downscaling outputs at different return periods in three future periods (2020s, 2050s, 2080s) compared with the baseline period (1961–1990)
Table 7 Changes (%) in projected maximum 7-day high-flow based on three hydrological models and ANN downscaling outputs at different return periods in three future periods (2020s, 2050s, 2080s) compared with the baseline period (1961–1990)

4.4 Uncertainty in flood quantile estimation

The relative length of the confidence interval (RL) for the T-year return level extreme high-flow (T = 2, 5, 10, 50) is calculated from the flood series simulated by the various hydrological models under current and future climate scenarios. To investigate the effect of the selected hydrological model on RL, the RLs are grouped by hydrological model. Each group includes the RLs for the T-year return level flood simulated by a specific hydrological model, driven by the outputs of two downscaling methods in three future time slices (2020s, 2050s and 2080s) under the A2 and B2 emission scenarios; hence the sample size is twelve in each group. Figure 7a–c shows box plots of RL for the projected flood quantile at different return periods for each hydrological model. RL increases with recurrence interval, which shows that confidence in the projected flood quantile decreases as the return period increases. In addition, there is a significant difference between the RL medians for the 50-year return period and those for shorter return periods (<30 years), because 50 years exceeds the length of the window (30 years) used for fitting the extreme value distributions.
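
The grouping described above can be illustrated with a small sketch. It assumes that RL is the width of the confidence interval divided by the return level estimate; the record layout and function names are hypothetical.

```python
from statistics import median


def relative_length(lower, upper, estimate):
    """Assumed definition: confidence interval width over the estimate."""
    return (upper - lower) / estimate


def median_rl_by_group(records):
    """Group RLs by (model, return period) and return the group medians.

    records: iterable of (model, T, lower, upper, estimate), one record
    per combination of downscaling method, time slice and scenario.
    """
    groups = {}
    for model, T, lower, upper, estimate in records:
        rl = relative_length(lower, upper, estimate)
        groups.setdefault((model, T), []).append(rl)
    return {key: median(values) for key, values in groups.items()}
```

With two downscaling methods, three time slices and two emission scenarios, each (model, T) group would hold twelve RL values, matching the sample size behind each box in Fig. 7a–c.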

Fig. 7 a–c Box plots of RL for projected extreme high-flow under selected combinations of hydrological model and return period. d–f Box plots of change in RL for T-year return level extreme high-flow between future time slices and the control period. The horizontal lines represent the median values; the boxes represent the interquartile range (25th–75th quantiles); the whiskers indicate the 10 % and 90 % quantiles

Similarly, changes in RL for extreme high-flows between the future time slices and the control period are computed and grouped by hydrological model. Note that the sample size for each group is again twelve. Figure 7d–f shows percentage changes in RL for extreme high-flows at different return levels. Results from the three hydrological models indicate that changes in RL also increase with recurrence interval. In addition, the medians of the changes in RL for the 50-year return level extreme high-flow are above zero (Fig. 7d–f). Positive changes in RL for the 10-year return level extreme high-flow are also obtained from XAJ and HBV under most future climate scenarios (Fig. 7d, f). This suggests that the uncertainty in flood quantile estimation is likely to be higher under future climate scenarios, even when the return period is shorter than the window (30 years) used for fitting the extreme value distributions.

5 Conclusions

In this work, flood scenarios for the headwater catchment of the Yellow River basin during the 21st century are constructed by means of several hydrological models and statistical downscaling outputs (SDSM and ANN) from the HadCM3 GCM, under a range of emission scenarios. The uncertainty in flood quantile estimation is also analyzed. The major points are summarized as follows:

  1. The three hydrological models produce satisfactory daily streamflow simulations at most stations. In particular, at Tangnaihai station the Nash–Sutcliffe efficiency of daily streamflow exceeds 0.7 and PBIAS is less than 4 %. In addition, the generalized extreme value distribution is selected as the most appropriate of six well-known probability distributions for flood return level estimation. Flood magnitude, trend and frequency at Tangnaihai station are also well reproduced by the three hydrological models.

  2. Flood frequency is found to undergo a significant reduction under most scenarios in the 21st century. A possible explanation may be the remarkable increases in temperature and evaporation throughout the year, together with decreases in summer precipitation. The different models produce quite similar results for the future maximum 7-day high-flow, and significant negative changes in floods are found for all return periods under most scenarios in the 21st century. This analysis could be extended by using more RCMs in parallel with the downscaling methods applied to a single GCM, and by comparing the results from various hydrological models, in order to check the uncertainty related to hydrological models more rigorously.

  3. RL of the projected flood quantile increases with recurrence interval as a consequence of the diminishing number of events in the sample. The results are similar for changes in RL for floods between the future time slices and the control period. Larger RLs for extreme flood quantiles are likely to occur under future climate scenarios even if the return period is shorter than the window used for fitting the extreme value distributions. These results highlight the need for appropriate treatment of uncertainties in the data sources in extreme flow quantile estimation, for the sake of improving the reliability of extreme high-flow projections.