1 Introduction

The latest report of the Intergovernmental Panel on Climate Change [IPCC 2013] states that changes observed in the climate system since the 1950s, such as heavy precipitation events, are unprecedented over decades to millennia. It is very likely that further changes will take place by the late twenty-first century. For Europe, on average, flood peaks with return periods above 100 years might double in frequency within three decades [Alfieri et al. 2015b]. By the end of this century, the socio-economic impact of river floods could increase on average by 220% due to climate change only [Alfieri et al. 2015a]. More specifically, in Belgium, mean precipitation might increase by up to 40%; extreme precipitation, however, might increase by up to 50% in winter and 100% in summer [Tabari et al. 2015].

To quantify climate change impacts on water resources, a vertical top-down methodology is typically followed (see e.g. Xu 1999): global climate models (GCMs) with driving greenhouse gas scenarios (GHGs) or representative concentration pathways (RCPs), in combination with downscaling methodologies, dynamically through regional climate models (RCMs) or statistically with statistical downscaling methods (SDMs), first describe the local climate change signal for a given future period. This is further used as input for a (calibrated) hydrological model (HM) in order to quantify changes in hydrological variables, which are often related to water availability and/or flooding conditions. Each consecutive step of the above modelling chain adds uncertainty to the results. Knowledge and understanding of these uncertainties is crucial for water managers to guide them in the decision-making process.

Recent research acknowledges that not adequately addressing all sources of uncertainty may result in biased climate change impact studies and resulting decisions [Chen et al. 2011; Najafi and Hessami Kermani 2017]. Most studies, however, only consider a limited ensemble describing the combined climate and hydrological uncertainty.

Table 1 shows some of the recent literature focusing on climate change impact on water resources and its related uncertainty. Note that this list is a limited sample taken from all available literature. The table shows that most research focuses on only one specific source of uncertainty and, hence, that these studies possibly underestimate the total uncertainty involved. This has recently been confirmed by Seiller et al. [2017], who concluded that “[…] the diagnosis of the impacts of climate change on water resources are quite affected by the hydrologic models selection and calibration metrics”. Moreover, Sunyer et al. [2015], among others, pointed out that different downscaling methods result in significant differences in the magnitude of impact results.

Table 1 Selected recent climate change impact studies on water resources, sorted by decreasing total ensemble size

Here, we apply an ensemble of 93 GCM runs (a combination of 24 climate models with 4 RCPs, see Table 2), 9 SDMs, 5 HMs and 10 parameter sets per HM, for two catchments in Belgium. The resulting balanced ensemble of 41,850 members per catchment (93 × 9 × 5 × 10) is, to the authors’ knowledge, unprecedented in the literature. It allows us to better investigate the relative importance of each contributing source of uncertainty. This study focuses on the impacts of climate change on low and peak flows, i.e. the types of impact most relevant for water management: hydrological extremes, floods and water availability.
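The ensemble size follows directly from the number of levels of each factor. As a small check, the product can be computed as follows (the dictionary keys are illustrative names, not identifiers used by the authors):

```python
import math

# Number of levels per factor, as stated in the text.
# The key names are hypothetical labels chosen for this sketch.
levels = {
    "gcm_runs": 93,
    "sdms": 9,
    "hydrological_models": 5,
    "parameter_sets_per_model": 10,
}

# In a balanced (fully crossed) design, every level of each factor is
# combined with every level of all other factors.
members_per_catchment = math.prod(levels.values())
print(members_per_catchment)  # 41850
```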

Table 2 Selected global climate models

2 Study Area

Two study areas in Belgium have been selected: the Grote Nete catchment and the Dijle catchment (Fig. 1). The Grote Nete catchment, in the north-eastern part of Belgium, has an area of 402 km² and predominantly sandy soils. The Dijle catchment, in central Belgium, has an area of 636 km²; its main land uses are urban (about 20% of the study area), pasture and forest (16% and 18%, respectively) and arable land (silt and silty loam soils). Both catchments have high baseflow during winter and very low discharges during summer. Belgium has a maritime, temperate climate with annual precipitation ranging between 700 and 1000 mm/year. The long-term average evapotranspiration is about 670 mm/year. These catchments have previously been subject to several climate change impact analyses [Van Steenbergen and Willems 2012; Vansteenkiste et al. 2012, 2014a, 2014b; Willems 2014].

Fig. 1 Selected case studies in Belgium

3 Materials and Methods

3.1 Meteorological Data

3.1.1 Global Climate Models

The ensemble includes 93 global, CMIP5 based climate model runs, hereafter denoted as GCM runs, and comprises 24 different models and 4 RCPs (RCP 2.6, 4.5, 6.0 and 8.5, Table 2). The periods 1961–1990 and 2071–2100 were chosen as reference and future period, respectively.

For all selected climate model runs, data is available for precipitation and other variables required for the various downscaling methods.

Evapotranspiration (ETo) is, here, calculated based on the Penman-Bultot evapotranspiration calculation method [Baguis et al. 2010; Bultot et al. 1983] and requires various meteorological variables such as radiation, heat fluxes, relative humidity and wind speed as input. This puts a limit on the availability of GCM runs: ETo can only be calculated for those GCM runs indicated with an asterisk (*) in Table 2. More information on the ETo calculation method and how it has been calibrated for Belgium can be found in Bultot et al. [1983] and Baguis et al. [2010].

3.1.2 Downscaling Methods

The spatial and temporal resolutions of the climate models mismatch the resolutions required for hydrological impact modelling [Xu 1999]. Also, the output of climate models is potentially biased. This impedes the direct application of climate model outputs and requires downscaling. In general, two downscaling approaches exist: dynamical and statistical downscaling. High resolution regional and limited area climate models (RCMs and LAMs) form the basis for dynamical downscaling. However, recent research has shown that high resolution climate models do not appear to give added value to simulation of precipitation extremes and remain biased [Tabari et al. 2016]. Moreover, compared to the GCMs, the availability of RCMs and LAMs is limited and this might create biased uncertainty estimates [Van Uytven and Willems 2018]. Given the research objective to focus on all uncertainty sources and, thus, to design a large multi-ensemble, we have chosen to implement statistical downscaling methods (SDMs) and apply these on an ensemble of global climate models.

The ensemble includes 9 SDMs. Different categorizations exist for these SDMs; here, we consider the categorization of Maraun et al. [2010] which divides the methods into three separate categories:

Model Output Statistics (MOS 1–5): 1 delta change method (method CFM in Sunyer et al. 2015); 3 different types of quantile perturbation methods (the quantile perturbation method accounting for changes in the number of wet days as described in Ntegeka et al. 2014, and methods SD-A-4 and SD-A-5 in Willems and Vrac 2011) and 1 event based perturbation method (method SB in Sørup et al. 2017);

Weather Generator (WG): 1 event based, re-sampling based weather generator [Thorndahl et al. 2017];

Perfect Prognosis (PP 1–3): 3 different types of weather typing methods including 1 resampling method (method SD-B-1 in Willems and Vrac 2011) and two analogue methods (methods SD-B-3 and SD-B-7 in Willems and Vrac 2011). These methods are implemented based on the ERA-40 re-analysis data [Uppala et al. 2005].

Methods MOS 1–5 are change factor based statistical downscaling methods. Change factors are calculated from the climate model outputs for the future period relative to the outputs for the reference period. They are thereafter applied to the observed time series. The considered weather generator is also a change factor based statistical downscaling method. The parameters of the weather generator are calibrated using observations and are next modified by change factors. Hence, the modified parameters are representative of the future climate. Change factor based statistical downscaling methods assume that the climate model biases remain time-invariant and are accounted for by considering change factors rather than the direct climate model outputs.
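The general change factor idea can be illustrated with a minimal sketch of a simple monthly delta change approach. This is not one of the specific methods MOS 1–5; the function names, the use of monthly mean ratios and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def monthly_change_factors(ref, ref_months, fut, fut_months):
    """Ratio of future to reference mean precipitation per calendar month.

    ref, fut: climate model precipitation for the reference and future period.
    ref_months, fut_months: calendar month (1-12) of each value.
    """
    factors = np.ones(12)
    for month in range(1, 13):
        ref_sel = ref[ref_months == month]
        fut_sel = fut[fut_months == month]
        # Keep a factor of 1 when a month is missing or has no precipitation.
        if ref_sel.size and fut_sel.size and ref_sel.mean() > 0:
            factors[month - 1] = fut_sel.mean() / ref_sel.mean()
    return factors

def perturb_observations(obs, obs_months, factors):
    """Apply the monthly change factors to an observed series."""
    return obs * factors[obs_months - 1]

# Synthetic illustration: 30 years with one value per month.
months = np.tile(np.arange(1, 13), 30)
ref = np.full(months.size, 2.0)
winter = np.isin(months, [12, 1, 2])
fut = np.where(winter, 2.4, 1.6)   # wetter winters, drier other months

factors = monthly_change_factors(ref, months, fut, months)
obs = np.full(months.size, 3.0)
future_obs = perturb_observations(obs, months, factors)
```

Because only the ratio between the two model periods is used, a multiplicative model bias that is constant in time cancels out, which is the time-invariance assumption mentioned above.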

The weather typing methods are perfect prognosis methods. For this type of statistical downscaling method, the relation between the predictors and the predictand is defined using historical observations. Next, the relation is applied to the climate model output, assuming the predictors are accurately and adequately simulated (i.e. the perfect prognosis assumption). However, the original implementation of the weather typing methods is not tailored to the biases in the weather type occurrences [Willems and Vrac 2011]. This has been identified as a method shortcoming by Van Uytven et al. [2019]. Note that change factor based statistical downscaling methods also have shortcomings, e.g. they are data driven and not physically based, and that the shortcomings of statistical downscaling methods are accounted for by using ensembles.

3.2 Hydrological Modelling

3.2.1 Hydrological Model Structures

The ensemble includes five hydrological model structures: VHM [Willems 2014], HBV [Lindström et al. 1997], NAM [Nielsen and Hansen 1973; DHI 2009], GR4J [Perrin et al. 2003] and PDM [Moore 1985, 2007]. These are all conceptual hydrological models (in contrast to physically based models) and are implemented in a lumped way (in contrast to spatially distributed modelling; some of these hydrological models have been applied in a distributed form, but this is not done in this study).

Previous research has shown that conceptual lumped hydrological models are able to show a similar or higher overall performance compared to the more complex distributed and physically based models [Breuer et al. 2009; Viney et al. 2009; Ghavidelfar et al. 2011; Liu et al. 2011; Apip et al. 2012; Lobligeois et al. 2014; de Boer-Euser et al. 2017]. On top of that, physically based and distributed hydrological models are more prone to the issue of equifinality and overparameterization [Gupta and Sorooshian 1983; Beven 1989, 1993, 2006; Jakeman and Hornberger 1993; Uhlenbrook et al. 1999; Das et al. 2008; Sivakumar 2008; Andréassian et al. 2012]. These two reasons, complemented by the low computational cost of lumped conceptual models compared with distributed physically based models, motivate our choice of the former in the context of this study.

3.2.2 Calibration of Hydrological Models

The calibration protocol is detailed below:

  • Calibration period: 1 September 2003–31 December 2005, with a warming up period starting on 13 August 2002 for the Grote Nete catchment; for the Dijle catchment this was 1 November 2009–28 February 2013, with a warming up period starting on 1 September 2008.

  • Validation period: 1 January 2006–31 December 2008 (the Grote Nete catchment) and 1 March 2013–1 March 2016 (the Dijle catchment), with the calibration period as warming up period.

  • Algorithm: the MOSCEM-UA algorithm [Vrugt et al. 2003] was used to perform the (automatic) calibration. The Nash-Sutcliffe efficiency (NSE) and the NSE of the log of the flows (NSElog) were chosen as objective functions for the calibration. The MOSCEM-UA algorithm results in a Pareto front of best parameter sets, of which 10 sets, evenly spread across the objective function space, are selected for further analysis.

  • Time step: a daily time step was used.
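The two objective functions named in the protocol can be sketched as follows. The function names and the small offset `eps` (added so that the logarithm stays defined for zero flows) are assumptions of this sketch; the text does not specify how zero flows are handled.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_log(obs, sim, eps=1e-6):
    """NSE of log-transformed flows, which puts more weight on low flows.

    eps is a hypothetical offset to avoid log(0); it is not from the source.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return nse(np.log(obs + eps), np.log(sim + eps))

# Illustrative daily discharges (arbitrary units).
q_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q_mean = np.full_like(q_obs, q_obs.mean())

perfect = nse(q_obs, q_obs)        # identical series
benchmark = nse(q_obs, q_mean)     # simulating the observed mean
```

Using NSE and NSElog together in a multi-objective calibration balances the fit to high flows against the fit to low flows, which is why a Pareto front rather than a single optimum results.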

Resulting hydrographs of the calibrated and validated hydrological models can be found in the Supplementary material.

3.3 Impact Calculations

Calibrated hydrological models are forced by 100-year time series of current and future (projected) meteorological variables (precipitation P and potential evapotranspiration ETo). From the resulting (current and future) discharge series, low-flow and peak-flow extremes are extracted by means of a peak-over-threshold method [Willems 2009]. After ranking the flow extremes from most to least extreme, empirical return periods are assigned and a climate change impact factor is calculated per return period:

$$ \mathrm{IF}_X(T)=\frac{X_{\mathrm{future}}(T)}{X_{\mathrm{current}}(T)}, \tag{1} $$

with X the variable of interest (low-flow extreme or peak-flow extreme) and T the empirical return period. Impact factors are further averaged over all empirical return periods larger than 1 year (only if the behavior of the impact factors is found to be stable within that range) in order to obtain one impact factor per ensemble member. From Eq. (1), it follows that IF < 1 (for low-flow extremes) and IF > 1 (for peak-flow extremes) correspond to more extreme conditions in the future.
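A minimal sketch of the impact factor calculation is given below. It assumes that the independent extremes have already been extracted, that the empirical return period of the extreme with rank i in a series of n years is n / i, and that extremes of equal rank in the current and future series are compared. The function name and arguments are hypothetical, and the stability check mentioned above is not included.

```python
import numpy as np

def impact_factor(current_extremes, future_extremes, n_years, kind="peak", t_min=1.0):
    """Mean ratio of future to current extremes for return periods above t_min.

    kind="peak": the most extreme value is the largest flow.
    kind="low":  the most extreme value is the smallest flow.
    """
    def ranked(values):
        values = np.sort(np.asarray(values, dtype=float))
        return values[::-1] if kind == "peak" else values

    cur = ranked(current_extremes)
    fut = ranked(future_extremes)
    n = min(cur.size, fut.size)

    ranks = np.arange(1, n + 1)
    return_periods = n_years / ranks          # assumed empirical formula
    keep = return_periods > t_min

    ratios = fut[:n][keep] / cur[:n][keep]    # Eq. (1), one value per return period
    return float(ratios.mean())

# Synthetic illustration: 150 extremes extracted from a 100-year series.
current = np.linspace(10.0, 50.0, 150)
if_peak = impact_factor(current, 1.2 * current, n_years=100, kind="peak")
if_low = impact_factor(current, 0.8 * current, n_years=100, kind="low")
```

In this synthetic case the future extremes are a fixed multiple of the current ones, so the averaged impact factor recovers that multiple for both flow types.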

The forcing data for conceptual hydrological models are precipitation P and potential evapotranspiration ETo. However, as shown in Table 2, only a limited number of GCM runs are able to produce projections for ETo according to the Penman-Bultot method. Every downscaled time series of P is therefore combined with every downscaled series of ETo. After propagation through the impact model, impact factors are averaged over the various ETo projections. This results in one set of impact factors per downscaled time series of P. Such an approach does not bias the impact results, as shown in De Niel et al. [2018].

For the stochastic downscaling methods, impacts are calculated for each of the 100 stochastic simulations. After propagation through the impact model, the impact factors are averaged.
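Both averaging steps reduce a block of impact factors to a single value per precipitation series. A minimal sketch, using hypothetical impact factors arranged with one row per downscaled P series and one column per ETo projection (or per stochastic simulation):

```python
import numpy as np

def average_impact_factors(impact_factors):
    """Average over the second axis: ETo projections or stochastic simulations."""
    return np.asarray(impact_factors, dtype=float).mean(axis=1)

# Hypothetical values: 2 downscaled P series combined with 3 ETo projections.
if_p_by_eto = np.array([
    [1.10, 1.14, 1.12],
    [0.95, 0.99, 0.97],
])
if_per_p = average_impact_factors(if_p_by_eto)

# Hypothetical values: 1 stochastic downscaling method with 100 simulations.
rng = np.random.default_rng(0)
if_stochastic = 1.2 + 0.05 * rng.standard_normal((1, 100))
if_method = average_impact_factors(if_stochastic)
```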

Note that the land use/land cover will – most probably – have changed by the end of this century and these changes will – most probably – further impact river flow extremes. However, as the present study is limited to an uncertainty analysis of climate change impacts only, we do not consider these land use/land cover changes.

3.4 Uncertainty Analysis

In order to quantify the relative contribution of the various members of the impact modelling chain (GCM runs, SDM, HM and parameter sets) to the total uncertainty in low-flow and peak-flow impact factors, ANOVA (analysis of variance) is applied. This technique is widely applied in hydrology [Dams et al. 2015; Giuntoli et al. 2015; Sunyer et al. 2015; Vidal et al. 2016; Meresa and Romanowicz 2017] and is able to distinguish the main effects of each contributing source of uncertainty, as well as interaction effects between the various factors.
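For a balanced design with one impact factor per combination of factor levels, the variance decomposition can be sketched as below. This is an illustrative implementation of the main-effect sums of squares, with all interaction terms lumped into the remainder; the function name and the synthetic data are assumptions, and the authors' exact ANOVA configuration is not given in the text.

```python
import numpy as np

def anova_main_effects(values, names):
    """Fraction of the total sum of squares explained by each factor.

    values: array with one axis per factor (balanced, one value per cell).
    names:  one label per axis.
    The entry "interactions" collects everything not explained by main effects.
    """
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)

    fractions = {}
    for axis, name in enumerate(names):
        other_axes = tuple(a for a in range(values.ndim) if a != axis)
        level_means = values.mean(axis=other_axes)
        n_per_level = values.size / values.shape[axis]
        ss_factor = n_per_level * np.sum((level_means - grand_mean) ** 2)
        fractions[name] = ss_factor / ss_total

    fractions["interactions"] = 1.0 - sum(fractions.values())
    return fractions

# Synthetic, purely additive example with two factors (3 x 2 levels).
a = np.array([0.0, 1.0, 2.0])[:, None]
b = np.array([0.0, 10.0])[None, :]
additive = anova_main_effects(a + b, ["A", "B"])

# Adding a product term introduces an interaction effect.
interacting = anova_main_effects(a + b + a * b, ["A", "B"])
```

For the ensemble described here, `values` would have shape (93, 9, 5, 10), with axes for GCM runs, SDMs, HMs and parameter sets. Separating individual interaction terms (e.g. GCM run × SDM) requires the two-way cell means in addition to the level means used in this sketch.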

4 Results

Figure 2 shows the impact factors for low-flow extremes and peak-flow extremes. These impact factors (one per ensemble member) are grouped based on the representative concentration pathways (RCPs). For both peak-flow and low-flow extremes, more extreme conditions are projected towards the end of the century: the impact factors are generally larger than 1 for peak-flow extremes, and smaller than 1 for low-flow extremes, irrespective of the RCP considered. Furthermore, the impact becomes more extreme with increasing radiative forcing: the mean impact factor for peak flows is 1.18 for a radiative forcing of 8.5 W/m², vs. 1.07 for a radiative forcing of 4.5 W/m². Similarly, for low-flow impacts, the mean impact factor is 0.84 for 8.5 W/m² vs. 0.98 for 4.5 W/m². Note that, for this figure, the results of both catchments are pooled since they showed almost no difference.

Fig. 2 Climate change impacts on low-flow (left) and peak-flow (right) extremes become more extreme with increased radiative forcing

The large spread in impact results, as seen in Fig. 2, is further investigated. The ANOVA results in Fig. 3 show the relative contributions of the different sources of uncertainty to the spread in impact results, both for the impact on peak-flow extremes and low-flow extremes. The results are shown for both case studies and seem to be comparable for both catchments.

Fig. 3 Contributions of various sources to the total spread of low-flow (left) and peak-flow (right) impact results, for the two case studies, indicating the relative importance of climatological and hydrological modelling

The GCM runs (including RCP and initial conditions) have a similar contribution of 30–40% to the spread in peak-flow and low-flow impacts. For low-flow extremes, the downscaling methodology has no significant influence on the impact results. The hydrological model structure, on the contrary, explains up to 34% and 42% (for the Grote Nete catchment and the Dijle catchment, respectively) of the spread in low-flow impact results. For peak-flow extremes, the downscaling methodology has a significant influence on the impact results (21% and 26% for the Grote Nete and Dijle catchment, respectively).

The ANOVA technique allows quantification not only of the main effects discussed above, but also of the interaction effects between the various sources of uncertainty. These interaction effects are not considered in most of the selected climate change analyses in Table 1. Interestingly, Fig. 3 indicates that the interaction effects between GCM runs and SDMs account for about 20% (Dijle catchment) to 25% (Grote Nete catchment) of the total uncertainty range for peak-flow changes, and only 4–8% for low-flow changes. Interactions between hydrological model structure and model parameters are found to be negligible. The remaining second-order and higher-order effects add up to 10–20%.

5 Discussion and Conclusions

It is shown that, for our case studies, hydrological extremes will become more extreme with climate change: dry becomes drier and wet becomes wetter; the sign of the changes is quite clear. This is in agreement with previous studies on the same and neighboring catchments [Tavakoli et al. 2014; Vansteenkiste et al. 2014b; Dams et al. 2015]. Furthermore, Hundecha et al. [2016] stated that, for European catchments with a rainfall dominated flood regime (as opposed to spring/summer snowmelt floods), extreme flow indices would increase in the future. Lobanova et al. [2018] concluded that peak flows in winter periods would increase across Central, Northern and Eastern Europe, whereas, in general, no significant changes were projected for low-flow periods. Gosling et al. [2017] found, for the Rhine river basin, a projected decrease in low flows and an increase in peak flows; moreover, the projected changes became stronger with an increased signal of global warming, confirming results from the present study. For other river basins in the world, however, the sign of change was unclear. Meaurio et al. [2017] also found a significant increase in the duration (days) of low flows. On the contrary, Seiller et al. [2017] argued that in a multi-model ensemble, the direction of projected change might not be clear.

Despite the clear signal related to the direction of change for our case studies, there is a high uncertainty concerning the magnitude of these changes.

Low-flow conditions in our case studies are generally governed by total seasonal precipitation. We concluded that the contribution of SDMs to the uncertainty in projected low-flow extremes is rather limited. This can be explained by the changes in total summer precipitation amounts, which remain fairly constant across the various downscaling methodologies (Fig. 4): median values for RCP 8.5 GCM runs vary from 0.70 for the event based weather generator to 0.84 for weather typing method 3. This is confirmed by Vidal et al. [2016], who reported only a minor contribution of SDMs to low-flow impacts for two selected catchments in France. In contrast, Chen et al. [2011] found, based on the older CMIP3 GCMs and for a Canadian watershed, a significant contribution of SDMs to the low-flow impacts. The ensemble size for SDMs in both studies was, however, rather limited.

Fig. 4 Climate change impact on summer precipitation amount for selected GCM runs with RCP 8.5 (n = 30)

Regarding the (high) contribution of the hydrological models to the uncertainty in future low-flow extremes, our findings confirm earlier research on the same case studies [Vansteenkiste et al. 2014b]. This earlier research, however, considered a smaller ensemble of lumped conceptual hydrological models and older generation GCMs. With only one downscaling methodology applied, they concluded that “[…] model choice as well as calibration strategy hence have a critical impact on low flows, more than on peak flows”. Similar conclusions were reached by Velázquez et al. [2013], for one Canadian and one German catchment and an ensemble of four hydrological models. Another recent model intercomparison study compared 11 hydrological models for the Meuse river and its sub-catchments [de Boer-Euser et al. 2017]. It was found that, for current conditions, the behavior of the different hydrological models was very different under drier conditions, even though the overall performance of the models was similar.

Peak-flow conditions in our case studies are generally due to extreme precipitation events in winter periods. The largest fraction of the uncertainty in peak-flow impact is explained by the various downscaling methods. Indeed, it is known that different types of downscaling methodologies might result in significantly different quantifications of extreme peak-flows [Chen et al. 2011; Teng et al. 2011; Tavakoli et al. 2014; Hundecha et al. 2016; Meaurio et al. 2017].

Interaction effects between the various levels in the impact modelling chain (second and higher order), although neglected in most climate change impact studies, are found to be responsible for 24–38% of the total uncertainty on future extreme river flows. Here, we acknowledge the importance of these interaction effects; however, we do not investigate the underlying reasons for these interaction terms further.

From the literature review (Table 1), one could conclude that most studies focus on only one uncertainty source in their impact analysis. With a very large multi-model ensemble, we were able to show that biased results might be obtained in doing so. When considering peak-flow extremes, one should focus on climate modelling and downscaling uncertainty, and it seems somewhat justifiable to limit the ensemble of hydrological models; on the contrary, neglecting hydrological model structure uncertainty in low-flow predictions will undoubtedly lead to biased results.

In most climate change impact analyses to date, the various GCMs are considered independent and, as such, the impact factors resulting from these GCMs are considered equiprobable. However, with the growing number of climate models that become available through public databases, this assumption has been challenged by several studies [Knutti 2010; Pennell and Reichler 2010; Knutti et al. 2013; Sunyer et al. 2013; Hosseinzadehtalaei et al. 2017b]. They all concluded that there is a significant interdependency among GCMs (and RCMs) and that only a reduced number of GCMs should be used for impact analyses. Failure to take this interdependency into account would lead to biased results, conservative uncertainty estimations and overly confident predictions. To address this issue of climate model dependency, Knutti et al. [2017] recently proposed a climate model projection weighting scheme. In the present study, impact factors for low-flow and peak-flow extremes were analyzed per RCP (Fig. 2). For each RCP, all GCM runs were considered equiprobable, thus ignoring the above interdependency issue. We acknowledge this limitation; results should hence be interpreted with caution. If interdependency were taken into account, we expect that (1) the range of results in Fig. 2 would not increase and might decrease; and (2) the uncertainty contributions of the various factors might shift, most probably towards a higher contribution of the GCMs. Moreover, recent research has shown that weighting GCMs would only have a limited impact on hydrological variables [Chen et al. 2017; Das et al. 2018]. Thus, it is expected that the main conclusions of the study (dry gets drier and wet gets wetter, plus the importance of hydrological model structure and downscaling methodology in the uncertainty of impact results) would not change by including the interdependency among GCMs.

Also, we acknowledge the debate related to the choice of calculation formula for potential evapotranspiration and its influence on hydrological projections in a climate change context [Kingston et al. 2009; Milly and Dunne 2016, 2017; Seiller and Anctil 2016; Hosseinzadehtalaei et al. 2017a; Paparrizos et al. 2017; Seong et al. 2017]. This source of uncertainty has not been investigated in the present study. In order to add the uncertainty related to potential evapotranspiration calculation formulae, historical evapotranspiration needs to be re-calculated, as well as the future projections. On top of that, each hydrological model should be calibrated separately for each ETo calculation formula, making the problem a lot more complex.