1 Introduction

Global Climate Models (GCMs) are an essential tool in assessing climate variability and change. These models, based on the governing physical laws such as conservation of mass, energy and momentum, and physical processes, represent large-scale (generally hundreds of kilometers and larger) flow patterns and dynamics of Earth system components, including the atmosphere, ocean, land surface, and sea ice and have shown encouraging skill in simulating global climate responses to external forcing such as increased greenhouse gases (Randall et al. 2007). However, the ability of GCMs to simulate climate variations at local or even regional scales is limited by their coarse spatial resolution and the increasing importance of internal variability at small scales (Deser et al. 2014).

Dynamical downscaling provides a physically consistent approach to resolve finer scale processes. Limited area, high-resolution climate models (Regional Climate Models, RCMs) are forced by lateral boundary and initial conditions commonly generated by a GCM (Giorgi 1990; Leung et al. 1999, 2004; Denis et al. 2001; Leung and Qian 2005; Wang et al. 2004; Liang et al. 2008; De Sales and Xue 2011). RCMs have been used widely to improve the simulation and projection of small-scale climate information deriving benefits from finer scales in surface characteristics and improved representation of finer scale physical processes (Dickinson et al. 1989; Giorgi et al. 1994; Wang et al. 2004; Lo et al. 2008; Heikkila et al. 2010; Maraun et al. 2010; Qian et al. 2009). RCMs produce high-resolution climate change scenarios and allow us to explore uncertainty due to large-scale forcing and model formulation in regional-scale projections of future climate (Mearns et al. 2009; PaiMazumder et al. 2013; PaiMazumder and Done 2014). Dynamical downscaling has been successfully employed by a number of ensemble-based regional climate simulation and assessment projects such as the Regional Climate Model Intercomparison Project for Asia (RMIP; Fu et al. 2005), Ensembles-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES; van der Linden and Mitchell 2009), the North American Regional Climate Change Assessment Program (NARCCAP) (Mearns et al. 2009) and the Co-ordinated Regional Climate Downscaling Experiment (CORDEX).

The quality of any single downscaled simulation is inevitably limited by the quality of the boundary conditions provided by the GCM, and biases that are acceptable at global scales can degrade the downscaled simulation of regional climate and extreme weather (e.g. Liang et al. 2008; Holland et al. 2010; Ehret et al. 2012; Xu and Yang 2012; Bruyère et al. 2013; Done et al. 2013). To compensate for this deficiency, some method of bias correcting the driving data (known as boundary bias correction) prior to running the RCM has become a standard procedure (Xu and Yang 2012; Bruyère et al. 2013; Done et al. 2013). One such method used in climate change studies is to construct boundary conditions for an RCM by adding a seasonally and spatially varying climate change perturbation from a GCM simulation to reanalysis climate, a technique known as pseudo-global-warming (Schär et al. 1996; Sato et al. 2007; Rasmussen et al. 2011a). Using the pseudo-global-warming method, Rasmussen et al. (2011a) investigated climate change impacts due to increased temperature and water vapor content on snowfall, snowpack, and runoff in the Colorado Headwaters region. A more recent boundary bias correction approach, developed by Holland et al. (2010) and Bruyère et al. (2013a), corrects the seasonally varying mean bias in the GCM with 6-hourly reanalysis data but retains the 6-hourly weather, longer-period climate variability, and climate change from the GCM. Application of this approach produces realistic tropical cyclone frequencies (Holland et al. 2010; Done et al. 2013). Bruyère et al. (2013) showed that the correction of all boundary variables best reproduces regional climate across a range of metrics. A number of variations on this approach have been studied, including correcting bias in the mean and variance (Xu and Yang 2012), quantile–quantile mapping (Colette et al. 2012), and feature location correction (Levy et al. 2012), although the comparison study of White and Toumi (2013) recommends use of the simple mean bias correction.

In addition to bias introduced to the RCM by the driving GCM, RCMs are also subject to biases due to model formulation. For example, most RCMs have difficulty in simulating the occurrence of light and heavy precipitation (Fowler et al. 2007). RCMs also have larger biases for summer precipitation and temperature than other seasons, due to the difficulties in simulating convective rainfall (Christensen et al. 2008; Maraun et al. 2010; PaiMazumder et al. 2013). This makes the use of RCM output data as direct forcing for impact models problematic (Wood et al. 2004; Baigorria et al. 2007; Ghosh and Mujumdar 2009; Teutschbein and Seibert 2010). Therefore, some form of post-processing bias correction of RCM output data is a necessary step for most climate change impact studies.

Several post-processing bias correction methods, ranging from a simple correction of the long-term mean to sophisticated weather generators, have been developed in the last decade. Working with output from RCM simulations driven by reanalysis data, Roy et al. (2012) removed the mean bias of daily minimum and maximum temperature and applied a multiplication factor to daily precipitation so that the simulated and observed distributions have the same mean, and showed noticeable improvement in temperature and precipitation extremes. An alternative, widely used bias correction technique is to employ a transfer function derived from cumulative distribution functions of observed and simulated data (e.g., Wood et al. 2004; Ines and Hansen 2006; Li et al. 2010; Piani et al. 2010a, b; Dosio and Paruolo 2011). Using RCM simulated daily precipitation over Europe, Piani et al. (2010a) showed that this technique performed satisfactorily not only for the mean but also for time dependent statistical properties, such as the number of consecutive dry days and the cumulative amount of rainfall for consecutive heavy precipitation days. Other, more advanced, bias correction methods have also been trialed, such as quantile mapping (Themeßl et al. 2011).

Irrespective of these bias correction approaches, decreasing the horizontal grid spacing to a resolution that begins to explicitly simulate finer-scale processes such as deep convection and more realistically represent finer scales of the land surface, has been shown to improve RCM simulations for a number of studies. Ikeda et al. (2010) and Rasmussen et al. (2011b) showed noticeable improvements of simulated snowpack at horizontal resolution less than 6 km due to the improved representation of orographic forcing. Prein et al. (2013) showed a 4-km model improves upon a 12-km model for the simulation of summertime precipitation extreme events whereas for winter events, the performance of 4- and 12-km grid models were comparable, yet both outperformed a 36-km simulation. However, regional climate simulations at these finer resolutions are computationally very expensive.

The relative importance of bias in the driving data and bias due to RCM model formulation in producing high-resolution climate scenarios for impact assessments requires further investigation. In particular the importance of bias correcting the driving data versus bias correcting the RCM output for the simulation of regional weather and climate extremes is not well understood. Recent studies that analyzed precipitation extremes in the NARCCAP ensemble found large differences in the ability of the models to capture the statistics of extremes (Wehner 2013; Singh et al. 2013). In addition, for the NARCCAP models driven by reanalysis data, the ability to capture day-to-day correspondence with observed extremes is found to vary by season and distance from the domain boundary (Weller et al. 2013), suggesting an important role for the large-scale environmental forcing of extremes and the potential for improvements through bias correcting the driving data.

The overarching goal of this study is to determine the relative merits of bias correcting RCM driving data versus bias correcting the RCM output, and whether the benefits of bias correction outweigh the benefits of high resolution for the simulation of summer extremes. Model evaluation is focused on the simulation of summertime temperature and precipitation extremes since these provide a hard test for the modeling systems, are variables that are widely used for impact assessments, and simulation skill of the extremes cannot be inferred from simulation skill of mean quantities (Wehner 2013). The specific objectives are (1) to assess the baseline dynamical downscaling ability of RCMs when driven by GCM data to reproduce the observed statistics of precipitation and temperature extremes, (2) to assess the relative merits of bias correcting RCM driving data versus bias correcting the RCM output, and (3) to assess the benefits of bias correction approaches compared to higher resolution.

2 Experimental design

The experimental approach uses global model simulations to drive limited area regional models combined with two bias correction techniques. Simulations are analyzed for statistics of summer (June–July–August) extremes for nine climatic regions (Fig. 1) across the US and evaluated using reanalysis data.

Fig. 1

NRCM model domain at 36 km grid spacing (larger black domain) and 12 km grid spacing (smaller magenta domain), and three NARCCAP models (at 50 km grid spacing): CRCM (green domain); MM5I (blue domain) and WRFG (red domain). Nine climatic regions over the US are indicated where 1 Northwest, 2 West, 3 Southwest, 4 Northern Rockies and Plains, 5 South, 6 Southeast, 7 Northeast, 8 Ohio Valley and 9 Upper Midwest

2.1 Dynamical model simulations

Global model data are provided by an existing simulation run using the Community Climate System Model version 3 (CCSM3; Collins et al. 2006a) as part of the Coupled Model Intercomparison Project 3 (CMIP3, Meehl et al. 2007). CCSM3.0 is a fully-coupled climate-system model consisting of the Community Atmosphere Model (CAM) version 3 (Collins et al. 2006b), the Community Land Model (CLM) version 3 (Dai et al. 2003; Oleson et al. 2004; Dickinson et al. 2006), the Community Sea Ice Model (CSIM) version 5 (Briegleb et al. 2004) and the Parallel Ocean Program (POP) version 1.4.3 (Smith et al. 1992). The CCSM3 simulation used here was initialized in 1850 and ran under 20th Century emissions at T85 resolution (approximately 1.4° grid spacing in the atmosphere, and 1° in the ocean) using 26 vertical layers.

These global data have been downscaled using multiple RCMs. Done et al. (2013) describe one such set of downscaled data using the Nested Regional Climate Model (NRCM; Bruyère et al. 2013b, based on the Weather Research and Forecasting model, Skamarock et al. 2008) at two different grid spacings (36 and 12 km) over large domains that cover most of North America and the North Atlantic (Fig. 1) for the period 1995–2005. The CCSM3 data were used to drive the 36 km domain, which in turn was used to drive the nested 12 km domain, using one-way nesting. To explore the downscaling ability for climate extremes and bias correction across multiple RCMs, we additionally analyze three other RCM simulations available to us from the NARCCAP (Mearns et al. 2007). Conveniently, these simulations were driven using the same global CCSM3 dataset in addition to NCEP/NCAR reanalysis data for the period 1979–1999. Specifically, we analyze simulations from the Canadian Regional Climate Model (CRCM; Caya and Laprise 1999), the Weather Research and Forecasting model (WRFG; Michalakes et al. 2004) and the PSU/NCAR Mesoscale Model (MM5I; Grell et al. 1995) that were run using 50-km grid spacing and a smaller domain than NRCM (Fig. 1). WRFG and NRCM36 both use WRF as their base model. NRCM36 uses the Kain–Fritsch cumulus scheme while WRFG uses the Grell scheme.

2.2 Evaluation datasets

NCEP Climate Forecast System Reanalysis (CFSR; Saha et al. 2010) is a global, high resolution, coupled atmosphere–ocean-land surface-sea ice system designed to provide the best estimate of the state of the atmosphere. CFSR hourly 2 m temperature (at 0.3° grid spacing) and precipitation (at 0.5° grid spacing) for the period 1995–2005 are used to evaluate the RCM datasets. Although various studies have shown improvements of the CFSR dataset over earlier reanalysis products, noticeable biases remain. Eichler and Londoño (2013) urged caution in the use of reanalysis data such as CFSR to assess regional climate variability, especially in areas of steep topography. Although several studies showed that CFSR improved the precipitation distribution and daily precipitation statistics compared to earlier reanalysis products (Higgins et al. 2010; Wang et al. 2011; Wang and Zeng 2012), the dataset may miss the very extreme values, particularly for precipitation. However, the relatively high resolution of the CFSR data (0.3°–0.5° grid spacing) allows for comparison with the similar resolution RCM datasets. In addition, CFSR temperature and precipitation fields are similar to other observational datasets. For example, CFSR and NOAA CPC daily precipitation data are highly correlated (average correlation over the US is 0.532 and peaks at 0.985). In this study, CFSR data are used to assess the relative merits of two bias correction approaches in the context of model resolution sensitivity. It is possible that the high-resolution simulation may be penalized by using a coarser resolution CFSR evaluation dataset and this is discussed further in Sect. 4. CFSR data are also used to remove systematic bias (described below) from simulated temperature and precipitation.

2.3 Bias correction techniques

The relative merits of bias correcting RCM driving data versus bias correcting the RCM output data are assessed for the NRCM simulations only. Two bias correction methods are evaluated. Boundary Bias Correction (BBC, described in Sect. 2.3.1) is applied to the driving global model data prior to driving the RCM, and Systematic Bias Correction (SBC, described in Sect. 2.3.2) removes the mean bias from the RCM output data.

2.3.1 Boundary bias correction (BBC)

BBC is described in Bruyère et al. (2013) and Done et al. (2013) and removes the mean bias in the annual cycle from CCSM3 data while retaining the simulated synoptic and longer timescale variability. Briefly, 6-hourly CCSM3 data are broken down into a mean annual cycle plus a perturbation term. The mean annual cycle is then replaced with a mean annual cycle calculated using NCEP/NCAR Reanalysis Project (NNRP, Kalnay et al. 1996) data for the period 1975–1994. BBC is applied to all variables needed to generate the lateral boundary conditions for NRCM (zonal and meridional wind, geopotential height, temperature, relative humidity and mean sea level pressure) and to the lower boundary condition of sea surface temperature. BBC is applied prior to running the 36 km NRCM simulation and this simulation is referred to hereafter as NRCM36_BBC. NRCM36_BBC is further downscaled using a high-resolution (12 km grid spacing) simulation nested within the NRCM36_BBC simulation (Fig. 1) and the resulting data are referred to hereafter as NRCM12_BBC.
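The decomposition described above can be illustrated with a minimal sketch for one variable at one grid point. The function names and the year-by-slot data layout are hypothetical, and the sketch assumes that the GCM mean annual cycle is computed over the same base period as the reanalysis cycle; the published procedure operates on full 6-hourly fields and may differ in detail (for example in how the mean annual cycle is smoothed).

```python
from statistics import fmean


def mean_annual_cycle(series_by_year):
    """Average each time-of-year slot across years.

    series_by_year: list of equal-length lists, one list per year,
    holding the values for consecutive time slots within that year.
    """
    return [fmean(slot) for slot in zip(*series_by_year)]


def boundary_bias_correct(gcm_by_year, gcm_base_by_year, rean_base_by_year):
    """Swap the GCM mean annual cycle for the reanalysis one.

    corrected = reanalysis_cycle + (gcm - gcm_cycle)
    so the GCM perturbations (weather, variability, trend) are retained.
    """
    gcm_cycle = mean_annual_cycle(gcm_base_by_year)
    rean_cycle = mean_annual_cycle(rean_base_by_year)
    return [
        [r + (g - c) for g, c, r in zip(year, gcm_cycle, rean_cycle)]
        for year in gcm_by_year
    ]


# Toy data (assumed): two years, three time slots per year.
gcm = [[10.0, 20.0, 12.0], [12.0, 22.0, 14.0]]
rean = [[8.0, 17.0, 9.0], [10.0, 19.0, 11.0]]
corrected = boundary_bias_correct(gcm, gcm, rean)
```

In this toy case the corrected series has the reanalysis mean annual cycle, while the year-to-year departures from the cycle are those of the GCM.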

2.3.2 Systematic bias correction (SBC)

The relative merits of NRCM36 are assessed under an unbiased mean state, where the systematic bias of each simulated variable is removed, following Roy et al. (2012). The correction is calculated and applied at each model grid point. For temperature, the mean monthly bias is interpolated to 6-hourly values and subtracted from each simulated 6-hourly value over the period 1995–2005. For precipitation, a multiplication factor is calculated as the ratio of the observed to the simulated monthly mean and applied to the 6-hourly values so that the two distributions have the same mean. The systematic bias is removed from 6-hourly temperature and precipitation simulated by NRCM36 and the resulting data are referred to hereafter as NRCM36_SBC. This systematic bias correction technique was successfully used to assess the ability of climate models to reproduce climate extremes (Roy et al. 2012). All downscaled datasets analyzed in this study are summarized in Table 1. NNRP data are used to bias correct the driving global model data, whereas CFSR data are used both to bias correct the RCM outputs and to evaluate the experiments. This is not an unfair comparison since Wang and Zeng (2012) suggested that no reanalysis product is superior to others in all variables at both daily and monthly time scales. In addition, there is a high correlation between CFSR and NNRP June–July–August mean temperature for the period 1995–2005, with an average correlation across the US of 0.798 that peaks at 0.995. For seasonal mean precipitation, the average correlation is 0.498 and peaks at 0.998.
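The additive and multiplicative corrections described above can be sketched as follows for a single grid point. This is a simplified illustration, not the study's implementation: the function names are hypothetical, and the monthly bias is applied as a constant within each month rather than being interpolated to 6-hourly values as in the text.

```python
from collections import defaultdict
from statistics import fmean


def monthly_means(values, months):
    """Mean of `values` grouped by the month label of each value."""
    groups = defaultdict(list)
    for v, m in zip(values, months):
        groups[m].append(v)
    return {m: fmean(vs) for m, vs in groups.items()}


def sbc_temperature(sim, obs, months):
    """Subtract the mean monthly bias (simulated minus observed)."""
    sim_mean = monthly_means(sim, months)
    obs_mean = monthly_means(obs, months)
    return [t - (sim_mean[m] - obs_mean[m]) for t, m in zip(sim, months)]


def sbc_precipitation(sim, obs, months):
    """Scale by the ratio of observed to simulated monthly mean."""
    sim_mean = monthly_means(sim, months)
    obs_mean = monthly_means(obs, months)
    corrected = []
    for p, m in zip(sim, months):
        # Leave values unchanged if the simulated monthly mean is zero
        # (an assumption made here to avoid division by zero).
        factor = obs_mean[m] / sim_mean[m] if sim_mean[m] > 0 else 1.0
        corrected.append(p * factor)
    return corrected


# Toy data (assumed): two values in June (6) and two in July (7).
months = [6, 6, 7, 7]
t_corr = sbc_temperature([300.0, 302.0, 305.0, 307.0],
                         [299.0, 301.0, 303.0, 305.0], months)
p_corr = sbc_precipitation([0.0, 4.0, 2.0, 6.0],
                           [1.0, 1.0, 2.0, 2.0], months)
```

The multiplicative form keeps precipitation non-negative and preserves dry time steps, which an additive correction would not.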

Table 1 Description of regional climate model experiments

2.3.3 Boundary systematic bias correction (BSBC)

The systematic bias is also removed from 6-hourly temperature and precipitation simulated by NRCM36_BBC, using the same systematic bias correction technique described in Sect. 2.3.2, and the resulting data are referred to hereafter as NRCM36_BSBC. The relative merits of NRCM36, NRCM36_BBC, NRCM36_SBC and NRCM36_BSBC for simulating temperature and precipitation extremes are assessed in this article.

2.4 Assessment indices

Daily minimum temperature (Tmin), daily maximum temperature (Tmax) and daily precipitation (Prec) are used to derive five extreme indices (Table 2) for each summer. The temperature indices are chosen to assess the intensity of hot and cold extremes using the 90th percentile of Tmax and the 10th percentile of Tmin for each summer. For precipitation, the frequency of wet days (using a threshold of 1 mm/day, see Hennessy et al. 1999), the maximum number of consecutive dry days (CDD) and extremes of daily precipitation totals using the 90th percentile value for each summer are assessed. Systematic bias correction is performed on NRCM36 and NRCM36_BBC simulated temperature and precipitation prior to calculating Tmin, Tmax and the extreme temperature and precipitation indices.
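As an illustration, the five indices could be computed for one summer at one grid point along the following lines. Several details are assumptions rather than statements of the study's method: the percentile uses linear interpolation between order statistics, a wet day is taken as a day with at least 1 mm, and the precipitation percentile is computed over all days rather than wet days only. The function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def wet_days(prec, threshold=1.0):
    """Number of days with precipitation at or above the threshold (mm/day)."""
    return sum(1 for p in prec if p >= threshold)


def max_consecutive_dry_days(prec, threshold=1.0):
    """Length of the longest run of days below the wet-day threshold."""
    longest = run = 0
    for p in prec:
        run = run + 1 if p < threshold else 0
        longest = max(longest, run)
    return longest


def summer_indices(tmax, tmin, prec):
    """Five extreme indices for one summer of daily values."""
    return {
        "Tx90": percentile(tmax, 90),
        "Tn10": percentile(tmin, 10),
        "P90": percentile(prec, 90),
        "WD": wet_days(prec),
        "CDD": max_consecutive_dry_days(prec),
    }


# Toy ten-day precipitation record (assumed values, mm/day).
prec = [0.0, 0.0, 5.0, 0.5, 0.0, 0.0, 0.0, 2.0, 10.0, 0.0]
indices = summer_indices(list(range(290, 301)), list(range(280, 291)), prec)
```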

Table 2 Five extreme indices used to analyze hot and cold extremes, and precipitation intensity and frequency extremes

2.5 Performance measures

A set of performance measures (Table 3) is used to assess the ability of the RCMs to reproduce observed extremes for each climatic region (defined later in Sect. 2.6). Bias indicates systematic error caused by differences in physical and/or geometric factors (terrain elevation, vegetation type, vegetation fraction, soil type, etc.) between simulation and observations. The root-mean-square error (RMSE) assesses the average magnitude of the error. The standard deviation error (SDE) assesses the average magnitude of the error when the bias is removed and therefore indicates the average magnitude of the error in the variability. SDE stems from observational error, or from initialization and boundary conditions of the model. The variance ratio (VR), defined as the ratio of the simulated variance to the observation variance, is used to assess the ability of RCMs to reproduce the spread of the summer season extremes. If VR <1, model variance is less than observed and if VR >1 model variance is greater than observed. Finally, Spearman’s correlation (SC) is used to assess the ability of the simulations to capture the spatial pattern of the extremes within each climatic region.

Table 3 Skill score definitions used to evaluate model performance
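The measures described above can be sketched as follows for paired simulated and observed index values. The exact definitions in Table 3 are not reproduced in the text, so this sketch makes assumptions: it uses population (1/n) variances, under which RMSE² = bias² + SDE² holds exactly, and average ranks for ties in the Spearman correlation. All function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def bias(sim, obs):
    """Mean error (simulated minus observed)."""
    return fmean([s - o for s, o in zip(sim, obs)])


def rmse(sim, obs):
    """Root-mean-square error."""
    return math.sqrt(fmean([(s - o) ** 2 for s, o in zip(sim, obs)]))


def sde(sim, obs):
    """Standard deviation of the error, i.e. the RMSE once the bias is removed."""
    return math.sqrt(pvariance([s - o for s, o in zip(sim, obs)]))


def variance_ratio(sim, obs):
    """Simulated variance divided by observed variance."""
    return pvariance(sim) / pvariance(obs)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(sim, obs):
    """Spearman correlation: Pearson correlation of the ranks."""
    rs, ro = ranks(sim), ranks(obs)
    ms, mo = fmean(rs), fmean(ro)
    cov = sum((a - ms) * (b - mo) for a, b in zip(rs, ro))
    norm = math.sqrt(sum((a - ms) ** 2 for a in rs) * sum((b - mo) ** 2 for b in ro))
    return cov / norm


# Toy values (assumed) at four grid points.
sim = [2.0, 4.0, 6.0, 9.0]
obs = [1.0, 3.0, 5.0, 7.0]
scores = (bias(sim, obs), rmse(sim, obs), sde(sim, obs),
          variance_ratio(sim, obs), spearman(sim, obs))
```

Here the simulation is biased high and has more spread than the observations (VR > 1), yet it reproduces the spatial ordering perfectly (SC = 1), which shows why the measures are reported separately.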

Statistical significance tests are performed to assess the significance (at the 95 % confidence level) of differences in temperature and precipitation extremes simulated by NRCM36, NRCM36_BBC, NRCM36_SBC, NRCM36_BSBC and NRCM12_BBC in comparison to CFSR. Given that extremes are not normally distributed, the t test is not the most appropriate significance test. Instead, the classical non-parametric Wilcoxon–Mann–Whitney rank-sum test is used. Wilks (2006) states that this test is resistant to outliers and robust in the sense that it is almost as powerful as the t test. The test statistic is a function of the sum of the ranks of each sample within the pooled data. If the rank sums of the two samples are far apart compared with most other possible partitions of the pooled data, then the null hypothesis that the samples are drawn from the same distribution is rejected.
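A minimal sketch of the rank-sum test is given below. It is an illustration under stated assumptions rather than the procedure used in the study: the p-value comes from the large-sample normal approximation without tie or continuity corrections, whereas for samples as small as one value per summer an exact null distribution may be preferable. The function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).

    Returns (U, p) where U is the Mann-Whitney statistic for sample x.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda pair: pair[0])
    n1, n2 = len(x), len(y)

    # Sum the ranks of sample x in the pooled data; ties get average ranks.
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Toy samples (assumed): eleven summers of an index from two datasets.
model = [float(v) for v in range(1, 12)]
shifted = [v + 20.0 for v in model]
u_diff, p_diff = rank_sum_test(model, shifted)   # clearly separated samples
u_same, p_same = rank_sum_test(model, model)     # identical samples
```

With clearly separated samples the null hypothesis is rejected at the 95 % level, while identical samples give a rank sum at its expected value and no evidence of a difference.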

2.6 Climate regions

Nine climatic regions (Northwest, West, Southwest, Northern Rockies and Plains, Upper Midwest, Ohio Valley, South, Southeast and Northeast; Karl and Koss 1984) have been chosen to validate the RCM simulations over the US (Fig. 1). For this study, performance measures are calculated for the extreme indices for each climatic region.

3 Results

3.1 Dynamical downscaling ability of RCMs

In this section, the ability of CCSM3 and several CCSM3-driven regional climate models to simulate the spatial distribution of Tmax, Tmin and Prec is evaluated for the period for which all RCM datasets overlap (1995–1999). Figure 2 shows average Tmax, Tmin and Prec from CFSR for the period 1979–2010, together with the difference between CCSM3 and CFSR and the difference between each RCM and CFSR for the period 1995–1999. Although 5 years is insufficient to remove the influence of phase disagreements in decadal variability between CFSR and CCSM3, using 32 years of CFSR is sufficient to smooth out decadal variability from the evaluation dataset. The sensitivity of the bias fields to the choice of evaluation period is tested using a series of 5-year periods (1980–1984, 1985–1989, 1990–1994, 1995–1999 and 2000–2004) of the CFSR data. The variability in the bias field is quite low (Fig. 2) and for most regions the bias patterns are robust to the evaluation period, except for regions where the bias changes sign (warm to cold and dry to wet). In general, CCSM3 exhibits a warm bias in Tmax and a cold bias in Tmin. In summer, biases in CCSM3-simulated Tmax and Tmin are caused by misrepresentation of convective events. For precipitation, CCSM3 has a wet bias over the Central Great Plains and eastern US and a dry bias elsewhere (Fig. 2).

Fig. 2

Daily summer maximum and minimum temperature (K) and precipitation (mm/day) derived from CFSR for the period 1979–2010 (top row), and the biases in CCSM3, three NARCCAP models (CRCM50, MM5I50 and WRFG50) and NRCM36 simulated maximum and minimum temperature (K) and precipitation (mm/day) with respect to CFSR for the period 1995–1999. Shaded areas in the bias plots represent regions within ±0.5 for the coefficient of variation of biases in the model simulations for 1995–1999 with respect to series of 5-year periods (1980–1984, 1985–1989, 1990–1994, 1995–1999 and 2000–2004) of CFSR data

The three NARCCAP models and NRCM36 improve the simulations of Tmax, Tmin and Prec in comparison to their driving model, CCSM3 (Fig. 2). The pronounced warm bias in CCSM3-simulated Tmax is reduced in all the RCMs with the magnitude of the reduction dependent on the RCM. In some RCMs this cooling is sufficient to reverse the sign of the Tmax bias over the western US. The strong cold bias over northern, western and northwestern US in CCSM3-simulated Tmin is diminished in all the RCMs (Fig. 2). The wet bias in Prec over the Central Great Plains and eastern US is reduced in all the RCMs while the dry bias is enhanced, particularly over the southeastern US (Fig. 2). NRCM36 is wetter everywhere than WRFG and the Kain–Fritsch cumulus parameterization scheme used in NRCM36 has been shown to overestimate precipitation (Raktham et al. 2014).

Although dynamical downscaling using RCMs considerably improves the CCSM3 simulation of Tmax, Tmin and Prec, the remaining bias reveals that the RCMs are limited by both the quality of the boundary conditions provided by the CCSM3 and the RCM formulation. Both are explored in the remainder of this section.

3.2 Impact of boundary bias correction

The influence of boundary forcing errors is illustrated by comparing the spatial distributions of biases in Tmax, Tmin and Prec derived from the NARCCAP model CRCM driven by NNRP versus driven by CCSM3 with respect to CFSR for the period 1979–1999 (Fig. 3). Overall, the results suggest that there are noticeable differences between the biases in the CCSM3- and NNRP-driven CRCM simulations. For Tmax, the CCSM3-driven simulation has a warmer bias than the NNRP-driven simulation, with the largest differences in bias of 4 K over the central, southern and southeastern United States. For Tmin, the differences in bias are far smaller and within ~1 K. For Prec, the CCSM3-driven simulation has a drier bias than the NNRP-driven simulation over the South, Southeast and Ohio Valley, while the NNRP-driven simulation has a wetter bias over the Southwest and Rockies. The other RCMs also showed evidence of boundary forcing error (not shown). These boundary forcing errors provide motivation to perform bias correction on the driving CCSM3 data.

Fig. 3

Biases in daily summer maximum and minimum temperature (K) and precipitation (mm/day) derived from the NARCCAP model CRCM driven by NCEP-NCAR reanalysis data (top row) and driven by CCSM3 (bottom row) with respect to CFSR for the period 1979–1999

The performance of boundary bias correction (BBC) for the simulation of regional climate mean quantities is evaluated by comparing NRCM36 and NRCM36_BBC simulations with CFSR (Fig. 4). The spatial bias pattern in NRCM36_BBC reveals that BBC considerably reduces the biases in NRCM36 simulated Tmax, Tmin and Prec shown in Fig. 2. Further downscaling to 12 km (NRCM12_BBC) further reduces the warm bias in Tmin over the Western US, but the simulation becomes too cold for Tmax and too wet over much of the US (Fig. 4). The performance of these simulations for summer extremes is presented in the next section.

Fig. 4

Biases in NRCM36_BBC and NRCM12_BBC simulated daily summer maximum and minimum temperature (K) and precipitation (mm/day) for the period 1995–2005

3.3 Impact of bias correction and resolution for the simulation of extremes

The relative merits of bias correcting RCM driving data (i.e. BBC) versus bias correcting the RCM output (i.e. SBC) for the simulation of two summertime temperature and three summertime precipitation extreme indices are presented here for the nine climatic regions over the US (Fig. 1). The NRCM is the RCM chosen to conduct this comparison because it simulated Tmax, Tmin and Prec well overall (as described in Sect. 3.1 and Fig. 2).

Figure 5 shows the bias in the 90th percentile of Tmax (Tx90), 10th percentile of Tmin (Tn10), 90th percentile of precipitation (P90), number of wet days (WD) and maximum number of consecutive dry days (CDD) for NRCM36, NRCM36_SBC, NRCM36_BBC, NRCM36_BSBC and NRCM12_BBC. In general, for Tx90 all simulations have a cold bias with high resolution degrading the benefits of bias correction. Specifically, NRCM36 has a cold bias in Tx90 over most climatic regions with the largest bias over western regions. The cold bias is reduced in NRCM36_SBC, NRCM36_BBC and NRCM36_BSBC over western regions and the South (Fig. 5). NRCM36_SBC and NRCM36_BSBC slightly improve the simulation of Tx90 over eastern regions while NRCM36_BBC increases the cold bias. Over the Midwest, SBC and BBC are unable to improve the performance of NRCM36. In general, for Tn10, simulations have a cool bias in eastern regions and a warm bias in western regions. Bias correction increases the warm bias over western regions with some recovery with increased resolution. Over eastern regions bias correction has little impact on the small biases whereas increased resolution increases the cold bias. Specifically, SBC and BSBC reduce the cold bias in NRCM36 over southern and eastern climatic regions while BBC tends to increase the cold bias. The biases over Midwest are reduced with both SBC and BBC.

Fig. 5

Bias in the 90th percentile of Tmax (Tx90), 10th percentile of Tmin (Tn10), 90th percentile of precipitation (P90), number of wet days (WD) and maximum number of consecutive dry days (CDD) for NRCM36, NRCM36_SBC, NRCM36_BBC, NRCM36_BSBC and NRCM12_BBC experiments with respect to CFSR. The x-axis represents all the experiments for each extreme index and the y-axis represents the nine climatic regions

For P90, NRCM36 has a wet bias over the majority of the climatic regions (Fig. 5). BBC generally produces little impact while SBC and BSBC enhance the wet bias in eastern regions. Increasing resolution tends to reduce the wet bias and changes over to a dry bias in eastern regions. NRCM36 overestimates WD over a majority of climatic regions with the largest bias over western climatic regions (Fig. 5). The bias correction procedures and increased resolution have little impact, although BBC brings the greatest bias reduction. NRCM36 underestimates CDD over most regions with the largest bias over western regions and is consistent with the overestimation in WD (Fig. 5). Similar to WD, each bias correction procedure and increased resolution has little impact although, again, BBC brings the greatest bias reduction.

Figure 6 shows the RMSE for the five extreme indices. For Tx90, errors are reduced in the NRCM36_SBC and NRCM36_BSBC experiments over most regions, with high resolution generally degrading the benefits of bias correction. NRCM36 captures Tn10 well, with bias correction having little impact. NRCM12_BBC improves on the performance of NRCM36_BBC over western regions and degrades it over eastern regions. For P90, NRCM36 has low RMSE, with bias correction and high resolution acting to increase the error. For WD and CDD, NRCM36 has the lowest RMSE over eastern regions, indicating rainfall frequency errors over the complex topography of the Western US that are only somewhat improved with bias correction.

Fig. 6

Same as Fig. 5, but for RMSE

Figure 7 illustrates the SDE for the five extreme indices. For Tx90, systematic bias correction plays only a minor role in altering the variability error, while BBC acts to increase or decrease the variability error depending on the region. For Tn10, each bias correction has little impact on SDE while higher resolution reduces the error. For the precipitation indices, both bias correction and higher resolution have little impact, although they act to increase the variance error in P90 over eastern regions.

Fig. 7

Same as Fig. 5, but for SDE

Figure 8 shows the VR for the five extreme indices. For Tx90, both bias corrections generally have little impact on the spread although both bias corrections and higher resolution increase and degrade the spread compared to CFSR over western regions. For Tn10 both bias corrections again have little impact. For P90, SBC increases and degrades the spread whereas BBC has little impact. Bias corrections have little impact on the high model variance in WD over western regions and CDD sees generally mixed results. Higher resolution generally increases the spread across all five extreme indices.

Fig. 8

Same as Fig. 5, but for VR. Blue colors indicate a variance ratio less than 1 (model variance less than observed) and red colors indicate a variance ratio greater than 1 (model variance greater than observed)

Figure 9 shows SC, a measure of spatial correlation, for the temperature and precipitation extreme indices. In general, SC is higher in western than eastern regions, likely due to the strong local topographic forcing over western regions. SC is also higher for temperature than for precipitation, likely due to the convective-scale noise inherent in the spatial precipitation fields and because precipitation is more difficult to simulate than temperature owing to microphysics and interaction with topography. Both bias corrections generally have little impact on the spatial correlation of the extremes, although SBC notably degrades the spatial correlation in P90 and WD over eastern regions.

Fig. 9

Same as Fig. 5, but for SC
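
As a rough illustration of a spatial correlation measure, the sketch below computes a Pearson correlation between two fields sampled at the same grid points. Treating SC as a Pearson pattern correlation is an assumption made here for illustration; the function name and the values are hypothetical.

```python
from math import sqrt


def spatial_correlation(model_field, reference_field):
    """Pearson correlation between two fields given as flat lists of grid-point values."""
    if len(model_field) != len(reference_field):
        raise ValueError("fields must cover the same grid points")
    n = len(model_field)
    m_mean = sum(model_field) / n
    r_mean = sum(reference_field) / n
    cov = sum((m - m_mean) * (r - r_mean)
              for m, r in zip(model_field, reference_field))
    m_ss = sum((m - m_mean) ** 2 for m in model_field)
    r_ss = sum((r - r_mean) ** 2 for r in reference_field)
    if m_ss == 0 or r_ss == 0:
        raise ValueError("a field with no spatial variation has no defined correlation")
    return cov / sqrt(m_ss * r_ss)


# Hypothetical index values at four grid points (illustrative only).
reference = [1.0, 2.0, 3.0, 4.0]
same_pattern_offset = [3.0, 4.0, 5.0, 6.0]   # same pattern, uniform warm offset
reversed_pattern = [4.0, 3.0, 2.0, 1.0]      # gradient in the wrong direction

sc_offset = spatial_correlation(same_pattern_offset, reference)
sc_reversed = spatial_correlation(reversed_pattern, reference)
```

A measure of this kind responds to the pattern of a field rather than to its overall level: a spatially uniform offset leaves it unchanged, whereas a reversed gradient drives it to its minimum.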

The overall impact of bias correction and model resolution on simulation of the statistics of the summer extremes is summarized in Table 4. The table shows the experiments that were statistically different (at 95 % confidence) from the non-bias corrected simulation (NRCM36). For Tx90, only a few experiments have a significant impact over eastern regions, while the majority of the experiments have a significant impact over western regions. For Tn10, all three bias correction experiments have a significant impact over western regions (except the Northwest), while over eastern regions increasing resolution has a significant impact (Table 4). For P90, the majority of experiments have a significant impact over the majority of climatic regions (except the Southwest). For WD, on the other hand, only western regions show significant impacts for all experiments, and for CDD results are generally mixed.

Table 4 A summary of the regional climate experiments that were statistically different (at 95 % confidence) from the non-bias corrected experiment (NRCM36) for nine climatic regions and five extreme indices

The overall performance of the regional climate experiments over nine climatic regions and five extreme indices is summarized in Table 5. The table shows the regional climate experiments that were statistically similar (at 95 % confidence) to CFSR. If two or more experiments are significant for the same climatic region and index, the experiment listed has the highest skill scores (Table 3). For Tx90, SBC performs best over the Northeast, South, Midwest and Rockies, and BBC performs best over the West. None of the experiments shows any improvement in comparison to CFSR over the Southeast, Ohio Valley and Southwest (Table 5). For Tn10, the higher-resolution experiment performs best over the Rockies, Southwest and West, and SBC improves the model performance over the Southeast, South and Midwest, while bias corrections and increasing resolution degrade the model performance over the Ohio Valley and Northwest (Table 5). For P90, BBC and increasing resolution show the best performance over the majority of the climatic regions, the exceptions being the Northeast, Ohio Valley and Midwest. For the Northeast and Ohio Valley, bias correction and increasing resolution degrade the model performance, while none of the experiments shows any improvement over the Midwest (Table 5). For WD, none of the experiments shows any improvement over western regions and the Rockies, while increasing resolution performs best over the Northeast and South, and BBC and SBC improve the model performance over the rest of the climatic regions (Table 5). For CDD, SBC shows the best performance over the Ohio Valley and South, while bias corrections and increasing resolution degrade the model performance over eastern regions, and none of the experiments shows any improvement in comparison to CFSR over the Rockies and western regions (Table 5). In some cases, application of both bias corrections (BSBC) provides a greater, albeit slight, benefit than either BBC or SBC alone, but in the majority of these cases there is no significant difference between BSBC and either SBC or BBC.

Table 5 A summary of the regional climate experiments that were statistically similar (at 95 % confidence) to CFSR for nine climatic regions and five extreme indices

4 Discussion and conclusion

The quality of high-resolution climate scenarios for impact assessments obtained through dynamical downscaling is limited by the quality of the driving data and RCM error. For RCMs driven by GCM data, the relative merits of two approaches to bias correction (bias correcting RCM driving data versus bias correcting the RCM output) were examined and compared to the benefits of higher model resolution for the simulation of summertime temperature and precipitation extremes. Analysis focused on the 5-year period 1995–1999 for multi-RCM analysis over the US and the 11-year period 1995–2005 for the NRCM across nine climatic regions of the US.

Initial analysis considered daily temperature maxima and minima and daily rainfall. The driving model, CCSM3, had an overall warm bias in temperature maxima, a cold bias in temperature minima, a wet bias over the central Great Plains and eastern US, and a dry bias over western regions. Biases in CCSM3 may be attributed to the coarse representation of local climate forcings such as lake-land temperature and moisture contrasts (Molders et al. 1996), lower mountain peaks and higher valleys than observed, misrepresentation of convective events, vertical grid resolution and incorrectly simulated atmospheric moisture transport. In CCSM, terrain elevation is the grid-cell average height, so mountains are flatter than the highest natural peaks. Consequently, orographically induced precipitation may be underestimated or occur further downwind than in nature. The RCMs considered here generally reduce the magnitude of the biases in CCSM3 for 1995–1999, although they also change the large-scale spatial pattern of the biases. The remaining bias indicates that dynamical downscaling is limited by the quality of the boundary conditions provided by CCSM3 and by RCM error.
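
The effect of grid-cell averaging on terrain can be illustrated with a small sketch. It block-averages a hypothetical one-dimensional elevation transect onto a coarser grid; the heights and the coarsening factor are invented for illustration and do not represent the CCSM3 or NRCM grids.

```python
def block_average(heights, factor):
    """Average consecutive groups of `factor` fine-grid heights into one coarse cell."""
    if factor <= 0 or len(heights) % factor != 0:
        raise ValueError("transect length must be a positive multiple of factor")
    return [sum(heights[i:i + factor]) / factor
            for i in range(0, len(heights), factor)]


# Hypothetical fine-grid elevations along a transect, in metres (illustrative only).
fine = [200.0, 800.0, 2600.0, 900.0, 300.0, 100.0]
coarse = block_average(fine, 3)

peak_loss = max(fine) - max(coarse)      # how much the highest peak is lowered
valley_rise = min(coarse) - min(fine)    # how much the lowest valley is raised
```

In this example the 2600 m peak becomes a 1200 m coarse cell, and the lowest point rises from 100 m to about 433 m, which is the flattening of peaks and raising of valleys described above.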

To assess the role of bias correction and model resolution in simulation of the statistics of the summer extremes, a comprehensive evaluation was conducted for the NRCM using five extreme indices of temperature and precipitation, nine climatic regions and five performance measures. The NRCM has a general cool bias for hot and cold extremes, and the cool bias for hot extremes is greater than that for cold extremes. The NRCM also has a wet bias for wet precipitation extremes and a dry bias for dry precipitation extremes that is particularly pronounced over western regions. In general, the bias correction methods had significant impacts over western regions, whereas impacts over eastern regions were less consistent. Both bias correction methods generally reduced the bias and the average magnitude of the errors. The notable exception is extreme precipitation intensity in eastern regions, for which bias correcting the model output severely degraded model performance. Bias correcting the mean of the precipitation intensity distribution therefore does not necessarily reduce bias in the extremes. Conversely, boundary bias correction improved simulation of extreme precipitation intensities in this region, which may be due to improved representation of the large-scale environment within which convective episodes develop.

Higher resolution tended not to lead to further improvements. However, the performance of model physics across scales and the use of a coarse resolution evaluation dataset require careful interpretation of these results. First, we did not perform any additional tuning of the parameterizations for simulations on the 36 and 12 km grids. The parameterizations will therefore interact differently with the resolved scales on the two grids, meaning that differences between the simulations are not solely due to model resolution. Second, it is likely that our 12 km simulation is being penalized by the use of the coarser resolution evaluation dataset. For example, high resolution leads to a more realistic representation of orography and surface fields, and for the Western US Prein et al. (2013) showed that a 12 km simulation outperforms a 36 km simulation in simulating daily heavy precipitation when compared to precipitation derived from 99 stations within the Snowpack Telemetry network (Serreze et al. 1999). However, the differences between NRCM36_BBC and NRCM12_BBC are not only at small scales but extend to continental scales (see Fig. 4, for example). It is therefore likely that the large-scale environment is also sensitive to model resolution, and these large-scale differences can be evaluated fairly using the coarser resolution evaluation dataset. Future work will use multiple resolution evaluation datasets and surface station data to more comprehensively assess the relative merits of bias correction versus high resolution.

Isolating error in the variance from error in the bias revealed that bias correcting the mean of the RCM output has little impact on error in the variance (by definition). Boundary bias correction, however, either increased or decreased the variance error depending on the region. This is likely due to the changed variability in the large-scale environment (permitted by the large NRCM domain), which improved the variance in some regions and degraded it in others. Higher resolution tends to increase variability in the summer extremes above that in CFSR for both temperature and precipitation, and in particular for precipitation intensity. Finally, temperature and precipitation extremes have a far higher spatial correlation with observations over the Western US than over the Eastern US, likely due to the strength of the local forcing from complex topography in the West. This agrees with Singh et al. (2013), who found that the NARCCAP ensemble was unable to accurately capture spatial patterns of extreme precipitation in the western region. Spatial correlations are also far higher for temperature than for precipitation extremes due to the dominance of convective scales in the precipitation fields.
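
The "by definition" remark about mean correction and variance can be checked with a minimal sketch. It assumes the simplest form of mean correction, subtracting a constant offset so that the model mean matches the reference mean; the function name and sample values are hypothetical, and the correction applied in this study may differ in detail.

```python
from statistics import mean, pvariance


def mean_bias_correct(model, reference):
    """Shift the model series by a constant so its mean matches the reference mean."""
    offset = mean(model) - mean(reference)
    return [x - offset for x in model]


# Hypothetical daily values for one grid point (illustrative only).
reference = [9.0, 10.0, 11.0, 12.0, 13.0]
model = [10.0, 12.0, 14.0, 16.0, 18.0]   # too warm on average and too variable

corrected = mean_bias_correct(model, reference)

mean_error_after = mean(corrected) - mean(reference)          # removed by construction
variance_change = pvariance(corrected) - pvariance(model)     # unchanged by a constant shift
tail_error_after = max(corrected) - max(reference)            # the upper tail is still too high
```

Subtracting a constant removes the mean error but leaves every deviation from the mean untouched, so the variance error remains. In this example the largest corrected value still exceeds the largest reference value, which is one way a mean correction can fail to improve the extremes.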

Overall, both bias corrections generally reduced the bias and overall error, with some indication that boundary bias correction provided greater benefits than bias correcting the mean of the RCM output data, particularly for precipitation. Regional extreme precipitation is strongly influenced by the large-scale environment, which is improved through boundary bias correction, as found by Done et al. (2013) for the case of tropical cyclones, whereas correcting the mean of the RCM output does not necessarily confer benefits to the extremes. This study also showed that the effects of bias correction techniques (particularly the ‘mean-correction’ approach studied here) are regionally dependent, as bias correction can increase errors in some regions. Therefore, the effects of a particular bias correction should be fully studied for a region before it is used in impact studies. In summary, this study demonstrates that simple bias correction approaches, such as correcting the mean bias in the driving data and correcting the downscaled outputs, can improve the quality of dynamical downscaling in reproducing summer temperature and precipitation extremes. High resolution tended not to lead to further improvements, though further work is needed using multiple resolution evaluation datasets and convection-permitting simulations to comprehensively assess the value of high resolution. Additional work is also needed to explore the merits of bias correction relative to alternative methods that improve the representation of extremes in model data, such as weighting RCM ensemble members (e.g. Wehner 2013) or improvements in model physics (e.g. Yang et al. 2012).