1 Introduction

Global climate change has led to an increase in temperature, which has widespread impacts on humans and ecosystems (Matthews et al. 2017; Smith et al. 2015; Sun et al. 2020). In recent years, extreme climate events such as heatwaves, droughts, and floods have resulted in notable mortality and morbidity, and catastrophic consequences for economic productivity and the environment (Blöschl et al. 2017; Demirhan 2022; Sheridan and Allen 2015; Yuan et al. 2019; Zhang et al. 2022). Among them, the influence of extreme maximum and minimum temperature events is the most widespread and obvious (Burkart et al. 2021; Huang et al. 2022; Liu et al. 2022; Wu et al. 2020). The Sixth Assessment Report of the Intergovernmental Panel on Climate Change points out that extreme events are expected to become more intense and frequent in the future (IPCC 2021). To better adapt to and mitigate climate change and reduce the societal and economic losses caused by extreme maximum and minimum temperatures, reliable predictions of future daily maximum and minimum temperatures are required.

Global Climate Models (GCMs) have been widely applied to simulate and predict climate changes. GCMs from the newest Coupled Model Intercomparison Project Phase 6 (CMIP6) incorporate new climate processes compared to previous generations, which provides an unprecedented opportunity to generate reliable future climate projections and conduct climate impact studies (Eyring et al. 2016; Grose et al. 2020; Nie et al. 2020). However, it is noteworthy that large systematic biases are exhibited in the model simulations and the resolution of GCMs is too coarse for regional impact studies (Cannon 2017; Li et al. 2020; Maraun 2016). Therefore, it is necessary to downscale and bias correct GCM outputs for fine spatial-scale climate information (Yang et al. 2018).

Various bias correction methods have been developed to downscale the climate variables and reduce model biases. Based on their schemes, commonly applied methods can be divided into three categories. The first considers the spatial consistency of climate variables such as analog approaches (Maurer et al. 2010; Pierce et al. 2014). The second is to adjust the mean and standard deviation of climate variables in the model simulations based on observations, such as linear scaling, and delta change methods (Chen et al. 2011a, b; Fang et al. 2015). The third is to correct the probability distribution of the modeled simulations by applying the transfer function to the Cumulative Distribution Function (CDF) of modeled and observed variables (Maraun 2013; Pierce et al. 2015). For instance, Quantile Mapping (QM) and Distribution mapping (DM) are based on empirical and theoretical CDF, respectively (Zhu et al. 2022). QM is advantageous in its effectiveness in removing model biases not only for the mean and standard deviation but also for extreme events (Teutschbein and Seibert 2012; Thrasher et al. 2012). Many bias correction methods, including QM and DM as well as the second category of bias correction methods, assume that the statistical relationship between historical model simulations and observations remains unchanged in the future, while the distributions are expected to change with time (Wood et al. 2004; Yang et al. 2018). These conventional bias correction methods may artificially distort the climate change signals, and methods that could efficiently preserve the changes are required (Bürger et al. 2013; Hempel et al. 2013; Li et al. 2010).

Quantile Delta Mapping (QDM) and Scaled Distribution Mapping (SDM) were developed from QM; they retain the advantages of QM and account for changes in the CDF between time periods (Cannon et al. 2015; Switanek et al. 2017). Both methods are appealing given that they are computationally and conceptually simple and thus have been applied in daily mean temperature and precipitation prediction as well as extreme indices of temperature and precipitation prediction (Eum and Cannon 2017; Lanzante et al. 2019; Qin and Dai 2022; Tong et al. 2020). The results have shown that QDM and SDM can effectively mitigate biases across the distribution of climate variables (Casanueva et al. 2020; Switanek et al. 2017). Despite their advantages for bias correction, two major research gaps of QDM and SDM are identified from previous literature. Firstly, the evaluation of bias correction methods applied in daily maximum and minimum temperatures is relatively limited. For example, Wang and Tian (2022) compared the Super Resolution Deep Residual Network deep learning model with QDM for multivariate bias correction of daily maximum and minimum temperatures. However, the work did not thoroughly evaluate the performance of QDM in capturing the temporal and spatial pattern of daily maximum and minimum temperatures. Secondly, most of the previous studies assessed bias correction methods using global climate outputs from CMIP5 or its previous generations (Switanek et al. 2017). CMIP6 models have a higher climate sensitivity than previous generations, where hotter temperature projections are expected (Forster et al. 2020; Gettelman et al. 2019; Li and Li 2022; Zelinka et al. 2020). Considering that the change in temperature remains uncertain, it is important to reassess the capability of bias correction methods with the latest CMIP6 simulations.

To fill the above-mentioned gaps, this paper presents a comprehensive study on evaluating the performance of QDM and SDM on bias correcting daily maximum and minimum temperatures with CMIP6 GCMs for Canada. The main purpose of the study is to assess how bias correction methods modulate climate change signals from different temporal scales and to identify the optimal bias-correction model with CMIP6 GCMs based on their ability in capturing the temporal and spatial pattern of daily temperatures for further application in regional impact studies. We stress that to our knowledge, intercomparative analysis of different bias correction methods in generating high-resolution simulations of daily temperatures is not available in the literature. Therefore, the study provides valuable information to improve the understanding of these techniques as well as build confidence in applying them in CMIP6 GCMs for generating reliable high-resolution temperature projections, which could serve as a foundation for further application in regional climate impact studies.

The structure of the paper is organized as follows: Sect. 2 describes the observations and CMIP6 GCM simulations used, bias correction methods, and evaluation metrics applied to assess the model performances. The performances of QDM and SDM methods and raw CMIP6 models in simulating daily maximum and minimum temperatures are shown in Sect. 3, where in-depth comparisons of bias correction methods and GCMs in capturing the temporal and spatial characteristics of daily temperatures are conducted. Section 4 summarizes the main findings and discusses the contribution and limitations of the study.

2 Methodology

2.1 Data

In this study, daily temperature simulations archived by the latest Coupled Model Intercomparison Project Phase 6 (CMIP6) are used for bias correction over Canada. Daily maximum temperature (Tmax) and daily minimum temperature (Tmin) were available from 11 GCMs with a resolution of 100 km when accessed in January 2022 (Table 1). These GCMs are available for the historical scenario and three representative future scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5), which could be further applied to generate future high-resolution projections. GCMs with a coarser resolution than demanded have not been assessed in this study considering the substantial biases introduced (Casanueva et al. 2020; Xie et al. 2015). Prior to bias correction, the GCM outputs are interpolated to a universal 1° × 1° grid using bilinear interpolation to facilitate comparisons across models. The multi-model ensemble mean is generated by merging the 11 GCMs with equal weights (Li et al. 2020). QDM and SDM are applied to the ensemble mean with the results labeled as “mean”. QDM and SDM are also applied to the 11 individual GCMs. For each bias correction method, an ensemble mean is created by merging the 11 bias-corrected models, with the results labeled as “ensemble mean”.

Table 1 Information of CMIP6 models used in this study

The observed Tmax and Tmin covering 1950–2012 over Canada with latitudes ranging from about 42°N to 83°N and longitudes ranging from approximately 53°W to 141°W on a 1/12° grid (∼ 10 km) are used to validate and bias correct the simulations. The NRCANmet observational dataset is obtained from Natural Resources Canada (Hopkinson et al. 2011; McKenney et al. 2011). The gridded dataset was developed from station observations at Environment Canada observing sites using the Australian National University Spline (ANUSPLIN) smoothing splines (Jeong et al. 2015; Werner and Cannon 2016). NRCANmet outperforms other gridded observation products and has been widely used over Canada for downscaling, trend analysis, and impact assessment (Islam and Déry 2017; Mandal et al. 2016; Singh et al. 2021). The GCM simulations are interpolated to the same grid as observations with bilinear interpolation. The study takes the period 1961–1990 as the calibration period for the model establishment and 1991–2010 as the validation period to evaluate the performance of bias correction.

2.2 Bias correction methods

QDM is one of the most widely adopted bias correction methods in recent studies. Compared to QM, QDM takes the difference between historical and future simulations into account. Thus, QDM not only preserves the changes in quantiles but also bias corrects future climate projections with the CDF of observations, which results in superior performances.

As defined in Eq. 1, the equation of QDM for temperature comprises two terms: the historical bias-corrected value term employing CDF of observations and the relative change term indicating the change in quantiles between historical and future periods.

$$\hat{x}_{m,p} (t) = \hat{x}_{o:m,h:p} (t) + \Delta_{m} (t)$$
(1)
$$\hat{x}_{o:m,h:p} (t) = F_{o,h}^{ - 1} \left\{ {F_{m,p}^{(t)} \left[ {x_{m,p} (t)} \right]} \right\}$$
(2)
$$\Delta_{m} (t) = x_{m,p} (t) - F_{m,h}^{ - 1} \left\{ {F_{m,p}^{(t)} \left[ {x_{m,p} (t)} \right]} \right\}$$
(3)

where \({\widehat{x}}_{m,p}(t)\) is the bias-corrected value of the model projected variable at time t; \({\widehat{x}}_{o:m,h:p}(t)\) is the historical bias-corrected value and \({\Delta }_{m}(t)\) is the change signal in quantiles, which are calculated with Eqs. 2 and 3, respectively; \({x}_{m,p}(t)\) is the model projected variable at time t; \({F}_{m,p}^{(t)}\) is the CDF established from the model projected data at time t; \({F}_{o,h}^{-1}\) is the inverse CDF from observed historical data; and \({F}_{m,h}^{-1}\) is the inverse CDF from modeled historical data. A more detailed description of QDM can be found in Cannon et al. (2015).
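Eqs. 1–3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the CDFs are empirical, a single CDF is built from the whole projection sample instead of the time-dependent \({F}_{m,p}^{(t)}\), and the function names (`empirical_cdf`, `empirical_quantile`, `qdm_additive`) are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Non-exceedance probability of x using the plotting position k / (n + 1)."""
    n = len(sorted_sample)
    rank = bisect_right(sorted_sample, x)      # number of sample values <= x
    return min(max(rank, 1), n) / (n + 1)      # keep the result inside (0, 1)


def empirical_quantile(sorted_sample, p):
    """Inverse of empirical_cdf, with linear interpolation between order statistics."""
    n = len(sorted_sample)
    pos = min(max(p * (n + 1) - 1, 0.0), n - 1.0)   # fractional zero-based index
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def qdm_additive(obs_hist, mod_hist, mod_proj):
    """Additive QDM for temperature, following Eqs. 1-3.

    Assumption: one empirical CDF per input sample (no moving time window).
    """
    obs_s, hist_s, proj_s = sorted(obs_hist), sorted(mod_hist), sorted(mod_proj)
    corrected = []
    for x in mod_proj:
        tau = empirical_cdf(proj_s, x)                   # F_{m,p}[x_{m,p}(t)]
        delta = x - empirical_quantile(hist_s, tau)      # Eq. 3: change signal
        x_obs = empirical_quantile(obs_s, tau)           # Eq. 2: mapped to observations
        corrected.append(x_obs + delta)                  # Eq. 1
    return corrected
```

For a model with a constant warm bias and a uniform projected warming, this sketch returns the observed values shifted by the projected warming, which is the change-preserving behavior the method is designed for.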

SDM is a trend-preserving parametric method that makes no assumption of stationarity. SDM is applied to each month separately. A normal probability distribution is fitted to the detrended observed historical time series, the detrended raw historical modeled time series, and the detrended raw future modeled time series. The scaling is calculated between the fitted raw future model distribution and the fitted raw historical distribution at each probability of the events occurring in the raw future model with Eq. 4.

$${\text{SF}}_{{\text{A}}} = \left[ {{\text{ICDF}}_{MODF} (CDF_{MODF} ) - {\text{ICDF}}_{MODH} (CDF_{MODF} )} \right] \times \left( {\frac{{\sigma_{{{\text{OBS}}}} }}{{\sigma_{{{\text{MODH}}}} }}} \right)$$
(4)

where \({\text{SF}}_{\text{A}}\) is an array of absolute scaling factors, and \({\text{ICDF}}_{MODF}\) and \({\text{ICDF}}_{MODH}\) are the inverse CDFs of the fitted future and historical model distributions, respectively. \({\text{CDF}}_{MODF}\) is the CDF for the raw future data. \({\sigma }_{\text{OBS}}\) and \({\sigma }_{\text{MODH}}\) are the standard deviations of the observed and raw historical data.

The recurrence intervals (RIs) for the three fitted distributions are calculated with Eq. 5, and the adjusted \({\text{RI}}_{\text{SCALED}}\) for the raw future model is then calculated with Eq. 6. The modified CDF is generated from \({\text{RI}}_{\text{SCALED}}\) (Eq. 7), and the bias-corrected values for the detrended future modeled time series are calculated with Eq. 8. Lastly, the trend of the raw future modeled time series is added back into the bias-corrected time series.

$${\text{RI}} = \frac{1}{{0.5 - \left| {{\text{CDF}} - 0.5} \right|}}$$
(5)
$${\text{RI}}_{{\text{SCALED}}} = \max (1,\frac{{RI_{IOBS} \times RI_{MODF} }}{{RI_{IMODH} }})$$
(6)
$${\text{CDF}}_{SCALED} = 0.5 + {\text{sgn}} (CDF_{OBS} - 0.5) \times \left| {0.5 - \frac{1}{{{\text{RI}}_{{\text{SCALED}}} }}} \right|$$
(7)
$${\text{BC}}_{INITIAL} = {\text{ICDF}}_{OBS} (CDF_{SCALED} ) + {\text{SF}}_{{\text{A}}}$$
(8)

where \({\text{RI}}_{MODF}\) is the RI for the raw future model, and \({\text{RI}}_{IOBS}\) and \({\text{RI}}_{IMODH}\) are the linearly interpolated RIs for the raw observations and raw historical model, respectively.
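The sequence of Eqs. 4–8 can be sketched as follows for a single month. This is a hedged illustration rather than the code used in this study: the inputs are assumed to be already detrended (the final step of adding the trend back is omitted), the probability clipping threshold `eps` and the helper names (`_resample`, `_clip`, `_ri`, `sdm_temperature`) are hypothetical, and the interpolated observed CDF is used for the sign term in Eq. 7.

```python
from statistics import NormalDist


def _resample(values, n):
    """Linearly interpolate a sorted sequence onto n evenly spaced positions."""
    m = len(values)
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        out.append(values[lo] + (values[hi] - values[lo]) * (pos - lo))
    return out


def _clip(p, eps=1e-6):
    """Keep probabilities away from 0 and 1 (eps is an assumed threshold)."""
    return min(max(p, eps), 1.0 - eps)


def _ri(cdf):
    """Recurrence interval, Eq. 5."""
    return 1.0 / (0.5 - abs(cdf - 0.5))


def sdm_temperature(obs_hist, mod_hist, mod_fut):
    """SDM for one month of detrended temperature data (each input needs >= 2 values)."""
    n = len(mod_fut)
    fit_obs = NormalDist.from_samples(obs_hist)
    fit_modh = NormalDist.from_samples(mod_hist)
    fit_modf = NormalDist.from_samples(mod_fut)

    order = sorted(range(n), key=lambda i: mod_fut[i])   # rank of each future value
    cdf_modf = [_clip(fit_modf.cdf(mod_fut[i])) for i in order]
    cdf_obs_sorted = [_clip(fit_obs.cdf(v)) for v in sorted(obs_hist)]
    cdf_modh_sorted = [_clip(fit_modh.cdf(v)) for v in sorted(mod_hist)]

    # Interpolate observed and historical quantities to the future sample length
    cdf_obs = _resample(cdf_obs_sorted, n)
    ri_obs = _resample([_ri(c) for c in cdf_obs_sorted], n)
    ri_modh = _resample([_ri(c) for c in cdf_modh_sorted], n)

    corrected = [0.0] * n
    for k, i in enumerate(order):
        p = cdf_modf[k]
        # Eq. 4: absolute scaling factor
        sf_a = (fit_modf.inv_cdf(p) - fit_modh.inv_cdf(p)) * (fit_obs.stdev / fit_modh.stdev)
        # Eq. 6: scaled recurrence interval
        ri_scaled = max(1.0, ri_obs[k] * _ri(p) / ri_modh[k])
        # Eq. 7: scaled CDF
        sign = 1.0 if cdf_obs[k] >= 0.5 else -1.0
        cdf_scaled = 0.5 + sign * abs(0.5 - 1.0 / ri_scaled)
        # Eq. 8: initial bias-corrected value, stored in the original time order
        corrected[i] = fit_obs.inv_cdf(_clip(cdf_scaled)) + sf_a
    return corrected
```

When the historical model differs from the observations only by a constant offset and the future model adds a uniform warming, the sketch returns the observations plus that warming, so the modeled change is carried through unchanged.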

2.3 Evaluation indicators

To quantify the performance of bias-corrected models and raw GCMs, a Taylor diagram is first employed to facilitate the comparisons between observations and models (Taylor 2001). The correlation coefficient (R value), root mean square error (RMSE), and standard deviation are shown in the figures. A model with a higher R value, a lower RMSE, and a standard deviation closer to that of the observations is considered to perform better. Subsequently, the bias and Mean Absolute Error (MAE) of QDM and SDM corrected models are analyzed to assess their ability to simulate spatial patterns and probability distributions (Cort and Kenji 2005; Wang and Chen 2014).
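The indicators named above can be computed as in the following sketch. It is an illustration under assumptions rather than the evaluation code of this study: the function name `evaluation_metrics` and the dictionary keys are hypothetical, population (divide-by-n) statistics are used, and the centered RMS difference is included because it is the quantity a Taylor diagram relates geometrically to R and the standard deviations.

```python
from math import sqrt


def evaluation_metrics(model, obs):
    """Bias, MAE, RMSE, correlation, and Taylor-diagram statistics for paired series."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    diffs = [m - o for m, o in zip(model, obs)]

    bias = sum(diffs) / n                                  # mean error (model minus obs)
    mae = sum(abs(d) for d in diffs) / n                   # mean absolute error
    rmse = sqrt(sum(d * d for d in diffs) / n)             # root mean square error

    std_m = sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    std_o = sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    r = cov / (std_m * std_o)                              # Pearson correlation

    # Centered RMS difference from the law-of-cosines relation used by Taylor diagrams
    crmsd = sqrt(max(0.0, std_m ** 2 + std_o ** 2 - 2.0 * std_m * std_o * r))

    return {
        "bias": bias,
        "mae": mae,
        "rmse": rmse,
        "r": r,
        "normalized_std": std_m / std_o,
        "centered_rmsd": crmsd,
    }
```

The total RMSE and the centered RMS difference are linked by RMSE² = bias² + centered RMSD², so a model can show a high R value and still have a large RMSE when its mean or its variability differs from the observations.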

3 Results

3.1 Overall performance of GCM, QDM, and SDM

A Taylor diagram is used to evaluate the overall performance of GCMs and the bias-corrected simulations. Tmax (Fig. 1a–c) and Tmin (Fig. 1d–f) derived from each original GCM and bias-corrected simulations with QDM and SDM are evaluated against the observations on an annual scale. For Tmax, the R values are 0.87 for GCMs and 0.88 for QDM and SDM. The R values are also relatively high for Tmin, with 0.86 for GCMs and 0.88 for QDM and SDM. QDM and SDM corrected ensemble means have a slightly higher R value than bias-corrected individual models with no significant difference. The ensemble means of GCM and bias-corrected models have higher R values (greater than 0.9) for Tmax and Tmin. The high correlation coefficients indicate that GCMs before and after bias correction are in good agreement with observations. However, GCM and SDM have large normalized standard deviations of around 3.7 and 3 for Tmax and Tmin, respectively, while QDM has a much smaller normalized standard deviation of around 1.1, which is closer to the observed standard deviation. This indicates that GCM and SDM projections have higher annual variability than QDM projections and observations, whereas QDM has a similar variation to observations. Among the raw GCMs, TaiESM1 has the highest normalized standard deviation of 4.3 and 3.3 for Tmax and Tmin, respectively, while other GCMs have a similar normalized standard deviation of approximately 3.6 and 3. This implies raw TaiESM1 has higher variation than other GCMs. As for the RMSE difference, GCM and SDM have similar values of 2.9 and 2.2 for Tmax and Tmin, respectively, with QDM having a smaller value of 0.5, which indicates QDM’s performance is better than GCM and SDM relative to observations. Overall, QDM simulations are more consistent with observations and show better performance in simulating Tmax and Tmin, while SDM shows a better ability in preserving raw GCM signals.

Fig. 1 Taylor diagram for Tmax and Tmin obtained from GCMs, QDM, and SDM simulations during 1991–2010 at the annual scale

3.2 Monthly performance of GCM, QDM, and SDM

Given that QDM and SDM are quantile-based methods, their performance in reproducing the observations for selected quantiles and standard deviation for each month is further evaluated. Figure 2 shows the monthly mean biases of Tmax (°C) of models for the 25th, 50th, and 75th percentiles, mean, and standard deviation. For GCMs, the biases for all percentiles share a similar seasonal pattern. NorESM2-MM and EC-Earth3 underestimate all analyzed statistics for all months. In the winter months, NorESM2-MM, TaiESM1, INM-CM4-8, and INM-CM5-0 underestimate Tmax for all percentiles with larger biases found in the lower percentile, while MRI-ESM2-0 overestimates Tmax, having larger biases at higher percentiles. In spring, MRI-ESM2-0, CMCC-ESM2, GFDL-ESM4, and INM-CM4-8 simulate higher values than observed Tmax for the 25th percentile with biases greater than 2 °C. In summer, GFDL-ESM4, INM-CM4-8, INM-CM5-0, and NorESM2-MM have the largest negative biases for the 75th percentile. The largest underestimation is found in July with a bias magnitude greater than 2.5 °C. As for standard deviation, the majority of GCMs have a similar variation to observations. GFDL-ESM4, INM-CM4-8, and INM-CM5-0 have smaller variations in the hot season while TaiESM1 has a larger variation in the cold season. The ensemble mean of GCMs has low biases for the median and mean and smaller variance across the year. Large overestimation for the 25th percentile and underestimation for the 75th percentile are also found across the year, especially in winter. Because averaging across members narrows the spread of the ensemble mean relative to each ensemble member, the mean and median show low biases while low quantiles are overestimated and high quantiles are underestimated.
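The narrowing of the ensemble-mean spread noted above can be illustrated with a simple idealization that is not part of the analysis in this study. Assume that the N ensemble members share a common variance \(\sigma^{2}\) and a common pairwise correlation \(\rho \le 1\) (both hypothetical quantities introduced here for illustration). The variance of the equally weighted ensemble mean is then

$$\operatorname{Var}\left( {\frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i} } } \right) = \frac{{\sigma^{2} }}{N}\left[ {1 + (N - 1)\rho } \right] \le \sigma^{2}$$

For N = 11 and an illustrative \(\rho = 0.5\), the standard deviation of the ensemble mean is about \(0.74\sigma\). Under this idealization the mean is unaffected by averaging, but the tails contract toward it, which is consistent with a warm bias at low percentiles and a cool bias at high percentiles.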

Fig. 2 Mean biases of Tmax (°C) of models for the 25th, 50th, and 75th percentiles, mean and standard deviation (for each pair of plots, the left panel shows the results for the raw models and the right two panels show the results for the QDM and SDM bias-corrected models)

QDM and SDM show a significant reduction in biases for all percentiles with a mean bias smaller than 0.7 °C. QDM and SDM show similar skills in removing the biases for the whole distribution of Tmax. Similar seasonal patterns are found in the 25th, 50th, and 75th percentiles, except for October, where QDM and SDM underestimate Tmax for all percentiles. Relatively large negative biases are found in March and April for higher percentiles with CMCC-ESM2, GFDL-ESM4, INM-CM4-8, and INM-CM5-0. EC-Earth3-Veg bias-corrected by SDM has the best performance with low biases for all percentiles across the year. In terms of standard deviation, QDM and SDM have a similar variation to the observed Tmax with slightly smaller variations in the hot season and slightly larger variations in the cold season. The ensemble means of QDM and SDM corrected models show a slightly better performance than the ensemble mean of GCMs, with low biases for mean and median, large overestimation for the 25th percentile, and underestimation for the 75th percentile and variance. Relatively large biases are found in winter for the 25th and 75th percentiles, with biases greater than 3 °C. In contrast, the QDM and SDM corrected ensemble means have low biases for all percentiles and variance, and share similar seasonal patterns with the other QDM and SDM corrected models. For the mean and median, QDM and SDM corrected ensemble means have similar biases to the GCM ensemble mean, with biases less than 1 °C. Low biases of QDM and SDM for all percentiles and standard deviation indicate that QDM and SDM can effectively adjust the distribution of simulations to be well-matched with the observed Tmax. In general, all CMIP6 GCMs bias corrected by the QDM and SDM methods are able to capture the seasonal and monthly pattern of Tmax and exhibit close similarity to the observations.

The monthly mean biases of Tmin (°C) of models for the 25th, 50th, and 75th percentiles, mean and standard deviation are shown in Fig. 3. Similar to Tmax, all percentiles share a similar seasonal pattern for raw GCM results of Tmin. Except for NorESM2-MM and TaiESM1, GCMs have positive biases for most months across the year, which indicates GCMs tend to overestimate Tmin. AWI-CM-1-1-MR, MPI-ESM1-2-HR, MRI-ESM2-0, and CMCC-ESM2 have large positive biases higher than 1 °C for all months and percentiles. In October–December in particular, the bias values are greater than 2 °C, and higher bias values are found in lower percentiles. This indicates that these four GCMs have more difficulty in simulating low values of Tmin. MRI-ESM2-0 and GFDL-ESM4 overestimate Tmin in the cold season, while AWI-CM-1-1-MR, MPI-ESM1-2-HR, INM-CM4-8, and INM-CM5-0 have relatively large positive biases in the warm season. INM-CM4-8 and INM-CM5-0 underestimate Tmin in winter for the 25th percentile. Large negative biases occur in November–March for TaiESM1, with biases ranging from −2 to −5.62 °C. TaiESM1 has larger biases in winter and for lower quantiles. The standard deviation is slightly higher than that of the observations. The GCM ensemble mean has large positive biases for the 25th percentile across the year, with biases ranging from 1.6 to 4.4 °C. For the 75th percentile, large negative biases are found in winter and low biases in other months. The standard deviation is underestimated, and positive biases are found for the mean and median for all months. In general, the biases of Tmin are greater than those of Tmax, which implies the GCMs are more skillful in simulating Tmax than Tmin in terms of percentiles.

Fig. 3 Mean biases of Tmin (°C) of models for the 25th, 50th, and 75th percentiles, mean and standard deviation (for each pair of plots, the left panel shows the results for the raw models and the right two panels show the results for the QDM and SDM bias-corrected models)

The biases for all percentiles are reduced dramatically for all GCMs corrected by QDM and SDM, with a mean bias smaller than 1 °C. The monthly biases of QDM and SDM for all quantiles are similar. This indicates that QDM and SDM exhibit comparable skills in terms of reducing the biases of the raw GCM projections for the whole distribution. QDM and SDM also exhibit similar seasonal patterns for all percentiles, except for October, where QDM and SDM underestimate Tmin for all percentiles. Larger negative biases are found in December–March for all percentiles. The largest negative biases occur in December with a mean bias of −1.3 °C. The performance of QDM and SDM in warm seasons is better than that in cold seasons. MPI-ESM1-2-HR and TaiESM1 corrected by QDM and SDM have a larger bias in cold seasons than other models. MRI-ESM2-0 bias-corrected by QDM has the best performance with low biases for all percentiles across the year. In terms of standard deviation, QDM and SDM outputs have a slightly smaller variation than observations in summer and autumn. Similar to Tmax, the ensemble means of bias-corrected models have low biases for the mean and median, a large positive bias for the low quantile, and large negative biases for the high quantile and standard deviation. This indicates that the ensemble means of bias-corrected models have poor performance in representing extreme events and variance. The biases of the ensemble mean of bias-corrected models are smaller than the biases of the ensemble mean of GCMs. Compared to all other models, QDM and SDM corrected ensemble means have the best performance with low biases for all quantiles and standard deviation. Overall, all GCMs corrected by QDM and SDM are closer to the observed Tmin than the raw GCMs are. QDM and SDM can significantly improve the prediction performance and have satisfactory skills in capturing the seasonal pattern of Tmin.

3.3 Spatial performance of SDM and QDM

The spatial distributions of the mean seasonal biases of GCMs and bias-corrected models for Tmax and Tmin are shown in Figs. 4–9. Figure 4 presents the mean Tmax biases of GCM simulations in summer (JJA) over Canada. Compared to observations, all GCMs underestimate Tmax in Yukon, with a bias magnitude greater than 3 °C. Except for CMCC-ESM2, GCMs also underestimate Tmax in regions with latitudes greater than 80°N, which are located in northern Nunavut. AWI-CM-1-1-MR, CMCC-ESM2, MPI-ESM1-2-HR, and TaiESM1 overpredict Tmax within the region of 120°W–90°W and 50°N–70°N, while other GCMs underestimate Tmax across Canada. Among the 11 GCMs, NorESM2-MM has the largest biases, with positive biases along the southern boundary and negative biases for the rest. The MAE of NorESM2-MM is 4 °C, followed by TaiESM1 at 3.1 °C. The MAE for other GCMs ranges from 1.7 to 2.2 °C. The ensemble mean of GCMs shows negative bias in Yukon, British Columbia, Quebec, and the Arctic and positive bias in the prairies. Except for Yukon and Quebec, the biases across Canada are less than 2 °C.

Fig. 4 Spatial distribution of daily Tmax seasonal bias (models minus observation) of the 11 GCMs and observations in summer

Fig. 5 Spatial distribution of daily Tmax seasonal bias (models minus observation) of the 11 QDM corrected models and observations in summer

Fig. 6 Spatial distribution of daily Tmax seasonal bias (models minus observation) of the 11 SDM corrected models and observations in summer

Fig. 7 Spatial distribution of daily Tmin seasonal bias (models minus observation) of the 11 GCMs and observations in winter

Fig. 8 Spatial distribution of daily Tmin seasonal bias (models minus observation) of the 11 QDM corrected models and observations in winter

Fig. 9 Spatial distribution of daily Tmin seasonal bias (models minus observation) of the 11 SDM corrected models and observations in winter

Figures 5 and 6 present the mean Tmax biases of the bias-corrected simulations using QDM and SDM in summer over Canada. Biases of GCMs are all reduced substantially after the application of QDM or SDM, with an average MAE less than 1 °C. The spatial biases for QDM and SDM are very similar, which indicates that QDM and SDM exhibit comparable skills in terms of reducing the biases of the raw models. There is an underestimation in the Tundra climate region of Canada, especially in northwest Quebec, and an overestimation over southern Canada. Most bias-corrected models underestimate Tmax over regions with latitude greater than 70°N, while CMCC-ESM2 and EC-Earth3 overpredict Tmax across Canada. The largest bias is found in MPI-ESM1-2-HR, while the remaining models present smaller biases with an MAE ranging from 0.6 to 0.8 °C. Compared to raw GCMs, NorESM2-MM shows the largest bias reduction, indicating that QDM and SDM are more effective at reducing the bias of NorESM2-MM than that of other GCMs. The ensemble means of QDM and SDM corrected models show a spatial bias pattern similar to that of the QDM and SDM corrected ensemble means, with cool bias in the prairies and warm bias in Quebec and the Arctic. Spatial patterns of bias arising consistently across models may be affected by the biases within the observation dataset, NRCANmet. Compared to station observations, NRCANmet has warm biases over the Prairies and cold bias to the west of the Rocky Mountains (Singh et al. 2022; Singh and Reza Najafi 2020). Compared to the GCM ensemble mean, the biases are greatly reduced for these bias-corrected ensemble means. In summary, the biases in CMIP6 GCMs for Tmax in summer are greatly reduced by QDM and SDM.

The spatial patterns of Tmin bias in winter (DJF) from the raw GCMs compared against the observations are shown in Fig. 7. Compared to Tmax, the biases of Tmin are larger, with the average MAE greater than 2 °C. This indicates that GCMs have better performance in simulating Tmax in summer than Tmin in winter. TaiESM1 and INM-CM5-0 show a negative bias that prevails throughout the country. For the other GCMs, negative biases are found in the middle latitudes and warm biases in southwestern British Columbia. AWI-CM-1-1-MR, EC-Earth3-Veg, MPI-ESM1-2-HR, and MRI-ESM2-0 overestimate Tmin over most regions of Canada. Among the GCMs, TaiESM1 has the largest underestimation and MRI-ESM2-0 has the largest overestimation over Canada with an average MAE of 4 °C. The average MAEs of other GCMs vary from 1.8 to 2.8 °C. The ensemble mean of GCMs shows a cool bias over the prairies, western Northwest Territories, Ontario, and Quebec and a warm bias in British Columbia and the southern Arctic. Compared with individual GCMs, the bias of the ensemble mean is greatly reduced.

Figures 8 and 9 show the spatial patterns of Tmin bias in winter for the QDM and SDM methods. Similar bias distribution patterns and magnitudes are found between QDM and SDM for Tmin in winter. The mean bias is reduced dramatically by QDM and SDM for all GCMs, with the mean MAEs of the bias-corrected models varying from 0.56 to 1.76 °C. This indicates that QDM and SDM are effective in terms of reducing the raw model biases of Tmin and exhibit comparable skills with no one method being superior to the other. The spatial pattern of Tmin biases in winter after bias correction is similar for all GCMs, where QDM and SDM corrected models underestimate Tmin over most of Canada. In the 50°N–70°N region in particular, negative biases are found to be greater than 2 °C in magnitude, except for northwest Quebec and eastern Nunavut, where most models present positive biases. The underestimations for high latitude regions are relatively small with an absolute value smaller than 1 °C. Among the models, TaiESM1 has the largest bias while MRI-ESM2-0 has the lowest bias. Compared to the raw GCM, MRI-ESM2-0 also shows the largest improvement with the most significant decrease in bias over Canada. This indicates that QDM and SDM are more effective at reducing the bias of MRI-ESM2-0 than that of other GCMs, and that bias-corrected MRI-ESM2-0 has the best performance in simulating Tmin in winter among individual GCMs. The ensemble means of bias-corrected models and bias-corrected ensemble means show spatial bias patterns similar to those of individual GCMs and have smaller biases across Canada. In summary, QDM and SDM are effective methods that can significantly reduce the biases of the original models for Tmax and Tmin. The bias-corrected results can be used for regional climate change analysis and climate impact assessment over Canada.

4 Conclusions

This study evaluated the performance of QDM and SDM for generating high-resolution and bias-corrected daily maximum and minimum temperature simulations with 11 CMIP6 GCMs over Canada. First, the overall annual performance of individual GCMs and ensemble means both before and after bias correction was evaluated. The performance of QDM and SDM was then assessed in terms of reproducing the monthly probability distribution and capturing the seasonal spatial pattern of Tmax and Tmin.

CMIP6 GCMs have shown overall consistencies with observations before and after bias correction. QDM-corrected simulations are found to have higher consistencies with Tmax and Tmin observations while SDM shows great ability in preserving raw GCM signals. In terms of monthly performance, most CMIP6 GCMs overestimate Tmin for most months across the year and are more skillful in simulating Tmax than Tmin with lower MAE. GCMs also show better seasonal performance for simulating Tmax in summer compared to Tmin simulation in winter. QDM and SDM can significantly reduce biases for all percentiles and standard deviation of Tmax and Tmin for all CMIP6 GCMs. Both methods show similar skills in removing the monthly biases and capturing the seasonal spatial pattern of Tmax and Tmin. The spatial pattern of Tmin biases in winter after bias correction is similar for all GCMs, where underestimation is found in most of Canada. Among individual GCMs, SDM-corrected EC-Earth3-Veg and QDM-corrected MRI-ESM2-0 are the optimal projections for Tmax and Tmin, respectively. The ensemble means of GCMs, QDM and SDM corrected models perform well at simulating mean and median of Tmax and Tmin, while they overestimate the low quantiles and underestimate the high percentiles and variance. QDM and SDM corrected ensemble means have the best performance of simulating Tmax and Tmin with low monthly biases and capturing the seasonal spatial pattern.

This study made a few novel contributions. Firstly, it shed light on the applicability and capability of bias correction methods in generating high-resolution daily Tmax and Tmin simulations, which provides a reference significance for bias correction studies around the world. The results of the study emphasize the importance and essence of applying bias correction methods before conducting climate change impact assessment with GCM outputs of Tmax and Tmin, especially for those focusing on extreme events and variance. Using the ensemble means of GCMs and bias-corrected models would discard the important information on internal variability that is present in the ensemble spread (Chen et al. 2019). Secondly, it initiated the discussion on identifying the best combination of bias correction methods and CMIP6 GCMs. This study demonstrates promising performance of QDM and SDM corrected models. The identification process used in this study could be applied to other regions across the world and the identified optimal combination could be applied to generate accurate and reliable projections with minimized biases and increase confidence in predicting future changes in temperatures. The reliable high-resolution projections generated will provide a scientific basis for formulating appropriate climate change adaptation and mitigation plans.

A few limitations are also highlighted here for further improvement. Firstly, this study investigates the performance of 11 CMIP6 GCMs with a 100 km resolution available for both the historical scenario and three representative future scenarios from low to high emission scenarios. The number of GCMs investigated is limited due to the fine resolution and data availability. Given that new CMIP6 GCMs and RCMs are being released, their ability in simulating Tmax and Tmin remains uncertain. Further studies could consider generating a multi-model ensemble with all available CMIP6 GCMs and RCMs of various resolutions or retaining climate models with good simulation capacity to obtain accurate and reliable projections. Secondly, the representative bias correction methods tested in this study are based on univariate distribution adjustment. Other bias correction methods can also be investigated for the projection of multiple climate variables. Thirdly, limited research has been conducted on the physical reasons for the performance difference between Tmax and Tmin. GCMs are more skillful in simulating Tmax than Tmin both before and after bias correction. Similar results are found in Egypt and Alberta, Canada (Cheng et al. 2017; Hamed et al. 2021; Masud et al. 2021). Further studies are needed to investigate why bias correction methods perform better in preserving the signal and spatial patterns of bias for Tmax than for Tmin.