1 Introduction

Extreme wintertime European cold waves (CW) are usually caused by the incursion of cold air masses from the North Pole or Eurasia into the mid-latitudes. European extreme cold waves are decreasing in frequency and severity in recent decades (Twardosz and Kossowska-Cezak 2016; van Oldenborgh et al. 2019; Twardosz et al. 2021), will decrease globally in frequency in a warmer future but will remain present in regions favored by CW such as Europe (Vavrus et al. 2006; Kodra et al. 2011; Sgubin et al. 2019). Actually, the recent decade was punctuated by CW in Europe during February 2012 (Planchon et al. 2015), January 2017 (Anagnostopoulou et al. 2017) and March 2018 (Karpechko et al. 2018). Moreover, the recent trends in Northern Hemisphere winter surface temperatures in past decades have not followed the mean warming trend in Europe (Cohen et al. 2012a, b). It has been proposed that the variability of the wintertime temperature in Europe is large compared to anthropogenic warming, which can hide the decreasing trend signal (van Oldenborgh et al. 2019). However, few studies explore extreme cold events as stated by the AR4, AR5 and AR6 of the Intergovernmental Panel of Experts on Climate Change (IPCC 2007, 2021; IPCC et al. 2013) despite their huge economic and social impacts (such as health issues, energy supply, agriculture, snow tourism or meteo-sensitive industry). Even in a global warming context, extreme cold events in Europe thus deserve much attention: in case of fuel and gas shortages (as the ones in winter 2023 due to the Russian-Ukrainian war began in 2022 (Abnett and Sharafedin 2022)), the negative consequences on the European economy, energy market and public health to cold waves can be magnified (Jahn 2013; Añel et al. 2017). In consequence, the assessment of the extreme cold waves in Europe as simulated by global climate models (GCM) remains critical to consistently model CW in the future.

Furthermore, in terms of physical mechanisms preceding cold waves, many forcings have been evoked in the literature. In general, extreme cold events in Europe result from persistent weather patterns, which are associated with Scandinavian blocking and high amplitude waves in the upper-level flow. In Europe, seasonal predictability of extreme temperature events appears to be low, and more for winter mainly because of surface temperature variability dominated by the chaotic fluctuation of westerly winds (Lavaysse et al. 2019; Kautz et al. 2022). To summarize, three factors can affect the seasonal and intra-seasonal dynamics of surface air temperatures: soil boundary conditions, radiation budget and large-scale circulations. In winter, snow on the ground, with its high albedo, high infrared emissivity, low thermal conductivity and soil-moisture sink, has important effects on the surface radiative budget and can further perturb the hydrologic cycle (Xu and Dirmeyer 2013). Snow-albedo feedback plays an important role in the strengthening, maintenance or severity of winter cold temperatures, locally and in remote regions (Qu and Hall 2006; Wang and Zeng 2010). Eurasian snow cover in October has been suggested to be associated with European cold wintertime temperatures (Cohen and Fletcher 2007; Cohen et al. 2007). The slow-evolving component of the sea surface temperatures is considered in every seasonal predictability meteorological hindcast but provides low skill for winter in general (Palmer et al. 2004; Doblas-Reyes et al. 2013). However, some studies suggest that warming in the Atlantic tropical ocean band before winter can induce a negative phase of the North-Atlantic Oscillation (NAO) and further cold winter temperatures in Europe (Czaja and Frankignoul 2002). Anomalously cold North-Atlantic waters (Cattiaux et al. 2011), specific tropical patterns of SST (Folland et al. 2012), sudden stratospheric warming events (Scaife et al. 2017; Kautz et al. 2020), El-Niño Southern Oscillation teleconnections (Fraedrich and Müller 1992) or fall Arctic sea-ice loss (Honda et al. 2009; Petoukhov and Semenov 2010; Tang et al. 2013) albeit controversial (Blackport et al. 2019) are put forward to drive extreme cold European temperatures. Incoming shortwave radiation is also hypothesized as a potential factor of temperature variability: some modeling studies showed a correlation between harsh winters in Europe and less ultraviolet and/or longwave radiation (Ineson et al. 2011; Scaife et al. 2013). Finally, persisting “blocking” weather conditions and polar or Eurasian origins of cold air masses—closely linked to winds, pressure and geopotential variables—are also suggested as potential drivers of the occurrence of cold waves (e.g. Walsh et al. 2001; Vavrus et al. 2006; Sillmann et al. 2011; Lavaysse et al. 2018, 2019). Even if some case studies of singular cold waves in Europe made an attempt to understand the underlying large-scale mechanisms or land–atmosphere feedbacks (Anagnostopoulou et al. 2017; Kautz et al. 2020; Demirtaş 2022), the broad regional climate picture and the physical mechanisms behind the extreme European cold waves needs further study.

Here, we aim to provide a systematic assessment of the main climate variables associated with CW in the present-day period in reanalysis and identify the biases in global climate models from the Coupled Model Intercomparison Project Phase 5 (CMIP5). Given the abovementioned research gaps, we aim to address the following scientific questions on CW:

  • What are the climatic signals associated with CW and to which extent are they robust?

  • Are global climate models able to reproduce these climatic signals associated with CW, during and before these events?

  • What are the links between present-day model biases in CW characteristics and these climatic signals?

Section 2 describes reanalysis data, model simulations, indices of CW and variables associated with the extremes used in this study. In Section. 3, an analysis of patterns and statistics is presented. We discuss this assessment in light of the model biases and the simulated associated climatic signals. Ultimately, conclusions and perspectives on the poorly known climate signals of CW are presented in Section. 4.

2 Data and methods

2.1 Climatic variables

Using CMIP standard abbreviations, ten climate variables have been selected on a daily timescale: snow cover (SNC) (see Snow cover below), snow water equivalent (SNW), surface downwelling shortwave radiation (RSDS), surface downwelling longwave radiation (RLDS), total cloud fraction (CLT), sea surface temperature (SST), sea level pressure (PSL), 500 hPa geopotential height (ZG), surface eastward (UA) and the northward component of wind velocity vector (VA). For all the climate variables except ZG, the studied sector (called “European” region hereafter) has been defined as [− 10°W, 30°E; 35°, 70°N] and for ZG as North-Atlantic sector [− 80°W, 30°E; 30°N, 70°N], given that quasi-stationary states (so-called “North-Atlantic weather regimes” (Vautard 1990)) of European climate variability can be detected on a larger scale.

Two reanalysis datasets have been used to calculate climate variables anomalies: Rutgers University Global Snow laboratory (hereafter “GSL-NASA”) for snow cover (SNC) (Brown and Robinson 2011) spanning from January 1st, 1967 to March 31st, 2012 and ERA-Interim (Dee et al. 2011) for all the other nine variables in the period 1979–2012. The European gridded observational dataset E-OBS used daily mean surface air temperature (TAS) spanning from 1950 to 2005 (Haylock et al. 2008), to determine days corresponding to CW useful to compute composite anomalies during and before CW of the ten above-mentioned variables (see below Calculation of composite anomalies). For each variable, the maximum number of CMIP5 models available for the historical period (1950–2005) are used (between 17 and 35 models depending on the variables, see Table 1). The first run (called r1i1p1 in all the models except for GISS: r6i1p1) of each historical experiment was arbitrarily chosen.

Table 1 Description of 37 CMIP5 models, reanalysis datasets, climate variables used for historical period (1950–2005) and number of CW (CWN) per model.

We point out some issues in data with little consequences on the results. First, the CLT variable for December 1981 in ERA-Interim is not available: hence, CLT was exceptionally linearly interpolated in time during this month. Second, geopotential variable (zg standard CMIP5 variable abbreviation) for HadGEM2-ES model on the historical period is not available from 1st January 1950 to 31th December 1979. Therefore, we decide to remove the results of this model for the ZG variable. Third, missing land/sea masks in some CMIP5 models (e.g. ec-earth or inmcm4) are reconstructed using ocean variables. Fourth, variables of some CMIP5 models with unspecified grid type are previously bilinearly interpolated based on the finest CMIP5 model resolution (i.e. MIROC4h).

2.2 Snow cover data

Land snow cover is one of the slowly varying components of the climate system. Its relevance in winter is based on its insulating and reflecting properties. As European intra-seasonal cold events are considered in this study, both daily and European scales for snow cover are needed, which is not so common for snow cover datasets. The GSL-NOAA weekly snow cover extent of Northern Hemisphere dataset spanning from 4th October 1966 to 5th August 2013 from Rutgers University has been used with satisfactory results in several climate studies (Brown and Derksen 2013; Brasher and Leathers 2021). A linear interpolation towards the daily timescale has been performed at each gridpoint.

2.3 Definition of the Cold Wave index

No unanimous definition of CW (so-called “Cold Wave index”) exists, but some appropriate properties for observations and climate models assessment should be gathered: filters only extreme persisting cold events but with reasonable sample size, be related to meteorological variables and impact studies (e.g. epidemiological or energy-related studies) and allow adequate model/data comparison. In this study, we use here a definition inspired by Vavrus et al. (2006) that fulfills these properties. We define a CW as the occurrence of two or more consecutive days in winter (November to March, NDJFM) during which the continental mean daily surface air temperature is at least two standard deviations (2σ) below the continental wintertime temperature climatology (over 1950–2005). The value of this threshold at local level is shown in Supplementary Figure S1a, going from 8 °C in Southwestern Europe to − 20 °C in Northeastern Europe, with very few changes across reanalysis datasets (Supplementary Figure S1b). A similar index was already used in several studies (Kolstad and Bracegirdle 2008; Cattiaux et al. 2013; Twardosz and Kossowska-Cezak 2016). Hence, for each dataset, we obtain a catalog of CW and extreme cold days corresponding to these CW, composed of occurrence and temperature anomalies over the 1950–2005 period. This definition allows studying the CMIP5 models or E-OBS dataset variability in their own climate (i.e. with respect to their wintertime climatology and standard deviation).

Note that with this definition, almost all cold waves of very heavy to medium intensity defined by Meteo-France are captured in the period 1950–2005 of E-OBS. Moreover, temperatures below these 2σ thresholds are potentially responsible for an increase of more than 10% of cold-related mortality in Europe (Supplementary Figure S2), according to regional epidemiological studies. This aspect makes it very useful as an impact-oriented and coherent extreme index. The number of CW events is noted hereafter CWN. We also calculate CWF (“Cold Wave Frequency”) as the total number of days associated with CW, CWD (“Cold Wave Duration”) as the duration in days of the longest cold wave of the period and CWI (“Cold Wave Intensity”) as the sum of σ exceeding the 2σ threshold of all days associated with CW.

2.4 Calculation of composite anomalies

For each of the above-mentioned variables, daily wintertime climatologies from November to March (NDJFM) are computed at each gridpoint and daily anomalies of these variables. Finally, at each gridpoint, the composite anomaly of a given variable is calculated during or several days before CW. Such composite anomalies are calculated during (denoted “D + 0”, i.e. during all days characteristics of the CW) and also before the CW at lag 3 (D-3), 7 (D-7), lag 15 (D-15), lag 30 (D-30) and lag 60 (D-60) (respectively, 3, 7, 15, 30 and 60 days before). The computation is made on all the corresponding days before each day of CW except days falling before November. In general, for D-30 (resp. D-60), in E-OBS and CMIP5 models, less than ~ 6% (resp. ~ 15% on average) of days are removed.

Ensemble CMIP5 results are obtained considering the weighted mean by models of the same institute and among all the available CMIP5 models for the given variable (e.g.for SNC, the three MPI simulations considered together have an equal weight compared to the unique institute model CNRM-CM5).

We also test whether composite anomalies are “robust” if (1) a composite anomaly is significant with 95% confidence (Student t test) and, for CMIP5 models ensemble mean, (2) if more than 80% of them exhibit the same sign anomaly. In maps, robust signals are represented through stippled or contoured areas.

Then, the composite anomalies will serve to (1) draw a wider regional climate picture during and before CW, (2) assess global climate models in their capacity to represent this picture and (3) link present-day model biases in associated climate variables to CW and biases in CWN.

3 Results and discussion

3.1 Patterns during and before a cold wave

A systematic overestimation of the number of cold waves is found (+ 47%, CMIP5 vs. E-OBS) in the global climate models. The computation of the number of CW for each CMIP5 model and for E-OBS is shown in Table 1 (last column). The number of selected CW—from continental average temperatures—for the 1950–2005 period varies from 41 to 80 in CMIP5 models compared to 41 events in E-OBS. Moreover, we find that the climate models overestimate the number of extreme cold days associated to CW (by 11%; CWF), the intensity (by 28%; CWI) but underestimate the maximum duration of cold wave (by 40%; CWD) (see Supplementary Figure S3).

With a local definition, the systematic overestimation of CWN is mainly located in Central and Southern Europe (< 50°N), as shown in Fig. 1a–c.

Fig. 1
figure 1

Comparison between the number of local winter cold waves (CWN) in (a) the average of CMIP5 models and in (b) E-OBS observations over the common period 1950–2005; (c) Relative percentage error between the CWN simulated by CMIP5 and E-OBS models (units: %). Grid points with 80% of the CMIP5 models showing the same percentage difference sign are marked with a black symbol. Each model was bilinearly interpolated on the E-OBS grid after computing CWNs

This European bias in CWN could be related to the inadequate representation of physical processes during or before CW. The analysis below through the assessment of 10 climate variables in models and reanalyses aims to clarify, among others, the existing biases of associated variables to CW.

Figures 2, 3, 4, 5, 6, 7 and 8 present the results of composite anomalies for CMIP5 models in Europe and reanalysis datasets for associated variables (SNC, SNW, RLDS, RSDS, CLT, SST and ZG) during CW events. Supplementary Figure S4 shows the comparison of the spatial patterns between CMIP5 ensemble mean (left part) and reanalyses (right part) for each variable (sorted by row) before the CW.

Fig. 2
figure 2

Anomalies of wintertime NDJFM snow cover (SNC) variable associated with CW (D + 0) for CMIP5 models and reanalysis (GSL-NASA here). For CMIP5 and reanalysis, dots symbolize significant anomaly (p-value < 0.05, Student T-test). For ensemble-CMIP5, contour area groups anomalies shared by at least 80% of the models (equal weight by institute), red (resp. blue) for positive (negative) anomalies. Maximum number of climate models available is used (see Table 1). Units: %

Fig. 3
figure 3

Same as Fig. 2 but for SNW (ERA-Interim for reanalysis dataset). Units: mm

Fig. 4
figure 4

Same as Fig. 2 but for downward shortwave radiation (RSDS) (ERA-Interim for reanalysis dataset). Units: W/m.2

Fig. 5
figure 5

Same as Fig. 2 but for downward longwave radiation (RLDS) (ERA-Interim for reanalysis dataset). Units: W/m.2

Fig. 6
figure 6

Same as Fig. 2 but for total cloud fraction (CLT) (ERA-Interim for reanalysis dataset). Units: W/m.2

Fig. 7
figure 7

Same as Fig. 2 but for Sea Surface temperatures (SST) (ERA-Interim for reanalysis dataset). Only a subset of models is shown. Units: °C. Color bars span from − 2 to 2 °C

Fig. 8
figure 8

Same as Fig. 2 but for geopotential height at 500 hPa (ZG) (ERA-Interim for reanalysis dataset). Units: m. Color bars span from − 150 to 150 m

Snow cover during CW is anomalously positive with an average difference of about ~ 12.0% during CW mainly located in Western Europe both for models and reanalysis (Fig. 1). Indeed, in more than 75% of CW cases, a positive anomaly of SNC is visible in this region. Almost all the CMIP5 models satisfactorily and robustly represent during CW on average (~ 11.7%) and the patterns (r = 0.64) of this snow cover anomaly, albeit shifted towards East. Some models as GISS-E2-R give opposite patterns (r < 0) and incoherent results for SNC. This can be linked to its well-known slowdown of the North-Atlantic thermohaline circulation that tends to bring cold air masses to continental Europe (Jacob et al. 2005). Then, CW in this model could be essentially associated with specific atmospheric circulations more than snow amount.

Furthermore, the positive SNC anomaly before CW is robustly present at least 7 days before (Supplementary Figure S4 for the spatial patterns and Fig. 9 for the regional composite averages). A large majority of CMIP5 models, 80% at D-7 and all at D-15, overestimate SNC. This could indicate snow cover slightly too persistent before CW. Although a weak signal, SNC is overestimated across Northwestern Russia and Baltic countries until 30 days before, a region particularly favored by CWN (Fig. 1a–b) and possibly intense snow albedo-temperature feedbacks that enhance it.

Fig. 9
figure 9

Inter-comparison of CMIP5 models vs. reanalysis of spatial average bias (ordinates) and correlation patterns (size of circles and color bar) for each of the 10 climate variables (panels) before and during CW (abscissa). Navy blue squares represent the spatial average composite anomalies of reanalysis (ERA-Interim or GSL-NOAA), circles are for each CMIP5 model and heights of the bar for ensemble-CMIP5 mean (equal weight for each institute). Black uncertainty segment on each bar symbolizes the 10th/90th quantile values of composite anomalies among the CMIP5 models. The size of the circles (CMIP5 model) and the color of the bar (ensemble-CMIP5 mean) indicate the value of the spatial correlation of CMIP5 models patterns compared to corresponding reanalysis ones (see legend of circles and color in upper panels). The dotted line indicates the zero value of the anomaly

Some similar patterns are shared with snow water equivalent SNW (Fig. 3) through a robust continental positive anomaly in Central and Western Europe. Those aspects are well captured by models (r = 0.53 and + 5.3 mm in CMIP5 vs. EOBS with + 7.4 mm of SNW in average). ERA-Interim exhibits a robust positive persistent anomaly before (until D-30) and during CW around the Carpathians mountains. This pattern is not present in any CMIP5 model including the ones with the finest resolutions.

Figures 4 and 5 show the average daily composite anomaly of incoming shortwave and longwave radiation respectively, during CW in Europe. In general, a positive (resp. negative) anomaly is present in SW, especially in Northern (resp. Southern) regions such as Scandinavia and Northern Germany (resp. Mediterranean regions). These patterns are correctly simulated in general by CMIP5 models at D + 0 (Fig. 4 for Ensemble-CMIP5; r = 0.72). Spatially, the gradient of incoming shortwave (difference between RSDS values above and below 47.5°N, coherent with observed patterns at D + 0; see Fig. 9) of ~ 11.4 W/m2 is observed during CW. More than 90% of CMIP5 models tend to underestimate this gradient as it is only about ~ 7.8 W/m2 on average. Patterns are in general consistent with the total cloud fraction composite anomalies (Fig. 6) that suggest in general fewer clouds in northern Europe. For incoming longwave radiation patterns (Fig. 5), results are stronger and more significant as a negative anomaly of about ~  − 20 W/m2 during CW is noticeable.

Northeastern regions of Europe can experience (> 90% of the CW cases in ERA-Interim) robust negative anomalies of incoming longwave radiation during CW, by more than − 50 W/m2 on average (Fig. 5). Moreover, in a vast majority of model results presented in Fig. 5, oceanic areas exhibit a lower deficit of longwave radiation than continental areas. This is also the case in ERA-Interim. This important point suggests the influence of soil conditions during CW and not only the influence of atmospheric circulations. In this way, the pattern of ensemble-CMIP5 in Fig. 5 is instructive as CMIP5 models represent the maximal deficit in the same region of CWN maximum in Europe (Western Russia/Belarus, see Vavrus et al. 2006; Cattiaux et al. 2013; Quesada 2014); see Fig. 1a–b). Thus, less energy is incoming to the soil during and before CW in almost all CMIP5 models which can be linked to more sensitive land/atmosphere feedbacks that enhance or trigger extreme cold events as CW (i.e. more CWN). The sky with fewer clouds leads to a strong diabatic cooling which is consistent with several cold spells in Europe (Kautz et al. 2022). Another explanation is that over the Baltic sea, the formation of clouds is enhanced due to the destabilization of the boundary layer given warmer temperatures of surface water than advected cold air masses. These clouds could lead to an increase in thermal incident radiation at the surface of the Baltic Sea with respect to surrounding continental areas.

Moreover, Fig. 9 shows that this deficit of incoming longwave radiation is still present and robust more than 7 days before CW. As this deficit is stronger in more than 90% of the CMIP5 models than in ERA-Interim, it suggests fewer clouds than observed and probably fewer high clouds (thin and highly radiative). Dipole patterns (deficit of clouds in Northern regions and slightly more clouds around the Mediterranean sea) are almost correctly simulated by CMIP5 models (r = 0.44 at D + 0) but with an underestimation with a quite similar gradient anomaly (see Fig. 9). However, this signal is poorly significant before CW in general.

During and before CW, solar irradiance changes tend to be less important than cloudiness deficit impact as RSDS anomaly is positive but RLDS anomaly is much more negative: + 11.4/ − 19.9 W/m2 in ERA-Interim vs. 7.8/ − 24.4 W/m2 in Ensemble-CMIP5, respectively.

In Fig. 7, European SST composite anomalies patterns are largely and persistently negative during CW, with spatial anomalies of about ~  − 0.33 °C. Until 7 days before CW, near-coastal areas such as the North and Baltic Seas are robustly colder-than-average which is well captured by almost every CMIP5 model (Fig. 7).

However, CMIP5 models underestimate (95% of CMIP5 models during CW) the negative anomaly of SST during and before CW (~ − 0.33 °C vs. − 0.20 °C for ERA-Interim). Roesch (2006) suggests that accurate sea surface temperatures in GCM models are important to reproduce trends in snow cover extent that can further influence temperature extreme distributions. Kushnir et al. (2002) particularly pointed out a more important role of extratropical SST in winter cold temperatures than operational models could capture. Then, this low-frequency phenomenon could trigger or at least help to maintain CW on the continent.

Dynamic circulation representation is studied in Fig. 8 through inter-comparison of patterns of geopotential height composite anomalies at 500 hPa (ZG) during CW.

Composite patterns of ZG, which are more related to CW, do not project exactly on the position of NAO- pattern (Ouzeau et al. 2011) although it shares some similar characteristics of high geopotential height anomalies around Iceland (maximum of 150 m) and negative ones around the Azores anticyclone (minimum of − 160 m). It appears to be a mixture between Blocking (westernmost center of positive anomaly) and NAO- (easternmost center of positive anomaly), two weather regimes that further enhance CW during winter (e.g. Cattiaux et al. 2013). The difference in geopotential height anomalies between the negative (green and blue in Fig. 8) and positive (pink and red in Fig. 8) values is displayed in Fig. 9 (ZG variable).

On the one hand, this difference indicates the weather regimes’ strength associated with CW and allows strength comparison with CMIP5 models. On the other hand, spatial correlation r is very useful to compare the location of maximum and minimum centers. The anomaly of ZG gradient in CMIP5 models is fully comparable to the observed anomaly (less than 5% of difference).

Moreover, patterns of ZG during CW are very well simulated by every CMIP5 model (r > 0.7 for all CMIP5 models). Patterns in ERA-Interim are robustly persistent for more than 7 days preceding CW. Blocking or NAO-–like conditions are robustly appearing in more than 75% of the cases before CW (Supplementary Figure S4). However, 15, 30 and 60 days before CW, patterns of geopotential heights are quite chaotic and non-systematic while CMIP5 models tend to conserve the dipolar pattern for a large majority of them (> 80%) at least 30 days before. These characteristics are shared by PSL composite anomalies, very similar to geopotential patterns (Supplementary Figure S5). Wind speeds and directions (using UA and VA composites, see Supplementary Figure S5) during CW are also well captured by the global climate models.

During CW, patterns of sea level pressure anomalies are zonal with positive (resp. negative) anomalies above (below) ~ 48°N with maximum (minimum) around Northwestern Scandinavia (Mediterranean sea). The same metric used for ZG is used for PSL: positive/negative gradient during CW is around 1625 Pa while CMIP5 models tend to underestimate (for more than 80% of them) this value with 1300 Pa on average. These patterns are less persistent in ERA-Interim (until 7 days before) than in CMIP5 models (Supplementary Figure S4 and Fig. 9). CMIP5 models simulate very well PSL patterns during CW (r > 0.9 except fgoals-g2 with 0.77).

Surface wind speed anomalies are closely linked to dynamical patterns of ZG or PSL. Anomalies of some m/s, with a direction North-East/South-West along blocking pressure structures, bring cold air masses from Siberia, Eurasia or the North Pole (not shown). Ensemble-CMIP5 is very good to simulate these circulations (r ~ 0.8 on average during CW) as well as for PSL and ZG. A lower deficit of eastward surface wind is simulated which can be related to more efficient advection of eastern cold air masses to Western Europe. Models simulate also correctly wind anomalies before CW (as shown in Supplementary Figure S5 and Fig. 9). Seven days before CW (D-7), a strong blocking pattern (gradient of geopotential height is about ~ 85 m) and a regional deficit of − 5 W/m2 in incoming longwave are robustly observed, variables that are particularly linked to CW and among the most robustly present before CW (see Fig. 8). Far before CW (D-30, D-60), no robust patterns of climatic signals exist except for snow conditions with anomalously positive snow height (e.g. around Carpathian regions).

To summarize the results of previous figures, it should be noted that CMIP5 models are satisfactorily able to reproduce spatial patterns: the average spatial correlation by variable during CW is greater than 0.4 and still better for dynamical and radiative-related variables with r > 0.72. In particular, CMIP5 models correctly simulate the sign of anomaly during and before CW (at least 7 days before) as CMIP5 models anomalies for all variables are of the same sign as for ERA-Interim.

Hence, on average, there is no incoherent anomaly sign in CMIP5 models of associated climatic signals to CW but rather some robust underestimations (incoming longwave and shortwave radiation for > 87% and > 90% of models respectively, North–South gradient in sea-level pressure PSL for > 83% of models, snow water equivalent for > 78% of models) or overestimations (European sea temperatures SST for > 83% of models) during CW that impact the radiative balance at the surface and then the wintertime surface temperatures (Fig. 9). There are also important processes that influence the simulated number of CW as incoming energy, through longwave and shortwave radiation, is underestimated several days before.

To give more insights into potential climatic signals of CW, we calculate the ratio (R) between the sum of the climate variable anomalies after CW (combinations of D + 3, D + 7, D + 15, D + 30, D + 60 days; numerator) vs. the sum of the climate variable anomalies before CW (combinations of D-3, D-7, D-15, D-30, D-60 days; denominator) for climate models and reanalysis (see Table 2). In this way, for a given variable, R <  < 1 indicates that associated climate variable anomalies rather precede CW than follow them, which suggests a potential control on the occurrence of these extreme events. We calculate the ratio for different timesteps to test the systematic aspect of the signal (e.g. R7-60 represents the ratio between the sum of the climate variable anomalies 7, 15, 30 and 60 days after vs. before CW). For instance, a R3-60 equal to 0.22 for the variable VA means that the meridional wind anomalies more than 3 days after the cold waves are on average 22% of the ones before the cold waves (i.e. 4.5 times less important).

Table 2 Ratio (R) between the sum of climate variable anomalies after CW and the sum of climate variable anomalies before CW for ERA-Interim and the ensemble-CMIP5 mean. RX-60 represents the ratio taking into account the sums of climate variable anomalies X days before/after until 60 days before/after. R is dimensionless and can vary from 0 to infinite. R lesser than 1 (i.e. climate anomalies precede more than follow CW) display a potential early warning signal and are highlighted in bold

Table 2 shows that, first, the European Snow cover and snow depth are unlikely to have caused occurrences of CW because the anomalies before are much lower than the ones following these events (factor 3 to 30; columns SNC and SNW for ERA-interim). All the more so as CMIP5 models underestimate the winter snow cover in Europe, particularly from December to February (Supplementary Figure S6). Second, to a lesser extent, sea surface temperatures and solar irradiance anomalies are also unlikely to cause CW occurrences (factor 2 to 6; see Table 2) which tends to reduce the role of the surrounding ocean or direct solar forcing as a substantial driver of CW occurrences. However, third, the advection of cold air masses, mainly meridional (VA), emerges as a potential precursor signal of a majority of CW (as R < 1 for all given timesteps; see values of the column VA for ERA-Interim and ensemble-CMIP5). This latter feature is consistent with studies using other methodologies (Bieli et al. 2015; Sousa et al. 2018). These aspects are represented by CMIP5 models in general although they do not tend to simulate enough climate anomalies after CW compared to before CW. To a lesser extent, pressure and geopotential at 500 hPa positive anomalies slightly more precede than follow CW (columns PSL and ZG are < 1 in general).

3.2 Present-day biases in associated variables to CW vs. CWN

To go further in explaining CW bias (see Section. 3.1), we study here the relationship between CWN and associated climate variables to CW. Figure 10 displays the bias (CMIP5 model minus reanalysis) between each associated variable and CWN bias. Note that the number of cold wave (CWN) is calculated considering the daily spatial average temperatures at which we apply the Cold wave index (see Table 1). Significant correlations are obtained for local land conditions (snow water equivalent SNW) and dynamical variables (ZG and PSL metrics and surface wind speed UA, VA). Although quasi-systematic underestimation of RLDS, a model that reduces this bias w.r.t. ERA-Interim does not seem to reduce CWN bias. For dynamical conditions, correlation is coherent: strengthened pattern of blocking or North–South gradient in pressure sea level anomalies (increasing ZG and PSL), which goes along with enhanced North-Easterly circulations (decreasing UA and VA), in turn leading to more extreme CW and higher CWN. Present-day biases in the number of CW appear to be strongly associated with biases in snow height, geopotential (or sea level pressure) or winds.

Fig. 10
figure 10

Scatterplot of present-day bias in composite anomalies during CW (D + 0) of climate variables (each panel: CMIP5 model bias minus reanalysis bias) vs. Cold Wave Number Bias (CWN). The coefficient of correlation is indicated in each panel; bold indicates significant correlation to the 95% confidence level. Dotted lines represent zero value of bias. Values are indicated by model names

Finally, the influence of the model resolution on the representation of climate variables associated with CW was also tested (not shown). We define a basic metric to test each model, i.e. the sum of absolute normalized biases for each variable. Only models with at least 8 variables were used.

A slight positive impact of model resolution on results is found—i.e. better resolution tends to give better CW statistics—in general. For the following [lower/higher resolution model version] couples of the same institute models, the ranks of the abovementioned metric across all the models give: 5/3 for [mpi-esm-lr/mpi-esm-mr], 14/1 for [miroc5/miroc4h], 4/13 for [bcc-csm-1/bcc-csm-1–1-m], 12/6 [cmcc-cesm/cmcc-cm], 16/11 [gfdl-cm3/gfdl-esm2g] and 19/18 for [ipsl-cm5b-lr/ipsl-cm5a-mr]. Thus, in five cases out of six, a rank improvement is effectively noted while model resolution is improved. This is in phase with Cattiaux et al. (2013) showing, in the same model differing only by the grid, that CWN averages and patterns are slightly improved.

Furthermore, it is worth noting that, in general, models that correctly simulate the variable anomalies 3 days before the CW are also good to simulate it during the CW (model ranks at D + 0 and D-3 are significantly correlated: r ~ 0.72, p value < 0.05).

4 Conclusions

During and before extreme cold waves in Europe (CW), dynamical, radiative, atmospheric and land/sea conditions variables are assessed in global climate models and reanalysis. We find a systematic overestimation of the number of extreme cold waves (CWN) by the global climate models. This is coherent with the results on CMIP3 models (Quesada 2014) and has been investigated in Cattiaux et al. (2013) through the specific impact of IPSL model resolution only. We study whether models and reanalyses identify common robust climatic features before those weather events to contribute to their adequate predictability. Moreover, some potential explanations for the CWN biases are studied.

Several key results have been found to answer our three scientific questions mentioned in the Introduction.

First, about the identification of the robust climatic signals associated with the European extreme winter cold waves, we find that they are on average associated with + 12% of snow cover mainly in Central and Eastern Europe, + 7 mm of snow water equivalent maximum in Central Europe, an increase in solar irradiance mainly in Northern Europe largely compensated by a strong deficit in continental incoming longwave (presumably linked to less high clouds), slightly cooler (− 0.3 °C) coastal oceanic temperatures and a mixture of strong Blocking and NAO- pattern accompanied by above-than-normal north-easterlies circulations (> 1 m/s) bringing polar or Eurasian cold air masses. During and before CW (until D-15), solar irradiance anomalies are positive on average (< 10 W/m2) which contradicts arguments of direct solar negative forcing for most extreme cold events in winter. These anomalies tend to be less important than the longwave deficit impact (− 20 W/m2). Thus, a net deficit of incoming radiation is observed mainly located in North-Eastern continental regions suggesting negative soil-atmosphere feedbacks that reinforce the number of CW.

Second, about the ability of the climate models to reproduce the climatic signals associated with CW, we find that they correctly simulate the sign of anomaly during and before CW (at least 7 days before, D-7). They also correctly reproduce the corresponding spatial patterns: the average spatial correlation by variable during CW is greater than 0.4 and for dynamical and radiative-related variables r > 0.7.

Third, about the potential explanations of the model biases, our results suggest that the robust underestimations (incoming LW and SW, North–South gradient in sea-level pressure PSL and snow water equivalent) and overestimations (European sea temperatures SST) during CW tend to increase the occurrence and/or severity of extreme wintertime temperatures.

In conclusion, this pioneering study provides a regional climate picture during and before CW in Europe and can help as an observational basis and benchmarking model diagnostics for further seasonal to intra-seasonal predictability studies. We stress that the diagnostics shown here do not per se demonstrate causality between a particular variable and the occurrence of CW but rather provide a consistent bundle of new evidence. That is why it highlights the need for complementary sensitivity experiments with climate models that investigate the relative impact of each climate variable and quantify feedbacks that trigger or inhibit cold wave development in Europe. Even more in a context of fossil fuels scarcity and increased uncertainty due to geopolitical events (IEA 2022), it is crucial to study the sensitivity of the energy, health and agriculture sectors to compound extreme weather and socio-economic events (e.g. a colder-than-average winter combined with low hydrological reservoirs puts electricity offer at risk in Europe) in order to better design structural, social and technical solutions.