1 Introduction

Extratropical explosive cyclones are rapidly intensifying low-pressure systems causing strong winds, extreme precipitation, and high waves (Sanders and Gyakum 1980). These so-called meteorological bombs affect the livelihoods and infrastructure of primarily coastal and marine environments, and have been the cause of fatalities and billions of US dollars in property damage (e.g. Kocin et al. 1995; Fink et al. 2012; Liberato 2014). The primary mechanism for cyclogenesis in the extratropics is baroclinic instability, where small perturbations convert the available potential energy of the mean zonal flow into the kinetic energy of the transient eddies (Chang et al. 2002). This requires strong meridional temperature gradients, implying strong vertical shear due to the thermal wind relationship, and weak static stability. Other conditions favoring especially rapid cyclogenesis include large moisture availability, and large-amplitude waves in the jet stream (Stull 2000). All these conditions are found during the cold season in the vicinity of the warm currents of the Kuroshio and the Gulf Stream, where most of the bomb genesis of the Northern Hemisphere (NH) takes place (Sanders and Gyakum 1980; Roebber 1984). Here, cold air masses move over the northern edge of the warm ocean currents, allowing for strong evaporation and heat transfer from the sea surface (Stull 2000). Cloud condensation releases latent heat, warming the air in the column, and causing it to diverge horizontally aloft, which in turn lowers mean sea level pressure (MSLP). The jet stream can enhance this horizontal divergence when the surface low is accompanied by an upper-level trough located to the west and by an upper-level ridge located to the east of the surface low. Wind speeds are slower around troughs because the pressure gradient and the centrifugal force act in opposing directions. Around ridges, on the other hand, both forces act in the same direction, leading to faster wind speeds. 
The difference in wind speeds enhances horizontal divergence above the region of the surface low, lifting air from below, and further lowering MSLP. This deepening can therefore increase with the amplitude of the meanders of the jet stream. Finally, upper-level divergence is also induced by the jet streak, i.e. the relative maximum of wind speed within the jet stream. Since the actual wind speed in the exit region of a jet streak exceeds the geostrophic wind speed, an ageostrophic wind to the south is formed, which further causes upper-level divergence in the left exit region of the jet streak (Stull 2000). The most intense storms are usually found near the semipermanent Aleutian and Icelandic Lows (Lambert 1996). Given the potential destructive power of meteorological bombs, a better understanding of the possible impacts of climate change on these events is of great societal relevance. Such an analysis requires a thorough evaluation of CMIP5 model biases with respect to explosive cyclone frequencies and intensities for present-day climate, as well as a better understanding of the underlying causes of model biases.

Various algorithms have been used to identify and track extratropical cyclones in observational reanalysis data and climate model simulations. Most approaches are based on vorticity and/or MSLP, and further differ in the cyclone identification procedure, threshold values for filtering out noise, and algorithms used to combine the cyclone centers into tracks (Neu et al. 2012). Despite all these differences, results are consistent for strong cyclones when applying different algorithms to the same reanalysis data (Neu et al. 2012). Numerous authors have analyzed the climatology of explosive cyclones in different reanalysis products (e.g. Lim and Simmonds 2002; Allen et al. 2010; Black and Pezza 2013). All studies agree with the earlier findings of Sanders and Gyakum (1980) on high bomb frequencies in the northwest Pacific and the North Atlantic during the winter months. Allen et al. (2010) found good agreement on bomb frequency between different reanalysis products, including the National Centers for Environmental Prediction reanalysis-2 data (NCEP2), the Japanese 25-year Reanalysis (JRA-25), and the 40-year European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA-40. Contrary to an earlier analysis by Lim and Simmonds (2002), Allen et al. (2010) showed that there are no significant trends in bomb frequencies between 1979 and 2008.

The previous generation of climate models of the third phase of the coupled model intercomparison project (CMIP3) reproduces the general structures of the observed climatological storm-track pattern under present-day climate (Ulbrich et al. 2008). However, these models tend to slightly underestimate cyclone frequency, and show considerable inter-model variability with respect to intense events (Lambert and Fyfe 2006). The more recent generation of climate models of the fifth phase of the coupled model intercomparison project (CMIP5) generally underestimates the intensity of extratropical cyclones in the NH when compared to the ECMWF ERA-Interim reanalysis data (hereafter ERA-INT) (Zappa et al. 2013a). Corresponding runs from the atmospheric model intercomparison project (AMIP) show that the intensity bias is only weakly related to biases in SST. The tilt of the jet stream is found to affect the intensity bias mainly in the Norwegian Sea. However, the intensity bias has a hemispheric character, and cannot be simply attributed to the representation of the North Atlantic large-scale circulation (Zappa et al. 2013a).

Single-model and multi-model studies generally agree that CMIP3 and CMIP5 climate models tend to project a decline in the frequency, and a weak poleward shift, of extratropical cyclones in the Northern Hemisphere as a response to anthropogenic climate change (e.g. Bengtsson et al. 2006; Lambert and Fyfe 2006; Finnis et al. 2007; McDonald 2011; Chang et al. 2012; Zappa et al. 2013b). However, changes are also subject to important regional differences, including an increase in the number of cyclones in the vicinity of the British Isles, central Europe, and the Bering Sea (Bengtsson et al. 2006; Pinto et al. 2009; Ulbrich et al. 2008; McDonald 2011; Zappa et al. 2013b). Contradictory conclusions exist, however, with respect to projections of extreme extratropical cyclones, ranging from a reduction (Zappa et al. 2013b) to an increase (Lambert and Fyfe 2006) under a warming climate. Chang (2014) showed that these differences arise from the use of different methods, where an increase of strong cyclones is projected only when identifying cyclones based on a threshold value for core pressure, due to a projected deepening of the climatological mean Aleutian low.

While previous studies have already addressed extratropical cyclones in CMIP5 climate models, an analysis focusing on explosive cyclones is, to our knowledge, still missing. The recent discussion with respect to extreme cyclone projections mentioned above underlines the necessity for such a study. Also, the current understanding of model biases of extreme cyclones is still very limited, and requires additional research. The purpose of this study is therefore to evaluate how well CMIP5 models reproduce explosive cyclones in the extratropics of the Northern Hemisphere, and to identify the main causes of model biases. A follow-up study will then address the possible impacts of climate change on meteorological bombs, identify the main mechanisms of change, and assess how projections are related to model biases presented in this study.

The following section presents our data and the approach used for identifying and tracking explosive cyclones, and for relating simulated bomb characteristics to biases in the global circulation. Section 3 first shows explosive and non-explosive cyclone tracks computed from ERA-INT, and illustrates the synoptic environment associated with a historic meteorological bomb. We then evaluate how well CMIP5 models reproduce the frequency and intensity of bombs, and relate these findings to biases in SST gradients, jet stream, and lower tropospheric baroclinicity, as well as to horizontal and vertical model resolution. Section 4 elaborates on the main causes of model biases and the role of model resolution, and discusses the use of two related definitions of explosive cyclones.

2 Methods

Our data are obtained from three reanalysis products (ERA-INT, NASA-MERRA, NCEP-CFSR) and 25 CMIP5 climate models listed in Tables 1 and 2, respectively. CMIP5 model runs belong to the historical period forced by all observed atmospheric composition changes (Taylor et al. 2012). With one exception, all model runs belong to the first model ensemble (r1i1p1). Runs from the model CCSM4 belong to the ensemble r6i1p1, as the first ensemble was not available for this model at the time of writing. About half of all models have high model tops with a more realistic representation of the stratosphere and an upper model boundary above the stratopause at 1 hPa (Charlton-Perez 2013). Variables include zonal (u) and meridional (v) wind components at 850 hPa (6-hourly and daily), 700 hPa (daily), and 250 hPa (monthly), 6-hourly mean sea level pressure (MSLP), monthly mean sea surface temperature (SST), and daily values of geopotential height (z), air temperature (T), and specific humidity (q) at 850 and 700 hPa. All data cover the period from 1980 to 2005. Extratropical cyclones are identified using the objective feature-tracking algorithm TRACK (Hodges 1999). TRACK first computes relative vorticity (\(\zeta _{850}\)) from the 6-hourly zonal and meridional wind components at 850 hPa:

$$\begin{aligned} \zeta _{850}=\frac{{\partial }v_{850}}{{\partial }x}-\frac{{{\partial }}u_{850}}{{{\partial }}y}. \end{aligned}$$
(1)
Table 1 Reanalysis data, their corresponding resolutions (number of grid cells in the zonal (x), meridional (y), and vertical (z) direction of the atmospheric model component), references, institutes, and countries
Table 2 Same as Table 1 but for CMIP5 models with additional information on the upper model boundary

It then remaps vorticity to a common T42 resolution (128 × 64), and identifies cyclone centers from the maximum of T42 vorticity. The paths of these centers are tracked for cyclones that exceed (1) a vorticity of \(10^{-5} \hbox { s}^{-1}\), (2) a lifetime of 2 days, and (3) a propagation distance of 1000 km. The associated maximum wind speeds are computed by searching for the maximum wind speed within a given radius around each cyclone center. Our radius is set to 900 km, which corresponds to the upper end of cyclone radii over oceans reported by Rudeva and Gulev (2007). Explosive cyclones are identified as cyclones with a deepening rate exceeding 1 bergeron (b) (Sanders and Gyakum 1980):

$$b=\frac{24\,\hbox {hPa}}{24\,\hbox {h}}\,\frac{\sin \left( \theta \right) }{\sin \left( 60^{\circ }\right) },$$
(2)

where \(\theta\) is the latitude of each cyclone center. The MSLP is extracted at the center of the cyclone, and the deepening rates are computed for each time step by comparing the MSLP of cyclone centers which belong to the same track and are 24 h apart. The pressure drop between these two cyclone centers is scaled according to the latitude of the second cyclone center.
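The deepening-rate calculation described above can be sketched as follows. This is a minimal illustration rather than the TRACK implementation: the function name `deepening_rate_bergeron`, the list-based track representation, and the sample values are all hypothetical, and the track is assumed to be sampled at a regular 6 h interval.

```python
import math


def deepening_rate_bergeron(mslp_hpa, lat_deg, step_hours=6):
    """Return deepening rates (in bergerons) along a single cyclone track.

    mslp_hpa and lat_deg hold the central MSLP (hPa) and latitude (degrees)
    of consecutive cyclone centers spaced step_hours apart (assumed regular).
    """
    lag = 24 // step_hours  # number of time steps spanning 24 h
    rates = []
    for i in range(lag, len(mslp_hpa)):
        drop = mslp_hpa[i - lag] - mslp_hpa[i]  # positive when the low deepens
        # Pressure drop per 24 h that equals 1 bergeron at the latitude
        # of the second (later) cyclone center, following Eq. (2)
        one_bergeron = (
            24.0 * math.sin(math.radians(lat_deg[i])) / math.sin(math.radians(60.0))
        )
        rates.append(drop / one_bergeron)
    return rates


# Hypothetical track: central pressure falls by 30 hPa within 24 h
mslp = [1000.0, 994.0, 986.0, 978.0, 970.0]
lat = [40.0, 41.0, 42.0, 44.0, 45.0]
rates = deepening_rate_bergeron(mslp, lat)
is_explosive = [rate > 1.0 for rate in rates]
```

With these sample values the single 24 h pair gives a rate of about 1.5 bergerons, so the segment would count as explosive.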

The performance of each CMIP5 model is compared to results from all three reanalysis data sets, with special focus on ERA-INT. We choose ERA-INT as our primary reference because the frequency of explosive cyclones computed from ERA-INT data lies between the corresponding frequencies computed from the other two reanalysis products, NCEP-CFSR and NASA-MERRA. Also, the horizontal and vertical resolution of ERA-INT lies between the resolutions of NCEP-CFSR and NASA-MERRA.

To better understand the possible causes behind model biases, we compare the simulated bomb frequencies and intensities with the synoptic environment favoring bomb genesis as outlined in Sect. 1. As a first step we assess whether the interannual variability of bomb frequencies and intensities is indeed affected by the meridional SST gradient, jet stream speed and meander, and lower tropospheric baroclinicity in CMIP5 models. The meridional SST gradient is computed from monthly mean SST values from October to March, and is defined as the difference of SST per \(2.8125^\circ\) latitude moving from south to north, where \(2.8125^\circ\) corresponds to the approximate meridional spacing of grid cells with a T42 resolution. The months October to March are chosen as they correspond to the months with highest bomb frequencies. The jet stream speed is approximated by the monthly mean zonal wind speed at 250 hPa \((u_{250})\), while the corresponding meander is represented by the absolute value of monthly mean meridional wind speed at 250 hPa \((|v_{250}|)\), both from October to March. Using the absolute values ensures that an underestimation of the jet’s amplitude results in a negative, and an overestimation in a positive bias when compared to reanalysis data. Averaged over multiple years, \(|v_{250}|\) may also be interpreted as a measure of the amplitude of stationary waves. Baroclinic instability is measured by the Eady growth rate \((\sigma _{BI})\) with units in \(\hbox {s}^{-1}\) (Hoskins and Valdes 1990):

$$\begin{aligned} \sigma _{BI}=0.31f\left| \frac{\partial \mathbf {v}}{\partial z}\right| N^{-1} \end{aligned},$$
(3)

where N is the Brunt-Väisälä frequency (Stull 2000):

$$\begin{aligned} N=\sqrt{\frac{\left| g\right| }{T_{v}}\left( \frac{\partial T_{v}}{\partial z}+\Gamma _{d}\right) } \end{aligned},$$
(4)

where

$$\begin{aligned} T_{v}=T\left( 1+0.61\,r\right) \end{aligned},$$
(5)

and

$$\begin{aligned} r=q/(1-q) \end{aligned},$$
(6)

where f is the Coriolis parameter, v is the horizontal wind speed, z is the height, g is the gravitational acceleration, \(\Gamma _{d}\) is the dry adiabatic lapse rate, \({T_{v}}\) is the virtual temperature, T is the air temperature, r is the water vapor mixing ratio, and q is the specific humidity. The Eady growth rate is computed for daily values between 850 and 700 hPa, and is then averaged for the months October–March. No values are computed for the models HadGEM2-ES, INM-CM4, and MIROC4h due to missing data at the time of writing.
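Equations (3) to (6) can be combined into a short routine for a single grid cell. The sketch below is illustrative only: the function names, the dictionary layout, the constants, and the sample values are assumptions, as is the use of the layer-mean virtual temperature in Eq. (4) and of the vector wind difference between the two levels for the shear term.

```python
import math

G = 9.81          # gravitational acceleration (m s^-2), assumed value
GAMMA_D = 0.0098  # dry adiabatic lapse rate (K m^-1), assumed value
OMEGA = 7.292e-5  # rotation rate of the Earth (s^-1)


def virtual_temperature(t, q):
    """Eqs. (5) and (6): virtual temperature from T (K) and q (kg kg^-1)."""
    r = q / (1.0 - q)  # water vapor mixing ratio
    return t * (1.0 + 0.61 * r)


def eady_growth_rate(lat_deg, lower, upper):
    """Eq. (3): Eady growth rate (s^-1) between two pressure levels.

    lower and upper are dicts with keys u, v (m s^-1), z (geopotential
    height, m), t (K), and q (kg kg^-1), e.g. for 850 and 700 hPa.
    """
    dz = upper["z"] - lower["z"]
    tv_lower = virtual_temperature(lower["t"], lower["q"])
    tv_upper = virtual_temperature(upper["t"], upper["q"])
    tv_mean = 0.5 * (tv_lower + tv_upper)  # assumption: layer-mean T_v
    # Eq. (4): squared Brunt-Vaisala frequency
    n_squared = G / tv_mean * ((tv_upper - tv_lower) / dz + GAMMA_D)
    if n_squared <= 0.0:
        return float("nan")  # statically unstable layer, N is undefined
    # Magnitude of the vertical shear of the horizontal wind
    shear = math.hypot(upper["u"] - lower["u"], upper["v"] - lower["v"]) / dz
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))  # Coriolis parameter
    return 0.31 * f * shear / math.sqrt(n_squared)


# Hypothetical daily values at 45 N
level_850 = {"u": 10.0, "v": 0.0, "z": 1500.0, "t": 275.0, "q": 0.004}
level_700 = {"u": 16.0, "v": 0.0, "z": 3000.0, "t": 266.0, "q": 0.002}
sigma = eady_growth_rate(45.0, level_850, level_700)
sigma_per_day = sigma * 86400.0  # convert s^-1 to day^-1
```

For these sample inputs the growth rate is close to 1 per day; daily values of this kind would then be averaged over October to March.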

To assess how the SST gradient, the jet stream, and the Eady growth rate affect the interannual variability of bomb frequencies and intensities, we compute correlation matrices for each individual model where each correlation uses spatially averaged annual values derived from October to March of 1980 to 2005, resulting in a sample size of 26 spatially averaged values. These matrices are then summarized in a single matrix by averaging positive and negative correlation coefficients separately, and by counting the number of models with positive and negative correlations. This is done for all correlations, as well as for significant correlations at the 5 % level only.
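One way to picture this summary step is sketched below for a single pair of variables. It is a simplified illustration, not the code used in the study: the helper names, the sample coefficients, and the optional `min_abs_r` cut-off are hypothetical. The value 0.39 is only an approximate two-sided 5 % threshold for n = 26 under a t approximation; an actual analysis would take the p value from a statistical package.

```python
import math


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def summarize(coefficients, min_abs_r=0.0):
    """Average positive and negative coefficients separately and count them.

    min_abs_r is a hypothetical cut-off used here to mimic a significance
    filter; 0.0 keeps every model.
    """
    kept = [r for r in coefficients if abs(r) >= min_abs_r]
    pos = [r for r in kept if r > 0]
    neg = [r for r in kept if r < 0]
    return {
        "n_pos": len(pos),
        "mean_pos": sum(pos) / len(pos) if pos else None,
        "n_neg": len(neg),
        "mean_neg": sum(neg) / len(neg) if neg else None,
    }


# Hypothetical per-model coefficients for one pair of variables
per_model_r = [0.50, 0.20, -0.10, 0.45, -0.40]
all_models = summarize(per_model_r)
significant_only = summarize(per_model_r, min_abs_r=0.39)
```

In practice each coefficient would come from `spearman` applied to the 26 annual, spatially averaged values of two variables for one model.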

In a second step we assess whether these variables can further explain biases in bomb frequencies and intensities. For this purpose we summarize the relationships between all variables in a correlation matrix, where each model is represented by a single value. This value represents a variable such as vorticity averaged across a region and across time (1980–2005). In addition, we assess if model biases are affected by horizontal and vertical model resolution. The horizontal model resolution is expressed as the number of grid cells of the model’s physics grid, while the vertical resolution is taken as the total number of atmospheric model levels. The sample size of this correlation matrix equals the number of CMIP5 models participating in the correlation matrix. This number is reduced from the original 25 CMIP5 models to a subset of 18 models because 3 models lack data for computing the Eady growth rate at the time of writing (HadGEM2-ES, INM-CM4, and MIROC4h) and 4 models cannot really be considered to be independent. The latter concerns the model pairs (1) GFDL-ESM2M and GFDL-ESM2G, (2) MPI-ESM-LR and MPI-ESM-P, (3) MRI-CGCM3 and MRI-ESM1, and (4) MIROC-ESM and MIROC-ESM-CHEM. From these 4 pairs we exclude GFDL-ESM2M, MPI-ESM-LR, MRI-CGCM3, MIROC-ESM, and test the robustness of the result for all possible alternative combinations of models resulting from the arbitrary exclusion of either one or the other member of model pairs.
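The robustness test over model pairs amounts to enumerating every way of dropping one member from each of the four pairs. A minimal sketch, in which the variable and function names are illustrative, could look like this:

```python
from itertools import product

# The four pairs of closely related models named in the text
MODEL_PAIRS = [
    ("GFDL-ESM2M", "GFDL-ESM2G"),
    ("MPI-ESM-LR", "MPI-ESM-P"),
    ("MRI-CGCM3", "MRI-ESM1"),
    ("MIROC-ESM", "MIROC-ESM-CHEM"),
]


def exclusion_sets(pairs):
    """Return every set that drops exactly one member of each pair."""
    return [set(choice) for choice in product(*pairs)]


all_exclusions = exclusion_sets(MODEL_PAIRS)

# Default exclusion used in the text; the remaining sets are the alternatives
default_exclusion = {"GFDL-ESM2M", "MPI-ESM-LR", "MRI-CGCM3", "MIROC-ESM"}
alternatives = [s for s in all_exclusions if s != default_exclusion]
```

Four pairs give 2^4 = 16 exclusion sets in total, i.e. the default choice plus 15 alternatives, each of which leaves 18 models in the correlation matrix.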

3 Results

3.1 Storms in ERA-INT

Figure 1 illustrates the location of all cyclone tracks, bomb tracks, and their explosive segments computed from ERA-INT for the period 1980–2005. While non-explosive cyclones are present in all non-mountainous regions, the vast majority of explosive cyclones are located in the northern Pacific and Atlantic, as well as on the lee side of the Rocky Mountains. The average cyclone has a vorticity of \(4\times 10^{-5}\,\hbox {s}^{-1}\) and a wind speed of \(18\,\hbox {m s}^{-1}\; (65 \hbox { km h}^{-1})\), while the average bomb shows a much higher vorticity and wind speed of \(9\times 10^{-5}\,\hbox {s}^{-1}\) and \(33\,\hbox {m s}^{-1}\; (119 \hbox { km h}^{-1})\), respectively (Fig. 2). Only 1.2 % of all cyclone centers in the extratropics of the NH exceed a deepening rate of 1 bergeron. The frequency of explosive cyclones decreases exponentially with increasing deepening rates. Our method identifies an average of 119 explosive cyclone tracks per year from 1980 to 2005 in the extratropics of the NH (ERA-INT), with an interannual standard deviation of 7 tracks.

Fig. 1 a All cyclone tracks, and b bomb tracks (blue) with explosive segments (red) computed from ERA-INT from 1980 to 2005. Darker shades indicate higher track densities. Also shown are the boundaries of the northern Pacific and northern Atlantic used in this study

Fig. 2 Frequencies of a cyclone vorticity, b wind speed, and c deepening rate computed from ERA-INT for the extratropics of the Northern Hemisphere from 1980 to 2005. Grey and blue shades present all cyclones and explosive cyclones, respectively

Figure 3 provides a historic example of an explosive cyclone track and its associated synoptic environment along the east coast of North America around March 13, 1993 (Kocin et al. 1995). As MSLP dropped from about 1010 to 965 hPa, vorticity increased from \(4\times 10^{-5}\) to \(16\times 10^{-5}\,\hbox {s}^{-1}\), and 850 hPa wind speed rose from 15 to \(50\,\hbox {m s}^{-1}\; (54 \hbox { to } 180 \hbox { km h}^{-1})\) (Fig. 3c). The development of the storm was accompanied by strong positive SST anomalies in the Gulf of Mexico (not shown), providing a strong source of energy for storm intensification (Gilhousen 1994). The Eady growth rate showed strong lower tropospheric baroclinicity of up to \(3.25 \hbox { day}^{-1}\) (Fig. 3b). The wind speed at the jet streak was as high as \(45\,\hbox {m s}^{-1}\; (162 \hbox { km h}^{-1})\), and was located at the trough of the jet stream, as denoted by the 250 hPa geopotential height contours (Fig. 3d).

Fig. 3 Historic example of an explosive cyclone computed from ERA-INT around March 13, 1993 (“Storm of the Century”; Kocin et al. 1995). a The 6 h cyclone centers computed from ERA-INT and the monthly mean meridional SST gradient from March 1993, b Eady growth rate on March 12, 1993, c different measures of storm intensities for each time step, and d wind speed and geopotential height contours for 250 hPa on March 13, 1993. The wind speed plotted in (c) is extracted within a 900 km radius around each cyclone center, denoted by the shaded polygon in (a). The Eady growth rate in (b) is masked out in regions where the surface elevation exceeds 800 m above sea level (hatching)

3.2 Bomb frequencies

Figure 4 shows the annual mean number of cyclone centers per grid cell (200 × 200 km) which belong to explosive segments of cyclone tracks for (a) the multi-reanalysis mean and (b) the multi-model mean. The multi-model mean accurately reproduces the spatial pattern of annual mean bomb frequency when compared to results from multi-reanalysis data \((R = 0.94, p\,\hbox {value} < 10^{-3})\), with high frequencies in the northern Pacific and Atlantic (Fig. 4a, b). However, the multi-model mean underestimates bomb frequency by 22 % in the Pacific, and by 31 % in the Atlantic region on average (Fig. 4c). Looking at each model individually shows that most models underestimate bomb frequency significantly in both regions (18 in the Pacific, and 19 in the Atlantic out of 25 models) by about a third on average, and by 65 % in the worst case (Fig. 5). This negative frequency bias also applies to extreme cyclones with high vorticities exceeding \(10 \times 10^{-5}\,\hbox {s}^{-1}\) (Fig. 6). A similar finding is made for all cyclones; however, biases are much smaller, with values ranging from −12 to +14 %, and the sign does not always coincide with the sign of the explosive cyclone bias.

Fig. 4 Multi-reanalysis mean (left column), multi-model mean (central column), and multi-model bias (right column) of (a–c) annual mean bomb frequency, and (d–f) annual mean bomb vorticity. Regions of low bomb frequency (<0.5 cyclone centers per year in reanalyses) are masked out in the vorticity plots (d–f)

Fig. 5 Biases of bomb frequency, vorticity, wind speed, and deepening rate, and biases of climatological mean meridional SST gradient, \(u_{250}\), \(|v _{250}|\), and Eady growth rate computed for October to March, 1980–2005. All values are relative to ERA-INT, and are given for the Pacific (top rows) and Atlantic region (central rows). Negative meridional SST gradient biases indicate an underestimation, meaning a less negative SST gradient. Values which are significantly larger or smaller are denoted with plus and minus symbols, respectively (two-sample Wilcoxon test, 5 % level; R Core Team 2013). Also shown are the relative differences between the horizontal and vertical resolutions of CMIP5 models or reanalysis data when compared to ERA-INT (bottom rows). Models are sorted by bomb frequencies in both ocean basins

Fig. 6 Relative cyclone frequency biases in the extratropics of the Northern Hemisphere when compared to ERA-INT. Biases are shown for all cyclones, explosive cyclones, and extreme cyclones with vorticities exceeding \(10 \times 10^{-5}\,\hbox {s}^{-1}\)

The spatial correlation coefficients between simulated and ERA-INT annual bomb frequencies in the North Pacific and Atlantic region range from 0.40 to 0.85, with models that simulate higher bomb frequencies (shaded in red) tending to perform better than models with fewer bombs (shaded in blue) (Fig. 7a). The corresponding spatial standard deviations show a similar pattern, with stronger biases for models with fewer bombs (σ = 0.20 to 0.50). The respective root mean square errors (RMSE) range from 0.25 to 0.40. The multi-model mean frequency outperforms all individual models with respect to the correlation coefficient \((R = 0.90)\) and RMSE \((\hbox {RMSE} = 0.20)\), but underestimates the standard deviation with a \(\sigma\) equal to 0.30. The seasonal cycle is well captured in both the North Pacific and North Atlantic basins, with high frequencies during the winter, and low frequencies during the summer (Fig. 8).
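The three pattern statistics summarized in a Taylor plot can be computed from two gridded fields flattened to one dimension. The sketch below makes several assumptions that the text does not specify: the RMSE is taken as the centered root mean square difference (mean removed), all grid cells are weighted equally, and the function name and sample values are hypothetical.

```python
import math


def taylor_statistics(model, reference):
    """Return (correlation, model std, reference std, centered RMSE)."""
    n = len(reference)
    mean_m = sum(model) / n
    mean_r = sum(reference) / n
    dev_m = [m - mean_m for m in model]      # model anomalies
    dev_r = [r - mean_r for r in reference]  # reference anomalies
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    corr = sum(a * b for a, b in zip(dev_m, dev_r)) / (n * std_m * std_r)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_m, dev_r)) / n)
    return corr, std_m, std_r, crmse


# Hypothetical bomb frequencies on four grid cells
model_field = [1.0, 2.0, 3.0, 4.0]
reference_field = [2.0, 4.0, 6.0, 8.0]
corr, std_m, std_r, crmse = taylor_statistics(model_field, reference_field)
```

Under these definitions the statistics are linked by crmse² = std_m² + std_r² − 2 · std_m · std_r · corr, which is what allows all three to be displayed in a single diagram.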

Fig. 7 Taylor plots of time-averaged bomb (a) frequency and (b) vorticity of CMIP5 models and reanalysis data for the Pacific and Atlantic region. CMIP5 and reanalysis data are denoted with letters and numbers, respectively. The mean bomb frequency increases from blue to red shades

Fig. 8 Quantiles (5, 25, 50, 75, 95 %) of monthly bomb frequencies of CMIP5 models (blue) and reanalysis data (red) in the a Pacific and b Atlantic region

In summary, most models accurately reproduce the spatial distribution of bombs when compared to results from reanalysis data, with high frequencies over the northern Pacific and Atlantic. Most models however significantly underestimate bomb frequencies, by about a third on average and by up to 65 % in the worst case. Models with higher bomb frequencies tend to show more realistic spatial patterns of bomb distributions than models with too few bombs. The seasonality of bombs is well captured, with high frequencies during the cold season, and low frequencies during the warm months.

3.3 Bomb intensities

The evolutions of MSLP, vorticity, and wind speed through the life cycle of an average explosive cyclone track are similar among CMIP5 models and reanalysis data (Fig. 9). As in reanalyses, CMIP5 bombs rapidly deepen until they reach their lowest central pressure around the third day of their life, and experience a slow increase of pressure from that point onwards. Vorticity and wind speed behave correspondingly, with a rapid increase during the first three days, followed by a slower decay.

Fig. 9 a MSLP, b vorticity, and c wind speed during the lifetime of an average explosive cyclone track computed for each reanalysis and CMIP5 model data set

Figure 4 shows the annual mean vorticity of cyclone centers per grid cell (200 × 200 km) which belong to explosive segments of cyclone tracks for (d) the multi-reanalyses and (e) the multi-model-mean. The multi-model-mean reproduces the spatial patterns of bomb vorticities reasonably well when compared to results from multi-reanalysis data \((R = 0.91, p\,\hbox {value} < 10^{-3})\), with strong vorticity in the vicinity of the Aleutian Low and the Labrador Sea (Fig. 4d, e). However, these values are still too low when compared to reanalysis data, especially in the regions of the Aleutian and Icelandic Low (Fig. 4f). Similar patterns are found for wind speeds associated with bombs (not shown).

On average, CMIP5 models underestimate bomb vorticity by \(0.12 \times 10^{-5} \hbox { s}^{-1}\) and by \(0.21\times10^{-5} \hbox { s}^{-1}\) in the Pacific and Atlantic region, respectively. Looking at each model individually shows that 9 out of 25 models significantly underestimate vorticity in the North Pacific by up to 12 %, or \(0.92 \times 10^{-5} \hbox { s}^{-1}\) (MIROC5), while 6 models significantly overestimate vorticity (Fig. 5). A stronger signal is evident in the Atlantic, where 13 models significantly underestimate bomb vorticity by up to 14 %, or \(1 \times 10^{-5} \hbox { s}^{-1}\) (INM-CM4), while 4 models overestimate vorticity. Deepening rates behave correspondingly, with 17 models significantly underestimating the deepening of explosive cyclones in both ocean basins, and only 5 and 3 models overestimating the deepening in the Pacific and Atlantic basins, respectively. Patterns of wind speed biases are inconsistent with the biases described above, with 16 models overestimating, and 7 models underestimating wind speed in the Pacific. This inconsistency may be related to the large size of the search radius (900 km) applied to identify maximum wind speeds associated with a cyclone center. In the Atlantic, the numbers of models under- and overestimating wind speed are roughly equal.

The individual models show fairly poor spatial correlations of the vorticity of bombs in the northern Pacific and Atlantic when compared to ERA-INT \((R = 0.3 \hbox { to } 0.6)\), which is consistent with the previous finding that models lack high vorticities near the Aleutian and Icelandic Low. As for bomb frequency, models with few bombs (blue shades) generally perform worse than models with high bomb frequencies (red shades) (Fig. 7b). The corresponding RMSEs are between 1.6 and 2.2, while standard deviations are fairly accurate for the majority of models \((\sigma = 1.7)\). The multi-model mean vorticity outperforms all individual models with respect to spatial correlations \((R = 0.7)\) and RMSE \((\hbox{RMSE} = 1.3)\), but underestimates the standard deviation, with \(\sigma\) being equal to 1.3.

In summary, the evolutions of MSLP, vorticity, and wind speed during the life cycle of an explosive cyclone track are similar among models and reanalysis data. Models accurately reproduce higher bomb intensities in the vicinity of the Aleutian Low and the Labrador Sea. However, they also generally lack higher vorticity values near the Icelandic Low, and tend to underestimate the overall vorticity and deepening rate of explosive cyclones. Models with higher bomb frequencies tend to show more realistic spatial patterns of intensities than models with too few bombs. The following section explores how biases in bomb frequencies and intensities may be related to the larger-scale global circulation.

3.4 Bomb biases

The tendency of models to simulate too few and, to some extent, also too weak bombs raises the question of the origin of these biases. We therefore consider whether biases in bomb characteristics can be related to biases in the conditions favoring bomb genesis as outlined in Sect. 1. The high bomb frequencies in the northern Pacific and Atlantic shown in Fig. 4 indeed coincide with (1) regions of strong meridional SST gradients caused by the Kuroshio Current and the Gulf Stream (Fig. 10a), (2) the main locations of the left exit regions of the jet streaks (Fig. 10d), and (3) regions of strong lower tropospheric baroclinicity (Fig. 10j). To gain more insight into the potential importance of these variables, we start our analysis by assessing whether the interannual variability of bomb frequency and intensity is affected by the interannual variability of the SST gradient, the jet stream speed and meander, and the Eady growth rate. For this purpose we compute correlation matrices for each individual model and summarize the result in a single matrix by averaging the correlation coefficients, and by counting the number of models with positive and negative correlations. We do this for all models, as well as for models with statistically significant correlations only (see Sect. 2 for details). Using this approach we find that more models exhibit statistically significant associations between the interannual variability of explosive cyclone frequency and \(u_{250}\) than with other variables considered in this study. In the Atlantic, 15 models show statistically significant positive correlations between bomb frequency and \(u_{250} (\overline{R} = 0.50)\), and all models (25) have positive correlation coefficients (Fig. 11b, d). A similar but less evident pattern is found for the Pacific, where 8 models show statistically significant correlations \((\overline{R} = 0.53)\), and 22 models have positive correlation coefficients (Fig. 11a, c). 
The second most important variable is the Eady growth rate in the Atlantic, with 10 models exhibiting significant positive correlations \((\overline{R} = 0.48)\), as well as \(|v_{250}|\) in the Pacific for which 7 models show significant positive correlations \((\overline{R} = 0.46)\). The negative meridional SST gradient appears to play a less important role for the interannual variability of bomb frequency, with only two models showing statistically significant negative correlations. However, 10 models show significant negative correlations between the negative meridional SST gradient and \(u_{250}\) in the Atlantic \((\overline{R} = -0.54)\). Also, 17 models show significant correlations between \(u_{250}\) and the Eady growth rate in the Atlantic \((\overline{R} = 0.68)\). Finally, the different measures of bomb intensity vary consistently, where nearly all models show significant positive correlations between vorticity and the maximum wind speed within our search radius (22 models in the Pacific with a mean R of 0.62, and 24 models in the Atlantic with a mean R of 0.65).

Fig. 10 Multi-reanalysis mean (left column), multi-model mean (central column), and multi-model bias (right column) of (a–c) meridional SST gradient (defined as the meridional difference of SST per \(2.8125^\circ\) latitude moving south to north), (d–f) \(u_{250}\), (g–i) \(v _{250}\), and (j–l) Eady growth rate of the climatological mean from October to March of 1980 to 2005. Regions with frequent sea ice are masked out in the SST gradient plots (white). The Eady growth rate in (j–l) is masked out in regions where the surface elevation exceeds 800 m above sea level (grey)

Fig. 11

a, b Positive and c, d negative mean Spearman’s correlation coefficients computed for each CMIP5 model from annual values for the period 1980–2005 (n = 26 years) for a, c the North Pacific and b, d the North Atlantic. The numbers show the number of models (22 in the case of Eady growth rate, and 25 for all remaining variables), and the colors present the corresponding Spearman’s correlation coefficient averaged over these models. The upper-right triangles are based on all models, while the lower-left triangles are based exclusively on models with statistically significant correlations at the 5 % level. Meridional SST gradient, \(u_{250}, |v_{250}|\), and Eady growth rate are climatological means from October to March of 1980 to 2005

Next, we quantify the biases of meridional SST gradients, \(u_{250}\), \(|v_{250}|\), and Eady growth rate for the multi-model mean, and for each model individually. Spatial correlations between the multi-reanalysis mean (ERA-INT, NCEP-CFSR, and NASA-MERRA) and the multi-model mean are good for the meridional SST gradient \((R = 0.85, p < 10^{-3})\), and strong for \(u_{250}\) \((R = 0.97, p < 10^{-3})\), \(v_{250}\) \((R = 0.95, p < 10^{-3})\), and Eady growth rate \((R = 0.97, p < 10^{-3})\) (Fig. 10). However, the multi-model means underestimate (1) the SST gradient by 0.25 and 0.15 K per \(2.8125^\circ\) latitude, (2) \(u_{250}\) by 2.7 and \(1.3\,\hbox{m}\,\hbox{s}^{-1}\), (3) \(|v_{250}|\) by 0.85 and \(1.4\,\hbox{m}\,\hbox{s}^{-1}\), and (4) the Eady growth rate by 0.04 and \(0.02\,\hbox{day}^{-1}\) on average in the Pacific and Atlantic, respectively (Fig. 10c, f, i, and l). Underprediction occurs in more than half of the models in the case of the SST gradient and \(|v_{250}|\) (in both cases 14 in the Pacific, and 17 in the Atlantic) (Fig. 5). Numerous models also significantly underestimate \(u_{250}\) in the Pacific (11 models), while only 5 models underpredict \(u_{250}\) in the Atlantic. The numbers of models with significant positive and negative biases of the Eady growth rate are roughly equal (around 8 models in both ocean basins).
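
As a rough illustration of the two diagnostics used in this paragraph, the sketch below computes a basin-mean bias and a spatial (pattern) correlation between a model-mean field and a reanalysis-mean field. The cosine-of-latitude weighting, the unweighted Pearson correlation, the function names, and the tiny 2 × 2 grid are all assumptions made for illustration; the values are not taken from the study.

```python
import math


def area_weighted_bias(model, reference, lats_deg):
    """Mean of (model - reference), weighting each row by cos(latitude)."""
    num = den = 0.0
    for row_m, row_r, lat in zip(model, reference, lats_deg):
        w = math.cos(math.radians(lat))
        for m, r in zip(row_m, row_r):
            num += w * (m - r)
            den += w
    return num / den


def pattern_correlation(field_a, field_b):
    """Unweighted Pearson correlation between two fields on the same grid."""
    a = [v for row in field_a for v in row]
    b = [v for row in field_b for v in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Illustrative u250-like values (m/s) on a 2 x 2 grid; not data from the study.
lats = [30.0, 60.0]
reanalysis_mean = [[30.0, 32.0], [20.0, 22.0]]
model_mean = [[29.0, 31.0], [18.0, 20.0]]

bias = area_weighted_bias(model_mean, reanalysis_mean, lats)  # negative: too slow
r = pattern_correlation(model_mean, reanalysis_mean)          # close to 1
```

A negative `bias` together with a high `r` mirrors the situation reported here: the spatial pattern is well reproduced while the magnitude is underestimated.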

To better understand the effects of these biases on bomb characteristics, we summarize the relationships between all variables in a correlation matrix, where each model is represented by a single value (see Sect. 2 for details). Differences in bomb frequencies between models are significantly correlated with differences in \(u_{250}\) (R = 0.51 in the Pacific, and 0.55 in the Atlantic), indicating that the bomb frequency bias is probably related to the speed of the jet stream (Fig. 12). No significant correlations are found between explosive cyclone frequency and SST gradient, \(|v_{250}|\), Eady growth rate, or model resolution. However, models with stronger (i.e. more negative) meridional SST gradients also tend to have faster jet streams \((R = -0.56)\), and stronger Eady growth rates \((R = -0.79)\) in the Atlantic. Stronger deepening rates are found for models with more frequent explosive cyclones (\(R = 0.93\) in the Pacific, \(R = 0.88\) in the Atlantic). These correlations are robust with respect to all possible combinations of models resulting from the arbitrary exclusion of similar models described in Sect. 2.

In summary, our findings identify significant correlations between the interannual variability of explosive cyclone frequency and the interannual variability of the jet stream speed. Correlations with the other variables considered in this study (SST gradient, \(|v_{250}|\), and the Eady growth rate) are also present, but for fewer models. The multi-model means tend to underestimate the SST gradient, \(u_{250}\), \(|v_{250}|\), and the Eady growth rate when compared to the multi-reanalysis mean. The negative bomb frequency bias present in most models is significantly correlated with \(u_{250}\) in the inter-model spread, indicating that models with slower jet streams tend to reproduce fewer explosive cyclones. Finally, bomb frequencies and deepening rates are significantly correlated, implying that models with more frequent bombs also tend to simulate more intense bombs.

Fig. 12

Correlation matrix where each model is represented by a single mean value \((n = 18)\) computed for the period 1980–2005 from October to March for the Pacific (lower-left triangle) and the Atlantic region (upper-right triangle). Statistically significant correlations at the 5 % level are denoted by dots (Harrell 2014)

4 Discussion

We evaluate how well CMIP5 climate models reproduce explosive cyclones for the period 1980–2005 in the extratropics of the Northern Hemisphere. An objective feature-tracking algorithm is used to identify and track extratropical cyclones from 25 CMIP5 models and 3 reanalysis products. Cyclones are identified as the maxima of T42 vorticity computed from 6-hourly winds at 850 hPa. Explosive and non-explosive cyclones are separated based on the corresponding deepening rates of mean sea level pressure. Most models accurately reproduce the spatial distribution of bombs when compared to results from reanalysis data. Bomb frequencies are high in the northern Pacific and Atlantic, and coincide with (1) regions of strong meridional SST gradients caused by the Kuroshio Current and the Gulf Stream, (2) the main locations of the left exit regions of the jet streaks, and (3) regions of strong lower tropospheric baroclinicity. The spatial distribution of explosive cyclone frequency presented here is consistent with previous findings from reanalysis data (e.g. Black and Pezza 2013). However, three quarters of the models significantly underestimate bomb frequencies, by a third on average and by up to 65 % in the worst case. This negative frequency bias also applies to severe cyclones characterized not by pressure change, but by a minimum threshold value for vorticity (\(\zeta_{850} = 10 \times 10^{-5}\,\hbox{s}^{-1}\)). Such a bias confirms findings from Zappa et al. (2013a), who reported that CMIP5 models underestimate the intensity of cyclones, implying that the frequency of intense cyclones is too low. The negative bomb frequency bias present in most models is significantly correlated with \(u_{250}\) in the inter-model spread, indicating that models with slower jet streams tend to simulate fewer explosive cyclones. 
Since the speed of the jet stream and the meridional temperature gradient are related through the thermal wind balance, models with faster jet streams may be expected to have stronger meridional temperature gradients, which favor cyclogenesis. A similar relationship between \(u_{250}\) and cyclone frequency was presented for the North Atlantic by Zappa et al. (2013a). The greater importance of \(u_{250}\) relative to the other variables considered in this study (meridional SST gradient, \(|v_{250}|\), and the Eady growth rate) also applies to the interannual variability of explosive cyclone frequency. The other variables considered in this study show no statistically significant correlation with bomb frequency in the inter-model spread. However, the relation between \(u_{250}\) and bomb frequency appears to be weaker in the Atlantic than in the Pacific, with numerous models underpredicting bomb frequencies despite strong jet streams (NorESM1-M, IPSL-CM5A-LR, CanESM2, and IPSL-CM5A-MR). This indicates that the causes of the biases may also vary among models. The biases in the models listed above could be related to weak meridional SST gradients (NorESM1-M, CanESM2, and IPSL-CM5A-MR), too zonal jet streams (NorESM1-M, IPSL-CM5A-LR, CanESM2), and too weak Eady growth rates (NorESM1-M and IPSL-CM5A-MR). Other potentially important factors not included in this study are biases in the amount of moisture available for convective latent heat release near the center of the surface low, and biases in the land-sea temperature contrast. Finally, it is important to acknowledge that our analysis can only indicate potentially important relationships, while a more thorough analysis of causality requires a range of controlled experiments.
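
For reference, the thermal wind balance invoked at the start of this paragraph can be written in pressure coordinates in its standard textbook form. The symbols are not defined elsewhere in this paper and are introduced here only for illustration: \(u_g\) is the zonal geostrophic wind, \(p\) is pressure, \(R_d\) is the gas constant for dry air, \(f\) is the Coriolis parameter, \(T\) is temperature, and \(y\) is the northward coordinate.

\[
\frac{\partial u_g}{\partial \ln p} = \frac{R_d}{f} \left( \frac{\partial T}{\partial y} \right)_p
\]

In the Northern Hemisphere \((f > 0)\), a poleward decrease in temperature \((\partial T / \partial y < 0)\) implies \(\partial u_g / \partial \ln p < 0\), i.e. westerly winds that strengthen with height. All else being equal, a stronger meridional temperature gradient in the layer below therefore corresponds to a faster upper-level jet.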

The frequency of explosive cyclones is not correlated with horizontal or vertical model resolution in the inter-model spread. Roeckner et al. (2006) explore the sensitivity of the simulated climate to increases in the horizontal resolution for two different vertical resolutions in the ECHAM5 atmospheric model. They show that increasing the horizontal resolution reduces the errors of the zonal mean zonal wind fields only when the vertical resolution is sufficiently high (L31). This may explain the limited impact of the horizontal resolution on bomb frequencies, since about a third of the models used in this study have fewer than 31 atmospheric model levels. Furthermore, comparing the horizontal resolution of finite grid models and spectral models is challenging, given that the resolution of the latter could also be measured based on the spectral truncation rather than from the output grid (Anstey 2013). Also, vertical model levels may be located at very different heights, especially when considering both high and low top models. Finally, the interpretation of our correlation analysis is challenging due to the many other differences among models, including the representations of aerosols, atmospheric chemistry, land surfaces, oceans, ocean biogeochemistry, and sea ice (Flato et al. 2013; their Table 9.A.1). A more direct comparison is possible when analyzing the same model with different resolution settings. Comparing the low-resolution version IPSL-CM5A-LR (96 × 96 L39) against the mid-resolution version IPSL-CM5A-MR (144 × 143 L39) shows that the former has a stronger frequency bias (−12 % for all and −42 % for explosive cyclones) than the latter, which has a more moderate bias (−9 % for all and −29 % for explosive cyclones). This also holds true for the model pair BCC-CSM1-1 (128 × 64 L26) and BCC-CSM1.1(m) (320 × 160 L26). 
Increasing the vertical model resolution, on the other hand, appears to have hardly any impact on the frequency bias. The biases of the lower vertical resolution model MPI-ESM-LR (192 × 96 L47) and the mid-resolution model MPI-ESM-MR (192 × 96 L95) are nearly identical. This indicates that cyclone frequency biases may be more affected by horizontal than by vertical resolution. A better understanding of the relationship between model resolution and explosive cyclones could be obtained by computing bomb tracks from the same model with a range of different resolution settings. Ideally, such experiments should try to distinguish between the impacts of changes in model resolution and the impacts of the corresponding changes (if any) in the physical parameter settings.

Our method identifies an average of 119 explosive cyclone tracks per year in the extratropics of the NH (ERA-INT, 1980–2005). This number is considerably larger than the results of Lim and Simmonds (2002) (46 tracks per year), Allen et al. (2010) (37 tracks per year), and Black and Pezza (2013) (18 tracks per year during December to February). While our analysis is based on the original definition of explosive cyclones introduced by Sanders and Gyakum (1980) (Eq. 2), the authors listed above use an additional criterion where not only the change in absolute pressure, but also the change in pressure anomalies must exceed one bergeron. The motivation for this additional criterion is that cyclones may appear to deepen when moving rapidly toward an area of climatologically lower pressure. In a side analysis we tested the impact of this additional criterion and found that it reduces the number of bombs by about 50 % in the case of ERA-INT. This could explain a large fraction of the considerable differences between our results and the numbers presented in Lim and Simmonds (2002), Allen et al. (2010), and Black and Pezza (2013). Other potentially important reasons include differences in the considered years and months, reanalysis data, and cyclone identification and tracking algorithms. Our approach assumes that the minimum MSLP coincides with the location of the maximum vorticity value. Since this may not always be the case during the early stage of a cyclone, our method may overestimate deepening rates, and may therefore detect more bombs than approaches which are based on MSLP only. Also, Lim and Simmonds (2002) and Allen et al. (2010) consider the 24-h periods of pressure deepening to all start at 0000 UTC, while we compute deepening rates for every 6-h time step. 
We choose to base our analysis on the original bomb definition from Sanders and Gyakum (1980), because the climatologically lower pressure in the Aleutian and Icelandic Lows is also the consequence of high cyclone frequency (Zhu et al. 2007). Also, using pressure anomalies may bear the risk of introducing unphysical pressure jumps. Our side analysis shows that the additional criterion does not affect the frequency distribution of cyclone vorticity, implying that the additional criterion does not necessarily filter out weak cyclones or “false” bombs. The negative frequency and intensity bias of CMIP5 models presented in this study applies to both the original and the revised definition of explosive cyclones.
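
The difference between the two bomb definitions discussed above can be illustrated with a short sketch. It follows the commonly cited form of the Sanders and Gyakum (1980) criterion, a 24-h pressure fall normalized by \(\sin 60^\circ / \sin \varphi\), evaluated at every 6-h step of a track. The function names, the use of the mean track latitude over each 24-h window, and the example pressure values are assumptions; Eq. 2 of this paper may differ in such details.

```python
import math


def normalized_deepening_rates(mslp_hpa, lat_deg, step_h=6, window_h=24):
    """Deepening rate in bergerons for every possible 24-h window of a track."""
    n_steps = window_h // step_h
    rates = []
    for i in range(len(mslp_hpa) - n_steps):
        drop = mslp_hpa[i] - mslp_hpa[i + n_steps]  # positive when deepening
        # Assumption: normalize with the mean latitude over the window.
        mean_lat = sum(lat_deg[i:i + n_steps + 1]) / (n_steps + 1)
        scale = math.sin(math.radians(60.0)) / math.sin(math.radians(abs(mean_lat)))
        rates.append(drop / window_h * scale)
    return rates


def is_bomb(mslp_hpa, lat_deg, clim_hpa=None, threshold=1.0):
    """Original criterion; if clim_hpa is given, anomalies must also qualify."""
    rates = normalized_deepening_rates(mslp_hpa, lat_deg)
    if clim_hpa is None:
        return any(r >= threshold for r in rates)
    anomalies = [p - c for p, c in zip(mslp_hpa, clim_hpa)]
    anomaly_rates = normalized_deepening_rates(anomalies, lat_deg)
    return any(r >= threshold and a >= threshold
               for r, a in zip(rates, anomaly_rates))


# Hypothetical 6-hourly track at 60N deepening by 24 hPa in 24 h while moving
# toward climatologically lower pressure (values are illustrative only).
track_mslp = [1000.0, 994.0, 988.0, 982.0, 976.0]
track_lat = [60.0] * 5
climatology = [1012.0, 1008.0, 1004.0, 1000.0, 996.0]

original = is_bomb(track_mslp, track_lat)                        # True
revised = is_bomb(track_mslp, track_lat, clim_hpa=climatology)   # False
```

In this example the track qualifies under the original definition but not under the additional anomaly criterion, because two thirds of the apparent deepening is explained by the fall in climatological pressure along the track.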

To conclude, this study evaluates how well CMIP5 models reproduce the frequency and intensity of explosive cyclones, and further assesses how model biases are affected by SST gradient, jet stream, Eady growth rate, and model resolution. Our results contribute to a better understanding of model biases, and may be used for selecting suitable CMIP5 models for regional downscaling and storm impact modeling efforts. A follow-up study will assess the impacts of climate change on explosive cyclones, and evaluate how model biases presented in this study affect the projections.