1 Introduction

Recent fine-resolution modelling results have shown considerable ability to simulate the climatological observed global formation rate of tropical cyclones; for a recent review, see Knutson et al. (2010a). These models have also now shown an ability to generate a realistic distribution of tropical cyclone intensity (Bender et al. 2010; Lavender and Walsh 2011; Murakami et al. 2012a). While coarser-resolution models have only a limited ability to simulate tropical cyclone intensity, they have demonstrated good performance in simulating the interannual variation of tropical cyclone formation (Vitart and Anderson 2001; LaRow et al. 2008; Zhao et al. 2009). The quality of such simulations is important for skilful dynamical seasonal predictions of tropical cyclone formation as well as for projections of future climate. Since it is crucial that a climate model used for the prediction of future climate gives a good simulation of the current climate (e.g. Delsole and Shukla 2010), an evaluation of the ability of such models to reproduce the current tropical cyclone climatology is important. This is particularly vital at the scale of individual tropical cyclone formation basins, where models have shown less ability to simulate observed cyclone formation rates, and where the response to global warming of tropical cyclone formation varies considerably from model to model (Knutson et al. 2010a, b).

In many cases, it is not clear why models produce different basin-scale formation rates for tropical cyclones. There are many factors in the real climate that produce variations in tropical cyclone formation rate: vertical wind shear (Palmen 1956; Gray 1968; McBride and Zehr 1981); the presence of substantial pre-existing convective development (e.g. Hendricks et al. 2004); temporal and geographical variations in sea surface temperature (Gray 1968; Vecchi and Soden 2007; Murakami et al. 2012b); and variations in mid-tropospheric relative humidity (Bister and Emanuel 1997). The combined effects of these variables on tropical cyclone formation rates have motivated the development of climatological or seasonal genesis parameters, indices that are derived from the best climatological fit to observed tropical cyclone formation for variables that are known to affect tropical cyclone formation on shorter time scales (e.g. Gray 1975; Royer et al. 1998; Emanuel and Nolan 2004; Camargo et al. 2007, 2009; Tippett et al. 2011). While all of these physical factors are present in model simulations and influence simulated tropical cyclone formation rates, there are additional model-dependent factors that can influence formation rates: for instance, the model specification of horizontal diffusion and the details of the model’s convective parameterization (e.g. Vitart et al. 2001).

Identifying the reasons for these different model responses is the main goal of an intercomparison process. There are many possible strategies for determining the reasons for model responses. In principle, the use of a common set of physical parameterisations among a group of models should reduce the number of degrees of freedom between the models that would be causing different responses. In practice, even if models employ a similar parameterisation of cumulus convection, there is no guarantee that the effect of using this parameterisation would be the same in two different models, as interactions of the cumulus scheme with other elements of the physics in different models could generate different simulation outcomes. In addition, implementing these changes across a suite of climate models is time consuming and would also usually require re-tuning the model after the new parameterisation scheme is introduced.

Alternatively, some insight can be gained by comparison of the performance of groups of models that contain common elements. For example, Lin et al. (2006) evaluated the performance of 14 AR4 climate models in generating the Madden-Julian Oscillation (MJO; Madden and Julian 1971). This intercomparison strengthened previous conclusions (Tokioka et al. 1988; Wang and Schlesinger 1999) that the best models for simulating the MJO were ones with convective closures or triggers linked to moisture convergence. Physically, an important factor for a good MJO simulation appears to be the preconditioning of the atmosphere through moistening rather than quick release of available potential energy. This concept has been applied in a number of subsequent improvements of model simulation of the MJO (Fu and Wang 2009; Seo and Wang 2010).

This comparison approach has the advantage of simplicity but it does rely on the evaluation of the model performance being conducted in a consistent manner, using the same model output metrics for every model in the comparison. In general, the use of consistent evaluation metrics is an important first step in any intercomparison of climate model results but has not been employed to date in the analysis of most climate simulations of tropical cyclones (Walsh et al. 2007). This paper outlines initial results from a multi-model intercomparison project, the Tropical Cyclone climate Model Intercomparison Project (TC-MIP; Walsh et al. 2010). Like all intercomparison projects, it aims to improve the simulation of the chosen phenomenon through identification of common model features that have led to improved simulations. Ideally, such intercomparisons should have many models available for analysis, so that clear groups of better-performing models can emerge from the analysis of the results. One drawback of this approach for the generation of tropical cyclones by climate models is that relatively few global models have been run for the long, very fine resolution simulations required to generate a good tropical cyclone climatology. Such resolution is needed for best results because of the small scale of tropical cyclones compared to the typical resolution of a climate model; ultimately, a horizontal resolution as fine as a few kilometres may be required (Chen et al. 2007). Nevertheless, coarse resolution climate models have shown a surprising ability to generate realistic tropical cyclone formation rates, although the storms so generated clearly have lower intensities than many observed storms. Thus, in addition to selected recent fine-resolution modelling results, we also examine results from the CMIP3 archive (http://cmip-pcmdi.llnl.gov/).

Analysis of detected tropical cyclones for model results contained in the CMIP3 archive has been performed previously by a number of authors (e.g. Yokoi et al. 2009). In general, though, these results were either focused on a particular region or did not use systematic, model-independent common metrics for the specific purpose of comparing the model climatology of tropical cyclones with observations. Camargo et al. (2005) analysed the results of three GCMs with horizontal resolutions of approximately 2.5° using a model- and basin-dependent tropical cyclone detection routine. They found that the models were able to reproduce basic features of the observed tropical cyclone climatology. Camargo et al. (2007) used the same cyclone detection method for the analysis of the output of several GCMs and compared the detected tropical cyclone numbers to those estimated from an empirical index of tropical cyclone formation, the Emanuel and Nolan (2004) Genesis Potential Index (GPI). They found that there was little relationship from model to model between the GPI and model-simulated cyclone formation; a model with a high GPI did not necessarily have a high tropical cyclone formation rate. In the present study, we examine global model results and employ common metrics for model evaluation, including a resolution-dependent, model-independent tropical cyclone detection technique. Section 2 gives a list of models and of observations used for model validation, Sect. 3 describes the analysis methodology, Sect. 4 details the results and Sect. 5 provides a discussion and concluding remarks.

2 Models and validation data sets

As mentioned above, two sets of model results are examined here. To provide a baseline comparison, the CMIP3 model archive is analysed. Table 1 gives some details of the models, including their resolution as stored in the archive and their convection schemes. In addition, two finer-resolution, more recent model results are analysed for current climate conditions. The MRI/JMA 20-km global mesh model (Mizuta et al. 2006) is run using a timeslice method for model years 1979–2003. In the timeslice method, the SSTs from a coarser-resolution model run are used to force a fine-resolution atmospheric general circulation model (AGCM). The model is hydrostatic, with 60 vertical levels, uses a semi-Lagrangian time integration scheme and a prognostic Arakawa-Schubert cumulus convection scheme (Randall and Pan 1993). The CMCC_MED model (Scoccimarro et al. 2011) is a fully coupled GCM without flux adjustments, using an atmospheric spectral resolution of T159 (equivalent to a horizontal resolution of about 80 km; Roeckner et al. 2003). The parameterization of convection is based on the mass flux concept (Tiedtke 1989), modified following Nordeng (1994). The ocean component is a 2° resolution global ocean model (Madec et al. 1998) with a meridional refinement near the equator to 0.5°. The CMCC_MED model output used in this work is obtained by running the model over the period 1970–1999 using twentieth century (20C3M) atmospheric forcings as specified by the IPCC (http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php). Results from these two recent models are likely to be more similar to model results that will be obtained from a similar analysis of the CMIP5 model archive (http://cmip-pcmdi.llnl.gov/cmip5). Thus another purpose of this paper is to establish a model intercomparison methodology that can be applied to a suite of finer-resolution climate model results, when these become available.

Table 1 List of CMIP3 models analysed, along with their resolutions and convective parameterisations

Model tropical cyclone formation is compared with the IBTrACS best track data (Knapp et al. 2010), a global compilation of the best estimated tropical cyclone positions and intensities. The observed cyclones are analysed over a 20-year period corresponding to the current climate (1980–1999). Data used to construct observed versions of model diagnostic parameters is taken from the NCEP-2 reanalyses (Kanamitsu et al. 2002) over the same period. For selected fields, comparisons are also made with the ERA40 reanalyses (Uppala et al. 2005). Both reanalysis data sets are at a horizontal grid spacing of 2.5°.

3 Methods

An intercomparison project that aims to evaluate the ability of climate models to generate tropical cyclones must first agree on what constitutes a tropical cyclone in the climate model output. One option would be simply to use the criterion applied to observed tropical cyclones, namely that the storms must have 10-min average wind speeds of 17.5 ms−1 or greater at a height of 10 m above the surface. This may not be appropriate for climate model output, though, as there are numerous cyclonic disturbances generated by a model that satisfy this criterion that are not tropical cyclones, for example, mid-latitude cyclones. Thus additional structural criteria that identify simulated tropical cyclones need to be imposed. Typically, these have been in the form of assuming that low-level wind speed, usually at 850 hPa, exceeds that in the upper troposphere, and that temperature anomalies in the centre of the storm are larger in the upper troposphere than in the lower troposphere. Due to the thermal wind equation, these conditions are essentially equivalent, but they are often both imposed because of the ability of mid-latitude storms to sometimes mimic one or the other of these two conditions (e.g. Shapiro and Keyser 1990).

Here, the resolution-dependent method of Walsh et al. (2007) is used to track cyclones. This method assumes that simulated tropical cyclones are best compared with fine-resolution observations that have been degraded to the resolution of the model, in a manner analogous to that usually performed for other comparisons of observations to model simulations of variables such as precipitation. When observed tropical cyclones are regridded to the relatively coarse resolution of a climate model, their maximum wind speeds become less, and so also the detection threshold for tropical cyclone winds falls from the observed value of 17.5 ms−1 to lower values (Fig. 1). The advantage of this technique is that it provides a baseline, model-independent comparison of simulated tropical cyclone formation rates. This detection technique also assumes a number of other thresholds:

Fig. 1 Variation with resolution of 10 m wind speed detection threshold for tropical cyclones, for various vortex specifications as described in Walsh et al. (2007)

  • Points with vorticity more cyclonic than 1 × 10−5 s−1 are first identified; this threshold serves merely to eliminate isolated points of weak cyclonic vorticity, thus speeding up the detection routine;

  • A centre of low pressure is then found;

  • At the centre of the storm, there must be a warm core, specified as the sum of the temperature anomalies at the centre of the storm versus the surrounding environment, and the temperature anomaly at 300 hPa must be greater than zero; in addition, the mean wind speed over a specified region at 850 hPa must be greater than that at 300 hPa.

  • The resolution-dependent 10 m wind speed threshold is then imposed.

  • Detected storms need to satisfy these conditions for at least 24 h.

The solid line given in Fig. 1 is the one that is employed here to set the resolution-dependent detection threshold. Other symbols shown on Fig. 1 correspond to different vortex specifications, as explained in Walsh et al. (2007).
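The detection criteria listed above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the Walsh et al. (2007) routine itself: the `Candidate` container and all function names are hypothetical, vorticity is assumed to be sign-adjusted so that cyclonic values are positive in both hemispheres, the warm-core sum is assumed to be required to exceed zero, and the resolution-dependent wind threshold (read from Fig. 1) is passed in by the caller rather than computed.

```python
from dataclasses import dataclass

VORTICITY_THRESHOLD = 1.0e-5  # s^-1, the vorticity criterion quoted in the text


@dataclass
class Candidate:
    """One candidate storm centre at one output time (hypothetical container)."""
    vorticity: float            # cyclonic-positive vorticity (s^-1); sign convention assumed
    has_pressure_minimum: bool  # True if a centre of low pressure was found
    warm_core_sum: float        # summed temperature anomalies, centre minus environment (K)
    t_anom_300: float           # temperature anomaly at 300 hPa (K)
    wind_850: float             # mean wind speed over the averaging region at 850 hPa (m/s)
    wind_300: float             # same quantity at 300 hPa (m/s)
    wind_10m: float             # 10 m wind speed (m/s)


def meets_criteria(c, wind_threshold):
    """Apply the structural and wind-speed criteria at a single output time."""
    return (
        c.vorticity > VORTICITY_THRESHOLD
        and c.has_pressure_minimum
        and c.warm_core_sum > 0.0      # assumption: the warm-core sum must be positive
        and c.t_anom_300 > 0.0
        and c.wind_850 > c.wind_300
        and c.wind_10m >= wind_threshold
    )


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def is_tropical_cyclone(track, wind_threshold, min_steps):
    """True if the criteria hold for at least min_steps consecutive output times.

    min_steps is chosen by the caller so that it corresponds to 24 h for the
    output interval of the data being analysed.
    """
    flags = [meets_criteria(c, wind_threshold) for c in track]
    return longest_run(flags) >= min_steps
```

Passing the threshold as an argument keeps the structural tests identical for every model, so that only the wind-speed threshold changes with resolution.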

A number of atmospheric variables have been previously shown to influence the rate of tropical cyclone formation. The Emanuel and Nolan (2004) genesis parameter is here employed as a means of comparing the effects of several of these variables simultaneously:

$$ {\text{GPI}} = \left| {10^{5} \eta } \right|^{3/2} \left( {\frac{\text{H}}{50}} \right)^{3} \left( {\frac{{V_{pot} }}{70}} \right)^{3} \left( {1 + 0.1\,V_{shear} } \right)^{ - 2} $$

where η is the absolute vorticity at 850 hPa in s−1, H is the relative humidity at 700 hPa in percent, V_pot is the potential maximum wind speed in ms−1 and V_shear is the magnitude of the vertical wind shear between 850 and 200 hPa, also in ms−1.
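As a minimal sketch, the GPI formula above can be evaluated at a single grid point as follows. The function and argument names are illustrative choices rather than anything defined in the original analysis, and the inputs are assumed to be in the units stated above.

```python
def genesis_potential_index(abs_vort_850, rel_hum_700, v_pot, v_shear):
    """Emanuel and Nolan (2004) GPI at one grid point.

    abs_vort_850: absolute vorticity at 850 hPa (s^-1)
    rel_hum_700:  relative humidity at 700 hPa (percent)
    v_pot:        potential maximum wind speed (m/s)
    v_shear:      magnitude of the 850-200 hPa vertical wind shear (m/s)
    """
    vorticity_term = abs(1.0e5 * abs_vort_850) ** 1.5
    humidity_term = (rel_hum_700 / 50.0) ** 3
    intensity_term = (v_pot / 70.0) ** 3
    shear_term = (1.0 + 0.1 * v_shear) ** -2
    return vorticity_term * humidity_term * intensity_term * shear_term
```

Because relative humidity enters as a cube, doubling H from 40 to 80% multiplies the GPI by a factor of eight when the other terms are held fixed, which is why a humidity bias in the input data has a strong effect on the diagnosed values.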

A number of standard statistical measures were applied to the analysis of the climate variables that compose the GPI, collected in the form of a Taylor diagram (Taylor 2001). In addition, in our analysis, for the first time a Taylor diagram is constructed comparing observed tropical cyclone formation rates to simulated rates. One difference in the analysis contained here from the standard Taylor diagram is that the zonal mean value of each quantity is removed before the correlation is performed, giving an anomaly correlation. This is a more sensitive statistic than the standard pattern correlation as it removes the high pattern correlation that is caused simply by the variables having substantial variation with latitude caused by the known equator-to-pole climatological gradients.
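The difference between the two statistics can be illustrated with a short sketch. It assumes fields stored as lists of latitude rows and uses unweighted sums; any area weighting used in the actual analysis is not specified in the text and is omitted here. All names are illustrative.

```python
import math


def remove_zonal_mean(field):
    """Subtract the mean along longitude from each latitude row."""
    result = []
    for row in field:
        zonal_mean = sum(row) / len(row)
        result.append([value - zonal_mean for value in row])
    return result


def correlation(a, b):
    """Centred Pearson correlation between two equal-length sequences."""
    a_mean = sum(a) / len(a)
    b_mean = sum(b) / len(b)
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def flatten(field):
    return [value for row in field for value in row]


def pattern_correlation(model, reference):
    """Standard pattern correlation on the full fields."""
    return correlation(flatten(model), flatten(reference))


def anomaly_correlation(model, reference):
    """Correlation after the zonal mean has been removed from both fields."""
    return correlation(flatten(remove_zonal_mean(model)),
                       flatten(remove_zonal_mean(reference)))


# Two toy fields (rows are latitudes) that share a north-south gradient
# but have opposite east-west structure.
model = [[0.0, 2.0], [10.0, 12.0]]
reference = [[2.0, 0.0], [12.0, 10.0]]
```

For these toy fields the pattern correlation is high because both fields share the same latitudinal gradient, whereas the anomaly correlation is negative because the zonal structure is reversed.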

The results shown here are similar to those already described in Walsh et al. (2010), but there are two differences from the results described in that paper. Firstly, a bug was fixed in the data interface section of the detection routine, which improved the ability of the routine to detect weak tropical cyclones. In addition, a further improvement to the method was made, in that for the CMIP3 model results the “background” climatological mean sea level pressure (mslp) was increased. This further improved the detection of weak storms by enabling them to stand out from the background more clearly, resulting in an improved detection of storms in the CMIP3 model results.

4 Results

Figure 2 compares results of the GPI diagnosed from the higher-resolution CMIP3 simulations for the January through March climatology, to the GPI diagnosed from NCEP2 reanalyses with a horizontal resolution of 2.5°. While there appears to be considerable variation between the model simulations of GPI, most models generate a pattern similar to that derived from the NCEP reanalyses. Some systematic differences can be seen between the model results and the NCEP2 GPI, though. For instance, many models have excessive GPI in the South Atlantic, and many models have regions of GPI that extend too far east into the South Pacific. These simulated GPI values can be quite large: for instance, in the MPI ECHAM5 model, maximum values in excess of 40 (per 2.5 × 2.5° grid box per 20 years) are found, compared with maximum values derived from the NCEP2 reanalysis in the same region of 10–15. The excessive simulated GPI values are likely associated with the known dry bias in the mid-tropospheric relative humidity from the NCEP reanalyses (Bony et al. 1997). This would strongly affect the GPI values since they depend on the cube of the 700 hPa relative humidity. This result was also noted by Camargo et al. (2007).

Fig. 2 Emanuel genesis parameter fields derived from NCEP2 reanalyses (top left) and higher-resolution CMIP3 models, January–March. Formation rate is per 2.5 × 2.5° grid box per 20 years

Figure 3 gives a Taylor diagram corresponding to the plots in Fig. 2, and this diagram also includes the lower-resolution CMIP3 models. Values are shown for both January–March (JFM) and July–September (JAS). The statistics are evaluated between latitudes 40°S and 40°N and the anomaly correlation rather than the pattern correlation is plotted, as described in Sect. 3. Models with horizontal grid spacings finer than 2.8° are indicated in red. In general, with the exception of one outlier, the finer-resolution models give superior performance, with better correlations and with standard deviations more similar to the NCEP2 reanalyses, indicated by the red line. Most models have higher GPI than that diagnosed from the NCEP2 reanalyses, as also seen in Fig. 2. Similarly, Fig. 4 shows the relationship between the GPI and model resolution for JFM, with the GPI value averaged over the latitudes specified above. A linear regression line is fitted to the model results, and the NCEP2 and ERA40 reanalyses GPI values are given for comparison. With the exception of a few outliers, in general the finer-resolution models more closely approach the reanalysis values, although there is little dependence of GPI value on resolution. Interestingly, most GPI values from the models are lower than that diagnosed from the ERA40 reanalyses but higher than those from the NCEP2 reanalyses, consistent with the NCEP2 values having a dry bias in the mid-troposphere.

Fig. 3 Taylor diagram of model GPI versus NCEP reanalyses, (top) JFM and (bottom) JAS. Model numbers are the same as in Table 1, with higher-resolution models in red. The standard deviation of the NCEP reanalyses is indicated by the red line

Fig. 4 Emanuel and Nolan GPI versus resolution for the CMIP3 models, JFM. GPI value derived from NCEP2 reanalyses is indicated by a circle, and the value from the ERA40 reanalyses is indicated by a triangle

Figure 5 shows the detected January–March formation of tropical cyclones in the models compared with the best-track data, in the same order of models as Fig. 2 (note that not all models listed in Table 1 had sufficient output archived to enable cyclone tracking to be performed). It is clear that most finer-resolution models (finer than 2.8°) simulate a reasonable pattern of cyclone formation. In addition, Fig. 6 shows results from coarser resolution models, where the simulated pattern of formation is less adequate. In contrast to the results for the GPI, there is little or no simulated cyclone formation in the South Atlantic. In addition, a number of the finer-resolution models are simulating excessive formation in the northwest Pacific at this time of year, compared with the best-track data.

Fig. 5 Tropical cyclone genesis for higher-resolution models (January–March), same units as Fig. 2, for IBTrACS best track data (top left) and model tropical cyclone detections, after the method of Walsh et al. (2007)

Fig. 6 The same as Fig. 5 for lower-resolution models

It is evident from Figs. 5 and 6 that the lowest resolution models tend to have less cyclone formation, and Fig. 7 summarizes this result. The correlation between formation and resolution for the CMIP3 models is −0.5, which is statistically significant at the 95% level. Note, though, that this could also be regarded as a threshold effect. For instance, Fig. 7 shows that once the models have resolutions finer than about 4°, it could be argued that there is actually little relationship between resolution and formation rate for this set of CMIP3 models, since some finer-resolution models also have relatively low simulated cyclone numbers. Figure 8 shows the Taylor diagram of cyclone formation for JFM and JAS compared with the observed best track data, corresponding to Figs. 5 and 6. Also included in this diagram are the results from the two higher-resolution (post-CMIP3) models listed in Sect. 2, indicated by a red x. It is clear from this analysis that the higher-resolution CMIP3 models have the best pattern correlations compared with the observed formation, and the post-CMIP3 models have among the best correlations of all, although they do not necessarily have the smallest model biases. This may suggest that the main advantage of finer resolution is to generate a better pattern of formation. Note that the anomaly correlations for the GPI (Fig. 3) are substantially higher than those for the directly simulated cyclone formation (Fig. 8), reinforcing the point that it is fundamentally easier for the models to simulate a good pattern of large-scale climate variables that are known to influence tropical cyclone formation rates than of tropical cyclone formation itself.
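An inter-model correlation of this kind, and a simple check of its significance, can be sketched as below. The functions are illustrative and use the standard Student's t transformation of a Pearson correlation; the text does not state which significance test was actually applied, and the sample size of 20 used in the example is an assumption made purely for illustration, not the number of models in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """Student's t statistic for a correlation r from n independent samples."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative only: r = -0.5 with an assumed sample of 20 models.
t_value = t_statistic(-0.5, 20)
```

The magnitude of `t_value` would then be compared against a tabulated two-sided critical value of Student's t distribution with n − 2 degrees of freedom.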

Fig. 7 JFM simulated TC formation for CMIP3 models versus resolution. A line of best fit is included

Fig. 8 Taylor diagram for tropical cyclone formation versus best track data corresponding to the models shown in Figs. 5 and 6: (top) JFM and (bottom) JAS. Higher-resolution CMIP3 models are indicated in red. Two finer-resolution recent models are indicated with a red x

Turning to Northern Hemisphere results, Fig. 9 shows GPI results for July–September compared with simulated cyclone formation. For brevity, only selected model results are shown. Once again, there is a large variation in the results, with some models capturing well the pattern of diagnosed genesis, and other models performing less well. The accompanying Taylor diagram was shown previously in Fig. 3. Once again the fine-resolution models appear to be capturing the NCEP2 GPI a little better, although there are a number of outliers. As in January–March, most models have values of GPI that are larger than observed, and many models simulate GPI values over the North Pacific that are higher than diagnosed from the NCEP2 data. A number of models (not shown) also have excessive GPI in the regions near Indonesia, again consistent with the dry bias in the NCEP reanalyses. These models also tend to be those that overestimate GPI across the Pacific.

Fig. 9 The same as Fig. 2 for July–September (upper two rows), for selected fine and coarse-resolution models, including a comparison to model cyclone formation rates (lower two rows)

Figure 9 also shows the simulated formation rates for July–September, for selected models; the accompanying Taylor diagram is given in Fig. 8. Some systematic biases in model formation compared with the observations are apparent. Most models simulate considerably lower formation than observed in the North Atlantic, while simulated formation in the eastern north Pacific is usually lower than observed also. In contrast, simulated formation in the north-west Pacific appears to be more accurate. There is a similar relationship between cyclone formation and resolution in JAS as in JFM for the CMIP3 models, with a similar correlation of −0.54 (not shown). The corresponding Taylor diagram (Fig. 8) shows that once again the highest-resolution models have in general higher pattern correlations, although again not necessarily the smallest biases, as there is a considerable scatter in the simulated formation rates.

To examine the ability of the models to simulate the observed geographical pattern of cyclone tracks, Fig. 10 shows annual tropical cyclone tracks compared with the best track data, for finer-resolution models. As for formation, there are a number of systematic differences from the observed tracks that are common to many of the models. Even so, the models are able to capture important aspects of the observed geographical variation of tracks: for example, most models simulate the observed minimum in cyclone track density in the central north Pacific, caused by the high climatological vertical wind shear in this region. Some models simulate a collection of short tracks in the South Atlantic, where cyclones are not observed frequently (Pezza and Simmonds 2005). The best track data have a higher track density overall than most models, and many more tracks at higher latitudes than the models. In the North Atlantic, model tracks mostly tend to be restricted to low latitudes, with few tracks approaching the eastern United States, unlike the observed track pattern. This can also be seen in the northwest Pacific, with few simulated storms striking Japan. At least part of this difference may arise from the lack of an objective criterion in the observed best track data that is systematically imposed to indicate extratropical transition (Kofron et al. 2010), which if imposed would shorten the observed tracks in the mid-latitudes. In addition, it is noted that the CMIP3 archive consists largely of daily-mean data, and the tracking in the present study was performed on those data. Further analysis of these data (S. Yokoi, personal communication, 2011) suggests that in mid-latitude regions, the faster translation speed of these storms makes them more difficult to detect in daily average data, thus leading to the lack of tracks at higher latitudes.

Fig. 10 Annual tropical cyclone tracks for finer-resolution models. Observed and model-simulated formation rates for each basin are also given

While there may be some relationship between model formation rates and resolution, little or no inter-model global relationship was found between tropical cyclone formation and the GPI, or between model resolution and the GPI (not shown; see also Camargo et al. 2007). Nor was there a strong inter-model global relationship between TC formation and the various components of the GPI (wind shear, relative humidity or MPI; not shown). Since there is some relationship between model resolution and TC formation, this suggests that it is more difficult to improve the simulation of the large-scale variables that comprise the GPI simply by increasing resolution than it is to improve the model simulation of tropical cyclone formation by increased resolution. Some support for this hypothesis comes from Fig. 11, which shows TC formation normalized by GPI versus resolution. Comparing this result to Figs. 4 and 7, low resolution models tend to have reasonable to high GPI values but low TC formation. Thus in Fig. 11, the response shown in Fig. 7 is exacerbated. Coarse-resolution models have low values of this quantity, as for these models GPI tends to be more similar to that of the high-resolution models while the directly-simulated TC formation is low. While this relationship is statistically significant for the CMIP3 models, it clearly depends on other model-dependent factors apart from resolution. As an example of this effect, statistics show that the better resolution models are clearly performing better at simulating the observed wind shear (not shown), even though this is not translating into a genuine statistically-significant inter-model relationship between simulated wind shear and TC formation.
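The normalisation described here can be sketched as follows: each model's TC formation is divided by its GPI, and a least-squares line is fitted against resolution. The function names are illustrative, and the numbers are invented solely to show the mechanics; they are not values from the models analysed in this paper.

```python
def normalised_formation(tc_formation, gpi):
    """TC formation divided by GPI, model by model."""
    return [tc / g for tc, g in zip(tc_formation, gpi)]


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares fit of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical values for illustration only (not taken from the paper):
# formation falls with coarser resolution while GPI stays roughly constant.
resolution_deg = [1.1, 1.9, 2.8, 3.75, 4.5]
tc_formation = [12.0, 10.0, 6.0, 3.0, 1.0]
gpi = [8.0, 9.0, 8.5, 9.5, 9.0]

ratio = normalised_formation(tc_formation, gpi)
slope, intercept = least_squares_line(resolution_deg, ratio)
```

With inputs of this shape the fitted slope is negative, reflecting the behaviour described in the text: the ratio is low for coarse-resolution models because their GPI remains comparable to that of finer-resolution models while their directly simulated formation is low.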

Fig. 11 Cyclone formation rate normalized by GPI, as a function of resolution, for JFM. Included also is the same quantity for the best track values divided by the NCEP2 reanalyses-derived GP (circle) and by the ERA40 reanalyses-derived GP (triangle)

It is well known that observed tropical cyclones arise from regions of persistent deep tropical convection (e.g. Charney and Eliassen 1964; Evans and Shemo 1996). Nevertheless, there also appears to be little inter-model relationship between precipitation and TC formation rates: models with lower total precipitation rates appear to be giving slightly more tropical cyclone formation (not shown), although this relationship is not statistically significant. The finer resolution models also appear to have somewhat better simulation of precipitation overall (Fig. 12). In addition, there appears to be little relationship between convective precipitation rates, as specified by the model convective scheme, and tropical cyclone formation (not shown). Nor does there appear to be an inter-model relationship between the ratio of convective precipitation to total precipitation and the tropical cyclone formation rate (not shown). On the other hand, of the higher-resolution models, the MIROC hires model has high resolution but a rather low generation rate of tropical cyclones, combined with a low fraction of convective precipitation. This may be related to the results of McDonald et al. (2005), who found that there appeared to be a relationship between model-generated convective rainfall and tropical cyclone formation, at least for higher-resolution models. In the results shown here, there does not appear to be a strong correlation between this variable alone and seasonal formation rates of tropical cyclones.

Fig. 12 Taylor diagram for JAS total precipitation

While the analysis indicates that it is difficult to find relationships that are robust between models, relationships between variables within a single model can be strong. As Fig. 3 shows, anomaly correlations between the individual model GPI patterns and the NCEP-derived GPI are high, with an average when taken across all models and seasons of about 0.6. Since the GPI was originally developed by tuning the NCEP-derived GPI values to the best track data, this implies that anomaly correlations between individual model GPI patterns and the best track observed patterns of formation are also strong. Nevertheless, the individual model GPI is less reliable as a predictor of that model’s pattern of simulated cyclone formation, with anomaly correlations when averaged across all models and seasons of about 0.3. Higher-resolution models mostly have higher anomaly correlations between model GPI and model cyclone formation, however (not shown).

5 Discussion

Several studies have shown that simulated tropical cyclone frequency increases with increased resolution, all other things being equal (Murakami and Sugi 2010; Gentry and Lackmann 2010). Figure 13 shows the relationship between annual model formation and resolution, using the Walsh et al. (2007) detection criterion. There is a statistically significant relationship between model formation of TCs and resolution, even when in this case the detection threshold is adjusted downwards for models of coarser horizontal resolution, thus making it easier to detect cyclones in such models. Even after this is done, simulated tropical cyclone formation in these coarse-resolution models remains low. Increased horizontal resolution thus may have an effect on tropical cyclone formation that is in addition to that of resolution only, as this would be accounted for solely by the increasing threshold imposed by the detection technique. If a fixed threshold rather than a resolution-adjusted threshold were employed, this relationship would of course be even stronger, as has been shown previously by others. For instance, for storms simulated by the GISS model, with a resolution of 4.5°, the maximum wind speed recorded for a simulated tropical cyclone is only just over 20 ms−1. Thus if the observed detection threshold of 17.5 ms−1 were imposed on the output of this model, even fewer storms would be detected than those shown in Fig. 13. More generally, if the formation and intensification of simulated tropical cyclones is related to a non-linear feedback process between the ocean and the atmosphere (Rotunno and Emanuel 1987), it can be argued that this process would operate more efficiently in a finer-resolution model. The higher wind speeds generated by the finer resolution model would enhance any such feedback process, and an increased number of model grid points in closer proximity to the storm centre would help amplify this process. 
An alternative explanation, though, is that the lack of detection of storms in low resolution models may be simply a result of the tracking algorithms not being able to track the storms properly at these resolutions, combined with the coarse temporal resolution of the CMIP3 results analysed here (Camargo and Sobel 2004).
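The significance of a relationship such as the one between resolution and formation rate can be checked in several ways, and the test used here is not specified in this section. The sketch below shows one possible approach under stated assumptions: a Spearman rank correlation with a two-sided permutation test. All function names are hypothetical, and the numbers in the example are synthetic values chosen only to exercise the code; they are not taken from the CMIP3 models or from Fig. 13.

```python
import random


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling y relative to x n_perm times."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic illustration only: grid spacing (degrees) and annual TC counts.
synthetic_resolution = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
synthetic_formation = [40.0, 35.0, 30.0, 22.0, 18.0, 12.0, 8.0, 3.0]
rho = spearman(synthetic_resolution, synthetic_formation)
p_value = permutation_p_value(synthetic_resolution, synthetic_formation)
```

A rank-based statistic is used in this sketch because it does not assume that formation rate varies linearly with grid spacing; a least-squares regression would be an equally reasonable alternative.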

Fig. 13 CMIP3 model resolution (in degrees of latitude) versus diagnosed model TC genesis, with the detection threshold adjusted for resolution. Observed annual formation is shown by the red circle; green are models that employ versions of the Arakawa-Schubert convection scheme; yellow are models that use the Zhang-McFarlane scheme; brown are models that use mass-flux schemes; and blue are models with other convection schemes

There appears to be little relationship between the choice of convective parameterisation and the model generation rate of tropical cyclones (Fig. 13). Models employing various versions of the Arakawa-Schubert convection scheme (green squares) give a wide range of TC formation rates, as do models employing mass-flux or Zhang-McFarlane type schemes. While it is clear that the use of a particular convection scheme can give a systematic change in tropical cyclone formation rate within a single model (e.g. Yoshimura et al. 2011), there are other factors that can cause changes in tropical cyclone formation rates. For instance, the two versions of the GFDL model that were run as part of the CMIP3 model suite (models 7 and 8 in Table 1) have the same convective parameterisations but are based on different dynamical cores, and yet the tropical cyclone formation rate of the two models as analysed here differs by more than a factor of two. Thus, in agreement with the results of Camargo et al. (2007), dynamical factors appear to be playing a strong role in the inter-model differences in tropical cyclone formation rate.

The Taylor diagrams shown here for the different variables show that the simulation of tropical cyclone formation is in general considerably worse than the model simulation of any variable that composes the GPI. The GPI is often well simulated by coarse-resolution models (compare Fig. 3 with Fig. 8, for instance). We interpret this as further demonstrating the importance of resolution for the simulation of tropical cyclone formation. A coarse-resolution model may be able to generate a reasonable GPI pattern, derived as it is from large-scale variables, but is less well able to generate the actual rate of tropical cyclone formation. While this result might suggest that, given limited computing resources, indices like the GPI should be used in preference to direct simulation of tropical cyclones when making climate change predictions of tropical cyclone formation, these indices have their own uncertainty issues. They are tuned to the current climate and it is debatable whether such a functional relationship would hold in a warmer world in exactly the same way. Note also that most models have larger GPI rates than observed. The original formulation of the GPI was tuned using the NCEP reanalyses, which are known to be drier than observed in the tropics (Bony et al. 1997); this would explain the bias in the GPI derived from the CMIP3 models.

Most models simulate little cyclone formation in the Atlantic, despite having reasonable GPI patterns in many cases. Table 2 compares results in the western North Pacific basin to those in the Atlantic. While GPI values are considerably lower in the Atlantic than in the western North Pacific, simulated formation rates in the Atlantic decrease even more than does the GPI. In addition, the simulated Atlantic to western North Pacific ratios of both GPI and tropical cyclone formation are well below the observed formation ratio of about 1:2. In the results analysed here, high-resolution models appear to have higher formation rates in the Atlantic than coarse-resolution models. For the two post-CMIP3 models (Table 2), simulated Atlantic formation is higher than the CMIP3 average, although still below the observed numbers. Daloz et al. (2012) showed a strong relationship between the ability of models to generate Atlantic Easterly Waves (AEWs) and the model generation of tropical cyclones. It is likely that the ability of models to generate AEWs, the main precursor for tropical cyclone formation in the Atlantic basin, is related to the resolution of the model (Thorncroft and Hodges 2001). This implies that climate model resolution may be particularly important in the Atlantic basin for a good simulation of tropical cyclone formation.

Table 2 Comparison of observed, CMIP3 and finer-resolution models average TC formation by basin with GPI values, July–September

In summary, we find the following results from the initial stage of this intercomparison:

  • There is some relationship between model resolution and tropical cyclone formation rate even after a resolution-dependent tropical cyclone detection threshold is applied. This may imply some non-linearity in the simulated tropical cyclone formation process different from the largely linear dependence of the resolution-adjusted detection threshold.

  • Coarse-resolution models simulate the Genesis Potential Index better than they simulate the formation of tropical cyclones directly. As a result, there appears to be little inter-model relationship between model GPI and model directly-simulated formation rate. In contrast, there are some relationships within individual, finer-resolution models between patterns of simulated tropical cyclone formation and genesis potential index patterns.

  • The main advantage of finer model resolution, apart from giving a somewhat better simulation of tropical cyclone formation rate, is to give a better pattern of formation rate.

Ideally, such a climate model intercomparison would be conducted using a larger suite of fine-resolution simulations similar to the two post-CMIP3 models used here. In addition, performing common perturbation experiments to determine the model responses to idealized forcings will shed light on the model responses to climate change. This approach is envisaged as part of the US CLIVAR Working Group on Hurricanes (http://www.usclivar.org/hurricanewg.php), for which the analysis methodology established here will be employed.