1 Introduction

Many previous studies have examined the impact of global warming on tropical cyclone (TC) activity using climate models. Low-resolution general circulation models (GCMs) have been shown to be capable of producing TC-like vortices and providing the large-scale variables associated with TC activity (Bengtsson et al. 1982; Camargo 2013; Vitart et al. 1997). They have effectively provided future climate change projections, but their coarse resolutions prevent them from resolving aspects of individual TC structures (LaRow et al. 2008). Recognizing this limitation, many institutes have used high-resolution GCMs to provide more realistic TC characteristics (Manganello et al. 2012; Murakami et al. 2012a, b; Oouchi et al. 2006; Strachan et al. 2013; Walsh et al. 2014; Wehner et al. 2014; Zhao and Held 2012; Zhao et al. 2009). However, these experiments remain burdensome because of the large computational resources required to run high-resolution GCMs for multi-decadal simulations.

An alternative is to embed high-resolution regional models within GCMs to produce more highly resolved TCs. This approach includes dynamical and statistical-dynamical downscaling techniques. The dynamical downscaling technique uses a high-resolution regional climate model (RCM) to provide more detailed information on TC activity over a region of interest in the GCM output (Au-Yeung and Chan 2012; Bender et al. 2010; Camargo et al. 2007; Cha et al. 2011b; Feser and von Storch 2008; Knutson et al. 2007; Lee et al. 2013; Wu et al. 2014). This technique has provided useful insights into the dependence of TC activity on climate. As a variant of dynamical downscaling, the statistical-dynamical downscaling technique described in Emanuel et al. (2008) can generate a very large number of synthetic TCs with more realistic intensity based on GCM environmental fields using simpler embedded models. It can also avoid the limitations on intensity simulation that affect dynamical downscaling with an RCM. However, these two techniques have some drawbacks: (1) an inconsistency in the physics between the GCMs and RCMs, (2) a lack of feedback from the RCMs to the GCMs, and (3) inherent uncertainties and systematic errors induced by single RCMs. In the case of dynamical downscaling, the first and second issues can be resolved by using a unified single model system across a range of spatial scales (convective scale to climate system) with an identical physics package for global and regional models [e.g., Model Prediction across Scales (MPAS, Skamarock et al. 2012) and Global/Regional Integrated Model system (GRIMs, Hong et al. 2013a)]. The third issue, which makes it difficult to assess the significance and reliability of future projections of TC activity, can be resolved by using a multi-RCM ensemble.

To alleviate the uncertainties of a single RCM, multi-RCM ensembles obtained by downscaling GCMs for future climate projections have been generated through several international projects during the past decade (Christensen et al. 2007; Fu et al. 2005). Recently, the Coordinated Regional Climate Downscaling Experiment (CORDEX) was organized as an international coordinated framework to produce high-resolution regional climate change projections based on the Coupled Model Intercomparison Project Phase 5 (CMIP5) using multiple RCMs (Giorgi et al. 2009). CORDEX-East Asia, the East Asian branch of the CORDEX initiative, is suitable for providing future projections of TC activity over the western North Pacific (WNP) because its domain covers a major part of this basin at a resolution reasonably fine for simulating TCs on the multi-decadal time scale. However, the RCMs within CORDEX-East Asia have not been evaluated for their ability to simulate TC activity over the WNP. So far they have been evaluated only for precipitation and temperature over East Asia (Huang et al. 2015; Lee and Hong 2014; Oh et al. 2013; Park et al. 2013; Suh et al. 2012; Zou et al. 2014). Model evaluation is a fundamental step prior to assessing climate change projections, since RCMs play critical roles in decision-making processes that rely on climate information (Hong et al. 2010; Kim et al. 2014b; Suh et al. 2012). Thus, it is necessary to objectively evaluate the capability of RCMs in simulating WNP TC activity.

This study aims to evaluate the capability of five RCMs within CORDEX-East Asia and their ensemble mean to simulate WNP TC activity in space and time as compared to observations. Specifically, we examine whether the CORDEX multi-RCM ensemble can be applied to provide a more reliable and credible estimation of future TC activity over the WNP. This paper is organized as follows. In Sect. 2, the data, models, and methodology, including the detection of TCs, are described. In Sect. 3, the climatology of the simulated TC activity and large-scale environment in the RCMs is presented and discussed, followed by the intensity distribution and interannual variability of the simulated TCs. The summary and conclusion are given in Sect. 4.

2 Data and methodology

The best track data for WNP TCs were obtained from the Regional Specialized Meteorological Center (RSMC) Tokyo-Typhoon Center for the period 1989–2008. This dataset contains 6-hourly latitude and longitude locations, minimum central pressure, maximum wind speed, etc. The location where an individual TC first attains tropical storm intensity (i.e., greater than 17 m s⁻¹) is set as the genesis point, so that the TCs include both tropical storms and typhoons. The analysis is focused on the TCs formed from June to November (JJASON), which account for approximately 85 % of the total annual TCs over the WNP. To evaluate the performance of the models, simulated upper-air fields are compared with the driving reanalysis. In addition, the Climate Prediction Center Merged Analysis of Precipitation (CMAP) monthly data with 2.5° × 2.5° horizontal resolution (Xie and Arkin 1997) are used for precipitation evaluation.

In this study, five different RCMs [Hadley Centre Global Environmental Model version 3 regional climate model (HadGEM3-RA), Regional Climate Model (RegCM), Seoul National University Regional Climate Model (SNURCM), Weather Research and Forecasting model (WRF), and GRIMs] were used for the TC simulations. Details of their dynamics and physics are presented in Table 1. The RCM simulations cover the period 1989–2008, with lateral boundary forcing from the ERA-Interim reanalysis. Spectral nudging is implemented in all RCMs except HadGEM3-RA to reduce the systematic errors in long-term simulations. ERA-Interim SST data are prescribed as the models’ lower boundary condition. The RCM simulations were performed over the CORDEX-East Asia domain with ~50 km resolution. The almost identical setups (i.e., driving data, simulation period, domain, and resolution) allow us to assess the model performance fairly and to reduce the inherent uncertainty of a single RCM in simulating TCs by constructing a multi-RCM ensemble. The model results have been interpolated onto the analysis domain (0–45°N, 100–160°E) at 0.5° resolution to analyze WNP TC activity (Fig. 1). Since the analysis domain is smaller than the official WNP basin and the RCMs can be strongly influenced by the position of the boundary of the model domain, the observed and simulated TCs should be carefully analyzed (Camargo et al. 2007; Landman et al. 2005). To alleviate this issue, for an observed TC that enters from outside the analysis domain, the genesis and track positions are defined as the first and subsequent positions after it crosses the boundary of the analysis domain. Details of the RCM experiments are provided in Suh et al. (2012), except for HadGEM3-RA, which is described in Davies et al. (2005).

Table 1 The main characteristics of the RCMs used in this study
Fig. 1 The CORDEX-East Asia domain with topography (m) and western North Pacific area (dashed line)

The detection and tracking methods for simulated TCs used in this study are almost the same as those used in previous studies (Cha et al. 2011b; Jin et al. 2013): (1) the potential storm is a local minimum of sea level pressure; (2) the maximum surface wind exceeds the wind speed threshold; (3) the maximum relative vorticity at 850 hPa exceeds the vorticity threshold; (4) the sum of the temperature deviations at 200, 500, and 850 hPa exceeds the temperature anomaly threshold; (5) the maximum wind speed at 850 hPa is larger than that at 300 hPa; (6) the duration is not shorter than 2 days; (7) tracks are traced from these identified potential storms. Unlike the previous studies, the wind speed, vorticity, and temperature anomaly thresholds are defined as the model-dependent thresholds proposed by Camargo and Zebiak (2002) to compare fairly the capability of the individual models to simulate TC activity. The vorticity threshold is defined as two standard deviations of the 850-hPa relative vorticity over the WNP. The threshold for the vertically integrated anomalous temperature is defined as one standard deviation of temperature at 200, 500, and 850 hPa over the WNP; these levels are used because variables at 300 and 700 hPa are absent from the CORDEX-East Asia outputs. The wind speed threshold is defined as the sum of the oceanic global wind speed and one standard deviation of the WNP wind speed. Here, the oceanic global wind speed value is estimated using the ratio of the global-mean wind speed to the wind speed averaged over the WNP (0–45°N, 100–160°E) in the ERA-Interim reanalysis data. However, since the estimated values in the models are much smaller than 17 m s⁻¹, which is the threshold given for a 50-km resolution model in Walsh et al. (2007), we adjust the largest estimated value to 17 m s⁻¹ and then modify the remaining estimated values using this ratio. The three threshold values for the five RCMs used in this study are given in Table 2.
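The threshold definitions above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, the population standard deviation (`ddof=0`) is an assumption, and "this ratio" is read here as the factor that maps the largest estimated wind threshold onto 17 m s⁻¹. The input numbers are made up and are not the values in Table 2.

```python
import numpy as np

def vorticity_threshold(vort850):
    """Two standard deviations of 850-hPa relative vorticity over the WNP.

    Assumption: population standard deviation (ddof=0).
    """
    return 2.0 * np.std(vort850)

def rescale_wind_thresholds(estimates, target=17.0):
    """Scale model-dependent wind thresholds so the largest equals `target`.

    Assumption: every model's estimate is multiplied by the same factor,
    target / max(estimates).
    """
    ratio = target / max(estimates.values())
    return {name: value * ratio for name, value in estimates.items()}

# Made-up estimates (m/s) for three hypothetical models.
estimates = {"model_a": 10.0, "model_b": 8.0, "model_c": 12.0}
scaled = rescale_wind_thresholds(estimates)
```

With this reading, the model with the largest estimate receives exactly 17 m s⁻¹ and the others keep their relative ordering.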

Table 2 Thresholds used for relative vorticity (10⁻⁵ s⁻¹), surface wind speed (m s⁻¹), and warm core (K) for defining TCs in the WNP

To reduce the uncertainty of a single RCM, we construct the ensemble average of the five RCMs using the performance-based ensemble averaging (PEA) method suggested by Suh et al. (2012). They showed that the PEA had the best skill among various ensemble methods, regardless of variable and season. The PEA weights are calculated from the root-mean-square errors (RMSE) and spatial correlation coefficients between the simulated and observed TC genesis densities in the analysis area, assuming that the simulation performance is inversely proportional to the RMSE and proportional to the correlation coefficient (Eqs. 1–3).

$$Pw_{i} = \frac{1.0}{{\left( {RMSE_{i} + 1.0} \right)}}Corr_{i}$$
(1)
$$NPw_{i} = \frac{{Pw_{i} }}{{\sum\nolimits_{i = 1}^{{N_{M} }} {Pw_{i} } }}$$
(2)
$$\tilde{M} = \sum\limits_{i = 1}^{{N_{M} }} {NPw_{i} M_{i} }$$
(3)

Here, $Pw_{i}$, $NPw_{i}$, $N_{M}$, and $M_{i}$ are the weighting, the normalized weighting, the number of ensemble members, and the variable for the ith model, respectively. To analyze the performance of the PEA, we also calculate another ensemble average as a reference using the equal-weighted averaging (EWA) method, which is commonly used in multi-model ensemble studies.
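Equations 1–3 translate directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical and the input values are made up. Note that a model with a negative spatial correlation would receive a negative weight under Eq. 1; how such a case should be handled is not specified here.

```python
import numpy as np

def pea_weights(rmse, corr):
    """Normalized performance weights NPw_i from Eqs. 1 and 2."""
    rmse = np.asarray(rmse, dtype=float)
    corr = np.asarray(corr, dtype=float)
    pw = corr / (rmse + 1.0)   # Eq. 1: Pw_i = Corr_i / (RMSE_i + 1)
    return pw / pw.sum()       # Eq. 2: normalize so the weights sum to 1

def pea_mean(fields, weights):
    """Weighted ensemble mean (Eq. 3); `fields` has shape (N_M, ...)."""
    fields = np.asarray(fields, dtype=float)
    return np.tensordot(weights, fields, axes=1)

def ewa_mean(fields):
    """Equal-weighted average, used as the reference ensemble."""
    return np.mean(np.asarray(fields, dtype=float), axis=0)

# Made-up two-member example.
weights = pea_weights(rmse=[1.0, 3.0], corr=[0.8, 0.4])
fields = [[10.0, 20.0], [20.0, 40.0]]
pea = pea_mean(fields, weights)
ewa = ewa_mean(fields)
```

In this toy case the better-performing first member receives a weight of 0.8, so the PEA stays closer to it than the EWA does.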

3 Results

3.1 Climatology of TC activity and their related large-scale environment

Figure 2 shows the 20-year mean TC genesis density from observations and the five RCMs. TC genesis density is displayed by binning the latitude and longitude positions into 2.5° × 2.5° grid boxes; this method has been generally applied to the binning of TC data (Ho et al. 2013; Kim et al. 2010). In the observations, TC genesis locations are concentrated in a region between 5 and 25°N with three local maxima: the South China Sea (SCS; 115–120°E), the western part of the Philippine Sea (PS; 125–135°E), and the Mariana Trench (around 145°E). HadGEM3-RA severely underestimates TC formation over most of the WNP compared to the observations. The mean seasonal total numbers are 18.1 and 4.6 in the observations and HadGEM3-RA, respectively (see Table 3). The two main TC genesis regions over the SCS and PS simulated by RegCM are shifted slightly southward and eastward, respectively, compared to the observations. This pattern is consistent with what has been reported by previous studies using RegCM (Au-Yeung and Chan 2012; Huang and Chan 2014). Despite these systematic errors, RegCM reasonably reproduces the observed pattern of TC genesis (spatial correlation is 0.85) and the mean seasonal total number (18.9). SNURCM overestimates TC formation (22.0) over the entire WNP, but it captures well the observed main TC genesis areas over the PS and the SCS (spatial correlation is 0.89). The TC genesis density simulated by WRF agrees better with the observations than that of the other models in terms of both spatial distribution and genesis frequency. The spatial correlation of TC genesis density between WRF and the observations and the mean total number are 0.89 and 17.6, respectively, which are the best statistics among the models. GRIMs has significant errors, simulating an unrealistic spatial pattern of TC genesis (spatial correlation is 0.49) and underestimating TC formation (14.6), especially over the PS. These results indicate that RegCM, SNURCM, and WRF are able to simulate TC formation over the WNP reasonably, while HadGEM3-RA and GRIMs have considerable systematic errors in the simulation of TC genesis.
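The binning step used for the genesis density can be sketched as follows. This is a minimal illustration, assuming the 2.5° boxes are aligned with the edges of the analysis domain (0–45°N, 100–160°E) and that the density is expressed as a count per year; neither detail is stated explicitly, and the function name is hypothetical.

```python
import numpy as np

# Assumed box edges: 2.5-degree boxes aligned with the analysis domain.
LON_EDGES = np.arange(100.0, 160.0 + 2.5, 2.5)  # 24 boxes in longitude
LAT_EDGES = np.arange(0.0, 45.0 + 2.5, 2.5)     # 18 boxes in latitude

def genesis_density(lons, lats, n_years):
    """Mean number of genesis points per year in each 2.5 x 2.5 degree box."""
    counts, _, _ = np.histogram2d(lats, lons, bins=[LAT_EDGES, LON_EDGES])
    return counts / n_years

# Three made-up genesis points over two years.
density = genesis_density(
    lons=[116.0, 116.5, 130.0], lats=[12.0, 12.2, 15.0], n_years=2
)
```

The resulting array has latitude as its first axis and longitude as its second, so it can be compared box by box with the observed density when computing a spatial correlation or RMSE.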

Fig. 2 Climatological mean of TC genesis density in (a) RSMC and (b–h) models for the period 1989–2008. Also listed in each panel are the values of mean genesis frequency, spatial correlation, and RMSE with RSMC

Table 3 Statistics for TC genesis and ACE during the period 1989–2008

For the EWA, the spatial correlation of TC genesis density with the observations increases to 0.91, since the negative biases over the SCS and the WNP in HadGEM3-RA and GRIMs are offset by the positive biases in SNURCM and RegCM. However, the EWA still underestimates the TC formations (15.9), especially over the PS due to the large systematic errors of HadGEM3-RA and GRIMs. In the PEA, the pattern and frequency of TC genesis over the WNP are considerably improved. The spatial correlation and mean total number are 0.92 and 17.4, respectively. The improved results by PEA are due to the higher weighting coefficient for RegCM, SNURCM, and WRF, which have a better performance simulating the WNP TC genesis.

Similar to TC genesis, the track densities simulated by HadGEM3-RA and GRIMs have relatively large errors, while those simulated by the other three models are comparable to the observations (Fig. 3). TC track density is computed by binning 6-hourly TC positions onto the corresponding grid boxes. A TC that remains in the same grid box over several time steps is counted only once. HadGEM3-RA considerably underestimates track densities over the entire WNP because it cannot resolve the relevant TC formations there, as shown in Fig. 2b. RegCM overestimates the track density over the eastern part of the WNP due to the exaggerated TC genesis there (Fig. 2c). In addition, RegCM slightly underestimates the track density over the East Asian coastal regions (20–35°N, 120–140°E), which is possibly associated with the low TC formation frequency over the PS. SNURCM reasonably simulates the spatial distribution of tracks, but generally overestimates TC tracks over the SCS and the entire WNP because more TCs form there in the simulation, as shown in Fig. 2d. WRF reproduces the track density most realistically among the models. GRIMs tends to underestimate TC tracks near the East Asian coastal regions, which is related to the largely underestimated formation of TCs over the PS, as shown in Fig. 2f. In the EWA, the ensemble mean track density is underestimated over most of the WNP because of the impacts of HadGEM3-RA and GRIMs. In the PEA, by contrast, the simulated track density is closest to the observations in terms of pattern and frequency.
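The "counted only once" rule for track density can be made concrete with a small sketch. As a simplifying assumption, the boxes are the same 2.5° boxes used for the genesis density and the result is expressed per year; the function name and the example tracks are hypothetical.

```python
import numpy as np

LON_EDGES = np.arange(100.0, 160.0 + 2.5, 2.5)  # assumed 2.5-degree boxes
LAT_EDGES = np.arange(0.0, 45.0 + 2.5, 2.5)

def track_density(tracks, n_years):
    """Per-year count of TCs passing through each box.

    `tracks` is a list of (lons, lats) pairs of 6-hourly positions.
    Each TC contributes at most once to any given box.
    """
    counts = np.zeros((len(LAT_EDGES) - 1, len(LON_EDGES) - 1))
    for lons, lats in tracks:
        i = np.digitize(lats, LAT_EDGES) - 1
        j = np.digitize(lons, LON_EDGES) - 1
        # Drop positions outside the domain (including the outermost edge).
        inside = (i >= 0) & (i < counts.shape[0]) & (j >= 0) & (j < counts.shape[1])
        # A set removes repeated visits of the same TC to the same box.
        for cell in set(zip(i[inside].tolist(), j[inside].tolist())):
            counts[cell] += 1
    return counts / n_years

# Two made-up tracks; the first lingers in one box for two time steps.
tracks = [([130.0, 130.5, 133.0], [10.2, 10.4, 11.0]), ([131.0], [11.0])]
density = track_density(tracks, n_years=1)
```

Here the first track adds one count to each of two boxes despite having three positions, and the second track adds a further count to the first of those boxes.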

Fig. 3 As in Fig. 2, but for TC track density

These spatial distributions of TC activity can be explained by large-scale circulation patterns. The monsoon trough (i.e., positive relative vorticity) and weak vertical wind shear provide favorable dynamic conditions for TC formation and development over the WNP (Fig. 4). In particular, the monsoon trough is modulated by low-level circulation patterns such as the southwesterly monsoon winds in the tropical WNP (Fig. 5). In the observations, most TCs are generated in the region where the relative vorticity is positive or the vertical wind shear is less than 10 m s⁻¹ (Figs. 2a, 4a). All RCMs skillfully capture the gross pattern of the observed monsoon trough and vertical wind shear; however, some notable discrepancies should be carefully considered. In HadGEM3-RA, the monsoon trough weakened by the underestimated southwesterly over the SCS and the strengthened vertical wind shear along the monsoon trough appear to inhibit TC formation there (Figs. 2b, 4b, 5b). In RegCM, the monsoon trough is strengthened by the overestimated southwesterly wind over the SCS, and the convergence is strengthened by the distorted wind directions (i.e., northerly biases) over the eastern part of the tropical WNP (Figs. 4c, 5c). These results are associated with the overestimated TC formation over these regions (Fig. 2c). In SNURCM, the monsoon trough seems to expand northward due to the overestimated southwesterly winds over the tropical WNP (Figs. 4d, 5d). This broad difference leads to the overestimated TC activity over most of the basin (Figs. 2d, 3d). WRF successfully reproduces the location and strength of the monsoon trough and vertical wind shear, except for the distorted wind directions around the eastern part of the tropical WNP (Figs. 4e, 5e). This discrepancy leads to slightly overestimated TC formation there but does not prevent WRF from producing the most realistic TC activity among the models (Figs. 2e, 3e). In GRIMs, the monsoon trough is excessively narrow and the vertical wind shear is excessively strong due to the overestimated southerly component of the southwesterly wind over the tropical WNP. Additionally, the upper westerly wind (i.e., high vertical wind shear) in the mid-latitudes is enhanced and expanded southward. Since these circulation patterns provide unfavorable conditions for TC activity, TC formation and tracks are underestimated over the PS and the East Asian coastal area, respectively. In the PEA, the simulated large-scale circulation appears to be closest to the observations among the models and the EWA.

Fig. 4 Climatological mean of vertical wind shear (m s⁻¹; shading) and positive 850-hPa relative vorticity (10⁻⁵ s⁻¹; contour) in (a) ERA-Interim and (b–h) models for the period 1989–2008. Red dashed lines denote the axis of the monsoon trough in the reanalysis, which is determined with the positive relative vorticity and the line of the zero-zonal wind at 850 hPa

Fig. 5 As in Fig. 4, but for wind (m s⁻¹; vector) and zonal wind speed (m s⁻¹; shading) at 850 hPa

Through the aforementioned analysis, we found that most of the discrepancies in the large-scale circulation are related to the regional biases in TC activity. The simulated large-scale circulation and TC activity can also be linked to characteristics of each model such as the physics schemes and spectral nudging. The cumulus parameterization schemes used in three of the models (the Emanuel scheme in RegCM and the Kain-Fritsch scheme in SNURCM and WRF) are known to overestimate the convective precipitation and monsoon westerly in the WNP (Cha et al. 2011b, 2015; Chow et al. 2006). However, these potential systematic errors can be reduced by applying spectral nudging to the models (Feser and von Storch 2008). As a result, these models seem to reproduce the observed spatial pattern of TC activity appropriately, because spectral nudging keeps the simulated large-scale circulation realistic. Full wind (rotational and divergent components) nudging is applied to these models, whereas only rotational wind nudging is applied to GRIMs (Table 1). With rotational-only nudging, the exclusion of the observed northerly (southerly) divergent winds in the lower (upper) troposphere leads to southerly (northerly) wind biases there (not shown). These biases result in the narrow monsoon trough and the southward expansion of the upper westerly, which inhibit TC formation in the PS and TC movement toward the East Asian coastal area. According to Hong and Chang (2012), full wind (rotational and divergent components) nudging could alleviate these systematic errors, but it was not applied to the GRIMs used in this study because it led to significantly distorted mass fields aloft. Unlike these models, HadGEM3-RA does not use spectral nudging. This leads to a monsoon trough weakened by the underestimated southwesterly, which inhibits TC formation. In addition, the simulated precipitation over the WNP is much smaller than in the observations (not shown). Given that simulations without coupled air-sea interaction generally have positive biases in oceanic precipitation (e.g., Cha et al. 2011a; Cha and Lee 2009), the cumulus parameterization scheme that produces the underestimated oceanic precipitation in HadGEM3-RA may not be appropriate for simulating the convective activity in the tropical WNP. These results suggest that the extremely underestimated TC activity in HadGEM3-RA can be associated with conditions for TC formation and development that are unfavorable both dynamically and physically.

3.2 Intensity distribution

In this section, the distributions of the life-time maximum wind speed and minimum sea level pressure for all RCMs including the wind-pressure relationship are presented. These results can show how the simulated TC intensity varies in models with identical resolution, since the simulated TC intensity depends not only on the horizontal resolution but also on the physics (e.g., Tao et al. 2011).

Figure 6 shows the frequency of occurrence of maximum wind speed and minimum sea level pressure in the observations and models. Since 50-km resolution is not sufficient to resolve intense TCs, the simulated distributions are skewed toward weak intensities, whereas both observed variables are uniformly distributed. In the simulations, there is a qualitative difference between the distributions of maximum wind speed for HadGEM3-RA and RegCM on the one hand, and SNURCM, WRF, and GRIMs on the other. In HadGEM3-RA and RegCM, the distributions are too narrow and skewed toward lower wind speeds, and their right tails above 32 m s⁻¹, corresponding to typhoon intensity, are not reproduced at all. In SNURCM, WRF, and GRIMs, the shapes of the distributions are more similar to that in the observations, although these models tend to overestimate (underestimate) the frequency of wind speeds below (above) 37 m s⁻¹. This discrepancy has been detected in previous studies (Kim et al. 2014a; Murakami and Sugi 2010). It is noteworthy that TCs simulated by HadGEM3-RA do not have wind speeds above 27 m s⁻¹, although as a non-hydrostatic model it has an advantage in resolving the meso-scale convection associated with TCs. By contrast, GRIMs has a reasonable distribution of wind speeds, similar to the other models, in spite of being a hydrostatic model. Furthermore, GRIMs is the only model that can simulate wind speeds exceeding 42 m s⁻¹. This is because a revised roughness length formulation based on Donelan et al. (2004) is used to effectively reduce the surface drag coefficient in the presence of high winds. These results suggest that appropriate physics, such as surface exchange, as well as dynamics, such as the dynamic framework and finer resolution, are important for the simulation of TC intensity.

Fig. 6 Distribution of (a) maximum wind speed and (b) minimum sea level pressure from the RSMC and models for the period 1989–2008

Except for GRIMs, the distributions of minimum sea level pressure can be explained in the same way as those of the aforementioned maximum wind speed. GRIMs seems to overcome the limitation of the hydrostatic model by showing a reasonable distribution of maximum wind speeds; however, it produces too many weak TCs with pressures above 990 hPa. In HadGEM3-RA and RegCM, the distributions are too narrow and skewed toward higher sea level pressure, and their left tails below 970 hPa are not reproduced at all. In SNURCM and WRF, although there are still some differences from the observations below 970 hPa, the differences are smaller than in the other models.

Another way to assess simulated TC intensity is to examine the maximum wind speed with respect to the corresponding sea level pressure. This wind-pressure relationship is shown in Fig. 7, along with the lines denoting the second-order polynomial fit to the data points in the scatter diagram. Regardless of the simulated maximum intensities, HadGEM3-RA and SNURCM have difficulty generating strong enough wind speeds for a given pressure compared to observation. This feature has been attributed to insufficient horizontal resolution (Manganello et al. 2012) and deficiencies in the surface momentum flux parameterization (Powell et al. 2003; Moon et al. 2007). These issues are well known with other current models (e.g., Knutson et al. 2007; Murakami and Sugi 2010). In contrast, the RegCM and GRIMs have difficulty generating stronger wind speeds for a given pressure compared to observations. As the wind speed increases, this deficiency is reduced in RegCM but abnormally increased in GRIMs. This feature of GRIMs is consistent with the distribution that it is reasonable in the wind speed but skewed toward weak intensities in the pressure. This could be because GRIMs has the realistic surface momentum exchange in the high wind speed range but has difficulty in resolving meso-scale convection as a hydrostatic model. WRF is closest to observations compared to the other models.
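A second-order polynomial fit of the kind drawn in Fig. 7 can be obtained with an ordinary least-squares fit. The sketch below assumes that wind speed is treated as the dependent variable and pressure as the independent one, which the text suggests but does not state outright; the function name is hypothetical and the data are synthetic.

```python
import numpy as np

def fit_wind_pressure(min_slp, max_wind):
    """Least-squares quadratic fit of maximum wind speed against minimum SLP.

    Returns a callable polynomial: wind = f(pressure).
    """
    coeffs = np.polyfit(min_slp, max_wind, deg=2)
    return np.poly1d(coeffs)

# Synthetic, exactly quadratic data (not from any model or observation).
pressure = np.array([1000.0, 990.0, 980.0, 970.0, 960.0, 950.0])  # hPa
deficit = 1010.0 - pressure
wind = 0.002 * deficit**2 + 0.4 * deficit + 14.0                   # m/s
curve = fit_wind_pressure(pressure, wind)
wind_at_975 = curve(975.0)
```

Comparing such fitted curves between a model and the observations shows whether the model produces winds that are too weak or too strong for a given central pressure.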

Fig. 7 Scatter diagram of the lifetime maximum wind speed and minimum sea level pressure in RSMC (black), HadGEM3-RA (red), RegCM (yellow), SNURCM (green), WRF (blue), and GRIMs (purple) for the period 1989–2008. Solid lines show the polynomial fits to the data points in the observations and models

3.3 Interannual variability

The capability of a model to reproduce the observed interannual variability in TC activity is important for assessing its overall performance in simulating TC activity. Figure 8a shows the time series of the observed and simulated TC genesis frequency. The capability of the models is measured by the RMSE and the temporal correlation between observations and simulations (Table 3). HadGEM3-RA and GRIMs have large RMSE (14.0 and 7.9), low temporal correlations (0.38 and 0.07, neither statistically significant), and a non-decreasing trend. In particular, GRIMs does not capture the observed interannual variation of TC genesis at all. This might be associated with the unrealistic large-scale circulation in GRIMs (Fig. 4f). HadGEM3-RA also has a relatively low temporal correlation due to the extremely underestimated TC genesis caused by the weakened convective activity over the WNP. This indicates that advances in the convective parameterization in HadGEM3-RA could improve the interannual variability of the simulated TC genesis.
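The two skill measures used here, plus a simple trend estimate, can be computed from annual counts as in the sketch below. The least-squares linear trend is an assumption, since the text does not say how the trend is estimated; the function name is hypothetical and the numbers are made up.

```python
import numpy as np

def interannual_skill(obs, sim):
    """RMSE, temporal (Pearson) correlation, and linear trend of `sim`.

    `obs` and `sim` are annual TC counts for the same years.
    The trend is the least-squares slope in counts per year (an assumption).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    corr = np.corrcoef(obs, sim)[0, 1]
    trend = np.polyfit(np.arange(sim.size), sim, deg=1)[0]
    return rmse, corr, trend

# Made-up four-year series with a constant bias of +2 storms per year.
rmse, corr, trend = interannual_skill(obs=[20, 18, 16, 14], sim=[22, 20, 18, 16])
```

The example illustrates why both measures are reported: a constant bias leaves the correlation perfect while the RMSE is nonzero.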

Fig. 8 Annual number of (a) TC genesis and (b) ACE (10⁴ m² s⁻²) in RSMC and models during the period 1989–2008. Red and blue bars show El Niño and La Niña years, respectively

Overall, RegCM, SNURCM, and WRF capture the interannual variations of TC genesis frequency well, with smaller RMSE (5.1, 6.1, and 4.0), significant temporal correlations (0.56, 0.44, and 0.53), and a decreasing trend, except that SNURCM shows a higher frequency than the observations during the period 1998–2002, which includes El Niño and La Niña event years. SNURCM tends to overestimate TC genesis in El Niño and La Niña years as compared to neutral years (not shown). RegCM has the highest temporal correlation and WRF has the smallest RMSE among the models. The temporal correlations in these models are similar to or lower than those (0.43–0.75) reported in previous studies (Au-Yeung and Chan 2012; Jin et al. 2013; Zhan et al. 2011; Zhao et al. 2009). Given that this study used a continuous 20-year integration and objectively defined thresholds for TCs, whereas the previous studies used seasonal integrations and their own preferred thresholds, these models have reasonable capability in capturing the interannual variability of TC genesis. Interestingly, RegCM, SNURCM, and WRF (HadGEM3-RA and GRIMs) show higher (lower) ability in simulating not only the spatial distribution of TC activity but also its interannual variation. This suggests that a model’s performance for the TC spatial pattern and for its temporal variation are related, since the frequency and locations of TC formation are simultaneously modulated by large-scale circulation such as the El Niño-Southern Oscillation (ENSO) and the WNP subtropical high (Ho et al. 2004; Wang and Chan 2002). Similarly, the performance of the two ensemble averages for interannual variability is analogous to that for the spatial distribution. The EWA has a higher temporal correlation coefficient (0.58) than the individual models, although HadGEM3-RA and GRIMs have statistically insignificant low correlation coefficients. Furthermore, the PEA generally fares better than the other models; its RMSE and temporal correlation are 3.4 and 0.64, respectively.

In addition to TC genesis frequency, we examine the interannual variability of the seasonal accumulated cyclone energy (ACE), which is defined as the sum of the squares of the maximum wind speeds during the lifetime of each TC during JJASON (Fig. 8b). The ACE index is useful for understanding the interannual variability of TC activity influenced by the ENSO, because it is positively correlated with the ENSO, which modulates the location of TC formation (Camargo and Sobel 2005). Since TCs in HadGEM3-RA have very weak intensities that seldom exceed the ACE threshold of 17 m s⁻¹ (Fig. 6a), we define the ACE in the models as the sum of the squares of the TCs’ maximum wind speeds, sampled at 6-h intervals without any intensity threshold (e.g., Camargo et al. 2005; Shaevitz et al. 2014). The models, except for HadGEM3-RA (5 %), have roughly 40–70 % of the observed mean ACE (42.9) because of the limited TC-resolving capability of models with 50-km horizontal resolution. Regardless of these large biases, the temporal correlation coefficients between the observed and simulated ACEs are 0.40–0.79, which are statistically significant at the 90–99 % confidence level. Moreover, these biases tend to be smaller during La Niña years (e.g., 1998 and 1999), when strong TCs are less frequent. In the EWA and PEA, the interannual variability of the ACE is comparable to that of TC genesis, although most models largely underestimate the mean ACE.
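The ACE definition above, with and without an intensity threshold, can be sketched as follows. The function name is hypothetical, the scaling to units of 10⁴ m² s⁻² follows the units shown in Fig. 8b, and the wind values are made up.

```python
def seasonal_ace(tracks_max_wind, threshold=0.0):
    """Seasonal ACE in units of 10^4 m^2 s^-2.

    `tracks_max_wind` holds one sequence of 6-hourly maximum wind speeds
    (m/s) per TC. Only values at or above `threshold` contribute; the
    default of 0 corresponds to using no intensity threshold.
    """
    total = 0.0
    for winds in tracks_max_wind:
        total += sum(w**2 for w in winds if w >= threshold)
    return total / 1.0e4

# Two made-up TCs with 6-hourly maximum winds in m/s.
tracks = [[10.0, 20.0], [30.0]]
ace_no_threshold = seasonal_ace(tracks)                 # model-style definition
ace_with_threshold = seasonal_ace(tracks, threshold=17.0)
```

Dropping the threshold lets weak simulated storms contribute, which matters most for a model whose TCs rarely reach 17 m s⁻¹.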

4 Summary and conclusion

In this study, CORDEX multi-RCMs were evaluated for their ability to capture climatology, intensity and interannual variability of TC activity over the WNP for the period 1989–2008. The multi-RCM ensemble consists of five different RCMs with ~50 km resolution driven by ERA-Interim reanalysis. Performances of the individual models were investigated in detail for TC activity compared to observational datasets.

  1. HadGEM3-RA and GRIMs have prominent systematic errors in the spatial patterns of simulated TC activity over the WNP, while RegCM, SNURCM, and WRF tend to simulate them reasonably. In particular, WRF has the most realistic spatial patterns of TC activity among the models and successfully reproduces the observed large-scale environment. On the other hand, HadGEM3-RA largely underestimates TC activity, which is associated with unrealistically unfavorable conditions (i.e., weaker monsoon trough and convective activity in the tropical WNP) for TC formation and development.

  2. With respect to the distributions of TC intensities, HadGEM3-RA and RegCM appear to have difficulty in reproducing TCs corresponding to typhoon intensity, while SNURCM, WRF, and GRIMs are capable of simulating TCs of this intensity. However, for intense TCs above 47 m s⁻¹ and below 930 hPa, all RCMs are critically limited, implying that the 50-km resolution is not sufficient to resolve the observed intensity of TCs.

  3. HadGEM3-RA and GRIMs do not capture the observed interannual variation of TC genesis frequency, with larger RMSE and low correlations, while RegCM, SNURCM, and WRF capture it well with smaller RMSE and high correlations. These two groups are the same as those identified for the spatial pattern of TC activity. The models’ performance for the spatial pattern and for the interannual variation of simulated TCs are related, since the frequency and location of TC formation are simultaneously modulated by the large-scale circulation. For the interannual variability of ACE, most models have approximately one half of the observed mean ACE, but they have high correlations.

The individual models have a variety of systematic biases in simulating TCs; however, the ensemble averages successfully capture the spatial and temporal nature of TC activity. In particular, the PEA, with smaller biases and higher correlations, generally outperforms the individual models and the EWA, since the model results are averaged with different weightings based on model performance. This confirms that the PEA provides the best performance not only for temperature and precipitation, to which it has mainly been applied in previous studies, but also for TC activity, by reducing the uncertainty in the TC activity simulated by a single RCM. Therefore, we conclude that the multi-RCM ensemble within CORDEX-East Asia can be applied to provide a more reliable and credible estimation of future TC activity over the western North Pacific due to climate change.

In future work, we intend to investigate the change in TC activity in response to two warming scenarios [representative concentration pathway 4.5 (RCP4.5) and RCP8.5] compared to historical simulations, using the multi-RCM ensemble evaluated in this study. Furthermore, in order to obtain more reliable climate change projections of TC intensity, the resolution of the CORDEX RCMs should be increased as far as computing resources permit over multi-decadal time scales. At the same time, it is necessary to improve the spectral nudging and physical processes such as the convective parameterization and the surface momentum flux parameterization.