1 Introduction

Climate models have proven to be effective tools for investigating the climate system and climate change. Over recent years, the Coupled Model Inter-comparison Project 5 (CMIP5) has developed a large number of climate simulations from global climate models (GCMs) (Taylor et al. 2012). Although these GCMs can capture the main large-scale circulation characteristics, studies into their application to regional climate simulations are limited because their resolutions are 100–300 km, which is too coarse to obtain useful information on climate features at a regional scale (Oreskes et al. 2010; Kitoh et al. 2013). Furthermore, physical processes are not accurately resolved at these resolutions, which poses additional disadvantages for accurately representing regional climates (Gao et al. 2015).

China is characterized by unique topography, landscapes and monsoon climates, with the Tibetan Plateau in the west, the western Pacific Ocean in the east, and complex coastlines along the southeastern edge. These complex conditions make accurate simulations of climate change over China using GCMs a challenge. Most GCMs show substantial biases in representing East Asian monsoon characteristics, especially the associated summer precipitation (Kang et al. 2002; Wang et al. 2011) and the spatial distribution of surface air temperature, as well as its increasing trend (Zhou and Yu 2006). To overcome these difficulties, dynamical downscaling is indispensable: high-resolution features are derived for a particular domain using regional climate models (RCMs), with GCMs or global reanalysis data providing the lateral boundary conditions (LBCs). With better resolved regional characteristics, such as surface topography, vegetation and land-sea distributions, RCMs are more capable of reproducing climate change in East Asia than GCMs, as documented in successive Intergovernmental Panel on Climate Change (IPCC) reports (IPCC 2007a, b, 2012). This high-confidence climate change information from RCMs is also used as critical input data by various impact-relevant communities such as agriculture, hydrology, and health (Qian and Leung 2007; Liu et al. 2016). Recently, RCM research has been moving toward convection-permitting models run at resolutions of a few kilometers, which provide more detailed information over mountain ranges and better reproduce the formation of mesoscale phenomena than models at coarser resolutions of 25–50 km (Stéfanon et al. 2014; Prein et al. 2015; Giorgi 2019). However, it may be difficult to conduct such simulations over China because of high computational costs and the scarcity of observations for evaluation.

The World Climate Research Program (WCRP) initiated an international framework for RCM dynamical downscaling, namely the Coordinated Regional Climate Downscaling Experiment (CORDEX), to improve regional climate projections worldwide (Giorgi et al. 2009; Jones et al. 2011; Gutowski et al. 2016; Lake et al. 2017). The inclusion of major RCMs, typically used over China or East Asia, in the CORDEX project, provided an opportunity to compare the performance of various RCMs over this region. Zhou et al. (2016) compared an ocean–atmosphere coupled RCM with its stand-alone version to investigate the effects of air–sea interactions in simulating East Asian summer monsoon (EASM) rainfall. Park and Min (2019) systematically evaluated the performances of five RCMs in reproducing climate extremes during summer. Yu et al. (2020) compared a small-scale regional climate model (CCLM), Weather Research and Forecasting model (WRF) and Regional Climate Model (RegCM) for past climate performance.

Among the various RCMs, RegCM (Giorgi et al. 1993a, b, 2012), developed at the Abdus Salam International Centre for Theoretical Physics (ICTP), is the most commonly used. A number of experiments have been performed to downscale current and future regional climate with different generations of RegCM (Im et al. 2007; Ju et al. 2007; Zhang et al. 2008; Gao et al. 2016). Typically, compared to the driving GCMs, RegCM not only shows more detailed geographical features, but also produces improved temperature and precipitation simulations in the warm monsoon season. However, some systematic model biases persist in the cold seasons, characterized by an underestimation of surface air temperature over China (Gao et al. 2017). Therefore, the impacts of model configurations (e.g. domain size, horizontal resolution), LBCs and physical parameterization schemes of RegCM systems have been explored, and combinations that better reproduce the regional climate over China have been suggested (Luo et al. 2002; Park et al. 2013; Zou et al. 2014; Wang et al. 2014).

The WRF was initially developed and tested for the simulation and forecasting of weather, and it is increasingly used as an RCM. The added value of downscaling the surface air temperature and precipitation from GCM simulations or reanalysis data over China with WRF has been documented in many studies (Yu et al. 2010, 2015; Ma et al. 2015; Bao et al. 2015). For example, Bao et al. (2015) showed that WRF can successfully simulate precipitation and its extremes, particularly in the western Sichuan Basin and eastern Tibetan Plateau regions. A regional downscaling over China using WRF was performed by Yu et al. (2015), in which the main features of climate extremes were simulated more accurately than in the driving MIROC5. Some studies have used the ensemble approach to address the uncertainties of WRF dynamical downscaling with different physical configurations or LBCs, and found that the ensembles outperformed the driving reanalysis data in simulating circulation and precipitation (Yuan et al. 2012; Li et al. 2016).

A few studies have reported inter-comparisons based on regional climate simulations from RegCM and WRF models (Tang et al. 2016; Yu et al. 2020). However, most of these studies have focused on mean climatology, and extreme precipitation and climate trends have not been assessed sufficiently over China. Furthermore, neither RCM can fully reproduce specific variables over the target regions. The ensemble approach, developed to reduce the uncertainties of a single model, is widely used to improve climate simulations (Christensen et al. 2010; Ishizaki et al. 2012; Suzuki-Parker et al. 2018). Therefore, it is important to optimize a two-member ensemble based on the RegCM and WRF models and to verify its skill.

As part of the CORDEX project, we compared and analyzed the seasonal climate simulation performances of WRF relative to RegCM4 forced by reanalysis data during 1981–2010 over continental China. We also used 30-year historical records of RegCM4 and WRF simulations, downscaled from the global reanalysis data to develop the optimal ensemble. The paper is structured as follows: Sect. 2 gives a description of the RCMs, experimental designs, data, and methodology used in this study. Evaluations of present climate simulations of surface air temperature and precipitation, extreme precipitation and optimal ensemble results are shown in Sect. 3. Section 4 presents a discussion of the results and conclusions.

2 Experimental design, data and methods

2.1 Model, data and experimental design

As previously mentioned, RegCM4 (version 4.6; Giorgi et al. 2012) and WRF (version 3.9; Skamarock et al. 2008) were used for dynamical downscaling. Many physical processes of the two models have been updated since previous versions. The model configurations, including model dynamics, buffer zones and physical schemes, are listed in Table 1. The WRF domain (Fig. 1b) is centered at (35.17° N, 110° E) and covers all of continental China and nearby oceans. The spatial resolution is 30 km with 231 (x direction) × 171 (y direction) horizontal grid cells and 36 vertical levels extending to 50 hPa. As discussed in Liu et al. (2008), this domain was determined by considering interactions between the planetary circulation and East Asian surface processes with respect to orography, soil, vegetation and coastal oceans, and was found to be optimal for climate simulations over China using RCMs. The buffer zones spanned 12 grid points along each of the four domain edges, where a dynamic relaxation technique was used (Davies and Turner 1977). The RegCM4 domain was the same as that of WRF, but its buffer zones spanned only nine grid points, where exponential nudging was used (Giorgi et al. 1993b). As typically applied, there were 28 vertical layers in RegCM4 spanning from the surface to 50 hPa.

Table 1 Model configurations for RegCM4 and WRF

Fig. 1 Topographies (units: m) of the R2 reanalysis (a), RegCM4 and WRF models over China (b). The boxes represent the selected regions: Northeast China (NE), North China (NC), Yangtze River (YZ), Southeast China (SE), Northwest China (NW), Tibetan Plateau (TP), and Southwest China (SW)

The RegCM4 and WRF runs were initialized on 1 December 1980 and continued until 31 December 2010. The first month was regarded as the spin-up period, and the simulation results during 1981–2010 were analyzed. The initial and lateral boundary conditions of RegCM4 and WRF were taken from the National Centers for Environmental Prediction/Department of Energy (NCEP-DOE) Reanalysis II (R2; Kanamitsu et al. 2002) data, which have a horizontal resolution of approximately 2.5° × 2.5° with 18 vertical levels. Sea surface temperature (SST) forcing was taken from the daily Optimum Interpolation SST dataset (OISST; Reynolds et al. 2002) of the National Oceanic and Atmospheric Administration (NOAA). Figure 1 shows the topographic heights for R2, RegCM4 and WRF, which are higher in the west and lower in the east. The highest section is formed by the Tibetan Plateau, with altitudes above 3000 m. The middle section is formed by basins and plateaus, mostly ranging from 500 to 3000 m. The lowest section is marked by foothills and lower mountains with altitudes below 500 m. RegCM4 and WRF display clearer coastlines and terrain details than R2.

2.2 Observations, extreme precipitation indices and optimal ensemble approach

High-quality gridded daily surface temperature and precipitation data with a resolution of 0.25° × 0.25°, derived from 2416 surface stations during 1961–2019 (Wu and Gao 2013), were used as observations. To facilitate RCM inter-comparison and verification against observations, the observation data were interpolated onto the 30 km model grid using the objective analysis method (Barnes 1964). Additionally, seven sub-regions (Fig. 1b) were selected according to their different climate characteristics, over which the simulation results were analyzed. These regions are Northeast China (NE), North China (NC), Yangtze River (YZ), Southeast China (SE), Northwest China (NW), Tibetan Plateau (TP) and Southwest China (SW).

As in previous studies, three extreme precipitation indices were used (Table 2). The simple daily intensity index (SDII) represents the mean daily rainfall amount, and R10 is the number of days with a daily precipitation amount > 10 mm, indicating heavy precipitation frequency. R95pT is defined as the percentage of the total rainfall contributed by days on which precipitation exceeds the long-term 95th percentile, representing strong precipitation events. These indices are generally effective for extracting information on climate extremes that are sensitive to global warming, and they have been widely applied to identify and monitor extreme precipitation (Zhang et al. 2011; Li et al. 2013; Huang and Gao 2017).

Table 2 Indicator, acronym and definitions of the extreme precipitation indices
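
The three indices can be illustrated with a minimal sketch for a single grid cell. The exact definitions are those of Table 2; the version below makes several assumptions that are not stated in the text: a wet-day threshold of 1 mm for SDII (a common convention), a nearest-rank 95th percentile computed from wet-day amounts, and a total that sums all days. The function names are hypothetical.

```python
import math

WET_DAY_MM = 1.0  # assumed wet-day threshold; not stated in the text


def sdii(daily_mm):
    """Mean precipitation on wet days (mm/day); 0.0 if there are none."""
    wet = [p for p in daily_mm if p >= WET_DAY_MM]
    return sum(wet) / len(wet) if wet else 0.0


def r10(daily_mm):
    """Number of days with precipitation above 10 mm."""
    return sum(1 for p in daily_mm if p > 10.0)


def percentile95(wet_days_mm):
    """95th percentile by the nearest-rank method (an assumed convention).

    Expects a non-empty list of wet-day amounts.
    """
    ordered = sorted(wet_days_mm)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def r95pt(daily_mm, threshold_mm):
    """Percentage of total precipitation falling on days above the threshold."""
    total = sum(daily_mm)
    if total == 0:
        return 0.0
    heavy = sum(p for p in daily_mm if p > threshold_mm)
    return 100.0 * heavy / total
```

In practice the threshold passed to `r95pt` would be computed once from the long-term record (here 1981–2010) and then applied to each season, so that the index measures the share of rainfall delivered by the heaviest events.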

Because each RCM has its own strengths and weaknesses in simulating specific variables and regions, we computed the optimal weights for RegCM4 and WRF at each grid cell using a feasible sequential quadratic programming algorithm (FSQP; Zhou et al. 1997). FSQP is an effective method for minimizing objective functions subject to linear or nonlinear, equality or inequality constraints. The final solution was chosen to minimize the root-mean-square error (RMSE) of the ensemble relative to the daily temperature and precipitation observations.
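
For a two-member ensemble, the weighting idea can be sketched without a general-purpose solver. The example below is not FSQP: it assumes that the two weights are non-negative and sum to one (constraints the text does not specify), in which case the RMSE-minimizing weight at a grid cell has a closed form that is then clipped to [0, 1]. All names are hypothetical.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def optimal_weight(member_a, member_b, obs):
    """Weight w in [0, 1] minimizing the RMSE of w*a + (1 - w)*b against obs."""
    num = sum((a - b) * (o - b) for a, b, o in zip(member_a, member_b, obs))
    den = sum((a - b) ** 2 for a, b in zip(member_a, member_b))
    if den == 0.0:  # identical members: every weight gives the same ensemble
        return 0.5
    # The squared error is quadratic in w, so clipping the unconstrained
    # minimizer to the assumed bounds gives the constrained minimizer.
    return min(1.0, max(0.0, num / den))


def blend(member_a, member_b, w):
    """Weighted two-member ensemble series."""
    return [w * a + (1.0 - w) * b for a, b in zip(member_a, member_b)]
```

Applied to the daily series of one grid cell, `optimal_weight` returns the share given to the first member; when both members err on the same side of the observations, the clipping simply selects the member with the smaller error.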

3 Results

3.1 Temperature climatology

Figure 2 illustrates the geographic distributions of winter (DJF), spring (MAM), summer (JJA) and autumn (SON) mean surface air temperature during 1981–2010 from observations, driving data (R2), and RegCM4 and WRF downscaling simulations. Observations indicated that the temperature generally decreased from southeast to northwest China, and that the mountains of the Tibetan Plateau and Northwest China were colder than the surrounding areas. R2 gave a good but rather smooth pattern, which missed the warm area in the Sichuan Basin and the cold area in the mountainous regions of Northwest China throughout the year. Additionally, it smoothed out most regional-scale structures over the Yangtze River and Southeast China, particularly in summer. These features were better resolved by RegCM4 and WRF, which supports the downscaling ability of the RCMs. However, both RegCM4 and WRF produced systematic cold biases in the cold seasons, as also found by Liu et al. (2013). These biases are not obvious in R2, which suggests that they originate in the RCMs, probably because of their inability to reproduce the snow process or its albedo feedback (Yao et al. 2016). At high latitudes and over the Tibetan Plateau, the biases of RegCM4 were generally smaller in all seasons than those of WRF, which was 2–3 °C colder than the observations. In contrast, over Southeast China, the temperature was more realistically simulated by WRF, while RegCM4 gave a larger bias.

Fig. 2 Spatial distributions of seasonal mean (DJF (a, e, i, m), MAM (b, f, j, n), JJA (c, g, k, o), SON (d, h, l, p)) temperature (units: °C) over China during 1981–2010 from observation (a–d), R2 reanalysis (e–h), RegCM4 (i–l) and WRF (m–p) downscaling simulations

To assess the capability of the models to reproduce temperature quantitatively, Taylor diagrams (Taylor 2001), which combine the pattern correlation, root-mean-square error and spatial standard deviation ratio, were examined (Fig. 3). The reference point represents a perfect simulation, with both the spatial correlation and the ratio of standard deviations equal to 1, and the centered normalized RMSE equal to 0. Both WRF and RegCM4 had RMSEs within the range of 0.25–0.5, while the RMSEs of R2 were larger than 0.5 for all regions. The temperatures downscaled by WRF and RegCM4 were also significantly correlated with observations, with spatial pattern correlations between 0.9 and 0.97, while the R2 correlation was less than 0.9. This indicates that both RCMs reproduced a reasonable pattern and magnitude of temperature, as shown in Fig. 2. Compared to RegCM4, WRF had a better downscaling ability for temperature, with higher spatial pattern correlations and closer magnitudes in the YZ, SE and SW regions, but was less accurate over the NE, NW and TP regions. The differences between the two RCMs were small in the NC region.
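
The three statistics summarized in a Taylor diagram can be computed as in the following sketch. It assumes the simulated and observed fields have been flattened to equal-length lists of grid-cell values and applies no area weighting; both are simplifications, and the function name is hypothetical.

```python
import math


def taylor_stats(sim, ref):
    """Return (correlation, std ratio, normalized centered RMSE) for two fields."""
    n = len(ref)
    sim_mean = sum(sim) / n
    ref_mean = sum(ref) / n
    sim_anom = [s - sim_mean for s in sim]
    ref_anom = [r - ref_mean for r in ref]
    sim_std = math.sqrt(sum(a * a for a in sim_anom) / n)
    ref_std = math.sqrt(sum(a * a for a in ref_anom) / n)
    cov = sum(a * b for a, b in zip(sim_anom, ref_anom)) / n
    corr = cov / (sim_std * ref_std)
    # Centered RMSE: the mean of each field is removed before differencing.
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(sim_anom, ref_anom)) / n)
    return corr, sim_std / ref_std, crmse / ref_std
```

The three values are tied together by the law of cosines, E'^2 = 1 + s^2 - 2 s R, where s is the standard deviation ratio and R the correlation, which is why they can be shown on one diagram. Because the means are removed, a uniform cold or warm bias does not move a model away from the reference point; such biases have to be examined separately.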

Fig. 3 Taylor diagrams for the annual mean temperature (units: °C) over NE and NC (a), NW and YZ (b), SE and SW (c), and TP (d) regions from R2 reanalysis, RegCM4 and WRF downscaling simulations

To assess the models’ capability in simulating daily temperature, the frequency distributions of the temperature bias (relative to observations) for each sub-region are displayed in Fig. 4. R2 was the most realistic in the NE and NC regions, where the bias frequency peaked around 0 °C, while a cold bias of 2 °C was identified for the WRF (RegCM4) simulation in the NE (NC) region. For the YZ, SE and SW regions, the bias frequency in RegCM4 peaked at −2 to −3 °C, indicating systematic underestimations, which were largely reduced by WRF, where the peak shifted to −0.5 to 1 °C. The R2 pattern was close to that of WRF, but with larger cold biases, especially in the SE region. For the NW and TP regions, RegCM4 had smaller biases than WRF and R2, with bias frequency peaks at −0.5 °C in the NW and −1.5 °C in the TP region, indicative of relatively better performance.
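
A bias frequency distribution of this kind can be sketched as a simple histogram of daily model-minus-observation differences. The 0.5 °C bin width and the function names below are assumptions made for illustration; the text does not state the binning used for Fig. 4.

```python
import math


def bias_frequency(sim, obs, bin_width=0.5):
    """Relative frequency (%) of sim - obs, keyed by the lower edge of each bin."""
    counts = {}
    for s, o in zip(sim, obs):
        edge = math.floor((s - o) / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    n = len(obs)
    return {edge: 100.0 * c / n for edge, c in sorted(counts.items())}


def modal_bin(freq):
    """Lower edge of the most frequent bias bin."""
    return max(freq, key=freq.get)
```

The position of the modal bin corresponds to the "peak" discussed above: a peak near 0 °C indicates an unbiased simulation, while a peak at negative values indicates a systematic cold bias.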

Fig. 4 The bias frequency distributions of daily temperature (units: °C) over NE (a), NC (b), NW (c), YZ (d), SE (e), SW (f), and TP (g) regions from R2 reanalysis, RegCM4 and WRF downscaling simulations

3.2 Precipitation climatology

The evaluation of precipitation is also very important in climate simulation, especially for China, where precipitation exhibits large spatial and temporal variability. The distributions of seasonal mean precipitation during 1981–2010 from the observations, R2 reanalysis, and RegCM4 and WRF downscaling simulations are compared in Fig. 5. Observations exhibit a clear southeast to northwest gradient, with minimum precipitation over Northwest China in all seasons. In DJF, cold and dry air was brought to China from the polar region by the winter monsoon, leading to very low precipitation (< 1 mm) at middle and high latitudes. This feature was captured well by R2, but the regions with precipitation of more than 2.5 mm extended farther west over Southeast China than in the observations. The two RCM simulations improved the rainfall pattern over Southeast China, although dry biases remained in RegCM4 (1–1.5 mm) and WRF (0.5–1 mm). They overestimated precipitation by 0.1–0.3 mm at high latitudes and over the Tibetan Plateau, and by 0.5–1 mm over Southwest China. Since winter precipitation is primarily associated with non-convective systems, it is essential for RCMs to improve the interactions among surface, PBL, and cloud microphysics parameterizations (Liang et al. 2019). In MAM, the observed precipitation amount began to increase, and two main rainfall centers were located in the southern Yangtze River and northern Pearl River regions. R2 still produced a westward-shifted center, as well as an artificial precipitation maximum in the Sichuan Basin. Both RCMs overestimated precipitation in Southwest China and underestimated the two rainfall centers, with RegCM4 in particular producing a substantial deficit.

Fig. 5 As in Fig. 2, but for precipitation (units: mm day−1)

In JJA, observations were characterized by two major rain belts located along the Yangtze River and over Southeast China, which closely corresponded to the EASM. The precipitation was strongly overestimated by R2, resulting in a 2–3 mm wet bias over Southeast China, some parts of Northeast China and the southeastern edge of the Tibetan Plateau. The wet bias over the Tibetan Plateau is probably related to the orographic effect. The Himalayas in R2 are much lower than in reality because of the coarse resolution, which results in the penetration of precipitation into the Plateau (Gao et al. 2008). WRF reproduced the two rain belts more realistically, although with an overestimation of 1–2 mm. RegCM4 failed to distinguish the rain belt along the Yangtze River, where its dry biases were mostly within 2–2.5 mm. In SON, the observed precipitation declined, with broad rainfall of 2–4 mm south of 35°N. These features were well captured by WRF, while R2 placed the maximum precipitation area in Southwest China and RegCM4 shifted the rainfall center to the upper Yangtze River Basin, neither of which was found in the observations.

With regard to regional differences, the performances of RegCM4 and WRF relative to R2 in simulating annual precipitation are summarized in Taylor diagrams (Fig. 6). Except for the NE and NC regions, the RMSEs of R2 and the RCMs were above 0.5 and the spatial correlations were smaller than 0.85, indicating that precipitation was simulated less accurately than temperature. This is because rainfall is more easily influenced by nonlinear processes, which are not yet fully resolved in current RCMs (Pérez et al. 2014). However, both WRF and RegCM4 outperformed R2 in precipitation simulation, with higher correlations and lower RMSEs in all regions. Most notable was the ability of WRF dynamical downscaling in the rain belt (YZ, SE and SW) and NW regions, where spatial correlations ranged from 0.5 to 0.8 and RMSEs were below 1. In contrast, RegCM4 generated higher correlations than R2 in the NE, NC, NW, SE and SW regions, but produced larger RMSEs in the YZ region because of its underestimation of rainfall, as shown in Fig. 5. For the TP region, RegCM4 correlated better with observations than R2 and WRF, although the ratio of variance of WRF was closer to 1. Generally, WRF was more capable of capturing the precipitation in the rain belt regions, while RegCM4 tended to better simulate the precipitation in the NE, NC and TP regions.

Fig. 6 As in Fig. 3, but for precipitation (units: mm day−1)

Figure 7 displays the simulated annual cycles averaged over each sub-region, together with the observations and R2 reanalysis. All models captured the observed seasonality in each region, with clearly identified summer peaks. The overall annual cycle was captured well in R2, but the monthly rainfall amount over the rain belt and TP regions was largely overestimated in summer, mainly as a result of the overestimation of convective precipitation. Both RCM simulations reduced these wet biases in the wet seasons by approximately 10–20%, although no obvious improvements were found in the dry seasons. Consistent with the Taylor diagram results above, WRF was generally realistic over the rain belt regions, with higher correlations, although it produced somewhat earlier rainfall peaks in the SW region. Over the NE, NC and TP regions, RegCM4 gave the best fit to the observed cycles, with smaller biases. The exception was the NW region, whose annual cycle was not well resolved by any model.

Fig. 7 Annual cycles of precipitation (units: mm) over the seven analysis sub-regions during 1981–2010

To depict the seasonal rain belt evolution over eastern China, the time-latitude cross-sections of daily precipitation averaged between 105°E and 122°E are presented in Fig. 8. The observations show that the rain belt began to move north in May, reached 30°N before June, and then extended to ~35°N in early July. This seasonal march of the EASM rain belt was associated with the western Pacific subtropical high (WPSH) jumping northward three times, as also revealed in Wang and Yang (2008). From mid-May, the southwesterly winds moved northward and brought abundant precipitation to south China (22–26°N), signifying the onset of the EASM. Both R2 and RegCM4 shifted the timing of the summer monsoon onset, although this was corrected by WRF. From early June, a strong monsoonal flow migrated to the Yangtze River (26–32°N), which led to heavy precipitation over Southeast China and the Yangtze River. June was also the primary month for the Meiyu front to occur, which affected the timing of peak rainfall in Fig. 7d–e. WRF captured these features well, while R2 overestimated and RegCM4 underestimated the rain belts in these regions. In mid-July, the rain belt shifted to North China and precipitation weakened. Interestingly, both the simulations and observations showed a second precipitation maximum over the southeast coast (near 22°N) from mid-July to late August. This may be related to the movement of the Inter-Tropical Convergence Zone (ITCZ) and typhoons. Overall, WRF agreed better with the observed northward march of the EASM, although there was some overestimation north of the Yangtze River in late June.
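
The construction of such a time-latitude section can be sketched as follows. The example assumes precipitation stored as nested lists indexed `[time][lat][lon]` on a regular latitude-longitude grid; the model output itself is on a projected 30 km grid and would first need to be regridded. The function names are hypothetical.

```python
def time_latitude_section(precip, lons, lon_min=105.0, lon_max=122.0):
    """Average precip[time][lat][lon] over longitudes in [lon_min, lon_max].

    Returns a list indexed as [time][lat].
    """
    cols = [i for i, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    if not cols:
        raise ValueError("no grid columns inside the longitude band")
    return [
        [sum(row[i] for i in cols) / len(cols) for row in day]
        for day in precip
    ]


def rain_belt_latitude(section_row, lats):
    """Latitude of maximum zonal-mean precipitation for one time step."""
    k = max(range(len(lats)), key=lambda j: section_row[j])
    return lats[k]
```

Tracking `rain_belt_latitude` through the season gives a one-line summary of the northward march described above, which can be compared between observations and each simulation.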

Fig. 8 Latitude-time cross-sections of the precipitation (units: mm day−1) averaged between 105°E and 122°E during 1981–2010 from observation (a), R2 reanalysis (b), RegCM4 (c) and WRF (d) downscaling simulations

3.3 Extreme precipitation

The detection of extreme precipitation is important, but it is always a challenge, because precipitation has an inconsistent response to greenhouse gases (Lambert et al. 2005). Figures 9, 10, and 11 compare the observed spatial distributions of seasonal mean extreme precipitation indices (SDII, R10 and R95pT) during 1981–2010 with those from R2 reanalysis, RegCM4 and WRF simulations. The observed pattern of daily precipitation intensity (SDII) is similar to that of the seasonal mean precipitation, which is characterized by a transition from small values in Northwest China to large values in Southeast China (Fig. 9a–d). In DJF and MAM, WRF produced larger values (> 7 mm d−1) over Southeast China, which reduced the underestimation of 10–20% found in both R2 and RegCM4. However, the RegCM4 biases over Southwest China and the Tibetan Plateau were smaller than those of the other models in spring, and comparable in winter. In JJA, the SDII was overestimated by 40% in R2 over Southwest China and the southern edge of the Tibetan Plateau, and underestimated by 20% in RegCM4 over the Yangtze River and Southeast China. These biases were significantly reduced by WRF, but WRF had a larger bias of 2 mm d−1 over Northeast China compared with RegCM4. In SON, WRF simulated more realistic regional details and smaller overall biases in eastern China than R2 and RegCM4, except for Northeast China, where overestimation occurred.

Fig. 9 Spatial distributions of seasonal mean (DJF (a, e, i, m), MAM (b, f, j, n), JJA (c, g, k, o), SON (d, h, l, p)) SDII (units: mm day−1) over China during 1981–2010 from observation (a–d), R2 reanalysis (e–h), RegCM4 (i–l) and WRF (m–p) downscaling simulations

Fig. 10 As in Fig. 9, but for R10 (units: day)

Fig. 11 As in Fig. 9, but for R95pT (units: %)

In contrast, for heavy precipitation days (R10, Fig. 10), the broad gradient from the southeast to the northwest was generally represented by all models. Both RCMs captured many detailed features at regional scales that were absent in the R2 reanalysis data, mainly because of the RCMs’ refined physical representations. In R2, as in most GCMs or reanalysis data, the frequency of heavy precipitation was too high in all seasons. This was particularly true for summer and fall, when the smooth pattern was too strong over Southwest China and eastern China. In the WRF simulation, the regions where R10 was overestimated covered a smaller area, with smaller biases of 2 to 4 days, in all seasons. The observed maximum R10 (> 19 days) was underestimated by RegCM4 by up to 10–20% over Southeast China. These underestimations resembled the systematic error in the simulated mean precipitation (Fig. 5i–l). However, RegCM4 produced realistic values over Northeast China in summer and fall, while R2 and WRF generated large positive biases.

Figure 11 compares the heavy precipitation fraction (R95pT) among the models. In DJF, RegCM4 produced the most realistic simulation over Southeast China, whereas the R95pT was overestimated by WRF and underestimated by R2. In MAM, R2 failed to simulate the observed high R95pT over Northeast China and South China, which were better simulated by both RCMs, especially by RegCM4 in Northeast China. In JJA, when the observed extreme events can explain as much as 25–30% of the total seasonal precipitation, R2 suffered from a large underestimation of up to 5–10% over most of eastern China, suggesting that the mesoscale convection embedded in Meiyu cloud bands, which is associated with heavy precipitation, may not be well represented. This underestimation was reduced in RegCM4 and WRF. In SON, the observed large R95pT along the southeastern coast was captured well by WRF, whereas RegCM4 underestimated its intensity and R2 missed it entirely. This result may be a sign of WRF’s ability to resolve precipitation enhancement by sea breezes. Nevertheless, WRF generated larger errors than RegCM4 over Northeast China (winter, spring and fall), with errors exceeding 8%. These results indicate that both RCMs can simulate the mesoscale processes that produce extreme precipitation better than R2, but their differences also indicate uncertainties between the models in the simulation of the spatial pattern of extreme precipitation.

3.4 Climate trends

A warming trend dominated the global climate over the past century (Taylor et al. 2012). Figure 12 presents the spatial distributions of annual mean temperature trends from the observations and different models during 1981–2010. Observations showed increasing trends over much of the land area in China, especially at high latitudes, where the trend reached 0.4–0.6 °C (10a)−1. Trends were generally statistically significant at the 95% level. The increasing trends at most low and middle latitudes were much smaller than those at high latitudes, as also noted by Yang et al. (2016). The distribution of these changes is consistent with the increasing trends of hot days and heat waves, especially over Northwest China, Northeast China and the Yangtze River (Ding et al. 2010). Compared with observations, all models captured the main trend pattern, with values increasing from south to north, but differed in the absolute values. Both R2 and RegCM4 generally presented a smaller amplitude than observations at high latitudes and over the Yangtze River, where the increasing trends were mostly below 0.5 °C (10a)−1. These underestimations were greatly reduced by WRF, and the simulated pattern was improved, with a spatial correlation coefficient of 0.89 compared with 0.85 for R2 and 0.83 for RegCM4.
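
A per-grid-cell trend and its significance can be sketched as below. The text does not state which test was used; this example assumes an ordinary least-squares slope with a two-sided Student-t test, and hard-codes the critical value for the 30-year record (28 degrees of freedom). The names are hypothetical.

```python
import math

T_CRIT_95_DF28 = 2.048  # two-sided 95% Student-t critical value, 28 d.o.f.


def linear_trend(years, values):
    """Least-squares slope (units per year) and its t-statistic."""
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    if sse == 0.0:  # perfect fit: no residual variance to test against
        return slope, math.copysign(math.inf, slope) if slope else 0.0
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


def is_significant(t_stat, t_crit=T_CRIT_95_DF28):
    """True if the trend differs from zero at the assumed confidence level."""
    return abs(t_stat) > t_crit
```

Multiplying the slope by 10 converts it to the per-decade units, °C (10a)−1, used in the text. Annual series are often autocorrelated, in which case this simple test overstates significance; a more careful analysis would adjust the degrees of freedom.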

Fig. 12 Long-term trends of annual-mean temperature (units: °C a−1) over China during 1981–2010 from observation (a), R2 reanalysis (b), RegCM4 (c) and WRF (d) downscaling simulations (stippling indicates areas where the trend is statistically significant at the 95% confidence level)

For annual precipitation (Fig. 13), observations showed significant increasing trends over Southeast China and decreasing trends over Northeast China, in agreement with previous studies by Ding et al. (2008) and Li et al. (2017). The decadal shift of precipitation over eastern China in the late 1970s may be partly responsible for the contrasting precipitation trends over the region (Kwon et al. 2007). The observed annual precipitation also showed an increase in North China and the eastern edge of the Tibetan Plateau, and a decrease in central and Southwest China. These features were captured to some extent by RegCM4, but its trends in the lower reaches of the Yangtze River and some parts of Southwest China were opposite to the observed trends. Both WRF and R2 produced an overall more realistic spatial pattern, except for the east of Southwest China, with spatial correlation coefficients of up to 0.84 and 0.81, respectively. However, compared with observations, R2 showed a smaller increasing trend over most of the Yangtze River and Southeast China, and a decreasing trend over central China. Overall, WRF showed the best ability to simulate both temperature and precipitation trends, compared to R2 and RegCM4.

Fig. 13 As in Fig. 12, but for precipitation (units: mm a−1)

3.5 Optimal ensemble temperature and precipitation

The above analysis indicated that both RCMs had some skill in simulating temperature and precipitation over the target regions, but that an optimal ensemble of the different models at each grid cell is needed to improve the RCM downscaling ability. Figure 14 displays the 30-year annual mean temperature from observations, the RegCM4 and WRF simulations, and their ensemble. Both RCMs still underestimated temperature over most parts of China, and their biases varied geographically, as found in the analysis of the seasonal pattern in Fig. 2. The RegCM4 cold biases were particularly large at low latitudes, where errors exceeded 4 °C. WRF showed a smaller cold bias of 2–3 °C, but substantial underestimations at high latitudes and over the Tibetan Plateau, reflecting the influence of snow cover. The ensemble noticeably reduced these deficiencies and gave a more realistic magnitude over the Sichuan Basin and some parts of Southeast China. Relative to RegCM4 and WRF, the ensemble decreased the absolute biases over the entire domain by 15% and 30%, respectively.

Fig. 14

Spatial distributions of the annual mean temperature (units: °C) over China during 1981–2010 from observation (a), RegCM4 (b), WRF (c) downscaling simulations, and their ensemble result (d)

A comparison of the annual precipitation from the two RCM simulations (Fig. 15) showed that RegCM4 more accurately simulated light rainfall (< 2 mm) over most of northern China but failed to capture the observed heavy rainfall (> 4 mm) over Southeast China, while WRF better simulated moderate to heavy rainfall (3–5 mm) over most of Southeast China but yielded excessive rainfall over parts of the high latitudes and Southwest China. These differences indicated a strong degree of independence and complementarity between the RCM simulations. The ensemble based on these two RCMs significantly improved the rainfall pattern, increasing the spatial correlation coefficient to 0.88 (from 0.75 for RegCM4 and 0.80 for WRF). In particular, the overestimation over the east of Southwest China and Northeast China in WRF and the dry biases over Southeast China in RegCM4 were reduced. However, the ensemble overestimated rainfall along the southern edge of the Tibetan Plateau and over Southwest China, a bias carried over from both RCM members. These results suggest that the optimal ensemble method based on different RCMs can improve the simulated temperature and precipitation.

Fig. 15

As in Fig. 14, but for precipitation (units: mm)
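To make the idea of a grid-cell ensemble of two complementary models concrete, the following sketch shows one possible weighting scheme. It is an assumption for illustration and not necessarily the optimal ensemble method used in this study: at each grid cell, a single weight in [0, 1] is chosen to minimize the squared error of the weighted combination against observations over the time series. All function names are hypothetical.

```python
import numpy as np

def optimal_weights(model_a, model_b, obs):
    """Per-cell weight w in [0, 1] minimizing sum_t (w*a + (1-w)*b - obs)^2.

    All inputs have shape (time, lat, lon). Returns an array (lat, lon).
    """
    diff = model_a - model_b
    denom = np.sum(diff ** 2, axis=0)
    numer = np.sum((obs - model_b) * diff, axis=0)
    # Closed-form least-squares weight; fall back to equal weights
    # where the two models are identical (denom == 0) or data are missing.
    w = np.divide(numer, denom, out=np.full(denom.shape, 0.5), where=denom > 0)
    return np.clip(w, 0.0, 1.0)

def ensemble_climatology(model_a, model_b, w):
    """Weighted combination of the two time-mean fields."""
    return w * model_a.mean(axis=0) + (1.0 - w) * model_b.mean(axis=0)

def bias_reduction(model_clim, ens_clim, obs_clim):
    """Percentage decrease in domain-mean absolute bias relative to one model."""
    mae_model = np.nanmean(np.abs(model_clim - obs_clim))
    mae_ens = np.nanmean(np.abs(ens_clim - obs_clim))
    return 100.0 * (mae_model - mae_ens) / mae_model
```

Under this scheme, a model that is too dry in one region and a model that is too wet in the same region partly cancel, which is the kind of complementarity described above. Weights fitted and evaluated over the same period give an in-sample estimate of skill, so an independent validation period would be needed to assess how well they generalize.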

4 Conclusions and discussion

In this study, two regional climate models, RegCM4 and WRF, were used to downscale NCEP Reanalysis II (R2) data with a horizontal grid spacing of 30 km over China during 1980–2010. The performances of RegCM4 and WRF with respect to R2 reanalysis and observations for the surface air temperature and precipitation were assessed. The simulation results using different RCMs showed large discrepancies in temperature and precipitation climatology, climate trends and extreme precipitation, with strong regional dependence. Therefore, an optimal ensemble approach based on the two RCMs was also used to improve the downscaling ability.

In general, the main spatial patterns of temperature and precipitation over China could be reproduced by R2 and the RCMs. With a better representation of orography and local-scale processes, both RCMs simulated more realistic regional characteristics and magnitudes than R2, although this did not apply during the winter. The cold biases on the order of 2–3 °C in both RCMs over most areas in winter may be caused by the poor representation of land surface processes (Zeng et al. 2015). Overall, WRF had the most realistic temperature over the Yangtze River, Southeast China and Southwest China (wet regions), where it increased the correlation coefficient of R2 by 30–50%. The RMSEs of RegCM4 were the smallest in Northeast China, Northwest China and the Tibetan Plateau (dry regions), with significant improvements over R2. For precipitation, the RCMs generally simulated a more accurate spatial distribution by eliminating the artificial precipitation overestimation in R2, which is consistent with the study of Sato and Xue (2013). WRF better captured the rain belts over the Yangtze River and Southeast China in summer and fall, while the biases of RegCM4 in high latitudes were smaller than those of WRF. This pattern is also revealed in the simulated annual cycle of precipitation. A comparison of four RCMs in simulating the precipitation over China also found that the WRF results were clearly better than those of RegCM4 in the wet regions, while both models performed well in the dry regions (Wang et al. 2016). WRF was better at capturing the pattern of rainfall, although there was some overestimation in the north of the Yangtze River in late June. The spatial distributions of extreme precipitation were also better in both RCMs, suggesting they have an added value for modelling the detail of local heavy precipitation events associated with mesoscale convection. WRF generally performed the best over the Yangtze River and Southeast China, especially for the SDII in summer and fall. RegCM4 could best reproduce the SDII, R10 and R95pT over Northeast China in summer.

For long-term climate trends from 1981 to 2010, warming was observed with the amplitude increasing from south to north. R2 and the RCMs all captured the warming trend, but WRF clearly reduced the underestimation in high latitudes and the Yangtze River produced by R2 and RegCM4. For precipitation, there was a dipole structure of north-drought and south-flood over the eastern part of China. It is likely that the EASM has weakened and does not extend as far north as it used to in the late 1970s, as a result of changes in the atmospheric circulation (Christensen et al. 2013). Although R2 and RegCM4 could reproduce some observed features, WRF exhibited the best agreement with observations, especially for the increasing trends over Southeast China and the Yangtze River and the decreasing trends over the Sichuan Basin and Northeast China. Similar to the finding of Ishizaki et al. (2012), the ensemble results of annual mean temperature and precipitation showed improved behavior compared to individual models. The cold biases from the two RCMs were noticeably reduced by 15–30%, and there was a more realistic magnitude over the Sichuan Basin and some parts of Southeast China. The spatial precipitation patterns were also improved by the ensemble, with the spatial correlation coefficient increased by 0.08–0.13.

This study provides a reference for the differences between WRF and RegCM4 downscaling in simulating surface air temperature and precipitation over China, and gives some useful indications for improving RCM performance in simulating climate change over China. Because uncertainties remain in both RCMs, especially for precipitation, ensemble downscaling using more models is needed to reduce the model uncertainty. Our future work includes sensitivity studies to identify which configurations in Table 1 explain the differences between the WRF and RegCM4 simulations. Finally, because of limited computational resources, we carried out our experiments with a horizontal resolution of 30 km. This is not fine enough to resolve the more complex mountain ranges and mesoscale phenomena, but it may represent well the summer monsoon rain belt features associated with the Meiyu front (Gao and Giorgi 2017). RCMs running at convection-permitting resolution are being applied to climate simulation. These call for more computational resources and for more reliable, fine-scale observations that combine in situ and remotely sensed data.