1 Introduction

Regional climate models (RCMs) are commonly used in high-resolution modeling for physical process understanding, seasonal climate prediction, climate change projection, and climate impact assessment (Giorgi 2006; Xue et al. 2014; Giorgi and Gutowski 2015). Accordingly, over a dozen RCMs have been developed and evaluated, with the objective of adding value to the larger-scale driving features resolved by general circulation model (GCM) simulations or observational reanalyses. The skills and biases of various RCMs over major domains have been well documented in regional model intercomparison projects (Roads et al. 2003; Fu et al. 2005; Rinke et al. 2006; Christensen et al. 2007; Mearns et al. 2012; Nikulin et al. 2012). This study documents the performance of the Climate-Weather Research and Forecasting model (CWRF, Liang et al. 2012) over China.

Twelve major RCMs, some with multiple variants, are typically used for regional climate modeling over China or East Asia. Their relevant applications are summarized in Table 1, including model resolutions, integration periods, study focuses, and references. [All RCM acronyms and other key abbreviations are listed in “Appendix A”.] Seven of these models participated in the East Asian Regional Model Intercomparison Project (RMIP), which compared present performance and future projections given identical driving conditions from a GCM simulation or reanalysis (Fu et al. 2005). These include GRIMs, JSM, MM5, RAMS, RegCM, RIEMS, and WRF (Feng and Fu 2006; Feng et al. 2011; Niu et al. 2015; Li et al. 2016; Tang et al. 2016; Wu et al. 2016). Of the remaining five, RSM was compared with WRF and RegCM for past climate performance (Wang et al. 2015); PRECIS, CCLM, and LMDZ were individually evaluated under both present and future climate conditions; and IPRC was tested only in a summer case study.

Table 1 Summary of 12 RCMs used for regional climate modeling over China or East Asia, including model resolution (horizontal grid size: km; vertical level number: L), integration period (longest combined: ~ else: -), study focus, and references

The most popular of these models is RegCM, which is based on MM5 (Grell et al. 1994) and evolved from version 2 (Giorgi et al. 1993a, b) to version 3 (Pal et al. 2007) to the current version 4 (Giorgi et al. 2012). All three versions have been used for climate studies over the region (see Table 1 for the references), including sensitivities to model configurations such as lateral/initial conditions and horizontal/vertical resolutions; effects of terrain details, land use changes, land/ocean-atmospheric interactions, and cumulus parameterization and other physics improvements; and climate projections driven by various GCMs. These studies showed a large range of RegCM-simulated present climate biases and future trend uncertainties.

Recently, WRF has been increasingly used as an RCM for China or East Asian climate modeling. Some studies have examined its added value for downscaling GCM simulations (Yu et al. 2010) and reanalyses (Sato and Xue 2013; Gao et al. 2015), as well as its performance sensitivity to driving lateral conditions (Yang et al. 2012) and land surface representations (Li et al. 2015). Others have evaluated its ability to hindcast seasonal climate anomalies (Yuan et al. 2012; Ma et al. 2015) and project future climate changes, focusing on extreme events (Yu et al. 2015; Bao et al. 2015). Wang et al. (2015) compared the performance of WRF with RegCM4 and RSM in simulating China precipitation and temperature interannual variations, linear trends and extreme events during 1989–2008. Their results showed substantial differences in regional climate biases between the models, none of which had significantly superior skill.

WRF was designed originally for short-range numerical weather prediction but not expressly for long-term climate simulation. Liang et al. (2012) noted that direct climate applications of WRF are limited by its inadequate representation of essential physics at relevant scales, and therefore developed its climate extension CWRF with crucial improvements to land–atmosphere–ocean, convection–microphysics, and cloud–aerosol–radiation interactions, as well as system consistency throughout all process modules. As a result, CWRF more realistically simulates surface radiation, terrestrial hydrology, and precipitation (Choi and Liang 2010; Yuan and Liang 2011a; Liang et al. 2012; Liang and Zhang 2013; Qiao and Liang 2015, 2016a, b), and improves WRF regional climate prediction in the United States (Yuan and Liang 2011b; Liang et al. 2012; Liu et al. 2016; Chen et al. 2016). This study evaluates CWRF simulation of China climate characteristics during 1980–2015, relative to the latest RegCM4.6 simulations.

2 Model description

CWRF has been continuously developed since 2002 as a Climate extension of WRF (Skamarock et al. 2008) through improvements to the representation of numerous physical processes and integration of external (top, surface, lateral) forcings crucial to climate scales (Liang et al. 2012). It couples a state-of-the-art Conjunctive Surface–Subsurface Process model (CSSP) to predict detailed terrestrial hydrology and land–atmosphere interaction. CSSP is rooted in the Common Land Model (CoLM, Dai et al. 2003, 2004), with updates from the Community Land Model (CLM, Oleson et al. 2013). It integrates vertical water exchange (precipitation, evaporation, transpiration, infiltration) and hydraulic redistribution by deep vegetation roots; it also represents horizontal water movement (across grids) as surface and subsurface runoff resulting from rainfall excess and saturation depletion, as well as lateral flows due to resolved and subgrid topographic controls (Choi et al. 2007, 2013; Choi and Liang 2010; Yuan and Liang 2011a). It incorporates realistic distributions of surface (soil and vegetation) characteristics (Liang et al. 2005a) and an advanced dynamic-statistical parameterization of land surface albedo (Liang et al. 2005b) to enable credible evaluation of land use/land cover effects on regional climate (Xu et al. 2014). CWRF also couples a comprehensive multi-level upper ocean model (UOM, Ling et al. 2011, 2015) to resolve transient air-sea interactions critical to sea surface temperature diurnal cycle and daily variations, as well as a detailed Lake, Ice, Snow, and Sediment Simulator (LISSS, Subin et al. 2012) to predict the thermal effects of freshwater lake interactions with the atmosphere.

Furthermore, CWRF integrates a comprehensive ensemble of alternate parameterization schemes for each of the key physical processes, including surface (land, ocean), planetary boundary layer, cumulus (deep, shallow), microphysics, cloud, aerosol, and radiation (Liang et al. 2012). This facilitates the use of an optimized physics ensemble approach to improve weather or climate prediction (Liang et al. 2007, 2012; Zeng et al. 2008; Liu et al. 2009; Yuan et al. 2012) while providing reliable uncertainty estimates. In particular, CWRF has a built-in Cloud-Aerosol-Radiation (CAR) ensemble model that incorporates a wide variety of alternate parameterizations for cloud properties (cover, water, radius, optics, geometry), aerosol properties (type, profile, optics), radiation transfers (solar, infrared), and their interactions (Liang and Zhang 2013). CAR enables full quantification of radiative forcings and climate impacts as well as their uncertainties, all of which strongly depend on the choice of cloud, aerosol and radiation schemes (Zhang et al. 2013). CWRF also has a built-in ensemble cumulus parameterization (ECP), which uses a suite of alternate closure assumptions that may drastically affect rainfall distribution, frequency and intensity, and diurnal cycle (Liang et al. 2004a; Qiao and Liang 2015, 2016a, b). The optimized ECP ensemble can significantly improve precipitation prediction.

This study uses the following CWRF physics configuration as the control version: Cumulus—ECP penetrative convection (Qiao and Liang 2016a, b) plus UW shallow convection (Bretherton and Park 2009), Microphysics—GSFCGCE (Tao et al. 2003), Cloud—XRL (Xu and Randall 1996; Liang et al. 2004b), Aerosol—MISR (Kahn et al. 2005, 2007; Zhao et al. 2009), Radiation—GSFCLXZ (Chou and Suarez 1999; Chou et al. 2001), Planetary Boundary Layer (PBL)—CAM (improved Holtslag and Boville 1993) plus ORO (Rontu 2006; Liang et al. 2006), and Surface—CSSP land plus UOM ocean (described above). A more detailed description of these schemes is provided in Liang et al. (2012), with the key differences in the ECP, CSSP, and UOM updates referenced above. For each new regional domain, CWRF must be carefully localized to maximize its performance. In addition to these physics improvements, the CWRF localization for this study region includes the specific domain design and construction of surface boundary conditions (see Sect. 3). In particular, the dynamic surface albedo parameterization (Liang et al. 2005b) must be re-developed according to the updated vegetation data (Xu et al. 2014), and stream flow directions must be re-constructed (Choi et al. 2013) with visual reality check, both of which are time consuming and labor intensive.

RegCM4.6 (Giorgi et al. 2012) has been continuously developed from MM5 (Grell et al. 1994) over the last three decades. The physics configuration chosen for the present study includes Cumulus—TDK penetrative plus shallow convection (Tiedtke 1989), Microphysics—SUBEX (Pal et al. 2000), Cloud + Radiation—CCM3 (Kiehl et al. 1996), PBL—CCM3 (Holtslag et al. 1990), and Surface—CLM4.5 land processes (Oleson et al. 2013) plus surface fluxes over oceans (Zeng et al. 1998). The CCM3 and CAM PBL schemes are similarly formulated, as are the CLM4.5 and CSSP land schemes. Other physics schemes and the dynamic core differ significantly between RegCM4.6 and CWRF. Table 2 summarizes their major differences, which include dynamics and physics configurations as well as surface and lateral boundary conditions.

Table 2 Summary of major differences between CWRF and RegCM4.6 configurations

3 Model experiment design and observational reference data

The CWRF computational domain in this study (Fig. 1) is based on the Lambert conformal map projection centered at (35.18°N, 110°E) with a total of 232 × 172 grid points at 30 km spacing. Liu et al. (2008) demonstrated that this domain is optimal for modeling China’s regional climate, which is determined by interactions between the planetary circulation (as forced by lateral boundary conditions or LBCs) and East Asian surface processes, including orography, soil, vegetation and coastal oceans. Figure 1 depicts a small subset of the comprehensive surface boundary conditions (SBCs) used by CWRF (Liang et al. 2005a; Xu et al. 2014), showing the land cover and ocean depth distributions, lakes, major rivers, and main streams. In the buffer zones, located across 14 grids along the four edges of the domain, varying LBCs were specified using a dynamic relaxation technique with linear-exponential nudging coefficients that decrease toward the surface and the inner boundaries (Liang et al. 2001). By default, CWRF uses 36 terrain-following vertical levels with the upper boundary at 50 hPa (Liang et al. 2012). Both the horizontal and vertical resolutions for CWRF are finer than those of most RCMs applied for long-term simulations in the region, which typically use ~ 50 km grids and ~ 20 levels.

Fig. 1

The CWRF computational domain for this study, overlaid with land cover, ocean depth (m), lakes, major rivers, and main streams. The hatched edge areas are the buffer zones, where LBCs are specified

RegCM4.6 uses the same domain (Fig. 1), but includes only 9 grids in the buffer zones with exponential nudging coefficients (Giorgi et al. 1993b). Its relaxation is stronger (especially near the surface) and faster (toward the inner domain) than that of CWRF (Liang et al. 2001). As typically applied, the model has only 18 vertical levels, half as many as CWRF, with the same 50 hPa upper boundary. As designated for its interactive CLM4.5 (Oleson et al. 2013), RegCM4.6 employs 7 primary plant functional types, whose properties (such as leaf and stem area indices) are derived from IGBP and other satellite data (Bonan et al. 2002). These land use/cover specifications differ from those of CWRF/CSSP, which incorporates USGS’s 24 dominant categories and MODIS satellite data (Liang et al. 2005a, b; Xu et al. 2014).

Both CWRF and RegCM4.6 simulations were driven by the ECMWF Interim reanalysis (ERI, Dee et al. 2011), one of the best available proxies for observations. They were initiated on October 1, 1979 and integrated continuously through December 31, 2015. The first two months are considered spin-up and were not used in the subsequent analyses. Sea surface temperature (SST) was prescribed from the daily observational analysis, available over the global oceans on a ¼° longitude by latitude grid mesh from November 1981 onward (Reynolds et al. 2007; Banzon et al. 2016). Before that, SST was supplemented by ERI daily mean ground temperature. For CWRF, the daily SST analysis was used as relaxation in UOM to predict ocean temperature variations (including the diurnal cycle) due to transient air-sea interactions (Ling et al. 2011, 2015). On the other hand, SST in RegCM4.6 varies exactly according to the prescribed daily means.

As the reference for model evaluation, observational data consist of a gridded (¼° longitude by latitude) daily analysis of precipitation, surface air (2 m) temperature and humidity, and surface (10 m) wind based on in situ measurements at 2416 stations in Mainland China from 1961 onward (Wu and Gao 2013). Given China’s total area of ~ 9.634 million km², these stations, if evenly distributed, would each cover an equivalent 63 km grid, coarser than the ¼° analysis grid. However, the stations (principal plus ordinary) are sparse in western China (Fig. S1), including the Tibetan Plateau and the Taklamakan-Gobi Desert, where the analysis contains substantial uncertainties due to arbitrary extrapolation from missing data. On the other hand, the stations are relatively dense in eastern China (east of ~ 100°E, except for the northern border including Inner Mongolia-Heilong Jiang), where the analysis represents climate characteristics at a finer resolution than the ERI ~ 80 km grid but still coarser than the CWRF 30 km grid. Consequently, our model evaluation focuses more on eastern China. Given different classifications of major climate regimes (Zheng et al. 2013; Shi et al. 2014; Han and Zhai 2015) and considering topographic characteristics and data availability, we further divide Mainland China into 11 broad regions (Fig. S1 provides names and boundaries) to evaluate regional model performance.
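The equivalent grid size quoted for the station network follows from a simple area-per-station calculation. A minimal sketch of that arithmetic, assuming evenly distributed stations and square cells; the function name `equivalent_grid_km` and the rough 111.2 km per degree of latitude conversion are illustrative assumptions, not part of the study:

```python
import math

def equivalent_grid_km(area_km2: float, n_stations: int) -> float:
    """Side length (km) of the square cell each station would cover if evenly spread."""
    return math.sqrt(area_km2 / n_stations)

# Values quoted in the text: ~9.634 million km2 and 2416 stations
spacing = equivalent_grid_km(9.634e6, 2416)

# Assumption: ~111.2 km per degree of latitude, so a quarter degree is ~27.8 km
quarter_degree_km = 0.25 * 111.2

print(f"{spacing:.1f} km per station vs {quarter_degree_km:.1f} km analysis grid")
```

Under these assumptions the station spacing is roughly twice the analysis grid spacing, which is why the gridded analysis cannot be treated as fully resolved at ¼°, particularly where stations are sparse.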

ERI uses a 4D-Var analysis to assimilate satellite-retrieved total column water vapor as a pseudo-observation of rainfall, as well as a separate surface analysis of screen-level temperature and humidity synoptic observations, along with station snow depth and satellite snow cover data (Dee et al. 2011). As such, precipitation and surface air temperature data from ERI are among the most realistic proxies of observations over East Asia (Zhu et al. 2016; Huang et al. 2016). Downscaling RCMs do not directly assimilate surface measurements, but predict these variables as driven only by planetary circulation, especially upper air conditions (Liang et al. 2001). Therefore, for RCMs to reproduce these variables with skill close to that of ERI is a significant achievement, not a general expectation. However, ERI uses measurements from significantly fewer than the 2416 stations used in Wu and Gao (2013), and thus cannot resolve the full characteristics in the reference data. As a result, RCMs may outperform ERI in certain circumstances, which would indicate that they incorporate more realistic physics representations (especially surface-atmosphere interactions) than ERI at this scale.

4 Annual cycle

Figure 2 uses a Taylor (2001) diagram to summarize the overall performances (relative to the driving ERI) of CWRF and RegCM4.6 in simulating seasonal mean precipitation geographic distributions over Mainland China. Spatial pattern correlations and normalized standard deviations are compared with observations for all four seasonal means averaged during 1980–2015. To better describe precipitation characteristics, we include statistics for seasonal average amount, number of rainy days (> 0.1 mm), and simple daily intensity index (total accumulated amount / number of rainy days).
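The three precipitation statistics can be illustrated for a single grid cell. This is a minimal sketch following the definitions given in the text (rainy day: more than 0.1 mm; intensity: total accumulated amount divided by the number of rainy days); the function name `precip_indices` and the sample values are hypothetical. Some definitions of the intensity index sum only wet-day amounts, which differs from this form by the trace amounts on dry days.

```python
def precip_indices(daily_mm, wet_threshold=0.1):
    """Return (mean daily amount, number of rainy days, simple daily intensity)."""
    total = sum(daily_mm)
    amount = total / len(daily_mm)
    rainy_days = sum(1 for p in daily_mm if p > wet_threshold)
    # Intensity as stated in the text: accumulated amount over rainy-day count;
    # returns 0.0 when there is no rainy day to avoid dividing by zero
    intensity = total / rainy_days if rainy_days else 0.0
    return amount, rainy_days, intensity

# Made-up daily precipitation (mm) for six days at one grid cell
daily = [0.0, 0.05, 2.0, 10.0, 0.0, 6.0]
amount, rainy_days, intensity = precip_indices(daily)
print(rainy_days, round(amount, 2), round(intensity, 2))
```

Separating amount into frequency and intensity is what allows compensating errors to be detected, for example a reasonable total produced by too many days of too-weak rain.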

Fig. 2

Taylor diagram of pattern statistics in Mainland China comparing CWRF, RegCM4.6, and ERI overall performance in simulating geographic distributions of seasonal average precipitation amount (PR), number of rainy days (RD), and daily rainfall intensity (DI). Shown are the pattern correlation (azimuthal) and normalized standard deviations (radius) compared with observations. The dot marks the perfect score with a unit correlation and deviation
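The two coordinates plotted on a Taylor diagram can be computed directly from a simulated and an observed field. The sketch below is illustrative only: it assumes both fields are flattened to equally long lists over the same grid points and applies no area weighting, and the names `taylor_stats` and `normalized_centered_rms` are hypothetical. The last function uses the relation from Taylor (2001) that ties the centered RMS difference to the other two statistics.

```python
import math

def taylor_stats(model, obs):
    """Pattern correlation and normalized standard deviation (model / observed)."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    dev_m = [x - mean_m for x in model]
    dev_o = [x - mean_o for x in obs]
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    corr = sum(a * b for a, b in zip(dev_m, dev_o)) / (n * std_m * std_o)
    return corr, std_m / std_o

def normalized_centered_rms(corr, norm_std):
    """Centered RMS difference divided by the observed standard deviation."""
    # E'^2 = 1 + s^2 - 2 s R in normalized form; clamp tiny negative round-off
    return math.sqrt(max(0.0, 1.0 + norm_std ** 2 - 2.0 * norm_std * corr))

# Made-up example: the model doubles every observed value
obs = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 4.0, 6.0, 8.0]
corr, norm_std = taylor_stats(model, obs)
rms = normalized_centered_rms(corr, norm_std)
```

In this toy case the pattern is perfectly correlated but the spatial variability is doubled, so the point would sit on the horizontal axis at radius 2, one unit away from the perfect score.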

For precipitation amount, ERI correlates strongly with observed patterns, with some seasonal variation (0.77–0.82). It generally overestimates spatial variability (1.1–1.37), especially in autumn and summer, with winter closest to observations. In comparison, CWRF correlates less in summer (0.74), similarly in autumn (0.78), and more strongly in winter and spring (0.87), and overestimates spatial variability even more (1.43–1.69). This increased variability may arise from the inability of the coarsely resolved reference data to represent the actual signal. In contrast, RegCM4.6 performs significantly worse, with lower correlations of 0.42 (winter), 0.54 (autumn and summer), and 0.66 (spring).

For the number of rainy days, compared with the precipitation amount, ERI correlates more strongly with observed patterns (0.83–0.91) except in winter (0.71), and also overestimates spatial variability more (1.35–1.46) except in summer (1.21). Likewise, CWRF correlates more strongly in summer and autumn (0.85), similarly in spring (0.87) and less in winter (0.81), while systematically reducing overestimation of spatial variability, especially in summer (1.19). As such, CWRF overall performs close to ERI. RegCM4.6 generally captures rainy days better than precipitation amount, with an increased correlation (0.74–0.81) except in winter (0.32), and its overestimation of variability is reduced (1.04–1.26). However, it is still outperformed by CWRF, especially in pattern correlation.

For daily rainfall intensity as compared with the other two measures, the main performance difference is that all models more realistically simulate spatial variability. In particular, CWRF produces standard deviations close to observations (0.98–1.10), which is an improvement over ERI’s general underestimation (0.80–0.92). RegCM4.6 also simulates realistic variability (0.93–1.03), but has a systematically lower pattern correlation than CWRF and ERI for all seasons.

Figures 3 and 4 compare geographic distributions of the seasonal average precipitation amount and daily intensity. As discussed earlier, ERI assimilates pseudo-observations and station measurements and thus can well reproduce the general pattern and magnitude of precipitation in all seasons. In summer, the observed monsoon system consists of two major rain bands east of ~ 105°E: along the Yangtze River and across South China. ERI simulates a smoother structure, without a well-defined separating dry zone. In contrast, CWRF reproduces the two bands with a finer structure, but overestimates rainfall amount in South China, mainly by inflating the number of rainy days (Fig. S2). CWRF more realistically simulates daily intensity than ERI, which systematically underestimates both bands. Therefore, ERI produces a reasonable total amount by compensating for weaker intensity with more rainy days, a “drizzling problem” typical in GCMs (Sun et al. 2006). On the other hand, RegCM4.6 poorly simulates the two rain bands, with little organized structure and more scattered grid-point storms.

Fig. 3

Geographic distributions of seasonal average precipitation amount (mm day−1) observed (OBS), assimilated (ERI), and downscaled by CWRF and RegCM4.6 for winter (DJF), spring (MAM), summer (JJA), and autumn (SON)

Fig. 4

Same as Fig. 3 except for seasonal mean daily intensity (mm day−1)

Another key summer feature is the moderate precipitation in the Northeast, which is strongest along the three mountain ridges surrounding the Northeast Plain (Da and Xiao Hinggan Liang, and Changbai Shan). ERI captures this well, in part due to its data assimilation. In contrast, CWRF overpredicts the total amount and daily intensity but produces rainy days comparable to ERI. However, since monitoring stations over these mountains are sparse and mostly located at lower elevations, the reference data likely underestimate precipitation amount and intensity over mountains (Liang et al. 2004b). A finer-resolution monitoring network together with an objective topographic adjustment (Daly et al. 1994) is needed to provide more realistic reference data, against which model performance can be better evaluated.

Spring is China’s second most important precipitation season, with the main rainfall occurring between the Yangtze and Pearl Rivers. Observations exhibit two rainfall centers: immediately south of the Yangtze River and north of the Pearl River. ERI and CWRF both capture these centers, but ERI underestimates their intensity while CWRF overestimates it. On the other hand, RegCM4.6 fails to distinguish the two organized centers at all, producing scattered precipitation over the entire region. Again, the reference data may be inadequate to resolve topographic enhancement over this region, where numerous mountains have elevations exceeding 1 km.

Winter is China’s dry monsoon season, with observed precipitation typically under 10 mm per day. Precipitation is evenly distributed between the Pearl and Yangtze Rivers east of ~ 105°E. Model simulation of daily intensity is reasonable, with small underestimation by ERI and overestimation by CWRF, but less spatial correspondence by RegCM4.6. Biases are larger in rainy days: ERI is close to observations, CWRF overestimates (excessive amount) in the western part, and RegCM4.6 underestimates (deficient amount) in the eastern part. Since winter precipitation results primarily from non-convective systems, interactions among surface, PBL, and cloud microphysics parameterizations must be improved for models to more realistically capture rainy days.

Autumn is the transition season for China’s summer to winter monsoon, with the main rainfall retreating west of ~ 110°E. East of that longitude, precipitation is fairly uniform (similar to winter but with broader coverage) except for enhancement along the southeastern coast where intensity is 10 mm day−1 or higher. A weaker intensity center is observed over the Yangtze River Basin. These features are well captured by both CWRF and ERI, with some underestimation by the latter. In contrast, RegCM4.6 shifts the center over the Yangtze River Basin westward to the upper reach.

Notably, the reference data show frequent rainfall in the southern foothills of the Yungui and Tibetan Plateaus, especially in summer and spring, with moderate rainfall (~ 10 mm day−1) on 75% of summer days. ERI captures this feature well, though it increases both intensity and coverage. Both CWRF and RegCM4.6 also reproduce the feature, but with a more scattered structure. Again, these areas contain sparse monitoring stations and hence little observational reference. The topographic uplifting effect on the prevailing moist southerly flow causes frequent rainfall, likely with heavier intensity than in the reference data, and the complex characteristics of the clustered mountains may lead to a more scattered rainfall structure. Thus, the CWRF or RegCM4.6 results may actually be more reasonable than the reference data.

For surface air temperature, the magnitude of spatial variations is much greater than that of model differences. Therefore, the spatial pattern correlations are all above 0.96 throughout the year. This applies to daily mean, maximum, and minimum, indicating that comparing these full temperature fields does not effectively separate model skill differences. Rather, we compare their biases (simulations minus observations), including seasonal geographic maps and frequency distributions over Mainland China. Given that histograms are not smooth and depend on the width and end points of the bins, we use kernel density estimators to depict the frequency distributions (Hwang et al. 1994). This applies to all frequency distribution results presented below.
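To illustrate why a kernel density estimate avoids the bin-width and end-point dependence of a histogram, the following sketch builds a smooth frequency curve from a handful of bias values. The Gaussian kernel, the fixed bandwidth of 0.5 °C, the function name `gaussian_kde`, and the sample biases are all assumptions for illustration; the text does not state which kernel or bandwidth was used.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function giving the kernel density estimate at any point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centered on each sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Made-up temperature biases (deg C) at five grid cells
biases = [-1.0, 0.5, 1.0, 1.2, 2.5]
density = gaussian_kde(biases, bandwidth=0.5)

# Evaluate on a regular grid from -6 to 8 deg C; the curve integrates to about 1
step = 0.01
grid = [-6.0 + step * i for i in range(1401)]
area = sum(density(x) for x in grid) * step
```

Each sample contributes one smooth bump, so the resulting curve has no bin edges; the bandwidth plays the smoothing role that bin width plays in a histogram.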

For the daily mean (Fig. 5), the bias frequency shows that ERI has a narrow peak around 1 °C, while RegCM4.6 has a widespread flattened pattern. The CWRF pattern is close to that of ERI, especially for the warm tail, indicating that it performs better overall than RegCM4.6. Important regional differences exist. ERI biases vary generally within ± 3 °C and mostly between ± 2 °C in all seasons, except for autumn, which is systematically 1–3 °C warmer. The ERI performance results from its surface data assimilation. In contrast, CWRF biases are comparable with or even smaller than ERI over broad regions in eastern China, where surface monitoring stations are relatively dense. This is most obvious to the south of the Yangtze River from spring to autumn. Exceptions include systematic colder biases in the Tibetan Plateau during winter and spring, and warmer biases in the Taklamakan-Gobi Desert during summer. The reference over these regions, however, is based on subjective extrapolation from measurements at a very limited number of stations, and so contains substantial uncertainties. RegCM4.6 produces similar biases in these regions. A more realistic reference is required to determine whether the biases are due to model errors or data uncertainties. In other regions, RegCM4.6 generates stronger warm biases than CWRF for all seasons, except for colder biases in North China spring.

Fig. 5

Geographic distributions of ERI, CWRF and RegCM4.6 seasonal average biases (departures from observations) in daily mean temperature (°C) and their frequency distributions over Mainland China. The biases colored on the maps are statistically significant at a confidence level better than 95% with a Student’s t test, assuming yearly independence
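One way such a significance mask could be computed at a single grid cell is a one-sample t test on the yearly model-minus-observation differences, treating each year as independent. This is a sketch under that assumption; the text does not say whether a paired or two-sample form was used, and the function name `bias_t_statistic`, the sample values, and the hard-coded critical value are illustrative.

```python
import math

def bias_t_statistic(model_yearly, obs_yearly):
    """Mean bias and one-sample t statistic of the yearly differences."""
    diffs = [m - o for m, o in zip(model_yearly, obs_yearly)]
    n = len(diffs)
    mean_bias = sum(diffs) / n
    # Unbiased sample variance; assumes the differences are not all identical
    variance = sum((d - mean_bias) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_bias / math.sqrt(variance / n)
    return mean_bias, t_stat

# Made-up seasonal mean temperatures (deg C) for four years at one grid cell
model = [11.0, 12.5, 10.8, 12.1]
obs = [10.0, 11.0, 10.0, 11.0]
mean_bias, t_stat = bias_t_statistic(model, obs)

# Two-sided 5% critical value from a standard t table, 3 degrees of freedom
T_CRIT_DF3 = 3.182
significant = abs(t_stat) > T_CRIT_DF3
```

With 36 years of data the degrees of freedom would be 35 and the corresponding two-sided table value is about 2.03, so smaller biases can pass the test than in this four-year toy case.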

For the daily maximum (Fig. S3), the bias frequency shows that ERI has a sharp peak around − 1 °C (colder), RegCM4.6 again has a flatter pattern (here even further widened), and the CWRF distribution is intermediate. ERI biases are systematically reduced relative to the daily mean, causing an improvement in most of eastern China and a skill loss to the west throughout the year. In winter and spring, ERI and CWRF both have consistent western cold biases; RegCM4.6 significantly enhances these cold biases, and generates them in most of northern China, suggesting enlarged daytime surface radiation deficits. In eastern China, CWRF performs well, with small biases mostly within ± 2 °C in both seasons, whereas RegCM4.6 produces much larger warm biases (3–7 °C in winter and 1–4 °C in spring) to the south of the Yangtze River and cold biases (2–6 °C in winter and 2–8 °C in spring) to the north of the Yellow River. RegCM4.6 may insufficiently represent snow and precipitation processes, since its performance improves (over the daily mean) in summer and autumn. In these seasons, ERI and CWRF are realistic across most of eastern China, as is RegCM4.6 to the north of the Yellow River. Exceptions include cold biases (1–3 °C) in summer for CWRF in the Pearl River Basin, and warm biases (2–5 °C) for RegCM4.6 in summer between the Yellow and Yangtze Rivers and in autumn between the Yellow and Pearl Rivers. These biases are opposite to precipitation biases shown in Fig. 3.

For the daily minimum (Fig. S4), the bias frequency shows that ERI peaks around 1 °C in autumn–winter and 2 °C in spring-summer, indicating an overall overestimation, whereas CWRF peaks near 0 °C with a flatter distribution, which is similar to but less skewed than RegCM4.6. ERI generates systematic warm biases (1–4 °C) over most of China, especially in spring and summer. CWRF also displays warm biases of similar magnitude, but these are generally limited to areas with sparse monitoring stations, in northern China in summer and autumn, and even further north in winter and spring. It produces cold biases over the Tibetan Plateau in winter and spring, with magnitudes similar to the daily mean and maximum biases. CWRF performs excellently across the rest of eastern China throughout the year. RegCM4.6 performance generally resembles CWRF, except that the summer and autumn warm biases are enhanced and expanded into North China, and spring cold biases (1–3 °C) occur broadly over the Northeast.

For the daily temperature range (maximum minus minimum, Fig. 6), ERI gives systematic underestimates, where the bias frequency peaks at − 2 to − 3 °C throughout the year. CWRF yields a general improvement, shifting the peaks to near − 1 °C, albeit with a wider spread. RegCM4.6 produces a bimodal pattern, most obvious in winter, autumn, and spring, indicating that it enhances both negative and positive biases relative to CWRF. Even assimilating 6-hourly surface data analysis, ERI is still not able to accurately resolve the diurnal range. In contrast, CWRF, which incorporates only synoptic conditions above the boundary layer, successfully captures the diurnal range, especially in eastern China where observations are abundant. One exception occurs between the Yellow and Yangtze Rivers, where CWRF overestimates the range by 1–3 °C in winter. This occurs as a combination of warmer maximum and colder minimum, suggesting insufficient cloud effects to reduce daytime incoming solar and nighttime outgoing infrared radiation. A similar but weaker CWRF bias pattern exists in spring. Additionally, CWRF underestimates the range in northern China areas of sparse observations, due mainly to warmer minimum temperatures. On the other hand, RegCM4.6 simulates significantly greater range biases, with overestimation to the south of the Yangtze River and underestimation in western China and to the north of the Yellow River. This amplification is most severe in winter, strong in autumn, and notable in spring. Such patterns are mainly explained by biases in the daily maximum for winter and spring, but by the combination of warmer maximum south of the Yellow River and warmer minimum in northern China for autumn. These imply more complicated deficiencies in RegCM4.6. Future investigation should also consider the temperature range effect due to the precipitation diurnal cycle, which remains a modeling challenge (Liang et al. 2004a).
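Because the daily temperature range is a difference, its bias decomposes exactly into the biases of its two components. Writing a bias as model minus observation, denoted here by the symbol Δ (notation introduced only for this illustration):

```latex
\begin{aligned}
\mathrm{DTR} &= T_{\max} - T_{\min},\\
\Delta\mathrm{DTR} &= \mathrm{DTR}_{\text{model}} - \mathrm{DTR}_{\text{obs}}
                    = \Delta T_{\max} - \Delta T_{\min}.
\end{aligned}
```

As a hypothetical example, a maximum that is 1 °C too warm combined with a minimum that is 1 °C too cold gives a range bias of +2 °C, whereas a minimum that is 2 °C too warm with an unbiased maximum gives a range bias of −2 °C. This is why the range biases above are attributed to the maximum, the minimum, or a combination of both.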

Fig. 6

Same as Fig. 5 except for daily temperature range (°C)

Some straight-line patterns appear in CWRF daily temperature range biases (Fig. 6) over the Taklamakan-Gobi Desert. Relative to the surrounding background, these stronger negative range biases correspond to a colder daily maximum (Fig. S3) and a warmer daily minimum (Fig. S4). They coincide with bands of wetter soil along streamflow lines. Over the desert areas with relatively flat terrain, the existing digital elevation model data are inadequate to define actual streams and flow directions. The unrealistic specification of these and related SBCs causes CSSP to produce soil and air temperature departures from their references, which are also uncertain due to the lack of in situ observations. Correcting these deficiencies will require realistic representation of the terrestrial hydrology, which depends on accurate SBCs and real verification data.

Figure 7 compares model performances for surface wind speed. The bias frequency indicates that CWRF is more realistic than ERI, with the peak near 0 rather than 1 (m s−1) in all seasons. RegCM4.6 is worse than ERI, with the peak shifted to ~ 2 (m s−1), indicating systematic overestimation. CWRF’s superior performance is obvious in eastern China, where the model differs little from the reference that has abundant observations. In contrast, ERI contains overestimations of 1–2 (m s−1) over broad regions like South China, persistent throughout the year. In western China, CWRF and ERI share a similar pattern in all seasons, with underestimation in northern West Tibet and overestimation in South Xinjiang. However, RegCM4.6 overestimates most severely in East and South Tibet. Since there are very few observations in these regions, it is uncertain whether these biases reflect errors in the models or the reference data.

Fig. 7 Same as Fig. 6 except for surface wind speed (m s−1)

5 Interannual variation

Figure 8 compares CWRF- and RegCM4.6-simulated surface air temperature interannual temporal correlations with observations, including seasonal geographic maps and frequency distributions over only eastern China (due to its abundant monitoring data). The correlations for ERI are all high (attributed to its effective surface data assimilation) and are not shown here. The CWRF correlations are very high from autumn to spring almost everywhere (except for South Xinjiang and West Tibet, where observations are lacking), indicating extraordinary model skill in capturing interannual temperature variations. Good performance is also seen in summer, except that most of Central to South China lacks useful skill. RegCM4.6 shares these performance features, although it is somewhat less skillful than CWRF, with the frequency peak or tail shifted to a smaller correlation, especially in winter and spring.

Fig. 8 Geographic distributions of CWRF- and RegCM4.6-simulated temperature interannual correlations with observations and their frequency distributions over eastern China. Correlations larger than 0.3 (colored) are statistically significant at a confidence level better than 95% based on a Student's t test, assuming yearly independence
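
As a rough check on the 0.3 significance threshold, the sketch below computes a Pearson correlation and the corresponding Student's t statistic. It assumes one seasonal mean per year for 1980–2015 (36 samples, 34 degrees of freedom). The function names are illustrative, and the source does not state whether the test was one- or two-sided.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long series of yearly values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Student's t statistic for a correlation r from n independent samples."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

n_years = 36  # assumption: 1980-2015 with one seasonal mean per year
t_at_threshold = t_statistic(0.3, n_years)  # about 1.83
```

For 34 degrees of freedom the one-sided 95% critical value is roughly 1.69, so a correlation of 0.3 passes a one-sided test under the yearly independence assumption.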

Figure 9 compares the precipitation correlations. CWRF performs very well in winter, with large, significant correlations almost everywhere in eastern China. Good performance is also seen in spring, except that correlations in the western part (105–110°E) of Central to South China and some portions of North and Northeast China are no longer significant. The area of significant correlations is further reduced in autumn, especially in North China and along the east coast. Overall performance is weakest in summer, when most areas along the Yangtze River to the south of the Yellow River lack significant correlations. For all seasons, RegCM4.6 has less overall skill than CWRF, with significant correlations covering smaller areas. Summer temperature skill in Central to South China is likewise poor in both CWRF and RegCM4.6 (Fig. 8). This consistency may indicate a challenging issue related to regional climate predictability during the summer monsoon, in which strong convective activities and land-sea-air interactions occur. Large-scale circulation forcings via LBCs are no longer dominant, whereas regional factors and feedbacks become more critical. Thus, skill enhancement in this region will likely depend on incorporating surface data assimilation to improve initialization and system memory in terrestrial hydrology and coastal oceans (Kumar et al. 2008), and developing an optimized multi-physics ensemble to improve model representation of key processes such as convection-microphysics-radiation and surface-atmosphere interactions (Liang et al. 2012).

Fig. 9 Same as Fig. 8 except for precipitation

The covariability of precipitation and temperature is a key measure of a model’s ability to capture coupled physical processes (Trenberth and Shea 2005). Figure 10 compares CWRF and RegCM4.6 simulations to observed precipitation-temperature interannual correlations for each season. Observations show strong negative correlations in summer over most of Central-South China, as well as in the northern and western parts of the Northeast, Inner Mongolia, the northern (western) parts of North (South) Xinjiang, and South Tibet. These latter regions have sparse data, so the reference itself is uncertain. The negative correlations reflect more solar heating and less evaporative cooling under dry conditions (Trenberth and Shea 2005). CWRF captures observations in eastern China well, with slight underestimation in South China. On the other hand, RegCM4.6 overestimates the relationship, especially in North China, indicating an overly strong coupling between precipitation and temperature.

Fig. 10 Same as Fig. 8 except for cross correlations between precipitation and temperature observed (OBS) and simulated by CWRF and RegCM4.6

Other seasons show much weaker relationships. In spring, both models simulate well the strong observed negative correlations in the Southwest; CWRF overestimates and RegCM4.6 underestimates the weaker correlations between the Yangtze and Yellow Rivers; CWRF also overestimates correlations in the southeastern part of the Northeast. In autumn, observations exhibit weaker correlations, which RegCM4.6 underestimates in the Southwest but overestimates between the two rivers; on both counts CWRF is more realistic. In winter, CWRF captures well the strong negative correlations in the central Northeast, which RegCM4.6 underestimates; stronger correlations occur in the Southwest and extend to Sichuan, which CWRF simulates realistically but RegCM4.6 overestimates and expands further into the area between the upper reaches of the Yangtze and Pearl Rivers.

In regions where RegCM4.6 simulates stronger negative correlations of precipitation with temperature, it also overestimates positive correlations with relative humidity for all seasons (Fig. S5). Thus, RegCM4.6 overestimates the coupling between precipitation, temperature and humidity, indicating unrealistic cloud radiative and surface evaporative effects, especially in summer and autumn. This overestimation likely arises because increased precipitation is associated with more clouds (so less solar heating) and wetter surfaces (so more evaporation), both of which favor cooler and moister near-surface air.

Interestingly, scattered regions of positive precipitation-temperature correlations appear in winter along the Yangtze River and Jiangsu’s coast, as well as in spring in Inner Mongolia. Unlike RegCM4.6, CWRF captures both of these. Such positive correlations may result from precipitation favored by warm moist advection in extratropical cyclones and limited by low water availability in cold conditions (Trenberth and Shea 2005). In addition, CWRF generates positive correlations along the northern slopes of the Tibetan Plateau, strong in summer and weaker in autumn and winter. Perhaps orographic uplift causes warm air holding large amounts of water to precipitate more, especially over these high elevation regions. In contrast, RegCM4.6 simulates positive correlations over the Tibetan Plateau, strong in spring and weaker in summer. Perhaps this precipitation is associated with low clouds that radiate back to warm the surface. However, such a regional relationship is either not evident or can even be reversed (such as in spring in West Tibet) in CWRF. Negative correlations over the Plateau are also simulated by CWRF in winter (strong) and by both models in autumn (weaker). Observational data over the Plateau are needed to understand the actual processes responsible for these positive correlations.

Figures 11 and 12 compare CWRF and RegCM4.6 simulations with observed temperature and precipitation monthly anomalies during 1980–2015, averaged over the five regions with relatively dense monitoring stations (Southwest, South China, Central China, North China, Northeast). They are normalized against their own mean annual cycles of the same period, with the respective monthly means and interannual standard deviations also shown. As in the earlier discussion, the mean temperature annual cycle is depicted as the departure from observations. The results for the six regions with sparse data records (Inner Mongolia, East Tibet, South Tibet, West Tibet, South Xinjiang, North Xinjiang) are illustrated in Figs. S6, S7.
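
The normalization described above can be sketched as follows. This is a minimal illustration: the function name and sample values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify the estimator.

```python
import statistics

def normalized_anomalies(monthly):
    """monthly[y][m] is the regional mean for year index y and calendar month m.

    Returns (climatology, stdev, anomalies). Each anomaly is the departure from
    that month's multi-year mean divided by that month's interannual standard
    deviation (sample estimator, an assumption).
    """
    n_months = len(monthly[0])
    clim = [statistics.mean([year[m] for year in monthly]) for m in range(n_months)]
    sdev = [statistics.stdev([year[m] for year in monthly]) for m in range(n_months)]
    anom = [[(year[m] - clim[m]) / sdev[m] for m in range(n_months)]
            for year in monthly]
    return clim, sdev, anom

# Hypothetical data: three years, two calendar months.
clim, sdev, anom = normalized_anomalies([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
```

Applying the function separately to each model and to the observations normalizes every series against its own annual cycle, as the text describes.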

Fig. 11 The 1980–2015 mean and standard deviation annual cycles (left, °C) and normalized interannual anomalies (right) of temperature simulated by CWRF and RegCM4.6 along with observations (OBS), averaged over the five key regions with good data. The models' mean annual cycles are shown as monthly departures from observations

Fig. 12 Same as Fig. 11 except for precipitation (mm day−1)

For the five regions with good data, CWRF simulates the annual cycle for both temperature and precipitation more realistically overall than RegCM4.6. In particular, RegCM4.6 is too cold from February to April in the Northeast and North China, and also too hot from July to September in North China. On the other hand, in Central China CWRF overestimates precipitation from January to June, while RegCM4.6 underestimates it from July to December; otherwise, both are realistic. In South China, RegCM4.6 underestimates precipitation throughout the year, while CWRF overestimates it from May to September. In both regions, the combination of the two models can better simulate observations, suggesting the advantage of an ensemble approach. Similarly, CWRF performs better than RegCM4.6 in the six regions with sparse data.

Temperature interannual anomaly correlations with observations are higher for CWRF than RegCM4.6 in all regions except North China (equal) and South Xinjiang (smaller). Correlation differences between the models are generally within 0–0.06, but are substantially larger in South Tibet (0.12), East Tibet (0.11), West Tibet (0.21), and South Xinjiang (− 0.14), all areas with sparse data and thus less confidence. Likewise, precipitation interannual anomaly correlations with observations are higher for CWRF than RegCM4.6 in all regions, with notably larger differences in the Southwest (0.24), South China (0.19), Central China (0.13), North China (0.15), and the Northeast (0.08), all of which have good data, as well as in Inner Mongolia (0.08), East Tibet (0.15), West Tibet (0.11), and North Xinjiang (0.16), which have sparse data. These results indicate that CWRF better captures observed characteristics of interannual anomalies along with a more realistic annual cycle than RegCM4.6, especially for precipitation over most regions in China.

It is important to identify the key regional anomalies that substantially differ between models and observations. A subsequent diagnostic analysis of these anomalies will offer insight into the climate processes and physical mechanisms that cause such model deficiencies. We flag cases in which the absolute difference between simulated and observed normalized anomalies exceeds a threshold of 2.0. These exceedances contain substantial model errors that require future investigation to improve seasonal prediction skill. As marked in Figs. 11, 12 and S6, S7, these cases occur less frequently in CWRF than in RegCM4.6, especially for precipitation. The overall results, including exceptions, are consistent with those revealed above in correlations. However, these cases are not coherent between precipitation and temperature, nor between the models or among the regions, indicating that process diagnosis will be challenging.
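
The exceedance criterion amounts to a simple filter on paired normalized anomalies. The sketch below uses a hypothetical function name and made-up anomaly values; only the 2.0 threshold comes from the text.

```python
def flag_exceedances(sim_anom, obs_anom, threshold=2.0):
    """Indices of months where |simulated - observed| normalized anomaly
    exceeds the threshold (2.0 in the text)."""
    return [i for i, (s, o) in enumerate(zip(sim_anom, obs_anom))
            if abs(s - o) > threshold]

# Hypothetical normalized anomalies for four consecutive months.
obs = [0.5, -1.2, 1.8, -0.3]
sim = [0.1, 1.0, -0.5, -0.2]
flagged = flag_exceedances(sim, obs)  # months at index 1 and 2 differ by more than 2.0
```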

6 Extreme precipitation

CWRF and RegCM4.6 performance relative to ERI in simulating the 1980–2015 mean 95th percentile of daily precipitation in each season over Mainland China is summarized in a Taylor diagram (Fig. 13). CWRF exhibits outstanding performance in all seasons, producing a high pattern correlation (0.79–0.90) and realistic spatial variability (0.96–1.09), improving over ERI’s smaller correlation (0.71–0.88, except for autumn) and systematically lower variability (0.70–0.87). RegCM4.6 produces reasonable variability (0.85–1.01) but a significantly lower correlation (0.59–0.78) than CWRF and ERI throughout the year. For the larger 99th percentile (not shown), CWRF performance remains high with an even better correlation (0.84–0.91) but slightly larger variability (1.05–1.16). On the other hand, the RegCM4.6 performance is further degraded, with an even lower correlation (0.48–0.69) and a systematically reduced variability (0.78–0.86).

Fig. 13 Same as Fig. 2 except for the 95th percentile of daily precipitation
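
The two quantities read from a Taylor diagram, pattern correlation and normalized spatial standard deviation, can be computed as in the sketch below. It is unweighted for brevity, and whether the authors applied area weighting is not stated here. The function name and sample fields are hypothetical.

```python
import math

def taylor_statistics(model, reference):
    """Return (pattern correlation, std ratio, normalized centered RMS difference).

    Inputs are flattened spatial fields on a common grid. No area weighting is
    applied (an assumption made to keep the sketch short).
    """
    n = len(reference)
    mean_m, mean_r = sum(model) / n, sum(reference) / n
    dev_m = [v - mean_m for v in model]
    dev_r = [v - mean_r for v in reference]
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    corr = sum(a * b for a, b in zip(dev_m, dev_r)) / (n * std_m * std_r)
    ratio = std_m / std_r
    # Law-of-cosines relation underlying the Taylor diagram; clamp at zero to
    # guard against tiny negative values from floating-point rounding.
    crmsd = math.sqrt(max(0.0, 1.0 + ratio * ratio - 2.0 * ratio * corr))
    return corr, ratio, crmsd

# Hypothetical 95th-percentile precipitation (mm/day) at four grid points.
corr, ratio, crmsd = taylor_statistics([12.0, 18.0, 33.0, 37.0],
                                       [10.0, 20.0, 30.0, 40.0])
```

A ratio near 1 with a high correlation places a model close to the reference point on the diagram, which is how the ranges quoted above should be read.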

Figure 14 compares seasonal geographic distributions of the 1980–2015 mean 95th percentile of daily precipitation. In summer, large rainfall (> 30 mm day−1) occurs over wide areas extending from South, Central and North China to the southern coast of the Northeast Plain. A band of maxima (exceeding 40 mm day−1) exists along both the Yangtze and Pearl Rivers. ERI systematically underestimates the extremes, especially in Southeast China, roughly capturing only the centers along the Yangtze River, though even these are displaced. Similarly, RegCM4.6 fails to simulate the Pearl River band and also underestimates the Yangtze River band. In contrast, CWRF realistically reproduces the location and magnitude of the centers along both rivers. The extremes along the Northeast Plain coast are well captured by both CWRF and RegCM4.6, but largely underestimated by ERI. CWRF generates another band of large precipitation along the three mountain ridges surrounding the Northeast Plain. These peaks are very weak in ERI and RegCM4.6, as well as in the reference data. As discussed earlier, monitoring stations along these ridges are rare, and thus the ground truth is not known.

Fig. 14 Same as Fig. 3 except for the 95th percentile of daily precipitation (mm day−1)
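
One plausible reading of the "mean 95th percentile" is to compute the percentile of daily values within each year's season and then average over years. The sketch below follows that reading. The per-year averaging, the inclusion of all days rather than wet days only, and the linear interpolation rule are all assumptions, and the names and data are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def mean_seasonal_percentile(daily_by_year, q=95.0):
    """Average over years of each year's seasonal q-th percentile of daily values."""
    per_year = [percentile(days, q) for days in daily_by_year.values()]
    return sum(per_year) / len(per_year)

# Hypothetical daily precipitation (mm/day) for one season in two years.
daily = {1980: list(range(21)), 1981: list(range(0, 42, 2))}
p95_mean = mean_seasonal_percentile(daily)
```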

The comparative summer features above are generally retained in spring. The area of large rainfall (> 30 mm day−1) shrinks, losing North China and the Northeast Plain coast, but the two bands of maxima remain along the Yangtze and Pearl Rivers. CWRF captures this characteristic more realistically than ERI and RegCM4.6, which both miss the Pearl River band. The increased correlation in the Taylor diagram shows that the spring patterns produced by ERI, CWRF, and RegCM4.6 are all more realistic than the respective summer patterns. This improvement tendency is also seen from spring to winter, when the percentile magnitude is reduced below 25 mm day−1. As a transition toward the dry winter monsoon, the autumn pattern resembles that of summer but with a systematic reduction in magnitude. Observations still show a band of maxima close to 25 mm day−1 along the Yangtze River. This is visible in CWRF but more scattered, whereas it is even weaker in ERI and displaced to the west in RegCM4.6. Another key feature in autumn is the large rainfall band along China’s entire southeastern coast. This feature is well captured by CWRF and RegCM4.6, but totally missed in ERI. These results indicate that the RCMs are better able to resolve rainfall enhancement by sea breezes along the coast than the coarser ERI.

7 Conclusion

The performance of CWRF for modeling regional climate in China has been rigorously evaluated relative to the most popular RegCM4.6 and the driving ERI through intercomparison of historical simulations during 1980–2015. The comparison focuses on the ability to reproduce the annual cycle, interannual variation, and extreme statistics of precipitation and surface temperature. It is demonstrated that CWRF performs better overall than RegCM4.6, as measured by various quantitative metrics such as bias, correlation, intensity, frequency, and extremes. In particular, CWRF captures the two major summer monsoon rain bands along the Yangtze River and across South China more realistically than RegCM4.6 and even ERI, despite the latter’s assimilation of surface observations. CWRF better represents the diurnal temperature range throughout the year, which ERI systematically underestimates, while RegCM4.6 enhances both negative and positive biases. CWRF also improves the representation of surface wind, which ERI and especially RegCM4.6 overestimate. For all seasons, CWRF has more skill than RegCM4.6 in simulating interannual anomalies of precipitation and temperature as well as their couplings with humidity. Furthermore, CWRF exhibits outstanding performance in reproducing the 95th percentile of daily precipitation, which ERI persistently underestimates and RegCM4.6 simulates with less coherence. In all ranges of intensity for both daily and monthly precipitation, CWRF generates consistently higher scores than RegCM4.6.

It is challenging to identify which formulation differences listed in Table 2 explain the better performance of CWRF over RegCM4.6. Unexpectedly, increasing the vertical resolution to match CWRF further degrades the RegCM4.6 performance in all the metrics presented above, especially for precipitation-related quantities. On the other hand, experiments varying CWRF physics configurations among alternative cumulus, microphysics, cloud, aerosol, radiation, PBL, and surface schemes reveal different levels of sensitivity. In summer, the primary sensitivity comes from the cumulus parameterization, where the CWRF default ECP scheme (Qiao and Liang 2016a, b) simulates more realistic monsoon precipitation characteristics than that used in RegCM4.6 (Tiedtke 1989). The secondary sensitivity lies in cloud-radiation and PBL-surface interactions, while microphysics and aerosol effects are relatively minor. However, the sensitivities change between seasons and variables. A comprehensive study of these sensitivities, which is underway, may help identify the key physics parameterizations responsible for the CWRF-RegCM4.6 performance difference. Nonetheless, the comparative results presented in this study justify the initial release of the latest CWRF model together with its computational domain, comprehensive SBCs, and physics configuration, all of which are well tested at 30 km grid spacing for regional climate modeling applications over China.

This CWRF release is not intended to discourage the continued use of RegCM4.6 or any other RCM. In fact, the RegCM4.6 performance presented above is based on a single realization with a typical physics configuration conveniently available to us. RegCM4.6 currently includes 3 surface, 2 PBL, 2 microphysics, 6 cumulus, and 2 radiation schemes, so that 144 combinations can be configured. A systematic assessment of the RegCM4.6 performance with various physics configurations is yet to be conducted and compared more appropriately at the same finer vertical resolution as in CWRF. Similarly, CWRF has incorporated many more alternate physics schemes than are presented here, with the total combinations exceeding available computing resources to fully examine (Liang et al. 2012; Liang and Zhang 2013), and the skill of each configuration possibly depending on both horizontal and vertical resolutions. The released version likely does not represent the best performance of CWRF, since only a tiny set of its physics configurations has been tested. A more desirable approach would be an ensemble of multiple physics configurations of CWRF or RegCM4.6 or multiple RCMs. Such an ensemble approach can incorporate performance-based weighting of individual members’ contributions to optimize the outcome, offering a pragmatic way to enhance regional climate prediction skill (Liang et al. 2012). Furthermore, observational data at a finer resolution comparable to the model grid are needed. The available data used in this study are inadequate, especially in western China, causing large uncertainties in model performance evaluation. These are areas of future research focus, some of which are in progress.
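
The count of 144 RegCM4.6 physics combinations quoted above follows directly from the scheme counts, as this small check illustrates. The dictionary keys are labels chosen for illustration, and the schemes are represented only by index because their names are not listed here.

```python
from itertools import product
from math import prod

# Number of alternative schemes per physics category, as stated in the text.
scheme_counts = {"surface": 3, "PBL": 2, "microphysics": 2, "cumulus": 6, "radiation": 2}

n_configurations = prod(scheme_counts.values())  # 3 * 2 * 2 * 6 * 2

# Enumerate every configuration as a tuple of scheme indices, one per category.
configurations = list(product(*(range(n) for n in scheme_counts.values())))
```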