1 Introduction

Extreme precipitation events attract growing public attention because of their large impacts on society and ecosystems (e.g. economic loss and casualties), especially over the Yangtze River basin, which has a dense population and a high risk of intense summer precipitation. Projecting future changes in extreme events is of great importance to policymakers for risk assessment and infrastructure planning. Global warming is expected to intensify extreme precipitation by enhancing atmospheric moisture content (Trenberth et al. 2003; O’Gorman 2015). In addition to this thermodynamic factor, extreme events are also driven by dynamical factors such as local meso-scale and synoptic-scale motions, which may interact with thermodynamic effects on many occasions (Bao et al. 2017a, b; Pfahl et al. 2017; Tandon et al. 2018).

Considering the competition between these factors, climate models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) are the primary tools used to project future climate change under various scenarios of external forcing. Based on CMIP5 model outputs, previous studies have concluded that the risk of extreme rainfall may increase over many sub-regions of China through enhanced moisture supply and an enhanced East Asian summer monsoon (Li et al. 2016, 2018). However, uncertainties and discrepancies still exist in state-of-the-art models. Generally, global climate models appear to precipitate too often and too lightly (Deng et al. 2007; He et al. 2019), and they underestimate the observed trend of extreme precipitation (Min et al. 2011). In addition, a large inter-model spread is found in the projected changes in extreme precipitation (Bador et al. 2018; Zhou et al. 2014) as well as in the key systems that modulate extreme precipitation changes (He and Zhou 2015).

Therefore, it is essential to evaluate how well climate models represent the characteristics of precipitation extremes in the historical record, and to select the better-performing models in order to improve the confidence we may place in the projected future changes. Given the crucial role of synoptic-scale circulations in driving extreme rainfall formation, some model evaluation studies pay more attention to model performance in synoptic circulations than to modeled rainfall alone. Some studies have used the Self-Organizing Map (SOM), an unsupervised pattern recognition technique, to group and evaluate simulated synoptic-scale circulation over New Zealand (Gibson et al. 2016) and the polar regions (Cassano et al. 2006; Uotila et al. 2007). By comparing the frequency of the identified synoptic circulation patterns in CMIP3/CMIP5 models with that in reanalysis products, they found a consistent relationship in a subset of the CMIP3/CMIP5 models. Other studies concluded that CMIP3/CMIP5 models could generally capture the composite circulation features associated with extreme precipitation events over North America, while the biases in simulated extreme precipitation intensity and/or timing remain quite robust in some cases (DeAngelis et al. 2013; Kawazoe and Gutowski 2013). Gao et al. (2014) developed an analogue method to detect heavy precipitation events over the United States in CMIP5 models, based on the similarity with composite reanalysis fields for observed heavy precipitation events, and found extreme rainfall statistics consistent with observations. However, the composite circulation for extreme rainfall events may mix and average out different synoptic patterns, and may thus be too generalized to represent the fine details of the synoptic-scale features conducive to extreme rainfall. The SOM algorithm has also been used to link extreme rainfall to distinct synoptic patterns, but the linkage sometimes appears unstable (Nishiyama et al. 2007; Ohba et al. 2015). Even though Gibson et al. (2017) modified the SOM algorithm to produce more realistic extreme patterns, they suggested that its application to the study of extremes deserves more caution.

Our recent work identified three synoptic-scale circulation patterns associated with extreme rainfall events over the middle reaches of Yangtze River (MRYR; 28–32°N, 113–119°E) through hierarchical clustering (Hu et al. 2019). The identified circulation patterns and their relationship with extreme rainfall are robust, because (1) different classification objects and clustering methods yield similar results; and (2) the emergence of the identified circulation patterns often leads to an increased probability of intense rainfall over the MRYR (Hu et al. 2019). In this study, the relationship between the synoptic-scale circulation patterns and extreme rainfall over the MRYR is further examined, and a synoptic method is used to assess whether this close relationship is reproduced in CMIP5 models. The goal is to better understand what causes model biases in representing extreme rainfall, and whether the models produce extreme rainfall over the MRYR through synoptic-scale circulation drivers that are consistent with those observed. Also discussed are the implications of these results for using multi-model ensembles to derive future projections of precipitation extremes.

2 Data and methods

2.1 Observations and model datasets

The daily precipitation observations used in this study are obtained from gauge stations covering mainland China during early summer (June and July) from 1980 to 2005. This dataset is provided by the China Meteorological Administration and has undergone systematic quality control (Wang et al. 2006). The observed daily geopotential height, covering the period 1980–2005, comes from the archives of the Modern Era Retrospective Reanalysis for Research and Application, Version 2 (MERRA-2), provided by the National Aeronautics and Space Administration (Gelaro et al. 2017). The model assessment is performed using daily precipitation and geopotential height output from the historical runs of nine CMIP5 models for the same time period. One ensemble member per model is used (i.e. r1i1p1, except r6i1p1 for CCSM4), as the differences among ensemble members in a model’s projected changes in extreme rainfall are relatively small compared to the inter-model spread (see Figure S3 in Pfahl et al. 2017). The names, institutions, and resolution information of each model are listed in Table 1.

Table 1 The names, institutions and approximate horizontal resolution (around China) in atmosphere for CMIP5 models

2.2 Methodology

A comparison of the basic statistics of the modeled rainfall over the MRYR region (box in Fig. 4) against the observations was carried out first. In addition to the assessment of each individual model, the Multi-Model Ensemble (MME) is used to represent the overall performance of all models; its statistics are calculated by pooling together the outputs of all models.

In addition to the traditional model evaluation strategies, a synoptic method was developed in this study to assess the models’ performance in simulating extreme rainfall over the MRYR and the corresponding circulation. The evaluation metrics are based on the three canonical low-level (i.e. 700 hPa) synoptic-scale circulation patterns around China (15–55°N, 85–135°E) that favor the occurrence of extreme rainfall over the MRYR (Hu et al. 2019). These three patterns are objectively identified through a hierarchical clustering algorithm (Zhao et al. 2016, 2017a, b; Hu et al. 2019) applied to the daily geopotential height patterns at 700 hPa (Z700, hereafter) over the same domain on extreme precipitation days. An extreme precipitation day over the MRYR is defined as a day on which the daily rain rate averaged over the MRYR region equals or exceeds its 95th percentile during early summer (June and July) from 1980 to 2016 (Hu et al. 2019). The extreme precipitation days in a specific CMIP5 model are likewise identified based on the 95th percentile of modeled daily rainfall averaged over the MRYR during early summer, but from 1980 to 2005, as the historical climate simulations of CMIP5 models end in 2005.

To measure the similarity between the circulation on a single day and the observed extreme-rainfall-related circulation patterns in the reanalysis, the Reanalysis-based Circulation Index (RCI) for each cluster is defined as the spatial correlation coefficient between the corresponding cluster-mean Z700 and the Z700 around China (15–55°N, 85–135°E) from the MERRA-2 dataset for each day in early summer from 1980 to 2005. Similarly, the Modeled Circulation Index (MCI) for each cluster is defined as the spatial correlation coefficient between the corresponding cluster-mean Z700 and the daily Z700 around China in the CMIP5 model output. The cluster-mean Z700 anomaly fields of the three canonical patterns are shown in Fig. 1; they share common characteristics of positive (negative) geopotential height anomalies over the South China Sea (MRYR) but are characterized by distinct circulation anomalies at mid-high latitudes. Note that Fig. 1 shows the cluster-mean Z700 anomaly, defined as the deviation of the daily mean from the daily climatology, rather than the total Z700 field used in the clustering analysis and in the calculation of the circulation indices (i.e. RCI and MCI). While recognizing that the circulation driving extreme rainfall includes both a climatological and a synoptic component, we plot the daily anomaly field to emphasize the latter, given that the magnitude of the climatological circulation change during early summer is smaller than that of synoptic variability.
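
As an illustration of the circulation index defined above, the following minimal Python sketch computes a spatial (pattern) correlation between a cluster-mean field and each daily field, and then takes the daily maximum across clusters. It assumes an unweighted, centered Pearson correlation over grid points flattened into plain lists; whether area weighting is applied is not stated in the text, and the function names are hypothetical.

```python
import math


def pattern_correlation(field_a, field_b):
    """Centered Pearson correlation between two fields given as flat lists.

    Assumption: all grid points carry equal weight (no latitude weighting).
    """
    if len(field_a) != len(field_b):
        raise ValueError("both fields must be on the same grid")
    n = len(field_a)
    mean_a = sum(field_a) / n
    mean_b = sum(field_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(field_a, field_b))
    var_a = sum((a - mean_a) ** 2 for a in field_a)
    var_b = sum((b - mean_b) ** 2 for b in field_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / math.sqrt(var_a * var_b)


def circulation_index(cluster_means, daily_fields):
    """For each day, return the correlation with every cluster-mean field."""
    return [
        [pattern_correlation(cluster, day) for cluster in cluster_means]
        for day in daily_fields
    ]


def daily_max_index(index_by_day):
    """Largest index among the clusters on each day."""
    return [max(row) for row in index_by_day]
```

Passing reanalysis fields as `daily_fields` would give an RCI-like series, and passing model fields would give an MCI-like series, provided both are on the same grid as the cluster means.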

Fig. 1

The composite anomalies of geopotential height (shading, units: gpm) and wind vectors (units: m s−1) at 700 hPa of each cluster. The region where surface pressure is lower than 700 hPa (Tibetan Plateau) is masked by grey shading

Furthermore, the model-intrinsic circulation patterns associated with modeled extreme precipitation were identified through the hierarchical clustering algorithm (Zhao et al. 2016, 2017a, b; Hu et al. 2019). In this algorithm, the modeled Z700 over China on each extreme precipitation day is regarded as a singleton cluster at the first step; at each subsequent step, the two clusters with the smallest Ward’s distance (which reflects the dissimilarity between two clusters) are merged, until a significant increase in the smallest Ward’s distance is found at a certain step (see more details in Hu et al. 2019). Since the Ward’s distance and the spatial correlation coefficient may be sensitive to the sample size (i.e. the number of model grid points), Z700 from the different models and the reanalysis is interpolated to the same grid (i.e. 2.0° × 2.5°) for the clustering analysis and the calculation of RCI/MCI. To maintain consistency, observed and modeled rainfall are also interpolated to the same grid for calculating area-averaged rainfall, which results in 6 grid points over the MRYR.
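
The merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in Hu et al. (2019): the Ward's distance between two clusters is taken as the increase in within-cluster sum of squares caused by merging them, and the stopping rule (a hypothetical `jump_ratio` parameter that halts merging when the smallest distance exceeds a multiple of the previous merge distance) is only a stand-in for the significance criterion described in that paper.

```python
def ward_distance(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    squared_gap = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * squared_gap


def ward_cluster(fields, jump_ratio=3.0):
    """Agglomerative clustering of flattened fields using Ward's distance.

    jump_ratio is a hypothetical stopping parameter: merging stops once the
    smallest available distance exceeds jump_ratio times the previous one.
    Returns a list of clusters, each a sorted list of indices into fields.
    """
    # Every field (e.g. Z700 on one extreme day) starts as its own cluster.
    clusters = [([i], list(field)) for i, field in enumerate(fields)]
    previous = None
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = ward_distance(len(clusters[i][0]), clusters[i][1],
                                  len(clusters[j][0]), clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if previous is not None and previous > 0 and d > jump_ratio * previous:
            break  # large jump in the smallest distance: stop merging
        members_i, centroid_i = clusters[i]
        members_j, centroid_j = clusters[j]
        n_i, n_j = len(members_i), len(members_j)
        # The merged centroid is the size-weighted mean of the two centroids.
        merged_centroid = [(n_i * x + n_j * y) / (n_i + n_j)
                           for x, y in zip(centroid_i, centroid_j)]
        merged = (members_i + members_j, merged_centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        previous = d
    return [sorted(members) for members, _ in clusters]
```

Because the distance depends on the number of grid points summed over, fields from different sources need to share one grid before they are compared, which is the reason for the common-grid interpolation noted above.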

The local statistical significance of the geopotential height anomalies and the regression slopes of rainfall against the circulation index is tested using the Student’s t test. Considering the inflated likelihood of falsely rejecting the null hypothesis in multiple hypothesis testing (i.e. simultaneous significance testing on multiple locations), we further corrected the statistical testing by controlling the False Discovery Rate (FDR; i.e. the expected ratio of falsely rejected local null hypotheses relative to the total rejected hypotheses) at the 0.05 level, to protect us against the over-interpretation of the multiple testing results (Wilks 2016).
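
To illustrate the FDR control mentioned above, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of local p-values (one per grid point or station). The text does not spell out the exact procedure, so treating it as Benjamini-Hochberg is an assumption here, and the function name is hypothetical.

```python
def fdr_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up test.

    Returns a list of booleans, True where the local null hypothesis is
    rejected with the false discovery rate controlled at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

Compared with testing each location at the 0.05 level independently, this criterion is stricter whenever only a few local p-values are small, which limits the number of spuriously significant locations.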

3 Results

3.1 The direct rainfall assessment

An assessment of the basic properties of modeled summer-mean rainfall was conducted first to gain a general appreciation of the CMIP5 models’ capability to reproduce the characteristics of summer precipitation over the MRYR. Because the models are free-running and not constrained by observations, we only evaluate whether the precipitation amount and variability, rather than the exact timing of the modeled precipitation phases, are similar to observations. As shown in Fig. 2a, the amplitude of the interannual fluctuations of the summer-mean rainfall is simulated reasonably well in CMIP5 models, and these fluctuations largely reflect internal modes of variability whose phases differ among models. The standard deviation of the summer-mean rainfall is slightly underestimated in most models except CCSM4. Regarding the climatological summer rainfall from 1980 to 2005, many models underestimate the rainfall amount (Fig. 2b), with the bias ranging from −3.9% to −31.8% of the observed rainfall (Table 2), while a small subset of models produce more rainfall than observed (i.e. NorESM1-M: +5.0%; CCSM4: +9.2%; MIROC5: +23.2%).

Figure 3 displays the frequency distributions of the daily rainfall over the MRYR from the CMIP5 models and observations for the period 1980–2005 (grey boxes versus white boxes). To simplify the results while retaining the main features, the daily rainfall is categorized into four bins according to the daily rain rate, at intervals of 10 mm per day. Note that the rightmost rainfall bin (Pr ≥ 30 mm per day) effectively represents extreme rainfall, as 29.04 mm was identified as the threshold for daily extreme precipitation over the MRYR in observations (Hu et al. 2019). Most models, except for MIROC5 and CCSM4, produce overly frequent light rain (0 ≤ Pr < 10 mm per day) over the MRYR. Correspondingly, the frequency of intense and extreme rainfall (Pr ≥ 20 mm per day) is underestimated in these models. This evident frequency-intensity bias is consistent with previous studies based on various climate models (Deng et al. 2007; He et al. 2019). In our study, only MIROC5 and CCSM4 reproduce a relatively reasonable frequency-intensity structure of summer rainfall over the MRYR. The respective contributions of the different intensity bins to the total rainfall in the models and the observations are also shown in Fig. 3 (blue and red curves, respectively). The model rainfall bias for each bin, expressed as a percentage of the observed value, is listed in Table 2. The positive (negative) bias of light (intense and extreme) rainfall is quite pronounced. The proportion of rainfall resulting from intense events (Pr ≥ 20 mm per day) is significantly underestimated by most models except MIROC5 and CCSM4. Therefore, the negative model bias of the summer total rainfall over the MRYR can be largely attributed to the fact that too few intense rainfall events are simulated in the models.
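
The binned statistics discussed above can be reproduced along the following lines. This minimal Python sketch assumes a plain list of area-averaged daily rain rates (mm per day) and the four bins used in the text (lower edges at 0, 10, 20 and 30 mm per day); the function names and the exact form of the relative bias are illustrative assumptions rather than the authors' code.

```python
def bin_statistics(daily_rain, lower_edges=(0.0, 10.0, 20.0, 30.0)):
    """Frequency (%) of days and contribution (%) to total rainfall per bin.

    The last bin is open-ended (rain rate >= the last lower edge).
    """
    counts = [0] * len(lower_edges)
    amounts = [0.0] * len(lower_edges)
    for rate in daily_rain:
        # The bin index equals the number of interior edges reached.
        k = sum(1 for edge in lower_edges[1:] if rate >= edge)
        counts[k] += 1
        amounts[k] += rate
    total_days = len(daily_rain)
    total_amount = sum(amounts)
    frequency = [100.0 * c / total_days for c in counts]
    contribution = [100.0 * a / total_amount if total_amount > 0 else 0.0
                    for a in amounts]
    return frequency, contribution


def percent_bias(model_value, observed_value):
    """Model bias expressed as a percentage of the observed value."""
    return 100.0 * (model_value - observed_value) / observed_value
```

Applying `bin_statistics` separately to the observed and modeled series, and then `percent_bias` to the rainfall amount in each bin, would yield quantities comparable to those in Fig. 3 and Table 2.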

Fig. 2

a The summer-mean (June and July) rain rate (units: mm day−1), and b the standard deviation (Y-axis; units: mm day−1) and the multi-year mean (X-axis; units: mm day−1) of summer-mean rain rate over the MRYR region from observations (OBS) and CMIP5 models

Table 2 The model rainfall bias (units: %) relative to the observed rainfall during early summer over the MRYR from 1980 to 2005 in different rainfall intensity (mm day−1) bins and total bins
Fig. 3

The frequency distributions of the daily rainfall over the MRYR from the CMIP5 models (grey boxes; units: %) and observations (OBS; white boxes), and the corresponding contributions to the total rainfall (observations: red solid lines; CMIP5 models: blue dashed lines; units: %)

3.2 The synoptic assessment

3.2.1 The connection between the canonical circulation patterns and the extreme rainfall

Before formally evaluating the synoptic-scale circulation patterns associated with extreme rainfall in the models, we first revisit the linkage between the two in observations. Specifically, we examine whether the presence of daily circulation anomalies similar to the canonical circulation patterns tends to favor the occurrence of extreme rainfall.

Figure 4a shows the “slope” coefficients of the daily rainfall at each station regressed against the daily RCI for the three clusters, using observational data from 1980 to 2005. In general, the maximum positive values are found over the MRYR in all clusters. In other words, a higher daily RCI, which indicates a higher degree of similarity between the daily circulation and one of the canonical circulation patterns, is associated with a higher chance of intense rainfall over the MRYR. Within the MRYR region, relatively lower values are found over the southwestern part, especially in Cluster 2 and Cluster 3. This implies that the synoptic patterns are relatively less important for extreme rainfall formation over this part of the MRYR, possibly because of other influential local factors such as topography. Evident increases in rainfall are also found over parts of eastern China, including the lower reaches of the Yangtze River, consistent with the composite rainfall distribution on the MRYR extreme event days (Fig. 4b). This broader-scale connection with rainfall to the east suggests that extreme rainfall over the MRYR in summer is generally tied to large-scale precipitation bands associated with the monsoonal frontal system.
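
The slope coefficient referred to above can be illustrated with an ordinary least-squares fit of daily rainfall at one station against the daily circulation index. The sketch below is a minimal example with a hypothetical function name; it returns only the slope, and the significance test of that slope is not reproduced here.

```python
def regression_slope(index, rainfall):
    """Least-squares slope of rainfall (mm per day) against a daily index."""
    if len(index) != len(rainfall):
        raise ValueError("index and rainfall must cover the same days")
    n = len(index)
    mean_x = sum(index) / n
    mean_y = sum(rainfall) / n
    sxx = sum((x - mean_x) ** 2 for x in index)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(index, rainfall))
    if sxx == 0.0:
        raise ValueError("slope is undefined for a constant index")
    return sxy / sxx
```

Repeating this calculation for every station and every cluster, and keeping only the slopes that pass the significance test, would give a map of the kind shown in Fig. 4a.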

Fig. 4

The spatial pattern of a the slope coefficients (10−4 mm day−1) by regressing the observed daily rainfall against the Reanalysis-based Circulation Index (RCI) from 1980 to 2005 (Only the slope coefficients that are statistically significant at the 0.05 level are shown with the False Discovery Rate controlled at the 0.05 level) and b the composite observed precipitation rate (mm day−1) on extreme precipitation days for the three clusters. The black rectangle denotes the MRYR (28–32°N, 113–119°E) region

3.2.2 Model assessment based on the canonical circulation patterns

With the relationships between the three canonical circulation patterns and extreme rainfall over the MRYR further verified, we now address the question of whether such relationships are reproduced in CMIP5 models. Figure 5 shows the probability distribution function (PDF) of the daily maximum RCI/MCI among the three clusters (Fig. 5a1–a10), which indicates the probability of the rainfall-producing patterns regardless of the specific cluster, and the PDF of the observed/modeled daily rain rate over the MRYR (Fig. 5b1–b10). Note that only the right tails of the PDFs of RCI/MCI and rain rate are shown in Fig. 5, to highlight the relationship between the canonical synoptic patterns and extreme rainfall. In most models (i.e. CanESM2, GFDL-CM3, IPSL-CM5B-LR, MPI-ESM-LR, MRI-CGCM3, NorESM1-M, MME), the negative bias in the occurrence of the observed synoptic patterns agrees with the negative bias in the occurrence of extreme rainfall, relative to the observations. This matched bias implies that the close relationship between the canonical synoptic patterns and extreme rainfall over the MRYR may be reproduced in these models. This is further supported by the similar spatial distribution (i.e. similar to Fig. 4) of the slope coefficients of modeled rainfall regressed against the MCI, the maximum values of which tend to emerge over the MRYR region (figures not shown). However, models with mismatched biases between extreme rainfall and synoptic patterns are also found: (1) MIROC5 and CCSM4 produce a reasonable rain rate distribution, while the occurrence probability of the canonical synoptic circulation patterns in these models is underestimated; (2) FGOALS-g2 would be expected to produce more extreme rainfall because it overestimates the occurrence probability of these extreme-rainfall-producing patterns, yet the probability of intense rainfall is underestimated in this model.

In general, these results underpin the importance of synoptic-scale forcing in driving extreme rainfall statistics, and suggest that the underestimated extreme rainfall probability in some models may be partly attributed to the inadequate simulation of the occurrence of the canonical synoptic-scale circulation patterns that are found to drive extreme rainfall in observations. The “bias mismatch” cases demonstrate that, in addition to the misrepresentation of the synoptic-scale circulation patterns critical for extreme rainfall formation, other factors, including the well-known problems of cumulus parameterization, might be the main cause of the underestimated extreme rainfall in some CMIP5 models.

Fig. 5

The probability distribution function (PDF; Y-axis, units: %) of a the daily maximum Reanalysis-based Circulation Index (RCI; red lines; X-axis) and Modeled Circulation Index (MCI; blue lines; X-axis) in CMIP5 models among the three clusters, and the PDF of b the daily rain rate (X-axis, units: mm day−1) over the MRYR in observations (OBS; red lines) and CMIP5 models (blue lines)

4 Discussion and Conclusion

4.1 Discussion

One may suspect that significant discrepancies of the occurrence probability among the canonical synoptic circulation patterns (clusters) in the model may have led to the mismatch. However, as shown in Fig. 6, the model bias of MCI in different clusters is generally consistent. This concurrent underestimate or overestimate of these synoptic circulation patterns suggests potential dynamical connections (e.g. similar origins from upstream storm tracks) among the three synoptic circulation patterns identified in the observations.

Fig. 6

The probability distribution of the daily Reanalysis-based Circulation Index (red solid lines) and Modeled Circulation Index (colored dashed lines) in CMIP5 models for the three clusters

The mismatched statistics between the synoptic-scale circulation patterns and extreme rainfall in three models (i.e. CCSM4, FGOALS-g2 and MIROC5) deserve further discussion. We first check whether the mismatched models have their own “intrinsic”, model-dependent circulation patterns that produce extreme rainfall over the MRYR in summer. A model (i.e. GFDL-CM3) showing a good match between circulation and rainfall statistics in terms of the direction of their biases relative to the observations (i.e. simultaneous overestimate or underestimate) is used here as a reference for comparison. The model-intrinsic synoptic patterns associated with the modeled extreme precipitation days were identified through hierarchical clustering, and the cluster-mean anomalies of geopotential height at 700 hPa in these four models are shown in Figs. 7, 8, 9 and 10.

Four circulation patterns were identified in GFDL-CM3, with Cluster 1 characterized by a low likelihood of occurrence (Fig. 7). The three dominant model-intrinsic patterns (i.e. Clusters 2, 3 and 4) in GFDL-CM3 are similar to the canonical synoptic-scale circulation patterns identified in the reanalysis. They all show a similar meridional structure of geopotential height anomalies across the MRYR (115°E) and a similar zonal structure of geopotential height anomalies at mid-high latitudes. The extreme rainfall over the MRYR in GFDL-CM3 is therefore produced by synoptic-scale forcing that is consistent with what is observed. Of the seven intrinsic patterns identified in CCSM4, only three (Cluster 1, Cluster 3 and Cluster 4) bear a certain degree of similarity to the observed synoptic-scale circulation patterns (Fig. 8). Therefore, even though the occurrence frequency of the observed canonical circulation patterns is underestimated in CCSM4, the model produces other circulation patterns (i.e. Cluster 2, Cluster 5, Cluster 6 and Cluster 7) that are not present in the real world to make up for that underestimate, ultimately simulating a rain rate distribution consistent with the observations, assuming that the model has a reasonably good suite of physical parameterizations. In FGOALS-g2, the three major circulation patterns (Cluster 2, Cluster 3 and Cluster 4) are quite similar to the identified canonical patterns (Fig. 9), and their occurrence probabilities are even higher than those in the observations. The underestimate of extreme rainfall in FGOALS-g2 is likely tied to problems in model physics (potentially the cumulus and microphysical parameterizations), since the model captures the correct large-scale forcing of extreme rainfall and is thus already dynamically well constrained. Nine patterns are identified in MIROC5. Prominent low-pressure anomalies over the MRYR are seen in all patterns, providing favorable conditions for rainfall formation (Fig. 10). However, the overall distribution of the height anomalies in these patterns is quite different from what is observed over China. Therefore, MIROC5 also generates its own synoptic-scale circulation patterns (or “weather”) to produce extreme rainfall in summer. The fact that the rainfall statistics of MIROC5 match the observations well indicates that the model physics relevant to rainfall formation is reasonably good, since this is necessary for a good rainfall simulation in addition to the model generating its own extreme-rainfall-producing circulation patterns.

Fig. 7

Composite anomaly of geopotential height (units: gpm) at 700 hPa of each cluster in GFDL-CM3. The region where surface pressure is lower than 700 hPa (Tibetan Plateau) is masked by grey shading. The anomalies that are statistically significant at the 0.05 level, with the False Discovery Rate controlled at the 0.05 level, are indicated by dotted areas

Fig. 8

The same as Fig. 7 but for CCSM4

Fig. 9

The same as Fig. 7 but for FGOALS-g2

Fig. 10

The same as Fig. 7 but for MIROC5

4.2 Conclusion

In this study, we found that CMIP5 models in general underestimate the total rainfall over the middle reaches of Yangtze River (MRYR) in early summer (June and July), and that this negative bias in rainfall is largely caused by the models’ underestimate of the occurrence frequency of intense and extreme rainfall events (Pr ≥ 20 mm per day) in this region (in CanESM2, GFDL-CM3, IPSL-CM5B-LR, MPI-ESM-LR, MRI-CGCM3, NorESM1-M and the MME). Relative to the observations, the negative bias in the occurrence frequency of extreme rainfall is consistent with the models’ underestimate of the occurrence frequency of the canonical synoptic-scale circulation patterns that typically lead to the formation of extreme rainfall events over the MRYR. While many models show negative biases that are consistent between circulation patterns and extreme rainfall frequency, some models have biases that are “mismatched”. We demonstrate that these “mismatches” occur in two situations: (1) despite underestimating the observed canonical circulation patterns responsible for extreme rainfall, the model has its own intrinsic synoptic-scale circulation patterns that produce extreme rainfall over the MRYR region during summer, likely accompanied by reasonably good model physics (e.g. CCSM4; MIROC5); and (2) the model generates synoptic-scale circulation patterns similar to those observed, yet extreme rainfall is still underestimated, primarily because of model deficiencies in the physical parameterizations relevant to rainfall (e.g. FGOALS-g2). Nevertheless, these results are based on analyses of extreme rainfall in one region and one season; further research covering other regions and seasons is needed.

The synoptic assessment of model rainfall discussed here offers a straightforward way to separate model errors in physics from those related to large-scale dynamical processes. By tracing back to the canonical synoptic forcing responsible for extreme rainfall, the synoptic method allows one to identify a subset of models whose synoptic variability is consistent with the observations. This identification is crucial given the diverse responses of different circulation patterns to changes in radiative forcing. A projection of future changes in extreme rainfall over the MRYR will be better constrained dynamically if this subset of models is used in generating projection ensembles. Finally, our results also suggest the need to reduce model biases in the synoptic variability that drives the formation of extreme events in order to achieve better simulations of these extremes, in addition to efforts aimed at improving model physical parameterizations such as cumulus and microphysics schemes.