1 Introduction

Southern China (SC), which is affected by the East Asian summer monsoon, features complex and varied weather and climate systems (Li et al. 2020a). The extreme rainfall is one of the most severe disastrous weather events in SC during summer, which could lead to flood, landslide, debris flow and other secondary disasters, and result in infrastructure damage and casualties (Li and Wang 2018; Wang et al. 2021; Yang et al. 2021a).

Accurate prediction of extreme rainfall at longer lead time (beyond 10 days) has important meaning for better disaster prevention and mitigation (Zhang et al. 2021). However, because of the imperfectness of numerical models (e.g., errors in data assimilation techniques, initialization schemes and parameterizations) (Liang and Lin 2018; Pegion et al. 2019), the subseasonal prediction (at a forecast lead time of 2–6 weeks) skills based on current numerical models cannot meet the demand of meteorological services yet. Understanding the source of subseasonal predictability and improving the model skills at this timeframe is grand challenge for both scientific and operational communities worldwide (Brunet et al. 2010; Lee and Wang 2016; Liu et al. 2020).

As the dominant mode of subseasonal variations in the tropical atmosphere during boreal summer, the northward-propagating intraseasonal oscillation from the equator into the Asian summer monsoon (ASM) region, commonly referred to as the boreal summer intraseasonal oscillation (BSISO, Lee et al. 2013), regulates the extremes in this region (Hsu et al. 2016, 2017; Lee et al. 2017; Yao et al. 2020). Several theories have been put forward to probe into the complex characteristics of BSISO (Goswami and Shukla 1984; Wang and Xie 1997; Fu and Wang 2004; Jiang et al. 2004, 2011). In order to better monitor and predict the BSISO over the ASM region (approximately 10° S–40° N, 40°–160° E), Lee et al. (2013) proposed two BSISO indices based on the four leading Empirical Orthogonal Function (EOF) modes of daily mean outgoing longwave radiation (OLR) and 850-hPa zonal wind (U850) anomalies over the ASM region. The first index of BSISO (BSISO1), consisting of the first two EOF modes, effectively represents the canonical northward and northeastward propagation feature of BSISO over the Indian Ocean with a period of 30–60 days. The second BSISO index (BSISO2) is defined by the combination of the third and fourth EOF modes, and it mainly captures the northward and northwestward propagation of ISO from the tropical central-western Pacific with a period of 10–30 days.

Statistical analyses show that the probability of rainfall and heat extremes over the Asian monsoon regions, including India, SC, and Northeast Asia, is notably modulated by BSISO1 and BSISO2 (Hsu et al. 2016, 2017; Lee et al. 2017; Yao et al. 2020). As indicated by Hsu et al. (2016), BSISO1 is more favorable for the increase of extreme rainfall events over the Yangtze River Valley in phases 2–4, while BSISO2 tends to relate to the positive anomalies of extreme rainfall along the southeast coast of SC. For example, floods in the Yangtze River Basin during 1998 occurred in tandem with the active phase of BSISO1 (Zhu et al. 2003; Lu et al. 2014), and BSISO2 made a great contribution to the devastating flood in Fujian province during 2010 (Hsu et al. 2016). This provides the potential for the real-time forecast of rainfall based on its relationship with BSISO (Lee et al. 2017; Wang et al. 2020).

Accurate prediction of atmospheric intraseasonal variation and its impacts, especially the extreme events 10–30 days in advance is imperative and is the main target of the subseasonal to seasonal (S2S) prediction project (Vitart et al. 2017). The S2S project establishes an extensive database (the S2S Database), including subseasonal (up to 60 days) forecasts and reforecasts from 11 operational and research centers (Vitart et al. 2017), providing a wide platform to study the intraseasonal issues. Thus, many studies have been carried out based on this database. Wu et al. (2017) found that the S2S models can effectively predict BSISO1 and BSISO2 events up to 6–24.5 and 6.5–14 day in advance, respectively. However, there is still a large room to improve BSISO since multi-model mean prediction skill is much lower than the potential/theoretical predictability of BSISO (Ding et al. 2011; Lee et al. 2015). Most models show deficiency in simulating the spatial structure, evolution, propagation, and intensity of BSISO (Fang et al. 2019; He et al. 2019; Bo et al. 2020; Zhu et al. 2021). The subseasonal prediction skills of rainfall and extreme rainfall are also limited. For the global weekly-mean precipitation over land, the forecast skills from most S2S models are confined in the first week (de Andrade et al. 2019). The significant prediction skills of South Asian Monsoon and Southeast Asian Monsoon rainfall are only up to about 2.5 days (Wu et al. 2017). The summer weekly mean East Asian precipitation can be predicted significantly up to 5–11 days in advance (Liang and Lin 2018); meanwhile, biases also exist in the prediction of subseasonal variability of East Asian summer rainfall (Fang et al. 2019). Although the model of ECMWF generally outperforms its counterparts (Fu et al. 2013; Jie et al. 2017; He et al. 2019), it still has difficulty in capturing the record-breaking Meiyu rainfall event in 2020, with the anomaly correlation coefficient being lower than 0.5 (Liu et al. 2020). It has also been shown that the intraseasonal forecast skill of rainfall in dynamical models tends to be higher when the amplitudes of the BSISO are larger (Liu and Wang 2015; Ren et al. 2018).

Previous studies mainly focused on the evaluation of model performance in simulating the BSISO index or subseasonal rainfall variation, whereas less attention has been paid to discussing the source of subseasonal predictability of extreme rainfall over SC. Considering the observed association between BSISO and extreme rainfall, whether the prediction skill of extreme rainfall is linked with the prediction skill of BSISO needs to be examined. Through what processes the BSISO influences the extreme rainfall prediction in the S2S operational models also merits further exploration. Unraveling these issues could help us to identify the source of biases in prediction.

Since the spatiotemporal variation and propagating feature of BSISO2 are significantly different from those of BSISO1, the relationship between the skills of BSISO2 and SCER will be reported in another separate paper. Based on the reforecast data of two S2S operational models, this paper aims to systematically assess the prediction skill of BSISO1 and SCER, including the deterministic and probabilistic prediction capability. Then, the sources of the associated prediction biases are diagnosed and discussed. This work is a basic but crucial step towards improving the current S2S prediction systems for predicting the high-impact extreme weather events at the subseasonal timescale.

2 Data and methods

2.1 Observational data

The observational datasets used in this study are as follows: (1) The daily mean precipitation data from gauge stations over China gridded into a horizontal resolution of 0.25° × 0.25°, which is provided by the National Meteorological Information Center (CN05.1, Wu and Gao 2013), and the Asian Precipitation Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE) gridded precipitation (Yatagai et al. 2012) with a 0.25° horizontal resolution are adopted. To reduce the uncertainty arising from different data, a simple arithmetic average of these two precipitation datasets is applied (Hsu et al. 2016). (2) The ERA-Interim reanalysis dataset (Dee et al. 2011) including daily mean wind and specific humidity is used, with its horizontal resolution being 1.5° × 1.5°. (3) The real-time BSISO index proposed by Lee et al. (2013) (http://iprc.soest.hawaii.edu/users/jylee/bsiso/), which better represents the northward propagation of ISO during boreal summer than the Real-time Multivariate MJO index (RMM, Wheeler and Hendon 2004).

Using the principal component (PC1 and PC2) time series of the BSISO1 index, the life cycle of BSISO1 is divided into 8 phases. The active BSISO1 days are defined when the normalized BSISO1 amplitude is greater than 1 (i.e., \(\sqrt{{{\mathrm{PC}}1}^{2}+{{\mathrm{PC}}2}^{2}}>1.0\)). In contrast, an inactive BSISO1 (or non-BSISO1) period is identified when its amplitude is smaller than 1, during which the BSISO1 signals are weakened and less organized. It is noteworthy that the phases 2–4 of BSISO1 are highly connected with the extreme rainfall occurrence over SC (Hsu et al. 2016). In this article, the model prediction skill in these specific phases of BSISO1 and the associated SCER probability will be the main focus.

2.2 S2S model data

The reforecast data from the operational models of ECMWF and CMA are derived from the S2S database (http://apps.ecmwf.int/datasets/data/s2s). Description of the reforecast data from the two S2S models can be found in Table 1 (more details are available in Vitart et al. 2017). Variables used in this study include daily horizontal winds, specific humidity, OLR and precipitation. The horizontal resolutions of these two S2S models are interpolated to a uniform resolution of 1.5° × 1.5°.

Table 1 Description of the reforecast data of CMA and ECMWF models

The CMA model produces daily forecast with four ensemble members, whereas the ECMWF model is initiated twice weekly (every Monday and Thursday) with eleven ensemble members. For a fair comparison, a data processing method developed by Yang et al. (2018) is utilized to reprocess the ECMWF’s twice-weekly model outputs (104 initialization dates yearly) to daily reforecast data. The so-derived dataset contains a continuous distribution of reforecasts on all dates from 1998 to 2012, at all lead times from 3 to 42 days. In this manner, the original data array and the new data array could be consistent, and the evaluation of prediction skills will not be influenced by this special treatment (Yang et al. 2018). In this paper, the ensemble mean is the arithmetic average of ensemble members for each of the two model reforecast sets. There is no filtering for observation and model outputs, and the real-time BSISO indices proposed by Lee et al. (2013) are employed to extract BSISO1-related signals.

Extreme rainfall over SC (18°–32.5° N, 105°–122° E) and enhanced BSISO activity are observed during the boreal summer monsoon season (May to August, hereafter MJJA). Taking into account that different operational centers’ forecast data cover different periods, MJJA during 1998–2012, the common period among them, is selected as the target period of assessment in this study for a fair comparison.

2.3 Methods

2.3.1 Definition of extreme rainfall

Given that models generally have systematic biases, a relative threshold (percentile-based threshold) is used to define the observed and forecasted extreme rainfall (Jones et al. 1999; Yan et al. 2002; Zhang et al. 2011; Li et al. 2012; Xavier et al. 2014). For a given calendar day at a grid, a rainfall extreme occurs when the daily precipitation amount exceeds a criterion of the 90th percentile of a set of daily records, including those on the same calendar day and 90 neighboring days (45 before and 45 after that day) from 1998 to 2012 (Li et al. 2017, 2020b).

In each BSISO1 phase, the probability of extreme rainfall occurrence (Px, x denoting the phases 1–8), is defined by the ratio of the number of extreme rainfall days to the number of total days. To quantify the influence of BSISO1 state on extreme rainfall, the probability of extreme rainfall occurrence during different BSISO1 phases relative to the non-BSISO1 period is compared. Thus, the percentage change in the probability of extreme rainfall occurrence during each of the BSISO1 phases is calculated as \([{(P}_{X}-{P}_{\text{non-BSISO}1})/{P}_{\text{non-BSISO}1}\times 100{\%}]\), where the \({P}_{\text{non-BSISO}1}\) represents the probability of extreme rainfall during the non-BSISO1 period.

2.3.2 Verification metrics

For the deterministic verification metric, the bivariate anomaly correlation coefficient (ACC) of PC1 and PC2 associated with the BSISO1 is used to quantitatively evaluate the forecast skills of BSISO1 on different lead times (Lin et al. 2008). The definition of ACC is given below:

$${\text{ACC}} = \frac{{\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} \left[ {{\text{F}}_{{1{\text{t}}}} {\text{O}}_{{1{\text{t}}}} + {\text{F}}_{{2{\text{t}}}} {\text{O}}_{{2{\text{t}}}} } \right]}}{{\sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} [{\text{F}}_{{1{\text{t}}}}^{2} + {\text{F}}_{{2{\text{t}}}}^{2} } \left] {\sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} [{\text{O}}_{{1{\text{t}}}}^{2} + {\text{O}}_{{2{\text{t}}}}^{2} } } \right]}},$$
(1)

where F and O refer to the forecasted and observed BSISO1 index, respectively. The t indicates time, and T is the total number of forecast times. The subscripts 1 and 2 denote different variables (such as PC1 and PC2).

Because both amplitude and phase errors of BSISO may contribute to skill degradation, the observed and predicted BSISO1 index are rewrote into polar coordinates as O (o, φ) and F (f, θ), respectively, to separate the relative contribution of the amplitude and phase of BSISO to ACC skill. Here, o and f refer to amplitude, and φ and θ are phase angles in the observations and predictions, respectively. ACC in polar coordinates is then defined as Wang et al. (2019):

$${\text{ACC}} = \frac{{\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{f}}_{{\text{t}}} \cdot {\text{o}}_{{\text{t}}} \cdot {\text{cos}}\left( {\uptheta_{{\text{t}}} - \upvarphi_{{\text{t}}} } \right)}}{{\sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{f}}_{{\text{t}}}^{2} } \sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{o}}_{{\text{t}}}^{2} } }}.$$
(2)

Assuming phase of BSISO is perfectly forecasted, i.e., \(\text{cos}\left({\uptheta }_{\mathrm{t}}-{{\upvarphi }}_{\mathrm{t}}\right)=1\), ACC is completely determined by the relation between the predicted and observed amplitude of BSISO:

$${\text{ACC}}_{{\text{a}}} = \frac{{\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{f}}_{{\text{t}}} \cdot {\text{o}}_{{\text{t}}} }}{{\sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{f}}_{{\text{t}}}^{2} } \sqrt {\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{o}}_{{\text{t}}}^{2} } }}.$$
(3)

Assuming amplitude of BSISO is perfectly forecasted, i.e., the linear correlation coefficient between \({f}_{t}\) and \({o}_{t}\) is 1, ACC is the scalar phase correlation between the forecasted and observed phases of BSISO:

$${\text{ACC}}_{{\text{p}}} = \frac{{\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{o}}_{{\text{t}}}^{2} \cdot {\text{cos}}\left( {\uptheta_{{\text{t}}} - \upvarphi_{{\text{t}}} } \right)}}{{\mathop \sum \nolimits_{{{\text{t}} = 1}}^{{\text{T}}} {\text{o}}_{{\text{t}}}^{2} }}.$$
(4)

To judge the similarity of the spatial distribution between observed and forecasted fields, the pattern correlation coefficient (PCC) is calculated. The normalized root-mean-square error (NRMSE), indicating the amplitude of forecast error, is the RMSE normalized by the observed spatial standard deviation with reference to the whole domain (Lee and Wang 2014). ACC and PCC range from − 1 to 1. The closer to 1 the values of ACC and PCC are, the more skillfully the model performs. The NRMSE varies from 0 to 1. The smaller the NRMSE is, the less biased the amplitude is.

For the probabilistic verification metric, the categorical verification score referred to as Heidke Skill Score (HSS) is used to appraise the hit rate of extreme rainfall. The HSS, which can comprehensively evaluate model performance in simulating the probability of SCER occurrence, measures the fraction of correct forecasts after eliminating the corrected forecasts that are purely due to random chance (Heidke 1926). The HSS is written as follows:

$$HSS = \frac{{2\left( {ad - bc} \right)}}{{\left( {a + c} \right)\left( {c + d} \right) + \left( {a + d} \right)\left( {b + d} \right)}},$$
(5)

where a denotes the number of observed extreme rainfall that are correctly forecasted, b represents the number of forecasted extreme rainfall that do not occur, c denotes the number of observed extreme rainfall that are not forecasted, and d represents the number of correct rejections. The range of the HSS is -∞ to 1. A negative HSS indicates a forecast worse than the random forecast, while 0 means no skill, and 1 denotes a perfect forecast.

3 Forecast verification of SCER and BSISO1

3.1 Forecast skills of climatological mean and variability of summer rainfall

Figure 1 shows the observed mean and daily standard deviations of MJJA rainfall over China and their forecast biases at a 14-day lead in the two S2S models. It can be found that both the climatological mean and the variability of summer daily rainfall are maximized over SC (Fig. 1a, d). Overall, both the ECMWF and CMA model have good capability in predicting the spatial distributions of summer mean precipitation (Fig. 1b, c) and the daily precipitation variability (Fig. 1e, f) over China at a 14-day lead. However, the ECMWF model obviously outperforms the CMA model for the spatial distribution of both summer mean rainfall (with PCC of 0.86 vs 0.64) and daily standard deviations (with PCC of 0.89 vs 0.61) over China. For ECMWF, both summer mean rainfall and intensity of daily rainfall variability are overestimated over most regions of northwestern SC, but they are slightly underestimated over the southeast coast. The CMA presents an evident underestimation for both mean rainfall and daily variations over entire SC.

Fig. 1
figure 1

Distribution of a climatological summer mean precipitation during 1998–2012 over mainland China (units: mm day−1) and the biases (prediction minus observation) at a 14-day lead from the ensemble mean prediction of b the ECMWF and c CMA models. df are same as ac, except for the daily precipitation standard deviations in MJJA (units: mm day−1). The PCC and NRMSE skills are calculated over the whole China and given in the bottom-left of each panel. The green boxes delineate the region of southern China (SC; 18°–32.5° N, 105°–122° E)

3.2 Forecast skills of the SCER

Consistent with the summer mean precipitation prediction results, the domain-averaged 90th percentile of precipitation predicted by ECMWF model is higher than observation, whereas the CMA model shows a smaller value (Fig. 2a). With the observed areal-mean 90th percentile of precipitation being 14.5 mm/day, the CMA forecasted thresholds range from 7.5 to 9.5 mm/day for different lead times. As for ECMWF, the forecasted threshold is around 15–17 mm/day. According to the forecasted threshold (90th percentile) of extreme rainfall in each grid at different lead times, the capability of predicting SCER occurrence by the two S2S model is then assessed.

Fig. 2
figure 2

a The observed (grey dashed line) and forecasted (colored curves) areal-mean 90th percentile of rainfall (units: mm day−1) over southern China during the summers of 1998–2012 at 5–30-day leads. The blue (red) curve represents the ensemble mean prediction from the ECMWF (CMA) model, along with the inter-member spreads shown by whiskers. b Areal-mean Heidke Skill Score (HSS) of SCER occurrence for ensemble mean predictions from ECMWF (blue curve) and CMA (red curve) and their inter-member spreads (shadings) as a function of forecast lead time (in day). Distribution of HSSs for the forecasted SCER occurrence at 7-, 14-, and 21-day lead for the ensemble mean prediction from ce ECMWF and fh CMA models

The HSS spatial distributions in Fig. 2c–e suggest that the ECMWF model has useful skills (with HSS larger than 0.1) in predicting extreme rainfall occurrences over the majority of SC, and the useful skills can persist up to a 21-day lead along the southeast coast. In contrast, the CMA model shows lower prediction skill at a 7-day lead, as only a small part of region shows useful HSS (larger than 0.1), and no useful skills can be found at 14-day and 21-day leads (Fig. 2f-h). Figure 2b shows that the areal mean HSS over SC drops quickly as the lead time increases in both two models, either for the predictions of individual members or of the ensemble mean. Compared to the predictions of individual members, the ensemble prediction has higher and more stable predictive skills at most lead times. Using a criterion with HSS of 0.1 (which is considered as a useful forecast), the ensemble prediction of the CMA model can capture the SCER occurrence within 7 days in advance, while the ensemble prediction from the ECMWF model can effectively reproduce SCER up to a 14-day lead.

In sum, the above forecast verification reveals that the two models have limited skills beyond the lead time of 14 days in predicting the SCER occurrences, which is consistent with the subseasonal predictive level of extreme rainfall among most of the current operational models (Li et al. 2019).

3.3 Forecast skills of BSISO1 index

Because of the significant influence of BSISO on SCER (Hsu et al. 2016), the models’ capacity in capturing the BSISO1 could directly affect the prediction skill of SCER. Therefore, we first evaluate the prediction skill of the BSISO1 index in the two models. Here, the forecasted BSISO1 index values are obtained by projecting the forecasted normalized OLR and U850 anomalies from each S2S model onto the observed BSISO1 spatial patterns, which are consistent with the observed first two EOF modes of BSISO defined by Lee et al. (2013).

Figure 3 shows the ACC skills of BSISO1 index with a function of lead time. For both the ECMWF and CMA models, the ACCs for the forecasted BSISO1 index decrease with the increase of lead time. Taking ACC = 0.5 as the threshold of a valid forecast skill, the lead time of useful ensemble prediction of the BSISO1 index from ECMWF is up to 24 days, which is noticeably higher than that of CMA with the lead time of useful prediction being only up to 10 days. If the amplitude of BSISO1 is perfectly forecasted, the ECMWF (CMA) model can skillfully predict the BSISO1 index 30 (11) days in advance, and the ACCp is always slightly higher than ACC. If the phase error of BSISO1 is ignored, ACCa of the two S2S models is above 0.85 at all lead times. This indicates that the phase error, rather than the amplitude error, matters more in the prediction skills of BSISO1 index. Because the models always have skills in forecasting the BSISO1’s intensity, whether they can skillfully predict the BSISO1 depends largely on the capacity for predicting the phase of BSISO1.

Fig. 3
figure 3

Bivariate Anomaly Correlation Coefficient (ACC) for the forecasted BSISO1 index from the CMA (red solid curve) and ECMWF (blue solid curve) models during 1998–2012 as a function of forecast lead time (in day). The short and long dashed curves represent the ACC with the assumption of perfect phase (ACCa) and perfect amplitude (ACCp) prediction, respectively

4 Forecast verification of BSISO1’s modulation on SCER

Is the prediction skill of the BSISO1 index related to that of SCER probability as the BSISO1 strongly modulates the SCER occurrences? To address this question, we calculate the linear correlation between the ensemble prediction skills (HSSs) of BSISO1 index and SCER occurrences at all lead times, and it is found that in both models, the HSS skills for the BSISO1 index are always significantly correlated with the areal mean HSS skills of SCER with a correlation coefficient of 0.88 in the ECMWF model and 0.90 in the CMA model (supporting Fig. S1), both of which pass the 99% confidence level. The robust relationship suggests that the prediction by S2S models could reflect the strong modulation of BSISO1 on SCER probability. In other words, these high correlation coefficients imply that skillful prediction of BSISIO1 is helpful to improve the prediction skill of areal-mean SCER occurrences over southern China.

As shown in Fig. 4, with respect to the non-BSISO1 period, the observed probability of extreme rainfall increases mostly over the Yangtze River Valley (YRV) with rises of 30–80% occurring during phases 2–3 of the BSISO1, while pronounced increases of extreme rainfall higher than 60% appear in southeastern China at phase 4 (Fig. 4). How well do the S2S models reproduce the modulation of BSISO1 on SCER probability? As shown in Fig. 4, the ECMWF model can predict the increased extreme rainfall probability over the YRV during phases 2–3 and over southeastern China in phase 4 at lead times within 21 days, although the intensity and location of maximum probability slightly depart from the observation. In phase 2, the ECMWF model underestimates probability of extreme rainfall in the YRV, especially beyond 14-day lead times. The ECMWF model performs better in phase 3. It can skillfully predict the regions with increased probability of extreme rainfall up to a 21-day lead, with PCCs larger than 0.5. In phase 4, ECMWF underestimates the observed probability increase of extreme rainfall over southeastern China, and overestimates the probability intensity over the YRV, resulting in relatively lower PCCs than in the other two phases. It is disappointing that the CMA model shows very limited skills in predicting the spatial distributions of SCER probability in phases 2–4 of BSISO1. This model generally underestimates the probability of extreme rainfall occurrence in phases 2–4 of BSISO1, which is 10–60% in its simulation. Even at a 7-day lead, the forecasting underestimates the extreme rainfall probability over the YRV in phases 2–3 and over southeastern China in phase 4. In general, these results suggest that the modulation of BSISO1 on SCER is robust. However, other factors, such as the convective scheme (Vitart 2017; de Andrade et al. 2019) and the land model (Kumar et al. 2014; Zhang et al. 2016; Liang and Lin 2018), may also play an important role in the SCER prediction in dynamic models, since the relationship between SCER and BSISO1 in simulation is not as strong as that in observation.

Fig. 4
figure 4

Percentage changes (%) in probability of SCER during phase 2 (upper two rows), phase 3 (middle two rows), and phase 4 (bottom two rows) of BSISO1 with respect to the non-BSISO1 period. Panels from left to right are the observation, 7-, 14- and 21-day lead of ensemble predictions, respectively. The PCC skills are shown in the bottom of each panel. Changes exceeding the 95% confidence level are dotted

Figure 5 shows the PCC and NRMSE prediction skills for the distribution of SCER probability as a function of lead time during phases 2–4 of BSISO1. In general, the PCC (NRMSE) tends to decrease (increase) with the lead time. The ensemble predictions show that the ECMWF model can skillfully predict the SCER probability at a 7-day lead in phase 2, and at up to a 25-day lead in phase 3. During phases 2 and 3, the PCC skills of the CMA model are mostly lower than those of the ECMWF model, as the skills are only useful at 2–3-day lead. No prediction skills can be found in phase 4 for both two models. Note that although the ensemble-mean prediction skills (solid curves in Fig. 5) from ECMWF do not show obvious better performance than that from CMA in phase 2 and 4. However, if we see the predictions skills from all individual members of ECMWF, the overall PCC skills of ECMWF are higher than CMA (the blue shadings are higher than the red shadings in the Fig. 5).

Fig. 5
figure 5

The PCC skills for the percentage changes in SCER probability in a phase 2, b phase 3 and c phase 4 of BSISO1 with respect to the non-BSISO1 period as a function of forecast lead time (in day). The red (blue) curves represent the ensemble mean prediction from the CMA (ECMWF) models, along with inter-member spreads shown by shadings. df are same as ac, but for the NRMSE skills

Figure 6 shows the HSS distributions of SCER probability during phases 2–4 of the BSISO1. In phase 2, within 1-week lead times, the ECMWF model has HSSs over 0.2 in most areas of SC, and the areal mean HSSs over SC are above 0.1. In phase 3, ECMWF also has high HSSs over the majority of SC at 7–14-day leads, suggesting encouraging forecast skill of BSISO1’s modulation on SCER probability during this phase. In phase 4, high HSSs are mainly confined to the southeast coast only at lead times within a week. For CMA, negative HSSs appear at most parts of SC in each phase, indicating the poor skill of CMA in forecasting the probability of SCER under the influence of BSISO1.

Fig. 6
figure 6

Heidke skill score (HSS, shading) of SCER for phase 2 (upper two rows), phase 3 (middle two rows), and phase 4 (bottom two rows) of BSISO1 with respect to the non-BSISO1 period. Panels from left to right are 7-, 14- and 21- lead from ensemble mean predictions, respectively. Areal mean HSS over SC is shown in the bottom of each panel

Figure 7 shows the areal mean HSSs of SCER modulated by BSISO1 with a function of lead time. It can be found that useful skills (HSS exceeds 0.1) of ensemble prediction can be obtained at lead time up to around 10 days for ECMWF during phases 2–3 of BSISO1, while the lead time is limited to 7 days during phases 4 of BSISO1. The CMA model exhibits relatively low skills in phase 2–4, with HSSs being lower than 0.1 only beyond 3- or 4-day lead.

Fig. 7
figure 7

As in Fig. 5a-c but for the areal-mean HSS

In summary, the two models have some capability to predict the probability of SCER that is modulated by BSISO1, but biases can be found in both intensity and location of SCER probability changes. In terms of the deterministic skills (PCC and NRMSE) for SCER probability, the ECMWF model displays useful skills up to 1 week in advance in phase 2, and 25 days in advance in phase 3. The CMA model cannot predict the BSISO1’s modulation on SCER probability in phases 2–3 even at a 5-day lead. Poor skills are commonly found in phase 4, with the useful prediction skills for ECMWF can only be gained within 3-day lead times and no such skills exist for CMA. Based on probabilistic forecast verification (HSS), useful skills of SCER probability can be gained at up to around a 10-day lead during phases 2–3, and at a 7-day lead in phase 4 for ECMWF, but the lead time is only 3 or 4 days for CMA in all three phases.

5 Causes of the prediction biases in BSISO1’s modulation on SCER

Although the S2S models can reproduce the modulations of BSISO1 on SCER probability to some extent, the prediction skill is still quite limited. What are the possible causes of model biases in predicting the SCER modulated by BSISO1 activity? A clear answer to this question may provide some heuristic clues to improving the S2S prediction of SCER.

Because extreme rainfall is related to the favorable atmospheric circulation and abundant moisture conditions, the moisture flux convergence \(-\nabla \cdot (q\overrightarrow{V}\)) is considered as the main factor of SCER during BSISO1 phases (O’gorman and Schneider 2009; Hsu et al. 2016; Loriaux et al. 2016). The moisture flux convergence \(-\nabla \cdot (q\overrightarrow{V}\)) could be further divided into two terms: moisture convergence \((-q\cdot \nabla \overrightarrow{V})\) and moisture advection \((-\overrightarrow{V}\cdot \nabla q\)). In this section, we will diagnose the column-integrated moisture convergence and moisture advection during phases 2–4 of BSISO1 in the observation and prediction, and further identify the possible causes for the model biases in predicting the BSISO1’s modulation on SCER probabilities.

In observation, during phases 2–3 of BSISO1 (Fig. 8, the first column), the YRV is dominated by low-level southwesterly anomaly on the northwestern flank of the anomalous anticyclone over the western North Pacific (WNPAC). Therefore, strong moisture convergence appears over the YRV, providing a favorable condition for the increase of extreme rainfall in the region (Fig. 4). As the WNPAC further propagates northeastward in phase 4, the moisture divergence originally located in the south coast of SC diminishes, and the majority of SC witnesses a replacement by moisture convergence (Fig. 8), leading to enhanced extreme rainfall over most regions south of Yangtze River (Fig. 4).

Fig. 8
figure 8

Composites of column-integrated moisture convergence (shading, unit: 10−5 m s−2) and 850-hPa wind field (vector, unit: m s−1) anomalies for phases 2 (upper two rows), 3 (middle two rows), and 4 (bottom two rows) of the BSISO1. Panels from left to right are the observation, 7-day, 14-day and 21-day lead of ensemble mean predictions, respectively. Only the fields exceeding the 95% confidence level are shown. Letter “A” represents the center of the anticyclonic anomaly. The case number for phase composite is shown in the upper-left corner of each panel

Can the S2S models forecast the low-level WNPAC and the associated moisture convergence? The horizonal patterns of the 850-hPa wind and moisture convergence forecasted at 7-, 14-, and 21-day lead are shown in Fig. 8. In general, ECMWF can realistically reproduce the large-scale circulation and moisture convergence during phases 2–4 of BSISO1. In phase 2, because of the weaker southwesterly over the northwestern flank of the WNPAC, the moisture convergence in the YRV region is underestimated in the ECMWF model at a 7-day lead. At lead times beyond 14 days, the WNPAC is forecasted northeastward in the ECMWF compared with observation, leading to weakened moisture convergence and therefore underestimated probability of extreme rainfall over the YRV (Figs. 4 and 6). In phase 3, the moisture convergence (divergence) in the YRV (southeast coast) is nicely forecasted up to 21 days in advance in the ECMWF, resulting in the high prediction skills of SCER probability in phase 3 (Figs. 4 and 6). In phase 4, the moisture convergence center is predicted northward by 5° in the ECMWF, leading to the underestimation (overestimation) of extreme rainfall probability south (north) of the Yangtze River (Fig. 4). In comparison to ECMWF, the CMA model cannot well reproduce the intensity and spatial patterns of moisture convergence and WNPAC.

Figure 9 shows the observed and forecasted spatial distribution of moisture advection during phases 2–4 of the BSISO1. It is clear that the moisture advection also plays an important role in BSISO1’s modulation on SCER probabilities, but in a opposite way. Negative moisture advection could suppress the SCER probabilities. In ECMWF, for phases 2–3, the negative moisture advection over SC is general forecasted, although its intensity gradually decreases with increasing lead time. For phase 4, although the low-level wind field is well captured (Fig. 8), the negative moisture advection over SC could not be reproduced with all lead times, suggesting the low capacity of ECMWF in forecasting the spatial distribution of moisture (\(\nabla q\)). In general, the ECMWF model performs much better than the CMA model, as the latter always underestimates the negative moisture advection over SC.

Fig. 9
figure 9

As in Fig. 8, but for the column-integrated moisture advection (shading, unit: 10−5 m s−2) anomalies

The PCC and NRMSE skills of ensemble prediction as a function of lead time for moisture convergence and moisture advection fields are shown in Fig. 10, which are only calculated over the region with statistically significant signals in observation, therefore it emphasizes the contrast between ECMWF and CMA in predicting moisture convergence (advection) in key regions. It is obvious that the CMA model shows much lower PCC skills in predicting the moisture convergence and advection than ECMWF, leading to the much worse prediction skills in SCER probability (Fig. 5) and areal-mean HSSs (Fig. 7) of SCER. Another notable feature is that the prediction of moisture convergence always has better skills than that of moisture advection in ECMWF, suggesting that the model still has difficulty in reproducing the spatial pattern of moisture in the region. Taking PCC = 0.5 as the threshold of useful prediction skill, the ECMWF model can predict the moisture convergence 25 days in advance in phase 2, and beyond 30 days in phases 3–4. For the moisture advection, useful skills can be gained up to 25 days in advance in phase 2, and beyond 30 days in phase 3, but only within 5 days in advance in phase 4. It is noteworthy that although the moisture convergence in phase 4 shows a skill comparable to those in phases 2–3 in ECMWF, the low prediction skill of moisture advection in phase 4 results in the poor prediction skills for the probability and HSS of SCER in phase 4 (Figs. 5 and 7). The CMA model has poor skills (PCC less than 0.5) at all lead times for both moisture convergence and moisture advection, which are consistent with its low prediction skills of SCER probability (Figs. 5 and 7).

Fig. 10
figure 10

The PCC skills for the ensemble mean forecasted column-integrated moisture convergence (solid curves) and moisture advection (dashed curves) from ensemble predictions of CMA (red) and ECMWF (blue) in a phase 2, b phase 3 and c phase 4 of BSISO1 as a function of forecast lead time (in day). df are some as ac, but for the NRMSE skills. To quantitatively examine whether the models can correctly predict the moisture field in critical areas, the PCCs and NRMSEs are only calculated over the region with statistically significant signals in observation

To further check whether the prediction skills of SCER probability during phases 2–4 of the BSISO1 are related to the models’ performance in capturing the BSISO1-related moisture convergence and advection, the scatter plots between the prediction skills (PCC) of SCER probability and column-integrated moisture convergence and moisture advection in ECMWF and CMA ensemble members at all forecast leads are shown in Fig. 11. In all ensemble members from both two models, the PCC skills for moisture convergence are significantly correlated with those of the SCER probability, suggesting that the prediction errors in SCER probabilities may stem from the biases in predicting the BSISO1’s modulation on the large-scale moisture convergence. On the contrary, the PCC skills of moisture advection in CMA (ECMWF) have no (relatively weaker) relationship with the PCC skills of SCER probability, indicating that moisture advection may play a secondary role in BSISO1’s modulation. Thus, the models’ capability to represent the associated moisture convergence is the key to the skillful prediction of BSISO1’s modulation on SCER probability. Note, however, that the relationship between PCC skills of different contributors and SCER probability also depends on the overall prediction skill of SCER probability. If the overall prediction skill is high, the PCC relationship becomes relatively weak (Fig. 11, phase 3), and verse visa (Fig. 11, phase 4). In other words, when the prediction is approaching perfection, the prediction skill is more sensitive to the secondary term \((-\overrightarrow{V}\cdot \nabla q\)), rather than the first contributor \((-q\cdot \nabla \overrightarrow{V})\).

Fig. 11
figure 11

Scatter diagrams for PCC skills of the percentage changes in SCER probability (y-axis) against the column-integrated moisture convergence (x-axis) over the key regions (0–45°N, 70–150°E) in a phase 2, b phase 3 and c phase 4 of the BSISO1 for all individual members at all forecast lead times from the two models. df as in ac, but x-axis represents PCC skills of column-integrated moisture advection. The linear fit curves for ECMWF (308 blue dots) and CMA (120 orange dots) are in blue and red, and the large blue and red dots are the averaged PCC for ECMWF and CMA, respectively. The correlation coefficients (R) between PCC skills of percentage changes in SCER probability and those of column-integrated moisture convergence (advection), together with their P values are given in each panel, and asterisks indicate the R is significant at the 95% confidence level

6 Conclusion and discussion

6.1 Conclusion

Using the reforecast data of 1998–2012 from the CMA and ECMWF models, which have been involved in the S2S project, the present study investigates the prediction skills of SCER and BSISO1 activity and unravels how the prediction biases of BSISO1-related moisture processes lead to the biases in predicting the SCER probability. The main conclusions are summarized as follows:

  1. 1.

    Although both S2S models can predict the spatial pattern of summer mean rainfall and the standard deviation of daily rainfall over China, the ECMWF model obviously outperforms the CMA model. For the ECMWF model, both summer mean rainfall and daily variation are slightly overestimated (underestimated) over the northwest (southeast) part of SC. The CMA model presents an evident underestimation for both mean rainfall and daily variation over entire SC.

  2. 2.

    Compared to the CMA model, the ECMWF model shows higher skills in reproducing the summer rainfall variability and relatively small differences in the threshold value of SCER (the 90th percentile of rainfall amount) at all lead times against the observation. The ECMWF model also yields higher HSS skills of rainfall extreme occurrence within 14-day lead times over the entire SC area, and at up to 21-day lead over the southeast coast. In contrast, the CMA model can only perform useful HSS skills of rainfall extreme occurrence within a forecast lead time of 7 days.

  3. 3.

    The ensemble forecasts from the ECMWF and CMA models show skillful prediction of the BSISO1 index (with ACC larger than 0.5) at 24-day and 10-day forecast lead time, respectively. The prediction skills of BSISO1 phases, rather than of its amplitudes, determine the total ACC skills of the BSISO1 index, suggesting that elimination of phase errors could improve the prediction skills of BSISO1.

  4. 4.

    Given that the probabilistic prediction skill (HSS) of SCER occurrence corresponds well to the deterministic prediction skill (ACC) of the BSISO1 index, how the SCER prediction is modulated by the predicted BSISO1 is further revealed. The diagnostic results suggest that the prediction skills of moisture convergence and advection play an important role in the prediction skills of SCER probability influenced by BSISO1. The correlation analysis between the PCC skills of SCER and moisture convergence/advection reveals that moisture convergence is the first contributor to the prediction skill of SCER. However, it should be noted that moisture advection, as the secondary contributor, may exert a “Wooden Bucket Theory” (Cannikin Law). This is revealed by its good relationship with PCC of SCER but the lowest prediction skill of SCER during phase 4 of BSISO1, which imply that moisture advection is also a key factor in predicting the SCER modulated by the BSISO.

6.2 Discussion

The above results suggest that improving models’ capability of predicting BSISO is crucial for promoting model performance in prediction of SCER. Specifically, the SCER prediction skill largely depends on whether the models can correctly reproduce the BSISO-related moisture convergence and advection. Thus, one strategy to enhance the prediction skill of SCER is to improve models’ capability in predicting BSISO-related moisture processes, which could be achieved by adjusting/improving the convection parameterization schemes (Kim et al. 2014; Jiang et al. 2015; Yang et al. 2021b). Improvements of additional factors such as the data assimilation schemes (Liu et al. 2021), initialization (Orsolini et al. 2013; Bo et al. 2020), ocean-land–atmosphere coupling techniques (Ford et al. 2018) and the resolution (Vitart 2017; de Andrade et al. 2019) of the models may also help to increase the forecast skills of BSISO and its related moisture processes and thus improve the SCER prediction skill. However, the S2S datasets provide reforecast from models with various physics and parameterizations. It is difficult to identify the key parameters controlling the BSISO moisture processes and thus the SCER prediction skill by comparing the reforecast data from models with different configurations. Performing model experiments based on the same model is required, which will be a focus in our future study.

One may wonder whether the models’ climatological mean fields affect the simulation of BSISO1. Thus, we have also checked the two S2S models’ capability in simulating seasonal (MJJA) mean precipitation, sea surface temperature (SST) and vertical shear of zonal wind over the key regions related to BSISO (supporting Fig. S2), i.e., the Indian Ocean-West Pacific region (Inness et al. 2003; Sabeerali et al. 2013). Compared to the ECMWF model, the CMA model shows relatively larger biases for the climatology fields (precipitation, zonal wind, and SST) at all lead times (supporting Fig. S2), corresponding to the fact that the ECMWF model has better prediction skill of BSISO1 than the CMA model. Earlier studies (Ajayamohan and Goswami 2007; Sperber and Annamalai 2008) suggested that BSISO1 is closely linked to regions with maximum summer mean precipitation, and the easterly vertical shear of the summer mean monsoon flow may play a critical role in the northward propagation of BSISO1 (Jiang et al. 2004; Drbohlav and Wang 2005). In addition, previous research has shown that the prediction skill of BSISO is also limited by the insufficient increases in both the low-level cyclonic vorticity and the moistening and warm temperature anomalies to the north of the convection (He et al. 2019), the biases in initial and boundary conditions (Vitart et al. 2007; Woolnough et al. 2007; Fang et al. 2019), the misrepresentation of atmosphere–ocean interactions (Fu et al. 2007, 2013), and the land–atmosphere coupling (Jeong et al. 2013; Orsolini et al. 2013; Li et al. 2020c).

Considering the robust relationship between the SCER probability and BSISO, before a further advance in dynamical models is achieved, the dynamical-statistical hybrid method could be an effective way to improve the subseasonal prediction skill for extreme rainfall (Ren et al. 2014; Guo et al. 2017). In the current study, the observed BSISO1-SCER relationship is gained based on the percentage change in SCER probability during different phases of BSISO1 (Fig. 4). Thus, we can only make a hybrid prediction for the percentage change in SCER probability during different BSISO1 phases, rather than having a direct prediction of SCER.

While the present study shows the evident effects of model biases in BSISO1 on the prediction skill of SCER, whether and to which extent the model skill of BSISO2 (quasi-biweekly oscillation) impacts on the SCER prediction merit further exploration.