1 Introduction

Precipitation forecasts are one of the main components of flood warning systems. Deterministic numerical weather prediction (NWP) models are often used for this purpose. However, such models are highly sensitive to the initial atmospheric conditions and do not account for forecast uncertainty (Gourley and Vieux 2005; Ye et al. 2014). To overcome this limitation, meteorological Ensemble Prediction Systems (EPSs) are increasingly used in flood warning systems (Hamill et al. 2000; Goswami et al. 2007; Verkade et al. 2013). EPSs are numerical weather prediction systems that estimate the uncertainty of a weather forecast through probability distributions of the atmospheric conditions and also provide the most likely outcome (WMO 2015). The European Centre for Medium-Range Weather Forecasts’ Ensemble Prediction System (ECMWF-EPS) is one of the best-known EPSs (Woods 2006). Its short- to medium-range ensemble forecasts (lead times of 1–15 days) have been available since November 1992. ECMWF ensemble forecasts include 50 members, each started from different (perturbed) initial atmospheric conditions.

Despite positive reports about ECMWF-EPS, the evaluation of its forecasts is still under investigation. False warnings are a major operational challenge, particularly at longer lead times, and the consequences of such errors can be costly and can damage the reputation of forecasting institutions (Ye et al. 2014). Various skill and accuracy criteria have been proposed to assess the performance of ECMWF forecasts (Liu et al. 2014). For instance, Pappenberger and Buizza (2009) evaluated the skill of ECMWF forecasts in the Danube basin using the Brier skill score (BSS). Monhart et al. (2018) evaluated ECMWF model skill against 1000 precipitation stations across Europe using the continuous ranked probability skill score (CRPSS), the ranked probability skill score (RPSS), and the receiver operating characteristic (ROC). Thirel et al. (2008) compared precipitation forecasts from ECMWF and from Météo-France’s Prévision d’Ensemble ARPEGE (ARPEGE-EPS) over France using BSS and CRPSS. To improve the performance of EPSs, post-processing of their output, especially bias correction, is also considered. Owing to the limitations of NWP models and their input data, forecasts may contain biases in the moments of the forecast distribution. Verkade et al. (2013) evaluated ECMWF precipitation ensemble forecasts for biases in the mean, variance, and forecast probabilities and applied a quantile-to-quantile transformation (QM) and a linear regression method for bias correction.

In addition to general evaluations of ECMWF forecasts, a number of studies have considered other factors, such as the physical conditions of the basins. For example, Pappenberger and Buizza (2009) evaluated the skill of ECMWF forecasts in the upper, middle, and lower Danube basin and showed that sub-basin area affects forecast skill, with larger sub-basins giving better verification results. In another study, Ye et al. (2014) considered the performance of ECMWF for basins of different sizes during flooding and non-flooding seasons, using the continuous ranked probability score (CRPS), CRPSS, and the reduction continuous ranked probability score (RCRPS). Their results showed that forecast performance varies with sub-basin properties, such as area and the skewness of daily precipitation, and differs between the flooding and non-flooding seasons.

Previous studies have found ECMWF to be the best-performing model in TIGGE (THORPEX Interactive Grand Global Ensemble) owing to its high forecast skill (Zhao et al. 2011; Zhi et al. 2011; Su et al. 2014). However, the influence of climate conditions on the skill of this model is still unknown. This study therefore evaluates the skill of ECMWF ensemble forecasts across regions ranging from dry to humid. In addition, particular attention is paid to forecasting precipitation at given thresholds, expressed as percentages. The forecasts are evaluated at thresholds derived from the observed precipitation; since this study focuses on high-flow forecasts, the thresholds chosen are exceedances of 0, 25, 75, 95, and 98%. Such an evaluation is essential because, if the model shows high skill at extreme thresholds, it can be applied in flood warning systems.

2 Material and methods

2.1 Study area and data

The Karun River basin is located in the central Zagros Mountains of Iran, between 48–52° E and 30–34° N. The basin is relatively large, with an area of 67,297 km², elevations ranging from 0 to 4400 m above mean sea level (Fig. 1a), and annual precipitation varying spatially from 50 to 1800 mm. It is one of the most important basins in Iran in terms of water resources and discharges into the Persian Gulf and Oman Sea. The basin contains three large dams built for flood management and for irrigating the extensive agricultural areas of the Karun plain. Although the region has a dense precipitation gauge network, data from only 21 stations were available for this research; their locations and characteristics are given in Fig. 1b and Table 1. All data were obtained from the National Weather Organization of Iran.

Fig. 1

(a) Elevation range and (b) ECMWF grid points, rainfall stations, and dams shown on the climatic zones of the Karun River basin

Table 1 Description of rainfall stations in Karun basin

2.2 Climatic regions in Karun basin

Using the De Martonne method, the Karun basin can be classified into six climatic regions: very humid, humid, semi-humid, Mediterranean, semi-arid, and arid. Climate in this region is mainly controlled by altitude, which ranges from 0 m above sea level (m a.s.l.) in the lowland plains at the outlet of the Karun River to the Persian Gulf to over 4000 m a.s.l. in the Koohrang highlands. Annual precipitation is as low as 150 mm in the lowland plains and reaches 1800 mm in the highlands (Table 2).
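The De Martonne classification is based on an aridity index computed from annual precipitation and mean annual temperature, I = P/(T + 10). The sketch below is illustrative only: the class limits are the extended De Martonne thresholds commonly cited in the literature, which we assume rather than take from this study, and the station values are hypothetical.

```python
# Minimal sketch of the De Martonne aridity index, I = P / (T + 10).
# ASSUMPTION: the class limits below are the extended De Martonne thresholds
# commonly cited in the literature; the paper does not list the limits it used.


def de_martonne_index(annual_precip_mm: float, mean_temp_c: float) -> float:
    """Aridity index from annual precipitation (mm) and mean annual temperature (deg C)."""
    return annual_precip_mm / (mean_temp_c + 10.0)


# (lower bound of the class, class name), checked from wettest to driest
CLASSES = [
    (35.0, "very humid"),
    (28.0, "humid"),
    (24.0, "semi-humid"),
    (20.0, "Mediterranean"),
    (10.0, "semi-arid"),
]


def de_martonne_class(index: float) -> str:
    """Return the first class whose lower bound the index reaches; below 10 is arid."""
    for lower, name in CLASSES:
        if index >= lower:
            return name
    return "arid"


# Hypothetical stations: (annual precipitation in mm, mean annual temperature in deg C)
for p, t in [(1800.0, 10.0), (150.0, 25.0), (450.0, 18.0)]:
    i = de_martonne_index(p, t)
    print(f"P={p:.0f} mm, T={t:.0f} C -> I={i:.1f} ({de_martonne_class(i)})")
```

Because the index grows with precipitation and shrinks with temperature, a highland station with high precipitation and low temperature falls into the very humid class, while a hot lowland station with little rain falls into the arid class.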

Table 2 Characteristics of climatic regions in Karun basin

2.3 ECMWF data

THORPEX (THe Observing system Research and Predictability EXperiment) is a global research project of the World Weather Research Programme (Bougeault et al. 2010), set up by the World Meteorological Organization (WMO) to improve medium-range (1 day to 2 weeks) forecasts (Clock et al. 2009). Within this project, TIGGE is a major database that has collected forecast data from 10 global numerical weather prediction models since October 2006. All TIGGE models have a resolution of 1° × 1° or coarser, except for ECMWF and the CMA (China Meteorological Administration), which have a resolution of 0.5° × 0.5° (Louvet et al. 2015). Researchers have paid considerable attention to ECMWF, one of the world’s leading global modeling centers, because its ensemble forecasts have more members and a higher spatial resolution than those of other models (Woods 2006). The center has been producing ensemble forecasts since 1992 (Pappenberger and Buizza 2009). In recent decades, ECMWF forecasts have reached high standards, and their use has been extended to real-time flood forecasting around the world.

The ECMWF forecasts are available only on a 0.5° × 0.5° grid; therefore, they were interpolated to the locations of the observation stations using inverse distance weighting (IDW) for lead times of 24, 72, and 120 hr over the period from September 24, 2010, to September 23, 2014. The areal rainfall of each climatic region was then calculated with the Thiessen polygon method in ArcGIS. Figure 1b shows the distribution of ECMWF grid points in the Karun basin. The medium-range skill of ECMWF was evaluated for the three lead times before and after applying two bias correction methods, linear mapping (LM) and quantile mapping (QM).
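The two spatial steps, IDW interpolation from grid points to a station and Thiessen area weighting of station values, can be sketched as follows. This is a minimal illustration under stated assumptions: an IDW power of 2 (the exponent used in the study is not reported), planar distances computed directly on longitude/latitude degrees, and Thiessen polygon areas that have already been derived in a GIS. All coordinates and values are hypothetical.

```python
import numpy as np

# ASSUMPTIONS: IDW power = 2, planar distances on lon/lat degrees, and
# precomputed Thiessen polygon areas; none of these are stated in the paper.


def idw(grid_xy, grid_values, target_xy, power=2.0):
    """Inverse-distance-weighted value at target_xy from gridded values."""
    grid_xy = np.asarray(grid_xy, dtype=float)
    grid_values = np.asarray(grid_values, dtype=float)
    d = np.linalg.norm(grid_xy - np.asarray(target_xy, dtype=float), axis=1)
    if np.any(d == 0):  # station coincides with a grid point: use that value
        return float(grid_values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * grid_values) / np.sum(w))


def thiessen_mean(station_values, polygon_areas_km2):
    """Areal precipitation as the area-weighted mean of station values."""
    v = np.asarray(station_values, dtype=float)
    a = np.asarray(polygon_areas_km2, dtype=float)
    return float(np.sum(a * v) / np.sum(a))


# Four hypothetical 0.5-degree grid points (lon, lat) around one station; forecasts in mm
grid = [(49.0, 31.0), (49.5, 31.0), (49.0, 31.5), (49.5, 31.5)]
forecast_mm = [4.0, 6.0, 8.0, 10.0]
# The station is equidistant from all four points, so the result is their plain mean
print(idw(grid, forecast_mm, (49.25, 31.25)))

# Two hypothetical stations in one climatic region with their Thiessen polygon areas
print(thiessen_mean([7.0, 3.0], [300.0, 100.0]))  # (7*300 + 3*100) / 400 = 6.0
```

For larger distances, great-circle distances would be more appropriate than planar degree distances; at the 0.5° grid spacing used here the difference in weights is small.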

2.4 Bias correction methods

Linear mapping and quantile mapping were used to identify and correct possible systematic differences between observed and forecasted data. The LM method uses the means of the observations and forecasts: it corrects the forecasts by the difference between the mean observed and mean forecasted values (Crochemore et al. 2016). In the QM method, observations and forecasts are matched through their probability distributions (quantiles). First, the cumulative distribution functions (CDFs) of the forecasts and of the observations are calculated; next, the non-exceedance probability of each forecast member value is determined from the forecast CDF; finally, each forecast value is replaced by the observed value with the same probability in the observed CDF (Crochemore et al. 2016). The effectiveness of the two methods was then compared for the medium-range ECMWF forecasts.
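A minimal sketch of the two bias correction methods, assuming empirical CDFs estimated from a training period of paired forecasts and observations. The LM function applies the additive mean shift described above and clips negative precipitation to zero (our assumption); for precipitation, a multiplicative variant that scales forecasts by the ratio of the means is also common. The QM function keeps zero forecasts dry, which is likewise an illustrative assumption rather than necessarily the authors’ choice. The data are hypothetical.

```python
import numpy as np


def linear_mapping(fcst, obs_train, fcst_train):
    """LM: shift forecasts by (mean observed - mean forecast) over the training period."""
    shift = np.mean(obs_train) - np.mean(fcst_train)
    # Assumption: negative precipitation after the shift is set to zero
    return np.clip(np.asarray(fcst, dtype=float) + shift, 0.0, None)


def quantile_mapping(fcst, obs_train, fcst_train):
    """QM: replace each forecast value by the observed value with the same probability."""
    fcst = np.asarray(fcst, dtype=float)
    fcst_sorted = np.sort(np.asarray(fcst_train, dtype=float))
    # Non-exceedance probability of each forecast value in the empirical forecast CDF
    prob = np.searchsorted(fcst_sorted, fcst, side="right") / fcst_sorted.size
    # Observed value with that probability (inverse of the empirical observed CDF)
    mapped = np.quantile(np.asarray(obs_train, dtype=float), prob)
    # Assumption: forecasts of zero precipitation stay dry
    return np.where(fcst <= 0.0, 0.0, mapped)


# Hypothetical training period: the forecasts are about twice as wet as the observations
obs_train = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
fcst_train = [0.0, 0.0, 2.0, 4.0, 6.0, 8.0]
members = np.array([0.0, 4.0, 8.0])  # hypothetical ensemble members for a new day (mm)

print("LM:", linear_mapping(members, obs_train, fcst_train))
print("QM:", quantile_mapping(members, obs_train, fcst_train))
```

In this toy case, LM lowers every member by the same amount (the wettest member drops from 8 to about 6.3 mm), whereas QM maps the wettest member onto the largest observed value (4 mm). This reflects the fact that QM corrects the whole distribution rather than only its mean.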

2.5 Evaluation indices

2.5.1 Continuous ranked probability score

The continuous ranked probability score (CRPS) measures the agreement between the CDF of the ensemble forecast and the observed value, with the observation represented as a step-function CDF (Liu and Xie 2014):

$$ \mathrm{CRPS}=\frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}\left[F_i(x)-o_i(x)\right]^2\,dx=\frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{+\infty}\left[F_i(x)-H\left(x-x_{o,i}\right)\right]^2\,dx $$
(1)

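where $n$ is the number of forecast-observation pairs, $F_i(x)$ is the cumulative distribution function of the $i$th ensemble forecast, $o_i(x)$ is the corresponding observed cumulative distribution function, and $H(x - x_{o,i})$ is the Heaviside step function centered on the $i$th observed value $x_{o,i}$, which equals 0 for $x < x_{o,i}$ and 1 for $x \ge x_{o,i}$ (Hersbach 2000). The ideal value for CRPS is zero. Because CRPS is expressed in the units of the forecast variable (here mm), its magnitude depends on the precipitation amounts, so it is not directly comparable between regions with different climates.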
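For an ensemble, $F_i$ in Eq. (1) is the empirical step CDF of the members, and the integral then has a closed form, CRPS = E|X − y| − ½E|X − X′|, where X and X′ are members drawn independently from the ensemble and y is the observation. The sketch below uses this identity; it is one standard way to compute the ensemble CRPS, not necessarily the implementation used in this study.

```python
import numpy as np


def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation.

    Uses CRPS = E|X - y| - 0.5 E|X - X'|, which equals the integral in Eq. (1)
    when F is the empirical step CDF of the ensemble members.
    """
    x = np.asarray(members, dtype=float)
    mean_abs_error = np.mean(np.abs(x - obs))               # E|X - y|
    mean_spread = np.mean(np.abs(x[:, None] - x[None, :]))  # E|X - X'| over all member pairs
    return float(mean_abs_error - 0.5 * mean_spread)


def mean_crps(ensembles, observations):
    """Average over the n forecast-observation pairs, i.e. the 1/n sum in Eq. (1)."""
    return float(np.mean([crps_ensemble(e, o) for e, o in zip(ensembles, observations)]))


print(crps_ensemble([0.0, 2.0], 1.0))  # 0.5
print(crps_ensemble([3.0], 1.0))       # one member reduces to |3 - 1| = 2.0
```

The single-member case shows that the CRPS reduces to the absolute error for a deterministic forecast, which is why it is often described as a probabilistic generalization of the mean absolute error.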

A critical issue when using probabilistic skill scores is that they do not account for the dispersion of the observations or the number of examined parameters. To circumvent this issue, the reduction continuous ranked probability score (RCRPS) was used. Introduced by Trinh et al. (2013), it normalizes the CRPS by the standard deviation of the observations:

$$ RCRPS=\frac{CRPS}{\sigma } $$
(2)

where σ is the standard deviation of the observed values for each study area or time period.

2.5.2 Continuous ranked probability skill score

The continuous ranked probability skill score (CRPSS), another normalized form of the CRPS (Trinh et al. 2013), quantifies the skill of an ensemble forecast relative to a reference forecast. CRPSS ranges from −∞ to 1: a value of 1 indicates a perfect forecast, 0 indicates no improvement over the reference, and negative values indicate that the forecast is worse than the reference (Ye et al. 2014).

$$ \mathrm{CRPSS}=1-\frac{{\mathrm{CRPS}}_{\mathrm{forecast}}}{{\mathrm{CRPS}}_{\mathrm{reference}}} $$
(3)

where $\mathrm{CRPS}_{\mathrm{forecast}}$ is the CRPS of the ensemble forecast and $\mathrm{CRPS}_{\mathrm{reference}}$ is the CRPS of the reference forecast over the same period. In this paper, the reference forecast was climatology, expressed as rainy and non-rainy months for each climatic region from December 2010 to December 2014 (see, e.g., Benninga et al. 2017).
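Eqs. (2) and (3) can both be computed from the ensemble CRPS. In this sketch, the climatological reference is represented, as a simplifying assumption, by a single hypothetical sample of observed values that serves as the reference ensemble for every day; the study’s reference based on rainy and non-rainy months would replace it. The standard deviation in RCRPS uses the population form (ddof = 0), which is another assumption.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Empirical ensemble CRPS: E|X - y| - 0.5 E|X - X'|."""
    x = np.asarray(members, dtype=float)
    return float(np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :])))


def mean_crps(ensembles, observations):
    """Mean CRPS over all forecast-observation pairs."""
    return float(np.mean([crps_ensemble(e, o) for e, o in zip(ensembles, observations)]))


def rcrps(ensembles, observations):
    """Eq. (2): mean CRPS divided by the (population) standard deviation of the observations."""
    return mean_crps(ensembles, observations) / float(np.std(observations))


def crpss(ensembles, observations, climatology):
    """Eq. (3): 1 - CRPS_forecast / CRPS_reference, with a climatological reference ensemble."""
    crps_fc = mean_crps(ensembles, observations)
    crps_ref = mean_crps([climatology] * len(observations), observations)
    return 1.0 - crps_fc / crps_ref


# Hypothetical daily observations (mm), 3-member forecasts, and climatological sample
obs = [0.0, 0.0, 5.0, 12.0]
ens = [[0.0, 0.0, 1.0], [0.0, 0.5, 0.0], [3.0, 5.0, 6.0], [8.0, 10.0, 14.0]]
clim = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 8.0, 15.0]
print(f"RCRPS = {rcrps(ens, obs):.3f}, CRPSS = {crpss(ens, obs, clim):.3f}")
```

Using the climatology itself as the forecast gives CRPSS = 0 by construction, and a perfect forecast gives CRPSS = 1 and RCRPS = 0, matching the ideal values stated for the two indices.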

3 Results and discussion

This section is organized as follows. The first part examines the skill of the raw ECMWF forecasts in the flooding and non-flooding seasons; the flooding season runs from October to March and the non-flooding season from May to September. The second part examines the effect of bias correction on forecast skill in these seasons across all climatic regions. The final part assesses the model skill in the flood-generating regions over the 4-year period. Throughout, the precipitation forecast skill of ECMWF is investigated using CRPS, CRPSS, and RCRPS for different lead times, climatic regions, and flooding and non-flooding seasons.

3.1 Skill assessment for different lead times

3.1.1 CRPS

The evaluation results are shown in Fig. 2. As expected (Liu et al. 2013; Benninga et al. 2017), CRPS increases as the lead time increases from 24 to 120 hr, indicating that forecast skill decreases with lead time. This pattern can be observed for all climatic zones. In a comparable study of catchments of different sizes, Ye et al. (2014) found cases that did not follow this trend, similar to one of our climatic regions. Nevertheless, our results confirm the findings of Tao et al. (2013) and Ye et al. (2014) that model performance weakens as the lead time increases.

Fig. 2

CRPS values for three lead times (24, 72, and 120 hr) and six climatic regions for flooding seasons (a) and non-flooding seasons (b)

3.1.2 CRPSS

Because CRPS depends on the magnitude of precipitation, it is not suitable for comparing regions, and thus, the CRPSS was used as well. Figure 3 shows the CRPSS of the ECMWF forecasts in the six climatic regions for the flooding and non-flooding seasons. The main observations are as follows. (1) As with CRPS, forecast skill decreases with increasing lead time, as also found by, for instance, Benninga et al. (2017); this reduction in the wet area differs markedly from that in the other areas. (2) The performance of forecasts in the different regions is not the same for the flooding and non-flooding seasons. For example, the very humid area showed similar performance in both seasons, whereas in the humid region the difference between forecasts at lead times of 24 and 120 hr is larger. This finding is in line with Tao et al. (2014), who investigated the ECMWF skill in the Huai River basin over 2007–2011 during the summer and fall seasons and found that CRPSS decreased with increasing lead time.

Fig. 3

CRPSS values for three lead times (24, 72, and 120 hr) and six climatic regions for flooding seasons (a) and non-flooding seasons (b)

3.2 Comparison of CRPSS and RCRPS indices

Both CRPSS and RCRPS can be used to compare the relative performance of different forecast systems. Figure 4 compares these indices for the six climatic regions of the basin. The main observations are as follows: (1) as before, the forecasts perform better in the flooding seasons than in the non-flooding seasons, and their accuracy decreases with increasing lead time; (2) CRPSS is noticeably more sensitive than RCRPS; and (3) the difference between the CRPS-based scores for the flooding and non-flooding seasons increases substantially from wet to arid climates, as illustrated by Fig. 4a for the “very humid” area and Fig. 4f for the “arid” area.

Fig. 4

(a–f) CRPSS in flooding seasons (solid black lines) and non-flooding seasons (dashed black lines) and RCRPS in flooding seasons (solid blue lines) and non-flooding seasons (dashed blue lines) for the six climatic regions. The x-axis indicates lead time and the y-axis covers the range of CRPSS and RCRPS values

3.3 Bias correction effects on model forecast skill

This section aims to identify and correct possible systematic differences between the observations and the ECMWF data and, ultimately, to evaluate whether the predictive skill can be enhanced. Figure 5 shows the results of the bias correction: the improvement in CRPS, CRPSS, and RCRPS after applying the LM and QM methods in the different climatic regions.

Fig. 5

Improvement in CRPSS, CRPS, and RCRPS: CRPSS (a flooding season LM, b flooding season QM, c non-flooding season LM, d non-flooding season QM), CRPS (e flooding season LM, f flooding season QM, g non-flooding season LM, h non-flooding season QM), RCRPS (i flooding season LM, j flooding season QM, k non-flooding season LM, l non-flooding season QM)

3.3.1 CRPS

As shown in Fig. 5, bias correction with the LM method improved CRPS by up to 16% in all climatic regions in the flooding season, but by less than 10% in the non-flooding season. In the flooding season, the largest improvements were observed in the very humid, humid, and Mediterranean regions; in the non-flooding season, they occurred in the humid and semi-humid regions. With the QM method, CRPS improved by up to 35% across the climatic regions.

3.3.2 Comparison between CRPSS and RCRPS

Comparison between CRPSS and RCRPS after bias correction shows that bias correction improved the forecast performance as measured by both indices. As shown in Fig. 5, bias correction improved RCRPS by up to 15% and CRPSS by up to 40%. Liu et al. (2013) also found CRPSS values above zero when evaluating the effect of bias correction on this index, although their results in winter (flooding season) were weaker than those in the other three seasons. The comparison indicates that the QM method performs better than LM in all climatic regions, particularly in the flooding seasons. The CRPSS decreased with increasing lead time, in line with, for instance, Verkade et al. (2013). As the lead time increases from 24 to 120 hr, the benefit of bias correction for the ECMWF forecasts becomes smaller.

3.4 Effect of various precipitation thresholds on ECMWF forecast skills in flooding climatic zones

Evaluating ECMWF forecasts at different thresholds shows how their performance varies across ranges of precipitation. This is particularly important for extreme precipitation, which can lead to floods. Given the characteristics of our study area, this assessment was carried out only for the very humid, humid, and semi-humid climates. These regions are located upstream of the dams (Fig. 1) and together constitute 48% of the basin area. The influence of precipitation thresholds ranging from 25 to 98% on forecast skill was investigated.

For a better comparison of the results, the evaluation was based on CRPSS and RCRPS. As Fig. 6 shows, in all three regions CRPSS decreases with increasing threshold and lead time, while RCRPS shows the opposite trend. Both trends indicate that forecast skill declines as the threshold increases. Because rainy days make up less than 20% of the year, precipitation on most days is almost zero, so the lower thresholds are dominated by dry days. In addition, in the humid and very humid regions, performance decreases sharply for thresholds above 90%, which may be due to the skewness of precipitation in these areas.
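A threshold-based evaluation can be sketched as follows, assuming (the paper does not specify the procedure) that each threshold is a percentile of the observed daily precipitation and that the CRPS is averaged only over days on which the observation reaches that threshold. The synthetic series is hypothetical and has about 17% rainy days, so the 25th and 75th percentile thresholds are both zero and include every day, which illustrates why the lower thresholds carry little information about rain events.

```python
import numpy as np

# ASSUMPTIONS: thresholds are percentiles of observed daily precipitation, and CRPS is
# averaged over the days whose observation is at or above each threshold.


def crps_ensemble(members, obs):
    """Empirical ensemble CRPS: E|X - y| - 0.5 E|X - X'|."""
    x = np.asarray(members, dtype=float)
    return float(np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :])))


def crps_by_threshold(ensembles, observations, percentiles=(25, 75, 95, 98)):
    """Return {percentile: (threshold_mm, n_days, mean_crps)} for days with obs >= threshold."""
    obs = np.asarray(observations, dtype=float)
    ens = np.asarray(ensembles, dtype=float)  # shape (days, members)
    results = {}
    for p in percentiles:
        thr = float(np.percentile(obs, p))
        mask = obs >= thr
        scores = [crps_ensemble(e, o) for e, o in zip(ens[mask], obs[mask])]
        results[p] = (thr, int(mask.sum()), float(np.mean(scores)))
    return results


# Hypothetical 4-year daily series: 243 of 1460 days (about 17%) are rainy
rng = np.random.default_rng(0)
n = 1460
obs = np.zeros(n)
obs[rng.choice(n, size=243, replace=False)] = rng.gamma(0.8, 10.0, size=243)
# Hypothetical 20-member ensemble: multiplicative and additive noise around the observations
ens = np.clip(obs[:, None] * rng.lognormal(0.0, 0.5, size=(n, 20))
              + rng.normal(0.0, 0.5, size=(n, 20)), 0.0, None)

res = crps_by_threshold(ens, obs)
for p, (thr, days, score) in res.items():
    print(f"{p:>2}th percentile: threshold {thr:.2f} mm, {days} days, mean CRPS {score:.3f}")
```

Because the mean CRPS here is in mm, scores at high thresholds would typically be normalized (for example with CRPSS or RCRPS, as in the study) before comparing thresholds or regions.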

Fig. 6

RCRPS variations (line charts) and CRPSS (column charts) in the very humid, humid, and semi-humid climates for lead times of 24, 72, and 120 hr and various precipitation thresholds

4 Conclusions

This paper evaluated the skill of ECMWF’s ensemble forecasts in the Karun River basin in Iran. The study period was divided into flooding seasons (October to March) and non-flooding seasons (May to September). The investigation covered six climatic zones (very humid, humid, semi-humid, Mediterranean, semi-arid, and arid), which directly or indirectly reflect the influence of temperature, altitude, and precipitation in the basin. Forecasts were evaluated for lead times from 24 to 120 hr, in view of the intended applications of this study in flood management and real-time water depletion management for irrigating downstream fields. The main outcomes of this investigation are as follows:

  1. Assessing the forecasting skill across climatic regions showed that forecast performance typically decreases from wet towards dry regions. The very humid region, however, was an exception, which could be attributed to its small area (3500 km² ~ 1% of the basin area). Weak forecasting skill in small areas has also been reported by, for example, Ye et al. (2014).

  2. The ECMWF forecast skill decreases with increasing lead time. In both the flooding and non-flooding seasons, CRPSS is lower in regions with dry climates and higher in wet areas. With RCRPS, however, the differences between climatic regions are more evident.

  3. Bias correction of the ECMWF data using LM and QM substantially improved the forecasts, particularly with QM. Bias correction was more effective in the flooding season, where skill increased by up to 40% based on CRPSS.

  4. Precipitation thresholds are among the most important factors to consider when evaluating the model’s forecast performance in our study area. Because rainy days make up less than 20% of the year, precipitation on most days is almost zero. As the precipitation thresholds increased towards extreme values, the forecast performance of ECMWF deteriorated. Note that this evaluation was carried out for only three climatic zones (very humid, humid, and semi-humid), which provide the main precipitation input to the basin and the dams. The better performance in terms of CRPSS and RCRPS in the humid and semi-humid regions may be attributed to the skewness of the observed precipitation.