1 Introduction

Climate change is expected to influence flood risks in cities due to a shift in the occurrence and magnitude of rainfall extremes (e.g. Fowler and Hennessy 1995). To avoid flood damage and the corresponding costs, estimates of changes in flood risk in response to anthropogenic greenhouse gas (GHG) forcing are required. The design practices for urban drainage structures are based on calculations using sub-hourly rainfall input (Arnell et al. 1984; Olsson et al. 2009; Willems et al. 2012) because pluvial flooding is a major contributor to urban floods. Consequently, any dataset provided for urban flood risk analysis should have a spatial and temporal resolution that ensures an adequate representation of convective rain (Fowler and Hennessy 1995).

Today, most assessments of future changes in urban floods rely on high-resolution regional climate model (RCM) simulations. Precipitation is generated by micro-physical processes at scales that are far below the feasible grid size of even the most advanced climate or weather-forecasting model (Baker and Peter 2008). In addition, the spatial extent of convective clouds (1–10 km²) is smaller than the grid resolution of current climate models. These models therefore include a parameterization scheme specifically for the generation of convective clouds and convective precipitation (Lenderink and Meijgaard 2008). However, RCM skill in simulating precipitation extremes is rarely evaluated at temporal scales dominated by convective generation mechanisms. Lenderink and Meijgaard (2008) found that the RCM RACMO could partly reproduce the observed Clausius–Clapeyron scaling between temperature and precipitation at an hourly time scale. Nevertheless, marked differences still exist when comparing observed extreme areal rainfall intensities with RCM simulations (Hanel and Buishand 2010).

The issues mentioned above point towards a clear gap between RCM outputs and their direct application in water management (Kundzewicz and Stakhiv 2010). The hydrological community has, in response, introduced a variety of statistical downscaling methods, among others summarized by Fowler et al. (2007) and Willems et al. (2012). One of the most straightforward methods is to assume that the relative temporal development in climate model projections can be used in impact analyses (Arnbjerg-Nielsen 2012; Larsen et al. 2009). The shortcomings of this, and other methods, are not to be discussed here, but in general it is recognized that physically-based solutions must be developed by the climate model community (Kundzewicz and Stakhiv 2010). The hydrological community must, in turn, clearly communicate its specific needs.

This study reflects upon two central issues regarding the use of RCM output for evaluation of changes in flood risks. First, we consider the simulation of temporal changes in key design parameters, here represented by the frequency and mean intensities of rainfall extremes. Second, we discuss and suggest two properties that can be used as measures of RCM performance in relation to the simulation of short-duration rainfall extremes.

In Denmark a regional increase of extreme rainfall intensities has been reported on the basis of a dense station network (Madsen et al. 2009). The same network is presently the backbone of a regional extreme point rainfall model applied in urban drainage design (Madsen et al. 2002). Hence, it offers a good basis for comparison when discussing the performance of the present RCM simulations. To represent the variety of RCM output two different RCMs and two different types of forcing are analysed, using rainfall extremes for durations between 1 and 24 h.

2 Datasets

2.1 Observed point rainfall

The station network consists of 70 high-resolution rain gauge stations with 10–31 years of observations in the period 1979–2009. When periods of rain gauge malfunction have been taken into account, the present data material represents 1428 station-years. A particular feature of the network is the very high, but uneven, spatial density of stations, see Fig. 1. The distance between the stations varies between 3 and 370 km. The data have been quality checked, partly by the Danish Meteorological Institute and partly by the authors. The original data resolution is 1 min and 0.2 mm. In the remainder of the paper the abbreviation SVK refers to the observed data.

Fig. 1 The model area. RCM data grids are shown as squares, SVK gauge locations as dots

2.2 RCM simulation outputs

The study uses three versions of RCM simulated rainfall, representing two different models and two different types of forcing. The RCM datasets are:

  1) The regional climate model RACMO version 2.1 (Meijgaard et al. 2008) forced by:

    a) ERA40 re-analysis data, which cover the period 1958–2002. This dataset is abbreviated RACMO-ERA40.

    b) A transient simulation from the global climate model ECHAM5 covering 1950–2100. Until 2000 the external forcing of the climate system in ECHAM5 comes from observed GHG concentrations, and after this year it is based on the A1B scenario. This dataset is abbreviated RACMO-ECHAM.

  2) The regional climate model HIRHAM version 5 (Christensen et al. 2007) forced by ECHAM5, with specifications as described above. This dataset is abbreviated HIRHAM-ECHAM.

All simulations were carried out as part of the EU-funded ENSEMBLES project (van der Linden and Mitchell 2009) and they have identical data structures. The spatial and temporal resolutions are 25 × 25 km² and 1 h, respectively. Denmark is represented by 87 land cells, as shown in Fig. 1.

3 Data treatment

3.1 Identification of independent rainfall events

From the RCM simulations and the observed rainfall series, I(ξ), mean intensities, Y(t)_τ, at time t for a selected duration, τ, are found by a moving average procedure

$$ Y{(t)_{\tau }}=\frac{{\int_{{t-{\tau \left/ {2} \right.}}}^{{t+{\tau \left/ {2} \right.}}} {I\left( \xi \right)d\xi } }}{\tau } $$
(1)

τ-values equal to 1, 3, 6, 12 and 24 h are applied for the four data sets. From Y(t)_τ, maximum mean intensities, z_{τ,i}, are found for each independent rain event, i, that initiates at time step t_{0i} and terminates at t_{1i}

$$ {z_{{\tau, i}}}=Max\left( {Y{(t)_{\tau }},{t_{0i }}\leqslant t\leqslant {t_{1i }}} \right) $$
(2)

In accordance with previous studies, two events are assumed to be independent if the dry weather period between them is longer than or equal to the rainfall duration (Madsen et al. 2002; Madsen et al. 2009). It is generally known that RCMs produce too much drizzle and thereby too long wet spells. To ensure consistency between the data sets, a threshold value of 0.22 mm/h was applied to define a dry hour in the RCM simulated rainfall series. With this threshold the average annual number of rainfall events is approximately the same for all datasets, thereby enabling a comparison of the properties of precipitation extremes across data sets.
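The event identification described in this section can be sketched for a discrete hourly series as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a centred discrete window in place of the integral in Eq. 1, and the convention that an hour is wet when its intensity strictly exceeds the dry threshold are all assumptions. The series is assumed to be longer than the window.

```python
import numpy as np

def moving_mean(intensity, tau):
    """Discrete analogue of Eq. 1: mean intensity over a window of tau steps."""
    kernel = np.ones(tau) / tau
    # mode="same" keeps the series length; the window is centred for odd tau.
    return np.convolve(intensity, kernel, mode="same")

def split_events(intensity, tau, dry_threshold=0.0):
    """Return (start, end) index pairs of events separated by >= tau dry steps."""
    wet = np.flatnonzero(np.asarray(intensity, dtype=float) > dry_threshold)
    if wet.size == 0:
        return []
    gaps = np.diff(wet) - 1                # dry steps between consecutive wet steps
    breaks = np.flatnonzero(gaps >= tau)   # a long enough dry spell ends an event
    starts = np.concatenate(([wet[0]], wet[breaks + 1]))
    ends = np.concatenate((wet[breaks], [wet[-1]]))
    return list(zip(starts.tolist(), ends.tolist()))

def event_maxima(intensity, tau, dry_threshold=0.0):
    """Eq. 2: maximum of the moving mean within each independent event."""
    y = moving_mean(intensity, tau)
    events = split_events(intensity, tau, dry_threshold)
    return [float(y[t0:t1 + 1].max()) for t0, t1 in events]
```

For the RCM series, `dry_threshold` would be set to the 0.22 mm/h value given above; for even `tau` the discrete window is not exactly symmetric around t, which a careful implementation might want to handle explicitly.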

3.2 Peak over threshold

Having time series of maximum mean intensities, extreme values can be extracted using a Peak over Threshold (POT) procedure (Coles 2001). In the present study a fixed threshold was applied for each of the datasets. Previous studies on the SVK data have applied a threshold corresponding to approximately three extreme events per year (Madsen et al. 2002; Madsen et al. 2009). For all RCM data sets a threshold corresponding to a similar rate of occurrence is selected for the period 1961–1990, see Table 1.

Table 1 Applied thresholds [μm/s] in the POT procedure

The thresholds for the two RACMO simulations are comparable and approximately half of the SVK threshold when looking at 1-h rainfall extremes. The difference decreases when the duration increases. The observed relation is most likely due to the fact that point and areal rainfall are compared. The HIRHAM thresholds are three to four times larger than the RACMO thresholds. Such a large difference between two RCMs is not uncommon (e.g. Christensen et al. 2010; Kjelstrup et al. 2011). Some downscaling methods aim at correcting this bias, but a simulation of ‘too high’ rainfall intensities does not necessarily indicate an erroneous simulation of the temporal development or spatial properties of extreme rainfall. Therefore we proceed with the applied thresholds.

Several key-parameters can be estimated from the series of extreme events and applied for the estimation of urban design values (Mikkelsen et al. 1996; Madsen et al. 2002). Here we focus on the number of extreme events and the mean intensity of the extremes.
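The threshold selection described in this section, where a fixed threshold is chosen to give roughly three exceedances per year, can be illustrated with the sketch below. It assumes that the event maxima are pooled over the reference period and that `n_years` is the corresponding number of station-years or grid-cell-years; both the pooling and the function names are assumptions made for illustration.

```python
import numpy as np

def threshold_for_rate(event_maxima, n_years, events_per_year=3.0):
    """Pick a threshold so that about events_per_year * n_years maxima exceed it."""
    z = np.sort(np.asarray(event_maxima, dtype=float))[::-1]  # descending order
    k = int(round(events_per_year * n_years))
    if k <= 0 or k >= z.size:
        raise ValueError("not enough events for the requested exceedance rate")
    # The (k+1)-th largest value: exactly k values lie above it when there are no ties.
    return float(z[k])

def peaks_over_threshold(event_maxima, threshold):
    """Extract the extreme values strictly above the threshold."""
    z = np.asarray(event_maxima, dtype=float)
    return z[z > threshold]
```

With ties at the threshold value the realised rate can deviate slightly from the target, which is consistent with the "approximately three" wording used above.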

3.3 Trend modelling

The development over time is evaluated on an annual basis. Regional averaging is a common procedure when evaluating RCM output at daily and hourly time scales (Kendon et al. 2008; Hanel and Buishand 2010). The uneven distribution of rain gauge stations does challenge this approach, since a trend in an area with many stations could dominate the result. However, a general increase in extreme rainfall intensities in Denmark has been documented using a regional extreme value approach, which accounts for station inter-dependency (Madsen et al. 2009). Hence, we evaluate the annual development of regional averages for Denmark.

Denote by N_{y,s} the number of extreme events in a given year, y, at a given location, s, in a data set with M years of measurement at K different sites. Applying a fixed threshold in the POT method implies that the number of extreme events within a given time period follows a Poisson distribution (Coles 2001). A regional estimate of the annual Poisson frequency is obtained from

$$ {\lambda_y}=\frac{{\sum\nolimits_{s=1}^K {\left( {\frac{{{N_{y,s }}}}{{{l_{y,s }}}}} \right)} }}{K} $$
(3)

where l_{y,s} [years] is the observation period at the specific site in a given year. This type of correction is often applied for observed rainfall extremes due to periods of rain gauge breakdown (Madsen et al. 2002). For the RCM data l_{y,s} is equal to unity for all grid cells for all years.

Denote by z_{i,y,s} the i’th extreme rainfall intensity in a given year, y, at a given location, s, with i = 1, …, N_{y,s}. A regional estimate of the annual mean intensity of the extremes is obtained from

$$ {\mu_y}=\frac{{\sum\nolimits_{s=1}^K {\sum\nolimits_{i=1}^{{{N_{y,s }}}} {{z_{i,y,s }}} } }}{{\sum\nolimits_{s=1}^K {{N_{y,s }}} }} $$
(4)
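Equations 3 and 4 can be evaluated for a single year as in the following sketch. The data layout (one count, one observation period and one list of extreme intensities per site) and the function names are assumptions; sites with no observation in the year are assumed to have been removed beforehand, so every observation period is positive.

```python
import numpy as np

def regional_frequency(counts, obs_years):
    """Eq. 3: average over sites of N_{y,s} / l_{y,s} for one year y."""
    counts = np.asarray(counts, dtype=float)
    obs_years = np.asarray(obs_years, dtype=float)
    if np.any(obs_years <= 0):
        raise ValueError("observation periods must be positive")
    return float(np.mean(counts / obs_years))

def regional_mean_intensity(extremes_by_site):
    """Eq. 4: mean of all extreme intensities pooled over sites for one year y."""
    pooled = np.concatenate([np.asarray(z, dtype=float) for z in extremes_by_site])
    if pooled.size == 0:
        return float("nan")  # no extremes anywhere in this year
    return float(pooled.mean())
```

Note the different weighting in the two estimates: Eq. 3 gives each site equal weight, whereas Eq. 4 gives each extreme event equal weight, so sites with many events contribute more to the regional mean intensity.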

The following regression equations are applied

$$ {\lambda_y}=\exp \left( {{\alpha_{\lambda }}+{\beta_{\lambda }}{t_y}+{\varepsilon_{\lambda }}} \right) $$
(5)
$$ {\mu_y}={\alpha_{\mu }}+{\beta_{\mu }}{t_y}+{\varepsilon_{\mu }} $$
(6)

Poisson regression, which belongs to the family of generalized linear models, is a feasible model for the temporal changes in λ_y and thereby N_y (Strupczewski et al. 2001; Villarini et al. 2011). Note that standard assumptions in the Poisson regression model lead to the exponential term in Eq. 5; for further explanation we refer the reader to, e.g., Faraway (2006). The parameter describing the temporal dependence of λ, β_λ, can therefore be interpreted as the relative annual increase (the exact relative change per year is exp(β_λ) − 1, which is approximately β_λ when β_λ is small). Temporal changes in μ_y are modelled by ordinary linear regression, utilizing that, according to the central limit theorem, the mean of a quantity asymptotically follows a normal distribution.

To evaluate RCM performance in relation to the simulation of temporal development of short-duration rainfall extremes, the two rates of change (β_λ and β_μ) are compared for the four datasets described in Section 2.
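The two regression models in Eqs. 5 and 6 can be fitted as in the sketch below. It is an illustration under stated assumptions rather than the procedure used in the study: the log-linear model is fitted with a hand-written iteratively reweighted least squares (IRLS) loop, the standard fitting algorithm for Poisson regression with a log link, and no standard errors or significance levels are computed. The function names and the centring of the time variable are choices made here.

```python
import numpy as np

def fit_log_linear(t, y, n_iter=50, tol=1e-10):
    """Fit E[y] = exp(alpha + beta * t) (Eq. 5) by IRLS for a Poisson log-link model."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_mean = t.mean()
    X = np.column_stack([np.ones_like(t), t - t_mean])  # centred time for stability
    coef = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        eta = X @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # Poisson working weights equal mu
        new_coef = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    alpha = coef[0] - coef[1] * t_mean   # undo the centring
    return float(alpha), float(coef[1])

def fit_linear(t, y):
    """Fit y = alpha + beta * t (Eq. 6) by ordinary least squares."""
    beta, alpha = np.polyfit(np.asarray(t, dtype=float), np.asarray(y, dtype=float), 1)
    return float(alpha), float(beta)

def annual_percent_change(beta):
    """Exact relative change per year implied by the log-linear slope, in percent."""
    return 100.0 * (np.exp(beta) - 1.0)
```

For a slope of 0.02, `annual_percent_change` gives about 2.02 %, which shows why reading β_λ directly as a percentage-wise annual increase is a good approximation for small rates.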

3.4 Measures of RCM performance

Convective and frontal rain have very distinctive characteristics in terms of spatial extent and duration, because high intensity convective cells are local and have a short lifetime (Austin and Houze 1972). As a rough guideline, short-duration rainfall extremes often originate from convective activity, which predominantly takes place in the summer, while long-duration extremes are dominated by large mesoscale frontal systems and exhibit a less seasonally dominant behaviour (Arnell et al. 1984; Niemczynowicz 1988). However, convective activity also occurs in frontal systems, and it is difficult to separate the two mechanisms on the basis of the duration of the observed rainfall extremes. For a specific extreme event measured at a given station there is no universal means of rainfall type classification. However, an analysis of the spatial correlation structure, based on a set of stations, can identify properties of the extreme rainfall events that may be related to the generation mechanism. As such, differences in the correlation structure for different durations may be an indication of the dominant generation mechanism, and hence the correlation structure is an important feature of rainfall extremes.

Based on the discussion above we suggest two different measures:

  1. 1)

    Comparisons of the seasonal distribution of occurrence of rainfall extremes

  2. 2)

    Comparisons of the spatial correlation structure

The first measure can be obtained by a visual comparison of histograms, while for the second we need a methodology that retains the physical relation between the extreme rainfall events. Mailhot et al. (2007) evaluated the spatial correlation of extreme rainfalls from an RCM for four different durations on the basis of an Annual Maximum Series approach. However, with only one event per site per year, the correlation is not necessarily computed for events which in a physical sense can be regarded as concurrent. Mikkelsen et al. (1996) developed a general framework describing the variability of urban design parameters when these are estimated for POT data from a network of stations. The measure Mikkelsen et al. (1996) refer to as the ‘sampling-error correlation between exceedance mean values’ estimates the correlation between extreme events that originate from the same meteorological phenomenon. Hence the methodology can serve as a measure for comparison.

An estimation of the unconditional correlation, ρ, between the magnitudes of the extreme events observed at two different sites (A and B) requires a pairing of events that in a physical sense can be regarded as concurrent. For the i’th extreme intensity measured at site A, Z_{Ai}, to be concurrent with the j’th extreme at site B, Z_{Bj}, the following should be fulfilled

$$ \left\{ {{Z_{Ai }},{Z_{Bj }}} \right\}:{{\left[ {{t_{0i }}-\varDelta t,{t_{0i }}+\varDelta t} \right]}_A}\cap {{\left[ {{t_{0j }}-\varDelta t,{t_{0j }}+\varDelta t} \right]}_B}\ne \emptyset $$
(7)

where t_0 is the start time of the event and Δt is a lag time introduced to compensate for the travelling time of the weather systems between sites A and B. A suitable Δt must be inferred from meteorological knowledge and hence depends on the rainfall duration in question. In this study we applied a time window of 11 h plus the duration, in accordance with the recommendations of Arnell et al. (1984) and Niemczynowicz (1988). The unconditional covariance is estimated accounting also for extreme events at the two sites that are not concurrent

$$ Cov\left\{ {{Z_A},{Z_B}} \right\}=Cov\left\{ {E\left\{ {{Z_A}|U} \right\},E\left\{ {{Z_B}|U} \right\}} \right\}+E\left\{ {Cov\left\{ {{Z_A},{Z_B}|U} \right\}} \right\} $$
(8)

where U is a stochastic variable that takes the value 1 when concurrent events exist and 0 otherwise. The two terms in Eq. 8 are derived in Mikkelsen et al. (1996) on the basis of the probability of U taking either of its two possible values. ρ is obtained by dividing the estimated covariance in Eq. 8 by the product of the sampling error standard deviations estimated from the series of extreme values observed at the two stations. In a final treatment, estimates of ρ are averaged in selected bins as a function of distance between sites and fitted by an exponential model.
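The pairing criterion in Eq. 7 can be sketched as follows. The two closed intervals of half-width Δt around the start times overlap exactly when the start times differ by at most 2Δt, which is the test used here. The function name and the plain-list data layout are assumptions, and how the time window quoted above maps onto Δt is left to the caller. The sketch returns every pair that satisfies the criterion and does not resolve cases where one event matches several events at the other site.

```python
def concurrent_pairs(starts_a, starts_b, delta_t):
    """Index pairs (i, j) whose start-time windows overlap according to Eq. 7.

    starts_a, starts_b: event start times at sites A and B, in the same time unit.
    delta_t: lag time (half-width of the window around each start time).
    """
    pairs = []
    for i, t_a in enumerate(starts_a):
        for j, t_b in enumerate(starts_b):
            # [t_a - dt, t_a + dt] and [t_b - dt, t_b + dt] intersect
            # if and only if |t_a - t_b| <= 2 * dt.
            if abs(t_a - t_b) <= 2 * delta_t:
                pairs.append((i, j))
    return pairs
```

The events that end up unpaired are the ones that contribute to the U = 0 case in Eq. 8.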

4 Results and discussion

4.1 Trends in frequency and intensity of rainfall extremes

The temporal development in the annual frequency and annual mean intensity of the extremes is shown in Figs. 2 and 3 for 1 and 24 h duration. Estimates of the rates of change (β_λ and β_μ) are given in Table 2 for all durations, together with their standard errors and the corresponding significance level.

Fig. 2 Temporal development in the annual number of extreme events modelled by Poisson regression according to Eq. 5. The development is exponential, and the slope represents the percentage-wise annual increase. The figures in the first, second and third columns compare the observed and re-analysis-driven increases, the RACMO runs and the ECHAM forced simulations, respectively. Note the difference in time period on the x-axis

Fig. 3 Temporal development in annual mean intensity of the extremes modelled by linear regression, according to Eq. 6. The figures in the first, second and third columns compare the observed and re-analysis-driven increases, the RACMO runs and the ECHAM forced simulations, respectively. Note the difference in time period on the x-axis and the difference in scale on the y-axis

Table 2 Rates of change, β_λ and β_μ, using time as independent variable in the regression models

The frequency of extremes shows a significant increase for all datasets and all durations when a significance level of 5 % is applied (see Table 2). There is a tendency towards a decrease in the significance level when the duration increases, but the estimates of β_λ show no duration-based variations. Dividing the data into observations/re-analysis-forced (SVK and RACMO-ERA40) and ECHAM-forced (RACMO-ECHAM and HIRHAM-ECHAM) sets, data from the first group exhibit a substantially larger rate of change. β_λ is also larger for SVK data than for RACMO-ERA40 data, but considering the bounds of uncertainty the two rates cannot be statistically differentiated. RACMO-ECHAM and HIRHAM-ECHAM rates are within the same order of magnitude, but for 1-h extremes the difference is large enough to produce a visible difference in the predicted annual number of extreme events at the end of the century, see Fig. 2.

The rate of change for annual mean extreme intensities, β_μ, is significantly different from zero for the two ECHAM-forced models for all durations when a significance level of 5 % is applied. However, β_μ differs substantially between RACMO-ECHAM and HIRHAM-ECHAM, the latter showing a significantly larger increase over time but also higher annual variations (see Fig. 3). Furthermore, there is a tendency towards a decrease of the rate when the duration increases. For SVK a significant increase is seen for 1 and 3 h, the latter only at a significance level of 10 %. This development is not reflected by the RACMO-ERA40 simulation. For longer durations neither data set shows significant changes in the intensities.

The interpretation given above implicitly assumes that a direct comparison of changes in annual frequencies and mean intensities of precipitation extremes between point observations and spatially averaged data is possible. This is a standard assumption in many statistical downscaling methods (Arnbjerg-Nielsen 2012; Willems et al. 2012) and seems to be reasonable also in the current analysis. The observation period of the SVK data may be so short that the increase could be a result of an inter-decadal oscillation or of changes caused by factors other than GHG emissions. Based on a long historical record from Belgium, Ntegeka and Willems (2008) identified a cyclic pattern with a period of 30–50 years and showed that the present high occurrence of precipitation extremes can partly, but only partly, be attributed to such a cyclic pattern. This leads to the hypothesis that the common patterns of SVK and RACMO-ERA40 can be attributed to the presence of an inter-decadal variability, which currently leads to more frequent extreme precipitation. However, for β_μ there is a difference between SVK and RACMO-ERA40 for the short durations and between RACMO-ECHAM and HIRHAM-ECHAM for all durations. This could indicate that changes in the magnitude of short-duration rainfall extremes are either less dominated by inter-decadal variability or masked by the individual RCM's ability to simulate precipitation extremes.

In summary, our analysis of both observed and RCM simulated rainfall data points towards increased frequency and magnitudes of the extremes. Important issues have been raised above regarding both the future levels of increase and the RCMs’ ability to predict it, but the observed series are too short to allow an interpretation of the underlying mechanisms generating the changes over time.

4.2 Seasonal distribution of occurrences

Figure 4 shows the seasonal distribution of the extreme events for the four data sets. The seasonal distribution based on a duration of 3 h is similar to the results presented for 1 and 6 h. The histograms are based on data from the period 1979–2009, except for RACMO-ERA40 where data from 1958 to 2002 are used. It has been verified that the difference in evaluation period has no influence on the distributions.
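A seasonal distribution of this kind can be tabulated from the calendar months in which the extreme events occur, as in the sketch below. The grouping into meteorological seasons (December to February as winter, and so on) and the function name are assumptions; the binning behind the histograms in Fig. 4 may differ.

```python
from collections import Counter

# Assumed grouping of calendar months into meteorological seasons.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def seasonal_fractions(event_months):
    """Fraction of extreme events falling in each season."""
    counts = Counter(SEASON_OF_MONTH[m] for m in event_months)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events supplied")
    seasons = ("winter", "spring", "summer", "autumn")
    return {s: counts.get(s, 0) / total for s in seasons}
```

Working with fractions rather than raw counts makes the distributions comparable across data sets with different record lengths.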

Fig. 4 Seasonal distributions of the occurrence of extreme rainfall events. For SVK, RACMO-ECHAM and HIRHAM-ECHAM data these are evaluated for the period 1979–2009. For RACMO-ERA40 data from 1958 to 2002 is used

The duration-dependent shift in distribution for SVK is in accordance with our expectations in relation to the dominant mechanisms behind the rainfall type. For the short durations the RACMO-ECHAM data show a reasonable agreement with the SVK data, while the HIRHAM-ECHAM data have a general bias towards too many occurrences in the spring. For the longer durations both RACMO-ECHAM and HIRHAM-ECHAM are biased towards too many occurrences in the spring and too few in the summer and autumn. A comparison between RACMO-ECHAM and RACMO-ERA40 data shows that the seasonal distribution of occurrences is partly controlled by the RCM forcing. For 1-h data the RACMO-ERA40 and SVK distributions are similar. However, looking at the longer durations the RACMO-ERA40 distributions do not resemble the SVK distributions well. The main difference to be mentioned is the extensive occurrences of winter extremes in the RACMO-ERA40 simulation.

In summary, the RACMO data compare better to the observed seasonal distribution of extreme events for short durations than the HIRHAM data. For longer durations none of the RCM simulations show a satisfactory agreement with the seasonality of the SVK data; however, RACMO-ECHAM and HIRHAM-ECHAM data perform similarly. This suggests that the choice of RCM, and thereby the convective scheme, controls the seasonality of precipitation extremes for short durations, while the external forcing becomes more important for longer durations.

4.3 Spatial correlation structure

The spatial correlation structure is computed according to the method described in Section 3.4, and a comparison is made between the structure obtained for SVK data, RACMO-ECHAM data and HIRHAM-ECHAM data, see Fig. 5. The results for a duration of 3 h confirm the findings presented in Fig. 5. From comparisons between RACMO-ECHAM and RACMO-ERA40 (not shown) it has been confirmed that the structure of the spatial correlation is only marginally affected by the type of forcing. To help the visual comparison of the correlation curves, Table 3 contains estimated e-folding distances, i.e. the distances at which the correlation has decreased to 1/e.
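The binning of correlation estimates and the estimation of an e-folding distance can be sketched as follows. The sketch assumes a single-parameter model ρ(d) = exp(−d/d0), for which d0 is the e-folding distance, and fits it by least squares on log(ρ); the exact form of the exponential model and the fitting details used in the study are not stated here, so both are assumptions, as are the function names.

```python
import numpy as np

def bin_average(dist, rho, edges):
    """Average correlation estimates within distance bins; empty bins are skipped."""
    dist = np.asarray(dist, dtype=float)
    rho = np.asarray(rho, dtype=float)
    idx = np.digitize(dist, edges) - 1       # bin index of each site pair
    centres, means = [], []
    for b in range(len(edges) - 1):
        mask = idx == b
        if mask.any():
            centres.append(0.5 * (edges[b] + edges[b + 1]))
            means.append(rho[mask].mean())
    return np.array(centres), np.array(means)

def e_folding_distance(dist, rho):
    """Fit rho = exp(-d / d0) through least squares on log(rho) and return d0."""
    d = np.asarray(dist, dtype=float)
    r = np.asarray(rho, dtype=float)
    keep = r > 0                              # the logarithm needs positive values
    slope = np.sum(d[keep] * np.log(r[keep])) / np.sum(d[keep] ** 2)
    return float(-1.0 / slope)
```

Fitting on the log scale gives more weight to small correlations at large distances than a direct least-squares fit of the exponential curve would, so the two approaches can return somewhat different e-folding distances for noisy data.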

Fig. 5 The spatial correlation for observed (SVK) and simulated (RACMO-ECHAM and HIRHAM-ECHAM) mean intensities of extremes for 1, 6, 12 and 24 h duration. An exponential model is fitted to highlight the tendencies using a least-squares technique

Table 3 Estimated e-folding distances [km]

For SVK data shorter correlation lengths are obtained for shorter durations, see Table 3. This behaviour is again in accordance with our knowledge about the mechanisms behind the different rainfall types. However, this is not the case for data from the two RCMs, which both show larger spatial correlations than the SVK data. For RACMO data shorter correlation lengths are obtained for longer durations. For HIRHAM data the correlation lengths remain relatively constant, but HIRHAM does show shorter correlation lengths than RACMO for short durations. This difference cannot be attributed to a better HIRHAM simulation of short-duration extremes, since the seasonal distribution of occurrences in HIRHAM data was at the same time found to be biased, i.e. too many of the short-duration extremes occurred in the spring (see Section 4.2).

For a duration of 24 h the correlation models for SVK, RACMO-ECHAM and HIRHAM-ECHAM data are virtually identical. A disparity is still visible for short distances, but this could be attributed to the fixed grid of the climate model output, the uncertainty that lies within the statistical treatment, or an effect of the spatial averaging in the RCM data that may influence neighbouring grid points.

For 1 and 6 h duration there is a clear difference in the spatial correlation pattern between the SVK and RCM data sets. This conclusion is drawn even when acknowledging that we compare point and areal rainfall extremes. Although spatial averaging will reduce the variance of a set of data, it is reasonable to assume that the correlation structure between grid points will be scale invariant, perhaps with the exception of neighbouring grid points. Correlation curves for observed areal rainfall extremes can in turn be misleading due to the introduced interpolation error, which is of high importance for extremes (Arnell et al. 1984; Hofstra et al. 2010). Considering a situation where unbiased areal rainfall extremes are available, we expect that differences in correlation structure would be found only at short ranges. Grid cells more than 50 km apart should not be affected by the spatial averaging, so this points towards inadequacies in the correlation structure for short durations, and hence in the convective parameterization of the RCM.

In summary, it seems that neither the RACMO nor the HIRHAM RCM can reproduce the observed correlation structure. This is probably related to different technical properties of the models. RACMO and HIRHAM show many differences both in relation to the mean intensity of extremes (see Fig. 3) and in relation to the seasonal distribution of the extremes (see Fig. 4).

On the positive side, the estimated spatial relationships were found easy to compute and interpret. Further work is needed in order to develop a more specific measure, e.g. based on the e-folding distances as suggested in Table 3. Together with an evaluation of the seasonal distribution of the extremes, we believe that this type of measure possesses high potential as a new tool in the validation process of climate projections, most importantly for evaluation of climate models but also for evaluation of statistical downscaling procedures.

5 Conclusions

The analysis shows an increase in the frequency of observed point rainfall extremes for durations between 1 and 24 h in the period 1979–2009. The increase is to some extent reproduced by the RACMO RCM when forced by ERA40 re-analysis data. The two sets of data also show similar results with respect to changes in the annual mean intensity of rainfall extremes, except for high temporal resolutions (1 and 3 h) where the SVK data exhibit larger changes than the ERA40 driven dataset. Similar analysis of two ECHAM forced RCM simulations shows significant increases in the frequency and mean intensity of extremes over a 150-year period. The change in the frequency is significantly lower than in the observed series, while the changes in the annual mean intensity of extremes differ between the two RCMs, with the HIRHAM change being larger than that in the observed series.

For a specific extreme rainfall event there is no universal means of identifying the generation mechanism behind it. However, the season of occurrence and the spatial correlation structure of the extreme rainfall events are both indirectly related to the rainfall type. Therefore histograms of the seasonal distribution of occurrence of rainfall extremes and the spatial correlation structure were computed and compared for observed point rainfall extremes and RCM simulations. The results showed that both the HIRHAM and RACMO RCMs have difficulties in capturing the spatial correlation structure of observed data for temporal resolutions of 1 and 6 h. This conclusion is independent of the RCMs’ ability to reproduce a realistic seasonal distribution for the occurrence of extremes. Overall it leads to reduced confidence in the RCMs’ ability to predict future change in short-duration rainfall extremes.

As major impacts of increased pluvial flood risk are expected to occur for sub-hourly and hourly rainfall durations, we need an increased focus on the possible change in convective rainfall as a response to GHG forcing. The preliminary conclusions from the present study suggest that a reinforced focus is needed within this area. Our suggested additional criteria for evaluating the performance of RCMs and other downscaling methods are the seasonal distribution of occurrence and the spatial correlation structure of rainfall extremes at high temporal resolutions, such as 1 h. The result of such an effort would be highly useful for prediction of changes in the hydrological cycle.