1 Introduction

As climate changes under human influence, a need has arisen for tools to estimate the nature and amplitude of future changes in precipitation. Sun et al. (2006) analysed daily precipitation data from 14 AGCMs that used the SRES A2 scenario (Nakićenović et al. 2000) to evaluate potential future changes and found a global shift toward more intense and extreme precipitation during the late twenty-first century. Dessai et al. (2005) estimated uncertainties in regional climate projections applying skill scores to seasonal precipitation data and found that the differences in emission scenarios dominates the precipitation uncertainty at the 95th percentile.

In the European framework project PRUDENCE, AGCMs are used to provide low resolution boundary conditions for a region inside which a detailed regional climate model (RCM) is run (Christensen et al. 2007). The models are validated using current climate and then used for making climate projections. Here, we use daily precipitation output from an ensemble of RCMs to assess future precipitation projections. We present ways to reliably determine the ability to simulate daily precipitation statistics based on a comparison with observed precipitation, before using the models to assess future changes. This is done using a method based on statistical metrics specially produced for the task. The nature of the metrics used to intercompare models and reality takes into account the properties of precipitation statistics. In particular, the metrics focus on the spectrum of precipitation amount versus daily intensity.

A model matching the observations for the full spectra of precipitation values indicates that the model is likely also to capture the full range under climate change, including the tail of the distribution where the statistics often are not well-sampled. Furthermore, precipitation statistics vary from region to region based on the type of climate. Perkins et al. (2007) suggest using a metric where probability density functions (PDFs) are constructed to generate a ‘match metric’ based on the overlap of normalised PDFs (model and observational precipitation, respectively). A perfect overlap results in a skill score of one while the score is closer to zero for poor overlaps.

Note that the approach of using PDFs as our method of comparison is focused on the models’ ability to optimally describe the full precipitation distribution. Although the PDF technique implicitly includes moments like the average and standard deviation, it does not directly describe their possible differences versus observations. Furthermore, the tail of the PDFs has only a small influence on the skill score due to a relatively small contribution from extreme events.

The analysis approach described here will be subsequently used on RCM data from the ENSEMBLES project (Hewitt 2005). In this study we are aiming at quantifying the error over the entire PDF, using only one out of several metrics that will subsequently be used for ranking individual ENSEMBLES RCM parameters.

Robustness of the optimal metrics must be estimated so that we can state the quality of our answers. Here we use a Monte Carlo method based on bootstrapping with replacement to evaluate the robustness of the skill scores. Using the results from the robustness test, we then consider the technique of the crossing-point statistics (Gutowski et al. 2007) in order to address the implications under climate change conditions. Gutowski et al. (2007) examined changes in normalised precipitation intensity spectra for several regions in the United States under global warming as produced by two RCMs. They found a single percentile separating decreasing from increasing contributions to the normalised spectrum and that it occurred around the 70th percentile of cumulative precipitation, irrespective of the governing precipitation processes or which model produced the simulation. They further suggested that this should be a robust behaviour when the intensity spectra adhere to a gamma distribution. Here we seek to demonstrate that this result also applies for Europe, and is a robust result not depending on model formulation. Our working hypothesis is as climate changes, the crossover point separating increasing from decreasing contributions to the normalised spectrum can be a basis for relating precipitation changes with a temperature change.

The structure of this paper is as follows. Section 2 presents the model data and the observations used in this study. In Sect. 3, we present the results of assessing model quality from a match with observations. In Sect. 4, we use these highest scoring models to determine any possible scenario changes in precipitation distribution. The paper ends with a discussion in Sect. 5 and conclusions in Sect. 6.

2 Data

Two different data sets are analysed in this study. The first data set consists of daily precipitation from ten PRUDENCE RCMs driven by the GCM model HadAM3H (Pope et al. 2000; Buonomo et al. 2007), including control (1961–1990) and scenario (2071–2100) runs for each (Christensen and Christensen 2007). Control runs are forced by observed SSTs and observed greenhouse gases. All of the simulations we use here have 50 km horizontal resolution, and where future scenarios are discussed we refer to the SRES A2 scenario (Nakićenović et al. 2000). The different models used in this study and their parent institutes are listed in the leftmost column of Table 1.

Table 1 Results of robustness analysis for wet days for ten different RCMs all driven by the HadAM3H GCM

The second data set in this study is based on measurements monitored by the European Climate Assessment (ECA) project (http://eca.knmi.nl/). ECA was initiated to combine, present and analyse daily series of meteorological measurements (Klein Tank et al. 2002). In this study, we examine climate change by analysing series of daily observations of precipitation at ECA stations within nine European sub-regions with varying climatology (see Fig. 1). Stations with fewer than 3,000 daily measurements during the period 1961–1990 were arbitrarily excluded from further analysis due to poor sampling. This left a data set containing observations from 455 stations used in this study. Note that the distribution of stations is inhomogeneous within some sub-regions and sparse for some regions.

Fig. 1
figure 1

Geographical position of ECA compiled precipitation stations (marked by times symbol) and extent of eight predefined European PRUDENCE sub-regions marked by solid lines (Christensen and Christensen 2007). These regions are British Isles (BI), Iberian Peninsula (IP), France (FR), Mid-Europe (ME), Scandinavia (SC), the Alps (AL), the Mediterranean (MD), and Eastern Europe (EA). A ninth region (dashed lines), defined as the maximum area included in all the ten PRUDENCE models, termed ‘EUx’ will also be used in this work. This region covers latitudes 41.0°N–59.5°N and longitudes 10.5°W–30.0°E. Note that region SC is not covered well by models RegCM and PROMES while the IP and MD regions are not well covered by the HIRHAM_M model

Because we want to compare the model data to observations, PDFs are constructed for both data sets. Dry days are defined as days with precipitation less than 1 mm and these days are removed from the two precipitation data sets. Observations of weak intensity precipitation are not always reliable and low intensity precipitation does not contribute significantly to the total amount of precipitation (Dai 2001). Furthermore, low rainfall has a small societally impact through flash floods, etc. Then PDFs are calculated from the daily land data for each region and model through three steps. First the data are put into bins of 1 mm day−1 width starting at 1 mm day−1. Secondly each bin’s frequency of events is multiplied by the bin average to get the precipitation amount. Finally the binned data are normalised by dividing each bin value with the total wet day precipitation amount for the 30 year period for the particular region and model run in question. The same procedure is carried out for the observational data to enable a direct comparison with the PRUDENCE RCM control runs.

3 Metric analysis

Figure 2 compares binned normalised precipitation distributions for the ten PRUDENCE models driven by the same GCM, for all nine sub-regions, with ECA precipitation data for the period 1961–1990. Note that the plots are truncated at bin 110 mm day−1 for the sake of clarity. With this truncation at 110 mm day−1, a total of about 0.01% of the daily values is omitted and a total of approximately 0.23% of the precipitation amount is omitted.

Fig. 2
figure 2

Comparison between ECA compiled precipitation observations (black) with precipitation data from ten PRUDENCE RCM models (grey) for the period 1961–1990. The panels show distributions for the nine sub-regions. Arrows mark the 95th percentile of accumulated observed precipitation. The vertical lines in the top part of each panel represent the 95th percentile of accumulated precipitation for the various models covering the sub-region in question

The models’ poorest performance versus the observed precipitation distribution occurs for the low-latitude region 2 (IP). Low performance also occurs for the region 7 (MD). The relatively few days with observed light precipitation is not modelled well for region IP and to some extent also regions MD and AL. These three regions are, according to the relatively small slope of the observational PDFs compared to the other regions’ PDFs, characterised by long periods of drought or only light precipitation interspersed with heavy showers.

As seen in Fig. 2, extreme precipitation is, as expected, more difficult to simulate. One reason for this is due to the 50 km horizontal resolution of the RCMs that smooth any single-point extremes occurring in the observational data. A conversion to gridded data for the observations would decrease its extreme precipitation. The comparison between model data and observations presented here is mainly a demonstration case to the upcoming study using ENSEMBLES RCM data validated using a gridded dataset of observed precipitation (Haylock et al. 2008).

Also given in Fig. 2 are the 95th percentiles of accumulated precipitation, both for observations and model simulations. The model percentiles for the low-latitude regions IP, AL, and MD have a relatively large scatter compared to the other regions. For regions IP and FR no model percentile is able to match the observed percentile due to dissimilar PDFs.

To determine each model’s precipitation skill we compare each PRUDENCE model against observed precipitation using the skill score metric of the degree of overlap of normalised precipitation intensity PDFs (Perkins et al. 2007). The PDF overlap as a function of sub-region for all ten models is shown in Fig. 3. We see that the RegCM, HadRM and CLM models have high scores for most regions and that the lowest scores occur for region IP. Furthermore, the spread in skill score for the ten models is relatively small for region BI.

Fig. 3
figure 3

Results of comparing ten PRUDENCE models, all driven by the HadAM3H GCM, for precipitation against ECA precipitation data, using daily values for the period 1961–1990. Plotted is the common PDF area as a function of sub-region (cf. Fig. 1) for all ten models. The highest skill score for each region is BI 96.6%, IP 82.3%, FR 95.3%, ME 96.6%, SC 97.7%, AL 93.6%, MD 92.3%, EA 94.9%, and EUx 97.5%

The European sub-regions are chosen in order to be consistent with previous analyses (e.g. Christensen and Christensen 2007). One example where the choice of region might not be physically justified is the SC region, where the precipitation in western Norway is much more extreme than that in the rest of the region. The effect of dividing the region into two smaller regions depending on the skewness of the frequency PDF, i.e. separating the parts of Norway with extreme precipitation, was explored but the ranking of the models was not affected by this (Boberg et al. 2007). The European region was also divided into a Northern and a Southern region, with the same result (Boberg et al. 2007). This suggests that the results are not dependent on the choice of regions. This is an encouraging sign of robustness in the overall method. Seeking an optimal subdivision is not pursued here, though we note that the models that do best in the SC region (and in the subdivisions thereof described above) also do well in the AL region (see Fig. 3) which has similar orography to western Norway—i.e. it is a region with a similar mechanism for driving precipitation—and this is at least a consistency check of the finding.

The robustness of the single model skill scores shown in Fig. 3 is evaluated by a Monte Carlo method based on bootstrapping with replacement (Efron and Tibshirani 1993). From the observations we have a number of time series for each region, and by making random selections of these time series we can evaluate how sensitive the skill score is to changes in the sampling of observations. The selections are here repeated 10,000 times for a good statistical description of the variations in the skill scores for the different models, and we can determine the robustness of the ranking of the individual models. We can thus assess whether or not one model can score significantly higher than the others.

In Table 1 we show the percentage distributions of models’ probabilities of ranking the highest, across all regions, for 10,000 Monte Carlo bootstrap trials. The values given in bold refers to the models having the highest skill score for each region in Fig. 3. Once again we see that the models CLM, HadRM and RegCM score well. We also note that the best performing models from Fig. 3 are clearly high ranked for all regions. The results, therefore, are fairly robust.

4 Crossing-point statistics

We use the method of Gutowski et al. (2007) to study changes in precipitation over long time scales. Daily precipitation PDFs for the HIRHAM_D experiment for all land grid points are shown in Fig. 4 for both the control run and the scenario run. We show here only the result for the HIRHAM_D model (of the Danish Meteorological Institute) since all models give similarly shaped curves as indicated by Table 2. Similar to the results by Gutowski et al. (2007) and Sun et al. (2007), the plot in Fig. 4 indicates that days with moderate precipitation (less than about 10 mm) will contribute less to the total precipitation in the future scenario while days with higher (more than about 10 mm) precipitation will have a larger contribution. The crossover point x c for this transition is found by subtracting the control precipitation from the scenario precipitation (see Fig. 5) and determining where the curve goes from negative to positive values. The related transition percentile, \(P_{x_{\rm c}},\) is calculated as the accumulated precipitation for the control run from 1 to x c mm day−1 divided by the total precipitation.

Fig. 4
figure 4

Binned normalised precipitations for the full HIRHAM_D land region covering 4,150 grid points. Square symbols refer to the 1961–1990 control run and pluses represent the 2071–2100 scenario run. The arrow marks the crossover point x c at 10.4 mm day−1 separating increasing from decreasing contributions to the normalized spectrum

Table 2 Characterising parameters for the scenario minus control difference for the ten RCMs (cf. Fig. 5 for the HIRHAM_D model) using their full individual land region
Fig. 5
figure 5

Binned normalised precipitation change (scenario minus control run) for the full HIRHAM_D land region. The error bars are calculated using Eq. 1. The curve’s crossover point x c is 10.4 mm day−1 with a corresponding crossover percentile \(P_{x_{\rm c}}\) of 69.2%. The percentile value P max of 99.7% relates to the largest significant PDF change x max of 64 mm day−1 as shown by the inset (see Table 2)

The error bars, σ i , in Fig. 5 are calculated by propagation of errors assuming independence between PDF bin values for the control and scenario runs:

$$ \begin{aligned} \left({\frac{\sigma_i} {\Delta_i}}\right)^2 &= E_{C}^{2} + E_{S}^{2} \\ &= {\frac{M} {N_{C,i}\, {U_{C}}}} + {\frac{M} {N_{S,i}\,{U_{S}}}}. \\ \end{aligned} $$
(1)

The left hand side of Eq. 1 is the squared relative error of the normalised precipitation change (scenario minus control run), where Δ i is the normalised precipitation change plotted in Fig. 5. E 2 C and E 2 S are the squared relative errors of the normalised precipitation for each run, assuming Poisson statistics, corrected for spatial correlation. M is the number of land grid points for the region in question, N i is the number of values in bin i and U is the number of uncorrelated areas in the region in question. U is found by calculating the autocorrelation between all land grid points. All grid points with a correlation above e −1 relative to the grid point in question are defined as belonging to the same correlated area. U is then given by the total number of land points in the specific region divided by the average size of the correlated area. This is done for the control run and scenario run separately. For the HIRHAM_D model, M = 4,150 and U is 29.6 and 27.0 for the control and scenario periods, respectively, and we see in Fig. 5 that the amplitude of the precipitation change is significant.

As indicated by the error bars in the insert, the normalised precipitation PDF change for the full land region in Fig. 5 is no longer significantly different from zero at 64 mm day−1 (99.7 percentile). The analysis given in Figs. 4 and 5 were performed for all ten RCMs with their full individual land regions, and the characteristic values for all these models are summarised in Table 2. The main effect of considering the full land region instead of only the maximum land region common to all models is that the error bars are roughly 20% smaller. This error reduction in turn is found to increase x max by about 10% and P max by about 0.5 percentage points.

The overall shape as well as the amplitude of the curves are found to show strong similarities for all PRUDENCE RCMs (cf. Table 2). The crossover point x c is similar for all models with the possible exception of the HadRM model with x c shifted towards lighter precipitation. The crossing percentile is roughly 60% in most models, which agrees better with the theory presented in Gutowski et al. (2007) than the model results they analysed. This may be due in part to the longer sampling periods used here (30 years) versus the 9-year periods they analysed. Investigating the crossover points for the individual sub-regions gave values in the interval 9–13 mm day−1 for all models and for all sub-regions except IP, AL, and MD. Here the values were in the range 14–28 mm day−1 (Boberg et al. 2007).

Table 2 shows that the largest significant PDF change is in the range 64–90 mm day−1 at an average percentile well above 99%. Note that when comparing the error bars in the insert of Fig. 5 with the variance of the PDF change, we see that the error estimates are conservative (i.e. larger than the given variance of the curve) leading to an underestimation of x max and P max.

Also given in Table 2 is the threshold for extreme events, adopting the definition by Liebmann et al. (2001) as days with precipitation exceeding 4% of the mean annual total precipitation for the region in question. This definition accounts for spatially varying climatologies and the 4% threshold is chosen to ensure that extreme events occur relatively rarely. We see that the largest significant PDF change for the full PRUDENCE model regions is clearly above the threshold for extreme precipitation. The rightmost column of Table 2 gives the average number of days N max per year with precipitation exceeding x max, after correction for the number of uncorrelated areas within the individual full land regions. N max is in the range 2–7 events per year, making it another measure of extreme precipitation.

By dividing the 30-year long scenario period into three 10-year periods we investigate how x c changes with time relative to the control run (cf. Fig. 5). Plotted in Fig. 6 are the precipitation distribution changes for all nine sub-regions. We use the optimal models given in Table 1 for the nine sub-regions separately, but the results are similar using the other models (not shown). The error bars in Fig. 6 are calculated using Eq. 1, where M is between 124 and 1,608 depending on the size of the sub-region and U is between 1.8 and 12.3 depending on the size and extent of the sub-region in question. No region shows any clear temporal change in x c which is in accordance with scenario precipitation theory where this consistent change can be described by a gamma distribution having a single transition point between precipitation rates that contribute more/less to the total precipitation (Gutowski et al. 2007). Notable is that the amplitude of the difference between the 10-year scenario periods and the control run is increasing with time. As the mean temperature over most of the European domain is slowly increasing within the 30 year time slice 2071–2100, this is an indication that changes in the precipitation intensities are approximately linearly linked with the degree of warming. Moreover, this dependency is in general strong enough to overcome natural variability. Exceptions to this general behaviour are seen for the AL and EA sub-regions where no clear temporal change is seen.

Fig. 6
figure 6

Binned normalised precipitation change (scenario minus control run) for three 10-year scenario periods compared to the 30-year control period. The panels represent the individual result for the nine sub-regions, using the optimal models given in Table 1, separately

For the six regions and models with a distinct PDF change profile in Fig. 6 similar to the one given in Fig. 5, we calculate x max and x e (cf. Table 2). The values are BI: 37 (41) mm day−1, FR: 38 (30) mm day−1, ME: 38 (37) mm day−1, SC: 52 (32) mm day−1, EA: 46 (31) mm day−1, and EUx: 65 (25) mm day−1. With the exception for region BI, the PDF changes described in Fig. 5 and Table 2 are also valid for extreme precipitation for individual sub-regions, using the definition by Liebmann et al. (2001).

5 Discussion

Extreme precipitation has important effects on society and, therefore, is it of great interest to determine the extreme precipitation levels with significant scenario change. In this study we used bin sizes of 1 mm day−1 when calculating the PDFs. According to Eq. 1, the error in PDF change will decrease when the number of values in a particular bin increases. By making the bin errors smaller, thereby increasing the largest significant PDF change x max, we make an attempt to find the maximum precipitation for which the found precipitation change is significant. We, therefore, calculated PDFs using bin sizes of 1, 2, 3, 5 and 8 mm day−1, respectively, for the HIRHAM_D model. The corresponding x max values for these bin sizes are 64, 73, 79, 88 and 100 mm day−1 (cf. Table 2). The relative increase in x max for bin sizes above 8 mm day−1 turned out to be smaller than the relative increase in bin size and, therefore, not of further interest. The equivalent values for P max and N max (cf. Table 2), for HIRHAM_D, when using PDF bin sizes of 8 mm day−1 are 99.95% and 0.34, values that characterise very extreme conditions.

The results presented here can now be used in two ways. First, we can estimate precipitation PDFs for the scenario period 2071–2100 using models with varying scores for the control period and then use the robustness of change in the PDFs to assess future precipitation distribution. Secondly, we can use the analysing and validation methods described in this study on the upcoming ENSEMBLES RCM models (Hewitt 2005) and make similar evaluations for the full twenty-first century using its model data. Furthermore, the ENSEMBLES RCM data will be validated using a high-resolution observational dataset specially produced for the task (Haylock et al. 2008).

In the present analysis, no attempt has been given to address seasonal variation in the ability to simulate precipitation intensities. However, as found by Frei et al. (2003) there appears to be substantial seasonality in the ability to simulate the daily precipitation statistics. In the same study, no hint toward specific errors of particular physical parameterisations as regards the simulation of daily precipitation statistics was identified. In fact, two models sharing similar parameterisation packages (HIRHAM_D and REMO) differed considerably in their representation of the statistics, while the models with similar error structure in all statistics (CHRM and REMO), shared the same dynamical core, but differed in parameterisation package. Both results are confirmed by the present study. This intriguing result suggests that the simulation of daily precipitation statistics is sensitive to a variety of model components including the dynamics, physical parameterisation, and their interplay. An in depth analysis of this finding is beyond the scope of the present work.

Our analysis shows that for all sub-regions we find that the robustness of the crossing-point exists. As the different regions experience different changes in mean annual precipitation (Boberg et al. 2007), there is a clear indication that our finding is suggesting that a detection of such a signal in observations is indicative of the change being related to temperature changes.

6 Conclusion

All of the RCMs analysed in this study show distinct similarities with observations for moderate precipitation intensities during the control period 1961–1990. The largest discrepancies were found for days with light precipitation (below ∼5 mm day−1) and for days with extreme precipitation (above ∼80 mm day−1). With the exception of region IP, the overlaps for the highest scoring models were between 92 and 98%. These similarities have been validated and confirmed using a robustness test. No single model was found to outperform all other models in terms of the metrics we have chosen here. However, models CLM, HadRM, and RegCM were found to score higher than the others in all regions, except region 7 (MD), whereas models HIRHAM_D and RCAO were found to never score high using our metrics. However, note that a high overlapping skill score between a model and observations for precipitation PDF does not necessarily mean that they correlate well for extreme conditions that are too rare to significantly contribute to the PDFs.

When comparing climate scenarios (2071–2100) relative to the control run (1961–1990), we have found that a significant change in the modelled PDFs occurs for both full land regions and for specific European sub-regions. When using the full land regions, the change in precipitation PDF is significant for precipitation up to 64–90 mm day−1, depending on the model analysed. This level of daily precipitation is clearly above a commonly used threshold for extreme precipitation. This change in PDF is distinguished by an increase in the contribution to the total precipitation from the more intensive events (above ∼11 mm day−1) in the scenario period in all sub-regions. This change is also found to increase with time during the scenario period for all regions except regions AL and EA.

The models are also found to agree on being able to detect projected changes in the intensive tail of the PDF. Also, it is important to note that this is a robust result that cuts across all models despite their different ability to simulate the observed statistics.