1 Introduction

Global climate models (GCMs) are our primary tool for understanding how climate may change in the future with increasing greenhouse gases. These typically have coarse resolutions with grid spacings of 60–300 km (Taylor et al. 2012). To provide regional detail, higher resolution regional climate models (RCMs; 12–50 km grid spacing) are often used, which only span a limited area (Jacob et al. 2014). These give a better representation of mountains and coastlines and fine-scale (order 10–100 km) physical and dynamical processes. In general, RCMs are able to capture the average statistics of daily precipitation on scales of a few grid boxes, with greatest agreement for moderate intensities and model biases increasing for heavier events (Boberg et al. 2009; Kjellström et al. 2010).

Both GCMs and RCMs with typical grid spacings (> 10 km) rely on a convection parameterisation scheme to represent the average effects of convection. This simplification is a known source of model errors and leads to deficiencies in the diurnal cycle of convection (Brockhaus et al. 2008) and the inability by design to produce hourly precipitation extremes (Hanel and Buishand 2010; Gregersen et al. 2013). Very high resolution models (order 1 km grid spacing), can represent deep convection explicitly without the need for such parameterisation schemes (Kendon et al. 2012; Hohenegger et al. 2008). Such models are termed ‘convection-permitting’ (or for simplicity sometimes ‘convection-resolving’ but this is not stricly true): larger storms and mesoscale convective organisation are permitted (largely resolved) but most turbulent kinetic motions are not represented (Wyngaard 2004). More specifically, while there is some evidence that km-scale resolution represents convection in some bulk sense (Langhans et al. 2013), resolving convective updrafts requires about ten times higher resolutions (Dauhut et al. 2015).

Convection-permitting models (CPMs) are commonly used in short-range weather forecasting, where they have been shown to give a much more realistic representation of convection and can be used to forecast the possibility of localised high-impact rainfall not captured at coarser resolutions (Done et al. 2004; Richard et al. 2007; Lean et al. 2008; Weisman et al. 2008; Weusthoff et al. 2010; Schwartz 2014). However, due to their high computational cost, they have not commonly been applied at climate-time scales. Studies to date show that convection-permitting models do not necessarily better represent daily mean precipitation (Chan et al. 2013), but have significantly better sub-daily rainfall characteristics with improved representation of the diurnal cycle of convection (Ban et al. 2014), the spatial structure of rainfall and its duration-intensity characteristics (Kendon et al. 2012), the intensity of hourly precipitation extremes (Chan et al. 2014; Ban et al. 2014; Fosser et al. 2015), orographic precipitation and snowpack (Liu et al. 2016), which are typically poorly represented in climate models.

Convection-permitting models provide a step change in our ability to represent convection, but there are still remaining issues. Smaller showers are not properly resolved, which results in a tendency for heavy rain to be too intense and for cell sizes to be too large. CPMs are also sensitive to sub-grid scale process representation (turbulence, microphysics), associated with many unknowns. The use of ever higher resolution does not necessarily result in convergence in terms of the representation of convection. For example, showers tend to become smaller (more speckly) with finer resolution rather than upscale on to the correct meteorological scale (Hanley et al. 2015) and improvement with resolution can depend on use of appropriate parameterisation (Bryan and Morrison 2012).

Although convection-permitting simulations have been used at climate-scales on small domains in several regions of Europe and North America (see Prein et al. 2015 for a review), Mediterranean intense precipitation events occuring in autumn have not yet been studied with such high-resolution on long time-scales. These events have been mainly studied on single cases with convection-permitting models within the the Mesoscale Alpine Programme (MAP, Richard et al. (2007) and the HyMeX project (Drobinski et al. 2014) and climatologically with convection-parameterised models within Med-CORDEX framework (Berthou et al. 2016; Cavicchia et al. 2016; Vaittinada Ayar et al. 2016; Ruti et al. 2016). Khodayar et al. (2016) compared various convection-permitting models and convection-parameterised models on a single case study and showed that the former better represent the short-intense convective events whereas the convection-parametrized models tend to produce a large number of weak and long-lasting events. Although convection-parameterised models at scales of 10–40 km are able to capture the role of orography, blocking and convergence lines in shaping heavy-precipitation events, organised convection only represented at convection-permitting scales and interaction of this convection with the orography can be important in the triggering, propagation and life-time of some heavy precipitation in the Mediterranean (Ducrocq et al. 2008; Bresson et al. 2012; Manzato et al. 2015; Meredith et al. 2015; Barthlott and Davolio 2016).

Following the work of Leutwyler et al. (2017), who provided an analysis of the performance of the 10-year long ETH-2.2 km simulation in comparison with the driving 12 km simulation, we compare 9 years of simulations for a pan-european domain from the UKMO and the ETH 2.2 km models with 12 km models and with observations. The main added value of the article is to provide the first model-intercomparison study of convection-permitting climate simulations across the wide variety of European climates and to objectively assess in which regions and seasons they differ most with coarser resolution models in terms of precipitation.

After presenting the models and datasets in Sect. 2 and the methods in Sect. 3, we identify regions and seasons where the 2.2 km models differ most from the 12 km models in terms of distribution shape and mean of hourly precipitation in Sect. 4 and evaluate if this is an improvement against observations. We then gain more insight as to how the distribution changes in summer in Sect. 5. Finally, we focus on the representation of Mediterranean heavy precipitation in Sect. 6 with the use of high percentiles and an illustrative case study. We provide conclusions and a discussion in Sect. 7.

2 Datasets and simulations

2.1 Datasets

2.1.1 Daily precipitation

For the analysis of daily precipitation we use the regional gridded datasets presented in the top section of Table 1 covering the UK, France, Germany, the Netherlands, the Alps and Spain. Regional datasets were chosen for the comparison, as advised by Prein and Gobiet (2017): their native resolution are higher than the european-wide EOBS dataset (Haylock et al. 2008) and include higher densities of raingauges (up to 44 times more). Furthermore, EOBS is not advised to be used for coastal areas and mountainous regions of Southern Europe (Flaounas et al. 2012) and can be biased over regions with a low density of stations, especially regarding the extremes (Hofstra et al. 2010; Lenderink 2010; Prein and Gobiet 2017). Further information about how each dataset was computed can be found in Sect. 8. CMORPH (NOAA Climate Prediction Center morphing method) was also used to evaluate the representation of heavy precipitation events in the Mediterranean in autumn in Sect. 6. It was not included in the rest of the analysis as it is not representative of the whole precipitation spectrum in northern Europe (Kidd et al. 2012).

Table 1 Datasets used in this study: daily datasets are in the top part of the table, hourly datasets are the three last datasets

2.1.2 Hourly precipitation

For the analysis of hourly precipitation we use the datasets presented in the bottom section of Table 1 covering the UK, Germany and Switzerland, which were all the gridded hourly datasets available to the authors. 8 or 9 years were used to compare with the models, they are not necessarily the same as the model years due to data availability (see Table 1). However, we are interested in the multi-year climatology of hourly precipitation, and this is not expected to depend strongly on the exact choice of years providing a sufficient number of years are chosen.

The percentage of missing values for the hourly datasets for 2003–2008 in summer are shown as a map in Fig. 1b. The German dataset shows between 10 and 20% of days with missing data all over Germany, the Swiss dataset about 10% of missing data in the southeast of the country and the UK dataset does not cover some regions in the southeast and northeast of England, and has variable coverage in Scotland with about 40% of missing data. For this dataset, only grid-points with less than 30% of missing data are used and the same points are used in the models to avoid inconsistencies.

It should be kept in mind that possible uncertainties in the datasets arise from rain-gauge undercatch, gridding procedures (Frei et al. 2003), and weather radar measurements (Wüest et al. 2010). The rain-gauge undercatch implies that rainfall intensities may well be underestimated with an amplitude that is difficult to assess. Prein and Gobiet (2017) mention that it can reach up to 80% in mountainous region for snowfall at exposed locations.

All the datasets were conservatively regridded to the 12 km UKMO grid with the Python interface to the Earth System Modeling Framework (ESMF) regridding utility interface before the calculation of indices. The first-order conservative regridding is a variant of a constant method which compares the proportions of overlapping source and destination cells to determine appropriate weights.

2.2 Models

Both CPMs use the same pan-european domain as shown in Fig. 1a defined on a 2.2 km regular grid with a rotated pole located at (43N, 190E). The grid has \(1536 \times 1536\) points and 70 vertical levels for the UKMO model and 60 for the ETH model. Both models are forced at their boundaries with 6-h ERA-interim reanalyses. ETH-2.2 km uses a 12 km-simulation as an intermediate step for the downscaling (dashed domain in Fig. 1a), whereas the UKMO-2.2 km is directly forced by ERA-interim. This large resolution jump (factor 34) for the UKMO configuration implies that the spin-up zone for small-scale transient eddies to develop is larger than for the ETH model. In fact, Matte et al. (2017) suggest that spin-up effects for small-scale transient eddies in the vorticity field are present on a \(3 \times L\) zone, where L is the e-folding distance on which the asymptotic value is reached. According to their findings, we get a spin-up zone of \(3 \times 2 \times 75\) km/2.2 km \(\simeq\) 205 grid points. Comparing maps of mean precipitation between the UKMO-12 km and UKMO-2.2 km (not shown), we removed 220 points from the domain on each side for our analysis (zone depicted in Fig. 1) to prevent contamination from the downscaling method.

The simulations are starting in March 1998 for UKMO-2.2 km and in November 1998 for ETH-2.2 km. The soil moisture initial conditions in UKMO comes from ERA-interim from the start of the run. The ETH-2.2 km initialisation is based on the soil moisture fields of ETH-12 km after 5 years of simulation initialised with the CCLM EURO-CORDEX simulation (Kotlarski et al. 2014). The UKMO-12 km simulation was set on a wider domain (in grey in Fig. 1a ) and started in January 1998. The article is based on 9 years of simulation from January 1999 to December 2007.

Fig. 1
figure 1

a Domains of the different models and subregion definitions: Neur: Northern Europe, CEurL: Central Europe (low land below 500 m); CEurM: Central Europe medium height (above 500 m and below 1500 m), > 1500 m: high lands above 1500 m (Alps, Atlas and Pyrenees), Med coast: Mediterranean coasts, Med sea: Mediterranean sea. b Percentage of missing values in the hourly precipitation datasets

2.2.1 UKMO 12 and 2.2 km

The Met Office Unified Model (UM) can be run in climate mode (Walters et al. 2016), seasonal forecasting mode (Scaife et al. 2014) or at convection-permitting scales for numerical weather prediction (NWP) (Clark et al. 2016). The UKMO 2.2 km (UM version 10.1) model is based on the UKV Met Office regional model which has been in use for operational numerical weather prediction since 2012 (Clark et al. 2016). The UKMO 12 km (UM version 10.3) is based on the climate version (Williams et al. 2018).

The UM is a non-hydrostatic model with a deep-atmosphere formulation based on a semi-implicit semi-Lagrangian dynamical core: ENDGame (Even Newer Dynamics for General atmospheric modelling of the environment) (Wood et al. 2014). The prognostic fields are discretised horizontally onto a rotated-pole grid with Arakawa C-grid staggering (Arakawa and Lamb 1977) whilst vertical decomposition is done via CharneyPhillips staggering (Charney and Phillips 1953) using terrain-following hybrid height coordinates on 70 levels for the 2.2 km model and 63 levels for the 12 km model. Both models have a 40 km top, but different spacing of levels in the lower troposphere. The lowest grid level is 2.5 m above the ground and the grid spacing increases quadratically with height. The model time-step is 1 mn at 2.2 km and 4 mn at 12 km.

The 2.2 km model does not include any convection parametrization and relies on the model dynamics to explictly represent convective clouds. Although it is acknowledged that not all types of convection are represented with such grid-spacing, this choice was made in the current absence of a scale-aware convection scheme which correctly parametrizes sub-grid convective motion and hands over to the model dynamics for clouds larger than the model filter scale. The UKMO 12 km model uses a mass flux convection scheme based on Gregory and Rowntree (1990) with various extensions which include downdrafts (Gregory and Allen 1991) and convective momentum transport.

The UKMO 12 km model uses a prognostic cloud fraction and prognostic condensate scheme (PC2; Wilson et al. 2008) whereas the UKMO 2.2 km model, like other convection-permitting UM formulations, uses the diagnostic Smith (1990) scheme.

Both models use the radiative transfer scheme of Edwards and Slingo (1996) with a similar configuration as described by Walters et al. (2011), with several upgrades (more details in Stratton et al. 2018). Aerosol absorption and scattering assumes climatological aerosol properties. Full radiation calculations are made every 15 min, with sub-stepped corrections due to cloud evolution performed every 5 minutes. The treatment of cloud microphysical processes is based on Wilson and Ballard (1999), with extensive modifications described in Williams et al. (2018). The UKMO 2.2 km model includes graupel as a prognostic variable in addition to the moist variables of water vapour, cloud liquid, cloud ice and rain used by the 12 km model. This allows the inclusion of a lightning flash rate prediction scheme (McCaul et al. 2009). The UKMO 2.2 km model uses the blended boundary-layer parametrization (Boutle et al. 2014). This scheme transitions from the one-dimensional vertical scheme of Lock et al. (2000), used for lower resolution simulations such as UKMO 12 km, to a three-dimensional turbulent mixing scheme based on Smagorinsky (1963) and is suitable for high-resolution simulations, with a weighting which is a function of the ratio of the grid-length to a turbulent length scale. The UM uses the JULES (Best et al. 2011; Clark et al. 2011) land surface scheme with the default four soil layers with thicknesses of 0.1, 0.25, 0.65 and 1.0 m, giving a total depth of 3 m. The tiles share a common soil water reservoir, with the van Genuchten et al. (1991) relationship describing soil hydraulic conductivity and soil moisture. Note, however, it has recently been discovered that there may be an inconsistency between the Van-Genuchten hydrology and the soil properties provided in the ancillary, such that soil moisture infiltration rates may be too low. Initial tests using Brooks–Corey hydraulic equations, which are consistent with the soil properties, show that this impacts the soil moisture content but appears to have only limited impact on surface temperature and precipitation. The 12 and 2.2 km models also have a different set up in the treatment of saturated soil layers: in the 2.2 km model excess water moves upward, whilst in the 12 km model it moves downward. The sensitivity of the results to this setting are discussed in Sect. 7 (see supplementary material for more detail).

The sub-grid hydrology model is also different: the UKMO 2.2 km configurations use the probability distributed model (PDM Moore (1985)) and the 12 km follows the climate configuration of the TOPMODEL (Beven and Kirkby 1979).

More details can be found in Walters et al. (2016), Williams et al. (2018) and Stratton et al. (2018). The latter article provides a more detailed description of a similar model set-up over Africa.

Note that unlike flux formulated schemes, semi Lagrangian advection schemes are typically not designed to locally conserve the advected quantities. Correctors are applied in the global UM, but in regional configurations the issue is complicated by the need to account for fluxes through the lateral boundaries in the calculation of the error and no correction scheme is implemented in these versions of the model. Stratton et al. (2018) showed that it is likely causing enhanced mean precipitation (by \(\simeq\) 20% in Africa), especially due to increased intense rainfall events.

2.2.2 ETH 12 and 2.2 km

The simulation setup has been introduced in Leutwyler et al. (2016) and verification was performed in Leutwyler et al. (2017). Therefore we here only briefly summarize the most important aspects.

The 12 and 2.2 km ETH simulations have been performed with version 4.19 of the Consortium for Small-scale Modeling weather and climate model (COSMO) (Böhm et al. 2006; Rockel et al. 2008). COSMO is a non-hydrostatic limited-area model solving the fully compressible governing equations with finite-difference methods in a rotated coordinate system, projected on a regular structured grid (Steppeler et al. 2003; Förstner and Doms 2004). To integrate the prognostic variables forward in time, a split-explicit 3-stage Runge-Kutta integrator is used (Wicker and Skamarock 2002). For horizontal advection a fifth-order upwind scheme and in the vertical an implicit Crank–Nicholson scheme are used (Baldauf et al. 2011). Multi-dimensional advection of scalar fields is implemented using the one-dimensional Bott scheme (Bott 1989; Schneider and Bott 2014). The model time-step is 90 s for the 12 km model and 20 s for the 2.2 km model, considerably shorter than for the UKMO equivalent model.

Depending upon resolution, sub-grid convection is parameterized using an adapted version of the Tiedtke mass-flux scheme with moisture-convergence closure (Tiedtke 1989). Cloud-microphysics are parameterized with a single-moment bulk scheme using five species (cloud water, cloud ice, rain, snow, and graupel) (Reinhardt and Seifert 2005), radiative transfer is based on the \(\delta\)-two-stream approach (Ritter and Geleyn 1992), and a turbulent-kinetic-energy-based parametrization is used in the planetary boundary layer (PBL) as well as for surface transfer (Mellor and Yamada 1982; Raschendorfer 2001). The ten-layer soil modeel TERRA_ML has a total soil depth of 15.24 m (Heise et al. 2006) and the aerosol climatology has been changed from the default climatology (Tanré et al. 1984) to the AeroCom climatology (Kinne et al. 2006).

The model configuration follows a two-step one-way nesting approach with the outer nest consisting of a simulation with parameterized convection (ETH 12 km) and the inner nest of a simulation with the parameterization of deep-convection switched off (ETH2km, Fig. 1a). It should be noted that the parameterization of shallow convection remains active in the ETH 2.2 km model, which is an important difference compared to the UKMO configuration (which has no convective parameterization). The outer nest has a grid spacing of 12 km and the inner nest follows the same setup as the UKMO 2.2 km simulation. In both ETH simulations, the vertical direction is discretized using 60 stretched model levels, ranging from the first model level at 20 m to the model top at 23.5 km. To provide adequately spun-up soil moisture fields, the soil layers in ETH 12 km have been initialized on 1 November 1993 based on the soil-moisture fields from the CCLM EURO-CORDEX simulation (Kotlarski et al. 2014), and thereafter integrated for 5 years. Subsequently ETH 2.2 km was initialized on 1 November 1998 with the soil moisture fields of ETH 12 km, leaving two months of integration for soil spinup.

The simulations have been performed with a version of COSMO capable of using GPU accelerators (Fuhrer et al. 2014). The new COSMO version enables execution of the time stepping algorithm entirely on accelerators, which is essential to minimize expensive data movements between the CPU and the GPU. To this end the dynamical core has been rewritten in C++, using the domain-specific Stencil Loop Language (STELLA) (Gysi et al. 2015; Osuna et al. 2015), and the physical parametrization have been ported using OpenACC (2011) compiler directives (Lapillonne and Fuhrer 2014). Data exchange at the sub-domain boundaries (i.e. halo exchange) is handled using a re-usable communication framework. On 144 compute nodes of a hybrid Cray XC30 system, the time-to-solution for a 10-year-long integration is about 1.7 months (Leutwyler et al. 2016).

3 Methods

All the models and datasets are regridded to the UKMO 12 km grid before the computation of all diagnostics in order to show a fair comparison between models. Therefore, scales smaller than 12 km are not evaluated. However, most of the regional datasets are also not necessarily accurate enough to evaluate scales smaller than 12 km. It should be stressed that this approach is not entirely fair for the 12 km models, as they are not supposed to represent the 12 km scale properly, but rather a 25 km scale or larger (Skamarock 2004).

We use the average of values above the \(\mathrm {99\mathrm{th}}\) percentile of all days to evaluate the representation of moderate to intense events (p99avg) to compare fairly the model extremes, independently from the wet-day/wet-hour frequency, as recommended by Schär et al. (2016).

To gain insight into the distribution of precipitation, we use the ASoP method (“Analyzing Scales of Precipitation”, version 1.0 ASoP1) presented in Klingaman et al. (2017), which gives a spectrum of the precipitation intensities contributing to the mean precipitation rate. This allows a comparison of the contribution of different intensities to the mean across different time-scales and grid-point by grid-point to better understand the underlying model physics. It provides a view of differences in the distribution in its entirety and also allows differences coming from a pure shift to higher/lower intensities to be distinguished from an increase/decrease of precipitation in all the bins.

Figure 2 shows the steps of calculation for the ASoP method and illustrates the differences with a probability density function. The example uses the distribution of daily precipitation in the southern UK from the UKCPOBS dataset and the UKMO-12 km model for 1999–2007. The bins used to calculate precipitation frequency in the ASoP method are designed such that the number of events per bin is rather similar across bins ( except in the largest bins so that their signal is not lost in one single bin). This is illustrated in panel b compared to panel a, where the vertical bars representing each bin are spaced differently. The function defining the bins is given by Eq. (1).

$$\begin{aligned} b_n = e^{\left( ln(0.005)+\left[ n \frac{(ln(120)-ln(0.005))^2}{59}\right] ^{\frac{1}{2}}\right) } \end{aligned}$$
(1)

The frequency of events \(f_i\) in each \(i\mathrm{th}\) bin is multiplied by the mean precipitation rate of the bin \(p_i\): \(C_i = f_i\,p_i\). This provides the actual contribution \(C_i\) of the bin to the mean precipitation rate. The sum across all bins (area under the curve) gives the mean precipitation rate. The resulting spectrum is shown in Fig. 2c. It provides information about the relative contribution of each bin to the mean.

Further dividing each bin’s actual contribution by the mean precipitation rate (sum across all bins of the actual contribution spectrum), as shown in Eq. (2) gives a spectrum which area under the curve is unity (fractional contribution, panel d), providing information mostly about the shape of the distribution, independently from the mean precipitation.

$$\begin{aligned} {FC}_i = \frac{C_i}{\sum C_i} = \frac{C_i}{mean} \end{aligned}$$
(2)

This method provides a quantitative visualisation of model differences or biases against observations in all parts of the precipitation distribution, and not only in the head or tail of the distribution like more traditional approaches (probability distribution function, cumulative distribution function). As an example, two spectrums are plotted: a reference spectrum and a model spectrum for the actual contribution (Fig. 2e) and the fractional contribution (Fig. 2f). Their difference is plotted in panels g and h, respectively. These figures illustrate two facts: first of all, the model shows a dry bias compared to the reference: the area under the red curve in panel e is smaller than the area under the blue curve, which is more easily seen in panel g, where the negative area between the curve and the zero line is larger than the positive area. More importantly, it illustrates which bins contribute to the mean bias: the model shows mainly too much precipitation from the intensities below 8 mm/day but a stronger underestimation of the contribution from events between 8 and 100 mm/day. This latter contribution has the largest effect on the mean as the sum of all bins is negative. Panels e and g therefore mix the information between the mean bias and the shape of the distribution.

Looking at the fractional contributions in panels f and h, they mainly illustrate the differences in the shape of the distribution between the model and the data. By construction, the integral of the difference between the two curves is zero: the positive and negative grey areas in panel h compensate each other. These figures mainly show that the lower intensity bins contribute too much to the mean compared to the higher intensity means. It loses the information about the differences in the means of the models. In this case, actual contribution and fractional contributions are not very different, but it is easy to think about a model which would have the right shape of fractional distribution but too much precipitation coming from all the bins: the actual contributions would be larger and the mean bias positive, but the fractional contribution would be similar as in the observations. These contributions are calculated at each grid-point and can then be averaged over a given region or maps can be shown by aggregating the contributions over several bin categories.

On top of this method, we build indices to summarize information about the shape of the distribution and the mean precipitation differences between datasets to serve two purposes:

  • identify the regions, seasons and timescales where the mean precipitation and the shape of the precipitation distribution are most different between the 12 and the 2.2 km

  • in these cases, identify whether the 2.2 km models provide an overall better or worse representation of the contributions of different intensities to mean precipitation where observations are available.

The first index gives information about how much the fractional contributions differ between a model (mod) and a reference dataset (ref): the Fractional Contribution Index (FC) is given in Eq. (3).

$$\begin{aligned} FC({mod, ref}) = \sum \limits _{i} | {FC}_i^{{mod}} - FC_i^{{ref}} | \end{aligned}$$
(3)

FC represents how different the shapes of the two distributions are independently from the differences in the means. It has no units and is the area between the two fractional contribution spectrums (in grey in (Fig. 2f, h). Its minimum and best value is zero while the maximum is two and means no overlap between the distributions.

The second index assesses which model (between m1 and m2) performs best in terms of fractional contributions to the mean (Eq. (4)).

$$\begin{aligned} {FC}_{{best}}(m1, m2) = \frac{FC(m1, {obs}) - FC(m2, obs)}{FC(m2, obs)} \times 100 \end{aligned}$$
(4)

This index measures the percentage of improvement or worsening of the fractional spectrum of model m1 over model m2 with regards to the observations. If m2 agrees better with the observations, the index is positive (the area between m1 and obs is larger than the area between m2 and obs) and the index is negative if m1 agrees better. The index gives some credit to a model which has a better fractional contribution but a worse bias, meaning that it would potentially reproduce well the underlying physical processes.

The indices are calculated at each grid-point and then averaged over regions or presented as maps. With these score we require the models not only to capture the area mean, but also each grid point accurately.

Fig. 2
figure 2

Explanation of the ASoP spectral method. a Probability density function with regular 1 mm bins (frequency of events as a function of event intensity), vertical lines represent the bin widths; b probability density function with the ASoP spectral bins defined in Eq. (1); c actual contributions to mean precipitation (\(\mathrm {mm/day}\)) as a function of event intensity (from b to c, each frequency was multiplied by the average intensity of the bin), the area under the curve is the mean precipitation; d fractional contributions (percentage of the mean that each bin represents): from c to d, each bin was divided by the mean precipitation. The area under the curve is 1; e actual contributions for a reference (blue) and a model (red), the grey area is the mean absolute difference (\(\mathrm {mm/day}\)); f fractional contributions for a reference (blue) and a model (red), the grey area is the absolute difference (%); g difference in actual contributions between the model (red in e) and the reference (blue in e); f same as g for the fractional contrbutions: difference between the red and blue curves in f

4 Regions and seasons of largest difference across resolution

4.1 Comparing 2.2 with 12 km models

In this section, we identify where and for which season the 2.2 km hourly precipitation statistics differ most from the 12 km ones. We use a combination of the absolute mean difference ratio: AMD \(=\) (|mean(2.2 km) − mean(12 km)|)/mean(12km) and the fractional contribution index (FC) presented in Sect. 3, calculated between the 2.2 and the 12 km models. Calculated at each grid point, these measures are then averaged over the different domains presented in Fig. 1a. Only points with average precipitation above 0.03 \(\mathrm {mm/h}\)for hourly precipitation and 0.5\(\mathrm {mm/day}\)for daily precipitation are taken into account, to avoid regions with too little precipitation which do not have a robust precipitation spectrum. The results are not very sensitive to these chosen thresholds (not shown).

Each domain can be quite vast but this region definition is chosen as a first order description of the variability of climates using a limited number of regions, inspired by the Köppen–Geiger map of climates (Peel et al. 2007). Figure 3 shows a plot of FC(2.2 km, 12 km) as a function of AMD for hourly precipitation. The higher the value of FC, the larger the differences in shape of the distribution between the 2.2 and 12 km models and the larger the AMD, the larger the difference in means of the two models.

For both model sets and for each individual region, the 2.2 km models differ most from the 12 km models in summer in terms of precipitation distribution (FC). In terms of differences in the mean (AMD), it is largest over high orography in all seasons (> 1500 m), and it is high in the Mediterranean in summer too. Note however that the ‘Med Sea’ and ’Med coast’ points for summer are not as reliable since they are based on a limited number of points because mean precipitation is very low.

A second conclusion which can be drawn is that the UKMO is overall more sensitive to the changes in resolution than the ETH model for most regions and seasons, comparing the right and left panels. This may partly be due to the fact that the UKMO 12 km model and 2.2 km model do not have exactly the same model physics.

Another point is that in all seasons, the largest differences between resolutions in terms of shape of the distribution (FC) are found over the Mediterranean sea or coasts, these points especially stand out compared to the other regions in autumn and summer.

In most seasons and models, the smallest sensitivity to resolution is found in flat lands in Northern Europe and Central Europe except in summer in the UKMO where these regions show large differences in the shape of the distribution. Most of the differences occur in the shape of the distribution and not in the mean state except in summer in the UKMO. In the ETH model, CEurM (orography between 500 and 1500 m in Central Europe) shows more sensitivity to resolution than flat lands in all seasons in both the mean and shape of the distribution.

Fig. 3
figure 3

Fractional contribution index between the 2.2 and 12 km simulation (FC(2.2 km, 12 km)) as a function of the absolute mean difference (|mean(2.2 km)—mean(12 km)|/mean(12 km)) averaged over the regions defined in Fig. 1a FL for a ETH and b UKMO models. Red is summer, blue winter, cyan spring and black autumn

Guided by these findings, we will mainly focus the rest of the study on the summer season in all regions and the Mediterranean coasts and sea in autumn, where the differences between models are largest.

4.2 Model performances against available observations

We show the mean bias compared to observations and a comparison of the fractional contribution differences [FCbest (2.2 km, 12 km)] at each grid-point on a map for daily (Fig. 4) and hourly precipitation (Fig. 5).

Regarding daily precipitation differences, Fig. 4 first highlights that the Alpine region stands out as a region of large increase in the mean precipitation in the two models, as highlighted in the previous part. The bias increases with height above 800 m in both of the 2.2 km models (Fig. 5 of the supplementary materials) and areas above 1500 m in this region show a wet bias of around 30–70%. Although this region tends to be more biased in the 2.2 km mean, it shows a better performance for the distribution in the western part of the mountain range and worse in the northeastern part (panels c and f). The wet bias partly comes from an overestimation of wet days for all intensities, which was quantified as an increase by 10–30% of the wet-day frequency (not shown). It should be stressed that the observations over high ground may underestimate precipitation by at least 10% as discussed in Sect. 8.

A second point which can be made is the overall improvement in the shape of the fractional contribution to the mean in both 2.2 km models south of the Alps and to a lesser extent north of the range. The improvement is about 30–50% compared to the 12 km performances. This is associated with a smaller mean bias in the ETH 2.2 km in this region. The UKMO is however dominated by a dry bias in this region, although there are improvements in the mean bias in Liguria.

Northern Germany, the Netherland and the UK coasts are also regions of improvement in both 2.2 km models. The other regions do not show any clear improvement in the distribution between the 2.2 and the 12 km models.

The mean bias is not very different across resolutions in the ETH model, and is worse in the UKMO 2.2 km model with an overall dry bias of 20–50% in northern Italy, northern Spain, France and western Germany. The fact that there is not much of a resolution-dependence in the model skill in capturing the shape of the distribution, but a large dependence in the mean indicates that the dry bias in the UKMO 2.2 km mostly comes from a reduction in the overall wet day frequency, which was quantified as being around 20%.

Fig. 4
figure 4

Mean daily precipitation bias in percentage of the observation values for the (top) UKMO and (bottom) ETH models in summer at a, d 12 km and b, e 2.2 km resolution. The best daily fractional index between the 2.2 and the 12 km for c UKMO and f ETH model (as described by Eq. (4)). Blue means the 12 km model is closest to the observations, red means the 2.2 km is closest. Values indicate percentage of improvement compare to FC(12 km, obs). Regions with means smaller than 0.5 \(\mathrm {mm/day}\)in the observations are masked out

Regarding the mean and shape of the distribution for hourly precipitation, Fig. 5 shows similar mean biases as for daily precipitation, which is reassuring given that the reference datasets are different and the time-period of comparison is not the same (Table 1). The signal in \(FC_{{best}}\), showing which model shows the best overlap with the observation in terms of fractional contributions is now much stronger than for daily data. There is a clear improvement by the 2.2 km models in terms of which intensities contribute to the mean for Switzerland in both models, especially at higher altitudes for the UKMO. In Germany, the overall tendency is to a worsening of the distribution in the 2.2 km models, especially strong on a southwest-northeast diagonal. A common improvement is however found in north and northwestern Germany. In the UK, the model performance is very spatially dependent and there is mostly a tendency of improvement along the coasts and of a deterioration inland.

Fig. 5
figure 5

Same as Fig. 4 for hourly precipitation

Overall, the 2.2 km models improve the daily and hourly distribution shape over the western Alps but show a tendency of having too many wet-days in the high-grounds, although the raingauge under-catch is hard to evaluate in this region. They seem to deteriorate the hourly distribution on flat land away from the coasts in the UK and Germany. The UKMO has also an overall dry bias linked with too few wet hours and days in France, Spain and northern Italy.

5 Shift to shorter and more intense wet-spell intensities in 2.2 km models

5.1 Shift to larger contributions from moderate and intense precipitation

We now look further into the distributions to evaluate which parts are most affected by the changes in resolution. We focus on hourly distributions, since the differences are clearer at this scale.

Both the differences in fractional and actual contributions against observations are shown in Fig. 6. They illustrate the very different behaviour of the 2.2 km models compared to the 12 km models on the hourly time-scales. The 12 km models tend to show a too large contribution to total precipitation from low-intensity events (below 2–3 mm/h) by 5–40% depending on countries, which is can be over-corrected in the 2.2 km, which tend to have too much rainfall contributed by moderate and intense (3–30 mm/h) events by 10–40%. There is a significant improvement in Switzerland and to a lesser extend in the UK in terms of fractional contributions but the 2.2 km ETH model overestimates precipitation in all bins in terms of actual contributions. In Germany, the 12 km models already have too large a contribution from intense events (>8 mm/h) by around 40% and the 2.2 km models have even larger contributions from events above 2 mm/h, resulting in a 40% increase in contributions from intensities above 2 mm/h. It increases the distribution biases against observations in this country. In both models, this is due both to a decrease in the actual contribution of low-intensity events and an increase in the moderate events. The decrease in actual contribution from low-intensity events is larger in the UKMO than in ETH and results in biases of − 8 to − 34% in the UKMO 2.2 km model to − 12 to − 15% in ETH (in the UK and Germany only) for this range of intensities.

Although the number of hourly datasets available is limited, the 12 and 2.2 km model contributions to total rainfall can be compared on the whole domain by plotting maps of the fractional contribution to total rainfall from low intensity events (<2 mm/h), moderate events (2–8 mm/h) and intense events (>8 mm/h), as shown in Fig. 7. These maps show that the shift in contribution of precipitation from low to moderate and intense precipitation in both 2.2 km models is present everywhere on land and is much larger than differences between the 12 km models. This leads to improvement in Switzerland and to a lesser extend in the UK but to larger biases in Germany.

5.2 Analysis of wet-spell durations and intensities

Figure 8 presents the distribution of hourly wet-spell frequencies by duration (in hours) and mean intensity over the wet-spell for the available observation datasets and the four models. A wet spell is defined as consecutive hours with precipitation rates larger than 0.1 mm/h at a single grid-point. For the observations, the wet spell frequency is shown and for the models we show the difference in the number of wet spells per year in each intensity/duration bin between the model and the observations normalised by the number of wet spells per year in the observations. This way, a positive difference between model and observations in a given bin reflects an overestimation of wet spells in this bin, not just a larger share of this bin in the wet-spell distribution. We also show percentage differences in the number of wet-spells against the observations in each panel title.

In all three countries, the 2.2 km models increase the frequency of short-lasting (<10  h) moderate to intense (average intensity of 1–20  mm/h) events and decrease the share of long-lasting (>5  h) low-intensity (<1 mm/h) wet spells compared to 12 km models. The latter effect is especially strong in the UKMO 2.2 km model. As a result, the 2.2 km tend to underestimate the long-lasting weak wet-spells contrary to the 12 km models which overestimate them: the 2.2 km models yield better results for these events in all countries for the ETH2.2 km and only in Switzerland and to a lesser extent the UK for the UKMO2.2 km. The total number of wet-spells generally decreases from 12 to 2.2 km, the effect is more pronounced in the UKMO 2.2 km due to the former point. The short-lasting moderate to intense wet-spells tend to be underestimated in the 12 km models and overestimated by the 2.2 km models (except in the UK). Improvement for these high-impact events occur for the UK and Switzerland (only for the ETH model).

The ETH 2.2 km model also decreases the occurrence of short-lasting wet-spells whereas the UKMO 2.2 km increases these occurrences compared to the 12 km model: in this model, low-intensity wet spells become shorter.

Note that the UKMO 12 km model shows intense and very short-lasting (\(<3\,\hbox {h}\)) wet spells, in disagreement with the German and Swiss datasets but not the British one, this is probably due to grid-point storms.

5.3 Changes in the tail of the precipitation distribution

Looking at the representation of intense events, Fig. 7 shows a larger contribution from intense events (> 8 mm/h) to total precipitation in the 2.2 km models, especially in the ETH 2.2 km where these events can represent up to 20% of the mean, as also shown in Fig. 6. The average top 1% of all hours shown in Fig. 9 shows that the increase in contribution from the moderate and intense events in the 2.2 km models is partly due to more intense hourly rainfall in both models and not only linked with a decrease in number of low-intensity hours. This is again an improvement for Switzerland and the UK and a deterioration for Germany, where this index is overestimated by 10–30% in the UKMO2.2 km and 10–50% in the ETH 2.2 km. This is not the case for daily precipitation on flat land where this diagnostic does not show a large intensification (not shown).

Fig. 6
figure 6

Differences in the fractional and actual contribution of hourly precipitation between models and the observations (JJA) for different countries (left: actual contribution, right: fractional contribution). See Sect. 3 and Fig. 2 for details about the method. a Germany, b Switzerland, c United Kingdom (only points where less than 30% of data is missing in the observations are taken into account)

Fig. 7
figure 7

Fractional contribution (ratio of actual contribution on mean precipitation) of three bin categories in summer: top (< 2 mm/h), middle (from 2 to 8 mm/h), bottom: above 8 mm/h. From left to right: observations, UKMO-12 km, ETH-12 km, UKMO-2.2 km, ETH-2.2 km

Fig. 8
figure 8

Frequency of wet spells in summer in different duration and intensity bins for the a UK, b Switzerland, c Germany. In each panel, observational datasets are shown as reference and model differences with the observations are shown as indicated in the panel titles (see Sect. 5.2 for details). The number written above the observation plots is the average number of wet spells per grid point per season and the percentage indicated above each model panel is the percentage difference in number of wet spells between models and observations

Fig. 9
figure 9

Average of values above the 99th percentile of all hours in summer. Top row: UKMO 12 km (left) and 2.2 km (right) models (percentage difference with the observations), bottom row: ETH 12 km (left) and 2.2 km (right) models (percentage difference with the observations); right column: observations (\(\mathrm {mm/day}\))

5.4 Diurnal cycle in summer

Finally, Figs. 10 and 11 respectively show the amplitude and the phase (hour of the maximum precipitation in local time) of the mean diurnal cycle at each grid-point. They show stronger amplitudes in the 2.2 km models over high orography (> 1500 m) compared to the 12 km models, especially in the Swiss, Austrian and north-Italian Alps. According to the Swiss and German datasets over the Alps, this is an improvement, although the amplitude may tend to be too strong in the convection-permitting models. Both models also generally show larger amplitudes of the diurnal cycle on lower level topography (Massif Central, Appenines, Dinaric Alps), where MCSs are often triggered (Morel and Senesi 2002). The UKMO and to a lesser extent the ETH 2.2 km models also reproduce the larger amplitude of the diurnal cycle in southern Germany along the Alpine foothills, where MCSs are observed (Hagen and Finke 1999; Kaltenböck 2001).

Figure 11 shows the better timing of the peak precipitation in the 2.2 km models, the peak being shifted from late morning-early afternoon in parameterised models to mid-late afternoon in the convection-permitting models, which is more realistic, in line with Ban et al. (2014) and Fosser et al. (2015). It is worth noting that the UKMO-2.2 km still produces precipitation too early in the day in the Swiss Alps (around 2 p.m.–4 p.m.), whereas the ETH-2.2 km model is in better agreement with the observations with a peak between 4 p.m. and 8 p.m. The later peak of ETH 2.2 km in the Po valley is also in line with observations presented in Nisi et al. (2016). Generally, the UKMO 2.2 km model tends to produce earlier afternoon peaks by about 2 h than the ETH 2.2 km model, further away from the observations. Both models reproduce well the spatial gradients of the hour of maximum precipitation in the southwestern coasts of the UK.

Fig. 10
figure 10

Amplitude of mean FL diurnal harmonic fit to the diurnal cycle in summer (maximum–minimum) (FL mm). Top row: UKMO 12 km (left) and 2.2 km (right) models, bottom row: ETH 12 km (left) and 2.2 km (right) models; right column: observations

Fig. 11
figure 11

Hour FL of maximum precipitation of FL the mean FL diurnal cycle in summer (local time). Top row: UKMO 12 km (left) and 2.2 km (right) models, bottom row: ETH 12 km (left) and 2.2 km (right) models; right column: observations

6 Mediterranean heavy precipitation events

In autumn, the heaviest precipitation events in Europe occur on the Mediterranean coasts, as illustrated by the average of rainfall on the top 1% of all days shown in Fig. 12e. In this figure, we use daily CMORPH observations (2001–2008) (see Sect. 1) as a complement to the daily precipitation datasets for this metric. This satellite-derived product is not as reliable as daily observation products and not as high resolution (0.25\(^\circ\)), but it provides some estimate of convection over the sea and in the regions not covered by high resolution datasets, although it was shown to underestimate coastal heavy precipitation events in this region (Stampoulis et al. 2013). This can also probably be seen in the sharp transition between high values in Italy in the Appenines in the Alpine dataset and lower values in CMORPH.

Regions particularly hit by heavy precipitation events are the Valencian country in Spain, the southern part of the Massif Central (Cévennes) and the Alps in France, the Ligurian region in Italy, the whole southern edge of the Alps and the Dinaric Alps. Intense convection also occurs in the Gulf of Lions and the Tyrrhenian Sea. Liguria, most of Italy and the Dinaric Alps were identified as regions with rather large model biases in the extremes in convection-parameterised models (Berthou et al. 2016; Cavicchia et al. 2016; Fantini et al. 2016).

6.1 Contribution of intense events to mean precipitation in autumn

Figure 12 shows the \(p99_{{avg}}\) metric for all the models. The two 2.2 km models seem to actually converge to a solution closer to the observations compared to the 12 km models which differ from each other. The convection-parameterised models have very different biases: the UKMO-12 km model shows very intense wet biases on the upslope side of all mountain ridges and on the coasts, while the ETH 12 km model underestimates this metric by around 30–50%. The ETH 2.2 km is in better agreement with the observations and the UKMO 2.2 km mostly shows stronger intensities in northern Italy. All models show stronger precipitation in the coastal Pyrenees compared to the observations. The 2.2 km show stronger precipitation in the Valencian country, in better agreement with the observations.

Over the sea, precipitation maximum in CMORPH occurs in the Gulf of Lions and the Thyrrenian Sea whereas it is maximum in the Ionian Sea in the UKMO 2.2 km and in the Thyrrenian Sea in ETH 2.2 km. Precipitation is more intense over the sea in each 2.2 km model compared to its 12 km counterpart. This suggests that convection is more easily triggered over the sea away from the influence of the orography or the coasts in the 2.2 km models.

Fig. 12
figure 12

Average of values above the 99th percentile of all days in autumn (SON) in mm/day. a UKMO 12 km, b UKMO 2.2 km, d ETH 12 km, e ETH 2.2 km and f available observations (composite of CMORPH and gridded regional products, as shown in panel c). Yellow area in panel c shows the domain of the case study in Figs. 13 and 14

6.2 Case study: 8–9 Sept. 2002 in Southern France

Having examined the climatological differences between the 12 and 2.2 km models, we now focus on a single case study to illustrate how processes are represented differently across resolution. The chosen case is a Mediterranean heavy precipitation event which occurred on the 8th and 9th Sept. 2002 in the Gard region in Southern France. This case was chosen for three main reasons: first, it is well documented (Delrieu et al. 2005; Anquetin et al. 2005; Nuissier et al. 2008; Ducrocq et al. 2008). Second, it was strongly forced synoptically (Nuissier et al. 2008) so we can expect it to be present in the climate models (which only receive atmospheric information on the observed state at the lateral boundaries) and third, cold pool interactions with the mesoscale environment played an important role in setting the location and intensity of the event, so we may expect the 2.2 km models to behave differently from the 12 km models (Ducrocq et al. 2008).

Over the two days of the event, maximum rainfall of 600–700 mm was recorded (Fig. 13e ). The meteorological environment of the heavy rainfall event was characterized by an upper-level trough centred over Ireland and extending meridionally to the Iberian peninsula, progressively veering to a northwest, southeast axis. It generated a south-westerly diffluent flow over south-eastern France. An associated surface cold front, first located over western France, moved progressively eastward. Convection first formed well ahead of the front in the warm sector, where a low-level south-easterly flow prevailed and was later reinforced by embedded convection in the front. Figure 13 shows that for both 12 km models maximum precipitation falls on the southeast facing slopes of the Cévennes. In both 2.2 km models, precipitation occurs both on the slopes of the Cévennes and in the Rhone valley, the latter being where the maximum in the observations is found. All models underestimate the precipitation in the Rhone Valley, but the 2.2 km models have smaller negative biases.

The UKMO climate models show different time-evolutions of the surface cold front and first generate precipitation over orography, in association with a strong temperature gradient, on the afternoon of the 8th (this differs from the real event which already shows cold pools and precipitation in the valley by the afternoon of 8th, not shown). The 500 hPa synoptic situation is closer to ERA-interim in the 2.2 km model than in the 12 km model, probably as a result of domain size (not shown). The UKMO 12 km model mostly shows orographic precipitation and convection embedded in the cold front during the whole event. In the UKMO 2.2 km model, following the triggering of precipitation over orography, convection-induced cold air accumulates in the Rhone valley, leading to the formation of a mesoscale cold front. By the morning of the 9th, convective cells are triggered on the edge of the cold pool (Fig. 14) which gradually propagates upstream of a 50–60 knot southerly flow, maintaining convective cells in the valley in the 2.2 km model. There is no hint of interaction of the flow with a cold pool at any stage of the event in the UKMO 12 km model (not shown). The more realistic positioning of the rainfall maximum, and higher rainfall totals, in the 2.2 km models therefore seems to be related to their ability to represent cold pools and some form of organised convection. Given this is just a single case study, and we would not expect the timing or position of rainfall to be exactly captured across models, it is not possible to make any definite conclusions. However, the results are illustrative of the potential for improved representation of mesoscale processes and associated extreme precipitation events at convection-permitting resolution.

Fig. 13
figure 13

2-Day total precipitation between 08/09/2002 and 09/09/2002. The 12 km models, 2.2 km models and SAFRAN observations are respectively on the left, centre and right. Upper and lower row are for UKMO and ETH simulations. Green lines outline surface height above 500 and 1000 m for the UKMO 12-km simulation on which all models and observations are regridded. Maximum and spatial mean are also given. The domain corresponds to the box in Fig. 12

Fig. 14
figure 14

UKMO 2.2 km (upper panelFL ) and 12 km (lower panelFL ) and model-simulated snapshots of 3h-accumulated precipitation (thick black lines; 10, 20, 50 mm/3 h), 925 hPa wind (barbs; knots) and virtual temperature (colour shading). White space mask when 925 hPa isobar is below ground

7 Discussion and conclusion

This first intercomparison pan-European CPMs confirms and builds on previous studies on smaller domains or with single models. Quantitatively we find that the largest precipitation differences between CPMs and 12 km parameterised models occur at hourly time-scales in summer in most regions. Regions of high topography show the largest differences in mean precipitation at the convection-permitting scales and the Mediterranean coasts and sea are most affected in terms of precipitation distribution, especially in summer and autumn.

The two pan-european CPMs behave similarly in terms of differences in precipitation distribution at the hourly timescale in summer compared to 12  km models. Mean precipitation comes from an increased contribution of short-lasting moderate and intense events and a decreased contribution of longer lasting low-intensity events everywhere. This leads to an overall improvement compared with the 12 km models in Switzerland (also found in Ban et al. (2014); Lind et al. (2016)) and parts of the UK (also in Kendon et al. 2012) but deteriorates the distribution in most of Germany with too much moderate and intense precipitation, unlike the findings of Fosser et al. (2015) who evaluated their model against hourly raingauges in Southwestern Germany. The lack of low-intensity events in both models is especially large in the UKMO 2.2 km model and is responsible for a 10–30% dry bias in France, Spain and Italy in this model.

The daily precipitation distribution is mostly affected by resolution changes in the Alps, in northern Italy and near the coasts (UK/Germany). The Austrian Alps show a deterioration of the distribution while the southwestern Alps and northern Italy benefit from higher resolution. Mean precipitation is increased over the Alps and becomes larger than in the observations. This bias increases with height above 800 m in both 2.2 km models and it is unclear which part is due to observation uncertainties or model deficiencies (Lind et al. 2016 yield similar results). Mediterranean intense events in autumn at the daily scale are better represented by the 2.2  km models, which converge to a solution closer to the observations in terms of location and intensity than their 12 km counterparts.

The phase of the diurnal cycle is better represented in the CPMs but the UKMO-2.2 km has still too early a peak over orography. This is a well-known improvement in CPMs due to the fact that convective instability takes more time to build-up as it is not consumed by parameterised convection which tends to start convection around midday (Kendon et al. 2012; Prein et al. 2013; Ban et al. 2014; Fosser et al. 2015). Both CPMs have an enhanced amplitude over orography compared to the 12 km models, which is an improvement.

Regarding model differences, the UKMO-2.2 km has a much reduced wet-day frequency compared to the UKMO-12 km, which is a clear bias compared to the observations; this is not the case in the ETH model. It is not clear whether it comes directly from resolution changes. One of the model differences that we investigated is the the way saturated layers of soil are treated. At higher resolution, when the top layer of soil is saturated, excess water disappears into the surface run-off whereas it is drained into the second layer in the 12 km. Initial sensitivity tests have shown that modifying the treatment of saturated layers moistens the lower soil layers slightly, but has negligible impact on the surface soil moisture (not shown) and the surface climate (supplementary material). We note that the impact of soil moisture infiltration rates being too low in the UKMO models, due to the use of Van-Genuchten hydraulic equations, may impact the 2.2 km model differently to the 12 km model, given in the former rainfall is more intense and hence the surface layer is more likely to become saturated. Initial tests, however, suggest the impact of changing the hydraulic equations on the surface temperature is small, with warm/dry biases in the UKMO-2.2 km persisting. Thus it is possible that the intense/intermittent nature of rainfall in the 2.2 km model is responsible for dry soil conditions and associated warm temperature bias over Eastern Europe but further work looking at more variables such as the work of Brisson et al. (2016) on clouds is needed. In the ETH model such an effect is less apparent, possibly due to the use of a shallow convection parameterisation in this model. Other regions such as the UK are less sensitive, as soil moisture is not close to critical value for limiting evaporation. It should be noted that Liu et al. (2016) using ERA-interim driven WRF 4 km simulations also show a warm and dry bias in the Central US in 13-year long simulations over the US.

In this study we have shown that two 2.2 km convection-permitting models yield qualitatively similar differences to the precipitation climatology compared to 12 km models, despite using different dynamical cores and different parameterization packages. Its also highlights that both convection-permitting models will need to address how to better balance the increased number of moderate to intense events and the decreased number of low-intensity events, which are needed to improve the 12 km model hourly distributions but are overcompensated in both models.

Work is on-going to introduce a scale-aware convection parameterisation in future model versions of the UKMO, which would enable some sub-grid convection. Work on the boundary layer scheme and its coupling with convection is also on-going.

This intercomparison study would benefit from the availability of new generations of hourly precipitation datasets. Future work will examine whether there are similarly robust signals of future precipitation change across different CPMs, reducing uncertainty in projections of intense events at hourly and km-scales. To this end, the CORDEX-Flagship pilot study on CPMs is a promising initiative, allowing comparison of more CPMs beyond the two available for analysis here.