1 Introduction

Climate change is likely to impact the magnitude and frequency of extreme precipitation (IPCC 2012). With a warmer climate, the atmosphere is able to hold more moisture, which is expected to increase the intensity of extreme precipitation events (Trenberth et al. 2003). This agrees with previous studies, which have found an increase in the intensity of extreme precipitation events (Christensen and Christensen 2003; Kendon et al. 2014; Prein et al. 2017c). It is less certain how climate change will impact the frequency and duration of extreme precipitation events, as these are controlled by many factors, including large-scale circulation patterns (Trenberth et al. 2003). Information on future extreme precipitation is needed to adapt to climate change, including building resilient cities and minimising flood risk, and to inform mitigation decisions (Semadeni-Davies et al. 2008; Urich and Rauch 2014; Rosenzweig et al. 2019).

Climate models are the main tool to project and understand changes in future climate, including extreme precipitation (Frei et al. 2006), with the scale at which changes occur being of great importance. Extreme precipitation at small temporal and spatial scales can have major impacts on society and cause pluvial flooding (Archer and Fowler 2015; Thorndahl et al. 2017). Due to the importance of short-duration rainfall extremes and the scale on which the processes leading to extreme precipitation occur, very high resolution climate models are needed to provide reliable future projections (Kendon et al. 2012; Ban et al. 2014; Chan et al. 2014b; Sunyer et al. 2017).

Regional changes in extreme precipitation can be inferred from three types of climate models: high-resolution Global Circulation Models (GCMs) with a grid spacing of ~ 50 km or less, Regional Climate Models (RCMs) with a grid spacing of ~ 10–50 km, with examples down to 5 km (Lucas-Picher et al. 2012), and Convection-Permitting Models (CPMs) with a grid spacing < 5 km. While CPMs are also regional climate models, in the sense that they span a limited-area domain, they differ from traditional RCMs by resolving convection explicitly, as outlined below. Throughout this study we use the term RCM for coarser-resolution regional models and CPM for higher-resolution convection-permitting models. RCMs and CPMs are currently used together, as CPM datasets are of limited availability and CPMs are computationally expensive to run (Rummukainen 2010; Prein et al. 2015). As convective precipitation has a spatial scale smaller than the RCM grid scale, a convective parameterisation scheme is needed, which aims to represent the average effects of convection on the model grid. In contrast, CPMs represent convection explicitly, often using no convective parameterisation scheme, due to the very high resolution (e.g. Kendon et al. 2012). In particular, deep convective parameterisation is typically not used in CPMs, whilst the use of shallow convection parameterisation varies between studies (Kendon et al. 2017). Several studies have found that CPMs perform better than RCMs in terms of the diurnal cycle of rainfall and the intensity, frequency and duration of sub-daily extreme precipitation (Kendon et al. 2012; Prein et al. 2013; Chan et al. 2014b; Ban et al. 2014).
Extreme precipitation simulated in some RCMs has also been shown to be impacted by grid-point storms, which are physically implausible events that occur when the scale of convection approaches the model grid scale and the assumptions of the convective parameterisation break down (Chan et al. 2014b). RCMs with a grid spacing of approximately 10 km are within the so-called “grey zone”, where the assumptions underlying the convective parameterisation become invalid (Molinari and Dudek 1992). We note, however, that RCMs using scale-aware convective schemes designed to operate in the grey zone would not be expected to have grid-point storms (Kendon et al. 2021).

A tracking algorithm can provide information on the characteristics and evolution of precipitation events, which is extremely valuable in assessing the underlying processes for rainfall generation; yet few studies have applied tracking algorithms to long continuous CPM simulations (Caine et al. 2013; Prein et al. 2017a, b; Purr et al. 2019; Li et al. 2020; Caillaud et al. 2021). Prein et al. (2017a) found an increase in both the intensity and size of future mesoscale convective systems (MCS) over North America analysing a CPM, indicating a doubling in the risk of flooding. While Caine et al. (2013), Prein et al. (2017b), Purr et al. (2019), Li et al. (2020) and Caillaud et al. (2021) analysed how well precipitation events are simulated in CPMs compared to observations, none of the studies compared results from the analysed CPMs with RCM simulations. Few studies to date have applied tracking algorithms to both CPMs and RCMs to identify differences in extreme precipitation event characteristics and evolution across model resolution (Crook et al. 2019).

In this study we explore and quantify the differences in extreme event characteristics between a CPM and an RCM, both from the UK Met Office, over a northern European region. We examine the difference in the tracked extreme events (contiguous rainfall areas with intensities above 1 mm/h) between the CPM and RCM and develop a new method to simplify area-intensity evolution diagrams. This method enables us to represent the typical event evolution across many events with different durations and allows comparison of the representation of extreme events between models or different time periods. Due to the lack of gridded precipitation observations with high temporal and spatial resolution, it is not possible to provide an observational reference for event evolution, and thus we focus on the differences between the models. In particular, E-OBS, ERA5, IMERG satellite data and radar products were all considered, but not used because they are too coarse in resolution, not continuous and/or of low data quality over the study area.

2 Methods

2.1 Climate model data

Two models are compared in their representation of extreme events: an RCM with a 12 km horizontal resolution, referred to as “RCM12”, and a CPM with a 2.2 km grid spacing, referred to as “CPM2”. The RCM12 and CPM2 are configurations of the Met Office Unified Model (UM), developed by the UK Met Office, and are described further in Berthou et al. (2018). Key differences between the two models are as follows:

  • The RCM12 (UM version 10.3) is based on a climate version of the UM (Williams et al. 2018) and uses a convection parameterisation based on Gregory and Rowntree (1990). The RCM12 has a model time step of 4 min and uses a prognostic cloud fraction and condensate scheme (Wilson et al. 2008).

  • The CPM2 (UM version 10.1) is based on the operational UKV Met Office model for numerical weather predictions (Clark et al. 2016) and runs without any convection parameterisation (both shallow and deep are switched off). The CPM2 has a model time step of 1 min and uses the diagnostic (Smith 1990) cloud scheme. The CPM2 includes prognostic graupel, which is a second category of ice that has higher fall speeds and is typically found in convective clouds (unlike the RCM12 which just has a single category of ice). The CPM2 uses a new blended boundary-layer parameterization (Boutle et al. 2014a, b).

In this analysis, the RCM12 and CPM2 are driven by the ERA-Interim reanalysis (Dee et al. 2011) and both models directly downscale the driving model; hence the CPM2 is not nested in the RCM12. Both model simulations span a pan-European domain: the RCM12 and CPM2 model domains are shown in Fig. 1. For both models, precipitation output is available at hourly resolution. Ten years of data, from 1999 to 2008, are analysed. All analyses are carried out with model output regridded to a common 12 km grid (with mass conservation) for a direct comparison of results. Furthermore, all analyses are also performed for model output regridded to a common 25 km grid; results from these analyses are found in Supplementary Sect. 4.

Fig. 1

Overview of data domain. Pink domain: 12 km RCM12 data domain. Orange domain: 2.2 km CPM2 data domain. Blue domain: Domain where tracking of events has been done. Red box: Case area for this study

2.2 Tracking algorithm DYMECS

The DYMECS tracking algorithm, described in detail in Stein et al. (2014), was applied to output from both models at 12 km resolution to identify precipitation events. The algorithm was developed for UK radar and Met Office convection-permitting forecast model precipitation data and has subsequently been applied to climate model data by Crook et al. (2019). The algorithm was applied to rainfall fields within the common part of the dataset (Tracking Domain, Fig. 1), removing 90 grid points (12 km resolution) from each side of the boundaries, still covering a large part of Europe. Events are defined as continuous rainfall fields above a certain threshold and labelled using the “local table method” (Haralick and Shapiro 1992). Here an intensity threshold of 1 mm/h was used with no areal threshold, allowing an event to be as small as one grid cell. Events are tracked between two consecutive images (t and t + 1) by displacing tracked elements at time t into time t + 1 using the velocity field \(\mathcal{V}\)(t, t−1). The velocity field is based on windowed cross-correlations, dividing each image into 18 × 18 grid box windows (Rinehart and Garvey 1978). Tracks are identified by analysing the overlapping areas between the advected image of t and the image of t + 1, using an overlap criterion of 0.6 (Stein et al. 2014). Settings for the algorithm are similar to those Crook et al. (2019) used for precipitation tracking, though with a slight adjustment of the size of the grid box window (18 × 18 instead of 20 × 20, to fit the number of grid points in the analysed domain).
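
The event definition above (continuous rainfall areas above 1 mm/h, with no areal threshold) can be illustrated with a minimal sketch. It is not the DYMECS implementation: it swaps the local table method for a plain breadth-first flood fill, and the use of 4-connectivity, the function name `label_events` and the toy field are all assumptions made for illustration.

```python
from collections import deque

THRESHOLD_MM_H = 1.0  # intensity threshold stated in the text


def label_events(field, threshold=THRESHOLD_MM_H):
    """Label contiguous rain areas with intensity above `threshold`.

    `field` is a list of rows of hourly intensities (mm/h). Returns a grid of
    the same shape (0 = background, 1, 2, ... = event labels) and the number
    of events found. 4-connectivity is an assumption of this sketch.
    """
    n_rows, n_cols = len(field), len(field[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_label = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if field[r][c] <= threshold or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # flood fill from the seed cell
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and field[ni][nj] > threshold
                            and not labels[ni][nj]):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels, next_label


# Toy hourly field (mm/h), purely illustrative.
example_field = [
    [0.0, 2.5, 3.0, 0.0, 0.0],
    [0.0, 1.2, 0.0, 0.0, 4.0],
    [0.0, 0.0, 0.0, 0.0, 0.5],
    [1.5, 0.0, 0.0, 0.0, 0.0],
]
example_labels, example_count = label_events(example_field)
```

In the toy field three events are found, two of which consist of a single grid cell; this mirrors the statement that an event can be as small as one grid cell.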

The tracking algorithm considers birth, death, splitting and merging of events. Splitting describes the situation where an event in time step t overlaps sufficiently with two events in time step t + 1, hence the event splits into two events. When two events at time step t both overlap sufficiently with one event in time step t + 1, the events are considered to have merged. In both splitting and merging, the event with the largest overlap keeps the original event ID, while a new event ID is given in case of splitting (Crook et al. 2019). Splitting and merging of events are recorded by marking which event IDs are linked to a given event ID. In this study, an event is defined by the period over which it has the same ID, so that no part of an event is counted twice.
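
The ID handling described above can be sketched as follows. This is a simplified illustration rather than the DYMECS code: the advection step is assumed to have been applied already, the overlap fraction is assumed to be the number of shared cells relative to the smaller of the two events, and the function name `match_events` is hypothetical.

```python
OVERLAP_CRITERION = 0.6  # value quoted in the text


def match_events(advected_prev, current, next_id):
    """Carry event IDs from time t to time t + 1.

    advected_prev: {event_id: set of (row, col)} for events at t, already
                   displaced by the velocity field.
    current:       list of non-empty sets of (row, col), one per event at t + 1.
    Returns (ids, links, next_id): the ID given to each current event, the
    previous IDs linked to it through a split or merge, and the next free ID.
    """
    candidates = []
    for prev_id, prev_cells in advected_prev.items():
        for k, cur_cells in enumerate(current):
            shared = len(prev_cells & cur_cells)
            # Assumed definition: shared cells relative to the smaller event.
            fraction = shared / min(len(prev_cells), len(cur_cells))
            if fraction >= OVERLAP_CRITERION:
                candidates.append((shared, prev_id, k))
    candidates.sort(key=lambda item: item[0], reverse=True)  # largest overlap first

    ids = [None] * len(current)
    links = {k: [] for k in range(len(current))}
    claimed = set()
    for _, prev_id, k in candidates:
        if ids[k] is None and prev_id not in claimed:
            ids[k] = prev_id          # continuation: the original ID is kept
            claimed.add(prev_id)
        else:
            links[k].append(prev_id)  # split or merge: only record the link
    for k in range(len(current)):
        if ids[k] is None:            # birth, or the smaller part of a split
            ids[k] = next_id
            next_id += 1
    return ids, links, next_id


# Split: event 7 breaks into two parts; an unrelated event is born.
split_result = match_events(
    {7: {(0, 0), (0, 1), (0, 2), (0, 3)}},
    [{(0, 0), (0, 1)}, {(0, 3)}, {(5, 5)}],
    next_id=8,
)

# Merge: events 1 and 2 combine into one event, which keeps ID 1.
merge_result = match_events(
    {1: {(0, 0), (0, 1), (0, 2)}, 2: {(0, 3)}},
    [{(0, 0), (0, 1), (0, 2), (0, 3)}],
    next_id=3,
)
```

In the split example the larger part keeps ID 7 and the smaller part receives a new ID linked back to 7; in the merge example the merged event keeps ID 1 and records a link to ID 2.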

The following event specific variables are extracted from the tracking algorithm and used to quantify extreme precipitation from the two models:

  • Centroid Location [lat, lon, t]: Centroid of the box fitted around the event area for each time step. (Here and below, the square brackets give the units and dimensions of each variable.)

  • Maximum Location [lat, lon, t]: Location of the single maximum intensity cell for each time step.

  • Maximum Intensity [mm/h, t]: The maximum intensity of a single cell within the event for each time step.

  • Mean Intensity [mm/h, t]: Average intensity over grid cells within the event (only grid cells with intensities above 1 mm/h are considered) for each time step.

  • Peak Maximum Intensity [mm/h]: Lifetime maximum intensity, based on the Maximum Intensity variable for each event.

  • Peak Mean Intensity [mm/h]: Maximum mean intensity over the lifetime of each event, based on the Mean Intensity variable.

  • Area [number of grid cells, t]: Number of grid cells which are included in the event (only grid cells with intensities above 1 mm/h are considered) for each time step.

  • Maximum Area [number of grid cells]: Lifetime maximum area for each event.

  • Box [lat min., lon min., n lat, n lon, t]: A rectangular box fitted around all grid cells (> 1 mm/h) within the event at each time step [t]. The box is used only to identify the location of the event (for the merging of events, Sect. 2.5) and is not used in the tracking. The size and location of the box fitted around each event are given by the bottom left corner [lat min, lon min] and the size of the box [n lat, n lon].
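
As an illustration of how the variables listed above relate to one another, the following sketch derives them from the grid cells of a single event. Grid indices stand in for latitude and longitude, and the function names `event_variables` and `lifetime_peaks` and the toy fields are assumptions, not part of the tracking software.

```python
def event_variables(field, cells):
    """Per-time-step variables for one event.

    field: 2-D list of hourly intensities (mm/h).
    cells: list of (row, col) grid cells belonging to the event (> 1 mm/h).
    Grid indices stand in for latitude and longitude in this sketch.
    """
    intensities = [field[r][c] for r, c in cells]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    n_rows = max(rows) - min(rows) + 1
    n_cols = max(cols) - min(cols) + 1
    max_intensity = max(intensities)
    return {
        "area": len(cells),
        "max_intensity": max_intensity,
        "max_location": cells[intensities.index(max_intensity)],
        "mean_intensity": sum(intensities) / len(cells),
        "box": (min(rows), min(cols), n_rows, n_cols),
        # Centroid of the fitted box, not of the rain cells themselves.
        "centroid": (min(rows) + (n_rows - 1) / 2, min(cols) + (n_cols - 1) / 2),
    }


def lifetime_peaks(per_step):
    """Lifetime summaries from a list of per-time-step variable dicts."""
    return {
        "peak_max_intensity": max(s["max_intensity"] for s in per_step),
        "peak_mean_intensity": max(s["mean_intensity"] for s in per_step),
        "maximum_area": max(s["area"] for s in per_step),
    }


# Two toy time steps of the same event.
step0 = event_variables([[0.0, 2.5, 3.0], [0.0, 1.2, 0.0]], [(0, 1), (0, 2), (1, 1)])
step1 = event_variables([[0.0, 0.0, 6.0], [0.0, 0.0, 2.0]], [(0, 2), (1, 2)])
peaks = lifetime_peaks([step0, step1])
```

Note that the peak intensity and the maximum area need not occur at the same time step: in this toy case the event is largest at the first step but most intense at the second.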

2.3 Extreme event definition

Extreme events are sampled from a Northern European case area in order to be able to compare seasonality and movement of sampled extremes without mixing up different climatic zones. The Northern European case area is defined between 12° W to 20° E and 49 to 60° N (see Fig. 1). An event is considered within the case area if its Maximum Location is inside the case area at any time within the lifetime of the event. These events are all kept. The entire lifetime of the event is then treated as an event within the case area despite the possibility that the Maximum Location at some time steps is outside the case area. Extreme events which start and end at the boundary of the tracking area are included, even though these may suffer from boundary artefacts impacting the event evolution at the beginning or end of their lifetime. This is done to maintain the best possible extreme distribution in the case area.

Extreme events are sampled from the population of events within the case area for further analysis. Here, extreme events are defined based on their Peak Maximum Intensity (1-h intensity) within the case area, and the 10,000 most intense events are sampled in three bins, Top 100 (rank 1–100), Top 1000 (rank 1–1000), and Top 10,000 (rank 1–10,000). Extreme events are furthermore sampled and analysed within each season.

2.4 Event characteristics

Event characteristics are analysed for the Top 100 events in the RCM12 and CPM2 datasets considering four variables: Area, Maximum Intensity, Mean Intensity and Volume. To study the evolution in event characteristics for events with different lifetimes, the method proposed in Brisson et al. (2018) is used. Event lifetimes are normalised to a range between 0 and 1 and the event characteristic for each time step in the event is extracted. A second order polynomial is fitted to the event characteristic data for each of the Top 100 events. Brisson et al. (2018) also suggested a normalisation of the variables, introducing the term var′:

$$var_{t}^{'} = \frac{var_{t}}{\overline{var}}$$

where var is either Area, Maximum Intensity, Mean Intensity or Volume, \(var_{t}\) is the variable at the given time step and \(\overline{var}\) is the mean value of the variable for the given event. Results with no normalisation of the variables (only normalisation of lifetime) are presented in Sect. 3.3, while results with normalisation of the variables are presented in the Supplementary Sect. 3.
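
A minimal sketch of the two normalisation steps and the second-order polynomial fit is given below. It assumes equally spaced hourly time steps and fits a single event series by ordinary least squares; the function names `normalise_event` and `fit_quadratic` are hypothetical, and the exact fitting procedure of Brisson et al. (2018) may differ.

```python
def normalise_event(series):
    """Return (times, scaled) for one event characteristic.

    times:  time steps mapped to the range 0..1 (normalised lifetime).
    scaled: each value divided by the event mean (the var' of the text).
    """
    n = len(series)
    mean_value = sum(series) / n
    times = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    scaled = [value / mean_value for value in series]
    return times, scaled


def fit_quadratic(times, values):
    """Least-squares fit of values ~ a + b*t + c*t**2 (normal equations).

    Needs at least three distinct time values. Returns [a, b, c].
    """
    s = [sum(t ** k for t in times) for k in range(5)]
    rhs = [sum(v * t ** k for t, v in zip(times, values)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


# Toy Area series (grid cells) for an event lasting four hourly time steps.
times, scaled = normalise_event([2, 6, 10, 6])
coefficients = fit_quadratic(times, scaled)

# Sanity check on data that are exactly quadratic.
check_times = [0.0, 0.25, 0.5, 0.75, 1.0]
check = fit_quadratic(check_times, [1 + 2 * t - 3 * t * t for t in check_times])
```

By construction, the normalised series has a mean of 1, so events of very different size can be compared on the same axis.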

2.5 Merging of events

Merging of events is applied as a post-processing step based on the results of the tracking algorithm. Events are merged if the Boxes around two or more events overlap spatially or lie within 48 km (4 grid points) of each other at a single time step. Events are then merged for the entire lifetime of the events. The merged event is given the event ID of the event with the highest Peak Maximum Intensity, and information from both events is merged. Area still only considers grid cells with intensities above 1 mm/h. Centroid Location is calculated based on the new Box fitted around the merged event. The merging is done recursively (due to updating of the Box around the merged event), until no further events are merged.
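
The recursive merging can be sketched for a single time step as follows. Boxes are given in grid indices as (row min, col min, n rows, n cols); measuring the distance as the number of empty grid points between boxes along each axis is an assumption, as are the function names and the toy events.

```python
MAX_GAP = 4  # grid points, i.e. 48 km on the 12 km grid


def axis_gap(start_a, size_a, start_b, size_b):
    """Number of empty grid points between two index ranges (0 if they touch)."""
    return max(0, max(start_a, start_b) - min(start_a + size_a, start_b + size_b))


def boxes_close(a, b, max_gap=MAX_GAP):
    """True if boxes overlap or lie within `max_gap` grid points on both axes."""
    return (axis_gap(a[0], a[2], b[0], b[2]) <= max_gap
            and axis_gap(a[1], a[3], b[1], b[3]) <= max_gap)


def union_box(a, b):
    """Smallest box containing both input boxes."""
    row0, col0 = min(a[0], b[0]), min(a[1], b[1])
    row1 = max(a[0] + a[2], b[0] + b[2])
    col1 = max(a[1] + a[3], b[1] + b[3])
    return (row0, col0, row1 - row0, col1 - col0)


def merge_events(events):
    """events: {event_id: (peak_max_intensity, box)} at one time step.

    Repeats until no pair of boxes is close; the merged event keeps the ID
    of the member with the highest Peak Maximum Intensity.
    """
    merged = dict(events)
    changed = True
    while changed:
        changed = False
        ids = sorted(merged)
        for i, id_a in enumerate(ids):
            for id_b in ids[i + 1:]:
                peak_a, box_a = merged[id_a]
                peak_b, box_b = merged[id_b]
                if boxes_close(box_a, box_b):
                    keep, drop = (id_a, id_b) if peak_a >= peak_b else (id_b, id_a)
                    merged[keep] = (max(peak_a, peak_b), union_box(box_a, box_b))
                    del merged[drop]
                    changed = True
                    break
            if changed:
                break  # boxes changed, so restart the pairwise search
    return merged


example_events = {
    1: (30.0, (0, 10, 2, 2)),
    2: (45.0, (5, 15, 2, 2)),
    3: (20.0, (10, 4, 2, 2)),
    4: (10.0, (40, 40, 3, 3)),
}
example_merged = merge_events(example_events)
```

The example shows why the procedure has to be repeated: event 3 is not close to event 1 or event 2 individually, but it is within 4 grid points of the enlarged box created once those two are merged.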

2.6 Event volume

The total volume [m³] of rainfall associated with each event i is defined as:

$${\text{Event}}\;{\text{volume}}_{i} = \mathop \sum \limits_{{{\text{t}} = 1}}^{{{\text{life}}_{{\text{i}}} }} {\text{Mean}}\;{\text{Intensity}}_{{i,t}} \cdot {\text{Area}}_{{i,t}}$$

The Event volume is calculated over the course of the entire event period, t = 1…life_i, defined as the period where the event has the same ID, disregarding splitting and merging with events of other IDs. For events within the case area the entire lifetime of the event is considered. The accumulated volume associated with events for a given area is defined as:

$${\text{Accumulated}}\;{\text{volume}} = \mathop \sum \limits_{{{\text{i}} = 1}}^{{{\text{nTracks}}}} {\text{Event}}\;{\text{volume}}_{i}$$
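
The two sums above can be written as a short sketch. The conversion to cubic metres is an assumption made for illustration: Mean Intensity is taken in mm/h, Area in grid cells of the common 12 km grid and the time step as 1 h, so that one grid cell receiving 1 mm/h for one hour corresponds to 144,000 m³.

```python
CELL_SIZE_M = 12_000.0  # side length of a cell on the common 12 km grid
TIME_STEP_H = 1.0       # hourly model output


def event_volume_m3(mean_intensity, area):
    """Sum over the event lifetime of Mean Intensity x Area.

    mean_intensity: mm/h per time step; area: grid cells per time step.
    The unit conversion to m^3 is an assumption of this sketch.
    """
    cell_area_m2 = CELL_SIZE_M ** 2
    return sum(
        (intensity / 1000.0) * TIME_STEP_H * cells * cell_area_m2
        for intensity, cells in zip(mean_intensity, area)
    )


def accumulated_volume_m3(events):
    """Sum of event volumes over a list of (mean_intensity, area) series."""
    return sum(event_volume_m3(mi, a) for mi, a in events)


# Toy event lasting three hourly time steps.
example_volume = event_volume_m3([2.0, 5.0, 3.0], [10, 40, 20])
example_total = accumulated_volume_m3([
    ([2.0, 5.0, 3.0], [10, 40, 20]),
    ([1.5], [2]),
])
```

For the toy event, the sum of Mean Intensity times Area is 280 mm/h grid cells, which corresponds to about 4.03 × 10⁷ m³ under the stated assumptions.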

2.7 Simplified event evolution (SEE)

Area-intensity evolution diagrams have been used to describe the life cycle of the events, as seen in the case from numerical weather predictions (Keat et al. 2019). In this study, we suggest a method to simplify the evolution diagram across event durations, making it possible to create an average event evolution across numerous events and therefore suitable in a climate context. Here the metric is used to describe event representation in the two models. From the tracking algorithm the Area, Maximum Intensity and Mean Intensity time series are used to visualise the evolution of an event (see Fig. 2b, d). There are large variations in the event evolution between events and for longer lifetimes the event evolution can be more complex (shown in Supplementary Sect. 1). In order to compare event evolutions across datasets, the evolution is simplified into four points (see Fig. 2b, d):

  1. Birth: Size and intensity when the event is first detected.

  2. Peak intensity: Size and intensity at the point where the event reaches its peak intensity.

  3. Maximum area: Size and intensity when the event reaches its largest size (defined as the horizontal area identified with the event [number of grid cells above threshold]).

  4. Death: Size and intensity at the last time step the event is detected.

Fig. 2

Event evolution over time for an event in the CPM2 dataset (2002-06-19 14:00 to 2002-06-20 07:00). a: Storm track with indication of the area of the event over time. c: Accumulated rainfall over the event duration (footprint). b: Event evolution over time in maximum intensity and area (dots indicate hourly time steps and colour indicates elapsed time), with simplified event evolution based on maximum intensity in black. d: Event evolution over time for mean intensity and area (dots indicate hourly time steps and colour indicates elapsed time), with simplified evolution for mean intensity in black

The simplified event evolution (SEE) is computed for both Peak Maximum Intensity (SEEmax, Fig. 2b) and Peak Mean Intensity (SEEmean, Fig. 2d). Maximum Intensity is the maximum intensity of a single grid cell within the event for each time step, whereas Mean Intensity is the mean intensity for all grid cells included in the event for each time step.

The median SEE is calculated to compare the event evolution between different ranks of extreme events or between models. First a simplified event evolution is fitted to each event in a sample of extreme events. Then a median event evolution figure is created by finding the median of each of the four points (1. birth, 2. peak intensity, 3. maximum area and 4. death) within the individual fitted simplified event evolution figures, both for intensity (y-axis) and area (x-axis).
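
The four-point simplification and the median SEE can be sketched as below. The intensity series may be either Maximum Intensity (for SEEmax) or Mean Intensity (for SEEmean). Taking the first occurrence when a peak value is reached more than once is an assumption, and the function names and toy events are hypothetical.

```python
from statistics import median

SEE_POINTS = ("birth", "peak_intensity", "maximum_area", "death")


def simplified_event_evolution(area, intensity):
    """Reduce one event to four (area, intensity) points.

    area, intensity: equally long time series for a single event.
    Ties are resolved by taking the first occurrence (an assumption).
    """
    i_peak = intensity.index(max(intensity))
    i_area = area.index(max(area))
    return {
        "birth": (area[0], intensity[0]),
        "peak_intensity": (area[i_peak], intensity[i_peak]),
        "maximum_area": (area[i_area], intensity[i_area]),
        "death": (area[-1], intensity[-1]),
    }


def median_see(events):
    """Median of each SEE point over a list of (area, intensity) series."""
    sees = [simplified_event_evolution(a, i) for a, i in events]
    return {
        point: (
            median(see[point][0] for see in sees),  # area (x-axis)
            median(see[point][1] for see in sees),  # intensity (y-axis)
        )
        for point in SEE_POINTS
    }


# Three toy events with different lifetimes (area in grid cells, mm/h).
example_events = [
    ([2, 10, 30, 12], [3.0, 20.0, 8.0, 2.0]),
    ([4, 20, 14], [5.0, 12.0, 16.0]),
    ([6, 8, 40, 50, 10], [2.0, 30.0, 25.0, 10.0, 1.5]),
]
example_median = median_see(example_events)
```

Because every event is reduced to the same four points, events with lifetimes of three, four and five hours can be combined directly, which is the property that makes the median SEE usable across large samples.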

3 Results and discussion

3.1 Sampled extreme events

A total of 4,219,064 events were tracked in the RCM12 dataset and 6,456,733 events were tracked in the CPM2 dataset for the entire tracking domain (see Fig. 1). Events that never exceeded an area of 1 grid point were removed, resulting in 2,494,326 RCM12 and 4,333,758 CPM2 events. Of these, 701,475 events (28%) were located in the case area in the RCM12 dataset and 1,457,943 events (34%) in the CPM2 dataset (see Fig. 1, red box). This corresponds to approximately 192 events per day in the RCM12 dataset and 399 events per day in the CPM2 dataset. Because events are defined as contiguous rainfall areas, events in this study must be considered distinctly different from large-scale rainfall descriptions such as storms. The difference in number of events between the two models is further discussed in Sect. 3.2.

Ranks were chosen to sample extreme events, in order to accommodate the different number of tracked events between the two models. The 10,000 most intense events (based on the variable Peak Maximum Intensity) were sampled in three categories, Top 100 (rank 1–100), Top 1000 (rank 1–1000), and Top 10,000 (rank 1–10,000). Here an equal sample size is obtained between the two models, in the same way as sampling a specific number of events per year. Based on the pool of events from each of the two models, the Top 100, Top 1000 and Top 10,000 events correspond to percentiles ranging from 99.993 to 98.574 (Table 1). While the RCM12 dataset has the largest maximum intensity for Top 100, the CPM2 dataset shows larger intensities for Top 1000 and Top 10,000.
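
The event rates and percentile range quoted in this section can be reproduced with simple arithmetic. The sketch assumes that the pool of events is the set of case-area events reported above (single-cell events removed) and that the analysed period runs from 1 January 1999 to 31 December 2008; the function names are hypothetical.

```python
from datetime import date

# Case-area event counts quoted in the text (single-cell events removed).
N_EVENTS = {"RCM12": 701_475, "CPM2": 1_457_943}

# Assumed analysis period: 1999-01-01 to 2008-12-31 inclusive.
N_DAYS = (date(2009, 1, 1) - date(1999, 1, 1)).days


def events_per_day(n_events, n_days=N_DAYS):
    """Average number of tracked events per day."""
    return n_events / n_days


def rank_to_percentile(rank, n_events):
    """Percentile of the event at `rank` (1 = most intense) in the pool."""
    return 100.0 * (1.0 - rank / n_events)


rates = {model: events_per_day(n) for model, n in N_EVENTS.items()}
highest = rank_to_percentile(100, N_EVENTS["CPM2"])     # Top 100, larger pool
lowest = rank_to_percentile(10_000, N_EVENTS["RCM12"])  # Top 10,000, smaller pool
```

Under these assumptions the rates round to 192 and 399 events per day, and the two extremes of the percentile range round to 99.993 and 98.574, in line with the values stated in the text.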

Table 1 Percentile related to the sampled extreme events (> 1 mm/h) and corresponding maximum intensities in mm/h, after merging Top 100 events. Maximum intensities represent the maximum intensity located inside the case area; lower ranked events can have higher intensities outside the case area

3.2 Merging of events

Due to the different representation of rainfall in the two models, the number of events in the two models is not expected to be the same. Furthermore, some of the expected difference between models can be explained purely by the difference in resolution. When comparing the same RCM12 and CPM2 simulations as studied here against observations, Berthou et al. (2018) found no clear signal of a better performance of one of the two models in terms of mean daily and hourly precipitation. Both models showed areas of better and worse performance compared to the other over the analysed areas of the UK, Germany and Spain. Results here, showing a very different number of events between the models (2,494,326 events in the RCM12 vs. 4,333,758 events in the CPM2), suggest a difference in how the tracking algorithm is able to define and track events in the two models. We note that this difference is not reduced when regridding to a coarser grid at 25 km resolution (Supplementary Sect. 4). Analysing periods with high intensity rainfall in the two models shows that rainfall is more scattered in the CPM2 dataset (see examples in Fig. 3). As the event definition in the tracking algorithm is based on a continuous area of rainfall (> 1 mm/h), this can split what would by eye be classified as a single event into several events. Here tracking is done using precipitation, while outgoing longwave radiation (OLR) is another widely used variable for tracking and detection, especially of MCSs (e.g. Morel and Senesi 2002; Crook et al. 2019). OLR is smoother in space, which would be likely to reduce the difference in the number of tracked events between the two models, but OLR tracking is prone to false alarms as OLR is not a direct measurement of precipitation.

Fig. 3

Footprint (accumulated rainfall) of periods with high intensity rainfall. a, b: Event 1 from 18:30 10/08-2007 to 18:30 11/08-2007 for RCM12 and CPM2 data, respectively. c, d: Event 2 from 05:30 19/09-2006 to 16:30 22/09-2006 for RCM12 and CPM2 data respectively

A scheme of merging events, as a post-processing step based on the tracking algorithm, was tested on both model datasets. Events which at a certain time step were spatially very close to each other were merged with the aim of giving a better estimate of the number of independent events and reducing differences in the number of tracked events between the two models. However, the merging resulted in very large events and unrealistic event tracks due to rainfall often being spatially scattered over a large part of the domain. Therefore it was decided to only apply merging to events contributing to the sampled Top 100 set, in order to ensure that the 100 most extreme events are not spatially overlapping or in close proximity, and therefore cannot be considered the same event. The process of merging Top 100 events was done recursively until none of the selected 100 events could be further merged. A total of 14 Top 100 CPM2 events and 19 Top 100 RCM12 events were merged. After the merging, the Top 100 CPM2 events consist of 89 independent days while the RCM12 Top 100 events consist of 98 independent days. The merging of the Top 100 events ensures that a more similar sample of events is compared between the two models. While sampling by rank is expected to give a fairer comparison of extreme events between the CPM2 and RCM12, some CPM2–RCM12 differences might be explained by the fewer events tracked in the RCM12. This will be discussed in the following sections along with the results.

3.3 Event characteristics

Evaluating the characteristics of the Top 100 most intense events by Area, Maximum Intensity, Mean Intensity and Volume shows that the two models represent the Top 100 extreme events very differently. The RCM12 Top 100 extreme events have higher peak values for Area and Volume compared to the CPM2 Top 100 events, while the opposite is the case for Mean Intensity (Fig. 4). No large differences are seen in the evolution of Maximum Intensity for the Top 100 most intense events between the two models. The largest difference between the RCM12 and CPM2 events is seen when comparing the Area and Volume of the Top 100 most intense extreme events. We note, however, that if the variables are normalised, no differences between the models are detected (see Supplementary Fig. 6). This suggests that the difference scales with the mean value, e.g. the difference in the Area variable between the RCM12 and CPM2 is due to the difference in mean Area over the lifetime of the events. Differences between CPM2 and RCM12 Top 100 events in Mean Intensity are more modest, but show higher Mean Intensity for CPM2 Top 100 throughout the lifetime of the events (Fig. 4c). The larger areas for RCM12 events are somewhat expected, and could be explained by the convective scheme smoothing out precipitation, leading to fewer individual events (Fig. 4a). It is more surprising that the CPM2 events do not have higher Maximum Intensity than the RCM12 events, despite the explicit representation of convection in the CPM2 and its higher original grid point resolution (Fig. 4b). The total Volume for the RCM12 Top 100 events is higher than for the CPM2 Top 100 events, which is mainly due to the large difference in event area between the two models (Fig. 4d). Comparing these results to those where tracking has been applied to data regridded to 25 km shows only small differences (Supplementary Fig. 8). In particular, model differences in the evolution of event characteristics are in general the same for the 25 km data, although with smaller differences in Area and Volume compared to those seen in Fig. 4 for the 12 km data.

Fig. 4

Evolution of Area (a), Maximum Intensity (b), Mean Intensity (c) and Volume (d) for Top 100 most intense events in the CPM2 dataset (black) and RCM12 dataset (grey). Event durations are normalised. The 99% confidence intervals (CI) are shown with dashed lines

3.4 Volume

The Accumulated volume of all tracked events within the tracking domain (see Fig. 1) is approximately 11% higher in the CPM2 dataset than in the RCM12 dataset (Table 2). In contrast, for events within the Case area (Northern Europe—see Fig. 1) the total Accumulated volume is similar (only 2% larger in CPM2 dataset compared to the RCM12 dataset, Table 2). Considering only extreme events, the picture changes: the Accumulated volume for RCM12 extreme events is larger than for CPM2 extreme events, with increasing difference for more extreme events (Table 2). For Top 100 events, the Accumulated volume for the CPM2 events is approximately 30% of the volume of RCM12 events (Table 2). The increasing difference in accumulated volume between the RCM12 and CPM2 for the most intense extreme events, suggests that this difference is not simply explained by the different number of tracked events between the two models. The same tendency is seen in the 25 km data (Supplementary Table 2), although with smaller differences between RCM12 and CPM2 events for the most severe extreme events.

Table 2 Total volume accumulated for events in the CPM2 and RCM12 dataset (after removing single cell events). Definition of Tracking area and Case area are seen in Fig. 1. For all cases the entire lifetime of the event is considered in the total volume

Considering all events in the case area, the contribution to the accumulated volume increases faster with maximum intensity in the CPM2 dataset compared to the RCM12 dataset (Fig. 5a), which is also seen for the 25 km data (Supplementary Fig. 9). This shows that the most intense events sampled with the tracking algorithm contribute a smaller fraction of the total volume in the CPM2 dataset compared to the RCM12 dataset. As the CPM2 extreme events in general have smaller areas than the RCM12 events (Fig. 4a), this explains the lower total volume in the extreme events for the CPM2 compared to the RCM12 (Table 2). The 10,000 events with the highest maximum intensity represent a bit less than 40% of the total volume in the CPM2 dataset, while for the RCM12 dataset these events represent more than 55% of the total volume in the dataset (Fig. 5b).
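
The cumulative fractions discussed above can be computed as in the following sketch, in which events are ordered by Peak Maximum Intensity and their volumes accumulated. The function names and the toy numbers are assumptions for illustration and do not reproduce the values in Fig. 5.

```python
def cumulative_volume_fraction(events):
    """Cumulative fraction of total volume with increasing peak intensity.

    events: list of (peak_max_intensity, event_volume) pairs.
    Returns one fraction per event, ordered from least to most intense.
    """
    ordered = sorted(events, key=lambda event: event[0])
    total = sum(volume for _, volume in ordered)
    running = 0.0
    fractions = []
    for _, volume in ordered:
        running += volume
        fractions.append(running / total)
    return fractions


def top_n_share(events, n):
    """Fraction of the total volume carried by the n most intense events."""
    ordered = sorted(events, key=lambda event: event[0], reverse=True)
    total = sum(volume for _, volume in ordered)
    return sum(volume for _, volume in ordered[:n]) / total


# Toy events: (Peak Maximum Intensity in mm/h, event volume in arbitrary units).
example_events = [(5.0, 10.0), (20.0, 40.0), (2.0, 30.0), (50.0, 20.0)]
example_fractions = cumulative_volume_fraction(example_events)
example_share = top_n_share(example_events, 2)
```

In the toy case the two most intense events carry 60% of the total volume; the corresponding quantity for the Top 10,000 events is what is compared between the CPM2 and RCM12 datasets above.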

Fig. 5

The contribution of events of increasing peak maximum intensity to the total accumulated volume (measured as the cumulative fraction) for all events (a) and for the 10,000 events with highest maximum intensity (b). Events are ranked by Peak Maximum Intensity. CPM2 dataset (black) and RCM12 dataset (grey). Marks indicate the rank 100, 1000 and 10,000 event for each of the datasets

3.5 Storm tracks

Tracks of the extreme events in the Northern European case area (Fig. 1, red box) are very different between the CPM2 and RCM12 (see Fig. 6). In the CPM2 dataset, the extreme events mostly occur over central Europe and southern Scandinavia (Fig. 6a–c) and tend to have a south to north (northward: 315–45°) direction of motion (Supplementary Fig. 4). By contrast, many of the extreme events in the RCM12 are located over the Atlantic Ocean and the British Isles (Fig. 6d–f) with a west to east (eastward) direction of motion (Supplementary Fig. 4). Focussing on the maximum location inside the case area (Fig. 6c, f), CPM2 extreme events are mostly in the eastern part, whilst there is additionally a cluster of events in the western part in the RCM12. This indicates that some of the extreme events in the RCM12 are distinctly different from those in the CPM2. To understand these differences, the most intense extreme events in the CPM2 dataset are compared with tracks on the same day, with similar location and intensity, in the RCM12 dataset, and vice versa (see Supplementary Sect. 2). From comparisons of tracks between the two models, we find:

  • Extreme events from one dataset are rarely replicated by the other dataset, indicating completely different sets of extreme events in the two models.

  • Long event tracks in the RCM12 extreme set seem to be replicated well by the CPM2, though with notably lower intensities, indicating that the RCM12 extreme set includes a group of events, which according to the CPM2 are not extreme due to lower intensities.

  • CPM2 extreme events are largely absent in the RCM12, with no tracks in the RCM12 with a similar location and intensity on that day.

Fig. 6

Storm tracks of Top 100 (rank 1–100 (a, d)) and Top 1000 (rank 1–1000 (b, e)) most severe events within the Northern European Case area. CPM2 dataset (a–c) and RCM12 dataset (d–f). c, f: Location of the Peak Maximum Intensity within the case area for the selected Top 100 most severe events. Colours distinguish different event tracks, plotted by rank in reverse order: least intense plotted first (dark colours), most intense plotted last (light colours). Note: Only events which have high intensities within the case area are shown

If events are instead sampled by their Peak Maximum Area (sampling of spatially large, but not necessarily intense events), there is no visible difference in storm tracks between the two models (results shown in Supplementary Sect. 3). The storm tracks of the spatially largest events have a high density over the British Isles and an eastward direction in both models. These tracks have very similar characteristics (both in terms of location and movement direction) to the tracks of the events with the highest Peak Maximum Intensity in the RCM12. Nevertheless, there is little overlap between events sampled by Peak Maximum Intensity and by Peak Maximum Area in the RCM12 dataset (4, 13 and 23% for Top 100, Top 1000 and Top 10,000), though this overlap is larger than in the CPM2 dataset (0, 3 and 14% for Top 100, Top 1000 and Top 10,000). The better agreement in location between models for the spatially largest events than for the most intense events, and the larger overlap between large and intense events in the RCM12 dataset, suggest that the group of intense extreme events in the RCM12 that the CPM2 does not see as intense consists of large events. Analysing the size of the events in the western cluster in the RCM12 data shows that they are on average double the size of the events in the eastern cluster. One plausible explanation for this group of extreme events in the RCM12 data is that the high intensities come from grid-point storms occurring within large-area events, a well-known problem in some RCMs (Chan et al. 2014a).
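The overlap percentages quoted above can be illustrated with a minimal sketch. It assumes that each tracked event carries one Peak Maximum Intensity and one Peak Maximum Area value, and that overlap is the share of Top N events by intensity that also appear in the Top N by area. The field names and the toy event list are hypothetical and are not taken from the tracking output.

```python
def top_n_ids(events, key, n):
    """Return the ids of the n events with the largest value of `key`."""
    ranked = sorted(events, key=lambda e: e[key], reverse=True)
    return {e["id"] for e in ranked[:n]}


def overlap_percent(events, n,
                    key_a="peak_max_intensity", key_b="peak_max_area"):
    """Percentage of the Top n events by key_a that are also Top n by key_b."""
    set_a = top_n_ids(events, key_a, n)
    set_b = top_n_ids(events, key_b, n)
    return 100.0 * len(set_a & set_b) / len(set_a)


# Toy events with made-up values (intensity in mm/h, area in grid cells).
toy_events = [
    {"id": 1, "peak_max_intensity": 120.0, "peak_max_area": 40},
    {"id": 2, "peak_max_intensity": 95.0, "peak_max_area": 900},
    {"id": 3, "peak_max_intensity": 60.0, "peak_max_area": 750},
    {"id": 4, "peak_max_intensity": 30.0, "peak_max_area": 15},
]

print(overlap_percent(toy_events, 2))  # events 1, 2 vs events 2, 3
```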

3.6 Seasonal distribution

The seasonal distribution of extreme event occurrence differs between the CPM2 and RCM12 datasets, again with the largest differences for the most severe extreme events (Fig. 7a). The CPM2 dataset shows an increasing ratio of summer events as more extreme events are considered (i.e. moving from Top 10,000 to Top 100; significant with a chi-square homogeneity test, p-value \(=5.6\times 10^{-12}\)), which is not found in the RCM12 dataset. While the sample of CPM2 extreme events is heavily dominated by summer events, RCM12 extreme events have a higher ratio of events from other seasons. Sampling extreme events by Maximum Area shows no difference in the seasonal distribution of occurrence between the CPM2 and RCM12 (see Fig. 7b). This confirms that similar events are sampled in the two models when selecting by Maximum Area, whereas this is not the case when selecting by Maximum Intensity. Analysing characteristics of MCSs over Europe, Morel and Senesi (2002) found a larger density of MCSs over land than sea, with a clear concentration in the eastern part of the case area. This suggests that the location of the tracks is closer to observations in the CPM2 dataset than in the RCM12. MCSs in Northern Europe were found to have the highest frequency between May and August (Morel and Senesi 2002), which agrees with the seasonal distribution in both models, although it is more apparent in the CPM2. Morel and Senesi (2002) define MCSs as events reaching an area above 10,000 km², while an areal threshold of 288 km² (excluding single cell events) is used in this study, with no attempt to distinguish between MCSs and non-MCSs. Yet Top 100 extreme events still reach an average area of 750 grid cells (108,000 km²) for RCM12 events and 200 grid cells (28,000 km²) for CPM2 events (Fig. 4).
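A chi-square homogeneity test of the kind mentioned above can be sketched as follows. The sketch assumes a contingency table with one row per sample and one column per season. Because the text does not state how the table was built, the rows here are disjoint rank bands (1–100, 101–1000, 1001–10,000), and all counts are invented for illustration. The p-value uses the closed form of the chi-square survival function that holds for even degrees of freedom only; a statistics library would normally be used instead.

```python
import math


def chi_square_homogeneity(table):
    """Return (statistic, dof) for a table of counts
    (rows = samples, columns = categories such as seasons)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_even_dof(stat, dof):
    """Upper-tail probability of a chi-square variable for even dof:
    exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!"""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form only covers positive even dof")
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# HYPOTHETICAL counts per season (DJF, MAM, JJA, SON); not from the paper.
HYPOTHETICAL_COUNTS = [
    [2, 5, 85, 8],             # ranks 1-100
    [60, 90, 600, 150],        # ranks 101-1000
    [1200, 1300, 4500, 2000],  # ranks 1001-10,000
]

stat, dof = chi_square_homogeneity(HYPOTHETICAL_COUNTS)
print(stat, dof, chi2_sf_even_dof(stat, dof))
```

A small p-value would indicate that the seasonal proportions differ between the rank bands, which is the pattern reported for the CPM2 dataset.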

Fig. 7

Seasonal occurrence of maximum intensity events (a) and maximum area events (b). Events sampled in CPM2 dataset in black and events sampled in RCM12 dataset in grey

3.7 Median simplified event evolution

When comparing the median Simplified Event Evolution (SEE) of the extremes for the RCM12 and CPM2, it is clear that the event evolution differs strongly between the two models (see Fig. 8). For Top 100 SEEmax the RCM12 extreme events reach higher intensities than the CPM2 events, while for Top 1000 the median Peak Maximum Intensity is almost the same in the two models (Fig. 8a). When including more events (e.g. Top 10,000) the CPM2 extreme events reach a higher median Peak Maximum Intensity than the RCM12 events (Fig. 8a). The somewhat surprising higher Peak Maximum Intensity in the RCM12 dataset for the most severe extreme events is most likely caused by grid-point storms. These often occur within large-scale areas of heavy precipitation, where the convective parameterisation breaks down, resulting in intensities above 100 mm/h and a very low parameterised convective rainfall fraction for one or a few grid cells compared to the surrounding grid cells. Grid-point storms are a known problem in some RCMs, which operate at grid scales smaller than those for which the convective parameterisation scheme was designed (see e.g. Chan et al. 2014b). In both size and intensity, the Top 100 extreme events in the RCM12 data clearly have a very different event evolution from the Top 100 extreme events in the CPM2 data. These differences between models, both in terms of shape and values, become smaller for the median SEEmax for Top 1000 and Top 10,000 events. While the area is still larger for the RCM12 events, the Maximum Intensity becomes higher for the CPM2 events (Fig. 8a). For Top 1000 and Top 10,000 SEEmax the difference in area between the two models could reasonably be explained by the difference in how rainfall is modelled (convection being parameterised or not) and by the difference in the original resolution of the models.
The Top 100 SEEmax confirms that a large part of the extreme events in the RCM12 dataset are very large events, and that these are not found in the CPM2 dataset. Together with the findings from Sect. 3.5 and Sect. 3.6, we deduce that the RCM12 overestimates the Maximum Intensity of these very large events due to the presence of grid-point storms (supported by an analysis of the convective fraction of rainfall above 100 mm/h in RCM12 events, Supplementary Sect. 5). Regridding the data to 25 km gives similar intensities in the two models for Top 100 SEEmax, but higher intensities in the CPM2 for Top 1000 and Top 10,000. Areal differences follow the pattern seen in the 12 km data, with much larger areas for RCM12 Top 100 SEEmax than for CPM2 Top 100 SEEmax.
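The grid-point storm signature described above (very high intensity combined with a low parameterised convective fraction relative to neighbouring cells) can be screened for with a minimal sketch. It assumes that hourly total rainfall and parameterised convective rainfall are available on the same grid. The 100 mm/h threshold comes from the text; the 3 x 3 neighbourhood and the `fraction_ratio` of 0.5 are arbitrary assumptions for illustration, and this is not the analysis of Supplementary Sect. 5.

```python
def flag_grid_point_storms(total, convective,
                           intensity_threshold=100.0, fraction_ratio=0.5):
    """Return (row, col) indices of cells whose total rainfall exceeds
    intensity_threshold (mm/h) and whose convective fraction is below
    fraction_ratio times the mean convective fraction of the neighbours.
    fraction_ratio and the 3x3 neighbourhood are assumptions."""
    n_rows, n_cols = len(total), len(total[0])

    def fraction(i, j):
        # Convective share of total rainfall; undefined for dry cells.
        return convective[i][j] / total[i][j] if total[i][j] > 0 else None

    flagged = []
    for i in range(n_rows):
        for j in range(n_cols):
            if total[i][j] <= intensity_threshold:
                continue
            own = fraction(i, j)
            if own is None:
                continue
            neighbours = [
                fraction(a, b)
                for a in range(max(0, i - 1), min(n_rows, i + 2))
                for b in range(max(0, j - 1), min(n_cols, j + 2))
                if (a, b) != (i, j)
            ]
            neighbours = [f for f in neighbours if f is not None]
            if not neighbours:
                continue
            if own < fraction_ratio * sum(neighbours) / len(neighbours):
                flagged.append((i, j))
    return flagged


# Toy 3x3 fields in mm/h with made-up values: one intense centre cell
# whose rainfall is almost entirely non-convective.
toy_total = [[10.0, 12.0, 11.0], [9.0, 130.0, 12.0], [10.0, 11.0, 10.0]]
toy_convective = [[6.0, 7.0, 6.0], [5.0, 2.0, 7.0], [6.0, 6.0, 6.0]]

print(flag_grid_point_storms(toy_total, toy_convective))
```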

Fig. 8

Simplified median evolution based on maximum intensity SEEmax (a) and mean intensity SEEmean (b). Data for both CPM2 (black triangles) and RCM12 (grey circles). Severe events sampled based on maximum 1-h intensity, Top 100 (dashed line), Top 1000 (dotted line) and Top 10,000 (solid line)

For median SEEmean the CPM2 extreme events have higher Mean Intensities than the RCM12 events, for all percentiles (Fig. 8b). As CPM2 extreme events are approximately half the size of the corresponding RCM12 extreme events, the CPM2 extreme events can be characterised as small and intense compared to the RCM12 extreme events. For the CPM2 Top 100 events, which by Maximum Intensity are less intense than RCM12 events, the higher Mean Intensities indicate that the CPM2 events are overall more intense than the RCM12 events, while the RCM12 events seem to have a more peaked intensity distribution (again consistent with these being associated with grid-point storms in some cases).

For the seasonal median SEE, the Top 100, Top 1000 and Top 10,000 events are sampled for each season. Seasonal median SEEmax for Top 100 shows the largest difference between the RCM12 and CPM2 datasets for autumn and winter events (Fig. 9a, d). By contrast, spring and summer events differ less for the Top 100 events (Fig. 9b, c). The same pattern is observed for the seasonal SEEmean for Top 100 events (Fig. 9e–h). For each season's Top 100 events, the CPM2 exhibits lower intensity in the SEEmax than the RCM12, which corresponds well with the results in Fig. 8a. Analysing seasonal SEEmax and SEEmean for Top 1000 and Top 10,000 shows the largest difference between datasets for summer events, with higher intensities in the CPM2 dataset and larger areas in the RCM12 dataset (Fig. 9). Interestingly, SEEmax winter events in Top 1000 and Top 10,000 have larger intensities in the RCM12 data than in the CPM2 data, as opposed to the other seasons (Fig. 9a). The absence of the convective parameterisation scheme in the CPM2 is expected to result in a large difference in summer events between the two models, as it is in this season that most convective events develop in the case area. The low winter intensities and small areas of the CPM2 compared to the RCM12 (mostly for Top 100 and Top 1000) could indicate that the difference in rainfall modelling between the two models also plays a large role for winter events.
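The seasonal sampling described above can be sketched as follows. The sketch assumes that each event is assigned to a season by a single calendar month (for example the month of its peak), which is not specified in the text. The field names and toy events are hypothetical.

```python
# Month-to-season mapping using the DJF/MAM/JJA/SON convention of Fig. 9.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def top_n_by_season(events, n, key="peak_max_intensity"):
    """Return {season: [event ids]} holding the n highest-ranked events
    of each season, ordered from most to least intense."""
    by_season = {}
    for event in events:
        season = SEASON_OF_MONTH[event["month"]]
        by_season.setdefault(season, []).append(event)
    return {
        season: [e["id"] for e in
                 sorted(evts, key=lambda e: e[key], reverse=True)[:n]]
        for season, evts in by_season.items()
    }


# Toy events with made-up months and intensities (mm/h).
toy_seasonal_events = [
    {"id": "a", "month": 1, "peak_max_intensity": 40.0},
    {"id": "b", "month": 12, "peak_max_intensity": 55.0},
    {"id": "c", "month": 7, "peak_max_intensity": 90.0},
    {"id": "d", "month": 8, "peak_max_intensity": 70.0},
    {"id": "e", "month": 10, "peak_max_intensity": 35.0},
]

print(top_n_by_season(toy_seasonal_events, 1))
```

Ranking within each season, rather than splitting one overall Top N by season, keeps the sample size equal across seasons, which is what allows the seasonal panels to be compared directly.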

Fig. 9

Simplified evolution based on maximum intensity (SEEmax (a–d)) and mean intensity (SEEmean (e–h)). a, e: winter events (DJF), b, f: spring events (MAM), c, g: summer events (JJA) and d, h: autumn events (SON). Top 100, Top 1000 and Top 10,000 within each season are shown with dashed, dotted and solid lines respectively. Data for both CPM2 (black triangles) and RCM12 (grey circles) are shown

4 Conclusion

The difference in the representation of extreme events between an RCM12 and a CPM2 was analysed by applying a storm tracking algorithm to the two datasets. Extreme events in the Northern European case area were found to have very different storm tracks, in terms of the location of the tracks, the location of the peak maximum intensity, and the movement direction. The largest differences were found for the most severe extreme events, indicating completely different sets of extreme events in the two models. This corresponds well with a recent ensemble study of CPMs and RCMs, which found the greatest improvements in the performance of CPMs for heavy precipitation events (Ban et al. 2021). It is also consistent with earlier studies showing the improved representation of hourly precipitation extremes in CPMs, due to the improved representation of convection (Kendon et al. 2014). The most intense RCM12 events were to a large extent captured in the CPM2 but with lower intensities, whilst the most intense CPM2 events were largely absent in the RCM12. The most intense events in the RCM12 are considered unphysical, and are likely due to grid-point storms (Chan et al. 2014b). Seasonal differences also illustrate the differences between the models: the RCM12 data have a larger fraction of non-summer events in the extreme event set than the CPM2 data. These differences between models were not found when sampling events by maximum area, i.e. events that are spatially large but not necessarily intense. Analysing the coincidence of large and intense events showed that a larger fraction of events were sampled as both intense and spatially large in the RCM12 dataset than in the CPM2 dataset. In summary, the extremes of the two models have low correspondence with each other.

Analysing time series of area, volume, maximum intensity and mean intensity for the Top 100 most extreme events over the lifetime of the event showed large differences between the models. Large differences in the area of the extreme events explained the model differences in event volume. The CPM2 produces a larger total volume of rainfall within the case area than the RCM12 due to higher mean intensities. In the RCM12, extreme events contribute proportionally more to the total volume than in the CPM2, due to their larger spatial size. These differences are again consistent with the expected different character of heavy rainfall in convection-permitting models (which tends to be more intense; Ban et al. 2021) compared to convection-parameterised models (where heavy rain events are not heavy enough, but tend to be too persistent and widespread; Kendon et al. 2012). Crook et al. (2019) found an improved contribution to total rainfall volume from MCSs in convection-permitting simulations, compared to convection-parameterised simulations over West Africa.
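The relation between area, mean intensity and volume discussed above can be made concrete with a minimal sketch. It assumes that event volume is the sum over hourly time steps of area times mean intensity, which may differ in detail from the definition used in the methods. The conversion uses 1 mm of rain over 1 km² = 1000 m³. Function names and inputs are hypothetical.

```python
def event_volume_m3(areas_km2, mean_intensities_mm_h, dt_h=1.0):
    """Rainfall volume of one event in cubic metres, from time series of
    event area (km^2) and mean intensity (mm/h) at a time step of dt_h hours."""
    if len(areas_km2) != len(mean_intensities_mm_h):
        raise ValueError("area and intensity series must have equal length")
    mm_km2 = sum(a * i * dt_h
                 for a, i in zip(areas_km2, mean_intensities_mm_h))
    return mm_km2 * 1000.0  # 1 mm x 1 km^2 = 1000 m^3


def extreme_volume_share(volumes_by_event, extreme_ids):
    """Fraction of the total volume contributed by the listed events."""
    total = sum(volumes_by_event.values())
    return sum(volumes_by_event[k] for k in extreme_ids) / total


# Toy event lasting two hours: 1 grid cell (144 km^2 on a 12 km grid)
# at 10 mm/h, then 2 grid cells at 5 mm/h. Values are made up.
print(event_volume_m3([144.0, 288.0], [10.0, 5.0]))
```

In this formulation a large event with modest mean intensity can carry as much volume as a small intense one, which is why differences in event area dominate the volume differences between the models.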

In this study we have developed a method of simplifying area-intensity diagrams to allow the typical event evolution to be visualised across many events with different durations. This makes the method suitable in a climate context and valuable for assessing differences in the underlying processes. The median Simplified Event Evolution showed large differences between RCM12 and CPM2 extreme events. The differences were again largest for the most intense events (Top 100). The Top 100 RCM12 extreme events had higher maximum intensities and larger areas than CPM2 extreme events, and these events in the RCM12 dataset are likely to be influenced by grid-point storms. For the less extreme samples, i.e. the Top 1000 and Top 10,000 events, extreme events in the CPM2 data were more intense. In general, on the basis of the results here, we conclude that we should have low confidence in the most extreme (Top 100) precipitation events on hourly timescales in convection-parameterised RCMs.

Sampling extreme events by season showed the largest differences between models in autumn and winter for Top 100 events. For Top 1000 and Top 10,000, large differences between models were found for summer events, which was expected because the models differ in how convection is represented and convection has the greatest impact in this season. The large difference in winter extreme events was less expected, with lower intensities for the CPM2 events than for the RCM12. This indicates that the difference in the representation of convection between models does not only affect events in summer. In addition to the representation of convection, the finer grid spacing of the CPM2 may allow it to better represent mesoscale structures within fronts, thereby impacting frontal events in winter.

The analysis performed on coarser resolution data (regridding model data to 25 km resolution before tracking) did not explain differences in event track location and event evolution found between models in the 12 km data. We conclude that the difference between the models in how they represent rainfall strongly influences the event characteristics reported here.

While no suitable observational dataset of hourly data was found to analyse the entire region, comparing the location of storm tracks and their seasonal distribution against previous observational studies (Morel and Senesi 2002) suggests a better performance of the CPM2 than the RCM12. This work emphasises the large difference in the representation of extreme events between convection-permitting and convection-parameterised models. Using results from a tracking algorithm gives the advantage of analysing the difference in extreme precipitation from an event perspective, which is here explored with a simple visual method, the Simplified Event Evolution. The influence of grid-point storms in the RCM12 dataset shows that extreme events from the RCM12 dataset should be analysed and compared with care. Overall there are indications that the CPM2 is more reliable in representing hourly extremes than the RCM12, based on previous studies comparing with observations. The methods used in this study could additionally be used to compare differences in the representation of extreme events between models in future projections.