1 Introduction

Precipitation extremes remain one of the most challenging quantities to simulate in climate models (Stephens et al. 2010), regional climate studies (Sunyer et al. 2012) and numerical weather prediction (Lavers and Villarini 2013). Here, we investigate how the choice of physical scheme in a regional-climate-inspired model configuration can influence the simulation of precipitation extremes (Fita et al. 2010; Liang et al. 2012). With the number of regional climate simulations being performed at ∼10 km resolution increasing (e.g. Evans et al. 2014; Jacob et al. 2014) and growing interest in the representation of precipitation extremes within these models (particularly for specific event studies; Ji et al. 2015), a better understanding of the sensitivity of extreme precipitation to physics schemes is required. For regional climate modelling, it is important that the quantitative precipitation distribution can be captured locally at different time scales, while the timing of these events is less important (in contrast to a numerical weather prediction context).

To investigate the uncertainty associated with the selection of different, but commonly used, physical parameterization schemes, we use a case study approach to assess their impact on model skill in terms of capturing observed temporal and spatial structures of an extreme rainfall event. The specific event studied here is an extreme extra-tropical low-pressure system (storms known as East Coast Lows (ECLs); Speer et al. 2009) that formed off eastern Australia in June 2007 and caused significant damage to the city of Newcastle (hereafter the storm is referred to as “Newy”). Newy was one of several ECLs used in studies of different configurations of the Weather Research and Forecasting (WRF) model for the production of regional projections for eastern Australia in the New South Wales/ACT Regional Climate Modelling (NARCliM) project (Evans et al. 2012, 2014). Each configuration reflects a different combination of physics parameterization schemes for microphysics, long and shortwave radiation, cumulus and the planetary boundary layer.

Evans et al. (2012) examined the performance of the ensemble over four ECL events including Newy. They considered modelled and observed patterns for a range of variables and found that whilst no single ensemble member performed best, a small number of combinations consistently showed low skill in simulating a range of ECL events. They also found that the spread of performance amongst the members was greater in more intense events, suggesting that extreme precipitation events provide good test environments to differentiate the impact of different physics parameterizations. Recently, Ji et al. (2014) examined the spatial patterns of precipitation for eight such ECL events. Both studies found that parameterizations of cumulus convection and the planetary boundary layer significantly influenced the spatial patterns of precipitation produced. We build on this work, focusing on the role played by the physical parameterizations in the quantitative simulation of precipitation extremes for hourly and daily totals. The extreme nature of Newy, one of the largest storms in the region in the past 30 years, means it is well suited to an in-depth analysis of the influence of physics parameterization on precipitation extremes.

A number of previous studies have examined multi-physics ensembles of WRF (Gallus and Bresch 2006; Bukovsky and Karoly 2009; Argueso et al. 2011; Flaounas et al. 2011; Schumacher et al. 2013) but few have focused on short-term precipitation during an extreme event. Jankov et al. (2005) used a WRF multi-physics ensemble at 12 km grid spacing, created using three cumulus schemes, three microphysics schemes and two planetary boundary layer (PBL) schemes, to simulate a series of warm season mesoscale convective systems that included some extreme precipitation. While they found no single physics combination performed best, the systems were most sensitive to the cumulus scheme followed by the PBL scheme and then the microphysics scheme. When examining rain rates, the cumulus scheme was the dominant factor, while the microphysics scheme had a stronger influence on total rain volume. They also found that interactions of different schemes could influence the results as much as changing a single scheme though this effect varied between events.

Lowrey and Yang (2008) examined the ability of a multi-physics WRF ensemble to simulate daily extreme precipitation in Texas, USA. Their ensemble was made using four different microphysics schemes, three different cumulus schemes and two different radiation schemes; they did not consider different PBL schemes. They found that the simulation of an extreme precipitation event was most sensitive to the cumulus parameterization, slightly affected by the microphysics scheme and largely unaffected by the radiation schemes. The Betts-Miller-Janjic (BMJ) cumulus scheme coupled to the Lin microphysics scheme produced the best precipitation estimate, while other cumulus schemes (including Kain-Fritsch) tended to underestimate the precipitation intensity. They also found that the cumulus parameterization improved the precipitation simulation even at 4 km grid spacing, which is often considered fine enough to turn off the cumulus parameterization. The present study addresses how shorter timescale precipitation extremes respond to different physics schemes in the context of regional climate simulations.

This paper is organized as follows. In the Methods section, we present the case study, and then discuss the model simulations and the observational precipitation datasets used in this paper. The Results section presents a comparison between the observations and the 36 simulations and their ensemble averages at daily and hourly resolution. We then discuss these results in light of regional climate modelling and close with recommendations for the use of physics schemes in regional climate ensembles where precipitation extremes are of interest.

2 Methods

The work presented here consists of a comparative analysis between precipitation extremes observed in the Newy event occurring in June 2007 and 36 model simulations using different combinations of physics parameterization schemes. After introducing the case study, we discuss the WRF simulations, and then we describe the observational data used for the precipitation comparisons.

2.1 Case study

Newy was an extreme extra-tropical low-pressure system (storms known as East Coast Lows (ECLs); Speer et al. 2009) that formed off eastern Australia in June 2007. Newy produced highly localized extreme precipitation near Newcastle, Australia (32.9167° S, 151.7500° E) and was one of the strongest ECL events observed in the last 30 years. Newy flooded the Hunter River to levels higher than in the previous 36 years, beached the bulk carrier Pasha Bulker and produced flash flooding in the Newcastle region. Nine fatalities were recorded, and an estimated 20,000 calls to emergency services were made (Mills et al. 2010; Verdon-Kidd et al. 2010). Wind gusts to 135 km/h were also reported. The storm developed in an existing low-pressure trough located in the northern Tasman Sea, resulting in humid air being funnelled towards the coast of eastern Australia from the north. The synoptic situation is shown in Fig. 1. Beyond its formation, the major contributing factors to the extreme nature of this event were the warmer than usual Tasman Sea, the high-pressure system to the south creating large pressure gradients, and high atmospheric temperature gradients over eastern Australia. The synoptic situation and dynamics leading up to the event are further discussed in Mills et al. (2010).

Fig. 1 Synoptic situation of the Newcastle East Coast Low at 12 UTC 07 June 2007. The major event at Newcastle occurs 24–36 h later. Data from ERA Interim and chart adapted from the Bureau of Meteorology, Australia (see http://www.bom.gov.au/nsw/sevwx/facts/events/june-07-ecl/e1-msl-loop.shtml where an animation is available)

2.2 Atmospheric model simulations

The atmospheric model used in this study is WRF version 3.2.1, and its primary use lies in mesoscale numerical weather simulations, for both research and operational purposes. Due to its versatile configuration, WRF contains an increasing number of physics parameterizations that can be used almost interchangeably (Skamarock et al. 2008). There are now such a large number of parameterization schemes for each physical process available within the WRF modelling system that it is only feasible to examine a small subset within a single study. Here, we wish to examine model configurations relevant for climate length simulations and hence focus on simple to medium complexity schemes that have been used in previous regional climate simulations. We briefly describe the model setup used for simulating Newy, noting that full details of the experiments are documented in Evans et al. (2012).

The model domains used in this study are the same as those of Evans et al. (2012) and are shown in Fig. 2. The outer domain is the Australasian Coordinated Regional climate Downscaling Experiment domain and the inner domain covers eastern Australia and a significant portion of the Tasman Sea. The grid spacing of the two domains is approximately 0.44° and 0.088° (48.92 km and 9.78 km, respectively) with dimensions 216 × 145 and 311 × 201, respectively. The atmosphere comprises 30 levels in both domains with four soil layers, and the sea surface temperature is updated every 6 h. Gravity wave damping at the model top is used in both domains with a damping layer depth of 5 km. The model was run from an initial condition starting at 00 UTC 01 June 2007 to 00 UTC 15 June 2007, with the event peak occurring from 7 to 9 June.

Fig. 2 Outer and inner WRF domains used in the study of the Newcastle East Coast Low storm. Elevation in meters above sea level is also shown

Thirty-six different combinations of physics options are simulated starting from the same initial condition, boundary and sea surface temperature forcing. The employed schemes, although not exhaustive, provide a selection of those typically employed in WRF simulations and, therefore, allow us to probe typical uncertainty in scheme choice. Two cumulus schemes are used, the Kain-Fritsch (KF) (Kain and Fritsch 1990; Kain 2004) and BMJ (Betts 1986; Betts and Miller 1986; Janjic 1994) cumulus schemes, and two PBL schemes are used, the Yonsei University (YSU)/MM5 similarity (Hong et al. 2006; Paulson 1970) and Mellor-Yamada-Janjic (MYJ)/Eta similarity (Janjic 1994) schemes. Three microphysics schemes are employed: WRF Single Moment 3-class (WSM 3), WSM 5 (Hong et al. 2004) and WRF Double Moment 5-class (WDM 5) (Lim and Hong 2010), and three shortwave/longwave radiation combinations are simulated: Dudhia/Rapid Radiative Transfer (RRTM) model (Dudhia 1989; Mlawer et al. 1997), Community Atmosphere Model (CAM)/CAM (Collins et al. 2004) and the global version of RRTM: RRTMG/RRTMG. The run number of each combination is shown in Table 1.

Table 1 Physics options used in the ensemble runs. The cumulus scheme (CU), planetary boundary layer (PBL), microphysics (MP) and radiation schemes (RAD) are shown
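The 36 combinations follow from 2 cumulus × 2 PBL × 3 microphysics × 3 radiation options. The following minimal sketch enumerates them. The ordering of the four cumulus-PBL blocks follows the run ranges given in the caption of Fig. 5, but the ordering of the microphysics and radiation options inside each block is an assumption made for illustration and may not match Table 1. The function name `enumerate_runs` and the short scheme labels are hypothetical.

```python
from itertools import product

CUMULUS = ["KF", "BMJ"]
PBL = ["YSU", "MYJ"]
MICROPHYSICS = ["WSM3", "WSM5", "WDM5"]
RADIATION = ["Dudhia/RRTM", "CAM/CAM", "RRTMG/RRTMG"]


def enumerate_runs():
    """Return {run_label: (cu, pbl, mp, rad)} for all 36 combinations.

    Block order (KF-YSU, BMJ-YSU, KF-MYJ, BMJ-MYJ) follows the Fig. 5 caption.
    The order of (mp, rad) inside each block is ASSUMED, not taken from Table 1.
    """
    runs = {}
    number = 1
    # PBL varies slowest, then cumulus, giving the four blocks of nine runs.
    for pbl, cu in product(PBL, CUMULUS):
        for mp, rad in product(MICROPHYSICS, RADIATION):
            runs[f"N{number}"] = (cu, pbl, mp, rad)
            number += 1
    return runs


runs = enumerate_runs()
```

With this layout, `runs["N2"]` falls in the KF-YSU block and `runs["N28"]` in the BMJ-MYJ block, consistent with how those runs are described in the Results.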

The initial and boundary conditions to the WRF simulations are provided by the ERA-Interim reanalysis (Dee et al. 2011). The outer domain employs spectral nudging of the wind and geopotential fields in the upper atmosphere. One way nesting is used to set the inner domain boundary conditions using the spectrally nudged outer domain fields.

2.3 Precipitation data

The WRF simulations are compared with measures of observed extreme precipitation totals during the Newy event, and this is undertaken for daily and hourly totals. For both analyses, we attempt to consider uncertainty encompassed in the gridded observed data; a process described in the following sections. We now discuss the precipitation data used in this work and detail how the hourly precipitation totals are derived.

2.3.1 AWAP daily data

For the daily precipitation analysis, we use the Australian Water Availability Project (AWAP) dataset (Jones et al. 2009). AWAP uses a climatological anomaly based interpolation employing daily station data from across Australia and has been found to be suitable for studies of extremes despite some underestimation (King et al. 2012). We use AWAP data to perform comparisons with WRF from 9.00 a.m. EST 02 June 2007 to 9.00 a.m. EST 15 June 2007 (EST is 10 h ahead of UTC). To estimate the errors in the interpolated AWAP precipitation fields, we employ the root mean square (RMS) error product accompanying the standard interpolation. In the AWAP interpolation, the size of the RMS error is positively correlated with the precipitation totals, so the AWAP precipitation error is largest where the extreme totals occur. This necessitates a careful consideration of errors in AWAP for extreme precipitation. To estimate the errors, a Monte Carlo bootstrap procedure is used over the AWAP RMS error product. The bootstrap is performed by assuming each precipitation value in AWAP has an associated Gaussian error distribution with standard deviation approximated by the RMS error at that point. We bootstrap AWAP on its native grid before interpolating to the WRF grid (whose grid spacing is about 1/2 that of AWAP) for model-observation comparison. We find about 200 bootstraps are sufficient to estimate the error.
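The resampling step described above can be sketched as follows: each AWAP value is perturbed with Gaussian noise whose standard deviation is the RMS error at that point, and a statistic of interest is recomputed for every realisation. This is a minimal illustration, not the authors' code. The function names, the toy values, and the clipping of negative draws to zero are all assumptions.

```python
import random
import statistics


def perturb_field(precip, rms_error, rng):
    """Draw one realisation: value + Gaussian noise with sigma = RMS error."""
    # Negative totals are unphysical, so draws are clipped at zero (ASSUMPTION).
    return [max(0.0, rng.gauss(p, s)) for p, s in zip(precip, rms_error)]


def monte_carlo_spread(precip, rms_error, statistic, n_samples=200, seed=0):
    """Return (mean, standard deviation) of `statistic` over the realisations."""
    rng = random.Random(seed)
    values = [
        statistic(perturb_field(precip, rms_error, rng))
        for _ in range(n_samples)
    ]
    return statistics.mean(values), statistics.stdev(values)


# Toy daily totals (mm) and RMS errors (mm); illustrative numbers only.
awap = [2.0, 15.0, 80.0, 210.0]
rms = [1.0, 4.0, 15.0, 40.0]
mean_max, sigma_max = monte_carlo_spread(awap, rms, max)
```

Multiplying the returned standard deviation by three gives an envelope of the kind used for the ±3σ curves, under the assumption that the statistic is approximately Gaussian across realisations.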

2.3.2 Hourly precipitation data

In addition to a model-observation evaluation on daily timescales, we also investigate the model on the sub-daily timescale using hourly precipitation. Hourly precipitation data provides much greater temporal sensitivity than the daily AWAP product, and this allows us to understand the influence of the various physics options on precipitation over a range of time scales. To assess the agreement with observations, and the influence of the physics parameterizations on precipitation extremes, we undertake a grid point analysis and construct a simple hourly interpolation field using inverse distance interpolation.

The hourly precipitation data consists of data from Australia’s pluviograph network. We use 88 Australian Bureau of Meteorology (BOM) stations with hourly totals within a 500 km radius of Newcastle from 9.00 a.m. EST 01 to 15 June 2007, shown in Fig. 3. All daily pluviograph station totals have been verified against the published daily totals for the same station and for different (daily only) stations within 40 km using BOM’s climate data online service. Only one pluviograph was excluded, as it showed erroneous readings compared to the daily totals after the storm peak. The temporal coverage of all stations during the aforementioned period is not continuous and some stations drop out after the event, with some of these being in the Hunter Valley where substantial flooding occurred.

Fig. 3 Station locations with hourly rain gauge data. These stations provide the basis for the hourly precipitation analysis. The land mask for the hourly interpolation encompasses all stations and a circle of radius 1° surrounding each station. The eastern Australian states are marked, along with Newcastle and Sydney for reference

Our approach to the hourly distribution analysis consists of undertaking a grid point analysis and an interpolated hourly precipitation analysis. We undertake both approaches to probe the robustness of our results for hourly totals. The grid point analysis involves comparing the observations and model at (1) the model point nearest to the station and (2) all model points within 0.25° of the nearest model point to the station.
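The two grid point selections can be sketched as below. This is an illustrative sketch under simplifying assumptions: distances are plain Euclidean distances in latitude-longitude degrees (no map projection or cos-latitude correction), the function names are hypothetical, and the regular 0.1° toy grid is not the WRF grid.

```python
import math


def nearest_point(station, grid_points):
    """Index of the grid point closest to the station; points are (lat, lon)."""
    return min(
        range(len(grid_points)),
        key=lambda k: math.dist(station, grid_points[k]),
    )


def neighbourhood(station, grid_points, radius_deg=0.25):
    """Indices of all grid points within radius_deg of the nearest grid point."""
    centre = grid_points[nearest_point(station, grid_points)]
    return [
        k for k, point in enumerate(grid_points)
        if math.dist(centre, point) <= radius_deg
    ]


# Hypothetical regular 0.1 degree grid around Newcastle, for illustration only.
lats = [-33.4 + 0.1 * i for i in range(11)]
lons = [151.25 + 0.1 * j for j in range(11)]
grid = [(lat, lon) for lat in lats for lon in lons]

newcastle = (-32.9167, 151.75)
nearest = nearest_point(newcastle, grid)    # option (1): single nearest point
nearby = neighbourhood(newcastle, grid)     # option (2): points within 0.25 deg
```

Option (2) is centred on the nearest model point rather than on the station itself, matching the wording of the text.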

For the interpolation, we first conduct a variogram analysis of the 88 stations. This shows that the decorrelation length of the hourly totals is ∼0.5°. As we discuss below, a cut-off of 1.0° is favoured in the inverse distance interpolation. This introduces some uncertainty in the interpolation procedure, which we quantify below by considering the parametric errors in the interpolation procedure. Comparison across the grid point and interpolation methods shows our results for the hourly precipitation distributions are consistent (see Results Section 3.2). We also note that Newy was a heavy precipitation winter event that was widespread, large scale and very fast moving on hourly time scales, meaning that the precipitation interpolation method should perform much better in this situation than in a summertime convective situation. Since we are only interested in the statistics (precipitation distribution and totals) of the hourly precipitation totals, and we find broad consistency with AWAP and the other hourly comparisons, we judge this interpolation method to be satisfactory.

2.3.3 Interpolation of hourly station data

To interpolate the hourly station data for comparison with the precipitation produced by the WRF model, we use an inverse distance interpolation procedure. This allows us to estimate the hourly precipitation totals P between the station locations. The inverse distance interpolation is given by

$$ P(\mathbf{x},t)=\left( \sum\limits_{i=1}^{n}D^{-\alpha}_{i}\right)^{-1}\left( \sum\limits_{i=1}^{n}\frac{p(t)_{i}}{D^{\alpha}_{i}}\right) $$
(1)

where \(p(t)_{i}\) is the \(i\)th station precipitation time series, \(D_{i}=\|\mathbf{x}-\mathbf{y}_{i}\|\) is the distance between the station location \(\mathbf{y}_{i}\) and the current grid point \(\mathbf{x}\), and α is the weighting exponent of the inverse distance interpolation. The sum is performed over the \(n\) stations within the distance \(D_{i}<D_{\max}\), where \(D_{\max}\) is the threshold beyond which stations are not included for estimating the precipitation at the current grid point \(\mathbf{x}\). Typically, \(n\) ranges from 1 to 10 in the interpolation depending on station density. The two parameters of the interpolation method, α and \(D_{\max}\), are unknown a priori and must be chosen to give the best interpolation.
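A minimal sketch of inverse distance interpolation at a single grid point and time is given below, using the standard normalised form in which the weights \(D_{i}^{-\alpha}\) sum to one. The function name `idw`, the Euclidean distance in degrees, the handling of a grid point that coincides with a station, and the `None` return when no station lies inside the cut-off are assumptions for illustration. The default parameter values are the best-fit values reported later in this section.

```python
import math


def idw(x, stations, values, alpha=2.2, d_max=1.0):
    """Inverse distance estimate at point x from stations closer than d_max.

    x and each station are (lat, lon) tuples in degrees; values are the
    station precipitation totals for one hour. Returns None when no station
    lies inside the cut-off radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for y, p in zip(stations, values):
        d = math.dist(x, y)
        if d == 0.0:
            return p  # grid point coincides with a station (ASSUMED handling)
        if d < d_max:
            w = d ** (-alpha)
            weighted_sum += w * p
            weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else None


# Two hypothetical stations equidistant from the target point.
estimate = idw((0.0, 0.0), [(0.0, 0.5), (0.0, -0.5)], [10.0, 20.0])
```

Because both toy stations are equally distant, they receive equal weight and the estimate is the simple mean of the two station values.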

To determine the values of α and \(D_{\max}\) for the hourly interpolation, the daily AWAP product is used to constrain the hourly interpolated product. Using the AWAP daily product to estimate the values of α and \(D_{\max}\) represents a “matching” approach across time scales, since we are selecting α and \(D_{\max}\) such that the daily AWAP totals are optimally reproduced when summing the hourly interpolation fields over the interpolation grid. This approach allows us to incorporate the precipitation totals from the entire daily station network (which are used in the AWAP construction), which contains many more stations than the pluviograph station network with hourly totals within 500 km of Newcastle.

The approach used to constrain the interpolation parameters is based on a \(\chi^{2}\) measure using AWAP as a ‘true’ reconstruction of the daily precipitation. The hourly interpolation is summed from 9.00 a.m. to 9.00 a.m. the next day and compared with the corresponding AWAP value. The difference between the daily AWAP value and summed hourly interpolation is weighted by the corresponding RMS product from AWAP at each grid point. The measure can be written schematically as

$$ \chi^{2}= \sum\limits_{m,j}\left( \frac{P^{m,j}_{\text{AWAP}}-P^{m,j}_{\text{Hourly}}}{P^{m,j}_{\text{AWAP,RMS}}}\right)^{2} $$
(2)

where the sum is taken over all days \(m\) and grid points \(j\) with daily precipitation values falling within the land mask from the hourly interpolation procedure. Here, \(P_{\text{AWAP}}\) are the daily AWAP totals, \(P_{\text{Hourly}}\) are the summed daily totals from the hourly interpolation and \(P_{\text{AWAP,RMS}}\) is the daily RMS error from the AWAP interpolation. We also restrict the sum to precipitation values \(P^{m,j}_{\text {AWAP}}>2.0\,\text {mm}\,\text {day}^{-1}\) to avoid weighting drizzle in the parameter estimation.
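The measure in Eq. (2) can be sketched as a short function over flattened (day, grid point) samples that already lie inside the land mask. The function name and the guard against zero RMS error are assumptions for illustration; the 2.0 mm day\(^{-1}\) threshold is the one stated above.

```python
def chi_squared(awap, hourly_sum, awap_rms, threshold=2.0):
    """RMS-weighted sum of squared differences over (day, grid point) samples.

    awap       : daily AWAP totals (mm/day)
    hourly_sum : hourly interpolation summed to daily totals (mm/day)
    awap_rms   : AWAP RMS error at the same samples (mm/day)
    """
    total = 0.0
    for p_awap, p_hourly, p_rms in zip(awap, hourly_sum, awap_rms):
        # Skip drizzle as in the text; skipping zero RMS error is an ASSUMPTION
        # added to avoid division by zero.
        if p_awap > threshold and p_rms > 0.0:
            total += ((p_awap - p_hourly) / p_rms) ** 2
    return total


# Toy samples: the middle one is below the drizzle threshold and is ignored.
value = chi_squared([10.0, 1.0, 50.0], [8.0, 5.0, 60.0], [2.0, 1.0, 5.0])
```

In this toy case the two retained samples contribute \((2/2)^{2}\) and \((-10/5)^{2}\), so the result is 1 + 4.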

To determine the interpolation parameters α and \(D_{\max}\), we minimize the \(\chi^{2}\) by standard optimization techniques. The search range used is α = 0.5 to 5.0 and \(D_{\max}\) = 0.25° to 2.0°. We find that the parameters minimizing the \(\chi^{2}\) are (α, \(D_{\max}\)) = (2.2, 1.0°), and these values are used here. Figure 4 shows the daily precipitation from AWAP versus summed daily precipitation totals from the hourly interpolation at the same grid point. Good agreement is seen over the whole precipitation range, with both drizzle and extremes >200 mm day\(^{-1}\) showing satisfactory agreement. Due to the difference between the best-fit cut-off of 1.0° and the variogram decorrelation length of ∼0.5° for the station data, we also estimate the parametric errors in this reconstruction. We examine the \(\chi^{2}\) at the 3σ limit for the α parameter and provide these curves as the 3σ errors in the Results section for the interpolated quantities. It is difficult to estimate the exact bias of the interpolation for observational data; however, given the rarity of high-percentile extremes, we would expect the interpolation to underestimate the true precipitation extremes.
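The text does not specify which optimization technique was used. As one simple possibility, a brute-force grid search over the quoted parameter ranges could look like the sketch below. The step sizes, the function names and the toy objective are all assumptions. The toy objective is a stand-in deliberately constructed to have its minimum at the reported best-fit values, so it demonstrates the search mechanics only and is not evidence for those values.

```python
def grid_search(objective, alphas, d_maxes):
    """Return (best_value, alpha, d_max) minimising objective over the grid."""
    return min((objective(a, d), a, d) for a in alphas for d in d_maxes)


# Search ranges quoted in the text, sampled at ASSUMED steps of 0.1 and 0.25.
alphas = [round(0.5 + 0.1 * k, 1) for k in range(46)]     # 0.5 ... 5.0
d_maxes = [round(0.25 + 0.25 * k, 2) for k in range(8)]   # 0.25 ... 2.0


def toy_objective(alpha, d_max):
    """Hypothetical smooth stand-in for the real chi-squared surface."""
    return (alpha - 2.2) ** 2 + (d_max - 1.0) ** 2


best_value, best_alpha, best_d_max = grid_search(toy_objective, alphas, d_maxes)
```

In practice `objective` would rebuild the hourly interpolation for each (α, \(D_{\max}\)) pair, sum it to daily totals and evaluate the \(\chi^{2}\) against AWAP, which is far more expensive than the toy function used here.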

Fig. 4 Daily precipitation for AWAP versus the summed daily totals for the hourly precipitation interpolation. All precipitation values contained in the station land mask are shown as well as a one-to-one line. This comparison takes place on the AWAP grid, before interpolation to the grid of the Newcastle WRF simulations. Good agreement is seen for the entire precipitation range

3 Results

Newy generated substantial precipitation from 7–9 June 2007, which led to flooding in the Hunter Valley and around Newcastle. To understand how WRF simulated this event, we examine both the daily and hourly precipitation measures using the datasets discussed above. Using the daily AWAP data, we first consider totals, spatial and distributional properties as well as event timing and structure. Similar characteristics are then assessed for hourly totals.

3.1 Daily evaluation

3.1.1 Total areal precipitation

The total precipitation produced over land from 9 a.m. EST 02 to 15 June 2007 in the inner domain provides a general measure of the precipitation activity of the storm (Fig. 5). Compared to the AWAP precipitation of \(5.19\times 10^{7}\) mm, the total amount of precipitation is overestimated by the model across all physics runs except N32, which has a 15 % deficit in total precipitation. The excess is on average 50 %, with a maximum overestimation of 82 % by N9. The choice of boundary layer and cumulus scheme has the largest effect, with average precipitation excesses of 69, 46, 70 and 19 % for the KF-YSU, BMJ-YSU, KF-MYJ and BMJ-MYJ combinations, respectively. Ensemble averages over the radiation and microphysics schemes show only small changes in the precipitation excesses between different schemes. This indicates that the cumulus and boundary layer schemes play an important role in determining the total precipitation making landfall. We find the effect of the cumulus and boundary layer schemes to be important for most quantities considered below and will show averages for the nine runs belonging to each cumulus and boundary layer combination, in addition to the behaviour of the full ensemble.
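The percentages above are relative differences from the AWAP total, averaged within each cumulus-PBL block of nine runs. A minimal sketch of that bookkeeping follows; the function names are hypothetical, and the mapping of run numbers to blocks is taken from the run ranges given in the Fig. 5 caption. The final lines are a simple consistency check using only the group averages quoted in the text.

```python
def percent_excess(model_total, observed_total):
    """Relative difference of a model total from the observed total, in %."""
    return 100.0 * (model_total - observed_total) / observed_total


def group_of(run_number):
    """Cumulus-PBL block for run numbers 1..36 (nine runs per block)."""
    groups = ["KF-YSU", "BMJ-YSU", "KF-MYJ", "BMJ-MYJ"]
    return groups[(run_number - 1) // 9]


def group_means(excess_by_run):
    """Average the percent excess over the runs in each cumulus-PBL block."""
    sums, counts = {}, {}
    for run, value in excess_by_run.items():
        g = group_of(run)
        sums[g] = sums.get(g, 0.0) + value
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


# Group averages quoted in the text (percent). The groups have equal size,
# so their mean equals the mean over the full 36-member ensemble.
quoted = {"KF-YSU": 69, "BMJ-YSU": 46, "KF-MYJ": 70, "BMJ-MYJ": 19}
ensemble_mean = sum(quoted.values()) / len(quoted)
```

The mean of the four quoted group averages is 51 %, in line with the stated ensemble-average excess of about 50 %.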

Fig. 5 Total land-based precipitation relative to AWAP. Runs are numbered N1 through N36 and color-coded. The runs can be split into the cumulus-boundary layer scheme combinations: KF-YSU (N1-9), BMJ-YSU (N10-18), KF-MYJ (N19-27) and BMJ-MYJ (N28-36). The averages of these combinations are also shown with color-coding: KF-YSU red, KF-MYJ dashed dark red, BMJ-YSU blue, BMJ-MYJ dashed dark blue. The same colours are used throughout for the individual runs and the cumulus-boundary layer ensemble averages

3.1.2 Spatial measures

We now examine the spatial precipitation properties of the simulations. Figure 6 shows total WRF precipitation from 9.00 a.m. EST 02 to 15 June 2007 for both grid resolved and parameterized precipitation. Figure 6 supports our statements above regarding total precipitation, with the KF scheme (N1-9 and N19-27) generally producing larger simulated precipitation totals across the domain than the BMJ cumulus scheme (N10-18 and N28-36). Even though the cumulus scheme influences total rainfall, analysis of the WRF runs here shows that for each specific run, total rainfall predominantly comes from the microphysical scheme at the grid spacing used in simulations of this event. For example, the cumulus scheme precipitation can only achieve rates of ∼10 mm hr\(^{-1}\), whereas microphysical rates can be as large as ∼100 mm hr\(^{-1}\).

Fig. 6 Total precipitation from 1 to 15 June 2007 from AWAP and all 36 simulations. Each panel is labelled according to the run number. Excluding AWAP, the first and last three columns are the KF and BMJ cumulus schemes, respectively. The first and last three rows are the YSU and MYJ boundary layer schemes, respectively

To understand Fig. 6 more thoroughly, we decompose precipitation by latitude and coastal distance. Figure 7a shows total land only precipitation converted to a constant latitude projection and then summed over the storm. This shows that Newy produced a substantial precipitation peak at ∼32° S, with ∼5 and ∼3.5 times less precipitation south and north of the main peak, respectively. The WRF simulations have a peak in precipitation at ∼32° S in good agreement with the observation, except for some runs which show a more northerly position of the peak precipitation. The size of the peak is also well captured, with the largest model-observation discrepancy being less than 50 %.

Fig. 7 Spatial precipitation from WRF and AWAP. Top row: Total precipitation as a function of latitude for (a) all model configurations and (b) averages for each combination of cumulus and boundary layer scheme. Bottom row: Total precipitation over the domain as a function of distance from the coastline for (c) all model runs and (d) scheme combination averages. AWAP is the black curve in all panels. The run colours and ensemble averages are the same as Fig. 5. In this figure, we have used a 0.125° lat-lon regridding

Mixed results are found on either side of the main peak in Fig. 7a. South of 32° S, there is good agreement between model and observed data with no systematic over or underestimation of precipitation. North of 32° S, there is a clear excess of precipitation in the model, with the average excess of all physics runs from 30° S to 24° S being twice the AWAP precipitation, even though AWAP has good spatial sampling of daily stations in this region. The precipitation north of the storm peak occurs before the main event in both WRF and the observations, and this indicates greater precipitation activity before the storm peak over southern Queensland in WRF relative to AWAP. Figure 7b shows the cumulus and boundary layer scheme ensembles as an average over each of its nine members. The storm peak is generally well captured by all combinations. An exception is the simulations using the BMJ-MYJ combination. For these runs, the ensemble average shows a northward shift of the peak, and an overall smoothed appearance due to the variability in peak location. The precipitation to the south of the main peak agrees well with AWAP in all cases. To the north, all four averages overestimate the total precipitation, indicating a shared origin of this effect in the simulations.

This storm produced extreme totals localized near the coast and we would like to know if the WRF ensemble adequately simulates this coastal precipitation. We examined precipitation as a function of linear westward distance from the eastern coast of Australia (Fig. 7c) using the constant latitude projection. Total coastal WRF precipitation ranges from \(10\times 10^{3}\) mm (N32) to \(25\times 10^{3}\) mm (N9) over the 36 different model physics runs. The AWAP observations indicate \(14.9\times 10^{3}\) mm, clearly falling within the lower end of the models. The cumulus and the boundary layer schemes have the largest influence on the coastal precipitation (Fig. 7d). Compared to AWAP, the BMJ scheme produces coastal precipitation that more closely mirrors the observations. The PBL influence is not as pronounced as that of the cumulus scheme, but the YSU scheme does produce higher coastal precipitation than MYJ. Of interest are the differences in the precipitation less than 40 km from the coast for the four cumulus and boundary layer combinations (Fig. 7d). The KF cumulus scheme has more than twice the precipitation compared to the BMJ scheme, whereas inland this difference is considerably reduced.

3.1.3 Precipitation structure and timing

Figure 8 shows the daily precipitation totals around storm peak starting from 9.00 a.m. EST 05 to 10 June 2007. The two model runs shown are those with the largest daily extreme total over land (∼400 mm, model N2) and the smallest daily extreme precipitation total (∼160 mm, model N28), along with AWAP. Clearly, the storm in WRF is late by ∼1 day in both the maximal and minimal extreme event simulations. This delay is common amongst all simulations and is discussed later. Even though the events are delayed, both N2 and N28 have strong qualitative agreement in precipitation structure. For example, both have similar precipitation totals in Queensland on 06–07 June around 24° S, becoming coastal precipitation at about the same latitude on 07–08 June, then moving down to 30° S on 08–09 June, and finally both events on 09–10 June produce substantial precipitation around Newcastle at 32° S. Throughout this development, the precipitation in the Tasman Sea is also similar. One clear distinction occurs for extreme precipitation totals greater than 200 mm day\(^{-1}\), which are commonly seen in N2 with the KF-YSU scheme. In N28 with the BMJ-MYJ scheme, by contrast, these “cells” of extreme precipitation greater than 200 mm day\(^{-1}\) are absent. This shows that for this event and WRF configuration, the KF scheme promotes localized precipitation extremes relative to BMJ. We also note that the cumulus scheme produces only a minor component of the rain in these “cells”, with the microphysical scheme producing the majority. At least for this event and WRF configuration, this indicates that KF is more efficient at promoting large-scale ascent of moisture, where the microphysics can then convert the moisture into surface rainfall.

Fig. 8 Comparison of daily precipitation from AWAP vs. the most extreme simulations. The rows show daily totals starting from 9.00 a.m. 05 June 2007. For example, panels (a), (g) and (m) show the daily totals from 9.00 a.m. 5 June to 9.00 a.m. 6 June EST. Panels (a)–(f) show the land only AWAP daily totals with the ocean shaded, and for the simulations, (g)–(l) the maximal daily precipitation extreme run N2, and (m)–(r) the minimal daily precipitation extreme run N28

3.1.4 Event precipitation distributions

We now examine daily precipitation distributions for the entire period over land. Distributional measures can be useful to identify certain precipitation characteristics, enabling a quantifiable comparison of precipitation events (Karl and Knight 1998; Adler et al. 2000; Lonfat et al. 2004). Here, we use metrics of precipitation distributions to tell us if the overall precipitation extremes are reproduced by the different physics choices. We are aware that by using different physics combinations, storm dynamics will be slightly different for each run and this will play a role in determining the extremes. However, we would like to know if the land-based precipitation distribution from the Newcastle event could be reasonably modelled by the different physics combinations, and when evaluated with other measures, such as spatial properties, timing and totals, this provides valuable information. This situation has direct applicability to regional climate downscaling, where limited computing resources are typically allocated for long simulations at small grid spacing rather than simulating a large range of different physics combinations as undertaken for Newy here (Fita et al. 2010; Liang et al. 2012).

Figure 9a shows the daily precipitation distribution for the model (coloured dots) and AWAP (large black dots) along with ±3σ (gray lines) from the AWAP Monte Carlo discussed in the Methods section. All physics combinations run with WRF reproduce the overall distribution shape surprisingly well, especially the extreme daily totals. Closer examination of Fig. 9a shows the simulations produce an excess over the AWAP distribution in the intermediate precipitation range, 40–140 mm day\(^{-1}\). This excess results from the additional precipitation over southern Queensland before the event maximum, as discussed above. The number of extreme events from 150–250 mm day\(^{-1}\) is well reproduced. A number of simulations produce daily extremes from 300–400 mm day\(^{-1}\) that do not appear to agree with the AWAP dataset. The four runs with the most extreme precipitation values are N2, N4, N12 and N7, all belonging to the KF-YSU simulations except N12, which is a BMJ-YSU simulation. The simulations with the lowest precipitation extremes are N28, N26, N34 and N10, with the first three runs belonging to the BMJ-MYJ simulations. We also find similar behaviour amongst the schemes in the other events from Ji et al. (2014). The daily drizzle distributions from the model agree with AWAP in all cases (not shown).

Fig. 9

Daily precipitation distribution and maximum daily totals. AWAP is compared to all model runs (left column) and to averages for each combination of cumulus and boundary layer scheme (right column). Panels (a)–(b) show the daily precipitation distributions and panels (c)–(d) show the maximum daily totals versus simulation day. The black dots are the mean of the AWAP Monte Carlo and the gray lines show the AWAP 3σ errors. The run colours and ensemble averages are the same as in Fig. 5

3.1.5 Maximum daily totals

We finish the analysis of the daily precipitation extremes by examining the maximum precipitation over land as a function of simulation day. The time series of maximum daily totals over land are shown in Fig. 9c,d. Clearly, the model is 1 day late, but all simulations estimate the peak maximum and shape of the AWAP observations well. Before the main event, good agreement is seen; after the event, however, the models generally overestimate the maximum daily precipitation. At storm peak, the boundary layer scheme, rather than the cumulus scheme, produces the largest differences between the models, with the YSU scheme showing higher daily totals than MYJ. We note that total rainfall over the land in the inner domain was shown in Fig. 3a of Evans et al. (2012) and, when ensemble averaged, this shows similar results to the maximum daily totals; that is, KF-YSU and BMJ-MYJ have the highest and smallest precipitation amounts at storm peak, respectively.
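As a minimal sketch of how such a time series might be built, the following assumes daily totals stored as a (day, y, x) array and a boolean land mask on the same grid. The array layout, function name and values are assumptions made for illustration.

```python
import numpy as np


def max_daily_total_over_land(daily_precip, land_mask):
    """Largest daily total (mm/day) over land for each simulation day.

    daily_precip: array of shape (n_days, ny, nx)
    land_mask:    boolean array of shape (ny, nx), True over land
    """
    daily_precip = np.asarray(daily_precip, dtype=float)
    land = np.asarray(land_mask, dtype=bool)
    # Boolean indexing on the two spatial axes keeps only land points,
    # giving an array of shape (n_days, n_land_points).
    return daily_precip[:, land].max(axis=1)


# Synthetic two-day example; the 200 mm/day value lies over water and is ignored.
precip = np.array([[[1.0, 200.0], [3.0, 4.0]],
                   [[5.0, 6.0], [70.0, 300.0]]])
land = np.array([[True, False], [True, True]])

series = max_daily_total_over_land(precip, land)
peak_day = int(np.argmax(series))
print(series, peak_day)
```

Comparing `peak_day` between a model run and the observations gives a simple view of a timing offset at daily resolution.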

3.2 Hourly evaluation

Evaluating precipitation in WRF on an hourly basis provides greater temporal sensitivity than the daily totals discussed previously. The hourly evaluation consists of two parts, a comparison of (1) the hourly distribution of precipitation and (2) the total rain falling around Newcastle. For the precipitation distributions, we undertake a grid point and an interpolated precipitation comparison. This allows us to demonstrate a robust comparison for hourly totals. For the total hourly precipitation falling around Newcastle, we use only the interpolated product since its totals are well constrained by AWAP.

3.2.1 Hourly precipitation distribution

The hourly precipitation distribution over land is shown in Fig. 10 for both the grid point and interpolation comparisons. As explained in the data section, the three comparisons used are (1) station locations only (Fig. 10a,b), (2) including all model points within 0.25 of the station locations (Fig. 10c,d) and (3) the interpolated hourly precipitation including all model points in the interpolation land mask (Fig. 10e,f). Note that counts can be less than one because of the ensemble normalization.
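The following minimal sketch shows why ensemble-averaged counts can fall below one: when per-run histograms are averaged, a bin populated in only some runs receives a fractional count. The bin edges, run values and function name are hypothetical.

```python
import numpy as np


def ensemble_mean_histogram(runs, bin_edges):
    """Average the per-bin counts over a list of runs.

    runs: list of 1-D arrays of hourly totals (mm/hr), one array per run
    """
    counts = np.array(
        [np.histogram(run, bins=bin_edges)[0] for run in runs], dtype=float
    )
    return counts.mean(axis=0)


# Four synthetic runs; only one of them reaches the highest bin.
bin_edges = np.array([0.0, 10.0, 20.0, 40.0])
runs = [
    np.array([1.0, 2.0, 15.0]),
    np.array([3.0, 12.0, 25.0]),
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0]),
]

mean_counts = ensemble_mean_histogram(runs, bin_edges)
print(mean_counts)
```

Here the highest bin has an average count of 0.25 because a single event is shared across four runs.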

Fig. 10

Model hourly precipitation distributions compared to the hourly precipitation station data and interpolation. Top row: Station only model distributions for (a) hourly precipitation distribution for all runs and (b) the cumulus scheme boundary layer averages. Middle row: Including all model grid points within 0.25 of the stations for (c) all runs and (d) the cumulus scheme boundary layer averages. Last row: Interpolated station data versus all model grid points within the interpolation land mask for (e) all runs and (f) the cumulus scheme boundary layer averages. The run colours and ensemble averages are the same as in Fig. 5. The hourly data in each row are shown by the black curve. Model counts can go below 1 because of the ensemble and count averages used

For the grid point comparisons, the drizzle and intermediate part of the distributions are well modelled by WRF. However, it is evident in the ensemble averages (Fig. 10b,d) that the hourly data favours the KF cumulus scheme curves for the highest precipitation totals \(P\gtrsim 20\,\text {mm}\,\text {hr}^{-1}\). This indicates that the BMJ cumulus scheme does not produce high enough hourly precipitation totals in this event. To test the robustness of this relationship, we also cut the domain in two at Newcastle’s latitude and find the same preference for the larger precipitation extremes from the KF cumulus scheme.

For the interpolated product, reasonable agreement is seen across the models in Fig. 10e, particularly for the intermediate hourly totals from \(P\sim 20\) to \(30\,\text {mm}\,\text {hr}^{-1}\). All physics choices overestimate the total number of counts for \(P<20\,\text {mm}\,\text {hr}^{-1}\). Some models also have substantially greater hourly extremes than the interpolated hourly station data; these are runs N4 through to N8, which all belong to the KF-YSU simulations. For the cumulus and boundary layer averages in Fig. 10f, the effects of the physics become evident. Figure 10f shows that the KF cumulus scheme precipitation distribution for \(P\gtrsim 30\,\text {mm}\,\text {hr}^{-1}\) provides a better representation of the observations. This result is in clear agreement with the grid point analysis above and provides additional evidence that the KF cumulus scheme produces extremes that are more realistic. It is also clear that the KF scheme generates stronger hourly extremes than the BMJ scheme. For the hourly totals, this results in a clear separation of the cumulus scheme precipitation distributions that was not present in the daily data (compare Figs. 9b and 10f). The boundary layer also plays a role here, with the YSU scheme having larger extremes than the MYJ scheme; however, the effect is less important relative to the cumulus scheme. Since the BMJ and KF distributions do not overlap for hourly totals, it is possible to discriminate between the BMJ and KF cumulus schemes using hourly precipitation distributions. We also find that the cumulus and boundary layer schemes in the remaining events from Ji et al. (2014) display similar behaviour for hourly precipitation distributions.

3.2.2 Hourly precipitation totals

We now present the hourly precipitation time series starting at 9.00 a.m. EST 02 June 2007. Figure 11a,b shows total precipitation falling around Newcastle in a number of peaks, with the main event from 5.4 to 7.5 days (black line). A number of smaller events are also visible, with most occurring before the main event at 4.2–5.4 days.

Fig. 11

Total precipitation in the interpolation land mask compared with the WRF runs for each simulation hour. Panel (a) shows all runs and panel (b) the cumulus scheme boundary layer averages. The run colours and ensemble averages are the same as Fig. 5. The hourly interpolation data is the black curve

Inspection of the WRF-produced precipitation shows there is a delay of approximately 12–24 h relative to the actual event. Despite this, the magnitude of the WRF precipitation is generally similar (Fig. 11a), although most models overestimate the amount of precipitation occurring at event peak. For the physics averages (Fig. 11b), the KF cumulus scheme overestimates peak precipitation by ∼30 % on average, whereas the BMJ cumulus scheme is similar to the observed peak. Figure 11b also shows the boundary layer influence, with the YSU scheme promoting more precipitation than MYJ.
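One way a delay of this kind could be quantified is a lagged correlation between the observed and modelled hourly series. This is an illustrative technique only, not necessarily the procedure used to obtain the 12–24 h estimate quoted above; the function name and the synthetic pulses are assumptions, and both series are assumed to have the same length and hourly sampling.

```python
import numpy as np


def best_lag_hours(obs, model, max_lag):
    """Integer lag (hours) that maximises the correlation between obs and model.

    A positive lag means the model series is late relative to the observations.
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = obs[:len(obs) - lag], model[lag:]
        else:
            a, b = obs[-lag:], model[:len(model) + lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Synthetic example: identical rainfall pulses, with the model 18 h late.
hours = np.arange(96)
obs = np.exp(-0.5 * ((hours - 30) / 6.0) ** 2)
model = np.exp(-0.5 * ((hours - 48) / 6.0) ** 2)

lag = best_lag_hours(obs, model, max_lag=36)
print(lag)
```

For real rainfall series with several peaks the correlation maximum can be broad, so a lag obtained this way is best read as approximate.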

3.3 Storm time delay

Both the daily and hourly precipitation analyses (Figs. 6 and 10) showed that the WRF simulations contained a common delay of 12–24 h relative to the actual precipitation observations from the Newy storm. The delay originates in the outer domain because the inner domain uses one-way nesting and has a relatively small size that cannot appreciably influence the synoptic scale processes that formed this event. Evidence for this is provided in Fig. 12, which compares mean sea level pressure snapshots from ERA Interim with the outer domain simulations from the minimal (N28) and maximal (N2) precipitation extreme simulations, as identified in the inner domain. This shows that the delay does originate in the outer domain, as expected. Other runs have similar behaviour. A detailed study of exactly how different physics schemes influence the timing of the Newcastle storm, and its positioning relative to eastern Australia, through simulations in the outer domain is a subject for future work.

Fig. 12

Evidence that the inner domain time delay originates in the outer domain of the simulations. The left, middle and right columns show mean sea level pressure for ERA Interim, maximal run N2, and minimal run N28, respectively. These are shown from 5 to 10 June at 00 UTC. The runs N2 and N28 are depicted here using the outer domain simulations only. Examination of the 8 to 10 June panels shows that the low pressure system off the east coast of Australia in the WRF simulations is systematically late compared to ERA Interim

3.4 Grid scale analysis

Analysis of the WRF simulations showed that the cumulus and boundary layer schemes caused systematic changes in the hourly precipitation distribution. These differences were, however, produced by precipitation from the microphysics scheme rather than from the cumulus schemes (whose maximum hourly precipitation was about 10 times less than that of the microphysics scheme). This implies that the cumulus and boundary layer schemes are modifying the atmosphere to allow greater grid scale microphysical precipitation. We examined grid scale behaviour in the simulations and found that extremes in the distribution of the vertical moisture flux, given by qw at model grid points, where q is specific humidity and w is vertical velocity, were a robust predictor of hourly extremes. Figure 13 shows the land-based distributions of water vapour, vertical velocity and vertical moisture flux. Although each quantity seems to be a good predictor in this case (compare to Fig. 10f), when we split the domain (such as land/water, north/south) only the vertical moisture flux remains a good predictor.
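The vertical moisture flux defined above is the point-wise product qw. A minimal sketch of computing it and summarising the upper tail of its distribution follows; the choice of percentile as the tail measure, the function names and the sample values are assumptions for illustration and are not taken from the simulations.

```python
import numpy as np


def vertical_moisture_flux(q, w):
    """Point-wise product q * w.

    q: specific humidity (kg/kg); w: vertical velocity (m/s), same shape as q
    """
    return np.asarray(q, dtype=float) * np.asarray(w, dtype=float)


def upper_tail(values, percentile=99.0):
    """Percentile of the flattened values, used here as a simple extreme measure."""
    return float(np.percentile(np.ravel(values), percentile))


# Synthetic grid-point values for illustration only.
q = np.array([0.010, 0.012, 0.008, 0.015])
w = np.array([0.5, -0.2, 2.0, 1.0])

flux = vertical_moisture_flux(q, w)
print(flux, upper_tail(flux, 100.0))
```

Applying `upper_tail` to each physics group, and to sub-domains such as land and water separately, would give one way to check whether the ordering of the groups is robust to the area considered.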

Fig. 13

Distribution of (a) water vapour mixing ratio, (b) vertical velocity and (c) vertical moisture flux for ensemble averages of the four cumulus and boundary layer scheme combinations. These distributions are constructed from the atmosphere over land in the simulations. Only the patterns seen in (c) are robust to changes in the area considered. Colours are the ensemble averages from Fig. 5

Further analysis shows that the boundary layer scheme controls the amount of water vapour in the inner domain, with YSU producing about 5 % more specific humidity on average than MYJ (Fig. 14a). In contrast, when only regions with rain are considered (i.e. excluding no rain and light drizzle), the cumulus scheme appears to control differences in column-average specific and relative humidity (Fig. 14b,c). Here, KF has a column-average relative humidity 6 % lower than BMJ. This appears to indicate that the KF scheme is causing vapour to rain out more vigorously, thereby causing stronger grid-scale circulations via the release of latent heat, but also competing with the stronger circulations by reducing water vapour in the raining regions (Wing and Emanuel 2014). If a higher threshold is used for raining regions, i.e. focussing on extreme instantaneous rain rates only, we start to see the microphysics influence the atmospheric conditions. This effect is already present in Fig. 14c, where the WDM 5 scheme can be seen to produce lower column-average relative humidity than WSM 3 and WSM 5 for a given cumulus-boundary layer combination (e.g. compare N1 to N6 and N7 to N9). Detailed analysis of the specific physics influencing water vapour from the boundary/surface layers, cumulus schemes and microphysical parameterizations is left for future work.
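Restricting a diagnostic to raining regions amounts to a conditional average over a rain-rate mask. The sketch below assumes a precomputed column-average field and an instantaneous rain-rate field on the same grid; the threshold values, function name and numbers are hypothetical, since the threshold separating drizzle from rain is not stated here.

```python
import numpy as np


def mean_in_raining_regions(field, rain_rate, threshold):
    """Mean of `field` over grid points where rain_rate exceeds `threshold`.

    Returns NaN if no grid point exceeds the threshold.
    """
    raining = np.asarray(rain_rate, dtype=float) > threshold
    if not raining.any():
        return float("nan")
    return float(np.asarray(field, dtype=float)[raining].mean())


# Synthetic column-average relative humidity (%) and rain rate (mm/hr).
rh = np.array([[90.0, 60.0], [80.0, 50.0]])
rain = np.array([[5.0, 0.0], [2.0, 0.05]])

rh_rain = mean_in_raining_regions(rh, rain, threshold=0.1)     # excludes drizzle
rh_extreme = mean_in_raining_regions(rh, rain, threshold=3.0)  # heavier rain only
print(rh_rain, rh_extreme)
```

Raising the threshold narrows the sample to the most intense rain, which corresponds to the higher-threshold case discussed in the text.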

Fig. 14

Atmospheric conditions over (a) the inner domain (both raining and non-raining) for column water vapour, and within rain forming regions for (b) column water vapour and (c) column average relative humidity. The different runs are coloured by their cumulus-boundary layer group from Fig. 5

4 Discussion

We now discuss how the physics schemes influenced precipitation. For the entire storm period, the BMJ-MYJ combination produced the least biased precipitation totals. For the daily maximum extremes, all models agreed well with observations. However, for the hourly extremes, the different cumulus schemes had distinct behaviours: WRF with the BMJ cumulus scheme clearly underestimated extreme hourly totals, whereas WRF with the KF cumulus scheme had much higher extreme hourly accumulations (Fig. 10b,d,f) in better agreement with the precipitation data. For the hourly precipitation totals around Newcastle, the overall peak was reproduced, but many of the finer features were not seen due to the storm being one day late on average.

The difference in results between the daily and hourly time scale at storm peak provides evidence that the relative impact of cumulus and boundary layer schemes is time-scale dependent. Hourly precipitation extremes are influenced most by the cumulus scheme; however, high hourly totals tend to be compensated by lower totals during the same day. This results in a relatively small differentiation at the daily time scale, as evidenced by the reordering of the extremes in the daily distribution compared to the hourly distribution (Fig. 9b versus Fig. 10). Maintaining high precipitation rates throughout the day is more strongly influenced by the boundary layer scheme. This follows other studies, such as Kendon et al. (2012) who found that bias in hourly precipitation was most associated with deficiencies in the cumulus scheme.

The radiation and the microphysics schemes did not play a substantial role in changing the precipitation distributions when ensemble averages were considered on daily and hourly timescales, although differences were found in atmospheric conditions in raining areas for different microphysics schemes. This result for the microphysics scheme was interesting, since it is generating the majority of the precipitation in the simulations, and past studies indicated precipitation can be sensitive to the microphysics choice (Fiori et al. 2014). One possible reason for this lack of influence on the ensemble average is that the microphysics schemes used (WSM 3, WSM 5 and WDM 5) are all based on the underlying microphysics of Hong et al. (2004) and Lim and Hong (2010). It is also possible that large-scale forcing dominated the convergence and rainout of water vapour on daily and hourly scales in this synoptic scale event.

When compared with the storm dynamics, BMJ-YSU has the shortest time delay compared to the observations, but has a deficit in precipitation extremes for hourly totals. For the KF simulations, although they produce the larger hourly totals seen in the observations, they are ∼24 h late and produce too much precipitation over land. Gallus and Bresch (2006) also found KF produced higher peak precipitation rates compared to BMJ. We note the presence of the delay in storm initiation in the simulations is the major contributor to the low skill scores seen in Evans et al. (2012) for this event.

5 Conclusion

We investigated precipitation extremes produced in the Newy event of 2007 in WRF. This extreme extra-tropical low-pressure system provided an excellent test case for examining long-range extreme rainfall simulations. The long lead-time probes WRF’s ability to simulate extreme events in circumstances that are more typical of regional climate studies. Our simulations used 36 different physics combinations in the WRF model. We examined storm totals, daily and hourly precipitation, and spatial measures of the extreme precipitation. We showed that extreme precipitation could be well modelled in WRF at daily and longer timescales using a long lead-time. We showed that all four combinations of cumulus and boundary layer schemes modelled the observed daily extreme precipitation well, and when ensemble averaged (Fig. 7b), all four combinations fell within the 3 σ range of the observed distributions of daily precipitation extremes from AWAP. Although the range of the 36 runs themselves was large, systematic differences were observed in the ensemble averages for the different cumulus and boundary layer combinations indicating these combinations could potentially be discriminated.

Even though the simulations were designed with regional climate in mind, we have shown that higher frequency sampling of precipitation extremes, such as hourly totals, can reveal differences between cumulus and boundary layer schemes that daily precipitation totals could not. This means the parameterization of cumulus and boundary layer schemes could potentially be improved by examining hourly precipitation totals. Moving to hourly totals for model comparison would be fruitful; however, full spatial coverage of the area of interest would be desirable. This could be achieved, for example, by combining gauge measurements with radar measurements (Haberlandt 2007; Verworn and Haberlandt 2011).

The results presented here have direct implications for regional climate modelling of precipitation. In regional simulations, one usually chooses a set of physics options and then simulates with these over long times, typically for decade-long simulations. If one is interested in extreme precipitation, this approach could lead to large systematic errors in the resulting precipitation extremes for shorter accumulation periods. We showed that the BMJ-MYJ combination produced the best match with the observed precipitation total for the entire storm period; however, it underestimated hourly precipitation extremes substantially. On the other hand, the KF cumulus scheme did produce realistic hourly precipitation extremes but overestimated the total precipitation from the event. Therefore, using a physics combination that provides a good representation of the mean precipitation for a region may not produce realistic extreme precipitation for shorter accumulation periods. This could potentially introduce large biases in an hourly precipitation extreme analysis.

While the microphysics parameterization is responsible for the vast majority of the extreme precipitation in these WRF simulations of the Newy event, we show here that it is the cumulus (and to a lesser extent the PBL) scheme that determines the intensity of this precipitation. That is, the large-scale convergence that generates precipitation through the microphysics parameterization is strongly influenced by changes in the vertical atmospheric profile introduced by the cumulus scheme. Thus, the interaction of these grid point schemes with the larger-scale atmospheric motion is key to producing precipitation extremes for Newy in this WRF configuration.

Based on this work, we provide some recommendations for regional climate modelling simulations where precipitation extremes are of interest. Simulation, or at least testing with multiple physics combinations, is advisable, and such an ensemble has previously been found to provide a larger spread than a perturbed initial condition ensemble (Gallus and Segal 2001; Tapiador et al. 2012). This would allow an objective assessment of precipitation extremes in a given regional climate model experiment. This work shows that if hourly precipitation extremes are relevant to the intended regional climate application, then using select test cases for comparing various scheme configurations against observed hourly precipitation could provide important information about the potential biases introduced by scheme selection in longer regional climate simulations.