1 Introduction

Global Climate Models (GCMs) are indispensable tools for developing knowledge about the climate system. GCMs have been used widely to simulate the large-scale properties of the climate system and its response to different greenhouse-gas (GHG) emission scenarios, from the industrial revolution onwards, in order to project the future climate. One of the main goals of their use is to study climatic impacts through extensive intercomparison exercises (e.g., the Coupled Model Intercomparison Project Phase 5, CMIP5 (Taylor et al. 2012), and more recently the CMIP6 climate model projections (Eyring et al. 2015) that underpin the sixth assessment report of the Intergovernmental Panel on Climate Change). However, the coarse resolution of these models of approximately 80–00 km prevents detailed analysis of climate change at regional and local scales, such as changes in climate extremes, water resources, and various other elements crucial for future planning (e.g., Gutowski et al. 2020). This weak representation affects several physical mechanisms (e.g., convection, clouds and precipitation, heterogeneity of surface fluxes, the planetary boundary layer, and turbulence), particularly in regions characterized by complex topographical features (e.g., Henderson-Sellers et al. 1995; Déqué et al. 2007; Jacob et al. 2014; White et al. 2018; Vergara-Temprado et al. 2020). Comprehensive analysis of regional impacts therefore requires high-resolution climate variables that cannot be obtained directly from coarse-resolution models. In addition, anthropogenically induced changes in regional atmospheric circulation are not easy to detect in global simulations because of high internal variability and a low signal-to-noise ratio (Palmer 2013; Horton et al. 2015; Zhou et al. 2020). Thus, the human-induced dynamic contribution to regional extremes should be assessed by enhancing regional signals with a regional atmospheric model.

In recent years, downscaling methodologies, such as dynamical downscaling using a regional climate model (RCM), have been proposed to produce the much-needed high-resolution climate variables. A recent study on the biases of GCMs and RCMs indicated that RCMs can systematically reduce the biases of the driving GCMs (Sørland et al. 2018). According to Ke et al. (2013), RCMs with high spatial resolution (1) better resolve the physical processes of regional, mesoscale, and local-scale circulation effects (surface fluxes, breezes, convection, and heavy precipitation) and (2) improve the representation of surface characteristics and their spatial variability in regions of complex topography with mountainous features and rough coastlines.

In past years, regional climate simulations for Europe at spatial resolutions of about 50 and 25 km were provided by PRUDENCE (Christensen and Christensen 2007) and ENSEMBLES (Hewitt 2005; van der Linden and Mitchell 2009), respectively. More recently, the EURO-CORDEX (Giorgi et al. 2009) and MED-CORDEX (Ruti et al. 2016; Colmet-Daage et al. 2018) multi-model ensemble projects produced higher-resolution climate simulations at about 12 km (0.11°). The latter two, through meticulous studies on the benefits of increased spatial resolution for RCM skill, have contributed significantly to our understanding of regional climate processes and their response to climate change (Bartók et al. 2017; Knist et al. 2017; Cavicchia et al. 2018; Colmet-Daage et al. 2018; Lhotka et al. 2018; Coppola et al. 2020).

A high RCM resolution can help reduce model uncertainties, better represent topographic effects, and improve precipitation simulations (Sylla et al. 2010; Cardoso et al. 2013; Warrach-Sagi et al. 2013; Warscher et al. 2019; Tian et al. 2020). According to Expósito et al. (2015), this is especially relevant in climate studies on islands with complex orography, where regional models should have a resolution of a few kilometers (Zhang et al. 2009, 2012). For example, Pérez et al. (2014) showed that for the Canary Islands, the model resolution should be at least 5 km to reproduce the observed geographical distribution of temperature and, particularly, of precipitation. Statistical analyses of different daily precipitation indices in ensembles of MED-CORDEX and EURO-CORDEX experiments reported that the 0.11° simulations show remarkable performance in reproducing the spatial patterns and seasonal cycle of mean precipitation over all regions, with a consistent and marked improvement compared to the 0.44° resolution ensemble and the ERA-Interim reanalysis (Prein et al. 2016; Fantini et al. 2018). At sub-daily scales, for a subset of the EURO-CORDEX 0.11° ensemble, Berg et al. (2019) showed that the spatial patterns over Germany were reproduced at least partly at a 12 h duration but not at shorter durations. In a systematic analysis of climate classifications with GCMs and RCMs, Tapiador et al. (2019) concluded that “the modeling of precipitation remains the Achilles' heel of models and thus of multidimensional indices, which are very sensitive to this variable”.

Recent studies, many of them carried out within the framework of the CORDEX project, have focused on the performance of the Advanced Research WRF (ARW) model as an RCM in representing extreme temperature or precipitation events, climate indicators, and drought variability at high spatial resolutions. These studies have indicated improved simulation results, as the WRF model allows users to choose easily among a large number of physical parameterizations (Soares et al. 2012b; Berg et al. 2013; Wagner et al. 2013; Gao et al. 2015; García-Valdecasas Ojeda et al. 2015; Sun et al. 2016; Prein et al. 2017; Hu et al. 2018; Tian et al. 2020). Komurcu et al. (2018) reported that the improvement obtained with higher-resolution dynamical downscaling depends on the region simulated and on the choice of parameterizations and model setup used in the regional model.

The geomorphological complexity of Greece enhances the need for high-resolution climatology studies. More specifically, Greece is located in Southeastern Europe, bordering the Ionian Sea and the Mediterranean Sea, and extends from 35° to 42° N in latitude and from 19° to 28° E in longitude. It is a mostly mountainous country (circa 80% is mountainous, with heights up to 2900 m) with an extended coastline (measuring 15,021 km), encompassing many peninsulas and numerous islands. These topographic features influence the local climate characteristics of each region, producing many different climatic variations across the country (Eleftheriou et al. 2018). As a result, the various climatic characteristics and meteorological parameters can alter the local climate even within a distance of a few kilometers (Spyridi et al. 2015), so that the country presents an inhomogeneous geographical distribution of climatic variables.

In this work, we present the application of the WRF model in a double nesting approach, first to an outer domain with an approximate resolution of 20 km covering the whole of Europe and then to an inner 5 km domain over Greece, dynamically downscaling ERA-Interim reanalysis data for the period 1980–2010. ERA-Interim is one of the most reliable reanalysis datasets and has been used extensively in WRF downscaling studies (e.g., Cardoso et al. 2019; Bieniek et al. 2016). The 5 km resolution describes the Greek territory in significantly higher detail than lower-resolution RCM simulations, e.g., those of the EURO-CORDEX ensemble.

To our knowledge, this is the first time that a comprehensive high-resolution WRF model evaluation effort is presented for this geographical region over a long-term, climatological period. The statistical analysis involves the comparison of the results from WRF-ERA-Interim (hereafter WRF_5) for the high-resolution domain, and of the driving ERA-Interim data, with the observational data available for Greece. This analysis aims (1) to show that our downscaling of the ERA-Interim reanalysis to the Greek area produces results comparable to the available observational products and (2) to highlight the improvement in the downscaled fields (WRF_5) compared to ERA-Interim. This work builds upon our previous studies, which examined the influence of different parameterization schemes, initialization times, and domain resolutions on the simulation ability of the WRF model during different periods, and which encouraged its further evaluation in this long historical climate study. Those studies comprised, first, sensitivity tests with seven different combinations of physics parameterizations for one year (Politi et al. 2017), which examined the performance of the model in simulating surface variables in order to select the five best setups, and then sensitivity tests over a period of 5 years with the selected schemes to arrive at the optimal model configuration (Politi et al. 2020). In the current work, the WRF output obtained with this optimal configuration is used to quantify the performance of the 5 km resolution model in a detailed validation at various spatial and temporal scales for maximum and minimum temperatures (TX and TN) and precipitation (PR). These meteorological variables are commonly employed in climate model validation and are useful for deriving climate indices and for climate change impact assessment studies.

In the following Sect. 2, we present the methodology that includes the model design, the datasets, and the statistical analysis used in this study. Section 3 presents the analysis of the results for the model performance evaluation as well as the added value analysis through the comparison of the model simulations with the observations. Finally, Sect. 4 summarizes the overall results and presents concluding remarks.

2 Materials and methods

  • (a) Model setup and design

    WRF version 3.6.1 was used in this study (Skamarock et al. 2008). The model ran for 30 years, over the period 1980–2010. For each monthly simulation, the last four (4) days of the previous month were regarded as model spin-up and were discarded; thus, the model was re-initialized every month. Re-initialization on a monthly basis was previously investigated in a 1-year sensitivity experiment, in comparison with two other time integration approaches: (1) seasonal simulations initialized every 6 months and (2) continuous integration with a single initialization. Based on the statistical analysis, none of the approaches had a significant impact on the skill of regional dynamical downscaling, and all showed an overall good representation of the daily minimum and maximum temperatures. Regarding precipitation, all simulations displayed similar results, including a systematic overestimation of precipitation that was confirmed by the bias in all experiments. Based on these results, we proceeded with monthly re-initialization of the model runs, as none of the three experiments had a significant impact on the examined variables; moreover, it was computationally more efficient because the monthly runs could be performed in parallel. We also followed this procedure to avoid possible climatic shifts that may result from long-term continuous simulations (Tian et al. 2020). The model configuration was selected, as previously mentioned, after several sensitivity tests for shorter test periods (Politi et al. 2017, 2020). The large-scale meteorological initial and boundary conditions for the WRF simulation were taken from the ERA-Interim reanalysis fields of 0.75° × 0.75° (~ 83 km) horizontal resolution, provided by the European Centre for Medium-Range Weather Forecasts (Dee et al. 2011). The lateral boundary conditions and sea surface temperature were both updated every 6 h from ERA-Interim. According to Dulière et al. (2011), the reanalysis data can be used for the evaluation of regional models, as they sufficiently represent the large-scale forcing necessary for the models to simulate the physical processes and surface interactions. Spectral nudging was also applied to ensure that the model did not deviate significantly from the driving input. It was used above the planetary boundary layer (PBL) and only over the coarse domain, in order to allow the WRF model to create small-scale features in the finer-resolution domain and near the surface. The simulation was nudged using wave numbers 5 and 4 in the x and y directions, respectively, for temperature, winds, and geopotential height (Politi et al. 2017).
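The monthly re-initialization with a four-day spin-up can be illustrated with a short Python sketch. It is only an illustration of the scheduling logic described above, not the scripts used for the simulations; the function name `monthly_run_windows` and the example year are hypothetical.

```python
from datetime import date, timedelta

SPINUP_DAYS = 4  # from the text: last four days of the previous month


def monthly_run_windows(first_year, last_year, spinup_days=SPINUP_DAYS):
    """Return (run_start, analysis_start, run_end) for every calendar month.

    run_start      : start of the simulation, including spin-up
    analysis_start : first day kept for analysis (first of the month)
    run_end        : first day of the following month
    """
    windows = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            analysis_start = date(year, month, 1)
            next_year = year + 1 if month == 12 else year
            next_month = 1 if month == 12 else month + 1
            run_end = date(next_year, next_month, 1)
            run_start = analysis_start - timedelta(days=spinup_days)
            windows.append((run_start, analysis_start, run_end))
    return windows


# Example: the twelve independent runs for a single year.
windows = monthly_run_windows(1980, 1980)
for run_start, analysis_start, run_end in windows[:3]:
    print(run_start, analysis_start, run_end)
```

Because each window is independent of the others, the runs can be submitted in parallel, which is the computational advantage mentioned in the text.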

    The WRF model configuration applied in this study includes two one-way nested domains, with a spatial resolution of 20 km × 20 km in the outermost domain (D01, 265 × 200 grid cells; Fig. 1), centered on the Mediterranean basin, and 5 km × 5 km in the innermost one (d02, 184 × 184 grid cells). Both domains have 40 vertical layers. The model domains share the same physics options for radiation, microphysics, boundary layer scheme, and convection. More specifically, the Mellor–Yamada–Janjić scheme (MYJ) (Mellor and Yamada 1982) was employed, together with the corresponding surface layer (SLP) scheme, which provides the surface fluxes of momentum, moisture, and heat to the planetary boundary layer (PBL) scheme. In this scheme, entrainment develops only from local mixing. Regarding cloud microphysics, the WRF single-moment six-class scheme (WSM-6) was used to simulate six classes of water mass processes: vapour, cloud water, cloud ice, rain, snow, and either graupel or hail (Hong and Lim 2006). The Betts–Miller–Janjić scheme (Janjić 2001) was chosen for the cumulus parameterization, taking into consideration the grey zone between 5 and 10 km in the vertical for the cumulus option. The radiation scheme was set to the newer version of the Rapid Radiative Transfer Model, RRTMG (Iacono et al. 2008), for both longwave and shortwave radiation. Finally, the Noah LSM was employed as the land surface model (LSM), as it is widely used in climate studies (Chen et al. 1996, 2001; Zhang et al. 2009).
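As a rough illustration, the physics choices listed above could be written into the `&physics` block of a WRF `namelist.input` file. The Python sketch below renders such a block. The scheme names come from the text, whereas the integer option codes are the values commonly documented for WRF version 3 and are an assumption here; they should be checked against the WRF Users' Guide rather than taken as the authors' actual namelist.

```python
# Scheme names come from the text; the integer codes are the usual WRF v3
# namelist values (an assumption, to be verified against the Users' Guide).
PHYSICS = {
    "mp_physics": 6,          # WSM 6-class microphysics
    "ra_lw_physics": 4,       # RRTMG longwave radiation
    "ra_sw_physics": 4,       # RRTMG shortwave radiation
    "sf_sfclay_physics": 2,   # Eta similarity surface layer, paired with MYJ
    "sf_surface_physics": 2,  # Noah land surface model
    "bl_pbl_physics": 2,      # Mellor-Yamada-Janjic PBL scheme
    "cu_physics": 2,          # Betts-Miller-Janjic cumulus scheme
}
N_DOMAINS = 2  # D01 (20 km) and d02 (5 km) share the same physics options


def render_physics(options, n_domains=N_DOMAINS):
    """Render a Fortran-namelist style &physics block, one value per domain."""
    lines = ["&physics"]
    for key, value in options.items():
        values = ", ".join(str(value) for _ in range(n_domains))
        lines.append(f" {key} = {values},")
    lines.append("/")
    return "\n".join(lines)


namelist_text = render_physics(PHYSICS)
print(namelist_text)
```

Repeating each value once per domain reflects the statement that both domains use identical physics options; domain sizes and nudging settings are deliberately left out of this sketch.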

  • (b) Observational datasets

    The observational data were available from the Hellenic National Meteorological Service (HNMS), which is the formally authorized organization for supplying meteorological observational data. Continuous observations covering the whole selected 30-year period were not available because of the lack of data validated by HNMS. The HNMS temperature dataset covered the period 1980–2004, with measurements from 32 stations. The HNMS network of 66 stations, on the other hand, provided continuous precipitation observations for the period 1980–2000. Thus, the model assessment was carried out for these specific time ranges, as dictated by the availability of validated data. Figure 2 illustrates the spatial distribution of the HNMS stations for (a) precipitation and (b) minimum and maximum temperatures. The only available gridded observational dataset, E-OBS (https://www.ecad.eu/download/ensembles/download.php), is produced by interpolation of station observations; combined with the very coarse network density for Greece, this makes the dataset less reliable for this region, which is characterized by complex terrain. Therefore, the present study focuses on model evaluation against actual station points and only validated observations.

  • (c) Statistical analysis

    The approach to evaluating our model setup and the downscaling methodology included the analysis of both the original coarse-resolution data and the simulated, downscaled higher-resolution datasets. Accordingly, the statistical analysis involved the comparison of the results from WRF-ERA-Interim for the inner (nested) domain and the driving ERA-Interim data with the available observational data. The WRF downscaled results were converted to daily maximum and minimum values derived from the 6-hourly simulation output. The minimum and maximum temperatures of the ERA-Interim data were derived from the processing of the 00 and 12 UTC forecasts.
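The conversion of 6-hourly output to daily extremes can be sketched as follows. This is a minimal NumPy illustration for a single grid point, assuming four instantaneous values per day (00, 06, 12, 18 UTC) and a series that starts at 00 UTC; the function name `daily_extremes` is hypothetical and the processing actually applied may differ.

```python
import numpy as np

STEPS_PER_DAY = 4  # assumed 6-hourly output: 00, 06, 12, 18 UTC


def daily_extremes(t2m_6h):
    """Return (TX, TN) arrays of daily maximum and minimum temperature.

    t2m_6h is a 1-D series of 6-hourly 2 m temperatures starting at 00 UTC.
    Any incomplete final day is dropped.
    """
    t = np.asarray(t2m_6h, dtype=float)
    n_days = t.size // STEPS_PER_DAY
    by_day = t[: n_days * STEPS_PER_DAY].reshape(n_days, STEPS_PER_DAY)
    return by_day.max(axis=1), by_day.min(axis=1)


# Two synthetic days of 6-hourly temperatures (°C).
tx, tn = daily_extremes([5.0, 4.0, 11.0, 8.0, 6.0, 3.5, 12.5, 9.0])
print(tx, tn)
```

With only four samples per day, the true daily extremes can be missed, so TX derived this way tends to be a lower bound and TN an upper bound on the actual values.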

Fig. 1 Modelling domains: D01 refers to the outermost domain and d02 to the nested domain (region of Greece)

Fig. 2 Circle points refer to the observational stations used for the validation of the model results: a precipitation (in blue color) and b temperature (in red color)

Following this processing, observational values from every station were compared directly with the nearest model grid points (e.g., Zittis et al. 2017; El-Samra et al. 2018). Four near-coastal stations (Corfu, Methoni, Heraklion, and Argostoli) had their closest grid point over the sea, so the next nearest grid point located on land was used instead, to avoid anomalies due to sea interaction. Height differences between the model topography and the stations arose because of the complexity of the topography and coastlines. Thus, before the statistical analysis, a constant lapse-rate elevation correction of 6 °C/km was applied (Barstad et al. 2009; Heikkilä et al. 2011) to both minimum and maximum temperatures. For precipitation, however, it is more difficult to apply corrections associated with differences in altitude. The WRF_5 model results were also examined and presented per grid point and station, rather than divided into sub-regions, in order to evaluate the spatial skill of the WRF simulation in detail.
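The station-to-grid matching and the elevation correction described above can be sketched in Python as follows. This is a hedged illustration rather than the processing code of the study: the function names, the use of great-circle distance, and the sign convention (model temperature adjusted to station height) are assumptions; only the 6 °C/km lapse rate and the land-point rule come from the text.

```python
import numpy as np

LAPSE_RATE = 6.0        # °C per km, the constant value quoted in the text
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, arrays allowed."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_land_point(st_lat, st_lon, grid_lat, grid_lon, land_mask):
    """Index (j, i) of the closest grid point flagged as land."""
    dist = haversine_km(st_lat, st_lon, grid_lat, grid_lon)
    dist = np.where(land_mask, dist, np.inf)  # sea points can never win
    return np.unravel_index(np.argmin(dist), dist.shape)


def lapse_rate_correct(t_model, z_model_m, z_station_m, lapse=LAPSE_RATE):
    """Adjust model temperature to station height (assumed sign convention):
    a grid cell higher than the station is warmed, a lower one is cooled."""
    return t_model + lapse * (z_model_m - z_station_m) / 1000.0


# Tiny synthetic 2 x 2 grid; the point closest to the station is sea.
grid_lat = np.array([[39.6, 39.6], [39.7, 39.7]])
grid_lon = np.array([[19.9, 20.0], [19.9, 20.0]])
land_mask = np.array([[False, True], [True, True]])

idx = nearest_land_point(39.61, 19.91, grid_lat, grid_lon, land_mask)
t_corrected = lapse_rate_correct(10.0, z_model_m=500.0, z_station_m=200.0)
print(idx, t_corrected)
```

Masking sea points with an infinite distance reproduces the rule used for the four near-coastal stations without a separate code path for them.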

The following standard error statistics, with formulae given in Table 1 (where “o” denotes the observed value and “f” the simulated value), were estimated: the BIAS; the root mean square error (RMSE), which gives an overview of the accuracy of the simulations; the mean absolute error (MAE), a measure of the absolute values of the model errors; Pearson’s correlation coefficient (COR); the modified Index of Agreement (MIA), developed by Willmott (1981) and Legates and McCabe Jr. (1999) as a standardized measure of the degree of model prediction error; and finally the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970), a normalized skill score of overall performance that varies between 1 for perfect agreement and − ∞ for complete disagreement. While the NSE has traditionally been used in hydrological applications, it can also be applied to any type of model data with paired observations of the same quantities (Lee et al. 2018). As noted by Bieniek et al. (2016), station and reanalysis data contain their own uncertainties; the term BIAS is therefore used only to denote the differences between the WRF model output and the observational data, and not to imply that the differences are errors arising entirely from the model results.

Table 1 Summary of statistical formulas calculated for model evaluation in this study

The model error was calculated as the difference between the modeled and observed values. The total error was then found by pooling together all the points of the meteorological stations, not by averaging over stations. Further analysis of the meteorological variables regarding the representation of extremes was performed on a daily basis, in terms of probability density functions (PDFs) and quantile–quantile (Q–Q) plots.
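Since the formulae of Table 1 are not reproduced in this text, the following Python sketch uses the standard textbook definitions of the six scores, which is an assumption; in particular, the MIA is written in the absolute-value form of Legates and McCabe Jr. (1999). The function name `evaluation_metrics` and the synthetic station values are hypothetical.

```python
import numpy as np


def evaluation_metrics(o, f):
    """Standard scores for paired observations o and simulations f."""
    o = np.asarray(o, dtype=float)
    f = np.asarray(f, dtype=float)
    err = f - o                      # model error: modeled minus observed
    o_mean = o.mean()
    return {
        "BIAS": err.mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAE": np.abs(err).mean(),
        "COR": np.corrcoef(o, f)[0, 1],
        # Modified index of agreement (absolute-value form).
        "MIA": 1.0 - np.abs(err).sum()
               / (np.abs(f - o_mean) + np.abs(o - o_mean)).sum(),
        # Nash-Sutcliffe efficiency: 1 is perfect, values can go to -inf.
        "NSE": 1.0 - (err ** 2).sum() / ((o - o_mean) ** 2).sum(),
    }


# Pooling: all station pairs are concatenated before scoring, instead of
# scoring each station separately and averaging the scores.
obs_by_station = {"A": [1.0, 2.0], "B": [3.0, 4.0]}
sim_by_station = {"A": [2.0, 3.0], "B": [4.0, 5.0]}
o_pooled = np.concatenate([obs_by_station[k] for k in obs_by_station])
f_pooled = np.concatenate([sim_by_station[k] for k in obs_by_station])

scores = evaluation_metrics(o_pooled, f_pooled)
print(scores)
```

In this synthetic case the simulation is the observation plus a constant 1, so the correlation is perfect while the NSE and MIA are penalized by the offset, which illustrates why several complementary scores are reported.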

3 Results

The model analysis followed three approaches. The first investigated the climatology in terms of spatial distribution and different temporal scales (i.e., annual to monthly). The second evaluated the model results against observational data using statistical metrics. Both approaches also examined the performance of the driving data (ERA-Interim) by comparison with the observational data and the WRF output, to showcase the added value of the downscaling methodology. The third approach investigated the ability of the WRF model to represent extreme weather events in terms of PDFs and Q–Q plots.
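For the third approach, the quantities behind a Q–Q plot and an empirical PDF can be computed as in the sketch below. It is a generic NumPy illustration on synthetic data, not the analysis code of the study; the function names, the choice of percentiles, and the number of histogram bins are assumptions.

```python
import numpy as np


def qq_pairs(obs, sim, probs=None):
    """Matching quantiles of observed and simulated daily values."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)  # 1st to 99th percentile
    obs_q = np.quantile(np.asarray(obs, dtype=float), probs)
    sim_q = np.quantile(np.asarray(sim, dtype=float), probs)
    return obs_q, sim_q


def empirical_pdf(values, bins=20):
    """Histogram-based density estimate: bin centres and densities."""
    density, edges = np.histogram(values, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density


# Synthetic daily temperatures: the "model" is warmer by a constant 2 °C.
rng = np.random.default_rng(0)
obs = rng.normal(20.0, 5.0, 1000)
sim = obs + 2.0

obs_q, sim_q = qq_pairs(obs, sim)
centres, density = empirical_pdf(obs)
```

Plotting `sim_q` against `obs_q` gives the Q–Q plot: points on the 1:1 line indicate matching distributions, while departures in the upper or lower tail show how well the extremes are captured.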

3.1 Model topography

As mentioned in the Introduction, dynamical downscaling can add value to the modelling process by using local information through the interaction with mesoscale atmospheric features, particularly in regions with complex topography such as Greece. The difference in elevation between the reanalysis orography and that of the outer and inner WRF model domains was quite significant throughout the domain, and the representation improved with higher resolution, as can be seen in Fig. 3.

Fig. 3 a Real topography of the study area (Greece) and model topography at the horizontal resolution of b the ERA-Interim reanalysis, c the outermost (D01) 20-km domain and d the inner (d02) 5-km domain

More specifically, the mountains reached elevations of up to 1250 m in ERA-Interim (ERA-I) and up to 1500 m in WRF D01, a significant deviation from the highest peak of Mount Olympus (around 2900 m), whereas in the higher-resolution domain d02 the maximum elevation reached 2400 m. Also, the topography of Pindos, the major mountain range of the country, as well as the higher mountains of the Peloponnese and Crete, is resolved very realistically in d02. Similarly, the lower-elevation features of the topography (valleys) were better resolved in d02. These differences arise from the smoothing of the topography in the lower-resolution domains. According to these findings, the improved topography of the study area obtained at the D01 and d02 resolutions could not be attained at the coarse ERA-I resolution. The downscaled output was nevertheless evaluated further against observations to derive the degree of agreement between the two datasets, in an attempt to quantify the benefit of downscaling the reanalysis dataset. Hence, the work proceeded with the first goal of the analysis, which involved investigating the quality of the downscaled WRF results against station data and the coarse-resolution reanalysis. This was done by first applying all statistical metrics to the total number of available stations and then separately by grid point, to check the spatial distribution of the errors.

3.2 Analysis of spatial and temporal climatology

In this subsection, the 5-km WRF high resolution simulations for maximum and minimum temperatures as well as for precipitation are analyzed and compared against ERA-Interim reanalysis (ERA-I) and station data (OBS) to verify the added value of the increase of the horizontal resolution.

3.2.1 Maximum and minimum temperature

The calculated mean maximum and mean minimum temperature monthly cycles, averaged over the historical period 1980–2004, are presented in Fig. 4a and b, respectively, along with the corresponding values of the standard deviation. The monthly mean values were calculated for each dataset at the grid-point location of each station and were then averaged over the total number of points (stations). Likewise, Fig. 4c and d show the calculated mean inter-annual variability of maximum and minimum temperatures.
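The two-step averaging described above (monthly means per station, then the mean over stations) can be sketched in Python as follows. The function name and the synthetic input are hypothetical, and the spread returned here is the standard deviation across stations, which is an assumption: the text does not state over which dimension the standard deviation in Fig. 4 was computed.

```python
import numpy as np


def mean_monthly_cycle(months, values):
    """Mean annual cycle averaged over stations.

    months : (n_days,) calendar month (1..12) of each daily value
    values : (n_days, n_stations) daily TX or TN at the station grid points
    Returns (cycle, spread), each of length 12.
    """
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    # Step 1: climatological monthly mean at each station, shape (12, n_stations).
    per_station = np.stack(
        [np.nanmean(values[months == m], axis=0) for m in range(1, 13)]
    )
    # Step 2: average over stations; spread taken across stations (assumed).
    return per_station.mean(axis=1), per_station.std(axis=1)


# Synthetic example: two stations, two values per month.
months = np.repeat(np.arange(1, 13), 2)
station_a = months.astype(float)        # value equals the month number
station_b = months.astype(float) + 2.0  # constant 2-unit offset
values = np.column_stack([station_a, station_b])

cycle, spread = mean_monthly_cycle(months, values)
print(cycle, spread)
```

Using `np.nanmean` in the first step lets missing daily observations be stored as NaN without discarding the whole station.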

Fig. 4 Mean annual cycle (a, b) and inter-annual variability (c, d) of maximum and minimum temperatures averaged over the historical period of 1980–2004 for the total number of stations

Overall, the monthly cycle patterns of TX and TN were well represented by WRF_5 and agreed closely with the climatology of the country. Greece has a typical Mediterranean climate, with summers characterized by long hot and dry spells (T > 30 °C), peaking between the end of July and August, and rather cold winter months, particularly in its northern parts, between the end of January and February. The southern parts of the country and the islands, however, experience milder winters (Zerefos et al. 2011). The WRF_5 TX followed the typical pattern of monthly variation, displaying lower values in the winter months and higher ones during the summer. Moreover, the comparison with OBS revealed very good agreement: TX was slightly under-predicted, but within the calculated error range (Fig. 4a). In contrast, ERA-I did not compare as well with OBS and systematically underestimated TX in all months. The same conclusions were drawn for the inter-annual cycle of TX depicted in Fig. 4c, where the WRF_5 simulations agreed closely with OBS, in contrast to the ERA-I results, which persistently underestimated TX.

Furthermore, the high-resolution domain of WRF did not cool enough throughout the year, as the monthly cycle of TN showed a persistent overestimation (by approximately 1 °C) compared to the reanalysis and OBS data (Fig. 4b). The calculated inter-annual cycle presented the same difference, with the reanalysis closer to the observed values than WRF_5 (Fig. 4d). This model behaviour was attributed to persistent strong clear-sky inversions (e.g., Soares et al. 2012a) in the complex topography of Greece, in conjunction with the smooth geomorphological representation of ERA-I, which might allow lower values of TN, especially over mountainous regions not realistically resolved at the reanalysis resolution (see Fig. 3a). This issue does not negate the ability of WRF to represent TN properly. A more extended discussion can be found in the statistical comparison analysis section.

Figures 5a and 6a illustrate the spatial distribution of the 25-year mean daily TX and TN simulated by the reanalysis and the WRF model, together with the corresponding meteorological point observations in the same figure. The spatial patterns of the simulated WRF_5 TX and TN were in agreement with the general climatological knowledge for this area and with the observational data, while at the same time they revealed ERA-I deficiencies in the representation of temperatures in Greece, as ERA-I loses important information over the mountainous areas.

Fig. 5 a Spatial distribution of 25-year mean daily maximum temperature TX for ERA-I and WRF_5 compared to weather station observations (points). b Spatial distribution of seasonal mean daily maximum temperature over the historical period of 1980–2004 for ERA-I and for c WRF_5 in comparison to the weather station data

Fig. 6 a Spatial distribution of 25-year mean daily minimum temperature TN for ERA-I and WRF_5 compared to weather station observations (points). b Spatial distribution of seasonal mean daily minimum temperature over the historical period of 1980–2004 for ERA-I and for c WRF_5 in comparison to the weather station data

Figure 5b and c depict the spatial distribution of the 25-year seasonal mean daily maximum temperature derived from ERA-I and WRF_5, respectively, compared to the stations. The seasons shown are winter (December, January, and February, DJF), spring (March, April, and May, MAM), summer (June, July, and August, JJA), and autumn (September, October, and November, SON). The comparison of WRF_5 with the observational data (Fig. 5c) showed that the model represented the geographical distribution of seasonal mean daily TX very well and reproduced the seasonal variation, with similar ranges of temperature values in the two datasets.
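The seasonal grouping used for these maps can be expressed compactly in Python. The sketch below is illustrative only: the function name and the synthetic input are hypothetical, and the season definitions are the ones given in the text.

```python
import numpy as np

# Season definitions as given in the text (calendar months).
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_means(months, values):
    """Mean of the values falling in each season.

    months : (n_time,) calendar month of each value
    values : (n_time, ...) values; any trailing dimensions (stations or
             grid points) are preserved in the result
    """
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    return {
        name: values[np.isin(months, season_months)].mean(axis=0)
        for name, season_months in SEASONS.items()
    }


# Synthetic check: one value per month, equal to the month number.
months = np.arange(1, 13)
result = seasonal_means(months, months.astype(float))
print(result)
```

Applied to a field with dimensions (time, y, x), the same function returns one map per season, which is the quantity shown in panels b and c of the seasonal figures.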

The larger deviations were mostly attributed to values over high-altitude or steep terrain. Regarding the WRF_5 spatial patterns, differences between inland and coastal areas were more intense during the summer. In winter, mean TX varied from − 4 to 8 °C over mountainous regions, whereas in summer, mean TX ranged from 32 to 36 °C in parts of western and southern Greece. The comparison of WRF_5 TX with the observational values showed an underestimation of WRF_5 TX during the autumn season (SON, Fig. 5c), as well as a more homogeneous spatial distribution in the model, with values around 12–18 °C. During spring, WRF_5 and observational temperatures compared very well (MAM, Fig. 5c), with values in the approximate range of 16–24 °C. Overall, the comparison revealed realistic seasonal TX patterns for the lower-elevation parts of the domain. It should be emphasised, moreover, that there was no observational network in mountainous areas from which to deduce the temperature deviations as a function of terrain altitude. The seasonal distribution of the reanalysis, as expected, presented limited variation in TX values across the whole domain (Fig. 5b).

Similarly, the spatial pattern of the WRF_5 simulated seasonal mean daily TN compared very well to that of the observations, as illustrated in Fig. 6c. The model represented TN very well across all seasons, with the most vivid variations found in the summer and winter periods, with values higher than 12 °C and lower than 12 °C, respectively, throughout the domain. The autumn TN values tended to have a more homogeneous spatial distribution over land, with values close to 12 °C, while spring presented higher TN, up to 15 °C. As with TX, ERA-I did not show a realistic variation in the spatial distribution of TN values (Fig. 6b).

Fig. 7 a Mean annual cycle and b inter-annual variability of precipitation (mm/month) averaged over the historical period of 1980–2000 for the total of stations

3.2.2 Precipitation

The mean annual cycle of monthly total precipitation (Fig. 7a) was well represented by WRF_5, with maximum values in winter and minimum values in summer, and a rainier season from mid-autumn to mid-spring. According to the climatology of Greece, precipitation is generally highest during the late autumn and winter months. In fact, in November and especially December, the country receives the highest amounts of monthly rainfall, which decrease towards spring (Zerefos et al. 2011). Substantially low precipitation amounts characterize the spring and summer months.

WRF_5 improves the representation of the annual cycle (Fig. 7a) in comparison with ERA-I, as it agrees better with the observations, except for March and the period between May and July. WRF_5 slightly overestimates the rainfall amounts between January and July; however, from August to November the WRF performance is strikingly accurate, and the WRF_5 annual cycle almost overlaps with that of the observational data. In November and December, the model behaviour reverses, resulting in lower precipitation values than the observations.

The overlap of the WRF_5 annual cycle with that of the observational data between August and November is a good indicator of WRF’s ability to produce rainfall correctly when nested in good-quality boundary conditions (García-Díez et al. 2015), because the model parameterizations have a higher impact on rainfall outputs when precipitation is controlled by local factors, mostly during late summer and mid-autumn (Argüeso et al. 2012).

On the other hand, ERA-I underestimated rainfall during most months of the year (August until April), but overestimated it between May and July. Annual precipitation can also vary considerably from year to year, as Fig. 7b illustrates for the inter-annual cycle over the historical period from 1980 to 2000. WRF_5 overestimated the mean total precipitation for some years, while ERA-I tended to underestimate it. The ERA-I and OBS precipitation patterns agree closely only between 1989 and 1994. Figure 8a shows the spatial distribution of mean annual total precipitation for ERA-I and WRF_5 in comparison to point observations (together in the same figure). In general, the WRF_5 model captured the observed spatial pattern of the annual precipitation fields well, whereas ERA-I clearly failed to depict the variance in the spatial distribution, smoothing the precipitation patterns in the mountainous areas. Both the ERA-I and WRF_5 outputs showed that the maximum values of annual total precipitation occurred in the western part of the domain and over the mainland, along a northwest-to-southeast axis, due to the presence of high mountains. On the other hand, the annual total precipitation pattern showed smaller values over the Aegean Sea, in line with the climatology of the country.

Fig. 8

a Annual total precipitation climatology averaged over the historical period of 1980–2001 for ERA-I and WRF_5 in comparison to weather station data (points data). b Spatial distribution of mean seasonal accumulated precipitation over the historical period of 1980–2000 for ERA-I and for c WRF_5 in comparison to weather station data (points data)

At this point, it should be noted that, owing to the sparse station network in mountainous areas, it was not feasible to verify the higher and more intense rainfall amounts there. For this reason, we made a cross-comparison with the mean annual precipitation provided by the climate atlas of Greece for the period 1971–2000. The climate atlas has been developed by the official meteorological service of Greece, the Hellenic National Meteorological Service (HNMS), and is available at http://climatlas.hnms.gr/sdi/?lang=EN. Although there is a 10-year offset, this dataset remains the only reliable source of information on the mean climatology of Greece. The cross-comparison shows that the spatial model performance is in good agreement with the HNMS data, as large rainfall amounts above 2000 mm are mainly observed on the mountains of western Greece (Pindos), Mount Olympus and the mountains of the island of Crete, while 1200–2000 mm are observed on the mountainous regions of the Peloponnese. Additionally, these findings are in line with the Report of the Bank of Greece (Zerefos et al. 2011) and with Nastos et al. (2016), where the mean annual precipitation received by the Greek mountain ranges is reported to be above 2200 mm over Pindos, 1800 mm over the mountains of Crete, and 1600 mm over the mountains of the Peloponnese. The lowest amounts, below 400 mm, are reported in these two studies for the Saronic Gulf, the eastern Peloponnese and the islands of the southern Aegean (see Fig. 3a for location guidance). Furthermore, no validated high-resolution satellite data are available that could be reliably used for model output validation, because the evaluation of satellite data is itself limited, mostly by the complex terrain of Greece and the data-sparse mountainous regions, as acknowledged by Nastos et al. (2016). However, Tian et al. (2020), drawing on the studies of Herrera et al. (2010), Heikkilä et al. (2011) and Argüeso et al. (2012), explained that complex terrain at elevations above 2000 m (e.g., over high mountains) is associated with the largest deviations of model precipitation, suggesting that WRF at a 10 km resolution may still not capture these topographical features. Based on this and on the description of the model topography (Sect. 3.1), it can be assumed that the current deviations in precipitation amounts between WRF_5 and OBS are due to the model horizontal resolution and the sparse station network.

Figures 8b and c depict the spatial distribution of mean seasonal total precipitation for ERA-I and WRF_5, respectively, together with the point observations. WRF_5 results were overall in agreement with the observations for the rainier seasons of autumn and winter. They also showed higher precipitation amounts over mountainous areas and in the western parts of the country, following the known climatological patterns: the spatial pattern of precipitation in Greece is strongly associated with orography, and almost all low-pressure systems that cross the country and bring intense rainfall come from the west. Finally, the ERA-I seasonal total precipitation did not vary across the domain in a way comparable to the observations in autumn and spring (Fig. 8b).

3.3 Evaluation based on statistical metrics

To assess our downscaling methodology quantitatively, we evaluated the simulated mean fields from WRF_5 and the driver ERA-Interim statistically against historical observations of the examined variables. The statistical errors (as described in Sect. 2c) of maximum and minimum temperatures for daily and monthly averages were calculated for WRF_5 and ERA-Interim against observational data from the weather stations over the entire domain and are summarized in Table 2. Table 2 also includes the statistical errors for precipitation in terms of daily and monthly cumulative values. In general, the downscaled results (WRF_5) improved on the reanalysis. The daily and monthly correlation coefficients for WRF_5 TX were 0.95 and 0.98, respectively, while those for ERA-I were much lower, at 0.82 and 0.8. That the ERA-I COR values were nevertheless fairly high could be attributed to the smooth patterns of the reanalysis dataset, although its values did not reveal the heterogeneity across the domain that the observations show. As expected, the WRF_5 errors decreased with increasing time-averaging. The results showed a cold bias of around − 0.6 °C for both daily and monthly WRF_5 TX, and distinctly larger cold biases for ERA-I of − 2.2 °C and − 3.3 °C, respectively. For daily WRF_5 TX, the RMSE and MAE were of the order of 2.5 °C and 1.8 °C, respectively, while for monthly averaging these errors reduced to 1.7 °C and 1.2 °C. For ERA-I, on the other hand, the statistical errors were at least twice as large as those for WRF_5. The efficiency metrics NSE and MIA also improved significantly with the downscaling, to values of approximately 0.9, while for ERA-I their values were below 0.7.
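As an illustration of how error statistics of this kind can be computed for paired model and station series, a minimal Python sketch follows. It assumes the standard textbook definitions of BIAS, MAE, RMSE, Pearson COR and NSE, and takes MIA to be the absolute-value (modified) form of Willmott's index of agreement. The exact formulas of Sect. 2c are not reproduced here, so the function name `error_metrics` and these definitions are assumptions rather than the implementation used in this study.

```python
import math


def error_metrics(sim, obs):
    """Return BIAS, MAE, RMSE, COR, NSE and MIA for paired series.

    sim and obs are equal-length sequences of simulated and observed
    values (hypothetical inputs). The formulas are assumed standard
    definitions, not necessarily those of Sect. 2c.
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must be paired series of length >= 2")
    n = len(obs)
    mean_sim = sum(sim) / n
    mean_obs = sum(obs) / n
    diffs = [s - o for s, o in zip(sim, obs)]

    bias = sum(diffs) / n                      # mean error (model minus obs)
    mae = sum(abs(d) for d in diffs) / n       # mean absolute error
    sq_err = sum(d * d for d in diffs)
    rmse = math.sqrt(sq_err / n)               # root-mean-square error

    # Pearson correlation coefficient (undefined for a constant series).
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    cor = cov / math.sqrt(var_sim * var_obs)

    # Nash-Sutcliffe efficiency: values below 0 mean that the observed
    # mean is a better predictor than the model.
    nse = 1.0 - sq_err / var_obs

    # Modified index of agreement, absolute-value form (assumption).
    potential = sum(abs(s - mean_obs) + abs(o - mean_obs)
                    for s, o in zip(sim, obs))
    mia = 1.0 - sum(abs(d) for d in diffs) / potential

    return {"BIAS": bias, "MAE": mae, "RMSE": rmse,
            "COR": cor, "NSE": nse, "MIA": mia}
```

Applied to a simulated series that is uniformly 1 °C warmer than the observations, such a function returns a perfect correlation but NSE and MIA below 1, which shows why correlation alone cannot reveal a systematic bias.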

Table 2 Maximum temperature statistical errors between model results and reanalysis against observations

Regarding TN, correlations were found to be slightly smaller than those of TX, similarly to Zhang et al. (2009) and Soares et al. (2012a), but indicated improved downscaled results compared to those of reanalysis. Overall, the other statistical errors have improved values against ERA-I except for bias. Both comparisons revealed a warm bias of about 1 °C for WRF_5 and a cold bias around − 0.5 °C for ERA-I. The efficiency metrics for TN showed an improved performance of the model compared to reanalysis for both temporal scales.

Regarding the precipitation statistical errors over the entire domain, relatively low daily correlations were found between observations and WRF_5 (around 0.5), and much lower ones for ERA-I (~ 0.13). Although, according to the error statistics, downscaling with WRF significantly improved the precipitation results relative to ERA-I, the values of COR, NSE, and MIA remained lower than those for temperature. In general, WRF_5 overestimated precipitation relative to the observations, but the overall improvement over ERA-I was a positive outcome.

The annual cycles of the mean statistical errors of WRF_5 and ERA-I against observations are presented in Figs. 9 and 10 for the monthly maximum and minimum temperatures, respectively, and in Fig. 11 for total precipitation. The best results for TX were obtained with the WRF_5 downscaling, which displayed a lower BIAS than ERA-I in every month of the year, with magnitudes below 1 °C. In particular, from May to July ERA-I showed higher errors (BIAS, MAE, and RMSE) and a much lower correlation than the WRF_5 simulation. In April and May the WRF_5 bias was close to zero. Both ERA-I and WRF_5 underestimated TX throughout the year. In some months, the acceptance criterion defined by Emery et al. (2001) for air temperature, − 0.5 °C < bias < + 0.5 °C, was not met. MIA values for WRF_5 were higher than those of the reanalysis in all months, and only slightly lower in summer and autumn than in the other seasons. NSE was overall higher for WRF_5 and turned negative only in June (NSE = − 0.003), indicating that the mean of the observations was a better predictor than the model for that month. Similar results were obtained for TN, for which WRF_5 yielded much lower statistical errors than ERA-I in all months (Fig. 10). ERA-I TN is lower than observed in all seasons except autumn, giving a persistent cold bias. Because cold and warm errors partly cancel, the magnitude of the reanalysis TN bias is smaller than that of WRF_5 (compensating errors). At the same time, the downscaled results were characterized by remarkably higher correlation coefficients, MIA, and NSE values than the reanalysis in all months. These NSE values indicate that the downscaled dataset was a more skillful predictor than the mean of the observations.

Fig. 9

Annual cycles of mean monthly maximum temperature errors (TX) of the ERA-I (dotted orange) and 5-km WRF (solid red) simulations over the entire domain

Fig. 10

Annual cycles of mean monthly minimum temperature errors (TN) of the ERA-I (dotted orange) and 5-km WRF (solid red) simulations over the entire domain

Fig. 11

Annual cycle of mean monthly precipitation errors (PR) of the ERA-I (dotted orange) and 5-km WRF (solid green) simulations over the entire domain

Regarding precipitation, WRF_5 overestimated rainfall during most months of the year, particularly from April to July, and underestimated it in November and December. ERA-I, on the other hand, underestimated precipitation during the autumn and winter months. WRF_5 produced MAE and RMSE values similar to those of the reanalysis but a higher correlation with the observed annual cycle, although the reanalysis outperformed WRF_5 during the summer months. Moreover, the annual cycle of the efficiency metric MIA showed improved performance of the model relative to the reanalysis. The NSE score for WRF_5 was negative only in May and in summer; its values in the remaining months indicate positive model skill. Overall, therefore, the downscaled WRF_5 simulation showed added value compared with the driving ERA-Interim reanalysis.

Maps of the spatial distribution of some of the statistical errors at each station are included in the online Supplementary Information (SI). Figure SI-1 presents the spatial distribution of the monthly statistical errors MIA and MAE for temperature and precipitation. Given the sparse sampling of the stations, no firm conclusion can be drawn for the minimum and maximum temperatures, but stations with lower performance can be identified, such as Chania for TX and TN, and Chania, Kalamata, and Lamia only for TN (see arrows in Figure SI-1a–d).

Concerning the spatial pattern for precipitation, MIA values at the majority of the stations range from 0.7 to 0.9, with some exceptions in the north of the country and a few coastal stations with values above 0.6. Some of these stations showed absolute errors above 30 mm (probably related to their mountainous or coastal locations), while most showed values between 10 and 30 mm.

The seasonal statistics of WRF_5 and ERA-I against weather station data were calculated for TX, TN, and PR and are summarized in Table 3. For each metric, we pooled the monthly values from all stations in the domain by season. WRF_5 TX correlation coefficients were higher than those of ERA-I in all seasons, with values around 0.95 for winter, spring, and autumn. Although the lowest value, 0.59, occurred in summer, the downscaling still strongly outperformed the reanalysis, whose value was 0.17. Furthermore, WRF_5 TX showed a smaller cold bias than ERA-I, with remarkably improved results especially in spring and summer, as well as significantly smaller RMSE and MAE values in all seasons. ERA-I outperformed WRF_5 only in SON, with a smaller warm bias of TX (0.44 °C). NSE indicated negative skill for ERA-I in all seasons.
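The seasonal pooling described above can be sketched as follows. This is a minimal illustration, assuming that each record is a `(month, simulated, observed)` triple of monthly values from any station; the record layout and the function names `pool_by_season` and `seasonal_bias` are hypothetical and are not taken from the study's processing chain.

```python
from collections import defaultdict

# Conventional meteorological seasons used in the text.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def pool_by_season(records):
    """Group (month, sim, obs) triples from all stations by season.

    Returns a dict mapping season name to a (sim_values, obs_values) pair.
    """
    pooled = defaultdict(lambda: ([], []))
    for month, sim, obs in records:
        sim_values, obs_values = pooled[SEASON_OF_MONTH[month]]
        sim_values.append(sim)
        obs_values.append(obs)
    return dict(pooled)


def seasonal_bias(records):
    """Mean error (model minus observation) of the pooled values per season."""
    return {
        season: sum(s - o for s, o in zip(sim_values, obs_values)) / len(obs_values)
        for season, (sim_values, obs_values) in pool_by_season(records).items()
    }
```

Any other metric can be computed on the pooled pairs in the same way, which keeps the seasonal statistics consistent with the domain-wide ones.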

Table 3 Seasonal statistical errors of maximum temperature, minimum temperature and precipitation between model results and reanalysis for the total stations' grid points

Seasonal statistical errors of TN differed from those of TX. Seasonal correlations of model and reanalysis were comparable, though WRF_5 outperformed ERA-I in all seasons. WRF_5 showed a consistent warm bias in all seasons, whereas the reanalysis showed a warm bias (0.8 °C) only in autumn. WRF_5 thus turned the negative bias of the reanalysis into a positive one during MAM, JJA, and DJF, with improved model performance in spring. In general, the improvement was less obvious in bias but was evident in the higher WRF_5 COR and in the lower RMSE and MAE of monthly TN in all seasons, with values not above 2.3 °C.

Regarding the seasonal statistical errors of monthly precipitation for WRF_5 and ERA-Interim, comparable results were found concerning the correlation, where model values were low in the range of 0.47 in summer to 0.62 in autumn but slightly better than ERA-I. Seasonal biases of WRF_5 were significantly lower compared to ERA-I, except in the spring and particularly in the summer, where a higher overestimation was noted. The amplitude of RMSE errors was also comparable for all seasons between WRF_5 and ERA-I.

The spatial distribution of the COR and BIAS statistics between WRF_5 and station data for each season is presented in the SI (Figure SI-2) for the examined meteorological variables. Regarding TX, the downscaling performance in winter and summer differed markedly from that in spring and autumn. The correlation coefficient was lower (below 0.9, but not less than 0.8) at the majority of the stations during DJF and JJA. The MAM and SON COR values were consistently high, around 0.98 at all stations. TX was under-predicted in all seasons at the majority of the stations, except for a few stations where it was slightly over-predicted, mostly during MAM and JJA (orange to red dots). Similar results were found for TN: seasonal correlations were not lower than 0.9, except at a few stations during DJF and JJA where COR varied from 0.7 to 1. We observed a systematic warm bias in all seasons, except for a cold bias, mainly at coastal stations, marked in blue.

Likewise, for the precipitation fields (see SI, Figure SI-3a), WRF_5 yielded similar seasonal correlation coefficients at all stations, in the range 0.6–0.85, showing a good ability of the downscaling to describe precipitation in Greece, with slightly higher values in autumn. In summer, however, the correlation values were smaller. This correlation pattern was in agreement with the global seasonal precipitation. In all seasons, the WRF_5 downscaled results overestimated precipitation in most parts of Greece, except along the southwest coasts and the eastern coast (and islands) in winter, where precipitation was underestimated, with PBIAS in the range − 40 to − 10% (see SI, Figure SI-3b). The WRF_5 performance was regarded as outstanding because PBIAS rarely exceeded ± 25–30% at the majority of the stations, in agreement with Argüeso et al. (2012).
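For reference, the percent bias used in such station-wise comparisons can be sketched as below. The sketch assumes the common definition in which PBIAS is the total model-minus-observation difference expressed as a percentage of the observed total, so that negative values denote underestimation, consistent with the sign convention in the text. The function names `pbias` and `share_within` and the station labels in the usage note are hypothetical.

```python
def pbias(sim, obs):
    """Percent bias of accumulated values; negative means underestimation."""
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("observed total is zero; PBIAS is undefined")
    return 100.0 * (sum(sim) - total_obs) / total_obs


def share_within(pbias_by_station, limit_percent=25.0):
    """Fraction of stations whose |PBIAS| does not exceed limit_percent."""
    values = list(pbias_by_station.values())
    if not values:
        raise ValueError("no stations supplied")
    inside = sum(1 for value in values if abs(value) <= limit_percent)
    return inside / len(values)
```

With hypothetical station values `{"A": 10.0, "B": -40.0, "C": 25.0}`, `share_within` reports that two of the three stations lie within ± 25%.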

3.4 Probability densities and Q–Q plots

According to Komurcu et al. (2018), the ability of a downscaling methodology to reproduce mean values of observed fields and improve upon reanalysis forecasts is significant; moreover, a worthwhile downscaling methodology should have the ability to simulate climate extremes well. In this subsection, we assess the quality of our downscaled results based on the realistic simulations of extremes of daily TX, TN, and precipitation. Figure 12 shows the seasonal probability distributions of the daily maximum temperature for WRF_5, ERA-I, and station data for the four seasons. The median temperature was underestimated by the ERA-I reanalysis in general but more significantly during the summer period and slightly in spring, showing a significant shift towards colder values. Overall, WRF_5 simulations were in excellent agreement with the observations during all seasons, with some slight shift of the median maximum temperature towards cooler values, in winter and autumn.

Fig. 12

Comparison of density distributions of daily TX between WRF_5, ERA-Interim and observations for all seasons for 1980–2004

Figure 13 presents the Q–Q plots of observed versus modeled quantiles of daily maximum temperature (TX) for WRF_5 and ERA-I for 1980–2004. Relative to the reanalysis, the downscaled results improved the representation of almost all quantiles, including the extreme ones, in all seasons, with WRF_5 matching the 1:1 line excellently.
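The construction of such a Q–Q comparison can be sketched as follows: the observed and modeled daily samples are sorted independently, and quantiles at the same probability levels are paired, so that points on the 1:1 line indicate identical distributions. This is a minimal sketch assuming linear interpolation between order statistics and evenly spaced probability levels; the function names and the default of 99 quantiles are illustrative assumptions, not the settings used for Fig. 13.

```python
def empirical_quantile(sorted_values, p):
    """Quantile of a pre-sorted sample at probability p (0 <= p <= 1),
    using linear interpolation between neighbouring order statistics."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower])


def qq_pairs(sim, obs, n_quantiles=99):
    """Return (observed quantile, simulated quantile) pairs.

    The two samples need not have the same length, because each is
    reduced to the same set of probability levels.
    """
    if not sim or not obs:
        raise ValueError("both samples must be non-empty")
    sim_sorted, obs_sorted = sorted(sim), sorted(obs)
    probabilities = [(i + 1) / (n_quantiles + 1) for i in range(n_quantiles)]
    return [(empirical_quantile(obs_sorted, p),
             empirical_quantile(sim_sorted, p))
            for p in probabilities]
```

Plotting the simulated quantile against the observed one for each pair gives the Q–Q curve; a uniform offset from the 1:1 line corresponds to a shift of the whole distribution, whereas divergence only at the ends indicates misrepresented extremes.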

Fig. 13

Q–Q plots of daily maximum (TX) temperature generated by WRF_5 and ERA-Interim for 1980–2004, in comparison with observations for all seasons

Figure 14 depicts the density distribution of TN. The WRF_5 histogram was in line with the observations, while ERA-I showed significantly lower density values in all seasons, though with good agreement along the distribution tails. Compared with the observations, the WRF_5 distribution was shifted to the right, towards higher TN values, in all seasons and particularly in summer and autumn. Regarding the daily minimum temperature (TN) quantiles in Fig. 15, WRF_5 clearly improved on ERA-I in all seasons, particularly in winter and spring. In general, the extreme temperatures, both maximum and minimum, were better reproduced by the WRF_5 simulations. Overall, these results reinforce the added value of the downscaling compared to the reanalysis.

Fig. 14

Comparison of density distributions of daily TN between WRF_5, ERA-Interim and observations for all seasons for 1980–2004

Fig. 15

Q–Q plots of daily minimum (TN) temperature generated by WRF_5 and ERA-Interim for 1980–2004, in comparison with observations for all seasons

To compute the PDF for precipitation, only rainy days with precipitation amounts higher than 1 mm (Klein Tank et al. 2009) were included, because the focus was on the probability of rainfall intensity rather than on precipitation occurrence. The seasonal frequency distribution of daily precipitation (Fig. 16) was plotted on a logarithmic scale with 1 mm bins to highlight the most intense precipitation rates. Climate models tend to produce too much light precipitation, and our study confirms this for WRF. In all seasons, the downscaled results improved on ERA-I, which was shifted further to the left and lacked the highest precipitation bins owing to the smoother fields of the reanalysis. In spring, WRF notably produced some precipitation events (above 200 mm/day) in excess of the observations. This might be caused either by the model or by a station density too low to match the WRF_5 resolution, especially in mountainous areas. In the observations, the longest tails, with events close to 200 mm/day, occurred in winter and particularly in autumn. The latter season is sometimes also associated with extratropical cyclones, which produce intense extremes and flooding events in western Greece (Pytharoulis et al. 2000; Nastos et al. 2018; Emmanouil et al. 2021). These events were properly captured only by the higher-resolution WRF_5 simulations.
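The wet-day filtering and binning described above can be illustrated with the following minimal sketch. It assumes a strict threshold (amounts greater than 1 mm) and bins labelled by their lower edge; the function name `wet_day_frequencies` and its parameters are hypothetical, and the logarithmic scaling would be applied to the frequency axis when plotting.

```python
import math
from collections import Counter


def wet_day_frequencies(daily_precip_mm, wet_threshold_mm=1.0, bin_width_mm=1.0):
    """Relative frequency of wet-day precipitation amounts per intensity bin.

    Only days with precipitation strictly above wet_threshold_mm are kept,
    so the result describes rainfall intensity rather than occurrence.
    Returns a dict mapping the lower edge of each bin (mm) to its
    relative frequency among wet days.
    """
    wet_days = [p for p in daily_precip_mm if p > wet_threshold_mm]
    if not wet_days:
        return {}
    counts = Counter(math.floor(p / bin_width_mm) * bin_width_mm
                     for p in wet_days)
    total = len(wet_days)
    return {lower_edge: counts[lower_edge] / total
            for lower_edge in sorted(counts)}
```

Because the frequencies are normalized by the number of wet days, distributions from datasets with different numbers of rainy days (for example, a reanalysis with too many drizzle days) remain comparable in shape.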

Fig. 16

Comparison of frequency distributions of daily precipitation between WRF_5, ERA-Interim and observations for all seasons in the period 1980–2000

Figure 17 depicts the quantile distribution of simulated and observed precipitation to assess further the ability of the model to produce extremes. WRF_5 clearly represented the quantiles, especially the higher-ranking ones, better than ERA-I in all seasons, with the closest match found in spring. Both ERA-I and WRF_5 generally underpredicted the strongest precipitation events; the exception was spring, when WRF_5 overestimated the extreme quantiles, as the preceding PDF analysis also showed. The ERA-I dataset could not capture the high-intensity tails in any season owing to the relative homogeneity induced by the coarse resolution of the reanalysis.

Fig. 17

Q–Q plots of daily precipitation generated by WRF_5 and ERA-Interim for 1980–2000, in comparison with observations for all seasons

4 Discussion and conclusions

Greece, located in the Mediterranean basin, has a typical Mediterranean climate with relatively mild, wet winters and warm summers. However, owing to the complex topography, reflected in the mean altitude of the Greek mainland and the elevation gradient (100–200 m per km), together with the extensive coastline and the prevailing weather systems, the climate sometimes deviates from the “typical” at the regional level. The present study investigated the performance of the WRF model in dynamically downscaling the coarse-resolution ERA-Interim dataset to a high-resolution 5 km × 5 km grid over Greece. To our knowledge, it is the first study to analyse WRF performance for this area at high horizontal resolution over a long-term period. More specifically, precipitation was analysed for a 21-year period and temperature for a 25-year period. The main focus of this paper was therefore the general quantification of the high-resolution model performance regarding the spatial and temporal distribution of three meteorological variables: minimum temperature (TN) and maximum temperature (TX) at 2 m, and total precipitation. The assessment included the statistical evaluation of the WRF_5 model and the ERA-I reanalysis against historical observations from the HNMS. It also aimed to highlight the added value of the downscaling methodology relative to the reanalysis fields. We first analyzed the model's added value in representing the topography of the study area: the representation of the orography improved from ERA-I through the outer to the inner WRF model domains as the spatial resolution increased. We then analyzed and presented the results, taking into account the peculiarities of the region in terms of complex topography, high elevation/orography, and the spatial and temporal availability of observational data.
The investigation showed that the WRF model can represent the annual and seasonal geographical distribution of TX and TN in the study area very well. The high-resolution model also reproduced the observed seasonal differences with similar temperature ranges, although only a limited number of meteorological stations was available (a network of 32 stations with continuous observations). Kryza et al. (2017) reported similar findings: the spatial distribution of meteorological variables obtained with the WRF model at the same horizontal resolution (5 km × 5 km) for Poland was convincingly reproduced, following the country’s climatology. We point out that comparisons with similar regional climate studies should be made with care, as there can be quantitative and qualitative differences between geographical regions in terms of data availability, station network density, horizontal resolution, and driving forcings.

An important finding is that our downscaling methodology agreed very well with the observations for maximum and minimum temperatures, compared with the coarse-resolution ERA-Interim. More specifically, for TX, WRF_5 remarkably reduced the daily bias from − 2.19 °C in the reanalysis to − 0.6 °C, with a very high correlation coefficient of 0.96. A bias of similar magnitude (for mean surface temperature) was found by Kryza et al. (2017) for Poland, equal to 0.23 °C (though obtained with twice as many stations), and by Soares et al. (2012a) for Portugal at 9 km horizontal resolution, equal to 0.1 °C. Heikkilä et al. (2011), using WRF at 10 km resolution forced by ERA-40, reported a mean bias of − 0.7 °C and a correlation of 0.97. Concerning TN, although the cold bias of ERA-I (− 0.5 °C) changed to a warm bias (1 °C) in WRF_5, all the other statistical metrics showed that the downscaled results still outperformed the reanalysis. Other studies did not report improved results but a similar range of bias, from 0.5 to − 0.4 °C (e.g., Soares et al. 2012a). Daily maximum and minimum temperature biases were between 0.06 and 1.84 °C in the study of Zhang et al. (2016) for the Hawaiian Islands. Improved results for WRF_5 were also generally found for the RMSE and MAE of monthly TN in the seasonal analysis, although the correlation coefficients were comparable. The PDF and quantile analyses revealed an improvement of WRF_5 over the reanalysis for the extreme quantiles in spring, winter, and autumn, but not in summer.

Regarding precipitation, WRF_5, as well as ERA-Interim, reproduced the observed precipitation reasonably well at monthly and inter-annual time scales, as evidenced by the two rainier seasons, spring and autumn, and the winter precipitation maximum. These results were generally in line with previous analyses of similar regions, such as Italy (e.g., Fantini et al. 2018). Overall, WRF_5 reproduced well the spatial pattern of the observed annual and seasonal precipitation in most parts of Greece, even though there were large wet biases over the mountainous regions. These biases most likely resulted from the unrealistic simulation of rain shadow effects on precipitation caused by the high mountains (Tian et al. 2020). Precipitation was better reproduced in our WRF-downscaled simulations than in ERA-Interim because biases and RMSEs were significantly reduced by the downscaling. Precipitation correlated satisfactorily with observations from 66 stations (covering the period 1980–2000) uniformly distributed over the study area (mean monthly correlation coefficient COR = 0.67 over all stations; seasonal COR = 0.62–0.82 for individual stations). In summer, the seasonal precipitation correlations were not as good as those of Cardoso et al. (2013) for Iberia, possibly because of the denser station network in that study, but they agreed with Heikkilä et al. (2011) for Norway. PBIAS results were similar to those of other studies (Argüeso et al. 2012; Cardoso et al. 2013), in which WRF significantly overestimated precipitation in most of Iberia during summer, while in winter and autumn, in our case, the underestimation of ERA-I turned into a small PBIAS for WRF. The monthly errors were comparable to those of previous studies; for example, Soares et al. (2012a) reported monthly values of COR, PBIAS, MAE, and RMSE of 0.89, − 8.9%, 24.4 mm, and 43.4 mm, respectively.
Furthermore, the WRF model performance was outstanding compared to other studies over Europe (Argüeso et al. 2012; Fantini et al. 2018) because PBIAS rarely exceeded ± 25–30% at the majority of the stations. In the Q–Q plots, the WRF_5 simulation reproduced extremes better than the driving data, which consistently underestimated most quantiles, while WRF_5 overpredicted the higher quantiles in spring. Prein et al. (2016), comparing daily Q–Q plots of EU-CORDEX simulations with observations, found that in MAM the 0.11° models outperformed the 0.44° models in representing extreme precipitation in all regions except the Carpathians and the Alps. That behavior could be attributed to the fact that extreme precipitation events often have small spatial and temporal extents, so their analysis over complex topography remains very sensitive. Extreme precipitation rates will therefore be investigated on shorter temporal scales in future work. In general, the presented results highlight the ability of the WRF_5 model to distribute precipitation correctly across Greece, which indicates that it can reproduce the climatic characteristics of different regions and adequately incorporate the effect of complex topographical features.

At this point, an important issue concerning the added value of the WRF model for downscaled precipitation relative to ERA-I must be discussed. At first glance, comparing the statistical metrics (MAE, RMSE, PBIAS, COR, MIA, and NSE) between ERA-I and OBS and between WRF_5 and OBS, the improvement in the downscaled results is not entirely clear (Tables 2 and 3). However, the representation of climate extremes by RCMs is an increasingly important issue for impact assessment. A closer investigation of the ability of the WRF model to simulate climate extremes, in terms of probability densities and Q–Q plots, revealed a clear improvement in the extreme values (Figs. 16 and 17). According to this analysis, WRF_5 represented the higher-ranking quantiles more efficiently than ERA-I in all seasons. These results highlight that WRF_5 adds value over the reanalysis in terms of extreme precipitation, which is of high interest for evaluating the impact of climate change and reinforces the need for dynamical downscaling. Thus, WRF_5 overcomes the problems associated with the observational dataset, or even the lack of station data, especially at high altitude, by yielding a significant improvement in extreme values. This conclusion does not diminish the importance of high-quality observational datasets (dense networks and long-term, continuous, homogeneous data) for high-resolution model studies, to overcome any deficiencies of an RCM in representing mean values.

According to future climate scenarios, the Mediterranean zone will be strongly affected by global climate change (e.g., Zittis et al. 2016; Molina et al. 2020). The presented results give confidence that the current version of the WRF model, set-up and parameterized with a high resolution of 5 km for the domain of Greece, can simulate synoptic meteorological variables and their extremes, pointing to its high potential to yield reliable information on future climate changes in extreme weather. In our next steps, we will focus on exploring the ability of a GCM to reproduce the climatology of Greece at high resolution. Further research will aim at establishing confidence in the use of historical and dynamically downscaled simulations using GCM projections.