1 Introduction

Moist, mild winters and dry, warm-to-hot summers characterise the Mediterranean climate. The large spatio-temporal precipitation and temperature gradients observed in these climates render these regions very vulnerable to climate change, particularly in the late twenty-first century (Diffenbaugh and Giorgi 2012; Alessandri et al. 2014). From 1880 to 2012 the global mean temperature increased by 0.85 °C on average, and the last 30-year period was likely the warmest of the last 1400 years in the Northern Hemisphere (IPCC 2014), with each decade warmer than the preceding one. In recent decades, along with the rise in the mean temperature, extreme high temperature events have also intensified (Frich et al. 2002; Klein Tank et al. 2002, 2003; Alexander et al. 2006; Tebaldi et al. 2006; Moberg et al. 2006; Della-Marta et al. 2007). From 1880 to 2005 the intra-annual daily temperature range increased by 6% in Western Europe, along with a tripling of the frequency of hot days and a doubling of the length of heat waves (Fischer and Schär 2010). Changes in minimum temperature extremes are expected to be more pronounced than those in maximum temperature extremes (Sillmann et al. 2013; Cattiaux et al. 2013); yet models generally underestimate the temperature range and overestimate intra-seasonal variability, particularly in summer. In the CMIP5 (Coupled Model Intercomparison Project Phase 5) GCMs (Global Climate Models), future seasonal temperature changes are on average higher in summer. In winter the rise in minimum temperature is larger than that in maximum temperature, whereas in summer it is the maximum temperature that increases the most. A reduction in the number of cold spells (days with minimum temperature lower than the 10th percentile) during the twenty-first century occurs in the 13 GCMs analysed by Peings et al. (2013). All models but one also project less severe events, and the intensity changes are in line with the mean temperature rise. 
Unlike the minimum temperature, maximum temperature extremes are enhanced far more than the mean temperature (Meehl et al. 2007; Schär et al. 2004; Lau et al. 2014; Schoetter et al. 2015), i.e. the broadening of the PDF is also relevant for the occurrence of extreme maximum temperatures (Weisheimer and Palmer 2005; Lau et al. 2014). However, there is no consensus on the effect of this broadening on the frequency and duration of extreme heat events (Weisheimer and Palmer 2005; Della-Marta et al. 2007; Lau et al. 2014). Heat waves, which are extreme phenomena within extreme heat events, are expected to last longer, be more intense and encompass larger areas at the end of the twenty-first century (Russo et al. 2014). According to Schoetter et al. (2015), the PDF broadening plays a smaller part than the shift of the median in the increase in heat wave severity. It is, however, still responsible for an amplification by a factor greater than 1.5 and is region- and season-specific, increasing the uncertainty associated with future projections.

At the current GCM resolutions, many processes that are responsible for regional to local climates are not accurately represented; thus, the scales at which climate change will be mostly felt are poorly characterised. This is greatly amplified in areas where local thermal circulations play an important role, i.e. coastlines, land heterogeneities and complex topography (Frei et al. 2003; Soares et al. 2012a).

At the regional or local scale, the assessment of the impact of climate change and of the uncertainties associated with the climate change scenarios is usually achieved through an ensemble of dynamically or statistically downscaled regional climate projections. In recent years three multi-model ensemble projects have been dedicated to Europe: PRUDENCE (Christensen and Christensen 2007), ENSEMBLES (Hewitt 2005; van der Linden et al. 2009) and EURO-CORDEX (Giorgi et al. 2009). The latter represents the common European domain of the World Climate Research Program Coordinated Regional Downscaling Experiment (CORDEX), in which a large ensemble of continental scale simulations is performed for present and future climates based on the CMIP5 GCM projections. In CORDEX, all simulations were performed with a 0.44° horizontal resolution and, in EURO-CORDEX, an additional 0.11° resolution was also implemented by some modelling groups. Three Representative Concentration Pathway (RCP) scenarios, RCP2.6, RCP4.5 and RCP8.5, were considered as future greenhouse gas emission scenarios; however, not all modelling groups simulated all of the scenarios. The least severe scenario (RCP2.6) assumes that the effect of the greenhouse gases will be equivalent to a radiative forcing increase of 2.6 W/m² at the end of the twenty-first century relative to the pre-industrial era, associated with a peak of global emissions between 2010 and 2020 (Moss et al. 2010; van Vuuren et al. 2011). Since it is the mildest scenario, it is also the least simulated. Conversely, RCP8.5 assumes that emissions rise throughout the twenty-first century (Riahi et al. 2011), implying at its end a radiative forcing of 8.5 W/m² relative to the pre-industrial era. As the most severe scenario, and since it represents the more credible projection if no action is taken, it is the most simulated. 
In RCP4.5, emissions peak around 2040, followed by stabilisation until 2100, representing an increase of 4.5 W/m² in radiative forcing relative to the pre-industrial era. By the year 2100 the greenhouse gas concentrations will be greater than 650 and 1370 ppm CO2-e (Moss et al. 2010), and a median global warming of 2.4 and 4.9 °C above the pre-industrial baseline is expected, for RCP4.5 and RCP8.5 respectively (Rogelj et al. 2012).

The EURO-CORDEX hindcast simulations, forced by ERA-Interim (Dee et al. 2011), display cold wet biases in most seasons over large areas of Europe, but some models exhibit warm dry biases over southern and southwestern Europe (Kotlarski et al. 2014). These authors show the models' ability to simulate the main seasonal characteristics of European climate, indicating, however, that they usually overestimate the magnitude of the interannual temperature variability. A cold bias was also found by Katragkou et al. (2015) in their analysis of six hindcast WRF simulations for the EURO-CORDEX domain with different physical configurations. The EURO-CORDEX projection scenarios (RCP8.5 and RCP4.5) are consistent with the ENSEMBLES SRES A1B results; however, the higher resolution provides important regional details (Jacob et al. 2014). The EURO-CORDEX scenarios show robust and significant changes in the annual mean temperature. The projections for RCP4.5 are milder than the ENSEMBLES SRES A1B, while the changes in the RCP8.5 scenario are considerably more severe.

The expected changes in temperature extremes in the RCM ensembles are in line with the results from the GCM community, yet in some regions these are amplified. Model results show a larger discrepancy for the extremes than for the mean temperature, and the model spread of the projected changes is larger than natural variability (Kjellström et al. 2007). The large changes in cold extremes are associated with the reduction of the areas covered by snow or ice, particularly in northern Europe, whereas soil moisture deficits play an important role in amplifying the warm extremes in southern Europe (Seneviratne et al. 2006; Jaeger and Seneviratne 2011; Hirschi et al. 2011).

The extreme heat wave of the summer of 2003 prompted an evaluation of the frequency of past and future extreme phenomena. Beniston (2004) found that, in the most extreme PRUDENCE scenario, these types of heat waves would be more frequent by the end of the twenty-first century. Vautard et al. (2015) analysed the ability of the EURO-CORDEX hindcast simulations to reproduce the summertime mean temperature 90th percentile. The results show a large spread between simulations; however, for most models, the increase in resolution leads to a reduction of biases, particularly in the representation of heat waves along coastlines. Russo et al. (2015) proposed a new metric to evaluate the magnitude of heat waves and applied it to the EURO-CORDEX climate simulations. An increased probability of occurrence of heat waves with magnitude, duration and extension greater than those of the 2010 heat wave is projected for the next two decades. The probability of larger (in magnitude, extension and duration) heat waves is greater in RCP8.5 than in RCP4.5, and internal variability across Europe may mask or amplify regional heat waves.

Climate model biases contribute to the overall uncertainty in climate projections. Bias adjustment methods are often used to provide additional information to the local climatic data; however, even for the current climate, the skill and use of bias adjustment methods is not without controversy. The skill is not only variable dependent but also contingent on the quality of the underlying predictor, the region under analysis and the time frame (Maraun 2013; Gutiérrez et al. 2018). In a recent assessment of bias adjustment methods these fragilities were particularly highlighted for temporal variability (Maraun et al. 2017b), the ability to represent extremes (Hertig et al. 2017) and atmospheric circulation processes (Soares et al. 2018). The application of such methods to climate change scenarios is even more questionable (Ehret et al. 2012; Maraun et al. 2017a) since climate model biases are nonstationary, depending on the climate state (Bellprat et al. 2013). Current bias adjustment techniques do not incorporate physical processes in their methodologies, and thus inconsistencies between variables develop and introduce another level of uncertainty in the climate change signal (Ho et al. 2012; Maraun 2016, 2017a). Ehret et al. (2012) advocate that bias reduction should be accomplished by increasing model resolution to convection-permitting scales with an ensemble of RCMs and GCMs, in a methodology similar to the one employed in the CORDEX ensemble.

Recently, Soares et al. (2012b, 2017a) characterised the climate change signal for precipitation for Portugal, focusing on the end of the twenty-first century. A significant reduction in precipitation intensity is projected for spring and autumn, as well as a significant reduction of wet days (days with precipitation above 1 mm). The ensuing reduction of soil moisture will have a significant impact on temperature through evapotranspiration (latent heat flux) anomalies (Seneviratne et al. 2010; Dosio 2017). A reduction of wind speed in winter, spring and autumn, along with an increase of northerly flow during summer, is also projected for the end of the twenty-first century (Cardoso et al. 2016; Soares et al. 2017b). Thus a decrease in the transport, from the Atlantic Ocean, of mild moist air during the cold seasons and cool moist air during the summer will exacerbate the effects of global temperature changes. The analysis of temperature extremes for Portugal has been performed either with the results of one model and one scenario (Miranda et al. 2002; Ramos et al. 2011; Pereira et al. 2017) or, for the mid-twenty-first century, with a multi-model ensemble from the ENSEMBLES A1B SRES scenario (Andrade et al. 2014). All of the results point towards higher future mean temperatures, an increase in the frequency and intensity of heat extremes and a reduction in cold extremes. However, no attempt has been made to quantify or reduce uncertainty or to analyse interannual variability. Here we quantify changes in maximum and minimum temperatures over Portugal for the end of the twenty-first century using two scenarios (RCP4.5 and RCP8.5) based on the EURO-CORDEX simulations (15 at 0.11° resolution and 22 at 0.44°) as well as a high resolution WRF simulation (regular 9 km grid on a Lambert projection, ~0.08° in latitude) for RCP8.5. The evolution of extreme heat and cold temperatures throughout the twenty-first century is also examined, as well as the interannual variability. 
In order to reduce the uncertainty associated with the climate change projections we propose an analysis based on multi-model ensembles. Firstly, the historical simulations (1971–2000) are carefully evaluated, focusing on the models’ ability to represent the maximum and minimum temperature spatio-temporal characteristics and PDFs. This will be achieved with the use of standard error statistics and distribution skill scores. Secondly, multi-model ensembles are built, for each variable, from the EURO-CORDEX simulations (8 ensembles in total per variable, i.e. maximum or minimum temperature). A summarised account of the simulations and observations, along with the description of the methods, is given in Sect. 2. The historical simulations from the different models and ensembles are analysed in Sect. 3. In Sect. 4 the temperature changes for the end of the twenty-first century are presented for the multi-model ensembles and the WRF simulations. The main conclusions are summarised in Sect. 5.

2 Data and methods

With the purpose of quantifying the climate signal for the period 2071–2100 with respect to 1971–2000, which is already warmer than the pre-industrial climate, we analyse a WRF high-resolution climate simulation for the CMIP5 RCP8.5 scenario and the EURO-CORDEX RCM simulations for the CMIP5 RCP8.5 and RCP4.5 scenarios.

2.1 EURO-CORDEX simulations

The EURO-CORDEX time periods analysed were 1971–2000 for the historical and 2071–2100 for the future. Two scenarios were used, i.e. RCP4.5 and RCP8.5. The common domain is displayed in Fig. 1a and an example of the topography at 0.11° and 0.44° is presented in Fig. 1c. The maximum and minimum daily temperatures at 0.11° and 0.44° resolution were retrieved from the ESGF (Earth System Grid Federation) portal. When this analysis was initiated, 19 model datasets at 0.44° resolution and 15 at 0.11° resolution were available. A summary of the RCMs, their driving GCMs and the respective resolutions is supplied in Table 1. Kotlarski et al. (2014) and Katragkou et al. (2015) provide detailed descriptions of the different parametrisations. Regarding precipitation, Prein et al. (2016) and Soares and Cardoso (2017d) found significant added value linked to the use of higher resolution within EURO-CORDEX, especially for extremes. The ensemble used herein is an ensemble of opportunity, thus not all GCMs were used by individual RCMs to downscale the historical and future climate. Hence, uncertainty cannot be systematically evaluated due to the nonlinear RCM/GCM interactions, which add another layer of uncertainty (Collins et al. 2006; Déqué et al. 2007; Déqué 2010; Hawkins et al. 2009). Column 5 of Table 1 provides the institute acronyms which were used in the figure captions. When more than one GCM is downscaled, a number is added in order to differentiate between GCMs.

Fig. 1

a EURO-CORDEX common domain, b WRF 9 km simulation domain, c Portuguese Orography (m) according to Gtopo 30 dataset (30″ resolution), EURO-CORDEX RCMs topography (at 0.11° and 0.44° resolution), and d selected Portuguese stations

Table 1 EURO-CORDEX regional climate models considered in the present study, along with the responsible institution, the forcing global climate model, the acronym for each model combination (RCM-GCM) and the used resolutions

2.2 WRF climate simulations

The WRF model was used to perform a set of high resolution climate simulations. All comprise two domains centred on the Iberian Peninsula (Fig. 1b) with 27 km and 9 km resolutions, respectively. The higher resolution domain is nested in the 27 km domain with one-way nesting. Both domains share 49 vertical levels up to 50 hPa and a high planetary boundary layer discretisation (Cardoso et al. 2013). The large scale circulation in the 27 km domain is nudged towards the forcing boundary conditions with spectral nudging. A further description of the model set-up can be found in Soares et al. (2012a, 2017a) and Cardoso et al. (2013). This model set-up was extensively validated for inland surface variables, namely temperatures and precipitation in Portugal (Soares et al. 2012a), Iberian precipitation (Cardoso et al. 2013), and offshore winds (Soares et al. 2014, 2017b). Moreover, its results were used to investigate moisture recycling processes in Iberia (Rios-Entenza et al. 2014) and the diurnal cycle of coastal clouds in western Iberia (Martins et al. 2016). Other examples are the studies focused on the development of wildfire propagation models (Sá et al. 2017; Pinto et al. 2016) and the characterisation of the climatic cooling potential and energy savings for Iberia, using direct ventilation and evaporative cooling systems (Campaniço et al. 2016). The boundary conditions for the climate change assessment derive from an EC-EARTH v2.3 (Hazeleger et al. 2010) climate simulation for the RCP8.5 scenario, encompassing two periods: 1971–2000 and 2071–2100. EC-Earth is the ECMWF Integrated Forecast System (IFS), cycle 31r, with 62 vertical levels and a horizontal spectral resolution of T159 (~ 125 km) for the atmosphere and ~ 1° horizontal resolution for the ocean. The 2-m maximum and minimum daily temperatures are obtained directly from the WRF output.

2.3 Observations

Maximum and minimum daily temperatures, at 2 m AGL, were obtained from the Instituto Português do Mar e Atmosfera (IPMA, the Portuguese weather service). Only 42 stations with data from 1971 to 2000 were considered (Fig. 1d). The altitude differences between the stations and the model grid points were mitigated through a temperature correction assuming a 6.5 °C km−1 lapse rate. According to Zhang et al. (2009), the difference between using local lapse rates and the standard lapse rate in temperature corrections is very small.
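
The altitude correction described above can be illustrated with a minimal sketch. It assumes that the model temperature is adjusted to the station height with the constant 6.5 °C km−1 lapse rate; the function name and the example heights are hypothetical and are not taken from the study.

```python
import numpy as np

LAPSE_RATE = 6.5e-3  # °C per metre, i.e. the 6.5 °C km^-1 quoted in the text


def correct_to_station_height(t_model, z_model, z_station, lapse_rate=LAPSE_RATE):
    """Adjust model temperatures (°C) from grid-point height to station height (m).

    Assumption: temperature decreases linearly with height, so a grid point
    lying above the station is warmed and one lying below it is cooled.
    """
    t_model = np.asarray(t_model, dtype=float)
    return t_model + lapse_rate * (z_model - z_station)


# Hypothetical case: the grid point lies 200 m above the station.
t_adj = correct_to_station_height([10.0, 12.5], z_model=700.0, z_station=500.0)
print(t_adj)
```

With a 200 m height difference the adjustment amounts to 1.3 °C, which gives a sense of how large this correction can be in steep terrain.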

2.4 Methods

2.4.1 Ensemble building

In order to evaluate the ability of each model to reproduce the historical period climate, the daily maximum and minimum temperatures of each RCM were compared with IPMA’s 42 station observations. As in Soares et al. (2017a), bias (1), normalized bias (2), mean absolute error (3), mean absolute percentage error (4), root mean square error (5), normalized standard deviation (6), spatial correlation (7) and Willmott-D score (8) (Willmott et al. 2012) were computed by merging time and space. The grid points considered were the nearest neighbours to the IPMA stations.

$$Bias=\frac{1}{N}\sum\limits_{k=1}^{N}\left( m_k - o_k \right),$$
(1)
$$Bias\% =\frac{\sum\nolimits_{k=1}^{N}\left( m_k - o_k \right)}{\sum\nolimits_{k=1}^{N} o_k},$$
(2)
$$MAE=\frac{1}{N}\sum\limits_{k=1}^{N}\left| m_k - o_k \right|,$$
(3)
$$MAPE=\frac{\sum\nolimits_{k=1}^{N}\left| m_k - o_k \right|}{\sum\nolimits_{k=1}^{N} o_k},$$
(4)
$$RMSE=\sqrt{\frac{1}{N}\sum\limits_{k=1}^{N}\left( m_k - o_k \right)^2},$$
(5)
$$\sigma_n=\frac{\sigma_m}{\sigma_o}=\frac{\sqrt{\frac{1}{N}\sum\nolimits_{k=1}^{N}\left( m_k - \bar{m} \right)^2}}{\sqrt{\frac{1}{N}\sum\nolimits_{k=1}^{N}\left( o_k - \bar{o} \right)^2}},$$
(6)
$$r=\frac{\sum\nolimits_{k=1}^{N}\left( o_k - \bar{o} \right)\left( m_k - \bar{m} \right)}{\sqrt{\sum\nolimits_{k=1}^{N}\left( o_k - \bar{o} \right)^2\sum\nolimits_{k=1}^{N}\left( m_k - \bar{m} \right)^2}},$$
(7)
$$D=\left\{ \begin{array}{ll} 1 - \frac{\sum\nolimits_{k=1}^{N}\left| m_k - o_k \right|}{2\sum\nolimits_{k=1}^{N}\left| o_k - \bar{o} \right|}, & \text{if } \sum\limits_{k=1}^{N}\left| m_k - o_k \right| \leqslant 2\sum\limits_{k=1}^{N}\left| o_k - \bar{o} \right| \\ ~ \\ \frac{2\sum\nolimits_{k=1}^{N}\left| o_k - \bar{o} \right|}{\sum\nolimits_{k=1}^{N}\left| m_k - o_k \right|} - 1, & \text{if } 2\sum\limits_{k=1}^{N}\left| o_k - \bar{o} \right| < \sum\limits_{k=1}^{N}\left| m_k - o_k \right| \end{array} \right.,$$
(8)

where N is the number of observed/modelled days, \(m_k\) and \(o_k\) are the modelled and observed values, and \(\bar {o}\) and \(\bar {m}\) represent the means of the observed and simulated values. A simple bootstrapping technique (Wilks 2006, p. 166 ff) using 10,000 samples was used to estimate the 90% confidence interval of the different error statistics. The analysis of the probability distribution functions was also performed using the Yule-Kendall skewness measure (9) (Ferro et al. 2005), the Anderson–Darling test (10) (Anderson and Darling 1954) and the PDF matching scores suggested by Perkins et al. (2007) and Boberg et al. (2009) (11).
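
As an illustration of Eqs. (1)–(8) and of the bootstrap confidence interval, the following is a minimal sketch rather than the code used in the study. It assumes paired one-dimensional arrays of modelled and observed values (time and space already merged) and assumes that the bootstrap resamples model/observation pairs with replacement; the text does not specify the resampling unit. All function names are hypothetical.

```python
import numpy as np


def error_statistics(m, o):
    """Error measures of Eqs. (1)-(8) for paired modelled (m) and observed (o) values."""
    m = np.asarray(m, dtype=float)
    o = np.asarray(o, dtype=float)
    diff = m - o
    abs_err = np.abs(diff).sum()
    abs_dev = np.abs(o - o.mean()).sum()
    # Willmott D score, Eq. (8): two branches depending on the size of the error
    if abs_err <= 2.0 * abs_dev:
        d = 1.0 - abs_err / (2.0 * abs_dev)
    else:
        d = 2.0 * abs_dev / abs_err - 1.0
    return {
        "bias": diff.mean(),                    # Eq. (1)
        "bias_pct": diff.sum() / o.sum(),       # Eq. (2)
        "mae": np.abs(diff).mean(),             # Eq. (3)
        "mape": abs_err / o.sum(),              # Eq. (4)
        "rmse": np.sqrt((diff ** 2).mean()),    # Eq. (5)
        "sigma_n": m.std() / o.std(),           # Eq. (6), population std (1/N)
        "r": np.corrcoef(m, o)[0, 1],           # Eq. (7)
        "willmott_d": d,                        # Eq. (8)
    }


def bootstrap_ci(m, o, stat_fn, n_samples=10_000, level=0.90, seed=0):
    """Percentile bootstrap interval of stat_fn(m, o), resampling pairs (assumption)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    o = np.asarray(o, dtype=float)
    values = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.integers(0, m.size, size=m.size)  # sample indices with replacement
        values[i] = stat_fn(m[idx], o[idx])
    alpha = (1.0 - level) / 2.0
    return np.quantile(values, [alpha, 1.0 - alpha])


# Hypothetical toy data: the model is exactly 1 °C warmer than the observations.
obs = np.array([10.0, 12.0, 14.0, 16.0])
mod = obs + 1.0
stats = error_statistics(mod, obs)
ci_low, ci_high = bootstrap_ci(mod, obs, lambda a, b: np.mean(a - b), n_samples=500)
```

Passing the statistic as a function keeps the resampling loop independent of the chosen measure, so the same routine can be reused for any of the eight statistics.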

$$YK=~{\left[ {\frac{{\left( {{P_{95}} - {P_{50}}} \right) - \left( {{P_{50}} - {P_5}} \right)}}{{\left( {{P_{95}} - {P_5}} \right)}}} \right]_{model}} - {\left[ {\frac{{\left( {{P_{95}} - {P_{50}}} \right) - \left( {{P_{50}} - {P_5}} \right)}}{{\left( {{P_{95}} - {P_5}} \right)}}} \right]_{obs}},$$
(9)
$${A^2}=\frac{{MO}}{{{{\left( {M+O} \right)}^2}}}\mathop \int \nolimits_{0}^{1} \frac{{{{\left( {{E_M} - {E_O}} \right)}^2}}}{{E\left( {1 - E} \right)}}~dE,$$
(10)
$$S=\mathop \int \nolimits^{} \hbox{min} \left( {{E_M},{E_O}} \right),$$
(11)

where the different percentiles are indicated by P, \(E_M\) and \(E_O\) stand for the empirical distribution functions of the models and observations, respectively, and E is the empirical distribution function of the pooled model and observed sample (Demortier 1995). Following Christensen et al. (2010), a ranking of the RCMs forced by the GCMs is performed in order to select the best models in relation to the maximum and minimum temperatures and to build different multi-model ensembles afterwards. The inverse of the absolute value of an individual metric was firstly calculated for metrics in which the best expected result should be closest to zero (bias, MAE, RMSE). As the optimal value for the normalized standard deviation is 1, and since in some cases the deviations are very small, the metric is transformed in the following manner:

$${\vartheta _n}=\left\{ {\begin{array}{*{20}{c}} {{\sigma _n}~~if~{\sigma _n}<1} \\ {\frac{1}{{{\sigma _n}}}~~if~{\sigma _n}>1} \end{array}} \right.$$
(12)

Following a similar reasoning for the Yule-Kendall, it becomes:

$$Y{K_{new}}=\left\{ {\begin{array}{*{20}{c}} {YK+1~~if~YK<0} \\ {\frac{1}{{YK+1}}~~if~YK>0} \end{array}} \right.$$
(13)
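
The distribution measures and the transformations of Eqs. (12) and (13) can be sketched as follows. This is an illustration under stated assumptions: the PDF matching score is computed here as the overlap of binned relative frequencies, in the manner of Perkins et al. (2007), with a hypothetical 1 °C bin width, and the equality cases that Eqs. (12) and (13) leave undefined are assigned to the second branch. Function names are hypothetical.

```python
import numpy as np


def yule_kendall(x):
    """Yule-Kendall skewness of one sample, the bracketed term of Eq. (9)."""
    p5, p50, p95 = np.percentile(x, [5, 50, 95])
    return ((p95 - p50) - (p50 - p5)) / (p95 - p5)


def yk_difference(model, obs):
    """Eq. (9): model skewness minus observed skewness."""
    return yule_kendall(model) - yule_kendall(obs)


def pdf_overlap_score(model, obs, bin_width=1.0):
    """Overlap of binned relative frequencies (assumed reading of Eq. (11))."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    lo = np.floor(min(model.min(), obs.min()))
    hi = np.ceil(max(model.max(), obs.max())) + bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)  # common bins for both samples
    f_m, _ = np.histogram(model, bins=edges)
    f_o, _ = np.histogram(obs, bins=edges)
    return np.minimum(f_m / model.size, f_o / obs.size).sum()


def transform_sigma(sigma_n):
    """Eq. (12): map the normalised standard deviation onto (0, 1], best value 1."""
    return sigma_n if sigma_n < 1.0 else 1.0 / sigma_n


def transform_yk(yk):
    """Eq. (13): map the Yule-Kendall difference onto (0, 1], best value 1."""
    return yk + 1.0 if yk < 0.0 else 1.0 / (yk + 1.0)


# Hypothetical symmetric sample: skewness 0 and perfect overlap with itself.
sample = np.arange(101.0)
yk_sym = yule_kendall(sample)
s_same = pdf_overlap_score(sample, sample)
```

After these transformations every score is bounded by 1 and larger values indicate better agreement, which is what allows the metrics to be combined in the ranking step.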

For each metric, the individual model ranks were obtained by dividing each value by the sum of all values from all of the models. In this way, the sum of the ranks for each metric is equal to 1. Weights were, then, constructed by either averaging the ranks of all of the metrics or by multiplying the ranks. In the former the importance of each metric is averaged while in the latter the metrics are considered as independent of each other and a good performing model needs high scores in all metrics. Finally, each weight is divided by the sum of the weights so that the total sum of the weights is equal to 1. For each temperature (maximum or minimum), two multi-model ensembles were thus built (ENS_WM and ENS_WP). Additionally, another ensemble in which the weights are equal for all models (\(1/\left[ {no.~of~models} \right]\), ENS_F) was also considered.
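
The construction of the weights described above can be sketched as follows. The sketch assumes a table of already transformed scores (larger is better) with one row per model and one column per metric, and assumes that the averaged and multiplied ranks correspond to ENS_WM and ENS_WP, respectively. The numbers are hypothetical and the function name is not from the study.

```python
import numpy as np


def ensemble_weights(score_table, method="mean"):
    """Model weights from a (n_models, n_metrics) table of scores, larger is better.

    method: "mean" averages the ranks, "product" multiplies them,
    "flat" gives every model the same weight.
    """
    table = np.asarray(score_table, dtype=float)
    # Rank of each model for each metric: each column sums to 1
    ranks = table / table.sum(axis=0, keepdims=True)
    if method == "mean":
        raw = ranks.mean(axis=1)
    elif method == "product":
        raw = ranks.prod(axis=1)
    elif method == "flat":
        raw = np.ones(table.shape[0])
    else:
        raise ValueError(f"unknown method: {method}")
    return raw / raw.sum()  # final weights sum to 1


# Hypothetical scores for three models: 1/|bias| and the transformed sigma_n.
table = np.array([[2.0, 0.9],
                  [1.0, 0.8],
                  [0.5, 0.5]])
w_mean = ensemble_weights(table, "mean")
w_prod = ensemble_weights(table, "product")
w_flat = ensemble_weights(table, "flat")
```

In this toy case the product weighting separates the models more strongly than the mean weighting, since a model must score well in every metric to keep a large weight.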

For all of the ensembles, the ensemble temperature (maximum or minimum) is obtained from the ensemble of models through:

$$\bar{T}=\frac{\sum\nolimits_{i=1}^{N} T_i w_i}{\sum\nolimits_{i=1}^{N} w_i}$$
(14)

where \(T_i\) is the maximum or minimum temperature for model i, \(w_i\) is the weight for model i, and N is the number of models. The ensemble PDFs are obtained by:

$$PDF\left( T \right)=\mathop \sum \limits_{{i=1}}^{N} PDF{(T)_i}{w_i}$$
(15)

The effective number of models in each ensemble can be assessed through (Xu et al. 2010):

$$N_{eff}=\frac{1}{\sum\nolimits_{i=1}^{N}\left[ \frac{w_i}{\sum\nolimits_{j=1}^{N} w_j} \right]^2}$$
(16)

Thus, \({N_{eff}}=N\) for the ensembles with equal weighting (ENS_F) and \(~{N_{eff}}<N\) for non-uniform weights (ENS_WM and ENS_WP). The four ensembles are subsequently evaluated, and ENS_F along with the best performing ensemble are chosen to represent future climate. An assessment of the uncertainty in the climate change projections associated with the weighting is also carried out.
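
Eqs. (14)–(16) translate directly into a few lines of array code. The sketch below assumes that the model fields (or model PDFs evaluated on a common set of bins) are stacked along the first axis; function names and values are hypothetical.

```python
import numpy as np


def ensemble_mean(fields, weights):
    """Eq. (14): weighted mean of model fields stacked along axis 0."""
    fields = np.asarray(fields, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w, fields, axes=(0, 0)) / w.sum()


def ensemble_pdf(pdfs, weights):
    """Eq. (15): weighted sum of model PDFs defined on common bins."""
    pdfs = np.asarray(pdfs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w, pdfs, axes=(0, 0))


def effective_number(weights):
    """Eq. (16): effective number of models for a set of weights."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / np.sum((w / w.sum()) ** 2)


# Hypothetical two-model ensemble on two grid points.
t_ens = ensemble_mean([[10.0, 20.0], [20.0, 40.0]], [0.75, 0.25])
n_flat = effective_number([0.25, 0.25, 0.25, 0.25])
n_skew = effective_number([0.7, 0.1, 0.1, 0.1])
```

With equal weights the effective number equals the number of models, whereas the skewed weights in the example give an effective number below 2, showing how strongly a single dominant model reduces the effective ensemble size.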

2.4.2 Percentiles thresholds

The analysis of the changes in the tails of the temperature PDFs will be performed using percentiles to set the extreme thresholds. This minimises the impact of model biases on the results since each threshold is set for each grid point and model. This is a common approach in several multi-model studies investigating projected changes in the PDF tails, e.g. Meehl and Tebaldi (2004), Fischer and Schär (2010), Schoetter et al. (2015) and Argüeso et al. (2016). To avoid artificial discontinuities at the beginning and the end of the base period (historical) percentile time series, we follow the procedure of Zhang et al. (2005), i.e.:

  1. Each year under analysis was removed from the 30 year historical time series and its data were replaced by the data from the following year;

  2. The percentiles from the preceding time series are determined, the data from the year under analysis are compared with these thresholds and the exceedance rates are determined;

  3. The procedure is repeated 28 times;

  4. The percentiles for the analysis of the last year are determined by averaging the preceding 29 estimates from the previous steps;

  5. The percentiles from step 4 are used for the future thresholds.

The percentiles are obtained for each day of the year, centred on a 31 day window, so that the percentile exceedances are consistent with the heat wave methodology adopted by Russo et al. (2015). The heat temperature exceedances were obtained for an extended summer, from May to September (MJJAS), as current summer extremes are expected to occur in late spring and early autumn in the future. The cold exceedances refer to an extended winter from November to February (NDJF), since November’s cold extremes can already have an impact on human health.
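
The calendar-day thresholds with a centred 31 day window can be sketched as follows. This is a simplified illustration only: it assumes a 365 day calendar stored as an array of shape (years, days), wraps the window around the ends of the same year, and does not include the resampling of the base period described in the steps above. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def calendar_day_percentile(tx, q=90, half_window=15):
    """Percentile threshold for each calendar day from a centred window.

    tx: array of shape (n_years, 365) with daily values (no leap days).
    half_window=15 gives the 31 day window mentioned in the text.
    Simplification: the window wraps around within the same year.
    """
    tx = np.asarray(tx, dtype=float)
    n_days = tx.shape[1]
    offsets = np.arange(-half_window, half_window + 1)
    thresholds = np.empty(n_days)
    for d in range(n_days):
        days = (d + offsets) % n_days          # indices of the 31 day window
        thresholds[d] = np.percentile(tx[:, days], q)
    return thresholds


# Hypothetical 30 year series in which the value equals the day index.
tx_hist = np.tile(np.arange(365.0), (30, 1))
thr50 = calendar_day_percentile(tx_hist, q=50)

# Exceedance rate of a series against its own thresholds.
exceed_rate = (tx_hist > thr50).mean()
```

Pooling all years and the 31 neighbouring days gives 930 values per calendar day for a 30 year period, which makes the estimate of the upper and lower percentiles considerably more stable than one based on 30 values.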

2.4.3 Heat waves

A heat wave usually refers to a number of consecutive days in which the temperature is excessively higher than normal (Perkins et al. 2012). However, several authors use different definitions, which may have a significant influence on the assessment of the impact of climate change on this phenomenon (Jacob et al. 2014). Here, we base our description on the World Meteorological Organization’s definition (Frich et al. 2002). Heat waves are defined as periods of more than 5 consecutive days with maximum temperature above the historical period 90th percentile. The percentiles are computed as in the previous section. The magnitude of each individual heat wave is computed as the sum of the daily magnitudes of the consecutive days composing the heat wave. The daily magnitudes are calculated according to Russo et al. (2015):

$${M_d}\left( {T{x_d}} \right)=\left\{ {\begin{array}{*{20}{l}} {\frac{{T{x_d} - ~{P_{25}}}}{{{P_{75}} - {P_{25}}}} \quad if~T{x_d}>{P_{25}}~} \\ ~ \\ {0 \quad if~T{x_d} \leq {P_{25}}} \end{array}} \right.$$
(17)

where \(T{x_d}\) is the daily maximum temperature, and \({P_{25}}\) and \({P_{75}}\) are the 25th and 75th percentiles, respectively, of the historical time series. The percentage of land area under a heat wave is the ratio of the number of grid points under the heat wave to the total number of grid points within the Portuguese borders.
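
The heat wave definition and Eq. (17) can be illustrated for a single grid point with the sketch below. It reads "more than 5 consecutive days" as a minimum length of 6 days, and uses hypothetical constant thresholds for simplicity; arrays of daily thresholds would work in the same way. The function names are not from the study.

```python
import numpy as np


def daily_magnitude(tx, p25, p75):
    """Eq. (17): daily heat wave magnitude, zero when tx does not exceed P25."""
    tx = np.asarray(tx, dtype=float)
    return np.where(tx > p25, (tx - p25) / (p75 - p25), 0.0)


def heat_waves(tx, p90, p25, p75, min_length=6):
    """Return (start index, length, magnitude) for each heat wave in a daily series.

    A heat wave is a run of at least min_length consecutive days with tx > p90;
    its magnitude is the sum of the daily magnitudes over the run.
    """
    tx = np.asarray(tx, dtype=float)
    hot = tx > p90
    md = daily_magnitude(tx, p25, p75)
    events = []
    start = None
    # A trailing False closes a run that reaches the end of the series
    for i, is_hot in enumerate(np.append(hot, False)):
        if is_hot and start is None:
            start = i
        elif not is_hot and start is not None:
            length = i - start
            if length >= min_length:
                events.append((start, length, float(md[start:i].sum())))
            start = None
    return events


# Hypothetical series: a 6 day hot spell (counted) and a 5 day one (not counted).
series = [20.0] * 3 + [35.0] * 6 + [20.0] * 2 + [35.0] * 5 + [20.0]
events = heat_waves(series, p90=30.0, p25=20.0, p75=30.0)
```

Applying the same function to every grid point and counting, for each day, the points that lie inside an event would give the fraction of land area under a heat wave.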

3 Results

3.1 Assessment of the historical simulations

The ability of the RCMs' historical simulations to represent present climate is fundamental for climate change assessment studies. Fig. S1a shows the daily mean maximum temperatures for the 42 IPMA stations, WRF and EURO-CORDEX 0.11 from 1971 to 2000. In the IPMA stations, the lowest values are associated with altitude (stations above 1000 m), followed by the coastal areas. The influence of the northern topography is mostly visible in the maximum temperatures, where larger values (21–23 °C) occur in the valleys, whilst the higher stations are 2 to 3 °C cooler. South of the Tagus River, the low rolling hills display the highest temperatures, particularly near the Spanish border. An overview of the models' representation of the mean maximum temperature shows that they reproduce the major characteristics enumerated above well; yet some spread between models is noticeable. The individual grid point biases between the local observational stations and the simulations (WRF and the 0.11 RCMs) are presented in Fig. 2a. It is evident that all models underestimate the maximum temperatures, and the differences in skill between models are clearly visible. The WRF simulation forced by EC-EARTH has biases very similar to those found in Soares et al. (2012a), where the same model setup was forced by ERA-Interim. The higher biases are mostly associated with altitude or steep terrain. In the 0.11 simulations, the influence of the driving GCMs is apparent in the similarity between the CLM (1–4) and SMHI (1–4) results. The MOHC-HadGEM2-ES coupled with the RCMs is associated with the lowest biases over the whole domain. The second best pairing is associated with the GCM MPI-M-MPI-ESM-LR. This is in agreement with Brands et al. (2013), who found that, when used as boundary conditions for a domain encompassing EURO-CORDEX, MPI-ESM-LR and HadGEM2-ES are among the best of the seven analysed models, whereas MIROC-ESM and IPSL-CM5-MR generally perform worse. 
The lower maximum temperatures can be attributed to an overestimated meridional pressure gradient during winter and spring, which leads to wetter (see Soares et al. 2017) and milder seasons. In summer, the Sahara and Iberian heat lows are generally too weakly captured, implying, once again, lower maximum daily temperatures. At the coarser resolution (0.44) the biases are slightly larger than at the higher resolution (Fig. 3a). In general, the RCMs at the 0.44 resolution are colder than at 0.11. Nevertheless, the results for the RCMs forced by HadGEM2-ES are very similar between resolutions (KNMI2 and SMHI2, see Fig. S2a). These differences are summarised in Fig. 4a, where box plots with biases from all models are depicted (these biases are computed for each model by pooling together all the seasonal mean maximum temperatures for all stations; the individual model biases are given in the supplemental Table S1). For both resolutions the median bias (~ −1.35 °C) and the third quartile (~ −0.65 °C) are similar. However, the lower end of the bias distribution at 0.44 resolution is lower than at 0.11, which reflects the larger negative values in Fig. 3a. Table S1 also shows that the low biases associated with the RCMs coupled with MOHC-HadGEM2-ES are obtained through compensation between positive and negative biases across the stations, as indicated by the high MAE. The similarity between the absolute bias value and the MAE also points to a systematic negative bias, which could be due to the cold bias in the forcing GCMs. This is not surprising since the CMIP5 multi-model GCM bias is around −1 °C and the MAE is between 0.5 and 1.5 (chapter nine WG1AR5, IPCC 2013), well in line with the values found here.

Fig. 2

a Differences of mean maximum daily temperature between 0.11 RCMs control runs (1971–2000) and station observations and b differences of mean minimum daily temperature. Note that, whenever possible, RCMs forced by identical GCM are in the same column

Fig. 3
figure 3

As Fig. 2 but for EURO-CORDEX-44

Fig. 4

Global error measures of WRF (9 km grids) and of EURO-CORDEX RCMs Tmax/Tmin against the observational stations for the Portuguese mainland. The error measures are a bias (°C), b normalised standard deviation, c S (Boberg et al. 2010) and d Yule-Kendall

Contrary to the maximum temperature, the mean minimum temperature has a northeast to southwest gradient. The lowest minimum values are once again associated with the high-altitude stations in the northwest, while the highest are observed on the southwest and southern coasts (Fig. S1b). All models are able to depict this gradient; however, a larger dispersion in the results is observable. The biases are no longer consistently negative (Fig. 2b). The RCMs' internal model variability is now believed to have a larger influence than for the maximum temperature, i.e., the similarity between the results from RCMs forced by the same GCM is no longer evident. While the minimum temperatures from the CLM/HadGEM2-ES and CLM/MPI-ESM-LR pairs (CLM2 and CLM4, respectively) are higher than the observations, SMHI2 and SMHI4 are colder. Similar differences are found between DMI and KNMI1 and between IPSL and SMHI5. The WRF9km minimum temperature is markedly lower than the observations in the southern part of the country, whilst the biases on the northern coast are positive. Yet the EURO-44 IDL simulation, which is also performed with WRF and EC-EARTH, has lower biases. This is in line with García-Díez (2015), who found that differences in configuration lead to a large spread between ensemble members. The spread between the model biases is reflected in the large interquartile range of the 0.11 resolution (Fig. 4a). Now one-third of the models have positive biases and only one-third have a systematic negative bias inherited from the GCM (Table S2). In the 0.44 resolution, half of the models have positive biases while the other half have negative biases. Nevertheless, the biases and MAE are smaller for minimum temperature than for maximum temperature, for both resolutions.

The spatial–temporal variability of the observations is overestimated by the majority of the models and at all resolutions, since the normalised standard deviation is greater than one (Fig. 4b). Half of the models overestimate this feature by more than 10%. The similarity between the modelled and observed temperature distributions is summarised by the S score (Fig. 4c). The distributions have a reasonable degree of overlap, with S scores above 80%. Only CNRM at 0.44° resolution and the IPSL maximum temperature distribution have S scores around 77%. The similarity to the observed distributions is greater for the minimum temperature, which might be due to the larger systematic negative bias of the maximum temperature; however, the shape of the PDFs is comparable (Fig. S3). The differences in skewness, as measured by the Yule-Kendall difference score, highlight this feature, since the values for maximum temperature are near zero and one order of magnitude smaller than the observed Yule-Kendall (0.1837). In the case of the minimum temperature, the models' PDFs have a more Gaussian shape than the observations, and thus lower Yule-Kendall scores than the observations. The median difference between the scores, although very small, is of the same order of magnitude as the observed Yule-Kendall (−0.0513).
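The two distribution-based measures can be sketched as follows. This is an illustration under stated assumptions, not the exact formulation used in this study: the S score is written here in its common overlap form (sum over bins of the smaller of the two relative frequencies), and the Yule-Kendall skewness is written in its quartile-based form. The function names, the 1 °C bin width and the sample values are hypothetical.

```python
from statistics import quantiles

def s_score(model, obs, bin_width=1.0):
    """Overlap of two binned relative-frequency distributions (0 to 1).

    Assumed overlap form: sum over bins of min(freq_model, freq_obs).
    """
    lo = min(min(model), min(obs))

    def rel_freq(sample):
        counts = {}
        for x in sample:
            k = int((x - lo) // bin_width)  # index of the bin containing x
            counts[k] = counts.get(k, 0) + 1
        return {k: c / len(sample) for k, c in counts.items()}

    fm, fo = rel_freq(model), rel_freq(obs)
    return sum(min(fm.get(k, 0.0), fo.get(k, 0.0)) for k in fm.keys() | fo.keys())

def yule_kendall(sample):
    """Quartile-based skewness: 0 if symmetric, > 0 for a longer upper tail."""
    q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

# Hypothetical daily temperatures (deg C); not real data.
obs = [1.0, 2.0, 3.0, 6.0, 20.0]
model = [1.0, 2.0, 3.0, 4.0, 5.0]
print(s_score(model, obs))                      # overlap between 0 and 1
print(yule_kendall(model) - yule_kendall(obs))  # difference in skewness
```

An S score of 1 means identical binned distributions, and a negative Yule-Kendall difference means the model distribution is less positively skewed than the observed one.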

The ranking of each model for the individual global measures, for the two variables, is presented in Fig. 5. It highlights the consistency of some models across different measures. For the 0.11 resolution and in the case of minimum temperature, CLM1 and CLM3 have the highest ranking in the statistics measuring deviations from the mean (bias, MAE, RMSE, Wilmott D); however, they do not represent the dispersion (\({\sigma _n}\)) and the spatial correlation as well as other models. Although ranked lower, the other CLM simulations have similar performances. SMHI2, SMHI4 and IPSL also have consistent rankings across these measures. IPSL stands out in the analysis of the PDFs, ranking first in the S score and in the Anderson-Darling measure. At the other end of the spectrum, CNRM is clearly the worst performing model for almost all measures. The rankings for the maximum temperature are significantly different from those for the minimum temperature. Now MPI and KNMI2 are undoubtedly the best performing models and IPSL is the worst. In the 0.44 resolution three models stand out with consistently poor results: CNRM, DMI and HMS. As before, IPSL is able to represent the minimum temperature very well (rank 2), but it is the worst for maximum temperature. The inverse happens with MPI, which ranks 17th and 2nd for minimum and maximum temperatures, respectively. The best model for the minimum temperature is now SMHI8, while SMHI9 is the best for maximum.

Fig. 5

Maximum and minimum temperatures, EURO-CORDEX models’ ranking positions, at 0.11 and 0.44 resolutions

The different rankings of the models for the two variables imply that a single ensemble constructed for both temperatures would fail to perform best for either variable. Thus, a separate ensemble was produced for each variable. Since most models have very similar performances for bias, MAE, RMSE and Wilmott D, and these all assess similar characteristics, only the latter two were considered for the ensemble weights. The Yule-Kendall score was also disregarded. Different score combinations were also investigated and the resulting weights and ranks were very similar; thus only RMSE, \({\sigma _n}\), Wilmott D, spatial correlation, S and Anderson-Darling contribute to the definition of the ensemble weights. The weights for each ensemble are given in Table 2. As referred to earlier, two types of ensembles were constructed: one where the individual scores are averaged (ENS_WM) and another where the scores are multiplied (ENS_WP). In the case of the 0.44 resolution, there are different ensembles for each scenario, since IDL and HMS do not have results for RCP4.5.

Table 2 Weights for the different ensembles. In the ensemble names, 0.11 and 0.44 refer to the RCM resolution, WP to the product of scores and WM to averaged scores. The number of effective models in each ensemble is given in the last row of each column

The number of effective models in the ensembles built through the score averages is almost the total number of models. Thus, poorly performing models still have a reasonable input into the ensemble, limiting the ability of the ensemble to reduce biases. The use of the product of the scores significantly reduces the number of effective models to between 20% and 30% of the number of models, and in this case the differences between weights can be one or two orders of magnitude. The same measures employed for the individual model performance were then applied to the three types of ensemble in order to assess which performs best (Table 3). The ensemble based on the product of model scores is the one which obtains the best results for both variables and resolutions. Thus, this ensemble, along with the ensemble in which all models have equal weights, will be used to assess the climate change signal from here onwards.
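The contrast between the two aggregation rules can be sketched as below. The scores are invented, the function names are hypothetical, and the effective number of models is computed here as the inverse of the sum of squared normalised weights, which is an assumption made for illustration (the definition used in the study is given in its methods section).

```python
from math import prod

def ensemble_weights(scores, combine="product"):
    """Normalised weights from per-model skill scores.

    scores: {model: [score, ...]}, each score in (0, 1], higher is better.
    combine: "product" (as in ENS_WP) or "mean" (as in ENS_WM).
    """
    if combine == "product":
        raw = {m: prod(s) for m, s in scores.items()}
    else:
        raw = {m: sum(s) / len(s) for m, s in scores.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

def effective_models(weights):
    """Assumed definition: 1 / sum(w^2); equals N when all weights are equal."""
    return 1.0 / sum(w * w for w in weights.values())

# Hypothetical normalised scores for three models on six measures.
scores = {
    "A": [0.9] * 6,
    "B": [0.6] * 6,
    "C": [0.3] * 6,
}
wm = ensemble_weights(scores, "mean")
wp = ensemble_weights(scores, "product")
print(effective_models(wm), effective_models(wp))
```

With averaged scores the three hypothetical models keep comparable weights, whereas multiplying six scores compounds the differences, so the weights differ by orders of magnitude and the effective number of models drops towards one.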

Table 3 Global statistical errors and skill scores for the EURO-CORDEX multi-model ensembles

The seasonal mean maximum and minimum temperatures for both ensembles and resolutions for the historical period are depicted in Fig. 6a, b, respectively. Both ensembles correctly display the southwest-northeast winter and autumn temperature gradients as well as the north–south spring and summer gradients. The strength of the temperature gradients is the major difference between them: they are more pronounced in the weighted ensemble. The summer maximum temperature for the 0.11 resolution is a good example, in which the topographic effects of the mountains in central Portugal are more evident. In the 0.44 resolution the orography is smoothed and thus the topographic effects visible in the 0.11 are no longer observed, but a difference in the north–south gradient is still clear. The full ensemble is generally colder in all seasons for both resolutions and for both variables. The WRF9km seasonal temperatures are also presented and further highlight the impact of topography on the temperature distribution. The WRF9km maximum temperature results are closer to the 0.11 weighted ensemble; however, the minimum temperature negative bias masks the resemblance.

Fig. 6

Seasonal a maximum and b minimum temperatures of station observations, WRF9km, EURO-CORDEX 0.44 and 0.11 ensembles, for the historical runs (period 1971–2000)

3.2 Climate change assessment for Portugal

The projected temperature changes over Portugal are rather severe, and are analysed in the current section. The focus will be on the RCP8.5 scenario and, more briefly, on the main differences under the milder RCP4.5 scenario. In general, the warming projections for the end of the century show larger values for maximum than for minimum temperature, for both scenarios, and for yearly mean daily and seasonal temperatures (Figs. 7, 8, 9). In RCP8.5, and in agreement across the different multi-model ensembles, the mean daily maximum temperature is expected to increase by up to 4.5 °C in some inland regions, and by between 3 and 4 °C near the coasts. The higher resolution WRF projections are milder, reaching maximum values around 4 °C. The yearly mean daily minimum temperature warming is expected to be between 3 and 4 °C. The four multi-model ensembles share the same kind of signal, with the weighted ensembles projecting smaller changes. Both maximum and minimum temperature increases follow a sharp west-east gradient. The WRF minimum temperature change is, however, an exception: its pattern is mostly linked to orography, and milder increases of around 3 °C are expected in most of the territory. In the case of the RCP4.5 scenario all the projections show increases ranging from 1.5 °C near the coast to 2.8 °C in the eastern border regions. Statistical significance was investigated following Tebaldi et al. (2011), i.e., Student's t-tests were performed for each individual model and grid point, and the change was considered significant if and only if 50% or more of the models show statistically significant differences and 80% agree on the sign of the change. All models agree on the sign of the change and more than 50% of the models that compose the ensembles have statistically significant changes at the 95% confidence level for all grid points. Thus, the results for all ensembles and scenarios show statistically significant changes at the 95% confidence level.
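The two-part significance criterion described above can be sketched for a single grid point as follows. This is a minimal illustration: the function name and input values are hypothetical, the per-model p-values are assumed to have been obtained beforehand from the t-tests, and the sign agreement is evaluated over all models, as the text states.

```python
def robust_change(deltas, p_values, alpha=0.05,
                  min_significant=0.5, min_agreement=0.8):
    """Apply the criterion at one grid point.

    deltas: per-model change (future minus historical).
    p_values: per-model t-test p-values, assumed precomputed.
    """
    n = len(deltas)
    significant = sum(p < alpha for p in p_values) / n
    positive = sum(d > 0 for d in deltas) / n
    negative = sum(d < 0 for d in deltas) / n
    agreement = max(positive, negative)
    return significant >= min_significant and agreement >= min_agreement

# Hypothetical five-model example (deg C and p-values); not real data.
print(robust_change([3.1, 2.8, 4.0, 3.5, 2.2], [0.01, 0.20, 0.03, 0.04, 0.30]))
print(robust_change([3.1, -0.4, 4.0, -0.2, 2.2], [0.01, 0.01, 0.01, 0.01, 0.01]))
```

In the first call three of five models are significant and all agree on the sign, so the change passes; in the second call every model is significant but only 60% agree on the sign, so it fails.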

Fig. 7

WRF9km, EURO-CORDEX 0.44 and 0.11 ensembles climate change signal for daily temperatures a maximum, and b minimum, (2071–2100 minus 1971–2000). Areas with dots specify changes not statistically significant using a Student’s t-test at the 95% confidence level

Fig. 8

As Fig. 7a but for seasons

Fig. 9

As Fig. 7b but for seasons

Fig. 10

Temperature delta values (a, b: maximum; c, d: minimum) averaged over Portugal for the period 2071–2100 with respect to 1971–2000. The delta values are calculated for the mean daily values in each season, in winter (DJF, X axis) and summer (JJA, Y axis) and in spring (MAM, X axis) and autumn (SON, Y axis)

The inspection of the projected changes for the seasonal maximum temperatures reveals a strong seasonality. For RCP8.5, the multi-model ensembles project maximum temperature increases between 2 and 4 °C for winter and spring, but for summer and autumn the changes may range from 3 °C near the coast up to more than 8 °C in large inland areas. Interestingly, the annual cycle of these changes is almost absent in some coastal regions, where the seasonal maximum temperature increases are always around 2–3 °C. This rather small annual cycle is linked to the typical northerly wind present offshore of the Portuguese coast, which aloft forms a coastal low-level jet. These winds are projected to increase strongly in frequency by the end of the century (Soares et al. 2017a, b, c), advecting cooler air from the north and counteracting the local climate change warming. The seasonal patterns of change are quite consistent from ensemble to ensemble or model, with the exception of autumn, when some noticeable differences are present. For example, the inland maximum temperature increases projected by ENS_F_0.44 are around 8 °C, those projected by ENS_WP_0.44 or ENS_F_0.11 are around 4.5 °C, and in WRF the changes are seldom above 4 °C.

Under RCP4.5 the expected changes roughly follow the same logic, i.e. a similar annual cycle and similar spatial patterns of change, but are much less intense: for winter and spring the changes are between 1 and 2 °C, and for summer and autumn between 1.5 and 3 °C.

The projected changes for the minimum temperature for the end of the century, according to both the RCP4.5 and RCP8.5 scenarios, are similar to the changes of the maximum temperature, but less pronounced: the change in the annual cycle is less sharp, the coastal-inland contrasts are smaller, the large inland warming is less intense and, finally, the multi-model ensemble projections are less consistent in summer and autumn. The inland maximum values of warming for summer are around 5.5 °C in the case of the two full multi-model ensembles, but around 4.5 °C for the others. The WRF orographic signature in the pattern of change is rather strong, since it is a single model and not a multi-model ensemble. Again, WRF projects smaller increases for minimum temperature than the multi-model ensembles, only reaching 4 °C in some localised regions.

The individual RCM temperature delta changes, which compare the overall warming in Portugal (Fig. 10), are fully consistent in their seasonal signal, i.e. all models project temperature increases in all seasons. This coherence is rather high when looking at the delta changes given by the different multi-model ensembles for both variables and RCP scenarios. However, there is a significant model spread in the degree of warming. For maximum temperature, the multi-model ensembles project changes, according to the RCP8.5 (RCP4.5) scenario, of ~ 3 °C (~ 1.5 °C) for winter and spring, and of ~ 5 °C (~ 3 °C) for autumn and summer. The maximum temperature model spread is nevertheless significant, especially for summer, amounting to almost 5 °C, i.e. the average warming is between 2 and 7 °C. For the other three seasons the spread ranges between 2 and 3 °C. For the minimum temperature, the changes are similar and as consistent as for the maximum, but smaller in summer and autumn by about 1 °C. In general, the delta change spread for minimum temperature across the full set of RCMs ranges from around 6 °C in summer and autumn, according to RCP8.5, to 1 °C in spring for RCP4.5. The larger values of spread are due to a single model that projects little warming; otherwise the largest spreads are ~ 3 °C.

The PDFs for maximum and minimum temperature are plotted in Fig. 11. The historical maximum temperature multi-model ensembles are rather coherent, presenting an almost Gaussian shape with a mode around 14 °C. For future climate, the projected warming is reflected in the PDFs by a marked shift towards higher temperatures, with the mode around 16 °C for the RCP8.5 scenario, and more importantly by much higher frequencies above 30 °C. In fact, this increase introduces an extra non-Gaussian character to the PDF and can probably be attributed to an increase in sensible heat flux due to a reduction of soil moisture/evapotranspiration. The expected reduction in precipitation of 10–30% in the intermediate seasons, 30–50% in summer and up to 10% in winter (Soares et al. 2017a) leads to a reduction in soil moisture and thus to a larger sensible heat flux, which heats the atmosphere. Additionally, maximum temperatures below 2 °C are almost non-existent and temperatures above 40 °C are much more common. For RCP4.5, the same PDF changes can be seen but in a milder form. The WRF simulation is roughly in agreement with the ensembles. An analysis of the differences in several percentiles reveals an increase of 3.5–4 °C in the median and an expansion of circa 2.5 °C in the interquartile range (not shown).

Fig. 11

EURO-CORDEX 0.44 and 0.11 ensembles and WRF9km frequency distributions, for present and future climate, of a Tmax and b Tmin

In the case of the minimum temperature PDFs, the multi-model ensembles have a consistent shape, but the WRF results are substantially different. As mentioned above, WRF underpredicts present climate minimum temperatures. The PDF changes for future climate, according to RCP8.5, are marked. As for the maximum temperature, there is a pronounced shift towards larger minimum temperature values (an increase of around 3–3.5 °C in the median). The present climate PDF asymmetry, related to values around 0 °C, almost disappears and, in future climate, the asymmetry is linked to an increase of frequencies above 20 °C. In fact, minimum temperatures above this threshold are projected to be much more common in future climate. Unlike the maximum temperature, the interquartile range increases by less than 10% (~ 1 °C) and the changes in the shape of the future PDFs can be attributed to changes in location (median) and shape (asymmetry), in accordance with Eq. 3 from Ferro et al. (2005).

3.3 Changes in extremes

The RCP8.5 30 year mean number of days in which the maximum temperature exceeds the historical 90th percentile is presented in Fig. 12a. The box plots show the spatial distribution of the mean exceedances for overlapping 30 year intervals. During the historical period (1971–2000) circa 15 days per year have maximum temperatures above the 90th percentile in all ensembles and WRF9km. The interannual variability, as measured by the standard deviation (std), is however quite different (Fig. 12d). In the 0.44 ensembles and ENS_F_0.11 the interannual variability is very similar across the country (std ~ 3 for ENS_F_0.44 and 4 for ENS_WP_0.44 and ENS_F_0.11); however, for ENS_WP_0.11 and WRF9km it is 8 days on average, which is more than half the mean number of exceedance days. In the latter results there is also a noticeable difference between the coastal areas (lower standard deviation) and the inland regions (standard deviation differences larger than 3), and there are years during which the maximum temperature does not exceed the 90th percentile (not shown). The projected number of exceedances increases steadily until the end of the twenty-first century and the differences between coastal and inland regions become larger. For the last 30 years of the twenty-first century (2071–2100), the exceedances are four to fivefold larger than the historical ones (circa half of the days in MJJAS, ~ 80 days, exceed the 90th percentile). The anomaly between the historical and the 2071–2100 exceedances (Fig. 13b) displays a large west-east gradient, highlighting the lower warming in the coastal areas and the opposite near the border with Spain. The west coast regions in the higher resolution ensembles and WRF9km show a remarkably lower rise in exceedances, accentuating the importance of the resolution in the representation of local sea-breeze circulations. The latter are responsible for the transport of cooler and moister air inland, thus mitigating the effects of climate change.
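The exceedance counts over overlapping 30 year intervals can be sketched as below for a single grid point. This is an illustration under assumptions: the function names are hypothetical, the percentile uses simple linear interpolation, the windows are assumed to advance in 10 year steps, and the years are assumed to be contiguous.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def exceedance_days(tmax_by_year, threshold):
    """Days per year with Tmax strictly above a fixed (historical) threshold."""
    return {y: sum(t > threshold for t in days) for y, days in tmax_by_year.items()}

def window_means(counts, length=30, step=10):
    """Mean yearly count over overlapping windows (assumed 10 year step)."""
    years = sorted(counts)
    out = {}
    for i in range(0, len(years) - length + 1, step):
        win = years[i:i + length]
        out[(win[0], win[-1])] = sum(counts[y] for y in win) / length
    return out

# Hypothetical yearly exceedance counts rising by one day per year.
counts = {year: year - 1971 for year in range(1971, 2011)}
print(window_means(counts))
```

In practice the threshold would be computed once from the pooled historical (1971–2000) daily maxima with `percentile(..., 90)` and then held fixed when counting exceedances in the future years.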

Fig. 12

Decadal tendency in the twenty-first century of: mean tri-decadal number of days (left column) and tri-decadal standard deviation (right column) with a, d Tmax > 90th percentile, b, e Tmin > 95th percentile and c, f Tmin < 10th percentile. WRF9km and EURO-CORDEX 0.44 and 0.11 RCP8.5 ensembles

Fig. 13

WRF9km, EURO-CORDEX ENSEMBLE 0.44, and EURO-CORDEX ENSEMBLE 0.11 end of the twenty-first century climate change signal for number of days (future minus historical) with a Tmax > 97.5th percentile, b Tmax > 90th percentile, c Tmin > 95th percentile and d Tmin < 10th percentile (2071–2100 minus 1971–2000)

The surge in temperature is nonetheless associated with limited rises in the interannual variability of the 0.44 ensembles and ENS_F_0.11 until the end of the century and with a small increase in coastal/inland variability. In ENS_WP_0.11, however, large variances and contrasts between coastal and inland regions occur at the beginning of the century, but these decrease significantly along the century until the last two 30 year periods, when the variability grows once again. By the end of the century the standard deviation is only ~ 15 days, and the 90th percentile is now exceeded every year in every region during several days. The WRF9km results are similar to the 0.11 weighted ensemble; however, the interannual variability is larger by about 2 days and the increments are similar for all regions. An analysis of the extremely hot days, i.e. exceedances of the 97.5th percentile, reveals that although these were rare events in the historical period (1 day per year), by the end of the twenty-first century they represent more than half of the days when the maximum temperature is above the 90th percentile (not shown). In the milder scenario (RCP4.5) the projections point to a slower growth of the exceedances by the mid-twenty-first century. Nonetheless, between 35 and 45 days in the extended summer will have temperatures surpassing the historical 90th percentile value (Fig. S4 and 13b).

Very hot days are frequently accompanied by warm nights, or tropical nights, which can pose a hazard to human health when the body is not able to recover from the diurnal excess of heat. Usually a tropical night occurs when the minimum temperature is above 20 °C. In Portugal this occurs, on average, during 10 nights over the extended summer in present climate (Fig. S5). Thus the 95th percentile of minimum temperature was used to set the exceedance threshold for these types of nights (Fig. 12b). Up to the mid-twenty-first century the number of tropical nights for all ensembles is very similar between regions (Fig. 13c); from then onwards the differences between the coastal and inland areas become more substantial. By the end of the century differences of 10 to 20 nights are projected, i.e. the coastal regions have, on average, 10 to 20 fewer tropical nights than the inland areas (Figs. 12b, 13c). The number of nights increases significantly along the century and, on average, 60 tropical nights are projected for the end of the century. Only ENS_WP_0.11 projects slightly fewer exceedances (50 nights on average). The interannual variability during the historical period is small for all of the ensembles (2 to 3 nights) and similar for all regions. WRF9km has higher variability (5 nights on average) and differences between the coastal and inland areas are apparent (1–2 nights). The standard deviation increases up to mid-century, reaching values of 8–10 nights, and then stabilises. The projections also show differences of 2–3 nights between the coast and inland. At the end of the century, the WRF9km standard deviation is double that of the EURO-CORDEX ensembles. In RCP4.5 the number of exceedances is half of that projected by RCP8.5.

The number of very cold nights, here defined as nights with minimum temperature below the 10th percentile, is shown in Fig. 12c as a box plot for all the grid points and in Fig. 13d as a map. A significant reduction in cold nights occurs throughout the century and by its end the number of cold nights has decreased to only 1 night per year. There are no significant differences between regions in the EURO-CORDEX ensembles. The anomalies between the 2071–2100 and 1971–2000 time intervals (Fig. 13d) clearly portray the homogeneous spatial decline in the number of exceedances in the EURO-CORDEX ensembles. The lower warming of the WRF9km simulation is reflected in the higher number of exceedances; still, a sixfold reduction is projected on the northwestern coasts and a three- to fourfold reduction is projected in the southern half of the country and the inland northeast (Fig. 13d). The small interannual variability is reduced even further as the century progresses and indicates that by the end of the century some years will not have extreme cold days, since the variability is of the same order of magnitude as the number of exceedances.

3.4 Heat waves

The most frequent heat wave amplitudes for all of the EURO-CORDEX ensembles and WRF9km are between 9 and 11 for all time intervals (Fig. 14). Only the results for the weighted ensemble for each resolution are presented here, since the full ensemble has very similar results. In the historical period, the median amplitude is between 11 and 12 and 80% of heat waves have amplitudes smaller than 16 (the amplitude of the 2003 heat wave, Russo et al. 2015). As the century progresses, the amplitude PDF broadens (becomes more positively skewed) and more intense heat waves become more frequent. The frequency of its highest peak is halved by the end of the century. The median is then 17, i.e. more than half of the projected heat waves will be stronger than the 2003 heat wave. Moreover, 80% of the events have amplitudes below 32, a value that does not occur in the historical period of most of the models which form the ensembles (its probability is 0.997), nor in WRF9km.

During the historical period, 5 day heat waves are the most frequent. In the 0.44 ensembles these represent between 27.5 and 30% of the events, and between 17.5 and 20% in the 0.11 ensembles. In WRF9km, 5 day heat waves occur 35% of the time. In the historical period, 95% of the heat waves last less than 13 days in the full ensembles, less than 10 days in the weighted 0.44 ensemble and WRF9km, and less than 14 days in the weighted 0.11 ensemble. As the century progresses these short heat waves disappear and the most frequent heat wave lasts 22 days by the end of the century. Half of the heat wave events in the EURO-CORDEX ensemble are projected to persist for more than 22 days at the end of the century. The 5% longest heat waves last for more than one month to one and a half months, i.e. 38 days in ENS_WP_0.44, 43 days in ENS_F_0.44, 45 days in ENS_F_0.11 and 48 days in ENS_WP_0.11. In WRF9km the 5 day heat waves are still the most frequent events in the future projections. Nevertheless, 20% of the projected heat waves last longer than those in the historical period. In WRF9km 95% of the events last less than 28 days.

The land area under a heat wave is resolution dependent for phenomena which cover small areas of the territory (less than 25%) or which occur over most of the country (areas larger than 90%). For the former, in the historical period, the frequency of heat waves occupying 5% or less of the area is between 0.3 and 0.35 in the 0.11 resolution and half of that in the 0.44. At the end of the century, these frequencies are halved. In the 0.11 ensembles, 55% of the events occupy less than 15% of the area in the historical period, whilst only 27% of heat waves cover this area by the end of the century. From this threshold (15%) onwards, as the area under a heat wave increases, the frequency decreases steadily to 1% in the historical period, while it remains constant, at 4%, at the end of the twenty-first century, except for situations in which the whole country is under a heat wave. The projections indicate that 7% of the heat waves will occur over the whole country, i.e. ~ 100 events in the 2071–2100 period, which is more than 3 events per year on average. In the 0.44 ensembles 52% of the events have areas less than 25% of the domain, whereas only 28% have this extent by the end of the century. The absolute number of heat waves which cover the whole country by the end of the twenty-first century is very similar to the 0.11 ensembles (~ 100); however, this represents between 11 and 13% of the occurrences in the 0.44 ensembles.

Fig. 14

EURO-CORDEX 0.44 and 0.11 ensembles and WRF9km frequency distributions, for present and future climate, of (1st row) heat wave amplitude, (2nd row) heat wave length and (3rd row) percentage of maximum land area under heat wave

The anomaly (2071–2100 minus 1971–2000) of the 30 year mean number of heat waves per grid point is depicted in Fig. 15a. Note that the average number of heat waves lasting more than five days in the historical period is approximately 1 (0.7–0.9); thus this anomaly can also be seen as the average number of heat waves, minus one, of the 2071–2100 period (see Fig. S6). For all ensembles and scenarios the rise in the number of heat waves is larger near the Spanish border than near the coast. The average number of heat waves in the vicinity of the northwestern coast is between 4 and 5 per year in the 0.44 resolution ensembles. As for the exceedances of the 90th percentile of maximum temperature, the effect of the resolution is clear near the coast, since the 0.11 ensembles have on average one less heat wave per year. Near the border, 5 to 6 events are projected to occur on average in all ensembles. The WRF9km projection is for 6 to 7 heat waves per year. Since the projected exceedances of the 90th percentile in WRF9km and ENS_WP_0.11 are very similar and the shapes of the duration PDFs for the end of the century are different, this might be the result of the methodology: no attempt was made to consider as a single heat wave the cases when two consecutive events are separated by just one day and the maximum temperature is only slightly below the 90th percentile. For RCP4.5 the increase in the number of heat waves is milder, but even in this scenario an average of 5 heat waves is projected for the end of the twenty-first century.
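The methodological point about events separated by a single day can be illustrated with a minimal run-detection sketch. It assumes a fixed threshold and a minimum length of five consecutive days; both, as well as the function name and the temperature values, are assumptions made for illustration and not the exact heat wave definition used in this study.

```python
def heat_waves(tmax, threshold, min_length=5):
    """Return (start_index, length) for each run of at least `min_length`
    consecutive days with Tmax above `threshold` (no merging of nearby runs)."""
    events, start = [], None
    # A sentinel below any threshold closes a run that reaches the last day.
    for i, t in enumerate(list(tmax) + [float("-inf")]):
        if t > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                events.append((start, i - start))
            start = None
    return events

# Hypothetical series: two 5 day hot spells split by one day just below 30 deg C.
series = [31.0] * 5 + [29.9] + [31.0] * 5
print(heat_waves(series, threshold=30.0))
```

Without any merging rule, the single day at 29.9 °C splits what could be regarded as one 11 day event into two 5 day heat waves, which raises the event count and shortens the durations.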

Fig. 15

WRF9km, EURO-CORDEX ENSEMBLE 0.44, and EURO-CORDEX ENSEMBLE 0.11 end of the twenty-first century climate change signal for a average number of heat waves, b average length

The average length of the heat waves increases considerably near the Spanish border, where these phenomena last 12 to 18 days longer on average in future climate. The areas near the coast have smaller rises of less than 6 days. In RCP4.5, there is no marked east–west gradient, but increases of 9–15 days can still be expected.

4 Conclusions

Mediterranean climates are prone to large spatio-temporal precipitation and temperature gradients rendering these regions very vulnerable to climate change, particularly by the end of the twenty-first century (Diffenbaugh and Giorgi 2012; Alessandri et al. 2014). Moist mild winters and dry warm/hot summers are a common feature which characterises the Portuguese climate.

In this study, the EURO-CORDEX high resolution regional climate simulations (0.11° and 0.44° resolutions) are used to investigate the maximum and minimum temperature projections for the end of the twenty-first century according to the RCP4.5 and RCP8.5 greenhouse gas emission scenarios. An additional WRF simulation with even higher resolution (9 km) for the RCP8.5 scenario is also examined. All simulations for the historical period (1971–2000) were evaluated against the available station observations.

The simulations are able to reproduce the main topography-related temperature gradients; however, there are discernible differences between model skills. In the case of maximum temperature, all models have a cold bias and some of this bias is clearly inherited from the forcing GCM, since different RCMs forced by the same GCM have similar biases. This is not the case for the minimum temperature, where the internal model variability plays a stronger role and models forced by the same GCM have distinct biases. Whether the bias is positive or negative is model dependent. The spatial–temporal variability is overestimated for both resolutions and for the majority of the RCMs; nonetheless, there is a large similarity between the observed and simulated distributions. The ranking of the models for each temperature (maximum and minimum), based on the skill scores, allowed the construction of multi-model ensembles. Two types of score aggregation were investigated: averaging and multiplication. The latter produced better results and was chosen to perform the future temperature projections. An ensemble in which all models have equal weights was also considered in order to assess the uncertainty associated with the ensemble construction. Both ensembles are able to correctly depict the seasonal temperature gradients, which are more pronounced in the weighted ensemble. The full ensemble is generally colder in all seasons for both resolutions and for both variables. The seasonal maximum temperatures from WRF9km are closer to the 0.11 weighted ensemble; however, the minimum temperature negative bias masks the resemblance.

The projected temperature changes over Portugal are rather severe and significant. In general, the warming projections for the end of the century show larger values for maximum than for minimum temperature, for both scenarios. The maximum temperature rise is largest in summer and autumn, with increases of up to 8 °C in some inland areas, whereas in winter and spring it is between 2 and 4 °C. The projected changes in the minimum temperature for the end of the century, according to both the RCP4.5 and RCP8.5 scenarios, are similar to the changes in the maximum temperature, but with lower values. However, the multi-model ensemble projections are less consistent in summer and autumn. The temperature delta changes for the individual RCMs are consistent in sign, i.e. all models project temperature increases in all seasons, yet there is considerable spread in the degree of warming projected by individual models.
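As an illustration of how a delta change and its inter-model consistency can be computed, the following Python sketch takes the difference between future and historical seasonal means for one model and then checks sign agreement and spread across models. All values and function names are hypothetical; the spread is measured here simply as the range between the warmest and coolest model, which is an assumption of this sketch.

```python
def seasonal_delta(hist, fut):
    """Future-minus-historical change of the seasonal mean for one model."""
    return {
        season: sum(fut[season]) / len(fut[season])
        - sum(hist[season]) / len(hist[season])
        for season in hist
    }


def sign_agreement_and_spread(deltas):
    """Return (do all models agree in sign?, max-minus-min spread)."""
    agree = all(d > 0 for d in deltas) or all(d < 0 for d in deltas)
    return agree, max(deltas) - min(deltas)


# Hypothetical summer (JJA) maximum temperatures (degrees C) for one model.
hist = {"JJA": [28.0, 30.0]}
fut = {"JJA": [32.0, 34.0]}
delta_one_model = seasonal_delta(hist, fut)

# Hypothetical summer deltas (degrees C) from three models.
agree, spread = sign_agreement_and_spread([3.0, 4.5, 6.0])
```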

The multi-model ensemble temperature PDFs are rather coherent within each time frame and scenario. The projected warming is reflected in the PDFs by a strong shift towards higher temperatures. For the maximum temperature under RCP8.5, the median increases by 3.5–4 °C, and an expansion of the interquartile range by circa 2.5 °C is indicative of higher variability. More importantly, a large increase in frequencies above 30 °C is projected. As with the maximum temperature, there is a pronounced shift towards higher minimum temperatures, of 3–3.5 °C in the median. Unlike the maximum temperature, the variability is not projected to change significantly: the interquartile range increases by less than ~1 °C. By the end of the twenty-first century, a considerable reduction in the lower tail of the PDF is estimated and, conversely, a substantial increase in the frequency of temperatures above 20 °C. The 30-year mean number of days in which the maximum temperature exceeds the historical 90th percentile is projected to increase steadily until the end of the twenty-first century. For the last 30 years of the twenty-first century (2071–2100), the exceedances are four to five times larger than in the historical period (circa 0.5 of the days in MJJAS, ~80 days, exceed the 90th percentile). The anomaly between the historical and the 2071–2100 exceedances displays a large west–east gradient, highlighting the lower warming in the coastal areas and the stronger warming near the border with Spain. The inter-annual variability is not uniform across ensembles. In the lower resolution ensembles and ENS_F_0.11, a limited rise until the end of the century and a small increase in coastal/inland variability are projected. In ENS_WP_0.11, however, a significant reduction in the differences in variability between the coastal and inland areas is projected, and by the end of the century the 90th percentile is exceeded every year in every region during several days.
The WRF9km results are similar to those of the 0.11° weighted ensemble. Up to the mid-twenty-first century, the number of tropical nights is very similar between regions for all ensembles; from then onwards, the differences between the coastal and inland areas become more substantial. The number of tropical nights increases significantly throughout the century, from an average of 7 to 60, and the variability increases moderately only until mid-century. As expected, cold nights almost disappear, falling to one night per year in RCP8.5.
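The two indices discussed above, exceedances of the historical 90th percentile and tropical nights, can be sketched in a few lines of Python. The sketch assumes a percentile computed by linear interpolation between order statistics and a tropical night defined as a day with minimum temperature above 20 °C (the usual climate-index convention); the toy series and function names are hypothetical and much shorter than a real 30-year record.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def count_exceedances(series, threshold):
    """Number of days strictly above the threshold."""
    return sum(1 for t in series if t > threshold)


def tropical_nights(tmin, threshold=20.0):
    """Number of days whose minimum temperature is above the threshold (degrees C)."""
    return sum(1 for t in tmin if t > threshold)


# Hypothetical historical daily maximum temperatures (degrees C): 10, 11, ..., 30.
hist_tmax = [float(t) for t in range(10, 31)]
p90 = percentile(hist_tmax, 90)

# Hypothetical future series: the same days shifted by a uniform 4 degrees C.
future_tmax = [t + 4.0 for t in hist_tmax]

hist_count = count_exceedances(hist_tmax, p90)
future_count = count_exceedances(future_tmax, p90)

# Hypothetical daily minimum temperatures (degrees C).
tmin = [18.0, 19.5, 20.0, 20.5, 23.0]
n_tropical = tropical_nights(tmin)
```

The threshold is always taken from the historical series, so the future count measures the change relative to the present climate rather than relative to the future distribution itself.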

For all ensembles and scenarios, the rise in the number of heat waves is overwhelming, and it is larger near the Spanish border than near the coast. Near the border, 5 to 6 events are projected to occur on average in all ensembles by the end of the century, compared with less than one in the present climate, i.e. a seven to ninefold increase. For RCP4.5 the increase in the number of heat waves is milder, but even in this scenario an average of 5 heat waves is projected for the end of the twenty-first century.

The distribution of heat wave amplitudes is projected to become more positively skewed, with more intense heat waves becoming more frequent. By the end of the twenty-first century, more than half of the heat waves will be stronger than the 2003 extreme heat wave. Amplitudes which did not exist in the historical period become more common. This is also a consequence of the increased duration of these events: the most common duration was 5 days in the historical period, and by the end of the century it becomes 19–22 days (similar to the June 2003 duration). The longest 5% of events last for more than 1 month. The average length of the heat waves increases considerably near the Spanish border, where these phenomena last 12 to 18 days longer on average in the future climate. The areas near the coast have smaller rises, of less than 6 days. In RCP4.5 there is no marked east–west gradient, but increases of 9–15 days can still be expected. The projections envisage that by the end of the century, 7% (in the 0.11° ensembles) and 11 to 13% (in the 0.44° ensembles) of the heat waves will occur over the whole country, i.e. ~100 events in the 2071–2100 period, which is more than 3 events per year on average.
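Heat wave durations of the kind discussed above can be extracted from a daily series by finding runs of consecutive hot days. The Python sketch below assumes a heat wave is a run of at least `min_len` consecutive days with maximum temperature above a fixed threshold; the default of 5 days and the 30 °C threshold are illustrative assumptions, not the definition used in the study, and the function name is hypothetical.

```python
def heat_wave_durations(tmax, threshold, min_len=5):
    """Lengths (in days) of all runs of at least min_len consecutive days above threshold."""
    durations = []
    run = 0
    for t in tmax:
        if t > threshold:
            run += 1
        else:
            if run >= min_len:
                durations.append(run)
            run = 0
    # Close a run that reaches the end of the series.
    if run >= min_len:
        durations.append(run)
    return durations


# Hypothetical daily maximum temperatures (degrees C): a 5-day event,
# a 3-day warm spell that is too short to count, and a 7-day event.
tmax = [31.0] * 5 + [25.0] + [32.0] * 3 + [24.0] + [33.0] * 7
durations = heat_wave_durations(tmax, threshold=30.0)
mean_duration = sum(durations) / len(durations)
```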

These extreme heat events will have massive health, environmental and economic impacts. The 2003 heat wave was responsible for a 3.5% increase in mortality in Portugal, and more than half of the projected heat waves are stronger than this event. It is important to note that, along with these heat extremes, an extension of the dry period and a significant reduction in precipitation are projected (Soares et al. 2012b, 2017a). This will significantly increase hydro-meteorological hazards such as droughts, which will have severe impacts on river water quality, agriculture and forests. An amplified risk of increased aridity and large forest fires is thus foreseeable.