1 Introduction

There is evidence that extreme weather events have become more frequent and intense since the last decade of the twentieth century, resulting in significant environmental and socio-economic consequences (Stocker et al. 2013; Furió and Meneu 2011; Klein and Können 2003; Tompkins 2002). Precipitation extremes have received particular attention due to their impacts (floods, crop damage, inability to cultivate land). Karl and Knight (1998) showed that there are more days with heavy 24-h precipitation totals in the USA and other countries. In Europe, an increasing trend in extreme precipitation episodes, especially in winter, has also been observed in many countries, such as the Czech Republic (Kyselý 2009), northern Italy (Brunetti et al. 2001), and Germany (Hundecha and Bardossy 2005). In the Mediterranean, a connection between extreme precipitation events and seasonal rainfall totals has also been observed (Toreti et al. 2010).

Furthermore, extreme high and low temperature events are studied because of their impact on human society and natural ecosystems. As the Climate Change Science Program (CCSP 2008) states, the consequences of the global temperature increase will become more obvious through changes in extreme weather events (including extreme temperature episodes). In particular, a positive change in the mean temperature is often accompanied by an increased probability of extremely hot days (Mitchell et al. 1990). One example was the catastrophic European heatwave of summer 2003, which caused more than 22,000 deaths (Schär and Jendritzky 2004). Moreover, in the eastern Mediterranean, the mean heatwave intensity, duration, and frequency have been increasing (Kuglitsch et al. 2010). Concerning extreme low temperatures, some case studies provide evidence of their occurrence on global and regional scales. During December 2009–February 2010, unusually cold weather outbreaks were observed in many parts of the Northern Hemisphere (see http://www.ncdc.noaa.gov/sotc/), while during the same period three cold events were observed in Europe (see http://www.knmi.nl/cms/content/79165). Furthermore, that winter was one of the ten coldest of the last 55 years in Greece (Tolika et al. 2013), with several extreme cold episodes observed.

Extreme climatic events are often analyzed with statistical extreme value theory (EVT). The primary purpose of EVT is to describe the tails of the distributions of random variables. For this purpose, the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD) are used. The GEV is commonly fitted to block maxima series (Kotz and Nadarajah 1999; Kharin and Zwiers 2000; Katz et al. 2002; García et al. 2002), while the GPD is fitted to data series produced by the peaks over threshold (POT) method (Katz et al. 2002).
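For reference, the two distribution functions can be written as below. This is the standard parameterization found, for example, in Coles (2001); the symbols (location μ, scale σ, shape ξ, threshold u, and GPD scale σ̃) are an assumption here, since the text does not fix a notation.

```latex
% GEV distribution function for a block maximum z
G(z) = \exp\left\{ -\left[ 1 + \xi \, \frac{z - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{z - \mu}{\sigma} > 0

% GPD distribution function for an excess y = x - u over the threshold u
H(y) = 1 - \left( 1 + \frac{\xi \, y}{\tilde{\sigma}} \right)^{-1/\xi},
\qquad y > 0, \quad 1 + \frac{\xi \, y}{\tilde{\sigma}} > 0
```

In the limit ξ → 0, the GEV reduces to the Gumbel distribution and the GPD to the exponential distribution.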

Block maxima are created by dividing the analysis period into non-overlapping periods of equal size and then choosing the maximum observation of each period. The choice of the block size is critical because very small blocks could create biases, while too large blocks yield only a few extreme values (Coles 2001). Kharin and Zwiers (2000, 2005) showed that the annual block maxima approach could be appropriate for variables with a large block size, such as a daily temperature time series. Although the block maxima approach has a number of attractive features (it offers a simple way of selecting the extremes), it also has some disadvantages. The crucial problem with this method is that many extreme events could be neglected, because only the largest value of each block is retained. POT defines values exceeding a given threshold as extremes. One of the principal concerns regarding the POT method is the definition of appropriate threshold levels. Beguería (2005) demonstrated that the POT method includes an uncertainty derived from the selection of a threshold value, a criterion that is frequently chosen subjectively. While the block maxima method could be easier to apply, as the time blocks appear naturally (Naveau et al. 2009; Van den Brink et al. 2005; De Valk 1993), the POT method is often preferred for its efficiency in describing the behavior of extremes.
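The difference between the two selection schemes can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function names and the toy series are hypothetical, and no declustering of consecutive exceedances is attempted.

```python
from collections import defaultdict

def block_maxima(values, block_ids):
    """Return the maximum of `values` within each block (e.g. each year)."""
    blocks = defaultdict(list)
    for value, block in zip(values, block_ids):
        blocks[block].append(value)
    return {block: max(vals) for block, vals in sorted(blocks.items())}

def peaks_over_threshold(values, threshold):
    """Return every value that strictly exceeds `threshold`."""
    return [v for v in values if v > threshold]

# Hypothetical toy series: three "years" with four observations each.
daily = [1.0, 12.0, 3.0, 11.0, 2.0, 4.0, 5.0, 3.0, 9.0, 2.0, 10.5, 1.0]
years = [2001] * 4 + [2002] * 4 + [2003] * 4

annual_max = block_maxima(daily, years)
exceedances = peaks_over_threshold(daily, threshold=8.0)
print(annual_max)   # one value per year, however small that year's maximum is
print(exceedances)  # every value above 8.0, regardless of the year
```

In this toy series the block maxima keep the modest maximum of the second year but drop the second-largest value of the first year, whereas the POT selection does the opposite.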

The behavior of each distribution is characterized by three parameters: location, shape, and scale. These parameters could be estimated using different methods, such as maximum likelihood estimation (MLE), L-moments, or Bayesian approaches. MLE is considered a relatively simple procedure for estimating unknown parameters and is one of the most widely used methods due to its reliability for large samples. However, MLE could give unrealistic results for the shape parameter, especially when the studied data set has a small sample size (< 50) (Hosking and Wallis 1997; Coles and Dixon 1999; Martins and Stedinger 2000). The L-moments method is based on linear combinations of probability-weighted moments (PWMs), and it is commonly applied in the study of extremes due to its near lack of bias and low sensitivity to outliers (Rowinski et al. 2002). The basic Bayesian theory has been studied and explained in detail by Coles and Tawn (2005) and Coles (2001). With the Bayesian method, it is possible to supplement the information in the initial data, which are usually scarce for extremes, with information from alternative sources through prior distributions.
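As an illustration of the L-moments route, the sketch below computes the first three sample L-moments from unbiased PWMs and converts them to GEV parameters with Hosking's well-known approximation for the shape. It is a sketch under assumptions: the function names are hypothetical, the synthetic Gumbel sample is generated only to exercise the code, and the software actually used in the study is not specified in the text.

```python
import math
import random

def sample_l_moments(data):
    """First three sample L-moments from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0

def gev_from_l_moments(data):
    """Estimate (location, scale, shape) of a GEV with Hosking's approximation.

    The shape is returned in the sign convention of the text:
    negative means a bounded upper tail, positive a heavy upper tail.
    """
    l1, l2, l3 = sample_l_moments(data)
    t3 = l3 / l2                               # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c            # Hosking's k, equal to -shape
    if abs(k) < 1e-6:                          # Gumbel limit
        scale = l2 / math.log(2.0)
        location = l1 - 0.5772156649 * scale   # Euler-Mascheroni constant
    else:
        g = math.gamma(1.0 + k)
        scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
        location = l1 - scale * (1.0 - g) / k
    return location, scale, -k

# Synthetic Gumbel sample (location 50, scale 10), used only as a sanity check.
rng = random.Random(0)
sample = [50.0 - 10.0 * math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
          for _ in range(5000)]
loc, scale, shape = gev_from_l_moments(sample)
print(round(loc, 2), round(scale, 2), round(shape, 3))
```

For a Gumbel sample the estimated shape should be close to zero and the location and scale close to the values used to generate the data.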

No clear consensus exists over the best method to use in the study of extreme climate values. This paper attempts to evaluate the ability of the GEV distribution and GPD to characterize extreme temperature and precipitation values in selected stations in the Mediterranean Basin. The novelty of this study lies in its extensive comparison of different estimation methods using a significant number of goodness-of-fit tests. For this purpose, the EVT is applied to the data set, and its two main fundamental approaches (block maxima and POT) are analyzed and compared. Finally, based on the return level values that have been estimated by all the analyzed methods, the stations are classified into groups with the same characteristics.

2 Data

The study of extreme values requires highly credible data. This study employs high and low mean daily temperature and daily precipitation data from 12 meteorological stations, stretching across the European Mediterranean Basin from Spain to Greece (Fig. 1). The daily mean temperature and precipitation data for ten of the stations came from the “European Climate Assessment and Dataset” (ECAD data set; Klein Tank et al. 2002) (Table 1), while the data for Athens and Thessaloniki were provided by the National Observatory of Athens and the Department of Meteorology and Climatology of the Aristotle University of Thessaloniki, respectively. The data series cover two time periods: a long period of approximately 100 years (Bologna, Marseille, Athens (1901–2000), Barcelona (1916–2015)) and a shorter period of 60 years (1951–2010) for ten stations (Malaga, Barcelona, Nice, Bastia, Cagliari, Verona-Villa-Franca, Gospic, Split-Marjan, Athens, Thessaloniki). All stations had less than 2% missing values, ensuring the reliability of the database for the purposes of our analysis.

Fig. 1 The geographical distribution of the stations used in this study (MALA, Malaga; BARC, Barcelona; MARS, Marseille; Nice; BAST, Bastia; CAGL, Cagliari; VERO, Verona Villa Franca; BOLO, Bologna; GOSP, Gospic; SPLI, Split Marjan; ATHE, Athens; THES, Thessaloniki). Triangles mark the stations with a 60-year record and tetragons mark the stations with a 100-year record. Barcelona and Athens are marked with both symbols

Table 1 The climatological characteristics of the 12 stations

3 Methodology

In the present study, the extreme climate events were derived from the same initial data set using two different approaches: block maxima and POT. For the block maxima approach, the annual maximum daily precipitation and the annual high and low mean daily temperatures were selected for 100 and 60 consecutive years. For the second approach, POT, the thresholds were identified using percentiles. Anagnostopoulou and Tolika (2012) argued, based on climatological criteria, that the 99th percentile is the most appropriate threshold for the description of extreme rainfall events in Europe. In the present investigation, this percentile (99th) satisfactorily represents the extreme precipitation events over the Mediterranean region. Similarly, the 95th percentile satisfactorily characterizes the extreme maximum temperatures and the 5th percentile the extreme minimum temperatures. This choice follows Karl et al. (2008), who used the 5th and 95th percentiles of daily maximum and minimum temperatures to study extreme events from 1946 to 2000. Moreover, Coelho et al. (2007) and Heikkila et al. (2011) argue that the 95% threshold represents an acceptable balance between a cutoff value high enough to define extreme events and a sufficient number of exceedances to represent those events.
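The percentile-based thresholds described above can be sketched as follows. This is a minimal illustration under assumptions: the percentile is computed by linear interpolation between order statistics (the text does not state which percentile definition was used, nor whether dry days were included for precipitation), and the function names and the toy series are hypothetical.

```python
import math

def percentile(data, q):
    """q-th percentile (0 to 100) by linear interpolation between order statistics."""
    x = sorted(data)
    pos = (len(x) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return x[lower] + (x[upper] - x[lower]) * (pos - lower)

def upper_extremes(data, q):
    """Threshold and values above the q-th percentile (q=99 for rain, q=95 for HT)."""
    u = percentile(data, q)
    return u, [v for v in data if v > u]

def lower_extremes(data, q):
    """Threshold and values below the q-th percentile (q=5 for LT)."""
    u = percentile(data, q)
    return u, [v for v in data if v < u]

# Hypothetical series 0, 1, ..., 100, chosen so the percentiles are easy to check.
series = list(range(101))
u95, hot = upper_extremes(series, 95)
u05, cold = lower_extremes(series, 5)
print(u95, hot)
print(u05, cold)
```

A higher percentile leaves fewer exceedances, which is the trade-off between threshold height and sample size noted by Coelho et al. (2007) and Heikkila et al. (2011).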

After the selection of the extremes (with both the block maxima and POT techniques), the GEV distribution and the GPD were fitted to the new data sets. The three parameters (location, shape, scale) of these distributions were estimated by the MLE, L-moments, and Bayesian techniques. Goodness-of-fit tests are necessary in every study, as the extreme value theorem is a statement about asymptotic behavior. As a consequence, it is not guaranteed that the EVT provides the best fit for a finite data set. A variety of goodness-of-fit tests were thus applied to the data sets used here in order to find the best distribution with the most appropriate estimation method.

  • Cullen and Frey graphs (Cullen and Frey 1999) illustrate a combination of the square of the skewness (x-axis) and the kurtosis (y-axis) of the studied data set, using the maximum likelihood estimation method. The observed point is compared to theoretical standard distribution locations (point for uniform, normal, logistic, exponential; line for gamma and lognormal and area for beta distribution) in order to reject the inadequate distributions.

  • Kolmogorov-Smirnov, Anderson-Darling, and χ2 tests are well-known statistical tests that compare a sample data set with a reference probability distribution, or compare two samples with each other. First, a null hypothesis is stated (for example, that the sample follows the reference distribution), and it is then tested against the alternative hypothesis that it does not. The null hypothesis is accepted or rejected at a predefined level of significance.

  • A QQ plot is a graphical technique used to determine whether two data sets have a common distribution, or whether a data set follows a certain distribution. One of the main advantages of QQ plots is that the compared data sets can have different sizes. This is important for the present investigation, as the extreme values data set defined by the block maxima method often has a different size from the POT data set, even though both are drawn from the same original data set.

  • The shape parameter directly affects the shape of the distribution and determines the heaviness of its tail. A negative shape value corresponds to the Weibull distribution, which has a bounded upper tail, whereas a positive shape value corresponds to the Fréchet distribution. The third case is the Gumbel distribution, with a shape parameter equal to zero. As the shape parameter is related to the skewness, which indicates where the majority of the data lie, it could be useful for testing whether a distribution is appropriate for the characterization of the studied data set.

4 Results

4.1 Stations’ climatology

The present study analyzes the precipitation and temperature characteristics of 12 Mediterranean stations. According to Table 1, the lowest mean daily temperature (LT) is recorded in Gospic (−22.9 °C). In Marseille, Verona-Villa-Franca, Bologna, Split, and Thessaloniki, the LT is approximately −10 °C, whereas in Barcelona, Nice, Cagliari, Bastia, and Athens, it is around 0 °C. Only Malaga presents an LT higher than zero. The greatest high mean daily temperature (HT) is observed in Athens (36.4 °C), while all stations have an HT greater than 31 °C. Gospic is the “coldest” studied station with a mean temperature of 8.8 °C, whereas Malaga and Athens are the “warmest” (18.3 and 17.9 °C, respectively).

A summary of the precipitation characteristics of the 12 studied stations is presented in Table 1. The mean daily precipitation amount for the majority of stations is approximately 2 mm/day. The wettest stations are Gospic and Nice, while the driest are Athens and Thessaloniki. The extreme rainfall events are independent of the mean rainfall regimes. For example, the maximum precipitation amount in Malaga is 313 mm, while the mean daily precipitation is low (1.55 mm/day). The maximum precipitation amounts recorded in Bastia, Marseille, Barcelona, and Nice are higher than 190 mm, while in Thessaloniki, Cagliari, and Athens, they are lower than 120 mm.

4.2 Comparison between the data series of 100 and 60 years

Four of the 12 studied stations (Barcelona, Marseille, Bologna, and Athens) have temperature and precipitation data for 100 years. The database of these four stations was organized into two time periods: the long one with 100 years (Bologna, Marseille, Athens (1901–2000), Barcelona (1916–2015)) and the short one with 60 years (1941–2000). The block maxima and POT methods were used to select the extremes in the two periods. Moreover, the GEV distribution and GPD (with the MLE) were fitted to the extremes data series, and the critical parameters that characterize their distributions were analyzed and compared. Figure 2 shows an overview of the scale, shape, and location values for the two periods. Most of the points in all diagrams are close to the diagonal, meaning that the critical parameters of the two data sets are almost equal. Consequently, the distributions of extreme temperature and precipitation events for the two periods are almost identical. According to these findings, a data set of 60 years can offer results as reliable as those of a data set of 100 years. Thus, a new data set of ten Mediterranean stations covering the period from 1951 to 2010 is used in the rest of the study. Bologna and Marseille are excluded from the rest of the study, as their data do not cover the period from 1951 to 2010.

Fig. 2 Comparison of the critical parameters of the GEV distribution and GPD between the two time series (100 and 60 years) for Bologna (red), Marseille (blue), Barcelona (yellow), and Athens (green)

4.3 Goodness-of-fit tests

4.3.1 Cullen and Frey graphs

Cullen and Frey graphs for each of the analyzed stations and for the three parameters (precipitation, HT, and LT) were created in order to reject the unlikely distributions of the data sets. Following that, a number of goodness-of-fit tests were applied to the non-rejected distributions for the final choice. Figure 3 presents the Cullen and Frey graphs for each meteorological station and each parameter, taking into account measures of accuracy (variance, confidence intervals, etc.) obtained by bootstrapping (bootstrapped values are shown as yellow circles). It can be observed that the beta, gamma, lognormal, and Weibull distributions appear as potential candidates to fit the extreme precipitation values for a majority of the stations. It is also clear that the extreme observations cannot be represented by the normal or the uniform distribution. Regarding the HT and LT extremes (Figs. 4 and 5), the Cullen and Frey graphs show that the extremes seem to follow the beta, gamma, or Weibull distribution.
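The coordinates plotted in a Cullen and Frey graph can be sketched as follows. This is a minimal illustration, assuming simple moment-based estimators of skewness and kurtosis and a plain nonparametric bootstrap; the function names and the toy data are hypothetical, and the plotting itself is omitted. The reference points are the theoretical (squared skewness, kurtosis) values of the named distributions.

```python
import random

def skewness_kurtosis(data):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Theoretical (squared skewness, kurtosis) of distributions shown as single points.
REFERENCE_POINTS = {
    "normal": (0.0, 3.0),
    "uniform": (0.0, 1.8),
    "logistic": (0.0, 4.2),
    "exponential": (4.0, 9.0),
}

def cullen_frey_point(data):
    """Observed point: squared skewness (x-axis) and kurtosis (y-axis)."""
    skew, kurt = skewness_kurtosis(data)
    return skew ** 2, kurt

def bootstrap_points(data, n_boot, seed=0):
    """Points for resampled data sets, indicating the uncertainty of the observation."""
    rng = random.Random(seed)
    return [cullen_frey_point(rng.choices(data, k=len(data))) for _ in range(n_boot)]

# Hypothetical evenly spaced data, which should land near the uniform reference point.
data = [float(v) for v in range(1, 51)]
point = cullen_frey_point(data)
cloud = bootstrap_points(data, n_boot=20)
print(point)
```

A distribution is discarded when the observed point and its bootstrap cloud lie far from the corresponding reference point, line, or area.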

Fig. 3 Cullen and Frey graphs for extreme precipitation of the ten studied stations and for the time period from 1951 to 2010

Fig. 4 Cullen and Frey graphs for extreme high temperatures of the ten studied stations and for the time period from 1951 to 2010

Fig. 5 Cullen and Frey graphs for extreme low temperatures of the ten studied stations and for the time period from 1951 to 2010

4.3.2 Anderson-Darling, Kolmogorov-Smirnov, and χ2 tests

As the choice of the appropriate distribution can be a critical success factor for the study, a further analysis of the general categories of the distributions represented here was carried out with statistical goodness-of-fit tests (Anderson-Darling test, Kolmogorov-Smirnov test, and χ2 test). Table 2 presents a summary of the goodness-of-fit tests for the GEV distribution and GPD, showing how well these distributions fit the data sets compared with 50 other distributions. For example, the GPD is the second most appropriate distribution for the description of the precipitation data in Malaga according to the Anderson-Darling (AD) test. By contrast, based on the same test, the GEV distribution is the fourth most appropriate for the same data set. In general, it was found that both the GEV distribution and the GPD are suitable for the description of the extreme events according to the Kolmogorov-Smirnov (KS) and AD tests (at the 5% level of significance). On the other hand, according to the χ2 test, the minimum extreme events cannot be fitted by the GEV distribution or the GPD for the majority of the stations. Finally, the GEV distribution is, on average, ranked below the GPD for each parameter, indicating that the GPD describes the extreme temperature and precipitation events more accurately (Table 2).
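To make the KS comparison concrete, the sketch below computes the KS statistic of a sample against a GEV distribution function. It is an illustration under assumptions: the function names and parameter values are hypothetical, the 5% critical value 1.36/√n is the usual large-sample approximation and strictly holds only when the parameters were not estimated from the same sample, and the ranking against 50 other distributions reported in Table 2 is not reproduced.

```python
import math

def gev_cdf(z, loc, scale, shape):
    """GEV distribution function (shape > 0: heavy tail, shape < 0: bounded above)."""
    if abs(shape) < 1e-9:                      # Gumbel limit
        return math.exp(-math.exp(-(z - loc) / scale))
    t = 1.0 + shape * (z - loc) / scale
    if t <= 0.0:                               # outside the support
        return 0.0 if shape > 0 else 1.0
    return math.exp(-t ** (-1.0 / shape))

def ks_statistic(data, cdf):
    """Largest distance between the empirical and the reference distribution function."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, v in enumerate(x):
        f = cdf(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical sample placed exactly on Gumbel quantiles (location 20, scale 5).
n = 50
sample = [20.0 - 5.0 * math.log(-math.log((i + 0.5) / n)) for i in range(n)]
d_stat = ks_statistic(sample, lambda z: gev_cdf(z, 20.0, 5.0, 0.0))
critical_5pct = 1.36 / math.sqrt(n)            # large-sample approximation
print(d_stat, critical_5pct, d_stat < critical_5pct)
```

The distribution is not rejected at the 5% level when the statistic stays below the critical value; in Table 2 a rejection is marked with R.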

Table 2 The rank of the suitability of GEV and GPD distributions for precipitation, maximum, and minimum temperatures, as a result of Anderson Darling (AD), Kolmogorov Smirnov (KS), and χ2 tests. R means that the test rejects the distribution

4.3.3 QQ plots

The assumption that the GEV distribution and GPD could fit the examined data was checked, and QQ plots provided a visual comparison of the three methods (MLE, L-moments, and Bayesian) used for the calculation of the distribution parameters. Figures 6, 7, and 8 present the QQ plots for the ten Mediterranean stations and for the three studied climatological parameters (precipitation, HT, and LT). The QQ plots of the precipitation parameter (Fig. 6) reveal that both the GEV distribution and GPD, estimated by the three studied methods (MLE, L-moments, and Bayesian), fit the extreme rainfall data satisfactorily. In general, the GPD-M, GPD-L, and GPD-B methods characterize the studied data set satisfactorily for small and medium rainfall values. By contrast, GEV-M, GEV-L, and GEV-B show small deviations from the reference line. For the highest rainfall values, the GEV-B and GPD-B are the most appropriate for the majority of stations.

In the case of HT (Fig. 7), the points of the six compared methods fall along the diagonal line in the middle of the graphs but curve off at the extremities. In particular, GPD-M adequately characterizes the extreme temperature events across the whole graph, while GPD-B is more appropriate only for the highest extreme temperatures. For the GEV distribution, all methods deviate from the reference diagonal line (y = x) in the lowest part of the QQ plots. However, GEV-B shows the same fitting skill for extremely high temperatures as GPD-M and GPD-B.

Regarding LT (Fig. 8), the studied distributions deviate from the reference line in the lowest part of the graphs. The GEV distribution also deviates from the diagonal line in the upper and middle parts of the graph, whereas the GPD characterizes the medium and high LT well. For the extreme LT, the GPD-L is less appropriate, whereas the GPD-M, GPD-B, and GEV-B (in the majority of stations) are the most suitable for fitting the extreme LT.
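The points of such a QQ plot can be sketched as follows for the GPD case. This is a minimal sketch under assumptions: plotting positions i/(n + 1) are used (a common choice; the text does not state which positions were used), and the function names and parameter values are hypothetical.

```python
import math

def gpd_quantile(p, threshold, scale, shape):
    """Quantile of a GPD fitted to exceedances over `threshold`."""
    if abs(shape) < 1e-9:                      # exponential limit
        return threshold - scale * math.log(1.0 - p)
    return threshold + scale / shape * ((1.0 - p) ** (-shape) - 1.0)

def qq_pairs(data, quantile_fn):
    """(model quantile, observed value) pairs; a good fit lies close to y = x."""
    x = sorted(data)
    n = len(x)
    return [(quantile_fn((i + 1) / (n + 1)), v) for i, v in enumerate(x)]

# Hypothetical exceedances that follow the model exactly (threshold 40, scale 8, shape 0.1).
model = lambda p: gpd_quantile(p, 40.0, 8.0, 0.1)
observed = [model(i / 11) for i in range(1, 11)]
pairs = qq_pairs(observed, model)
largest_gap = max(abs(m - o) for m, o in pairs)
print(largest_gap)
```

Because each data set is compared with model quantiles at its own plotting positions, the block maxima and POT samples can be drawn on the same axes even though their sizes differ.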

Fig. 6 QQ plots of precipitation for the ten studied stations and for the time period from 1951 to 2010. The colored points represent the quantile points produced by the GEV distribution and GPD (blue and red, respectively). The different symbols represent the three estimation methods (MLE, triangle; L-moments, star; Bayesian, circle)

Fig. 7 QQ plots of high temperatures for the ten studied stations and for the time period from 1951 to 2010. The colored points represent the quantile points produced by the GEV distribution and GPD (blue and red, respectively). The different symbols represent the three estimation methods (MLE, triangle; L-moments, star; Bayesian, circle)

Fig. 8 QQ plots of low temperatures for the ten studied stations and for the time period from 1951 to 2010. The colored points represent the quantile points produced by the GEV distribution and GPD (blue and red, respectively). The different symbols represent the three estimation methods (MLE, triangle; L-moments, star; Bayesian, circle)

4.3.4 Shape diagram

Figures 9, 10, and 11 illustrate the shape values of the GEV distribution and GPD estimated with the MLE, L-moments, and Bayesian approaches (GEV-M, GEV-L, GEV-B, GPD-M, GPD-L, GPD-B) for the three parameters of interest, in order to decide whether a distribution should be rejected as inappropriate for the data set under evaluation. In particular, since the Weibull distribution is bounded above, it is inappropriate for the description of rainfall events. As a result, the methods with negative shape values should be rejected for precipitation, for which the Fréchet distribution could be fitted satisfactorily. By contrast, a heavy-tailed distribution, in which high values of the maximum are obtained with greater probability than in light-tailed distributions, is inappropriate for the temperature parameter. As a consequence, a negative shape value should be adopted for the extreme temperature distributions (HT and LT).
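The screening rule described above can be written as a short sketch. The function names and the numerical example are hypothetical; the rule itself (negative shape rejected for precipitation, negative shape required for HT and LT) is taken from the text.

```python
def tail_family(shape, tol=1e-6):
    """Name of the limiting family implied by the sign of the shape parameter."""
    if shape > tol:
        return "Frechet"     # heavy, unbounded upper tail
    if shape < -tol:
        return "Weibull"     # bounded upper tail
    return "Gumbel"          # light, unbounded upper tail

def gev_upper_bound(loc, scale, shape):
    """Finite upper end point of a GEV with negative shape, otherwise None."""
    return loc - scale / shape if shape < 0 else None

def passes_shape_check(parameter, shape):
    """Rule of the text: rain rejects shape < 0, temperature requires shape < 0."""
    if parameter == "precipitation":
        return shape >= 0
    if parameter in ("HT", "LT"):
        return shape < 0
    raise ValueError("unknown parameter: " + parameter)

# Hypothetical fit: location 30, scale 2, shape -0.25.
print(tail_family(-0.25), gev_upper_bound(30.0, 2.0, -0.25))
print(passes_shape_check("precipitation", -0.25), passes_shape_check("HT", -0.25))
```

With these hypothetical values the fitted distribution cannot exceed 38, which is acceptable for a temperature series but not for rainfall, where no physical upper limit is assumed.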

Fig. 9 Shape values of the GEV distribution and GPD for precipitation, estimated with the MLE, L-moments, and Bayesian methods. Different colors and symbols represent the six studied methods. Numbers on the y-axis represent the ten stations (1, Malaga; 2, Barcelona; 3, Nice; 4, Bastia; 5, Cagliari; 6, Verona Villa Franca; 7, Gospic; 8, Split Marjan; 9, Athens; 10, Thessaloniki)

Fig. 10 Shape values of the GEV distribution and GPD for maximum temperature, estimated with the MLE, L-moments, and Bayesian methods. Different colors and symbols represent the six studied methods. Numbers on the y-axis represent the ten stations (1, Malaga; 2, Barcelona; 3, Nice; 4, Bastia; 5, Cagliari; 6, Verona Villa Franca; 7, Gospic; 8, Split Marjan; 9, Athens; 10, Thessaloniki)

Fig. 11 Shape values of the GEV distribution and GPD for minimum temperature, estimated with the MLE, L-moments, and Bayesian methods. Different colors and symbols represent the six studied methods. Numbers on the y-axis represent the ten stations (1, Malaga; 2, Barcelona; 3, Nice; 4, Bastia; 5, Cagliari; 6, Verona Villa Franca; 7, Gospic; 8, Split Marjan; 9, Athens; 10, Thessaloniki)

As shown in Fig. 9, the GEV-L poorly characterizes extreme precipitation values for nearly all stations (except Gospic and Thessaloniki). Additionally, the GPD-M and GEV-M should be rejected only for Gospic and the GPD-L only for Verona-Villa-Franca. All of the other methods could satisfactorily describe the precipitation data set, as their shape value is positive and thus identifies distributions with no upper bound. Figure 10 presents the shape value results for the HT. As with the precipitation parameter, the GEV-L does not provide an adequate fit for extreme high temperatures in nearly all stations (except Cagliari). The other five methods, due to their negative shape sign, are appropriate for the HT. In Fig. 11, the GEV-L method is also rejected, whereas the negative shape value of the other five methods shows that all of them could be acceptable.

In summary, shape diagrams offer another way of testing the ability of a particular distribution to fit a studied data set. In agreement with the QQ plots results, the GEV-L method is less appropriate for both temperature and precipitation values, while none of the other methods could be excluded from the results plotted on our graphs.

5 Return levels

5.1 Stations’ classification (with heatmaps)

The estimation of return levels is commonly used for the description and quantification of climatic risks, since it offers an easily understood measure of extreme events. The r-year return level is the level expected to be exceeded on average once every r years (Coles 2001). The return levels of the precipitation, HT, and LT extremes were calculated for three return periods: 50, 150, and 300 years (Figs. 12, 13, and 14). In this part of the study, the stations were clustered into groups with similar characteristics (using the Euclidean distance) based on their return levels.
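Given fitted parameters, return levels follow from closed-form expressions. The sketch below uses the standard formulas for the GEV distribution (annual maxima) and the GPD (exceedances over a threshold) as given, for example, in Coles (2001). The function names, the parameter values, and the assumption of 365.25 observations per year are hypothetical and serve only to illustrate the calculation.

```python
import math

def gev_return_level(r_years, loc, scale, shape):
    """Level exceeded by the annual maximum with probability 1/r in any year."""
    y = -math.log(1.0 - 1.0 / r_years)
    if abs(shape) < 1e-9:                      # Gumbel limit
        return loc - scale * math.log(y)
    return loc - scale / shape * (1.0 - y ** (-shape))

def gpd_return_level(r_years, threshold, scale, shape, exceed_prob,
                     obs_per_year=365.25):
    """Level exceeded on average once every r years under a POT/GPD model.

    `exceed_prob` is the fraction of observations above the threshold,
    e.g. 0.01 when the 99th percentile is used.
    """
    m = r_years * obs_per_year * exceed_prob   # expected exceedances in r years
    if abs(shape) < 1e-9:                      # exponential limit
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)

# Hypothetical precipitation fits (mm); not values from the study.
for r in (50, 150, 300):
    gev = gev_return_level(r, loc=60.0, scale=20.0, shape=0.1)
    gpd = gpd_return_level(r, threshold=40.0, scale=15.0, shape=0.1, exceed_prob=0.01)
    print(r, round(gev, 1), round(gpd, 1))
```

With a positive shape the return level keeps growing with the return period, whereas with a negative shape it approaches the finite upper bound of the distribution.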

Fig. 12 Classification of the ten studied stations, based on the return levels of precipitation, for the three return periods (50, 150, and 300 years). The x-axis represents the six studied methods and the y-axis the stations. Blue represents low return level values and red high values

Fig. 13 Classification of the ten studied stations, based on the return levels of high temperatures, for the three return periods (50, 150, and 300 years). The x-axis represents the six studied methods and the y-axis the stations. Blue represents low return level values and red high values

Fig. 14 Classification of the ten studied stations, based on the return levels of low temperatures, for the three return periods (50, 150, and 300 years). The x-axis represents the six studied methods and the y-axis the stations. Blue represents low return level values and red high values

Figure 12 shows the classification of the ten stations based on the precipitation return levels. Bastia and Malaga present the greatest return level values for rainfall and thus are placed together in the first classification group for the 50-, 150-, and 300-year return levels. In the second classification group, with medium precipitation amounts, a change is observed at the second return period (150 years). Specifically, for the first return period (50 years), Barcelona, Gospic, and Nice show similarities in their precipitation amounts, but for the second return period (150 years), the precipitation in Gospic differs significantly from that in Barcelona and Nice. As a result, for both 150 and 300 years, Gospic belongs to the third classification group, which includes Split, Cagliari, Verona-Villa-Franca, and Athens. Finally, according to the heatmap (Fig. 12), the return level values for Thessaloniki are the lowest for all the analyzed return periods. As a consequence, the fourth (driest) classification group consists only of the station of Thessaloniki.

In the case of HT (Fig. 13), the first classification group contains only the coldest studied station (Gospic) in all three return periods. In the next classification group, with medium HT, a change is observed at the second return period. Although Barcelona, Verona-Villa-Franca, Nice, and Bastia present similar HT in the first return period (50 years), Barcelona shows a greater increase in HT for the two longer return periods. Thus, it is classified in the third group, which includes stations with higher HT. For the final return period (300 years), the third group has the largest number of stations (five), as Malaga also fits in this category. The final (warmest) classification group includes Malaga and Athens for the return periods of 50 and 150 years. For the longest return period (300 years), Malaga does not reach such high temperature values. Thus, the group with the greatest HT includes only Athens.

Figure 14 shows the results for the LT classification. The first (coldest) classification group contains only the Gospic station, which has the lowest LT in all return periods. Verona-Villa-Franca, Split, and Thessaloniki belong to the second classification group, with medium LT, for all return periods, while Athens and Barcelona are included in the third group of the classification. Interestingly, Athens therefore does not fall into the highest LT class. The final group, with the highest return level values of LT, consists of four stations (Malaga, Bastia, Cagliari, and Nice). A closer inspection of Fig. 14 shows that the classification of the ten stations does not change across the three return periods.
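A grouping of this kind can be sketched with a small agglomerative clustering based on the Euclidean distance between return-level profiles. The text states only that the Euclidean distance was used, so the complete-linkage rule, the function names, and the toy profiles below are assumptions made for illustration; they are not the study's values or necessarily its exact procedure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two return-level profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(profiles, n_groups):
    """Agglomerative clustering (complete linkage, assumed) of named profiles."""
    groups = [[name] for name in profiles]

    def linkage(g1, g2):
        # Distance between groups = largest distance between their members.
        return max(euclidean(profiles[a], profiles[b]) for a in g1 for b in g2)

    while len(groups) > n_groups:
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]      # merge the two closest groups
        del groups[j]
    return groups

# Hypothetical profiles: one return level per estimation method (two shown for brevity).
profiles = {
    "S1": [300.0, 310.0],
    "S2": [290.0, 305.0],
    "S3": [150.0, 160.0],
    "S4": [155.0, 152.0],
    "S5": [90.0, 95.0],
}
groups = cluster(profiles, n_groups=3)
print(groups)
```

In the study each profile would hold the six return levels of a station (GEV-M, GEV-L, GEV-B, GPD-M, GPD-L, GPD-B) for a given return period, so the grouping can change from one return period to the next.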

5.2 Return level diagrams

As discussed in Section 5.1, the selected stations can be classified into groups according to their return level values. In this subsection, the return levels for each station are compared as estimated by the six methods applied in this study (GEV-M, GEV-L, GEV-B, GPD-M, GPD-L, GPD-B). Figures 15, 16, and 17 show the return level diagrams of four stations selected from the heatmaps of Figs. 12, 13, and 14 (one randomly chosen station from each classification group).

Fig. 15 Precipitation return level diagrams of four studied stations (one from each classification group). Colors represent the different methods (GEV-L, red; GEV-M, blue; GPD-M, green; GPD-L, yellow; GEV-B, orange; GPD-B, gray)

Fig. 16 High-temperature return level diagrams of four studied stations (one from each classification group). Colors represent the different methods (GEV-L, red; GEV-M, blue; GPD-M, green; GPD-L, yellow; GEV-B, orange; GPD-B, gray)

Fig. 17 Low-temperature return level diagrams of four studied stations (one from each classification group). Colors represent the different methods (GEV-L, red; GEV-M, blue; GPD-M, green; GPD-L, yellow; GEV-B, orange; GPD-B, gray)

Focusing on these individual stations, the first example is Bastia, from the “wettest” classification group (Fig. 15(a)). In agreement with the corresponding heatmap (Fig. 12), the GPD-B predicts the highest return level values, reaching almost 400 mm (Fig. 15(a)), while the return level values of the GPD-L and GPD-M are almost equal. Furthermore, the GEV-M and GEV-L give the lowest precipitation amounts for all return periods (Fig. 15(a)). For Barcelona (Fig. 15(b)), the GEV-B and GPD-B yield the highest values, while the GPD-L results in the lowest value. Moreover, the GEV-L and GEV-M lines are almost identical, as in Bastia. For Athens (the third group), the differences between the GEV distribution and GPD are noteworthy in both graphs (Fig. 15(c) and Fig. 12). In particular, the GPD predicts higher return level values than the GEV distribution, regardless of the estimation method. The GPD-B and GPD-L give the highest precipitation return levels, whereas the GEV-M gives the lowest. Finally, Thessaloniki (from the driest classification group) has precipitation return level values that do not exceed 120 mm. Figure 15(d) shows that the return levels estimated for Thessaloniki with the GEV-B, GPD-B, and GPD-L are higher than 100 mm for the 300-year return period, while those estimated with the GEV-L and GEV-M are the lowest.

In agreement with Fig. 13, Athens presents the greatest HT return values, ranging from ~35 to ~38 °C (Fig. 16(a)). The GEV-L method predicts the greatest return level values, while the GPD-L provides the lowest. Figure 16(a) shows that almost all methods give similar return levels, except the GPD-L, whose return levels are much lower. This finding can also be detected in the corresponding heatmap (Fig. 13). For Thessaloniki (Fig. 16(b)), the GPD-L also estimates the lowest return level values, whereas the GEV-B provides the largest. For the other methods, the predicted values are almost the same, especially after 150 years. For Bastia (medium return level values; Fig. 13 and Fig. 16(c)), the return level values do not exceed 32 °C. It is clear from Fig. 16(c) that the GPD-L line is much lower than those of the other methods. This explains why this method is represented with a light blue color in the corresponding heatmap (Fig. 13), especially for the second return period (150 years). In addition, the GPD-M and GPD-B lines are very close to each other; they are higher than those of the other methods for the first 50 years and lower than the GEV lines thereafter. Another significant result from Bastia’s diagram is that the GEV-L method gives the highest return level values, which do not differ substantially from the GEV-B predictions. Finally, Gospic, from the “coldest” classification group, has return level values ranging from 25 to 28 °C (Fig. 16(d)). In agreement with the corresponding heatmap (Fig. 13), the GPD-L again shows the smallest return level values, in contrast to the GEV-L and GEV-B, which give the largest. Furthermore, the GPD-M and GPD-B estimates for Gospic’s HT are similar.

Figure 17(a) presents the return level diagram for the LT of Bastia. As can be seen from Fig. 17(a), the GPD-L predicts the highest LT values, while the GEV-M and GEV-L provide the lowest. Apart from the GPD-L, all the other methods give similar results for Bastia’s LT. The predicted LT return level values for Athens are lower than Bastia’s, ranging from − 1 to − 5 °C (Fig. 17(b)). For Athens as well, the GPD-L method gives the highest return levels for the first 200 years, whereas the GPD-L and GEV-L have almost the same values for the last 100 years. Finally, the lowest LT return levels come from the GEV-B. From the classified group with medium return levels (Fig. 14), the results from the Thessaloniki station are chosen for illustration (Fig. 17(c)). As Fig. 17(c) reveals, the GEV-M and GEV-L give the highest return level values (~ − 6 °C), while the GEV-B provides the lowest (~ − 12 °C). Moreover, the predicted values of the GPD-M and GPD-B are close to each other for the entire analysis period. Another noteworthy observation from Fig. 17(c) is that the GPD-L is not the method that estimates the highest values, as observed in the previous stations, but predicts LT return levels close to − 8 °C. Finally, the coldest station is Gospic, with LT return values from − 24 to − 18 °C (Fig. 17(d)). The GEV-M predicts the lowest return level values, whereas the GPD-L provides the highest (Fig. 17(d)), which is also in agreement with the results from Fig. 14.

6 Discussion

The present study aims to examine different methodologies and approaches regarding the statistical analysis of extreme precipitation and temperature in the Mediterranean region. For this purpose, daily values of precipitation and mean daily temperature were used from European meteorological stations from Spain to Greece. The data cover two time periods, a long period of 100 years and a shorter one of 60 years. In order to analyze the extreme temperature and precipitation episodes, extreme value theory (EVT) was applied. First, the extreme values were selected and organized using the block maxima and peaks over threshold (POT) techniques. The choice of the most appropriate distribution for the characterization of extremes plays a crucial role in the EVT. For this purpose, graphical and statistical goodness-of-fit tests were applied to the data sets. Following this step, the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD) were fitted to the new data sets using three different estimation methods: maximum likelihood estimation (MLE), L-moments, and the Bayesian approach.
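The two ways of selecting extremes described above can be illustrated with a minimal sketch. The function names, the toy series, the block size, and the threshold below are hypothetical and are not taken from the study; in practice a block would typically be a year or a season, and POT exceedances are usually declustered first, a step omitted here.

```python
from typing import List, Sequence


def block_maxima(values: Sequence[float], block_size: int) -> List[float]:
    """Return the maximum of each complete block of `block_size` values."""
    n_blocks = len(values) // block_size  # an incomplete trailing block is dropped
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def peaks_over_threshold(values: Sequence[float], threshold: float) -> List[float]:
    """Return the excesses (value - threshold) of all values strictly above the threshold."""
    return [v - threshold for v in values if v > threshold]


# Toy daily precipitation series in mm (hypothetical numbers), in blocks of 5 "days".
daily = [0.0, 12.5, 3.1, 40.2, 0.0, 7.7, 0.0, 55.0, 1.2, 0.4, 18.9, 22.3, 0.0, 0.0, 9.6]

maxima = block_maxima(daily, block_size=5)              # input for a GEV fit
excesses = peaks_over_threshold(daily, threshold=20.0)  # input for a GPD fit
print(maxima)
print(excesses)
```

The block maxima would then be passed to a GEV fit and the threshold excesses to a GPD fit, with whichever estimation method is chosen.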

Concerning precipitation, neither the uniform nor the normal distribution could describe the extreme episodes. This finding is in accordance with Lipton et al. (1995), who used a random data set and showed that the uniform and normal distributions differ by more than 34% from the tails of the extreme distributions. Also in agreement with our results, Kyselý (2010), Anagnostopoulou and Tolika (2012), Dyrrdal et al. (2015), and Roth et al. (2014) proposed the GEV and GPD as the most appropriate distributions for extreme precipitation events. Taking the above into account, a detailed check for the best-fitted distribution based on the AD, KS, and χ2 tests revealed that although both the GEV and GPD describe the extremes satisfactorily, the GPD is more appropriate. This finding supports Coles (2001), who claimed that the POT method is better than the block maxima, and additionally corroborates Acero et al. (2011) and Roth et al. (2014), who recommended the POT and GPD methodologies for extreme rainfall analysis.

Moreover, the QQ plots of precipitation, in which the GEV and GPD with maximum likelihood estimation (GEV-M, GPD-M), L-moments (GEV-L, GPD-L), and Bayesian (GEV-B, GPD-B) methods are compared, showed that the selected estimation method could affect the analysis results. This was also concluded by El Aldouni et al. (2007), underlining the importance of choosing the right statistical approach. In their study on flood data, Martins and Stedinger (2000) proposed that the GEV-M is a more accurate method than the GEV-L. With respect to the QQ plots for extreme rainfall, we concluded that the GPD-M, GPD-L, and GPD-B methods can sufficiently characterize the extreme rainfall events. However, it should be noted that the GEV-B and the GPD-B best fit the upper precipitation tail (greatest extreme values). Taking into account these findings along with the shape parameter results, it was determined for our stations that the GEV-L method is not suitable for characterizing extreme precipitation, even though many studies have used only the L-moments method as the default statistical approach (Kyselý 2010; Lee and Maeng 2003).
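As a reminder of what a QQ comparison computes, the sketch below pairs the ordered threshold excesses with the quantiles of a fitted GPD at the plotting positions i/(n + 1). The function names, the sample, and the parameter values (sigma, xi) are hypothetical and serve only as an illustration; they are not estimates from this study.

```python
import math
from typing import List, Sequence, Tuple


def gpd_quantile(p: float, sigma: float, xi: float) -> float:
    """Quantile function of the GPD for threshold excesses (location zero)."""
    if abs(xi) < 1e-12:
        return -sigma * math.log(1.0 - p)  # exponential limit as xi -> 0
    return (sigma / xi) * ((1.0 - p) ** (-xi) - 1.0)


def qq_pairs(excesses: Sequence[float], sigma: float, xi: float) -> List[Tuple[float, float]]:
    """Return (model quantile, empirical quantile) pairs at positions i / (n + 1)."""
    ordered = sorted(excesses)
    n = len(ordered)
    return [(gpd_quantile(i / (n + 1), sigma, xi), x) for i, x in enumerate(ordered, start=1)]


# Hypothetical excesses (mm) and hypothetical fitted parameters.
sample = [2.3, 20.2, 35.0, 5.1, 11.8]
for model_q, empirical_q in qq_pairs(sample, sigma=12.0, xi=0.1):
    print(f"{model_q:8.2f} {empirical_q:8.2f}")
```

Points lying close to the 1:1 line indicate a good fit; departures among the largest pairs correspond to a poor fit of the upper tail.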

From the analysis, it was also found that the stations with the highest daily extreme rainfall amounts also have the greatest precipitation return levels. Most of these stations, like Bastia and Malaga, are located in the western Mediterranean close to a prominent cyclogenesis region (Gulf of Genoa) (Trigo et al. 1999; Maheras et al. 2001). In the case of Malaga, the high daily precipitation amounts can be attributed to its location along the trajectories of Atlantic depressions moving into the Mediterranean (Trigo et al. 1999). These depressions could explain the extreme rainfalls (< 200 mm) through their climatological trajectories (S, SW, SE), their access to a maritime moisture source, and the interaction of the winds and the jet stream with nearby topography (Alps, Pyrenees, Massif Central) (Boudevillain et al. 2009). Moreover, the Mediterranean regions with the highest annual rainfall amounts (Adriatic and western Balkans) have, in general, lower maximum daily rainfall amounts than the stations in the western Mediterranean.

One example of this contrast is provided by Gospic, whose 50-year return level was similar to those of Barcelona and Nice. However, for longer return periods (150 and 300 years), Gospic was grouped with Split, Cagliari, Verona Villa Franca, and Athens, indicating that the extreme precipitation amounts in Gospic are not going to increase as much as in the western part of the Mediterranean. This might be due to the fact that the western Mediterranean (Barcelona, Nice) is mainly affected by the intensification of the mid-latitude storm track over central and western Europe (Lionello and Giorgi 2007). These results concur with Flato et al. (2013), who demonstrated a decrease in the frequency of storm-related precipitation over the eastern Mediterranean. In addition, Lionello and Giorgi (2007), in their study of winter precipitation in the Mediterranean, showed that the low rainfall amounts observed in the southern and eastern Mediterranean are associated with a reduction in cyclone activity. Similarly, the increase in the frequency of high-pressure systems results in a decrease in rainfall in the southern Balkans (Maheras et al. 2000).

Moreover, our findings suggest that the GEV-M and GEV-L methods give almost equal precipitation return level values in all stations (usually the lowest), while the GEV-B, also used by Dyrrdal et al. (2015) for extreme precipitation in Norway, gives greater return levels compared to the GEV-M and GEV-L. Furthermore, the GPD-B method, used by Cooley et al. (2007), provides the greatest return level values for the majority of stations.
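For reference, the return levels compared here follow from the fitted parameters through the standard expressions given in Coles (2001). The sketch below assumes annual block maxima for the GEV and a known mean number of threshold exceedances per year for the GPD; the function names and all parameter values are hypothetical and are not the estimates obtained in this study.

```python
import math


def gev_return_level(T: float, mu: float, sigma: float, xi: float) -> float:
    """Level exceeded by the annual maximum with probability 1/T (T in years)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gpd_return_level(T: float, threshold: float, sigma: float, xi: float,
                     exceedances_per_year: float) -> float:
    """Level exceeded on average once every T years under a POT/GPD model."""
    m = T * exceedances_per_year  # expected number of exceedances in T years
    if abs(xi) < 1e-12:
        return threshold + sigma * math.log(m)  # exponential limit as xi -> 0
    return threshold + (sigma / xi) * (m ** xi - 1.0)


# Hypothetical parameters, evaluated at the three return periods used in the text.
for T in (50, 150, 300):
    gev = gev_return_level(T, mu=60.0, sigma=20.0, xi=0.1)
    gpd = gpd_return_level(T, threshold=40.0, sigma=18.0, xi=0.1, exceedances_per_year=3.0)
    print(T, round(gev, 1), round(gpd, 1))
```

Because the shape parameter enters as an exponent, small differences in its estimate between the M, L, and B methods are amplified at long return periods, which is one reason the curves can diverge at 150 and 300 years.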

Although many researchers have analyzed high-temperature extremes, studies concerning minimum-temperature extremes are much more limited. In this study, the same procedure as for precipitation was followed for both high and low temperatures. The Cullen and Frey graphs (Figs. 3, 4, and 5) showed that neither the extreme HT nor the extreme LT events could be characterized by the normal or the uniform distribution. By contrast, the GEV and GPD produced a better fit. Much of the current literature uses the GEV and GPD to fit extreme event behavior. For instance, in Laurent and Parey (2007), both block maxima and POT methods were used; Nogaj et al. (2006) used the Pareto distribution to analyze the high- and low-temperature extremes, while Goubanova and Li (2007) preferred the GEV distribution to study potential future changes of temperature extremes. Taking into account the above studies as well as the Cullen and Frey graphs shown here, the GEV and GPD were tested for the extreme temperature events in the Mediterranean with the AD, KS, and χ2 tests. From the results, it was concluded that both the GEV and GPD can characterize the temperature extremes, although the GPD is more accurate.

Concerning the different estimation methods, the QQ plots revealed that the GPD-M can accurately characterize the extreme HT values, while the GEV-B and the GPD-B performed better for the upper tail of the extremes. Using a similar approach, Debusho (2016) applied the GPD-B method to the analysis of the maximum temperature extremes over South Africa. Moreover, Coles (2001) noted that, unlike the MLE method, a Bayesian analysis of extremes is not dependent on the regularity assumptions required by the asymptotic theory of maximum likelihood. The results from the LT QQ plots here are almost the same as those for HT extremes, implying that in both cases the GPD-L is less appropriate, despite the fact that L-moments have many theoretical advantages compared to other moment-based methods. For example, the asymptotic approximations of sampling distributions are better for L-moments than for ordinary moments (Hosking 1990). The fact that the GPD-L method is not suitable is also evident from the shape diagrams for both HT and LT extremes.
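To make the L-moments approach concrete, the sketch below computes the first three sample L-moments from probability-weighted moments and converts them into GEV parameters with Hosking's widely used approximation for the shape. It is an illustration under stated assumptions rather than the procedure used in this study: the function names and the input values are hypothetical, and the shape is returned in the convention where a positive value means a heavy upper tail.

```python
import math
from typing import Sequence, Tuple


def sample_l_moments(sample: Sequence[float]) -> Tuple[float, float, float]:
    """First three sample L-moments via unbiased probability-weighted moments."""
    x = sorted(sample)
    n = len(x)  # at least three values are required
    b0 = sum(x) / n
    b1 = sum((i - 1) / (n - 1) * v for i, v in enumerate(x, start=1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * v
             for i, v in enumerate(x, start=1)) / n
    return b0, 2.0 * b1 - b0, 6.0 * b2 - 6.0 * b1 + b0


def gev_from_l_moments(l1: float, l2: float, l3: float) -> Tuple[float, float, float]:
    """Approximate GEV (mu, sigma, xi) from L-moments; xi = -k in Hosking's notation."""
    t3 = l3 / l2  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # approximation; k == 0 (Gumbel) is not handled
    g = math.gamma(1.0 + k)
    sigma = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    mu = l1 - sigma * (1.0 - g) / k
    return mu, sigma, -k


# Hypothetical annual maxima (mm).
annual_maxima = [41.0, 55.5, 38.2, 72.9, 60.1, 47.3, 90.4, 52.8]
print(gev_from_l_moments(*sample_l_moments(annual_maxima)))
```

Because the estimator depends only on linear combinations of the ordered sample, it is less sensitive to single large values than ordinary moments, which is consistent with the theoretical advantages noted by Hosking (1990).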

Finally, based on the estimated temperature return level values, Gospic (as the station with the coldest climate) forms an HT classification group of its own. On the other hand, the two southernmost stations of the study, Malaga and Athens, belong to the group with the greatest HT in the first two return periods, while in the 300-year period they behave differently. For this last return period, Malaga shows similarities with Thessaloniki, Cagliari, Split, and Barcelona.

The difference in trends between Malaga and Athens is consistent with the findings of Hertig et al. (2010), who found that extreme maximum temperatures have a slight negative trend over the Iberian Peninsula, in contrast to an increase greater than 0.25 over the southeastern Mediterranean. In the case of LT extremes, this study reveals that there is no change in the classification groups during the entire analysis period. Specifically, Gospic remains in the coldest group, while Malaga, Bastia, Cagliari, and Nice belong to the group with the highest LT.

Finally, the diagrams that compare the return levels based on the different methods show that, for the HT, there is an important difference between the GEV-M and GEV-L values. In particular, the GEV-L and GEV-B give the highest return level values at the majority of the stations. Another important finding is that the GPD-L presents the lowest HT at all stations and differs significantly from the other methods. The return levels of the GPD-M and GPD-B, which have also been compared by Debusho (2016), are very similar. Moreover, maximum likelihood estimation is the statistical method preferred by several researchers, such as Kioutsioukis et al. (2010), who used it to study extreme climate events over Greece. Concerning the LT extremes, there are no statistical similarities with either HT or precipitation extremes. The GPD-L presents the highest LT for most stations, but the results also show important differences from station to station.

7 Conclusions

This research set out to investigate the extreme high and low temperatures and extreme precipitation episodes in the Mediterranean region by evaluating widely used extreme value theory methods. The most significant findings from this study are:

  • The 60-year data set provides results as reliable as those from the 100-year data set, for the Mediterranean region and for the studied parameters.

  • Neither the uniform nor the normal distributions can characterize the behavior of the extreme events, while GEV and GPD are more appropriate for this purpose.

  • The GPD characterizes the extreme temperature and precipitation events more accurately than the GEV. This could probably be attributed to the fact that the GPD is applied to data sets derived from the POT methodology, while the GEV uses block maxima.

  • Although the GEV and GPD fit the distributions of the extremes very well (for precipitation, high mean temperature HT, and low mean temperature LT), the greatest values of the precipitation extremes are best fitted by the GEV-Bayesian and GPD-Bayesian. Additionally, for the HT and LT, the GEV-Bayesian, GPD-Bayesian, and GPD-Maximum-Likelihood are the most accurate methods for the majority of the Mediterranean stations.

  • The L-moments method, and especially the GEV-L, is the least suitable method for the climatological parameters under study, compared to the Bayesian and the maximum likelihood estimation methods.

  • Based on the estimated return level values of the precipitation parameter, the western Mediterranean station (Malaga) and the station located in the Gulf of Genoa (Bastia) are classified into the group with the highest precipitation levels. However, stations from the eastern part of the Mediterranean are included in the group with the lowest levels. The Bayesian estimation method applied to the GPD gives the highest return level values for the stations located in the central and eastern Mediterranean.

  • The classification of the stations based on the HT and LT return levels reveals that there are shifts between the HT station categories across the studied return periods, while no change is observed in the LT groups. The Bayesian method applied to the GEV distribution gives the highest (for HT) and lowest (for LT) return levels for more than half of the studied stations.

  • The eastern Mediterranean station (Athens) belongs to the warmest HT group for all the studied return periods, while the western Mediterranean station (Malaga) belongs to this group only for the 50-year return period. Moreover, the station with the greatest elevation belongs to the coldest classification group according to both HT and LT return level values.