1 Introduction

Climate change is one of the most intensely debated scientific issues of recent decades, and changes in climate extremes are expected to have greater negative impacts on human society and the natural environment than changes in mean climate. Extremely high temperatures are among the most frequently investigated extreme events, and numerous studies have examined their possible effects on key human activities. Most analysts agree that extreme temperatures have considerable consequences: they can damage agricultural production (e.g. [1]), increase energy demand (e.g. [2]) and water consumption (e.g. [3]), and adversely affect human well-being and health, even causing loss of life (e.g. [4,5,6,7]).

According to the synthesis report of the Intergovernmental Panel on Climate Change (IPCC) [8], global mean temperature increased by 0.85 °C (0.65–1.06 °C) over the period 1880–2012 and by 0.74 °C during the last hundred years (1906–2005). The IPCC [9] report concludes that warming of the climate system is now “unequivocal,” based on observed increases in global average air temperatures; surface temperature is projected to rise over the 21st century, and it is very likely that heat waves will occur more often and last longer.

This work aims to model and analyse maximum temperature records in Riyadh city, the capital of Saudi Arabia, during the time period 1985–2014, using stationary and non-stationary extreme value theory (EVT) tools.

Within the climate literature, there are two main approaches to the diagnostic analysis of extreme events in climate data: the parametric and the non-parametric. The non-parametric approach is characterized by the definition of indices for extreme events, initially introduced by the Expert Team on Climate Change Detection, Monitoring and Indices (ETCCDI). In recent years, there has been wide interest in analysing climate with such extreme indices (e.g. [10,11,12,13,14,15,16,17,18,19,20]). The main drawback of this approach is that it cannot extrapolate results towards damage-relevant extremes with return periods much larger than those observed [21].

The second approach, initially developed by Fisher and Tippett [22], is founded on EVT techniques. EVT focuses directly on the tails of the sample distribution and could therefore perform better, in terms of predicting unexpected extreme changes, than approaches that model the whole distribution [23]. EVT has been successfully applied in various fields, such as hydrology (e.g. [24,25,26]), engineering (e.g. [27]), finance and insurance (e.g. [28, 29]), oil price risk management (e.g. [23, 30]) and environmental and climate sciences (e.g. [31,32,33,34,35,36,37,38]). Within the climate context, EVT has gained rapid acceptance and popularity among both researchers and climate experts.

In climate processes, it is common to observe non-stationarity due to seasonal variations and to long-term trends owing to climate change. Consequently, it is essential to take non-stationarity into account when modelling extremes. There are broadly two common strategies for dealing with non-stationarity [39]: the first is to use the full data set to detect and estimate the non-stationarities and then apply stationary extreme value methods to the resulting residuals; the second is to fit a non-stationary extremal model to the original data. In the first strategy, the non-stationarities, for example those induced by volatility clustering, are typically estimated with a GARCH-type model before stationary extreme value modelling is applied to the residuals (e.g. [23, 41]). The main advantage of this approach is that it captures two stylized facts exhibited by most financial return series, namely stochastic volatility and the fat-tailedness of conditional return distributions over short time horizons.

The second strategy for modelling the extremes of a non-stationary process was initially introduced by Davison and Smith [24], who explicitly model the dependence structure using time series structure or covariates in the parameters of the model. Following this pioneering work, numerous methodologies have been proposed for estimating time-varying extreme value distributions from non-stationary time series. For example, Chavez-Demoulin and Davison [42] describe smooth non-stationary generalized additive modelling for sample extremes, in which spline smoothers are incorporated into models for exceedances over high thresholds, and Scotto and Guedes-Soares [43] illustrate modelling with non-linear thresholds. Beyond these, various studies have addressed non-stationary conditions using different ideas and techniques (e.g. [44,45,46,47,48,49,50]).

As the climate changes due to global warming, more extreme weather events are expected to occur. According to a recent study by Pal and Eltahir [51], the Arab Gulf region and parts of southwest Asia could become uninhabitable before the turn of the century, as temperatures are expected to rise to intolerable levels. These alarming results justify further analysis of temperature records in the region. For Saudi Arabia, changes in temperature have already been detected using climate indices (e.g. [11, 12, 52]), although a comprehensive analysis of the subject based on rigorous statistical tools has yet to be conducted. EVT provides such a rigorous statistical framework for investigating extreme temperature events in response to current and future climate change and for accurately assessing changes in extreme temperature.

The objective of this study is to provide a comprehensive analysis of observed changes in temperature extremes in Riyadh city during the period 1985–2014 by applying stationary and non-stationary EVT tools. In a first step, stationary and non-stationary generalized extreme value (GEV) models were fitted to the yearly and half-yearly maximum temperature data. The results reveal that the stationary Gumbel distribution is a reasonable model for the annual block maxima, whereas a non-stationary GEV with a quadratic trend in the location parameter is recommended for the half-yearly maxima. In a second step, generalized Pareto distribution (GPD) models were fitted to the daily data under stationary and non-stationary conditions. The findings show that modelling the daily maximum temperatures improves when the GPD is applied to the declustered series, and this model outperforms the non-stationary GPD models. Furthermore, the retained GPD model performed better than the GEV. The return level estimates obtained from both EVT models indicate that new maximum temperature records could appear within the next 20, 50 and 100 years.

To the best of our knowledge, no study has focused on modelling extreme weather conditions in Saudi meteorological data using stationary and non-stationary EVT models, despite the significant interest in further analysing extreme temperature data. Such an analysis is essential for understanding the possible occurrence and magnitude of extremes in the present and future climate.

The paper is organized as follows: Sect. 2 presents a review of extreme value theory. Section 3 provides the methods of estimation and model selection. Data and preliminary analysis are presented in Sect. 4. Results and discussions are presented in Sect. 5, and Sect. 6 concludes the paper.

2 Extreme Value Theory

In this section, we revisit extreme value theory to provide a basis for our modelling of extreme temperature events in climate data. Readers interested in a more detailed background may consult standard texts on EVT, such as Embrechts et al. [53]. EVT provides two fundamental results. First, under certain conditions, the distribution of the standardized maximum (or minimum) of a series converges to the GEV distribution. The second result concerns the distribution of excesses over a given threshold, for which EVT shows that the limiting distribution is the GPD.

2.1 GEV Distribution

2.1.1 GEV Distribution for Stationary Processes

EVT is concerned with the asymptotic distribution of standardized maxima (or minima) from a series of independent and identically distributed (i.i.d.) random variables X_1, X_2, … with unknown common cumulative distribution function (cdf) F(x) = P(X_i ≤ x).

By the Fisher–Tippett theorem, the normalized maximum converges in distribution to the GEV distribution, whose cumulative distribution function is as follows:

$$ G(x;\mu,\sigma,\xi)=\begin{cases}\exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \xi\neq 0,\ \ 1+\xi\left(\frac{x-\mu}{\sigma}\right)>0\\ \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, & \xi=0\end{cases} $$
(1)

where μ ∈ ℝ, σ > 0 and ξ ∈ ℝ are the location, scale and shape parameters, respectively. The Fisher–Tippett theorem states that the asymptotic distribution of the maxima is a Fréchet, Weibull or Gumbel distribution, regardless of the original distribution of the observed data. The shape parameter ξ describes the tail behaviour of the maximum distribution: for ξ = 0, the GEV is the Gumbel distribution; for ξ > 0, the tail of the GEV is “heavier” than that of the Gumbel distribution; and for ξ < 0, it is “lighter”. The GEV is said to have a type II tail for ξ > 0 and a type III tail for ξ < 0.

Having modelled the upper tail of a distribution by fitting a GEV distribution, the fitted model can be used for inference. One of the main applications of extreme value analysis is the estimation of the once-per-N-years return level: an event exceeding this level is expected to occur, on average, once every N years. The N-year return level based on the GEV distribution, z_N, is given by

$$ z_N=\begin{cases}\hat{\mu}-\frac{\hat{\sigma}}{\hat{\xi}}\left\{1-\left[-\log\left(1-\frac{1}{N}\right)\right]^{-\hat{\xi}}\right\}, & \hat{\xi}\neq 0\\ \hat{\mu}-\hat{\sigma}\log\left[-\log\left(1-\frac{1}{N}\right)\right], & \hat{\xi}=0\end{cases} $$
(2)

The classical G(x; μ, σ, ξ) model assumes that the location, scale and shape parameters are time independent [32]; this model is frequently called the “stationary approach”. However, if trends are detected in the data, the non-stationary case, in which the parameters are no longer constant but are expressed as functions of covariates (e.g. time), should be considered.
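
As a concrete illustration, the sketch below fits stationary GEV and Gumbel models and estimates return levels via Eq. (2), using the R package extRemes adopted in Sect. 3. The data vector `ann_max` is a hypothetical placeholder for the annual maxima, not data from this study.

```r
# Minimal sketch (R, extRemes): stationary GEV/Gumbel fits and N-year
# return levels. `ann_max` is a hypothetical vector of annual maxima.
library(extRemes)

fit_gev <- fevd(ann_max, type = "GEV")     # mu, sigma, xi estimated by MLE
fit_gum <- fevd(ann_max, type = "Gumbel")  # GEV with xi fixed at 0

# Likelihood-ratio test of H0: xi = 0 (Gumbel) against the GEV alternative
lr.test(fit_gum, fit_gev)

# 20-, 50- and 100-year return levels (Eq. 2) with 95% CIs
return.level(fit_gum, return.period = c(20, 50, 100), do.ci = TRUE)
```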

2.1.2 GEV Distribution for Non-stationary Processes

The stationary GEV in Sect. 2.1.1 can be extended to non-stationary processes by including covariates in the parameters of the model (e.g. Coles [32]). In this paper, we limit our investigation to the case of dependence only on time. The non-stationary GEV distribution can be denoted as G(μ(t), σ(t), ξ(t)) with distribution function

$$ G(x;\mu(t),\sigma(t),\xi(t))=\begin{cases}\exp\left\{-\left[1+\xi(t)\left(\frac{x-\mu(t)}{\sigma(t)}\right)\right]^{-\frac{1}{\xi(t)}}\right\}, & \xi(t)\neq 0,\ \ 1+\xi(t)\left(\frac{x-\mu(t)}{\sigma(t)}\right)>0\\ \exp\left\{-\exp\left[-\left(\frac{x-\mu(t)}{\sigma(t)}\right)\right]\right\}, & \xi(t)=0\end{cases} $$
(3)

Following El Adlouni et al. [46], Cannon [54] and Hasan et al. [55], we allow for possible non-stationary behaviour in the location μ and scale σ parameters but keep the shape parameter ξ constant. More specifically, nine models of varying complexity can be defined in this way (three choices for each of j and k), allowing constant, linear and quadratic dependence on time for both the location μ and scale σ parameters:

$$ \begin{aligned}\mu(t)&=\mu_0+\mu_1 t+\mu_2 t^2\\ \sigma(t)&=\exp\left(\sigma_0+\sigma_1 t+\sigma_2 t^2\right)\end{aligned} $$
(4)

We denote by GEVjk the model with time dependence of order j in the location parameter and order k in the scale parameter. For example, the stationary GEV distribution is GEV00, obtained when the location and scale parameters are both independent of time (μ1 = μ2 = 0 and σ1 = σ2 = 0), whilst the non-stationary GEV21 model assumes a quadratic trend in location and a log-linear trend in scale (σ2 = 0). The appropriate specification was chosen using selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), together with statistical tests.
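
In the extRemes package used later (Sect. 3), the GEVjk specifications correspond to covariate formulas for the location and scale parameters. A minimal sketch, with the maxima `ann_max` and the time index t as hypothetical placeholders:

```r
# Minimal sketch (R, extRemes) of the GEVjk family in Eq. (4).
# use.phi = TRUE models log(sigma), keeping the scale parameter positive.
library(extRemes)

df <- data.frame(z = ann_max, t = seq_along(ann_max))

gev00 <- fevd(z, df, type = "GEV")                      # stationary
gev10 <- fevd(z, df, location.fun = ~ t, type = "GEV")  # linear mu(t)
gev21 <- fevd(z, df, location.fun = ~ t + I(t^2),       # quadratic mu(t)
              scale.fun = ~ t, use.phi = TRUE,          # log-linear sigma(t)
              type = "GEV")

lr.test(gev00, gev10, df = 1)  # nested models compared by likelihood ratio
```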

2.2 Generalized Pareto Distribution

2.2.1 Stationary GPD Models

In practice, modelling only the maximum of a collection of random variables is wasteful if other data on extreme values are available. A more efficient approach to modelling extreme events is therefore to focus not only on the largest events but on all events greater than some large preset threshold. This is referred to as peaks-over-threshold (POT) modelling.

Let us define the excess distribution above the threshold u as the conditional probability:

$$ F_u(y)=\Pr\left(X-u\le y \mid X>u\right)=\frac{F(u+y)-F(u)}{1-F(u)},\quad y\ge 0 $$
(5)

The theorem of Balkema and de Haan [56] and Pickands [57] shows that the generalized Pareto distribution (GPD) is the limiting distribution of F_u(y) as the threshold tends to the right endpoint x_F. It states that if the distribution function F belongs to the maximum domain of attraction of an extreme value distribution G, then a positive measurable function σ(u) can be found such that

$$ \lim_{u\to x_F}\ \sup_{0\le y\le x_F-u}\left|F_u(y)-G_{\xi,\sigma(u)}(y)\right|=0 $$
(6)

where G_{ξ,σ(u)}(y), the generalized Pareto distribution (GPD), is

$$ G_{\xi,\sigma(u)}(y)=\begin{cases}1-\left[1+\xi\left(\frac{y}{\sigma(u)}\right)\right]^{-\frac{1}{\xi}}, & \xi\neq 0,\ \ 1+\xi\left(\frac{y}{\sigma(u)}\right)>0\\ 1-\exp\left(-\frac{y}{\sigma(u)}\right), & \xi=0\end{cases} $$
(7)

where y ≥ 0 for ξ ≥ 0 and 0 ≤ y ≤ −σ(u)/ξ for ξ < 0.

The GPD encompasses a number of other distributions: if ξ > 0, G is a reparametrized version of the ordinary Pareto distribution; ξ = 0 corresponds to the exponential distribution; and ξ < 0 gives the Pareto type II distribution.

The estimated return level x_m that is exceeded on average once every m observations is

$$ x_m=\begin{cases}u+\frac{\hat{\sigma}}{\hat{\xi}}\left[\left(m\,\zeta_u\right)^{\hat{\xi}}-1\right], & \hat{\xi}\neq 0\\ u+\hat{\sigma}\log\left(m\,\zeta_u\right), & \hat{\xi}=0\end{cases} $$
(8)

where u is the selected threshold, ζ_u = P(X > u) = k/n, k is the number of exceedances and n is the number of observations. By construction, x_m is the m-observation return level; however, it is often more convenient to give return levels on an annual scale, so that the N-year return level is the level expected to be exceeded once every N years. If there are m_x observations per year, this corresponds to the m-observation return level with m = m_x × N. Hence, an estimate of the N-year return level x_N is given by

$$ x_N=\begin{cases}u+\frac{\hat{\sigma}}{\hat{\xi}}\left[\left(N\,m_x\,\zeta_u\right)^{\hat{\xi}}-1\right], & \hat{\xi}\neq 0\\ u+\hat{\sigma}\log\left(N\,m_x\,\zeta_u\right), & \hat{\xi}=0\end{cases} $$
(9)
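
The return level formula translates directly into code. The following sketch implements Eq. (9); all input values are illustrative assumptions, not estimates from this study.

```r
# Minimal sketch (R): N-year GPD return level per Eq. (9). Arguments:
# threshold u, GPD estimates sigma and xi, empirical exceedance rate
# zeta_u = k/n, and m_x observations per year (365.25 for daily data).
gpd_return_level <- function(N, u, sigma, xi, zeta_u, m_x = 365.25) {
  if (abs(xi) > 1e-8) {
    u + (sigma / xi) * ((N * m_x * zeta_u)^xi - 1)
  } else {
    u + sigma * log(N * m_x * zeta_u)
  }
}

# Example with hypothetical values u = 42, sigma = 2.5, xi = -0.4, k/n = 0.05
sapply(c(20, 50, 100), gpd_return_level,
       u = 42, sigma = 2.5, xi = -0.4, zeta_u = 0.05)
```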

The selection of the threshold u, which is crucial for the success of GPD modelling, involves a delicate trade-off between bias and variance. If the threshold is chosen too high, there are not enough exceedances to obtain good estimators of the extreme value parameters, and consequently the variances of the estimators are high. Conversely, if the threshold is too low, the GPD may not be a good fit to the excesses over the threshold and the estimates will be biased. Several diagnostic techniques exist for this purpose, including graphical and bootstrap methods (e.g. [53]).

In the present study, we chose the threshold u using standard exploratory techniques based on parameter stability plots (e.g. [32]).
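
A parameter stability plot of this kind can be produced with extRemes; a minimal sketch, assuming a hypothetical daily series `tmax` and a threshold range matching the one examined in Sect. 5.2:

```r
# Sketch (R, extRemes): GP parameter stability over a range of thresholds.
library(extRemes)

threshrange.plot(tmax, r = c(40, 48), type = "GP", nint = 25)
```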

The POT model requires that the exceedances be mutually independent. However, especially for climate data, this assumption may be violated because of serial correlation, which at high levels appears as clustering of values of the series. To overcome this issue, a declustering approach is used that filters the data series to remove the clustering and obtain a set of independent threshold exceedances.

Most declustering methods are based on the estimation of a statistic called the extremal index θ, defined as the reciprocal of the limiting mean cluster size [32]. In the absence of autocorrelation (clustering) in the series, θ = 1; if θ < 1, there is clustering in the data.

Broadly, there are two approaches for estimating the extremal index based on a run length r. In the first, clusters are formed by arbitrarily specifying the run length r, such that a cluster is considered active until r consecutive values fall below the threshold u. The extremal index is then estimated as the ratio of the number of clusters to the number of exceedances of the threshold. This approach is called the “runs estimator”.

In the second approach, the extremal index is estimated by treating the exceedance times as observed values of a point process whose limit is a Poisson process, and the optimal run length is derived from this estimate. This “intervals estimator” was formulated by Ferro and Segers [58], extending the work of Hsing et al. [59] and Smith and Weissman [60], amongst others. The choice of r governs a bias-variance trade-off: a value of r that is too small casts doubt on the assumed independence of cluster maxima, whereas a value that is too large may leave too few cluster maxima, reducing the precision of the GPD parameter estimates. The intervals estimator avoids such arbitrary choices, as it is based directly on the extremal index, a parameter that measures the degree of clustering of extremes in a stationary process and takes values in the interval [0, 1].

In the present work, we used the automatic declustering scheme developed by Ferro and Segers [58] to obtain independent clusters of exceedances. For a detailed review of declustering methods, the interested reader is referred to Smith et al. [61], Ferro and Segers [58] and Fawcett and Walshaw [62].
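
A sketch of this declustering workflow with extRemes, again assuming a hypothetical daily series `tmax` and an illustrative threshold `u`:

```r
# Sketch (R, extRemes): Ferro-Segers intervals estimator of the extremal
# index, followed by automatic declustering and a GP fit to the result.
library(extRemes)

u  <- 42  # illustrative threshold, not a recommendation
ei <- extremalindex(tmax, threshold = u, method = "intervals")
print(ei)  # extremal index estimate and implied run length

tmax_dc <- decluster(tmax, threshold = u, method = "intervals")
fit_dc  <- fevd(tmax_dc, threshold = u, type = "GP")
```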

2.2.2 Non-Stationary GPD Models

The POT method described above is valid only for extremes for which stationarity can be assumed. The idea now is to describe non-stationarity, as in temperature series that generally show a strong seasonal pattern, by allowing the GPD parameters to depend on time or other covariates. In the literature, various approaches have been developed to deal with non-stationarity arising from seasonality or trends.

The most widely adopted technique for data that vary seasonally is to partition the data into seasons and perform a separate extremal analysis for each season (e.g. [20, 63]). A second approach fits continuous parametric functions to capture the seasonality (e.g. [64]).

Thus, to model any seasonal variation, we opt for the second approach and include in our analysis a non-stationary GPD model with cyclic variation in the scale parameter. In Eq. (7), the shape parameter is assumed constant, that is ξ(t) = ξ, whilst σ(t) is given by

$$ \log \left(\sigma (t)\right)={\sigma}_0+{\sigma}_1\sin \left(\frac{2\pi }{365.25}t\right)+{\sigma}_2\cos \left(\frac{2\pi }{365.25}t\right) $$
(10)

For comparison, we introduce a second non-stationary GPD model with linear time covariate in the scale parameter given by

$$ \sigma (t)=\exp \left({\sigma}_0+{\sigma}_1t\right) $$
(11)

It should be noted that the exponential function is adopted to introduce time dependence into the scale parameter and thus ensure its positivity. It is also possible to incorporate non-stationarity into the shape parameter; however, the shape parameter of an extreme value distribution is very difficult to estimate precisely when it is time dependent, so it is not realistic to model it as a smooth function of time [32]. The superiority of the non-stationary GPD models over the stationary GPD models was assessed with the likelihood-ratio test.
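
Both non-stationary specifications can be expressed as covariate formulas in extRemes; a minimal sketch under the same hypothetical `tmax` series, with `use.phi = TRUE` placing the covariates on log(σ) as in Eqs. (10) and (11):

```r
# Sketch (R, extRemes): non-stationary GPD models of Eqs. (10) and (11).
library(extRemes)

df <- data.frame(z = tmax, day = seq_along(tmax))

m1 <- fevd(z, df, threshold = 42, type = "GP")  # stationary baseline

# M3: harmonic (seasonal) variation in log(sigma), Eq. (10)
m3 <- fevd(z, df, threshold = 42, type = "GP", use.phi = TRUE,
           scale.fun = ~ sin(2 * pi * day / 365.25) +
                         cos(2 * pi * day / 365.25))

# M4: log-linear trend in sigma, Eq. (11)
m4 <- fevd(z, df, threshold = 42, type = "GP", use.phi = TRUE,
           scale.fun = ~ day)

lr.test(m1, m3, df = 2)  # two extra parameters in M3
lr.test(m1, m4, df = 1)  # one extra parameter in M4
```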

3 Estimation and Model Selection

The model parameters of the GEV and GPD can be estimated in a variety of ways, including maximum likelihood (ML) techniques, the L-moments approach of Hosking [65] and Bayesian methods (e.g. [66]). The most common methods for parameter estimation in climate research are ML estimation (e.g. Katz et al. [26, 36]) and the method of moments. The ML method, although problematic for very small samples, is preferable owing to its universal applicability and its attractive asymptotic properties; moreover, it allows the introduction of covariates such as time into the model [36]. Since the ML approach is the most common in the literature, we concentrate on it here.

The extreme value analyses were performed using the free software R and the extRemes package, which is designed for problems involving extreme weather and climate events. For a brief introduction to the capabilities of extRemes, we refer to the paper of Gilleland and Katz [67].

In the literature, various methods exist to identify the best model from a set of carefully selected candidates. One approach relies on information criteria, e.g. the Bayesian information criterion (BIC, e.g. [36]), the Akaike information criterion (AIC, e.g. [68]) and the Hannan–Quinn information criterion (HIC, e.g. [69]).

In this study, the AIC and BIC statistics were used to determine the most applicable candidate model: the best model minimizes both statistics whilst fulfilling the individual Student's t tests on its parameters. For a discussion of the use of these criteria for model selection, we refer to Burnham and Anderson [70].
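
For reference, both criteria are simple functions of the maximized log-likelihood; a small sketch with purely illustrative numbers:

```r
# Sketch (R): AIC and BIC from a maximized log-likelihood logL, number of
# parameters k and sample size n (all values here are hypothetical).
ic <- function(logL, k, n) {
  c(AIC = -2 * logL + 2 * k,
    BIC = -2 * logL + k * log(n))
}

ic(logL = -75.3, k = 3, n = 30)  # e.g. a stationary GEV fit to 30 maxima
```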

4 Data and Preliminary Analysis

4.1 Data

The data used in this study consist of daily maximum temperatures for 30 years, spanning 1 January 1985 to 31 December 2014, collected from the Saudi Arabian Presidency of Meteorology and Environment (PME). The city of Riyadh, the capital of Saudi Arabia, was selected as the case study area. Riyadh is located at 24.7° N and 46.71° E, at an elevation of 635 m above sea level, and is a typical station experiencing the hot and dry climate of central Saudi Arabia. The city has undergone significant population growth and urban expansion during the last decades, and such development may have been accompanied by changes in local extreme temperature patterns [11]. New industrial zones have appeared in hitherto green and open areas, increasing air pollution within the urban environment, and the ever-increasing number of motor vehicles points to further pollution. Rising greenhouse gases and changes in the reflective properties of the Earth's surface are predicted to raise global temperatures [16].

This data set is challenging to model given the variety of extreme weather and climate conditions experienced at the station and the region's vulnerability to a changing climate.

4.2 Preliminary Analysis of Temperature Data

Table 1 shows descriptive statistics for the daily maximum temperatures together with the two selection periods, yearly and half-yearly. The daily maximum temperature in Riyadh fluctuates between 2.5 and 48.2 °C over the period 1985–2014. The 10,957 daily maximum temperatures have a standard deviation of 9.1 °C and a coefficient of variation of 0.269.

Table 1 Descriptive statistics of maximum temperature

After partitioning the data into yearly and half-yearly selection periods, it is observed that, as the selection period lengthens, the difference between the minimum and maximum shrinks and the coefficient of variation decreases. This indicates that the maximum temperature data are less dispersed around the mean for longer selection periods. The skewness (SK) is negative for both periods, indicating that the left tail of the distribution is relatively longer than the right and that most of the data are concentrated to the right of the mean; the reverse would hold for positive skewness. Figure 1 shows the time series plot of the annual maximum temperature (°C) observed at Riyadh over the period 1985–2014. The graph shows no clear trend in the annual maximum data; this visual impression will be confirmed by statistical tests.

Fig. 1 Annual maximum temperature (°C) observed at Riyadh, 1985–2014

To get an idea of the extremes in our data, we examine the absolute-frequency histograms of daily extreme temperatures presented in Fig. 2. The graph displays the frequencies of extreme daily temperatures per year for the period 1 January 1985 through 31 December 2014. As the cut-off level defining an extreme temperature is raised, the number of observations declines substantially. In addition, some years had many extremely hot days with T ≥ 45 °C (e.g. 2000, 1998, 2010, 1999 and 1996). Broadly, the frequency of extremes was higher in the late 1990s, reaching a peak in 2000, and the most extreme observations appear more frequent after 1996. Overall, on about a third of the days of a year the daily maximum temperature falls in the interval 40 ≤ T < 45 °C, and on about a sixth it falls between 35 ≤ T < 40 °C.

Fig. 2 Absolute frequencies of daily extreme temperatures (1 January 1985–31 December 2014)

Prior to applying the EVT models, some tests are required to assess the properties of our data. Three types of tests were performed, for independence, stationarity and the existence of a trend. In particular, the GEV distribution requires the random variables to be independent with a common distribution. The independence assumption can be violated because seasonality causes temperature data to vary systematically, and failure to take this non-homogeneity into account will affect the analysis and produce inaccurate results. Therefore, larger blocks are considered to make the assumption of a common distribution plausible, even though they generate only a few block maxima. In addition, this choice of block length can often avoid the need to account for within-year seasonal variation in environmental data, which may be seen as an advantage over other block lengths for this type of application.

One way of checking the assumption of independence is the Ljung–Box test. The results, presented in Table 2, show that the null hypothesis of independence could not be rejected for either selection period (yearly and half-yearly). This result is crucial for modelling with the GEV distribution, which rests on the assumption of independent random variables.

Table 2 Statistical tests for maximum temperature data

The same table reports the results of the Augmented Dickey–Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests, performed to check the stationarity of the data for the different selection periods. The ADF test suggests that the null hypothesis of a unit root cannot be rejected for either the yearly or the half-yearly period; the KPSS test, however, concludes in favour of stationarity for the yearly maxima but not for the half-yearly maxima.

The Mann–Kendall (MK) trend test is performed to detect the presence of a monotonic trend in the data, under the null hypothesis of no trend. The results in Table 2 reveal that the null hypothesis is not rejected for either selection period at the 5% level, whilst it is rejected for the half-yearly period at the 10% level.

The MK test result is thus broadly concordant with the stationarity tests, particularly the KPSS test, once the significance level is relaxed to 10%. These results suggest that we ought to model the data set under both stationary and non-stationary assumptions.
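
All of these tests are available in standard R packages; a minimal sketch, assuming a hypothetical vector `maxima` of block maxima and an illustrative lag for the Ljung–Box test:

```r
# Sketch (R): the preliminary tests of Table 2. adf.test/kpss.test are in
# the tseries package and MannKendall in the Kendall package.
library(tseries)
library(Kendall)

Box.test(maxima, lag = 10, type = "Ljung-Box")  # H0: serial independence
adf.test(maxima)                                 # H0: unit root
kpss.test(maxima, null = "Level")                # H0: level stationarity
MannKendall(maxima)                              # H0: no monotonic trend
```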

Based on the above analysis, we are justified in trying to fit extreme value theory to the data over the studied time period.

5 Results and Discussions

5.1 Application of GEV Distribution to Annual Temperature Extremes

In this study, the maximum temperature data are first blocked into annual lengths. An analysis of annual maximum data is likely to be more robust than one based on shorter blocks. Consequently, larger blocks are considered to make the assumption of independence plausible, even though this generates only a few block maxima. An annual block length also often avoids the need to account for within-year seasonal variation in environmental data, which may be seen as an advantage over other block lengths for this type of application.

Since the time series available for this study covers only a few decades, the sample size of the annual maxima is relatively small: with a 1-year block, only 30 annual maximum temperatures are available for modelling, which may affect the accuracy of the estimated GEV parameters. We tried to overcome this shortcoming by considering shorter block lengths (monthly, quarterly and half-yearly). However, we finally retained only the yearly and half-yearly periods, as these are the only block lengths that yield samples satisfying the independence assumption required for the stationary GEV.

The model selection is based on nine models of varying complexity, as defined in Sect. 2.1.2, incorporating both stationary and non-stationary GEV specifications. The extreme value analysis is first performed by fitting the GEV distribution with constant location and scale parameters to the sequence of annual maxima. The likelihood-ratio test is then used to compare the goodness of fit of this model with the Gumbel distribution (GUM). If the null hypothesis H0: ξ = 0 against the two-sided alternative (ξ ≠ 0) is not rejected at the 0.05 significance level, we retain the Gumbel distribution in a first step and try to improve the model by allowing time dependence in the location μ and/or the scale σ parameters. If, instead, the null hypothesis is rejected, the analysis is based on the GEVjk models defined in Sect. 2.1.2. Model selection and parameter estimates for the candidate models are given in Tables 3 and 4.

Table 3 Model selection of GEV candidate models (for annual maxima) and return levels for the retained model
Table 4 Model selection of Gumbel candidate models (for half-yearly maxima) and return levels for the retained model

Table 3 presents the estimation results for the annual maxima together with the resulting AIC and BIC values. The estimated shape parameter for the annual period is very close to 0. The null hypothesis H0: ξ = 0 was tested against the two-sided alternative (ξ ≠ 0) using the likelihood-ratio test; it is not rejected at the 0.05 significance level, which justifies applying the Gumbel distribution to the annual temperature maxima. Since stationary behaviour can be unrealistic for maxima, particularly in a climate change context, this model is compared with the eight candidate models defined above. The results show that the best specification is the simple stationary Gumbel distribution (GUM00): it has the smallest AIC and BIC after models with insignificant parameters (e.g. GUM20) are removed from the competing set. This finding is consistent with the Mann–Kendall (MK) trend test and the stationarity tests: the former failed to detect a monotonic trend in the annual data, and the latter showed that the data are level stationary.

The diagnostic plots for assessing the adequacy of the Gumbel model are shown in Fig. 3. Both the probability plot and the quantile plot indicate that the Gumbel fit is reasonable: each set of points follows quasi-linear behaviour. The return level plot shows approximate linearity, consistent with ξ = 0 of the Gumbel distribution. Finally, the density estimate appears consistent with the empirical density of the data. Consequently, all four diagnostic plots support the fitted Gumbel model.

Fig. 3 Model diagnostics for the GUM00 fit to the Riyadh maximum temperature data: quantile–quantile plot (top left), quantiles from a sample drawn from the fitted model against the empirical data quantiles with 95% confidence bands (top right), density plots of empirical data and fitted model (bottom left) and return level plot with 95% point-wise normal approximation confidence intervals (bottom right)

Once the best model for the data has been selected, interest turns to the return levels of extreme maximum temperature. Estimates and confidence intervals of the 20-, 50- and 100-year return levels, based on the retained GUM00 model, are given in Table 3. The return levels for maximum temperature gradually increase with the return period. From these results, the maximum temperature at Riyadh is expected to exceed about 49.02 °C on average every 20 years, about 49.25 °C on average every 50 years and about 50.34 °C every 100 years; the corresponding 95% confidence intervals, in degrees Celsius, were (48.57; 49.93), (49.87; 50.72) and (49.36; 51.32), respectively. It is almost certain that the yearly maximum will exceed the current record of 48.2 °C.

The estimation results for the half-yearly maxima, together with the resulting AIC and BIC values, are shown in Table 4. The null hypothesis H0: ξ = 0 was tested against the two-sided alternative (ξ ≠ 0) using the likelihood-ratio test. The null hypothesis is rejected at the 0.05 significance level, which justifies applying the GEV distribution to the half-yearly maxima. We continue the analysis by examining the existence of trends in the GEV parameters.

Our results show that the non-stationary GEV20 is the best specification. This model satisfies the individual Student's t tests and minimizes both the AIC and the BIC. The shape parameter was found to be negative (ξ = −0.31), indicating that the distribution of the extreme values is of Weibull type; hence the upper tail is bounded at a finite point.

The diagnostic plots in Fig. 4 show that the fitted model is satisfactory. The return levels for the GEV20 model vary through the years because the model is non-stationary, which prevents a straightforward presentation of the results.

Fig. 4 Model diagnostics for the GEV20 fit to the Riyadh maximum temperature data: quantile–quantile plot (top left), quantiles from a sample drawn from the fitted GEV against the empirical data quantiles with 95% confidence bands (top right), density plots of empirical data and fitted GEV (bottom left) and return level plot with 95% point-wise normal approximation confidence intervals (bottom right)

5.2 Application of the GPD to the Maximum Daily Temperatures

The POT approach requires that the exceedances be mutually independent. However, in the case of temperature data, this assumption may be violated because of serial correlation: a hot day is likely to be followed by another hot day. To overcome this issue, we use a declustering approach, which filters the data series to remove the clustering and obtain a set of independent threshold exceedances.

Prior to declustering, we need to specify a threshold for the GPD model. As explained in Sect. 2.2, the selection of the threshold is crucial for the GPD, yet there is no systematic approach for choosing an optimal threshold. We therefore find an appropriate threshold by fitting GP models over a sequence of thresholds and identifying the lowest threshold that yields roughly the same parameter estimates as any higher threshold (see [32] for further details).

Accordingly, the GPD was fitted to the maximum daily temperatures over a range of thresholds. The maximum likelihood estimates of the modified scale (σ*) and shape (ξ) parameters, plotted against u, are shown in Fig. 5. If the GPD is a reasonable model for the exceedances above a certain threshold u, the estimates of σ* and ξ should remain near-constant [32].

Fig. 5 Adjustment of the GPD for a range of 25 threshold values from 40 to 48 °C: the modified scale (σ*) and the shape parameter (ξ) versus the threshold for the daily maximum temperature at Riyadh, KSA, 1985–2014

The plot indicates a range of acceptable values for u: the results are not very sensitive to the choice of threshold between 41 and 44 °C, and parameter perturbations are small up to the chosen threshold of 42 °C. This choice is reasonable: the threshold must be large enough for the limiting GPD of Eq. (6) to be a good approximation to the exceedance distribution, whilst not so large as to unnecessarily reduce the number of exceedances available for the analysis. To confirm the threshold choice, QQ plots were created using data screened at different thresholds. The plots in Fig. 6 show that the chosen threshold of 42 °C does an adequate job of describing most of the data. On this basis, we recommend u = 42 °C.

Fig. 6 QQ plots for the temperature data for a range of thresholds

Before deciding whether declustering is needed, we examine the auto-tail dependence function plot, shown in Fig. 7, which allows us to check for temporal dependence in the threshold excess data.

Fig. 7 Auto-tail dependence function for the daily temperature data using a quantile threshold of 0.90

The sample auto-tail dependence function, based on ρ̂, an estimate suggested by Reiss and Thomas [71], is plotted against increasing lags. It takes values between zero and one (inclusive). If the values over a high threshold are stochastically independent, then ρ̂ should be close to (1 − u) at each lag, where u is the quantile threshold, here u = 0.90. Inspection of Fig. 7 shows that all lags of five or greater are fairly close to 0.4, whilst the lag-one to lag-four terms lie between 0.45 and 0.63, so the assumption of independence may not be reasonable for these data.
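
This diagnostic can be reproduced with extRemes; a one-line sketch under the same hypothetical `tmax` series:

```r
# Sketch (R, extRemes): sample auto-tail dependence function at the 0.90
# quantile threshold; values near 1 - 0.90 = 0.10 at a lag are consistent
# with independence at that lag.
library(extRemes)

atdf(tmax, u = 0.90, lag.max = 20)
```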

To evaluate the strength of tail dependence, the extremal index was estimated for the maximum temperature data at the threshold of 42 °C using the method of Ferro and Segers [58], from which the optimal run length was determined. The extremal index estimate was 0.031, with a 95% confidence interval of (0.023, 0.043), suggestive of strong tail dependence and in agreement with Fig. 7. The estimated run length was 7. Given this evidence of tail dependence, we employ a declustering scheme to filter out a set of approximately independent threshold excesses.

To check whether declustering improves the modelling, we present the estimation results of two stationary GPD models. The first (model M1) is fitted to all exceedances under the assumption of independence, and the second (model M2) is fitted to the declustered series of daily maximum temperatures using the optimal run length determined by the method of Ferro and Segers [58].

Table 5 displays the parameter estimates for both models. For model M2, the shape parameter estimate is negative (ξ = −0.39), indicating a short-tailed distribution whose upper tail is bounded at a finite point. The 95% confidence interval for ξ, (−0.483; −0.311), does not contain zero, which implies that the upper tail does not decay exponentially. There is thus strong evidence, at the 5% significance level, in support of an upper-bounded distribution (ξ < 0) for the declustered data.

Table 5 Summary of results from fitting the GPD models with and without declustering

For model M1, we also obtain strong evidence in support of a GPD with ξ < 0. The shape parameter estimate is ξ = −0.420, which differs from the value found for model M2.

From Table 5, we note that when the GPD is fitted to the declustered series, the scale parameter σ is estimated to be larger, and the shape parameter ξ smaller in magnitude, than under the approach that uses all exceedances (without declustering).

Figure 8 shows the QQ plot for the declustered series using a run length of 7; the points follow the line of equality, indicating a good model fit. The same figure shows the QQ plot of the GPD fitted to all exceedances. The GPD fitted to the declustered data (model M2) appears the more appropriate for our data: there is a substantial improvement in fit once clustered observations are filtered from the analysis. In addition, model M2 has the lowest values of both the AIC and BIC information criteria.

Fig. 8 QQ plots of the GPD models fitted to the data with declustering (left) and without declustering (right)

Of greater practical interest are the estimated return levels. Table 6 shows that the estimates are similar for the 20-year return period but are greater for the declustered data at the 50- and 100-year return periods. We can expect new maximum temperature records within the next 20, 50 and 100 years.

Table 6 Estimates of return levels for the GPD models fitted to the data without and with declustering

Within climate data, it is common to observe non-stationarity because different seasons have different climate patterns or because of longer-term trends owing to climate change. Consequently, it is essential to take non-stationarity into account when modelling extremes. As an example, we have plotted the daily maxima for the years 2013 and 2014 (Fig. 9). The temperatures clearly follow a seasonal pattern: temperature increases from January to a peak during summer (June, July and August) and then decreases to a minimum during winter (December, January and February).

Fig. 9 Maximum daily temperature records from January 2013 to December 2014

To handle these seasonal variations when modelling non-stationarity, one possibility is to describe the seasonal variation in exceedances through a separate analysis for each season. An alternative is to incorporate sinusoidal functions of time into the parameter estimates of the GPD model.

In this study, two non-stationary GPD models were included in the analysis, with the shape parameter assumed constant, ξ(t) = ξ, and σ(t) given either by cyclic variation in the scale parameter (Eq. 10) or by a linear time covariate in the scale parameter (Eq. 11).

The results of fitting the non-stationary GPD models are given in Table 7. They show a significant trend in the scale parameter for both models.

Table 7 Summary of results from fitting the non-stationary GPD models

Comparing the log-likelihood values of the two non-stationary models, the non-stationary GPD with cyclic variation in the scale parameter appears more appropriate than the one with a linear time covariate in the scale parameter.

The likelihood-ratio test is used to compare the three models; the test statistics and p values are listed in Table 8. According to the p values of the likelihood-ratio comparisons between the stationary GPD model (M1) and the GPD with cyclic variation in the scale parameter (model M3), and between M1 and the model with a linear time covariate in the scale parameter (model M4), both non-stationary models provide a significant improvement over M1, with M3 the more appropriate of the two.

Table 8 Likelihood ratio test results for comparison of models 1 versus 3 and 1 versus 4

However, comparing the AIC and BIC criteria in Table 7 with those in Table 5, the GPD fitted to the declustered observations appears more reasonable. There is strong evidence that the GPD model adequately estimates the tail behaviour of the data series once the data have been declustered.

5.3 Comparison of EVT Models

It is well known that, in theory, the parameter ξ is the same in the GEV and the GPD. However, because different amounts of data are used in the block maxima analysis and in the exceedance analysis, the parameter estimates differ. In fact, with annual blocks, the GEV considers only the annual maxima, whilst the GPD models all values that exceed a certain threshold u.

In this paper, the Gumbel distribution was found to be a reasonable choice for modelling the annual block maxima; however, with half-yearly block maxima, the most applicable model is a non-stationary GEV with a quadratic trend in the location parameter. This result may be due to the relatively small sample size (30 observations) of annual maxima used in the first case. In fact, a comparison of the diagnostics in Figs. 3 and 4 confirms that the GEV20 is more suitable than the stationary Gumbel.

Concerning the GPD approach, fitting to the declustered data proved better than fitting the GPD to all exceedances while ignoring dependence: there is a substantial improvement in GPD fit once clustered observations are filtered from the analysis.

When non-stationarity is taken into account in the POT analyses, the non-stationary GPD with sinusoidal functions in the scale parameter is the more appropriate of the non-stationary models; nevertheless, the stationary GPD fitted to the declustered observations remains the more reasonable overall.

Based on the above analysis, we retain the GEV20 and the GPD fitted to the declustered data as the most suitable models. To compare their performance, we place the diagnostic plots for the two models side by side (Fig. 10). Both models fit the data fairly well, but the GPD applied to the declustered data appears to do the better job. This result is not surprising, since it is well established in the literature that the GPD is the more efficient approach to modelling extreme events; the discrepancy seems to arise because the GPD uses a much higher proportion of the original data values.

Fig. 10 QQ plots of the GPD fitted to the declustered data and the GEV20 fitted to half-yearly maximum temperatures

6 Conclusions

In this paper, we have studied extreme temperatures in Riyadh city, KSA. Different approaches coming from extreme value theory (EVT) are applied and compared to model, analyse and forecast extreme weather conditions under stationary and non-stationary contexts. In particular, we have modelled the daily maximum temperatures recorded in Riyadh, the capital of Saudi Arabia, using the GEV and GPD models.

Major findings and conclusions of this study are as follows:

  • The stationary Gumbel model was found capable of fitting the extreme temperature series when applied to annual block maxima. The non-stationary GEV models showed no advantage over the stationary model according to the model selection criteria used. This choice is also supported by the time series of the data, which shows no obvious evidence of climate change, which may well not yet have an impact at this location. The return level estimates reveal that temperatures exceeding the current maximum (48.2 °C) can be expected within the next 20, 50 and 100 years.

  • Despite the absence of any obvious trend in the yearly extreme data, there was evidence of a trend in the half-yearly data. The non-stationary GEV model with a quadratic trend in the location parameter shows an advantage over the stationary models. This result is partially supported by the Mann–Kendall (MK) and KPSS tests.

  • The GPD fitted to the declustered data was found superior to the GPD fitted to all exceedances. Similar results were found for the return levels: as with the retained stationary Gumbel model fitted to annual block maxima, new maximum temperature records could appear within the next 20, 50 and 100 years.

  • The non-stationary GPD with cyclic variation in the scale parameter was found more appropriate for modelling the temperature data than the non-stationary GPD with a linear time covariate in the scale parameter. However, the GPD fitted to the declustered observations appears the most reasonable among all the GPD candidates.

  • Comparing the EVT models, our findings suggest that the three retained models (the stationary Gumbel, the non-stationary GEV and the GPD with declustering) all fit the temperature data fairly well, but the GPD applied to the declustered data appears to do the best job. This is not surprising, since it is well established in the literature that the GPD is a more efficient approach than the GEV.

The main outcomes of this study are broadly consistent with the findings of Tanarhte et al. [72], who studied the characteristics of heat waves in the eastern Mediterranean and Middle East region during the period 1973–2010. Their results, based on stationary extreme value theory, reveal that the return levels calculated for individual hot days are very high in the Arab Gulf region. In particular, they report that a temperature of 50 °C is expected to be exceeded every 20 years in Dhahran (Saudi Arabia) and every 10 years in Kuwait, 48 °C every 10 years in Doha (Qatar) and 46 °C every 10 years in Eilat (Israel). Our results are also in line with previous studies that used climate indices to analyse extreme temperatures, for example Zhang et al. [73] for the Middle East region and Athar [12] for Saudi Arabia, in which significant increasing trends were found in the annual maximum of daily maximum temperature.

In this paper, we have shown how extreme value theory serves as a useful tool for describing extreme temperature events. The study improves our understanding of temperature extremes in Riyadh city, and it may be beneficial for quantifying future heat wave properties as part of forecasts or longer-term projections from climate models. It may also offer insights for policy makers and planners, such as predictions of long-run future temperatures. However, our study also underlines the importance of assessing the climate change exposure of the region as a whole rather than relying on a single station, which may not be representative of the entire area; we were not able to confirm the impact of climate change based on these observations alone.

The results we have presented show the need for further investigation and can therefore be extended in several ways. In this study, we considered data from only a single station to demonstrate the EVT methodology for climate studies; it is not realistic to extrapolate the findings to larger spatial scales, such as the whole of Saudi Arabia, without further analysis using temperature data from multiple observation stations. In addition, non-stationarity in extremes increases the complexity of the modelling, mainly because of the subjectivity involved in choosing appropriate parametric functions for seasonality and long-term trends.

Accordingly, the statistical modelling can be further expanded and improved in various ways. For example, multiple observation stations could be used to test the statistical significance of the results presented here and their generalization to the whole region. Two scientific concerns that have recently emerged in climate research motivate such extensions. First, the Nature Climate Change paper of Pal and Eltahir [51], “Future temperature in southwest Asia projected to exceed a threshold for human adaptability”, addresses this region specifically and shows that the Persian/Arab Gulf region and parts of southwest Asia could become uninhabitable before the turn of the century as temperatures rise to intolerable levels. Second, the IPCC reports expected profound climate change that may radically change the picture of our climate.

Finally, researchers could extend our framework to jointly model, via copula functions, the dependence between extreme temperatures and air pollution, which are well known to be closely coupled and to generally move together.