1 Introduction

Max-stable processes have drawn attention in the recent past, by providing an asymptotically justified framework for modeling spatial extremes, and allowing extrapolation beyond observed data (see, e.g., Davison et al. 2012). Although max-stable processes cannot be characterized by a parametric family, the canonical approach is to fit flexible parametric max-stable models. However, in practice, strong constraints are usually imposed: the max-stable models considered up to now are usually stationary (i.e., shift-invariant) and isotropic (i.e., rotation-invariant). Neglecting non-stationarity at extreme levels may not only provide a poor description of the data, but more importantly, it may also have dramatic consequences on the estimation of return levels (i.e., extrapolation to high quantiles) for spatial quantities, as illustrated in Fig. 1. While it is relatively straightforward to construct non-stationary models for marginal distributions, e.g., by letting the underlying parameters depend on covariates or splines (Chavez-Demoulin and Davison 2005; Cooley et al. 2007; Northrop and Jonathan 2011; Davison and Gholamrezaee 2012), it is more difficult to model non-stationarity in the dependence structure. Furthermore, even if a suitable family of non-stationary models can be identified, performing inference may be awkward if the dataset is not spatially rich enough. Since rare events are scarce by nature, it is even more tricky to detect non-stationary patterns at extreme levels, and there have been very few attempts to tackle this important issue so far. A related problem is the incorporation of substantive knowledge, e.g., from physical processes, into max-stable processes. In particular, information might be gained by including meaningful covariates in the dependence structure.

Fig. 1

True return level curves for the spatial functionals \(\mathrm{INT}_j\) (left), \(\mathrm{MIN}_j\) (middle), and \(\mathrm{MAX}_j\) (right), \(j=1,2\), for domains \({\mathcal {S}}_1=[0,0.2]\times [0,1]\) (solid) and \({\mathcal {S}}_2=[0.8,1]\times [0,1]\) (dashed), based on the extremal t model. Black curves correspond to the stationary case, and red curves to the strongly non-stationary case; more details are given in Sect. 5 (Color figure online).

In an analysis of extreme snow depths, Blanchet and Davison (2011) proposed splitting the region of study into distinct homogeneous climatic zones to which stationary models were fitted separately, and where anisotropy was dealt with using simple geometric deformations of the space. Although their approach simplifies the problem at first sight, it yields a physically unrealistic description of extreme events at the boundary between zones, while the number of parameters also increases dramatically. Another solution advocated by Cooley et al. (2007) is to map the original latitude–longitude space to an alternative “climate space” in which stationarity may be a reasonable assumption, but this might lead to unrealistic realizations and conclusions in the original space. Alternatively, Smith and Stephenson (2009) and Reich and Shaby (2012) proposed Bayesian non-stationary max-stable models. The latter are, however, intrinsically linked to the Smith (1990) model, which is built from very smooth storm profiles and therefore lacks flexibility (though the Reich–Shaby model cures this somewhat by having an additional parameter controlling the amount of noise). Furthermore, Bayesian max-stable models are difficult to fit (Ribatet et al. 2012), although Thibaud et al. (2015) recently showed how this may be performed in relatively moderate dimensions. In the bivariate case, de Carvalho and Davison (2014) proposed a non-parametric approach linking different spectral densities through exponential tilting. Castro et al. (2015) extended this to covariate-dependent spectral densities; see also de Carvalho (2015). However, these methods are computationally intensive and difficult to apply in large dimensions.

In the classical geostatistics literature, several non-stationary models have been suggested. Paciorek and Schervish (2006) proposed a large family of non-stationary correlation functions based on Gaussian kernel convolutions, which can be constructed from known stationary isotropic models. Nychka et al. (2002) built flexible non-stationary covariance functions using multi-resolution wavelets. Fuentes (2001) and Reich et al. (2011) created non-stationary models by mixing stationary covariance functions and letting the weights depend on covariates. Jun and Stein (2007, 2008), Castruccio and Stein (2013), and Castruccio and Genton (2016) advocated a spectral approach that provides flexible non-stationary covariance models on the sphere. Alternatively, Sampson and Guttorp (1992), Perrin and Monestiez (1999), Schmidt and O’Hagan (2003), and Anderes and Stein (2008) created non-stationary processes by smooth deformations of isotropic random fields. Bornn et al. (2012) proposed modeling non-stationarity through dimension expansion. Lindgren et al. (2011) developed non-stationary models for Gaussian random fields and Gaussian Markov random fields based on stochastic partial differential equations (SPDEs).

The present paper aims at merging ideas from extreme-value theory and classical geostatistics by proposing simple parametric models able to capture non-stationary patterns in spatial extremes through covariates. To this end, a flexible approach based on max-stable processes and Paciorek and Schervish’s (2006) correlation model is advocated. Loosely speaking, the new models proposed here are formed by a first layer justified for extremes, within which non-stationarity is handled with locally elliptical kernels, and by a second layer, where these kernels are further described using covariates. As will be explained below, these models can also be seen locally as smoothly deformed isotropic max-stable random fields. The use of mixtures is advocated to capture different smoothness behaviors in distinct subregions.

The full likelihood for max-stable processes is intractable when the number of sites exceeds \(D=13\) (see Castruccio et al. 2016), and for some models, the joint density can only be computed for dimension \(D=2\). This explains why pairwise likelihoods (Lindsay 1988; Varin et al. 2011) have become the standard tool for inference in this context (Padoan et al. 2010; Thibaud et al. 2013; Huser and Davison 2014), although more efficient approaches based on the point process characterization of extremes have recently been proposed (Wadsworth and Tawn 2014; Engelke et al. 2015; Thibaud and Opitz 2015; Thibaud et al. 2015).

In Sect. 2, max-stable processes are introduced and some properties and limitations of the Smith–Stephenson model are discussed. In Sect. 3, we propose new non-stationary max-stable models that are more flexible than the Smith–Stephenson model. In Sect. 4, we discuss inference based on pairwise likelihoods, and in Sect. 5, we conduct a simulation study to investigate the ability of the estimators to capture non-stationarity in the dependence structure. We also investigate the effect of ignoring non-stationarity on the estimation of spatial return levels. In Sect. 6, we illustrate the methods on annual temperature maxima recorded in Colorado during 1895–1997, and we conclude with a discussion in Sect. 7.

2 Max-Stable Processes

2.1 Theoretical Foundation

Suppose that \(X_1(\varvec{s}),X_2(\varvec{s}), \ldots \) are independent and identically distributed random processes with continuous sample paths on \({\mathcal {S}}\subset \mathbb {R}^d\), and that there exist sequences of functions \(a_n(\varvec{s})>0\) and \(b_n(\varvec{s})\) such that the renormalized process of pointwise maxima \(a_n(\varvec{s})^{-1}[\max \{X_1(\varvec{s}),\ldots ,X_n(\varvec{s})\}-b_n(\varvec{s})]\) converges weakly to a process \(Z(\varvec{s})\) with non-degenerate margins, as \(n\rightarrow \infty \). Then, \(Z(\varvec{s})\) must be max-stable, i.e., for any positive integer k, the finite-dimensional distributions of \(Z(\varvec{s})\) and \(\max \{Z_1(\varvec{s}),\ldots ,Z_k(\varvec{s})\}\), where \(Z_1(\varvec{s}),\ldots ,Z_k(\varvec{s})\) denote independent replicates of \(Z(\varvec{s})\), differ only through location and scale coefficients. In particular, margins follow the generalized extreme-value distribution \(G(z)=\exp \left( -\left[ 1+\xi (\varvec{s})\{z-\mu (\varvec{s})\}/\sigma (\varvec{s})\right] _+^{-1/\xi (\varvec{s})}\right) \), with spatially varying location, scale, and shape parameters \(\mu (\varvec{s}),\sigma (\varvec{s})>0,\xi (\varvec{s})\), respectively. Furthermore, defining standardized processes as \(Y_i(\varvec{s})=1/[1-F_{\varvec{s}}\{X_i(\varvec{s})\}]\) \((i=1,2,\ldots )\) with \(F_{\varvec{s}}(x)\) the marginal distribution of \(X_i(\varvec{s})\) at location \(\varvec{s}\), the limiting distribution of \(n^{-1}\max \{Y_1(\varvec{s}),\ldots ,Y_n(\varvec{s})\}\) is max-stable with unit Fréchet margins (i.e., GEV with parameters \(\mu (\varvec{s})=\sigma (\varvec{s})=\xi (\varvec{s})=1\)). Such a limiting process is called simple max-stable. Standardization allows the treatment of the margins to be separated from the dependence structure.
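As a small numerical illustration of the marginal standardization, the following sketch evaluates the GEV distribution function \(G(z)\) at a single site and maps a GEV-distributed value to the unit Fréchet scale through \(z\mapsto -1/\log G(z)\). The function names `gev_cdf` and `to_unit_frechet` are hypothetical, and the case \(\xi =0\) is treated as the usual Gumbel limit, which is an assumption not spelled out in the text.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) at one site; xi == 0 is the Gumbel limit (assumption)."""
    y = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-y))
    t = 1.0 + xi * y
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper one (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def to_unit_frechet(z, mu, sigma, xi):
    """Map a GEV value to the unit Frechet scale via -1 / log G(z)."""
    g = gev_cdf(z, mu, sigma, xi)
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return math.inf
    return -1.0 / math.log(g)

# With mu = sigma = xi = 1 the GEV is already unit Frechet, so the map is the identity.
print(to_unit_frechet(2.0, 1.0, 1.0, 1.0))
```

In this sketch the marginal parameters are taken as known; in applications they would be replaced by estimates, possibly depending on covariates.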

Simple max-stable processes have been characterized by de Haan (1984); see also Schlather (2002) and de Haan and Ferreira (2006, §9.4). Given points \(\{P_i;i=1,2,\ldots \}\) of a Poisson process with intensity \(p^{-2}\) \((p>0)\) and independent replicates \(\{W_i(\varvec{s});i=1,2,\ldots \}\) of a positive process \(W(\varvec{s})\) \((\varvec{s}\in {\mathcal {S}}\subset \mathbb {R}^d)\) with unit mean, the process created as

$$\begin{aligned} Z(\varvec{s})=\sup _{i=1,2,\ldots } P_i W_i(\varvec{s}) \end{aligned}$$
(1)

is a simple max-stable process. Conversely, under mild conditions, each continuous simple max-stable process can be decomposed as in (1). Furthermore, for any set of D spatial locations \({\mathcal D}=\{\varvec{s}_1,\ldots ,\varvec{s}_D\}\subset {\mathcal {S}}\), one has

$$\begin{aligned} \mathrm{Pr}\{Z(\varvec{s}_1)\le z_1,\ldots ,Z(\varvec{s}_D)\le z_D\}=\exp \left\{ -V_{{\mathcal D}}\left( z_1,\ldots ,z_D\right) \right\} , \end{aligned}$$
(2)

where the so-called exponent measure is \(V_{{\mathcal D}}\left( z_1,\ldots ,z_D\right) =\mathrm{E}\left[ \max \left\{ {W(\varvec{s}_1)/ z_1},\ldots ,{W(\varvec{s}_D)/ z_D}\right\} \right] \). The exponent measure has a closed-form formula for specific choices of \(W(\varvec{s})\); see, e.g., Schlather (2002), Nikoloulopoulos et al. (2009), Genton et al. (2011), Huser and Davison (2013), and Opitz (2013). A useful related quantity is the so-called extremal coefficient \(\theta (\varvec{s}_1,\varvec{s}_2)=V_{\mathcal D}(1,1)\in [1,2]\), \({\mathcal D}=\{\varvec{s}_1,\varvec{s}_2\}\), giving a measure of dependence between variables \(Z(\varvec{s}_1)\) and \(Z(\varvec{s}_2)\), or equivalently, extremal dependence between variables \(Y(\varvec{s}_1)\) and \(Y(\varvec{s}_2)\): \(\theta (\varvec{s}_1,\varvec{s}_2)=1\) corresponds to perfect dependence and \(\theta (\varvec{s}_1,\varvec{s}_2)=2\) to independence.

For more details about univariate and multivariate extremes, see Beirlant et al. (2004) and Davison and Huser (2015), and for an account of spatial extremes, see the review papers by Davison et al. (2012), Cooley et al. (2012), and Davison et al. (2013). See also the book by de Haan and Ferreira (2006), which explains the technicalities in depth.

Fig. 2

Simulations of Model (3) for locations \(\varvec{s}=(s_x,s_y)\in [0,1]^2\). Top left stationary isotropic case with \(\varvec{\Omega }_{\varvec{s}}=0.1^2\varvec{I}_2\). Top right non-stationary locally isotropic case with \(\varvec{\Omega }_{\varvec{s}}=0.4^22^{-8|s_x|}\varvec{I}_2\). Bottom left non-stationary homogeneously anisotropic case with \(\varvec{\Omega }_{\varvec{s}}=0.4^22^{-8|s_x|}\varvec{R}\), with \(\varvec{R}\in \mathbb {R}^{2\times 2}\) being a correlation matrix with correlation 0.8. Bottom right general non-stationary case with \((\varvec{\Omega }_{\varvec{s}})_{11}=0.4^22^{-8|s_x|}\), \((\varvec{\Omega }_{\varvec{s}})_{22}=0.4^22^{-8|1-s_x|}\), and \((\varvec{\Omega }_{\varvec{s}})_{12}=(\varvec{\Omega }_{\varvec{s}})_{21}=\{(\varvec{\Omega }_{\varvec{s}})_{11}(\varvec{\Omega }_{\varvec{s}})_{22}\}^{1/2}\{e^{h(\varvec{s})}-1\}/\{e^{h(\varvec{s})}+1\}\), \(h(\varvec{s})=2\log (3)e^{-30(s_x-0.5)^2}\). Realizations are based on the same random seed. The contours correspond to \(\theta (\varvec{s}_1,\varvec{s}_2)=1.2,1.5,1.8\) (narrow to wide), where \(\varvec{s}_1\) is the center location (cross). The color scale indicates quantile probabilities.

2.2 The Celebrated Smith Model and Its Non-stationary Extension

The first stationary max-stable model proposed in the literature is the Smith (1990) model, which assumes in (1) that \(W_i(\varvec{s})=\phi _d(\varvec{s}-\varvec{U}_i;\varvec{\Omega })\), where the \(\varvec{U}_i\)s are the points of a unit rate Poisson process on \({\mathcal {S}}=\mathbb {R}^d\) and \(\phi _d(\cdot ;\varvec{\Omega })\) denotes the d-dimensional Gaussian density function with zero mean and covariance matrix \(\varvec{\Omega }\). Although finite-dimensional distributions are known in arbitrary dimensions (Genton et al. 2011), they are always degenerate for \(D>d+1\), which raises the question of the suitability of the Smith model in practice. The non-stationary extension proposed by Smith and Stephenson (2009) considers spatially varying covariance matrices \(\varvec{\Omega }_{\varvec{s}}\), capturing the small-scale dependence structure around location \(\varvec{s}\in {\mathcal {S}}\). The generalized storm profiles are of the form:

$$\begin{aligned} W_i(\varvec{s})=\phi _d(\varvec{s}-\varvec{U}_i;\varvec{\Omega }_{\varvec{U}_i}). \end{aligned}$$
(3)

This model has the appealing property of being locally elliptic (a feature that we will retain for the more general model proposed in Sect. 3), in the sense that infinitesimal contours of the extremal coefficient form ellipses; see Fig. 2. Several special cases may be of interest in practice: if contours are locally circular with \(\varvec{\Omega }_{\varvec{s}}=\omega (\varvec{s})^2\varvec{I}_d\), where \(\omega ({\varvec{s}})>0\) and \(\varvec{I}_d\) is the d-by-d identity matrix, the model is locally isotropic (top right panel of Fig. 2), and when \(\omega ({\varvec{s}})=\omega >0\) for all \(\varvec{s}\in {\mathcal {S}}\), (3) reduces to the stationary isotropic case, i.e., the classical Smith model (top left panel of Fig. 2). When \(\varvec{\Omega }_{\varvec{s}}=\omega (\varvec{s})^2\varvec{R}\) for some fixed d-by-d correlation matrix \(\varvec{R}\), the model is not isotropic, but the anisotropy is homogeneous over space; see the bottom left panel of Fig. 2. If \(\omega ({\varvec{s}})=\omega >0\) for all \(\varvec{s}\in {\mathcal {S}}\), it reduces to the stationary anisotropic case, illustrated by Blanchet and Davison (2011). Smith and Stephenson (2009) provide bivariate margins in the homogeneously anisotropic case only; in the Supplementary Material, calculations are performed in full generality for \(D=2\).

The extremal coefficient of the stationary Smith model satisfies \(\theta (\varvec{s}_1,\varvec{s}_2)\equiv \theta (\Vert \varvec{h}\Vert )\rightarrow 2\), as \(\Vert \varvec{h}\Vert =\Vert \varvec{s}_1-\varvec{s}_2\Vert \rightarrow \infty \), which implies that complete independence can be captured at infinity. In \(\mathbb {Z}\), this is equivalent to the process being mixing (Kabluchko and Schlather 2010). In the Supplementary Material, we show that this property is also fulfilled by the Smith–Stephenson model with \(\varvec{\Omega }_{\varvec{s}}=\omega ({\varvec{s}})^2 \varvec{I}_d\) (locally isotropic case) provided \(\omega (\varvec{s}) = o(\Vert \varvec{s}\Vert )\); by a simple extension, this is also true when \(\varvec{\Omega }_{\varvec{s}}=\omega ({\varvec{s}})^2 \varvec{R}\), with \(\varvec{R}\) being a correlation matrix (homogeneously anisotropic case). This result makes sense because if one has \(\omega ({\varvec{s}})=O(\Vert \varvec{s}\Vert )\), the extent of a storm centered at \(\varvec{s}\) increases at the same rate as the distance separating \(\varvec{s}\) from any fixed other point \(\varvec{s}_0\), such that the storm contributes to the supremum (1) at location \(\varvec{s}_0\) with positive probability, no matter how far it is from \(\varvec{s}_0\).
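The limiting behavior of the extremal coefficient can be checked numerically. The sketch below assumes the standard closed form for the stationary Smith model, \(\theta (\varvec{h})=2\Phi \{(\varvec{h}^T\varvec{\Omega }^{-1}\varvec{h})^{1/2}/2\}\), with \(\Phi \) the standard normal distribution function; this expression is quoted as background and is not derived in the text. The function name `smith_extremal_coefficient` is hypothetical, and the code is restricted to \(d=2\).

```python
import math

def smith_extremal_coefficient(h, omega):
    """theta(h) = 2 * Phi(a / 2), a^2 = h^T Omega^{-1} h, for a symmetric 2x2 matrix omega."""
    (a11, a12), (a21, a22) = omega
    det = a11 * a22 - a12 * a21
    hx, hy = h
    # Quadratic form h^T Omega^{-1} h using the explicit inverse of a 2x2 matrix.
    q = (a22 * hx * hx - (a12 + a21) * hx * hy + a11 * hy * hy) / det
    a = math.sqrt(max(q, 0.0))
    phi = 0.5 * (1.0 + math.erf(a / (2.0 * math.sqrt(2.0))))  # Phi(a / 2)
    return 2.0 * phi

omega = [[0.1 ** 2, 0.0], [0.0, 0.1 ** 2]]  # stationary isotropic case, as in Fig. 2 (top left)
for dist in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(dist, smith_extremal_coefficient((dist, 0.0), omega))
```

The printed values increase from 1 (perfect dependence at lag zero) towards 2 (independence) as the lag grows.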

Although the Smith–Stephenson model is easily interpretable, it has several limitations. First, finite-dimensional distributions are known for \(D=2\) only. Second, pairwise densities involve the cumulative distribution and density of quadratic forms of normal variables, the computation of which may be intensive (see the Supplementary Material). Finally, as illustrated in Fig. 2, this process is very smooth. Realizations are infinitely differentiable in neighborhoods of all points that do not lie on the border between distinct storms, and this appears too strong an assumption in most environmental applications. In fact, the storm profiles are almost deterministic; randomness is solely created by the storm locations \(\varvec{U}_i\) and storm intensities \(P_i\) in (1). More flexible non-stationary max-stable models with stochastic storm profiles, generalizing (3), are proposed in Sect. 3.
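To make the role of the storm locations \(\varvec{U}_i\) and intensities \(P_i\) concrete, the following sketch simulates the stationary Smith model on the real line (\(d=1\)) from the representation (1). It rests on several assumptions not made in the text: storm centers are restricted to a finite window padded by a few standard deviations around the sites, the points \(P_i\) are generated as the window length divided by the arrival times of a unit-rate Poisson process, and the function name `simulate_smith_1d` is hypothetical.

```python
import math
import random

def simulate_smith_1d(sites, omega=0.1, pad=6.0, rng=random):
    """One draw of Z(s) = sup_i P_i W_i(s) at 1-D sites, with Gaussian storm profiles."""
    lo = min(sites) - pad * omega          # storm centers are confined to [lo, hi]
    hi = max(sites) + pad * omega
    width = hi - lo
    peak = 1.0 / (omega * math.sqrt(2.0 * math.pi))  # maximum of the Gaussian density
    z = [0.0] * len(sites)
    gamma = 0.0
    while True:
        gamma += rng.expovariate(1.0)      # arrival times of a unit-rate Poisson process
        p = width / gamma                  # decreasing storm intensities P_i
        if p * peak <= min(z):
            break                          # no later storm can exceed the current maxima
        u = rng.uniform(lo, hi)            # storm center U_i
        for j, s in enumerate(sites):
            w = peak * math.exp(-0.5 * ((s - u) / omega) ** 2)
            z[j] = max(z[j], p * w)
    return z

random.seed(1)
print(simulate_smith_1d([0.0, 0.05, 0.5, 1.0]))
```

Because the intensities are generated in decreasing order, the loop can stop as soon as the largest possible contribution of the next storm falls below the current minimum over sites. Apart from the window truncation, each margin should then be close to unit Fréchet.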

3 Flexible Non-stationary Dependence Structures

3.1 The Non-stationary Extremal t Model

The extremal t model (Nikoloulopoulos et al. 2009; Opitz 2013) is defined by taking

$$\begin{aligned} W(\varvec{s})=c_\mathrm{df}\max \{0,\varepsilon (\varvec{s})\}^\mathrm{df}, \qquad c_\mathrm{df}=2^{1-\mathrm{df/2}}\pi ^{1/2}\left[ \Gamma \left\{ ({\mathrm{df}+1)/ 2}\right\} \right] ^{-1}, \end{aligned}$$
(4)

in (1), where \(\mathrm{df}>0\), \(\varepsilon (\varvec{s})\) is a Gaussian process with zero mean, unit variance, and correlation function \(\rho (\varvec{s}_1,\varvec{s}_2)\), and \(\Gamma (\cdot )\) is the gamma function. The extremal t model does not capture independence unless \(\mathrm{df}\rightarrow \infty \) (Davison et al. 2012), but this issue may be resolved by incorporating a random set element (Davison and Gholamrezaee 2012; Huser and Davison 2014), though the inference is more tricky. The model (4) has several interesting sub-models, the stationary versions of which have been applied extensively. When \(\mathrm{df}=1\), (4) reduces to the Schlather (2002) model, which has been fitted in numerous applications (Davison and Gholamrezaee 2012; Davison et al. 2012; Ribatet 2013; Thibaud et al. 2013). The Brown–Resnick process (Brown and Resnick 1977; Kabluchko et al. 2009) arises as a limiting case of (4) as \(\mathrm{df}\rightarrow \infty \) (Davison et al. 2012); its storm profiles may be expressed as \(W(\varvec{s})=\exp \{\varepsilon (\varvec{s})-\gamma (\varvec{s})\}\), where \(\varepsilon (\varvec{s})\) is a Gaussian random field with semi-variogram \(\gamma (\varvec{h})\) such that \(\varepsilon (\varvec{0})=0\) almost surely. The Brown–Resnick process extends the Smith model (Huser and Davison 2013), and it can also be viewed as the generalization of the Hüsler and Reiss (1989) multivariate extreme-value distribution to the spatial framework. In practice, Brown–Resnick processes have proven to be quite flexible compared to the Smith and Schlather alternatives (Davison et al. 2012; Jeon and Smith 2012). Model (4) not only generalizes all aforementioned stationary max-stable models, but it is also the max-attractor for the broad class of all suitably rescaled elliptical processes (Opitz 2013), which provides strong support for its use in practice; as an illustration of its practical performance, see Thibaud and Opitz (2015). 
The bivariate exponent measure for (4) may be expressed as

$$\begin{aligned} V_{{\mathcal D}}\left( z_1,z_2\right)= & {} {1\over z_1}T_{\mathrm{df}+1}\left[ {\left( {z_2/ z_1}\right) ^{1/\mathrm{df}}-\rho (\varvec{s}_1,\varvec{s}_2)\over \left( \mathrm{df}+1\right) ^{-1/2}\left\{ 1-\rho (\varvec{s}_1,\varvec{s}_2)^2\right\} ^{1/2}}\right] \nonumber \\&+ {1\over z_2}T_{\mathrm{df}+1}\left[ {\left( {z_1/ z_2}\right) ^{1/\mathrm{df}}-\rho (\varvec{s}_1,\varvec{s}_2)\over \left( \mathrm{df}+1\right) ^{-1/2}\left\{ 1-\rho (\varvec{s}_1,\varvec{s}_2)^2\right\} ^{1/2}}\right] , \end{aligned}$$
(5)

where \(T_\mathrm{df}(\cdot )\) is the Student t cumulative distribution function with \(\mathrm{df}\) degrees of freedom. Explicit expressions in dimension D are also available (see Thibaud and Opitz 2015).
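Expression (5) is straightforward to evaluate numerically. The sketch below is one possible implementation using only the Python standard library: the Student t distribution function is approximated with Simpson's rule after the substitution \(t=\nu ^{1/2}\tan \varphi \), which is a choice made here for self-containedness and not part of the original methodology. The function names are hypothetical, and \(|\rho |<1\) is assumed.

```python
import math

def student_t_cdf(x, nu, n=2000):
    """Student t CDF with nu >= 1 degrees of freedom, by Simpson's rule (n must be even)."""
    # With t = sqrt(nu) * tan(phi), the density integrates as c * cos(phi)^(nu - 1) d(phi).
    c = math.exp(math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)) / math.sqrt(math.pi)
    upper = math.atan(x / math.sqrt(nu))
    step = upper / n
    total = 1.0 + math.cos(upper) ** (nu - 1.0)   # the two endpoints
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * math.cos(k * step) ** (nu - 1.0)
    return 0.5 + c * total * step / 3.0

def extremal_t_V(z1, z2, rho, df):
    """Bivariate exponent measure (5) of the extremal t model."""
    scale = math.sqrt((1.0 - rho * rho) / (df + 1.0))
    t1 = student_t_cdf(((z2 / z1) ** (1.0 / df) - rho) / scale, df + 1.0)
    t2 = student_t_cdf(((z1 / z2) ** (1.0 / df) - rho) / scale, df + 1.0)
    return t1 / z1 + t2 / z2

def extremal_coefficient(rho, df):
    """theta = V(1, 1), between 1 (perfect dependence) and 2 (independence)."""
    return extremal_t_V(1.0, 1.0, rho, df)

print(extremal_coefficient(0.5, 1.0))  # Schlather case, df = 1
```

For \(\mathrm{df}=1\), the value can be compared with the closed form \(1+\{(1-\rho )/2\}^{1/2}\) implied by (5), which provides a simple check of the numerical integration.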

Fig. 3

Simulation of the extremal t model (4), with \(\mathrm{df=5}\) and non-stationary correlation function (7), combined with (8), in \([0,1]^2\). Columns correspond to different smoothness scenarios, with \(\alpha =0.5,1,1.5\) (left to right). Locally isotropic (top row) and general non-stationary (bottom row) cases are displayed. The underlying spatially varying matrices are \(\varvec{\Omega }_{\varvec{s}}=(2\,\mathrm{df})^{2/\alpha }\times \varvec{\Omega }_{\varvec{s}}^\mathrm{BR}\), where \(\varvec{\Omega }_{\varvec{s}}^\mathrm{BR}=0.4^22^{-8|s_x|}\varvec{I}_2\) (top row) or \((\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{11}=0.4^22^{-8|s_x|}\), \((\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{22}=0.4^22^{-8|1-s_x|}\) and \((\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{12}=(\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{21}=\{(\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{11}(\varvec{\Omega }_{\varvec{s}}^\mathrm{BR})_{22}\}^{1/2}\{e^{h(\varvec{s})}-1\}/\{e^{h(\varvec{s})}+1\}\), where \(h(\varvec{s})=2\log (3)e^{-30(s_x-0.5)^2}\) (bottom row). Realizations are from the same random seed. Contour curves correspond to \(\theta (\varvec{s}_1,\varvec{s}_2)=1.2,1.5,1.8\) (narrow to wide), where \(\varvec{s}_1\) is the center location (cross). The color scale indicates quantile probabilities.

Our approach to modeling non-stationarity in spatial extremes consists of combining the extremal t model (4) with non-stationary correlation functions \(\rho (\varvec{s}_1,\varvec{s}_2)\) proposed in the classical spatial statistics literature. As mentioned above, there exist several ways to construct non-stationary correlation functions, spanning from space deformations to SPDEs, and including wavelets, spectral methods, mixtures of stationary correlations, or kernel convolutions. Hence, our methodology to tackle non-stationarity in extremes is very general and can potentially yield a large variety of models, each with its own advantages and drawbacks. There are (at least) three desirable properties that we would like our model to possess: simplicity, local ellipticity, and ease of incorporating covariates. We have found that the kernel convolution approach advocated by Paciorek and Schervish (2006) is especially satisfactory. These authors have proposed a very general construction of non-stationary correlation functions that are based on known isotropic correlation models. Specifically, let \(\varvec{\Omega }_{\varvec{s}}\) denote a (continuously) spatially varying d-by-d covariance matrix, and for any two locations \(\varvec{s}_1,\varvec{s}_2\in {\mathcal {S}}\) with separation vector \(\varvec{h}=\varvec{s}_2-\varvec{s}_1\), define the quadratic form \(Q_{\varvec{s}_1;\varvec{s}_2}\) as

$$\begin{aligned} Q_{\varvec{s}_1;\varvec{s}_2}=\varvec{h}^T\left( {\varvec{\Omega }_{\varvec{s}_1}+\varvec{\Omega }_{\varvec{s}_2}\over 2}\right) ^{-1}\varvec{h}. \end{aligned}$$
(6)

Paciorek and Schervish (2006) show that for any isotropic correlation function \(R(\Vert \varvec{h}\Vert )\) valid on \(\mathbb {R}^d\) \((d=1,2,\ldots )\), the function

$$\begin{aligned} \rho (\varvec{s}_1,\varvec{s}_2)=|\varvec{\Omega }_{\varvec{s}_1}|^{1/4}|\varvec{\Omega }_{\varvec{s}_2}|^{1/4}\bigg |{\varvec{\Omega }_{\varvec{s}_1}+\varvec{\Omega }_{\varvec{s}_2}\over 2}\bigg |^{-1/2}R\left( {Q_{\varvec{s}_1;\varvec{s}_2}}^{1/2}\right) \end{aligned}$$
(7)

provides a valid non-stationary correlation function on \(\mathbb {R}^d\) \((d=1,2,\ldots )\). To avoid parametrization redundancy, the function \(R(\Vert \varvec{h}\Vert )\) can be assumed to have unit range. Many isotropic correlation functions have been proposed in the literature (see, e.g., Cressie 1993; Stein 1999; Cressie and Wikle 2011), making (7) a useful constructive device for non-stationary correlation functions. One popular possibility is the powered exponential family

$$\begin{aligned} R(\Vert \varvec{h}\Vert )=\exp \left( -\Vert \varvec{h}\Vert ^\alpha \right) , \end{aligned}$$
(8)

where \(\alpha \in (0,2]\) is a smoothness parameter, and the exponential and squared exponential models correspond to \(\alpha =1\) and \(\alpha =2\), respectively. This correlation family generates random fields with very rough (with \(\alpha \rightarrow 0\)) to analytical sample paths (with \(\alpha =2\)). Hence, great flexibility can be obtained by combining (7) with (8). Since the max-stable model in (4) inherits its sample path differentiability properties from the underlying Gaussian process \(\varepsilon (\varvec{s})\), the parameter \(\alpha \) in (8) has a direct relationship with the smoothness of the resulting max-stable process. To illustrate this, typical realizations from the non-stationary extremal t model with \(\mathrm{df}=5\) combined with (7) and (8) are displayed in Fig. 3.
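The construction (6)–(8) can be sketched in a few lines for \(d=2\). The code below is a minimal illustration under the assumption that the matrices \(\varvec{\Omega }_{\varvec{s}_1}\) and \(\varvec{\Omega }_{\varvec{s}_2}\) are supplied as symmetric positive definite 2-by-2 nested lists; the function names `det2` and `ns_correlation` are hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def ns_correlation(s1, s2, omega1, omega2, alpha=1.0):
    """Non-stationary correlation (7) with the powered exponential family (8), d = 2."""
    avg = [[0.5 * (omega1[i][j] + omega2[i][j]) for j in range(2)] for i in range(2)]
    d_avg = det2(avg)
    hx, hy = s2[0] - s1[0], s2[1] - s1[1]
    # Quadratic form (6): h^T avg^{-1} h, using the explicit inverse of a symmetric 2x2 matrix.
    q = (avg[1][1] * hx * hx - 2.0 * avg[0][1] * hx * hy + avg[0][0] * hy * hy) / d_avg
    q = max(q, 0.0)  # guard against tiny negative values from rounding
    prefactor = (det2(omega1) * det2(omega2)) ** 0.25 / math.sqrt(d_avg)
    return prefactor * math.exp(-q ** (alpha / 2.0))  # R(Q^{1/2}) = exp(-Q^{alpha/2})

# Stationary isotropic case: reduces to exp{-(||h|| / omega)^alpha}.
om = [[0.2 ** 2, 0.0], [0.0, 0.2 ** 2]]
print(ns_correlation((0.0, 0.0), (0.2, 0.0), om, om, alpha=1.0))
```

When the two matrices coincide, the determinant prefactor equals one and the function reduces to an anisotropic stationary correlation; when they differ, the prefactor is smaller than one, which reflects the mismatch between the local kernels.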

Like the non-stationary Smith model, the correlation function (7) is locally elliptic, and this attractive geometric property is therefore preserved for the resulting non-stationary max-stable random field. This implies that the latter can be seen locally as a smoothly deformed isotropic max-stable process. To see this, fix \(\varvec{s}_0\in {\mathcal {S}}\) and let \(\varvec{s}_1,\varvec{s}_2\in N(\varvec{s}_0)\subset {\mathcal {S}}\) be two locations within some small neighborhood \(N(\varvec{s}_0)\) of \(\varvec{s}_0\). By continuity of the map \(\varvec{s}\mapsto \varvec{\Omega }_{\varvec{s}}\), one has that \(\varvec{\Omega }_{\varvec{s}_2}\approx \varvec{\Omega }_{\varvec{s}_1}\approx \varvec{\Omega }_{\varvec{s}_0}\) and \(Q_{\varvec{s}_1;\varvec{s}_2}\approx \varvec{h}^T \varvec{\Omega }_{\varvec{s}_0}^{-1} \varvec{h}\), where \(\varvec{h}=\varvec{s}_2-\varvec{s}_1\) is the lag vector. Then, applying the spatial transformation \(\varvec{s}\mapsto \varvec{s}^\star =\varvec{\Omega }_{\varvec{s}_0}^{-1/2} (\varvec{s}-\varvec{s}_0)\) in \(N(\varvec{s}_0)\), where \(\varvec{\Omega }_{\varvec{s}_0}=\varvec{\Omega }_{\varvec{s}_0}^{1/2}\varvec{\Omega }_{\varvec{s}_0}^{T/2}\), one can easily verify that the correlation function on the new coordinate system satisfies \(\rho (\varvec{s}_1^\star ,\varvec{s}_2^\star )\approx R(\Vert \varvec{h}^\star \Vert )\) with \(\varvec{h}^\star =\varvec{s}_2^\star -\varvec{s}_1^\star \); it is therefore locally isotropic.

Another appealing feature is that the proposed non-stationary extremal t model defined above using (7) and (8) with covariance matrices \(\varvec{\Omega }_{\varvec{s}}=(2\,\mathrm{df})^{2/\alpha }\times \varvec{\Omega }_{\varvec{s}}^{\mathrm{BR}}\) converges as \(\mathrm{df}\rightarrow \infty \) to the Brown–Resnick process with variogram \(2\gamma (\varvec{s}_1,\varvec{s}_2)=({Q_{\varvec{s}_1;\varvec{s}_2}^\mathrm{BR}})^{\alpha /2}\), where \(Q_{\varvec{s}_1;\varvec{s}_2}^\mathrm{BR}\) is defined in (6) using \(\varvec{\Omega }_{\varvec{s}}^\mathrm{BR}\). In particular, the Smith–Stephenson model (3) is recovered when \(\alpha =2\). In practice, this implies that it is enough to fit the non-stationary extremal t model, as our approach generalizes (3); if \(\mathrm{df}\) is found to be relatively large and \(\alpha \approx 2\), then it might also be interesting to consider the (more parsimonious) Smith–Stephenson model, although it is more complex to fit.

3.2 Covariates

We now continue our modeling on the plane with \({d=2}\), although our approach could be applied in higher dimensions. In order to retain simplicity in our modeling of non-stationarity, we seek to incorporate meaningful covariates in the extremal dependence structure. To this end, we propose further modeling the covariance matrices \(\varvec{\Omega }_{\varvec{s}}\) \((\varvec{s}\in {\mathcal {S}})\) as follows: let

$$\begin{aligned} \varvec{\Omega }_{\varvec{s}}= & {} \begin{pmatrix} \omega _x^2(\varvec{s}) &{} \omega _x(\varvec{s})\omega _y(\varvec{s})\delta (\varvec{s})\\ \omega _x(\varvec{s})\omega _y(\varvec{s})\delta (\varvec{s}) &{} \omega _y^2(\varvec{s}) \end{pmatrix},\quad \text{ with, } \text{ for } \text{ example, } \end{aligned}$$
(9)
$$\begin{aligned} \log \{\omega _x(\varvec{s})\}=\varvec{X}_{\omega _x}^T(\varvec{s})\varvec{\beta }_{\omega _x},\;\;\;\log \{\omega _y(\varvec{s})\}=\varvec{X}_{\omega _y}^T(\varvec{s})\varvec{\beta }_{\omega _y},\;\;\;\mathrm{logit}[\{\delta (\varvec{s})+1\}/2]=\varvec{X}_\delta ^T(\varvec{s})\varvec{\beta }_\delta , \end{aligned}$$
(10)

where \(\varvec{X}_{\omega _x}(\varvec{s}),\varvec{X}_{\omega _y}(\varvec{s})\), and \(\varvec{X}_\delta (\varvec{s})\) denote vectors of covariates corresponding to location \(\varvec{s}\), and \(\varvec{\beta }_{\omega _x},\varvec{\beta }_{\omega _y}\), and \(\varvec{\beta }_\delta \) are the associated vectors of parameters measuring the importance of covariates. The link functions in (10) ensure that \(\omega _x(\varvec{s})>0\), \(\omega _y(\varvec{s})>0\) and \(\delta (\varvec{s})\in (-1,1)\), but they could in principle be replaced by other functions that satisfy these conditions. The construction (9) guarantees the positive definiteness of \(\varvec{\Omega }_{\varvec{s}}\). The local correlation range at station \(\varvec{s}\) with respect to the x (respectively y) axis is measured by the functions \(\omega _x(\varvec{s})\) (respectively \(\omega _y(\varvec{s})\)), whereas \(\delta (\varvec{s})\) captures the local anisotropy level: if \(\delta (\varvec{s})=0\) and \(\omega _x(\varvec{s})=\omega _y(\varvec{s})\), the resulting process is locally isotropic, i.e., infinitesimal contours are circular everywhere, whereas if \(\delta (\varvec{s})\ne 0\), contours are slanted ellipses; see Fig. 3.
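The mapping from covariates to the local matrix \(\varvec{\Omega }_{\varvec{s}}\) in (9)–(10) can be sketched as follows. The function name `omega_from_covariates` and the covariate values in the usage example are hypothetical; the inverse of the link for \(\delta (\varvec{s})\) is written as a hyperbolic tangent, which is algebraically equivalent to inverting \(\mathrm{logit}[\{\delta (\varvec{s})+1\}/2]\).

```python
import math

def omega_from_covariates(x_wx, b_wx, x_wy, b_wy, x_d, b_d):
    """Build the 2x2 matrix (9) at one location from the link functions in (10)."""
    def dot(x, b):
        return sum(xi * bi for xi, bi in zip(x, b))
    wx = math.exp(dot(x_wx, b_wx))          # log link, so omega_x > 0
    wy = math.exp(dot(x_wy, b_wy))          # log link, so omega_y > 0
    delta = math.tanh(0.5 * dot(x_d, b_d))  # inverse of logit{(delta + 1) / 2}, in (-1, 1)
    off = wx * wy * delta
    return [[wx * wx, off], [off, wy * wy]]

# Hypothetical covariates: an intercept and one standardized covariate (e.g., altitude).
x = [1.0, 0.5]
omega = omega_from_covariates(x, [math.log(0.1), 0.0],
                              x, [math.log(0.2), 0.0],
                              x, [2.0 * math.log(3.0), 0.0])
print(omega)
```

In this usage example, the linear predictor for \(\delta (\varvec{s})\) equals \(2\log 3\), which gives \(\delta (\varvec{s})=(e^{2\log 3}-1)/(e^{2\log 3}+1)=0.8\), the same transformation as in the captions of Figs. 2 and 3.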

3.3 Max-Stable Mixtures

Although the non-stationary model (4) appears quite flexible, one limitation is that it has a single smoothness parameter for the whole region. This issue may be overcome by using non-stationary Matérn correlation functions (Stein 2005; Anderes and Stein 2011), or by using an approach based on mixtures. The latter is outlined below.

The first type of mixture consists of max-mixtures of max-stable models. Let \(Z^1(\varvec{s})\) and \(Z^2(\varvec{s})\) be independent max-stable processes with unit Fréchet margins defined on the same space \({\mathcal {S}}\). Then for any function \(0\le a(\varvec{s})\le 1\), the spatial process defined as \(Z(\varvec{s})=\max [a(\varvec{s})Z^1(\varvec{s}),\{1-a(\varvec{s})\}Z^2(\varvec{s})]\) is a simple max-stable process with exponent measure

$$\begin{aligned} V_{\mathcal D}(z_1,\ldots ,z_D)=V_{\mathcal D}^1\left\{ {z_1\over a(\varvec{s}_1)},\ldots ,{z_D\over a(\varvec{s}_D)}\right\} + V_{\mathcal D}^2\left\{ {z_1\over 1-a(\varvec{s}_1)},\ldots ,{z_D\over 1-a(\varvec{s}_D)}\right\} , \end{aligned}$$
(11)

where \(V_{\mathcal D}^1\) and \(V_{\mathcal D}^2\) are the exponent measures of \(Z^1(\varvec{s})\) and \(Z^2(\varvec{s})\), respectively. The function \(a(\varvec{s})\) is a spatially varying proportion, determining which of the processes \(Z^1(\varvec{s})\) and \(Z^2(\varvec{s})\) is dominant at location \(\varvec{s}\). Model (11) is stationary if \(a(\varvec{s})\) is constant over space and \(Z^1(\varvec{s})\) and \(Z^2(\varvec{s})\) are stationary, but it can be made non-stationary by allowing \(a(\varvec{s})\) to depend upon covariates, e.g., \(\mathrm{logit}\{a(\varvec{s})\}=\varvec{X}_a^T(\varvec{s})\varvec{\beta }_a\), where \(\varvec{X}_a(\varvec{s})\) is a vector of covariates for location \(\varvec{s}\) and \(\varvec{\beta }_a\) is the associated vector of parameters. Different smoothness behaviors may be captured in different spatial regions, provided that \(Z^1(\varvec{s})\) and \(Z^2(\varvec{s})\) have different degrees of differentiability. More complex non-stationary max-stable models \(Z(\varvec{s})\) may be constructed by considering a collection of independent stationary max-stable random fields \(Z^1(\varvec{s}),\ldots ,Z^k(\varvec{s})\) with unit Fréchet margins and associated proportions \(a^1(\varvec{s}),\ldots ,a^k(\varvec{s})\in [0,1]\) such that \(\sum _{i=1}^k a^i(\varvec{s})=1\) for each \(\varvec{s}\), yielding the simple max-stable process \(Z(\varvec{s})=\max _{i=1,\ldots ,k}\{a^i(\varvec{s})Z^i(\varvec{s})\}\). In practice, however, this model may involve too many parameters.

The second type of mixture consists of sum-mixtures of Gaussian processes (Fuentes 2001; Reich et al. 2011) used in the formulation of the extremal t model. Specifically, let \(\varepsilon ^1(\varvec{s}),\varepsilon ^2(\varvec{s})\) be two Gaussian processes with zero mean, unit variance, and correlation functions \(\rho ^1(\varvec{s}_1,\varvec{s}_2),\rho ^2(\varvec{s}_1,\varvec{s}_2)\), respectively, and let \(0\le a(\varvec{s})\le 1\) be a function defined on \({\mathcal {S}}\). Then, a non-stationary extremal t model may be obtained by considering the process \(\varepsilon (\varvec{s})= a(\varvec{s})\varepsilon ^1(\varvec{s})+\{1-a(\varvec{s})\}\varepsilon ^2(\varvec{s})\) in the construction (4) with correlation function

$$\begin{aligned} \rho (\varvec{s}_1,\varvec{s}_2)={a(\varvec{s}_1)a(\varvec{s}_2)\rho ^1(\varvec{s}_1,\varvec{s}_2) + \{1-a(\varvec{s}_1)\}\{1-a(\varvec{s}_2)\}\rho ^2(\varvec{s}_1,\varvec{s}_2)\over [a(\varvec{s}_1)^2+\{1-a(\varvec{s}_1)\}^2]^{1/2}[a(\varvec{s}_2)^2+\{1-a(\varvec{s}_2)\}^2]^{1/2}}. \end{aligned}$$
(12)

Again, the proportion \(a(\varvec{s})\) may be modeled in terms of covariates. Similarly, different smoothness behaviors over the space may be captured by the different mixture components. As above, model (12) can easily be extended to higher-dimensional mixtures, though this may lead to heavy parametrization. Although similar, the two types of max-stable mixtures are not equivalent, as their corresponding exponent measures differ.
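The correlation (12) is straightforward to evaluate numerically. The sketch below is a minimal Python version; the function name and the numerical values of the proportions and component correlations are hypothetical.

```python
import math

def mixture_correlation(a1, a2, rho1, rho2):
    """Correlation (12) between eps(s1) and eps(s2).

    a1, a2     : proportions a(s1), a(s2) in [0, 1]
    rho1, rho2 : component correlations rho^1(s1, s2), rho^2(s1, s2)
    """
    num = a1 * a2 * rho1 + (1.0 - a1) * (1.0 - a2) * rho2
    # The denominator rescales eps(s) to unit variance at both sites.
    den = math.sqrt(a1 ** 2 + (1.0 - a1) ** 2) * math.sqrt(a2 ** 2 + (1.0 - a2) ** 2)
    return num / den

# Hypothetical values: a rough component (rho1) and a smooth one (rho2).
print(mixture_correlation(0.8, 0.3, 0.2, 0.9))
```

Setting \(a(\varvec{s})\equiv 1\) or \(a(\varvec{s})\equiv 0\) recovers \(\rho ^1\) or \(\rho ^2\), respectively, and the correlation equals one when \(\varvec{s}_1=\varvec{s}_2\).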

4 Inference

4.1 Pairwise Likelihood

Likelihood inference for max-stable processes is not an easy task. The joint density for max-stable processes stems from the differentiation of (2) with respect to \(z_1,\ldots ,z_D\). In dimension \(D=2\), this equals \((V_1V_2-V_{12})\exp (-V)\), where \(V_1=\partial V_{\mathcal D}(z_1,z_2)/\partial z_1\), etc., with the subscript \({\mathcal D}\) and the arguments dropped for clarity. However, as D increases, the number of terms in this expression grows so fast that the full likelihood quickly becomes intractable: for \(D=10,20,50,100\), it is of the order \(10^5,10^{13},10^{47},10^{115}\), respectively. To get around this computational bottleneck, the use of pairwise likelihoods is now common practice (see, e.g., Padoan et al. 2010). Denoting the vector of unknown parameters by \(\varvec{\psi }\in \Psi \subset \mathbb {R}^p\), the log pairwise likelihood for model (2) may be expressed as

$$\begin{aligned} \ell (\varvec{\psi })=\sum _{i=1}^{m} \sum _{(j_1,j_2)\in {\mathcal {P}}} \left[ \log \left\{ V_1(z_{i;j_1},z_{i;j_2})V_2(z_{i;j_1},z_{i;j_2})-V_{12}(z_{i;j_1},z_{i;j_2})\right\} - V(z_{i;j_1},z_{i;j_2})\right] , \end{aligned}$$
(13)

where \(z_{i;j}\) denotes the \(i\mathrm{th}\) block maximum recorded at the \(j\mathrm{th}\) station, \(i=1,\ldots ,m\), \(j=1,\ldots ,D\), and where the non-empty set \({\mathcal {P}}\subseteq {\mathcal {P}}_\mathrm{tot}=\{(j_1,j_2):1\le j_1<j_2\le D\}\) defines the pairs of observations included in the pairwise likelihood. If \({\mathcal {P}}={\mathcal {P}}_\mathrm{tot}\), all pairs are considered in (13). Computational and statistical efficiency might, however, be gained by carefully selecting a much smaller number of pairs (Huser and Davison 2014; Castruccio et al. 2016). A possibility is to include a small fraction of informative pairs, i.e., typically the most dependent ones, though Huser and Davison (2014) show that further improvements may be obtained in special cases by including some weakly dependent pairs as well. For stationary isotropic processes, this might be achieved by including the closest pairs, whereas for non-stationary max-stable processes one might consider pairs \((j_1,j_2)\) with the lowest extremal coefficients \(\theta (\varvec{s}_{j_1},\varvec{s}_{j_2})\). Since the latter are unknown in practice, the choice of pairs might be guided by pre-computed empirical extremal coefficients \(\hat{\theta }(\varvec{s}_{j_1},\varvec{s}_{j_2})\); however, simulations (not shown) reveal that this approach creates bias, as the data are used twice: to select the pairs in the pairwise likelihood and to estimate the parameters by maximizing the latter. Under temporal independence, the maximum pairwise likelihood estimator \(\hat{\varvec{\psi }}\) maximizing (13) is strongly consistent, asymptotically Gaussian, and converges at \(m^{1/2}\) rate, and its asymptotic variance is of the sandwich form, as is typical for misspecified likelihood estimators (Padoan et al. 2010). More precisely, if \(\varvec{\psi }_0\in \mathrm{int}(\Psi )\) denotes the “true” parameter vector, then under mild regularity conditions, one has the large sample approximation

$$\begin{aligned} \hat{\varvec{\psi }}{\ {\buildrel \cdot \over \sim }\ }{\mathcal N}_p(\varvec{\psi }_0,\varvec{J}(\varvec{\psi }_0)^{-1}\varvec{K}(\varvec{\psi }_0)\varvec{J}(\varvec{\psi }_0)^{-1}),\quad m\rightarrow \infty , \end{aligned}$$
(14)

where \(\varvec{J}(\varvec{\psi })=\mathrm{E}\{-\partial ^2\ell (\varvec{\psi })/\partial \varvec{\psi }\partial \varvec{\psi }^T\}\in \mathbb {R}^{p\times p}\) and \(\varvec{K}(\varvec{\psi })=\mathrm{var}\{\partial \ell (\varvec{\psi })/\partial \varvec{\psi }\}\in \mathbb {R}^{p\times p}\). Uncertainty may be assessed by plugging estimates of the matrices \(\varvec{J}(\varvec{\psi }_0)\) and \(\varvec{K}(\varvec{\psi }_0)\) into the asymptotic variance in (14); see Padoan et al. (2010). Alternatively, one can bootstrap the independent replicates \(\varvec{z}_i=(z_{i;1},\ldots ,z_{i;D})^T\), \(i=1,\ldots ,m\), and re-estimate the parameters using the pseudo-samples, to assess the variability surrounding \(\hat{\varvec{\psi }}\). Similar asymptotic properties hold for mildly time-dependent processes (Davis et al. 2013; Huser and Davison 2014), for which uncertainty may be assessed using a block bootstrap.
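The ingredients of this subsection can be sketched in Python. The code below (i) counts the terms of the full likelihood, under the assumption that this count is the number of partitions of \(\{1,\ldots ,D\}\) (the Bell number), (ii) selects a fraction of the closest pairs of stations, and (iii) evaluates the log pairwise likelihood (13) for user-supplied functions \(V\), \(V_1\), \(V_2\), \(V_{12}\). All function names, station coordinates, and data values are hypothetical, and the exponent measure used for illustration is that of complete independence, not the extremal t model.

```python
import math
from itertools import combinations

def bell_number(n):
    """Number of partitions of a set of size n, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

def closest_pairs(coords, fraction=0.1):
    """Keep the given fraction of station pairs (j1 < j2) with smallest distance."""
    pairs = sorted(combinations(range(len(coords)), 2),
                   key=lambda p: math.dist(coords[p[0]], coords[p[1]]))
    n_keep = max(1, round(fraction * len(pairs)))
    return pairs[:n_keep]

def log_pairwise_likelihood(z, coords, pairs, V, V1, V2, V12):
    """Evaluate (13); z[i][j] is the i-th block maximum at station j.

    V, V1, V2, V12 are called as f(z1, z2, s1, s2).
    """
    total = 0.0
    for zi in z:
        for j1, j2 in pairs:
            args = (zi[j1], zi[j2], coords[j1], coords[j2])
            total += math.log(V1(*args) * V2(*args) - V12(*args)) - V(*args)
    return total

# Toy exponent measure (complete independence): V(z1, z2) = 1/z1 + 1/z2.
def V_ind(z1, z2, s1, s2):
    return 1.0 / z1 + 1.0 / z2

def V1_ind(z1, z2, s1, s2):
    return -1.0 / z1 ** 2

def V2_ind(z1, z2, s1, s2):
    return -1.0 / z2 ** 2

def V12_ind(z1, z2, s1, s2):
    return 0.0

coords = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]  # hypothetical stations
z = [[1.2, 0.8, 3.0, 2.5], [0.6, 0.7, 1.1, 0.9]]           # hypothetical maxima
pairs = closest_pairs(coords, fraction=0.34)
print(pairs, log_pairwise_likelihood(z, coords, pairs, V_ind, V1_ind, V2_ind, V12_ind))
print([bell_number(d) for d in (10, 20)])
```

In an actual fit, the four functions would be replaced by the exponent measure of the chosen max-stable model and its partial derivatives, and the resulting function of \(\varvec{\psi }\) would be passed to a numerical optimizer.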

4.2 Model Selection

Model comparison is typically performed using the composite likelihood information criterion (CLIC), defined as \(\mathrm{CLIC}=-2\ell (\hat{\varvec{\psi }}) + 2\mathrm{tr}\{\varvec{J}(\hat{\varvec{\psi }})^{-1}\varvec{K}(\hat{\varvec{\psi }})\}\), which is comparable to the Akaike information criterion. Another possibility is to use the composite Bayesian information criterion (CBIC), i.e., the counterpart of the classical Bayesian information criterion. It is defined as \(\mathrm{CBIC}=-2\ell (\hat{\varvec{\psi }}) + \log (m)\mathrm{tr}\{\varvec{J}(\hat{\varvec{\psi }})^{-1}\varvec{K}(\hat{\varvec{\psi }})\}\), and therefore penalizes model complexity more than does CLIC. The lower the CLIC or CBIC, the better the model. Theoretical properties of CLIC and CBIC have been investigated by Ng and Joe (2014) (in which CLIC and CBIC are called instead CLAIC and CLBIC, respectively). In particular, they show that CLIC has a tendency to select overcomplicated models. For a broad survey of composite likelihood methods, see Varin et al. (2011).
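The two criteria differ only in the multiplier of the penalty term, as the following Python sketch makes explicit. It assumes that the maximized log pairwise likelihood and the trace \(\mathrm{tr}\{\varvec{J}(\hat{\varvec{\psi }})^{-1}\varvec{K}(\hat{\varvec{\psi }})\}\) have already been computed; the function names and numerical values are hypothetical.

```python
import math

def clic(loglik, trace_jk):
    """CLIC = -2 l(psi_hat) + 2 tr{J^{-1} K}."""
    return -2.0 * loglik + 2.0 * trace_jk

def cbic(loglik, trace_jk, m):
    """CBIC = -2 l(psi_hat) + log(m) tr{J^{-1} K}, with m independent replicates."""
    return -2.0 * loglik + math.log(m) * trace_jk

# Hypothetical fitted values for two competing models with m = 100 replicates.
small = {"loglik": -5230.0, "trace_jk": 40.0}
large = {"loglik": -5205.0, "trace_jk": 75.0}
for name, fit in (("small", small), ("large", large)):
    print(name, clic(fit["loglik"], fit["trace_jk"]),
          cbic(fit["loglik"], fit["trace_jk"], m=100))
```

Since \(\log (m)>2\) as soon as \(m\ge 8\), the CBIC penalty exceeds the CLIC penalty for all but very short records.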

5 Simulation Study

5.1 Setup

In this simulation study, we assess the ability of the maximum pairwise likelihood estimator (14) to estimate and detect non-stationary dependence structures in a variety of contexts. We also study the effect of neglecting non-stationarity on spatial return levels.

Throughout this section, we focus on the locally isotropic extremal t model illustrated in the first row of Fig. 3 and consider various parameter combinations. Specifically, the extremal t process with \(\mathrm{df}=1,2,5,10\) is simulated on \([0,1]^2\), using the non-stationary correlation function \(\rho (\varvec{s}_1,\varvec{s}_2)\) defined in (7) based on the powered exponential model (8) with \(\alpha =0.5,1,1.5,1.9\) (rough to smooth). The underlying spatially varying covariance matrix is taken to be of the form \(\varvec{\Omega }_{\varvec{s}}=(2\,\mathrm{df})^{2/\alpha }\times \omega ({\varvec{s}})^2\varvec{I}_2\), where \(\varvec{I}_2\) is the 2-by-2 identity matrix and \(\omega ({\varvec{s}})=\beta _1 2^{-\beta _2|s_x|}\), \(\varvec{s}=(s_x,s_y)\), with range \(\beta _1>0\) and non-stationary parameter \(\beta _2\in \mathbb {R}\). To investigate different non-stationary scenarios, we consider \((\beta _1,\beta _2)=(0.1,0)\) (stationary), \((0.1\sqrt{2},1)\) (weakly non-stationary), (0.2, 2) (mildly non-stationary), and (0.4, 4) (strongly non-stationary). Although these scenarios exhibit different non-stationarity patterns, the overall dependence strength is comparable in the sense that all cases satisfy \(\omega ({\varvec{s}})=0.1\) for any \(\varvec{s}=(0.5,s_y)\) with \(s_y\in [0,1]\). The \(\mathrm{df}=1\) case corresponds to a non-stationary Schlather process, whereas the \(\mathrm{df}=10\) case is a crude approximation of a non-stationary Brown–Resnick process (with \(\alpha =1.9\) corresponding approximately to the non-stationary Smith model); recall Sect. 3.1. In each case, \(m=10,20,50,100\) independent replicates of these processes are simulated at \(S=10,20,50,100\) fixed locations uniformly sampled in the unit square. Simulations are repeated 300 times to compute empirical diagnostics.
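The four scenarios can be tabulated with a few lines of Python. The sketch below encodes \(\omega (\varvec{s})\) and the scalar multiplying \(\varvec{I}_2\) in \(\varvec{\Omega }_{\varvec{s}}\), and draws station locations uniformly in the unit square; the function names and the random seed are hypothetical.

```python
import math
import random

# (beta1, beta2) for the four scenarios of the simulation study.
SCENARIOS = {
    "stationary": (0.1, 0.0),
    "weakly non-stationary": (0.1 * math.sqrt(2.0), 1.0),
    "mildly non-stationary": (0.2, 2.0),
    "strongly non-stationary": (0.4, 4.0),
}

def omega(s_x, beta1, beta2):
    """Local range omega(s) = beta1 * 2^(-beta2 |s_x|)."""
    return beta1 * 2.0 ** (-beta2 * abs(s_x))

def omega_scale(s_x, beta1, beta2, df, alpha):
    """Scalar c(s) such that Omega_s = c(s) * I_2."""
    return (2.0 * df) ** (2.0 / alpha) * omega(s_x, beta1, beta2) ** 2

# Every scenario shares the same local range along the line s_x = 0.5.
for name, (b1, b2) in SCENARIOS.items():
    print(name, omega(0.0, b1, b2), omega(0.5, b1, b2), omega(1.0, b1, b2))

# S fixed locations sampled uniformly in [0, 1]^2 (seed chosen arbitrarily).
rng = random.Random(2016)
locations = [(rng.random(), rng.random()) for _ in range(100)]
```

In the strongly non-stationary case, the local range falls from 0.4 at \(s_x=0\) to 0.025 at \(s_x=1\), a sixteen-fold decrease across the domain.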

5.2 Estimation and Detection of Non-stationarity

We first investigate the ability of the maximum pairwise likelihood estimator (14) to recover the true parameters under the correct model. We estimate parameters \(\varvec{\psi }=(\beta _1,\beta _2,\mathrm{df},\alpha )^T\) with (14) using the 10% closest pairs; then we derive the empirical biases, standard deviations, and root mean squared errors (RMSEs) from the 300 independent experiments. RMSEs, typically dominated by the standard deviations, are reported in Table 1.
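The empirical summaries mentioned above can be computed as in the following Python sketch, which uses the divisor \(n\) for the standard deviation so that \(\mathrm{RMSE}^2=\mathrm{bias}^2+\mathrm{sd}^2\) holds exactly. The function name and the estimates are hypothetical.

```python
import math

def summarize(estimates, truth):
    """Empirical bias, standard deviation (divisor n), and RMSE of estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - truth
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    return bias, sd, rmse

# Hypothetical estimates of beta_1 from five experiments, true value 0.2.
bias, sd, rmse = summarize([0.19, 0.22, 0.21, 0.18, 0.23], truth=0.2)
print(bias, sd, rmse)
```

When the bias is small relative to the standard deviation, the RMSE is essentially the standard deviation, which is the sense in which the RMSEs are dominated by the latter.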

Table 1 Root mean squared error (\(\times \)100) of the maximum pairwise likelihood estimator (14), using the 10% closest pairs, for the locally isotropic extremal t with various parameter combinations.

We focus on the estimation of \(\beta _1\) and \(\beta _2\), which determine the non-stationary scenario. The range parameter \(\beta _1\) is quite well identified overall. The corresponding RMSE is less than 0.02, 0.04, 0.06, and 0.12 for \(\beta _1=0.1\), \(0.1\sqrt{2}\), 0.2, and 0.4, respectively, and it decreases as the smoothness parameter \(\alpha \) increases, and as the degrees of freedom (\(\mathrm{df}\)) increase. Furthermore, the higher the \(\beta _1\), the larger its RMSE, as expected. The RMSE for the non-stationary parameter \(\beta _2\) follows a similar pattern, though large values of \(\beta _2\) seem easier to estimate overall: for strongly non-stationary scenarios, the RMSE is quite small in comparison to the actual value of \(\beta _2\). This is certainly due to the very rigid type of assumed non-stationarity: a small perturbation of \(\beta _2\) entails a dramatic change in the dependence structure.

To illustrate increasing-domain and infill asymptotic properties of the estimator (14), Fig. 4 displays boxplots of parameter estimates, as a function of m and S for the extremal t model with \(\mathrm{df}=5\), \(\alpha =1\), and \((\beta _1,\beta _2)=(0.2,2)\). As expected, the estimator appears to be consistent as m increases. In addition, parameters are much better estimated if the data are collected at a dense network of sites, although the estimator is not consistent as \(S\rightarrow \infty \) for fixed m, as a result of the extremal t model being non-mixing. Interestingly, the estimated variances of \(\beta _1/\beta _2/\mathrm{df}/\alpha \) decrease by a factor of 4.9 / 4.7 / 8.3 / 4.8 when the number of independent repeated measurements increases from \(m=20\) to \(m=100\) (for \(S=100\)), whereas they drop by a factor of 13.2 / 9.9 / 5.8 / 17.2 when the number of dependent spatial measurements increases from \(S=20\) to \(S=100\) (for \(m=100\)). Therefore, in finite samples, having more stations may be (much) more valuable than having more temporal replicates.

Fig. 4

Boxplots of parameter estimates obtained from data generated from the locally isotropic extremal t process with \(\mathrm{df}=5\), \(\alpha =1\), and \((\beta _1,\beta _2)=(0.2,2)\). Estimator (14) was used, including the 10% closest pairs. Green boxes (left of vertical dashed line) show the performance for a fixed number of locations, \(S=100\), and an increasing number of independent replicates, \(m=10,20,50,100\). Blue boxes (right of vertical dashed line) show the performance for a fixed number of replicates, \(m=100\), and an increasing number of locations, \(S=10,20,50,100\). Horizontal red lines are true values (Color figure online).

We now explore the ability of estimator (14) to detect the spatial heterogeneity. For each simulated dataset, we fit the true non-stationary model and the (restricted) stationary counterpart, computing in each case the corresponding CLIC and CBIC diagnostics defined in Sect. 4.2. These information criteria were computed using finite differences combined with the direct method of Padoan et al. (2010). The empirical percentages of cases in which the CLIC and CBIC favor the true underlying model (either stationary if \(\beta _2=0\), or non-stationary otherwise) are calculated from the 300 experiments and reported in the Supplementary Material for \(S=100\) and \(m=100\). Overall, non-stationarity in the dependence structure seems easily detectable when the non-stationarity level is moderate to strong, with almost \(100\%\) success in each case with the CLIC or CBIC. By contrast, the performance is poor in near-stationary cases; this is especially striking for the CBIC, which penalizes model complexity more heavily. In the stationary case, the CLIC selects the true model in about \(65\%\) of cases, whereas the CBIC attains a success rate of about \(80\%\). This suggests that these information criteria, but especially the CLIC, have “more power” to select bigger models, and that they should be interpreted with care. This observation agrees with the theoretical findings of Ng and Joe (2014). Furthermore, the ability to distinguish between stationarity and non-stationarity improves when more data are available. For example, for fixed \(S=20\) and parameters \(\mathrm{df}=5\), \(\alpha =1\), \((\beta _1,\beta _2)=(0.2,2)\), the CLIC percentages are \(63,\,79,\,93 ,\,99\%\), for \(m=10,20,50,100\), respectively; similarly, for fixed \(m=20\), these values are \(42,\,79,\,98,\,100\%\), for \(S=10,20,50,100\), respectively.

5.3 Effect of Model Misspecification on Return Levels

Neglecting non-stationarity when the data are truly non-stationary might have serious consequences on the estimation of spatial return levels. To assess this, we consider the locally isotropic extremal t model on the Gumbel scale, with \(\mathrm{df}=5\) and \(\alpha =1.5\), in both the stationary case, \((\beta _1,\beta _2)=(0.1,0)\), and the strongly non-stationary case, \((\beta _1,\beta _2)=(0.4,4)\).

We compute return levels for the integral \(\mathrm{INT}_j=\int _{{\mathcal {S}}_j} Z(\varvec{s}) \mathrm{d}\varvec{s}\), the minimum \(\mathrm{MIN}_j=\min _{\varvec{s}\in {{\mathcal {S}}_j}} \{Z(\varvec{s})\}\), and the maximum \(\mathrm{MAX}_j=\max _{\varvec{s}\in {{\mathcal {S}}_j}} \{Z(\varvec{s})\}\), \(j=1,2\), of the max-stable process \(Z(\varvec{s})\) over the domains \({\mathcal {S}}_1=[0,0.2]\times [0,1]\) and \({\mathcal {S}}_2=[0.8,1]\times [0,1]\). In practice, these domains are pixelated using a fine grid comprising 105 points with equal spacings of 0.05. Assuming that \(Z(\varvec{s})\) describes the annual maximum process for some quantity of interest, we then derive the N-year return level for \(\mathrm{INT}_j\) and \(\mathrm{MIN}_j\) as the empirical \((1-1/N)\)-quantile calculated from one million independent simulations of \(Z(\varvec{s})\). Return levels \(z_{N;\mathrm{MAX}_j}\) for \(\mathrm{MAX}_j\) are derived using the exact formula \(z_{N;\mathrm{MAX}_j}=\log \{\theta ({\mathcal {S}}_j)\}-\log \{-\log (1-1/N)\}\) and an estimate of the areal extremal coefficient \(\theta ({\mathcal {S}}_j)\) (Lantuéjoul et al. 2011). The latter determines the effective number of independent extremes in region \({\mathcal {S}}_j\); for the stationary case, one finds \(\theta ({\mathcal {S}}_1)=\theta ({\mathcal {S}}_2)\approx 8.6\), and for the non-stationary case, \(\theta ({\mathcal {S}}_1)\approx 4.2\), \(\theta ({\mathcal {S}}_2)\approx 23.6\), indicating that extremal dependence in \({\mathcal {S}}_1\) is much stronger than in \({\mathcal {S}}_2\). Results are shown in Fig. 1.
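The pixelation of the domains and the closed-form return levels for \(\mathrm{MAX}_j\) can be reproduced with the short Python sketch below, which plugs the areal extremal coefficients quoted above into the formula for \(z_{N;\mathrm{MAX}_j}\). The function names and the choice \(N=100\) are hypothetical.

```python
import math

def pixelate(x_min, x_max, y_min, y_max, spacing=0.05):
    """Regular grid over [x_min, x_max] x [y_min, y_max] with the given spacing."""
    nx = round((x_max - x_min) / spacing) + 1
    ny = round((y_max - y_min) / spacing) + 1
    return [(x_min + i * spacing, y_min + j * spacing)
            for i in range(nx) for j in range(ny)]

def max_return_level(theta, n_years):
    """N-year return level of MAX_j on the Gumbel scale:
    log(theta) - log{-log(1 - 1/N)}."""
    return math.log(theta) - math.log(-math.log(1.0 - 1.0 / n_years))

grid_1 = pixelate(0.0, 0.2, 0.0, 1.0)   # domain S_1
grid_2 = pixelate(0.8, 1.0, 0.0, 1.0)   # domain S_2
print(len(grid_1), len(grid_2))

# Areal extremal coefficients reported in the text.
thetas = {"stationary S_1, S_2": 8.6, "non-stationary S_1": 4.2, "non-stationary S_2": 23.6}
for label, theta in thetas.items():
    print(label, max_return_level(theta, n_years=100))
```

Because \(\theta ({\mathcal {S}}_j)\) enters only through \(\log \{\theta ({\mathcal {S}}_j)\}\), the return level curves for \(\mathrm{MAX}_j\) under different dependence strengths are vertical shifts of one another; here the curves for \({\mathcal {S}}_2\) and \({\mathcal {S}}_1\) in the non-stationary case differ by \(\log (23.6/4.2)\) at every return period.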

One can see that misspecification (and therefore also misestimation) of spatial dependence strongly affects the return levels of spatial quantities. Underestimation of dependence implies underestimation of return levels for \(\mathrm{INT}_j\) and \(\mathrm{MIN}_j\) and overestimation of return levels for \(\mathrm{MAX}_j\) (and vice versa). Although this depends on the level of non-stationarity, the underlying parameters, and marginal distributions, in practice it is crucial to capture correctly the non-stationarity in the dependence structure.

Fig. 5

Left: map of Colorado with the 45 stations (dots) used in the analysis of temperature maxima. Middle: histogram summarizing the number of annual maxima available per station. Right: number of stations per year.

6 Analysis of Temperature Maxima

We now discuss an application to a temperature dataset recorded in Colorado during the period 1895–1997, which is freely available on the National Center for Atmospheric Research website. We selected stations in the Front Range area with at least 40 years of data, and extracted maxima over the months May–September (roughly corresponding to annual maxima), thereby bypassing the modeling of seasonality. Figure 5 shows the locations of the monitoring stations kept for the analysis, and summarizes the data availability.

To estimate marginal distributions, we fitted a spatial GEV\((\mu (\varvec{s}),\sigma (\varvec{s}),\xi (\varvec{s}))\) model to observed maxima, assuming conditional independence given the parameters \(\mu (\varvec{s}),\sigma (\varvec{s}),\xi (\varvec{s})\), which were modeled as latent stationary Gaussian processes. While the means of the location and scale parameters \(\mu (\varvec{s}),\sigma (\varvec{s})\) were assumed to depend on longitude, latitude, and altitude, the mean of the shape parameter involved only two distinct values for plains and mountains. Quantile–quantile plots (not shown) suggest that marginal fits are good. Annual maxima were then transformed to the unit Fréchet scale using the parameters’ estimated mean and the probability integral transform. Histograms of estimated parameters for the different stations are displayed in Fig. 6. Shape parameters are all negative, indicating that distributions of temperature annual maxima have an upper bound, which seems physically plausible.
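The transformation to the unit Fréchet scale follows from the probability integral transform: if \(G\) denotes the fitted GEV distribution function, then \(-1/\log G(y)=\{1+\xi (y-\mu )/\sigma \}^{1/\xi }\). A minimal Python sketch is given below; the function names and the parameter values are hypothetical and are not the estimates obtained for the Colorado data.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """GEV distribution function, with the Gumbel limit when xi = 0."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(y - mu) / sigma))
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_to_unit_frechet(y, mu, sigma, xi):
    """Map a GEV(mu, sigma, xi) observation y to the unit Frechet scale."""
    if abs(xi) < 1e-12:
        return math.exp((y - mu) / sigma)
    t = 1.0 + xi * (y - mu) / sigma
    if t <= 0.0:
        raise ValueError("y lies outside the support of the fitted GEV")
    return t ** (1.0 / xi)

# Hypothetical station parameters (degrees Celsius) with a negative shape.
mu, sigma, xi = 33.0, 1.5, -0.25
for y in (31.0, 33.0, 36.0):
    z = gev_to_unit_frechet(y, mu, sigma, xi)
    # exp(-1/z) should reproduce the GEV probability of y.
    print(y, z, math.exp(-1.0 / z), gev_cdf(y, mu, sigma, xi))
```

With a negative shape parameter, the support has the finite upper endpoint \(\mu -\sigma /\xi \), so the sketch raises an error for observations above it rather than returning an infinite value.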

Fig. 6

Histograms of estimated location \(\mu (\varvec{s})\) (left), scale \(\sigma (\varvec{s})\) (middle), and shape \(\xi (\varvec{s})\) (right) parameters obtained from the fit of the spatial GEV\((\mu (\varvec{s}),\sigma (\varvec{s}),\xi (\varvec{s}))\) model with parameters modeled as (conditionally independent) latent Gaussian processes.

We then fitted 16 stationary and non-stationary extremal t models to the transformed data using the pairwise likelihood estimator (14) including all pairs of locations. These models, summarized in Table 2, are based on the Paciorek–Schervish correlation function (7) combined with (8) and are parametrized as in (9). They are either stationary (models 1–2) or non-stationary (models 3–16), locally isotropic (models 1,3–5,9,11–13) or anisotropic (models 2,6–8,10,14–16), based on Gaussian sum-mixtures of the form (12) (models 9–16) or non-mixtures (models 1–8). In the non-stationary models, altitude, longitude, and latitude are used as covariates (on top of the intercept) in the modeling of the dependence ranges \(\omega _x(\varvec{s}),\omega _y(\varvec{s})\) (with logarithmic link) and the mixture coefficient \(a(\varvec{s})\) (with logit link), as suggested in (10) and Sect. 3.3. The anisotropy parameter \(\delta (\varvec{s})\) is kept constant. The degrees of freedom, \(\mathrm{df}\), were found to be difficult to estimate and, after some analysis, were held fixed at \(\mathrm{df}=5\) (i.e., far from the Smith–Stephenson and Brown–Resnick families).

Figure 7 reports the estimated CLIC and CBIC values of the fitted models; recall Sect. 4.2. These two diagnostics agree on at least two main conclusions:

(i) Mixture models fit generally better, although they have three more parameters than their non-mixture counterparts. The rougher mixture component tends to be dominant in the mountainous region, while the smoother one (though not very smooth) takes over at lower altitudes.

(ii) Altitude is a major covariate to be considered in the modeling of extremal dependence, whereas inclusion of further covariates (longitude or latitude) does not improve the fit by much. In non-mixture models, there is a huge drop in CLIC or CBIC values between model 1 (stationary isotropic model with two parameters) and model 3 (locally isotropic model, including altitude as a covariate, with only three parameters). In mixture models, point (i) underscores the importance of having different degrees of regularity at different altitudes.

Table 2 Extremal t max-stable models fitted to the temperature maxima.

Fig. 7

Difference of estimated CLIC (left) and CBIC (right) values for all max-stable models fitted with respect to the best fit. Vertical blue lines mark the separation between non-mixture (1–8) and mixture (9–16) models. Models used for comparison are highlighted in red (stationary isotropic model, 1), orange (best non-mixture model, 6), and green (best mixture model, 11) (Color figure online).

Among non-mixture models, it is worth considering non-stationary non-isotropic models with covariates included in the dependence ranges \(\omega _x(\varvec{s})\), \(\omega _y(\varvec{s})\). The best non-mixture model is model 8 (respectively 6) according to the CLIC (respectively CBIC), but CLIC tends to select overcomplicated models. For mixture models with altitude included in the mixture coefficient \(a(\varvec{s})\), the use of further covariates in \(\omega _x(\varvec{s})\), \(\omega _y(\varvec{s})\) does not improve the fit by much, although both diagnostics agree to select model 11 as the best model.

Figure 8 displays bivariate kernel density estimators for the pairs of empirical and fitted extremal coefficients for model 1 (stationary isotropic benchmark), model 6 (best non-mixture model according to the CBIC), and model 11 (best mixture model). Empirical estimates are calculated using the projection method of Marcon et al. (2014) based on the non-parametric Pickands dependence estimator of Capéraà et al. (1997). Extremal dependence is slightly underestimated for model 1 (with a majority of points lying above the diagonal line), but extremal coefficients for non-stationary models tend to be generally closer to the diagonal. The sum of squared distances between fitted and empirical extremal coefficients is 3.63, 3.12, 3.00 for models 1, 6, 11, respectively. Clearly, the stationary isotropic model provides the worst fit, which confirms our previous conclusions and further supports the need for non-stationary dependence structures that incorporate meaningful covariates.

Fig. 8

Bivariate kernel density estimators for pairs of empirical and fitted extremal coefficients, displayed for (left) model 1 (stationary isotropic extremal t model), (middle) model 6 (best non-mixture model), and (right) model 11 (best mixture model). A good fit should have points concentrated around the white diagonal line.

7 Discussion

The problem of building and fitting sensible non-stationary dependence models for spatial extremes is not trivial. We have tackled this problem by proposing a very general construction, combining max-stable processes (in particular the extremal t model), non-stationary correlation functions, and mixtures. The advocated locally elliptic model is based on Paciorek and Schervish (2006) and allows various non-stationary patterns to be flexibly captured in the extremal dependence structure by incorporating meaningful covariates. We have performed inference using pairwise likelihoods, which are computationally convenient, and we have shown by simulation that pairwise likelihoods can efficiently estimate the unknown parameters, provided that the station network is dense. However, more efficient approaches based on full likelihoods (Stephenson and Tawn 2005; Wadsworth and Tawn 2014; Thibaud and Opitz 2015) might be devised for the extremal t model.

Various non-stationary max-stable models, including altitude, longitude, and latitude as covariates, were fitted to a dataset of temperature maxima in Colorado, and these models were shown to provide a better fit than the traditional stationary and isotropic max-stable counterpart, although there is still room for improvement. In particular, we have identified altitude as an important covariate. In future work, other covariates, such as the slope or solar radiation, might be used to improve the fit, perhaps from satellite data or regional climate computer models. Alternatively, more flexible non-stationary models might be constructed from a Bayesian perspective, though inference may be tricky and computationally very intensive if standard Markov chain Monte Carlo algorithms are used (but see Thibaud et al. 2015). The creation of models for asymptotic independence, a degenerate case in the max-stable paradigm, is also an important issue when data are non-stationary. One possibility could be to “invert” the non-stationary max-stable models proposed above (see Wadsworth and Tawn 2012; Davison et al. 2013).

Finally, we focused in this work on maxima, but more efficient approaches may be achieved by considering peaks over high thresholds (Huser et al. 2016). This approach, however, entails additional complications such as the modeling of temporal dependence, the selection of a suitable threshold, and the non-validity of extremal models at low levels, which might be even more difficult to handle when the data are non-stationary.