1 Introduction

There is a growing interest in studying the changing behavior of climatic extremes in both space and time, as the most acute societal impacts of climate change may be those arising from changes in the frequency and severity of extreme climate events (Zwiers and Kharin 1998; IPCC 2007, Chapter 3). The statistical theory of extreme values is a well-developed formalism for investigating the extremal characteristics of probability distributions and data sets (e.g., Coles 2001), and is increasingly being applied to, or proposed for, the analysis of the instrumental climate record and climate model output. A common strategy for analyzing climate-model-derived temperature or precipitation fields is to treat the parameters of the extreme value distribution as temporally or spatially varying quantities (Kharin and Zwiers 2000, 2005; Schliep et al. 2010; Mannshardt-Shamseldin et al. 2010). More generally, Katz (2010, p.5) argues that shifts in the frequency and magnitude of climate extremes cannot be reliably derived by modeling only the temporal and spatial behavior of the probability distribution characterizing the climate variable as a whole; the extremes themselves must be modeled. These analyses point to the importance of understanding the behavior of extremes, and show that the modeling of extremes requires special considerations and methodology.

To place recently observed extremes in the climate system into a longer-term context than is possible using the instrumental record, it is necessary to turn to the climate proxy record. There have been numerous efforts to infer both the spatial mean and the spatial pattern of past surface temperatures from natural proxies; see NRC (2006), IPCC (2007, Chapter 6), and Jones et al. (2009) for reviews. Reports of temperature reconstructions frequently include statements about the degree to which certain (generally recent) years are in some sense extreme with respect to the baseline climate. The most well-known example may be Mann et al. (1999), which states that “the past decade [1989–1998] and past year [1998] are likely the warmest for the Northern Hemisphere this millennium”, a statement which is repeated in the IPCC Third Assessment Report (IPCC 2001, Chapter 2) (also see, e.g., Luterbacher et al. 2004; Barriopedro et al. 2011; Kaufman et al. 2009). Usually climate is inferred from proxies using some form of regression, which results in an estimate of the most likely past climate and associated pointwise uncertainty measures. Such results, while providing important insights into long-term climate behavior, are of limited use in answering questions about the distribution of extremes (see, e.g., NRC 2006; Field et al. 2012).

Several recent investigations of climate extremes using overlapping proxy and instrumental data sources have focused on resampling techniques (e.g., Li et al. 2007) or Bayesian modeling approaches (e.g., Li et al. 2010; McShane and Wyner 2011; Tingley and Huybers 2010a, b). While these techniques have clear advantages over regression-based approaches (Tingley et al. 2012), important limitations remain from the perspective of investigating extremal behavior. Since any analysis of extreme values is, implicitly, an analysis of the tail of a distribution, it is more appropriate to use extreme value theory to model the tails of a distribution directly.

Akin to the Central Limit Theorem, which describes how suitably normalized means of independent random variables converge asymptotically to a normal distribution, extreme value theory describes how the tail of a distribution (for example, suitably normalized block maxima) converges asymptotically to a common distributional form that does not depend on the model chosen for the average behavior. Unlike regression or resampling approaches, extreme value theory does not require a parametric model for the full distribution of the original series, allows for a proper assessment of uncertainty, and allows for a formal investigation of spatial and temporal trends in the distribution of extremes.
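As a concrete illustration of this convergence (our own example, not part of the original analysis), let \(E_1, \ldots, E_B\) be independent standard exponential variables. The normalized maximum has the exact distribution \(\Pr\{\max_k E_k - \log B \le x\} = (1 - e^{-x}/B)^B\) for \(x > -\log B\), which tends to the Gumbel distribution \(\exp(-e^{-x})\), the \(\xi \to 0\) member of the GEV family introduced in Section 2. The Python sketch below, with hypothetical helper names, measures the largest gap between the two CDFs on a grid of x values for increasing B.

```python
import math

def normalized_max_cdf(x, B):
    """Exact CDF of max(E_1, ..., E_B) - log(B) for i.i.d. standard exponentials E_k."""
    u = x + math.log(B)          # threshold on the scale of the raw maximum
    if u <= 0.0:
        return 0.0               # the maximum of positive variables cannot be <= 0
    return (1.0 - math.exp(-u)) ** B

def gumbel_cdf(x):
    """Limiting Gumbel CDF, the xi -> 0 case of the GEV distribution."""
    return math.exp(-math.exp(-x))

def max_abs_gap(B, grid):
    """Largest absolute difference between the exact and limiting CDFs over a grid."""
    return max(abs(normalized_max_cdf(x, B) - gumbel_cdf(x)) for x in grid)

grid = [k / 10.0 for k in range(-30, 81)]   # x from -3.0 to 8.0
for B in (10, 100, 1000):
    print(B, max_abs_gap(B, grid))
```

The gap shrinks roughly in proportion to 1/B, in line with the statement in Section 2 that the GEV approximation improves for longer blocks.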

Applications of extreme value theory to climate proxy data are limited. Katz et al. (2005) model sediment yield series using a generalized extreme value distribution. While they highlight that modeling spatio-temporal effects is paramount in ecology, their model only includes temporal components. Naveau and Ammann (2005) develop a time series of extreme events based on ice core sulfate proxy information, but do not model the distribution of the extremes. Cooley et al. (2006) model the age of moraines using a generalized extreme value distribution for lichen measurements. They use a separable spatio-temporal model for the location parameter of the extreme value distributions, in which spatial effects enter via a random effect term; the shape and scale parameters are held fixed.

This article investigates both the spatial and temporal patterns in the extremal characteristics of climate proxies by applying extreme value theory to a set of tree ring density series over northern North America. As the spatio-temporal statistical model includes parametric spatial covariance functions, it can be used to interpolate extremal behavior at unobserved locations. The tree ring density series are described in the original publications as being predominantly temperature sensitive (Briffa et al. 2002a, b). To the extent that the extremes in the tree ring density series reflect extremes in climate, specifically temperature, any significant temporal and spatial variations detected in the extremal behavior of the tree ring densities are indicative of changes in the extremal characteristics of the climate system. However, we stress that the established tree ring density–temperature calibration is in terms of mean response, not extreme behavior. In addition, the density series likely do not depend exclusively on temperature, as trees are affected in complex ways by numerous climatic and non-climatic variables (e.g., NRC 2006; Evans et al. 2006). Here we establish that the extremal characteristics of the proxy series display spatial and temporal dependencies. This is a necessary precursor to using the proxy record to reconstruct directly the extremal behavior of the climate variables for which the proxies are informative, which is an aim of future research.

After reviewing extreme value theory in Section 2, we introduce the tree ring density proxies in Section 3, and provide an exploratory extreme value analysis of these data in Section 4. In Section 5 we present a hierarchical Bayesian model that formally investigates the spatio-temporal relationships in the extreme value parameters of decadal maxima and minima. These models build on previous work on paleoclimate extremes by including non-separable spatial and temporal effects. We describe the results of applying this model to the tree ring density series in Section 6, and close with a discussion in Section 7. Additional figures, modeling details, and the Markov chain Monte Carlo algorithm used to fit the Bayesian model are provided in the online supplement.

2 Statistical modeling of block-maxima for climate and paleoclimate series

The analysis of extremes generally proceeds via models for the tails of a distribution. We provide a brief introduction here in the context of the maxima of (paleo)climate time series (results follow similarly for minima; see, e.g., Coles 2001, p.52), and refer the interested reader to general reviews (e.g., Coles 2001; Resnick 2007; de Haan and Ferreira 2006; Finkenstädt and Rootzén 2004). Consider a set of observations at regularly spaced time intervals, at a number of different spatial locations. At any individual location \(\boldsymbol{s}\), an extreme value analysis of the time series \(\{Y_t(\boldsymbol{s}) : t=1,\ldots,T(\boldsymbol{s}) \}\) can be carried out using the block maxima approach, the r-largest order statistics approach, or the points-over-threshold approach. We describe the block maxima approach below, chosen primarily for the ease of modeling it affords in this initial investigation of space–time variability in the parameters of the extreme value distributions of paleoclimate observations. The supplement details the other two commonly used approaches, and distinguishes between marginal and multivariate extreme value analysis, an active area of statistical research.

For many time series, the time increments fall naturally into blocks, such as days or years. For a given block length B, suppose that the time series has a length \(T(\boldsymbol{s})\) that is divisible by B. The time period can then be split into \(N(\boldsymbol{s}) = T(\boldsymbol{s})/B\) blocks, and the maximum, \( M_j(\boldsymbol{s}) = \max\{ Y_{(j-1)B+1}(\boldsymbol{s}), \ldots, Y_{jB}(\boldsymbol{s}) \}, \) calculated for each block. Under certain regularity conditions, the suitably normalized block maxima converge in distribution to a generalized extreme value (GEV) distribution as B grows (Coles 2001). Because the GEV family is closed under changes of location and scale, the block maxima themselves are then approximately GEV-distributed, which we denote by \(M_j(\boldsymbol{s}) \sim \mbox{GEV}(\eta, \sigma, \xi)\). In practice, the GEV approximation improves for longer block lengths, B. Assuming that the parameters of the distribution do not vary in time or space, the cumulative distribution function is given by

$$ \Pr\{M_j(\boldsymbol{s}) \le x\} = \exp\left\{-\left(1+\xi \left[ \frac{ x - \eta }{ \sigma } \right] \right)_+^{-1/\xi}\right\}, $$
(1)

where \(y_+ = \max\{y, 0\}\). The location parameter, \(\eta \in \mathbb{R}\), determines where the distribution of the maxima is concentrated, with a larger value concentrating it at higher values. The scale parameter, \(\sigma > 0\), determines the spread of the distribution, with higher values resulting in a more dispersed distribution of the maxima. Finally, the shape parameter, \(\xi \in \mathbb{R}\), describes the tail behavior of the distribution, with higher values corresponding to heavier tails; the case \(\xi = 0\) is interpreted as the limit \(\xi \to 0\) of Eq. 1, the Gumbel distribution \(\exp\{-\exp[-(x-\eta)/\sigma]\}\). When the shape parameter is negative, the upper tail is bounded, with finite upper endpoint \(\eta - \sigma/\xi\); otherwise it is unbounded. The minima are fit with the GEV model for maxima by applying it to the negated minima, since \(\min_t Y_t = -\max_t \{-Y_t\}\), and negating the resulting location parameter estimates. Our key interest is to explore how the parameters of this GEV distribution vary over space and time, in order to draw conclusions about the spatio-temporal evolution of the distribution that describes the extremal behavior of a climate-sensitive proxy.
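Equation 1 and the treatment of minima can be sketched in Python as follows. This is an illustration under our own naming (`gev_cdf` and `min_cdf` are hypothetical helpers, not functions from the paper or from any package); the \(\xi = 0\) branch uses the Gumbel limit, values outside the support return 0 or 1, and the computation is done on the log scale to avoid overflow.

```python
import math

def gev_cdf(x, eta, sigma, xi):
    """GEV CDF of Eq. (1): exp{-(1 + xi (x - eta)/sigma)_+^(-1/xi)}."""
    if sigma <= 0:
        raise ValueError("scale sigma must be positive")
    z = (x - eta) / sigma
    if abs(xi) < 1e-12:
        log_v = -z                      # Gumbel limit as xi -> 0
    else:
        t = 1.0 + xi * z
        if t <= 0.0:
            # Outside the support: below the lower endpoint (xi > 0)
            # or at/above the upper endpoint eta - sigma/xi (xi < 0).
            return 0.0 if xi > 0 else 1.0
        log_v = -math.log(t) / xi       # log of t^(-1/xi)
    # exp(-e^700) already underflows to 0.0, so clamping avoids overflow safely
    return math.exp(-math.exp(min(log_v, 700.0)))

def min_cdf(x, eta_min, sigma, xi):
    """CDF of a block minimum whose negation is GEV(-eta_min, sigma, xi).

    Uses min Y = -max(-Y): Pr{min <= x} = 1 - Pr{-min < -x}, and the GEV is
    continuous, so < and <= give the same probability."""
    return 1.0 - gev_cdf(-x, -eta_min, sigma, xi)
```

If SciPy is used instead, note that `scipy.stats.genextreme` parameterizes the shape as \(c = -\xi\), the opposite sign convention to Eq. 1.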

3 The proxies: annually resolved tree ring densities

The tree ring data set used here is a gridded version of the maximum latewood density data set described in Briffa et al. (2002a, b), which has been included in numerous efforts to reconstruct past climate (e.g., Briffa et al. 2002a, b; Mann et al. 2007, 2008, 2009; Tingley 2009; Rutherford et al. 2005). Analyzing the proxy data set on a 5° by 5° grid will aid future comparisons of the extremes in the proxy series with those in the Climatic Research Unit’s gridded instrumental temperature data set (Brohan et al. 2006), which is often used to calibrate paleoclimate reconstructions (e.g., Luterbacher et al. 2004; Rutherford et al. 2005; Mann et al. 2008; Kaufman et al. 2009).

The grid box averages are formed as weighted averages of the site chronologies in each grid box, with weights determined by the number of trees in each chronology. Each chronology in turn is formed by averaging cores from individual trees (roughly 20) for a given site, after removing growth effects (Briffa et al. 2002a, b). These gridded density series are described as standardized residuals with arbitrary units. As the time series span different lengths, the series represent residuals or anomalies with respect to means calculated over different reference intervals. We guard against the differing reference intervals influencing the interpretation of spatial and temporal trends in the extremes by including location-specific intercept terms in the model described in Section 5.
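The grid-box weighting described above amounts to a tree-count-weighted mean of the site chronologies available in a given year. The sketch below is a hypothetical helper of our own (`grid_box_value`), assuming that missing site values are passed as `None` and simply excluded; it is not the processing code of Briffa et al.

```python
def grid_box_value(chronology_values, n_trees):
    """Tree-count-weighted mean of the site chronologies in one grid box for one year.

    chronology_values: one value per site, or None where a site has no data that year.
    n_trees: number of trees in each site chronology (the weights)."""
    pairs = [(v, n) for v, n in zip(chronology_values, n_trees) if v is not None]
    total_weight = sum(n for _, n in pairs)
    if total_weight == 0:
        return None  # no site in this grid box has data for this year
    return sum(v * n for v, n in pairs) / total_weight
```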

For this initial application of extreme value theory to decadal maxima and minima of tree ring proxies, the study region is confined to northern North America, where there are 34 series of lengths ranging from 11 to 58 decades; each grid box contains one series. Figure 1a and b summarize the set of spatial locations and the bimodal distribution of the number of decades per location. As a summary of the overall central tendency of the tree ring density series, Fig. 1c plots the mean across the available observations for each year, which displays both high frequency and low frequency variability. The vertical dashed lines in the figure display all known volcanic eruptions since AD 1400 with reported Volcanic Explosivity Index (VEI) greater than six (taken from Simkin and Siebert 1994). There are sudden reductions in the mean across the density series after most of these large volcanic events. The gray regions in Fig. 1c show the minimum and maximum values for each year summarized over locations, indicating that, taking the region as a whole, the maxima may be increasing over the years and the minima may be decreasing. However, as Fig. 1d shows a disparity in the number of available observations over time (with fewer proxies available at earlier time points), we caution against drawing conclusions from these apparent patterns. We leave for future research the inclusion of volcanic activity in the model, as further work is required to understand the delay and spatial influence of volcanoes.
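The summaries plotted in Fig. 1c and d can be computed from a year-by-location array in which a series with no data in a given year is stored as NaN. The following NumPy sketch is our own illustration (the function name is hypothetical). Because the across-location maximum and minimum are taken over however many series happen to be available, their spread tends to widen as the count grows, which is one reason to treat the apparent trends in the gray region with caution.

```python
import warnings
import numpy as np

def yearly_summaries(Y):
    """Per-year summaries across locations for an (n_years, n_locations) array.

    Missing values are NaN. Returns the mean, minimum and maximum over the
    available locations (Fig. 1c) and the number of available series (Fig. 1d);
    years with no data get NaN summaries and a count of zero."""
    Y = np.asarray(Y, dtype=float)
    count = np.sum(~np.isnan(Y), axis=1)
    with warnings.catch_warnings():
        # nanmean/nanmin/nanmax warn on rows that are entirely NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(Y, axis=1)
        low = np.nanmin(Y, axis=1)
        high = np.nanmax(Y, axis=1)
    return mean, low, high, count
```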

Fig. 1

a The location of the 34 tree ring density series, one series per grid box, arranged on a 5° × 5° grid over northern North America. In each box, the top number is the location identifier, and the number in parentheses is the number of decades observed. b A histogram summarizing the number of decades per location. c The black line denotes the mean proxy value by year, averaged over all available locations. The gray region denotes the minimum and maximum proxy values per year, summarized over all locations. The vertical dashed lines denote the years of significant volcanic events. d The number of proxies that make up the average in each year

4 Exploratory extreme value analysis

For each of the tree ring series at the 34 different grid cells, the decadal maxima and minima are calculated, along with the years within each decade at which they occurred (Fig. 2 shows results for representative locations). There is clear evidence of spatially varying temporal dependencies in the decadal maxima and minima series. At the majority of locations, the decadal maxima exhibit increasing long-term trends of differing magnitudes. The decadal minima are more mixed, with some areas in the west exhibiting increasing trends (e.g., locations 2 and 29), and some areas in the east exhibiting decreasing trends (e.g., locations 23 and 27). For the most part, the variability of the decadal minima (with respect to the temporal trend) appears larger than that of the decadal maxima.
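The decadal extremes and their years of occurrence follow the block maxima construction of Section 2 with B = 10. A plain-Python sketch, assuming each series has already been trimmed to whole decades (the helper name `block_extremes` is hypothetical):

```python
def block_extremes(y, years, B=10):
    """Block maxima and minima of a series, with the year at which each occurs.

    0-based block j covers y[j*B:(j+1)*B], i.e. observations jB+1, ..., (j+1)B in
    the 1-based notation of Section 2. Ties resolve to the earliest year."""
    if len(y) != len(years):
        raise ValueError("y and years must have the same length")
    if len(y) % B != 0:
        raise ValueError("series length must be divisible by the block length B")
    out = []
    for j in range(len(y) // B):
        block = y[j * B:(j + 1) * B]
        yrs = years[j * B:(j + 1) * B]
        k_max = max(range(B), key=block.__getitem__)   # position of the block maximum
        k_min = min(range(B), key=block.__getitem__)   # position of the block minimum
        out.append({"max": block[k_max], "yr_max": yrs[k_max],
                    "min": block[k_min], "yr_min": yrs[k_min]})
    return out
```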

Fig. 2

The top left panel shows the locations of the 34 tree ring density series. Blue numbers denote the locations for which time series plots of the decadal maxima (black lines) and minima (blue lines) are shown in the remaining panels. Thick solid lines give the time-varying location parameters, \(\hat{\eta}_j(\boldsymbol{s})\), estimated at each spatial location by maximum likelihood, and dashed lines display the associated pointwise 95 % confidence intervals for the location parameters

Separate GEV models are fit to the decadal maxima and to the decadal minima at each location. To formally investigate temporal and spatial patterns in the extremes, the GEV models include time as a covariate in the specification of the location parameter. Let \(\boldsymbol{s}_i\) (i = 1, ..., I = 34) denote the spatial centroid of the grid cell corresponding to the ith tree ring series, \(M_j(\boldsymbol{s}_i)\) the decadal maximum (or minimum) in decade j (\(j=1,\ldots,N(\boldsymbol{s}_i)\)), \(\textrm{yr}_j(\boldsymbol{s}_i)\) the year in which the decadal maximum (or minimum) occurred within decade j, and \(a_j(\boldsymbol{s}_i) = (\textrm{yr}_j(\boldsymbol{s}_i) - 1405)/600\) a standardized year variable. For each i, assume that the \(\{M_j(\boldsymbol{s}_i) : j = 1, \ldots, N(\boldsymbol{s}_i) \}\) are independent and GEV-distributed:

$$ M_j(\boldsymbol{s}_i) \sim GEV(\eta_j(\boldsymbol{s}_i), \sigma(\boldsymbol{s}_i), \xi(\boldsymbol{s}_i)), \;\; \mbox{with} \;\; \eta_j(\boldsymbol{s}_i) = \alpha(\boldsymbol{s}_i) + \beta(\boldsymbol{s}_i) a_j(\boldsymbol{s}_i). $$
(2)

This model allows the temporal dependence of the location parameter, \(\eta_j(\boldsymbol{s}_i)\), to vary as a function of space. Because of the limited number of observations at each location, the scale, \(\sigma(\boldsymbol{s}_i)\), and shape, \(\xi(\boldsymbol{s}_i)\), parameters are allowed to vary across locations but are held constant in time (see also Sang and Gelfand 2009; Cooley and Sain 2011).
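Under Eq. 2, the site-level log-likelihood is a sum of GEV log densities whose location shifts linearly with the standardized year \(a_j(\boldsymbol{s}_i)\); maximizing it numerically over \((\alpha, \beta, \sigma, \xi)\) gives ML estimates. The Python sketch below writes this out. It is our own illustration with hypothetical function names, not the ismev implementation used in the paper.

```python
import math

def gev_logpdf(x, eta, sigma, xi):
    """Log density of GEV(eta, sigma, xi); -inf outside the support. Uses
    f = (1/sigma) v^(xi+1) exp(-v), with v = (1 + xi z)^(-1/xi), or v = exp(-z) as xi -> 0."""
    z = (x - eta) / sigma
    if abs(xi) < 1e-12:
        log_v = -z                       # Gumbel limit
    else:
        t = 1.0 + xi * z
        if t <= 0.0:
            return -math.inf             # outside the support
        log_v = -math.log(t) / xi
    if log_v > 700.0:
        return -math.inf                 # exp(-v) underflows: density is effectively zero
    return -math.log(sigma) + (xi + 1.0) * log_v - math.exp(log_v)

def standardized_year(yr):
    """a_j(s_i) = (yr_j(s_i) - 1405) / 600, as defined above."""
    return (yr - 1405) / 600.0

def site_loglik(extremes, years, alpha, beta, sigma, xi):
    """Log-likelihood of Eq. (2) at one site: independent GEV observations with
    location eta_j = alpha + beta * a_j and constant scale and shape."""
    if sigma <= 0.0:
        return -math.inf
    return sum(gev_logpdf(m, alpha + beta * standardized_year(yr), sigma, xi)
               for m, yr in zip(extremes, years))
```

For the minima, the same function would be applied to the negated minima, as described in Section 2.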

Parameters of the GEV models are estimated via maximum likelihood (ML) in the R software environment (R Development Core Team 2011) using the ismev R package. The results confirm the earlier observations: the estimated location parameters for the decadal maxima and minima (Fig. 2, thick solid lines) display temporal trends that vary as a function of space. Pointwise 95 % confidence intervals for the estimated location parameters (Fig. 2, dashed lines) indicate that, on a site-by-site basis, there are significant increasing trends in many of the decadal maxima series, while trends in the minima series are positive or negative, depending on the location.
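Because \(\eta_j(\boldsymbol{s}_i)\) is linear in \((\alpha, \beta)\) and does not involve \(\sigma\) or \(\xi\), a Wald-type pointwise interval needs only the \((\alpha, \beta)\) block of the estimated covariance matrix (for example, of the inverse observed information): \(\widehat{\mathrm{Var}}(\hat\eta_j) = \widehat{\mathrm{Var}}(\hat\alpha) + a_j^2\,\widehat{\mathrm{Var}}(\hat\beta) + 2a_j\,\widehat{\mathrm{Cov}}(\hat\alpha, \hat\beta)\). The sketch below assumes this construction; the exact interval method behind Fig. 2 is not stated here, and `location_ci` is a hypothetical name.

```python
import math

def location_ci(a, alpha_hat, beta_hat, cov, z=1.959964):
    """Wald-type pointwise interval for eta(a) = alpha + beta * a.

    cov: 2x2 estimated covariance matrix of (alpha_hat, beta_hat), e.g. the
    corresponding block of the inverse observed information from the ML fit."""
    eta = alpha_hat + beta_hat * a
    var = cov[0][0] + a * a * cov[1][1] + 2.0 * a * cov[0][1]
    half_width = z * math.sqrt(var)
    return eta - half_width, eta, eta + half_width

# Interval at standardized year a = 0.5, i.e. the year 1705
print(location_ci(0.5, 1.0, 2.0, [[0.04, -0.01], [-0.01, 0.09]]))
```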

Spatial maps of the ML parameter estimates (see the supplement) show that, for the maxima, the slopes in the model for the location parameter of the GEV distribution tend to be positive. Long-term trends in the maximum decadal value of the tree ring density are thus increasing across the majority of the spatial locations. The ML slope estimates from the maxima model display strong spatial homogeneity, and many of the negative values are associated with large standard errors, indicating uncertainty in the direction of the slopes at these locations. The large standard errors occur mostly where the tree ring series are shortest. For the decadal minima, there is a spatially cohesive region in the east for which the slope parameter is either not significantly different from zero or negative. The slope parameters in the west indicate that the decadal minima are more likely to increase in value as a function of time. Exploratory spatial analysis of the ML estimates suggests that there are significant spatial correlation and nugget effects in the intercept and slope parameters for both the maxima and minima; in the model defined in Section 5 below, we also include covariates (longitude and latitude) to account for nonstationarities in space. The scale parameters display strong spatial homogeneity, and the results confirm that the decadal minima of the tree ring density series tend to be more variable, with respect to the corresponding temporal trends, than the decadal maxima. The estimates of the shape parameters are more variable over space and suggest that the tails of the GEV distributions are shorter for the maxima than for the minima.

While fitting GEV models (Eq. 2) individually to the decadal maxima (or minima) at each location makes sense for a simple analysis, doing so does not leverage the spatial dependence between the GEV parameters at nearby locations. A model which takes into account the spatial persistence in the parameters allows estimates to borrow strength across space, thus reducing uncertainties. In addition, fitting models on a site-by-site basis complicates the overall joint assessment of significance, due to the standard multiple comparison problem (e.g., Hsu 1996). The following section presents a hierarchical model for extrema that avoids these issues by modeling the joint distribution of the GEV parameters.
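To see the scale of the multiple comparison problem, suppose (unrealistically, given the spatial dependence documented above) that the 34 site-level tests were independent and each conducted at the 5 % level, with every null hypothesis true. The chance of at least one spurious significant trend is then \(1 - 0.95^{34} \approx 0.83\), and a Bonferroni correction would require each test to be run at level \(0.05/34\). A minimal sketch with hypothetical function names:

```python
def familywise_error(n_tests, level):
    """P(at least one false rejection) among n independent tests at `level`, all nulls true."""
    return 1.0 - (1.0 - level) ** n_tests

def bonferroni_level(n_tests, family_level):
    """Per-test level that bounds the familywise error rate by family_level."""
    return family_level / n_tests

print(round(familywise_error(34, 0.05), 3))   # 0.825
print(bonferroni_level(34, 0.05))
```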

5 Bayesian hierarchical modeling of paleoclimate extremes

The key idea underlying the hierarchical statistical model developed in this section is the notion of spatially varying coefficients for the GEV distribution (e.g., Fotheringham et al. 2002; Gelfand et al. 2003). In the previous section, a different value of the temporal slope is estimated in the model for the location parameter of the GEV at each location, without regard for neighboring locations. Such a model does not reflect scientific intuition, as it is anticipated that the geophysical variables influencing tree growth—including the variables which influence extremes—will vary smoothly as a function of space. Spatially varying coefficient models allow information about a given parameter to be shared between spatial locations. When there is even moderate spatial dependence among the parameters, such sharing of information can overcome the sparse data issues (e.g., Little and Rubin 2002; Banerjee et al. 2004) that affected the assessment of uncertainty at some of the locations for the site-by-site analysis.

5.1 Defining the hierarchical model

We assume that the maxima (or minima) over decades \(j=1,\ldots,N(\boldsymbol{s}_i)\) and locations \(\boldsymbol{s}_i\) (i = 1, ..., I) are independent, conditional on the parameters of the GEV distribution, with

$$ M_j(\boldsymbol{s}_i) | \eta_j(\boldsymbol{s}_i), \sigma, \xi \sim \mbox{GEV}( \eta_j(\boldsymbol{s}_i), \sigma, \xi).$$
(3)

The location parameter \(\eta_j(\boldsymbol{s}_i)\) again satisfies Eq. 2. The scale and shape parameters (σ and ξ) are modeled as each being constant over space and time, a modeling choice motivated by the results of Section 4. We discuss a more general model for the scale parameters in the supplement, but model diagnostics did not indicate any significant improvements in model fit.

The spatially varying coefficient models for the intercepts, \(\alpha(\boldsymbol{s})\), and slopes, \(\beta(\boldsymbol{s})\), are specified in terms of Gaussian processes. In particular, it is assumed that each is multivariate normal, with a mean that is linear in longitude and latitude, and a stationary spatial covariance structure that captures residual spatial dependence between locations, while allowing for a nugget effect that captures possible site-by-site heterogeneity. Let \(\mbox{lo}(\boldsymbol{s}_i)\) and \(\mbox{la}(\boldsymbol{s}_i)\) denote the longitude and latitude at spatial location \(\boldsymbol{s}_i\). Define \(\boldsymbol{X}\) to be an I × 3 design matrix with first column all ones, second column \((\mbox{lo}(\boldsymbol{s}_1), \ldots, \mbox{lo}(\boldsymbol{s}_I))^T\), and third column \((\mbox{la}(\boldsymbol{s}_1), \ldots, \mbox{la}(\boldsymbol{s}_I))^T\). Let \(N_{I}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) denote the I-variate normal distribution with mean \(\boldsymbol{\mu}\) and covariance \(\boldsymbol{\Sigma}\). Assuming \(\boldsymbol{\alpha} \equiv ( \alpha(\boldsymbol{s}_1), \ldots, \alpha(\boldsymbol{s}_I) )^T\) is independent of \(\boldsymbol{\beta} \equiv ( \beta(\boldsymbol{s}_1), \ldots, \beta(\boldsymbol{s}_I) )^T,\) the spatial models take the form

$$ \begin{array}{rll} \boldsymbol{\alpha} | \boldsymbol{\lambda}_{\alpha}, \tau^2_{\alpha}, \omega^2_{\alpha}, \phi_{\alpha} &\sim& N_{I}\!\left( \boldsymbol{X} \boldsymbol{\lambda}_{\alpha}, \tau^2_{\alpha} \boldsymbol{R}(\phi_{\alpha}) + \omega^2_{\alpha} \boldsymbol{I}\right); \\ \boldsymbol{\beta} | \boldsymbol{\lambda}_{\beta}, \tau^2_{\beta}, \omega^2_{\beta}, \phi_{\beta} &\sim& N_{I}\!\left( \boldsymbol{X} \boldsymbol{\lambda}_{\beta}, \tau^2_{\beta} \boldsymbol{R}(\phi_{\beta}) + \omega^2_{\beta} \boldsymbol{I} \right). \end{array} $$
(4)

In Eq. 4, \(\boldsymbol{I}\) is the identity matrix, \(\boldsymbol{\lambda}_{\alpha} = (\lambda_{\alpha,1}, \lambda_{\alpha,2}, \lambda_{\alpha,3})^T\) are the regression parameters in the model for the spatially varying intercept, \(\tau^2_{\alpha}\) is the residual spatial variance, and \(\omega^2_{\alpha}\) is the nugget variance, parameterizing the site-by-site heterogeneity that is unaccounted for by the spatial model. The I × I matrix \(\boldsymbol{R}(\phi_{\alpha})\) defines the residual correlation between spatial locations; assuming a stationary exponential correlation function, the (i, i′) element of the correlation matrix gives the correlation between centroids \(\boldsymbol{s}_{i}\) and \(\boldsymbol{s}_{i'}\):

$$ [\boldsymbol{R}(\phi_{\alpha})]_{i,i'} = \exp\!\left( - || \boldsymbol{s}_{i} - \boldsymbol{s}_{i'} || / \phi_{\alpha}\right). $$
(5)

Here \(||\cdot||\) denotes the chordal distance, which is defined in the supplement. The use of the chordal distance induces “a valid correlation function on the sphere” (Banerjee 2005, p.620), and allows the range parameters to be interpreted in units of kilometers. In Eq. 5, the range parameter \(\phi_{\alpha}\) controls the strength of the spatial correlation between sites: larger values of \(\phi_{\alpha}\) yield correlations that decay more slowly with distance. The interpretation of the parameters characterizing the spatially varying slopes, \(\boldsymbol{\beta}\), in Eq. 4 is similar.
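The mean vectors and covariance matrices in Eqs. 4 and 5 can be assembled as follows. The chordal distance between two points on a sphere is the straight-line (through-the-sphere) distance between them; this sketch assumes a spherical Earth of radius 6371 km, since the supplement's exact definition is not reproduced here. The helper names are hypothetical, and entering longitude and latitude in the design matrix in unscaled degrees is also an assumption.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # assumed spherical Earth radius

def chordal_distance_matrix(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Pairwise straight-line distances (km) between points on a sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    xyz = radius * np.column_stack((np.cos(lat) * np.cos(lon),
                                    np.cos(lat) * np.sin(lon),
                                    np.sin(lat)))
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def design_matrix(lon_deg, lat_deg):
    """I x 3 matrix X with columns (1, longitude, latitude)."""
    return np.column_stack((np.ones(len(lon_deg)), lon_deg, lat_deg))

def spatial_covariance(lon_deg, lat_deg, tau2, phi_km, omega2):
    """tau2 * R(phi) + omega2 * I, with [R(phi)]_{ii'} = exp(-||s_i - s_i'|| / phi) (Eqs. 4-5)."""
    D = chordal_distance_matrix(lon_deg, lat_deg)
    return tau2 * np.exp(-D / phi_km) + omega2 * np.eye(D.shape[0])

# Example: three hypothetical grid centroids; the prior mean of alpha is X @ lambda_alpha
lon, lat = [-100.0, -95.0, -100.0], [60.0, 60.0, 65.0]
Sigma_alpha = spatial_covariance(lon, lat, tau2=2.0, phi_km=500.0, omega2=0.3)
mean_alpha = design_matrix(lon, lat) @ np.array([0.0, 0.01, -0.02])
```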

To allow for full propagation of uncertainties, a Bayesian approach is taken to fit the model defined by Eqs. 3–5. In addition, the Bayesian approach yields full summaries of the posterior distributions, which are useful, for example, in characterizing the posterior probability that trends in the GEV parameters of the maxima or minima differ from zero. An alternative to the Bayesian approach would be to fit the model via maximum likelihood using the Expectation-Maximization (EM) algorithm (Dempster et al. 1977).

5.2 Bayesian inference: priors and sampling the posterior distribution

Bayesian inference requires that prior distributions be specified for all unknown parameters. Where possible, conjugate prior distributions are employed to simplify the inference (see the supplement for details). In the Bayesian paradigm, all inference is based on the posterior distribution. With \(\boldsymbol{\eta} = (\eta_j(\boldsymbol{s}_i))\) denoting the collection of location parameters, the unknown parameters for the model of the previous section can be collected in a single vector, \( \boldsymbol{\theta} = \big(\boldsymbol{\eta}, \sigma, \xi, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\lambda}_{\alpha}, \tau^2_{\alpha}, \phi_{\alpha}, \omega^2_{\alpha}, \boldsymbol{\lambda}_{\beta}, \tau^2_{\beta}, \phi_{\beta}, \omega^2_{\beta}\big).\) The posterior distribution \(\pi(\boldsymbol{\theta} | \boldsymbol{y})\) of the parameters given the decadal maxima (or minima) of the tree ring densities, \(\boldsymbol{y} = ( M_j(\boldsymbol{s}_i) )\), is,

$$ \begin{array}{rll} \pi(\boldsymbol{\theta} | \boldsymbol{y}) &\propto& \left\{ \prod\limits_{i=1}^I \prod\limits_{j=1}^{N(\boldsymbol{s}_i)} f\big( M_j(\boldsymbol{s}_i) | \eta_j(\boldsymbol{s}_i), \sigma, \xi\big) \right\} \\ &\times& \left\{ \pi\big(\boldsymbol{\alpha} | \boldsymbol{\lambda}_{\alpha}, \tau^2_{\alpha}, \phi_{\alpha}, \omega^2_{\alpha}\big) \pi(\boldsymbol{\lambda}_{\alpha}) \pi\big(\tau^2_{\alpha}\big) \pi(\phi_{\alpha}) \pi\big(\omega^2_{\alpha}\big) \right\} \\ &\times& \left\{ \pi\big(\boldsymbol{\beta} | \boldsymbol{\lambda}_{\beta}, \tau^2_{\beta}, \phi_{\beta}, \omega^2_{\beta}\big) \pi\big(\boldsymbol{\lambda}_{\beta}\big) \pi\big(\tau^2_{\beta}\big) \pi(\phi_{\beta}) \pi\big(\omega^2_{\beta}\big) \right\} \\ &\times& \left\{ \pi( \sigma ) \pi( \xi ) \right\}. \end{array} $$
(6)

The first term defines the likelihood of the maxima at each location conditional on the location parameter \(\eta_j(\boldsymbol{s}_i)\), which varies by site and decade, the scale parameter σ and the shape parameter ξ. The products follow from assuming conditional independence between sites and decades, where \(f( M_j(\boldsymbol{s}_i) | \eta_j(\boldsymbol{s}_i), \sigma, \xi)\) denotes the GEV density at location \(\boldsymbol{s}_i\) and decade j, as defined by Eq. 3. The second term is the model for the spatially varying intercept coefficients, \(\pi\big(\boldsymbol{\alpha} | \boldsymbol{\lambda}_{\alpha}, \tau^2_{\alpha}, \phi_{\alpha}, \omega^2_{\alpha}\big)\), conditional on the regression parameters (\(\boldsymbol{\lambda}_{\alpha}\)), variances (\(\tau^2_{\alpha}\) and \(\omega^2_{\alpha}\)) and spatial range parameter (\(\phi_{\alpha}\)). These parameters have mutually independent prior densities given by \(\pi(\boldsymbol{\lambda}_{\alpha})\), \(\pi\big(\tau^2_{\alpha}\big)\), \(\pi(\phi_{\alpha})\), and \(\pi\big(\omega^2_{\alpha}\big)\), respectively. Similarly, the third term is the model for the spatially varying slope coefficients. In the final term, π(σ) and π(ξ) are the mutually independent prior densities for the scale and shape parameters.
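Up to an additive constant, the logarithm of Eq. 6 is a sum of the GEV log-likelihood terms, two multivariate normal log densities for \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\), and the log priors. The Python sketch below evaluates it for given parameter values. It assumes the caller supplies the means \(\boldsymbol{X}\boldsymbol{\lambda}\) and covariances \(\tau^2\boldsymbol{R}(\phi) + \omega^2\boldsymbol{I}\) already built, and passes the summed log prior of the remaining parameters as a single number, since the specific priors are described only in the supplement. Because \(\eta_j(\boldsymbol{s}_i)\) is a deterministic function of \(\alpha(\boldsymbol{s}_i)\), \(\beta(\boldsymbol{s}_i)\) and \(a_j(\boldsymbol{s}_i)\) under Eq. 2, it is computed rather than stored. All names are hypothetical.

```python
import math
import numpy as np

def gev_logpdf(x, eta, sigma, xi):
    """Log GEV density; -inf outside the support (Gumbel limit when xi is ~0)."""
    z = (x - eta) / sigma
    if abs(xi) < 1e-12:
        log_v = -z
    else:
        t = 1.0 + xi * z
        if t <= 0.0:
            return -math.inf
        log_v = -math.log(t) / xi
    if log_v > 700.0:
        return -math.inf
    return -math.log(sigma) + (xi + 1.0) * log_v - math.exp(log_v)

def mvn_logpdf(v, mean, cov):
    """Log density of N(mean, cov), computed through a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    r = np.linalg.solve(L, np.asarray(v, dtype=float) - mean)   # L^{-1}(v - mean)
    return (-0.5 * float(r @ r) - float(np.sum(np.log(np.diag(L))))
            - 0.5 * len(r) * math.log(2.0 * math.pi))

def log_posterior(data, alpha, beta, sigma, xi, mean_a, cov_a, mean_b, cov_b, log_prior_rest):
    """Unnormalized log of Eq. (6).

    data: list over sites i of (extremes, standardized years a_j).
    log_prior_rest: sum of log priors for lambda, tau2, phi, omega2 (both blocks), sigma, xi."""
    if sigma <= 0.0:
        return -math.inf
    loglik = 0.0
    for i, (extremes, a) in enumerate(data):
        for m, a_j in zip(extremes, a):
            loglik += gev_logpdf(m, alpha[i] + beta[i] * a_j, sigma, xi)  # eta_j(s_i), Eq. (2)
    return (loglik
            + mvn_logpdf(alpha, mean_a, cov_a)    # pi(alpha | lambda_a, tau2_a, phi_a, omega2_a)
            + mvn_logpdf(beta, mean_b, cov_b)     # pi(beta | lambda_b, tau2_b, phi_b, omega2_b)
            + log_prior_rest)
```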

The posterior distribution defined in Eq. 6 does not have a standard closed form. In such cases a standard strategy (e.g., Gelman et al. 2004) is to sample from this distribution using a Markov chain Monte Carlo (MCMC) algorithm. The supplement provides details of the algorithm used to sequentially draw samples of the parameters \(\boldsymbol\theta\) given the data, \(\boldsymbol{y}\), and values of the hyperparameters. For each run of the MCMC algorithm, the first 10,000 samples were discarded as a “burn-in”, to guard against results being influenced by the algorithm being initialized in an unlikely area of the posterior parameter space (e.g., Gelman et al. 2004, Section 11.6). In addition, trace plots (time series plots of the MCMC draws) were used to check that samples were “mixing well”, in the sense that the draws were representative of the posterior distribution as a whole rather than of a subspace. A further 100,000 posterior samples were produced after the burn-in, and to reduce the autocorrelation between successive samples, only every 20th draw was retained. Thus the Bayesian inference described below is based on 5,000 samples.
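The burn-in and thinning bookkeeping is simple arithmetic: 100,000 post-burn-in iterations thinned to every 20th draw leave 100,000 / 20 = 5,000 retained samples. The sketch below shows this bookkeeping around a generic random-walk Metropolis update for a single scalar parameter. It is a stand-in of our own, not the sampler in the supplement (which, among other things, uses conjugate updates where possible); the standard normal target, step size and seed are illustrative assumptions.

```python
import math
import random

def metropolis(log_target, x0, step, n_burn=10_000, n_keep=100_000, thin=20, seed=1):
    """Random-walk Metropolis for one scalar parameter with burn-in and thinning.

    Runs n_burn + n_keep iterations, discards the first n_burn, and keeps every
    `thin`-th of the remaining n_keep draws, so n_keep // thin values are returned."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    kept = []
    for it in range(n_burn + n_keep):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_target(prop)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        if it >= n_burn and (it - n_burn + 1) % thin == 0:
            kept.append(x)
    return kept

draws = metropolis(lambda v: -0.5 * v * v, x0=0.0, step=2.4)   # standard normal target
print(len(draws))   # 5000
```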

6 Results

The model defined by Eqs. 3–5 was fit separately to the decadal maxima and minima of the tree ring density series. The posterior means and 95 % credible intervals (CIs) for selected model parameters for the fits to the decadal maxima and decadal minima are shown in Table 1. These summaries indicate Bayesian learning in most of the parameters: the posterior distributions of the parameters differ from the prior distributions, indicating that the data update our knowledge about the parameters. Section 6 of the supplement demonstrates this learning graphically for two of the parameters in the model. Table 1 confirms the findings of the exploratory data analysis (Section 4). The scale parameter, σ, is larger in the minima model than in the maxima model, indicating that the decadal minima exhibit more variability than the decadal maxima. The 95 % CI for the shape parameter, ξ, in the decadal maxima model lies entirely below zero, which provides strong evidence of a bounded upper tail for the maxima (compare with Section 4). In comparison, the interval for ξ in the decadal minima model is wider, covers higher values, and includes zero. The broader range of values for ξ in the minima model reflects greater uncertainty in the tail behavior of the decadal minima. The inclusion of positive values in the 95 % CI for ξ likely reflects model uncertainty rather than an unbounded tail for the distribution of the minima; the supplement discusses model diagnostics in more detail.
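Posterior means and equal-tailed 95 % credible intervals such as those in Table 1 are obtained directly from the retained MCMC draws. A minimal sketch with hypothetical helper names, using linear interpolation between order statistics (the same rule as NumPy's default quantile method); an interval whose upper limit lies below zero, as for \(\xi\) in the maxima model, is what supports a bounded upper tail.

```python
import math

def empirical_quantile(sorted_xs, p):
    """Quantile by linear interpolation between order statistics
    (the same rule as NumPy's default 'linear' method)."""
    h = (len(sorted_xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])

def posterior_summary(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval (mean, lower, upper) from MCMC draws."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0
    mean = sum(xs) / len(xs)
    return mean, empirical_quantile(xs, tail), empirical_quantile(xs, 1.0 - tail)

def interval_below_zero(draws, level=0.95):
    """True when the whole equal-tailed credible interval lies below zero."""
    return posterior_summary(draws, level)[2] < 0.0
```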

Table 1 Posterior summaries of a subset of the parameters in the Bayesian hierarchical model for the decadal maxima (left) and decadal minima (right) of the tree ring density time series

The remaining parameters in Table 1 summarize the posterior distributions for \(\alpha(\boldsymbol{s})\) and \(\beta(\boldsymbol{s})\), the spatially varying intercepts and slopes in the location parameter model given by Eq. 2. For the decadal maxima model, the predominantly negative values in the 95 % credible intervals for \(\lambda_{\alpha,2}\) and \(\lambda_{\alpha,3}\) indicate that the intercepts, \(\alpha(\boldsymbol{s})\), may decrease with longitude and latitude. The 95 % CI for \(\lambda_{\beta,3}\) indicates a weak positive increase in the slopes, \(\beta(\boldsymbol{s})\), with latitude, while the interval for \(\lambda_{\beta,2}\) suggests (albeit with lower significance) a weak positive effect of longitude. In the decadal minima model, the intercepts, \(\alpha(\boldsymbol{s})\), significantly increase with latitude, and there is a weak positive but less significant effect of longitude. In contrast, the slopes, \(\beta(\boldsymbol{s})\), significantly decrease with longitude. In both the maxima and minima models, the longitudinal dependencies are modulated by significant spatial covariances, as measured by the posterior distributions of \(\tau^2_{\alpha}\) and \(\phi_{\alpha}\) for the intercepts, and \(\tau^2_{\beta}\) and \(\phi_{\beta}\) for the slopes. In the models for both the intercepts and slopes, the nugget effects, \(\omega^2_{\alpha}\) and \(\omega^2_{\beta}\), are smaller in magnitude than the corresponding spatial covariance effects as measured by \(\tau^2_{\alpha}\) and \(\tau^2_{\beta}\).

To investigate the net effect of the longitudinal dependence and spatial variability of the slope parameters, posterior summaries of \(\beta(\boldsymbol{s})\) (from Eq. 2) are shown in Fig. 3 for both the decadal maxima and decadal minima models. Monte Carlo methods are used to sample from the posterior distribution of the slope parameters, conditional on the data and model assumptions. The model in Eq. 4 also allows predictions of \(\beta(\boldsymbol{s}^*)\) at unobserved grid centroids \(\boldsymbol{s}^*\), because the covariates (longitude and latitude) are fully observed and the covariance structure shares information across space.
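Prediction at an unobserved centroid rests on the conditional distribution of a Gaussian field. Given the slopes at observed sites, the predictive mean at \(\boldsymbol{s}^*\) is the trend value plus a covariance-weighted combination of the observed deviations from trend. In a Bayesian fit this calculation would be repeated for each posterior draw of the parameters. The Python sketch below shows the conditional mean and variance for a single set of parameter values. It assumes an exponential covariance with a nugget treated as independent site-level noise (also present at the new site), and it uses a plain Gaussian elimination solver to stay dependency-free. The names are hypothetical, and this is not the paper's code.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def krige(coords, values, means, new_coord, new_mean, tau2, phi, omega2):
    """Conditional mean and variance of beta(s*) given beta at observed sites."""
    n = len(coords)
    # Covariance among observed sites: exponential part plus nugget on diagonal.
    cov = [[tau2 * math.exp(-math.dist(coords[i], coords[j]) / phi)
            + (omega2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Cross-covariance between the new site and each observed site.
    c_star = [tau2 * math.exp(-math.dist(new_coord, s) / phi) for s in coords]
    w = solve(cov, c_star)  # C^{-1} c*, valid because C is symmetric
    mean = new_mean + sum(wi * (v - mu) for wi, v, mu in zip(w, values, means))
    var = tau2 + omega2 - sum(wi * ci for wi, ci in zip(w, c_star))
    return mean, var
```

A site far from all observations falls back to its trend value with the full prior variance τ² + ω², while a site close to observed cells borrows strength from them and has a smaller predictive variance.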

Fig. 3

a Posterior summaries of the spatially varying slopes (\(\beta(\boldsymbol{s})\) from Eq. 2) for the location parameters in the maxima GEV model at the measured grid cells (with gray borders) and the unobserved locations (without borders). In each box, the top number denotes the posterior mean slope value, and the number below in parentheses is the associated posterior standard deviation. Boxes shaded pink have a slope that is positive with a posterior probability of at least 0.95; boxes shaded blue have a slope that is negative with a posterior probability of at least 0.95. b As in a, but for the spatially varying slopes for the location parameters in the minima GEV model

These posterior summaries indicate strong patterns in the temporal slope behavior of the GEV location parameters for both the decadal maxima and the decadal minima. The maxima display significant increases with time across the entire spatial domain, with the trends becoming larger and more certain moving to the east and the north. The minima display significant increases with time in the western portion of the region, decreases with time in the east (with varying levels of significance), and do not display trends significantly different from zero in the center of the region. The slopes for the minima are generally more uncertain in the eastern part of the domain, as compared to the west.
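The shading rule in the Fig. 3 caption translates directly into a computation on posterior draws. A cell is labeled positive when at least 95 % of the draws of its slope are above zero, and negative when at least 95 % are below zero. The following Python sketch, with a hypothetical function name and assuming the draws for one cell are available as a list of floats, reproduces the three quantities reported per box: posterior mean, posterior standard deviation, and the shading label.

```python
import statistics


def summarize_slope(samples, level=0.95):
    """Posterior mean, SD, and Fig. 3-style label from MCMC draws of beta(s)."""
    if len(samples) < 2:
        raise ValueError("need at least two posterior draws")
    n = len(samples)
    p_pos = sum(1 for b in samples if b > 0) / n
    p_neg = sum(1 for b in samples if b < 0) / n
    if p_pos >= level:
        label = "positive"  # shaded pink in Fig. 3
    elif p_neg >= level:
        label = "negative"  # shaded blue in Fig. 3
    else:
        label = "neither"  # neither pink nor blue
    return statistics.fmean(samples), statistics.stdev(samples), label
```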

Quantile plots were used to assess the distributional assumptions of the GEV models (see, e.g., Coles 2001, for a general discussion of diagnostics for extreme value models). In general, the quantile plots show that the GEV models fit the decadal maxima better than the decadal minima; further details and figures are provided in the supplement. To investigate the robustness of our conclusions to changes in the modeling and prior assumptions, we considered additional models for the decadal maxima (and minima) of the tree ring density series (see the supplement for details) and obtained qualitatively similar results and conclusions to those shown in Fig. 3. While posterior diagnostics indicate that the GEV approximation is reasonable in this particular application, a word of caution is in order. In most paleoclimate applications, data are sparse and incomplete in both space and time. Paleoclimate records reflect, at best, seasonal averages, and the decadal blocks used here, while a natural choice, are short: each contains only ten annual values, which may be too few for the asymptotic justification of the GEV distribution to hold. In related applications, the data may well violate the assumptions of extreme value theory and lead to non-interpretable results, which points to the need for careful model checking. In such situations, other methods are available for modeling the tail of the distribution, including, for example, a two-parameter Gumbel model, the largest order statistic model (see Coles 2001, p. 66), and quantile regression.
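Because the location parameter here changes with time and space, the block maxima are not identically distributed, so a quantile plot cannot compare them directly with a single GEV. One standard approach (Coles 2001, Sect. 6.2.3) transforms each observation with its own fitted parameters to the standard Gumbel scale, \(\tilde{z} = \xi^{-1}\log\{1 + \xi (z - \mu)/\sigma\}\), and then plots the sorted residuals against Gumbel quantiles \(-\log\{-\log(i/(n+1))\}\). The Python sketch below assumes plug-in parameter values (for example, posterior means) and a time-constant σ and ξ. The procedure in the supplement may differ, and the function names are hypothetical.

```python
import math


def to_gumbel(z, mu, sigma, xi):
    """Transform an observation from GEV(mu, sigma, xi) to the standard Gumbel scale."""
    y = (z - mu) / sigma
    if abs(xi) < 1e-12:
        return y
    t = 1.0 + xi * y
    if t <= 0.0:
        raise ValueError("observation outside the fitted GEV support")
    return math.log(t) / xi


def gumbel_qq_points(obs, mus, sigma, xi):
    """(model quantile, empirical quantile) pairs for a quantile plot.

    obs[k] is a block maximum whose location parameter is mus[k], e.g. a
    hypothetical plug-in value alpha(s) + beta(s) * t_k.
    """
    resid = sorted(to_gumbel(z, m, sigma, xi) for z, m in zip(obs, mus))
    n = len(resid)
    return [(-math.log(-math.log(i / (n + 1))), resid[i - 1])
            for i in range(1, n + 1)]
```

If the model is adequate, the returned pairs should lie close to the line y = x. Systematic curvature in the upper or lower end signals a misspecified tail.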

7 Discussion and conclusions

Understanding and characterizing the distribution of climate extremes is an important and societally relevant undertaking (Zwiers and Kharin 1998; IPCC 2007, Chapter 3). While many reconstructions of late Holocene climate include statements about the extent to which recently observed climate is, by some metric, extreme (e.g., Mann et al. 1999; Luterbacher et al. 2004; Kaufman et al. 2009; Barriopedro et al. 2011), such studies do not exploit the power of statistical extreme value theory. Applying generalized extreme value theory to a suite of climate-sensitive (Mann et al. 2008; Briffa et al. 2002a, b) latewood tree ring density series over northern North America reveals a rich spatio-temporal structure in the distributional parameters governing the maximal and minimal behavior of the proxies. The analysis presented here shows that the decadal maxima are trending upward over this spatial region, while the decadal minima are trending upward in the west and exhibit some downward trends in the east.

Centennial-scale changes in either the mean behavior or the variability of the proxies would result in changes in the distributions of the decadal maxima and minima (Fig. SPM.3 from the IPCC SREX; Field et al. 2012), and a positive trend in the maxima together with a negative trend in the minima is consistent with an increase in the variability of the proxies over time. Because the slopes for the maxima are largest in the east, where the slopes for the minima tend to be negative, the maxima and minima models together indicate that the range of tree ring densities is increasing faster in the east than in the west, and that the distribution is likewise more volatile in the east. While the analysis presented here is of tree ring density series, not temperatures, it is motivated by the importance of quantifying recent climatic extremes in the longer-term context afforded by the paleoclimate record (Field et al. 2012; Luterbacher et al. 2004; Barriopedro et al. 2011). Any conclusions drawn about the climate from this work necessarily assume that the well-established tree ring density–temperature connection for mean behavior (e.g., Briffa et al. 2002a, b) likewise holds for extremes, an issue we defer to future research.
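The east–west contrast in the range follows from a simple difference of the two location models. Assuming, as in the linear model for the time dependence of the GEV location parameters, that each location parameter takes the form \(\mu(\boldsymbol{s},t) = \alpha(\boldsymbol{s}) + \beta(\boldsymbol{s})\,t\), with subscripts marking the maxima and minima fits, the gap between the two location parameters (a rough proxy for the range of the proxy at site \(\boldsymbol{s}\)) evolves as

\[
\mu_{\max}(\boldsymbol{s},t) - \mu_{\min}(\boldsymbol{s},t) = \bigl[\alpha_{\max}(\boldsymbol{s}) - \alpha_{\min}(\boldsymbol{s})\bigr] + \bigl[\beta_{\max}(\boldsymbol{s}) - \beta_{\min}(\boldsymbol{s})\bigr]\,t ,
\]

so the gap widens at the rate \(\beta_{\max}(\boldsymbol{s}) - \beta_{\min}(\boldsymbol{s})\). Where \(\beta_{\max}(\boldsymbol{s}) > 0\) and \(\beta_{\min}(\boldsymbol{s}) < 0\), as in parts of the east, the two terms add; where both slopes are positive, as in the west, they partly cancel.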

The linear model for the time dependence of the GEV location parameters is likely a simplification, and many improvements to the hierarchical model presented here are possible. Including covariates such as greenhouse gas concentrations, solar irradiance, and volcanic forcing (cf. Li et al. 2010) may improve model fit. In particular, including information about large volcanic eruptions, which can be associated with rapid, short-lived climatic cooling, may improve the fit of the decadal minima model. We stress that this work presents a novel framework for analyzing the extremal behavior of climate proxies, and even the relatively simple models used here provide significant evidence of temporal and spatial structure in the maxima and minima. This methodology may be readily applicable to other proxy series, such as oxygen isotope ratios from ice cores or tree ring widths; we stress once more the importance of checking that the GEV model adequately describes the maxima and minima (see Section 6). Understanding how the extreme value properties of various climate-sensitive proxies differ from one another will help in interpreting climate reconstructions based on the different proxies.

The distributional behavior of extremes in space and time provides important information about how the climate system is changing. Most paleoclimate reconstructions model the proxy–climate relationship as linear, with additive Gaussian errors (see, e.g., Jones et al. 2009). When dealing with extremes, the assumption of Gaussian errors is clearly incorrect, and a linear relationship in the underlying distribution may not hold in the tails. This article provides an introductory framework and a first analysis applying extreme value methodology to paleoclimate series, in an effort to further understand long-term changes in climate behavior. The analysis presented here reveals a rich space–time structure in the parameters governing the extremal behavior of climate-sensitive proxy series; tying the results directly to the climate remains an important area for future exploration.