1 Introduction

After the Hurricane Katrina disaster in 2005 and the tsunami tragedy that struck Asia at the end of 2004, public awareness of the potential impacts of storm surges and ocean waves on human society, the environment and ecosystems has increased. One of the issues of concern is whether external influences on the climate system, especially human influence, have affected the storm and ocean wave climate. Although this issue is yet to be addressed, there is evidence of significant change in extratropical cyclone activity and ocean wave heights in the boreal cold seasons of the last half century. For example, a significant increasing trend in winter (January–March) strong-cyclone activity over the high-latitude North Atlantic has been identified in the sea level pressure (SLP) fields taken from both the ERA40 reanalysis (ERA40 hereafter; Uppala et al. 2005) and the NCEP-NCAR Reanalysis (Kalnay et al. 1996; Kistler et al. 2001) for the 1958–2001 period. There is also a significant decreasing trend in strong-cyclone activity over the mid-latitude North Atlantic over this period (Wang et al. 2006). These changes are associated with a northward shift of the mean position of the North Atlantic storm track of about 180 km (Wang et al. 2006). Consistent with the increase in strong-cyclone activity, the northeast North Atlantic Ocean has also been found to have roughened in winter during the 1958–1997 period, while significant decreases of ocean wave heights are identified in the subtropical North Atlantic (Wang and Swail 2006a; Wang and Swail 2001, 2002; WASA Group 1998; Bacon and Carter 1991).

There have been numerous studies on the detection, attribution, and quantification of the influence of external forcing on a range of climate variables, such as surface air temperature, atmospheric pressure, free atmosphere temperature, tropopause height, ocean heat content, sea ice extent, and precipitation (e.g., Hegerl et al. 1997; Tett et al. 1999; Zwiers and Zhang 2003; Stott 2003; IDAG 2005; Zhang et al. 2006; Gillett et al. 2003, 2005; Jones et al. 2003; Santer et al. 2003; Barnett et al. 2005; Hegerl et al. 2007; Zhang et al. 2007). However, previous studies have not addressed the question of whether external factors have influenced atmospheric storminess and ocean wave heights in the extra-tropics. This study attempts to do exactly that by comparing observationally based estimates of changes in atmospheric storminess and ocean wave heights with estimates derived from multi-model simulations of climate change driven with historical external forcing from both anthropogenic sources (e.g., greenhouse gases and aerosol forcing) and natural sources such as solar and volcanic forcing.

This article is structured as follows. The data sets used and their preparation are described in Sect. 2. The detection analysis method is described in Sect. 3. The results are presented and discussed in Sect. 4, followed by some concluding remarks in Sect. 5.

2 Data sets and preparation procedure

This study analyzes two climate elements, a measure of atmospheric storminess and ocean wave heights. The atmospheric and ocean wave height data sets that we use are described in the following two subsections.

2.1 Atmospheric data and preparation

Many indices have been used to represent atmospheric storminess, including (but not limited to) cyclone count statistics (e.g., Pettersen 1956; Whitaker and Horn 1984), eddy variance/covariance statistics (e.g., Blackmon 1976; Hoskins and Hodges 2002), and geostrophic wind speeds (e.g., Matulla et al. 2008; Alexandersson et al. 1998, 2000). In this study, we also use an index that represents the geostrophic wind energy, namely the squared seasonal mean SLP gradient, to provide a measure of atmospheric storminess. We analyze time series of anomalies of the squared SLP gradient, G t (here t denotes years) expressed relative to the 1961–1990 mean, to determine whether or not external influence on change in atmospheric storminess is detectable. The technical details of how the atmospheric storminess index G t is derived are described in Appendix A.

For comparison with previous detection studies on SLP (Gillett et al. 2003, 2005), we also analyze time series of seasonal mean SLP anomalies expressed relative to the 1961–90 mean, P t , which represent variability in the mean SLP field.

We use observationally based proxies derived from ERA40 for the 1958–2001 period (Uppala et al. 2005) and also use HadSLP2 (Allan and Ansell 2006). While the latter contains gridded SLP observations for the period 1900–2004 (available online at http://hadobs.metoffice.com/gmslp/hadslp2/index.html), we focus most of our effort on the most recent 50 years (1955–2004), which have significantly better spatial coverage than the earlier part of the record.

In order to detect the influence of historical external forcing in the observations, climate model based estimates of the response to external forcing and of natural internal climate variability are required. Thus, we also use simulations of SLP from the nine coupled ocean-atmosphere models listed in Tables 1 and 2. These simulations were obtained from the multi-model data archive at PCMDI (https://esg.llnl.gov:8443/index.jsp). An ensemble of simulations of the twentieth century with historical external forcing is available for each of these models, with the individual ensemble sizes ranging from 3 to 7. In total, we use 41 such simulations (Table 1). Note that the twentieth-century simulations generally finished in 1999 or 2000. As in Gillett et al. (2005), we have extended these simulations to 2004 using output for 2000–2004 from integrations using the SRES A1B, A2 or B1 emission scenarios (Nakicenovic and Swart 2000), so that the model simulations can be compared to observations up to 2004 (see Table 1). The SRES emission scenarios are used extensively in the IPCC (2001, 2007) reports. There should not be significant differences among different scenarios for the period 2000–2004 (see, for example, Fig. 10.4 in Meehl et al. 2007). Control integrations from each of these models, which are listed in Table 2, are also used in this study.

Table 1 The nine coupled ocean–atmosphere models used in this study and the number of twentieth century runs conducted with each of these models (the period of simulation given in parentheses)
Table 2 The control integrations (runs) used in this study

The ERA40 SLP data are available on a 2.5°-by-2.5° lat-long grid, while the HadSLP2 data are only available on a 5°-by-5° lat-long grid and the multi-model SLP data were archived on different grids for different models. All of these data were interpolated onto the same 5°-by-5° lat-long grid, with the first gridpoint centered at (2.5°E, 87.5°S), and the last, at (357.5°E, 87.5°N).

Our detection analysis will focus on G t and P t change in each of the four seasons, separately: winter (JFM), spring (AMJ), summer (JAS), and fall (OND). We choose this definition of the seasons for convenience and because the resulting JFM data series is one year longer than the DJF data series. It is shown later in Sect. 4 that this definition does not notably affect the results of our detection analysis.

For each gridpoint we analyze, we obtain observed G t and P t time series from the ERA40 and HadSLP2 datasets, and also obtain corresponding time series from each of the climate model simulations. Linear trends are estimated from each time series for each season, over the periods 1958–2001 and 1955–2004, using the least squares method (this method is used throughout this paper). These two overlapping periods are chosen because the ERA40 covers only the period 1958–2001 while HadSLP2 and the climate model simulations cover the longer period 1955–2004. Our detection analyses, which will compare the observed and simulated trend patterns, will be based on 1958–2001 when using ERA40, and 1955–2004 when using HadSLP2.
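The anomaly and trend calculations described above can be illustrated with a minimal sketch. It assumes annual (one value per season per year) series held in NumPy arrays; the function names `anomalies` and `linear_trend` and the synthetic series are hypothetical and serve only to illustrate the procedure.

```python
import numpy as np

def anomalies(values, years, base=(1961, 1990)):
    """Anomalies relative to the mean over the base period (inclusive)."""
    values = np.asarray(values, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    return values - np.nanmean(values[in_base])

def linear_trend(values, years, start, end):
    """Least-squares slope (units per year) over start..end, skipping NaNs."""
    values = np.asarray(values, dtype=float)
    years = np.asarray(years, dtype=float)
    keep = (years >= start) & (years <= end) & np.isfinite(values)
    slope, _intercept = np.polyfit(years[keep], values[keep], 1)
    return slope

# Synthetic series with a known slope of 0.02 units per year (illustrative only).
years = np.arange(1955, 2005)
series = 1.0 + 0.02 * (years - 1955)
anom = anomalies(series, years)

trend_era_period = linear_trend(anom, years, 1958, 2001)  # period used with ERA40
trend_had_period = linear_trend(anom, years, 1955, 2004)  # period used with HadSLP2
```

Applying the same two functions to every gridpoint and season yields the trend patterns that enter the detection analysis.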

The linear trend patterns in G t over the common period 1958–2001 are shown in Fig. 1. The corresponding 1955–2004 trends (not shown) are similar, though generally somewhat smaller. The multi-model simulated trend patterns in G t (and P t , not shown) resemble their observed counterparts in the boreal cold seasons, although the magnitude of the observed trend in the North Atlantic storm track region is substantially under-estimated (see Fig. 1c, e or d, f). The observed and simulated trend patterns are also similar in the southern hemisphere in the austral cold seasons (not shown).

Fig. 1

The linear trend patterns of the geostrophic wind energy index G t (unit: (hPa)²/5°) as derived from ERA40, HadSLP2, and the multi-model/ensemble mean simulations of G t for the period 1958–2001. Solid contours and yellow-red shadings indicate upward/positive trends, and dashed ones and green-blue shadings downward/negative trends (unit: (hPa)²/5° per century). The contour line interval is 10 units (note that contour lines are drawn every other shading level). The percentage of grid-boxes that were found to have a significant trend is given in the parentheses above

Importantly, the storminess trends inferred from the geostrophic wind energy index G t are consistent with the findings of previous studies using other types of storminess indices such as indices that are based on applying objective cyclone detection and tracking algorithms to 6-hourly SLP fields (Wang et al. 2006; Gulev et al. 2001). This consistency suggests that the index G t represents the trend component of the atmospheric storminess adequately. It should therefore be suitable for a detection analysis on change in atmospheric storminess.

2.2 Ocean wave height data and preparation

It is also of great interest to determine whether or not external influence on ocean wave heights is detectable, which is another goal of this study. Here, we focus on significant wave height (SWH). Thus, we also need both SWH observations and simulations of SWH response to historical external forcing.

There exist two global wave reanalyses: ERA40 (Uppala et al. 2005; Caires et al. 2004a) and AES40 (Cox and Swail 2001). Having inter-compared and validated these reanalyses against independent NOAA/NDBC buoy and TOPEX/Poseidon altimeter observations, Caires et al. (2004b) concluded that most of the large-scale features of observed wave height variability are equally present in these wave datasets. Thus, in this study, we use the original ERA40 SWH data as a proxy for actual SWH observations. These data were aggregated onto the same 5°-by-5° lat-long grid as is used to represent G t and P t (see Sect. 2.1).

Since SWH data are not directly available from the output of global climate models, estimates of the SWH response to historical external forcing must be obtained empirically. We therefore used a statistical model to represent the observed relationship between atmospheric circulation and SWH, and applied this statistical model to climate model output in order to estimate variation and change in SWH that is induced by natural and anthropogenic forcing combined. Since the SLP-SWH relationships are weaker in the boreal warm seasons (Wang and Swail 2006b), we will focus only on the boreal winter and fall in this study. We will analyze both seasonal means and extremes of SWH in these seasons. The statistical models used for estimating seasonal mean and extreme SWH variates are described in the following two subsections.

2.2.1 Seasonal mean SWH

Following Wang and Swail (2006a, b), we use the regression equation

$$ H_{\rm avg, \; {\it t}} = a + b G_t + c P_t + \varepsilon_t $$
(1)

to represent the observed relationship between seasonal mean SWH (H avg, t ) and the atmospheric circulation variates (G t and P t ). Here, ɛ t denotes a white noise process. The parameters a, b, and c are estimated using the seasonal quantities (H avg, t , G t , and P t ) that are derived from the ERA40 wave and SLP data for the period 1958–2001 (excluding a period of erroneous wave data from January 1992 to May 1993; Caires et al. 2004b). These estimates are denoted as \(\hat{a}, \hat{b},\) and \(\hat{c},\) respectively (\(\hat{x}\) denotes an estimate of x throughout this paper). Both the predictor and predictand series were detrended before estimating these parameters in order to focus the regression on the relationship between non-systematic changes. Note that the predictor trend components were not removed when the fitted regression was subsequently used to obtain the predictand values.
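The fitting procedure for Eq. (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the slopes are estimated from linearly detrended series, and the intercept is then chosen so that the fit passes through the sample means, which is an assumption of this sketch. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def detrend(x, t):
    """Remove the least-squares linear trend of x against t."""
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def fit_swh_regression(h_avg, g, p, years):
    """Estimate (a, b, c) in H_avg = a + b*G + c*P.

    b and c are estimated from detrended predictor and predictand series.
    The intercept a is set from the sample means (assumption of this sketch).
    """
    t = np.asarray(years, dtype=float)
    h, g, p = (np.asarray(v, dtype=float) for v in (h_avg, g, p))
    X = np.column_stack([detrend(g, t), detrend(p, t)])
    (b, c), *_ = np.linalg.lstsq(X, detrend(h, t), rcond=None)
    a = h.mean() - b * g.mean() - c * p.mean()
    return a, b, c

def hindcast(a, b, c, g, p):
    """Apply the fitted relationship to predictors that retain their trends."""
    return a + b * np.asarray(g, dtype=float) + c * np.asarray(p, dtype=float)

# Synthetic, noise-free example with known coefficients (illustrative only).
rng = np.random.default_rng(0)
years = np.arange(1958, 2002)
g = rng.normal(size=years.size) + 0.05 * (years - 1958)
p = rng.normal(size=years.size) - 0.03 * (years - 1958)
h = 2.0 + 0.5 * g - 0.3 * p

a_hat, b_hat, c_hat = fit_swh_regression(h, g, p, years)
h_hat = hindcast(a_hat, b_hat, c_hat, g, p)
```

Because the hindcast step uses the undetrended predictors, any trend in G t and P t is carried through to the estimated SWH series.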

This statistical model exploits the significant SLP-SWH relation that exists in most of the world oceans north of 30°N (Wang and Swail 2006b). We perform our detection analysis only in the areas where the SLP-SWH relationship is significant, which include most of the oceans in the 30°N–70°N band, as shown in Fig. 2. In these selected areas, the relationship \(\hat{H}_{\rm avg,\, {\it t}}=\hat{a} + \hat{b} G_t + \hat{c} P_t\) reproduces the observed trend patterns in H avg,t reasonably well, especially in winter, although it significantly under-represents the magnitude of change (see Fig. 2a, c or b, d). Thus, we consider it reasonable to use this SLP-SWH relationship to estimate changes in H avg,t for use in a detection analysis. Detection studies rely on the assumption that climate models, or in this case the combination of climate models and statistical downscaling, simulate the correct pattern of response to external forcing, but they do not require that these patterns have the correct amplitude (e.g., Hegerl et al. 2007).

Fig. 2

The 1958–2001 linear trend patterns of the seasonal mean SWH as derived from the original ERA40 seasonal mean SWH (H avg), the ERA40 and HadSLP2 hindcasts and the multi-model/ensemble mean simulations of H avg (i.e., \(\hat{H}_{\rm avg}\)). Yellow-red shadings and solid contours indicate upward/positive trends, and green-blue shadings and dashed contours downward/negative trends (unit: cm/year). Note that the contour scale for the simulated trends (g, h) is different from the one for the observed trends, as indicated by the contour scale bars. The percentage of grid-boxes that were found to have a significant trend is given in the parentheses above

Having fit the statistical model, we feed the climate model simulated G t and P t into this relationship to estimate climate model simulated SWH variation H avg,t for each season at each wave gridpoint in the selected areas for each of the forced and control simulations used in this study. These estimates are also referred to as the simulated H avg,t . For comparison with the original ERA40 H avg, t values, we also feed the ERA40 G t and P t values into this relationship, obtaining statistical hindcasts \(\hat{H}_{\rm avg,\; {\it t}},\) which are referred to as the ERA40 H avg,t hindcasts hereafter. Further, in order to have proxy SWH “observations” that are somewhat independent of the model used for ERA40, we also feed the HadSLP2 G t and P t values into this relationship, obtaining the HadSLP2 H avg, t hindcasts for the 1955–2004 period.

For our detection analysis, we estimate the linear trends in each of the H avg, t or constructed \(\hat{H}_{\rm avg,\;{\it t}}\) time series, again over the periods 1958–2001 and 1955–2004. The “observed” and multi-model simulated trend patterns for 1958–2001 are shown in Fig. 2. The corresponding 1955–2004 trend patterns (not shown) are similar to those shown in Fig. 2e–h, but with slightly smaller trends.

In general, the trend patterns in hindcast H avg are reasonably well represented in the empirically downscaled climate model output, although the magnitude of trend is significantly under-estimated (see Fig. 2c–h). The trend pattern in the original ERA40 H avg in the North Atlantic is also well simulated, especially in winter (see Fig. 2a, g). However, the climate models significantly under-estimate the magnitude and areal extent of the observed H avg increase in the North Pacific (see Fig. 2a, b, g–h). Such under-estimation is not unexpected given that the multi-model simulated changes in the P t and G t fields are also smaller than observed. In addition, some variance is presumably lost in the process of converting the atmospheric circulation change to a change in H avg using the regression relationship.

2.2.2 Seasonal extreme SWH

Seasonal maxima of SWH can also be derived from the original ERA40 6-hourly wave data, but as with seasonal mean SWH, comparable information is not directly available from climate models. Thus in this case also, we proceed to estimate extreme SWH quantities by means of a statistical downscaling approach following Wang and Swail (2006a, b). This approach uses the non-stationary Generalized Extreme Value model, GEV(μ t , σ, ξ), where the location parameter μ t depends on atmospheric variates G t and P t via the relationship

$$ \mu_t = \mu_o + \gamma_1 G_t + \gamma_2 P_t + \epsilon_t, $$
(2)

while the scale parameter σ and shape parameter ξ are constant over time. Here \(\epsilon_t\) denotes a white noise process. The time varying GEV distribution can be used in two different ways. First, by specifying a fixed size of extreme event, it is possible to estimate how the risk of that event varies as G t and P t vary. Alternatively, a fixed risk could be specified, such as a 1-in-20 year risk of occurrence, and the magnitude of threshold that is exceeded with that fixed risk can be estimated by finding the appropriate quantile of the GEV distribution. Since the location parameter in our statistical model depends upon G t and P t , that threshold then becomes a function of G t and P t as well. We take the latter approach in this study.

The seasonal maximal SWH and seasonal atmospheric quantities G t and P t derived from the ERA40 wave and SLP data are used to estimate the parameters μo, γ1, γ2, σ, and ξ via the method of maximum likelihood. The above GEV model is fitted for each wave gridpoint analyzed and for each season, separately. The fitted model, \({\rm GEV}(\hat{\mu}_t = \hat{\mu}_o + \hat{\gamma}_1 G_t + \hat{\gamma}_2 P_t, \hat{\sigma}, \hat{\xi}),\) is then used to estimate a series of 20-year return values of SWH, H 20y, t , that vary with the atmospheric circulation parameters G t and P t . More specifically, we use the ERA40 G t and P t values in the fitted model to obtain ERA40 hindcasts of H 20y, t , and use the HadSLP2 G t and P t values to obtain HadSLP2 hindcasts of H 20y, t . In order to estimate the H 20y, t response to historical external forcing, we use climate model simulated G t and P t values in the fitted GEV model, obtaining the corresponding downscaled H 20y, t values. This is done for each of the 41 twentieth-century simulations (Table 1), and also for each of the control simulations (Table 2). Linear trend patterns are subsequently estimated from all of the constructed return value time series and used in our detection analysis.
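The step from a fitted GEV model to a time-varying 20-year return value can be sketched as below. The sketch assumes the common parameterization in which the GEV distribution function is exp{−[1 + ξ(x − μ)/σ]^(−1/ξ)}; the sign convention for ξ is an assumption here, and all parameter values are hypothetical. Maximum likelihood estimation of the parameters is not shown.

```python
import math

def location(mu0, gamma1, gamma2, g_t, p_t):
    """Location parameter as a linear function of G_t and P_t (fitted form of Eq. 2)."""
    return mu0 + gamma1 * g_t + gamma2 * p_t

def gev_return_value(mu_t, sigma, xi, return_period=20.0):
    """Quantile exceeded with probability 1/return_period in any one season."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:                      # Gumbel limit
        return mu_t - sigma * math.log(y)
    return mu_t + sigma / xi * (y ** (-xi) - 1.0)

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function, used here only to check the quantile."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    z = 1.0 + xi * (x - mu) / sigma
    return math.exp(-(z ** (-1.0 / xi)))

# Hypothetical parameter values for one gridpoint and season.
mu0, gamma1, gamma2, sigma, xi = 6.0, 0.02, -0.05, 1.2, -0.1
mu_a = location(mu0, gamma1, gamma2, g_t=10.0, p_t=-2.0)
mu_b = location(mu0, gamma1, gamma2, g_t=20.0, p_t=-2.0)
h20_a = gev_return_value(mu_a, sigma, xi)
h20_b = gev_return_value(mu_b, sigma, xi)
```

With σ and ξ held constant, a change in the location parameter shifts the return value by the same amount, which is why the H 20y, t series follows a linear combination of G t and P t.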

In general, the trend patterns in the seasonal 20-year return values of SWH (H 20y ) in both hindcasts are reasonably well reproduced by the climate models combined with the statistical downscaling, especially in the North Atlantic (compare Fig. 3a–d with e, f). The trend patterns in H 20y are also similar to those of the corresponding H avg trend patterns (see Figs. 2 and 3), which is not surprising given that both are functions of linear combinations of G t and P t .

Fig. 3

The same as in Fig. 2 but for the linear trend (in cm/year) patterns of seasonal 20-year return values of SWH. Note that the contour scale for the simulated trends (e, f) is different from the one for the observed trends, as indicated by the contour scale bars

3 Detection analysis

Climate variability refers to variations in the mean state and other statistics (such as the standard deviation, the intensity of extremes, etc.) of the climate on all spatial and temporal scales beyond that of individual weather events (Solomon et al. 2007). Variability is referred to as internal variability if it arises from natural internal processes within the climate system, and as external variability if it is due to external forcing (anthropogenic, or natural such as changes in solar radiation and volcanism). Climate change may be due to internal climate system processes and/or external forcings on the climate system. The objective of climate change detection analysis is to understand how climate changes that result from anthropogenic and natural external forcings may be distinguished from changes and variability that result from internal climate system processes. The spatial and temporal scales used to analyze climate change are carefully chosen so as to focus on the spatio-temporal scale of the response, filter out as much internal variability as possible and enable the separation of the responses to different forcings (Hegerl et al. 2007). In this study, we compare the pattern of observed trends with a multi-model estimate of the trend that is expected to arise from external forcing to determine whether or not external influence is detectable in the pattern of observed trends. We do this by means of the optimal detection approach (Allen and Stott 2003; Hegerl et al. 1997; Hasselmann 1993).

Let Y o denote an estimate of the observed linear trend pattern, and let Y m be the multi-model mean linear trend pattern (i.e., the average of the linear trends in the individual simulations that make up the combined ensemble of 41 simulations from the 9 climate models used in this study; see Table 1). The observed trend Y o consists of a response to historical external forcing and internal variability η with variance-covariance matrix C η. Similarly, Y m consists of the simulated response to the historical external forcing and some internal variability ζ with variance-covariance matrix C ζ. Note that C ζ is small because Y m is a multi-model mean trend pattern and thus much of the internal variability that affects trends in individual simulations has been averaged out. The objective here is to compare the response to forcing in the observations (Y o  − η) with the simulated response (Y m  − ζ). This is accomplished by means of the optimal detection approach, i.e., by fitting with optimization the simulated pattern of trends to the observed pattern of trends as follows:

$$ Y_o = \beta (Y_m - \zeta) + \eta. $$
(3)

Here, optimization means maximizing the signal-to-noise ratio by rotating the coordinate space (Hasselmann 1993). Because the data required to estimate C η and C ζ are limited, and because models do not simulate internal variability well on all scales, the analysis is typically performed in a dimension-reduced space spanned by a number of leading EOFs (see Table 3; in this study we consider EOF truncations between 2 and 10). The regression coefficient β is estimated by means of the total least squares (TLS) algorithm (Allen and Stott 2003). [Note that the ordinary least squares algorithm is not suitable here because it assumes that the explanatory variable contains no error, whereas both the response variable Y o and the explanatory variable Y m in model (3) contain a noise term (η and ζ, respectively).]
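A single-signal TLS fit in a truncated EOF space can be sketched as follows. This is an illustrative sketch rather than the procedure used in this study: it assumes that the noise covariance is estimated from a set of zero-mean unforced trend patterns, that the noise in Y m has variance C η divided by the number of averaged simulations, and that prewhitening is done by scaling the leading EOF projections. The function name and arguments are hypothetical, and confidence intervals are not computed.

```python
import numpy as np

def tls_scaling_factor(y_obs, y_mod, noise_trends, n_eofs, n_runs):
    """TLS estimate of beta in Y_o = beta * (Y_m - zeta) + eta.

    noise_trends: (n_samples, n_points) trend patterns from unforced variability.
    n_runs: number of simulations averaged into y_mod (assumed noise scaling).
    """
    c = np.atleast_2d(np.asarray(noise_trends, dtype=float))
    cov = c.T @ c / c.shape[0]                    # noise covariance estimate
    evals, evecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    lead = np.argsort(evals)[::-1][:n_eofs]       # indices of the leading EOFs
    whiten = evecs[:, lead] / np.sqrt(evals[lead])
    # Project onto the leading EOFs and scale to unit noise variance.
    x = whiten.T @ np.asarray(y_mod, dtype=float) * np.sqrt(n_runs)
    y = whiten.T @ np.asarray(y_obs, dtype=float)
    # The right singular vector with the smallest singular value defines the TLS fit.
    _, _, vt = np.linalg.svd(np.column_stack([x, y]), full_matrices=False)
    v = vt[-1]
    return -np.sqrt(n_runs) * v[0] / v[1]

# Synthetic check: observations equal to exactly twice the simulated pattern.
rng = np.random.default_rng(1)
noise = rng.normal(size=(200, 12))
pattern = rng.normal(size=12)
beta_hat = tls_scaling_factor(2.0 * pattern, pattern, noise, n_eofs=5, n_runs=41)
```

In the noise-free synthetic case the estimate recovers the imposed factor of 2; with noisy inputs, the estimate depends on the chosen EOF truncation, which is why results are examined across a range of truncations.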

Table 3 The estimated scaling factor β, its 5–95% uncertainty range (β l , β u ), and the result of the related residual consistency test for each of the listed detection variables as derived from the ERA40 or HadSLP2 dataset

Optimal detection analysis requires knowledge of the internal climate variability. Two independent estimates of the internal variability variance-covariance matrix C η are required in this study: \(\hat{C}_{\eta_1}\) and \(\hat{C}_{\eta_2}.\) These estimates are obtained by pooling control simulation variability with the inter-integration variability found in ensembles of twentieth-century simulations. Details are given in Appendix B. \(\hat{C}_{\eta_1}\) is used in the optimization, while \(\hat{C}_{\eta_2}\) is used in estimating the scaling factor β and in obtaining a confidence interval for \(\hat{\beta}.\) Once the regression model has been fitted, the residuals \(\hat{\eta} = Y_o - \hat{Y_o} = Y_o - \hat{\beta} (Y_m - \hat{\zeta})\) can be calculated, so that their variance can be compared with the corresponding simulated internal variability using a standard residual consistency test (Allen and Stott 2003). Inconsistency could occur because (1) the climate models estimated the forced signal correctly, but under- or over-estimated internal variability; (2) the climate models did not respond correctly to forcing, but did simulate internal variability correctly; or (3) a combination of (1) and (2).

The main results of the above optimal detection analysis can be interpreted as follows: The external influence is detected if the scaling factor β in (3) is estimated to be significantly greater than zero, but not detected if \(\hat{\beta}\) is not significantly different from zero or negative. An estimate \(\hat{\beta}\) that is significantly greater than zero and consistent with unity indicates that the observed and simulated responses are considered to be comparable with each other, in which case it may be possible to attribute the observed trends to the historical external forcing if other plausible explanations (causes) can be ruled out.
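The interpretation rules above can be written as a small decision function. The sketch below is illustrative only; the function name and the returned labels are hypothetical, and the over-estimation branch is an inference from the same logic rather than a case discussed in the text.

```python
def classify_scaling_factor(beta_lower, beta_upper):
    """Interpret a 5-95% uncertainty range (beta_lower, beta_upper) on beta."""
    if beta_lower <= 0.0:
        # Range includes zero or negative values: no detection.
        return "not detected"
    if beta_lower <= 1.0 <= beta_upper:
        # Significantly positive and consistent with unity.
        return "detected, consistent with unity"
    if beta_lower > 1.0:
        # Simulated response must be scaled up to match the observations.
        return "detected, simulated response too weak"
    # Entire range lies between zero and one.
    return "detected, simulated response too strong"

# Uncertainty range reported in Sect. 4 for decadal mean JFM P t from HadSLP2.
result_decadal_slp = classify_scaling_factor(0.17, 2.33)
```

Detection alone does not imply attribution, which additionally requires that other plausible explanations be ruled out.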

This optimal detection approach was applied to determine whether or not external influence on the observed trends in the following variables is detectable: (1) the geostrophic wind energy index G t , (2) the mean pressure field P t , (3) seasonal mean SWH H avg, and (4) the 20-year return value of seasonal extreme SWH H 20y . For each of the four variables, at least two observationally based datasets were available, derived from either ERA40 or HadSLP2, as described earlier in Sect. 2. The detection analysis is carried out for each of these sets of proxy observations in the following detection domains: (a) global (GL: 60°S–80°N), (b) northern hemisphere (NH: 0°–80°N), (c) North Atlantic (NA: the ocean domain in 20°N–75°N), and (d) southern hemisphere (SH: 0°–60°S).

For GL, NH and SH, the analysis is performed on a 20°-by-60° lat-long grid, using grid-box-area weighted averages of the values at the 48 5°-by-5° gridpoints in each 20°-by-60° grid-box [the longitudes of the 20°-by-60° grid-box-centers (grid-points) are 30°E, 90°E,..., 240°E, 300°E, while the latitudes are 70°S, 50°S,..., 50°N, 70°N]. For the North Atlantic (NA: 20°N–75°N), we carried out the detection analysis on the 5°-by-5° grid over the ocean only (gridpoints over land are excluded). For the analyses on SWH statistics, the detection domain only covers the selected ocean areas in 30°N–70°N (see Sect. 2.2 and Figs. 2, 3).

In order to exclude gridpoints with frequent missing observations from our detection analysis, we use the missing data information obtained from HadSLP2.0, the un-interpolated HadSLP2 product (Allan and Ansell 2006). For any year, a missing value is assigned to all 20°-by-60° grid-boxes that have missing data at 50% or more of their 48 5°-by-5° gridpoints; and a linear trend is estimated only when there are at least 30 years of valid data for a 20°-by-60° gridbox during the period of 1955–2004. The excluded grid-boxes are not shaded in Fig. 1. Note that the same retained grid-boxes are analyzed regardless of whether HadSLP2 or ERA40 is analyzed. The same missing data mask was also applied to the model simulations, so that the observed and simulated quantities are analyzed on the same grid.
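The aggregation and masking rules can be sketched as follows. The sketch assumes a 36-by-72 array on the 5°-by-5° grid with missing values stored as NaN, approximates the grid-box area by the cosine of latitude, and assumes that the 60° longitude boxes start at 0°E; these choices and the function names are assumptions of this illustration.

```python
import numpy as np

def aggregate_to_boxes(field, lats, max_missing_fraction=0.5):
    """Area-weighted 20x60 degree box means of a 5x5 degree field for one year.

    field: (36, 72) array, latitudes 87.5S..87.5N, longitudes 2.5E..357.5E.
    Rows poleward of 80 degrees are dropped, leaving 8 x 6 boxes of 48 points.
    """
    f = np.asarray(field, dtype=float)[2:-2]                # 32 rows: 80S-80N
    w = np.cos(np.deg2rad(np.asarray(lats, dtype=float)[2:-2]))
    w = np.repeat(w[:, None], f.shape[1], axis=1)           # weight per gridpoint
    f = f.reshape(8, 4, 6, 12)                              # (lat box, row, lon box, col)
    w = w.reshape(8, 4, 6, 12)
    valid = np.isfinite(f)
    wsum = np.where(valid, w, 0.0).sum(axis=(1, 3))
    fsum = np.where(valid, f * w, 0.0).sum(axis=(1, 3))
    n_missing = (~valid).sum(axis=(1, 3))
    boxes = np.full((8, 6), np.nan)
    keep = n_missing < max_missing_fraction * 48            # 50% or more missing: mask
    boxes[keep] = fsum[keep] / wsum[keep]
    return boxes

def box_trend(series, years, min_years=30):
    """Least-squares trend, returned only if enough valid years are available."""
    series = np.asarray(series, dtype=float)
    years = np.asarray(years, dtype=float)
    ok = np.isfinite(series)
    if ok.sum() < min_years:
        return np.nan
    return np.polyfit(years[ok], series[ok], 1)[0]

# Illustrative field: one box with half its points missing, one with a single gap.
lats = np.arange(-87.5, 90.0, 5.0)
field = np.ones((36, 72))
field[2:4, 0:12] = np.nan      # 24 of 48 points missing in box (0, 0)
field[6, 12] = np.nan          # 1 of 48 points missing in box (1, 1)
boxes = aggregate_to_boxes(field, lats)
```

Applying the same mask to observed and simulated fields keeps the two sets of trend patterns on an identical grid.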

The results of our detection analysis are presented and discussed in the next section. Since H avg, t is a linear combination of G t and P t , and H 20y, t is a monotonic function of a linear combination of G t and P t , detection results for H avg, t and H 20y, t are not independent of detection results for G t and P t . Nevertheless, detection analyses on H avg, t and H 20y, t are useful because they allow a synthesis of results on G t and P t into results concerning SWHs.

4 Results of detection analysis

First of all, in order to compare with Gillett et al. (2005), a detection study on the decadal mean DJF SLP fields over the globe (derived from HadSLP2.0), we also carried out our detection analysis using the global decadal means of JFM P t as derived from HadSLP2. We obtained similar results, that is, a scaling factor of 1.21 that is significantly greater than zero but not significantly different from unity [the estimated 5–95% uncertainty range on β is (0.17, 2.33)], and that also passes the residual consistency check. Note that these results are not listed in Table 3, which contains only the results of detection analyses on trends. This corroborates the finding of Gillett et al. (2005) that boreal winter “sea level pressure trends may be attributed to external influence”. The difference in the definition of the boreal winter season (i.e., DJF vs. JFM) between Gillett et al. (2005) and this study does not affect the detection conclusion; neither does the slight difference between using HadSLP2.0 and HadSLP2 (un-interpolated and interpolated versions).

We now turn to detection analysis on the linear trend patterns of geostrophic wind energy index G t , of seasonal mean SLP anomalies P t , and of seasonal mean and extreme SWH. The results, which are summarized in Table 3 and shown in Figs. 4, 5, 6, indicate that there exist detectable external influences on the observed trends of atmospheric storminess and ocean wave heights in boreal winter (JFM) in the past half century, especially for the North Atlantic region. These results are discussed in detail in the subsections below.

Fig. 4

The scaling factors (β) in the regressions of the winter (JFM) trend patterns of P t (seasonal mean SLP anomalies), G t (seasonal anomalies of squared SLP gradients), H avg (seasonal mean SWH), and H 20y (seasonal 20-year return values of SWH), separately, on the relevant multi-model mean of simulated trend patterns. The period for calculating the trend is 1955–2004 when the HadSLP2 is used, and 1958–2001 when ERA40 SLP is used. In the dataset labels, “Had” stands for HadSLP2, and “ERA” for ERA40 SLP; the domain of detection is denoted by two letters: GL for global, NH for the Northern Hemisphere, and NA for the North Atlantic; and “org” is used to denote that the original ERA40 wave data (seasonal mean SWH) is used as the proxy of SWH observations. An “R” just under the zero-β line denotes that the climate models under-estimate the internal variability in the trend of the particular variable in the corresponding observed dataset

Fig. 5

The same as in Fig. 4 but for the boreal fall (OND) season

Fig. 6

As in Fig. 4a but for the austral fall (AMJ) and austral winter (JAS) trends. North Atlantic (NA) results are not shown, but Southern Hemisphere (SH) results are given

4.1 Detection results for boreal winter trends

As shown in Fig. 4, in boreal winter, the scaling factor β is estimated to be significantly greater than zero (i.e., inconsistent with zero) for the NA domain in each case. This result is consistent across the different variables (G t , P t , H avg, and H 20y ) and across the different observationally based data sets (ERA40 or HadSLP2, original or hindcast wave data; see also Table 3). The detection results are generally also consistent across a range of EOF truncations (see the “#EOFs” column in Table 3), although scaling factors, confidence intervals, and the results of the residual consistency test are reported for fixed levels of EOF truncation that were chosen according to the domain and variable of interest. The chosen levels of EOF truncation (Table 3) explain 65–95% of the variance in \(\hat{C}_{\eta_1}.\)

For the NH domain, the detection results are consistent across both SWH variates for both observationally based data sets; but they are not consistent between G t and P t , nor between HadSLP2 and ERA40. For the NH and GL domains, external influence is detectable in the G t trends only when HadSLP2 is used, and in the P t trends only when ERA40 is used. As mentioned before, results of detection on SWH are a synthesis of the detection results on G t and P t . Thus, external influence on SWH could be detected if it is detected on G t or P t or both.

In general, the results of detection on the wave heights are physically consistent with those of detection on the geostrophic wind energy field and/or sea level pressure field. Increases in geostrophic wind energy and/or decreases in sea level pressure are associated with increases in ocean wave heights (both seasonal means and seasonal extremes). The detection results are also consistent with the relative magnitudes of the responses shown in the left panels of Figs. 1, 2, 3.

Nevertheless, the associated 5–95% uncertainty ranges on β do not include unity in most cases, especially when ERA40 SLP or waves are used as observations (see Table 3 and Fig. 4). In these cases, the scaling factor is estimated to be significantly greater than unity, which suggests that the climate models, or the combination of climate models and empirical downscaling models, significantly under-estimate the magnitude of the response of atmospheric storminess or ocean wave heights to the observed changes in external forcing (assuming the observed trends are not systematically over-estimated in the HadSLP2 or ERA40 data). Note that the estimated scaling factor is particularly large when the original ERA40 wave data are used (see the rightmost dashed bars in Fig. 4b). This is because the climate model simulated SWH is derived statistically using an observed SLP-SWH relationship that has previously been found to under-estimate the variability and trend magnitude of ocean wave heights (see Sect. 2.2). Under-estimation of the observed trend magnitude by climate models would bias the estimate of the scaling factor β high. However, the effect of such under-estimation is reduced when statistical SWH hindcasts are used as observations, because in such cases both the “observed” and the simulated SWH quantities are derived from the same statistical SLP-SWH relationship. Thus, the scaling factor is somewhat closer to unity when the ERA40 hindcast SWH quantities are used as observations (see Fig. 4b). In general, the estimated β values are still greater than unity in these cases, which arises from under-estimation of the observed trend magnitude of seasonal mean SLP anomalies P t and of geostrophic wind energy index G t by the climate models (especially the latter; see Fig. 4a).
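The way a 5–95% uncertainty range on β is read in the preceding paragraphs can be summarized in a short sketch. The function name `interpret_scaling` and the numerical ranges are hypothetical, chosen only to exercise each branch; they are not values from Table 3.

```python
def interpret_scaling(lower, upper):
    """Classify a 5-95% uncertainty range (lower, upper) on the scaling
    factor beta. Returns (detected, description)."""
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    detected = lower > 0.0  # range excludes zero on the positive side
    if not detected:
        description = "not detected"
    elif lower > 1.0:
        # beta significantly greater than unity: simulated change too small
        description = "simulated response under-estimated"
    elif upper < 1.0:
        # beta significantly less than unity: simulated change too large
        description = "simulated response over-estimated"
    else:
        description = "consistent with unity"
    return detected, description


# Hypothetical ranges, for illustration only.
examples = {
    "range includes zero": interpret_scaling(-0.4, 1.3),
    "range includes unity": interpret_scaling(0.3, 1.8),
    "range above unity": interpret_scaling(1.6, 4.2),
}
```

The third case corresponds to the situation described above, in which the whole uncertainty range lies above unity.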

The residual variance was found to be consistent with the corresponding simulated internal variability for all cases of detection for the NA domain when HadSLP2 is used (see Table 3 and Fig. 4a). In contrast, the climate models, or the combinations of climate models and statistical downscaling, were found to under-estimate the observed internal variability in almost all cases of detected influence (except that of P t in NA) when ERA40 is used (Table 3). Such under-estimation weakens the robustness of the detection results.
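The idea behind this consistency check can be sketched with a simple empirical comparison. This is a simplified stand-in under stated assumptions, not the formal residual consistency test used for Table 3: it assumes that the residual variance of the regression and a set of variances from independent segments of simulated internal variability have already been computed in the same truncated EOF space. The function name and all numbers are hypothetical.

```python
def variability_underestimated(residual_var, control_vars, level=0.95):
    """Return True if the regression residual variance exceeds at least a
    fraction `level` of the simulated internal-variability variances,
    i.e., if the models appear to under-estimate internal variability."""
    if not control_vars:
        raise ValueError("need at least one simulated variance")
    below = sum(1 for v in control_vars if v < residual_var)
    return below / len(control_vars) >= level


# Hypothetical simulated variances (arbitrary units), for illustration only.
simulated = [0.8, 0.9, 1.0, 1.1, 1.2] * 4

flag_large = variability_underestimated(2.5, simulated)    # residual far above all
flag_typical = variability_underestimated(1.0, simulated)  # residual within spread
```

A flag of `True` plays the role of the “R” marker in Fig. 4: the detection result may still hold, but the uncertainty range on β is likely too narrow.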

4.2 Detection results for the transition seasons

External influence is largely undetectable in the boreal fall season (OND). It was not detected for any of the four variates (G t , P t , H avg, and H 20y ) when HadSLP2 is used (see Table 3 and Fig. 5). When ERA40 is used, external influence was not detected in either G t or P t in any domain; but it was detected in both H avg and H 20y for the NH domain (and in the original ERA40 H avg trends over the NA domain only).

Note that the detection results in boreal fall season differ considerably between the original and hindcast ERA40 wave heights (see Table 3). This is because the observed SWH trend patterns are less well reproduced by the statistical SLP-SWH relationship in this season (see Fig. 2).

In the austral fall season (AMJ), external influence was detected in P t for the SH domain, regardless of whether HadSLP2 or ERA40 is used (see Table 3 and Fig. 6a). It was also detected in both G t and P t globally, but only when ERA40 is used. When HadSLP2 is used, external influence is not detectable in either G t or P t , globally or for the NH domain. Thus, in this season, external influence appears to be stronger in the SH than in the NH, and also stronger on P t than on G t . Only the P t signal over the Southern Hemisphere is detectable in both data sets.

4.3 Detection results for austral winter trends

In austral winter (JAS), as shown in Fig. 6b (see also Table 3), external influence on the observed G t trends was detected globally and for the SH domain, no matter whether HadSLP2 or ERA40 is used; but it was not detected for the NH domain. External influence on the observed P t trends is much weaker than on the observed G t trends in this season. It appears to be detectable only when ERA40 is used (and only for the GL domain; see Table 3 and Fig. 6b).

In summary, in this season, external influence is mainly detectable on the geostrophic wind energy field, and is detectable mainly in the southern hemisphere. External influence is not detectable in the northern hemisphere in austral winter.

5 Concluding remarks

This study examined whether external influence is detectable in atmospheric storminess and ocean wave heights, based on twentieth-century simulations from multiple climate models with combined natural and anthropogenic forcing and statistical downscaling of the corresponding changes in ocean wave heights for the periods 1955–2004 and 1958–2001. The observational data used in this study were obtained from ERA40 (SLP, geostrophic wind energy, and SWHs; Uppala et al. 2005; Caires et al. 2004a) and HadSLP2 (Allan and Ansell 2006).

It has been shown that the observed trend patterns in atmospheric storminess are reasonably well reproduced by the climate models, especially for the North Atlantic in boreal winter. Observed NH ocean wave height trends are also reasonably well reproduced when climate model output is statistically downscaled, although the magnitude of the trends is under-estimated.

In boreal winter, the observed 1955–2004 trend patterns in atmospheric storminess and ocean wave heights are characterized by an upward trend in the high latitudes (especially the northeast North Atlantic) and a downward trend in the mid-latitudes. These trend patterns were found to contain a detectable response to combined natural and anthropogenic external forcing.

In general, the results of our detection analysis suggest that, in the past half century, external forcing has had a detectable influence on trends in atmospheric circulation (including storminess) in the winter hemisphere (i.e., northern hemisphere in JFM and southern hemisphere in JAS), and on trends of NH ocean wave heights in boreal winter. The signal of external influence is weaker in the transition seasons, and is hardly detectable in the northern hemisphere in boreal summer. Climate models generally simulate smaller changes than observed and also appear to under-estimate the internal variability, which reduces the robustness of our detection results.

Analyses of triangles of long surface pressure records (e.g., Matulla et al. 2008; Alexandersson et al. 1998, 2000), which are generally located in the coastal region of northern Europe, suggest that there have been earlier, long term variations in atmospheric storminess that are comparable to changes seen at these same locations during the past half century. Whether these earlier changes were associated with a similar large scale pattern of change as detected here remains an open question. It would therefore be of great interest to carry out the detection analysis on storminess trends over a longer period, such as the entire past century.

HadSLP2 does provide data for the entire past century, although data coverage for the first half of the twentieth century is poorer than for the latter half. However, the number of available climate model simulations is insufficient to confidently estimate the internal variability on the century time scale.

Instead, we carried out the optimal detection analysis on the observed 1900–1949 G t and P t trend patterns (not shown), comparing the HadSLP2 G t and P t trend patterns with the corresponding multi-model mean trend patterns for this period. Data coverage, however, is limited during this period. In fact, only 9–11 60° × 20° gridboxes satisfy our data coverage criterion (see Sect. 3), with all but two of these gridboxes located in the NH (mostly between 10°N and 50°N, as well as one gridbox between 50°N and 70°N in the North Atlantic in JAS). Thus, the detection domain here is NH only. In boreal winter, the observed 1900–1949 G t trend pattern is characterized by decreases in the mid-latitudes (north of 40°N) of the North Atlantic.

The detection results for our 1900–1949 analysis are summarized in Table 4. In contrast to the results for the latter half of the twentieth century (Table 3), external influence on the observed G t and P t trends over the first half of the twentieth century is not detectable, regardless of season and the number of leading EOFs retained. This may be because limited data coverage and a weaker signal have combined to reduce the signal-to-noise ratio in our detection analysis. However, it also suggests that external forcing is less likely to have been an important factor in surface pressure and atmospheric storminess change during the first half of the twentieth century.

Table 4 As in Table 3 but for the results of detection analysis on the 1900–1949 trend patterns