1 Introduction

The ability of coupled atmosphere–ocean general circulation models (CGCMs) to represent coherent patterns, or modes, of interannual variability in the large-scale tropospheric circulation is an important consideration in the use of such models to understand how these modes might change in the future. Of particular importance is the extent to which these modes respond to, or reflect the trend in, projected and observed changes in radiative forcing due to changing greenhouse gas concentrations. A key requirement is to separate the ‘signal’, in this case the response to radiative forcing, from the ‘noise’, i.e. internal variability. Aspects of this have been examined in a number of recent studies. The mean projected change has been examined in large ensembles from single models (e.g. Kirtman et al. 2011; Deser et al. 2012; McSweeney et al. 2012) or from multi-model ensembles (e.g. Meehl et al. 2007b; and references therein). Different radiative forcings have been applied to an atmosphere general circulation model (AGCM) (e.g. Deser and Phillips 2009) or CGCM (e.g. Arblaster and Meehl 2006). Time series of the response to radiative forcing can be projected onto current modes of variability (e.g. Brandefelt and Källén 2004; Simpkins and Karpechko 2012), while Branstator and Selten (2009) estimated modes of variability using a large CGCM ensemble. The response to radiative forcings has also been considered using theoretical forcing functions (e.g. Majda et al. 2010). The concept of partitioning the seasonal mean into externally forced and internally generated components (e.g. Zwiers 1996; Rowell 1998) has been used to investigate projected changes in interannual variability of surface temperature and rainfall (e.g. Boer 2009; Hawkins and Sutton 2011).

Here, for the first time, the projected changes in all leading modes of variability are directly estimated in a multi-model CGCM ensemble. This is based on the premise that the seasonal mean of a climate variable, e.g. geopotential height, can be considered as a random variable, with a seasonal ‘population’ mean and departures from that mean (Zheng and Frederiksen 2004; Frederiksen and Zheng 2007b; and references therein). Zheng and Frederiksen (2004) referred to these two components as the slow and intraseasonal components of the seasonal mean. Madden (1976) used daily climate data in the frequency domain to estimate the interannual variance of the slow and intraseasonal components. Zheng et al. (2000, 2004) formulated methods using monthly climate data in the time domain. Zheng and Frederiksen (2004) extended the latter method to estimate the covariance matrix of each component. This allows empirical orthogonal functions (EOFs) to be calculated, and these represent the modes of interannual variability in each component. Frederiksen and Zheng (2007b) showed the equivalence of modes estimated using the frequency or time domain methods.

Using an ensemble of model realisations, Zheng and Frederiksen (1999) and Zheng et al. (2004) included the partitioning used by Zwiers (1996) and Rowell (1998). This allows the estimation of the interannual variance in three components of the seasonal mean. They are: (1) an intraseasonal component related to internal dynamics on intraseasonal time scales, (2) a slow-internal component related to internal dynamics on slowly varying (interannual or longer) time scales, and (3) a slow-external component related to external forcings. Zheng et al. (2009) extended the method of Zheng et al. (2004) to estimate the modes of interannual variability in each component using EOF analysis. Zheng et al. (2009) and Grainger et al. (2011b; hereinafter G11) used this method to examine the modes of interannual variability of atmospheric circulation in AGCMs. The prescription of sea surface temperature (SST) meant that the slow-external component in those studies was related to variations in both SST and radiative forcing. Here, an ensemble of CGCM realisations is examined, meaning that the slow-external component will only be related to variations in radiative forcing.

Frederiksen and Zheng (2007a; hereinafter FZ07) estimated the dominant modes of interannual variability in the Southern Hemisphere (SH) 500 hPa geopotential height, commonly used to represent the atmospheric circulation, for December–January–February (DJF) and June–July–August (JJA) using the National Centers for Environmental Prediction (NCEP) reanalysis dataset (Kalnay et al. 1996). Both the intraseasonal and slow components were considered, as the intraseasonal component explains a significant fraction of the extratropical variability of the atmospheric circulation (Zheng et al. 2004). It is well understood that statistical modes of variability are not the same as dynamical or physical modes (e.g. Frederiksen and Frederiksen 1993a; Monahan et al. 2009; and references therein). However, FZ07 found that their modes of variability had spatial structures similar to many dynamical modes.

The leading mode of the intraseasonal component has a zonally symmetric annular structure resembling the Southern Annular Mode (SAM). Frederiksen and Frederiksen (1993b) also found annular structures on intraseasonal time scales that they attributed to barotropic instability. SAM has also been shown to be generated through wave-interactions with the zonal flow (e.g. Limpasuvan and Hartmann 2000). Secondary modes have mid-latitude wave-4 (DJF) or wave-3 (JJA) patterns, similar to those seen in other reanalysis studies (e.g. Kidson 1999). Such structures may be generated by internal instability of the atmospheric flow (e.g. Frederiksen and Frederiksen 1993b), or as meridional wave trains associated with the Madden Julian Oscillation (e.g. Frederiksen and Frederiksen 1993a). The regions of maximum loading are also associated with regions of persistent blocking events (e.g. Sinclair 1996), particularly in the South Pacific.

FZ07 found that their leading mode of the slow component also had a SAM-like pattern, but with a pronounced protrusion into the South Pacific that has also been seen in reanalysis studies (e.g. Kidson 1999; Fogt et al. 2011). L’Heureux and Thompson (2006) found that the El Niño-Southern Oscillation (ENSO) influences the wave activity related to SAM, while Fogt et al. (2011) attributed the zonally asymmetric features of SAM to these interactions with ENSO. The next two modes were found to be related to ENSO variability. They have spatial structures similar to the Pacific-South American (PSA) modes found in other reanalysis studies (e.g. Mo and Higgins 1998; Mo 2000), with a wave propagating from the tropics to high latitudes. The fourth mode is similar to the South Pacific Wave (e.g. Kidson 1999). Similar fast-growing stationary modes were found by Frederiksen and Frederiksen (1996) that they related to the north–south gradient in Indian Ocean SST.

The aim of this paper is twofold. First, the modes of interannual variability in the SH atmospheric circulation in CGCMs from the Coupled Model Intercomparison Project phase 3 (CMIP3) dataset (Meehl et al. 2007a) are assessed for the second half of the twentieth century. CMIP3 data is used so that the assessment can be compared with the large body of literature (e.g. Randall et al. 2007; and references therein) on the representation of climate processes in CMIP3 models. The modes of variability in the intraseasonal and slow components are compared against those estimated using reanalysis data from the Twentieth Century Reanalysis (20CR) project (Compo et al. 2011). Then, CMIP3 models that represent relatively well the 20CR modes are used to examine the projected changes for the second half of the twenty-first century. Projected changes in the intraseasonal, slow-internal and slow-external components are all examined.

The outline of this paper is as follows. The reanalysis and CMIP3 model data used are described in Sect. 2. Section 3 describes the methods used to estimate the modes of variability, and to assess the model modes against reanalysis dataset modes. An assessment of the CMIP3 models against reanalysis data for the second half of the twentieth century is given in Sect. 4. Section 5 gives the projected changes for the second half of the twenty-first century. Conclusions are given in Sect. 6.

2 Data

As in FZ07 and G11, the SH atmospheric circulation is represented by monthly mean 500 hPa geopotential height in summer (DJF) and winter (JJA). For the modes of variability in the slow, slow-internal and slow-external components we are also interested in their relationship with global SST.

In order to compare the CMIP3 models with reanalysis data (see Sect. 3.2), all 500 hPa geopotential height data are mapped onto the same 2.5° × 2.5° longitude/latitude grid, then sub-sampled to 5° × 5°. As in FZ07 and G11, the sub-sampled grid is thinned towards the South Pole so that the data are approximately weighted by area. All SST data are mapped onto the same 2° × 2° latitude/longitude grid. Before analysis (see Sect. 3.1), the annual cycle is removed by subtracting the climatological monthly mean.
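As an illustration of the final preprocessing step, the following minimal Python sketch removes the annual cycle from a monthly series. The array layout (years × calendar months × grid points), the function name `remove_annual_cycle` and the synthetic data are assumptions made for this example only; they are not taken from the processing used in this study.

```python
import numpy as np

def remove_annual_cycle(monthly):
    """Subtract the climatological monthly mean from a monthly series.

    monthly: array of shape (n_years, 12, n_points). This layout is a
    hypothetical choice for the sketch, not the study's data format.
    """
    monthly = np.asarray(monthly, dtype=float)
    # Climatology: mean over years for each calendar month and grid point.
    climatology = monthly.mean(axis=0, keepdims=True)
    return monthly - climatology

# Synthetic example: a fixed seasonal cycle plus noise at 4 grid points.
rng = np.random.default_rng(0)
n_years, n_points = 50, 4
cycle = 100.0 * np.cos(2 * np.pi * np.arange(12) / 12)[None, :, None]
data = 5500.0 + cycle + rng.normal(scale=20.0, size=(n_years, 12, n_points))
anomalies = remove_annual_cycle(data)
```

By construction the anomalies average to zero over the years for every calendar month. Selecting the DJF and JJA months, and handling the fact that DJF spans two calendar years, is not covered by this sketch.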

2.1 Reanalysis data

The reanalysis monthly mean SH 500 hPa geopotential height used is from the 20CR dataset for the period 1951–2000. The 20CR project uses a recent AGCM and data assimilation system to generate an ensemble of forecasts of the atmospheric circulation using only surface pressure observations and monthly SST and sea ice; full details are in Compo et al. (2011). Although satellite data are not used, the quality of the 20CR is at least comparable to other reanalysis products (e.g. Compo et al. 2011; Stachnik and Schumacher 2011). The SST data used are from the HadISST dataset (Rayner et al. 2003).

2.2 CMIP3 data

Monthly mean 500 hPa geopotential height from the CMIP3 dataset has been obtained for the last 50 years of the 20c3m experiment, and for the second half of the twenty-first century in the Special Report on Emission Scenarios (SRES) B1, A1B and A2 experiments (Meehl et al. 2007a). The models to be assessed are summarised in Table 1. In total, there are 70 20c3m realisations from 23 models. Surface skin temperature over oceans is used to represent model SST.

Table 1 CMIP3 models and the number of 20c3m realisations used in this study

3 Methodology

3.1 Modes of variability

The principles and assumptions used by the methodology have been detailed in previous papers (e.g. Frederiksen and Zheng 2007b; and references therein). Here a brief summary is provided. Consider a climate variable, x (e.g. geopotential height), from which the annual cycle has been removed. A time series of x, in this case monthly mean anomalies, is considered to be represented as

$$ x_{sym} (i) = \beta_{y} (i) + \delta_{sy} (i) + \varepsilon_{sym} (i) , $$
(1)

where y = (1, …, Y) is the year index in a sample of Y years, m = (1, 2, 3) is the month index within a season, s = (1, …, S) is the realisation index in an ensemble of size S and i = (1, …, I) is the index of I geographical locations (e.g. grid points). \( \beta_{y} \) is the slow-external component, \( \delta_{sy} \) is the slow-internal component and \( \varepsilon_{sym} \) is the residual monthly departure of \( x_{sym} \) from the slow-external and slow-internal components. The seasonal mean can then be written as

$$ x_{syo} (i) = \beta_{y} (i) + \delta_{sy} (i) + \varepsilon_{syo} (i), $$
(2)

where the subscript o denotes the average over an index (s, y or m). \( \varepsilon_{syo} \) is associated with variability within the season and has been called the intraseasonal component of the seasonal mean (Zheng and Frederiksen 2004).
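To make the decomposition concrete, the following minimal sketch builds synthetic data with the structure of Eq. (1) and forms the seasonal mean of Eq. (2). The sizes and the use of independent Gaussian noise for each component are assumptions for illustration only; real climate components are not independent white noise.

```python
import numpy as np

rng = np.random.default_rng(1)
S, Y, M, I = 3, 50, 3, 5  # realisations, years, months per season, grid points

# Slow-external component: depends on year and location, shared by all realisations.
beta = rng.normal(size=(1, Y, 1, I))
# Slow-internal component: differs between realisations, constant within a season.
delta = rng.normal(size=(S, Y, 1, I))
# Residual monthly departure: varies from month to month.
eps = rng.normal(size=(S, Y, M, I))

# Eq. (1): monthly anomalies, shape (S, Y, M, I) after broadcasting.
x = beta + delta + eps

# Eq. (2): averaging over the month index gives the seasonal mean.
x_syo = x.mean(axis=2)
eps_syo = eps.mean(axis=2)  # intraseasonal component of the seasonal mean
```

Only the residual term changes under the monthly average, because the two slow components are constant within a season.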

Given monthly mean anomalies, interannual covariance matrices for the components of the seasonal mean can be estimated for reanalysis data or a single model realisation (Zheng and Frederiksen 2004) and for model ensembles (Zheng et al. 2009). More details are given in the “Appendix”. EOF analysis is used to estimate the modes of interannual variability for each component. The terms used in this paper, and their meaning, are summarised in Table 2.
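The EOF step can be sketched as an eigen-decomposition of a covariance matrix. The example below is a generic illustration, not the estimation procedure of the “Appendix”: the function name `eofs_from_covariance` is hypothetical, and the covariance matrix is an ordinary sample covariance of synthetic data rather than an estimated component covariance.

```python
import numpy as np

def eofs_from_covariance(cov, n_modes):
    """Return the leading EOFs, their variances and percentage variance explained.

    cov: symmetric (n_points, n_points) covariance matrix.
    """
    cov = np.asarray(cov, dtype=float)
    cov = 0.5 * (cov + cov.T)               # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]
    variances = eigvals[order]              # eigenvalues = EOF variances
    patterns = eigvecs[:, order].T          # rows are unit-length EOFs
    explained = 100.0 * variances / np.trace(cov)
    return patterns, variances, explained

# Synthetic example: 50 'years' at 6 grid points.
rng = np.random.default_rng(2)
sample = rng.normal(size=(50, 6))
cov = np.cov(sample, rowvar=False)
patterns, variances, explained = eofs_from_covariance(cov, n_modes=3)
```

The square root of each returned variance corresponds to the estimated standard deviation of an EOF as used later in Sect. 3.2.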

Table 2 Definitions of terms used in this paper to describe the modes of interannual variability in the components of the seasonal mean

For the S-EOFs, as defined in Table 2, the relationship with SST is considered using the covariance between the slow components in both the associated time series of the S-EOF and the SST time series. This is calculated at each SST grid point using the methodology of Grainger et al. (2011a), and is described here as the slow SST-height covariance. Analogous definitions are used for the slow-internal and slow-external SST-height covariances.

3.2 Model assessment

As detailed in the “Appendix”, it is not possible to use reanalysis data to estimate the covariance matrices for the slow-external and slow-internal components. Therefore only the CMIP3 20c3m I- and S-EOFs, as defined in Table 2, are assessed against the 20CR EOFs. Since all 500 hPa geopotential height and SST data have been mapped to common grids, the variances of the EOFs (i.e. the eigenvalues) are directly comparable and pattern correlations of the spatial structures can be calculated (Grainger et al. 2008). Based on the principles of Taylor (2001), scores for how well a model I- or S-EOF reproduces a 20CR EOF are respectively defined as

$$ M_{\varepsilon } = \frac{2\left| R \right|}{{\left( {\frac{{\hat{V}_{\varepsilon }^{'} }}{{\hat{V}_{\varepsilon } }} + \frac{{\hat{V}_{\varepsilon } }}{{\hat{V}_{\varepsilon }^{'} }}} \right)}} $$
(3)

and

$$ M_{\mu } = \frac{{\left| R \right|\left( {1 + R_{SST} } \right)^{2} }}{{2\left( {\frac{{\hat{V}_{\mu }^{'} }}{{\hat{V}_{\mu } }} + \frac{{\hat{V}_{\mu } }}{{\hat{V}_{\mu }^{'} }}} \right)}}, $$
(4)

where \( \hat{V}_{\varepsilon } \) and \( \hat{V}_{\mu } \) are the estimated variances of the 20CR I- and S-EOFs, \( \hat{V}_{\varepsilon }^{'} \) and \( \hat{V}_{\mu }^{'} \) are the estimated variances of the model I- and S-EOFs, R is the pattern correlation between the model and 20CR EOFs (the absolute value is used since the sign of an EOF is arbitrary) and \( R_{SST} \) is the pattern correlation, over a specified region, between the model and 20CR slow SST-height covariances.
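Equations (3) and (4) can be written as short functions. The sketch below is illustrative: the function names are hypothetical, and the centred, unweighted form of the pattern correlation is an assumption, since the exact form used (Grainger et al. 2008) is not restated here.

```python
import numpy as np

def pattern_correlation(a, b):
    """Centred, unweighted pattern correlation between two fields on one grid.

    Assumption for this sketch: no explicit area weights are applied.
    """
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def score_intraseasonal(r, v_model, v_ref):
    """Eq. (3): score for a model I-EOF against a reference I-EOF."""
    ratio = v_model / v_ref
    return 2.0 * abs(r) / (ratio + 1.0 / ratio)

def score_slow(r, r_sst, v_model, v_ref):
    """Eq. (4): score for a model S-EOF, including the SST-height term."""
    ratio = v_model / v_ref
    return abs(r) * (1.0 + r_sst) ** 2 / (2.0 * (ratio + 1.0 / ratio))
```

Both scores equal 1 for a perfect match (|R| = 1, equal variances and, for Eq. (4), \( R_{SST} = 1 \)). Equation (4) falls to 0 when \( R_{SST} = -1 \), whatever the quality of the EOF pattern.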

For a set of model I-EOFs and S-EOFs, estimated from either a single realisation or an ensemble, the following procedure is used to find a set of 1–1 ‘best matches’ to the 20CR EOFs:

  1. For the leading N 20CR I- or S-EOFs, find the permutation of N model modes that maximises the structure component of the score. That is, the permutation which maximises the sum over N EOFs of the numerator in Eqs. (3) or (4). This is the initial set of model ‘best match’ EOFs.

  2. For each 20CR I- or S-EOF, check for any higher order, i.e. >N, model EOF that has a higher score according to Eqs. (3) or (4). Change the model ‘best match’ to the higher order EOF, and flag for further checking.

  3. For each 20CR I- or S-EOF, also check whether any model EOF has a score that is at least 80 % of the current ‘best match’, and flag these for further checking.

  4. For the model ‘best matches’ flagged above, subjectively inspect the EOFs to obtain the final set of ‘best matches’. In practice changes were made to 10–15 % of the original, objective, ‘best matches’.
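The objective part of this procedure (steps 1 to 3) can be sketched as follows. This is one possible reading of the steps, not the code used in the study: the function names, the input layout (one row per 20CR EOF, one column per model EOF) and the example values are hypothetical. The subjective inspection of step 4 is not represented, and after step 2 the matches need no longer be one-to-one, which is left for that inspection.

```python
from itertools import permutations
import numpy as np

def initial_best_match(numerator):
    """Step 1: numerator[k, j] is the numerator of Eq. (3) or (4) for
    reference EOF k and model EOF j, for the leading N modes of each."""
    numerator = np.asarray(numerator, dtype=float)
    n = numerator.shape[0]
    best = max(permutations(range(n)),
               key=lambda p: sum(numerator[k, p[k]] for k in range(n)))
    return list(best)

def flag_for_inspection(score, match, fraction=0.8):
    """Steps 2 and 3: score[k, j] is the full score for reference EOF k and
    every available model EOF j, including higher-order ones (j >= N)."""
    score = np.asarray(score, dtype=float)
    match = list(match)
    n = len(match)
    flagged = set()
    for k in range(n):
        # Step 2: switch to a higher-order model EOF if it scores higher.
        higher = np.arange(n, score.shape[1])
        if higher.size:
            j = higher[np.argmax(score[k, higher])]
            if score[k, j] > score[k, match[k]]:
                match[k] = int(j)
                flagged.add(k)
        # Step 3: flag if any other model EOF reaches 80 % of the best score.
        threshold = fraction * score[k, match[k]]
        if any(score[k, j] >= threshold
               for j in range(score.shape[1]) if j != match[k]):
            flagged.add(k)
    return match, sorted(flagged)

# Hypothetical values for N = 3 reference EOFs and 4 model EOFs.
numerator = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.8], [0.2, 0.7, 0.5]]
score = [[0.85, 0.10, 0.05, 0.20],
         [0.20, 0.30, 0.60, 0.70],
         [0.10, 0.50, 0.30, 0.10]]
first_guess = initial_best_match(numerator)
final_match, to_inspect = flag_for_inspection(score, first_guess)
```

An exhaustive search over permutations is adequate for the three or four leading modes considered here, since it involves at most 24 candidates.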

It is convenient to consider the estimated standard deviation of an EOF, which by definition here is the square root of the eigenvalue. It is also useful to define the model estimated standard deviation relative to a 20CR EOF by

$$ \sigma^{*} = \frac{{\sqrt {\hat{V}^{'} } }}{{\sqrt {\hat{V}} }}, $$
(5)

where \( \hat{V} \) is the estimated variance of the 20CR EOF (i.e. \( \hat{V}_{\varepsilon } \) or \( \hat{V}_{\mu } \)) and \( \hat{V}^{'} \) is the estimated variance (\( \hat{V}_{\varepsilon }^{'} \) or \( \hat{V}_{\mu }^{'} \)) of the model EOF.
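Because \( \hat{V}^{'} /\hat{V} = (\sigma^{*} )^{2} \), the scores in Eqs. (3) and (4) can be rewritten in terms of σ*. The following short derivation is added for illustration and follows directly from Eqs. (3)–(5):

$$ M_{\varepsilon } = \frac{2\left| R \right|}{(\sigma^{*} )^{2} + (\sigma^{*} )^{ - 2} } \le \left| R \right|,\qquad M_{\mu } = \frac{\left| R \right|\left( 1 + R_{SST} \right)^{2} }{2\left[ (\sigma^{*} )^{2} + (\sigma^{*} )^{ - 2} \right]} \le \frac{\left| R \right|\left( 1 + R_{SST} \right)^{2} }{4}, $$

since \( (\sigma^{*} )^{2} + (\sigma^{*} )^{ - 2} \ge 2 \), with equality only when σ* = 1. As a worked example, σ* = 0.85 multiplies the structure term of \( M_{\varepsilon } \) by 2/(0.7225 + 1.3841) ≈ 0.95, whereas σ* = 0.6 multiplies it by 2/(0.36 + 2.7778) ≈ 0.64. The penalty is the same for σ* and 1/σ*, so over- and under-estimating the standard deviation by the same factor are penalised equally.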

4 Assessment

4.1 20CR modes of variability

The leading four I-EOFs of the 20CR SH 500 hPa geopotential height for DJF and JJA 1951–2000 are shown in Fig. 1, and Fig. 2 shows the leading three S-EOFs for the same period and their slow SST-height covariance with HadISST SST. The spatial structure and physical interpretation of these modes are discussed in detail in FZ07 and summarised in the Introduction, and need not be repeated here. However, it is worth pointing out that the pronounced loading in the South Pacific of the 20CR JJA S-EOF1 (Fig. 2c) is consistent with the concept that there is zonal asymmetry as a result of SAM-ENSO interactions (Fogt et al. 2011). In contrast, I-EOF1 in both seasons is zonally symmetric (Fig. 1a, b), reminiscent of annular structures due to barotropic instability (Frederiksen and Frederiksen 1993b).

Fig. 1 Leading four I-EOFs of 20CR SH 500 hPa geopotential height for (a) DJF and (b) JJA 1951–2000. EOFs are normalised to unit length. The estimated standard deviation (m) and variance explained (%) are given to the right of each EOF

Fig. 2 (a) Leading three S-EOFs of 20CR SH 500 hPa geopotential height for DJF 1951–2000. (b) Slow SST-height covariance with HadISST SST for the S-EOFs in (a). (c, d) As in (a, b) but for JJA. EOFs are normalised to unit length. The estimated standard deviation (m) and variance explained (%) are given to the right of each EOF

Qualitatively the 20CR modes are similar to those estimated using NCEP reanalysis data from the periods 1949–2002 (FZ07) or 1951–2000 (G11). This is summarised in Table 3, which shows the estimated standard deviation and percentage variance explained in the two datasets for the period 1951–2000, and the EOF pattern correlations between the datasets. However, there are some features that have implications for model assessment.

Table 3 Estimated standard deviation (m) and variance explained (%) for the leading four I- and S-EOFs from the 20CR and NCEP reanalysis DJF and JJA SH 500 hPa geopotential height for the period 1951–2000

In both reanalysis datasets, the variance explained by I-EOF3 and -EOF4 is similar, suggesting that they are degenerate. Their pattern correlations between the datasets are generally lower than those for I-EOF1 and -EOF2 (Table 3). This degeneracy also suggests that I-EOF3 and -EOF4 will be subject to sampling error and may be less well reproduced in the CMIP3 models.
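One common heuristic for diagnosing degenerate EOFs is the rule of thumb of North et al. (1982), in which the sampling error of an eigenvalue λ estimated from n independent samples is approximately λ√(2/n). The sketch below applies that heuristic. It is not stated to be the criterion used in this study; the function name and eigenvalues are hypothetical, and treating the 50 seasons as independent samples is an assumption.

```python
import numpy as np

def degenerate_pairs(eigenvalues, n_samples):
    """Return 1-based (mode, next mode) pairs whose eigenvalue spacing is
    smaller than the approximate sampling error of the larger eigenvalue."""
    lam = np.asarray(eigenvalues, dtype=float)
    error = lam * np.sqrt(2.0 / n_samples)  # approximate sampling error
    gaps = lam[:-1] - lam[1:]               # spacing between neighbours
    return [(k + 1, k + 2) for k in range(len(gaps)) if gaps[k] < error[k]]

# Hypothetical eigenvalues in descending order, with 50 samples.
pairs = degenerate_pairs([10.0, 6.0, 3.0, 2.8], n_samples=50)
```

With these values only the third and fourth modes are flagged, mirroring the situation described for I-EOF3 and -EOF4.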

In contrast, the leading S-EOFs are not degenerate, and so this should not be a source of error when assessing models. Previous studies (e.g. Mo and Higgins 1998; Mo 2000) found that the leading two PSA modes were degenerate when the total seasonal mean covariance (T-EOFs; see Table 2) was considered. However, FZ07 showed that the NCEP reanalysis T-EOFs are very highly correlated with the S-EOFs, and moderately correlated with the I-EOFs. When the covariance of the intraseasonal component is removed, the resulting S-EOFs were better related to slowly varying processes than the T-EOFs. Here, for example, the 20CR JJA T-EOF2 and -EOF3 (not shown) are degenerate. But JJA T-EOF3 is well correlated (0.604) with I-EOF2, and when I-EOF2 is in effect removed from the covariance matrix, JJA S-EOF3 is much better separated from S-EOF2.

There is a much clearer separation between the variance explained by S-EOF3 and -EOF4 in the 20CR than in the NCEP reanalysis, particularly in JJA (Table 3). The leading three 20CR S-EOFs explain 74.5 % and 71.1 % of the covariance of the slow component in DJF and JJA respectively. Consequently, in this study the CMIP3 models will only be assessed against these leading three modes.

The NCEP reanalysis S-EOFs have higher estimated standard deviations than the 20CR, particularly for S-EOF1. Bromwich and Fogt (2004) found that the NCEP reanalysis has a bias and artificial linear trend at SH high latitudes. It is possible that this may result in higher interannual variability. The choice of reanalysis dataset may affect which models have the ‘better’ estimated standard deviation. But models that fail to reproduce the spatial structure of the S-EOFs, or fail to correctly estimate the standard deviation, will have lower scores in Eq. (4).

4.2 Intraseasonal component modes

The EOF pattern correlation, |R|, and relative standard deviation, σ*, in the CMIP3 models for the leading four 20CR I-EOFs are shown in Figs. 3 and 4 for DJF and JJA respectively. For models with multiple realisations (Table 1), I-EOFs estimated using both individual realisations and the model ensemble are assessed.

Fig. 3 For the CMIP3 models, (a–d) the EOF pattern correlation, |R|, and (e–h) the relative standard deviation, σ*, with respect to the leading four 20CR DJF I-EOFs. For each model, the ensemble estimate (filled square) and range over realisations (whiskers) are plotted. The median over the ensemble estimates is shown by the dashed line. The number of realisations for each model is given above (a) and (e)

Fig. 4 As in Fig. 3, but for the leading four 20CR JJA I-EOFs

In DJF, the models best reproduce the spatial structure of the 20CR I-EOF1, with a median value of about 0.9 for |R| in the ensemble estimates (filled square in Fig. 3a). However, σ* is generally too low, with a median value of about 0.85 (Fig. 3e). The other modes all have lower values of \( \left| R \right| \) than for the leading mode (Fig. 3b–d) and their values of σ* are generally slightly <1.0 (Fig. 3f–h). These modes are generally present, but not necessarily in the same order as the 20CR, suggesting that they are degenerate in the CMIP3 models.

In JJA, the models also best reproduce the spatial structure of the 20CR I-EOF1 (Fig. 4a). The second mode is well separated in percentage variance explained from the other modes in the CMIP3 models. The median value of |R| of about 0.8 with respect to the 20CR I-EOF2 (Fig. 4b), is much higher than the other mid-latitude wave modes in either season. σ* is typically slightly > 1.0 in the CMIP3 models for all modes (Fig. 4e–h), with highest values occurring relative to 20CR I-EOF4 (Fig. 4h).

For individual models, the ensemble estimate generally reproduces the 20CR I-EOFs better than most realisations. The value of |R| in the ensemble estimates is typically at the upper end of the range over individual realisations. The value of σ* is typically about that of the mean over individual realisations. Overall, the intraseasonal modes are generally well reproduced in the CMIP3 models.

4.3 Slow component modes

For the CMIP3 models the pattern correlations, |R| for the S-EOFs and \( R_{SST} \) for the SST-height covariances, and the relative standard deviation, σ*, are calculated with respect to the leading three 20CR S-EOFs. \( R_{SST} \) for each mode is calculated over regions of high slow SST-height covariance seen in Fig. 2b, d. The chosen region for each mode is somewhat arbitrary. However, the regions are consistent with those selected by Zheng and Frederiksen (2007) as optimal SST-based predictors for statistical seasonal forecasting of SH 500 hPa geopotential height.

4.3.1 First mode

The ability of the CMIP3 models to reproduce the 20CR S-EOF1 is summarised in Fig. 5. R SST is calculated over the region 60°S–30°S. In DJF, the values of |R| and R SST (Fig. 5a, b) are high across most models. The model mode has an annular structure, and the median ensemble value of |R| is about 0.8 (Fig. 5a). Highest values of |R| are found in GISS-EH, GISS-ER and UKMO-HadGEM1, while CSIRO-Mk3.5 stands out as having the lowest pattern correlations. The majority of models have values of σ* that are > 1.0 (Fig. 5c). However, there are two models, CSIRO-Mk3.5 and INGV-SXG, that have particularly low values, and reproduce the 20CR S-EOF1 relatively poorly. Consistent with this, Sen Gupta et al. (2009) found that CSIRO-Mk3.5 and INGV-SXG have a poleward bias in the latitude of maximum zonal wind stress with respect to other CMIP3 models.

Fig. 5 For the CMIP3 models, (a) the EOF pattern correlation, |R|, (b) the pattern correlation between the slow SST-height covariances, \( R_{SST} \), calculated over the region 60°S–30°S, and (c) the relative standard deviation, σ*, with respect to the 20CR DJF S-EOF1. (d–f) As in (a–c) but for JJA. For each model, the ensemble estimate (filled square) and range over realisations (whiskers) are plotted. The median over the ensemble estimates is shown by the dashed line. The number of realisations for each model is given above (a) and (d)

In JJA, the median ensemble value of |R| is about 0.55 (Fig. 5d). The model mode usually has an annular structure, and does not reproduce the zonal asymmetry of the 20CR S-EOF1 (Fig. 2c). All models except MIROC3.2(medres) have low values of \( R_{SST} \) (Fig. 5e). There is a wide range of values of σ* (Fig. 5f), particularly across individual realisations, although the median is close to 1.0. There do not appear to be any systematic differences between CMIP3 models in their reproduction of the 20CR JJA S-EOF1.

For each model, the ensemble estimate generally lies within the range of the individual realisations. The values of |R| and R SST are typically at the upper end of the range. However, the value of σ* is often towards the lower end of the range. Similar behaviour for all three diagnostics is also seen for the second and third modes (see Figs. 6, 7).

Fig. 6 As in Fig. 5, but for the 20CR S-EOF2 for DJF and JJA. In (b) and (e), \( R_{SST} \) is calculated over the region 60°S–20°N

Fig. 7 As in Fig. 5, but for the 20CR S-EOF3 for DJF and JJA. In (b) and (e), \( R_{SST} \) is calculated over the region 30°S–20°N, 90°E–70°W

4.3.2 Second mode

Figure 6 summarises the ability of the CMIP3 models to reproduce the 20CR S-EOF2. \( R_{SST} \) is calculated over the region 60°S–20°N. Similar to the leading mode, in DJF the value of |R| is high in most models, with a median value of about 0.7 (Fig. 6a). However, there are more models with lower values of \( R_{SST} \) (Fig. 6b) and σ* (Fig. 6c). Models that reproduce the EOF spatial structure relatively well are more likely to have above-median values of \( R_{SST} \). The median ensemble value of σ* is just <1.0 (Fig. 6c). However, the values of σ* in FGOALS-g1.0 are about 2.0, and about 0.6 in BCCR-BCM2.0 and the three GISS models.

In JJA, the median ensemble value for |R| of about 0.4 (Fig. 6d) is much lower than in DJF. The values of R SST are generally only slightly lower (Fig. 6e), with a median ensemble value of about 0.6. Except for FGOALS-g1.0, σ* is generally lower in JJA (Fig. 6f) than in DJF.

A number of models (e.g. ECHAM5/MPI-OM, both GFDL models and both CSIRO models) have above-median pattern correlations in both seasons, and have values of σ* close to 1.0. Most of these models are also considered to have ‘realistic’ oceanic ENSO variability in other studies (e.g. van Oldenborgh et al. 2005; Guilyardi 2006). In contrast, GISS-AOM and GISS-ER, which under-perform here, have been identified as having no oceanic ENSO variability (van Oldenborgh et al. 2005). Guilyardi (2006) identified GISS-EH as having much lower ENSO-related atmosphere–ocean coupling strength than in other CMIP3 models or observations, and this model also under-performs here. FGOALS-g1.0 was identified by van Oldenborgh et al. (2005) as having large ENSO amplitude and a sharp spectral peak centred on 3 years. Here, the FGOALS-g1.0 S-EOF1 is clearly identified as the ‘best match’ to 20CR S-EOF2 in all cases.

4.3.3 Third mode

The ability of the CMIP3 models to reproduce the 20CR S-EOF3 is summarised in Fig. 7. R SST is calculated over the region bounded by 30°S–20°N and 90°E–70°W. Most models have a mode analogous to the 20CR S-EOF3. In contrast to the two leading modes, this mode is often represented by a higher order S-EOF. Consequently, the standard deviation of the mode is under-estimated, with a median value of σ* of about 0.6 in both seasons (Fig. 7c, f). Values of |R| are also lower than for the two leading modes, with medians of about 0.4 in both seasons (Fig. 7a, d). Values of R SST are similar to the two leading modes, with medians of about 0.6 in DJF (Fig. 7b) and 0.7 in JJA (Fig. 7e).

Overall, the CMIP3 models reproduce the 20CR S-EOF3 less well than they do the two leading modes. However, it should be noted that models that reproduce the second mode relatively well are more likely to also do so with the third. As mentioned in Sect. 4.3.2, these models are typically those considered to have ‘realistic’ oceanic ENSO variability.

4.4 Model overall score

It is useful to quantify how well models reproduce overall the modes of variability in the twentieth century, and to identify differences between models. Here, we have found that the CMIP3 models generally reproduce well the modes of variability of the intraseasonal component of SH 500 hPa geopotential height. So our focus will be on the modes of variability in the slow component. For this purpose, we define an overall score for each model in each season by

$$ {\text{Overall Score}} \equiv \frac{1}{3}\sum\limits_{n = 1}^{3} {(M_{\mu } )_{n} } , $$
(6)

where \( (M_{\mu } )_{n} \) is the score estimated from Eq. (4) for the model ensemble S-EOF corresponding to the nth 20CR S-EOF. This is shown in Fig. 8 for the CMIP3 models.
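Equation (6) is the arithmetic mean of three mode scores. The sketch below illustrates it with hypothetical score values for three unnamed models (the labels `model_A` to `model_C` are placeholders, not CMIP3 results) and shows how a median over models, as used for comparison in Fig. 8, could be formed.

```python
import numpy as np

def overall_score(mode_scores):
    """Eq. (6): mean of the Eq. (4) scores for the three leading S-EOFs."""
    scores = np.asarray(mode_scores, dtype=float)
    if scores.shape != (3,):
        raise ValueError("expected one score for each of the three modes")
    return float(scores.mean())

# Hypothetical Eq. (4) scores for S-EOF1 to S-EOF3 in one season.
example_scores = {
    "model_A": [0.9, 0.6, 0.3],
    "model_B": [0.5, 0.4, 0.3],
    "model_C": [0.7, 0.2, 0.0],
}
overall = {name: overall_score(s) for name, s in example_scores.items()}
median_overall = float(np.median(list(overall.values())))
```

Because the three modes are weighted equally, a poor score for a single mode (for example through a poor value of σ*) lowers the Overall Score by at most one third of that mode's shortfall.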

Fig. 8 Absolute Overall Score for CMIP3 models for DJF and JJA calculated using the ensemble estimates. The seasonal median overall scores are shown by the dashed lines. The number of realisations for each model is given above the plot

It is clear from Fig. 8 that all CMIP3 models perform better in DJF than in JJA. This is primarily due to the higher EOF pattern correlations for the two leading modes (compare Figs. 5a, 6a with Figs. 5d, 6d). However, in both seasons there are clear differences between models, as indicated by the spread about the median Overall Score. From this it is possible to categorise the CMIP3 model performance.

Five models reproduce the 20CR S-EOFs relatively well, with Overall Scores well ‘above median’ in both seasons. They are CSIRO-Mk3.0, ECHAM5/MPI-OM, GFDL-CM2.0, GFDL-CM2.1 and MRI-CGCM2.3.2. There are six models that are ‘below median’ in both seasons. The three GISS models were shown to have poor representations of ENSO variability (Sect. 4.3.2) and BCCR-BCM2.0 shows similar behaviour. INGV-SXG and INM-CM3.0 appear to be relatively poor across all three modes. The remaining 12 models fall into a broad category of ‘about median’. Some (CCSM3, CGCM3.1(T63), IPSL-CM4 and UKMO-HadGEM1) perform relatively well only in DJF. Others (CNRM-CM3, MIROC3.2(medres), PCM and UKMO-HadCM3) perform better, i.e. above the model median, in JJA. The categorisation of CGCM3.1(T47) and MIROC3.2(hires) as ‘about median’ is marginal, since they are below median in DJF and about median in JJA. CSIRO-Mk3.5 and FGOALS-g1.0 behave differently from the other CMIP3 models. They both have very good spatial structures for the modes related to ENSO variability but are penalised by poor values of σ* for the first mode in CSIRO-Mk3.5 and for the second mode in FGOALS-g1.0. However, when compared with the ‘above median’ models, all ‘about median’ models are more likely to show the deficiencies described in Sect. 4.3.

It is reasonable to use the model ensemble S-EOFs for assessment of the overall performance. An alternative model Overall Score could be calculated by applying Eq. (6) to the individual realisations, and then averaging. In DJF, the ensemble Overall Score (Fig. 8) exceeds this realisation-average Overall Score (not shown) in all 16 CMIP3 models (Table 1) with multiple realisations. In JJA, the ensemble Overall Score is higher in 11 models, but is only lower than all realisations in UKMO-HadCM3. Including the intraseasonal modes in the overall score is unlikely to change the categorisation. For example, see Grainger et al. (2010) for realisation-average Overall Scores calculated over the leading four NCEP reanalysis I- and S-EOFs.

Although there are some subjective aspects of the ‘best match’ selection method (Sect. 3.2), the CMIP3 model categorisation is generally robust. Choosing plausible alternative ‘best match’ EOFs or regions for calculating R SST does not greatly affect the categorisation, although the ranking of models within each category may change. Finally, it should be noted that the assessment method only shows which models are relatively good or poor at reproducing twentieth century variability. It will not explain why this is the case, and the required diagnostic studies are beyond the scope of this work.

5 Projections

We now consider how the modes of interannual variability are projected to change in the CMIP3 SRES experiments. With an ensemble of model realisations, it is possible to estimate the interannual covariance of the intraseasonal, slow-internal and slow-external components and consequently the changes in their modes.
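As a rough illustration of why an ensemble is needed for this separation, the sketch below splits seasonal means from several realisations into a part common to all members and a member-specific residual. This is a simplified ensemble-mean split, not the estimator used in this paper: the ensemble mean is only a crude proxy for the slow-external component, and the residual still mixes slow-internal and intraseasonal variability. The function name and the data layout are assumptions made for the example.

```python
def split_common_and_residual(members):
    """Split seasonal means into an ensemble-mean part and residuals.

    members[k][y] is the seasonal mean at one grid point for realisation k
    and year y (hypothetical layout). Returns (common, residual), where
    common[y] is the ensemble mean for year y and residual[k][y] is the
    departure of realisation k from that mean.
    """
    n_members = len(members)
    n_years = len(members[0])
    # Part shared by all realisations: a crude proxy for the forced response.
    common = [sum(m[y] for m in members) / n_members for y in range(n_years)]
    # Member-specific part: internal variability of all time scales.
    residual = [[m[y] - common[y] for y in range(n_years)] for m in members]
    return common, residual


# Two toy realisations over three years.
common, residual = split_common_and_residual([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

By construction the residuals sum to zero across realisations in every year, so any variability that survives in the ensemble mean is shared by the members.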

There has been much recent discussion on issues relating to model selection and the use of multi-model ensembles in projection studies (e.g. Knutti et al. 2010; Weigel et al. 2010; McSweeney et al. 2012; and references therein). Here we will use an ensemble based on a selection of CMIP3 models that are suitable for further analysis. We choose these on the basis of two requirements. The first is that the models reproduce ‘reasonably well’ the leading modes of variability of the slow component in both seasons. Based on Sect. 4.4, this will most likely be those CMIP3 models categorised as ‘above median’.

The second requirement is that the models have the ‘correct’ external forcings. Differences have been found in the twenty-first century projected SH atmospheric circulation depending on whether or not an ozone recovery is prescribed (Son et al. 2008). McSweeney et al. (2012) found that twentieth century regional precipitation was more realistic in a flux-adjusted model as a result of the more realistic SST. However, they argued that it was not clear that future precipitation is more realistic or reliable in flux-adjusted models than in non-flux-adjusted models. Table 1 lists the CMIP3 models that have climatological ozone or apply flux-adjustment. When the results in Sect. 4 are grouped by external forcing (not shown), the only clear difference is lower values of R SST and σ* for DJF S-EOF1 in the six models with climatological ozone and no flux-adjustment. No systematic differences in model performance were found. Nevertheless, only those models that have a prescribed ozone trend and recovery, as documented by Son et al. (2010), are considered for selection here.

Based on these requirements an ensemble, hereinafter denoted ‘ENS4’, is created from the four ‘above median’ CMIP3 models that have a prescribed ozone trend and recovery. They are: CSIRO–Mk3.0, ECHAM5/MPI–OM, GFDL–CM2.0 and GFDL–CM2.1. One realisation from each model is used. CSIRO-Mk3.0 and the two GFDL models have only one realisation of the SRES experiments, so for the 20c3m the realisation that initialised those experiments is used. For ECHAM5/MPI–OM, realisation 2 was selected for the 20c3m and SRES experiments, as it has the highest Overall Score of the individual 20c3m realisations (not shown).

5.1 ENS4 twentieth century modes

The spatial structures of the ENS4 20c3m I-EOFs, S-EOFs and slow SST-height covariances (not shown) are similar to those of the 20CR (Figs. 1, 2). The pattern correlations for ENS4 are usually well above the ensemble median values shown in Figs. 3, 4, 5, 6 and 7 and the values of σ* are usually closer to 1.0. The Overall Score for ENS4 in both DJF (0.565) and JJA (0.420) exceeds that of the best individual model in each season, GFDL–CM2.0 (0.500) and CSIRO–Mk3.0 (0.395) respectively.
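The two diagnostics referred to here can be sketched as follows. This is a minimal illustration under stated assumptions: the pattern correlation is taken to be uncentred and weighted by the cosine of latitude, and σ* is taken to be the ratio of the model to the reference estimated standard deviation, which the text implies (values close to 1.0 are good) but this section does not define. The function names are hypothetical.

```python
import math


def pattern_correlation(eof_a, eof_b, lats_deg):
    """Area-weighted, uncentred pattern correlation of two gridded EOFs.

    eof_a and eof_b are lists of rows (one row per latitude) on the same
    grid; lats_deg gives the latitude of each row in degrees. The cos(lat)
    weighting is an assumption of this sketch.
    """
    num = norm_a = norm_b = 0.0
    for row_a, row_b, lat in zip(eof_a, eof_b, lats_deg):
        weight = math.cos(math.radians(lat))
        for a, b in zip(row_a, row_b):
            num += weight * a * b
            norm_a += weight * a * a
            norm_b += weight * b * b
    return num / math.sqrt(norm_a * norm_b)


def sigma_ratio(std_model, std_reference):
    """Assumed form of sigma*: model over reference standard deviation."""
    return std_model / std_reference


# Toy 2 x 2 patterns on latitudes 30S and 60S.
lats = [-30.0, -60.0]
model_eof = [[1.0, 0.5], [0.2, -0.4]]
reference_eof = [[0.9, 0.6], [0.1, -0.5]]
r = pattern_correlation(model_eof, reference_eof, lats)
```

Because the sign of an EOF is arbitrary, a pattern and its negative give a correlation of -1; in practice one of the two patterns would be flipped, or the absolute value taken, before comparison.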

Figure 9 shows the leading three SI-EOFs, as defined in Table 2, from ENS4 20c3m for DJF and JJA 1951–2000, along with the slow-internal SST-height covariance with ENS4 SST. In both seasons the spatial structures of these modes are similar to those of the corresponding modes in the slow component in ENS4. When compared with the 20CR (Fig. 2), Fig. 9 shows some of the deficiencies that are general to the CMIP3 models. The structure of the leading mode for JJA is too annular and the positive slow-internal SST-height covariance for the second mode extends too far westwards in the tropical Pacific Ocean. The percentage variance explained by SI-EOF1 (Table 4) and S-EOF1 (not shown) is much more than that explained by the 20CR S-EOF1 (Table 3). This is particularly true in DJF, where the ENS4 SI-EOF1 and -EOF2 together explain 81.7 % of the variance (Table 4). The ENS4 DJF SI-EOF3 has a low estimated standard deviation and the 4.8 % of the variance explained is much lower than that of the 20CR S-EOF3 (11.3 %).
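For readers unfamiliar with how an estimated standard deviation and a percentage variance explained follow from a covariance matrix, the sketch below extracts the leading EOF of a small symmetric matrix by power iteration. It is an illustration only: the toy matrix is invented, power iteration is used here for brevity rather than because the paper uses it, and the function name is hypothetical. The standard deviation is the square root of the leading eigenvalue, and the variance explained is that eigenvalue as a percentage of the trace.

```python
import math


def leading_eof(cov, n_iter=500):
    """Leading EOF of a symmetric, non-zero covariance matrix.

    Returns (eof, std, pct): the unit-length eigenvector, the square root
    of its eigenvalue, and the percentage of total variance it explains.
    """
    n = len(cov)
    # Arbitrary non-uniform start vector; normalised on the first pass.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    total_variance = sum(cov[i][i] for i in range(n))
    return v, math.sqrt(eigenvalue), 100.0 * eigenvalue / total_variance


# Toy 2 x 2 covariance matrix (units of m^2 if the field is in metres).
eof, std, pct = leading_eof([[4.0, 1.0], [1.0, 3.0]])
```

The returned EOF has unit length, matching the normalisation stated in the figure captions, so the amplitude of the mode is carried entirely by the standard deviation.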

Fig. 9

a Leading three SI-EOFs of the ENS4 20c3m DJF SH 500 hPa geopotential height for 1951–2000. b Slow-internal SST-height covariance with ENS4 SST for the SI-EOFs in a. c and d as in a and b but for JJA. EOFs are normalised to unit length. The estimated standard deviation (m) and variance explained (%) are given to the right of each EOF

Table 4 Estimated standard deviation (m), and variance explained (%) for the leading I-, SI-EOFs and SE-EOF1 from the ENS4 20c3m for DJF and JJA 1951–2000

The leading SE-EOF, as defined in Table 2, in the ENS4 20c3m is shown in Fig. 10 along with the slow-external SST-height covariance with ENS4 SST. In DJF, the SAM-like structure (Fig. 10a; left) has been seen in other studies (e.g. Arblaster and Meehl 2006; Deser and Phillips 2009; G11) investigating trends in the twentieth century atmospheric circulation. The estimated standard deviation in ENS4 (141.4 m) is within the range (132.5–154.4 m) of the ‘trend mode’ in G11 of the three AGCMs forced by SST, CO2 and ozone. In JJA, the structure (Fig. 10a; right) is similar to the ENSO-like structure of the ‘trend mode’ described by G11. However, the estimated standard deviation in ENS4 (95.0 m) is much lower than the range (125.7–170.6 m) in G11 over all five AGCMs. SE-EOF1 explains 53.5 % of the variance in DJF and 40.2 % in JJA, and the trend in the associated time series is statistically significant at the 99 % level. All this suggests that in this ensemble there is a coherent response to observed changes in greenhouse gas concentrations. The slow-external SST-height covariance (Fig. 10b) is positive almost everywhere, and the structure is similar to linear trends in ENS4 ensemble mean SST (not shown) and twentieth century observed SST (e.g. Rayner et al. 2003).
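The kind of trend test referred to above can be sketched as follows. The text does not state which test was applied, so this example assumes an ordinary least-squares trend with a two-sided Student t test and no correction for serial correlation. The critical value is an approximate tabulated value for 48 degrees of freedom (50 seasons), and the synthetic series is invented for illustration.

```python
import math

# Approximate two-sided 99 % critical value of Student's t for 48 degrees
# of freedom (n = 50 seasons, e.g. 1951-2000); an assumption of this sketch.
T_CRIT_99_DF48 = 2.68


def trend_t_statistic(series):
    """Least-squares slope per time step and its t statistic.

    Assumes equally spaced, serially uncorrelated values with at least
    three points and a non-zero residual variance.
    """
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares about the fitted line.
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(series))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


# Synthetic 50-value series: a linear trend plus alternating noise.
series = [0.5 * t + (-1) ** t for t in range(50)]
slope, t_stat = trend_t_statistic(series)
significant = abs(t_stat) > T_CRIT_99_DF48
```

A series with the same alternating noise but no underlying trend gives a t statistic well inside the critical value, so it would not be flagged as significant.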

Fig. 10

a SE-EOF1 of the ENS4 20c3m SH 500 hPa geopotential height for DJF (left) and JJA (right) 1951–2000. b Slow-external SST-height covariance with ENS4 SST for the SE-EOFs in a. EOFs are normalised to unit length. The estimated standard deviation (m) and variance explained (%) are given to the right of each EOF

5.2 ENS4 twenty-first century modes

When the I-EOFs from the ENS4 SRES B1, A1B and A2 experiments for DJF and JJA 2051–2100 (not shown) are compared against the leading four 20c3m I-EOFs, there are minimal changes in either the spatial structure or standard deviation in either season. Table 4 gives a summary, including the EOF pattern correlation with respect to the ENS4 20c3m I-EOFs. The estimated standard deviations indicate that in the SRES experiments, the third and fourth modes in DJF are swapped relative to the 20c3m, as are the second and third modes in JJA. However, pattern correlations generally remain high. There is a consistent increase across all SRES experiments in the estimated standard deviation of the leading mode in JJA, but the variance explained only increases by just over 2 %.

Figure 11 shows the leading SI-EOFs in the ENS4 SRES A1B experiment for DJF and JJA 2051–2100. Since ENS4 DJF SI-EOF3 in the 20c3m experiment is very weak, projected changes in this mode are not examined. While there are some subtle regional differences, most notably in JJA SI-EOF3, the large-scale spatial structure is largely unchanged, as indicated by the high pattern correlations with respect to ENS4 20c3m (Table 4). This is also true for the SRES B1 and A2 experiments (SI-EOFs not shown, but see Table 4). The large-scale spatial structure of the slow-internal SST-height covariance (not shown) is also generally unchanged across all three experiments. In DJF, the estimated standard deviations of the two leading modes are the same or slightly lower across all three experiments, although the total variance explained is unchanged, ranging over 80–85 %. In JJA, the estimated standard deviation of the two leading modes is higher in all experiments. There are corresponding increases in the percentage variance explained (Table 4), which are larger than those seen in the I-EOFs. The estimated standard deviation of the JJA SI-EOF3 is lower in the SRES B1 and A1B experiments than in the 20c3m.

Fig. 11

Leading SI-EOFs of the ENS4 SRES A1B experiment SH 500 hPa geopotential height for a DJF and b JJA 2051–2100. EOFs are normalised to unit length. The estimated standard deviation (m) and variance explained (%) are given to the right of each EOF

SE-EOF1 of the ENS4 SRES B1, A1B and A2 experiments is shown in Fig. 12 and is summarised in the bottom two rows of Table 4. There is a much larger change in the model response to radiative forcings than in the other components. In both seasons, the spatial structure of the EOF becomes nearly uniform. With increasing greenhouse gas concentrations, the estimated standard deviation and percentage variance explained by this mode also increase (Table 4), and the spatial structure becomes more uniform (Fig. 12). The increases in estimated standard deviation are much larger than the differences in the intraseasonal and slow-internal modes (Table 4). The slow-external SST-height covariances (not shown) are positive everywhere, and have the largest values in the SRES A2 experiment. The trend in the associated time series is statistically significant at the 99 % level in all cases. The spatial structures of the EOF and slow-external SST-height covariance are similar to the structures of the linear trends in ENS4 ensemble mean SH 500 hPa geopotential height and SST respectively (not shown). The changes in spatial structure and magnitude are consistent with the projected expansion of the Hadley Cell (e.g. Lu et al. 2008; and references therein).

Fig. 12

As in Fig. 11, but for the leading SE-EOF of the ENS4 SRES B1, SRES A1B and SRES A2 experiments SH 500 hPa geopotential height for DJF and JJA 2051–2100

6 Conclusions

In this paper coherent patterns, or modes, of interannual variability in the components of the seasonal mean SH 500 hPa geopotential height for summer (DJF) and winter (JJA) were estimated using the CMIP3 dataset. For the first time, modes of variability related to radiative forcing have been separated from those related to intraseasonal and slowly-varying internal dynamics. The modes of variability in the intraseasonal and slow components in the second half of the twentieth century were first assessed against those estimated using 20CR data. Diagnostics were defined to evaluate the spatial structure and variance of the model modes. For modes of variability in the slow component, the relationship with SST was also assessed. Individual realisations and the model ensemble were both examined. Next an ensemble from four CMIP3 models, found to be suitable for further analysis, was used to directly estimate projected changes in all leading modes of variability of the intraseasonal, slow-internal and slow-external components for the second half of the twenty-first century. Our key findings are:

  1. The leading four modes of variability in the intraseasonal component are generally well reproduced in the CMIP3 models in both seasons.

  2. There are clear differences between models in their reproduction of the leading three modes of variability in the slow component. The modes are generally better reproduced in DJF than in JJA. The behaviour of the CMIP3 models here is consistent with other studies examining the SAM and ENSO variability.

  3. An overall score is calculated using the leading three modes of variability in the slow component. Clear differences are found between the CMIP3 models, allowing a categorisation of their performance. The model ensemble estimates are suitable for assessing individual modes and for categorising model performance.

  4. In the twentieth century, an ensemble based on four suitable CMIP3 models outperforms all individual CMIP3 models in both seasons. The modes of variability in the slow-internal component have similar spatial structures to the corresponding modes in the slow component. The leading mode of variability in the slow-external component explains about half of the variance of this component. The spatial structure of this mode is consistent with other studies of the effect of changes in greenhouse gas concentrations on the twentieth century atmospheric circulation, and there is a statistically significant trend in the associated time series.

  5. In the CMIP3 ensemble, there are small changes in the leading modes of variability in the intraseasonal component in the second half of the twenty-first century. The behaviour is consistent across the three SRES experiments. Changes in the variance and percentage explained are larger in the modes of variability in the slow-internal component, and there are subtle regional-scale changes in the spatial structure of these modes.

  6. By far the largest changes are in the leading mode of variability in the slow-external component. The spatial structure changes from annular (DJF) or ENSO-like (JJA) in the twentieth century to almost uniform in the twenty-first century. The variance and percentage explained increase with increasing greenhouse gas concentrations.

In future work, the modes of variability in the recently released CMIP5 dataset will be assessed and their projected changes examined. Of interest will be any improvement in performance relative to CMIP3, and any differences in the projected changes. The larger number of models in the CMIP5 dataset, together with the expectation that their external forcings will be consistent, may enable an investigation of the effect of model-dependency on the projected changes.