1 Introduction

There is growing interest among researchers, policy-makers, businesses, and the general public in the potential impacts of climate change and in the extent to which undesirable consequences of climate change can be mitigated by reducing anthropogenic greenhouse gas (GHG) and specifically carbon emissions. Many governments around the world are formulating climate policies, including pledges made in the context of the recent Copenhagen Accord. Both the Copenhagen Accord and an earlier statement of the Major Economies Forum mention a maximum increase of global mean temperature of 2 K as a desirable goal for international climate policy (Copenhagen Accord 2009; Major Economies Forum 2009). A better understanding of the scientific and policy challenges involved in mitigation depends, among other things, on model projections of future climate and its variability. Integrated assessment models (IAMs, e.g. Clarke et al. 2010; Edenhofer et al. 2010) and Earth system Models of Intermediate Complexity (EMICs) (e.g. van Vuuren et al. 2008; Plattner et al. 2008) have previously been used to explore various mitigation scenarios, and projected climate uncertainty in the absence of emissions mitigation policy has also recently been evaluated based on an integrated global system model (Sokolov et al. 2009; hereinafter referred to as S2009). However, the simplified representation of the climate system in these models still leaves many questions unanswered.

The Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report (AR4), in particular the Working Group 1 report (Solomon et al. 2007), provided a comprehensive review of the understanding of potential climate change under a range of future scenarios based on state-of-the-art complex climate models. However, the scenarios used (SRES A1B, A2 and B1; Nakicenovic and Swart 2000) only explore emission pathways in the absence of climate policy and are, therefore, not consistent with the ambitious climate targets currently being discussed. There is also a need to start exploring the consequences of low GHG emission scenarios using complex climate models. This may involve new experimental strategies, for instance because the forcing signal will be smaller than in earlier climate modelling experiments.

Some earlier work using complex climate models has been performed. May (2008) performed an idealised simulation with the ECHAM5/MPI-OM climate model, specifying GHG concentrations and the anthropogenic aerosol load designed to achieve the 2 K target by fixing concentrations of well-mixed GHGs from year 2020 onwards and rapidly relaxing the stratospheric ozone concentrations and sulfate aerosol loading towards their year 2100 values according to the SRES A1B scenario over the period 2020–2036. The future climate changes associated with this stabilization study show many of the typical features of previous climate change simulations with stronger forcings, but with somewhat weaker magnitudes. May (2008) notes, however, that some changes during the stabilization phase are relatively strong with respect to the magnitude of the simulated global warming, for instance the pronounced warming and sea-ice reduction in the Arctic region, the strengthening of the meridional temperature gradient between the tropical upper troposphere and the extratropical lower stratosphere and the general increase in precipitation.

In this paper, we present new results from a climate change experiment simulating the period 1860–2100 conducted within ENSEMBLES (Hewitt and Griggs 2004), a project in the European Union Sixth Framework Programme. The present study focuses on a multi-model ensemble analysis, complementing results already published from one of the comprehensive models involved (Roeckner et al. 2010).

The ENSEMBLES Stream 2 (ES2) experiment uses a new experimental design that provides an opportunity to explore certain simulations and analyses proposed for the 5th IPCC assessment (Hibbard et al. 2007), hereinafter referred to as the CMIP5 experiment. In the ES2 experiment, climate models are driven by GHG concentration and air pollution forcing data and data on land use change, derived from runs of IAMs. The climate models calculate the consequences for twenty-first century climate of these forcings, but in addition all models that include an integrated carbon cycle (CC) component also record the implied (or “allowable”) carbon emissions as a direct output of the experiment. Using this approach, a newly developed policy-relevant climate stabilization scenario is run in a multi-model ensemble with the latest generation of comprehensive models available in Europe, to begin to address the challenge posed above. It should be noted that including land use data in this climate model comparison experiment with low and high forcing pathways is also a novel aspect.

The models involved in this study have generally been improved compared to their previous-generation models that contributed to the IPCC AR4, through the inclusion or further development of aerosol schemes, carbon cycle models, variable vegetation cover, etc. Five out of the ten climate models include an integrated CC component and are able to report allowable emissions. Most models have reached a level of complexity prohibiting a large ensemble of perturbed initial condition simulations with each model with current computational resources. A large initial condition ensemble of simulations with each model would help to average out potentially large internal variability within the models, increasing the statistical robustness of the results, but the focus of this study is on projections of climate change for this century, and the conclusions are based on model results typically averaged over decades. The relatively small ensemble of simulations presented here should be sufficient to ensure that results are not overly sensitive to “weather noise” in the models.

Two alternative futures are simulated with each model. The first is a baseline scenario without climate mitigation policy (SRES A1B), and the second an aggressive mitigation pathway which aims at stabilizing the anthropogenic radiative forcing to that of an equivalent carbon dioxide concentration (CO2-e) of around 450 ppmv, a level reached during the twenty-second century (Lowe et al. 2009). This second scenario was designed specifically for ENSEMBLES. All models simulate changes in climate for these two pathways and some of the models diagnose carbon fluxes between atmosphere, land surface and ocean. While the allowable carbon emissions for mitigation pathways have previously been explored with integrated assessment models (IAMs) (e.g. van Vuuren et al. 2007), and EMICs (e.g. Plattner et al. 2008), ES2 allows a multi-model inter-comparison of allowable emissions using complex GCMs including CC components. The analysis in the present study is limited to global mean carbon fluxes and reservoirs, but a related study (D. Bernie, personal communication) will extend this basic analysis to examine regional/seasonal patterns of carbon cycle response and climate change with the aim of identifying robust regional carbon cycle changes and aspects of physical change which are the dominant drivers in the models and simulations introduced here.

Given the similarities between the ES2 experiments and those envisioned in CMIP5 (Taylor et al. 2009; Moss et al. 2010) using a set of representative GHG concentration pathways (RCPs), the current study is pioneering the CMIP5 work and setting the stage for a closer collaboration on scenarios between the IAM and climate model groups, in which much more attention is paid to data exchange between the two communities, something also characteristic of the RCP work (see Moss et al. 2010).

In the remainder of this paper we first detail the ES2 experimental design and multi-model descriptions (Sect. 2). In Sect. 3 we present analysis of the multi-model results, mostly on a global scale and in annual mean terms. Finally (Sect. 4), we review the main conclusions from this study, drawing some lessons of possible relevance to the modelling community engaging in CMIP5 long term experiments.

2 Experiments and models

2.1 ENSEMBLES Stream 2 (ES2) experimental design

The main objectives of the ES2 experiment are to use coupled atmosphere-ocean general circulation models (GCMs) and Earth system models (ESMs) to simulate the evolution of the Earth system from 1860 to 2100 under the two contrasting anthropogenic forcing scenario assumptions mentioned in the introduction (A1B and E1), and to seek to quantify and understand both differences and uncertainties in the resulting model simulations. Land-use change is incorporated as an additional specified anthropogenic forcing (in models that can support it) as this is considered potentially important for regional climate. The forcing path “E1” was specifically designed for the ES2 experiment. (Note that during the ES2 research, the RCP scenarios for CMIP5 were not yet available.)

The IPCC-SRES scenarios (Nakicenovic and Swart 2000), which have been extensively used for climate and impact modelling, explore different possible pathways for future GHG emissions. These scenarios do not explicitly include climate mitigation policy, so differences among the scenarios result from varying degrees of globalization, the role of environmental and social policy, economic and population growth, and the rate of technology development. Within the SRES set, the A1B scenario forms a medium-high emission scenario driven by high economic growth, strong globalization and rapid technology development. The scenario also assumes a material-intensive lifestyle so energy consumption grows rapidly despite population growth being relatively low (the population peaks around 9 billion in 2050 and declines to around 7 billion in 2100). The energy supply has a balance between fossil fuel and non-fossil fuel sources. The A1B scenario has been chosen as the baseline scenario for the ES2 simulations because it provides overlap with earlier climate modelling work. Long-term trends in historical emissions over the last two decades are consistent with those depicted in the SRES scenarios (van Vuuren and Riahi 2008; Le Quéré et al. 2009). The concentration data for GHGs are taken from IPCC (2001) Appendix II (Bern model calculations), while the fields for ozone are calculated on the basis of emissions (see scenario forcings below).

The experiment contrasts the A1B baseline with a corresponding aggressive mitigation scenario E1 (Lowe et al. 2009) developed with the IMAGE 2.4 IAM. Meinshausen et al. (2006) indicate that stabilization of GHG concentrations at 450 ppmv (CO2-equivalent, or CO2-e) would provide a 20–75% probability of stabilizing temperatures below a 2 K warming target. Starting from an A1B baseline, a “peaking” scenario was developed which initially peaks at around 530 ppmv CO2-e and then decreases gradually to approach 450 ppmv from above during the twenty-second century. Den Elzen and van Vuuren (2007) show that peaking scenarios may be preferable to stabilization scenarios, on the basis of cost-effectiveness considerations, for reaching long-term temperature targets. The GHG concentration data for this experiment has been calculated using the IMAGE model (see also scenario forcings below).

Long control simulations with fixed pre-industrial (1860) conditions are generally used to provide well-balanced initial conditions (taken from selected points in the control simulation) for the transient simulations. For the 1860 to 2000 (present day) period, the 20C3M (twentieth Century in Coupled Climate Models) model run specifies anthropogenic forcings (GHGs, aerosols as concentrations or precursor emissions, ozone, and land use change), in most cases without any variation in natural (solar and volcanic) forcings. Some additional 20C3M simulations were also conducted including solar and volcanic forcings as specified in previous AR4 simulations, to allow a better comparison with observed changes for validation purposes, and two models only ran 20C3M simulations with anthropogenic plus solar and volcanic forcing (Table 1). The multi-model analysis in this paper amalgamates a mixture of anthropogenic-only and anthropogenic-plus-natural 20C3M simulations. Multiple 20C3M simulations with the same model and forcings differ only in the choice of initial conditions. A1B and E1 scenario simulations were initialised from year 2000 in 20C3M simulations, which provide different initial conditions in cases where multiple simulations were run.

Table 1 ENSEMBLES Stream 2 multi-model summary

Consistent with the previous phase (CMIP3) and the next phase (CMIP5) of the Coupled Model Intercomparison Project (CMIP), the ES2 experimental design dictates that all model simulations (with both GCMs and ESMs) are driven with atmospheric GHG concentrations (specifically CO2), making the concentration pathway a controlled variable, i.e. the same for all model simulations of a given scenario.

2.2 Scenario forcings

2.2.1 Overall description of the scenarios

For the A1B scenario, the official A1B SRES marker was used given that it was used extensively in earlier model experiments, and thus allowed for better comparison. The E1 scenario, in contrast, was newly developed using the IMAGE IAM. The IMAGE model simulates in detail the energy system, land use and carbon cycle (MNP 2006; van Vuuren et al. 2007). Emissions and the energy system are described for 17 world regions. Land use is modelled both at the regional scale and at 0.5 × 0.5 degrees. To develop the E1 scenario on the basis of the climate-policy free IMAGE A1B scenario, a price on GHG emissions was introduced in the model, targeting a greenhouse gas concentration of 450 ppm CO2-e shortly after 2100. The IMAGE A1B is somewhat different from the A1B SRES marker scenario as it has been developed with a different model and has also been updated against new information (van Vuuren et al. 2007). The GHG price introduced in the system to represent climate policy induces changes to the energy system, non-CO2 gases and carbon plantations. An increase in agricultural productivity, slowing down of deforestation rates, and allowance for greater bio-energy production were also included (see also Lowe et al. 2009).

The data for twenty-first century emissions and concentrations were harmonized to reported 2000 values (consistent with the historical 1850–2000 period; Nakicenovic and Swart 2000, see also Appendix II of IPCC 2001). The data output files include emissions and concentrations for CO2, CH4, N2O, halogenated species, SO2, NOx, VOC and CO. For emissions, harmonization was done with the mean of available inventories for 2000 emissions (see van Vuuren et al. 2008). For both emissions and concentrations, harmonization was done by multiplying the original output by a scaling factor that for the year 2000 equals the harmonized data divided by the 2000 IMAGE output. These scaling factors were assumed to converge linearly to 1 in 2100 (van Vuuren et al. 2008). For air pollutants, the data were also made available on a 0.5° × 0.5° grid. The temporal resolution is every 5 years. The radiative forcing resulting from all the halogenated species except CFC12 has been converted into a CFC11 concentration giving the same radiative forcing. In addition to emissions, land use data on a 0.5° × 0.5° grid are also output.
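The harmonization step described above can be illustrated with a minimal sketch. It assumes that the scaling factor is interpolated linearly in time between its year-2000 value and 1 in 2100, as stated in the text. The function names, the dictionary layout and the numbers in the usage lines are hypothetical and are not taken from the IMAGE output files.

```python
def scaling_factor(year, harmonized_2000, model_2000, start=2000, end=2100):
    """Factor equal to harmonized/model in `start`, converging linearly to 1 in `end`."""
    f_start = harmonized_2000 / model_2000
    # Fraction of the convergence period that has elapsed, clamped to [0, 1].
    w = min(max((year - start) / (end - start), 0.0), 1.0)
    return f_start + (1.0 - f_start) * w


def harmonize(raw_by_year, harmonized_2000):
    """Scale a {year: raw model output} series so that 2000 matches the harmonized value."""
    model_2000 = raw_by_year[2000]
    return {
        year: value * scaling_factor(year, harmonized_2000, model_2000)
        for year, value in raw_by_year.items()
    }


# Hypothetical raw model output (arbitrary units) and harmonized 2000 value.
raw = {2000: 8.0, 2050: 10.0, 2100: 12.0}
harmonized = harmonize(raw, harmonized_2000=10.0)
```

In this illustration the 2000 value is moved onto the harmonized inventory value, while the 2100 value is left unchanged because the factor has converged to 1 by then.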

2.2.2 Greenhouse gases

For the twenty-first century A1B forcing the reported GHG concentrations from the SRES A1B marker were used (Appendix II of IPCC 2001). The E1 data has been taken from the new IMAGE model runs. The E1 scenario has an emissions peak around 2020 and eventually stabilizes at 450 ppmv CO2-e in the twenty-second century. Due to relatively low mitigation costs for non-CO2 emissions from land use (including land-fills and sewage), emissions from this sector are strongly reduced after 2010 and most of the maximum reduction potential is already reached in 2050, the most important reductions coming from animals, wetland rice, landfills and sewage (CH4), and animal waste and fertilizer (N2O).

The resulting series of GHG concentrations of the new scenarios produced by the IMAGE model and interpolated to annual resolution are illustrated in Fig. 1, in which they are compared with the SRES A1B, A2 and B1 marker scenarios. The concentrations in the A1B-IMAGE scenario are higher than those from the A1B-SRES scenario (the latter was used to force IPCC AR4 simulations and the A1B simulations which are the focus of this paper). The GHG concentrations resulting from the new stabilization scenario (E1) are smaller than those in the B1 scenario, except for CH4 for which they are similar.

Fig. 1 Evolution of CO2 (a), CH4 (b), N2O (c), and CFC (d) greenhouse gas concentrations for the historical period 1860–2000 (20C3M) and for the different scenarios. A2, A1B and B1 are the IPCC SRES marker scenarios, A1B-IMAGE and E1 are the ES2 scenarios produced by the IMAGE integrated assessment model. The black curves represent historical concentrations (observations) followed by the A1B scenario and the coloured curves alternative scenarios

For a policy-free scenario, our focus in this study is principally on A1B-SRES (marker) rather than A1B-IMAGE. We next compare CO2-e and the associated radiative forcing in A1B-SRES and E1 with two RCPs to be used in the forthcoming CMIP5 experiments (Fig. 2). For the low forcing cases, there is a very close correspondence between E1 and RCP 3-PD (van Vuuren et al. 2007)—both of which stabilize at a forcing close to 3 W/m2. For the higher forcing cases, RCP 8.5 (Riahi et al. 2007) lies well above A1B-SRES in the latter decades of the twenty-first century. The late twenty-first century forcing in A1B-IMAGE (not shown) is higher than A1B-SRES and tracks RCP 8.5 within about 10%.

Fig. 2 Global mean CO2-equivalent concentration (top) and corresponding radiative forcing (bottom) used to drive the ENSEMBLES S2 simulations for the A1B and E1 scenarios. Corresponding profiles for the CMIP5 representative concentration pathways (RCPs) stabilizing at 2.6 W/m2 (RCP 3-PD; van Vuuren et al. 2007) and 8.5 W/m2 (Riahi et al. 2007) are also shown for comparison. The radiative forcing corresponds to the given CO2-equivalent; it does not include aerosol, ozone or land use change induced forcings
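The correspondence between a CO2-equivalent concentration and its radiative forcing can be sketched with the simplified logarithmic expression commonly used for CO2, RF = 5.35 ln(C/C0) W/m2. The text does not state which expression or reference concentration was used to produce Fig. 2, so both the coefficient and the pre-industrial reference value of 278 ppmv below are assumptions made for illustration only.

```python
import math

ALPHA_WM2 = 5.35   # assumed coefficient of the simplified logarithmic expression (W/m2)
C0_PPMV = 278.0    # assumed pre-industrial reference concentration (ppmv)


def forcing_from_co2e(c_ppmv, c0_ppmv=C0_PPMV):
    """Radiative forcing (W/m2) for a CO2-equivalent concentration (ppmv)."""
    return ALPHA_WM2 * math.log(c_ppmv / c0_ppmv)


def co2e_from_forcing(rf_wm2, c0_ppmv=C0_PPMV):
    """Inverse relation: CO2-equivalent concentration (ppmv) for a given forcing (W/m2)."""
    return c0_ppmv * math.exp(rf_wm2 / ALPHA_WM2)


rf_450 = forcing_from_co2e(450.0)   # E1 long-term stabilization level
rf_530 = forcing_from_co2e(530.0)   # approximate E1 peak level
```

Under these assumptions, 450 ppmv CO2-e corresponds to a forcing of roughly 2.6 W/m2 and the 530 ppmv peak to roughly 3.5 W/m2.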

2.2.3 Aerosols

Sulfate aerosol concentrations were specified as a forcing in the majority of models. For the A1B scenario the concentrations used were those previously used to drive IPCC AR4 models. As E1 is a new scenario, gridded emissions of precursors of sulfates from the IMAGE E1 scenario were used to compute 3-D sulfate aerosol concentration maps by running the same chemistry-transport model (CTM) (Boucher and Pham 2002) for E1 as previously used to compute the SRES A1B scenario sulfate concentrations to drive IPCC AR4 models (O. Boucher, personal communication 2008). Three models were forced instead by aerosol precursor emissions corresponding to SRES A1B and E1 and derived their own on-line concentrations interactively (see Sect. 2.4 below).

Whereas in the IPCC SRES A1B simulation the total sulfate aerosol burden computed by the CTM (Fig. 3; solid lines) increases strongly in the first part of the twenty-first century to reach a peak in 2020 and decreases rapidly afterwards, the new A1B-IMAGE baseline scenario simulation shows a decrease from 2000, but both reach about the same level by the end of the century. The E1 scenario produces a much more rapid decrease and returns to near pre-industrial levels by 2100, consistent with improvements in air quality and the reduction in fossil fuel burning.

Fig. 3 Time evolution of the total sulfate aerosol burden (TgS) simulated by the offline chemistry-transport model (CTM; solid lines), based on historical emissions (20C3M), according to IPCC SRES marker scenario A1B, and the IMAGE scenarios for A1B (A1B-IMAGE) and E1. Corresponding burdens simulated by the three models (HadGEM2-AO, HadCM3C, EGMAM+) which calculated sulfate aerosols online from emissions are also shown. Note that the CTM data is linearly interpolated between a few data points, but data for the three other models is a full time series of simulated annual means

Biomass burning and fossil fuel black carbon aerosol emissions were not computed in the IMAGE version used to generate sulfate aerosol precursor emissions, so were derived for E1 by simple pattern scaling of A1B emissions used in previous HadGEM1 simulations (Stott et al. 2006) by the ratio of harmonized E1 to A1B CO emissions.
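The pattern scaling described above can be sketched as follows. The text does not say whether the E1/A1B CO ratio was applied globally, regionally or per grid cell; this sketch assumes a single global ratio per time slice, and the function name and all numbers are hypothetical.

```python
def pattern_scale(a1b_bc_grid, e1_co_total, a1b_co_total):
    """Scale a gridded A1B emission field by the E1/A1B ratio of total CO emissions.

    The spatial pattern of the A1B field is preserved; only its amplitude changes.
    """
    ratio = e1_co_total / a1b_co_total
    return [[cell * ratio for cell in row] for row in a1b_bc_grid]


# Hypothetical 2 x 2 black carbon emission field and CO totals (arbitrary units).
a1b_bc = [[0.0, 2.0], [4.0, 8.0]]
e1_bc = pattern_scale(a1b_bc, e1_co_total=300.0, a1b_co_total=600.0)
```

Because only a scalar is applied, cells with zero A1B emissions remain zero in the derived E1 field.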

2.2.4 Ozone concentrations

Ozone data were computed using the University of Oslo chemistry transport model CTM2 (Sovde et al. 2008), which has a horizontal resolution of T21, and a vertical resolution of 60 layers with a top at 0.1 hPa. Monthly mean global gridded 3-dimensional data for the 20C3M period, SRES A1B and E1 scenarios were computed for the years 1850, 1900, 1950, 1980, 2000, 2050 and 2100. Relative humidity was assumed to remain constant for these simulations. The effect of future temperature change on ozone chemistry was included in the A1B scenario by using average monthly mean temperature anomalies for 2091–2100 with respect to 1991–2000, these being provided by simulations with an earlier version of the EGMAM model (which has a detailed stratosphere; see below for the further model development in EGMAM+). The resulting ozone zonal fields for July 2050 and 2100 in the A1B scenario (not shown) indicate that the correction for the future GHG-induced cooling of the stratosphere produces a small increase in the tropical ozone maximum.

2.2.5 Land-use changes

The inclusion of land-use forcing is also a rather novel aspect of the ES2 experiment. So far, only the LUCID project has paid considerable attention to including land-use data into a model comparison experiment project (Pitman et al. 2009). The land-use data used here for the historical period (1740–1992) was taken from the LUCID project (Pitman et al. 2009). It is based on the crop dataset of Ramankutty and Foley (1999), and pasture data from the HYDE dataset (Klein Goldewijk 2001), in combination providing a fraction of grid-cell covered by crop and pasture on a 0.5° × 0.5° global grid for each year. In the ES2 simulations models used their natural vegetation map as a background and changed only the crop and pasture fraction as provided by this dataset.

For the 2000–2100 period, the IMAGE model provided gridded crop fraction maps for 19 crop types for the E1 and A1B-IMAGE scenarios. An anomaly method was used for both A1B and E1, taking into account only changes in land use computed by the IMAGE scenario, in order to interpolate smoothly to the observed land use maps in 1992. If no change was found, then the extent of crop and/or pasture for the decade was set equal to the extent of the previous decade (i.e. the one derived from the historical databases for 1990 if year 2000 is under consideration). If the crop fraction computed by IMAGE changed by an amount \( \Updelta L_{crop}^{y} = L_{crop}^{y} - L_{crop}^{1992} \) for year y with respect to 1992, this fraction was updated to:

$$ F_{crop}^{y} = \max \left( {\min \left( {F_{crop}^{1992} + \Updelta L_{crop}^{y} ,1} \right),0} \right) $$
(1)

Similarly, if the change in pasture fraction for year y with respect to 1992 was \( \Updelta L_{pasture}^{y} = L_{pasture}^{y} - L_{pasture}^{1992} \), the pasture fraction was recomputed as:

$$ F_{pasture}^{y} = \max \left( {\min \left( {F_{pasture}^{1992} + \Updelta L_{pasture}^{y} ,1 - F_{crop}^{y} } \right),0} \right), $$
(2)

Equations 1 and 2 ensure that the values of \( F_{crop}^{y} \), \( F_{pasture}^{y} \) and their sum lie between 0 and 1. The sum of the crop and pasture fractions \( F_{crop}^{y} + F_{pasture}^{y} \) determines the amount of natural vegetation cover that can exist for year y in a given grid cell. These crop and pasture reference datasets and method used to blend them into undisturbed vegetation maps for different models are described in de Noblet-Ducoudré and Peterschmitt (2007).
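Equations 1 and 2 can be sketched for a single grid cell as follows. The function and argument names are hypothetical, and the natural vegetation fraction is computed here simply as the remainder of the cell, which is an illustrative reading of the text and not the blending procedure of any particular model.

```python
def update_land_use(f_crop_1992, f_pasture_1992,
                    l_crop_y, l_crop_1992,
                    l_pasture_y, l_pasture_1992):
    """Apply the IMAGE anomalies to the observed 1992 fractions (Eqs. 1 and 2)."""
    d_crop = l_crop_y - l_crop_1992            # IMAGE crop anomaly for year y
    d_pasture = l_pasture_y - l_pasture_1992   # IMAGE pasture anomaly for year y

    # Eq. 1: crop fraction clipped to [0, 1].
    f_crop = max(min(f_crop_1992 + d_crop, 1.0), 0.0)
    # Eq. 2: pasture fraction clipped to [0, 1 - f_crop], so crops take priority.
    f_pasture = max(min(f_pasture_1992 + d_pasture, 1.0 - f_crop), 0.0)

    # Remaining share of the cell available for natural vegetation.
    f_natural = 1.0 - f_crop - f_pasture
    return f_crop, f_pasture, f_natural


# Hypothetical cell: the crop anomaly of +0.25 squeezes the pasture fraction.
fractions = update_land_use(0.5, 0.375, 0.5, 0.25, 0.25, 0.125)
# Hypothetical cell with a large negative crop anomaly: clipped at zero.
clipped = update_land_use(0.125, 0.25, 0.0, 0.5, 0.25, 0.25)
```

In the first example the unconstrained pasture fraction would be 0.5, but the cap of 1 minus the crop fraction reduces it to 0.25, so crop and pasture together fill the cell and no natural vegetation remains.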

The evolution of the land use fraction on different continents in the reference dataset is illustrated in Fig. 4 (note that A1B-IMAGE data were used to drive the A1B-SRES simulations described later, and due to the blending procedure land use fractions actually realized in different models vary somewhat from the reference dataset). The observed historical trend toward an increase of the land use fraction over most continents is stopped and reversed in the twenty-first century in A1B as a result of stabilizing (or even declining) population levels and continued improvements in crop yield. South America and Africa are exceptions, where population increases and changing global trade patterns are the main drivers of continued land use expansion. In contrast, land use starts to increase again more generally during the twenty-first century in E1 due to the need for new agricultural land for bio-energy production. Increased land use for bio-energy would tend to increase carbon emissions, so this aspect of E1 may be sensitive to carbon pricing policy assumptions in the IMAGE model.

Fig. 4 Time series of the annual land use fractions (the sum of crop plus pasture fraction) averaged over six different continental regions. The LUCID (Pitman et al. 2009) database observations have been used from 1850 to 2000 (20C3M), and updated using the evolution of the land-cover provided by the IMAGE simulations for the two scenarios A1B-IMAGE and E1 over the twenty-first century

2.3 Model descriptions and simulations performed

The models participating in ES2 are generally improved or extended versions of models that contributed to IPCC AR4 (through improvements to core physical schemes, inclusion or improvement of aerosol, carbon cycle and variable land vegetation cover components). The key features of the different models and the simulations performed are listed in Table 1.

All the climate models have a similar structure in which an atmospheric GCM is coupled to an ocean GCM incorporating a sea-ice model. The atmospheric GCMs can be grouped into two families according to the numerical method used in their dynamical core: grid-point models based on finite-difference methods for the horizontal solution of the dynamical equations (HadGEM2-AO, HadCM3C, IPSL-CM4, IPSL-CM4-LOOP), and spectral models using a spherical harmonics representation of the horizontal fields (ECHAM5-C, EGMAM+, INGVCE, CNRM-CM3.3, BCM2, BCM-C). In the spectral models the resolution is expressed by the maximum wavenumber represented in a triangular truncation in wavenumber space. The truncations used here are either low resolution (T30 or T31) or medium resolution (T63). The spectral models used can be further grouped into two sub-families: those derived from versions of the ECHAM model (ECHAM5-C, EGMAM+, INGVCE), and those derived from the ARPEGE-Climat model (CNRM-CM3.3, BCM2, BCM-C). A brief description of each model now follows.
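As a rough guide to what the truncations quoted above mean in grid terms, the sketch below applies the usual rule of thumb for an alias-free (quadratic) Gaussian grid, which needs at least 3N + 1 longitudes for a triangular truncation TN. The function names are hypothetical, and the grids actually used by the individual models are not given in this section; operational grids are normally rounded up to FFT-friendly sizes, for example 96 longitudes for T31 and 192 for T63.

```python
def min_gaussian_grid(truncation):
    """Smallest quadratic Gaussian grid (nlon, nlat) for triangular truncation T<N>."""
    nlon = 3 * truncation + 1
    nlat = (nlon + 1) // 2   # ceiling of nlon / 2
    return nlon, nlat


def approx_spacing_deg(truncation):
    """Approximate longitudinal grid spacing in degrees implied by the minimum grid."""
    nlon, _ = min_gaussian_grid(truncation)
    return 360.0 / nlon


# Truncations mentioned in the text (T21 is the ozone CTM, the others the spectral GCMs).
spacings = {n: approx_spacing_deg(n) for n in (21, 30, 31, 63)}
```

On this estimate the low-resolution truncations (T30, T31) correspond to a spacing of just under 4 degrees and the medium-resolution T63 to just under 2 degrees.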

2.3.1 METOHC: HadGEM2-AO model

The HadGEM2-AO model is based on the HadGEM1 model used in IPCC AR4, described by Johns et al. (2006), but contains several improvements and modifications as described in Collins et al. (2008). The representation of aerosol processes is notably improved (Bellouin et al. 2007) and both secondary organic aerosol and mineral dust are now included. The direct radiative effect of all aerosol species (which include black carbon), plus the first and second indirect radiative effects of sulfate, sea-salt and biomass aerosol are all included. The cumulus convection parametrization is a revised version of HadGEM1’s mass flux scheme (Martin et al. 2006) which includes separately diagnosed deep and shallow convection, parameterized entrainment and detrainment rates for shallow convection, a convective momentum transport (CMT) parameterization based on flux-gradient relationships, and a convective anvil scheme. In HadGEM2-AO, an adaptive detrainment parametrization for deep convection (Derbyshire et al. 2010) is also introduced, leading to significant improvements in diabatic heating profiles and moist processes. Boundary layer and land surface process parametrizations are refined. In the ocean, the Laplacian viscosity function is revised, leading to lower viscosity in the tropics. Further, the ocean background vertical diffusivity is lowered in the upper 1,000 m, leading to reduced mixing with cooler water at depth, raising sea surface temperatures compared to HadGEM1. Land use change is applied through modified fractions of crop and pasture types in the land surface classification.

2.3.2 METOHC: HadCM3C model

The HadCM3C model (B. Booth, personal communication) is a modified configuration of the HadCM3 model (Gordon et al. 2000; Pope et al. 2000) used in the IPCC Third and Fourth Assessments. Unlike HadCM3, it is flux adjusted and includes interactive terrestrial vegetation and an ocean carbon cycle. Externally imposed (anthropogenic) land use change cannot currently be included. The model differs from HadCM3LC (Cox et al. 2000), the coupled carbon cycle climate model submitted to C4MIP (Friedlingstein et al. 2006), as it is configured to run with the standard (higher) HadCM3 resolution ocean (1.25° × 1.25°). The cumulus convection parametrization (as for HadCM3) is a mass flux scheme (Gregory and Rowntree 1990) including convective downdraughts (Gregory and Allen 1991) and a CMT scheme (Gregory et al. 1997). HadCM3C also includes interactive atmospheric sulfur cycle chemistry and a sulfate aerosol scheme including the direct and first indirect, “cloud albedo”, aerosol effects (following Jones et al. 2001; note that the second indirect, “cloud lifetime”, effect is excluded).

2.3.3 IPSL: IPSL-CM4 and IPSL-CM4-LOOP models

The IPSL-CM4 coupled ocean-atmosphere GCM (Marti et al. 2010) was used previously in IPCC AR4 and its main components are the following: LMDZ4 atmosphere (Hourdin et al. 2006); ORCHIDEE land and vegetation (Krinner et al. 2005); OPA8.2 ocean (Madec et al. 1999); LIM sea ice (Timmermann et al. 2005); and OASIS3 coupler (Valcke 2006). The version used here contains some improvements: the horizontal resolution has been increased (Marti et al. 2010) and land use change can be externally imposed. The cumulus convection parametrization is based on the Emanuel (1991, 1993) mass flux scheme, and convective clouds are represented through a log-normal probability distribution function of sub-grid scale total (vapor and condensed) water (Bony and Emanuel 2001). As in Dufresne et al. (2005), sulfate aerosol concentrations are externally imposed and direct and indirect aerosol forcings are considered.

IPSL-CM4-LOOP (Cadule et al. 2009) comprises a coupling between the IPSL-CM4 model and two carbon cycle models: the PISCES (Pelagic Interactions Scheme for Carbon and Ecosystems Studies) biogeochemical model (Aumont et al. 2003) for the ocean part, and the ORCHIDEE (ORganizing Carbon and Hydrology In Dynamic EcosystEms) model for the terrestrial part (Krinner et al. 2005). IPSL-CM4-LOOP has a cold bias over continents in the high northern latitudes, attributable to the coupling with terrestrial CC. With the CC activated, leaf area index (LAI) is computed rather than prescribed as in IPSL-CM4. The positive snow-albedo feedback at high latitudes is thought to be too strong due to an error in the leaf albedo, which amplifies the smaller cold bias present in IPSL-CM4. In turn, the enhanced snow-albedo feedback tends to increase the warming response to a given radiative forcing in IPSL-CM4-LOOP compared with IPSL-CM4.

2.3.4 MPI + DMI: ECHAM5-C model

ECHAM5-C is a low-resolution version of the Max Planck Institute for Meteorology Earth System Model (MPI-ESM), consisting of models for the atmosphere including the land surface (T31L19), the ocean including sea ice, and the marine and terrestrial carbon cycles (3°L40). The atmospheric component (ECHAM5; Roeckner et al. 2006) has been coupled to the MPI-OM ocean model (Marsland et al. 2003) by exchanging daily mean fluxes of heat, water and momentum, and the state of the ocean surface, respectively. No flux adjustments are employed. Details on coupling method and simulated climatology can be found in Jungclaus et al. (2006). The cumulus convection parameterization of ECHAM5 is based on the Tiedtke (1989) scheme, modified by Nordeng (1994) for deep convection. The bulk mass flux scheme parameterizes the contribution of cumulus convection to the large scale budgets of heat, moisture and momentum by an ensemble of clouds consisting of updrafts and downdrafts in a steady state. Cloud base mass flux depends on moisture convergence below cloud base for shallow and mid level convection, and on CAPE adjustment for deep convection. The carbon cycle model coupled to ECHAM5/MPI-OM comprises the ocean biogeochemistry model HAMOCC5 (Maier-Reimer et al. 2005) and the modular land surface scheme JSBACH (Raddatz et al. 2007). Soil carbon is partitioned into a pool with a short turnover time (about 1 year) and one with a long turnover time (about 100 years). It is released to the atmosphere by heterotrophic respiration, which depends linearly on soil moisture and exponentially on soil temperature. Vegetation is differentiated according to five natural phenotypes (evergreen, summergreen, raingreen forest, shrubland, grassland) and managed (non-forest) areas.
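The dependence of heterotrophic respiration on soil state described above can be illustrated with a minimal two-pool sketch. Only the two turnover times and the linear (moisture) and exponential (temperature) dependences come from the text. The Q10 form of the temperature response, the Q10 value, the reference temperature and all pool sizes are assumptions chosen for illustration and are not the JSBACH formulation.

```python
# Turnover times (years) of the two soil carbon pools, as stated in the text.
TURNOVER_YEARS = {"fast": 1.0, "slow": 100.0}


def respiration_flux(carbon, turnover_years, moisture, temp_c,
                     q10=2.0, ref_temp_c=10.0):
    """Carbon released per year from one pool.

    Linear in relative soil moisture (0..1) and exponential in soil temperature,
    here through an assumed Q10 relation around an assumed reference temperature.
    """
    base_rate = carbon / turnover_years
    return base_rate * moisture * q10 ** ((temp_c - ref_temp_c) / 10.0)


def total_respiration(pools, moisture, temp_c):
    """Sum of the fluxes from the fast and slow pools."""
    return sum(
        respiration_flux(pools[name], TURNOVER_YEARS[name], moisture, temp_c)
        for name in TURNOVER_YEARS
    )


# Hypothetical pool sizes (arbitrary carbon units).
pools = {"fast": 10.0, "slow": 1000.0}
flux_ref = total_respiration(pools, moisture=0.5, temp_c=10.0)
flux_warm = total_respiration(pools, moisture=0.5, temp_c=20.0)
flux_wet = total_respiration(pools, moisture=1.0, temp_c=10.0)
```

With the assumed Q10 of 2, a warming of 10 K doubles the flux, as does a doubling of soil moisture, which is the qualitative behaviour the text describes.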

2.3.5 FUB: EGMAM+ model

The modified coupled atmosphere-ocean GCM ECHO-G with Middle Atmosphere Model EGMAM (Huebener et al. 2007) is based on ECHO-G (Legutke and Voss 1999), which couples ECHAM4 (Roeckner et al. 1996) at a horizontal resolution of T30 via OASIS2.4 with the Hamburg Ocean Primitive Equation-Global Model (HOPE-G; Wolff et al. 1997) at a horizontal resolution of 0.5–2.8° (with refinement near the equator) and 20 vertical layers. It includes a dynamic-thermodynamic sea ice model and a time-constant flux correction for heat and freshwater exchange. With the extension to the middle atmosphere up to 0.01 hPa (ca. 80 km) the model has 39 vertical layers and a gravity wave parameterization (Manzini and McFarlane 1998). The model includes an interactive aerosol transport scheme (Feichter et al. 1996), changing land use (crop, pasture) and a time-varying 3-D ozone field. The prognostic species of the aerosol scheme are the gases dimethyl sulfide and sulfur dioxide, and sulfate aerosol. The direct aerosol effect of backscattering of shortwave radiation, and the impact of sulfate aerosol on cloud albedo (first indirect effect) are represented. Cumulus convection is parameterized following a mass flux scheme (Tiedtke 1989) modified by Nordeng (1994) for deep convection, as in the ECHAM5 model. Land use changes are implemented by changing the leaf area index and vegetation fraction as well as the forest fraction, while other surface parameters remain unchanged.

2.3.6 INGV + CMCC: INGVCE model

The INGV-CMCC Earth System Model (INGVCE) consists of an atmosphere-ocean-sea ice physical core coupled to a land-and-ocean carbon cycle model. The technical details of the physical atmosphere-ocean coupling and of the implementation of the vegetation and biogeochemistry (i.e. the carbon cycle) models in the physical core model are described in Fogli et al. (2009). The role of the ocean carbon cycle in the regulation of anthropogenic carbon emission as simulated by the INGVCE model is discussed in Vichi et al. (2011). The ESM components are: ECHAM5 atmosphere (Roeckner et al. 2006); SILVA land and vegetation (Alessandri 2006); OPA8.2 ocean (Madec et al. 1999); LIM sea ice (Timmermann et al. 2005), and PELAGOS biogeochemistry (Vichi et al. 2007). The cumulus convection parameterization is based on the Tiedtke (1989) scheme modified by Nordeng (1994) for deep convection, as in the ECHAM5 model. The atmosphere model (including the land-vegetation model) and the ocean model (including the biogeochemistry) are coupled using OASIS3 (Valcke 2006).

2.3.7 CNRM + DMI: CNRM-CM3.3 model

The CNRM-CM3.3 model is an improved and updated version of the CNRM-CM3.1 coupled model (Salas-Mélia et al. 2005) used for IPCC-AR4. The atmospheric part is based on the ARPEGE-Climat version 4 GCM (Déqué 1999; Royer et al. 2002; Gibelin and Déqué 2003) with spectral truncation T63 and Gaussian grid of 64 × 128 points, a progressive hybrid sigma-pressure vertical coordinate with 31 layers, and semi-Lagrangian advection scheme with a semi-implicit 30-min time step. Ozone concentration is a prognostic variable with a simplified linear parameterization of sources and sinks (Cariolle et al. 1990) modified to improve the simulation of the effects of chlorine on the ozone destruction. The indirect effect of sulfate aerosols is based on the parameterization of Boucher and Lohmann (1995) with a calibration from POLDER satellite data (Quaas and Boucher 2005). Deep convection is parameterized using a mass-flux convective scheme with Kuo-type closure (Bougeault 1985). The atmosphere-ocean coupling through OASIS 2.2 has been revised to achieve a better conservation of the energy fluxes during interpolations between the atmospheric and oceanic grids. The ocean model (OPA 8.1) and sea-ice model (GELATO; Salas-Mélia 2002) have been checked carefully, with minor corrections implemented to improve the energy conservation. The improvements in the coupled system have led to reduced drift in ocean volumetric and surface temperature and atmosphere 2 m temperature. Changes in land use are introduced through a modification of the fractions of crop and pasture types in the land-surface classification, and the resulting surface properties have been computed with an updated version (ECOCLIMAP-2) of the ECOCLIMAP vegetation map (Champeaux et al. 2005).

2.3.8 NERSC: BCM2 and BCM-C models

For the ES2 simulations two different versions of BCM have been used, BCM2 and BCM-C. BCM2 is an updated version of the original Bergen Climate Model (BCM) described in Furevik et al. (2003), which was used for IPCC AR4. The atmospheric part is ARPEGE-Climat version 3, which is based on the atmospheric GCM developed at CNRM-GAME (Déqué et al. 1994) and contains very similar physics to the ARPEGE-Climat version 4 used in the CNRM-CM3.3 model. (The differences between the atmospheric models are mainly in the dynamics, without major impacts on the current simulations.) In the version used in this study ARPEGE is run with a truncation at wave number 63 (TL63) and a 30-min time step. A total of 31 vertical levels are employed, ranging from the surface to 0.01 hPa. The physical parameterizations are similar to those used in previous versions of the BCM, but the vertical diffusion scheme has been updated to that of ARPEGE-Climat version 4 (Otterå et al. 2009). Deep convection is parameterized using a mass-flux convective scheme with Kuo-type closure (Bougeault 1985). The indirect effect of tropospheric sulphate aerosols is parameterized according to Rongming et al. (2001). The oceanic part is the Miami Isopycnic Coordinate Ocean Model (MICOM) (Bleck and Smith 1990; Bleck et al. 1992), which has been extensively modified at NERSC. With the exception of the equatorial region, the ocean grid is almost regular with horizontal grid spacing approximately 2.4° × 2.4°. The model has a stack of 34 isopycnic layers in the vertical, with potential densities ranging from 1,029.514 to 1,037.800 kg m−3, and a non-isopycnic surface mixed layer on top providing the linkage between the atmospheric forcing and the ocean interior. BCM2 uses the GELATO (Salas-Mélia 2002) sea ice model. The modifications made to MICOM are documented in Otterå et al. (2009).

Recently, the Bergen earth system model (BCM-C) has been developed by coupling terrestrial and oceanic carbon cycle models into BCM2 (Tjiputra et al. 2010). BCM-C adopts the Hamburg Ocean Carbon Cycle (HAMOCC5.1) model, which is based on the original work by Maier-Reimer (1993) with the extensions of Maier-Reimer et al. (2005). HAMOCC5.1 implements a full carbon chemistry formulation for air-sea CO2 exchange. It is similar to the ocean carbon cycle model used in the ECHAM5-C model, but incorporated here into MICOM (Assmann et al. 2010). For the terrestrial part it uses the Lund-Potsdam-Jena model (LPJ) (Sitch et al. 2003), a large-scale terrestrial carbon cycle model which includes global dynamical vegetation. The LPJ version in BCM-C does not implement land-use change. The different components are coupled together using OASIS2.2 (Terray and Thual 1995; Terray et al. 1995) and the model is run without any form of flux adjustments. Unlike BCM2, BCM-C uses the original NERSC sea ice model. Validation and assessment of climate-carbon-cycle feedbacks in BCM-C have been made by Tjiputra et al. (2010).

2.4 Interpretation of the forcing data by different models

Considerable efforts have been made to implement the forcings in a similar way across the various models. In particular all the models used the same concentrations of the well-mixed GHGs, the models with carbon cycle being driven with the concentration of CO2 as previously described. However, due to specific features and constraints in certain models some differences in the implementations of the forcings still remain as outlined below.

Ozone 3-D concentrations were specified in the scenario simulations with HadGEM2-AO, HadCM3C, ECHAM5-C and EGMAM+ from the ozone simulations provided by the University of Oslo database, the E1 ozone fields first having been adjusted for future estimated temperature-dependence using a rescaling based on A1B ozone fields (which already incorporated the temperature-dependent effect in the off-line modelling). IPSL-CM4/IPSL-CM4-LOOP and BCM2/BCM-C simulations used a fixed ozone climatology throughout all their simulations, INGVCE used the ozone distribution from 1860 to 2100 of Kiehl et al. (1999), and in CNRM-CM3.3 ozone was modelled as a prognostic variable.

For sulfate aerosols, most models used the concentration maps provided by the CTM (Boucher and Pham 2002), exceptions being HadGEM2-AO, HadCM3C and EGMAM+ which used their own aerosol transport schemes driven by geographical emissions. In the HadGEM2-AO and HadCM3C models, an explicit geographical representation of ship track emissions was used, but it assumed no change in ship tracks in the twenty-first century compared to present-day. Note that although the radiative forcing due to GHGs is constrained by the experimental design to be quite similar for a given scenario in all models, the forcing due to aerosols is less tightly constrained. This is probably the largest modelling uncertainty in the net forcing and an important contributory factor to the resulting spread in climate response. Given the same aerosol burden, the aerosol forcing effects vary due to their different representations in models, but the sulfate aerosol burden itself is an additional source of variation between models (Fig. 3). In particular, HadGEM2-AO and HadCM3C simulate systematically lower burdens than the CTM, while EGMAM+ simulates considerably higher burdens (more than double those in HadCM3C). Additionally, there are variations in the shape of the A1B peak and its subsequent decline.

Land-use changes were taken into account in most models according to the specified crop and pasture fraction variations, but omitted in HadCM3C, IPSL-CM4-LOOP, INGVCE, BCM2, and BCM-C due to the difficulty of integrating this forcing with dynamical vegetation within a terrestrial carbon cycle. Different (model-dependent) underlying land use maps and crop/pasture classifications in terms of plant functional types meant that implementing the associated forcing completely consistently was problematic. (ECHAM5-C is the only model to combine land use change with a terrestrial carbon cycle and therefore the only model able to report land use carbon emissions separately from energy emissions. In all other models, the allowable anthropogenic carbon emissions are implicitly a sum of land use and energy emissions.)

Solar and volcanic forcings were represented in only two of the 20C3M simulations used to initialise A1B and E1 simulations, namely those with IPSL-CM4-LOOP and BCM2.

Solar forcing was represented in both models via variations of the solar constant and thus the top of the atmosphere shortwave flux. The basic solar constant time series for the 20C3M simulation with IPSL-CM4-LOOP was the construction by Solanki and Krivova (2003), in which most of the total rise of about 1.5 W/m2 takes place in the period 1900–1950. The solar cycle and its variations over time were also included. In the BCM2 case, solar constant variations follow Crowley et al. (2003).

Volcanic radiative forcing was represented in IPSL-CM4-LOOP by an additional change of the solar constant (modelling the shortwave radiative effect only). The forcing variations follow an updated version of Sato et al. (1993) (using data obtained from http://data.giss.nasa.gov/modelforce/strataer/) in which aerosol optical depth τ was converted to radiative forcing F (W/m2) according to the relationship F = −23τ proposed by Hansen et al. (2005). In BCM2, the volcanic aerosol forcing time series follows Crowley et al. (2003), specifying monthly optical depths at 0.55 microns in four latitude bands (90°N–30°N, 30°N-equator, equator-30°S and 30°S–90°S). The aerosol loading was distributed in each model level in the stratosphere so that both the shortwave and longwave radiative responses are simulated (Otterå 2008).
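As a worked illustration of the optical-depth-to-forcing conversion described above, the following sketch applies F = −23τ. The second function is an assumption added for illustration only: it shows one way an equivalent change of the solar constant could be derived from a global mean forcing, using the standard geometric factor of 4 and a hypothetical planetary albedo of 0.3. The text does not state how the conversion was done in IPSL-CM4-LOOP.

```python
def volcanic_forcing_w_m2(aerosol_optical_depth, coefficient=-23.0):
    """Radiative forcing F (W/m2) from optical depth tau, using F = -23 * tau."""
    return coefficient * aerosol_optical_depth


def equivalent_solar_constant_change(forcing_w_m2, planetary_albedo=0.3):
    """Hypothetical conversion of a global mean forcing to a solar constant change.

    Assumes global mean absorbed shortwave = S * (1 - albedo) / 4, so that a
    forcing F corresponds to dS = 4 * F / (1 - albedo). The albedo value is an
    illustrative assumption, not taken from the model description.
    """
    return 4.0 * forcing_w_m2 / (1.0 - planetary_albedo)


# Example: a stratospheric optical depth of 0.1 gives a forcing near -2.3 W/m2.
forcing = volcanic_forcing_w_m2(0.1)
delta_s = equivalent_solar_constant_change(forcing)
```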

3 Results

As described in the previous section, a total of ten models were used to produce the simulations presented here. We consider that two pairs of models (IPSL-CM4/IPSL-CM4-LOOP and BCM2/BCM-C), which only differ with regard to inclusion of an integrated CC component, should not be regarded as independent models within the experiment. For those models we therefore use a half weight rather than a full weight when computing multi-model ensemble mean results in which both models of the pair contribute. For analysis which specifically concerns the CC response, we restrict attention to a sub-ensemble of 5 models in which each carries the same weight in computing ensemble means.
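The weighting rule described above can be sketched as follows. The function names are hypothetical; the half weight is applied only when both members of a pair are present in the set being averaged, following the wording of the text.

```python
# Pairs of models that differ only in the inclusion of an interactive CC.
PAIRS = [("IPSL-CM4", "IPSL-CM4-LOOP"), ("BCM2", "BCM-C")]

MODELS = ["HadGEM2-AO", "HadCM3C", "IPSL-CM4", "IPSL-CM4-LOOP", "ECHAM5-C",
          "EGMAM+", "INGVCE", "CNRM-CM3.3", "BCM2", "BCM-C"]


def ensemble_weights(models):
    """Return a dict of weights: 0.5 for paired models, 1.0 otherwise."""
    weights = {name: 1.0 for name in models}
    for first, second in PAIRS:
        if first in weights and second in weights:
            weights[first] = 0.5
            weights[second] = 0.5
    return weights


def weighted_ensemble_mean(values):
    """Weighted mean of a dict mapping model name to a scalar result."""
    weights = ensemble_weights(values.keys())
    total = sum(weights.values())
    return sum(weights[name] * values[name] for name in values) / total
```

With all ten models present the weights sum to 8, so each pair contributes as much as one independent model.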

3.1 Global climate response

3.1.1 Temperature and precipitation

Each model shows a significantly lower temperature response in E1 than A1B in the late twenty-first century (Fig. 5), but the multi-model ensemble spread leads to a slight overlap between the projected temperature rise for the most sensitive models’ E1 simulations (HadCM3C, IPSL-CM4) and the least sensitive models’ A1B simulations (EGMAM+, CNRM-CM3.3). The ensemble mean warming at 2100 relative to 1861–1890 is about 3.4 K for A1B and 1.8 K for E1. The spread is similar for A1B and E1 (~1.5 K at 2100) and consistent with that of AR4 AOGCMs driven with the SRES A2 concentration pathway (Fig. 10.20 of Meehl et al. 2007). The spread in temperature for the models with interactive carbon cycle is smaller than that seen in the C4MIP models (Friedlingstein et al. 2006). This is because the experimental design used in this study suppresses the carbon cycle feedback-driven temperature spread exhibited in C4MIP, in which carbon emissions rather than concentrations were specified, and which contributes to the temperature spread reported in AR4. Hence results obtained from concentration pathway experiments, such as those presented here, do not represent the full modelling uncertainty in temperature response for given emissions pathways.

Fig. 5

Global mean near surface temperature change, relative to 1861–1890, for historical (1860–2000) and subsequent A1B (top panel) and E1 (bottom panel) scenario simulations (2000–2100) for the contributing ES2 models. Each separate model curve is a simple average over all simulations by that model, tending to smooth some models more than others, and an 11-year running average is also applied to all curves. The overall ensemble mean weights models as described in the text
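The processing described in this caption (anomalies relative to 1861–1890 followed by an 11-year running average) can be sketched as below. This is a minimal illustration with hypothetical function names. The treatment of the series ends, where the window is incomplete, is not specified in the text; here those points are simply dropped.

```python
import numpy as np


def anomaly(series, years, ref_start=1861, ref_end=1890):
    """Subtract the mean over the reference period (inclusive bounds)."""
    series = np.asarray(series, dtype=float)
    years = np.asarray(years)
    in_ref = (years >= ref_start) & (years <= ref_end)
    return series - series[in_ref].mean()


def running_mean(series, window=11):
    """Centred running mean; returns only points with a complete window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")
```

For an annual series of length n the smoothed series has n − 10 points, the first one centred on the sixth year of the input.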

The two pairs of models that differ with regard to the inclusion of a CC component both show differences in their twentieth century temperature responses. In particular the temperature anomalies during the 1910–1960 period are more positive in IPSL-CM4-LOOP and BCM2 compared to IPSL-CM4 and BCM-C. The inclusion of solar and volcanic forcings during the twentieth century in IPSL-CM4-LOOP and BCM2 (Otterå et al. 2010) is the main factor contributing to these differences. However, the different twentieth century temperature responses also partly reflect internal decadal variability in the different simulations, and also the enhanced snow-albedo feedback in the IPSL-CM4-LOOP case. For both pairs of models, the temperature increases (either for A1B or E1) are similar during the twenty-first century. For the IPSL-CM4/IPSL-CM4-LOOP pair this reflects the fact that the difference in snow-albedo feedback between the two models slowly decreases as the high-latitude temperature difference between their corresponding scenario simulations progressively decreases.

In some models (e.g. HadCM3C, IPSL-CM4) the warming during the first half of the twenty-first century in the E1 scenario often exceeds that in the A1B scenario. This is mostly due to a reduction in the forcing effect of aerosols, i.e. a considerably stronger reduction in the aerosol cooling in E1 than in A1B. We demonstrate this for the IPSL-CM4 E1 simulation in comparison to the two variants of A1B (Fig. 6; cf. Fig. 3). The shortwave radiative forcings due to the sulfate aerosol direct and first indirect effects were calculated on-line in these simulations using two calls to the radiation scheme at each time step—firstly with the simulated aerosol distributions, and secondly with the pre-industrial aerosol distributions, as described by Dufresne et al. (2005). The difference in shortwave radiative fluxes at the top of the atmosphere defines the radiative forcing shown.

Fig. 6

Total radiative forcing in a cooling sense (W/m2) due to anthropogenic sulfate aerosol, diagnosed in the IPSL-CM4 model simulations, for the A1B, A1B-IMAGE and E1 scenarios

The most striking feature of the E1 results is that the simulations show a rapid change in the slope of the global mean temperature curve in mid century, around the time of the GHG concentration peak (Fig. 2). We interpret this not as a direct temperature response to GHG concentrations, but rather as a lagged response to the reduction of GHG emissions earlier in the century. Emissions in E1 rapidly diverge from A1B around 2010 and decline thereafter (Lowe et al. 2009), but the divergent concentration pathways that result in the twenty-first century do not have a strong impact on the temperature response in the first half of the century, which is determined to a significant extent by the committed response to past emissions. This is despite the fact that E1 concentrations are even lower than for the lowest SRES scenario (B1). Consistent with the AR4 results for SRES scenarios and our understanding of the inertia of the climate system, only in the second half of the twenty-first century do the divergent concentration pathways express themselves clearly in the temperature response, so that by 2070 there is a much clearer separation between A1B and E1 simulations. The delay of several decades from the start of rapid emissions reduction until the resulting temperature stabilization response is experienced is consistent with previous studies (e.g. van Vuuren et al. 2008) and is a key point for policymakers. It implies that the need for adaptation action in the next several decades will be mostly independent of the mitigation action.

Examination of the individual models’ E1 temperature time series (Fig. 5) reveals three broad groups of models, with characteristic responses in the second half of the century in which: (a) temperature continues to rise but the rate of increase slows down (HadCM3C, HadGEM2-AO, IPSL-CM4, ECHAM5-C, INGVCE, BCM2), (b) the temperature response approximately stabilizes (IPSL-CM4-LOOP, CNRM-CM3.3, BCM-C), and (c) the temperature response decreases slightly (EGMAM+).

The global annual precipitation time series (Fig. 7) show agreement between models for a gradual increase of precipitation toward the end of the twentieth century and in the twenty-first century in both scenarios, but with marked decadal variability in some cases and a considerable range across models. Not surprisingly, given the stabilization of temperatures at a lower level, the ensemble mean total precipitation increase in E1 is eventually lower than in A1B, but it exceeds that in A1B up until about 2065. The ensemble spread of global mean changes for the late twenty-first century overlaps more between A1B and E1 for precipitation than for temperature, consistent with the larger uncertainty of precipitation projections shown by Douville et al. (2006) for the CMIP3 models.

Fig. 7

As Fig. 5 but for global mean precipitation change relative to 1861–1890

The twenty-first century temperature and precipitation increases are consistent overall with the CMIP3 model ranges (see Fig. 10.5 of Meehl et al. 2007). The increase of precipitation for most models is linearly correlated to the temperature increase, as can be seen from a scatter diagram of precipitation versus temperature anomaly (Fig. 8). The slope of fitted linear relationships between temperature rise and precipitation change (the “hydrological sensitivity”) varies quite widely among the models employed here, from 0.78 to 2.13%/K for A1B and 1.89 to 2.69%/K for the E1 scenario. The ensemble mean for A1B (1.53%/K) is slightly lower than in the CMIP3 simulations (1.63%/K; Meehl et al. 2007, p. SM.10-4, Table S10.2). For E1, the hydrological sensitivity is significantly higher at 2.33%/K, similar to that seen in CMIP3 constant composition ‘Commit’ simulations (2.29%/K). (Note, however, that the linear regression method used here for computing trends over the entire twenty-first century differs from the method used in the CMIP3 results cited. Meehl et al. (2007) instead derived the hydrological sensitivity from global mean changes of 2080–2099 relative to 1980–1999. Applying the CMIP3 method makes only a small difference to the ensemble mean results: 1.59%/K for A1B; 2.28%/K for E1.)
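The two ways of estimating hydrological sensitivity mentioned above (a linear regression over the whole period, and a ratio of period-mean changes) can be compared with a short sketch. Function names and the input layout (annual global means, precipitation already expressed as a percentage change) are assumptions made for illustration.

```python
import numpy as np


def regression_sensitivity(temp_k, precip_pct):
    """Slope (%/K) of a least-squares line of precipitation change on warming."""
    slope, _intercept = np.polyfit(np.asarray(temp_k, dtype=float),
                                   np.asarray(precip_pct, dtype=float), 1)
    return slope


def period_difference_sensitivity(years, temp_k, precip_pct,
                                  early=(1980, 1999), late=(2080, 2099)):
    """Ratio (%/K) of late-minus-early period means, bounds inclusive."""
    years = np.asarray(years)
    temp = np.asarray(temp_k, dtype=float)
    precip = np.asarray(precip_pct, dtype=float)
    is_early = (years >= early[0]) & (years <= early[1])
    is_late = (years >= late[0]) & (years <= late[1])
    d_precip = precip[is_late].mean() - precip[is_early].mean()
    d_temp = temp[is_late].mean() - temp[is_early].mean()
    return d_precip / d_temp
```

For a perfectly linear relationship the two estimates coincide; they differ when the precipitation response is non-linear in temperature or when decadal variability affects the two averaging periods.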

Fig. 8

Scatter diagrams of global annual mean precipitation versus temperature anomalies relative to the 1980–1999 period for scenarios 20C3M (blue), A1B (red) and E1 (green) for the contributing ES2 models (IPSL-CM4 and IPSL-CM4-LOOP; and BCM2 and BCM-C results are combined). Best linear fits to A1B and E1 results for each model are illustrated

In contrast to the good agreement with CMIP3, the multi-model range of twenty-first century temperature increase for A1B in our study (1.76–3.28 K; see Table 2 later) is considerably different to that of the S2009 study (3.50–7.37 K for the end of the twenty-first century relative to 1990; 5–95% percentiles). Interpretation of this difference is not straightforward because ES2 results are a discrete set of model samples whereas S2009 results are a fitted probability density function. Also, S2009 (in a similar way to C4MIP) permits CC-climate feedbacks such that the median radiative forcing is considerably higher than A1B and more similar to the SRES A1FI scenario. The absence of any overlap between the two projected temperature response ranges is rather surprising, but one possible interpretation is that the ES2 comprehensive models exhibit systematically lower climate sensitivity (S) than the S2009 model over its range of sampled input parameters. There is some evidence to this effect from Fig. 2 of S2009, which shows that fits to AR4 comprehensive models (an analogy for ES2 models) generally require S values towards the lower end of the S2009 model’s distribution.

Table 2 Changes in global annual mean temperature (T), precipitation (PR), cloud radiative forcing (SWCRF—shortwave, LWCRF—longwave, and CRF—net), atmospheric net radiative flux divergence (AA) and net surface energy flux (FS) for the decade 2090–2099 in the A1B and E1 scenarios relative to 1990–1999

Despite its new elements, the ES2 experimental design is somewhat limited in scope compared with CMIP3 or CMIP5 experiments and this restricts our ability to diagnose the physical climate feedbacks and mechanisms leading to the range of temperature and precipitation responses seen in the ES2 models and hence to further understanding of the causes of the spread in these model results beyond the state of knowledge in the IPCC AR4 report (Randall et al. 2007; Meehl et al. 2007). In particular direct diagnosis of S to pure GHG forcing is not possible with the ES2 simulations currently available (see Footnote 3). Instead of S, we examine changes in cloud radiative forcing (CRF) and its shortwave (SWCRF) and longwave (LWCRF) components, which are potentially related to S. Changes in atmospheric radiative flux divergence (AA) and surface net energy flux (FS) are also of interest in relation to temperature (T) and hydrological (PR) responses. In Tables 2 and 3 we therefore document global mean changes in these quantities for the A1B and E1 scenarios for each model, and linear correlations over the ensemble between changes in T/PR and the other variables.

Table 3 r² for linear correlations between X and Y, where X = T or PR and Y = other variables, for the combined A1B and E1 results in Table 2 (weighting all models equally)

The mean response in A1B and E1 for 2090–2099 relative to 1990–1999 (Table 2) shows no discernible linear relationship between CRF and T (r² = 0.00; Table 3) when data from the A1B and E1 scenarios are combined. LWCRF is more strongly related to both T (r² = 0.38) and PR (r² = 0.48) than are SWCRF or CRF. The strongest correlations exist between FS and T (r² = 0.70) and between AA and PR (r² = 0.76), whereas correlations between FS and PR (r² = 0.34) and AA and T (r² = 0.25) are much weaker.
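The squared linear correlation used in Table 3 can be computed as in the sketch below, in which each model and scenario contributes one equally weighted point. The function name is hypothetical, and the arrays in the usage note are placeholders, not values from Table 2.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two equally weighted samples."""
    r = np.corrcoef(np.asarray(x, dtype=float),
                    np.asarray(y, dtype=float))[0, 1]
    return r ** 2
```

For example, `r_squared([1, 2, 3, 4], [2, 4, 6, 8])` is 1 to within rounding for these perfectly correlated placeholder data, while `r_squared([1, 2, 3, 4], [1, -1, -1, 1])` is 0 because the second series has no linear dependence on the first.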

An examination of changes in atmospheric energy budget terms related to the hydrological response, namely atmospheric net radiative absorption (∂AA) and surface sensible heat flux (∂SH), suggests that the majority (two-thirds or more) of the ensemble spread in hydrological response relates to ∂AA and only a third or less to ∂SH (Fig. 9; note the different vertical scales for ∂AA and ∂SH), where ∂ denotes change relative to the 1861–1890 period. ∂SH in general responds smoothly to global warming, but (as for ∂PR and ∂AA) not consistently in terms of sign. ∂AA in part reflects a smooth radiative response to GHG changes and increasing water vapour (Mitchell et al. 1987) along with associated feedbacks, but also reflects a faster timescale response to aerosol-related forcing (Ming et al. 2010). Although the representation of cloud-radiative feedbacks is part of the modelling uncertainty in ∂AA, given the relatively low correlation between the precipitation response and SWCRF or LWCRF we suggest that the radiative response to aerosol forcings is probably a more significant factor contributing to the modelling uncertainty in ∂AA (and hence the hydrological response) in our results. Decadal variability in ∂AA (Fig. 9) very closely matches that in ∂P (Fig. 7) in most models. Further evidence supporting the role of aerosols in the modelling uncertainty is that around 2020 the spread in ∂AA is smaller in E1 than in A1B.

Fig. 9

As Fig. 5 but for changes relative to 1861–1890 in atmospheric net radiative absorption (∂AA; top), and surface sensible heat flux to the atmosphere (∂SH; bottom) inferred as a residual from the atmospheric energy budget ∂SH = −(L∂P + ∂AA), where L is the specific latent heat of water (= 2.26e6 J/kg) and ∂P the change in precipitation (cf. Fig. 7). The assumed value for L neglects additional latent heating associated with frozen precipitation (a minor correction, of order 1%). Note the magnified vertical scale for ∂SH compared to ∂AA (Data from only three of the six ECHAM5-C 20C3M and A1B simulations were used for this figure)
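The residual calculation in this caption can be written out as a short sketch. It assumes that precipitation changes are given in mm/day (as elsewhere in the text) and that ∂AA is in W/m2. The unit conversion uses the fact that 1 mm of water over 1 m2 has a mass of 1 kg. Function names are hypothetical.

```python
LATENT_HEAT_J_PER_KG = 2.26e6   # value of L quoted in the caption
SECONDS_PER_DAY = 86400.0


def latent_heating_w_m2(dp_mm_per_day):
    """Convert a precipitation change (mm/day) to latent heating L*dP (W/m2)."""
    # 1 mm/day equals 1 kg m-2 day-1; divide by seconds per day for kg m-2 s-1.
    return LATENT_HEAT_J_PER_KG * dp_mm_per_day / SECONDS_PER_DAY


def sensible_heat_residual_w_m2(dp_mm_per_day, d_aa_w_m2):
    """Residual dSH = -(L*dP + dAA) from the atmospheric energy budget."""
    return -(latent_heating_w_m2(dp_mm_per_day) + d_aa_w_m2)
```

A precipitation increase of 0.1 mm/day corresponds to about 2.6 W/m2 of additional latent heating, which illustrates why changes of a few W/m2 in ∂AA are sufficient to dominate the hydrological response.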

In HadGEM2-AO, global precipitation decreases significantly in the first three decades of the twenty-first century in the A1B case, which delays the overall increase. HadGEM2-AO behaves more consistently with other models in the E1 scenario, though its precipitation versus temperature relationship is markedly more non-linear than other models. The lag in A1B can be explained on the basis of the strong correlation between ∂P and ∂AA and the fact that in HadGEM2-AO ∂AA exhibits two peaks (Fig. 9), the first in the late twentieth century (1950–1980) and the second in the early twenty-first century. The latter corresponds with the 2020 peak in A1B aerosol emissions, which has no counterpart in A1B-IMAGE or E1 emissions (Fig. 3). The aerosol-induced response in ∂AA is (apparently) much more pronounced in HadGEM2-AO than the other models, which may be attributable to the inclusion of black carbon and biomass aerosol or indirect sulfate aerosol (cloud lifetime) effects, although this requires further investigation. HadGEM2-AO therefore exhibits a low hydrological response in A1B (0.052 mm/day) but a high response in E1 (0.106 mm/day) for the period 2090–2099. The marked early twenty-first century divergence between the aerosol burden in A1B and E1 is probably a key factor in the more rapid rise in precipitation in E1 compared to A1B up until 2060, consistent with previous findings that, for the same total radiative forcing, aerosol-induced forcing tends to exhibit a stronger hydrological response than GHG-induced forcing (Feichter et al. 2004), and that aerosols (particularly absorbing aerosols such as black carbon) promote a fast hydrological forcing/response mechanism (Andrews et al. 2010).

Lastly we examine a composite picture of the A1B and E1 global carbon emissions, equivalent CO2 concentrations applied to the models, and multi-model ensemble mean temperature and precipitation responses (Fig. 10). The relatively higher global warming response in E1 compared to A1B over the period 2000–2040 is again apparent, for the reasons already discussed. The peak in E1 carbon emissions occurs in 2015, but the corresponding peak in concentrations occurs 30 years later (2045) and the change towards approximate stabilization of global warming occurs shortly after that (2050). Beyond 2050, only about 0.1 K of additional warming occurs up to 2100. In contrast, the emissions peak in A1B occurs 40 years later than in E1 at 2055, at almost double the level of emissions, such that concentrations are still far from stabilization by 2100 and rapid warming ensues. The warming rate increases in mid-century before slowing to approach a steady 0.28 K/decade at 2100. There is a change of slope in the precipitation response in E1 around 2050 as for temperature, but rather than flattening out precipitation continues to increase steadily, at a rate faster than is consistent with a constant global hydrological sensitivity. There is even some acceleration after 2080. The A1B precipitation response remains below E1 between 2000 and 2065, 25 years longer than for temperature. These features indicate a relatively longer adjustment timescale for global precipitation compared to temperature.

Fig. 10

Superimposition of (scaled, see legend) global annual mean nominal carbon emissions (green, taken from the original IPCC SRES A1B marker and IMAGE E1 scenarios), CO2-equivalent concentration (black), ensemble mean simulated warming (red) and precipitation change (blue) for the A1B and E1 scenarios. Global warming and precipitation changes are relative to 1861–1890 and smoothed as in Figs. 5 and 7

3.1.2 Allowable global carbon emissions

The allowable anthropogenic carbon emissions in the ENSEMBLES S2 experiment from the five different global models that include an interactive carbon cycle are illustrated in Fig. 11. Large interannual variability which was evident in the results has been reduced by applying an 11-year running average to each model’s allowable emissions (and to the atmospheric CO2 change, for consistency). In one case (ECHAM5-C) the results shown also represent an ensemble mean of 6 or 3 simulations depending on scenario, which further reduces the variability. Considerable (unforced) variability nonetheless remains in the allowable emissions with a range of periodicities, particularly in the case of HadCM3C model results. This variability arises predominantly from variations in the flux of carbon to the atmosphere from the land rather than the ocean (Fig. 12).

Fig. 11

Implied (“allowable”) anthropogenic net carbon dioxide emissions to the atmosphere (GtC/yr) in ES2 runs, diagnosed from the imposed change in atmospheric CO2 concentrations and the modelled net carbon flux exchange between the atmosphere and land surface and ocean. An 11-year running average is applied to all curves, including concentration changes. ECHAM5-C results show an ensemble mean of 6 (20C3M + A1B) and 3 (E1) independent simulations, tending to smooth those results compared with other models. ENSEMBLE MEAN curve weights each independent model equally. Corresponding SRES A1B (Nakicenovic et al. 2000) and IMAGE E1 scenario values (the sum of fossil fuel plus land use change emissions; every 5 years from 1970 to 2100 for E1, and every 10 years from 2000 to 2100 for A1B) are shown as symbols for comparison

Fig. 12

Net carbon flux exchange between the atmosphere and land surface (top panel) and ocean (bottom panel), with an 11-year running average applied to all curves as in Fig. 11
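The diagnosis described in the caption of Fig. 11 can be sketched in a few lines. This is a minimal illustration rather than the processing actually used for the ES2 runs: the function names, the sign convention (uptake by land and ocean counted as positive) and the conversion factor of roughly 2.12 GtC per ppm of CO2 are all assumptions made for the example.

```python
# Sketch only: names, sign convention and conversion factor are assumptions.
PPM_TO_GTC = 2.12  # approximate mass of carbon (GtC) per 1 ppm of atmospheric CO2


def running_mean(values, window=11):
    """Centred running mean; returns only the points that have a full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    n = len(values) - window + 1
    return [sum(values[i:i + window]) / window for i in range(max(n, 0))]


def allowable_emissions(co2_ppm, land_uptake, ocean_uptake):
    """Diagnose implied anthropogenic emissions (GtC/yr), one value per year step.

    co2_ppm      : prescribed annual-mean CO2 concentrations (N values)
    land_uptake  : net atmosphere-to-land flux, GtC/yr, positive = sink (N-1 values)
    ocean_uptake : net atmosphere-to-ocean flux, GtC/yr, positive = sink (N-1 values)
    """
    if not (len(land_uptake) == len(ocean_uptake) == len(co2_ppm) - 1):
        raise ValueError("flux series must have one value per concentration step")
    # Year-to-year growth of the atmospheric carbon store implied by the
    # prescribed concentrations.
    growth = [(b - a) * PPM_TO_GTC for a, b in zip(co2_ppm, co2_ppm[1:])]
    # Emissions must supply the atmospheric growth plus whatever the land
    # and ocean take up.
    return [g + l + o for g, l, o in zip(growth, land_uptake, ocean_uptake)]
```

Applying `running_mean` with `window=11` to the output mirrors the smoothing described for Figs. 11 and 12; a negative result corresponds to the small anthropogenic sink discussed for HadCM3C under E1.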

For the historical (20C3M) simulations of those models that use an interactive CC, the net carbon flux to the atmosphere shows generally consistent behaviour across the model ensemble (Fig. 11; see Footnote 4). There is a rising trend, with anthropogenic emissions reaching between 6 and 9 GtC/yr in 2000, reasonably consistent with the observed carbon budget estimate (Le Quéré et al. 2009) of around 8.1 GtC/yr (the sum of 6.7 GtC/yr from fossil fuels plus cement manufacture and 1.4 GtC/yr from land use emissions). This is also consistent with the data used by the IMAGE model for 2000. Taking account of their internal variability, all models agree that the combined carbon sinks remove about half of net anthropogenic emissions from the atmosphere at 2000, consistent with estimates of the current airborne fraction of total CO2 emissions (e.g. Le Quéré et al. 2009).

In the future projections there is a much greater spread within the model ensembles, particularly for A1B in mid-century. Remarkably, two models (IPSL-CM4-LOOP and ECHAM5-C) agree consistently not just in the historical period but also throughout both A1B and E1 scenarios despite differences in model formulation (for instance, one includes land use carbon emissions explicitly while the other does not). This suggests that the representation of the carbon cycle response in terms of net sinks is very similar in these models. There is some overlap between the multi-model ensemble allowable emissions for A1B and E1 in the early twenty-first century, but a clear separation occurs by 2030. A pattern emerges in which the two consistent models (IPSL-CM4-LOOP and ECHAM5-C) predict the largest net sinks and hence imply the highest allowable anthropogenic emissions (peaking at around 17 GtC/yr in 2050 and falling to ~12 GtC/yr in 2100 for A1B), while HadCM3C implies the lowest allowable emissions (only ~10 GtC/yr in 2050, falling to ~8 GtC/yr in 2100 for A1B), with the INGVCE and BCM-C models falling in between (INGVCE closest to HadCM3C). The same model ordering is seen in the E1 scenario results but in this case the allowable emissions at 2050 range from +4 GtC/yr to near zero, a reduction below 1960 levels at least. HadCM3C, in which net carbon uptake by the land and ocean reduces considerably as the atmospheric carbon dioxide concentration stabilizes, again implies the lowest allowable emissions. Our result (e.g. Fig. 12) is consistent with an earlier feedback study by Friedlingstein et al. (2006), which indicated that the HadCM3LC and IPSL-CM4-LOOP models have the highest and lowest reductions in terrestrial carbon uptake due to global warming amongst the C4MIP models.

To achieve the E1 scenario concentration pathway the models agree that allowable anthropogenic emissions must reach close to zero by 2100, and according to HadCM3C even slightly negative carbon emissions (i.e. a small anthropogenic sink) may be needed (Fig. 11). This future implied need for near zero emissions is consistent with previous work, typically using simpler climate models of intermediate complexity (e.g. Matthews and Caldeira 2008; Plattner et al. 2008).

The multi-model ensemble mean allowable emissions are in good agreement with the anthropogenic CO2 emissions (accounted as fossil fuels plus land use emissions) projected with the IMAGE IAM for E1. The latter fall within the ensemble spread and generally within 1 GtC/yr of the ensemble mean (Fig. 11). This implies that the carbon cycle representation in the IAM used to generate E1 is consistent with the more complex ESMs used to assess its implied emissions. There is, however, a broad ensemble spread around the central value, which should be taken into account when designing policy that aims for the kind of targets exemplified by the E1 scenario. As previously noted, with the exception of ECHAM5-C, the models do not account for anthropogenic land use change separately from energy emissions in their diagnosed land-to-atmosphere carbon fluxes (Fig. 12), but IMAGE does. (It is likely that improvements will be implemented in models taking part in the next IPCC assessment to allow more explicit accounting of land use change emissions within the total allowable emissions.)

Focusing on the land-to-atmosphere flux, in E1 the ensemble mean land sink approaches zero by the end of the twenty-first century, and in HadCM3C actually becomes a net source of atmospheric carbon in the second half of the twenty-first century. Such behaviour was seen previously in the HadCM3LC model (Friedlingstein et al. 2006) but here the sink-to-source transition happens at a lower temperature rise. The model spread is also noticeably greater in the A1B experiment, with three models projecting a significant but decreasing land carbon sink toward 2100 while two models actually indicate a land carbon source.

The ocean remains a sink throughout the experiments in both A1B and E1, but the sink strength decreases following a maximum at around 2015 in E1 (Fig. 12), whereas in A1B it continues to grow until 2050 before declining. There is a much lower ensemble spread for the ocean than for the land carbon fluxes, and the former are also much smoother in time. Note that the BCM-C and ECHAM5-C models project very similar oceanic carbon uptake to each other for both A1B and E1, using similar ocean carbon cycle models but with major differences in their physical ocean GCM components.

Carbon fluxes in the IMAGE E1 scenario itself (Fig. 12; symbols) are again within the spread of the multi-model ensemble and in good agreement with the ensemble mean fluxes.

In summary, the results all agree that the ocean is already a significant net carbon sink and will remain so throughout the twenty-first century in both the A1B and E1 scenarios. The ensemble spread and temporal variability in the land carbon fluxes make this the main source of uncertainty in the carbon budget, and hence the allowable anthropogenic emissions. Three models project a decreasing but still significant carbon sink in A1B towards 2100, while two models project a growing land carbon source. Somewhat closer agreement occurs for E1 (most likely due to the smaller climate deviation from the current situation), the ensemble mean showing a tendency towards a neutral land carbon budget at 2100, but still with wide variations between models.

Changes in global allowable anthropogenic emissions consistent with the E1 pathway can be deduced (Table 4). For the ensemble mean, with each model’s change measured relative to its own emissions pathway, the results indicate that some increase in emissions is still permitted at 2020 relative to a 1990 baseline (30% increase), but a greater than 50% reduction is required by 2050, rising to 74% by 2080. Relative to a 2005 baseline, only a small rise in emissions (5% increase) is permitted by 2020, while the mean reduction in allowable global emissions exceeds 60% by 2050, reaching almost 80% by 2080. The multi-model results are broadly consistent with those from the IMAGE IAM that generated E1 and also with previous estimates from EMICs (Plattner et al. 2008). This consistency supports the idea that the indicated emission reductions are needed, although the spread in the current model results presumably underestimates the full range of uncertainty in land and ocean carbon exchanges. In other words, the current IMAGE model has already been calibrated in such a way that it is able to represent the mean of the more complex models, and there is no immediate reason to assume that meeting climate goals would need greater or lesser emissions reductions than indicated in the results of this model. The results of Table 4 can additionally be used to help assess whether IMAGE (and other IAMs) can also represent the complex models’ uncertainty range.

Table 4 Ensemble mean and range of allowable global carbon emissions for the E1 scenario, as changes relative to 1990 and 2005 baseline years, for the five ES2 models including an interactive CC and the IMAGE 2.4 model
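The baseline-relative changes summarised in Table 4 rest on simple arithmetic, sketched below under stated assumptions. The model names and emission values are hypothetical placeholders, not Table 4 data, and the sketch assumes that each model's change is computed against its own baseline before averaging across models, which is one reading of the description in the text.

```python
# Sketch only: model names and emission values are hypothetical, not Table 4 data.

def percent_change(emissions, baseline_year, target_year):
    """Percentage change in emissions at target_year relative to baseline_year."""
    base = emissions[baseline_year]
    if base == 0:
        raise ValueError("baseline emissions must be non-zero")
    return 100.0 * (emissions[target_year] - base) / base


def ensemble_mean_change(models, baseline_year, target_year):
    """Mean over models of each model's change relative to its own baseline."""
    changes = [percent_change(series, baseline_year, target_year)
               for series in models.values()]
    return sum(changes) / len(changes)


# Hypothetical allowable emissions (GtC/yr) keyed by year for two made-up models.
hypothetical_models = {
    "model_a": {1990: 8.0, 2020: 10.0, 2050: 4.0},
    "model_b": {1990: 5.0, 2020: 6.0, 2050: 2.0},
}

change_2020 = ensemble_mean_change(hypothetical_models, 1990, 2020)
change_2050 = ensemble_mean_change(hypothetical_models, 1990, 2050)
```

Averaging the per-model percentages, rather than taking the percentage change of the ensemble-mean emissions, prevents models with larger absolute emissions from dominating the result. The two approaches generally give different numbers.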

3.2 Regional climate response

The late twenty-first century ensemble mean surface air temperature (SAT) response (Fig. 13, upper panels) illustrates the familiar Northern Hemisphere high-latitude enhancement of warming, and marked land-sea contrast (note that land-sea contrast is present in both A1B and E1, but less apparent in E1 in Fig. 13 due to the contours used). The intra-ensemble spread in SAT (Fig. 13, lower panels) is largest in the Arctic Ocean, Labrador Sea, Amazonia and Southern Ocean, and reflects the land-sea contrast in the mean response.

Fig. 13

Mid and late twenty-first century weighted ensemble mean surface air temperature (SAT) response (upper panels) and the intra-ensemble spread in SAT (lower panels)

The ensemble mean ratio of E1 to A1B SAT response in the mid and late twenty-first century (Fig. 14) demonstrates that only a small fraction (<10%) of the baseline A1B warming response is mitigated by mid century through the adoption of the E1 scenario over most of the globe, but a much larger fraction (30–70%) is mitigated by the end of the century. This is a further illustration, on a regional scale, that the benefits of mitigation in temperature terms will not be realised until several decades after emissions reductions begin.

Fig. 14

The weighted ensemble mean ratio of E1 to A1B SAT response in the mid and late twenty-first century (defined as 2030–2049 and 2080–2099 respectively relative to 1980–1999)
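The link between the ratio plotted in Fig. 14 and the "mitigated fraction" quoted in the text is that the mitigated fraction equals one minus the E1-to-A1B ratio. A minimal sketch follows; the function name and the `min_signal` cut-off, which masks grid cells where the A1B response is too small for the ratio to be meaningful, are assumptions for illustration only.

```python
# Sketch only: function name and the min_signal threshold are assumptions.

def mitigated_fraction(e1_response, a1b_response, min_signal=0.1):
    """Fraction of the A1B warming avoided under E1, cell by cell.

    Both inputs are SAT changes (K) over the same period and baseline.
    Cells where |A1B response| < min_signal are returned as None because
    the ratio is ill-defined there.
    """
    result = []
    for e1, a1b in zip(e1_response, a1b_response):
        if abs(a1b) < min_signal:
            result.append(None)
        else:
            result.append(1.0 - e1 / a1b)
    return result


# Illustrative values: an E1 response of 1.5 K against an A1B response of
# 3.0 K means half of the A1B warming is mitigated at that location.
example = mitigated_fraction([1.5, 0.9, 0.0], [3.0, 1.0, 0.05])
```

A ratio above 0.9 therefore corresponds to less than 10% mitigation, and a ratio between 0.3 and 0.7 corresponds to the 30–70% range quoted for the end of the century.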

An examination of the weighted number of models (see Footnote 5) exceeding local 2 K or 4 K warming thresholds in the twenty-first century relative to pre-industrial (1861–1890 baseline) highlights the similarities and large differences in the two scenario cases (Fig. 15; results are illustrated for two contrasting time horizons corresponding to the mid and late twenty-first century). In A1B almost all land area exceeds 2 K warming in all models in the late twenty-first century, but for most land areas some models indicate a warming below 2 K according to the E1 results. Even so, a majority of continental land, including Eurasia, Canada, the USA, Amazonia, much of N. Africa, parts of S. Africa and N. Australia exhibits local warming of >2 K in a large fraction of models (orange/red colours), even with the E1 mitigation scenario. The number of models with warming above 4 K greatly reduces in E1 compared to A1B beyond the 2030–2049 time horizon, but more than half still exceed 4 K for the Arctic, and more than a quarter in part of the Weddell Sea, on the 2080–2099 time horizon. Warming above 4 K is also projected for Amazonia in a single model (HadCM3C), consistent with a strong Amazonian die-back response to global warming seen previously in HadCM3LC (Cox et al. 2000), which leads to amplified warming over this region.

Fig. 15

Weighted number of models that simulate a local warming exceeding 2 K (top panels) and 4 K (bottom panels) for time horizons in the mid and late twenty-first century (2030–2049 and 2080–2099) relative to pre-industrial climate (1861–1890) for A1B (left) and E1 (right). Orange to red colours (dark blue colours) indicate a predominant signal for exceeding (not exceeding) the local threshold
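The threshold-exceedance count shown in Fig. 15 can be illustrated for a single grid cell as follows. The weighting scheme itself is defined in Footnote 5 and is not reproduced here, so the weights, model labels and warming values below are hypothetical; the sketch simply assumes that each model carries a non-negative weight and that the result is expressed as a weighted fraction of the ensemble.

```python
# Sketch only: weights, model labels and warming values are hypothetical.

def weighted_exceedance(warming_by_model, weights, threshold=2.0):
    """Weighted fraction of models whose local warming exceeds `threshold` (K)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    exceeding = sum(w for model, w in weights.items()
                    if warming_by_model[model] > threshold)
    return exceeding / total


# Local warming (K) relative to pre-industrial at one grid cell.
warming = {"model_a": 2.5, "model_b": 1.8, "model_c": 4.2, "model_d": 1.9}
# Example weights: two closely related models share one model's worth of weight.
weights = {"model_a": 1.0, "model_b": 1.0, "model_c": 0.5, "model_d": 0.5}

frac_above_2k = weighted_exceedance(warming, weights, threshold=2.0)
frac_above_4k = weighted_exceedance(warming, weights, threshold=4.0)
```

Repeating the calculation at every grid cell gives a map of the kind shown in Fig. 15; the same function with a threshold of zero applied to precipitation change would give the count of models simulating an increase.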

There are strong geographical variations in the ensemble annual precipitation response, whether measured in terms of ensemble mean precipitation change (Fig. 16) or the number of models predicting a mean increase or decrease (Fig. 17), with a marked contrast between regions that will experience a precipitation increase (mostly near the equator and at high latitudes) and other regions experiencing a precipitation decrease (mostly around the subtropics and low mid-latitudes). While the broad scale pattern of annual precipitation increase/decrease remains similar in E1 compared to A1B at the end of the twenty-first century, the geographical extent and magnitude of drying in S. Europe, the Mediterranean and N. Africa is much stronger in A1B (Fig. 16). In regions where seasonal changes are of opposing sign the annual mean change may appear comparatively weak, although these changes may have a significant impact on the hydrological cycle. This is particularly true for transition zones between regions with predominant increases and regions with predominant decreases, such as the mid-latitudes. Detailed analysis of seasonal and monthly variations of changes in precipitation and other components of the hydrological cycle is beyond the scope of this paper but will be presented in a separate paper (H. Huebener, personal communication).

Fig. 16

Weighted ensemble mean annual mean percentage precipitation change for 2080–2099 relative to 1980–1999 for A1B (top) and E1 (bottom)

Fig. 17

Weighted number of models that simulate an annual mean precipitation increase for 2080–2099 relative to 1980–1999 for A1B (top) and E1 (bottom). Blue to purple colours indicate a predominance of increased precipitation while yellow to red colours indicate a predominance of drying

4 Concluding discussion

The ES2 study presented here firstly provides a new set of comprehensive simulations following the Hibbard et al. (2007) experimental design, in which models including carbon cycle components provide estimates of allowable carbon emissions; secondly, it applies this design to an aggressive climate change mitigation pathway leading towards eventual stabilization. These features are similar to those of the long term experimental component of CMIP5. For the first time we have been able to compare the link between carbon emissions and atmospheric concentrations across a set of comprehensive GCMs for a policy-relevant aggressive mitigation scenario.

The choice of the low stabilization E1 scenario in this study makes the results particularly relevant in the mitigation debate around the Copenhagen Accord and its 2 K global warming target. We have examined whether the E1 scenario, designed as a 2 K stabilization scenario using an IAM, also behaves as such in comprehensive climate models. The comprehensive model simulations show that during the second half of the twenty-first century, global warming is significantly reduced in E1 compared to A1B and in six out of the ten models the simulated global mean warming remains below 2 K, most of this warming having been realized before 2050.

In addition to quantifying the global average response to a low stabilization scenario, the results can inform regional adaptation studies. Regional changes rather than simply global average warming are clearly important in this context, because a global 2 K target may allow for regional changes above 2 K. That this is indeed the case is demonstrated by the E1 scenario results, in which most of the models (>75%) predict that a 2 K warming will be exceeded locally over most of the land surface area (Fig. 15).

The ES2 ensemble results for global mean temperature and precipitation changes are generally consistent with the ranges reported in the IPCC AR4 report (Randall et al. 2007; Meehl et al. 2007) for the A1B scenario. By analogy with the higher rate of global mean precipitation increase per degree temperature rise for B1 compared to A1B in the CMIP3 study (Meehl et al. 2007), we find that the quotient in E1 exceeds that for A1B in this study, but by a larger factor than for B1 compared to A1B in the CMIP3 study. Although the relationship is close to that found in CMIP3 for A1B, the results suggest an even stronger hydrological cycle response per degree of warming through the twenty-first century for the E1 aggressive mitigation scenario than in the constant composition commitment experiment of CMIP3. Given the importance of the impacts of regional hydrological changes, this is an important point for future study. We also conclude that aerosol effects are a major contributory factor to the model spread of hydrological responses.
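The quotient referred to above, global mean precipitation change per degree of global mean warming, can be sketched as a simple ratio. The numbers below are hypothetical and chosen only to show the comparison between scenarios; they are not ES2 or CMIP3 results, and the function name is an assumption.

```python
# Sketch only: values are hypothetical, not ES2 or CMIP3 results.

def precip_per_degree(precip_change_percent, warming_k):
    """Global mean precipitation change (%) per kelvin of global mean warming."""
    if warming_k == 0:
        raise ValueError("warming must be non-zero")
    return precip_change_percent / warming_k


# Hypothetical end-of-century changes for a mitigation and a baseline scenario.
mitigation = precip_per_degree(precip_change_percent=4.0, warming_k=2.0)  # %/K
baseline = precip_per_degree(precip_change_percent=6.0, warming_k=4.0)    # %/K

# Factor by which the mitigation-scenario quotient exceeds the baseline one.
quotient_ratio = mitigation / baseline
```

A ratio above one indicates a stronger hydrological response per degree of warming in the mitigation scenario, even though the absolute precipitation change is smaller.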

The results imply (Figs. 11, 12) that in order to follow the E1 pathway during the twenty-first century, global carbon emissions must peak before 2020, and decline rapidly thereafter towards a level near zero by 2100. Combined land and ocean carbon uptake is projected to be reducing at this time. We note that our results indicate that the IAM calibration for the carbon cycle places it close to the mean of the comprehensive climate models’ results. The results could also be used as a basis to explore the uncertainty ranges within the IAM.

In conclusion, based on results from these comprehensive modelling tools (including the IMAGE IAM, used to construct the E1 scenario), it appears feasible to limit global warming to the 2 K policy target, but only with rapid de-carbonisation during the coming century. It is important to remember that we have only examined one possible concentration pathway for mitigation, so the allowable carbon emissions at 2020, 2050, etc. to keep warming below the 2 K target are not necessarily fully constrained by the estimates in this study.

A number of issues were encountered in designing and realising the ES2 experiments, from which scientific and practical lessons can be drawn or which prompt future investigation:

1. The specified CO2 concentration growth rates are not very smooth in the historical period (most evident in Fig. 11 up to c. 1940). In applying the Hibbard et al. (2007) design to the 20C3M simulation one effectively imposes these natural carbon cycle concentration oscillations onto the carbon cycle in models with an interactive CC. This observed CO2 variability is attributable, to a certain extent, to observed climate variability such as ENSO and NAO (e.g. Heimann and Reichstein 2008) and is therefore inconsistent with the internal climate variability simulated by the models. This inconsistency could be addressed in the CMIP5 experiment by using a smoothed (or IAM-generated) CO2 concentration profile for the 20C3M period.

2. Sulfate aerosol burdens in A1B and E1 simulations diverge strongly in the early twenty-first century. The E1 scenario follows, as a side-effect of its mitigation policies, a much lower sulfur burden trajectory, but this divergence is also partly caused by lower sulfur emissions in the baseline scenario (A1B-IMAGE) from which E1 is derived. E1 therefore (counter-intuitively) warms relative to A1B in the early twenty-first century as the total forcing from GHGs plus aerosols is higher. Additionally, the precipitation increases more rapidly in E1 than in A1B until 2060. This is consistent with the lower sulfur burden and an enhanced hydrological response associated with aerosol-induced forcing compared to GHG-induced forcing (Feichter et al. 2004). In practice, this means that to interpret the early twenty-first century response in A1B and E1, results are needed from a corresponding ensemble of A1B-IMAGE simulations. However, due to the computational expense, the A1B-IMAGE scenario has only been run with a few models as an intermediate simulation; preliminary results (not shown) suggest that A1B-IMAGE simulations would follow the corresponding E1 temperature and precipitation responses more closely in the early twenty-first century.

3. Sulfate aerosol forcing is applied via emissions in a few models, but as concentrations in others, so this important forcing element is not as tightly controlled as the GHG forcing (via concentrations) in the experimental design. In combination with model-to-model variations in the representation of aerosol physics and their radiative forcings, the overall effect of aerosols therefore represents an important source of uncertainty in the results. Our results indicate that modeling uncertainties affecting changes in the atmospheric radiation budget (the direct response to aerosol forcing, particularly absorbing aerosol, being one contributory factor) are more significant (by a ratio of 2:1 or greater) than those affecting changes in surface sensible heat flux to the atmosphere, in determining the range of hydrological responses under the given forcing scenarios.

4. Land use changes were applied via crop and pasture fraction changes blended with underlying land use maps, but this was only possible for a subset of the models. Additionally, due to the model-dependent nature of the underlying land use maps, it was problematic to achieve a consistent land use change forcing even in this subset of models. This echoes the conclusions of Pitman et al. (2009) that land use change, despite being a regionally significant forcing, is presently impossible to impose in a common way in multiple models.

In future, there are clearly opportunities to explore aspects of climate change and climate impacts using the ES2 results for a plausible scenario with a small signal-to-noise ratio. But given that we have an initial condition ensemble of only modest size, we may need to develop new techniques to extract the fine scale climate change information.

As described in the text, further analysis of the E1 versus A1B responses is planned which will examine in greater detail regional and seasonal changes in the hydrological cycle, and look into the carbon cycle dynamics in the changing climate simulated in the subset of models with interactive CC, with the aim of ascertaining which aspects of physical climate change are the dominant drivers. To facilitate such future studies, and more general exploitation of the results of the ES2 experiment, data from the simulations are freely available for download as described in the “Appendix”.