1 Introduction

Climate prediction at interannual-to-decadal timescales can help decision makers to calibrate plans and actions related to climatic events due to climate variability and change (Hibbard et al. 2007; Cox and Stephenson 2007). Model error and forcing uncertainties are, however, a source of near-term global prediction biases, as underlined by the global temperature fluctuation of the 2000’s, i.e. the so-called “hiatus”, which was not predicted by the CMIP5 climate models (Kosaka and Xie 2013; Watanabe et al. 2013; Meehl et al. 2014; among others). Decadal prediction experiments are an opportunity to analyze the physical mechanisms associated with climate predictability and variability in the near future (2010–2035). They can thereby help to improve model physics and projections. Meehl et al. (2011, 2014), Watanabe et al. (2013), Guemas et al. (2013b) and England et al. (2014) have thus used CMIP5 models to evaluate this 2000’s temperature hiatus in climate models. This question is also one goal of this study, in which we explore the ability of a newly developed high resolution climate model to predict the decadal evolution of temperature, sea-ice extent and volume. In addition, we assess conditional skill by focusing on two specific events, the 2000’s temperature hiatus and the 1990s warming of the subpolar gyre.

Motivation for initialized decadal predictions comes from various studies that have successfully predicted regional climate on decadal time scales (Smith et al. 2007; Keenlyside et al. 2008; Pohlmann et al. 2009; Mochizuki et al. 2012; Chikamoto et al. 2013; Bellucci et al. 2013, 2014; Doblas-Reyes et al. 2013a; Garcìa-Serrano et al. 2015; Karspeck et al. 2015). The strongest skill is found in retrospectively predicting surface air temperature over large areas. This is primarily determined by the response to well-mixed greenhouse gases (GHGs), especially for lead-times greater than 2 years (Oldenborgh et al. 2012). Additional skill is given by the major volcanic events that induce an abrupt cooling through the ejected volcanic aerosols (Guemas et al. 2013a; Mehta et al. 2013; Timmreck et al. 2016). Another source of predictability is the initialization of the ocean, which can lead to additional skill for the first years of the hindcasts, improving the near-term predictability (Yeager et al. 2012; Robson et al. 2012b; Matei et al. 2012; Chikamoto et al. 2013; Doblas-Reyes et al. 2013a). This is especially relevant for the North Atlantic Ocean where an accurate initialization of the Atlantic meridional overturning circulation (AMOC) allows Atlantic multidecadal variability to be predicted a decade or more in advance (Griffies and Bryan 1997; Boer 2000; Collins et al. 2006; Pohlmann et al. 2013). Model studies have also shown that it is possible to produce an accurate simulation of the mid 1990s subpolar gyre (SPG) warming thanks to a correct initialization of the ocean mean state (Robson et al. 2012a, b; Yeager et al. 2012; Msadek et al. 2014a). The SPG warming is therefore an excellent case study to test the ability of a prediction system, due to its large magnitude and potential predictability, and is thereby one of the main goals of this study.

The impact of the land surface initialization has also been assessed and is associated with a more limited but significant improvement of the precipitation and temperature predictions (Koster and Suarez 2003; Paolino et al. 2012). Over the Arctic, the sea ice thickness initialization plays a role in the seasonal forecast skill of sea ice extent, as shown in Day et al. (2014) and suggested in Msadek et al. (2014b). Important efforts are thus currently being undertaken to improve sea-ice initialization (Lindsay et al. 2012). The role of other drivers has been summarized in the review of Bellucci et al. (2015).

It has been suggested that models with larger systematic errors tend to produce lower prediction performance (DelSole and Shukla 2010). The improvement of the decadal prediction systems by increasing spatial resolution and improving physical parametrization is one of the main challenges proposed in Doblas-Reyes et al. (2013b). There is indeed some evidence that increasing model resolution can improve some aspects of the mean climate. It has been shown that high resolution ocean–atmosphere coupled models produce smaller errors than low resolution models in sea surface temperature (SST) over coastal upwelling regions (Shaffrey et al. 2009; Gent et al. 2010; McClean et al. 2011; Delworth et al. 2012; Sakamoto et al. 2012; Small et al. 2014), due to a better simulation of the wind and its effect on the ocean (Gent et al. 2010; Sakamoto et al. 2012). Higher resolution also reduces the double ITCZ in the tropics (Delworth et al. 2012) and improves the simulation of the El-Niño Southern Oscillation (ENSO) (Shaffrey et al. 2009; Sakamoto et al. 2012; Small et al. 2014) and of the north Atlantic SSTs (Gent et al. 2010). Gent et al. (2010) with an ocean–atmosphere coupled model and Jung et al. (2012) with an atmosphere model have shown a more accurate atmospheric circulation at higher resolution: in Gent et al. (2010) the maximum sea-surface temperature biases in the major upwelling regions are reduced by more than 60%, the precipitation patterns are improved in the summer Asian monsoon, and the atmospheric circulation in the Arctic is also improved. In Jung et al. (2012) the tropical precipitation and atmospheric circulation, the occurrence of Euro-Atlantic blocking and the extratropical cyclones are better represented with increasing horizontal resolution. Over land, the improvement of the precipitation is partly due to a better resolved orography (Gent et al. 2010; Delworth et al. 2012; Sakamoto et al. 2012) as shown with regional downscaling.

With the high resolution Community Atmosphere Model (CAM), Bacmeister et al. (2014) have found, however, that their simulations are not dramatically better than low resolution simulations. Furthermore, some problems remain or are exacerbated at these resolutions: some simulations exhibit too much warming and cooling, especially over polar regions where sea ice is not represented accurately enough (McClean et al. 2011; Kirtman et al. 2012), and Bacmeister et al. (2014) have found an exacerbated Pacific double ITCZ bias. The improvement due to a higher resolution is therefore model-dependent. Increasing the resolution does not necessarily lead to more accurate simulations.

Increasing the resolution can also lead to improved seasonal predictions: with the high-resolution GFDL climate model Jia et al. (2015) have shown an improvement of the seasonal prediction of 2-m air temperature and precipitation over land and of the Nino-3.4 index.

Whether a high-resolution climate model improves skill remains an open question. We address it here by using a set of decadal hindcasts performed with the CERFACS-HR high-resolution model. Unfortunately, we cannot directly assess the potential improvements due to resolution alone, as the model physics version is slightly different from our low resolution version that was used for CMIP5 (Sanchez-Gomez et al. 2015). Furthermore, the initialization dates are different, making a fair comparison of the two prediction systems impossible. We therefore simply assess the skill of our new modeling system and qualitatively compare it to the range obtained with low-resolution models. Note also that we do not have historical simulations (uninitialized hindcasts) that would allow us to clearly separate the influence of ocean initialization from that of the external forcing.

The aim of this paper is thus to address three main questions:

  • What is the skill of our high-resolution coupled climate model in predicting global and regional surface quantities (temperature and sea-ice)?

  • Is the model able to represent the 2000’s global mean temperature fluctuation when initialized with observed ocean initial conditions?

  • Is the model able to predict the mid-90’s warming of the Atlantic subpolar gyre? What are the dominant mechanisms?

The paper is structured as follows. The model, data and methodology are described in Sect. 2. Section 3 focuses on the model ability to predict the annual means of SST, air-surface temperature, sea-ice volume and extent. We analyze the simulated 2000’s global mean temperature (GMT) changes in Sect. 4. The case-study of the mid-90’s north Atlantic warming is investigated in Sect. 5. A summary and discussion are given in Sect. 6.

2 Data and methodology

2.1 The CERFACS-HR model

CERFACS-HR is a high resolution Atmosphere–Ocean General Circulation Model (AOGCM) developed at Centre Européen de Recherche Avancée en Calcul Scientifique (CERFACS). The atmospheric model is the ARPEGE-climate v5.3 with a horizontal resolution of about 50 km and 31 vertical levels (T359L31), developed at Météo-France/CNRM (Centre National de Recherches Météorologiques). This version of ARPEGE shares the same physics as ARPEGE v2, used in the CMIP5 exercise (CNRM-CM5.1, Voldoire et al. 2013), but runs at a higher resolution (truncation T359 instead of T127) and does not use the SURFEX (SURFace EXternalisée) modeling system. The land-surface Interaction between Soil Biosphere and Atmosphere (ISBA) model (Noilhan and Planton 1989; Noilhan and Mahfouf 1996) is used to represent continental surfaces (soil, vegetation, snow).

The ocean model is the Nucleus for European Models of the Ocean (NEMO) v3.4 (Madec 2008) from the Institut Pierre-Simon Laplace (IPSL). NEMO is discretized globally on an ORCA025L75 grid, with 75 vertical levels and a horizontal resolution of about 0.25° whose grid spacing decreases poleward (an isotropic Mercator grid in the Southern Hemisphere, a quasi-isotropic bipolar grid in the Northern Hemisphere with poles over land at 107°W and 73°E). Vertical grid spacing is finer near the surface and increases with depth. Further information on the DRAKKAR ORCA-R025 grid is given in Barnier et al. (2006). The Louvain-la-Neuve Sea Ice model LIM2 3.3 (Vancoppenolle et al. 2009a, b) is embedded in the NEMO ocean modeling system. It is a dynamic-thermodynamic model on a C-grid and includes the representation of the subgrid-scale distributions of ice thickness, enthalpy, salinity and age. The atmospheric and oceanic components are coupled with OASIS3-MCT2.0 (Valcke et al. 2013).

2.2 Initialization and external forcing

The hindcasts are full-field initialized every year from 1993 to 2009 (initialized November 1st every year). Ocean and Sea-ice initial conditions are given by the GLORYS2V1 ocean reanalysis product (Ferry et al. 2012). This reanalysis is based on an ocean and sea-ice general circulation model at 1/4° horizontal resolution assimilating in situ profiles of temperature, salinity and sea surface temperature.

The atmosphere is initialized from a SST-forced ARPEGE simulation over the same period. The different members are obtained by lagged perturbation of the atmospheric initial conditions (initialized from a different day of the same month). Finally, ten ensembles (1993–2002) of five members integrated over a 10-year period and seven additional ensembles (2003–2009) of five members integrated over a 5-year period are available. Each member shares the same radiative forcing and differs due to a different atmospheric initialization and to the model internal variability.

The model is constrained by the observed external forcings such as past and current greenhouse gas concentrations, solar irradiance and various types of aerosols (black carbon, particulate organic matter, dust, sea salt and sulfate). The optical depths of the tropospheric aerosols are similar to those of the CMIP5 prescribed emissions (Szopa et al. 2013).

2.3 Model evaluation

The magnitude of the bias and drift of CERFACS-HR is assessed by comparing the raw hindcast data with the ocean reanalysis. CERFACS-HR exhibits positive biases of SST over the Austral Ocean (up to 6 °C) and over the eastern equatorial Pacific and Atlantic Oceans (up to 4 °C) (Fig. 1a). These strong biases, in particular over the equatorial and subtropical oceans, are commonly obtained with climate model simulations. The bias in monthly GMST increases during the first 2 years, reaching 1.2 °C, and then decreases after 10 years (Fig. 1b).

Fig. 1

a CERFACS-HR minus ERAI difference in surface air-temperature (°C), averaged over all the start-dates and lead-time, b drift in global mean surface air temperature (°C), over the 122 months lead-time, c drift in Arctic sea-ice extent (10⁶ km²) over the 122 months lead-time, d mean value of the Atlantic meridional overturning circulation (Sverdrup) computed with GLORYS2V3 (contour) and CERFACS-HR (color), and its e CERFACS-HR minus GLORYS2V3 difference (color, in Sverdrup). f Annual mean of the drift of the Atlantic meridional overturning circulation, i.e. an index computed at 40°N and at a depth of 2000 m, for the 10 years lead time

The Arctic experiences a cold bias of 5 °C east of Greenland (Fig. 1a). The sea ice extent is therefore anomalously large (+0.5 × 10⁶ km²) during the first months of the simulations. The bias increases with time and can reach up to 2.5 × 10⁶ km² after 9 years (Fig. 1c).

The AMOC spatial pattern is reasonably simulated (Fig. 1d, e) despite a negative bias at the location of the core of North Atlantic deep water (the AMOC is weaker in CERFACS-HR than in GLORYS2V3 north of 30°N). Note that the ocean reanalysis AMOC data may also suffer from large biases and do not necessarily reflect the observed AMOC.

2.4 Observed data sets

The new Mercator product (GLORYS2V3; Lellouche et al. 2013) covers a longer period (1993–2012) than GLORYS2V1 (1993–2009) and allows us to take into account all the hindcasts integrated over a 10-year period. A comparison between the GLORYS2V1 and GLORYS2V3 products shows a similar mean state of the SST (not shown). Furthermore, these products were produced with the same ocean NEMO global configuration as CERFACS-HR, i.e. the ORCA025 grid (Lellouche et al. 2013).

The model’s ability to predict SSTs is investigated by comparison with the ERSST3b analysis. This product covers the period from 1854 to present at monthly frequency and 2° × 2° horizontal resolution. Relative to the Extended Reconstruction SST version 2 (ERSST), the ERSST3b analysis includes improvements due to the tuning of the analysis methods and to the inclusion of bias-adjusted satellite data (Smith et al. 2008).

The observed sea level pressure (SLP) and surface fluxes are given by the ERA-interim reanalysis (Simmons et al. 2007; Uppala et al. 2008; Dee et al. 2011) over a global domain and available from 1979 to present. ERA-interim corrects some of the ERA-40 errors: an improved representation of the hydrological cycle, a more realistic stratospheric circulation and a better temporal consistency of the reanalysis fields (Dee et al. 2011).

For the surface-air temperature, three sets of observations are used. The Hadley Centre/Climate Research Unit Temperature version 4 (HadCRUT4) provides near-surface air temperature from 1850 to present at a 5° × 5° horizontal resolution (Morice et al. 2012). HadCRUT4 covers about 84% of the globe (with a weaker concentration of observations at the poles and over Africa). The coverage varies in time and missing values are assigned to grid points with too few observations. The incomplete global coverage is a potential source of bias in global temperature reconstructions. HadCRUT4 has been improved in Cowtan and Way (2014) with a better infilling of poorly sampled regions. This additional dataset is hereafter named COWT.

NASA’s Goddard Institute for Space Studies Surface Temperature Analysis (GISTEMP) is an update of the Goddard Institute for Space Studies (GISS) analysis of surface temperature. The GISTEMP record attempts to address the coverage issue by extrapolating temperatures into unmeasured regions by means of kernel smoothing using a canonical kernel of radius 250 and 1200 km (Hansen et al. 2010). We used the data smoothed at 250 km. GISTEMP provides temperature anomalies (the base period is 1960–1990) over land and ocean from 1880 to present at a 2° × 2° horizontal resolution.

We also used the NOAA Merged Land–Ocean Surface Temperature (MLOST) version 3b of the National Climatic Data Center (NCDC). MLOST is a merged land air and sea surface temperature anomaly analysis, based on data from the Global Historical Climatology Network (GHCN) of land temperatures and the International Comprehensive Ocean–Atmosphere Data Set (ICOADS) of Sea Surface Temperature (SST) data. The temperature anomalies, with respect to 1961–1990 are analyzed separately and then merged to form the global analysis. More details are given in Smith et al. (2008) and Vose et al. (2012). MLOST is a spatially gridded (5° × 5°) global surface temperature dataset, with monthly resolution from January 1880 to present.

The Sea-Ice Extent (SIE) is given by the National Snow and Ice Data Center (NSIDC) (Fetterer et al. 2009). As for the NSIDC sea ice index, the CERFACS-HR SIE is computed as the total area of the grid cells in which the ice concentration is above 0.15. The Sea-Ice Volume (SIV) time series is reconstructed from the Pan-Arctic-Ice-Ocean-Modeling-System (PIOMAS) (Zhang et al. 2008; Schweiger et al. 2011).
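The extent definition above can be illustrated with a minimal sketch. The function name `sea_ice_extent`, the flattened list layout and the example values are hypothetical and are not taken from the CERFACS-HR or NSIDC processing chains; only the 0.15 concentration threshold comes from the text.

```python
def sea_ice_extent(concentration, cell_area, threshold=0.15):
    """Total area of the grid cells whose ice concentration exceeds the threshold.

    concentration: ice fraction per grid cell (0 to 1), None for land or missing cells
    cell_area:     area of each grid cell, in the unit wanted for the result
    """
    return sum(
        area
        for conc, area in zip(concentration, cell_area)
        if conc is not None and conc > threshold
    )


# Hypothetical example: five cells, areas in km2.
example_concentration = [0.0, 0.10, 0.15, 0.16, 0.90]
example_area = [100.0, 100.0, 100.0, 100.0, 50.0]
example_extent = sea_ice_extent(example_concentration, example_area)
```

Note that a cell counts with its full area as soon as it passes the threshold, which is what distinguishes extent from ice area (where each cell would be weighted by its concentration).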

CERFACS-HR results were interpolated onto different grids so that they can be compared with the different reanalyses and observational datasets, i.e. a 1.5° resolution for SLP and surface fluxes, a 2.5° resolution for precipitation and a 5° resolution for surface air temperature. SSTs have been interpolated to the ERSST3b 2° resolution. The oceanic variables used in Sect. 4 (temperature, salinity, zonal and meridional velocities) are produced on the same ORCA025 grid as GLORYS2V1 and GLORYS2V3, and have therefore not been interpolated.

In Sect. 4 the observed heat content is computed with the Hadley Center reanalysis (EN4 data; Good et al. 2013) that has used all the available data sets including Argo floats of recent years. It is a global quality controlled ocean temperature data set, available from 1900 to present at a monthly frequency at 1° resolution with 42 vertical levels.

2.5 Methodology

The hindcasts cover a short period, with start dates every year over a 17-year period (1993–2009). Hindcasts starting in 1993–2002 (2003–2009) were run for 10 (5) years. Such a short period can be strongly impacted by decadal variability, and the removal of a parametric trend associated with the increase in well-mixed GHGs may lead to spurious results. Removing the trend (linear or quadratic) could therefore remove part of the decadal signal. We thus chose not to detrend the temporal data fields.

2.5.1 Bias adjustment

Hindcasts start from the initial conditions and drift away as time progresses. This drift is due to model errors and reflects the model adjustment from the initial observed state back to its equilibrium state. We used a standard procedure following the World Climate Research Program recommendations (ICPO report on mean bias adjustment, 2011) to remove the drift, a posteriori and in a linear way.

Let \({Y}_{j}^{i}\left(\tau \right)\) be the hindcast of member i (i = 1 to m) and \({X}_{j}\left(\tau \right)\) the corresponding verifying observation, where j = 1 to n indicates the initial year of the prediction (start-date) and τ is the lead time. For each lead-time τ we can thus define the mean model prediction \(\overline{Y}\)(τ) and the mean observation \(\overline{X}\)(τ) as

$$\bar{Y}\left( \tau \right)=\frac{1}{nm}\sum_{j=1}^{n}\sum_{i=1}^{m}Y_{j}^{i}\left( \tau \right)$$
$$\bar{X}\left( \tau \right)=\frac{1}{n}\sum_{j=1}^{n}X_{j}\left( \tau \right)$$

The drift dr(τ) is thus estimated for each lead-time as the difference between the mean model prediction \(\overline{Y}\) (τ) and the observation \(\overline{X}\) (τ), as

$$dr\left( \tau \right)=\bar{Y}\left( \tau \right)-\bar{X}\left( \tau \right)$$

The corrected model prediction, for each member i, initial year j and lead-time τ is then defined as

$$\hat{Y}_{j}^{i}\left( \tau \right)=Y_{j}^{i}\left( \tau \right)-dr\left( \tau \right)$$

This method assumes that the drift is independent of the initial conditions. The drift is however not stationary and the results may be sensitive to this methodology (where there are strong non-stationary drifts it could be preferable to use the method developed in Kruschke et al. 2015). Here we assume that the ICPO method is a reliable bias adjustment method.
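The lead-time dependent adjustment defined by the equations above can be sketched in a few lines. This is only an illustration, not the code used for the study: the nested-list layout `hindcasts[j][i][tau]` and `observations[j][tau]`, the function names and the example numbers are assumptions, and all start dates are assumed to share the same number of lead times (whereas the actual hindcasts mix 5-year and 10-year runs).

```python
def lead_time_drift(hindcasts, observations):
    """Drift dr(tau): mean over start dates and members minus mean observation."""
    n = len(hindcasts)            # number of start dates j
    m = len(hindcasts[0])         # number of members i
    n_lead = len(hindcasts[0][0]) # number of lead times tau
    drift = []
    for tau in range(n_lead):
        y_bar = sum(hindcasts[j][i][tau] for j in range(n) for i in range(m)) / (n * m)
        x_bar = sum(observations[j][tau] for j in range(n)) / n
        drift.append(y_bar - x_bar)
    return drift


def remove_drift(hindcasts, observations):
    """Subtract dr(tau) from every member of every start date."""
    drift = lead_time_drift(hindcasts, observations)
    return [
        [[y - d for y, d in zip(member, drift)] for member in start_date]
        for start_date in hindcasts
    ]


# Hypothetical example: 2 start dates, 2 members, 2 lead times.
example_obs = [[10.0, 11.0], [12.0, 13.0]]
example_hindcasts = [
    [[11.0, 13.0], [11.0, 13.0]],
    [[13.0, 15.0], [13.0, 15.0]],
]
example_drift = lead_time_drift(example_hindcasts, example_obs)
example_corrected = remove_drift(example_hindcasts, example_obs)
```

Because the same dr(τ) is subtracted from every start date, the correction removes the mean drift only; any dependence of the drift on the initial state is left in the corrected hindcasts, which is the limitation noted above.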

In Sect. 3 we choose to remove the drift of SST based on the observed dataset used to assess skill (ERSST3b). The drift in air-surface temperature is removed with ERA-interim since HadCRUT4 is given as anomalies. We have tested the sensitivity to the reanalysis used to remove the drift and found only weak differences among the reanalyses considered (not shown). In Sect. 5 we analyze the subpolar gyre warming and use the GLORYS2V3 reanalysis (which is close to the GLORYS2V1 reanalysis used to initialize the ocean model while spanning a longer time period) to remove the drift on the original ORCA grid of the model. We used ERA-interim for heat fluxes, surface pressure and surface winds.

2.5.2 Evaluation of hindcast skill

The skill is computed with an Anomaly Correlation Coefficient (ACC) r, given by

$$r\left( \tau \right)=\frac{\mathop{\sum }_{j=1}^{n}\left[ {{X}_{j}}\left( \tau \right){{Y}_{j}}\left( \tau \right) \right]}{\sqrt{\mathop{\sum }_{j=1}^{n}{{\left[ {{X}_{j}}\left( \tau \right) \right]}^{2}}\mathop{\sum }_{j=1}^{n}{{\left[ {{Y}_{j}}\left( \tau \right) \right]}^{2}}}}$$

where \({Y}_{j}\) is the ensemble mean anomaly for the jth hindcast, starting in year j, and \({X}_{j}\) is the observation anomaly for the corresponding start date j. τ is the lead-time and n the number of start dates. Anomalies for \({X}_{j}\) and \({Y}_{j}\) are calculated independently, both having a zero mean over the hindcast period.
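For a single lead time, the ACC defined above reduces to the following minimal sketch. The function names and the example series are hypothetical; the inputs are assumed to be one observed value and one ensemble-mean value per start date.

```python
import math


def anomalies(values):
    """Remove the mean over the hindcast period so that the series has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def anomaly_correlation(obs, ens_mean):
    """ACC r(tau) for one lead time, from one value per start date."""
    x = anomalies(obs)       # X_j, computed independently of the hindcasts
    y = anomalies(ens_mean)  # Y_j, computed independently of the observations
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical example: the hindcast doubles the observed anomalies,
# so the phase is perfect even though the amplitude is wrong.
example_obs = [1.0, 2.0, 3.0, 4.0]
example_hindcast = [2.0, 4.0, 6.0, 8.0]
example_acc = anomaly_correlation(example_obs, example_hindcast)
```

The example illustrates why the ACC alone is not sufficient: an amplitude error of a factor of two leaves the correlation unchanged.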

We estimate the significance of the ACC through re-sampling and 1000 permutations in a Monte-Carlo framework. The values are judged significant at the 5% level if the correlation values are stronger than 97.5% of the randomly obtained correlation values. We also use the root mean squared error (RMSE). RMSE measures the magnitude of the error between the hindcasts and the observations while ACC measures only the phase difference between observations and hindcasts. RMSE and ACC thus provide complementary information on the prediction skill of CERFACS-HR.
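One possible reading of this significance test, together with the RMSE, is sketched below. The resampling scheme actually used is not detailed in the text: shuffling the hindcast values across start dates is an assumption, as are the function names, the fixed random seed and the example series. Only the 1000 permutations and the 97.5% threshold come from the description above.

```python
import math
import random


def correlation(x, y):
    """Correlation of two series after removing their own means (the ACC form)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def rmse(obs, hindcast):
    """Root mean squared error between hindcast and observed values."""
    return math.sqrt(sum((h - o) ** 2 for o, h in zip(obs, hindcast)) / len(obs))


def permutation_threshold(obs, hindcast, n_perm=1000, level=0.975, seed=0):
    """Correlation exceeded by only (1 - level) of randomly shuffled hindcasts."""
    rng = random.Random(seed)
    shuffled = list(hindcast)
    null_correlations = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # assumption: break the pairing between start dates
        null_correlations.append(correlation(obs, shuffled))
    null_correlations.sort()
    index = min(n_perm, math.ceil(level * n_perm)) - 1
    return null_correlations[index]


def is_significant(obs, hindcast, **kwargs):
    """True if the ACC is stronger than the permutation threshold."""
    return correlation(obs, hindcast) > permutation_threshold(obs, hindcast, **kwargs)


# Hypothetical example with ten start dates.
example_obs = [float(v) for v in range(10)]
example_rmse = rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
example_flag = is_significant(example_obs, example_obs)
```

With only ten or seventeen start dates the null distribution is broad, so fairly high correlations are needed before a result is flagged as significant.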

The 4-year persistence is computed for the SST, SIE and SIV from the observed values of these quantities in the 4 years prior to the start date. The skill of this empirical model depends on the initial conditions and on the memory of the climate system.
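A persistence benchmark of this kind could be built as in the sketch below. The exact construction is not given in the text, so the choice made here (the forecast is the mean of the four observed annual values preceding the first forecast year, held constant over all lead times) is an assumption, as are the function name, the dictionary layout and the example values.

```python
def persistence_forecast(obs_by_year, first_forecast_year, window=4):
    """Mean of the observed annual values in the `window` years before the forecast."""
    years = range(first_forecast_year - window, first_forecast_year)
    return sum(obs_by_year[year] for year in years) / window


# Hypothetical annual-mean observations (arbitrary units).
example_obs_by_year = {1990: 1.0, 1991: 2.0, 1992: 3.0, 1993: 4.0}
example_persistence = persistence_forecast(example_obs_by_year, 1994)
```

The resulting series, one value per start date, can then be scored against the verifying observations with the same ACC as the hindcasts.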

3 Model skill

3.1 Sea surface temperature

The ACC is computed for the first year of the hindcasts (lead-time 1) and for averages over two different lead-time ranges: 2–5 and 6–9 years (based on the hindcasts for which the time-period is long enough). The skill of the first year is strongly dependent on the initial conditions and represents the seasonal-to-interannual predictions; the 2–5 years lead-time represents the interannual timescale (Goddard et al. 2013). The comparison of the 2–5 and 6–9 year means allows the dependence of the skill on lead time to be analyzed. The 4-year average reduces higher frequency noise. Another reason for the choice of the lead-times 2–5 and 6–9 is that they are frequently used (Bellucci et al. 2013, 2014; Goddard et al. 2013; Karspeck et al. 2015, among others) and results are thus comparable with other modeling studies.
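The lead-time averaging can be sketched as follows. The function name and the example values are hypothetical; the input is assumed to hold one annual mean per lead year, starting at lead year 1, for a single hindcast.

```python
def lead_window_mean(annual_values, first_lead, last_lead):
    """Average of the annual means from lead year first_lead to last_lead (inclusive)."""
    window = annual_values[first_lead - 1:last_lead]
    if len(window) != last_lead - first_lead + 1:
        # A 5-year hindcast cannot provide the 6-9 year average.
        raise ValueError("hindcast too short for the requested lead-time window")
    return sum(window) / len(window)


# Hypothetical 10-year hindcast of annual means.
example_hindcast = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
example_year_1 = lead_window_mean(example_hindcast, 1, 1)
example_years_2_5 = lead_window_mean(example_hindcast, 2, 5)
example_years_6_9 = lead_window_mean(example_hindcast, 6, 9)
```

The length check mirrors the restriction stated above: only hindcasts that are long enough contribute to a given lead-time range.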

The skill in predicting annual mean sea-surface temperatures (SSTs) is given by the ACC between the hindcasts and the ERSST3b analysis. The first year exhibits significant correlations over the Atlantic Ocean (mostly over the tropical Atlantic and the Caribbean), the Indian Ocean and the Pacific Ocean, west of 150°W (Fig. 2a). The 2–5 year lead-time still exhibits skill over the tropical and North Atlantic (but to a lesser extent), the Indian Ocean and the western Pacific Ocean (Fig. 2b). The 6–9 year lead-time exhibits negative correlations over the Pacific Ocean and the skill over the Caribbean vanishes, underlining the loss of skill in some areas (Fig. 2c).

Fig. 2

Anomaly correlation coefficient of the SST hindcasts (with respect to ERSST3b) for (a) year 1, (b) years 2–5 and (c) years 6–9. The trend is not removed. The hindcasts considered are the 17 5-year ones initialized every year from 1993 to 2009 for lead-times 1 year and 2–5 year, and the 10 10-year ones initialized every year from 1993 to 2002 for the lead-time 6–9 year. Hatching indicates that the ACC is positive and significant at the 95% confidence level according to a Monte-Carlo procedure

It has been shown in Sanchez-Gomez et al. (2015) that the CNRM-CM5 model drift is due to a quasi-systematic excitation of an ENSO event in the first year of the hindcasts. Since the drift is not totally removed by the ICPO adjustment method, a similar mechanism could occur in CERFACS-HR (which also uses the ARPEGE and NEMO models but at higher resolution) and could partly explain the weak skill obtained in the Pacific Ocean.

For the 6–9 year lead-time, the most prominent result is the skill obtained over the tropical Atlantic. This is unexpected since biases due to the poor representation of the mean intensity and dynamics of the cold tongue and of the seasonal cycle of the SSTs (Okumura and Xie 2004; Richter and Xie 2008) generally lead to low skill in these regions (Stockdale et al. 2006). Figure 2c shows the annual mean correlation, but significant correlations also remain in March-April-May over the tropical Atlantic (not shown), the period of the cold-tongue set-up and the season when the bias is at its maximum (Zermeño-Diaz and Zhang 2013).

The RMSE is strong but not significant over the equatorial Pacific and the subtropical north Atlantic for the first year (Fig. 3a). For the 2–5 and 6–9 year lead-times, the areas with large and significant RMSE values correspond to the areas of large and negative ACC values, i.e., the Pacific Ocean for the 2–5 and 6–9 year lead-times and the Austral Ocean and subtropical North Atlantic for the 6–9 year lead-time (Fig. 3b, c). The loss of skill is thus due to the inability of CERFACS-HR to reproduce both the magnitude and the phase of the SST.

Fig. 3

Root mean squared error, computed with the hindcasts and ERSST3b for the sea-surface temperature, for (a) year 1, (b) years 2–5 and (c) years 6–9. The trend is not removed. The hindcasts considered are the 17 5-year ones initialized every year from 1993 to 2009 for lead-times 1 year and 2–5 year, and the 10 10-year ones initialized every year from 1993 to 2002 for the lead-time 6–9 year. Hatching indicates that the RMSE is not due to the sampling at the 95% confidence level according to a Monte-Carlo procedure

Different boxes are defined in order to show the ability of the model to predict regional indexes: the Atlantic multidecadal oscillation (AMO) index (0°N–60°N; 80°W–0°W), the Niño-3.4 index (5°S–5°N; 170°W–120°W), the tropical Atlantic (3°S–3°N; 20°W–10°E) and Indian Ocean SSTs (30°S–10°S; 60°E–100°E) (see Fig. 2a). The global mean SST is defined as the average over all the observed grid points containing non-missing values.
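As an illustration, a regional index can be obtained by averaging the SST over the grid points falling inside each box. In the sketch below the box limits are those given above, while the cosine-of-latitude area weighting, the function name, the regular latitude-longitude grid and the example field are assumptions rather than a description of the actual processing.

```python
import math

# Box limits from the text: (lat_min, lat_max, lon_min, lon_max), west longitudes negative.
BOXES = {
    "AMO": (0.0, 60.0, -80.0, 0.0),
    "Nino3.4": (-5.0, 5.0, -170.0, -120.0),
    "Tropical Atlantic": (-3.0, 3.0, -20.0, 10.0),
    "Indian Ocean": (-30.0, -10.0, 60.0, 100.0),
}


def box_mean(lats, lons, field, box):
    """Area-weighted mean of field[i][k] over a box, skipping missing (None) points."""
    lat_min, lat_max, lon_min, lon_max = box
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        weight = math.cos(math.radians(lat))  # assumed area weight on a regular grid
        for k, lon in enumerate(lons):
            lon = (lon + 180.0) % 360.0 - 180.0  # map 0..360 longitudes to -180..180
            value = field[i][k]
            if value is None or not lon_min <= lon <= lon_max:
                continue
            total += weight * value
            weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no valid grid point inside the box")
    return total / weight_sum


# Hypothetical coarse grid with one missing point.
example_lats = [-2.0, 0.0, 2.0, 50.0]
example_lons = [190.0, 200.0, 350.0]
example_field = [[2.0, 2.0, 2.0], [2.0, None, 2.0], [2.0, 2.0, 2.0], [10.0, 10.0, 10.0]]
example_nino34 = box_mean(example_lats, example_lons, example_field, BOXES["Nino3.4"])
example_amo = box_mean(example_lats, example_lons, example_field, BOXES["AMO"])
```

Applying the same function to the hindcasts and to the observations, with the observed missing-value mask, keeps the two indexes comparable.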

Figure 4 shows the correlation score for these five indexes and for 4-year averaged periods. Gray shading represents the non-significant correlations (according to the Monte-Carlo test) and the 4-year persistence is represented by the purple line.

Fig. 4

Anomaly correlation coefficient of the SST hindcasts (with respect to ERSST3b) for (a) the global ocean, (b) the AMO, (c) the tropical Atlantic, (d) the Indian Ocean and (e) the Niño-3.4 index. The areas used to compute the indices are shown in Fig. 2a. The red (purple) line denotes the hindcast (4-year persistence) skill (trends included). The hindcasts are initialized every year from 1993 to 2002 (i.e. 10 hindcasts of 10 years). The gray shading indicates that the ACC is not significant at the 95% confidence level according to a Monte-Carlo procedure

When we consider the global domain, the correlations are significant for the first lead-time, decrease as time progresses, and become non-significant for the lead-time 3–6 year (Fig. 4a). The skill is almost zero at the 6–9 year lead-time. CERFACS-HR does not beat the 4-year mean persistence, leading to the conclusion that the model’s ability to predict the mean global SSTs is mostly due to the external forcing. The decrease of the skill is partly due to the negative correlation over the Pacific Ocean from the lead-time 2–5 to 6–9 year (Fig. 4e). This is consistent with Fig. 2c, which exhibits negative correlations over the Pacific Ocean.

Correlations are significant for the AMO, Tropical Atlantic and Indian Ocean. The skill of the AMO is positive and significant for the lead-times 1–4 year to 4–7 years and becomes non-significant for the lead-times 5–8 and 6–9 year. A positive and significant skill for the AMO was obtained by numerous authors (Kim et al. 2012; Chikamoto et al. 2013; Bellucci et al. 2014; Garcìa-Serrano et al. 2015). This high predictability was attributed to multidecadal variability of the ocean dynamics (Pohlmann et al. 2004) and to the initialization of the ocean (Garcìa-Serrano et al. 2015).

Over the tropical Atlantic, correlations using CERFACS-HR are significant and stronger than persistence for lead-times up to 3–6 years. The skill in predicting tropical Atlantic SSTs is associated with variables linked to tropical Atlantic SST variability: the westerly winds (Richter et al. 2014a, b), precipitation over the adjacent areas (Chang et al. 2008; Richter and Xie 2008; Zermeño-Diaz and Zhang 2013) and deep convection (Richter et al. 2014b). Since no skill is found in the pressure field (involving the subtropical gyre), the heat content (involving the memory of the ocean) or the meridional wind and precipitation (Supplementary material S1), we hypothesize that this result could be due to the sampling (we only used ten start-dates). The relationship between these variables and the tropical Atlantic SSTs in CERFACS-HR is however beyond the scope of this study.

The skill over the Indian Ocean is positive and marginally significant for several lead times (1–4 years, 2–5 years and 6–9 years). Guemas et al. (2013a) used a set of decadal predictions and found that the Indian Ocean stands out as the region where the predictions of SST perform the best worldwide. The model’s ability to predict the SSTs is mainly due to the long-term warming trend (as in Guemas et al. 2013a).

The CERFACS-HR model is thus able to predict the decadal trends of several SST indexes (AMO, tropical Atlantic and Indian Ocean), with skill at least similar to that shown in studies based on low-resolution models (Bellucci et al. 2013; Chikamoto et al. 2013; Guemas et al. 2013a).

3.2 Sea ice extent and volume

The observed shrinking of summer Arctic sea ice extent (SIE) (Serreze et al. 2007; Stroeve et al. 2012) has increased accessibility to marine waters and has raised the possibility of an ice-free Arctic in the near future (Wang and Overland 2009). This rapid decline also contributes to the polar temperature amplification (Screen and Simmonds 2010) and is thereby an indicator of climate change. Accurate prediction of SIE and of the sea ice volume (SIV) can thus lead to a better understanding of the coming climate change. There are many sea-ice prediction studies at monthly to seasonal timescales (Sigmond et al. 2013; Day et al. 2014; Msadek et al. 2014b; among others) but only a few at decadal timescales, for instance Blanchard-Wrigglesworth et al. (2011) and Germe et al. (2014).

The skill in simulating the 10-year SIE and SIV trends is shown in Fig. 5 for ten start dates (from the 1994–2003 trend to the 2003–2012 trend). The observed SIE and SIV trends fall within the CERFACS-HR spread for a majority of start dates, highlighting the importance of both the external forcing and the initial conditions in predicting these quantities. The mean simulated and observed trends are negative (except for observed SIE in winter), indicating a continuous shrinking of SIE and SIV. CERFACS-HR simulates the sign of the long-term mean trend well, with a better result in summer (Fig. 5b, d) than in winter (Fig. 5a, c), when the shrinking of SIE and SIV is not apparent in the observations and is over-estimated by CERFACS-HR. The internal variability of SIE and SIV is strong (the ensemble spread of summer SIE is 2 million km² for the 2002–2011 period).

Fig. 5

Ten-year linear trends for the 1994–2003 to 2003–2012 periods for (a) winter SIE, (b) summer SIE, (c) winter SIV and (d) summer SIV. The SIE trend is expressed in 10⁶ km² and the SIV trend in 10³ km³. The observed SIE (from NSIDC) and SIV (from PIOMAS) are represented with a black cross. Each hindcast is represented by an orange circle and the ensemble mean by an orange dashed line. The gray areas represent the spread (± one standard deviation of the ensemble) computed from all the hindcasts of each start date, and the white area is the average over all the hindcasts. The hindcasts are initialized every year from 1993 to 2002 (i.e. 10 hindcasts of 10 years with 5 members)
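The trend diagnostics of Fig. 5 can be illustrated with a minimal sketch. It assumes that each hindcast is available as an array of annual values with one row per member. The function names, the least-squares fit and the use of the sample standard deviation (`ddof=1`) are illustrative assumptions, not a description of the code used for this study, and the numbers are synthetic.

```python
import numpy as np

def decadal_trend(series):
    """Least-squares linear trend along the last axis, in units per 10 years."""
    series = np.asarray(series, dtype=float)
    years = np.arange(series.shape[-1])
    # np.polyfit fits each column of a 2-D input, hence the transpose
    slope_per_year = np.polyfit(years, series.T, 1)[0]
    return 10.0 * slope_per_year

def ensemble_trend_summary(hindcast, observed):
    """hindcast: (n_members, n_years); observed: (n_years,).

    Returns the ensemble-mean trend, the ensemble spread (one standard
    deviation, an assumed definition), the observed trend, and whether the
    observed trend lies within mean +/- spread.
    """
    member_trends = decadal_trend(hindcast)
    mean = member_trends.mean()
    spread = member_trends.std(ddof=1)
    obs_trend = decadal_trend(observed)
    inside = abs(obs_trend - mean) <= spread
    return mean, spread, obs_trend, inside

# Synthetic example (made-up numbers, not model output)
members = np.outer([-1.0, -2.0, -3.0], np.arange(10.0))  # 3 members, 10 years
obs = -2.5 * np.arange(10.0)
mean, spread, obs_trend, inside = ensemble_trend_summary(members, obs)
```

In this toy case the observed trend falls inside the ensemble spread, which is the criterion used in the text to judge whether a start date is reasonably simulated.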

4 The global mean temperature fluctuation of the 2000’s

Despite a continuous increase in atmospheric GHGs, the GMT has shown a quasi-stabilization since 1998. This recent temperature evolution has been named “the hiatus” and has been associated with several possible factors: a negative phase of the Pacific decadal oscillation (PDO) with anomalously cold SSTs over the eastern Pacific ocean and more heat uptake (Meehl et al. 2011, 2014; Kosaka and Xie 2013; Watanabe et al. 2013; Trenberth and Fasullo 2013; Guemas et al. 2013b; England et al. 2014; Douville et al. 2015), the recent stratospheric volcanic aerosol trend (Fyfe et al. 2013; Santer et al. 2014, 2015; Haywood et al. 2014; Ridley et al. 2014; Schmidt et al. 2014; Brühl et al. 2015; Mills et al. 2016), the decrease in tropospheric water vapor concentration (Solomon et al. 2010) and the solar minimum around 2009 (Kopp and Lean 2011).

We analyzed the ability of CERFACS-HR to simulate the so-called hiatus over the recent decades. Figure 6 represents the decadal linear trends for the hindcasts initialized from November 1997 to November 2002. The hindcast surface air temperatures are interpolated on the same horizontal grids and with the same missing values as in COWT, MLOST and in GISTEMP and the respective trends are calculated over the 10-year period starting in January following the initialization (e.g. over the 1998–2007 period for the hindcasts starting in November 1997).

Fig. 6

Ten-year linear trends (°C per 10 years) for the 1998–2007 to 2003–2012 periods for COWT (black cross), MLOST (blue square), GISTEMP (orange triangle) and for the hindcasts interpolated to the same horizontal resolution and with the same spatial coverage as the observations. Each hindcast is represented by a circle and the ensemble mean by a dashed line. The gray areas represent the spread (± one standard deviation) computed with the average of each hindcast ensemble mean and the white area is the average over all the hindcasts. The hindcasts are initialized every year from 1997 to 2002 (i.e. 6 hindcasts of 10 years with 5 members)
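The effect of the observational coverage on a global mean can be sketched as follows. The sketch assumes that the model field has already been interpolated to the observational grid, that missing observations are given as a boolean mask, and that grid cells are weighted by the cosine of latitude. The function name and the numbers are hypothetical.

```python
import numpy as np

def masked_global_mean(field, lat, obs_missing):
    """Area-weighted mean of field (n_lat, n_lon), skipping cells missing in the observations.

    lat: latitudes in degrees (n_lat,); obs_missing: boolean array, True where
    the observational dataset has no value. cos(latitude) weighting is assumed.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    valid = ~np.asarray(obs_missing, dtype=bool)
    return np.sum(field[valid] * weights[valid]) / np.sum(weights[valid])

# Toy grid with two latitude bands; the high-latitude band warms faster
lat = np.array([0.0, 60.0])
temp_map = np.array([[1.0, 1.0], [3.0, 3.0]])
no_mask = np.zeros((2, 2), dtype=bool)
polar_mask = np.array([[False, False], [True, True]])

full = masked_global_mean(temp_map, lat, no_mask)
polar_masked = masked_global_mean(temp_map, lat, polar_mask)
```

Masking the high-latitude band lowers the mean in this toy case, which is the kind of coverage sensitivity discussed for the three observational datasets.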

The uncertainty of the annual-mean GMT evolution is addressed using these three observed datasets, whose trends are shown in Fig. 6 (black crosses for COWT, blue squares for MLOST and orange triangles for GISTEMP). The trends computed over the last decades weaken as time progresses, highlighting the decrease of the global warming rate. The observations are globally consistent, but COWT exhibits values about 0.1 °C per 10 years higher. This is mainly due to the differences in spatial coverage among the various observations and shows the impact of high-latitude regions, since regions north of 60°N present a trend about 0.5 °C per 10 years higher (Fig. 7b). Indeed Cowtan and Way (2014) have shown that the coverage bias causes a cool bias in recent temperature trends relative to the late 1990s, which increases from around 1998 to present. Hansen et al. (2010) have also shown that the global temperature change is sensitive to estimated temperature in polar regions where observations are limited.

Fig. 7

Ten-year linear trend (°C per decade) for the 2003–2012 period for the annual-mean surface air temperature for (a) CERFACS-HR (hindcast initialized in 2002) and (b) COWT; stippling indicates zones where the trends are significantly different (i.e. the observed trend is not included in the modeled mean trend ± 1.64 standard deviations). Ten-year linear trend (10⁷ J m⁻² per decade) for the heat content [0–2000 m] with (c) CERFACS-HR and (d) EN4; stippling indicates that the trends are significant at the 95% confidence level according to a Spearman’s rank correlation test

The large spread among the hindcasts stands out when different grid points are replaced by missing values as in COWT (black circles), MLOST (blue circles) and GISTEMP (orange circles). This is due to strong positive trends over the North Pole in the model (Fig. 7a), which are more or less masked depending on the various observed masks. For each start date, the gray area represents the spread (± one standard deviation) computed with all the hindcasts, and the white area is the average over the three interpolated hindcasts. The spread is strong, with mean trends ranging from −0.08 to +0.19 °C per 10 years. From 1999 onward, the warming rate decreases with time in the hindcasts as in the observations, with the weakest trends occurring over the last decade (2003–2012). The model is thus able to simulate the hiatus of the 2000’s. Interestingly, the hindcasts considered with the spatial coverage of MLOST (CERFACS-HR/MLOST, blue circles) exhibit 10-year trends closer to the raw MLOST observations (blue squares) than CERFACS-HR/GISTEMP or CERFACS-HR/COWT are to their respective observations.

Note that for the hindcasts starting in November 1997, the model is initialized with anomalously warm SSTs over the eastern Pacific Ocean (the 1998 El Niño) and the 1998–2007 trends are much weaker than in the observations (this is even more evident when we only consider the tropical latitudes, not shown). The standard bias-correction method assumes that the drift is stationary and independent of the initial conditions. The strong discrepancy between the hindcasts and the observations for such strong events suggests that the bias-adjustment method may not be able to correctly remove the drift.
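The stationarity assumption can be made concrete with a minimal sketch of a lead-time-dependent mean drift correction. This is a generic illustration, not necessarily the exact procedure applied in this study: the drift is taken as the hindcast-minus-observation difference averaged over all start dates, separately for each lead year. The function name and the numbers are hypothetical.

```python
import numpy as np

def remove_mean_drift(hindcasts, observations):
    """Subtract a drift that depends only on lead time.

    hindcasts, observations: arrays of shape (n_start_dates, n_lead_years),
    aligned so that column k holds lead year k for every start date.
    """
    hindcasts = np.asarray(hindcasts, dtype=float)
    observations = np.asarray(observations, dtype=float)
    drift = (hindcasts - observations).mean(axis=0)  # one value per lead year
    return hindcasts - drift, drift

# Synthetic example: a common drift plus an extra error for one start date,
# mimicking an error that depends on the initial state
obs = np.zeros((3, 4))
true_drift = np.array([0.1, 0.2, 0.3, 0.4])
raw = obs + true_drift
raw[0] += 0.5
corrected, drift = remove_mean_drift(raw, obs)
```

Because the estimated drift is shared by all start dates, the state-dependent error of the first start date is only partly removed and is spread to the others, which is the limitation suggested above for strong events such as El Niño.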

The last decade (2003–2012) exhibits the weakest 10-year trend and is reasonably simulated since the difference between the observations and the model does not exceed the model spread (Fig. 6). Figure 7 shows the 10-year trend in surface air temperature for CERFACS-HR and COWT (Fig. 7a, b), and ocean heat content [0–2000 m] for CERFACS-HR and EN4 (Fig. 7c, d). We used the 0–2000 m depth range to take into account the upper and mid ocean layers, which have undergone an increase in heat uptake during the hiatus period (Meehl et al. 2011). The horizontal resolution of EN4 is coarser than that of CERFACS-HR, so each EN4 grid cell contains, by construction, more heat. The heat content comparison was made possible by dividing the heat content of the observations and of CERFACS-HR by the area of each grid cell. The heat content values thus represent the heat contained in a 1 m² column down to a depth of 2000 m. The trend in surface air temperature is positive over the Arctic and the Indian Ocean (Fig. 7a). The positive trend over the western Pacific and the negative trend over the eastern Pacific Ocean indicate that CERFACS-HR simulates a negative phase of the PDO. This is consistent with the observed trend (Fig. 7b). Over the Pacific Ocean stippling indicates that the trend is more negative in the observations than in CERFACS-HR. This pattern is however consistent with the literature and is associated with an increase (decrease) in the ocean heat content over the western Pacific and Indian (eastern Pacific) Ocean (Fig. 7c, d). The positive and significant heat content trend over the Indian Ocean is consistent with Lee et al. (2015), who show that heat has been transferred from the Pacific Ocean to the Indian Ocean through the Indonesian straits.
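The heat content per unit area can be sketched as a vertical integral of temperature over a single column, which makes the result independent of the horizontal grid-cell size. The constant reference density and specific heat below are assumed typical seawater values, and the function name and the handling of the partial bottom layer are illustrative choices rather than the method used for Fig. 7.

```python
import numpy as np

RHO0 = 1025.0  # reference seawater density (kg m-3), assumed value
CP = 3990.0    # specific heat of seawater (J kg-1 K-1), assumed value

def heat_content_per_area(temp, layer_thickness, max_depth=2000.0):
    """Heat content of a 1 m2 column (J m-2) from the surface to max_depth.

    temp: layer temperatures (n_levels,); layer_thickness: layer thicknesses
    in m (n_levels,), ordered from the surface downward.
    """
    temp = np.asarray(temp, dtype=float)
    layer_thickness = np.asarray(layer_thickness, dtype=float)
    bottoms = np.cumsum(layer_thickness)
    tops = bottoms - layer_thickness
    # Part of each layer lying above max_depth (zero for deeper layers)
    used = np.clip(max_depth - tops, 0.0, layer_thickness)
    return RHO0 * CP * np.sum(temp * used)

# Toy column: three layers, the last one only partly above 2000 m
temp = np.array([10.0, 5.0, 2.0])
thickness = np.array([500.0, 1000.0, 1000.0])
ohc_2000 = heat_content_per_area(temp, thickness)
```

Only trends of this quantity are compared in the text, so the choice of reference temperature does not matter for the comparison.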

During this decade, volcanic activity has led to an increase in stratospheric aerosol optical depth that is able to explain a part of the hiatus (Fyfe et al. 2013; Santer et al. 2014; Haywood et al. 2014). In CERFACS-HR the stratospheric aerosol optical depth is prescribed based on the data of Vernier et al. (2011). Unfortunately, a technical problem led to an inaccurate representation of these volcanic aerosols in the model, leading to a very small effect equivalent to that of a small and constant volcanic aerosol background. Hence, the ability of CERFACS-HR to simulate the hiatus cannot be attributed to the impact of the stratospheric aerosols and can therefore mainly be explained by the ocean initialization and heat uptake. Additional simulations are needed to draw conclusions about a possible impact of the stratospheric aerosols on the modeled hiatus. This is the main topic of ongoing work with CERFACS-HR.

The ability of CERFACS-HR to simulate the recent 2000’s temperature fluctuation could seem contradictory to the inability of CERFACS-HR to simulate the GMST at decadal scale (6–9 years). These two results are however not directly linked, since the former is associated with the hindcasts of only a few start dates and the latter is based on all the hindcasts.

5 North Atlantic warming of the mid-90’s

In this section we address the question of the model's ability to predict the North Atlantic warming that occurred during the mid-90’s, when the subpolar gyre warmed significantly by 1 °C in just 2 years (1995–1996) (Robson et al. 2012a). At the same time the North Atlantic Oscillation (NAO) went from a long positive phase (1989–1995) to a negative phase (1996). Different hypotheses, linking the subpolar gyre and the North Atlantic Ocean heat content (OHC) with the phase of the NAO, have been proposed to explain this sudden warming (Hurrell 1995; Bersch 2002; Hurrell et al. 2003; Bersch et al. 2007; Sarafanov et al. 2008; Ortega et al. 2011, 2012; Barrier et al. 2015). The first hypothesis is based on the long positive phase of the NAO (NAO+) and the second on the short and strong negative NAO (NAO−) phase that occurred in 1996 (Lohmann et al. 2009):

  • An NAO+ phase is associated with heat loss over the subpolar North Atlantic (Barrier et al. 2014) through increased westerlies across the subpolar gyre (Hurrell et al. 2003; Visbeck et al. 2003; Yashayaev 2007; Barrier et al. 2014). Temperatures cool down, leading to an increase in deep-water formation and to a strengthening of the AMOC (Lohmann et al. 2009; Ortega et al. 2012). This favors an increase in northward oceanic heat transport leading to the warming of the subpolar North Atlantic Ocean several years later (Johns et al. 2011; Msadek et al. 2013).

  • The NAO− phase has been associated with an increase in OHC over the subpolar gyre through a change of the oceanic circulation and air-sea heat fluxes (Bersch 2002; Bersch et al. 2007; Sarafanov et al. 2008; Lohmann et al. 2009; Barrier et al. 2015). The NAO− phase can favor a shrinking and a weakening of the subpolar gyre and thus a northward motion of warm subtropical waters into the subpolar gyre (Hátún et al. 2005; Sarafanov et al. 2008). The North Atlantic warming could thus be linked to the abrupt change from a positive to a negative phase of the NAO (as shown by Lohmann et al. 2009, who maintained NAO− conditions in numerical experiments).

The second hypothesis involves the NAO− phase and therefore implies low predictability (Collins 2002; Müller et al. 2005), while the first one implies higher predictability through a lagged AMOC response (Griffies and Bryan 1997; Pohlmann et al. 2004; Collins et al. 2006; Pohlmann et al. 2013).

5.1 Oceanic changes

The subpolar gyre simulated in our hindcasts is represented with the annual-mean barotropic stream function (BSF) in Fig. 8. Negative values indicate a cyclonic gyre that extends from the Labrador Sea to the Iceland Basin with a maximum south-east of Greenland. The subpolar gyre (hereafter SPG) area is defined from 60°W to 10°W and from 50°N to 65°N as in Robson et al. (2012a, b) and Msadek et al. (2014a) (see the box in Fig. 8).

Fig. 8

Annual mean barotropic streamfunction (Sv) averaged over all the hindcasts. The hindcasts are initialized every year from 1993 to 2002 (i.e. 10 hindcasts of 10 years with 5 members). Negative (positive) values denote counterclockwise (clockwise) circulation. The subpolar gyre area (SPG) is defined as the box represented in black: [60°W–10°W; 50°N–65°N]

Figure 9 shows the OHC anomalies integrated over the SPG area and from the surface to a depth of 500 m for CERFACS-HR and GLORYS2V3. We used the 0–500 m integrated anomalies but checked that the warming is consistent over the 0–1000 m depth range (Supplementary material S2). Since the shift occurs in 1995–1996, we focus on hindcasts initialized a few years before (i.e. in 1993 and 1994; hereafter DH93 and DH94) as well as those initialized in 1995 and 1996 (hereafter DH95 and DH96). GLORYS2V3 exhibits a strong warming from 1993 to 1998 (Fig. 9a; black line). DH93 and DH94 do not simulate a strong and abrupt warming (with a positive anomaly in heat content) over the SPG (Fig. 9a, b). However, the simulations initialized in 1995 and 1996 exhibit a shift (Fig. 9c, d), as shown by the large and sudden increase of OHC during the first 4 years. DH95 simulates a strong warming (years 1 to 6), with the SPG remaining anomalously warm for 5 years (years 4 to 8), and finally cools down from year 6 to 10 (Fig. 9c). DH96 exhibits a warming that persists for 8 years (Fig. 9d) and thus simulates the observed persistent SPG warming. The following analysis focuses on DH95, which is the first hindcast of the series to simulate the SPG warming.

Fig. 9

Oceanic heat content (10²¹ joules) [0–500 m] over the SPG for GLORYS2V3 (black line) and (a) DH93, (b) DH94, (c) DH95 and (d) DH96 (red line). The anomalies are computed with respect to the time average over the 10-year hindcasts. The spread is computed from the 5 members as ± 1 standard deviation (red shading). The zero line is in gray

A comparison of the vertically integrated temperature [0–300 m] between GLORYS2V1, GLORYS2V3, ORA-S4 (Balmaseda et al. 2013) and Nemovar1 (Balmaseda et al. 2010) shows that GLORYS2V1 and GLORYS2V3 are close to each other but warmer than ORA-S4 and Nemovar1 by about 0.5 °C over the SPG area (Supplementary material S3). ORA-S4 and Nemovar1 cover a longer period than GLORYS2V1 and GLORYS2V3; however, their lower temperature may lead to an overestimation of the warming and therefore these reanalyses were not used in this study.

The evolution of the volume-weighted annual-mean temperature anomalies [0–500 m] is shown in Fig. 10 over the 1996–2005 period. The first two years exhibit a positive temperature anomaly centered at 45°N and 40°W (Fig. 10a, b). This anomaly then propagates northward following the eastern side of the SPG (Fig. 10c, d) and reaches the northern SPG boundary during the fifth year (the maximum warming is located south-east of Greenland in Fig. 10e). As in Fig. 9c, the SPG then starts to cool down and the positive temperature anomaly slowly vanishes (Fig. 10h, j). The timing of DH95 is not exactly synchronized with the reanalysis, since the maximum warming occurs in 2001 (instead of 1998 in GLORYS2V3), but its amplitude is well reproduced (about 1 °C).

Fig. 10

Temperature anomalies (°C) [0–500 m] for DH95 from 1996 to 2005. The anomalies are computed with respect to the average over all the 10-year hindcasts. Dots indicate that anomalies are significant at the 95% confidence level according to a Student t-test

Some studies have suggested that the SPG warming is due to the northward migration of anomalously warm and salty subtropical waters. The salt anomalies [0–500 m] simulated by DH95 are shown in Fig. 11. A positive salt anomaly propagates from the subtropical gyre (Fig. 11a) to the SPG (Fig. 11e–h). At the same time the subtropical gyre exhibits a negative salt anomaly, suggesting a transfer from the subtropical gyre to the SPG. As for temperature, the maximum is located at the south-eastern tip of Greenland. The positive anomaly located on the eastern side of the subtropical gyre highlights the importance of the gyre as a source of salt and heat for the SPG warming. This anomaly is also found in Msadek et al. (2014a) in the simulation that reproduces the SPG warming. It also suggests a possible role of the Mediterranean Sea in exporting anomalously warm and salty waters into the subtropical gyre.

Fig. 11

Same as Fig. 10 but for salt anomalies (psu)

The subtropical gyre strengthens along with the North Atlantic Current during the first five years of DH95 (Fig. 12a–e). In 1998 a robust positive anomaly extends from the north-west (55°W–40°N) of the subtropical gyre to the east (40°W–55°N) of the SPG (Fig. 12c). It strengthens in 1999 (Fig. 12d), denoting a northward shift of the Gulf Stream (for clarity the mean climate is not shown, but the reader can refer to Fig. 8). The BSF weakens from 1998 to 2000 over the eastern boundary of the SPG. The SPG then continues to weaken (Fig. 12d, f), especially in 2003 (Fig. 12f).

Fig. 12

Same as Fig. 10 but for barotropic streamfunction anomalies (Sv)

In summary, the strengthening of the North Atlantic Current favors the northward movement of a salty and warm anomaly that reaches the Iceland Basin, the region south of Greenland and the Labrador Sea several years later (see Fig. 11e for the salt and Fig. 10e for the temperature).

The evolution over the 1996–2002 period of the Meridional Heat Transport (MHT) integrated over the whole oceanic column is analyzed in Fig. 13. MHT increases from 1996 to 2000 (with significant anomalies for the first 2 years) north of 40°N with a maximum at 55°N (Fig. 13a). MHT decreases after 2000. The MHT change is thus consistent with the temperature evolution over the SPG, from 40°N to 60°N (Fig. 10).

Fig. 13

(a) Meridional heat transport anomalies \(vT\) (PW) and its decomposition into (b) \(\overline{T}{v}^{\prime}\), (c) \(\overline{v}{T}^{\prime}\) and (d) \({v}^{\prime}{T}^{\prime}\) for DH95. The anomalies are computed with respect to the average over all the 10-year hindcasts. Overbars denote the monthly means and primes the departures from the monthly means. Dots indicate that anomalies are significant at the 95% confidence level according to a Student t-test

The MHT anomaly is decomposed into the anomalous advection of mean temperature (\(\overline{T}{v}^{\prime}\)), the mean advection of anomalous temperature (\(\overline{v}{T}^{\prime}\)) and the cross-product of anomalies (\({v}^{\prime}{T}^{\prime}\)). This decomposition is computed at a monthly time-scale, with overbars denoting monthly means and primes the departures from the monthly means. These products, averaged into annual means, are shown in Fig. 13. The anomalous advection \(\overline{T}{v}^{\prime}\), denoting an enhanced northward heat transport over the SPG associated with a strengthening of the oceanic circulation, dominates the MHT change (the anomalies of \(\overline{T}{v}^{\prime}\) are significant from 1996 to 2002) between 50°N and 55°N (Fig. 13b). The anomaly of MHT due to the mean advection of anomalous temperature (\(\overline{v}{T}^{\prime}\)) is positive and significant north of 55°N during the first five years of DH95 (Fig. 13c). This \(\overline{v}{T}^{\prime}\) anomaly is located at 55°N during the first 3 years, then moves northward and reaches 60°N from 1998 to 2001; it is generally weaker than the \(\overline{T}{v}^{\prime}\) anomaly but dominates the MHT anomalies at 60°N during the 1998–2000 period. This mean advection of temperature anomalies thus sustains the warming first created by the anomalous advection of mean temperature. The \({v}^{\prime}{T}^{\prime}\) component is weak and can be neglected over the SPG (Fig. 13d). During the last two years of DH95, the decrease of temperature is associated with a weaker MHT linked with a decrease of both \(\overline{v}{T}^{\prime}\) and \(\overline{T}{v}^{\prime}\).
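The decomposition follows from writing \(v=\overline{v}+{v}^{\prime}\) and \(T=\overline{T}+{T}^{\prime}\), so that \(vT-\overline{v}\,\overline{T}=\overline{T}{v}^{\prime}+\overline{v}{T}^{\prime}+{v}^{\prime}{T}^{\prime}\). A minimal sketch of this identity on monthly series is given below. It works on the pointwise product only: the zonal and vertical integration and the conversion to PW are omitted, and the function name and the random inputs are hypothetical.

```python
import numpy as np

def decompose_vT(v, T, v_bar, T_bar):
    """Split the anomaly of the product vT into its three contributions.

    v, T: monthly velocity and temperature of one hindcast; v_bar, T_bar:
    monthly means over all hindcasts (same shape). Returns the terms
    T_bar*v', v_bar*T' and v'*T', whose sum equals v*T - v_bar*T_bar.
    """
    v_prime = v - v_bar
    T_prime = T - T_bar
    return {
        "Tbar_vprime": T_bar * v_prime,   # anomalous advection of mean temperature
        "vbar_Tprime": v_bar * T_prime,   # mean advection of anomalous temperature
        "vprime_Tprime": v_prime * T_prime,  # cross-product of anomalies
    }

def annual_mean(monthly):
    """Average a monthly series (length a multiple of 12) into annual means."""
    return np.asarray(monthly).reshape(-1, 12).mean(axis=1)

# Synthetic two-year monthly series (random numbers, not model output)
rng = np.random.default_rng(0)
v, T, v_bar, T_bar = (rng.normal(size=24) for _ in range(4))
terms = decompose_vT(v, T, v_bar, T_bar)
annual_terms = {name: annual_mean(values) for name, values in terms.items()}
```

Computing the products at the monthly time-scale before annual averaging, as described above, keeps the sub-annual covariance between velocity and temperature in the \({v}^{\prime}{T}^{\prime}\) term.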

Figure 14 shows the mixed-layer depth anomalies averaged over the years 1996–2000. The mixed layer is deeper, consistent with more deep-water formation over the Labrador Sea. It denotes an increase of convective activity (Ortega et al. 2011), leading to an AMOC strengthening and an increase of meridional heat transport (Johns et al. 2011; Msadek et al. 2013). The deepening of the mixed layer is also consistent with the \(\overline{T}{v}^{\prime}\) increase and confirms (as in Msadek et al. 2014a) the prominent role of the ocean dynamics in the SPG warming. The AMOC is particularly strong in November 1995 (and stronger than for the other start dates) in GLORYS2V1 (not shown). The DH95 dynamical shift is thus probably, at least partly, built into the initial conditions. The mixed-layer depth anomaly weakens after 2002 and vanishes during the second half of the simulation (2001–2005).

Fig. 14

Mixed-layer depth anomalies (m) for the years 1–5 of DH95. The anomalies are computed with respect to the average over all the 10-year hindcasts. Dots indicate that anomalies are significant at the 95% confidence level according to a Student t-test

5.2 Atmospheric changes

Some studies have also suggested a possible role of heat fluxes in relation to the atmospheric variability (Bersch 2002; Bersch et al. 2007; Sarafanov et al. 2008; Lohmann et al. 2009). The surface fluxes are shown in Fig. 15. By convention positive (negative) values denote downward (upward) anomalies. The net surface heat flux (the sum of net shortwave, net longwave, sensible and latent heat fluxes) averaged over the SPG exhibits positive values over the 1998–2003 period (Fig. 15a), which means that the atmosphere is warming the ocean in DH95. The surface flux anomalies are stronger over the Labrador Sea, the Irminger Sea and the Iceland Basin (Fig. 15b). However, the strongest positive anomaly is located south of 50°N (southern boundary of the SPG). The net shortwave and longwave fluxes show weak and non-significant anomalies (Fig. 15c, d). Sensible and latent heat fluxes exhibit stronger positive anomalies (Fig. 15e, f), denoting a decrease in oceanic heat loss.

Fig. 15

(a) Net surface flux (net shortwave + net longwave + sensible + latent) anomalies (W m−2) averaged over the SPG, (b) 3–8 year average of net surface flux anomalies and its decomposition into the (c) net shortwave anomaly, (d) net longwave anomaly, (e) sensible heat flux anomaly and (f) latent heat flux anomaly. By convention positive (negative) values denote downward (upward) anomalies. The anomalies are computed with respect to the average over all the 10-year hindcasts. Dots indicate that anomalies are significant at the 95% confidence level according to a Student t-test

We now discuss changes in the surface wind speed and sea-level pressure (SLP) averaged over the 1998–2003 period (Fig. 16). The SLP anomalies denote a negative phase of the NAO, with positive (negative) anomalies over Iceland (the subtropical latitudes). The SLP exhibits a cyclonic anomaly between 30°N and 50°N and an anticyclonic anomaly north of 50°N. The wind decreases over the south-eastern boundary of the SPG and acts to weaken the SPG, favoring the northward movement of a salty and warm anomaly from the subtropical gyre to the SPG, as described in Hátún et al. (2005). The decrease of the wind speed over the SPG, associated with the pressure anomalies, weakens the oceanic heat loss through the weakening of the latent and sensible heat fluxes (Fig. 15e, f). These changes are strong in winter (DJF) but not consistent in summer (JJA) (not shown). CERFACS-HR is able to reproduce the negative phase of the NAO. However, this does not indicate skill in predicting the NAO since it concerns only one start date: CERFACS-HR shows no skill in predicting the NAO on interannual time-scales (not shown). We have computed the fraction of the heat increase due to the net oceanic heat transport (difference between the heat entering through the southern boundary of the SPG and the heat leaving through its northern boundary) and the part coming from the atmosphere (the net surface heat fluxes). One-third of the OHC change is due to atmospheric changes over the 1998–2003 period. The anomalies of net surface heat fluxes are thus not negligible.
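The partition of the heat increase between oceanic transport and surface fluxes can be sketched with a simple box budget. The sketch assumes that the heat transports through the two boundaries and the area-integrated net surface flux are available as time series of mean fluxes in W, and it ignores any residual term. The function name and the numbers are made up for illustration and do not reproduce the values of this study.

```python
def ohc_budget_fractions(transport_south, transport_north, surface_flux, dt):
    """Fractions of the box heat gain due to the ocean and to the atmosphere.

    transport_south: heat entering through the southern boundary (W) per step;
    transport_north: heat leaving through the northern boundary (W) per step;
    surface_flux: area-integrated net downward surface heat flux (W) per step;
    dt: length of a time step in seconds.
    """
    ocean = sum((s - n) * dt for s, n in zip(transport_south, transport_north))
    atmosphere = sum(q * dt for q in surface_flux)
    total = ocean + atmosphere
    return ocean / total, atmosphere / total

# Synthetic two-step example in arbitrary flux units
ocean_frac, atmos_frac = ohc_budget_fractions(
    transport_south=[4.0, 4.0],
    transport_north=[1.0, 1.0],
    surface_flux=[1.0, 1.0],
    dt=1.0,
)
```

In practice the fluxes would be anomalies with respect to the average over all hindcasts, and the sum of the two terms would be compared with the simulated OHC change to check that the budget closes.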

Fig. 16

3–8 year average of sea-level pressure (hPa, color) and surface wind speed anomaly (m s−1, vector). The anomalies are computed with respect to the average over all the 10-year hindcasts. Dots indicate that anomalies are significant at the 95% confidence level according to a Student t-test. Only the significant wind anomalies are represented, in red (black) when significant at the 95% (90%) confidence level according to a Student t-test

The pressure anomaly may be due to a remote effect of the Atlantic SST (Czaja and Frankignoul 2002) and/or East Pacific SST (Huang et al. 2002; Brönnimann 2007) on the NAO. With only five members we cannot assess the robustness of such a relationship. This will need to be confirmed when larger ensemble sizes are routinely used with high-resolution models.

6 Conclusion

CERFACS-HR was first evaluated in terms of its ability to predict temperature and precipitation. CERFACS-HR shows some skill at predicting the AMO, Indian Ocean and tropical Atlantic SSTs, as shown by related indices that exhibit reasonable correlations for the 1–4 to 3–6 year lead times. However, CERFACS-HR presents no skill in predicting the eastern Pacific SSTs. Skill in predicting SSTs was found in other studies (Chikamoto et al. 2013; Bellucci et al. 2013, 2014; Garcìa-Serrano et al. 2015; Karspeck et al. 2015; among others) and is mainly due to the warming trend. The skill improvement due to an increase in spatial resolution is not assessed in this study. We nevertheless observe that the CERFACS-HR skill in predicting temperature is slightly weaker than in the literature. This can be explained by several factors. (1) Model tuning aims to improve the ability of a climate model to simulate the mean climate, but high-resolution coupled climate models are still computationally very expensive. Fewer sensitivity tests are thus performed with high-resolution models, meaning that they are usually less optimized than low-resolution AOGCMs. The bias reduction associated with the time devoted to preliminary tests is important as it is related to prediction performance (DelSole and Shukla 2010). (2) We only ran a few start dates covering a period with no major volcanic events. In contrast, the studies based on the CMIP5 models used longer time periods (1960–2009), including major volcanic events (El Chichón, Pinatubo) that improve model skill (Guemas et al. 2013a). (3) Another difference comes from the start-date frequency: every 5 years in the CMIP5 protocol and every year in the present study. Garcìa-Serrano et al. (2015) noted different results when using a 5-year rather than a 1-year start-date frequency.

Interestingly, the model is able to simulate the observed slow-down of the global warming rate over the recent period. This hiatus is associated with a negative phase of the PDO and an increase of the ocean heat uptake in both CERFACS-HR and observations. This is consistent with Meehl et al. (2011), Kosaka and Xie (2013), and Watanabe et al. (2013) among others. Since the observed PDO has experienced a negative phase during the 2000’s (Trenberth and Fasullo 2013), the question arises as to the main driver of the PDO variations, external forcing (e.g. the volcanic forcing due to the small to medium size eruptions after 2005) or internal variability. In order to confirm the role of ocean initialization versus the external forcing, we have performed two other sets (6 members each) of 10-year simulations starting from November 2002 and differing by the applied volcanic forcing (a constant weak background is used in one set while the observed forcing is used in the second set). The two sets simulate a negative phase of the PDO (Monerie et al. submitted) with the main contribution coming from the ocean initialization and an additional cooling due to volcanic forcing (Fyfe et al. 2013; Santer et al. 2014; Haywood et al. 2014; Ridley et al. 2014; Schmidt et al. 2014; Brühl et al. 2015; Mills et al. 2016).

The last part of the paper focuses on the case study of the North Atlantic warming of the mid 1990s. The ability of the model to reproduce such an event is not directly related to the overall skill over the North Atlantic. The simulation initialized in November 1995 simulates a quick warming, of about 1 °C, in just 5 years. The analysis of temperature, salt and circulation anomalies reveals that this warming is associated with a strengthening of ocean dynamics. The subtropical gyre strengthens and its north-western boundary transports the anomalously warm water to the SPG. The SPG then warms and weakens. The increase of the North Atlantic temperature through a strengthening of the oceanic circulation is consistent with Robson et al. (2012a, b), Yeager et al. (2012) and Msadek et al. (2014a).

DH95 exhibits a robust strengthening of the heat transport by the anomalous advection of mean temperature (\(\overline{T}{v}^{\prime}\)). The SPG warming predictability comes mainly from the initialization of the ocean. Note that DH96 simulates an increase of the mean SPG temperature with an MHT change largely dominated by the mean advection of temperature anomalies (\(\overline{v}{T}^{\prime}\)) north of 50°N (Supplementary material S4). In DH96 the shift has already occurred and the temperature anomaly is incorporated in the initial conditions. The warm anomaly is located inside the SPG at the beginning of DH96 and stays inside the SPG during the simulation period (Fig. 9d). The reproducibility of the shift mechanisms is thereby dependent on the initial conditions. Interestingly, the analysis of DH96 confirms that the SPG warming may be sustained by the \(\overline{v}{T}^{\prime}\) anomaly.

There is also a contribution from the net surface heat flux, indicating a warming of the ocean by the atmosphere over the 1998–2003 period. A negative phase of the NAO is simulated during this period and favors a decrease of the wind speed over most of the SPG. This has two consequences: (1) the latent and sensible heat fluxes weaken and reduce heat loss from the ocean, and (2) the decrease of the wind speed over the south-eastern boundary of the SPG acts on the SPG circulation, favoring the northward movement of the warm anomaly along the eastern boundary of the gyre. The warming of the SPG in CERFACS-HR is thus due to both an increase of the oceanic heat transport and a local change due to atmospheric forcing, which sustains the positive temperature anomaly. This is consistent with the relation found in Bersch (2002), Bersch et al. (2007), Sarafanov et al. (2008), Lohmann et al. (2009) and Barrier et al. (2014) between an NAO− phase and anomalously warm SSTs over the North Atlantic Ocean. We hypothesize that this negative phase of the NAO acts to reduce the meridional heat transport and the mixed-layer depth anomaly over the Labrador Sea, consistent with Ortega et al. (2012).

In this study we show that the North Atlantic warming of the mid-1990s is predictable owing to the ocean initialization (ocean memory): DH95 is initialized with a strong AMOC and thus with an anomalously strong oceanic heat convergence.