1 Introduction

As an integrator of both natural variability and past external forcing, the ocean contains much of the climate predictability at decadal time scales. Hence, properly initializing the ocean state provides crucial added skill to decadal climate predictions (e.g. Meehl et al. 2013; Guemas et al. 2013). The enhanced skill due to initialization has been demonstrated for various global and regional climate processes, such as the abrupt warming of the North Atlantic subpolar gyre in the 1990s (Yeager et al. 2012; Robson et al. 2012; Wouters et al. 2013), the increase in Atlantic hurricane frequency since the 1970s (Vecchi et al. 2013; Smith et al. 2010), North Atlantic sea surface temperature (SST) (Keenlyside et al. 2008; Pohlmann et al. 2009; van Oldenborgh et al. 2012; Hazeleger et al. 2013a; Bellucci et al. 2013) and tropical Pacific SST (Keenlyside et al. 2008). This enhanced skill is also evident for climate indices in different basins, such as the Atlantic Multi-decadal Oscillation (AMO, Garcia-Serrano et al. submitted), the Pacific Decadal Oscillation (PDO) (Chikamoto et al. 2012), the Atlantic Meridional Overturning Circulation (AMOC) (Swingedouw et al. 2013; Pohlmann et al. 2013), and the global SST (Smith et al. 2007; Doblas-Reyes et al. 2013).

In order to initialize the ocean in their prediction models, decadal prediction research groups use either ocean reanalyses or in-house initial conditions. The latter choice is motivated by the usually large initial shock of the former, due to the incompatibility between the “alien” reanalysis and the “home” model used for the hindcasts. The initialization can be either full state or anomaly, and it can concern just the surface of the ocean or the deeper layers as well. The limitation of the full state approach is that the model forecasts drift towards the model’s systematic error, while in the anomaly approach there are possible mismatches between observed anomalies and model climatology (e.g. in the Gulf Stream region). It remains unclear which of the strategies is better suited to decadal predictions, as both show similar skill (or lack thereof) (Magnusson et al. 2013; Smith et al. 2013; Hazeleger et al. 2013b). The other issue is to keep the model’s internal dynamics coherent so that the hindcasts and forecasts remain on the observed trajectory. For this purpose, nudging only the surface appears to be a less intrusive approach. The idea is to assume that oceanic processes themselves will transport the surface forcing information into the ocean interior (Keenlyside et al. 2008; Swingedouw et al. 2013). Of course, surface nudging imposes less constraint on the ocean interior, so that one has to assess the resulting subsurface reconstruction.

The depth coverage of subsurface temperature observations has increased drastically since 1950, from the top 100 m to 1,000 m in the twenty-first century (Carton et al. 2012). Nevertheless, these improvements fall short of providing global three-dimensional (3D) observations of the ocean for the last 50 years or so, as required for decadal prediction exercises. Thus one has to resort to ocean reanalyses that reconstruct the ocean state using some type of data assimilation scheme and ocean General Circulation Models (GCMs), themselves driven by air-sea fluxes derived from atmospheric reanalyses. These ocean reanalyses provide information on the past oceanic state and can be used as initial conditions for climate prediction models (e.g. Chang et al. 2013; Bellucci et al. 2013). However, the reanalyses differ in their reconstruction of regional climate processes and upper ocean heat content (Xue et al. 2012; Zhu et al. 2012). Beyond the lack of observations, the uncertainty among reanalyses is also attributed to the chaotic nature of the heat content variability in certain regions (Xue et al. 2012), the changing density of observations over time (Carton et al. 2012), the models used, the assimilation methods and the forcing fields (Xue et al. 2012; Zhu et al. 2012; Masina et al. 2011), in particular in the tropical Atlantic and Southern Ocean. These uncertainties have to be taken into account when evaluating reconstructions made for decadal prediction studies, and one should use several reanalyses. When nudging SST only, these problems are somewhat reduced, since more quality SST observations than subsurface observations are available over the last century (Carton et al. 2012). Kumar et al. (2014) have shown that SST nudging is able to reconstruct the observed subsurface ocean variability in the equatorial Pacific through coupled air-sea interaction processes. They suggest that such specification of SST could replicate the ENSO-related variability in the ocean subsurface (e.g. Luo et al. 2008). Merryfield et al. (2010) recently demonstrated the performance of SST nudging in their multi-seasonal ensemble predictions. However, idealized model experiments have highlighted some potential problems in not considering potentially valuable subsurface data, leading to a spurious AMOC and poor skill in decadal predictions in the extra-tropics (Dunstone and Smith 2010).

In an attempt to further understand the processes at play, Servonnat et al. (2014, hereafter S2014) analyzed the impact of surface nudging on subsurface reconstruction through a perfect model approach. They performed several sensitivity experiments on a 150-year portion of a control run, with SST restoring and sea surface salinity (SSS) restoring to another portion of the control simulation. Their study showed that SST restoring is efficient in reconstructing the subsurface ocean in the tropics, mainly in the tropical Pacific, with significant correlations down to 2,000 m. This reconstruction is facilitated by dynamical adjustment processes due to the impact of SST on the tropical atmospheric circulation and surface winds. In the mid to high latitudes, SSS nudging applied in addition to SST nudging leads to an improved reconstruction of the 3D temperature and salinity, mainly through the formation of water masses at the right surface densities, followed by dynamical processes transmitting the signal to greater depths. It now remains to be evaluated how these results hold when restoring the model towards actual SST observations in a historical context.

The current study follows the work of Swingedouw et al. (2013) and S2014. The same climate model, IPSLCM5A, is used and the performance of surface initialization for the period 1949–2005 in reconstructing subsurface variations is evaluated with a focus on ocean temperature. Swingedouw et al. (2013) used nudged and hindcast simulations to study the initialization and predictability of the AMOC and here we extend the focus to the global and regional reconstruction of the ocean state. We also assess the impact of horizontal atmospheric resolution on the performance of the reconstruction.

The paper is organized as follows. Section 2 describes the model and the experimental setup, including an overview of the climate model, the model simulations and the observation-based reconstructions used, together with the statistical methods employed for the analysis. Section 3 provides a background comparison among a few ocean reanalyses and among model-independent reconstructed datasets. Section 4 presents the comparison of the nudged and historical simulations to these observation-based independent datasets. Section 5 provides a discussion and conclusions of the analysis and implications for future studies (Table 1).

Table 1 List of simulations

2 Experimental design

2.1 Model description

We use the earth system model from the Institut Pierre Simon Laplace (IPSL) in its version 5A (IPSLCM5A) (Dufresne et al. 2013), both in low atmospheric resolution (IPSLCM5A-LR) and in medium atmospheric resolution (IPSLCM5A-MR). The atmospheric model is LMDZ5 (Hourdin et al. 2013), with 96 × 96 points on the horizontal grid, corresponding to a resolution of 1.875° × 3.75°, in its “low resolution” (LR) version and with 144 × 143 points, corresponding to a resolution of 1.25° × 2.5°, in its “medium resolution” (MR) version. Both LR and MR use 39 vertical layers. The impact of the refined horizontal atmospheric grid on atmospheric processes is described in Hourdin et al. (2013). The ocean model is NEMOv3.2 (Nucleus for European Modelling of the Ocean, Madec 2008) in the ORCA2 configuration, which has 182 × 149 points on a non-regular grid with a nominal resolution of 2°, refining to 1/2° in the tropics, and 31 vertical levels. NEMOv3.2 also includes the sea-ice component LIM2 (Fichefet and Maqueda 1997) and PISCES (Aumont and Bopp 2006) for ocean biogeochemistry. The performance of the oceanic configurations in the coupled model is evaluated in Mignot et al. (2013).

2.2 Historical simulations

The first set of simulations analyzed in the study consists of historical simulations for the period 1850–2005, following the CMIP5 protocol. For each model resolution, different members are considered, differing only by their initial conditions. Initial conditions in 1850 are taken from a 1,000-year control simulation under preindustrial conditions from different dates separated by 10 years. Four members are considered with the LR configuration and three with the MR. The external forcing used for historical simulations includes the increase in greenhouse gases, aerosol concentrations, ozone changes, land-use changes, estimates of solar irradiance and volcanic eruptions (cf. Dufresne et al. 2013).

2.3 Nudged simulations

The second set of simulations consists of nudged simulations. In these simulations, simulated SST anomalies are restored towards observed ERSST anomalies (Reynolds et al. 2007; Smith et al. 2008). In practice, a heat flux term $Q_R$ is added to the SST conservation equation, with $Q_R = \gamma\,(SST_{mod} - SST_{obs})$, where $SST_{mod}$ is the anomalous SST from the model at each grid point and each time step and $SST_{obs}$ is the observed SST anomaly at that location and time. However, to avoid tampering with the ocean internal dynamics that contain the predictive signal, nudging is not performed when sea ice covers more than 50 % of the grid point in the higher latitudes (Swingedouw et al. 2013). The hypothesis to be tested in the current study is that on long timescales the surface signal penetrates into the ocean interior through physical processes. The SST anomalies for nudging are computed over the period 1949–2005 in both the historical simulations and ERSST. We use a restoring coefficient $\gamma = -40$ W/m²/K. This value is chosen on a physical basis: it corresponds to the observed magnitude of thermal air-sea coupling and implies a relaxation timescale of around 60 days (Swingedouw et al. 2013; S2014). By using this value, which is lower than that used in other studies (e.g. Keenlyside et al. 2008), we aim to avoid the formation of spurious water masses in strong gradient regions. Each nudged simulation starts on 1 January 1949 with initial conditions obtained from an individual historical simulation and uses the same external forcing as the latter. The sets of nudged simulations are composed of 4 members for the LR configuration and 3 for the MR.
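As a rough illustration of the restoring term described above, the following Python sketch evaluates the heat flux for given model and observed SST anomalies, switches it off where sea ice covers more than half of the grid point, and converts the restoring coefficient into an e-folding time. The function names, the seawater density and heat capacity, and the 50 m mixed-layer depth are assumptions chosen for illustration; they are not taken from the model code.

```python
import numpy as np

GAMMA = -40.0   # restoring coefficient quoted in the text, W m^-2 K^-1
RHO = 1025.0    # ASSUMED seawater density, kg m^-3
CP = 3990.0     # ASSUMED specific heat of seawater, J kg^-1 K^-1
H_MIX = 50.0    # ASSUMED mixed-layer depth, m (not stated in the text)


def restoring_flux(sst_mod_anom, sst_obs_anom, ice_fraction,
                   gamma=GAMMA, ice_limit=0.5):
    """Return gamma * (SST_mod - SST_obs), set to zero under heavy sea ice."""
    diff = np.asarray(sst_mod_anom, float) - np.asarray(sst_obs_anom, float)
    flux = gamma * diff
    # No nudging where sea ice covers more than ice_limit of the grid point.
    return np.where(np.asarray(ice_fraction, float) > ice_limit, 0.0, flux)


def relaxation_timescale_days(gamma=GAMMA, h_mix=H_MIX, rho=RHO, cp=CP):
    """E-folding time of a slab of depth h_mix relaxed with coefficient gamma."""
    return rho * cp * h_mix / abs(gamma) / 86400.0


# Model anomaly 0.5 K too warm at two points; the second is ice covered.
flux = restoring_flux([0.5, 0.5], [0.0, 0.0], [0.0, 0.8])
tau_days = relaxation_timescale_days()
print(flux, round(tau_days, 1))
```

With these assumed values the timescale comes out close to the 60 days quoted above; it scales linearly with the assumed mixed-layer depth.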

2.4 Observational datasets: reanalysis

In order to assess the quality of the oceanic reconstruction obtained with the nudging, we use different observational datasets. First, we use HadISST (Rayner et al. 2003) as an independent dataset against which to compare the surface temperature from the nudged simulations as well as from the ocean reanalyses. The subsurface reanalyses considered in the study are the Simple Ocean Data Assimilation (SODA, Carton and Giese 2008) and the European Centre for Medium-Range Weather Forecasts Ocean ReAnalysis System (ORAS4, Balmaseda et al. 2013).

SODA version 2.2.4 (hereafter SODA) (Giese and Ray 2011; Ray and Giese 2012) uses POP version 2.0.2 as its ocean model, with an average resolution of 0.25° × 0.4° and 40 vertical levels with 10 m spacing in the upper 100 m. It assimilates surface temperatures from the ICOADS2.5 dataset and subsurface temperature and salinity observations from the World Ocean Database 2009 (WOD09). Satellite altimeter data were not assimilated. SODA 2.2.4 spans the period 1871–2005 and uses the twentieth century reanalysis (20CRv2, Compo et al. 2011) for its surface boundary conditions.

ORAS4 (1958–2009) uses NEMO version 3.0 as its ocean model in an ORCA1 configuration with 42 vertical levels, 18 of which are in the upper 200 m. Apart from temperature and salinity profiles from the EN3_v2a dataset, ORAS4 also assimilates along-track altimeter-derived sea-level anomalies (Balmaseda et al. 2013). The surface forcing for ORAS4 is derived from the ERA40 reanalysis until 1989 and from the ERA-Interim reanalysis from 1989 to 2009.

The two reanalyses also use different methods to assimilate observations: SODA 2.2.4 uses a sequential optimal interpolation scheme, whereas ORAS4 uses NEMOVAR, an incremental four-dimensional variational scheme. Thus, these two reanalyses differ in many aspects, although they assimilate similar observations (except that SODA 2.2.4 does not use altimeter data). To consider periods overlapping with our simulations, we retain 1949–2005 from SODA 2.2.4 and 1958–2005 from ORAS4 for our analyses.

Various objectively reconstructed datasets of the upper (0–300 and 0–700 m) ocean heat content are made available by Levitus et al. (2009) (Levitus hereafter), Ishii and Kimoto (2009) (Ishii hereafter) and Domingues et al. (2008) (Domingues hereafter). These datasets use various statistical in-filling methods in regions without observations. Levitus and Ishii use objective mapping techniques to in-fill data, whereas Domingues uses statistics of observed ocean variability estimated from altimeter data. AchutaRao et al. (2007) point out the difficulty of reliably estimating the observed heat content variability from incomplete, time-varying observations. Nevertheless, we use these additional datasets to evaluate the performance of our experiments, since they do not rely on any general circulation model.

2.5 Statistical analysis

In order to remove the long-term trend associated with global warming in the different simulations and observational datasets, we linearly detrend (unless otherwise mentioned) the yearly averages (1949–2005) in the simulations, HadISST, and the reanalyses. Time series are detrended after spatial averaging, while maps showing grid-point analyses are detrended linearly at each grid point.
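The two detrending variants mentioned above (a spatially averaged time series, and each grid point of a map separately) can be sketched with an ordinary least-squares line fit. This is a minimal NumPy illustration, not the processing code used for the study; the function names and the synthetic input series are hypothetical, and missing values such as land points are not handled.

```python
import numpy as np


def linear_detrend(years, values):
    """Remove the least-squares linear trend from a 1-D yearly series."""
    t = np.asarray(years, float)
    t = t - t.mean()                      # centre time for a well-conditioned fit
    values = np.asarray(values, float)
    slope, intercept = np.polyfit(t, values, 1)
    return values - (slope * t + intercept)


def detrend_grid(years, field):
    """Detrend a (time, lat, lon) array independently at every grid point."""
    t = np.asarray(years, float)
    t = t - t.mean()
    field = np.asarray(field, float)
    flat = field.reshape(field.shape[0], -1)   # (time, points)
    coefs = np.polyfit(t, flat, 1)             # row 0: slopes, row 1: intercepts
    trend = np.outer(t, coefs[0]) + coefs[1]
    return (flat - trend).reshape(field.shape)


# Synthetic example: a warming trend plus an 11-year oscillation.
years = np.arange(1949, 2006)
series = 0.01 * (years - 1949) + 0.1 * np.sin(2 * np.pi * (years - 1949) / 11.0)
residual = linear_detrend(years, series)
print(residual[:3])
```

Detrending after spatial averaging and detrending each grid point before averaging give the same global series for a linear fit on complete data, but the per-point version is what a correlation map requires.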

When analyzing monthly averages, we remove the annual cycle in addition to linearly detrending. For each variable and each type of simulation (historical, nudged, using the LR and the MR configuration respectively), both the ensemble mean and the spread among members are considered. The spread among members, computed as a root mean squared error with respect to the ensemble mean, represents the range of variability among the members. For initialization to be considered as adding value, the ensemble standard deviation should be lower than the observed variability and of the same order of magnitude as the observational uncertainty. Confidence in the reconstruction is gained if that range encompasses the observed variability. We also consider correlations between our simulations and the observational datasets. On all spatial maps, only correlations significant at the 90 % level are shown in colors. Significance of correlations is estimated using a one-sided Student's t test, where the serial autocorrelation is taken into account in the computation of the degrees of freedom following Bretherton et al. (1999). We also use a 20-year sliding window to compute the correlation among datasets in order to assess the temporal variation in the correlation. With this window, correlation coefficients for the period 1949–2005 are shown from 1959 (for 1949–1968) to 1996 (for 1986–2005). Significance is estimated following the same procedure.
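The significance test and the sliding-window correlation described above could look roughly as follows. The text only states that the degrees of freedom follow Bretherton et al. (1999); the specific formula used here, N_eff = N (1 − r1 r2) / (1 + r1 r2) with r1 and r2 the lag-1 autocorrelations of the two series, is an assumption about which form was applied, as are the function names, the clamping of N_eff and the labelling of each window by its start year plus ten.

```python
import numpy as np
from scipy import stats


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, float) - np.mean(x)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))


def effective_sample_size(x, y):
    """ASSUMED form: N_eff = N (1 - r1 r2) / (1 + r1 r2)."""
    n = len(x)
    r1r2 = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = n * (1.0 - r1r2) / (1.0 + r1r2)
    # Illustrative choice: keep at least one degree of freedom, at most n samples.
    return float(min(max(n_eff, 3.0), n))


def correlation_one_sided(x, y):
    """Pearson correlation and one-sided p-value with N_eff - 2 dof."""
    r = float(np.corrcoef(x, y)[0, 1])
    dof = effective_sample_size(x, y) - 2.0
    t_stat = r * np.sqrt(dof / max(1.0 - r * r, 1e-12))
    return r, float(stats.t.sf(t_stat, dof))


def sliding_correlation(years, x, y, window=20):
    """Correlation in each window, labelled by window start year + window // 2."""
    out = []
    for i in range(len(years) - window + 1):
        r, p = correlation_one_sided(x[i:i + window], y[i:i + window])
        out.append((int(years[i]) + window // 2, r, p))
    return out


# Synthetic series sharing most of their variability.
rng = np.random.default_rng(0)
years = np.arange(1949, 2006)
x = rng.standard_normal(years.size)
y = x + 0.1 * rng.standard_normal(years.size)
windows = sliding_correlation(years, x, y)
print(windows[0][0], windows[-1][0], len(windows))
```

For 1949–2005 this yields 38 windows labelled 1959 to 1996, matching the range given in the text; a correlation would be flagged at the 90 % level when its p-value is below 0.1.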

To understand the impact of surface nudging in reconstructing the subsurface at different depths in global and regional domains, we first assess how well and where the two reanalyses agree and differ.

3 Comparison among datasets

3.1 Winds

Here we present an assessment of the wind stress differences among the reanalyses. The yearly zonal wind stress anomalies from the ERA-40 and 20CRv2 reanalyses are significantly correlated over much of the ocean surface for the period 1958–2001 (Fig. 1). Correlations of 0.9 are seen in the northern ocean basins, the equatorial Pacific and the southwestern Pacific. Poor correlation is seen in most of the tropical and southern Atlantic, the central to southern Indian Ocean, and the eastern Pacific. Although a detailed assessment of discrepancies among the wind products is beyond the scope of this study, we note that the northern basins are observationally better covered than the southern basins, which might explain the higher correlations. In poorly observed areas the reanalyses depend more strongly on the underlying GCMs, which can have large biases. The southwest Pacific shows better agreement among the reanalyses than other southern basins, although large biases in the reanalyses exist (Bromwich and Fogt 2004). This region witnesses the poleward and eastward wave train patterns in circulation anomalies associated with the El Niño teleconnection in the southern hemisphere (Karoly 1989). We note that the zonal wind stresses in the reanalyses agree much better in the equatorial Pacific (correlation of 0.8), which has better observational coverage (TOGA-TAO; McPhaden et al. 1998), than in the equatorial Atlantic (correlation of 0.45).

Fig. 1

In-phase correlation between zonal wind stress from 20CRv2 (used as forcing in SODA) and ERA-40 (used as forcing in ORAS4) for 1958–2001. Only correlations significant at the 90 % level are shown. We account for auto-correlation within the time series for the computation of the degrees of freedom following Bretherton et al. (1999). This is the case for all statistical tests performed in this study

3.2 Latitude-depth section

Figure 2a–d shows the correlation of annual anomalies of zonally averaged temperature between the reanalyses as a function of depth. The correlation between the global zonal means (Fig. 2a) is significant down to a depth of around 400 m. In both the Pacific and Atlantic basins, significant correlation reaches down to around 600 m in the subtropical region between 30°N and 40°N (Fig. 2b, c), where subduction of water masses enables penetration of surface signals into the ocean interior. The signal is consistent down to 800 m in the latitudinal band 50°N–60°N, mostly in the North Atlantic (Fig. 2c), where deep convection events occur. While the correlations are in general quite low even where significant at the 90 % level, most of the highest correlations in the equatorial band come from the equatorial Pacific (Fig. 2b), which can be related to the longer observational coverage by the TAO array.

Fig. 2

In-phase correlation between SODA and ORAS4 for annual mean over the period 1958–2001 for zonally-averaged temperature for a global, b Pacific, c Atlantic, and d Indian Ocean. Correlations significant at the 90 % level are shown. Isotherms of SODA overlaid in black

In the framework of oceanic initialization for decadal predictions, the results from Fig. 2 raise the crucial question of which reanalysis to use to initialize or validate the ocean interior. For the present study, whose purpose is to assess the performance of surface nudging in reconstructing the oceanic variability, we use the correlation presented here as a benchmark, considering that we cannot validate the nudged simulations in regions where the reanalyses do not agree.

3.3 Upper ocean heat content

Upper ocean heat content is an integrated view of the vertical temperature structure. It is therefore an important parameter to reconstruct in the context of generating initial conditions for decadal climate predictions (Meehl et al. 2009; van Oldenborgh et al. 2012). Several groups (Levitus, Domingues, Ishii) have reconstructed this integrated quantity without using a climate model.
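To make the notion of an integrated quantity concrete, heat content per unit area down to a fixed depth can be written as ρ c_p ∫ T dz over the upper layers. The sketch below evaluates this for a single temperature (or temperature anomaly) profile on layers with known bounds. The density and heat capacity values, the layer bounds, the profile and the function names are illustrative assumptions, not the definitions used by any of the cited products.

```python
import numpy as np

RHO = 1025.0   # ASSUMED seawater density, kg m^-3
CP = 3990.0    # ASSUMED specific heat of seawater, J kg^-1 K^-1


def layer_thickness_above(depth_bounds, max_depth):
    """Thickness of each layer that lies above max_depth (partial layers clipped)."""
    bounds = np.asarray(depth_bounds, float)
    top = np.minimum(bounds[:-1], max_depth)
    bottom = np.minimum(bounds[1:], max_depth)
    return bottom - top


def heat_content_per_area(temp, depth_bounds, max_depth):
    """rho * cp * sum(T * dz) from the surface to max_depth, in J m^-2."""
    dz = layer_thickness_above(depth_bounds, max_depth)
    return RHO * CP * float(np.sum(np.asarray(temp, float) * dz))


# Hypothetical anomaly profile (K) on four layers bounded at these depths (m).
bounds = [0.0, 100.0, 300.0, 700.0, 1000.0]
anomaly = [1.0, 0.5, 0.2, 0.0]
ohc300 = heat_content_per_area(anomaly, bounds, 300.0)
ohc700 = heat_content_per_area(anomaly, bounds, 700.0)
print(ohc300, ohc700)
```

A basin or global value would additionally weight each profile by its grid-cell area before summing.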

Comparison of the 0–700 m heat content (Fig. 3a) from the reconstructed datasets (Levitus, Ishii and Domingues) and the reanalyses (SODA and ORAS4) highlights the discrepancies among the different products, even for periods in the late twentieth century when the observational coverage was relatively high (Corre et al. 2012). The correlation among the different datasets is shown in Fig. 3c using a 20-year sliding window. The bold marks highlight correlations significant at the 90 % level between the respective products. The reconstructed products are significantly cross-correlated for very limited time periods only, indicating weak agreement in terms of interannual variability. The reanalyses do not show significant agreement among themselves either (Fig. 3c). This is consistent with the large discrepancies in temperature below the first few hundred meters of the upper ocean, as seen in Fig. 2. We note that the reconstructed dataset from Levitus shows significant correlation with SODA for a longer period than with ORAS4 (Fig. 3c). These results confirm the uncertainty in the upper ocean heat content variability in observations and reanalyses discussed by AchutaRao et al. (2007) and Gleckler et al. (2012). Instrumental biases, and their relative contribution to the observing system over time, have added to the discrepancies among the datasets. Indeed, Gleckler et al. (2012) pointed to the structural uncertainty among the different datasets and attributed it mainly to the impact of different methods of XBT bias correction.

Fig. 3

Time evolution of a 0–700 m heat content and b 0–300 m heat content in reanalyses and reconstructed products. Correlation (centered on 20-year sliding window) among the different c 0–700 m heat content datasets for 1955–2002 and d 300 m heat content datasets. Colors in legend show the correlated datasets. Significant correlations at the 90 % level are highlighted in symbols

The integrated oceanic heat content over the upper 300 m shows less discrepancy. In particular, reanalyzed data down to this depth are significantly correlated among themselves (Fig. 3d). This is consistent with the relatively good agreement of temperature down to 300 m depth in all basins (Fig. 2). This indicates that oceanic heat content down to 300 m could be used with relative confidence to validate the oceanic subsurface reconstruction.

3.4 Averaged temperature above fixed isotherms: Tiso

Beyond all possible sources of discrepancies among reanalyses and reconstructions, one has to consider that spatial averages of quantities integrated over a fixed depth may include water masses that cut across different isothermal surfaces and are driven by different dynamical processes (Palmer and Haines 2009). While integrating over a fixed depth in regional basins alleviates some of these difficulties, averaging temperatures above fixed isotherms, as suggested by Palmer and Haines (2009), yields a physically consistent integrated quantity. Such an approach is particularly suited to the tropical and subtropical regions up to 30°–35° of latitude, where isotherms are organized as a regular bowl. The averaged temperature above the 14 °C isotherm (denoted Tiso14) is used by Corre et al. (2012) as an indicator of upper ocean variability to evaluate the subsurface ocean variability in the global climate change context. The average depth of the 14 °C isotherm (30°S–30°N) in the reanalyses (250 m) limits a direct comparison of Tiso14 to fixed depth (0–300 m) heat content.
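One possible way to compute the isotherm depth and the averaged temperature above it for a single vertical profile is sketched below. It assumes a profile that cools monotonically with depth, linear interpolation between levels, and a uniform temperature between the surface and the first level; these choices, the example profile and the function names are illustrative and are not taken from Palmer and Haines (2009) or Corre et al. (2012).

```python
import numpy as np


def isotherm_depth(depth, temp, t_iso=14.0):
    """Depth (m) at which the profile first cools below t_iso, by linear interpolation."""
    depth = np.asarray(depth, float)
    temp = np.asarray(temp, float)
    below = np.nonzero(temp < t_iso)[0]
    if below.size == 0 or below[0] == 0:
        return float("nan")   # isotherm absent, or already colder at the first level
    k = below[0]
    frac = (temp[k - 1] - t_iso) / (temp[k - 1] - temp[k])
    return float(depth[k - 1] + frac * (depth[k] - depth[k - 1]))


def mean_temp_above_isotherm(depth, temp, t_iso=14.0):
    """Depth-averaged temperature between the surface and the t_iso isotherm."""
    d_iso = isotherm_depth(depth, temp, t_iso)
    if np.isnan(d_iso):
        return float("nan")
    depth = np.asarray(depth, float)
    temp = np.asarray(temp, float)
    keep = depth < d_iso
    # Extend the profile to the surface and close it at the isotherm depth.
    z = np.concatenate(([0.0], depth[keep], [d_iso]))
    t = np.concatenate(([temp[0]], temp[keep], [t_iso]))
    integral = np.sum(0.5 * (t[1:] + t[:-1]) * np.diff(z))   # trapezoidal rule
    return float(integral / d_iso)


# Hypothetical tropical profile: level depths (m) and temperatures (deg C).
levels = [5.0, 50.0, 100.0, 200.0, 300.0, 500.0]
profile = [26.0, 24.0, 20.0, 16.0, 12.0, 8.0]
d14 = isotherm_depth(levels, profile)
tiso14 = mean_temp_above_isotherm(levels, profile)
print(d14, tiso14)
```

Because the lower bound of the average follows the isotherm rather than a fixed depth, vertical displacements of the thermocline change the integration depth instead of directly changing the averaged temperature.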

The two reanalyses show significant correlations in the variations of the depth of the 14 °C isotherm (Fig. 4a) in the western Pacific warm pool region (greater than 0.8) and in the tropical (20°S–20°N) Pacific (0.6–0.7). These correspond to the regions where the zonal wind stresses in the two reanalyses agree well (Fig. 1). Correlations as high as 0.8 are also seen in the southern tropical Indian Ocean, where the winds are not well correlated. The cause of such high correlation is not yet known. In the tropical Atlantic, correlations are generally lower but still significant. The equatorial Pacific and the tropical Indian Ocean are better correlated in terms of Tiso14 (Fig. 4b). In the subtropical Atlantic the two reanalyses show higher correlation in Tiso14 (Fig. 4b) than in the depth of the 14 °C isotherm (Fig. 4a). Such a difference could be partly explained by the fact that the averaged temperature contains information on the surface temperature, which is nudged towards observed SST, while the depth of the isotherms represents a subsurface signal with more degrees of freedom. The fact that ORAS4 uses satellite altimeter data and SODA does not could be an additional source of difference among the reanalyses, as altimeter data affect the depth of the 14 °C isotherm by pushing the water column down (Corre et al. 2012). Following Corre et al. (2012), Tiso14 provides an interesting benchmark to quantify the quality of the reconstructed tropical upper ocean variability.

Fig. 4

Correlation (significant at the 90 % level) in a depth of 14 °C isotherm and in the b averaged temperature above 14 °C isotherm (Tiso14) in SODA and ORAS4 for the period 1958–2005

4 Performance of nudging in reconstructing the 1949–2005 climate variations

4.1 Reconstruction of SST variations

Figure 5a shows the time evolution of globally averaged annual SST for the period 1949–2005 for the simulations and two datasets (HadISST and SODA). As expected, the nudged simulations better capture the trend and variability of the data (correlation of 0.91 with HadISST over the whole period) than the historical simulations (correlation of 0.8). We detrend the global SST to correlate the natural variability component of SST (Fig. 5b). Correlations of the detrended time series with HadISST are still high for the nudged simulations (0.8) but not significant for the historical simulations (0.4). This demonstrates the ability of the nudging to imprint some of the observed variability on the modeled surface ocean temperature. However, this efficiency is not homogeneous. From an oceanic point of view, one major bias of the LR mean climate state in pre-industrial (or historical) conditions is a cold bias at mid latitudes (Dufresne et al. 2013; Mignot et al. 2013). This cold bias is due to an equatorward shift of the zonal atmospheric mid-latitude jets and a biased oceanic circulation. The winter sea-ice extent is largely overestimated in LR (e.g. Escudier et al. 2013) and thus limits deep water formation in the Nordic Seas. Figure 6 shows the in-phase correlation of detrended SST between the ensemble mean of LR-nudged and HadISST. Regarding the higher latitudes, we recall here that no nudging is applied when and where more than 50 % of the grid point is covered by sea ice, which explains the lack of correlation at high northern and southern latitudes. Beyond these specific regions, correlations are strongest in the tropics and on the poleward flank of the subtropical gyres. They are weakest (below 0.5) in regions of strong horizontal currents (the Southern Ocean and western boundary currents in particular) as well as in regions of upwelling. As discussed by S2014, all these energetic oceanic processes can overwhelm the nudging term for SST variability.

Fig. 5

a Ensemble mean of global SST (3 years smoothed, mean of 1949–2005 removed) in IPSLCM5A-LR (blue bold-nudged, thin-historical), and global SST in HadISST (line black) and SODA (dash black). Standard deviation of the ensemble members are in envelopes and bars LR-historical (blue envelope), and LR-nudged (blue bars). b Same as a but linearly detrended

Fig. 6

Correlation of yearly mean detrended SST from HadISST and ensemble mean of IPSLCM5A-LR nudged simulations for the period 1949–2005. Significant correlations at the 90 % level are shaded

4.2 Reconstruction of zonal wind stress

Figure 7 shows the correlation between the zonal wind stresses in LR-nudged and in the ERA-40 atmospheric reanalysis. While there is no linear correlation in the time evolution of the zonal wind stress between the historical simulations and the observations (not shown), Fig. 7 demonstrates that, given the strong air-sea coupling in the tropics, SST nudging allows a reconstruction of the zonal wind stress in the tropical Pacific (cf. S2014), particularly in the west, where SST correlations are highest. Indeed, at low latitudes, the SST zonal gradient leads to strong sea-level pressure gradients that drive part of the zonal wind variability. The reconstruction of surface winds may in turn strongly influence the surface and subsurface temperature in the tropics. Indeed, Kumar et al. (2014) demonstrated that SST nudging reconstructs the equatorial Pacific ocean subsurface through correct initialization of the surface winds above. Comparing figure 4a of Kumar et al. (2014) with the present Fig. 7, we find zonal wind reconstructions in both the western and central equatorial Pacific. Such a constraint of SST nudging on the zonal wind stress in tropical regions was also detected in the perfect model approach of S2014. However, this feature is not seen to such an extent in the tropical Atlantic and is absent in the tropical Indian Ocean, probably due to the lack of surface temperature nudging over the continents, which may limit the reconstruction of the large-scale circulation and hence of the surface zonal winds. The SST variability is not well constrained by SST nudging in the eastern equatorial basins, where upwelling processes strongly contribute to surface temperature variability (Fig. 6).

Fig. 7

Correlation (significant at the 90 % level) of zonal wind stress in ERA-40 (used as forcing in ORAS4) with LR-nudged for the period 1958–2001

4.3 Reconstruction of subsurface temperature

Figure 8 shows the correlation of the global zonal average of monthly temperature anomalies between the LR-nudged simulations and the reanalyses down to 800 m (color shading). The solid black line delimits the regions where the reanalyses agree (Fig. 2), and the grey hatching corresponds to domains where they do not agree (no significant correlation at the 90 % level). As discussed above, we consider that in the latter domains the uncertainty in the observed oceanic state is such that no evaluation of the model reconstruction can be done. To further emphasize this, we show the correlations using both the SODA reanalysis (top panel) and the ORAS4 reanalysis (bottom panel) for the period 1958–2005. The maximum monthly mixed layer depth is shown as a solid blue curve. Outside the tropical band it limits the maximum depth that is potentially directly affected by surface fluxes, and therefore by the nudging.

Fig. 8

Correlation (significant at the 90 % level) of globally averaged zonal subsurface temperature (monthly anomalies) between a LR-nudged simulation and SODA 2.2.4 for the period 1949–2005; and b LR-nudged and ORAS4 for the period 1958–2005. The black contour marks the boundary of 90 % significant correlation among the reanalyses; the hatched side shows where the reanalyses are not significantly correlated (and includes land points as well). The bold blue line is the maximum depth of the mixed layer. Isotherms from the nudged simulations are shown as red contours

The tropical regions (30°S–30°N) show strong correlations at the surface and in the upper ocean, following the typical bowl shape of the tropical thermocline. Correlations are minimum around the thermocline and then are significant again down to more than 200 m around the Equator. This pattern is seen with both reanalyses and originates in the Pacific Ocean (Fig. 9a). Indeed, as discussed in S2014, it is associated with the surface wind stress reconstruction in that basin (Fig. 7). Around 20°N, significant correlations in both the Pacific and Atlantic basins (Fig. 9) (stronger in the latter) extend deeper than the maximum mixed layer depth. This suggests that at these latitudes the signal from the surface due to SST nudging is able to penetrate into the ocean interior via mechanisms other than just turbulent mixing, such as the subduction that occurs near the center of the subtropical cell. Significant correlations with the reanalyses are seen between 40°N and 55°N down to 300 m, but within the winter mixed layer depth (Fig. 9a, b), in both the Pacific and Atlantic Oceans. Between 55°N and 60°N, significant correlations extend deeper in the Atlantic Ocean than in the Pacific Ocean and are associated with deep convection.

Fig. 9

Same as Fig. 8a but for a Pacific, b Atlantic, c Indian Ocean averaged zonal subsurface temperature (monthly anomalies) between LR-nudged simulation and SODA

Figure 8 further shows regions where the reconstruction can be improved, either by adding other types of surface nudging, such as SSS (cf. S2014) or wind stress, or by direct subsurface nudging. These regions (white areas between the color shading and the grey hatching) cover the Southern Ocean, mid-latitudes between ~20°S and 60°S down to 700 m and the tropics below 300 m. These regions are known to be challenging for models to represent correctly (Sloyan and Kamenkovich 2007; Sallée et al. 2013; Carman and McClean 2011).

While the analysis of reconstructed subsurface in the nudged simulations for the period 1949–2005 using correlation maps provides an assessment for the whole period, it possibly masks out different regimes of climate variability. In the following section we assess the time variability of the upper ocean heat content.

4.4 Reconstruction of upper ocean heat content variations

Figure 10a shows the variations in the 0–300 m heat content for simulations, reanalyses and the observation-based Levitus product (black line). The rapid decreases in heat content after the 1963 Agung, 1982 El Chichón and 1991 Pinatubo eruptions are relatively well reconstructed by both the nudged simulations and the historical simulations. Nevertheless, the time series show that the time evolution is better reconstructed in the nudged simulations than in the historical ones, although the amplitude of variability is underestimated in the former. For the upper ocean heat content to be considered reconstructed by the ensemble of nudged simulations, the ensemble standard deviation should encompass the different observations and should be of the same order of magnitude as the observed uncertainty. The nudged simulations show significant correlation in global 300 m heat content with the Levitus dataset for brief periods in the 1960s, 1970s and 1980s (bold symbols over the thick blue curve), while the historical simulations show no significant correlation with Levitus (Fig. 10a).
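For reference, the 0–300 m heat content of a single water column is the depth integral of ρ0·cp·T from the surface to 300 m. The sketch below illustrates that integral on a layered profile. The function name, the layer layout and the constants `RHO0` and `CP` are assumptions chosen for illustration; they are not the values or the code used for the simulations, the reanalyses or the Levitus product.

```python
RHO0 = 1025.0  # reference seawater density in kg/m^3 (assumed value)
CP = 3990.0    # specific heat of seawater in J/(kg K) (assumed value)


def heat_content(temps, layer_tops, layer_bottoms, max_depth=300.0):
    """Integrate RHO0 * CP * T from the surface down to max_depth, in J/m^2.

    temps[k] is the mean temperature (degC) of the layer lying between
    layer_tops[k] and layer_bottoms[k] (metres, positive downward).
    Layers that straddle max_depth contribute only their upper part.
    """
    total = 0.0
    for t, top, bottom in zip(temps, layer_tops, layer_bottoms):
        thickness = min(bottom, max_depth) - min(top, max_depth)
        total += RHO0 * CP * t * thickness
    return total
```

Because only anomalies and detrended variations are compared in Fig. 10, the choice of temperature reference (degrees Celsius here) does not affect the comparison.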

Fig. 10

Linearly detrended upper 0–300 m heat content in IPSLCM5A-LR (blue bold-nudged, thin-historical), Levitus (black line), SODA 2.2.4 (black dash), and ORAS4 (green dash) averaged over a global, b Tropical Pacific, c Tropical Atlantic, d Tropical Indian, e North Atlantic, and f North Pacific. Spread among nudged members in colored bars, and spread among historical runs in shaded envelope. 90 % significant correlations (over a 20-year sliding window) with Levitus are in red bold symbols. A 3-year smoothing is applied to all time series
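The processing chain named in the caption of Fig. 10 (linear detrending, 3-year smoothing, correlation over a sliding 20-year window) can be sketched on annual series as below. The helper names are hypothetical, the smoothing is assumed to be a centred running mean, and the significance test applied to each window in the paper is not reproduced here.

```python
import math
from statistics import mean


def detrend(y):
    """Remove the least-squares straight line from an evenly sampled series."""
    n = len(y)
    tbar, ybar = (n - 1) / 2, mean(y)
    slope = (sum((i - tbar) * (v - ybar) for i, v in enumerate(y))
             / sum((i - tbar) ** 2 for i in range(n)))
    return [v - ybar - slope * (i - tbar) for i, v in enumerate(y)]


def running_mean(y, width=3):
    """Centred running mean; the output is shorter by width - 1 points."""
    return [mean(y[i:i + width]) for i in range(len(y) - width + 1)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sliding_correlation(x, y, window=20):
    """Correlation of x and y in every run of `window` consecutive points."""
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]
```

With 57 annual values (1949–2005) and no smoothing, a 20-year window yields 38 overlapping correlation estimates. Smoothing and overlapping windows both reduce the number of independent samples, which any significance test on these values has to take into account.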

Figure 10b–f shows the time evolution of the 0–300 m ocean heat content averaged over different basins (defined in Fig. 1) together with the Levitus product and the two reanalyses. The decadal oscillations of the heat content in the tropical Pacific consistently seen in all the datasets (Levitus, SODA and ORAS4) for the period 1955–1980 are not well reproduced by the nudged simulations (Fig. 10b). From the 1980s, the ensemble mean of nudged simulations is significantly correlated with the Levitus product (Fig. 10b) and with SODA (not shown).

The nudged simulations show weak correlation to Levitus in the tropical Atlantic (Fig. 10c) over most of the period except in the 1980s. Nevertheless, we notice an increase in the spread of the nudged simulations over this period.

In the tropical Indian Ocean the nudged simulations show significant correlation to Levitus (Fig. 10d) for any 20-year long period between 1960 and 1990. Since the wind variability in the region is not well reconstructed, the relative agreement of both the historical and the nudged simulations with Levitus in 0–300 m heat content may be due to external forcing such as volcanic eruptions. We also notice in this basin a large discrepancy between the model simulations and the reconstruction around 2000, whose origin remains unknown.

Regarding the mid to high latitudes, significant correlations in 0–300 m heat content between LR-nudged and Levitus are found in the northern Pacific (Fig. 10f) for the period 1962–1985. The reanalyses are also close to Levitus in this region. The late 1980s do show significant reconstruction of the North Atlantic heat content in the nudged simulations (Fig. 10e). We also notice a similar large increase in the late 1990s in both the observations and the nudged simulations. It could be explained by a large increase in the AMOC over the same period (cf. Swingedouw et al. 2013), as also suspected in the data by Robson et al. (2012) and Yeager et al. (2012).

As expected from the previous section and Figs. 8 and 9, the nudged simulations are not able to reconstruct the global average of 700 m heat content compared to Levitus (not shown).

4.5 Tiso14

The depth of the 14 °C isotherm in the Tropical Pacific varies from 200 m in the eastern Pacific to 400 m in the off-equatorial western Pacific (Fig. 11) in the simulations. The eastern equatorial Pacific is characterized by a bias in the mean depth of the isotherm, located near the sharp zonal gradient in the thermocline (Fig. 11). The 14 °C isotherm in SODA is deeper (between 200 and 300 m) than in the LR nudged simulations (100–200 m shallower than SODA) in the eastern equatorial region. Both the Tiso14 and the upper 300 m heat content show no significant correlation in this region (Fig. 12). Neither the Tiso14 (Fig. 12a) nor the upper 300 m heat content (Fig. 12b) is consistently reconstructed globally in the nudged simulation. Figure 11 shows that away from the outcropping latitudes, the 14 °C isotherm is generally too shallow in the eastern Atlantic and Pacific in the simulations compared to the reanalyses. Surface nudging does not significantly alter the depth of the 14 °C isotherm (Fig. 11a, b). In the tropical Atlantic, where the depth of the 14 °C isotherm is close to 200 m, the averaged Tiso14 time series is reconstructed in the nudged simulations, showing significant correlations to the reanalyses (Fig. 13c). In the western subtropical Atlantic gyre, where the 14 °C isotherm is deepest in both the simulations and the reanalyses, it is shallower in the nudged simulations (600–650 m) than in SODA (750 m). The tropical Indian Ocean shows a deeper 14 °C isotherm (300–400 m) in the nudged simulations than in SODA (200–300 m); however, the Tiso14 time series is significantly reconstructed compared to the reanalyses (Fig. 13d).
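As an illustration of the diagnostic, the depth of the 14 °C isotherm in a single temperature profile can be located by linear interpolation between model levels, and Tiso14, taken here as the mean temperature above that isotherm, follows from a trapezoidal integral. This is only a sketch under those assumptions: the function names are hypothetical and the vertical interpolation actually used in the paper is not specified in this section.

```python
def isotherm_depth(depths, temps, target=14.0):
    """Depth (m) at which the profile first cools to `target`.

    depths increase downward. Returns None when the surface is already
    colder than `target` (poleward of the outcropping latitude) or when
    the profile never reaches `target`.
    """
    if temps[0] < target:
        return None
    for k in range(len(depths) - 1):
        t_up, t_dn = temps[k], temps[k + 1]
        if t_up >= target > t_dn:
            frac = (t_up - target) / (t_up - t_dn)
            return depths[k] + frac * (depths[k + 1] - depths[k])
    return None


def mean_temp_above(depths, temps, z_iso):
    """Mean temperature between the shallowest level and z_iso (trapezoidal rule)."""
    area = 0.0
    for k in range(len(depths) - 1):
        z0, z1 = depths[k], depths[k + 1]
        if z0 >= z_iso:
            break
        zb = min(z1, z_iso)
        t0 = temps[k]
        # Temperature at the bottom of the integrated segment, interpolated linearly
        tb = t0 + (temps[k + 1] - t0) * (zb - z0) / (z1 - z0)
        area += 0.5 * (t0 + tb) * (zb - z0)
    return area / (z_iso - depths[0])
```

Unlike a fixed 0–300 m layer, the integration depth in this diagnostic follows the isotherm, so it adapts to the east–west slope of the tropical thermocline.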

Fig. 11

Depth (in m) of 14 °C isotherms in a LR-historical, b LR-nudged, and c SODA respectively

Fig. 12

Correlation map (significant at the 90 % level) of a Tiso14 and b upper 300 m heat content in LR nudged simulation and SODA for the period 1949–2005

Fig. 13

Time evolution of Tiso14 averaged over 30°S–30°N a globally, b Pacific, c Atlantic and d Indian Ocean in nudged and historical simulations (color code as earlier), SODA (black dash) and ORAS4 (green dash). Smoothing of 3 year running mean is applied. Significant correlations at the 90 % level of simulations to SODA on a sliding 20-year window are in red bold symbols

Even though the tropical (30°S–30°N) Tiso14 volume (4.59 × 10¹⁶ m³) is less than the corresponding 0–300 m volume (5.59 × 10¹⁶ m³), the reconstructed Tiso14 in the tropical Pacific is correlated with the reanalyses over a much longer time period (35 points of significant correlation on 20-year periods in Fig. 13b versus 10 points in Fig. 10b). The 0–300 m heat content in the nudged simulations in the tropical Atlantic is not significantly reconstructed (except for 1980–1988) compared to SODA (not shown) or Levitus (Fig. 10c, between 1960 and 1975). However, the same region shows significant reconstruction of Tiso14 by the nudged simulations (Fig. 13c). The tropical Indian Ocean 0–300 m region has a larger volume (10.06 × 10¹⁵ m³) than the Tiso14 region (8.3 × 10¹⁵ m³) but both are significantly reconstructed (Figs. 10d, 13d).

5 Impact of enhanced horizontal resolution in the atmosphere

In this section we assess the performance of the MR nudged simulations compared to the LR ones. Hourdin et al. (2013) show a reduction of the atmospheric bias, and thus a reduction of the mid-latitude cold SST anomaly, with a refined atmospheric horizontal grid. Comparative analysis (not shown) shows that the curl of the wind stress is stronger in MR than in LR in the Southern Ocean, in the equatorial Pacific, in the Gulf Stream region, extending from the subtropical to the subpolar region in the North Atlantic as well as in the Nordic Seas, and along the Kuroshio region in the North Pacific. Strong differences (not shown) over the Antarctic continent indicate the impact of higher atmospheric resolution on the Southern Ocean winds. The MR simulation is warmer by 2 °C in most mid-latitude regions. Differences are nevertheless much stronger in the North Atlantic (up to 3.5 °C), particularly at the southern tip of Greenland, in the Nordic Seas and in the subpolar gyre region.

The reduction of the mid-latitude cold bias is related to a poleward shift of the mid-latitude jets (Hourdin et al. 2013), which has a positive impact on the ocean circulation in the North Atlantic: the wind-driven warm and salty waters from the tropics are advected further north, reinforcing deep water formation (cf. Marti et al. 2010). The reduction of winter sea-ice extent in MR enables more deep water formation in the Nordic Seas (Fig. 14). As a consequence, the AMOC maximum in the preindustrial simulation increases from 10.3 Sv in LR to 13.5 Sv in MR. Note that the reduction of the sea ice cover in MR is also particularly important in the framework of our nudging protocol, since nudging is not applied under sea ice. The area over which nudging is applied is thus considerably larger in the MR configuration, so that the North Atlantic is better represented in MR than in LR when compared to reanalyses and observations.

Fig. 14

Difference in winter mixed layer depth (m) between MR and LR for the period 1949–2005 using ensemble means. Purple: deeper winter mixed layer in MR. Contours show the winter mixed layer depth in MR. The winter sea-ice boundary is shown as a dashed line for the LR simulation and as a solid black line for the MR simulations

Nevertheless, except at high latitudes, the overall performance of surface nudging on SST is generally not strongly influenced by the present increase of atmospheric horizontal resolution (not shown). Temperature reconstruction at depth is also very similar in the tropical and subtropical basins. Deeper and higher correlations with both reanalyses are found around 50–60°N (Fig. 15). These are due to the reconstruction in deep convection areas (see also S2014).

Fig. 15

Same as Fig. 9b but for correlation between MR-nudged simulation and SODA

The present increase in horizontal resolution in the atmosphere only improves the reconstruction of the subsurface oceanic state in the North Atlantic, which could be of importance since it is one of the most promising regions for predictability (van Oldenborgh et al. 2012). The impact of the improved mean state in MR, notably for the AMOC, is not evaluated here in terms of predictability as compared to observations, but could potentially be more significant than the differences in reconstruction of the subsurface ocean. Note that these limited improvements may come from the fact that the increase in resolution is modest. Nevertheless, Hodson and Sutton (2012) also showed that much higher resolution had only a limited impact on the variability and adjustment of their climate model.

6 Discussions and conclusions

In this study, we assessed to what extent SST nudging can reconstruct the (poorly) observed subsurface ocean variability in a climate model. The motivation is to provide initial conditions in the ocean interior for the decadal hindcasts using only surface observations. An ensemble of nudged simulations performed with the IPSLCM5A model using the 1949–2005 CMIP5 historical forcing is compared to model-independent reconstructed observations and observation-based reanalysis products. The sensitivity to the atmosphere model resolution is also explored using IPSLCM5A in low (LR) and medium (MR) resolutions. The strategy to initialize the climate model through surface nudging only is motivated by the fact that (1) surface observations are much more reliable and consistent than subsurface observations over the last century and (2) nudging at depth tampers with the model’s variability trajectory where most of the predictability lies (Swingedouw et al. 2013; Servonnat et al. 2014, hereafter S2014). The assumption is that oceanic processes can transmit the reconstructed surface properties to the deeper layers. The surface is restored towards Reynolds SST anomalies with a restoring coefficient of −40 W/m² that corresponds to the observed magnitude of thermal air-sea coupling (Swingedouw et al. 2013).
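To give a sense of the strength of this restoring, the sketch below writes the nudging as an extra surface heat flux proportional to the mismatch between modelled and observed SST anomalies, and converts the coefficient into an e-folding time for an idealized slab mixed layer. Several assumptions are made for illustration only: the coefficient is taken to be expressed per kelvin of SST mismatch, the flux is counted positive into the ocean, and the density, heat capacity and function names are not taken from the model code.

```python
GAMMA = -40.0  # restoring coefficient, assumed to be in W/m^2 per kelvin of mismatch
RHO0 = 1025.0  # reference seawater density in kg/m^3 (assumed value)
CP = 3990.0    # specific heat of seawater in J/(kg K) (assumed value)


def nudging_flux(sst_anom_model, sst_anom_obs, gamma=GAMMA):
    """Extra surface heat flux in W/m^2 (positive into the ocean, assumed convention)."""
    return gamma * (sst_anom_model - sst_anom_obs)


def relaxation_time_days(mixed_layer_depth, gamma=GAMMA):
    """E-folding time of a slab of the given depth (m) relaxed by the flux above."""
    return RHO0 * CP * mixed_layer_depth / abs(gamma) / 86400.0
```

Under these assumptions, a model SST anomaly 0.5 K warmer than observed yields a flux of −20 W/m² that cools the surface, and a 50 m mixed layer relaxes towards the observed anomaly over roughly two months.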

The ocean reanalyses used to measure the degree of reconstruction at depth (SODA and ORAS4) show substantial differences between them, likely due to different forcing fields, base models, assimilation techniques and lack of subsurface observations. The oceanic domains where they are significantly correlated vary with depth (Fig. 2) and sometimes are close to the surface. We only assess the nudged simulations in these regions, as no estimate of the actual ocean subsurface is otherwise available.

The equatorial Pacific (down to 200–300 m), the subtropical Atlantic (~10°–20°N and 40°N, down to 400–600 m), the Nordic Seas convection site (down to 800 m, mostly in MR) and some parts of the Southern Ocean (down to 700 m) exhibit significant subsurface reconstruction over the 1950–2005 period in the surface-nudged simulations. The equatorial Pacific subsurface reconstruction down to 2,000 m in the perfect model framework (S2014) and down to 300 m in the historical context is mainly due to the reconstruction of the surface winds through the SST nudging. LR and MR exhibit little difference in their ability to reconstruct the deeper ocean. One notable exception is the North Atlantic, where improved circulation and sea-ice cover in MR lead to much improved convection in the Nordic Seas and alleviate the spurious convection issue seen in LR (cf. S2014).

Integral quantities have been evaluated in order to describe the time evolution of the correlation beyond the mean correlation over the whole 50-year period of the analysis. The 700 m ocean heat content has been evaluated against the reanalyses and several observation-based and model-independent reconstructions. The lack of any significant correlation between these references (and with the reanalyses) precludes using them to evaluate the reconstructions in the nudged simulations on global domain averages. The 300 m ocean heat content exhibits better agreement among the reconstructions and is also partly reconstructed in the different nudged simulations. An alternative diagnostic in the form of the average temperature above a fixed isotherm (Tiso14, Corre et al. 2012) is then used to assess the time evolution of a more dynamical ocean subsurface reconstruction. This diagnostic is limited to the tropics, as the outcropping latitude of the 14 °C isotherm is within 40° of latitude, but it is able to account for the east–west sloping gradients present in the tropics. The reconstructed tropical subsurface assessed through Tiso14 in individual basins shows significant correlation to SODA in the nudged simulations for most of the period, displaying comparable correlation to 0–300 m heat content variability in the tropical basins.

This study demonstrates the potential of surface nudging to reconstruct the deeper ocean in a number of key regions of the oceans (tropical Pacific, North Atlantic). However, the salinity is not as well reconstructed (not shown) except for a small equatorial region in the Pacific that is attributed to the reconstruction of the atmospheric freshwater budget through SST-atmosphere coupling (see S2014 for detailed discussion). S2014 demonstrated the impact of surface nudging in a perfect model framework and the current study extends their results to the historical and observational context. The regions of unsuccessful subsurface reconstruction in the nudged simulations that lie within the region where reanalyses agree provide space for further improvement. The added impact of sea surface salinity and surface wind nudging on subsurface reconstruction is currently being investigated and will be reported in a subsequent study.