1 Introduction

The Arctic Ocean has experienced a sharp decline in sea ice extent and thickness in recent decades (Serreze et al. 2007; Comiso et al. 2008; Parkinson and Cavalieri 2008; Stroeve et al. 2012), and a record-breaking low in Arctic sea ice extent of 3.61 million km² was reached in September 2012 (http://www.nsidc.org/arcticseaicenews/). This dramatic evolution of the sea ice cover has been seen as an early indication of global climate change (IPCC 2007) and has aroused worries about the consequences of such changes for the European climate (Tang et al. 2014). On seasonal to decadal time scales, the Arctic sea ice cover changes have been shown to have a significant impact on the Northern Hemisphere climate (e.g. Francis et al. 2009; Petoukhov and Semenov 2010; Outten and Esau 2011). Polar climate studies are, however, severely hampered by the sparseness of sea ice observations over the last century.

Up to 1973, the Arctic sea ice data have been limited to monthly estimates of sea ice extent, with complete cover assumed within the ice pack and a necessary treatment of missing data in the marginal seas (Walsh and Johnson 1978). The situation is even worse in the Antarctic, where sea ice data are limited to estimates of extent climatologies over two distinct periods: 1929–1939 (Deutsches Hydrographisches Institute 1950) and 1947–1962 (Tolstikov 1966). From 1973, the U.S. Navy, Canadian and Danish aerial reconnaissance provided quasi-weekly estimates of sea ice concentration (Knight 1984). Following the advent of satellite microwave imagery in 1978, sea ice concentration data became available at a 2-day frequency, later increased to a daily frequency (in 1987), and roughly at a 1° resolution (Cavalieri et al. 1996). The publicly distributed sea ice concentration datasets, the National Snow and Ice Data Center (NSIDC) sea ice concentration fields (Cavalieri et al. 1996) covering the 1978–present period and HadISST (Rayner et al. 2003) covering the 1870–present period, stand as the best estimates obtained from combination, homogenization and extrapolation of these sparse observational data. When considering the sea ice thickness, the lack of observational data is even more striking. The first unified dataset (Lindsay 2010) was released in 2010. It combines Arctic submarine observations from 1975, moored upward-looking sonar observations from 1990 and airborne or satellite electromagnetic measurements in the last decade.

The knowledge of the initial climate system state has been shown to be a source of information in seasonal forecasts (Balmaseda and Anderson 2009; Doblas-Reyes et al. 2006) as well as in decadal climate predictions (Smith et al. 2007; Keenlyside et al. 2008; Pohlmann et al. 2009). In particular, the spring Arctic sea ice thickness distribution has been shown to be a precursor of the following September sea ice cover (Chevallier et al. 2011) in a model study. A summer-to-summer reemergence mechanism has been suggested whose memory lies in the sea ice thickness (Blanchard-Wrigglesworth et al. 2011a). Wang et al. (2013) also find a dependence of the sea ice prediction skill on the initial sea ice thickness. Filling the critical gap in sea ice thickness observations is therefore crucial to provide optimal sea ice initial conditions and improve the quality of seasonal to decadal climate predictions. Estimating the observed sea ice thickness over the last decades has been attempted by several authors through the implementation of sea ice data assimilation techniques exploiting the available observations of sea ice concentration (Tietsche et al. 2012; Tang et al. 2013) and velocity (Lindsay and Zhang 2006; Duliere and Fichefet 2007). Data assimilation generally allows for a more realistic representation of the sea ice extent, thickness and draft as compared to the few available observations. The lack of observations in the 1960s and 1970s prevents the production of a coherent sea ice reanalysis covering the 1958-present period, i.e. the focus of the climate prediction exercise achieved within the CMIP5 (Coupled Model Intercomparison Project Phase 5) project. We present, in Sect. 2, the alternative method of constraining sea ice historical simulations by ocean and atmosphere observations only. 
While previous studies produced single-member reanalyses, we produce here a five-member ensemble using atmospheric and oceanic perturbations that attempt to estimate the uncertainty in the sea ice state. Our sea ice reconstructions therefore offer a twofold added value compared to previous reconstructions:

  1. They cover a longer period than the satellite period (but exploit less observational data for the sake of consistency in the observations along the reconstruction).

  2. Containing five members, they aim at sampling the uncertainty in the sea ice state.

These simulations then provide best estimates of initial sea ice extent and thickness for a set of 3-year long five-member retrospective climate predictions performed with the EC-Earth forecast system (Hazeleger et al. 2010, 2012). Most current operational seasonal forecast systems do not initialize their sea ice component from contemporaneously observed sea ice conditions but from a climatology (Arribas et al. 2011), or do not even include a sea ice model (Gueremy et al. 2005; Stockdale et al. 2011). In the framework of the CMIP5 climate prediction exercise, many institutes initialized the ocean component of their forecast system but did not constrain the sea ice component towards observations (Matei et al. 2012; Voldoire et al. 2012). Efforts towards initializing the sea ice state are emerging (Chevallier et al. 2011; Wang et al. 2013; Chevallier et al. 2013; Sigmond et al. 2013; Tang et al. 2013) and a thorough evaluation of its benefits in different forecast systems remains to be done. We assess in Sect. 3 the added-value of our sea ice initial conditions on the prediction performance of EC-Earth. Furthermore, we evaluate, for the first time, the impact of taking into account the uncertainty in the sea ice initial state on the spread along the predictions, by initializing from the five different members of our sea ice reconstructions. Discussion and conclusions are provided respectively in Sects. 4 and 5.

2 Sea ice historical simulations constrained by ocean and atmosphere reanalyses

2.1 Prior sea ice simulations used to initialize the EC-Earth forecast system

Within the framework of the CMIP5 project, the decadal climate predictions performed with the EC-Earth forecast system (Hazeleger et al. 2010, 2012) were initialized, for the sea ice component, from a single-member simulation (Brodeau et al. 2010) run with the Louvain-la-Neuve (LIM2) sea ice model (Fichefet and Maqueda 1997; Goosse and Fichefet 1999) embedded into version 2 of the NEMO (Nucleus for European Modelling of the Ocean) ocean model (Madec 2008; Ethe et al. 2006). The ocean-sea ice model was forced by DFS4.3 (Brodeau et al. 2010) atmospheric surface fields over the 1958–2006 period. The sea ice state was therefore constrained by atmospheric observational datasets but no constraint by oceanic observational datasets was applied.

2.2 Description of the simulations

Our strongly-constrained sea ice historical simulations are run with the Louvain-la-Neuve (LIM2) sea ice model embedded into version 3.2 of the NEMO ocean model, which is a more recent version than the one used by Brodeau et al. (2010). In all of the experiments described below and sketched in Fig. 1, the ocean and sea ice models are forced by either DFS4.3 or ERA-interim (Dee et al. 2011) atmospheric surface fields through the CORE bulk formulae (Large and Yeager 2004). To additionally constrain the ocean towards observational data, the ocean temperature and salinity are nudged towards their monthly counterpart from the ORAS4 ocean reanalysis (Mogensen et al. 2011; Balmaseda et al. 2012). Although the ORAS4 ocean reanalysis only provides an estimate of the ocean state based on model and observations, such a physical extrapolation of the available observations stands as one of the best estimates of the ocean state over the last decades. This nudging exerts a strong constraint on the sea ice extent, as any ice transported towards an area where the Sea Surface Temperature (SST) is above the sea water freezing point melts. The timescale for the nudging is set to 360 days below 800 m and 10 days above, except in the mixed layer, but the SST and Sea Surface Salinity (SSS) are also restored (−40 W/m² and −150 mm/day/psu). The nudging is not applied in the 1°S–1°N band to avoid disrupting the strong equatorial currents but is applied everywhere else. The nudging prevents sea ice from appearing in regions where the SST in ORAS4 is above the freezing point, but a weaker constraint applies in regions where the SST is below the freezing point. Indeed, an SST below the freezing point favours sea ice production but does not constrain the intensity of such sea ice production, which can be insufficient to stand against a strong atmospheric forcing. 
Hence, we obtain an asymmetrical impact of the ocean nudging on the sea ice, which acts as a strict upper bound for sea ice extent but as a weak lower constraint. We do not use any sea ice data assimilation.
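The depth-dependent nudging described above can be illustrated with a minimal sketch. It assumes that the nudging takes the usual Newtonian relaxation form, d(T)/dt = (T_ref − T)/τ, integrated with an explicit Euler step; the function names, the treatment of the 800 m level itself and the example values are hypothetical and are not taken from the model code.

```python
def nudging_timescale_days(depth_m, lat_deg=45.0, in_mixed_layer=False):
    """Relaxation timescale quoted in the text; None means no 3-D nudging."""
    if in_mixed_layer:
        return None  # mixed layer excluded; SST/SSS restoring acts at the surface
    if abs(lat_deg) <= 1.0:
        return None  # no nudging in the 1°S–1°N band
    # Assumption: the 800 m level itself is treated as "below".
    return 10.0 if depth_m < 800.0 else 360.0


def nudge(value, reference, tau_days, dt_days=1.0):
    """One explicit Euler step of d(value)/dt = (reference - value) / tau."""
    if tau_days is None:
        return value
    return value + (dt_days / tau_days) * (reference - value)


# Hypothetical example: a 1 K warm bias at 200 m and at 2000 m, 30 daily steps.
for depth in (200.0, 2000.0):
    t_model, t_ref = 1.0, 0.0
    tau = nudging_timescale_days(depth)
    for _ in range(30):
        t_model = nudge(t_model, t_ref, tau)
    print(depth, round(t_model, 3))
```

With these timescales the bias above 800 m is removed within a few weeks, whereas the deep ocean is only weakly constrained over the same interval.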

  • SpinNudg is a 10-year spin-up experiment forced by the DFS4.3 atmospheric surface fields and started in January 1958 from the LEVITUS climatology (Levitus and Boyer 1994), with 3 m of ice in the Arctic and 1 m in the Antarctic. The ocean reaches an equilibrium almost immediately due to the nudging. This spin-up is therefore mainly required for the sea ice conditions to reach a stable realistic state and to obtain five different January sea ice states as initial conditions for the next simulation. This spin-up is validated in the Supplementary Materials (see Sect. 1 and Figure S1).

  • HistDfsNudg is a five-member historical experiment covering the 1958–2006 period and started from the 31 December of years 1961 to 1965 of SpinNudg, a different year per member. Each member of this historical simulation is nudged towards a different member of the ORAS4 reanalysis. They are forced with DFS4.3 atmospheric surface fields, with additional wind-stress perturbations. These three different components are used to provide perturbations to create the five ensemble members. This methodology is inspired by the one used to generate five members for the ORAS4 ocean reanalysis (Mogensen et al. 2011): different initial conditions from a spinup in which data assimilation is implemented, thinning of the observations and wind stress perturbations. Our wind stress perturbations are computed following the methodology used for the ORAS4 ocean reanalysis, the previous ORAS3 (Balmaseda et al. 2008) reanalysis and in the ENSEMBLES project (Weisheimer et al. 2007; Doblas-Reyes et al. 2010) which we describe in more detail in a separate paragraph below.

  • HistEraNudg is a five-member historical experiment covering the 1979–2012 period and started from HistDfsNudg on the 31 December 1978. This experiment is forced by the ERA-interim atmospheric surface fields with wind stress perturbations, and the ocean is nudged towards the 5-member ORAS4 reanalysis.

  • SpinFree is a second spin-up experiment identical to SpinNudg, except that no nudging is applied. It is run for 49 years. This spin-up is validated in the Supplementary Materials (see Sect. 1 and Figure S1).

  • HistDfsFree is a one-member historical experiment identical to HistDfsNudg, except that no nudging is applied. It is started from the 31 December 1974 of SpinFree. In the choice of year 1974, we aim at finding a compromise between dates as early as possible to minimise the greenhouse warming compared to the 1958 ocean state and dates as late as possible to allow for an equilibrium of the ocean state.

Fig. 1

Sketch of the experiments used throughout this article and described in detail in Sect. 2.2 and 3.1

We have constrained our sea ice reconstructions with the ERA40/ERA-interim reanalyses (DFS4.3 corresponds to a correction of the ERA40 reanalysis by observations; Brodeau et al. 2010) above the sea ice system and with the ORAS4 ocean reanalysis below. This methodology is also intended to optimize the consistency between the ocean, the atmosphere and the sea ice initial conditions that we will use afterwards to initialize our climate predictions. Indeed, as explained later in Sect. 3.1, we use the ERA40/ERA-interim reanalyses to initialize the atmospheric component and the ORAS4 reanalysis for the ocean component.

Wind stress perturbations (Weisheimer et al. 2007; Balmaseda et al. 2008; Mogensen et al. 2011). We compute the differences between the monthly 10 m wind of DFS4.3 and ERA-interim over the 1979–2006 period. These differences provide an estimate of the observational error in the wind field. We obtain 28 differences for each month of the year. January perturbations are then picked randomly from the set of January differences, February perturbations from the set of February differences, etc. After drawing the perturbations for a complete year, i.e. 12 monthly perturbations, they are interpolated linearly to obtain daily perturbations. The perturbations are added to four of our five members following the methodology applied for the ORAS4 reanalysis.
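The perturbation procedure can be sketched as follows for a single grid point. This is only an illustration under stated assumptions: the function names are hypothetical, the wind values are synthetic, the monthly values are anchored at mid-month in a 365-day year, and the daily series is held constant before mid-January and after mid-December, none of which is specified in the text.

```python
import random

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_differences(dfs_wind, era_wind):
    """dfs_wind[y][m], era_wind[y][m]: monthly 10 m wind, year index y, month m.
    Returns diffs[m], the list of year-by-year differences for calendar month m."""
    n_years = len(dfs_wind)
    return [[dfs_wind[y][m] - era_wind[y][m] for y in range(n_years)]
            for m in range(12)]


def draw_year(diffs, rng):
    """Pick one difference at random for each calendar month."""
    return [rng.choice(diffs[m]) for m in range(12)]


def to_daily(monthly):
    """Linear interpolation between mid-month anchors (assumed placement)."""
    mids, start = [], 0
    for n_days in DAYS_IN_MONTH:
        mids.append(start + (n_days - 1) / 2.0)
        start += n_days
    daily = []
    for day in range(365):
        if day <= mids[0]:
            daily.append(monthly[0])      # held constant before mid-January
        elif day >= mids[-1]:
            daily.append(monthly[-1])     # held constant after mid-December
        else:
            k = max(i for i in range(12) if mids[i] <= day)
            w = (day - mids[k]) / (mids[k + 1] - mids[k])
            daily.append((1 - w) * monthly[k] + w * monthly[k + 1])
    return daily


# Synthetic stand-in for 28 years (1979-2006) of monthly winds at one point.
rng = random.Random(0)
dfs = [[rng.gauss(5.0, 1.0) for _ in range(12)] for _ in range(28)]
era = [[rng.gauss(5.0, 1.0) for _ in range(12)] for _ in range(28)]
diffs = monthly_differences(dfs, era)
daily = to_daily(draw_year(diffs, rng))
print(len(diffs[0]), len(daily))
```

Drawing each month from its own calendar-month pool preserves the seasonal dependence of the estimated wind error.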

2.3 Reference observational datasets

Our sea ice reconstructions are validated against:

  • sea ice area: the estimates computed from the HadISST and NSIDC datasets as the integral of the grid-point product of sea ice concentration and grid area.

  • sea ice volume: the UCL (Université Catholique de Louvain la Neuve) (Mathiot et al. 2012) and the PIOMAS (Zhang and Rothrock 2003) sea ice reanalyses

  • ocean heat content and meridional overturning circulation: the ORAS4 ocean reanalysis (Mogensen et al. 2011).

  • sea ice thickness: IceSat observational dataset (Kwok et al. 2007; Kwok and Cunningham 2008; Kwok et al. 2009) available for February–March 2004, 2005, 2006 and 2008, March–April 2007 and October–November 2003 to 2007.
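The sea ice area diagnostic listed above, i.e. the integral of the grid-point product of sea ice concentration and grid-cell area, can be written as a short sketch. The function name, the use of `None` for land or missing cells and the example grid are illustrative assumptions.

```python
def sea_ice_area(concentration, cell_area_km2):
    """Sum of concentration (0-1) times grid-cell area over all valid cells.

    concentration[j][i] is None for land or missing data (assumed convention).
    """
    total = 0.0
    for conc_row, area_row in zip(concentration, cell_area_km2):
        for conc, area in zip(conc_row, area_row):
            if conc is None:
                continue  # skip land and missing cells
            total += conc * area
    return total


# Hypothetical 2 x 2 grid with 100 km² cells.
conc = [[1.0, 0.5], [0.0, None]]
area = [[100.0, 100.0], [100.0, 100.0]]
print(sea_ice_area(conc, area))  # 150.0
```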

2.4 Validation of the sea ice conditions

Fig. 2

Sea ice reconstructions HistDfsNudg in red, HistDfsFree in blue, HistEraNudg in brown, HadISST observational dataset and PIOMAS reanalysis in black dashed, NSIDC observational dataset in black dotted, ORAS4 and UCL reanalyses in black continuous. March and September (a, c) Arctic and (b, d) Antarctic (a, b) sea ice area and (c, d) sea ice volume, (e) Atlantic Meridional Overturning Streamfunction averaged between 40°N and 55°N and between 1 and 2 km depth, (f) Total global ocean heat content

In the nudged historical simulations (HistEraNudg, HistDfsNudg), the Arctic (Fig. 2a) and Antarctic (Fig. 2c) sea ice areas tend to be underestimated as compared to the HadISST observational estimates, though they are in better agreement with the NSIDC estimates. Discrepancies between the HadISST and NSIDC estimates mainly originate from the different combinations of source data but also from the different combinations of channels in their retrieval algorithms and from different corrections for weather and satellite drifts. The use of two different observational datasets provides some indication of the observational uncertainty, for which no robust estimate exists for sea ice area. The underestimation of the sea ice area also appears in the UCL reanalysis, to a lesser extent. As described in Sect. 2.2, the ocean nudging exerts an asymmetric constraint on the sea ice cover, acting as a strict upper bound for sea ice area but as a weak lower constraint, hence favouring a reduction of the sea ice area in HistDfsNudg as compared to HistDfsFree. The nudging also tends to improve the agreement with the observational estimates in terms of interannual variability, especially in the Arctic, and in terms of the long-term trend, especially for the summer Arctic sea ice area and the winter Antarctic one. The underestimation of the long-term trend in HistDfsFree might be partly caused by the DFS4.3 surface fluxes, which consist of a climatology for precipitation and snow until 1979 and for downwelling longwave and shortwave radiation until 1984 (Brodeau et al. 2010) and monthly estimates of those fields afterwards. The ocean nudging corrects for this underestimation. The spread between members is larger in the Antarctic than in the Arctic, consistent with the sparser observational coverage in the Antarctic Ocean. The spread is also larger for the sea ice volume (Fig. 2b, d) than for the sea ice area, consistent with the lack of sea ice thickness observations. 
As very few observations are available for sea ice thickness, we validate here the sea ice volume against the PIOMAS (Zhang and Rothrock 2003) and UCL (Mathiot et al. 2012) reanalyses in which no constraint towards sea ice thickness observations has been used but only constraints towards sea ice concentration observations. The nudging in HistDfsNudg tends to improve the agreement of the winter sea ice volume with the UCL and PIOMAS sea ice reanalyses as compared to HistDfsFree although the sea ice volume still seems to be overestimated. The choice of nudging the ocean temperature and salinity towards ORAS4 was mainly driven by the poor ocean circulation simulated in HistDfsFree as illustrated by the Atlantic Meridional Overturning Circulation (AMOC) index (Fig. 2e). This AMOC index is chosen as the average of the Meridional Overturning Streamfunction between 40°N and 55°N and between 1 and 2 km, which corresponds to the region of the ORAS4 maximum. The AMOC strength in the free historical simulation (HistDfsFree) amounts to about 25 % of ORAS4 and barely exhibits any variability. The ocean nudging allows for a more realistic ocean circulation and associated heat transports which reproduce the decadal variability seen in the ORAS4 reanalysis, though they are still underestimated by about 25 % in HistEraNudg and HistDfsNudg. Given the coupling on decadal timescales of the Arctic sea ice cover with the AMOC and associated heat transports suggested by several studies (Zhang et al. 1995; Goosse et al. 2002; Hakkinen and Proshutinsky 2004; Koenigk et al. 2006; Mahajan et al. 2011), the ocean nudging appears crucial to capture this source of Arctic sea ice variability. The total global ocean heat content tends also to be overestimated (Fig. 2f) in HistDfsFree as compared to ORAS4. 
A sudden increase by about 223J also occurs in the mid-1970s in HistDfsFree, which might be related to the assimilation from 1973 of synthetic surface pressure observations from satellite imagery in the surface atmospheric forcings (Brodeau et al. 2010). With the ocean nudging, the total ocean heat content in HistEraNudg and HistDfsNudg follows ORAS4 closely. Only ocean temperature and salinity are nudged towards ORAS4, which explains the almost perfect reproduction of the OHC and the underestimation of the AMOC.
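The AMOC index used above, i.e. the average of the Meridional Overturning Streamfunction between 40°N and 55°N and between 1 and 2 km depth, can be sketched as a box average. The sketch assumes an unweighted mean over the grid points falling inside the box (no weighting by layer thickness or latitude spacing); the function name and the example values are hypothetical.

```python
def amoc_index(psi, lats, depths_m,
               lat_range=(40.0, 55.0), depth_range=(1000.0, 2000.0)):
    """Mean of psi[k][j] (in Sv) over depth index k and latitude index j in the box."""
    values = [psi[k][j]
              for k, depth in enumerate(depths_m)
              if depth_range[0] <= depth <= depth_range[1]
              for j, lat in enumerate(lats)
              if lat_range[0] <= lat <= lat_range[1]]
    if not values:
        raise ValueError("no grid point falls inside the averaging box")
    return sum(values) / len(values)


# Hypothetical streamfunction on a coarse 3-level, 4-latitude grid.
lats = [30.0, 45.0, 50.0, 60.0]
depths = [500.0, 1500.0, 2500.0]
psi = [[10.0, 12.0, 11.0, 8.0],
       [14.0, 16.0, 18.0, 9.0],
       [6.0, 7.0, 7.0, 4.0]]
print(amoc_index(psi, lats, depths))  # 17.0
```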

Fig. 3

Observed and simulated sea ice thickness, in metres: (a, b) February–March and (c, d) October–November, averaged over years (a, b) 2004, 2005 and 2006 and (c, d) 2003, 2004, 2005 and 2006, in (a, c) the HistDfsNudg experiment and (b, d) the IceSat observations

Sea ice thickness observations are only available for the Arctic ocean and for a few seasons over the course of the last decade. The average of the IceSat observations (Kwok et al. 2007, 2009; Kwok and Cunningham 2008) over the years covered by the HistDfsNudg reconstruction for February and March on the one hand and October and November on the other hand are displayed in Fig. 3 together with their simulated counterpart. The simulated Arctic spring sea ice thickness (Fig. 3a) tends to be underestimated in the Chukchi Sea and overestimated in the East Siberian, Laptev and Beaufort Seas as well as over the Central Arctic. The steep increase in sea ice thickness when approaching Greenland is well captured. The Arctic autumn sea ice thickness (Fig. 3c), on the contrary, is underestimated in the East Siberian and Laptev Seas but still overestimated in the central Arctic. The validation of the sea ice thickness in the HistEraNudg reconstruction against the average of the IceSat observations over all the available seasons is provided in the Supplementary Materials (See Sect. 2 and Figure S2). To complement the validation of the simulated sea ice thickness, the HistDfsNudg, HistDfsFree and HistEraNudg distributions averaged over the 1979–2006 period are compared with the UCL reanalysis in Fig. 4 for March and Fig. 5 for September. The ocean nudging generally decreases the sea ice thickness but the distribution is very similar between HistDfsNudg and HistEraNudg. The simulated Arctic March sea ice thickness in HistDfsNudg, HistDfsFree and HistEraNudg (Fig. 4a-c) shows a relatively good agreement with the UCL reanalysis (Fig. 4d). A slight overestimation of the ice thickness appears however in the Hudson and Baffin Bays, and in the Bering Sea and Sea of Okhotsk, together with a slight underestimation in the central Arctic which contrasts with the overestimation with respect to IceSat observations in the recent years. 
The nudging partially corrects for the overestimations in the peripheral seas, but also decreases the sea ice thickness in the central Arctic. The same bias and effect of the nudging appear for the Arctic September sea ice thickness (Fig. 5a–d) in the central Arctic but not in the peripheral seas where there is no ice during this season. The Antarctic March sea ice thickness (Fig. 4e–g) also shows a relatively good agreement with UCL (Fig. 4h). The nudging prevents the overestimation of the ice thickness in the Bellingshausen Sea. In September, the asymmetry of the sea ice distribution between West and East Antarctica in the UCL reanalysis (Fig. 5h) is less pronounced in our historical simulations (Fig. 5e–g). The nudging tends to reduce the sea ice thickness but does not correct for the lack of asymmetry in the sea ice distribution, although few oceanic observations are available in this region to constrain the ORAS4 reanalysis and subsequently our historical simulations.

Fig. 4

March sea ice thickness in metre averaged over years 1979–2006

Fig. 5

September sea ice thickness in metre averaged over years 1979–2006

3 Added-value in interannual climate predictions

3.1 The simulations

The added-value of this new set of sea ice initial conditions is assessed by performing hindcasts (or retrospective forecasts) with the EC-Earth (Hazeleger et al. 2010, 2012) coupled general circulation model version 2.3 which comprises:

  • the IFS atmospheric component with 62 vertical levels and a TL159 horizontal resolution.

  • the NEMO version 2 ocean component in the ORCA1 configuration with 42 levels.

  • the Louvain-la-Neuve (LIM2) sea-ice model version 2 embedded into NEMO version 2.

The atmospheric and oceanic components are coupled through OASIS3 (Valcke 2006). Du et al. (2012) describe the standard initialization approach for all the components of the EC-Earth forecast system. Three sets of five-member hindcasts are used in this article and represented in Fig. 1:

  • PredicCTL: As a contribution to the CMIP5 project, decadal predictions have been initialized every 1st November from 1960 to 2005 from the five-member ORAS4 ocean reanalysis (Mogensen et al. 2011) for the ocean component, from the ERA-40 reanalysis (Uppala et al. 2004) for all start dates before 1989 and from ERA-Interim afterwards for the atmospheric component and from a single-member NEMO2-LIM2 simulation forced with DFS4.3 atmospheric surface fields for the sea ice component (see Sect. 2.1).

  • PredicDfsNudg: Climate predictions have been initialized on 1st November, as in PredicCTL, but only every 2 years from 1960 to 2005 plus in years 1965, 1975, 1985, 1995 and 2005, to limit the computational cost, which makes a total of 28 start dates. The only difference with the PredicCTL experiment resides in the sea ice initial conditions, which are taken from the five-member HistDfsNudg reconstruction. The predictions have been run for only 3 years into the future, which corresponds to the timescales over which the predictability of Arctic sea ice has been suggested to be dominated by initial conditions (Blanchard-Wrigglesworth et al. 2011a, b).

  • PredicEraNudg: Climate predictions have been initialized on 1st November, as in PredicCTL but only every 5 years from 1980 to 2005, plus in years 2001, 2002, 2003 and 2004 which makes a total of 10 start dates. The sea ice initial conditions are taken in this set from the five-member HistEraNudg reconstruction. The predictions have also been run for only 3 years into the future.

The exact same EC-Earth model version has been used in those 3 different experiments which only differ by their sea ice initial conditions.
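The start-date sampling of the three hindcast sets can be checked with a few lines. The variable names are illustrative; the years come directly from the experiment descriptions above.

```python
# PredicCTL: every year from 1960 to 2005.
ctl_starts = list(range(1960, 2006))

# PredicDfsNudg: every 2 years from 1960, plus 1965, 1975, 1985, 1995 and 2005.
dfs_starts = sorted(set(range(1960, 2006, 2)) | {1965, 1975, 1985, 1995, 2005})

# PredicEraNudg: every 5 years from 1980 to 2005, plus 2001 to 2004.
era_starts = sorted(set(range(1980, 2006, 5)) | {2001, 2002, 2003, 2004})

print(len(ctl_starts), len(dfs_starts), len(era_starts))  # 46 28 10

# Both reduced sets are subsets of the PredicCTL start dates, so scores can be
# compared over common start dates.
assert set(dfs_starts) <= set(ctl_starts)
assert set(era_starts) <= set(ctl_starts)
```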

3.2 Reference observational datasets

Our retrospective predictions are validated against:

  • sea ice area: the HadISST and NSIDC estimates.

  • sea ice volume: the PIOMAS and UCL estimates.

  • SST: the NOAA Extended Reconstructed SST v3b dataset (named ERSST in this article) (Smith et al. 2008) and the HadISST v1.1 dataset from the UK Met Office (HadISST) (Rayner et al. 2003)

  • ocean heat content: the ORAS4 ocean reanalysis (Balmaseda et al. 2012)

  • sea level pressure: the HadSLP2 dataset (Allan and Ansell 2006)

  • 2-metre temperature (T2M): the NCEP/NCAR R1 (Kalnay et al. 1996) (named NCEP) and ERA40 (Uppala et al. 2004) reanalyses and a merged dataset (named GHCNERSSTGISS in this article) combining land air temperatures from the GHCN/CAMS dataset (Fan and van den Dool 2008) and SST from ERSST, except outside the 60°N–60°S band where the GISSTEMP (Hansen et al. 2010) dataset with 1200 km decorrelation scale is used.

3.3 Methodology

The model or observation climatology is defined as a function of the forecast time, by averaging the hindcast variable across the start dates, using only hindcast values for which observations are available at the corresponding dates. The model climatologies obtained in such a way are then subtracted from each raw hindcast to obtain anomalies over the whole hindcast period. The same method is applied to the observations to obtain anomalies over the whole observational period. The anomalies thus obtained are referred to as “per-pair” anomalies following Garcia-Serrano and Doblas-Reyes (2012). Following this methodology, the trend along the forecast is removed but the trend along the start dates for a given forecast time is still present. We do not apply any detrending along the start dates because any available detrending method has drawbacks and can introduce spurious signals. We focus in this article on the added-value of our initial conditions on the forecast skill rather than on the level of skill itself.
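The "per-pair" anomaly computation can be sketched as follows, for one variable and one member (or the ensemble mean). The function name, the array layout and the use of `None` for missing observations are assumptions made for illustration.

```python
def per_pair_anomalies(hindcast, obs):
    """hindcast[s][t], obs[s][t]: value for start date s and forecast time t.
    obs[s][t] is None when no observation exists at the corresponding date.
    Climatologies depend on forecast time t and use paired values only."""
    n_starts, n_times = len(hindcast), len(hindcast[0])
    hind_anom = [[None] * n_times for _ in range(n_starts)]
    obs_anom = [[None] * n_times for _ in range(n_starts)]
    for t in range(n_times):
        paired = [s for s in range(n_starts) if obs[s][t] is not None]
        if not paired:
            continue  # no observation at this forecast time: no climatology
        clim_hind = sum(hindcast[s][t] for s in paired) / len(paired)
        clim_obs = sum(obs[s][t] for s in paired) / len(paired)
        for s in range(n_starts):
            hind_anom[s][t] = hindcast[s][t] - clim_hind
            if obs[s][t] is not None:
                obs_anom[s][t] = obs[s][t] - clim_obs
    return hind_anom, obs_anom


# Hypothetical example: 3 start dates, 2 forecast times, one missing observation.
hindcast = [[1.0, 2.0], [3.0, 5.0], [5.0, 8.0]]
obs = [[0.0, 1.0], [2.0, None], [4.0, 3.0]]
hind_anom, obs_anom = per_pair_anomalies(hindcast, obs)
print(hind_anom)
print(obs_anom)
```

Because the climatology is a function of forecast time, the mean model drift along the forecast is removed, while any trend across start dates is retained, as stated above.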

The hindcast performance is assessed from the bias-corrected “per-pair” anomalies. Hindcast skill is measured using either the anomaly correlation coefficient (ACC) or the Root Mean Square Error (RMSE). The confidence interval for the ACC is computed after a Fisher-Z transformation and takes into account the autocorrelation of the time series following VonStorch and Zwiers (2001). The confidence interval for the RMSE relies on a χ² distribution, and for the ratio of two RMSE it is computed through a Fisher test, accounting for the autocorrelation of the time series in both cases. The spread between the members is measured using the Inter-Quartile Range (IQR). Its confidence interval is computed by bootstrapping. To assess whether the level of spread is sufficient, the ratio between the Standard-Deviation (SD) of the members around the ensemble-mean and the RMSE of the ensemble-mean is computed.
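The skill and spread measures named above can be sketched with the standard library alone. Several choices here are assumptions rather than the authors' implementation: the ACC is taken as the centred Pearson correlation, the Fisher-Z interval uses a normal approximation with an effective sample size supplied by the caller, and the SD uses the population normalisation. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def rmse(forecast, reference):
    """Root mean square error between two equally long anomaly series."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, reference)) / n)


def acc(forecast, reference):
    """Anomaly correlation coefficient (assumed: centred Pearson correlation)."""
    n = len(forecast)
    mean_f, mean_o = sum(forecast) / n, sum(reference) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, reference))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in reference)
    return cov / math.sqrt(var_f * var_o)


def acc_confidence_interval(r, n_eff, level=0.95):
    """Fisher-Z interval; n_eff is the effective number of samples (> 3)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n_eff - 3.0)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def spread_to_rmse_ratio(members, reference):
    """members[m][s]: anomaly of member m for start date s.
    Ratio of the member SD around the ensemble mean to the ensemble-mean RMSE."""
    n_members, n_starts = len(members), len(members[0])
    ens_mean = [sum(members[m][s] for m in range(n_members)) / n_members
                for s in range(n_starts)]
    sq_dev = sum((members[m][s] - ens_mean[s]) ** 2
                 for m in range(n_members) for s in range(n_starts))
    sd = math.sqrt(sq_dev / (n_members * n_starts))  # assumed normalisation
    return sd / rmse(ens_mean, reference)


# Hypothetical two-member, two-start-date example.
print(spread_to_rmse_ratio([[1.0, 2.0], [3.0, 4.0]], [2.0, 5.0]))
print(acc_confidence_interval(0.5, n_eff=20))
```

A ratio well below 1 would suggest that the ensemble spread is too small relative to the ensemble-mean error.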

3.4 Impact of improved sea-ice initialization on the prediction skill

The Root Mean Square Error (RMSE) for the Arctic sea ice area (Fig. 6a) is decreased at almost every forecast time when initialized from our reconstruction in PredicDfsNudg (blue) as compared to PredicCTL (red). The RMSE in HistDfsNudg is provided in black as a reference for the upper bound of the prediction skill (the lower bound of the RMSE). The RMSE ratio of PredicDfsNudg over PredicCTL is not significantly different from 1 at the 95 % level, which is consistent with the strong overlap of the confidence intervals of the two RMSE. Indeed, even when using 28 hindcasts initialized 1 or 2 years apart, two consecutive hindcasts cannot be considered as independent. The effective number of independent data ranges between 5 and 20 depending on the variable considered when applying the formula described in VonStorch and Zwiers (2001) based on the autocorrelation function. More independent hindcasts would be required for such a small difference in RMSE to be significant at the 95 % level. Increasing the frequency of the start dates over the re-forecast period does not provide more independent hindcasts, but those could be obtained by lengthening the period sampled by our start dates. Unfortunately, obtaining accurate initial conditions for such a backward extension of the re-forecast period is extremely challenging given the sparse observational coverage. The persistence through the forecast of a reduced RMSE in PredicDfsNudg suggests, however, a robust added value of our sea ice initial conditions on the forecast quality for the Arctic sea ice cover. The forecast skill is particularly improved during the boreal summer season. Such an improvement does not appear for the Antarctic sea ice cover (Fig. 6b), but the base RMSE is larger in the Antarctic region than in the Arctic region and the reference observational dataset used for verification itself bears much more uncertainty. 
We also assess the performance in predicting the sea ice volume against the two reanalyses (UCL and PIOMAS) in Fig. 6c. It should be noted that the UCL and PIOMAS reanalyses only cover the 1979–2007 and 1979–2012 periods, respectively, and they do not include any observational constraint of sea ice thickness, but do include observational constraints of sea ice concentration. We obtain a better skill in predicting the sea ice volume in PredicDfsNudg than in PredicCTL when we use either UCL or PIOMAS as a reference. The performance of PredicDfsNudg is barely discernible from that of the sea ice reconstruction HistDfsNudg for the first 18 months. The improved skill in predicting the Arctic sea ice conditions translates into an improved skill in predicting the Arctic near surface temperature (Fig. 6d). The impact on the skill in predicting the global mean surface atmospheric temperature is only marginal though (Fig. 6e). The skill in predicting the Arctic ocean heat content (OHC) seems also slightly improved in PredicDfsNudg (Fig. 6f). Similar conclusions are drawn from the ACC as a measure of skill on raw and detrended anomalies (not shown). The same RMSE scores computed for the PredicEraNudg and PredicCTL experiments over their common start dates are shown in the Supplementary Materials (see Sect. 4 and Figure S3). It should be noted that the scores computed for PredicCTL and shown in Figs. 6, 8 and 9 are not the same as the ones shown in Figures S3 to S5 since start dates over different periods have been used.
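The reduction from 28 hindcasts to a much smaller effective number of independent data can be illustrated with one commonly used autocorrelation-based estimate, n_eff = n / (1 + 2 Σ (1 − k/n) ρ_k). Whether this is exactly the variant applied in this study is an assumption, as are the function names, the biased autocorrelation estimator and the capping of n_eff at n.

```python
def autocorrelation(x, lag):
    """Biased sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / var


def effective_sample_size(x, max_lag=None):
    """n / (1 + 2 * sum_k (1 - k/n) * rho_k), capped at n (conservative choice)."""
    n = len(x)
    if max_lag is None:
        max_lag = n - 1
    weighted = sum((1.0 - k / n) * autocorrelation(x, k)
                   for k in range(1, max_lag + 1))
    denom = 1.0 + 2.0 * weighted
    if denom <= 0.0:
        return float(n)
    return min(float(n), n / denom)


# Hypothetical series of 28 values dominated by a trend across start dates.
trend = [0.1 * s for s in range(28)]
print(effective_sample_size(trend))
```

For a strongly persistent or trending series such as this one, the estimate falls well below the nominal sample size, which widens the confidence intervals of the scores.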

Fig. 6

Root Mean Square Error. PredicCTL in red, PredicDfsNudg in blue with one thick line for the RMSE and thin lines for its 95 % confidence interval. HistDfsNudg is also shown in black as a lower bound for the prediction RMSE. The observational datasets used for reference are: HadISST for the a Arctic and b Antarctic sea ice areas, c PIOMAS (continuous) and UCL (dots) reanalyses for the Arctic sea ice volume, NCEP (continuous) and ERA40 (dots) reanalyses for the d Arctic (60–90°N) and e Global mean 2m temperature, f ORAS4 reanalysis for the Arctic Ocean Heat Content. The anomalies have been smoothed out with a 3-month running mean prior to the RMSE computation

Further insight into the extrapolar impacts of our sea ice initial conditions is provided by Fig. 7. The skill in near surface temperature is improved (Fig. 7a, c) over most of the Arctic Ocean for Years 1 and 2 although this improvement is significant only over the central Arctic for Year 1. Only a few continental areas close to the Arctic Ocean show a marginally increased skill: North-West Canada and East Siberia for Years 1 and 2 although these improvements are not significant. The regions of maximal increase in skill also correspond to regions where PredicCTL shows a particularly low skill (Fig. 7b, d). The skill in sea level pressure (Fig. 7e, g) is reduced over the Arctic Ocean and during the second forecast year over Antarctica but this degradation of skill is not significant. The same conclusions are drawn when using the ACC as a measure of skill rather than the RMSE (not shown).

Fig. 7

Root Mean Square Error. a, c, e, g show the ratio of the RMSE in the PredicDfsNudg experiment over the RMSE in the PredicCTL experiment, whereas b, d, f, h show the RMSE in the PredicCTL experiment. The observational datasets used for reference are: a–d the GHCNERSSTGISS merged dataset for near surface temperature, e–h the HadSLP2 observational dataset for sea level pressure. Dots mark regions where the 95 % significance level is reached for the RMSE ratio in the left column. Dots can be seen in a the Bering Strait and North of Scandinavia, c along the Pacific Coast of U.S.A. and off the west coast of Chile, d South of Tasmania, g South of Tasmania and East of New Zealand. Year 1 (Year 2) comprises months 3–14 (15–26) of the retrospective forecasts

3.5 Spread

Fig. 8

Spread in polar regions. PredicCTL in red, PredicDfsNudg in blue with one thick line for the inter-quartile range (IQR) or ratio standard-deviation over root-mean square error and thin lines for the confidence interval at the 95 % level. The anomalies have been smoothed out with a 1-year running mean prior to the computation of the diagnostics

While the five members in PredicCTL, run in the framework of the CMIP5 project, did not include any perturbations of the sea ice initial conditions, PredicDfsNudg is initialized from the five different members of our HistDfsNudg historical simulation. We assess here the impact of using our five-member ensemble of sea ice initial conditions on the spread between the members of the PredicDfsNudg ensemble predictions, compared to using single-member sea ice initial conditions as in PredicCTL (see Sects. 2.1 and 3.1 for more detail about the experimental design). The spread between the members is estimated through their interquartile range (IQR), which gives a more robust estimate than the standard deviation (SD). We also compare the spread between the members to the RMSE of the ensemble-mean prediction. In this comparison, we use the SD of the members around the ensemble mean as a measure of spread for homogeneity with the RMSE, and we compute the ratio of the SD of the members over the RMSE of the ensemble mean. This ratio should ideally be one, so that the ensemble spread is representative of the forecast error.

The spread (IQR) between the members in terms of simulated Arctic sea ice area seems slightly increased during the first half of the forecast (Fig. 8a) in PredicDfsNudg as compared to PredicCTL. Due to the large reduction in RMSE of the ensemble-mean forecast, the ratio of the SD of the members around the ensemble mean to the RMSE of the ensemble-mean forecast (Fig. 8b) is increased by about 20 % in PredicDfsNudg as compared to PredicCTL. This ratio is still too low, though, as it reaches a maximum of 0.65, i.e. the spread is still too small compared to the forecast error. The spread (IQR) is also generally increased along the forecast for the Antarctic sea ice area (Fig. 8c), except during a few months at the end of the first year and the beginning of the second year. The ratio of the SD of the members to the RMSE of the forecast (Fig. 8d) is not larger, however, in PredicDfsNudg than in PredicCTL, since the sea ice initial conditions from HistDfsNudg tend to increase the RMSE (except during the first months of the forecast). The Arctic near surface temperature does not show any larger spread (IQR) in PredicDfsNudg than in PredicCTL (Fig. 8e). The ratio of the SD to the RMSE is, however, substantially larger in PredicDfsNudg than in PredicCTL due to the lower RMSE, but it still does not exceed 0.8.

The larger spread in the sea ice variables (Fig. 8a, b) in PredicDfsNudg than in PredicCTL does not translate into any larger spread (IQR) in the global mean near surface temperature (Fig. 9a), upper ocean heat content (Fig. 9c) or sea surface temperature (Fig. 9e). Given the marginal decrease in RMSE for the global mean atmospheric temperature and global mean SST, the ratio of the SD to the RMSE does not show any robust increase in PredicDfsNudg as compared to PredicCTL for any of those variables (Fig. 9b, f). This ratio is larger all along the forecast in PredicDfsNudg than in PredicCTL for the upper OHC (Fig. 9d), though still lower than 0.7. In summary, introducing perturbations in the sea ice initial conditions increases the spread between the members for the sea ice variables only, which is thus more representative of the forecast errors. Outside the polar regions and above the sea ice cover, the ratio of the SD to the RMSE being closer to one mainly originates from the reduced forecast error. The spread between the members is also compared between PredicCTL and PredicEraNudg in Figures S3 and S4, but since only 10 common start dates between those experiments are available, those scores are much noisier than the ones shown in the main article, which compare PredicDfsNudg and PredicCTL.
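
The two spread diagnostics discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "inclusive" quantile method, the use of the sample variance, and the pooling over start dates by averaging the member variance before taking the square root are choices made here, since the text does not specify them. The function names and data layout are hypothetical.

```python
import math
import statistics


def interquartile_range(members):
    """IQR of the member values at a single forecast time."""
    q1, _, q3 = statistics.quantiles(members, n=4, method="inclusive")
    return q3 - q1


def spread_to_error_ratio(members_by_start, reference_by_start):
    """Ratio of the member SD around the ensemble mean to the RMSE of the ensemble mean.

    members_by_start[s]   : member anomalies for start date s at one forecast time
    reference_by_start[s] : reference anomaly for the same start date and time
    A value close to one means the spread is representative of the forecast error;
    a value below one means the ensemble is under-dispersive.
    """
    variances = []
    squared_errors = []
    for members, reference in zip(members_by_start, reference_by_start):
        ensemble_mean = statistics.fmean(members)
        variances.append(statistics.variance(members))  # sample variance (assumption)
        squared_errors.append((ensemble_mean - reference) ** 2)
    sd = math.sqrt(statistics.fmean(variances))
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return sd / rmse
```

With this definition, a lower RMSE raises the ratio even when the SD is unchanged, which is the mechanism invoked above for the Arctic near surface temperature.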

Fig. 9

Spread in global variables. PredicCTL in red, PredicDfsNudg in blue with one thick line for the inter-quartile range (IQR) or ratio standard-deviation over root-mean square error and thin lines for the confidence interval at the 95 % level. The anomalies have been smoothed out with a 1-year running mean prior to the computation of the diagnostics

4 Discussion

4.1 Limitations of the constraint by atmospheric and oceanic reanalyses only

Our methodology to generate sea ice reconstructions relies on constraining a sea ice model by the atmospheric and oceanic reanalyses that we use afterwards to initialize our climate predictions. We do not apply any sea ice data assimilation. The constraint by the atmospheric and oceanic reanalyses providing the initial conditions for climate predictions attempts to optimize the consistency of our sea ice initial conditions with the atmospheric and oceanic components. However, these oceanic and atmospheric reanalyses were not provided with any information from our sea ice reconstructions at their production time. Although they are consistent with one another, since they use the same sea ice cover, they are not fully consistent with our sea ice reconstructions. Only a coupled ocean, sea-ice and atmosphere data assimilation system, such as the ones tested by Wang et al. (2013) and Sigmond et al. (2013), can ensure full consistency between the initial conditions in the various components of the climate system and prevent any initial shock. Furthermore, the polar regions are the areas where the main weaknesses of those atmospheric and oceanic reanalyses can be found, primarily because few observations are available to constrain them. They were produced without any sea ice model but using observed sea ice concentration. These limitations can be overcome by using sea ice data assimilation with an adequate propagation of the sea ice concentration updates to other sea ice variables in a coupled data assimilation system. The development of such a coupled data assimilation system is ongoing work.

4.2 Limitations of the perturbation method

For consistency of our sea ice reconstructions with the ORAS4 ocean reanalysis that we use to initialize our climate predictions, we have nudged each member of the sea ice reconstruction towards a different member of ORAS4 and we have used the same methodology as in ORAS4 to generate surface forcing perturbations. Most of the spread between the members of our sea ice reconstructions comes from the surface forcing perturbations (not shown). These perturbations are only applied to the surface wind field. They aim at accounting for the observational uncertainty by using the monthly differences between two sets of surface forcings, namely DFS4.3 and ERA-interim, along their common period. This perturbation method has been designed to generate ocean reanalyses, and hence focuses on the wind field only, which plays a key role in shaping the ocean variables. In our sea ice reconstructions, the near-surface temperature and humidity, the solid and liquid precipitation, the surface winds and the downwelling shortwave and longwave radiation are used to force the coupled ocean and sea ice models through the CORE bulk formulae (Large and Yeager 2004). A more complete method to account for the uncertainty in all those variables could be developed by extending the wind perturbation method to all the other variables.
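
To make the idea of difference-based forcing perturbations concrete, the following Python sketch builds a pool of monthly differences between two forcing datasets over their common period and adds one of them to a wind value for the same calendar month. It is a simplified illustration only: the scalar monthly values (real forcings are two-dimensional fields), the random draw among years, the random sign, and all names are assumptions made here. The text above states only that monthly differences between DFS4.3 and ERA-interim are used and that they are applied to the wind field.

```python
import random


def monthly_differences(forcing_a, forcing_b):
    """Differences forcing_a - forcing_b over the common period, grouped by calendar month.

    forcing_x maps (year, month) to a monthly-mean wind value (hypothetical layout).
    """
    diffs = {month: [] for month in range(1, 13)}
    for year, month in sorted(set(forcing_a) & set(forcing_b)):
        diffs[month].append(forcing_a[(year, month)] - forcing_b[(year, month)])
    return diffs


def perturb_wind(wind, diffs, rng):
    """Add a randomly drawn same-month difference, with a random sign, to each wind value."""
    perturbed = {}
    for (year, month), value in wind.items():
        delta = rng.choice(diffs[month])   # assumption: uniform draw among common years
        sign = rng.choice((-1.0, 1.0))     # assumption: symmetric perturbations
        perturbed[(year, month)] = value + sign * delta
    return perturbed
```

Extending the method to the other forcing variables, as suggested above, would amount to building one such pool of differences per variable; whether the draws should be shared across variables to preserve their covariance is left open here.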

4.3 Degradation of the skill in predicting the large-scale atmospheric circulation

Although the use of our sea ice reconstructions allows for an increase in skill in predicting the Arctic sea ice area and near surface temperature with a significant improvement in the central Arctic, we also obtain a degradation of the skill, though not significant, in predicting the sea level pressure over the Arctic, i.e. the large scale atmospheric circulation. As a consequence, we do not observe any robust improvement in skill outside the polar regions when using our sea ice reconstructions. The reasons behind the degradation of the skill in large-scale atmospheric circulation are still under investigation.

5 Conclusion

In this study, we present two five-member sea ice historical simulations constrained by ocean and atmosphere observations and covering the 1958–2006 and 1979–2012 periods, respectively, which are the focus of the climate prediction exercise carried out within the CMIP5 (Coupled Model Intercomparison Project Fifth phase) project. Our ensemble sea ice reconstruction is the longest available to date and attempts to sample, for the first time, the uncertainty in the sea ice state. The constraint by ocean observations is performed via a nudging of the simulated three-dimensional temperature and salinity towards their counterparts from the ORAS4 ocean reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF). The constraint by atmosphere observations is performed by running ocean and sea ice coupled simulations forced by atmospheric data from the DFS4.3 observational data, on the one hand, and the ERA-interim reanalysis, on the other hand, which cover the 1958–2006 and 1979–2012 periods respectively. By introducing wind stress perturbations and nudging towards the five different members of the ORAS4 reanalysis, we produce five different members for our constrained historical simulations. This methodology allows for the generation of sea ice initial conditions for operational use in seasonal and decadal forecasting, which are publicly available on request. Several observational datasets have been used for the validation of our sea ice reconstructions, but the large observational uncertainty limits this validation. The obtained Arctic and Antarctic sea ice areas show a reasonable agreement with their estimates from the NSIDC and HadISST datasets. These sea ice reconstructions then provide best estimates of initial sea ice extent and thickness for a set of 3-year long retrospective climate predictions.
We compare this set with a reference set in which the sea ice initial conditions were taken from a single-member ocean and sea ice coupled simulation forced by DFS4.3 observational data and performed with an older model version. Initializing from our reconstruction allows for an improved skill in predicting the Arctic sea ice cover and Arctic near surface temperature, but the skill in predicting the Antarctic sea ice cover is slightly degraded. Using our sea ice initial conditions also allows for a larger spread between the members for the sea ice variables, which is thus more representative of the forecast error.