1 Introduction

Since the inception of “decadal prediction” as a new field of climate research in the early 2000s (Vera et al. 2010; Meehl et al. 2009), much effort has been devoted to providing reliable information about the climate system in the form of forecasts for the coming decades. Because decadal prediction is a joint initial and boundary value problem (e.g. Hawkins and Sutton 2009; Murphy et al. 2010), obtaining skillful decadal predictions remains a major challenge. The difficulty arises partly from uncertainty in the initial climate state from which the model starts, from imperfections of climate models, and from a lack of information about future external forcing.

Despite remaining problems, earlier studies have convincingly demonstrated improvements in decadal climate predictions that result from initializing models with estimates of the observed climate state (e.g. CMIP5; Taylor et al. 2012). In general terms, initialization encompasses an initialization strategy and a reanalysis that provides the initial state of the model. How best to estimate the initial conditions, and what the best practices are for initializing a model from those estimates, remain subjects of active investigation. In particular, modeling groups have devoted much effort to exploring different initialization techniques and methodologies (e.g. CMIP5, Meehl et al. 2009, 2014; Murphy et al. 2007). On seasonal time scales, Magnusson et al. (2012) showed that full-state initialization provided higher predictive skill than anomaly initialization, while Smith et al. (2012) reported regionally more skillful decadal predictions in hindcasts with anomaly initialization. In contrast, a recent study by Hazeleger et al. (2013) found no significant difference in decadal predictability between the two methods. Polkova et al. (2014) obtained the highest SST predictive skill for flux-corrected hindcasts combined with full-field initialization, owing to smaller biases in regions of deep mixed layers, which carry the predictive memory. Nevertheless, the flux correction strategy has so far not been adopted by any of the CMIP5 climate models (Meehl et al. 2014).

With full-field initialization, the model is strongly constrained towards a realistic representation of the climate system at the beginning of the forecasts (e.g. Bellucci et al. 2014; Smith et al. 2013), which typically differs from the model's native climatology. This leads to an initial shock at the beginning of the forecasts (Balmaseda and Anderson 2009), and the model bias grows as the forecasts evolve (Pierce et al. 2004; Smith et al. 2013). So far, the most commonly used initial conditions are estimates of individual climate components (e.g. the ocean or the atmosphere), mostly derived by combining observations with atmospheric or oceanic general circulation models. However, to offer the best forecast skill, all elements of the climate system should be considered, including the atmosphere, the ocean, sea ice and land parameters (Doblas-Reyes et al. 2011; Smith et al. 2007; Troccoli and Palmer 2007). Such an initialization can be obtained from a coupled system that estimates the initial conditions of all components at once. Reducing the initial shock may also help to improve forecast skill with the full-state initialization strategy. One way to achieve this is to use analyses derived from coupled data assimilation systems as initial conditions (Laloyaux et al. 2015; Mulholland et al. 2015). Ideally, these analyses should be obtained with the same model that is used for the forecasts, and in such a way that the model is kept in dynamical balance, thereby eliminating the initial shock.

Several approaches to coupled data assimilation are feasible, including coupled ensemble Kalman filters (EnKF) and fully coupled adjoint data assimilation schemes (see Stammer et al. 2016 for a review of assimilation approaches). For example, Karspeck et al. (2014) used an EnKF-based coupled assimilation approach to initialize decadal predictions. The results were of mixed quality, however, possibly because a state estimated with a coupled EnKF becomes dynamically inconsistent with the coupled model. A potentially superior approach is therefore to use a smoother instead of a filter, which adjusts only control variables (e.g. initial conditions and model parameters) so that the resulting initial state remains dynamically consistent with the model. Such a coupled adjoint system was realized by Sugiura et al. (2008). They demonstrated that better initial conditions can be derived by assimilating historical observations (ocean, atmosphere, ice and land surface) into the fully coupled ocean–atmosphere model “Coupled GCM for Earth Simulator” (CFES). The improvement arises because the dynamical consistency between the model and the initial conditions reduces the initial shock of the full-state initialization strategy. By controlling the air–sea–land fluxes through the optimization of bulk formula parameters, coupled model errors are reduced. Compared with initialization from ocean-only assimilation, they found improved predictive skill on seasonal to interannual time scales.

Because self-consistent initial conditions (derived with a coupled model) have not yet been used in decadal climate prediction, the impact of consistency between the model and its initial conditions on decadal forecasts has not been explored. To what extent existing forecast skill on decadal time scales could be further improved by developing coupled data assimilation/forecast systems remains to be investigated. Here we address this question by initializing the fully coupled climate model CFES with (1) self-consistent initial conditions and (2) non-consistent initial conditions. Since the model should suffer less from initial shocks and artificial drift when started from dynamically self-consistent initial conditions, we expect these predictions to be more skillful than those initialized with non-consistent initial conditions. The results will help to identify whether additional skill can be gained from coupled adjoint estimation approaches.

The remainder of the paper is structured as follows: Sect. 2 describes the coupled model, the experimental setup and the statistical methods used for skill assessment. The predictive skill obtained with each initialization scheme is evaluated in Sect. 3. Section 4 investigates the underlying mechanisms relevant to predictive skill, and concluding remarks are provided in Sect. 5.

2 Model and experimental setup

2.1 The fully coupled global climate system CFES

To explore the skill of decadal predictions, we use the fully coupled climate model CFES developed by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC). CFES is composed of the atmospheric GCM for the Earth Simulator (AFES, Ohfuchi et al. 2004) and the Ocean-Sea Ice GCM for the Earth Simulator (OIFES, Komori et al. 2005). The spectral AFES component has a horizontal resolution of T42 (about 2.8° × 2.8°) and 24 vertical layers in σ coordinates. OIFES uses a rectangular grid covering the global domain, with a horizontal resolution of 1° in both latitude and longitude and 45 vertical layers (Masuda et al. 2006; Sugiura et al. 2008).

For the decadal hindcasts, an updated version of the model is used, based on the radiation code MstrnX (Nakajima et al. 2000) and forced with historical radiative forcing including greenhouse gases (GHG), aerosols and volcanic eruptions. Zonally averaged, annual-mean GHG concentrations (CH4, CO2, N2O) on a 2-D grid and monthly volcanic forcing are taken from the CMIP5 historical and RCP4.5 scenario data; both are vertically averaged. As is common in global warming simulations, the radiative effects of volcanic ash from major historical eruptions are also taken into account. However, instead of prescribing the concentration of volcanic ash directly, an additional optical thickness is prescribed for a specific band of the radiation spectrum at the lowest level of the model stratosphere. The aerosols (black and organic carbon, dust, sulfur) are column-integrated monthly means from the CMIP5 historical and RCP4.5 scenario simulations. Since the version of MstrnX used here does not compute chlorofluorocarbon (CFC) effects, CFC concentrations are set to zero, and ozone is prescribed as a 3-D monthly climatology.

2.2 Experimental setups for CDA/GECCO2 initialized hindcasts

The first set of hindcasts with the fully coupled climate model CFES is initialized with the state provided by the CFES-based coupled data assimilation itself (CDA, Sugiura et al. 2008). CDA is an estimate of the coupled climate state obtained by assimilating ocean data (SST, climatological data, altimetry and in situ data) and atmospheric data [National Centers for Environmental Prediction (NCEP) reanalysis] into the CFES model with the 4D-VAR method. The coupled model was constrained through the air–sea surface fluxes by estimating bulk adjustment factors (control variables for the sensible heat, latent heat and momentum fluxes), together with the oceanic state, in overlapping 9-month assimilation windows started every 6 months. In the following, we refer to this set of hindcasts as CDA initialized hindcasts (CIH).

To evaluate how consistent initial conditions affect predictive skill, another set of hindcasts is carried out, initialized with the ocean synthesis from the project “German contribution to Estimating the Circulation and Climate of the Ocean” (GECCO2, Köhl 2014) and referred to as GECCO2 initialized hindcasts (GIH). GECCO2 is forced with fluxes derived via bulk formulae from the atmospheric state of the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis and uses the adjoint method to adjust the atmospheric state by assimilating available hydrographic and satellite data (Köhl 2014). The fields used as oceanic initial conditions are 3-D monthly mean temperature, salinity, velocity (zonal and meridional) and sea surface height. Since GECCO2 uses a different grid (vertical, zonal and meridional) and topography than CFES, the GECCO2 fields are first extended towards the continents and the ocean bottom using the nearest grid values and then interpolated onto the CFES grid.

In both sets of hindcasts, the atmospheric model is initialized with CDA and uses the climatology of the optimized bulk adjustment factors from CDA. Given the incompatibility of the GECCO2 ocean estimate with the atmospheric component of CFES, model biases arise in GIH and the forecast climate drifts, which affects predictive skill (e.g. Smith et al. 2013). To evaluate the impact of this drift, an additional set of hindcasts (AGIH) is conducted in which the ocean is initialized from GECCO2 using anomaly initialization. The GECCO2 anomalies are calculated with respect to the 1980–2006 mean and added to the 1980–2006 climatology of an uninitialized 20C run that is externally forced only. The CDA synthesis is available from 1980 to 2006; all hindcasts span 9 years and are initialized every January from 1980 to 2006, each with a single ensemble member. The experiments are summarized in Table 1.
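The anomaly initialization used for AGIH can be illustrated with a minimal sketch. It is not the code used for the experiments: it assumes that the values for one grid point and one calendar month are given as a plain list with one entry per year (1980–2006), and the function name `anomaly_initial_state` is hypothetical.

```python
from statistics import fmean


def anomaly_initial_state(obs_series, start_index, model_climatology):
    """Model climatology plus the observed anomaly for one start year (sketch).

    obs_series: ocean-synthesis values (e.g. GECCO2), one per year 1980-2006,
                for a single grid point and calendar month (assumed layout).
    start_index: position of the start year within obs_series.
    model_climatology: 1980-2006 mean of the uninitialized, forced-only run.
    """
    obs_climatology = fmean(obs_series)                  # 1980-2006 mean of the synthesis
    anomaly = obs_series[start_index] - obs_climatology  # observed anomaly at the start
    return model_climatology + anomaly                   # anomaly placed on model climate
```

For example, a synthesis series `[20.0, 21.0, 22.0]`, start index 2 and a model climatology of 19.5 give 20.5. The state stays close to the model's own climate, which is why anomaly initialization is expected to drift less than full-state initialization.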

Table 1 Summary of the experiments

2.3 Metrics of skill evaluation

To quantify to what extent the climate drift is reduced when forecasts are initialized with self-consistent initial conditions, the time evolution of global-averaged (60°S–60°N) SST from the three sets of hindcasts CIH, GIH and AGIH is compared with observations and with CDA or GECCO2, respectively. The SST observations are taken from the Met Office Hadley Centre dataset (HadISST; Rayner et al. 2003). As indicated by the colored lines in Fig. 1, CIH drifts less from CDA (black) than GIH (colors) drifts from GECCO2 (black); GIH tends to be warm-biased after initialization and drifts towards a cold bias later on. With anomaly initialization from GECCO2, the drifts are smaller than with full-state initialization. CDA uses 9-month assimilation windows during which the bulk coefficients are adjusted to keep the model close to the data and thereby also to minimize drift. Owing to the successive restarts of the model, the CDA trajectory is not perfectly continuous, but the model drift is well controlled, so no apparent gaps are visible. In contrast, CIH uses only the climatology of the bulk coefficients but additionally includes GHG forcing. These differences could lead to a larger drift in CIH than in CDA, as shown by the colored lines in the top panel of Fig. 1. Nevertheless, with full-state initialization, the self-consistent initial conditions can still be expected to yield a smaller drift of global-mean SST in CIH.
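The 60°S–60°N average shown in Fig. 1 could be computed along the following lines. The paper does not state the averaging weights, so this sketch assumes a regular latitude–longitude grid with cos(latitude) area weights and marks land or missing points as `None`; the function name `band_mean_sst` is hypothetical.

```python
import math


def band_mean_sst(sst, lats, lat_min=-60.0, lat_max=60.0):
    """Area-weighted mean SST between lat_min and lat_max (sketch).

    sst: list of rows, one row of longitudes per latitude in `lats`;
         land or missing points are None.
    Weights are cos(latitude), assuming a regular lat-lon grid.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, row in zip(lats, sst):
        if not (lat_min <= lat <= lat_max):
            continue  # outside the 60S-60N band
        w = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue  # skip land / missing points
            weighted_sum += w * value
            weight_total += w
    return weighted_sum / weight_total
```

Applying this to each annual-mean field of a hindcast, the initializing synthesis and HadISST gives time series that can be compared as in Fig. 1.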

Fig. 1

Time series of global-averaged (60°S–60°N) annual-mean SST from HadISST (green), the initialized hindcasts (colors) and the ocean estimates from which the hindcasts are initialized (CDA, GECCO2 or combined anomalies; black). The colored lines in the three panels are, from top to bottom, CIH, GIH and AGIH

For a detailed evaluation of predictive skill, an a posteriori bias correction is needed for full-state initialized hindcasts (ICPO 2011). Before assessing forecast skill, we apply a lead-time-dependent mean-bias correction (e.g. Polkova et al. 2014): for each lead year, the mean difference between the hindcasts and the corresponding observations, averaged over all start dates, is subtracted from the hindcasts at that lead year. Two metrics are commonly used to assess the predictive skill of hindcasts: the anomaly correlation coefficient (ACC) and the root mean square error (RMSE). The RMSE provides a basic measure of prediction skill by directly comparing hindcasts with observations (Tebaldi and Knutti 2007; Goddard et al. 2013). To compare the relative skill of different forecast experiments, the root mean square error skill score (RMSS; Jolliffe and Stephenson 2003) is computed from the RMSEs of paired experiments; for example, the RMSS of CIH is \(1 - \mathrm{RMSE}_{\mathrm{CIH}}/\mathrm{RMSE}_{\mathrm{20C}}\). The resulting global maps of RMSS for annual-mean SST (not shown) reveal no significant difference between CIH and GIH at lead year 1, whereas errors are smaller in CIH than in GIH over large areas of the ocean at lead years 2–5.
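A minimal sketch of the lead-time-dependent mean-bias correction and the RMSS follows. The data layout is an assumption: annual means for a single grid point stored in dictionaries, with lead year 1 verifying in the calendar year of the January start. The function names are hypothetical; only the skill score \(1 - \mathrm{RMSE}_{\mathrm{init}}/\mathrm{RMSE}_{\mathrm{ref}}\) follows the text.

```python
import math
from statistics import fmean


def bias_correct(hindcast, obs):
    """Lead-time-dependent mean-bias correction (sketch).

    hindcast: dict (start_year, lead_year) -> annual-mean value; lead year 1
              is assumed to be the calendar year of the January start.
    obs:      dict calendar year -> observed annual mean (must cover all
              verification years).
    """
    corrected = {}
    for lead in {lead for (_, lead) in hindcast}:
        keys = [k for k in hindcast if k[1] == lead]
        # mean hindcast-minus-observation difference over all start dates
        bias = fmean(hindcast[(s, l)] - obs[s + l - 1] for (s, l) in keys)
        for key in keys:
            corrected[key] = hindcast[key] - bias
    return corrected


def rmse(pred, ref):
    """Root mean square error between paired sequences."""
    return math.sqrt(fmean((p - r) ** 2 for p, r in zip(pred, ref)))


def rmss(rmse_init, rmse_ref):
    """Skill score relative to a reference run, e.g. 1 - RMSE_CIH / RMSE_20C."""
    return 1.0 - rmse_init / rmse_ref
```

A positive RMSS means the initialized hindcasts have smaller errors than the reference (here the uninitialized 20C run); a negative value means they perform worse.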

Since the RMSE is sensitive to outliers, which can produce large errors (Collins 2002), we additionally use the ACC, a scale-invariant measure of the linear association between forecasts and the verifying data. The ACC measures only the phase agreement between observations and forecasts, not differences in amplitude (Matei et al. 2012). Since the ACC is sensitive to trends, all variables are first detrended following the period-dependent linear-trend removal strategy of Kharin et al. (2012). In this approach, the linear trend over the initialization period 1980–2006 is computed by least squares at each grid point, separately for each lead year, and is then subtracted from the hindcasts at that lead year. Detrending avoids overestimating hindcast performance. The bias correction is applied only for computing the skill metrics described above. All correlation coefficients shown in this paper are significant at the 95 % level according to a two-tailed Student’s t test. Because time series of oceanic variables are usually autocorrelated, the effective sample size (von Storch and Zwiers 1999) is used in this test.
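The detrending, ACC and significance test can be sketched as follows, for one grid point and one lead year. The paper cites von Storch and Zwiers (1999) for the effective sample size but gives no formula, so the AR(1)-based approximation \(n_{\mathrm{eff}} = n(1 - r_x r_y)/(1 + r_x r_y)\) used here is an assumption (one common choice). The resulting t statistic would be compared with a two-tailed critical value for \(n_{\mathrm{eff}} - 2\) degrees of freedom, which is not computed here to stay within the standard library. All function names are hypothetical.

```python
import math
from statistics import fmean


def detrend(values, times):
    """Remove a least-squares linear trend (one grid point, one lead year)."""
    t_mean, v_mean = fmean(times), fmean(values)
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return [v - (v_mean + slope * (t - t_mean)) for t, v in zip(times, values)]


def acc(x, y):
    """Anomaly correlation coefficient (Pearson correlation of anomalies)."""
    xm, ym = fmean(x), fmean(y)
    cov = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - xm) ** 2 for a in x) * sum((b - ym) ** 2 for b in y))


def lag1_autocorr(x):
    """Lag-1 autocorrelation estimate of a series."""
    xm = fmean(x)
    num = sum((x[i] - xm) * (x[i + 1] - xm) for i in range(len(x) - 1))
    return num / sum((a - xm) ** 2 for a in x)


def effective_sample_size(x, y):
    """Assumed AR(1) approximation: n_eff = n (1 - r_x r_y) / (1 + r_x r_y)."""
    r = lag1_autocorr(x) * lag1_autocorr(y)
    return len(x) * (1 - r) / (1 + r)


def t_statistic(r, n_eff):
    """Student's t for correlation r (|r| < 1) with n_eff - 2 degrees of freedom."""
    return r * math.sqrt((n_eff - 2) / (1 - r * r))
```

In this sketch, `detrend` is applied separately to the hindcasts and to the verifying data before `acc` is computed, mirroring the per-lead-year detrending described above.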

3 Evaluation of decadal predictive skill of sea surface temperature

As a controlling parameter for heat exchange between the atmosphere and the ocean, SST is a key variable in climate research. It has therefore been used as a primary variable for assessing the skill arising from initialization. Recent decadal prediction studies report such skill improvements, in particular over parts of the ocean (e.g. Pohlmann et al. 2009; Mochizuki et al. 2010, 2012; Matei et al. 2012). SST is used here again to assess the initialized forecasts with respect to the regions and duration of predictive skill.

At lead year 1, the ACC maps between CIH and CDA (Fig. 2) show significant prediction skill over almost the entire ocean. The highest ACC is found in the tropics, especially over the eastern tropical Pacific, where many other forecasts have failed to capture the variability (e.g. Matei et al. 2012; Pohlmann et al. 2009; Polkova et al. 2014). Predictive skill in the first lead year arises mostly from initialization. The dynamical consistency and compatibility of the initial conditions (CDA) with the model used for the hindcasts (CFES) underpin the high SST predictive skill in CIH.

Fig. 2

Spatial distribution of the SST anomaly correlation coefficient between CIH and CDA (left), and between CIH and observed SST (HadISST, right), at lead year 1 (top panels), averaged over lead years 2–5 (middle panels) and averaged over lead years 6–9 (bottom panels). Only coefficients significant at the 95 % level are shown. All SST fields are linearly detrended before the ACC is calculated

However, as the forecast evolves, the influence of initialization decreases, while the response to external forcing increasingly influences the forecasts (Hawkins and Sutton 2009; Branstator and Teng 2010, 2012). Skill decreases significantly at lead years 2–5 and 6–9, especially over the North Pacific and the tropical Pacific, where significant skill is found at lead year 1. Some areas of predictive skill remain in the southern Indian Ocean and the subpolar Southern Ocean at lead years 2–5 for CIH/CDA. This is consistent with other studies, which report less prediction skill over the North Pacific than over the Indian Ocean (e.g. Goddard et al. 2013; Doblas-Reyes et al. 2013). Studies also show that, in the Pacific, initialization contributes substantially to predictive skill relative to uninitialized simulations (Guémas et al. 2013; Meehl and Teng 2012a, b; Meehl et al. 2014). However, sensitivity to uncertainties in the initial state (Branstator et al. 2012; Branstator and Teng 2012) may reduce predictive skill in the Pacific. The small uncertainty of the initial conditions in CIH leads to high predictive skill over the eastern tropical Pacific at lead year 1.

For the GECCO2 initialized hindcasts (GIH), the SST ACC patterns (Fig. 3) are similar to those of CIH, but the areas of significant SST predictive skill are much smaller. At lead year 1, the highest SST predictive skill is found in the southern tropical Indian Ocean, the subpolar Southern Ocean and the western Pacific. CIH shows higher correlation with CDA than GIH does with GECCO2, especially over the eastern tropical Pacific. Previous studies also found poor SST predictive skill over the tropical Pacific at lead year 1 in GECCO-initialized hindcasts (e.g. Matei et al. 2012; Pohlmann et al. 2009), mainly because relatively large SST errors were permitted while nudging the model towards GECCO (Pohlmann et al. 2009). To improve SST predictive skill in the tropics, Pohlmann et al. (2013) later introduced atmospheric initialization, which contributed to improved predictive skill and suggests that atmospheric initialization is necessary for climate prediction. In GIH, instead of first nudging the CFES model towards GECCO2, the hindcasts are initialized directly from GECCO2 in the ocean and from CDA in the atmosphere, which does not allow the atmosphere to adjust to the GECCO2 ocean state. The climatological differences between GECCO2 and CDA are likely to make the atmosphere inconsistent with the initial ocean state, leading to model drift. As indicated by Fig. 1 (middle panel), global SST in GIH is warm-biased at lead year 1. The spatial distribution of the mean SST difference between CDA/GECCO2 and HadISST (Fig. 4) shows that GECCO2 (bottom panel) is warmer than CDA (top panel) in the tropical Pacific (except for the central Pacific along the equator), whereas the differences are smaller at high latitudes. This large bias in the tropical Pacific may explain the lower SST predictive skill of GIH relative to CIH in that region.
The spatial distribution of the RMSE of annual-mean SST from GECCO2 or CDA with respect to HadISST further reveals larger values for GECCO2 than for CDA over the ITCZ region, except in the central Pacific (not shown). Averaged over 20°S–20°N, 140°E–90°W, GECCO2 has an RMS SST error of 0.49, which is less than the error of HadISST. The spatial distribution of the ACC indicates that, despite its warm bias, GECCO2 is better correlated with HadISST than CDA is (Fig. 5). Therefore, forecasts initialized from GECCO2 are probably affected mainly by its larger bias in the tropics, which is investigated further below.

Fig. 3

Spatial distribution of the SST anomaly correlation coefficient between GIH and GECCO2 (left), and between GIH and observed SST (right), at lead year 1 (top panels), averaged over lead years 2–5 (middle panels) and averaged over lead years 6–9 (bottom panels). Only correlation coefficients significant at the 95 % level are shown

Fig. 4

SST difference relative to HadISST, averaged over 1980–2006, for CDA (top panel) and GECCO2 (bottom panel)

Fig. 5

Spatial distribution of the anomaly correlation coefficient for annual-mean SST between CDA and HadISST (top), and between GECCO2 and HadISST (bottom). Only correlation coefficients significant at the 95 % level are shown

For the 4-year averages, significant SST predictive skill is also found over parts of the Indian Ocean and the western tropical Pacific. Both GIH and CIH fail to capture the long-term variations of SST over the North Atlantic (NA), where other studies reported robust skill for forecasts initialized with different modeling systems (e.g. Polkova et al. 2014; Matei et al. 2012; Yang et al. 2013). Although the relative roles of internal variability and external forcing for North Atlantic SST predictive skill remain debated, the role of external forcing (including volcanic eruptions and anthropogenic aerosols) is non-negligible (e.g. Booth et al. 2012; Dunstone et al. 2013; Otterå et al. 2010; Villarini and Vecchi 2013). The predictive skill of North Atlantic SST is therefore affected by removing the trend, which is associated with increasing GHG concentrations but also with multidecadal variability. An analysis of non-detrended SST shows significant SST prediction skill over the North Atlantic in GIH up to lead years 6–9 (not shown), whereas the NA SST predictive skill in CIH is only slightly affected. Moreover, Müller et al. (2014) pointed out that a longer hindcast period enhances the significance of NA SST predictive skill on decadal time scales, owing to the inclusion of additional phases of multidecadal variability. Compared with extended hindcasts for 1901–2010, the trend is more important than internal variability for the NA SST predictive skill of the shorter hindcasts (1960–2010). As noted above, our evaluation period is restricted by the availability of the CDA synthesis to an even shorter 27-year period (1980–2006), only about half the 50-year period commonly used in previous studies. The reduced predictive skill for North Atlantic SST in our hindcasts is therefore partly due to detrending and partly due to the high significance thresholds associated with the relatively short hindcast period.

In summary, the above results indicate that, when the long-term linear trend is removed, dynamically consistent initial conditions provide higher predictive skill for SST than initial states from a different model. The strongest skill enhancement occurs in the first year over most of the globe, especially over the eastern tropical Pacific. For longer lead times, however, the difference becomes small. Surprisingly, the analysis of non-detrended SST shows that the trend, and thus potentially the forced part of the signal, is less well represented when the model is initialized with consistent initial conditions.

4 Mechanism leading to low prediction skill

Since climate predictive skill on shorter time scales is strongly influenced by initialization, the impact of the initial conditions is largest during the first lead year. The most significant difference found above is that CIH shows substantially higher correlations than GIH over the tropical Pacific at lead year 1. This makes the tropical Pacific a suitable region to assess the impact of consistency between the climate model and its initial conditions on forecast skill and to explore the underlying mechanism. We therefore focus on this region to determine how non-consistent initial conditions influence forecast skill.

Because interannual climate variability in the tropical Pacific is dominated by ENSO (El Niño-Southern Oscillation), the high/low predictive skill of tropical Pacific SST in CIH/GIH at lead year 1 is possibly related to ENSO events. We therefore explore the possible mechanism behind the SST predictive skill of the hindcasts by analyzing key characteristics of El Niño, and investigate why this mechanism appears to operate differently under the two initialization approaches. For this purpose, the Niño 3.4 Index, based on SST anomalies in the Niño 3.4 region (170°W–120°W, 5°S–5°N), is used to characterize the evolution of ENSO events.

4.1 Predictive skill of El Niño events

Comparing the monthly Niño 3.4 Index of the hindcasts during the first lead year with that of HadISST (Fig. 6) shows that CIH reproduces the ENSO events in most of the hindcasts. However, besides the successfully reproduced historical ENSO events, some additional erroneous El Niño events are also detected, especially in GIH. CIH produces only one El Niño-like event (in 1984), whereas GIH produces eight more, nine in total (1980, 1981, 1984, 1985, 1989, 1996, 2001, 2003 and 2005). Hovmöller diagrams of sea surface height (SSH) and zonal wind stress (\(\tau^{x}\)) for these years (not shown) show characteristics and a development similar to those of historical El Niño events. However, observed tropical Pacific SST does not indicate an El Niño in these years, so we conclude that GIH produces pseudo El Niño events. These pseudo events occur in one-third of the forecasts. The poor SST predictive skill of GIH therefore appears to be related to these erroneous El Niño events. In the following, we investigate which conditions in the GECCO2 state lead to the generation of pseudo El Niño events in GIH and how the problem can be addressed.

Fig. 6

Monthly Niño 3.4 Index at lead year 1 from the hindcasts (black), HadISST (green) and the initialization data (red; CDA and GECCO2, respectively) for (a) CIH and CDA, (b) GIH and GECCO2 and (c) AGIH and GECCO2. The Niño 3.4 Index is derived from SST anomalies averaged over the Niño 3.4 region relative to the 1980–2006 climatology. A 5-month running mean is then applied to the anomalies to smooth out possible intraseasonal variations. Finally, the smoothed SST anomalies are normalized by their standard deviation (SD) over 1980–2006
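The three steps in the caption of Fig. 6 can be sketched as below. Several details are assumptions not stated in the text: the climatology is taken per calendar month, the 5-month running mean is centered and kept only where the full window fits, and the population SD of the smoothed series is used. The input is assumed to be monthly SST already averaged over the Niño 3.4 box, starting in January; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def nino34_index(box_mean_sst, window=5):
    """Normalized Niño 3.4 index from monthly box-mean SST (sketch).

    box_mean_sst: monthly SST averaged over 5S-5N, 170W-120W, starting in
                  January, with a length that is a multiple of 12 (assumed).
    """
    # 1. anomalies relative to an (assumed) calendar-month climatology
    clim = [fmean(box_mean_sst[m::12]) for m in range(12)]
    anom = [v - clim[i % 12] for i, v in enumerate(box_mean_sst)]
    # 2. centered running mean; months without a full window are dropped
    half = window // 2
    smooth = [fmean(anom[i - half:i + half + 1]) for i in range(half, len(anom) - half)]
    # 3. normalize by the standard deviation of the smoothed anomalies
    sd = pstdev(smooth)
    return [a / sd for a in smooth]
```

By construction, the returned index has unit standard deviation, so its excursions can be read directly in units of SD when comparing hindcasts, initialization data and HadISST.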

4.2 Zonal momentum balance in the upper equatorial Pacific

In climate prediction, a state obtained by assimilating data into an ocean model is commonly used to initialize the model. When this ocean state is combined with the atmospheric model, the balance between the zonal pressure gradient and the wind stress forcing near the equator needs special attention. Bell et al. (2004) identify the lack of this balance during data assimilation as one of the reasons for spurious model responses. When the coupled CFES model is initialized with the interpolated GECCO2 ocean state and the atmospheric conditions from the CFES-based CDA, differences between the atmospheric states underlying GECCO2 and CDA may lead to an imbalance between the zonal wind stress and the pressure gradient force in the equatorial Pacific (e.g. Neelin and Latif 1998). If the imbalance is large, conditions resembling the early phase of an El Niño event could appear. To explore the poor performance of GIH in predicting equatorial Pacific SST, we focus on the balance in the upper ocean along the equator between the pressure gradient force (PGF hereafter) and the surface zonal wind stress \(\tau^{x}\) in CIH versus GIH. Following Bryden and Brady (1985), the zonal momentum balance in the upper equatorial Pacific under balanced conditions can be written as:

$$A_{v} \frac{\partial u}{\partial z} = \frac{1}{\rho_{0}} \left( \tau^{x} - \int_{0}^{z} \frac{\partial P}{\partial x}\, dz' \right)$$
$$P = \rho g \eta + g \int_{z}^{\eta} \rho' \, dz'$$

where \(A_v\) is the vertical eddy viscosity and \(u\) is the zonal velocity. The left-hand term represents the parameterized vertical mixing due to the uncompensated surface wind stress. Here \(\eta\) is the sea surface height, \(P\) is the pressure, \(\rho\) is the density of seawater, \(g\) is the gravitational acceleration, and \(\rho'\) is the difference between \(\rho\) and the reference density \(\rho_0 = 1026\ \mathrm{kg\,m^{-3}}\). At the depth where the vertical shear of the zonal velocity vanishes, the wind stress is compensated by the integrated pressure gradient force (Bryden and Brady 1985). Since the simplest early models for predicting El Niño relied on statistical relations between wind stress and density gradients (Barnett et al. 1988), we first illustrate the seasonal evolution of this balance in the monthly-mean GECCO2 synthesis as a reference.
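As an illustration, the residual between \(\tau^{x}\) and the integrated PGF could be evaluated discretely as below. This is a sketch under stated assumptions, not the authors' diagnostic: the zonal pressure gradient is a finite difference between a western and an eastern water column separated by `dx`, pressure is evaluated at layer mid-depths, the surface term uses \(\rho_0\) (a Boussinesq simplification), \(g = 9.81\ \mathrm{m\,s^{-2}}\) is assumed, and layers are listed from the surface downward with depth increasing. The function names are hypothetical.

```python
G = 9.81       # gravitational acceleration (m s^-2); assumed value
RHO0 = 1026.0  # reference density (kg m^-3), as given in the text


def mid_layer_pressure(eta, rho_prime, dz):
    """Hydrostatic pressure (Pa) at layer mid-depths for one water column.

    Surface term RHO0 * G * eta (Boussinesq assumption) plus G * sum(rho' dz).
    The depth-only term rho0*g*z is omitted: it has no zonal gradient.
    rho_prime (kg m^-3) and dz (m) are listed from the surface downward.
    """
    p_top = RHO0 * G * eta
    mids = []
    for rp, h in zip(rho_prime, dz):
        mids.append(p_top + 0.5 * G * rp * h)  # pressure at the layer middle
        p_top += G * rp * h                    # pressure at the layer base
    return mids


def momentum_residual(tau_x, eta_w, eta_e, rho_w, rho_e, dz, dx):
    """tau_x minus the zonal PGF integrated down to the base of the last layer.

    Western and eastern columns are separated by dx (m); the integral is a
    midpoint sum over layers. The result is in N m^-2.
    """
    p_w = mid_layer_pressure(eta_w, rho_w, dz)
    p_e = mid_layer_pressure(eta_e, rho_e, dz)
    integrated_pgf = sum((pe - pw) / dx * h for pw, pe, h in zip(p_w, p_e, dz))
    return tau_x - integrated_pgf
```

A zero residual corresponds to the balanced state in which the wind stress is fully compensated by the integrated pressure gradient force; nonzero values correspond to the imbalances discussed below.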

The residual between the pressure gradient force and the zonal wind stress along the equatorial Pacific is shown in Fig. 7 for two categories: historical El Niño years and non-El Niño years. In historical El Niño years, a negative imbalance already appears in the western Pacific in the preceding December. From March on, a strong negative imbalance has developed in the western equatorial Pacific. This uncompensated negative imbalance could result from (1) relatively weak easterly trade winds, (2) a strong pressure gradient force, or a combination of (1) and (2). The imbalance propagates eastwards as a Kelvin wave until it reaches the eastern Pacific, in agreement with the developing phase of El Niño. As a result, warm anomalies appear in the eastern Pacific (not shown) and peak around November, after which the imbalance disappears. Since the CIH and GIH simulations are initialized in January, an imbalance between the zonal wind stress and the pressure gradient force in January may serve as one condition that triggers an ENSO event later in the year. Conversely, an imbalance introduced at initialization in January is also likely to propagate as a Kelvin wave and may therefore explain the pseudo El Niño events in GIH.

Fig. 7

Zonal momentum balance of the upper equatorial Pacific between the pressure gradient force and the zonal wind stress from the GECCO2 synthesis (1980–2006), averaged over historical El Niño years (black) and non-El Niño years (red) in the preceding December and in March, May, July, September and November. Units are N/m²

Because GIH produces pseudo El Niño events, we study the zonal momentum balance of GIH in three categories: (1) historical El Niño years, (2) pseudo El Niño years and (3) non-El Niño years. Since CIH produces only one pseudo El Niño (1984), this year is excluded from both categories for CIH, and the momentum balance residual of CIH is assessed in the same two categories as for the GECCO2 synthesis. Although the pressure gradient force and the wind stress are the dominant terms of the momentum balance in the equatorial Pacific, they do not balance perfectly, and other terms are also relevant, in particular the vertical advection of zonal momentum, which has opposite signs in the eastern and western tropical Pacific. To remove the contribution of the advection terms, the balance shown in Fig. 8 is expressed relative to its January average over non-El Niño years. Because the coupled model CFES includes atmospheric pressure forcing of the ocean, the effect of sea level air pressure on sea surface height is removed from the hindcasts.
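Expressing the January composites relative to the non-El Niño baseline, as described above, amounts to a difference of two composite means per longitude. In this minimal sketch, the residuals are assumed to be stored per year as lists along the equator, and the optional `exclude` argument mirrors the removal of 1984 for CIH; the data layout and function name are hypothetical.

```python
from statistics import fmean


def composite_anomaly(residual_by_year, event_years, baseline_years, exclude=()):
    """January composite over event years minus the non-El Nino baseline (sketch).

    residual_by_year: dict year -> list of January residuals (PGF vs tau_x)
                      along the equator, one value per longitude (assumed).
    exclude: years dropped from both groups (e.g. 1984 for CIH).
    """
    events = [residual_by_year[y] for y in event_years if y not in exclude]
    base = [residual_by_year[y] for y in baseline_years if y not in exclude]
    # zip(*rows) walks longitude by longitude across the selected years
    event_mean = [fmean(col) for col in zip(*events)]
    base_mean = [fmean(col) for col in zip(*base)]
    return [e - b for e, b in zip(event_mean, base_mean)]
```

Because the same baseline is subtracted from every category, terms that are similar in all years (such as the climatological vertical advection) largely cancel, leaving the event-related imbalance.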

Fig. 8

Zonal momentum balance of the upper equatorial Pacific between pressure gradient force and zonal wind stress in January from GIH (left panel), CIH (middle panel) and AGIH (right panel). Black lines denote the January average over historical El Niño years, red lines the average over non-El Niño years, and the blue line the average over GIH-produced pseudo El Niño years. All values are relative to the January average over non-El Niño years. The unit is N/m²

As indicated by the black lines in Fig. 8, in historical El Niño years negative residuals between PGF and τx are found in the central equatorial Pacific in January for both CIH and GIH. The averaged zonal momentum imbalance in GIH during its pseudo El Niño years is also negative over a large area of the western and central equatorial Pacific in January, when the forecast is initialized (blue line in Fig. 8). However, the evolution of the negative imbalance is sensitive to perturbations and develops inconsistently and differently from the GECCO2 Synthesis (Fig. 7); the eastward propagation of the imbalance is hardly visible along the equator (not shown). Within the hindcast period 1980–2006, there are only seven historical El Niño events and eight pseudo El Niño events. Owing to this small sample size, the variability among events leaves an inconsistent imprint on the imbalance. Together with the long lead time between January and the fully developed El Niño, which makes precursor signals weak, this makes it difficult to identify the causes of El Niño. Nevertheless, the similarity between the real and pseudo events suggests that the imbalance in January is the relevant cause.

The offset in the balance for GIH at initial time suggests that GECCO2 contains a larger pressure gradient than can be balanced by the wind stress from CDA, which implies differences in SSH and, consequently, in thermocline slope and SST gradients. The individual elements affecting the momentum balance in the initial state are investigated in the same two categories as for GECCO2.
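
The step from an SSH difference to a pressure gradient difference follows from hydrostatics: at the surface, dp/dx = ρ0 g ∂η/∂x. The sketch below converts an equatorial SSH profile into a surface zonal pressure gradient in Pa/m (the unit of Fig. 11b). The reference density and the restriction to the surface (ignoring the baroclinic contribution from density changes at depth) are simplifying assumptions for illustration, not the paper's procedure; SSH given in cm (as in Fig. 9) must first be converted to metres.

```python
import numpy as np

RHO0 = 1025.0      # reference seawater density (kg/m^3), assumed
G = 9.81           # gravitational acceleration (m/s^2)
R_EARTH = 6.371e6  # mean Earth radius (m)

def surface_zonal_pressure_gradient(ssh_m, lon_deg):
    """Surface zonal pressure gradient dp/dx = rho0 * g * d(eta)/dx (Pa/m).

    ssh_m   : sea surface height along the equator (m)
    lon_deg : longitudes in degrees east, monotonically increasing
    On the equator the zonal distance is dx = R_earth * d(lon), lon in radians.
    """
    x = R_EARTH * np.deg2rad(np.asarray(lon_deg, dtype=float))
    # Centred differences in the interior, one-sided at the end points
    return RHO0 * G * np.gradient(np.asarray(ssh_m, dtype=float), x)
```

A larger west-to-east SSH drop, such as the one GECCO2 shows relative to CDA, translates directly into a more negative dp/dx, i.e. a stronger eastward pressure gradient force that the wind stress must balance.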

The January climatological SSH and SST anomalies along the equatorial Pacific are shown in Fig. 9 for four data sources, GECCO2 (red), CDA (green), GIH (black) and CIH (blue), in the two categories mentioned above. The anomalies are calculated relative to the CDA average over non-El Niño years. The blue lines almost overlap the green lines in all four panels, indicating that CIH closely resembles CDA. In contrast, the black and red lines diverge, indicating that GIH has already drifted away from GECCO2 despite the short time since initialization. During pseudo El Niño years, GECCO2 shows larger zonal SSH and SST gradients along the equator in January than CDA (Fig. 9a, c), with a warmer/colder core in the western/eastern Pacific. These larger zonal SSH/SST gradients are also present in GIH, consistent with the occurrence of pseudo El Niños later in those years.

Fig. 9

Climatological January SSH/SST anomalies along the equatorial Pacific for (a, c) GIH-produced pseudo El Niño years and (b, d) historical El Niño years, for GIH (black), GECCO2 (red), CDA (green) and CIH (blue). The anomalies are calculated as differences from the CDA average over non-El Niño years. The units of SSH and SST are cm and °C, respectively

Regarding ENSO prediction, three springtime indicators for the occurrence of El Niño later in the year have been identified so far: the meridional mode (MM; Chang et al. 2007), the equatorial heat content, and westerly wind bursts (e.g. Kug et al. 2005; Jin 1997). The first is considered a conduit through which the extratropical atmosphere impacts ENSO (e.g. Chang et al. 2007; Wang et al. 2012) and describes a pattern of warming and southwesterly wind anomalies in the vicinity of the Intertropical Convergence Zone. Each of the latter two factors is, by itself, a necessary but not sufficient precondition for an El Niño. In the tropics, there is a robust relationship between ocean heat content (OHC) variability and sea-level variations (White and Tai 1995; Fu and Cazenave 2000). This relationship is confirmed by the high correlation coefficients (0.71–0.93) between OHC (upper 500 m) over the tropics (5°N–5°S) and SSH along the equator in GECCO2, suggesting that SSH can be used as a proxy for OHC to evaluate its impact on ENSO. Therefore, patterns associated with the meridional mode are considered to enhance the prospects for El Niño prediction.
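
To illustrate how such an OHC–SSH correlation can be computed, the sketch below integrates temperature over the upper 500 m to obtain OHC per unit area, ρ0 cp ∫T dz, and correlates it with an SSH time series. The constants, array layout and function names are assumptions; the area averaging over 5°N–5°S and the anomaly definition used in the paper are not reproduced.

```python
import numpy as np

RHO0 = 1025.0  # reference seawater density (kg/m^3), assumed
CP = 3990.0    # approximate specific heat of seawater (J/(kg K)), assumed

def ocean_heat_content(temp_c, z, max_depth=500.0):
    """Upper-ocean heat content per unit area (J/m^2): rho0 * cp * integral(T dz).

    temp_c    : temperature (degC), shape (nt, nz)
    z         : depths (m, positive downward, increasing), shape (nz,)
    max_depth : only levels with z <= max_depth are integrated (trapezoidal rule)
    """
    temp_c = np.asarray(temp_c, dtype=float)
    z = np.asarray(z, dtype=float)
    keep = z <= max_depth
    t, zz = temp_c[:, keep], z[keep]
    layer = 0.5 * (t[:, 1:] + t[:, :-1]) * np.diff(zz)
    return RHO0 * CP * layer.sum(axis=1)

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D time series."""
    return float(np.corrcoef(a, b)[0, 1])
```

Because the correlation is insensitive to constant offsets and scaling, the choice of reference temperature and the exact value of cp do not affect the correlation coefficient, only the absolute OHC.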

The SST/SSH difference between GECCO2 and CDA reveals a warm anomaly with higher sea level during the pseudo El Niño years (Figs. 9a, c, 10), whereas the southwesterly wind anomaly occurs only in the central Pacific and the northern subtropical Pacific. Therefore, during the pseudo El Niño years of GIH, the different ocean states of CDA and GECCO2 imply different responses. As shown in Fig. 11a, GECCO2 exhibits stronger zonal wind stress than CDA in the central Pacific (180°–140°W). GECCO2 also shows a larger mean zonal pressure gradient than CDA in the region 175°–130°W (Fig. 11b). In GIH, a stronger wind in the central Pacific would therefore be needed to balance the larger zonal SSH/SST gradient along the equator inherited from GECCO2. Hence, the ocean and atmosphere in GIH adjust to each other through coupled air-sea interaction, and the imbalance propagates eastwards as Kelvin waves along the equator, causing the pseudo El Niños.
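
The time scale of the eastward propagation can be estimated with the standard reduced-gravity expression for the baroclinic Kelvin wave speed, c = sqrt(g' H). The values below (g' = 0.04 m/s², H = 150 m, a basin crossing of roughly 12,000 km) are illustrative assumptions, not quantities diagnosed in this study.

```python
import math

def kelvin_wave_speed(reduced_gravity, layer_depth):
    """Phase speed c = sqrt(g' * H) (m/s) of a Kelvin wave in a reduced-gravity model."""
    return math.sqrt(reduced_gravity * layer_depth)

def crossing_time_days(distance_m, speed_m_s):
    """Time (days) for a signal travelling at speed_m_s to cover distance_m."""
    return distance_m / speed_m_s / 86400.0

# Illustrative, assumed parameter values
c = kelvin_wave_speed(0.04, 150.0)    # about 2.4 m/s
days = crossing_time_days(1.2e7, c)   # about two months
```

Under these assumptions a signal crosses the equatorial Pacific in roughly two months, so an imbalance present at January initialization has ample time to reach the eastern Pacific well before the typical late-year El Niño peak.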

Fig. 10

January SST/wind stress difference between GECCO2 and CDA, averaged over pseudo El Niño years. Color shading indicates the SST anomaly (°C) and vectors indicate the wind stress anomaly (N/m²)

Fig. 11

Zonal wind stress (a) and zonal pressure gradient (b) along the equatorial Pacific in January, averaged over (1) historical El Niño years from GECCO2 (black) and CDA (red), and (2) non-El Niño years from GECCO2 (blue) and CDA (cyan). Units are N/m² for wind stress and Pa/m for pressure gradient

4.3 Case study with anomaly initialization

The anomaly initialization approach helps reduce model drift (Fig. 1) by constraining the model with observed anomalies superimposed on the model climatology (e.g. Smith et al. 2013). Previous studies (e.g. Polkova et al. 2014) showed improved SST predictive skill in the tropical Pacific with the anomaly initialization strategy. Therefore, by initializing the model with observed anomalies added to the model climate (here taken as the climatology of the 20C simulation), the warm bias along the tropical Pacific can be expected to be reduced in AGIH, and the balance between zonal pressure gradient and wind stress forcing can be expected to hold.
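
The difference between the two strategies can be written as: full-field initial state = observed state; anomaly initial state = model climatology + (observed state − observed climatology). A minimal sketch, assuming all fields are on a common grid and for the same calendar month; the function names are hypothetical. In the setup described here, the observed fields would come from GECCO2 and the model climatology from the 20C simulation.

```python
import numpy as np

def full_field_initial_state(obs_state):
    """Full-field initialization: start directly from the observed/synthesis state."""
    return np.asarray(obs_state, dtype=float)

def anomaly_initial_state(obs_state, obs_climatology, model_climatology):
    """Anomaly initialization: model climatology plus the observed anomaly.

    The observed anomaly (obs_state - obs_climatology) is superimposed on the
    model's own climatology, so the model starts near its preferred mean state
    and only the deviations from that mean are taken from the observations.
    """
    obs_anom = np.asarray(obs_state, dtype=float) - np.asarray(obs_climatology, dtype=float)
    return np.asarray(model_climatology, dtype=float) + obs_anom
```

With anomaly initialization, any mismatch between the observed and model mean states (for example a larger mean zonal SST gradient in the ocean synthesis) is not passed to the coupled model; only the anomalies are.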

The spatial distribution of the anomaly correlation coefficients for detrended SST (Fig. 12) indicates predictive skill in parts of the central Pacific and along the western Pacific at the first lead year in AGIH. Consistent with this, the monthly Niño 3.4 index at lead year 1 (Fig. 6) shows that AGIH reproduces historical El Niño events better than GIH. In total, AGIH produces three erroneous El Niño events, compared with nine in GIH. The successful reproduction of ENSO explains the SST predictive skill in the tropical Pacific at lead year 1 in AGIH. Further analysis of the momentum balance between pressure gradient force and zonal wind stress in the upper equatorial Pacific reveals a negative imbalance along the equatorial Pacific during the climatological January of historical El Niño years. The anomaly initialization strategy in AGIH provides better compatibility between the ocean and atmosphere models, since the climatological mean can be assumed to be in balance and only deviations from the mean can give rise to imbalances. Accordingly, AGIH shows a balanced state between pressure gradient force and wind stress in the upper equatorial Pacific in January of non-El Niño years, whereas GIH produces nine erroneous El Niños within these years. This difference, related to the change in the mean climate, supports the interpretation that the incompatibility of the GECCO2 mean climate leads to the dynamical momentum imbalance in the upper equatorial Pacific in GIH. This highlights the importance of a momentum balance between pressure gradient force and wind stress for SST predictive skill in the equatorial Pacific on short time scales.
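
The counts of erroneous El Niños compare events in a hindcast with events in the reference. The section does not state the event criterion, so the sketch below assumes one: a year counts as an El Niño year if its mean Niño 3.4 anomaly over October–December reaches 0.5 °C. The threshold, the season and the function names are hypothetical choices for illustration.

```python
import numpy as np

def el_nino_years(nino34_monthly, years, threshold=0.5, months=(10, 11, 12)):
    """Years whose mean Nino 3.4 anomaly over `months` is >= threshold (degC).

    nino34_monthly : array (n_years, 12) of monthly anomalies, January..December
    years          : sequence of n_years calendar years
    months         : 1-based calendar months tested (assumed Oct-Dec)
    """
    idx = [m - 1 for m in months]
    seasonal = np.asarray(nino34_monthly, dtype=float)[:, idx].mean(axis=1)
    return {int(y) for y, v in zip(years, seasonal) if v >= threshold}

def compare_events(hindcast_years, reference_years):
    """Split hindcast events into hits and erroneous events; also list missed events."""
    return {
        "hits": hindcast_years & reference_years,
        "erroneous": hindcast_years - reference_years,
        "missed": reference_years - hindcast_years,
    }
```

Changing the threshold or season would change the counts, so numbers obtained this way are not expected to match the three and nine events reported above exactly.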

Fig. 12

Spatial distribution of the detrended SST anomaly correlation coefficient (ACC) between AGIH and GECCO2 (left), and difference between ACC(AGIH, GECCO2) and ACC(GIH, GECCO2) (right), for the first lead year (top panels), the average of lead years 2–5 (middle panels) and the average of lead years 6–9 (bottom panels). Only coefficients significant at the 95 % level are shown
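
The ACC of detrended SST shown in Fig. 12 can be computed per grid point from a hindcast series and a reference series at a given lead. A minimal sketch, assuming one value per start year and linear least-squares detrending; the paper's exact detrending and its 95 % significance test are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def detrend(x):
    """Remove a least-squares linear trend from a 1-D time series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def anomaly_correlation(forecast, reference):
    """Anomaly correlation coefficient of two linearly detrended series."""
    f = detrend(forecast)
    r = detrend(reference)
    # Residuals of a fit with intercept have ~zero mean; subtract it explicitly anyway
    f -= f.mean()
    r -= r.mean()
    return float(np.sum(f * r) / np.sqrt(np.sum(f ** 2) * np.sum(r ** 2)))
```

Detrending removes the linear response to external forcing, so the ACC measures only how well the year-to-year anomalies are predicted; this is why detrended and non-detrended skill can differ at longer leads.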

5 Concluding remarks

In this paper, the benefit of initializing a decadal prediction system with dynamically consistent initial conditions is explored by comparing the results obtained by initializing the coupled model CFES with atmospheric initial conditions from CDA and ocean states taken either from CDA or from a non-native ocean synthesis (GECCO2). In terms of SST predictability, we find that CIH provides higher predictive skill at lead year 1 over vast areas of the ocean, suggesting that the dynamical consistency of initial conditions is relevant for predictive skill at least in the first lead year. The most significant improvement in CIH is found over the tropical Pacific. The difference between the two hindcasts becomes small at longer lead times, where the response to external forcing becomes increasingly important; this is seen mainly when time series are not detrended, especially for GIH.

At lead year 1, the significant, high SST predictive skill over the tropical Pacific in CIH is connected to a good reproduction of El Niño events, whereas the poor SST predictive skill of GIH relates to additional erroneous El Niño events. The upper equatorial Pacific Ocean is characterized by a zonal momentum balance between the wind stress and the pressure gradient force. GIH is initialized with the GECCO2 ocean state, but its atmospheric conditions result from the same coupled model as used in the hindcasts. Since the GECCO2 ocean state is characterized by warmer SST with larger zonal SST and SSH gradients than CDA, a larger zonal pressure gradient is introduced into GIH. Because the wind stress remains relatively weak owing to the atmospheric initialization from CDA, this large zonal pressure gradient cannot be balanced by the wind stress in the central Pacific in GIH. As a result, adjustment through coupled air-sea interaction takes place.

The imbalance propagates eastwards as Kelvin waves and causes the additional erroneous El Niños in GIH. Although the two main terms, wind stress and pressure gradient, do not balance perfectly and further terms are relevant, in particular the vertical advection of zonal momentum, which has opposite signs in the eastern and western tropical Pacific, the reduced predictive skill of GIH in the equatorial Pacific is found to be mainly related to the dynamical imbalance arising from inappropriate oceanic initialization. Our results suggest that in climate predictions, including seasonal forecasts, it is beneficial to initialize a model with oceanic initial conditions consistent with the atmospheric state, particularly for predictive skill in the tropical Pacific. Although it does not quite reach the skill of the consistently initialized CIH hindcasts, the anomaly initialization method resolves the problem of inconsistent climatological states between atmosphere and ocean that offsets the balance between pressure gradient and wind stress in the equatorial Pacific, and the excessive number of El Niños is avoided. Note that in the commonly used procedure of nudging the ocean towards an ocean synthesis, the atmosphere is allowed to respond to the ocean state, which partially compensates for differences in climatological states. This also applies to the alternative approach of forcing a coupled model with observed winds, in which the ocean is allowed to react to the prescribed winds (Thoma et al. 2015). However, initializing the ocean and the atmosphere simultaneously by nudging both components towards their respective reanalyses was shown not to provide additional skill compared with initializing only the ocean component, particularly in the tropics (Pohlmann et al. 2013).
A consistent initialization therefore remains an important factor, and, to improve tropical predictive skill, maintaining the momentum balance between zonal wind stress and pressure gradient force along the equatorial Pacific is crucial when initializing the model from any oceanic state.