1 Introduction

The El Niño/Southern Oscillation (ENSO) forecast remains a challenge and is important for seasonal to decadal predictions of the global climate, because ENSO dominates the interannual variability in the tropical Pacific and is the major source of global climate predictability (National Research Council 2010). Based on twelve dynamical and eight statistical model hindcasts for 2002–11 and 1981–2010, Barnston et al. (2012) concluded that the averaged hindcast skills (measured by the anomaly correlation, R) are 0.42 and 0.65, respectively, for 6-month lead forecasts. It appears that the hindcast skill has been reduced in recent years; 2012/13 and 2014/15 are good examples. 2012/13 was a neutral-condition year, yet most state-of-the-art climate models predicted the occurrence of a weak El Niño in the winter of 2012/13 given the observed initial conditions in boreal summer (hereafter, the seasons referred to are those of the Northern Hemisphere), and those forecasts turned out to be false alarms. Similarly, a borderline El Niño in 2014/15 was also not successfully predicted (further discussed in “Appendix”). It has been argued that the decrease in prediction skill since 2000 may be associated with the reduced variability of the coupled system in the tropical Pacific (Horii et al. 2012; McPhaden 2012; Hu et al. 2013, 2016).

The equatorial Pacific warm water volume (WWV) resulting from the variation in the Pacific subtropical cells (STCs) has long been believed to be a useful El Niño predictor (Meinen and McPhaden 2000; Clarke 2014). It reflects the recharge/discharge of equatorial heat through exchange between the equatorial and off-equatorial oceans (e.g., Jin 1997). The discharge of equatorial heat content occurs after the mature phase of El Niño, resulting in an off-equatorward transport of heat (interior STCs divergence) and a shallower than normal equatorial thermocline. This discharge terminates the warm El Niño phase and leads to the development of La Niña through the positive thermocline and zonal advective feedbacks (Jin and An 1999). After the mature phase of La Niña, the recharge of equatorial heat content due to the equatorward transport (interior STCs convergence) results in a deeper than normal equatorial thermocline, returning the system to the El Niño phase. The full ENSO cycle is thus completed on interannual time scales. Chen et al. (2015) found that the equatorial sea surface temperature anomaly (SSTa) lags the STCs by 18 months with a correlation R = 0.59, and the WWV also leads the SSTa by 10 months (R = 0.73) on the interannual scales. However, the relationship between WWV and ENSO has weakened in recent years, and the lead time of WWV to ENSO has been reduced (Horii et al. 2012; McPhaden 2012; Bunge and Clarke 2014). Hu et al. (2013) argued that the combination of a steeper thermocline slope with stronger trade winds since 2000 may hamper the eastward migration of the warm water along the equatorial Pacific, reducing the WWV variability and ENSO amplitude. This is consistent with a coherent weakening of the interannual variability of the ocean–atmosphere system in the tropical Pacific (Hu et al. 2013). Thus, the ENSO occurrence and its reduced prediction skill since 2000 may be directly influenced by the coupled system change in the tropical Pacific.

Figure 1 shows the equatorial evolution of the 20 °C thermocline depth anomalies (D20a) during four representative ENSO events since 1994 based on the observed Tropical Atmosphere Ocean (TAO) array data. Following the definition of the Climate Prediction Center (CPC) (http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml), ENSO events are defined throughout this paper by at least five consecutive months in which the Oceanic Niño Index (ONI), a 3-month running mean of SSTa in the Niño 3.4 region, exceeds the threshold of ±0.5 °C (i.e., all SSTa are normalized by 0.5 °C for consistency throughout the paper). Two El Niño (strong: 1997/1998 and weak: 2004/2005) and two La Niña years (strong: 1998/1999 and weak: 2000/2001) are chosen because the thermocline depth differs significantly in response to different ENSO strengths. During the ENSO developing phases, we can see an evident thermocline anomaly deepening (or shoaling) toward the east over time, indicating pronounced positive (or negative) ocean heat content anomalies propagating eastward along the equator in the El Niño (or La Niña) developing months regardless of the strength and type of ENSO, consistent with the wave dynamics proposed in the delayed oscillator (e.g., Battisti and Hirst 1989). The eastward propagation is more evident in the El Niño years than in the La Niña years because the deepening signal can be easily detected, while the shoaling signal is difficult to distinguish from the already shallow thermocline in the eastern tropical Pacific (Fig. 1c, d). Nevertheless, we can still see the propagation signal between east of the dateline and 140°W in both strong and weak La Niñas. We also note that the propagation in all ENSO events has a similar phase speed of approximately 40°/month, which is about twice as slow as the free baroclinic Kelvin wave speed of 2.7 m s−1 (Fedorov 2010). The slower propagation speed may result from the coupling between the SST and atmospheric convection (Roundy and Kiladis 2006) and has been highlighted as an important signature of the onset of El Niño (McPhaden and Yu 1999). The question now is whether we can use this signature to predict ENSO in practice. More specifically, does it include any additional ENSO triggering information (e.g., the ocean–atmosphere coupling) besides WWV variability?
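To make the comparison between the slope drawn in Fig. 1 and the free Kelvin wave speed concrete, the following minimal sketch converts between degrees of longitude per month and m s−1 at the equator. It is an illustration only, not part of the original analysis; the Earth radius, the 30-day month, and all function names are assumptions introduced here. Under these assumptions, 40°/month corresponds to roughly 1.7 m s−1.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius (assumed value)
SECONDS_PER_DAY = 86400.0
DAYS_PER_MONTH = 30.0      # assumed month length for the conversion


def metres_per_degree_longitude(latitude_deg=0.0):
    """Length of one degree of longitude at the given latitude."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    return circumference / 360.0


def deg_per_month_to_m_per_s(deg_per_month, latitude_deg=0.0):
    """Convert a zonal phase speed from degrees/month to m/s."""
    distance_m = deg_per_month * metres_per_degree_longitude(latitude_deg)
    return distance_m / (DAYS_PER_MONTH * SECONDS_PER_DAY)


def m_per_s_to_deg_per_month(speed_m_per_s, latitude_deg=0.0):
    """Convert a zonal phase speed from m/s to degrees/month."""
    distance_m = speed_m_per_s * DAYS_PER_MONTH * SECONDS_PER_DAY
    return distance_m / metres_per_degree_longitude(latitude_deg)


# Slope drawn in Fig. 1 and the free baroclinic Kelvin wave speed quoted in the text.
observed_speed = deg_per_month_to_m_per_s(40.0)
free_wave_slope = m_per_s_to_deg_per_month(2.7)
print(f"40 deg/month = {observed_speed:.2f} m/s")
print(f"2.7 m/s = {free_wave_slope:.1f} deg/month")
```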

Fig. 1

Evolution of D20a for a 1997/1998 (strong El Niño), b 2004/2005 (weak El Niño), c 1998/1999 (strong La Niña), and d 2000/2001 (weak La Niña) based on the observed pentad mean TAO data. The arrows indicate some events of eastward propagating Kelvin waves (the slope shown is 40°/month for reference)

It has been suggested that the eastward propagating, downwelling equatorial Kelvin waves can be excited by westerly wind events (WWEs) (e.g., McPhaden and Yu 1999; Delcroix et al. 2000; Kessler 2002; Fedorov et al. 2014). WWEs are defined as westerly wind anomalies with an amplitude of about 7 m s−1 lasting for 5–20 days (Harrison and Vecchi 1997). Indeed, WWEs have been observed to occur in association with the onset of significant El Niño events for the past 50 years (McPhaden 2004) and every El Niño since 1994. Similarly, Chiodi and Harrison (2015) suggested that easterly wind surges (EWSs) play an important role in the onset and maintaining phase of La Niña. Here, we propose a simple linear regression model for ENSO forecasting that takes the equatorial 20 °C isotherm (reflecting the change of WWV) associated with the eastward Kelvin wave propagation and WWEs/EWSs into account as the main predictor. As mentioned earlier, the WWV has been used as an important ENSO precursor (Clarke 2014). Meanwhile, some previous studies have also explored the importance of the eastward Kelvin wave propagation in ENSO evolution (e.g., McPhaden and Yu 1999; Delcroix et al. 2000; Roundy and Kiladis 2006). Nevertheless, none has combined the Kelvin wave propagation and WWEs/EWSs to predict the ENSO evolution directly. In this study, based on the subsurface ocean temperature anomaly and the ocean–atmosphere coupling feature in the central tropical Pacific, we first derive an ENSO prediction index (EPI) and then build a forecast model, which shows a better prediction skill than the WWV-only scheme. In this model, we emphasize the importance of the intra-seasonal propagation of the subsurface ocean temperature anomalies, driven mainly by the WWEs/EWSs. The advantage of the empirical prediction model is that it can effectively capture and integrate the dynamics of WWV and the events of WWEs/EWSs. In particular, the proposed linear regression model is simple and has a competitive prediction skill compared to other dynamical and statistical models.

Section 2 introduces the data and methods used in this paper. Section 3 presents the correlation analysis and compares the hindcast skill with the other models. Section 4 further discusses the characteristics of the prediction model and its implication for the ENSO dynamics. Finally, the summary is given in Sect. 5.

2 Data and methods

2.1 Observation and reanalysis data

The observational dataset used in this study is primarily based on the pentad TAO 20 °C thermocline depth since 1994. The pentad global ocean data assimilation system (GODAS; Behringer and Xue 2004) data since 1980 is also used for long-term verification purposes. In order to analyze the corresponding ocean–atmosphere interaction, we also take the near surface wind data (0.995 sigma level) from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis project (Kalnay et al. 1996; Kistler et al. 2001) on a 2.5° × 2.5° horizontal grid resolution. Daily and monthly data are used to examine the WWEs and the monthly surface field changes, respectively. The SST dataset used in this study is the monthly HadISST data on a 1° × 1° grid (Rayner et al. 2006). The Global Precipitation Climatology Project (GPCP) dataset (Adler et al. 2003) is also used for precipitation on a 2.5° × 2.5° grid.

2.2 Prediction scheme

The empirical ENSO prediction model is built through a simple normalized EPI (nEPI) based on a linear combination of D20a at different times and different longitudes, averaged between 2°S and 2°N in the central Pacific, along with the zonal wind anomaly (\(w_{x}\)), as follows:

$$nEPI\left( t \right) = f\left( {nEPI_{WWV} \left( t \right), nEPI_{O {-} A} \left( t \right)} \right)$$
(1)

The prediction model \(nEPI\left( t \right)\) is a function of WWV in the central tropical Pacific, \(nEPI_{WWV} \left( t \right)\), and the ocean–atmosphere feedback, \(nEPI_{O {-} A} \left( t \right)\), discussed below. \(nEPI_{WWV} \left( t \right)\) is defined as \(\alpha_{1} D20a_{{180^{ \circ } (t - 25)}} + \alpha_{2} D20a_{{170^{ \circ } {\text{W}}(t - 15)}} + \alpha_{3} D20a_{{155^{ \circ } {\text{W}}(t)}}\) at any given predictor time t, where D20a is the D20 anomaly relative to the available climatological mean: 1993–2007 for TAO and 1982–2004 for GODAS. The subscripts give the longitude along the equator and the day used in the formula. For example, to predict a mature ENSO on December 14, 2009 with a 4-month lead, the predictor time t is August 16. Thus, we use D20a at 180° on July 22 (t − 25), at 170°W on August 1 (t − 15) and at 155°W on August 16 (t) as predictors to obtain \(nEPI_{WWV}\). These longitudes and timings (days) are chosen based on the existing TAO array stations with a 5-day time resolution (http://www.pmel.noaa.gov/tao/proj_over/map_array.html; or see Fig. 2 of Hu and Kumar 2015) so that the forecast can be made in real time.
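The lagged combination described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function names, the station keys, and the unit weights in the usage example are hypothetical, and the input lists are assumed to be pentad (5-day) series of D20a already averaged between 2°S and 2°N.

```python
from datetime import date, timedelta

PENTAD_DAYS = 5  # TAO time resolution quoted in the text


def predictor_dates(t):
    """Dates of the three D20a samples used at predictor time t."""
    return {
        "180": t - timedelta(days=25),
        "170W": t - timedelta(days=15),
        "155W": t,
    }


def epi_wwv(d20a_180, d20a_170w, d20a_155w, alphas, k):
    """Unnormalized WWV term at pentad index k.

    Each input is a pentad series of D20a at one longitude; the lags of
    25 and 15 days correspond to 5 and 3 pentads, respectively.
    """
    lag_180 = 25 // PENTAD_DAYS
    lag_170w = 15 // PENTAD_DAYS
    if k < lag_180:
        raise ValueError("pentad index k must allow a 25-day look-back")
    a1, a2, a3 = alphas
    return (a1 * d20a_180[k - lag_180]
            + a2 * d20a_170w[k - lag_170w]
            + a3 * d20a_155w[k])


# The worked example from the text: predictor time t = August 16, 2009.
dates = predictor_dates(date(2009, 8, 16))
print(dates["180"], dates["170W"], dates["155W"])
```

With placeholder weights `alphas = (1.0, 1.0, 1.0)`, `epi_wwv` simply sums the three lagged anomalies; the weights actually used in the paper are derived from the climatological thermocline depth.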

Fig. 2

Time-series of nEPI from pentad mean TAO (black) and GODAS (blue) anomaly data since 1994 superimposed by Niño 3.4 index (red). The yellow (green) shading indicates the El Niño (La Niña) matured phase (November to January)

The chosen longitudes and timings in \(nEPI_{WWV}\) trace the influence of the eastward Kelvin wave propagation linked to the propagation of WWV at approximately 30°/month. This speed is slightly slower than the estimated speed of 40°/month from Fig. 1 because of the low temporal and spatial resolution of the TAO array. The different days shown in parentheses represent the lead timing used to mimic this propagation. Currently, the weighting parameters \(\alpha_{1}\)–\(\alpha_{3}\) are defined as the climatological mean thermocline depth differences between these three longitudes and the thermocline base at 175°E (i.e., \(D20_{{175^{ \circ } {\text{E}}}} - D20_{{180^{ \circ } }}\), \(D20_{{175^{ \circ } {\text{E}}}} - D20_{{170^{ \circ } {\text{W}}}}\), \(D20_{{175^{ \circ } {\text{E}}}} - D20_{{155^{ \circ } {\text{W}}}}\), respectively) and then normalized by the standard deviation (STD) of the EPI. Because the climatological thermocline normally deepens toward the west along the equator, the different weighting parameters in the normalized \(EPI_{WWV}\) (\(nEPI_{WWV}\)) are chosen to reflect the different thermocline tilting (depth difference) relative to the Kelvin wave propagation (phase speed \(C^{2} = gh\)) in the central Pacific region. Note that this is just a rough estimate for simplicity and is not the actual wave propagation speed. The central Pacific region chosen here, between 175°E and 155°W, is where the warm pool commonly extends eastward in relation to the WWEs linked with ENSO events (Yu et al. 2003; Eisenman et al. 2005).
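As a rough illustration of how such weights could be formed, the sketch below takes climatological D20 values at the four longitudes and returns the three depth differences relative to 175°E, followed by a normalization by the standard deviation. The depth values in `example_clim` are made-up placeholders, not observed TAO or GODAS climatologies, and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
import statistics


def alpha_weights(d20_clim, base="175E", longitudes=("180", "170W", "155W")):
    """Depth differences between the base longitude and each predictor longitude."""
    return tuple(d20_clim[base] - d20_clim[lon] for lon in longitudes)


def normalize_by_std(series):
    """Divide a series by its (population) standard deviation."""
    std = statistics.pstdev(series)
    if std == 0:
        raise ValueError("series has zero variance")
    return [value / std for value in series]


# Placeholder climatological thermocline depths in metres (illustrative only).
example_clim = {"175E": 180.0, "180": 175.0, "170W": 165.0, "155W": 150.0}
weights = alpha_weights(example_clim)
print(weights)
```

Because the thermocline shoals eastward in this placeholder profile, the weights grow from 180° to 155°W, so the easternmost and most recent sample contributes most to the index.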

The other term, \(nEPI_{O {-} A} \left( t \right)\), approximates the positive feedback resulting from the interaction between the ocean and atmosphere, mimicking the amplification by the coupled ocean–atmosphere feedback; it thus depends on the first term and on \(w_{x}\) between 180° and 155°W (the same region used in \(nEPI_{WWV}\)). This ocean–atmosphere coupled feedback is not easy to represent. Currently, some of the relevant ENSO prediction models consider at most the impacts of westerly wind bursts (e.g., the semi-stochastic model in Gebbie and Tziperman 2009; the WWE parameterization in the dynamical coupled model in Lopez and Kirtman 2014). Here, we assume that WWEs/EWSs occurring with a probability of more than 50 % during a continuous period of 50 days (i.e., more than 25 days of WWEs/EWSs in a continuous period of 50 days) can potentially modulate the ENSO prediction. The time scale of 50 days is chosen based on the mean occurrence of westerly wind burst timing (approximately 48.9 days for 1988–2004) found empirically in Gebbie and Tziperman (2009). Our results are not very sensitive to the chosen time scale within the range of 20 to 100 days. This coupled feedback can either compensate for or enhance the interannual precursor of the WWV contribution in \(nEPI_{WWV}\) as long as \(w_{x}\) persists long enough (i.e., mimics a sequence of WWEs/EWSs). We then construct this coupled feedback as:

$$nEPI_{O {-} A} \left( {t,nEPI_{wwv} , w_{x} } \right) = sign\left( {event\left( {t,w_{x} } \right)} \right) \cdot dH\left( {t,nEPI_{wwv} } \right)$$
(2)

where

$$event\left( {t,w_{x} } \right) = \left\{ {\begin{array}{*{20}c} {positive ~w_{x} } & {in \left[ {t - 50,t} \right]} \\ {negative ~w_{x} } & {in \left[ {t - 50,t} \right]} \\ \end{array} } \right.$$
(3)
$$dH\left( {t,nEPI_{wwv} } \right) = \left\{ {\begin{array}{*{20}c} {\frac{{\left| {nEPI_{wwv} \left( t \right) - nEPI_{wwv} \left( {t - 5} \right)} \right| + \left| {nEPI_{wwv} \left( {t - 5} \right) - nEPI_{wwv} \left( {t - 10} \right)} \right|}}{2},} & {P\left( {event\left( {t,w_{x} } \right)} \right) \ge \frac{1}{2}} \\ {0,} & {P\left( {event\left( {t,w_{x} } \right)} \right) < \frac{1}{2}} \\ \end{array} } \right.$$
(4)

and \(event\left( {t,w_{x} } \right)\) indicates the positive or negative zonal wind anomaly events within the consecutive 50 days [i.e., (t − 50, t) days, corresponding to 11 pentad data points]. \(sign\left( {event\left( {t,w_{x} } \right) } \right)\) is simply the corresponding positive (or negative) sign of \(w_{x}\) for the amplification in the events of WWEs (or EWSs). The \(event\left( {t,w_{x} } \right)\) can then be used to define the probability, \(P\left( {event\left( {t,w_{x} } \right)} \right)\), of the WWEs/EWSs. We note that the results do not change significantly if the positive or negative events are defined differently (e.g., as larger than STD/2 of the whole time series for the positive events and lower than −STD/2 for the negative events). The main idea of \(nEPI_{O {-} A}\) is to assume that no ocean–atmosphere coupled feedback is triggered if the WWEs/EWSs event probability P < 1/2. When the event probability P ≥ 1/2, we assume that the amplification of the ocean–atmosphere coupled feedback, i.e., \(dH\left( {t,nEPI_{wwv} } \right)\), is active. In the limiting case, \(P\left( {event\left( {t,w_{x} } \right)} \right) = 1\) means that the pentad \(w_{x}\) has all positive or all negative values for the consecutive 50-day period, indicating long-lasting WWEs/EWSs regardless of magnitude. The amplification magnitude \(dH\left( {t,nEPI_{wwv} } \right)\) is defined by the mean magnitude change of \(nEPI_{wwv}\) in the last 10 days before the prediction time t. Two pentad time periods are used to average out potential noise in the observations. The magnitude \(dH\left( {t,nEPI_{wwv} } \right)\) reflects the thermocline change during the time when the ocean–atmosphere coupled feedback is active. Therefore, a large \(nEPI_{wwv}\) does not necessarily imply a strong ocean–atmosphere feedback. Only when \(nEPI_{wwv}\) changes dramatically (i.e., when the time derivative of \(nEPI_{wwv}\), rather than \(nEPI_{wwv}\) itself, is large) does a large \(nEPI_{O {-} A}\) occur to amplify the prediction model. The fundamental idea of \(nEPI_{O {-} A}\) (a function of t, \(nEPI_{wwv}\), and \(w_{x}\)) is to assume that the ocean–atmosphere coupled feedback can amplify or reduce the magnitude change of \(nEPI_{wwv}\) according to the events of WWEs/EWSs. Thus, the prediction model can be easily constructed from the available pentad TAO and GODAS data in real-time forecasts.
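One possible reading of Eqs. (2)–(4) is sketched below for pentad series. This is an interpretation under stated assumptions, not the implementation used in this study: the function names and the `threshold` parameter are introduced here for illustration, the event probability is taken as the fraction of pentads in the 11-pentad window whose \(w_{x}\) exceeds the threshold in the given direction, and a threshold of zero reproduces the simple sign-based definition.

```python
def event_probabilities(wx_window, threshold=0.0):
    """Fractions of pentads in the window with westerly / easterly anomalies."""
    n = len(wx_window)
    p_west = sum(1 for w in wx_window if w > threshold) / n
    p_east = sum(1 for w in wx_window if w < -threshold) / n
    return p_west, p_east


def nepi_oa(nepi_wwv, wx, k, window=11, threshold=0.0):
    """Coupled-feedback term at pentad index k, following Eqs. (2)-(4).

    window=11 pentads covers the days t-50 ... t. With an odd window the
    westerly and easterly fractions cannot both reach 1/2.
    """
    if k < window - 1 or k < 2:
        raise ValueError("pentad index k does not allow the required look-back")
    p_west, p_east = event_probabilities(wx[k - window + 1:k + 1], threshold)
    if p_west >= 0.5:
        sign = 1.0       # WWE-dominated window
    elif p_east >= 0.5:
        sign = -1.0      # EWS-dominated window
    else:
        return 0.0       # feedback not triggered (P < 1/2)
    # Mean magnitude change of nEPI_wwv over the last two pentads (10 days).
    dh = (abs(nepi_wwv[k] - nepi_wwv[k - 1])
          + abs(nepi_wwv[k - 1] - nepi_wwv[k - 2])) / 2.0
    return sign * dh


# Synthetic example: a rapid rise in the WWV term during a westerly-dominated window.
wx_west = [1.0, 2.0, -0.5, 1.5, 0.3, 0.8, -0.2, 1.1, 0.9, -0.1, 0.4]
wx_east = [-w for w in wx_west]
wwv_term = [0.0] * 9 + [0.2, 0.6]
print(nepi_oa(wwv_term, wx_west, 10), nepi_oa(wwv_term, wx_east, 10))
```

Setting `threshold` to half the standard deviation of the wind series would correspond to the alternative event definition mentioned in the text.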

2.3 Multivariate linear regression model to predict Niño 3.4 SSTa

Taking the physics of the Kelvin wave propagation and the ocean–atmosphere coupled feedback into account, we then construct a multivariate linear regression model for each lead time. For example, a 6-month lead linear regression model can be constructed as follows (Wu et al. 2009; Seo et al. 2015).

$$nEPI = 0.92 nEPI_{wwv} + 1.58 nEPI_{O {-} A}$$
(5)

The correlation between the two predictors, \(nEPI_{wwv}\) and \(nEPI_{O {-} A}\), is only 0.37. The correlation is even lower (r = 0.12) when a one-year high-pass filter is applied, indicating the two predictors differ dramatically at the intra-seasonal time scale. Then, a statistical prediction model can be built to predict Niño 3.4 SSTa and the predicted SSTa based on different lead-times can be compared with other prediction models to evaluate the hindcast skill.
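A two-predictor regression of this kind can be fitted by ordinary least squares. The sketch below solves the 2 × 2 normal equations without an intercept, mirroring the form of Eq. (5); it is an illustration on synthetic data, not the fitting procedure or data of this study. The function names are hypothetical, and the 36-pentad lead (about 6 months) is an assumed conversion.

```python
import math


def lagged_samples(predictor_1, predictor_2, target, lead_steps):
    """Pair predictors at index k with the target at index k + lead_steps."""
    n = len(target) - lead_steps
    return predictor_1[:n], predictor_2[:n], target[lead_steps:]


def fit_two_predictor_regression(x1, x2, y):
    """Least-squares coefficients (b1, b2) for y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Synthetic check: build a target from known coefficients and recover them.
lead = 36  # pentads, roughly 6 months (assumed conversion)
x_wwv = [math.sin(0.3 * k) for k in range(80)]
x_oa = [math.cos(0.7 * k) for k in range(80)]
target = [0.0] * lead + [0.92 * a + 1.58 * b for a, b in zip(x_wwv, x_oa)]
b_wwv, b_oa = fit_two_predictor_regression(*lagged_samples(x_wwv, x_oa, target, lead))
print(round(b_wwv, 2), round(b_oa, 2))
```

Repeating the fit with a different `lead` gives one regression model per lead time, which is how hindcasts at 4-, 6- and 8-month leads could be produced from the same two predictors.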

3 Results

3.1 The nEPI using TAO and GODAS data during 1994–2015

The nEPI formulated in Eq. (5) forms the basis of the linear regression prediction model. We first analyze the time-series of nEPI and compare it with the Niño 3.4 index. Figure 2 shows the time-series of nEPI based on the pentad TAO anomaly data since 1994, overlaid with the Niño 3.4 index. The Niño 3.4 index is normalized by 0.5 °C (the ENSO threshold in ONI) so that the ENSO events are defined as in the standard ONI. Therefore, El Niño (or La Niña) occurs when the Niño 3.4 index is larger than 1 (or smaller than −1). We note that the ENSO events defined here based on the TAO data are slightly different from those in the latest CPC official list (http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml) (the differences are only in the weak ENSO events); see the discussion in “Appendix”. For example, 2014/15 is not an El Niño in the CPC official list, but it is here. We use the same definition based on the TAO data in order to keep the data consistent throughout the manuscript.

The nEPI leads the Niño 3.4 index by about 3.7 months overall, with a correlation of R = 0.64 (bars in Fig. 3a). Unless mentioned otherwise, all correlations in this paper are statistically significant at the 99.9% confidence level (significance levels are indicated with dashed lines). The lead time of this nEPI is longer in some El Niño events and shorter in others. We can clearly see the shift in ENSO behavior before (black) and after (green) 2003. The year 2003 is chosen arbitrarily, dividing the whole time-series into two roughly decadal segments. Before (or after) 2003, the correlation between the nEPI and Niño 3.4 is higher (lower) with a longer (shorter) lead time. The correlation is 0.70 (5.5 month lead) for 1994–2003 and only 0.63 (2.3 month lead) for 2004–2015 (Fig. 3a), consistent with the decadal change since 2000 shown in previous studies (e.g., McPhaden 2012; Hu et al. 2013). This decadal change results mainly from the background thermocline change in the tropical Pacific (Hu et al. 2013). The background change may affect the variability of WWV as well as the Kelvin wave propagation or the ocean–atmosphere coupling, which deserves further investigation.
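The lead-lag correlations reported here can be illustrated with a minimal sketch that computes the Pearson correlation between an index and a target at a range of lags and returns the lag of maximum correlation. The code runs on a synthetic sine series and is not the analysis code of this study; the function names are hypothetical, and positive lags are defined here as the index leading the target.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lead_lag_correlation(index, target, max_lag):
    """Correlation at each lag; lag > 0 means the index leads the target."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = index[:len(index) - lag], target[lag:]
        else:
            a, b = index[-lag:], target[:len(target) + lag]
        result[lag] = pearson(a, b)
    return result


# Synthetic example: the target repeats the index four steps later.
base = [math.sin(0.25 * k) for k in range(124)]
index_series = base[4:]
target_series = base[:120]
correlations = lead_lag_correlation(index_series, target_series, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Applied separately to sub-periods (e.g., before and after a chosen split year), the same function would give the period-dependent lead times discussed in the text.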

Fig. 3

Lead-lag correlation of Niño 3.4 index with a nEPI based on TAO (1994–2015), b nEPI based on GODAS (1994–2015), c nEPI based on GODAS (1980–2015) and d WWV index (1994–2015). Bar represents the correlation in 1994–2015 in (a, b, d), and 1980–2015 in (c); black line is the correlation in 1994–2003 in (a, b, d), and 1980–1993 in (c); and green line is the correlation in 2004–2015 in (a, b, d). 99.9 % significance levels for 1994–2003 and 2004–2015 are indicated with dashed lines in (a, b, d). The same significance level for 1980–1993 is also indicated in (c). The bar in (a) is superimposed in (d) as a grey dashed line for comparison

The largest magnitudes of the index in the time-series of nEPI are 1997 spring (positive) and 1998 summer (negative). The lead time of nEPI for the 1997 El Niño is longer than the averaged lead time with a long-lasting value greater than 1 starting from early 1997. This indicates that the WWV not only persists but also propagates eastward frequently so that the signal is enhanced (at least four evident propagation signals are shown in Fig. 1). The nEPI leading signals also exist in other weak El Niño events but with a shorter lead time and a weaker magnitude than the 1997/98 El Niño.

In order to confirm the robustness of the nEPI, we also perform the same analysis using GODAS for the same period 1994–2015 (blue line in Fig. 2). The time-series of nEPI using GODAS reasonably follows the nEPI using TAO with a similar lead correlation of 0.66 by 3.8 months (Fig. 3b). Similar to the decadal change in the TAO data, the best correlation is 0.72 at a 5.5 month lead before 2003 and 0.67 at a 2.7 month lead after 2004, respectively. The reduction of the lead time may be associated with the shortening of ENSO period since the 2002 El Niño. For example, Hu et al. (2016) indicated that ENSO shifted to a relatively higher frequency regime in recent years (from 2–4 years averaged in 1979–1999 to 1.5–3 years after 2000).

Note that the maximum correlation has a secondary peak when the Niño 3.4 leads the nEPI by 9 months (the correlation is −0.33 and −0.37 for TAO and GODAS data, respectively), particularly before 2003. The leading of SST on the thermocline change at the region of 180°E to 155°W has been explained and discussed in Chen et al. (2015) through the zonal wind changes and their interaction with SST (see their Fig. 5).

3.2 The nEPI using GODAS data during 1980–2015

Figure 4 further shows the time-series of nEPI using GODAS data since 1980. A similar lead-lag correlation can also be found. The correlation is still significant (R = 0.59) at 4.5 month lead but lower than that for the period of 1994–2015. If the long-term running mean is taken, the overall correlation can be higher than 0.8 (e.g., the correlation is R = 0.81 when the nEPI leads 6 months with an 11-month running mean in TAO). The lead-time is also much longer before 1993. Overall, the prediction quality of the GODAS is similar to that of TAO array. The long GODAS data confirm the robustness and consistency of the nEPI.

Fig. 4

Same as Fig. 2, but for time-series of nEPI from pentad GODAS (black line) anomaly data since 1980 superimposed by Niño 3.4 index (red line)

Asymmetry is a typical nonlinear characteristic of the ENSO variation (e.g., Hoerling et al. 1997; Kang and Kug 2002; An and Jin 2004). Since, on average, the observed El Niño events are often stronger than La Niña events, the skewness of Niño 3.4 is positive (0.32 for the pentad 1994–2015 TAO data and 0.26 for 1981–2015 GODAS data). However, La Niña events tend to last longer than El Niño ones so that the skewness of nEPI is negative (−0.31 for the pentad 1994–2015 TAO data and −0.58 for 1981–2015 GODAS data) because nEPI reflects the change of WWV where the nonlinear thermocline response tends to last longer than the SST response (DiNezio and Deser 2014).
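For reference, skewness values such as those quoted above can be computed with the standard moment coefficient, i.e., the third central moment divided by the 1.5 power of the second. The text does not state which estimator was used, so the population form below is an assumption, and the input lists are toy series rather than Niño 3.4 or nEPI data.

```python
def skewness(series):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((v - mean) ** 2 for v in series) / n
    m3 = sum((v - mean) ** 3 for v in series) / n
    return m3 / m2 ** 1.5


# A few strong positive excursions give positive skewness, as for Nino 3.4;
# mirrored excursions give negative skewness, as for the nEPI.
warm_biased = [0.0, 0.0, 0.0, 0.0, 10.0]
cold_biased = [-v for v in warm_biased]
print(skewness(warm_biased), skewness(cold_biased))
```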

3.3 Comparison with the WWV index

The relationship between the subsurface heat content and ENSO variability has been well recognized. Meinen and McPhaden (2000) confirmed the connection between the WWV and the SSTa, linking it to the development of ENSO, which was suggested by the early numerical models (Zebiak 1989) and theoretical analysis (Jin 1997). Here, the WWV index is defined as the volume of water above the 20 °C isotherm in the equatorial Pacific region 137°E–95°W (slightly narrower than the commonly defined region of 120°E–80°W owing to insufficient TAO observations). The time-series of normalized WWV based on the pentad mean TAO data since 1994 is shown in Fig. 5. Compared with the time-series of nEPI (Fig. 2), the WWV index is much smoother, representing the whole equatorial volume changes. The correlation of WWV with the Niño 3.4 index is comparable with the correlation between the nEPI and the Niño 3.4 index (Fig. 3d; the lead-lag correlation between the nEPI and Niño 3.4 index is also superimposed as a grey dashed line). The lead-lag correlations are similar when both indices lead the Niño 3.4 SSTa, with a slightly longer lead time for nEPI. This suggests that the eastward propagation of heat content may be an indicator of the influence of WWV on the Niño 3.4 SSTa (Jin 1997). However, they differ significantly when both indices lag, indicating different characteristics. Consistent with the decadal change of the lead time discussed above (longer before 2003 and shorter after 2004), the maximum correlation is 0.73 (0.69) when the WWV leads Niño 3.4 by 4.2 (2.8) months for the period of 1994–2003 (2004–2015). In fact, further analysis shows that the correlation of nEPI is higher than that of WWV if the chosen longitudes in Eq. (5) are all shifted eastward by 5°–10°. These results are expected, since shifting the D20a locations eastward brings them much closer to the center of the Niño 3.4 region (145° ± 25°W).

Fig. 5

Same as Fig. 2, but for time-series of WWV (black line) and nEPI (blue line) indices from pentad TAO anomaly data since 1994 superimposed by Niño 3.4 index (red line)

As expected, the time-series of nEPI is highly correlated with the WWV. Specifically, the low-frequency variation of nEPI is very close to the WWV variation (R = 0.94 based on an 11-month running mean). Although they are similar at low frequencies, the nEPI includes more high-frequency variability. The nEPI is closer to a new WWV proxy defined in Bunge and Clarke (2014) if \(nEPI_{O {-} A}\) is not included. Bunge and Clarke (2014) suggested that the local D20a near Kiritimati Island (157°28’W, 1°59’N) could serve as an effective proxy for WWV on the interannual to decadal scale since the D20a at Kiritimati can be approximately derived from the sea level and local rainfall.

Figure 6 illustrates the major differences between the WWV and nEPI in two randomly chosen weak El Niño (1994 and 2009) and two weak La Niña (1995 and 2011) years. The D20a proxy based on Bunge and Clarke (2014) is also shown as the red dashed line for comparison. All of these indices have similar increasing trends in the weak El Niño developing years and decreasing trends in the La Niña ones, showing the consistent low-frequency contribution of WWV. We can also see that the dominant \(nEPI_{WWV}\) (black dashed lines) in Eq. (5) indeed resembles the D20a index defined in Bunge and Clarke (2014); the only difference comes from the contribution of the three different longitudes at different times, which may enhance or weaken the nEPI signal (e.g., the tendency toward a positive value in late February 2009, representing the eastward Kelvin wave propagation that initiates the onset of the 2009/10 El Niño in Fig. 6b). The difference between nEPI (black solid) and \(nEPI_{WWV}\) (black dashed) comes mainly from the nonzero contribution of \(nEPI_{O {-} A}\) when the ocean–atmosphere feedback is active (WWEs/EWSs are taken into account). The resemblance between the nEPI and the D20a index of Bunge and Clarke (2014) in Fig. 6 is expected (although they differ significantly in the strong ENSO years, not shown here) and indicates that few Kelvin wave propagations with clear ocean–atmosphere triggering events occur in these weak ENSO years. Whether or not an eastward wave propagation event exists is the main difference between the nEPI and the D20a index of Bunge and Clarke (2014). There is no significant difference if there are no dynamics related to eastward wave propagation. On the other hand, the difference between the nEPI and the common WWV index is generally more noticeable.

Fig. 6

Evolution of nEPI, \(nEPI_{WWV}\), WWV and the D20a index of Bunge and Clarke (2014) [labeled as B&C (2014)] superimposed by Niño 3.4 index (red line) from pentad mean TAO anomaly data in 4 weak ENSO years; El Niño: a 1994, b 2009; and La Niña: c 1995, d 2011. All indices are normalized by their own STDs

Our results support the speculation that the low-frequency WWV variation resulting from the STCs convergence/divergence provides a sufficient heat source to affect the tropical SST variation (e.g., Chen et al. 2015). This explains the good correlation between WWV and the Niño 3.4 index in Fig. 3d (Meinen and McPhaden 2000). However, it does not guarantee the occurrence of ENSO. The impacts of WWV are clearly evident (Fig. 11 in Chen et al. 2015) only in the medium to large ENSO events. For weak ENSO and neutral conditions, the ocean–atmosphere coupling may be very important. Since the nEPI is weighted by the enhanced propagating signals and the WWEs/EWSs signals between 180° and 155°W, it actually includes the potential short-term triggering signal not related to the low-frequency variation of WWV. This will be further discussed in Sect. 4.

3.4 Statistical ENSO prediction model hindcast and its skill

The above correlation analysis suggests that the nEPI can be used as a predictor. Figure 7a shows the time series of the hindcast SSTa (red line) in the Niño 3.4 region with a 6-month lead using the linear regression model described in Sect. 2.3. The observed SSTa (blue line) from the pentad mean TAO anomaly data since 1994 is also overlaid (R = 0.62). We can see the hindcast SSTa is quite noisy due to the large variation of the observed D20a, while the hindcast SSTa predicted by the WWV is smoother. The hindcast skill increases (R = 0.67) by merging the pentad mean TAO data into the monthly data (Fig. 7b). Similar 6-month lead hindcast differences between the pentad and monthly GODAS data since 1980 can also be found in Fig. 8. In the following analysis, all comparisons are made based on the monthly data in order to compare with previous studies. The 6-month lead hindcast skill (R = 0.67) during 1994–2015 is slightly higher than the hindcast SST using the WWV (R = 0.62) listed in Table 1 and is also comparable with the ENSO plume (Barnston et al. 2012). The hindcast skill may change according to the period of hindcast. The 6-month lead hindcast of 2002–2011 period using TAO data has R = 0.59 (Table 1), which is higher than the averaged skill of 0.42 in Barnston et al. (2012). The hindcast skill is 0.68 for the period of 1981–2010, which is also higher than the averaged skill of 0.65 in the ENSO plume (Barnston et al. 2012).

Fig. 7

a Time series of the predicted SSTa (red line) in the Niño 3.4 region generated by the 6-month lead linear regression model overlaid with the observed SSTa (blue line) from the pentad mean TAO anomaly data since 1994. b Same as a but converting the pentad mean TAO data into monthly data. Hindcast SSTa using the 6-month lead linear regression model of WWV is also superimposed

Fig. 8
Same as Fig. 7 but using pentad mean GODAS anomaly data since 1980. The WWV hindcast is not superimposed to avoid crowdedness

Table 1 The hindcast skills (correlation and normalized RMSE) using the monthly data based on the linear regression model and the WWV, respectively

Although the nEPI is close to the commonly used WWV index, the Niño 3.4 SSTa hindcast skill based on the linear regression model is generally better in terms of the monthly correlation, root-mean-square errors (RMSEs) and ENSO occurrences. Table 1 also shows the 4- and 8-month lead hindcast skills using the TAO and GODAS data. The hindcast skill of WWV drops sharply from 0.70 (4-month lead) to 0.49 (8-month lead), while the hindcast of the proposed model still has a skill of 0.57 with an 8-month lead in the period of 1994–2015. The 8-month lead hindcast skill using the GODAS data (R = 0.58) also degrades only slightly. The corresponding normalized RMSEs are also listed in Table 1 for comparison, showing that the proposed model reduces the RMSEs. For the 6-month or longer lead hindcasts, the hindcast skill (correlations and normalized RMSEs) based on the linear regression model for both TAO and GODAS is higher than that based on the WWV index and the ensemble mean of the ENSO plume shown in Barnston et al. (2012). This may imply the importance or advantage of including the evolution (propagation) of the subsurface ocean temperature anomaly associated with WWEs/EWSs in predicting ENSO.
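
The normalization used for the RMSEs in Table 1 is not stated in this section. As a hedged illustration, the sketch below assumes the RMSE is divided by the standard deviation of the observed series, so that a value of 1 corresponds to a hindcast no better than climatology. The function name and the sample values are hypothetical.

```python
import numpy as np

def normalized_rmse(hindcast, observed):
    """RMSE divided by the standard deviation of the observations.

    The choice of normalization is an assumption made for illustration.
    """
    hindcast = np.asarray(hindcast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((hindcast - observed) ** 2))
    return float(rmse / observed.std())

# Made-up anomalies (degrees C), not data from Table 1.
observed = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
perfect = observed.copy()               # hindcast that matches exactly
climatology = np.zeros_like(observed)   # always predicts zero anomaly

print(normalized_rmse(perfect, observed))
print(normalized_rmse(climatology, observed))
```

Under this convention a lower value indicates a better hindcast, which is the sense in which the text describes reduced RMSEs for the proposed model.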

In addition, based on the percent correct metric (Larson and Kirtman 2014), the proposed linear regression model with a 6-month lead time successfully hindcasts all El Niño events since 1994 using the sign criterion and misses the 2014/15 El Niño and the 2000/2001 La Niña using the upper or lower tercile criteria (Table 2). These results are superior to those using the 6-month lead WWV hindcast (parentheses in Table 2). In fact, the missed 2014/15 El Niño is a borderline El Niño, and the missed 2000/2001 La Niña is part of a long-lasting La Niña. The long-lasting evolution of La Niña extends from 1998 (three consecutive La Niña years) and results from the asymmetric evolution between El Niño and La Niña (Hu et al. 2014; Chen et al. 2015). This may also imply that the thermocline change and its propagation embedded in Eq. (5) are not the only causes of the La Niña evolution. We will further address these discrepancies in Sect. 4.
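
The exact definition of the percent correct metric is given in Larson and Kirtman (2014) and is not repeated here. The sketch below is only one plausible reading, assumed for illustration: for each observed event, the hindcast counts as correct under the sign criterion if it has the right sign, and under the tercile criterion if it falls in the upper (El Niño) or lower (La Niña) tercile of a reference distribution. All names and numbers are hypothetical.

```python
import numpy as np

def percent_correct_sign(hindcast, event_sign):
    """Percentage of observed events whose hindcast has the same sign."""
    hindcast = np.asarray(hindcast, dtype=float)
    event_sign = np.asarray(event_sign)
    return 100.0 * float((np.sign(hindcast) == event_sign).mean())

def percent_correct_tercile(hindcast, event_sign, reference):
    """Percentage of events whose hindcast lies in the matching outer tercile."""
    hindcast = np.asarray(hindcast, dtype=float)
    event_sign = np.asarray(event_sign)
    lower, upper = np.percentile(reference, [100.0 / 3.0, 200.0 / 3.0])
    warm_hit = (event_sign > 0) & (hindcast > upper)
    cold_hit = (event_sign < 0) & (hindcast < lower)
    return 100.0 * float((warm_hit | cold_hit).mean())

# Made-up example: two El Niño (+1) and two La Niña (-1) events.
event_sign = np.array([1, 1, -1, -1])
hindcast = np.array([1.2, 0.3, -0.9, -0.2])   # hindcast SSTa per event
reference = np.linspace(-1.5, 1.5, 31)        # assumed hindcast climatology

print(percent_correct_sign(hindcast, event_sign))
print(percent_correct_tercile(hindcast, event_sign, reference))
```

In this toy case the weak hindcasts (0.3 and −0.2) have the right sign but do not reach the outer terciles, which mirrors how a borderline event can be a hit under the sign criterion and a miss under the tercile criteria.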

Table 2 Percent correct of 6-month ENSO forecast (see Larson and Kirtman 2014 for the metric definition)

The lower percent correct of the hindcast using the 1980–2015 GODAS data is not surprising given the lack of observational data before 1994. The prediction based on GODAS actually matches better with the ENSO events defined by the latest Extended Reconstructed Sea Surface Temperature v4 (ERSST.v4; Huang et al. 2015). This may be because the climatological base period of GODAS is 1982–2004, which is closer to the 1981–2010 climatology used in ERSST.v4. The use of a different monthly climatology indeed plays an important role in defining the occurrence of some weak ENSO events even when the same SST data are used (see the discussion in “Appendix”). Nevertheless, the predictions using GODAS and TAO data are quite similar in general, and their percent correct values are all higher than those predicted by the Pacific meridional mode (PMM, Larson and Kirtman 2014). These results suggest that the proposed model can be effectively applied to both TAO and GODAS data for ENSO prediction, although the GODAS data have some embedded model biases (Hu and Kumar 2015).

4 Discussion

4.1 Dynamical processes associated with the statistical model

The fundamental idea of the proposed statistical model mainly relies on whether the local WWV propagates eastward beyond the dateline in association with the WWEs/EWSs events. The two factors determining the hindcast of the proposed model are the WWV and the eastward Kelvin wave propagation amplified by the WWEs/EWSs. The accumulation of WWV in the tropics serves as a precondition and plays a dominant role (as shown by the prediction of WWV). However, not all large WWV events in the western-central Pacific lead to an ENSO event. The eastward propagation of WWV resulting from the ocean–atmosphere coupling is another necessary contribution to trigger the ENSO development. 2012 is a year in which the tropical Pacific warming in the spring failed to develop into an “El Niño” due to a sudden decay of SST in summer and fall. The D20a at 180°E in January to April 2012 was more than 20 m deeper than the climatology, and the eastern Pacific SSTa even exceeded 1 °C in July 2012. Most state-of-the-art models gave a false alarm of an El Niño by predicting its occurrence at the end of 2012. However, there was no eastward propagation of a positive WWV anomaly after May 2012 (Fig. 9), so the proposed statistical model gave a better hindcast of the neutral SSTa than the WWV did (Fig. 7). Figure 9 shows the Hovmöller diagram of D20a (contours) along the equator superimposed by the surface zonal wind anomalies (shaded) averaged between 2°S–2°N for two El Niño (strong: 1997, weak: 2004), two La Niña (strong: 1998, weak: 2000) and two neutral years (2001 and 2012). 1997/98 is the typical Eastern Pacific El Niño and 2004/05 is the typical Central Pacific El Niño (Kao and Yu 2009; Yu et al. 2012). Nevertheless, a sequence of eastward propagation associated with strong WWEs exists in both years. By contrast, only a very weak and even shoaling D20a propagation signal can be detected after May 2012. This made the statistical model predict a lower SSTa than the WWV in Fig. 7 (leading to a neutral condition rather than an El Niño). Su et al. (2014) showed that the SSTa cooling in the eastern subtropical Pacific in the summer of 2012 induced strong easterly and low-level divergence anomalies, suppressing the development of westerly and convection anomalies over the equatorial central Pacific. Thus, the unfavorable zonal wind anomalies (shown as color in Fig. 9) inhibited further development into an El Niño event, and the eastward propagation of WWV was suppressed, as indicated by the SSTa predicted using the proposed model. These processes are simply parameterized in the nEPI and the resulting statistical model. Although the WWV had already provided a good precondition for the possible growth of El Niño since early 2012 (Fig. 9), the decoupling suppressed the growth of the Bjerknes feedback, so that no eastward propagation of positive D20a can be clearly found in the upper ocean, and the WWV could not be further enhanced and sustained.

Fig. 9
Hovmöller diagram of D20a (contours) along the equator superimposed by the surface zonal wind anomalies (shaded) averaged between 2°S–2°N for two El Niños (strong: 1997, weak: 2004), two La Niñas (strong: 1998, weak: 2000) and two neutral years (2001 and 2012). Contour interval is 8 m and the zero contour is thick black

In fact, a similar impact of subtropical SSTa can also explain the missed 2000/01 La Niña using the proposed model (the predicted SSTa has turned into a neutral phase). 2000 can be seen as a continuation of the strong La Niña in 1998 (i.e., three consecutive La Niña years). A strong La Niña is likely to generate strong off-equatorial negative oceanic heat anomalies associated with reflected westward Rossby waves at the oceanic eastern boundary. The stronger westward propagating anomalies can better maintain their identity, leading to the persistence of cold SSTa in the following years (Hu et al. 2014). Figure 10 shows the extensive cold SSTa (shading) in the central and eastern Pacific between 35°S–35°N throughout the entire year 2000, even though the D20a had deepened around 180°E (contours in Fig. 9) and the tropical WWV had accumulated due to the enhanced STCs resulting from the strong anomalous easterly wind in the central Pacific (Fig. 10) in association with the increased Sverdrup transport (Chen et al. 2015). However, the cold subtropical SSTa prevents the transition from La Niña to El Niño by hampering the ocean–atmosphere coupling that favors El Niño development. The negative precipitation anomalies also indicated a typical convection anomaly (contours in Fig. 10) associated with La Niña. This is further supported by the cooling tendency in the central and eastern Pacific in both hemispheres during 2000 (Fig. 5b in Hu et al. 2014). The discharge in late 2000 is indeed much weaker than in the other typical La Niña events (WWV tendency in Fig. 4 of Hu et al. 2014), suggesting that the recharge process attempted to turn the subsurface heat content into the El Niño phase but failed. The competing SSTa and winds in the subtropics continued to suppress the required ocean–atmosphere coupling. This suggests that the recharge–discharge oscillator can well explain the low-frequency variability of STCs on the interannual scales as shown in Chen et al. (2015), but it is not a perfect indicator of whether an ENSO event really occurs (i.e., under the typical definition that a 3-month average of SSTa in Niño 3.4 exceeds ±0.5 °C for 5 months). Although we can see a large rebound in nEPI (or WWV) after the strong La Niña, the La Niña does not necessarily reverse its phase completely if the subtropical SSTa (and its associated surface flux) is unfavorable for the required ocean–atmosphere coupling. Therefore, this discrepancy is not surprising, since the current scheme considers only the WWV and its propagation without any extratropical influence or other surface conditions.
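
The event definition quoted above (a 3-month average of Niño 3.4 SSTa beyond ±0.5 °C for 5 months) can be sketched as follows. This is a minimal illustration that assumes the 5 months are consecutive overlapping 3-month seasons and that values at or beyond the threshold count; the function name and the input series are hypothetical.

```python
import numpy as np

def enso_flags(ssta, threshold=0.5, min_run=5):
    """Flag El Niño (+1) and La Niña (-1) seasons in a monthly SSTa series.

    A season is the 3-month running mean; an event needs at least
    `min_run` consecutive seasons at or beyond the threshold.
    """
    ssta = np.asarray(ssta, dtype=float)
    seasonal = np.convolve(ssta, np.ones(3) / 3.0, mode="valid")
    flags = np.zeros(seasonal.size, dtype=int)
    for sign in (1, -1):
        exceed = sign * seasonal >= threshold
        start = None
        # A trailing False closes any run that reaches the end of the record.
        for i, on in enumerate(np.append(exceed, False)):
            if on and start is None:
                start = i
            elif not on and start is not None:
                if i - start >= min_run:
                    flags[start:i] = sign
                start = None
    return seasonal, flags

# Made-up series: a 7-month warm episode and a 4-month cold episode.
ssta = [0.0] * 3 + [0.8] * 7 + [0.0] * 3 + [-0.8] * 4 + [0.0] * 3
seasonal, flags = enso_flags(ssta)
print(flags.tolist())
```

In this toy series the warm episode qualifies as an El Niño, while the cold episode is too short to qualify, which illustrates how a sizeable anomaly can still fall outside the formal event count.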

Fig. 10
Two-monthly averaged SSTa (shading), anomalous precipitation (contours), and the near-surface anomalous wind vector during 2000. The interval is 0.25 °C for SSTa, and 1.5 mm/day for anomalous precipitation (blue is positive and red is negative). The anomalous wind has the unit of m s−1

Similarly, 2001 still had a large negative SSTa in the subtropical Pacific (both hemispheres, not shown here). Although a weak warm SSTa emerged in the eastern tropical Pacific from the increasing subsurface heat content since the spring of 2001 (Fig. 5), the enhanced positive WWV still could not overcome the extensive cold SSTa from the subtropical northeastern Pacific, as in 2000. Intermediate WWEs were found but still could not trigger the large-scale surface convergence in the central Pacific. Therefore, the predicted SSTa in the Niño 3.4 region remains in a neutral condition, consistent with the observation.

A similar competition between the WWV and the subtropical SSTa contribution can be found in the 2014/15 El Niño due to the suppression of WWEs (e.g., Menkes et al. 2014; Larson and Kirtman 2015). A warm SSTa and a cold SSTa have existed in the northeastern and southeastern subtropics, respectively, since January 2014 (Fig. 11). The subtropical pattern in the northern hemisphere is a typical surface footprint of the Victoria Mode (VM), leading to an increase in the zonal SST gradient across the western–central tropical Pacific and then strengthening the anomalous westerlies over the western–central tropical Pacific (Ding et al. 2015a). The tropical eastern Pacific part of the VM is also called the PMM (see Chiang and Vimont 2004; Chang et al. 2007; Larson and Kirtman 2014), but the VM specifically emphasizes the extratropical atmospheric forcing while the PMM focuses mainly on the tropical air–sea coupling. This process continuously promotes the Bjerknes feedback through the local ocean–atmosphere coupling in summer–winter 2014 (evidenced by the continuous Kelvin wave propagation and intermittent WWEs in Fig. 12a) and ultimately leads to the weak warming event in early 2015. The strong positive VM in spring of 2014 is also followed by a positive Pacific Intertropical Convergence Zone (ITCZ) precipitation anomaly during the summer of 2014, in association with positive precipitation anomalies over the ITCZ regions of the tropical western Pacific (contours in Fig. 11). The thermodynamic ocean–atmosphere coupling between the ITCZ and the SSTa associated with the VM may play a key role in the initiation of the weak warming event in 2014/15 (Ding et al. 2015b; Zhu et al. 2016), although this event is not in the CPC official list of ENSO events (see the discussion in “Appendix”). Here, we still treat it as an El Niño event based on the TAO data, while the proposed model predicts a neutral condition (the only missed El Niño event).

Fig. 11
Same as Fig. 10, but for 2014

Fig. 12
a Same as Fig. 9 but for 2014. b–d The evolution of SSTa at three different latitudes in 2014

However, we note that not all positive VM events lead to an El Niño (Ding et al. 2015a). Indeed, the high nEPI and WWV in early 2014 (second highest nEPI in Fig. 2 and WWV index in Fig. 5) suggested that the subsurface heat content and its propagation provided a good precondition favorable for an El Niño development. In reality, 2014/15 is just a borderline El Niño event. A possible key is the missing boost from the southern hemispheric quadrupole SSTa pattern (a subtropical SSTa impact similar to that of the VM discussed above). Since January 2014, the cold subtropical SSTa has continuously suppressed a potential super El Niño development, so that the eastern tropical Pacific could not become warm enough to initiate the necessary ocean–atmosphere coupling (Hong et al. 2014; Ding et al. 2015c; Zhu et al. 2016). The southern hemispheric extratropical impacts play a vital role in many canonical ENSO events (Zhang et al. 2014a, b; Ding et al. 2015c). The sequence of Fig. 11 shows that the unfavorable southeasterly surface wind, together with the cold SSTa in the southeastern subtropical Pacific, provides negative feedback for the growth of El Niño until early 2015 (also see the evolution of SSTa in Fig. 12; Zhu et al. 2016). This is also supported by the lack of WWEs after April 2014 (Fig. 9), which indicates the lack of a large-scale ocean–atmosphere coupling in the central–eastern tropical Pacific (see the discussion of Menkes et al. 2014; Larson and Kirtman 2015). Furthermore, a large extension of warm SSTa in the western Pacific (west of 180°E) also inhibits the zonal SST gradient favorable to the shifting of the atmospheric response until late November 2014. Therefore, even though the WWV is favorable, the lack of coupling suppresses the occurrence of a super El Niño, as demonstrated in the current model.

4.2 Linkage between the nEPI and westerly wind events (WWEs) in ENSO events

A key question linked to the ocean–atmosphere coupling is how this eastward Kelvin wave propagation is connected to the WWEs/EWSs, which can be seen in all ENSO events. Particularly, D20a eastward propagation is consistent with the tropical Pacific SSTa change for a moderate to strong ENSO (not shown here), indicating a significant contribution from the subsurface heat content through the positive thermocline and zonal advection feedbacks in these ENSO events. However, the consistency is not valid for a weak ENSO or neutral condition, suggesting that an additional mechanism related to the ocean–atmosphere coupling may be involved.

We can clearly see in Fig. 9 that the eastward, downwelling D20a propagation is always associated with WWEs (large positive zonal wind anomalies) above or to its west. Previously, the WWEs have been thought to be the surface manifestation of the Madden-Julian Oscillation (MJO) (e.g., McPhaden and Yu 1999; Slingo et al. 1999; Seo and Xue 2005; Seiki and Takayabu 2007a, b). However, recent modeling studies suggested that adequate modeling of the feedback between SST and WWEs, rather than the MJO itself, can significantly affect the characteristics and dynamical regime of the ENSO system (Eisenman et al. 2005; Perez et al. 2005; Gebbie et al. 2007; Jin et al. 2007). Chiodi et al. (2014) further found that WWEs are not totally stochastic processes. Rather, the characteristics of WWEs are a function of the large-scale environment in which they form. A large deterministic component may depend on the large-scale SST pattern (Tziperman and Yu 2007), where the WWEs can be modulated by the low-frequency SST variation (Eisenman et al. 2005; Gebbie and Tziperman 2009). A WWE is likely to occur when the warm pool is extended eastward (Yu et al. 2003; Eisenman et al. 2005) regardless of any MJO event. In fact, the range of the warm pool edge variation, where we believe an intensive ocean–atmosphere coupling commonly occurs, is used in the definition of nEPI in Eq. (5). This is also supported by the regional atmospheric and oceanic model studies of the 2002/03 El Niño in Miyama and Hasegawa (2014), which found that the WWEs could be enhanced by the SST gradient along the equator in the western Pacific warm pool. The enhanced WWEs further increase the oceanic Kelvin wave response along the equatorial Pacific. Figure 9 shows that a favorable D20a gradient exists a few weeks before and after the WWEs in all ENSO events, and an unfavorable D20a gradient (e.g., easterly wind anomalies above the deepened D20a) commonly exists in the neutral years. This also suggests that the SST-modulated WWEs, which lead to the eastward D20a propagation, may be critical in the ENSO evolution.

The proposed statistical model is mainly based on the central Pacific D20a derived from the TAO or GODAS data with different time lags. The different time lags take the existence of the eastward wave propagation into account. The estimated wave propagation speed in Fig. 1 and Eq. (5) is about half of the linear baroclinic Kelvin wave speed; a linear baroclinic Kelvin wave takes approximately 2 months to cross the whole Pacific basin (Bosc and Delcroix 2008; Fedorov 2010). The slow Kelvin wave phase speeds resulting from the ocean–atmosphere coupling during the development of El Niño are consistent with the previous findings of Roundy and Kiladis (2006) and the theoretical work by Lau (1981).
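
As a rough arithmetic check of these crossing times, the sketch below converts a phase speed into a basin crossing time. The basin width (16,000 km) and the linear first-baroclinic-mode Kelvin wave speed (2.7 m s−1) are assumed round values chosen for illustration; they are not taken from Fig. 1, Eq. (5) or the cited studies.

```python
SECONDS_PER_DAY = 86400.0

def crossing_time_days(basin_width_km, phase_speed_m_s):
    """Time in days for a wave to cross a basin at a constant phase speed."""
    return basin_width_km * 1000.0 / phase_speed_m_s / SECONDS_PER_DAY

basin_width_km = 16000.0   # assumed equatorial Pacific width
linear_speed = 2.7         # assumed linear Kelvin wave speed, m/s

linear_days = crossing_time_days(basin_width_km, linear_speed)
coupled_days = crossing_time_days(basin_width_km, linear_speed / 2.0)

print(f"linear wave:     {linear_days:.0f} days")
print(f"half-speed wave: {coupled_days:.0f} days")
```

With these assumed values the linear wave crosses the basin in a little over two months, and a wave at half that speed takes twice as long, which is why the time lags in the statistical model span several months.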

5 Summary

A statistical linear regression model, which is based on the subsurface ocean heat condition and its propagation, provides a promising statistical prediction of ENSO events since 1994 (or 1980) using the TAO (or GODAS) data. Based on the monthly TAO data, the averaged hindcast skill of the Niño 3.4 SSTa using the linear regression model of Eq. (5) in the period of 1994 to 2015 is 0.75, 0.67 and 0.57 with a 4-, 6- and 8-month lead, respectively. The hindcast skill is also verified using the GODAS data since 1980, showing an averaged 6-month (8-month) lead hindcast skill of 0.64 (0.58). The hindcast skill of SSTa is generally better than that of the commonly used WWV index in terms of the monthly correlation, normalized RMSEs and ENSO occurrences, because the model accounts for the additional information of the Kelvin wave propagation resulting from the ocean–atmosphere coupling in the central Pacific. The successful hindcasts of the proposed model support the key influence of the interannual WWV combined with the Kelvin wave propagation in initiating an ENSO development. The Kelvin wave propagation can be clearly seen in all ENSO events regardless of ENSO strength. 2012 is a good example, showing a large spring WWV accumulation in the central tropical Pacific due to the STCs contribution but no clear Kelvin wave propagation after May, leading to an aborted El Niño. This neutral condition was predicted by the proposed model while many state-of-the-art climate models gave a false alarm.

The proposed model takes into account the eastward Kelvin wave propagation induced by the surface wind anomalies. Under favorable subtropical SSTa conditions, the WWEs further advect the edge of the warm pool eastward, triggering a distinct warming over the central and eastern tropical Pacific. Once these warmed regions merge, the ocean–atmosphere coupling becomes fully effective, leading to an El Niño occurrence. The final triggering of an ENSO event requires this ocean–atmosphere coupling in the central Pacific. Our results are consistent with the physical processes of ENSO development as follows: the subsurface WWV provides an important ocean precondition, which is a necessary condition for moderate to strong ENSO but not for a weak ENSO. When the seasonal phase lock of ocean–atmosphere coupling triggers the positive (negative) zonal wind anomaly in boreal summer and fall, an El Niño (La Niña) will develop, as evidenced by the Kelvin wave propagation, which transports ocean heat anomalies from the central to the central-eastern tropical Pacific. However, whether the ENSO will be initiated later or not relies heavily on the extratropical SSTa distribution in both hemispheres, which directly affects the subsequent ocean–atmosphere coupling. Both northern and southern hemispheric extratropical SSTa forcing can play a role in suppressing or enhancing the ultimate ENSO development (Wang et al. 2012; Ding et al. 2015a, c).

Finally, the proposed model predicts a neutral event in 2014/15 (while it is a very weak El Niño as defined using TAO data). The high nEPI value in March 2014 indicates a large WWV contribution, but it could not last long due to the suppression by the southern hemispheric cold SSTa. Nevertheless, weak coupling resulting from the northern hemispheric SSTa still exists; thus, we can still see the eastward propagation of the Kelvin wave through 2015. Since the preconditioned WWV still existed in the spring of 2015 due to its low-frequency characteristic (Chen et al. 2015) and the subtropical SSTa in early 2015 became favorable for an El Niño development, an El Niño did indeed occur during the rest of 2015 and was much stronger than that in 2014.