1 Introduction

Numerous processes drive natural climatic variability, but the most important is the El Niño-Southern Oscillation (ENSO) phenomenon (Clarke 2008). ENSO is a coupled ocean-atmosphere phenomenon characterized by temperature anomalies in the tropical Pacific Ocean, which alter wind and precipitation patterns in tropical and extratropical regions (Souza Júnior et al. 2009). ENSO has two distinct phases: El Niño, the warm phase, and La Niña, the cold phase (Berlato et al. 2005). The part of the cycle in which neither El Niño nor La Niña occurs is classified as the neutral phase, which presents intermediate climatic conditions.

The ENSO phenomenon is tracked and quantified by the Oceanic Niño Index (ONI), defined as the average sea surface temperature (SST) anomaly over a reference region of the Pacific Ocean (Ludescher et al. 2014). This region, called “Niño 3.4,” corresponds to the area between 5° S and 5° N and between 120° and 170° W. An El Niño episode is considered to occur when the index remains above + 0.5 °C for at least 5 consecutive overlapping 3-month periods, and a La Niña episode when the index remains below − 0.5 °C for the same length of time (Climate Prediction Center, 2012).

Strong episodes of El Niño or La Niña can have major impacts around the world (Ludescher et al. 2014), such as tsunamis, storms, droughts, or abnormal rains (Zhang et al. 2006), as reported by Ronghui and Yifang (1989) for the drought in the Indochina Peninsula and southern China during the summer, by Chimeli et al. (2008) for the direct correlation between SST anomalies and the price of maize, and by Bastianin et al. (2018) for the impact of ENSO on coffee plantations in Colombia. Therefore, early warning systems based on high-quality ENSO forecasts are highly desirable for cooperatives, government institutions, companies, and even individual producers.

Although ENSO forecasts using coupled atmosphere–ocean general circulation models (AOGCMs) generally outperform statistical models (Luo et al. 2008), AOGCM systems require high processing power owing to the need to simulate many scenarios (McPhaden et al. 2006) and still do not provide accurate ENSO forecasts at lead times of 1 year or more (Tang et al. 2018).

Several developments can improve predictions of El Niño and La Niña episodes, such as the evolution of current models, improved observation and analysis systems, higher spatial resolution, and the use of longer time series. In addition, the construction of deterministic statistical models based on observed time series has become increasingly common, especially in regional climate applications (Gavrilov et al. 2019).

A long series of observations allows the time series to be made stationary, increasing the probability of accurate forecasts (Armstrong et al. 2005). However, statistical models still have accuracy problems due to high spatial and temporal variability, which limits these models to specific regions. Artificial intelligence (AI) methods have been filling these gaps, as they can produce accurate predictions at modest computational cost on simple hardware.

AI methods are considered important when the systems under study are complex and exhibit high data variability (Witten and Frank, 2005; Barsegyan et al. 2007). With the advent of “big data,” AI has had a dramatic impact on many domains of human knowledge (Austin et al. 2013; Arsanjani et al. 2014), making accurate estimates and forecasts possible in a relatively simple way.

Machine learning (ML) techniques are quite robust and can readily identify patterns and rules in large data sets with multiple predictor variables that have nonlinear relationships with the target variable (Chandra et al. 2019). Aparecido et al. (2021) describe ML as a data-analysis approach that seeks to automate the construction of analytical models. A widely used ML algorithm is the decision tree classifier (DTC), particularly in predictive data analysis (Zhu et al. 2018).

The basic objective of inducing a decision tree is to produce an accurate forecast model (Breiman et al. 1984); the method is robust to extreme values (outliers) and even erroneous values, and it offers traceability and good performance. For this reason, DTCs have already been used to organize and develop decision support systems for agro-meteorological databases (Teli et al. 2020).

Nevertheless, few articles in the literature apply ML techniques to the occurrence of ENSO. Silva and Hornberger (2019) related ENSO to rainfall distribution in Sri Lanka using DTCs, and Wei et al. (2020) predicted heavy rainfall in East Asia using DTCs applied to historical series of ENSO and precipitation. However, no work was found that uses DTCs to forecast ENSO events themselves. To address this gap, this study used DTCs to predict ENSO conditions from ONI data, in order to obtain a simple and accurate model for predicting El Niño-Southern Oscillation events.

2 Data and methodology

2.1 Input data

We used quarterly data from the Oceanic Niño Index (ONI) for the ENSO cycles of 1950–2020, provided by the National Oceanic and Atmospheric Administration (NOAA). ENSO cycles in the tropical Pacific Ocean are detected by different methods, including satellite remote sensing, sea level analysis, and moored, drifting, and expendable buoys (Climate Prediction Center, 2012). ONI is NOAA’s main indicator for monitoring La Niña (LN) and El Niño (EN). It tracks quarterly (3-month running mean) SSTs in the east-central tropical Pacific, in the region called Niño 3.4 (5° N to 5° S in latitude and 170° to 120° W in longitude), as proposed by Trenberth (1997) (Fig. 1). ENSO events are characterized based on the date on which they matured. The sea temperature at 125 m depth carries relevant information and provides a basis for predicting the extent of ENSO in advance (Pinault, 2016). The Niño 3.4 region is important for delimiting the study area, since it is accepted that ENSO exhibits different behavior when observed in the central and/or eastern Pacific regions (Kao and Yu, 2009).

Fig. 1

A Example of an El Niño event in the Pacific Ocean (source NOAA; the colors represent the ocean temperature, the redder the hotter). B Location of the region called Niño 3.4, where the temperature sensors used to estimate the Oceanic Niño Index (ONI) are located

The ENSO extremes in the 1950–2020 period were classified according to the National Oceanic and Atmospheric Administration criteria, which consider that EN conditions are present when the ONI is above + 0.5 and LN conditions when the ONI is below − 0.5, for 5 consecutive 3-month running averages (NOAA, 2017).
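As a minimal sketch of this labelling rule, assuming the ONI series is available as a pandas Series of 3-month running means (the function name and data layout are illustrative, not the authors' code):

```python
import pandas as pd


def label_enso(oni: pd.Series, threshold: float = 0.5, run: int = 5) -> pd.Series:
    """Label each 3-month period as EN, LN, or NE.

    A period belongs to an El Niño (La Niña) episode when it lies inside a
    run of at least `run` consecutive periods with ONI >= +threshold
    (ONI <= -threshold); all remaining periods are neutral (NE).
    """
    labels = pd.Series("NE", index=oni.index)
    for mask, name in [(oni >= threshold, "EN"), (oni <= -threshold, "LN")]:
        blocks = (mask != mask.shift()).cumsum()          # id of each consecutive stretch
        run_len = mask.groupby(blocks).transform("size")  # length of that stretch
        labels[mask & (run_len >= run)] = name
    return labels
```

Applied to the 1950–2020 ONI series, such a routine should reproduce the episode definition described above.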

2.2 Induction of the decision tree classifier

The decision tree classifier (DTC) is a nonparametric supervised method used for classification. The objective of this method is to create a model that estimates the TARGET (dependent variable) by learning simple decision rules from FEATURES (independent variables). In this case, the TARGET is the annual ENSO condition, defined from the annual average ONI for the year ending in December, and the FEATURES are the quarterly ONI values spanning the previous year and the start of the current year, as detailed below.

The input data for the DTC were organized as follows: the 12 quarters (overlapping 3-month periods) of the previous year plus the first 3 quarters of the current year, totalling 15 quarters used as input variables for the current-year forecast (Fig. 2).

Fig. 2

Schematic of the input variables in the decision tree forecast model. The letters represent the months, as follows: DJF-p, December, January, and February of the previous year; JFM-p, January, February, and March of the previous year; FMA-p, February, March, and April of the previous year; MAM-p, March, April, and May of the previous year; AMJ-p, April, May, and June of the previous year; MJJ-p, May, June, and July of the previous year; JJA-p, June, July, and August of the previous year; JAS-p, July, August, and September of the previous year; ASO-p, August, September, and October of the previous year; SON-p, September, October, and November of the previous year; OND-p, October, November, and December of the previous year; NDJ-p, November, December, and January of the previous year; DJF, December, January, and February of the current year; JFM, January, February, and March of the current year; FMA, February, March, and April of the current year
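A sketch of how these 15 inputs could be assembled, assuming a hypothetical DataFrame `oni_wide` with one row per year and one column per 3-month season ("DJF" to "NDJ"); names and layout are illustrative:

```python
import pandas as pd

SEASONS = ["DJF", "JFM", "FMA", "MAM", "AMJ", "MJJ",
           "JJA", "JAS", "ASO", "SON", "OND", "NDJ"]


def build_features(oni_wide: pd.DataFrame) -> pd.DataFrame:
    """Return one row per forecast year with the 15 quarterly predictors."""
    rows = {}
    for year in oni_wide.index[1:]:                       # first year has no predecessor
        prev = oni_wide.loc[year - 1, SEASONS]            # 12 quarters of the previous year
        curr = oni_wide.loc[year, ["DJF", "JFM", "FMA"]]  # first 3 quarters of the current year
        rows[year] = pd.concat([prev.add_suffix("-p"), curr])
    return pd.DataFrame.from_dict(rows, orient="index")
```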

The DTC builds the tree recursively, from top to bottom (Han and Kamber 2001). It starts with the training set, which is divided according to a test on one of the FEATURES, forming subsets that are more homogeneous with respect to the TARGET. This procedure is repeated until a sufficiently homogeneous subset is obtained, to which a single value of the dependent variable can be assigned.

DTCs have the advantages of being simple and easy to visualize; they require little data preparation, handle both numerical and categorical data, and are robust models, making it possible to validate them with statistical tests even when some assumptions are violated. On the other hand, overly complex trees can be generated; mechanisms to avoid this problem include setting the minimum number of samples per node and/or the maximum number of nodes in the tree. In addition, a large and balanced data set is normally used, and the FEATURES are selected so that the system under study is represented in a parsimonious way.

The DTC algorithm takes into account the average values and annual variability of the quarterly ONI data and selects the quarters that are most important for the annual forecast of the ENSO phenomenon. The criterion used to choose the independent variable that splits the set of examples at each step is the main aspect of the induction process. Among the best-known and most widely used criteria is information gain, based on entropy (Quinlan, 1993), which is related to the (im)purity of the data and was the criterion chosen for this work. Entropy measures the lack of homogeneity of the input data with respect to its classification: if the sample is completely homogeneous, the entropy is zero, whereas maximum entropy (equal to 1 in the two-class case) means that the data set is completely heterogeneous (Mitchell, 1997; Coimbra et al. 2014).
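For a set of examples S in which the classes occur with proportions p_i (i = 1, ..., k), entropy takes the standard form

$$Entropy\left(S\right)=-\sum_{i=1}^{k}p_{i}\;{\mathrm{log}}_{2}\;p_{i}$$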

The DTC was induced as a binary tree, with two branches from each internal node. To avoid overfitting, which would compromise generalization and performance on new examples, two stopping criteria were adopted for the induction algorithm. The first rule limited the depth of the tree to at most five levels (the root node is considered to be at level zero). The second rule limited the fragmentation of the training set, requiring a minimum of four examples at a node for a new division and at least four examples at each leaf node.

In addition to these stopping criteria, known as pre-pruning, a post-pruning procedure was applied after the complete tree had been induced, in order to reduce the size of the tree. The complete tree and its possible sub-trees were evaluated, and the smallest sub-tree (lowest complexity) with the lowest error rate on the training set was chosen, since complex trees can be worse than simpler trees at predicting new data because of overfitting (Phillips et al. 2017).
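A minimal scikit-learn sketch of these induction settings is given below; the post-pruning step is indicated only as one possible implementation (cost-complexity pruning), since the text does not tie the sub-tree selection to a specific library call:

```python
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(
    criterion="entropy",   # information gain / entropy criterion (Sect. 2.2)
    max_depth=5,           # at most five levels, root node at level zero
    min_samples_split=4,   # at least four examples at a node for a new division
    min_samples_leaf=4,    # at least four examples at each leaf node
    random_state=0,
)

# Post-pruning (one possible implementation): enumerate the sub-trees of the
# complete tree via cost-complexity pruning and keep the smallest sub-tree
# with the lowest training error, e.g.
#   path = dtc.cost_complexity_pruning_path(X_train, y_train)
#   then refit one candidate tree per value in path.ccp_alphas and compare.
```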

2.3 Assessment metrics

Error rate and accuracy are the most common evaluation measures for decision trees (Han and Kamber 2001; Witten and Frank, 2005). The forecast error of a model is defined, for regression, as the expected squared error of new forecasts and, for classification, as the probability of incorrectly classifying new observations (De'ath and Fabricius 2000). Calculating the error rate on the training set usually yields a highly optimistic estimate, because the model is specialized to the training examples. To work around this problem, the data set was randomly divided into two independent sets, one for training and one for validation, with 80% and 20% of the data, respectively (Han and Kamber 2001).
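A sketch of this hold-out evaluation, where `features` and `labels` stand for the 15-quarter inputs and the annual ENSO classes built earlier, and `dtc` for the classifier configured above (names are illustrative):

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_and_evaluate(features, labels, dtc: DecisionTreeClassifier) -> dict:
    """Fit the tree on 80% of the years and report accuracy on both sets."""
    X_train, X_val, y_train, y_val = train_test_split(
        features, labels, test_size=0.2, random_state=0
    )
    dtc.fit(X_train, y_train)
    return {
        "training accuracy": accuracy_score(y_train, dtc.predict(X_train)),
        "validation accuracy": accuracy_score(y_val, dtc.predict(X_val)),
    }
```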

In addition to these measures, the confusion matrix of the decision tree was produced, which offers an effective means of evaluating the model for each class (Monard and Baranauskas 2003). Each element of the matrix gives a number of examples, with the observed class in the row and the predicted class in the column. From this matrix were extracted the true positives (TP), which occur when the model predicts a positive case correctly; the false positives (FP), cases in which the model assigns a given ENSO class when in fact it is not the correct one; the false negatives (FN), when the model indicates that a year does not belong to a given ENSO class but in fact it does; and, finally, the true negatives (TN), when the model indicates that a year does not belong to a given ENSO class and is correct (Hothorn et al. 2005).

The values of TP, TN, FP, and FN were determined from the confusion tables as follows (Fig. 3).

Fig. 3

Confusion tables used to determine true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the decision tree analysis, for the cases of La Niña (LN) (A), El Niño (EN) (B), and neutral years (NE) (C). OBS denotes the values observed in the historical series and PRED the values predicted by the decision tree

Some statistics, described by Stojanovic et al. (2014), were calculated in two ways: as screening tests and as diagnostic tests. Screening tests describe proportions among the classifications that were actually made. In this case, the positive predictive value (PPV) is the proportion of the decision tree's predictions of a given class that are correct (Eq. 1), and the negative predictive value (NPV) is the proportion of the predictions that a year does not belong to a given class that are correct (Eq. 2).

The diagnostic tests, related to the estimated values, describe the probability of correct detections in the future: sensitivity evaluates the capacity of the decision tree to detect a given ENSO class when it is indeed the correct one (Eq. 3); specificity evaluates the ability of the decision tree not to assign a given class when it is not the correct one (Eq. 4); and accuracy measures the overall performance of the decision tree model (Eq. 5).

$$Positive\;predictive\;value:PPV=\frac{TP}{(TP+FP)}$$
(1)
$$Negative\;predictive\;value: NPV=\frac{TN}{(TN+FN)}$$
(2)
$$Sensitivity=\frac{TP}{(TP+FN)}$$
(3)
$$Specificity=\frac{TN}{(TN+FP)}$$
(4)
$$Accuracy=\frac{(TN+TP)}{n}$$
(5)
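As a sketch, all five statistics can be derived per class from a single multiclass confusion matrix (observed classes in rows, predicted classes in columns, as in Fig. 3); `y_true` and `y_pred` stand for the observed and predicted annual classes and are illustrative names:

```python
from sklearn.metrics import confusion_matrix


def per_class_metrics(y_true, y_pred, classes=("LN", "NE", "EN")) -> dict:
    """Compute PPV, NPV, sensitivity, specificity, and accuracy per class."""
    cm = confusion_matrix(y_true, y_pred, labels=list(classes))  # rows: observed, cols: predicted
    n = cm.sum()
    stats = {}
    for i, cls in enumerate(classes):
        TP = cm[i, i]
        FP = cm[:, i].sum() - TP   # predicted as `cls` but observed otherwise
        FN = cm[i, :].sum() - TP   # observed as `cls` but predicted otherwise
        TN = n - TP - FP - FN
        stats[cls] = {
            "PPV": TP / (TP + FP),          # Eq. (1)
            "NPV": TN / (TN + FN),          # Eq. (2)
            "sensitivity": TP / (TP + FN),  # Eq. (3)
            "specificity": TN / (TN + FP),  # Eq. (4)
            "accuracy": (TP + TN) / n,      # Eq. (5)
        }
    return stats
```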

All analyses and data organization were performed in the Python 3.9.6 programming language with the numpy, pandas, statsmodels, and sklearn libraries; graphs were produced with matplotlib, graphviz, and pysankey.
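For instance, the fitted tree could be rendered with graphviz roughly as follows (function name, file name, and layout options are illustrative, not the authors' script):

```python
import graphviz
from sklearn.tree import export_graphviz


def render_tree(dtc, feature_names, out="enso_tree"):
    """Export a fitted DecisionTreeClassifier to a PNG via graphviz."""
    dot = export_graphviz(
        dtc,
        out_file=None,                               # return the dot source as a string
        feature_names=feature_names,                 # e.g. "DJF-p", ..., "FMA"
        class_names=[str(c) for c in dtc.classes_],  # EN / LN / NE, in the model's order
        filled=True,
        rounded=True,
    )
    graphviz.Source(dot).render(out, format="png", cleanup=True)
```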

3 Results and discussion

The predictability of ENSO depends on the period over which it is estimated (Chen et al. 2004, 1995; Kirtman and Schopf 1998; Balmaseda et al. 1995). A study period of 70 years is long enough for the forecast to be statistically robust, unlike studies based on forecasts for only the last two or three decades (Goswami and Shukla, 1991; Chen et al. 1995). The time series of observed seasonal SST anomalies in Niño 3.4 from 1950 to 2020 allows a qualitative evaluation of ENSO variability, which at several moments shows a multi-decadal pattern (Fig. 4), with recurring peaks of El Niño, La Niña, and neutral conditions.

Fig. 4

Annual Oceanic Niño Index (ONI) (source NOAA). A year is classified as La Niña when the ONI ≤ − 0.5 and El Niño when the ONI ≥ + 0.5, for 5 consecutive 3-month running averages. The vertical bars indicate the standard deviation of quarterly ONI per year

From 1950 to 2020, El Niño events of at least moderate intensity took place in 1986/87, 1996/97, and 2014/15, as did La Niña events in 1954/55, 1974/75, and 1999/00. The weakest events, close to neutrality, occurred for El Niño in 1962/63, 1991/92, and 2003/04 and for La Niña in 1955/56, 1984/85, and 2006/07 (Fig. 4).

Sequences of neutral years were frequent in the time series, notably in 1959/62, 1977/81, 2003/06, and 2012/14, indicating a multi-decadal pattern in which these events recur at intervals of 10 to 20 years. Neutral years show less dispersion of the ONI data than El Niño and La Niña years (Fig. 4).

Despite the strong and regular oscillations in the 2000s, there were periods with little El Niño or La Niña activity, as in the late 1970s and early 1980s. Considering an El Niño as a warm event when the ONI anomaly exceeds 0.5 °C and a La Niña as a cold event when the ONI anomaly is below − 0.5 °C, both for 5 consecutive 3-month running averages, there were 20 El Niño events and 22 La Niña events from 1950 to 2020.

Stratifying the analysis to the quarterly scale, ONI variability was lower in the middle of the calendar year, as shown by the smaller standard deviation of the seasonal march. The Southern Hemisphere fall/winter season showed a smaller standard deviation than the spring/summer seasons. According to Ham et al. (2019), the ONI index has a high correlation with almost all target seasons, supporting more robust forecasts in the seasons between late boreal spring and boreal fall; for southern conditions the opposite holds, so the most variable quarters, between August and March, are the most informative for ML models. This was corroborated by the present study, since the most important quarter for forecasting the annual ENSO condition was January-February-March (Fig. 5).

Fig. 5

Quarterly Oceanic Niño Index (ONI) for the period between 1950 and 2020 (source NOAA). A quarter is classified as La Niña when its ONI ≤ − 0.5 and as El Niño when its ONI ≥ + 0.5; neutral quarters occur when − 0.5 < ONI < + 0.5. The vertical bars indicate the standard deviation of ONI per year. The letters indicate the quarter, for example, DJF, December, January, and February; JFM, January, February, and March; and so on

The boreal spring (between March and June) is seen as a barrier to ENSO forecasting because it causes a drop in persistence (a decrease in standard deviations) in all model ONI forecasts (Chen et al. 2004). The same occurs in the Southern Hemisphere, where this period corresponds to the fall season (between March and June) (Fig. 5).

Conversely, in the austral spring (between September and December), the ONI standard deviations increase, providing greater variability of information in the decision tree inputs and making this period informative for the ML models (Fig. 5).

The decrease in standard deviations between May and June was probably related to the lack of very strong events, such as the El Niño event of 1991/92 and the La Niña event of 1984/85, and also to the seasonal dependence of the discharge/recharge cycle of the equatorial heat content (or sea level), which leads the SST anomaly by 6 to 8 months and plays a crucial role in ENSO (Jin, 1997; Meinen and McPhaden 2000; McPhaden 2003).

Analysis of the decision tree model showed that some quarters of the ONI index, namely July-August-September (JAS-p), January-February-March (JFM-p), and February-March-April (FMA-p) of the previous year and January-February-March (JFM) of the current year, had the greatest influence on ENSO event prediction (Fig. 6). The first predictor (JFM) corresponds to the first quarter of the forecast year, while the other three predictors correspond to quarters of the previous year, demonstrating the lead time of the forecast and confirming that ONI event prediction is largely determined by the initial oceanic conditions (Chen et al. 2004) in the southern spring-summer.

Fig. 6

Decision tree analysis for forecasting El Niño-Southern Oscillation annual events as a function of the quarterly Oceanic Niño Index (ONI). JFM, January-February-March; JAS-p, July-August-September of the previous year; JFM-p, January-February-March of the previous year; FMA-p, February-March-April of the previous year

The first split is based on the JFM quarter, which was the most powerful predictor and is used by the model for all forecasts, indicating that the model can issue the annual prediction by the third month of the current year. The left node is strongly homogeneous and is not divided further, being classified as a La Niña year.

Continuing along the right branch, the node based on the JAS quarter of the previous year was split with a large information gain into two further nodes at the third level of the decision tree. The right of these nodes, again based on the JFM quarter, was divided once more, ending with a classification of El Niño years on the right, while the left branch continues to divide and ends with a classification of La Niña years on the right and neutral years on the left.

In the left node, also based on the JFM quarter but now of the previous year, the splitting process was repeated, with neutral years classified on the right and a final division on the left node. This left node again uses the JFM quarter of the current year, employed for the third time in the decision tree, and finalizes the splitting process by predicting neutral years if this quarter has an ONI value less than or equal to − 0.15 and, otherwise, El Niño years on the right.

This lag of one to two seasons occurs because the atmosphere takes about 2 weeks to respond to SST anomalies in the tropical Pacific, and the ocean then integrates the forcing associated with the atmospheric bridge over the following months (Alexander et al. 2002). The decision tree responses were accurate and robust for both the training and validation periods, indicating that the proposed annual ENSO prediction approach was successful.

During the training period, of the 21 years observed as NE, 18 were predicted as NE, 2 as EN, and 1 as LN. Of the 18 years observed as LN, 13 were correctly predicted as LN, 4 were predicted as NE, and 1 as EN. Finally, of the 17 years observed as EN, 16 were correctly predicted, and only 1 was predicted as NE (Fig. 7, Appendices).

Fig. 7

Sankey diagram representing the confusion matrix for the training period, for the El Niño (EN), La Niña (LN), and neutral year (NE) forecasts

Therefore, the EN class does not have 100% specificity, because some NE and LN events were predicted as if they were El Niño years; however, given the model's high ability to detect EN events, the sensitivity is high (94%), resulting in a good accuracy of 93% for this class. A similar situation occurs for the NE years, which present 86% sensitivity and 85% specificity, resulting in 86% accuracy.

The opposite occurs for the LN years. Although LN has a high specificity of 97%, because the vast majority of predicted LN events were actually LN years (that is, the model rarely predicts other phenomena as LN years), its sensitivity is the lowest: 4 LN events were incorrectly predicted as NE years and another 1 as an EN year, giving a sensitivity of 72% and an accuracy of 85% (Table 1).

Table 1 Performance of decision tree analysis in training and test periods for predicting events of El Niño (EN), La Niña (LN), and neutral years (NE)

In the validation period, of the 3 EN cases, 2 were correctly predicted as EN, while one-third of the EN years were incorrectly predicted as NE; hence the sensitivity of the model for EN prediction is lower (67%). The LN years had the highest hit rate: all 3 observed years were correctly predicted as LN, giving 100% sensitivity and specificity for this event, since no other phenomenon was incorrectly predicted as a La Niña year. Finally, of the 8 NE years, 6 were predicted as NE and 2 as EN, which lowers the sensitivity and, consequently, the accuracy for NE events (Fig. 8, Appendix).

Fig. 8

Sankey diagram representing the confusion matrix for the validation period, for the El Niño (EN), La Niña (LN), and neutral year (NE) forecasts

In the training period (n = 56 years), the decision tree predicted ENSO events with an overall accuracy of 84%, and in the validation period (n = 14) with 78% (Table 1, Fig. 9). These results are similar to those obtained by other authors, such as Nooteboom et al. (2018), who combined artificial neural networks with ARIMA models; however, our results were achieved with a simpler approach and comparable performance.

Fig. 9

Observed and predicted class frequencies of El Niño, La Niña, and neutral years by the decision tree model for 70 years divided into 56 years for training (A) and 14 years for validation (B)

The highest positive predictive value of the model (Table 1) was obtained for the La Niña forecasts (93%), followed by El Niño (84%) and neutral years (78%). In the validation period, the results improved for the La Niña and neutral years, with 100% and 86%, respectively, but decreased to 50% for the El Niño years.

The more sensitive the classification, the higher its NPV, meaning greater confidence that the decision tree did not err. Sensitivity and specificity are usually inversely related, and a highly specific model can indicate overfitting; in this analysis, however, values close to 100%, in both training and validation, give high confidence in the decision tree's responses, because all available data from the historical ONI series were used. In this case, the more specific the better (≈100%). All events presented high specificity, that is, the ability to state that an event did not occur when it really did not. For the training period, the best results were obtained for La Niña, El Niño, and neutral years, with values of 97%, 92%, and 86%, respectively. In the validation period, the La Niña years presented the highest specificity (100%), while the neutral and El Niño years presented similar specificities of 83% and 82%, respectively.

4 Conclusion

The ENSO signal forecast was more robust in the seasons between late boreal spring and boreal autumn, and the most important predictors for El Niño and La Niña years were quarters of the previous year. The forecast was possible with an 8-month lead time, using only quarterly Oceanic Niño Index (ONI) data for the July-August-September, January-February-March, and February-March-April periods of the previous year and January-February-March of the current year.

The results indicate that it is possible to forecast the warm and cold events of the El Niño-Southern Oscillation using only sea surface temperature data, with decision trees able to forecast La Niña, El Niño, and neutral year conditions with an average accuracy of 78%. The DTC presented accuracies of 89%, 84%, and 78% for the La Niña, El Niño, and neutral year forecasts, respectively, in the training period and of 100%, 79%, and 79%, respectively, in the validation period. The model performed best for predicting La Niña years, with high sensitivity and specificity for the cold ENSO event.