Introduction

Natural hazards such as volcanic eruptions, earthquakes, and tsunamis can perturb the ionosphere (Astafyeva 2019; Huang et al. 2019; Calais and Minster 1995; Peltier and Hines 1976; Hargreaves 1992; Occhipinti 2015; Rolland et al. 2010; Meng et al. 2019; Artru et al. 2005; Chou et al. 2017; Zettergren et al. 2017). In detail, these events can generate acoustic and gravity waves (AGWs), which are amplified as atmospheric density decreases with altitude and can reach the ionosphere. These waves interact with the ionospheric plasma and cause electron density disturbances known as traveling ionospheric disturbances (TIDs; Galvan et al. 2012; Astafyeva 2019). Here, we distinguish between acoustic gravity waves generated near the epicenter (AGWepi) and internal gravity waves linked to the tsunami (IGWtsu; Occhipinti 2015). AGWepi, related to the uplift at the source, reaches the ionosphere in around 8 min, whereas IGWtsu, linked to offshore tsunami propagation, takes about 45–60 min (Lognonné et al. 2006; Occhipinti et al. 2011).

These perturbations are detected through variations in the ionospheric total electron content (TEC; Coster et al. 2013; Hofmann-Wellenhof et al. 2008), retrieved from global navigation satellite system (GNSS) measurements and expressed in TEC units (1 TECU = 10¹⁶ electrons/m²). Specifically, we refer to the slant TEC (sTEC), i.e., the electron content encountered by the GNSS signal along its path through the ionosphere from the satellite to the receiver, as measured by dual-frequency GNSS receivers.

The VARION algorithm (Variometric Approach for Real-Time Ionosphere Observation) is an established tool to estimate sTEC variations from GNSS observations (Savastano et al. 2017; Ravanelli et al. 2021). It is based on single time differences of geometry-free combinations, making it suitable for real-time applications (Ravanelli et al. 2020).
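To make the underlying observable concrete, the sketch below derives an sTEC rate from time-differenced geometry-free carrier phases. It is a minimal illustration under our own simplifications (the function name, sign convention, and constant handling are ours), not the VARION implementation:

```python
import numpy as np

# GPS L1/L2 carrier frequencies (Hz) and the ionospheric constant 40.3 m^3/s^2,
# scaled by 1e16 so that the result comes out directly in TECU.
F1, F2 = 1575.42e6, 1227.60e6
K = 40.3e16

def dstec_dt(l1_m, l2_m, dt_s):
    """Approximate dsTEC/dt [TECU/s] from carrier phases converted to meters.

    l1_m, l2_m: carrier-phase time series on L1 and L2 (meters)
    dt_s:       sampling interval (e.g., 15 s for the series used here)
    """
    lgf = l1_m - l2_m                                  # geometry-free combination
    factor = (F1**2 * F2**2) / (K * (F1**2 - F2**2))   # meters -> TECU
    return factor * np.diff(lgf) / dt_s                # single time difference
```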

TID detection in sTEC time series has traditionally been conducted using techniques reliant on human expertise, including analysis of the ionospheric power index (Manta et al. 2020), threshold-based wavelet analysis (Torrence and Compo 1998), and 2D principal component analysis (Lin 2022). Despite the effectiveness of these traditional approaches, there is increasing recognition that artificial intelligence (AI), particularly machine learning, holds potential for advancing TID detection. Machine learning uses data-driven algorithms for autonomous decision-making, offering computational efficiency and data-handling capacity (Kuglitsch et al. 2022; Crocetti et al. 2021). Over the past years, machine learning algorithms have become widespread in ionospheric research for different purposes: automatic detection methods to study TID signatures (Brissaud and Astafyeva 2022; Constantinou et al. 2023a, b); forecasting TEC data (Cesaroni et al. 2020; Huang and Yuan 2014; Natras et al. 2022; Liu et al. 2020); nowcasting TEC data (Zhukov et al. 2018; Camporeale 2019; Łoś et al. 2020); improvement of regional and global TEC models (Zhukov et al. 2020); ionospheric scintillation parameter predictions (Atabati et al. 2021; McGranaghan et al. 2018; Linty et al. 2019); and the general analysis of TEC variations induced by earthquakes and tsunamis (Zhukov et al. 2020).

Within the framework of automatic detection methods to study TID signatures, the previous studies by Constantinou et al. (2023a, b) considered computer vision and convolutional neural networks, while Brissaud and Astafyeva (2022), similar to our study, applied the Random Forest classifier, focusing on classifying ionospheric waveforms into TIDs and noise, picking TID arrival times, and associating arrivals across a satellite network in near real-time. In this context, our study aims to use machine learning algorithms to classify ionospheric TEC variations caused by earthquakes and tsunamis using the large amount of GNSS time-series data provided by every available satellite-station link. It aligns with the existing body of research while contributing novel insights into the classification of TIDs in large-scale GNSS data sets. Automatic TID detection, as planned for the NASA-JPL GUARDIAN system (Martire et al. 2023), underscores the relevance of our research within the scientific community.

This paper investigates the 2015 Illapel earthquake and tsunami, a well-suited case study given its high magnitude and well-documented tsunami signature (Reddy et al. 2017; Ravanelli et al. 2021; Shrivastava et al. 2021).

The main objective is to examine whether and how machine learning algorithms are suitable for finding TEC time-series signatures related to earthquakes and tsunamis.

To reach our aim, we consider two machine learning classifiers, namely, Random Forest and XGBoost (Breiman 2001; Chen and Guestrin 2016), and apply them to the first temporal derivative of the sTEC (dsTEC/dt), the core VARION output, which represents the rate of change of sTEC with respect to time; elevation cut-offs of 15° and 25° are applied. Several experiments, including data pre-processing and feature selection, are carried out to determine the best-performing model. The following Section provides an overview of the analyzed event, the study region, and the data used. The Methods Section describes the methodologies: the VARION algorithm and the machine learning techniques. Moreover, it presents the set-up of the machine learning classification, the explanation of the pre-processing and features, and the assessment of model performance. In the Results Section, the outcomes are presented, analyzed, and discussed. Here, we also validate the model by applying it to time series with no seismic-induced variations in sTEC. Specifically, we consider data with alterations in dsTEC/dt values not related to the earthquake, testing the algorithm's capability to exclusively detect variations directly linked to the earthquake, in line with our primary objective. Finally, the last Section summarizes the outcomes derived from this study, identifies the most effective configuration for an efficient model, and offers a prospective outlook on potential enhancements yet to be explored.

Study context and data set overview

On September 16, 2015, at 19:54:33 Chile Standard Time (22:54:33 UTC), a devastating earthquake with a moment magnitude of Mw 8.3 occurred 46 km offshore of the Coquimbo region of Chile. The primary seismic event lasted between 3 and 5 min and was followed by multiple aftershocks (Ravanelli et al. 2021). Both the NOAA Pacific Tsunami Warning Center (https://www.tsunami.gov/; https://earthquake.usgs.gov/earthquakes/eventpage/us20003k7a/executive) and the Servicio Hidrográfico y Oceanográfico de la Armada (Chile’s National Tsunami Warning System) (https://www.armada.cl/noticias-navales/shoa-difunde-oportuna-alerta-de-tsunami; https://www.snamchile.cl/) issued tsunami threat messages and alarms, respectively. Within 10 min, tsunami waves measuring 4.5 m struck the Chilean shoreline between Chañaral (~ 26°S) and Constitución (~ 35°S), causing substantial impact (https://www.tsunami.gov/events/PAAQ/2015/09/16/nuskyv/22/WEAK51/WEAK51.txt). Pichidangui (~ 32°S) was reached by the first wave 13.70 min after the mainshock. The coastal area between Coquimbo (~ 30°S) and Valparaíso (~ 33°S) recorded the highest wave heights (over 1.5 m), leading to significant flooding in Coquimbo (Shrivastava et al. 2021).

Data set

Data from 115 GNSS stations, mostly located in Chile but also spread across the South American continent (Fig. 1), are processed with VARION (Ravanelli et al. 2021). The GNSS receivers collect data at 10-, 15-, and 30-s rates. Following Reddy et al. (2017), we consider the G12, G24, and G25 satellites, resulting in a data set composed of 345 observations.

Fig. 1
figure 1

Map showing the 115 GNSS receivers (red points) and the epicenter of the 2015 Illapel earthquake (yellow star)

In detail, our data set consists of 345 VARION-obtained sTEC variations over time, i.e., dsTEC/dt [TECU/s], which represent the primary, real-time output of VARION and do not require any additional post-processing (Ravanelli et al. 2021). The dsTEC/dt time series are representative because the earthquake has a high magnitude and causes evident signatures in the ionosphere. However, while our data set provides insights into TID detection, it has limitations: seismic and non-seismic external factors such as space weather, noise, satellite angles, the geomagnetic field, and observation geometry influence the induced TEC observations (Meng et al. 2022; Bagiya et al. 2019). We analyze both GPS days, i.e., DOY 259 and 260 (the earthquake day and the day after), considering the whole time series of DOY 259 and part of the time series of DOY 260, namely, until the satellites descend below the elevation cut-off (Fig. 2). Moreover, the data set considers only 31 selected satellite-station links, following four criteria:

  • Data availability on both days (DOY 259 and 260)

  • Considerable (i.e., visually identifiable) variations in the dsTEC/dt time series related to the earthquake

  • Only one gap during the day (due to satellite visibility)

  • Data availability from the first observation of the day (i.e., within the first 15 s of the day), for both DOY 259 and 260

Fig. 2
figure 2

Two of the 31 dsTEC/dt time series that constitute our data set. The link composed of the G12 satellite and the PAZU station is shown in the top plot (a), whereas the bottom plot (b) presents a zoomed view of the time series of the link composed of the G25 satellite and the MRCG station, focusing on the part of the day when the earthquake develops. This portion, occurring after the gap due to satellite visibility, highlights the seismic-induced variations in the data

The sampling interval of the 31 selected time series is 15 s.

Methodology

The VARION algorithm is applied to the observations of GPS DOY 259 and 260. VARION estimates real-time sTEC variations relying on stand-alone GNSS receivers and standard GNSS broadcast products. It is based on single time differences of a geometry-free combination of GNSS carrier-phase measurements (Savastano et al. 2017; Ravanelli et al. 2020, 2021). A crucial tool in this study, VARION provides the time series used both to train the machine learning algorithms and to validate the results.

In this analysis, we use elevation cut-off angles of 15° and 25° to mitigate the impact of observational noise, a prevalent challenge in ionospheric studies. These cut-off angles act as a strategic filter, effectively excluding observations from satellites at lower elevation angles, where data tend to be noisier due to increased atmospheric interference. This aligns with common practice in the literature, where such elevation cut-offs are used to enhance the signal-to-noise ratio and improve the overall quality of the data set, ensuring a more robust and reliable foundation for our analysis (Occhipinti et al. 2011; Astafyeva 2019; Ravanelli et al. 2023).
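As a minimal illustration (variable names are ours), such a cut-off reduces to masking every epoch whose satellite elevation falls below the threshold before any further processing:

```python
import numpy as np

def apply_elevation_cutoff(times, dstec, elevation_deg, cutoff_deg=15.0):
    """Keep only epochs where the satellite elevation exceeds the cut-off angle."""
    mask = np.asarray(elevation_deg) > cutoff_deg
    return np.asarray(times)[mask], np.asarray(dstec)[mask]
```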

To achieve our final aim of detecting the TIDs caused by earthquakes and tsunamis, we formulate a binary classification problem using supervised machine learning algorithms. Data are classified into two categories within a 30-min window that runs through the time series: 0 if the window contains no sTEC variations related to the earthquake and tsunami, and 1 if it does. The TID detection is based on established criteria derived from prior studies in the literature, including the arrival time of the perturbation in the ionosphere, its shape, the absence of geomagnetic disturbances, and its frequency content, as identified and validated in previous research (Reddy et al. 2017; Ravanelli et al. 2021; Shrivastava et al. 2021; Sanchez et al. 2023).

We consider the Random Forest (RF) and XGBoost (XGB) classifiers among the several available for classification tasks (Zhang et al. 2017), given their well-known good performance (Brissaud and Astafyeva 2022). Indeed, Crocetti et al. (2021) show that the RF and XGB algorithms outperform others (i.e., Linear Support Vector Classification, Perceptron, K-Nearest Neighbor, and more). Specifically, RF excels in handling high-dimensional data and mitigating overfitting through bootstrap aggregation, while XGBoost improves weak learners iteratively, enhancing accuracy and computational efficiency. Both algorithms, as ensemble learning methods, capture intricate relationships within the data and enhance predictive accuracy. They are suitable for our purposes, especially since we do not require deep learning methods (i.e., convolutional neural networks, CNNs; Albawi et al. 2017) given our moderate data set size and the division of the time series into short chunks.

In detail, RF consists of decision trees, nonlinear models with multiple linear boundaries. Decision tree nodes apply data-related questions linked to specific feature values, recursively dividing into child nodes. This iterative process creates a tree of predefined depth. The algorithm selects bootstrapped samples and a random feature subset for each tree, and the final prediction aggregates the results from all decision trees (Breiman 2001). In contrast, XGBoost builds trees sequentially, each minimizing the error of the previous one. It starts with a constant prediction, iteratively trains trees on the residuals, and combines them with the previous model to reduce the error. Finally, the strong learner results from combining all weak learners (Wang and Liu 2020; Chen and Guestrin 2016).
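For illustration, the two classifiers can be instantiated as sketched below with scikit-learn and the xgboost package; the hyperparameter values shown are generic starting points, not the tuned values reported in Table 1:

```python
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# RF: bagging of decorrelated trees (bootstrapped samples + random feature subsets)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)

# XGB: sequential boosting, each tree correcting the residuals of the ensemble
xgb = XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.1,
                    colsample_bytree=0.8, eval_metric="logloss", random_state=42)
```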

The classification is conducted for the selected 31 links (described in the Data set Section), split into 80% (25 links) for training and 20% (6 links) for testing. The dsTEC/dt time series of each link are divided into individual chunks, as described in the following Section. The model is trained on the dsTEC/dt time series of the 25 training links and tested on those of the 6 testing links. Finally, the model is validated, and its performance evaluated, on 18 unseen dsTEC/dt time series, also related to the Illapel event, which show variations in dsTEC/dt not linked to the seismic event. This additional analysis allows us to assess the model's ability to distinguish variations in dsTEC/dt specifically linked to the earthquake and tsunami from those unrelated to the seismic event.
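A sketch of the link-level split follows, assuming a hypothetical list `links` holding the 31 selected link identifiers. Splitting by link, rather than by chunk, keeps every chunk of a given time series entirely in training or entirely in testing, avoiding leakage between the two sets:

```python
from sklearn.model_selection import train_test_split

# 6 of the 31 links (about 20%) are held out for testing, matching the 25/6 split.
train_links, test_links = train_test_split(links, test_size=6, random_state=42)
```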

Feature matrix

We use the dsTEC/dt time series of the 31 satellite-station links as features for the machine learning algorithms. In detail, we exclude the gap due to satellite visibility from each time series and consider the time frames (1) from the first 15 s of DOY 259 until the start of the gap of DOY 259 and (2) from the end of the gap of DOY 259 until the start of the gap of DOY 260.

The time series are split into k chunks to create the feature matrices. Each chunk is m = 30 min long and shifted by n = 1 min from the next one: the first chunk contains dsTEC/dt values from the first 15 s (the first observation of both days) until m, the second one from 75 s (15 s + n) until 75 s + m, and so on. The feature matrices have dimension [k, 120], as 120 is the number of 15-s samples in m. Finally, we obtain 31 feature matrices (one per link), of which 25 are combined to create the training feature matrix (80%) and 6 to create the testing one (20%). The structure of the feature matrix is shown in Fig. 3.
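A minimal sketch of this chunking (function name ours), assuming a gap-free dsTEC/dt array sampled every 15 s:

```python
import numpy as np

def build_feature_matrix(dstec, step_s=15, window_min=30, shift_min=1):
    """Slide a 30-min window (m) through one dsTEC/dt series in 1-min shifts (n),
    stacking the k resulting chunks into a [k, 120] feature matrix."""
    win = window_min * 60 // step_s     # 120 samples per chunk
    shift = shift_min * 60 // step_s    # 4 samples between chunk starts
    chunks = [dstec[i:i + win] for i in range(0, len(dstec) - win + 1, shift)]
    return np.vstack(chunks)
```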

Fig. 3
figure 3

Structure of the feature matrix related to the time values for DOY 259 (both for training and for testing). For every time value, the matrix is filled with the corresponding dsTEC/dt value. The red cells are an illustrative, hypothetical example showing dsTEC/dt values perturbed by the earthquake and tsunami, corresponding to the value “1” in the target vector

Target vector

The target vector denotes whether sTEC variations due to the tsunami occurred within the corresponding chunks, classifying them as “1” if the perturbation is present or “0” if not. As previously mentioned, this attribution is based on well-defined conditions validated in the literature, which demonstrate the link between the sTEC perturbations and the seismic event. To perform this binary classification, the time frame of the event-related sTEC perturbation, i.e., its initial and final times, is manually labeled in the time series of every satellite-station link. A chunk is then labeled 1 whenever its 30-min window contains any point of the manually labeled perturbation time frame.

In this way, we create a target vector with the dimension [k, 1] for every link (Fig. 3). As for the feature matrix, we use the same 80% of the links for training and 20% for testing.
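A sketch of this labeling (names hypothetical): given the start epoch of every chunk and the manually labeled perturbation interval, a chunk is flagged 1 whenever the two overlap:

```python
import numpy as np

def build_target_vector(chunk_starts_s, t_start_s, t_end_s,
                        step_s=15, window_min=30):
    """Label a chunk 1 if its 30-min span overlaps the manually labeled
    perturbation time frame [t_start_s, t_end_s] (seconds of day)."""
    starts = np.asarray(chunk_starts_s)
    ends = starts + window_min * 60 - step_s   # last sample covered by the chunk
    return ((starts <= t_end_s) & (ends >= t_start_s)).astype(int)
```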

Data pre-processing

The pre-processing of the data set consists of cleaning and preparing the data to improve the quality of the data set and ensure better model performance. In this study, we consider standardization and normalization, two methods used to scale the data set. In detail, normalization rescales the data to a specific range (e.g., 0 to 1 or −1 to +1), while standardization transforms the data to zero mean and unit variance, so that they resemble a standard normal distribution (Ali and Faraj 2014; Vieira et al. 2020).
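Since each chunk is scaled independently here, the two scalers reduce to row-wise operations, sketched below (function name ours); scikit-learn's StandardScaler and MinMaxScaler act column-wise and would mix information across chunks:

```python
import numpy as np

def scale_chunks(X, method="standardize"):
    """Scale each 30-min chunk (row of X) independently."""
    X = np.asarray(X, dtype=float)
    if method == "standardize":  # zero mean, unit variance per chunk
        mu = X.mean(axis=1, keepdims=True)
        sd = X.std(axis=1, keepdims=True)
        return (X - mu) / np.where(sd == 0, 1, sd)
    if method == "normalize":    # min-max rescaling to [0, 1] per chunk
        lo = X.min(axis=1, keepdims=True)
        rng = np.ptp(X, axis=1, keepdims=True)
        return (X - lo) / np.where(rng == 0, 1, rng)
    raise ValueError(f"unknown method: {method}")
```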

Finally, we also include additional features in the feature matrix, namely, the value range of each chunk, defined as its maximum minus its minimum value before normalization, and the variance of each chunk, both individually and together.
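A sketch of how the two extra columns can be appended (array names hypothetical); as stated above, the range is computed on the raw values before any normalization:

```python
import numpy as np

# X_raw: [k, 120] chunks before scaling; X_scaled: the same chunks after scaling.
value_range = np.ptp(X_raw, axis=1, keepdims=True)  # max - min per chunk
variance = np.var(X_raw, axis=1, keepdims=True)
X_extended = np.hstack([X_scaled, value_range, variance])  # [k, 122]
```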

Model evaluation

To evaluate the performance of the model, we consider the confusion matrix, the difference between the labeled and predicted middle epochs of the perturbations time frames, the receiver operating characteristic (ROC) curve and the area under the curve (ROC-AUC).

In detail, the confusion matrix shows the number of false negatives (FNs), false positives (FPs), true negatives (TNs), and true positives (TPs) generated by the machine learning classification (Crocetti et al. 2021). In our case, TPs and TNs indicate correctly classified chunks with (TPs) or without (TNs) earthquake- and tsunami-induced sTEC variations. Conversely, FNs identify chunks with undetected sTEC variations, while FPs designate chunks without sTEC variations that are wrongly classified as containing them. Given our aim of not overlooking any tsunami, FNs are considered the most critical errors in the analysis. However, to select the best model, we summarize the confusion matrix in terms of the well-known performance measures of precision, recall, and F1 score (Ting 2017).

Moreover, we compare the labeled perturbation time frame with the one generated by the model, represented by the time frame containing all the TPs (each assumed to lie at the middle epoch of its 30-min chunk). To this end, we calculate the numerical difference between the middle epochs of the two time frames and express it in 15-s steps (according to the time-series resolution).
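The evaluation can be sketched as follows with scikit-learn; the middle-epoch bookkeeping (and the names) is our own interpretation of the procedure above:

```python
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score, tp_mid_epochs_s, labeled_mid_epoch_s,
             step_s=15):
    """Confusion-matrix counts, skill scores, and the labeled-vs-predicted
    middle-epoch difference expressed in 15-s steps.

    tp_mid_epochs_s:     middle epochs (s) of the chunks predicted as TPs
    labeled_mid_epoch_s: middle epoch (s) of the manually labeled time frame
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # Predicted time frame = span containing all TPs; take its middle epoch.
    pred_mid = 0.5 * (min(tp_mid_epochs_s) + max(tp_mid_epochs_s))
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp,
            "precision": precision_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred),
            "roc_auc": roc_auc_score(y_true, y_score),
            "mid_epoch_diff_steps": abs(pred_mid - labeled_mid_epoch_s) / step_s}
```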

Results and discussion

To reach the final aim of finding signatures in TEC time series related to earthquakes and tsunamis using machine learning algorithms, we performed several experiments considering two classifiers, i.e., Random Forest and XGBoost, and two elevation cut-off angles, i.e., 15° and 25° (see the following Section). We conducted hyperparameter tuning through grid search for each of the four classifier-cut-off combinations. We also assessed the influence of additional features (see the Section on the impact of added features) and pre-processing techniques (see the Section on the pre-processing effects) on the model's performance. The best-performing model in terms of F1 score was then selected for each combination, resulting in four models. Finally, the overall best model was selected in terms of F1 score, the difference between the middle epochs of the labeled and predicted perturbation time frames, the ROC curve, and the ROC-AUC.

The best result is presented in the first part of this section, while the following subsections show how we came to this conclusion by comparing the best model with the others.

We trained our model on a machine with a 2.20-GHz Intel Core i7 processor, 32 GB of memory, and an Intel(R) UHD Graphics 630 graphics card; it can thus be trained even on a laptop without a dedicated GPU.

From this study, the best model is the XGBoost classifier that uses the 15° elevation cut-off dsTEC/dt time series and includes the value range of each chunk as an additional feature. Hyperparameter tuning was performed to optimize the overall model performance. In the case of XGBoost, we analyzed the number of boosting rounds (the number of trees built in the ensemble), the maximum depth of the trees, the number of columns randomly sampled for each tree, and the learning rate (the step size at each iteration as the model optimizes toward its objective). To determine the most suitable hyperparameters, we used a grid search, which systematically evaluates a manually predefined set of hyperparameter combinations. In this study, the performance of every hyperparameter combination is evaluated based on threefold cross-validation. This involves partitioning the data set into three subsets, with two-thirds used for model training and one-third reserved for validation; the process is repeated three times, so that each subset serves as the validation set once. The average performance across the three runs for each hyperparameter combination is then compared to identify the set of values that reaches the best performance. Table 1 shows the tested combinations and the best and default hyperparameter values for XGB-15°, our best model.
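A sketch of this search (grid values illustrative; the values actually tested and selected are those of Table 1, and X_train/y_train denote the training feature matrix and target vector built above):

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "n_estimators": [100, 300, 500],      # boosting rounds (trees in the ensemble)
    "max_depth": [3, 6, 9],               # maximum depth of each tree
    "colsample_bytree": [0.5, 0.8, 1.0],  # fraction of columns sampled per tree
    "learning_rate": [0.01, 0.1, 0.3],    # step size at each boosting iteration
}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, scoring="f1", cv=3)  # threefold cross-validation
search.fit(X_train, y_train)
best_model = search.best_estimator_
```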

Table 1 Best tested and default hyperparameter values in the grid search for XGB-15°

Regarding the testing samples, the best-performing model correctly classifies 183 of 247 (74.09%) samples with sTEC variations related to the earthquake and the tsunami (TPs). Furthermore, 2975 of 3021 (98.49%) testing samples are correctly classified as containing no earthquake-induced sTEC variations (TNs). Thus, 64 of 247 samples (25.91%) are wrongly classified as containing no seismic-induced sTEC variations (FNs), while 46 of 3021 (1.51%) are wrongly classified as containing sTEC variations related to the event (FPs), as shown in Table 2. The model achieves an F1 score of 0.77, a recall of 0.74, and a precision of 0.80, while the accuracy on the training and testing data is 0.98 and 0.97, respectively (Table 3). Figure 4 depicts these results for two links, together with the difference between the labeled and predicted middle epochs of the perturbation time frames.

Table 2 Confusion matrix of our best model (XGB-15°), showing FPs, TPs, FNs, and TNs
Table 3 Results (in terms of accuracy for training and testing data, precision, recall, and F1 score) of the four models
Fig. 4
figure 4

Time series of two sample links: the one composed of the G24 satellite and the PAZU station, used for training (a), and the one composed of the G24 satellite and the UDAT station, used for testing (b). In both, the performance of the best model is presented. The plots show the positions of the FNs and TPs in the time series together with the labeled and predicted middle epochs of the perturbation time frame

The average numerical difference between the labeled and predicted middle epochs of the TID is 70.8 s for the 25 links used to train the model and 75 s for the 6 links used for testing (Table 4). As the time series are sampled at 15-s steps, achieving a five-step average difference between the two epochs in both training and testing is notable. This highlights the algorithm's precise temporal detection of the AGWepi-related perturbations, underlining its critical role in timely TID identification for effective early warning systems. Furthermore, the similar results for training and testing data show the model's generalization ability, avoiding overfitting problems. In this context, the model's capability to generalize is supported by incorporating an additional feature (i.e., the value range), which prevents overfitting and captures essential information, and by the threefold cross-validation. Moreover, mitigating the sensitivity to noisy observations (i.e., through the selection of elevation cut-off angles) improves the model's focus on relevant information.

Table 4 Results (in terms of the average difference between labeled and predicted middle epochs and the number of 15-s steps considering the training and testing links) of the four models

Figure 5 illustrates the ROC curves and AUC values for both algorithms and cut-off angles. RF reaches better ROC curves and higher AUC values; however, we opted for XGB-15° given its greater performance in terms of the F1 score and the difference between the labeled and predicted middle epochs. In particular, a good performance in the latter metric is essential for real-time applications, aligning with our aim.

Fig. 5
figure 5

ROC curves depicting the performance of the different machine learning algorithms. The blue dashed line represents the performance of a random classifier. The legend includes the corresponding AUC values

Overall, the algorithm demonstrates good computational efficiency, completing detection in about 2–3 min, with hyperparameter tuning as the most time-consuming step. Effective computational performance is crucial for real-time operational requirements, essential for effective early warning systems. However, our analysis uses data sampled at larger intervals than the 1-s real-time data rate, which could increase the computational time to 30–40 min.

Finally, the algorithm holds potential as a highly viable tool, considering its ability to operate in real-time using only sTEC time series. However, several crucial aspects must be addressed for practical implementation. Successful real-world deployment requires access to real-time data from networks in high-risk, tsunami-prone areas. Furthermore, incorporating buffer time into the analysis process allows for a thorough examination of the data and their quality in real-time (i.e., data integrity, and bias and outlier identification). Establishing the necessary infrastructure is also crucial, including dedicated servers that can perform complex computations, efficiently run AI algorithms, and transfer data to cloud platforms. Finally, servers and cloud storage are essential for storing the data collected in real-time, which can be used for training new models and conducting retrospective analyses.

Comparison of different elevation cut-off angles

We tested two different elevation cut-off angles, 15° and 25°, as input features for our model. The results for the two classifiers, and the elevation cut-off angles, are shown in Table 3, where the first column shows the results of our best model.

For both classifiers, the performance for the two cut-off angles is similar in terms of accuracy and precision. However, the recall is slightly higher when using an elevation cut-off of 25° (Table 3), whereas the F1 score is higher for the 25° cut-off with RF and similar for XGBoost. Conversely, when comparing the results of the two classifiers, we note that while the precision is higher for RF, the F1 score and recall are significantly higher for XGBoost (Table 3).

Furthermore, Table 4 shows that for both classifiers, the differences between the labeled and predicted middle epochs for the training and testing links are smaller when using an elevation cut-off of 15° instead of 25°. Even though the average middle-epoch difference for the training links is higher for XGBoost than for RF, the one for the testing links is smaller (Table 4). Moreover, RF presents a large dissimilarity between the average middle-epoch differences of the training and testing links, showing that the model does not generalize well to unseen data sets.

Thus, we conclude that the XGBoost algorithm using dsTEC/dt time series with an elevation cut-off of 15° is the best model since it has high accuracy, precision, recall, and F1 score and performs best when investigating the average differences between labeled and predicted middle epochs.

Impact of additional features

We evaluate the impact of adding additional features, namely, the value range and variance of each chunk, to the feature matrix used in the machine learning algorithms. The choice of incorporating these features is motivated by their capacity to capture different aspects of the dsTEC/dt time series: the value range represents the amplitude of the variations within each chunk, providing insight into the overall perturbation magnitude, while the variance quantifies the internal dynamics and temporal variability of the ionospheric perturbations. For all combinations (XGB-15°, XGB-25°, RF-15°, and RF-25°), this addition improves the results in terms of precision, recall, and F1 score, as shown in Table 5. Indeed, these features enable the model to better distinguish seismic-induced variations in the dsTEC/dt time series, thus improving predictive accuracy through a deeper understanding of the ionospheric perturbations. In particular, in XGB-15°, RF-25°, and XGB-25°, including the value range or the variance leads to similar results, thus having a similar impact on the model. On the other hand, in RF-15°, including the value range has a greater impact than including the variance, as it doubles the metric values obtained with the raw data. This sensitivity to the value range aligns with the nature of Random Forest, which benefits from capturing a broad spectrum of information for robust decision-making; the value range seems to capture the data variability more effectively than the variance. Finally, the study of the feature importance reveals that, for our best model (XGB-15°), the value range has the greatest impact, showing that it significantly contributes to reducing errors in the tree ensemble.
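The feature-importance check mentioned above reduces to inspecting the fitted ensemble, as in the hypothetical sketch below (column layout as in our extended feature matrix, with the value range appended after the 120 dsTEC/dt samples):

```python
importances = best_model.feature_importances_
# Columns 0-119: the 15-s samples of each chunk; column 120: the value range.
print("value-range importance:", importances[120])
```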

Table 5 Impact of additional features (value range and variance, both individually and together) on the precision, recall, and F1 score metrics of the four models

Including both the value range and the variance in the feature matrix improves the raw case results. However, in XGB-15° and XGB-25°, the outcomes are worse than those obtained by adding the value range or the variance individually. Only in the case of RF-25° does adding both features yield better results than including them individually. This is likely because we do not have much training data; more data could help the model better learn the patterns and relationships within the features. In this case, the joint inclusion of the features might introduce redundancies or interactions that the model struggles to leverage effectively.

Impact of the pre-processing

In this section, we evaluate the impact of standardization and normalization of the feature-matrix chunks, applied separately to each chunk, on the performance of the different models (Table 6); this impact is commonly small for tree-based algorithms (García et al. 2015; Dougherty 2013). We highlight that only standardization applied to XGB-15° outperforms the raw case. This improvement can be attributed to the sensitivity of the XGBoost algorithm to the scale of the input features: standardization aligns the features to a common scale, facilitating more effective convergence during the boosting process. Furthermore, with the 15° elevation cut-off, the data are noisier than at 25°, leading to more pronounced fluctuations and a larger amount of data and underscoring the greater impact of standardization.

Table 6 Impact of pre-processing of the features (standardization and normalization) on the precision, recall, and F1 score metrics of the four models

In contrast, in RF-15°, RF-25°, and XGB-25°, both standardization and normalization have a negative impact on the model, worsening the raw case results in terms of F1 score and recall. This outcome shows that, in these cases, RF and XGB are not highly sensitive to the scale of the features.

Validation of the model

To validate our results, we apply the best model to unseen dsTEC/dt time series related to the same event but generated from different satellite-station links. Specifically, we select time series with variations in dsTEC/dt values occurring before the seismic event, ensuring that these perturbations are not caused by the earthquake. To select them, we modify the first two selection criteria used before:

  • Data availability only on DOY 259, excluding DOY 260 (those time series stop recording before the end of DOY 259, when the earthquake occurred, so there would be a gap between DOY 259 and 260)

  • Considerable (i.e., visually identifiable) variations in the dsTEC/dt time series that are not related to the earthquake (caused by noise or ionospheric background)

In this way, 18 dsTEC/dt time series are selected, enabling us to evaluate the performance of the model. The expectation for this application is to capture all or most TNs and minimize the occurrence of FPs. This is particularly crucial, as we are specifically considering time series with variations in sTEC that are not induced by the earthquake, ensuring a focus on our primary objective. The confusion matrix shows good results: 96.89% TNs and 3.11% FPs (5119 TNs and 164 FPs). These successful results are also confirmed by an accuracy of 0.97 (Fig. 6).
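This validation step can be sketched as follows (array names hypothetical): since none of these chunks contains an earthquake-related perturbation, every chunk predicted as class 1 is, by construction, an FP:

```python
# X_validation: chunks built from the 18 unseen dsTEC/dt time series.
y_val_pred = best_model.predict(X_validation)
fp_rate = y_val_pred.mean()   # fraction of chunks wrongly flagged (FPs)
tn_rate = 1.0 - fp_rate       # fraction correctly left unflagged (TNs)
```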

Fig. 6
figure 6

Time series of the G14 satellite and LSCH station, showing the results obtained by applying the model to one of the unseen dsTEC/dt time series, in which no earthquake-related perturbation occurs. The plot shows the FPs and TNs in the time series (a), with a zoom on the time frame where the FPs are detected (b)

Conclusions

This study successfully used two machine learning algorithms, Random Forest and XGBoost, to detect TEC variations induced by the 2015 Illapel earthquake and tsunami, an event known for its substantial ionospheric TEC signatures. We approached the problem as a supervised binary classification task and used the VARION-generated dsTEC/dt time series as the input to our model. We followed specific criteria to select 31 satellite-station links and then split the corresponding dsTEC/dt time series into individual 30-min chunks to create the feature matrix used in the machine learning algorithms.

We considered different classifiers, elevation cut-off angles, additional features, and pre-processing techniques. The best result, based on the F1 score, the average difference between the labeled and predicted middle epochs of the perturbation time frames, and the ROC curves, was obtained with the XGBoost classifier applied to the 15° elevation cut-off dsTEC/dt time series. The best performance was achieved by including the value range of the chunks as an additional feature and by tuning the hyperparameters using grid search.

Applying our final model to unseen test data, we obtained an overall F1 score of 0.77, a recall of 0.74, and a precision of 0.80. Focusing on the testing samples, 183 of 247 (74.09%) were correctly classified as containing sTEC variations related to the earthquake and the tsunami (TPs). Furthermore, 2975 of 3021 (98.49%) of the testing samples were correctly classified as containing no earthquake-induced sTEC variations (TNs). Thus, 64 samples (25.91%) were wrongly classified as containing no event-related sTEC variations (FNs), and 46 (1.51%) were wrongly classified as containing sTEC variations related to the earthquake and tsunami (FPs).

Our model, trained on a smaller data set and a single event, achieves competitive TP and TN rates compared to the study by Brissaud and Astafyeva (2022), which used a substantial data set of 12 earthquake events, underscoring its potential utility in operational early warning systems.

This model demonstrated a 75-s average difference in predicting perturbation time frames for the testing links, equivalent to an average difference of five steps in the 15-s-step time series. This highlights the algorithm's potential for early detection of ionospheric perturbations caused by earthquakes and tsunamis, aiding early warning purposes.

The model’s versatility allows application in an operational real-time setting using real-time GNSS data, as it only needs the VARION-generated real-time sTEC time series (dsTEC/dt). In that case, buffer time for the conversion of the sTEC time series, tools for storing the processed data, powerful servers, and adequate computational resources need to be considered.

The model takes a few minutes to detect TIDs, presents a very low percentage of FPs (1.51%), and shows high computational efficiency, crucial for effective early warning systems. However, real-time situations may increase the computational time due to the higher data frequency (1 s) and larger data set size, emphasizing the need for continuous improvement.

Finally, we validated our results by applying the best model to dsTEC/dt time series related to the same event but generated from different satellite-station links. In detail, those time series had variations in dsTEC/dt values occurring before the seismic event, thus not caused by the earthquake. We obtained promising results: 96.89% TNs, 3.11% FPs, and 97% accuracy. However, a complete validation process also involves applying the model to similar events. We acknowledge the importance of external validation on entirely different data sets to assess the model's performance in different scenarios, which will be addressed in future work.

In conclusion, we have demonstrated a powerful tool for timely and accurate identification of ionospheric perturbations linked to seismic events. Our study not only provides a valuable contribution to the field of ionospheric research but also sets the stage for the integration of advanced machine learning techniques into operational early warning systems, improving our ability to respond proactively to seismic events and associated hazards.

However, recognizing the current model's limitations, particularly the low sampling frequency and limited quantity of the data, future studies should fine-tune the algorithm for real-time, high-resolution data, widening the analysis to different kinds of TIDs and data sets. Furthermore, the model should be improved by incorporating additional features and by optimizing computational efficiency through parallel processing, together with the analysis of its generalization capabilities. This will enhance the model's robustness and effectiveness in real-time applications. Moreover, a database storing events (such as earthquakes, tsunamis, and volcanic eruptions) with their characteristic features (time frame, waveform, frequency content, and period) should be established. In this way, it will be possible to collect several events that can be adopted within different algorithms, enabling continuous learning. These outcomes could then be used as tools that enable early warning systems to combine ionosphere-derived data with other information, achieving integrated systems that work synergistically and enhancing the overall effectiveness of disaster prediction and mitigation strategies.