Introduction

Earthquake prediction can be classified into Short-Term (SHT), Intermediate-Term (IMT), and Long-Term (LT) prediction. SHT prediction requires non-seismic precursors and, according to Ghaedi and Ibrahim (2017) (Ghaedi et al. 2018), it has not been achieved yet. IMT prediction, on the other hand, uses algorithms such as CN, MSc, M8, and M8S combined with precursors and seismicity monitoring. The standard procedure for these algorithms uses generic concepts of pattern recognition that allow several sets of EQ precursors to be handled and permit regular seismicity monitoring and extensive testing of predictions, according to Uyeda et al. (2011). However, based on the studies of Ghaedi and Ibrahim (2017) and (Mallouhy et al. 2019), IMT estimations cannot prevent all the damage that could be caused or protect all human life; they may, however, support specific affordable actions that decrease damage and losses and improve post-disaster relief. Despite strenuous efforts and the several models developed, no successful technique has been found yet. Because of the randomness of EQs, it may not be possible to determine the exact location, magnitude, and time of the next fatal EQ. Concerning LT prediction, several models have likewise been designed, but no successful technique has been discovered yet. Machine Learning (ML) / Artificial Intelligence (AI) methods have shown some accuracy and are used in earthquake estimation. Although no approach yet detects earthquakes precisely, ML/AI can be used to study the achievable accuracy and to predict earthquakes from existing data.

(Cicerone et al. 2009) compiled a list of earthquake precursors that could potentially be used to predict earthquakes, mentioning electric and magnetic fields, gas emissions such as radon, and ultrasonic vibration models. (Hattori 2004) studied Ultra-Low-Frequency (ULF) emissions and stated that short-term earthquake prediction could be possible using ULF signals. (Ghosh et al. 2009) gave a brief review of the progress made in radon measurements in the earth sciences for predicting earthquakes; radon anomalies observed in soil or spring water before earthquakes have been studied, and they even proposed models that try to relate precursor time, epicentral distance, and earthquake magnitude. (Varotsos et al. 1986) came up with the VAN method, which measures low-frequency electric signals, i.e., Seismic Electric Signals (SES). The idea of using the earth's electric field before an EQ for short-term EQ prediction was applied in Greece by P. Varotsos. This is done by noting the potential change between the electric fields in the East–West (E-W) and North–South (N-S) polarity gradients. Earthquake size can be described by intensity or magnitude, and characteristics such as distance from the hypocenter, intensity, and duration of the vibrations vary with each new observation of earth behaviour. Hence, it becomes hard to use approaches like distance-based methods, and this paper therefore uses a Support Vector Machine for the analysis.

This study applies the same knowledge: the potential change between the electric fields in the E-W and N-S polarity gradients is calculated and then used to design various models that estimate an earthquake's magnitude and time. The work also serves as a comparative study, noting the accuracy and precision across the various models developed.

Related works

Earthquake prediction can be attempted through the calculation of magnitudes. (Asim et al. 2017) studied earthquake prediction using ML techniques such as pattern recognition neural networks, recurrent neural networks, Random Forest (RF), and an ensemble of trees using LPBoost, which is a linear combination of many tree classifiers where each classifier is added iteratively to the set of selected classifiers until no further tree needs to be added (Adeli and Panakkat 2009). (Panakkat and Adeli 2009) tried to predict earthquakes with a probabilistic neural network, applying Bayesian statistics and non-parametric density approximation to build the model. Their model was accurate for earthquakes with a magnitude between 4.5 and 6.0 but did not yield accurate results for magnitudes greater than 6.0. (Zarour et al. 2012) used an artificial neural network to predict earthquake magnitudes in the northern Red Sea region, including the Gulf of Aqaba, the Gulf of Suez, and the Sinai Peninsula, and compared it with other forecasting methods such as a moving average, a normally distributed random predictor, and a uniformly distributed random predictor (Freund et al. 2017).

They also used different statistical methods and data fitting, such as linear, quadratic, and cubic regression. The results showed that the neural network model provided higher forecast accuracy than the other proposed methods, performing at least 32% better. A neural network can capture non-linear relationships better than statistical methods and the other proposed methods (Varotsos et al. 1986).

(Vasti and Dev 2020) applied ML algorithms to analyze earthquake data. (Zhou et al. 2019) analyzed earthquake detection using dictionary learning. (Galkina et al. 2019) surveyed earthquake prediction using ML methods. Earthquake prediction can also rely on non-seismic precursors, which help in short-term earthquake prediction. (Sgrigna and Conti 2012) looked for a deterministic approach to investigating earthquake prediction, studying the methodological aspects of damage prevention and prediction approaches, and proposed an empirical approach to deterministic earthquake prediction based on medium-term and short-term ground and space precursory phenomena. (Moustra et al. 2011b) tried using Artificial Neural Networks (ANN) to predict earthquakes in the region of Greece from different input data types; the model tried to predict the magnitude of the earthquake of the following day. (Xu et al. 2010) used a series of physical quantities measured by the DEMETER satellite, including electron density, electron temperature, ion temperature, and oxygen ion density, together with seismic belt information, to build sample sets for a back-propagation neural network, which was then used to conduct the prediction. (Jánský and Pasko 2018) tried to use earthquake lights as a precursor for predicting earthquakes; according to them, an earthquake light occurs before, during, or after an earthquake (Alarifi et al. 2012; Moustra et al. 2011a; Mallouhy et al. 2019).

(Grisoni 2017) theorized that the "Peroxy Defect Theory" is capable of explaining the multitude of pre-earthquake phenomena and examined different types of precursory signals that are claimed to change noticeably before an earthquake. (Ghosh et al. 2019) studied the possibility of lower-ionospheric anomalies at very low frequencies before earthquakes, although they could not find a reason why lithospheric variabilities relate to and result in ionospheric irregularities. Using analytical theory, they created a simulation for their study and found that a possible setup for explaining earthquake lights (EQL) would be when the upper pole of the source current dipole is shifted close to the Earth's surface. They also found that VLF wave propagation studies (Vasti and Dev 2020) could help in understanding the cause-and-effect scenario of seismo-ionospheric coupling. The polarity gradients primarily relate to the earth parameters used for associating the magnitude; algorithms are available for finding the fractional variation in the magnitude, and the classification is also related to the gradients.

Another approach to earthquake prediction is a hybrid technique, i.e., combining different earthquake prediction methods. (Astuti et al. 2014) came up with an earthquake prediction technique that combines the Singular Value Decomposition (SVD) technique for feature extraction with Support Vector Machines (SVM) to classify the EQ. (Zhou et al. 2017) proposed a system that combined an SVM and a neural network; their experimental results showed that the combined algorithm had better predictability than a traditional SVM or neural network alone. The neural network on its own resulted in overfitting or underfitting, but the combination of SVM and neural network mitigated that disadvantage.

(Saba et al. 2017) proposed an earthquake prediction technique using the Bat Algorithm and a Feed Forward Neural Network (FFNN). The Bat Algorithm was used to train the weights, and the FFNN was used to predict future earthquakes based on past input data. Their experimental results showed that the proposed approach was comparable to and more stable than the Back Propagation Neural Network. (Asim et al. 2018) proposed an earthquake classification system that combined a Support Vector Regressor and a Hybrid Neural Network (HNN) to predict the earthquake. The HNN was a combination of three different neural networks, supported by Enhanced Particle Swarm Optimization (EPSO), which offered weight optimization at each layer. Their numerical results showed improved prediction performance for all the considered regions compared with previous prediction studies (Jánský and Pasko 2018).

The reason for choosing the classifiers and algorithms is outlined below:

  a) Since the earthquake data can be time-varying, Neural Networks are chosen.

  b) The features are numeric, and hence the K-Means approach is chosen; it provides scope for choosing the value of K using the Elbow method.

  c) The approaches are chosen mainly to be robust to outliers, so that performance does not degrade on a new dataset.

Implementation

This section elaborates on the implementation of the system's magnitude estimation and time estimation design using Earth's Electric Field Signals (EEFS).

Preprocessing of the electric field signals

In this study, the EEFS taken from Athens (ATH) for the years 2004–2011, Pyrgos (PYR) for the years 2004–2011, and Hios (HIO) for the years 2007–2008 form the data to be preprocessed for designing the dataset. The data for 2003 (for ATH), 2003 and 2012 (for PYR), and 2006 and 2009 (for HIO) are preprocessed and kept separately for additional testing (the validation testing dataset) after the model has been designed, to obtain an unbiased estimate of accuracy and precision. The usage of these testing data is elaborated further in the subsequent sections. These datasets were collected from www.earthquakeprediction.gr, which gathers the EEFS from the three sites for research purposes.

In the files used for training, each year has 365/366 days, so there is a total of 365/366 files per year, each representing one day. A file contains a minute-by-minute reading of the electric field signal, amounting to 1440 samples per file. There are five columns in each file, described below (as explained by www.earthquakeprediction.gr):

  1) First Column – Time in hour:minute format (hh:mm)

  2) Second Column – EEFS data along the E-W direction

  3) Third Column – Ignored

  4) Fourth Column – EEFS data along the N-S direction

  5) Fifth Column – Ignored

So, the first step of the preprocessing is to calculate the EEFS magnitude \(\left(\left|\overrightarrow{E}\right|\right)\) for each minute by summing the squares of the E-W and N-S components and taking the square root of the summation, as shown in Eq. (1):

$$\left|\overrightarrow{E}\right|= \sqrt{{E}_{EW}^{2}+{E}_{NS}^{2}}$$
(1)

Once this formula has been applied to all the data, the difference of the EEFS (GfDiff) is calculated for each day, as shown in Eq. (2):

$$\Delta E=E\left[n\right]-E\left[n-1\right]$$
(2)

After applying the formula, there is a total of 1439 readings per day. The GfDiff of one day can be seen in Fig. 1. From these data, the peak value of each day is found, and the highest value and its date are recorded for further processing to create the earthquake dataset, from which the models for estimating magnitude and time are separately built.

Fig. 1
figure 1

GfDiff of January 1, 2005, from the Athens monitoring site
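As an illustration, this per-day preprocessing can be sketched as below; the function and argument names are ours, and each daily file is assumed to have already been parsed into arrays of minute-wise E-W and N-S readings:

```python
import numpy as np

def daily_peak_gfdiff(e_ew, e_ns):
    """Peak GfDiff of one day from minute-wise E-W and N-S readings
    (1440 samples each)."""
    e_mag = np.sqrt(np.square(e_ew) + np.square(e_ns))  # Eq. (1), per minute
    gf_diff = np.diff(e_mag)                            # Eq. (2), 1439 values
    return np.max(np.abs(gf_diff))                      # daily peak (absolute value assumed)
```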

Collecting the earthquake data

A dataset from Kaggle was used to collect the data required to create the final dataset on which the models are applied. This Kaggle dataset has 8 columns, described below:

  1) First Column: Year (yyyy)

  2) Second Column: Month

  3) Third Column: Date

  4) Fourth Column: Hours [Column has been ignored]

  5) Fifth Column: Minutes [Column has been ignored]

  6) Sixth Column: Latitude

  7) Seventh Column: Longitude

  8) Eighth Column: Magnitude (Richter scale)

The dataset has all the earthquake recordings for Greece from 1901 to 2018. From this, for the final dataset, earthquakes from 2004–2011 with a magnitude greater than 5.0, a latitude ranging from 35–39.9, and a longitude ranging from 20–27.9 were extracted. The retrieved dates on which an earthquake took place can be seen in Table 1.

Table 1 The Earthquake Dataset retrieved after the necessary extraction

The Date column corresponds to the observed date; the other parameters, namely latitude, longitude, and magnitude, are related to the earth's movements. It is a standard dataset, and the ground truth of the values and constraints exists, hence no preprocessing is performed on it.

The features do not present any constraints in terms of noise and values, and no feature selection approach is followed in this paper; in future work, Principal Component Analysis could be used.
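For illustration, the extraction step could look like the following pandas sketch; the file name and column names are assumptions, as the Kaggle export is not named in the text:

```python
import pandas as pd

# Hypothetical file and column names for the Kaggle Greece catalogue
quakes = pd.read_csv("greece_earthquakes.csv", header=None,
                     names=["year", "month", "day", "hours", "minutes",
                            "latitude", "longitude", "magnitude"])

# Keep 2004-2011 events with M > 5.0 inside the studied region
selected = quakes[quakes["year"].between(2004, 2011)
                  & (quakes["magnitude"] > 5.0)
                  & quakes["latitude"].between(35.0, 39.9)
                  & quakes["longitude"].between(20.0, 27.9)]
```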

Data pruning

After retrieving the dates on which an earthquake took place between 2004 and 2011, an algorithm is designed to extract the dates that show a significant GfDiff value 30 days or fewer before the earthquake. The algorithm is explained using the flowchart in Fig. 2.

Fig. 2
figure 2

Flow diagram of identifying and recording the significant GfDiff that occurs before an Earthquake

After finding the significant GfDiff that occurs 30 days or fewer before the earthquake, that date is recorded along with the number of days between the two dates (a sketch of this step follows the column list below). The final dataset designed can be described as follows:

  1) First Column: Date of the EQ

  2) Second Column: Date prior to the EQ with significant GfDiff

  3) Third Column: No. of days

  4) Fourth Column: Latitude

  5) Fifth Column: Longitude

  6) Sixth Column: Magnitude
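A minimal sketch of this pruning step, assuming `selected` from the extraction sketch above and a `daily_peaks` mapping from date to peak GfDiff (both names hypothetical); the significance threshold is left as a parameter, since the text does not state its value:

```python
import pandas as pd

def prune(selected, daily_peaks, threshold, window=30):
    """For each earthquake, find the day within `window` days before it
    with the highest peak GfDiff, and keep the pair if it is significant."""
    rows = []
    for _, eq in selected.iterrows():
        eq_date = pd.Timestamp(int(eq["year"]), int(eq["month"]), int(eq["day"]))
        days = [eq_date - pd.Timedelta(days=lag) for lag in range(1, window + 1)]
        best = max(days, key=lambda d: daily_peaks.get(d, 0.0))
        if daily_peaks.get(best, 0.0) > threshold:   # significant GfDiff only
            rows.append([eq_date, best, (eq_date - best).days,
                         eq["latitude"], eq["longitude"], eq["magnitude"]])
    return pd.DataFrame(rows, columns=["eq_date", "gfdiff_date", "days",
                                       "latitude", "longitude", "magnitude"])
```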

The total number of readings that have been recorded from the three sites is mentioned below:

  • ATH: 56 Readings

  • PYR: 56 Readings

  • HIO: 25 Readings

All the above readings have been combined into one dataset with a total of 137 readings; the first 20 rows of the dataset are visible in Table 2. Following the dataset's formation, it was split into training and testing datasets with a test size of 33%, and the testing set was stored separately, i.e., a formal training and testing split was used for designing the models. These models were used to estimate the magnitude and the time, as elaborated in the following subsections.

Table 2 Table Displaying the First 20 Readings from the Final Dataset That Was Formed

Figure 2 has Start and End nodes, indicating the beginning and end of the procedure. The flow paths connect the operations and carry no conditions of their own; the conditional branches have already been labelled Yes/No in the figure.

Magnitude estimation

The magnitudes can be classified as shown in Table 3. On studying the dataset on which the models would be trained and tested, it was noticed that the number of readings with a magnitude greater than 6 is small. This affected the accuracy of the models designed, resulting in an ill-defined F1-score.

Table 3 Earthquake Magnitude Classes

Hence, during the implementation of magnitude estimation, two separate classification schemes were tried. The first follows the classification below:

  • 5 – 5.9 – Class 0

  • 6 – 6.9 – Class 1

  • 7 – 7.9 – Class 2

  • 8 or more – Class 3

The second classification scheme was designed as shown below:

  • 5.0 – 5.5 – Class 0

  • 5.6 – 5.9 – Class 1

  • 6.0 – 6.5 – Class 2

  • More than 6.5 – Class 3

This was done to study any differences in accuracy and precision when testing the model’s predictability. This will be elaborated on further in the subsequent sections.
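For concreteness, the two binning schemes can be written as below (function names are illustrative):

```python
def magnitude_class_type1(m):
    """Type 1: 5-5.9 -> 0, 6-6.9 -> 1, 7-7.9 -> 2, 8 or more -> 3."""
    return min(int(m) - 5, 3)

def magnitude_class_type2(m):
    """Type 2: 5.0-5.5 -> 0, 5.6-5.9 -> 1, 6.0-6.5 -> 2, above 6.5 -> 3."""
    if m <= 5.5:
        return 0
    if m < 6.0:
        return 1
    if m <= 6.5:
        return 2
    return 3
```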

Time estimation

To implement this, the dataset was classified into four classes, as seen in Table 4. This classification was used in training the various models.

Table 4 Classification Table for Time Estimation

As seen in Table 4, the class is defined according to the number of days. For instance, referring to Table 2, the first reading states that the number of days between the date of the EQ and the date with significant GfDiff before the EQ is 24 days, which places this reading into class 3. Similarly, all the readings are given their class, and the dataset is then divided into training and testing data on which the models are trained. The accuracy, precision, and observations made are elaborated further in the subsequent sections. The dataset does not have any errors; a round-off operation alone is performed on the data for the algorithms to work.
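A corresponding sketch for the time classes of Table 4; applied to the example above, a 24-day gap maps to class 3:

```python
def time_class(days):
    """1-7 days -> 0, 8-14 -> 1, 15-21 -> 2, 22 or more -> 3."""
    return min((days - 1) // 7, 3)

# Example from Table 2: a 24-day gap falls into class 3
assert time_class(24) == 3
```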

Table 4 defines the class-level rules required for classifying the earthquake features. It is related to Table 3, as together they form the basis for classifying the earthquake data; more details are available in (Zhou et al. 2017).

Artificial Neural Network

By definition, an ANN can be considered an information processing model inspired by the biological nervous system of humans, much like how the brain processes information. It comprises highly interconnected processing elements (i.e., neurons) that work together to find a pattern for solving a problem.

In this study, two ANN models are designed: one to estimate the magnitude and the other to estimate the time. The networks are built using Keras, a high-level API, and both models are similar in design. There are 4 input dimensions in the input layer, three hidden layers, and finally an output layer of one unit with sigmoid as the activation function. Overfitting can be a problem in neural networks; one of the steps taken to avoid it is adding dropout, a regularization technique that reduces overfitting by changing the network. When fitting the model, the number of epochs is set to 500. Another form of overfitting is memorization, which could occur since the entire dataset is passed forward and backward through the neural network on each epoch.

Another point to keep in mind is that underfitting should not happen either. To check this, the training and validation loss during each iteration can be noted. There could be overfitting if the training loss is less than the validation loss, and underfitting if the training loss is greater than the validation loss; the 'just right' condition occurs when the training and validation losses are almost the same (Fig. 3).

Fig. 3
figure 3

Training Loss and Validation Loss Plotted in a graph. Values are quite similar, so there is no overfitting or underfitting

The x-axis represents the epochs, and the y-axis the measured loss. To achieve this, the number of epochs needs to be controlled: if the number of epochs required for fitting the model is low but the number of epochs is set to a value like 500, overfitting or underfitting could occur. So, even after regularization was applied, early stopping was also set; it is another form of regularization in which the model stops iterating once its performance starts to degrade. As seen in Fig. 3, even though the number of epochs is set to 500, the iteration stops after 30 because the performance would degrade if the iteration continued.
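A minimal Keras sketch of this setup is given below. The hidden-layer widths and dropout rate are not stated in the text and are assumptions, and the single sigmoid output unit is handled here by scaling the class labels into [0, 1] and rounding predictions back, consistent with the round-off step mentioned earlier:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(input_dim=4):
    """ANN as described: 4 inputs, three hidden layers with dropout,
    and a single sigmoid output unit."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(16, activation="relu"),    # hidden widths are assumptions
        layers.Dropout(0.2),                    # dropout regularization
        layers.Dense(16, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # single sigmoid output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_ann()
# Early stopping with a patience argument; the best weights are restored
early = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                      restore_best_weights=True)
# Hypothetical training call: class labels 0..3 scaled into [0, 1]
# model.fit(X_train, y_train / 3.0, epochs=500,
#           validation_split=0.2, callbacks=[early])
# y_class = np.rint(model.predict(X_test).ravel() * 3).astype(int)
```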

Thus, keeping all this in mind, the ANN was designed, and the accuracy and precision were observed and noted; these are elaborated further in the subsequent sections.

SVM-KNN

In order to design this model, an advanced ensemble learning technique called blending is used, as shown in Fig. 4.

Fig. 4
figure 4

Illustration of the Blending of the SVM-KNN Model

Here the training data is further divided into training data and validation data. The new training data is fitted into the SVM base model, which is then tested on the validation data; the original testing data is also passed through the model. The prediction results on the validation set and the testing set are added as features, forming the new dataset, as seen in Fig. 5.

Fig. 5
figure 5

Schematic Diagram of the Blending of SVM and KNN

As seen in Fig. 5, the predicted features are added to the validation dataset and the testing set. The validation dataset becomes the new training data, which is then fitted into the KNN model. The testing data is applied to the resulting model, and the accuracy and precision are observed.

The dataset is thus split into training, validation, and test data, with the validation data used for defining the parameters of the final model. A side effect is the possibility of losing some classes in the smaller training set, leading to ill-defined scores for individual classes.
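A sketch of this blending scheme with scikit-learn is shown below; the 33% split sizes follow the text, while the model hyperparameters are library defaults, not values from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def blend_svm_knn(X, y, seed=0):
    """SVM as base model, KNN as final model, via blending."""
    # Outer split: 33% of the data held out for testing, as in the text
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                              random_state=seed)
    # Inner split: carve validation data out of the training data
    X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.33,
                                                random_state=seed)
    base = SVC().fit(X_tr, y_tr)                 # base model
    # Base predictions become an extra feature on validation and test sets
    X_val = np.column_stack([X_val, base.predict(X_val)])
    X_te = np.column_stack([X_te, base.predict(X_te)])
    # Final model: KNN trained on the augmented validation data
    final = KNeighborsClassifier().fit(X_val, y_val)
    return accuracy_score(y_te, final.predict(X_te))
```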

SVM-ANN

This model is similar to the SVM-KNN model, as seen in Fig. 6. Here, the SVM is the base model and the ANN is the final model; the schematic diagram (Fig. 5) explains how the process is implemented. In the ANN, the possibility of overfitting and underfitting needs to be taken into account, so regularization is applied by adding dropout, which modifies the neural network, and early stopping is used, which stops the iteration when the training and validation loss differ significantly.

Fig. 6
figure 6

Illustration of the blending of SVM and ANN Models

The ANN has an input layer with the input dimension set to 5, as the prediction result from the SVM is added to the validation dataset, increasing the input dimension by one unit. There are three hidden layers in the neural network, and the output layer has a single output using the sigmoid activation function. The accuracy and precision of this model are observed and elaborated on in the subsequent sections.

A drawback of the final model is that the dataset is changed, as mentioned previously in Sect. 3.7: only a small portion of the training data, i.e., the validation data, is used as the training data of the final model (the ANN). There is thus a possibility of losing some classes, which leads to ill-defined scores for individual classes.

SVM-KNN-Logistic Regression

This model uses SVM and KNN as the base models and then Logistic Regression as the final model. An illustration of how it works is depicted in Fig. 7.

Fig. 7
figure 7

Illustration of the Blending of SVM, KNN, and Logistic Regression

As shown in Fig. 7, the dataset is split into training and testing data. This training data is further split into training and validation data. This new training data is fitted into the SVM model, and then this model is tested using the validation data. The results yielded from the prediction of the validation data are recorded. After this, the new training data is fitted into the KNN, a base model. The KNN model is tested with the help of the validation data. The results yielded from the prediction of the validation data are also recorded.

In the previous cases, only the SVM's predicted results were added to the validation data, but here the prediction results of both SVM and KNN are added, increasing the input dimensionality by 2. The original testing data is also predicted by the base models SVM and KNN, and the results are added to it, likewise increasing its dimensionality by two; this testing data is used to test the final model. The validation data now becomes the training data and is fitted into the final model, the Logistic Regression model, which is then tested with the original testing data, and the accuracy and precision of its predictions are observed. The entire process is illustrated in the schematic diagram (Fig. 8); it can be observed that this process is similar to the blending of SVM and ANN, or even SVM and KNN, illustrated in Fig. 5.

Fig. 8
figure 8

Schematic Diagram of the Blending of SVM, KNN, and Logistic Regression
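Extending the earlier blending sketch to two base models feeding a Logistic Regression final model, under the same assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def blend_two_base(X, y, seed=0):
    """SVM and KNN as base models, Logistic Regression as final model."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                              random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.33,
                                                random_state=seed)
    bases = [SVC().fit(X_tr, y_tr),
             KNeighborsClassifier().fit(X_tr, y_tr)]
    # Both base predictions are appended, adding two feature columns
    X_val = np.column_stack([X_val] + [b.predict(X_val) for b in bases])
    X_te = np.column_stack([X_te] + [b.predict(X_te) for b in bases])
    final = LogisticRegression(max_iter=1000).fit(X_val, y_val)
    return accuracy_score(y_te, final.predict(X_te))
```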

As mentioned before in Sect. 3.7 and Sect. 3.8, a drawback of the final model is that the dataset is changed: a small portion of the training data, i.e., the validation data, is used as the training data of the final model. There is a possibility of losing some classes, which leads to ill-defined scores for individual classes. This issue is usually prevalent when the amount of training and testing data is not large; since the dataset only contains 137 readings, it can be seen here. The dataset's size could not be increased, as there were not enough EEFS readings available.

SVM-ANN-KNN

As shown in Fig. 9, and similar to Sect. 3.9, the dataset is divided into training and testing datasets with a test size of 33%, and the training data is further divided into training and validation data, also with a test size of 33%. This new training data is fitted into the base model SVM, creating the SVM model, and the same new training data is then fitted into the second base model, the ANN.

Fig. 9
figure 9

Illustration of the Blending of SVM, ANN, and KNN

Here, the ANN model contains an input layer with an input dimension of 4, three hidden layers, and finally an output layer with a single sigmoid output. Overfitting and underfitting are taken into account: early stopping halts the iterations when a large difference between the training and validation loss appears, and dropout layers have been added to modify and regularize the neural network. After the training data is fitted into the base models, the validation data is used to test each model's predictability. The results obtained from the base models are added to the validation data, increasing its dimensionality by two units. The original test data is also passed through the base models, and the results are added to the test data. The new validation data becomes the training data for the KNN model, and the new test data is used to test the model's predictability. The accuracy and precision are then observed and elaborated in the subsequent sections.

The schematic diagram in Fig. 8 explains this process as well, with the difference that KNN is the final model instead of a base model, and the base models are ANN and SVM instead. The drawbacks are the same as in Sect. 3.10 and Sect. 3.9, i.e., the loss of classes, which affects the precision, accuracy, and F1-scores, making them ill-defined.

SVM-ANN-Logistic Regression

As seen in the illustration (Fig. 10), the dataset is divided into training and testing data, and this training data is further split into training and validation data.

Fig. 10
figure 10

Illustration of the Blending of SVM, ANN, and Logistic Regression

This new training data is fitted into the base models SVM and ANN, and the validation data is then tested on these base models. The results obtained from these base models' predictions are added to the validation data, increasing its dimensionality by two units. The same is applied to the testing data, and the results are added to the testing dataset.

Here too, the ANN model contains an input layer with an input dimension of 4, three hidden layers, and an output layer with a single sigmoid output. Overfitting and underfitting are taken into account: early stopping halts the iterations when a considerable difference between the training and validation loss exists, and dropout layers have been added to modify and regularize the neural network.

The validation data, with the base models' prediction results added, is used as the training dataset of the final Logistic Regression model, and the augmented testing data is used to test its predictability. The accuracy and precision are then observed and elaborated in the subsequent sections.

The schematic diagram in Fig. 8 explains this process as well, with the difference that the base models are ANN and SVM instead, while the final model remains the same. The drawbacks are the same as in Sect. 3.10 and Sect. 3.9, i.e., the loss of classes, which affects the precision, accuracy, and F1-scores, making them ill-defined.

Performance evaluation

This section discusses the evaluation of the performance of the models designed and tested. A validation dataset had also been created for an unbiased observation and estimation of the models, with data collected from the sites Athens (ATH), Pyrgos (PYR), and Hios (HIO). These years contain inconsistent data, as the EEFS readings of some days are not available. The years of data taken from the sites and the available ranges of dates are listed below:

  • ATH – Contains data from the year 2003, from April 15 to December 31

  • PYR – Contains data from the year 2003, from May 23 to December 31, and data from the year 2012, from January 1 to March 17; in this folder, specific dates were found to be missing

  • HIO – Contains data from the year 2006, from March 18 to December 31, and data from the year 2009, from January 1 to November 11

Since this data was used just for validation purposes, the missing dates were overlooked.

The final validation dataset, formed after preprocessing the EEFS, extracting the earthquake dates, and combining them with the dates having a significant GfDiff 30 days or fewer beforehand, can be seen in Table 5.

Table 5 A dataset that was used for Validation Testing

Magnitude Estimation type 1

The classification of the magnitude of type 1 is as follows:

  • Class 0 – Magnitude between 5 and 5.9

  • Class 1 – Magnitude between 6 and 6.9

  • Class 2 – Magnitude between 7 and 7.9

  • Class 3 – Magnitude 8 or more

The dataset was split into training and testing sets, and the models were trained accordingly. In the models that include an ANN, overfitting and underfitting were taken into consideration: the number of epochs when training the model was set to 500, and a dropout layer was added to regularize each layer. Additionally, an early stopping feature was used to keep the training and validation loss similar. When the training loss is greater than the validation loss, it is a case of underfitting; when the training loss is less than the validation loss, it is a case of overfitting; the 'just right' condition occurs when the two losses are similar. So, the early stopping feature was implemented so that the model stops training when its performance ceases to improve. But in some instances the model's performance may worsen and then start to improve again, so a patience argument was set, and a callback feature saves the best model observed and loads it afterwards. The training and validation losses are plotted in Fig. 11 (a) for the ANN model, Fig. 11 (b) for the SVM-ANN model, Fig. 11 (c) for the SVM-ANN-KNN model, and Fig. 11 (d) for the SVM-ANN-Logistic Regression model, along with the number of epochs used while training the data for magnitude estimation of type 1.

Fig. 11
figure 11

(a) ANN (b) SVM-ANN model (c) SVM-ANN-KNN model (d) SVM-ANN-Logistic Regression

As shown in Fig. 11 (a), (b), (c), and (d), early stopping and the callback helped create models with the training and validation loss as low as possible. With their help, the ANN stops training after 19 epochs; SVM-ANN, after 316 epochs; SVM-ANN-KNN, after 34 epochs; and finally, SVM-KNN-Logistic Regression, after 35 epochs. Error measurements help estimate the performance of a model. To comparatively analyze the performance of the models created, accuracy, macro-precision, and the macro F1-score are analyzed. Precision measures how many of the returned results are relevant rather than irrelevant, but a model is not necessarily good based on precision alone, so it is not the best performance measurement by itself. Accuracy checks the percentage of accurate predictions. The F1-score, on the other hand, is the harmonic mean of precision and recall. Since the label sizes are not balanced in the dataset, macro-precision and macro F1-scores are analyzed; macro measurements ensure that there is no bias against the least popular labels. The performance of the models for magnitude estimation of type 1 can be seen in Table 6. Further inferences are presented below:

  1) The original dataset does not have the class label. The class label learned with unsupervised learning allows for classifying the earthquake dataset, and the class label feature is supported as part of the training dataset.

  2) Hence, during testing, a suitable class label is predicted for the dataset using the ANN.

Table 6 Performance Evaluation of The Testing Data on The Different Models Designed for Magnitude Estimation of type 1

Since an ANN is used, the learning rate is related to the error learned during the training process; it allows the loss to be reduced further.

On checking the testing data's performance, it can be observed that for magnitude estimation of type 1, SVM-ANN-Logistic Regression has better performance in predicting the magnitude than the other models, while all the other models show the same performance result. This could be mainly because of the unbalanced label sizes, i.e., the amount of Class 1 data is much smaller than that of Class 0, due to which the prediction of samples is ill-defined. The validation test's performance evaluation, however, was observed to have 100% accuracy: looking at the validation dataset (refer to Table 5), all the magnitudes fall into the Class 0 category under the magnitude classification of type 1. Thus, validation testing is not helpful for the performance evaluation of the magnitude classification of type 1.

Magnitude Estimation type 2

The classification of the magnitude of type 2 is as follows:

  • Class 0 – Magnitude between 5.0 and 5.5

  • Class 1 – Magnitude between 5.6 and 5.9

  • Class 2 – Magnitude between 6.0 and 6.5

  • Class 3 – Magnitude more than 6.5

Magnitude estimation of type 2 is similar to that of type 1; the only difference lies in the classification of the classes. Thus, when training the models with an ANN, the number of epochs needed while fitting the model changes, and the performance changes. In these models, dropout layers have been added to regularize the neural networks, and the number of epochs when training the model was set to 500. To avoid overfitting and underfitting, early stopping and callbacks have been implemented; the primary function of early stopping is to halt the training process when the model's performance worsens. When the training loss is greater than the validation loss, it is a case of underfitting.

When the training loss is less than the validation loss, it is a case of overfitting; the 'just right' condition occurs when the training and validation losses are similar. So, the early stopping feature was implemented so that the model stops training when the performance ceases to improve. But in some instances the model's performance may worsen and then start to improve again, so a patience argument was set, and a callback feature saves the best model observed and loads it afterwards.
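In Keras terms, this patience-and-callback setup could look like the following sketch (the checkpoint file name is illustrative):

```python
from tensorflow import keras

# Stop when the validation loss ceases to improve, tolerating a few
# worse epochs (the patience argument) in case performance recovers
early = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
# Save the best model observed so far, to be loaded afterwards
ckpt = keras.callbacks.ModelCheckpoint("best_model.keras",
                                       monitor="val_loss",
                                       save_best_only=True)
# Hypothetical training call:
# model.fit(X_train, y_train, epochs=500,
#           validation_split=0.2, callbacks=[early, ckpt])
# best_model = keras.models.load_model("best_model.keras")
```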

The training versus validation loss graphs of the models involving an artificial neural network, i.e., ANN, SVM-ANN, SVM-ANN-KNN, and SVM-ANN-Logistic Regression, can be seen in Fig. 12 (a), (b), (c), and (d), along with the number of epochs used while training the data for magnitude estimation of type 2. As shown there, early stopping and the callback helped create models with the training and validation loss as low as possible: the ANN stops training after 3 epochs; SVM-ANN, after 3 epochs; SVM-ANN-KNN, after 377 epochs; and finally, SVM-KNN-Logistic Regression, after 3 epochs.

Fig. 12
figure 12

(a) ANN (3 epochs) (b) SVM–KNN (3 epochs) (c) SVM–ANN–KNN (377 epochs) (d) SVM–KNN–Logistic Regression (3 epochs)

Error measurements help estimate the performance of a model. To comparatively analyze the performance of the models created, accuracy, macro-precision, and the macro F1-score are analyzed. Precision measures how many of the returned results are relevant rather than irrelevant, but a model is not necessarily good based on precision alone. Accuracy checks the percentage of accurate predictions, and hence it is not the best performance measurement by itself. The F1-score, on the other hand, is the harmonic mean of precision and recall. Since the label sizes are not balanced in the dataset, macro-precision and macro F1-scores are analyzed; macro measurements ensure that there is no bias against the least popular labels. The performance of the models for magnitude estimation of type 2 can be seen in Table 7.
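These measures can be computed with scikit-learn as in the sketch below (variable names are illustrative):

```python
from sklearn.metrics import accuracy_score, precision_score, f1_score

def evaluate(y_true, y_pred):
    """Accuracy, macro-precision, and macro F1 for class labels 0..3."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # 'macro' weights every class equally, so rare labels are not
        # drowned out by the popular ones
        "macro_precision": precision_score(y_true, y_pred, average="macro",
                                           zero_division=0),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }
```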

Table 7 Performance Evaluation of the Testing Data on the Different Models Designed for Magnitude Estimation of Type 2

In these plots, the x-axis shows the epochs and the y-axis the measured loss. From the performance evaluation of the testing data, a difference can be seen compared with the previous magnitude estimation models. Here, ANN, SVM-KNN, and SVM-ANN show the same performance, while SVM-KNN-Logistic Regression, SVM-ANN-KNN, and SVM-ANN-Logistic Regression perform differently from the other three. The worst performance is shown by SVM-ANN-KNN, while SVM-ANN-Logistic Regression is the best-performing model for both types of magnitude classification.

On testing the performance of all the models with the validation dataset, all models show the same performance, i.e., 0.89 accuracy, 0.44 macro-precision, and 0.47 macro F1-score, except for SVM-KNN-Logistic Regression, which differs with a macro-precision of 0.48. The likely reason the performance is the same is that the validation data contains earthquakes with magnitudes between 5 and 6 (Table 5): there are only two readings of class 1, and the remaining 16 belong to class 0. Thus, performance evaluation on the validation test dataset is of limited use.

Time estimation

The classification used for estimating the time of an earthquake is shown below:

  • Class 0 – 1–7 days

  • Class 1 – 8–14 days

  • Class 2 – 15–21 days

  • Class 3 – 22 or more days

Here, the dataset is split into training and testing data used to design the models. In models that involve an ANN, the possibility of overfitting or underfitting is taken into account, so the layers are regularized using dropout layers, and the number of epochs for training has been set to 500. One epoch means the dataset is passed forward and backward through the neural network once, so 500 epochs means the dataset is passed through the neural network 500 times. The major problem is the possibility of memorization, where the neural network memorizes the result, which leads to overfitting.

So, during each iteration, the training and validation loss can be observed. If the training loss is less than the validation loss, overfitting occurs; if the training loss is greater than the validation loss, underfitting occurs. A 'just right' condition occurs when the training and validation losses are similar. If the model's performance keeps getting worse after each iteration, underfitting or overfitting could have taken place; to prevent this, early stopping has been set up, which halts the training if the model's performance starts to worsen. Additionally, the callback helps choose the model that best suits the performance.

The training versus validation loss has been plotted for the models with an ANN, as shown in Fig. 13. These plots show how each model stops training after a certain number of epochs based on the training performance; the default number of epochs has been set to 500. As seen in Fig. 13, except for the ANN, the models used all 500 epochs for training. The error measures observed, namely macro-precision, accuracy, and the macro F1-score, are used to estimate the performance of the models. Macro measures are chosen because the label sizes of the dataset are not perfectly balanced, and they avoid bias against the least popular labels; accuracy measures how many readings were predicted accurately. The performance measures of the models are shown in Table 8.

Fig. 13
figure 13

(a) ANN (3 epochs) (b) SVM–ANN (500 epochs) (c) SVM–ANN–KNN (500 epochs) (d) SVM–KNN–Logistic Regression (500 epochs)

Table 8 Performance Evaluation of the Testing Data on the Different Models Designed for Time Estimation

From the model performance evaluation in Table 8, it is visible that ANN and SVM-ANN show the worst performance, with low accuracy, macro-precision, and macro F1-scores. On the other hand, SVM-ANN-KNN and SVM-KNN-Logistic Regression show good performance, and SVM-KNN and SVM-ANN-Logistic Regression perform reasonably well too. The model parameters have been based on Galkina et al. (2019).

The validation testing data was used to check the performance of the models. Unlike the magnitude estimation, the time estimation showed different performance results across the different models; even though this dataset had an imbalance, it was not as prominent as in the magnitude estimation.

From Table 9, as expected, ANN and SVM-ANN performed poorly. SVM-KNN showed a decent predictability performance. On the other hand, the validation data was correctly predicted by SVM-KNN-Logistic Regression and SVM-ANN-Logistic Regression, with SVM-ANN-KNN following. Thus, SVM-KNN-Logistic Regression, SVM-ANN-KNN, and SVM-ANN-Logistic Regression can be considered ideal models in time estimation.

Table 9 Performance Evaluation of the Validation Test Data on The Different Models Designed for Time Estimation

Conclusion

This study aimed to design models that estimate the magnitude and the time of earthquakes. The prediction system uses Earth's Electric Field signals from Athens, Pyrgos, and Hios; the electric field dataset taken for processing covers the years 2004–2011. The electric field magnitude for each reading was calculated, followed by the GfDiff, and the maximum peak value of each day was found. The day with the highest significant GfDiff occurring 30 days or fewer before the date of the earthquake was noted, and the number of days between the earthquake date and the date with significant GfDiff was calculated. Regression analysis is used as an initial step towards classifying earthquake magnitude; further, as the dimensionality is greater than 3, other classification and learning models have been used in the paper. Once the dataset is made, it is split into training and testing data with a test size of 33%.

Then the training data is fitted into the models designed for magnitude estimation and time estimation. The models designed were ANN, SVM-ANN, SVM-KNN, SVM-KNN-Logistic Regression, SVM-ANN-KNN, and SVM-ANN-Logistic Regression, and their performances were evaluated. It was observed that the data labels were not balanced for the magnitude estimation models, due to which the performance metrics were ill-scored. Reviewing the models for magnitude estimation, the model that did reasonably well was the SVM-ANN-Logistic Regression model, while SVM-KNN-Logistic Regression, SVM-ANN-KNN, and SVM-ANN-Logistic Regression performed well with good prediction results for the time estimation. For future work, the model could try to predict both the earthquake's magnitude and when it would occur, which would allow for safety precautions and direct the community with warning levels.