1 Introduction

Earthquakes are among the major catastrophes, and around 76 million people across about 39 states and provinces remain at risk from them [9]. Considering recent cases, more than 377 earthquakes of magnitude greater than 5.3 have occurred worldwide so far in 2020, and the United States alone has witnessed 37,307 earthquakes in the past 365 days, the largest number anywhere, with magnitudes ranging from 3.8 to 7.8 and numerous fatalities [51]. This is the main reason for carrying out this research on the United States, as it is the most affected region and has faced a great many earthquakes [34]. If this disaster could be predicted in advance, many human lives could be saved [43]. It cannot be denied that seismologists have not yet succeeded in developing an earthquake prediction model. Some people believe it is impossible to predict an earthquake in advance, while some researchers continue their efforts to construct such a model [1]. Hence, this remains a significant research challenge, and many researchers work on it [47].

Various reasons make earthquake magnitude prediction difficult for seismologists [55]. The absence of technology to monitor multiple parameters, including stress, pressure, and temperature changes, is one of the primary reasons [15]. This also explains the unavailability of data on seismic features [19]. The gap between seismologists and researchers in discovering new aspects of this problem makes it even more difficult [5]. Western states and provinces of the United States, including Alaska, California, Oregon, Hawaii, and Washington, are more prone to earthquakes [10]. Effects can also be seen in the mountain regions, the eastern seaboard (especially South Carolina), and the New Madrid Seismic Zone in the central US. Earthquake-prone areas of the United States are shown in Fig. 1 [16].

Fig. 1 Earthquake hazards in the United States [16]

As earthquakes are a huge threat to the lives and economy of a country [18], the main focus of this study is to predict the magnitude of earthquakes beforehand by utilizing a fusion of the best performing machine learning and deep learning regression models, and to compute the error values to validate the accuracy and performance of these algorithms.

Initially, the earthquake prediction problem was treated as a time series prediction problem [37]. Various studies show that seismologists have used different earthquake prediction mechanisms in the past, and various trends were observed in these prediction methods [11]. These trends involved sub-soil radon gas emission [24], the total electron content of the ionosphere [33], the magnetic field of the earth [45], etc. The proposed work aims to develop a prediction model that estimates earthquake magnitude in the United States region from position and depth parameters for various radius values. Three regression models, namely Random Forest Regression, MLP Regression, and SVR, are used in this research. Finally, all the proposed algorithms are compared empirically via RMSE evaluation. The objectives of the paper are as follows:

  • To split the entire dataset on the basis of various radius values, carry out a detailed exploratory data analysis on the resulting datasets, and study parameters such as latitude, longitude, and depth, along with their impact and share in causing an earthquake of a certain magnitude.

  • To use various supervised learning algorithms to predict the magnitude, and thus the severity, of earthquakes depending on the radius values and the aforementioned parameters.

  • To analyse the impact caused by earthquakes of certain magnitudes and carry out a detailed analysis of the Root Mean Square Error value for each algorithm, which depicts the deviation between actual and predicted results.

The rest of the paper is organized as follows: Section 2 gives a detailed review of the various research carried out worldwide on predicting earthquakes in advance. Section 3 discusses the methodology, the algorithms and techniques used, the evaluation metrics, and the proposed model of this research. Section 4 then presents the experimental results, including the analysis of the dataset and of each model individually, along with an empirical comparison of all the models followed by RMSE calculation and analysis. Finally, Section 5 provides the conclusion and suggests future areas of research.

2 Literature survey

Much research has been done on the prediction, scales, and harm caused by earthquakes. Predicting such parameters is crucial because earthquakes cause widespread damage and loss. Knowing the severity, potential areas, and harm caused by earthquakes is a big step towards better management of such calamities.

G. Lanzano et al. worked on revising the ground-motion model for shallow crustal earthquakes in Italy in the 4.0 to 6.9 magnitude range, using strong-motion data recorded up to the 2009 L'Aquila sequence. The new data collection allows the magnitude range to be extended beyond 6.9, including vibration periods of up to 10 s. The ground-motion variability is decomposed into between-event and site-to-site components to form a model suitable for non-ergodic probabilistic seismic hazard assessment [31].

G. F. Panza et al. presented an extended account of the assimilation of seismological and geodetic information, showing the contribution of geodesy to the understanding and forecasting of earthquakes [41]. P. Kundu et al. proposed a probabilistic approach, based on the Gutenberg-Richter law, for estimating the expected return time of an earthquake of a particular magnitude within a fixed structural life span, followed by the determination of the peak ground acceleration at the site of a structure in the Chilean region. The data is taken from the USGS (United States Geological Survey), and the procedure can be applied even before a structure is built at a site, to reduce the death toll caused by structural collapse [28].

Q. Wang et al. employed a deep learning approach, LSTM (long short-term memory) networks, to learn the spatio-temporal relationships among earthquakes in different regions and make predictions by exploiting those relationships. The results show that such networks with 2-D input can exploit spatio-temporal correlations to make far better predictions [54]. B. Idini et al. worked on a database of strong ground-motion records for Chilean subduction-zone earthquakes. They derived a ground motion prediction equation (GMPE) for peak ground acceleration along with response spectral accelerations at a 5% damping ratio for periods between 0.01 and 10 s [22]. D. Ju et al. proposed two new procedures for evaluating the fault parameters of asperity models for the prediction of strong ground motions from crustal earthquakes: one for long strike-slip faults and the other for long reverse faults [12].

G. M. Molchan et al. analyzed the premonitory seismicity pattern known as pattern B, evaluated in 13 areas of the world, and confirmed its high statistical significance. The mathematical approach developed there is useful in the analysis of earthquake precursors [36]. C. Papantonopoulos et al. used the distinct element method to predict the earthquake response of a multi-drum marble model of an ancient column. The outcomes are compared with experimental data for a similar specimen under similar excitation; both the experiments and the numerical analysis were carried out in 3D. The results show that the distinct element method can capture the main features of the response [42].

V. G. Gitis et al. suggested a new technique to estimate the parameters of inhomogeneous spatio-temporal marked point fields, built on the idea of adaptive weights smoothing (AWS). A variant of the AWS algorithm is constructed to calculate the spatial and spatio-temporal fields of density, the mean values, and the correlation dimension. The algorithm is used to assess seismic-process parameter fields from earthquake catalogs, and the AWS forecasting method outperformed forecasting based on kernel estimation [17]. G. Asencio-Cortes et al. analyzed, through a new framework, the effect of using various input parameterizations in supervised learning algorithms. Five different analyses were conducted, involving the adjustment of training and testing sets, the computation of the b-value, and the tuning of the collected measures [2].

K. M. Asim et al. worked on the prediction of earthquake magnitude using the temporal sequence of seismic activities combined with various machine learning classification algorithms. The prediction was based on eight seismic indicators computed from the earthquake catalog. Four techniques, namely recurrent neural networks, pattern recognition neural networks, linear programming boost ensemble classifiers, and random forest classifiers, were used to compute seismic parameters and further occurrences of earthquakes [4]. J. R. Holliday et al. worked on informatic pattern analysis using complex eigenvectors and created short-term hotspot forecast maps that differ from hotspot maps created using real-valued data; they also suggested methods for analyzing the differences and computing the information gain [21]. H. Cam et al. developed a feed-forward back-propagation artificial neural network based on the Gutenberg-Richter relation, in which the b-values of earthquakes are used [8].

From the research discussed above, it is observed that most studies use seismic wave activity and spatio-temporal features for prediction, while geological features have not been explored for this purpose. Using these parameters for magnitude prediction yields relatively low error and proves to be a sound approach. This study therefore focuses on predicting the magnitude of earthquakes from the latitude, longitude, depth, and radius of a given region, applying different regression algorithms for the analysis. This work aims to minimize the damage caused by earthquakes by predicting their magnitude, so that precautions can be taken beforehand, saving many lives along with commercial property.

3 Methodology

For analysing and predicting the magnitude of an earthquake, a United States dataset is used. After pre-processing of the dataset, various regression algorithms, including Random Forest, Support Vector Regression, and MLP Regressor, are applied; these are also discussed in this section. RMSE is used to measure performance. This section discusses the proposed methodology in detail.

3.1 Proposed model

An overview of the proposed model: the bulk dataset is divided into seven datasets on the basis of radius values, followed by data cleaning and further processing. The fresh datasets are then repeatedly split and trained, by applying the best performing regression models, until the best accuracy (lowest RMSE value) of 1.647 for the minimum radius and 0.428 for the maximum radius is achieved. Finally, validation is carried out for a fixed magnitude value, followed by computing the error metrics to measure efficiency.

Figure 2 represents the proposed model of our research. Initially, datasets of different radius values (100, 200, 500, 1000, 1500, 3000, and 5000) are taken and pre-processed. Each dataset is split into a training and a test set. The model is then trained with the algorithms Random Forest (RF) Regression, Multi-Layer Perceptron (MLP) Regression, and Support Vector Regression (SVR). The splitting and training continue until the maximum accuracy is achieved. The main motivation behind selecting these three algorithms for this problem statement is as follows:

  1. Ensemble algorithms are widely utilized for solving real-world problems; hence, Random Forest is chosen to solve the earthquake prediction problem with optimized results.

  2. SVR is popular for deriving a function that maps input domain values to real numbers; hence, SVR is also utilized in the proposed work.

  3. The MLP Regressor can learn models incrementally in real-time problems such as this earthquake prediction model (e.g., via the partial_fit method); hence, it is also considered while designing the solution.

After the completion of the training, the models are then validated to predict a constant magnitude of 4.4. Finally, results are evaluated using the RMSE metrics.

Fig. 2 Proposed model using various regression algorithms
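To make this pipeline concrete, a minimal sketch using scikit-learn is given below. The file name and column names (latitude, longitude, depth, mag) are assumptions based on the USGS CSV export format described in Section 4.1, not the authors' exact code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Load one radius-specific dataset (hypothetical file name).
df = pd.read_csv("earthquakes_radius_5000.csv").dropna()
X = df[["latitude", "longitude", "depth"]]
y = df["mag"]

# Roughly 3/4 of the data for training, 1/4 for testing (see Section 4).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "RF": RandomForestRegressor(n_estimators=200),
    "SVR": SVR(kernel="rbf"),
    "MLP": MLPRegressor(activation="tanh", solver="sgd", max_iter=2000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.3f}")
```

The hyperparameter values mirror those reported in Section 4; in practice, the split-train loop is repeated until the RMSE stops improving.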

3.2 Algorithms and techniques

In this paper, three regression models, namely Random Forest Regression, Support Vector Regression, and Multi-Layer Perceptron Regression, are used to analyse the intensity of earthquakes and to predict their magnitude.

3.2.1 Random forest regression

A Random Forest is a model that uses an ensemble approach to deliver good predictive outcomes [23]. It can perform both regression and classification tasks using multiple decision trees together with a technique known as Bootstrap Aggregation, or bagging [50]. The basic idea behind the algorithm is to combine different decision trees to arrive at a final result [59].

The mathematical formulation for the model is shown below in Eq. 1.

$$h(x) = f_0(x) + f_1(x) + f_2(x) + f_3(x) + \dots + f_n(x)$$
(1)

where \(h(x)\) is the summation of the base models, each of which is a decision tree, and the output is the ensemble of these models [58]. Figure 3 below shows an RF Regressor model [57].

Fig. 3 Random forest regression model [57]

In a decision tree, the tree is built by selecting the important variables as nodes; in a Random Forest, randomness is additionally injected as each tree grows. The model also helps save time, since very little time is spent on hyper-parameter tuning.
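As an illustration of Eq. 1, the sketch below (scikit-learn, synthetic data) shows that a trained forest's output is an aggregate of its individual trees' predictions; scikit-learn aggregates by averaging the tree outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))                      # synthetic features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=10).fit(X, y)

# Each base model f_i(x) in Eq. 1 is one decision tree; the ensemble
# output h(x) aggregates them (here, the average of the tree predictions).
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(per_tree.mean(axis=0))  # matches...
print(forest.predict(X[:5]))  # ...the forest's own prediction
```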

3.2.2 Support vector regression

Support Vector Machine (SVM) is an algorithm that supports both linear and nonlinear regression [39]. As shown in Fig. 4, the primary objective is to accommodate as many points as possible between the margin lines while restricting margin violations [13]. In this model, the tolerated violation is denoted by epsilon (ε). Support Vector Regression thus finds the appropriate line, or hyperplane if we consider higher dimensions [48].

Fig. 4 Support vector regression model [48]

3.2.3 Steps to build a Support Vector Regression Model

  1. Assembling the training set.

  2. Selecting the kernel, its parameters, and the regularization, if necessary.

  3. Building the correlation matrix, which is shown in Eq. 2 below.

    $$K_{i,j}=\exp\left(\sum_{k}\theta_{k}\left|x_{k}^{i}-x_{k}^{j}\right|^{2}\right)+\epsilon\,\delta_{i,j}$$
    (2)

  4. Training the model, i.e., solving the linear system in Eq. 3 for the coefficient vector \(\overrightarrow{\alpha}\), where \(K\) is the correlation matrix.

    $$K\overrightarrow{\alpha}=\overrightarrow{y}$$
    (3)

  5. Using the coefficients \(\overrightarrow{\alpha}\) from the equation above and the correlation vector \(\overrightarrow{k}\) of Eq. 5 to form the estimator \(y^{*}\), as shown in Eq. 4.

    $$y^{*}=\overrightarrow{\alpha}\cdot\overrightarrow{k}$$
    (4)
    $$k_{i}=\exp\left(\sum_{k}\theta_{k}\left|x_{k}^{i}-x_{k}^{*}\right|^{2}\right)$$
    (5)

SVR deviates from linear regression in that the primary aim of linear regression [6] is to minimize the error between the estimated and the actual values, while in SVR the goal is to ensure that the errors do not exceed the threshold ε [53].
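A minimal sketch of this ε-insensitive behaviour on synthetic one-dimensional data: training points whose error falls inside the ε-tube do not become support vectors, so widening ε shrinks the support set.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A larger epsilon widens the tube of tolerated errors,
# leaving fewer training points as support vectors.
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```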

3.2.4 Multi-layer perceptron regression

A neural network is built from perceptrons: each perceptron multiplies its inputs by weights, passes the weighted sum through an activation function [29], and produces an output. Networks formed by stacking layers of perceptrons are known as multi-layer perceptron (MLP) models [38].

The MLP Regressor trains iteratively: at each step, the model parameters are updated using the partial derivatives of the cost function [35]. Regularization can also be added to shrink the model parameters and prevent overfitting [25]. Figure 5 below shows an MLP Regression model [56].

Fig. 5 MLP regression model [56]

The classic MLP Regressor implements an MLP that trains by backpropagation with no activation function in the output layer, which can equivalently be seen as using the identity activation function. It uses the squared error as its loss function, and the output is a set of continuous values.

The MLP Regressor uses L2 regularization, which helps avoid overfitting by penalizing weights with large magnitudes.
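In scikit-learn's MLPRegressor, this L2 penalty is exposed as the alpha parameter; the sketch below (synthetic data, illustrative values) shows the relevant settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 3))
y = X.sum(axis=1) + rng.normal(scale=0.05, size=300)

# The output layer uses the identity activation and training minimizes
# the squared error; alpha sets the strength of the L2 weight penalty.
mlp = MLPRegressor(hidden_layer_sizes=(100,), alpha=1e-4, max_iter=2000)
mlp.fit(X, y)
print(mlp.score(X, y))  # R^2 on the training data
```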

3.3 Evaluation metrics

After evaluating the data distribution in the CSV files, we found that the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are well suited to this problem compared with other available evaluation metrics. Hence, these two evaluation metrics are utilized in this research. They are discussed in detail below:

3.3.1 Mean Square Error (MSE)

The MSE is obtained by calculating the mean of the squared differences between the actual sample values and the predicted values [46]. This error reflects the regression line's effectiveness: a smaller MSE indicates a better fit because the error magnitude is smaller [52]. The error function is expressed by Eq. 6.

$$MSE=\frac{1}{N}\sum_{i=1}^{N}\left(Y_{i}-\widehat{Y}_{i}\right)^{2}$$
(6)

where \(N\) is the total number of observations, \(Y_{i}\) is the actual value, and \(\widehat{Y}_{i}\) is the predicted value. The differences are calculated, squared, and summed to obtain the final loss [27].

3.3.2 Root Mean Square Error (RMSE)

Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals measure how far the data points are from the regression line; RMSE measures how spread out these residuals are. Equation 7 defines this error:

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_{i}-\widehat{Y}_{i}\right)^{2}}$$
(7)

where \(N\) is the total number of observations, \(Y_{i}\) is the actual value, and \(\widehat{Y}_{i}\) is the predicted value [26]. The differences are calculated, squared, and summed [32], and finally the square root is taken [30].
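Both metrics follow directly from Eqs. 6 and 7; a short NumPy sketch with illustrative values, cross-checked against scikit-learn's helper:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([4.4, 4.4, 4.4, 4.4])  # illustrative actual magnitudes
y_pred = np.array([4.1, 4.6, 4.3, 4.9])  # illustrative predictions

mse = np.mean((y_true - y_pred) ** 2)    # Eq. 6
rmse = np.sqrt(mse)                      # Eq. 7
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```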

4 Experimental result and analysis

This section provides the experimental results of the proposed model. All seven datasets are randomly split into training and test sets: around 3/4 of the data is utilized for training the model and 1/4 for testing it. Each model is repeatedly trained and tested until its maximum accuracy is achieved. Finally, after training and testing, all models are validated by predicting a constant magnitude of 4.4. Each model's behaviour is analysed by validating it on all seven datasets and plotting the corresponding graphs of earthquake magnitude versus radius around the target earthquake. An empirical comparison of all the proposed models is performed after the individual analyses, and the RMSE is calculated and analysed.

4.1 Dataset

The data for this research is taken from the USGS website (https://earthquake.usgs.gov/data/) and exported as CSV files. These datasets consist of features and magnitude values for different radius values (100, 200, 500, 1000, 1500, 3000, and 5000) around the target earthquake. The datasets are pre-processed by eliminating unnecessary attributes and dropping rows with null values. The final features consist of the position parameters (latitude and longitude) and depth, used to predict the earthquake magnitude. Table 1 shows a sample of the dataset for a radius value of 5000.

Table 1 Sample dataset for radius 5000
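A sketch of the pre-processing step described above, assuming the standard USGS CSV export columns (time, latitude, longitude, depth, mag, among others); the file name is hypothetical.

```python
import pandas as pd

# Load one radius-specific USGS export.
df = pd.read_csv("usgs_radius_5000.csv")

# Keep only the position and depth features plus the target magnitude,
# then drop rows with null values in any of these fields.
df = df[["latitude", "longitude", "depth", "mag"]].dropna()
print(df.describe())
```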

4.1.1 Data analysis

The distribution of the features, along with the correlation between each possible pair of features, is analysed. Distributions are shown as histograms, while correlations are shown as scatterplots. The distribution and correlation plots among all possible pairs are presented in Fig. 6.

Fig. 6 Distribution and correlation plots among all possible pairs

The correlation of these variables is also analyzed using a heatmap from the Python seaborn library [14], which provides the correlation values among the pairs on a scale of 0 to 1 [7]. The resulting correlation matrix is shown in Fig. 7.

Fig. 7 Correlation matrix on a scale of 0 to 1 using the seaborn heatmap
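A sketch of how such a heatmap can be produced, assuming the pre-processed DataFrame from the previous subsection; note that vmin/vmax pin the colour scale to the 0-to-1 range shown in Fig. 7.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("usgs_radius_5000.csv")  # hypothetical file name
corr = df[["latitude", "longitude", "depth", "mag"]].corr()

# Display the correlation matrix on a 0-to-1 scale, as in Fig. 7.
sns.heatmap(corr, vmin=0, vmax=1, annot=True)
plt.show()
```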

As can be seen from Fig. 7, the magnitude depends more on the position than on the depth of the target area. Longitude and latitude are highly correlated with each other, and the depth parameter also shows some correlation with the latitude.

4.2 Regression model analysis

4.2.1 Random forest regressor

The Random Forest model is repeatedly trained and tested with different numbers of estimators (i.e., decision trees) until the minimum RMSE is achieved [44]; the best performance is achieved with 200 estimators. After training, the model is validated by predicting an earthquake magnitude of 4.4 for all the available datasets of different radius values. Figure 8 shows the magnitude values predicted by the Random Forest model for each radius; the actual magnitude of 4.4 is represented by a green dashed line.

Fig. 8 Radius vs. magnitude plot for Random Forest Regression
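A sketch of the estimator-count search described above, continuing from the training split in the Section 3.1 sketch; the candidate values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

best_n, best_rmse = None, np.inf
for n in (50, 100, 150, 200, 250):  # candidate numbers of decision trees
    rf = RandomForestRegressor(n_estimators=n, random_state=42)
    rf.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, rf.predict(X_test)))
    if rmse < best_rmse:
        best_n, best_rmse = n, rmse
print(best_n, best_rmse)  # 200 estimators gave the minimum RMSE here
```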

4.2.2 Support vector regressor

The Support Vector Regression model is likewise repeatedly trained and tested until the minimum RMSE is achieved [40], using the 'rbf' kernel. After training, the model is validated by predicting an earthquake magnitude of 4.4 for all the available datasets of different radius values. Figure 9 shows the magnitude values predicted by the Support Vector Regression model for each radius; the actual magnitude of 4.4 is represented by a green dashed line.

Fig. 9 Radius vs. magnitude plot for Support Vector Regression

4.2.3 Multi-layer perceptron regressor

The MLP Regressor network is repeatedly trained and tested with an increasing number of hidden layers until the minimum RMSE is achieved [49]; the minimum RMSE is reached at 200, and further increases cause the RMSE to rise again [20]. The activation function is chosen as 'tanh' and the solver as 'sgd' [3]. After training, the model is validated by predicting an earthquake magnitude of 4.4 for all the available datasets of different radius values. Figure 10 shows the magnitude values predicted by the MLP Regressor for each radius; the actual magnitude of 4.4 is represented by a green dashed line.

Fig. 10 Radius vs. magnitude plot for MLP Regression
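A sketch of the corresponding network-size search, again continuing from the earlier training split. Interpreting the reported setting of 200 as scikit-learn's hidden_layer_sizes value is an assumption on our part.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

results = {}
for size in (50, 100, 200, 300):  # candidate hidden-layer sizes (illustrative)
    mlp = MLPRegressor(hidden_layer_sizes=(size,), activation="tanh",
                       solver="sgd", max_iter=2000, random_state=42)
    mlp.fit(X_train, y_train)
    results[size] = np.sqrt(mean_squared_error(y_test, mlp.predict(X_test)))
print(results)  # per Section 4.2.3, the RMSE dips at 200 and then rises
```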

4.3 Empirical comparison of algorithms applied

Table 2 represents the predicted magnitude values for different values of radius for all the three proposed models.

Table 2 Predicted magnitude values of the proposed regression models

The corresponding plots for all three algorithms for each radius value, when the actual magnitude is taken as 4.4, are shown in Figs. 11, 12, 13, 14, 15, 16 and 17.

Fig. 11 Comparison of predicted magnitude values for radius 100 when the actual magnitude is taken as 4.4

Fig. 12 Comparison of predicted magnitude values for radius 200 when the actual magnitude is taken as 4.4

Fig. 13 Comparison of predicted magnitude values for radius 500 when the actual magnitude is taken as 4.4

Fig. 14 Comparison of predicted magnitude values for radius 1000 when the actual magnitude is taken as 4.4

Fig. 15 Comparison of predicted magnitude values for radius 1500 when the actual magnitude is taken as 4.4

Fig. 16 Comparison of predicted magnitude values for radius 3000 when the actual magnitude is taken as 4.4

Fig. 17 Comparison of predicted magnitude values for radius 5000 when the actual magnitude is taken as 4.4

Figure 18 provides an empirical comparison of all three proposed algorithms by plotting simultaneous magnitude vs. radius curves. As can be seen from the graph, the MLP Regressor's predictions lie closest to the actual magnitude compared with the other proposed algorithms.

Fig. 18 Simultaneous magnitude vs. radius plot for all the three proposed algorithms

4.4 RMSE analysis

The RMSE values of all three regression models for each radius value are shown in Table 3. As can be seen from Table 3, the deep learning based MLP Regression model shows the minimum RMSE for every radius value. Hence, it can be concluded that the MLP Regressor is the best of the proposed algorithms for earthquake magnitude prediction.

Table 3 RMSE values of regression models for each value of radius

5 Conclusion and future scope

This research develops an earthquake prediction model based on position (latitude and longitude) and depth using machine learning and deep learning algorithms, namely Random Forest, MLP, and Support Vector Regression; alongside this, an exploratory data analysis is carried out on the basis of the above-mentioned parameters. To begin with, each of the proposed algorithms is individually analysed by predicting an earthquake magnitude of 4.4 for all the available datasets of different radius values. The predicted magnitude values of all the proposed algorithms are then compared empirically. Finally, RMSE values are calculated and analysed for all three algorithms. For the minimum radius of 100, the RMSE values come out to be 1.731, 1.647, and 1.720 for RF, MLP, and Support Vector Regression, respectively; for the maximum radius of 5000, the corresponding values are 0.436, 0.428, and 0.449. Both the empirical comparison and the RMSE analysis show that the MLP Regressor is the best of these algorithms for earthquake magnitude prediction, as its error/deviation from the actual value is the least.

Future investigations can incorporate various other data mining techniques, such as J48, AdaBoostM1, CNB, GBRT, and XGBoost, into the research model, or an amalgamation of the algorithms that give the least error, which could further increase the model's efficiency.