1 Introduction

Discrimination of seismic events is an important task for maintaining a reliable seismic catalogue, which can otherwise be contaminated by quarry blasts (Horasan et al., 2009). Such contamination can create problems for seismic hazard (Ghofrani et al., 2019) and seismicity studies (Gulia & Gasperini, 2021). Seismic catalogues spoiled by quarry blasts can lead to high b-values in seismicity studies and may inflate the estimated seismic hazard of a region. This may be more serious when seismic and anthropogenic sources overlap (Astiz et al., 2014).

Various methods have been developed to discriminate seismic sources. Quarry blasts can be treated as isotropic sources, since explosives expand their surroundings in all directions. Because of that, the ratio between P and S wave energies can be used as a discriminator. Differences between body waves have been exploited through their amplitudes (Horasan et al., 2009; O’Rourke et al., 2016; Tibi et al., 2018), power spectra (Bennett & Murphy, 1986; Kim et al., 1993; Arrowsmith et al., 2006; Yavuz et al., 2019; Lythgoe et al., 2021), and coda wave decay (Hartse et al., 1995).

Thanks to the increasing number of catalogued earthquakes and explosions, growing computing power, and easy-to-use machine learning methods, research to differentiate seismic sources is also developing rapidly. In recent years, several studies have been carried out using Convolutional Neural Networks (CNNs) along with other machine learning methods. Machine learning algorithms may use different types of inputs to predict the outcomes, which in this case are earthquakes and quarry blasts. For instance, Kuyuk et al. (2011) used several parameters such as spectral ratios and P/S wave amplitude ratios to separate seismic sources. Zeiler and Velasco (2009), Reynen and Audet (2017), Shang et al. (2017), Yavuz et al. (2019), Sertçelik et al. (2020) and Renouard et al. (2021) used multiple amplitude, frequency, and distance features to train models with several different machine learning approaches. Johnson et al. (2021) developed a phase picking algorithm and Hourcade et al. (2023) an event discrimination algorithm for quarry blasts using CNNs. Moreover, instead of predetermined parameters, waveforms have also been used for the same goal. Linville et al. (2019) created two different models using CNN and Long Short-Term Memory (LSTM) architectures, with spectrograms as inputs for both. Miao et al. (2020) used the entire waveform, from the P wave arrival to the end of the coda wave, and trained their model on the spectral information of the waveforms.

Machine learning algorithms are also applied to other aspects of seismic monitoring. Determination of seismic phases (Ross et al., 2018b; Dokht et al., 2019; Woollam et al., 2019; Mousavi et al., 2020) and first motion polarity (Ross et al., 2018a) have been studied. Focal mechanisms (Kuang et al., 2021; Zhang et al., 2021) and the locations of seismic sources such as earthquakes (Perol et al., 2018; Tang et al., 2020; Yeck et al., 2021; Yang et al., 2021), volcanic (Titos et al., 2018; Cortés et al., 2019) and geothermal (Holtzman et al., 2018) events are also analyzed with machine learning tools. Magnitudes of the events are estimated as well (Mousavi & Beroza, 2020; Münchmeyer et al., 2021; Majstorović et al., 2021). Early warning studies have also implemented machine learning methods (Lomax et al., 2019; Fauvel et al., 2020).

The purposes of this study are, first, to develop a machine learning algorithm that successfully discriminates earthquake signals from quarry blast signals, second, to determine its reliability, and third, to assess its applicability in other regions. To do that, we use both time and frequency information of waveforms collected from Austria, Croatia, North-East (NE) Italy, and Slovenia. Moreover, we take the study of Linville et al. (2019) as a baseline for comparison, since it uses a machine learning algorithm and is representative of current developments in this topic. We train models using the architecture and data preparation methods of Linville et al. (2019) and then compare these models with ours. The results are important for our study area, since there are mining areas located on top of seismically active regions. Hence, it is vital to separate quarry blasts from earthquakes to obtain a clean seismic catalogue and calculate the seismic hazard of the region properly. Furthermore, we carried out a statistical analysis to assess our model's stability and to compare it with other models. In the end, we apply our model to another explosion database: we try to predict the origin of the seismic signals used by Miao et al. (2020) to assess the capabilities of our model in another region.

Outcomes of the study are:

  (i) our model performs well on the separation of seismic sources,

  (ii) it outperforms the baselines that we compare it with, and

  (iii) we show the relation between high model performance and data features by comparing our model with that of Linville et al. (2019) and by training both our architecture and that of Linville et al. (2019) on different datasets.

It is found that both our model and the model of Linville et al. (2019) perform well on a dataset collected from a specific region and fail on a different dataset collected from another region. In this study, we find that previous models do not perform well on our dataset, and to solve the problem of source identification in our study area, we developed a machine learning model with a high accuracy rate.

2 Data

In this study, events detected with the detection algorithm developed by Gallo et al. (2014) and by the Slovenian Environment Agency (ARSO 2001) are combined. Events from the border region between Italy and Slovenia, with longitudes between 11.80 and 14.00 and latitudes between 45.50 and 46.56, are selected. We do not discriminate the seismic sources ourselves, whereas ARSO manually investigates the events and identifies quarry blasts. Earthquake signals with epicentral distances of less than 150 km are retrieved. The seismic events between September 2002 and April 2021 are collected. Events that are labelled as quarry blasts in the ARSO catalogue were specifically excluded from the Italian National Accelerometric Network (RAN) catalogue (Costa et al., 2022). Earthquakes cover the time range between September 2015 and April 2021 and the explosions between September 2002 and April 2021, with magnitudes up to 4.0. In total, 11,821 earthquake and 9822 explosion records with 3 components, from 1463 earthquakes and 727 explosions, are used as input for the deep neural network (Fig. 1). 142 events registered as earthquakes in the RAN catalogue are removed after comparing their origin times with the ARSO catalogue.

Figure 1

Earthquake (red star), explosion (green star), and station (blue triangle) distribution of the dataset. Active fault lines are retrieved from Atanackov et al. (2021) and plotted as black lines

The study area has a complex tectonic regime. More than 5 different seismogenic zones are defined in the area (Basili et al., 2008). The region between the cities of Trieste and Monfalcone has a convergence regime (Tari, 2002) along the eastern Adria margin, together with the Northwest (NW) - Southeast (SE) oriented Idria fault system that produced the 1511 Idria earthquake with strike-slip deformation (Vičič et al., 2019). In this area, there are active quarries in Monfalcone, north of Trieste (Sežana, Slovenia), Gorizia (Nova Gorica, Slovenia), and east of Koper, alongside the mentioned tectonic regimes. The other major tectonic regime is located north of Udine. It consists of thrust fault systems that produced the 1976 Friuli seismic sequence (Slejko et al., 1999), together with strike-slip faults. The complexity of the tectonic regimes in the region requires careful monitoring of seismic events to understand their locations and source characteristics. In this study, we work on the latter, separating mining-related events, which are spread across the study area, from earthquakes.

The time span of the waveforms always starts from the origin time of the event and ends 90 s after the origin time (Fig. 2a). If the signal has gaps in the part of the record related to the event, it is deleted; otherwise, gaps are filled with zeros. The data is detrended and resampled to 100 Hz. The data is filtered with a four-corner Butterworth filter between 1 Hz and 20 Hz. Finally, the records are tapered with a 5% Hann window. No pre-determined signal filtering criterion (e.g., a signal-to-noise ratio threshold) is used. The frequency information of the waveforms is also used, obtained by applying the fast Fourier transform (FFT) (Fig. 2b). In total, our database contains 21,643 records (11,821 earthquake and 9822 explosion) with 3 channels.
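This pre-processing chain can be reproduced, for instance, with ObsPy and NumPy. The snippet below is only a minimal sketch under the assumption that each record is handled as an ObsPy Stream (the text does not state which library is used), and any settings not given above (e.g., whether the filter is zero-phase) are assumptions. The 9000-sample and 4500-point lengths follow from the 90 s window at 100 Hz and the one-sided FFT.

```python
# A sketch of the pre-processing described above; ObsPy usage is an assumption.
import numpy as np
from obspy import read

def preprocess(stream, origin_time):
    """origin_time is an obspy.UTCDateTime; returns (3 x 9000, 3 x 4500) arrays."""
    st = stream.copy()
    st.trim(starttime=origin_time, endtime=origin_time + 90.0)    # 90 s window
    st.detrend("linear")
    st.resample(100.0)                                            # 100 Hz
    st.filter("bandpass", freqmin=1.0, freqmax=20.0, corners=4)   # Butterworth
    st.taper(max_percentage=0.05, type="hann")                    # 5% Hann taper
    wave = np.stack([tr.data[:9000] for tr in st])                # 3 x 9000 samples
    fft = np.abs(np.fft.rfft(wave, axis=1))[:, :4500]             # up to ~50 Hz
    return wave, fft

# Hypothetical usage: stream = read("record.mseed"); wave, fft = preprocess(stream, origin)
```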

Figure 2

a Example waveforms from an earthquake (top) and a quarry blast (bottom), b average of the normalized FFTs of the components of earthquakes and quarry blasts

3 Method

Our proposal is built on a CNN. The inputs of our model are 2 different representations of a seismic record: the waveform, s, and the frequency content of the record, \({\mathcal {F}}(s)\). In particular, the inputs of the model are waveforms with 9000 sample points (90 s of signal) and their FFTs with a length of 4500 data points (up to 50 Hz with a 0.011 Hz interval). We remark that our approach differs from previous works since two different sources of information are combined without the need to specify hand-crafted features. The rationale behind this choice is twofold. The first reason is that the P/S amplitude ratios of earthquakes and quarry blasts differ from each other; moreover, their coda waves also differ, which is visible in both the time and frequency domains. The second reason is that quarry blasts have distinct frequency content that can be used as a separator (Kim et al., 1993). On top of this, FFTs are already calculated in real time to dynamically filter the waveforms for earthquake monitoring purposes (Gallo et al., 2014). A model that requires only the seismic signals and their FFTs may therefore be easily adapted to real-time monitoring of seismic sources.

We designed a CNN that combines the two inputs. The CNN is composed of two different convolutional parts that process these two types of information separately. Since the lengths of the two inputs are different, we implement two separate convolutional branches instead of a single multi-channel analysis. Within each branch, the waveforms and FFTs are analyzed as multi-channel inputs, since we use the 3 channels of the seismic stations. These two components are responsible for extracting the most relevant features to correctly characterize the seismic event, avoiding any manually defined parameters. The extracted features are then concatenated into a single data vector. This vector is processed by a fully connected neural network that classifies the record as an earthquake or an explosion. The complete structure of the network is shown in Fig. 3. The Tensorflow (Abadi et al. 2015) and Keras (Chollet et al. 2015) libraries of Python are used to construct the model.

Figure 3

Graphical representation of the model architecture

In order to find the optimal structure, different configurations were explored through a trial-and-error process. Unlike the study of Majstorović et al. (2021), in which optimization functions, learning rates, and batch sizes were optimized, we analyzed the effect of the number of convolutional and fully connected layers, and the kernel, pool, and filter sizes of the layers. The exploration converged to the following structure. For the waveform part, 4 layers were used with 64, 32, 32 and 16 filters, respectively, and kernel sizes of 24, 12, 12 and 6. Between all convolutional layers, a max-pooling layer was placed with a size of 2, with the only exception of the first one, where a size of 4 is used. For the frequency part, 2 layers were used with 16 and 8 filters and kernel sizes of 2 and 1. Max-pooling sizes are chosen as 6 and 3 for the frequency part. A Rectified Linear Unit (ReLU) is used as the activation function and all weights are initialized with Glorot normal initialization (Glorot and Bengio 2010). There are 2 fully connected layers with 20 and 10 neurons, respectively, each with a ReLU activation. The last layer is activated through a sigmoid function and the network is optimized with the Adam stochastic gradient descent algorithm (Kingma & Ba, 2014).
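A minimal Keras sketch of this two-branch architecture is given below. The padding choice and the presence of a pooling layer after the last convolution of each branch are our assumptions, since they are not stated explicitly above; this is a sketch, not the authors' exact code.

```python
# A sketch of the two-branch CNN described above; "same" padding and pooling
# after every convolution are assumptions.
from tensorflow.keras import layers, models, initializers

init = initializers.GlorotNormal()

# Waveform branch: 90 s of 3-component data sampled at 100 Hz.
wave_in = layers.Input(shape=(9000, 3), name="waveform")
x = wave_in
for filt, kern, pool in zip((64, 32, 32, 16), (24, 12, 12, 6), (4, 2, 2, 2)):
    x = layers.Conv1D(filt, kern, activation="relu", padding="same",
                      kernel_initializer=init)(x)
    x = layers.MaxPooling1D(pool)(x)
x = layers.Flatten()(x)

# Frequency branch: one-sided FFT amplitudes up to 50 Hz (4500 points).
fft_in = layers.Input(shape=(4500, 3), name="fft")
y = fft_in
for filt, kern, pool in zip((16, 8), (2, 1), (6, 3)):
    y = layers.Conv1D(filt, kern, activation="relu", padding="same",
                      kernel_initializer=init)(y)
    y = layers.MaxPooling1D(pool)(y)
y = layers.Flatten()(y)

# Concatenate the learned features and classify (earthquake vs. explosion).
z = layers.Concatenate()([x, y])
z = layers.Dense(20, activation="relu", kernel_initializer=init)(z)
z = layers.Dense(10, activation="relu", kernel_initializer=init)(z)
out = layers.Dense(1, activation="sigmoid")(z)

model = models.Model(inputs=[wave_in, fft_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```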

4 Results

The performance of our model was quantified with two measures: the false positive ratio (FPR) and the false negative ratio (FNR). FPR is the ratio between the number of false positives (FP), in other words, explosions that are predicted as earthquakes, and the sum of FP and true negatives (TN, correctly predicted explosions, Eq. 1a). FNR is the ratio between the false negatives (FN, earthquakes predicted as explosions) and the sum of FN and true positives (TP, correctly predicted earthquakes, Eq. 1b).

$$\begin{aligned} FPR&= \frac{FP}{FP + TN} \end{aligned}$$
(1a)
$$\begin{aligned} FNR&= \frac{FN}{FN + TP} \end{aligned}$$
(1b)

To reduce variation related to a fortunate or unfortunate data split, our proposal was evaluated through a 4-fold cross-validation, and both indices were averaged over all repetitions. With cross-validation, the training and testing procedure is repeated \(k=4\) times, varying the portion of the data used as training and validation sets. The results are then averaged over the k repetitions in order to obtain a reliable evaluation. Moreover, to avoid our results being affected by the random initialization of the weights, all experiments were repeated 5 times, varying the seed used for random generation, resulting in \(4 \times 5 = 20\) repetitions for each experiment (16,232 signals for each training fold). FNR and FPR values are calculated on the validation set of each repetition. The model is trained for a maximum of 20 epochs. At the end of each epoch, the model is evaluated on a separate set of examples (the validation set). If the loss function measured on the validation set increases for 2 consecutive epochs, the training is stopped early to avoid over-fitting. The size of the validation set is fixed to 25% of the examples used for training. We trained the neural network on an Intel® Xeon® Gold 6140 CPU @ 2.30GHz with 34 cores and 196GB of RAM, along with a Tesla V100 GPU with 16GB of RAM. The duration of the training process is on the order of minutes.
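A compact sketch of this evaluation protocol is shown below. It assumes the Keras EarlyStopping callback is an adequate stand-in for the early-stopping rule, and names such as build_model, waveforms, ffts and labels are illustrative placeholders rather than the authors' code.

```python
# A sketch of the repeated 4-fold evaluation; all variable names are placeholders.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold
from tensorflow.keras.callbacks import EarlyStopping

results = []                                   # (FPR, FNR) per repetition
for seed in range(5):                          # 5 random weight initializations
    tf.keras.utils.set_random_seed(seed)
    for train_idx, val_idx in KFold(n_splits=4, shuffle=True,
                                    random_state=seed).split(labels):
        model = build_model()                  # the two-branch CNN sketched above
        stop = EarlyStopping(monitor="val_loss", patience=2)
        model.fit([waveforms[train_idx], ffts[train_idx]], labels[train_idx],
                  validation_data=([waveforms[val_idx], ffts[val_idx]],
                                   labels[val_idx]),
                  epochs=20, callbacks=[stop], verbose=0)
        prob = model.predict([waveforms[val_idx], ffts[val_idx]]).ravel()
        pred = (prob > 0.5).astype(int)        # 1 = earthquake, 0 = explosion
        y = labels[val_idx]
        fp, tn = np.sum((pred == 1) & (y == 0)), np.sum((pred == 0) & (y == 0))
        fn, tp = np.sum((pred == 0) & (y == 1)), np.sum((pred == 1) & (y == 1))
        results.append((fp / (fp + tn), fn / (fn + tp)))
```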

4.1 Comparison with Linville et al. (2019)

Previous studies have been dedicated to solving the same problem as ours. The study of Linville et al. (2019) is selected for the comparison (hereafter Linville et al. (2019) Utah, USA). To do that, our data is pre-processed as explained in that study and then given to the 10 models that Linville et al. (2019) created, one from each fold (https://github.com/quapity/Utah, last access: 17 June 2022). Waveforms are given to these models in the same order as we give them to our models. The receiver operating characteristic (ROC) data points of these models can be seen in Fig. 4.

The results show that our model is far more accurate in terms of both the FPR and FNR metrics. Several factors may have played a role in this outcome. The most important factor is likely the data. In Linville et al. (2019), most of the quarry blasts are recorded by stations at intermediate distances and the earthquakes at near-source distances, whereas in our study, records come from all distance ranges for both types of seismic sources. Moreover, their data is recorded only by the vertical component of the stations, with the horizontal components set to zero. In contrast, for this study only stations with 3 components were retrieved.

Table 1 Comparison between our model, the model of Linville et al. (2019) retrained with the data from our study, and the original Linville et al. (2019) model

To see the effect of the data on the method of Linville et al. (2019), a new model was created using our dataset with the data pre-processing and architecture of Linville et al. (2019) (hereafter Linville et al. (2019) NE Italy). The results can be seen in Table 1. The results improve dramatically thanks to the waveforms coming from our dataset. It can be said that the Linville et al. (2019) models have learnt features that are applicable to their study area but not to ours. The results of the new model with the Linville et al. (2019) architecture are very promising, but they show slightly worse performance indices compared with our proposal. In order to obtain a meaningful comparison, we perform a statistical test among all the repetitions by means of a Wilcoxon signed-rank test with a Bonferroni correction. The statistical comparison shows that our model is better than the other proposal with a confidence level of 99% (\(\alpha = 0.01\)): the comparison with Linville et al. (2019) NE Italy shows that our model is better with a p-value of 0.0005 (0.0015 after correction) for the FPR index and with a p-value of 0.0001 (0.0003 after correction) for the FNR.
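A minimal SciPy sketch of this statistical comparison is given below. Pairing the 20 per-repetition scores of the two models, using a one-sided alternative, and multiplying the p-value by 3 (the correction factor implied by the reported values) are our assumptions.

```python
# A sketch of the Wilcoxon signed-rank test with Bonferroni correction;
# "ours" and "theirs" are the 20 paired per-repetition FPR (or FNR) values.
from scipy.stats import wilcoxon

def paired_test(ours, theirs, n_comparisons=3, alpha=0.01):
    # one-sided test: are our error ratios systematically lower?
    _, p = wilcoxon(ours, theirs, alternative="less")
    p_corrected = min(1.0, p * n_comparisons)     # Bonferroni correction
    return p, p_corrected, p_corrected < alpha
```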

4.2 Model Tuning

Different configurations of the model were explored to find the most suitable solution. In detail, we experimented with 10 different models whose details can be found in the Electronic supplement. To find the best model, the performance of the models is visualized using ROC curves (Fig. 4) and measured in terms of the area under the curve (AUC). The figure shows that all models are almost identical in terms of AUC. Model 5 is selected as the main model because of its relatively higher AUC.
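The ROC curves and AUC values in Fig. 4 can be obtained from the sigmoid outputs of each candidate model; the short sketch below assumes scikit-learn is used, with y_true and y_score as placeholders for the validation labels and predicted probabilities.

```python
# A sketch of the ROC/AUC computation used to rank the candidate models;
# y_true and y_score are illustrative placeholders.
from sklearn.metrics import roc_curve, auc

fpr, tpr, _ = roc_curve(y_true, y_score)   # ROC data points as in Fig. 4
model_auc = auc(fpr, tpr)                  # area under the curve
```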

In Fig. 4 there are two clusters among the models. Models trained with the data from Utah, USA by Linville et al. (2019) show significantly low performance, whereas the 10 models we developed and the model of Linville et al. (2019) retrained with the data from our study show high performance. On the right side of the figure, only the high-performance results are presented. One can see that they all have a high True Positive Rate (TPR, \(1-\text {FNR}\)) and low FPR. In fact, the differences between our models are not statistically significant. Nevertheless, the model with the best performance among the 10 models was selected, and this is what we refer to as our model. Even though the retrained model of Linville et al. (2019) has high performance, our model outperforms it (Table 1).

4.3 Comparison with Miao et al. (2020) Dataset

To understand the capability of our model on another dataset, we used the study of Miao et al. (2020). The dataset consists of 4406 events, of which 4256 are quarry blasts, collected in Kentucky, USA. In total, there are 57,680 records for quarry blasts and 2081 records for earthquakes. To have a balanced database, 2081 records are randomly picked from the quarry blast records. The same training procedure and cross-validation are followed. Data pre-processing is carried out as described in the previous sections, and the prediction capabilities of our model, along with the model that we created using the architecture of Linville et al. (2019), can be seen in Table 2. Furthermore, we trained models on the dataset of Miao et al. (2020) using both our architecture and the architecture of Linville et al. (2019). Their results are also presented in Table 2. In order to compare all the results among all the repetitions, we perform a statistical validation using a Wilcoxon signed-rank test with a Bonferroni correction.
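The balancing step can be as simple as the sketch below, assuming the records are stored as NumPy arrays; blast_records and quake_records, as well as the seed, are illustrative placeholders.

```python
# A sketch of the random class balancing described above; names and seed are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(42)
idx = rng.choice(len(blast_records), size=len(quake_records), replace=False)
balanced_blasts = blast_records[idx]   # 2081 randomly selected blast records
```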

Table 2 Comparison between our model and the model of Linville et al. (2019), trained with the data from our study and with the data from the study of Miao et al. (2020)

5 Discussion

As mentioned in Sect. 1, several previous studies have also tried to solve the problem of distinguishing quarry blasts from earthquakes. To assess the performance of our model, the study of Linville et al. (2019) was used as a comparison. We cannot fully compare our model with Linville et al. (2019) on the same dataset, because their data is not completely available. Hence, we limited the comparison to evaluating the trained models of Linville et al. (2019) on our dataset. Moreover, we trained models with the very same architecture of Linville et al. (2019) using our dataset. To do that, the seismic sources of our data were first predicted as described in that study. Linville et al. (2019) state that both their LSTM and CNN models work equally well with high accuracy. The 10 different LSTM models, one trained from each fold, were applied to the spectrograms of our signals, calculated according to the description in their study. The TPR and FPR of these models vary between 0.478 - 0.807 and 0.281 - 0.705, respectively, which is significantly worse than our model (Fig. 4). Several factors may play a role in the significant differences between these studies.

Figure 4

(a) ROC performance of the developed models along with the models of Linville et al. (2019), (b) zoomed-in view of the figure in (a)

First of all, Linville et al. (2019) trained their models using data from Utah, USA, whereas our data comes from NE Italy. For both seismic sources, the frequency content can differ due to explosive-related parameters (e.g., explosive type), or some frequencies may not reach the recorder due to attenuation. Deep and shallow velocity structure may also play a role in attenuation. Secondly, to expose the distinctive features of earthquakes and quarry blasts, different studies use different types of data. In Linville et al. (2019), most of the quarry blast data come from longer distances, unlike their earthquake counterparts. In our study area, both sources are recorded at both near-source and intermediate distances. These variables may influence the waveform and its frequency content. A comparison between a sample taken from our dataset, one from the portion that we retrieved from Linville et al. (2019), and one from Miao et al. (2020) can be seen in Fig. 5. The figure highlights that the frequency content of each class differs not only between these two datasets but also with respect to our dataset.

Figure 5

Average of FFTs of the components of earthquakes and quarry blasts from the a) Linville et al. (2019) and b) Miao et al. (2020) datasets

However, we do not have insight into these effects across the study areas; analyzing the contribution of the medium or of the source effect on the seismic waveforms and their frequency content is beyond the scope of this study. Furthermore, we use only 3-component signals, whereas Linville et al. (2019) use single-channel waveforms, filling the horizontal components with zeros.

Then, a model is trained using the same architecture and data processing method that Linville et al. (2019) described. As a result, the retrained Linville et al. (2019) model has a TPR of 0.987 and an FPR of 0.017. This shows the importance of the data for the training of a model. Even though the model trained with our dataset is almost as good as our models, our models perform statistically better than the model developed with the Linville et al. (2019) architecture. Our data processing procedure is also easy to integrate into our monitoring activities, since the waveforms are collected and the FFTs of the seismic records are calculated in real time (Gallo et al., 2014). Models that use waveforms along with spectrograms and reach high accuracy rates were also created; however, waveforms and FFTs are more suitable for our data processing routines.

To assess the capabilities of our model, along with the retrained model of Linville et al. (2019), in other settings, we used the database of the study of Miao et al. (2020). As presented in Table 2, the models trained on our dataset fail. Even though the model of Linville et al. (2019) shows slightly better performance than our model in both the FPR and FNR metrics, neither model is reliable, with 37.7 to 51.0% mislabeled earthquakes and quarry blasts in the given dataset. On the other hand, the models produce better results when they are trained with the data from the study area of Miao et al. (2020). This demonstrates that our CNN architecture can generalize the features from a given dataset. In Fig. 4 our models show stable performance across different folds. But neither the weights of the CNN defined in our model nor those of the models of Linville et al. (2019) are capable of predicting the sources of other datasets, as seen in the experiment that we carried out with the dataset of Miao et al. (2020). As presented in Table 2, our model statistically outperforms the model of Linville et al. (2019) when trained with the same data.

In Miao et al. (2020), the data comes from Kentucky, USA, and its features are not similar to those from our study area. This causes the models to fail, not because the models are weak but because the feature space of the data is different. The CNN extracts the features from the given dataset, and when the features differ among datasets, weights tuned on one dataset may not work efficiently on another. When the models are trained with the data from Miao et al. (2020), both models work efficiently.

There are other methodologies developed to perform the same task. Sertçelik et al. (2020) developed multiple models for source characterisation. The amplitude difference between P and S waves (amplitude ratio) developed by Wüster (1993) and the energy difference between low-frequency and high-frequency bands (complexity) developed by Gitterman and Shapira (1993) are used for linear (LDF) and quadratic (QDF) discriminant functions and a continuous wavelet transform (CWT) approach to make a decision about the source. In the end, 5 different models are created for each station. However, the decision-making algorithms depend on the LDF, QDF, and CWT functions and cannot be used at other stations since they are tuned for each station separately. Moreover, one needs to detect the P and S wave arrivals to use the amplitude ratio approach. The determination of the S wave is also tied to the time difference between P and S waves, which may not always hold. In the complexity method, the high and low frequency bands are fixed and may not represent the frequency content of another region. Zeiler and Velasco (2009) used the amplitudes of P, S, Love, and Rayleigh waves along with the magnitudes \(m_b\) from P and Rayleigh waves, \(M_{L}\), and \(M_{s}\) calculated with these waves. Hard cut-offs are determined by visualizing the amplitude ratios of the waves and the magnitude differences between the different magnitude types. Surface waves are not analyzed by our group, and this approach requires further analysis of the waveforms. Different types of magnitudes are also required to carry out this method. Furthermore, the results may differ among regions depending on the soil type and its effect on amplitudes and magnitudes. Our model, on the other hand, learns an overall representation from the given signals and provides a generalized solution.
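For illustration, the two classical single-station discriminants mentioned above can be computed as in the sketch below; the window length, frequency bands, and pick indices are illustrative placeholders and do not reproduce the exact definitions used in the cited studies.

```python
# A simplified sketch of the P/S amplitude ratio and the low/high-frequency
# "complexity" discriminants; all numerical choices here are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

def band_energy(x, fs, fmin, fmax):
    sos = butter(4, (fmin, fmax), btype="bandpass", fs=fs, output="sos")
    return np.sum(sosfilt(sos, x) ** 2)

def classic_discriminants(x, fs, p_idx, s_idx, win=500):
    # peak amplitude in a short window after the (hypothetical) P and S picks
    amp_ratio = np.max(np.abs(x[p_idx:p_idx + win])) / np.max(np.abs(x[s_idx:s_idx + win]))
    # ratio of energy in a high-frequency band to a low-frequency band
    complexity = band_energy(x, fs, 5.0, 20.0) / band_energy(x, fs, 1.0, 5.0)
    return amp_ratio, complexity
```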

6 Conclusion

In this study, a solution for the separation of earthquakes from quarry blasts using artificial neural networks was proposed. To do that, seismic traces produced by earthquakes and quarry blasts in NE Italy and its surroundings were collected. The earthquake catalogue is ours and the quarry blast catalogue is from ARSO. To separate the seismic sources, 10 models were created and compared with each other to obtain the best model for the problem.

Furthermore, our model was compared with a previously developed machine learning model by Linville et al. (2019). We found that our model has better performance indices than the models developed in that study. When we use the data from our study area to train new models with the architecture of Linville et al. (2019), we obtain good results, but Model 5 of our study still performs better than the recreated model from the Linville et al. (2019) study. This claim was verified with a statistical test that confirmed that our model is the best one. Moreover, our method uses the waveforms and FFTs, which are already monitored by our group in real time, so it is easier to integrate into the monitoring system than the model of Linville et al. (2019), which uses spectrograms. We found that our model is more applicable than the Linville et al. (2019) model and can be used for the separation of earthquakes from quarry blasts occurring in the study area. This is important since multiple active seismogenic zones and quarries are located in the same region and we do not label the sources of the seismic events ourselves. As mentioned in Sect. 2, there are more than 100 events stored in our catalogue as earthquakes that are in reality quarry blasts. Thanks to this study, we now have the capability of detecting the blasts without making any change to the monitoring parameters.

The above-mentioned findings indicate that our model can separate quarry blasts from earthquakes in our study area with high precision, and it does so with a high confidence level in the metrics used for comparison. It is easy to apply in our data collection routines and does not require any additional analytical processing, as mentioned in Sect. 5. However, both our model and the model of Linville et al. (2019), when trained with our dataset, fail to separate the seismic sources of the Miao et al. (2020) database. On the other hand, when the models are re-trained with the data of Miao et al. (2020), they produce good results. This again shows the importance of the data.

In conclusion, we proposed a model that is generalized for our area of interest, works successfully in our study area, and provides consistent results. The models of Linville et al. (2019) do not predict the seismic sources of the events in our region, but the architecture works when a new model is trained using our dataset. Neither of these models, unless trained with the dataset of Miao et al. (2020), can predict the seismic sources of that dataset. However, our model is capable of providing good results when trained with data from the area of interest.