1 Introduction

A rock burst disaster is a phenomenon whereby the elastic deformation potential energy that has accumulated in coal and rock masses located close to excavation spaces or roadways is suddenly and violently released under certain conditions. It is one of the major safety hazards associated with deep mines, and it often leads to serious casualties as well as property damage (He et al. 2010; Mansurov 2001). Therefore, obtaining advance information on its occurrence is essential. Such information facilitates early warning, which can effectively reduce the associated damage. The fracture-induced electromagnetic radiation (FEMR) method is a typical geophysical monitoring method. The generation mechanism, characteristics, and propagation features of FEMR from coal rocks have been investigated (Baddari et al. 2011; Carpinteri et al. 2012; Freund and Sornette 2007; Frid and Vozoff 2005; Potirakis et al. 2019a). Additionally, the evaluation index, precursor law, monitoring system, data processing method, and software associated with rock burst FEMR monitoring and early warning have been investigated in several studies (Bahat et al. 2002; Contoyiannis et al. 2016; Das et al. 2020; Frid et al. 2003). Recently, remote monitoring and early warning methods as well as automatic interference signal identification and filtering methods have been developed, and comprehensive early warning guidelines have been established (Qiu et al. 2018; Wang et al. 2011, 2009). As a result, the FEMR monitoring process is convenient, the monitoring data are continuous, and the response to precursory rock burst information as well as early warning is effective (Kumar et al. 2017; Lacidogna et al. 2011; Potirakis et al. 2019b). However, existing rock burst FEMR monitoring and early warning analyses often rely heavily on manual procedures, which makes the identification of precursory anomalies difficult (Fukui et al. 2005; Liu and Wang 2018). Additionally, the accuracy and timeliness of rock burst hazard identification still need further improvement.

With improvements in computing power as well as the explosive growth of data volume, significant progress has been made in the algorithms and applications of various deep learning models (Bassam et al. 2010; LeCun et al. 2015; Schmidhuber 2015). Deep learning has facilitated major breakthroughs in computer vision (Karimpouli and Tahmasebi 2019; Krizhevsky et al. 2017; Smirnov et al. 2014; Xiong and Zuo 2016), natural language processing (Kombrink et al. 2011; Lee 2000; Mikolov et al. 2010), and other fields (Shelhamer et al. 2017; Srivastava et al. 2014). In geophysics, the most common applications of deep learning lie in seismic data processing and automation (Sun et al. 2020; Wrona et al. 2018). Deep learning models such as convolutional neural networks (CNNs) have also been applied in other geophysical methods. CNNs can not only denoise images (Zhang et al. 2017) but also effectively remove random noise, and they offer stronger denoising capabilities than conventional denoising algorithms (Yu et al. 2019). Deep learning can also be applied to complex seismic scattering wavefield inversion, lithology identification, fluid identification in the pores of sandstone reservoirs, and reservoir prediction (Spichak and Popova 2000; Sun et al. 2020).

Long short-term memory recurrent neural networks (LSTM-RNNs), which are extensively used in the field of natural language processing, provide new insights for time-series data processing (Donahue et al. 2017; Greff et al. 2017). Unlike CNNs, which effectively process spatial information, RNNs are designed to process temporal information. They use hidden states to store historical information and combine it with the current inputs to determine the current outputs (Palangi et al. 2016; Sudakov et al. 2019; Zhao et al. 2017). RNNs are often used to process sequence data such as a paragraph of text or a segment of sound. Given that FEMR data are typical time-series data, processing them with RNNs can lead to intelligent rock burst monitoring and early warning.

In this study, the principles related to RNNs were expounded, and the method employed to identify rock burst precursor FEMR signals was described. Additionally, the early warning effect of the rock burst hazard was investigated in combination with an actual case.

2 Relevant Principles of RNNs

2.1 RNNs with Hidden States

As shown in Fig. 1, when the input data are correlated in time, let Xt be the input at time step, t, Ht the hidden variable at time step, t, Ht−1 the hidden variable at time step, t − 1, Φ the hidden layer activation function, Wxh the hidden layer weight parameter applied to the input, Whh the hidden layer weight parameter applied to the previous hidden variable, and bh the hidden layer deviation (bias) parameter. Ht is calculated by feeding the input, Xt, and the previous hidden state, Ht−1, into a fully connected layer with activation function, Φ, as follows:

$$ H_{t} = \Phi (X_{t} W_{xh} + H_{t - 1} W_{hh} + b_{h} ) $$
(1)
Fig. 1 An RNN with a hidden state

Hidden variables are also known as hidden states. Since Ht−1 is used in the calculation of Ht, the calculation is recurrent, and a network that uses such a recurrent calculation is referred to as a recurrent neural network. The output, Ot, of the output layer at time step, t, is calculated as follows:

$$ O_{t} = H_{t} W_{hq} + b_{q} $$
(2)

The parameters of the RNN include the weight (Wxh and Whh) and deviation (bh) parameters of the hidden layer, and the weight (Whq) and deviation (bq) parameters of the output layer. The number of parameters does not increase with the number of time steps. The hidden state, Ht, is used both to determine the hidden state, Ht+1, of the next time step and as the input to the fully connected output layer at time step, t (LeCun et al. 2015).
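As an illustration of Eqs. (1) and (2), the following minimal NumPy sketch unrolls an RNN over a short sequence. The function name `rnn_forward`, the zero initial hidden state, the choice of tanh for Φ, and all array sizes are assumptions made for this example and are not taken from the model used in this study.

```python
import numpy as np

def rnn_forward(X, W_xh, W_hh, b_h, W_hq, b_q, phi=np.tanh):
    """Apply Eqs. (1) and (2) at every time step of a sequence.

    X has shape (T, n, d): T time steps, n sequences, d input features.
    Returns the outputs, shape (T, n, q), and the final hidden state, shape (n, h).
    """
    T, n, _ = X.shape
    H = np.zeros((n, W_hh.shape[0]))  # assumed zero initial hidden state
    outputs = []
    for t in range(T):
        H = phi(X[t] @ W_xh + H @ W_hh + b_h)  # Eq. (1)
        outputs.append(H @ W_hq + b_q)         # Eq. (2)
    return np.stack(outputs), H

# Illustrative sizes only: 5 time steps, 2 sequences, 1 feature, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
T, n, d, h, q = 5, 2, 1, 4, 2
X = rng.normal(size=(T, n, d))
W_xh = 0.1 * rng.normal(size=(d, h))
W_hh = 0.1 * rng.normal(size=(h, h))
b_h = np.zeros(h)
W_hq = 0.1 * rng.normal(size=(h, q))
b_q = np.zeros(q)

O, H_last = rnn_forward(X, W_xh, W_hh, b_h, W_hq, b_q)
print(O.shape, H_last.shape)
```

The same five parameter arrays are reused at every time step, which is why the parameter count does not depend on the sequence length.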

2.2 LSTM

To address the problem of gradient attenuation or gradient explosion that is common with RNNs, and to capture the connections between data separated by large time step distances, gated RNN models have been proposed. A well-known example is the LSTM-RNN, which introduces input gates, forget gates, output gates, and memory cells to record data (Lee 2000).

As shown in Fig. 2, the inputs to the LSTM gates are the current input, Xt, and the previous hidden state, Ht−1. The gate outputs are calculated by fully connected layers with a sigmoid activation function, σ. The input gate, It, the forget gate, Ft, and the output gate, Ot, at time step, t, are calculated as follows:

$$ I_{t} = \sigma (X_{t} W_{xi} + H_{t - 1} W_{hi} + b_{i} ) $$
(3)
$$ F_{t} = \sigma (X_{t} W_{xf} + H_{t - 1} W_{hf} + b_{f} ) $$
(4)
$$ O_{t} = \sigma (X_{t} W_{xo} + H_{t - 1} W_{ho} + b_{o} ) $$
(5)

where Wxi, Wxf, Wxo, Whi, Whf, and Who are weight parameters, while bi, bf, and bo are deviation parameters.

Fig. 2 Computation of the hidden state (the multiplication is elementwise)

The LSTM also requires candidate memory cells, \(C_{t}^{\prime }\), which use the tanh function as the activation function. At time step, t, they are calculated as follows:

$$ C_{t}^{\prime } = \tanh (X_{t} W_{xc} + H_{t - 1} W_{hc} + b_{c} ) $$
(6)

where Wxc and Whc are the weight parameters, while bc is the deviation parameter.

The calculation of the memory cell, Ct, was as follows:

$$ C_{t} = F_{t} \odot C_{t - 1} + I_{t} \odot C_{t}^{\prime } $$
(7)

Additionally, the hidden state, Ht, was calculated as follows:

$$ H_{t} = O_{t} \odot \tanh (C_{t} ) $$
(8)

The memory cell, Ct, combines the data in the previous memory cell, Ct−1, and the candidate memory cell, \(C_{t}^{\prime }\), with the flow of data controlled by the forget gate, Ft, and the input gate, It. The input gate, It, controls how much of the input, Xt, enters the memory cell, Ct, through the candidate memory cell, \(C_{t}^{\prime }\), and the forget gate, Ft, controls how much of the data in the previous memory cell, Ct−1, is carried over to time step, t. If the forget gate, Ft, remains close to 1 and the input gate, It, remains close to 0, the past memory cells are retained and passed on to the current time step (Kombrink et al. 2011).
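A single LSTM time step following Eqs. (3) to (8) can be sketched in NumPy as below. The dictionary keys, the array sizes, and the extreme bias values used to force Ft ≈ 1 and It ≈ 0 are assumptions chosen only to illustrate the retention behaviour described above; they are not parameters of the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(X_t, H_prev, C_prev, p):
    """One LSTM time step; p is a dict of weight and deviation parameters."""
    I_t = sigmoid(X_t @ p["W_xi"] + H_prev @ p["W_hi"] + p["b_i"])     # Eq. (3)
    F_t = sigmoid(X_t @ p["W_xf"] + H_prev @ p["W_hf"] + p["b_f"])     # Eq. (4)
    O_t = sigmoid(X_t @ p["W_xo"] + H_prev @ p["W_ho"] + p["b_o"])     # Eq. (5)
    C_cand = np.tanh(X_t @ p["W_xc"] + H_prev @ p["W_hc"] + p["b_c"])  # Eq. (6)
    C_t = F_t * C_prev + I_t * C_cand                                  # Eq. (7)
    H_t = O_t * np.tanh(C_t)                                           # Eq. (8)
    return H_t, C_t

# Illustrative sizes only: 3 sequences, 1 input feature, 4 hidden units.
rng = np.random.default_rng(0)
n, d, h = 3, 1, 4
p = {}
for gate in "ifoc":
    p["W_x" + gate] = 0.1 * rng.normal(size=(d, h))
    p["W_h" + gate] = 0.1 * rng.normal(size=(h, h))
    p["b_" + gate] = np.zeros(h)

# Force the forget gate towards 1 and the input gate towards 0 (illustrative values).
p["b_f"] = np.full(h, 20.0)
p["b_i"] = np.full(h, -20.0)

X_t = rng.normal(size=(n, d))
H_prev = np.zeros((n, h))
C_prev = rng.normal(size=(n, h))
H_t, C_t = lstm_step(X_t, H_prev, C_prev, p)
print(np.abs(C_t - C_prev).max())  # very small: the past memory cell is retained
```

With these gate settings the new memory cell is almost identical to the previous one, which is the mechanism that lets the LSTM carry information across many time steps.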

2.3 Bidirectional RNNs

In the RNN models described above, data flow from front to back through the hidden states, so the state of the current time step depends only on earlier time steps. In some cases, however, the state of the current time step can also be informed by the states of subsequent time steps (Mikolov et al. 2010). For example, when identifying a batch of signal data, the interpretation of earlier data may be revised according to later data. Bidirectional RNNs handle such cases by adding hidden layers in which data flow from back to front. Figure 3 shows the architecture of a bidirectional RNN with a single hidden layer.

Fig. 3 Architecture of a bidirectional RNN

Given the input, Xt, at time step, t, and the hidden layer activation function, Φ, let the forward hidden state in the bidirectional RNN be \(\overrightarrow {H}_{t}\) and the reverse hidden state be \(\overleftarrow {H}_{t}\). The forward and reverse hidden states are calculated as follows:

$$ \overrightarrow {H}_{t} = \Phi \left( X_{t} W_{xh}^{(f)} + \overrightarrow {H}_{t - 1} W_{hh}^{(f)} + b_{h}^{(f)} \right) $$
(9)
$$ \overleftarrow {H}_{t} = \Phi \left( X_{t} W_{xh}^{(b)} + \overleftarrow {H}_{t + 1} W_{hh}^{(b)} + b_{h}^{(b)} \right) $$
(10)

where Wxh(f), Whh(f), Wxh(b), and Whh(b) are the weight parameters, while bh(f) and bh(b) are the deviation parameters.

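The forward and reverse hidden states are concatenated to obtain the hidden state, Ht, which is fed into the output layer. The output, Ot, of the output layer is calculated as follows: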

$$ O_{t} = H_{t} W_{hq} + b_{q} $$
(11)

Here, Whq and bq are the weight and deviation parameters of the output layer of the model, respectively.
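The bidirectional computation in Eqs. (9) to (11) can be sketched as follows. This is a minimal NumPy illustration that assumes plain (non-LSTM) hidden units, zero initial states, tanh for Φ, and the common convention of concatenating the forward and reverse hidden states before the output layer; the function and variable names are hypothetical.

```python
import numpy as np

def bidirectional_rnn(X, fwd, bwd, W_hq, b_q, phi=np.tanh):
    """X has shape (T, n, d); fwd and bwd hold W_xh, W_hh, b_h for each direction."""
    T, n, _ = X.shape
    h = fwd["W_hh"].shape[0]
    H_f = np.zeros((T, n, h))
    H_b = np.zeros((T, n, h))

    state = np.zeros((n, h))  # assumed zero initial state
    for t in range(T):  # front to back, Eq. (9)
        state = phi(X[t] @ fwd["W_xh"] + state @ fwd["W_hh"] + fwd["b_h"])
        H_f[t] = state

    state = np.zeros((n, h))
    for t in reversed(range(T)):  # back to front, Eq. (10)
        state = phi(X[t] @ bwd["W_xh"] + state @ bwd["W_hh"] + bwd["b_h"])
        H_b[t] = state

    H = np.concatenate([H_f, H_b], axis=-1)  # shape (T, n, 2h)
    return H @ W_hq + b_q                    # Eq. (11)

# Illustrative sizes only.
rng = np.random.default_rng(0)
T, n, d, h, q = 4, 2, 1, 3, 2

def make_params():
    return {"W_xh": 0.5 * rng.normal(size=(d, h)),
            "W_hh": 0.5 * rng.normal(size=(h, h)),
            "b_h": np.zeros(h)}

fwd, bwd = make_params(), make_params()
W_hq = 0.5 * rng.normal(size=(2 * h, q))
b_q = np.zeros(q)
X = rng.normal(size=(T, n, d))

O = bidirectional_rnn(X, fwd, bwd, W_hq, b_q)

# Changing only the last input also changes the output at the first time step,
# because the reverse hidden state carries information from back to front.
X_mod = X.copy()
X_mod[-1] += 1.0
O_mod = bidirectional_rnn(X_mod, fwd, bwd, W_hq, b_q)
print(O.shape, np.abs(O_mod[0] - O[0]).max())
```

In a unidirectional RNN the output at the first time step would be unaffected by such a change, which is the practical difference between the two architectures.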

3 Method of Identifying Rock Burst FEMR Precursor Signals

3.1 Framework of the Method of Identifying FEMR Rock Burst Precursory Signals

As shown in Fig. 4, the rock burst FEMR precursor signals are identified in accordance with four major steps, namely FEMR data collection, FEMR data pre-processing, intelligent rock burst precursor signal identification, and early warning of rock burst danger.

Fig. 4 Framework of the method of identifying rock burst FEMR precursor signals

3.2 Procedures for Identifying the Rock Burst FEMR Precursor Signals

  1. FEMR Data Collection: The FEMR data were collected at 20-s intervals using a coal-rock dynamic disaster acoustic and electrical monitoring system (Qiu et al. 2018; Wang et al. 2011, 2009) installed 30 m along the groove on the west working face of the Daanshan Coal Mine. The KBD7 FEMR sensor (Wang et al. 2011), which is kept at a distance of 30 m from the working face, moves with the advancement of the working face. The data were transmitted from the FEMR sensor to the ground server via the monitoring substation and the industrial ring network switch. Thereafter, they were transmitted via the internet in real time to the laboratory server.

  2. FEMR Data Pre-Processing: The FEMR data were pre-processed using an averaging method (Liu and Wang 2018): the data collected within each hour were replaced by their arithmetic mean, in chronological order, and the set of all hourly mean values was used as the FEMR data set. Considering the extremely large number of data used by the RNN model (all the FEMR data collected from May 2015 to July 2018), the averaging method was adopted because it has two advantages. First, it eliminates a large part of the interference in the FEMR data while effectively preserving the integrity of the original data. Second, it compresses the FEMR data and thereby improves the efficiency of the calculations performed by the model. With the aid of this method, the model can capture the connections between FEMR signals over a longer time span and thereby recognize the signals more accurately.

  3. Intelligent Identification of Rock Burst Precursor Signal: The intelligent rock burst precursor signal identification method uses a bidirectional RNN model (Fig. 5) to analyze a large number of normal FEMR signals and rock burst FEMR precursor signals so as to quickly and accurately identify the danger in the signals. The RNN model was built on the bidirectional RNN framework in Fig. 3, with the hidden state recorded by the LSTM unit in Fig. 2. The bidirectional RNN processes data through the bidirectional flow of the hidden state, and the LSTM unit introduces input gates, forget gates, output gates, and memory cells to better record the hidden state. The forward calculation of the forward hidden state \(\overrightarrow {H}_{t}\) and the reverse hidden state \(\overleftarrow {H}_{t}\) in the RNN model is shown in formulas (9) and (10), and the specific calculation of the hidden state Ht is shown in formula (8). The output of the output layer is shown in formula (11). The RNN model was created using MXNet and its high-level API, Gluon (Wang et al. 2018). Thereafter, the model was trained on the GPU by supervised learning (Shi et al. 2018).

  4. Production of Data Sets: To train an effective RNN model, a large and diverse data set should be provided for the model. At present, there is no public training set applicable to FEMR data from coal and rock, and it is rather difficult to establish a coal and rock FEMR training set for three reasons: (1) The complex underground mining environment increases the difficulty of signal acquisition. (2) The marking of training samples requires considerable labor. (3) Private companies do not share FEMR data.

    After a long period of on-site collection and manual labeling, we have completed the production of original training, validation, and test data sets. Among the FEMR data from Daanshan Coal Mine, 60%, 20%, and 20% were used as the three sets, respectively. Based on on-site records of the dangerous situation, the FEMR training and validation data sets collected during measurements at the Daanshan Coal Mine were marked as “normal” or “dangerous”. (Each data set marked “normal” contained 200 time-series FEMR data under normal or interference conditions, and each marked “dangerous” contained all the FEMR precursor data monitored for each rock burst.) The three data sets of the Daanshan Coal Mine are shown in Fig. 6 where various interference conditions including drilling, roof support, guns taken, cables, electrical equipment, sensors, etc. have been marked. These interference signals, caused by the work of the working face and personnel activities, are intense and increase suddenly. They are marked as “normal” due to their obvious interference characteristics. After a long period of on-site collection, we have collected abundant rock burst FEMR precursor data, and increased the number and proportion of samples with “dangerous” labels through repeated sampling and random promotion of disturbances in the precursor data[]. In this way, the ratios of positive and negative samples in the three data sets are balanced. Finally, we obtained a new data set based on the original data set. Among the 15,647 training samples in the new data set, 6000 are “dangerous” signal samples, and the rest are “normal” signal samples. Among the 5230 validation samples, 2000 are “dangerous” signal samples. The number of test samples is 5000, and they are randomly selected. Typical cases are shown in Figs. 7 and 8.

    All the data in the three sets are from the west working face of the Daanshan Coal Mine. Although the three sets share the same distribution, they are independent of each other. The training set is used to fit the parameters of the network. The validation set is used to adjust the hyperparameters. The test set is used to test the performance of the trained network. As the number of collected data grows, the training set will contain more data covering various interference conditions, and the training effect will improve. In this way, the model can respond well to all kinds of interference and generalize better.

  5. Creation, Training, and Testing of the Models: The complete bidirectional LSTM-RNN model was defined by reading the training data set, creating data iterators, adopting bidirectional LSTM, and outputting classification results, after which the model was trained and the prediction function was defined. After training, the data in the validation set were read, and the RNN model parameters were adjusted to optimize the model. The optimized hyperparameters are as follows: the learning rate is 0.01; the number of epochs is 300; the optimization method is Adam; the loss function is softmax cross entropy; and the number of layers is 9. The training and validation losses are shown in Fig. 9. Both are small after 300 epochs, which is indicative of an excellent fitting effect. In the case described here, the training process takes about 4 h. Finally, the FEMR signal sequence was fed into the RNN model to judge whether the signal sequence indicated rock burst danger. The recognition speed of the model on the GPU is very fast (0.02 s on average), which makes it feasible for large-scale data sets.

  6. Early Warning of Rock Burst Danger: The precursor signal recognition results and the development trend of the FEMR signals were analyzed comprehensively. If the result of the intelligent rock burst prediction was “dangerous” and the FEMR signal exhibited a tendency to increase, it was judged that there was rock burst danger. In this case, the danger information was sent.
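The pre-processing and early warning steps in the procedure above can be sketched as follows. This is a minimal illustration, not the monitoring system's implementation: the hourly averaging follows the description of the averaging method, whereas the handling of an incomplete trailing hour, the least-squares slope used as the trend test, and the function names are all assumptions introduced for the example.

```python
import numpy as np

SAMPLES_PER_HOUR = 3600 // 20  # 20-s sampling interval gives 180 samples per hour

def hourly_mean(signal):
    """Replace each full hour of 20-s FEMR samples with its arithmetic mean."""
    signal = np.asarray(signal, dtype=float)
    n_hours = signal.size // SAMPLES_PER_HOUR
    # Assumption: an incomplete trailing hour is dropped.
    trimmed = signal[: n_hours * SAMPLES_PER_HOUR]
    return trimmed.reshape(n_hours, SAMPLES_PER_HOUR).mean(axis=1)

def is_increasing(hourly, min_slope=0.0):
    """Hypothetical trend test: slope of a least-squares line through the hourly means."""
    t = np.arange(hourly.size)
    slope = np.polyfit(t, hourly, 1)[0]
    return bool(slope > min_slope)

def issue_warning(model_label, hourly):
    """Warn only if the model output is "dangerous" and the signal tends to increase."""
    return model_label == "dangerous" and is_increasing(hourly)

# Synthetic example: 24 h of data whose level rises by one unit every hour.
raw = np.repeat(np.arange(24.0), SAMPLES_PER_HOUR)
hourly = hourly_mean(raw)
print(hourly.size, issue_warning("dangerous", hourly), issue_warning("normal", hourly))
```

Averaging reduces every 180 raw samples to one value, so a sequence of fixed length covers a time span 180 times longer, which is the compression benefit described in the pre-processing step.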

Fig. 5 Bidirectional RNN model

Fig. 6 a–d Original data of FEMR training set. e, f Original data of FEMR validation set. g, h Original data of FEMR test set

Fig. 7 a, b Actually collected FEMR signals marked with “dangerous” labels. c, d Examples of FEMR signals marked as “dangerous” with randomly increased disturbance

Fig. 8 a–c Examples of FEMR signals marked “normal”

Fig. 9 Comparison between the training loss and the validation loss of the RNN model
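For reference, the softmax cross-entropy loss named among the hyperparameters and plotted in Fig. 9 can be written out for a two-class problem as in the following NumPy sketch. The label encoding (0 for “normal”, 1 for “dangerous”) and the example logits are assumptions for illustration; the study itself used the loss provided by the deep learning framework.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for logits of shape (n, k) and integer labels of shape (n,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.size), labels].mean()

labels = np.array([0, 0, 1])  # assumed encoding: 0 = "normal", 1 = "dangerous"

confident = np.array([[4.0, -4.0], [3.0, -3.0], [-5.0, 5.0]])  # correct and confident
undecided = np.zeros((3, 2))                                    # equal scores for both classes

print(softmax_cross_entropy(confident, labels))
print(softmax_cross_entropy(undecided, labels))  # equals ln 2 for two equally scored classes
```

A loss well below ln 2 on both the training and validation sets therefore indicates that the model separates the two classes better than an uninformed guess.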

4 Examples of Early Warning of Rock Burst

After the RNN model was optimized, the intelligent rock burst danger precursor signals were identified on the FEMR test data set for the Daanshan Coal Mine; the results are shown in Fig. 10. On the west working face of the Daanshan Coal Mine, the intensity of the FEMR signal began to increase gradually from February 19, 2018. By February 23, the signal fluctuated dramatically and exceeded the warning threshold, and this intense FEMR signal persisted until February 25, when a level 2.1 rock burst occurred. When the FEMR signal sequence from February 19 to 25 was input into the RNN model, the recognition results shown in Fig. 11a were obtained (the shaded part in the figure represents the FEMR danger signal). The identification result of the precursor signal was consistent with the label marked “dangerous” as well as the on-site rock burst record, suggesting that the parameters chosen for the RNN model are reasonable.

Fig. 10 a, b Intelligent identification of rock burst hazard precursor signals using the test data set

Fig. 11 Recognition result of intelligent rock burst precursory signal

From June 21, 2018, the FEMR signal showed a growing trend, and by June 24, the signal had surged notably. Using the RNN model, the rock burst danger was identified from June 21 to 24, and a hazard warning was issued. The recognition result is shown in Fig. 11b (the shaded part in the figure represents the FEMR danger signal). From June 23 to 25, the west working face of the Daanshan Coal Mine showed obvious rock pressure behavior as well as increased support resistance. On the evening of June 25, production was stopped and pressure relief measures were taken, leading to a significant decrease in the strength of the FEMR signal. However, from June 28, its intensity began to increase intermittently, and by July 11, it fluctuated sharply. The RNN model identified the rock burst danger from July 5 to 11, and a hazard warning was issued. The recognition result is shown in Fig. 11c (the shaded part in the figure represents the FEMR danger signal). From July 11 to 14, the west working face of the Daanshan Coal Mine again showed increased support resistance. Pressure relief blasting was carried out in the mine in the early hours of July 14, after which the FEMR signal dropped significantly and regained stability, and the rock burst danger was eliminated.

In summary, the early rock burst prediction results obtained with the FEMR precursor signal recognition method matched well with the drill cuttings, mine pressure behavior, and pressure relief records on the west working face of the Daanshan Coal Mine. Information on the rock burst danger could be captured in advance.

5 Discussion

5.1 Is the Precursor Signal Recognition Method Based on RNNs Better than the Traditional Method?

In the traditional FEMR precursor signal identification method, the FEMR signals are collected, and their time sequence and amplitude characteristics are analyzed manually (Qiu et al. 2018; Wang et al. 2009). This method has low timeliness and accuracy and depends on experience. Although its judgment generally agrees with the on-site record, it may make wrong judgments under certain circumstances. For example, it tends to make mistakes when identifying precursor signals under the interference of drilling and roof support or precursor signals below the critical value, thus affecting the judgment of rock burst danger.

Compared with the traditional method, the FEMR precursor signal recognition method introduced in this paper does not require manual analysis of FEMR signal data, and the trained RNN model is automated (without requiring parameter adjustment) and efficient on the GPU. The RNN model can identify the two rock burst hazards in Fig. 10 whose signals remained below the critical value, because it is good at capturing the relationship between FEMR signals over a long time span. By storing long-term FEMR signal growth in hidden states, the model enhances the accuracy of rock burst FEMR precursor signal identification. Besides, on the real data set, the model can realize automatic, intelligent recognition without manual intervention, thus saving labor and time while reducing ambiguity. In the case described here, the training process takes about 4 h, which may be considered time-consuming. However, the recognition speed of the trained model on the GPU is very fast (0.02 s on average), which makes it feasible for large-scale data sets. Moreover, as more data are collected, the identification effect, timeliness, and accuracy of the model improve, which is conducive to the identification of rock burst danger.

6 Conclusions

  1. In this study, a method for identifying rock burst FEMR precursor signals was proposed. It is primarily composed of FEMR data collection, FEMR data pre-processing, intelligent rock burst precursory signal recognition, and rock burst early warning. With this method, it was possible to comprehensively analyze the development trend of FEMR signals. The method can realize automatic, efficient, and intelligent identification of rock burst precursor signals without requiring manual intervention. Based on the intelligent precursory danger signal recognition results, intelligent early warning of rock burst danger was achieved.

  2. The results of the application of the model to on-site early warning showed that it responds well to the occurrence of rock bursts and can capture information on rock burst hazards in advance.