Introduction

Rockburst is a dynamic and uncontrolled natural hazard in mining and civil engineering, associated with high-stress conditions and brittle rock (Kaiser et al. 1996; Ortlepp 2005). Driven by the sudden release of elastic strain energy, rockburst produces violent phenomena such as bursting, spalling, ejecting, and slabbing within a short time and at high velocity, considerably jeopardizing the safety of personnel and destroying field equipment and established structures (Kie 1988; Cai 2013). Numerous rockburst cases and accidents during underground operations (i.e., tunneling and mining) have been reported in many countries, including China, the USA, Australia, Canada, South Africa, and Japan (Dehghan et al. 2013; He et al. 2018; Zhou et al. 2018). For example, during construction of the Jinping II hydropower station in China (Zhang et al. 2012), many intense rockburst events occurred in the tunnels, where the overburden depth exceeds 2525 m (rockbursts of different intensities are shown in Fig. 1). Rockburst is thus a worldwide issue, and its prediction and control play a significant role in safe production.

Fig. 1
figure 1

Examples of rockburst in the Jinping hydropower station, China. a Moderate rockburst, b intense rockburst case 1, c intense rockburst case 2, and d extremely intense rockburst (modified after Zhang et al. 2012)

In recent decades, numerous studies on the mechanisms, types, and hazard control of rockburst have been conducted through field studies, laboratory tests, and theoretical analysis (Li et al. 2017c; Wu et al. 2019; Xibing et al. 2019). Moreover, several advanced monitoring approaches, such as microseismic monitoring systems, microgravity, and geological radar, have been utilized (Lu et al. 2012); in some cases, they can record precursors and give warning before a rockburst occurs. However, accurate rockburst prediction remains difficult because it is affected by many factors, such as stress level, energy accumulation, rock properties, and geological conditions (Zhang et al. 2017). Long-term prediction of rockburst intensity is normally used in the initial stages of many underground construction activities, for which several models and methods have been presented: theoretical prediction models, typically based on strength theory and energy theory; empirical knowledge methods; simulation approaches; and artificial intelligence (AI) methods (Peng et al. 2010; Zhou et al. 2019, 2020a, b; Farhadian 2021; Wang et al. 2021).

Among these methods, AI methods, also known as machine learning-based models, have been successfully applied to the prediction and classification of rockburst (Feng and Wang 1994; He et al. 2015). Various AI methods have been used to predict long-term rockburst hazards, including support vector machines, cloud models, fuzzy comprehensive evaluation methods, artificial neural networks, Bayesian networks, and decision trees. The relevant studies are summarized in Table 1, together with their accuracy and the number of data. It should be pointed out that although the machine learning methods adopted in previous studies can predict rockburst, they still have disadvantages such as long training time, random selection of hyperparameters, and low accuracy on larger datasets (Huang et al. 2020a, b, 2021; Sun et al. 2020, 2021; Zhang et al. 2020; Wu et al. 2021). Therefore, a high-efficiency and high-accuracy model is needed for rockburst prediction on large datasets.

Table 1 Previous studies on rockburst prediction with specific accuracy and data by machine learning methods

In recent years, random forest (RF) has become one of the most popular machine learning approaches for regression and classification and has been widely employed in areas such as civil engineering, mining engineering, and materials science (Qi et al. 2018; Sun et al. 2019a; Zhang et al. 2019). RF is a robust and simple classifier, and researchers and practitioners can therefore easily use it to predict rockburst in practice (Breiman 2001). A few studies have used the RF model to classify rockburst (Dong et al. 2013; Zhou et al. 2016a; Lin et al. 2018); however, their accuracy on large datasets is poor because they lack an efficient way to tune the model structure intelligently (Sun et al. 2019b, c, 2020). To achieve the best structure, the key parameters of RF, known as hyperparameters (the number of trees and the minimum leaf size), need to be optimized. The firefly algorithm (FA) is a global optimization algorithm that performs better at escaping local optima and locating global optima than the widely used random search method and particle swarm optimization (PSO) (Yang and He 2013). Therefore, FA can be used to search for the optimum parameters of RF, yielding the ensemble method RF-FA. To the authors' knowledge, only a few studies have used ensemble approaches for rockburst classification, and there are no relevant reports on the RF-FA model in this area.

To fill this gap in rockburst prediction by machine learning-based models, this paper presents and assesses an ensemble model combining RF and FA to classify rockburst in underground engineering. A large dataset representing rockburst events worldwide was collected from previous studies. A five-fold cross-validation method and FA were used to select the optimum hyperparameters of RF. The combined RF-FA ensemble classifier was then validated on the testing dataset. Compared with conventional empirical criteria and existing RF models, the proposed ensemble classifier achieved better accuracy. Finally, the RF-FA model was successfully applied to several new rockburst events in engineering projects, indicating that it is a robust method for rockburst evaluation in underground projects.

Dataset description

A total of 279 rockburst cases reported in the previous literature were collected to evaluate rockburst intensity (Jingyu 2010; Zhou et al. 2012, 2016a; Li et al. 2017b; Afraei et al. 2019; Ghasemi et al. 2020). The data were first checked, updated, and supplemented, as some records differed from the original literature. Each case includes five influencing variables as input parameters and one output parameter, namely, rockburst intensity. Specifically, the input variables are the buried depth of the opening (H), the maximum tangential stress at the excavation boundary (\({\sigma }_{\theta }\)), the uniaxial compressive strength of rock (\({\sigma }_{c}\)), the tensile strength of rock (\({\sigma }_{t}\)), and the elastic energy index (Wet). These input variables are commonly used in rockburst classification and reflect the basic conditions under which rockburst occurs underground. The output parameter, rockburst intensity, has four classes according to the failure characteristics: none, light, moderate, and strong. The detailed definitions of the input and output parameters are summarized in Table 2. The cumulative distribution and frequency of each input variable are shown in Fig. 2. The statistics of the inputs and the output are given in Table 3 and the pie chart (Fig. 3), respectively.

Fig. 2
figure 2

The histograms of rockburst under various influencing variables

Table 2 The description of input and output parameters
Fig. 3
figure 3

Statistics of rockburst classes

Methodology

The collected dataset was divided into two sets: a training set and a testing set. The training set was used to train the classification model and select the ideal hyperparameters of the classifier, while the testing set was used to assess the generalization of the classifier. Rockburst prediction, as a typical classification problem, can be addressed by an intelligent classifier. Of the collected data, the training set contains 195 cases (70%) and the testing set 84 cases (30%). The following subsections introduce the machine learning model (random forest), the global optimization method (firefly algorithm), the performance measures, and the hyperparameter tuning procedure.
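As a minimal sketch of this 70/30 partition (the `split_dataset` helper and the random placeholder arrays below are illustrative assumptions, not the authors' code):

```python
import numpy as np

def split_dataset(X, y, test_ratio=0.30, seed=42):
    """Shuffle the case indices and split into training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_ratio))   # 30% of the cases for testing
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# 279 synthetic cases with the five inputs (H, sigma_theta, sigma_c, sigma_t, Wet)
# and four intensity classes (1 = none, ..., 4 = strong).
X = np.random.rand(279, 5)
y = np.random.randint(1, 5, size=279)
X_tr, y_tr, X_te, y_te = split_dataset(X, y)   # 195 training / 84 testing cases
```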

Random forest classifier (RF)

Random forest trains and predicts on samples using multiple decision trees, and its output category is determined by aggregating the votes of the individual trees (Pal 2005). It has advantages such as handling different variable types and fast learning speed. The typical structure of the RF classifier is shown in Fig. 4. It is now widely used to construct classifiers that can handle different problems accurately.
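A classifier of this kind can be sketched with scikit-learn; the toy data below are placeholders, and the paper's `tree_num` and `min_sample_leaf` are assumed to correspond to scikit-learn's `n_estimators` and `min_samples_leaf` (an assumption about the implementation, not the authors' code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the rockburst table: five inputs, four intensity classes.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(1, 5, size=200)

# Illustrative hyperparameter values; the actual values are tuned by FA later.
clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=1, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)   # majority vote over the individual trees
```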

Firefly algorithm (FA)

FA is inspired by the flashing behavior of fireflies (Yang 2008, 2010). The flow chart of FA is illustrated in Fig. 5. The update rule for two fireflies xi and xj is given as follows:

$${x}_{i}^{t+1}={x}_{i}^{t}+\beta \mathrm{exp}\left(-\gamma {r}_{ij}^{2}\right)\left({x}_{j}^{t}-{x}_{i}^{t}\right)+{\alpha }_{t}{\varepsilon }_{t}$$
(1)

where β denotes the attractiveness, rij the distance between xi and xj, γ the light absorption coefficient, αt a step-size parameter, and εt a random number drawn from a Gaussian distribution.
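Eq. (1) can be implemented directly; the `firefly_step` function below is a sketch under the definitions above, with β, γ, and αt left as free parameters:

```python
import numpy as np

def firefly_step(x_i, x_j, beta=1.0, gamma=1.0, alpha_t=0.1, rng=None):
    """One position update of firefly i moving toward brighter firefly j (Eq. 1)."""
    rng = np.random.default_rng() if rng is None else rng
    r_ij = np.linalg.norm(x_i - x_j)                 # distance between the fireflies
    attraction = beta * np.exp(-gamma * r_ij ** 2)   # attractiveness decays with distance
    eps_t = rng.standard_normal(x_i.shape)           # Gaussian random vector
    return x_i + attraction * (x_j - x_i) + alpha_t * eps_t
```

With γ = 0 and αt = 0, the update reduces to a plain move of fraction β toward the brighter firefly, which is a convenient sanity check.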

Fig. 4
figure 4

Structure of the random forest classifier

Performance evaluation methods

The objective of this work is to present an intelligent model for accurate rockburst prediction, so the method used to evaluate model performance is important. The error matrix, also called the confusion matrix, visualizes model performance: each row corresponds to an actual class and each column to a predicted class. In this study, the rockburst classes are labeled 1, 2, 3, and 4, representing none, light, moderate, and strong, respectively. The correct and incorrect classification combinations are summarized in Table 4.

Table 3 The collected input variables

Based on the entries in Table 4, evaluation metrics such as accuracy, precision, and recall can be defined (Zhou et al. 2015).

$$\mathrm{Accuracy}= \frac{{n}_{11}+{n}_{22}+{n}_{33}+{n}_{44}}{n}$$
(2)
$$\begin{aligned}\mathrm{Precision}=&\frac{1}{4} (\frac{{n}_{11}}{{n}_{11}+{n}_{21}+{n}_{31}+{n}_{41}}+\frac{{n}_{22}}{{n}_{12}+{n}_{22}+{n}_{32}+{n}_{42}}\\ &+\frac{{n}_{33}}{{n}_{13}+{n}_{23}+{n}_{33}+{n}_{43}}+\frac{{n}_{44}}{{n}_{14}+{n}_{24}+{n}_{34}+{n}_{44}})\end{aligned}$$
(3)
$$\begin{aligned}\mathrm{Recall}=&\frac{1}{4} (\frac{{n}_{11}}{{n}_{11}+{n}_{12}+{n}_{13}+{n}_{14}}+\frac{{n}_{22}}{{n}_{21}+{n}_{22}+{n}_{23}+{n}_{24}}\\ &+\frac{{n}_{33}}{{n}_{31}+{n}_{32}+{n}_{33}+{n}_{34}}+\frac{{n}_{44}}{{n}_{41}+{n}_{42}+{n}_{43}+{n}_{44}})\end{aligned}$$
(4)

where n represents the total number of samples and \({n}_{ij}\) denotes the number of samples of actual class i classified as class j (Table 4).
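On a confusion matrix laid out as in Table 4, Eqs. (2)-(4) amount to the normalized trace, the column-normalized diagonal, and the row-normalized diagonal. A small sketch (the `macro_metrics` helper is illustrative):

```python
import numpy as np

def macro_metrics(cm):
    """cm[i, j] = number of cases of actual class i predicted as class j (Eqs. 2-4)."""
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()                 # Eq. (2)
    precision = np.mean(np.diag(cm) / cm.sum(axis=0))  # Eq. (3): column sums = predicted totals
    recall = np.mean(np.diag(cm) / cm.sum(axis=1))     # Eq. (4): row sums = actual totals
    return accuracy, precision, recall
```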

In addition, two typical evaluation tools for classification models are the receiver operating characteristic (ROC) curve and the AUC (the area under the ROC curve). In the ROC curve, the horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). Rockburst hazard determination is a four-class problem, and therefore a multiclass extension of ROC and AUC is needed to evaluate it correctly. Let C be the set of n classes; each class ci in turn is treated as the positive class Pi, with the remaining classes together forming the negative class Ni, i.e.,

$${P}_{i}={c}_{i}$$
(5)
$${N}_{i}={\bigcup }_{j\ne i}{c}_{j},\quad {c}_{j}\in C$$
(6)

Each Ni consists of n − 1 classes, and the class-reference ROC curve of ci is computed against this composite negative class. The overall AUC is obtained by weighting each class-reference AUC by the prevalence of its reference class:

$${\mathrm{AUC}}_{total}={\sum }_{{c}_{i}\in C}\mathrm{AUC}({c}_{i})\bullet p({c}_{i})$$
(7)

where \(\mathrm{AUC}({c}_{i})\) denotes the area under the class reference ROC for ci and \(p({c}_{i})\) is the prevalence of ci.
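Eq. (7) can be sketched as a prevalence-weighted sum of one-vs.-rest AUCs. The rank-statistic (Mann-Whitney) form of the binary AUC and the helper names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def binary_auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic (labels: 1 = positive class)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def weighted_auc(score_matrix, y):
    """Prevalence-weighted one-vs.-rest AUC (Eq. 7); score_matrix[:, i] scores class i+1."""
    total = 0.0
    for i in range(score_matrix.shape[1]):
        pos = (y == i + 1).astype(int)                    # c_i as positive class
        total += binary_auc(score_matrix[:, i], pos) * pos.mean()  # weight = p(c_i)
    return total
```

(Ties in the scores are not handled here; a production implementation would use midranks.)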

Model development

In this study, the k-fold cross-validation (CV) method was utilized for hyperparameter tuning. k was set to five: the training data were split into five folds, the RF model was trained on four folds, and the remaining fold was used for validation; this process was repeated five times. The RF hyperparameters tree_num and min_sample_leaf were adjusted by five-fold CV and FA. During tuning, the AUC value serves as the light intensity of each firefly, so the brightest firefly corresponds to the largest AUC. The initial settings of FA are summarized in Table 5. After tuning, the optimum RF classifier (RF-FA, the combination of RF and FA) was evaluated on the testing dataset. The proposed ensemble model was then compared with baseline models, namely, multiple linear regression (MLR) and the conventional RF model. The influence of the input variables was also analyzed with the ensemble model. The whole procedure of the RF-FA ensemble classifier and its performance evaluation is illustrated in Fig. 6.
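The five-fold splitting step can be sketched as follows; the `five_fold_indices` generator is an illustrative stand-in for whatever CV utility was actually used:

```python
import numpy as np

def five_fold_indices(n, seed=42):
    """Yield (train_idx, val_idx) pairs for 5-fold cross-validation over n cases."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, 5)          # five disjoint folds
    for k in range(5):
        val = folds[k]                      # one fold held out for validation
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, val
```

During tuning, each candidate hyperparameter pair would be scored by the mean validation AUC over the five folds, and FA would move the candidates toward the best-scoring ones.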

Table 4 Error matrix in rockburst classification
Fig. 5
figure 5

Flow chart of firefly algorithm

Results and validation

Results of hyperparameter tuning

Unlike a binary classification problem, rockburst classification is a multiclass problem, so a one-vs.-rest strategy was adopted. The main idea of the one-vs.-rest method is to train a separate binary classifier for each class, treating that class's samples as positives and all remaining samples as negatives. Specifically, four binary RF classifiers were trained, one per class. During this process, the AUC was set as the objective function, and the RF hyperparameters were tuned by FA to construct the combined RF-FA model. The four RF-FA models were then applied to unknown samples.
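The one-vs.-rest setup can be sketched with scikit-learn; the per-class hyperparameter values below are placeholders, not the values actually tuned by FA in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-class hyperparameters, standing in for the FA-tuned values.
tuned = {1: dict(n_estimators=80), 2: dict(n_estimators=120),
         3: dict(n_estimators=100), 4: dict(n_estimators=60)}

rng = np.random.default_rng(1)
X, y = rng.random((200, 5)), rng.integers(1, 5, size=200)

# One binary RF per class: that class is positive, the rest are negative.
models = {c: RandomForestClassifier(random_state=0, **params).fit(X, y == c)
          for c, params in tuned.items()}

# Predict the class whose binary model gives the highest positive probability.
scores = np.column_stack([models[c].predict_proba(X)[:, 1] for c in sorted(models)])
pred = scores.argmax(axis=1) + 1
```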

To show the effect of FA on hyperparameter tuning, the AUC values were recorded at each iteration (Fig. 7). The AUC values and convergence showed different patterns across classes. As the iterations increased, the average AUC values for the light, moderate, and strong classes became stable after about 10 iterations, indicating that the hyperparameters have an obvious influence on the performance of the RF model. The AUC of the strong class converged more slowly than those of the moderate and light classes. Notably, the performance of RF on the none class was influenced significantly by the hyperparameters: the AUC value increased sharply at the first iteration and then rose slowly until about 40 iterations, indicating that the model can predict this larger class accurately. The above analysis shows that the hyperparameters of RF were tuned effectively by FA.

Fig. 6
figure 6

The procedures of RF-FA classifier ensemble for rockburst prediction

As shown, the optimum AUC value for the none class reaches 0.98, demonstrating the excellent performance of the RF-FA model on the training dataset. The final AUC values for the moderate and strong classes are 0.87 and 0.85, respectively, which also indicates that RF-FA is an outstanding classification model. Although the light class exhibits a relatively low average AUC value (0.81), probably because of the small number of cases in this class, the model still performs well on it. Once the best AUC values were obtained, the corresponding hyperparameters of RF were determined, and the optimum ensemble RF-FA model could then be evaluated on the testing dataset. The selected hyperparameters of RF are given in Table 6.

Table 5 The basic parameters of FA

The validation of the proposed RF-FA model on the test set

During model development, the training dataset was used to train the model and select the optimum hyperparameters, while the remaining data, i.e., the testing set, were used to test the RF-FA model. The confusion matrix on the test set is given in Table 7. The accuracy exceeds 0.90 and few cases are misclassified, showing that the proposed RF-FA model performs excellently on rockburst prediction; the developed model is thus validated.

Table 6 The optimum hyperparameters in each class

Relative importance

To determine the feature importance for rockburst, the Fourier amplitude sensitivity test (FAST) was applied with the classifier ensemble as the objective function. FAST expresses the sensitivity of the output to the different variables, so an importance score can be calculated for each input variable. It is a global sensitivity analysis that can rank the input parameters by sensitivity with little computation. With this method, the influence of each input on the output is given by

$${S}_{i}={V}_{{x}_{i}}/V(Y)$$
(8)

where \({V}_{{x}_{i}}\) is the variance-based first-order effect of input \({x}_{i}\) and \(V(Y)\) is the total variance of the output.
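FAST itself requires a dedicated frequency-based sampling design. As a rough stand-in for Eq. (8), a first-order index can be approximated by binning one input and taking the variance of the conditional mean of the output; the `first_order_sensitivity` helper below is an illustrative approximation, not FAST:

```python
import numpy as np

def first_order_sensitivity(x, y, bins=20):
    """Crude estimate of S_i = V_{x_i} / V(Y): variance of the conditional mean
    of y given x_i, approximated by binning x_i into quantile bins (Eq. 8)."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    which = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    occupied = [b for b in range(bins) if (which == b).any()]
    cond_means = np.array([y[which == b].mean() for b in occupied])
    counts = np.array([(which == b).sum() for b in occupied])
    v_xi = np.average((cond_means - y.mean()) ** 2, weights=counts)
    return v_xi / y.var()
```

An input that fully determines the output scores near 1, and an irrelevant input scores near 0, matching how the importance percentages in Fig. 8 are interpreted.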

Figure 8 shows the importance score of each variable. Wet plays the most important role in rockburst intensity, followed by \({\sigma }_{\theta }\), H, and \({\sigma }_{c}\), while \({\sigma }_{t}\) is less important than the other variables. Specifically, the relative importance of Wet is 39.8%, while the values for \({\sigma }_{\theta }\) and H are 30.4% and 24.2%, respectively. Parameters \({\sigma }_{c}\) and \({\sigma }_{t}\) have relatively small scores (3.5% and 2.1%, respectively). Based on these scores, more attention should be paid to Wet, \({\sigma }_{\theta }\), and H in the field, but this does not mean the other variables are unimportant, as the results are obtained by comparing all influencing factors together.

Fig. 7
figure 7

The evolution of AUC values with different dataset classes by FA tuning

Discussion

Comparison of the ensemble RF-FA model with other models

The prediction performance of the proposed ensemble RF-FA model and the baseline classifiers, i.e., RF (random forest) and MLR (multiple linear regression), is given in Table 8. The RF-FA model clearly performs best (accuracy of 0.95 on the training set and 0.91 on the test set). Compared with the individual RF and MLR classifiers on the training set, the accuracy of the proposed RF-FA ensemble classifier increased by 20.2% and 58.3%, respectively. The RF-FA model also achieves the best values of the other indicators on the testing set (precision = 0.82, recall = 0.81), while the MLR model performs worst (precision = 0.48, recall = 0.40).

Table 7 The confusion matrix of the proposed model on test validation

Furthermore, the ROC curves and AUC values of the three models are shown in Fig. 9. The ensemble classifier RF-FA performed best: it achieved an AUC value of 0.95, followed by RF (0.83) and MLR (0.71). Interestingly, the proposed RF-FA model is the best in all cases, whether the threshold is set low (below 0.05) or high (above 0.95), which means the RF-FA ensemble classifier generalizes well and can be used for rockburst prediction in engineering.

Fig. 8
figure 8

The relative importance of variables

Fig. 9
figure 9

ROC curves of the proposed ensemble RF-FA model and the individual RF and MLR models

Comparison with existing RF models in previous studies

As described above, a few studies have applied the conventional RF model to rockburst assessment. In this section, the proposed ensemble classifier RF-FA is compared with these previous studies (Table 9). The accuracy of the proposed model (0.91) is considerably higher than that of the previous studies (0.73 and 0.61). Notably, compared with the previous literature (246 cases in total, with 6 or 7 input variables), the RF-FA model achieves higher performance with more cases (279) and fewer input variables (5), which shows that the proposed model is a robust and efficient method for rockburst prediction.

Comparison between RF-FA and empirical criteria

Several typical empirical criteria exist for rockburst prediction, such as the rock brittleness coefficient criterion, the burst proneness index, and the Russenes criterion. In this study, a detailed comparison between the proposed model and these empirical criteria was performed; the results are summarized in Table 10. The accuracy of the RF-FA model is much better than that of the empirical criteria, which indicates that the conventional criteria are limited by their single index and the small amount of data underlying them. The empirical criteria are generally conservative and not suitable for all engineering conditions, whereas the prediction model established in this paper exhibits high accuracy and broad applicability.

Table 8 Classification performance of ensemble classifiers and baseline models

Application in new engineering projects

The developed RF-FA model was used to predict eight rockburst events in four different engineering projects in deep tunnels and mines. Specifically, the field data come from the Cangling tunnel, the Dongguashan mine (Jia et al. 2013), the Duoxiongla tunnel (Tang and Xu 2020), and the Daxiangling tunnel (Jingyu 2010). The results are shown in Table 11. All cases were predicted correctly, which means the RF-FA model can be used for rockburst prediction in advance and can guide engineering practice.

Table 9 Comparison of the RF-FA model with previous studies using RF
Table 10 Comparison of the proposed RF-FA model with empirical criteria
Table 11 Engineering application of the proposed RF-FA model

Conclusion

This study proposed a novel ensemble classifier, RF-FA, combining random forest (RF) and the firefly algorithm (FA), to classify rockburst intensity in underground projects. The conclusions can be summarized as follows:

The FA algorithm can tune the hyperparameters of the random forest model (the individual classifier) effectively, which means the RF-FA model can be used for further applications.

The proposed ensemble classifier achieved better accuracy (0.91), an increase of 19.7% and 121.9% compared with the individual RF and MLR classifiers, respectively.

The relative importance results demonstrate that Wet is the most important variable in rockburst classification.

Compared with the empirical criteria and existing RF models, the proposed model performed much better in prediction accuracy. In addition, the proposed model was successfully used to predict rockburst events in new projects.

It should be noted that the accuracy and generalization of machine learning models are influenced by the dataset. In this study, 70% of the data were used to train the proposed model and the remaining 30% to validate it. The generalization could be further improved, to some extent, by training the model on all of the available data.