1 Introduction

In the construction of deep hard rock mines and tunnels, rockburst is a common type of geological hazard. Rockbursts generally occur when energy accumulated within the rock mass is suddenly released under external disturbance [51]. A rockburst at the Altenberg tin mine in Germany in 1640 caused heavy damage and may be the earliest recorded rockburst disaster [38]. Today, rockbursts occur extensively in deep hard rock mines in many countries, e.g., South Africa [23], Canada [21], Western Australia [15], the USA [7], and China [57]. There are also rockburst cases in tunnel engineering in Norway [3] and China [53]. In deep hard rock mines and tunneling projects, rockbursts can damage equipment to varying degrees, and serious rockburst disasters even threaten personnel safety [59]. Driven by growing demand for resources, mining and construction activities are moving ever deeper underground, and these increasing geotechnical activities can exacerbate rockbursts [62].

Microseismic (MS) monitoring is a three-dimensional monitoring technique for detecting microseismic events caused by rock cracking and has been used in rockburst monitoring for many years [69]. By placing sensors in different directions, the microseismic waves emitted by rock fractures can be recorded, and further analysis of these waves yields the location, time, intensity, and type of rock fracture [12]. Rockburst hazards can therefore be predicted from microseismic monitoring information. At present, microseismic monitoring has become one of the standard techniques in deep hard rock mines [33]. Mendecki et al. [34] pointed out that quantitative seismic analysis has become a crucial part of the safety of gold mining in South Africa. In 1990, the AngloGold Mining Company of South Africa developed a system named “robot” to quantify seismic hazards, which was later improved into the Routine Rating of Seismic Hazard (RRoSH); this system was subsequently deployed in more than a dozen gold mines in South Africa. In 1997, Poplawski [40] applied an approach called “departure indexing” to predict rockburst hazards. Becka and Brady [4] proposed a “cell evaluation” approach based on numerical simulation and analysis to conduct a quantitative analysis of seismic risk. Trifu and Suorineni [46] applied a microseismic monitoring system in the field of rockburst prediction; Li et al. [24] used microseismic monitoring techniques for rockburst hazard assessment in underground engineering. Andrzej and Zbigniew [2] suggested that the characteristics associated with microseismic events could form the basis for rockburst risk assessment and used this to assess the risk at the Zabrze-Bielszowice coal mine in Poland. However, the traditional interpretation of MS monitoring information depends on manual analysis, making it difficult to achieve sufficient speed and accuracy.

In recent years, there has been an increased focus on AI technology in the field of geotechnology [8, 18, 70, 43, 48, 57, 58, 63,64,65,66,67]. In the field of rockburst prediction in particular, a variety of prediction techniques have been developed based on different machine learning methods. Zhou and Gu [56] proposed a geographic information system based on an artificial neural network (ANN) to evaluate rockburst propensity. Su et al. [45] and Zhou et al. [59] used the K-nearest neighbor (KNN) algorithm to predict rockbursts and achieved good results. Adoko et al. [1] combined an ANN with a fuzzy inference system to analyze 174 rockburst events; their results show that the fuzzy inference system plays a crucial part in the estimation of rockburst intensity. Zhou et al. [57, 59] applied the SVM algorithm with different kernels to rockburst prediction and achieved good results. Besides classical machine learning models, other methods such as quadratic discriminant analysis (QDA) [59], naive Bayes (NB) [59], and Bayesian networks [25] have also performed well in rockburst prediction. Compared with traditional methods that rely on manual judgment and processing of information, these emerging prediction models based on artificial intelligence can effectively reduce errors caused by human factors and can process, more quickly and effectively, large volumes of data that are difficult to handle with traditional methods. More importantly, models built with AI techniques are better at capturing the complex and fuzzy relationships among the characteristic variables of a dataset [22].

Although many scholars have studied different rockburst prediction methods [62], few have considered ensemble learning to analyze the monitored MS data and provide real-time, effective rockburst hazard warnings for underground engineering.

Support vector machines (SVM) are gaining attention as a well-performing machine learning model in geotechnical as well as construction engineering [13, 71]. SVM has a simple structure and few controllable variables, and its computational difficulty does not depend on the dimensionality of the samples. Therefore, compared with an ANN, SVM rests on a more solid theoretical foundation and can better handle nonlinear, high-dimensional, and small-sample problems [55]. Bi et al. [6] compared several types of machine learning models, such as KNN, RF, and SVM, for the classification of microseismic events, and the results showed that SVM has the most outstanding ability to recognize microseismic events. Pu et al. used SVM to predict kimberlite rockbursts and achieved good prediction performance [41], and Zhou et al. proposed hybrid SVM models to classify and assess the long-term rockburst hazard of underground cavities with high prediction precision [57]. The aforementioned research has demonstrated that SVM is reasonably robust in predicting rockbursts; thus, this paper employs the SVM method to categorize the different classes of rockburst hazard. In addition, the particle swarm optimization (PSO) algorithm searches continuously for optimal solutions by simulating the predatory behavior of bird flocks. Compared with traditional algorithms, PSO is easy to implement and has few adjustable parameters on the one hand and, on the other, a strong global search ability for nonlinear and multi-peak problems [32]; it is therefore widely used by researchers. The Harris hawk optimization (HHO) algorithm [16] and the moth flame optimization (MFO) algorithm [37] simulate the predation of Harris hawks and the attraction of moths to flames, respectively; both have few adjustable parameters, converge quickly, and perform well even on high-dimensional and complex problems. Based on these three optimization algorithms, three hybrid classification models, PSO-SVM, HHO-SVM, and MFO-SVM, are constructed for rockburst prediction using MS monitoring information.

In this article, the background of the algorithms is introduced first, and on this basis the framework of the model is described. A dataset containing 343 microseismic monitoring samples from the Dongguashan Copper Mine is then established. After this preparatory work, hybrid classification models are developed to predict the rockburst hazard level (RHL). Finally, the classification capabilities of the three hybrid models (PSO-SVM, HHO-SVM, and MFO-SVM) are comprehensively compared using multiple indicators, and the best hybrid classification model for rockburst prediction is identified.

2 Materials and Methods

2.1 Support Vector Machines

The support vector machine (SVM) is a binary classification model proposed by Vapnik [47]. It maps the feature items of instances to points in space and then classifies these points with the hyperplane found by the model, thereby classifying the input data, as shown in Fig. 1.

Fig. 1

Schematic diagram of SVM. a Classification principle in one-dimensional space for nonlinear classification problems. b Classification principles in multidimensional space for linear classification problems

The support vectors in SVM are the sample points lying closest to the hyperplane on either side. The margin is the sum of the distances from the support vectors on each side of the hyperplane to the hyperplane, which can be expressed by the following formula:

$${\text{Margin}}=\frac{2}{\| \omega \| }$$
(1)

The larger the margin, the more robust the resulting classifier; maximizing the margin is therefore equivalent to minimizing \(\| \omega \|\), which reduces the influence of local sample disturbances on the model. Depending on whether the training data are linearly separable, the optimal classification hyperplane is solved in different ways.

By integrating the constraints into the optimization objective function, the Lagrangian [20, 28] is established, as shown in formula (2), where \({\alpha }_{i}\) and \({\beta }_{i}\) are the Lagrange multipliers.

$${L}_{P}=\frac{{\| \omega \| }^{2}}{2}+C\sum_{i=1}^{l}{\xi }_{i}-\sum_{i=1}^{l}{\alpha }_{i}[{y}_{i}({\omega }^{T}{x}_{i}+b)-1+{\xi }_{i}]-\sum_{i=1}^{l}{\beta }_{i}{\xi }_{i}$$
(2)

The SVM adopts different formulation strategies depending on how the training data can be separated. For linearly non-separable problems, a kernel function is first defined: \(K({x}_{i},{x}_{j})=\varphi ({x}_{i})\cdot \varphi ({x}_{j})\). To construct the optimal classification hyperplane, the input samples in the original space are mapped into a high-dimensional feature space H using the nonlinear mapping \(\varphi :{R}^{d}\to H\) [54]. The optimal decision function is given by formula (3) [64, 68]:

$$f(x)={\text{sign}}[\sum_{i=1}^{l}{y}_{i}{\alpha }_{i}K({x}_{i},x)+b]$$
(3)

From the linearly separable and non-separable cases above, it is clear that the SVM is defined only for binary classification; solving multi-class problems requires further extension. At present, there are two main approaches to constructing a multi-class SVM model: the direct method and the indirect method. The direct method directly solves a multi-class objective function suited to the problem at hand, whereas the indirect method realizes multi-class classification by constructing and combining multiple binary SVM models; the commonly used schemes are the one-vs-one and one-vs-rest methods, and a minimal sketch of the one-vs-one scheme is given below.
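
As an illustration of the indirect method, the sketch below, which assumes scikit-learn and uses synthetic three-class data rather than the mine dataset, trains one binary RBF-SVM per pair of classes and combines their outputs by majority voting, i.e., the one-vs-one scheme; it is not the authors' implementation.

```python
# Hedged sketch of the one-vs-one (indirect) multi-class SVM: one binary RBF-SVM
# per pair of classes, combined by majority voting. Data are synthetic.
from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

classes = np.unique(y)
pairwise_models = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y, [a, b])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # one binary SVM per class pair
    clf.fit(X[mask], y[mask])
    pairwise_models[(a, b)] = clf

def predict_one_vs_one(x):
    """Majority vote over all pairwise binary SVMs."""
    votes = np.zeros(len(classes), dtype=int)
    for clf in pairwise_models.values():
        winner = clf.predict(x.reshape(1, -1))[0]
        votes[np.where(classes == winner)[0][0]] += 1
    return classes[np.argmax(votes)]

print(predict_one_vs_one(X[0]), "true label:", y[0])
```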

In the SVM network structure, each intermediate node represents a support vector, and the output is a linear combination of the intermediate nodes, as shown in Fig. 2. Kernel functions commonly used in SVM include the d-order polynomial kernel, the linear kernel, the radial basis function (RBF) kernel, and the sigmoid kernel with parameters k and θ. The RBF kernel is used in this study. In an RBF-SVM, the combination of the penalty hyperparameter c and the kernel hyperparameter g plays a crucial role in classification capability; therefore, to obtain better classification performance, PSO, HHO, and MFO are used to optimize c and g, as sketched below.
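
Conceptually, the optimizers introduced in the following subsections all search the (c, g) space for a combination that maximizes some fitness value. The paper does not state its exact fitness definition, so the sketch below uses cross-validated accuracy as one plausible choice, assumes scikit-learn, and replaces the mine dataset with synthetic data.

```python
# Hedged sketch of a fitness function over the RBF-SVM hyperparameters (c, g):
# higher is better. Cross-validated accuracy is an assumption, not the paper's
# stated criterion; the data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=343, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

def svm_fitness(c, g, X=X, y=y):
    """Mean 5-fold cross-validation accuracy of an RBF-SVM with hyperparameters (c, g)."""
    model = SVC(kernel="rbf", C=c, gamma=g)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

print(svm_fitness(1.0, 0.1))    # one point in the (c, g) search space
print(svm_fitness(10.0, 0.01))  # another point; PSO, HHO, and MFO search this space
```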

Fig. 2

The structure of support vector machines

2.2 Particle Swarm Optimization

The particle swarm optimization (PSO) algorithm, inspired by the predatory behavior of bird flocks, seeks optimal solutions through cooperation and information sharing among the individuals of a swarm [10]. PSO has few adjustable parameters, is simple to implement, and is widely used in image processing, function optimization, vehicle routing optimization, and other fields [5, 9, 36, 27]. The structure of PSO is shown in Fig. 3. The algorithm first initializes a group of random particles, each with two properties: velocity \({v}_{i}\) and position \({x}_{i}\). In each iteration, every particle updates itself according to two “best values,” p and g, as follows:

Fig. 3

The architecture of particle swarm optimization

$${v}_{i}(t+1)={v}_{i}(t)+{c}_{1}{r}_{1}[{p}_{i}(t)-{x}_{i}(t)]+{c}_{2}{r}_{2}[{p}_{g}(t)-{x}_{i}(t)]$$
(4)
$${x}_{i}(t+1)={v}_{i}(t+1)+{x}_{i}(t)$$
(5)

where \({x}_{i}(t)\) and \({v}_{i}(t)\) are the position and velocity of the ith particle at the tth iteration; \({p}_{i}(t)\) is the best position found so far by the ith particle; \({p}_{g}(t)\) is the best position found so far by the whole swarm; \({c}_{1}\) and \({c}_{2}\) are learning factors, both set to 2; and \({r}_{1}\) and \({r}_{2}\) are random numbers in [0,1]. A compact sketch of this update is given below.
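
The following self-contained sketch implements the update rules in Eqs. (4) and (5) with the stated settings c1 = c2 = 2. The sphere function stands in for the real fitness (for example, the cross-validated SVM accuracy over (c, g) sketched earlier); the code is illustrative and not the authors' implementation.

```python
# Hedged sketch of PSO: velocity update per Eq. (4), position update per Eq. (5).
import numpy as np

def sphere(x):
    return np.sum(x ** 2)   # minimization test function standing in for the real fitness

def pso(fitness, dim=2, n_particles=50, n_iter=100, lb=-10.0, ub=10.0,
        c1=2.0, c2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    p_best = x.copy()                             # personal best positions p_i
    p_val = np.array([fitness(xi) for xi in x])
    g_best = p_best[np.argmin(p_val)].copy()      # global best position p_g
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)   # Eq. (4)
        x = np.clip(x + v, lb, ub)                                # Eq. (5)
        vals = np.array([fitness(xi) for xi in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best, p_val.min()

print(pso(sphere))   # best position and fitness found
```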

2.3 Harris Hawk Optimization

The Harris hawk optimization (HHO) algorithm was proposed by Heidari et al. [16], inspired by the simulated predation of Harris hawks. HHO has a strong global search ability and few parameters to adjust. The process of HHO consists mainly of three parts: the search stage, the transition between search and development, and the development stage.

  1. (a)

    In the search stage, the Harris hawks perch at randomly selected locations and search for prey using one of two strategies, as shown in Eqs. (6) and (7).

    $$X\left(t+1\right)=\left\{\begin{array}{ll}{X}_{rand}\left(t\right)-{r}_{1}\left|{X}_{rand}\left(t\right)-2{r}_{2}X\left(t\right)\right|, & q\ge 0.5\\ \left[{X}_{p}\left(t\right)-{X}_{m}\left(t\right)\right]-{r}_{3}\left[lb+{r}_{4}\left(ub-lb\right)\right], & q<0.5\end{array}\right.$$
    (6)
    $${X}_{m}\left(t\right)=\sum_{k=1}^{M}{X}_{k}\left(t\right)/M$$
    (7)

    where \(X\left(t\right)\) represents the current position of an individual, \(X(t+1)\) its position in the next iteration, and t the iteration number. \({X}_{rand}\left(t\right)\) is the position of a randomly selected individual; \({X}_{p}\left(t\right)\) represents the location of the prey, i.e., the individual with the best fitness; \({r}_{1},{r}_{2},{r}_{3},{r}_{4},q\) are random numbers between 0 and 1, with q determining which strategy is chosen; \({X}_{m}\left(t\right)\) is the average position of the individuals; \({X}_{k}\left(t\right)\) is the position of the kth individual in the swarm; and M is the swarm size.

  2. (b)

    Transition between search and development. The HHO algorithm switches between search and development according to the escape energy of the prey, expressed in Eq. (8):

    $$E=2{E}_{0}(1-\frac{t}{T})$$
    (8)

    where \({E}_{0}\) represents the initial escape energy of the prey, t is the current iteration number, and T is the maximum number of iterations. When \(|E|\ge 1\), HHO enters the search stage; otherwise, it enters the development stage.

  3. (c)

    The development stage. Depending on the circumstances, HHO selects between a soft besiege and a hard besiege to approach the optimal result.

In each iteration of HHO, the fitness of each individual is compared with the fitness of the current prey position; if an individual's fitness is better, the prey position is replaced by that individual's position. The process stops when HHO reaches the set number of iterations. A simplified sketch of this workflow is given below.
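
The following simplified sketch illustrates the workflow above, covering the search stage of Eqs. (6) and (7), the escape energy of Eq. (8), and a reduced development step; the rapid-dive variants of the besiege strategies in the original HHO are omitted for brevity, so this is an illustration of the switching logic rather than a complete implementation.

```python
# Hedged, simplified HHO sketch: exploration per Eqs. (6)-(7), escape energy per
# Eq. (8), development reduced to soft/hard besiege without rapid dives.
import numpy as np

def sphere(x):
    return np.sum(x ** 2)

def hho_sketch(fitness, dim=2, n_hawks=30, n_iter=100, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_hawks, dim))
    prey = X[np.argmin([fitness(x) for x in X])].copy()    # best individual = prey
    for t in range(n_iter):
        Xm = X.mean(axis=0)                                # mean position, Eq. (7)
        for i in range(n_hawks):
            E = 2 * rng.uniform(-1, 1) * (1 - t / n_iter)  # escape energy, Eq. (8)
            if abs(E) >= 1:                                # search stage, Eq. (6)
                q, r1, r2, r3, r4 = rng.random(5)
                if q >= 0.5:
                    Xr = X[rng.integers(n_hawks)]
                    X[i] = Xr - r1 * np.abs(Xr - 2 * r2 * X[i])
                else:
                    X[i] = (prey - Xm) - r3 * (lb + r4 * (ub - lb))
            else:                                          # development stage (simplified)
                if abs(E) >= 0.5:                          # soft besiege
                    J = 2 * (1 - rng.random())
                    X[i] = (prey - X[i]) - E * np.abs(J * prey - X[i])
                else:                                      # hard besiege
                    X[i] = prey - E * np.abs(prey - X[i])
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(prey):              # replace prey if improved
                prey = X[i].copy()
    return prey, fitness(prey)

print(hho_sketch(sphere))
```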

2.4 Moth Flame Optimization

Seyedali Mirjalili proposed the moth flame optimization (MFO) algorithm in 2015 [37], inspired by the way moths spiral toward an artificial light source at night, as shown in Fig. 4. MFO has strong parallel optimization ability and good global search capability [69, 50].

Fig. 4

Principle of the moth flame optimization algorithm

In MFO, a moth is represented by m. Each moth is regarded as a candidate solution, and its position in space corresponds to the unknown parameters of the required solution; by changing its position vector, the moth can fly through the search space in any dimension. Each moth flies only around its corresponding flame. The change in a moth's position, which quantitatively models how moths react to flames, is given by the following equation:

$${M}_{i}=S({M}_{i},{F}_{j})={D}_{i}\cdot {e}^{bt}\cdot \mathrm{cos}(2\pi t)+{F}_{j}$$
(9)

where \({M}_{i}\) is the ith moth; S is the spiral function; \({F}_{j}\) represents the jth flame; \({D}_{i}=|{F}_{j}-{M}_{i}|\) is the distance from the ith moth to the jth flame; b is a constant defining the shape of the logarithmic spiral; and t, the path coefficient, is a random value between − 1 and 1.

As shown in Fig. 5, the flame position governs the update of the moth's position, and different values of t correspond to different distances of the moth from the flame. Over successive iterations, each moth updates its position based on the fitness of its current location and the fitness of the flame in its corresponding sequence, as shown in Fig. 6, and thus approaches its assigned flame more and more closely.

Fig. 5

Logarithmic spiral and the space around the flame

Fig. 6

Moth flame distribution diagram

Because each moth updates its position with respect to a different flame, searching around a large number of flames weakens the local search capability of MFO. The flame-number control mechanism in Eq. (10) addresses this problem well:

$$flame.no=round(N-l*\frac{N-1}{T})$$
(10)

where N is the maximum number of flames, l is the current iteration number, and T is the maximum number of iterations. As the number of flames decreases according to this formula, the moths update their positions with respect to the remaining flames according to the flame fitness values, as sketched below.
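
The following self-contained sketch combines the spiral update of Eq. (9) with the flame-number reduction of Eq. (10); the flame bookkeeping is simplified relative to the original MFO, and the sphere function again stands in for the real fitness.

```python
# Hedged MFO sketch: spiral flight per Eq. (9), flame-number reduction per Eq. (10).
import numpy as np

def sphere(x):
    return np.sum(x ** 2)

def mfo_sketch(fitness, dim=2, n_moths=30, n_iter=100, lb=-10.0, ub=10.0,
               b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    moths = rng.uniform(lb, ub, (n_moths, dim))
    flame_fit = np.array([fitness(m) for m in moths])
    order = np.argsort(flame_fit)
    flames, flame_fit = moths[order].copy(), flame_fit[order]   # sorted best positions
    for l in range(1, n_iter + 1):
        n_flames = round(n_moths - l * (n_moths - 1) / n_iter)  # Eq. (10)
        for i in range(n_moths):
            j = min(i, n_flames - 1)          # surplus moths all follow the last flame
            D = np.abs(flames[j] - moths[i])  # distance to the assigned flame
            t = rng.uniform(-1, 1)            # path coefficient in [-1, 1]
            moths[i] = D * np.exp(b * t) * np.cos(2 * np.pi * t) + flames[j]  # Eq. (9)
            moths[i] = np.clip(moths[i], lb, ub)
        # keep the best n_moths positions found so far as the new flames
        pool = np.vstack([flames, moths])
        pool_fit = np.concatenate([flame_fit, [fitness(m) for m in moths]])
        keep = np.argsort(pool_fit)[:n_moths]
        flames, flame_fit = pool[keep].copy(), pool_fit[keep]
    return flames[0], flame_fit[0]

print(mfo_sketch(sphere))
```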

3 Data Collection

3.1 Data Sources

This study is based on microseismic monitoring data from the Dongguashan Copper Mine in Tongling City, Anhui Province, China. Dongguashan Copper Mine, formerly known as Shizishan Copper Mine, is located 7.5 km east of Tongling City and is an extra-large, high-sulfur copper deposit, as shown in Fig. 7. The elevation of the main ore body in the mining area is − 680 to − 1000 m; the strike length is 1810 m; the maximum width is 882 m and the minimum width 204 m; and the average thickness of the middle section is 40 m. The ore body is a gently inclined, layered body whose occurrence is basically the same as that of the surrounding rock; as a whole, it is widely distributed in plan but limited in vertical extent. The internal structure of the deposit is simple, joints and fissures are poorly developed, and the rock mass is hard.

Fig. 7

The location of the Dongguashan Copper Mine

The stress and deformation state of the rock mass and its changes caused by mining activities are important factors driving ground pressure activity. The Dongguashan Copper Mine contains a variety of lithologies, the mining area is large, and the structural layout of the stopes and the mining area is intricate, resulting in a complex spatial distribution of rock masses prone to rockbursts [11, 49]. A schematic diagram of the stope distribution of Dongguashan Copper Mine is shown in Fig. 8, which is also the main monitoring area of the microseismic monitoring system.

Fig. 8

Schematic diagram of Dongguashan microseismic monitoring system network layout

A schematic diagram of the network layout of the microseismic monitoring system used at the Dongguashan Copper Mine is shown in Fig. 8. A total of seven sensors are installed in the system, one of which is a three-component sensor. The specific arrangement of the sensors is given in Table 1.

Table 1 Location of microseismic monitoring sensors in Dongguashan Copper Mine

The database used in this paper contains 343 sets of microseismic monitoring data, including the angular frequency ratio (AFR), which indicates the vibration frequency; the total energy (TE) released by microseismic events in the rock mass; the apparent stress (AS), which measures the stress release at the source; the concave and convex radius (CCR), which reflects the variation of geological plate characteristics; the energy ratio (ER) of the P-wave to the S-wave; and the moment magnitude (MM), which intuitively indicates the magnitude of the event, as shown in Table 2.

Table 2 Microseismic monitoring data

3.2 Rockburst Hazard Index

The raw monitoring data used in this study contain a large number of parameters, such as magnitude, energy, angular frequency, apparent stress, and asperity (concave and convex) radius. Many of these fields are not strongly correlated with rockburst hazard, so the variables closely related to rockburst hazard need to be selected. Table 3 gives a brief overview of the microseismic evaluation indicators used in mining engineering.

Table 3 Common indicators of microseismic in mine engineering

In mine microseismic monitoring, the angular frequency reflects the vibration of the rock mass, and the internal vibration of the rock mass can be assessed from the angular frequency ratio of the P-wave and S-wave. For microseismic events, the energy release is related to the internal fracture mode and fracture speed of the rock mass, and the energy radiated by the P-wave and S-wave also differs; in this study, the sum of the P- and S-wave energies reflects the energy released by a microseismic event. The radius of the asperity is a geological plate feature, and the occurrence of earthquakes is related to the rupture of asperities [31]. The apparent stress is generally expressed as the ratio of the released microseismic energy to the microseismic body change potential. In addition, the volumetric potential magnitude [35] and moment magnitude [14] are also intuitive indicators for evaluating microseismic events.

With microseismic monitoring at the Dongguashan Copper Mine, researchers can quickly identify anomalous microseismic conditions and notify site staff to inspect the actual site conditions and provide feedback. To avoid the ambiguity of purely visual description, the staff usually classify the site conditions roughly using empirical formulae, combining the anomalous microseismic monitoring data with the actual site conditions. In mine microseismic monitoring, four microseismic characteristic parameters, namely the angular frequency ratio (AFR), total energy (TE), apparent stress (AS), and concave and convex radius (CCR), are taken into consideration to preliminarily calculate a rockburst hazard composite index [19], from which the rockburst hazard level (RHL) is classified to quantify the hazard. The steps of the empirical formula used to discriminate the RHL are as follows:

  1. (1)

    First, the rockburst hazard index of a single parameter is determined. The prerequisite for calculating the comprehensive rockburst hazard index is to determine the hazard index \({W}_{n}(t)\) of each individual microseismic characteristic parameter. The calculation method is as follows:

    $${W}_{n}(t)=\frac{||A(t)|-\overline{{A }_{0}}|}{{A}_{\mathrm{max}}-\overline{{A }_{0}}}$$
    (11)

    where A(t) represents the amplitude of a given characteristic parameter at time t; \(\overline{{A }_{0}}\) is the mean amplitude monitored under normal conditions; and \({A}_{\mathrm{max}}\) is the maximum monitored value of that characteristic parameter.

  2. (2)

    The weighting factor \({P}_{n}(t)\) is usually determined manually by staff based on the observed site damage. Once the weight factor \({P}_{n}(t)\) of each index has been determined, the comprehensive rockburst hazard index \({W}_{z}(t)\) is obtained from the single-parameter indices \({W}_{n}(t)\) according to Eq. (12).

    $${W}_{z}(t)=\sum_{n=1}^{R}{W}_{n}(t){P}_{n}(t)$$
    (12)

According to the comprehensive rockburst hazard index, the rockburst hazard is divided into four levels, and the corresponding RHL at each moment is evaluated. The classification of weight factors and rockburst hazard grades is shown in Table 4, and a small computational sketch of Eqs. (11) and (12) follows the table.

Table 4 Evaluation table of weight factor and rockburst hazard composite index
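
To make the two steps above concrete, the following snippet evaluates Eqs. (11) and (12) for a single moment in time; the amplitudes, normal-condition means, maxima, and weights are purely illustrative placeholders, not measured Dongguashan data.

```python
# Hedged sketch of the empirical composite rockburst hazard index, Eqs. (11)-(12).
def single_index(A_t, A0_mean, A_max):
    """Single-parameter rockburst hazard index W_n(t), Eq. (11)."""
    return abs(abs(A_t) - A0_mean) / (A_max - A0_mean)

def composite_index(amplitudes, normal_means, maxima, weights):
    """Comprehensive rockburst hazard index W_z(t), Eq. (12)."""
    return sum(single_index(a, m, x) * p
               for a, m, x, p in zip(amplitudes, normal_means, maxima, weights))

# Illustrative values for (AFR, TE, AS, CCR) at one moment -- placeholders only.
amplitudes   = [1.8, 5.2e4, 0.35, 12.0]
normal_means = [1.2, 1.0e4, 0.10,  8.0]
maxima       = [3.0, 2.0e5, 1.00, 30.0]
weights      = [0.3, 0.3, 0.2, 0.2]    # P_n(t), assigned by site staff

Wz = composite_index(amplitudes, normal_means, maxima, weights)
print(round(Wz, 3))   # compared against the Table 4 thresholds to assign an RHL
```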

This empirical approach to discriminating the RHL has several drawbacks. First, it relies heavily on the personal work experience and subjectivity of the staff and may be biased by differences among observers (the data used in this paper were obtained from experienced staff and verified by microseismic monitoring researchers). Second, some hazardous areas are difficult for staff to access, making it hard to assess the damage on site. Given these drawbacks of the empirical formula, this paper aims to develop a new machine learning model to predict the RHL and reduce the influence of human subjectivity and environmental limitations. In this study, the energy ratio (ER) and moment magnitude (MM) are considered in addition to the four characteristic parameters included in the empirical formula.

Correlations, scatterplots, and histograms obtained from the analysis of the rockburst data, produced with the GGally package [42], are shown in Fig. 9a, and the Pearson correlation coefficients of the characteristic parameters at different RHLs are illustrated in Fig. 9b. Based on the above, the input variables are the angular frequency ratio (AFR), total energy (TE), apparent stress (AS), concave and convex radius (CCR), energy ratio (ER), and moment magnitude (MM), and the output variable is the RHL; these are used to train and test the three hybrid classification models (PSO-SVM, HHO-SVM, and MFO-SVM). The whole workflow of this study is shown in Fig. 10.

Fig. 9

Correlation analysis of rockburst eigenvalues. a Correlation plot of rockburst characteristic quantities. Different colors represent different RHL. b Pearson correlation coefficients of various characteristic parameters in different RHL

Fig. 10

Complete analysis flow of PSO-SVM, HHO-SVM, and MFO-SVM

In addition, the parameter distributions of the training and test sets shown in Fig. 11 are almost identical for every class of parameter, which supports the credibility of the models.

Fig. 11

Distribution of the training and test sets of the hybrid models

3.3 Evaluation Indicators of the Models

To evaluate the performance of the PSO-SVM, HHO-SVM, and MFO-SVM models, this study adopts four evaluation indicators, namely accuracy (ACC), precision (PRE), the kappa coefficient, and the confusion matrix [60, 61, 70], as shown in Fig. 10. ACC is the proportion of samples whose predicted value matches the true value. In classification, the class of interest is usually treated as positive and the other classes as negative, and each prediction on the dataset is either correct or incorrect, giving four cases: TP (true positive), FP (false positive), TN (true negative), and FN (false negative). PRE is the proportion of samples predicted as positive that are truly positive; for multi-class problems, the precision is computed separately for each label and the unweighted average is taken. The kappa coefficient is a statistical measure of agreement; in practical applications its range is generally [0,1], and its magnitude is positively related to the classification accuracy of the model. The calculation methods and principles of ACC, PRE, and the kappa coefficient are shown in Fig. 12, where P0 represents the observed accuracy of the predictions and Pe represents the chance agreement: \({P}_{e}=\frac{{\sum }_{i=1}^{n}{a}_{i+}\cdot {a}_{+i}}{{N}^{2}}\), where \({a}_{ij}\) is the number of samples with actual class i and predicted class j, N is the total number of samples, n is the number of classes, \({a}_{i+}=\sum_{j}{a}_{ij}\), and \({a}_{+j}=\sum_{i}{a}_{ij}\). A minimal computational sketch of these indicators is given below.
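
The following minimal sketch shows how ACC, macro-averaged PRE, and the kappa coefficient can be computed from a confusion matrix according to these definitions; the 4 × 4 matrix (four RHL classes, 69 test samples) is invented for illustration and is not the confusion matrix reported in this paper.

```python
# Hedged sketch: ACC, macro PRE, and kappa from a confusion matrix (rows = actual,
# columns = predicted). The matrix values are illustrative only.
import numpy as np

def metrics_from_confusion(cm):
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()
    acc = np.trace(cm) / N                                   # ACC = P0
    with np.errstate(invalid="ignore", divide="ignore"):
        per_class_pre = np.diag(cm) / cm.sum(axis=0)         # precision per predicted class
    pre = np.nanmean(per_class_pre)                          # unweighted (macro) average
    pe = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / N ** 2    # chance agreement Pe
    kappa = (acc - pe) / (1 - pe)
    return acc, pre, kappa

cm = [[20,  2,  0,  0],
      [ 1, 15,  3,  0],
      [ 0,  2, 14,  1],
      [ 0,  0,  1, 10]]
print(metrics_from_confusion(cm))
```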

Fig. 12

Definition and calculation formula of each evaluation index. a Definition and calculation formula of three types of evaluation indicators. b The illustration of the RHL confusion matrix. c The dichotomization of RHL multi-classification problems

4 Results and Discussion

To optimize the target machine learning algorithm, PSO, HHO, and MFO are introduced to determine its optimal internal parameters. In this section, the classification capabilities of the PSO-SVM, HHO-SVM, and MFO-SVM models are systematically compared and analyzed against that of a single SVM model.

The input dataset consists of a training set and a test set, usually in the ratio of 70:30 [26, 39], 75:25 [52], or 80:20 [17, 29]. In this paper, the dataset obtained from microseismic monitoring is divided into a training set and a test set in the ratio of 80:20 to verify the reliability of the models. With this split, the training set contains 274 samples and the test set contains 69 samples, as sketched below.
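
For illustration, an 80:20 split of 343 samples can be obtained as follows (assuming scikit-learn; the feature values are placeholders, and the stratification option is an addition not stated in the paper):

```python
# Hedged sketch of the 80:20 train/test split (274 training, 69 test samples).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(343, 6)              # AFR, TE, AS, CCR, ER, MM (placeholder values)
y = np.random.randint(0, 4, size=343)   # RHL labels (placeholder values)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(len(X_train), len(X_test))        # 274 and 69
```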

In summary, different values of the two hyperparameters (c and g) in the SVM can have a marked effect on the classification capacity of the model. To obtain better classification results when predicting the RHL, the three optimization algorithms (PSO, HHO, and MFO) are applied to search for the best hyperparameter combination for the SVM. To reduce the influence of randomness, six population sizes (10, 20, 50, 100, 150, and 200) are set for each classification model, corresponding to the number of particles in PSO-SVM, the number of Harris hawks in HHO-SVM, and the number of moths in MFO-SVM, respectively.

4.1 Parameter Setting

Several parameters need to be set for PSO-SVM, HHO-SVM, and MFO-SVM, including the parameters of the three optimization algorithms (PSO, HHO, and MFO) themselves. These parameters affect the classification ability and running speed of the models. Some settings are common to all three hybrid models, including the search ranges of c and g, the number of iterations, and the split ratio of the training and test sets. Within the value ranges of the hyperparameters c and g, each optimization algorithm searches the space for a more reasonable parameter combination. The number of iterations affects the result obtained at the end of the optimization process. The split ratio of the training and test sets affects, to some extent, the reliability and classification ability of the whole model. In this article, the parameters of PSO-SVM, HHO-SVM, and MFO-SVM are compared and analyzed, and the final selected parameters are shown in Table 5; an illustrative configuration sketch follows the table.

Table 5 The setting of parameters in three hybrid models
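
For illustration only, the shared settings might be collected as follows; the search ranges for c and g are assumptions rather than the values actually listed in Table 5.

```python
# Hypothetical configuration sketch; c/g ranges are assumed, not taken from Table 5.
config = {
    "c_range": (0.01, 100.0),     # assumed search range for the penalty c
    "g_range": (0.001, 10.0),     # assumed search range for the kernel width g
    "n_iterations": 100,          # optimization iterations
    "train_test_split": 0.8,      # 80:20 split of the 343 samples
    "population_sizes": [10, 20, 50, 100, 150, 200],
}
print(config)
```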

4.2 Discussion and Analysis

In this section, the comprehensive capabilities of PSO-SVM, HHO-SVM, and MFO-SVM are compared and analyzed. The three indicators defined above, PRE, ACC, and the kappa coefficient, are used to evaluate the three hybrid classification models (PSO-SVM, HHO-SVM, and MFO-SVM), and all models use the same training and test sets.

To compare the comprehensive performance of the hybrid models, each evaluation indicator is scored and the total score is used for the comparison. The combined capabilities of the different models with different population sizes are therefore ranked during the testing phase; the relevant rankings are shown in Fig. 13, and an illustrative scoring sketch is given below.
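
The paper does not detail its exact scoring scheme, so the snippet below shows one plausible rank-based variant using the test-set indicator values reported in Sect. 4.2; it is illustrative rather than a reproduction of the authors' ranking.

```python
# Hedged sketch of a rank-based total score over ACC, PRE, and kappa.
import numpy as np

models = ["PSO-SVM", "HHO-SVM", "MFO-SVM"]
# columns: ACC, PRE, kappa (test-set values reported in Sect. 4.2)
results = np.array([[0.9265, 0.8198, 0.8477],
                    [0.9265, 0.8660, 0.8461],
                    [0.9559, 0.9063, 0.9094]])

scores = np.zeros(len(models))
for j in range(results.shape[1]):
    order = np.argsort(results[:, j])          # worst to best on this indicator
    for rank, idx in enumerate(order, start=1):
        scores[idx] += rank                    # better models collect more points
print(dict(zip(models, scores)))               # ties are broken arbitrarily here
```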

Fig. 13

Comprehensive ranking comparison of RHL prediction models. a PSO-SVM, b HHO-SVM, c MFO-SVM, and d Hybrid models

Based on the results in Fig. 13a, the PSO-SVM models with different population sizes all reach a stable state within 100 iterations. To determine the optimal population size, the prediction capacity of the trained PSO-SVM model is comprehensively analyzed. Figure 13a clearly shows that the optimal population size for the PSO-SVM model is 50 (PRE = 0.8198, kappa = 0.8477, ACC = 0.9265).

For the HHO-SVM model, the parameters are set as presented in Table 5. It should be noted that, unlike PSO, HHO does not require many additional parameters to be set, while the other training and test conditions are the same as for PSO-SVM. Figure 13b shows the fitness changes for the six population sizes. Finally, according to the combined score in Fig. 13b, the HHO-SVM model has the best classification ability when the population is 200, and the optimal combination of parameters is thus obtained (PRE = 0.8660, kappa = 0.8461, ACC = 0.9265).

Table 6 The classification performance of four classification models on the test set

In establishing MFO-SVM, the parameters are also set according to Table 5. Similar to HHO, MFO does not require additional algorithm-specific parameters, and its training and test conditions are the same as for PSO-SVM and HHO-SVM. Finally, the comprehensive performance of the MFO-SVM model is evaluated. As shown in Fig. 13c, the optimal parameter combination and best performance of MFO-SVM are achieved when the population is 50 (PRE = 0.9063, kappa = 0.9094, ACC = 0.9559).

To further study the ability of the different models to classify and predict the RHL, the best-performing configuration of each hybrid model, namely PSO-SVM with a population of 50, HHO-SVM with a population of 200, and MFO-SVM with a population of 50, is further evaluated for comprehensive performance. As shown in Fig. 13d, among the three hybrid models, the MFO-SVM model with a population of 50 demonstrates the greatest ability in predicting the RHL, while PSO-SVM and HHO-SVM are slightly inferior to MFO-SVM and show no significant difference between each other. Compared with the unoptimized SVM model, however, as presented in Table 6, the prediction capacity of the three hybrid models is significantly improved, which confirms that the optimization algorithms are effective in improving the rockburst prediction ability of the model (Fig. 14).

Fig. 14

Optimizing SVM models with PSO, HHO, and MFO of different population values based on the training set. a PSO-SVM, b HHO-SVM, and c MFO-SVM

When evaluating the results, the confusion matrix demonstrates the agreement between the predicted and actual values of the model [44, 59, 60, 61]. The numbers on the diagonal from the top left to the bottom right of the confusion matrix indicate the number of samples whose predicted values agree with the actual values, while the other positions give the number of samples for which the predicted value does not match the actual value. Using the optimal population size and corresponding parameters of each of the three hybrid classification models, their confusion matrices can be obtained. As shown in Fig. 15, among the three hybrid models, MFO-SVM shows the smallest dispersion of off-diagonal entries, while PSO-SVM and HHO-SVM are slightly inferior to MFO-SVM but still have good classification and prediction ability.

Fig. 15

Confusion matrix of SVM models optimized by PSO, HHO, and MFO. a PSO-SVM, b HHO-SVM, and c MFO-SVM

In conclusion, among the three kinds of hybrid classification models based on the test set, the MFO-SVM model with a population of 50 has an outstanding performance in RHL prediction, while the performance of PSO-SVM and HHO-SVM is slightly inferior to that of MFO-SVM. However, the performances of the three hybrid models have improved significantly compared to the unoptimized SVM model.

5 Conclusions

Rockburst is a common disaster in deep hard rock mines and tunnel engineering and poses a potential danger to personnel and equipment. This paper studied the high-precision prediction of the RHL using SVM classification techniques, which is vital for rockburst hazard prevention and control in mines as well as tunneling projects.

The model inputs in this paper include the total energy, apparent stress, moment magnitude, and other variables from microseismic monitoring. Compared with traditional rockburst prediction methods, which are mostly based on microseismic monitoring, machine learning can better capture the relationships among multiple highly nonlinear variables.

Based on the SVM model, three optimization strategies were combined with it to select the best combination of hyperparameters. After optimization by the PSO, HHO, and MFO algorithms, the three hybrid classification models (PSO-SVM, HHO-SVM, and MFO-SVM) and the plain SVM model were comprehensively evaluated, yielding the optimal combination of hyperparameters and the model with the best classification capacity.

Experimental results showed that, on the same dataset, MFO-SVM performs best among the three hybrid classification models proposed in this article, with the highest ACC and PRE (ACC = 0.9559, PRE = 0.9063), and is the most suitable for rockburst prediction using microseismic information. After improvement by the three optimization algorithms, the prediction performance of the SVM model was significantly enhanced, and the kappa coefficients of the PSO-SVM, HHO-SVM, and MFO-SVM prediction models reached 0.8477, 0.8461, and 0.9094, respectively. Given the complex relationship between each input variable and the rockburst hazard, these results are highly satisfactory.

The results showed that the three optimization methods improve the prediction capacity of the SVM model to different degrees. The comparison shows that MFO-SVM has the best comprehensive prediction performance and can be applied to the prediction of rockburst hazards based on microseismic information. A limitation of the SVM-based approach in this paper is the limited sample size, with only 343 microseismic monitoring samples in total; in addition, characteristic parameters not covered in this study may also affect the RHL. Therefore, as the sample data are continuously expanded and more associated characteristic parameters are incorporated into the model, the prediction capacity of the hybrid classification model will be further improved.