Introduction

Blasting is one of the most effective techniques for fragmenting rock in open-pit mines because of its technical and economic advantages. It can produce a large amount of rock for the subsequent operations (e.g., loading, transporting) at low cost (Jhanwar et al. 1999). However, its adverse side effects are not negligible; they include air over-pressure (AOp), flyrock, ground vibration, dust, and fumes (Nguyen et al. 2018; Zhang et al. 2019; Shang et al. 2019) (Fig. 1). Of these, AOp is considered a dangerous phenomenon that needs to be controlled (Alel et al. 2018; Armaghani et al. 2015; Khandelwal and Kankar 2011; Khandelwal and Singh 2005; Nguyen and Bui 2018; Nguyen et al. 2017, 2018).

Fig. 1 Illustration of the undesirable effects of blasting operations

Several scholars have proposed empirical equations for predicting blast-induced AOp, as listed in Table 1. These equations relate AOp to the monitoring distance (D) and the explosive charge per delay, i.e., the maximum explosive charge capacity (W).

Table 1 Several empirical equations for predicting blast-induced AOp

Of the empirical equations in Table 1, equation No. 1 (the USBM empirical equation) has been widely used to predict blast-induced AOp (Siskind et al. 1980; Hustrulid 1999; Walter 1990; Kuzu et al. 2009; Hasanipanah et al. 2016; Mahdiyar et al. 2018). However, the accuracy of empirical models is often low because of several drawbacks, as discussed by Hasanipanah et al. (2016) and Mahdiyar et al. (2018).

Recently, artificial intelligence (AI) has been widely applied in many fields, especially mining technology (Pierini et al. 2013; Rahmani and Farnood Ahmadi 2018; Montahaei and Oskooi 2014; Wiszniowski 2016; Naganna and Deka 2019; Piasecki et al. 2018; Nguyen et al. 2019a, b, c, d; Zhou et al. 2019; Asteris et al. 2016; Asteris and Nikoo 2019). To estimate blast-induced AOp, Hajihassani et al. (2014) trained an artificial neural network (ANN) with an evolutionary algorithm, particle swarm optimization (PSO), using 62 AOp datasets; the result is referred to as the ANN-based PSO model. Their results showed that the ANN-based PSO model forecast blast-induced AOp well, with a correlation coefficient (CC) of 0.94. In another study, Mohamad et al. (2016) predicted blast-induced AOp with an ANN optimized by a genetic algorithm (GA), abbreviated as GA-ANN, using 76 blasting events. Empirical and ANN models were also developed and compared with the GA-ANN model, and the GA-ANN model performed better than both. Hasanipanah et al. (2016) used ANFIS, ANN, and fuzzy system (FS) techniques, together with an empirical equation, to estimate blast-induced AOp from a group of 77 blasting events. Their findings revealed that ANFIS was the best approach for forecasting AOp. Amiri et al. (2016) introduced a combination of k-nearest neighbors (KNN) and ANN models to predict AOp using 75 blasting events. Their results indicated that the KNN-ANN model predicted AOp better than the ANN and empirical models. Mahdiyar et al. (2018) proposed three AI models based on the PSO algorithm to estimate AOp from 80 blasting events and reported that the PSO models estimated AOp well. Nguyen et al. (2019) developed a hybrid model based on a clustering technique and backpropagation neural networks. In another study, Nguyen et al. (2018) compared MLP neural nets, BRNN, and HYFIS for estimating AOp and found that the MLP neural nets outperformed the other models. They also developed an AI model based on an ensemble of ANN and random forest (RF) models (i.e., ANNs-RF) that predicted AOp with excellent results (Nguyen and Bui 2018). Using an optimization algorithm, AminShokravi et al. (2018) demonstrated the potential of the PSO algorithm for predicting AOp with high accuracy. Bui et al. (2019) evaluated different AI techniques for estimating AOp in an open-pit coal mine, including RF, boosted regression trees, KNN, SVR, Gaussian process (GP), Bayesian additive regression trees (BART), and ANN. They confirmed the feasibility of these techniques and recommended the ANN model as the best one for estimating AOp. Zhou et al. (2019) developed an AI model for forecasting AOp based on FS and the firefly algorithm (FA), namely the FS-FA model, and confirmed its high prediction level. Gao et al. (2019) examined the GA and the group method of data handling (GMDH) for forecasting AOp and proposed the resulting GA-GMDH model as a robust technique with excellent agreement.

A review of the literature shows that many blast-induced AOp predictive models have been developed and have performed well. Nevertheless, a model developed for one site cannot be applied directly to other locations or regions, because the effects of blast-induced AOp differ from site to site and from country to country. In this study, blast-induced AOp was assessed and predicted by three ensemble machine learning algorithms: random forest (RF), gradient boosting machine (GBM), and Cubist. An empirical model was also developed and compared with the ensemble models.

The rest of the present work is arranged as follows: “Study area and data used” section presents the study site and characteristics of the dataset; “Methods” section provides the principle of the approaches used; the preparation of the dataset is introduced in “Preparing the dataset” section; the development of the models is shown in “Establishing the AOp predictive models” section; some performance indices are presented in “Performance indices” section; and “Results and discussion” section reports the results and discussion. Finally, “Conclusions and remarks” section presents our conclusions of this work.

Study area and data used

Study area

The Deo Nai open-pit coal mine, located in Quang Ninh Province, Vietnam, was selected as the study area. It lies between latitudes 21°01′00″N and 21°02′00″N and between longitudes 107°18′15″E and 107°19′20″E (Fig. 2). The coal reserve is 42.5 Mt, the production capacity is 2.5 Mt/year, and the overburden is 20–30 Mt/year (Vinacomin 2015). Because of the large annual amount of overburden and the high hardness of the rock (from 10 to 14 according to Protodiakonov’s classification (Bach et al. 2012)), blasting was selected as the technique for fragmenting rock in the mine. ANFO is the main explosive used in this mine, in amounts of up to 20 tons. The non-electric delay blasting method was applied with a borehole diameter of 105 mm. The nearest distance from the blasts to the residential area is about 400–500 m. Hence, the adverse side effects of blasting are substantial.

Fig. 2 Location of the study site

Data collection and its characteristics

In this study, 146 blasting events were investigated, and ten parameters were measured for each. The first nine were used as inputs to predict the tenth, AOp: powder factor (q), maximum explosive charge capacity (W), burden (B), length of stemming (T), spacing (S), number of rows per blast (N), monitoring distance (D), bench height (H), and air humidity (RH) (Fig. 3). Blast-induced AOp was monitored with an Instantel (Canada) instrument equipped with a microphone. According to the manufacturer’s guidelines, the microphone should be placed at sensitive locations and pointed directly toward the blast (Fig. 4). A handheld GPS was used to determine D. RH was measured with a Kanomax 2212 air quality meter (Japan); Nguyen et al. (2018) recommended RH as one of the most influential parameters for estimating AOp. The remaining inputs were taken from the blast designs. Table 2 shows the characteristics of the inputs and output in this work.

Fig. 3 Structure of the borehole and its parameters. a Parameters of blast design and b a combination plan of blasting

Fig. 4 Data collection for predicting AOp in this work

Table 2 Inputs, output, and their properties

Methods

Empirical

Empirical equations are one way to predict blast-induced AOp in open-cast mines. Of the empirical methods (Table 1), the USBM empirical formula has been widely applied to predict AOp in open-pit mines (Hajihassani et al. 2014; Armaghani et al. 2016). For example, Kuzu et al. (2009) used the USBM empirical equation to forecast AOp with promising results. In the USBM equation, the scaled distance (SD) is defined from W and D as follows:

$$SD = DW^{ - 0.33}$$
(1)

Subsequently, the USBM empirical equation can be computed according to Eq. 2:

$${\text{AOp}} = \gamma (SD)^{ - \alpha }$$
(2)

where \(\gamma\) and \(\alpha\) are the site factors.
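Eqs. 1 and 2 can be evaluated directly once the site factors are known. The following is a minimal sketch; the function names, the units suggested by the argument names (m and kg), and the numerical inputs are illustrative assumptions, not values from this study.

```python
def scaled_distance(distance_m: float, charge_kg: float) -> float:
    """Eq. 1: SD = D * W^(-0.33)."""
    if distance_m <= 0 or charge_kg <= 0:
        raise ValueError("D and W must be positive")
    return distance_m * charge_kg ** -0.33


def usbm_aop(distance_m: float, charge_kg: float,
             gamma: float, alpha: float) -> float:
    """Eq. 2: AOp = gamma * SD^(-alpha), with site factors gamma and alpha."""
    return gamma * scaled_distance(distance_m, charge_kg) ** -alpha


# Illustrative inputs only (assumed, not measurements from the mine).
near = usbm_aop(200.0, 1000.0, gamma=200.0, alpha=0.2)
far = usbm_aop(400.0, 1000.0, gamma=200.0, alpha=0.2)
print(f"AOp at 200 m: {near:.2f}, at 400 m: {far:.2f}")
```

For a positive \(\alpha\), the predicted AOp decreases as D grows and increases as W grows, which is the behavior the scaled distance is meant to capture.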

Random forest

The decision tree (DT) is one branch of AI, and RF, which was developed by Breiman (2001), belongs to this branch. As a robust DT-based model, RF can solve both classification and regression problems. Because it combines the results of many different trees, it has been suggested as a suitable method for achieving predictive precision (Vigneau et al. 2018). Each tree in the forest acts as a voter and contributes its prediction to the final decision of RF (Gao et al. 2018); in other words, RF ensembles the predictions of the individual trees and makes a final decision based on them. RF for regression proceeds in three steps: (i) draw ntree bootstrap samples from the dataset, where ntree is the number of trees in the forest; (ii) grow a regression tree on each bootstrap sample, choosing the best split at each node from among a random sample of mtry predictors (Dou et al. 2019); and (iii) predict new observations by aggregating the predictions of the ntree trees. For a regression problem (i.e., estimating AOp), the mean of the values predicted by the individual trees is used.

Based on the training dataset, an estimate of the error rate can be obtained in two steps:

  1. At each bootstrap iteration, predict the data not included in the bootstrap sample, named “out-of-bag” (OOB) data, using the tree grown with that bootstrap sample.

  2. Aggregate the OOB predictions and calculate the error.

More details of the RF algorithm can be found in Nguyen and Bui (2018), Breiman (2001), and Bui et al. (2019).
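The bootstrap and OOB steps above can be sketched in a few lines. This is a simplified illustration, not the implementation used in this study: it uses one-split regression trees ("stumps") on a single synthetic feature, so the random choice of mtry predictors at each split is omitted and the procedure reduces to plain bagging. All names and the synthetic data are assumptions.

```python
import random


def fit_stump(xs, ys):
    """Least-squares one-split regression tree on a single feature."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # every x identical: fall back to the mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def bagged_oob_rmse(xs, ys, ntree=100, seed=0):
    """Grow ntree stumps on bootstrap samples; return (predict, OOB RMSE)."""
    rng = random.Random(seed)
    n = len(xs)
    trees, oob_preds = [], [[] for _ in range(n)]
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        trees.append(tree)
        for i in set(range(n)) - set(idx):           # rows left out of the sample
            oob_preds[i].append(tree(xs[i]))
    # Average the OOB predictions per row, then compute the error.
    pairs = [(ys[i], sum(p) / len(p)) for i, p in enumerate(oob_preds) if p]
    rmse = (sum((y - p) ** 2 for y, p in pairs) / len(pairs)) ** 0.5
    return (lambda x: sum(t(x) for t in trees) / len(trees)), rmse


# Synthetic step-shaped data (assumed, for illustration only).
xs = [float(i) for i in range(40)]
ys = [0.0 if x < 20 else 10.0 for x in xs]
predict, oob_rmse = bagged_oob_rmse(xs, ys)
print(f"OOB RMSE: {oob_rmse:.3f}")
```

Because each row is predicted only by trees that never saw it, the OOB error behaves like a built-in validation estimate and needs no separate hold-out set.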

Gradient boosting machine

GBM is an ensemble approach proposed by Friedman (2002). It is an improved boosting algorithm that can be applied to both regression and classification problems (Friedman 2001). The boosting algorithm is described by the pseudocode in Fig. 5 (Friedman 2002).

Fig. 5 Pseudocode of the boosting algorithm

Friedman (1999) also provided specific algorithms within the boosting framework for various loss criteria, such as least squares:

$$\psi (y_{\text{AOp}} ,T) = (y_{\text{AOp}} - T)^{2}$$
(3)

Least absolute deviation:

$$\psi (y_{\text{AOp}} ,T) = \left| {y_{\text{AOp}} - T} \right|$$
(4)

Huber M:

$$\psi (y_{\text{AOp}} ,T) = (y_{\text{AOp}} - T)^{2} 1(\left| {y_{\text{AOp}} - T} \right| \le \delta ) + 2\delta (\left| {y_{\text{AOp}} - T} \right| - \delta /2)1(\left| {y_{\text{AOp}} - T} \right| > \delta )$$
(5)
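The three loss criteria in Eqs. 3–5 translate directly into code. The sketch below is illustrative only; the function names are assumptions, `y` stands for the recorded AOp, `t` for the model output T, and `delta` for \(\delta\).

```python
def squared_loss(y: float, t: float) -> float:
    """Eq. 3: least squares."""
    return (y - t) ** 2


def absolute_loss(y: float, t: float) -> float:
    """Eq. 4: least absolute deviation."""
    return abs(y - t)


def huber_loss(y: float, t: float, delta: float) -> float:
    """Eq. 5: quadratic for small residuals, linear beyond delta."""
    r = abs(y - t)
    if r <= delta:
        return r ** 2
    return 2.0 * delta * (r - delta / 2.0)


# Compare the three criteria for a small and a large residual.
for residual in (0.5, 3.0):
    print(residual,
          squared_loss(residual, 0.0),
          absolute_loss(residual, 0.0),
          huber_loss(residual, 0.0, delta=1.0))
```

At a residual equal to \(\delta\), both branches of Eq. 5 give \(\delta^{2}\), so the Huber criterion is continuous while growing only linearly for outliers.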

Let \(\left\{ {y_{{i.{\text{AOp}}}} ,x_{{i.{\text{AOp}}}} } \right\}_{1}^{N}\) be the entire training data sample, and let \(\left\{ {\pi (i)} \right\}_{1}^{N}\) be a random permutation of the integers \(\left\{ {1, \ldots ,N} \right\}\). Then, a random subsample of size \(\tilde{N} < N\) is given by \(\left\{ {y_{{\pi (i).{\text{AOp}}}} ,x_{{\pi (i).{\text{AOp}}}} } \right\}_{1}^{{\tilde{N}}}\). The pseudocode of the GBM algorithm is described in Fig. 6 (Friedman 2002).

Fig. 6 Pseudocode of the GBM algorithm
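The subsampling idea described above can be illustrated with a compact stochastic gradient boosting loop for the least-squares criterion. This is a sketch under stated assumptions, not the pseudocode of Fig. 6 or the implementation used in this study: the base learner is a one-split regression tree on a single synthetic feature, and all names, default values, and data are hypothetical.

```python
import random


def fit_stump(xs, targets):
    """Least-squares one-split regression tree on a single feature."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # every x identical: fall back to the mean
        m = sum(targets) / len(targets)
        return lambda x: m
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def stochastic_gbm(xs, ys, n_trees=100, shrinkage=0.1, subsample=0.5, seed=0):
    """Boost stumps on random subsamples of the residuals (squared loss)."""
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n                      # initial constant model
    preds = [f0] * n
    stumps = []
    n_sub = max(2, int(subsample * n))    # subsample size, smaller than n
    for _ in range(n_trees):
        perm = list(range(n))
        rng.shuffle(perm)                 # random permutation of the row indices
        idx = perm[:n_sub]                # first n_sub entries form the subsample
        # For squared loss the negative gradient is proportional to the residual.
        residuals = [ys[i] - preds[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + shrinkage * sum(s(x) for s in stumps)


# Synthetic linear data (assumed, for illustration only).
xs = [float(i) for i in range(20)]
ys = [2.0 * x for x in xs]
model = stochastic_gbm(xs, ys)
print(f"prediction at x = 10: {model(10.0):.2f}")
```

Each round fits a weak learner to what the current model still gets wrong, and the shrinkage factor keeps every individual correction small.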

Cubist

The Cubist algorithm (Rulequest 2016a, b) is a rule-based algorithm that builds predictive models from an analysis of the input data. Whereas the See5/C5.0 method solves classification problems (Quinlan 2004), Cubist is designed for regression problems. Cubist models generally give better results than linear regression models. In addition, they are simpler than ANN models (Rulequest 2016a, b).

The Cubist model extends Quinlan’s M5 model tree (Quinlan 1992) and can handle thousands of input attributes (Rulequest 2016a, b). In the Cubist model, the target is computed from the inputs by means of one or more rules. Each rule combines a set of conditions with a linear function. If a case satisfies all the conditions of a rule, the associated linear function is used to estimate the output. Several rules can apply to the same case at once, in which case Cubist uses the distinct linear functions of those rules to estimate the output. Cubist can therefore generate multiple models and combine them according to the rules determined beforehand. Developing multiple models with different rules and combining them helps the Cubist model attain a higher level of precision. More details of Cubist can be found in Nguyen et al. (2019), Kuhn et al. (2012), Drzewiecki (2016), Kuhn et al. (2018), and Bernat and Drzewiecki (2015).
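The rule structure described above (conditions paired with a linear function) can be sketched as follows. This is only an illustration of the idea, not Cubist itself: the two rules, their thresholds, and their coefficients are hypothetical, and averaging the predictions of all rules that fire is an assumption about how overlapping rules are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Rule:
    condition: Callable[[Row], bool]   # conjunction of conditions on the inputs
    intercept: float
    coefs: Dict[str, float]            # linear function used when the rule fires

    def predict(self, row: Row) -> float:
        return self.intercept + sum(c * row[name] for name, c in self.coefs.items())


def predict_rules(rules: List[Rule], row: Row) -> float:
    """Average the linear predictions of every rule whose conditions hold."""
    fired = [r.predict(row) for r in rules if r.condition(row)]
    if not fired:
        raise ValueError("no rule covers this row")
    return sum(fired) / len(fired)


# Hypothetical rules on the monitoring distance D (made-up numbers).
rules = [
    Rule(lambda r: r["D"] <= 300.0, intercept=120.0, coefs={"D": -0.05}),
    Rule(lambda r: r["D"] > 300.0, intercept=110.0, coefs={"D": -0.02}),
]
print(predict_rules(rules, {"D": 200.0}), predict_rules(rules, {"D": 400.0}))
```

Each rule is a local linear model, so the overall predictor is piecewise linear in the inputs.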

Preparing the dataset

The AOp dataset was prepared as a geospatial database using the ArcGIS software. Following the recommendations of previous researchers (Nguyen et al. 2019a, b), the 146 blast records were divided into two parts. Approximately 80% of the dataset (118 blasting events) was randomly selected as the training dataset to build the AOp predictive models. The remaining 28 records were used as the testing dataset to evaluate the models’ performance. The training and testing datasets are summarized in Tables 3 and 4, respectively.

Table 3 Summary of the training dataset
Table 4 Summary of the testing dataset
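A random split of this kind can be sketched as below. The function name and the seed are assumptions; the sketch reproduces only the sizes reported here (118 training and 28 testing records), not the actual assignment of blasts to each set.

```python
import random


def split_indices(n_records: int, n_train: int, seed: int = 42):
    """Shuffle the record indices and cut them into training and testing sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(n_records=146, n_train=118)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, so every model is trained and tested on the same records.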

Establishing the AOp predictive models

For the empirical model, the 118 blasting events of the training dataset were used to compute the site factors \(\gamma\) and \(\alpha\) of Eq. 2. Microsoft Excel 2016 was used to determine \(\gamma\) and \(\alpha\) by a multivariate regression analysis technique. As a result, \(\gamma\) = 208.26 and \(\alpha\) = 0.183 are the optimal values of the USBM model for predicting AOp. The USBM model (in this case) can be described as:

$${\text{AOp}} = 208.026(SD)^{ - 0.183}$$
(6)
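One common way to obtain the two site factors of Eq. 2 is ordinary least squares on the log-transformed equation, \(\ln {\text{AOp}} = \ln \gamma - \alpha \ln SD\). The sketch below assumes this approach, which may differ from the regression procedure used in this study, and checks it on synthetic data generated from known factors; all names and numbers are illustrative.

```python
import math


def fit_site_factors(sd_values, aop_values):
    """Least-squares fit of ln(AOp) = ln(gamma) - alpha * ln(SD)."""
    xs = [math.log(s) for s in sd_values]
    ys = [math.log(a) for a in aop_values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope     # gamma, alpha


# Synthetic data generated from assumed factors gamma = 200 and alpha = 0.2.
sd = [10.0, 20.0, 40.0, 80.0, 160.0]
aop = [200.0 * s ** -0.2 for s in sd]
gamma_hat, alpha_hat = fit_site_factors(sd, aop)
print(f"gamma = {gamma_hat:.3f}, alpha = {alpha_hat:.3f}")
```

With noise-free data the fit returns the generating factors; with field data the two estimates depend on the scatter of the measurements.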

For the development of the ensemble models, tenfold cross-validation with three repetitions was used to avoid overfitting. The ensemble models used the same training dataset as the USBM model. To develop the RF model, the number of trees was set to 2000 to ensure the diversity of the forest (Nguyen et al. 2017). Then, the number of randomly selected predictors (mtry) was tuned to obtain the optimal performance of the RF model. Herein, mtry was varied in the range of 1–50 by trial and error. Ultimately, the optimal value was determined as mtry = 41. Figure 7 shows the performance of the RF model for estimating AOp.

Fig. 7 RF modeling for prediction of AOp
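The resampling scheme used for tuning (tenfold cross-validation repeated three times) can be sketched as an index generator. The function name and seed are assumptions; the sketch only shows how the 118 training records would be partitioned, not how the models were fitted.

```python
import random


def repeated_kfold(n: int, k: int = 10, repeats: int = 3, seed: int = 0):
    """Yield (train_idx, val_idx) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                          # new shuffle per repetition
        folds = [idx[i::k] for i in range(k)]     # k nearly equal folds
        for i in range(k):
            val = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, val


splits = list(repeated_kfold(118))
print(len(splits))  # 10 folds x 3 repetitions
```

Averaging a performance index over all 30 validation folds gives a more stable estimate for each candidate setting than a single split would.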

Unlike the RF model, the GBM model used four parameters to control its performance: the number of trees, max tree depth, shrinkage, and n.minobsinnode. A grid search was applied to determine the optimal parameters for the GBM model. As a result, number of trees = 500, max tree depth = 4, shrinkage = 0.1, and n.minobsinnode = 5 were the best values for the GBM model in this case. GBM’s performance is illustrated in Fig. 8.

Fig. 8 GBM modeling for prediction of AOp
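A grid search simply evaluates every combination of candidate parameter values and keeps the best one. In the sketch below, the grid values and the `toy_score` function are placeholders with a known minimum so that the search can be checked; they are assumptions and do not model the behavior of GBM or the cross-validated error of this study.

```python
from itertools import product


def grid_search(score, grid):
    """Return (best_params, best_score) minimizing score(**params) over the grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


def toy_score(n_trees, max_depth, shrinkage):
    """Placeholder for a cross-validated RMSE (hypothetical, known minimum)."""
    return (max_depth - 4) ** 2 + abs(shrinkage - 0.1) + abs(n_trees - 500) / 1000


grid = {
    "n_trees": [100, 500, 1000],
    "max_depth": [2, 4, 6],
    "shrinkage": [0.01, 0.1],
}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

In practice, `score` would train the model with the given parameters and return its cross-validated error, so the cost grows with the product of the grid sizes.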

To develop the Cubist model, committees and neighbors were used as the key parameters. The results indicated that the Cubist model reached optimal performance with committees of 80 and neighbors of 0, as shown in Fig. 9.

Fig. 9 Cubist modeling for prediction of AOp

Performance indices

Three performance indices were computed to evaluate the AOp predictive models: root mean square error (RMSE), coefficient of determination (\(R^{2}\)), and mean absolute error (MAE).

$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {(y_{\text{AOp}} - \hat{y}_{\text{AOp}} )^{2} } }$$
(7)
$$R^{2} = 1 - \frac{{\sum\nolimits_{i} {(y_{\text{AOp}} - \hat{y}_{\text{AOp}} } )^{2} }}{{\sum\nolimits_{i} {(y_{\text{AOp}} - \bar{y}_{\text{AOp}} )^{2} } }}$$
(8)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {y_{\text{AOp}} - \hat{y}_{\text{AOp}} } \right|}$$
(9)

where n is the total number of observations, \(y_{\text{AOp}}\) is the recorded value, \(\hat{y}_{\text{AOp}}\) is the predicted value, and \(\bar{y}_{\text{AOp}}\) is the average of the recorded values.
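The three indices in Eqs. 7–9 can be computed as in the following sketch. The function names and the small example vectors are assumptions for illustration, not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Eq. 7: root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Eq. 8: coefficient of determination."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Eq. 9: mean absolute error."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n


# Made-up recorded and predicted values.
recorded = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(rmse(recorded, predicted), r2(recorded, predicted), mae(recorded, predicted))
```

Lower RMSE and MAE and an \(R^{2}\) closer to 1 indicate a better fit; RMSE penalizes large errors more heavily than MAE.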

Results and discussion

Once the models were established, their performance was evaluated with the performance indices in Eqs. (7)–(9). Table 5 shows the performance of the ensemble and empirical models on the training and testing datasets.

Table 5 Performance indices of the ensemble and empirical models

The ensemble models clearly performed very well in this study. On the training dataset, they obtained RMSE of 1.739–2.199, \(R^{2}\) of 0.968–0.970, and MAE of 0.980–1.451. Similar results were observed on the testing dataset, with RMSE of 2.483–2.721, \(R^{2}\) of 0.950–0.956, and MAE of 0.976–1.498. In contrast, the empirical model gave the poorest performance (RMSE = 4.838 and 4.448; \(R^{2}\) = 0.871 and 0.872; and MAE = 4.101 and 3.719 on the training and testing datasets, respectively). Among the three ensemble models (RF, GBM, Cubist), the Cubist model performed best, with an RMSE of 2.483, \(R^{2}\) of 0.956, and MAE of 0.976 on the testing dataset. Figure 10 shows the performance of the AOp predictive models on the testing dataset.

Fig. 10 Relationship of measured and predicted AOp on the ensemble and empirical models

Although the ensemble models performed better than the empirical model in this study, the empirical technique used only two input parameters (W and D) to estimate blast-induced AOp, whereas the ensemble models used nine. Therefore, a sensitivity analysis was conducted to assess the effect of the inputs on the AOp predictive model (Tarantola et al. 2007; Saltelli et al. 2010). The results showed that W, S, T, RH, and D were the most influential parameters, as illustrated in Fig. 11.

Fig. 11 Sensitivity analysis of the parameters

Conclusions and remarks

Based on the results of this study, the following conclusions and remarks are drawn:

  • Ensemble machine learning algorithms, especially the RF, GBM, and Cubist models, are better candidates than empirical methods for predicting blast-induced AOp. They should be considered for controlling the undesirable effects of blasting in practical engineering.

  • Cubist was a robust ensemble AI model for predicting AOp in this study. Its accuracy can help ensure the safety of the surrounding environment. However, it should be re-evaluated before being used at other locations.

  • RF and GBM are also good AI techniques for predicting AOp. However, their performance was lower than that of Cubist, so they need further improvement and research.

  • For predicting AOp, not only W and D but also S, T, and RH are important inputs for developing the predictive models. They should be collected carefully to ensure the accuracy of the models.