1 Introduction

Rock breakage by blasting is widely applied in mining, civil and construction engineering, tunneling and subway construction because of its economies of scale and high efficiency. The overall costs of mining, quarrying and construction are significantly influenced by the degree of fragmentation, displacement and looseness of the muckpile that has to be transported. In blasting operations, a large amount of explosive energy is spent in creating undesirable environmental impacts such as flyrock, blast-induced ground vibrations, air-overpressure and back-break, which can affect the surrounding area [1–5]. Among these, blast-induced ground vibrations are recognized as the most undesirable phenomenon: they may damage surrounding structures, adjacent rock masses, roads, underground workings, slopes, railroads and existing groundwater conduits, and may cause irreparable damage to the ecology of the surrounding area [6–10].

Upon detonation, the chemical substances in the explosive are transformed into an enormous volume of gas at high temperature and pressure. The high-pressure gas travels outward from the blast hole in a circular pattern, crushing and shattering everything in its path. The detonation pressure decays quickly as the wave front propagates through the rock. In the ground adjoining the blast hole, the strain waves produce a wave motion that propagates as elastic waves once the stress wave intensity has decayed sufficiently, and this motion manifests as ground vibrations [6].

In this paper, the classification and regression tree (CART) approach is used to predict the peak particle velocity (PPV). In addition, an empirical model based on the predictor proposed by the erstwhile United States Bureau of Mines (USBM) [now the US Geological Survey (USGS) or National Institute for Occupational Safety and Health (NIOSH)] and a multiple regression model were developed to estimate the PPV, and the results were evaluated against several performance indices.

2 Literature review

Generally, blast-induced ground vibrations are quantified through two main factors, namely the frequency and the peak particle velocity (PPV). Many researchers [6, 11–14] use the PPV as an index for measuring ground vibrations because it is an important indicator for controlling structural damage. During the past few decades, many empirical vibration predictor equations have been proposed to calculate the PPV produced by a blast (e.g., [15–19]) using two factors, namely the maximum charge per delay and the distance from the blast face. However, these empirical approaches prove inadequate for determining blasting safety zones or areas in a mine where a high level of precision in PPV estimation is required. This may be because these predictors incorporate only a limited number of influential parameters (maximum charge per delay and distance from the blast face), whereas the PPV can also be influenced by numerous other controllable or non-controllable parameters such as burden, spacing, stemming and powder factor [2, 6].

Apart from empirical predictors, statistical techniques have been widely used for PPV prediction (e.g., [8, 20, 21]). In these techniques, several other input parameters related to blast design, rock mass and explosive properties were used for ground vibration prediction (e.g., [6, 10, 22, 23]). However, statistical techniques are limited to a specific site and/or dataset and are not generic (or universal) in nature [22, 24, 25].

In recent years, soft computing techniques have been extensively developed and applied to predict ground vibrations due to blasting operations. Many researchers highlight the successful use of these techniques in the field of ground vibration prediction [1–6, 22–33]. For example, Khandelwal and Singh [1] compared empirical predictors and an artificial neural network (ANN) model for predicting the PPV and frequency values obtained from 150 blasting events and observed that the ANN results were more accurate than those of the empirical predictors.

In another study, Monjezi et al. [26] developed ANN, empirical and statistical models for predicting blast-induced ground vibrations at the Siahbisheh pumped storage dam, Iran. They used a database comprising 182 datasets to predict PPV and concluded that the ANN performs better in predicting PPV than the other proposed models. Iphar et al. [27] and Jahed Armaghani et al. [28] developed adaptive neuro-fuzzy inference system (ANFIS) models for estimating PPV induced by blasting. A fuzzy inference system (FIS) model was proposed by Fisne et al. [29] for the evaluation and prediction of 33 PPV values obtained from the Akdaglar quarry, Turkey. Ghasemi et al. [30] proposed yet another fuzzy model for the indirect determination of PPV using six different controllable input parameters. They highlighted the high prediction performance of the fuzzy model in estimating PPV.

Mohamed [24] proposed both ANN and FIS models for estimating PPV and reported that the FIS approach provides slightly higher performance in approximating the PPV. Based on blast parameters obtained from the Bakhtiari Dam, Iran, Hasanipanah et al. [31] introduced a support vector machine (SVM) model to estimate PPV. Dindarloo [32] developed an SVM model for estimating 100 PPV values collected from the Golegohar iron ore mine in Iran. That study used 12 model input parameters, both controllable and non-controllable, to predict the PPV and found the developed model to be a versatile tool for predicting PPV. Two hybrid intelligent techniques, namely particle swarm optimization (PSO)-ANN and imperialist competitive algorithm (ICA)-ANN, were developed in the studies carried out by Hajihassani et al. [33] and Hajihassani et al. [23], respectively.

The classification and regression tree (CART) technique is considered an innovative, powerful and accurate approach for approximating science and engineering problems [34]. CART is based on a decision tree in which several parameters are taken as inputs of the system to determine their influence on the output(s) of the system. CART is non-parametric in nature and is able to handle highly skewed data. Additionally, CART can be employed with a small number of model inputs, which makes it an attractive method for simulation purposes, specifically in cases where there are only two or three predictors. This approach has been used successfully in numerous areas of rock mechanics and geotechnical engineering [35–37]. Tiryaki [35] and Liang et al. [34] used this method for predicting rock strength and observed that CART is a powerful tool for solving rock strength problems. Rock cuttability values were simulated using the CART approach in an investigation carried out by Tiryaki [36]. Henderson et al. [37] developed a CART model for better estimation of soil properties. Gandomi et al. [38] suggested this technique for assessing post-earthquake soil liquefaction using a comprehensive database comprising seismic parameters and soil properties.

3 Classification and regression tree (CART)

Decision tree analysis is one of the non-parametric methods widely used in data mining. The main purpose of a decision tree is to create a predictive model that uses independent variable(s) to predict dependent variable(s). A decision tree is a hierarchical, rule-based model that splits the independent variables into homogeneous domains [39–41]; it is formed by a root, branches, nodes and leaves that mimic a natural tree (Fig. 1). The literature suggests that this method has been used successfully for decision making, pattern recognition, classification and prediction, in tandem with other complex methods such as ANNs [35]. A decision tree has three main components, namely node, condition and production. The nodes comprise decision, chance and end nodes. As shown in Fig. 1, ‘A’ represents the decision node, ‘B, C, D, E, F’ and ‘G’ are the chance nodes and ‘H, I, J’ and ‘K’ are the end nodes. Each node in the decision tree represents a definite characteristic or independent input variable. The nodes can connect quantitative or qualitative variables to each other. If the target variable is categorical and the tree is used to identify the “class” of the target variable, it is called a classification tree (CT). If the target variable is continuous and the tree is used to predict its value, it is called a regression tree (RT).

Fig. 1 Architecture of a simple regression tree

Several algorithms are used to develop decision tree models, namely Chi-square automatic interaction detection (CHAID) [42], quick, unbiased and efficient statistical tree (QUEST) [43], classification and regression tree (CART) [44], ID3 [45], exhaustive CHAID [46], and C4.5 [47]. A review of studies carried out by other scholars demonstrates that CART is one of the most popular methods for prediction in engineering problems.

Introduced by Breiman et al. [44], CART is a non-parametric method that needs no initial assumptions about the variables and their relationships; it can identify the most significant variables by itself and eliminate the non-significant ones. Another advantage of CART is the ease with which it deals with outliers. Outliers can have a negative effect on the results of some statistical models, such as principal component analysis and linear regression, but the CART algorithm handles noisy data easily by isolating them in a separate node. In addition, a procedure can be applied within the CART algorithm to remove outliers or to replace them using mean, mode or nearest neighbor methods. The main features of CART, and of any decision tree algorithm, can be described as follows:

  • application of some rules for splitting data at a node based on the value of one variable,

  • existence of criteria to stop the system which is known as production node, and

  • calculation of predicted values for final nodes.

The CART algorithm starts by selecting a variable as the root (node No. 1) of the tree. It then asks a question about the range of this variable, with an answer of yes or no. According to the answer, the data are divided by branches into sub-nodes. This procedure continues until a stopping criterion (e.g., the maximum tree depth or the minimum root mean square error for each leaf) is met. At the end, the final nodes give the values predicted by the CART model [39, 44, 45].
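The recursive procedure described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual least-squares splitting criterion for regression trees (each split minimizes the children's summed squared error); it is not the XLSTAT implementation used in this study. The function names, the parameter names `max_depth`, `min_parent` and `min_son`, and the toy (MC, D) values are all hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean, used as node impurity."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, min_son):
    """Return (feature index, threshold) that most reduces the SSE, or None."""
    best, best_score = None, sse(targets)
    for j in range(len(rows[0])):
        levels = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[j] < thr]
            right = [t for r, t in zip(rows, targets) if r[j] >= thr]
            if len(left) < min_son or len(right) < min_son:
                continue
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, thr), score
    return best


def grow(rows, targets, depth=0, max_depth=4, min_parent=2, min_son=1):
    """Grow a regression tree; leaves store the mean target of their samples."""
    split = None
    if depth < max_depth and len(rows) >= min_parent:
        split = best_split(rows, targets, min_son)
    if split is None:  # stopping criterion met, so this node becomes a leaf
        return {"value": mean(targets)}
    j, thr = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] < thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] >= thr]
    children = []
    for part in (left, right):
        part_rows = [r for r, _ in part]
        part_targets = [t for _, t in part]
        children.append(
            grow(part_rows, part_targets, depth + 1, max_depth, min_parent, min_son)
        )
    return {"feature": j, "threshold": thr, "left": children[0], "right": children[1]}


def predict(node, row):
    """Answer the yes/no question at each node until a leaf is reached."""
    while "value" not in node:
        go_right = row[node["feature"]] >= node["threshold"]
        node = node["right"] if go_right else node["left"]
    return node["value"]


# Hypothetical toy data: each row is (MC in kg, D in m), each target a PPV in mm/s.
rows = [(50, 100), (50, 400), (150, 100), (150, 400)]
targets = [8.0, 1.0, 12.0, 2.0]
tree = grow(rows, targets)
print(tree["feature"], tree["threshold"])  # root splits on D (index 1)
print(predict(tree, (150, 400)))
```

On this toy data the root question is about D, because splitting on distance removes far more of the squared error than splitting on charge.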

4 Site investigation and data source

The field study was conducted at one of the opencast coal mines of Singareni Collieries Company Limited (SCCL), Telangana, India. The SCCL area lies between north latitudes 17°55′50″ and 17°56′25″ and east longitudes 80°44′45″ and 80°45′30″. The area is mostly covered by limestone of the Pakhals in the western and southern parts, which slowly grades into sandstone of the Gondwana series in a northeasterly direction. The other geological units found within the project area are the Talchers and Barakars. Kamthis are observed away from the project area in the northern and eastern directions.

The limestone is massive, flaggy in places, and strikes NW–SE, dipping towards the NE at 35° to 40°. At the contact zone between limestone and sandstone, calcareous beds are observed that grade into sandstone. The sandstone is soft and coarse grained. The various units of the lower Gondwana abut each other in different directions because of structural disturbances in the area.

In general, this area consists of soft soil up to 2 m deep, followed by medium- to coarse-grained gray sandstone overburden with shale and thick coal bands, with a thickness varying from 17 to 50 m. The thickness of the top seam varies from 1.4 to 4.4 m and that of the bottom seam from 2.75 to 5.07 m. The parting between the seams consists mostly of medium-grained gray sandstone, and its thickness varies from 4.87 to 13.0 m.

A review of previous studies of ground vibration prediction shows that the effects of maximum charge per delay (MC) and distance from the blast face (D) on PPV are greater than those of the other controllable and uncontrollable blasting parameters [11, 15–19]. The MC and D values of 51 blasting operations were therefore used as predictors to estimate PPV. A summary of the data used, with their ranges, is given in Table 1.

Table 1 Summary of the data used in the modeling and their categories

5 Model developments for PPV estimation

This section describes the development of the PPV prediction models using the CART, empirical and multiple regression (MR) methods. The empirical equation is based on the model recommended by USBM. In all of these models, MC (kg) and D (m) were used as input parameters to estimate the PPV resulting from mine blasting.

5.1 CART model

The XLSTAT V.15 software was used to develop the CART model for predicting PPV. First, the data were divided into training and testing datasets. The training datasets were used for model development, whilst the testing datasets were used to evaluate the developed model. In this study, 70 % (36 datasets) and 30 % (15 datasets) of the whole dataset were randomly assigned to training and testing, respectively, as suggested by Nelson and Illingworth [48].
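A random 70/30 partition of this kind can be sketched as follows. The function name, the fixed `seed` and the use of integer placeholders for the 51 records are assumptions made for illustration; the study does not report how its random assignment was generated.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary, hypothetical choice
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records standing in for the 51 blasting events.
records = list(range(51))
train, test = split_train_test(records)
print(len(train), len(test))  # 36 15
```

Rounding 0.7 × 51 = 35.7 to the nearest integer reproduces the 36/15 split reported above.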

In the next step, the training datasets were introduced to the software. A trial-and-error procedure was used to obtain the CART parameters and, subsequently, an optimal regression tree with a high degree of accuracy. The minimum parent size is the minimum number of objects that a node must contain in order to be split. The minimum son size is the minimum number of objects that each node created by a split must contain. Because each node must be able to split into two different nodes, the minimum parent size and son size were fixed at 2 and 1, respectively. There are also two other control parameters, namely the number of intervals and the maximum tree depth. The first is the maximum number of intervals generated during the discretization of the quantitative explanatory variables, which is performed by univariate partitioning using Fisher’s method. The second, the maximum depth of the regression tree, controls the redundancy and complexity of the model. The XLSTAT software suggests ranges of 1–10 and 2–10 for the number of intervals and the maximum tree depth, respectively. Different combinations of these parameters were examined by trial and error, and finally a CART model with 9 intervals and a maximum tree depth of 4 was obtained.

To evaluate the performance of the constructed CART models, the coefficient of determination (R²) and the root mean square error (RMSE) were used. A predictive model is excellent if values of 1 and 0 are obtained for R² and RMSE, respectively. The tree structure of the proposed CART model is shown in Fig. 2. This tree has 19 nodes. Calculating PPV with this regression tree is very simple. For example, consider a dataset with values of 160, 760 and 0.851 for MC, D and PPV, respectively. The tree starts with D as the root node. Following the route D ≥ 245, D ≥ 420, D ≥ 622.5 and MC ≥ 86.9, the system reaches node No. 19, with a predicted PPV value of 1.002 mm/s. Some of the rules constructed for the developed CART model are presented in Table 2. More discussion and details regarding the developed CART model are given later in this paper.

Fig. 2 Tree structure of the proposed CART model to estimate PPV

Table 2 Some of the constructed rules for the developed CART model in predicting PPV

5.2 Empirical model

As mentioned earlier, many researchers have developed empirical predictor equations for estimating the peak particle velocity due to blasting (e.g., [15, 18, 19]). The most popular is the model proposed by USBM [15], which was used in this study to develop an empirical PPV model. In the USBM model, a scaled distance (SD) factor is calculated from the following equation:

$${\text{SD}} = \left( {\frac{D}{{\sqrt {\text{MC}} }}} \right) ,$$
(1)

where D and MC are the distance (m) and the maximum charge per delay (kg), respectively.

Accordingly, the PPV can be calculated using the following equation:

$${\text{PPV}} = K ({\text{SD}})^{B}$$
(2)

where B and K are site constants and PPV is the peak particle velocity (mm/s).

Using the PPV predictor suggested by USBM and the same training datasets as for the CART technique, the following formula was developed for estimating PPV:

$${\text{PPV}} = 143.28\,({\text{SD}})^{-1.213}$$
(3)

For the training datasets, Eq. 3 exhibited high correlation, with an R² value of 0.835. In addition, R² = 0.89 was obtained for the testing datasets. These results demonstrate the ability of the developed empirical model to estimate PPV. The logarithmic relationship between SD and the measured PPV values, together with the developed equation, is shown in Fig. 3.
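As an illustration of Eqs. 1–3, the sketch below evaluates the fitted predictor and shows one way the site constants K and B could be obtained. It assumes ordinary least squares on log-transformed values, which is the common way to fit such a power law; the paper does not state its fitting procedure. All function names are hypothetical, and the fit is demonstrated on synthetic points generated from Eq. 3 itself, not on the field data.

```python
import math


def scaled_distance(d_m, mc_kg):
    """Eq. 1: SD = D / sqrt(MC)."""
    return d_m / math.sqrt(mc_kg)


def ppv_usbm(d_m, mc_kg, k=143.28, b=-1.213):
    """Eq. 2 with the site constants of Eq. 3 as defaults (PPV in mm/s)."""
    return k * scaled_distance(d_m, mc_kg) ** b


def fit_site_constants(sd_values, ppv_values):
    """Least-squares line through (log10 SD, log10 PPV): slope is B, intercept is log10 K."""
    xs = [math.log10(s) for s in sd_values]
    ys = [math.log10(p) for p in ppv_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = 10 ** (y_bar - b * x_bar)
    return k, b


# The example record from Sect. 5.1: MC = 160 kg, D = 760 m.
print(round(ppv_usbm(760, 160), 3))

# Synthetic check: points generated from Eq. 3 should return its constants.
sds = [5.0, 10.0, 20.0, 40.0, 80.0]
ppvs = [143.28 * s ** -1.213 for s in sds]
print(fit_site_constants(sds, ppvs))
```

For the example record the empirical estimate is close to 1 mm/s, against a measured value of 0.851 mm/s.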

Fig. 3 Logarithmic relationship between SD and measured PPV values

5.3 Multiple regression model

Regression analysis is a statistical tool used to identify the relationships between variables. Typically, researchers attempt to ascertain the causal effect of one variable on another. In multiple regression (MR) techniques, the relationship between the independent variables (predictors) and the dependent variable (output) is determined systematically in the form of a function [49]. This technique has been widely applied to many problems in geotechnical engineering [50, 51]. In this study, MR was also applied to propose a new equation for PPV prediction using the same training and testing datasets. In constructing the MR model, MC and D were used as model inputs to estimate PPV. The MR analysis was conducted with the SPSS version 16 statistical software package [52]. The developed MR equation for estimating PPV is as follows:

$${\text{PPV}} = 0.008 \times {\text{MC}} - 0.01 \times D + 6.901$$
(4)

R² values of 0.580 and 0.870 for model development and evaluation, respectively, indicate the applicability of the proposed MR equation in estimating PPV.
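Eq. 4 can be evaluated directly, as in the short sketch below. The function name is hypothetical. The second call uses a hypothetical distance of 1000 m, chosen only to show that a linear form can return negative values when applied beyond the data it was fitted to, so the equation should be read together with the ranges in Table 1.

```python
def ppv_mr(mc_kg, d_m):
    """Eq. 4: linear MR estimate of PPV (mm/s) from MC (kg) and D (m)."""
    return 0.008 * mc_kg - 0.01 * d_m + 6.901


# The example record from Sect. 5.1: MC = 160 kg, D = 760 m.
print(round(ppv_mr(160, 760), 3))  # 0.581

# Hypothetical input: the linear form turns negative at a large enough distance.
print(round(ppv_mr(160, 1000), 3))  # -1.819
```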

6 Evaluation of the proposed PPV models

This section presents the evaluation of the developed models in estimating the PPV produced by blasting. For the CART, empirical and MR models, 70 and 30 % of the whole dataset (36 and 15 datasets) were assigned to model development and evaluation, respectively. In these models, both parameters, i.e., MC and D, were set as inputs, while PPV was set as the system output. Three statistical functions, namely RMSE, variance accounted for (VAF) and R², were computed to assess the prediction performance of the proposed models:

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{N} (y - y^{\prime } )^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} (y - \tilde{y})^{2} }}$$
(5)
$${\text{VAF }} = [1 - \frac{{\text{var} (y - y^{\prime } )}}{{\text{var} (y)}} ] \times 100$$
(6)
$${\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} (y - y^{\prime } )^{2} }$$
(7)

where y, y′ and ỹ are the measured, predicted and mean of the y values, respectively, N is the total number of datasets and P is the number of predictors.
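Eqs. 5–7 translate directly into code. The sketch below is one possible implementation using only the Python standard library; the function names and the four-point example are hypothetical. Population variance is used in the VAF, and because the same N appears in the numerator and the denominator, sample variance would give the same ratio.

```python
import math
from statistics import mean, pvariance


def r_squared(measured, predicted):
    """Eq. 5: one minus residual sum of squares over total sum of squares."""
    y_bar = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in measured)
    return 1 - ss_res / ss_tot


def vaf(measured, predicted):
    """Eq. 6: variance accounted for, in percent."""
    residuals = [y - p for y, p in zip(measured, predicted)]
    return (1 - pvariance(residuals) / pvariance(measured)) * 100


def rmse(measured, predicted):
    """Eq. 7: root mean square error, in the units of y."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


# Hypothetical example: predictions that are uniformly 0.5 too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(r_squared(measured, predicted), vaf(measured, predicted), rmse(measured, predicted))
```

In this example the constant offset lowers R² and raises the RMSE but leaves the VAF at 100, because the residuals have zero variance. This is one reason to report the three indices together.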

Theoretically, a predictive model is ideal if R² = 1, VAF = 100 and RMSE = 0. The results of these statistical functions for the proposed models are given in Table 3, and they confirm that the prediction performance of the CART model is better than that of the MR and empirical models. For the testing datasets, R² values of 0.92, 0.89 and 0.87, VAF values of 91.89, 86.94 and 86.94, and RMSE values of 0.97, 0.99 and 0.98 were obtained for the CART, empirical and MR models, respectively, which demonstrates the superiority of the developed CART approach in approximating PPV.

Table 3 Performance prediction of the developed models in predicting PPV

To validate the proposed models, their prediction performance was compared with that of two other empirical models proposed for estimating the PPV, using the same training and testing datasets [11, 16]. Based on the results, the models developed in this study (i.e., CART and empirical) perform better than the two empirical models proposed by other researchers. The relationships between the measured PPVs and the PPVs predicted by the Indian standard [11] and Langefors and Kihlstrom [16] models, together with their R² values for the training and testing datasets, are shown in Figs. 4 and 5, respectively.

Fig. 4 Relationships between predicted PPVs by Indian standard [11] model and measured PPVs for training and testing datasets

Fig. 5 Relationships between predicted PPVs by Langefors and Kihlstrom [16] model and measured PPVs for training and testing datasets

For a better comparison, the PPV results obtained (51 sets) and the measured PPVs are illustrated in Fig. 6. In most cases, the values predicted by the CART model are comparable to the measured values and to the values predicted by the MR and empirical models.

Fig. 6 Comparison of the obtained PPV values by CART, empirical and MR techniques together with the measured PPV values

7 Summary and conclusions

In this paper, a CART model was used to predict the PPV induced by blasting operations in mines. Two other models (i.e., MR and empirical) were also examined for comparison with the CART model. For this purpose, a database comprising 51 datasets was prepared, in which MC and D were considered as model inputs for the estimation of PPV. Using the available datasets, several predictive models with different parameters were developed to predict the PPV. The accuracy of the predictive models was evaluated using three statistical functions (RMSE, VAF and R²). The results demonstrated that CART is a more accurate and applicable model for the prediction of ground vibration than the other predictive models. R² values of 0.89 and 0.92 were obtained by the CART model for the training and testing datasets, respectively, compared with 0.83 and 0.89 for the developed empirical model and 0.58 and 0.87 for the MR model. This paper also found that CART, as a rule-based method, can be used as a new tool to aid engineers in estimating PPV in the field.