1 Introduction

The estimation of polyaxial rock strength is of great importance in mining, geomechanics, and geoengineering because of its growing application in deep projects such as wellbore stability analysis, storage caverns, excavation, and hydraulic fracturing. As engineering depth increases, the true triaxial stress condition (\({\sigma }_{1}\) > \({\sigma }_{2}\) > \({\sigma }_{3}\)) becomes more prevalent. Moreover, a growing body of experimental evidence (Mogi 1967, 1971a, b, 2007; Takahashi and Koide 1989; Chang and Haimson 2000; Haimson and Rudnicki 2010; Sriapai et al. 2013; Feng et al. 2016, 2019; Ma and Haimson 2016) shows that the intermediate principal stress (\({\sigma }_{2}\)) contributes significantly to the compressive strength, deformability, failure mode, and fault angle of rocks. However, different rock types exhibit varying degrees of \({\sigma }_{2}\) dependency.

A number of theoretical and empirical failure criteria have been proposed over the last century to model geomaterial strength, among which the Mohr–Coulomb, Hoek–Brown, Lade–Duncan, Wiebols and Cook, Mogi, and Drucker–Prager criteria are widely adopted and well known. Cohesion, friction angle, the material constant \(m\) of the Hoek–Brown criterion, and uniaxial compressive strength are the basic parameters that define these criteria. These well-known expressions have since been extended, modified, or unified. In addition to smoothness and convexity issues, determining the strength parameters that best fit entire sets of experimental data is a challenge in deriving the precise form of a failure equation (Lee et al. 2012). In general, generalized theoretical and mathematical modeling of realistic nonlinear rock behaviour under different multiaxial stresses is a difficult task that requires further constants, constraints, and assumptions. Despite the existence of several strength criteria, developing a universal criterion capable of describing the behavior of different materials under anisotropic stress conditions remains of great interest (Li et al. 2021; Fathipour-Azar 2022b). Comparisons of some of these failure criteria were made by Colmenares and Zoback (2002), Zhang (2008), Benz and Schwab (2008), You (2009), Rafiai (2011), Priest (2010, 2012), Zhang et al. (2010), Lee et al. (2012), Jiang and Xie (2012), Sriapai et al. (2013), Liolios and Exadaktylos (2013), Rukhaiyar and Samadhiya (2017b), Bahrami et al. (2017), Jiang (2018), Ma et al. (2020), and Feng et al. (2020). These studies revealed that how well a failure criterion fits polyaxial strength data depends on both the type of criterion and the varying \({\sigma }_{2}\)-dependence of the rock (Ma et al. 2020).
Because of the considerable differences between hard and soft rocks, different failure criteria may be required (e.g., for soft rocks (Wang and Liu 2021) and hard rocks (Feng et al. 2020)). Conventional strength models have been built for specific rock types and stress conditions (Sheorey 1997; Yu et al. 2002; Rafiai and Jafari 2011; Rafiai et al. 2013; Moshrefi et al. 2018; Gao 2018; Fathipour-Azar 2022b), whereas data-oriented machine learning (ML) methods are more flexible. As a statistical modeling technique, ML identifies hidden, implicit patterns and relationships between the independent and dependent parameters of a given experimental database without any explicit description. Generalization (applicability to different rocks and stress conditions) and accuracy are key factors in assessing rock failure criteria.

In the field of rock mechanics, several studies have investigated the effectiveness of using ML techniques to predict the failure strength of intact rocks under polyaxial and triaxial stress conditions. Some of these studies include Rafiai and Jafari (2011), Rafiai et al. (2013), Kaunda (2014), Zhu et al. (2015), Rukhaiyar and Samadhiya (2017a), Moshrefi et al. (2018), and Fathipour-Azar (2022b).

Rafiai and Jafari (2011) and Rafiai et al. (2013) developed artificial neural network (ANN)-based failure criteria for different rocks under triaxial and polyaxial conditions. These criteria were compared with the traditional failure criteria of Bieniawski and Yudhbir, Hoek and Brown, modified Wiebols and Cook, and Rafiai, and showed better efficiency. Similarly, Kaunda (2014) used ANN to study the effect of intermediate principal stress on the strength of intact rock for five different rock types.

Zhu et al. (2015) used least squares support vector machines (LSSVM) to establish a criterion for rock failure and compared it with the Mohr–Coulomb and Hoek–Brown criteria. Rukhaiyar and Samadhiya (2017a) used ANN to predict the polyaxial strength of intact sandstone rock types and found it more accurate on the testing dataset than five conventional polyaxial criteria, namely the modified Wiebols and Cook, Mogi–Coulomb, modified Lade, 3D Hoek–Brown, and modified Mohr–Coulomb criteria.

Moshrefi et al. (2018) compared ANN, SVM, and multiple regression models for predicting the triaxial and polyaxial strength of shale. They found that the ANN predicted strength with the lowest root mean squared error compared to the Drucker–Prager and Mogi–Coulomb failure criteria. Fathipour-Azar (2022b) proposed an interpretable multivariate adaptive regression splines-based polyaxial rock failure criterion with \({R}^{2}\) = 0.98 and used multiple linear regression, SVM, random forest, extreme gradient boosting, and K-nearest neighbors methods to predict the major principal stress (\({\sigma }_{1}\)) at failure of intact rock under polyaxial stress conditions. In general, ML techniques have displayed superior accuracy and generalization ability in predicting the failure strength of different intact rocks under polyaxial conditions compared with conventional failure criteria such as the Drucker–Prager, modified Wiebols and Cook, and Mogi–Coulomb criteria.

To our knowledge, no effort in the literature has applied probabilistic and interpretable tree-based ML methods, namely the Gaussian process regression model (GP), random tree (RT), and M5P algorithms, to predict failure strength (major principal stress at failure) in polyaxial empirical and computational failure models for rocks. One advantage of the GP model is its probabilistic nature, which defines a space of functions relating inputs to outputs by specifying the mean and covariance functions of the process. In doing so, the GP provides a more informative and flexible representation of the underlying data distribution than deterministic models and allows uncertainty quantification in both the predictions and the model parameters. Tree-based models provide a more transparent way to predict rock failure strength for different rock types. The advantage of the tree-based approach lies in its interpretability, which enables investigating how the algorithm uses the selected inputs and helps in understanding the contribution of each input variable to the output, which is valuable in rock engineering applications.

This study addresses this gap in the literature by implementing not only the GP, RT, and M5P algorithms but also a hybrid approach based on boosting additive regression combined with these three ML methods, as an alternative to the commonly used black-box models (e.g., ANN) or conventional models for predicting failure strength (major principal stress) from rock type, minor principal stress, and intermediate principal stress data. The hybrid approach can combine the strengths of different models and improve predictive accuracy: boosting additive regression can enhance the performance of the GP and tree-based models by combining them in an ensemble that exploits the strengths of each. The hybrid approach can also reduce overfitting and increase the generalization of the model, allowing it to be applied to a wider range of rock types. The proposed approaches also offer the advantage of generalization, as they can be applied to a wide range of rock types, unlike conventional approaches that are often designed for each specific rock type separately. The validation and comparison of the developed failure models were performed using the coefficient of determination (\({R}^{2}\)), root mean square error (RMSE), and mean absolute error (MAE) statistical metrics. Moreover, a sensitivity analysis was performed and discussed to evaluate the effects of the input parameters on the polyaxial rock strength modelling process.

2 Data Mining Algorithms

Data mining is an approach that employs data-oriented techniques to find unknown and complex patterns and relationships within data. In this study, ML techniques, namely the Gaussian process regression model (GP), random tree (RT), M5P, and additive regression (AR) models, are implemented to predict the major principal stress of rock at failure. The performances of the different models were assessed with the RMSE and MAE error indices. RMSE measures the differences between the values predicted by the models and the actual values, MAE measures how close the predictions are to the actual values, and \({R}^{2}\) evaluates the correlation between the actual and predicted values. The three statistical formulas used to compare the performances of the developed models are as follows:

$${\text{RMSE}}\,\, = \,\,\sqrt {\frac{{\sum\nolimits_{k = 1}^{N} {\left( {t_{k} - y_{k} } \right)^{2} } }}{N}}$$
(1)
$${\text{MAE}} = \frac{1}{N}\,\,\mathop \sum \nolimits_{k = 1}^{N} \left| {t_{k} - y_{k} } \right|$$
(2)
$$R^{2} \,\, = \,\,1 - \frac{{\sum\nolimits_{k = 1}^{N} {\left( {t_{k} - y_{k} } \right)^{2} } }}{{\sum\nolimits_{k = 1}^{N} {\left( {t_{k} - \overline{t}} \right)^{2} } }},$$
(3)

where \({t}_{k}\) and \({y}_{k}\) are the target and the output of the developed models for the \(k\)th sample, respectively, \(\overline{t }\) is the average of the targets, and \(N\) is the total number of samples considered. The models that minimize the two error measures while maximizing \({R}^{2}\) are selected as the best ones.
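The three metrics can be written directly from Eqs. (1)–(3); the following pure-Python functions are an illustrative implementation, not the software used in the study.

```python
# Illustrative implementation of Eqs. (1)-(3); t is the vector of targets
# t_k and y the vector of model outputs y_k.
import math

def rmse(t, y):
    # Root mean square error, Eq. (1)
    return math.sqrt(sum((tk - yk) ** 2 for tk, yk in zip(t, y)) / len(t))

def mae(t, y):
    # Mean absolute error, Eq. (2)
    return sum(abs(tk - yk) for tk, yk in zip(t, y)) / len(t)

def r2(t, y):
    # Coefficient of determination, Eq. (3)
    t_bar = sum(t) / len(t)
    ss_res = sum((tk - yk) ** 2 for tk, yk in zip(t, y))
    ss_tot = sum((tk - t_bar) ** 2 for tk in t)
    return 1.0 - ss_res / ss_tot
```

As stated above, a model that minimizes `rmse` and `mae` while maximizing `r2` would be preferred.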

2.1 Gaussian Process Regression Model

Gaussian process (GP) regression is a nonparametric Bayesian approach to regression problems (Rasmussen and Williams 2006; Wang 2020). Owing to its kernel functions, GP regression is very efficient in modeling nonlinear data.

Consider a training dataset \(D = \left\{ {x_{i} ,y_{i} } \right\}_{i = 1}^{n}\), where \(X\in {R}^{n\times D}\) represents the input data (design matrix) and \(y\in {R}^{n}\) is the corresponding output vector. In this study, rock type, minor principal stress, and intermediate principal stress are the input variables for predicting failure strength (major principal stress), so the GP regression output is the major principal stress. Therefore, \(x = \left[ {{\text{rock type}},{ }\sigma_{3} ,\sigma_{2} } \right]\) and \(y=[{\sigma }_{1}]\). In GP regression, it is assumed that the output can be expressed as follows (Rasmussen and Williams 2006; Ebden 2015):

$$y=f\left(x\right)+\varepsilon ,$$
(4)

where \(\varepsilon \sim N(0,{\sigma }_{n}^{2})\) is Gaussian noise with variance \({\sigma }_{n}^{2}\), assumed equal for all samples \({x}_{i}\).

The GP method treats the n observations in the vector \(y=\left\{{y}_{1},\cdots ,{y}_{n}\right\}\) as a single sample from a multivariate Gaussian distribution, which can be assumed to have zero mean. The covariance function defines the relationship between one observation and another.

A covariance function \(k(x, x\mathrm{^{\prime}})\) describes the relationship between observations; in the GP method, the squared-exponential covariance is often used to approximate the function, as follows:

$$k\left( {x,{ }x^{\prime}} \right) = \sigma_{f}^{2} \times {\text{exp}}\left[ {\frac{{ - \left( {x - x^{\prime}} \right)^{2} }}{{2l^{2} }}} \right] + \sigma_{n}^{2} \delta \left( {x,{ }x^{\prime}} \right),$$
(5)

where \({\sigma }_{f}^{2}\) denotes the maximum allowable covariance. Note that \(k\left(x, {x}^{\mathrm{^{\prime}}}\right)\) approaches this maximum only when \(x\) and \({x}^{\mathrm{^{\prime}}}\) are very close to each other, in which case \(f\left(x\right)\) is approximately equal to \(f\left({x}^{\mathrm{^{\prime}}}\right)\). The parameter \(l\) is the length-scale of the kernel function, and \(\delta \left(x, {x}^{\mathrm{^{\prime}}}\right)\) is the Kronecker delta function, defined as follows:

$$\delta_{ij} = 1\,{\text{if}}\,i = j\,{\text{and}}\,\delta_{ij} = 0\,{\text{if}}\,i \ne j$$
(6)

Given the training dataset, the final aim of the learning process is to predict the output value \(y_{*}\) for a new input pattern \(x_{*}\). To accomplish this, three covariance matrices are constructed as follows:

$$\begin{aligned} K & = \left[ {\begin{array}{*{20}c} {k\left( {x_{1} ,x_{1} } \right)} & {k\left( {x_{1} ,x_{2} } \right)} & \cdots & {k\left( {x_{1} ,x_{n} } \right)} \\ {k\left( {x_{2} ,x_{1} } \right)} & {k\left( {x_{2} ,x_{2} } \right)} & \cdots & {k\left( {x_{2} ,x_{n} } \right)} \\ \vdots & \vdots & \ddots & \vdots \\ {k\left( {x_{n} ,x_{1} } \right)} & {k\left( {x_{n} ,x_{2} } \right)} & \cdots & {k\left( {x_{n} ,x_{n} } \right)} \\ \end{array} } \right] \\ K_{*} & = \left[ {k\left( {x_{*} ,x_{1} } \right)\;k\left( {x_{*} ,x_{2} } \right)\; \cdots \;k\left( {x_{*} ,x_{n} } \right)} \right] \\ K_{{**}} & = k\left( {x_{*} ,x_{*} } \right) \\ \end{aligned}$$
(7)

The data sample can be represented as a sample of a multivariate Gaussian distribution based on the Gaussian distribution assumptions, as follows:

$$\left[ {\begin{array}{*{20}c} y \\ {y_{*} } \\ \end{array} } \right]\sim N\left( {0,\left[ {\begin{array}{*{20}c} K & {K_{*}^{T} } \\ {K_{*} } & {K_{{**}} } \\ \end{array} } \right]} \right),$$
(8)

where \(T\) denotes the matrix transpose. Since \(\left.{y}_{*}\right|y\) follows a multivariate Gaussian distribution with mean \({K}_{*}{K}^{-1}y\) and variance \({K}_{**}-{K}_{*}{K}^{-1}{K}_{*}^{T}\), the estimated mean and variance of the predicted output \({y}_{*}\) are as follows:

$$\begin{aligned} E\left( {y_{*} } \right) & = K_{*} K^{ - 1} y \\ var\left( {y_{*} } \right) & = K_{{**}} - K_{*} K^{ - 1} K_{*}^{T} \\ \end{aligned}$$
(9)

The kernel hyperparameters (e.g., \(l\), \({\sigma }_{f}\), and \({\sigma }_{n}\)) can be determined by Bayesian inference, typically by maximizing the marginal likelihood of the training data. After training, the GP model can be used to predict unknown outputs for new inputs.
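The prediction step described by Eqs. (5), (7), and (9) can be sketched in a few lines of NumPy. This is a minimal illustration with assumed hyperparameter values (`sigma_n`, `sigma_f`, `length`), not the tuned GP model of this study:

```python
# Minimal GP regression sketch: squared-exponential covariance with noise
# on the diagonal (Eq. 5), covariance matrices (Eq. 7), and posterior mean
# and variance for a single test point (Eq. 9). Hyperparameters are
# illustrative assumptions.
import numpy as np

def sq_exp_kernel(x, x_prime, sigma_f=1.0, length=1.0):
    # Eq. (5) without the noise term
    return sigma_f ** 2 * np.exp(-np.sum((x - x_prime) ** 2) / (2 * length ** 2))

def gp_predict(X, y, x_star, sigma_n=0.1, sigma_f=1.0, length=1.0):
    n = len(X)
    # K: n x n training covariance matrix; the noise variance enters on the
    # diagonal via the Kronecker delta of Eq. (5)
    K = np.array([[sq_exp_kernel(X[i], X[j], sigma_f, length)
                   for j in range(n)] for i in range(n)])
    K += sigma_n ** 2 * np.eye(n)
    # K_* and K_**: test-train and test-test covariances (Eq. 7)
    K_star = np.array([sq_exp_kernel(x_star, X[i], sigma_f, length)
                       for i in range(n)])
    K_star_star = sq_exp_kernel(x_star, x_star, sigma_f, length)
    K_inv = np.linalg.inv(K)
    mean = K_star @ K_inv @ y                      # E(y_*), Eq. (9)
    var = K_star_star - K_star @ K_inv @ K_star    # var(y_*), Eq. (9)
    return mean, var
```

Note that the posterior variance returned alongside the mean is what gives the GP its uncertainty-quantification capability mentioned above.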

Selecting a suitable covariance or kernel function is important since it has a direct impact on predictive efficiency. In this study, two widely used and well-understood kernel functions, the Gaussian (radial basis function, RBF) kernel and the Pearson VII universal kernel function (PUK), were selected for GP model development to provide a good baseline for comparison. These two kernels have been shown to perform well in a variety of applications (Fathipour-Azar 2021a, b; 2022a, b, c, d, e).

$$k_{RBF} \left( {x,{ }x_{i} } \right)\,\, = \,\,e^{{ - \gamma \left\| {x - x_{i} } \right\|^{2} }}$$
(10)
$$k_{PUK} \left( {x,{ }x_{i} { }} \right)\,\, = \,\,1/\left[ {1 + \left( {2\sqrt {\left\| {x - x_{i} } \right\|^{2} } \sqrt {2^{{\left( {1/\omega } \right)}} - 1} /\sigma } \right)^{2} } \right]^{\omega } ,$$
(11)

where \(\gamma\), \(\sigma\), and \(\omega\) are kernel parameters (also known as hyperparameters). In this study, the data are normalized before fitting a GP using the following equation:

$$d_{norm} = \frac{{d_{i} - d_{min} }}{{d_{max} - d_{min} }},$$
(12)

where \({d}_{\mathrm{norm}}\) is the normalized value, \({d}_{i}\) is the experimental value of the \(i\)th data point, and \({d}_{\mathrm{max}}\) and \({d}_{\mathrm{min}}\) are the maximum and minimum values of the data, respectively.
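The two kernels of Eqs. (10)–(11) and the min-max normalization of Eq. (12) can be sketched as follows; the kernel parameter values shown are illustrative defaults, not the tuned hyperparameters reported later.

```python
# Illustrative implementations of the RBF kernel (Eq. 10), the Pearson VII
# universal kernel (Eq. 11), and min-max normalization (Eq. 12).
import math

def rbf_kernel(x, x_i, gamma=1.0):
    # Eq. (10): k = exp(-gamma * ||x - x_i||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)

def puk_kernel(x, x_i, omega=1.0, sigma=1.0):
    # Eq. (11): Pearson VII universal kernel
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_i)))
    return 1.0 / (1.0 + (2.0 * dist
                         * math.sqrt(2.0 ** (1.0 / omega) - 1.0)
                         / sigma) ** 2) ** omega

def min_max_normalize(data):
    # Eq. (12): scale each value into [0, 1]
    d_min, d_max = min(data), max(data)
    return [(d - d_min) / (d_max - d_min) for d in data]
```

Both kernels equal 1 when \(x = x_i\) and decay as the distance between the two inputs grows, which matches the "maximum allowable covariance" interpretation given above.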

2.2 Random Tree (RT) Model

A decision tree constructs classification or regression models in the form of a tree structure. RT uses a bagging idea to split a dataset into random sub-spaces for constructing a decision tree: the tree is built by a stochastic process that, at each node, selects the best split among a randomly chosen subset of attributes. Since the features are chosen at random, all trees in the ensemble have an equal chance of being sampled (Witten and Frank 2005).

2.3 M5P Model

The M5P tree is a reconstruction of Quinlan's M5 algorithm (Quinlan 1992). The technique is based on a binary decision tree that assigns a linear regression function to each leaf (terminal) node, which allows continuous numerical parameters to be estimated. The model tree is fitted in two steps. In the first step, the data are split into subsets to form a decision tree. The splitting criterion is based on the standard deviation of the class values reaching a node, which serves as a measure of the error at that node; the expected reduction in error from testing each attribute at the node is then evaluated. The standard deviation reduction (SDR) is calculated as follows:

$$SDR = sd\left( N \right) - \sum \frac{{\left| {N_{i} } \right|}}{\left| N \right|}sd\left( {N_{i} } \right),$$
(13)

where \(N\) is the set of examples that reach the node, \({N}_{i}\) is the subset of examples with the \(i\)th outcome of the potential split, and \(sd\) is the standard deviation. Because of the splitting process, the standard deviation of a child node is less than that of its parent node (Quinlan 1992). After evaluating all possible splits, the M5P tree chooses the one that maximizes the expected error reduction. This splitting process may overgrow the tree and cause overfitting. To overcome this, in the second step the overgrown tree is pruned and the pruned sub-trees are replaced with linear regression functions.
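The SDR criterion of Eq. (13) can be sketched as follows (an illustrative pure-Python computation, assuming a population standard deviation):

```python
# Illustrative computation of the standard deviation reduction (Eq. 13)
# used by M5P: the candidate split maximizing SDR is chosen at each node.
import math

def sd(values):
    # population standard deviation of the class values
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(parent, subsets):
    # Eq. (13): SDR = sd(N) - sum(|N_i|/|N| * sd(N_i))
    return sd(parent) - sum(len(s) / len(parent) * sd(s) for s in subsets)
```

A split that perfectly separates the class values yields children with zero standard deviation, so its SDR equals the parent's standard deviation, the maximum possible reduction.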

2.4 Additive Regression (AR)

To improve the performance of the above-mentioned (i.e., GP, RT, and M5P) base regression approaches, additive regression (AR), an implementation of the gradient boosting ensemble learning technique, is used (Friedman 2002). In this algorithm, each iteration fits a new base model to the residuals of the previous one, and the predictions of all base models are summed to produce the final estimate.
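The AR scheme can be sketched as follows. The `MeanLearner` base model is a deliberately trivial stand-in for the GP, RT, or M5P learners used in the study; the boosting loop itself follows the residual-fitting logic described above.

```python
# Sketch of additive regression (gradient boosting with a generic base
# learner): each iteration fits the residuals left by the previous
# iterations, and the final prediction sums all base-model outputs.

class MeanLearner:
    # Trivial base model predicting the mean of its targets; a stand-in
    # for GP, RT, or M5P in this illustration.
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean] * len(X)

def additive_regression(X, y, base_learner_cls, n_iterations=3, shrinkage=1.0):
    models, residuals = [], list(y)
    for _ in range(n_iterations):
        model = base_learner_cls().fit(X, residuals)
        preds = model.predict(X)
        # next iteration targets the remaining residuals
        residuals = [r - shrinkage * p for r, p in zip(residuals, preds)]
        models.append(model)

    def predict(X_new):
        # final estimate: sum of all base-model predictions
        return [shrinkage * sum(m.predict(X_new)[i] for m in models)
                for i in range(len(X_new))]
    return predict
```

Swapping `MeanLearner` for a stronger base learner is what yields the hybrid AR-GP, AR-RT, and AR-M5P models discussed later.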

3 Dataset for Models

Polyaxial test results for fourteen different rocks, namely Aghajari sandstone, Jahrom dolomite, Soltanieh granite, Pabdeh shale, Asmari limestone, Karaj trachyte, Karaj andesite, Naqade amphibolite, Jolfa marble, Hormoz salt, Mahalat granodiorite, Shourijeh siltstone, Shahr-e Babak hornfels, and Chaldoran metapelite, were taken from the published literature (Bahrami et al. 2017) and used for assessing the data-oriented strength criteria. A scatter plot of the variables with correlations and diagonal frequency histograms is presented in Fig. 1. Scatter plots below the diagonal (lower triangle) show the relationships between failure strength (major principal stress), minor principal stress, intermediate principal stress, and rock type. Values above the diagonal (upper triangle) show the coefficient of determination between variables. The diagonal graphs show the frequency histograms and density plots of the corresponding variables. A total of 480 samples were used for the predictive modeling, of which 80% were randomly selected for training the models and the remaining 20% for testing the developed models in estimating the major principal stress from the minor and intermediate principal stresses. The statistical parameters of the training and testing datasets are presented in Table 1.
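The random 80/20 partition described above can be sketched as follows; the function name and seed are illustrative, and the study's actual partition is not reproduced here.

```python
# Sketch of a random 80/20 train/test split for the 480-sample dataset.
# The seed is an arbitrary illustrative choice for reproducibility.
import random

def train_test_split(samples, train_fraction=0.8, seed=42):
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(train_fraction * len(samples))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

With 480 samples this yields 384 training and 96 testing cases, matching the proportions stated above.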

Fig. 1
figure 1

Scatter plots (lower diagonal), histograms (diagonal), and coefficient of determination (upper diagonal) between failure strength (major principal stress), minor principal stress, intermediate principal stress, and rock type

Table 1 Statistical analysis of the training and testing datasets

4 Results

In the present study, data-oriented surrogate models are first developed and compared to predict polyaxial rock strength. To improve the performance of these base regression approaches, AR, an implementation of the boosting approach, is used.

Grid search optimization is applied to tune the hyperparameters of the ML models. Grid search trains an ML model with each combination of possible hyperparameter values and assesses its performance using a predefined measure. For the GP regression based on the RBF kernel, the optimum values of \(\varepsilon\) and \(\gamma\) are determined to be 0.001 and 3, respectively. The optimal hyperparameters of the GP model based on the PUK kernel are \(\varepsilon =0.001\), \(\omega =0.1\), and \(\sigma =4\), which provide better performance values. For the RT model, 3 randomly chosen attributes are determined as the optimal parameter. For the M5P model, a minimum of 8 instances at a leaf node is used.
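The grid-search procedure can be sketched generically as follows; the candidate values and scoring function in the test below are hypothetical, not the actual grids used in the study.

```python
# Generic grid-search sketch: evaluate every combination of candidate
# hyperparameter values with a scoring function (e.g., cross-validated
# RMSE, where lower is better) and keep the best-scoring combination.
import itertools

def grid_search(param_grid, score_fn):
    best_params, best_score = None, float("inf")
    names = list(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # lower score = better model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with the number of grids, which is why only a few candidate values per hyperparameter are typically enumerated.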

The calculated performance indices (\({R}^{2}\), RMSE, MAE) for the developed data-oriented models in the training and testing phases are shown in Fig. 2. Comparison of the results shows that the AR-RT model outperforms the other developed models in both the training and testing periods, with the highest \({R}^{2}\) (1) and the lowest RMSE (0 MPa) and MAE (0 MPa) in the training phase, and the highest \({R}^{2}\) (0.987) and the lowest RMSE (29.771 MPa) and MAE (22.517 MPa) in the testing phase. According to the statistical indices presented in Fig. 2, using boosting-based AR enhanced the models' performance. This improvement is most noticeable when comparing the AR-M5P model with the M5P model in the training and testing phases.

Fig. 2
figure 2

Coefficient of determination, root mean square error (RMSE), and mean absolute error (MAE) for developed data-oriented models for the training (blue) and test (orange) data

The M5P model tree-based polyaxial rock strength regression tree structure is shown in Fig. 3. As can be observed, 21 linear models (LMs), or rules, have been constructed based on conditional statements. In this diagram, boxes signify terminal leaf nodes with labels inside, whereas ellipses indicate internal nodes labeled with the split feature. The splitting rules are indicated on the corresponding paths. For each leaf of the tree, additional information is given in brackets; for instance, LM 1 contains 26 instances with a 4.206% error at that leaf. The numbers of instances across all leaves sum to 384, the size of the training dataset. The LMs obtained by the M5P model for all cases are given in Table 2.

Fig. 3
figure 3

Tree visualization of constructed M5P regression tree for the polyaxial rock strength prediction. The terminal leaf nodes are represented by boxes with labels within, while the other nodes are represented by ellipses with symbols inside that correspond to the feature where the split occurs. On the respective pathways, the dividing rules are listed

Table 2 LMs for the established polyaxial rock strength regression tree

Figure 4 compares the values of major principal stress predicted by the different surrogate modeling techniques with the experimental values.

Fig. 4
figure 4

Experimental and predicted values of major principal stress and its corresponding scatter plots during the testing phase of the applied intelligence predictive models. a GP-RBF, b GP-PUK, c RT, d M5P, e AR-GP-RBF, f AR-GP-PUK, g AR-RT, and h AR-M5P models

Although, according to Fig. 2, the AR-RT, AR-GP-PUK, GP-PUK, and RT models demonstrated high accuracy and low error, Figs. 2 and 4 show that the AR-RT, RT, and AR-M5P strength models are closer to the experimental values in the testing phase than the other developed models.

Figure 5 presents the cumulative distribution functions (CDFs) of the observed and predicted major principal stress, \({\sigma }_{1}\) (MPa), obtained using the developed models for the training and testing datasets, respectively. In Fig. 5, the CDFs of \({\sigma }_{1}\) estimated by the AR-RT, AR-GP-PUK, GP-PUK, and RT models closely match that of the measured \({\sigma }_{1}\). This agreement suggests that the information contained in the \({\sigma }_{1}\) estimated by these models is consistent with that obtained from the measurements. Although the CDFs of the \({\sigma }_{1}\) estimated by the other developed models are also close to that of the measured \({\sigma }_{1}\) and follow its pattern and trend, small errors and deviations can be seen between these models and the measurements. This further confirms the statistical results for the estimated \({\sigma }_{1}\) (Figs. 2 and 4), indicating that the RT, hybrid AR-RT, and hybrid AR-M5P models provide better estimates than the other models.

Fig. 5
figure 5

Cumulative distribution function of the observed and predicted major principal stress, \({\sigma }_{1}\) (MPa) using the models developed for a training and b testing datasets

The cumulative distribution function (CDF) versus relative error is shown in Fig. 6 for all the developed ML-based failure criteria. According to Fig. 6a, the AR-RT, AR-GP-PUK, and GP-PUK-based failure criteria have a 100% probability of zero prediction error on the training data, consistent with Fig. 5a. In the testing phase (Fig. 6b), the probability of a prediction error within 10% is more than 70% for the AR-RT, RT, AR-M5P, and AR-GP-PUK-based failure criteria. Therefore, the AR-RT-based failure criterion demonstrates the highest degree of confidence and is accordingly effective for strength prediction.
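The reading of Fig. 6 described above can be sketched as the fraction of predictions whose relative error falls within a given tolerance (an illustrative helper, not the plotting code of the study):

```python
# Illustrative helper: the value of the relative-error CDF at a given
# tolerance, i.e. the fraction of predictions whose relative error
# |predicted - observed| / |observed| does not exceed the tolerance.
def fraction_within(observed, predicted, tolerance=0.10):
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return sum(e <= tolerance for e in errors) / len(errors)
```

A criterion for which `fraction_within(..., 0.10)` exceeds 0.7 corresponds to the ">70% probability of error within 10%" reading above.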

Fig. 6
figure 6

Cumulative distribution function of ML-based polyaxial rock failure criteria; a training and b testing phases

The overall prediction error distributions of the developed models in the training and testing phases are shown in the violin plot in Fig. 7. Negative and positive prediction error values indicate over- and under-estimation by the developed models, respectively. In this figure, the prediction error of AR-RT is lower than that of the other models in both the training and testing phases, and approximately similar prediction errors can be seen for RT in the two phases. AR-GP-PUK has zero error in the training phase; however, noticeable errors are seen in the testing phase.

Fig. 7
figure 7

Violin plot for error prediction using the models developed for a training and b testing datasets

A Taylor diagram (Taylor 2001) is a graphical representation for comparing various model outcomes with measured data. The standard deviation, RMSE, and correlation coefficient (R) between the different models and the measurements are depicted in this diagram, which is plotted for the major principal stress in Fig. 8. The location of each model in the diagram indicates how closely the predicted pattern matches the measurements. Based on the distances of the developed model points from the measured point, the developed AR-RT model is generally the most promising method for estimating polyaxial rock strength.

Fig. 8
figure 8

Taylor diagram indicating models’ performances in a training and b testing phases

The efficacy of the proposed data-oriented models was also compared with each other and against several well-known failure criteria from the literature, including the Mohr–Coulomb (MC), Hoek–Brown (HB), modified Lade (ML), Drucker–Prager (DP), linear Mogi (Mogi 1971a, b), modified Wiebols and Cook (MWC), 3D Hoek–Brown (3D HB), Bieniawski–Yudhbir (BY), Hoek–Brown–Matsuoka–Nakai (HBMN), and modified Mohr–Coulomb (MMC) criteria, in predicting the uniaxial compressive strength (UCS) of the fourteen rocks, as depicted in Fig. 9. The data-oriented strength models are generally robust modeling techniques and predict UCS consistently with the well-established criteria best-fitted to the experimental data. As a result, the ML approaches are able to capture the nonlinearity of the polyaxial strength response of rock.

Fig. 9
figure 9

Comparison of developed data-oriented models and some well-known criteria. Note: Mohr–Coulomb (MC); Hoek–Brown (HB); modified Lade (ML); Drucker–Prager (DP); linear Mogi (Mogi 1971a, b); modified Wiebols and Cook (MWC); 3D Hoek–Brown (3D HB); Bieniawski–Yudhbir (BY); Hoek–Brown–Matsuoka–Nakai (HBMN); modified Mohr–Coulomb (MMC)

5 Sensitivity Analysis

A sensitivity analysis was performed to identify the most influential input parameters for predicting polyaxial rock strength using the established models. One input parameter was eliminated in each case, and its effect on the predicted polyaxial rock strength was determined using the \({R}^{2}\) and RMSE performance metrics. Figure 10 shows that the prediction of polyaxial rock strength is mainly influenced by \({\sigma }_{3}\) and rock type, with \({\sigma }_{2}\) having the least significant impact on the strength.
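The leave-one-input-out procedure described above can be sketched as follows; `train_and_score` is a hypothetical callback standing in for the full model-fitting and evaluation pipeline.

```python
# Sketch of leave-one-input-out sensitivity analysis: retrain with one
# input column removed and record the change in an error metric (e.g.,
# RMSE). train_and_score is a hypothetical callback that fits a model on
# (X, y) and returns its error score.
def sensitivity_analysis(X, y, feature_names, train_and_score):
    baseline = train_and_score(X, y)
    impacts = {}
    for j, name in enumerate(feature_names):
        X_reduced = [[v for k, v in enumerate(row) if k != j] for row in X]
        # larger error increase => more influential input
        impacts[name] = train_and_score(X_reduced, y) - baseline
    return impacts
```

Under this scheme, the inputs whose removal degrades the score most (here \({\sigma }_{3}\) and rock type) are judged the most influential.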

Fig. 10
figure 10

Sensitivity analysis to determine the impact of each variable on the polyaxial rock strength

6 Discussion

The accurate determination of rock strength under various loading conditions and given circumstances is pivotal for a wide range of geoengineering applications (Zhang et al. 2010; Haimson and Bobet 2012; Lee et al. 2012; Burghardt 2018; Wang and Liu 2021; Bao and Burghardt 2022), and various empirical, mathematical, and theoretical strength criteria have been proposed for strength prediction in geoengineering practice. However, finding the most appropriate criterion for a given situation remains challenging (Ulusay and Hudson 2012), and failure models based on experimental results for one specific type of geomaterial are not applicable to other types. In geoengineering practice, all failure criteria need to be modified by trial and error (Wang and Liu 2021). In addition, defining the real behavior of geomaterials under different stress conditions is difficult due to the complexity of the materials. Conventional failure models require assumptions, and the number of material parameters to be determined increases with model complexity, which restricts their practical application in engineering (Gao 2018).

ML-based failure models have emerged as a promising approach to address these challenges. ML-based models can process large amounts of data and learn complex model functions from input and output training datasets without any assumptions or physical background, which allows abstract information or theoretically unknown behaviour to be represented. Moreover, ML models can be improved by retraining them with new data, and the established models and learned information can be stored (Fathipour-Azar and Torabi 2014; Fathipour-Azar et al. 2017, 2020; Gao 2018; Zhang et al. 2020; Fathipour-Azar 2021a, b; 2022a, b, c, d, e, f; 2023a, b).

In this study, the efficiency of probabilistic (i.e., GP) and tree-based (i.e., RT and M5P) ML algorithms in predicting the failure strength of rock under polyaxial conditions is demonstrated first. The GP is a nonparametric kernel-based Bayesian method that computes posterior predictive distributions for new test inputs and allows the uncertainty in model estimates to be quantified. While Bayesian analysis is a general framework for statistical inference that combines prior knowledge with new data to estimate parameters and quantify uncertainties (e.g., Burghardt 2018; Bao and Burghardt 2022), the GP is a specific method for modeling functions as Gaussian processes. Nonlinear regression can also be performed using regression trees. The RT and M5P algorithms use regression trees to partition the input space into smaller regions and apply simple models to each of them. While the M5P regression tree has lower predictive performance than the other ML algorithms in this study (Fig. 2), the RT and M5P models provide an intuitive visualization and an explicit description of how the inputs affect the output, which is beneficial in engineering practice (Fig. 3 and Table 2).

Finally, boosting-based AR is used to enhance the efficiency of the GP, RT, and M5P algorithms in terms of higher accuracy and lower error. According to the findings of this study, the predictive strength and performance of the individual algorithms could be enhanced by the hybrid algorithms for this dataset (Figs. 2, 4, 5, 6, 7 and 8), owing to the adaptability and structural compatibility of AR with different base models. This improvement is most noticeable when the results of the M5P model are compared with those of the hybrid AR-M5P model. The study shows that the accuracy and performance of the ML models depend on the type of algorithm used and that hybrid models combining multiple algorithms can improve predictive accuracy.
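The additive (boosting) scheme underlying AR can be sketched as follows: each round fits a base learner to the current residuals and adds its shrunken prediction to the ensemble. For self-containment the base learner here is a simple regression stump rather than the GP, RT, or M5P learners of the study, and the learning rate and function names are illustrative:

```python
import numpy as np

def fit_stump(X, r):
    # Best single-feature threshold split minimizing squared error on residuals r.
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = r[X[:, j] <= t], r[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, left.mean(), right.mean())
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)

def additive_regression(X, y, n_rounds=50, lr=0.5):
    # Additive regression: every round fits a base learner to the residuals
    # of the ensemble built so far, then adds its shrunken prediction.
    preds, models = np.zeros(len(y)), []
    for _ in range(n_rounds):
        h = fit_stump(X, y - preds)
        preds += lr * h(X)
        models.append(h)
    return lambda Z: sum(lr * h(Z) for h in models)
```

Because each round corrects the remaining error of the previous rounds, a weak base model such as M5P can gain noticeably from this wrapping, which is consistent with the AR-M5P improvement observed above.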

Comparison with well-known failure criteria from the literature showed that the developed ML-based strength models predicted UCS values consistent with those obtained from well-established criteria best-fitted to the experimental data. This highlights the effectiveness of data-oriented modeling techniques in capturing the nonlinearity of the polyaxial strength response of rock.

The sensitivity analysis revealed that the prediction of polyaxial rock strength is influenced primarily by \({\sigma }_{3}\) and rock type, followed by \({\sigma }_{2}\), which has a less significant impact. This indicates that the microstructure and properties of the rock are important factors in determining its strength under polyaxial loading conditions.
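An input ranking of this kind can be obtained, for example, by permutation importance: shuffling one input column at a time and recording the resulting increase in prediction error. This is a generic sketch under that assumption (the study's own sensitivity method may differ), with illustrative function names:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    # Increase in mean squared error when one input column is shuffled;
    # columns the model ignores score (near) zero.
    rng = np.random.default_rng(seed)
    base = np.mean((model(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        mses = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # shuffles the column in place
            mses.append(np.mean((model(Xp) - y) ** 2))
        scores.append(np.mean(mses) - base)
    return np.array(scores)
```

Applied to a fitted polyaxial-strength model, a ranking like the one reported above would appear as a large score for \({\sigma }_{3}\) and rock type and a smaller score for \({\sigma }_{2}\).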

Rock properties vary with rock type. In this context, a database of 14 rocks spanning igneous, metamorphic, and sedimentary types is employed in the simulations to demonstrate the efficiency of the ML algorithms; \({\sigma }_{3}\) and \({\sigma }_{2}\) range from 5 to 140 MPa and from 5 to 360 MPa, respectively (Table 1 and Fig. 1). The findings of this study contribute to the field of rock mechanics by providing insight into the factors that influence polyaxial rock strength and by demonstrating the effectiveness and potential of individual and hybrid ML-based techniques in improving the accuracy and reliability of rock strength predictions, which has important applications in the design and construction of rock engineering structures. The integration of various regression models through boosting can enhance the accuracy and robustness of predictions while preventing overfitting. Additionally, Bayesian analysis can be applied to a wider range of problems beyond function modeling (e.g., Burghardt 2018; Bao and Burghardt 2022). These methods can aid informed decision-making in a variety of subsurface engineering applications.

Further research is needed to investigate the generalization of the developed models to other rock types and testing conditions and to evaluate their effectiveness in practical applications. Moreover, a larger dataset with more explanatory data variables could be analyzed to improve the model’s precision and reliability in future research.

7 Conclusion

Data-oriented models for predicting polyaxial rock strength can be valuable tools in practical projects. In this study, hybrid additive regression combined with three ML algorithms is used to estimate polyaxial rock strength and capture nonlinear patterns. The ML algorithms employed are Gaussian process regression (GP) with two kernels, random tree (RT), and the M5P method. Three input parameters (rock type and the minor and intermediate principal stresses), taken from 480 polyaxial rock experiments reported in published research, are used to construct the data-oriented surrogate models of the major principal stress at failure. The AR-RT model performed better than the other individual and hybrid models on both the training and testing datasets, and the efficiency of the hybrid models relative to the individual models is demonstrated in terms of high accuracy and low error. The hybrid AR-RT model, with \({R}^{2}\) = 1, RMSE = 0 MPa, and MAE = 0 MPa in the training period and \({R}^{2}\) = 0.987, RMSE = 29.771 MPa, and MAE = 22.517 MPa in the testing period, can be regarded as an excellent polyaxial rock strength surrogate model. The results of the sensitivity analysis indicate that \({\sigma }_{3}\) and rock type are the most important parameters for the polyaxial failure strength of the rock.
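The three evaluation metrics quoted above can be computed from observed and predicted strengths with a short routine; this is a minimal sketch with an illustrative function name, where RMSE and MAE carry the units of the target (here MPa):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # Coefficient of determination R^2, root mean squared error, and
    # mean absolute error for a set of predictions.
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse, mae
```

A perfect fit on the training set, as reported for AR-RT, corresponds to \({R}^{2}\) = 1 with zero RMSE and MAE, while the testing-set values quantify the generalization error.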