1 Introduction

The strength of intact rock mainly depends upon the shear strength parameters. Therefore, determinations of the shear strength properties of the rock are of primary interest in the geomechanics and geoengineering investigation. The common procedure to determine the shear strength parameters of rock is triaxial tests. While the experimental studies need notable expenditure and time to have a well-equipped laboratory with series of experiments, the predictive models are valuable to advance understanding strength parameters of rock under various conditions. Moreover, since complex instruments such as a triaxial test apparatus are used, high technological skills are often needed. Therefore, it is time to establish reliable approaches that are able to predict the geomechanical properties of rocks.

Several scholars have made contributions in the field of predicting shear strength parameters in rocks. Sivakugan et al. [1], Karaman et al. [2], Shen and Jimenez [3], and Moon and Yang [4] investigated the feasibility of compressive and tensile strength data (UCS and UTS values) to determine shear strength parameters of rock using mathematical regressions. Shen et al. [5] determined Mohr–Coulomb shear strength parameters from generalized Hoek–Brown criterion. Zhang et al. [6] studied the statistical correlation between shear strength parameters of rock masses using Hoek–Brown (H–B) failure criterion and triaxial test result of intact rocks. As indicated [6,7,8], the utilization of incorrect \(C - \varphi\) correlation coefficient values may result in either underestimation or overestimation of failure probability in risk analysis of geomechanics engineering. Wei et al. [9] estimated the equivalent Mohr–Coulomb parameters based on the Hoek–Brown criterion. They also investigated its applicability in slope analysis.

Recently, soft computing techniques have been used to predict shear strength parameters of rock. Armaghani et al. [10] estimated shear strength parameters (interlocking and friction angle) of shale rock using rock index tests through particle swarm optimization–artificial neural network (PSO–ANN) integrated model. Rock index tests include dry density, point load index, Brazilian tensile strength, ultrasonic velocity, and Schmidt hammer test. They found that the proposed PSO–ANN model could accurately predict shear strength parameters of shale rock. Shen and Jimenez [3] employed genetic programming to estimate the Mohr–Coulomb shear strength parameters of sandstone rock using two strength indices (UCS and UTS) under triaxial stress conditions. Confining stress (\(\sigma_{3}\)) is taken into account to consider the nonlinearity of the failure envelopes and enhance the effectiveness of estimates. Based on laboratory tests on shale rock specimens [10, 11] studied rock interlocking through two hybrid ANN-based models by considering genetic algorithm (GA) and fuzzy inference system. They indicated that developed hybrid models are reliable. Moreover, the ANFIS predictive model performed slightly better than the GA-ANN method in terms of interlocking estimation. Shao et al. [12] predicted the internal friction angle of shale rocks using two hybrids neural net-based models that integrate ANN with GA and imperialist competitive algorithm (ICA). They considered p-wave velocity, Schmidt hammer, and point load as a system input. Based on performance indices, they presented ICA–ANN as a new method for estimating the internal friction angle of shale rock samples. Although the NN-based method is widely used in this topic, in recent years, there has been a growing interest in the application of data-centric machine learning (ML) paradigms in geomechanics and geoengineering investigations [13,14,15,16,17,18,19,20,21,22,23,24].

The present study is the first attempt at indirect prediction of Mohr–Coulomb shear strength parameters such as internal friction angle and cohesion strength of sandstone rock employing random forest (RF), alternating model tree (AT), and support vector machine (SVM) with two kernels [radial basis function (RBF) and Pearson VII kernel function (PUK)] methods under triaxial conditions using the uniaxial compressive strength (UCS) and uniaxial tensile strength (UTS). Quantification of potentiality and functionality of proposed ML algorithms are provided in terms of statistical performance indicators. The sensitivity analyses are also carried out to determine the importance of each input parameter in order to predict the Mohr–Coulomb shear strength parameters. The rest of the paper is organized as follows. Section 2 reviews the background knowledge of ML algorithms. Section 3 provides the study data. Section 4 presents the performance measure indices. Section 5 delivers the obtained results and discussion. Section 6 presents sensitivity analysis. Finally, conclusions are represented in Sect. 7.

2 Data-oriented computational methods

2.1 Random forest (RF)

Random forest (RF) is an ensemble tree-based regression method. The RF algorithm consists of regression trees trained using various bootstrap samples (bagging) of the training database. Each tree functions regression on its own, and the final output decision of RF is the average of the outputs of individual regression tree regressors. In the case of bagging, the training dataset contains about two-third of the data from the actual training dataset (InBag). The rest out-of-bag (OOB) training data (one-third out of the bootstrap sampling data) from each tree developed are utilized to assess estimation error and variable significance. The RF regression model is built using two main parameters: the number of trees and the selected features that split the nodes [25]. The flow chart of the RF algorithm is presented in Fig. 1.

Fig. 1
figure 1

Flow chart of RF algorithm

2.2 Alternating model tree (AT)

Decision tree techniques are commonly used approaches for classification and regression problems. Alternating model tree (AT) is a new algorithm based on ensemble learning method whose strategy is to iteratively correct the predictions of an additive model by using the residual errors it makes on the training data as target [26]. The layout of the AT is a single tree. AT performs regression with two different nodes, namely splitter and prediction nodes. The splitter node splits numeric attributes at the median value. An estimator node that produces a model response using a linear regression method. The AT model is composed of \(k\) base models of the form of \(F_{k} \left( {\vec{x}} \right) = \sum\nolimits_{j = 1}^{k} {f_{j} } \left( {\vec{x}} \right)\) that are fitted in a forward stage-wise additive regression manner to minimize a squared errors measure, i.e., \(\sum\nolimits_{i = 1}^{n} {\left( {F_{k} \left( {\vec{x}_{i} } \right) - y_{i} } \right)^{2} }\) through \(n\) training samples \(\left( {\vec{x}_{i} ,y_{i} } \right)\). The target variable is the residual of the fit at the \(k - 1\) stage for every iteration \(k\). The approach uses a shrinkage parameter \(\lambda\), which is a multiplicative factor in the range (0, 1] that is applied to the predictions of the base model before they are included in the additive model. It dampens the values towards the mean prediction and aids in model regularization.

2.3 Support vector machine (SVM)

Support vector machine (SVM) is based on statistical learning theory [27]. The SVM implements the structural risk minimization principle. Let \(\left( {x,y} \right)\) be a training dataset, where \(x \in R^{n}\) stands for an input vector with \(n\) components and \(y\) represents the corresponding output. In the regression model of SVM, the goal is to determine a function \(f\left( x \right)\) whose deviation from each target \(y_{i}\) is at most \(\varepsilon\) for all training data. The estimating function can be represented by the following equation:

$$f\left( x \right) = \omega .\varphi \left( x \right) + b$$
(1)

where \(\omega\) is the weight vector, \(b\) is the bias term, and \(\varphi \left( x \right)\) denotes a set of nonlinear transformations. The coefficients \(\omega\) and \(b\) are calculated by minimizing the following regularized risk function:

$$\begin{aligned} & \frac{1}{2}\vert\vert\omega\vert\vert^{2} + C\mathop \sum \limits_{i = 1}^{n} \left( {\xi_{i} ,{ }\xi_{i}^{*} } \right) \\ & {\text{Subject}}\;{\text{to}}\;:\;\left\{ {\begin{array}{*{20}c} {\omega .\varphi \left( {x_{i} } \right) + b - y_{i} \le \varepsilon + \xi_{i}^{*} } \\ {y_{i} - \omega .\varphi \left( {x_{i} } \right) - b \le \varepsilon + \xi_{i} } \\ {\xi_{i} ,{ }\xi_{i}^{*} \ge 0,{ }i = 1,2,..,n} \\ \end{array} } \right. \\ \end{aligned}$$
(2)

where \(C\) is a regulation parameter. \(\xi_{i}\) and \(\xi_{i}^{*}\) are positive slack variables. The primal Lagrangian form is used to solve this constrained optimization problem [27, 28], and finally written as follows:

$$f\left( x \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\beta_{i} - { }\beta_{i}^{*} } \right).K\left( {x_{i} ,x_{j} } \right) + b$$
(3)

where \(\beta_{i}\) and \(\beta_{i}^{*}\) are the Lagrange multipliers and \(K\left( {x_{i} ,x_{j} } \right)\) is the kernel function. It is necessary to choose suitable covariance or kernel function since it directly affects predictive efficiency. In the study, two different (radial basis kernel and Pearson VII kernel function (PUK)) commonly used kernel functions are selected for SVM development.

$$K_{\text{RBF}}(x_{i}, x_{j}) = e^{-{\gamma}\vert\vert x_{i}-x_{j}\vert\vert^{2}}$$
(4)
$$K_{{{\text{PUK}}}} \left( {x_{i} ,x_{j} } \right) = \left( {1/\left[ {1 + \left( {2\sqrt {\vert\vert{x_{i} - x_{j} }\vert\vert^{2}} \sqrt {2^{{\left( {1/\omega } \right)}} - 1} /l} \right)^{2} } \right]^{\omega } } \right)$$
(5)

where \(\gamma\), \(l\), and \(\omega\) are the kernels parameters (also known as hyper-parameters). \(\vert\vert{x_{i} - x_{j}}\vert\vert\) is the Euclidean distance between points \(x_{i}\) and \(x_{j}\). Besides, the data standardization is calculated through the following equation:

$$z_{i} = \frac{{x_{i} - \mu }}{\sigma }$$
(6)

where \(x_{i}\), \(\mu\), and \(\sigma\) are the original data, mean, and standard deviation of the data, respectively.

3 Data used for the study

The comprehensive database utilized for developing proposed ML approaches is the same used by Shen and Jimenez [3] for genetic programming analysis derived from the 32 groups of sandstone triaxial test data available in RocData presented by Rocscience [29] company. Groups with some possibly implausible triaxial data and less than three points are excluded from the regression analysis. Therefore, 23 groups with 221 datasets are used for ML analysis.

Data division is the first step in constructing a model layout for prediction research. The data were divided randomly. The randomized data are divided into training (80%) and testing (20%) datasets. Training dataset is utilized for developing and learning the ML techniques, while testing dataset is used for testing the techniques based on performance criteria which are described in the next section.

Testing data are within the range of specified training dataset of shear strength parameters during the development of models. Out of 221 datasets, 176 datasets are used for the training and the remaining 45 datasets are used for testing the generalization capability of the models. In the dataset, the input variables are UCS, UTS, and \(\sigma_{3}\). Therefore, the effect of the \(\sigma_{3}\), in which the shear failure occurs is also considered in models. The output variables are cohesion strength and internal friction angle. The statistical analysis of the input and output parameters for sandstone rock is given in Table 1 and Fig. 2. In Fig. 2, the diagonal components represent the histograms and density functions of variables. The lower panels show scatter plots for each pair of variables. Each group of sandstone triaxial test data is represented with a different colour. The upper panels in the matrix indicate the correlation coefficients for all data.

Table 1 Statistics analysis of the datasets
Fig. 2
figure 2

Scatterplot matrix of datasets

4 Performance evaluation measures

In this study, three well-known performance indicators including the coefficient of determination (\(R^{2}\)), root mean square error (RMSE), and mean absolute error (MAE) are employed to evaluate the performance of predictive models of RF, AT, and SVM.

\(R^{2}\) is used as a measure of the goodness of fit of a statistical model. A higher value of \(R^{2}\) indicates a better fit for the model. The RMSE is utilized as a measure of the sample standard deviation of the difference between predicted and observed shear strength parameters values, and MAE is employed to measure the closeness of the prediction to the eventual shear strength parameters values. The smaller values of RMSE and MAE indicate better prediction with zero values for these two criteria implying a perfect prediction. The formulation of these indices is expressed by the following equations:

$$R^{2} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i}^{*} - \overline{{y^{*} }} } \right)\left( {y_{i} - \overline{y}} \right)} \right]^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i}^{*} - \overline{{y^{*} }} } \right)^{2} \mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} }}$$
(7)
$${\text{MAE}} = \mathop \sum \limits_{i = 1}^{n} \left| {y_{i}^{*} - y_{i} } \right|/n$$
(8)
$${\text{RMSE}} = \sqrt {\mathop \sum \limits_{i = 1}^{n} \left( {y_{i}^{*} - y_{i} } \right)^{2} /n}$$
(9)

where \(y_{i}^{*}\), \(y_{i}\), \(\overline{{y^{*} }}\), and \(\overline{y}\) indicate the predicted, calculated, mean value of predicted and calculated shear strength parameters values, respectively. Also, \(n\) is the number of data.

5 Results and discussion

In this study, shear strength parameters of sandstone rock are estimated using three data mining techniques. The accuracy of the proposed RF, AT, and SVM models is calculated using the evaluation indices described above. \(R^{2}\), MAE, and RMSE are used to investigate the performance of the established models by dividing the data into training data and testing data subsets. The optimum values of the hyperparameters of RF, AT, and SVM models are tuned by grid search and trial and error processes. For the RF model, two parameters, namely number of trees and features; for the AT model, the number of iterations; for the SVM-RBF model, three parameters, namely C, \(\gamma\), and \(\varepsilon\); for the SVM-PUK model, four parameters, namely C, \(\omega\), \(\sigma\), and \(\varepsilon\), are tuned. These obtained optimum hyperparameters that developed models generalize well are presented in Table 2.

Table 2 Main optimized parameters of developed models

For the three different techniques used, the \(R^{2}\), MAE, and RMSE statistics are calculated based on the training and testing datasets. The statistical results of different predictive models used in this study are summarized in Table 3. According to these statistical performance evaluation indices, the performance of the proposed models is comparable. As is seen for friction angle prediction, the RMSE value for the AT model is the lowest than that of the other models in both the training and testing sets. In the AT model for friction angle prediction, values of RMSE and MAE are determined to be 0.627 and 0.446, respectively, based on the training dataset and 1.128 and 0.78, respectively, based on the testing dataset. Although MAE of the AT model is higher than that of the SVM-PUK model in the training stage, it is the lowest than that of the other models in the training and testing phases. In addition, the \(R^{2}\) value in the AT model is the highest than that of the other models in the training dataset (\(R^{2}\) = 0.984). However, this value (\(R^{2}\) = 0.94) is marginally lower than that of the RF model (\(R^{2}\) = 0.944) and higher than that of the SVM (RBF and PUK) model in the testing dataset. This indicates that the AT model performance is better than the other models for the given data of friction angle. Although the SVM-PUK model presents better performance than the SVM-RBF model in the training dataset, the SVM-RBF model shows slightly better performance (smaller MAE and RMSE) than the SVM-PUK model in the testing dataset of friction angle.

Table 3 The evaluation indicators of \(R^{2}\), MAE, and RMSE statistics of RF, AT, and SVM models for the prediction of friction angle and cohesion

As is evident for cohesion prediction, the lowest MAE and RMSE and highest \(R^{2}\) values among developed models are for SVM-PUK model in the training dataset. These values are MAE = 0.145, RMSE = 0.5, and \(R^{2}\) = 0.997. With a slight difference with the SVM-PUK model, the RF model shows then better performance in the training dataset (MAE = 0.385, RMSE = 0.548, and \(R^{2}\) = 0.997). However, the RF model demonstrates better performance indices with the lowest MAE = 0.679, RMSE = 0.942, and highest \(R^{2}\) = 0.993 than those of the other models in the testing dataset. The SVM-RBF model shows similar performance to that of the SVM-PUK model with the values of the evaluation indices being close to those obtained based on the SVM-PUK model in the testing dataset of cohesion.

The calculated shear strength parameters values and their estimated values obtained from the proposed RF, AT, and SVM models for the train and test datasets are plotted in Figs. 3 and 4, respectively. It is clear from these graphs that the estimated shear strength values obtained from the proposed models are in good agreement with the calculated values indicating that the models employed accurately predict shear strength parameters. According to Figs. 3 and 4, the performance of the AT model is generally better compared with the RF and SVM (with RBF and PUK) models for the friction angle dataset. However, the RF model generally shows better performance for the cohesion dataset.

Fig. 3
figure 3

Comparison of calculated and predicted friction angle using RF, AT, and SVM models in the training dataset: a friction angle and b cohesion

Fig. 4
figure 4

Comparison of calculated and predicted friction angle using RF, AT, and SVM models in the testing dataset: a friction angle and b cohesion

In addition, the predicted values of shear strength parameters obtained from the RF, AT, and SVM models along with their corresponding calculated values of shear strength parameters are illustrated in the form of scatter plots in Figs. 5 and 6. Predicted values using developed models are aligned around the 45° line. According to these results, the AT and RF models show promising performances in predicting the given shear strength parameters variations based on the training and testing used datasets. Consistency and agreement between calculated and predicted data show the high capability of these methods in estimating shear strength parameters.

Fig. 5
figure 5

Scatter plot of friction angle values by various developed predictive models for training and testing datasets

Fig. 6
figure 6

Scatter plot of friction angle values by various developed predictive models for training and testing datasets

A Taylor diagram [30] is a graphical representation of comparing various model outcomes to measured data. The standard deviation, RMSE, and R between different models and measurements are depicted in this diagram. This diagram is plotted for friction angle and cohesion in Figs. 7 and 8, respectively. The location of each model in the diagram indicates how closely the predicted pattern matches with measurements. According to these figures, due to the distance of developed models points to the measured point, developed models are promising methods for estimating shear strength properties.

Fig. 7
figure 7

Comparisons of measured and predicted friction angle by developed models

Fig. 8
figure 8

Comparisons of measured and predicted cohesion by developed models

Mohr–Coulomb shear strength parameters are important in geomechanics engineering applications indicating the resistance of intact rocks with respect to applied stresses on them. Therefore, predicting the cohesion and friction angle as important shear strength parameters is necessary for rock engineering applications such as slope stabilities, tunnelling, and underground cavities. The crucial role of the shear strength of rock in such applications has driven scientists to propose reliable and cost-effective techniques for determining this parameter. Using ML methods with minimal error and high performance plays an important role in providing alternative tools to traditional experiments for estimating desired output. Consequently, comparing the performance of different models gives an insight into the identification of better models for predicting purposes. Other studies have confirmed the applicability and performance of NN-based models [10,11,12]. This study proposed and compared the accuracy of three data-centric ML techniques, namely RF, AT, and SVM models, in estimating shear strength parameters values. The performance of the models indicates that generally the AT model shows the potential in predicting friction angle, while the RF model shows a promising approach for estimating the cohesion of sandstone rocks. The obtained results demonstrate the high capability of these methods in predicting shear strength parameters.

6 Sensitivity analysis

Sensitivity analysis is performed to assess the effect of each input parameter on the developed models. In terms of the RMSE and \(R^{2}\), each surrogate model indicates the extent to which the eliminated variable would impact the model accuracy.

Figures 9 and 10 depict the influence of input variables on the friction angle and cohesion predictions, respectively. As can be seen in Fig. 9, \(\sigma_{3}\) has the most significant influence in increasing the prediction accuracy. In other words, eliminating \(\sigma_{3}\) caused a sharp increase in RMSE values and decrease in \(R^{2}\) values in all established models for friction angle and cohesion parameters. Therefore, the \(\sigma_{3}\) is the most important variable for predicting the shear strength parameters of sandstone rock. The importance of different variables can be displayed as \(\sigma_{3}\) > UCS > UTS for the cohesion parameter. However, UCS and UTS demonstrate differing influences on the friction angle parameter using established models.

Fig. 9
figure 9

Influence of removing input variables on the accuracy of different models in the prediction of friction angle parameter

Fig. 10
figure 10

Influence of removing input variables on the accuracy of different models in the prediction of cohesion parameter

7 Conclusions

The present study explores employing data-oriented ML-based solutions, namely RF, AT, and SVM algorithms, to estimate the shear strength parameters of sandstone rocks using UCS, UTS, and confining stress. The developed surrogate models show good performance in predicting shear strength parameters, with \(R^{2}\) > 0.90, MAE < 1, and RMSE < 1.6\(^\circ\) in the training and testing phases for friction angle and \(R^{2}\) > 0.98, MAE < 1.1, and RMSE < 1.5 MPa in the training and testing phases for cohesion prediction. In general, the AT model shows the potential in predicting friction angle, while the RF model shows a promising approach for estimating the cohesion of sandstone rocks. However, there is no certainty that one model will universally outperform other models. Overall, the results prove the viability of estimating shear strength parameters using the proposed techniques. Therefore, establishing faster surrogate models based on a data-driven approach will lead to a reduction in the cost, effort, and time required for measuring shear strength parameters. Finally, the results of sensitivity analysis show that the \(\sigma_{3}\) is the most important influencing input parameter in the estimation of the shear strength parameters for this dataset