1 Introduction

Accurate measurement of mode I fracture toughness (KIC) is of great importance in rock mechanics applications such as slope stability analysis, tunnel excavation and rock fragmentation, including blasting and hydraulic fracturing (Whittaker et al. 1992; Feng et al. 2017; Roy et al. 2018). According to the literature, mode I fracture toughness can serve as a characterization of geomaterials, an index of rock fragmentation and a material property for stability analysis and modelling (Franklin et al. 1988; Afrasiabian and Eftekhari 2022). To measure the mode I fracture toughness, various laboratory procedures have been developed, including the chevron bend and short rod methods (ISRM 1988), the Brazilian disc method (Guo et al. 1993; Atkinson et al. 1982; Xu and Fowell 1994) and the semicircular bend method (Kuruppu et al. 2014; Wang et al. 2021). Other advanced experimental methods, such as the cracked chevron-notched Brazilian disc (CCNBD), hollow centre cracked disc (HCCD), straight notch disk bend (SNDB), chevron notch semi-circular bend (CNSCB) and single edge crack round bar bending (SECRBB), among others, have been used to measure rock fracture toughness in different modes (Chang et al. 2002; Amrollahi et al. 2011; Pakdaman et al. 2019). However, laboratory determination of KIC is generally tedious and time consuming, and it requires a higher level of expertise than the determination of other mechanical properties such as compressive and tensile strengths (Zhixi et al. 1997; Kahraman and Altindag 2004; Ke et al. 2008).

Apart from laboratory testing, conventional numerical, analytical and empirical methods have been used to predict KIC (Chen et al. 2001; Eftekhari et al. 2015a,b; 2017). The analytical and numerical methods are said to be one-to-one mapping models, meaning that they require detailed geometric and physical mechanisms, which makes them rigorous, tedious and computationally expensive, and they rely on some assumptions (Jing 2003; Sakellariou and Ferentinou 2005; Lawal and Kwon 2021). Their results also diverge from the experimental results on some occasions, as their veracity depends on how good the boundary assumptions are (Lawal and Kwon 2022). Although a properly conducted laboratory experiment remains the most reliable means of KIC determination, a quick estimate of KIC may be needed during the routine design of mines, and many laboratories that lack KIC equipment may also need to estimate KIC for design purposes. As a result, researchers have developed empirical models for the estimation of KIC (Chang et al. 2002; Zhang 2002; Zhixi et al. 1997, etc.). Some of these models correlate KIC with a single physical or mechanical property, while others combine several rock properties. However, the accuracy of the empirical equations is usually low. Machine learning (ML) models have also been used for accurate prediction of KIC (Roy et al. 2018; Afrasiabian and Eftekhari 2022). The major drawback of the ML models is the lack of a tractable mathematical form that can be easily implemented (Afrasiabian and Eftekhari 2022). The ML model recently proposed by Afrasiabian and Eftekhari (2022) is expressed in mathematical form, but its performance is low.

Despite the availability of different advanced methods for KIC prediction, field engineers seem to prefer empirical equations to the more complex methods, regardless of their accuracy. Some empirical equations, most notably those based on the acoustic rock properties and density, have nevertheless shown very high R2 values, greater than 90%. It is therefore important to assess the reliability of the existing empirical equations to assist in the quick selection of the most suitable one, as the reliability of the scattered empirical equations for KIC prediction is yet to be evaluated by any researcher. In this study, we assess the reliability of the existing empirical models and of the proposed MARS- and ANN-based models using an experimental database compiled from previous studies. This will serve as a guide to users of the scattered equations for KIC prediction in the literature, and the proposed study is therefore novel and useful in rock mechanics applications.

2 Methodology

2.1 Data Compilation and Explanation

The adopted database is compiled from scattered experimental datasets in the literature. It comprises non-destructive rock properties, namely the acoustic rock properties and rock density, together with KIC. The database contains forty-three (43) data points in total, as presented in Table 1. The P-wave velocity (VP), S-wave velocity (VS) and rock density (ρ) are the independent variables of the models, while KIC is the dependent variable. The correlations between these variables are presented in Fig. 1. The correlations of the model predictors with KIC are relatively good, based on the confidence ellipses set at the 95% confidence level. This is also supported by the coefficients of determination (R2) shown in Fig. 1. The larger the confidence ellipse, the weaker the correlation. The correlation between rock density and KIC seems to be the weakest, as revealed by the large confidence ellipse and low R2 value, while the correlation between VS and VP is the highest, as its confidence ellipse is the narrowest. These geomaterial properties, that is, VP, VS and ρ, are used as the model parameters because the procedures required to determine them are neither cumbersome nor destructive.

Table 1 Adopted database for the reliability assessment
Fig. 1 The scattered correlation plots of the datasets

2.2 Literature Equations

In this study, some of the available empirical equations for the prediction of KIC are extracted from the literature. Eighteen (18) equations that consider a single independent variable and three equations that consider multiple independent variables were obtained from different sources (Table 2). However, out of these 18 single-variable equations and three multi-parameter equations, only nine (9) single-parameter equations were subjected to reliability evaluation alongside the ML-based models newly proposed in this study. The remaining models in Table 2 were excluded on the basis of the parameters used in developing them, which are not reported in some of the datasets obtained for this study. Each excluded equation involves at least one destructive rock property. The exclusion is necessary to ensure a fair comparison. The models based on non-destructive properties also showed good performance and are more realistic, in the sense that these properties can be measured together with KIC on a single core sample, unlike destructive rock properties such as UCS and σt, which require the preparation of separate rock core samples. Preparing separate core samples for KIC and for the other destructive properties means that more samples are needed, which is costly and time consuming. There could also be a slight disparity in the characteristics of the core samples even if they are obtained from the same rock mass. Hence, the selected non-destructive properties, apart from being easy to measure, are also more realistic, because the same sample used in determining them is also used for the KIC test (which is itself a destructive test).

Table 2 Obtained literature equations

2.3 Assessments of Equations

The assessment of the equations obtained from the literature and of the models proposed in this study was performed using the collated data presented in Table 1. To compare the predictions of the assessed models with the measured data points, the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2) and the p value from the Mann–Whitney test were adopted. Thereafter, the most suitable model is selected. The approach adopted in identifying the most reliable model is similar to that of Mohammed et al. (2019), who selected a reliable model for UCS prediction. After the selection, the predictions of the selected model(s) are further correlated with the measured values.

2.4 Model Development and Statistical Examinations

2.4.1 MARS

Multivariate adaptive regression splines (MARS), proposed by Friedman (1991), is a non-parametric regression method that captures nonlinear relationships in the data by locating knots in a manner similar to step functions. MARS builds the model as a weighted sum of basis functions. A basis function can take one of three forms: a constant, which is always a single term; a hinge function; or a product of two or more hinge functions. The hinge function is of the form max(0,x-c) or max(0,c-x) (Lawal et al. 2021), and it is essential because it is the part that captures the nonlinearity in the data. The MARS model is built in two stages, the forward and backward stages. In the forward stage, many candidate basis functions are generated in pairs, and each generated pair is added if it minimizes the overall error of the model. The number of functions that the model generates can be controlled with hyperparameters. In the backward stage, the generated basis functions are pruned so that only those that add to the performance of the model remain, while the others are deleted. The deletion is guided by the generalized cross-validation (GCV) score. The key advantage of the MARS model over other machine learning approaches is its ability to present its results in the form of a simple equation. In addition, it can be developed with limited data, just like the linear regression on which it builds to capture the nonlinearity in the data. The proposed MARS model was built in MATLAB with the non-destructive rock properties as the inputs and KIC as the only target output. The number of basis functions was set to 7 in the forward phase, and the final model was pruned to 5 basis functions in the backward phase (Fig. 2). The resulting piecewise-linear MARS model is presented in Eq. (23).

Fig. 2 The generalized cross-validation (GCV) score with number of basis functions

$$MARS_{K_{IC}} = 0.93712 + 4.7523\,BF1 - 1.3373\,BF2 - 94.7414\,BF3 + 12.7223\,BF4,$$
(23)

where

$$BF1=\mathrm{max}\left(0,{V}_{S}^{n}-0.49481\right),$$
$$BF2=\mathrm{max}\left(0, 0.49481-{V}_{S}^{n}\right),$$
$$BF3=\mathrm{max}\left(0, 0.51573-{V}_{P}^{n}\right),$$
$$BF4=\mathrm{max}\left(0,{\rho }^{n}-0.58375\right).$$
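
As an illustration of how Eq. (23) can be evaluated, the following minimal Python sketch codes the four hinge functions and the weighted sum. The function names `hinge` and `mars_kic` are hypothetical, and the sketch assumes that the inputs have already been normalized in the same way as the training data (the superscript n); the normalization scheme, and whether the output must be mapped back to physical units, are not stated in this section and are therefore left out.

```python
def hinge(x: float) -> float:
    """Hinge function max(0, x), the building block of the MARS basis functions."""
    return max(0.0, x)


def mars_kic(vs_n: float, vp_n: float, rho_n: float) -> float:
    """Evaluate Eq. (23) for normalized VS, VP and density.

    Assumption: the arguments are already normalized as in the original
    model; no (de)normalization is performed here.
    """
    bf1 = hinge(vs_n - 0.49481)
    bf2 = hinge(0.49481 - vs_n)
    bf3 = hinge(0.51573 - vp_n)
    bf4 = hinge(rho_n - 0.58375)
    return 0.93712 + 4.7523 * bf1 - 1.3373 * bf2 - 94.7414 * bf3 + 12.7223 * bf4


if __name__ == "__main__":
    # At the knot locations every hinge vanishes, so only the intercept remains.
    print(mars_kic(vs_n=0.49481, vp_n=0.51573, rho_n=0.58375))
```

Because BF1 and BF2 share the knot 0.49481, at most one of them is non-zero for any input, which is what gives the model its piecewise-linear response in VS.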

2.5 Artificial Neural Network

ANN can arguably be said to be the most widely adopted machine learning method. It has been used to solve a wide range of problems across all fields of human endeavour. The ANN model is adopted in this study because it has not been widely used in predicting KIC, and none of the existing studies that use ANN for KIC provides implementable code for the practical application of the model. The proposed ANN was developed using the gathered data presented in Table 1. The dataset is sufficient for the development of an ANN model, as several ANN models have been developed in the past using 27, 30, 34 and 38 data points (Dehghan et al. 2010; Ebrahimi et al. 2015; Akinwekomi and Lawal 2021; Aladejare et al. 2022), which are fewer than the number used in this study. The adopted data are pre-processed through normalization to ensure data uniformity and avoid overfitting. The ANN model was implemented in MATLAB using a self-iterated approach. The input layer has three neurons, which are VP, VS and density, while the output layer has one neuron, which is KIC. The number of neurons in the hidden layer was varied between two and ten, and the results obtained for the training and testing stages, together with the overall performance, are presented in Table 3. The network with nine neurons in the hidden layer outperformed the others and was therefore selected as the optimum network (Fig. 3). The weights and biases extracted from the selected network are transformed into implementable MATLAB code, presented in Appendix A, for easy KIC prediction.
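
To make the selected 3-9-1 structure concrete, the sketch below shows a forward pass through a network with three inputs, nine hidden neurons and one output, written in plain Python. It is only an illustration: the weights and biases are random placeholders, not the values in Appendix A, and the activation functions (hyperbolic tangent in the hidden layer, linear output) are assumptions, since this section does not state them. The names `forward`, `w_hidden`, `b_hidden`, `w_out` and `b_out` are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN = 3, 9  # VP, VS, density -> nine hidden neurons -> KIC


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer network.

    x        : list of 3 normalized inputs [VP, VS, rho]
    w_hidden : 9 rows of 3 weights (one row per hidden neuron)
    b_hidden : 9 hidden biases
    w_out    : 9 output weights
    b_out    : scalar output bias
    Assumption: tanh hidden activation and a linear output neuron.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    rng = random.Random(0)  # placeholder weights, NOT the trained values
    w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    b_out = rng.uniform(-1, 1)
    print(forward([0.5, 0.5, 0.5], w_hidden, b_hidden, w_out, b_out))
```

With the trained weights and biases substituted for the placeholders, and with the same input and output normalization as used in training, a function of this shape would reproduce the network's prediction for a given sample.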

Table 3 Different simulated ANN structures
Fig. 3 Selected ANN structure with the performances

2.6 Statistical Analyses and Hypotheses Test

The values of KIC were predicted with the empirical models developed from non-destructive rock properties, using the datasets mined from the literature (Table 1). Afterwards, the RMSE, MAE and R2 in Eqs. (24–26) were computed for the predicted and measured data points. Thereafter, a normality test was conducted on the measured and predicted KIC using the MiniTab software. The statistical test was selected based on the outcome of the normality test. For the non-normal datasets, the p value of each empirical equation was obtained using the non-parametric Mann–Whitney test. The procedure adopted for the statistical test is as suggested by Mohammed et al. (2019) and is presented in Fig. 4.

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}{\left({Y}_{meas}-{Y}_{pred}\right)}^{2}}{n}},$$
(24)
$$MAE=\frac{\sum_{i=1}^{n}abs\left({Y}_{meas}-{Y}_{pred}\right)}{n},$$
(25)
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({Y}_{meas}-{Y}_{pred}\right)}^{2}}{\sum_{i=1}^{n}{\left({Y}_{meas}-{\overline{Y} }_{meas}\right)}^{2}},$$
(26)

where Ymeas and Ypred are the measured and predicted KIC, while \({\overline{Y} }_{meas}\) is the mean of the measured KIC and n is the number of data points.
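
Eqs. (24–26) translate directly into code. The following is a minimal Python sketch using only the standard library; the function names and the sample values in the usage block are illustrative assumptions and are not taken from Table 1.

```python
import math


def rmse(meas, pred):
    """Root mean square error, Eq. (24)."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(meas, pred)) / len(meas))


def mae(meas, pred):
    """Mean absolute error, Eq. (25)."""
    return sum(abs(m - p) for m, p in zip(meas, pred)) / len(meas)


def r2(meas, pred):
    """Coefficient of determination, Eq. (26)."""
    mean_meas = sum(meas) / len(meas)
    ss_res = sum((m - p) ** 2 for m, p in zip(meas, pred))
    ss_tot = sum((m - mean_meas) ** 2 for m in meas)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the adopted database).
    measured = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 4.0]
    print(rmse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

Note that R2 as defined in Eq. (26) can be negative when a model predicts worse than the mean of the measured values, so it is not the same as the squared correlation coefficient in general.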

Fig. 4 Framework of the adopted methodology (after Mohammed et al. 2019)

A two-tailed test at the 95% confidence level was used, with a null hypothesis and a research hypothesis. Under the null hypothesis, Ho, the measured and predicted data are identical, while under the research hypothesis, Ha, the measured and predicted data are not identical. During the statistical analysis, Ho was accepted for p value ≥ 0.05 and rejected for p value < 0.05 in all tests.
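
The decision rule above can be sketched in Python as follows. This is not the MiniTab procedure used in the study: it is a standard-library implementation of the two-sided Mann–Whitney test based on the normal approximation with tie and continuity corrections, so its p values may differ slightly from those of statistical packages, especially for small samples. The function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_two_sided(x, y):
    """Return (U statistic of x, two-sided p value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = max(0.0, n1 * n2 / 12 * ((n + 1) - tie_term))
    if variance == 0.0:  # all observations identical
        return u1, 1.0
    # Continuity correction of 0.5, never letting the numerator go negative.
    z = max(0.0, abs(u1 - mean_u) - 0.5) / math.sqrt(variance)
    return u1, math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    ALPHA = 0.05
    measured = [0.9, 1.1, 1.4, 1.8, 2.1, 2.4]   # hypothetical values
    predicted = [1.0, 1.2, 1.3, 1.9, 2.0, 2.5]  # hypothetical values
    u, p = mann_whitney_two_sided(measured, predicted)
    print("accept Ho" if p >= ALPHA else "reject Ho", u, p)
```

A large p value here means only that the test found no evidence of a difference between the two distributions; it does not by itself show that individual predictions are close to the corresponding measurements.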

3 Results and Discussion

The reliability of the empirical equations and the proposed models was assessed using the adopted database of acoustic properties, rock density and KIC to compute the RMSE, MAE and R2 between the measured and predicted values, as presented in Table 4. From Table 4, the minimum RMSE and MAE are 0.0437 and 0.0259, respectively, both for the ANN model, while their maximum values are 0.8588 (Eq. (3)) and 0.5537 (Eq. (9)), respectively. The maximum R2 value is 0.997, obtained for the ANN model, while the minimum R2 is 0.2233, obtained for Eq. (9).

Table 4 Evaluation criteria for KIC compared to measured data

For the accuracy assessment of the models, RMSE is more suitable than MAE when the error (that is, the difference between the measured and predicted values) is normally distributed (Chai and Draxler 2014). However, MAE can also be used where two models have similar RMSE values and different MAE values. In this study, none of the error distributions presented in Table 5 is normally distributed, as their p values are less than 0.05. Therefore, RMSE would be misleading in assessing the reliability of the models and is not considered (Chai and Draxler 2014). MAE is likewise not considered in evaluating the suitability of the models because of its correlation with RMSE. In addition, R2 is a weak indicator and is not considered either (Willmott and Matsuura 2005). The results in Table 4 illustrate how RMSE and MAE may give a misleading assessment: their values are considerably small for all the models, which would suggest that every model is suitable for the assessment of KIC. A detailed statistical study was therefore conducted to probe further into the effectiveness of these equations in KIC predictions.

Table 5 Mann–Whitney test for the KIC models compared to measured data

The normality tests show that the measured KIC and the KIC predicted by ANN, MARS and Eq. (9) are not normally distributed, while the predictions by Eqs. (1–8) are normally distributed, as their p values are greater than 0.05. The Mann–Whitney test was conducted to check the reliability of the models, since it applies when the measured data, the predicted data or both are non-normally distributed, as shown in the chart (Fig. 5). The normal probability plot of the measured values is presented in Fig. 5, while the normality test results for the models are presented in Table 5.

Fig. 5 Probability plot of the measured KIC

The p values obtained from the Mann–Whitney test conducted on each pair of measured and model-predicted values are also presented in Table 5. The minimum and maximum p values are 0 and 0.972, respectively. The p values reveal that, out of the models subjected to the reliability analysis, ANN, MARS and Eqs. (1, 2, 7, 8, 9) have p values greater than 0.05. Since a higher p value indicates a better model under the conducted test, ANN, MARS and Eq. (2) are the most suitable for the prediction of KIC and can be used with more confidence, while Eqs. (3, 4, 5 and 6) should not be used.

The correlation plots of the measured values against the values predicted by the ANN and MARS models, the most suitable models, are shown in Fig. 6. It can be seen that the predictions of the ANN model are close to the measured values, and the histogram of the ANN predictions closely resembles that of the measured data. The predictions of the MARS model are also close to the measured KIC, but not as close as those of the ANN model. In fact, the histogram of the MARS predictions differs from that of the measured KIC.

Fig. 6 Correlation between the measured and predicted values of the most reliable models

4 Conclusion

This study assessed the reliability of various empirical models for KIC prediction alongside two machine learning models, ANN and MARS, with practical implementation insight, using data mined from the literature. To achieve this, each of the empirical equations selected on the basis of the available data was re-evaluated alongside the two newly proposed ML models. The models were then subjected to rigorous statistical tests beyond the usual RMSE, MAE and R2 indicators. Normality tests were first conducted in the MiniTab software on the measured data, the model predictions and the errors, with two hypotheses used to accept or reject each model. A non-parametric statistical test was then used to select the most suitable model. The outcomes of the study revealed that ANN, MARS and Eq. (2) are the most suitable for the prediction of KIC and can be used with more confidence, while Eqs. (3, 4, 5 and 6) should not be used. The traditional indicators (RMSE, MAE and R2) should be used with caution, as they can give misleading information about the models. This type of study is essential to ensure that an appropriate model is used for rock mechanics applications with the utmost confidence.