Introduction

Estimation of rock mechanical properties is one of the most important components of geotechnical, geological and rock engineering projects. Two of the most commonly used and fundamental mechanical parameters are the uniaxial compressive strength (UCS) and the static modulus of elasticity (E s). These are important input parameters for rock mass classification systems and rock mass characterization, rock failure criteria, stability analysis of underground and surface structures, and analytical and numerical design approaches such as the design of tunnel excavations (Jahed Armaghani et al. 2015). The method for measuring them has been standardized by both ASTM (1984) and ISRM (1981). This method is destructive, expensive, time-consuming, and requires a large number of well-prepared rock samples. To overcome these problems, many studies have considered the possibility of quick and easy prediction of UCS and E s based on indirect methods or nondestructive tests. The ultrasonic method offers the possibility of obtaining these parameters without changing the internal structure of the sample, and at relatively low operational cost compared to static testing; it is therefore faster, cheaper and simpler than static testing. In nondestructive tests, the parameters density (ρ), primary wave velocity (V p), and shear wave velocity (V s) are comparatively cheap and easy to measure, and these three parameters can be effective in predicting UCS and E s.

To predict UCS and E s, both statistical methods, including simple and multivariate regression analysis (SRA and MRA), and computational intelligence methods, including artificial neural networks (ANN), have been used to develop predictive models. In recent years, these methods have improved, and they now have a wide range of applications in rock mechanics and engineering geology.

For sedimentary rocks, several researchers have reported a series of simple regression relationships between UCS and E s with density (ρ) (e.g., Shalabi et al. 2007; Dincer et al. 2008; Garagon and Can 2010; Yagiz et al. 2012) or primary wave velocity (V p) (e.g., Freyburg 1972; Vutukuri et al. 1978; Kahraman 2001; Horsrud 2001; Yasar and Erdogan 2004; Ghazvinian et al. 2007; Sharma and Singh 2008; Kilic and Teymen 2008; Dincer et al. 2008; Cobanoglu and Celik 2008; Yilmaz and Yuksek 2008; Moradian and Behnia 2009; Yagiz 2009a, b; Diamantis et al. 2011; Minaeian and Ahangari 2013; Khandelwal 2013; Tonizam Mohamad et al. 2014; Azimian and Ajalloeian 2015; Abbaszadeh Shahri et al. 2016; Jahed Armaghani et al. 2016). From the results of simple regression analyses reported in these studies, it was concluded that no single relationship can be considered reliable for all rock types; they should be used with caution and only for specified rock types.

In most problems, and many instances, more than one independent variable is effective, available, and correlated with the dependent variable. In this regard, a number of researchers found a series of relationships between UCS/E s with a combination of ρ and V p, using multivariate regression as shown in Table 1.

Table 1 Empirical relations with combination of ρ and V p to estimate uniaxial compressive strength (UCS) and the static modulus of elasticity (E s) for sedimentary rocks

Static parameters are more acceptable, realistic, and reliable than dynamic parameters, and are applied in geomechanical modeling. However, measurement of static parameters is more difficult than that of dynamic parameters, because static measurements are highly influenced by pores, cracks, pore pressure, and the stress–strain state. Thus, finding a valid correlation between static and dynamic parameters is essential. In this regard, because of the above-mentioned problems with UCS tests, the strength parameters (UCS and E s) are estimated indirectly using dynamic properties determined from density, primary wave velocity, and shear wave velocity. It should be noted that both physical (ρ) and ultrasonic (V p, V s) tests are initial and routine tests that can be done on rock samples in the laboratory. Also, these tests are inexpensive and non-destructive, and there is adequate information about them. As a result, the combination of these parameters can predict the behavior of the strength parameters well. Table 2 presents the background of empirical relations of dynamic Young’s modulus (E d) for prediction of UCS and E s. Dynamic Young’s modulus is calculated from density, primary wave velocity, and shear wave velocity (according to the formulae given). Comparison of Tables 1 and 2 clearly shows that dynamic Young’s modulus can be used to estimate strength parameters better, and with higher accuracy, than the combination of density and primary wave velocity, because the shear wave velocity is also included.

Table 2 Empirical relations of dynamic Young’s modulus (E d) to estimate UCS and E s

Sedimentary rocks cover much of the Earth’s crust, contributing nearly 70% of the total volume of rocks. The study of sedimentary rocks is beneficial for mining, and for civil engineering constructions such as dams, tunnels, roads, houses and other structures. Moreover, limestone is the most abundant sedimentary rock, occurs in large volumes, and is encountered in numerous projects. Of the many Formations (Fm.) that consist of limestone rocks, the Asmari Fm. is the youngest and most important reservoir rock of the Zagros basin, located in the southwest of Iran. This reservoir is the major oil-producing formation in the Middle East. Knowledge and optimal use of this unit affects the economy of the petroleum industry. Further, geomechanical modeling of the oil unit is broadly applied in optimum drilling, production and reservoir compaction. The Asmari Fm. is also the dominant formation at many dam sites in Iran (Karun4, Khersan1 and 3, Seymareh, and Ilam dams). The mechanical and physical properties of this unit have attracted the attention of a large number of geologists, not only in Iran but also in other countries. Hence, many studies and many mining and civil construction projects (such as dams, tunnels, roads, houses or other structures) have focused on the characteristics of this unit. Forat limestone Fm., Kalhor limestone Fm., and Khamir limestone Fm. are other names for this unit. In addition, it contains limestone in various forms (porous limestone to marlstone) and is representative of rocks containing an abundant variety of limestone encountered in various projects. Thus, studies on this unit can be applied to understand limestone rocks with similar geological characteristics all over the world [Iran Water and Power Resources Development Company (IWPCO) 2003, 2007, 2008a, b, 2009].

As a consequence, developing a knowledge of the relationships between rock properties will be advantageous for other constructions in this formation, and can help guide engineers and geologists towards an initial approximation of the mechanical properties of Asmari rock units. In the last few years, a new supervised machine learning kernel-based method, support vector regression (SVR), has become popular in modeling studies, and also in small-sample cases such as prediction and fault diagnosis. The capability and applicability of SVR methods for prediction have been proved previously in fields such as civil, electronic, computer, and mining engineering. Despite its superior performance, this technique has so far not been utilized in rock mechanics and rock engineering, especially for the estimation of strength parameters. Thus, it was applied in this investigation because of its desirable capability and applicability. In brief, the UCS and E s of Asmari limestone rocks were estimated from their dynamic elastic properties using MRA, ANN, and SVR.

Rock properties and testing process

In the present investigation, samples consisting mainly of limestone, porous limestone, limestone marl, marl limestone, and marlstone were obtained from drilled exploratory boreholes. They were collected from the different dam sites (mentioned in the previous section) located in the Asmari Fm. Rock types with no bedding planes were chosen during sampling in order to avoid any anisotropic effects on the measurements. Thus, the samples were isotropic, homogeneous, and without major discontinuities or macroscopic structures. Laboratory tests were carried out on intact rock material, and determination of the strength parameters followed ISRM (1981) recommendations. For testing, a total of 482 samples in the 54–83 mm diameter range, mostly 54 mm (NX-sized), were prepared. The length-to-diameter ratio of the prepared specimens was between 2 and 3. The ends of the specimens were flattened to ensure they were precisely perpendicular to the specimen axis, and their sides were smoothed and polished. Moreover, all laboratory tests were carried out on saturated samples, to more closely resemble actual site conditions and to allow better comparison of the results.

As a non-destructive testing method, ultrasonic techniques have been applied for many years in mining science and geotechnical practice. They are employed in the laboratory for determination of the dynamic properties of rocks, and also in the field for geophysical investigations. ISRM (1981) describes three methods of taking pulse velocity measurements in the laboratory; the direct transmission method, the most sensitive of the three, with transducers on opposite faces of the sample, was used in the present study. In the measurements, a PUNDIT (portable ultrasonic non-destructive digital indicating tester) and two transducers (a transmitter and a receiver) with a frequency of 1 MHz were used (Fig. 1).

Fig. 1
figure 1

Apparatuses of ultrasonic testing for determination of primary wave velocity (V p), and shear wave velocity (V s)

The dynamic Poisson’s ratio (ϑ d) and dynamic Young’s modulus (E d) were calculated from the well-known relationships for isotropic materials involving the primary and shear wave velocities (V p, V s) and density (ρ), as shown below (Eqs. 1 and 2):

$$\vartheta_{\text{d}} = \frac{1}{2}\cdot\frac{\left(V_{\text{p}}/V_{\text{s}}\right)^{2} - 2}{\left(V_{\text{p}}/V_{\text{s}}\right)^{2} - 1}$$
(1)
$$E_{\text{d}} = \rho\,V_{\text{p}}^{2}\cdot\frac{(1 + \vartheta_{\text{d}})(1 - 2\vartheta_{\text{d}})}{1 - \vartheta_{\text{d}}} = \rho\,V_{\text{s}}^{2}\cdot\frac{3V_{\text{p}}^{2} - 4V_{\text{s}}^{2}}{V_{\text{p}}^{2} - V_{\text{s}}^{2}}.$$
(2)
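Equations 1 and 2 can be sketched in code as follows; the sample values at the end are illustrative, not measurements from the paper’s dataset:

```python
import numpy as np

def dynamic_moduli(rho, vp, vs):
    """Dynamic Poisson's ratio (Eq. 1) and dynamic Young's modulus (Eq. 2)
    from density rho [kg/m^3], P-wave velocity vp [m/s] and S-wave
    velocity vs [m/s]. E_d is returned in GPa."""
    r2 = (vp / vs) ** 2
    nu_d = 0.5 * (r2 - 2.0) / (r2 - 1.0)                                # Eq. 1
    e_d = rho * vs**2 * (3.0 * vp**2 - 4.0 * vs**2) / (vp**2 - vs**2)   # Eq. 2, in Pa
    return nu_d, e_d / 1e9

# Illustrative (hypothetical) values for a limestone-like sample:
nu_d, e_d = dynamic_moduli(rho=2600.0, vp=5000.0, vs=2800.0)
```

Either form of Eq. 2 gives the same result; the V s form is used here because it avoids computing ϑ d first.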

Techniques

Multiple regression analyses

Regression analysis techniques are widely applied to create predictive models among relevant parameters. There are two major regression methods in statistics: simple and multivariable analysis. SRA provides a means of summarizing the relationship between two variables, whereas MRA is a statistical technique that uses several explanatory (predictor) variables to predict the outcome of a response variable. This is an extension of simple linear regression, first used by Pearson (1908). The equation representing this model can be written in the following form (Eq. 3):

$$y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{n} x_{n}$$
(3)

where y is the dependent variable, x 1 to x n are independent variables, β 1 to β n are partial regression coefficients for x 1 to x n and β 0 is a constant representing the value of y when all the independent variables are zero. There are many books and papers describing more details of MRA (e.g., Cohen et al. 2002; David Garson 2014).
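A minimal sketch of fitting Eq. 3 by ordinary least squares, using synthetic (hypothetical) data rather than the paper’s measurements:

```python
import numpy as np

# Hypothetical toy data: y generated from two predictors plus noise,
# mimicking the form of Eq. 3 with n = 2.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
y = 3.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 0.1, 50)

# Design matrix with a column of ones for the intercept beta_0.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta holds [beta_0, beta_1, beta_2], close to the generating values.
```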

Artificial neural networks

ANNs have been developed to solve complex problems since the 1940s. Simple interconnected elements, known as nodes or neurons and located in distinct layers, process information in the network. Each layer of the network has one or more neurons, and the lines between neurons display the flow of information from one neuron to the next (Meulenkamp and Grima 1999). The multi-layer perceptron (MLP) is the most widely used type of ANN and has the following three layers: (1) an input layer, which presents the data to the network; it performs no computations and serves only to feed the input data to the hidden layer. (2) Hidden layer(s), which act as a collection of feature detectors; these lie between the input and output layers. There can be any number of hidden layers in an ANN structure (usually one or two are used), and the number of hidden layers and their neurons can be determined by trial and error. (3) An output layer, which produces the response to the given inputs (Atkinson and Tatnall 1997). In addition, an ANN model needs to be trained, and several learning algorithms have been suggested for this. The most proficient learning procedure for MLP neural networks is the feed-forward back-propagation (BP) algorithm. Many books and papers have described details of ANN (see, e.g., Haykin 1999; Ham and Kostanic 2001; Sonmez et al. 2006; Ceryan et al. 2013; Okkan and Serbes 2013).
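A minimal numpy sketch of an MLP with one hidden layer trained by feed-forward back-propagation, in the spirit described above; the network size, learning rate, and toy data are illustrative assumptions, not the paper’s calibrated model:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (100, 2))               # two inputs (e.g. stand-ins for nu_d, E_d)
y = (0.5 * X[:, 0] + 0.3 * X[:, 1])[:, None]   # toy target

W1 = rng.normal(0, 0.5, (2, 6)); b1 = np.zeros(6)   # one hidden layer, 6 neurons
W2 = rng.normal(0, 0.5, (6, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)           # forward pass: weights held fixed
    out = h @ W2 + b2                  # linear output neuron
    err = out - y                      # error between actual and desired output
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * h * (1.0 - h)  # back-propagate through the sigmoid
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2     # backward pass: adjust weights
    W1 -= lr * gW1; b1 -= lr * gb1

pred = sigmoid(X @ W1 + b1) @ W2 + b2
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

Each loop iteration is one epoch: a forward pass over the whole data set followed by a backward pass that adjusts the weights.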

Support vector machine

SVM is a supervised learning model with associated learning algorithms that analyze data and recognize patterns. It is one of the intelligence techniques, originally invented by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963. SVM comprises all the main features that characterize a maximum-margin algorithm. As shown in Fig. 2, the basic idea of SVM is to map the training data from the input space into a higher-dimensional feature space via a transfer function (ϕ), and then to construct a separating hyperplane with maximum margin in the feature space.

Fig. 2
figure 2

Mapping data from input space to the high-dimension feature space

SVM is applied to classification problems [support vector classification (SVC)] and to regression analysis [support vector regression (SVR)]. In both cases there is a motivation to find and optimize the generalization bounds given for regression and classification. In general, both SVC and SVR depend on defining an ε-insensitive loss function that disregards errors within a certain distance of the true value. One of the central ideas of SVC and SVR is representing the solution by means of a small subset of training points, which gives very great computational advantages. SVR uses the same principles as SVC with only a few minor differences; the main idea is the same: minimizing error and finding the hyperplane that maximizes the margin, accepting that part of the error is tolerated.

Support vector regression

SVR carries out linear regression in the high-dimensional feature space using the ε-insensitive loss and, at the same time, tries to minimize ||ω||² with the purpose of reducing model complexity. To measure the deviation of training samples outside the ε-insensitive zone, non-negative slack variables ξ i and ξ * i (i = 1, …, n) are introduced. SVR is formulated as minimization of the resulting functional, as shown in Fig. 3, which demonstrates an example of one-dimensional linear regression with an ε-insensitive band, slack variables and selected data points. SVR model complexity and its generalization performance (prediction accuracy) depend on a good setting of the kernel function and the meta-parameters C and ε. In existing software implementations of SVR, the meta-parameters are user-defined inputs. The selection of a kernel function type and its parameters is based on application-domain knowledge and on the distribution of the input values (x) of the training data. The trade-off between model complexity (flatness) and the degree to which deviations larger than ε are tolerated is determined by the parameter C. If C is too large (infinity), the objective minimizes only the empirical risk, without considering model complexity in the optimization formulation. The parameter ε is used to fit the training data and controls the width of the ε-insensitive zone. The value of ε influences the number of support vectors used for constructing the regression function: the bigger the value of ε, the fewer support vectors are selected. In brief, the C and ε values affect model complexity in different ways (Cortes and Vapnik 1995; Chapelle and Vapnik 1999).

Fig. 3
figure 3

One-dimensional linear regression
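The role of ε described above can be sketched with scikit-learn’s SVR (an assumed implementation choice, not the software used in the paper); widening the insensitive tube leaves fewer support vectors:

```python
import numpy as np
from sklearn.svm import SVR

# One-dimensional toy data: a noisy sine curve (illustrative, not rock data).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 5, 80))[:, None]
y = np.sin(x).ravel() + rng.normal(0, 0.05, 80)

# Same C and RBF (Gaussian) kernel; only the tube width epsilon differs.
wide = SVR(kernel="rbf", C=10.0, epsilon=0.3).fit(x, y)     # wide tube
narrow = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(x, y)  # narrow tube

# A larger epsilon leaves more points inside the insensitive tube,
# so fewer of them become support vectors.
n_wide, n_narrow = len(wide.support_), len(narrow.support_)
```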

Assessment of prediction performance

To check the quality of the models and their validity, various criteria are applied. Once it is determined that at least one of the independent variables is important, a logical question is which one(s). The statistical test for this hypothesis is Student’s t test, which reveals the validity and reliability of the equations through their significance. As this test was performed at the 95% confidence level, the significance (p-value) should be <5% (Montgomery et al. 2006).

In previous studies, relations are accompanied by statistical indices such as the coefficient of determination (R 2) and the root mean square error (RMSE). These indices are calculated to assess the performance of the prediction model. R 2 is the proportion of the variation in y explained by the model, and lies between 0 and 1; when R 2 is close to 1, most of the variability in y is explained by the model. The RMSE, or root mean square deviation (RMSD), is a frequently used measure of the differences between the values predicted by a model and the values actually observed. It is also a good measure of accuracy for comparing the forecasting errors of different models, and is calculated according to Eq. 4.

$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{1}^{N} (y - y^{\prime})^{2} }}{N}}$$
(4)

where y and y′ are the measured and predicted values, respectively, and N is the total number of measured data points (Hyndman Rob 2006). A higher R 2 and a lower RMSE are desired; a model with R 2 = 1 and RMSE = 0 is excellent.
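The two indices can be sketched as follows (Eq. 4 for RMSE, and R² as one minus the ratio of residual to total sum of squares):

```python
import numpy as np

def r2_rmse(y, y_pred):
    """Coefficient of determination and root mean square error (Eq. 4)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    rmse = float(np.sqrt(np.mean((y - y_pred) ** 2)))
    ss_res = np.sum((y - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    return 1.0 - ss_res / ss_tot, rmse

# A perfect model gives R^2 = 1 and RMSE = 0:
r2, rmse = r2_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```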

Data procedure and results

In the present study, the data used pertain to limestone (porous limestone to marl limestone) sampled from five different dam sites located in the Asmari Fm. The input variables ρ, V p, and V s were used to calculate ϑ d and E d according to Eqs. 1 and 2, with the purpose of establishing models for the prediction of UCS and E s. A basic statistical description of the dataset is given in Table 3.

Table 3 Basic statistical description of the dataset used

In this paper, multivariate regression, two neural network models with one [ANN (1)] and two [ANN (2)] hidden layer(s), and SVR structures were implemented in the MATLAB 2014a software environment for the prediction of UCS and E s. First, unconfined compressive strength and modulus of elasticity were correlated to the cited properties (ϑ d and E d) using MRA, ANN and SVR. Then, the potential of ANNs and SVR for prediction was assessed by comparison with the results obtained from the statistical method (MRA).

To begin with, SRA (with one independent variable) and MRA (with two independent variables) were used to predict UCS and E s from the independent variables ϑ d and E d. A comparison of the SRA and MRA models indicates that the coefficient of determination of the equations increased in going from one independent variable to two, while the RMSE values decreased. Consequently, a combination of the dynamic Poisson ratio and dynamic Young’s modulus can estimate the behavior of the strength parameters very well. Figures 4 and 5 illustrate the very good linear correlations with the two mentioned independent variables for predicting UCS and E s using MRA, given as follows:

$${\text{UCS}} = - 7.71 + 92.72\,\vartheta_{\text{d}} + 0.87\,E_{\text{d}} \quad \left( {R^{2} = 0.90,\;{\text{RMSE}} = 2.74} \right)$$
(5)
$$E_{\text{s}} = - 31.23 + 131.5\,\vartheta_{\text{d}} + 0.45\,E_{\text{d}} \quad \left( {R^{2} = 0.91,\;{\text{RMSE}} = 1.53} \right).$$
(6)
Fig. 4
figure 4

Illustration of uniaxial compressive strength (UCS) vs. Poisson ratio and dynamic Young’s modulus (E d)

Fig. 5
figure 5

Illustration of static Young’s modulus (E s) vs. dynamic Poisson ratio and E d

As indirect tests are easy to use, the two suggested equations (Eqs. 5 and 6) are valuable at the preliminary stage of design. It should be noted that the Asmari formation can be regarded as a typical limestone Fm., so the proposed empirical equations may also be applicable to limestone rocks with similar geological characteristics from all over the world.
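As an illustration of how Eqs. 5 and 6 might be applied at the preliminary design stage, a sketch follows; the input values are hypothetical, and the units are assumed to follow the paper’s dataset:

```python
def estimate_strength(nu_d, e_d):
    """Preliminary estimate from the regression equations (Eqs. 5 and 6).
    nu_d and e_d are the dynamic Poisson ratio and dynamic Young's modulus;
    units are assumed to follow the paper's dataset."""
    ucs = -7.71 + 92.72 * nu_d + 0.87 * e_d   # Eq. 5
    e_s = -31.23 + 131.5 * nu_d + 0.45 * e_d  # Eq. 6
    return ucs, e_s

# Hypothetical dynamic properties for a limestone-like sample:
ucs, e_s = estimate_strength(nu_d=0.27, e_d=50.0)
```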

Next, an MLP structure was applied for the ANN models. ANNs are made up of neurons organized in layers, and the number of neurons each layer requires depends on the specific application. Selecting the appropriate number of hidden layer(s) and their neurons is the most important, and also the most controversial, issue in ANN design. Ham and Kostanic (2001) explained that the hidden layers can be determined by trial and error. There can be any number of hidden layer(s) in an ANN structure; however, for practical reasons, only one or two are used. Many studies have stated that one hidden layer can be adequate for most problems and can approximate almost all complicated problems well (Hecht-Nielsen 1987; Hornik et al. 1989; Baheer 2000), as highlighted in investigations conducted by several researchers (Sonmez et al. 2006; Yagiz et al. 2000; Gordan et al. 2015; Jahed Armaghani et al. 2016). In addition, some researchers have pointed out that two hidden layers are necessary for learning functions with discontinuities, but that there is rarely an advantage in using more than two (Rumelhart et al. 1986; Lippmann 1987; Masters 1994). Hence, one and two hidden layers were preferred in this study.

On the other hand, the number of neurons in the hidden layer(s) is the most critical task in developing the ANN structure. Many empirical heuristics have been proposed for determining the number of neurons in one hidden layer, as summarized in Table 4.

Table 4 Heuristics proposed for the number of neurons to be used in a hidden layer in an artificial neural network (ANN). N i number of input neurons, N 0 number of output neurons

Depending on the proposed heuristics, the number of neurons for one hidden layer can vary from 1 to 6 in this study. Thus, a trial and error approach was applied by increasing the number of neurons in the hidden layer at this stage of the analyses to search and find the optimum models. To establish the most effective ANN structure, the number of hidden neurons in the ANN(1) model was selected as 6, with 9 and 8 selected for the ANN (2) model, as shown in Fig. 6.

Fig. 6
figure 6

Artificial neural network (ANN) structures in present study. ANN (1) ANN with 1 hidden layer, ANN (2) ANN with two hidden layers
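The trial-and-error search over the 1–6 neuron range could be sketched as follows with scikit-learn (an assumed tool, not the paper’s MATLAB implementation; the data are synthetic stand-ins for the (ϑ d, E d) measurements):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data for (nu_d, E_d) -> target; not the paper's dataset.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (120, 2))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.02, 120)
X_tr, X_te, y_tr, y_te = X[:90], X[90:], y[:90], y[90:]

# Increase the neuron count in one hidden layer and keep the test RMSE of each.
scores = {}
for n in range(1, 7):
    net = MLPRegressor(hidden_layer_sizes=(n,), max_iter=3000,
                       random_state=0).fit(X_tr, y_tr)
    scores[n] = float(np.sqrt(mean_squared_error(y_te, net.predict(X_te))))
best_n = min(scores, key=scores.get)   # neuron count with the lowest test RMSE
```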

To shuffle the data, the data set was coded and reordered using the RAND and Sort functions, respectively, in Microsoft Office Excel. The data set was then divided into two subsets: training and testing. Almost 75% of the data were used in the training stage and the rest (~25%) in testing. The BP algorithm forms the basis of training supervised neural networks. Inputs are transmitted through the network layer by layer and a set of outputs is obtained; the weights of the network are held fixed during this forward pass. If the errors (differences between actual and desired output) are higher than the desired values, they are passed backwards through the weights of the network, and the weights are adjusted to minimize these errors, using a sigmoid transfer function in the neurons. Each processing of the whole data set through the network (one forward and one backward pass) is called an "epoch". The errors are reduced with each epoch until an acceptable level of error is obtained (Fig. 7). The training procedure continues until the sum of the RMSE falls within an acceptable range or is minimized. Each developed network and SVR model was trained several times (at least 25 times) to select the best model; the most appropriate network structure provided the best training result. Finally, the network was tested with the remaining input and output data to evaluate the model’s capability and performance. It should be noted that all analyses were performed to be statistically significant at the 95% level of confidence. In the SVR models, the calibrated C and ɛ values are 400 and 0.0025, respectively, and the kernel functions used are Gaussian, tangent and periodic.

Fig. 7a,b
figure 7

Training state in ANN (2). a UCS vs. ϑ d and E d. b E s vs. ϑ d and E d
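The shuffle, 75/25 split, and SVR training described above can be sketched as follows; scikit-learn and the synthetic data are assumptions, while C = 400 and ε = 0.0025 follow the paper’s calibrated values:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-ins for the measured (nu_d, E_d) -> UCS pairs,
# following the linear trend of Eq. 5 plus noise.
rng = np.random.default_rng(4)
nu_d = rng.uniform(0.20, 0.35, 200)
e_d = rng.uniform(10.0, 70.0, 200)
ucs = -7.71 + 92.72 * nu_d + 0.87 * e_d + rng.normal(0.0, 1.0, 200)

X = np.column_stack([nu_d, e_d])
# Shuffled 75%/25% split, mirroring the training/testing proportions above.
X_tr, X_te, y_tr, y_te = train_test_split(X, ucs, test_size=0.25, random_state=0)

# Gaussian (RBF) kernel with the paper's calibrated meta-parameters;
# standardizing the inputs is an added assumption, common practice for SVR.
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=400.0, epsilon=0.0025)).fit(X_tr, y_tr)
r2_test = model.score(X_te, y_te)   # coefficient of determination on the test set
```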

Finally, the measured UCS and E s values are plotted in Fig. 8 against those predicted by the MRA, ANN, and SVR models (with two independent variables). The ANN models gave predicted values relatively close to the measured ones, and their coefficients of determination were reasonably good. Comparing the two ANN models demonstrates that the ANN with two hidden layers gave the higher performance. The SVR predictions were found to match the measured values almost perfectly, with only a little deviation at some points. It should be mentioned that, despite approximately equal coefficients of determination for the ANN and SVR models, the RMSE values were higher for the ANN models than for the SVR models, showing that the performance of the SVR model for predicting UCS and E s is better than that of the ANN models.

Fig. 8
figure 8

Scatter plots for the multivariate regression analysis (MRA), ANN, and support vector regression (SVR) models (with two independent variables) developed in this study

The calculated indices for the models with one and two independent variables are summarized in Tables 5 and 6, respectively. A model performs best when its RMSE is smallest and its R 2 largest, so the values of R 2 and RMSE given in Table 6 indicate high prediction performance.

Table 5 Performance indices of the predictive models with one independent variable
Table 6 Performance indices of the predictive models with two independent variables

Predictive analytics comprises a variety of statistical methods that can be grouped into regression techniques and machine learning techniques, a branch of artificial intelligence. Regression techniques focus on establishing a mathematical equation as a model to represent the interactions between the variables under consideration. Since regression equations are easy to use, they can be applied at the preliminary stage of design. Machine learning approaches, by contrast, directly predict the dependent variable, by enabling computers to learn, without focusing on the underlying relationships between variables. In many cases the underlying relationships are very complex and the mathematical form of the dependencies is unknown, meaning that the exact relationship between inputs and output is not known. Moreover, the applicability of machine learning methods and their capacity for high-accuracy prediction are well known and have been proved in many sciences. In general, machine learning techniques are more powerful tools than regression methods for detecting and exploiting complex patterns in data by clustering, classifying, and ranking the data. As a result, the use of machine learning methods is suggested for prediction purposes.

Conclusions

Due to problems with UCS tests, the strength parameters (UCS and E s) were predicted indirectly using non-destructive tests such as the dynamic properties (ϑ d and E d) determined from density, primary and shear wave velocity. For this purpose, data from 482 rock samples pertaining to limestone (porous limestone to marl limestone) from the Asmari formation were used to develop predictive models. To establish relations and analyze them, SVR and two ANN models, along with regression analysis, were used to estimate UCS and E s parameters. In order to check the accuracy of the models developed, R 2 and RMSE were applied. In addition, the applicability, capability and performance of the SVR, ANN and MRA models for prediction purposes were tested and compared.

The main results can be summarized as follows:

  1. 1.

    The comparison of SRA and MRA models indicates that the R 2 and RMSE of equations are increased and decreased, respectively, in going from one independent variable to two. Consequently, the combination of the dynamic Poisson ratio and dynamic Young’s modulus estimated the behavior of the UCS and E s very well (R 2 = 0.91 and 0.90, respectively).

  2. 2.

As the SVR and ANN methods directly predict the dependent variable, without focusing on the underlying relationships between variables, and are capable of high-accuracy prediction, they are more powerful tools than regression methods for detecting and exploiting complex patterns in data by clustering, classifying, and ranking the data. Their use is therefore suggested for prediction purposes. However, the suggested regression equations can still be used at the preliminary stage of design because they are easy to use.

  3. 3.

    In both the training and testing periods, ANN with one hidden layer had a lower performance than ANN with two hidden layers.

  4. 4.

Although both ANN and SVR can predict in quite a short time with a small error rate, the SVR models surpassed the ANN and MRA models (with higher R 2 and lower RMSE). Additionally, the SVR models involve few relevant (support) vectors and a single controlling Gaussian kernel parameter in comparison with the ANN models, so the computational time and the possibility of overfitting (overlearning) are minimized.

A review of SVR studies in other sciences suggests that a comprehensive application of SVM has not yet been made in the field of rock mechanics; additional studies are thus needed to verify this method and its performance in this field.