Introduction

Uniaxial compressive strength (UCS) of rock material is a very important parameter for geotechnical engineering applications such as rock mass classification, numerical modelling, bearing capacity, mechanical excavation, slope stability and support design with respect to the engineering behavior of rock (Haramy and De Marco 1985; Ceryan 2014; Wang and Aladejare 2015). UCS can be measured directly or predicted by different methods, including the use of existing tables and diagrams, regression, the Bayesian approach and soft computing methods.

In addition to the intrinsic properties of the rock materials, the sample geometry and loading conditions also directly affect the laboratory test results on the rock sample. For this reason, and also to allow comparison of results, this test is carried out according to the international test standards set by the American Society for Testing and Materials (ASTM) or the International Society for Rock Mechanics (ISRM). However, the direct measurement of UCS is expensive and time-consuming, and the preparation of a standard core specimen from weak, highly fractured and thinly bedded rocks presents a major challenge (Gokceoglu 2002; Gokceoglu and Zorlu 2004).

In order to assess the strength of rock materials easily and quickly in field work, tables and diagrams based on “simple means tests” were developed. Examples of this approach are the methods recommended by the British Standard (BS 5930 1981) and the International Society for Rock Mechanics (ISRM 2007). On the other hand, Pollak and co-workers (Pollak et al. 2017) suggested a method for estimating the UCS of carbonate rock materials without conducting index tests. This method is based on observing four basic elements: lithology, fabric, defects and porosity (LFDP). The LFDP determination method is simple, efficient, inexpensive and versatile. The use of existing tables and diagrams and the LFDP determination method is useful in rock mass classification, but the use of UCS values estimated from tables and diagrams in engineering calculations is not appropriate.

Basic mechanical tests including the Shore scleroscope hardness (Deere and Miller 1966; Koncagul and Santi 1999; Yasar and Erdogan 2004), Schmidt hammer (Shorey et al. 1984; Yagiz 2009; Fattahi 2017; Demirdag et al. 2018; Ghasemi et al. 2018), block punch test (Ulusay and Gokceoglu 1997; Mishra and Basu 2012; Sulukcu and Ulusay 2001), Brazilian test (Nazir et al. 2013), core strangle index (Yilmaz 2009), nail penetration test (Chaudhary 2004; Yamaguchi et al. 2005; Maruto Corporation 2006; Ngan-Tillard et al. 2009), Equotip hardness test (Verwaal and Mulder 1993; Alvarez-Grima and Babuska 1999; Yilmaz 2013), hybrid dynamic hardness (Yilmaz 2013) and edge load strength (Palassi and Mojtaba Pirpanahi 2013) are used with empirical equations to obtain UCS. The indentation test (Cheshomi et al. 2017; Haftani et al. 2013; Mateus et al. 2007; Szwedzicki 1998; Yuen 2010), loading reconstructed cores test (Mazidi et al. 2012), modified point load test (Sheshde and Cheshomi 2015) and single particle loading test (Cheshomi et al. 2012; Cheshom and Ahmadi-Sheshde 2013; Cheshomi et al. 2015; Ashtaria et al. 2019) have also been suggested for indirect determination of UCS. Although these testing methods have serious shortcomings, limitations and problems (Yilmaz 2009; Kayabali and Selcuk 2010; Nefeslioglu 2013), their results are also used as input parameters in predictive models such as statistical and soft computing models (Ceryan and Korkmaz Can 2018).

Most investigations involve determining the individual correlation between an index and the UCS (i.e. a simple regression analysis) (e.g. Rzhevsky and Novick 1971; Koncagul and Santi 1999; Fener et al. 2005; Chang et al. 2006; Mishra and Basu 2013; Fereidooni 2016; Aboutaleb et al. 2018; Heidari et al. 2018; Jamshidi et al. 2018). Certain studies have used more than one index to predict the UCS (i.e. multiple regression analysis) (e.g. Moos et al. 2003; Moradian and Behnia 2009; Ali et al. 2014; Torabi-Kaveh et al. 2014; Ceryan 2014; Heidari et al. 2018; Aboutaleb et al. 2018; Cengiz et al. 2018). There are some difficulties in the implementation and generalization of these statistical models (Sridevi 2000; Fener et al. 2005; Sonmez et al. 2006; Maji and Sitharam 2008; Yuen 2010; Wang and Aladejare 2015; Ng et al. 2014). There is no agreement among the equations obtained from regression analysis for different rock types (Fener et al. 2005; Sonmez et al. 2006). Feng (2015) indicated that most empirical correlations are obtained using regression methods that do not quantify the uncertainties of predictions, and it is not always possible to modify them to incorporate project-specific data. According to Maji and Sitharam (2008), in evolving trend-fitting curves by statistical regression, the data are constrained along a particular two-dimensional geometry of the statistical model used. To overcome these difficulties of the conventional methods, many researchers have employed soft computing methods in estimating the UCS of rock material (Table 1).

Table 1 Some of the soft computing models proposed for predicting UCS reported in the literature

ANNs have been used extensively for modeling in the prediction of UCS (Table 1). However, they may suffer from some disadvantages such as converging at local minima instead of global minima, overfitting if training goes on for too long, and non-reproducibility of results, partly as a result of random initialization of the networks and variation of the stopping criteria during optimization (Sattarib et al. 2014). In the past decade, a new alternative kernel-based technique called the support vector machine (SVM) (Vapnik 1995) has been derived from statistical learning theory. An SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. Using a kernel function, SVMs are alternative training methods for polynomial, radial basis function and multilayer perceptron classifiers in which the weights of the network are found by solving a quadratic programming problem with linear constraints, rather than by solving a non-convex, unconstrained minimization problem as in standard ANN training (Huang et al. 2010). Despite these advantages, the standard SVM has some shortcomings: (i) it employs basis functions superfluously, in that the number of needed support vectors increases with the training data size, and (ii) its control parameters are difficult to determine. Thus, the calibration of the three parameters of SVM can be time-consuming and tedious (Gedik 2018). For this reason, Suykens and Vandewalle (1999) applied some modifications to the traditional SVM algorithm to simplify the process of finding a model by solving a set of linear equations instead of a non-linear (quadratic programming) problem, and named it the least square support vector machine (LS-SVM). LS-SVM retains the advantages of traditional SVM but is computationally faster. The LS-SVM method has been used in the prediction of UCS by some researchers (Table 1).

Recently, the extreme learning machine (ELM) has been proposed for training single hidden layer feedforward neural networks (SLFNs); it randomly chooses hidden nodes and analytically determines the output weights of the SLFN (Huang et al. 2006; Zong et al. 2013). In the ELM method, the only free parameters that need to be learned are the connections (or weights) between the hidden layer and the output layer (Huang et al. 2006; Zausa and Civolani 2001; Huang et al. 2015). This mechanism is different from the conventional learning of SLFNs. ELM is formulated as a linear-in-the-parameters model which boils down to solving a linear system, and it can be applied as the estimator in regression problems or the classifier in classification tasks (Huang et al. 2006; Liu et al. 2015; Huang et al. 2015). Theoretically, this algorithm tends to provide good generalization performance at extremely fast learning speed and has a highly accurate learning solution (Liu et al. 2015). Even with randomly generated hidden nodes, ELM maintains the universal approximation capability of SLFNs (Huang et al. 2015). According to the results of the study performed by Liu et al. (2015), the ELM approach can perform much better than the RBF-neural network (RBF-NN) and the BP-neural network (BP-NN) in modeling rock parameter problems. Also, the ELM performs equivalently to the generalized regression neural network (GRNN) and the SVM in estimation of the UCS of rocks and takes much less time than the GRNN. The authors indicated that it can easily be used in problems in rock mechanics and engineering where uncertainty substantially exists and expert opinions play an important role.

Another method recently used in solving engineering problems is the minimax probability machine (MPM). MPM, introduced in the studies by Lanckriet and co-workers (2002a, b), is a classification algorithm based on prior knowledge and has been successfully applied in classification and regression problems (Yang et al. 2019). The problem of constructing a regression model can be posed as maximizing the minimum probability that future predictions fall within some bound of the true regression function (Strohmann and Grudic 2002). This approach constitutes the main framework of MPM (Strohmann and Grudic 2002). The method has advantages over other machine learning methods (Yang et al. 2019): (i) MPM is a moment-based (or nonparametric) algorithm; it uses the mean and variance of the samples to find a minimax probabilistic decision hyperplane separating the two classes in binary classification; (ii) making no assumption on the data distribution, MPM can directly estimate a probabilistic accuracy bound by minimizing the maximum probability of misclassification error; and (iii) the MPM formulation can be reformulated as a second-order cone program (SOCP).

The main purpose of this study is to examine the applicability and capability of the extreme learning machine (ELM) and minimax probability machine regression (MPMR) approaches for prediction of the UCS of volcanic rocks and to compare their performance with the least square support vector machine (LS-SVM). The samples tested were taken from the rock slopes on the Giresun-Gumushane highway and from the İyidere-Rize quarry (NE Turkey). In addition, the use of porosity and the slake durability index together in estimating the UCS of the weathered volcanic rocks was investigated. The degree of weathering of the volcanic rock material was defined using the Schmidt hammer rebound number (SHV). RMSE, VAF, R2, Adj. R2, PI, the REC curve and the Taylor diagram were used to evaluate the performance of the suggested models.

Materials and testing procedures

The study area is located in the Eastern Pontides of NE Turkey (Fig. 1). The eastern Pontides comprise Late Cretaceous and Middle Eocene to Miocene volcanic and volcanoclastic rocks in the north, whereas in the south, pre-Late Cretaceous rocks are widely exposed. The area is characterized by three magmatic cycles developed during Liassic, Late Cretaceous and Eocene times (Fig. 1). The samples used in this study included Late Cretaceous volcanic rocks and interbedded sedimentary rocks. These volcanics are andesite, dacite and rhyolite in composition.

Fig. 1
figure 1

The geological maps of the study area (Acarlioglu et al. 2013)

In this study, 47 groups of block samples, each measuring approximately 30 × 30 × 30 cm, were collected in the field for petrographic analyses and index and mechanical tests. These analyses and tests were performed at the Rock Mechanics Laboratory in the Engineering Faculty of Karadeniz Technical University, Trabzon, NE Turkey.

The dacitic and andesitic rock samples investigated are from the excavated slopes along the Gumushane-Giresun roadway in NE Turkey (Figs. 1, 2 and 3). Andesite is grayish green and dark green in color, and macroscopically, augite, hornblende, biotite and plagioclase minerals can be identified. The groundmass of these rocks is composed of plagioclase, augite, hornblende, biotite, chlorite and opaque minerals. Calcite, sericite and chlorite are found as alteration products and sometimes as crack fills in these rocks.

Fig. 2
figure 2

The dacitic and andesitic rock exposed along the Gumushane-Giresun road (a, b) and basalt and tuffs in the Iyidere-Ikizdere quarry, Rize NE Turkey (c, d, Usturbelli 2008)

Fig. 3
figure 3

Test core samples with different volcanic rock types investigated

Dacite has a microgranular texture and contains abundant quartz. In the dacite, quartz is found as euhedral to subhedral phenocrysts and as micro- and cryptocrystalline grains in the groundmass, while plagioclase occurs as subhedral phenocrysts and small anhedral crystals in the groundmass. In the plagioclases, sericitization, calcification and argillitization are mostly observed. Biotite is abundant as euhedral and subhedral crystals and generally occurs as small chloritized flakes in the dacites.

The rock samples from the tuffs and basalt investigated were obtained from the Iyidere-Ikizdere quarry, Rize NE Turkey (Figs. 1, 2 and 3). Basalt has a microlitic-porphyritic texture with plagioclase, clinopyroxene and hornblende phenocrysts. Its groundmass has an intergranular texture and contains plagioclase, clinopyroxene, hornblende, Fe-Ti oxide and volcanic glass. The tuffs are lithic-crystal tuffs. The crystal fragments are composed of plagioclase, augite and hornblende as coarse and small grains, biotite opacified and chloritized along the cleavages, and opaque minerals. The rock fragments in the tuffs are composed of andesite.

The Schmidt hammer rebound hardness test on the block samples, the porosity determination, the slake durability test and the uniaxial compressive strength (UCS) test were performed according to the methods suggested by the International Society for Rock Mechanics (ISRM 2007) (Table 2). The rebound hammer used in this study was an N-type Schmidt hammer. From the 60 readings taken at different points on the block surface, the average rebound number was calculated from the 50% highest readings. In this study, the Schmidt hammer rebound number (SHV) obtained on unweathered block samples was used to define the weathering degree of the samples collected from the field (Eq. 1, Table 2)

$$ Wc={R}_f/{R}_w $$
(1)

where Wc is the decomposition index, Rf is the SHV value obtained for unweathered rock samples and Rw is the SHV value obtained for the investigated samples. The weathering degree of the samples investigated was defined according to the classification given in Gokceoglu (1997) (Eq. 1). The Wc value is less than 1.1 in unweathered rock samples, between 1.1 and 1.5 in slightly weathered samples, between 1.5 and 2 in moderately weathered samples and greater than 2 in highly weathered samples.
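The classification above can be sketched as a small helper function. This is an illustrative sketch, not code from the study, and the handling of values exactly on the class boundaries is an assumption:

```python
# Hypothetical helper, for illustration only: weathering degree from the
# decomposition index Wc = Rf / Rw (Eq. 1), using the class limits of
# Gokceoglu (1997) quoted above. Boundary handling is an assumption.
def weathering_degree(r_fresh, r_weathered):
    wc = r_fresh / r_weathered
    if wc < 1.1:
        return "unweathered"
    elif wc <= 1.5:
        return "slightly weathered"
    elif wc <= 2.0:
        return "moderately weathered"
    return "highly weathered"

print(weathering_degree(60, 60))   # Wc = 1.0 -> unweathered
print(weathering_degree(60, 33))   # Wc ~ 1.8 -> moderately weathered
```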

Table 2 Index and strength properties of the samples examined

The slake durability test was performed using the standard method recommended by the International Society for Rock Mechanics (ISRM 1981). Each test used 10 samples and was run for four cycles. The slake durability index corresponding to each cycle was calculated as the percentage ratio of the final to the initial dry weight of the rock retained in the drum after the drying and wetting cycle. The test was performed three times for each block sample.

The core samples were prepared from the rock blocks by core drilling (Fig. 3). They were 50 mm in diameter, and the ends of the specimens were cut parallel and smooth. The porosity and uniaxial compressive strength tests were then performed on these core samples.

The porosity (n) of the rock was estimated using the following equations:

$$ n=1-\frac{\rho_d}{\rho_s} $$
(2)

where ρd is the dry density and ρs is the grain density.
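As a minimal sketch of Eq. 2 (the density values below are hypothetical examples, not measurements from the study):

```python
# Porosity from dry density and grain density (Eq. 2).
def porosity(rho_dry, rho_grain):
    """n = 1 - rho_d / rho_s, returned as a fraction."""
    return 1.0 - rho_dry / rho_grain

n = porosity(rho_dry=2.58, rho_grain=2.70)   # hypothetical sample
print(f"n = {100 * n:.1f}%")                 # -> n = 4.4%
```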

The UCS tests were performed on 10 samples under dry conditions for each group. During the test, the samples were loaded to failure within 10 to 15 min. The stress was calculated as the ratio of the compressive force to the initial cross-sectional area of the sample, and the uniaxial compressive strength as the ratio of the maximum applied force to the cross-sectional area (Table 2).

Modeling techniques and their application

The input parameters and data normalization

The intrinsic properties affecting the UCS of rock material can be divided into two groups: pore characteristics and microstructural variables consisting of mineralogical composition and rock texture (Ceryan 2014). Pore characteristics are known to exert a first-order control on the physical attributes of rocks, such as strength, deformability and hydraulic conductivity (Tugrul 2004; Bubeck et al. 2017; Griffiths et al. 2017). Porosity is one of the most important and most widely used parameters for defining the pore characteristics of rock materials. For this reason, porosity is often used in models developed to estimate UCS (Baud et al. 2014; Bubeck et al. 2017; Ceryan and Korkmaz Can 2018). Studies focused on the impact of porosity on UCS have demonstrated that a negative linear or curvilinear relationship exists between the UCS and porosity of rock materials.

Mineralogical and petro-physical properties, including density, cation packing index and the content of specific minerals such as quartz and clay, are widely used for characterizing microstructural variables and weathering grades of rock materials (e.g. Zorlu et al. 2008; Ceryan 2012; Ceryan 2008; Manouchehrian et al. 2014; Ceryan 2014; Ceryan 2015). The slake durability of a rock is an important property closely related to its mineralogical composition; its resistance to degradation (weakening and disintegration) is measured using a standard drying and wetting cycle (Sharma and Singh 2008). The slake durability test is cheap, easy to carry out and requires very little sample preparation. For this reason, some researchers have investigated the relation between the UCS and the slake durability index (Id) to develop an estimation equation for UCS (e.g. Cargill and Shakoor 1990; Koncagul and Santi 1999; Gokceoglu et al. 2000; Dincer et al. 2008; Yagiz 2011; Kahraman et al. 2016).

In this study, porosity, representing pore characteristics, and the slake durability index, representing mineralogical and petro-physical properties, were used as input parameters for the LS-SVM, MPMR and ELM models.

In the models suggested in this study, the input and output data were normalized to prevent the model from being dominated by variables with large values, as is common practice. The normalization of all data was carried out using Eq. 3:

$$ {z}_i=\frac{x_i-{x}_{\mathrm{min}}}{x_{\mathrm{max}}-{x}_{\mathrm{min}}} $$
(3)

where zi is the scaled value, xi is the original data, xmin and xmax are, respectively, the minimum and maximum values of the original data.
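The scaling of Eq. 3 can be sketched as follows; the sample values are hypothetical:

```python
import numpy as np

# Min-max scaling of Eq. 3, applied to each variable separately so that
# every input/output variable is mapped to [0, 1].
def minmax_scale(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

z = minmax_scale([2.2, 5.0, 8.2, 3.1])   # e.g. porosity values in %
print(z)   # minimum maps to 0.0, maximum to 1.0
```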

Regression analysis

Regression analysis is a statistical tool that can be applied to examine the relationships between variables. In this technique, the relationship between the independent (predictor) variable and the dependent (output) variable is systematically determined in the form of a function (Jahed Armaghani et al. 2016a). The two main regression methods in statistics are simple and multivariable analysis. Simple regression analyses provide a means of summarizing the relationship between two variables (Yagiz et al. 2012). While a simple linear regression (SLR) equation has one basic form, y = b0 + b1x, the equations obtained by simple non-linear regression (SNLR) can take many different forms, including power (y = axb), logarithmic (y = a lnx + b) and exponential (y = aebx) functions, which is why non-linear regression provides the most flexible curve-fitting functionality (Ceryan and Korkmaz Can 2018).

The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The goal of multiple linear regression (MLR) is to model the linear relationship between the explanatory (independent) variables and response (dependent) variable. A linear equation is constructed by adding the results for each term. This constrains the equation to just one basic form such as “Response = constant + parameter * predictor + ... + parameter * predictor” (Ceryan and Korkmaz Can 2018).

In order to estimate the UCS by regression analysis in this study, two simple linear and many simple non-linear relationships were examined. The highest performance was obtained from the exponential (y = aebx) functions. In addition, MLR analysis was performed.
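Fitting the exponential form y = a·ebx can be sketched by linearizing ln(y) = ln(a) + b·x and applying ordinary least squares. The data below are synthetic, not the study's measurements:

```python
import numpy as np

# Fit y = a * exp(b x) via the linearization ln(y) = ln(a) + b x.
def fit_exponential(x, y):
    b, ln_a = np.polyfit(np.asarray(x, float), np.log(np.asarray(y, float)), 1)
    return np.exp(ln_a), b

# synthetic trend: strength decaying exponentially with porosity
x = np.array([2.0, 4.0, 6.0, 8.0])
y = 200.0 * np.exp(-0.12 * x)
a, b = fit_exponential(x, y)
print(round(a, 3), round(b, 3))   # recovers a = 200.0, b = -0.12
```

Note that the linearized fit minimizes error in ln(y), which down-weights large y values slightly compared with a direct non-linear least-squares fit.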

Least square support vector machine

The basic concept of SVM is to transform the signal to a higher dimensional feature space and find the optimal hyperplane in that space that maximizes the margin between the classes (Lee et al. 2008). Suykens and Vandewalle (1999) applied some modifications to the traditional SVM algorithm to simplify the process of finding a model by solving a set of linear equations instead of non-linear equations and named it the least square support vector machine (LS-SVM). LS-SVM retains the advantages of traditional SVM but is computationally faster (Sattarib et al. 2014). The LS-SVM method can be described as follows, based on the studies of Suykens et al. (2002) and Van Gestel et al. (2004).

Consider a given training set of N data points \( {\left\{{x}_k,{y}_k\right\}}_{k=1}^N \) with input data xk ∈ ℝd and output data yk ∈ ℝ, where ℝd is the d-dimensional input space. Given this training set, the minimization of the cost function J of LS-SVM is defined as Eq. 4

$$ J\left(\mathrm{W},e\right)=\frac{1}{2}{\mathrm{W}}^T\mathrm{W}+\gamma \frac{1}{2}\sum \limits_{k=1}^N{e}_k^2 $$
(4)

where \( {e}_k^2 \) is the quadratic loss term and γ is the regularization parameter.

The solution of the optimization problem of LS-SVM is obtained by considering the Lagrangian as Eq. 5

$$ L\left(W,b,e,\alpha \right)=J\left(W,e\right)-\sum \limits_{k=1}^N{\alpha}_k\left\{{W}^T\phi \left({x}_k\right)+b+{e}_k-{y}_k\right\} $$
(5)

where αk are the Lagrange multipliers. The conditions for optimality can be obtained by differentiating with respect to W, b, ek and αk, i.e. (Eq. 6)

$$ \frac{\partial L}{\partial W}=0\to W=\sum \limits_{k=1}^N{\alpha}_k\phi \left({x}_k\right),\kern1em \frac{\partial L}{\partial b}=0\to \sum \limits_{k=1}^N{\alpha}_k=0,\kern1em \frac{\partial L}{\partial {e}_k}=0\to {\alpha}_k=\gamma {e}_k,\kern1em k=1,...,N,\kern1em \frac{\partial L}{\partial {\alpha}_k}=0\to {W}^T\phi \left({x}_k\right)+b+{e}_k-{y}_k=0,\kern1em k=1,...,N $$
(6)

The solution of these expressions can be written as Eq. 7

$$ \left[\begin{array}{cc}0& {\overrightarrow{\;1}}^T\\ {}\overrightarrow{1}\;& \varOmega +{\gamma}^{-1}I\end{array}\right]\left[\begin{array}{l}b\\ {}\alpha \end{array}\right]=\left[\begin{array}{l}0\\ {}y\end{array}\right] $$
(7)

with \( y=\left[{y}_1;....;{y}_N\right],\overrightarrow{1}=\left[1;....;1\right],\alpha =\left[{\alpha}_1;....;{\alpha}_N\right] \) and by applying Mercer’s theorem (Mercer 1909).

The resulting LS-SVM for function estimation can be expressed as Eq. 8

$$ \hat{y}=f(x)=\sum \limits_{k=1}^N{\alpha}_k^{\ast }K\left({x}_k,x\right)+{b}^{\ast } $$
(8)

where \( {\alpha}_k^{\ast } \) and b* are the solutions to Eq. (7), K(xk, xm) = ϕ(xk)Tϕ(xm) for k, m = 1, …, N is the kernel function and b* is the bias term. Any kernel function satisfying Mercer’s theorem can be used (Gedik 2018).

The kernel functions used in LS-SVM modeling studies generally include the linear, spline, polynomial, sigmoid and Gaussian radial basis functions (Samui 2008; Gedik 2018). The Gaussian kernel is used in this analysis and is given as Eq. 9 (Burges 1998):

$$ K\left({x}_k,{x}_l\right)=\exp \left\{-\frac{{\left\Vert {x}_k-{x}_l\right\Vert}^2}{2{\sigma}^2}\right\};\kern1em k,l=1,\dots, N $$
(9)

where σ is the width of the Gaussian kernel.
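A compact numerical sketch of Eqs. 4–9 follows: it assembles the Gaussian kernel matrix, solves the linear system of Eq. 7 for b and α, and predicts with Eq. 8. The data and the values of γ and σ here are illustrative, not the tuned values reported later in this study:

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian kernel of Eq. 9 between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    # Solve the (N+1)x(N+1) system of Eq. 7:
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                      # b, alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    # Eq. 8: y_hat = sum_k alpha_k K(x_k, x) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# tiny synthetic check: learn y = sin(x) on [0, 3]
X = np.linspace(0.0, 3.0, 20)[:, None]
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
y_hat = lssvm_predict(X, b, alpha, X, sigma=0.5)
```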

MPMR

MPMR is an improved version of SVM (Strohmann and Grudic 2002) in which one data set is obtained by shifting all of the regression data by +ε along the dependent variable axis, and the other by shifting all of the regression data by −ε along the dependent variable axis (Deo and Samui 2017). MPMR uses Mercer’s kernel for obtaining non-linear regression models (Gopinath et al. 2018). Let us assume that the unknown regression function f: ℝd → ℝ has the form

$$ y=f(x)+\rho $$
(10)

where x ∈ ℝd is the input vector drawn from a bounded distribution Ω, y ∈ ℝ is the output, and ρ is the noise or fitting error, with zero mean, i.e. E(ρ) = 0, and finite variance Var(ρ) = σ2 (Strohmann and Grudic 2002; Lanckriet et al. 2002a; Gopinath et al. 2018). Let us consider the training set examples (Eq. 11)

$$ \Gamma =\left\{\left(\ {\mathrm{x}}_1,{\mathrm{y}}_1\right)\dots .\left({\mathrm{x}}_{\mathrm{N}},{\mathrm{y}}_{\mathrm{N}}\right)\right\} $$
(11)

where ∀i ∈ {1, 2, …, N}, xi = (xi1, xi2, …, xid) ∈ ℝd and yi ∈ ℝ.

We have two objectives: one is to find the approximation function \( \hat{y}=\hat{f}(x) \), and the second is to find, for an error bound ε > 0, the minimum probability Ω that predictions fall within ε of the true value (Strohmann and Grudic 2002; Lanckriet et al. 2002b; Gopinath et al. 2018) (Eq. 12)

$$ \Omega =\operatorname{inf}\ \Pr \left\{\left|\hat{y}-y\right|\le \varepsilon \right\} $$
(12)

The MPMR formulation for the approximation \( \hat{y} \) is given by Eq. 13

$$ \hat{\mathrm{y}}=\hat{\mathrm{f}}\left(\mathrm{X}\right)=\sum \limits_{i=1}^N{\beta}_iK\left({x}_i,x\right)+b $$
(13)

where K(xi, x) = φ(xi)Tφ(x) is a kernel satisfying Mercer’s conditions, and xi, ∀i ∈ {1, 2, …, N}, are obtained from the learning data Γ. In the above formulation, βi and b ∈ ℝ are the outputs of the MPMR learning algorithm.

One data set is obtained by shifting all of the regression data +ε along the output variable axis. The other data is obtained by shifting all of the regression data −ε along the output variable axis. The classification boundary between these two classes is defined as a regression surface.
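The shifting step described above can be sketched as follows. Only the construction of the two classes is shown; the SOCP solution of the resulting MPM classifier is beyond this illustration, and the data are hypothetical:

```python
import numpy as np

# Form the two classes used by MPMR: the regression data shifted by +eps
# and by -eps along the output axis. The classification boundary between
# these two point sets is the regression surface.
def shifted_classes(X, y, eps):
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    u = np.column_stack([X, y + eps])   # class +1
    v = np.column_stack([X, y - eps])   # class -1
    return u, v

X = np.array([[0.1], [0.4], [0.9]])
y = np.array([1.0, 2.0, 3.0])
u, v = shifted_classes(X, y, eps=0.001)
print(u.shape, v.shape)   # (3, 2) (3, 2)
```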

ELM

A brief methodology of ELM is given in this section. The ELM proposed by Huang et al. (2006) is in essence a least-squares-based learning algorithm for “generalized” SLFNs, which can be applied as an estimator in regression problems. The weights of the hidden layer in the ELM can be initialized randomly; thus, it is only necessary to optimize the weights of the output layer (Liu et al. 2015). The optimization can be carried out by means of the Moore–Penrose generalized inverse (Liu et al. 2015).

In SLFN, the relation between input (x) and output (y) is given below (Eq. 14):

$$ \sum \limits_{i=1}^K{\beta}_i{g}_i\left({w}_i\cdot {x}_j+{b}_i\right)={y}_j,\kern1em j=1,\dots, N $$
(14)

where wi denotes the weight vector connecting the ith hidden neuron and the input neurons, βi represents the weight vector connecting the ith hidden neuron and the output neurons, bi denotes the threshold of the ith hidden neuron, gi represents the activation function, K denotes the number of hidden nodes and N is the number of datasets.

Eq. 14 can be expressed in matrix form as follows (Eq. 15).

$$ H\beta =T $$
(15)

where H = {hij} (i = 1, …, N, j = 1, …, K, with hij = g(wj · xi + bj)) denotes the hidden-layer output matrix, β = [β1, ..., βK] represents the matrix of output weights and T = [y1, y2, ..., yN]T denotes the matrix of targets.

The value of β is determined from the following expression (Eq. 16).

$$ \beta ={H}^{-1}T $$
(16)

where H−1 is the Moore–Penrose generalized inverse of H (Serre 2002). The learning speed of ELM is increased by using Moore–Penrose generalized inverse method.
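A minimal sketch of Eqs. 14–16 follows, under the assumption of a sigmoid activation and synthetic data; the output weights are obtained with NumPy's Moore–Penrose pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden):
    # Random input weights W and biases b (Eq. 14); only beta is learned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y             # Eq. 16 via the pseudoinverse of H
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                          # Eq. 15: H beta = T

# synthetic check: fit a smooth 1-D target with 7 hidden neurons
X = np.linspace(-1.0, 1.0, 40)[:, None]
y = 0.5 * X.ravel() ** 2
W, b, beta = elm_fit(X, y, n_hidden=7)
y_hat = elm_predict(X, W, b, beta)
```

Because the hidden layer is fixed after the random draw, training reduces to one linear least-squares solve, which is the source of ELM's speed advantage noted above.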

Results and the prediction performances

Since the studied samples have different degrees of weathering, the changes of the porosity (n), slake durability index (Id) and uniaxial compressive strength (UCS) with weathering were determined (Fig. 4). These changes can be seen in the box diagrams plotted using the maximum, minimum, median, first quartile and third quartile of the data measured for each weathering degree (Fig. 4).

Fig. 4
figure 4

The change of the porosity (n), slake durability index (Id) and the uniaxial compressive strength (UCS) with weathering

The mean values of n and Id are 5.0% and 92% in slightly weathered samples, while these values are 2.2% and 97% in fresh samples, respectively. The mean values of n and Id are 8.2% and 81% in moderately weathered samples, respectively. The mean UCS value is 194.7 MPa in fresh samples, 136.8 MPa in slightly weathered samples and 84.8 MPa in moderately weathered samples. According to these values, n increases as the weathering grade of the samples increases, while the slake durability index decreases with weathering. As a result, UCS decreases as the weathering degree increases (Table 2, Fig. 4).

To evaluate the accuracies of the models suggested in this study, the root mean square error (RMSE), variance account factor (VAF), determination coefficient (R2), adjusted determination coefficient (Adj. R2) and performance index (PI), given in Eqs. 17–21 (Gokceoglu 2002; Gokceoglu and Zorlu 2004; Yagiz et al. 2012; Ceryan 2014), were computed for each model (Tables 3 and 4).

Table 3 Prediction performance measures of the SNLR and MLR models
Table 4 Prediction performance measures of the LS-SVM, ELM and MPMR models for the training and testing periods

RMSE evaluates the residual between the desired and output data, and VAF represents the proportion of the measured data variance explained by the model. R2 and Adj. R2 evaluate the linear relation between the desired and output data. For an ideal model, in theory, VAF is 100%, RMSE is 0, R2 is 1 and PI is approximately 2 (Ceryan 2014). In practice, VAF, RMSE and R2 can each be used separately to examine model accuracy. Since none of these indices is superior on its own, the performance index (PI) can be used to examine the accuracy of the models (Yagiz et al. 2012; Ceryan 2014). PI was suggested by Yagiz et al. (2012) and later modified by Ceryan (2014); in the modified PI, Adj. R2 is used instead of R2. The RMSE was calculated from the normalized data.

$$ RMSE=\sqrt{\frac{1}{n}{\sum}_{t=1}^n{\left({d}_t-{y}_t\right)}^2} $$
(17)
$$ VAF=1-\left(\mathit{\operatorname{var}}\left({d}_t-{y}_t\right)/\mathit{\operatorname{var}}{d}_t\right) $$
(18)
$$ {R}^2={\left[\frac{\sum \limits_{t=1}^n\left({d}_t-{d}_{mean}\right)\left({y}_t-{y}_{mean}\right)}{\sqrt{\sum \limits_{t=1}^n{\left({d}_t-{d}_{mean}\right)}^2}\ \sqrt{\sum \limits_{t=1}^n{\left({y}_t-{y}_{mean}\right)}^2}}\right]}^2 $$
(19)
$$ {AdjR}^2=1-\frac{\left(n-1\right)}{\left(n-p-1\right)}\left(1-{R}^2\right) $$
(20)
$$ PI= Adj{R}^2+0.01 VAF- RMSE $$
(21)

where n is the number of training or testing samples, p is the model input quantity and dt and yt are the measured and predicted values, respectively.
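As a sketch, the indices of Eqs. 17–21 can be computed as below. VAF is expressed in percent here, which is the convention implied by the 0.01 factor in Eq. 21; p = 2 corresponds to the two inputs of this study (n and Id):

```python
import numpy as np

# Performance measures of Eqs. 17-21 on measured (d) and predicted (y) data.
def performance(d, y, p=2):
    d, y = np.asarray(d, float), np.asarray(y, float)
    n = len(d)
    rmse = np.sqrt(np.mean((d - y) ** 2))                 # Eq. 17
    vaf = (1.0 - np.var(d - y) / np.var(d)) * 100.0       # Eq. 18, in percent
    r2 = np.corrcoef(d, y)[0, 1] ** 2                     # Eq. 19
    adj_r2 = 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)     # Eq. 20
    pi = adj_r2 + 0.01 * vaf - rmse                       # Eq. 21
    return {"RMSE": rmse, "VAF": vaf, "R2": r2, "AdjR2": adj_r2, "PI": pi}

# a perfect prediction reaches the theoretical optimum PI of about 2
m = performance([0.1, 0.4, 0.6, 0.9], [0.1, 0.4, 0.6, 0.9])
print(m["PI"])
```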

The performance of the SNLR model with the slake durability index is higher than that of the other SNLR model. The performance index and VAF values of the SNLR model with n are 1.302 and 66.6, respectively, so the SNLR model with n cannot be considered successful (Table 3). Considering the values of the performance indices except RMSE, the performance of the MLR model is higher than that of the SNLR model with the slake durability index: the performance index of the SNLR model with Id is 1.532, while the PI value of the MLR model is 1.560. These regression models are not successful in predicting the maximum and minimum values.

Taking into consideration the RMSE, R2, VAF and the maximum and minimum values, the LS-SVM model, which has the worst performance among the soft computing methods in this study, still performed slightly better than the MLR model, which has the best performance among the regression models (Tables 3 and 4). The performance index of the MLR model is 1.560, while the PI value of the LS-SVM model is 1.573 (Tables 3, 4 and Fig. 6).

In this study, the shuffled complex evolution algorithm (Wang et al. 2009) and the grid-search method (Samui and Dixon 2012) were used to determine the optimum range for the LS-SVM parameters. For the developed LS-SVM, the design parameters γ and σ are 0.7 and 50, respectively.
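A minimal sketch of how an RBF-kernel LS-SVM can be fitted and its γ and σ selected by grid search is given below. This is an illustration under stated assumptions, not the authors' implementation: the shuffled complex evolution step is omitted, and the function names and hold-out scheme are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian (RBF) kernel matrix between row-sample arrays A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    # Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, support values alpha

def lssvm_predict(X_train, X_new, b, alpha, sigma):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def grid_search(X_tr, y_tr, X_val, y_val, gammas, sigmas):
    # Pick the (gamma, sigma) pair minimising validation RMSE on a coarse grid
    best = (None, None, np.inf)
    for g in gammas:
        for s in sigmas:
            b, a = lssvm_fit(X_tr, y_tr, g, s)
            e = np.sqrt(np.mean((lssvm_predict(X_tr, X_val, b, a, s) - y_val) ** 2))
            if e < best[2]:
                best = (g, s, e)
    return best
```

In practice the grid only brackets a promising region; a finer search (or, as here, a global optimizer such as shuffled complex evolution) refines the final values.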

In the ELM application, a three-layered FFNN was constructed for modeling UCS. The optimal number of neurons in the hidden layer was determined by a trial-and-error approach, varying the number of neurons from 2 to 10. The developed ELM gives its best performance with seven neurons in the hidden layer.
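The trial-and-error selection of the hidden-layer size can be sketched as follows. This is a simplified single-output ELM with tanh activation and random input weights; the seed, the selection criterion and the function names are illustrative assumptions, not the authors' setup.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    # ELM: random input weights/biases, least-squares solve for output weights
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

def select_hidden_neurons(X_tr, y_tr, X_te, y_te, seed=0):
    # Trial-and-error search over 2..10 hidden neurons, as described above
    rng = np.random.default_rng(seed)
    best_n, best_err = None, np.inf
    for n in range(2, 11):
        W, b, beta = elm_fit(X_tr, y_tr, n, rng)
        err = np.sqrt(np.mean((elm_predict(X_te, W, b, beta) - y_te) ** 2))
        if err < best_err:
            best_n, best_err = n, err
    return best_n, best_err
```

Because the input weights are random, each hidden-layer size would normally be evaluated over several random restarts before the final choice is made.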

In the MPMR modeling, a trial-and-error approach was used to determine the design parameters, namely the error-insensitive zone (ε) and σ. The design values of ε and σ are 0.001 and 1.9, respectively.

The R2 values calculated for the MPMR, ELM and LS-SVM models are 0.9597, 0.9587 and 0.9099 during training and 0.9114, 0.8997 and 0.9062 during testing, respectively (Table 4). The Adj. R2 values calculated for the MPMR and ELM models are 0.9570 and 0.9560 during training and 0.8953 and 0.8814 during testing, respectively; for the LS-SVM model, these values are 0.9039 and 0.8892, respectively. The RMSE values calculated for the MPMR, ELM and LS-SVM models are 0.0590, 0.0597 and 0.0954 during training and 0.0905, 0.0934 and 0.1497 during testing, respectively (Table 4). The best R2 and RMSE values in both the training and testing periods were obtained for the MPMR model, while the worst values in both periods were obtained for the LS-SVM model (Table 4). The VAF values calculated for the MPMR, ELM and LS-SVM models are 0.981, 0.985 and 0.781 during training and 0.853, 0.848 and 0.834 during testing, respectively (Table 4).

In the MPMR and ELM models, the difference in VAF, R2 and Adj. R2 values between the training and testing periods ranges from 5 to 13%, while the difference in PI is 13%. In the LS-SVM model, the difference in VAF, R2, Adj. R2 and PI values between the training and testing periods ranges from 0.5 to 6%, which is very low.

In the training period, the best convergence to the measured maximum and minimum values was obtained for the ELM model, while in the testing period it was obtained for the LS-SVM model (Table 4).

According to the results of this study, the MPMR, ELM and LS-SVM methods are useful tools for modeling the sample UCS and performed well. The PI values obtained for the MPMR, ELM and LS-SVM models are 1.879, 1.881 and 1.590 during training and 1.658, 1.636 and 1.573 during testing, respectively (Table 4). Considering the PI value, the MPMR performed slightly better than the ELM model. Further, the difference in training and testing performance between the ELM and LS-SVM models is meaningful.

In order to check the validity of the prediction models suggested in this study, the predicted values are plotted against the measured values in Fig. 5.

Fig. 5

The predictions of LS-SVM, ELM and MPRM models for training and testing periods

The error in a predicted value is represented by the distance of the data point from the 1:1 diagonal line (Fig. 6). It can be seen that, unlike for the LS-SVM model, the values predicted by the ELM and MPMR models lie almost on the diagonal line (Fig. 6).

Fig. 6

The scatter plots of LS-SVM, ELM and MPRM model developed in this study during the training and testing periods

The regression error characteristic (REC) curve (Bi and Bennett 2003) plots the error tolerance against the percentage of points predicted within that tolerance: the x-axis represents the error tolerance and the y-axis the accuracy of a regression function (Fig. 7). The area over the REC curve (AOC) provides an approximation of the expected error; the smaller the AOC, the better the performance of the model. REC curves thus allow an easy and reliable visual estimate of model performance (Fig. 7). Figures 7 and 8 show the REC curves and a bar chart of the AOC values of the different models. The AOC value of the LS-SVM model is higher than those of the developed MPMR and ELM models. Hence, the performance of MPMR and ELM is better than that of the developed LS-SVM model, and the performance of MPMR and ELM is almost the same.
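The REC curve and its AOC can be sketched as follows; this is a minimal illustration (function names and the tolerance grid are assumptions), with the AOC approximated by trapezoidal integration of 1 − accuracy over the tolerance range.

```python
import numpy as np

def rec_curve(d, y, tolerances):
    # Accuracy = fraction of samples whose absolute error is within tolerance
    errors = np.abs(np.asarray(d, float) - np.asarray(y, float))
    return np.array([np.mean(errors <= t) for t in tolerances])

def area_over_curve(d, y, max_tol, steps=200):
    # AOC: trapezoidal integral of (1 - accuracy) over [0, max_tol]
    tol = np.linspace(0.0, max_tol, steps)
    miss = 1.0 - rec_curve(d, y, tol)
    return float(np.sum((miss[1:] + miss[:-1]) / 2.0 * np.diff(tol)))
```

A perfect model has accuracy 1 at every tolerance, so its AOC is 0; comparing models reduces to comparing these areas, as in Fig. 8.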

Fig. 7
figure 7

REC curves of the developed models

Fig. 8

Bar chart of AOC values of the developed models

Taylor diagrams (Taylor 2001) are a simple graphical representation of how well predicted values correspond with observed values and allow the performance of various prediction models to be compared. A Taylor diagram depicts a statistical comparison of models in a two-dimensional graph by plotting the standard deviation, the correlation coefficient and the centered root mean square (RMS) difference. The standard deviation is denoted by the radial distance from the origin, the RMS difference is proportional to the distance between the observed and simulated fields (assessed in the same units as the standard deviation), and the correlation coefficient is represented by the azimuthal angle (Fig. 9).
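The three statistics plotted on a Taylor diagram can be computed as sketched below (an illustrative helper, not a plotting routine). The centered RMS difference, the two standard deviations and the correlation coefficient are linked by the law-of-cosines relation that underlies the diagram's geometry.

```python
import numpy as np

def taylor_stats(d, y):
    # Standard deviations, correlation coefficient and centered RMS difference
    d, y = np.asarray(d, float), np.asarray(y, float)
    sd_d, sd_y = d.std(), y.std()
    r = np.corrcoef(d, y)[0, 1]
    # Centered RMS difference: means removed before differencing; it satisfies
    # crms^2 = sd_d^2 + sd_y^2 - 2*sd_d*sd_y*r (law of cosines)
    crms = np.sqrt(np.mean(((y - y.mean()) - (d - d.mean())) ** 2))
    return sd_d, sd_y, r, crms
```

Because of this identity, each model can be placed as a single point on the diagram, and its distance from the observation point directly encodes the centered RMS difference.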

Fig. 9

Taylor diagram of the models developed in this study

Figure 9 illustrates the Taylor diagram of the developed MPMR, ELM and LSSVM models. It is clear from Fig. 9 that the developed MPMR and ELM models produce better performance than the LSSVM model.

Discussion and conclusions

This study examines the applicability and capability of the extreme learning machine (ELM) and minimax probability machine regression (MPMR) approaches for predicting the UCS of volcanic rocks with different weathering degrees. The results of the developed ELM and MPMR models were compared with those of a least squares support vector machine (LS-SVM) model. In these models, porosity and the slake durability index were used as input parameters.

According to the results of this study, the LS-SVM model, which has the worst performance among the soft computing methods considered, still performed slightly better than the MLR model, which has the best performance among the regression models. In addition, the statistical models are subject to the difficulties in implementation and generalization noted in the introduction.

Considering the PI, the REC curves and the Taylor diagram, the performance of MPMR and ELM is better than that of the developed LS-SVM model. The prediction performance of the MPMR and ELM models is excellent, while the performance of the LS-SVM model is good. Hence, the relation between the inputs and the output has been captured successfully by the developed soft computing models. The performance of MPMR and ELM is almost the same.

The developed MPMR tries to keep the predicted output within a bound, so the model controls future predictions of UCS; ELM and LS-SVM have no such control. MPMR uses two tuning parameters (the error-insensitive zone and the width of the radial basis function), and LS-SVM also uses two tuning parameters (the error-insensitive zone and the regularization parameter). ELM uses four tuning parameters (the activation function, the number of hidden neurons, the number of training data and the size of the block data). ELM is a modified version of the ANN, so it is constructed on the empirical risk minimization principle, whereas MPMR and LS-SVM are constructed on the structural risk minimization principle. The concept of probability is used in developing the MPMR model; ELM and LS-SVM are not probabilistic models.

In general, there is no discernible difference between the Id values of fresh rocks and those of slightly weathered rocks (Ceryan et al. 2008; Wyering et al. 2014; Ceryan 2015; Undul and Tugrul 2016; Udagedara et al. 2017). On the other hand, there is a significant difference between the Id value of moderately weathered rock and that of fresh rock, and this difference is much greater in highly decomposed rocks. The same holds for n. Similarly, UCS decreases with increasing degree of weathering, especially beyond the moderately weathered degree (Wyering et al. 2014; Ceryan 2015; Undul and Tugrul 2016; Udagedara et al. 2017). These observations also apply to this study. Therefore, the soft computing models given in this study are suitable for samples from magmatic and metamorphic rocks with at least three different degrees of weathering, excluding the completely weathered degree. The models are not recommended for sample sets that do not cover different degrees of weathering. Users can apply the developed models as quick tools for predicting the UCS of magmatic and metamorphic rocks with different weathering degrees, and the models can be tried on other problems in weathered rocks.