Introduction

In general, providing a reliable estimate of rock mass characteristics (i.e. strength and deformation) is of primary importance in analysing and designing rock engineering applications such as slopes, foundations and underground excavations. In particular, the elasticity constants of a rock mass (E, \(\nu\)) are considered the two main inputs for analysing rock deformation behaviour. Typically, rock mass deformability parameters such as Young’s modulus are measured directly by field tests as the in situ modulus (denoted by E rm) and indirectly by laboratory tests as the intact modulus (denoted by E). Moreover, since realistic deformation analysis of rock must reflect site conditions, it is important to determine the rock deformation parameters by taking both laboratory and field circumstances into consideration. For this purpose, the intact material modulus derived from laboratory experiments should be correlated with the rock mass modulus using an appropriate classification scheme.

The most widely used methods for determining Young’s modulus include plate loading (equipped with a multi-point extensometer in the rock mass) and flat jack testing for measurements in the field (Hoek and Diederichs 2006) and uniaxial compressive strength (UCS) testing in the laboratory. However, the results of these tests may be susceptible to uncertainty due to the discontinuity and anisotropic behaviour of rock mass subjected to diverse field stresses (Yazdani Bejarbaneh et al. 2015; Armaghani et al. 2015, 2016a). For example, rock samples extracted from the failure zone around a tunnel free face might be exposed to grain-scale damage (micro-cracking) as a result of either stress relaxation or blasting (Martin and Stimpson 1994). On the other hand, both testing methods are time-consuming and require costly equipment, particularly for field tests (Mishra and Basu 2013; Armaghani et al. 2016b).

Several empirical regression models based on field and laboratory test data have been proposed to overcome these limitations. Table 1 lists a series of regressions that exploit correlations between field data and rock mass classification systems, including rock mass rating (RMR; Bieniawski 1973), tunnelling quality index (Q-system; Barton et al. 1974) and geological strength index (GSI; Hoek and Brown 1997), in order to predict deformation modulus values, E rm, for an isotropic rock mass. Most of these correlations show a relatively good fit to the field data, although the exponential relationships and modulus-based equations suggested by Mitri et al. (1994) and Sonmez et al. (2004) perform poorly in predicting rock mass deformation moduli.

Table 1 Empirical relationships for estimating rock mass deformation modulus

Additionally, Table 2 provides a number of typical simple regression equations based on laboratory measurements. These regressions were developed by relating data from simple index tests, including the Schmidt hammer (Yilmaz and Sendir 2002; Dincer et al. 2004), ultrasonic velocity (Yasar and Erdogan 2004; Armaghani et al. 2014), point load strength (Yilmaz and Yuksek 2008, 2009) and porosity (Lashkaripour 2002; Beiki et al. 2013), to Young’s modulus E. However, these statistical models suffer from low generalisability and cannot readily be extended to data from different engineering applications (e.g. Beiki et al. 2013; Rezaei et al. 2012).

Table 2 A number of correlations between Young’s modulus and other rock index tests

Over the last 20 years, there has been a marked increase in the successful application of intelligent methodologies such as artificial neural networks (ANNs), fuzzy inference systems (FIS) and evolutionary computation at the preliminary stage of rock engineering design and rock mechanics modelling (Feng and Hudson 2004, 2010; Hudson and Feng 2007; Mishra and Basu 2013; Mishra et al. 2015). Gokceoglu and Zorlu (2004) validated the results of a FIS model in predicting the E and UCS of greywacke samples. The proposed fuzzy model employed 54 fuzzy rules to map four input variables (rock index properties) to two output variables (E and UCS). The authors compared the performance of the fuzzy model against multiple regression and concluded that the fuzzy predictions agreed more closely with the laboratory test results than those of the statistical model. Kahraman et al. (2009) trained an ANN model to predict the UCS and E of Misis fault breccia, and its predictions were more accurate than those of the regression models. Yagiz et al. (2012) examined the effect of slake durability cycles on the UCS and elasticity constant of carbonate rocks by developing ANN and multivariate regression models based upon 54 carbonate rock cores; they reported that the ANN predictions were more reliable than those of the multivariate regression. An ANN model together with a multiple regression using several intact rock properties of gypsum was developed by Yilmaz and Yuksek (2008) to predict the Young’s modulus of gypsum, with results demonstrating that the proposed ANN model was able to predict the rock modulus with reasonable accuracy. Yilmaz and Yuksek (2009) undertook another study of gypsum rock samples obtained from the Sivas basin in Turkey in order to assess the predictability of two engineering properties of gypsum: UCS and E. For this purpose, they constructed ANN and hybrid neuro-fuzzy systems from the basic and index properties of the samples (input data), including water content, porosity, sonic velocity, Schmidt hammer rebound number and point load index, to predict both E and UCS (output data). They reported successful prediction for both models compared with multiple regression. Tonnizam Mohamad et al. (2015) and Momeni et al. (2015) demonstrated the successful application of hybrid particle swarm optimisation (PSO) and ANN in predicting the UCS of rocks. Several recently developed models for predicting the rock modulus, E, using soft computing techniques are listed in Table 3.

Table 3 Recent works on the prediction of E using soft computing techniques

This study attempts to estimate Young’s modulus from a series of known index properties of sandstone samples. To this end, two soft computing techniques, FIS and ANN, were designed such that three index properties act as inputs to the systems, with Young’s modulus as the target. More specifically, this fitting problem involves matching a set of numeric laboratory measurement inputs, namely the Schmidt hammer rebound number (R n), P-wave velocity (V p) and point load index (I s(50)), to an associated set of numeric targets, E. In the final stage of this study, a multiple regression (MR) model using the same data set is constructed for the sake of comparison. This statistical model is employed to provide a measure of how well the two systems (FIS and ANN) fit the data in terms of their performance indices.

Methods

Fuzzy inference system

Zadeh (1965) established the fundamental mechanism of fuzzy logic theory, on which mapping the input space onto the output space is based. The concept of fuzzy logic is principally founded on the fuzzy set. In such a set, components may hold partial membership, as opposed to the crisp, well-defined boundaries of a classical or ordinary set. In fuzzy logic, the truth of a conditional expression is measured by degrees (between 0 and 1) obtained from an appropriate membership function (MF). In other words, a fuzzy set is characterised by a specific MF which conveys a sense of ambiguity (Zadeh 1973). The MF specifies how each crisp value from the input dimension is mapped to a membership grade (a value within the interval from 0 to 1). Any form of MF can be constructed from simple mathematical relationships in a succinct and straightforward way. In addition, each class of MF is denoted by a specific designation that depends directly on the shape and formula of the function, with triangular, trapezoidal, Gaussian, generalised bell and sigmoidal MFs being the most common types (Jang et al. 1997).
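As an illustration of how an MF maps a crisp value to a membership grade, the short sketch below evaluates a triangular and a Gaussian MF for a hypothetical "medium rebound" fuzzy set; the breakpoints and spread are illustrative values chosen loosely from the R n range reported later, not parameters used in this study.

```python
import numpy as np

def triangular_mf(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gaussian_mf(x, c, sigma):
    """Gaussian MF with centre c and spread sigma."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

# Membership grade of a crisp Schmidt rebound value in a hypothetical "medium" set
rn = 31.0
print(triangular_mf(rn, 20.0, 30.0, 40.5))   # ~0.90
print(gaussian_mf(rn, 30.0, 4.0))            # ~0.97
```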

Fuzzy conditional statements

Fundamentally, in a fuzzy system, the input–output mapping can make some inferences based upon a set of rule statements using parallel evaluation. Overall, an if–then rule structure is divided into two differentiable parts, the "if" part and the "then" part, which are referred to as the antecedent or premise and the consequent or conclusion, respectively (Sivanandam et al. 2007). With regard to the configuration of the fuzzy rule, there are various primary elements in defining the rules which involve input and output variables in conjunction with descriptive adjectives pertaining to those variables. For instance, a statement for the if–then rule is represented as follows:

$$\text{If}\;x\;\text{is}\;A\;\text{ then}\;y\;\text{is}\;B,$$

where x and y are variables defined on their respective universes of discourse, and A and B are linguistic labels or values (fuzzy sets) characterising x and y, respectively.

Fuzzy reasoning (or approximate reasoning)

In general, fuzzy reasoning attempts to specify conclusions by applying various inference operations to a certain number of fuzzy statements (Bai et al. 2007). More specifically, the fuzzy reasoning (or approximate reasoning) mechanism comprises five steps:

  1. Fuzzification of inputs: the crisp input values are interpreted through the corresponding input membership functions.

  2. Application of fuzzy operators: a fuzzy logic operator is applied to the membership values obtained for the antecedent ("if") part of each rule in order to yield a single firing strength per rule.

  3. Implication: the fuzzy set assigned to the output variable of each rule is truncated to the degree given by the firing strength derived in the preceding step.

  4. Aggregation: the truncated fuzzy sets from all rules are combined to obtain an overall consequent membership function.

  5. Defuzzification: the resultant fuzzy set (i.e. the overall consequent membership function) is converted into a single crisp output using a suitable method selected from a list of defuzzification techniques, e.g. centroid, bisector or mean of maximum.
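The following minimal sketch walks through these five steps for a two-rule Mamdani-style system with Gaussian sets and centroid defuzzification; all membership parameters and the two toy rules are hypothetical and serve only to make the procedure concrete (the models developed later in this paper are Sugeno-type).

```python
import numpy as np

def gauss(x, c, s):            # membership grade of x in a Gaussian fuzzy set
    return np.exp(-0.5 * ((x - c) / s) ** 2)

# Universe of discourse for the output (illustrative E range, GPa)
e_axis = np.linspace(10, 33, 231)

# Two toy rules: IF Rn is low AND Vp is low THEN E is low;
#                IF Rn is high AND Vp is high THEN E is high
rn, vp = 28.0, 2.1                                   # crisp inputs (step 1: fuzzification below)
w1 = min(gauss(rn, 24, 4), gauss(vp, 1.9, 0.3))      # step 2: AND operator (min) -> firing strength
w2 = min(gauss(rn, 37, 4), gauss(vp, 2.9, 0.3))

mf_low  = np.minimum(w1, gauss(e_axis, 14, 4))       # step 3: implication by truncation (min)
mf_high = np.minimum(w2, gauss(e_axis, 29, 4))

agg = np.maximum(mf_low, mf_high)                    # step 4: aggregation (max)
e_crisp = np.sum(e_axis * agg) / np.sum(agg)         # step 5: centroid defuzzification
print(round(float(e_crisp), 2))
```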

Fuzzy rule-based models

The combination of fuzzy sets, fuzzy logic operators (fuzzy reasoning) and fuzzy statement constitutes the backbone of the FIS, also referred to as a fuzzy model or rule-based model. FIS applications cover a broad scope of research areas including pattern recognition, data and image classification, management, economics, automatic control, robotics, signal processing, computer vision, decision-making, expert systems and prediction of chaotic time series (Zadeh 1965; Rutkowski 2004).

Overall, the most frequently used FIS models for various applications can be categorised under two headings: the Mamdani fuzzy model (Mamdani and Assilian 1975) and the Sugeno fuzzy model (Takagi and Sugeno 1985). These systems differ based on the type of function being used in the consequent part of their fuzzy statements, as demonstrated in Fig. 1. As a result, the operations employed in the consequent part of fuzzy rules (i.e. aggregation and defuzzification) vary in accordance with this distinction (Jang et al. 1997).

Fig. 1
figure 1

Approximate reasoning procedures for the most commonly used FIS models (Jang et al. 1997)

Sugeno-type fuzzy model

Takagi and Sugeno (1985) developed a systematic approach for generating fuzzy statements, commonly referred to as the Sugeno fuzzy inference method (other designations include the Takagi–Sugeno method or TSK method, hereinafter called the Sugeno method). In the Sugeno fuzzy model, the consequent function is expressed mainly in polynomial form (defined by the input variables) rather than by any of the MFs previously mentioned. For this reason, the fuzzy operations are applied only to the antecedent part of the if–then rule. In other words, unlike in the Mamdani type, the fuzzy reasoning process in the Sugeno model is not applied to both parts of the fuzzy rule statement. Since the outputs inferred from each rule are crisp values, the single overall output for each target variable is a weighted average of the outcomes from all the rules. These two versions of the fuzzy model are graphically illustrated in Fig. 1.

The Sugeno model substitutes the weighted average method for the defuzzification techniques employed in the Mamdani model. In fact, defuzzification of the resulting output MF into a single crisp value suffers from two major drawbacks: it does not yield a mathematically exact result, and the computations are time-consuming (Shams et al. 2015). Therefore, Sugeno-type fuzzy inference is generally the preferred choice for modelling a fuzzy system from a given input–output data set.

Data clustering

Data clustering is a quick, one-pass approach for developing a FIS from any given data set. In most cases, it is advisable to apply a clustering algorithm when building a data-based fuzzy model, particularly when there is little or no prior information about the underlying behaviour of the data being analysed (Jain and Dubes 1988; Jain et al. 1999). Fuzzy systems generated without clustering typically suffer from an excessive number of rules, especially systems with a relatively large number of input variables. In contrast, the rules produced by data clustering closely follow the input data points, and the number of rules is optimised according to the number of identified clusters.

The extraction of fuzzy rules from a data set is usually undertaken by either subtractive or mountain clustering analysis. However, the mountain clustering approach involves high computational expense in comparison with the subtractive method. Therefore, in this research, all fuzzy rules were generated based on subtractive clustering in order to avoid the extra computational cost. Further information on these clustering paradigms can be found in Chiu (1994) and Yager and Filev (1994a, b).
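A compact sketch of the subtractive clustering idea is given below; it follows the potential-subtraction scheme of Chiu (1994) but uses a single simplified acceptance threshold rather than the full accept/reject criteria, so it should be read as an outline rather than a reproduction of the algorithm used here.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, squash=1.5, accept_ratio=0.5):
    """Minimal subtractive clustering (after Chiu 1994) on data scaled to [0, 1]."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)                  # potential of each data point
    centers, first_peak = [], potential.max()
    while True:
        k = int(np.argmax(potential))
        if potential[k] < accept_ratio * first_peak:             # simplified acceptance test
            break
        centers.append(X[k])
        potential -= potential[k] * np.exp(-beta * d2[k])        # subtract revised potential
    return np.array(centers)

# e.g. X = normalised (Rn, Vp, Is50, E) samples; each centre seeds one fuzzy rule
```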

Artificial neural network

In general, an artificial computational system, or ANN, is designed by simulating the organisational principles upon which the functioning of a nervous system is based. Unlike traditional expert systems, an ANN is inherently capable of learning from any given training pattern to find the underlying relationship between input and output data in a mapping problem (Zurada 1992). Artificial neurons are regarded as the constitutive units of an ANN computing system and enable the parallel processing of information in much the same way as a biological brain.

Pioneering work in neural network modelling by McCulloch and Pitts (1943) led to the development of a binary threshold logic unit (binary decision unit) for modelling artificial neuron behaviour. Every artificial node of the network captures a weighted sum of the incoming signals and then passes this sum through a specific activation function to produce a more useful output. Structurally, ANNs can be viewed as highly parallel systems in which a network of interconnected computational units, called neurons or nodes, is organised into successive layers. The pattern of connections between neurons affects network behaviour and also defines the network class (Kanellopoulas and Wilkinson 1997).

As mentioned previously, it is possible to train the network so that network performance can be effectively improved. More precisely, in the course of network training, both the architecture and connection weights are iteratively modified to minimise the error from the output layer node. In fact, the produced output error is computed by a squared error function, given as

$$E = \tfrac{1}{2}\sum\limits_{i = 1}^{p} {(t^{(i)} - y^{(i)} )^{2} },$$
(1)

where the parameters t and y represent the target value and the actual produced value, respectively, and p denotes the number of training patterns.

Network learning tasks are commonly undertaken through a gradient-based learning procedure referred to as the back-propagation (BP) algorithm, especially for multilayer feedforward networks. Basically, each training period in BP learning is a twofold procedure comprising a forward stage and a backward stage. During the forward stage, input signals move forward through the network and an error signal is computed at each output-layer node. Subsequently, in the backward stage, the resulting error is passed backward through the network to modify the network weights and biases.

Depending on the network architecture, ANNs are classified into two functional groups: feedforward and feedback. One of the most commonly used variants of multilayer feedforward networks is the multilayer perceptron (MLP), in which successive layers of processing units (neurons) exchange and process information (signals) through weighted links and activation functions, respectively (Haykin 1999). In general, hidden and output neurons apply specific activation functions to their net input in order to produce neuron outputs, and each neuron output serves as input to the next layer of neurons. Generally speaking, the type of activation function should be selected according to the complexity of the problem to be solved; for nonlinear problems, it is therefore advisable to employ sigmoid transfer functions, e.g. log-sigmoid and tangent sigmoid. Each hidden neuron is fed with a total net input in which each incoming signal (x i ) from the previous layer is multiplied by an associated adaptive weight coefficient (w ij ) to yield a weighted input signal. A summation function is then applied to these weighted signals, and finally a bias term is added to the aggregate signal. This process is repeated for each layer until the overall output of the system is produced. Mathematically, the total net input to every hidden or output neuron is expressed as:

$$\text{net}_{h_j} = \sum_{i = 1}^{n} w_{ij}\, x_{i} + b_{j}$$
(2)

For each neuron output, the resulting total net input is squashed into the activation function (e.g. sigmoid). Thus the output for every hidden or output neuron is derived as:

$$y_{j} = \frac{1}{1 + \exp\left( -\text{net}_{h_j} \right)}$$
(3)

Figure 2 briefly demonstrates data processing operations for a typical artificial neuron.
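For concreteness, the sketch below evaluates Eqs. (2) and (3) for a single hidden neuron fed by the three normalised inputs; the weight and bias values are arbitrary illustrative numbers, not trained parameters from this study.

```python
import numpy as np

def neuron_output(x, w, b):
    """Eqs. (2)-(3): weighted sum plus bias, squashed by a log-sigmoid."""
    net = np.dot(w, x) + b              # Eq. (2)
    return 1.0 / (1.0 + np.exp(-net))   # Eq. (3)

# One hidden neuron fed by the three normalised inputs (Rn, Vp, Is50); weights are illustrative
x = np.array([0.41, 0.33, 0.23])
print(neuron_output(x, w=np.array([0.8, -0.5, 1.2]), b=0.1))
```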

Fig. 2
figure 2

Schematic for an artificial node j (Jang et al. 1997)

Case study and experimental work

The data set employed in the present study relates to a hydroelectric power project which includes the construction of a roller-compacted concrete (RCC) dam, located in the Malaysian state of Sarawak (see Fig. 3). The state of Sarawak benefits from an abundant supply of water, thanks to average annual precipitation of up to 4000 mm. In addition to the high annual rainfall levels, the presence of appropriate geographical and geological conditions in the state provides the foundation for the development of hydroelectric power (HEP) dams. Given these two attributes, Sarawak is ideally positioned as a sustainable source of renewable energy.

Fig. 3
figure 3

The location of the Lawas site

The Lawas RCC dam is designed with output capacity of about 100 MW to meet a portion of the current state demand for electrical power. As part of exploration and subsurface investigations at the proposed site, a total of 20 boreholes were drilled to a depth of 150 m by means of a wash boring machine. This subsurface survey revealed that the substrata profiles situated below the RCC dam foundation consisted primarily of a range of sedimentary rocks with degrees of weathering ranging from fresh to moderately weathered zones. These sedimentary rocks comprise sandstone, shale, and mudstone at the foundation level, which indicates an RMR number of 40.

The data set analysed herein was developed based on core samples of a sandstone layer at depths varying from approximately 13.50 to 81.50 m. A sufficient number of sandstone core specimens were collected from boreholes ZKB1, ZKB2, ZKB3, ZKB4 and ZKB7 using an NX core barrel (54-mm core diameter). The extracted core samples were then packed and transported to the laboratory, where the geotechnical properties of the rock were characterised numerically using several laboratory tests, including Schmidt hammer, point load, P-wave velocity and UCS.

Specimen preparation

Each core sample employed in this study was trimmed by a diamond disc cutter in order to obtain a standard cylindrical shape with a 54-mm diameter, which allows for height/diameter (H/D) ratios within an acceptable range of 2.5–3 such that troublesome size effects are eliminated. After the core samples were cut, a grinding machine was used to grind the end planes of the specimen to provide parallelism and flatness, facilitating the axial loading condition. Some typical core samples prepared for laboratory tests are shown in Fig. 4. In the present study, the drill core preparation and all testing procedures fully conformed to the guidelines of the International Society for Rock Mechanics (ISRM 2007). It is also worth pointing out that all laboratory tests were performed on air-dried core samples.

Fig. 4
figure 4

Some of the cylindrical core samples prior to UCS tests

Point load index test (PLT)

A group of diametral tests was conducted to classify the strength of the core samples investigated in this study. The testing machine is equipped with a loading system of 100 kN capacity, two measuring systems (load and displacement records) and a controller unit. A loading frame, pump, ram and a pair of conical platens constitute the loading system, in which the load is applied incrementally to the core specimen such that a sudden rupture occurs within 10–60 s. During load application, the failure load and the distance between the platen contact points were monitored through a hydraulic pressure gauge and a displacement transducer, respectively, and these records were simultaneously transferred to the controller unit, a data logger, to produce an index of the strength of the sample being tested.

Schmidt/rebound hammer test

An L-type Schmidt hammer with an impact energy of 0.74 Nm was employed to assess the surface hardness of NX core samples. In order to avoid movement and vibration during the test, the core specimens were securely clamped to a semi-cylindrical slot embedded in a steel base. The test was performed by employing a spring-driven steel hammer with vertical downward axis orientation. When a steel plunger rod is pushed against the prepared core surface, an internal spring-controlled mass with a predetermined amount of energy impacts the plunger and rebounds a certain distance. The rebound distance travelled by the mass is measured on a graduated scale as the rebound number. Based on ISRM guidelines (2007), 20 representative points with even spacing equal to at least the diameter of the plunger were determined on the surface of the core sample. Accordingly, an average of 20 valid readings for each specimen was calculated and used.

Ultrasonic velocity test

In this study, a high-frequency ultrasonic pulse technique using transducers with a frequency range of 100 kHz to 2 MHz was adopted to measure the compressional wave velocity (denoted by V p). In order to meet the full coupling condition, the end planes of specimens were uniformly covered with a thin film of a specific gel. In accordance with ISRM (2007), the transducers are first pressed against the core samples with a small stress up to 10 N/cm2 (seating force), and a pulse generator subsequently sends out an input signal of compression waves along the core axis. A direct pulse transmission method was utilised to calculate P-wave velocities by recording the time during which the waves travel from the transmitter to the receiver.

Uniaxial compressive strength test

In the present study, the uniaxial compressive strength (UCS) and deformation attributes of the rock materials were determined under uniaxial compression by means of a servo-controlled 3000 kN compression machine. All tests were conducted under a stress-controlled state in which the compressive load was applied at a constant rate of around 0.5–1 MPa/s. In accordance with ISRM guidelines (2007), the rock samples under compression ruptured within 5–10 min. During the test, measurements of the load cell and axial strains (recorded by linear variable differential transformers [LVDTs]) were taken for the core samples at regular intervals until failure in order to determine various rock material properties, including UCS, strain at failure \(\varepsilon\) and elastic modulus E. Similar to igneous and metamorphic rocks, these medium-grained sedimentary core samples typically display brittle behaviour under uniaxial compression, resulting in sudden failure along distinct fracture planes (see Fig. 5).

Fig. 5
figure 5

Fracture planes in failed rock samples after performing UCS tests

Calculating the modulus of elasticity

In engineering practice, the deformation behaviour of any rock material/mass is commonly described either through tangent elastic modulus (E tan) or secant elastic modulus (E sec). These values are determined using an analysis of the stress–strain relationship for any given rock material/mass which is subject to unconfined compression. In the case of rock material deformation measurement, it is customary to use the tangent modulus (E tan; also called the modulus of elasticity), which represents the slope of a stress–strain curve at one-half the ultimate strength (50% UCS), whereas for specifying rock mass deformation, most engineers prefer to employ the secant modulus (E sec; also known as modulus of deformation), which represents the slope of a straight line from origin (0, 0) to a certain stress–strain point corresponding to either ultimate strength or one-half the ultimate strength. The procedures representing both the E tan and E sec are illustrated in Fig. 6. In the present study, the former procedure was utilised for analysing the stress–strain curve of each sandstone material to produce the relevant tangent elastic modulus (hereinafter referred to as the modulus of elasticity and denoted by E).
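The following sketch shows one way to extract both moduli from a digitised stress–strain record; the synthetic curve and the choice of the data point closest to 50% of the peak stress are illustrative assumptions, not the laboratory procedure itself.

```python
import numpy as np

def tangent_and_secant_modulus(strain, stress):
    """E_tan: slope at ~50 % of peak stress; E_sec: slope of the chord from the origin to that point."""
    ucs = stress.max()
    i = int(np.argmin(np.abs(stress - 0.5 * ucs)))        # point closest to 50 % UCS
    e_tan = np.gradient(stress, strain)[i]                # local slope of the curve
    e_sec = stress[i] / strain[i]                         # slope of the line from (0, 0)
    return e_tan, e_sec

# Illustrative curve: stress in MPa, strain dimensionless -> moduli in MPa (divide by 1e3 for GPa)
strain = np.linspace(1e-4, 4e-3, 40)
stress = 60.0 * (1 - np.exp(-strain / 1.5e-3))
print(tangent_and_secant_modulus(strain, stress))
```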

Fig. 6
figure 6

Two procedures for characterising rock deformation using a stress–strain curve

In this study, a total database of 96 data samples, including R n (in a range of 20–40.5), V p (in a range of 1.67–3.16 km/s), and I s(50) (in a range of 1.43–4.29 MPa) as predictors and E (in a range of 10.5–32.22 GPa) as output, were prepared to construct the predictive models. Figure 7 demonstrates three input and one output variable with their respective data used in the modelling process for all systems under consideration. In addition, basic descriptive statistics of the database are presented in Table 4.

Fig. 7
figure 7

Operating range of three-element input and one-element target

Table 4 Basic descriptive statistics of the database used

Simple regression analysis

In order to examine the effect of the input parameters, simple regression analyses were carried out between E and each of the input parameters R n, V p and I s(50). To obtain equations with higher performance capacity, various equation forms, including linear, exponential, power and logarithmic, were fitted. In this study, the coefficient of determination (R 2), variance accounted for (VAF) and root mean square error (RMSE) were calculated to assess the predictive performance of all developed models:

$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{N} \left( y - y' \right)^{2}}{\sum\nolimits_{i = 1}^{N} \left( y - \tilde{y} \right)^{2}}$$
(4)
$$\text{VAF} = \left[ 1 - \frac{\text{var}\left( y - y' \right)}{\text{var}\left( y \right)} \right] \times 100$$
(5)
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum\nolimits_{i = 1}^{N} \left( y - y' \right)^{2}}$$
(6)

where y and y′ are the measured and predicted values, respectively, \(\tilde{y}\) is the mean of the measured values, and N is the total number of data. A model would be excellent if R 2 = 1, VAF = 100 and RMSE = 0. The selected equations for predicting E using the above-mentioned predictors, together with their performance indices, are presented in Table 5, which shows that the power, linear and logarithmic equation types give the best results for predicting E from R n, V p and I s(50), respectively. The R 2 values obtained for these equations are 0.503, 0.545 and 0.445, respectively. The proposed relationships between E and the relevant rock parameters are given in Fig. 8. The results reveal that these relationships are statistically meaningful, but in order to obtain higher-performance models for predicting E in practice, multi-input parameters may be needed. Therefore, three types of modelling techniques (FIS, ANN and MR) were also constructed and developed.
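A direct implementation of Eqs. (4)–(6), for example:

```python
import numpy as np

def performance_indices(y, y_pred):
    """R^2 (Eq. 4), VAF in percent (Eq. 5) and RMSE (Eq. 6)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    vaf = (1.0 - np.var(y - y_pred) / np.var(y)) * 100.0
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))
    return r2, vaf, rmse

# e.g. y = measured E (GPa) of the test samples, y_pred = the corresponding model predictions
```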

Table 5 Selected equations for estimating E, together with their performance indices
Fig. 8
figure 8

Proposed relationships between E and the input parameters

Multi-input predictive models

Designing the Sugeno fuzzy system

This section presents the fuzzy technique for predicting the E of sandstone from the R n, V p and I s(50) results. For this purpose, as a first stage of modelling, the proposed data set was normalised into a unit interval [0, 1] using the following equation:

$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}},$$
(7)

where X and X norm represent the measured and normalised values, respectively, and X min and X max are the minimum and maximum values of the measured parameter, respectively. The fuzzy systems were modelled and validated by dividing the complete data set into training and testing sets, each determined as a percentage of the original data: 80% was designated for designing the systems and 20% for measuring their accuracy. This split follows the work of several scholars, such as Swingler (1996) and Looney (1996). Therefore, in the present study, 77 samples were randomly chosen to develop the models, and the remaining 19 samples were assigned to test them.
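A minimal sketch of this preprocessing step (Eq. 7 applied column-wise, followed by a random 80/20 split) is shown below; the fixed random seed is an assumption for reproducibility and does not correspond to the actual partition used in the study.

```python
import numpy as np

def minmax_normalise(X):
    """Eq. (7) applied column-wise; returns the scaled data and the (min, max) pairs."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min), (x_min, x_max)

def train_test_split(X, y, train_fraction=0.8, seed=0):
    """Random 80/20 split, mirroring the 77/19 partition of the 96 samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]
```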

In this research, several FIS models of first-order Sugeno class were created based upon the subtractive clustering algorithm as an effective preprocessor to those data-based models. As a result of this preprocessing, a number of clusters were identified to produce the MFs and fuzzy conditional statements for each fuzzy system. Actually, the clustering-induced fuzzy system attempts to form a pattern for physical behaviour of the proposed empirical data set by relating the observations of three-dimensional input space (R n, V p and I s(50)) to their corresponding targets (E). Figure 9 presents a schematic of these components associated with each of the proposed fuzzy systems.

Fig. 9
figure 9

Overall structure of a three-input, one-output fuzzy system

In contrast to the linear MFs, with sudden changes and breaks at the intersection points of straight lines, the MFs representing nonlinear relationships allow for a gradual, smooth movement among fuzzy sets (Jang et al. 1997). Consequently, a bell-shaped function with normal distribution, referred to as Gaussian MF, was defined to characterise each fuzzy set on the premise part of an if–then rule. The following formula expresses a Gaussian MF by its two geometric parameters c and σ:

$$\text{gaussian}\left( x, c, \sigma \right) = \exp\left( -\frac{1}{2}\left( \frac{x - c}{\sigma} \right)^{2} \right),$$
(8)

where the parameters c and σ represent the center and spread coefficient for the Gaussian curve, respectively. As mentioned above, the identified clusters within the proposed data set provide both geometric parameters for each input MF. In addition, each of the output MFs is a first-order polynomial composed of the input variables and expressed by the following equation. The coefficients of this linear relationship are drawn from training data samples based on least squares estimation.

$$E = aR_{\text{n}} + bV_{\text{p}} + cI_{{{\text{s}}(50)}} + d.$$
(9)

The fuzzy operation for each rule in the antecedent part was accomplished with the AND operator, prod (product). The implication method then used the antecedent outcome (firing strength) of each rule to weight the corresponding linear output MF, E i. Finally, a weighted average method was employed to combine the outcomes of all rules into a single value, representing the final output of the proposed system, E, as follows:

$$E = \frac{\sum\nolimits_{i = 1}^{n} w_{i} E_{i}}{\sum\nolimits_{i = 1}^{n} w_{i}},$$
(10)

where n is the number of rules, w i is the firing strength of rule i and E i is its linear output. In total, seven Sugeno-type fuzzy models were constructed from the training patterns using diverse design parameter values. The resulting models were then validated using only the test set. Generally, the first stage in developing any data-based FIS is to adjust the data clustering parameters. The two characteristics associated with the center coordinates and the number of identified clusters can be controlled by altering the cluster radius parameter (indicated by R a ). In most cases, assigning a smaller value to the cluster radius parameter produces a larger number of smaller clusters, and vice versa (Chiu 1994). The optimal range of the cluster radius is usually set between 0.2 and 0.5 (MATLAB user guide 2007).
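Putting Eqs. (8)–(10) together, a first-order Sugeno system of this kind can be evaluated as in the sketch below; the cluster-derived Gaussian parameters and the least-squares consequent coefficients are assumed to be available (e.g. the C and σ values reported for FIS1 later), and the function signature is ours rather than a toolbox API.

```python
import numpy as np

def sugeno_predict(x, centres, sigmas, coeffs):
    """First-order Sugeno FIS with one rule per cluster (Eqs. 8-10).

    x       : (3,) normalised inputs (Rn, Vp, Is50)
    centres : (n_rules, 3) Gaussian centres of the input MFs
    sigmas  : (n_rules, 3) Gaussian spreads
    coeffs  : (n_rules, 4) linear consequent parameters [a, b, c, d] of Eq. (9),
              assumed to have been fitted by least squares on the training data
    """
    mu = np.exp(-0.5 * ((x - centres) / sigmas) ** 2)   # Eq. (8), per rule and per input
    w = np.prod(mu, axis=1)                             # 'prod' AND operator -> firing strengths
    e_i = coeffs[:, :3] @ x + coeffs[:, 3]              # Eq. (9), rule outputs
    return np.sum(w * e_i) / np.sum(w)                  # Eq. (10), weighted average
```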

In addition to cluster radius, another parameter of subtractive clustering, known as cluster neighbourhood (indicated by R b ), can be tuned to govern the range of influence of each cluster as well as the cluster numbers in the input space under consideration. More precisely, increasing the neighbourhood for each cluster allows subtractive clustering to find the centers for those clusters with larger intermediate distances. Accordingly, a number of various cluster radii (R a ) ranging from 0.2 to 0.5, along with a fixed cluster neighbourhood value (usually greater than cluster radius) for all the data dimensions, were used to create the seven fuzzy models. Table 6 presents the design parameters along with the number of fuzzy rules for each of the seven fuzzy models.

Table 6 Design parameters and input MF numbers for each developed FIS model

The performance capacities of the proposed models on the training and then the testing sets were evaluated by means of R 2, VAF and RMSE, as shown in Table 7. However, selecting the best model based only on these performance indices can be difficult, owing to the small differences among the relevant statistics. For this reason, a simple ranking approach was used (Zorlu et al. 2008), in which each of the fuzzy models was graded separately according to its performance on the training and testing sets. As shown by the total rank scores listed in Table 7, the FIS1 model was the most successful of the seven fuzzy models in predicting E values.

Table 7 Training and testing performance indices of each FIS model and their respective rank values
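Our reading of this simple ranking scheme is sketched below: each performance index on each data set is ranked across the candidate models (the best model receives the highest rank) and the ranks are summed per model; the column layout and tie handling are assumptions rather than details taken from Zorlu et al. (2008).

```python
import numpy as np

def total_rank(metrics):
    """Simple ranking (after Zorlu et al. 2008), as interpreted here.

    metrics: dict  model_name -> [R2_train, VAF_train, -RMSE_train, R2_test, VAF_test, -RMSE_test]
    (RMSE is negated so that 'larger is better' holds for every column.)
    Each column is ranked 1..n_models (best model gets n_models); ranks are summed per model.
    """
    names = list(metrics)
    table = np.array([metrics[m] for m in names])        # (n_models, n_indices)
    ranks = table.argsort(axis=0).argsort(axis=0) + 1    # column-wise ranks, 1 = worst
    return dict(zip(names, ranks.sum(axis=1)))
```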

For the FIS1 model, the training data points were grouped into six clusters to construct the fuzzy system. Figure 10 demonstrates this natural grouping for two selected dimensions of the clustered data space, I s(50) against E. As a result of the clustering process, the geometric parameters were produced to form the Gaussian-type input MFs. The Gaussian parameters used in the FIS1 configuration are arranged into a matrix (denoted by C) for the center coordinates and a row vector (denoted by σ) for the spread coefficients, as indicated below:

$$C = \begin{array}{l|cccc}
 & \text{Dim 1} & \text{Dim 2} & \text{Dim 3} & \text{Dim 4} \\ \hline
\text{Cluster 1} & 0.2488 & 0.2289 & 0.2168 & 0.0106 \\
\text{Cluster 2} & 0.4098 & 0.3304 & 0.2273 & 0.3352 \\
\text{Cluster 3} & 0.8585 & 0.7459 & 0.6958 & 0.5801 \\
\text{Cluster 4} & 0.0976 & 0.0580 & 0.1014 & 0.1842 \\
\text{Cluster 5} & 0.4927 & 0.4319 & 0.3986 & 0.7293 \\
\text{Cluster 6} & 1 & 0.8521 & 0.7273 & 0.7698 \\
\end{array}$$
$$\sigma = \begin{array}{l|cccc}
 & \text{Dim 1} & \text{Dim 2} & \text{Dim 3} & \text{Dim 4} \\ \hline
\text{All clusters} & 0.0707 & 0.0709 & 0.0707 & 0.0707 \\
\end{array}$$
Fig. 10
figure 10

Data points and identified cluster centers for two dimensions

Since the input MFs produced are equal in number to the identified clusters, six fuzzy sets with their respective linguistic labels (denoted by the corresponding cluster number) are assigned to each of the input variables. Figures 11, 12 and 13 graphically depict these six antecedent fuzzy sets for the FIS1 model. Likewise, the number of propagated fuzzy rules equals the number of identified clusters; therefore, six fuzzy conditional statements are propagated for the FIS1 model, as summarised in Table 8. Functionally, each rule forms a direct relationship between a cluster in the premise part and a cluster in the conclusion part.

Fig. 11
figure 11

Six MFs derived from the clustering process for the input R n

Fig. 12
figure 12

Six MFs derived from the clustering process for the input V p

Fig. 13
figure 13

Six MFs derived from the clustering process for the input I s(50)

Table 8 Six fuzzy conditional statements for the best rule-based model

Additionally, Fig. 14 shows how the antecedent and consequent MFs interact with each other in the fuzzy system FIS1. In other words, this graphical diagram simulates the system’s behaviour in mapping three-element input to a one-element target. The prediction performance of the FIS1 will be further discussed in a later section.

Fig. 14
figure 14

Schematic representation of the entire fuzzy inference process for the optimal Sugeno fuzzy model, FIS1

Designing the ANN model

Like the FIS models, each of the ANN models utilises the same division of the normalised original data for estimating the elastic modulus of sandstone; that is, 80% of the database is assigned to the training part and the remaining data are devoted to testing of the models. A challenging task in designing any ANN model is adopting an optimal architecture in terms of the number of hidden layer(s) and the number of nodes per hidden layer. In principle, an ANN model with a single hidden layer can solve any complex fitting problem given sufficient nodes in that layer (Cybenko 1989; Hornik et al. 1989). Furthermore, the number of hidden neurons per layer should, in theory, be proportional to the problem complexity: a higher degree of complexity requires additional nodes in the hidden layer(s) so that the true underlying relationships of the modelled data can be captured, although an excessive increase in neuron numbers may cause certain difficulties, including overfitting and longer computation time. Thus, specifying an optimal number of nodes for each hidden layer is crucial (Sonmez et al. 2006). Table 9 lists chronologically several empirical expressions for the number of hidden nodes. Based on the above discussion and the information in Table 9, the initial network structure used in this research is composed of three input nodes, one output node and a single hidden layer with between one and seven neurons.

Table 9 Suggested relationships for estimating the number of hidden layer nodes (Takagi and Sugeno 1985)

Hence, the suggested network for this research will be a two-layer feedforward network consisting of sigmoid hidden nodes and a single linear output node. Based on the resulting neuron range, seven networks with different hidden node numbers are established. The neural network models are thus configured to be trained and then tested to find the optimal number of nodes in the proposed hidden layer. The RMSE is considered as convergence criterion for the training process (Simpson 1990). In addition to RMSE, other statistics are designated to assess the predictive performance of these trained networks, including R 2 and variance accounted for (VAF).

After training the proposed models, testing samples are used to put the trained models to the test and also to validate the model generalisation according to the performance results on the test data set. The task of selecting the best network performance on both training and test sets is accomplished by means of a simple ranking approach (Zorlu et al. 2008), as described earlier. Most scholars involved in the field of ANN put considerable emphasis on the importance of the learning algorithm utilised for training purposes. With regard to the efficiency of these algorithms, several studies have shown that the Levenberg–Marquardt (LM) BP algorithm has a number of distinct advantages over conventional gradient descent approaches (Hagan and Menhaj 1994). Accordingly, all the ANN models developed in this research are trained using the LM BP algorithm. Table 10 summarises the trained and tested ANN models along with their respective performance measures (RMSE, R 2, VAF) and lists rank values for both the training and testing parts of randomly chosen data.
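As a rough sketch of this training-and-scoring loop for one candidate architecture (here the 3-4-1 network), the code below uses scikit-learn with random placeholder data; scikit-learn offers no Levenberg–Marquardt solver, so L-BFGS is used instead, which makes this an approximation of the procedure rather than a reproduction of it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X_train: (77, 3) normalised [Rn, Vp, Is50]; y_train: (77,) normalised E (placeholders here)
rng = np.random.default_rng(0)
X_train, y_train = rng.random((77, 3)), rng.random(77)

# 3-4-1 network: one hidden layer of four sigmoid nodes and a linear output node.
net = MLPRegressor(hidden_layer_sizes=(4,), activation='logistic',
                   solver='lbfgs', max_iter=2000, random_state=0)
net.fit(X_train, y_train)
rmse = np.sqrt(np.mean((net.predict(X_train) - y_train) ** 2))
print(f"training RMSE: {rmse:.3f}")
```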

Table 10 Performance statistics of the ANN models, each of which is graded using a ranking technique

As shown in Table 10, a set of performance measures with a maximum rank value of 36 demonstrates the superiority of model ANN4 over the others. Consequently, the optimal number of hidden nodes will be equal to four, according to the selected ANN model. Figure 15 illustrates a schematic of the optimal structure for the ANN model under consideration. Evaluation of the ANN model will be provided later.

Fig. 15
figure 15

A 3-4-1 back-propagation MLP suggested to estimate E values of sandstone materials

Designing the multiple regression

MR analysis is used to determine the values of the parameters of a function such that the function best fits a given set of data observations. With this technique, the function is linear in the input variables. MR solves engineering problems by performing a least squares fit, which constructs simultaneous equations through the creation of a regression matrix; the coefficients are then obtained by means of a backslash (left matrix division) operator.

Using the established normalised data set, an MR equation was developed to predict E, as shown in Eq. 11. Values of R 2 of 0.588 and 0.715 were obtained for training and testing of the proposed MR, respectively. In these models, R n, V p and I s(50) were considered as inputs, and the E was then estimated as a function of these inputs. The statistical package SPSS 11.5 (SPSS 2007) was used to construct the MR models. The predictive performance of the MR models will be examined in greater detail in the following section.

$$E = 0.182 \times R_{\text{n}} + 0.453 \times V_{\text{p}} + 0.173 \times I_{\text{s}(50)} + 0.061.$$
(11)
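A least-squares fit of this form (the regression-matrix/backslash approach described above) can be sketched as follows; the column order and variable names are ours.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares fit of E = a*Rn + b*Vp + c*Is50 + d (the MATLAB backslash analogue)."""
    A = np.column_stack([X, np.ones(len(X))])       # regression matrix with an intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs                                    # [a, b, c, d], cf. Eq. (11)

# e.g. coeffs = fit_multiple_regression(X_train, y_train) on the normalised training data
```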

Comparing predictive performance

This section presents an evaluation of the capacity of the developed models for predicting the E. Simple regression analysis revealed the need to develop E predictive models with higher accuracy using multi-input parameters. Hence, FIS, ANN and MR models were also proposed for estimating the E of the sandstone samples. In the FIS, ANN and MR modelling procedures, all 96 data sets were randomly divided into two sets for model development and evaluation. As mentioned previously, in this study, R 2, VAF and RMSE were considered and calculated to evaluate the performance of the predictive models.

Table 11 presents the results of the models in predicting E. Based on these results, the performance of the ANN model is superior to that of both the FIS and MR for all performance indices, and the performance of the FIS model is superior to that of the MR model for most of the indices. Judging by the predictive performance of the two superior models, the proposed ANN model estimates the sandstone elastic modulus, E, for both training and testing samples with better accuracy than the fuzzy model. For example, R 2 values of 0.715, 0.670 and 0.818 for the testing data sets of the MR, FIS and ANN models, respectively, indicate that the ANN is the best predictive model for estimating the E of the sandstone samples.

Table 11 Performance indices for the proposed models in predicting the E of rock samples

The high predictive ability of the proposed ANN model is essentially attributable to its use of iterative optimisation in predicting the response data E. In contrast, the poorer predictive performance of the fuzzy model is principally due to its rule-based, one-pass mechanism, which does not employ any iterative optimisation to capture the underlying behaviour of the training data. To improve the predictive capability of the FIS, however, two widely used techniques have been proposed, both of which fine-tune the MF parameters of the fuzzy system (i.e. the premise and consequent MFs) over a training period. In the first, referred to as the adaptive network-based fuzzy inference system (ANFIS), the FIS is recast as an equivalent adaptive network whose parameters are tuned by learning. In the second, a hybrid system is developed by combining the FIS with an optimisation algorithm, such as particle swarm optimisation.

Conclusions

Several laboratory tests, including uniaxial compressive strength, Schmidt hammer, point load strength and P-wave velocity, were conducted on 96 samples of sandstone. These core samples were acquired from sites in the state of Sarawak, Malaysia, and sample preparation and testing were carried out in accordance with ISRM guidelines. As the target of this study, elastic modulus values were obtained from the UCS tests.

Based on simple regression analysis, the relationships between E and the other predictors were found to be acceptable. Nevertheless, in order to obtain models with higher accuracy, MR, FIS and ANN models were also developed. Based on model performance indices and using a simple ranking method, the best FIS and ANN models were chosen from among the group of models constructed, and then, using the same data sets, an MR model was developed to predict the E of the rock. The indices R 2, VAF and RMSE were utilised to check the predictive performance of the models, with results revealing the ANN to be the best predictive model. Based on RMSE, values of 0.167, 0.151 and 0.127 were obtained for the testing data sets of the MR, FIS and ANN models, respectively, demonstrating the higher capacity of the ANN model in estimating the modulus of elasticity of the rock. It should be noted, however, that the predictive models proposed in this study were designed based on the properties of sandstone rock samples; hence, direct implementation of the models must be undertaken with caution and only for similar conditions.