1 Introduction

The uniaxial compressive strength (UCS) of rocks is a key parameter for designing surface and underground rock structures (Bieniawski 1974). Direct determination of this parameter in the laboratory is carried out according to the standards of the American Society for Testing and Materials (ASTM) and the International Society for Rock Mechanics (ISRM). This requires prepared rock samples, which is expensive and time consuming. Furthermore, for some weak rocks, sampling is almost impossible. To overcome the problems of direct laboratory determination of UCS, indirect methods have been developed (e.g. empirical models and artificial intelligence (AI) based models). In the empirical models, predictive functions are usually derived from simple tests such as the Schmidt rebound number, point load tests, impact strength, and sound velocity using traditional regression analysis (Fener et al. 2005; Singh et al. 2001). Many researchers have employed these models in past investigations. However, the accuracy of such models is rather low, which may be attributed to their linearity assumptions (Singh et al. 1983; Haramy and DeMarco 1985; O’Rourke 1989; Garret 1994; Huang and Wanstedt 1998). In view of these shortcomings of the empirical methods, artificial neural network (ANN) models, a branch of artificial intelligence, may be used for predicting UCS. This method has been used in rock mechanics and mining sciences, and particularly for predicting UCS, by many investigators (Nie and Zhang 1994; Huang 1999; Cevik et al. 2011; Atici 2011).

In this study, a new ANN model was developed to predict UCS. The distinguishing feature of this so-called neuro-genetic model is that the network parameters (number of neurons in the hidden layers, learning rate, and momentum) are optimized with the help of a genetic algorithm (GA). It should be mentioned that this is the first application of this kind for predicting UCS.

2 UCS of Rocks: Measurement and Prediction Methods

UCS is regarded as the highest stress that a cylindrical rock specimen can carry when load is applied axially to its ends. The UCS test allows comparisons to be made between rocks and affords some indication of rock behavior under more complex stress systems (Bell 2005). As previously mentioned, the ASTM and ISRM specifications are direct methods for measuring this parameter in the laboratory. Measurements of UCS can be time-consuming and expensive and require carefully prepared rock samples. Therefore, several different ways to predict UCS, including the point load test, Schmidt hammer test, Shore hardness test, porosity, sonic velocity, etc. have recently been employed by various researchers. There is a large number of published empirical relationships between the point load index and UCS. Broch and Franklin (1972) reported that UCS is about 24 times the point load index. Bieniawski (1975) proposed this coefficient to be approximately 23. ISRM (1985) suggested a value of 20–25. In a laboratory study, Kahraman (2001) presented a comprehensive list of such relationships. Yilmaz (2009) applied the core strangle test (CST) instead of the point load index to estimate UCS for different types of rocks. This research indicated that the CST is more efficient than the point load index test for estimation of UCS. In addition, Kayabali and Selcuk (2010) proposed a new and practical index test, the nail penetration test (NPT), to estimate the UCS of intact rock as an alternative to the point load test (PLT). Further studies were carried out by Russell and Wood (2009) and Basu and Kamr (2010).
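The conversion factors quoted above can be compared with a short calculation. The sketch below is illustrative only: the function name and the sample point load index of 4 MPa are assumptions made here and are not taken from the cited studies.

```python
def ucs_from_point_load(point_load_index_mpa, factor):
    """Estimate UCS (MPa) as a constant factor times the point load index (MPa)."""
    return factor * point_load_index_mpa

# Conversion factors quoted in the text
factors = {
    "Broch and Franklin (1972)": 24.0,
    "Bieniawski (1975)": 23.0,
    "ISRM (1985), lower bound": 20.0,
    "ISRM (1985), upper bound": 25.0,
}

point_load_index = 4.0  # hypothetical value in MPa, for illustration only
estimates = {
    source: ucs_from_point_load(point_load_index, k) for source, k in factors.items()
}
for source, ucs in estimates.items():
    print(f"{source}: UCS of about {ucs:.0f} MPa")
```

For this hypothetical index, the quoted factors span estimates from 80 to 100 MPa, which gives a feel for the spread between the published relationships.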

The Shore hardness test has also been reported for evaluating and comparing the hardness of rocks. These reports showed that the relations between Shore hardness and UCS are weaker than those obtained from the Schmidt hammer (Yasar and Erdogan 2004; Altindag and Guney 2005). Another method applied for UCS estimation is the block punch index (BPI) test. However, this test is only performed on very thin specimens, which is considered a deficiency of this method (Ulusay et al. 2001).

Many researchers have used the ultrasonic velocity index to predict rock strength by measuring ultrasonic velocities in directions parallel and perpendicular to the weakness planes of anisotropic rocks (Vasconcelos et al. 2007; Sharma and Singh 2008; Vasconcelos et al. 2008; Khaksar et al. 2009; Moradian and Behnia 2009; Vishnu et al. 2010; Kelessidis 2011; Rigopoulos et al. 2011). Shalabi et al. (2007) showed that the relationships from this test were weaker than those obtained by the Schmidt hammer and Shore hardness tests.

Some researchers have studied the effect of petrographic characteristics (e.g. grain size, grain shape, type and amount of cement, and packing density) on compressive strength of concrete and rocks (Ulusay et al. 1994; Hale and Shakoor, 2003). Meddah et al. (2010) revealed that compressive strength of concrete increases with maximum size of coarse aggregate. Zhang et al. (2011) studied the scale effect on intact rock strength using particle flow modeling.

According to Demirdag et al. (2010), physical properties of rock materials such as porosity, unit volume weight, and Schmidt hardness have significant effects on the dynamic mechanical behavior of rock samples. For both quasi-static and dynamic loading conditions, the compressive strength of the rock samples increased with an increase in their unit volume weight and Schmidt hardness values and with a decrease in their porosity.

Moreover, Bell (1978) concluded that the strength of sandstone increases as packing density increases. Dobereiner and De Freitas (1986) also confirmed that a low packing density generally characterizes weak sandstones. Porosity has an important effect on the mechanical properties of rocks. Research by Dube and Singh (1972) showed that strength properties decrease as porosity increases.

The Schmidt rebound number (SRN) has been widely employed for prediction of UCS. The Schmidt hammer is a portable test tool which imparts a known amount of energy to the rock through a spring-loaded plunger. Two types of hammer are used to measure SRN, the L-type and the N-type. Studies have confirmed that the N-type should be used for rocks with UCS > 20 MPa (Sheorey et al. 1984). However, previous studies show that both types have been employed to predict the strength of various rock types (Katz et al. 2000; Kahraman et al. 2002; Aydin and Basu 2005; Porto and Hurlimann 2009).

Moh’d (2009) pointed out that the compressive strength of the studied samples has positive relationships with density and sonic velocity and inverse relationships with permeability, modified saturation, and total and other porosity types. According to Moh’d (2009), dry density, as the easiest parameter to measure in the laboratory or field, can be used for predicting the compressive strength of rocks.

Among the methods mentioned for measuring and predicting UCS, the Schmidt rebound number, porosity, and density are obtained from convenient and inexpensive tests and can therefore be preferred for estimating the UCS of various rock types. To this end, a neural network optimized by a genetic algorithm is employed in this study, using these parameters as inputs. The majority of the models used for prediction of UCS in the literature have been developed from simple or multivariate, linear or nonlinear regression analysis using a limited number of data and parameters. If newly available data differ from the original data, the form of the resulting equation needs to be updated. In contrast, a trained ANN can conveniently be re-trained to adapt to new data (Lee 2003). Atici (2011) used a backpropagation neural network with the Levenberg–Marquardt algorithm (with blast-furnace slag, mixed age, rebound number, and ultrasonic pulse velocity as input parameters and UCS as the output parameter) to predict the strength of mineral admixture concrete. The results showed that the ANN model was more accurate than multivariable regression analysis. Selver et al. (2008) predicted the UCS of brecciated rock specimens using neural networks and different learning models. Also, Cevik et al. (2011) used neural network modeling to predict the UCS of some clay-bearing rocks. These studies showed the superiority of ANN over traditional prediction models. It is worth mentioning that the architectural parameters of all these networks (number of neurons in hidden layers, learning rate, and momentum coefficient) were obtained by trial and error, whereas in this paper these parameters are determined by a genetic algorithm optimization process.

3 Neuro-Genetic Hybridization

3.1 ANN

An artificial neural network can be defined as an information-processing system that is analogous to biological neural networks. This type of network was first introduced by McCulloch and Pitts (1988). The ANN structure is fundamentally composed of several fully interconnected layers: an input layer, an output layer, and one or more hidden layers. The number of hidden layers is determined on the basis of problem complexity. To increase prediction capability, it is normally recommended to use two hidden layers for more complex problems. Each layer contains a number of simple information processing units called neurons. The number of input and output neurons is simply equal to the number of input and output problem variables. However, the number of neurons in the hidden layer(s) depends on the unknown interrelationship among the input and output variables. The neurons in each layer are connected to the neurons of the subsequent layer through weighted connections, so that each connection weight multiplies the signal transmitted from the preceding layer. With the exception of the input layer, all neurons are associated with a bias and a transfer function. A bias, which is sometimes referred to as the temperature of a neuron, is similar to a weight applied to a constant input of 1. The biases are applied in the transfer functions to distinguish between neurons. The purpose of a particular neural network determines the type of transfer function to be used. Activation of the neurons is performed using simple transfer functions, which are usually nonlinear. Through the activation process, the sum of the weighted input signals and the corresponding bias of each neuron is filtered to determine its output signal.
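In symbols, the output of a single neuron as described above can be written as follows. The notation ($x_i$ for the input signals, $w_i$ for the connection weights, $b$ for the bias, $m$ for the number of inputs, and $f$ for the transfer function) is introduced here for illustration, and the sigmoid is shown only as one common choice of nonlinear transfer function, not necessarily the one used in this study.

$$ y = f\left( \sum\limits_{i = 1}^{m} w_{i} x_{i} + b \right), \qquad f(z) = \frac{1}{1 + e^{-z}} $$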

Different types of algorithms can be applied in training neural networks. The back-propagation algorithm provides the most efficient learning procedure and is especially suitable for solving prediction problems. During the training process, a sufficient number of sample datasets is required to reach the desired results. For each dataset (an input and its corresponding output, i.e. a training pair), processing starts from the input layer and proceeds to the output layer (feedforward). At this point, the output is compared to the measured actual value. The calculated difference or error is propagated back through the network (back propagation), updating the weights and the biases. This process is repeated for all the training pairs. Training ends when the network error, usually measured by a cost function such as the mean square error (MSE), converges to a minimum threshold.
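The feedforward and back propagation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it uses a single hidden layer, sigmoid transfer functions, a fixed number of epochs instead of an error threshold, and a small hypothetical dataset; all names and values are chosen here for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_h, b_h, w_o, b_o):
    """Feedforward pass: returns the hidden activations and the scalar output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_h, b_h)]
    output = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)
    return hidden, output

def mse(data, params):
    """Mean square error of the network over all training pairs."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=3, lr=0.5, momentum=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0
    # Previous updates are stored because the momentum term reuses them.
    dw_h = [[0.0] * n_in for _ in range(n_hidden)]
    db_h = [0.0] * n_hidden
    dw_o = [0.0] * n_hidden
    db_o = 0.0
    initial = mse(data, (w_h, b_h, w_o, b_o))
    for _ in range(epochs):
        for x, t in data:  # one training pair at a time
            hidden, out = forward(x, w_h, b_h, w_o, b_o)
            # Error signal at the output (half squared error through a sigmoid)
            delta_o = (out - t) * out * (1.0 - out)
            # Error signals propagated back to the hidden layer
            delta_h = [delta_o * w_o[j] * hidden[j] * (1.0 - hidden[j])
                       for j in range(n_hidden)]
            for j in range(n_hidden):
                dw_o[j] = -lr * delta_o * hidden[j] + momentum * dw_o[j]
                w_o[j] += dw_o[j]
                for i in range(n_in):
                    dw_h[j][i] = -lr * delta_h[j] * x[i] + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
                db_h[j] = -lr * delta_h[j] + momentum * db_h[j]
                b_h[j] += db_h[j]
            db_o = -lr * delta_o + momentum * db_o
            b_o += db_o
    final = mse(data, (w_h, b_h, w_o, b_o))
    return initial, final

# Hypothetical normalized training pairs: (inputs, target)
data = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.5), ([1.0, 0.0], 0.5), ([1.0, 1.0], 0.9)]
initial_mse, final_mse = train(data)
print(f"MSE before training: {initial_mse:.4f}, after training: {final_mse:.4f}")
```

The `lr` and `momentum` arguments correspond to the learning rate and momentum discussed in this section; changing them shows how strongly the training result depends on these two settings.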

The efficiency of the ANN model is considerably influenced by its topology, learning rate, and momentum. If there are too few neurons in the hidden layer(s), training cannot be performed properly, meaning that the relationship between the input and output variables is not recognized. On the other hand, if there are too many neurons, training time and computation increase, and memorization due to overfitting may occur. Therefore, every effort should be made to create a network topology that is as simple as possible with tolerable errors. A learning rate (the speed of weight adjustment) that is too small prolongs training, while one that is too large may prevent convergence. Finally, an unsuitable momentum can cause the model to become trapped in a local minimum, making the network inaccurate.
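The roles of the learning rate and the momentum can be made explicit with the commonly used weight update rule for back-propagation with momentum. The notation is introduced here for illustration ($\eta$ is the learning rate, $\alpha$ the momentum, $E$ the network error, $w$ a connection weight, and $t$ the update step), and the exact rule used by a particular software package may differ.

$$ \Delta w(t) = - \eta \frac{\partial E}{\partial w} + \alpha \,\Delta w(t - 1), \qquad w(t + 1) = w(t) + \Delta w(t) $$

A small $\eta$ gives small steps and therefore slow training, while $\alpha$ carries part of the previous step forward, which can help the search move through flat regions or shallow local minima.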

In a routine ANN, all of the aforesaid parameters are determined by trial and error, which is a tedious and time-consuming process that may not yield the best optimized model. To overcome these shortcomings, a genetic algorithm can be combined with the neural network in a so-called neuro-genetic network.

3.2 GA

Holland (1975) developed the first GA for optimization problems, based on Charles Darwin's theory of natural evolution in On the Origin of Species. This approach can be used efficiently when the search space is extensive. The basic procedure of a GA is as follows:

First, a population or set of chromosomes (sequences of genes) is randomly initialized. A chromosome is one possible solution to the problem, not necessarily the best one. Normally, the genes of each chromosome are a set of bits in binary form (genotype). In the second step, the fitness of each decoded (phenotype) chromosome in the initial random population is evaluated using an objective function. To create a new evolved population, an optimization process known as reproduction is applied, with genetic operators such as “selection”, “crossover” (recombination), and “mutation”. In the “selection” process, two parent chromosomes are randomly selected using the roulette wheel method, the tournament method, etc. The criterion for selection is the fitness of each chromosome, which is determined from a fitness function such as the mean square error (MSE). The higher the fitness, the higher the chance that a chromosome is selected. In the “crossover”, the parent chromosomes from the selection step are used to probabilistically produce new chromosomes by a swapping mechanism. Probabilistic crossover aims to generate better chromosomes from parts of the parents. The crossover probability describes how often crossover is performed. If the probability is 0 %, the new generation is made from exact copies of chromosomes from the old population, although this does not mean that the new generation is the same as the old one. On the other hand, if a probability of 100 % is used, all of the old population is changed. It is, however, better to let some part of the old population survive into the next generation. Finally, in the “mutation”, new versions of some of the chromosomes (individuals) are produced. Mutation helps prevent the GA from falling into local optima. In the mutation process, randomly chosen bits of the genes in the original chromosomes are flipped to form a new string. Mutation is performed probabilistically: as with crossover, the mutation probability determines how often parts of a chromosome are mutated. If a mutation probability of 0 % is selected, nothing is changed and the offspring are generated immediately after crossover. On the other hand, if the mutation probability is 100 %, the whole chromosome is changed. Mutation should not occur very often, because convergence of the GA would then be very difficult or even impossible. The reproduction process continues until a selected stopping criterion, such as a maximum number of generations or a maximum evolution time, is satisfied. To obtain more accurate results, a sufficient number of generations should be used. In the last stage, the best solution, which is the chromosome with maximum fitness, is returned by the GA (Goldberg 1989; Sivanandam and Deepa 2008).
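The steps described above can be sketched as a small binary GA. This is an illustration under stated assumptions rather than a reference implementation: the toy objective (find the 8-bit integer closest to 100), the single-point crossover, the elitism step that lets the best chromosome survive, and all parameter values are choices made here.

```python
import random

def decode(bits):
    """Genotype (list of bits) to phenotype (integer)."""
    return int("".join(str(b) for b in bits), 2)

def fitness(bits, target=100):
    # Hypothetical objective: largest when the decoded value equals `target`.
    return 1.0 / (1.0 + (decode(bits) - target) ** 2)

def roulette(population, scores, rng):
    """Select one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(scores))
    running = 0.0
    for chromosome, score in zip(population, scores):
        running += score
        if running >= pick:
            return chromosome
    return population[-1]

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of the two parents."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in bits]

def run_ga(n_bits=8, pop_size=20, generations=30, p_cross=0.7, p_mut=0.01, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        best = max(population, key=fitness)
        best_history.append(fitness(best))
        new_population = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(new_population) < pop_size:
            p1 = roulette(population, scores, rng)
            p2 = roulette(population, scores, rng)
            if rng.random() < p_cross:
                c1, c2 = crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]  # exact copies of the parents
            new_population.append(mutate(c1, p_mut, rng))
            if len(new_population) < pop_size:
                new_population.append(mutate(c2, p_mut, rng))
        population = new_population
    return decode(max(population, key=fitness)), best_history

best_value, history = run_ga()
print("best decoded value:", best_value)
```

Because one copy of the best chromosome is carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.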

3.3 Combination of ANN and GA

The GA can be used to design and construct an optimum neural network; this combination of GA and ANN is called a neuro-genetic network. In the first step, an initial population of neural networks with their own individual parameters (number of neurons in hidden layers, learning rate, and momentum) is randomly created. In the second step, each of the networks is trained and evaluated to determine its fitness. In the third step, to create a new evolved population, the operators “selection”, “crossover”, and “mutation” are applied. The new population is again evaluated in the same manner. This process is repeated until the maximum number of generations or the maximum evolution time is reached. Figure 1 illustrates the process of optimizing neural network parameters using the GA applied in this study. Many investigators (Fogel et al. 1990; Bornholdt and Graudenz 1992) have used this technique to train feedforward networks, in which regular neural networks were optimized by applying evolutionary algorithms. Similar applications have been reported for generalized regression neural networks (Hansen and Meservy 1996) and Hopfield neural networks (Lin et al. 1995).

Fig. 1 Combination of genetic algorithm and neural network

4 Datasets

In this study, 93 samples of different rock types including sandstone, limestone, dolomite, granite, chalk, gneiss, siltstone, tuff, gypsum, olivine, granodiorite, slate, schist, conglomerate, quartzite, gabbro, and amphibolite were collected and tested to determine density, porosity, Schmidt rebound number, and UCS. An N-type Schmidt hammer was selected and applied according to the methods proposed by ISRM (1981). In all tests, the hammer was held vertically downwards. It should be added that the tests can be conducted in the field if the ISRM suggested methods are followed (Kahraman et al. 2002). Finally, a universal testing machine was used to determine UCS. In this study, density, porosity, and Schmidt rebound number were considered as input parameters to predict UCS as the output parameter. A summary of the laboratory test results is given in Table 1.

Table 1 Summary of the laboratory test results

5 Regression Analysis

Regression analysis can be applied to establish a mathematical model describing the relationships between independent and dependent variables (Jennrich 1995). Multivariable regression analysis gives more realistic results where the number of variables is large. Application of this method to mining-related problems has been reported by many researchers (Alvarez Grima and Babuska 1999). Using the statistical software SPSS 16 and the database prepared from the laboratory tests, a linear model (Eq. 1) was developed to predict UCS from the input parameters.

$$ {\text{UCS}} = 0.801\,{\text{Schmidt}} + 0.423\,{\text{Density}} - 0.172\,{\text{Porosity}} - 0.246. $$
(1)

From this equation, the coefficient of determination (R²) and mean square error (MSE) were calculated as 0.774 and 1.61, respectively.
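Equation (1) is straightforward to evaluate in code. The sketch below uses the coefficients reported above; the units and scaling of the inputs are not stated in the text, so the input values shown are purely hypothetical, and the R² helper uses one common definition (one minus the ratio of residual to total sum of squares), which may differ from the one computed by SPSS.

```python
def predict_ucs_regression(schmidt, density, porosity):
    """Evaluate Eq. (1) with the coefficients reported in the text."""
    return 0.801 * schmidt + 0.423 * density - 0.172 * porosity - 0.246

def r_squared(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical input values, for illustration only
example = predict_ucs_regression(schmidt=1.0, density=1.0, porosity=1.0)
print(f"Eq. (1) output for the hypothetical inputs: {example:.3f}")
```

The signs of the coefficients reflect the trends discussed in Section 2: the prediction rises with Schmidt rebound number and density and falls with porosity.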

6 Neuro-Genetic Based Analysis

Using the GA, the parameters of the ANN were determined in order to find the optimal architecture of the neural network. Here, the optimization of the ANN was performed with the “NeuroSolution” for Excel Release 5.05 software package, produced by Neuro Dimension, Inc. To apply this software, the data should be normalized to keep the values within the range (0, 1). Data normalization is carried out using Eq. (2):

$$ {\text{Var}}_{\text{n}} = \frac{{{\text{Var}}_{\text{i}} - {\text{Var}}_{\min } }}{{{\text{Var}}_{\max } - {\text{Var}}_{\min } }} $$
(2)

where ${\text{Var}}_{\text{n}}$ is the normalized value, ${\text{Var}}_{\text{i}}$ is the real value, and ${\text{Var}}_{\min}$ and ${\text{Var}}_{\max}$ are the minimum and the maximum real values, respectively.
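A minimal sketch of Eq. (2) is shown below. The function name and the sample rebound numbers are hypothetical; note that this scaling maps the smallest value exactly to 0 and the largest exactly to 1.

```python
def min_max_normalize(values):
    """Scale a list of values following Eq. (2)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("all values are equal, so Eq. (2) is undefined")
    return [(v - v_min) / (v_max - v_min) for v in values]

# Hypothetical Schmidt rebound numbers, for illustration only
rebound = [20.0, 35.0, 50.0, 60.0]
normalized = min_max_normalize(rebound)
print(normalized)
```

Each input variable (density, porosity, Schmidt rebound number) and the output would be normalized separately with its own minimum and maximum.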

In the next step, the database was randomly divided into three groups: training (60 %), cross validation (15 %), and testing (25 %). In this study, a feedforward backpropagation neural network with two hidden layers was found to be suitable. This network was trained using the GA. In the training process of the ANN, an initial population is generated. The chromosomes of each population contain three genes (i.e. the number of neurons in the hidden layers, the momentum, and the learning rate). The number of hidden neurons and the training parameters were represented by haploid chromosomes consisting of “genes” of binary numbers. Each gene consists of a number of bits, which together determine the length of the chromosome. The chromosome lengths are determined automatically by the NeuroSolution package. To generate the initial population, boundary limits should be defined for the components of the chromosomes. The boundary limits for the number of hidden neurons, the learning rate, and the momentum were set as (1, 30), (0, 1), and (0, 1), respectively. After the limits are set, the software randomly selects a value within the defined limits for each component and then automatically produces the initial population.
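The data split and the mapping of binary genes onto the boundary limits can be sketched as follows. Both parts rest on assumptions made here: the text does not state how the percentages are rounded for 93 samples, and the encoding used internally by the software is not described, so the linear decoding and the 5-bit gene length below are only one plausible choice.

```python
import random

def split_indices(n_samples, train_frac=0.60, cv_frac=0.15, seed=0):
    """Randomly split sample indices into training, cross validation, and testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_cv = round(n_samples * cv_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_cv],
            indices[n_train + n_cv:])  # the remainder is used for testing

def decode_gene(bits, low, high):
    """Map a binary gene linearly onto the interval [low, high] (assumed encoding)."""
    max_int = 2 ** len(bits) - 1
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / max_int

train_idx, cv_idx, test_idx = split_indices(93)
print(len(train_idx), len(cv_idx), len(test_idx))

# Hypothetical 5-bit genes for one chromosome
n_neurons = round(decode_gene([0, 1, 0, 0, 1], 1, 30))   # limits (1, 30)
learning_rate = decode_gene([1, 0, 1, 0, 1], 0.0, 1.0)   # limits (0, 1)
momentum = decode_gene([1, 0, 0, 0, 1], 0.0, 1.0)        # limits (0, 1)
```

With this rounding, the 93 samples are divided into 56 training, 14 cross validation, and 23 testing samples.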

Network performance is evaluated using the mean square error (MSE) as defined in Eq. (3). The errors for the training datasets were computed, and the network with the smallest MSE was considered to be the optimum (Niculescu 2003).

$$ {\text{MSE}} = \frac{1}{n}\sum\limits_{{{\text{i}} = 1}}^{n} {({\text{O}}_{\text{i}} - {\text{T}}_{\text{i}} )^{2} } $$
(3)

where ${\text{O}}_{\text{i}}$ is the desired output for training data or cross validation data i, ${\text{T}}_{\text{i}}$ is the network output for training data or cross validation data i, and n is the number of data.
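Equation (3) can be written as a short function. The argument names and the sample values below are hypothetical and chosen only to show the calculation.

```python
def mean_square_error(desired, predicted):
    """Mean square error following Eq. (3)."""
    if len(desired) != len(predicted) or not desired:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - t) ** 2 for o, t in zip(desired, predicted)) / len(desired)

# Hypothetical normalized values, for illustration only
desired_outputs = [0.0, 0.5, 1.0, 0.5]
network_outputs = [0.5, 0.5, 0.5, 0.5]
error = mean_square_error(desired_outputs, network_outputs)
print(error)
```

Because the differences are squared, the result does not depend on which of the two series is subtracted from the other.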

To start working with the GA, the relevant parameters (population size, stopping criterion, “selection”, “crossover”, and “mutation”) must be set. Normally, trial and error and/or previous experience are used to select these parameters. Following this procedure, the population size and stopping criterion were set to 40 and 30, respectively. Reproduction of new chromosomes begins with “selection”, which was performed using a roulette wheel method based on a ranking algorithm. In this method, chromosomes are arranged according to their relative fitness: the chromosome with the lowest fitness receives rank 0 and the best chromosome is assigned rank 1. After ranking, the chromosomes are placed into the intermediate population (Kim et al. 2004). The crossover and mutation operators are then applied to the intermediate population. For this study, two-point crossover and uniform mutation were used, with probabilities of 0.7 and 0.01, respectively (Table 2). The process of creating new generations is repeated until the stopping criterion is satisfied. Figure 2 shows the improvement of the average and best MSE over the generations. Finally, the best chromosome in the last generation is taken as the optimum solution.
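The three operators named in this section can be sketched as follows. This is an interpretation rather than the software's actual implementation: ranks are scaled linearly so that the worst chromosome (largest MSE) gets 0 and the best gets 1, and uniform mutation is taken to mean that every bit is flipped independently with the same probability. All function names are hypothetical.

```python
import random

def rank_weights(mse_values):
    """Linear ranking: the largest MSE gets weight 0, the smallest gets weight 1."""
    n = len(mse_values)
    worst_first = sorted(range(n), key=lambda i: mse_values[i], reverse=True)
    weights = [0.0] * n
    for rank, index in enumerate(worst_first):
        weights[index] = rank / (n - 1)
    return weights

def roulette_select(population, weights, rng):
    """Spin the roulette wheel once, with slot sizes given by the rank weights."""
    return rng.choices(population, weights=weights, k=1)[0]

def two_point_crossover(a, b, rng):
    """Swap the segment between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform_mutation(bits, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in bits]

rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(4)]
mse_values = [0.30, 0.10, 0.20, 0.25]  # hypothetical fitness values (MSE)
weights = rank_weights(mse_values)

parent_1 = roulette_select(population, weights, rng)
parent_2 = roulette_select(population, weights, rng)
if rng.random() < 0.7:  # crossover probability from Table 2
    child_1, child_2 = two_point_crossover(parent_1, parent_2, rng)
else:
    child_1, child_2 = parent_1[:], parent_2[:]
child_1 = uniform_mutation(child_1, 0.01, rng)  # mutation probability from Table 2
child_2 = uniform_mutation(child_2, 0.01, rng)
```

With a weight of exactly 0, the worst chromosome is never selected in this sketch; whether the software applies an offset to avoid this is not stated in the text.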

Table 2 GA parameters used for optimization of ANN

Fig. 2 Average fitness (MSE) versus generation (a); best fitness (MSE) versus generation (b)

Table 3 shows the details of Generation No. 23, which is considered the best solution. The optimum numbers of neurons obtained by the GA in the first and second hidden layers are 9 and 5, respectively. The other network parameters, learning rate and momentum, also optimized by the GA, are 0.66 and 0.53, respectively (Fig. 3).

Table 3 Optimization summary by GA

Fig. 3 Optimized structure of the proposed ANN model for prediction of UCS

The performance of the proposed model was evaluated using the datasets reserved for testing. The coefficient of determination between measured and predicted UCS was computed as 0.96, which shows the superiority of the neuro-genetic network over the conventional statistical method (Fig. 4).

Fig. 4 Correlation between measured and predicted UCS

7 Sensitivity Analysis

Sensitivity analysis is a method for extracting the cause and effect relationship between the inputs and outputs of the network. The network learning is disabled during this operation such that the network weights are not affected. The basic idea is that the inputs to the network are shifted slightly and the corresponding change in the output is reported either as a percentage or a raw difference. Figure 5 illustrates the sensitivity analysis results.
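The procedure described above can be sketched as follows. The trained network is not available here, so the linear regression of Eq. (1) is used purely as a stand-in model; the baseline values and the size of the shift are likewise hypothetical.

```python
def sensitivity(model, baseline, delta=0.05):
    """Change in the model output when each input is shifted by `delta` in turn."""
    base_output = model(baseline)
    changes = {}
    for name in baseline:
        shifted = dict(baseline)  # copy, so the other inputs stay at baseline
        shifted[name] += delta
        changes[name] = model(shifted) - base_output
    return changes

def stand_in_model(x):
    # Stand-in for the trained network (assumption): the linear model of Eq. (1).
    return 0.801 * x["schmidt"] + 0.423 * x["density"] - 0.172 * x["porosity"] - 0.246

# Hypothetical normalized baseline inputs
baseline = {"schmidt": 0.5, "density": 0.5, "porosity": 0.5}
result = sensitivity(stand_in_model, baseline)
most_sensitive = max(result, key=lambda name: abs(result[name]))
print(result, most_sensitive)
```

The model itself is never modified during this loop, which mirrors the requirement that learning is disabled while the inputs are shifted.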

Fig. 5 Sensitivity analysis results

8 Conclusion

In this paper, a hybrid neuro-genetic network was implemented to predict the uniaxial compressive strength of rocks. The neural network parameters, including the number of neurons in the hidden layers, the learning rate, and the momentum coefficient, were optimized by a genetic algorithm. Two-point crossover and uniform mutation were used, with probabilities of 0.7 and 0.01, respectively. Determining the optimum model with this method is faster and more convenient than with classic networks based on a trial and error process. In the GA optimization, the optimum number of neurons was found to be 9 in the first hidden layer and 5 in the second. The learning rate and momentum were 0.66 and 0.53, respectively. The results showed the robustness of this hybrid network for estimating UCS, which is time consuming and costly to measure, from the easily obtained parameters Schmidt rebound number, density, and porosity. The advantage of the method over conventional regression analysis was also confirmed. To compare the neuro-genetic network with the statistical analysis, R² and MSE were calculated. The neuro-genetic model achieved R² and MSE values of 0.9589 and 0.0045, respectively, whereas the regression model gave 0.774 and 1.61. The rather poor performance of the regression model may be attributed to its linearity assumption. Finally, sensitivity analysis revealed that the most sensitive parameter in predicting UCS is the Schmidt rebound number, which is consistent with previous experience.