Introduction

Soil masses often lack the properties required for the construction of engineering structures. In such cases, the geotechnical properties of the soil can be improved to obtain the desired behavior. The selection of the soil improvement technique to be applied depends on many factors, such as the type of soil at the site, the condition of the soil, and economic considerations. The aim of all improvement techniques is to increase soil density and strength and to reduce permeability and settlement. One of these, the compaction method, is used to increase the density and bearing capacity of the soil and to reduce its permeability.

Compaction increases the dry density of a soil by applying mechanical energy while the water content (w) is varied. The air volume is reduced, while the water and solid phases are essentially incompressible, so the grains move closer together. If some water is added to the soil and it is compacted, the soil attains a certain dry unit weight (γd); if the water content of the same soil is increased and it is compacted with the same energy, the dry unit weight gradually increases. With increasing water content, the dry unit weight reaches a peak value, defined as the maximum dry density (MDD, γdmax); beyond this limit, the dry unit weight begins to decrease as more water is added. The water content providing the maximum dry density is defined as the optimum moisture content (OMC, wopt). The MDD and OMC are two significant parameters representing the compaction behavior of soils and are determined with the standard Proctor (SP) and modified Proctor (MP) tests in the laboratory. These two parameters are read from the compaction curve obtained from the laboratory tests and play an important role in compacted fills, which are indispensable for engineering structures such as highways, railways, and earth dams.
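As an illustration of the peak reading described above, the following sketch fits a parabola through Proctor trial points (water content vs. dry unit weight) and reads off wopt and γdmax at the vertex. The trial data and the parabolic shape are illustrative assumptions, not values from this study's database.

```python
import numpy as np

def compaction_peak(w, gamma_d):
    """Fit a parabola gamma_d = a*w**2 + b*w + c through the trial points
    and return (w_opt, gamma_d_max) at its vertex."""
    a, b, c = np.polyfit(w, gamma_d, 2)
    w_opt = -b / (2.0 * a)                       # vertex of the parabola
    gamma_d_max = np.polyval([a, b, c], w_opt)
    return w_opt, gamma_d_max

# Hypothetical Proctor trial points: water content (%) vs. dry unit weight (kN/m3)
w = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
gd = np.array([16.8, 17.6, 18.0, 17.7, 16.9])

w_opt, gd_max = compaction_peak(w, gd)
```

In practice the curve is usually drawn through five or six compacted specimens; the parabola is only a convenient local approximation near the peak.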

However, the laboratory tests for determining wopt (OMC) and γdmax (MDD) are time consuming and laborious and require considerable effort. For this reason, many researchers have tried to estimate the compaction parameters from the index properties of soils through empirical correlations based on regression analysis (Table 1). The effect of index properties on compacted soils has been known for a long time: grain size and grain-size distribution are the main factors for coarse-grained soils, and the consistency limits for fine-grained soils. In addition, the tests used to determine the index properties are considerably easier and cheaper than the compaction tests.

Table 1 Proposed empirical correlations for determining the compaction parameters

The proposed correlations relating physical characteristics to compaction parameters are generally based on multiple linear regression (MLR) analysis. The most important problem with these correlations is that they are usually developed for a particular locality or for soils of the same geological origin. Using them outside the area for which they were developed can cause significant differences between the expected and computed compaction parameters. Therefore, caution is necessary when compaction parameters determined from empirical correlations are used.
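A minimal sketch of the kind of MLR correlation discussed above, fitted by least squares; the index-property data (LL, PL, FC) and OMC values below are hypothetical, and, as the text warns, the resulting coefficients would only be valid for soils like those used in the fit.

```python
import numpy as np

# Hypothetical training data: [LL, PL, FC] index properties and measured OMC (%)
X = np.array([
    [35.0, 20.0, 60.0],
    [42.0, 22.0, 72.0],
    [28.0, 18.0, 45.0],
    [55.0, 27.0, 85.0],
    [31.0, 19.0, 50.0],
])
y = np.array([14.2, 16.8, 12.1, 20.5, 13.0])

# Multiple linear regression: OMC = a0 + a1*LL + a2*PL + a3*FC
A = np.hstack([np.ones((len(X), 1)), X])          # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict for a new sample -- valid only within the range of the fitting data
new = np.array([1.0, 40.0, 21.0, 65.0])
omc_hat = new @ coef
```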

Recently, geotechnical engineering practice has begun to include many studies related to soft computing methods (Lee and Lee 1996; Najjar and Basheer 1996; Kiefa 1998; Juang and Chen 1999; Sakellariou and Ferentinou 2005; Wang et al. 2005; Kim and Kim 2006; Sinha and Wang 2008; Samui 2008; Abdel-Rahman 2008; Kuo et al. 2009; Gunaydin 2009; Nejad et al. 2009; Kalinli et al. 2011; Samui and Kothari 2011; Isik and Ozden 2013; Tenpe and Kaur 2015; Abdalla et al. 2015; Chenari et al. 2015; Suman et al. 2016).

In this paper, soft computing methods, namely the group method of data handling (GMDH)-type neural network, the support vector machine (SVM), the Bayesian regularization neural network (BRNN), and the extreme learning machine (ELM), were used to predict the compaction parameters of soils. The index properties of the soil samples were used as the input parameters for estimating the compaction parameters in all models.

Database compilation

A database containing the index and standard Proctor (SP) test results of 451 soil samples was used in this study. Approximately half of the dataset was taken from published studies (Gunaydin 2009; Olmez 2007), and the remainder was provided by laboratories carrying out soil investigations, including compaction tests, in various parts of Turkey. As mentioned in the Introduction, the grain-size distribution of coarse-grained soils and the consistency limits of fine-grained soils are the most influential features. In this context, the database consists of the percentages of liquid limit (LL), plastic limit (PL), fines content (FC), sand content (SC), and gravel content (GC) of the compacted soils. The soil classes of the samples span a wide range: CH, CI, CL, GC, GM, GP, GP-GC, GP-GM, GW, GW-GC, GW-GM, MH, MI, ML, SC, SP, SP-SC, and SW-SC. A statistical description of the data is given in Table 2.

Table 2 Statistical specifications of index and compaction parameters

Method

The GMDH builds an increasingly complex model that is evaluated stage by stage on a set of multiple-input, single-output data pairs (Vissikirsky et al. 2005). The GMDH architecture is a self-organizing polynomial neural network with a flexible structure (Ghanadzadeh et al. 2012). First, the data set is divided into training and test sets, and, for every pair of input variables, the following regression equation is fitted:

$$ \mathrm{Quadratic}:\hat{y}=G\left({x}_i,{x}_j\right)={w}_0+{w}_1{x}_i+{w}_2{x}_j+{w}_3{x}_i{x}_j+{w}_4{x}_i^2+{w}_5{x}_j^2 $$
(1)

The weights in the above equation are obtained by the least-squares method. New variables are then generated from these polynomial equations; the outputs of the polynomials with the smallest error values are carried to the next layer. In this way, new variables that best describe the output variable are produced from the input variables:

$$ E=\frac{\sum_{i=1}^M{\left({y}_i-{G}_i\left({x}_i,{x}_j\right)\right)}^2}{M}\rightarrow \mathrm{minimum} $$
(2)

The GMDH model has been used with success in geotechnical practice in recent years (Jirdehi et al. 2014; Kordnaeij et al. 2015; Hassanlourad et al. 2017; Ardakani and Kordnaeij 2017).
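A single GMDH neuron of Eqs. (1) and (2) can be sketched as follows: the six weights are fitted by least squares for one pair of input variables, and the neuron is scored by its mean squared error. The data here are synthetic, generated from a known quadratic surface, so this illustrates only the fitting step, not the full self-organizing network.

```python
import numpy as np

def fit_quadratic_neuron(xi, xj, y):
    """Least-squares fit of Eq. (1):
    y_hat = w0 + w1*xi + w2*xj + w3*xi*xj + w4*xi^2 + w5*xj^2,
    scored by the mean squared error of Eq. (2)."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = np.mean((y - A @ w) ** 2)
    return w, mse

# Synthetic data lying exactly on a quadratic surface, so the fit is exact
rng = np.random.default_rng(0)
xi = rng.uniform(0, 1, 50)
xj = rng.uniform(0, 1, 50)
y = 1.0 + 2.0 * xi - xj + 0.5 * xi * xj

w, mse = fit_quadratic_neuron(xi, xj, y)
```

In the full GMDH, every input pair gets such a neuron, the best-scoring neurons survive to form the next layer, and the process repeats until the error no longer improves.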

ELM is basically similar to an artificial neural network with one hidden layer; therefore, its working principle is, to some extent, the same as that of artificial neural networks (ANN). However, in the extreme learning machine, the hidden-layer weights (Wi) are assigned randomly and are not updated at any later stage of training. In contrast, the weights (βi) between the hidden layer and the output layer are determined at once, analytically and quickly, using a linear model. The basic ELM model is built on feedforward neural networks with a single hidden layer (Huang et al. 2006). The ELM method has begun to be used in many geotechnical problems, with successful results (Muduli et al. 2013; Huang et al. 2017a, b; Liu et al. 2014, 2015; Li et al. 2016).
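The analytic training step described above can be sketched as follows: the hidden weights are drawn randomly and frozen, and the output weights are obtained in a single pseudoinverse solve. The data, network size, and activation function are illustrative assumptions.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    """Minimal ELM sketch: random, frozen input weights Wi and bias;
    output weights beta solved analytically with the pseudoinverse."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y      # one analytic step, no iterative training
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Hypothetical data: the target is a smooth function of two inputs
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

W, b, beta = elm_fit(X, y, n_hidden=30, rng=rng)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Because only the linear output layer is solved, training is orders of magnitude faster than backpropagation, at the cost of needing enough random hidden neurons.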

Determining the number of hidden neurons is one of the biggest challenges in creating an ANN model. Overfitting can occur when the number of neurons is too high, whereas training the network becomes difficult when there are too few hidden nodes. In addition, an ANN model designed to be too complex or too simple will have poor predictive performance. To overcome this, the BRNN model, which treats the training of the ANN with a probabilistic approach, was proposed by MacKay (1991). BRNN is widely used for solving nonlinear problems (Bui et al. 2012; Okut 2016; Caballero and Fernández 2006) and has given good results in various geotechnical problems (Nejad et al. 2009; Das et al. 2010; Muduli et al. 2014; Sabat 2015).
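The regularized objective behind Bayesian regularization, F = β·E_D + α·E_W, can be sketched as below for a tiny one-hidden-layer network. For brevity, this sketch holds α and β fixed, whereas MacKay's full method re-estimates them from the evidence during training; the data, network size, and hyperparameter values are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(theta, n_in, n_hid):
    """Split the flat parameter vector into layer weights."""
    i = n_in * n_hid
    W1 = theta[:i].reshape(n_in, n_hid)
    b1 = theta[i:i + n_hid]
    W2 = theta[i + n_hid:i + 2 * n_hid]
    b2 = theta[-1]
    return W1, b1, W2, b2

def objective(theta, X, y, alpha, beta, n_in, n_hid):
    """F = beta*E_D + alpha*E_W: data misfit plus weight-decay penalty."""
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    y_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    E_D = np.sum((y - y_hat) ** 2)     # sum-of-squares error term
    E_W = np.sum(theta ** 2)           # regularization (weight) term
    return beta * E_D + alpha * E_W

# Hypothetical nonlinear regression data
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 2))
y = X[:, 0] ** 2 - X[:, 1]

n_in, n_hid = 2, 8
theta0 = 0.1 * rng.normal(size=n_in * n_hid + 2 * n_hid + 1)
res = minimize(objective, theta0, args=(X, y, 1e-3, 1.0, n_in, n_hid),
               method="L-BFGS-B")
```

The weight-decay term E_W is what discourages the over-complex networks described above; the Bayesian view makes the trade-off between fit and smoothness explicit.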

SVM model theory was first proposed by Cortes and Vapnik (1995). SVM is widely used for both regression and classification. An extended version of SVM, known as support vector regression (SVR), has been developed for complex regression problems. The SVM method has been widely used in different geotechnical problems in recent years (Kordjazi et al. 2014; Sabat 2015; Samui 2008; Samui et al. 2008; Samui and Kothari 2011); these studies also explain the theory of the SVR model in detail.
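A minimal SVR sketch with the RBF kernel and the user-chosen C and ε parameters, using scikit-learn on hypothetical stand-in data (in this study, the five inputs would be LL, PL, FC, SC, and GC):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical index-property features and OMC-like targets
rng = np.random.default_rng(3)
X = rng.uniform(0, 100, size=(120, 5))
y = 5.0 + 0.2 * X[:, 0] + 0.05 * X[:, 2] + rng.normal(0, 0.5, 120)

# RBF-kernel SVR; C and epsilon are the user-defined parameters noted above.
# Features are standardized first, which RBF kernels generally require.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
r = np.corrcoef(y, model.predict(X))[0, 1]
```

ε sets the width of the insensitive tube around the regression function, and C penalizes points falling outside it; both must be tuned for each data set.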

Results

GMDH results

In this paper, different GMDH architectures were used to predict the compaction parameters. The results of these trials are given in Table 3. Since increasing the number of hidden layers raises the computational cost, a maximum of 7 layers was tried. As the number of hidden layers rises, the MSE values decrease, the determination coefficients R increase, and the success of the model improves (Table 3). The regression graphs and the curves of the actual and predicted values are given in Figs. 1 and 2; the graphs and histograms of the errors and their distributions are given in Figs. 3 and 4.

Table 3 Performance results of the GMDH model
Fig. 1 Determination coefficient performance values for the OMC parameter

Fig. 2 Determination coefficient performance values for the MDD parameter

Fig. 3 Distribution and error graphs for the OMC parameter

Fig. 4 Distribution and error graphs for the MDD parameter

ELM results

The training and test sets were randomly generated for the ELM models. The R and MSE performance values obtained for the OMC and MDD parameters are given in Table 4.

Table 4 Performance criteria in the case of output OMC and MDD

In this study, different architectural structures of ELM were also tried for estimating the OMC and MDD parameters. These trials were carried out with a 70–30% training-test split, and the results obtained are given in Tables 5 and 6.

Table 5 Effect of different activation functions for OMC (70–30% training-test set)
Table 6 Effect of different activation functions for MDD (70–30% training-test set)

The best performance in predicting OMC was achieved with the radial basis activation function (Table 5), and the best performance in predicting MDD with the sine activation function (Table 6). The estimated and actual values of OMC and MDD are seen to be close when the related graphs are examined (Figs. 5 and 6).

Fig. 5 The estimated and actual values of OMC

Fig. 6 The estimated and actual values of MDD

Many trials were performed for estimating the OMC and MDD parameters with ELM. In these trials, the training set was used to train the model, while the test set was used to verify its generalization ability. The R and MSE values obtained in estimating the OMC and MDD parameters are given in Fig. 7. The training stage for the OMC parameter performs very well (Fig. 7a); however, the same performance could not be achieved in the testing stage. In trials where the number of hidden neurons exceeds 30, the R and MSE values for the test samples fluctuate, and the MSE values increase. Accordingly, it can be said that there is no simple functional relationship between the R and MSE criteria and the number of hidden neurons; nevertheless, the generalization ability of the ELM clearly depends on the hidden-layer size. Similarly, when the number of hidden neurons exceeds 30–40 in estimating the MDD parameter, the performance of the model decreases, that is, the MSE values increase and the R values decrease (Fig. 7b).
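The sweep behind this kind of comparison can be sketched as follows: a minimal ELM is refitted with an increasing number of hidden neurons, and its training and test MSE are recorded, as in Fig. 7. The data, the 70–30% split, and the neuron counts are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(300, 5))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + 0.1 * rng.normal(size=300)
X_tr, y_tr = X[:210], y[:210]          # 70% training
X_te, y_te = X[210:], y[210:]          # 30% test

def elm_mse(n_hidden):
    """Fit a minimal ELM and return (training MSE, test MSE)."""
    W = rng.normal(size=(5, n_hidden))
    b = rng.normal(size=n_hidden)
    beta = np.linalg.pinv(np.tanh(X_tr @ W + b)) @ y_tr
    mse = lambda A, t: np.mean((np.tanh(A @ W + b) @ beta - t) ** 2)
    return mse(X_tr, y_tr), mse(X_te, y_te)

# Training MSE keeps falling as neurons are added; test MSE eventually does not
results = {n: elm_mse(n) for n in (5, 10, 20, 40, 80)}
```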

Fig. 7 Predictive performance versus number of hidden neurons (radial basis activation function): a training and test performance for the OMC parameter, b training and test performance for the MDD parameter

BRNN results

In this paper, the weights between the input-hidden and hidden-output layers of the neural network were also calculated with the Bayesian regularization model. To demonstrate its performance, the BRNN method was compared with different training methods. The variables in the input layer are the index properties of the soil (LL, PL, FC, SC, and GC), and the compaction parameter (OMC or MDD) is used in the output layer. Different activation functions and different numbers of hidden neurons were tried for the BRNN model. Training and test sets were randomly generated in each attempt, and min-max conversion was applied to the data before the analysis. The best results were obtained using the log-sigmoid and linear activation functions in the hidden and output layers, respectively. The performance results obtained with BRNN in estimating the OMC and MDD parameters are given in Table 7. The best results were R = 0.9191 and MSE = 0.0043 for OMC and R = 0.9219 and MSE = 0.0047 for MDD. The regression graphs obtained for the 70–30% training-test sets of the OMC and MDD parameters are given in Figs. 8 and 9; they show the regression models between the measured and estimated values for the training, test, and complete data sets.
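The min-max conversion mentioned above can be sketched as follows. Note that MSE values computed on min-max-scaled targets are not directly comparable to MSE values computed in raw units, since the scaling shrinks the error magnitudes.

```python
import numpy as np

def minmax_scale(X, lo=0.0, hi=1.0):
    """Min-max conversion: map each column of X into the interval [lo, hi]."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return lo + (X - xmin) * (hi - lo) / (xmax - xmin)

# Hypothetical LL and PL columns
X = np.array([[35.0, 20.0], [55.0, 27.0], [28.0, 18.0]])
Xs = minmax_scale(X)
```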

Table 7 Performance criteria in the case of output OMC and MDD
Fig. 8 Comparison between the estimated and measured values of OMC

Fig. 9 Comparison between the estimated and measured values of MDD

To show the success of the BRNN method, other ANN training methods were also tested and compared with BRNN (Tables 8 and 9). All methods used the same activation functions and architectures.

Table 8 Performance criteria in the case of output OMC for 70–30% training-test sets
Table 9 Performance criteria in the case of output MDD for 70–30% training-test sets

As seen in Tables 8 and 9, the best success in estimating the OMC and MDD parameters was obtained with BRNN. The results obtained with BRNN for the test data set were R = 0.9191 and MSE = 0.0043 for OMC (Table 8) and R = 0.9219 and MSE = 0.0047 for MDD (Table 9). Nevertheless, acceptable results were also obtained with the other training methods.

SVM results

In this section, trials were performed with SVM models of different architectures for estimating OMC and MDD. Different kernel functions were used to investigate the effect of the kernel function on SVM performance. In addition to specifying the kernel function, choosing the user-defined parameters C and ε is also important. The performance values for estimating the OMC and MDD parameters are given in Tables 10 and 11.
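The grid of trials described above can be sketched as a loop over kernel functions and ε values, scoring each fitted SVR on a held-out 30% split. The data here are hypothetical stand-ins, and C is fixed arbitrarily rather than tuned.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in features and targets
rng = np.random.default_rng(5)
X = rng.uniform(0, 100, size=(150, 5))
y = 10.0 + 0.15 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 0.3, 150)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Grid over kernel functions (LF, PF, RBF, SF) and epsilon values
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    for eps in (0.002, 0.010, 0.100):
        m = SVR(kernel=kernel, C=10.0, epsilon=eps).fit(X_tr, y_tr)
        scores[(kernel, eps)] = mean_squared_error(y_te, m.predict(X_te))

best = min(scores, key=scores.get)   # (kernel, epsilon) with the lowest test MSE
```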

Table 10 Performances observed with SVM in the case of output OMC
Table 11 Performances observed with SVM in the case of output MDD

The prediction performance changed little with the ε parameter. The best success in estimating the OMC parameter with SVM was R = 0.8510 and MSE = 2.5247 for ε = 0.010, and the best success in estimating the MDD parameter was R = 0.8483 and MSE = 0.9557 for ε = 0.002 (Tables 10 and 11).

The R values observed for different kernel functions and ε values are given in Fig. 10, and the performances obtained with different kernel functions for the OMC and MDD parameters are shown in Figs. 11 and 12. As shown in Figs. 10, 11, and 12, the most successful kernel function in estimating the OMC and MDD parameters is the RBF.

Fig. 10 R values for different kernel functions: a R values for OMC, b R values for MDD

Fig. 11 Actual OMC values and values predicted with different kernel functions: a predicted with LF, b predicted with PF, c predicted with RBF, d predicted with SF

Fig. 12 Actual MDD values and values predicted with different kernel functions: a predicted with LF, b predicted with PF, c predicted with RBF, d predicted with SF

Discussion

In the present study, many trials were performed to obtain the best prediction performance for OMC and MDD, using different training algorithms and different activation or kernel functions in all models. Finally, the performance results of the GMDH, ELM, BRNN, and SVM models were compared; the comparison is presented in Table 12.

Table 12 The comparison of the model performance results

Despite the overfitting problem, ANN models are widely preferred by many researchers for solving linear and nonlinear problems. The GMDH method, which has gained increasing popularity in recent years, is a successful model when there is a linear relationship between the data (inputs and outputs). In contrast, the ELM and BRNN methods are known to be successful in solving nonlinear problems. SVM can be used for classification and regression purposes and has been successfully applied to nonlinear problems in recent years.

Various studies have focused on solving complex geotechnical problems with AI techniques (Samui and Kothari 2011; Das and Basudhar 2007; Das et al. 2010; Liu et al. 2015; Huang et al. 2017a, b; Muduli et al. 2013; Nejad et al. 2009; Sabat 2015; Samui 2008). In the aforementioned studies, the AI techniques were found to be more efficient than statistical models. When AI techniques are compared among themselves, the results can vary with the content of the study: although ANN models seem to be more successful than SVM when the relationship between inputs and output is not linear, any of the AI techniques can prove the most successful on a different data set.

Conclusions

In this paper, prediction models for the compaction parameters of soils were developed using soft computing methods (GMDH, ELM, BRNN, and SVM), and the model performances were compared. In total, 451 test results (index and standard Proctor) belonging to compacted soils were used. Trials with GMDH were carried out with different architectures; the best performance was obtained with 7 hidden layers (R = 0.9047, MSE = 0.451 for OMC; R = 0.9174, MSE = 0.756 for MDD) (Table 3). In the ELM trials, the best performance in predicting OMC was obtained with the radial basis activation function and 24 hidden neurons (R = 0.9369 and MSE = 0.2224) (Table 5), while the best performance in predicting MDD was obtained with the sine activation function and 6 hidden neurons (R = 0.9465 and MSE = 0.4673) (Table 6). Moreover, in trials where the number of hidden neurons exceeded 30, the MSE values increased (Fig. 7). Trials with BRNN were performed with different activation functions; the best results were R = 0.9191 and MSE = 0.0043 for OMC and R = 0.9219 and MSE = 0.0047 for MDD (Table 7). In addition, BRNN was compared with other ANN training methods (LM, BFG, CGB, CGP, GDA, GDM, GDX, RP, and SCG) and was found to be the most successful (Tables 8 and 9). In the SVM trials, the best performance for both OMC (R = 0.8510 and MSE = 2.5247) and MDD (R = 0.8483 and MSE = 0.9557) was obtained with the RBF kernel function (Tables 10 and 11).

As a result of this study, the ELM method was found to be the most successful in predicting the compaction parameters. The ELM and BRNN methods were also more successful than GMDH, owing to the lack of a strongly linear relationship between the data; this is confirmed by comparing the performance results of all methods (Table 12). The SVM, on the other hand, was the lowest-performing method in this study: the results obtained with SVM were less successful than those of the ANN-based models in all trials for predicting both OMC and MDD. It is believed that the limitation on achieving more successful results stems from the small number of data (451 tests), and it is expected that the success of the different soft computing models will increase if the data set is expanded in the future.