Introduction

When saturated cohesionless soils are subjected to undrained static or cyclic loading, excess pore water pressure can develop abruptly and the effective stress can drop dramatically, leading to a loss of shear strength, i.e. liquefaction [1,2,3,4,5]. Static liquefaction is considered one of the most disastrous failure mechanisms in saturated sandy soils because, when it occurs, the soil resistance drops and the ability of the soil layer to support geotechnical structures such as building and bridge foundations, earth dams, slopes, and embankments is reduced [3, 6]. Complete static liquefaction can be described by

$$\sigma'_{3c} = 0 \quad \text{and} \quad \sigma_{1} - \sigma_{3c} = 0 \tag{1}$$

where σ′3c is the effective confining pressure and (σ1 − σ3c) is the principal stress difference. Static liquefaction has been examined in many previous experimental studies, most of which reported that it depends strongly on numerous factors such as the void ratio (e), effective confining pressure (σ′3c), relative density (Dr), consolidation type, defined by the ratio of the initial shear stress to the initial effective confining pressure (α), degree of saturation (Skempton's coefficient B), fines content, and sample preparation method [4, 7,8,9]. For practical purposes, the liquefaction susceptibility of sandy soils is generally evaluated in two ways: (1) by performing monotonic undrained tests on undisturbed and remoulded specimens and expressing the results through equivalent parameters such as the state parameter, undrained brittleness index, liquefaction potential, pore pressure ratio, and the ratio of the minimum deviatoric stress to the initial peak deviatoric stress [3, 4, 6, 10, 11]; and (2) by using empirical relations based on correlations between field tests and laboratory tests [12]. Liquefaction susceptibility has been investigated in many studies based on different parameters [3, 13, 14]. In particular, it may be evaluated through the ratio of the minimum deviatoric stress to the peak deviatoric stress (qmin/qpeak) [4]. Complete static liquefaction corresponds to qmin/qpeak = 0, and stable behaviour with complete dilation corresponds to qmin/qpeak = 1 [4]. The qmin/qpeak ratio can also be used to compute the liquefaction potential, defined as (qpeak − qmin)/qpeak = 1 − qmin/qpeak [15, 16]. Rahman and Lo [17] pointed out that qmin/qpeak can be correlated with the state parameter to normalise the amount of strain softening in samples.
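As an illustration of how qmin/qpeak and the liquefaction potential might be extracted from a digitised undrained stress–strain curve, the following Python sketch is offered. It is a hypothetical helper, not a procedure taken from the cited works: it assumes the deviatoric stress q is sampled in order of increasing axial strain and is smooth (noisy data would need filtering first), takes qpeak as the first local maximum, and takes qmin as the lowest value reached after that peak.

```python
from typing import Sequence


def qmin_qpeak_ratio(q: Sequence[float]) -> float:
    """Return qmin/qpeak for a deviatoric stress series ordered by axial strain.

    Assumes a smooth curve with q > 0 at the peak: q_peak is taken as the first
    local maximum and q_min as the lowest value reached after it.
    """
    if len(q) == 0:
        raise ValueError("empty stress series")
    i_peak = 0
    # Walk forward while the deviatoric stress is still rising.
    while i_peak + 1 < len(q) and q[i_peak + 1] >= q[i_peak]:
        i_peak += 1
    q_peak = q[i_peak]
    if q_peak <= 0:
        raise ValueError("peak deviatoric stress must be positive")
    q_min = min(q[i_peak:])
    return q_min / q_peak


def liquefaction_potential(ratio: float) -> float:
    """Liquefaction potential (qpeak - qmin)/qpeak = 1 - qmin/qpeak."""
    return 1.0 - ratio


# Hypothetical curves (kPa), for illustration only.
print(qmin_qpeak_ratio([0, 50, 80, 60, 20, 0]))   # 0.0: complete static liquefaction
print(qmin_qpeak_ratio([0, 100, 60, 80, 150]))    # 0.6: limited liquefaction
print(qmin_qpeak_ratio([0, 40, 80, 120]))         # 1.0: no strain softening
print(liquefaction_potential(0.6))
```

A monotonically rising curve returns 1, matching the stable, fully dilative case described above; a curve that softens to zero deviatoric stress returns 0.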

There are several limitations to applying the results of laboratory tests on disturbed specimens directly to field situations. First, these tests do not account for all the actual properties of natural soils, for example fabric, cementation, strain history, and overconsolidation [18]. Second, experimental tests are often costly and time-consuming. Similarly, numerical methods such as the finite element method, which is used to analyse many geotechnical engineering problems, are hindered because they require a large number of parameters to obtain an accurate constitutive model for complex problems such as liquefaction [19]. Therefore, artificial intelligence (AI) approaches have been used to reduce these difficulties and to provide a simple technique for evaluating complex issues in geotechnical engineering [20, 21]. The artificial neural network (ANN) is an AI approach that has been widely used to solve a variety of complicated problems in civil engineering. ANNs are becoming more dependable than traditional empirical and statistical methods for two reasons: first, ANNs can learn from the data given to them; second, they can capture the practical relationships between input variables even when the underlying relationships are unknown. Research on the development of ANN models based on laboratory element tests is still very limited. Young-Su and Byung-Tak [22] used data collected from different published works, including cyclic simple shear and undrained cyclic triaxial tests, to estimate the liquefaction resistance ratio of sand with an ANN model. Banimahd et al. [23] developed ANN models to predict the undrained stress–strain behaviour and excess pore water pressure of sandy soils containing nonplastic fines.
Their results showed that the undrained behaviour of sandy soil was dominated by four major input variables: fines content, fines shape, relative density, and effective confining pressure. In contrast, many ANN models have been used to estimate the seismic liquefaction susceptibility of soils based on standard penetration test (SPT) and cone penetration test (CPT) data and on records of major earthquakes in different countries (Goh [24, 25], Ural and Saka [26], Hanna et al. [21], Tung et al. [27], Baziar and Nilipour [28]). For example, Goh [24] stated that neural networks are feasible tools for evaluating soil liquefaction and are simpler to apply than other methods. Mughieda et al. [29] adopted an ANN to estimate the liquefaction susceptibility of soils using a CPT database. Contrary to previous studies, Mughieda et al. [29] argued that preprocessing, normalising, or calibrating the data before evaluating the liquefaction potential is not necessary. Farrokhzad et al. [30] developed ANN models to predict the seismic liquefaction potential of soils based on a dataset from field tests in 30 boreholes. Additionally, ANNs have been adopted for many other geotechnical problems, such as predicting scour at bridge piers, the unsaturated shear strength of soil, the safety of a typical artificial slope subjected to earthquake forces, earthquake-induced horizontal ground displacement, the maximum dry density and permeability of various soil types, and the residual friction angle of clay soils [31,32,33,34,35,36].

Symbolic regression via genetic programming (GP) is another AI approach that has been employed in many previous studies to develop new predictive relationships for numerous geotechnical engineering problems. GP evolves computer programs to solve a problem. It relies on evolutionary algorithms to find good approximate solutions by randomly creating populations of computer programs represented as tree structures and improving them over successive generations. GP differs from other AI techniques and from statistical methods in that it can produce compact, explicit model equations in terms of the model parameters [37]. Although GP has been used successfully to predict some complex soil mechanics parameters, its application to the evaluation of liquefaction potential is very limited [37,38,39,40]. Muduli and Das [37] used multi-gene genetic programming (MGGP) to calculate the liquefaction potential of soil from standard penetration test (SPT) datasets. Their results showed that the MGGP model was more accurate than models developed with the support vector machine (SVM) and ANN on the same database. Muduli and Das [38] developed two different MGGP models to predict the liquefaction potential of soils in terms of a liquefaction index based on a CPT database. Das and Muduli [40] investigated the liquefaction susceptibility of soil using genetic programming based on CPT data collected after the 1999 Chi–Chi earthquake in Taiwan. Javadi et al. [41] developed a GP-based approach, using SPT data, to evaluate liquefaction-induced lateral displacement. Jafarian et al. [42] implemented GP to develop a predictive equation for earthquake-induced strong ground motion. Jafarian et al. [18] used cyclic triaxial test data to develop a GP equation for the cyclic resistance ratio. Gandomi and Alavi [39] employed GP to analyse different geotechnical problems, including soil liquefaction under earthquakes. Johari et al. [43] developed a predictive equation for the soil–water characteristic curve using GP. Rezania and Javadi [44] used GP to predict the settlement of shallow foundations.

As shown above, most ANN and GP studies have focused on developing models to predict seismic liquefaction potential from in situ tests and seismic records, whereas studies on predicting the static liquefaction susceptibility of clean sandy soil are rare. Moreover, datasets that include laboratory test parameters may offer another way to understand the fundamental static loading response of soils. Therefore, in the present work, two artificial intelligence (AI) approaches are used to evaluate the static liquefaction susceptibility of sandy soils based on the results of undrained monotonic triaxial tests on clean sands with different initial characteristics, carefully collected from previously published work. The first approach is ANN, which is used to predict the ratio qmin/qpeak from various combinations of input data. The second is symbolic regression via genetic programming, performed with the HeuristicLab software, which correlates qmin/qpeak with the initial soil parameters using the inputs of the ANN model that gave the best estimation. In geotechnical practice, the outcomes of this study can be used for preliminary estimates of qmin/qpeak for clean sandy soils before a final liquefaction evaluation.

Database of Undrained Monotonic Triaxial Tests

A database of the results of both anisotropically and isotropically consolidated static triaxial tests on clean sands was compiled from previously published research. Results of experimental studies by Murthy et al. [45], Yamamuro and Lade [4], Jafarian et al. [3], Della et al. [46, 47], Rahman [17], Verdugo and Ishihara [48], Belhouari et al. [49], and Yang and Wei [50] were used. Table 1 summarises the key characteristics of these laboratory tests. According to Table 1, most samples were prepared by moist tamping, while a few were prepared by other methods. Moist tamping is widely used in previous studies because it allows a wide range of relative densities to be obtained. Although the tests were performed under different conditions, two significant parameters, relative density and initial effective confining pressure, were varied extensively. The static liquefaction database comprises the results of 135 undrained static triaxial tests and relates qmin/qpeak to different initial characteristics of clean sandy soils. The coefficient of uniformity (Cu), mean diameter (D50), maximum void ratio (emax), minimum void ratio (emin), void ratio (e), relative density (Dr), initial effective confining pressure (σ′3c), ratio of initial shear stress to initial effective confining pressure (α), and Skempton's coefficient B were selected because previous research has identified them as the primary factors that may affect the static behaviour of sandy soil. In this database, and accordingly in this study, complete static liquefaction is defined by qmin/qpeak = 0, while qmin/qpeak = 1 corresponds to dilative behaviour and values between 0 and 1 correspond to limited liquefaction.
Table 2 presents the statistical distribution of the input parameters.
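The classification criterion of the database can be written as a small function. The tolerance tol is an assumption introduced here, because measured ratios are rarely exactly 0 or 1; the database description does not specify one.

```python
def classify_behaviour(ratio: float, tol: float = 1e-3) -> str:
    """Map qmin/qpeak to the behaviour classes used in the database.

    tol is a hypothetical tolerance: ratios within tol of 0 or 1 are treated
    as exactly 0 or 1.
    """
    if ratio < -tol or ratio > 1.0 + tol:
        raise ValueError("qmin/qpeak must lie between 0 and 1")
    if ratio <= tol:
        return "complete static liquefaction"
    if ratio >= 1.0 - tol:
        return "dilative"
    return "limited liquefaction"


# Hypothetical ratios, for illustration only.
for r in (0.0, 0.35, 1.0):
    print(r, classify_behaviour(r))
```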

Table 1 Properties of sands that are used in the database
Table 2 The statistical distribution of each parameter in the database

Artificial Neural Network (ANN) Models

An artificial neural network (ANN) is a machine learning approach designed to imitate the human central nervous system, especially the brain, in order to model complex engineering problems. ANNs can learn from the data given to them, generalise the learned interrelationships to new cases, and update themselves [51, 52]. Detailed information on ANNs is given in [53, 54]. In the present work, nine parameters were used as inputs to the ANN models and a single variable as the output. Training a neural network consists of adjusting the connection weights until the difference between the predicted and actual outputs is minimised. The ANN models were generated with MATLAB's Neural Network Toolbox, using a two-layer feed-forward network trained with the Levenberg–Marquardt back-propagation algorithm and seven hidden neurones. The number of hidden neurones was chosen after many trials as the one giving the minimum root mean square error (RMSE). All data were normalised with MATLAB's normalisation function, as required for ANN modelling. The dataset was split into training, testing, and validation sets comprising 70, 15, and 15% of the data, respectively. The training data are used to adjust the connection weights, the testing set is used to avoid overfitting, and the validation set is used to examine the predictive ability of the model. A model is considered optimal if it satisfies three conditions: (1) good performance on the testing set, (2) a minimum number of hidden neurones, and (3) good performance on the training, testing, and validation sets. The performance of the ANN models was assessed with the RMSE and the coefficient of determination (R2) as follows:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - x_{i}\right)^{2}} \tag{2}$$
$$R^{2} = \left[\frac{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)\left(y_{i} - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^{2}\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}}}\right]^{2} \tag{3}$$

where xi are the measured (observed) values, yi are the model estimates, n is the number of data points, and \(\bar{x}\) and \(\bar{y}\) are the mean values of the observed data and the model estimates, respectively. The best ANN performance was reached by trial and error until a coefficient of determination R2 of around 0.90 was achieved. After training and testing, a sample dataset was used to check the accuracy of the model. Nine parameters are used as inputs to the ANN models: Cu, D50, emax, emin, e, Dr, σ′3c, α, and B. These inputs were chosen according to previous experimental studies showing that the static behaviour of sandy soil depends strongly on physical properties, relative density, initial effective confining pressure, degree of saturation, and consolidation type. The effect of the sample preparation method was not included in the input dataset because most samples used in the present work were deposited by moist tamping, as listed in Table 1. With qmin/qpeak as the output, the number of input variables was varied to identify which inputs are most useful for estimating the liquefaction susceptibility of sandy soil, and five ANN models were investigated. Figures 1 and 2 compare the measured and ANN-predicted qmin/qpeak for the overall dataset, while Table 3 presents R2 and RMSE for the overall, training, testing, and validation data of each model. Table 3 also shows that the statistical measures for the testing and validation sets are identical in all models. Figs. 1, 2 and Table 3 show that model 1, which uses all input variables, gives good predictions, with the highest R2 values of 0.865, 0.846, and 0.864 for the overall, training, and testing data, respectively.
The lowest R2 for the overall data (0.733) and the training set (0.722) was obtained with model 4, in which α and B are eliminated, while the lowest R2 for the testing set was obtained with model 5. In the present study, the efficiency of the five models was evaluated in terms of the testing dataset, following Muduli and Das [38] and Das and Basudhar [36], who stated that performance on the testing dataset should be used to evaluate the efficiency of different ANN models. On this basis, model 1, with a testing R2 of 0.864, performed best, indicating good agreement between the measured and predicted qmin/qpeak. In contrast, model 5 gave the lowest testing R2 (0.707), reflecting some difficulty in predicting qmin/qpeak. Fig. 1c and Table 3 also show a slight reduction in the testing and overall R2 for model 3, in which Skempton's coefficient B is eliminated, compared with model 1. This may be because almost all tests were fully saturated, with B values above 0.95. The significant reduction in testing R2 (0.707) for model 5 indicates the strong effect of emax, emin, and α on the liquefaction susceptibility of sandy soils. The results of models 1–4 indicate the significant effect of Cu, D50, emax, emin, Dr, and σ′3c on the undrained static behaviour of sandy soils. A similar conclusion was reported by Young-Su and Byung-Tak [22], who found that using data including Cu, D50, emax, emin, Dr, and σ′3c improved the ability of an ANN model to capture the liquefaction resistance ratio of sandy soils. Banimahd et al. [23] also reported that relative density, effective confining pressure, fines content, and fines shape have a profound effect on the ability of ANN models to predict the undrained static stress–strain behaviour and excess pore water pressure of sandy soils.
Figure 2c compares the five ANN models implemented in the present work; their results are quite close, indicating good predictive performance for all of the ANN models.
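The data preparation and evaluation steps described above (normalisation, a 70/15/15 split, and Eqs. (2) and (3)) can be sketched in plain Python. This is not the authors' MATLAB script: it assumes, as in MATLAB's default mapminmax, that each variable is scaled linearly to [−1, 1], and it uses a random split with a hypothetical seed; the function names are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]


def minmax_scale(rows: List[Row], lo: float = -1.0, hi: float = 1.0) -> List[Row]:
    """Scale each column linearly to [lo, hi], the idea behind MATLAB's mapminmax."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    scaled = []
    for row in rows:
        new_row = []
        for x, mn, mx in zip(row, mins, maxs):
            span = mx - mn
            # A constant column carries no information; map it to the midpoint.
            new_row.append((lo + hi) / 2 if span == 0 else lo + (hi - lo) * (x - mn) / span)
        scaled.append(new_row)
    return scaled


def split_70_15_15(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly assign sample indices to training (70%), testing (15%) and validation (15%)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * n)
    n_test = int(0.15 * n)
    # Any rounding remainder goes to the validation set.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, Eq. (2): x = measured, y = predicted."""
    n = len(measured)
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(measured, predicted)) / n)


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and predicted values, Eq. (3)."""
    n = len(measured)
    x_bar = sum(measured) / n
    y_bar = sum(predicted) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(measured, predicted))
    sxx = sum((x - x_bar) ** 2 for x in measured)
    syy = sum((y - y_bar) ** 2 for y in predicted)
    return (sxy / math.sqrt(sxx * syy)) ** 2


train_idx, test_idx, val_idx = split_70_15_15(135)  # 135 tests in the database
print(len(train_idx), len(test_idx), len(val_idx))
# Hypothetical values: predictions twice the measured ratio still give R2 = 1.
print(r_squared([0.0, 0.5, 1.0], [0.0, 1.0, 2.0]))  # 1.0
print(rmse([0.0, 0.5, 1.0], [0.0, 1.0, 2.0]))
```

The last two lines illustrate why both measures are reported: Eq. (3) is the squared Pearson correlation, so it is blind to a constant bias or scale error that RMSE does reveal. In practice the column minima and maxima used by the scaling step should be stored so that new inputs, such as those of a later parametric study, are scaled with the same constants.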

Fig. 1

Measured versus predicted qmin/qpeak for the ANN models: a model 1; b model 2 and c model 3

Fig. 2

Measured versus predicted qmin/qpeak for the ANN models: a model 4; b model 5 and c all ANN models

Table 3 The performance and details of the ANN models

Genetic Programming Method

In the present work, the HeuristicLab software was used to develop a functional relationship for qmin/qpeak of clean sandy soils based on symbolic regression via genetic programming (GP). Symbolic regression is a data mining approach used to extract hidden, meaningful relationships from input data. It relies on a tree-based GP system to develop mathematical equations. The structure and operation of GP have been described by numerous authors, e.g. Koza [55], Muduli and Das [37, 38], Johari et al. [43], and Rezania and Javadi [44]. HeuristicLab has been employed for some engineering problems, but its use in soil mechanics and foundation engineering is still rare. HeuristicLab is open-source software based on heuristic and evolutionary algorithms, developed since 2002 by the Heuristic and Evolutionary Algorithms Laboratory (HEAL) at the University of Applied Sciences Upper Austria. It is written in C# and based on the Microsoft .NET framework [56], and it provides a rich graphical user interface. HeuristicLab supports a broad range of algorithms, such as genetic algorithms, Gaussian process regression and classification, and neural network regression and classification, as well as many problem types, such as the artificial ant problem, classification, clustering, symbolic regression, and symbolic classification. One of its features is the ability to simplify a complex model by pruning it to find a good balance between complexity and accuracy [56]. For more details about HeuristicLab, see [56]. The inputs were chosen based on the best ANN results described in the previous section, which showed that model 1, with nine input variables, gives the best prediction with the highest R2. Nine parameters were therefore used as GP inputs:
Cu, D50, emax, emin, e, Dr, σ′3c, α, and B, with qmin/qpeak as the single output. The input data were loaded into the software, and symbolic regression by GP was then performed with the settings listed in Table 4.
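To make the idea of a tree-based GP individual concrete, the sketch below encodes a candidate model as a nested tuple and evaluates it for one set of inputs. It illustrates the general concept only and is not HeuristicLab's internal representation: the node encoding, the operator set, and the example tree are hypothetical, and the tree is not Eq. (4). Many GP systems use protected division and logarithm to avoid run-time errors; this sketch does not.

```python
import math
from typing import Dict, Tuple

# Hypothetical encoding: ("const", value), ("var", name), (op, child) or (op, left, right).
Node = Tuple

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"exp": math.exp, "log": math.log}


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree for one set of input variables."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind in UNARY:
        return UNARY[kind](evaluate(node[1], env))
    return BINARY[kind](evaluate(node[1], env), evaluate(node[2], env))


def size(node: Node) -> int:
    """Number of nodes in the tree, a simple measure of model complexity."""
    if node[0] in ("const", "var"):
        return 1
    return 1 + sum(size(child) for child in node[1:])


# Hypothetical tree for 0.5 * Dr + 0.1 * sigma3c (not Eq. 4).
tree = ("+", ("*", ("const", 0.5), ("var", "Dr")),
             ("*", ("const", 0.1), ("var", "sigma3c")))
print(evaluate(tree, {"Dr": 0.4, "sigma3c": 1.0}))
print(size(tree))
```

During a GP run, a population of such trees is scored against the training data, and crossover and mutation swap or replace subtrees; pruning a tree, as HeuristicLab's simplifier does, reduces its size while trying to preserve accuracy.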

Table 4 Symbolic regression parameters

The data were divided into 67% for training and 33% for testing. After 75 generations, the software reached the model with the highest R2 values for training and testing. The following equation was thus evolved to relate qmin/qpeak of clean sandy soils to the input parameters (α and B do not appear in the final expression):

$$\begin{aligned} q_{{min}} /q_{{peak}} = & (7.314*{\text{EXP}}(19.980/((0.027* - 13.896 - (c_{1} *e_{{min}} + c_{2} *D_{{50}} ))/((6.146* - 0.83/(1.100) - c_{3} *C_{u} /(1.670))))/((6.146*(13.875* - 13.283/(c_{4} *e) \\ \quad + ({\text{EXP}}(((c_{5} *D_{r} + c_{6} *e_{{max}} ) + 5.780)) - (c_{5} *D_{r} + c_{6} *e_{{max}} ))) - {\text{LN(LN(EXP}}((c_{5} *D_{r} + c_{6} *e_{{max}} )))*(c_{7} *D_{r} + c_{8} *\sigma _{{3c}} ^{\prime } )/((c_{9} *D_{{50}} - 6.544))*18.994*(c_{{10}} *D_{r} + - 0.187)))))* - 1.864 + 14.832) \end{aligned} \tag{4}$$

The values of coefficients c1–c10 are listed in Table 5.

Table 5 The coefficients of Eq. (4)

The developed model (Eq. 4) was more sensitive to changes in the physical properties and initial state than to the other parameters. The same trend appeared in the ANN modelling, where performance was affected by changes in Cu, D50, emax, emin, Dr, and σ′3c. This finding is consistent with past studies by Young-Su and Byung-Tak [22] and Banimahd et al. [23]. The impact of Cu, D50, emax, emin, Dr, and σ′3c on the liquefaction susceptibility of sandy soil has also been reported in many experimental studies [3, 4, 6]. The developed equation omits some parameters, namely the ratio of initial shear stress to initial effective confining pressure (α) and Skempton's coefficient B. This is likely because these parameters carry little information in the database: almost all tests were fully saturated and isotropically consolidated, with B close to 1 and α equal to 0. Moreover, Eq. 4 does not account for field factors such as ageing, strain history, cementation, and stratification, because these conditions are difficult to reproduce in laboratory tests. Figure 3 plots the measured qmin/qpeak against the values predicted by Eq. (4). The training and testing data are closely distributed around the line of equality, indicating good predictive ability. The performance of the HeuristicLab model was examined with the RMSE and the coefficient of determination (R2); Table 6 presents both for the training and testing sets. Figure 4 plots the normalized qmin/qpeak (i.e. the ratio of measured to estimated qmin/qpeak) against the estimated qmin/qpeak for all data.
Most normalized values lie close to one, indicating good agreement between measured and predicted values. Figure 5 shows the tree of the developed model. For comparison, the R2 values of ANN model 1 were 0.846 and 0.864 for training and testing, respectively, and those of the GP model were 0.868 and 0.842. The two models therefore agree well in predicting qmin/qpeak.
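The normalized ratio plotted in Fig. 4 is simply measured divided by estimated qmin/qpeak. A minimal sketch of that calculation follows; the input values are hypothetical, and skipping near-zero predictions (controlled by the assumed threshold eps) is a choice made here because the ratio is undefined where the model predicts complete liquefaction.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalized_ratios(measured: Sequence[float], predicted: Sequence[float],
                      eps: float = 1e-9) -> List[float]:
    """Measured/predicted qmin/qpeak; points with a near-zero prediction are skipped."""
    return [m / p for m, p in zip(measured, predicted) if abs(p) > eps]


# Hypothetical values, for illustration only.
measured = [0.10, 0.45, 0.80, 1.00]
predicted = [0.12, 0.40, 0.82, 0.95]
ratios = normalized_ratios(measured, predicted)
# A mean close to 1 and a small spread indicate good agreement.
print(round(mean(ratios), 3), round(pstdev(ratios), 3))
```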

Fig. 3

Measured values of the ratio of qmin/qpeak versus those predicted by the developed model for training and testing datasets

Table 6 Performance of the qmin/qpeak model for the training and testing data sets
Fig. 4

The normalized ratio of qmin/qpeak versus those predicted by the developed model for all datasets

Fig. 5

The tree of the developed model

Parametric Study

Assessing how well a newly proposed model predicts the static liquefaction susceptibility of sandy soils requires comparison with other modelling methods or with experimental results. In the present work, therefore, a parametric study was carried out to verify the ability of Eq. 4 to estimate qmin/qpeak while taking its physical meaning into account. The results of three undrained static triaxial tests were compared with the predictions of Eq. 4. These tests are part of the authors' published work on very loose clean Perth sand samples [57]. Static undrained compression triaxial tests were performed on samples deposited by moist tamping and isotropically consolidated under three confining pressures: 100, 150, and 200 kPa. The sand was clean, with Cu, D50, emax, emin, e, Dr, B, and α equal to 2.235, 0.35, 0.675, 0.544, 0.6615, 10%, 0.95, and zero, respectively. qmin/qpeak was calculated with the GP model (Eq. 4) and obtained from the experimental tests. The comparison in Fig. 6 shows good agreement between the experimental and model results at confining pressures of 100 and 150 kPa, but a slight difference at 200 kPa. This may indicate that the developed equation is better suited to lower confining pressures than to higher ones. In addition, the applicability and validity of the equation depend on the range of the variables in the input data collected from previous studies. The parametric study also showed that qmin/qpeak increased with increasing confining pressure, consistent with many experimental studies showing that the liquefaction susceptibility of very loose samples decreases with increasing relative density and confining pressure.
Thus, within the range of the database, the GP model predicts qmin/qpeak in reasonable agreement with the experimental results.
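As a quick consistency check on the Perth sand inputs, the relative density can be recomputed from the void ratios with the standard definition Dr = (emax − e)/(emax − emin). The function name below is illustrative. The result, about 10.3%, agrees with the reported 10%; it also shows that Dr is derived from e, emax, and emin, so these four inputs are not independent of one another.

```python
def relative_density(e: float, e_max: float, e_min: float) -> float:
    """Relative density Dr = (e_max - e) / (e_max - e_min), as a fraction."""
    return (e_max - e) / (e_max - e_min)


# Void ratios of the Perth sand samples used in the parametric study.
dr = relative_density(e=0.6615, e_max=0.675, e_min=0.544)
print(f"Dr = {dr:.1%}")  # Dr = 10.3%
```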

Fig. 6

The ratio of qmin/qpeak measured by experimental tests and the ratio of qmin/qpeak predicted by the developed model

Summary and Conclusion

Artificial neural network and genetic programming models were developed to predict the static liquefaction susceptibility of sandy soils in terms of qmin/qpeak, using data obtained from previously published work. The dataset included nine input parameters, namely Cu, D50, emax, emin, e, Dr, σ′3c, α, and B, and one target parameter, qmin/qpeak. The ANN results demonstrate that the model using all nine parameters captures the liquefaction susceptibility of soils efficiently, with a coefficient of determination (R2) of 0.864 for the testing set. However, the testing R2 of the ANN model drops to 0.707 when four parameters (emax, emin, α, and B) are eliminated. A new equation for predicting the liquefaction susceptibility of clean sandy soil in terms of qmin/qpeak is proposed using symbolic regression via genetic programming. The results indicate good agreement between the ANN and GP approaches in predicting qmin/qpeak. Although the ANN and GP models predicted qmin/qpeak successfully, they still have limitations, which may stem from the properties and size of the database, the method of sample deposition, the software used, and the type of regression analysis. The findings of this work should therefore be applied with care, taking these limitations into account.