1 Introduction

Liquefaction is one of the most significant earthquake-induced hazards and one of the most common causes of earthquake-related damage. It occurs when loosely packed, water-saturated sediments at or near the ground surface lose strength during intense shaking. During an earthquake, liquefaction underneath structures may cause considerable damage. For instance, the 1964 Niigata earthquake in Japan caused significant liquefaction and structural destruction. In the 1989 Loma Prieta earthquake in California, liquefied fill soils and debris produced considerable subsidence, fracturing, and lateral spreading of the ground surface in San Francisco's Marina neighborhood. Soil liquefaction is the ground failure or loss of strength that causes normally solid soil to behave as a viscous liquid. The phenomenon occurs in unconsolidated soils subjected to seismic shear waves (S-waves) and the resulting ground vibrations. Construction activities such as blasting, soil compaction, and vibroflotation (which uses a vibrating probe to modify the microstructure of the surrounding soil) induce liquefaction deliberately. Poorly drained sandy, silty, and gravelly soils are particularly prone to liquefaction. When an earthquake shakes saturated soil, the water-filled pore spaces collapse, reducing the soil's volume. This raises the pore water pressure between soil grains, which reduces the soil's resistance to shear and causes the soil to behave like a liquid. Liquefied soil deforms quickly, and heavy structures such as buildings may be destroyed when they lose support from underneath. Liquefaction has been widely reported in loose saturated sand deposits (Juang et al. 2003; Owen and Moretti 2011; Pathak and Purandare 2016; Cubrinovski et al. 2018; Mohanty and Patra 2018; Zhang et al. 2018; Sharma et al. 2019; Anderson 2019; Rasouli et al. 2019; Beyzaei et al. 2019).
Due to the severe destruction caused by earthquake-induced liquefaction, researchers are increasingly studying the liquefaction vulnerability of soils. Because the liquefaction mechanism is complex, most liquefaction studies use traditional empirical methods, such as regression, which offer little more than a statistical fit to observed events rather than a clear liquefaction assessment. Pal (2006) applied several machine-learning approaches to Standard Penetration Test (SPT) and Cone Penetration Test (CPT) data, among which Support Vector Machines provided the best liquefaction prediction.

Methods used in practice are based on the SPT and the CPT, and practicing engineers prefer these tests because they are readily available, cost-effective, and easy to perform. The CPT, in particular, has gained popularity and broad acceptance in liquefaction studies because it is known to provide a reliable estimate of the mechanical parameters of sands. A primary advantage of the CPT is the continuous data it produces over the entire depth of the investigated soil layers. CPT measurements are also recognized as more consistent and repeatable than those of other in situ test methods. However, most CPT-based methods are empirical performance functions established from field observations during earthquake events (Baez et al. 2000; Andrus et al. 2003; Juang et al. 2003; Samui 2007; Samui and Sitharam 2011; Zhao and Cai 2015; Setiawan et al. 2018). Liquefaction susceptibility is indexed by the factor of safety (FS), defined (Idriss and Boulanger 2008) as

$$\text{FS} = \frac{{\text{CRR}}_{\text{M}}}{{\text{CSR}}_{\text{M}}}$$
(1)

where \({\mathrm{CRR}}_{\mathrm{M}}\) is the cyclic resistance ratio at earthquake magnitude M and \({\mathrm{CSR}}_{\mathrm{M}}\) is the cyclic stress ratio at earthquake magnitude M.

CRR is equal to the CSR that induces liquefaction in a particular soil, and it was introduced as a function of CPT parameters (primarily qc in different forms) or SPT values (Seed and Idriss 1971; Seed 1975). Seed and Idriss (1971) and Seed et al. (1975) introduced the well-known equation to estimate the cyclic stress ratio (CSR)

$$\text{CSR}_{\text{eq}} = \frac{\tau_{\text{ave}}}{\sigma_{vo}^{\prime}} = 0.65\,\frac{a_{\max}}{g}\cdot \frac{\sigma_{vo}}{\sigma_{vo}^{\prime}} \cdot r_{d}$$
(2)

where \(\tau_{\text{ave}}\) is the average earthquake-induced shear stress, \(\sigma_{vo}^{\prime}\) is the effective vertical stress, \(a_{\max}\) is the maximum horizontal acceleration, \(g\) is the gravitational acceleration, \(\sigma_{vo}\) is the total vertical stress, and \(r_{d}\) is the depth reduction factor that accounts for the flexibility of the soil column. The constant 0.65 converts the peak cyclic shear stress ratio into an equivalent uniform cyclic stress ratio.
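Equations (1) and (2) can be evaluated directly once the site parameters are known. The following is a minimal sketch, not the authors' code: the function names, the input values, and the CRR value are hypothetical and chosen only for illustration, and the depth reduction factor is assumed to be supplied by the user from whichever correlation is adopted.

```python
def cyclic_stress_ratio(a_max_g, sigma_v0, sigma_v0_eff, r_d):
    """Equivalent cyclic stress ratio following Eq. (2).

    a_max_g      : peak horizontal ground acceleration as a fraction of g
    sigma_v0     : total vertical stress (kPa)
    sigma_v0_eff : effective vertical stress (kPa)
    r_d          : depth reduction factor (dimensionless, user supplied)
    """
    if sigma_v0_eff <= 0:
        raise ValueError("effective vertical stress must be positive")
    return 0.65 * a_max_g * (sigma_v0 / sigma_v0_eff) * r_d


def factor_of_safety(crr, csr):
    """Factor of safety against liquefaction following Eq. (1)."""
    if csr <= 0:
        raise ValueError("CSR must be positive")
    return crr / csr


# Hypothetical soil layer; these numbers are illustrative, not from the database.
csr = cyclic_stress_ratio(a_max_g=0.25, sigma_v0=100.0, sigma_v0_eff=60.0, r_d=0.95)
fs = factor_of_safety(crr=0.20, csr=csr)
print(f"CSR = {csr:.3f}, FS = {fs:.2f}")
```

In this illustrative case the resistance is smaller than the demand, so the factor of safety falls below 1.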

Assessing liquefaction susceptibility requires interpreting many parameters obtained from cone penetration testing, in addition to seismic parameters including the cyclic stress ratio (CSR), which measures the seismic demand imposed on a soil, and the cyclic resistance ratio (CRR), which measures the capacity of a soil to resist liquefaction (Youd 2000). Seed and Idriss (1971) suggested incorporating the cyclic stress ratio and the cyclic resistance ratio in assessing liquefaction susceptibility. Since then, several techniques have been developed to evaluate the cyclic resistance ratio (Idriss and Boulanger 2006). Interpreting a large number of parameters, and using many methods to estimate the same parameter, introduces significant uncertainty into the results and conclusions. Artificial neural networks (ANN) are the most widely used method in soil liquefaction investigations and have shown great capability in handling complex nonlinear problems (Mughieda et al. 2009; Stolte and Cox 2019; Javdanian 2019; Sideras 2019; Njock et al. 2020). Artificial intelligence is generally applied to the classification and prediction of such phenomena as an alternative to conventional methods (Hanandeh 2007, 2022a, b; Fang et al. 2018; Bi et al. 2018; Hanandeh et al. 2020a, b; Al Bodour et al. 2022).

In the past few years, there has been increasing interest in supervised machine learning models in the sciences and engineering. The main reason for the success of these models is their ability to approximate a general complex function, provided enough data is available. Moreover, the abundance of methods for collecting data and the availability of resources to process it have also contributed to the popularity of this field. The principal benefit of machine learning methods over conventional methods is their ability to capture nonlinear behavior and interrelations between dependent and independent variables, in addition to their capacity to handle complicated data hierarchies (Goh 1996; Juang et al. 2003; Goh and Goh 2007; Oommen et al. 2010; Samui and Sitharam 2011). This study presents a comparative analysis of supervised machine-learning classifiers for assessing liquefaction potential, that is, for classifying a soil as liquefiable or non-liquefiable under given conditions. Three supervised machine learning classifiers were studied on three parameter-based models: decision tree, support vector machine (SVM), and quadratic discriminant analysis (QDA). These techniques are explained further in the appendix. The three models are compared to determine how well they capture the liquefaction phenomenon and which of them best classifies soils as liquefiable or non-liquefiable.

2 Database

The data used in this study to develop and verify the machine-learning models were collected from Stark and Olson (1995). The database pairs resistance values obtained from CPT testing with field observations of whether the soil liquefied during an earthquake event. The data include 94 case records from liquefied and non-liquefied sites: 53 records in which the soil liquefied and 41 in which it did not. The soils in these locations vary from silty sand to sandy silt. The measured depth of the CPT test varies from 1.3 to 15.1 m. The tip resistance (qc) varies from 0.38 to 20.6 MPa. The measured total stress varies from 31.4 to 290.3 kPa, while the effective stress ranges from 13.9 to 227.5 kPa. The peak horizontal acceleration at the ground surface varies from 0.15g to 0.5g. The CPT data, along with other soil and seismic parameters, were used to train the machine learning (ML) models. The analysis was performed in Python. Each of the three proposed models maps liquefaction occurrence to a set of parameters, as presented in Table 1. A summary of the statistics of the collected data is given in Table 2.

Table 1 Parameters used to determine soil liquefaction susceptibility class for 3 models
Table 2 Basic statistical parameters for data used in developing machine learning model

3 Analysis of machine learning methods/classification

Each of these classifiers is evaluated on the three parametric models described above, which differ in their input parameters. The output of all three models is a single binary label that denotes whether liquefaction occurs (1) or does not occur (0). The goal is to apply these classifiers to the three models in order to use the available field measurements to predict earthquake-induced liquefaction, and to identify the most appropriate classifier for each model. To train the classifiers, one usually starts with a reasonable default set of hyper-parameters. In this study, the initial hyper-parameters were the defaults provided with the Scikit-Learn package. With these initial hyper-parameters, a standard learning curve analysis is performed, and the variance and bias of the curves are examined. After this initial inspection of the learning curves, a model complexity analysis is performed. For the three models, it is important to adjust the hyper-parameters to reduce the likelihood of false-negative classifications, because a false negative (a liquefiable soil predicted as non-liquefiable) potentially has very severe consequences. For this reason, the hyper-parameters are tuned with respect to the recall metric, which is reported alongside precision and accuracy. These metrics are defined as:

$$\text{Recall} = \frac{\text{True positive}}{\text{True positive+False negative}}$$
(3)
$$\text{Precision } = \frac{\text{True positive}}{\text{True positive+False positive}}$$
(4)
$${\text{Accuracy}}=\frac{\text{True positive+True negative}}{\text{True positive+False positive+False negative+True negative}}$$
(5)
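As a worked illustration of Eqs. (3) to (5), the sketch below computes the three metrics from the four counts of a binary confusion matrix. The function name and the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, precision and accuracy from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts do not define all three metrics")
    return {
        "recall": tp / (tp + fn),        # Eq. (3)
        "precision": tp / (tp + fp),     # Eq. (4)
        "accuracy": (tp + tn) / total,   # Eq. (5)
    }


# Hypothetical test set of 29 cases (illustrative counts only).
metrics = classification_metrics(tp=15, fp=2, fn=1, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall ignores false positives, a classifier can reach a recall of 1 while still mislabeling many non-liquefied cases, which is why precision and accuracy are reported alongside it.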

It is also useful to record the classification results in a single matrix called the confusion matrix of a classifier. Specifically, the confusion matrix of a binary classifier is a 2 × 2 matrix that summarizes the prediction results of the classifier: it stores the number, or percentage, of correct and incorrect predictions broken down by class. Finally, it is common to divide the dataset into two subsets: a training part and a testing part. In this study, each dataset was divided into 70% for training and 30% for testing, and the training set was analyzed using tenfold cross-validation. Before any hyperparameter tuning, the recall of the classifiers mentioned above was computed on the testing dataset for each of the three models and is reported in Table 3. These results are improved in later sections when we perform a hyperparameter search.
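The workflow described above (a 70/30 split, tenfold cross-validation on the training part, default hyper-parameters, and a later search scored on recall) might be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than the authors' script: the data are synthetic placeholders generated with `make_classification`, the stratified split and shuffled folds are choices made here, and the parameter grid is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 94 synthetic cases with 4 features (NOT the CPT database).
X, y = make_classification(n_samples=94, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# 70% training / 30% testing; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# The three classifiers with scikit-learn default hyper-parameters.
classifiers = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, clf in classifiers.items():
    # Tenfold cross-validated recall on the training part only.
    cv_recall = cross_val_score(clf, X_train, y_train, cv=cv,
                                scoring="recall").mean()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "cv_recall": cv_recall,
        "test_recall": recall_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),  # rows: true, cols: predicted
    }
    print(name, f"cv recall={cv_recall:.2f}",
          f"test recall={results[name]['test_recall']:.2f}")

# Hyper-parameter search scored on recall (hypothetical grid for the SVM).
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0]},
                    scoring="recall", cv=cv)
grid.fit(X_train, y_train)
print("best SVM parameters:", grid.best_params_)
```

Tuning on recall alone can favor a classifier that labels nearly every case as liquefiable, so the selected model should still be checked against precision and accuracy.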

Table 3 Supervised classification models recall results

The essential terminology used to express these metrics is defined as follows. True negatives and true positives are cases that are predicted correctly. A false positive is a non-liquefied case that is incorrectly predicted as liquefied. A false negative is a liquefied case that is incorrectly predicted as non-liquefied. Precision is the fraction of cases predicted as liquefied that actually liquefied. Recall is the fraction of liquefied cases that the classifier correctly identifies. These metrics, derived from the confusion matrix, were used to evaluate the predictions of the three models.

3.1 Analysis of model 1

The first input data set (input data-1) consists of the Mean Grain Size (D50), Measured CPT Tip Resistance (qc), Earthquake Magnitude (M), and Cyclic Stress Ratio (CSR). The output is a single binary label that denotes whether liquefaction occurs (1) or does not occur (0). As mentioned earlier, the classifier is chosen based on its recall on the testing data. For model-1, Table 4 shows that all three classifiers (SVM, Decision Tree, and QDA) achieve a recall score of 1. To decide on the best classifier, precision and accuracy are therefore examined along with recall, as reported in the table. Table 4 shows that QDA achieves the highest precision and accuracy scores. The confusion matrix for this classifier is shown in Fig. 1.

Table 4 Classification report for model-1
Fig. 1 The confusion matrix for the QDA classifier on Model-1

The model can also be represented as a decision tree that identifies which combinations of conditions are expected to lead to liquefaction. The proposed decision tree may be used to estimate the probability of soil liquefaction. Figure 2 shows the decision tree obtained for liquefaction occurrence for model 1. The results indicate that the predictions are consistent and that the model achieves a high level of accuracy.

Fig. 2 Graphical result of Decision tree design soil liquefaction probability for model 1
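A fitted decision tree can also be inspected as plain-text rules, which shows the conditions that lead to each predicted class. The sketch below is illustrative only: it fits a shallow tree to synthetic placeholder data, the depth limit is an arbitrary choice made here, and the feature names merely mirror the model 1 inputs.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Names mirror the model 1 inputs; the data below are synthetic placeholders.
feature_names = ["D50", "qc", "M", "CSR"]
X, y = make_classification(n_samples=94, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# A shallow tree keeps the printed rules short (depth limit is an assumption).
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each leaf line ends with the predicted class: 1 = liquefaction, 0 = none.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```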

3.2 Analysis of model 2

The second input data set (input data-2) consists of D50, Normalized CPT Tip Resistance (qc−1), M, and CSR. The output is a single binary label that denotes whether liquefaction occurs (1) or does not occur (0). For this model, all three classifiers (QDA, SVM, and Decision Tree) provided identical results on the recall metric. To decide on the best classifier, precision and accuracy are therefore examined along with recall, as reported in Table 5. The most appropriate classifier for this model is the SVM, whose accuracy is 0.91, as shown in Fig. 3, whereas the accuracy of the QDA and Decision Tree classifiers is 0.88, as shown in Fig. 4. In other words, all of these classifiers are equally reliable at avoiding false negatives, but when it comes to predicting an arbitrary example, the SVM outperforms the other two classifiers. Table 5 shows that the SVM achieves the highest precision and accuracy scores. The confusion matrix for this classifier is shown in Fig. 3.

Fig. 3 Confusion matrix of SVM for model-2

Fig. 4 Confusion matrix of QDA and the Decision Tree classifiers for Model-2

Table 5 Classification report for model-2

Figure 5 shows the decision tree obtained for liquefaction occurrence for model 2.

Fig. 5 Graphical result of Decision tree design soil liquefaction probability for model 2

3.3 Analysis of model 3

The third input data set (input data-3) consists of D50, qc−1, M, the Maximum Ground Acceleration (amax), Effective Vertical Overburden Stress, and Total Overburden Stress. The output is a single binary label that denotes whether liquefaction occurs (1) or does not occur (0). Based on the recall results reported in Table 3 alone, the classifier of choice for this model would be the support vector machine. Inspecting the confusion matrix of this classifier in Fig. 7, we observe that while the percentage of false negatives is very low, the percentage of false positives is very high. Hence, this classifier is reliable for avoiding false negatives, but it is unreliable when it predicts a positive example. To decide on the best classifier, precision and accuracy are therefore examined along with recall, as reported in the table. Table 6 shows that the decision tree achieves the highest precision and accuracy scores. The confusion matrix for this classifier is shown in Fig. 6.

Table 6 Classification report for model-3
Fig. 6 Confusion Matrix of Decision Tree on Model 3

The other classifiers that were evaluated and returned relatively good results for this model are the QDA and SVM classifiers. The confusion matrices of these two classifiers are shown in Fig. 7. Observe that the two classifiers give identical confusion matrices. On the other hand, both classifiers outperform the decision tree when it comes to predicting positive examples.

Fig. 7 Confusion matrices for the decision tree and the SVM classifiers

Figure 8 shows the decision tree obtained for liquefaction occurrence for model 3.

Fig. 8 Graphical result of Decision tree design soil liquefaction probability for model 3

3.4 Sensitivity analysis

This section discusses the importance of the features in the supervised classification tasks. The importance of a feature is defined as the increase in the prediction error of a given classifier after perturbing the values of that feature. In other words, feature importance measures how sensitive a classifier is to changes in a certain feature. In this study, only the feature importance with respect to the best classifier for each model is considered. Model-1 was found to be best classified by the QDA classifier. Figure 9 shows the feature importance results for the QDA classifier on Model-1. The length of each bar represents the importance of the feature in the final classification task. The input variables for liquefaction potential prediction, ranked in descending order of importance, were: measured CPT tip resistance, cyclic stress ratio, earthquake magnitude, and mean grain size.

Fig. 9 Feature Importance for QDA on model-1
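The definition of feature importance given in this section resembles permutation importance, in which each feature is shuffled in turn and the resulting drop in score is recorded. The source does not name the exact procedure, so the sketch below is only an assumption about how such a ranking could be produced with scikit-learn. The data are synthetic placeholders and the feature names merely mirror the model 1 inputs.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Names mirror the model 1 inputs; the data below are synthetic placeholders.
feature_names = ["D50", "qc", "M", "CSR"]
X, y = make_classification(n_samples=94, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

clf = QuadraticDiscriminantAnalysis().fit(X_train, y_train)

# Shuffle each feature 30 times and record the mean drop in accuracy,
# which equals the mean increase in classification error.
result = permutation_importance(clf, X_test, y_test, scoring="accuracy",
                                n_repeats=30, random_state=0)

# Rank the features from most to least important.
ranking = sorted(zip(feature_names, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name:>4s}: {importance:+.3f}")
```

A value near zero, or a negative value, suggests that shuffling the feature did not degrade the predictions on the held-out data.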

The SVM was shown to be the best classifier for Model-2 in the previous analysis. Figure 10 shows the feature importance reported for this classifier. The input variables for liquefaction potential prediction, ranked in descending order of importance, were: normalized CPT tip resistance, cyclic stress ratio, earthquake magnitude, and mean grain size.

Fig. 10 Feature importance for SVM on model-2

Model-3 has six input parameters, and the best classifier for this model was the decision tree. Figure 11 shows the feature importance results for this model. The input variables for liquefaction potential prediction, ranked in descending order of importance, were: mean grain size, total overburden stress, effective overburden stress, earthquake magnitude, measured CPT tip resistance, and peak acceleration at the ground surface.

Fig. 11 Feature importance for Decision tree on Model-3

4 Results and Discussion

QDA performs better than the decision tree and the support vector machine, since QDA yields a lower output error. Model 1 with QDA achieves a recall of 100% on the experimental data; additionally, the measured CPT tip resistance influences the model 1 output more than the other input parameters. Model 2 with the SVM method provides a recall of 0.98 on the experimental data, and the normalized CPT tip resistance influences its output more than the other input parameters. For model 3, the decision tree method provides better results than the other machine learning methods, and the measured CPT tip resistance influences the output more than the other input parameters. To predict liquefaction occurrence, three models with different input variables were presented and discussed above. Model 1 includes four input variables: the Cyclic Stress Ratio (CSR), the mean grain size (D50), the earthquake magnitude M, and the measured CPT tip resistance. Model 2 also includes four input variables: the mean grain size, the earthquake magnitude M, the normalized CPT tip resistance, and the CSR. Model 2 therefore differs from model 1 by replacing the measured tip resistance with the normalized cone tip resistance. The results of the analysis show that the predicted liquefaction outcomes closely match the observed liquefaction in the proposed models.

5 Deployment of the Models

In practice, a trained model is deployed by loading it and executing it on a newly available data point. All models in this manuscript were trained using the scikit-learn package. The final trained models explained in Sect. 3 are available online at the following URL: https://www.dropbox.com/sh/iv3rv8azilbfsup/AAAhgYplOYiefX4z1pd7Fxjia?dl=0. The URL also contains instructions on installing Python and the scikit-learn package. Furthermore, all available models are saved in “joblib” format, which is commonly used to save trained scikit-learn models. To deploy these models, the following steps can be followed:

  • After starting Python and importing the loader (from joblib import load), the classifier can be loaded with the command clf = load('filename.joblib'), where filename is the name of the model available at the URL provided above.

  • Given a new point x obtained from field measurements, the classifier can be applied with the command y = clf.predict(x), where x is a two-dimensional array with one row per data point and one column per input parameter of the model. The returned y is the predicted label: liquefaction (1) or no liquefaction (0).
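The two steps above can be sketched end to end as follows. Because the published model files are not reproduced here, the example first saves a stand-in classifier trained on synthetic data; the file name, the feature order, and the measurement values are all hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Stand-in for a downloaded model: train on synthetic data and save it.
X, y = make_classification(n_samples=94, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
path = os.path.join(tempfile.mkdtemp(), "example_model.joblib")  # hypothetical name
dump(QuadraticDiscriminantAnalysis().fit(X, y), path)

# Step 1: load the trained classifier from disk.
clf = load(path)

# Step 2: predict for one new point. predict() expects a 2-D array with
# one row per point; the column order must match the order used in training.
x_new = np.array([[0.2, 5.0, 7.5, 0.25]])  # hypothetical [D50, qc, M, CSR]
y_new = clf.predict(x_new)

label = "liquefaction" if y_new[0] == 1 else "no liquefaction"
print(label)
```

Since joblib files are based on pickle, they should only be loaded from trusted sources, ideally with the same scikit-learn version that was used to save them.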

6 Conclusion

Liquefaction of saturated sandy soil is an important topic in geotechnical design. The CPT has proven to be a powerful method for soil exploration and for analyzing various features of soil response. In this study, machine-learning methods were used to predict liquefaction occurrence in soil from CPT information. Three models with different input parameters were evaluated using three machine learning classifiers. The results show that Model-1 performs best among the three models and that, overall, QDA provides better performance than the other classifiers. For model-1, QDA achieves a score of 1 on the recall metric, 0.97 on the accuracy metric, and 0.94 on the precision metric. Model-2 was best described using SVM (with a recall score of 0.90). Model-3 was best described using a decision tree (with a recall score of 0.95). In all models, the recall metric alone was not sufficient to decide the best classifier, as some classifiers performed equally well with respect to this metric; other metrics such as precision and accuracy were therefore used to reach the final decision. Using the trained models described in Sect. 5, engineers can apply the three proposed models as reliable and practical tools to evaluate soil liquefaction susceptibility without resorting to conventional calculation methods based on charts, equations, and tables. The findings indicate that machine learning methods are effective for predicting liquefaction events.