Abstract
This study systematically presents the application of machine learning (ML) algorithms for constructing a constitutive model for soils. A genetic algorithm is integrated with ML algorithms to determine the global optimum model, and the k-fold cross-validation method is used to enhance the models’ robustness. Three typical ML algorithms with formulations explicitly expressed [i.e., back-propagation neural network (BPNN), extreme learning machine (ELM) and evolutionary polynomial regression (EPR)], and two modelling strategies (i.e. total or incremental stress–strain strategies) are used. A synthetic database is first generated based on a simple constitutive model to objectively evaluate the performance of three ML algorithms and two modelling strategies. Next, the optimum ML algorithm and the well evaluated modelling strategy are applied to experimental tests for examining its robustness. All results indicate that a BPNN-based constitutive model using the incremental stress–strain strategy performs best in modelling the mechanical behaviour of soils in terms of interpolation and extrapolation abilities, followed by ELM and then EPR.
1 Introduction
Experimental investigations show that the mechanical behaviour of soils is highly complex, involving features such as state-dependence [56], contraction-dilation [57], anisotropy [72], destructuration [41, 74], stress-path dependence [21], time-dependence [75], and non-coaxiality [59]. Accurate description of such soil behaviours is vitally important in engineering practice [33, 46, 66, 89]. Numerous constitutive models have been developed during the past few decades. These models can be classified as (1) linear-elastic, (2) elastic perfectly plastic (such as the Mohr–Coulomb model), (3) nonlinear (such as the hardening soil [62] and nonlinear Mohr–Coulomb [28] models), (4) critical state–based advanced models (such as the modified cam-clay model [53], Nor-Sand model [25], CSAM model [82], Severn–Trent model [11], UH models [68, 63, 70], SANISAND model [58], SIMSAND model [26, 25, 28] and ANICREEP model [80]), hypoplasticity models [36, 42, 64, 65] and (5) micromechanical models [4, 67, 76, 71, 72, 79]. The last two categories are usually called advanced soil models [28, 80]. However, traditional soil models have three main disadvantages in modelling soil behaviours: (1) most constitutive models were developed based on certain assumptions [71, 72, 75] (e.g., the associated or non-associated flow rule, non-coaxiality); (2) each model is suitable only for a specific type of soil or specific stress paths; and (3) although the mathematical formulas in a constitutive model are developed based on some theories (e.g., elastoplasticity theory) or derived from finite experimental data (e.g., the critical state line from triaxial tests), the form of the formulas gives good accuracy for selected tests but at the same time limits the model’s ability to simulate other stress paths. For example, the Modified Cam-Clay (MCC) model was derived from triaxial tests on saturated remoulded clay, and it therefore has difficulty predicting other kinds of tests or other soils.
In addition, the mathematical formulas become increasingly complicated when involving many parameters, resulting in difficulties of parameter identification and further limiting their engineering applications.
Soil normally exhibits highly nonlinear characteristics. To simulate such characteristics, machine learning (ML) algorithms are very powerful and can thus be employed as an alternative way to construct data-driven constitutive models [88]. ML algorithms have the following three advantages in developing soil models [86]: (1) ML algorithms can directly extract the stress–strain relationship from the experimental data without making any assumptions [9, 10, 12], and more stable and accurate results can be obtained by ML-based models if the physical mechanism is implied in the training data and/or incorporated into the training process; (2) ML algorithms have a strong ability to capture complicated non-linear relationships [1, 5, 6, 17]; and (3) the prediction accuracy of ML-based models can improve as the datasets grow [83, 87]. Numerous ML-based soil models have already been developed, and they can be categorized according to the model’s training strategy: (1) training models using the total values of stress and strain or (2) training models in incremental form [38]. However, no comparative study has yet examined which strategy is more suitable for developing ML-based models of soil behaviour. Accordingly, the performance of the two stress–strain strategies in developing ML-based constitutive models deserves investigation.
To construct an ML-based soil model, myriad ML algorithms can be adopted, such as a back-propagation neural network (BPNN) [2, 16, 19, 49, 51, 61], evolutionary neural network (ENN) [32], recurrent neural network (RNN) [52, 90], support vector machines (SVMs) [35], evolutionary polynomial regression (EPR) [8, 24, 45] and genetic programming (GP) [3]. To find an ML algorithm that efficiently models the stress–strain relationship of soils, a comparison of the performance of different ML algorithms is needed. Furthermore, the performance of an ML-based constitutive model is usually evaluated using testing data within the range of the training data (interpolation ability), but this strategy neglects its performance on data outside that range (extrapolation ability).
This study aims to comprehensively demonstrate the process of constructing an ML-based constitutive model. To this end, three representative ML algorithms that can give explicit expressions (BPNN, extreme learning machine (ELM) and EPR) were selected. The k-fold cross-validation method was employed in the validation phase to enhance the robustness of ML-based constitutive models. A genetic algorithm (GA) was used to optimize parameters for developing the global optimum model. A synthetic database based on a simple shear soil constitutive model was first built to reveal the real capabilities of BPNN, ELM and EPR in modelling soil behaviours, including their interpolation and extrapolation abilities and the effects of the total and incremental stress–strain strategies. Thereafter, the optimum ML algorithm and modelling strategy were further applied to experimental tests to examine their robustness.
2 Methodology of machine learning
2.1 Back-propagation neural network
In this study, the BPNN denotes a feedforward neural network characterized by propagation of errors from the output layer to find a set of weights and biases that makes the output of the network match the actual value as closely as possible [54]. A BPNN includes an input layer, any number of hidden layers and an output layer, and this framework also determines its performance. For a given framework, the purpose of other hyper-parameters such as the activation function is to further improve the training efficiency or optimize the model. Because this study focuses on simulating the mechanical behaviour of soils, the effect of each hyper-parameter on model performance is not investigated in depth. Herein, the optimum framework of the BPNN-based model is carefully investigated, whereas the remaining hyper-parameters are set to the default values in the Matlab toolbox. Once the hyper-parameters are determined, the weights and biases can be calculated by gradient descent or optimization algorithms. Figure 1a illustrates a typical BPNN with one hidden layer. Taking the numbers of inputs and hidden and output neurons to be r, p and q, respectively, and assuming that there are n datasets in the training set, the output of the hidden and output layers can be expressed as
where X = matrix of input variables (r × n); H = matrix of the hidden layer output (p × n); O = matrix of output variables (q × n); W, V = weights matrix on the connections between input and hidden neurons (p × r) and between hidden and output neurons (q × p), respectively; θ, θo = bias vectors on the connections between input and hidden neurons (p × 1) and between hidden and output neurons (q × 1), respectively; and f, g = activation functions in hidden and output layers, respectively.
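The matrix dimensions listed above are consistent with a forward pass of the form H = f(WX + θ) and O = g(VH + θo). The following is a minimal NumPy sketch of that pass under stated assumptions: the tanh hidden activation and the linear output activation are illustrative choices, and the function name `bpnn_forward` is hypothetical rather than taken from the study.

```python
import numpy as np

def bpnn_forward(X, W, theta, V, theta_o, f=np.tanh, g=lambda z: z):
    """Forward pass of a one-hidden-layer network, one column per sample.

    Assumed activations: tanh in the hidden layer, identity in the output layer.
    """
    H = f(W @ X + theta)      # (p x r)(r x n) + (p x 1) -> (p x n)
    O = g(V @ H + theta_o)    # (q x p)(p x n) + (q x 1) -> (q x n)
    return H, O

# Illustrative sizes only: r inputs, p hidden neurons, q outputs, n samples
rng = np.random.default_rng(0)
r, p, q, n = 4, 3, 1, 5
X = rng.uniform(-1, 1, (r, n))
W, theta = rng.normal(size=(p, r)), rng.normal(size=(p, 1))
V, theta_o = rng.normal(size=(q, p)), rng.normal(size=(q, 1))
H, O = bpnn_forward(X, W, theta, V, theta_o)
```

The bias vectors are broadcast across the n columns, which is why they are stored as column vectors of shape (p × 1) and (q × 1).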
2.2 Extreme learning machine
The ELM is a type of feedforward neural network characterized by a single hidden layer (see Fig. 1b). The only framework hyper-parameter in the ELM is the number of hidden neurons. The weights of the input layer and the biases of the hidden layer are assigned randomly, and the weights connecting the hidden and output layers (β) are determined analytically through a simple generalized inverse operation on the hidden layer output matrix [22], as shown in Eqs. (3)–(4), making the ELM’s learning speed thousands of times faster than that of traditional feedforward networks:
where X = matrix of input variables (r × n), H = matrix of the hidden layer output (p × n), O = matrix of output variables (q × n), W = weights matrix connecting input and hidden neurons (p × r), θ = the bias vector connecting input and hidden neurons (p × 1), β = the weight matrix connecting the hidden and the output layers (q × p) and f = the activation function in the hidden layer.
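With the dimensions given above, the hidden output is H = f(WX + θ) and the output weights follow from the least-squares solution β = O H⁺, where H⁺ is the Moore–Penrose generalized inverse. A minimal sketch is given below; the uniform sampling range for W and θ, the tanh activation, and the names `elm_fit` and `elm_predict` are assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, O, p, rng, f=np.tanh):
    """Fit an ELM with p hidden neurons; X is (r x n), O is (q x n)."""
    r = X.shape[0]
    W = rng.uniform(-1, 1, (p, r))        # random input weights (assumed range)
    theta = rng.uniform(-1, 1, (p, 1))    # random hidden biases (assumed range)
    H = f(W @ X + theta)                  # hidden layer output, (p x n)
    beta = O @ np.linalg.pinv(H)          # (q x n)(n x p) -> (q x p)
    return W, theta, beta

def elm_predict(X, W, theta, beta, f=np.tanh):
    return beta @ f(W @ X + theta)

# Toy regression target, used only to exercise the code
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 50).reshape(1, -1)
O = np.sin(2 * X)
W, theta, beta = elm_fit(X, O, p=10, rng=rng)
err = np.mean((elm_predict(X, W, theta, beta) - O) ** 2)
```

Only β is solved for, so training reduces to a single pseudo-inverse. This is the source of the speed advantage, and it also means the result depends on the random draw of W and θ.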
2.3 Evolutionary polynomial regression
EPR is a genetic programming method characterized by the modelling of a system using a mathematical expression in the form of polynomial structures. Constructing an EPR-based model consists of two phases: (1) structure identification and (2) parameter estimation [14]. During the first phase, optimization algorithms are used to search for symbolic structures, that is, to determine the exponent matrix. In the second phase, the parameter values are estimated by solving a linear least squares (LS) problem. Unlike BPNN and ELM, EPR does not require the training set to be normalized. A typical EPR expression can be formulated as
where y = predicted output, X = matrix of input variables, F = a function constructed by the process, fj (X) = jth transformed variable, aj = an adjustable parameter for the jth term and a0 = an optional bias. fj (X) is determined by the optimization algorithm, and aj and a0 are determined by the LS.
The EPR’s key objective is to identify the number of transformed variables and a combination of vectors of independent input variables. Herein, the transformed variable is obtained via
where xi = ith input variable, k = a total number of input variables and ESm×k = exponent matrix.
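The two EPR phases can be illustrated with a small sketch. It assumes the simplest structure, y = a0 + Σ aj fj(X) with each transformed variable formed as the product of the inputs raised to the exponents in row j of ES. The exponent matrix is fixed by hand here instead of being searched by an optimization algorithm, and the names `epr_terms` and `epr_fit` are hypothetical.

```python
import numpy as np

def epr_terms(X, ES):
    """X: (n x k) samples, ES: (m x k) exponents -> (n x m) transformed variables."""
    # Term j for each sample is prod_i x_i ** ES[j, i]
    return np.prod(X[:, None, :] ** ES[None, :, :], axis=2)

def epr_fit(X, y, ES):
    """Estimate a0 and a_j by linear least squares for a given exponent matrix."""
    Z = np.column_stack([np.ones(len(X)), epr_terms(X, ES)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef                            # coef[0] = a0, coef[1:] = a_j

# Toy data generated from a known polynomial so the fit can be checked
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 2.0], [0.0, 4.0], [2.0, 3.0]])
ES = np.array([[1, 0], [1, 2]])            # terms: x1 and x1 * x2**2
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 0] * X[:, 1] ** 2
coef = epr_fit(X, y, ES)
```

Because the parameters enter linearly once ES is fixed, the search only has to explore integer exponent matrices and each candidate is scored with one least-squares solve.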
2.4 Genetic algorithm
GA is a meta-heuristic optimization algorithm inspired by natural evolution [20]. It has been extensively employed in geotechnical engineering for tasks such as identification of constitutive models’ parameters [26, 28, 73, 81], model selection [27], slope [39, 60], embankment [15, 44], tunnelling [37, 40], pile foundation [29, 31] and excavation [30]. In this study, the GA was selected to optimize the initial weights and biases in the BPNN and ELM algorithms and to search for symbolic structures in EPR. In the GA, a population of individuals is first generated. A chromosome based on a coding scheme (real-coded GA) is then employed to represent each individual. After the loss value of each individual is calculated, the best individual, that is, the one with the lowest loss value in the population, is selected and then evolves through crossover and mutation operations to generate a new population. The process continues until the termination criterion is met, that is, until the maximum number of generations is reached and the loss value has converged to a constant.
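The loop described above can be sketched as a minimal real-coded GA. The specific operators used here (truncation selection of the fitter half, arithmetic crossover, Gaussian mutation and elitism) and all parameter values are assumptions chosen for illustration, not the settings reported in the study.

```python
import numpy as np

def real_coded_ga(loss, dim, pop_size=30, generations=100,
                  bounds=(-1.0, 1.0), mutation_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))   # real-coded chromosomes
    best, best_loss = None, np.inf
    for _ in range(generations):
        losses = np.array([loss(ind) for ind in pop])
        order = np.argsort(losses)
        if losses[order[0]] < best_loss:
            best, best_loss = pop[order[0]].copy(), float(losses[order[0]])
        parents = pop[order[: pop_size // 2]]    # keep the fitter half
        # Arithmetic crossover between randomly paired parents
        a = parents[rng.integers(len(parents), size=pop_size)]
        b = parents[rng.integers(len(parents), size=pop_size)]
        w = rng.uniform(size=(pop_size, 1))
        children = w * a + (1 - w) * b
        # Gaussian mutation applied to a random subset of genes
        mask = rng.uniform(size=children.shape) < mutation_rate
        noise = rng.normal(scale=0.1, size=children.shape)
        children = np.clip(children + mask * noise, lo, hi)
        children[0] = best                       # elitism: carry the best forward
        pop = children
    return best, best_loss

# Toy loss with a known minimum at 0.3 in every coordinate
best, best_loss = real_coded_ga(lambda v: np.sum((v - 0.3) ** 2), dim=3)
```

In the setting of this study the chromosome would hold the initial weights and biases (BPNN, ELM) or the exponent matrix (EPR), and `loss` would be the cross-validation loss.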
2.5 K-fold cross-validation
Three phases are involved in the integrated process of constructing an ML model: training, validation and testing. The validation phase seeks to improve the robustness of the model and avoid overfitting. K-fold cross-validation can detect whether overfitting exists after the training of the model is completed. Moreover, cross-validation can also prevent overfitting when it is integrated into the training process as the loss function [50], an approach commonly used in the data-mining field. Currently, the k-fold cross-validation (CV) method is widely used to validate models [55]. In this method, the original training set is randomly divided into k sub-datasets. Herein, k–1 sub-datasets, which form a new sub-training set, are employed to train models, and the performance of the trained model is validated by the remaining sub-dataset. Each sample in the training set thus has an opportunity to train and validate models. k is generally set to 10 [34]; the 10-fold CV method was therefore used in this study.
At each round, the ML model with a fixed set of hyper-parameters was trained ten times, each time on nine of the ten sub-datasets, and its performance was then evaluated by the mean squared error (MSE) on the remaining sub-dataset. Therefore, the loss function in the GA can be expressed as
where \({\overline{y}_{i}}\) = predicted output, yi = actual output, m = the number of datasets in the remaining sub-dataset and k = the number of CV sets.
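Read together with the definitions above, the loss is the MSE on each held-out sub-dataset averaged over the k folds. A minimal sketch of that computation follows; the `fit` and `predict` callables stand in for any of the three ML algorithms, and their signatures are hypothetical.

```python
import numpy as np

def kfold_mse(fit, predict, X, y, k=10, seed=0):
    """Average held-out MSE over k folds; X is (n x features), y is (n,)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)   # random split into k parts
    fold_mse = []
    for j in range(k):
        val = folds[j]                                   # remaining sub-dataset
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        model = fit(X[train], y[train])
        fold_mse.append(np.mean((y[val] - predict(model, X[val])) ** 2))
    return float(np.mean(fold_mse))

# Usage with ordinary least squares as a stand-in model
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
design = lambda A: np.column_stack([np.ones(len(A)), A])
fit = lambda A, b: np.linalg.lstsq(design(A), b, rcond=None)[0]
predict = lambda c, A: design(A) @ c
cv_loss = kfold_mse(fit, predict, X, y)
```

Passing this value to the optimizer as the loss means a candidate is rewarded for generalizing to held-out folds, not for fitting the sub-training set.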
2.6 Evaluation indicators
Two commonly used evaluation indicators—mean absolute error (MAE) and mean absolute percentage error (MAPE)—were used to evaluate the performance of ML models in this study. The combination of MAE and MAPE helps overcome the deficiencies of both, so that both are used extensively to evaluate model performance [5, 84, 85]. Low values of these two indicators indicate that a model has excellent performance. The expressions of MAE and MAPE can be obtained by
where r = actual output value, p = predicted output value and n = the total number of datasets.
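For reference, the standard definitions of the two indicators are MAE = (1/n) Σ |ri − pi| and MAPE = (100%/n) Σ |(ri − pi)/ri|. The sketch below implements these definitions. Note that MAPE is undefined wherever the actual value is zero (for example at the origin of a stress–strain curve); how such points were handled in the study is not stated, so the example simply avoids them.

```python
import numpy as np

def mae(r, p):
    """Mean absolute error between actual values r and predictions p."""
    r, p = np.asarray(r, float), np.asarray(p, float)
    return float(np.mean(np.abs(r - p)))

def mape(r, p):
    """Mean absolute percentage error in percent; requires non-zero r."""
    r, p = np.asarray(r, float), np.asarray(p, float)
    return float(100.0 * np.mean(np.abs((r - p) / r)))

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
mae_value = mae(actual, predicted)     # (10 + 10 + 0) / 3
mape_value = mape(actual, predicted)   # 100 * (0.10 + 0.05 + 0) / 3
```

MAE is expressed in the units of the output and is dominated by large-magnitude points, while MAPE is scale-free but sensitive to small actual values, which is why the two are reported together.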
2.7 Model framework
Figure 2a presents the flowchart for constructing an ML-based constitutive model. This type of data-driven model starts from the collection of datasets with which to form a database. In the ML domain, splitting the data into 80% for training and 20% for testing is a widely acknowledged scheme. Such a split ratio ensures that the ML-based model is well trained and tested, which has been theoretically proved [13]. Therefore, 80% of the data are used to train the model and 20% are used to test it in this study. The total or incremental stress–strain strategy is selected beforehand; thereafter, the corresponding features or input variables can be determined. At the next step, the 10-fold cross-validation method is used to divide the training set into ten subsets for training and validating models. At each round, GA is employed to identify the general parameters of ML algorithms. The hyper-parameters are determined by the trial-and-error method. After determining three optimal constitutive models based on BPNN, ELM and EPR, their performance is compared using the test set.
Figures 2b, c illustrate the schematic view of the total and incremental stress–strain strategy, respectively. In the total stress–strain strategy, the stress in the ith step is affected by the strain at the ith step and the physical parameters (see Eq. (10)). In the incremental stress–strain strategy, the stress at the ith step is affected by the strain, the stress at the (i–1)th step, the strain increment at the ith step and the physical parameters (see Eq. (11)):
where X = [x1, x2,…, xr], the vector of independent variables; σi, σi−1 = stress at the ith and (i–1)th steps; εi, εi–1 = strain at the ith and (i–1)th steps; Δεi = axial strain increment at the ith step; and f = formulation of stress–strain relationship, as determined by the ML algorithms in this study.
It should be noted that in the incremental stress–strain strategy, the predicted stress at the ith step needs to update the input stress variable in real time to predict the stress at the (i + 1)th step. In addition, the strain ε at the (i + 1)th step is updated by the following:
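The recursive use of the incremental model can be sketched as a simple rollout loop: each predicted stress is fed back as an input for the next step, and the strain is advanced by the prescribed increment. The argument order of `step_model` and the linear stand-in model are assumptions for illustration; in practice `step_model` would be the trained BPNN, ELM or EPR.

```python
import numpy as np

def rollout(step_model, d_eps, x, eps0=0.0, sigma0=0.0):
    """Predict a full stress-strain curve from a one-step incremental model."""
    eps, sigma = [eps0], [sigma0]
    for de in d_eps:
        # Stress at step i from strain and stress at step i-1 and the increment
        sigma_next = step_model(eps[-1], sigma[-1], de, x)
        eps.append(eps[-1] + de)       # strain update: previous strain + increment
        sigma.append(sigma_next)       # predicted stress becomes the next input
    return np.array(eps), np.array(sigma)

# Hypothetical stand-in model: linear elastic increment with modulus x
toy_model = lambda e_prev, s_prev, de, x: s_prev + x * de
eps, sigma = rollout(toy_model, d_eps=[0.001] * 10, x=1000.0)
```

Because each step consumes the previous prediction and never a measured value, any one-step error is carried forward, so errors can accumulate along the curve.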
To eliminate the effect of parameter scales on model performance and improve convergence, all datasets need to be pre-processed. Herein, the independent variables (such as σn0 and p) are held constant at their initial values, and the strain and strain increment are manually pre-set and follow a uniform distribution. Because the distributions of the variables differ from each other and do not conform to a Gaussian distribution, the min-max normalization method rather than the standardization method is used in this study, as follows:
where x = actual value of input variables, xmin = minimum value of input variables and xmax = maximum value of input variables. \({\overline{x}_{min}}\) = –1; \({\overline{x}_{max}}\) = 1.
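With the target range of –1 to 1 stated above, min-max normalization maps x to (x̄max − x̄min)(x − xmin)/(xmax − xmin) + x̄min. A minimal sketch with its inverse is shown below; taking xmin and xmax from the training set only is an assumption, and the function names are hypothetical.

```python
import numpy as np

def minmax_scale(x, x_min, x_max, lo=-1.0, hi=1.0):
    """Map x from [x_min, x_max] to [lo, hi]."""
    return (hi - lo) * (x - x_min) / (x_max - x_min) + lo

def minmax_inverse(x_bar, x_min, x_max, lo=-1.0, hi=1.0):
    """Map a scaled value back to physical units."""
    return (x_bar - lo) * (x_max - x_min) / (hi - lo) + x_min

# Example values in kPa, scaled with an assumed training range of 25 to 600 kPa
sigma_n0 = np.array([25.0, 100.0, 600.0])
scaled = minmax_scale(sigma_n0, 25.0, 600.0)
restored = minmax_inverse(scaled, 25.0, 600.0)
```

If the bounds are taken from the training set, inputs outside that range map outside [–1, 1]. With the bounds in this example, 700 kPa would scale to about 1.35, which is one way to see why extrapolation is harder for the networks than interpolation.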
3 ML–based constitutive models using synthetic data
3.1 Synthetic data by a simple soil model
To comprehensively compare the performance of three ML algorithms and two modelling strategies on developing constitutive models, a simple sand shear constitutive model was first used to generate synthetic datasets (see Eq. (14)). The purpose of ML-based constitutive models developed based on synthetic datasets rather than directly based on the experimental data is to eliminate the interference of experimental and measurement errors on the mapping capability of ML algorithms [88]. Moreover, the experimental data tend to be limited and insufficient for comparison of ML algorithms’ performance, whereas the data can be generated infinitely by a theoretical function.
where σn0 = vertical stress; τ = shear stress; γ = shear strain; G = shear modulus, 1000 kPa; and μ = friction angle, tan(π/6).
A total of fourteen curves were generated to develop the ML-based constitutive model. Herein, the shear strain γ ranges from 0 to 10%, and a fixed set of shear strain increments Δγ, including 0.01%, 0.05%, 0.1%, 0.15% and 0.2%, was chosen for ten curves. Each curve consists of 91 data points. Nine curves (σn0 = 25, 50, 100, 200, 250, 300, 400, 500 and 600 kPa) with a total of 819 data points were employed to train the ML-based constitutive model, and the remaining five curves (σn0 = 15, 150, 350, 650 and 700 kPa) were used to test the model.
According to the stress–strain strategy, as mentioned in Eqs. (11)–(12), the vector X of independent variables in this soil model is σn0. As a result, the total and incremental stress–strain strategies have two and four input variables, respectively. Both have one output variable. The corresponding total and incremental stress–strain strategies can be expressed by
where the definitions of τ, γ and Δγ are similar to those of σ, ε and Δε in Eqs. (11)–(12).
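The difference between the two strategies comes down to how one curve is turned into training samples. The sketch below builds the two-input (total) and four-input (incremental) sample arrays from a single curve; the column order is an assumption, and the exponential curve is only a placeholder for shape and is not the constitutive model of Eq. (14).

```python
import numpy as np

def total_samples(gamma, tau, sigma_n0):
    """Inputs [gamma_i, sigma_n0] -> output tau_i."""
    X = np.column_stack([gamma, np.full_like(gamma, sigma_n0)])
    return X, tau

def incremental_samples(gamma, tau, sigma_n0):
    """Inputs [gamma_(i-1), tau_(i-1), d_gamma_i, sigma_n0] -> output tau_i."""
    d_gamma = np.diff(gamma)
    X = np.column_stack([gamma[:-1], tau[:-1], d_gamma,
                         np.full(len(d_gamma), sigma_n0)])
    return X, tau[1:]

# Placeholder curve for demonstration only (not the model used in the study)
gamma = np.linspace(0.0, 0.1, 11)
tau = 50.0 * (1.0 - np.exp(-gamma / 0.02))
Xt, yt = total_samples(gamma, tau, 100.0)
Xi, yi = incremental_samples(gamma, tau, 100.0)
```

A curve with N points yields N total samples but N − 1 incremental samples, since the first point only serves as the initial state.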
3.2 Determination of parameters in ML algorithms
The parameters to be determined in the ML algorithms include hyper-parameters and general parameters. The search space of the hyper-parameters regarding the framework of each ML algorithm is presented in Table 2. A single-layer BPNN was used to construct constitutive models, which is sufficient to capture the stress–strain relationship. Table 3 summarizes several methods of determining the optimal number of hidden neurons. The optimal number of hidden neurons ranged from one to five in the total stress–strain strategy and from one to ten in the incremental stress–strain strategy. Because there is no method for determining the optimal number of neurons and transformed terms in the ELM and EPR, respectively, the numbers of hidden neurons and transformed terms in these two algorithms are increased continuously until further increases no longer improve the model’s performance. In this way the ranges of hyper-parameters in the three ML algorithms can be determined, as shown in Table 2.
In addition to the hyper-parameters, the initial weights and biases in BPNN and ELM as well as the exponent matrix in the EPR were determined using the GA. Note that the values of exponents must be non-negative in the EPR algorithm, because the datasets include the initial stress–strain state (0, 0), at which a term with a negative exponent is undefined. The values of exponents were thus limited to [0, 1, 2, 3]. Table 4 presents the parameter values in GA. Note that BPNN and EPR are set to a maximum of 500 generations, whereas for the ELM, because of its different convergence rate, the figure is 5000.
3.3 Results of the validation set
Figure 3 presents the evolution of the loss value generated by the three types of ML-based constitutive models using the total stress–strain strategy. It can be clearly observed that the convergence rates of BPNN and EPR are much faster than that of ELM. The loss value roughly holds steady once the generation exceeds 350 and 200 in BPNN and EPR, respectively, whereas in ELM it becomes roughly constant only when the generation reaches 4000. From the perspective of the converged loss value, as shown in Fig. 3, the optimal numbers of hidden neurons in BPNN and ELM are four and eight, respectively, and the optimal number of transformed terms in EPR is eleven.
The evolution of loss value generated by three ML algorithms using the incremental stress–strain strategy is shown in Fig. 4. Overall, the convergence rate in three types of ML-based constitutive models using the incremental stress–strain strategy is faster than that using the total stress–strain strategy. The loss value roughly holds steady when the generation reaches 250, 1000 and 100 in the BPNN, ELM and EPR, respectively. From the perspective of the convergent loss value, as shown in Fig. 4, the optimal numbers of hidden neurons in BPNN and ELM are four and ten, respectively, and the optimal number of transformed terms in EPR is eleven. Note that the optimal loss values are much less than those yielded using the total stress–strain strategy.
Another important hyper-parameter used in ELM and BPNN is the activation function. To compare the performance of BPNN and ELM comprehensively, the optimum activation function for each algorithm should be determined. Five commonly used activation functions are applied, as shown in Eq. (18). Figure 5 presents the evolution of loss values generated by BPNN- and ELM-based models (incremental strategy) with the five activation functions. It can be observed that tanh is the optimum activation function in the hidden layers for both BPNN- and ELM-based models; tanh is therefore used as the activation function in the remainder of this study.
3.4 Results of the training set
The optimal hyper-parameters of the three ML algorithms using two stress–strain strategies are determined as heretofore mentioned. Accordingly, three optimal ML-based constitutive models of each stress–strain strategy are constructed based on the training set. Figure 6 presents the predicted stress–strain curves using three optimally trained models on the basis of the total stress–strain strategy, compared with the measured curves. It is clear that the predicted results of BPNN show perfect agreement with the measured curves, whereas the results predicted by ELM and EPR deviate from the measured curves. In particular, the prediction error for the ELM- and EPR-based constitutive models is much larger at the initial stage (0, 0), and these models also yield a large error at the early stage of stress–strain curves, attributable to the principles of these two algorithms as described in Sect. 3.3.
Figure 7 presents the predicted stress–strain curves using three optimal models based on the incremental stress–strain strategy, compared with the measured curves. It can be seen that all three models can accurately capture the stress–strain curves, which indicates that the incremental stress–strain strategy for simulating stress–strain relationship shows a significant improvement.
3.5 Results of the test set
During the last phase, the performance of the ML-based constitutive model is evaluated against the test set, with σn0 in the training set ranging from 25 to 600 kPa. Generally, test datasets are taken within the range of training datasets, so test sets with σn0 = 150 and 350 kPa are taken into consideration. To investigate the ability of the ML-based constitutive model to extrapolate beyond the range of training datasets, test sets for which σn0 = 15, 650 and 700 kPa are also evaluated. Table 5 summarizes the values of indicators for these five test sets. For the interpolated test sets, Fig. 8 presents the results of simulation using three optimal ML-based constitutive models based on the total stress–strain strategy. The predicted stress–strain curve using BPNN largely agrees with the measured curve, and the corresponding MAE and MAPE values are also lower than those produced by ELM and EPR. Notably, ELM and EPR cannot accurately predict initial stress when strain equals zero, and ELM- and EPR-based constitutive models cannot accurately predict the evolution of stress. Figure 9 presents the results of simulation using three optimal ML models based on the incremental stress–strain strategy for the test set. The predicted stress–strain curves using the BPNN-based constitutive model still agree perfectly with the measured curves and outperform the ELM- and EPR-based constitutive models. The ELM-based constitutive model performs better than it does with the total stress–strain strategy, and it also accurately captures the evolution of stress. Nevertheless, the performance of the EPR-based constitutive model changes differently from the others: it perfectly predicts the stress–strain relationship for σn0 = 350 kPa but performs worse in predicting the stress–strain relationship for σn0 = 150 kPa.
Note that prediction performance at the initial stage of ELM- and EPR-based constitutive models is clearly improved from that seen with the total stress–strain strategy. Overall, ML-based constitutive models that use the incremental stress–strain strategy offer reliable performance for interpolated test sets, and a BPNN-based constitutive model exhibits the best performance.
The extrapolated test sets are used to further examine the generalization ability of ML-based models. Figure 10 presents the results of simulation using three optimal ML-based constitutive models based on the total stress–strain strategy. For σn0 = 650 and 700 kPa, three ML-based constitutive models can still capture the stress–strain relationship. The BPNN-based constitutive model performs perfectly, followed by the EPR and ELM-based constitutive models. However, for σn0 = 15 kPa, the predicted stress–strain curve by the BPNN-based constitutive model deviates from the actual stress–strain curve. The results of simulation using three optimal ML-based constitutive models based on the incremental stress–strain strategy are shown in Fig. 11. In addition to stress–strain curves for σn0 = 650 and 700 kPa, it can be observed that the BPNN-based constitutive model’s ability to predict the stress–strain relationship for σn0 = 15 kPa improves significantly. Meanwhile, the performance of the ELM-based constitutive model improves dramatically with lower MAE and MAPE values, whereas the prediction performance of EPR-based constitutive model decreases.
Overall, ML-based constitutive models are better at predicting stress–strain relationships within the range of the training datasets than at extrapolating beyond the range of the training datasets. ML-based constitutive models developed using the incremental stress–strain strategy outperform those developed using the total stress–strain strategy. A BPNN-based constitutive model developed using the incremental stress–strain strategy is thus recommended for describing the stress–strain relationship, because this model makes highly accurate predictions of the stress–strain relationship for the interpolated and extrapolated test sets.
4 ML–based constitutive models using real data
4.1 Database
To investigate ML-based constitutive models’ ability to predict soil behaviour in engineering practice, this study uses datasets from twelve sets of triaxial compression shear tests conducted by [18] on Kaolinite clays having various over-consolidation ratios (OCRs). The results of shear and void ratio behaviour are collected as shown in Fig. 12. Herein, datasets from nine tests having OCRs of 1, 2, 2.25, 2.5, 2.7, 4, 5, 10 and 20 are used to train the model, and the remaining three, with OCRs of 3, 8 and 50, are used to test it.
4.2 Selection of a simulation strategy
According to the previous comparisons, BPNN combined with the incremental stress–strain strategy is used to model the behaviour of Kaolinite clays. According to the incremental stress–strain strategy seen in Eq. (16), the vector X of independent variables is the OCR, and there are two output variables: deviatoric stress q and void ratio e. Accordingly, the ML-based constitutive models for Kaolinite clay can be obtained by
where p i–1, q i–1, ei−1, \(\varepsilon_{1}^{i - 1}\) = mean stress, deviatoric stress, void ratio and axial strain at the (i–1)th steps, respectively; q i, ei, Δ\(\varepsilon_{1}^{i}\) = deviatoric stress, void ratio and axial strain increment at the ith step, respectively; and f, g = formulations of deviatoric stress–strain and void ratio–strain relationships.
Figure 13 presents the framework of the BPNN-based constitutive model for predicting Kaolinite clay behaviour. Note that the predicted deviatoric stress and void ratio at the ith step must update the deviatoric stress and void ratio in real time to predict deviator stress and volumetric strain at the (i + 1)th step. Updates to strain ε at the (i + 1)th step follow Eq. (12). After training, formulations of BPNN-based Kaolinite constitutive models are found and summarized in Appendix.
4.3 Results of simulation
The validation results used to determine the optimal parameters of the BPNN-based Kaolinite constitutive model are not presented, but Appendix A presents the formulation of the optimal BPNN-based Kaolinite clay constitutive model in detail. It can be observed that the optimal number of hidden neurons in BPNN is 8. Figure 14 presents the results of the training set predicted by the optimal BPNN model, compared with the measured results, showing that the BPNN-based constitutive model can accurately capture non-linear deviatoric stress–strain and void ratio–strain relationships.
Figure 15 presents the results of the predicted deviatoric stress–strain and void ratio–strain relationships for the interpolated test set. Because this study uses the recursive simulation strategy, prediction error accumulates gradually with increasing strain. The accumulated error is negligible up to a strain of 20% for simulation of the deviatoric stress–strain relationship. By contrast, the predicted void ratio–strain curve gradually deviates from the actual curve when strain exceeds 10%. Overall, the BPNN-based constitutive model better simulates the deviatoric stress–strain relationship, likely because the void ratio–strain relationship is more complicated than the deviatoric stress–strain relationship. Figure 15 shows that deviatoric stress increases monotonically with strain for all experimental tests, whereas the void ratio–strain relationship differs for different OCR values because of the dilatant behaviour associated with high OCR and the contractive behaviour associated with low OCR.
Figure 16 presents the predicted deviatoric stress–strain and void ratio–strain relationships for the extrapolated test set. In the absence of experimental results for OCR = 22.5, 25, 27.5 and 30, the reasonableness of the predicted curves is inferred from the results for OCR = 20 and 50. For OCR = 50, the deviatoric stress–strain curve predicted by the BPNN-based constitutive model agrees well with the actual curve, although no experimental data are available in the training set beyond OCR = 20. Predicted curves for OCR = 22.5, 25, 27.5 and 30 also suggest reasonable trends (deviatoric stress increasing monotonically with increasing strain and peak deviatoric stress increasing with decreasing OCR), with all results falling between the curves for OCR = 20 and OCR = 50. However, the BPNN-based constitutive model’s predicted void ratio–strain curves clearly deviate from the actual curves when strain exceeds 5%. Overall, the BPNN-based constitutive model performs well (in terms of both interpolation and extrapolation) at simulating actual soil behaviour so long as datasets are sufficient. Moreover, the ability to obtain a simple, explicit function can further extend the application of the BPNN-based constitutive model.
4.4 Comparison of different ML-based models
To further compare the performance of BPNN with that of ELM and EPR in modelling soil behaviours, the latter two ML algorithms are also used to predict the behaviours of Kaolinite clays. It should be noted that the training and test datasets and the modelling framework for ELM and EPR are consistent with those used for BPNN. Herein, the optimum number of hidden neurons in the ELM-based model is identified as 7, and the optimum number of transformed terms in the EPR-based model is identified as 8. For brevity, the process for determining the hyper-parameters of ELM and EPR is not presented in detail.
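The reason the number of hidden neurons is the key hyper-parameter of ELM is that the hidden-layer weights are drawn at random and only the output weights are solved, by least squares. A minimal sketch of that training step follows. Apart from the count of 7 hidden neurons, every setting is an assumption for illustration: the tanh activation, the uniform sampling range, the toy one-dimensional target and the names `elm_fit` and `elm_predict` are not taken from this study.

```python
import numpy as np

def elm_fit(X, Y, n_hidden=7, seed=0):
    """Fit an extreme learning machine with a single hidden layer.

    Hidden weights and biases are random and stay fixed; only the output
    weights `beta` are solved, using the Moore-Penrose pseudo-inverse.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)          # hidden-layer output (assumed tanh activation)
    beta = np.linalg.pinv(H) @ Y    # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy one-dimensional regression target, purely illustrative.
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
Y = np.sin(2.0 * X)

W, b, beta = elm_fit(X, Y, n_hidden=7)
pred = elm_predict(X, W, b, beta)
rmse = float(np.sqrt(np.mean((pred - Y) ** 2)))
```

Since training is a single linear solve, repeating it for several candidate values of `n_hidden` is cheap, which is what makes a search over the hidden-layer size practical.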
The stress–strain relationships predicted on the testing datasets by the optimum ELM-based model are presented in Fig. 17. Figures 17a, b show that the predicted results on the interpolated datasets agree well with the experimental results. In Figs. 17c, d, however, the predicted results on the extrapolated datasets exhibit pronounced strain softening and volumetric contraction, which clearly contradicts the measured results. These results indicate that the ELM-based model can describe known soil behaviours well, but may not be suitable for predicting soil behaviours on unseen datasets. Figure 18 presents the stress–strain relationships predicted on the testing datasets by the optimum EPR-based model. Similar to the results presented in Sect. 3, the prediction errors on both the interpolated and extrapolated datasets are larger than those generated by the BPNN- and ELM-based models. This indicates that the generalization ability of EPR is inferior to that of the neural network-based algorithms, so that EPR has difficulty in modelling complicated soil behaviours. Overall, the generalization ability of BPNN is excellent and it can be used to simulate soil behaviours on both known and unknown datasets.
It should be noted that the ML-based model is a data-driven model, so its application scope can be expanded as the type and information content of the datasets increase. For example, if the database includes data under unloading and different stress paths, the ML-based model can be trained on them and used to simulate soil behaviours under such conditions. Otherwise, more effective physical mechanisms need to be incorporated to refine the ML-based model. Future work will focus on such issues.
5 Conclusions
Determination of soil constitutive models is vitally important to engineering practice. ML algorithms have been used to model soil behaviour, because ML-based constitutive models are free of assumptions and offer strong non-linear mapping capabilities. This study systematically demonstrated the application of ML algorithms for construction of a soil constitutive model, using three commonly used ML algorithms able to present an explicit formulation—BPNN, ELM and EPR—to develop models and comprehensively comparing their modelling performance.
A database based on a simple sand shear constitutive model was first built to objectively reveal the capacity of the three ML algorithms and two modelling strategies to model soil behaviour, with the intent of eliminating potential interference from noise-corrupted experimental data. Although an ML algorithm can learn directly from data, the incremental stress–strain strategy, which takes the loading path into consideration, was more suitable than the total stress–strain strategy for constructing the ML-based constitutive model. The hyper-parameters of ML-based models can be determined by trial and error, and the genetic algorithm can identify the hyper-parameters needed to develop the global optimum model. Applying k-fold cross-validation enhances the robustness of the ML-based model, facilitating its application to engineering practice.
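How k-fold cross-validation scores a candidate hyper-parameter can be sketched as follows. This is a simplified illustration, not the procedure used in this study: an exhaustive search over a small grid replaces the genetic algorithm, a polynomial degree stands in for the hyper-parameter, the target curve is synthetic, and the names `kfold_indices` and `cv_rmse` are hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Split shuffled sample indices into k disjoint validation folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cv_rmse(x, y, degree, k=5):
    """Mean validation RMSE of a polynomial fit over k folds."""
    folds = kfold_indices(len(x), k)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, then validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        resid = np.polyval(coeffs, x[val_idx]) - y[val_idx]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

# Synthetic saturating curve, purely illustrative.
x = np.linspace(0.0, 1.0, 60)
y = 1.0 - np.exp(-5.0 * x)

# Exhaustive search over a small grid (a stand-in for the genetic algorithm).
scores = {d: cv_rmse(x, y, d) for d in range(1, 6)}
best_degree = min(scores, key=scores.get)
```

Each candidate is judged on data it was not fitted to, so the selected value is less sensitive to one particular train/validation split; a genetic algorithm would use the same score as its fitness function while exploring a larger search space.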
Simulation results on theoretical and experimental data indicated that the BPNN-based constitutive model is more stable and accurate than the ELM- and EPR-based constitutive models in both interpolation and extrapolation. Notably, the BPNN-based constitutive model’s predictions of deviatoric stress–strain and void ratio–strain relationships for Kaolinite clay agreed with actual experimental data. Overall, the ML-based constitutive model can directly capture non-linear soil behaviours from limited experimental data without making any assumptions; moreover, an explicit formulation for the constitutive model can be determined, which facilitates its application to numerical analysis and engineering practice.
This study focuses only on the modelling of soil behaviours under monotonic loading. In addition, it explores only a general modelling framework for ML algorithms, and indicates that the neural network-based algorithm is superior to the other types of ML algorithms considered in modelling soil behaviours. With the rapid development of the ML field, more advanced algorithms have been proposed. Future work will investigate the performance of various neural network-based algorithms and determine the most appropriate one for modelling soil behaviours under more complex loading paths.
References
Badawy MF, Msekh MA, Hamdia KM, Steiner MK, Lahmer T, Rabczuk T (2017) Hybrid nonlinear surrogate models for fracture behavior of polymeric nanocomposites. Probabilist Eng Mech 50:64–75
Basheer IA (2000) Selection of methodology for neural network modeling of constitutive hystereses behavior of soils. Comput-Aided Civ Inf 15:440–458
Cabalar AF, Cevik A (2011) Triaxial behavior of sand–mica mixtures using genetic programming. Expert Syst Appl 38(8):10358–10367
Chang CS, Hicher PY (2005) An elasto-plastic model for granular materials with microstructural consideration. Int J Solids Struct 42(14):4258–4277
Chen RP, Zhang P, Kang X, Zhong ZQ, Liu Y, Wu HN (2019) Prediction of maximum surface settlement caused by EPB shield tunneling with ANN methods. Soils Found 59(2):284–295
Chen RP, Zhang P, Wu HN, Wang ZT, Zhong ZQ (2019) Prediction of shield tunneling-induced ground settlement using machine learning techniques. Front Struct Civ Eng 13(6):1363–1378
Ellis GW, Yao C, Zhao R, Penumadu D (1995) Stress-strain modeling of sands using artificial neural networks. J Geotech Eng 121(5):429–435
Faramarzi A, Javadi AA, Alani AM (2012) EPR-based material modelling of soils considering volume changes. Comput Geosci-UK 48:73–85
Feng XT, Chen BR, Yang CX, Zhou H, Ding X (2006) Identification of visco-elastic models for rocks using genetic programming coupled with the modified particle swarm optimization algorithm. Int J Rock Mech Min 43(5):789–801
Feng XT, Li SJ, Liao HJ, Yang CX (2002) Identification of non-linear stress-strain-time relationship of soils using genetic algorithm. Int J Numer Anal Met 26(8):815–830
Gajo A, Wood M (1999) Severn-Trent sand: a kinematic-hardening constitutive model: the q–p formulation. Geotechnique 49(5):595–614
Gao W, Ge M, Chen D, Wang X (2016) Back analysis for rock model surrounding underground roadways in coal mine based on black hole algorithm. Eng Comput-Germany 32(4):675–689
Gholamy A, Kreinovich V, Kosheleva O (2018) Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. Int J Intell Technol Appl Stat 11(2):105–111
Giustolisi O, Savic DA (2006) A symbolic data-driven technique based on evolutionary polynomial regression. J Hydroinform 8(4):235–237
Guo X, Dias D, Carvajal C, Peyras L, Breul P (2018) Reliability analysis of embankment dam sliding stability using the sparse polynomial chaos expansion. Eng Struct 174:295–307
Habibagahi G, Bamdad A (2003) A neural network framework for mechanical behavior of unsaturated soils. Can Geotech J 40(3):684–693
Hasanipanah M, Noorian-Bidgoli M, Jahed Armaghani D, Khamesi H (2016) Feasibility of PSO-ANN model for predicting surface settlement caused by tunneling. Eng Comput-Germany 32(4):705–715
Hattab M, Hicher PY (2004) Dilating behaviour of overconsolidated clay. Soils Found 44(2):27–40
He S, Li J (2009) Modeling nonlinear elastic behavior of reinforced soil using artificial neural networks. Appl Soft Comput 9(3):954–961
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press
Hu X, Zhang Y, Guo L, Wang J, Cai Y, Fu H et al (2018) Cyclic behavior of saturated soft clay under stress path with bidirectional shear stresses. Soil Dyn Earthq Eng 104:319–328
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
Iebeling K, Milton B (1996) Designing a neural network for forecasting financial and economic time series. Neurocomputing 10(3):215–236
Javadi AA, Rezania M (2009) Applications of artificial intelligence and data mining techniques in soil modeling. Geomech Eng 1(1):53–74
Jefferies M (1993) Nor-Sand: a simple critical state model for sand. Geotechnique 43(1):91–103
Jin Y-F, Wu Z-X, Yin Z-Y, Shen JS (2017) Estimation of critical state-related formula in advanced constitutive modeling of granular material. Acta Geotech 12(6):1329–1351
Jin Y-F, Yin Z-Y, Shen S-L, Hicher P-Y (2016) Investigation into MOGA for identifying parameters of a critical-state-based sand model and parameters correlation by factor analysis. Acta Geotech 11(5):1131–1145
Jin Y-F, Yin Z-Y, Shen S-L, Hicher P-Y (2016) Selection of sand models and identification of parameters using an enhanced genetic algorithm. Int J Numer Anal Met 40(8):1219–1240
Jin Y-F, Yin Z-Y, Wu Z-X, Daouadji A (2018) Numerical modeling of pile penetration in silica sands considering the effect of grain breakage. Finite Elem Anal Des 144:15–29
Jin Y-F, Yin Z-Y, Zhou W-H, Huang H-W (2019) Multi-objective optimization-based updating of predictions during excavation. Eng Appl Artif Intel 78:102–123
Jin YF, Yin ZY, Wu ZX, Zhou WH (2018) Identifying parameters of easily crushable sand and application to offshore pile driving. Ocean Eng 154:416–429
Johari A, Javadi AA, Habibagahi G (2011) Modelling the mechanical behaviour of unsaturated soils using a genetic algorithm-based neural network. Comput Geotech 38(1):2–13
Kim J, Hwang W, Kim Y (2018) Effects of hysteresis on hydro-mechanical behavior of unsaturated soil. Eng Geol 245:1–9
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International joint conference on artificial intelligence. Morgan Kaufmann Publishers Inc., pp 1137–1143
Kohestani VR, Hassanlourad M (2016) Modeling the mechanical behavior of carbonate sands using artificial neural networks and support vector machines. Int J Geomech 16(1):04015038
Kolymbas D (1985) A generalized hypoelastic constitutive law. In: Proc XI Int Conf Soil Mechanics and Foundation Engineering, San Francisco. Balkema, Rotterdam, p 2626
Koopialipoor M, Jahed Armaghani D, Haghighi M, Ghaleini EN (2017) A neuro-genetic predictive model to approximate overbreak induced by drilling and blasting operation in tunnels. B Eng Geol Environ:1–10
Lefik M, Schrefler B (2003) Artificial neural network as an incremental non-linear constitutive model for a finite element code. Comput Methods Appl Mech Eng 192(28–30):3265–3283
Liu C, Jiang Z, Han X, Zhou W (2019) Slope displacement prediction using sequential intelligent computing algorithms. Measurement 134:634–648
Liu K, Liu B (2019) Intelligent information-based construction in tunnel engineering based on the GA and CCGPR coupled algorithm. Tunnell Undergr Space Technol 88:113–128
Liu WZ, Shi ML, Miao LC, Xu LR, Zhang DW (2013) Constitutive modeling of the destructuration and anisotropy of natural soft clay. Comput Geotech 51:24–41
Mašín D (2005) A hypoplastic constitutive model for clays. Int J Numer Anal Methods Geomech 29(4):311–336
Masters T (1994) Practical neural network recipes in C++. Academic Press
Müthing N, Zhao C, Hölter R, Schanz T (2018) Settlement prediction for an embankment on soft clay. Comput Geotech 93:87–103
Nassr A, Esmaeili-Falak M, Katebi H, Javadi A (2018) A new approach to modeling the behavior of frozen soils. Eng Geol 246:82–90
Ng CWW, Akinniyi DB, Zhou C, Chiu CF (2019) Comparisons of weathered lateritic, granitic and volcanic soils: Compressibility and shear strength. Eng Geol 249:235–240
Nielsen RH (1987) Kolmogorov’s mapping neural network existence theorem. In: Proceedings of the IEEE first international conference on neural networks, San Diego, CA, USA, pp 11–13
Paola JD (1994) Neural network classification of multispectral imagery. The University of Arizona
Penumadu D, Zhao RD (1999) Triaxial compression behavior of sand and gravel using artificial neural networks (ANN). Comput Geotech 24:207–230
Prechelt L (1998) Automatic early stopping using cross validation quantifying the criteria. Neural Netw 11:761–767
Rashidian V, Hassanlourad M (2014) Application of an artificial neural network for modeling the mechanical behavior of carbonate soils. Int J Geomech 14(1):142–150
Romo MP, García SR, Mendoza MJ, Taboada-Urtuzuástegui V (2001) Recurrent and constructive-algorithm networks for sand behavior modeling. Int J Geomech 1(4):371–387
Roscoe KH, Burland J (1968) On the generalized stress-strain behaviour of wet clay. In: Engineering plasticity. Cambridge University Press, Cambridge, UK, pp 535–609
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(9):533–536
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc C-appl 36(2):111–147
Su D, Yang ZX (2019) Drained analyses of cylindrical cavity expansion in sand incorporating a bounding-surface model with state-dependent dilatancy. Appl Math Model 68:1–20
Su L-J, Yin J-H, Zhou W-H (2010) Influences of overburden pressure and soil dilation on soil nail pull-out resistance. Comput Geotech 37(4):555–564
Taiebat M, Dafalias YF (2008) SANISAND: Simple anisotropic sand plasticity model. Int J Numer Anal Met 32(8):915–948
Tian Y, Yao YP (2017) Modelling the non-coaxiality of soils from the view of cross-anisotropy. Comput Geotech 86:219–229
Tran C, Srokosz P (2010) The idea of PGA stream computations for soil slope stability evaluation. Cr Mecanique 338(9):499–509
Turk G, Logar J, Majes B (2001) Modelling soil behaviour in uniaxial strain conditions by neural networks. Adv Eng Softw 32:805–812
Vermeer P (1978) A double hardening model for sand. Geotechnique 28(4):413–433
Wang C (1994) A theory of generalization in learning machines with neural application. The University of Pennsylvania
Wang S, Wu W, Yin Z-Y, Peng C, He X-Z (2018) Modelling time-dependent behaviour of granular material with hypoplasticity. Int J Numer Anal Methods Geomech 42(12):1331–1345
Wu W, Bauer E, Kolymbas D (1996) Hypoplastic constitutive model with critical state for granular materials. Mech Mater 23(1):45–69
Xie X, Qi S, Zhao F, Wang D (2018) Creep behavior and the microstructural evolution of loess-like soil from Xi’an area, China. Eng Geol 236:43–59
Xiong H, Nicot F, Yin Z (2017) A three-dimensional micromechanically based model. Int J Numer Anal Methods Geomech 41(17):1669–1686
Yao Y, Hou W, Zhou A (2009) UH model: three-dimensional unified hardening model for overconsolidated clays. Geotechnique 59(5):451–469
Yao Y, Sun D, Luo T (2004) A critical state model for sands dependent on stress and density. Int J Numer Anal Methods Geomech 28(4):323–337
Yao Y, Sun D, Matsuoka H (2008) A unified constitutive model for both clay and sand with hardening parameter independent on stress path. Comput Geotech 35(2):210–222
Yao YP, Hou W, Zhou AN (2009) UH model: three-dimensional unified hardening model for overconsolidated clays. Géotechnique 59(5):451–469
Yin Z-Y, Chang CS, Karstunen M, Hicher P-Y (2010) An anisotropic elastic–viscoplastic model for soft clays. Int J Solids Struct 47(5):665–677
Yin Z-Y, Jin Y-F, Shen S-L, Huang H-W (2016) An efficient optimization method for identifying parameters of soft structured clay by an enhanced genetic algorithm and elastic–viscoplastic model. Acta Geotech 12(4):849–867
Yin Z-Y, Karstunen M (2011) Modelling strain-rate-dependency of natural soft clays combined with anisotropy and destructuration. Acta Mech Solida Sin 24(3):216–230
Yin Z-Y, Karstunen M, Chang CS, Koskinen M, Lojander M (2011) Modeling time-dependent behavior of soft sensitive clay. J Geotech Geoenviron Eng 137(11):1103–1113
Yin Z-Y, Zhao J, Hicher P-Y (2014) A micromechanics-based model for sand-silt mixtures. Int J Solids Struct 51(6):1350–1363
Yin ZY, Chang CS (2009) Microstructural modelling of stress-dependent behaviour of clay. Int J Solids Struct 46(6):1373–1388
Yin ZY, Chang CS, Hicher PY (2010) Micromechanical modelling for effect of inherent anisotropy on cyclic behaviour of sand. Int J Solids Struct 47(14–15):1933–1951
Yin ZY, Chang CS, Hicher PY, Karstunen M (2009) Micromechanical analysis of kinematic hardening in natural clay. Int J Plast 25(8):1413–1435
Yin ZY, Chang CS, Karstunen M, Hicher PY (2010) An anisotropic elastic-viscoplastic model for soft clays. Int J Solids Struct 47(5):665–677
Yin ZY, Jin YF, S SJ, Hicher PY (2017) Optimization techniques for identifying soil parameters in geotechnical engineering: comparative study and enhancement. Int J Numer Anal Met 42(1):1–25
Yu H (1998) CASM: a unified state parameter model for clay and sand. Int J Numer Anal Methods Geomech 22(8):621–653
Zhang P, Wu H-N, Chen R-P, Chan THT (2020) Hybrid meta-heuristic and machine learning algorithms for tunneling-induced settlement prediction: a comparative study. Tunnell Undergr Space Technol 99:103383
Zhang P, Yin Z-Y, Jin Y-F, Chan THT (2020) A novel hybrid surrogate intelligent model for creep index prediction based on particle swarm optimization and random forest. Eng Geol 265:105328
Zhang P, Yin Z-Y, Zheng Y, Gao F-P (2020) A LSTM surrogate modelling approach for caisson foundations. Ocean Eng 204:107263
Zhang P, Yin ZY, Jin YF (2021) State-of-the-art review of machine learning applications in constitutive modeling of soils. Arch Comput Method Eng. https://doi.org/10.1007/s11831-020-09524-z
Zhang P, Yin ZY, Jin YF, Chan T, Gao FP (2021) Intelligent modelling of clay compressibility using hybrid meta-heuristic and machine learning algorithms. Geosci Front 12(1):441–452
Zhang P, Yin ZY, Jin YF, Ye GL (2020) An AI-based model for describing cyclic characteristics of granular materials. Int J Numer Anal Met 44(9):1315–1335
Zhou YF, Tham LG, Yan WM, Dai FC, Xu L (2014) Laboratory study on soil behavior in loess slope subjected to infiltration. Eng Geol 183:31–38
Zhu JH, Zaman MM, Anderson SA (1998) Modelling of shearing behaviour of a residual soil with Recurrent Neural Network. Int J Numer Anal Met 22(8):671–687
Acknowledgements
This research was financially supported by the Research Grants Council (RGC) of Hong Kong Special Administrative Region Government (HKSARG) of China (Grant No.: 15209119, R5037-18F); the Joint research project between Sichuan Province and National Universities funded by the Science & Technology Department of Sichuan Province (No. 2019YFSY0015); and the Open research grant of MOE Key Laboratory of High-speed Railway Engineering.
Appendix A. Formulations of BPNN-based Kaolinite constitutive models
where X = [p, q, e, ε1, Δε1] is the matrix of input variables; H is the matrix of the hidden layer output; O = [q, e] is the matrix of output variables; f is the tansig function; and g is the purelin function. Herein,
Zhang, P., Yin, ZY., Jin, YF. et al. Modelling the mechanical behaviour of soils using machine learning algorithms with explicit formulations. Acta Geotech. 17, 1403–1422 (2022). https://doi.org/10.1007/s11440-021-01170-4
DOI: https://doi.org/10.1007/s11440-021-01170-4