1 Introduction

Experimental investigations show that the mechanical behaviour of soils is highly complex, involving features such as state-dependence [56], contraction-dilation [57], anisotropy [72], destructuration [41, 74], stress-path dependence [21], time-dependence [75], and non-coaxiality [59]. Accurate description of such soil behaviours is vitally important in engineering practice [33, 46, 66, 89]. Numerous constitutive models have been developed during the past few decades. These models can be classified as (1) linear-elastic, (2) elastic perfectly plastic (such as the Mohr–Coulomb model), (3) nonlinear (such as the hardening soil [62] and nonlinear Mohr–Coulomb [28] models), (4) critical state–based advanced (such as the modified cam-clay model [53], Nor-Sand model [25], CSAM model [82], Severn–Trent model [11], UH models [68,63,70], SANISAND model [58], SIMSAND model [26,25,28] and ANICREEP model [80]) and hypoplasticity [36, 42, 64, 65] models and (5) micromechanical models [4, 67, 76,71,72,79]. The last two categories are usually called advanced soil models [28, 80]. However, traditional soil models have three main disadvantages in modelling soil behaviours: (1) most constitutive models were developed based on certain assumptions [71, 72, 75] (e.g., the associated or non-associated flow rule, non-coaxiality); (2) each model is suitable only for a specific type of soil or specific stress paths; and (3) although the mathematical formulas in a constitutive model are developed from theory (e.g., elastoplasticity theory) or derived from limited experimental data (e.g., the critical state line from triaxial tests), the chosen form of a formula gives good accuracy for the selected tests but limits the model’s ability to simulate other stress paths. For example, the Modified Cam-Clay (MCC) model was derived from triaxial tests on saturated remoulded clay, and it therefore has difficulty predicting other kinds of tests or other soils. In addition, the mathematical formulas become increasingly complicated as more parameters are involved, which makes parameter identification difficult and further limits engineering applications.

Soil normally exhibits highly nonlinear characteristics. Machine learning (ML) algorithms are well suited to capturing such characteristics and can thus be employed as an alternative way to construct data-driven constitutive models [88]. ML algorithms have the following three advantages in developing soil models [86]: (1) they can directly extract the stress–strain relationship from experimental data without making any assumptions [9, 10, 12], and more stable and accurate results can be obtained if the physical mechanism is implied in the training data and/or incorporated into the training process; (2) they have a strong ability to capture complicated non-linear relationships [1, 5, 6, 17]; and (3) the prediction accuracy of ML-based models can rise as the datasets grow [83, 87]. Numerous ML-based soil models have already been developed, and they can be categorized according to the training strategy: (1) training models using the total values of stress and strain or (2) training models in incremental form [38]. However, no comparative study has yet examined which strategy is more suitable for developing ML-based models of soil behaviour. Accordingly, the performance of the two stress–strain strategies in developing ML-based constitutive models deserves investigation.

To construct an ML-based soil model, many ML algorithms can be adopted, such as a back-propagation neural network (BPNN) [2, 16, 19, 49, 51, 61], evolutionary neural network (ENN) [32], recurrent neural network (RNN) [52, 90], support vector machines (SVMs) [35], evolutionary polynomial regression (EPR) [8, 24, 45] and genetic programming (GP) [3]. To find an ML algorithm that efficiently models the stress–strain relationship of soils, a comparison of the performance of different ML algorithms is needed. Furthermore, the performance of an ML-based constitutive model is usually evaluated using testing data within the range of the training data (interpolation ability), but this strategy neglects its performance on unseen data outside that range (extrapolation ability).

This study aims to comprehensively demonstrate the process of constructing an ML-based constitutive model. To this end, three representative ML algorithms that can give explicit expressions (BPNN, extreme learning machine (ELM) and EPR) were selected. The k-fold cross-validation method was employed in the validation phase to enhance the robustness of ML-based constitutive models. A genetic algorithm (GA) was used to optimize parameters for developing the global optimum model. A synthetic database was first built from a simple shear constitutive model of soil to reveal the real capabilities of BPNN, ELM and EPR in modelling soil behaviours, including their interpolation and extrapolation abilities and the effects of the total and incremental stress–strain strategies. Thereafter, the optimum ML algorithm and modelling strategy were applied to experimental tests to examine their robustness.

2 Methodology of machine learning

2.1 Back-propagation neural network

In this study, the BPNN denotes a feedforward neural network in which errors are propagated backwards from the output layer to find a set of weights and biases that brings the network output as close as possible to the actual value [54]. A BPNN includes an input layer, any number of hidden layers and an output layer; this framework largely determines its performance. For a given framework, other hyper-parameters such as the activation function serve to further improve training efficiency or optimize the model. Because this study focuses on simulating the mechanical behaviour of soils, the effect of each hyper-parameter on model performance is not investigated in depth. Herein, the optimum framework of the BPNN-based model is carefully investigated, whereas the remaining hyper-parameters are set to the default values of the Matlab toolbox. Once the hyper-parameters are determined, the weights and biases can be calculated by gradient descent or optimization algorithms. Figure 1a illustrates a typical BPNN with one hidden layer. Taking the numbers of input, hidden and output neurons to be r, p and q, respectively, and assuming that there are n datasets in the training set, the outputs of the hidden and output layers can be expressed as

$${\mathbf{H}} = f\left( {{\mathbf{WX}} + {\varvec\theta }} \right)$$
(1)
$${\mathbf{O}} = g\left( {{\mathbf{VH}} + {\varvec\theta }_{\rm o} } \right)$$
(2)

where X = matrix of input variables (r × n); H = matrix of the hidden layer output (p × n); O = matrix of output variables (q × n); W, V = weights matrix on the connections between input and hidden neurons (p × r) and between hidden and output neurons (q × p), respectively; θ, θo = bias vectors on the connections between input and hidden neurons (p × 1) and between hidden and output neurons (q × 1), respectively; and f, g = activation functions in hidden and output layers, respectively.
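
As an illustration of Eqs. (1)–(2), the forward pass of a one-hidden-layer network can be sketched in a few lines of NumPy. This is a minimal sketch, not the Matlab toolbox implementation used in this study: the function name `bpnn_forward`, the tanh hidden activation, the linear output activation and the random weights are all assumptions made for the example.

```python
import numpy as np

def bpnn_forward(X, W, theta, V, theta_o, f=np.tanh, g=lambda z: z):
    """Evaluate Eqs. (1)-(2) for a network with one hidden layer.

    Shapes follow the text: X (r, n), W (p, r), theta (p, 1),
    V (q, p), theta_o (q, 1). f and g are assumed activations.
    """
    H = f(W @ X + theta)      # Eq. (1): hidden-layer output, shape (p, n)
    O = g(V @ H + theta_o)    # Eq. (2): network output, shape (q, n)
    return H, O

# Illustrative sizes only: r = 2 inputs, p = 4 hidden neurons,
# q = 1 output and n = 5 samples, with random (untrained) weights.
rng = np.random.default_rng(0)
r, p, q, n = 2, 4, 1, 5
X = rng.uniform(-1.0, 1.0, size=(r, n))
W = rng.normal(size=(p, r))
theta = rng.normal(size=(p, 1))
V = rng.normal(size=(q, p))
theta_o = rng.normal(size=(q, 1))

H, O = bpnn_forward(X, W, theta, V, theta_o)
print(H.shape, O.shape)
```

The bias vectors are broadcast across the n columns, so every sample receives the same bias, as implied by the (p × 1) and (q × 1) dimensions.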

Fig. 1 Schematic view of ML algorithms: (a) BPNN; (b) ELM

2.2 Extreme learning machine

The ELM is a type of feedforward neural network characterized by a single hidden layer (see Fig. 1b). The hyper-parameter governing the ELM framework is the number of hidden neurons. The input weights and the hidden-layer biases are assigned randomly, and the output weights of the hidden layer (β) are determined analytically through a simple generalized inverse operation on the hidden layer output matrix [22], as shown in Eqs. (3)–(4), making the ELM’s learning speed thousands of times faster than that of traditional feedforward networks:

$${\mathbf{H}} = f\left( {{\mathbf{WX}} + {\varvec \theta }} \right)$$
(3)
$$\mathop{\min}\limits_{{\boldsymbol{\upbeta}}} \| {\boldsymbol{\upbeta}}{\mathbf{H}} - {\mathbf{O}} \|$$
(4)

where X = matrix of input variables (r × n), H = matrix of the hidden layer output (p × n), O = matrix of output variables (q × n), W = weights matrix connecting input and hidden neurons (p × r), θ = the bias vector connecting input and hidden neurons (p × 1), β = the weight matrix connecting the hidden and the output layers (q × p) and f = the activation function in the hidden layer.
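
The two ELM steps can be sketched as follows. This is a hedged illustration rather than the implementation used in the study: the names `elm_fit` and `elm_predict`, the tanh activation, the uniform sampling range for the random weights and the toy target function are assumptions. With the dimensions listed above (H is p × n and β is q × p), the product is formed as βH in the code, and the least-squares problem of Eq. (4) is solved with the Moore–Penrose pseudo-inverse.

```python
import numpy as np

def elm_fit(X, O, p, rng):
    """Train an ELM with p hidden neurons. X is (r, n), O is (q, n)."""
    r = X.shape[0]
    W = rng.uniform(-1.0, 1.0, size=(p, r))      # random input weights (never trained)
    theta = rng.uniform(-1.0, 1.0, size=(p, 1))  # random hidden biases (never trained)
    H = np.tanh(W @ X + theta)                   # Eq. (3): hidden output, (p, n)
    beta = O @ np.linalg.pinv(H)                 # least-squares solution of beta @ H ~ O, (q, p)
    return W, theta, beta

def elm_predict(X, W, theta, beta):
    return beta @ np.tanh(W @ X + theta)

# Toy usage on a hypothetical one-dimensional target.
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 50).reshape(1, -1)
O = np.sin(3.0 * X)
W, theta, beta = elm_fit(X, O, p=8, rng=rng)
pred = elm_predict(X, W, theta, beta)
print("training RMSE:", float(np.sqrt(np.mean((pred - O) ** 2))))
```

Only β is fitted, and it is obtained in a single linear-algebra step with no iterative weight updates, which is the source of the speed advantage described above.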

2.3 Evolutionary polynomial regression

EPR is a genetic programming method characterized by the modelling of a system using a mathematical expression in the form of polynomial structures. Constructing an EPR-based model consists of two phases: (1) structure identification and (2) parameter estimation [14]. In the first phase, optimization algorithms are used to search for symbolic structures, that is, to determine the exponent matrix. In the second phase, the parameter values are estimated by solving a linear least squares (LS) problem. Unlike BPNN and ELM, EPR does not require the training set to be normalized. A typical EPR expression can be formulated as

$${\varvec{y}} = \sum\limits_{j = 1}^{m} {F\left[ {{\mathbf{X}},f_{j} \left( {\mathbf{X}} \right),a_{j} } \right]} + a_{0}$$
(5)

where y = predicted output, X = matrix of input variables, F = a function constructed by the process, fj (X) = jth transformed variable, aj = an adjustable parameter for the jth term and a0 = an optional bias. fj (X) is determined by the optimization algorithm, and aj and a0 are determined by the LS.

The EPR’s key objective is to identify the number of transformed variables and a combination of vectors of independent input variables. Herein, the transformed variable is obtained via

$$f_{j} \left( {\mathbf{X}} \right) = x_{1}^{{{\mathbf{ES}}\left( {j,1} \right)}} \cdot \ldots \cdot x_{i}^{{{\mathbf{ES}}\left( {j,i} \right)}} \cdot \ldots \cdot x_{k}^{{{\mathbf{ES}}\left( {j,k} \right)}}$$
(6)

where xi = ith input variable, k = the total number of input variables and ES = exponent matrix of size m × k.
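
The parameter-estimation phase can be sketched for a fixed exponent matrix. The example below assumes the simplest form of F in Eq. (5), in which each term is a_j f_j(X); the helper names `epr_terms` and `epr_fit`, the sample data and the exponent matrix are hypothetical, and samples are stored in rows (n × k) for convenience.

```python
import numpy as np

def epr_terms(X, ES):
    """Eq. (6): X is (n, k), ES is (m, k); returns the (n, m) matrix of f_j(X)."""
    return np.prod(X[:, None, :] ** ES[None, :, :], axis=2)

def epr_fit(X, y, ES):
    """Least-squares estimate of [a0, a1, ..., am] for a fixed exponent matrix."""
    A = np.column_stack([np.ones(len(X)), epr_terms(X, ES)])  # bias column + m terms
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Hypothetical data generated from y = 2 + 3 * x1 * x2**2.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = 2.0 + 3.0 * X[:, 0] * X[:, 1] ** 2
ES = np.array([[1, 2],    # term 1: x1^1 * x2^2
               [0, 1]])   # term 2: x2 (not present in y)
coef = epr_fit(X, y, ES)
print(coef)
```

In a full EPR run, an optimizer would propose many candidate matrices ES and keep the one whose fitted expression gives the lowest loss; only the inner least-squares step is shown here.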

2.4 Genetic algorithm

GA is a meta-heuristic optimization algorithm inspired by natural evolution [20]. It has been extensively employed in geotechnical engineering for tasks such as identification of constitutive models’ parameters [26, 28, 73, 81] and model selection [27], and in problems involving slopes [39, 60], embankments [15, 44], tunnelling [37, 40], pile foundations [29, 31] and excavations [30]. In this study, the GA was selected to optimize the initial weights and biases in the BPNN and ELM algorithms and to search for symbolic structures in EPR. In GA, a population of individuals is first generated, and each individual is represented by a chromosome based on a coding scheme (real-coded GA). After the loss value of each individual is calculated, the best individual, i.e. the one with the lowest loss value in the population, is selected and then evolves through crossover and mutation operations to generate a new population. The process continues until the termination criterion is satisfied, that is, until the maximum number of generations is reached and the loss value has converged to a roughly constant value.
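
A compact real-coded GA loop is sketched below to make the generate, evaluate, crossover and mutate cycle concrete. It is an illustrative sketch only: binary tournament selection, arithmetic crossover, Gaussian mutation, elitism, the function name `genetic_minimize` and all rates and bounds are assumptions, not the settings of this study (those are given in Table 4).

```python
import random

def genetic_minimize(loss, n_genes, bounds=(-1.0, 1.0), pop_size=30,
                     generations=100, crossover_rate=0.8,
                     mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def pick(pop):
        # Binary tournament: the better of two randomly drawn individuals.
        a, b = rng.sample(pop, 2)
        return a if loss(a) < loss(b) else b

    # Initial population of real-coded chromosomes.
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    best = min(pop, key=loss)
    history = [loss(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: convex blend of two parents
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation of individual genes, clipped to the bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.1 * (hi - lo))))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=loss)
        history.append(loss(best))
    return best, history

# Toy usage: minimise a sphere function of three genes.
best, history = genetic_minimize(lambda v: sum(g * g for g in v),
                                 n_genes=3, generations=50)
print(history[0], history[-1])
```

Because of the elitism step, the best loss recorded in `history` cannot increase from one generation to the next, which gives the flattening convergence curves typical of GA runs.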

2.5 K-fold cross-validation

Three phases are involved in the integrated process of constructing an ML model: training, validation and testing. The validation phase seeks to improve the robustness of the model and avoid overfitting. K-fold cross-validation (CV) can detect whether overfitting exists after training is completed. Moreover, overfitting can be mitigated by integrating CV into the training process as the loss function [50], a commonly used approach in the data-mining field. Currently, the k-fold CV method is widely used to validate models [55]. In this method, the original training set is randomly divided into k sub-datasets. Herein, k–1 sub-datasets, which form a new sub-training set, are employed to train the model, and the performance of the trained model is validated on the remaining sub-dataset. Each sample in the training set thus has an opportunity to be used for both training and validation. Because k is generally set to 10 [34], the 10-fold CV method was used in this study.

At each round, the ML model with a fixed set of hyper-parameters was trained ten times, each time on nine of the ten sub-datasets, and its performance was then evaluated by the mean squared error (MSE) on the remaining sub-dataset. Therefore, the loss function in the GA can be expressed as

$${\text{MSE}} = \frac{{\sum\nolimits_{i = 1}^{m} {\left( {\overline{y}_{i} - y_{i} } \right)^{2} } }}{km}$$
(7)

where \({\overline{y}_{i}}\) = predicted output, yi = actual output, m = the number of datasets in the remaining sub-dataset and k = the number of CV sets.
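
The cross-validated loss can be sketched as below, assuming that the squared errors in Eq. (7) are accumulated over all k validation folds before dividing by km. The helper names `kfold_indices` and `cv_mse`, the `fit`/`predict` callables and the toy data are hypothetical and only serve the illustration.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Randomly split the indices 0..n_samples-1 into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse(fit, predict, xs, ys, k=10, seed=0):
    """Average validation MSE over k folds (equals Eq. (7) for equal-sized folds)."""
    folds = kfold_indices(len(xs), k, seed)
    total = 0.0
    for j, val in enumerate(folds):
        # Train on the other k-1 folds, validate on fold j.
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in val) / len(val)
    return total / k

# Toy usage: a one-parameter model y = slope * x fitted by least squares.
xs = list(range(1, 51))
ys = [2.0 * x for x in xs]
fit = lambda X, Y: sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
predict = lambda slope, x: slope * x
loss_value = cv_mse(fit, predict, xs, ys, k=10)
print(loss_value)
```

In a GA-driven search, a function like `cv_mse` would be called once per candidate individual, so the loss reflects performance on data not seen during that individual's training.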

2.6 Evaluation indicators

Two commonly used evaluation indicators, mean absolute error (MAE) and mean absolute percentage error (MAPE), were used to evaluate the performance of ML models in this study. Used together, MAE and MAPE offset each other’s deficiencies, and the pair is used extensively to evaluate model performance [5, 84, 85]. Lower values of the two indicators indicate better model performance. MAE and MAPE are defined as

$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {r_{i} - p_{i} } \right|}$$
(8)
$${\text{MAPE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {\frac{{r_{i} - p_{i} }}{{r_{i} }}} \right|} \times 100\%$$
(9)

where r = actual output value, p = predicted output value and n = the total number of datasets.
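
Eqs. (8)–(9) translate directly into code. The sketch below uses hypothetical function names and made-up numbers; note that Eq. (9) is undefined when an actual value r_i equals zero, so the example assumes non-zero actual values.

```python
def mae(actual, predicted):
    """Mean absolute error, Eq. (8)."""
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error in percent, Eq. (9). Requires r != 0."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(actual, predicted)) / len(actual)

# Illustrative values only (not data from this study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mae(actual, predicted))   # (10 + 10 + 0) / 3
print(mape(actual, predicted))  # (10% + 5% + 0%) / 3
```

The example shows why the two indicators complement each other: the same absolute error of 10 counts as 10% at the lowest stress level but only 5% at the next one.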

2.7 Model framework

Figure 2a presents the flowchart for constructing an ML-based constitutive model. This type of data-driven model starts from the collection of datasets with which to form a database. In the ML community, splitting the data into 80% for training and 20% for testing is a widely accepted scheme, and this ratio has been shown theoretically to allow the model to be well trained and tested [13]. Therefore, 80% of the data are used to train the model and 20% are used to test it in this study. The total or incremental stress–strain strategy is selected beforehand; thereafter, the corresponding features or input variables can be determined. At the next step, the 10-fold cross-validation method is used to divide the training set into ten subsets for training and validating models. At each round, GA is employed to identify the general parameters of ML algorithms. The hyper-parameters are determined by the trial-and-error method. After determining three optimal constitutive models based on BPNN, ELM and EPR, their performance is compared using the test set.

Fig. 2 Model framework: (a) flowchart of constructing ML-based constitutive models; (b) schematic view of the total stress–strain strategy; (c) schematic view of the incremental stress–strain strategy

Figures 2b and 2c illustrate the total and incremental stress–strain strategies, respectively. In the total stress–strain strategy, the stress at the ith step depends on the strain at the ith step and the physical parameters (see Eq. (10)). In the incremental stress–strain strategy, the stress at the ith step depends on the stress and strain at the (i–1)th step, the strain increment at the ith step and the physical parameters (see Eq. (11)):

$$\sigma^{i} = f\left( {{\text{X}},\varepsilon^{i} } \right)$$
(10)
$$\sigma^{i} = f\left( {{\text{X}},\sigma^{i - 1} ,\varepsilon^{i - 1} ,\Delta \varepsilon^{i} } \right)$$
(11)

where X = [x1, x2,…, xr], the vector of independent variables; σi, σi−1 = stress at the ith and (i–1)th steps; εi, εi–1 = strain at the ith and (i–1)th steps; Δεi = axial strain increment at the ith step; and f = formulation of stress–strain relationship, as determined by the ML algorithms in this study.

It should be noted that in the incremental stress–strain strategy, the stress predicted at the ith step is fed back as the input stress variable to predict the stress at the (i + 1)th step. In addition, the strain ε is updated at each step by the following:

$$\varepsilon^{i}\, { = }\,\varepsilon^{i - 1}\, { + }\,\Delta \varepsilon^{i}$$
(12)
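
The recursive use of Eqs. (11)–(12) can be sketched as a simple loop in which each prediction becomes the next input. The function `predict_curve` and the linear-elastic stand-in for the trained model are hypothetical; any trained one-step predictor with the same four arguments could be substituted.

```python
def predict_curve(step_model, x, sigma0, eps0, d_eps_list):
    """Roll a one-step model forward along a prescribed list of strain increments."""
    sigma, eps = sigma0, eps0
    curve = [(eps, sigma)]
    for d_eps in d_eps_list:
        sigma = step_model(x, sigma, eps, d_eps)  # Eq. (11): uses the previous prediction
        eps = eps + d_eps                         # Eq. (12): strain update
        curve.append((eps, sigma))
    return curve

# Stand-in for a trained model: linear elasticity with modulus x (illustration only).
linear_step = lambda x, sigma, eps, d_eps: sigma + x * d_eps

curve = predict_curve(linear_step, 1000.0, 0.0, 0.0, [0.001] * 10)
print(curve[-1])
```

Because every step consumes the previous prediction rather than a measured value, any one-step error is carried forward along the curve.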

To eliminate the effect of the scales of parameters on the model performance and improve convergence, all datasets need to be pre-processed. Herein, the independent variables (such as σn0 and p) are held constant at their initial values, and the strain and strain increment are manually pre-set and follow a uniform distribution. Because the distributions of the variables differ from each other and do not conform to a Gaussian distribution, min-max normalization rather than standardization is used in this study, as follows:

$$x_{norm} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }}\left( {\overline{x}_{\max } - \overline{x}_{\min } } \right) + \overline{x}_{\min }$$
(13)

where x = actual value of input variables, xmin = minimum value of input variables and xmax = maximum value of input variables. \({\overline{x}_{min}}\) = –1; \({\overline{x}_{max}}\) = 1.
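
Eq. (13) and its inverse can be written as two small helpers. The function names are hypothetical, and taking the bounds from the training values and reusing them for other data is an assumption of this sketch rather than a detail stated above.

```python
def minmax_scale(x, x_min, x_max, new_min=-1.0, new_max=1.0):
    """Eq. (13): map x from [x_min, x_max] to [new_min, new_max]."""
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

def minmax_unscale(x_norm, x_min, x_max, new_min=-1.0, new_max=1.0):
    """Inverse of Eq. (13): recover the physical value from the scaled one."""
    return (x_norm - new_min) / (new_max - new_min) * (x_max - x_min) + x_min

# Illustrative values of a variable spanning 25 to 600.
train_values = [25.0, 50.0, 100.0, 200.0, 250.0, 300.0, 400.0, 500.0, 600.0]
lo, hi = min(train_values), max(train_values)
scaled = [minmax_scale(v, lo, hi) for v in train_values]
print(scaled[0], scaled[-1])
print(minmax_scale(700.0, lo, hi))  # a value outside the bounds maps outside [-1, 1]
```

With fixed bounds, values beyond the original range are scaled to magnitudes greater than one, which is one practical reason extrapolation is harder for a network than interpolation.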

3 ML–based constitutive models using synthetic data

3.1 Synthetic data by a simple soil model

To comprehensively compare the performance of the three ML algorithms and two modelling strategies in developing constitutive models, a simple shear constitutive model for sand was first used to generate synthetic datasets (see Eq. (14)). ML-based constitutive models were developed from synthetic datasets rather than directly from experimental data to eliminate the interference of experimental and measurement errors with the mapping capability of ML algorithms [88]. Moreover, experimental data tend to be limited and insufficient for comparing ML algorithms’ performance, whereas a theoretical function can generate as much data as needed.

$$\tau = \sigma_{n0} \frac{\mu \gamma }{{1/G + \gamma }}$$
(14)

where σn0 = vertical stress; τ = shear stress; γ = shear strain; G = shear modulus, 1000 kPa; and μ = friction coefficient, tan(π/6), i.e. the tangent of a friction angle of π/6.
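
A synthetic curve can be generated from Eq. (14) as sketched below. The function names, the expression of γ as a fraction (0.10 for 10%) and the uniformly spaced strain grid are assumptions made for this illustration; they are not a statement of the exact sampling used in this study.

```python
import math

G = 1000.0                  # shear modulus value given in the text
MU = math.tan(math.pi / 6)  # tan(pi/6), as given in the text

def shear_stress(sigma_n0, gamma):
    """Eq. (14): hyperbolic shear stress-strain relation."""
    return sigma_n0 * MU * gamma / (1.0 / G + gamma)

def synthetic_curve(sigma_n0, n_points=91, gamma_max=0.10):
    """Return (gamma, tau) pairs on an assumed uniform strain grid."""
    gammas = [gamma_max * i / (n_points - 1) for i in range(n_points)]
    return [(g, shear_stress(sigma_n0, g)) for g in gammas]

curve = synthetic_curve(100.0)
print(curve[0], curve[-1])
```

The relation starts at the origin and rises monotonically towards the asymptote σn0 μ, so each curve is fully determined by the vertical stress.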

A total of fourteen curves were generated to develop the ML-based constitutive model. Herein, the shear strain γ ranges from 0 to 10%, and a fixed set of shear strain increments Δγ, including 0.01%, 0.05%, 0.1%, 0.15% and 0.2%, was chosen for ten curves. Each curve consists of 91 data points. Nine curves (σn0 = 25, 50, 100, 200, 250, 300, 400, 500 and 600 kPa) with a total of 819 data points were employed to train the ML-based constitutive model, and the remaining five curves (σn0 = 15, 150, 350, 650 and 700 kPa) were used to test the model.

According to the stress–strain strategies in Eqs. (10)–(11), the vector X of independent variables in this soil model is σn0. As a result, the total and incremental stress–strain strategies have two and four input variables, respectively, and each has one output variable. The corresponding total and incremental stress–strain strategies can be expressed by

$$\tau^{i} = f\left( {\sigma_{n0} ,\gamma^{i} } \right)$$
(15)
$$\tau^{i} = f\left( {\sigma_{n0} ,\tau^{i - 1} ,\gamma^{i - 1} ,\Delta \gamma^{i} } \right)$$
(16)

where the definitions of τ, γ and Δγ are similar to those of σ, ε and Δε in Eqs. (11)–(12).
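
The difference between the two strategies is easiest to see in how one curve is converted into training samples. The sketch below uses hypothetical helper names and a made-up three-point curve; it simply arranges the inputs and output of Eqs. (15)–(16).

```python
def total_samples(sigma_n0, gammas, taus):
    """Eq. (15): inputs (sigma_n0, gamma_i) -> output tau_i."""
    return [((sigma_n0, g), t) for g, t in zip(gammas, taus)]

def incremental_samples(sigma_n0, gammas, taus):
    """Eq. (16): inputs (sigma_n0, tau_{i-1}, gamma_{i-1}, d_gamma_i) -> output tau_i."""
    rows = []
    for i in range(1, len(gammas)):
        d_gamma = gammas[i] - gammas[i - 1]
        rows.append(((sigma_n0, taus[i - 1], gammas[i - 1], d_gamma), taus[i]))
    return rows

# Made-up curve for illustration only.
gammas = [0.0, 0.001, 0.002]
taus = [0.0, 1.0, 2.0]
tot = total_samples(100.0, gammas, taus)
inc = incremental_samples(100.0, gammas, taus)
print(tot[0])
print(inc[0])
```

A curve with N points yields N samples under the total strategy and N − 1 samples under the incremental strategy, because the first point serves only as the initial state.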

3.2 Determination of parameters in ML algorithms

The parameters to be determined in the ML algorithms include hyper-parameters and general parameters. The search space of the hyper-parameters regarding the ML framework is presented in Table 2. A BPNN with a single hidden layer, which is sufficient to capture the stress–strain relationship, was used to construct constitutive models. Table 3 summarizes several methods of determining the optimal number of hidden neurons. Based on these methods, the optimal number of hidden neurons ranged from one to five in the total stress–strain strategy and from one to ten in the incremental stress–strain strategy. Because there is no established method for determining the optimal number of hidden neurons in the ELM or of transformed terms in the EPR, these numbers were increased continuously until further increases no longer improved the model’s performance. In this way the ranges of hyper-parameters in the three ML algorithms can be determined, as shown in Table 2.

Table 1 Previous research works for identifying constitutive models of geomaterials
Table 2 Hyper-parameters regarding model framework in three selected ML algorithms
Table 3 Methods for determining the number of hidden neurons

In addition to the hyper-parameters, the initial weights and biases in BPNN and ELM as well as the exponent matrix in the EPR were determined using the GA. Note that the values of exponents must be non-negative in the EPR algorithm, because the datasets include the initial point of the stress–strain curve (0, 0), where a term with a negative exponent is undefined. The values of exponents were thus limited to [0, 1, 2, 3]. Table 4 presents the parameter values in GA. Note that BPNN and EPR are set to a maximum of 500 generations, whereas for the ELM, because of its slower convergence, the maximum is 5000.

Table 4 Values of parameters in the GA algorithm

3.3 Results of the validation set

Figure 3 presents the evolution of loss value generated by three types of ML-based constitutive models using the total stress–strain strategy. It can be clearly observed that BPNN and EPR converge much faster than ELM. The loss value roughly holds steady once the generation exceeds 350 and 200 in BPNN and EPR, respectively, but only once the generation reaches 4000 in ELM. From the perspective of the converged loss value, as shown in Fig. 3, the optimal numbers of hidden neurons in BPNN and ELM are four and eight, respectively, and the optimal number of transformed terms in EPR is eleven.

Fig. 3 Evolution of loss value using the total stress–strain strategy for: (a) BPNN; (b) ELM; (c) EPR

The evolution of loss value generated by three ML algorithms using the incremental stress–strain strategy is shown in Fig. 4. Overall, the convergence rate in three types of ML-based constitutive models using the incremental stress–strain strategy is faster than that using the total stress–strain strategy. The loss value roughly holds steady when the generation reaches 250, 1000 and 100 in the BPNN, ELM and EPR, respectively. From the perspective of the convergent loss value, as shown in Fig. 4, the optimal numbers of hidden neurons in BPNN and ELM are four and ten, respectively, and the optimal number of transformed terms in EPR is eleven. Note that the optimal loss values are much less than those yielded using the total stress–strain strategy.

Fig. 4 Evolution of loss value using the incremental stress–strain strategy for: (a) BPNN; (b) ELM; (c) EPR

Another important hyper-parameter used in ELM and BPNN is the activation function. To compare the performance of BPNN and ELM comprehensively, the optimum activation function for each algorithm should be determined. Five commonly used activation functions are applied, as shown in Eq. (17). Figure 5 presents the evolution of loss values generated by BPNN- and ELM-based models (incremental strategy) with the five activation functions. It can be observed that tanh is the optimum activation function in the hidden layer for both BPNN- and ELM-based models; tanh is therefore used as the activation function in the following study.

$$\left\{ {\begin{aligned} &{{\text{sigmoid}}\left( x \right) = \frac{1}{1 + e^{ - x} }} \\ &{{\text{tanh}}\left( x \right) = \frac{{e^{x} - e^{ - x} }}{{e^{x} + e^{ - x} }}} \\ &{{\text{ReLU}}\left( x \right) = \left\{ \begin{gathered} x, \, x > 0 \hfill \\ 0, \, x \le 0 \hfill \\ \end{gathered} \right.} \hfill \\& {{\text{ELU}}\left( x \right) = \left\{ \begin{gathered} x, \, x > 0 \hfill \\ \alpha \left( {e^{x} - 1} \right), \, x \le 0 \hfill \\ \end{gathered} \right.} \hfill \\& {{\text{Swish}}(x) = \frac{x}{{1 + e^{ - x} }}} \hfill \\ \end{aligned} } \right.$$
(17)
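
For reference, the five functions in Eq. (17) can be written as scalar Python functions. The value α = 1.0 for ELU is an assumption, since α is not specified here, and the sketch is not hardened against floating-point overflow for inputs of very large magnitude.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return x if x > 0 else 0.0

def elu(x, alpha=1.0):  # alpha = 1.0 is an assumed default
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def swish(x):
    return x / (1.0 + math.exp(-x))

for fn in (sigmoid, tanh, relu, elu, swish):
    print(fn.__name__, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
```

Among the five, sigmoid is the only one that does not return zero for a zero input; with inputs scaled to [−1, 1], tanh and the others pass through the origin.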
Fig. 5 Evolution of loss value generated by five activation functions in: (a) BPNN; (b) ELM

3.4 Results of the training set

The optimal hyper-parameters of the three ML algorithms using the two stress–strain strategies are determined as described above. Accordingly, three optimal ML-based constitutive models for each stress–strain strategy are constructed based on the training set. Figure 6 presents the stress–strain curves predicted by the three optimally trained models on the basis of the total stress–strain strategy, compared with the measured curves. The predictions of BPNN agree closely with the measured curves, whereas the results predicted by ELM and EPR deviate from them. In particular, the prediction error of the ELM- and EPR-based constitutive models is much larger at the initial point (0, 0), and these models also yield a large error at the early stage of the stress–strain curves, attributable to the principles of these two algorithms as described in Sect. 3.3.

Fig. 6 Predicted results on the training set using the total stress–strain strategy for: (a) BPNN; (b) ELM; (c) EPR

Figure 7 presents the predicted stress–strain curves using three optimal models based on the incremental stress–strain strategy, compared with the measured curves. It can be seen that all three models can accurately capture the stress–strain curves, which indicates that the incremental stress–strain strategy for simulating stress–strain relationship shows a significant improvement.

Fig. 7 Predicted results on the training set using the incremental stress–strain strategy for: (a) BPNN; (b) ELM; (c) EPR

3.5 Results of the test set

During the last phase, the performance of the ML-based constitutive model is evaluated against the test set. The values of σn0 in the training set range from 25 to 600 kPa. Test datasets are generally taken within the range of the training datasets, so test sets with σn0 = 150 and 350 kPa are considered. To investigate the ability of the ML-based constitutive model to extrapolate beyond the range of the training datasets, test sets with σn0 = 15, 650 and 700 kPa are also examined. Table 5 summarizes the values of indicators for these five test sets. For the interpolated test sets, Fig. 8 presents the results of simulation using the three optimal ML-based constitutive models based on the total stress–strain strategy. The stress–strain curve predicted by BPNN largely agrees with the measured curve, and the corresponding MAE and MAPE values are also lower than those produced by ELM and EPR. Notably, the ELM- and EPR-based constitutive models can accurately predict neither the initial stress when strain equals zero nor the subsequent evolution of stress. Figure 9 presents the results of simulation using the three optimal ML models based on the incremental stress–strain strategy for the test set. The stress–strain curves predicted by the BPNN-based constitutive model still agree closely with the measured curves and outperform the ELM- and EPR-based constitutive models. The ELM-based constitutive model performs better than it does with the total stress–strain strategy, and it also accurately captures the evolution of stress. By contrast, the performance of the EPR-based constitutive model changes unevenly: it predicts the stress–strain relationship for σn0 = 350 kPa very well but performs worse for σn0 = 150 kPa. Note that the prediction performance of the ELM- and EPR-based constitutive models at the initial stage is clearly improved over that seen with the total stress–strain strategy. Overall, ML-based constitutive models that use the incremental stress–strain strategy offer reliable performance for interpolated test sets, and the BPNN-based constitutive model exhibits the best performance.

Table 5 Values of indicators for the test set
Fig. 8 Predicted results on the test set (interpolation) using the total stress–strain strategy: (a) BPNN; (b) ELM; (c) EPR

Fig. 9 Predicted results on the test set (interpolation) using the incremental stress–strain strategy: (a) BPNN; (b) ELM; (c) EPR

The extrapolated test sets are used to further examine the generalization ability of ML-based models. Figure 10 presents the results of simulation using the three optimal ML-based constitutive models based on the total stress–strain strategy. For σn0 = 650 and 700 kPa, all three ML-based constitutive models can still capture the stress–strain relationship. The BPNN-based constitutive model performs best, followed by the EPR- and ELM-based constitutive models. However, for σn0 = 15 kPa, the stress–strain curve predicted by the BPNN-based constitutive model deviates from the actual stress–strain curve. The results of simulation using the three optimal ML-based constitutive models based on the incremental stress–strain strategy are shown in Fig. 11. In addition to the stress–strain curves for σn0 = 650 and 700 kPa, it can be observed that the BPNN-based constitutive model’s ability to predict the stress–strain relationship for σn0 = 15 kPa improves significantly. Meanwhile, the performance of the ELM-based constitutive model improves dramatically, with lower MAE and MAPE values, whereas the prediction performance of the EPR-based constitutive model decreases.

Fig. 10 Predicted results on the test set (extrapolation) using the total stress–strain strategy: (a) BPNN; (b) ELM; (c) EPR

Fig. 11 Predicted results on the test set (extrapolation) using the incremental stress–strain strategy: (a) BPNN; (b) ELM; (c) EPR

Overall, ML-based constitutive models are better at predicting stress–strain relationships within the range of the training datasets than at extrapolating beyond it. ML-based constitutive models developed using the incremental stress–strain strategy outperform those developed using the total stress–strain strategy. A BPNN-based constitutive model developed using the incremental stress–strain strategy is thus recommended for describing the stress–strain relationship, because this model makes highly accurate predictions of the stress–strain relationship for both the interpolated and extrapolated test sets.

4 ML–based constitutive models using real data

4.1 Database

To investigate ML-based constitutive models’ ability to predict soil behaviour in engineering practice, this study uses datasets from twelve triaxial compression shear tests conducted by [18] on Kaolinite clays with various over-consolidation ratios (OCRs). The measured shear and void ratio behaviour is shown in Fig. 12. Herein, datasets from nine tests with OCRs of 1, 2, 2.25, 2.5, 2.7, 4, 5, 10 and 20 are used to train the model, and the remaining three, with OCRs of 3, 8 and 50, are used to test it.

Fig. 12 Experimental data of Kaolinite clay [18]

4.2 Selection of a simulation strategy

Based on the previous comparisons, the BPNN combined with the incremental stress–strain strategy is used to model the behaviour of Kaolinite clay. According to the incremental stress–strain strategy in Eq. (16), the vector X of independent variables is the OCR, and there are two output variables: deviatoric stress q and void ratio e. Accordingly, the ML-based constitutive models of Kaolinite clay can be obtained by

$$q^{i} = f\left( {p^{i - 1} ,q^{i - 1} ,e^{i - 1} ,\varepsilon_{1}^{i - 1} ,\Delta \varepsilon_{1}^{i} } \right)$$
(18)
$$e^{i} = g\left( {p^{i - 1} ,q^{i - 1} ,e^{i - 1} ,\varepsilon_{1}^{i - 1} ,\Delta \varepsilon_{1}^{i} } \right)$$
(19)

where p i–1, q i–1, ei−1, \(\varepsilon_{1}^{i - 1}\) = mean stress, deviatoric stress, void ratio and axial strain at the (i–1)th step, respectively; q i, ei, Δ\(\varepsilon_{1}^{i}\) = deviatoric stress, void ratio and axial strain increment at the ith step, respectively; and f, g = formulations of deviatoric stress–strain and void ratio–strain relationships.

Figure 13 presents the framework of the BPNN-based constitutive model for predicting Kaolinite clay behaviour. Note that the deviatoric stress and void ratio predicted at the ith step are fed back as inputs to predict the deviatoric stress and void ratio at the (i + 1)th step. Updates to strain ε at the (i + 1)th step follow Eq. (12). After training, the formulations of the BPNN-based Kaolinite constitutive models are obtained and summarized in Appendix A.
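The recursive use of Eqs. (18) and (19) can be sketched as a simple loop. The code below is a minimal illustration under stated assumptions, not the implementation used in this study: `step_model` is a hypothetical stand-in for the trained functions f and g, the strain update is assumed to be a plain accumulation of the increment, and the optional `update_p` hook is an assumption, since the rule for updating the mean stress is not given in this section.

```python
def rollout(step_model, p0, q0, e0, d_eps1, n_steps, update_p=None):
    """Recursively predict q and e along a strain-controlled path.

    step_model(p, q, e, eps1, d_eps1) -> (q_next, e_next) plays the role of
    f and g in Eqs. (18) and (19). Returns a list of (eps1, p, q, e) tuples.
    """
    p, q, e, eps1 = p0, q0, e0, 0.0
    history = [(eps1, p, q, e)]
    for _ in range(n_steps):
        # Predict the state at step i from the state at step i-1.
        q_next, e_next = step_model(p, q, e, eps1, d_eps1)
        # Assumed strain update: accumulate the axial strain increment.
        eps1 += d_eps1
        # Assumed hook for the mean stress; p is held constant if none is given.
        if update_p is not None:
            p = update_p(p, q, q_next)
        # Feed the predictions back as inputs for the next step.
        q, e = q_next, e_next
        history.append((eps1, p, q, e))
    return history


def toy_step(p, q, e, eps1, d_eps1):
    """Hypothetical one-step model used only to exercise the loop."""
    q_next = q + 50.0 * (100.0 - q) * d_eps1
    e_next = e - 0.1 * d_eps1
    return q_next, e_next


if __name__ == "__main__":
    path = rollout(toy_step, p0=100.0, q0=0.0, e0=1.0, d_eps1=0.001, n_steps=200)
    print(path[-1])
```

Because each prediction becomes the next input, any one-step error is carried forward along the path. For a conventional drained triaxial compression test at constant cell pressure, one possible choice for the hook is `update_p = lambda p, q_old, q_new: p + (q_new - q_old) / 3.0`; whether that matches the tests in [18] is not stated here.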

Fig. 13 Framework for predicting Kaolinite clay behaviours

4.3 Results of simulation

The validation results used to determine the optimal parameters of the BPNN-based Kaolinite constitutive model are not presented, but Appendix A presents the formulation of the optimal BPNN-based Kaolinite clay constitutive model in detail. The optimal number of hidden neurons in the BPNN is 8. Figure 14 compares the training-set results predicted by the optimal BPNN model with the measured results, showing that the BPNN-based constitutive model can accurately capture the non-linear deviatoric stress–strain and void ratio–strain relationships.
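As a point of reference for what an explicit formulation of this kind looks like, a network with a single hidden layer of 8 neurons can be written in closed form. The expression below is a generic sketch and not the formulation reported in Appendix A: the activation function \(\varphi\), the weight and bias symbols \(w\) and \(b\), and the use of the five inputs of Eq. (18) as the input vector \({\mathbf{x}}\) are assumptions made for illustration, and any normalisation of inputs or outputs is omitted.

$$q^{i} \approx b^{(2)} + \sum\limits_{j = 1}^{8} {w_{j}^{(2)} \,\varphi \left( {b_{j}^{(1)} + \sum\limits_{k = 1}^{5} {w_{jk}^{(1)} x_{k} } } \right)} ,\quad {\mathbf{x}} = \left( {p^{i - 1} ,q^{i - 1} ,e^{i - 1} ,\varepsilon_{1}^{i - 1} ,\Delta \varepsilon_{1}^{i} } \right)$$

An analogous expression with its own output weights would give \(e^{i}\). Because such an expression involves only sums, products and one elementary function, it can be evaluated directly once the weights are known.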

Fig. 14 Predicted results on the training set using BPNN-based constitutive model with incremental stress–strain strategy: (a) \(q\)–\(\varepsilon_{1}\); (b) \(e\)–\(\varepsilon_{1}\)

Figure 15 presents the predicted deviatoric stress–strain and void ratio–strain relationships for the interpolated test set. Because this study uses the recursive simulation strategy, prediction error accumulates gradually with increasing strain. For the deviatoric stress–strain relationship, the accumulated error is negligible up to a strain of 20%. By contrast, the predicted void ratio–strain curve gradually deviates from the actual curve when strain exceeds 10%. Overall, the BPNN-based constitutive model simulates the deviatoric stress–strain relationship better, likely because the void ratio–strain relationship is more complicated than the deviatoric stress–strain relationship. Figure 15 shows that deviatoric stress increases monotonically with strain for all experimental tests, whereas the void ratio–strain relationship differs for different OCR values because of the dilatant behaviour associated with high OCR and the contractive behaviour associated with low OCR.
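The gradual build-up of error in a recursive simulation can be illustrated with a standard worst-case bound. The symbols below are introduced only for this illustration and are not used elsewhere in the study: \({\mathbf{s}}^{i}\) is the true state \((p, q, e, \varepsilon_{1})\) at step i, \(\hat{\mathbf{s}}^{i}\) is the predicted state, \(F\) is the true one-step map, \(\hat{F}\) is the learned map, \(\delta\) is an assumed bound on the one-step error of \(\hat{F}\), and \(\Lambda\) is an assumed Lipschitz constant of \(\hat{F}\). Writing \(\eta_{i} = \left\| {\hat{\mathbf{s}}^{i} - {\mathbf{s}}^{i} } \right\|\) with \(\eta_{0} = 0\), the triangle inequality gives

$$\eta_{i} \le \left\| {\hat{F}\left( {\hat{\mathbf{s}}^{i - 1} } \right) - \hat{F}\left( {{\mathbf{s}}^{i - 1} } \right)} \right\| + \left\| {\hat{F}\left( {{\mathbf{s}}^{i - 1} } \right) - F\left( {{\mathbf{s}}^{i - 1} } \right)} \right\| \le \Lambda \eta_{i - 1} + \delta \quad \Rightarrow \quad \eta_{n} \le \delta \sum\limits_{k = 0}^{n - 1} {\Lambda^{k} }$$

For \(\Lambda\) close to 1 the bound grows roughly linearly with the number of steps, and for \(\Lambda > 1\) it grows geometrically. It is a worst-case estimate under the stated assumptions, not a measurement of the error in the present simulations.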

Fig. 15 Predicted results on the test set (interpolation) using BPNN based on the incremental stress–strain strategy: (a) \(q\)–\(\varepsilon_{1}\); (b) \(e\)–\(\varepsilon_{1}\)

Figure 16 presents the predicted deviatoric stress–strain and void ratio–strain relationships for the extrapolated test set. Because no experimental results are available for OCR = 22.5, 25, 27.5 and 30, the reasonableness of the predicted curves is inferred from the results for OCR = 20 and 50. For OCR = 50, the deviatoric stress–strain curve predicted by the BPNN-based constitutive model agrees well with the actual curve, although the training set contains no experimental data beyond OCR = 20. The predicted curves for OCR = 22.5, 25, 27.5 and 30 also show reasonable trends (deviatoric stress increasing monotonically with increasing strain and peak deviatoric stress increasing with decreasing OCR), with all results falling between the curves for OCR = 20 and OCR = 50. However, the predicted void ratio–strain curves deviate clearly from the actual curves when strain exceeds 5%. Overall, the BPNN-based constitutive model performs well (in terms of both interpolation and extrapolation) at simulating actual soil behaviour so long as datasets are sufficient. Moreover, the ability to obtain a simple, explicit function can further extend the application of the BPNN-based constitutive model.

Fig. 16 Predicted results on the test set (extrapolation) using BPNN based on the incremental stress–strain strategy: (a) \(q\)–\(\varepsilon_{1}\); (b) \(e\)–\(\varepsilon_{1}\)

4.4 Comparison of different ML-based models

To compare the performance of BPNN in modelling soil behaviours with that of ELM and EPR, the latter two ML algorithms are also used to predict the behaviour of Kaolinite clay. The training and test datasets and the modelling framework for ELM and EPR are the same as those used for BPNN. Herein, the optimum number of hidden neurons in the ELM-based model is identified as 7, and the optimum number of transformed terms in the EPR-based model is identified as 8. For brevity, the process for determining the hyper-parameters of ELM and EPR is not presented in detail.
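For readers unfamiliar with ELM, the basic idea can be sketched as follows: the hidden-layer weights are drawn at random and kept fixed, and only the output weights are fitted, by a least-squares solve via the pseudo-inverse. The class below is a minimal sketch and not the model used in this study; the class name, the tanh activation, the uniform weight range and the synthetic data are all assumptions, with the number of hidden neurons set to 7 as identified above.

```python
import numpy as np


class ELMRegressor:
    """Minimal extreme learning machine for regression (illustrative sketch)."""

    def __init__(self, n_hidden=7, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output; tanh is an assumed activation function.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):
        n_features = X.shape[1]
        # Random input weights and biases, fixed after initialisation.
        self.W = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights from a least-squares solve using the pseudo-inverse.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


if __name__ == "__main__":
    # Synthetic data with five inputs and two outputs, standing in for
    # (p, q, e, eps1, d_eps1) -> (q, e); not the experimental data.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 5))
    Y = np.column_stack([np.sin(X[:, 0]) + X[:, 4], X[:, 2] ** 2])
    model = ELMRegressor(n_hidden=7).fit(X, Y)
    print(np.mean((model.predict(X) - Y) ** 2))
```

Since only the output weights are fitted, training reduces to a single linear solve, whereas BPNN adjusts all weights iteratively. In practice the inputs would normally be scaled before training.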

The stress–strain relationships predicted on the test datasets by the optimum ELM-based model are presented in Fig. 17. Figures 17a, b show that the predictions on the interpolated datasets agree well with the experimental results. In Figs. 17c, d, however, the predictions on the extrapolated datasets exhibit pronounced strain softening and volumetric contraction, which contradict the measured results. These results indicate that the ELM-based model can describe known soil behaviours well, but may not be suitable for predicting soil behaviours on unseen datasets. Figure 18 presents the stress–strain relationships predicted on the test datasets by the optimum EPR-based model. Similar to the results presented in Sect. 3, the prediction errors on both the interpolated and extrapolated datasets are larger than those of the BPNN- and ELM-based models. This indicates that the generalization ability of EPR is inferior to that of the neural network-based algorithms, so EPR has difficulty in modelling complicated soil behaviours. Overall, BPNN shows the best generalization ability of the three algorithms and can be used to simulate soil behaviours on both known and unknown datasets.

Fig. 17 Predicted results on the test set using ELM based on the incremental stress–strain strategy: (a) \(q\)–\(\varepsilon_{1}\) (interpolation); (b) \(e\)–\(\varepsilon_{1}\) (interpolation); (c) \(q\)–\(\varepsilon_{1}\) (extrapolation); (d) \(e\)–\(\varepsilon_{1}\) (extrapolation)

Fig. 18 Predicted results on the test set using EPR based on the incremental stress–strain strategy: (a) \(q\)–\(\varepsilon_{1}\) (interpolation); (b) \(e\)–\(\varepsilon_{1}\) (interpolation); (c) \(q\)–\(\varepsilon_{1}\) (extrapolation); (d) \(e\)–\(\varepsilon_{1}\) (extrapolation)

It should be noted that the ML-based model is data-driven, so its scope of application can be expanded as the variety and amount of data increase. For example, if the database includes data for unloading and for different stress paths, the ML-based model can be trained to simulate soil behaviours under such conditions. Otherwise, physical mechanisms need to be incorporated to refine the ML-based model. Future work will focus on such issues.

5 Conclusions

Determination of soil constitutive models is vitally important to engineering practice. ML algorithms have been used to model soil behaviour, because ML-based constitutive models are free of assumptions and offer strong non-linear mapping capabilities. This study systematically demonstrated the application of ML algorithms for construction of a soil constitutive model, using three commonly used ML algorithms able to present an explicit formulation—BPNN, ELM and EPR—to develop models and comprehensively comparing their modelling performance.

A database based on a simple sand shear constitutive model was first built to objectively reveal the capacity of three ML algorithms and two modelling strategies to model soil behaviour, with the intent of eliminating potential interference from noise-corrupted experimental data. Although ML algorithms can learn directly from data, the incremental stress–strain strategy, which takes the loading path into consideration, was more suitable than the total stress–strain strategy for constructing the ML-based constitutive model. The hyper-parameters of ML-based models can be determined by trial and error, and the genetic algorithm can identify general hyper-parameters for developing the global optimum model. The application of k-fold cross validation can enhance the robustness of the ML-based model, facilitating its application to engineering practice.

Simulation results on theoretical and experimental data indicated that the BPNN-based constitutive model is more stable and accurate than the ELM- and EPR-based constitutive models when modelling soil behaviour, for both interpolation and extrapolation. Notably, the BPNN-based constitutive model’s predictions of deviatoric stress–strain and void ratio–strain relationships for Kaolinite clay agreed with actual experimental data. Overall, the ML-based constitutive model can directly capture non-linear soil behaviours based on limited experimental data without making any assumptions; moreover, an explicit formulation for the constitutive model can be determined, which facilitates its application to numerical analysis and engineering practice.

This study focuses only on the modelling of soil behaviours under monotonic loading. In addition, it explores only a general modelling framework for ML algorithms, and indicates that the neural network-based algorithm is superior to the other types of ML algorithms considered in modelling soil behaviours. With the rapid development of the ML field, more advanced algorithms have been proposed. Future work will investigate the performance of various neural network-based algorithms and determine the most appropriate one for modelling soil behaviours under more complex loading paths.