
1 Introduction

Driven piles are commonly used to transfer loads from the superstructure through weak strata onto stiffer soils or rocks. For these piles, the impact of the piling hammer induces compression and tension stresses in the piles. Hence, an important design consideration is to ensure that the pile has sufficient strength to resist the stresses induced by the hammer impact. One common method of calculating the driving stresses is based on the stress-wave theory [18], which involves a discrete idealization of the hammer-pile-soil system. Because conditions differ from site to site, a wave-equation-based computer program is generally required to generate the pile driving criteria for each individual project. The pile driving criteria include:

  • Hammer stroke versus blows per foot, BPF (1/set), for the required bearing capacity,

  • Maximum compressive stress versus BPF,

  • Maximum tensile stress versus BPF.

However, this process can be rather time-consuming and requires specialized knowledge of the wave equation program.

The essence of numerical modeling is prediction: a set of variables in the input space is related to a set of response variables in the output space through a model. The analysis of pile drivability involves a large number of design variables and nonlinear responses, particularly with statistically dependent inputs. Commonly used regression models therefore become computationally impractical; a further limitation is the strong model assumptions these regression methods impose.

An alternative soft computing technique is the artificial neural network (ANN). An ANN consists of one or more layers of interconnected neurons (nodes), with a weight associated with each connecting link. The “learning” paradigm in the commonly used back-propagation (BP) algorithm [14] involves presenting examples of input and output patterns and adjusting the connection weights so as to reduce the errors between the actual and the target output values. The iterative modification of the weights is carried out using the gradient descent approach, and training is stopped once the errors have been reduced to an acceptable level. The ability of the trained ANN model to generalize the correct input-output response is assessed in the testing phase, in which the trained network is presented with a separate set of data never used during training.

This paper explores the use of the ANN and another soft computing technique known as multivariate adaptive regression splines (MARS) [3] to capture the intrinsic nonlinear and multidimensional relationships associated with pile drivability. As with neural networks, MARS requires no prior information on the form of the numerical function. The main advantages of MARS lie in its capacity to capture complicated data mappings in high-dimensional patterns, to produce simpler, easier-to-interpret models, and to assess the relative importance of the input parameters. Previous applications of the MARS algorithm in civil engineering include predicting doweled pavement performance, estimating the shaft resistance of piles in sand and the deformation of asphalt mixtures, analyzing shaking table tests of reinforced soil walls, determining the undrained shear strength of clay, predicting liquefaction-induced lateral spread, assessing the ultimate and serviceability performance of underground caverns, estimating EPB tunnel-induced ground surface settlement, and inverse analysis of braced excavations [1, 7, 8, 12, 13, 15,16,17, 19,20,21,22,23]. In this paper, back-propagation neural network (BPNN) and MARS models are developed for pile drivability predictions of the Maximum compressive stresses (MCS), Maximum tensile stresses (MTS), and Blows per foot (BPF). A database of more than four thousand piles is utilized for model development and for comparing the performance of the BPNN and MARS predictions.

2 Methodologies

2.1 Back-Propagation Algorithm

A three-layer, feed-forward neural network topology, shown in Fig. 1, is adopted in this study. The back-propagation algorithm involves two phases of data flow. In the first phase, the input data are presented forward from the input layer to the output layer to produce an actual output. In the second phase, the errors between the target and actual values are propagated backwards from the output layer to the previous layers, and the connection weights are updated to reduce these errors. No effort is made to keep track of the characteristics of the input and output variables. The network is first trained using the training data set; the objective of training is to map the inputs to the outputs by determining the optimal connection weights and biases through the back-propagation procedure. The number of hidden neurons is typically determined by trial and error; normally the smallest number of neurons that yields satisfactory results (judged by the network performance in terms of the coefficient of determination R² of the testing data set) is selected. In the present study, a Matlab-based back-propagation algorithm with the Levenberg-Marquardt (LM) algorithm [2] was adopted for neural network modeling; a minimal sketch of the two-phase weight update is given below.

Fig. 1 Back-propagation neural network architecture used in this study
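For readers unfamiliar with the two-phase flow, the following is a minimal NumPy sketch of one forward pass and one gradient-descent weight update for a single-hidden-layer network. It is illustrative only and is not the Matlab/LM implementation used in this study; the data, hidden size, and learning rate are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))             # 100 hypothetical patterns, 17 inputs
y = np.tanh(X[:, :1] + 0.5 * X[:, 1:2])    # hypothetical scaled target

H, lr = 9, 0.01                            # hidden neurons, learning rate (illustrative)
W1, b1 = 0.1 * rng.normal(size=(17, H)), np.zeros(H)
W2, b2 = 0.1 * rng.normal(size=(H, 1)), np.zeros(1)

for epoch in range(500):
    # Phase 1: present inputs forward from input layer to output layer
    hidden = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid hidden activation
    out = hidden @ W2 + b2                          # actual network output
    err = out - y                                   # error vs. target values
    # Phase 2: propagate errors backwards and update the connection weights
    grad_out = err / len(X)                                  # dMSE/2 w.r.t. output
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)  # chain rule to hidden layer
    W2 -= lr * hidden.T @ grad_out; b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_hidden;   b1 -= lr * grad_hidden.sum(axis=0)
```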

2.2 Multivariate Adaptive Regression Splines Algorithm

MARS was first proposed by Friedman [3] as a flexible procedure for organizing relationships between a set of input variables and the target dependent variable that are nearly additive or involve interactions with fewer variables. It is a nonparametric statistical method based on a divide-and-conquer strategy in which the training data sets are partitioned into separate piecewise linear segments (splines) of differing gradients (slopes). MARS makes no assumptions about the underlying functional relationships between the dependent and independent variables. In general, the splines are connected smoothly together, and these piecewise curves (polynomials), also known as basis functions (BFs), result in a flexible model that can handle both linear and nonlinear behavior. The connection/interface points between the pieces are called knots. Marking the end of one region of data and the beginning of another, the candidate knots are placed at random positions within the range of each input variable.

MARS generates BFs by stepwise searching over all possible univariate candidate knots and across interactions among all variables. An adaptive regression algorithm is adopted to automatically select the knot locations. The MARS algorithm involves a forward phase and a backward phase. The forward phase places candidate knots at random positions within the range of each predictor variable to define a pair of BFs. At each step, the model adopts the knot and its corresponding pair of BFs that give the maximum reduction in the sum-of-squares residual error. This process of adding BFs continues until the maximum number is reached, which usually results in a very complicated and overfitted model. The backward phase then deletes the redundant BFs that contribute the least. An open-source MARS code from [10] is adopted for the analyses presented in this paper.

Let y be the target dependent response and \( {\mathbf{X}} = (X_{1}, \ldots, X_{P}) \) be a matrix of P input variables. It is then assumed that the data are generated by an unknown “true” model. For a continuous response, this would be

$$ y = f(X_{1} , \ldots ,X_{P} ) + e = f({\mathbf{X}}) + e $$
(1)

in which e is the fitting error and f is the built MARS model, comprising BFs that are piecewise polynomial spline functions. For simplicity, only piecewise linear functions are considered in this paper. A piecewise linear function has the form \( \max (0, x - t) \) with a knot defined at value t. The expression \( \max (\cdot) \) means that only the positive part of the argument is used; otherwise it is assigned a zero value. Formally,

$$ \max (0, x - t) = \left\{ {\begin{array}{ll} {x - t,} & {\text{if } x \ge t} \\ {0,} & {\text{otherwise}} \\ \end{array} } \right. $$
(2)

The MARS model f(X), which is a linear combination of BFs and their interactions, is expressed as

$$ f(X) = \beta_{0} + \sum\limits_{m = 1}^{M} {\beta_{m} \lambda_{m} (X)} $$
(3)

where each \( \lambda_{m} \) is a BF. It can be a single spline function or an interaction BF produced by multiplying an existing term by a truncated linear function involving a new/different variable (higher-order interactions can be used when the data warrant them; for simplicity, at most second-order interactions are adopted here). The terms \( \beta \) are constant coefficients, estimated using the least-squares method.

Figure 2 illustrates how the MARS algorithm uses piecewise linear spline functions to fit the provided data patterns. The fitted MARS equation is as follows

Fig. 2 Knots and linear splines for a simple MARS example

$$ y = - 5.0875 - 2.7678 \times BF1 + 0.5540 \times BF2 + 1.1900 \times BF3 $$
(4)

in which BF1 = max(0, x − 17), BF2 = max(0, 17 − x), and BF3 = max(0, x − 5), where max(a, b) equals a if a > b, and b otherwise. The knots are located at x = 5 and x = 17. These two knots divide the x range into three intervals, in each of which a different linear relationship is identified; a short sketch of evaluating this model is given below.
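As a concrete check of Eq. (4), the small sketch below evaluates the three hinge functions and the fitted response at a few sample points (a minimal illustration of how a MARS model is evaluated; the chosen x values are arbitrary):

```python
import numpy as np

def hinge(s, x, t):
    """Truncated linear (hinge) function: max(0, s*(x - t)) with s = +1 or -1."""
    return np.maximum(0.0, s * (x - t))

def mars_example(x):
    """Evaluate the simple MARS model of Eq. (4)."""
    bf1 = hinge(+1, x, 17.0)   # BF1 = max(0, x - 17)
    bf2 = hinge(-1, x, 17.0)   # BF2 = max(0, 17 - x)
    bf3 = hinge(+1, x, 5.0)    # BF3 = max(0, x - 5)
    return -5.0875 - 2.7678 * bf1 + 0.5540 * bf2 + 1.1900 * bf3

x = np.array([2.0, 5.0, 11.0, 17.0, 25.0])   # arbitrary points spanning the three intervals
print(mars_example(x))
```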

MARS modeling is a data-driven process. To construct the model in Eq. (3), the forward phase is first performed on the training data, starting with only the intercept \( \beta_{0} \). At each subsequent step, the basis pair that produces the maximum reduction in the training error is added. For a current model with M basis functions, the next pair to be added takes the form

$$ \hat{\beta }_{M + 1} \,\lambda_{l} (X)\max (0, X_{j} - t) + \hat{\beta }_{M + 2} \,\lambda_{l} (X)\max (0, t - X_{j}) $$
(5)

with each \( \beta \) being estimated by the least-squares method. This process of adding BFs continues until the model reaches some predetermined maximum number, generally leading to a purposely overfitted model.

The backward phase improves the model by removing the least significant terms until the best sub-model is found. Model subsets are compared using Generalized Cross-Validation (GCV), which is computationally cheaper than actual cross-validation. The GCV is the mean squared residual error divided by a penalty that depends on model complexity. For training data with N observations, GCV is calculated as [9]

$$ GCV = \frac{{\frac{1}{N}\sum\nolimits_{i = 1}^{N} {[y_{i} - f(x_{i})]^{2} } }}{{\left[ 1 - \frac{M + d(M - 1)/2}{N} \right]^{2} }} $$
(6)

in which M is the number of BFs, d is a penalty for each basis function included in the developed sub-model, N is the number of data sets, and \( f(x_{i}) \) denotes the MARS-predicted values. The numerator is thus the mean squared error of the evaluated model on the training data, penalized by the denominator, which accounts for the increasing variance associated with increasing model complexity. Note that \( (M - 1)/2 \) is the number of hinge-function knots, so the GCV penalizes not only the number of BFs but also the number of knots. A default value of 3 is assigned to the penalizing parameter d; further suggestions on choosing the value of d can be found in [3]. At each deletion step, the basis function whose removal minimizes Eq. (6) is pruned, until an adequately fitting model is found.
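The criterion in Eq. (6) is straightforward to compute; the following is a minimal sketch (the responses, predictions, and model size below are placeholders):

```python
import numpy as np

def gcv(y, y_pred, n_bf, d=3.0):
    """Generalized Cross-Validation of Eq. (6).

    y, y_pred : observed and MARS-predicted responses (length N)
    n_bf      : number of basis functions M in the sub-model
    d         : penalty per basis function (default 3, as in this study)
    """
    n = len(y)
    mse = np.mean((y - y_pred) ** 2)                  # numerator: training MSE
    complexity = n_bf + d * (n_bf - 1) / 2.0          # penalized model complexity
    return mse / (1.0 - complexity / n) ** 2

# Example: a 52-BF model evaluated on hypothetical training data
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
y_pred = y + 0.1 * rng.normal(size=1000)
print(gcv(y, y_pred, n_bf=52))
```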

After the optimal MARS model is determined, the analysis of variance (ANOVA) decomposition procedure [3] can be used to assess the relative importance of the parameters: the BFs involving a single variable are grouped together, as are the BFs involving pairwise interactions, and the contributions of the input variables and the BFs are evaluated group by group.

3 Performance Measures

Table 1 summarizes the performance measures and the corresponding definitions used to compare the predictions of the two surrogate methods.

Table 1 Summary of performance measures
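Table 1 itself is not reproduced here; as an indication, the sketch below computes commonly used forms of these measures. The exact definitions of RRMSE and ρ in Table 1 may differ, so those two formulas are assumptions.

```python
import numpy as np

def performance_measures(y, y_pred):
    """Commonly used surrogate-model performance measures (assumed forms)."""
    r = np.corrcoef(y, y_pred)[0, 1]                       # correlation coefficient r
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot                             # coefficient of determination R2
    rrmse = np.sqrt(ss_res / len(y)) / np.mean(np.abs(y))  # relative RMSE (assumed form)
    rho = rrmse / (1.0 + r)                                # performance index (assumed form)
    return {"R2": r2, "r": r, "RRMSE": rrmse, "rho": rho}

# Example with hypothetical observed/predicted values
rng = np.random.default_rng(0)
y = rng.uniform(10, 20, size=500)
print(performance_measures(y, y + rng.normal(0, 1, size=500)))
```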

4 Pile Drivability Data Sets

In this paper, a database containing 4072 piles with a total of seventeen variables is developed from information on piles already installed for bridges in the State of North Carolina [11]. The seventeen variables, including hammer characteristics, hammer cushion material, pile and soil parameters, ultimate pile capacity, and stroke, were regarded as inputs to estimate the three dependent responses: the Maximum compressive stresses (MCS), Maximum tensile stresses (MTS), and Blows per foot (BPF). A summary of the input variables and outputs is listed in Table 2.

Table 2 Summary of input variables and outputs

To simplify the analyses, given the extensive number of parameters and the large data set, Joen and Rahman [11] divided the data into five categories (Q1–Q5) based on the ultimate pile capacity, as detailed in Table 3. In this paper, 70% of the data patterns in each category were randomly selected as the training data set and the remaining data were used for testing (a sketch of this split is given below). For details of the entire data set as well as each design variable and response, refer to the report by Joen and Rahman [11].

Table 3 Division of data with respect to ultimate pile capacities
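The per-category 70/30 split can be reproduced along the following lines. This is a minimal pandas sketch on a synthetic stand-in for the database, since the real data accompany [11]; the column names and category labels are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the database of [11]: 4072 rows with a Q1-Q5 label.
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(4072, 17)),
                    columns=[f"x{i}" for i in range(1, 18)])
data["capacity_category"] = rng.choice(["Q1", "Q2", "Q3", "Q4", "Q5"], size=4072)

train_parts, test_parts = [], []
for _, group in data.groupby("capacity_category"):
    train = group.sample(frac=0.70, random_state=42)  # 70% of each category for training
    test_parts.append(group.drop(train.index))        # remaining 30% for testing
    train_parts.append(train)

train_set, test_set = pd.concat(train_parts), pd.concat(test_parts)
print(len(train_set), len(test_set))
```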

5 BPNN Models

For simplicity, only BPNN models with a single hidden layer are considered. The optimal BPNN model is selected from models with different numbers of hidden neurons, since the other main parameters of the BPNN algorithm have been fixed as follows (a sketch of an LM-trained network with these transfer functions is given after the list):

logsig transfer function from the input layer to the hidden layer;

tansig transfer function from the hidden layer to the output layer;

maxepoch = 500;

learning rate = 0.01;

min_grad = 1 × 10−15;

decrease factor mu_dec = 0.7;

increase factor mu_inc = 1.03.
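The study itself uses Matlab's LM implementation [2]. Purely as an illustration, the following Python sketch trains a one-hidden-layer network with the logsig/tansig transfer functions listed above using the Levenberg-Marquardt solver in scipy; the data, hidden size, and target scaling are placeholders, and Matlab-specific parameters such as mu_dec/mu_inc have no direct scipy equivalent.

```python
import numpy as np
from scipy.optimize import least_squares

def unpack(w, n_in, n_hidden):
    """Split the flat parameter vector into layer weights and biases."""
    i = n_in * n_hidden
    W1 = w[:i].reshape(n_hidden, n_in)
    b1 = w[i:i + n_hidden]
    W2 = w[i + n_hidden:i + 2 * n_hidden]
    b2 = w[i + 2 * n_hidden]
    return W1, b1, W2, b2

def forward(w, X, n_hidden):
    W1, b1, W2, b2 = unpack(w, X.shape[1], n_hidden)
    hidden = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))   # logsig: input -> hidden
    return np.tanh(hidden @ W2 + b2)                  # tansig: hidden -> output

def residuals(w, X, y, n_hidden):
    return forward(w, X, n_hidden) - y                # LM minimizes the squared sum

H = 9                                     # e.g. the optimal MCS model uses 9 neurons
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 17))            # placeholder inputs (17 variables)
y = np.tanh(0.5 * X[:, 0])                # placeholder target, scaled to (-1, 1)
w0 = 0.1 * rng.normal(size=17 * H + 2 * H + 1)
fit = least_squares(residuals, w0, args=(X, y, H), method="lm", max_nfev=500)
print("training RMSE:", np.sqrt(np.mean(fit.fun ** 2)))
```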

5.1 The Optimal BPNN Model

The BPNN with the highest coefficient of determination R² for the testing data sets is considered the optimal model. Figure 3 plots the R² values of the testing data sets for BPNN models with different numbers of hidden neurons (from 5 to 15) for the MCS, MTS and BPF predictions. For the optimal MCS, MTS and BPF models, the number of neurons in the hidden layer is 9, 7 and 11, respectively.

Fig. 3 R² for different numbers of hidden neurons for the MCS, MTS and BPF models

5.2 Modeling Results

Figures 4, 5 and 6 show the BPNN predictions for the training and testing data patterns for MCS, MTS and BPF, respectively. For the MCS predictions, considerably high R² values (>0.97) are obtained for both the training and testing patterns. Compared with the MCS predictions, the developed BPNN model is slightly less accurate in predicting MTS, mainly as a result of the bias (errors) caused by the significantly smaller tensile stress values in comparison with the compressive stresses. For the BPF estimation, high R² values are also obtained for both the training and testing patterns, with the latter slightly greater than for the training sets. These results indicate that the three optimal BPNN models can serve as reliable tools for the prediction of MCS, MTS and BPF.

Fig. 4 Prediction of MCS using BPNN

Fig. 5 Prediction of MTS using BPNN

Fig. 6 Prediction of BPF using BPNN

5.3 Parameter Relative Importance

The relative importance of the parameters in the BPNN models is determined using the method in [5], as discussed by Goh [6]. Figure 7 plots the relative importance of the input variables for the three BPNN models. It can be observed that MCS is influenced most by the input variable x11 (Slenderness) and MTS is influenced most by x8 (Penetration). Interestingly, BPF is influenced primarily by x16 (Ultimate pile capacity).

Fig. 7 Relative importance of the input variables in the BPNN pile drivability models

5.4 Model Interpretability

For brevity, only the developed BPNN MCS model is expressed in mathematical form, through the trained connection weights, the biases, and the transfer functions. The mathematical expression obtained from the optimal MCS model is given in Appendix 1.

6 MARS Models

At most second-order interactions are considered for the prediction of MCS, MTS and BPF using MARS. The number of basis functions is varied from 2n to n² (n = 17 in this study); numerical trials indicate that overfitting occurs when the number of BFs exceeds 80.

6.1 The Optimal MARS Model

The MARS model with the highest R² value and fewer BFs for the testing data set is considered the optimal one. Figure 8 plots the R² values of the testing data sets for MARS models with different numbers of BFs (from 34 to 78) for the MCS, MTS and BPF predictions; a sketch of this selection procedure is given below. For the optimal MCS, MTS and BPF models, the number of BFs is 52, 36 and 38, respectively.

Fig. 8 R² for different numbers of BFs for the MCS, MTS and BPF models
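The paper uses the open-source code from [10]; as an illustration only, an equivalent selection sweep can be written with the py-earth package (a different MARS implementation). The data arrays below are synthetic placeholders, and the grid of candidate term counts follows the 34 to 78 range above.

```python
import numpy as np
from pyearth import Earth
from sklearn.metrics import r2_score

# Synthetic placeholder data sized like the ~70/30 split of the 4072-pile database
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(2850, 17)), rng.normal(size=(1222, 17))
y_train = np.maximum(0, X_train[:, 0] - 1) + 0.1 * rng.normal(size=2850)
y_test = np.maximum(0, X_test[:, 0] - 1) + 0.1 * rng.normal(size=1222)

best = None
for max_terms in range(34, 80, 2):                # candidate numbers of BFs
    model = Earth(max_terms=max_terms, max_degree=2, penalty=3)  # 2nd-order, d = 3
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))  # testing R2 drives selection
    if best is None or r2 > best[0]:
        best = (r2, max_terms)
print("best testing R2 = %.3f with max_terms = %d" % best)
```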

6.2 Modeling Results

Figures 9, 10 and 11 show the MARS predictions for the training and testing data patterns for MCS, MTS and BPF, respectively. For the MCS prediction, considerably high R² values (>0.95) are obtained for both the training and testing patterns. As in the BPNN analysis, the developed MARS model is less accurate in predicting MTS than MCS, mainly because of the bias brought about by the smaller tensile stress values. For the BPF estimation, high R² values (>0.90) are also obtained for both the training and testing patterns, with the latter slightly greater than for the training sets. Consequently, the three optimal MARS models can serve as reliable tools for the prediction of MCS, MTS and BPF.

Fig. 9 Prediction of MCS using MARS

Fig. 10 Prediction of MTS using MARS

Fig. 11 Prediction of BPF using MARS

6.3 Parameter Relative Importance

Table 4 displays the ANOVA decomposition of the built MARS models for MCS, MTS and BPF. For each model, the ANOVA functions are listed. The GCV column provides an indication of the significance of the corresponding ANOVA function by listing the GCV value for a model with all BFs corresponding to that particular ANOVA function removed; this score is used to assess whether the ANOVA function makes a significant contribution to the model or only marginally improves the global GCV score. The #basis column gives the number of BFs comprising the ANOVA function, and the variable(s) column lists the input variables associated with it.

Table 4 ANOVA decomposition of MARS model for MCS, MTS and BPF

Figure 12 plots the relative importance of the input variables for the three pile drivability models developed by MARS. It can be observed that both MCS and BPF are influenced most by the input variable x1 (hammer weight). Interestingly, MTS is influenced primarily by x6 (the weight of the helmet). It should be noted that, since the BPNN and MARS algorithms adopt different methods for assessing the relative importance of the parameters, it is understandable that the two algorithms give different results.

Fig. 12 Relative importance of the input variables in MARS pile drivability models

6.4 Model Interpretability

Table 5 lists the BFs of the MCS model. The MARS model is in the form of

Table 5 BFs and corresponding equations of MARS MCS model
$$ \begin{aligned} MCS\,({\text{MPa}}) = {} & 169.4 + 0.0095 \times BF1 + 35.6 \times BF2 - 47.5 \times BF3 - 0.46 \times BF4 - 2 \times BF5 \\ & + 8847 \times BF6 + 9.2 \times BF7 - 8.2 \times BF8 - 0.0025 \times BF9 + 0.0062 \times BF10 \\ & - 3.2 \times BF11 + 470 \times BF12 - 0.0036 \times BF13 - 0.8 \times BF14 - 0.0012 \times BF15 \\ & + 0.006 \times BF16 + 9.43 \times BF17 - 6.1 \times BF18 + 0.136 \times BF19 - 0.098 \times BF20 \\ & - 0.83 \times BF21 - 0.17 \times BF22 - 540 \times BF23 + 1.34 \times 10^{5} \times BF24 + 1.672 \times BF25 \\ & - 0.42 \times BF26 + 0.144 \times BF27 - 4.57 \times BF28 - 0.0054 \times BF29 + 0.052 \times BF30 \\ & + 87 \times BF31 + 250 \times BF32 - 763 \times BF33 - 16 \times BF34 - 28.1 \times BF35 \\ & + 0.217 \times BF36 - 0.2 \times BF37 + 34.5 \times BF38 + 31.3 \times BF39 - 50.2 \times BF40 \\ & - 425 \times BF41 + 0.0018 \times BF42 - 0.003 \times BF43 - 7.4 \times BF44 + 341 \times BF45 \\ & + 51.4 \times BF46 + 5.67 \times BF47 + 12 \times BF48 + 0.96 \times BF49 + 100.2 \times BF50 \\ & - 0.2 \times BF51 + 0.23 \times BF52 \end{aligned} $$
(7)

7 Discussions

Comparisons of R², r, RRMSE and ρ, as well as the interpretability of the built models, between MARS and BPNN are shown in Table 6. It can be observed that the BPNN models are generally slightly more accurate than the MARS models. In terms of model interpretability, however, MARS outperforms BPNN through its easy-to-interpret models. The two methods can therefore be used to cross-validate each other.

Table 6 Comparison of performance measures for BPNN and MARS

8 Summary and Conclusions

A database containing 4072 pile data sets with a total of 17 variables is adopted to develop the BPNN and MARS models for drivability predictions. The performance measures indicate that the BPNN and MARS models for the analysis of pile drivability provide similar predictions and can thus be used to cross-validate each other. In addition, the MARS algorithm builds flexible models using simpler linear regressions and a data-driven stepwise procedure of searching, adding and pruning; the developed MARS models are much easier to interpret.