1 Introduction

Considering the complex behavior of soil and of soil–structure interaction, determining the pile bearing capacity is regarded as one of the most challenging problems in geotechnics. Researchers have proposed various methods for forecasting pile bearing capacity [1,2,3]. In some of these methods, such as static pile analysis and empirical pile analysis relations, the simplifications made render the selection of a large safety factor unavoidable, which causes low accuracy and waste of resources [4]. Other methods, such as pile loading test procedures, are highly reliable but uneconomical, time-consuming, and costly to set up [5, 6]. The cone penetration test (CPT) is one of the most common in situ field tests owing to its simplicity, high speed, and relatively low cost. In addition, CPT provides a continuous output over the soil depth; moreover, since the penetrometer cone tip is analogous to the pile tip and the cone sleeve to the pile friction surface, estimating pile bearing capacity is one of the common CPT applications [7]. There are two approaches to using CPT results in pile design [8]: the direct approach calculates pile bearing capacity directly from the CPT results, whereas the indirect approach calculates it from soil specifications derived from the CPT results [9, 10]. The use of computational intelligence to estimate pile bearing capacity from CPT results is classified as a direct approach [11, 12]. Despite the significant progress of soil mechanics and geotechnical engineering in recent decades, determination of pile bearing capacity remains a difficult issue. The mechanical properties and physical behavior of soils, together with the diversity of piles, lead to complex interaction between the pile and the surrounding soil [13, 14].
Soil specifications can vary owing to inhomogeneity, anisotropy, the presence of water, and complex stress–strain behavior; in addition, depending on regional conditions, pile properties such as type, material, shape, and construction and installation methods may change [15, 16]. For these reasons, modeling such complex conditions, including the interaction among the different parameters, is not straightforward. Therefore, many investigators [17, 18] have contributed over the past decades to providing theoretical or empirical relationships for determining the bearing capacity of piles [6, 19]. However, each method uses different input parameters tied to laboratory conditions and simplifying assumptions, and may not be satisfactory for pile analysis and design in practice [20]. Consequently, analytical and semi-empirical methods can lead to inaccurate determination of pile bearing capacity [21]. Owing to the high cost of laboratory and field tests on deep foundations, as well as the need to design pile structures optimally, many researchers [22, 23] have proposed applying artificial intelligence (AI) techniques as complementary and alternative methods to the existing traditional methods for estimating the bearing capacity of piles [24,25,26,27].

Artificial neural networks (ANNs) and other AI algorithms, inspired by the structure and function of the human brain, have been widely used in various fields of science and engineering in recent years [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]. Many researchers have also applied a wide range of these techniques to pile capacity prediction [64,65,66,67, 12, 68]. The results of this research confirmed the accuracy and reliability of AI and soft computing methods [9, 69] in predicting pile bearing capacity. Later, a polynomial neural network known as the group method of data handling (GMDH) [70] was developed and used to predict axial pile bearing capacity in geotechnical engineering [71, 72]. Recently, the use of AI techniques such as the GMDH-type neural network, the genetic algorithm (GA), and fuzzy logic theory, together with their parallel integration, has led to the development of advanced hybrid computing algorithms [3, 73,74,75,76]. The hybrid synergistic structure of these approaches has attracted considerable attention from researchers [77,78,79,80], and some success in combining them to improve the structure and tuning parameters of each specific algorithm has been reported in recent years [27, 50, 81]. The objective of this research is to obtain a novel hybrid neural network by combining the GMDH-type neural network with the adaptive neuro-fuzzy inference system (ANFIS), substituting an ANFIS structure into each partial description, and then to improve the new hybrid ANFIS–GMDH network using the particle swarm optimization (PSO) method, yielding an ANFIS–GMDH–PSO model for evaluating and predicting the ultimate pile bearing capacity. Along with ANFIS–GMDH–PSO, another model, the fuzzy polynomial neural network of GMDH type (FPNN–GMDH), is developed for comparison purposes in terms of accuracy and overall performance.

2 Theoretical concepts

2.1 Framework of group method of data handling (GMDH) type neural network structure

ANNs are inspired by the structure of the complex human brain and are known as non-linear, parallel-performing approaches [82]. The GMDH-type neural network is a self-organizing method by which the behavior of a system is identified by evaluating performance over a provided set of multi-input single-output data pairs \((x_{i} ,\,x_{j} )\,,\,(i = 1,2, \ldots ,M)\). The concept of the GMDH network is to build an analytic function within a feed-forward network based on a polynomial transfer function whose coefficients are obtained by a regression process [71]. In the GMDH algorithm, a model is represented as a set of neurons in which different pairs in each layer are connected through a quadratic polynomial, creating new neurons in the subsequent layer. This representation maps the input space to the output space. The basic formulation of the identification problem is to find a function \((\hat{f})\) that can be used in place of the desired function \((f)\) to predict the output \((\hat{y})\) for any given input vector \(X = \left( {x_{1} ,\,x_{2} ,\,x_{3} , \ldots ,\,x_{n} } \right)\) as close as possible to the target value (y). Therefore, given M observations of a multi-input, single-output dataset:

$$y_{i} = f\left( {x_{i1} ,\,x_{i2} ,\,x_{i3} , \ldots ,\,x_{in} } \right)\,,\,\,\,\left( {i = 1,2,3, \ldots M} \right).$$
(1)

A GMDH-type ANN can then be trained to estimate the predicted value \((\hat{y}_{i} )\) for each given input vector (X):

$$\hat{y}_{i} = \hat{f}(x_{i1} ,\,\,x_{i2} ,\,x_{i3} , \ldots ,\,x_{in} )\,\,\,\,\,(i = 1,2,3, \ldots ,M).$$
(2)

The main task is to specify a GMDH-type neural network such that the sum of squared differences between the observed and predicted output values is minimized:

$$\sum\limits_{i = 1}^{M} {\left[ {\hat{f}(x_{i1} ,\,\,x_{i2} ,\,\,x_{i3} , \ldots ,\,\,x_{in} ) - y_{i} } \right]}^{2} \,\,\, \Rightarrow \hbox{min} .$$
(3)

An elaborate discrete form of the Volterra functional series, known as the Kolmogorov–Gabor polynomial, can represent the general relationship between the input and output parameter spaces:

$$y = a_{0} + \sum\limits_{i = 1}^{n} {a_{i} x_{i} + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {a_{ij} x_{i} x_{j} + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {\sum\limits_{k = 1}^{n} {a_{ijk} x_{i} x_{j} x_{k} + \cdots \,} } } } } } .$$
(4)

This full mathematical description is typically realized by a system of partial quadratic polynomials constructed from only two variables (neurons), in the form of Eq. (5):

$$\hat{y} = G(x_{i} ,\,x_{j} ) = a_{0} + a_{1} x_{i} + a_{2} x_{j} + a_{3} x_{i} x_{j} + a_{4} x_{i}^{2} + a_{5} x_{j}^{2} .$$
(5)

In this approach, the partial quadratic description is recursively applied to a network of interconnected neurons to build up the general mathematical relation between inputs and output given in Eq. (4). The coefficients \(a_i\) in Eq. (5) are computed by regression so as to minimize the difference between the observed output values (y) and the predicted ones \((\hat{y})\) for each pair of input parameters \((x_{i} ,\,x_{j} )\). In this way, a tree of polynomials is developed using the quadratic form of Eq. (5), whose coefficients are obtained by the least squares method. The coefficients of each quadratic function \(G_i\) are obtained so as to optimally fit the output over all pairs of input–output data, by minimizing the criterion of Eq. (6):

$$E = \frac{{\sum\nolimits_{i = 1}^{M} {(y_{i} - G_{i} (x_{i} ,x_{j} ))^{2} } }}{M}\,\, \Rightarrow \,\,\,\,\hbox{min} .$$
(6)

In the standard form of the GMDH-type neural network, all combinations of two independent variables out of the total n input variables are considered to construct the regression polynomial of Eq. (5) that best fits the dependent observations \((y_{i} \,,\,\,\,i = 1,\,2, \ldots ,\,M)\,\) in the least squares sense.

Therefore, \(\left( \begin{aligned} n \hfill \\ 2 \hfill \\ \end{aligned} \right) = \frac{n(n - 1)}{2}\) neurons are established in the first hidden layer of the feedforward network using the observations \(\left\{ {(y_{i} ,\,x_{ip} ,\,x_{iq} );\,(i = 1,2,3, \ldots ,M)} \right\}\) with \(p,q \in \,\left\{ {1,\,2,\,3, \ldots ,\,n} \right\}\), arranged in the form of Eq. (7):

$$\left[ {\begin{array}{*{20}c} {x_{1p} } & {x_{1q} } & \vdots & {y_{1} } \\ {x_{2p} } & {x_{2q} } & \vdots & {y_{2} } \\ \cdots & \cdots & \cdots & \cdots \\ {x_{Mp} } & {x_{Mq} } & \vdots & {y_{M} } \\ \end{array} } \right].$$
(7)

Applying the quadratic sub-expression of Eq. (5) to each row of the M data triples yields the matrix formulation of Eqs. (8) through (11):

$$Aa = Y,$$
(8)
$$a = \left\{ {a_{0} ,\,a_{1} ,\,a_{2} ,\,a_{3} ,\,a_{4} ,\,a_{5} } \right\},$$
(9)
$$Y = \left\{ {y_{1} ,\,y_{2} ,\,y_{3} , \ldots ,y_{M} } \right\}^{\text{T}} ,$$
(10)
$$A = \left[ {\begin{array}{*{20}c} 1 & {x_{1p} } & {x_{1q} } & {x_{1p} x_{1q} } & {x_{1p}^{2} } & {x_{1q}^{2} } \\ 1 & {x_{2p} } & {x_{2q} } & {x_{2p} x_{2q} } & {x_{2p}^{2} } & {x_{2q}^{2} } \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & {x_{Mp} } & {x_{Mq} } & {x_{Mp} x_{Mq} } & {x_{Mp}^{2} } & {x_{Mq}^{2} } \\ \end{array} } \right]\,.$$
(11)

The least squares technique of multiple regression analysis leads to the solution of the normal equations in the following form:

$$a = (A^{\text{T}} A)^{ - 1} \,A^{\text{T}} Y.$$
(12)

This solution determines the coefficient vector of Eq. (5) for all M data triples. The process is repeated for every neuron of each succeeding hidden layer according to the interconnection topology of the network.
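The fitting of a single GMDH neuron via Eqs. (5), (11) and (12) can be sketched in a few lines of Python. This is a minimal illustration with NumPy; the function names are ours, not from the paper, and `lstsq` is used as the numerically stable equivalent of the normal-equation solution \(a = (A^{T}A)^{-1}A^{T}Y\):

```python
import numpy as np

def fit_gmdh_neuron(xp, xq, y):
    """Fit the six coefficients of the quadratic partial description
    y ~ a0 + a1*xp + a2*xq + a3*xp*xq + a4*xp**2 + a5*xq**2  (Eq. 5)
    by least squares over M observations."""
    # Design matrix A of Eq. (11): one row per observation.
    A = np.column_stack([np.ones_like(xp), xp, xq, xp * xq, xp**2, xq**2])
    # Least squares solution equivalent to a = (A^T A)^-1 A^T Y of Eq. (12).
    a, *_ = np.linalg.lstsq(A, y, rcond=None)
    return a

def gmdh_neuron_predict(a, xp, xq):
    """Evaluate the quadratic partial description of Eq. (5)."""
    return a[0] + a[1]*xp + a[2]*xq + a[3]*xp*xq + a[4]*xp**2 + a[5]*xq**2
```

In a full GMDH network this fit would be repeated for every pair of inputs in every layer, keeping the best-performing neurons according to the criterion of Eq. (6).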

2.2 Framework of fuzzy polynomial neural network GMDH (FPNN–GMDH) structure

In the FPNN–GMDH structure, the partial descriptions take the form of RBF networks. Each partial description has two inputs, and the network structure is created as a hierarchy of these blocks. If M denotes the number of partial descriptions in each layer and P the number of layers, the network output is calculated as follows:

If \(A_{ki} (x_{i} )\) represents the membership function of the kth fuzzy rule in the domain of the ith input variable, the firing strength of the rule is calculated with Eq. (13):

$$\mu_{k}^{pm} = \prod\limits_{i = 1}^{L} {A_{ki}^{pm} (x_{i} )\,} .$$
(13)

in which L can take the values 1 and 2. The inference part of the fuzzy inference engine, used to conclude the value of y, is represented by a coefficient \(w_{k}\). For the mth partial description in the pth layer, the output is calculated by Eq. (14):

$$y^{pm} = \sum\limits_{k = 1}^{K} {\mu_{k}^{pm} \,w_{k}^{pm} } .$$
(14)

in which the membership function is chosen as the Gaussian form of Eq. (15):

$$A_{ki}^{pm} (x_{i} ) = \exp \left\{ { - \frac{{(x_{i}^{pm} - a_{ki}^{pm} )^{2} }}{{b_{ki}^{pm} }}} \right\}.$$
(15)

Equation (14) is the familiar RBF network. Finally, the FPNN–GMDH model output is calculated according to Eq. (16):

$$y = \frac{1}{M}\sum\limits_{m = 1}^{M} {y^{pm} } .$$
(16)

Figure 1 shows a sample of this network with three layers and three partial descriptions per layer. Researchers have introduced different methods to train the FPNN model; the most common are the gradient descent method, structural learning with forgetting (SLF) [83], and MSLF [84]. Various other intelligent optimization methods have also been proposed in recent years, including evolutionary algorithms [85] and meta-heuristic methods such as PSO [86, 87].
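The computation of one fuzzy (RBF-type) partial description, Eqs. (13) through (16), can be sketched as follows. This is a minimal illustration; the function names and array shapes are our own assumptions, not part of the original formulation:

```python
import numpy as np

def fpnn_pd_output(x, a, b, w):
    """Output of one fuzzy (RBF-type) partial description.
    x : (L,) input vector (L = 2 in the network above)
    a : (K, L) Gaussian centres a_ki of Eq. (15)
    b : (K, L) Gaussian widths  b_ki of Eq. (15)
    w : (K,)   consequent coefficients w_k of Eq. (14)
    """
    # Eq. (15): Gaussian membership grades A_ki(x_i), one per (rule, input).
    A = np.exp(-((x[None, :] - a) ** 2) / b)
    # Eq. (13): rule firing strengths as products over the L inputs.
    mu = A.prod(axis=1)
    # Eq. (14): weighted sum of firing strengths -- an RBF network output.
    return float(mu @ w)

def fpnn_gmdh_output(last_layer_pd_outputs):
    """Eq. (16): network output as the mean of last-layer PD outputs."""
    return float(np.mean(last_layer_pd_outputs))
```

Stacking such blocks in layers, with each block fed by two outputs of the previous layer, reproduces the hierarchy of Fig. 1.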

Fig. 1
figure 1

Structure of FPNN–GMDH with six input variables

2.3 Framework of adaptive network based-fuzzy inference system (ANFIS) structure

This article uses the Takagi–Sugeno type adaptive neuro-fuzzy inference system (ANFIS). The ANFIS structure draws on two main fields: fuzzy logic and neural network concepts [88]. When the two approaches are combined, better results can be achieved in both quality and quantity, owing to fuzzy reasoning and the computational ability of neural networks [89]. Like other fuzzy-neural systems, the ANFIS framework consists of two parts, the antecedent and the consequent, connected to each other by a set of rules. The ANFIS structure is a multi-layer network with five layers; a sample structure is shown in Fig. 2. The first layer performs fuzzification; the second layer applies the fuzzy (AND/OR) operators to form the fuzzy rules; the third layer normalizes the membership functions; the fourth layer carries out the fuzzy rule inference; and, finally, the fifth layer calculates the network output (the system's predicted output).

Fig. 2
figure 2

ANFIS architecture including two inputs, four rules, and one output

The equations governing the ANFIS network are as follows:

$$w_{i} = \mu_{{A_{i} }} (x_{1} )\, \times \,\mu_{{B_{i} }} (x_{2} ),$$
(17)
$$\bar{w}_{i} = \frac{{w_{i} }}{{w_{1} + w_{2} }},\,\,\,\,i = 1,2,$$
(18)
$$\begin{aligned} f_{1} = q_{11} x_{1} + q_{12} x_{2} + q_{13} \hfill \\ f_{2} = q_{21} x{}_{1} + q_{22} x_{2} + q_{23} , \hfill \\ \end{aligned}$$
(19)
$$f = \frac{{w_{1} f_{1} + w_{2} f_{2} }}{{w_{1} + w_{2} }} = \bar{w}_{1} f_{1} + \,\bar{w}_{2} f_{2} .$$
(20)

The ANFIS network uses fuzzy membership functions; the most important are bell-shaped functions, with minimum and maximum values of zero and one, respectively:

$$\mu_{{A_{i} }} (x) = \frac{1}{{1 + \left[ {\left( {\frac{{x - \bar{x}_{i} }}{{\sigma_{i} }}} \right)^{2} } \right]^{{b_{i} }} }},$$
(21)
$$\mu_{{A_{i} }} (x) = \exp \left\{ { - \left[ {\left( {\frac{{x - \bar{x}_{i} }}{{\sigma_{i} }}} \right)^{2} } \right]^{{b_{i} }} } \right\}.$$
(22)

in which \(\left\{ {\bar{x}_{i} ,b_{i} ,\sigma_{i} } \right\}\) are the parameters defining the shape of the membership function.

Various methods have been proposed to train the ANFIS network. The most common is the gradient descent method, which minimizes the output error. Hybrid methods have also been introduced, such as training the consequent part by gradient descent and the antecedent part by PSO [24]. Training both the antecedent and consequent parts of the ANFIS structure using evolutionary optimization techniques, for instance GA, or meta-heuristic optimization methods such as PSO and the gravitational search algorithm, are among the other intelligent optimization approaches [90,91,92].
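The forward pass of the two-input, two-rule ANFIS of Eqs. (17) through (20), with the Gaussian membership function of Eq. (22), can be sketched as follows. This is an illustrative sketch; the function names and the packing of the membership-function parameters are our own assumptions:

```python
import numpy as np

def gauss_mf(x, c, sigma, b=1.0):
    """Gaussian-type membership function of Eq. (22)."""
    return np.exp(-(((x - c) / sigma) ** 2) ** b)

def anfis_two_rule_forward(x1, x2, mf_params, q):
    """Forward pass of the two-rule Takagi-Sugeno ANFIS, Eqs. (17)-(20).
    mf_params : {"A": [(c, sigma), ...], "B": [(c, sigma), ...]}, two MFs each
    q         : (2, 3) consequent coefficients q_ij of Eq. (19)
    """
    # Layers 1-2, Eq. (17): firing strengths w_i = mu_Ai(x1) * mu_Bi(x2).
    w = np.array([gauss_mf(x1, *mf_params["A"][i]) * gauss_mf(x2, *mf_params["B"][i])
                  for i in range(2)])
    # Layer 3, Eq. (18): normalisation of the firing strengths.
    w_bar = w / w.sum()
    # Layer 4, Eq. (19): first-order linear consequents f_i.
    f = q[:, 0] * x1 + q[:, 1] * x2 + q[:, 2]
    # Layer 5, Eq. (20): weighted network output.
    return float(w_bar @ f)
```

Training then amounts to adjusting the \((c, \sigma)\) antecedent parameters and the q consequents, by gradient descent or by the meta-heuristics mentioned above.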

2.4 MLP–ANNs framework

Artificial neural networks (ANNs), introduced by McCulloch and Pitts [93], are information processing models that mimic the neural network of the human brain. An ANN consists of input, hidden, and output layers. Each layer contains a set of interconnected processing elements (neurons) whose outputs form the inputs of the next layer. The signal from one layer is passed to the next through weight factors that amplify or attenuate it [94]. An activation function, such as a linear or sigmoid function, is used to compute the outputs of the neurons in the hidden and output layers. The numbers of neurons in the input and output layers are determined by the numbers of input and output variables. There is no definitive rule for choosing the number of neurons in the hidden layers; in practice, the numbers of hidden layers and neurons are determined according to the complexity of the problem and by trial and error [95]. Several neural networks with different training algorithms exist, but a review of the literature shows that feedforward training with the back-propagation (BP) algorithm is commonly used in areas such as mining and geotechnical engineering [96,97,98,99,100]. The ANN modeling process can be summarized in two main parts: (1) assigning the network structure and (2) adjusting the weights of the connections between neurons. In the BP algorithm, the weights are determined by minimizing the error between the target outputs and the values predicted by the ANN, with the error propagated back toward the input layer; the network response is then obtained as the model output [95]. If the response differs from the target value, the weight and bias corrections continue in order to reduce the error. The BP algorithm was therefore used in this study [101]. It should be noted, however, that feedforward back-propagation suffers from convergence problems and can become trapped in local minima.
Figure 3 illustrates the architecture of the conventional ANN used in this study as a benchmark model for comparison with the other models [101].
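The two-part modeling process described above (forward pass, then weight adjustment by back-propagation of the output error) can be sketched for a single-hidden-layer network. This is a generic illustration of BP for one training sample, not the exact network of this study; the layer sizes and learning rate are arbitrary assumptions:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Feed-forward pass: sigmoid hidden layer, linear output layer."""
    h = sigmoid(W1 @ x + b1)
    return W2 @ h + b2, h

def bp_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One back-propagation update minimising the squared output error."""
    y_hat, h = mlp_forward(x, W1, b1, W2, b2)
    err = y_hat - y                       # output-layer error signal
    dW2 = np.outer(err, h)                # gradient for output weights
    delta = (W2.T @ err) * h * (1 - h)    # error propagated to hidden layer
    dW1 = np.outer(delta, x)              # gradient for hidden weights
    return W1 - lr * dW1, b1 - lr * delta, W2 - lr * dW2, b2 - lr * err
```

Repeating `bp_step` over the training set until the error stops decreasing is the essence of the BP training loop, local minima permitting.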

Fig. 3
figure 3

Traditional ANN structure

3 Pile and soil information

A dataset was compiled from published CPT and PLT results to develop and run the different hybrid predictor models for pile capacity evaluation. The databank was assembled from various sources: most entries were taken from the literature, together with experimental field tests reported in past years in some southern areas of Iran. The database consists of soil characteristics; pile properties (embedded length, cross-section shape, material); CPT results, including cone tip resistance and cone sleeve friction; and the ultimate pile capacity (Qt) derived from in situ pile loading tests (PLT). Two main groups of parameters influence Qt: one associated with the measured soil properties and the other with the pile characteristics. The soil characteristics close to the embedded pile can, in most cases, be described by the CPT outputs, namely the cone tip resistance (QC) and the cone sleeve friction resistance (FS); the CPT results were therefore used to represent the soil parameters influencing Qt. The pile geometry (length and diameter) represents the pile characteristics affecting Qt. Unmodified CPT results were used in the modeling process, since pore pressure measurements were seldom included in older CPT soundings owing to limitations of the cone devices. The pile load tests, performed by different researchers on the 72 piles collected from the literature and used in this study, are shown in Table 1. The piles were installed by hammer and jack driving tools; some are concrete piles, while the remaining ones are steel piles [69]. The limited-offset load was adopted as the standard reference for calculating pile bearing capacity from the PLT results [102].

Table 1 The applied collected databases [69]

In the following sections, new hybrid AI models were developed to evaluate the ultimate pile bearing capacity, using the database of Table 1 for the training and testing stages of the proposed models; finally, the performance of the developed models was compared, with a conventional ANN as a reference model, on the basis of statistical criteria. The methodology of this study is briefly outlined in the flowchart of Fig. 4 and in Sect. 4.

Fig. 4
figure 4

The flowchart of this study

4 Methods

In this section, the structures of two soft computing approaches, the ANFIS algorithm and the GMDH algorithm, are combined to develop a new hybrid network model called ANFIS–GMDH. Furthermore, to optimize the structure of the developed ANFIS–GMDH network for pile bearing capacity prediction, the particle swarm optimization algorithm is first described briefly; then, by applying the PSO method to the topology of the ANFIS–GMDH model, the membership function parameters and network structure are improved to obtain a better-performing model (ANFIS–GMDH–PSO) compared with the alternative model (FPNN–GMDH). Finally, the prediction and regression results of the two developed models are compared on the basis of common statistical criteria. The results are shown graphically in charts and tabulated for each developed model to verify the precision and performance of each predictor in the training and testing stages.

4.1 Development of hybrid ANFIS–GMDH structure

In this part, a new structure of the GMDH-type neural network is discussed in which the partial descriptions (PDs) are two-input ANFIS networks in place of the RBF structures. Each partial description is an ANFIS network with two inputs, in which the number of membership functions per input is selectable. Accordingly, the output of each partial description (PD) is defined by Eqs. (23) and (24):

$$F^{pm} = \frac{{\sum\nolimits_{l}^{n} {\sum\nolimits_{k}^{n} {\mu_{{A_{l} }} (x_{1}^{pm} )\,\mu_{{B_{k} }} (x_{2}^{pm} )} } \,f_{lk}^{pm} }}{{\sum\nolimits_{l}^{n} {\sum\nolimits_{k}^{n} {\mu_{{A_{l} }} (x_{1}^{pm} )\,\mu_{{B_{k} }} (x_{2}^{pm} )} } }},$$
(23)
$$f_{lk}^{pm} = q_{lk}^{1} x_{1} + q_{lk}^{2} x_{2} + q_{lk}^{3} .$$
(24)

in which m is the partial description index in the pth layer, n is the selected number of membership functions for the inputs, and the q coefficients are real numbers. The network output is then obtained from Eq. (25):

$$y = \frac{1}{M}\sum\limits_{m = 1}^{M} {F^{pm} } .$$
(25)

The notation M refers to the number of partial descriptions in the last layer.
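The output of one ANFIS partial description, Eqs. (23) and (24), together with the averaging of Eq. (25), can be sketched as follows. This is a minimal sketch; the vectorized packing of the \(q_{lk}\) coefficients into an array is our own assumption:

```python
import numpy as np

def anfis_pd_output(x1, x2, mu_A, mu_B, q):
    """Output F^{pm} of one ANFIS partial description, Eqs. (23)-(24).
    mu_A, mu_B : (n,) membership grades of x1 and x2 for the n MFs per input
    q          : (n, n, 3) consequent coefficients q_lk of Eq. (24)
    """
    # Firing strength of every (l, k) membership-function pair.
    w = np.outer(mu_A, mu_B)                     # shape (n, n)
    # Eq. (24): first-order consequents f_lk = q1*x1 + q2*x2 + q3.
    f = q[..., 0] * x1 + q[..., 1] * x2 + q[..., 2]
    # Eq. (23): normalised weighted average over all rule pairs.
    return float((w * f).sum() / w.sum())

def anfis_gmdh_output(last_layer_pds):
    """Eq. (25): network output as the mean of the last-layer PD outputs."""
    return float(np.mean(last_layer_pds))
```

The PSO stage described next would then tune the membership-function parameters that produce `mu_A` and `mu_B`, along with the q coefficients.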

4.2 Description and development of PSO algorithm on ANFIS–GMDH topology

The ANFIS–GMDH network model has several components that can be optimized by common meta-heuristic algorithms such as the PSO method; here, the PSO algorithm is employed to improve the structure of the ANFIS–GMDH network by optimizing the membership functions and tuning the associated parameters in the PDs. The PSO algorithm was proposed by Kennedy and Eberhart, inspired by the social behavior of animals such as fish, insects, and birds [91]. Each member of a swarm acts as a particle, and each particle represents a potential solution to the optimization problem; the ith particle at the tth iteration has the position vector \(X_{i}^{t}\) and velocity vector \(V_{i}^{t}\), as in Eqs. (26) and (27):

$$X_{i}^{t} = \left\{ {x_{i1}^{t} ,\,x_{i2}^{t} ,\, \ldots ,x_{iD}^{t} } \right\}$$
(26)
$$V_{i}^{t} = \left\{ {v_{i1}^{t} ,\,v_{i2}^{t} ,\, \ldots ,v_{iD}^{t} } \right\}$$
(27)

where D indicates the dimension of the solution space.

Each particle moves through the solution space, and its position changes according to its velocity. The best position visited by a particle is called pbest, and the best global position is gbest; the swarm records both from its first iteration onward. The velocity and position updates are:

$$V_{i}^{t + 1} = \omega^{t} \,V_{i}^{t} + c_{1} r_{1} \,(p{\text{best}}_{i}^{t} - X_{i}^{t} ) + c_{2} r_{2} (g{\text{best}}^{t} - X_{i}^{t} ),$$
(28)
$$X_{i}^{t + 1} = X_{i}^{t} + V_{i}^{t + 1} ,$$
(29)

where \(r_{1}\) and \(r_{2}\) are two sequences of uniform random values generated from the interval [0, 1], and \(c_{1}\) and \(c_{2}\) are the cognitive and social scaling parameters, respectively.

PSO is very sensitive to the inertia weight (w), which is decreased linearly with the number of iterations:

$$\omega = \omega_{\rm{max} } - \frac{{\omega_{\rm{max} } - \omega_{\rm{min} } }}{{t_{\rm{max} } }}\, \cdot \,t$$
(30)

where \(w_{ \rm{max} }\) and \(w_{ \rm{min} }\) are the maximum and minimum values of w, respectively, and \(t_{ \rm{max} }\) is the limiting number of optimization iterations.

The PSO algorithm is combined with the ANFIS–GMDH model to obtain the ANFIS–GMDH–PSO model, which generates three PDs in the first layer. The second layer is created from the PDs of the first layer and, finally, the ANFIS–GMDH–PSO model is optimized with three layers.

The P particles are initialized with random positions and velocities, and the population is evaluated. The \(p{\text{best}}_{i}^{k}\) of each particle is initialized with a copy of its position \(X_{i}^{k}\). If the stopping condition is satisfied, the algorithm returns \(g{\text{best}}^{k}\); otherwise the velocities and positions are updated, the population is evaluated again, and \(p{\text{best}}_{i}^{k}\) and \(g{\text{best}}^{k}\) are updated in parallel. Then k = k + 1 and the loop repeats. The flowcharts of the PSO algorithm and of the combination of the PSO topology with the developed ANFIS–GMDH model are shown in Fig. 5a, b, respectively.
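The loop just described, with the update rules of Eqs. (26) through (30), can be sketched as a generic PSO minimizer. This is an illustrative sketch on an arbitrary objective function, not the actual coupling to the ANFIS–GMDH parameters; the swarm size, iteration count, and acceleration coefficients are assumed values:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimal PSO sketch implementing Eqs. (26)-(30) for minimising f."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (n_particles, dim))   # positions, Eq. (26)
    V = np.zeros((n_particles, dim))             # velocities, Eq. (27)
    pbest = X.copy()
    pbest_f = np.array([f(x) for x in X])
    gbest = pbest[pbest_f.argmin()].copy()
    for t in range(iters):
        # Eq. (30): linearly decreasing inertia weight.
        w = w_max - (w_max - w_min) * t / iters
        r1, r2 = rng.random((2, n_particles, dim))
        # Eq. (28): velocity update; Eq. (29): position update.
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = X + V
        fx = np.array([f(x) for x in X])
        improved = fx < pbest_f
        pbest[improved] = X[improved]
        pbest_f[improved] = fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, float(pbest_f.min())
```

In the hybrid model, `f` would be the training error of the ANFIS–GMDH network as a function of its flattened membership-function and consequent parameters.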

Fig. 5
figure 5

a Flowchart of the PSO algorithm. b Flowchart of ANFIS–GMDH model optimized by PSO algorithm

Fig. 6
figure 6

Results of performance indices of the ANN model for training and testing stages in predicting pile bearing capacity

Fig. 7
figure 7

Predicted vs. measured values plot for ANN model in train and test stages

Fig. 8
figure 8

Results of performance indices of FPNN–GMDH model in pile bearing capacity prediction

5 Predictive AI models evaluation

In this research, to identify the best-fitted models for predicting pile capacity, statistical parameters such as the coefficient of correlation (R), mean square error (MSE), and root mean square error (RMSE) were calculated to evaluate the prediction performance of the developed models, as in the following equations:

$$R = \frac{{\sum\limits_{i = 1}^{M} {\left( {y_{{i({\text{Actual}})}} - \bar{y}_{{ ( {\text{Actual)}}}} } \right)\left( {y_{{i({\text{Model}})}} - \bar{y}_{{ ( {\text{Model)}}}} } \right)} }}{{\sqrt {\sum\limits_{i = 1}^{M} {\left( {y_{{i({\text{Actual}})}} - \bar{y}_{{ ( {\text{Actual)}}}} } \right)^{2} \, \times \,\sum\limits_{i = 1}^{M} {\left( {y_{{i ( {\text{Model)}}}} - \bar{y}_{{ ( {\text{Model)}}}} } \right)^{2} } } } }},$$
(31)
$${\text{MSE}} = \frac{1}{M}\sum\limits_{i = 1}^{M} {(y_{{i({\text{Model}})}} - y_{{i({\text{Actual}})}} )^{2} } ,$$
(32)
$${\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{M} {\left( {y_{{i({\text{Model}})}} - y_{{i({\text{Actual}})}} } \right)^{2} } }}{M}} ,$$
(33)
$${\text{Error}}\,{\text{Mean}} = \frac{{\sum\nolimits_{i = 1}^{M} {(y_{{i({\text{Actual}})}} - y_{{i({\text{Model}})}} )} }}{M},$$
(34)
$${\text{Error}}\,{\text{StD}} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{M} {(E_{{i({\text{Model}})}} - \bar{E}_{\text{Model}} )^{2} } }}{M - 1}} .$$
(35)

in which \(y_{{i({\text{Model}})}}\) is the predicted value (model output) for each observation (i = 1, 2, …, M), \(y_{{i({\text{Actual}})}}\) is the target (measured) value, M is the number of observations, and E denotes the error between the measured value and the model output for each observation in the dataset.
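The five criteria above map directly onto standard NumPy operations. The following sketch (function name ours) computes them for a pair of measured and predicted vectors:

```python
import numpy as np

def performance_indices(y_actual, y_model):
    """Statistical criteria of Eqs. (31)-(35) for a set of M observations."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    e = y_actual - y_model
    # Eq. (31): Pearson correlation coefficient between measured and predicted.
    R = float(np.corrcoef(y_actual, y_model)[0, 1])
    mse = float(np.mean(e ** 2))            # Eq. (32)
    rmse = float(np.sqrt(mse))              # Eq. (33)
    error_mean = float(np.mean(e))          # Eq. (34)
    error_std = float(np.std(e, ddof=1))    # Eq. (35), sample StD with M - 1
    return {"R": R, "MSE": mse, "RMSE": rmse,
            "ErrorMean": error_mean, "ErrorStd": error_std}
```

Applying this function to the training and testing predictions of each model yields the entries reported in Table 2.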

For comparative purposes, the same training and test datasets were used for all estimator AI models, and the above quantitative criteria were applied to evaluate the performance of the different models. The accuracy and reliability of the predicted output values (pile capacity) were determined using the statistical indices R, MSE, and RMSE. Theoretically, a predictive model would be perfect with R = 1 and MSE/RMSE = 0, together with a low error mean and a minimum error standard deviation (error StD), although in practice these depend on the scatter and outliers of the dataset. The performance indices of the best ANN, FPNN–GMDH, and ANFIS–GMDH–PSO models for the training and testing stages are presented in Table 2.

Table 2 Results of performance statistical values for the developed AI models

As illustrated in Figs. 9 and 11, the performance of the ANFIS–GMDH–PSO model was higher than that of the FPNN–GMDH model in both the training and testing stages. The integrated FPNN–GMDH approach obtained R values of 0.93 and 0.92 for the training and testing datasets, respectively (Fig. 8), while the hybrid ANFIS–GMDH–PSO model achieved R values of 0.94 and 0.96 for the training and testing stages, respectively (Fig. 10). Moreover, the RMSE values of 0.048 and 0.069 for the training and testing stages of the ANFIS–GMDH–PSO model show that the proposed hybrid model is the more accurate choice for pile bearing capacity calculation in comparison with the other developed models (Figs. 6, 7 and 8). Table 2 shows that the two developed hybrid AI models (FPNN–GMDH and ANFIS–GMDH–PSO) perform well in the training and testing phases compared with the traditional ANN, outperforming the ANN benchmark model used in this study on all of the mentioned statistical criteria. In the training stage, the ANFIS–GMDH–PSO model achieved the best R, MSE, RMSE, and error StD of 0.94, 0.002, 0.048, and 0.048, respectively (Fig. 10), while Figs. 6 and 8 show that the FPNN–GMDH model obtained better results than the ANN model. Analysis of the testing-stage results shows that the optimized ANFIS–GMDH model performs better than all the other models overall (Fig. 10). The relations between the predictions of the best-fitted ANN, FPNN–GMDH, and ANFIS–GMDH–PSO models and the measured values for the training and testing datasets are shown in Figs. 7, 9 and 11, respectively. There is a significant difference between the results of the newly developed model (ANFIS–GMDH–PSO) and those of the other developed models (ANN, FPNN–GMDH).
This can be attributed to the use of the PSO algorithm to adjust the weights and biases of the hybrid network structure during the learning process. As indicated in Table 2, the performance indices demonstrate that the results of the ANFIS–GMDH–PSO model are highly correlated with the measured pile bearing capacities, showing that this model can estimate pile capacity with a high level of precision. In addition, the plots of predicted versus measured ultimate pile bearing capacity for the ANFIS–GMDH–PSO model are shown in Fig. 11 for the training and testing phases. The RMSE measures the residual between the observed and predicted values; R evaluates the linear correlation between the observed and calculated values, while the error mean evaluates the model's bias with respect to the mean. According to the statistics presented in Table 2, the best performance among the artificial intelligence methods developed in this paper differs depending on the statistical criterion. It is noted that no significant limitation was encountered in running the developed hybrid algorithms during the modeling procedure; however, owing to the limited size of the available dataset (72 records) introduced to the optimum model, it is not possible to train the network further to achieve the best performance on the test dataset. It would therefore be valuable to use larger datasets for the AI network training stage if a higher degree of accuracy is expected from the optimum models.

Fig. 9
figure 9

Predicted vs. measured values plot for FPNN–GMDH model in train and test stages

Fig. 10
figure 10

Results of performance indices of ANFIS–GMDH–PSO model in pile bearing capacity prediction

Fig. 11
figure 11

Predicted vs. measured values plot for ANFIS–GMDH–PSO model in train and test stages

6 Conclusions

In this study, one of the most significant problems in deep foundation engineering, predicting the ultimate pile bearing capacity, has been addressed with the aid of in situ CPT and PLT results through newly developed AI models. An experimental database including cone penetration test (CPT) and pile loading test (PLT) results was collected from the existing literature and used to construct and develop the different AI models. Two new hybrid AI methods (FPNN–GMDH and ANFIS–GMDH–PSO) were developed, together with an ANN model, to compare and validate the best-fitted predictive model in terms of accuracy and performance on standard statistical indices. From the modeling results for the different AI models, the following conclusions can be drawn:

  • Based on the results of the two hybrid neural models under consideration, the optimized ANFIS–GMDH–PSO network model showed better performance than the FPNN–GMDH and ANN models, having the lowest RMSE and the highest R. The two developed models can be employed as new hybrid soft computing tools with an acceptable degree of precision for geotechnical engineering problems. In this respect, the new alternative approaches could possibly substitute for in situ field tests and semi-empirical regression-based equations in assessing ultimate pile bearing capacity, which are costly, time-consuming, and subject to unreliability and uncertainty under complicated field conditions.

  • The hybrid ANFIS–GMDH structure could be improved further by applying other intelligent meta-heuristic optimization techniques, such as GA and imperialist competitive algorithms, in future investigations.

  • For simplicity, and to keep the network structure less complex, both the ANFIS–GMDH–PSO and FPNN–GMDH topologies were created with setting parameters assumed by the user, which keeps the algorithm running time in the MATLAB programming language as short as possible.

  • Statistical indices such as R, MSE, RMSE, and error StD were used as evaluation criteria for the structures of the various models developed. During the modeling and testing process, the developed ANFIS–GMDH–PSO model showed a relatively high level of accuracy and precision for estimating pile bearing capacity compared with the other developed models, so that the predicted values correlated closely with the measured values. Relative error estimation also shows relatively good performance for the developed hybrid models in the testing process.