
1 Introduction

Recent advances in computing have boosted the use of advanced soft computing methods across all industries, although the field of artificial intelligence has been developing since the 1950s. Knowledge-based methods, or expert systems, have been deployed to assist risk-related decisions under uncertainty. Applications in structural engineering research date back to the 1980s [1], since structural engineering problems in practice are governed by a wide range of uncertainties related, for example, to the applied actions, the material performance and homogeneity, or the models used to describe the problem itself. A benefit of such soft computing methods is that they can provide reliable solutions to multi-parametric problems with highly nonlinear correlations among the inputs. At the same time, various disciplines of structural engineering rely on empirical, semi-empirical, or numerical predictive models.

The design of anchorage to concrete offers an example of semi-empirical predictive and design models, particularly for concrete-related failure modes. The assessment of anchor capacity using Artificial Neural Networks (ANN) has been previously investigated in [2, 3]. More recent studies present machine learning algorithms, such as Gaussian process regression (GPR), that perform very efficiently in structural engineering problems [4, 5]. The present paper investigates the feasibility of using GPR algorithms to predict the concrete breakout strength of single anchors loaded in shear. To develop the model, experimental results of 366 tests on single anchors with concrete edge breakout failure were collected through an extensive literature search to establish the database used to train and test the model. A parametric study and a comparison of the proposed model with existing predictive models are reported to assess the accuracy and efficiency of the shear capacity design methods for anchors subjected to shear loads.

2 Existing Strength Models for Concrete Breakout Capacity of Single Anchors in Shear

The current EN 1992-4 [6] and ACI 318 [7] design standards provide Eqs. 1 and 2, respectively, for evaluating a single anchor’s resistance against concrete edge failure in non-cracked concrete. The European design method is valid for an anchor diameter \(d_{nom} \le 60\;{\text{mm}}\) and an influence length \(l_{f} \le 12d_{nom}\) in the case of \(d_{nom} \le 24\;{\text{mm}}\), and otherwise \(l_{f} \le \max \left\{ {8 \cdot d_{nom} ; 300\;{\text{mm}}} \right\}\). The design calculations covered by CEN are valid only for \(f_{ck} < 60\;{\text{N/mm}}^{2}\). ACI 318 is valid for a concrete compressive cylinder strength of up to 10,000 psi (70 MPa) for cast-in anchors and 8000 psi (55 MPa) for post-installed anchors, and for an anchor diameter of up to 4 in. (100 mm).

$$ V_{EC2 - k} = 2.4 \cdot d_{nom}^{\alpha } \cdot l_{f}^{\beta } \cdot \sqrt {f_{ck} } \cdot c_{1}^{1.5} $$
(1a)

The mean concrete breakout capacity of a single anchor loaded in shear in non-cracked concrete according to EN 1992-4 is calculated from Eq. 1b [8].

$$ V_{EC2 - m} = 3 \cdot d_{nom}^{\alpha } \cdot l_{f}^{\beta } \cdot \sqrt {f_{cm} } \cdot c_{1}^{1.5} $$
(1b)

where

$$ \alpha = 0.1 \cdot \left( {\frac{{l_{f} }}{{c_{1} }}} \right)^{0.5} $$
$$ \beta = 0.1 \cdot \left( {\frac{{d_{nom} }}{{c_{1} }}} \right)^{0.2} $$
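As an illustration only, Eqs. 1a and 1b can be evaluated with a few lines of Python. The sketch below assumes lengths in mm and concrete strength in N/mm², so that the result is in N; the function name, argument names and the example anchor are hypothetical and are not taken from the experimental database.

```python
import math


def en1992_4_breakout(d_nom, l_f, f_c, c_1, k=3.0):
    """Concrete edge breakout capacity of a single anchor in shear (Eq. 1).

    Assumed units: d_nom, l_f, c_1 in mm; f_c in N/mm^2; result in N.
    k = 3.0 gives the mean capacity (Eq. 1b, with f_c = f_cm);
    k = 2.4 gives the characteristic capacity (Eq. 1a, with f_c = f_ck).
    """
    alpha = 0.1 * (l_f / c_1) ** 0.5
    beta = 0.1 * (d_nom / c_1) ** 0.2
    return k * d_nom ** alpha * l_f ** beta * math.sqrt(f_c) * c_1 ** 1.5


# Hypothetical anchor: 16 mm diameter, 80 mm influence length,
# 100 mm edge distance, 30 N/mm^2 concrete strength.
v_mean = en1992_4_breakout(d_nom=16, l_f=80, f_c=30, c_1=100)
print(f"Mean breakout capacity: {v_mean / 1000:.1f} kN")
```

Because the exponents \(\alpha\) and \(\beta\) depend on \(c_{1}\), the capacity does not scale exactly with \(c_{1}^{1.5}\); evaluating the function for several edge distances makes this visible.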

The design and mean concrete breakout capacities of a single anchor loaded in shear in non-cracked concrete according to ACI 318 are calculated from Eqs. 2a and 2b, respectively [9, 10].

$$ V_{ACI} = \min \left\{ {0.6 \cdot \left( {\frac{{l_{f} }}{{d_{nom} }}} \right)^{0.2} \cdot \sqrt {d_{nom} } \cdot \lambda_{a} \sqrt {f^{\prime}_{c} } \cdot c_{1}^{1.5} ,\; 3.7 \cdot \lambda_{a} \sqrt {f^{\prime}_{c} } \cdot c_{1}^{1.5} } \right\} $$
(2a)
$$ V_{ACI - m} = \min \left\{ {\left( {\frac{{l_{f} }}{{d_{nom} }}} \right)^{0.2} \cdot \sqrt {d_{nom} } \cdot \sqrt {f_{cm} } \cdot c_{1}^{1.5} ,\; 7.1 \cdot \sqrt {f_{cm} } \cdot c_{1}^{1.5} } \right\} $$
(2b)

where \(d_{nom}\) is the outside diameter of the anchor, \(l_{f}\) is the influence length of the anchor loaded in shear, and \(c_{1}\) is the edge distance. \(f_{ck}\) is the characteristic concrete cylinder compressive strength, \(f_{cm}\) is the mean concrete cylinder compressive strength, and \(f^{\prime}_{c}\) is the concrete cylinder strength per the ACI acceptance standards. \(\lambda_{a}\) is the modification factor for applications in lightweight concrete.
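The two-branch structure of Eqs. 2a and 2b, where the smaller of two expressions governs, can be sketched as follows. This is a minimal illustration that assumes the SI form of the equations (lengths in mm, strength in N/mm², result in N); the function names and the example values are hypothetical.

```python
import math


def aci318_design_breakout(d_nom, l_f, f_c, c_1, lambda_a=1.0):
    """Eq. 2a: design breakout capacity, smaller of the two branches."""
    base = lambda_a * math.sqrt(f_c) * c_1 ** 1.5
    geometry_branch = 0.6 * (l_f / d_nom) ** 0.2 * math.sqrt(d_nom) * base
    upper_bound = 3.7 * base
    return min(geometry_branch, upper_bound)


def aci318_mean_breakout(d_nom, l_f, f_cm, c_1):
    """Eq. 2b: mean breakout capacity, smaller of the two branches."""
    base = math.sqrt(f_cm) * c_1 ** 1.5
    geometry_branch = (l_f / d_nom) ** 0.2 * math.sqrt(d_nom) * base
    upper_bound = 7.1 * base
    return min(geometry_branch, upper_bound)


# Hypothetical anchor: the geometry-dependent branch governs here.
v_mean = aci318_mean_breakout(d_nom=16, l_f=80, f_cm=30, c_1=100)
print(f"Mean breakout capacity: {v_mean / 1000:.1f} kN")
```

For small and medium diameters the geometry-dependent branch usually governs in this sketch; the second branch acts as an upper bound for large-diameter, long anchors.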

3 Gaussian Process Regression (GPR)

The Gaussian process model is a type of kernel machine that can be used as a supervised learning technique for classification as well as regression. Gaussian processes give a simple probabilistic representation of random processes and can be used for many different types of nonparametric estimation. The method is now well adopted and applied in various areas of structural engineering [4, 5]. A summary of the GPR algorithm is presented in this section. More details on the GPR methodology can be found in [11].

Consider a training set \(U = \left\{ {\left( {x_{i} ,y_{i} } \right);i = 1,2, \ldots ,n} \right\}\) drawn from an unknown distribution, where the inputs \(x_{i} \in {\mathbb{R}}^{D}\) are collected in the design matrix \(X \in {\mathbb{R}}^{D \times n}\) and the outputs \(y_{i}\) in the vector of desired outputs \(y \in {\mathbb{R}}^{n}\). A GPR model predicts the value of the output variable \(y_{new}\) given a new input vector \(x_{new}\) and the training data. In classic linear regression, the output variable \(y\) is modelled as a function of the input variable \(x\), as expressed in Eq. 3 [11].

$$ y = x^{T} \beta + \varepsilon $$
(3)

where \(x\) is the input vector, \(y\) is the observed target value, and \(\varepsilon \sim N\left( {0,\sigma_{n}^{2} } \right)\) is the random error term. The error variance \(\sigma_{n}^{2}\) and the coefficients \(\beta\) are estimated from the data.

The multivariate Gaussian distribution with mean vector \(\mu\) and covariance matrix \({\Sigma }\) has the joint probability density given in Eq. 4, where \(D\) is the dimension of \(x\).

$$ p\left( {x|\mu ,{\Sigma }} \right) = (2\pi )^{{ - \frac{D}{2}}} \left| {\Sigma } \right|^{{ - \frac{1}{2}}} \exp \left( { - \frac{1}{2}\left( {x - \mu } \right)^{T} {\Sigma }^{ - 1} \left( {x - \mu } \right)} \right) $$
(4)

Unlike the Gaussian distribution, which is a distribution over vectors, the Gaussian process is a distribution over functions with a covariance function \(k\left( {x,x^{\prime}} \right)\) and a mean function \(m\left( x \right)\) (Eq. 5).

$$ f\left( x \right) \sim GP\left( {m\left( x \right), k\left( {x,x^{\prime}} \right)} \right) $$
(5)

The GP is indexed by \(x\). The mean function and covariance function of a real process \(f\left( x \right)\) are defined in Eqs. 6 and 7, respectively.

$$ m\left( x \right) = E\left[ {f\left( x \right)} \right] $$
(6)
$$ k\left( {x,x^{\prime}} \right) = E\left[ {\left( {f\left( x \right) - m\left( x \right)} \right)\left( {f\left( {x^{\prime}} \right) - m\left( {x^{\prime}} \right)} \right)} \right] $$
(7)

A Gaussian process can be viewed as a multivariate Gaussian of infinite length. Following the GPR procedure, the \(n\) observations in an arbitrary data set, \(y = \left\{ {y_{1} , \ldots ,y_{n} } \right\}\), can be taken as a sample from some multivariate Gaussian distribution. Hence, by going from the process to a distribution, we can understand a GP and draw samples from it. The Gaussian process \(f \sim GP\left( {m, k} \right)\) is defined with a mean function \(m\left( x \right) = 0\) (Eq. 8) and a covariance/kernel function \(k\left( {x,x^{\prime}} \right)\) (Eq. 9). The goal of working only with finite quantities is achieved by requiring the values of \(f\) at only a finite number \(n\) of locations. Given the \(x\)-values, we can evaluate the GP, which then reduces to a multivariate Gaussian distribution [11].

$$ \mu = m\left( x \right) = 0 $$
(8)
$$ k\left( {x,x^{\prime}} \right) = \sigma_{f}^{2} \exp \left( { - \frac{1}{{2l^{2} }}\left( {x - x^{\prime}} \right)^{2} } \right) + \sigma_{n}^{2} \delta \left( {x,x^{\prime}} \right) $$
(9)

where \(l\) denotes the length parameter of the kernel function and \(\delta \left( {x,x^{\prime}} \right)\) denotes the Kronecker delta function.
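To make the step from Eqs. 8 and 9 to a prediction concrete, the following NumPy sketch conditions a zero-mean GP on training data and returns the posterior mean and covariance at new inputs. It is a minimal one-dimensional illustration of the standard Cholesky-based GP regression equations described in [11], not the MATLAB implementation used in this study; the hyperparameter values and the toy data are assumptions chosen for demonstration.

```python
import numpy as np


def sq_exp_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared-exponential part of Eq. 9 for 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * diff ** 2 / length ** 2)


def gp_predict(x_train, y_train, x_new, sigma_f=1.0, length=1.0, sigma_n=0.1):
    """Posterior mean and covariance of a zero-mean GP (Eq. 8) at x_new."""
    n = len(x_train)
    # The Kronecker delta term of Eq. 9 adds the noise variance on the diagonal.
    K = sq_exp_kernel(x_train, x_train, sigma_f, length) + sigma_n ** 2 * np.eye(n)
    K_s = sq_exp_kernel(x_train, x_new, sigma_f, length)
    K_ss = sq_exp_kernel(x_new, x_new, sigma_f, length)

    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                           # posterior covariance
    return mean, cov


# Toy data (assumed for illustration): noisy-free samples of sin(x).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
mean, cov = gp_predict(x_train, y_train, np.array([1.5]))
print("prediction:", mean[0], "+/-", np.sqrt(cov[0, 0]))
```

Far from the training points the posterior mean returns to the zero prior mean and the variance returns to \(\sigma_{f}^{2}\), which is why the coverage of the training database matters for predictions at new anchor configurations.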

Gaussian process regression can use different types of kernel functions, including the squared exponential kernel, the Laplace kernel and the linear kernel. Since different kernel functions suit different types of data, several kernel functions need to be trialled to choose the most appropriate one. In this study, the most suitable kernel function (a non-linear kernel function) obtained for the database is the Gaussian, or Radial Basis Function (RBF), kernel \(k\left( {x,x^{\prime}} \right) = \exp \left( { - \frac{1}{{2\sigma^{2} }}\left\| {x - x^{\prime}} \right\|^{2} } \right)\), where \(\sigma\) is the width of the kernel, a user-defined parameter.

4 Development of the GPR Model

The experimental database considered in this investigation was compiled from research published in the technical literature, mainly the experiments reported in [10, 12–22]. These experiments were designed to reflect the behaviour of single anchors in shear. The database consists of the failure loads of 366 single anchors in shear that failed by concrete edge breakout in non-cracked concrete. It covers a wide range of anchor configurations and can be used to assess any anchor design method against experimental results.

The development of an efficient model for predicting the concrete breakout strength of single anchors in shear requires the inclusion of the main factors affecting anchors in shear as inputs. The various parameters affecting the breakout strength of single anchors in shear are discussed in [9]. In this study, the input parameters considered for the implementation of the GPR model include the edge distance \(c_{1}\), the anchor diameter \(d_{nom}\), the influence length \(l_{f}\) and the concrete strength \(f_{c}\).

To implement the GPR model, the database of experimental anchor tests was split into two subsets: a training data set and a testing data set. The training data set is used to develop the GPR model, whereas the performance of the developed model is evaluated using the testing data set. When splitting the database into subsets, it is essential to ensure that the data patterns are statistically consistent in both the training and testing data sets. This was achieved by randomly re-sorting the training and testing data sets until an acceptable consistency was obtained among the input variables in terms of statistical properties (such as mean and standard deviation) and range of data. This is summarised in Table 1. In this study, 70% of the data (256 out of 366 cases) were used for training, and the remainder (110 cases) were used for testing the model.
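The repeated random splitting described above can be sketched as a simple loop that reshuffles the data until the training and testing subsets have similar statistics. The acceptance criterion used here (relative difference in mean and standard deviation below a tolerance of 15% for every input) is an assumption made for illustration, since the study does not state its criterion, and the range of the data is not checked in this sketch. The data rows are synthetic placeholders.

```python
import random
import statistics


def split_indices(n, train_fraction, seed):
    """Shuffle 0..n-1 and cut into training and testing index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(train_fraction * n)
    return idx[:n_train], idx[n_train:]


def column_stats(rows, indices):
    """(mean, standard deviation) of every input column for the given rows."""
    columns = zip(*(rows[i] for i in indices))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def is_consistent(rows, train_idx, test_idx, tol):
    """Assumed criterion: means and std devs agree within relative tol."""
    def close(a, b):
        return abs(a - b) <= tol * max(abs(a), abs(b), 1e-12)

    pairs = zip(column_stats(rows, train_idx), column_stats(rows, test_idx))
    return all(close(m1, m2) and close(s1, s2) for (m1, s1), (m2, s2) in pairs)


def consistent_split(rows, train_fraction=0.7, tol=0.15, max_tries=1000):
    """Try successive random splits until one passes the consistency check."""
    for seed in range(max_tries):
        train_idx, test_idx = split_indices(len(rows), train_fraction, seed)
        if is_consistent(rows, train_idx, test_idx, tol):
            return train_idx, test_idx
    raise RuntimeError("no consistent split found; consider relaxing tol")


# Synthetic placeholder rows (c_1, d_nom, l_f, f_c); not the real database.
rng = random.Random(42)
rows = [(rng.uniform(40, 300), rng.uniform(8, 30),
         rng.uniform(40, 200), rng.uniform(20, 60)) for _ in range(366)]
train_idx, test_idx = consistent_split(rows)
print(len(train_idx), len(test_idx))
```

With 366 rows and a training fraction of 0.7, the rounding in `split_indices` reproduces the 256/110 split reported in this study.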

The nonlinear regression technique of the GPR models, implemented in a MATLAB environment, was used to predict the concrete breakout strength of anchors in shear. To map the input data into the feature space, the nonlinear regression technique requires kernel functions. The optimum search method was used to obtain the optimum parameters. The performance of the developed model was quantified using statistical parameters, namely the coefficient of determination (\(R^{2}\)), mean absolute percentage error (MAPE), mean squared error (MSE) and root-mean-squared error (RMSE).
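For reference, the statistical parameters named above can be computed as in the sketch below, which also includes the mean absolute error (MAE) used in the results section. The coefficient of determination is taken here as \(1 - SS_{res}/SS_{tot}\), which is one common definition and an assumption about the one used in this study; the example values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, MAPE (in %), MSE and RMSE for paired sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical experimental and predicted capacities in kN.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

RMSE and MAE carry the units of the capacity, whereas \(R^{2}\) and MAPE are dimensionless, which is worth keeping in mind when comparing values across studies with different load ranges.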

Table 1 Statistical summary of the experimental dataset used for the model development

5 Results and Discussion

5.1 Performance of the GPR Model

The performance of the developed model on the training and testing data sets is discussed in this section. Using the experimental data summarised in Table 1, the GPR model was trained to learn the interrelationships between the concrete breakout strength and the shear strength parameters (input variables). The model accuracy is assessed using statistical parameters such as \(R^{2}\), RMSE, MAE and MSE, calculated between the experimental and predicted results, and the results are presented in Table 2. The statistical results for the training database (256 experimental tests) are: RMSE = 8.7; MAE = 5.8 and \(R^{2}\) = 0.99. The \(R^{2}\) and MAE values for the testing database (110 experimental tests) are comparable to those for the training database, but with a slightly higher RMSE value. These results indicate that the GPR model is a good predictor of concrete breakout strength and demonstrate the generalization capability of the developed model.

Table 2 Statistical properties of the developed models

5.2 Comparative Study of the GPR with Existing Concrete Breakout Strength Models

The trained GPR model was compared with existing concrete breakout capacity predictive models, using the testing dataset, to examine the predictive performance of the models. Two existing concrete breakout capacity models in shear were considered, namely the predictive models of (1) EN 1992-4 (Eq. 1) and (2) ACI 318 (Eq. 2). The mean shear resistance functions (best estimate models) of EN 1992-4 and ACI 318 (Eqs. 1b and 2b) were used in this comparative assessment.

The plots of the experimental breakout capacity against the breakout capacity predicted by the developed model and the existing models, using the testing database, are presented in Fig. 1. The figure shows the deviation of the strength predicted by each method from the line of the ‘Perfect model’, defined as the locus of all points where the experimental breakout capacity equals the predicted breakout capacity. As seen from the figure, the values predicted by the GPR model are much less scattered than the values from the other models.

A comparison of the GPR model with the existing models in terms of MAE and RMSE values is presented in Table 3. The table shows that the MAE and RMSE values of the GPR model are lower than those of the other models. Güneyisi et al. [23] developed a model to predict the concrete edge breakout capacity of single adhesive anchors using gene expression programming (GEP). Their model yielded an \(R^{2}\) of 0.92 (lower than that obtained for the GPR model in this study) and values of RMSE = 13, MAPE = 14.2 and MSE = 168.7 (higher than those achieved for the GPR model in this study), using a testing database of 34 anchor experiments. However, it should be highlighted that a larger database is used for training and testing in this study than in [23]. A general assessment of the results presented in Table 3 suggests that the GPR model outperformed all the other models investigated in this study.

Fig. 1 a Full plot of experimental breakout capacity versus predicted breakout capacity. b Partial plot of experimental breakout capacity versus predicted breakout capacity (breakout strength up to 100 kN only)

Table 3 Statistical properties of the proposed and existing models

A summary of the statistical properties of the model uncertainty (obtained as the ratio of experimental breakout strength to predicted breakout strength) associated with the GPR and the other models is presented in Table 3, and the distributions are shown as box plots in Fig. 2. A box plot is a statistical tool that summarises the underlying distribution of a dataset. It displays the maximum and minimum values in the dataset, the lower and upper quartiles, the mean and the median. A model uncertainty mean value of \(\mu_{ME} = 1\) is a condition for an ideal model. The GPR and the other models are assessed using the criterion that an ideal model is expected to have, in addition to a model uncertainty mean value of \(\mu_{ME} = 1\), a high precision (that is, a small scatter of the data) [24, 25].
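The model uncertainty statistics and the quantities displayed in a box plot can be obtained from the experimental and predicted capacities as sketched below. The function name and the example values are hypothetical; the sample standard deviation and the default quartile method of Python's `statistics.quantiles` are assumptions, since the study does not state which estimators were used.

```python
import statistics


def model_uncertainty_summary(v_exp, v_pred):
    """Statistics of the ratio experimental / predicted breakout strength."""
    ratios = [e / p for e, p in zip(v_exp, v_pred)]
    q1, median, q3 = statistics.quantiles(ratios, n=4)
    return {
        "mean": statistics.mean(ratios),    # mu_ME, ideally 1
        "std": statistics.stdev(ratios),    # sigma_ME, ideally small
        "min": min(ratios),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(ratios),
        "iqr": q3 - q1,                     # length of the box in a box plot
    }


# Hypothetical capacities in kN, for illustration only.
summary = model_uncertainty_summary([11.0, 18.0, 33.0, 40.0],
                                    [10.0, 20.0, 30.0, 40.0])
print(summary)
```

A ratio above 1 means that the experimental strength exceeds the prediction for that test, so both the mean and the spread of the ratios are needed to judge a model.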

Assessment of the box plot revealed that the GPR model has the smallest length of the interquartile range of all the models investigated, thereby suggesting less variability of the GPR model uncertainty. As presented in Table 3, the model uncertainty variable associated with the GPR model has a mean value of \(\mu_{ME} = 0.99\) (closest to mean value of 1), suggesting that the model reasonably predicts the breakout strength. Regarding standard deviation, the GPR model yields the lowest dispersion of all the models investigated with \(\sigma_{ME} = 0.11\). Table 3 also shows that the best estimate models of EN 1992-4 and ACI 318 overpredict the shear breakout strength, but predictive models in design standards may typically lie on the conservative side.

Fig. 2 Comparison of the prediction error of GPR to other models

6 Conclusions

A possible failure mode for shear-loaded anchors is concrete breakout failure. Concrete-related failure modes pose a significant safety issue, since they may develop abruptly, without preceding signs of damage. Given this, accurate prediction of the concrete breakout resistance of anchors in shear is crucial. This contribution focuses on the feasibility of using the Gaussian Process Regression (GPR) machine learning algorithm to predict the shear breakout strength of single anchors and quantifies the model uncertainties in existing predictive models. The following general conclusions may be drawn from the present study:

  • A reasonable accuracy was obtained for both the training and testing datasets, in terms of low RMSE and MAE values and a high determination coefficient \(R^{2}\), even though the testing data were not used for training. This reflects the generalization capability of the developed GPR model.

  • The prediction capability of the developed model was compared to that of the existing models proposed in EN 1992-4 and ACI 318. The statistical analysis revealed that the proposed GPR model had lower errors and a higher determination coefficient than the existing codified models investigated.

  • The model uncertainty associated with the GPR model has the closest mean value to 1 (\(\mu_{ME} = 0.99\)) and the lowest standard deviation (\(\sigma_{ME} = 0.11\)). Therefore, the GPR model is described as the best performer of all the models analysed in this study.

  • In the context of reliability analyses, a limit state function should ideally be based on a good predictive model, that is, one with low bias (a mean model uncertainty close to 1) coupled with low uncertainty. Such a model can be used as a general probabilistic model in the reliability analysis of design provisions for fastenings to concrete.