Introduction

In recent decades, post-tensioning techniques have been widely used to control swaying in bridge piers (Marriott et al., 2009). Piers designed on this principle can withstand seismic excitation and return to their original position afterward. Post-tensioned (PT) base rocking steel piers have recently been investigated as a cost-effective alternative to concrete piers (Ahmad et al., 2021). The proposed pier consists of a circular steel tubular column welded to a circular base plate, a PT tendon, and supplementary energy dissipators (EDs). Gap opening at the connection interface, caused by the rocking mechanism, gives the pier a high deformation capacity under lateral loading, even in the presence of material nonlinearity. Under seismic loading, the pier exhibits a flag-shaped hysteresis response with limited or negligible residual displacement. Because the columns are simple to prefabricate, the system is a candidate for accelerated bridge construction (ABC), in which a bridge is built in a shorter amount of time. Given its potential application in seismically active regions, an efficient and accurate model for predicting the lateral response of the system is essential. However, no publicly available predictive model in the literature can accurately predict the response of this system (Wakjira et al., 2022).

Interest in applying machine learning (ML) and deep learning techniques to earthquake and structural engineering has grown markedly (Flood, 2008). This growth is likely driven by the ability of ML models to capture the relationship between independent and dependent variables without prior knowledge of the underlying physical or mathematical models. Numerous studies have predicted the response of a wide variety of structural components using ML models trained on data from experiments and numerical simulations.

Keshtegar et al. (2021) developed a novel hybrid intelligent model to predict the maximum capacity of a reinforced concrete shear wall (RCSW) structure before it fractures. Thaler et al. (2021) developed a Monte Carlo simulation method for engineering structures with nonlinear behavior that incorporates machine learning and other forms of artificial intelligence. Naser et al. (2021) examined the use of nature-inspired machine learning methods to uncover previously unknown correlations between the geometric and material parameters that affect the load capacity of CFST columns. Almustafa and Nehdi (2020) applied the random forest technique to estimate the maximum displacement of blast-exposed reinforced concrete slabs. In addition, a machine learning model with thirteen features was presented to predict the maximum displacement of blast-exposed reinforced concrete columns (Almustafa & Nehdi, 2022). Rahman et al. (2021) estimated the shear strength of steel fiber reinforced concrete (SFRC) beams using eleven ML models. Rofooei et al. (2011) employed artificial neural network (ANN) models to assess the seismic vulnerability of concrete structures with moment-resisting frames. Nguyen et al. (2021) proposed an ANN model for estimating the shear strength of polymer concrete beams reinforced with fibers. Dey et al. (2020) used several popular corrosion models to estimate the service life of reinforced concrete buildings and compared the results with those generated by an ANN. Bardhan et al. (2022) created a high-performance machine learning system to calculate the ultimate load-carrying capacity of concrete-filled steel tube columns.

Kaveh and Khalegi (1998) trained ANNs with one, two, and three hidden layers using the backpropagation algorithm to predict the strength of different types of concrete mixes, and used the best networks to predict the strength of concrete blends with minimum error. Kaveh et al. (2021) studied various ML approaches to determine the relationship between the fiber angle and the buckling capacity of cylinders under bending-induced loads. Naderpour and Mirrashid (2020) used an innovative ANN model to predict the shear strength of concrete beams reinforced with fiber reinforced polymer (FRP) bars. Tran and Kim (2021) developed a neural network model to predict the punching shear strength of two-way reinforced concrete slabs, building the ANN from a total of 218 test results taken from the relevant literature. Kaveh et al. (2021) also used several machine learning approaches to forecast the ultimate buckling load of composite cylinders.

To predict the lateral cyclic response of rocking steel bridge piers, this research develops physics-based ML techniques, including linear regression, support vector regression, decision tree, random forest, artificial neural network, k-nearest neighbors, and XGBoost. The performance of the various ML models and ANN frameworks is assessed and compared in order to gauge the accuracy and precision of their predictions.

This research applies the ANN framework and ML models to predict several lateral cyclic responses of rocking steel bridge piers: column residual drift, column shortening, the ratio of degraded stiffness to initial stiffness, the maximum lateral strength to uplift force ratio, and the lateral strength reduction ratio. The motivation for employing machine learning and deep learning models is that, unlike statistical or mathematical methods that require a predetermined model, they enable the discovery of patterns and relationships in complex datasets.

In the current investigation, the input vector consists of eight parameters that affect the lateral cyclic response of PT-base rocking steel bridge piers: column diameter (\({d}_{c}\)), column diameter-to-thickness ratio (\({d}_{c}/{t}_{c}\)), column height-to-diameter ratio (\({h}_{c}/{d}_{c}\)), cross-sectional area of the tendon to column ratio (\({A}_{\mathrm{pt}}/{A}_{c}\)), tendon prestressing ratio (\({f}_{pt,0}/{f}_{pt,u}\)), dead load ratio (\(P/{A}_{c}{f}_{y,c}\)), base plate thickness (\({t}_{\mathrm{bp}}\)), and base plate extension (\({e}_{\mathrm{bp}}\)). The study investigated several response variables, including the column residual drift (\({\updelta }_{\mathrm{res}}/{h}_{c}\)), column shortening to height ratio (\({\updelta }_{\mathrm{short}}/{h}_{c}\)), ratio of degraded stiffness to initial stiffness (\({\mathrm{K}}_{\mathrm{deg}}/{K}_{\mathrm{ini}}\)), and maximum lateral strength to uplift force ratio (\({\mathrm{V}}_{\mathrm{max}}/{V}_{\mathrm{up},\mathrm{rigid}}\)).

In this context, \({t}_{c}\) is the thickness of the column wall, \({A}_{pt}\) is the area of the tendon, \({A}_{c}\) is the cross-sectional area of the column, \({f}_{pt,0}\) is the prestressing force of the tendon, \({f}_{pt,u}\) is the ultimate strength of the tendon, \(P\) is the load of the superstructure, and \({f}_{y,c}\) is the yield strength of the column steel, ASTM A252-19 Gr (2019). Experimental measurements on three tubes gave a yield strength of 415 MPa. The input parameters used to develop the models are presented in Table 1. All input parameters except \({d}_{c}\) and \({d}_{c}/{t}_{c}\) were assigned three distinct values. The column diameter took four distinct values, and a total of 23 different values of the \({d}_{c}/{t}_{c}\) ratio were considered, as shown in Table 1.
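As an illustration of how the eight-entry input vector could be assembled from physical quantities, the following Python sketch computes the dimensionless ratios defined above. It assumes that \({d}_{c}\) is the outer diameter of the tube, so that \({A}_{c}\) is the area of a circular annulus; the function names and all numerical values other than the 415 MPa yield strength are hypothetical and are not taken from Table 1.

```python
import math


def tube_area(d_c, t_c):
    """Area of a circular tube, assuming d_c is the OUTER diameter (assumption)."""
    d_inner = d_c - 2.0 * t_c
    return math.pi / 4.0 * (d_c ** 2 - d_inner ** 2)


def input_vector(d_c, t_c, h_c, a_pt, f_pt0, f_ptu, p, f_yc, t_bp, e_bp):
    """Return the eight input parameters listed in the text as a dictionary."""
    a_c = tube_area(d_c, t_c)
    return {
        "d_c": d_c,                        # column diameter
        "d_c/t_c": d_c / t_c,              # diameter-to-thickness ratio
        "h_c/d_c": h_c / d_c,              # height-to-diameter ratio
        "A_pt/A_c": a_pt / a_c,            # tendon-to-column area ratio
        "f_pt0/f_ptu": f_pt0 / f_ptu,      # tendon prestressing ratio
        "P/(A_c f_yc)": p / (a_c * f_yc),  # dead load ratio
        "t_bp": t_bp,                      # base plate thickness
        "e_bp": e_bp,                      # base plate extension
    }


# Hypothetical pier (units: mm, mm^2, MPa, N); only f_yc = 415 MPa is from the text.
x = input_vector(d_c=500.0, t_c=10.0, h_c=2500.0, a_pt=1500.0,
                 f_pt0=744.0, f_ptu=1860.0, p=500e3, f_yc=415.0,
                 t_bp=40.0, e_bp=100.0)
for name, value in x.items():
    print(f"{name}: {value:.4f}")
```

Working with ratios rather than raw dimensions keeps most inputs dimensionless, which is one plausible reason for the parameterization used here.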

Table 1 Input parameters

Machine learning models

In this study, we tested six machine learning approaches for predicting the lateral cyclic response: linear regression, K-nearest neighbors (KNN), support vector regression (SVR), decision tree (DT), random forest (RF), and extreme gradient boosting (xgBoost). Linear regression is a statistical method that assumes a linear relationship between the input and output variables. KNN is a non-parametric approach whose output is a weighted average of the K nearest neighbors. In DT, the dataset is broken down into a hierarchy of simple decisions, where each branch relies on just one or a small number of input features, so that the information is organized as a tree. The goal of SVR is to find a prediction function whose output stays within an allowed error margin of the target value. The RF estimate is the average over a collection of trees, each constructed using a random vector drawn independently of the input vector. AdaBoost and xgBoost are recent developments in the line of boosting methods, which merge several relatively weak learners into a single robust one. The weak learners are defined so that they substantially improve prediction performance when combined. In xgBoost, each leaf of a tree is assigned a continuous score as the tree is grown (Mangalathu et al., 2021). Table 2 lists the optimum hyperparameter values of the machine learning models.
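To make the KNN description concrete, the following Python sketch predicts a response as a weighted average of the K nearest training samples. The inverse-distance weighting, the value of K, and the toy data are assumptions chosen for illustration; they are not the settings reported in Table 2.

```python
import math


def knn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest training targets.

    The weighting scheme is an assumption; the text only states that the
    output is a weighted average of the K nearest neighbors.
    """
    # Pair each training target with its Euclidean distance to the query,
    # then keep the k closest pairs.
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # eps avoids division by zero when the query coincides with a sample.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Toy one-dimensional data set (hypothetical values).
train_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]

print(knn_predict(train_x, train_y, (0.5,), k=2))  # midway between two samples
print(knn_predict(train_x, train_y, (1.0,), k=3))  # query matches a sample
```

When the query coincides with a training sample, the inverse-distance weight of that sample dominates, so the prediction essentially reproduces its target.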

Table 2 Hyperparameters of machine learning models

Artificial neural networks

In machine learning, artificial neural networks (ANNs) rely on mathematical techniques founded on the principle of interconnected layers of nodes. An ANN is a specialized form of artificial intelligence (AI) built to identify and address complex problems and events. Although parallels can be drawn between neural networks and traditional digital computing methods, neural networks offer several additional benefits. For instance, they combine high accuracy with parallel processing and distributed data storage. Furthermore, once training is complete, they show remarkable robustness and the ability to learn from and use new information. In most cases, an ANN consists of a layer of input neurons followed by further layers of interconnected neurons. As shown in Fig. 1, the neurons make predictions about the outcome of a given process. According to Rafiq et al. (2001), the interface between the layers is built on the link weights. An ANN can thus be described as a computational structure made of many simple, densely interconnected processing units that analyze data while responding dynamically to external input (Bu et al., 2021).

Fig. 1

Structure of the artificial neural network (Bu et al., 2021)

An ANN can learn to retain the characteristics of the data presented to it. This enables it to relate new data to data it has already seen, with varying degrees of success. The fundamental purpose of the hidden layers is to connect the input and output layers and carry information between them. This architecture makes it easier for the network to derive a non-linear relationship from the dataset supplied to it. With reference to a specific neuron \({n}_{j}\), an ANN is composed of six fundamental components: inputs (\({p}_{i}\)), bias (\({b}_{j}\)), weights (\({w}_{ij}\)), the sum function \({(n)}_{j}\), the activation function (f), and outputs (\({a}_{j}\)), as illustrated in Fig. 1.

An "input" is a piece of information treated as a decision variable; it originates either from the external environment or from other neurons. Weights are numerical values that measure the influence of the process elements and input factors on one another. Initialization may assign arbitrary values to the weights. The "sum function" combines the influence of the inputs and their associated weights while also accounting for a predetermined bias value in the process element. It is described mathematically by Eq. 1:

$${(n)}_{j}=\sum_{i=1}^{i=k}{w}_{ij}{p}_{i}+{b}_{j}$$
(1)

Here, \(i=\left[1;k\right]\) denotes the index of the ith input neuron, \(j=\left[1;m\right]\) denotes the index of the jth output neuron, k is the total number of elements in the input vector, and \({b}_{j}\) is the bias of the jth node, which acts as the activation threshold.

The activation function, or transfer function, is commonly the log-sigmoid or the hyperbolic tangent function. It processes the input value \({(n)}_{j}\) and determines the corresponding output value according to Eq. 2:

$${(a)}_{j}={f(n)}_{j}=\frac{1}{1+{e}^{-\alpha {(n)}_{j}}}$$
(2)

The variable \({(a)}_{j}\) denotes the output of the jth neuron, and the constant \(\alpha\) is a control parameter that regulates the slope of the semi-linear region. It is common to set \(\alpha\) to 1 (Bu et al., 2021).
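Equations 1 and 2 can be sketched for a single neuron in a few lines of Python. The function names are illustrative only, and the numerical inputs, weights, and bias are hypothetical values chosen so that the result is easy to check by hand.

```python
import math


def sum_function(p, w_j, b_j):
    """Eq. 1: weighted sum of the inputs p_i with weights w_ij plus the bias b_j."""
    return sum(w * x for w, x in zip(w_j, p)) + b_j


def log_sigmoid(n_j, alpha=1.0):
    """Eq. 2: log-sigmoid activation; alpha controls the slope (commonly 1)."""
    return 1.0 / (1.0 + math.exp(-alpha * n_j))


def neuron_output(p, w_j, b_j, alpha=1.0):
    """Output a_j of neuron j for the input vector p."""
    return log_sigmoid(sum_function(p, w_j, b_j), alpha)


# Hypothetical example with k = 2 inputs: n_j = 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0
print(neuron_output(p=[1.0, 2.0], w_j=[0.5, -0.25], b_j=0.0))  # 0.5
```

A weighted sum of zero maps to an output of 0.5, the midpoint of the log-sigmoid; a larger \(\alpha\) steepens the transition around that point.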

Python was used to build the neural network model in this investigation. The pandas, NumPy, and scikit-learn libraries were used for data preprocessing and management. The model has three hidden layers, each with sixty-four neurons (Table 3). A randomly selected 75% of the available data was used for training, and the remaining 25% was held out for testing. Table 3 summarizes the parameters of the ANN model.
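The data split and the network layout described above can be sketched in plain Python as follows. This is not the Keras model used in the study: the network below is untrained, the random data are placeholders for the eight input parameters, and the log-sigmoid hidden activation, linear output, and weight initialization range are assumptions made only for illustration.

```python
import math
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Randomly hold out test_fraction of the rows; the rest is used for training."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (initialization scheme is an assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Forward pass: log-sigmoid hidden layers, linear output layer (assumptions)."""
    a = x
    for idx, (weights, biases) in enumerate(layers):
        n = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        a = n if is_output else [1.0 / (1.0 + math.exp(-v)) for v in n]
    return a


rng = random.Random(42)
sizes = [8, 64, 64, 64, 1]  # 8 inputs, three hidden layers of 64 neurons, 1 output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

rows = [[rng.random() for _ in range(8)] for _ in range(100)]  # placeholder data
train, test = split_train_test(rows)
prediction = forward(train[0], layers)
print(len(train), len(test), len(prediction))  # 75 25 1
```

One output neuron is used here on the assumption that a separate network is fitted for each response variable; a single network with several outputs would be an equally plausible layout.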

Table 3 Parameters of the ANN model

Modeling metric for evaluation

Performance metrics were used to assess the accuracy of the proposed models and to quantify their errors. The most important of these is the coefficient of determination, \({R}^{2}\). This metric, which typically ranges from 0 to 1, measures the proportion of the variance in the response variable that is explained by the model, and thus quantifies how well the model describes the data at hand. The closer \({R}^{2}\) is to 1, the closer the predicted values are to the experimental values. In Eq. 3, \({y}_{i}\) is the observed value, \({\widehat{y}}_{i}\) is the predicted value, \(\overline{y}\) is the mean of the observed values, and N is the number of samples:

$${R}^{2}=1-\frac{\sum_{i=1}^{N}{({y}_{i}-{\widehat{y}}_{i})}^{2}}{\sum_{i=1}^{N}{({y}_{i}-\overline{y })}^{2}}$$
(3)

The root mean square error (RMSE), a widely used performance metric in predictive modeling, is the square root of the mean square error. It expresses the typical amount by which a data point deviates from the value predicted by the model. A smaller RMSE indicates a better model, as shown in Eq. 4:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{({y}_{i}-{\widehat{y}}_{i})}^{2}}$$
(4)

The mean absolute error (MAE), the average absolute difference between the observed and predicted values, is given by Eq. 5:

$${\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} |y_{i} - \hat{y}_{i} |$$
(5)
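The three metrics in Eqs. 3 to 5 can be computed directly from paired lists of observed and predicted values. The sketch below uses plain Python with illustrative function names and hypothetical data; in practice, a library such as scikit-learn provides equivalent functions.

```python
import math


def r2_score(y, y_hat):
    """Eq. 3: coefficient of determination."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Eq. 4: root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Eq. 5: mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


# Hypothetical observed and predicted values, each prediction off by exactly 1.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [2.0, 3.0, 4.0, 5.0]
print(r2_score(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```

For this toy data the residual sum of squares is 4 and the total sum of squares is 5, so \({R}^{2}\) is about 0.2 while RMSE and MAE both equal 1. A model that fits worse than the mean of the observations would give a negative value from Eq. 3.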

Results and discussion

The coverage and organization of the training data, as well as its volume, have a substantial impact on the effectiveness of a network. The current investigation used a database of the lateral quasi-static cyclic response of post-tensioned steel bridge piers (Wakjira et al., 2022). The neural network model was trained using the Keras module on top of the TensorFlow backend. Searching for hyperparameters is one of the most important steps before training. Table 3 presents the hyperparameters obtained after optimization. Statistical indicators were then used to evaluate how well the optimized models predict the seismic response of rocking steel bridge piers.

Several metrics, namely the coefficient of determination (\({R}^{2}\)), the RMSE, and the MAE, were computed to determine the performance of each method. The \({R}^{2}\) value indicates how closely the proposed formulation matches the observed values. The RMSE is a cost function that actively contributes to the learning process of the algorithm. A good predictive model has an \({R}^{2}\) value close to 1.0 and low RMSE and MAE values. In this investigation, the random forest (RF) model had the highest \({R}^{2}\) and the lowest RMSE of all the models considered.

Table 4 reports the performance of the various ML models in terms of the three indicators. A model performs well when its \({R}^{2}\) value is high and its error measures are correspondingly low. For residual drift, the RF model attains the highest \({R}^{2}\) (0.98), with MAE and RMSE values of 0.004 and 0.015, respectively. The DT model attains the next-highest \({R}^{2}\) (0.97), with an RMSE of 0.019. These results indicate that the RF model predicts more accurately than the other models. The linear model performs worst in predicting residual drift, with the lowest \({R}^{2}\) (0.35) and the highest MAE (0.061). Figures 2 and 3 plot the loss and the RMSE of the ANN model against the epoch number. As the number of epochs increases, the RMSE of both the training and testing sets decreases.

Table 4 Performance measures for the various machine learning models
Fig. 2

Loss plot in ANN model for: a \({\updelta }_{res}/{h}_{c}\); b \({\updelta }_{short}/{h}_{c}\); c \({\mathrm{K}}_{deg}/{K}_{ini}\); d (\({\mathrm{V}}_{\mathrm{max}}/{V}_{\mathrm{up},\mathrm{rigid}}\))

Fig. 3

RMSE plot in ANN model for: a \({\updelta }_{res}/{h}_{c}\); b \({\updelta }_{short}/{h}_{c}\); c \({\mathrm{K}}_{deg}/{K}_{ini}\); d (\({\mathrm{V}}_{\mathrm{max}}/{V}_{\mathrm{up},\mathrm{rigid}}\))

Residual drift (\({\updelta }_{\mathrm{res}}/{h}_{c}\)) is one of the most important variables affecting the seismic performance and functionality of rocking bridge piers after a seismic event. Figure 4 presents scatter plots comparing the predicted and actual residual drift. The 45-degree line in these graphs represents an exact match between the predicted and actual values of \({\updelta }_{\mathrm{res}}/{h}_{c}\). Each plot also displays the coefficient of determination (\({R}^{2}\)) for both the training and test sets. As shown in Fig. 4, the linear regression algorithm could not accurately capture the complicated nonlinear relationship between the predictors and the response variable, \({\updelta }_{\mathrm{res}}/{h}_{c}\).

Fig. 4

Comparison of predicted and actual residual drift: a ANN; b Linear; c SVR; d KNN; e DT; f RF; g XGBoost

Figures 5, 6, and 7 show scatter plots of predicted against actual values for \({\updelta }_{\mathrm{short}}/{h}_{c}\), \({\mathrm{K}}_{\mathrm{deg}}/{K}_{\mathrm{ini}}\), and \({\mathrm{V}}_{\mathrm{max}}/{V}_{\mathrm{up},\mathrm{rigid}}\), respectively. As with the residual drift, the linear regression models could not capture the relationship between the input features and these response variables.

Fig. 5

Comparison of predicted and actual column shortening to height ratio: a ANN; b Linear; c SVR; d KNN; e DT; f RF; g XGBoost

Fig. 6

Comparison of predicted and actual degraded stiffness to initial stiffness ratio: a ANN; b Linear; c SVR; d KNN; e DT; f RF; g XGBoost

Fig. 7

Comparison of predicted and actual ratio of maximum lateral strength to uplift force: a ANN; b Linear; c SVR; d KNN; e DT; f RF; g XGBoost

Conclusions

The purpose of this research was to determine whether the lateral cyclic response of PT-base rocking steel bridge piers can be forecast using an ANN model. The response variables investigated were column residual drift, column shortening, the ratio of degraded stiffness to initial stiffness, the maximum lateral strength to uplift force ratio, and the lateral strength reduction ratio. Several candidate predictors were tested: the column diameter, the column diameter-to-thickness ratio, the column height-to-diameter ratio, the tendon-to-column cross-sectional area ratio, the tendon prestressing ratio, the dead load ratio, the base plate thickness, and the base plate extension. These variables were compared with one another to determine which had the greatest influence. The models were trained and validated using a dataset of more than 18,000 distinct data points. The model errors, measured using statistical metrics such as \({R}^{2}\), RMSE, and MAE, show only slight deviations between the predicted and observed values. The results showed that the optimized RF model could accurately and efficiently estimate the lateral cyclic responses of PT-base rocking steel bridge piers.