1 Introduction

Vertical deflection, the displacement of a bridge's structure along the vertical axis, can occur due to various factors such as load, temperature changes, or material degradation over time. Excessive vertical deflection can lead to lateral drift, affecting the stability of the bridge, and can cause vehicle-bridge coupling vibration, impacting the operation of vehicles on the bridge. Vertical deflection in bridge structures is therefore a critical safety concern: even when a bridge is structurally sound, excessive vertical deflection can indicate potential issues [1]. Monitoring and controlling vertical deflection is thus crucial for maintaining the safety and longevity of bridge structures.

Monitoring and analyzing vertical deflection can also help identify potential structural weaknesses or areas of a bridge in need of maintenance or repair. Traditionally, several approaches have been used to determine the vertical deflection of bridges, including the dial gauge method (DGM), the linear variable differential transformer (LVDT), the laser scanning method (LSM), and the finite element method (FEM). In DGM, a dial gauge is mounted on the bridge structure and the deflection is measured by observing the movement of the gauge; the accuracy of this method is limited by the sensitivity of the gauge. LVDT uses a linear transducer to measure the displacement of a target point on the bridge; this method is more accurate than the dial gauge but requires more complex instrumentation. LSM uses a laser scanner to measure the distance between the scanner and the bridge structure, and the deflection is calculated from the change in this distance; the method provides high accuracy and precision but is more complex and expensive. FEM is a computer-based simulation method that uses mathematical models to simulate the behavior of the bridge structure under load; it is useful for predicting the behavior of a bridge under different loading scenarios, but its accuracy depends on the quality of the model and the input data.

In recent years, soft computing (SC) models, including machine learning (ML) models, have gained prominence. These models are built on computational algorithms that learn from data and analyze the relationship between input and output variables, which makes them particularly effective in solving complex real-world problems. They have found applications in various fields, including structural engineering [2, 3], material science [4, 5], and bridge engineering [6], and are considered more flexible and adaptable than traditional computing approaches such as FEM. Yue et al. [7] applied deep learning and linear regression to predict the temperature-induced deflection of long-span cable-stayed bridges. Deng et al. [8] used deep learning to predict the deflection of an in-service bridge from a time-continuous vehicle influence coefficient and environmental temperature variables. Yue et al. [9] developed an independent recurrent neural network to predict the temperature-induced deflection of a cable-stayed bridge based on the average temperature of the main girder. Wang et al. [10] developed and compared two SC models, namely a long short-term memory (LSTM) neural network and support vector machines (SVM), for predicting the temperature-induced deflection of cable-stayed bridges. In general, these studies showed that SC models have great potential for solving problems in bridge engineering [6]. However, their application to predicting the deflection of various types of bridges is still limited [11]. The development and application of novel SC models for predicting bridge deflection are therefore needed to improve predictive performance, which could help bridge engineers save time and cost in bridge health monitoring and evaluation.

The main objective of this study is thus to predict the vertical deflection of steel–concrete composite bridges (VDCB) using three novel ensemble models, namely SGBE-RF, RSS-RF, and B-RF, each a combination of random forest (RF) with an ensemble technique: stochastic gradient boosting (SGBE), random subspace (RSS), and bagging (B), respectively. The main difference between this study and previously published works is that it is the first to predict the VDCB using soft computing models; its novelty also lies in the development of novel ensemble soft computing models for this task. A database of 83 experimental loading tests carried out on 83 bridges in various locations throughout Vietnam was collected and used for model development. Correlation-based feature selection was used to evaluate the importance of the input variables used in the predictive modeling. Several measures, namely RMSE, MAE, R², and the Taylor diagram, were selected for validation and comparison of the models. The Weka software was used for predictive modeling and data processing.

2 Materials and Methods

2.1 Data Used

In this study, experimental loading tests were conducted on 83 bridges located throughout Vietnam [11]. These bridges have been in operation for between 4 and 43 years. The data collected from these tests were used for model development. The database includes two main types of variables: five input parameters (X1–X5), consisting of the bridge's cross-sectional shape, length of the concrete beam, number of years in use, height of the main girder, and distance between the main girders, and one output parameter (Y), the maximum vertical deflection measured in millimeters. The maximum vertical deflection was determined through a truck-loading test using a 300 kN load, in which the largest measured bridge deflection was selected through the eccentricity method. The deflection measurements were taken using a dial indicator with 0.01 mm increments, mounted at the mid-span position on the bottom of the beam with limit pins and an outer frame clamp (Figs. 1 and 2). The distribution of the input and output variables used in this study is presented in Fig. 3, and Table 1 shows the initial statistical analysis of the data. To train and validate the models, the experimental data were randomly split into two parts: 70% of the data were used to generate the training dataset, and the remaining 30% were used for the testing dataset. The hold-out method was used to divide the data, as it is a popular method for splitting data for ML modeling [12]. The data used in this study were also presented in previous work [11].
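For illustration, a minimal sketch of this 70/30 hold-out split is given below using scikit-learn (the study itself used Weka for data processing; the file name and column labels here are assumptions):

```python
# Minimal sketch of the 70/30 hold-out split (hypothetical file/column
# names; the study itself performed data processing in Weka).
import pandas as pd
from sklearn.model_selection import train_test_split

# X1-X5: cross-sectional shape, concrete-beam length, years in use,
# main-girder height, girder spacing; Y: max vertical deflection (mm).
data = pd.read_csv("bridge_loading_tests.csv")   # hypothetical file name
X = data[["X1", "X2", "X3", "X4", "X5"]]
y = data["Y"]

# Random 70% training / 30% testing split (hold-out method).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
```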

Fig. 1

Dial indicator used in the experiments (a) and field photo of experiments (b)

Fig. 2

Diagram of truck-loading test used in this study

Fig. 3

Distribution of the data used in this study

Table 1 Initial statistical analysis of data used in this study

2.2 Methods Used

2.2.1 Random Forest (RF)

The RF algorithm, a popular ML technique, was first introduced by Leo Breiman and Adele Cutler in 2001 [13]. It is based on the concept of decision trees, but instead of using a single decision tree, RF builds an ensemble of decision trees whose predictions are combined [14]. Multiple decision trees are built on different subsets of the data and features, and the predictions of the trees are combined to make the final prediction [15]. In each iteration of the algorithm, a random subset of the data and a random subset of the features are selected for building the decision trees [16], which helps reduce the risk of overfitting and improves the generalizability of the model.

RF has been applied in a wide range of applications, including finance, healthcare, and bioinformatics. In this work, RF was used as the base model for developing various novel ensemble models for prediction of the VDCB.
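As a minimal sketch, an RF regressor for this kind of tabular data could be configured as follows (scikit-learn shown purely for illustration, since the study used Weka; the hyper-parameter values are assumptions, not those of Table 2):

```python
# Minimal RF sketch; X_train, y_train, X_test come from the hold-out
# split sketched in Sect. 2.1. Hyper-parameter values are illustrative.
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,      # number of decision trees in the ensemble
    max_features="sqrt",   # random feature subset considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
print(rf.predict(X_test[:5]))  # final prediction = average over all trees
```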

2.2.2 Stochastic Gradient Boosting (SGBE)

SGBE is an extension of the gradient boosting algorithm and was first introduced by Jerome Friedman in 2002 [17]. It is an ensemble method that combines the results of multiple models trained on different subsets of the data. Its main principle is to iteratively train models on the residuals of the previous models while introducing a stochastic element to reduce overfitting and improve the generalizability of the model [17]. The method starts by training an initial model on the training data, which is then used to make predictions on the same data. In each subsequent iteration, a new model is trained on the residuals (the differences between the predicted and actual values) of the previous model. However, instead of training on the entire dataset, the method randomly selects a subset of the data, as well as a subset of the features, for each iteration [18], which helps reduce overfitting and improve the robustness of the model.

SGBE has been successfully applied in a wide range of applications, including image and speech recognition, natural language processing, and bioinformatics. In this work, SGBE was used to improve the performance of the RF algorithm, producing the ensemble model SGBE-RF for prediction of the VDCB.
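The stochastic element can be illustrated with the sketch below. Note that scikit-learn's gradient boosting grows single regression trees rather than boosting an RF base model, so this only approximates the SGBE part of the paper's Weka-based SGBE-RF; all parameter values are assumptions:

```python
# Minimal stochastic gradient boosting sketch (Friedman, 2002).
from sklearn.ensemble import GradientBoostingRegressor

sgb = GradientBoostingRegressor(
    n_estimators=200,     # boosting iterations, each fit to residuals
    learning_rate=0.05,   # shrinkage applied to every boosting step
    subsample=0.7,        # random 70% of rows per iteration ("stochastic")
    max_features=0.8,     # random feature subset considered per split
    random_state=42,
)
sgb.fit(X_train, y_train)   # data from the Sect. 2.1 split sketch
```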

2.2.3 Random Subspace (RSS)

RSS, known as an effective ensemble technique for improving the performance of weak models, was first introduced by Tin Kam Ho in 1998 [19]. In this method, a subset of the features is randomly selected for each model in the ensemble, and the models are then combined to form the final ensemble [20]. It is particularly useful for high-dimensional datasets, where there are many features and the number of training samples is limited. Unlike traditional ensemble techniques such as bagging or boosting, which train the same model on multiple resampled training sets, RSS trains multiple models on different subsets of the features [21]. The method can therefore reduce overfitting, identify important features, and provide more robust predictions on new data.

RSS has been shown to be effective in a wide range of applications, including image classification, text classification, and bioinformatics. In this study, RSS was used to improve the performance of the RF algorithm, producing the ensemble model RSS-RF for prediction of the VDCB.
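A minimal sketch of the RSS-RF idea is given below, assuming scikit-learn's BaggingRegressor, which reduces to a random-subspace ensemble when only the features (not the instances) are subsampled; the paper itself used a Weka meta-learner, and the parameter values are assumptions:

```python
# Minimal random-subspace sketch wrapped around an RF base model (RSS-RF).
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rss_rf = BaggingRegressor(
    estimator=RandomForestRegressor(n_estimators=50, random_state=42),
    n_estimators=10,     # number of subspace models in the ensemble
    bootstrap=False,     # keep all training instances for every model ...
    max_features=0.6,    # ... but give each model a random 60% of features
    random_state=42,
)
rss_rf.fit(X_train, y_train)   # data from the Sect. 2.1 split sketch
```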

2.2.4 Bagging (B)

Bagging, or bootstrap aggregating, is one of the classical ensemble techniques used in ML to improve the accuracy and robustness of single models [22]. It was first introduced by Leo Breiman in 1996 [22]. Its main principle is to create multiple versions of the same model, each trained on a different bootstrap sample of the training data, and then combine their predictions to produce a more accurate and robust ensemble model [23]. Bagging is particularly useful with complex models that are prone to overfitting [24], as it can reduce the variance of the predictions and improve the generalizability of the model.

Bagging has become a popular technique in a wide range of applications, such as image and speech recognition, financial forecasting, fraud detection, medical diagnosis, and natural language processing. In this study, bagging was used to improve the performance of the RF model, producing the ensemble model B-RF for prediction of the VDCB.
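A minimal sketch of the B-RF idea follows, again assuming scikit-learn in place of the Weka Bagging meta-learner used in the paper; parameter values are illustrative:

```python
# Minimal bagging sketch with an RF base model (B-RF).
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

b_rf = BaggingRegressor(
    estimator=RandomForestRegressor(n_estimators=50, random_state=42),
    n_estimators=10,     # number of bootstrap replicates of the base model
    bootstrap=True,      # sample training instances with replacement
    random_state=42,
)
b_rf.fit(X_train, y_train)     # final prediction = average over replicates
```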

2.2.5 Validation Indicators

Several statistical measures were utilized for model validation. In this study, we used R-squared (R²), the root mean squared error (RMSE), and the mean absolute error (MAE) [25,26,27,28]. R², also known as the coefficient of determination, signifies the proportion of variance in the dependent variable that can be explained by the model; its values range from 0 to 1 [29, 30]. A value of 1 indicates that the independent variables account for all of the variation in the dependent variable, i.e., a perfect fit of the model to the data, whereas a value of 0 implies that the independent variables explain none of the variation in the dependent variable around its mean. The RMSE quantifies the difference between predicted and actual values [31]; a lower RMSE indicates a better fit of the model to the data [32, 33]. The MAE is another important metric used in model validation [34, 35]. It shows how close a model's predictions are to the actual results, gives equal weight to all errors, and is less influenced by outliers than the RMSE. A lower MAE means the model's predictions are more accurate. It is desirable to use R², RMSE, and MAE together for a more comprehensive view of model performance. Some researchers have also calculated the error level-cumulative frequency relationship to evaluate the accuracy of predictive models [32, 33].

In model development, statistical measures such as R², RMSE, and MAE play a crucial role during both the training and testing phases. During the training phase, these measures help evaluate the model's performance on the training data, providing insight into how well the model is learning. For instance, a high R² coupled with low RMSE and MAE values would suggest that the model has effectively learned from the training data. During the testing phase, these measures assess how well the trained model generalizes to unseen data. If the model exhibits a high R² and low RMSE and MAE values on the testing data, it is an indication that the model is not overfitting the training data and can generalize effectively to new, unseen data.
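The three indicators can be computed directly from the predictions, as in the minimal sketch below (scikit-learn metrics; y_test and the fitted rf model come from the earlier sketches):

```python
# Minimal sketch of the three validation indicators on the testing set.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_pred = rf.predict(X_test)

r2 = r2_score(y_test, y_pred)                        # 1.0 = perfect fit
rmse = np.sqrt(mean_squared_error(y_test, y_pred))   # penalizes large errors
mae = mean_absolute_error(y_test, y_pred)            # equal weight per error

print(f"R2={r2:.3f}  RMSE={rmse:.3f} mm  MAE={mae:.3f} mm")
```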

In addition to RMSE, MAE, and R², the Taylor diagram was also used to compare the performance of the different ML models in this study. It is a polar-coordinate plot that summarizes how well a set of model simulations matches observed data in terms of their correlation with the observations, centered root-mean-square difference, and standard deviation [36]. Each model is represented by a point on the diagram; the closer the point is to the reference point (which represents the observations), the better the model's performance.
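The quantities that place one model on the diagram can be computed as in the minimal sketch below (pure NumPy; y_test and y_pred come from the earlier sketches):

```python
# Minimal sketch of the statistics behind one point on a Taylor diagram.
import numpy as np

def taylor_stats(obs, pred):
    sd_obs, sd_pred = np.std(obs), np.std(pred)
    corr = np.corrcoef(obs, pred)[0, 1]
    # Centered RMS difference via the law of cosines used by the diagram.
    crmsd = np.sqrt(sd_obs**2 + sd_pred**2 - 2 * sd_obs * sd_pred * corr)
    return sd_pred, corr, crmsd

sd, r, crmsd = taylor_stats(np.asarray(y_test), y_pred)
```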

2.2.6 Correlation-Based Feature Selection (CFS)

CFS is one of the most popular feature selection algorithms; it selects a subset of relevant features that have strong correlations with the target variable [37]. It was first proposed by Mark Hall in 1999 as a way to improve the performance of ML models by reducing the number of irrelevant or redundant features in the dataset [38]. It works by computing the correlations between each feature and the target variable, as well as the correlations between the features themselves, summarized by an average merit (AM); on this basis, it selects the subset of features with the highest correlations with the target variable while minimizing the correlations between features [38]. In this work, CFS was used to evaluate the importance of the input variables with respect to the output, the VDCB.
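For reference, the CFS subset merit from Hall's formulation can be sketched as below; the per-variable AM scores in Fig. 5 were produced by Weka's attribute evaluator, so this sketch only illustrates the underlying idea:

```python
# Minimal sketch of the CFS merit of a feature subset (Hall, 1999):
# merit = k * r_cf / sqrt(k + k*(k-1)*r_ff), with r_cf the mean
# feature-target correlation and r_ff the mean feature-feature correlation.
import numpy as np
import pandas as pd

def cfs_merit(X: pd.DataFrame, y: pd.Series) -> float:
    k = X.shape[1]
    r_cf = X.corrwith(y).abs().mean()            # feature-target correlations
    corr = X.corr().abs().to_numpy()
    r_ff = corr[np.triu_indices(k, 1)].mean()    # feature-feature correlations
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

print(cfs_merit(X_train, y_train))   # higher merit = more predictive subset
```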

3 Methodological Flowchart

The methodology of this study is presented in Fig. 4. It starts with the preparation of the data used for modeling: the data collected from the 83 bridges were divided into training (70%) and testing (30%) datasets. The models were then trained and constructed on the training dataset. Of the four models, the three hybrid models, namely B-RF, RSS-RF, and SGBE-RF, were constructed by combining RF with the B, RSS, and SGBE techniques, respectively. In the hybrid models, the B, RSS, and SGBE techniques were used to resample and optimize the training dataset, and the resulting training data were then used by RF for prediction. Validation and comparison of the models were carried out on the testing dataset using several popular validation indicators, including R², RMSE, and MAE.
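Putting the pieces together, the comparison stage of the workflow might look like the sketch below (the model objects and metric helpers come from the earlier sketches, which only approximate the Weka implementations used in the paper):

```python
# Minimal sketch of the validation/comparison loop over the four models.
for name, model in [("RF", rf), ("SGB", sgb), ("RSS-RF", rss_rf), ("B-RF", b_rf)]:
    pred = model.predict(X_test)
    print(name,
          f"R2={r2_score(y_test, pred):.3f}",
          f"RMSE={np.sqrt(mean_squared_error(y_test, pred)):.3f}",
          f"MAE={mean_absolute_error(y_test, pred):.3f}")
```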

Fig. 4

Methodological flowchart of this study

4 Results and Discussion

4.1 Importance of Input Variables Using CFS

The CFS feature selection method was applied to evaluate the importance of the input variables used for prediction of the VDCB, as shown in Fig. 5. X2 has the highest AM value (1.668), followed by X1 (1.434), X4 (1.314), X5 (0.997), and X3 (0.92). Thus, X2 (the length of the concrete beam) has the highest importance for predictive modeling of the VDCB, and all input variables (X1–X5) contribute and were retained for predicting the VDCB. This is reasonable, as the length of the concrete beams affects the overall load-carrying capacity of the bridge and hence its vertical deflection [39].

Fig. 5

Importance of input variables used in the models

Longer beams usually deflect more under the same load because deflection in a beam is strongly influenced by its length. According to beam theory, for a simply supported beam under a uniformly distributed load, the maximum deflection is proportional to the fourth power of the beam's length, as shown below. So, if all other factors (such as material, cross-sectional shape, and load) are constant, a longer beam will deflect more. Thus, while a larger moment of inertia (related to the beam's cross-sectional shape) helps resist bending, the length of the beam plays a significant role in its overall deflection, and both factors are crucial in structural design. Beam length is also considered one of the most important factors affecting the vertical deflection and vertical acceleration of bridges [40, 41].
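For reference, the classical Euler-Bernoulli result behind this statement, for a simply supported beam of span L under a uniformly distributed load w per unit length and flexural rigidity EI, is

$$\delta_{\max} = \frac{5\,w\,L^{4}}{384\,EI},$$

so doubling the span, with all else equal, multiplies the mid-span deflection by a factor of 16.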

4.2 Training and Validating the Models

Using the training dataset, the three ensemble models (SGBE-RF, RSS-RF, and B-RF) and a single RF model were trained and constructed. To achieve the best performance of these models, the hyper-parameters used in training were tuned through a trial-and-error process, as shown in Table 2.

Table 2 Hyper-parameters of the models used in this study

Validation of the models was carried out on both the training and testing datasets, as shown in Figs. 6, 7, and 8 and Table 3. Figure 6 shows that, on the training dataset, SGBE-RF has the highest R² (0.845), followed by RF (0.834), RSS-RF (0.824), and B-RF (0.773). On the testing dataset, SGBE-RF again has the highest R² (0.805), followed by RSS-RF (0.781), B-RF (0.764), and RF (0.740); on this dataset, the R² values of all three ensemble models (SGBE-RF, RSS-RF, and B-RF) are higher than that of RF. Regarding RMSE, Fig. 7 shows that SGBE-RF has the lowest value on the training dataset (1.298), compared with RF (1.329), RSS-RF (1.408), and B-RF (1.601), and also on the testing dataset (1.215), compared with RSS-RF (1.232), B-RF (1.317), and RF (1.509); on the testing dataset, the RMSE values of the ensemble models are all lower than that of RF. For MAE, Table 3 shows that SGBE-RF has the lowest value on the training dataset (0.805), compared with RF (0.816), RSS-RF (0.935), and B-RF (1.087), and also on the testing dataset (0.92), compared with RSS-RF (0.926), B-RF (1.030), and RF (1.097); again, on the testing dataset, the MAE values of the ensemble models are all lower than that of RF. Finally, analysis of the Taylor diagrams shows that SGBE-RF is the point nearest to the reference point for both the training and testing datasets, compared with RSS-RF, B-RF, and RF (Fig. 8).

Fig. 6

R² values of the models: a training RF, b testing RF, c training SGBE-RF, d testing SGBE-RF, e training RSS-RF, f testing RSS-RF, g training B-RF, and h testing B-RF

Fig. 7

Analysis of error of the models: a training RF, b testing RF, c training SGBE-RF, d testing SGBE-RF, e training RSS-RF, f testing RSS-RF, g training B-RF, and h testing B-RF

Fig. 8

Analysis of Taylor diagrams of the models: a training dataset and b testing dataset

Table 3 Performance of the models using training and testing datasets

Generally, all three ensemble models (SGBE-RF, RSS-RF, and B-RF) and the single RF model performed well for prediction of the VDCB. However, SGBE-RF performed better than the other ensemble models (RSS-RF and B-RF), and all three ensemble models outperformed the single RF model on the testing dataset. This means that ensemble techniques such as SGBE, RSS, and bagging slightly improved the performance of a single RF model, reflecting the known advantages of ensemble techniques in improving the accuracy and performance of single ML models.

More specifically, SGBE has several advantages [17, 42]: (i) it can reduce the risk of overfitting and improve the generalization performance of the model by randomly subsampling the data and selecting a random subset of features at each stage of the boosting process, and (ii) it can handle complex, high-dimensional datasets, effectively capturing the underlying patterns in the data and improving the overall accuracy of the predictions by combining multiple weak models into a strong ensemble model. The advantages of RSS are [19, 43]: (i) it can improve the accuracy of predictions by building multiple models on different random subsets of the features, (ii) it helps reduce overfitting by limiting the number of features used in each individual model, and (iii) it can identify which features are consistently important across the different models. For bagging, the advantages are [22, 44]: (i) it can reduce the risk of overfitting to the training data by building multiple models on different bootstrapped samples of the dataset, which can improve model performance, especially when the original dataset is small or noisy, (ii) it can improve the stability of the model by reducing the variance of the predictions, and (iii) it is robust to outliers in the data, as the samples are drawn with replacement and each model is built on a different subset of the data, which reduces the impact of outliers on the final predictions and thus improves the performance of the final model. In addition, RF is a versatile and powerful algorithm that offers several advantages, including high prediction accuracy, robustness to overfitting, automatic feature selection, tolerance to missing data, and scalability [13, 45]. In this work, RF proved to be the most suitable base model for combination with SGBE to improve the predictive modeling of the VDCB.

5 Concluding Remarks

This study proposed and applied three novel ensemble models (SGBE-RF, RSS-RF, and B-RF) and a single RF model for predicting the VDCB. The models were trained and validated using data from 83 steel–concrete composite bridges located in various parts of Vietnam. Validation and comparison of the models were conducted using standard statistical measures, including R², RMSE, MAE, and the Taylor diagram. The validation results indicated that the SGBE-RF model outperformed the other models (RSS-RF, B-RF, and RF) in predicting the VDCB. Notably, the three ensemble models (SGBE-RF, RSS-RF, and B-RF) demonstrated superior performance compared to the single RF model.

The findings of this study underscore the potential of ensemble models in enhancing the accuracy and robustness of VDCB predictions. It was also observed that the length of the concrete beam is a critical parameter for predicting the VDCB. Future research could investigate the applicability of these models to other bridge types, subject to the availability of sufficient data. Additionally, exploring other ensemble models or machine learning techniques could further enhance the accuracy and robustness of predictions in bridge engineering.