
1 Introduction

Automated operational modal analysis (AMA) has seen significant growth in recent years, with the technology being widely adopted for the condition assessment of large-scale civil infrastructures [1,2,3,4,5,6]. AMA algorithms aim to reduce user intervention by automating various processes, including algorithmic parameter selection, spurious pole removal, and grouping physical poles into structural modes. This automated implementation allows for near real-time tracking of dynamic properties and the automatic detection of abnormal structural behaviors or damages [3,4,5].

Determining the damping ratio of large-scale structures is crucial for assessing their vibrational serviceability. However, many AMA studies have omitted damping identification [4,5,6] because damping estimates are highly complex and uncertain. In particular, the accuracy of AMA-based damping estimates depends strongly on the selected algorithmic parameters, and automatically selected parameters can increase the uncertainty of the results. Thus, accurate damping estimation remains a significant challenge within the modal analysis community.

Clustering is another significant issue in automated damping estimation. Even after a stabilization diagram is applied, several spurious poles can remain in the modal estimates in long-term applications. Damping estimates are also much more sensitive to these spurious poles than natural frequency and mode shape estimates are [7, 8]. In other words, clustering can group poles successfully by the similarity of their natural frequencies and mode shapes and still leave spurious poles that cause large variability in the damping estimates. An appropriate technique is therefore required to discard spurious poles properly.

The uncertainty stemming from a lack of understanding of the underlying damping mechanisms is also a noteworthy challenge in modeling damping estimates. Indeed, this uncertainty is further compounded by ambiguous and sometimes conflicting relationships between environmental and operational conditions (EOCs) and damping, as reported in various studies [8,9,10,11]. Traditional deterministic regression approaches, which rely on a pre-determined set of parameters, are ill-suited to address the complexities and uncertainties involved in assessing the damping characteristics of large-scale structures.

In this regard, this study aims to enhance the accuracy of automated damping estimates and to evaluate the damping characteristics of a cable-stayed bridge from long-term monitoring data. We employed a displacement reconstruction algorithm to minimize the model order required in AMA, and DBSCAN to eliminate spurious poles. Based on these high-quality damping estimates, we further developed an enhanced regression model for the damping ratio using machine learning (ML) approaches. The training and testing data were established from 2.5 years of monitoring data. Data cleansing strategies based on statistical techniques and knowledge-based criteria were employed to ensure that only stable damping estimates were retained. Gaussian process (GP) models were used to derive the regression model. The impact of different data cleansing strategies and input features on model performance was examined to gain deeper insight into the damping mechanisms.

2 Methodology

2.1 Data Pre-processing: Displacement Reconstruction

Kim et al. [12] demonstrated that displacement reconstruction effectively suppresses higher-mode components in measured acceleration data, resulting in more reliable and robust damping estimates for low-frequency modes. The displacement reconstruction algorithm is formulated as the minimization problem in Eq. (1).

$$\underset{\mathrm{u}}{\mathrm{min}}\Pi \left(u\right)=\frac{1}{2}{\int }_{{T}_{1}}^{{T}_{2}}{\left(\frac{{d}^{2}u}{d{t}^{2}}-\overline{a }\right)}^{2}dt+\frac{(1-{\alpha }_{t}){\left(2\pi {f}_{t}\right)}^{4}}{2{\alpha }_{t}}{\int }_{{T}_{1}}^{{T}_{2}}{u}^{2}dt$$
(1)

where u, ā, T1, and T2 represent the displacement to be reconstructed, the measured acceleration, and the initial and final times of a time window, respectively. ft and αt are the target frequency and the target accuracy, that is, the amplitude ratio of the reconstructed to the true displacement at ft. The regularization term, the second term in Eq. (1), acts as a high-pass filter that removes low-frequency noise, such as drift, from the measured acceleration. For a more detailed description of this algorithm, readers are referred to previous studies [12,13,14].
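As a rough illustration, Eq. (1) can be discretized on a single time window. Replacing d²u/dt² with a central second difference turns it into a Tikhonov-regularized least-squares problem. The sketch below is a simplification under stated assumptions and not the algorithm of [12], which uses its own discretization and overlapping windows. The function name `reconstruct_displacement`, the default `alpha_t`, and the finite-difference scheme are all hypothetical.

In continuous time, setting the first variation of Eq. (1) to zero gives the transfer function H(f) = f⁴ / (f⁴ + (1 − αt) ft⁴ / αt) between the reconstructed and the true displacement. H equals αt at f = ft and tends to zero as f → 0, which is why the regularization term suppresses low-frequency noise.

```python
import numpy as np


def reconstruct_displacement(acc, dt, f_t, alpha_t=0.9):
    """Hypothetical sketch: reconstruct displacement from acceleration on one window.

    Discretized form of Eq. (1): minimize ||D2 @ u - acc_interior||^2 + beta * ||u||^2,
    where D2 is the central second-difference operator divided by dt**2 and
    beta = (1 - alpha_t) * (2*pi*f_t)**4 / alpha_t. The common factor 1/2 and the
    integration step dt scale both terms equally, so they drop out of the minimizer.
    """
    acc = np.asarray(acc, dtype=float)
    n = acc.size
    beta = (1.0 - alpha_t) * (2.0 * np.pi * f_t) ** 4 / alpha_t

    # Central second-difference operator of shape (n - 2, n)
    d2 = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    d2[rows, rows] = 1.0
    d2[rows, rows + 1] = -2.0
    d2[rows, rows + 2] = 1.0
    d2 /= dt ** 2

    # Normal equations of the regularized least-squares problem
    lhs = d2.T @ d2 + beta * np.eye(n)
    rhs = d2.T @ acc[1:-1]
    return np.linalg.solve(lhs, rhs)
```

Consider a 20-s window sampled at 50 Hz that contains a 2-Hz sinusoid plus a constant acceleration bias, with ft = 0.5 Hz. Away from the window edges, the reconstruction follows the true sinusoidal displacement. The bias, which plain double integration would turn into a quadratic drift, is largely rejected. Edge effects remain near T1 and T2, so practical implementations typically keep only the central part of overlapping windows.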

2.2 Data Establishment: NExT-ERA-DBSCAN

2.2.1 NExT-ERA

Under the assumption that the excitation is a stationary white-noise process, the cross-correlation functions between the measured and reference responses satisfy the homogeneous equation of motion. They can therefore be regarded as impulse response functions (IRFs) [15]. Hankel matrices are then assembled from these IRFs. The Hankel matrix H(0) is decomposed by singular value decomposition, from which the system matrix A and, in turn, the modal parameters are obtained [16]. Equations (2) and (3) summarize this process. U and V are the matrices of left and right singular vectors, and S is the diagonal matrix of the singular values of H(0), i.e., the square roots of the eigenvalues of either HT(0)H(0) or H(0)HT(0).

$$ \mathbf{H}(0) = \mathbf{U}\mathbf{S}\mathbf{V}^{\mathrm{T}} $$
(2)
$$ \mathbf{A} = \mathbf{S}^{-1/2}\mathbf{U}^{\mathrm{T}}\mathbf{H}(1)\mathbf{V}\mathbf{S}^{-1/2} $$
(3)
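The ERA step of Eqs. (2) and (3) can be sketched in a few lines of NumPy. This minimal sketch rests on several assumptions:

- The input `irf` stands for the NExT correlation functions, with one reference channel and samples along the first axis.
- The Hankel dimensions `n_rows` and `n_cols` and the function names are illustrative choices, not the parameters of Table 1.
- Poles are extracted with the common convention λc = ln(λ)/Δt and ζ = −Re(λc)/|λc|.

Repeating the call over a range of `order` values yields the poles that populate a stabilization diagram.

```python
import numpy as np


def block_hankel(irf, n_rows, n_cols, shift):
    """Block Hankel matrix H(shift) built from irf of shape (n_samples, n_channels)."""
    return np.vstack([irf[i + shift:i + shift + n_cols].T for i in range(n_rows)])


def era_poles(irf, dt, order, n_rows=20, n_cols=200):
    """Hypothetical ERA sketch: natural frequencies (Hz) and damping ratios for one model order."""
    irf = np.asarray(irf, dtype=float)
    if irf.ndim == 1:
        irf = irf[:, None]
    h0 = block_hankel(irf, n_rows, n_cols, 0)
    h1 = block_hankel(irf, n_rows, n_cols, 1)

    # Eq. (2): truncated SVD, H(0) ~= U S V^T
    u, s, vt = np.linalg.svd(h0, full_matrices=False)
    u, s, v = u[:, :order], s[:order], vt[:order].T

    # Eq. (3): A = S^{-1/2} U^T H(1) V S^{-1/2}
    s_inv_sqrt = np.diag(1.0 / np.sqrt(s))
    a = s_inv_sqrt @ u.T @ h1 @ v @ s_inv_sqrt

    # Discrete-time eigenvalues -> continuous-time poles
    lam = np.linalg.eigvals(a).astype(complex)
    lam_c = np.log(lam) / dt
    lam_c = lam_c[lam_c.imag > 0]  # keep one pole per complex-conjugate pair
    freq = np.abs(lam_c) / (2.0 * np.pi)
    zeta = -lam_c.real / np.abs(lam_c)
    idx = np.argsort(freq)
    return freq[idx], zeta[idx]
```

For a noise-free, single-mode free decay the Hankel matrix has rank two, so `order=2` recovers the frequency and damping ratio essentially exactly. Measured NExT functions need higher orders, and the spurious poles they produce motivate the clustering step that follows.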

2.2.2 DBSCAN

Once the modal estimates have been accumulated over different system orders, they must be separated into physical and spurious poles. To this end, we utilized the density-based clustering algorithm DBSCAN [17]. DBSCAN does not require the number of clusters to be specified in advance. Its built-in concept of “noise” also allows outliers in the stabilization diagram to be eliminated as spurious poles within the clustering process itself. The overall steps of DBSCAN are as follows: (1) select an observation (point) and initialize it as the first cluster; (2) obtain the set of points within a specified neighborhood search radius; (3) iterate through each neighbor until all points in the dataset are labeled; and (4) merge clusters of varying densities whose border points are closer than the predefined parameter ϵ. In this study, a nondimensional modal similarity distance between poles i and j is defined as in Eq. (4).

$${d}_{ij}=\alpha \frac{\left|{f}_{i}-{f}_{j}\right|}{\mathrm{min}\left({f}_{i},{f}_{j}\right)}+\left(1-\alpha \right)\left(1-{\mathrm{MAC}}_{ij}\right)$$
(4)

In Eq. (4), α is a weighting value determining the importance of each modal parameter, min(∙) represents the minimum operator, fi is the modal frequency of the i-th pole, and MACij is the modal assurance criterion (MAC) value between the i-th and j-th poles.
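To make Eq. (4) and the clustering step concrete, the sketch below computes the pairwise distance matrix and then runs a plain textbook DBSCAN on it. It assumes one real or complex mode-shape vector per pole. The names `modal_distance` and `dbscan`, the default α = 0.5, and the `min_pts` parameter are illustrative. `min_pts` is the usual DBSCAN minimum neighbor count, counting the point itself, and may differ from the exact parameterization in Table 1. Poles labeled −1 are treated as noise, i.e., spurious poles.

```python
import numpy as np
from collections import deque


def mac(phi_i, phi_j):
    """Modal assurance criterion between two (possibly complex) mode shapes."""
    num = np.abs(np.vdot(phi_i, phi_j)) ** 2
    return num / (np.vdot(phi_i, phi_i).real * np.vdot(phi_j, phi_j).real)


def modal_distance(freqs, shapes, alpha=0.5):
    """Pairwise distance matrix of Eq. (4); shapes holds one mode shape per row."""
    n = len(freqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            df = abs(freqs[i] - freqs[j]) / min(freqs[i], freqs[j])
            d[i, j] = d[j, i] = alpha * df + (1 - alpha) * (1 - mac(shapes[i], shapes[j]))
    return d


def dbscan(dist, eps, min_pts):
    """Textbook DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes the point itself
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point; stays noise unless reached from a cluster later
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the current cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels
```

scikit-learn's `DBSCAN` class with `metric="precomputed"` accepts the same distance matrix and could replace the hand-written loop.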

2.3 Data Cleansing: Knowledge- and Statistics-Based Approaches

Data cleansing is a crucial step in the application of ML techniques because it ensures the integrity and accuracy of the data. In this study, outliers in the damping estimates, which are likely to contain analytical errors, are removed as an initial step. A damping estimate is flagged as an outlier if it meets either of two statistical criteria. The first is that it lies more than 1.5 interquartile ranges (IQR) below the first quartile or above the third quartile. The second is that it deviates by more than three scaled median absolute deviations (MAD) from the local median of a five-point moving window. This approach is referred to as the statistics-based strategy in this study.
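The two statistical criteria can be sketched as follows. This implementation is hypothetical, and the following details are assumptions:

- The scale factor 1.4826 makes the scaled MAD consistent with the standard deviation for Gaussian data.
- The five-point window shrinks at the ends of the series.
- The function names are illustrative.

The original analysis may have used a library routine with slightly different edge handling.

```python
import numpy as np

# Scale factor making the MAD a consistent estimate of the standard deviation for Gaussian data
MAD_SCALE = 1.4826


def iqr_outliers(x, k=1.5):
    """Flag points more than k*IQR below the first quartile or above the third quartile."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def moving_mad_outliers(x, window=5, n_mad=3.0):
    """Flag points deviating more than n_mad scaled MADs from the local moving median."""
    half = window // 2
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        seg = x[max(0, i - half):i + half + 1]  # window shrinks at both ends of the series
        med = np.median(seg)
        mad = MAD_SCALE * np.median(np.abs(seg - med))
        flags[i] = abs(x[i] - med) > n_mad * mad
    return flags


def statistics_based_mask(damping):
    """Return True for damping estimates kept by the statistics-based strategy."""
    x = np.asarray(damping, dtype=float)
    return ~(iqr_outliers(x) | moving_mad_outliers(x))
```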

Additionally, the input to the system should ideally be a stationary white-noise process to ensure accurate estimation of modal parameters. Strong sinusoidal excitation may therefore lead to underestimation of the structure’s damping ratio. For this reason, damping estimates obtained under vortex-induced vibration (VIV) conditions, which introduce harmonic excitation, must be discarded from the dataset. Specifically, a data point is discarded if any of the ten-minute mean wind speeds in its one-hour wind speed record falls within the VIV wind speed range of 10.4 to 16 m/s. This approach is referred to as the VIV-based strategy in this study.
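The VIV-based rule is a simple "any of the six ten-minute means" test. The minimal sketch below assumes that each one-hour record is stored as a row of six ten-minute mean wind speeds and that the bounds 10.4 and 16 m/s are inclusive. The array layout and the function name are hypothetical.

```python
import numpy as np


def viv_keep_mask(ten_min_wind, lo=10.4, hi=16.0):
    """Return True for one-hour records to keep (no 10-min mean wind speed in [lo, hi] m/s).

    ten_min_wind: array of shape (n_records, 6) holding the six 10-min mean wind
    speeds of each one-hour record (hypothetical layout).
    """
    w = np.asarray(ten_min_wind, dtype=float)
    in_viv_range = (w >= lo) & (w <= hi)
    return ~in_viv_range.any(axis=1)
```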

2.4 Probabilistic Regression: Deep GPR

Gaussian process regression (GPR) is a non-parametric method for modeling the relationship between inputs and outputs in a dataset [18]. It places a probability distribution over functions, called a Gaussian process, on the underlying function that generated the data. GPR assumes that the function values at any finite set of input points follow a joint Gaussian distribution. The value at each point is thus a Gaussian random variable correlated with the values at other points. This correlation is determined by a kernel function, which can be chosen based on prior knowledge about the function or learned from the data. The deep Gaussian process (DGP) is a hierarchical extension of GPR that models complex and high-dimensional data by stacking multiple GPs (see Fig. 1) [19]. Each GP in the stack models a different level of abstraction in the data, and the output of one layer serves as the input to the next. This configuration allows multi-scale and non-linear relationships in the data to be modeled. It also overcomes the limited expressiveness of the kernel function of a single GPR.
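For a single GP layer, the predictive mean and variance have closed forms. The NumPy sketch below assumes a squared-exponential (RBF) kernel with fixed hyperparameters and Gaussian observation noise. In practice the hyperparameters are learned from the data. A DGP additionally requires approximate (e.g., variational) inference across layers, which is not shown. The names `rbf_kernel` and `gpr_predict` and all default values are hypothetical.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of x1 (n1, d) and x2 (n2, d)."""
    sq = np.sum(x1 ** 2, axis=1)[:, None] + np.sum(x2 ** 2, axis=1)[None, :] - 2.0 * x1 @ x2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)


def gpr_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Exact GP posterior: predictive mean and variance of noisy observations at x_test."""
    k = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)  # K + noise*I = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = variance - np.sum(v ** 2, axis=0) + noise  # k(x*,x*) - k*^T K^-1 k* + noise
    return mean, var
```

An approximate 95% interval follows as `mean ± 1.96 * np.sqrt(var)`.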

Fig. 1. Illustration of a two-layer DGP
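The layer stacking in Fig. 1 can be illustrated by drawing a function from a two-layer DGP prior. A sample from the first GP, evaluated at the inputs, becomes the input of the second GP. This is a prior-sampling sketch only, with no training. It assumes RBF kernels with hypothetical length scales and a small diagonal jitter for numerical stability, and the function names are illustrative.

```python
import numpy as np


def rbf_1d(a, b, length_scale):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


def sample_dgp_prior(x, rng, length_scales=(1.0, 0.5), jitter=1e-6):
    """Draw one function from a DGP prior; each layer's output is the next layer's input."""
    h = np.asarray(x, dtype=float)
    for ls in length_scales:
        k = rbf_1d(h, h, ls) + jitter * np.eye(h.size)
        h = np.linalg.cholesky(k) @ rng.standard_normal(h.size)
    return h
```

With a single entry in `length_scales`, this reduces to an ordinary GP sample. Adding layers lets the composed function vary its effective smoothness across the input range.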

3 Data

The target bridge in this study is a twin cable-stayed bridge located in the southern part of the Korean Peninsula (see Fig. 2). It comprises two adjacent bridge structures, with a main span of 344 m and two side spans of 73.79 m. The first of these bridges (referred to as Bridge 1), which is the target of this study, was built in 1984. The second bridge (referred to as Bridge 2) was built in 2005 to accommodate increased traffic flow. In 2011, significant VIV occurred at Bridge 2 at 10-min average wind velocities ranging from 9.8 to 11.5 m/s, and the low damping ratio of Bridge 2 was identified as the cause of this unexpected vibration [20, 21].

The bridge is equipped with a wired SHM system consisting of 83 sensors, including accelerometers, GNSS, strain gauges, ultrasonic anemometers, thermometers, and tiltmeters. The data are collected at a sampling rate of 100 Hz. From the accumulated dataset, long-term monitoring data from January 1, 2016, to August 31, 2018, were used in this study. Note that limited VIVs were observed within the wind velocity range of 12 to 16 m/s, consistent with the results of a previous wind tunnel test [21].

Fig. 2. Target bridge

4 Results

4.1 Damping Estimation

The proposed OMA-based automated damping estimation algorithm was applied to the 2.5-year monitoring data of Bridge 1, focusing on the fundamental vertical mode. The optimal NExT-ERA and DBSCAN parameters derived in previous studies [20, 22] were used, as listed in Table 1. Modal parameters were successfully identified in 99.99% of the datasets. This highlights the robustness of the automated clustering process when the optimal parameters are used with minimal user intervention.

Table 1. Algorithmic parameters for NExT-ERA and DBSCAN

4.1.1 Monthly Variation

Figure 3 depicts the fluctuations of the modal damping ratio during the monitoring period. The damping ratio gradually decreases from winter to summer and increases from summer to winter. Figure 4(b) confirms this monthly variation in the damping estimates. This finding aligns with previous research on cable-stayed bridges, which showed that modal damping ratios typically exhibit a negative correlation with temperature.

Fig. 3. Fluctuation of damping estimates

4.1.2 Amplitude Dependency

As shown in Fig. 4(a), damping estimates increase during daytime hours and decrease at night. The bivariate histogram of the RMS acceleration and damping estimates (see Fig. 4(b)) indicates a clear correlation between vibration amplitude and damping ratio. The damping ratio increased steadily from 0.5% to 1.2% as the RMS acceleration increased from 0.00 to 0.05 m/s².

Fig. 4. Daily fluctuation of damping estimates

4.1.3 GPR Modeling

In this study, the input features for estimating the damping ratio were selected as follows. First, the RMS values of the acceleration and the reconstructed displacement, as well as their power spectral density (PSD) amplitudes at the natural frequency, were used as measures of the excitation level. In addition, the one-hour mean wind speed and wind direction were included to represent wind conditions. Temperature data were acquired from the nearest automatic weather stations. The resulting database of damping ratios and corresponding EOCs was then processed using the two data cleansing strategies described in Sect. 2.3. The details of the three resulting datasets used in this study are presented in Table 2.
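The excitation-level features can be computed per one-hour record roughly as below. This sketch relies on illustrative choices that are not settings reported in this study:

- Welch-style averaging of Hann-windowed periodograms.
- The segment length `nperseg`.
- Reading the PSD at the frequency bin closest to the identified natural frequency `f_n`.

All names are hypothetical. SciPy's `scipy.signal.welch` provides a similar estimate.

```python
import numpy as np


def welch_psd(x, fs, nperseg=4096):
    """One-sided PSD by averaging Hann-windowed, 50%-overlapping periodograms."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    step = nperseg // 2
    segments = [x[i:i + nperseg] for i in range(0, x.size - nperseg + 1, step)]
    spectra = [np.abs(np.fft.rfft(win * (s - s.mean()))) ** 2 for s in segments]
    psd = np.mean(spectra, axis=0) / (fs * np.sum(win ** 2))
    psd[1:-1] *= 2.0  # fold in negative frequencies (nperseg assumed even)
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd


def excitation_features(acc, disp, fs, f_n):
    """RMS and PSD amplitude at the natural frequency f_n for acceleration and displacement."""
    feats = {}
    for name, sig in (("acc", np.asarray(acc, float)), ("disp", np.asarray(disp, float))):
        freqs, psd = welch_psd(sig, fs)
        feats[f"rms_{name}"] = float(np.sqrt(np.mean(sig ** 2)))
        feats[f"psd_{name}_at_fn"] = float(psd[np.argmin(np.abs(freqs - f_n))])
    return feats
```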

Table 2. Dataset according to the data cleansing strategies

4.1.4 Model Performance

We conducted a comparative study of several regression models. Three DGP models with 2, 3, and 4 layers were constructed, referred to as DGP2, DGP3, and DGP4, respectively. A multivariate linear regression model (LM) and a generalized linear model (GLM) were additionally employed as baselines. Model performance was evaluated by 10-fold cross-validation on DS3.
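A minimal sketch of the 10-fold cross-validation follows. It assumes a shuffled split with a fixed seed and uses ordinary least squares as a stand-in for the LM baseline. Other models, including DGPs, would plug in through the same hypothetical `fit_predict(x_train, y_train, x_test)` interface. The returned per-fold RMSE values are the kind of quantity summarized by the box plots in Fig. 5(a).

```python
import numpy as np


def kfold_rmse(x, y, fit_predict, k=10, seed=0):
    """Per-fold test RMSE from shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        y_hat = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        scores.append(np.sqrt(np.mean((y[test_idx] - y_hat) ** 2)))
    return np.array(scores)


def linear_model(x_train, y_train, x_test):
    """Ordinary least squares with an intercept (stand-in for the LM baseline)."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef
```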

Fig. 5. Model performance according to (a) model and (b) data cleansing strategy

Figure 5(a) shows box plots of the RMSE for each model, indicating the superior performance of the DGPs compared with the classical regression models. The GP models had not only low RMSE but also small variance, highlighting their stability and robustness. Among the GP models, DGP2 performed slightly better than the others. In addition, Fig. 5(b) illustrates the impact of data cleansing by comparing the performance of DGP2 on datasets DS1, DS2, and DS3. An 80/20 holdout split was applied, and the RMSE on each test set was calculated. The cleaned datasets, DS2 and DS3, yielded better performance than the raw dataset, DS1. The comparison between DS2 and DS3 further confirms that aerodynamic effects from interference VIVs can introduce uncertainty into the damping estimates and the corresponding models.

4.1.5 Prediction Performance

The final regression model was established by training DGP2 on DS3, and Fig. 6(a) presents its prediction results. The model exhibits a high level of accuracy, with an RMSE of 0.123 and an adjusted R² of 84.7% (see Fig. 6(b)), indicating a favorable regression fit. The model also effectively captures the dynamic fluctuations of the damping ratio: 94% of the test data estimates fall within the 95% confidence interval.
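The reported metrics can be computed from test-set predictions along the following lines. The sketch assumes Gaussian predictive distributions, so the 95% interval is the predictive mean ± 1.96 predictive standard deviations. It also uses the usual adjusted R² with `n_features` predictors. The function names are hypothetical.

```python
import numpy as np


def rmse(y, y_hat):
    """Root-mean-square error between observations and predictions."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))


def adjusted_r2(y, y_hat, n_features):
    """Coefficient of determination adjusted for the number of predictors."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = y.size
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1))


def interval_coverage(y, mean, std, z=1.96):
    """Fraction of observations inside mean +/- z*std (z = 1.96 for a 95% interval)."""
    y, mean, std = (np.asarray(a, float) for a in (y, mean, std))
    return float(np.mean(np.abs(y - mean) <= z * std))
```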

Fig. 6. Predicted damping ratio from DGP2: (a) one-week example and (b) scatter diagram

5 Conclusions

This study proposed a probabilistic regression model for predicting the damping characteristics of a cable-stayed bridge from environmental and operational conditions. The long-term damping characteristics of the bridge were evaluated comprehensively by applying the proposed automated damping estimation framework to 2.5 years of monitoring data. By minimizing analytical errors and excluding aerodynamic uncertainties, the framework yielded damping estimates with consistent trends under varying environmental and operational conditions. In particular, the estimated damping ratio exhibited amplitude dependency, as well as monthly trends that were negatively correlated with temperature.

The proposed deep Gaussian process (DGP) model predicted the variability in damping with good accuracy, with 94% of the damping estimates located within the 95% confidence interval of the model prediction. A comparative study with other regression techniques confirmed the superiority of the DGP models in terms of accuracy, stability, and robustness across different datasets. Data cleansing was carried out effectively by incorporating knowledge from operational modal analysis (OMA) and of interference vortex-induced vibration (VIV), thereby improving the performance of the regression model.