1 Introduction

Aging infrastructure, increased traffic load and frequency, and climate change motivate monitoring the condition of our built environment using autonomous, continuous and quantitative methods. Bridges, a key link in our transportation infrastructure network, are largely assessed in the US via visual inspection. This approach takes place at prescribed frequencies and is costly, possibly unsafe, and subject to human interpretation [1]. Automated procedures to identify damage to bridges and other structures have been investigated for some time and are typically referred to as Structural Health Monitoring (SHM) [2]. An ensemble of sensors provides raw data that form the “front end” of a SHM system. A signal-processing layer then extracts information on structure condition and damage identification techniques are applied to extract useful information from the data. Subsequent information is incorporated into structural analysis and probabilistic models to assess the state of the structure and develop an updated health and service life prognosis [3].

Normally, a structural deficiency leads to degradation of material properties or variations in geometry and, therefore, results in changes in system dynamic properties. In a vibration-based damage detection framework, properties such as modal curvature [4], Eigenmodes [5,6,7], and modal strain energy [8, 9] are sensitive to the aforementioned deficiencies. The classic approach for vibration-based damage detection is accomplished in two phases, modal identification and model updating, where an optimization procedure is employed to find values for damage indices that minimize a discrepancy function. Vibration signatures are also combined with Machine Learning tools for identification of local deficiencies using fault detection methods [10,11,12]. He et al. [10] used induced vibrations from trains passing across a coupled FEM model of simply supported, single span bridge and Genetic Algorithm to detect damage location and intensity. Loads coupled to the bridge model were from a single bullet train car and, as a result, did not simulate most train loading configurations. Variations in train axle loads were also not included. Kim et al. [11] investigated recorded accelerations, temperature and vehicle weights collected using a long-term SHM system deployed on a steel multi-girder bridge to detect deficiencies. Their methodology included: employing autoregressive model coefficients to extract damage-sensitive features from recorded accelerations; considering environmental and vehicle weights via regression analysis of damage sensitive features; and making decisions about bridge health based on differences between observed and predicted damage-sensitive features using Bayesian hypotheses testing with a 95% confidence interval [11]. The proposed framework was based on input measurement and the damage detection scheme was not intended to pinpoint damage location. It was concluded that using Bayesian regression that incorporated environmental and vehicle loading yielded more accurate results when compared against cases where these items were excluded [11]. Bellino et al. [12] used Principle Component Analysis to eliminate site condition effects, such as train mass and velocity, on structural frequencies so that frequency changes would be purely due to damage. The study included laboratory testing of a single moving mass on a short cantilever beam subjected to damage to verify the methodology and the proposed method was able to detect damage and differentiate between various damage levels [12].

A major challenge associated with using identified modal properties for damage detection under operational conditions is that environmental conditions (i.e., temperature, moisture, wind) may drastically affect identification results [13, 14]. Additionally, most operational modal analysis (OMA) algorithms assume that ambient excitations are stationary, white noise. In many cases, this assumption may be violated and, consequently, modal properties would not be consistently identified. Research is in progress that focuses on alleviating issues caused by these non-stationary external inputs [15, 16].

Another major drawback of conventional modal-based damage detection methods is their sensitivity to modeling errors. In an attempt to address this issue, full-scale, dynamic tests of a seven-story reinforced concrete building were completed to examine uncertainties in common damage detection methods [17]. Findings indicated that level of confidence in damage identification results was a function of the level of uncertainty in identified modal parameter choices when designing the monitoring schemes (e.g., spatial density of measurements) along with modeling errors (e.g., mesh size) [5, 18]. Based on these findings, it can be inferred that, for reliable structural health monitoring to occur, there is a need for high signal–noise ratios, precise modeling, and stationary external excitations. However, the stationarity condition is often violated, highly accurate models demand time and expertise, and high signal–noise ratios are usually not achievable using reasonably priced sensors.

Several issues have motivated research on data-driven methods for SHM. These include: efficacy of automatic damage feature extraction under operational conditions; the “curse of dimensionality” when dealing with relatively large parameter sets; how to accurately account for unknown and non-stationary external excitations; and inaccuracies associated with using global damage features to pinpoint local deficiencies [2]. An objective stemming from these issues could be succinctly stated as developing SHMs that correctly detect statistically significant damage feature variations via analysis of sensor data [19]. Researchers are attempting to address this output-only, damage feature detection need. One study implemented a continuous monitoring system on a highway bridge and used a combination of Statistical Process Control (SPC) and Gaussian Process Regression (GPR) for centralized damage identification based on novelty detection [20]. GPR was used to mitigate vehicle–bridge interaction and environmental effects to isolate damage and a relatively long window of measurements was used for determining the SPC threshold. A second study utilized laboratory fatigue tests of a wind turbine blade and adopted multivariate numerical analysis methods such as Radial Basis Functions, Principal Component Analysis, and Artificial Neural Networks (ANNs) to detect damage via measurement of system response to harmonic excitations [21]. Yet, another study developed a novel impact localization method using Proper Orthogonal Decomposition (POD) [22]. Acoustic emission measurements were also used to evaluate structural condition and self-healing performance of textile reinforced cements [23]. The effectiveness of SHM schemes based on statistical damage features versus those based on modal parameters was examined via a second study of a wind turbine blade. It was concluded that statistical-based methods better identified induced damage, even at low damage levels [24]. Kim and Eun performed simulated experiments on a beam to study damage detection capabilities of an algorithm based on POD of the structure’s Frequency Response Function (FRF). It was concluded that using POMs of FRFs in specific frequency ranges could effectively identify incurred damage [25].

Most health monitoring systems are centered on measuring strains, accelerations, displacements, or a combination of these items. While acceleration measurements are suited for monitoring global behavior of structural systems, strain measurements provide a unique understanding of local behavior of the system. Recent research has focused on the use of specific types of strain sensors to not only evaluate their effectiveness in the field but also to examine their efficacy for detecting damage in a SHM application [26] and their use continues to be studied [27, 28]. Glisic et al. compared damage detection capabilities of the two main FO techniques, one based on fiber Bragg gratings and the other on Brillouin optical time-domain analysis. The former technique enables long gage lengths and the latter allows for distributed sensing. It was concluded that both strain sensing techniques are suitable for damage detection via their application to full-scale reinforced concrete structures [29]. Tondreau and Deraemaeker developed a method to locate structural damage based on local modal filters applied to dynamic strain measurements and performed lab experiments to validate their method using a dense array of strain sensors installed on a steel beam [30].

Reported output-only damage detection methods are dependent on stationary, external, system excitations and require high signal–noise ratios for accurate detection. To address these apparent interdependencies, the authors developed a framework for detecting damage under operational conditions using POD and Artificial Neural Networks (ANNs) [26, 31]. A supervised learning scheme was proposed for output-only classification of structural response to minimize POM variations that belong to each defined damage class and load intensity. Additionally, a regression analysis was performed using ANNs to quantify the relationship between identified POMs and damage severity and location. Summarized herein is an extension of the experimental validation of this technique to include the effects of system nonlinearities, measurement errors, local impacts induced by vehicle–structure interaction, and other operational load aspects that potentially could affect performance. A railway bridge was adopted and instrumented as the full-scale test bed. The relatively large ratio of passing train axle loads to the bridge weight rendered system matrices time-varying. Moreover, train loads varied significantly and were highly non-stationary. To validate the accuracy of the method, measured signals from operational train loads were fed into a damage detection algorithm. Artificial, “high-noise” signals were used within the damage detection algorithm to assess its robustness. It was observed that the method was able to expand damage detection capabilities under new, unknown train loads and was robust enough to accurately address noisy measurements.

2 Studied bridge and implemented monitoring system

To explore the efficacy of the coupled POD and ANN methodology, an in-service, steel, truss, double track, railway bridge in central Nebraska was monitored. The bridge was instrumented using strain transducers and measured response was continuously transferred to a data acquisition system that was accessed remotely. This section describes the bridge span under study and the monitoring system.

2.1 Studied bridge

The selected bridge is a simply supported, through-truss that spans 44.7 m. This truss is comprised of six panels with floor beams spaced longitudinally at 7.45 m. The truss span contains riveted and built-up members including: end posts; top chords; verticals; diagonals; and, in the first two panels adjacent to supporting piers, bottom chords. Midspan bottom chords and diagonals are composed of eyebars of varying thickness.

Two types of built-up, I-sections are used for the floor beams, each having differing numbers and sizes of web plates, angles, and cover plates. One built-up, stringer cross section is provided and consists of an I-section with a web plate and angles for the flanges. Bottom laterals and laterals between stringers are single angles of varying dimensions. Top laterals and end portals are trussed elements containing double angles, single angles and lacing bars. Elevation and plan views of the bridge span are shown in Fig. 1. In general, steel bridges can be subjected to a wide variety of deficiencies caused by corrosion, fatigue cracks, scour and other items [32]. For the bridge under study, main structural deficiencies included stringer–floor beam connection deterioration, deterioration of the stringer and bottom lateral connections and members, and frozen roller supports [33]. While all of these deficiencies can be of concern to the bridge owner, as stated earlier, fatigue of stringer–floor beam connections is of primary concern for many riveted, steel, truss railway bridges as connection failure can lead to partial collapse of the structure, potential safety concerns for employees and citizens and expensive traffic disruptions [32, 34, 35]. Therefore, while other SHM configurations were utilized on the selected bridge, the SHM system reported herein was designed to focus on potential deficiencies at the stringer–floor beam connections. It should be noted that proposed method can be used to detect other deficiencies using differing sensor configurations [33].

Fig. 1
figure 1

Truss span plan and elevation

2.2 Monitoring system

A total of 24 strain transducers, manufactured by Bridge Diagnostics, Inc. (BDI), were deployed on the bridge to measure structural response under train passage and 20 of those instruments were installed at stringer ends as shown in Fig. 2. Sensors were installed on the stringer bottom flanges. Selected instrument locations were based on recommendations by the bridge owner and preliminary FEM models. A sensitivity analysis using an FE model of the bridge, completed in a previous study, showed that while the sensor network reported herein furnished excellent sensitivity to critical damage scenarios at the connections, a sparser instrumentation plan using 12 sensors located at midspan of the stringers provided enough response sensitivity for effective detection of stiffness degradation at non-instrumented stringer–floor beam connections [33].

Fig. 2
figure 2

Stringer instrumented locations

The monitoring system consists of a BDI data logger, a wireless base station and wireless nodes, with each node being connected to 4 strain sensors using cables as shown in Fig. 3. The system was powered using six 24-volt batteries that were recharged by two solar panels, also shown in Fig. 3. Example sensor installations on stringer bottom flanges near floor beam connections are shown in Fig. 4. Data were collected remotely with the system set to activate and record strains at a sampling rate of 50 Hz when a train crossed the bridge. It was understood by the authors that measured strains and associated POMs would likely be affected by environmental conditions. Those effects are being examined in a follow-up study.

Fig. 3
figure 3

Deployed monitoring system: a BDI node; b solar panels

Fig. 4
figure 4

Installed strain sensors (circled)

3 Feature extraction and data cleansing

The current study considered 1 week of train “events,” which totaled 363 recorded passages. Live loads associated with these datasets varied with respect to speed, number of axles, and axle load magnitudes. The proposed methodology is discussed in the subsections that follow and summarized in a flowchart in Fig. 19.

3.1 POD for feature extraction using POMs

POMs are known to contain information on structural deficiencies and have been widely used for model reduction [36], impact localization [22], and damage detection in mechanical systems [37, 38]. POMs are used to graphically highlight data having the most variation for a given number of events. Therefore, for the current study, POMs were dependent on recorded strain magnitude and duration, which are a function of train configuration and speed. To minimize POM variations, strain signals included in snapshot matrices used for POM extraction needed to be of similar magnitude and features, and data reduction and cleansing were necessary prior to mode extraction. Therefore, data windowing, load location identification and peak picking were performed. MATLAB algorithms were implemented or developed to render this process autonomous and involved steps are described in the sections that follow.

3.2 Windowing recorded events

The first step was elimination of time intervals having negligible live load strains, typically before and after train passage. The MATLAB find algorithm was applied to window the recorded strains at Location 3 in Fig. 2, selected since that sensor was positioned at midspan under a train rail and, as a result, would be quite sensitive to live load effects. Time steps involving strain changes greater than 7.5 µε were selected as the first filter through an offline trial and error process. Subsequently, all time steps having magnitudes less than 7.5 µε at the start and end of each event were eliminated. Two representative recorded signals at Locations 3 and 18 are shown in Fig. 5 before and after windowing.

Fig. 5
figure 5

Signal windowing: a original; b windowed

3.3 Determining load location

The second step focused on developing automated classification of recorded signals based on the track upon which the train crossed the bridge. Initial field testing and model results indicated that stringer end bottom flange strains would be in compression if that stringer was underneath the loaded track and in tension if the other track was loaded [33]. Means for recorded strains at Locations 1–10, underneath Track 2, were calculated and if values exceeded zero, the train was classified as being located on Track 1, with the opposite sign indicating the train was located on Track 2. This classification showed that 187 of 363 trains traversed the bridge on Track 1. Windowed strain signals at Location 3 and 18 for a train on Track 1 and on Track 2 are shown in Fig. 6.

Fig. 6
figure 6

Load location: a Track 1; b Track 2

3.4 Automated peak picking

The third step focused on automatically selecting a constant number of peaks in each recorded event dataset so that POM variations due to train load disparities were minimized. A lower bound threshold of 50 µε for recorded strain peaks was established based on average strains recorded at Location 3 when Track 1 was loaded or the corresponding location (Location 18) when Track 2 was loaded. The MATLAB findpeaks function was used to select the first 40 peaks having strains greater than 50 µε at Locations 3 or 18, with 40 peaks being selected to ensure that the snapshot matrices included enough samples for stable POM calculation. The developed code excluded the first five peaks, corresponding to four train cars, to eliminate transient response developed from the locomotives. After automated peak peaking was performed, 74 events were filtered, 15 for trains on Track 1 and the remainder for trains on Track 2. Representative final strain events used to develop POMs for trains on Tracks 1 and 2 at Locations 3 and 18 are shown in Fig. 7. Figure 7a is for a case where the train is located on Track 1, while Fig. 7b is for the train located on Track 2.

Fig. 7
figure 7

Peak picking: a Track 1; b Track 2

4 ANN training

4.1 Theoretical background

Regression and classification models can be constructed based on a linear combination of predetermined nonlinear basis functions φ (x) [39]:

$$y\;(x,w) = f\left[ {\sum\limits_{j = 1}^{M} {w_{j} \varphi_{j} (x)} } \right] ,$$
(1)

where f [♦] is a nonlinear activation and classification function, which, for regression situations, equals the identity matrix. One neural network development approach is to assume predetermined nonlinear basis functions are themselves parametric functions of inputs, where coefficients of the linear combinations are adaptive parameters that should be determined for each specific problem. A two-layer, feed-forward neural network was adopted for the current study, as it has been proven that this architecture can approximate arbitrary nonlinear functions [40, 41]. The relationship between input and the jth component of the output of such network is given by [39]:

$$y_{k} (x,w) = \sigma \left[ {\sum\limits_{j = 1}^{M} {\mathop w\nolimits_{kj}^{(2)} } h\left( {\sum\limits_{i = 1}^{D} {\mathop w\nolimits_{ji}^{(1)} x_{i} + \mathop w\nolimits_{j0}^{(1)} } } \right) + \mathop w\nolimits_{k0}^{(2)} } \right] ,$$
(2)

where \(y \in {\mathbb{R}}^{K}\) is the output vector; \(x \in {\mathbb{R}}^{D}\) is the input of the neural network; M denotes the number of neurons in the hidden layer; \(\mathop w\nolimits_{kj}^{(2)}\) and \(\mathop w\nolimits_{k0}^{(2)}\) represent weights and biases of the output layer; and \(\mathop w\nolimits_{ji}^{(1)}\) and \(\mathop w\nolimits_{j0}^{(1)}\) stand for weights and biases of the hidden layer.

The process of obtaining weights and biases of abovementioned relationship between input and output from a set of data is called supervised training of ANN. Consider a given set of training data including input vectors {xk}, where k = 1, 2,…, N and their corresponding target values {tk}. Training of the network is performed by minimizing discrepancies between target and computed output. When dealing with regression, the most common objective function is the least mean squared error [42]:

$$E^{d} = \sum\limits_{k = 1}^{N} {\left\| {t_{k} - y_{k} (x_{k} ,w)} \right\|^{2} } ,$$
(3)

where d refers to data. For training a feed-forward ANN, the weights and biases (thresholds) are calculated in batch mode and the Levenberg–Marquardt (LM) algorithm is often adopted for optimization [43]. In general, one of the issues with LM is its inability to identify global minimums of the objective function. To alleviate this issue, one might need to utilize the algorithm several times with different initializations for the parameters or use soft computing methods, such as simulated annealing or genetic algorithms [44]. With regard to use of LM for calculating ANN weights, until the late 1990s, it was believed that classic gradient descent algorithms would be “trapped” at poor local minima locations [45]. Additional research indicated that poor local minima were not detrimental for large ANNs and that a suboptimal set of weights could furnish a near optimal network performance [45]. For ANNs having more than one hidden layer, error back propagation in the LM algorithm is often used to determine objective function gradients and to optimize corresponding weights [45].

One of the most striking properties of ANNs is network generalization, which means that, once the network is appropriately trained for a set of input and output data, it will make accurate predictions given arbitrary inputs. During the training process, however, the network might not distinguish between noise and hidden structure of the data. This issue is referred to as network overfitting. To alleviate this issue, Bayesian regularization of the objective function is commonly pursued. The basic idea behind Bayesian regularization is that the true underlying function is smooth to some extent and when the weights in a network are kept small, network response will be smooth [46]. Therefore, the regularization adds another term to the objective function:

$$E = \beta E^{d} + \alpha E^{w} ,$$
(4)

where Ew is the sum of square root of network weights; superscript w refers to weights; and α and β are objective function parameters. The ratio of the objective function parameters determines training emphasis, with larger α/β pushing the network toward generalization and smaller values driving the network to smaller errors [46]. The main challenge in implementing ANN regularization is choosing appropriate regularization parameters value. MacKay proposed a Bayesian framework for obtaining optimal objective function values [40], termed Bayesian regularization.

Another strategy for overfitting is early cessation of weight optimization iterations [47]. In this method, available input and output data are divided into three subsets: a training subset; a validation subset; and a test subset. Training subset data are used to calculate gradients and optimize weights using the LM algorithm, while testing subset is used to verify the network generalization by monitoring its validation error. Typically, for the initial training phase, validation errors decrease; however, when the network begins to overfit the training data, validation error increases and LM iterations cease before they converge to a global minimum of the training data set objective function. In current study, Bayesian regularization and early cessation were used for training ANNs.

4.2 Automated peak picking, windowing, and load classification for filtering ANN training data

To investigate statistical relationships between various train loading configurations that helped in identifying snapshot matrix features used to develop POMs, an analytical study was completed using SAP2000 v19 [48]. The study involved 81 different train loads recorded using weigh-in-motion systems in close proximity to studied bridge, with data being provided by the bridge owner [31]. Calculated strains were automatically extracted at sensor locations (Fig. 2) using MATLAB in conjunction with the SAP2000 v19 Open Application Programming Interface (OAPI) [31]. Average root mean square (RMS) values were calculated from extracted analytical stresses for each train with corresponding RMS values cumulatively representing stress statistics during each event [31]. Equivalent uniform loads for each event were also determined as they were efficient representations of train load intensity and length. As shown in Fig. 8, comparison between normalized values of average RMS and equivalent uniform load demonstrated strong correlation, with higher equivalent uniform loads producing higher RMS averages. As a result, average RMS was chosen as the main feature for output-only load classification and feature extraction.

Fig. 8
figure 8

Uniform axle loads and associated average RMS

To further mitigate influence of variable non-stationary external inputs on field-measured strain POM variability, snapshot matrices for both tracks were sorted based on average RMS, with matrices having RMS averages between 45.4 and 47.1, a range of minimal variation, being selected for damage detection. In Fig. 9, average RMS snapshot matrices for Track 1 and 2 are shown. Track 2 matrices were selected for training and testing the developed damage detection ANNs, with matrices for trains 29–46 specifically being selected due to similar RMS averages. It is important to note that testing snapshot “events” were not included in the ANN training process. They were used to investigate how accurately the proposed method detected damage under various loading scenarios in the same average RMS range. Snapshots for trains 29, 35, 41 and 46 were randomly selected for ANN testing, while snapshots for trains 30–34, 36–40 and 42–45 were used for ANN training.

Fig. 9
figure 9

RMS averages: a Track 1; b Track 2

Since imposing actual damage on the studied bridge was not permissible, measured strains were reduced by multiplying a reduction factor to account for various levels of damage at stringer–floor beam connections. This “damage” was simulated via reduction of field measured strains at selected locations, with reductions being proportional to assumed damage intensities (DIs). These reductions were based on the assumption that connection deterioration would reduce rotational stiffness, and in the limit would convert a semi-rigid (“healthy”) connection into a pinned connection having limited to no moment restraint [35]. Potential crack propagation through the connection depth was modeled via continuous decrease in connection rotational stiffness [35], resulting in smaller moments at the connection and, accordingly, smaller stringer bottom flange strains. In this study, induced damage at any connection was simulated using a strain reduction factor at that connection, with other connections being undamaged.

4.3 Nonlinear regression ANN for damage detection

To generate training data, 10 DIs, varying between 10 and 90% in 10% increments, were examined at each field instrument location (see Fig. 2). The DIs were sequentially varied at the 20 locations for 14 train events, which produced 2800 damage scenarios. These damage scenarios trained the ANNs using MATLABs Neural Net Fitting function, where various numbers of internal neurons were explored to ensure that ANNs were accurately generalized for damage identification. A nonlinear regression ANN was used to establish damage detection from POMs of examined scenarios. It was decided that 70% of the input POMs would be used for training, 15% for validation and 15% for testing during ANN regression analysis. ANN regression correlation curves for training, testing and the entire input data set are shown in Fig. 10a–c. In each subplot in Fig. 10, solid lines represent the best fit of the estimated DIs. Higher scatter was observed for DIs less than 20% in Fig. 10a–c, meaning that DIs greater than 20% could be more accurately predicted.

Fig. 10
figure 10

Representative ANN regression plots, 100 internal neurons: a training; b test; c combined

5 ANN testing

As stated earlier, Trains 29, 35, 41 and 46 were randomly selected to test ANN effectiveness, with tested ANNs featuring 25, 50, 100 and 200 internal neurons, with the number of neurons being selected to use trial and error for determining the appropriate number of internal neurons. A representative comparison between ANNs using 100 and 200 neurons for a DI of 90% at Location 8 when loaded by Train 29 is shown in Fig. 11. The figure indicated that both ANNs predicted damage location and intensity very well; however, the network with 200 neurons appeared to be marginally affected by overfitting as evidenced by false positives and negatives shown in Fig. 11b. The 100 neuron ANN was shown to be more robust for predicating damage location and intensity when compared against 25 and 50 neurons, and, as a result, 100 neurons were adopted. Training the ANNs was performed using a desktop computer, featuring a multi-core architecture and running Windows 7, 64 bit, as operating system. Its Central Processing Unit (CPU) was an Intel Xeon E5-2630 2.4 GHz processor, 8 Cores, 32 GB DDR4 RAM of main memory and 20 MB Smart Cash. Training time varied based on the number of internal neurons in the internal layer of the ANNs. For 20, 50, 100, and 200 neurons, the elapsed CPU time was, respectively, 27, 47, 120, and 221 min.

Fig. 11
figure 11

ANN testing, Train 29: a 100 neurons; b 200 neurons

To ensure that the developed method was robust against false-positive damage signals, POMs for a healthy bridge subjected to various load events were also used to test trained ANNs. As shown in Fig. 12, it was observed that, for events associated with the 4 trains selected for ANN testing, the maximum false-positive DI was approximately 6%, which was deemed to be small when compared against the actual DI of 0%. These results supported the premise that the method would successfully detect damage with the caveat that an acceptable threshold should be established via long-term monitoring and corresponding statistical analyses.

Fig. 12
figure 12

Healthy bridge ANN testing: a Train 29; b Train 35; c Train 41; and d Train 46

To further ascertain the ability of the proposed methodology to detect damage location and intensity, ANN damage index predictions at instrumented locations were studied. Representative results are shown in Figs. 13, 14 and 15. The choice of damage intensity and location was arbitrary; however, DIs ranged from 0 to 90% and locations were chosen to cover various spots on the bridge. Results of ANN testing at Location 11 are shown in Fig. 13 and indicated that DI predictions might be affected by recorded signals from certain trains, especially at low DIs. A DI of 20% was captured well for all testing sets except for Train 35, where false-positive DIs, approaching 5%, existed. Figure 14 demonstrates the ability of the proposed methodology for capturing the studied range of DIs at Location 8. Predicted DIs for Train 29 were 17, 37, 58, 79 and 89% for imposed DIs of 20, 40, 60, 80 and 90%, respectively.

Fig. 13
figure 13

ANN testing, Location 11, Train 35, all DIs

Fig. 14
figure 14

ANN testing, Location 8, Train 29, all DIs

Fig. 15
figure 15

ANN testing, Location 13, Trains 29, 35, 41 and 46, DI = 60%

Conversely, Fig. 15 presents damage identification capabilities of trained ANNs for a DI of 60% at Location 13 under multiple train loads. Trains 29, 35, 41 and 46 and were used and damage location and intensity were, again, captured accurately with predicted DIs ranging between 57 and 60%.

To estimate the importance of classifying trains based on RMS as shown in Fig. 9, the trained ANN was tested using healthy POMs for 4 trains whose average RMS was located outside the selected range. Trains 10, 20, 50 and 55 were selected for this exercise. A moderate increase in false positives was evident for three of the four selected trains (Trains 10, 50 and 55), with false positives predicting DI ranges between 8 and 25% as shown in Fig. 16a, c–d. For Train 20, the predicted DI showed a significant increase in false positives, with maximum value of 70% as shown in Fig. 16b. These results showed that selecting ANN training sets based on the associated average RMS is necessary to reduce POM variations associated with changes in non-stationary loading configurations.

Fig. 16
figure 16

Healthy bridge ANN testing: a Train 10; b Train 20; c Train 50; and d Train 55

To examine the effectiveness of the proposed methodology for detecting damage using noisy strain signals, as is often the case with low-cost sensing devices, zero mean, white Gaussian noise was added to the measured strain time-histories. A representative example of noisy strain signals for Train 41 is shown in Fig. 17. ANNs were retrained and retested using the “noisy” data and results showed that the proposed methodology was capable of capturing damage from noisy signals with acceptable accuracy. “Noisy” strains at Location 15, for a DI of 80% for Trains 29, 35, 41 and 46 are shown in Fig. 18. Predicted DIs ranged from 78 to 81%.

Fig. 17
figure 17

Signal with 20% simulated noise, Locations 3 and 18, Train 41; a peak picking; and b magnified view

Fig. 18
figure 18

ANN testing, simulated 20% noise, Location 15, DI = 80%, Trains 29, 35, 41 and 46

As stated earlier, a flowchart describing the methodology is shown in Fig. 19.

Fig. 19
figure 19

Feature extraction and data cleansing methodology

6 Conclusions

In this study, an automated, output-only, damage detection approach using POD and ANNs was developed and investigated for steel truss railway bridges. Measurements from full-scale field monitoring data and calibrated numerical models were used to develop and examine the proposed approach. Results demonstrated its efficacy for detecting deficiencies in stringer–floor beam connections and the approach can be extended to include damage detection for other structural systems, details and types of data. The following conclusions were drawn from the study:

  • The proposed method successfully detects damage using strain outputs induced by unknown, nonstationary external inputs.

  • Automated data cleansing prior to POM extraction was necessary to reduce discrepancies caused by nonstationary inputs.

  • The developed approach could accurately capture damage represented by DIs great than 20%, with clearly improved accuracy for DIs higher than 40%.

  • The method is robust enough to accurately predict damage supplied from highly noisy signals.

It should be reiterated that the current study was performed neglecting modeling errors and environmental variability. It is noteworthy that existing filtering methods in the literature, even in the absence of modeling errors and environmental effects, lead to large estimation errors when measurement noise is large and a relatively large number of damage indices have to be identified. For example, hybrid particle filters systematically account for modeling and measurement errors but are prone to bias as the noise–signal ratio increases [38].

Ongoing work includes:

  • Improving the approach via consideration of environmental effects.

  • Statistical investigations of damage indices and resulting damage thresholds.

  • Incorporation of higher fidelity models to improve DI prediction accuracy.