23.1 Introduction

Vibration-based damage detection is among the most widely adopted strategies for structural health monitoring (SHM) of civil structures [1, 2]. Damage is a change in structural properties (i.e., mass, stiffness, and damping) that results in a change in modal parameters [3, 4]; for this reason, the dynamic response carries useful damage-related information. When real structures under operating conditions are considered, unsupervised learning approaches are the most suitable choice, since damage-related data are not available before monitoring starts [5]. Unsupervised learning algorithms detect damage when a statistically significant difference is found between current data and a reference database (baseline). When vibration-based damage features are adopted, unsupervised learning approaches are limited by the effects of environmental and operational variations (EOVs) [6]. Changes in dynamic properties caused by EOVs are often larger than those due to damage at an early stage, so they must be properly accounted for.

In this context, this chapter presents a challenging case study in vibration-based unsupervised damage detection, dealing with axially loaded beam-like structures. Slender beams naturally undergo significant vibration levels during normal operation, which makes the adoption of automatic modal identification algorithms convenient. However, an axial load that changes with many physical factors (e.g., temperature or loading conditions) can easily mask the effects of damage, as shown in a previous work [7]. Nevertheless, when the modal parameters of multiple vibration modes are combined into a multivariate damage feature, the two effects can be separated and damage detection algorithms can be adopted successfully [8].

To make the strategy fully automatic, an important and often overlooked aspect is the quality of the data to which the algorithms are applied. Automatic identification of modal parameters can sometimes fail because of nonideal operational conditions. As will be shown in the following, removing these wrong identifications from the analysis is key to spotting early signs of damage in monitoring data. This chapter details an automatic data cleansing algorithm that enabled the successful use of unsupervised outlier detection to identify real damage under uncontrolled EOVs.

The chapter is organized as follows: the case study is described in Sect. 23.2, the data cleansing algorithm is explained in Sect. 23.3, and its effects on damage detection performance are presented in Sect. 23.4. Finally, conclusions are drawn in Sect. 23.5.

23.2 The Case Study

The data discussed in the following come from an experimental set-up located in the laboratory of the Mechanical Department of Politecnico di Milano (Fig. 23.1), where a series of nominally identical tie-rods are monitored over time for research purposes (e.g., [8,9,10]). The tie-rods are 4 m long, with a cross-section of 15 × 25 mm². Each tie-rod is equipped with general-purpose piezoelectric accelerometers (PCB, model 603C01) with a sensitivity of 10.2 mV/(m/s²) and a full scale of ±490 m/s². Industrial accelerometers were chosen so that sensor performance would be similar to that found in real applications. In addition, the room temperature is measured by a thermocouple, and the axial load of each tie-rod is monitored with strain gauges arranged in a full Wheatstone bridge. Data are acquired at a sampling frequency of 512 Hz by NI 9234 modules with on-board anti-aliasing filters.

Fig. 23.1

The experimental set-up in the laboratory of Mechanical Department of Politecnico di Milano

Although the set-up is located in a laboratory, the temperature is not controlled: it reaches values close to 30 °C in summer and close to 5 °C in winter, with daily excursions of about 3–10 °C. As for the operational environment, many activities take place close to the monitored tie-rods, e.g., human activities or operating machinery.

The data discussed in the following cover approximately 15 months of monitoring and include data related to real damage, produced by a chemical attack on one of the tie-rods. The attack caused general corrosion (i.e., a loss of material that reduced the section height) at a distance of 5/8 of the free length from one fixed end. The adopted damage feature and the data cleansing algorithm developed for operation in the presence of EOVs are introduced in the next section.

23.3 Analysis

The damage feature adopted in this work is a vector containing the eigenfrequencies of the bending modes in the vertical plane:

$$ \underline{f}=\left\{f_1, f_2, \dots, f_M\right\}^{\mathrm{T}} $$
(23.1)

where M is the number of considered vibration modes and the superscript “T” denotes the transpose. The working conditions allowed a sufficiently stable identification of the vibration modes below 150 Hz. The eigenfrequencies can generally be identified with operational modal analysis approaches (e.g., [11,12,13]), which can be effectively applied to slender structures undergoing significant vibration levels. The results presented in this chapter come from data acquired by a single sensor located at 1/10 of the free length of the tie-rod from one fixed end. With a single sensor, the adopted identification technique is the best-fitting approach [14], which takes the power spectrum of the response as input and identifies the eigenfrequency values. However, the following discussion holds regardless of the identification technique adopted.
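The method of [14] is not reproduced here. As a rough illustration of the general idea of a best-fitting identification, the sketch below fits the power-spectrum shape of a single-degree-of-freedom (SDOF) system to one resonance band of a spectrum by brute-force grid search, and also returns the coefficient of determination R² on which the data cleansing of this section relies. The SDOF model (displacement-type response to white noise), the grid ranges, and the names `sdof_psd_shape`, `r_squared`, and `fit_sdof_peak` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def sdof_psd_shape(f, fn, zeta):
    """Unit-amplitude SDOF spectrum shape 1 / ((1 - r^2)^2 + (2 zeta r)^2), r = f / fn.

    Assumed model: displacement response of an SDOF system to white noise.
    """
    r = f / fn
    return 1.0 / ((1.0 - r**2) ** 2 + (2.0 * zeta * r) ** 2)


def r_squared(y, y_fit):
    """Coefficient of determination between measured and fitted spectra."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_sdof_peak(f, psd, n_fn=401, zetas=np.geomspace(1e-3, 1e-1, 51)):
    """Grid-search fit of an SDOF spectrum to one resonance band.

    For every (fn, zeta) pair the amplitude is solved in closed form by linear
    least squares; the pair with the smallest squared error is returned.
    """
    best_sse, best = np.inf, None
    for fn in np.linspace(f.min(), f.max(), n_fn):
        for zeta in zetas:
            g = sdof_psd_shape(f, fn, zeta)
            amp = np.dot(g, psd) / np.dot(g, g)  # optimal scale factor
            sse = np.sum((psd - amp * g) ** 2)
            if sse < best_sse:
                best_sse, best = sse, (fn, zeta, amp)
    fn, zeta, amp = best
    r2 = r_squared(psd, amp * sdof_psd_shape(f, fn, zeta))
    return fn, zeta, amp, r2


# Usage on a synthetic, noise-free band around a 40 Hz mode
f = np.linspace(35.0, 45.0, 201)
psd = 2.0 * sdof_psd_shape(f, 40.0, 0.01)
fn, zeta, amp, r2 = fit_sdof_peak(f, psd)
```

The plain grid search only keeps the sketch dependency-free; a nonlinear least-squares routine would typically be used instead, and the frequency band around each mode would also have to be selected automatically.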

As an example, Fig. 23.2 shows the trends of three eigenfrequencies over a period of 2 weeks. Subscripts indicate the order of the eigenfrequencies: f1, f2, and f3 are the eigenfrequencies associated with the third, fourth, and fifth bending modes in the vertical plane, respectively. The automatically identified eigenfrequencies are represented by black points joined by a black-solid line. The eigenfrequencies show daily trends associated with daily temperature cycles. Some peaks are also visible (e.g., f1 on March 14th). This kind of behavior is associated with a wrong eigenfrequency identification, which can be due to a poor signal-to-noise ratio or to the lack of excitation of the considered vibration mode. Another abnormal behavior is observed from March 5th to March 10th on eigenfrequencies f2 and f3: while f1 shows daily cycles, many identifications of f2 and f3 take the same value, producing a flat trend. This latter case can be associated with the presence of a monoharmonic excitation close to resonance. In such a situation, when only the power spectrum obtained with a single sensor is considered, the best-fitting algorithm can converge to a wrong solution, since the response to the harmonic input is mistaken for the dynamic amplification due to resonance. When modal identification is carried out automatically during long-term monitoring, such corrupted data can compromise damage detection, as will be shown in Sect. 23.4.

Fig. 23.2

Time-trends of three eigenfrequencies. Black points indicate all the automatically identified eigenfrequency values; blue circles and red triangles indicate the observations discarded by steps 1 and 2 of the data cleansing algorithm, respectively

A two-step automatic data cleansing algorithm was developed to spot and remove corrupted data without any human supervision. In the first step, the coefficient of determination R² between the analytical power spectrum of a single-degree-of-freedom linear time-invariant mechanical system and the experimental power spectrum is used to assess the quality of the modal identification, one vibration mode at a time. If even a single eigenfrequency identification has an R² below a user-defined threshold, the entire feature vector is removed. For example, with a threshold of 0.9, the observations marked with blue circles in Fig. 23.2 are removed.
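Assuming the per-mode R² values are already available as an N × M array (N observations, M modes), the first step reduces to a row-wise test: an observation is kept only if all of its modes reach the threshold. The function name `step1_keep` and the numbers in the example are hypothetical.

```python
import numpy as np


def step1_keep(r2, threshold=0.9):
    """Step 1: keep an observation only if every mode has R^2 >= threshold.

    r2 is an (N, M) array; a single poorly fitted mode discards the whole
    feature vector of that observation.
    """
    r2 = np.asarray(r2, dtype=float)
    return np.all(r2 >= threshold, axis=1)


# Illustrative (made-up) values: three observations of M = 3 eigenfrequencies
freqs = np.array([[31.2, 54.8, 84.1],
                  [31.3, 54.7, 84.0],
                  [31.1, 54.9, 84.2]])
r2 = np.array([[0.95, 0.97, 0.99],
               [0.95, 0.85, 0.99],   # second mode poorly fitted
               [0.91, 0.92, 0.93]])
clean = freqs[step1_keep(r2)]
```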

In the second step, all the eigenfrequencies are considered together, and the linear relationship between pairs of squared eigenfrequencies of an axially loaded beam (see, e.g., [15, 16]) is exploited. The time trend of the m-th squared eigenfrequency is denoted by \( \underline{s}_m \). The scatter plots of the lowest squared eigenfrequency (i.e., \( \underline{s}_1 \)) against each of the others are reported in Fig. 23.3a, b.
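Where the linear trend comes from can be seen in the simplest textbook case, worked out below under assumptions that differ from the tested tie-rods: a uniform Euler-Bernoulli beam with pinned (rather than fixed) ends, length \( L \), bending stiffness \( EI \), mass per unit length \( \rho A \), and tensile axial load \( N \). For fixed ends the exact relation is more involved, and [15, 16] should be consulted. For the mode of order \( n \):

$$ \omega_n^2=\left(\frac{n\pi}{L}\right)^4\frac{EI}{\rho A}+\left(\frac{n\pi}{L}\right)^2\frac{N}{\rho A} \quad\Rightarrow\quad s_n=f_n^2=\frac{\omega_n^2}{4\pi^2}=\alpha_n+\beta_n N $$

$$ \alpha_n=\frac{EI}{4\pi^2\rho A}\left(\frac{n\pi}{L}\right)^4, \qquad \beta_n=\frac{1}{4\pi^2\rho A}\left(\frac{n\pi}{L}\right)^2 $$

Eliminating \( N \) between the modes of order \( n_1 \) and \( n_j \) that give the squared eigenfrequencies \( s_1 \) and \( s_j \):

$$ s_j=\frac{\beta_{n_j}}{\beta_{n_1}}\,s_1+\alpha_{n_j}-\frac{\beta_{n_j}}{\beta_{n_1}}\,\alpha_{n_1}, \qquad a_{1j}=\left(\frac{n_j}{n_1}\right)^2, \qquad b_{1j}=\frac{EI}{4\pi^2\rho A}\left(\frac{\pi}{L}\right)^4 n_j^2\left(n_j^2-n_1^2\right) $$

Since neither \( a_{1j} \) nor \( b_{1j} \) depends on \( N \), axial load changes (e.g., temperature-driven ones) move the observations along a single straight line in the \( (s_1, s_j) \) plane, consistent with the trends in Fig. 23.3.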

Fig. 23.3

Scatter plots of the squared eigenfrequencies. Figures (a) and (b) show all the observations available before step 2 of the data cleansing algorithm. Figures (c) and (d) show only the observations used to estimate the coefficients of the linear trend reported with black-dashed line

As can be observed, most observations are scattered around a linear trend, while a few deviate from the majority. Assuming that, within a short-term window, eigenfrequency variations are due only to axial load variations and not to damage, these observations can be attributed to corrupted data. These out-of-scale values can, however, bias the estimate of the linear regression coefficients. For this reason, the data are preselected with the Hampel identifier [17], a variation of the three-sigma rule of statistics that is robust against outliers [18]. Observations lying less than 1 scaled median absolute deviation (MAD) [19] from the local median over a moving window of 72 h are selected. The selected observations are denoted by \( \hat{\underline{s}}_m \) for vibration mode m and are represented with red-filled circles in Fig. 23.3c, d. M − 1 pairs are then defined, composed of \( \hat{\underline{s}}_1 \) and \( \hat{\underline{s}}_j \), with 1 < j ≤ M, and the linear regression coefficients \( \hat{a}_{1j} \) and \( \hat{b}_{1j} \) (the hat indicates that the coefficients are estimated from the selected observations \( \hat{\underline{s}}_m \) only) can be estimated as the least-squares solution of the linear problem:

$$ \hat{\underline{s}}_j=\hat{a}_{1j}\,\hat{\underline{s}}_1+\hat{b}_{1j}\,\underline{1} $$
(23.2)

where \( \underline{1} \) is a column vector with all elements equal to 1.
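A minimal sketch of the preselection and of the least-squares fit of Eq. (23.2) follows. It assumes timestamps expressed in hours, a 72 h window centred on each sample, the usual 1.4826 factor that makes the MAD consistent with the standard deviation of Gaussian data, and that a time instant enters the pair \( (\hat{\underline{s}}_1, \hat{\underline{s}}_j) \) only if both squared eigenfrequencies are selected. These choices and the names `hampel_inliers` and `fit_pair` are assumptions for illustration.

```python
import numpy as np


def hampel_inliers(t_hours, x, window_hours=72.0, n_mad=1.0):
    """True where x lies within n_mad scaled MADs of the local (windowed) median."""
    t = np.asarray(t_hours, dtype=float)
    x = np.asarray(x, dtype=float)
    keep = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        w = x[np.abs(t - t[i]) <= window_hours / 2.0]  # centred moving window
        med = np.median(w)
        mad = 1.4826 * np.median(np.abs(w - med))       # scaled MAD
        keep[i] = np.abs(x[i] - med) <= n_mad * mad
    return keep


def fit_pair(s1_hat, sj_hat):
    """Least-squares estimates (a_1j, b_1j) of Eq. (23.2): s_j = a_1j * s_1 + b_1j."""
    A = np.column_stack([s1_hat, np.ones_like(s1_hat)])
    (a, b), *_ = np.linalg.lstsq(A, sj_hat, rcond=None)
    return a, b


# Synthetic example: 10 days of hourly squared eigenfrequencies with two spikes
t = np.arange(0.0, 240.0)
s1 = 100.0 + 10.0 * np.sin(2 * np.pi * t / 24.0)   # daily cycle
sj = 3.0 * s1 + 50.0
sj[[30, 100]] += 500.0                              # corrupted identifications
sel = hampel_inliers(t, s1) & hampel_inliers(t, sj)
a_hat, b_hat = fit_pair(s1[sel], sj[sel])
```

The tight 1-MAD band may also discard some valid points; this is acceptable here because the preselected data serve only to estimate the regression coefficients.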

Once the coefficients have been estimated using only the points of Fig. 23.3c, d, all the observations (i.e., all the points in Fig. 23.3a, b) are considered again to detect those that must be discarded. For this purpose, the residuals of the linear regression are computed as follows:

$$ \varepsilon_{1j}=\underline{s}_j-\hat{a}_{1j}\,\underline{s}_1-\hat{b}_{1j}\,\underline{1} $$
(23.3)

The discarding procedure is illustrated in Fig. 23.4, where the Hampel identifier is applied again, this time over a broader window (e.g., 14 days long). The horizontal black-dashed lines indicate the range of ±2 scaled MAD around the median of each of the two residuals ε12 and ε13: observations falling outside this range are discarded (red circles in Fig. 23.4).
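Under similar assumptions (centred window, 1.4826 MAD scale factor, hypothetical names `hampel_inliers` and `step2_keep`), step 2 can be sketched as follows: the residuals of Eq. (23.3) are computed for all observations with the coefficients estimated on the preselected data, and an observation is discarded if any of its M − 1 residuals falls outside ±2 scaled MAD of the median over a 14-day window.

```python
import numpy as np


def hampel_inliers(t_hours, x, window_hours, n_mad):
    """True where x lies within n_mad scaled MADs of the local (windowed) median."""
    t = np.asarray(t_hours, dtype=float)
    x = np.asarray(x, dtype=float)
    keep = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        w = x[np.abs(t - t[i]) <= window_hours / 2.0]
        med = np.median(w)
        keep[i] = np.abs(x[i] - med) <= n_mad * 1.4826 * np.median(np.abs(w - med))
    return keep


def step2_keep(t_hours, s, coeffs, window_hours=14 * 24.0, n_mad=2.0):
    """Step 2 on an (N, M) array s of squared eigenfrequencies (column 0 is s_1).

    coeffs[k] holds (a_1j, b_1j) for the pair (s_1, s_j), with j = k + 2.
    """
    keep = np.ones(s.shape[0], dtype=bool)
    for col, (a, b) in enumerate(coeffs, start=1):  # column col holds s_j, j = col + 1
        eps = s[:, col] - a * s[:, 0] - b            # residuals, Eq. (23.3)
        keep &= hampel_inliers(t_hours, eps, window_hours, n_mad)
    return keep


# Synthetic 30-day hourly example with M = 3 and one corrupted value per pair
t = np.arange(0.0, 720.0)
s1 = 100.0 + 10.0 * np.sin(2 * np.pi * t / 24.0)
s = np.column_stack([s1,
                     3.0 * s1 + 50.0 + 0.5 * np.cos(2 * np.pi * t / 7.0),
                     5.0 * s1 + 20.0 + 0.5 * np.sin(2 * np.pi * t / 5.0)])
s[200, 1] += 40.0
s[400, 2] -= 60.0
keep = step2_keep(t, s, coeffs=[(3.0, 50.0), (5.0, 20.0)])
```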

Fig. 23.4

Residuals of the linear regression. Red circles indicate the observations that are considered outliers and that are removed by step 2 of the data cleansing algorithm

The second step detects and discards the observations indicated with red triangles in Fig. 23.2. As can be seen, the data cleansing algorithm removes a considerable number of wrong identifications while leaving normal data untouched. With this preliminary step explained, the next section shows its effects on the damage detection stage.

23.4 Results

The long-term data related to the eigenfrequencies associated with the third, fourth, and fifth vibration modes of one tie-rod are now considered. Using the Mahalanobis squared distance, one of the most widely adopted multivariate metrics in outlier detection [20], a damage index DI is defined as follows:

$$ \mathrm{DI}=\left(\underline{f}^{\mathrm{new}}-\mu^{\mathrm{base}}\right)^{\mathrm{T}}\left([\Sigma]^{\mathrm{base}}\right)^{-1}\left(\underline{f}^{\mathrm{new}}-\mu^{\mathrm{base}}\right) $$
(23.4)

where \( \underline{f}^{\mathrm{new}} \) is a new observation of the damage feature, and \( \mu^{\mathrm{base}} \) and \( [\Sigma]^{\mathrm{base}} \) are the mean vector and covariance matrix of \( [f]^{\mathrm{base}} \), a matrix containing multiple observations of the damage feature \( \underline{f} \) acquired while the structure is in a healthy condition. The index DI is a scalar quantity that can be checked against a threshold, here defined according to the Monte Carlo procedure reported in [20].
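Equation (23.4) can be evaluated for many new observations at once, as in the sketch below. The exact threshold procedure of [20] is not reproduced; the function `mc_threshold` implements one common Monte Carlo variant, stated here as an assumption. It draws many baseline-sized sets of standard Gaussian samples, records the largest Mahalanobis squared distance within each set, and takes a high percentile (99 % here, also an assumption) of these maxima as the threshold. Gaussian samples suffice because the Mahalanobis distance does not change under an invertible linear transformation of the features. Both function names are hypothetical.

```python
import numpy as np


def damage_index(f_new, f_base):
    """Eq. (23.4): Mahalanobis squared distance of each row of f_new from the baseline.

    f_base is an (N, M) matrix of healthy-condition feature vectors.
    """
    f_base = np.asarray(f_base, dtype=float)
    mu = f_base.mean(axis=0)
    cov = np.cov(f_base, rowvar=False)
    diff = np.atleast_2d(np.asarray(f_new, dtype=float)) - mu
    # Solve cov @ x = diff^T instead of inverting the covariance explicitly
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)


def mc_threshold(n_obs, n_feat, n_trials=1000, percentile=99.0, seed=0):
    """Assumed Monte Carlo threshold: percentile of the per-trial maximum DI."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_trials)
    for k in range(n_trials):
        x = rng.standard_normal((n_obs, n_feat))
        maxima[k] = damage_index(x, x).max()
    return np.percentile(maxima, percentile)
```

In use, `mc_threshold` would be called with the size of the baseline matrix, and every new observation whose `damage_index` exceeds the returned value would be flagged.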

In Fig. 23.5, data indicated with black crosses are those used to build the baseline set. Blue circles are the validation data, i.e., observations associated with a healthy condition but not included in the baseline matrix. Finally, red triangles represent data associated with an ongoing damage process (pictures of four stages of the corrosion, labeled A, B, C, and D, are shown at the top of Fig. 23.5).

Fig. 23.5

Damage detection in the presence of real damage. Black crosses indicate the baseline set, blue circles the validation set, and red triangles damage-related data. The horizontal black-dashed line represents the damage detection threshold. Green points show a moving-average trend computed over a one-day window shifted every hour. Pictures of different stages of the corrosion are reported at the top

The obtained results show that real damage can be detected. Furthermore, no false alarms are produced on the validation set (i.e., all blue circles are below the damage detection threshold). It is worth noting that in condition B, when damage is still barely detectable by visual inspection, the damage index is clearly above the damage detection threshold, even when the dispersion of the results is taken into account.
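The green trend in Fig. 23.5 is described only as a moving average over one day, shifted every hour. A minimal sketch, assuming windows of the form [t0, t0 + 24 h) with t0 advancing by 1 h, values plotted at the window centre, and windows left empty by the data cleansing simply skipped (all assumptions, as is the name `moving_average`), is:

```python
import numpy as np


def moving_average(t_hours, di, window_hours=24.0, step_hours=1.0):
    """Mean DI over windows [t0, t0 + window_hours) whose start advances by step_hours.

    Only windows that fit inside the record are evaluated; empty windows
    (e.g., gaps left by data cleansing) are skipped. Returns centres and means.
    """
    t = np.asarray(t_hours, dtype=float)
    di = np.asarray(di, dtype=float)
    n_win = int((t.max() - t.min() - window_hours) // step_hours) + 1
    centres, means = [], []
    for t0 in t.min() + step_hours * np.arange(max(n_win, 0)):
        sel = (t >= t0) & (t < t0 + window_hours)
        if sel.any():
            centres.append(t0 + window_hours / 2.0)
            means.append(di[sel].mean())
    return np.array(centres), np.array(means)
```

Averaging over a full day also smooths out the daily temperature-driven fluctuations, which makes the trend of the damage index easier to read against the threshold.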

The receiver operating characteristic (ROC) curve is shown in Fig. 23.6. This graphical tool summarizes the performance of a binary classifier by reporting how the true positive and false positive rates change as the damage detection threshold is varied. The performance of DI is plotted with a black-solid line and is very close to that of a perfect classifier (green-dashed trend going from the origin (0,0) to the top left corner (0,1) and from (0,1) to the top right corner (1,1)).
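The ROC curve can be traced by sweeping the threshold over every observed DI value. The sketch below assumes that healthy-condition observations (e.g., the validation set) provide the false positive rate and damage-related observations the true positive rate. It also computes the area under the curve (AUC), a common scalar summary equal to 1 for a perfect classifier and about 0.5 for a random one. The names `roc_points` and `auc` are hypothetical.

```python
import numpy as np


def roc_points(di_healthy, di_damaged):
    """False/true positive rates as the threshold sweeps from +inf down to the smallest DI.

    An observation is flagged as damaged when DI >= threshold.
    """
    h = np.asarray(di_healthy, dtype=float)
    d = np.asarray(di_damaged, dtype=float)
    thresholds = np.unique(np.concatenate([h, d]))[::-1]  # descending
    fpr = np.array([0.0] + [np.mean(h >= th) for th in thresholds])
    tpr = np.array([0.0] + [np.mean(d >= th) for th in thresholds])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

For example, if every damaged observation has a larger DI than every healthy one, the curve passes through (0,1) and the AUC is 1.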

Fig. 23.6

Receiver operating characteristic curve. Comparison between with (black-solid line) and without data cleansing (black-dotted line). Red-dashed and green-dashed lines indicate the curves of a perfect classifier and a random classifier, respectively

The importance of the automatic algorithm presented in the previous section can be assessed by comparing the results obtained with and without the data cleansing stage (the latter case is reported with a black-dotted line). If no data cleansing is adopted, the performance of DI is even worse than that of a random classifier, represented by the red-dashed diagonal from (0,0) to (1,1).

23.5 Conclusion

Data quality is a crucial issue when monitoring is carried out automatically. This chapter proposed a two-step automatic data cleansing algorithm that made it possible to successfully adopt a simple and cost-effective damage detection strategy under uncontrolled environmental and operational conditions. Damage associated with a real deterioration process was detected when its effects were barely visible by visual inspection, showing the great potential of the approach for real scenarios.