1 Introduction

For decades, manual inspection was one of the most commonly used approaches to structural health diagnosis, but it depends heavily on subjective judgment and engineering experience and suffers from poor reliability and low efficiency. Following the paradigm of damage prognosis established by Farrar and Lieven (2007) [1], structural damage recognition, condition assessment, and reliability evaluation are the most significant issues in structural health diagnosis. Since the 1990s, structural health monitoring techniques based on non-destructive testing and vibration-based methods have been widely adopted in large-scale infrastructure. Peak values or statistical indices of the measured signals were compared directly with thresholds specified by design codes. However, the following challenges remained: these techniques required dense sensor deployment on bridges and faced the ill-posedness of the inverse problem; the modal parameters were insensitive to minor local damage; and the accuracy was affected by temperature and noise.

With the rapid development of artificial intelligence, data-driven methods based on machine learning, deep learning, and computer vision algorithms have been developed for damage detection and condition assessment [2,3,4]. Recently, vision-based damage detection has been extensively investigated using image processing techniques [5, 6]. These methods generally relied on close-up imaging of structures and focused only on small regions of local damage. Moreover, model performance relied heavily on the selection of handcrafted features and critical parameters, and therefore lacked accuracy and robustness on large-scale images with complex backgrounds in real-world scenarios [7]. Deep learning-based methods were typically applied by directly transferring a well-trained model to newly collected onsite images, which required a massive training dataset and a large number of model parameters to ensure recognition accuracy and robustness under various scenarios. Additionally, stable recognition of multiscale damage with different morphologies remained challenging.

To address the issues mentioned above, this study established a framework for structural health diagnosis under limited supervision, covering vision-based damage recognition, correlation-based condition assessment, and vision-based seismic assessment in Sects. 2, 3, and 4, respectively. Section 5 concludes the paper.

2 Vision-Based Damage Recognition Using Few Images

Although many investigations have been performed for damage recognition from images, the following issues remain to be addressed: (1) the accuracy heavily relies on sufficient images and large network parameters; (2) the sensitivity to minor damage in local positions is limited; (3) the robustness is inadequate on complex coupled damage with various morphological features and disturbances. In this section, a series of recent advances are reported to solve the above issues.

A random elastic deformation (RED) algorithm was proposed to enrich the diversity of damage morphology using only a handful of original images [8], as shown in Fig. 1. First, control nodes (red dots) were set equidistantly on mesh grids of the original image, and random offsets (Δx, Δy) following a uniform distribution were assigned to these control nodes (blue arrows). Second, the offsets of all other pixels were calculated using two-dimensional cubic spline interpolation. Third, the pixel value at each resulting sub-pixel location was determined by bilinear interpolation. The results indicated that RED could diversify the geometric shapes and local microstructures of cracks and add high-order components to the original crack shapes. New crack images generated by RED therefore differ significantly from the original ones, demonstrating that RED could enrich the feature space of structural damage images.

Fig. 1. Random elastic deformation for image augmentation of structural damage
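The three RED steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [8]: the function names, the parameters `n_nodes` and `max_offset`, the natural boundary condition of the spline, and the separable (row-then-column) form of the two-dimensional cubic spline are all choices made here for the sketch. The image is a list of rows of grey values.

```python
import bisect
import random


def natural_cubic_spline(xs, ys):
    """Return f(x) for the natural cubic spline through the points (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; zero at both ends (natural spline)
    if n >= 2:
        # Tridiagonal system for the interior second derivatives (Thomas algorithm).
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        cp, dp = [h[1] / diag[0]], [rhs[0] / diag[0]]
        for k in range(1, n - 1):
            denom = diag[k] - h[k] * cp[k - 1]
            cp.append(h[k + 1] / denom)
            dp.append((rhs[k] - h[k] * dp[k - 1]) / denom)
        m[n - 1] = dp[-1]
        for k in range(n - 3, -1, -1):
            m[k + 1] = dp[k] - cp[k] * m[k + 2]

    def f(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return f


def dense_field(nodes, node_ys, node_xs, height, width):
    """Step 2: spread node offsets to every pixel (spline along x, then along y)."""
    rows = []
    for node_row in nodes:
        spline = natural_cubic_spline(node_xs, node_row)
        rows.append([spline(x) for x in range(width)])
    cols = []
    for x in range(width):
        spline = natural_cubic_spline(node_ys, [row[x] for row in rows])
        cols.append([spline(y) for y in range(height)])
    return [[cols[x][y] for x in range(width)] for y in range(height)]


def bilinear(img, y, x):
    """Step 3: bilinear interpolation at a sub-pixel location, clamped to the image."""
    height, width = len(img), len(img[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def random_elastic_deformation(img, n_nodes=4, max_offset=2.0, seed=None):
    """Warp img with a smooth random displacement field (hypothetical parameters)."""
    rng = random.Random(seed)
    height, width = len(img), len(img[0])
    # Step 1: equidistant control nodes with uniformly distributed offsets.
    node_ys = [i * (height - 1) / (n_nodes - 1) for i in range(n_nodes)]
    node_xs = [j * (width - 1) / (n_nodes - 1) for j in range(n_nodes)]
    fields = []
    for _ in range(2):  # one field for dx, one for dy
        nodes = [[rng.uniform(-max_offset, max_offset) for _ in node_xs]
                 for _ in node_ys]
        fields.append(dense_field(nodes, node_ys, node_xs, height, width))
    dx, dy = fields
    return [[bilinear(img, y + dy[y][x], x + dx[y][x]) for x in range(width)]
            for y in range(height)]
```

Calling `random_elastic_deformation(img, seed=k)` with different seeds would yield differently warped copies of the same image; for segmentation, the same displacement field would also have to be applied to the label mask.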

A novel Self-Attention-Self-Adaption (SASA) neuron computing model [8] was proposed to enhance the capability of feature extraction and nonlinear expression power for neural networks:

$$ \begin{aligned} x_{j}^{l+1} &= \sigma \left[ \sum_{i}^{N^{l}} w_{ij}^{l,l+1} x_{i}^{l} + b_{j}^{l+1} + m_{j}^{l+1} \left( \boldsymbol{\alpha}^{l} * \boldsymbol{X}^{l}, \theta_{j}^{l+1} \right) \right] \\ \boldsymbol{\alpha}^{l} &= G\left( \mathrm{softmax}\left( \boldsymbol{X}^{l} \right), \beta \right), \quad \mathrm{softmax}\left( \boldsymbol{X}^{l} \right)_{i} = \frac{e^{x_{i}}}{\sum_{j}^{N^{l}} e^{x_{j}}} \end{aligned} $$
(1)

where \(x_{i}^{l}\) denotes the \(i\)th neuron in the \(l\)th layer, \(N^{l}\) denotes the number of neurons in the \(l\)th layer, \(w_{ij}^{l,l+1}\) denotes the connecting weight between \(x_{i}^{l}\) and \(x_{j}^{l+1}\), \(b_{j}^{l+1}\) denotes the individual bias associated with \(x_{j}^{l+1}\), and \(\sigma\) denotes the nonlinear activation function. \(\boldsymbol{X}^{l}\) denotes the vector of neurons in the \(l\)th layer, \(\boldsymbol{\alpha}^{l}\) denotes the significance vector in the Self-Attention module, and \(m_{j}^{l+1}\) denotes the multilayer perceptron subnet in the Self-Adaption module associated with the \(j\)th neuron in the \((l+1)\)th layer and parameterized by the subnet parameters \(\theta_{j}^{l+1}\). \(*\) denotes element-wise multiplication of two vectors. The gate function \(G\) retains the top \(\beta\) elements and sets the others to zero.

Figure 2 shows the schematic of the SASA neuron computing model. The Self-Attention module applied softmax and gate operations to obtain the significance vector. It enabled the neuron to focus on the most significant receptive fields when processing large-scale feature maps, emphasized the saliency of interior neurons inside one layer, and did not introduce additional trainable parameters. The Self-Adaption module was designed as a multilayer perceptron subnet \(m_{j}^{l + 1}\) and implemented as a standard neural network with k hidden layers of equal width. The interior subnet structure was controlled by the number of hidden layers k and the number of neurons h in each hidden layer. For consistency with the exterior neural network, h was set proportional to the number of neurons \(N^{l + 1}\) in the current layer with a default coefficient γ. With both modules, the SASA neuron could achieve powerful feature extraction using only a small number of images, and it could be plugged into arbitrary conventional neural networks in a "plug and play" manner.

Fig. 2. Schematic of SASA neuron computing model
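The forward pass of a single SASA neuron in Eq. (1) can be sketched as follows. This is a minimal sketch rather than the implementation of [8]: the ReLU activation inside the subnet, the scalar subnet output, the choice of `tanh` for σ, and all function names are assumptions made here for illustration.

```python
import math


def softmax(x):
    """Numerically stable softmax over a list of activations."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v / total for v in exps]


def gate(p, beta):
    """Gate function G: keep the beta largest entries of p and zero the rest."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    keep = set(order[:beta])
    return [v if i in keep else 0.0 for i, v in enumerate(p)]


def mlp(inputs, layers):
    """Subnet m_j: layers is a list of (weights, biases); ReLU on hidden layers
    (an assumption), linear scalar output on the last layer."""
    a = inputs
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        a = z if k == len(layers) - 1 else [max(0.0, v) for v in z]
    return a[0]


def sasa_neuron(x, w, b, subnet, beta, sigma=math.tanh):
    """Output of one neuron in layer l+1 given the layer-l activations x."""
    alpha = gate(softmax(x), beta)                    # Self-Attention: significance vector
    attended = [a * v for a, v in zip(alpha, x)]      # element-wise alpha * X
    linear = sum(wi * xi for wi, xi in zip(w, x)) + b  # conventional neuron term
    return sigma(linear + mlp(attended, subnet))       # plus Self-Adaption subnet term
```

With a subnet whose weights are all zero, `sasa_neuron` reduces to the conventional neuron σ(w·x + b), which is one way to see how the model extends a standard network.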

A case study was performed on the semantic segmentation of distributed tiny fatigue cracks in steel box girders using U-net as the baseline model. Figure 3 compares several tiny crack segmentation results obtained with and without the SASA neuron integrated into U-net. The results indicated that the modified U-net with the SASA neuron could achieve accurate pixel-level recognition of tiny cracks under complex background interference. False alarms and crack gaps were effectively suppressed, implying that the SASA neuron enabled the model to focus on the local regions of interest. Based on the image recognition results of tiny fatigue cracks in steel box girders, a hierarchical dynamic Bayesian network was established for fatigue crack propagation modeling considering initial defects [9].

Fig. 3. Comparisons of tiny crack segmentation with/without SASA neuron in U-net

A dual-stage attribute-based few-shot meta learning paradigm was proposed for multitype structural damage identification [10], as shown in Fig. 4. An exterior few-shot meta learning framework was established based on randomly selected tasks as meta-batches to produce robust classifiers for new damage classes. Support and query subsets comprising only some of the damage categories and a few examples were randomly generated from the original image dataset. An interior attribute-based transfer learning model was trained by minimizing the l2-norm and angular differences between the predicted and ground-truth attribute vectors. Damage attributes acted as the common inter-class knowledge and were transferred among the damage categories, replacing the one-hot vector labels of standard supervised classification. One-hot labels treat the classes as orthogonal and unrelated, so only the single class at the position of maximum softmax probability is assigned, which causes the misrecognition of coupled damage.
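A training objective that combines the l2-norm and angular differences between attribute vectors could look like the sketch below. The additive combination, the `angular_weight` parameter, the use of 1 − cosine similarity as the angular term, and the example attribute vectors are assumptions for illustration; the exact formulation is given in [10].

```python
import math


def attribute_loss(pred, target, angular_weight=1.0, eps=1e-12):
    """l2 distance plus (1 - cosine similarity) between two attribute vectors."""
    l2 = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    dot = sum(p * t for p, t in zip(pred, target))
    norms = (math.sqrt(sum(p * p for p in pred))
             * math.sqrt(sum(t * t for t in target)))
    cosine = dot / (norms + eps)  # eps guards against zero-length vectors
    return l2 + angular_weight * (1.0 - cosine)


# Hypothetical attribute vectors: unlike a one-hot label, several entries can
# be active at once, so two damage types can share attributes.
target = [1.0, 0.0, 1.0, 1.0]
close_prediction = [0.9, 0.1, 0.8, 1.0]
far_prediction = [0.0, 1.0, 0.0, 0.0]
loss_close = attribute_loss(close_prediction, target)
loss_far = attribute_loss(far_prediction, target)
```

Here `loss_close` is smaller than `loss_far`, since the first prediction is near the target in both magnitude and direction.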

For pixel-wise recognition of various types of structural damage, a lightweight modified DeepLabv3+ model was established as the interior model [11]. Figure 5 shows the schematic of the modified DeepLabv3+ model for semantic segmentation of multitype structural damage. The original ResNet101 backbone network was replaced with the lightweight MobileNetV2. Depthwise separable and dilated convolutions were used instead of standard convolutions to reduce the number of parameters. A refined atrous spatial pyramid pooling module was placed after the backbone network to expand the receptive fields of multilevel feature maps using dilated convolutions with various dilation rates. Furthermore, a piecewise loss function based on Focal and Dice losses was designed for different training stages. Representative results for the semantic segmentation of concrete cracks, concrete spalling, rebar exposure, and cable corrosion indicated that the established model performed well and remained stable across various types of structural damage. It could be inferred that the morphological features and shape context of the different damage categories were captured automatically.
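A stage-dependent loss built from Focal and Dice terms could be sketched as follows for the binary case. The Focal and Dice definitions are the standard ones; the switching rule (Focal before `switch_epoch`, Dice afterwards) and the default values are assumptions made here, since the piecewise form used in [11] is not specified in this text.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; down-weights pixels that are already well classified."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, labels, smooth=1.0):
    """1 - Dice coefficient; measures region overlap, robust to class imbalance."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(labels) + smooth)


def piecewise_loss(probs, labels, epoch, switch_epoch=50):
    """Hypothetical schedule: Focal loss early in training, Dice loss later."""
    if epoch < switch_epoch:
        return focal_loss(probs, labels)
    return dice_loss(probs, labels)
```

Both functions take flattened per-pixel probabilities and 0/1 labels; a multi-class version would apply the same terms per damage category.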

Fig. 4. Dual-stage attribute-based few-shot meta learning for multitype damage classification

Fig. 5. Lightweight modified DeepLabv3+ model for structural damage segmentation

3 Correlation Pattern Recognition Based Condition Assessment

Considering that the correlation between quasi-static responses subjected to identical external loads is only a function of structural parameters and independent from the external loads, the correlation can therefore be employed as an indicator of the structural condition.
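As a minimal illustration of why such a correlation can be independent of the load, consider a linear elastic structure under a single quasi-static load of magnitude \(P\) at position \(s\). The influence functions \(I_{d}\) and \(I_{c}\) and the structural parameter vector \(\boldsymbol{\phi}\) are notation assumed here for the sketch, not taken from the cited works:

$$ d(s) = P\, I_{d}(s;\boldsymbol{\phi}), \quad c(s) = P\, I_{c}(s;\boldsymbol{\phi}) \quad \Rightarrow \quad \frac{c(s)}{d(s)} = \frac{I_{c}(s;\boldsymbol{\phi})}{I_{d}(s;\boldsymbol{\phi})} $$

The load magnitude \(P\) cancels, so under these assumptions, and for a given load position, the relation between the two responses changes only when \(\boldsymbol{\phi}\) changes, for example through damage.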

Fig. 6. Schematic of BiLSTM for temporal correlation modeling under normal conditions

A bi-directional long short-term memory (BiLSTM) model was established to model the temporal correlation between the vertical deflection of girders (GVD) and tension of cables (CT) [12], as shown in Fig. 6. Test results showed that the bridge was under normal conditions and that the average root mean square error (RMSE) and relative RMSE between the predicted and ground-truth CTs were 1.83 kN and 3.19%, respectively.
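The two error measures can be computed as in the short sketch below. The RMSE is standard; the normalization used for the relative RMSE is an assumption here (division by the root mean square of the ground truth), since the definition adopted in [12] is not given in this text.

```python
import math


def rmse(pred, true):
    """Root mean square error between predicted and ground-truth series."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def relative_rmse(pred, true):
    """RMSE divided by the RMS of the ground truth (assumed normalization)."""
    rms_true = math.sqrt(sum(t * t for t in true) / len(true))
    return rmse(pred, true) / rms_true
```

For example, predictions of 11 and 9 against ground-truth values of 10 and 10 give an RMSE of 1 and, under this normalization, a relative RMSE of 10%.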

Fig. 7. Probabilistic correlation modeling and Wasserstein distance variation

A deep learning network comprising two variational autoencoders (VAEs) and two generative adversarial networks (GANs) was established to model the probabilistic correlations of quasi-static responses of bridges [13], as shown in Fig. 7. VAEs were designed to model intra-class correlations among either GVDs or CTs, and GANs were designed to model inter-class correlations between GVDs and CTs. The input and output were marginal probability density functions (PDFs) of the quasi-static responses, and they were obtained in the same time window under identical vehicle loads and structural parameters. The Wasserstein distance between the predicted and ground-truth PDFs of tension in the cables was used as an indicator of the structural condition. The results showed that the Wasserstein distance was very sensitive to damage and presented noticeable variations when the damage of the stay cable occurred.
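For one-dimensional distributions, the first-order Wasserstein distance equals the area between the two cumulative distribution functions. The sketch below computes it for two PDFs sampled on the same uniform grid; the function name, the grid spacing `dx`, and the choice of the first-order distance are assumptions, since the order used in [13] is not stated in this text.

```python
def wasserstein_1d(pdf_p, pdf_q, dx):
    """First-order Wasserstein distance between two PDFs on a shared uniform grid."""
    mass_p = sum(pdf_p) * dx
    mass_q = sum(pdf_q) * dx
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for p, q in zip(pdf_p, pdf_q):
        cdf_p += p / mass_p * dx            # normalize so each PDF integrates to 1
        cdf_q += q / mass_q * dx
        distance += abs(cdf_p - cdf_q) * dx  # area between the two CDFs
    return distance
```

When the predicted and ground-truth PDFs coincide, the distance is zero; shifting all probability mass by two grid cells of width 0.5 gives a distance of 1.0, so the indicator grows with how far the distribution has moved.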

4 Vision-Based Structural Seismic Assessment for Buildings

Recently, remote sensing satellites [14], unmanned aerial vehicles (UAVs) [15], and smartphones have been extensively utilized in non-contact post-earthquake inspection at different scales with cutting-edge computer vision and machine learning techniques. In this section, a computer-vision-based coarse-to-fine seismic assessment framework was established to localize dense buildings in urban areas, classify collapsed and non-collapsed states, recognize multi-type surface damage on structural components, and evaluate seismic performances.

A Transformer-CNN deep learning architecture was designed for semantic segmentation of dense buildings and binary classification of collapsed states using large-scale remote sensing satellite images [16]. It consisted of a Swin Transformer encoder, multi-stage feature fusion module, and UPerNet decoder to extract global correlations and local features of dense buildings synchronously, as shown in Fig. 8.

Fig. 8. Improved Swin Transformer for dense building segmentation and state classification

A multi-task learning strategy was proposed to simultaneously recognize multi-type structural components (column, beam, wall), seismic damage (concrete crack, spalling, and rebar exposure), and multi-level damage states (minor, moderate, major) using medium-scale UAV images [17]. It contained a CNN-based encoder-decoder backbone with skip-connection modules and multi-head segmentation subnetworks for different tasks, as shown in Fig. 9.

Fig. 9. Multi-task learning semantic segmentation of structural component, damage, and state

An earthquake engineering knowledge-enhanced machine learning method was established for seismic damage assessment of structural components [18], as shown in Fig. 10. A neural network was established to quantify the seismic damage index of structural components using damage-related parameters (lengths, areas, and numbers of concrete cracks, spalling regions, and exposed rebars) and design-related parameters (axial compression ratio, shear span ratio, and volumetric stirrup ratio) as inputs. The resulting seismic damage indicator was explicitly bounded in [0, 1] and reflected the nonlinear accumulation of seismic damage.

Fig. 10. Seismic damage quantification using quasi-static experimental data and images

5 Conclusions

This study established a limited-data-driven machine learning framework for structural health diagnosis. The main conclusions were summarized as follows.

(1) A data augmentation algorithm of random elastic deformation was designed to enrich the feature space using a few structural damage images. A novel neural network model was designed to enhance the nonlinear expression power, feature extraction ability, and recognition accuracy by introducing the self-attention and subnet modules inside a standard neuron. A task-significance-aware meta-learning optimization algorithm was proposed to learn across various tasks and enhance the generalization ability for multitype structural damage identification.

(2) Two deep learning networks were established to mine the shared latent space between the source and target domains based on intra-class and inter-class temporal and probabilistic correlations between two kinds of quasi-static responses for structural condition assessment.

(3) A computer-vision-based coarse-to-fine seismic assessment framework was established to localize dense buildings in urban areas, classify collapsed and non-collapsed states, recognize multi-type surface damage on structural components, and evaluate seismic performances. A series of deep learning models were designed for the localization, classification, and quantification of dense buildings, deterioration states, and damage index using large-scale remote sensing satellite images, medium-scale UAV images, near-field surface images, and quasi-static experimental data.

(4) Real-world applications, including distributed tiny fatigue crack segmentation in steel box girders, multitype structural damage classification and segmentation for bridge inspection, condition assessment for long-span cable-stayed bridges, and vision-based structural seismic assessment for buildings, were performed to demonstrate the effectiveness of the proposed limited-data-driven machine learning methods for structural health diagnosis.