Introduction

Civil asset owners often put the safety of civil infrastructure ahead of everything else because it directly affects human lives (Flah et al., 2021). Civil infrastructures such as buildings, bridges, wind turbines, and electrical transmission towers are prone to member or joint damage during their service periods due to operational and environmental variability (Sohn, 2007). If the damage present in a structure is not detected in its early stages, it can lead to catastrophic failure of the structure (Paral et al., 2021). In this respect, structural health monitoring (SHM) provides useful information by analyzing the responses of the structure under external loads and detecting damage, as well as the current state of the structure (Avci et al., 2021).

SHM strategies are divided into two types: traditional and vibration based. In the past, traditional methods such as the impedance method (Chen & Xu, 2012), the ultrasonic method (Fakih et al., 2018), and the acoustic emission method (Liu et al., 2017) have been utilized, along with modal strain energy methods (Pal & Banerjee, 2015), static displacement measurements (Park et al., 2015), the frequency response function method (Pal et al., 2013), mode shape changes (Görl & Link, 2003), and mode shape curvature (Roy, 2017), for the identification of joint damage in steel frame structures. However, the traditional methods are not suitable for identifying damage in large-scale structures because their data collection range is comparatively small (Magalhães et al., 2012).

In recent years, machine learning (ML) techniques have been extensively used in vibration-based health monitoring. SHM using ML techniques addresses a pattern identification problem (Fallahian et al., 2022). However, the performance of ML-based techniques depends on the selected ML model, the samples, and the number of learning datasets. Among the ML models, deep learning (DL) models have become the most popular owing to their impressive performance in many scientific areas (Lecun et al., 2015; Silver et al., 2016). DL models use several learning layers to find out how input and output datasets are related. Convolutional neural networks (CNNs) are DL techniques inspired by how the human visual cortex works (Oh et al., 2020). CNNs are an effective tool for extracting and classifying features, and they are mostly used to recognize data such as pictures and videos (Konstantinidis et al., 2020). A complete overview of CNNs and their working procedure is given in “CNN-based methodology”.

González and Zapico (2008) proposed an ML-based method for member damage assessment in a multi-story steel building based on the artificial neural network (ANN) architecture and modal parameters. In that work, the modal parameter data were used as input to the ANN algorithm, which evaluated the mass and stiffness to provide the damage index. However, the technique was found to be quite sensitive to modal errors. Similarly, Chang et al. (2018) presented an ANN-based approach for damage identification in a 7-story 3D steel frame structure using natural frequencies and mode shapes. The authors concluded that the approach was quite effective in identifying damage if the modal data were relatively accurate. Qian and Mita (2008) proposed a technique that uses time-history acceleration data as the input of the ANN framework to identify damage in a 5-story frame structure. Beheshti Aval et al. (2020) proposed a combination of the ANN model and different signal processing techniques to identify member damage in an ASCE benchmark building under seismic conditions. Dackermann et al. (2013) presented an SHM technique for the joints of a steel frame structure using frequency response functions (FRFs) and an ANN architecture. Moreover, Kaveh and Iranmanesh (1998) discussed the use of several ANN models in structural analysis and optimization. Kaveh et al. (2020) presented a boundary strategy-based approach for damage identification in truss structures using meta-heuristic algorithms. According to Kaveh and Khavaninzadeh (2023), the ANN and four meta-heuristic optimization methods have been employed in other domains. Naresh and Vimal Kumar (2023) presented joint damage identification in a single-story 2D frame structure using an SVM algorithm based on statistical features of vibration data. A health monitoring technique combining speeded-up robust features with KNN and SVM was presented by Naresh et al. (2023).

Using the modal parameters, Kaveh and Maniat (2015) studied an optimization-based technique for structural damage identification in different types of civil engineering structures. In that work, the modal parameters were considered as the objective functions, and the Magnetic Charged System Search (MCSS) and Particle Swarm Optimization approaches were used to optimize both. MCSS was found to deliver more accurate results than Particle Swarm Optimization. The authors stated that the damage can be accurately estimated even when the measurement data are contaminated with noise. A complete overview of the MCSS application in civil engineering was presented by Kaveh (2016). Kaveh and Zolghadr (2015) developed an improved Charged System Search (CSS) algorithm for damage assessment in truss-type structures using modal parameters. Kaveh (2017) presented a damage detection method for skeleton-type structures using a CSS algorithm and incomplete modal data. Likewise, Kaveh et al. (2022a, 2022b) used a newly developed guided water algorithm-based optimization technique for damage detection in different types of civil engineering structures using incomplete modal data. Kaveh and Iranmanesh (1998) presented the application of the backpropagation neural network and the improved counter propagation neural network, along with a neuro-optimization technique, for the analysis and design of structures. Kaveh et al. (2022a, 2022b) used a Q-learning-based water strider model to select optimal sensor positions in order to evaluate the health condition of the structure.

Recently, a CNN-based deep learning technique has been utilized for crack detection in pavements (Song & Wang, 2021), crack damage detection in concrete structures (Barkhordari et al., 2023; Yang et al., 2020), health monitoring of wind turbine blades (Yang et al., 2021; Zou & Cheng, 2022), and damage detection in bridge structures (Fu et al., 2021; Hajializadeh, 2023).

Structural vibration characteristics, such as natural frequencies and modes, change with environmental conditions, especially temperature variations. Changes in the natural frequencies of steel-framed structures have not been studied as broadly as those of bridge structures (Xia et al., 2012). Cornwell et al. (1999) presented the variability of the modal frequencies of a bridge with temperature. In that work, the first three natural frequencies decreased by 4.7, 6.6, and 5%, respectively, over a day with a temperature variation of 22 °C. Fu and DeWolf (2001) discovered that the expansion bearings were partly restrained below 15.6 °C. As the temperature rose from −17.8 to around 15.6 °C, the first three frequencies were reduced by 12.3, 16.8, and 9.0%, respectively. The first three frequencies of a bridge decreased according to Liu and DeWolf (2006), who found that as temperatures increased by 0.8%, 0.7%, and 0.3% per °C, frequencies dropped by 0.7%, 0.8%, and 0.7% Hz. Likewise, Nayeri et al. (2008) observed a 17-story steel frame structure and found a significant relationship between frequencies and air temperature, although the frequency variations lagged the temperature variations by a few hours. Using the Bayesian spectral density procedure, Yuen and Kuok (2010) obtained the modal frequencies of a building structure over a year. In contrast to their analytical findings, they discovered that the first three frequencies increased when the ambient temperature increased.

From the literature, it is evident that most studies have focused on SHM for member damage identification rather than joint damage identification. Moreover, in the ML/DL domain, there is no guarantee that a certain feature or classifier set will be the best option for all types of structural damage. Therefore, this remains an important area for research. In this work, a convolutional neural network (CNN)-based DL architecture is proposed to identify joint damage in a steel plane frame structure with welded connections using scalogram images of vibration data, an approach that has yet to be addressed. The main contributions of the current study include the following aspects:

  1. Development of a CNN-based architecture for joint damage identification in a 2D steel plane frame structure under temperature variability.

  2. The current investigation proposes the application of scalogram images for joint damage detection in a steel plane frame structure.

  3. The effectiveness of the architecture for joint damage detection in a steel plane frame structure is further verified through an unseen dataset.

CNN-based methodology

In the current research, base excitation is used to excite the structure in order to measure the time-history acceleration data under healthy and various damaged conditions. Using the continuous wavelet transform (CWT) tool in MATLAB, the time-history acceleration responses are transformed into time-frequency scalogram images. A CNN is then trained and tested on the scalogram image dataset to classify the healthy and various damaged conditions. Figure 1 provides a comprehensive overview of the methodology involved in the study.

Fig. 1 Flowchart of the research methodology
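The signal-to-scalogram step can be illustrated with a minimal sketch. The study uses the CWT tool in MATLAB; the plain-Python version below is only an assumption-laden stand-in that uses a complex Morlet wavelet (with a hypothetical centre parameter `w0 = 6`) and direct convolution. The 12 Hz test tone, the 0.5 s record length, and the list of analysis frequencies are illustrative choices, not values from the experiment; only the 800 Hz sampling rate is taken from the text.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet; the small admissibility correction is neglected."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| as a list of rows, one row per analysis frequency."""
    n = len(signal)
    dt = 1.0 / fs
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)          # scale whose centre frequency is f
        half = int(math.ceil(4.0 * scale * fs))   # wavelet support of +/- 4 scales
        kernel = [morlet(k * dt / scale, w0).conjugate() / math.sqrt(scale)
                  for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            lo = max(-half, -i)                   # stay inside the record
            hi = min(half, n - 1 - i)
            acc = 0j
            for k in range(lo, hi + 1):
                acc += signal[i + k] * kernel[k + half]
            row.append(abs(acc) * dt)
        rows.append(row)
    return rows


fs = 800.0                                        # sampling rate reported in the study
x = [math.sin(2.0 * math.pi * 12.0 * i / fs) for i in range(400)]  # hypothetical 12 Hz tone
freqs = [6.0, 9.0, 12.0, 15.0, 18.0, 24.0, 30.0]  # hypothetical analysis frequencies
scalogram = cwt_scalogram(x, fs, freqs)

mid = len(x) // 2
peak_index = max(range(len(freqs)), key=lambda r: scalogram[r][mid])
print("dominant frequency at mid-record:", freqs[peak_index], "Hz")
```

Each row of `scalogram` is the magnitude of the wavelet coefficients at one frequency over time; rendering these rows with a colour map would give an image of the kind that is fed to the CNN.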

CNN architecture

CNNs are composed of layers of artificial neurons organized in three dimensions: width, height, and depth (Meghana et al., 2021). They are well suited to processing structured grid data such as images and videos (Konstantinidis et al., 2020). Convolutional layers, pooling layers, and fully connected layers are the three basic layer types in CNN architectures, as shown in Fig. 2.

Fig. 2 Basic concept of CNN architecture

Convolutional layer

The basic component of a CNN is the convolutional layer (CL). Each CL contains learnable variables, namely weights (filters) and biases. The width and height of the filter are spatially smaller than those of the input, but the depth is the same. The feature maps of the preceding layer are convolved with the filters, and the output is generated by applying the activation function. The formula for the CL output at a pixel is as follows:

$${CL}_{x,y}=\sigma \left(\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}{W}_{i,j,k}\times {X}_{x+i-1,\,y+j-1,\,k}+B\right)$$
(1)

where a and b are the dimensions of the filter, n is the number of channels of the input image, B is the bias, \(\sigma\) is the activation function, and W is the weight matrix.

To illustrate the convolution operation, consider a 4 × 4 input matrix, as shown in Fig. 3. First, a 3 × 3 filter is randomly generated; it is then updated by the architecture through backpropagation. With the stride set to 1, sliding the filter along the width and height of the input array produces four subarrays of the same size as the filter. Each subarray is multiplied element-wise by the filter matrix, and the output value is obtained by summing the products and adding the bias. Because the filter is applied without padding, the 2 × 2 output is smaller than the preceding layer.

Fig. 3 Convolutional layer operation
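The 4 × 4 example described above can be reproduced with a short sketch of Eq. 1 before the activation is applied. The input values, the 3 × 3 filter, and the bias below are hypothetical numbers chosen only so that the arithmetic is easy to follow; a single channel (n = 1) and no padding are assumed.

```python
def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Single-channel convolution without padding (pre-activation part of Eq. 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            # Multiply the filter element-wise with the current subarray and sum.
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[r * stride + i][c * stride + j]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4 x 4 input and 3 x 3 filter (illustrative values only).
image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

feature_map = conv2d_valid(image, kernel, bias=1.0, stride=1)
print(feature_map)  # [[-1.0, -1.0], [3.0, -1.0]]
```

The four output values correspond to the four 3 × 3 subarrays of the input, which is why the output is 2 × 2.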

A nonlinear activation function is applied after the CL to add nonlinearity to the architecture. The rectified linear unit (ReLU) and SoftMax are two of the most often utilized activation functions in neural networks. They are defined as follows:

$$f\left(x\right)=\mathrm{max}(0,x)$$
(2)
$${f(x)}_{i}=\frac{{e}^{{x}_{i}}}{{\sum }_{j=1}^{k}{e}^{{x}_{j}}}$$
(3)
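Equations 2 and 3 can be checked numerically with a small sketch. The three scores below are hypothetical; subtracting the maximum score before exponentiating is a common numerical-stability step that does not change the result of Eq. 3.

```python
import math


def relu(x):
    """Eq. 2: pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def softmax(scores):
    """Eq. 3: convert k raw scores into probabilities that sum to one."""
    shift = max(scores)                       # stability shift; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]                      # hypothetical scores for three classes
probabilities = softmax(scores)
print([relu(v) for v in (-3.0, 0.0, 2.5)])
print(probabilities)
```

ReLU is typically applied after convolutional layers, while SoftMax is typically applied at the output so that the class scores can be read as probabilities.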

Pooling layer

The pooling layer reduces the size of the feature maps for quicker computation and more reliable feature recognition. The most popular techniques are average and maximum pooling. Max pooling outputs the highest value in the filter region, as depicted in Fig. 4. In this case, a 2 × 2 max pooling filter operates on a 4 × 4 input matrix. Since the stride is two, the filter shifts two elements to the right or downward at each step. The output dimension is thereby reduced to 2 × 2, and each value is the maximum of the elements in the receptive field. Average pooling instead takes the average value in the filter region. In the present study, average pooling is utilized.

Fig. 4 Pooling layer operation
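Both pooling variants can be compared on the same 4 × 4 input with a minimal sketch. The feature-map values are hypothetical; the 2 × 2 window and stride of two follow the description above.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Max or average pooling over square windows of a 2D feature map."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map (illustrative values only).
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 8, 3],
               [0, 4, 6, 9]]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.25, 6.5]]
```

Max pooling keeps only the strongest response in each window, whereas average pooling, the variant used in this study, retains a smoothed summary of every value in the window.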

Fully connected layer

The fully connected (FC) layer is the layer of the network that comes before the output layer. In this layer, all the neurons are connected to the features created by the preceding layer. The weights and biases of this layer transform the features into the appropriate class. The output \(X^{l}\) of the layer is given by:

$${X}^{l}=\sigma ({X}^{l-1}\times w+b)$$
(4)

where w and b represent the weights and bias vectors of the FC layer, respectively, and σ is the activation function.
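Equation 4 amounts to a matrix-vector product followed by an activation. The sketch below assumes four input features, three output classes, and a SoftMax activation; the feature values, weights, and biases are hypothetical.

```python
import math


def softmax(scores):
    """SoftMax activation (Eq. 3) with a stability shift."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(x_prev, w, b, activation):
    """Eq. 4: X^l = sigma(X^(l-1) x w + b) for a single input vector."""
    z = [sum(x_prev[i] * w[i][j] for i in range(len(x_prev))) + b[j]
         for j in range(len(b))]
    return activation(z)


# Hypothetical features from the preceding layer, weights (4 x 3), and biases.
x_prev = [0.5, -1.0, 2.0, 0.0]
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
b = [0.1, 0.2, 0.3]

pre_activation = fully_connected(x_prev, w, b, activation=lambda z: z)
class_probabilities = fully_connected(x_prev, w, b, activation=softmax)
print(pre_activation)
print(class_probabilities)
```

With these numbers the pre-activation vector is approximately (0.6, −0.8, 2.3), so the third class receives the highest probability.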

The layer graph of the architecture adopted in the current study is shown in Fig. 5. A layer graph describes the layers of a DL architecture with a more complex graph structure, in which layers may receive inputs from several other layers and send outputs to several other layers.

Fig. 5 Layer graph of the architecture

Experimental study

To validate the proposed architecture and identify joint damage in a steel frame structure with welded connections, a one-story, one-bay steel frame was tested in the engineering mechanics laboratory of IIT Bombay, India, as shown in Fig. 6a. Table 1 provides the frame model’s dimensions and physical properties. The joints between the beams and columns are welded. The base plate is rigidly welded to the columns and bolted to the shaking table. The healthy and various damaged cases are depicted in Fig. 6b and Table 2. As seen in Fig. 6b, the damage in the experimental investigation is produced by reducing the cross-section of the member close to the joints.

Fig. 6 a Experimental setup and b damage created near the joint

Table 1 Details of the parameters of the frame model
Table 2 Detail of the different experimental cases

The frame is excited by a base acceleration that sweeps from 5.0 to 30.0 Hz at an interval of 0.50 Hz. As shown in Fig. 6a, accelerometers are installed on each column close to the joints, 2 cm apart. The MGCplus DAQ from the HBM system is used to acquire the acceleration signals at a sampling rate of 800.0 Hz. Figure 7a and b display typical acceleration data for the healthy case (HA1) and damage case 2 (DA2), respectively.

Fig. 7 Raw acceleration signal: a healthy (HA1) and b damaged (DA2) case
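The excitation described above can be sketched numerically. The snippet assumes a stepped-sine sweep, which is one reading of "at an interval of 0.50 Hz"; the dwell time of 2 s per frequency and the unit amplitude are hypothetical, since the text does not report them. Only the 5 to 30 Hz range, the 0.5 Hz step, and the 800 Hz sampling rate come from the study.

```python
import math


def sweep_frequencies(f_start=5.0, f_stop=30.0, step=0.5):
    """List the excitation frequencies of the sweep, end points included."""
    n_steps = int(round((f_stop - f_start) / step)) + 1
    return [f_start + k * step for k in range(n_steps)]


def stepped_sine(freqs, fs=800.0, dwell_s=2.0, amplitude=1.0):
    """Concatenate one sine burst per frequency (dwell_s is an assumed value)."""
    samples_per_step = int(round(dwell_s * fs))
    signal = []
    for f in freqs:
        for i in range(samples_per_step):
            signal.append(amplitude * math.sin(2.0 * math.pi * f * i / fs))
    return signal


fs = 800.0
freqs = sweep_frequencies()
base_acceleration = stepped_sine(freqs, fs=fs)
nyquist = fs / 2.0

print(len(freqs), "frequency steps, Nyquist frequency", nyquist, "Hz")
```

The sweep contains 51 frequency steps, and the 800 Hz sampling rate gives a Nyquist frequency of 400 Hz, well above the highest excitation frequency of 30 Hz.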

Results and discussion

The CNN-based DL architecture (Fig. 2) receives the time-history acceleration signals (Fig. 7) from five accelerometers under healthy and various damaged conditions. In the present work, each experimental case was repeated 15 times. The continuous wavelet transform (CWT) in MATLAB is used to generate the time-frequency scalogram images, as shown in Fig. 8. The size of the scalogram images is [875 × 656 × 3]. The imresize function is used to reduce the image size to 224 × 224 in order to lower the computational effort. According to the literature (Dang et al., 2021; Ray, 2019), ML and DL architectures require huge datasets for training and testing. Data augmentation is therefore performed by adding different levels of Gaussian noise to the resized images to generate a large dataset, as shown in Fig. 9.

Fig. 8 a Scalogram image of a healthy state, b scalogram image of damaged (DA2) case

Fig. 9 Data augmentation procedure
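The resize-then-augment step can be sketched as follows. This is not the MATLAB pipeline used in the study: nearest-neighbour resizing is swapped in for imresize, a synthetic single-channel gradient stands in for a scalogram, and the three noise levels are hypothetical because the text does not list them.

```python
import random


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D image stored as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip pixel values to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


# Synthetic stand-in for one channel of an 875 x 656 scalogram image.
rows, cols = 875, 656
original = [[(r + c) / (rows + cols - 2) for c in range(cols)] for r in range(rows)]

resized = resize_nearest(original, 224, 224)

rng = random.Random(0)                      # fixed seed for repeatability
noise_levels = [0.01, 0.03, 0.05]           # hypothetical standard deviations
augmented = [add_gaussian_noise(resized, s, rng) for s in noise_levels]

print(len(augmented), "augmented images of size",
      len(augmented[0]), "x", len(augmented[0][0]))
```

Each noise level yields one additional image per original scalogram, so the dataset grows in proportion to the number of noise levels used.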

In this work, among the five accelerometers, four are used for training and validation purposes, and one is used for testing. The distribution of image data for training, validation, and testing is given in Table 3.

Table 3 Distribution of image dataset

Figure 10 shows the training accuracy (100%) and validation accuracy (95%) of the proposed architecture, together with the corresponding loss, for the data without temperature changes (HA1, DA1, and DA2). The network is trained for 80 epochs with a batch size of 30 and 168 iterations per epoch. The figure also shows that the CNN architecture steadily increases its confidence in predicting the damage classes. In the early stage, a few oscillations are visible in both the loss and the validation accuracy because only a limited number of images is used in a single batch: only a small portion of the images passes through the CNN architecture in each batch and updates its weights.

Fig. 10 Training, validation, and loss curve of the CNN-based SHM architecture (without considering temperature changes)
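As a quick consistency check on the reported training settings, and assuming that every mini-batch is full (an assumption; the last batch of an epoch may be smaller), the number of images seen per epoch and the total number of weight updates follow directly:

$$168\ \text{iterations/epoch}\times 30\ \text{images/iteration}=5040\ \text{images/epoch},\qquad 80\ \text{epochs}\times 168\ \text{iterations/epoch}=13{,}440\ \text{iterations}$$

The first figure is only an upper estimate of the training-set size implied by the settings; the actual distribution is the one given in Table 3.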

The entire image set is divided into ten parts in order to conduct a tenfold test to improve the degree of confidence in the architecture. Of these ten parts, nine are used for training the architecture and one is held out for validation. The typical results of the tenfold test are given as a confusion matrix (CM) in Table 4. In the study, the average testing accuracy is computed as \(\left(\left(\frac{45}{45}+\frac{41}{45}+\frac{41}{45}\right)/3\right)\times 100=94.07\%\).

Table 4 The CM indicates the mean testing outcome of CNN using 45 images
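The accuracy computation and the tenfold split can be sketched in a few lines. The per-class counts (45, 41, and 41 correct out of 45) are taken from the expression in the text; the interleaved fold assignment and the set size of 50 items are hypothetical, since the study does not describe how the folds were drawn.

```python
def mean_class_accuracy(correct, totals):
    """Average of the per-class accuracies, expressed in percent."""
    per_class = [c / t for c, t in zip(correct, totals)]
    return 100.0 * sum(per_class) / len(per_class)


def k_fold_indices(n_items, k=10):
    """Yield (train, validation) index lists; each fold is held out once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out
                 for idx in fold]
        yield train, validation


accuracy = mean_class_accuracy(correct=[45, 41, 41], totals=[45, 45, 45])
print(round(accuracy, 2))  # 94.07

splits = list(k_fold_indices(n_items=50, k=10))   # 50 is an illustrative size
print(len(splits), "folds;", len(splits[0][0]), "training and",
      len(splits[0][1]), "validation items in the first fold")
```

Because the three classes contain the same number of test images, the mean of the per-class accuracies equals the overall fraction of correctly classified images.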

Temperature variability

In this section, both the localization and quantification of damage in the plane frame structure are assessed under temperature variability using the CNN-based architecture described earlier. In this context, temperature variability is taken as the source of environmental change (Sohn, 2007).

Certain earlier studies (Cornwell et al., 1999; Kim et al., 2007) assumed that the parameters most affected by temperature changes were either the material’s density or Young’s modulus. They found that the frequency of a steel tower changes by 0.5% hourly for every 3 °C variation in temperature. The frequency variation for bridge-type structures was between 5 and 10% on a daily or seasonal basis. In the current study, the natural frequency was observed to change by 0.2–2.85% for every 10 °C increase in temperature. To account for this, a novel approach (circular shifting) is proposed in this study to generate synthetic data representing temperature changes.

First, the time-domain data were transformed into frequency bands using the cosine transform (Eq. 5), as shown in Fig. 11a, to identify the structure’s natural frequencies. The first natural frequency was then shifted by +0.30%, −0.30%, +1.0%, −1.0%, +1.50%, and −1.50%, which may represent a wide range of temperature variations (±9 °C). A frequency shift of +0.3% is used as an example to illustrate the process. As a result of the shifting, certain bands extend beyond the 400 Hz range; these bands are wrapped around and placed at the beginning of the bands (a circular shift). The energy of the signal is unchanged by this procedure.

Fig. 11 a Original experimental data and b shifted data

$${X}_{\mathrm{c}}\left(\omega \right)=\sqrt{\frac{2}{\pi }}\int_{0}^{\infty }x\left(t\right)\mathrm{cos}\left(\omega t\right)\mathrm{d}t$$
(5)

The inverse cosine transform (Eq. 6) was used to transform the shifted frequency bands back into the time domain. This modified dataset represents the structure’s synthetic experimental time-domain response at various temperatures, as shown in Fig. 11b. The datasets for the other shifts listed in the previous paragraph were produced in the same way. A schematic view of the shifting procedure is shown in Fig. 12.

Fig. 12 Shifting procedure

$$x\left(t\right)=\sqrt{\frac{2}{\pi }}\int_{0}^{\infty }{X}_{\mathrm{c}}\left(\omega \right)\mathrm{cos}\left(\omega t\right)\mathrm{d}\omega ,\quad t\ge 0$$
(6)
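The circular-shifting idea can be illustrated with a discrete analogue of Eqs. 5 and 6. The sketch below is an assumption-based illustration rather than the study's implementation: it uses an orthonormal discrete cosine transform (DCT-II) and its inverse, a synthetic 256-sample tone instead of measured data, and a shift of two bins that is far larger than the ±0.3 to ±1.5% shifts in the text so that the effect is visible on a short record.

```python
import math


def dct(x):
    """Orthonormal DCT-II, a discrete counterpart of Eq. 5."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                                  for i in range(n)))
    return coeffs


def idct(coeffs):
    """Inverse of the orthonormal DCT-II, a discrete counterpart of Eq. 6."""
    n = len(coeffs)
    scales = [math.sqrt((1.0 if k == 0 else 2.0) / n) for k in range(n)]
    return [sum(scales[k] * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]


def circular_shift(coeffs, bins):
    """Move every band by `bins`; bands pushed past the end wrap to the start."""
    n = len(coeffs)
    shifted = [0.0] * n
    for k, value in enumerate(coeffs):
        shifted[(k + bins) % n] = value
    return shifted


n, k0 = 256, 40                              # hypothetical record length and tone bin
x = [math.cos(math.pi * (i + 0.5) * k0 / n) for i in range(n)]

spectrum = dct(x)
shifted_spectrum = circular_shift(spectrum, bins=2)
x_shifted = idct(shifted_spectrum)

peak_before = max(range(n), key=lambda k: abs(spectrum[k]))
peak_after = max(range(n), key=lambda k: abs(shifted_spectrum[k]))
energy_before = sum(v * v for v in x)
energy_after = sum(v * v for v in x_shifted)
print(peak_before, peak_after, round(energy_before, 6), round(energy_after, 6))
```

Because the transform is orthonormal and the shift only permutes the coefficients, the signal energy is the same before and after the shift, which matches the statement that the procedure does not change the energy. Resolving a shift as small as 0.3% on real data requires a record long enough for the frequency bins to be much narrower than the shift itself.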

To locate and quantify the damage under temperature variations, three classes (HA1, DA2, and DA3) are considered. The developed CNN-based architecture is trained and validated with accuracies of 100% and 94.33%, respectively, as shown in Fig. 13. As described in the previous section, the temperature variation (±9 °C) included six different changes in temperature. Table 5 provides the average tenfold classification testing accuracy of 91.85%.

Fig. 13 Training, validation, and loss curve of the CNN-based SHM architecture (considering temperature changes)

Table 5 The CM indicates the average testing outcome of CNN under temperature variability

To examine the effectiveness of the architecture, the pre-trained architecture (considering temperature changes) is tested on unseen images that were not used in training or validation. The unseen images are created from data with shifted sensor positions and different damage cases. For that purpose, one healthy case (HA2) and two different damaged cases (DA3, DA4) (Table 2) are considered, and the outcomes are given in Table 6. The findings show a 90.37% accuracy rate, indicating that each case can still be classified even when the sensor locations change and different damage data are used to test the architecture. This shows that the architecture is able to identify the type of damage even in data on which the network has not been trained.

Table 6 The CM indicates the mean testing outcome of CNN using the unseen image

Neural models’ interpretability

The broad adoption of increasingly complex DL architectures has contributed to the perception that neural networks are black boxes, raising concerns in the scientific community about the interpretability of neural architectures over the last few decades. Indeed, it has become quite difficult to explain in detail how an architecture comes to recognize an image as belonging to one class rather than another. Nevertheless, some authors (Rosso et al., 2023; Zhou et al., 2016) have emphasized the strong localization capabilities of CNN architectures, even when they are trained purely for classification and not specifically for, say, object identification tasks. They achieved this by introducing class activation maps (CAM), a visual representation of the input image areas that contribute most to the classification score for a certain class, obtained by using the global average pooling layer in a particular way. It was later suggested to use a gradient-weighted CAM (Grad-CAM) or, as a more recent example, the gradient-free Score-CAM approach (Wang et al., 2020). In the current work, the MATLAB function gradCAM is used to compute the gradient-weighted CAM. The architecture appears to concentrate primarily on the middle portions of the image for class HA, whereas for classes DA1 and DA2 it concentrates on the middle and top sections of the image, exhibiting a different Grad-CAM pattern at each level. This is evident in Fig. 14, which compares the healthy and various damaged classes level by level.

Fig. 14 Actual images of different classes and corresponding activation map
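The weighting step behind a gradient-weighted CAM can be sketched independently of any toolbox. The snippet assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to them are already available; the two 3 × 3 feature maps and their gradients below are hypothetical numbers, not outputs of the trained architecture.

```python
def grad_cam(feature_maps, gradients):
    """Weight each feature map by its mean gradient, sum, apply ReLU, normalise."""
    # Channel weights: global average of the gradient over the spatial positions.
    weights = [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in gradients]
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[max(0.0, sum(a * fm[r][c] for a, fm in zip(weights, feature_maps)))
            for c in range(width)] for r in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Hypothetical feature maps: one responds in the centre, one along the top row.
centre_map = [[0.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 0.0]]
top_map = [[3.0, 3.0, 3.0],
           [0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
# Hypothetical gradients: the class score rises with the centre map only.
centre_grad = [[0.2] * 3 for _ in range(3)]
top_grad = [[-0.1] * 3 for _ in range(3)]

heat_map = grad_cam([centre_map, top_map], [centre_grad, top_grad])
for row in heat_map:
    print([round(v, 2) for v in row])
```

In this toy case the heat map peaks at the centre pixel and is zero along the top row, because the top-row feature map has a negative weight for the chosen class and is removed by the ReLU.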

Conclusions

In the present work, a convolutional neural network (CNN)-based deep learning architecture is proposed to identify joint damage in a steel plane frame structure with welded connections under temperature variability. For that purpose, a laboratory-based, single-story steel plane frame is considered. Initially, the study was carried out without considering the temperature changes in the data. Then, the damage was localized and quantified under temperature variability. Finally, the pre-trained architecture (with temperature variability) was tested on an unseen dataset.

  • The training and validation accuracies are found to be 100% and 94.88%, respectively, whereas the testing accuracy is 94.07%, which shows that the architecture can differentiate between the healthy and different damaged cases.

  • The study assumes that temperature variations of up to ±9 °C can be represented by shifts of the natural frequencies (+0.30%, −0.30%, +1.0%, −1.0%, +1.50%, and −1.50%). The findings indicate that the architecture can localize and quantify the damage with a testing accuracy of 91.85% under these variations.

  • The proposed architecture can automatically classify the unseen data with an accuracy of 90.37%. This indicates that the health monitoring architecture has the potential to identify damage near the joints.

  • The architecture needs only acceleration time-history data for feature extraction in the damage identification of planar frame structures and therefore requires minimal manual intervention.

  • The class activation maps show significant variations between the different classes. Overall, this shows that the designed architecture is robust enough to localize damaged regions.

  • Further, the results show that the proposed architecture has emerged as a potential automated tool for the health monitoring of the joints of planar frame structures.