Introduction

With the advancement of science and technology, modern productivity has improved significantly, and frame structures have become widespread in fields such as mining machinery, civil engineering, aerospace, and bridge construction [1, 2]. A frame structure consists of interconnected members joined by bolts or welds, and it often fails because of bolt loosening, uneven force distribution, or oxidation [3]. Such failures may lead to machinery malfunction or catastrophic collapse of the frame structure, posing significant risks to human life and property. It is therefore of great practical engineering significance to develop effective damage diagnosis methods for condition detection and damage identification in frame structures, and to predict their operating health at an early stage.

Frame structures are becoming increasingly large, complex, and modular. However, the rising computational cost of processing the resulting data makes effective damage diagnosis for these structures challenging. Traditional damage diagnosis methods include the short-time Fourier transform (STFT) [4], the K-nearest neighbor algorithm (KNNA) [5, 6], fuzzy cluster analysis (FCA) [7, 8], and peak-to-peak comparison (PTPC) [9]. Li et al. conducted numerical research on planar truss structures, using autocorrelation functions of structural acceleration responses under white noise excitation to form a covariance matrix; they identified damage conditions under different noise levels [10]. Malekjafarian et al. proposed an improved transition mode identification method that uses Hankel matrix averaging to detect closely spaced modes [11]. Yang et al. employed empirical mode decomposition (EMD) and the Hilbert transform to extract damage peaks caused by sudden changes in structural stiffness, thereby detecting the moment and location of damage occurrence [12]. Li et al. investigated a combined method that uses EMD and wavelet analysis to detect changes in structural response data; they decomposed the structural vibration response signal into multiple single-component signals using EMD and transformed them into analytic signals through the Hilbert transform; they then performed a wavelet transform on each single-component signal to accurately identify damage location and severity [13]. Zhu et al. proposed a bearing fault diagnosis method based on wavelet packet decomposition and KNNA; the method first decomposed the original bearing vibration signal using wavelet packet decomposition, then calculated the sample entropy of each decomposed signal to construct a feature vector, and finally employed KNNA for bearing fault diagnosis [14].
Despite the widespread application of traditional damage diagnosis methods in various fields, the growing complexity of frame structures poses challenges. When traditional methods are applied to damage diagnosis problems with complex damage mechanisms, numerous classification categories, and massive data, their diagnostic performance may degrade. This makes efficient and accurate damage diagnosis difficult, which conflicts with the demand for rapid and intelligent structural damage diagnosis.

In recent years, with technological advancements and continuous improvements in algorithms, deep learning has achieved significant success in various fields, for instance in computer vision [15, 16, 17], natural language processing [18, 19], speech recognition [20, 21], and autonomous driving [22, 23]. Kostic et al. combined sensor clustering-based time series analysis with artificial neural networks for bridge damage detection under temperature variations; they performed 2000 simulations with temperature effects and damage conditions using a pedestrian bridge finite element model [24]. Khodabandehlou et al. utilized vibration signals and a two-dimensional deep convolutional neural network to extract features from historical acceleration responses and reduce the dimensionality of the response history; this enabled damage state classification from a limited number of acceleration measurements [25]. Avci et al. proposed a one-dimensional convolutional neural network-based wireless sensor network (WSN) for real-time and wireless structural health monitoring; in this method, each CNN was assigned to its local sensor data, and the respective models were trained for each sensor unit without any synchronization or data transmission [26]. Tang et al. segmented the original time series data and applied visual processing in both the time and frequency domains; they then overlaid the segmented images into single-channel or double-channel images and labeled them based on visual features; subsequently, they designed and trained a CNN for data anomaly classification [27]. Cuşkun et al. employed a novel 3D deep learning architecture to classify MR images of patients with brain tumors, thereby determining the primary site of brain metastasis [28]. Al-Areqi and colleagues proposed a machine learning approach for the rapid diagnosis of Covid-19, with a focus on the impact of different features on classification accuracy [29]. Yue et al. proposed a fault diagnosis method using deep adversarial transfer learning; they used a single-layer CNN and transfer learning to employ the ResNet residual network as both the generator and discriminator in a GAN, and obtained higher accuracy in both GAN recognition and generation capabilities [30]. GAN has shown high accuracy on problems with limited training samples and can effectively extract feature information even from one-dimensional vibration data. However, GAN generates many simulated models, making training time-consuming and computationally expensive. It exhibits unique accuracy advantages when handling problems with diverse vibration data but a limited number of sensors. In fault diagnosis for frame structures with a large amount of data and numerous sensors, however, it often suffers from low efficiency, limited diagnostic accuracy, and poor intuitiveness.

To address the accuracy degradation caused by multi-sensor data in frame structure damage diagnosis, reduce the computational cost of the model, and achieve accurate damage diagnosis on mobile devices, this paper proposes a new structural damage diagnosis method. First, the sensor data are mean filtered to obtain a smoother signal. The processed data are then input into the SGNet model for training. The SGNet model builds on ShuffleNet [31] and GhostNet [32], both of which are lightweight models. With appropriate modifications to these models, SGNet becomes better suited to damage diagnosis in building frame structures, improving both the efficiency and accuracy of diagnosis. This paper makes two main contributions. The first is an accurate damage diagnosis method for frame structures in a multi-sensor data environment. The second is a lightweight structural damage diagnosis model suitable for mobile devices, which greatly reduces the computational cost of the model while maintaining diagnostic accuracy.

This paper consists of six parts. The “Signal Filtering Method” section introduces the signal filtering methods, the “Neural Network Model” section presents the neural network model, and the “Damage Diagnosis Process” section describes the damage diagnosis process. The “Experimental Study” section contains the main experimental content, including the experimental object, experimental data, damage degree diagnosis experiment, damage type diagnosis experiment, model comparison experiment, and discussion. The “Conclusion” section summarizes the paper and draws the relevant conclusions.

Signal Filtering Method

Filtering is a commonly used method in signal processing; it removes noise or unwanted components from a signal while preserving useful information. Noise arises from random fluctuations caused by measurement errors, sensor interference, or other environmental factors. Filtering aims to extract useful information from signals while suppressing or eliminating redundancies. Mean filtering and median filtering are two standard methods used for this purpose.

Mean filtering is a linear filtering method that can achieve signal smoothing by replacing each sample point with the average value of samples in its surrounding neighborhood. The mathematical formula for mean filtering can be represented as:

$$y\left[n\right]=\frac{1}{N}{\sum}_{i=n-N/2}^{n+N/2}x\left[i\right]$$
(1)

where y[n] is the filtered signal sample, x[i] is the original signal sample, and N determines the size of the neighborhood used for calculating the average value. A larger neighborhood size can provide stronger smoothing effects but may result in the loss of signal details.
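
The averaging in formula (1) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `mean_filter`, the odd window length, and the edge padding used to keep the output the same length as the input are choices made here, not details given in the paper. The sketch divides by the number of samples actually averaged.

```python
import numpy as np

def mean_filter(x, window=5):
    """Centered moving average over `window` samples (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the neighborhood is centered")
    half = window // 2
    # Assumption: repeat the boundary samples so the output keeps the input length.
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = mean_filter(signal, window=3)
```

A larger `window` smooths more strongly but, as noted above, also removes more signal detail.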

Median filtering, on the other hand, is a non-linear filtering method that removes noise by replacing each sample point with the median value of the samples in its surrounding neighborhood. The mathematical formula for median filtering can be represented as:

$$y\left[n\right]=\textrm{median}\left(x\left[n-N/2\right],x\left[n-N/2+1\right],\dots, x\left[n+N/2\right]\right)$$
(2)

where y[n] is the filtered signal sample, x is the original signal, and N determines the size of the neighborhood used for calculating the median value. Median filtering is suitable when the noise does not follow a Gaussian distribution and isolated outliers are present. A larger neighborhood can remove wider noise bursts, but it may blur the signal.
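
A minimal sketch of median filtering follows, again as an illustration rather than the authors' implementation. The function name `median_filter`, the odd window length, and the edge padding are assumptions made here. The example input contains a single isolated spike, the kind of outlier that median filtering is meant to remove.

```python
import numpy as np

def median_filter(x, window=3):
    """Replace each sample with the median of its centered neighborhood."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the neighborhood is centered")
    x = np.asarray(x, dtype=float)
    half = window // 2
    # Assumption: repeat the boundary samples so the output keeps the input length.
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

spiky = np.array([1.0, 1.0, 9.0, 1.0, 1.0])
cleaned = median_filter(spiky, window=3)
```

Here the spike is removed entirely, whereas a mean filter of the same width would spread it over the neighboring samples.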

Mean and median filtering can both improve signal quality. However, their applicability depends on the specific application and on the statistical characteristics of the noise. In some cases these methods improve accuracy, while in others they may degrade signal details, so the choice should be evaluated for the specific problem. Generally, mean filtering is used when the signal follows a Gaussian distribution. If the signal does not follow a Gaussian distribution and the preservation of signal edges and detail features is a concern, median filtering is more suitable.

Neural Network Model

ShuffleNet Model

ShuffleNet is a lightweight convolutional neural network model proposed by Megvii in 2018. Its main features are group convolution, channel shuffle, and depthwise separable convolution. Group convolution divides the input channels into multiple groups and performs an independent convolution on each group, which reduces computational complexity and the number of model parameters. Channel shuffle rearranges the output channels of a group convolution to achieve cross-group information fusion and reduce information bottlenecks, as shown in Fig. 1. Depthwise separable convolution is a lightweight operation that splits a standard convolution into a depthwise convolution and a pointwise convolution, further reducing computational complexity and the number of parameters. Compared to the MobileNet [33] architecture, ShuffleNet has fewer parameters, lower computational cost, and higher accuracy. ShuffleNet is mainly composed of stacked ShuffleNet units, each of which combines channel shuffle with the residual principle of the ResNet [34] model, as illustrated in Fig. 2.

Fig. 1 Channel Shuffle structure

Fig. 2 ShuffleNet unit structure
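
The channel shuffle operation can be sketched with a reshape, a transpose, and a second reshape. The layout `(channels, length)` for a one-dimensional feature map and the function name `channel_shuffle` are assumptions made for this illustration.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups. x has shape (channels, length)."""
    channels, length = x.shape
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    # (groups, per_group, length) -> (per_group, groups, length) -> (channels, length)
    x = x.reshape(groups, per_group, length)
    x = x.transpose(1, 0, 2)
    return x.reshape(channels, length)

# Each channel holds its own index so the new order is easy to read off.
features = np.arange(6).reshape(6, 1)
shuffled = channel_shuffle(features, groups=2)
```

With 6 channels in 2 groups, the groups (0, 1, 2) and (3, 4, 5) are interleaved, so each group in the next group convolution receives channels from both original groups.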

GhostNet Model

The GhostNet model is a lightweight convolutional neural network model proposed by Huawei Noah’s Ark Lab in 2019. Deep convolutional neural networks [35, 36] typically consist of many convolutions, resulting in a significant increase in computational costs. In contrast, the GhostNet model has a relatively simple network structure, enabling faster training and inference speeds with smaller computational and parameter sizes while achieving higher accuracy. Its distinguishing feature is the introduction of the Ghost Module structure.

The Ghost Module is a lightweight convolutional module proposed as a replacement for ordinary convolution. It first applies an ordinary convolution with a reduced number of kernels to produce a small set of intrinsic feature maps, and then applies cheap linear operations (depthwise convolutions in the original design [32]) to these maps to generate additional “ghost” feature maps. The intrinsic and ghost feature maps are concatenated to form the output of the module, which reduces the computational cost compared with producing all feature maps by ordinary convolution; the structure is shown in Fig. 3.

Fig. 3 Ghost Module structure
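
The following NumPy sketch illustrates the Ghost Module idea as described in the GhostNet reference [32] for a one-dimensional feature map: a primary pointwise convolution produces intrinsic feature maps, a cheap depthwise convolution derives one ghost map from each of them, and the two sets are concatenated. The function name, the `(channels, length)` layout, the zero padding, and the one-ghost-per-intrinsic-map ratio are assumptions for illustration, not the SGNet implementation.

```python
import numpy as np

def ghost_module(x, primary_w, cheap_w):
    """x: (c_in, length); primary_w: (m, c_in); cheap_w: (m, k) with k odd."""
    # Primary pointwise convolution: m intrinsic feature maps.
    intrinsic = primary_w @ x
    k = cheap_w.shape[1]
    # Zero-pad along the length axis so each ghost map keeps the input length.
    padded = np.pad(intrinsic, ((0, 0), (k // 2, k // 2)), mode="constant")
    # Cheap operation: one depthwise filter per intrinsic map.
    # np.convolve flips its kernel, so reverse it to get cross-correlation.
    ghost = np.stack([
        np.convolve(padded[c], cheap_w[c][::-1], mode="valid")
        for c in range(intrinsic.shape[0])
    ])
    return np.concatenate([intrinsic, ghost], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
primary_w = rng.normal(size=(3, 4))
identity_w = np.tile([0.0, 1.0, 0.0], (3, 1))  # depthwise filters that copy their input
out = ghost_module(x, primary_w, identity_w)
```

In this example, producing 6 output maps takes 3 × 4 pointwise weights plus 3 × 3 depthwise weights, instead of the 6 × 4 weights of a pointwise convolution that generates all 6 maps directly.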

GhostNet comprises multiple Ghost Bottlenecks, each formed by stacking Ghost Modules. When the stride is 1, a Ghost Bottleneck consists of two stacked Ghost Modules with a residual connection. The first Ghost Module acts as an expansion layer that increases the number of channels, while the second reduces the number of channels to match the residual connection, as shown in Fig. 4(a). When the stride is 2, a 3 × 3 depthwise separable convolution is additionally inserted between the two Ghost Modules of the Fig. 4(a) configuration, as illustrated in Fig. 4(b).

Fig. 4 Ghost Bottleneck structure

SGNet Model

Both ShuffleNet and GhostNet are designed to be lightweight and efficient while maintaining accuracy, so combining their advantages can yield a better model. Their characteristics complement each other and provide more comprehensive feature extraction and representation capabilities. ShuffleNet’s channel shuffle and group convolution operations help capture spatial information and feature correlations, while GhostNet’s ghost channels improve parameter efficiency and feature utilization. Combining the two therefore provides powerful and efficient feature extraction and representation while keeping the model lightweight. The new model is named SGNet (ShuffleNet and GhostNet-based Network), and its structure is shown in Fig. 5.

Fig. 5 SGNet model

Using large convolution kernels and strides increases the receptive field and captures a wider range of features in the input signal, which helps extract vibration features relevant to damage. It also reduces the size of the output feature maps, which downsamples the signal and lowers computational cost and data dimensionality. Therefore, the initial layers of the SGNet model use large convolution kernels and strides. The first convolutional layer has a kernel size of 64 and a stride of 8 and is followed by a Dropout layer. The second convolutional layer has a kernel size of 32 and a stride of 4. Batch Normalization (BN) and max-pooling layers are applied after each convolutional layer. The second max-pooling layer is followed by a structure in which 3 Ghost Bottlenecks and 3 ShuffleNet units alternate. In the latter part of the model, three fully connected layers produce the output. The number of outputs, denoted as n, can be modified according to the number of classification categories. Dropout is added between the fully connected layers to prevent overfitting.
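
The downsampling effect of the two front layers can be checked with simple arithmetic. The kernel sizes and strides below come from the text, and the input length of 1024 matches the sliding window size used later in the paper. The "same" padding and the pooling size and stride of 2 are assumptions, since the text does not state them.

```python
def conv_output_length(length, kernel, stride, padding="same"):
    """Output length of a 1D convolution or pooling layer."""
    if padding == "same":
        return -(-length // stride)            # ceil(length / stride)
    if padding == "valid":
        return (length - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

length = 1024                                               # one input sample
after_conv1 = conv_output_length(length, kernel=64, stride=8)
after_pool1 = conv_output_length(after_conv1, kernel=2, stride=2)   # assumed pool size
after_conv2 = conv_output_length(after_pool1, kernel=32, stride=4)
after_pool2 = conv_output_length(after_conv2, kernel=2, stride=2)   # assumed pool size
```

Under these assumptions the sequence length shrinks from 1024 to 128, 64, 16, and finally 8 before the Ghost Bottlenecks and ShuffleNet units, which is where most of the computational saving in the front end comes from.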

Damage Diagnosis Process

The damage diagnosis process mainly consists of three parts: data processing, model building and training, and model testing, as illustrated in Fig. 6.

Fig. 6 Damage diagnosis process

(1) Data processing

Data processing consists of several steps: data acquisition, mean filtering, data augmentation, data normalization, and data partition. First, the vibration signals obtained from the acceleration sensors under different conditions were denoted as Sij (where i represents the condition and j represents the sensor position), as shown in formula (3). The acquired vibration signals then underwent mean filtering, and the filtered data were organized and stored by sensor ID in data storage files.

$${S}_{ij}=\left[\begin{array}{cccc}{S}_{11}& {S}_{12}& \cdots & {S}_{1j}\\ {}{S}_{21}& {S}_{22}& \cdots & {S}_{2j}\\ {}\vdots & \vdots & \vdots & \vdots \\ {}{S}_{i1}& {S}_{i2}& \cdots & {S}_{ij}\end{array}\right]$$
(3)

Model training requires a large amount of data, but the acquired data samples are limited, so data augmentation needs to be performed to increase the number of samples. Data augmentation helps avoid overfitting caused by a small dataset and allows the model to learn the data distribution better, thereby enhancing its generalization ability. Sliding window overlapping sampling is a commonly used data augmentation method. The augmented data were denoted as SAij, as shown in formula (4). Sliding window overlapping sampling is described by formula (5), in which L is the length of the vibration signal, W is the sliding window size, S is the step size of the sliding window, and N is the resulting number of samples.

$$S{A}_{ij}=\left[\begin{array}{cccc}S{A}_{11}& S{A}_{12}& \cdots & S{A}_{1j}\\ {}S{A}_{21}& S{A}_{22}& \cdots & S{A}_{2j}\\ {}\vdots & \vdots & \vdots & \vdots \\ {}S{A}_{i1}& S{A}_{i2}& \cdots & S{A}_{ij}\end{array}\right]$$
(4)
$$N=\frac{L-W}{S}+1$$
(5)
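
Sliding window overlapping sampling can be sketched as follows. The function name `sliding_windows` and the toy record are assumptions for illustration; the number of windows follows formula (5) with the quotient rounded down so that every window lies fully inside the record.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Cut a 1D record into overlapping segments of length `window`."""
    signal = np.asarray(signal)
    count = (len(signal) - window) // step + 1   # formula (5), rounded down
    if count <= 0:
        raise ValueError("record is shorter than the window")
    starts = np.arange(count) * step
    return np.stack([signal[s:s + window] for s in starts])

record = np.arange(20)                 # toy record with L = 20
samples = sliding_windows(record, window=8, step=4)
```

With L = 20, W = 8, and S = 4, this yields 4 samples, and consecutive samples share W − S = 4 points.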

Data normalization scales all feature information within a specified range, which is beneficial for the convolutional neural network to extract features from signals, thus improving the algorithm’s stability, convergence speed and accuracy. The normalized data was represented as SANij, as shown in formula (6). After normalization, the data was divided into training dataset, validation dataset and testing dataset based on the corresponding proportions.

$$SA{N}_{ij}=\left[\begin{array}{cccc} SA{N}_{11}& SA{N}_{12}& \cdots & SA{N}_{1j}\\ {} SA{N}_{21}& SA{N}_{22}& \cdots & SA{N}_{2j}\\ {}\vdots & \vdots & \vdots & \vdots \\ {} SA{N}_{i1}& SA{N}_{i2}& \cdots & SA{N}_{ij}\end{array}\right]$$
(6)
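
A possible form of the normalization and partition steps is sketched below. The paper does not state the normalization formula or the split proportions, so the per-sample min-max scaling to [0, 1], the 70/15/15 split, and the function names are all assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale each sample (row) to [0, 1]; constant rows map to 0."""
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=1, keepdims=True)
    hi = samples.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (samples - lo) / span

def split_dataset(samples, labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and split into (train, validation, test) pairs; proportions are assumed."""
    order = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(round(fractions[0] * len(samples)))
    n_val = int(round(fractions[1] * len(samples)))
    parts = np.split(order, [n_train, n_train + n_val])
    return tuple((samples[idx], labels[idx]) for idx in parts)

demo = np.array([[0.0, 5.0, 10.0], [2.0, 2.0, 2.0]])
normalized = min_max_normalize(demo)

samples = np.arange(40, dtype=float).reshape(20, 2)
labels = np.arange(20) % 2
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_dataset(samples, labels)
```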

(2) Model building and training

For model training, the model structure needs to be constructed first. The number of required models was determined by the number of sensors used for detection. During training, each model performed forward propagation to obtain the output and calculate the loss. It then performed backward propagation to compute gradients and update its parameters. As the number of training epochs increased, the accuracy of the model improved and the loss decreased, until the model reached the set number of training epochs.

(3) Model testing

After SGNet j reached the designated number of training epochs, the testing dataset of sensor j was input into the model. The model parameters must be modified and retrained if the output accuracy does not meet the requirements. If the output accuracy meets the requirements, the model can be saved.

Experimental Study

This section covers the experimental object, experimental data, damage degree diagnosis experiment, damage type diagnosis experiment, model comparison experiment, and discussion. The deep learning framework used in the experiments is TensorFlow 2.6.1, and the computer has an Intel Core i7-10750H CPU and an NVIDIA GTX 1660 Ti GPU.

Experimental Object

The experimental object used in this study is a four-story frame structure constructed by Columbia University [37]. Its 3D model is shown in Fig. 7(a). The base of the frame structure measures 2.5 m × 2.5 m, and the structure is 3.6 m high. The structure has four faces (east, south, west, and north), each composed of beams and columns of the same size. Components at corresponding positions on different faces are labeled with the same number (e.g., east 1, north 1) in the figure. A total of 15 acceleration sensors were placed on the frame structure. Each floor junction had 3 acceleration sensors (1 on the west, 1 on the east, and 1 near the central column). Acceleration sensors 1–3 were placed at ground level, while the rest were positioned at the corresponding locations at the top of each floor. The positions of the sensors are shown in Fig. 7(b).

Fig. 7 Frame structure and sensor position [37]

By removing or loosening the diagonal bracing and bolt connections numbered 1–12 in Fig. 7(a), nine different damage cases were simulated in the experiment, and the specific operations for each case are shown in Table 1. Since the overall damage degree of the frame structure varies in these nine cases, there are significant differences between the collected vibration signal data. Therefore, the differences in the data can be used to distinguish the damage cases of the frame structure.

Table 1 Specific damage cases

Experimental Data

In the experiments, the frame structure was damaged sequentially according to Table 1, and a 200 Hz impact was applied to the frame structure to acquire vibration signals. The vibration signals were then processed using the data processing procedure in the “Damage Diagnosis Process” section.

From the publicly available data provided by Columbia University, 135 data records were obtained from the 15 sensors under the nine damage cases. Both mean filtering and median filtering were used to process the data. The results for the data from sensor 13 in case 1 are shown in Fig. 8. The figure shows that mean filtering makes the data edges smoother, whereas median filtering makes the data less dense and leaves prominent edges.

Fig. 8 Signal filtering results

To select an appropriate filtering method, it is necessary to determine whether the data conform to the Gaussian distribution. Taking the data from sensor 5 under case 1 as an example, the histogram and Quantile-Quantile (Q-Q) plot of the data are shown in Fig. 9, together with the skewness and kurtosis. The histogram is approximately bell-shaped, the data points in the Q-Q plot lie close to the straight line, the skewness of the signal is 0.007, and the kurtosis is 3.49, close to the Gaussian value of 3; this indicates that the signal approximately follows the Gaussian distribution. As mentioned in the “Signal Filtering Method” section, mean filtering is more suitable when the data follow the Gaussian distribution. Therefore, based on the analysis of Figs. 8 and 9, mean filtering was selected to process the data in the experiments.
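
The skewness and kurtosis used in this check can be computed from the central moments of the signal. The sketch below uses the non-excess kurtosis, for which a Gaussian signal gives a value near 3; the function name and the synthetic test signal are assumptions for illustration and are not the benchmark data.

```python
import numpy as np

def skewness_and_kurtosis(x):
    """Moment-based skewness and non-excess kurtosis (Gaussian: 0 and 3)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / variance ** 1.5
    kurt = np.mean(centered ** 4) / variance ** 2
    return skew, kurt

# Synthetic Gaussian samples stand in for a measured vibration record.
rng = np.random.default_rng(0)
skew, kurt = skewness_and_kurtosis(rng.normal(size=100_000))
```

Values of skewness near 0 and kurtosis near 3 support the Gaussian assumption, which is the criterion applied to the sensor data above.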

Fig. 9 Signal histogram and Q-Q plot

The length of the collected data varies between cases: 24,000 for cases 1–5, 60,000 for case 6, and 72,000 for cases 7–9. Different sliding step sizes were therefore used so that data augmentation produced a similar number of samples for each case. The sliding window size W was set to 1024 for all cases, and the sliding step sizes S for cases 1–5, case 6, and cases 7–9 were set to 4, 10, and 12, respectively. After data augmentation, the numbers of samples N for cases 1–5, case 6, and cases 7–9 were 5744, 5898, and 5915, respectively.
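
The reported sample counts can be checked against formula (5) with the stated values of L, W, and S. The second count below, which only accepts window start positions strictly below L − W, is an assumed implementation convention that is not stated in the paper; it is included because it reproduces all three reported values.

```python
import math

W = 1024
settings = {
    "cases 1-5": (24_000, 4),
    "case 6": (60_000, 10),
    "cases 7-9": (72_000, 12),
}

counts = {}
for name, (L, S) in settings.items():
    by_formula = (L - W) // S + 1                 # formula (5), quotient rounded down
    starts_below_end = math.ceil((L - W) / S)     # assumed: starts 0, S, 2S, ... < L - W
    counts[name] = (by_formula, starts_below_end)
    print(name, by_formula, starts_below_end)
```

Formula (5) with rounding down gives 5745, 5898, and 5915, which matches the reported values for case 6 and cases 7–9 and exceeds the reported 5744 for cases 1–5 by one. The assumed convention gives 5744, 5898, and 5915.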

Damage Degree Diagnosis Experiment

Since case 1 and case 9 represent the undamaged and damaged conditions of the frame structure, respectively, the data from sensor j under case 1 and case 9 (SAN1j and SAN9j) were used for binary classification training of the SGNet j model (j = 1–15). The SGNet j models that reached the required accuracy were saved. The data from each sensor under the different cases were then input into the corresponding model; that is, for each case i and sensor j, the data were input into model j to obtain the diagnosis result. Based on these results, the Pod (probability of damage) was calculated according to Fig. 10. Finally, the average probability Podavg over all sensors under case i was taken to represent the damage degree of the frame structure under case i. The larger Podavg is, the more severe the overall damage to the frame structure.

Fig. 10 The calculation process of Pod
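
Since the calculation in Fig. 10 is not reproduced in the text, the sketch below shows one plausible reading: Pod for a sensor is taken as the fraction of its samples that the binary model labels as damaged, and Podavg is the mean of Pod over the sensors of a case. Both the definition of Pod and the made-up predictions are assumptions; only the averaging over sensors is stated in the paper.

```python
import numpy as np

def probability_of_damage(predicted_labels):
    """Assumed definition: share of samples predicted as damaged (label 1)."""
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(predicted_labels == 1))

def average_pod(pod_per_sensor):
    """Podavg: mean Pod over all sensors of one case."""
    return float(np.mean(pod_per_sensor))

# Made-up predictions for two sensors under one case (0 = undamaged, 1 = damaged).
predictions = {1: [0, 0, 1, 0], 2: [1, 1, 1, 0]}
pods = [probability_of_damage(p) for p in predictions.values()]
pod_avg = average_pod(pods)
```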

Training Results Analysis

Because of the large number of sensors used in the experiments, this section focuses on the training results for data collected from sensor 13. To study the impact of different parameters on the model, convolution kernels of different sizes and strides were used for training. The Adam optimizer and the cross-entropy loss function were employed during training. The experimental results are presented in Fig. 11, in which the notation “First 32-8 Second 16-4” signifies that the first convolutional layer used a kernel size of 32 and a stride of 8, while the second convolutional layer used a kernel size of 16 and a stride of 4.

Fig. 11 The influence of different convolution kernels and strides on the model

It can be seen from Fig. 11 that, under the “First 64-8 Second 32-4” configuration, the validation accuracy exhibits the most significant improvement, the loss function decreases rapidly, and the training process remains stable. Conversely, under the “First 128-16 Second 64-8” configuration, there is substantial fluctuation in the training process. Excessive kernel sizes and strides shrink the feature maps too quickly, causing the model to ignore important information in the data. Consequently, the “First 64-8 Second 32-4” configuration was selected based on a comprehensive evaluation of the model’s performance on the validation dataset.

With the parameters selected above, the training process of model SGNet 13 over 30 epochs is shown in Fig. 12. The model starts to converge when the training epoch reaches 5. The accuracy on the training and validation datasets exceeds 99%, and the loss is below 1 × 10⁻⁴. When the training epoch reaches 25, the loss on the training dataset stabilizes around 5 × 10⁻⁴, and the loss on the validation dataset remains close to 0.

Fig. 12 Training results of damage degree

Confusion Matrix Analysis

The confusion matrix, also known as the error matrix, is a method used to evaluate the performance of a model. It shows the relationship between the model’s predicted and true labels. By analyzing the confusion matrix, classification indicators such as accuracy, recall, precision, and F1 score can be calculated, which can comprehensively evaluate the performance and error types of the classification model, help understand the classification performance of the model, and further adjust and optimize the model as needed.

The confusion matrices of the training, validation, and testing datasets for model SGNet 13 during different training epochs are shown in Fig. 13. By calculating the data in the figure, it can be seen that when the training epoch reaches 4, the training, validation, and testing dataset’s accuracy is approximately 86%. The recall is 0.787, the precision is 1, and the F1 score is 0.881. When the training epoch reaches 30, the accuracy, recall, precision and F1 score for all three datasets reach a perfect score of 1. The model can accurately distinguish between undamaged and damaged data.
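
The indicators above follow directly from the four entries of a binary confusion matrix. In the sketch below, the counts are made up (they are not read from Fig. 13) and are chosen only so that the precision is 1 and the recall is 0.787, as reported for epoch 4; the function name is also an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Illustrative counts only: no false positives, 787 of 1000 damaged samples detected.
accuracy, precision, recall, f1 = binary_metrics(tp=787, fp=0, fn=213, tn=1000)
```

With a precision of 1 and a recall of 0.787, the F1 score is 2 × 0.787 / 1.787 ≈ 0.881, consistent with the reported value. The accuracy computed from these made-up counts depends on the assumed class balance and is not meant to match the reported figure.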

Fig. 13 Confusion matrix of damage degree experiment

Damage Degree Diagnosis Results

Based on the damage probability calculation process shown in Fig. 10, the Podij and Podavg for 15 sensors under 9 different damage cases were recorded in the experiment. The data was presented in Table 2 and visualized in Fig. 14.

Table 2 Podij and Podavg for different sensors and cases

Fig. 14 Podij for different sensors and cases

In Case 1, the Pod for all 15 sensors is below 1%. In Case 2, the Pod of each sensor is below 10%. In Case 3, the Pod of sensor 2 is the highest, at 99.48%. In Case 4, the Pod values of sensors 2 and 4 are above 80%, while the rest are below 65%. In Cases 5–6, the highest Pod among sensors 4, 6, 9, 12, and 15 reaches 94%. In Cases 7–9, the highest Pod among sensors 4–15 reaches 100%. When data from sensors 5, 8, 11, and 14 at the center column are used for damage diagnosis, the Pod does not change much across cases 1–5. As the degree of damage to the frame structure increases, Podavg also increases gradually, and the Podavg values of the different cases differ from one another. Therefore, the overall damage degree of the frame structure can be judged from the size of Podavg. The proposed SGNet model performs well in damage degree diagnosis and can be used to determine the damage degree of a frame structure and to further identify the damaged locations.

Damage Type Diagnosis Experiment

The purpose of the damage type diagnosis experiment is to train the model for multi-class classification, check whether the model can identify the damage case to which a sensor’s data belongs, and test the model’s classification ability. In this experiment, a new SGNet model was constructed, and data from each sensor under the 9 different cases were used for training to obtain a 9-class classification result. On this basis, the performance of the model was tested through the training results, and the damage types of the frame structure were obtained.

Training Results Analysis

Taking the data collected by sensor 13 as an example, the training results are shown in Fig. 15. The figure shows that when the training epoch reaches 3, the accuracy on the training dataset is above 70%, while the accuracy on the validation dataset is 84%; the loss on the training dataset is around 0.7, and the loss on the validation dataset is 0.3. After the training epoch reaches 5, the training curves begin to converge; the accuracy exceeds 90%, and the loss falls below 0.1. When the training epoch reaches 26, the accuracy on the training and validation datasets is higher than 99%, and the loss is around 0.02. These results show that the SGNet model converges quickly and achieves high accuracy and low loss during training.

Fig. 15 Training results of damage type experiment

In the initial stages of training, the model’s parameters typically start in a randomly initialized state. This leads to some degree of variability in training results at the outset, where the loss function and accuracy may exhibit instability. This is because the model needs to adapt to the data and gradually adjust its parameters for better fitting.

As the number of training epochs increases, the model gradually converges towards a state closer to the optimal solution. This is manifested in an incremental improvement in accuracy and a gradual decrease in the loss function. The model fine-tunes its parameters over time through the optimization algorithm to fit the training data more effectively.

When a certain stage is reached with an adequate number of training epochs, the model’s parameters have essentially found the optimal solution. Consequently, the accuracy becomes very high, and the loss function is extremely low. This indicates that the model has become highly stable at this point and can perform tasks with a high degree of accuracy.

Confusion Matrix Analysis

Taking the data collected by sensor 13 as an example, the confusion matrices are shown in Fig. 16. In the matrices, labels 1 to 9 represent the 9 damage cases. When the training epoch reaches 10, the accuracy on the training, validation, and testing datasets is 99%, 98.9%, and 99%, respectively; when the training epoch reaches 30, the accuracy on all three datasets reaches 100%. These results indicate that the model has strong classification ability and achieves satisfactory classification results.

Fig. 16 Confusion matrix of damage type experiment
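The accuracies quoted above can be read off a confusion matrix as the sum of its diagonal divided by the total number of samples. The following is a minimal Python sketch of that calculation; the function names and the 3-class matrix are hypothetical illustrations and are not the data of Fig. 16.

```python
def overall_accuracy(cm):
    """Fraction of samples lying on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recall(cm):
    """Recall of each true class: diagonal entry divided by its row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical 3-class example (rows: true label, columns: predicted label).
example_cm = [
    [48, 2, 0],
    [1, 47, 2],
    [0, 0, 50],
]
print(f"accuracy = {overall_accuracy(example_cm):.3f}")
print("recall   =", [round(r, 2) for r in per_class_recall(example_cm)])
```

For the 9 damage cases, the same functions would apply to a 9 × 9 matrix; the per-class recall shows which cases are still confused when the overall accuracy is below 100%.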

T-SNE Visualization

T-SNE (T-Distributed Stochastic Neighbor Embedding) is a machine learning algorithm for dimensionality reduction and visualization of high-dimensional data. It maps high-dimensional data to a two-dimensional or three-dimensional space, which makes it possible to display the structure and relationships of the data and to reveal patterns and clusters. T-SNE is of great value in exploratory data analysis and visualization, as it can capture nonlinear relationships while preserving local structures, making the results easy to observe and understand.

Taking the data collected by sensor 13 as an example, the data obtained during the training process were reduced to lower dimensions by using T-SNE. The resulting visualizations of the original data, the data at training epoch 10, and the data at training epoch 30 are shown in Fig. 17.

Fig. 17 T-SNE visualization
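As an illustration of the dimensionality reduction step, the sketch below projects labelled feature vectors to two dimensions with the `TSNE` class from scikit-learn. It is only a sketch under stated assumptions: the features are synthetic random clusters standing in for network outputs, and the feature dimension, sample count, and perplexity are arbitrary choices rather than settings reported in this paper.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted features: 9 classes, 20 samples each, 32-D.
n_classes, n_per_class, n_features = 9, 20, 32
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
features = np.vstack(
    [c + rng.normal(size=(n_per_class, n_features)) for c in centers]
)
labels = np.repeat(np.arange(1, n_classes + 1), n_per_class)

# Map to 2-D; the perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(features)
print(embedding.shape)
```

Each row of `embedding` can then be drawn as a point colored by its entry in `labels`, which gives scatter plots of the kind shown in Fig. 17.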

It can be seen from Fig. 17 that the original data of the 9 damage cases are relatively scattered. At epoch 10, the different cases can be distinguished to some extent, although case 7 and case 9 still show some dispersion at their edges. At epoch 30, the model completely separates the different data labels.

Model Comparison Experiment

Model Parameter Quantity

The number of parameters in a model influences both the hardware required for diagnosis and the computational cost. As shown in Table 3, MobileNet V1 has approximately 4.2 million parameters, GhostNet about 5.2 million, ShuffleNet 0.5x about 1.8 million, and ShuffleNet 1.0x about 2.3 million, whereas SGNet has only about 1 million. SGNet is also flexible and can be adjusted and optimized for specific situations. In terms of total parameter count, SGNet is therefore smaller and computationally cheaper than the other classical models.

Table 3 Comparison results of model parameter quantity
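To make the comparison concrete, the short Python sketch below converts the approximate parameter counts quoted in the text into relative reductions. The counts are the rounded figures from the paragraph above; the dictionary and function names are hypothetical.

```python
# Approximate parameter counts in millions, as quoted in the text.
params_m = {
    "MobileNet V1": 4.2,
    "GhostNet": 5.2,
    "ShuffleNet 0.5x": 1.8,
    "ShuffleNet 1.0x": 2.3,
    "SGNet": 1.0,
}


def reduction_vs(baseline, model="SGNet"):
    """Percentage by which `model` has fewer parameters than `baseline`."""
    return 100.0 * (1.0 - params_m[model] / params_m[baseline])


for name in params_m:
    if name != "SGNet":
        print(f"SGNet vs {name}: {reduction_vs(name):.1f}% fewer parameters")
```

With these rounded figures, SGNet has roughly 44% fewer parameters than the smallest baseline (ShuffleNet 0.5x) and roughly 81% fewer than the largest (GhostNet).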

Testing Dataset Accuracy

To verify the superiority of the SGNet model, this section compares the testing dataset accuracy of SGNet, MobileNet, GhostNet, and ShuffleNet under the same experimental conditions. The experimental data were taken from the damage type diagnosis experiment. Each model was trained for a fixed number of epochs, and the testing accuracy of the four models at training epochs of 1, 3, 5, 10, 15, 20, 25, and 30 was recorded together with the average accuracy. The results are shown in Table 4, and the data in Table 4 are visualized in Figs. 18 and 19.

Table 4 Comparison results of testing dataset accuracy (%)

Fig. 18 Testing dataset accuracy (bar chart)

Fig. 19 Testing dataset accuracy (stacked bar chart)

The experimental results in Table 4 show the following.

  • At training epoch 1, the accuracy of SGNet is 15.1%, the accuracy of MobileNet and GhostNet is 10.9%, and the accuracy of ShuffleNet is 10.1%.

  • At training epoch 3, the accuracy of SGNet improves to 61.7%, whereas the accuracy of MobileNet reaches 18.8% and the accuracy of the other two models is lower than 11%.

  • At training epoch 20, the accuracy of SGNet is 97.9%, while the accuracy of the other three models is lower than 90%.

  • Finally, at training epoch 30, the accuracy of SGNet reaches 99.8%.

In terms of average accuracy, SGNet stands out, reaching 78.61%. This is higher than the average accuracies of MobileNet and ShuffleNet by 11.38 and 16.2 percentage points, respectively, while the average accuracy of GhostNet is lower than 55%. Overall, SGNet consistently outperforms the other three models in terms of accuracy and exhibits faster convergence.
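The stated gaps can be turned back into the implied average accuracies of the baseline models. The Python sketch below does this arithmetic under the assumption that the gaps are absolute differences in percentage points rather than relative differences; the variable names are hypothetical.

```python
sgnet_avg = 78.61       # SGNet average testing accuracy (%), from the text
gap_mobilenet = 11.38   # stated gap to MobileNet, read as percentage points
gap_shufflenet = 16.2   # stated gap to ShuffleNet, read as percentage points

# Implied baseline averages under the percentage-point reading.
mobilenet_avg = round(sgnet_avg - gap_mobilenet, 2)
shufflenet_avg = round(sgnet_avg - gap_shufflenet, 2)
print(mobilenet_avg, shufflenet_avg)
```

Under this reading, the averages of MobileNet and ShuffleNet are about 67.2% and 62.4%, which can be checked against the last column of Table 4.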

Comparison of the Accuracy of Other Methods

Ren et al. used their proposed BICNN model [38] to analyze data from 12 acceleration sensors under nine different damage cases, obtaining a set of samples suitable for structural damage localization and diagnosis of the frame structure. To ensure the reliability of the selected samples, experiments were conducted with several models, including 1DCNN, WDCNN [39], TICNN [40], ITICNN [41], and BICNN. The results are shown in Table 5, where a result with an accuracy exceeding 95% is defined as “excellent.” Among these models, 1DCNN had no excellent results, while WDCNN, TICNN, ITICNN, and BICNN each had five. In contrast, the SGNet model had 10 excellent results, demonstrating higher accuracy than the other models.

Table 5 Comparison results of the accuracy of other methods
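The "excellent" count used in this comparison is a simple threshold tally over the per-sensor accuracies of each model. The following Python sketch shows the rule; the accuracy lists are placeholders for illustration only and are not the values of Table 5, and the model names are hypothetical.

```python
EXCELLENT_THRESHOLD = 95.0  # accuracy in percent, as defined in the text


def count_excellent(accuracies, threshold=EXCELLENT_THRESHOLD):
    """Number of entries whose accuracy strictly exceeds the threshold."""
    return sum(1 for acc in accuracies if acc > threshold)


# Placeholder per-sensor accuracies (%); NOT taken from Table 5.
placeholder = {
    "model_a": [91.0, 96.5, 88.2, 99.1],
    "model_b": [97.3, 98.0, 95.0, 99.6],
}
for name, accs in placeholder.items():
    print(name, count_excellent(accs))
```

Because the definition is "exceeding 95%", an accuracy of exactly 95.0% is not counted in this sketch.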

Based on the data from Table 5, it can be observed that the model’s accuracy is relatively low at specific locations in the frame structure, namely at locations 5, 8, 11, and 14 in Fig. 7. This indicates that the model faces certain difficulties in damage diagnosis at the central pillars of the frame structure. This may be attributed to the fact that central pillars typically bear greater structural loads and stresses, making the detection and classification of damage more complex. Some types of damage may also be harder for sensors to detect, leading to potentially higher levels of noise in the data from these locations, which affects the model’s accuracy.

To improve the model’s accuracy at these locations, it may be necessary to obtain more training data, especially for damage cases related to central pillars. In addition, the sensor layout could be improved to provide more reliable data. Overcoming these challenges will help improve the accuracy of damage diagnosis for the central pillars of frame structures.

Discussion

The SGNet model is an improved model based on ShuffleNet and GhostNet. It aims to reduce computational cost through a series of lightweight adjustments, including reducing the number of model parameters, decreasing the network depth, and adopting a more efficient network architecture. Through carefully designed convolution kernels and step sizes, as well as parameter optimization, SGNet achieves a better balance between performance and computational cost while maintaining high accuracy. Experimental results show that the highest accuracy of the model proposed in this article is 99.8% with only about 1 million parameters, and that its performance is superior to that of the other models.

However, certain limitations should be noted. (1) The method may be sensitive to the quality and placement of sensors, which can affect the accuracy of the diagnosis results. (2) It has been applied only to data from frame structures and may not be applicable to other structural types. (3) The proposed model requires a large number of data samples for training and may not be suitable for situations with fewer data samples. (4) The model was evaluated on the frame structure dataset provided by Columbia University, which may not cover all real-world application scenarios.

In future work, the authors plan to collect other publicly available structural datasets and apply transformations to the Columbia University datasets, including the introduction of noise with different signal-to-noise ratios to conduct research on damage diagnosis in noisy environments. In addition, the authors will explore alternative methods for structural damage diagnosis in cases with limited data samples, addressing the issue of lower accuracy when data is scarce.
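One common way to introduce noise at a prescribed signal-to-noise ratio is to scale zero-mean Gaussian noise so that the ratio of signal power to noise power matches the target in decibels. The Python sketch below illustrates this idea; it is not the procedure of this paper, and the function names, the sine-wave test signal, and the 200 Hz sampling rate are assumptions made only for the example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return the signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """SNR in dB estimated from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic stand-in for an acceleration record: a 5 Hz sine sampled at 200 Hz.
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(4000)]
for target in (20, 10, 0):
    noisy = add_white_noise(clean, target, seed=0)
    print(target, round(measured_snr_db(clean, noisy), 2))
```

The measured value fluctuates slightly around the target because the noise is a finite random sample; the signal power here is the mean square of the record without removing its mean.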

Conclusion

To diagnose damage in frame structures more efficiently while reducing computational cost, this paper introduced SGNet, a lightweight model suitable for mobile devices. The model improves on ShuffleNet and GhostNet: it stacks ShuffleNet and GhostNet modules and uses a carefully designed convolution kernel and step size in its first layer. In addition, a data preprocessing method employing mean filtering was successfully applied to the damage diagnosis of frame structures.
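For reference, mean filtering in its simplest form replaces each sample with the average of its neighbors inside a sliding window. The Python sketch below shows a centered moving average; the window length and the handling of the two ends of the record are assumptions for illustration and may differ from the preprocessing used in this paper.

```python
def mean_filter(signal, window=5):
    """Centered moving average; the window shrinks at the two ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical record with a single spike at index 4.
raw = [0.0, 1.0, 0.0, 1.0, 10.0, 1.0, 0.0, 1.0, 0.0]
print(mean_filter(raw, window=3))
```

A wider window suppresses more noise but also flattens short transients, so the window length has to be chosen with the sampling rate and the damage-related frequency content in mind.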

To evaluate the performance of the proposed method, a frame structure at Columbia University was selected as the experimental subject. Multiple damage diagnosis experiments were conducted by using the SGNet model, and the proposed model was compared with MobileNet, GhostNet, and ShuffleNet under the same conditions. The following conclusions were drawn from the experimental research.

  (1) The experimental results of damage degree diagnosis indicated that the proposed SGNet model has the characteristics of fast convergence and high accuracy. In addition, the proposed SGNet model performed well in different damage cases and could determine the overall damage degree of the frame structure based on Podavg.

  (2) The SGNet model exhibited strong multi-classification capabilities in the damage type diagnosis experiment. The accuracy of the testing dataset reached 99% when the training epoch reached 10. The proposed model could quickly diagnose damage types in damage cases of frame structures.

  (3) In model comparison experiments, the SGNet model had the fewest parameters and the lowest computational cost compared to other models. Its accuracy on the testing dataset outperformed other models at different training epochs, and its average accuracy was the highest. The SGNet model also demonstrated the fastest convergence speed.

In summary, the experimental results of this paper demonstrated the superiority of the SGNet model over the compared models for structural damage diagnosis of frame structures. They also highlighted the potential applicability of this model to mobile applications. Furthermore, the proposed preprocessing method provides a useful reference for similar research tasks.