1 Introduction

Earthquakes can cause varying degrees of damage to buildings, leading to numerous casualties and significant economic losses. Coordinated and effective post-disaster emergency response can reduce casualties, minimize economic losses, and quickly restore urban social functions. All of this, however, depends on an accurate and timely assessment of the structure's damage level, since an incorrect or delayed assessment could cause further losses.

With the improvement of computer hardware performance and the popularization of artificial intelligence technology, neural networks have brought new possibilities for the nonlinear seismic response assessment of structures (Xu and Chen 2021). Deep learning simulates the information-processing mechanism of the human brain and establishes models composed of large numbers of neurons and connections, so that complex pattern-recognition problems can be solved without much human intervention. In recent years, deep learning has gradually been applied to earthquake-induced structural damage assessment and has achieved significant results (Sony et al. 2021). Typically, the most efficient approach is to capture the structure's vibration signal during an earthquake, convert it to the frequency or time–frequency domain to reduce the feature dimension, eliminate redundant data while preserving valuable features, and finally feed the extracted information into a neural network for damage assessment (Kong et al. 2017; Hou and Xia 2021). Time–frequency analysis can convert acceleration signals into time–frequency spectra, and CNN-based transfer learning models show promising evaluation results for such inputs. These methods can assess the damage state and predict the nonlinear seismic response of structures following earthquakes, which is advantageous for promptly evaluating regional earthquake damage post-event (Mangalathu 2020; Liao et al. 2021; Lu et al. 2021).

With its superior performance in time-series analysis, 1D-CNN can maximize the benefits of deep learning in autonomously learning signal characteristics and can learn the properties of vibration signals directly (Abdeljaber et al. 2018). When the two data preprocessing strategies, one-dimensional time series and two-dimensional image coding, are compared, it is found that converting one-dimensional time series into two-dimensional images prolongs training time, while the image-coding approach does not significantly improve prediction accuracy over the time-series approach (Yuan et al. 2021). In addition, a 1D-CNN requires a relatively shallow architecture compared with a 2D-CNN to handle classification tasks, which makes 1D-CNN superior in cost-effectiveness and computational efficiency (Kiranyaz et al. 2021). Since classifying acceleration signals is the fundamental objective of post-earthquake damage assessment, 1D-CNN is more appropriate for earthquake damage assessment based on acceleration signals.

The effective use of deep learning-based approaches to structural earthquake damage assessment requires a large and well-balanced dataset. In practice, however, obtaining enough damaged samples can be challenging, and the performance of damage assessment may suffer as a result (Alom et al. 2019). Data augmentation is a common approach to improving damage assessment performance in small-sample scenarios. In this context, a deep learning architecture known as the generative adversarial network (GAN) offers a viable solution. Given a training dataset, it trains two neural networks that compete with each other to produce realistic new data. A GAN can learn the distribution of the data and generate new samples, providing a new solution to the problem of limited original data (Goodfellow et al. 2014). Rastin et al. (2021) used a deep convolutional generative adversarial network to train on the one-dimensional acceleration signals of a monitored structure and realized the quantification and localization of structural damage. To address the issue of inadequate damage assessment data, Luleci et al. (2022) developed a 1D-WDCGAN-GP model by incorporating the Wasserstein loss function into a deep convolutional generative adversarial network; this model can be used to produce a vibration damage dataset comparable to the input. Fan et al. (2023) added a self-attention mechanism to the GAN to learn intrinsic correlations, which facilitates the extraction of spatial and temporal correlations between structural responses and reconstructs lost data from precisely measured data. Deep learning has thus been applied to earthquake damage assessment, yet building assessment models that balance accuracy and efficiency remains the largest barrier to its wider use (Zhang et al. 2022).

Although deep learning has been applied to earthquake damage assessment, the primary challenges hindering its widespread adoption are achieving a balance between efficiency and accuracy in the assessment models and addressing the issue of data scarcity. To tackle these challenges, we propose a novel deep learning method based on time–frequency analysis and Conditional Generative Adversarial Networks (CGAN).

Proposed Framework:

  • Seismic Damage Evaluation Framework:

    a. Our framework combines signal time–frequency analysis with a one-dimensional Convolutional Neural Network (1D-CNN) for evaluating seismic damage.

    b. We considered five different time–frequency transformation methods to improve the resolution of the signal from various perspectives.

  • Parameter Optimization:

    a. To further enhance the prediction accuracy and computational speed of the deep learning model, we employ the Bayesian Optimization algorithm. This algorithm fine-tunes the model's parameters, ensuring both high precision and efficiency in the earthquake damage evaluation process.

  • Addressing Data Scarcity:

    a. To overcome the challenge of limited datasets with damage samples, we designed a data generation model based on Conditional Generative Adversarial Networks (CGAN).

    b. This CGAN model generates synthetic, yet high-quality seismic damage data, which are subsequently integrated into the damage assessment framework.

    c. The generated data are tested for their applicability and performance, ensuring they effectively augment the existing datasets and improve the robustness of the deep learning model.

  • Benefits:

    a. Enhanced Signal Resolution: The use of multiple time–frequency transformation methods ensures a detailed and accurate representation of seismic signals. It can improve accuracy and computational efficiency.

    b. Optimized Model Performance: Bayesian Optimization ensures the model operates with high precision, making the evaluation process more accurate.

    c. Augmented Data Availability: The CGAN-generated synthetic data help mitigate the issue of data scarcity, providing more training samples for the deep learning model and improving its generalization capabilities.

2 Methods

2.1 Time–Frequency Analysis

The time–frequency transformation technique used directly affects the accuracy of the earthquake damage assessment model (Kumar et al. 2018, 2021). Each of these methods extracts signal features from a different perspective and therefore has a different sensitivity to the signal. To build an earthquake damage assessment model, five widely used time–frequency analysis techniques were chosen and integrated with 1D-CNN. The chosen time–frequency analysis techniques are described below.

(1) Time domain

Acceleration sensors record the properties of the structural response, which include all facets of the structures' physical behavior. The original time-domain acceleration signal is directly fed into the one-dimensional convolutional neural network, and the damage features are extracted to evaluate the damage degree of the structure (Nguyen et al. 2022).

(2) Fast Fourier transform

Fast Fourier Transform (FFT) is one of the important methods in signal processing, which transforms the signal from time domain to frequency domain and bridges the gap between them. In practical engineering, if the change of the signal cannot be observed in the time domain, Fourier transform can be used to transform it to the frequency domain for observation (Kumar et al. 2015a, b). For a signal \(f\left(t\right)\in {L}^{2}(R)\), its Fourier transform is defined as:

$$F(\omega ) = \int_{ - \infty }^{ + \infty } {f(t)} e^{ - i\omega t} dt$$
(1)

where \(f(t)\) is the signal to be analyzed; \({L}^{2}(R)\) is the space of square-integrable functions; \(F(\omega )\) is the Fourier transform of the signal \(f(t)\).
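As a minimal illustration (not the authors' implementation, and with illustrative variable names), the one-sided amplitude spectrum of an acceleration record could be computed in Python as follows; the resulting vector is the kind of frequency-domain feature fed to the 1D-CNN.

```python
import numpy as np

def fft_amplitude(signal, fs):
    """One-sided FFT amplitude spectrum of an acceleration record.

    `signal` is a 1D acceleration time history sampled at `fs` Hz; the
    returned spectrum (about half the original length) would serve as the
    frequency-domain feature vector for the 1D-CNN."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal))      # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # corresponding frequencies in Hz
    return freqs, spectrum

# Example: a 6561-point record sampled at 200 Hz yields a 3281-point spectrum
acc = np.random.randn(6561)                     # placeholder for a real record
freqs, amp = fft_amplitude(acc, fs=200.0)
print(amp.shape)                                # (3281,)
```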

(3) Short Time Fourier Transform

The short-time Fourier transform assumes that the signal is stationary within a fixed window function \(g(t)\). By analyzing the signal using the Fourier transform, the frequency components of the signal are obtained, and then the window function \(g(t)\) is moved along the time axis to obtain the frequency-time varying graph of the signal. For a signal \(f\left(t\right)\in {L}^{2}(R)\), its short-time Fourier transform is defined as:

$$F(\omega ,\tau ) = \int_{ - \infty }^{ + \infty } {g(t - \tau )f(t)} e^{ - i\omega t} dt$$
(2)

In the formula, \(F(\omega ,\tau )\) is the short-time Fourier transform of the signal \(f(t)\), \(g(t)\) is the window function (a Hanning window is used here), and the other symbols are the same as above.

The result of the short-time Fourier transform is a 2D time–frequency spectrogram. One-dimensional information can be extracted from this two-dimensional spectrogram using moment methods and used as the input of a 1D-CNN. The two time–frequency moments used are the instantaneous frequency and the spectral entropy (Kłosowski et al. 2020). The instantaneous frequency is the frequency content of the signal extracted frame by frame from the short-time Fourier transform spectrogram. The spectral entropy is a measure of the sharpness or flatness of the signal's frequency spectrum, based on power spectral estimation. The two moments are used as the two feature vectors of the sample input into the 1D-CNN.
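The following Python sketch shows one way to compute the two moments from a Hann-window STFT; it is an assumption-laden analogue of the procedure described above (the window length, overlap, and the use of scipy.signal.stft are illustrative choices, not the paper's settings, and the formulas are roughly analogous to MATLAB's instfreq and pentropy).

```python
import numpy as np
from scipy.signal import stft

def stft_moments(signal, fs, nperseg=256, noverlap=128):
    """Instantaneous frequency and spectral entropy from a Hann-window STFT.

    Returns two 1D feature vectors, one value per time frame; parameter
    values are illustrative placeholders."""
    f, t, Z = stft(signal, fs=fs, window="hann",
                   nperseg=nperseg, noverlap=noverlap)
    P = np.abs(Z) ** 2 + 1e-12                          # power spectrogram
    # First conditional spectral moment: mean frequency of each time frame
    inst_freq = (f[:, None] * P).sum(axis=0) / P.sum(axis=0)
    # Spectral entropy: flatness of the normalized power spectrum per frame
    p = P / P.sum(axis=0, keepdims=True)
    spec_entropy = -(p * np.log2(p)).sum(axis=0) / np.log2(P.shape[0])
    return inst_freq, spec_entropy
```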

(4) Discrete Wavelet Transform

The discrete wavelet transform (DWT) analyzes a signal across various frequency bands, each with its own resolution, by systematically breaking the signal down into multiple layers. By discretizing the scale parameter of the wavelet transform, the DWT reduces redundant information and lowers the computational complexity.

The DWT splits the signal through a high-pass filter (HPF) and a low-pass filter (LPF). The outputs of the HPF are referred to as detail coefficients, while the outputs of the LPF are referred to as approximation coefficients. In this study, only the low-frequency components are further decomposed into successive layers, as shown in Fig. 1.

Fig. 1
figure 1

Wavelet decomposition process diagram

Two key steps in the discrete wavelet transform are the selection of the wavelet basis and of the decomposition level. Choosing these parameters correctly is essential, since the wavelet basis and the number of decomposition levels can greatly affect the results of the signal analysis. References (Zhang 2019; Li 2020) show that the number of decomposition layers for a highly volatile sequence is typically limited to three. Ultimately, two wavelets, db15 and sym6, were chosen as the wavelet basis functions, each decomposed into three layers. The low-frequency part (cA3) was selected as the input.
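A minimal PyWavelets sketch of this step is given below; the periodization boundary mode is an assumption chosen so that the approximation length halves at each level, and the wavelet names follow the db15/sym6 choice above.

```python
import numpy as np
import pywt

def dwt_lowfreq(signal, wavelet="db15", level=3):
    """Three-level DWT keeping only the level-3 approximation coefficients (cA3).

    The periodization boundary mode is an assumption; it halves the length
    at each level (6561 -> 3281 -> 1641 -> 821 points after three levels)."""
    coeffs = pywt.wavedec(signal, wavelet, mode="periodization", level=level)
    return coeffs[0]               # cA3: the low-frequency (approximation) part

acc = np.random.randn(6561)        # placeholder for an acceleration record
print(dwt_lowfreq(acc, "db15").shape, dwt_lowfreq(acc, "sym6").shape)   # (821,) (821,)
```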

(5) Wavelet Scattering Transform

In the wavelet scattering transform (WST), a modulus operation is applied to the wavelet coefficients because their average is zero and remains zero under any linear transformation; obtaining informative, non-zero coefficients therefore requires a non-linear operation, in this case taking the modulus. However, this operation may lose the signal's high-frequency details. To address this, the modulus wavelet coefficients are iteratively decomposed at successively deeper levels, together with further modulus operations and convolution with an averaging filter. As illustrated in Fig. 2, the method progressively outputs wavelet scattering coefficients that exhibit translation invariance, while the modulus wavelet coefficients are passed to subsequent layers for further computation. By iterating the signal layer by layer, a series of wavelet scattering coefficients is obtained (Fan et al. 2022). The combination of wavelet and modulus operators in the scattering transform ensures that the resulting scattering coefficients do not vary with translation, addressing the issue of temporal variability in wavelet transforms. The process is also stable to local deformations and captures rich feature information.
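For illustration, a hedged sketch of a one-dimensional scattering transform using the kymatio package is shown below; the package choice and the J and Q values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from kymatio.numpy import Scattering1D   # assumes the kymatio package is available

T = 6561                     # record length used in this study
J, Q = 8, 1                  # averaging scale and wavelets per octave (illustrative)
scattering = Scattering1D(J=J, shape=T, Q=Q)

acc = np.random.randn(T)     # placeholder for an acceleration record
Sx = scattering(acc)         # rows: scattering paths, columns: time frames
print(Sx.shape)              # a small (paths x frames) matrix fed to the 1D-CNN
```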

Fig. 2
figure 2

Wavelet Scattering Decomposition Structure

2.2 One-dimensional Convolutional Neural Network

In signal classification problems, a 1D-CNN maximizes the retention of the original signal features because it does not require changing the dimension of the input signal. In addition, its low model complexity greatly increases training speed. A 1D-CNN consists of an input layer, hidden layers, and an output layer. The hidden layers are mainly composed of convolutional, batch normalization, activation, max pooling, dropout, and fully connected layers.

The basic model composition of one-dimensional convolutional neural network is shown in Fig. 3.

Fig. 3
figure 3

1D-CNN structure

The fundamental process of training a network model involves iteratively updating the parameters through forward and backward propagation until the error between the output value and the true value meets the expected tolerance. This yields the optimal parameters during training, or the final parameters once training is finished (Cao 2022).
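A minimal PyTorch sketch of such a network is given below; the layer counts, channel widths, and kernel sizes are illustrative placeholders, since the actual values in this study are selected by the Bayesian optimization described in Sect. 2.3.

```python
import torch
import torch.nn as nn

class Damage1DCNN(nn.Module):
    """A minimal 1D-CNN with the layer types listed above
    (convolution, batch norm, ReLU, max pooling, dropout, fully connected)."""

    def __init__(self, in_channels=1, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=9, padding=4),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Dropout(0.3),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),       # collapse the time axis
            nn.Flatten(),
            nn.Linear(32, n_classes),      # one logit per damage state
        )

    def forward(self, x):                  # x: (batch, channels, length)
        return self.classifier(self.features(x))

# Example: a batch of 8 single-channel records of 6561 points
logits = Damage1DCNN()(torch.randn(8, 1, 6561))
print(logits.shape)                        # torch.Size([8, 3])
```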

2.3 Bayesian Optimization Algorithm

Typically, a Bayesian optimization (BO) algorithm adds new sample points to the given black-box objective function and updates the posterior distribution of the objective function until it nearly matches the true distribution. It is a global optimization algorithm that is well suited to adaptive parameter tuning of classification and regression models (Cui and Yang 2018).

The Bayesian optimization algorithm mainly consists of a surrogate model and an acquisition function. Because the objective function is difficult to obtain, the surrogate model estimates it from the existing data; a Gaussian process is used as the surrogate model. The acquisition function determines how to sample new data based on the estimated objective function, and the surrogate model is then updated with the newly collected data. This process is repeated iteratively and, in the ideal case, finds the global optimum of the objective function. The BO algorithm is used to optimize the parameters of the 1D-CNN model. The specific steps are as follows, with an illustrative sketch given after the list:

  (1) Determine the maximum number of iterations N; here the maximum number of iterations for BO is 60, with 300 training batches being completed each time.

  (2) Use the acquisition function to obtain an evaluation point, that is, to obtain a certain combination of optimized parameters.

  (3) Evaluate the objective function value at the evaluation point, choosing the error rate of the validation set as the objective function value.

  (4) Integrate the data and update the probabilistic surrogate model to make the surrogate model more consistent with the distribution of the objective function.

  (5) If the current iteration n is less than the maximum number of iterations N, return to step (2) to continue the iteration. Otherwise, select the optimal evaluation point corresponding to the minimum error rate as the output to obtain the optimal parameter combination for the network model.
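For illustration, the loop above could be realized with a Gaussian-process surrogate and an acquisition function as sketched below using scikit-optimize; this is an illustrative Python analogue, not the authors' implementation, and both the search space and the train_and_validate helper are hypothetical.

```python
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real

# The search space below is hypothetical; the parameters actually optimized
# in this study are those listed in Table 1.
space = [
    Integer(3, 15, name="kernel_size"),
    Integer(8, 128, name="num_filters"),
    Real(1e-4, 1e-2, prior="log-uniform", name="learning_rate"),
    Categorical([0.1, 0.3, 0.5], name="dropout"),
]

def objective(params):
    kernel_size, num_filters, lr, dropout = params
    kernel_size = int(kernel_size) | 1       # only odd kernel sizes are used
    # train_and_validate is a hypothetical helper that trains the 1D-CNN with
    # these hyper-parameters (300 training batches per evaluation) and returns
    # the validation-set error rate, the objective function value of step (3).
    return train_and_validate(kernel_size, int(num_filters), float(lr), float(dropout))

# Gaussian-process surrogate + acquisition function, 60 evaluations (step 1 above)
result = gp_minimize(objective, space, n_calls=60, random_state=0)
print("best validation error:", result.fun)
print("best parameter combination:", result.x)
```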

Table 1 shows the parameters optimized by the BO algorithm for this case study. Only odd kernel sizes are used, and the convolution kernel size is kept the same across the convolutional layers. The iteration process of the Bayesian optimization algorithm is shown in Fig. 4. The observed and estimated values of the objective function reach their minimum at the 56th iteration, where the result is optimal. The final parameters of the one-dimensional convolutional neural network model are then chosen from the Bayesian optimization results at this point, and the model is evaluated on the testing set to obtain the final evaluation accuracy.

Table 1 Bayesian optimization parameters
Fig. 4
figure 4

Iterative process of Bayesian optimization algorithm

2.4 Conditional Generative Adversarial Networks

The specific idea and design of a GAN are depicted in Fig. 5. A GAN is comprised of a generator and a discriminator.

Fig. 5
figure 5

Framework of GAN

The adversarial training between the generator and discriminator is represented by the training objective function of the GAN model, which can be obtained by,

$$\mathop {\min }\limits_{G} \mathop {\max }\limits_{D} V(G,D) = E_{{x\sim P_{r} }} [\log D(x)] + E_{{z\sim P_{z} }} [\log (1 - D(G(z)))]$$
(3)

where E denotes expectation; x is a real sample; Pr is the distribution of the real samples; D(x) is the output of the discriminator; z is random noise; Pz is the distribution of the random noise; and G(z) is the output of the generator.

To generate specified target data, the CGAN adds auxiliary information, such as the class label of the data, to the generator and discriminator inputs of the original GAN (Mirza 2014). The CGAN thus uses external label information to guide GAN training. It can not only learn the information of multiple classes of samples at the same time, but also, to some extent, avoid problems such as vanishing gradients and mode collapse that easily occur during GAN training. The structure of the CGAN is depicted in Fig. 6.

If the model fully captures the complex relationships hidden in the data and reaches equilibrium, the generated data can closely mimic the real data. The training objective function of the CGAN model is modified from that of the GAN, and the new objective function is given by the following formula.

$$\mathop {\min }\limits_{G} \mathop {\max }\limits_{D} V(G,D) = E_{{x\sim P_{r} }} [\log D(x|y)] + E_{{z\sim P_{z} }} [\log (1 - D(G(z|y)))]$$
(4)
Fig. 6
figure 6

Framework of CGAN

When the GAN faces a large number of data features during actual training, the network model encounters issues such as mode collapse and slow convergence, which make stable training difficult. To solve these problems, a CNN is introduced to construct the internal structure of the generator and the discriminator. A CNN can extract features through several hidden layers, share convolution kernels, and handle high-dimensional data with ease. Integrating the CNN improves the CGAN's stability, convergence rate, and quality of the generated data. Since the damage data are 1D signals, a 1D-CNN is used to build the CGAN model in a way that facilitates feature extraction.

The generator typically takes multidimensional random noise as its input. Strided convolution is employed so that the network samples in a learned space, and the spatial pooling used in CNNs is omitted so that the network model can independently learn a more appropriate spatial sampling strategy. Batch normalization is applied between layers to accelerate convergence and reduce over-fitting. The ReLU activation function adds nonlinearity to improve the expressive ability of the model and to solve problems that a linear model cannot, so that the expected response can be produced. The CGAN generator structure is shown in Fig. 7a.

Fig. 7
figure 7

G and D network structure

The discriminator usually takes the conditional data as input together with the generated samples and the real samples. The discriminator concatenates the input data into a matrix that serves as the input to the convolutional layers. The Leaky ReLU function is used as the activation function of the hidden layers. The feature loss function is applied to the second and third hidden layers of the discriminant model. The fully connected layer and sigmoid activation layer of the discriminant model are then used to judge true and false, mapping the final result to [0, 1]. The discriminator structure is shown in Fig. 7b.
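A hedged PyTorch sketch of the generator and discriminator described above is given below; all layer counts, kernel sizes, and the signal length are illustrative placeholders rather than the values in Table 10, strided transposed convolutions are assumed for the generator's upsampling, and the feature-loss branch on the discriminator's hidden layers is omitted for brevity.

```python
import torch
import torch.nn as nn

N_CLASSES, NOISE_DIM, SIG_LEN = 3, 100, 2048   # illustrative sizes

class Generator(nn.Module):
    """Conditional generator sketch: label-conditioned noise is upsampled by
    strided transposed 1D convolutions with batch normalization and ReLU."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, NOISE_DIM)
        self.net = nn.Sequential(
            nn.ConvTranspose1d(2 * NOISE_DIM, 128, kernel_size=16, stride=16),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.ConvTranspose1d(128, 64, kernel_size=8, stride=8),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.ConvTranspose1d(64, 1, kernel_size=16, stride=16),
            nn.Tanh(),                                  # illustrative output activation
        )

    def forward(self, z, labels):                       # z: (batch, NOISE_DIM)
        x = torch.cat([z, self.embed(labels)], dim=1)   # condition on the class label
        return self.net(x.unsqueeze(-1))                # (batch, 1, SIG_LEN)

class Discriminator(nn.Module):
    """Conditional discriminator sketch: the sample and a label channel are
    concatenated, passed through strided 1D convolutions with Leaky ReLU,
    then a fully connected layer with a sigmoid output in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, SIG_LEN)
        self.conv = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=16, stride=8), nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=16, stride=8), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=8, stride=4), nn.LeakyReLU(0.2),
        )
        # 6 is the length of the convolutional output for SIG_LEN = 2048
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(128 * 6, 1), nn.Sigmoid())

    def forward(self, x, labels):                       # x: (batch, 1, SIG_LEN)
        lab = self.embed(labels).unsqueeze(1)           # label as an extra channel
        return self.fc(self.conv(torch.cat([x, lab], dim=1)))
```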

During CGAN training, the generator and discriminator are trained alternately. When training the generator, the weights are updated based on the differences between the generated data, the discriminator's outputs, and the feature-vector deviations. The real training data and the generated data produced by the generator are fed into the discriminator during training; the discriminator then computes the probability to be predicted and adjusts its parameters according to the discrimination error. The CGAN model training process is divided into the following three steps; an illustrative sketch of a single training step is given after the steps:

(1) Raw data preprocessing. The original data are mainly processed by time–frequency analysis.

(2) Update discriminator parameters. Random noise with labels is passed through the generator to produce fake samples. The encoded label information is then combined with the generated or real samples and passed into the discriminator; the loss value is calculated through the loss function, and the parameters of the discriminator are updated.

(3) Update generator parameters. After the discriminator is trained, its parameters are fixed, and the generator's parameters are updated according to the computed loss value.
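The alternating update below is a minimal sketch of steps (2) and (3), assuming the Generator and Discriminator classes (and the N_CLASSES, NOISE_DIM, and SIG_LEN constants) from the previous sketch; the optimizer settings are illustrative, not the values in Table 10.

```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

def train_step(real_x, labels):
    batch = real_x.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Step (2): update the discriminator (real samples -> 1, fake samples -> 0)
    fake_x = G(torch.randn(batch, NOISE_DIM), labels).detach()   # freeze G here
    d_loss = bce(D(real_x, labels), ones) + bce(D(fake_x, labels), zeros)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Step (3): update the generator with the discriminator's parameters fixed
    g_loss = bce(D(G(torch.randn(batch, NOISE_DIM), labels), labels), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()

# Example call with a batch of preprocessed damage samples and their labels
losses = train_step(torch.randn(8, 1, SIG_LEN), torch.randint(0, N_CLASSES, (8,)))
```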

3 Overview of Post-earthquake Damage Assessment Process

Prior to assessing earthquake damage, the time-domain signal is preprocessed, and the hyperparameters of the one-dimensional convolutional neural network are optimized by the Bayesian optimization algorithm to increase the accuracy of damage assessment. However, when little data is available, damage assessment accuracy is typically low. To address this, conditional generative adversarial networks are used to augment the data and improve damage assessment accuracy.

3.1 CNN-Based Damage Assessment Method

The top acceleration of the structure is one of the feature variables that can indicate the damage state of a frame structure and is therefore used to assess the post-earthquake damage state of reinforced concrete frame structures. A CNN-based post-earthquake damage assessment model is developed, which mainly consists of the steps shown in Fig. 8.

Fig. 8
figure 8

Post-earthquake damage assessment process based on CNN

Establish a finite element model of the structure to accurately simulate each component and boundary conditions. Determine the damage state of the structure based on the maximum inter-story displacement angle of the structure, and then calibrate the mapping relationship between the acceleration data and the damage state label.

To acquire the damage samples, time–frequency analysis is used to preprocess the acceleration data. The samples are then separated into training, validation, and testing datasets.

The dataset is fed into a 1D-CNN, and the optimal parameters are identified using the Bayesian optimization algorithm. The network structure in its optimal configuration is then preserved as the damage assessment model.

Use the developed damage assessment model to evaluate the damage state of the building structure after a new earthquake strikes.

3.2 Generated Sample Model Based on CGAN

To enhance the precision of damage assessment with limited samples, CGAN is employed as the generator model to generate additional damage samples through data augmentation. The key steps involved in applying CGAN to post-earthquake structural damage assessment are outlined below, as depicted in Fig. 9.

  (1) Data preprocessing: To create a new training set, time–frequency analysis is used to preprocess the acquired acceleration data.

  (2) Training the network model: The datasets are input into the CGAN to find the optimal parameters, and the optimal network structure is saved as the generator model.

  (3) Generating fake samples: The generator model is used to generate acceleration damage samples, which are mixed with the original data.

  (4) Evaluating sample quality: The quality and accuracy of the generated data are evaluated by the damage assessment model.

Fig. 9
figure 9

Seismic damage assessment model based on CGAN

4 Data Set Creation

4.1 RC Frame Structure Design

A six-story RC frame structure was designed according to Chinese codes (China Architecture and Building Press 2010; Code for Design of Concrete Structures). The basic design data are as follows:

The plan dimension of the structure is 21.6 m × 15 m.

The height of the first floor is 4.0 m.

The height of the remaining floors is 3.6 m.

The total height of the building is 22.2 m, as shown in Fig. 10.

Fig. 10
figure 10

The plan and elevation layout of RC frame(unit: mm)

Seismic fortification intensity is 8 degrees.

Seismic group is the second group.

Site category is class II.

Floor dead load is 3.5 kN/m².

Floor live load is 2.0 kN/m².

Concrete strength grade is C40.

Steel grade is HRB400.

Beam and column section sizes are 300 mm × 600 mm and 600 mm × 600 mm.

4.2 Finite element model

The finite element analysis model of the RC frame structure is established in OpenSees, and incremental dynamic analysis is performed on the structural model. The P-Delta effect was considered for each beam-column element, and both beams and columns used fiber sections and nonlinear beam-column elements divided into several integration segments. The widely used Concrete01 constitutive model, based on the Kent-Scott-Park uniaxial concrete model, offers high computational efficiency and good convergence; it does not account for the tensile behavior of concrete. The steel constitutive model was Steel02, a uniaxial isotropic-hardening Giuffre-Menegotto-Pinto model with strong numerical stability and simulation performance that captures the Bauschinger effect of steel. The foundation was treated as rigidly fixed, and Rayleigh damping was applied as a linear combination of the mass and stiffness matrices.
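A hedged OpenSeesPy sketch of these modeling choices is shown below; all numerical values (material strengths, section dimensions, fiber discretization, and damping coefficients) are illustrative placeholders rather than the values used in this study.

```python
import openseespy.opensees as ops

ops.wipe()
ops.model('basic', '-ndm', 2, '-ndf', 3)

# Kent-Scott-Park concrete (no tension) and Giuffre-Menegotto-Pinto steel
ops.uniaxialMaterial('Concrete01', 1, -26.8e3, -0.002, -5.36e3, -0.0033)   # kN, m units
ops.uniaxialMaterial('Steel02', 2, 400.0e3, 2.0e8, 0.01, 18, 0.925, 0.15)

# Fiber section for a 600 x 600 mm column with one illustrative rebar layer
ops.section('Fiber', 1)
ops.patch('rect', 1, 10, 10, -0.3, -0.3, 0.3, 0.3)
ops.layer('straight', 2, 4, 4.91e-4, -0.25, -0.25, -0.25, 0.25)

# Rigid (fixed) base, P-Delta transformation, force-based nonlinear beam-column
ops.node(1, 0.0, 0.0)
ops.node(2, 0.0, 4.0)
ops.fix(1, 1, 1, 1)
ops.geomTransf('PDelta', 1)
ops.beamIntegration('Lobatto', 1, 1, 5)          # 5 integration points per element
ops.element('forceBeamColumn', 1, 1, 2, 1, 1)

# Rayleigh damping as a linear combination of mass and stiffness matrices
ops.rayleigh(0.2, 0.0, 0.0, 0.002)
```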

4.3 Seismic wave selection

The initial requirement for conducting earthquake response analysis and accurately evaluating structural damage is the rational selection of seismic ground motions. Presently, the predominant approach for choosing ground motions considers magnitude, epicentral distance, and site conditions as the primary criteria (Zhang et al. 2019). The magnitude of an earthquake largely affects the spectral and duration characteristics of ground motions; small-magnitude earthquakes release little energy and rarely cause damage (Qiao 2020). Therefore, ground motions with magnitudes greater than 4 are usually selected for the seismic analysis of buildings. Near-field and far-field ground motions have different effects on the response of structures and can be differentiated by epicentral distance; however, since there is currently no unified definition of near-field and far-field ground motions, an epicentral distance between 0 and 400 km is adopted. Site conditions significantly affect ground motion records, and the amplitude and spectral characteristics of ground motions vary with site conditions. Wang (2016) studied the impact of different ground motion classifications on vulnerability curves. Sites are generally classified by the average shear-wave velocity over the top 30 m of soil, \({V}_{s30}\). On this basis, the PEER ground motion database divides ground motions into three categories: \({V}_{s30}\le 260\) m/s, 260 m/s \(<{V}_{s30}\le 510\) m/s, and \({V}_{s30}>510\) m/s. In accordance with these criteria, a total of 216 ground motions from 24 seismic events were selected from the PEER ground motion database, with 72 ground motions for each site category. The selected earthquake event parameters are shown in Table 2.

Table 2 Ground motion record information

4.4 Classification of Damage State

The maximum inter-story drift angle is selected as the performance indicator for structural damage, with reference values taken from the code (FEMA 357 2000). The damage condition of reinforced concrete frame structures is divided into three categories, as illustrated in Table 3 (Han et al. 2020).

Table 3 Structural damage classification based on maximum story drift angle

4.5 Establish Seismic Damage Data Set

Peak ground acceleration (PGA) was chosen as the seismic intensity measure and the maximum inter-story drift angle as the damage indicator for the incremental dynamic analysis of the six-story reinforced concrete frame building. Each seismic wave was scaled according to PGA and input into the structure for dynamic time-history analysis to obtain the maximum inter-story drift angle and the associated structural damage indicator. The equal-step approach was applied to all seismic wave time histories: the PGA of each record was increased in steps of 0.02 g, beginning at 0.02 g and continuing until the structure collapsed. The seismic waves were applied in the X direction.

$$a{\prime} (t) = a(t) \times PGA/A_{\max }$$
(5)

where \(a^{\prime}\left( t \right)\) is the adjusted acceleration time history, \(a\left(t\right)\) is the original seismic wave acceleration time history, PGA is the target peak acceleration, and \({A}_{max}\) is the peak of the original seismic wave acceleration time history.

Since the seismic data were collected at different stations with different sampling rates and durations, the records must be down-sampled and truncated to meet the input requirements of the network model. Specifically, each record was down-sampled to 200 Hz and truncated to 6561 data points, a time series of 32.805 s, so that all seismic sequences have the same length. Records shorter than this were zero-padded. In addition, baseline drift was corrected for each of the selected seismic waves.
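The preprocessing described above could be sketched as follows; the use of scipy.signal.resample and a linear detrend for baseline correction are simplifying assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.signal import detrend, resample

TARGET_FS, TARGET_LEN = 200.0, 6561    # values stated in the text

def preprocess_record(acc, fs, pga_target):
    """Scale to the target PGA (Eq. 5), resample to 200 Hz, remove baseline
    drift, and truncate or zero-pad to 6561 points (32.805 s)."""
    acc = np.asarray(acc, dtype=float)
    acc = acc * pga_target / np.max(np.abs(acc))            # Eq. (5)
    n_new = int(round(len(acc) * TARGET_FS / fs))
    acc = resample(acc, n_new)                               # down-sample to 200 Hz
    acc = detrend(acc)                                       # simple baseline correction
    if len(acc) >= TARGET_LEN:                               # truncate ...
        return acc[:TARGET_LEN]
    return np.pad(acc, (0, TARGET_LEN - len(acc)))           # ... or zero-pad

# Example: a 60 s record sampled at 100 Hz, scaled to PGA = 0.2 g
rec = preprocess_record(np.random.randn(6000), fs=100.0, pga_target=0.2 * 9.81)
print(rec.shape)                                             # (6561,)
```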

When the five time–frequency analysis methods described above are applied to the signal, the data dimension of the signal changes. The original time-domain signal contains 6561 data points. The fast Fourier transform halves the signal length, yielding 3281 data points. The short-time Fourier transform combined with the time–frequency moment method produces two vectors of 129 data points each. After the discrete wavelet transform, the low-frequency part is one-eighth of the original length, with 821 data points; the wavelet scattering transform converts the signal into a 7 × 258 matrix. The specific data dimensions are shown in Table 4.

Table 4 Data dimension of five time–frequency analysis

The earthquake waves in the training, validation, and test sets come from different earthquake events. This setup ensures that the seismic waves used to train, validate, and test the model are completely different, so that the proposed method can reasonably and accurately evaluate the damage caused by earthquakes to buildings. A total of 216 different seismic waves were applied to the structure; because the maximum number of scaling steps differs between records, Table 5 shows the final sizes of the training, validation, and test sets.

Table 5 Number of seismic waves in each data set

5 Damage Assessment Based on CNN Model

5.1 Results Evaluation Index

To evaluate the performance of the model more intuitively and understand its generalization ability, some evaluation indicators are introduced to measure the performance of the model. Commonly used evaluation indicators in deep learning include accuracy, precision, recall, and model runtime.

(1) Accuracy A, the ratio of correctly detected samples to the total samples, is calculated as follows:

$$A = \frac{TP + TN}{{TP + FP + FN + TN}}$$
(6)

In the formula, TP is the number of correctly identified positive samples; TN is the number of correctly identified negative samples; FP is the number of negative samples incorrectly identified as positive; and FN is the number of positive samples incorrectly identified as negative.

(2) Precision P, the ratio of correctly detected target samples to all detected target samples, is calculated as follows:

$$P = \frac{TP}{{TP + FP}}$$
(7)

(3) Recall R, the ratio of correctly detected target samples to the actual target samples, is calculated as follows:

$$R = \frac{TP}{{TP + FN}}$$
(8)

(4) Model training time: In deep learning, the model parameters are updated repeatedly during training. If the model converges slowly and training takes a long time, the number of iterations that can be afforded is reduced, which prevents the model from reaching better performance and also affects its accuracy. Therefore, the model training time is also an important performance indicator.
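For reference, Eqs. (6)–(8) correspond to the standard classification metrics, which could be computed for the three damage classes as in the sketch below (the label values are illustrative).

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2, 2, 1]      # classes from nonlinear time-history analysis
y_pred = [0, 1, 1, 1, 2, 2, 1, 1]      # classes predicted by the 1D-CNN

print("accuracy :", accuracy_score(y_true, y_pred))                  # Eq. (6)
print("precision:", precision_score(y_true, y_pred, average=None))   # Eq. (7), per class
print("recall   :", recall_score(y_true, y_pred, average=None))      # Eq. (8), per class
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```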

5.2 Comparison of Evaluation Results

Under the same computer hardware configuration, the network models of the five time–frequency analysis methods were trained and tested on the same dataset. The hardware comprised an i5-12400F CPU, an NVIDIA GeForce RTX 3060 GPU, and 16 GB of memory. The 1D-CNN model was built and trained on the MATLAB platform. The generalization ability of the 1D-CNN optimized by Bayesian optimization was evaluated using the test set of the simulated data. The damage assessment results of the five time–frequency analysis methods with the 1D-CNN model are shown in Table 6, and the confusion matrices are shown in Fig. 11.

Table 6 Accuracy of five time–frequency analysis methods
Fig. 11
figure 11

Confusion matrix of five time–frequency analysis methods

According to Table 6, the accuracies of the original time-domain signal, fast Fourier transform, wavelet transform, and wavelet scattering are all above 80%, which indicates that the seismic damage assessment method combining time–frequency analysis with one-dimensional convolutional neural network can evaluate post-earthquake frame structural damage.

Figure 11 shows the confusion matrices of the five time–frequency analysis methods, where the columns are the classes predicted by the one-dimensional convolutional neural network model, the rows are the true classes obtained by nonlinear time-history analysis, the last column gives the recall rate, the last row gives the precision rate, and the lower-right cell gives the accuracy. On the test set, high recall, precision, and accuracy indicate that a model has good generalization ability. The wavelet scattering method has the highest accuracy of the five time–frequency analyses, up to 92.5%, indicating that this model has good generalization ability for seismic damage assessment. Since the accuracy of the short-time Fourier transform combined with the time–frequency moment method is very low, this method is not considered in the subsequent analysis.

5.3 The Influence of Network Structure on Training Results

A common problem in training deep learning network models is the long training time often required to achieve high accuracy. Therefore, this section compares the various time–frequency analysis methods to select the one with the shortest training time and high accuracy, so as to improve the accuracy of the earthquake damage assessment model while shortening the training time. To evaluate the computational efficiency of the deep learning network models, the computation time and resources consumed by the five network models based on the remaining four time–frequency analysis methods were compared. Computational resources, an important indicator for network models, mainly comprise two parameters: the number of parameters and the number of computations, which determine whether the model can run on limited hardware devices and whether its computation time is controllable.

Table 7 presents the network architecture, number of parameters, computational cost, and computation time of the five network models. As shown in Table 7, the wavelet scattering-based time–frequency analysis method combined with the one-dimensional convolutional neural network has the shortest computation time, taking only 144 s on the same hardware. This is because (1) as shown in Table 4, the wavelet scattering method has a smaller data dimension than the other time–frequency analysis methods, and (2) the wavelet scattering-based network model has fewer parameters and a lower computational cost than the other models, requiring fewer computing resources. Choosing the most efficient analysis method and the network model with the highest accuracy is therefore essential for post-earthquake damage assessment on limited computing hardware. Based on the evaluation of accuracy, computation time, and resource utilization, we recommend the wavelet scattering-based time–frequency analysis method combined with a one-dimensional convolutional neural network for fast post-earthquake damage assessment among the five time–frequency analysis methods and network models compared.

Table 7 Computing resources and time of five network models

5.4 Robustness Verification

Since the signals collected in practical environments generally have some external interference, especially under the influence of earthquakes, the collected acceleration signals inevitably contain some noise. Usually, the impact of this type of noise is simulated by adding Gaussian white noise.

Signal-to-noise ratio (SNR) is commonly used in engineering to evaluate the strength of noise in the signal. The formula for SNR is as follows:

$$SNR(dB) = 10\log_{10} \left( {\frac{{P_{signal} }}{{P_{noise} }}} \right)$$
(9)

where: \({P}_{\text{signal}}\) represents the power of the original signal, and \({P}_{\text{noise}}\) represents the power of the noise signal.
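A minimal sketch of adding Gaussian white noise at a prescribed SNR, following Eq. (9), is shown below; it is an illustrative implementation rather than the authors' code.

```python
import numpy as np

def add_white_noise(signal, snr_db):
    """Add zero-mean Gaussian white noise at a prescribed SNR (Eq. 9).

    The noise power is set from the signal power so that
    10*log10(P_signal / P_noise) equals `snr_db`."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

noisy = add_white_noise(np.random.randn(6561), snr_db=1.0)   # the SNR = 1 dB case
```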

To evaluate the noise resistance of the earthquake damage assessment model, Gaussian white noise was added to the signals; Fig. 12 shows an original signal and the corresponding noisy signal. It can be seen that after adding white noise, the original signal is seriously contaminated, which affects the signal characteristics of the original vibration signal, the time–frequency analysis of the signal, and the feature extraction of the network model, and may therefore reduce the accuracy of damage assessment.

Fig. 12
figure 12

Raw signal and noisy signal

Table 8 shows the evaluation results of the different time–frequency analysis methods and network models under the influence of noise. When the SNR is 1 dB, the accuracy of all five models decreases, but the recognition accuracy of the wavelet scattering transform method remains above 90%. The results show that the earthquake damage identification model based on the wavelet scattering transform and the one-dimensional convolutional neural network has good noise resistance and robustness. Therefore, further research is conducted using the earthquake damage assessment model based on the wavelet scattering time–frequency method and deep learning.

Table 8 Accuracy under noise

5.5 Effect of Sample Size on Training Results

To improve the generalization ability of the structural post-earthquake damage assessment model, a large number of training samples is needed. However, it is not realistic to obtain a large number of real operating-condition data samples in practical engineering applications. This section therefore examines the dependence of the post-earthquake damage assessment model on the number of samples. In this experiment, only the size of the training set was reduced, while the validation and test sets remained unchanged. The performance of the model was studied by selecting 4 and 8 of the earthquake events in Table 2 as training sets. The specific earthquake event numbers and damage sample quantities of the training sets are shown in Table 9. With 4 earthquake events, the recognition accuracy is below 90%. As the number of earthquake events increases, the accuracy improves significantly: from 4 to 8 events, the accuracy improves by 7.6%, and with 16 events it reaches a maximum of 92.5%. The accuracy under added noise also improves as the sample quantity increases. This indicates that the generalization ability of the damage assessment model improves with the sample quantity, and so does the evaluation accuracy of the established post-earthquake damage assessment model.

Table 9 Training results with different sample sizes

6 Sample Generation Based on CGAN Model

6.1 Generative Adversarial Network Parameters

Due to the characteristics of the CGAN structure, the acceleration damage samples are adjusted to match the input and output of the network model. The time-domain samples are cropped to a final dimension of 1 × 6400. The wavelet scattering transform samples are cropped, expanded, and flattened: the data are first cropped to 7 × 256, the mean over the vertical dimension is appended to the original data, and the result is flattened to a final dimension of 1 × 2048. After preprocessing, these samples serve as the training set for the CGAN. By comparing the two in terms of data quality and damage evaluation accuracy, the effectiveness of time–frequency analysis combined with CGAN data generation for post-earthquake damage evaluation is verified.

Both the discriminator and the generator use convolutional neural networks. A reasonable network structure can improve the feature extraction ability of the model and further improve its generalization performance. After multiple experiments, various network models were constructed for the different datasets, the structures of the generator and discriminator networks corresponding to each dataset were determined based on the above evaluation indicators, and the hyper-parameters were set as shown in Table 10.

Table 10 Improved CGAN network model parameter settings

6.2 Generating Samples Based on the CGAN Model

In the training process of the CGAN model, the generator and the discriminator play an adversarial game. If one model learns too fast, it prevents the other from training, and the model cannot converge. To evaluate the training of the model, this paper computes the scores of the generator and the discriminator and plots them to assess the training state of the model. The score formulas for the generator and discriminator are as follows:

$$G_{score} = mean\left( {\widehat{Y}_{Generated} } \right)$$
(10)
$$D_{score} = \frac{1}{2}mean\left( {\widehat{Y}_{Real} } \right) + \frac{1}{2}mean\left( {1 - \widehat{Y}_{Generated} } \right)$$
(11)

where \({G}_{score}\) is the generator score, \({D}_{score}\) is the discriminator score, \({\widehat{Y}}_{Generated}\) is the probability value output by the discriminator for the generated data, and \({\widehat{Y}}_{Real}\) is the probability value output by the discriminator for the real data.

In the ideal CGAN model, the scores of the generator and discriminator are both 0.5, indicating that the generated data approximates the real data and the discriminator cannot distinguish between real and generated data. When the score of the generator model is close to 1, it means that the generator model learns too fast, causing the discriminator to be unable to effectively train. At this time, although the generated data from the generator model is significantly different from the real data, the discriminator cannot identify the authenticity of the input data. When the score of the discriminator model is close to 1, it means that the discriminator model learns too fast, causing the generator to be unable to effectively train. To balance the learning ability of the two models, the number of convolutional kernels in the discriminator can be increased (decreased), dropout layers can be added to the generator (discriminator), and the number of convolutional kernels in the generator can be reduced (increased).
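For reference, Eqs. (10)–(11) can be evaluated directly from the discriminator's outputs, as in the short sketch below (the input probabilities are illustrative).

```python
import numpy as np

def gan_scores(prob_real, prob_generated):
    """Generator and discriminator scores from Eqs. (10)-(11).

    `prob_real` and `prob_generated` are the discriminator's sigmoid outputs
    for a batch of real and generated samples."""
    g_score = np.mean(prob_generated)
    d_score = 0.5 * np.mean(prob_real) + 0.5 * np.mean(1.0 - prob_generated)
    return g_score, d_score

# Near convergence both scores approach 0.5
print(gan_scores(np.full(64, 0.55), np.full(64, 0.48)))
```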

After preprocessing the data and inputting it into the constructed CGAN model for training, the training score process is shown in Fig. 13. It can be seen from the training score graph that after 500 training batches and 13,000 iterations of training, the final scores of the generator G and the discriminator D in the CGAN model are both around 0.5, indicating that the model has converged.

Fig. 13
figure 13

CGAN model training process score chart

After the CGAN model converges, random numbers are input to the model for data generation, and the generated results are shown in Fig. 14. Since the goal of the CGAN model is not to produce data identical to the real data, but rather output that is sufficiently similar to the real data with slight variation, a direct comparison between generated and real data cannot evaluate the quality of the generated data.

Fig. 14
figure 14

CGAN model data generation diagram

6.3 Damage Assessment Based on CNN Model

To intuitively verify the effectiveness and significance of this work, the seismic damage assessment model established earlier was selected to evaluate the quality of the generated acceleration damage data. Different training sets were designed to verify the actual effect of the CGAN on damage assessment, while the validation and test sets remained the same as in the previous section. The different training-set proportions and their damage assessment accuracies are shown in Table 11.

Table 11 CGAN training set samples

Table 11 shows that the model's classification accuracy rises with the amount of training data. The classification accuracy with the mixed-data training set is higher than with the generated-data training set, and markedly higher than with a training set containing only a small number of real samples. This suggests that the CGAN-generated samples can effectively improve the accuracy of damage assessment by filling out small-sample datasets. However, the classification accuracy does not keep increasing as more generated samples are added: when the amount of mixed data reaches the size of the full real dataset, its classification accuracy is marginally lower than that of the real data. This suggests that, in addition to real sample features, the CGAN-generated samples contain extraneous information that causes classification errors in the classification model.

In conclusion, the samples generated by the CGAN model have high similarity to real samples and can effectively expand small sample damage data, which has a certain effect on improving the accuracy of small sample damage assessment.

7 Conclusion

To address the issues of low accuracy and efficiency in post-earthquake structural damage assessment, this research investigates a seismic damage assessment technique based on time–frequency analysis and 1D-CNN, along with a seismic data generation approach based on CGAN and time–frequency analysis. Numerical simulations verify the effectiveness and robustness of these techniques. The findings are summarized as follows:

(1) Five time–frequency analysis techniques and their associated network models are examined and evaluated. Four of these methods attain an accuracy above 80%, with the highest accuracy, 92.5%, achieved by combining the wavelet scattering time–frequency analysis method with the 1D-CNN. This enables automated seismic damage assessment without manual extraction of damage feature parameters.

(2) The computational efficiency of the deep learning models is analyzed from two perspectives: the data dimension after time–frequency analysis and the computational resources of the network model. The combination of wavelet scattering time–frequency analysis with the one-dimensional convolutional neural network requires the smallest data dimension and the fewest computational resources, resulting in the shortest computation time of just 144 s. This facilitates rapid seismic damage assessment with minimal hardware resources.

(3) The combination of 1D-CNN with wavelet scattering time–frequency analysis continues to show higher accuracy than the other methods even when noise is added to the original signal, highlighting its robustness and superior generalization ability in a range of working environments.

(4) The CGAN model can be trained iteratively over many rounds to learn useful features of the damage acceleration data and to generate realistic acceleration damage samples. Samples produced by the proposed acceleration damage generation model are added to the original dataset, which is then used to train the post-earthquake damage assessment model; under otherwise identical training conditions, the model trained on the augmented dataset performs best.