1 Introduction

Diagnosing the damage caused by natural disasters such as earthquakes and windstorms is a severe challenge for the engineering community, positioning structural health monitoring (SHM) as an essential technology for disaster response and recovery. Loss of structural integrity during disasters, such as the Taiwan earthquake, the Loma Prieta earthquake and the recent Turkish earthquake, highlights the need for seismic-induced health monitoring (SIHM) systems. As part of SIHM strategies, autonomous and continuous monitoring of building condition is crucial for disaster mitigation. The cost of expert inspection and continuous structural monitoring further underscores the need for a robust system that automates pattern recognition within the response signal. SHM systems vary in methods, tools and applications, and they offer an effective alternative to continuous building monitoring by experts.

Based on the detection region, structural damage detection methods can be classified as local or global [1, 2]. Nondestructive testing methods, including CT (computerized tomography) scanning and ultrasonic testing, are used for detecting damage in specific areas of the structure; however, these techniques require prior determination of the detection region [1, 3]. In contrast, vibration-based damage detection (VBDD) is a global damage detection approach that relies on changes in the dynamic characteristics of the structure, such as natural frequencies [4], mode shapes [5] and mode shape curvature [6]. Based on the data collection method, three main approaches can be categorised: time, frequency and modal domain. The time domain approach involves feature extraction from time-series responses [7]. Time-domain approaches have advantages over modal and frequency domain methods, as they do not require a domain transition, thereby avoiding information loss, and they do not require a complete structural model: they can operate on a partial model with limited measurements [7,8,9].

ANNs, as a subset of machine learning, can work with few and incomplete data and can be flexibly trained with selected input and output data, enabling efficient damage detection [10]. Previous studies have utilized ANNs to detect damage in seismically excited structures using data collected in the time and frequency domains. Kazemi et al. [11] utilized different machine learning (ML) algorithms, including random forest, boosting algorithms, support vector machines and artificial neural networks, to accelerate and improve seismic risk assessments of RC buildings. The study performed incremental dynamic analysis to obtain fragility, PDF (probability density function) and CDF (cumulative distribution function) curves; different input features, such as fundamental period, story weight and spectral acceleration, were fed to the ML models. The study concluded that three input features are relatively more important than the others: fundamental period, number of stories and spectral acceleration at the fundamental period Sa(T1). Vafaei et al. [12] employed a Multilayer Perceptron Neural Network (MLPNN) to predict the damage location and severity of the Kuala Lumpur Airport tower. Their method used acceleration responses measured at strategically selected locations for damage identification. The study used continuous wavelet transformation to decompose the signal, and principal component analysis (PCA) was employed to reduce the dimensionality of the wavelet transform modulus. The results showed that when noise of 5% or less was introduced to the data, the prediction error was less than 15%. Using the frequency response function (FRF), Ni et al. [13] conducted an experimental test on a 1/20-scale 38-level tall concrete structure model subjected to seismic excitation on a shaking table, generating damage levels from light, moderate and severe to complete (near-collapse) conditions. After applying seismic excitations at each level, a low-intensity random white-noise excitation of 20 min was applied and the responses collected. PCA was used for dimensionality reduction, and the PCA-compressed FRF data were then used as input to the neural networks. Their study showed that the PCA-compressed FRF identification results broadly agreed with the visual inspection of the seismic damage during the test. Despite the successes of the abovementioned studies using traditional NNs, the performance of neural networks depends heavily on the input features; issues such as sensitivity to noise can hinder the network's ability to achieve optimal results [14]. Additionally, the dense connectivity between layers in traditional neural networks can result in time-consuming training and may even lead to overfitting in some cases [3]. Moreover, traditional neural networks (TNNs) such as MLPNNs have shown the lowest damage detection performance compared with deep learning neural networks (DLNNs) [15, 16].

Deep learning neural networks (DLNNs) have garnered significant attention for their feature extraction ability [17]. Unlike traditional artificial neural networks (ANNs), which consist of two or three fully connected layers, DLNNs can automatically extract features from raw data, as their hierarchical multilayer architecture and layer mechanism result in higher abstraction and feature learning. The performance of traditional neural networks is highly dependent on handcrafted feature extractors and preprocessing tools; handcrafted approaches (mode shapes, MAC, COMAC, etc.) lack an automatic process and are therefore less efficient. Convolutional neural networks (CNNs) are a popular class of deep learning neural networks [18]. The CNN architecture comprises three main layer types: pooling layers (which help in dimensionality reduction), convolutional layers (which help extract features) and fully connected layers for classification purposes. Lin et al. [14] proposed using 1D CNNs to extract features directly from 20 s of raw data without manual feature extraction such as wavelet packet energy. The study established a finite element model of a simply supported beam that was randomly excited to generate the training and validation datasets. The CNNs obtained 94.57% accuracy in the noise-free situation, compared to 93.90% for the wavelet packet energy features. With white-noise injection, the DLNN algorithm achieved 86.99% accuracy, exceeding the 77.86% of the wavelet packet energy extractor. LSTMs are specialised time-series DLNNs and a subclass of RNNs that can extract long-term dependencies from time-step data and make predictions, unlike traditional RNNs, whose gradient vanishing or exploding problem hinders their ability to extract long-term dependencies. Fu et al. [19] developed a hybrid CNN-LSTM model in which the CNN captures high-dimensional features while the LSTM extracts time-series features from the acceleration of the structure. The study investigated the model using a numerical example of a large-span suspension bridge, and the hybrid model showed outstanding performance with 94% damage localisation accuracy. Lin et al. [20] examined an LSTM model on a beam structure in which the damage was simulated by reducing the beam depth; the LSTM model predicted the damage with an accuracy of 93.81%. Different machine learning and neural network algorithms (SVM, NB, DT, BPNN and LSTM) were examined, with validation accuracies of 49%, 52.23%, 55.98%, 86.40% and 93.81%, respectively.

2D CNNs are specialized in extracting features from images; as a result, damage quantity can be classified from spectrograms. Different studies have generated spectrograms using time-frequency analysis methods such as the fast Fourier transform [15], fast S-transform [21] and continuous wavelet transform [22, 23]. After converting the time-series sensory data into frequency-domain spectrograms, 2D CNNs were used to classify the damage in various structures, including composite plates, three-story plexiglass frame structures and concrete beam structures.

A limited number of studies have investigated deep learning algorithms for seismic-induced damage detection and the effectiveness of DL algorithms in extracting hidden features from time-series acceleration responses [15, 16, 24, 25]. Table 1 summarizes the studies investigating DLNNs for seismic-induced damage detection and their limitations. Yamashita et al. [24] investigated multi-classifier deep neural networks to identify damage patterns in braces installed in a steel frame subjected to seismic excitation. The DNN algorithm was evaluated using long-duration seismic responses and diverse input ground motions designed to replicate the Japanese building standard. Both the training and testing phases used experimental data to examine the performance of the DNNs. The results indicated an accuracy rate that surpassed 77% and reached 87.5%. However, the study also found that increasing the maximum acceleration in the training data had a negative impact on the accuracy of the damage detection output. Yu et al. [16] proposed using the lower-frequency portion of the signal, after conversion from a time series to the frequency domain using the fast Fourier transform, as it contains crucial information about the damage location. The smart structure was excited by the El Centro earthquake and control signals. The study also compared general regression neural networks and the adaptive neuro-fuzzy inference system, and it examined the influence of noise at three levels: 10, 5 and 2.5 dB S/N ratios. As the noise level increased from 10 to 5 and 2.5 dB, the prediction accuracy decreased. Different statistical coefficients were used to compare the networks, and the squared correlation coefficient reached 99.27% for the DCNNs. Dang et al. [15] studied a 2D steel frame subjected to earthquake ground excitation using three deep learning algorithms, long short-term memory (LSTM), 1D CNNs and 2D CNNs, together with traditional neural networks represented by MLPNNs. Monte Carlo simulation was utilized to generate a labelled database for training and validating the neural networks. The study adopted the measured vibration signals directly, without requiring the extraction of structural characteristics such as modal properties. The performance of the NNs was evaluated using statistical indexes such as the confusion matrix and F1-scores. The 2D CNNs outperformed the LSTM and 1D CNNs in accuracy; on the other hand, the 1D CNN showed better time and storage performance, making it suitable for real-time scenarios, as it performs better on cumulative big data with fewer computational resources [26]. The previously mentioned studies on seismic-induced damage detection did not consider the real-time applicability of the method, including examining the deep learning models on unseen datasets, providing a real-time algorithm to handle the dataset from data collection to labelling and damage quantification, and examining the earthquake characteristics that improve the performance of DLNNs.

Table 1 Summary of studies on seismic-induced damage detection using deep learning algorithms, with a comparative analysis of their limitations

This study proposes a new approach to detecting damage in structures subjected to seismic or vibration excitation using different deep learning neural network (DLNN) algorithms. The study investigates how time- and resource-efficient DLNN architectures and traditional neural networks are in handling seismic-induced damage detection, considering real-time application. To address this question, the study examines different deep learning architectures operating directly on time-series sensory data, without preprocessing tools, on different structures. The study considers the method's practicality, such as response time and performance, and real-time application criteria, such as noisy datasets, temperature effects and postearthquake retrofitting decision-making. The process of seismic-induced damage detection is automated, from data collection to DLNN damage detection and classification, using a ready-to-run Python algorithm. The study also examines the generalisation of DLNNs on unseen datasets of earthquake groups. A localisation algorithm is also coded to investigate the correlation between different aspects of the chosen earthquake records, such as record length and time steps, and the DLNNs' damage detection performance.

2 Methodology

This study aims to develop a new method that is able to capture the level of damage to a structure subjected to seismic or vibration excitation. The overview of the proposed method is shown in Fig. 1 and is organized as follows: (1) numerical simulation, in which nonlinear time history analysis is carried out to obtain structural dynamic response data (i.e., the acceleration response measured at each story); (2) a preprocessing stage that forms the damage matrix, using preprocessing tools that automate the process for further research conducted in the same environment using SAP2000 and TensorFlow; (3) development of the 1D-CNNs, which, as presented in Fig. 1, includes two stages: feature extraction and classification, the latter involving the dense layers and SoftMax activation function; (4) localization of each acceleration response signal to its floor number (from floor 1 to floor 7), including damage quantification, using the localization algorithm coded and examined in the Python environment. The 1D-CNN algorithm is then examined on the IASC-ASCE benchmark experimental dataset to demonstrate its practicality on experimental data. The one-damage matrix, or damage level fusion matrix (see Fig. 1), is formed by combining the time-domain dynamic responses (acceleration and displacement responses) extracted from nonlinear time history analysis into one matrix along with the corresponding damage levels. For example, each acceleration response for each floor and incremental dynamic loading is combined with its unique damage level into one damage matrix, which is eventually used to train and test the DLNN algorithms [3, 19, 20]. After the training process, the localization algorithm propagates the damage levels back to each floor as an automated process. A comparative analysis with different DLNNs (LSTM, MLPNNs, 2D CNNs (VGGNet) and DNNs) is also provided to highlight the effectiveness of this approach. Data science techniques are then used to divide the data into different segments with unique correlations, which can inform decisions about the structure or the NN algorithm. This segmentation of the data into different groups after the DLNN stage stems from big data approaches, in which the data are segmented into various groups to understand their correlation with factors that influence nonlinear time history analysis, such as earthquake duration, time steps and level number.

Fig. 1

The main procedure of damage detection through establishing the one damage level matrix and 1D-CNNs

2.1 Concrete Frame Structure

The first stage of this study involves modelling and simulating the 7-storey concrete frame structure (see Fig. 2). The frame structure is modelled and designed using the SAP2000 structural analysis software. According to ACI 318-14 [28], the dead and live loads added to the beams are 20 and 10 kN/m, respectively. The concrete compressive strength is 25 MPa, and the steel is grade 60 (\({f}_{y}\) = 420 MPa). The seismic design of the 7-story concrete frame structure follows the ASCE 7-16 seismic load provisions, assuming spectral accelerations at the 0.2 s and 1 s periods of 0.75 and 0.3, respectively. The study considered a medium-stiff soil class (D), a response modification factor R = 8, a system overstrength factor = 2 and a deflection amplification factor = 4, following ASCE 7-16 Table 12.2-1. The design outcome of the frame structure is two column sections, with cross-sectional sizes of 400 \(\times\) 600 mm for the first four floors and 300 \(\times\) 500 mm for the subsequent floors. The beam cross-sectional size is 300 \(\times\) 350 mm, the optimal section selected for the beams. The reinforcement details for the beam and column sections are presented in Fig. 3.

Fig. 2

Seven-story concrete frame structure with 3d view; all dimensions are in mm

Fig. 3

Typical beam and column sections for the 7-storey concrete frame structure

2.2 Steel Frame Structure

A steel structure was also modelled and simulated under incremental dynamic analysis (IDA) using nonlinear time history analysis [29] to test the deep learning algorithms on a different structural dataset. The steel frame structure is modelled and designed utilizing SAP2000. According to Eurocode 3-2005 [30], the dead and live loads added to the beams are 20 and 10 kN/m, respectively. The seismic design of the 7-story steel frame structure follows the ASCE 7-16 seismic load provisions, assuming spectral accelerations at the 0.2 s and 1 s periods of 0.75 and 0.3, respectively. The study considered a very dense soil and soft rock class (C), a response modification factor R = 3, a system overstrength factor = 2 and a deflection amplification factor = 4, following ASCE 7-16 Table 12.2-1. The design outcome of the frame structure under the dead and live loads, according to Eurocode 3-2005, is the following wide-flange column sections (nominal depth in inches and weight in lb/ft): W12 \(\times\) 62, W12 \(\times\) 72, W12 \(\times\) 58, W12 \(\times\) 45, W12 \(\times\) 40, W12 \(\times\) 35 and W12 \(\times\) 30. The beam sections are of two sizes, W12 \(\times\) 45 and W12 \(\times\) 40, the optimal sections selected for the beams. Figure 4 shows the frame width and height, and the 3D view shows the frame sections and the structural levels.

Fig. 4

Steel structure 3D view and elevation showing levels and sections

In order to extract acceleration response signals and their corresponding damage levels, the method involves subjecting the structure to different earthquakes varying in frequency content. Plastic hinges are assigned at the ends of the columns and beams of each structural joint to evaluate the performance level of the different structural elements. Hinges represent a member's localized force-displacement or moment-rotation relationships through its elastic and inelastic phases under seismic loads; for example, plastic hinges represent the moment rotation of structural elements during their elastic and inelastic phases. The hinge types from ASCE/SEI 41-17 Tables 10-7 to 10-9 inside the SAP2000 software are chosen depending on the section types and materials. M3-type plastic hinges are selected for beams to account for bending, while P-M2-M3-type plastic hinges are chosen for columns to account for the axial force and biaxial moment. In this study, four different datasets were utilized to examine the CNNs: the concrete frame structure dataset (with the seismic records of Sect. 2.3), the steel frame structure dataset (with the seismic records of Sect. 2.3), the experimental benchmark dataset and a concrete frame structure dataset generated with nonlinear time history analysis using the Chi-Chi TCU052 seismic record, which differs in its earthquake properties (e.g., Δt = 0.005 s, duration). All datasets were generated using nonlinear time history analysis except for the experimental benchmark dataset, which was obtained from shaker-induced vibration.

2.3 The Nonlinear Time History Analysis with the Hilber-Hughes-Taylor Method

The structure is then subjected to nonlinear time history analysis to obtain the dynamic responses (i.e., acceleration time history responses). For nonlinear dynamic analysis, the implicit direct integration method is considered the most effective way of solving the nonlinear equations of motion [31]. The Hilber-Hughes-Taylor (HHT) time integration method is utilized, applying direct time integration at each time step; it is applicable to the dynamic response of both linear and nonlinear systems [31].

The generalized algorithm with the acceleration increment, including the improvements made by Hughes [32], is given in Eqs. 1-2; it is explained here for the linear system to show the initial conditions and the output of the method [33]:

$$\begin{aligned} {F}_{a}& =\left({\widehat{\Theta }}_{1}\widehat{\eta }M+{\widehat{\Theta }}_{2}\widehat{\gamma }\Delta tC+{\widehat{\Theta }}_{3}\widehat{\beta }\Delta {t}^{2}K\right)\Delta a\\ & \quad +K\left[{d}_{a}+{\widehat{\Theta }}_{1}\Delta t{v}_{a}+\frac{1}{2}{\widehat{\Theta }}_{2}\Delta {t}^{2}{a}_{a}\right]\\ &\quad -{\widehat{\Theta }}_{1}\left({F}_{a+1}-{p}_{a}\right)+M{a}_{a}+C\left[{v}_{a}+{\widehat{\Theta }}_{1}\Delta t{a}_{a}\right]\end{aligned}$$
(1)
$$\begin{aligned} {s}_{a+1}&={s}_{a}+\Delta t{v}_{a}+\frac{1}{2}\Delta {t}^{2}{a}_{a}+\widehat{\beta }\Delta {t}^{2}\Delta a,\\ {v}_{a+1}&={v}_{a}+\Delta t{a}_{a}+\widehat{\gamma }\Delta t\Delta a,\\ {a}_{a+1}&={a}_{a}+\Delta a,\end{aligned}$$
(2)

where M is the mass matrix, C is the damping matrix, K is the stiffness matrix, \({a}_{a}\) is the acceleration, \({v}_{a}\) is the velocity, \({s}_{a}\) is the displacement and \({F}_{a}\) is the external load,

where

$$\hat{\Theta }_{1} = \hat{\Theta }_{2} = \hat{\Theta }_{3} = \left( {1 + \alpha } \right),\quad \hat{\beta } = \frac{1}{4}(1 - \alpha )^{2},\quad \hat{\gamma } = \frac{1}{2} - \alpha ,\quad \hat{\eta } = \frac{1}{1 + \alpha }$$

The initial conditions for the Hilber-Hughes-Taylor method are the mass, stiffness and damping matrices developed from modal analysis [34]; the method can then be solved for each time step. The modal analysis characteristics for each structure are summarized in Table 2.
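As an illustration, a minimal sketch of one HHT time step for a linear multi-degree-of-freedom system is given below, using the standard HHT-\(\alpha\) formulation consistent with the parameter definitions above; the function and variable names are illustrative and do not reflect SAP2000's internal implementation:

```python
import numpy as np

def hht_step(M, C, K, d0, v0, a0, p0, p1, dt, alpha=-0.1):
    """One Hilber-Hughes-Taylor step for M*a + C*v + K*d = p, with
    beta = (1 - alpha)^2 / 4 and gamma = 1/2 - alpha (alpha in [-1/3, 0]
    adds numerical damping to the higher modes)."""
    beta = 0.25 * (1.0 - alpha) ** 2
    gamma = 0.5 - alpha
    th = 1.0 + alpha                      # the (1 + alpha) weighting factor
    # Effective matrix multiplying the unknown acceleration a1
    A = M + th * gamma * dt * C + th * beta * dt**2 * K
    # Alpha-weighted external load minus forces at the predictor state
    rhs = (th * p1 - alpha * p0
           - C @ (v0 + th * (1.0 - gamma) * dt * a0)
           - K @ (d0 + th * dt * v0 + th * (0.5 - beta) * dt**2 * a0))
    a1 = np.linalg.solve(A, rhs)
    # Newmark corrector updates for displacement and velocity (cf. Eq. 2)
    d1 = d0 + dt * v0 + dt**2 * ((0.5 - beta) * a0 + beta * a1)
    v1 = v0 + dt * ((1.0 - gamma) * a0 + gamma * a1)
    return d1, v1, a1
```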

Table 2 Modal analysis of dynamic characteristic natural frequencies and time periods for different mode shapes and structures used in this study


2.4 The Selected Earthquake Records

The generalization of the proposed algorithm is influenced by the wide range of earthquake records needed to train the 1D-CNNs (Fig. 5). In order to capture the earthquake dynamic characteristics and provide a wide range of earthquake records, the main selection parameter used in this study is the peak ground acceleration to peak ground velocity ratio (A/V) [35]. Valuable information related to the frequency content and duration of strong earthquakes can be extracted from the A/V ratio, given its correlation with other factors such as M (earthquake magnitude) and R (epicentral distance). Using 45 earthquake acceleration response spectra at 5% damping, Tso et al. [35] examined the relation between frequency content and A/V, defining three significant subdivisions: the low range group with A/V < 0.8 g/(m/s), the intermediate group with 0.8 g/(m/s) ≤ A/V ≤ 1.2 g/(m/s) and the high range group with A/V > 1.2 g/(m/s). Their results indicate that for periods of 0.7 s or longer, the three groups have the same mean spectral acceleration; for short periods, however, the low-A/V group has the lowest mean spectral acceleration, and vice versa. The relationship between the A/V ratio, magnitude and epicentral distance can be stated as follows: near small and moderate earthquakes, the A/V ratio is high, while at large distances from small and moderate earthquakes, the obtained A/V values are low to intermediate. The ten selected earthquake records vary in A/V ratio, as summarized in Table 3, and their acceleration time histories are presented in Fig. 5.

Fig. 5

Acceleration time history of the selected earthquake record for nonlinear time history analysis

Table 3 Different earthquake records used for nonlinear time history analysis

2.5 Seismic Damage Level

The outcomes of the time history analysis are the acceleration responses and the damage level categories for the different joints according to FEMA 356 (Federal Emergency Management Agency) [36] and ASCE 41-17 [37]. Table 10-7 in ASCE 41-17 provides rotation values for reinforced concrete beams, while Tables 10-8 and 10-9 offer values for reinforced concrete columns at points a, b, c, IO, LS and CP in radians, as illustrated in Fig. 6 [33]. The variation in these points depends on factors such as the amount of transverse and longitudinal reinforcing steel, the material properties and the forces acting on the beam or column (axial and shear forces). Five key points, namely A, B, C, D and E, define the plastic hinge behaviour along the moment-rotation curve, serving as pivotal points for the backbone curve. In Fig. 6, line AB represents the linear elastic range, extending from the unloaded point A to the effective yield at point B. The subsequent segment from B to C denotes the inelastic but linear response characterized by reduced (ductile) stiffness. This phase is termed the nonlinear state of the hinges, with its limits defined from the 'immediate occupancy' level, where structural components experience minimal damage, to the collapse prevention (CP) level, where the structure sustains significant damage but remains standing.

Fig. 6

The plastic hinge performance level points on the moment-rotation curve of a typical plastic hinge

(B-IO) is considered safe or at the operational level.

(IO-LS) is considered immediate occupancy.

(LS-CP) is considered life safety.

(CP-C) is considered collapse prevention.

2.6 The Preprocessing and Forming of the Damage Matrix

Before processing the time-series acceleration data through the deep learning algorithm, the data must be arranged in a data structure recognizable by the DLNN algorithm: 1D CNNs recognize 1D signal data, while 2D CNNs extract features from 2D images. The different acceleration responses and their associated damage categories are therefore assembled as tensors of 2D arrays; for example, if the concrete frame structure dataset is R, its training dataset is \({R}^{140\times 900}\) and its validation dataset is \({R}^{560\times 900}\). For the steel frame structure dataset N, the training dataset is \({N}^{112\times 900}\) and the validation dataset is \({N}^{448\times 900}\). When the datasets for both structures are combined into one dataset to examine the two structures together, the one-damage matrix is \({Z}^{1260\times 900}\). The preprocessing stage forms the data exported from SAP2000 into the one-damage matrix. The data from SAP2000 were extracted as tables containing earthquakes, joint numbers and acceleration responses; all these data are essential for the data postprocessing techniques used to examine the different earthquake characteristics, as shown in Sects. 3.5 and 3.6. However, to form the one-damage matrix, only the acceleration responses from IDA and their damage labels are used. Different Python libraries, such as NumPy and Pandas, are used for the conversion. An automated algorithm that iterates over the different acceleration response signals is coded to form the required matrix and can be executed for different structures obtained from the same FEM software.
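As a minimal sketch of this assembly step (the file layout, column name and helper names are illustrative assumptions, not the actual SAP2000 export format):

```python
import numpy as np
import pandas as pd

def build_damage_matrix(csv_paths, damage_labels, n_samples=900):
    """Stack per-joint acceleration responses exported from the FE software
    into one (n_signals x n_samples) damage matrix plus its label vector."""
    signals = []
    for path in csv_paths:
        acc = pd.read_csv(path)["acceleration"].to_numpy()[:n_samples]
        if acc.size < n_samples:                     # zero-pad short records
            acc = np.pad(acc, (0, n_samples - acc.size))
        signals.append(acc)
    X = np.vstack(signals)     # e.g. (700, 900) for the concrete frame dataset
    y = np.asarray(damage_labels)   # 0 = IO, 1 = LS, 2 = Safe, 3 = CP
    return X, y
```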

2.7 Simulating Environmental Effect

Environmental effects can lead to misdiagnosis of damage in the structure: changes in temperature alter the acceleration responses and can therefore produce different damage diagnoses. In order to account for environmental effects, different temperature values are adopted within the nonlinear time history analysis process. The temperature is treated as a variable uniformly distributed between 10 °C and 35 °C, and the elasticity modulus is assumed to be temperature-dependent, as illustrated in Fig. 7 [38]. The collected acceleration responses are then used to form the one-damage matrix and tested separately along with their damage categories; the results are presented in Sect. 3.3.

Fig. 7

Young's modulus of the steel as dependent on temperature [38]

2.8 The Effect of the Noise

After collecting the time-series acceleration response signals and merging them into the one-damage matrix, which contains the acceleration responses without noise, the signal power is calculated as the mean of the squared elements of the original matrix. The noise power is then calculated as noise power (n) = signal power/SNR_linear.

SNR_linear is the linear value of the signal-to-noise ratio after converting it from decibels to linear scale.

$$\tilde{A} = X + N,\qquad N = \sqrt{n}\,Z,\qquad Z\in {R}^{a\times b}$$

where \(\tilde{A}\) is the noisy dataset, X is the original dataset, N is the Gaussian white noise, \(\sqrt{n}\) scales the noise to the required power and Z is a matrix of values drawn from the standard normal distribution with the shape of the original signal matrix \({R}^{a\times b}\), where a is the number of rows and b is the row length.

To examine the proposed algorithm under different environmental effects, different noise levels were used to pollute the one-damage matrix, with 10, 5 and 2.5 dB signal-to-noise (S/N) ratios. The error increased as the noise level increased: the highest accuracy was obtained by the model with a 10 dB S/N ratio, followed by the models with 5 dB and 2.5 dB S/N ratios.
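A minimal sketch of this noise injection, following the definitions above (function and variable names are illustrative):

```python
import numpy as np

def add_noise(X, snr_db):
    """Pollute the damage matrix X with Gaussian white noise at a target
    signal-to-noise ratio given in decibels."""
    signal_power = np.mean(X ** 2)           # mean of squared elements
    snr_linear = 10 ** (snr_db / 10)         # decibels -> linear scale
    noise_power = signal_power / snr_linear  # n = signal power / SNR_linear
    noise = np.sqrt(noise_power) * np.random.standard_normal(X.shape)
    return X + noise

# Noisy copies of the one-damage matrix at the three tested levels:
# X_10, X_5, X_2_5 = add_noise(X, 10), add_noise(X, 5), add_noise(X, 2.5)
```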

2.9 The Deep Learning Algorithms and Traditional Neural Architecture

2.9.1 Convolutional Neural Networks Layers

2.9.1.1 One-Dimensional Convolutional Layer

The one-dimensional convolutional layer performs a sliding operation along the input array (the damage matrix). In detail, a window of the input array is multiplied element by element with the kernel, and the products are summed, as presented in Eq. (3). The summed value plus the bias is fed into the activation function, here the ReLU activation function. The process is repeated along the array sequence (see Fig. 8), with the output again passed through a ReLU activation layer. The kernel has the same width as the input array. The kernel weights are the trainable parameters and are optimized by the training algorithm; the Adam optimizer is used, as it proved the most efficient training algorithm for the optimal CNN model among the training algorithms examined, such as gradient descent and adaptive gradient descent. Figure 8 presents the process inside the 1D convolutional layer; it shows integer values only, whereas the actual parameters used in the CNN models differ.

Fig. 8

The process of the 1D convolutional layer of the sequential acceleration array

$$F\left(n\right)= {\int }_{-\infty }^{\infty }X\left(i\right) R\left(n-i\right)di$$
(3)

In CNNs, \(X(i)\) is called the input function, \(R(n-i)\) is the kernel or filter, and the output of this process is the feature map. In the discrete case implemented in practice, the integral becomes a sum over the kernel window.
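As a toy illustration of this sliding multiply-accumulate (values chosen for readability only; note that deep learning libraries typically implement cross-correlation, i.e., the kernel is not flipped):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # toy acceleration sequence
k = np.array([0.5, 1.0, 0.5])                  # toy kernel (trainable in a CNN)
feature_map = np.convolve(x, k, mode="valid")  # -> array([4., 6., 8.])
```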

2.9.1.2 Rectified Linear Unit (ReLU) Layer

The activation process is an essential part of neural networks, transforming a linear mapping into a nonlinear one. A simple ReLU activation function has the form \(\text{f}\left(\text{x}\right)=\text{max}\left(0,\text{x}\right)\) (Fig. 9), and its derivative is zero for x < 0. It has the advantage that, on the right side of the function, the gradient is preserved rather than vanishing, which yields better training time than the sigmoid function. Despite these advantages, ReLU suffers from a significant disadvantage known as the dying ReLU problem: during training, some neurons output only zeros; they effectively die and give no output. To solve this, LeakyReLU [39], \({\text{f}}\left( {\text{x}} \right) = \max \left( {\alpha {\text{x}},{\text{x}}} \right)\) (see Eq. 4), is used. The hyperparameter \(\alpha\) applies for x < 0 and helps escape the dying ReLU problem through a nonzero gradient when the neurons become inactive. The LeakyReLU activation function is adopted in this study to avoid the major disadvantages of other activation functions.

Fig. 9

ReLU left and leaky ReLU right

$$f(x)=\left\{\begin{array}{ll} x & \text{if } x>0\\ \alpha x & \text{otherwise}\end{array}\right.$$
(4)
2.9.1.3 Batch Normalization (BN)

In neural network training, batch normalization is a specialized layer that removes the effect of the so-called internal covariate shift problem. In short, the internal activation distribution varies as the network weights change during training; this slows convergence, as the learning rate is forced to adapt to unstable distributions at each training step [40]. With a low computational cost, the BN layer overcomes this issue. In detail, a batch normalization layer can be placed after each layer as a strategy to reduce the danger of exploding gradients. For every batch of training data, the batch mean \(\widehat{\text{X}}\) and variance \({S.D}^{2}\) are calculated during training by evaluating the input statistics over the current mini-batch. Finally, the shifted data \({\widehat{\text{a}}}^{\left(\text{i}\right)}\) are assigned a weight \(\upgamma\) and bias \(\upbeta\). The calculation of the batch normalization layer is given in Eqs. 5-8.

$$\widehat{X}=\frac{1}{{m}_{B}}\sum_{i=1}^{{m}_{B}}{a}^{\left(i\right)}$$
(5)
$${S.D}^{2}=\frac{1}{{m}_{B}}\sum_{i=1}^{{m}_{B}}{\left({a}^{\left(i\right)}-\widehat{X}\right)}^{2}$$
(6)
$${\widehat{a}}^{\left(i\right)} =\frac{{a}^{\left(i\right)}-\widehat{X}}{\sqrt{{S.D}^{2}+\epsilon }}$$
(7)
$${z}^{\left(i\right)}=\gamma \otimes {\widehat{a}}^{\left(i\right)}+\beta$$
(8)

where

\({\widehat{\text{a}}}^{\left(\text{i}\right)}\) is the zero-centred vector input normalized over the instance i.

\(\widehat{\text{X}}\) is calculated input vector means over the entire mini-batch.

\(\text{S}.\text{D}\) is the input vector standard deviation.

\({\text{m}}_{\text{B}}\) is the number of instances in the mini-batch.

\(\upgamma\) is the scale parameter vector.

\(\otimes\) denotes element-wise multiplication.

\(\upbeta\) is the shift parameter (output offset for the input).

ε is a small number to avoid division by zero.

\({\text{z}}^{\left(\text{i}\right)}\) is the rescaled and shifted output version for the inputs.
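A minimal NumPy sketch of the training-time computation in Eqs. 5-8 (inference instead uses running statistics; names are illustrative):

```python
import numpy as np

def batch_norm(a, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch: zero-centre and rescale each
    feature, then apply the learned scale and shift."""
    mean = a.mean(axis=0)                     # Eq. 5: mini-batch mean
    var = ((a - mean) ** 2).mean(axis=0)      # Eq. 6: mini-batch variance
    a_hat = (a - mean) / np.sqrt(var + eps)   # Eq. 7: normalize
    return gamma * a_hat + beta               # Eq. 8: scale and shift
```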

2.9.1.4 Global 1D Pooling Layer

The pooling operation reduces the dimension of the feature maps; well-known examples of pooling layers are max pooling and average pooling. 1D average pooling is a downsampling technique that improves statistical efficiency and reduces computational effort. It is similar to the convolutional layer, but instead of matrix multiplication, it computes the mean of each feature map within its kernel size along the time-series direction. In the global variant used here, the output is reduced to a single number per feature map.

2.9.1.5 Fully Connected Layer

This can be considered the hidden layer of traditional neural networks. The process involves multiplying each input value \({x}_{i}\) by the weight \({w}_{ji}\) and summing the products with the bias, as shown in Eq. (9) and Step 4. The weights and biases are trainable parameters, and the ReLU activation function is used in this layer in this study.

$${y}_{j}=\varphi \left(\sum_{i} {w}_{ji}{x}_{i}+b\right)$$
(9)
2.9.1.6 Dropout Layer

The dropout layer [41] is an effective regularization technique against the overfitting problems experienced during training and validation of the CNNs. The process inactivates some neurons at every training step (excluding output neurons); all neurons are active again during validation. The hyperparameter probability "p", or dropout rate, with which a neuron is simply "dropped out" or ignored during training, is normally set between 10 and 50%. Introducing this technique improved the overall performance of the CNNs, boosting the validation accuracy of the optimal CNN model from 85% to 87% as overfitting was reduced.

2.9.1.7 SoftMax Output Layer

This is the final stage, as shown in Step 4, in which the different categories are classified. The classification is achieved by exponentiating each input and dividing by the sum of the exponentials of all inputs. The probability of each structural damage class is evaluated, and the structural damage condition with the highest probability is the final output, as shown in Eq. (10).

$${y}_{i}=\frac{\mathrm{exp}\left({x}^{\left(i\right)}\right)}{{\sum }_{j=1}^{n}\mathrm{exp}\left({x}^{\left(j\right)}\right)}$$
(10)

2.9.2 1D Convolutional Neural Network Architecture

Convolutional neural networks (CNNs), a deep learning algorithm (DLNN) and a subclass of feed-forward neural networks (FFNNs), are capable of extracting features from 1D signals or images and are mainly used in computer vision and pattern recognition problems. The convolutional stage in the training process is an operator that multiplies the values of a kernel function with the primary input function and aggregates the products over the interval, yielding a new function [42]. The three main components of CNNs are the convolutional, max-pooling and activation functions. The complexity of the hidden layers determines the neural network's ability to solve complex problems. Recent studies have focused on using CNNs for sequential pattern recognition. The CNN architecture comprises three main layer types: pooling layers (which help in dimensionality reduction), convolutional layers (which help extract features) and fully connected layers for classification purposes.

To classify the different input signals with their corresponding performance levels, the optimal CNN architecture is found to consist of four stacks of layers, each containing a Conv1D layer, a Leaky-ReLU layer, a batch normalization layer and a dropout layer, followed by a global average pooling 1D layer and a fully connected layer with SoftMax activation function (see Fig. 10). The preceding sections explain each layer's operation on the one-damage matrix. The evaluation metric is sparse categorical accuracy, and the loss is evaluated using sparse categorical cross-entropy. Different learning algorithms were examined, with the best performance given by the Adam optimization algorithm. The learning rate is 0.001, the number of epochs is 1000 and the batch size is 32. Early stopping is introduced, monitoring the validation loss.

Fig. 10

The architecture of the CNN in detail
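A compact Keras sketch of this architecture is given below; the filter counts, kernel size and dropout rate are illustrative assumptions (the paper's exact values are in Fig. 10), while the optimizer, loss, metric, epochs, batch size and early stopping follow the settings above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_1dcnn(n_samples=900, n_classes=4):
    model = models.Sequential([layers.Input(shape=(n_samples, 1))])
    for filters in (32, 64, 128, 256):          # four convolutional stacks
        model.add(layers.Conv1D(filters, kernel_size=3, padding="same"))
        model.add(layers.LeakyReLU())
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))
    model.add(layers.GlobalAveragePooling1D())
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["sparse_categorical_accuracy"])
    return model

# model.fit(X_train[..., None], y_train, epochs=1000, batch_size=32,
#           validation_data=(X_val[..., None], y_val),
#           callbacks=[callbacks.EarlyStopping(monitor="val_loss", patience=50)])
```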

2.9.3 Long Short-Term Memory (LSTM)

One of the main functionalities of the LSTM is the modelling of long-term dependencies and the generation of future predictions. The main motivation for the development of LSTMs is the gradient vanishing or exploding problem of recurrent neural networks, which prevents RNNs from capturing long-term dependencies [43]. In place of the traditional neuron cell, a memory cell is used (Fig. 11). LSTMs are capable of learning long-term dependencies in sequential data, and the different gates use different activation functions. The forget gate selects which information should be discarded through the sigmoid activation function [44]. The input gate determines which information needs to be stored; the hyperbolic tangent function is used to construct the candidate vector. Finally, the output gate produces the final output as the element-wise product of \({O}_{t}\), established through the sigmoid activation function, and the hyperbolic tangent of the cell state \({c}_{t}\), as explained in the following sequence of equations [44]:

Fig. 11

Long short-term memory architecture LSTM cell, LSTM architecture

(a) Forget gate (see Fig. 11a, forget process)

$${\text{g}}_{\text{t}}=\text{S}\left({\text{W}}_{\text{f}}\left[{\text{Z}}_{\text{t}-1},{\text{x}}_{\text{t}}\right]+{\text{C}}_{\text{g}}\right)$$
(11)

S represents the sigmoid activation function, \({Z}_{t-1}\) is the previous LSTM cell output, \({W}_{f}\) is the weight matrix and \({C}_{g}\) represents the bias quantity.

(b) Input gate

$${I}_{t}=S\left({W}_{i}\left[{Z}_{t-1},{x}_{t}\right]+{C}_{i}\right)$$
(12)
$${\widetilde{C}}_{t}=tanh\left({W}_{C}\left[{Z}_{t-1},{x}_{t}\right]+{c}_{c}\right)$$
(13)

where \({W}_{i}\) and \({W}_{C}\) represent the weight matrices, and \({C}_{i}\) and \({c}_{c}\) denote the bias terms.

(c) Update gate

$${C}_{t}={g}_{t}*{C}_{t-1}+{I}_{t}*{\widetilde{C}}_{t}$$
(14)

where * is the element-wise product. The element-wise product of \({C}_{t-1}\) and \({g}_{t}\) represents the portion of the previous cell state to be retained, and the element-wise product of \({I}_{t}\) and \({\widetilde{C}}_{t}\) represents the new information to be added.

(d) Output gate

$${\text{o}}_{\text{t}}=\text{S}\left({\text{W}}_{\text{o}}\left[{\text{Z}}_{\text{t}-1},{\text{x}}_{\text{t}}\right]+{\text{C}}_{\text{o}}\right)$$
(15)
$${\text{Z}}_{\text{t}}={\text{o}}_{\text{t}}*\text{tanh}\left({\text{C}}_{\text{t}}\right)\text{,}$$
(16)

The LSTM is examined in this study to provide a comparative analysis against the best-performing deep learning models; the results are presented in Sect. 3.1. The evaluation metric is sparse categorical accuracy, and the loss is sparse categorical cross-entropy. The optimization algorithm is Adam, the total number of epochs is 1000, the batch size is 32 and the learning rate is 0.001. Early stopping is introduced, monitoring the validation loss. The architecture comprises three blocks of LSTM layers with memory cells increasing gradually per block (64, 128, 512), each followed by batch normalization and dropout layers. These three LSTM blocks are followed by three fully connected layers: two dense layers with 256 and 128 neurons and ReLU activation, and one dense layer with SoftMax activation.
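A Keras sketch of this comparative LSTM model (the dropout rate is an assumed value; the remaining settings follow the text):

```python
from tensorflow.keras import layers, models

def build_lstm(n_samples=900, n_classes=4):
    model = models.Sequential([layers.Input(shape=(n_samples, 1))])
    units_per_block = (64, 128, 512)            # three LSTM blocks
    for i, units in enumerate(units_per_block):
        last = i == len(units_per_block) - 1
        model.add(layers.LSTM(units, return_sequences=not last))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.3))
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["sparse_categorical_accuracy"])
    return model
```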

2.9.4 Two-Dimensional Convolutional Neural Networks (2D CNNs)

In order to provide a comparison with the other DLNN algorithms, 2D convolutional neural networks were examined on the spectrograms of the signals after conversion into the frequency domain. The spectrogram is the visual representation of the spectrum of a signal as it varies with time. The study applied a low-pass filter up to the maximum signal frequency. The fast Fourier transform is applied to convert the whole dataset to the frequency domain, using Hamming windows with a window length of 512 samples. The spectrogram of the first signal in the dataset is presented in Fig. 12. Each of the 700 images derived from the one-damage matrix is fed to the 2D CNNs as a 3D tensor of shape \({R}^{100\times 70\times 4}\), where 100 \(\times\) 70 is the spectrogram dimension and 4 is the number of colour channels. The chosen 2D CNN is the well-known VGG convolutional neural network with five blocks and 21 layers (see Fig. 13) [45]. The best training algorithm found among those examined (Adam, SGD, Vcorps) is SGD with a learning rate of 0.001 and momentum of 0.9.
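A sketch of the time-frequency conversion feeding the 2D CNN (the sampling rate fs and the overlap are assumed values; the Hamming window of 512 samples follows the text):

```python
import numpy as np
from scipy.signal import spectrogram

def to_spectrogram(signal, fs=200.0):
    """Convert one acceleration record into a dB-scaled spectrogram image."""
    f, t, Sxx = spectrogram(signal, fs=fs, window="hamming",
                            nperseg=512, noverlap=256)
    return 10 * np.log10(Sxx + 1e-12)   # dB scale for visualisation
```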

Fig. 12

Spectrogram representation of the first signals from the 7-storey concrete structure datasets

Fig. 13

The examined 2D convolutional neural networks (VGGNets)

2.9.5 Misclassified/Truly Classified Signal Arrays

Improving the effectiveness of this study in the real-time scenario requires a further localization task and an investigation of the misclassified acceleration signals (Figs. 14 and 15); an algorithm that compares the truly classified and misclassified acceleration response signals with the original damage matrix is utilized for this purpose. In order to deal with the big data involved in the different matrices, the process utilizes libraries such as Pandas, Matplotlib and NumPy. After the damage is quantified, its location at each level is unknown, so an automated process is needed to identify it. The basic principle of the localization algorithm is the comparison y_pred[i] (predicted labels) == y_val[i] (validated labels): the algorithm separates the signals into two arrays, the truly classified signals and the misclassified signals (see Fig. 15). When the prediction matches, the algorithm picks the sequence and the associated level number. This algorithm is helpful for two tasks: localizing the truly classified acceleration signals and their corresponding damage levels to the associated floors automatically, without manual comparison and localization, and studying the wrongly classified signals by appending them into one matrix. These misclassified acceleration response signals are investigated for their correlation with other parameters, such as their frequencies on each floor and the earthquake record characteristics (time steps and record length; see Sects. 3.5 and 3.6).
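A minimal sketch of this comparison step (array names follow the description above; the floor bookkeeping is illustrative):

```python
import numpy as np

def localize(y_pred, y_val, levels):
    """Split signals into truly classified and misclassified sets and map
    each back to its floor number (levels[i] is the floor of signal i)."""
    correct = np.asarray(y_pred) == np.asarray(y_val)
    truly_classified = [(lvl, lab) for lvl, lab, ok
                        in zip(levels, y_pred, correct) if ok]
    misclassified = [(lvl, lab) for lvl, lab, ok
                     in zip(levels, y_pred, correct) if not ok]
    return truly_classified, misclassified
```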

Fig. 14

Improving the effectiveness of this study in the real-time scenario with an automated algorithm for localization and studying the truly/misclassified damage levels

Fig. 15

The main principle of the searching and appending process of the localization algorithms

2.9.6 Examples of Decision-Making on Retrofitting

Postearthquake decisions can be made based on the predictive model, considering different criteria such as the load path and the state of each joint. The structural performance levels according to FEMA 356 [36], from immediate occupancy to collapse prevention, are defined as follows. Immediate occupancy (IO) means that the structural elements remain safe to occupy and only minimal structural damage has occurred; some minor structural repair may be appropriate. Collapse prevention, by contrast, means that the structural elements continue to support gravity loads, but repairing the structure may not be technically practical. Life safety is a postearthquake damage level at which structural components are damaged but retain a margin against partial or total collapse; it should be possible to repair the structural damage to the elements. The examples are articulated to show the damage levels, and decisions on retrofitting the existing building postearthquake can be made based on FEMA 356 [36] after the DLNNs predict the performance level for damage control (see Fig. 16).

Fig. 16

(a) Example of a structure safe for occupation after an earthquake, according to FEMA 356 [36] damage control of structural performance, (b) example of a structure after an earthquake with minor structural repairs at the second and third floors, (c) example of a structure where repair is not technically practical, with a collapse prevention damage level shown at levels 2, 3, 4 and 7

2.9.7 Experimental Dataset

To examine the CNN algorithm with experimental data, the study uses the University of British Columbia IASC-ASCE benchmark dataset (Fig. 17) [46]. The structure, established as a benchmark SHM problem, measures 2.5 \(\times\) 2.5 m in plan and 3.6 m in height, with hot-rolled grade 300W steel members. The structural columns are B100 \(\times\) 9 sections and the beams S75 \(\times\) 11. The mass distribution is 1000 kg on each of the first and second floors and 750 kg on each of the third and fourth floors. The structure was excited with an electrodynamic shaker, with 81.6 kg of mass attached to the body of the shaker to add to the structure's total mass. The shaker's mechanical properties are as follows:

Fig. 17

Different damage scenario configurations for IASC-ASCE experimental model at the University of British Columbia

311 N maximum capacity, 19 mm stroke and 2.5 m/s achievable velocity.

Four accelerometer types were allocated to the structure, and a linear variable displacement transformer (LVDT) was utilized to measure the shaker mass displacement relative to the structure [46]. A series of tests was conducted on the structure to generate various damage scenarios, simulated by removing structural braces or by loosening bolts at the beam-column connections. In case A, the braces are all in place without any removal, i.e., the normal state of the structure. For case B and all subsequent tests, additional mass is added to the structure.

For the shaker experiment considered for examining the CNNs algorithm, nine different damage scenario configurations were conducted as follows (Fig. 17):

Configuration 1: Fully braced scenario without any brace removal or bolt loosening.

Configuration 2: All bracing on the east side removed.

Configuration 3: Braces on all floors of the southeast corner bay removed.

Configuration 4: Braces on the first and fourth floors of the southeast corner bay removed.

Configuration 5: Braces on the first floor of the southeast corner bay removed.

Configuration 6: All braces on the east face and the second-floor braces on the north face removed.

Configuration 7: All braces removed.

Configuration 8: All bolts on the first to fourth floors loosened.

Configuration 9: All bolts on the first and second floors loosened.

The acceleration data from the different levels are collected, preprocessed and combined across the different scenarios. Three damage levels, safe, partial and fully damaged, are assigned per scenario to remain consistent with the damage levels of the numerical study. These damage levels are as follows: where no damage is introduced at the sensor location, the state of the joint is healthy (safe); where the braces are removed at the sensor location, the acceleration responses are labelled fully damaged; and where a bolt is loosened at the joint where sensors are installed, partial damage is assigned at this location. The acceleration responses are combined into the one-damage matrix with three damage levels to train and validate the proposed CNNs and to examine the algorithm on structures with different natural frequencies and stiffnesses.

2.9.8 t-SNE: t-Distributed Stochastic Neighbour Embedding for Reduced 2D Maps of the High-Dimensional Concrete Frame Structure Acceleration Data

To get a better understanding of the data, the t-SNE technique is employed [47]. The high-dimensional data considered in this study are the acceleration data of the concrete frame structure; t-SNE is employed to create a 2D or 3D map of these data (see Fig. 18). The t-SNE visualization technique preserves both the global and the local structure of the data. Its basic principle is that points that are nearby in the original high-dimensional space remain close in the low-dimensional space, and vice versa. Stochastic Neighbour Embedding (SNE) works by converting high-dimensional Euclidean distances into conditional probabilities that represent similarities.
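A minimal sketch of producing such a map with scikit-learn (the perplexity and initialisation are assumed defaults, not values reported by the study):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.standard_normal((700, 900))   # stand-in for the one-damage matrix rows
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)
# A scatter of emb coloured by predicted damage level reproduces maps
# like Fig. 18, e.g. plt.scatter(emb[:, 0], emb[:, 1], c=y_pred)
```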

Fig. 18

TSNE algorithm for different level of noise for examining the concrete frame structure: (a) 2.5 dB, (b) 5dB, (c) 10dB, (d) without noise

The 2D maps show the predicted damage levels for the tested data, numbered 0 to 3, where 0 is immediate occupancy, 1 is life safety, 2 is safe and 3 is collapse prevention. The reduced 2D maps have two axes, t-SNE axis-1 and t-SNE axis-2. The t-SNE technique is implemented to find the best representation of the relationships in the original high-dimensional dataset. The visualization maps show the clustering of the categories predicted by the CNNs in different colours. With no noise introduced, the testing data show no overlap between the different clusters, and each cluster occupies a distinct position on the t-SNE maps, except for one point from the safe category that lies within a different category.

In order to show how the algorithm handles noise, the t-SNE algorithm was also applied to the noisy datasets; the results are presented in Fig. 18. At the higher noise levels in Fig. 18a and b, clear gaps appear between points belonging to the same cluster, indicating noise, specifically at the 0-IO level. In the lowest-noise scenario, one point appears in another cluster; overall, for the noisy scenarios, the t-SNE algorithm shows less ability to form compact clusters than for the original noise-free dataset (large gaps between points of the same cluster).

3 Results and Discussion

3.1 The Results of the Examined Concrete Frame Structure and IASC-ASCE Benchmark Dataset

The study examines the 7-storey concrete frame structure with different NN architectures and highlights the method's accuracy in predicting and classifying damage levels, as shown in Fig. 19. The model is also examined on the experimental IASC-ASCE benchmark dataset to show its robustness in classifying different damage with the associated damage levels; the results are presented in Table 5. A comparative analysis is given in Fig. 19 and Tables 4 and 5 to show the effectiveness of the method compared to the other algorithms on different criteria (time efficiency and overall prediction accuracy). Figure 19 shows the confusion matrix results for the 1D-CNNs, LSTM, 1-layer MLPNNs, 2-layer MLPNNs, 2D CNNs and DNNs. The MLPNNs with one and two layers present the lowest prediction accuracies. On the time efficiency scale, the DNNs are the most time-efficient, followed by the CNNs; the 1- and 2-layer MLPNNs and the LSTM follow, while the 2D CNNs showed the lowest time efficiency at 615 s (see Table 4). On the overall accuracy scale, the 1D-CNNs showed the best validation accuracy, followed by the DNNs with an accuracy of 85.7% and the LSTM with an overall validation accuracy of 80%. Although the concrete frame structure and the IASC-ASCE benchmark have different natural frequencies and time periods, the CNNs were able to classify the damage with high cross-validation accuracy. However, due to the lack of labelled data in the IASC-ASCE benchmark dataset, the algorithm showed only reasonable accuracy, as shown in Tables 4 and 5.

Fig. 19

Confusion matrices for the different deep learning and traditional neural network architectures evaluated on the 7-storey concrete frame structure dataset: (a) CNNs, (b) LSTM, (c) 1-layer MLPNNs, (d) 2-layer MLPNNs, (e) DNNs, (f) 2D CNNs

Table 4 Model precision, recall and F1-score for the damage-level classification
Table 5 Model training, testing accuracy and training time

The damage-matrix method has proven very effective for damage classification with high accuracy, especially in massive-dataset scenarios. The influence of noise on the numerical model, examined at different noise levels to represent environmental effects, is highlighted in Sect. 3.2. The result is bounded by the localization algorithm's performance and its robustness in accurately assigning the various signals to their respective joints and structural levels. Additionally, the algorithm's ability to distinguish between correctly classified and misclassified damage levels significantly influences the obtained results. The algorithm was further examined under an environmental variable, namely temperature variation, on the 7-storey steel structure, and the validation accuracy reached 80%, as presented in Sect. 3.3. To further examine the one-damage matrix, the study combines two structures with different physical properties into a single damage matrix. The results, presented in Sect. 3.4, show that NNs learn by being exposed to diverse examples, which improves their generalization. A specific earthquake group, comprising acceleration responses from the Chi-Chi earthquake, was selected to explore the relationship between validation accuracy and earthquake characteristics such as duration and time interval. The validation accuracy of the 1D-CNNs for that earthquake group is illustrated in Fig. 25.

Three evaluation metrics (Eq. 18) are applied to the 7-storey concrete frame structure and the IASC-ASCE Benchmark dataset to evaluate the robustness of the CNN model in classifying signals into their associated damage categories. For a specific class, True Positive (TP) is the number of outcomes where the model correctly predicts that class. True Negative (TN) is the number of cases where the acceleration time series comes from a different class and the model correctly avoids assigning that class. A False Positive (FP) occurs when the model wrongly assigns an input to a category it does not belong to, and a False Negative (FN) occurs when the model fails to assign an input to the category it does belong to. Precision (P), recall (R) and F1-score are the metrics used to evaluate the model: precision measures how accurate the model is when it assigns inputs to a specific category, recall measures how completely the model retrieves all inputs belonging to that category, and the F1-score combines precision and recall. The results for the 7-storey concrete frame structure using the 1D CNNs are presented in Fig. 19 and Table 4. The labels 0, 1, 2 and 3 represent the damage scenarios IO, LS, S and CP, respectively. The correctly classified acceleration signals for the four classes are 50, 10, 37 and 23. The overall accuracy of the model is calculated with Eq. 20. These results are for the actual frame structure without any noise addition or extraction of specific earthquake groups. The correlation of the mispredicted damage levels with each earthquake group and with the incremental nonlinear time history loading is analysed separately in Sect. 3.6 (Fig. 24).

The per-class accuracy of the 1D-CNN algorithm, in the order S, IO, LS and CP, is 94.87%, 82%, 76.9% and 85%, as calculated with Eq. 19; accuracy is slightly higher for the lower damage classes (S and IO). To test the efficiency of the 1D-CNNs, other traditional (shallow) neural networks, a deep learning algorithm (LSTM) and deep neural networks were also tested on the 7-storey concrete frame structure datasets; the comparative results across time efficiency and overall prediction accuracy are those summarized above in Fig. 19 and Tables 4 and 5.

$$P=\frac{TP}{TP+FP}$$
(18)
$$R=\frac{TP}{TP+FN}$$
$${F}_{1}=\frac{2}{\frac{1}{P}+\frac{1}{R}}$$
$$\text{Accuracy for Class } S=\frac{45}{55}=81.8\%$$
(19)
$$\text{Accuracy for Class } IO=\frac{44}{49}=89.8\%$$
$$\text{Accuracy for Class } LS=\frac{12}{16}=75\%$$
$$\text{Accuracy for Class } CP=\frac{20}{20}=100\%$$
$$\text{Overall accuracy}=\frac{TP+TN}{TP+FP+FN+TN}=86.4\%$$
(20)
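As a worked illustration of Eqs. 18-20, the sketch below computes the per-class precision, recall, F1-score and overall accuracy from a confusion matrix; the matrix values are placeholders and do not reproduce Fig. 19 exactly.

```python
# Minimal sketch of Eqs. 18-20 applied to a confusion matrix
# (rows = true class, columns = predicted class); placeholder values.
import numpy as np

cm = np.array([[44,  3,  1,  1],   # IO
               [ 2, 12,  2,  0],   # LS
               [ 3,  1, 45,  1],   # S
               [ 0,  0,  0, 20]])  # CP

TP = np.diag(cm)                 # correct predictions per class
FP = cm.sum(axis=0) - TP         # predicted as the class but wrong
FN = cm.sum(axis=1) - TP         # missed members of the class

P = TP / (TP + FP)               # Eq. 18: precision
R = TP / (TP + FN)               # Eq. 18: recall
F1 = 2 / (1 / P + 1 / R)         # Eq. 18: F1-score

overall_accuracy = TP.sum() / cm.sum()   # Eq. 20
print(P, R, F1, overall_accuracy)
```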

3.2 The Influence of Noise on the Performance of the 7-Storey Concrete Frame Structure

Figure 20 shows the effect of noise on the model's accuracy in predicting damage to the concrete frame structure. The optimal model is the one with the highest signal-to-noise ratio of 10 dB; its training and testing accuracies are the highest among the noisy models. The model can still distinguish the damage severity across the four labels: S, IO, LS and CP. As the noise level increases, the model performance decreases, reflecting the effect of noise on the algorithm's identification ability. The datasets with higher noise levels, 5 dB and 2.5 dB S/N, tend to give results similar to each other.
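A minimal sketch of one common way to inject white Gaussian noise at a prescribed signal-to-noise ratio is shown below; the implementation details and the placeholder signal are assumptions, not the study's exact procedure.

```python
# Minimal sketch: corrupt an acceleration signal with zero-mean white
# Gaussian noise at a target S/N ratio in dB (an assumed implementation).
import numpy as np

def add_noise(signal, snr_db, seed=0):
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR = Ps / Pn
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(np.linspace(0, 10, 900))   # placeholder 900-point signal
noisy_10dB = add_noise(clean, 10)         # the three levels examined here
noisy_5dB = add_noise(clean, 5)
noisy_2p5dB = add_noise(clean, 2.5)
```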

Fig. 20
figure 20

The confusion matrix for different levels of noise: (a) 10 dB, (b) 5 dB, (c) 2.5 dB

Comparing the different noise levels, the 10 dB model achieves the highest accuracy (Table 6). The testing accuracy for the 10 dB S/N model is 79.2%, about 5% lower than the model without noise. The 2.5 dB model shows a testing accuracy of 69.9% with less training time than the 10 dB model. Similarly, the 5 dB model has the lowest accuracy among all the models and the shortest training time (see Tables 6 and 7).

Table 6 Model with different noise levels and their corresponding training and testing accuracies
Table 7 Noise-level S/N precision, recall and F1-score

3.3 The Influence of Environmental Variables on the CNNs Algorithm Performance of the 7-Storey Steel Frame Structure

Figure 21 shows the performance of the 1D-CNN algorithm when tested under the temperature effect described in Sect. 2.7. The acceleration response data used for the testing groups result from the nonlinear time history analysis conducted with the San Fernando seismic record, as presented in Table 3. The testing accuracy demonstrates the ability of the CNNs to detect the damage level even under temperature variation from 10 °C to 35 °C. The testing accuracy is around 80%, supporting the practicality of the deep learning algorithm for real-time operation under varying environmental effects.

Fig. 21
figure 21

The confusion matrix for testing the temperature effect on the steel frame structure

3.4 The Result of Examining Synthesized Datasets of Two Structures' Acceleration Responses (Concrete and Steel Frame Structures) in One Damage Matrix

To bring the study closer to a real-time scenario, another factor was used to examine the CNN's performance with the one-damage-matrix: its ability to distinguish acceleration responses when structures that vary in their modal and material properties are combined. The two structures were merged into one damage matrix used to train and test the neural networks. The CNNs learn from exposure to diverse examples, and the combined result exceeded that of either structure trained alone with the same algorithm. Figure 22 and Table 8 show the confusion matrix for the testing of the two combined structures, with a validation accuracy of 91%. This result shows that the one-damage matrix works effectively when different structures are added to the same dataset and demonstrates the application's performance as a real-time system. A further examination addresses generalization to earthquake groups not present in the dataset: the numerical model was analysed with nonlinear time history analysis for different earthquake groups, the resulting acceleration time histories were used to test the trained model, and the agreement between the predictions and the numerical model reached 90% across the different damage levels.
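The following is a minimal sketch of assembling the one-damage-matrix from the two structures' response arrays; the shapes match the dataset sizes reported in the conclusion (700 concrete and 560 steel signals of 900 points each), but the zero-filled arrays are placeholders.

```python
# Minimal sketch: stack two structures' response arrays into one damage matrix.
import numpy as np

X_concrete = np.zeros((700, 900))     # placeholder concrete responses
y_concrete = np.zeros(700, dtype=int) # placeholder 0-3 damage labels
X_steel = np.zeros((560, 900))        # placeholder steel responses
y_steel = np.zeros(560, dtype=int)

# one damage matrix: rows are signals, labels are the 0-3 damage classes
X = np.vstack([X_concrete, X_steel])        # shape (1260, 900)
y = np.concatenate([y_concrete, y_steel])

# shuffle before the train/test split so both structures appear in each set
perm = np.random.default_rng(0).permutation(len(y))
X, y = X[perm], y[perm]
```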

Fig. 22
figure 22

The confusion matrix of the combination of two structures into the one-damage-matrix

Table 8 The training and testing accuracy of the combined structure model and the training time

3.5 The Result of the Localization Algorithm for the Mispredicted Damage Levels and their Correlation with the Floor Number

The main principle behind classifying the data by floor is to establish a correlation between the model's per-class accuracy and the floors on which its mispredicted damage levels occur. A decision could then be made for each group (e.g., changing the training algorithm for the mispredicted instances). For the 7-storey frame structure without noise, the localization algorithm reports the misclassified damage levels and their frequency of occurrence in the validation and testing sets on each floor (Fig. 23). When no noise was injected into the damage matrix, the frequencies of the misclassified damage levels correlated more strongly with the lower floor numbers (Fig. 23a): 75% of the mispredicted damage levels were localized on the base, first, third and fourth floors. When noise was injected at 10 dB, 5 dB and 2.5 dB S/N ratios, the localization algorithm captured no correlation with the floor numbers (Fig. 23b).
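A minimal sketch of the bookkeeping behind this localization step is shown below: misclassified test samples are counted per floor using an assumed metadata array (`floor_of_sample`) that maps each signal to its floor; all values are placeholders.

```python
# Minimal sketch: count misclassified test samples per floor.
import numpy as np
from collections import Counter

y_true = np.array([0, 1, 2, 3, 2, 0])           # placeholder true labels
y_pred = np.array([0, 2, 2, 3, 0, 0])           # placeholder predictions
floor_of_sample = np.array([0, 1, 3, 4, 1, 6])  # assumed floor index per signal

wrong = y_true != y_pred
misses_per_floor = Counter(floor_of_sample[wrong])
print(misses_per_floor)   # e.g. Counter({1: 2}): most misses on floor 1
```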

Fig. 23
figure 23

The mispredicted damage levels and their frequencies on each floor for (a) the model without noise and (b) the models with different S/N ratios

3.6 The Results of the Localization Algorithm for the Mispredicted Damage Levels and their Correlation with the Earthquake and the Incremental Dynamic Loading

Figure 24 presents the mispredicted damage levels of the 7-storey concrete frame structure, without noise injection, for each earthquake and for the incremental nonlinear time history loading. The segmentation by earthquake type and incremental loading is conducted to study the effect of each earthquake acceleration response group on the ability of the CNN algorithm to predict the labels correctly. The three earthquakes whose acceleration response signals the CNN algorithm misclassified most are the Long Beach earthquake, the El Centro Site Imperial Valley earthquake and the San Fernando earthquake. The results showed that the more densely sampled the seismic record, such as the Chi-Chi earthquake with 18,000 record points over 90 s (Δt = 0.005 s), the more accurate the CNN algorithm in predicting the damage class (Fig. 24a). For the incremental dynamic loading, the study showed a higher correlation between the mispredicted damage levels and the higher dynamic loading (> 1 g), which accounts for 80% of the mispredicted damage levels.

Fig. 24
figure 24

Mispredicted damage levels for (a) each earthquake group and (b) incremental dynamic loadings

To further examine the 1D CNNs' prediction accuracy for each earthquake group, the Chi-Chi earthquake was examined in isolation with the same incremental dynamic nonlinear time history analysis procedure. The CNN result showed 94.14% validation and testing accuracy (Fig. 25), exceeding the 87% validation accuracy obtained when the model was trained on all earthquake records. The Chi-Chi result highlights the correlation between earthquake characteristics, such as duration and time interval, and the validation accuracy. The reason behind the improvement is that the time step in the Hilber-Hughes-Taylor (HHT) method is critical for the accuracy of direct integration, and the Chi-Chi group has the smallest input ground motion time step, Δt = 0.005 s. Although the implicit direct integration method is theoretically unconditionally stable for any time step in multi-degree-of-freedom systems, a smaller time step gives more accurate time integration; however, an excessively small time step increases the computational time considerably with no significant improvement [48].
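To illustrate this time-step sensitivity, the sketch below implements a Newmark-β integrator for a single-degree-of-freedom system (the HHT-α scheme reduces to Newmark average acceleration for α = 0); it is a didactic example with placeholder system parameters, not the study's SAP2000 analysis.

```python
# Newmark-beta direct integration of m*a + c*v + k*d = p(t) for an SDOF
# system; beta = 1/4, gamma = 1/2 is the unconditionally stable average-
# acceleration scheme (HHT-alpha with alpha = 0).
import numpy as np

def newmark_sdof(m, c, k, p, dt, beta=0.25, gamma=0.5):
    n = len(p)
    d, v, a = np.zeros(n), np.zeros(n), np.zeros(n)
    a[0] = (p[0] - c * v[0] - k * d[0]) / m
    k_eff = k + m / (beta * dt**2) + c * gamma / (beta * dt)
    for i in range(n - 1):
        rhs = (p[i + 1]
               + m * (d[i] / (beta * dt**2) + v[i] / (beta * dt)
                      + (1 / (2 * beta) - 1) * a[i])
               + c * (gamma * d[i] / (beta * dt)
                      - (1 - gamma / beta) * v[i]
                      - dt * (1 - gamma / (2 * beta)) * a[i]))
        d[i + 1] = rhs / k_eff
        v[i + 1] = ((gamma / (beta * dt)) * (d[i + 1] - d[i])
                    + (1 - gamma / beta) * v[i]
                    + dt * (1 - gamma / (2 * beta)) * a[i])
        a[i + 1] = ((d[i + 1] - d[i]) / (beta * dt**2)
                    - v[i] / (beta * dt) - (1 / (2 * beta) - 1) * a[i])
    return d, v, a

# compare dt = 0.005 s with a coarser step on a harmonic load
for dt in (0.005, 0.02):
    t = np.arange(0, 5, dt)
    d, _, _ = newmark_sdof(m=1.0, c=0.1, k=100.0, p=np.sin(5 * t), dt=dt)
    print(dt, d.max())   # the finer step tracks the response more closely
```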

Fig. 25
figure 25

Training and validation accuracy and loss of Chi-Chi earthquake acceleration response groups

In this study, the integration time step was set equal to or smaller than the input ground motion time step (Δt = 0.005 s or Δt = 0.01 s). Ebeling et al. [49] compared the accuracy of the different step-by-step direct integration methods against the exact solution for a single-degree-of-freedom system across different input ground motion time steps; the methods are also applicable to multi-degree-of-freedom systems. Their results showed higher accuracy for Δt = 0.005 s than for the other time steps when compared with the exact solution. The agreement between the present results and this finding underlines the accuracy of the chosen time step.

The CNN method was chosen over time series neural networks such as the LSTM and over the 2D-CNNs for its time efficiency, especially as the data grow and accumulate in a real-time scenario; this is borne out by the comparative analysis in Sect. 3.1. The study uses the time series sensory data as direct input instead of converting it to 3D time-amplitude or 4D time-frequency-amplitude representations, as the latter require higher computational power and considerably more training time, which matters in real-time scenarios, as demonstrated in Sect. 2.9.4. Considering the conversion time and the processing cost of 3D or 4D data, those techniques are time-consuming and less efficient in real time. The method likewise feeds the time series sensory data directly to the CNN algorithm without a domain change, even though the frequency domain could yield slightly higher accuracy; the deciding considerations are the overall effectiveness of the method and the need for a rapidly responding system in natural disasters.
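A minimal sketch of such a direct time-series pipeline in Keras is shown below; the layer sizes and hyperparameters are illustrative assumptions, not the study's exact architecture.

```python
# Minimal sketch: raw 900-point acceleration signals fed directly to a
# 1D CNN that outputs the four damage classes (IO, LS, S, CP).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(900, 1)),          # raw time series, no handcrafted features
    tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"), # IO, LS, S, CP
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X[..., None], y, validation_split=0.2, epochs=50)
```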

The concept of the damage matrix has proven highly effective, as shown by the results from the different NNs and the localization algorithm. Considering ten earthquakes with 20 incremental dynamic loadings up to 2 g and appending the acceleration responses to the damage matrix proved time- and storage-efficient, and it simulates the real-time scenario in which the data flow into one matrix from which the damage to the structure is recognized.
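The following sketch outlines how such a damage matrix could be assembled; `run_nonlinear_tha` and `fema_damage_label` are hypothetical stubs standing in for the SAP2000 analysis and FEMA 356 labelling steps.

```python
# Minimal sketch: assemble the damage matrix from incremental dynamic
# analysis, 10 earthquakes x 20 scale increments up to 2 g.
import numpy as np

def run_nonlinear_tha(record, scale):
    """Stub: replace with the nonlinear time history analysis (SAP2000)."""
    return np.zeros(900)                    # placeholder 900-point response

def fema_damage_label(record, scale):
    """Stub: replace with the FEMA 356 damage classification."""
    return 0                                # 0=IO, 1=LS, 2=S, 3=CP

records = [f"eq_{i}" for i in range(10)]    # ten earthquake records
scales = np.linspace(0.1, 2.0, 20)          # twenty increments up to 2 g

rows, labels = [], []
for rec in records:
    for s in scales:
        rows.append(run_nonlinear_tha(rec, scale=s))
        labels.append(fema_damage_label(rec, scale=s))

X = np.asarray(rows)                        # the one damage matrix
y = np.asarray(labels)                      # associated damage levels
```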

4 Conclusion

A novel DCNN-based method was developed to quantify the damage in different structures subjected to seismic and vibration excitation. Unlike the traditional neural networks and deep learning methods used in the literature, this approach operates directly on the time series data without requiring preprocessing tools or handcrafted feature selection, which makes it highly effective in real-time and autonomous scenarios. The method was trained on a large dataset of 1260 arrays of time series acceleration data for the combined datasets, 700 arrays for the concrete structure and 560 arrays for the steel structure, with each array containing a signal of 900 points.

The influence of noise on the numerical 7-storey concrete frame structure was examined at three levels: 10 dB, 5 dB and 2.5 dB. When the DLNN algorithm was tested on the acceleration response data from the Chi-Chi earthquake records alone, the validation accuracy improved to 94.14%. The study further examined the CNNs on a combination of the steel and concrete structures within the same dataset; the result exceeded the accuracy of each structure examined alone, reaching 91% validation accuracy. Another real-time aspect examined is the ability to work on seismic records not present in the dataset, which showed 90% agreement with the numerical models. To compare the prediction accuracy of different deep learning NNs against the 1D-CNNs, including traditional MLPNNs, LSTM, DNNs and 2D CNNs, the study provides a comprehensive comparative analysis on the 7-storey concrete frame structure. The results demonstrate the potential of applying these DLNN algorithms to structures with varying natural frequencies and time periods, as shown by the time series acceleration data from the IASC-ASCE Benchmark dataset and the seven-storey steel frame structure, which yielded reasonable accuracies.

The method achieved testing and validation accuracies of 87% for the concrete frame structure, with confusion matrix scores of 85% in precision, recall and F1-score. The concluding remarks and results can be summarized in the following points:

  • The algorithm achieved testing and validation accuracies of 93% for the synthesized (combined) datasets.

  • On temperature effects, examined with the steel frame structure dataset, the algorithm achieved 80% validation accuracy.

  • For the noisy datasets at the three noise levels (10 dB, 5 dB and 2.5 dB S/N), the validation accuracies are 79%, 69.9% and 69.9%, respectively.

  • Comparative analyses were also performed on the LSTM, MLPNNs (with different layers), 2D CNNs (VGG) and DNNs, with validation accuracies of 80%, 80%, 75% and 86.6%, respectively.

  • The 1D CNNs were further examined on the experimental IASC-ASCE Benchmark dataset, and the validation accuracy was 74%.

  • To localize the different damage levels and to study the influence of properties such as earthquake characteristics, structure height and incremental dynamic loading (IDL) on DLNN performance, a localization algorithm was coded, and an investigation using segmentation approaches was conducted.

  • A specific earthquake group (the Chi-Chi earthquake) was also investigated to assess the direct impact of the seismic record on the DLNNs; the result showed an improvement to 94%, with seismic properties such as time step and duration having a significant impact on DLNN performance.

  • The study examined the DLNNs' generalization to unseen earthquake records; the result showed 90% agreement with the numerical results.

  • The study demonstrates postearthquake retrofitting with three examples, according to FEMA 356.

The results demonstrate the performance of different DLNNs and TNNs on different structures, with the direct use of time series sensory data proving best in terms of time, resources and accuracy. The study investigates real-time factors such as noise, temperature, synthesized datasets and generalization to unseen earthquake records, and it demonstrates the significant implications of an autonomous seismic-induced monitoring system for postearthquake retrofitting decisions.

5 Future Research

Future research could investigate the generalization of the DLNNs to unseen datasets and to changes in the scale and topology of the structure. For example, if the structure is changed to a 5-storey structure while the width and height remain the same, how well can the NNs generalize? Other factors include how broadly the dataset covers structures with properties similar to those of the monitored model, as well as the scalability of the method to larger structures, including infrastructure, in terms of the time and resource demands of the collected big data and how to handle this flux of data while maintaining an efficient process.

This study provides guidance on how segmentation approaches improve model accuracy when correlated with earthquake characteristics such as time interval and record length. Further investigation could apply segmentation and data science approaches to correlate the performance of different deep learning models (2D-CNNs, LSTMs and DNNs) with earthquake characteristics such as frequency content and with structural properties (e.g., storey heights and structure width and height). An automated algorithm for preprocessing the labelled data will be provided to facilitate training in the same environment where the FEM models are established in SAP2000, with the extracted acceleration and displacement sensory responses fed to the DLNN algorithm within the TensorFlow environment.