1 Introduction

Structural health monitoring (SHM) of a building structure during its lifetime, or after a seismic event, has gained increasing attention in the engineering field. Over the past decades, vibration-based damage detection techniques have been studied extensively and have found increasing applications in civil and mechanical engineering. Most damage detection methods involve data processing to explore changes in the dynamic properties of structures, such as vibration frequencies (Farrar et al. 2001; Roux et al. 2014; Vidal et al. 2014) and mode shapes (Maia et al. 2003; Zhu et al. 2011; Rucevskis et al. 2016), which are directly related to the stiffness reduction caused by structural damage. For practical applications, both approaches require exciting the building at high frequencies, which is not easy to achieve, and therefore the damage may go unnoticed. Along this line, a considerable body of research (Doebling et al. 1998; Zou et al. 2000; Carden and Fanning 2004; Fan and Qiao 2011; Pau and Vestroni 2013) provides extensive reviews of vibration-based damage identification methods, discussing the advantages and limitations of the different methods under this approach. For recent reviews of vibration-based damage detection, readers can consult Das et al. (2016) and Kong et al. (2017). The accuracy of these methods depends on a large number of sensors, and they may be biased by measurement noise (Rahai et al. 2007).

Other studies, such as Loh et al. (2011), suggest that during earthquakes building structures exhibit nonlinear and hysteretic behavior. Within this context, Farrar et al. (2007) point out that under the cyclic excitation associated with earthquakes, degradation of a structure manifests itself in the evolution of the associated hysteresis loop. The underlying idea is that the plastic strain amplitude is related to the number of cycles to failure and can be represented by means of stress–strain loops (Ma et al. 2004). Similarly, Ikhouane et al. (2005) report that structural damage caused by an earthquake may be due to excessive deformations, or it may take the form of accumulated damage sustained under repeated load reversals. Along this line, Chatzi et al. (2010) present a review of damage detection methods in which damage-sensitive data features are based on the nonlinear system response. In Ceravolo et al. (2013), the Bouc–Wen hysteretic model is employed to identify physical parameters such as stiffness degradation, strength deterioration and hysteresis behavior of a reinforced concrete frame, to be used as a safety index for the seismic assessment of RC buildings. Moreover, an extensive body of research on vibration-based nonlinear system identification for damage detection can be found in Bursi et al. (2013). Applications of the Bouc–Wen model and online identification to full three-dimensional scale steel–concrete structures can be found in Shan et al. (2016) and Wan et al. (2018). It is important to note that when the nonlinearity is known, nonlinear identification may reasonably follow a parametric approach and the estimated response matches the experimental data. Otherwise, the estimation is not always achieved and parametric convergence cannot be guaranteed, due to measurement noise, offset and uncertainty. The most common way to mitigate measurement noise and offset is to use a band-pass filter. However, prior knowledge of the system bandwidth is required to obtain satisfactory results; otherwise, correct filtering of the signal cannot be guaranteed, producing undesired estimates.

Regarding uncertainty arising during experiments, data collection, the measurement process or the determination of initial values, Arqub et al. (2016) propose a new method to numerically solve fuzzy differential equations based on the reproducing kernel as a tool to model several real physical phenomena under possibilistic uncertainty. This method yields more accurate approximations, especially in nonlinear cases. In the same research direction, a new efficient iterative algorithm for computing analytic and approximate solutions of second-order, two-point fuzzy boundary value problems, using the reproducing kernel Hilbert space method under the assumption of strongly generalized differentiability, is investigated in Arqub et al. (2017). Similarly, in Arqub and Abo-Hammour (2014) continuous genetic algorithms are employed to numerically approximate solutions of linear and nonlinear systems of second-order boundary value problems. Reported results show that these methods are fast, accurate and very effective, with great potential in mathematical and engineering applications. However, they have not been evaluated for the damage detection task.

Regarding nonlinear system identification, the artificial neural network approach has been widely used to characterize structure-unknown nonlinear systems. The neural network framework offers a rigorous basis for system identification, mainly because this approach does not require a mathematical model of the system. Neural networks also overcome the problem of parameterizing nonlinear systems, and their structure can be modified for each case (Chen et al. 1990). An exhaustive review of these methods can be found in Sohn et al. (2003). However, their application to physical systems is not always robust and accurate, and some methods demand long time histories from the undamaged structure and intensive data processing, which is not always easy to achieve. Recently, rapid advances in computational power have led to the use of deep learning techniques such as convolutional neural networks (CNN) (LeCun et al. 1989) as a promising tool. The differences between a classical neural network (NN) and a CNN are:

  1. A CNN includes at least one convolutional layer, in which units are not connected to all units in the previous layer, as in a fully connected layer, but only to units near them; that is, their receptive field is small.

  2. The filters of a CNN are shared within the same convolutional layer. This reduces the number of parameters to be trained, and certain properties or features of the input data can be detected regardless of their location in the data.

  3. A CNN also includes subsample or pooling layers. These layers reduce the size of the data as it passes through the network, allowing deeper architectures because they do not require trainable parameters. This characteristic is especially beneficial because it reduces the number of units in the neural network.

Therefore, CNNs have become popular, especially for image classification, with broad usage in fields such as the automotive sector, industry, medicine and robotics. Satisfactory results under this approach are reported in Kim (2014) and Simard et al. (2003), which use CNNs for sentence classification and document analysis, as well as for face recognition (Lawrence et al. 1997), road sign detection and classification (Bouti et al. 2018), Chinese license plate recognition (Liu et al. 2018b), bearing defect classification (Appana et al. 2018) and the ImageNet LSVRC-2010 contest (Krizhevsky et al. 2012). Moreover, positive results have also been reported for detection tasks using CNNs. An accurate lithography hotspot detection framework based on a CNN is addressed in Shin and Lee (2016), obtaining better results and higher performance on the ICCAD 2012 dataset, as well as a time reduction compared with optical simulation methods and SVM. Among newer CNN applications, we find automated identification of abnormal EEG signals (Yıldırım et al. 2018), Alzheimer's disease detection using magnetic resonance images and CNNs (Vu et al. 2018), a real-time ozone concentration prediction system (Eslami et al. 2019) and failure detection in rotating machines (Udmale et al. 2019; Ma et al. 2019). Another application of CNNs that has arisen in recent years is in civil engineering. CNNs applied to buildings are well suited to damage assessment because they require minimal signal processing and extract features automatically for fault diagnosis. In Cha et al. (2017), a convolutional neural network is used as a classifier over images of concrete cracks to determine damage. In Lin et al. (2017), a convolutional neural network is used simultaneously as a classifier for damage detection from low-level sensor data and as a feature detector. A one-dimensional CNN for real-time damage detection is proposed in Abdeljaber et al. (2017); the authors use one CNN per joint, so that damage can be detected and located quickly. Following this line, Modarres et al. (2018) propose a convolutional neural network based on a computer vision approach for automated inspection, identifying the presence and type of structural damage directly from images. Atha and Jahanshahi (2018) evaluate corrosion assessment on metallic surfaces using different convolutional neural networks and images. Other applications and variations of CNNs for specific damage detection contexts can be found in Zhao et al. (2019) and Liu et al. (2018a); in the latter, several algorithms with applications to rotary machines are presented. Note that most damage diagnosis methods based on the CNN approach have reported satisfactory results in the analysis of images and are generally developed in the time domain. Since the CNN incorporates random filters in its design, it reduces measurement noise.

However, convolution in the time domain is a computationally demanding operation that can require more computation time than other algorithms. Moreover, if the random filters do not completely eliminate measurement noise and offset, the estimate can be biased with respect to the real data. An alternative that avoids these problems is the frequency domain CNN (FDCNN), which adds a spectral pooling layer to reduce the measurement noise. The detailed reasons for using a frequency domain CNN to estimate the hysteretic displacement are:

  1. CNNs are popular for image classification, with broad usage spanning the automotive sector, industry, medicine, robotics and other fields. In the frequency domain, the convolution operation is replaced by an element-wise product, which reduces the number of operations. This advantage is reflected in the training stage, because the algorithm must perform many of these operations at each iteration. The CNN requires minimal signal processing and extracts features automatically for fault diagnosis, which makes it well suited to damage assessment of buildings.

  2. The FDCNN avoids the memory growth of the traditional CNN-based approach because it avoids the convolution stage.

  3. The FDCNN does not require any assumption on the type and location of the structural nonlinearity.

  4. The FDCNN does not require a preprocessing stage; it learns automatically and directly from the vibration data and eliminates the noise components of the signal while preserving the system response, which makes it robust for the identification task.

In the past, several research projects have been funded to improve damage detection methods, including the use of innovative signal processing, new sensors and control theory. This paper follows these new research directions and uses an FDCNN to learn features directly from the frequency content of vibration signals for damage detection in a building structure. The damage detection method is based on dissipated energy. Since an earthquake introduces several stress cycles in different directions in the structure, load-strain curves can be used as an indicator of damage. To represent these phenomena, a Bouc–Wen model is used, which is estimated through the frequency domain CNN. It has been documented that the CNN has outstanding performance as a classifier but, to the authors' knowledge, there are no reported works in which an FDCNN is used for system identification. The objectives of the paper are:

  1. We use the frequency domain CNN to model the hysteretic displacement from vibration data. Then, we apply the hysteretic displacement to the damage diagnosis.

  2. Since measurement noise and offset affect identification systems, we use the FDCNN to overcome them. The combination of frequency-domain random filters and spectral pooling mitigates the effect of measurement noise in the identification process. The robustness of the proposed algorithm is therefore evaluated.

The main result of this paper is to show that the FDCNN has advantages over its time domain counterpart and the NN when measurement noise is present in the data, as follows:

  1. The FDCNN overcomes the nonlinear parameterization of the identification system, which is generally difficult to achieve. It can automatically extract the most important damage-sensitive characteristics from acceleration signals.

  2. The proposed algorithm is an alternative to existing identification methods and is robust to high-frequency measurement noise. In the frequency domain, the convolution stage is replaced by an element-wise product, which reduces the computational complexity as well as the execution time.

  3. The proposed method avoids long time histories from the undamaged structure and intensive data processing. Moreover, it can work at both large and small scales, depending on the number and location of sensors. In this paper, an intermediate-scale approach is taken, focusing on damage detection at the storey level.

The structure of the paper is the following: First, the mathematical model of a building structure and the Bouc–Wen hysteretic model are presented in Sect. 2. The architecture of the proposed frequency domain convolutional neural network (FDCNN) for the system identification task is described in Sect. 3, together with a frequency analysis of a convolutional layer and a sensitivity analysis of the FDCNN to noisy data. Section 4 contains the experimental results obtained with a reduced-scale two-storey building prototype to investigate the damage detection capability. Moreover, a comparison between a neural network (see “Appendix C”), a time domain CNN (see “Appendix B”) and the proposed FDCNN is carried out to evaluate the performance of the proposed method. Finally, a summary of the findings is provided in Sect. 5.

2 Mathematical model of building structure

The dynamics of a multiple-degree-of-freedom (MDOF) shear building structure subject to seismic activity are described by

$$\begin{aligned} M\ddot{x}(t)+C\dot{x}(t)+\mathcal {K}x(t)=-Ml\ddot{x}_g(t) \end{aligned}$$
(1)

where

$$\begin{aligned} x(t)&=\{ x_{1}(t),x_{2}(t),\ldots ,x_{n}(t) \}^{T} \in \mathfrak {R}^{n\times 1}, \end{aligned}$$
(2)
$$\begin{aligned} \dot{x}(t)&=\{ \dot{x}_{1}(t),\dot{x}_{2}(t),\ldots ,\dot{x}_{n}(t) \}^{T} \in \mathfrak {R}^{n\times 1}, \end{aligned}$$
(3)
$$\begin{aligned} \ddot{x}(t)&=\{ \ddot{x}_{1}(t),\ddot{x}_{2}(t),\ldots ,\ddot{x}_{n}(t) \}^{T} \in \mathfrak {R}^{n\times 1}, \end{aligned}$$
(4)
$$\begin{aligned} l&=\{ 1,1,\ldots ,1 \}^{T} \in \mathfrak {R}^{n\times 1}, \end{aligned}$$
(5)
$$\begin{aligned} \ddot{x}_{a}(t)&=\ddot{x}(t)+l\ddot{x}_{g}(t) \in \mathfrak {R}^{n\times 1} \end{aligned}$$
(6)

The term n indicates the number of floors; the entries \(x_{i}(t)\), \(\dot{x}_{i}(t)\) and \(\ddot{x}_{i}(t)\), with \(i=1,2,\ldots ,n\), are the relative displacement, velocity and acceleration of each floor, respectively, measured with respect to the basement. The signal \(\ddot{x}_a(t)\) represents the absolute acceleration, and \(\ddot{x}_{g}(t)\) is the ground acceleration induced by the earthquake, distributed by the influence vector l. Moreover, M, \(\mathcal {K}\) and C are the mass, stiffness and damping matrices, respectively, defined as

$$\begin{aligned} M=&\begin{bmatrix} m_1 &{} 0 &{}\cdots &{} 0 \\ 0 &{} m_2 &{}\cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} m_n \end{bmatrix}>0 \in \mathfrak {R}^{n\times n}\end{aligned}$$
(7)
$$\begin{aligned} C=&\begin{bmatrix} c_1+c_2 &{} -c_2 &{}\cdots &{} 0 \\ -c_2 &{} c_2+c_3 &{}\cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} c_n \end{bmatrix}\ge 0 \in \mathfrak {R}^{n\times n} \end{aligned}$$
(8)
$$\begin{aligned} \mathcal {K}=&\begin{bmatrix} \kappa _1+\kappa _2 &{} -\kappa _2 &{}\cdots &{} 0 \\ -\kappa _2 &{} \kappa _2+\kappa _3 &{}\cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} \kappa _n \end{bmatrix}>0 \in \mathfrak {R}^{n\times n} \end{aligned}$$
(9)

where parameters \(c_{i}\) and \(\kappa _{i}\) are, respectively, the lateral column damping and stiffness between the ith and \((i-1)\)th storey.
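To make the elastic model (1) concrete, the following minimal sketch integrates it for a two-storey structure using SciPy; the masses, stiffnesses, damping values and ground acceleration used here are illustrative placeholders, not the prototype parameters identified later.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative two-storey parameters (hypothetical values, not the prototype's)
m1, m2 = 2.0, 2.5                      # kg
k1, k2 = 12000.0, 12000.0              # N/m
c1, c2 = 2.0, 2.0                      # N s/m

M = np.diag([m1, m2])
K = np.array([[k1 + k2, -k2],
              [-k2,      k2]])
C = np.array([[c1 + c2, -c2],
              [-c2,      c2]])
l = np.ones(2)

# Ground acceleration: a short sine burst standing in for an earthquake record
xg_dd = lambda t: 0.5 * np.sin(2 * np.pi * 2.0 * t) * (t < 5.0)

def rhs(t, s):
    x, v = s[:2], s[2:]
    # Eq. (1): M*xdd + C*xd + K*x = -M*l*xg_dd
    a = np.linalg.solve(M, -M @ l * xg_dd(t) - C @ v - K @ x)
    return np.concatenate([v, a])

sol = solve_ivp(rhs, (0.0, 10.0), np.zeros(4), max_step=1e-3)
print("peak interstorey drift:", np.max(np.abs(sol.y[1] - sol.y[0])))
```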

Note that the damping of the building structure is represented by the Rayleigh model (Chopra 1995), defined by

$$\begin{aligned} C=a_{0}M+a_{1} \mathcal {K} \end{aligned}$$
(10)

where the Rayleigh parameters \(a_{0}\) and \(a_{1}\) are calculated using two eigenfrequencies \(\omega _{i}\) and \(\omega _{j}\) (here the first and third), from the following expression

$$\begin{aligned} \frac{1}{2} \begin{bmatrix} \frac{1}{\omega _{i}} &{} \omega _{i}\\ \frac{1}{\omega _{j}} &{} \omega _{j} \end{bmatrix} \begin{bmatrix} a_{0}\\ a_{1} \end{bmatrix}= \begin{bmatrix} \xi _{i}\\ \xi _{j} \end{bmatrix} \end{aligned}$$
(11)

where \(\xi _{i}\) and \(\omega _{i}\), with \(i=1,2,\ldots ,n\), are the damping ratio and the vibration frequency of the ith structural mode, respectively. Note that model (1) assumes that the building structure is undamaged and operates in its elastic range.
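As a brief illustration of (10) and (11), the following sketch solves the 2×2 system for \(a_{0}\) and \(a_{1}\), assuming two modal frequencies and a 2% damping ratio, as quoted later for the prototype; the values are used here only as an example.

```python
import numpy as np

# Hypothetical modal data: frequencies in Hz and 2% damping for both modes
f_i, f_j = 1.758, 4.0
w_i, w_j = 2 * np.pi * f_i, 2 * np.pi * f_j
xi_i = xi_j = 0.02

# Eq. (11): 0.5 * [[1/w_i, w_i], [1/w_j, w_j]] @ [a0, a1] = [xi_i, xi_j]
A = 0.5 * np.array([[1.0 / w_i, w_i],
                    [1.0 / w_j, w_j]])
a0, a1 = np.linalg.solve(A, np.array([xi_i, xi_j]))

print(f"a0 = {a0:.4f}, a1 = {a1:.6f}")   # C = a0*M + a1*K, Eq. (10)
```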

Remark 1

Initially, the building is at rest, that is, \(x(0)=\dot{x}(0)=\ddot{x}(0)=0\). Moreover, the ground acceleration is zero before an earthquake, i.e., \(\ddot{x}_g=0\).

Remark 2

Acceleration measurements of each storey and basement are available, and they are affected by offset and high-frequency measurement noise.

$$\begin{aligned} \ddot{x}_{m}&=\ddot{x}+\varsigma +\lambda \end{aligned}$$
(12)
$$\begin{aligned} \ddot{x}_{gm}&=\ddot{x}_g+\varsigma _g +\lambda _g \end{aligned}$$
(13)

where \(\ddot{x}_m=[\ddot{x}_{1m}\ \ddot{x}_{2m}\ \ldots \ \ddot{x}_{nm}]\) is the measured acceleration vector, \(\ddot{x}_{gm}\) is the measured ground acceleration, \(\varsigma =[\varsigma _1 \ \varsigma _2 \ldots \ \varsigma _n]\) and \(\varsigma _g\) are measurement offsets, and \(\lambda =[\lambda _1 \ \lambda _2\ \ldots \ \lambda _n]\) and \(\lambda _g\) are high-frequency measurement noises. For ease of notation, \(\ddot{x}_m\) will be written as \(\ddot{x}\) throughout the article.

Remark 3

Assuming that the building is damaged, a nonlinear degradation term that relates the stress–strain behavior to damage is introduced in (1):

$$\begin{aligned}&M(\ddot{x}+l\ddot{x}_g)+C\dot{x}+T\rho (x,z)=0 \end{aligned}$$
(14)
$$\begin{aligned}&T=diag \begin{bmatrix} 1,&1,&\ldots ,&1 \end{bmatrix}\end{aligned}$$
(15)
$$\begin{aligned}&\rho (x,z)= \begin{bmatrix} \rho (x_{1},z_{1}),&\rho (x_{2},z_{2}),&\ldots ,&\rho (x_{n},z_{n}) \end{bmatrix}^{T} \end{aligned}$$
(16)

where the nonlinearity \(\rho (x,z)\) is represented using the smooth hysteretic Bouc–Wen model (Wen 1976)

$$\begin{aligned} \rho (x_{i},z_{i})=&\alpha _{i} \kappa _{i}x_{i}+(1-\alpha _{i})\kappa _{i} z_{i} \end{aligned}$$
(17)
$$\begin{aligned} \dot{z}_{i}=&\frac{A_{i}\dot{x}_{i}-\nu _{i}(\beta _{i} |\dot{x}_{i}||z_{i}|^{\sigma _{i}-1}z_{i}-\gamma _{i} \dot{x}_{i}|z_{i}|^{\sigma _{i}})}{\eta _{i}} \end{aligned}$$
(18)

where the subscript \(i=1,2,\ldots ,n\) refers to the floor number; \(\alpha \), \(\kappa \) and \(\gamma \) are the post-yield stiffness ratio, the pre-yield stiffness and the yield deformation, respectively, whereas \(z_{i}\) is the hysteretic displacement of the nonlinear shear building. Generally, \(\beta \) and \(\gamma \) are called loop parameters and affect the size of the loop, whereas \(\sigma >1\) influences the smoothness of the hysteresis loop. Moreover, \(\nu \) and \(\eta \) are strength and stiffness degradation functions of the dissipated hysteretic energy, respectively, defined as (Ma et al. 2006)

$$\begin{aligned} \eta _{i}(E_{i})=&1.0+\delta _{\eta ,i}E_{i} \end{aligned}$$
(19)
$$\begin{aligned} \nu _{i}(E_{i}) =&1.0+\delta _{\nu ,i}E_{i} \end{aligned}$$
(20)

where \(\delta _{\eta }\) and \(\delta _{\nu }\) are the stiffness and strength degradation ratios, respectively. Generally, these are nonnegative and unknown parameters that must be estimated.

Remark 4

A convenient measure of degradation as a result of structural damage is the energy dissipated in the structural hysteresis cycle, measured from \(t=0\) to t,

$$\begin{aligned} E_{i}(t)=\int ^{t}_{0}z_{i}\dot{x}_{i}\text {d}t \end{aligned}$$
(21)
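The following minimal sketch integrates (18)–(21) with an explicit Euler scheme for a single storey; the Bouc–Wen parameters and the velocity history are purely illustrative assumptions, not identified values.

```python
import numpy as np

# Illustrative (hypothetical) Bouc-Wen parameters for one storey
A_bw, beta, gamma, sigma = 1.0, 0.5, 0.5, 2.0
d_eta, d_nu = 0.002, 0.002                       # degradation ratios
dt, T = 0.005, 25.0
t = np.arange(0.0, T, dt)

x_dot = 0.01 * np.sin(2 * np.pi * 1.7 * t)       # assumed storey velocity history

z, E = 0.0, 0.0
z_hist, E_hist = [], []
for xd in x_dot:
    eta = 1.0 + d_eta * E                        # Eq. (19)
    nu = 1.0 + d_nu * E                          # Eq. (20)
    # Eq. (18), integrated with a simple explicit Euler step
    z_dot = (A_bw * xd
             - nu * (beta * abs(xd) * abs(z) ** (sigma - 1) * z
                     - gamma * xd * abs(z) ** sigma)) / eta
    z += z_dot * dt
    E += z * xd * dt                             # Eq. (21), dissipated energy
    z_hist.append(z)
    E_hist.append(E)

print("final hysteretic displacement:", z_hist[-1],
      "dissipated energy:", E_hist[-1])
```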

Note that the system described by (14) and (18) can be rewritten as a set of nonlinear differential equations subject to the external force

$$\begin{aligned} m_{1}\ddot{x}_{1}+c_{1}\dot{x}_{1}+\rho (x_{1},z_{1})&=m_{1}\ddot{x}_{g} \end{aligned}$$
(22)
$$\begin{aligned} m_{2}\ddot{x}_{2}+c_{2}\dot{x}_{2}+\rho (x_{2},z_{2})&=m_{2}\ddot{x}_{g} \end{aligned}$$
(23)
$$\begin{aligned}&\vdots \nonumber \\ m_{n}\ddot{x}_{n}+c_{n}\dot{x}_{n}+\rho (x_{n},z_{n})&=m_{n}\ddot{x}_{g} \end{aligned}$$
(24)

equivalent to

$$\begin{aligned} m_{1}\ddot{x}_{1}+c_{1}\dot{x}_{1}+ \alpha _{1} \kappa _{1}x_{1} +(1-\alpha _{1}) \kappa _{1} z_{1}&=m_{1}\ddot{x}_{g} \end{aligned}$$
(25)
$$\begin{aligned} m_{2}\ddot{x}_{2}+c_{2}\dot{x}_{2}+ \alpha _{2} \kappa _{2}x_{2}+(1-\alpha _{2}) \kappa _{2}z_{2}&=m_{2}\ddot{x}_{g} \end{aligned}$$
(26)
$$\begin{aligned}&\vdots \nonumber \\ m_{n}\ddot{x}_{n}+c_{n}\dot{x}_{n}+ \alpha _{n} \kappa _{n}x_{n}+(1-\alpha _{n}) \kappa _{n} z_{n}&=m_{n}\ddot{x}_{g} \end{aligned}$$
(27)

Since the parameters and the internal state \(z_{i}\) of the Bouc–Wen hysteretic model (18) are unknown, both must be estimated as

$$\begin{aligned} \dot{{\hat{z}}}_{i}=&\frac{{\hat{A}}_{i}\dot{x}_{i}- \hat{\nu }_{i}(\hat{\beta }_{i} |\dot{x}_{i}||{\hat{z}}_{i}|^{\sigma _{i}-1} {\hat{z}}_{i}-\hat{\gamma }_{i} \dot{x}_{i}|{\hat{z}}_{i}|^{\sigma _{i}})}{\hat{\eta }_{i}} \end{aligned}$$
(28)
$$\begin{aligned} \hat{\eta }_{i}(E_{i})=&1.0+\delta _{\eta ,i}{\hat{E}}_{i} \end{aligned}$$
(29)
$$\begin{aligned} \hat{\nu }_{i}(E_{i}) =&1.0+\delta _{\nu ,i}{\hat{E}}_{i} \end{aligned}$$
(30)
$$\begin{aligned} {\hat{E}}_{i}(t)=&\int ^{t}_{0}{\hat{z}}_{i}\dot{x}_{i}\text {d}t \end{aligned}$$
(31)

In this work, the FDCNN is proposed to identify the Bouc–Wen hysteretic displacement (28), as an important application for damage detection in building structures through an energy analysis. The use of the CNN in real applications overcomes the parameter and state estimation problem. The inclusion of random filters in the FDCNN design attenuates the measurement noise in the acceleration data, as will be shown later.

3 Frequency domain CNN architecture

In this section, the development of the proposed frequency domain CNN is presented. The main difference with respect to the time domain CNN is that a discrete Fourier transform (DFT) is applied to the inputs and to the filters in the convolutional layers; thus, the operations become simpler from a computational point of view. In addition, no activation function is required. The definition and use of the DFT in the FDCNN are given in “Appendix A.”

Consider an unknown discrete-time nonlinear system

$$\begin{aligned} y(q)=f\left( x(q)\right) ,\qquad x(q+1)=g\left( x(q),u(q)\right) \end{aligned}$$
(32)

where y(q) is the scalar output, x(q) the internal state, u(q) the input, and \(f(\cdot )\) and \(g(\cdot )\) smooth functions, \(f,g\in C^{\infty }\).

A nonlinear autoregressive exogenous (NARX) model for (32) is defined as

$$\begin{aligned} y(q)=\varPhi \left[ \varpi \left( q\right) \right] \end{aligned}$$
(33)

and the system dynamics are represented by the unknown nonlinear difference equation \(\varPhi \), where

$$\begin{aligned} \varpi \left( q\right) =[y\left( q-1\right) ,\ldots ,y\left( q-n_{y}\right) ,u\left( q\right) ,\ldots ,u\left( q-n_{u}\right) ]^{T} \end{aligned}$$
(34)

where y(q) and u(q) in (34) represent, respectively, the measurable output and input of the system, and \(n_{y}\) and \(n_{u}\) are the (unknown) regression orders.
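As a concrete illustration, the regressor (34) can be assembled from measured sequences as in the sketch below; the function name and the regression orders are illustrative choices, not part of the method itself.

```python
import numpy as np

def narx_regressor(y, u, q, n_y, n_u):
    """Build the regressor of Eq. (34) at discrete time q from output y and input u."""
    past_y = [y[q - k] for k in range(1, n_y + 1)]     # y(q-1), ..., y(q-n_y)
    past_u = [u[q - k] for k in range(0, n_u + 1)]     # u(q), ..., u(q-n_u)
    return np.array(past_y + past_u)

# Example with synthetic data and illustrative regression orders
y = np.random.randn(100)
u = np.random.randn(100)
phi = narx_regressor(y, u, q=10, n_y=3, n_u=2)
print(phi.shape)    # (n_y + n_u + 1,) = (6,)
```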

Consider the system (33) to be estimated and feed the same input to the CNN. A discrete Fourier transform (DFT) is applied to this input to obtain a frequency representation of the same length, i.e., \(\varPhi ^{(0)}=\mathcal {F}(\hat{\varpi })\). In this representation, it is assumed that the DC frequency is shifted to the center of the domain.

The output layer of the frequency domain convolutional neural network (FDCNN) is given in (35), where \(\hat{y_{F}}(q)\) is the scalar output signal of the FDCNN. This layer is a fully connected layer, where \(\varUpsilon \) is the output of the last subsample layer and \(V^{(\ell )}\in R^{L_{2}}\) are the weights of the output layer.

$$\begin{aligned} \hat{y_{F}}(q)=V^{(\ell )\text {T}}\varUpsilon \end{aligned}$$
(35)

For the convolutional layers, random filters are defined as \(\varrho _{i}^{(\ell )}\in \mathfrak {R}^{f_{\ell }}\), where \(i=1,2,\ldots ,h_{2}\) and \(h_{2}\) is the total number of filters in the current layer \(\ell \). These filters also go through a DFT to match the dimension of the output of the previous layer (for the first layer, the transform matches the size of \(\varPhi ^{(0)}\)), i.e., \(\varGamma _{i}^{(\ell )}=\mathcal {F}(\varrho _{i}^{(\ell )})\); for this conversion, a DFT matrix F as defined in “Appendix A” is built for each data size. The output of a convolutional layer is then defined as the element-wise product \((\odot )\) of the output of the previous layer and the filters:

$$\begin{aligned} \varPsi _{i}^{(\ell )}=\varPsi _{i}^{(\ell - 1)} \odot \varGamma _{i}^{(\ell )} \end{aligned}$$
(36)

Figure 1 shows how both types of convolutional layer work (a numerical illustration is given in the sketch after Fig. 1). Although in the time domain the filters have fewer elements than their frequency domain counterparts, the convolution between them and the input requires more operations. Note also that no activation function is used in the frequency domain.

Fig. 1
figure 1

Convolutional layer operations
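The following sketch illustrates the equivalence exploited by the convolutional layer (36): an element-wise product of DFTs corresponds to a circular convolution in the time domain. The signal length and the random filter values are arbitrary, mirroring the random-filter initialization.

```python
import numpy as np

n = 15
psi_prev = np.random.randn(n)                    # output of previous layer (time domain)
rho = np.random.randn(3)                         # random time-domain filter, f_l = 3

# Frequency-domain layer: DFT both operands to the same length, multiply element-wise
Psi_prev = np.fft.fft(psi_prev)
Gamma = np.fft.fft(rho, n)                       # zero-padded DFT of the filter
Psi = Psi_prev * Gamma                           # Eq. (36)

# Time-domain counterpart: circular convolution of the same two sequences
circ = np.array([sum(psi_prev[(k - m) % n] * rho[m] for m in range(len(rho)))
                 for k in range(n)])

print(np.allclose(np.fft.ifft(Psi).real, circ))  # True
```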

For the subsample layers, a spectral pooling operation is applied (Rippel et al. 2015). The idea is to remove high frequencies to reduce the size of the input. \(s^{(\ell )}\) represents the number of elements to be removed, so the output of these layers is defined as

$$\begin{aligned} \varPsi _{i}^{(\ell )}=\text {Shrink}(\varPsi _{i}^{(\ell - 1)},s^{(\ell )}) \end{aligned}$$
(37)

The Shrink operation in the spectral pooling removes \(s^{(\ell )}\) elements from its input, half from the top and half from the bottom of the spectrum, so the output remains symmetric. This operation was originally introduced by Rippel et al. (2015). The Shrink is defined in Table 1:

Table 1 Algorithm 1: spectral pooling

For the subsample layer, the shrink operation is based on the algorithm presented in Table 2. Since Algorithm 1 is intended for the general case where matrices are used, some modifications were made. The first modification consists in eliminating the two steps where the DFT and its inverse are performed. Given the structure of the FDCNN, it is not necessary to move back and forth between domains (time to frequency and vice versa) in each layer of the network, but only at the input layer and at the end of the subsampling layers. The second modification eliminates step 3 of the algorithm, which deals with the case where the representations do not have the appropriate dimension and therefore a real output cannot be obtained. This problem is avoided by choosing an input with adequate dimensions, in such a way that after the convolutional and subsampling layers and the inverse DFT, real values are obtained. Finally, given that the proposed FDCNN works with vectors instead of matrices, step 2 is carried out by retaining the central subvector of the input. These modifications are reflected in Table 2.

Table 2 Algorithm 2: spectral pooling proposed
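A minimal sketch of the proposed shrink step, as described above and summarized in Table 2, is given below; it assumes a DC-centered spectrum of odd length and is an interpretation of the algorithm, not the exact implementation.

```python
import numpy as np

def shrink(Psi, s):
    """Spectral pooling: drop s high-frequency bins (s/2 from each end) of a
    DC-centered, odd-length spectrum, keeping the central subvector."""
    assert s % 2 == 0 and len(Psi) > s
    half = s // 2
    return Psi[half:len(Psi) - half]

# Example: a length-15 centered spectrum reduced by s = 4 elements to length 11
x = np.random.randn(15)
Psi = np.fft.fftshift(np.fft.fft(x))     # DC frequency shifted to the center
print(shrink(Psi, 4).shape)              # (11,)
```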

Note that all frequency domain representations have an odd dimension; this simplifies calculations and ensures that an adequate time domain representation can be recovered in subsequent operations. As previously mentioned, as many convolutional and subsample layers as needed can be connected one after another. After this cascade connection of layers, the output of the last subsample layer has to be mapped back, i.e., \(\psi _{i}^{(\ell )}=\mathcal {F}^{-1}\left( \varPsi _{i}^{(\ell )}\right) \), and stacked in a single vector

$$\begin{aligned} \varUpsilon =\left[ \psi _{1}^{(\ell )T}, \psi _{2}^{(\ell )T}, \ldots , \psi _{h_{2}}^{(\ell )T}\right] ^{T} \end{aligned}$$
(38)

The complete architecture of FDCNN is illustrated in Fig. 2. The equations for training are described in the next section.

Fig. 2
figure 2

Frequency domain convolutional neural network for system identification
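Putting the pieces together, the following sketch runs one forward pass of an FDCNN like the one illustrated in Fig. 2, with two convolutional layers of 5 filters each and spectral pooling removing 4 bins per layer; the sizes match the configuration used later in Sect. 4, but the data and filters are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
s = 4                                                  # bins removed per pooling layer

def shrink(Psi, s):                                    # spectral pooling on a centered spectrum
    return Psi[s // 2: len(Psi) - s // 2]

def fdcnn_forward(varpi, filters1, filters2, V):
    """One forward pass: DFT of the input, two (conv + spectral pooling) stages,
    inverse DFT, stacking (Eq. 38) and fully connected output (Eq. 35)."""
    Phi0 = np.fft.fftshift(np.fft.fft(varpi))          # frequency-domain input, DC centered
    feats = []
    for rho1, rho2 in zip(filters1, filters2):
        G1 = np.fft.fftshift(np.fft.fft(rho1, len(Phi0)))
        Psi = shrink(Phi0 * G1, s)                     # Conv1f (Eq. 36) + Sub2 (Eq. 37)
        G2 = np.fft.fftshift(np.fft.fft(rho2, len(Psi)))
        Psi = shrink(Psi * G2, s)                      # Conv3f + Sub4
        feats.append(np.fft.ifft(np.fft.ifftshift(Psi)).real)
    Upsilon = np.concatenate(feats)                    # Eq. (38)
    return V @ Upsilon                                 # Eq. (35), scalar output

n_in, n_filters, f_len = 15, 5, 3
varpi = rng.standard_normal(n_in)                      # regressor, cf. Eq. (57)
filters1 = rng.standard_normal((n_filters, f_len))     # random time-domain filters
filters2 = rng.standard_normal((n_filters, f_len))
V = rng.standard_normal(n_filters * (n_in - 2 * s))    # 5 * 7 = 35 output weights
print(fdcnn_forward(varpi, filters1, filters2, V))
```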

3.1 Training of frequency domain CNN

The backpropagation algorithm is used for training. The cost function is

$$\begin{aligned} J(q)=\frac{1}{2}e^{2}(q)=\frac{1}{2}\left[ \hat{y_{F}}(q)-y(q)\right] ^{2} \end{aligned}$$
(39)

For the fully connected layer, the update of the synaptic weights V using the gradient of J is:

$$\begin{aligned} V(q+1)=V(q)-\eta _{F}\frac{\partial J}{\partial V}=V(q)-\eta _{F} e \varUpsilon \end{aligned}$$
(40)

where \(\eta _{F}\) is the learning rate, one defined for each layer.

The error propagated to the previous layer is

$$\begin{aligned} \frac{\partial J}{\partial \varUpsilon }=\frac{\partial J}{\partial e}\frac{\partial e}{\partial \hat{y_{F}}}\frac{\partial \hat{y_{F}}}{\partial \varUpsilon }=eV \end{aligned}$$
(41)

Because \(\varUpsilon \) is the stacked vector of the outputs of the last subsample layer, we take the same number of elements that each \(\psi _{i}^{(\ell )}\) contributed in the forward stage. Next, the gradient propagated to each one of these outputs has to be transformed back to the frequency domain by applying the DFT matrix of the size we want to match, so that it can be propagated through the subsample and convolutional layers.

For the subsample layers, the spectrum has to match the size of the previous convolutional layer, so the only operation required in this layer is to enlarge the frequency representation of the propagated error, i.e.,

$$\begin{aligned} \frac{\partial J}{\partial \varPsi _{i}^{(\ell -1)}}=up\left( \frac{\partial J}{\partial \varPsi _{i}^{(\ell )}}\right) \end{aligned}$$
(42)

where \(up(\cdot )\) is an operation that enlarges the spectral representation. Because the gradient arriving from a convolutional layer is already in the frequency domain, properties of the DFT are exploited and a series of matrix multiplications are performed to match the size of the previous layers; this avoids transforming back to the time domain and again to the frequency domain.

For the convolutional layers, the update is as follows:

$$\begin{aligned} \frac{\partial J}{\partial \varGamma ^{(\ell )}}=\frac{\partial J}{\partial \varPsi ^{(\ell )}}\odot \varPsi ^{(\ell -1)} \end{aligned}$$
(43)

This is the element-wise product between the propagated error and the output of the previous layer. Since each element of the filter can be updated separately, (43) can be written as:

$$\begin{aligned} \frac{\partial J}{\partial \varGamma _{a}^{(\ell )}}=\frac{\partial J}{\partial \varPsi _{a}^{(\ell )}}\varPsi _{a}^{(\ell -1)} \end{aligned}$$
(44)

where \(\varGamma _{a}^{(\ell )}\) is each element of the filters represented in the frequency domain, with \(a=1,2,\ldots ,f_{\ell }\).

To obtain the propagated gradient to the previous layers, we have

$$\begin{aligned} \frac{\partial J}{\partial \varPsi ^{(\ell -1)}}=\frac{\partial J}{\partial \varPsi ^{(\ell )}}\odot \varGamma ^{(\ell )} \end{aligned}$$
(45)

which is also the element-wise product between the filters and the propagated gradient to the current layer.
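A compact sketch of the training computations (39)–(41), (43) and (45) for one convolutional layer and the output layer is given below. The layer sizes, learning rate and data are illustrative, and the mapping of the gradient back to the frequency domain is an interpretation of the procedure described above, not the exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
eta_F = 0.3                                        # learning rate (one per layer)

# Toy forward pass for a single convolutional layer in the frequency domain
psi_prev = rng.standard_normal(7)                  # previous-layer output (time domain)
rho = rng.standard_normal(3)                       # random time-domain filter
Psi_prev = np.fft.fftshift(np.fft.fft(psi_prev))
Gamma = np.fft.fftshift(np.fft.fft(rho, 7))
Psi = Psi_prev * Gamma                             # Eq. (36)

Upsilon = np.fft.ifft(np.fft.ifftshift(Psi)).real  # back to time domain and stacked
V = rng.standard_normal(Upsilon.size)
y_hat, y = V @ Upsilon, 0.7                        # FDCNN output vs. reference sample
e = y_hat - y                                      # error used in the cost (39)

# Error propagated to Upsilon, Eq. (41), then mapped to the frequency domain
dJ_dUpsilon = e * V
dJ_dPsi = np.fft.fftshift(np.fft.fft(dJ_dUpsilon))

# Output layer update, Eq. (40)
V = V - eta_F * e * Upsilon

# Filter gradient, Eq. (43), and gradient passed to the previous layer, Eq. (45)
dJ_dGamma = dJ_dPsi * Psi_prev
dJ_dPsi_prev = dJ_dPsi * Gamma
Gamma = Gamma - eta_F * dJ_dGamma
print(np.abs(dJ_dPsi_prev).max())
```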

3.2 Frequency analysis in a convolutional layer

The convolutional layer as defined in (36) represents the element-wise product between the output of the previous layer and the filters of the current layer, and it is of interest to obtain more information about this operation. In that sense, the following proposition establishes the relationship between the output change and the input change of a convolutional layer.

Proposition 1

Consider a convolutional layer in a FDCNN defined as:

$$\begin{aligned} \varPsi _{i}^{(\ell )}=\varPsi _{i}^{(\ell - 1)} \odot \varGamma _{i}^{(\ell )} \end{aligned}$$
(46)

where \(\varPsi _{i}^{(\ell )}\) is the output of the current layer \(\ell \), \(\varPsi _{i}^{(\ell - 1)}\) is the output of the previous layer (in the case \(\ell =1\), \(\varPsi _{i}^{(0)}=\varPhi ^{(0)}\)), \(\varGamma _{i}^{(\ell )}\) is the frequency domain representation of the filters in this layer and \(i=1,2,\ldots ,h\), where h is the total number of filters in the layer.

The relationship between \(\varDelta \varPsi _{i}^{(\ell )}(q)\) and \(\varDelta \varPsi _{i}^{(\ell -1)}(q)\) is

$$\begin{aligned} \frac{\varDelta \varPsi _{i}^{(\ell )}(q)}{\varDelta \varPsi _{i}^{(\ell -1)}(q)} = \varGamma _{i}^{(\ell )}(q) \end{aligned}$$
(47)

when the filters are updated using the following rule

$$\begin{aligned} \varDelta \varGamma _{i}^{(\ell )} (q) = \frac{\varPsi _{i}^{(\ell )}(q-1) -\varGamma _{i}^{(\ell )}(q)\varPsi _{i}^{(\ell -1)}(q-1)}{\varDelta \varPsi _{i}^{(\ell -1)}(q)} \end{aligned}$$
(48)

Proof

Differentiating (46), it is obtained

$$\begin{aligned} \varDelta \varPsi _{i}^{(\ell )}(q) = \varDelta \varGamma _{i}^{(\ell )}(q) \varPsi _{i}^{(\ell -1)}(q) + \varGamma _{i}^{(\ell )}(q) \varDelta \varPsi _{i}^{(\ell -1)}(q) \end{aligned}$$
(49)

with the aid of the definitions \(\varDelta \varPsi _{i}^{(\ell -1)}(q) = \varPsi _{i}^{(\ell -1)}(q)- \varPsi _{i}^{(\ell -1)}(q-1) \), \(\varDelta \varPsi _{i}^{(\ell )}(q) = \varPsi _{i}^{(\ell )}(q)- \varPsi _{i}^{(\ell )}(q-1) \) and \(\varDelta \varGamma _{i}^{(\ell )}(q) = \varGamma _{i}^{(\ell )}(q) - \varGamma _{i}^{(\ell )}(q-1) \), it follows that

$$\begin{aligned} \begin{aligned} \varDelta \varPsi _{i}^{(\ell )}(q)&= \varDelta \varGamma _{i}^{(\ell )}(q)( \varPsi _{i}^{(\ell -1)}(q)+ \varPsi _{i}^{(\ell -1)}(q-1)) \\&\quad + \varGamma _{i}^{(\ell )}(q) \varDelta \varPsi _{i}^{(\ell -1)}(q)\\&= \varDelta \varGamma _{i}^{(\ell )}(q) \varPsi _{i}^{(\ell -1)}(q-1) \\&\quad + (\varDelta \varGamma _{i}^{(\ell )}(q) + \varGamma _{i}^{(\ell )}(q))\varDelta \varPsi _{i}^{(\ell -1)}(q) \\&= (\varGamma _{i}^{(\ell )}(q) - \varGamma _{i}^{(\ell )}(q-1) ) \varPsi _{i}^{(\ell -1)}(q-1) \\&\quad + (\varDelta \varGamma _{i}^{(\ell )}(q) + \varGamma _{i}^{(\ell )}(q))\varDelta \varPsi _{i}^{(\ell -1)}(q) \\&= \varGamma _{i}^{(\ell )}(q) \varPsi _{i}^{(\ell -1)}(q-1) - \varPsi _{i}^{(\ell )}(q-1) \\&\quad + (\varDelta \varGamma _{i}^{(\ell )}(q) + \varGamma _{i}^{(\ell )}(q))\varDelta \varPsi _{i}^{(\ell -1)}(q) \\&= \varGamma _{i}^{(\ell )}(q) \varPsi _{i}^{(\ell -1)}(q-1) - \varPsi _{i}^{(\ell )}(q-1) \\&\quad + \varDelta \varGamma _{i}^{(\ell )}(q)\varDelta \varPsi _{i}^{(\ell -1)}(q)+ \varGamma _{i}^{(\ell )}(q)\varDelta \varPsi _{i}^{(\ell -1)}(q)\\ \end{aligned} \end{aligned}$$
(50)

Defining

$$\begin{aligned} \varDelta \varGamma _{i}^{(\ell )} (q) = \frac{\varPsi _{i}^{(\ell )}(q-1)-\varGamma _{i}^{(\ell )}(q)\varPsi _{i}^{(\ell -1)}(q-1)}{\varDelta \varPsi _{i}^{(\ell -1)}(q)} \end{aligned}$$
(51)

replacing (51) in (50)

$$\begin{aligned} \varDelta \varPsi _{i}^{(\ell )}(q) =\varGamma _{i}^{(\ell )}(q) \varDelta \varPsi _{i}^{(\ell -1)}(q) \end{aligned}$$
(52)

or

$$\begin{aligned} \frac{\varDelta \varPsi _{i}^{(\ell )}(q)}{\varDelta \varPsi _{i}^{(\ell -1)}(q)}= \varGamma _{i}^{(\ell )}(q) \end{aligned}$$
(53)

Remark 5

Proposition 1 shows that using a different training rule for the filters of a convolutional layer yields a proportional relationship between the output variations and the input variations.

Remark 6

This relationship can be applied to a direct analysis of several convolutional layers connected in cascade, showing that variations in the input data are only affected proportionally, decreasing or increasing the main components of these data according to each filter.

3.2.1 Sensitivity of FDCNN to noisy data

Proposition 1 shows the relationship within a convolutional layer; this result can be extended to a cascade connection of convolutional layers, with a ReLU activation function applied after each operation and a spectral pooling layer, which yields

$$\begin{aligned} \begin{aligned}&\varDelta \varPsi _{i}^{(\ell )} = \\&SP\left( f\left( \varGamma _{i}^{(\ell )} \odot \cdots \odot SP\left( f\left( \varGamma _{i}^{(2)}\odot SP\left( f\left( \varGamma _{i}^{(1)} \odot \varDelta \varPhi ^{(0)}\right) \right) \right) \right) \right) \right) \\ \end{aligned} \end{aligned}$$
(54)

For ease of notation, the instant (q) is omitted, but this analysis can be carried out at each iteration of the FDCNN. The activation function f keeps the positive part of its argument and sets the rest to zero, so it can be omitted; the analysis is therefore carried out in terms of the positive values after the activation function. Hence, Eq. (54) can be rewritten as

$$\begin{aligned} \begin{aligned}&\varDelta \varPsi _{i}^{(\ell )}=\\&SP\left( \varGamma _{i}^{(\ell )} \odot \cdots \odot SP\left( \varGamma _{i}^{(2)}\odot SP\left( \varGamma _{i}^{(1)}\odot \varDelta \varPhi ^{(0)}\right) \right) \right) \end{aligned} \end{aligned}$$
(55)

The spectral pooling operation reduces the frequency representation of its argument by eliminating the highest frequency components and their conjugates. Thus, these elements of (55) will not be considered in the analysis, reducing the expression to

$$\begin{aligned} \varDelta \varPsi _{i,a}^{(\ell )}= \varGamma _{i,a}^{(\ell )} \varGamma _{i,a}^{(\ell -1)} \cdots \varGamma _{i,a}^{(1)} \varDelta \varPhi _{a}^{(0)}=\varGamma _{T}\varDelta \varPhi _{a}^{(0)} \end{aligned}$$
(56)

where \(a=1,2,\ldots ,q\) and q indicates the number of elements that were not eliminated by the spectral pooling. This equation represents the interaction between the convolutional outputs, the pooling layers and the FDCNN input. Consider the input defined in (57), which includes measurement noise, i.e., \(\varPhi _{a}^{(0)} =\varPhi _{a,0}^{(0)}+\lambda \), where \(\lambda \) is a high-frequency bounded noise; it is also assumed that the main frequency of the system is much lower than that of the noise.

$$\begin{aligned} \hat{\varpi }\left( q\right) =[{\hat{y}}\left( q-1\right) , \ldots ,{\hat{y}}\left( q-r_{1}\right) ,u\left( q\right) ,\ldots ,u\left( q-r_{2}\right) ]^{T} \end{aligned}$$
(57)

with \(r_{1}\) and \(r_{2}\) being the regression orders, where \(r_{1}\ne n_{y} \) and \(r_{2}\ne n_{u}\). Under this assumption, the absolute value of (56) is obtained

$$\begin{aligned} |\varDelta \varPsi _{i,a}^{(\ell )}|=|\varGamma _{T}||\varDelta \varPhi _{a}^{(0)}| \end{aligned}$$
(58)

substituting \(\varDelta \varPhi _{a}^{(0)} = \varDelta \varPhi _{a,0}^{(0)}+\varDelta \lambda \)

$$\begin{aligned} |\varDelta \varPsi _{i,a}^{(\ell )}|=|\varGamma _{T}||\varDelta \varPhi _{a,0}^{(0)}+\varDelta \lambda | \end{aligned}$$
(59)

using the triangle property,

$$\begin{aligned} |\varDelta \varPsi _{i,a}^{(\ell )}|\le |\varGamma _{T}| |\varDelta \varPhi _{a,0}^{(0)}|+|\varGamma _{T}| |\varDelta \lambda | \end{aligned}$$
(60)

finally considering \(|\varDelta \lambda |\le \mathcal {M}\), with \(\mathcal {M}\in \mathfrak {R}, \mathcal {M}>0\)

$$\begin{aligned} |\varDelta \varPsi _{i,a}^{(\ell )}|\le |\varGamma _{T}| |\varDelta \varPhi _{a,0}^{(0)}|+|\varGamma _{T}| \mathcal {M} \end{aligned}$$
(61)

In (61), the first term corresponds to the system response through the layers of the FDCNN, whereas the second one corresponds to the effect of noise across the network. Since spectral pooling is used, the frequency components where the noise has the highest impact are eliminated, leaving only the low-frequency components, whose noise contribution to the response is minimal. In this way, the first term provides the dominant response, and the output is bounded to a region very close to it.
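A quick numerical illustration of the bound (61) is given below: a low-frequency signal contaminated by high-frequency noise loses most of the noise energy after spectral pooling, while the signal component is essentially preserved. The signal, noise and pooling size are arbitrary choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, fs = 501, 200.0
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 1.7 * t)                      # low-frequency system response
noise = 0.3 * np.sin(2 * np.pi * 80.0 * t + rng.uniform(0, 2 * np.pi))

Phi = np.fft.fftshift(np.fft.fft(signal + noise))
s = 400                                                    # bins removed by spectral pooling
Phi_pooled = Phi[s // 2: n - s // 2]                       # keep the central 101 bins

# Energy (sum of squared magnitudes) before and after pooling
print("energy before:", np.sum(np.abs(Phi) ** 2))
print("energy after :", np.sum(np.abs(Phi_pooled) ** 2))  # close to the signal-only energy
print("signal only  :", np.sum(np.abs(np.fft.fft(signal)) ** 2))
```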

The internal structure of \(\varGamma _{T}\) represents the element-wise product of the filters of each layer, expressed in the frequency domain. Consider that

$$\begin{aligned} \varGamma _{i,a}^{(j)}=\mathfrak {R}(\varGamma _{i,a}^{(j)})+i\mathfrak {I}(\varGamma _{i,a}^{(j)}) \end{aligned}$$
(62)

for \(j=1,2,\ldots ,\ell \). For two convolutional layers, the filters part in (61) can be expressed as

$$\begin{aligned} |\varGamma _{T}| =\left| \varGamma _{i,a}^{(2)} \varGamma _{i,a}^{(1)}\right| \end{aligned}$$
(63)

Moreover, using (62), a more detailed expression for the noisy data is found, where the interaction between the real and imaginary parts of the filters is shown

$$\begin{aligned} \begin{aligned} |\varGamma _{T}|&= \left| \left( \mathfrak {R}(\varGamma _{i,a}^{(2)} )+i\mathfrak {I}(\varGamma _{i,a}^{(2)} )\right) \left( \mathfrak {R}(\varGamma _{i,a}^{(1)} )+i\mathfrak {I}(\varGamma _{i,a}^{(1)} )\right) \right| \\&= \biggl |\left( \mathfrak {R}(\varGamma _{i,a}^{(2)})\mathfrak {R}(\varGamma _{i,a}^{(1)})-\mathfrak {I}(\varGamma _{i,a}^{(2)} )\mathfrak {I}(\varGamma _{i,a}^{(1)} )\right) \\&\;\;\;\; +i\left( \mathfrak {R}(\varGamma _{i,a}^{(2)} )\mathfrak {I}(\varGamma _{i,a}^{(1)} )+\mathfrak {I}(\varGamma _{i,a}^{(2)} )\mathfrak {R}(\varGamma _{i,a}^{(1)} ) \right) \biggl | \\&= \biggl \{\left( \mathfrak {R}(\varGamma _{i,a}^{(2)})\mathfrak {R}(\varGamma _{i,a}^{(1)})-\mathfrak {I}(\varGamma _{i,a}^{(2)} )\mathfrak {I}(\varGamma _{i,a}^{(1)} )\right) ^{2} \\&\;\;\;\; +\left( \mathfrak {R}(\varGamma _{i,a}^{(2)} )\mathfrak {I}(\varGamma _{i,a}^{(1)} )+\mathfrak {I}(\varGamma _{i,a}^{(2)} )\mathfrak {R}(\varGamma _{i,a}^{(1)} )\right) ^{2}\biggl \}^{\!1/2} \\ \end{aligned} \end{aligned}$$
(64)

In general, when adding more layers, the operations are repetitive and can be denoted as follows:

$$\begin{aligned} |\varGamma _{T}|&=\left| \prod _{j}^{\ell } \left( \varGamma _{i,a}^{(j)}\right) \right| \nonumber \\ |\varGamma _{T}|&= \left| \prod _{j}^{\ell }\left( \mathfrak {R}(\varGamma _{i,a}^{(j)}) +i\mathfrak {I}(\varGamma _{i,a}^{(j)})\right) \right| \nonumber \\ |\varGamma _{T}|&=\sqrt{\left( \mathfrak {R}(\varGamma _{T})\right) ^{2}+\left( \mathfrak {I}(\varGamma _{T})\right) ^{2}} \end{aligned}$$
(65)

The last equation shows the relationships between the real and imaginary parts of each filter and their interaction with others in different layers.

4 Experimental validation

The experimental two-storey building prototype used in this study is depicted in Fig. 3. It is constructed of aluminum, with plan dimensions of \((32.5\times 53)\) cm and a height of 1.2 m. All columns have a rectangular cross section of \((0.635\times 2.54)\) cm, with an interstorey separation of 58 cm for the first floor and 62 cm for the remaining floor. The building is mounted on a shake table actuated by Quanser servomotors, model I-40. During the experiments, the structure is excited with the Northridge earthquake for a duration of 25 s, fitted in amplitude to be consistent with the structure, as shown in Fig. 4. The building is equipped with Analog Devices XL403A accelerometers, with a measuring range from 1 to 15 g and a bandwidth of 1–800 Hz, to measure the responses at every storey and at the base. Data acquisition was carried out using RT-DAC/USB2 series electronic boards from Inteco. The acquisition programs were operated in Windows 7 with Matlab 2011a/Simulink. The communication between these boards and Simulink was carried out using a C compiler.

Fig. 3
figure 3

Experimental prototype

Fig. 4
figure 4

Northridge earthquake signal

From experiments, the vibration frequencies of the reduced-scale building structure are \(f_{1} = 1.758\) Hz and \(f_{2} = 4.0\) Hz, extracted by means of the Fourier spectra of the building acceleration data. On the other hand, preliminary information is obtained from the material properties. Stiffness values \(k_{1}=12011\) N/m and \(k_{2}=12108\) N/m were calculated using the nominal values of the mechanical properties (Hibberler 2011), whereas the masses were measured directly, giving \(m_{1}=2.034\) kg and \(m_{2}=2.534\) kg. Based on the experimental data, the Rayleigh damping is calculated assuming that the first two modes of the structure have a damping factor of \(2\%\), i.e., \(\xi _1=\xi _2=0.02\). These values were fixed, although during experiments \(\xi _1\) and \(\xi _2\) varied depending on the excitation signal. Moreover, assuming that during seismic activity only accelerations can be measured directly, velocity and displacement are estimated from the available acceleration data \(\ddot{x}_i\), with \(i=1,2\ldots , n\). The estimates are obtained with the following filter, consisting of two high-pass (hp) filters connected in cascade with an integrator, defined by

$$\begin{aligned} f(s)=\underbrace{\frac{s^{2}}{s^{2}+3.77s+3.55}}_{hp} \times \underbrace{\frac{s^{2}}{s^{2}+3.77s+3.55}}_{hp} \times \frac{1}{s} \end{aligned}$$
(66)

where the cutoff frequency is set at 0.3 Hz to remove the low-frequency components and avoid drift.
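A possible discrete-time realization of the estimator (66) is sketched below using SciPy, cascading the two high-pass sections with the integrator; this is an interpretation of the filter described above, applied to a synthetic record, not the original implementation.

```python
import numpy as np
from scipy import signal

fs = 200.0                                   # 5 ms sampling time
t = np.arange(0.0, 25.0, 1.0 / fs)
acc = np.sin(2 * np.pi * 1.7 * t) + 0.05 * np.random.randn(t.size) + 0.02  # noise + offset

# Eq. (66): two 2nd-order high-pass sections (cutoff ~0.3 Hz) in cascade with an integrator
hp = signal.TransferFunction([1.0, 0.0, 0.0], [1.0, 3.77, 3.55])
num = np.polymul(hp.num, hp.num)             # (s^2 / (s^2 + 3.77 s + 3.55))^2
den = np.polymul(np.polymul(hp.den, hp.den), [1.0, 0.0])    # ... * 1/s
f_est = signal.TransferFunction(num, den)

_, vel, _ = signal.lsim(f_est, U=acc, T=t)   # estimated velocity from acceleration
_, dis, _ = signal.lsim(f_est, U=vel, T=t)   # estimated displacement from velocity
print(vel[:3], dis[:3])
```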

For damage detection purposes, the Bouc–Wen hysteretic model is introduced to represent the load-deformation curves obtained during the seismic tests. The method proposed here postulates that the stiffness loss resulting from structural damage reduces the capacity of the building to dissipate energy. In this sense, a system identification scheme based on the CNN is developed following the architecture shown in Fig. 5. The structural parameters are only employed to calculate the analytical Bouc–Wen hysteretic state \(z_{i}\) required in the CNN training stage of the identification system, by means of the backpropagation algorithm defined in the previous sections. The hysteretic displacement signal corresponding to each storey is estimated by the frequency domain CNN. The FDCNN then uses measured acceleration and velocity data for damage assessment, and the analytical Bouc–Wen model is used as a reference signal against which our results are compared.

Despite the success reported in the literature on the TDCNN, most applications to physical systems concern image recognition. In contrast, this paper evaluates experimentally how the FDCNN performance is affected when measurement noise is present in the data. On the other hand, in most practical applications it is difficult to know the structural-system bandwidth accurately. Even the implementation of signal preprocessing stages through filters does not guarantee good performance if the cutoff frequency does not match the system bandwidth; finding the correct cutoff frequency can therefore be a fairly difficult task. An alternative to these methods is to use the FDCNN, which incorporates random filters to strengthen the algorithm against measurement noise. In Sect. 3.2.1, a sensitivity analysis has been carried out that shows the ability of the FDCNN to overcome measurement noise. Moreover, its computing time is shorter than that of the TDCNN, as will be demonstrated in the experimental tests evaluated in the next section.

Fig. 5
figure 5

System identification process

4.1 System identification task

To validate the performance of the frequency domain CNN, the obtained results are compared with two different identification schemes, based on a time domain convolutional neural network (TDCNN) and a neural network (NN), respectively, described in “Appendices B and C.” All tests use vibration data containing measurement noise and offset. The final goal is to investigate the ability to estimate the hysteretic state using the CNN and then carry out the damage diagnosis using the energy dissipated in the hysteretic cycle. Experiments were carried out on a 2.6 GHz Intel Core i7 processor with 16 GB of RAM.

4.1.1 Identification system using frequency domain CNN

In this subsection, the frequency domain CNN is used to identify the hysteretic state \(z_{i}\) of each storey. It is important to note that the FDCNN results will be used as the reference against which the TDCNN and NN results are compared.

The proposed frequency domain convolutional neural network (FDCNN) consists of two convolutional layers, Conv1f and Conv3f, each with 5 filters, \(h_{2}=5 \), and filter length \(f_{1} = f_{3}=3\), and two subsample layers, Sub2 and Sub4, in which the frequency spectrum is reduced by 4 elements per layer, i.e., \(s^{(2)}=s^{(4)}=4\). The fully connected layer then has 35 synaptic weights. For this architecture, the input (57) takes only 3 elements of each signal, the signals being the acceleration of each storey plus the acceleration at ground level, the velocity, the position and the hysteretic displacement estimated by the FDCNN.

$$\begin{aligned} \left. \begin{aligned} \hat{\varpi }\left( q\right) =&[{\hat{y}}\left( q-1\right) , \ldots ,{\hat{y}}\left( q-3\right) , \ddot{x}\left( q-1\right) ,\ldots ,\ddot{x}\left( q-3\right) ,\\&\dot{x}\left( q-1\right) ,\ldots ,\dot{x}\left( q-3\right) , \\&\ddot{x}_{g}\left( q\right) ,\ldots ,\ddot{x}_{g}\left( q-2\right) ]^{T} \end{aligned} \right. \quad \end{aligned}$$
(67)

The experimental data consist of 11 different tests, of which 9 are used for training and two for testing. Each experiment lasts 25 s with a sampling time of 5 ms. The excitation signal comes from the Northridge earthquake, shown in Fig. 4, adjusted to match the building structure prototype. The experiments consist in using the raw data just as they are acquired from the sensors, expecting the FDCNN to deal with the noisy data. Figures 6 and 7 show the results of the identification of the internal state \(z_{i}\) of the hysteretic model. From both figures, it is evident that an accurate estimation of the hysteretic state is achieved, since the estimate of \(z_{i}\) converges to the reference signal. The inclusion of the spectral pooling operation in the FDCNN eliminates measurement noise and offset in the acceleration data. The mean square error obtained is \(2.0901\times 10^{-9}\), which is lower than that obtained with the time domain CNN. The computational time is \(3.0164\times 10^{-9}\) s for 5-epoch training.

Moreover, note that Figs. 6 and 7 present an oscillatory behavior. Since the seismic excitation signal is oscillatory (harmonic motion), the response measured on each floor is also oscillatory. Therefore, the estimated hysteretic state is oscillatory as well, because it depends on the estimated velocity of each floor, as defined in Eq. (18).

Fig. 6
figure 6

First-storey hysteretic displacement using FDCNN with noisy data

Fig. 7
figure 7

Second-storey hysteretic displacement using FDCNN with noisy data

4.1.2 Identification system using time domain CNN

In this subsection, the time domain CNN presented in “Appendix B” is used to identify the hysteretic state \(z_{i}\) of each storey. A different CNN is used for each storey, and they do not depend on each other; further work will focus on the design of a single architecture that describes the complete building dynamics. The seismic excitation used here for the test data is also the Northridge earthquake. For the time domain CNN, the hyperparameters are: 2 convolutional layers, Conv1 and Conv3, each with 5 filters, \(h=5 \), and filter length \(f_{1} = f_{3}=3\); and two subsample layers, Sub2 and Sub4, where for every 2 elements one is removed and only the one with the highest value is kept, \(s_{2}=s_{4}=2\). Given the proposed architecture, the fully connected layer Fu5 has 50 synaptic weights, \(L=50\). The learning rate for all the layers is set to 0.3. The input of the CNN is a vector built from 4 values of \({\hat{y}}\) estimated by the CNN, 8 acceleration values (4 corresponding to the excitation at ground level and 4 to the acceleration of the corresponding floor), 4 velocities and 4 displacement values. Therefore, (77) can be described as (68)

$$\begin{aligned} \left. \begin{aligned} \hat{\varpi }\left( q\right) =&[{\hat{y}}\left( q-1\right) ,\ldots ,{\hat{y}}\left( q-4\right) , \ddot{x}\left( q-1\right) ,\ldots ,\ddot{x}\left( q-4\right) ,\\&\dot{x}\left( q-1\right) ,\ldots ,\dot{x}\left( q-4\right) , \\&\ddot{x}_{g}\left( q\right) ,\ldots ,\ddot{x}_{g}\left( q-3\right) ]^{T} \end{aligned} \right. \quad \end{aligned}$$
(68)

The hyperparameters of the TDCNN are initialized randomly. The output synaptic weights are within \(\left[ -1,1\right] \), and the filters of the convolutional layers are within the range \(\left[ -\frac{1}{\sqrt{j}},\frac{1}{\sqrt{j}}\right] \), where j is the length of the input. It is important to point out that, as in the previous section, the experimental data come from 11 tests, of which 9 are used for training and two for testing. Each experiment lasts 25 s with a sampling time of 5 ms. In order to identify the hysteretic displacement of the building, two different identification tasks were carried out.

(a) The first task consists in using the raw data just as they are acquired from the sensors. Figures 8 and 9 show the identification results for the Bouc–Wen hysteretic state of the first and second floors, respectively. From these figures, it can be observed that in both cases convergence is not achieved, which was expected due to the presence of measurement noise and offset. The mean square error (MSE) obtained is \(1.0127\times 10^{-7}\) for the first floor and \(1.5168\times 10^{-7}\) for the second, which appears small only because the magnitude of the signal is on the order of \(10^{-4}\). The computational time required in this experiment was 158.52 s for 5-epoch training.

Fig. 8
figure 8

First-storey hysteretic displacement using TDCNN with noisy data

Fig. 9
figure 9

Second-storey hysteretic displacement using TDCNN with noisy data

(b) The second identification task consists in using the time domain CNN plus a filter to eliminate measurement noise and thereby improve the performance of the CNN-based identification scheme. The network configuration is the same as described previously in this section. For data processing, a third-order Butterworth filter is added to clean the signal, reducing the high- and low-frequency components. The bandwidth of this filter goes from 0.3 to 5 Hz, and it was designed in Matlab (an equivalent Python sketch is given after Fig. 11). However, to obtain good performance, prior knowledge of the building bandwidth is required; otherwise, the filter does not preserve the important frequency components where the system response lies. The experimental results are shown in Figs. 10 and 11. From these figures, it is evident that, thanks to the filtering, the estimation of the hysteretic model is improved with respect to Figs. 8 and 9. Despite the improvement, convergence is still not achieved, as a result of the lack of exact knowledge of the bandwidth. This situation is common in most systems, because the characterization of a building structure is a complicated task. The mean square error (MSE) is \(2.34\times 10^{-8}\) and \(1.984\times 10^{-8}\) for the first and second storey, respectively. It is important to note that the experiments with 5 epochs of training took 165.7371 s, which is longer than the previous result without filters.

Fig. 10
figure 10

First-storey hysteretic displacement using TDCNN with filtered data

Fig. 11
figure 11

Second-storey hysteretic displacement using TDCNN with filtered data
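For reference, the preprocessing stage used in task (b) can be sketched in Python; the authors designed the third-order Butterworth band-pass filter in Matlab, and the SciPy call below, applied to a synthetic record with an assumed 200 Hz sampling rate, is only an approximate equivalent.

```python
import numpy as np
from scipy import signal

fs = 200.0                                            # 5 ms sampling time
b, a = signal.butter(3, [0.3, 5.0], btype="bandpass", fs=fs)

t = np.arange(0.0, 25.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 1.7 * t) + 0.1 * np.random.randn(t.size) + 0.05   # noise + offset
filtered = signal.filtfilt(b, a, raw)                 # zero-phase filtering of the record
print(filtered[:5])
```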

4.1.3 Identification system using neural network (NN)

In this subsection, a system identification scheme based on the neural network (NN) presented in “Appendix C” is used to estimate the hysteretic displacement \(z_{i}\), with \(i=1,2\). The vibration data used come from 11 tests, of which 9 are used for training and two for testing, all with a duration of 25 s and a sampling time of 5 ms. A two-layer neural network (NN) is used for the comparisons. Its structure consists of a hidden layer with 35 nodes, to which a \(\tanh (\cdot )\) activation function is applied, and a single node in the output layer. The training is done using the BP algorithm with the same amount of data as used with both CNN methods. In order to identify the hysteretic state of each storey, two different identification tasks were again carried out. The input has the same structure as the one used in the FDCNN, Eq. (67).

(a) The first task, as in the previous section, consists in using the raw data just as they are acquired from the sensors. The estimated Bouc–Wen hysteretic displacements of the first and second floors are depicted in Figs. 12 and 13, respectively. From these figures, it can be observed that in both cases the estimate does not converge to the reference signal, due to the measurement noise and offset contained in the vibration data.

The mean square error (MSE) obtained is \(134.05\times 10^{-10}\) for the first floor and \(310.54\times 10^{-10}\) for the second. The computational time required for this experiment was 28.35 s in the worst case, for 5 training epochs.

Fig. 12 First-storey hysteretic displacement using NN with noisy data

Fig. 13 Second-storey hysteretic displacement using NN with noisy data

(b) The second identification task consists of using the NN plus a filter to eliminate measurement noise from the vibration data. Data processing was carried out with a third-order Butterworth filter that attenuates high- and low-frequency components, with a passband between 0.3 and 5 Hz. Figures 14 and 15 show that, due to filtering, the estimation of the hysteretic model is improved with respect to Figs. 12 and 13. Despite the improvement, full convergence is still not achieved, although the estimated states almost converge to the reference signal. The mean square error (MSE) is \(2.34\times 10^{-8}\) for the first storey and \(1.984\times 10^{-8}\) for the second storey. This experiment with 5 training epochs took 27.73 s.

Fig. 14 First-storey hysteretic displacement using NN with filtered data

Fig. 15 Second-storey hysteretic displacement using NN with filtered data

4.1.4 Discussions about identification systems

From the results obtained in Sects. 4.1.1, 4.1.2 and 4.1.3, it is evident that the proposed FDCNN performs better than the other two methods; although the NN trains faster, its MSE is higher than that obtained with the FDCNN. In all cases, measurement noise in the data degrades the performance of the identification methods; however, the FDCNN is barely affected compared with the TDCNN and NN algorithms. Details of these results are given in Tables 3 and 4, where precision, execution time and mean square error (MSE) are compared.

Table 3 Comparison of proposed method with a neural network and TDCNN (first storey)
Table 4 Comparison of proposed method with a neural network and TDCNN (second storey)

Since the structural damage assessment is carried out offline in all cases, precision is the most important feature for an adequate structural health diagnosis. Thus, the identification architecture based on the FDCNN algorithm has the greatest potential for this task, with the highest precision and a considerably low execution time. Therefore, having demonstrated the versatility of the frequency domain CNN for system identification under environmental noise, we use only the estimation results from the FDCNN for damage detection purposes. Hence, in the following section, we present damage detection results employing only data obtained through the FDCNN.

4.2 Damage detection in building structure

In this subsection, we investigate the sensitivity of damage detection based on the building's capacity to dissipate energy, which is reduced with respect to nominal conditions. The experiment was carried out by reducing the stiffness \(k_{2}\) of the second storey: a single screw was loosened in one of the four columns that make up each level, while the remaining three columns were left unmodified. The next step consists of extracting the damage features of the building from the acceleration measurements obtained when the prototype is subjected to the Northridge earthquake record. As a consequence of the induced damage, the fundamental vibration frequencies and the bandwidth also change, reducing to \(f_1= 1.733\) Hz and \(f_2=3.97\) Hz. From vibration analysis, it is known that changes in vibration frequencies are a good indicator of damage; in contrast, in this paper we use load-deformation curves and changes in dissipated energy for damage assessment and diagnosis. This is achieved by employing the frequency domain CNN for the model-based identification described in Sect. 4.1.1, which allows structural damage to be confirmed through the load-deformation curves obtained after exciting the experimental prototype at its base.

A comparison of the hysteretic cycles of the second storey under nominal and damaged conditions is shown in Fig. 16. It can be observed that when there is structural damage, the load-deformation relationship is significantly reduced, which indicates that the capacity of the building to dissipate energy is also reduced with respect to nominal conditions. Moreover, from the hysteretic curves we also compute the dissipated energy and compare it with the nominal case, as shown in Fig. 17 for the same floor (see Footnote 1). From Fig. 17, it can be noticed that the dissipated energy is much lower in the presence of damage. These results agree with the hypothesis raised above: when there is structural damage, the capacity of the building to dissipate energy is reduced, which indicates that the building moves from the elastic to the plastic zone. The results confirm the effectiveness of the proposed identification scheme for the damage detection problem, where the Bouc–Wen hysteretic model is a useful tool to capture the degrading energy. Similar results are obtained for the first floor.
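One way the dissipated energy in Fig. 17 can be obtained from the load-deformation curve is by integrating the restoring force over the hysteretic displacement. The following is a minimal sketch under that assumption; the array names are hypothetical, and the actual force and displacement records come from the FDCNN-based identification:

```python
import numpy as np

def dissipated_energy(restoring_force, hysteretic_disp):
    """Cumulative dissipated energy E = integral of F dz along the
    load-deformation curve, approximated with the trapezoidal rule."""
    dz = np.diff(hysteretic_disp)
    f_mid = 0.5 * (restoring_force[:-1] + restoring_force[1:])
    return np.cumsum(f_mid * dz)

# 'force_2' and 'z2_hat' are hypothetical arrays holding the second-storey
# restoring force and the hysteretic displacement estimated by the FDCNN.
# energy_2 = dissipated_energy(force_2, z2_hat)
```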

Fig. 16 Second-storey hysteretic cycle

Fig. 17 Second-storey energy

4.3 Discussion

Two different CNN formulations and an NN-based scheme were presented and applied to a real problem under two different conditions. Tables 3 and 4 compare the results obtained. The training time is one of the most representative results: the FDCNN trains about 4 times faster than the TDCNN, although both networks have a similar architecture, with the same number of layers and the same number of filters; they differ only in how the operations are treated in the convolutional and subsampling layers. In the testing stage, the time does not vary much because of the size of the elements involved in the operations. Nevertheless, the proposed frequency domain CNN algorithm improves the execution time, even though the hyperparameters are initialized in the same interval. The NN-based scheme is faster than the FDCNN, but its identification accuracy is lower, even though both schemes have a similar architecture.

Additionally, in the identification task the FDCNN also performs better than the TDCNN and the NN, even when the latter are accompanied by a filter. Under the same conditions and with a similar neural structure, the FDCNN is more suitable for system identification than the TDCNN. Improvements might be obtained by changing the TDCNN architecture to a deeper structure, which is not possible with the NN design. Another difference is that the proposed FDCNN does not require an activation function, unlike the TDCNN and NN; this also contributes to the reduction in computational time and does not affect the identification performance.
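To make the frequency-domain subsampling idea concrete, the sketch below shows a generic 1-D spectral pooling operation: keep the lowest-frequency FFT coefficients and transform back at the reduced size, so high-frequency content, where measurement noise usually lives, is discarded. This is only an illustration of the principle, not the exact layer definition used in the proposed FDCNN, and the signal and the number of kept coefficients are illustrative:

```python
import numpy as np

def spectral_pool(x, keep):
    """Generic 1-D spectral pooling: keep the 'keep' lowest-frequency rFFT
    coefficients and reconstruct at the reduced length, discarding the
    high-frequency part of the spectrum."""
    X = np.fft.rfft(x)
    n_out = 2 * (keep - 1)                  # output length implied by the kept bins
    x_pooled = np.fft.irfft(X[:keep], n=n_out)
    return x_pooled * (n_out / len(x))      # rescale so amplitudes are preserved

# Example: a 1.7 Hz component buried in noise, sampled at 200 Hz for 25 s
fs = 200
t = np.arange(0, 25, 1 / fs)
x = 1e-4 * np.sin(2 * np.pi * 1.7 * t) + 2e-5 * np.random.randn(t.size)
x_pooled = spectral_pool(x, keep=256)       # keeps content below roughly 10 Hz here
```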

5 Conclusions

The frequency domain CNN has been shown in this study to be more reliable for system identification than the time domain CNN and the neural network, as an alternative approach to damage detection in buildings. The results demonstrate that the proposed method is able to learn features from frequency-domain data and achieve higher diagnosis accuracy. Furthermore, the FDCNN introduces the spectral pooling operation in its design, which attenuates measurement noise and ensures the convergence of the identification scheme. Note that most methods introduce filters as a preliminary stage to overcome measurement noise; however, this is difficult to achieve if the system bandwidth is not known in advance, whereas the FDCNN does not need this information. The computational time of the FDCNN is almost 4 times shorter during the training stage, which is useful for applications with larger data sets. Moreover, the use of the dissipated energy, captured through the Bouc–Wen hysteretic model and directly related to the stiffness loss resulting from structural damage in buildings, constitutes an alternative study approach. The use of the frequency domain CNN for system identification is an interesting alternative to signal processing methods. We also recognize that more extensive research is necessary to assess the potential of this approach; however, we consider our experimental results to be a good step in that direction.