1 Introduction

Large-scale infrastructures such as towers, bridges, roads, and dams deteriorate due to exposure to the surrounding environment and applied loads, which significantly affects their overall life-span. The structural integrity of these infrastructures can be monitored and evaluated using sensor-based structural health monitoring (SHM) techniques. SHM can provide useful information about the existing condition of a structure, which can reveal abnormal behaviour and help avoid catastrophic failure [1]. Acoustic emission (AE) is a nondestructive testing (NDT) technique that can monitor and detect minor cracks in civil structures [2]. The AE phenomenon is defined as a transient elastic wave generated by material deformation during damage initiation and propagation [3]. Essential information about the health condition of a structure can be extracted from AE parameters such as amplitude, counts, rise time, duration, signal strength, and energy. However, these traditional parameters can be sensitive to the surrounding environment, the level of damage, and the presence of noise in the measured AE data. Therefore, this study aims to develop an improved AE sensing-based crack detection technique using a deep learning method augmented with time-frequency decomposition.

Researchers have paid great attention to the AE technique as a damage detection tool due to its high sensitivity to minor damage. For example, the AE technique was used to assess microcracks in different structural elements such as fibre-reinforced concrete beams and multi-story buildings [4]. In [5], the authors applied the AE technique to detect damage in a reinforced concrete slab subjected to dynamic load. The AE technique has also been used to evaluate the existing health condition of a real-life bridge, where AE sensors were placed at different locations of the bridge to collect AE signals under various loading conditions [6]. In another study, the AE technique was utilized to detect the severity and location of damage in prefabricated and prestressed concrete elements; the performance of the proposed technique was verified using AE data collected from reinforced-concrete beam specimens under different loading conditions [7]. An AE analysis based on an advanced deep neural network approach was proposed to detect cracks in prestressed concrete specimens, and the technique was implemented to monitor two full-scale bridges under ambient conditions [8]. However, most of the above-mentioned studies are based on pattern recognition techniques that require a suitable selection of features to identify the severity and location of the damage [9].

Recently, Deep Learning (DL)-based artificial intelligence techniques have become increasingly prevalent in the field of SHM as they can extract features from 1D and 2D data without pre-processing by the user [10]. For example, mechanical faults such as bearing deviations, stator and rotor friction, rotor breaks, and poor insulation can be identified from 2D greyscale images using a convolutional neural network (CNN), as demonstrated in [11]. Although many studies have focused on traditional greyscale or RGB images in DL-based damage detection methods, several studies have instead used images derived from time- or frequency-domain representations. The authors of [12] demonstrated that the CNN architecture 'LeNet-5' was able to efficiently and accurately diagnose faults in mechanical systems using continuous wavelet transform (CWT) images. Similar conclusions were reached in [13], which demonstrated that 'LeNet-5' was able to distinguish between various system health conditions from 2D spectrograms. However, 2D CNNs based on time-frequency (TF) images have been applied primarily to mechanical systems, and very few studies have contributed to the field of SHM. Therefore, this paper proposes a new method for the localization of damage in civil structures through the classification of CWT images extracted from AE signals using CNNs.

2 Proposed TF-Based 2D CNN Method

2.1 Empirical Mode Decomposition (EMD)

EMD is a TF-based signal processing method that has been widely applied as a modal identification and damage detection tool for civil structures due to its high performance with nonlinear and non-stationary data [14]. EMD decomposes a multi-component signal into a set of oscillatory waveforms defined as intrinsic mode functions (IMFs) [15]. For a waveform to be considered an IMF, it must meet the following criteria [16]: (a) the number of extrema and the number of zero-crossings must be equal or differ at most by one over the entire data set, and (b) the mean value of the envelopes defined by the local maxima and local minima must be zero at every point. The sifting steps required to decompose a signal are provided in [14]. Once the signal is decomposed, the input signal \(x(t)\) can be written as:

$$x\left(t\right)= \sum\nolimits_{i=1}^{m}{IMF}_{i}\left(t\right)+ {\varepsilon }_{m}(t)$$
(1)

where \({IMF}_{i}\left(t\right)\) represents the \(i\)-th IMF of the original signal and \({\varepsilon }_{m}(t)\) is the residual of \(x\left(t\right)\). In this study, EMD is applied to suppress the noise in the AE data and obtain the key AE components (IMFs) that belong to damage, which are then used to generate images using the CWT.
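
For illustration, a minimal sifting-based EMD sketch in Python is given below, assuming NumPy and SciPy are available. The helper names (`sift_once`, `emd`), the number of IMFs, the fixed sifting count, and the simplified end-point handling are illustrative assumptions and not part of the proposed method.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 3 or len(minima) < 3:
        return None  # not enough extrema to build cubic-spline envelopes
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - 0.5 * (upper + lower)

def emd(x, t, n_imfs=5, n_sift=10):
    """Minimal EMD sketch: returns a list of IMFs and the residual (Eq. (1))."""
    imfs, residual = [], x.astype(float).copy()
    for _ in range(n_imfs):
        h = residual.copy()
        for _ in range(n_sift):
            h_new = sift_once(h, t)
            if h_new is None:            # residual is monotonic: stop sifting
                return imfs, residual
            h = h_new
        imfs.append(h)
        residual = residual - h
    return imfs, residual
```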

2.2 Continuous Wavelet Transform (CWT)

The wavelet transform (WT) is a TF method that provides a TF representation of a signal in a multi-resolution framework. The CWT is a powerful TF signal processing approach used in different fields such as image compression, signal denoising, and pattern recognition [17]. The CWT of a signal \(x\left(t\right)\) can be expressed as:

$${w}_{n}^{l}\left(x\right)={\int }_{-\infty }^{\infty }x\left(t\right){\psi }^{*}(\frac{t-n}{l} )dt$$
(2)

The inverse CWT (ICWT) can be determined by:

$$x\left(t\right)=\frac{1}{{w}_{\beta }} \underset{-\infty }{\overset{\infty }{\iint }}\frac{1}{\left|l\right|}{w}_{n}^{l}\left(x\right){\psi }^{*}\left(\frac{t-n}{l}\right)dn\frac{dl}{{l}^{2}}$$
(3)

where \({w}_{\beta }\) can be written as:

$${w}_{\beta } ={\int }_{-\infty }^{\infty }\frac{|\psi (\omega ){|}^{2}}{|\omega |} d\omega <\infty$$
(4)

where \(l\) and \(n\) represent the scale and translation of the mother wavelet, respectively. The basis function \(\psi(t)\) is called the mother wavelet, and the superscript (*) denotes its complex conjugate; in Eq. (4), \(\psi(\omega)\) denotes the Fourier transform of \(\psi(t)\), and the condition is known as the admissibility condition. With an appropriate choice of \(l\) and \(n\), the CWT takes shifted and scaled versions of \(\psi\) and computes their inner product with \(x(t)\). In this paper, the CWT is applied to generate spectrograms of the key AE components (IMFs), which are used as input to the CNN model to detect and identify the approximate location of the damage.
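
As a sketch of this step, the snippet below computes the CWT scalogram of a single IMF and saves it as an image for the CNN, assuming the PyWavelets (`pywt`) and Matplotlib packages are installed. The Morlet wavelet, the scale range, and the figure settings are illustrative assumptions rather than choices prescribed by the method.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

def save_cwt_spectrogram(imf, fs, out_path, wavelet="morl", n_scales=64):
    """Compute the CWT of one IMF and save its scalogram as an image file."""
    scales = np.arange(1, n_scales + 1)
    coeffs, freqs = pywt.cwt(imf, scales, wavelet, sampling_period=1.0 / fs)
    plt.figure(figsize=(4, 2))
    plt.pcolormesh(np.arange(len(imf)) / fs, freqs, np.abs(coeffs), shading="auto")
    plt.axis("off")  # keep only the TF pattern, no axes, for the CNN input
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0, dpi=150)
    plt.close()
```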

2.3 Convolutional Neural Network (CNN)

CNNs are a subset of DL algorithms inspired by the visual cortex of animals and the interconnectivity of neurons between the eye (input) and the brain (output) in visual decision-making. These networks consist of several blocks of neurons that extract the relevant information from the input data. Features are extracted automatically from the input data (x) through convolutional layers, which apply dot-product operations between a kernel of learnable weights (W), with bias (b), and local patches of the input. The spatial presence of specific features within the images is emphasized by activation functions such as rectified linear units or hyperbolic tangent units, which allow nonlinear relationships to be defined between the input and the anticipated output. Once feature extraction is completed, the resulting feature maps are flattened and passed through fully connected layers. Finally, a softmax layer determines the probability of each class based on the output (Y) of the fully connected layer, defined by:

$$Y=Wx+b$$
(5)

Following classification, the overall performance of the network can be evaluated using indicators such as accuracy, precision, recall, and F1 score, obtained by comparing the predicted and true labels of the input data.
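
The sketch below shows how these indicators can be computed from predicted and true labels using scikit-learn; the label arrays are placeholders for illustration only.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder labels: true classes vs. classes predicted by the network
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```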

2.4 Proposed Approach

Once the AE data \(y(t)\) is measured, the EMD method is applied to decompose the data and extract the key AE components (IMFs) from each AE sensor. Then, the CWT is used to obtain the TF representation (spectrogram) of each IMF extracted by EMD. The resulting spectrograms are fed into a 2D CNN model to identify the potential location of the damage. In this paper, a modified VGG-16 developed by [18] is used to classify the spectrograms of the IMFs. The number of filters in each convolutional layer of the traditional VGG-16 network is reduced by a factor of 8, since the number of classes and the complexity of the features are much lower than those the original network was designed for. Moreover, the length of the intermediary fully connected layers is reduced from 4096 to 512, with the final fully connected layer having a length equal to the number of classes used in the study. The main steps of the proposed approach are illustrated in Fig. 1, and a code sketch of the modified network is given after the figure.

Fig. 1. The flowchart of the proposed approach
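
A minimal Keras sketch of this modified VGG-16 is given below, with the convolutional filter counts of the standard VGG-16 divided by 8 and the intermediary fully connected layers reduced to 512 units, as described in Sect. 2.4. The input image size and the two-class output are assumptions for illustration, not values fixed by the method.

```python
from tensorflow import keras
from tensorflow.keras import layers

def modified_vgg16(input_shape=(224, 224, 3), n_classes=2):
    """VGG-16 layout with conv filter counts divided by 8 and 512-unit FC layers."""
    cfg = [8, 8, "P", 16, 16, "P", 32, 32, 32, "P",
           64, 64, 64, "P", 64, 64, 64, "P"]      # "P" marks a max-pooling layer
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for v in cfg:
        if v == "P":
            x = layers.MaxPooling2D(pool_size=2)(x)
        else:
            x = layers.Conv2D(v, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```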

3 Numerical Illustration

3.1 Sine Example

In order to validate the performance of the proposed approach, a suite of sine signals containing four different frequencies (f1 = 1 Hz, f2 = 3 Hz, f3 = 5 Hz, and f4 = 7 Hz) is considered. However, a pure sine signal can generate only a single CWT image, which is not enough to train the CNN model. To overcome this issue, an ensemble of sine signals with the same frequency is generated by adding noise to the original sine signal, as shown in Eq. (6):

$$X = \sin (2\pi ft) + a$$
(6)

where \(f\) is the signal frequency and \(a\) is the noise component, a normally distributed random time series. Each signal is processed using the CWT to create a spectrogram, as shown in Fig. 2. A total of 1000 images per frequency class are generated using the CWT method and used as input to the CNN model: 70% of the images are used for training, 20% for validation, and the remaining 10% for testing. This results in 2800, 800, and 400 images for training, validation, and testing, respectively. The training is conducted over 30 epochs using a Stochastic Gradient Descent with Momentum (SGDM) solver with an initial learning rate of 0.0005, a minibatch size of 128, and an L2 regularization of 0.0005. Figure 3 shows the variation of the accuracy and loss throughout the training and validation process. Owing to the high performance of the network, the accuracy reaches 100% and the loss is minimized within approximately 3 epochs (39 minibatch iterations). The trained network is then tested using 100 images of each frequency class, and the proposed CNN model achieves 100% classification accuracy for all frequency classes.
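
A short sketch of the ensemble generation in Eq. (6) is given below; the sampling rate, signal duration, and noise standard deviation are illustrative assumptions, since only the frequencies and the number of realizations per class are specified in the text.

```python
import numpy as np

def sine_ensemble(freq, n_signals=1000, fs=100.0, duration=10.0, noise_std=0.1):
    """Generate noisy sine realizations per Eq. (6): x = sin(2*pi*f*t) + a."""
    t = np.arange(0.0, duration, 1.0 / fs)
    clean = np.sin(2.0 * np.pi * freq * t)
    noise = np.random.normal(0.0, noise_std, size=(n_signals, t.size))
    return clean + noise, t

# e.g. 1000 noisy realizations for the 1 Hz frequency class
signals_1hz, t = sine_ensemble(1.0)
```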

Fig. 2. Spectrograms of the sine signals with frequency (a) 1.0 Hz, (b) 3.0 Hz, (c) 5.0 Hz, and (d) 7.0 Hz, respectively

Fig. 3. Performance evaluation of the proposed CNN (modified VGG-16) network

4 Experimental Study

In order to validate the performance of the proposed method as a damage localization tool, an experimental test is conducted on a wooden beam monitored using two AE sensors. These sensors have an operating frequency range of 20–450 kHz, which is suitable for the proposed application. A preamplifier is attached to the AE sensors to amplify the AE signals. A decoupling box is connected to the preamplifier at one end and to the data acquisition (DAQ) system at the other end; it is also attached to a direct-current supply to power the AE sensors and collect the AE signals. The DAQ, which has four input measurement channels, is attached to a computer. The sampling frequency of the AE sensors is set to 20 kHz. Figure 4(a) presents the setup of the AE monitoring system. To evaluate the performance of the proposed method, AE data collected from the wooden beam using two sensors (S1 and S2) is considered, as shown in Fig. 4(b). The beam is 62 cm long, 6.5 cm wide, and 2 cm thick. Two damage locations were considered to check the capability of the proposed method for localizing the damage (location D1: damage near S1, and location D2: damage near S2), as shown in Fig. 4(b, c). A drilling machine was used to create the damage while the AE data was collected.

Fig. 4. (a) AE monitoring system and experimental setup, (b) actual test specimen, and (c) schematic of the wooden beam

Figure 5 shows the time history of the AE data collected from the wooden beam using S1 and S2. The AE sensors produce a large volume of data due to the high sampling frequency. Therefore, the AE time series was divided into a finite number of windows (say, N), the EMD method was applied to each segment separately, and a number of IMFs were extracted. Figure 6 presents the first IMF obtained from EMD and its Fourier spectra using AE data from (a, b) S1 and (c, d) S2. The CWT method was then used to generate the spectrogram of each IMF obtained from EMD. Figure 7 shows typical coloured (original) spectrograms of the IMFs using AE data collected from (a) S1 and (b) S2 for damage at D1. These coloured spectrograms obtained from the CWT, each of size 936 × 1920 pixels, were used as the training and testing data of the 2D CNN.
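
A sketch of this windowed EMD–CWT pipeline is shown below, reusing the hypothetical `emd` and `save_cwt_spectrogram` helpers sketched in Sects. 2.1 and 2.2. The window length is an assumption, and only the first IMF of each window is converted to a spectrogram here for brevity.

```python
import numpy as np

def ae_to_spectrograms(ae_signal, fs, out_dir, win_len=2048):
    """Split an AE record into windows, apply EMD to each window, and save the
    CWT image of the first IMF (simplified sketch of the pipeline)."""
    n_windows = len(ae_signal) // win_len
    t = np.arange(win_len) / fs
    for k in range(n_windows):
        segment = ae_signal[k * win_len:(k + 1) * win_len]
        imfs, _ = emd(segment, t)                   # helper from Sect. 2.1
        if imfs:                                    # keep only IMF1 here
            save_cwt_spectrogram(imfs[0], fs, f"{out_dir}/win_{k:04d}.png")
```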

Fig. 5. Time-history of the measured AE data for damage at D1

Fig. 6. IMF1 and its Fourier spectra of AE data collected from (a, b) S1 and (c, d) S2 for damage at D1

Fig. 7. Spectrograms of the IMF using AE data from (a) S1 and (b) S2 for damage at D1

In this study, 473 randomly selected images of each sensor class were used in the training process, and an additional 135 and 67 images of each sensor class were used in the validation and testing processes, respectively. Thus, a total of 135 × 2 = 270 images were used for validation, while 67 × 2 = 134 images were used for testing. Figure 8 shows the training process of the CNN model; the network was trained for 50 epochs using the same hyperparameters outlined in Sect. 3.1. The confusion matrices shown in Fig. 9(a, b) display the classification accuracy for the validation and testing datasets, respectively. The accuracy, recall, precision, and F1 scores of the validation and testing datasets are summarized in Table 1. The performance indicators calculated from the confusion matrices indicate excellent damage identification using the proposed approach.
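
For completeness, a sketch of the training configuration is given below, assuming the hypothetical `modified_vgg16` model from Sect. 2.4 and spectrogram images stored in class-labelled folders. The folder paths and the momentum value are assumptions (the text specifies only the SGDM solver, learning rate, minibatch size, and number of epochs), and the reported L2 regularization of 0.0005 would be added via `kernel_regularizer` in the layer definitions.

```python
from tensorflow import keras

# Hypothetical spectrogram folders, one sub-directory per sensor class (S1 / S2)
train_ds = keras.utils.image_dataset_from_directory(
    "spectrograms/train", image_size=(224, 224), batch_size=128)
val_ds = keras.utils.image_dataset_from_directory(
    "spectrograms/val", image_size=(224, 224), batch_size=128)

model = modified_vgg16(input_shape=(224, 224, 3), n_classes=2)

# SGDM solver with the reported learning rate; the momentum value is assumed
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.0005, momentum=0.9),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=50)
```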

Fig. 8. Training and validation: (a) performance and (b) loss of the modified VGG-16 using the CWT images

Fig. 9. Confusion matrices for (a) validation and (b) testing data of the wooden beam

Table 1. Performance indicators calculated from the classification of CWT images of close and far damage

5 Conclusion

The AE technique is considered one of the most powerful NDT techniques, capable of detecting and localizing minor damage due to its sensitivity to damage initiation and propagation. For long-term AE monitoring of full-scale structures, AE sensors produce a huge amount of data due to the high sampling frequency, and handling such massive AE data using traditional feature extraction techniques can be time-consuming and computationally expensive. In this study, a 2D CNN model is proposed to automate the process of detecting and localizing damage using massive AE data collected from structures. A set of numerical and experimental studies is conducted to validate the performance of the proposed approach as a damage detection tool using a limited number of AE sensors. The results show that the proposed approach can identify the approximate location of damage in a structural element with 93% accuracy, making it a suitable candidate for a practical damage detection tool.