1 Introduction

Carbon steel pipelines are widely used in industry to transport oil and gas. During manufacturing or while in service, the rounded pipelines and other metallic structures often suffer from flaws such as cracks or wall loss corrosion, or a combination of the two [1]. Consequently, several flaws of the same or of different types can exist simultaneously in a pipeline or other load-bearing structures (rail tracks, bridges, etc.). When more than one flaw exists in close proximity, their stress and strain fields can interact, and they are referred to as interacting flaws. Each flaw introduces its own disturbance in the stress and strain fields; for interacting flaws, the combined disturbances can lead to a different and often more severely compromised state of structural integrity than a single flaw [2, 3]. Interacting flaws are found to cause unexpected failure (such as a sudden burst of pipelines), posing a severe threat to structural integrity [4]. Improved theoretical and empirical burst models for interacting flaws have been the focus of several ongoing research works, such as for the interaction between corrosion and crack [5], multiple cracks [6], dent and crack [1], and multiple corrosion [7]. A prerequisite to applying models for interacting flaws correctly, or to assessing structural reliability for continued operation, is the ability to identify and classify both single and interacting flaws. However, this is particularly challenging for non-visible flaws. One important application is critical hydrocarbon-carrying pipelines containing embedded cracks or wall loss corrosion on the side opposite to the surface available for non-destructive testing (NDT) measurements. For an externally accessible pipeline, internal wall loss corrosion is not visible when inspected from the outside surface. In the case of buried underground pipes, the typical inspection is conducted using smart sensors that travel inside the pipes, where the external wall loss corrosion is hidden. An ultrasound test is the most common non-destructive evaluation (NDE) method for non-visible flaw detection in structures [8]. The ultrasound signals reflected from the flaw–structure interface usually exhibit a slightly perturbed waveform compared to those reflected from structures that are free of flaws. A significant drawback of the current practice is the inevitable involvement of human judgment: operators are usually required to evaluate the measured ultrasound signals. Even with the most detailed assessment, the NDE predictions are prone to errors and large uncertainties when interpreting a time-amplitude reflected ultrasound waveform [9]. In the case of non-visible interacting flaws, the complexity of the ultrasound signals exacerbates the situation. Significant effort has been focused on designing better NDT sensors and equipment, such as the assembly of ultrasonic arrays by connecting small elements to increase detection accuracy [10, 11]. Nonetheless, accurate identification and classification of non-visible interacting flaws remains a challenging unsolved problem, which this paper attempts to solve with the proposed methodology.

Machine learning has made remarkable progress during the past decades and has been successfully applied to various structural mechanics problems [12,13,14,15,16,17,18,19,20]. Several attempts have been made to apply machine learning models to flaw classification. Sambath et al. [21] used an artificial neural network combined with a feature vector extracted by wavelet transform to classify four types of defects using ultrasonic oscillograms. Yang et al. [22] used an artificial neural network for flaw classification of ultrasound signals acquired from carbon-fiber-reinforced polymer (CFRP) specimens and compared the performance of different feature extraction techniques. Liu et al. [23] used wavelet packet transformation to extract features and classified four types of stainless steel resistance spot welds with a neural network. Meng et al. [24] proposed a convolutional neural network (CNN)-based approach to classify ultrasound signals from CFRP for void and delamination. Recent research has focused on flaw identification and classification using 2D images as training data for image-based convolutional neural networks. For the 2D image-based machine learning applications, the images can be acquired directly from in-field cameras [25,26,27,28], or through alternative imaging methods such as X-ray [29,30,31] in the laboratory.

Image-based CNN flaw classifiers have received the majority of attention. However, utilizing ultrasound time signals (1D data) as training data has several advantages. First, time signals, or A-scans, have the quickest structure scanning/response time. A fast A-scan is a viable and practical method for in-field measurements on large areas such as tens of miles of pipelines. Ultrasound imaging techniques such as B-scan depend upon the post-processing of raw data; they have a slower data acquisition speed and can compress and lose information. Second, compared to images, 1D data require a shorter training time and a much smaller training dataset [32, 33]. This is beneficial when the sources of training data are scarce. In fault diagnosis using machine vibration signals, several works suggested that a time signal-based CNN is capable of achieving high accuracy using one-dimensional time signals even in a noisy environment [34,35,36,37]. Munir et al. used a time signal-based CNN for flaw classification of different weldment defects using experimentally acquired ultrasound signals [38, 39]. Recently, Niu and Srivastava [40] demonstrated that a simulation-trained signal-based CNN can predict crack characteristics from experimental ultrasound signals with very high accuracy.

It is very challenging and expensive to create well-labeled experimental datasets of ultrasound signals for non-visible interacting flaws. This hinders the use of machine learning and leaves a significant gap in NDE methods to accurately classify interacting hidden flaws. To fill this gap, we propose a methodology in which we used numerical simulations to create a sufficiently large and well-labeled training dataset for a CNN. The finite element method has been established as an acceptable method of simulating a variety of mechanical problems and providing reasonably accurate numerical solutions [41,42,43,44,45,46]. We have considered both embedded cracks and wall loss corrosions in this work. Five important categories of single and interacting flaws, including no flaw, single crack, single wall loss corrosion, two cracks, and a crack with corrosion, were considered for classification. To show that purely computational finite element simulation-trained CNN works on real-life experimentally measured ultrasound signals, we performed validation experiments on 3D-printed steel specimens using a commercial ultrasound NDT unit. Thirteen specimens were designed and contained a variety of single and interacting flaws. Actual ultrasound signals were measured from these specimens and were successfully classified by purely simulation-trained CNN. Figure 1 illustrates our proposed approach. This methodology addresses the important problem of non-visible single and interacting flaw classification. It demonstrates that a completely simulation-trained CNN can predict single or interacting flaws from independent experimentally measured signals.

Fig. 1

A schematic of our methodology for identification and classification of single and interacting flaws. The CNN was initially trained with computational data from simulations and then applied successfully on independent experimental ultrasound measurements

The article is organized as follows: in Sect. 2, we discuss the different categories of flaws considered in this work; in Sect. 3, we present our computational method, including details of the finite element simulations, the CNN architecture, and the training data preparation; then, in Sect. 4, we demonstrate the performance of our CNN on simulation testing data, followed by the validation experiments and the simulation-trained CNN’s classification performance on the experimental data. We give closing remarks in Sect. 5.

2 Geometric description of single and interacting flaws

Five representative categories of non-visible flaws are considered in this work, namely:

  • No flaw (NF)

  • Single embedded crack (SC)

  • Single wall loss corrosion (SW)

  • Two embedded cracks (TC)

  • An embedded crack and a wall loss corrosion (CW).

We assume a rectangular cuboid geometry for the structure, elliptical penny-shaped cracks, and partial spheroid cut-section wall loss corrosion. NF establishes the baseline, SC and SW are cases of the single flaw, and TC and CW demonstrate interacting flaws. Figure 2 shows an illustration of each flaw category that was used in ultrasound finite element simulations.

Fig. 2

Cross-sectional view on the center plane for the five representative categories of single and interacting flaws that are non-visible to an observer, indicated by the eye symbol

Multiple flaw-related parameters are needed to define the flaw’s geometry for each category. Generally, for an elliptical crack, one needs to specify its long axis, short axis, location, and orientation; for a partial spheroid wall loss corrosion, its width, height, and location. For this study, we treated the major flaw-related parameters as variables, made a limited number of simplifying assumptions, and kept the remaining parameters fixed. A crack’s long axis dominates the stress concentration and fracture behavior. Hence, the minor axis (thickness of the penny-shaped elliptical cracks) was set at 0.5 mm, while the length (long axis) was varied. The orientation of the cracks was varied in the 1-3 plane. The flaws were positioned in the center of the geometry except for the cases with two flaws, in which the crack is allowed to have both a horizontal offset and a vertical offset.

Throughout this article, we use l, d, and \(\theta\) to denote a crack’s length, depth from the measurement surface, and orientation, and h and w to denote a wall loss corrosion’s height and width. The horizontal distance between two flaws is denoted by s for TC and CW. The subscripts 1 and 2 denote the first and second crack. With these notations, SC has 3 variable parameters, represented by \(l_1\), \(d_1\) and \(\theta _1\). TC has 7 parameters, represented by \(l_1\), \(d_1\), \(\theta _1\), \(l_2\), \(d_2\), \(\theta _2\) and s. SW has 2 parameters, denoted by h and w, and CW has 6 parameters, denoted by \(l_1\), \(d_1\), \(\theta _1\), h, w, and s. The details of the identified parameters as well as their ranges are given in Table 1. The parameter ranges were selected to represent practical feature sizes and to cover a broad range of variability. The range of horizontal distance was selected to be approximately 25% of the total thickness of the geometry (about 5 mm of the 19 mm thickness), assuming that the two flaws can be considered interacting within this range.
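The parameterization above can be summarized in a small lookup structure. The following is only an illustrative sketch: the dictionary name `FLAW_PARAMETERS`, the helper `parameter_count`, and the plain-text parameter names (`l1`, `theta1`, and so on) are assumptions introduced here, and the parameter ranges of Table 1 are intentionally not reproduced.

```python
# Variable parameters for each flaw category, following the notation in the text:
# l = crack length, d = crack depth from the measurement surface,
# theta = crack orientation, h and w = corrosion height and width,
# s = horizontal distance between two flaws.
# The names below are hypothetical labels chosen for this sketch.
FLAW_PARAMETERS = {
    "NF": [],                                                  # no flaw
    "SC": ["l1", "d1", "theta1"],                              # single crack
    "SW": ["h", "w"],                                          # single wall loss
    "TC": ["l1", "d1", "theta1", "l2", "d2", "theta2", "s"],   # two cracks
    "CW": ["l1", "d1", "theta1", "h", "w", "s"],               # crack + wall loss
}


def parameter_count(category):
    """Return the number of variable parameters for a flaw category."""
    return len(FLAW_PARAMETERS[category])


# Categories that carry a flaw-to-flaw distance s are the interacting ones.
INTERACTING = {name for name, params in FLAW_PARAMETERS.items() if "s" in params}
```

In this sketch, the presence of the distance parameter s is what distinguishes the interacting categories (TC and CW) from the single-flaw categories.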

Table 1 Details of the parameters for each flaw category and the corresponding given ranges

3 Computational methods

3.1 Finite element simulation

Numerical simulations were performed using the finite element software Abaqus Explicit. The simulated geometry is a rectangular cuboid with dimensions of 50 mm \(\times\) 50 mm for the length and width and 19 mm for the thickness. The dimensions in the transverse directions (directions 1 and 2) are sufficiently large to represent large structures in which the ultrasound reflections from the transverse boundaries do not interfere with the through-thickness wave measurements. Only half of the cuboid was simulated because of the overall symmetry of the geometry, including the flaws discussed in Sect. 2. The majority of the geometry was meshed with C3D8R hexahedral elements. In regions containing the flaw, finer C3D10M tetrahedral elements with a size of 0.1 mm were used to adapt to the irregular geometry. Figure 3 illustrates an example of a simulated, flawed cuboid with the mesh shown. A linear elastic material response was assumed, with a Young’s modulus of 180 GPa, a Poisson’s ratio of 0.31, and a density of 7300 kg/\(\hbox {m}^3\). These material properties were chosen to reflect the 3D-printed material used later in our studies. Considering the transient nature of ultrasound signals, a dynamic explicit step with a total time of 8 \(\mu s\) was utilized in the simulations. The explicit time increment was chosen to be small enough to meet the stability requirements and was fixed at 2 ns. This time increment also matched the data sampling interval in the experiments. A time-dependent pressure boundary condition was used to represent a 5 MHz, raised-cosine type ultrasound pulse whose waveform can be described as

$$\begin{aligned} A = \left\{ \begin{array}{ll} \cos (2\pi ft)\left[ 1-\cos \left( \frac{2\pi ft}{m}\right) \right] , &{} 0\le t \le \frac{m}{f}\\ 0, &{} \text{ otherwise }, \end{array}\right. \end{aligned}$$
(1)

where t is the time, f is the pulse frequency, and \(m=2\) is the number of periods. The pulse was applied on a circular region in direction 3 (see Fig. 3) on the top surface, which has a diameter of 6 mm. The circular region in the simulation is the same as the ultrasound transducer used in experiments. Simulation ultrasound signals were produced by averaging the time history of the nodal displacements in direction 3 for all the surface nodes in the circular transducer region.
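As an illustration of Eq. (1), the pulse can be evaluated numerically with a few lines of Python. This is a minimal sketch, not the Abaqus amplitude definition used in the simulations: the function names `raised_cosine_pulse` and `sample_pulse` are assumptions, and the amplitude is left unscaled (as written, Eq. (1) peaks at 2 in the middle of the pulse).

```python
import math


def raised_cosine_pulse(t, f=5.0e6, m=2):
    """Evaluate Eq. (1): a raised-cosine windowed tone burst.

    t is the time in seconds, f the pulse frequency in Hz, and m the
    number of periods. The pulse is zero outside 0 <= t <= m / f.
    """
    if 0.0 <= t <= m / f:
        carrier = math.cos(2.0 * math.pi * f * t)
        window = 1.0 - math.cos(2.0 * math.pi * f * t / m)
        return carrier * window
    return 0.0


def sample_pulse(total_time=8.0e-6, dt=2.0e-9):
    """Sample the pulse on the 2 ns grid over the 8 microsecond step."""
    n_samples = int(round(total_time / dt)) + 1
    return [raised_cosine_pulse(i * dt) for i in range(n_samples)]
```

With f = 5 MHz and m = 2, the pulse lasts m/f = 0.4 \(\mu s\), which is a small fraction of the 8 \(\mu s\) step.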

Fig. 3

a Top view of the (half) simulated geometry. The half-circle at the bottom indicates where the pulse was applied to represent the size of an ultrasound transducer. b Cross-section side view of the symmetry mid-plane. The finite element mesh is significantly finer at the center region containing the elliptical crack
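As a rough plausibility check on the simulation settings of Sect. 3.1, the stated material properties can be used to estimate the longitudinal wave speed and the time needed for a pulse to travel through the thickness and back. The sketch below is an assumption-laden estimate, not part of the original analysis: it uses the standard bulk longitudinal wave speed formula for an isotropic linear elastic solid, and all variable names are hypothetical.

```python
import math

# Values stated in the text.
YOUNGS_MODULUS = 180.0e9   # Pa
POISSON_RATIO = 0.31
DENSITY = 7300.0           # kg/m^3
THICKNESS = 19.0e-3        # m
FREQUENCY = 5.0e6          # Hz
TIME_INCREMENT = 2.0e-9    # s
TOTAL_TIME = 8.0e-6        # s


def longitudinal_wave_speed(e, nu, rho):
    """Bulk longitudinal wave speed in an isotropic linear elastic solid."""
    return math.sqrt(e * (1.0 - nu) / (rho * (1.0 + nu) * (1.0 - 2.0 * nu)))


wave_speed = longitudinal_wave_speed(YOUNGS_MODULUS, POISSON_RATIO, DENSITY)
wavelength = wave_speed / FREQUENCY              # m
round_trip_time = 2.0 * THICKNESS / wave_speed   # s, top surface -> back wall -> top
samples_per_period = 1.0 / (FREQUENCY * TIME_INCREMENT)
```

Under these assumptions, the wave speed is about 5.8 km/s, the wavelength is a little over 1 mm, and the back-wall round trip takes about 6.5 \(\mu s\), so the 8 \(\mu s\) step is long enough to capture the first back-wall reflection. The 2 ns increment gives 100 samples per period of the 5 MHz pulse.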

3.2 Convolutional neural network

Here, we briefly introduce the CNN architecture. There are three important layers in a CNN: the convolutional layer, the pooling layer, and the fully connected (FC) layer. An activation function is needed for the convolutional layer and the fully connected layer to enable the nonlinear learnability of the network [36]. We selected the rectified linear unit (ReLU) activation function for its relatively fast convergence; its expression is given as follows:

$$\begin{aligned} \text{ ReLU }(x) = {\rm{max}}(0, x). \end{aligned}$$
(2)

We then adopted the most common max-pooling layer, where local maxima are subsampled from the input to reduce the computational demand. In this work, we also used the dropout technique to prevent overfitting in our network [47]. During the operation of dropout within the training stage, some neurons are randomly deactivated with a probability p. This simple method has been proved to be very effective in improving the network’s generalization ability. This is greatly desired for our application, where a simulation-trained CNN is required to work well with the experimental data.
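The three building blocks described above (ReLU, max pooling, and dropout) can be sketched on plain Python lists as follows. This is an illustration only, not the PyTorch layers used in this work: the function names are assumptions, and the dropout shown here uses the "inverted" convention (surviving activations are rescaled by 1/(1-p) during training), which is common in deep learning frameworks but is not stated in the text.

```python
import random


def relu(values):
    """Apply ReLU(x) = max(0, x) element-wise, as in Eq. (2)."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, kernel_size, stride=None):
    """Subsample local maxima from a 1D sequence."""
    stride = stride or kernel_size
    last_start = len(values) - kernel_size
    return [max(values[i:i + kernel_size]) for i in range(0, last_start + 1, stride)]


def dropout(values, p, training=True, rng=random):
    """Randomly deactivate entries with probability p during training.

    Assumes inverted dropout: kept entries are scaled by 1 / (1 - p),
    and the input is returned unchanged in evaluation mode.
    """
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]
```

For example, `max_pool1d([1, 3, 2, 5], 2)` keeps the larger value of each non-overlapping pair, halving the length of the sequence.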

We have used a CNN architecture shown in Fig. 4. This relatively small-sized CNN architecture was built in PyTorch, and had two convolutional layers, one pooling layer, and two fully connected layers, including the output layer. The CNN architecture was initially configured following the works of [36, 38] which demonstrated the need for moderate-to-large kernel size to eliminate noise that could exist in both simulations and experiments. We then optimized the network details for our application. The details of the architecture are listed in Table 2. We selected Adam as our optimization algorithm [48] and cross-entropy loss function which can be defined as

$$\begin{aligned} {\rm{Loss}} = -\sum ^n_{i = 1}t_i\log (p_i). \end{aligned}$$
(3)

Here, i denotes the ith class (flaw category), n is the number of classes, \(t_i\) is the binary true label (target), and \(p_i\) is the softmax probability. The softmax function takes the raw numerical outputs of the CNN for each class and normalizes them according to

$$\begin{aligned} p_i = {\rm{Softmax}}(y_i) = \frac{\exp (y_i)}{\sum _j\exp (y_j)}, \end{aligned}$$
(4)

where \(y_i\) is the raw ith output from the CNN and the summation in the denominator runs over all the classes. In the fully connected layer, we set the dropout probability to 0.2. The CNN was then trained for 1000 epochs with a learning rate of 0.001.
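Equations (3) and (4) can be illustrated with a short, framework-independent sketch. The function names are assumptions, and the subtraction of the maximum logit is a standard numerical-stability step that does not change the result of Eq. (4). Because the true label \(t_i\) is one-hot, the sum in Eq. (3) reduces to the negative log-probability of the true class.

```python
import math


def softmax(logits):
    """Eq. (4): normalize raw network outputs into class probabilities."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(y - shift) for y in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Eq. (3) for a one-hot target: only the true-class term survives."""
    probabilities = softmax(logits)
    return -math.log(probabilities[true_class])
```

For five classes with identical raw outputs, every class has probability 1/5 and the loss equals \(\log 5\), which is the loss of an uninformed classifier and a useful reference value when reading a training loss curve.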

Fig. 4

Schematic of the CNN architecture for the single and interacting flaw classification study

Table 2 Details of the CNN architecture

3.3 Training data preparation

All training data for our CNN came from finite element simulations. We exploited the advantages of finite element simulations (relatively fast and inexpensive) and conducted 500 simulations for each flaw category. One ultrasound signal was produced in each simulation, and we collected a dataset of 2500 signals in total. In each simulation, the parameters listed in Table 1 were assigned random values within their respective ranges. NF was simulated only once, and the signal was copied 500 times in the dataset so that the classes remained balanced during training. The total number of simulations was determined through a study of the network performance: the number was gradually increased until the CNN achieved the desired classification accuracy on the simulation data.

The initial portion of each ultrasound signal contained the input pulse; this portion was removed for all the signals, leaving 6.8 \(\mu s\) of the 8 \(\mu s\) simulated record (i.e., roughly the first 1.2 \(\mu s\) was discarded). The absolute values of the signal were then normalized between 0 and 1 by dividing by the maximum absolute signal amplitude to produce the final training data

$$\begin{aligned} \text{ Signal }(t) = \frac{\left| \text{ Signal }(t)\right| }{\max \left( \left| \text{ Signal }(t)\right| \right) }. \end{aligned}$$
(5)
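The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `preprocess` is hypothetical, the 1.2 \(\mu s\) window is inferred from the difference between the 8 \(\mu s\) simulated record and the 6.8 \(\mu s\) retained signal, the 2 ns sample spacing is taken from Sect. 3.1, and the maximum in Eq. (5) is assumed to be taken over the retained portion of the signal.

```python
def preprocess(signal, dt=2.0e-9, pulse_window=1.2e-6):
    """Remove the input-pulse portion and apply the normalization of Eq. (5).

    signal is a list of displacement samples spaced dt seconds apart.
    pulse_window (an inferred value, see the lead-in) is the duration
    discarded at the start of the record.
    """
    n_skip = int(round(pulse_window / dt))
    trimmed = signal[n_skip:]
    peak = max(abs(v) for v in trimmed)
    if peak == 0.0:
        raise ValueError("signal is identically zero after trimming")
    # Rectify and scale so that all values lie between 0 and 1.
    return [abs(v) / peak for v in trimmed]
```

Because the signal is rectified before scaling, the sign of the reflected waveform is discarded and only its normalized magnitude is passed to the CNN.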

4 Results

4.1 Classification on simulation-generated testing data

We first examined the classification performance using an independent simulation dataset (not used for training). We used the standard confusion matrix as the evaluation metric to quantify the classification accuracy [49]. The results were normalized to the true values (rows). The dataset containing 2500 simulation ultrasound signals was divided into 2000 training data (80%) and 500 testing data (20%). The division was random within each flaw category, with 400 training data and 100 testing data per category. The training loss during the training stage is shown in Fig. 5a. To see how the classification performance evolves over the training stage, the network parameters in the CNN were frozen after each epoch (evaluation mode), and the partially trained CNN evaluated the testing data. From Fig. 5b, we can see that the network demonstrated a good overall classification accuracy of over 90% after only 100 epochs. The performance curve of the testing data is also similar to that of the training data. After 1000 epochs, the classification rate on the testing data is 100% for NF and SW, 95% for SC and CW, and 93% for TC, and the overall accuracy is 96.6%. The confusion matrix is given in Fig. 5c. Our convolutional neural network exhibited good learning capability on simulation data and showed high classification accuracy for all five categories of single and interacting flaws.
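The evaluation described above can be illustrated with a small sketch of a row-normalized confusion matrix and the overall accuracy. The function names are assumptions, and only the per-class accuracies quoted in the text are used; the off-diagonal entries of Fig. 5c are not reproduced here.

```python
CATEGORIES = ["NF", "SC", "SW", "TC", "CW"]


def confusion_matrix(true_labels, predicted_labels, categories=CATEGORIES):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(categories)}
    counts = [[0] * len(categories) for _ in categories]
    for true, predicted in zip(true_labels, predicted_labels):
        counts[index[true]][index[predicted]] += 1
    return counts


def normalize_rows(counts):
    """Normalize each row by the number of true samples in that class."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def overall_accuracy(counts):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total


# Correct predictions per class implied by the reported rates
# (100 testing signals per category).
REPORTED_CORRECT = {"NF": 100, "SC": 95, "SW": 100, "TC": 93, "CW": 95}
reported_overall = sum(REPORTED_CORRECT.values()) / (100 * len(CATEGORIES))
```

With 100 testing signals per category, the reported per-class rates correspond to 483 correct predictions out of 500, which reproduces the overall accuracy of 96.6%.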

Fig. 5

a Loss of the training data at each epoch. Y-axis is in log scale. b Performance (overall accuracy) of the CNN on training data and independent testing data at each epoch. c Confusion matrix of the testing data with the trained CNN

4.2 Validation experiments

4.2.1 Experimental method

Ultrasound non-destructive tests (NDT) were conducted using 3D-printed metal specimens and a commercial ultrasound flaw detector, shown in Fig. 6. A total of 13 specimens with dimensions of 50 mm \(\times\) 50 mm \(\times\) 19 mm were fabricated using 17-4PH stainless steel powder as the base material. The 3D-printed steel specimens used for validation have an average density of 7300 kg/\(\hbox {m}^3\). In contrast to 3D-printed polymers, which show highly porous structures, the 3D-printed steel specimens are homogeneous, and their porosity is low enough that it does not disrupt ultrasound wave propagation in the bulk material. Three specimens were made for each of the categories SC, SW, TC, and CW, and one for NF as a control case. Details on the parameters of the single and interacting flaws considered in the experiments are summarized in Table 3. The parameters for the experimental analysis were chosen independently to be diverse and to span a broad range. The experiments were conducted using an Olympus Epoch 650 Ultrasonic NDT Flaw Detector with a straight beam, single element, 5 MHz frequency transducer. Hydrogel couplant (35% propylene glycol) was applied between the transducer element and the specimen to avoid air gaps during NDT.

When taking the measurement, the transducer was placed at the center of the top surface of the specimen (see Fig. 3a), which is consistent with the simulations. In real applications, the data fed to the CNN for interpretation can be identified automatically by selecting the time signal snapshot that reflects the minimal distance to the flaw reflection. In practice, some offset is possible. Therefore, we took an additional measurement for each of the 12 flawed specimens (excluding the NF specimen) by placing the transducer with a small horizontal offset (10% of the specimen thickness) in direction 1 from the center of the specimen. This resulted in a final experimental dataset of 13+12=25 ultrasound signals in total. The input pulse perturbation was removed, and the signals were normalized following the same procedure as for the simulation data. These experimental ultrasound signals were completely independent and were not part of the simulation-based training.

Fig. 6

Experimental set-up for ultrasound NDT. a 3D-printed steel specimens containing single and interacting flaws. From the top row to the bottom row: NF, SC, SW, CW, and TC, respectively. b An ultrasound NDT unit. c A 5 MHz single-element ultrasound transducer

Table 3 Parameters for the single and interacting flaws in the specimens used in ultrasound non-destructive experiments. Units are the same as in Table 1. Category NF does not have a flaw and is not listed here

4.3 Validation using experimental data

Our CNN was trained only with finite element simulation-generated data. The previous section demonstrated good classification performance on testing data that came from the simulations. For the proposed method to be valid, it is essential to examine the performance of the previously trained CNN on independent experimental ultrasound signals. The classification confusion matrix for the predictions on the 25 ultrasound signals acquired from the experiments is shown in Fig. 7a. The results show that the CNN achieves perfect classification accuracy for all five categories of single and interacting flaws. The higher accuracy observed for the experimental data (100%) compared with the simulation testing data (96.6%) seems to come from the fact that the simulation-based testing dataset is 20 times larger (500 versus 25 signals) and therefore allows a higher probability for predictive errors to occur. It is very promising to observe that, despite being trained only on simulation-generated numerical data, our CNN successfully classified all 25 experimental signals that belonged to five distinct categories of non-visible flaws.

Finally, it is prudent to make some remarks on the learning curve of the CNN. It is essential that the CNN does not overfit the simulation-based training data. We used dropout as the regularization method to prevent overfitting. To show its effect, we compared training with and without dropout in the fully connected layer. By plotting the performance of the CNN on the experimental data at each epoch in Fig. 7b, we can see that the CNN without dropout shows a faster improvement in classification accuracy, but its performance saturates in the later training stage. On the other hand, the CNN that uses dropout can achieve 100% accuracy. However, because dropout deactivates some neurons with a certain probability, it adds some randomness to the network learning and can cause fluctuations. Hence, the network design and the choice of the number of training epochs require careful attention.

Fig. 7

a Confusion matrix of predictions from simulation trained CNN on independent experimental ultrasound NDT data. b Overall performance (classification accuracy) on experimental data at each epoch for two CNN architectures

5 Conclusions

Accurate non-destructive identification and classification of interacting and single structural flaws are critical to assessing the fitness for continued use of a structure or equipment. Cracks and corrosion wall loss are two critical flaws in widely used carbon steel. An undetected single crack or corrosion, a combination of two cracks close to each other, or a crack close to a corroded wall loss can all cause unwanted failure. In the case of two cracks close to each other or a crack close to a corrosion wall loss, the structural integrity can be further compromised. Therefore, interacting flaws must be differentiated from single flaws during the detection and classification stage. Ultrasound NDE is a commonly used method to detect non-visible flaws, but accurate detection and classification of interacting flaws continues to be elusive because the changes in the reflected ultrasound signals in three-dimensional geometries are very subtle and are infeasible for a human to detect and interpret.

Towards this goal, we have demonstrated a methodology where a purely simulation-trained CNN can achieve very high accuracy in the detection and classification of both single and interacting flaws from experimental ultrasound NDT signals. Five representative categories of non-visible single and interacting flaws, namely, no flaw, single crack, single wall loss corrosion, two cracks, and combined crack and wall loss corrosion, were considered in this work. We show that ultrasound signals generated through finite element simulations can be used effectively to develop training data that otherwise are extremely difficult to acquire experimentally. As the highlight of the work, in Sect. 4.3, we presented our results where the simulation-trained CNN was able to classify non-visible flaw categories in real-life steel specimens from the ultrasound NDT measurements. This result is very promising, because the classification of non-visible interacting and single flaws using ultrasound non-destructive signals, traditionally a complex problem, can be addressed using the methodology proposed in this paper. The proposed method can be expanded to other materials and other types of flaws. This work focused on using only simulation data for neural network training to address problems where experimental training data are scarcely available. For applications where significant experimental data are available, hybrid training data that mix simulations and experiments can also be useful for CNN training. For the NDE of large structures like long pipelines, in-line inspections (ILI) are conducted with instrumented PIGs (Pipeline Inspection Gauges) that traverse the length of the pipeline. We envision a neural network, pre-trained using the method shown here, running on a processor onboard the inspection equipment. Continuous interpretation of A-scan ultrasound signals through the onboard neural network would allow accurate identification and the storage of only critical information during long field measurements.