1 Introduction

In the manufacturing processes of major complex equipment, such as steam turbines and ships, welding is often involved; hence, weld defect analysis and recognition based on the ray detection image of the weld play an important role in guaranteeing the welding quality, reliability, and safety of the equipment [1, 2].

Currently, several studies based on traditional machine learning follow the traditional technical method, which mainly includes the process of “defect segmentation - feature extraction - feature selection - defect recognition” [3]. In these studies, modified background subtraction [4] and other methods are proposed for identifying the defect in the X-ray image; feature extraction generally obtains a set of edge-based features [5], hybrid descriptors based on the geometry [6], texture features [7], and other features. Feature selection primarily achieves the function of removing redundant features and noise to retain the useful features alone, and realizes the effective characterization of the defect-type features. Defect recognition is the effective judgment of the type and nature of the defect, based on the above steps, and is the core step in the entire defect recognition system, in which the Bayes [8], SVM [9], DS evidence theory [10], and other pattern recognition methods play an important role.

In recent years, with the development of artificial intelligence (AI) technology, methods based on the convolutional neural network (CNN) have become a research hotspot in image processing, pattern recognition, and other fields [11], because the end-to-end [12] recognition method avoids the complex manual processes of the traditional pipeline. CNNs have been applied in several fields, such as environmental sound classification [13], grasp classification in myoelectric hands [14], and sentiment analysis [15]. To enhance the performance of the CNN, many improvements have been proposed; some of them focus on common problems of the CNN. He et al. [16] proposed a pooling strategy called spatial pyramid pooling to remove the requirement that the CNN input image have an artificially fixed size, which decreases the recognition rate. Zhang [17] introduced a new graph CNN architecture based on the depth-based representation of a graph structure, which captures both the global topological structure and local connectivity structure within a graph. Suganuma [18] proposed a method for designing CNN architectures based on Cartesian genetic programming (CGP).

Some studies have proposed CNNs for specific tasks: Yan et al. [19] proposed HD-CNN for large-scale visual recognition, and UberNet [20] and P-CNN [21] were proposed for recognition tasks in computer vision and for action recognition, respectively. HD-CNN reduces the top-1 and top-5 errors of the VGG-19 model by 1.11% and 0.74%, respectively, and achieves advanced results on both the CIFAR-100 and large-scale ImageNet 1000-class benchmark datasets.

In addition, in some studies, the improvement of the CNN is closely related to the characteristics of its application objects and the specific problems existing in the application of the CNN. Chaturvedi et al. [22] combined dynamic Gaussian Bayesian networks with the CNN to address an issue in previous works, wherein the prior distribution is generally not considered when using sliding windows to learn word embeddings. Ha et al. [23] introduced a multiple neural network topology, referred to as the selective deep CNN, to obtain accurate results for distorted images. Wang et al. [24] combined CNNs with the RNN and proposed a CNN-RNN framework to address the failure to explicitly exploit the label dependencies in an image. The mean average precision of this method on the PASCAL VOC 2007 dataset is 84%. Wu et al. [25] introduced a light CNN for deep face representation with noisy labels. Md Zahangir et al. [26] introduced an inception recurrent residual convolutional neural network (IRRCNN) model, a deep convolutional neural network (DCNN) model that combines the strengths of the inception network (Inception-v4), residual network (ResNet), and recurrent convolutional neural network (RCNN) for breast cancer classification. On the CIFAR-100 dataset, the IRRCNN model achieves 72.78% object recognition accuracy, which is about 4.53% higher than that of the RCNN. Abdulnabi et al. [27] proposed a joint multitask learning algorithm to better predict the attributes in images using a deep CNN. On the clothing dataset, the attribute prediction accuracy of their algorithm is higher than that of other methods for the color attributes, pattern group, cloth parts, and appearance group, and its total prediction accuracy reaches 92.82%. Chen et al. [28] presented a CNN-based algorithm for unconstrained face verification to improve on previous algorithms, whose performance was often considerably degraded by images with large variations in pose, illumination, expression, aging, cosmetics, and occlusion.

Recently, some studies have proposed convolutional neural networks for weld defect recognition. Khumaidi et al. [29] trained the CNN by replacing the convolution kernel with a Gaussian kernel to form a neural network model that recognizes two common types of weld defects. Liu et al. [30] proposed a VGG16-based fully convolutional structure for classifying weld defect images, achieving high accuracy with a relatively small dataset for a deep learning method. Anil et al. [31] improved the cost function of the CNN to avoid redundant activation of the hidden layer. Yuan et al. [32] changed the construction of the low and middle convolution kernels and improved the generalization and convergence of CNNs. Xie et al. [33] combined data augmentation and the sliding-window detection method to realize defect classification and defect location marking. Using the superpixel segmentation algorithm and an improved ELU activation function, the model proposed by Fan et al. [34] can effectively identify four types of weld flaw detection images, with an overall recognition rate of 97.8%. Rui et al. [35] combined the continuous wavelet transform (CWT) with the CNN; the accuracy of their method is 96.94%, which is nearly 10% higher than that of the traditional method. Li et al. [36] constructed a deep learning network structure based on the principle of simulated visual perception, which can automatically learn the complex depth features in X-ray weld defect images.

However, in the existing studies on the application of CNNs for weld defect recognition, the characteristics of the weld defect image are not studied in detail; the existing methods therefore often lack pertinence and are not conducive to further improvement of the final defect recognition rate. First, the traditional pooling strategies (max pooling and average pooling) show poor dynamic adaptability in the presence of different feature distributions in the weld defect area, resulting in inaccurate feature extraction for the entire image. As the gray distribution of the defects in a weld seam image has an important relationship with the gray distribution of the surrounding area [7], the pooling strategy needs to consider the distribution characteristics of the defect area. Furthermore, CNN-based methods generally include three layers: the input layer, hidden layer, and output layer. The output feature vector of the hidden layer is an important factor that causes over-fitting in the CNN model [11]. Therefore, improving the training ability and type recognition accuracy of the model by improving the selection of the features passed to the output layer remains a problem to be solved.

To overcome the above problems, a CNN with an improved pooling strategy and an enhanced feature selection model is proposed in this study for weld defect recognition. First, an improved pooling strategy that comprehensively considers the feature distributions of the pooling area and the feature map is proposed, which overcomes the problem in the traditional pooling strategy wherein the characteristics of the weld defects are disregarded. Furthermore, the ReliefF algorithm is integrated with the CNN to construct a strengthened feature selection method. A CNN is then constructed and trained with the above pooling strategy and feature selection method for image recognition. A practical case demonstrates that this method effectively overcomes the shortcomings of the traditional CNN, improves the accuracy of pooling feature extraction and the feature selection ability of the CNN model, and achieves good recognition accuracy.

The remainder of the paper is organized as follows. Section 2 analyzes the two problems in the application of the CNN for weld defect recognition. Section 3 describes the proposed pooling strategy for solving the problem in the traditional pooling strategy, discusses the enhanced feature selection method, and illustrates the flow of the weld defect recognition method proposed in this study. Section 4 presents a weld defect recognition case and the results of the proposed method in comparison with the traditional CNN methods. Finally, Section 5 presents the conclusions on the proposed method.

2 Analysis of the problem in weld defect identification using a CNN

In this section, the problems in the pooling strategy and feature selection of the traditional CNN model are analyzed. The CNN, first proposed by Fukushima in 1980 [37], contains an input layer, hidden layer, and output layer. The hidden layer is generally composed of multiple convolution and pooling layers, followed by a fully connected layer. A pooling layer is obtained by pooling the input feature map: contiguous regions of the input feature map, generally n × n rectangles, are selected as pooling regions, and a single value is computed from each region, according to a given strategy, to represent that region. The traditional pooling strategies are max and average pooling: max pooling selects the maximum value in the pooling region as the feature of the region, whereas average pooling selects the average value of the pooling region as the feature of the region. Assuming that Fij is the value at position (i, j) of the pooling region in the input image, n × n is the size of the pooling region, n is the moving step, and S is the feature value obtained after pooling, the max and average pooling strategies are given by Eqs. (1) and (2), respectively.

$$ S=\underset{i=1,j=1}{\overset{n}{\max }}\left({F}_{ij}\right) $$
(1)
$$ S=\frac{1}{n^2}\left(\sum \limits_{i=1}^n\sum \limits_{j=1}^n{F}_{ij}\right) $$
(2)
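As an illustration of Eqs. (1) and (2), the following minimal NumPy sketch applies non-overlapping n × n max and average pooling to a small feature map. The function name `pool2d` and the 2 × 4 example values are hypothetical and chosen only for illustration; the left region mimics a defect edge and the right region a nearly uniform weld area.

```python
import numpy as np

def pool2d(feature_map, n=2, mode="max"):
    """Non-overlapping n x n pooling (stride n), following Eqs. (1) and (2)."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a complete n x n region.
    h, w = h - h % n, w - w % n
    # Axis 0/2 index the pooling regions, axis 1/3 index positions inside a region.
    blocks = feature_map[:h, :w].reshape(h // n, n, w // n, n)
    if mode == "max":
        return blocks.max(axis=(1, 3))      # Eq. (1)
    if mode == "average":
        return blocks.mean(axis=(1, 3))     # Eq. (2)
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical feature map: defect-edge region (left) and uniform region (right).
F = np.array([[92.0, 90.0, 90.0, 91.0],
              [90.0, 33.0, 92.0, 90.0]])
max_out = pool2d(F, mode="max")
avg_out = pool2d(F, mode="average")
print(max_out, avg_out)
```

In this toy case, max pooling returns the same value for both regions and so hides the low-valued edge pixel on the left, while average pooling blurs it into the surrounding values.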

To demonstrate the problems involved in the feature extraction of the two traditional pooling strategies, representative pooling regions were selected, as shown in Fig. 1. Figure 1(a) shows the pooling process of the CNN model, in which the gray background area in the red box represents a 2 × 2-sized pooling region. Figure 1(b) and 1(c) display two types of defects, slag inclusion and tungsten inclusion, respectively. The red and blue boxes in these images mark four pooling regions, whose pixel value distributions are depicted in Fig. 1(d), 1(e), 1(f), and 1(g), respectively. Based on the distribution of the pixel values in Fig. 1(d)–1(g), two types of pooling regions with different feature distributions can be observed, depending on the position of the pooling region on the feature map. The first type is depicted in Fig. 1(d) and 1(f). The distribution of the pixel values in this type of pooling region is more uniform, and pooling regions with such a feature distribution occur mostly in the weld area. Here, feature extraction by average pooling is appropriate; if max pooling is used, noise may be introduced. The second type is depicted in Fig. 1(e) and 1(g) and is located at the edge of the defect. This pooling region contains the edge feature of the defect. If the average pooling strategy is used to extract the features of this type of pooling region, the edge features of the defect are eliminated. Therefore, the traditional max pooling and average pooling strategies have poor dynamic adaptability to pooling regions with different feature distributions, resulting in inaccurate feature extraction. In a weld image with defects, the variation of the gray values differs within and outside the defect area, but the traditional max pooling and average pooling strategies do not reflect these variations.

Fig. 1 Pooling strategies in different areas

Regarding the feature selection problem in the CNN, as previously mentioned, the output feature vector of the hidden layer has a significant influence on the training ability and classification effect of the CNN model. Figure 2 shows the hidden layer structure of a typical CNN, in which W11, W12, W21, and W22 are the parameter matrices of the convolution kernels in the two convolution layers, and W3 is the parameter transfer matrix between the two fully connected layers. In the CNN training process, the final output feature is determined and selected by continuously updating the parameters Wij. However, because CNN training often suffers from over-fitting or under-fitting, the output feature includes redundancy and noise, reducing the efficiency of the final classification.

Fig. 2 Hidden layer structure of the CNN

3 Improved CNN model for weld defect–type identification

3.1 Improved pooling strategy

In this section, an improved pooling strategy is proposed to overcome the problems in the feature extraction of weld defect images using the traditional pooling strategy. The proposed strategy is discussed using two types of defect images (slag inclusion and tungsten inclusion) as examples, which are shown in Fig. 3; Fig. 3(a)–3(d) indicate the respective pooling domains located in four different regions of the two defect images. Different pooling methods are needed depending on whether the pooling domain is outside the defect area or on the edge of the defect area, and the calculation method is shown in Eq. (3).

$$ S=\left\{\begin{array}{ll}\underset{i=1,j=1}{\overset{n}{\min }}\left({F}_{ij}\right),& {\sigma}_{\mathrm{P}}\ge {\sigma}_{\mathrm{FM}}\ \mathrm{and}\ \left|{t}_{\mathrm{min}}-{t}_{\mathrm{ave}}\right|>\left|{t}_{\mathrm{max}}-{t}_{\mathrm{ave}}\right|\\ {}\underset{i=1,j=1}{\overset{n}{\max }}\left({F}_{ij}\right),& {\sigma}_{\mathrm{P}}\ge {\sigma}_{\mathrm{FM}}\ \mathrm{and}\ \left|{t}_{\mathrm{min}}-{t}_{\mathrm{ave}}\right|\le \left|{t}_{\mathrm{max}}-{t}_{\mathrm{ave}}\right|\\ {}\mu \underset{i=1,j=1}{\overset{n}{\max }}\left({F}_{ij}\right),& {\sigma}_{\mathrm{P}}<{\sigma}_{\mathrm{FM}}\end{array}\right. $$
(3)

where σP is the variance of the values in the pooling region, σFM is the variance of the values on the feature map, tmin is the minimum value in the pooling region, tmax is the maximum value in the pooling region, and tave is the average of the values in the pooling region, including the maximum and minimum values.

Fig. 3 Revised max pooling strategy

According to Eq. (3), when σP ≥ σFM and |tmin − tave| > |tmax − tave|, as in the case of Fig. 3(a), the minimum value in the pooling region is the output feature value. When σP ≥ σFM and |tmin − tave| ≤ |tmax − tave|, as in the case of Fig. 3(b), the maximum value in the pooling region is the output feature value. When σP < σFM, the feature variance in the pooling region is small and the maximum feature value is not prominent; for this case, a modification factor, μ, is applied to the max pooling result. With tsum denoting the sum of the feature values in the pooling region, this modification factor is the ratio of tsum minus the difference between the maximum and minimum values to tsum, as shown in Eq. (4):

$$ \mu =\left({t}_{\mathrm{sum}}-{t}_{\mathrm{max}}+{t}_{\mathrm{min}}\right)/{t}_{\mathrm{sum}} $$
(4)

For example, Fig. 3(a) shows the pooling strategy for a pooling region in the slag inclusion area, where the minimum feature value in the pooling region represents the defect edge characteristics (the feature values in the pooling region of Fig. 3(a) are 92, 90, 90, and 33, of which 33 represents the defect edge). Figure 3(b) shows the pooling strategy for a pooling region in the tungsten inclusion area, where the maximum feature value in the pooling region indicates the edge characteristics of the tungsten inclusion defect (the feature values in the pooling region of Fig. 3(b) are 242, 174, 166, and 153, of which 242 represents the defect edge). Therefore, the minimum and maximum pooling strategies are adopted for Fig. 3(a) and 3(b), respectively, and their output feature values are 33 and 242, respectively. When the pooling region is located in the feature map area outside the defect area, as shown in Fig. 3(c) and 3(d), the output value is calculated using the modified max pooling strategy, namely, the third case of Eq. (3) with μ given by Eq. (4).

In summary, with the improved pooling strategy, the output feature value is calculated differently according to the location of the pooling region in the feature map, which reflects the characteristics of the defect image and provides a degree of adaptability.
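The case analysis of Eqs. (3) and (4) can be sketched for a single pooling region as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `improved_pool_region` is hypothetical, tave is assumed to be the mean of all values in the region, the feature-map variance σFM is passed in as a precomputed number, and the threshold value 100.0 used in the example is arbitrary.

```python
import numpy as np

def improved_pool_region(region, sigma_fm):
    """Output value S of one pooling region according to Eqs. (3) and (4)."""
    region = np.asarray(region, dtype=float)
    sigma_p = region.var()                       # variance inside the pooling region
    t_min, t_max = region.min(), region.max()
    t_ave, t_sum = region.mean(), region.sum()   # assumption: t_ave is the region mean
    if sigma_p >= sigma_fm:
        # High-variance region (defect edge): keep the extreme value
        # that lies farther from the regional average.
        if abs(t_min - t_ave) > abs(t_max - t_ave):
            return float(t_min)
        return float(t_max)
    if t_sum == 0:
        return 0.0                               # guard added for this sketch only
    # Low-variance region: damped max pooling with the factor of Eq. (4).
    mu = (t_sum - t_max + t_min) / t_sum
    return float(mu * t_max)

SIGMA_FM = 100.0  # hypothetical feature-map variance, for illustration only
s_a = improved_pool_region([92, 90, 90, 33], SIGMA_FM)      # values of Fig. 3(a)
s_b = improved_pool_region([242, 174, 166, 153], SIGMA_FM)  # values of Fig. 3(b)
s_c = improved_pool_region([90, 91, 92, 90], SIGMA_FM)      # hypothetical uniform region
print(s_a, s_b, s_c)
```

With these inputs, the first two regions have variances well above the assumed threshold and return 33 and 242, matching the worked example in the text; the uniform region falls into the third case and returns a value slightly below its maximum.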

3.2 Enhanced feature selection method

To enhance the CNN feature selection ability, an enhanced feature selection method that integrates the ReliefF algorithm with the CNN is proposed. The Relief algorithm, first proposed by Gore [38], is a traditional feature evaluation method that deals with two-class classification problems. It assigns each feature a weight based on the significance of the feature: the greater the weight of the feature, the stronger its classification ability. The ReliefF [39] algorithm extends this idea to multiclass problems and is used here for re-evaluating the features extracted by the CNN for recognition. When dealing with multiclass problems, the ReliefF algorithm randomly extracts one sample, R, from the training sample set at a time, finds the k nearest neighbors of R in the same class as R (nearest hits) and the k nearest neighbors of R in each of the other classes (nearest misses), and then updates the weight of each feature, W(A), as shown in Eq. (5):

$$ W(A)=W(A)-\sum \limits_{j=1}^k\mathrm{diff}\left(A,R,{H}_j\right)/(mk)+\sum \limits_{C\ne \mathrm{Class}(R)}\left[\frac{p(C)}{1-p\left(\mathrm{Class}(R)\right)}\sum \limits_{j=1}^k\mathrm{diff}\left(A,R,{M}_j(C)\right)\right]/(mk) $$
(5)

where m is the number of sampling times, Hj is the jth nearest neighbor of R in the same class as R, Mj(C) is the jth nearest neighbor of R in a different class C, p(C) is the proportion of class C samples in the total, Class(R) is the category to which R belongs, and diff(A, Ri, Rj) is the difference between samples Ri and Rj on feature A, which is mathematically expressed by Eq. (6):

$$ \mathrm{diff}\left(A,{R}_i,{R}_j\right)=\left\{\begin{array}{ll}\frac{\left|{R}_i\left[A\right]-{R}_j\left[A\right]\right|}{\max (A)-\min (A)},& A\ \mathrm{is}\ \mathrm{continuous}\\ {}0,& A\ \mathrm{is}\ \mathrm{discrete},\ {R}_i\left[A\right]={R}_j\left[A\right]\\ {}1,& A\ \mathrm{is}\ \mathrm{discrete},\ {R}_i\left[A\right]\ne {R}_j\left[A\right]\end{array}\right. $$
(6)
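The weight update of Eqs. (5) and (6) can be sketched in NumPy as below. This is an illustrative sketch for continuous features only, not the implementation used in this study: the function name `relieff_weights`, the use of the summed per-feature differences as the neighbor distance, and the toy data are assumptions made for the example.

```python
import numpy as np

def relieff_weights(X, y, sample_idx, k=5):
    """ReliefF feature weights for continuous features, following Eqs. (5)-(6)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                  # constant feature: diff is 0 anyway
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n_samples))   # p(C)
    m = len(sample_idx)                    # number of sampled instances R
    W = np.zeros(n_features)
    for i in sample_idx:
        diff = np.abs(X - X[i]) / span     # Eq. (6) against every sample
        dist = diff.sum(axis=1)            # assumed neighbor distance (Manhattan)
        dist[i] = np.inf                   # never pick R as its own neighbor
        for c in classes:
            idx = np.where(y == c)[0]
            nearest = idx[np.argsort(dist[idx])[:k]]
            nearest = nearest[np.isfinite(dist[nearest])]
            contrib = diff[nearest].sum(axis=0) / (m * k)
            if c == y[i]:
                W -= contrib               # nearest hits lower the weight
            else:
                W += prior[c] / (1.0 - prior[y[i]]) * contrib  # nearest misses raise it
    return W

# Toy data: feature 0 separates the two classes, feature 1 does not.
X_demo = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1],
          [1.0, 0.4], [0.9, 0.8], [0.8, 0.0]]
y_demo = [0, 0, 0, 1, 1, 1]
W_demo = relieff_weights(X_demo, y_demo, sample_idx=range(6), k=2)
print(W_demo)
```

On the toy data the discriminative feature receives the larger weight, which is the behavior the weighting is meant to capture.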

During the random sampling of multiclass samples, the traditional ReliefF algorithm may draw samples from only one or a few categories, so the distribution of the characteristics of the entire sample set is not considered. For this reason, this study adopts the “interclass ratio, intraclass randomness” sampling method; “interclass ratio” means that the ratio of the number of samples extracted in each category to the total number of samples in that category is the same for all categories:

$$ \frac{C_n}{C_m}=\frac{n}{m} $$
(7)

where n is the total number of samples in category C, m is the total number of samples in all the categories, Cn represents the number of samples selected in category C, and Cm represents the total number of samples selected in all the categories; “intraclass randomness” refers to the random selection of samples within category C.
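The “interclass ratio, intraclass randomness” rule of Eq. (7) corresponds to proportional stratified sampling, which can be sketched as follows. The function name `stratified_sample`, the fixed random seed, and the class sizes in the example are hypothetical; because the per-class counts are rounded, the total may differ slightly from the requested number.

```python
import numpy as np

def stratified_sample(y, n_total, seed=0):
    """Pick about n_total indices so that each class keeps its share (Eq. 7)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    chosen = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        # Interclass ratio: C_n / C_m = n / m, rounded to a whole sample count.
        n_c = max(1, round(n_total * len(idx) / len(y)))
        n_c = min(n_c, len(idx))
        # Intraclass randomness: draw without replacement inside the class.
        chosen.extend(rng.choice(idx, size=n_c, replace=False))
    return np.array(chosen)

# Hypothetical label vector with class sizes 60, 30, and 10.
y_labels = np.array([0] * 60 + [1] * 30 + [2] * 10)
picked = stratified_sample(y_labels, n_total=20)
picked_counts = np.bincount(y_labels[picked]).tolist()
print(picked_counts)
```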

The feature weights calculated by ReliefF may contain negative values. If the feature weight is a negative value, it indicates that the distance between samples of the same category is greater than those between samples of different categories, which is contrary to the expected feature properties. Therefore, during feature selection, this study eliminates this type of feature and sets the corresponding weight of the feature to zero; the revised weight vector is then obtained, based on the initial weight vector provided by the ReliefF algorithm. The revised weight vector is assigned to the CNN for extracting the feature vectors for classification, and the features selected by combining the ReliefF algorithm with the CNN are obtained. Hence, the ReliefF algorithm combines its understanding of the feature significance with that of the CNN. The evaluation and selection of features are beneficial for improving the feature selection ability of CNN models.
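The revision of the weight vector can be sketched as below. The helper names `revise_weights` and `select_features` and the example numbers are hypothetical, and multiplying the fully connected features by the revised weights is one possible reading of "assigning" the weights to the CNN; a binary mask such as `w1 > 0` would be an alternative under the same description.

```python
import numpy as np

def revise_weights(w0):
    """Set negative ReliefF weights to zero, keeping the others unchanged."""
    w0 = np.asarray(w0, dtype=float)
    return np.where(w0 < 0, 0.0, w0)

def select_features(features, w1):
    """Scale each feature by its revised weight; zero-weight features vanish."""
    return np.asarray(features, dtype=float) * w1

w0_demo = [0.2, -0.1, 0.0, 0.5]              # hypothetical initial weights
w1_demo = revise_weights(w0_demo)
selected = select_features([[1.0, 2.0, 3.0, 4.0]], w1_demo)
n_removed = int(np.sum(w1_demo == 0))        # features effectively discarded
print(w1_demo, selected, n_removed)
```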

3.3 Weld defect recognition process based on the proposed CNN

Based on the methods described in Sections 3.1 and 3.2, an improved CNN model for weld defect-type identification is proposed. The flow chart of the defect-type identification using this method is depicted in Fig. 4.

Fig. 4 Flow chart of the proposed defect recognition method

From the flow chart, we can see that the proposed defect recognition process includes steps A, B, and C. In step A, a CNN with a specific architecture is constructed, which includes substep A1 for constructing an improved pooling model considering the pooling region and area surrounding the defect feature distribution comprehensively, and substep A2 for constructing an enhanced feature selection method. The structure of the CNN with the improved pooling strategy and feature selection model is depicted in Fig. 5.

Fig. 5 Structure of the CNN with the improved pooling strategy and model

The basic structure of the network includes an input layer, two convolution layers (C1 and C3), two pooling layers (P2 and P4), two fully connected layers (F5 and F6), one feature selection layer, and one output layer. By abstracting and extracting the input image layer by layer, representative characteristic information of the sample can be obtained.

The input layer is a gray image of 32 × 32 pixels. In the first convolution operation, the input image is convolved with six convolution kernels sized 5 × 5, with a stride of one and padding of zero, and six 28 × 28-pixel feature maps are obtained in the C1 layer. In the first pooling operation, the six feature maps of the C1 layer are pooled using a 2 × 2 pooling domain with a stride of two, and six 14 × 14-pixel feature maps are obtained in the P2 layer.

In the second convolution operation, sixteen convolution kernels sized 5 × 5 are used to convolve the feature maps of the P2 layer, with a stride of one and padding of zero, and sixteen 10 × 10-pixel feature maps are obtained in the C3 layer. In the second pooling operation, the feature maps of the C3 layer are pooled using a 2 × 2 pooling domain with a stride of two, and sixteen 5 × 5-pixel feature maps are obtained in the P4 layer.

The F5 and F6 layers are fully connected layers, each holding a one-dimensional vector. In the connection between P4 and F5, 120 convolution kernels sized 5 × 5 are applied to the sixteen feature maps of the P4 layer, with a stride of one and padding of zero. One hundred and twenty feature maps sized 1 × 1 are obtained, i.e., the F5 layer is a one-dimensional vector containing 120 values. The F5 layer is fully connected to the F6 layer, which is a one-dimensional vector containing 84 values.
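The feature map sizes quoted above follow from the usual output-size relation, (input + 2 × padding − kernel) / stride + 1. The short sketch below retraces the arithmetic layer by layer; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window, stride):
    """Spatial output size of a pooling operation along one dimension."""
    return (size - window) // stride + 1

shapes = {}
size = 32                                   # 32 x 32 gray input image
size = conv_out(size, kernel=5)             # C1: six 5 x 5 kernels
shapes["C1"] = (6, size, size)
size = pool_out(size, window=2, stride=2)   # P2: 2 x 2 pooling, stride two
shapes["P2"] = (6, size, size)
size = conv_out(size, kernel=5)             # C3: sixteen 5 x 5 kernels
shapes["C3"] = (16, size, size)
size = pool_out(size, window=2, stride=2)   # P4: 2 x 2 pooling, stride two
shapes["P4"] = (16, size, size)
size = conv_out(size, kernel=5)             # F5: 120 kernels of size 5 x 5
shapes["F5"] = (120, size, size)

for name, shape in shapes.items():
    print(name, shape)
```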

The method described in Section 3.2 is used for feature selection on the F6 layer. Features with high importance are retained, whereas those with low importance are eliminated by setting the corresponding node value to zero, which yields the feature selection layer. Finally, the softmax multiclass classifier is used to classify the output of the feature selection layer, and the output layer is obtained. The principle of the softmax classifier is as follows:

$$ P\left(Y={Y}_i\right)=\frac{e^{Y_i}}{\sum_{j=1}^K{e}^{Y_j}}, $$

where Yi represents the output result corresponding to defect i in the image, i = 1, 2, 3, 4, 5, 6 corresponds to crack, lack of fusion, lack of penetration, slag inclusion, porosity, and nondefect in the weld, respectively. K is the number of categories and P(Y=Yi) is the probability information of the output results Yi corresponding to defect i in the image.
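A minimal NumPy sketch of the softmax classifier is given below. The score vector is hypothetical and only illustrates the mapping from six output values to class probabilities; subtracting the maximum score before exponentiation is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert raw output scores Y_1..Y_K into class probabilities."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Class order as defined in the text (i = 1, ..., 6).
LABELS = ["crack", "lack of fusion", "lack of penetration",
          "slag inclusion", "porosity", "nondefect"]

scores = [1.0, 0.5, 3.0, 0.2, 0.1, 0.0]    # hypothetical network outputs
probs = softmax(scores)
predicted = LABELS[int(np.argmax(probs))]
print(probs, predicted)
```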

Furthermore, in step B, iteration is performed with the objective of minimizing the cost function to train the neural network constructed in step A for weld defect recognition. Finally, in step C, the sample weld image to be identified is input to the CNN trained in step B for the automatic recognition of the defect types.

4 Experiment and result

The main research object of this study is the defect in the welding process of steam turbines, and radiographic images were provided by the Dongfang Turbine Co., Ltd., Sichuan, China. The base metal of the welding seam includes mainly steel, nickel, and copper, and the welding joint is a double-sided butt weld. As the weld defects are detected through X-ray inspection, the X-ray film is digitized using an X-ray film scanner (JD-RTD) developed in-house, as shown in Fig. 6.

Fig. 6 X-ray film scanner (JD-RTD)

There are five types of weld defects in the digital radiographs: porosity (PO), slag inclusion (SL), lack of penetration (LP), lack of fusion (LF), and crack (CR), as shown in Fig. 7.

Fig. 7 Defects: (a) porosity (PO), (b) slag inclusion (SL), (c) lack of penetration (LP), (d) lack of fusion (LF), and (e) crack (CR)

Because the weld defect image was large, it was difficult to use the image directly as the input for training the neural network model. Therefore, the original weld image was preprocessed: the defect and its surrounding area in the original weld image were cropped as a 32 × 32-sized region of interest (ROI), and the ROI image was used as the input to the neural network. In this study, 3486 ROI weld images were selected, including 504 porosity (PO), 410 slag inclusion (SL), 460 lack of fusion (LF), 864 lack of penetration (LP), 804 crack (CR), and 444 nondefect images. All these images were divided into a training set and testing set at a ratio of 4:1, giving 2789 images for training and 697 images for testing. Some of the images, as experimental samples, are shown in Fig. 8.

Fig. 8 Experimental sample images

The experiment was performed on a Windows 7 operating system using an Intel(R) Core(TM) i5-4460 CPU at 3.20 GHz with 8.00 GB of memory, the PyCharm integrated development environment based on Python 3.6.4, and Google's open-source TensorFlow 1.13.0 deep learning framework. The model training and testing scheme is shown in Fig. 9.

Fig. 9 Flow chart of the model training and testing scheme

The minibatch gradient descent method was selected for training; the batch size was set to 64 and cross entropy was used as the loss function. In the model training process, 20 steps were set as an epoch and the maximum number of training steps was 200. After training, the test set was input to the trained model, and the accuracy of defect identification on the test set was obtained.
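For reference, the cross-entropy loss averaged over one minibatch can be written in NumPy as below. The function name `cross_entropy`, the small `eps` added for numerical safety, and the two-sample batch are hypothetical; in practice the deep learning framework computes this loss and its gradient internally.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class of each sample."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Pick, for every row, the predicted probability of its true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs + eps)))

# Hypothetical minibatch of two samples over two classes.
batch_probs = [[0.5, 0.5],
               [0.25, 0.75]]
batch_labels = [0, 1]
loss = cross_entropy(batch_probs, batch_labels)
print(loss)
```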

4.1 Validation of the improved pooling strategy

To verify the effectiveness of the pooling strategy proposed in this study, the basic CNN network model was used for the experiment, and the max pooling strategy, average pooling strategy, and the pooling strategy proposed in this study were each used in the pooling layers to construct the network models CNN-1, CNN-2, and CNN-3, respectively. The network architecture of the model is shown in Fig. 10. The selection of specific pooling strategies is shown in Table 1.

Fig. 10 Basic CNN architecture

Table 1 Pooling strategy selection

Experiments were performed using CNN-1, CNN-2, and CNN-3, respectively. The defect recognition accuracies, under different iterations, for the three models are listed in Table 2.

Table 2 Recognition accuracies of the three pooling strategies under different iterations

The experimental results demonstrate that, under different iterations, the proposed pooling model, CNN-3, obtained higher accuracy than the max pooling model, CNN-1, and the average pooling model, CNN-2. The recognition accuracy is calculated as follows: the identification accuracy of each single defect type (correct number / total number) is calculated, and the average of the identification accuracies over all the defect types is taken as the recognition accuracy.
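The recognition accuracy described above is the mean of the per-class accuracies, which can be sketched as follows; the function name `mean_per_class_accuracy` and the small label vectors are hypothetical.

```python
import numpy as np

def mean_per_class_accuracy(y_true, y_pred):
    """Average over classes of (correctly recognized samples / samples in class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Hypothetical example: class 0 has 3 of 4 correct, class 1 has 1 of 2 correct.
y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 0]
acc = mean_per_class_accuracy(y_true_demo, y_pred_demo)
print(acc)
```

Unlike the overall fraction of correct predictions, this measure gives every defect type the same influence regardless of how many images it contains.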

According to Table 2, the defect image recognition accuracy of the network model constructed using the proposed pooling strategy is higher than those of the max pooling and average pooling models under different iterations. When the number of iterations is 200, the recognition accuracy of the network model constructed using the proposed pooling strategy is 90.0%, which is 5.3% higher than that of the model with the max pooling strategy and 1.8% higher than that of the model with the average pooling strategy.

4.2 Validation of the enhanced feature selection method

To validate the proposed feature selection method, a feature selection layer was added between the FC6 layer and output layer of the basic CNN architecture (Fig. 10) to construct a CNN with enhanced feature selection (Fig. 11). In the feature selection layer, the features extracted from the FC6 layer were selected by the ReliefF algorithm. Features with strong classification ability were retained along with their weights, and the redundant features were eliminated. The selected features were used as the input to the output layer.

Fig. 11 Enhanced feature selection CNN architecture

The training set included 3348 images. In the FC6 layer, 84 features were extracted from each image. In each iteration of the CNN shown in Fig. 11, the batch size used was 110, giving a feature set, T, of dimension 110 × 84. The ReliefF algorithm described in Section 3.2 was then used for processing the feature set T, and 84 feature weights were obtained. The parameters of the ReliefF algorithm were set as follows: the number of neighboring samples in each sample set was five, and the number of samples selected randomly by the ReliefF algorithm for weight evaluation in each sample set is presented in Table 3.

Table 3 Number of extractions of different samples

After the above processing, in each iteration of the CNN we obtained the initial feature weight vector, L0, with 84 nodes corresponding to the features of the FC6 layer in the CNN architecture shown in Fig. 11; the vector obtained at the 200th iteration is shown in Table 4.

Table 4 Corresponding weights of the initial feature value vector

From L0, it was determined that the weights corresponding to some of the features were negative, indicating that the distances between samples within a class were greater than those between classes for these features; such features are not conducive to classification and need to be eliminated. Accordingly, feature weights less than zero were set to zero, and the revised weight vector, L1, was obtained, as depicted in Table 5.

Table 5 Corresponding weights of the revised feature value vector

Each feature weight in the weight vector L1 was assigned to the corresponding feature in FC6 of the CNN architecture shown in Fig. 11, so that the features selected by the ReliefF algorithm were obtained at the feature selection layer. Assigning a zero weight in L1 to an FC6 feature is equivalent to removing that feature. A total of 16 feature weights were zero, indicating that the feature vectors in the FC6 layer were reduced to 68 dimensions after feature selection.

As shown in Table 5, the weights of 16 features were zero, which is equivalent to discarding the neurons at the locations of these 16 features during feature selection. To reduce the training time and prevent over-fitting, the “dropout” method is often used to temporarily discard certain neurons. However, because dropout discards neurons randomly, neurons carrying important features may be discarded, which makes the process somewhat blind; moreover, the number of neurons to discard is a hyperparameter that is difficult to tune during training. In contrast, feature selection discards neurons selectively according to feature importance, so it serves both to evaluate the importance of the features and to discard neurons selectively.

To verify the effectiveness of this feature selection method and the proposed defect recognition method, we used CNN-1, the CNN architecture in Fig. 5 (referred to as CNN-4), and CNN-5 as comparison models; more information on CNN-1, CNN-4, and CNN-5 is listed in Table 6. Experiments were carried out on CNN-1, CNN-4, and CNN-5 under different numbers of iterations, and the recognition accuracies are shown in Table 7. Comparing the accuracies of CNN-1 and CNN-4 verifies the effectiveness of the proposed enhanced feature selection method. Comparing the accuracy of CNN-5 with those of CNN-1 to CNN-4 verifies the validity of the welding defect recognition method proposed in this study.

Table 6 Model construction
Table 7 Recognition rates of CNN-1 and CNN-4 under different iterations

The experiments demonstrate that the recognition accuracy can be further improved by combining the ReliefF algorithm with the neural network for feature selection. Moreover, when the number of iterations is relatively small, the advantage of this feature selection method is more obvious.

In Table 7, comparing the defect recognition accuracies of CNN-1 with CNN-4 and of CNN-3 with CNN-5 under different iterations shows that the enhanced feature selection method proposed in this study effectively improves the defect recognition rate. When the number of iterations is 200, the defect image recognition accuracy of CNN-4 is 87.5%, which is 2.3% higher than that of CNN-1, and that of CNN-5 is 91.0%, which is 1.1% higher than that of CNN-3.

Comparing the defect recognition accuracy of CNN-5 with those of CNN-1, CNN-3, and CNN-4 under different iterations shows that the model based on the improved pooling strategy and feature selection has good recognition performance. When the number of iterations is 200, the defect image recognition accuracy of this model is 91.0%, which is 1.1% higher than that of the model without enhanced feature selection (CNN-3), and 4.0% and 6.4% higher than those of CNN-4 and the traditional CNN model with the max pooling strategy (CNN-1), respectively. The experimental results show that the proposed pooling strategy and feature selection method are both effective in improving the defect recognition rate, and that combining the two further improves the recognition accuracy of weld image defects.
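The percentages quoted at 200 iterations can be cross-checked with a few lines of arithmetic. The sketch below assumes that the quoted gains are relative improvements and that 4.0% and 6.4% refer to CNN-4 and CNN-1, respectively; the accuracies of CNN-1 and CNN-3 are back-computed under that assumption and are not values taken from Table 7.

```python
# Figures quoted in the text at 200 iterations.
acc_cnn4 = 87.5        # accuracy of CNN-4 (%)
acc_cnn5 = 91.0        # accuracy of CNN-5 (%)
gain_4_over_1 = 2.3    # CNN-4 is quoted as 2.3% higher than CNN-1
gain_5_over_3 = 1.1    # CNN-5 is quoted as 1.1% higher than CNN-3

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

# Back-computed baselines under the relative reading (NOT values from Table 7).
acc_cnn1 = acc_cnn4 / (1.0 + gain_4_over_1 / 100.0)
acc_cnn3 = acc_cnn5 / (1.0 + gain_5_over_3 / 100.0)

print(round(relative_gain(acc_cnn5, acc_cnn4), 1))   # 4.0
print(round(relative_gain(acc_cnn5, acc_cnn1), 1))   # 6.4

# Under an absolute (percentage-point) reading the same comparisons
# would give 3.5 and 5.8, which do not match the quoted 4.0 and 6.4.
print(round(acc_cnn5 - acc_cnn4, 1))                    # 3.5
print(round(acc_cnn5 - (acc_cnn4 - gain_4_over_1), 1))  # 5.8
```

Under these assumptions the relative reading reproduces both quoted figures, which suggests that the percentages in this section are relative gains rather than percentage-point differences.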

4.3 Validation of the proposed method

Sections 4.1 and 4.2 verified the proposed pooling method and feature selection method, respectively. In this section, the entire improved CNN is tested and validated on the experimental data set of this study. Experiments were performed with the proposed method, the CNN, the SVM, and the improved DS method [10]. Table 8 shows the recognition accuracy for different defects under the various methods.

Table 8 Comparison of the recognition accuracy of different methods (%)

The proposed method can determine the category of defects in the input image. Table 8 shows that its average accuracy for defect identification is the highest among all the methods (7.57% higher than the DS method, 2.74% higher than the SVM method, and 3.29% higher than the CNN method).

With respect to the recognition accuracy for individual defect types, the deep learning approach used in the proposed method achieves better results in identifying PO and SL defects owing to its ability to extract abstract features; however, because cracks are narrow, deep learning extracts fewer features from them, and the recognition accuracy for CR defects needs to be improved. Among the traditional methods, DS shows better performance: it has a high recognition accuracy for CR defects, but its accuracy is not high when identifying PO and SL defects. A further improvement would be to add hand-crafted crack features to the CNN to improve the accuracy of crack identification.

5 Conclusion

In this study, to improve the adaptive pooling ability and feature selection ability of the CNN for different defect image features, the classic pooling strategy was improved, and the traditional feature evaluation method was combined with the neural network for feature selection. In summary:

(1) A pooling strategy that considers the feature distribution of the pooling region and of the feature map to which the pooling region belongs was proposed. This strategy combines the characteristics of max pooling and average pooling and shows a certain degree of dynamic adaptability when the pooling region has different feature distributions, which is significant for improving the recognition rate of deep neural networks.

(2) By combining the traditional feature evaluation method of ReliefF with the neural network's own understanding of feature importance, the feature selection ability of the model was strengthened, enabling further improvement of the model’s classification ability.

(3) The method proposed in this study can identify and classify defects in radiographic images, and the effectiveness of the CNN model based on the improved pooling strategy and feature selection was verified. The experimental results demonstrated that, compared with the traditional CNN, the proposed method has a higher recognition rate and better adaptability. Compared with the traditional DS method, the overall performance of the proposed method was improved; however, its recognition accuracy for crack defects requires improvement, which is the focus of future work.