1 Introduction

Welding is an integral industrial process inherent to most production sectors; specialized welding processes are widely used in high-performance industries such as aerospace, machine design, power generation, and the marine and automotive industries. Welding is complex by nature, involving multiple parameters that influence the quality of the weld joint. Apart from welding parameters, random events in the manufacturing process also lead to weld defects. Quality assurance and testing are therefore crucial for critical applications, and research into detecting and identifying weld defects has become a necessity. Weld defects are usually detected by non-destructive techniques, which are preferred because they do not physically damage the specimen. Popular non-destructive techniques are vibration analysis, eddy current screening, and X-ray inspection [1, 2]. These techniques have their limitations: X-ray inspection has harmful side effects for humans under prolonged exposure, and eddy current testing is limited to metallic specimens. Image processing, however, has been an important topic in AI technology over the past decade and has been extensively applied to weld fault detection. Image data acquisition, image pre-processing, feature extraction, and classification are the components of image processing [3]. Earlier weld feature extraction and classification methods relied on either surface or geometrical characteristics from weld defect imagery; however, this was inadequate for determining acceptable weld faults. Many researchers have therefore proposed fault detection models that incorporate neural networks, and much research has been conducted on the extraction of features and categorization of weld faults. To categorize distinct weld flaws based on textural and geometric characteristics, Valavanis and Kosmopoulos [4] used an artificial neural network and k-nearest neighbor methods. Yin et al. [5] formulated a novel approach to extract geometric characteristics from Lissajous imagery and perform error diagnostics by adapting machine learning methods. Boaretto and Centeno [6] employed the double wall double image (DWDI) exposure technique and identified defects with a multilayer perceptron. An evolutionary platform-based fuzzy inference method for determining weld defects, using the form and location of the defect as classifying variables, was explored by Pérez et al. [7]. To detect unique flaws, Zahran et al. [8] used an ANN to compare characteristics obtained by analyzing the power spectral density of segmented weld regions. El-Abbasy et al. [9] used artificial neural network models to forecast the condition of offshore oil and gas pipelines; they highlighted that several existing models focus only on corrosion and are therefore limited in detecting other conditions of the specimen. They developed their models from data acquired from existing offshore pipelines, and the models achieved 97% validity when applied to the dataset. Jiang et al. [10] and Mu et al. [11] explored methods of efficiently converting weld defect imagery into confirmatory factor regions. Carvalho et al. [12] worked on detecting and categorizing weld defects in pipes by utilizing magnetic flux leakage signals and artificial neural networks (ANNs).
An ANN-based pattern recognition scheme was used to discriminate between defect and non-defect weld patterns. The authors utilized Fourier and wavelet analyses together with moving-average and Savitzky–Golay filters, using signal processing technology to enhance the performance of the discrimination. They determined the external and internal corrosion rates with 71% success. Shen et al. [13], in their work on the automatic categorization of weld defects, reported two problems that impeded categorization: poor generalization of the categorization method and poor separation of features, both caused by limited training samples. To offset the poor generalization, they proposed a novel method based on a multiclass support vector machine to determine the weld defects, together with a feature-based image classifier to characterize them, thereby effectively enhancing the separability of the image groups. They reported a significant computation time cut of 22 s and greater efficiency in defect classification. He et al. [14] investigated the correlation between the floating potential and welding flaws by using finite element modeling. They developed an ultrasonic duplex finite element model to detect weld flaws in a butt-weld joint, with a floating potential used to conveniently describe the geometrical parameters of the weld flaw. Their simulation analysis emulated an ultrasonic-guided-wave testing procedure to test the flaws and identify their variant geometrical traits. They argued that the floating potential data exhibited a significant advantage in relating the geometrical features of the weld defects, such as their location and orientation, to the energy vanishing point and decay rate, and that their proposed methodology showed promise in highlighting weld flaw geometries in practical use cases. When Krizhevsky et al. [15] utilized AlexNet to win the ImageNet image classification challenge, deep learning exhibited unimaginable potential. ZFNet, VGGNet, GoogLeNet, and ResNet were subsequently proposed for large-scale image categorization. Unlike typical feature extraction approaches, these methods do not require any pre-selected picture characteristics; they can learn high-level characteristics directly from the data through supervised learning [16]. Deep learning algorithms have recently become popular for detecting weld flaws. Pattnaik et al. [17] developed a framework for categorizing pests on a tomato farm. They reported that recent studies had focused on utilizing CNNs and the myriad methods of adapting them to specific processes, and that the use of CNNs has resulted in significant improvement in image recognition quality. They proposed a transfer learning-based CNN framework, pre-trained for classification, for the tomato plants on a farm, and obtained results with 88% accuracy through the use of the DenseNet169 model. They reported that transfer learning with deep CNNs demonstrated good effectiveness in the detection of pests and the classification of plants. Using the notion of learning algorithms, Hou et al. [18] proposed a three-stage automatic detection scheme for weld defects in X-ray imagery.
The images were pre-processed to determine the weld area, and a classification model was trained on cropped X-ray images. A sliding-window approach was used to classify the images with the trained model. They reported that their proposal was effective in assessing welding quality.

Park et al. [19] employed transfer learning to increase the accuracy of weld flaw classification in radiographic images. They applied a pre-trained CNN to a limited collection of weld fault images and fine-tuned it, obtaining 96.9% classification accuracy with insufficient training data and illustrating the efficiency of transfer learning in enhancing classification accuracy. Du et al. [20] identified weld defects from X-ray images of aluminium castings of automobile parts by ROI alignment and feature pyramid network methods. To solve classification challenges, the aforesaid methods frequently necessitate the creation of deep CNN architectures and the application of SoftMax. Yet, when there are fewer training samples, SoftMax performs poorly and the learned features are unclear. A smart recognition technique based on deep learning was presented to overcome the difficulties stated. Furthermore, the random forest was considered an excellent classification method since it has minimal generalization error in classification problems [21]. Base models such as VGGNet-16 and ResNet50 were used with augmented image datasets to perform fine-tuning-based transfer learning. Park et al. [22] presented a deep CNN-based ensemble learning system for the automatic classification of weld flaws in radiographic images. They attained 96.6% classification accuracy, illustrating ensemble learning's usefulness in enhancing classification accuracy. Dubey and Jain [23], in their work on automated facial expression recognition modeling, utilized a VGG16-based transfer learning technique. They reported that the complex nature of recognizing myriad human facial features and expressions has immense implications in multiple fields, such as security and the behavioral and medical sciences. They proposed a deep learning framework using a transfer learning technique to recognize facial expressions, utilizing existing ImageNet-trained VGG16 models modified with additional layers. They subsequently tested the model on existing facial datasets and found around 94% accuracy and superior performance over existing facial recognition techniques. They also reported the ease of processing with Google Colab GPUs and the versatility of VGG16 and transfer learning techniques. To identify and categorize weld faults in ultrasonic images, Lee et al. [24] employed a deep CNN. They attained a classification accuracy of 97.5%, proving the usefulness of deep learning algorithms in ultrasonic testing for weld fault identification and classification. Based on graph theory and CNNs, Zhao et al. [25] suggested a new approach for automated weld flaw classification. They obtained 98.1% classification accuracy, highlighting the promise of graph theory-based approaches for enhancing weld flaw classification accuracy. To diagnose weld faults in radiographic images, Oh et al. [26] employed a deep learning model that incorporated a CNN and a support vector machine (SVM). They demonstrated that their model could achieve excellent classification accuracy even with minimal training data, implying that deep learning approaches might be useful for weld fault recognition when data is scarce. Joshi et al. [27] explored the usage of deep learning methods in identifying Covid-19-affected patients from X-ray radiographs, analyzing datasets of chest radiographs of affected patients to generate classification data.
They proposed three approaches for the problem: a CNN model with a customized architecture and two transfer learning approaches using the DenseNet121 and MobileNetV2 architectures, respectively. They observed improved performance from all three models in experimental investigations on available datasets and, in comparison to existing detection methods, noted improved mean absolute error.

Thus, from analyzing existing research, it has been identified that artificial neural networks and deep learning techniques have largely been used in categorizing and predicting defects and issues. Studies utilizing deep learning, CNNs, and transfer learning techniques have exhibited unique advantages based on their training methods. From the literature, it was inferred that the use of transfer learning in problems requiring the categorization and detection of material defects presents an interesting approach to defect identification, with opportunities in computational analysis and preventive solutions. The objective of the present research is to identify weld faults using an image-based identification method built on the VGG16 neural network and transfer learning, applied to X-ray radiograph datasets of welded samples with weights pre-trained on ImageNet. The base models used for the experimental investigation were ResNet50 and VGG16. The innovative aspect of this study is the use of adaptive approaches to fine-tune the VGG16 to achieve greater average accuracy levels.

2 Materials and methods

2.1 Selection of neural network model

In the present work, Keras, an open-source deep learning library, was utilized for the development of models. This library was used because its models can be fine-tuned by transfer learning to better serve the objectives of the present work. The library contains an array of pre-trained deep learning classifiers, including VGG16, DenseNet121, InceptionV3, and ResNet50. The classifier models have different depths, with deeper models demanding more computational resources and larger training sets. In the present work, the availability of data is relatively low in comparison to existing works, so a shallow model was considered optimal, on the consideration that such a model would exhibit good convergence while avoiding overfitting. Therefore, VGG16 and ResNet50 were the preferred models. The Regions of Interest (ROI) were cropped from the images in the GDXray dataset, and the cropped defect regions were used to train and test the models. The GDXray dataset used in the present work is hosted in the following repository: http://dmery.ing.puc.cl/index.php/material/gdxray/.
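As a minimal sketch of this model-selection step (assuming the tensorflow.keras API; the `build_base` helper is illustrative and not from the original work), the two candidate backbones can be loaded with their ImageNet weights and without their classification heads:

```python
# Minimal sketch: load the two candidate backbones from Keras without
# their ImageNet classification heads, so a new weld defect head can be
# attached later. The helper name `build_base` is illustrative.
from tensorflow.keras.applications import VGG16, ResNet50

def build_base(name="vgg16"):
    if name == "vgg16":
        return VGG16(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
    return ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))

base = build_base("vgg16")
print(base.output_shape)  # (None, 7, 7, 512) for the VGG16 backbone
```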

2.2 VGG architecture

AlexNet won the ImageNet Large Scale Visual Recognition Challenge when it was released, demonstrating that it was one of the most capable networks for classifying images. Its major features include the use of ReLU activations, multi-GPU optimization, and overlapping pooling. ImageNet, a resource with over 15 million pictures annotated with over 22,000 classes, served as a general benchmark for image recognition in the present work. Even though AlexNet had well-developed features, and the VGG architecture was initially more involved to implement, the advanced features of the VGG architecture made it preferable for this research despite its longer training times. The architecture of the VGG network is shown in Fig. 1.

Fig. 1 VGG Net Architecture

While earlier AlexNet derivatives concentrated on smaller feature maps and strides in the first convolution kernel, VGG tackles a crucial element: the depth of the CNN [15]. The advanced features of the VGG architecture in comparison to existing systems are listed in Table 1.

Table 1 Features of VGG Architecture

2.3 Transfer learning

Machine learning and deep learning are broad topics with many applications, including image recognition, speech recognition, recommendation and association systems, and so on. Creating any model from the ground up requires large storage and computational power that will not always be available. We may also encounter instances where strategies for improving current models are known, but the difficulty of retraining the models from scratch prevents applying them. The transfer learning technique is applied to address such problems. Instead of constructing models from the ground up, this strategy uses pre-trained models as the starting point. This helps meet the challenge of developing deep learning models, which require a considerable amount of computing and storage resources. Equations (1) and (2) represent the component domain and marginal distribution, respectively. The formal description of transfer learning and its application is as follows:

A domain D is a two-element tuple consisting of a feature space χ and a marginal probability distribution P(X), where X denotes a set of sample data points.

For the two-component domain,

$$ \mathcal{D} = \{ \chi ,\; P(X) \}. $$
(1)

For the marginal distribution,

$$ X = \{ x_1, \ldots, x_n \}, \quad x_i \in \chi. $$
(2)

where x_i denotes a specific feature vector.

A task T denotes a two-element tuple with a label space γ and an objective function η. From a probabilistic standpoint, the objective function can be written as P(Y|X).

Using domain D and defining the task by its two components, Eqs. (3) and (4) are defined as (adapted from [28]):

$$ \mathcal{T} = \{ \gamma ,\; P(Y|X) \} = \{ \gamma ,\; \eta \}, $$
(3)

$$ Y = \{ y_1, \ldots, y_n \}, \quad y_i \in \gamma. $$
(4)

where γ is the label space, η is the predictive function, x_i ∈ X and y_i ∈ Y are the feature vectors and their corresponding labels, and η(x_i) = y_i is the prediction of the label corresponding to x_i.

Transfer learning enables the determination of the target conditional probability distribution P(Y_T | X_T) using data obtained from a source domain D_S and source task T_S, where D_S and T_S differ from the target domain D_T and target task T_T. As in the classic document categorization example [28], the source and target settings can differ in the following ways:

  • χ_S ≠ χ_T. The source and target domains have different feature spaces.

  • P(X_S) ≠ P(X_T). The marginal probability distributions of the source and target domains differ.

  • γ_S ≠ γ_T. The two tasks possess different label spaces.

  • P(Y_S|X_S) ≠ P(Y_T|X_T). The source and target tasks have different conditional probability distributions.
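As a concrete, hedged reading of this notation for the present problem (an illustration, not part of the formal definition in [28]): the ImageNet source images and the weld radiograph crops share the same input space, while their marginal distributions and label spaces differ,

$$ \chi_S = \chi_T = \mathbb{R}^{224 \times 224 \times 3}, \qquad P(X_S) \neq P(X_T), \qquad \gamma_S \neq \gamma_T, $$

with γ_S being the 1,000 ILSVRC object categories of the pre-trained weights and γ_T the 15 weld defect classes. Fine-tuning transfers the representation learned on D_S across exactly this gap.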

2.4 Dataset description

The GRIMA X-ray database was utilized in creating the weld X-ray database used in this investigation. The X-ray radiographs were acquired following the ISO 17636–1 standard for radiographic examination of metallic fusion weld joints. A Lumisys LS85 SDR scanner was used to digitize the radiography films. With a linear LUT proportional to optical film density, the 8-bit data were rescaled from the original 12-bit data. The radiographs were digitized at 630 DPI and stored in TIFF format. During image annotation, the square regions of interest containing the defects were manually separated from the full-scale radiographs. The experimental weld defect sample images are illustrated in Fig. 2. VGG16 and ResNet50 are pre-trained models expecting 224 × 224 pixel input images. The resulting images were cropped at a 1:1 aspect ratio to avoid distortion and were categorized into 14 weld defect classes. The number of image samples per class revealed significant skew in the distribution; classifier performance can be affected by this skew when trained on such data, particularly for the minority classes.

Fig. 2 Defect type, number of images in each class, and a sample image of each class
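A minimal sketch of such a 1:1 crop-and-resize step is given below, assuming the Pillow library; the file name, ROI box, and `square_crop` helper are hypothetical, since the exact cropping procedure of the original annotation is not specified:

```python
# Hypothetical sketch: cut an annotated ROI from a radiograph, pad it
# to a 1:1 aspect ratio (avoiding distortion), and resize it to the
# 224 x 224 input expected by VGG16/ResNet50.
from PIL import Image

def square_crop(path, box, out_size=224):
    # box = (left, upper, right, lower) in pixel coordinates
    roi = Image.open(path).convert("RGB").crop(box)
    side = max(roi.size)                     # side of the padded square
    canvas = Image.new("RGB", (side, side))  # black-padded square canvas
    canvas.paste(roi, ((side - roi.width) // 2,
                       (side - roi.height) // 2))
    return canvas.resize((out_size, out_size))

patch = square_crop("W0001_0001.png", (120, 80, 260, 190))  # illustrative
```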

2.5 Training and compiling

As an initial step in preparing the images for the convolutional neural network, the data was loaded and preprocessed by converting it to RGB channel ordering and resizing it to 224 × 224 pixels. The pixel intensities were scaled to the range 0 to 1. Data augmentation was performed by generating new data from the available training data, applying random geometrical transforms such as zooms, shifts, and flips in different combinations to expand the training data with new plausible samples; this ensured better generalization of the present model and avoided overfitting. The labels are categorical features that cannot be fed to machine learning models as-is, so they were transformed using a one-hot encoder (Fig. 3). The data was split into 80% for training and 20% for testing. A code sketch of these steps follows Fig. 3.

Fig. 3 Workflow of weld defect classifier training
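The following sketch outlines these preprocessing steps, assuming the tensorflow.keras and scikit-learn APIs; the `images` and `labels` arrays and the specific augmentation ranges are assumptions, not values reported here:

```python
# Sketch of the preprocessing described above. `images` is assumed to be
# an (N, 224, 224, 3) array of cropped patches and `labels` an (N,) list
# of class names produced by the annotation step.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.preprocessing.image import ImageDataGenerator

images = images.astype("float32") / 255.0   # scale intensities to [0, 1]
y = LabelBinarizer().fit_transform(labels)  # one-hot encode class labels

# 80/20 train/test split, stratified on the (skewed) class labels.
train_x, test_x, train_y, test_y = train_test_split(
    images, y, test_size=0.20, stratify=labels, random_state=42)

# Random geometric transforms yield new plausible training samples.
augment = ImageDataGenerator(rotation_range=15, zoom_range=0.15,
                             width_shift_range=0.2, height_shift_range=0.2,
                             horizontal_flip=True, fill_mode="nearest")
```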

In our previous work [29], a trained convolutional neural network was utilized as a feature extractor for transfer learning: the image dataset was propagated through the network, the activations were extracted, and the resulting features were saved, after which a standard machine learning classifier was trained on them. In the present work, a fine-tuning technique was implemented, in which a pre-trained CNN model (VGG16/ResNet50) was used without its final connected layers. The 'head' of the network, which returns the class label predictions, was replaced with a new set of fully connected layers possessing randomly initialized trainable parameters (Fig. 4). The head consists of an average pooling layer with pooling size (4, 4), a fully connected 64-node layer with ReLU activation, a dropout layer, and a final fully connected layer with Softmax activation whose number of nodes equals the number of classes to be classified (Fig. 5); a code sketch of this head follows Fig. 5.

Fig. 4 The head of the pretrained VGG16 model was removed and replaced with a randomly initialized head

Fig. 5 The new fully connected layer set appended on top of the VGG16
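A hedged sketch of this head replacement, following the stated design (the tensorflow.keras functional API is assumed):

```python
# Sketch of the new head: (4, 4) average pooling, a 64-node ReLU layer,
# dropout, and a softmax layer with one node per class (15 classes).
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (AveragePooling2D, Dense, Dropout,
                                     Flatten, Input)
from tensorflow.keras.models import Model

NUM_CLASSES = 15

base = VGG16(weights="imagenet", include_top=False,
             input_tensor=Input(shape=(224, 224, 3)))

head = AveragePooling2D(pool_size=(4, 4))(base.output)  # 7x7x512 -> 1x1x512
head = Flatten()(head)
head = Dense(64, activation="relu")(head)
head = Dropout(0.5)(head)           # drop half the units for generalization
head = Dense(NUM_CLASSES, activation="softmax")(head)

model = Model(inputs=base.input, outputs=head)
```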

The proposed models were trained in the Google Colab coding environment using the available free resources, namely Nvidia K80, P4, T4, and P100 GPUs. The inference and training scripts were developed with the Python SDK. To implement the deep learning models, the open-source Keras library was used, as it is a high-level framework for deep learning with TensorFlow and Theano as the base. The pretrained weights for the present VGG16 and ResNet50 models were imported from the Keras library. For training and inference, open-source libraries such as Numpy, Seaborn, and Scikit-learn were used: Numpy and Scikit-learn handled the tensors, and Seaborn handled the chart plotting. The layers beneath the head were frozen, preventing adjustments to their weights, and the network was trained so that the new fully connected layers learned from the convolutional patterns already captured by the earlier layers of the network. Training then continued with the rest of the network unfrozen, fine-tuning the pretrained layers to recognize classes on which they were not initially trained. The final trained models were serialized into pickle files. The training was initiated with the following settings (a code sketch follows the list).

  • A dropout layer is used in the newly built head of VGG16 for enhanced model generalization; the fraction of input units to drop is set to 0.5.

  • Two fully connected layers are present, with 64 and 15 nodes, respectively.

  • Training of the network is done with the Adam optimizer and a pre-scheduled learning rate. The initial learning rate is 1 × 10⁻³, with a decay equal to the initial learning rate divided by the number of epochs. The relatively high initial learning rate followed by slow decay is used to avoid early overfitting.

  • A batch size of 16 and a categorical cross-entropy loss function were used.
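A hedged sketch of the freeze-and-compile step with these settings is given below, reusing `model`, `base`, `augment`, and the data splits from the earlier sketches; the legacy Keras optimizer signature with a `decay` argument is assumed, as was common in Colab environments of that period:

```python
# Freeze the backbone, compile with the stated hyperparameters, and
# train only the new head on the augmented data.
from tensorflow.keras.optimizers import Adam

EPOCHS, BATCH_SIZE, INIT_LR = 100, 16, 1e-3

for layer in base.layers:
    layer.trainable = False              # freeze layers beneath the head

opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

history = model.fit(
    augment.flow(train_x, train_y, batch_size=BATCH_SIZE),
    steps_per_epoch=len(train_x) // BATCH_SIZE,
    validation_data=(test_x, test_y),
    epochs=EPOCHS)
```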

3 Results and discussions

3.1 Experimentation and result

The Google Colab web IDE, which provides a free GPU instance, was utilized for the experiments. The open-source library Keras, running on the TensorFlow backend, was used for the implementation, and the pretrained models and their weights were imported from Keras itself. The experimentation was done using two different base models: VGG16 (138 million trainable parameters) and ResNet50 (23 million trainable parameters). During training, the model's learning performance was monitored with learning curves, which were used to identify overfitting or underfitting so that the generalization behavior could be diagnosed. In the learning curves, the training curve shows the model's learning rate, and the validation curve illustrates the model's generalization. The learning curves are given below; a plotting sketch follows Fig. 6. The learning curves of the model using ResNet50 as the base showed that the model did not generalize: while training the model for the pre-determined epochs, the validation curves showed poor generalization even though the training curve indicated good learning progress. Increases in the validation loss and corresponding drops in the validation accuracy were indications of poor generalization. However, the training curve appeared satisfactory, with increasing training accuracy (performance learning curve) and decreasing training loss (optimization learning curve), as shown in Fig. 6.

Fig. 6 Train and validation learning curves of ResNet50 model fine-tuning
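Curves of this kind can be produced directly from the training history; a minimal sketch, assuming the `history` object returned by `model.fit` in the earlier sketch and using matplotlib (on which Seaborn is built):

```python
# Plot the performance (accuracy) and optimization (loss) learning
# curves used to diagnose generalization during training.
import matplotlib.pyplot as plt

h = history.history
plt.plot(h["accuracy"], label="train accuracy")
plt.plot(h["val_accuracy"], label="validation accuracy")
plt.plot(h["loss"], label="train loss")
plt.plot(h["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.legend()
plt.show()
```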

The learning curves of the model using VGG16 as the base showed relatively better learning and generalization characteristics. In the performance learning curves, the validation accuracy was higher than the training accuracy, indicating that the model generalized and could perform well on unseen data. The curves illustrate the model's improvement with an increasing number of epochs (Fig. 7).

Fig. 7 Train and validation learning curves of VGG16 model fine-tuning: a up to 100 epochs; b up to 400 epochs; c up to 700 epochs

From the figure, it is inferred that balanced learning occurred across the network with an increasing number of epochs. For better insight, the validation confusion matrix of the model is plotted for epochs 100, 400, and 700 in Fig. 8. From Fig. 8, it is inferred that the correct predictions for several classes increase with further epochs, enhancing the performance of the model with better identification and fewer misclassified predictions. The experimentation indicated that, although the welding dataset taken for this experiment was small, with only 15 classes, the fine-tuned transfer learning approach provided better model performance. The performance metrics are listed in Table 2.

Fig. 8 Confusion matrices of the VGG16 model fine-tuned on weld defect data, with counts of correctly classified images along the diagonal: a 100 epochs; b 400 epochs; c 700 epochs

Table 2 Evaluation reports of classifiers fine-tuned on weld defect data of 15 classes, using VGG16 as the base model

In the present work, the defect regions were manually cropped, and the models were trained and tested on them. The deep learning models used had an input size of 224 × 224 × 3 (a 3-channel image of 224 × 224 pixels) for both VGG16 and ResNet50. The performance metrics of the proposed model are shown in Table 2. Recall, F1 score, precision, and accuracy were the metrics used to evaluate the classifier models; they were computed from the model's predictions on the validation data and indicate the performance of the classifier. As indicated in the support column of Table 2, the proposed model was tested on a set of 264 images (out of a total of 1320).
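Metrics of this kind are commonly produced with scikit-learn; a minimal sketch, assuming `model`, `test_x`, and `test_y` from the earlier sketches:

```python
# Per-class precision, recall, F1 score, and support (cf. Table 2),
# plus the confusion matrix (cf. Fig. 8), on the 20% test split.
from sklearn.metrics import classification_report, confusion_matrix

pred = model.predict(test_x, batch_size=16)
y_true = test_y.argmax(axis=1)   # one-hot -> class indices
y_pred = pred.argmax(axis=1)

print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```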

3.2 Comparative study

In this section, the results obtained in the present work are compared with existing works. A VGG net-based model was trained on a small welding X-ray image dataset with three classes; data augmentation techniques were used to expand the training data, and an average accuracy of 95% was attained. A dataset of 3 classes, comprising 57 lack-of-penetration images, 44 pinhole images, and 115 porosity images, was used for training [18]. In another work, image smoothing, segmentation, and feature extraction were performed, and the feature vectors were subsequently used to train a multiclass support vector machine (MSVM) capable of classifying 6 different types of weld defects [30]. A further dataset consisted of 4 classes (porosity, crack, lack of penetration, and slag inclusion) with 3503 cropped patches in total. Deep feature extractors were compared with traditional feature extractors; the classifier performed better on deep features and exhibited an accuracy of 97.2% [31]. The deep feature extractor consisted of 4 convolution layers with a single fully connected layer. A multilevel feature model for classifying weld defects with a united deep neural network was investigated [32, 33]. There were 220 samples in total, divided into five classes, with 50 samples each for slag inclusion, lack of penetration, and porosity, 35 samples for cracks, and 35 samples for lack of fusion; the proposed model reported an accuracy of 91.36%. In our previous work [29], pre-trained models such as ResNet50 and VGG16 with ImageNet weights were used for feature extraction, and machine learning methods such as logistic regression, random forest, and SVM were trained on the extracted features. The SVM model trained on ResNet50 features exhibited good results; however, the inference process required two models, a pretrained model for feature extraction and a machine learning model for classification. A more detailed comparison of the existing works against the present work is given in Table 3. The existing works listed in Table 3 used similar weld defect radiograph images for training and testing the classifier models; in particular, [18, 30] utilized the same GDXray image dataset in their investigations. The average macro accuracy of the proposed model was compared with the other existing works, and the versatility of the proposed model is indicated by the number of weld defects it can classify, given in the capable identification column of Table 3. The present work improves on the existing works by enabling the VGG16 deep neural network to detect up to 14 types of weld defects, which is higher than the capability reported by Ajmi et al. [27] and Liu et al. [33] in their studies. The present work utilized ImageNet-pretrained weights and achieved 90% accuracy using the transfer learning technique. In comparison to the deep learning models developed by Hou et al. [18] and Yang and Jiang [34], the fine-tuned pretrained VGG16 of the present work is a more modern and larger network with a proportionally longer parametric training time.

Table 3 Comparative study

Considering the above-mentioned works on weld defect classification, the results obtained in the present study appear promising, since the approach does not require complicated feature extraction processes. The model generalizes well even with smaller datasets, and its performance remains within acceptable limits even with up to 15 weld defect classes. This indicates the promising capability of this approach irrespective of data distribution imbalances and limited sample numbers. Furthermore, when the need arises for model optimization or for scaling up the application, resource management and related processes will be relatively simple, owing to the usage of a single model for both feature extraction and classification in the present work.

4 Conclusion

In the present work, a deep learning model for weld defect classification was developed. A pre-trained VGG16, with ImageNet weights, was used as a base model, and through transfer learning, the model was fine-tuned so that it could identify and classify weld defects. Even though the training data size was small, the data augmentation technique served the requirements of the process and provided better model generalization.

  • The VGG16 model automatically extracted deep features from the input image data, eliminating the need for complicated feature extraction processes and providing rich features to the successive layers of the model for classification.

  • The head of VGG16 was replaced, and the parameters of the head were trained alone while the remaining layers were frozen, resulting in a reduced number of parameters that needed to be trained and thus lower computational requirements.

  • The number of classes was raised to 15, with fewer samples in each class, and the learning curves obtained were satisfactory. The validation curves indicated good model generalization.

  • From the results obtained, it was affirmed that the present approach is capable of being adapted for other industrial computer vision classification applications with low training data availability.

  • The future scope of the present work is varied and could include multi-class defect identification, real-time defect identification, automated defect recognition, and integration with other technologies.