1 Introduction

Laser welding offers the advantages of small welding deformation, a small heat-affected zone, high energy density, and high efficiency, so it has become one of the crucial automated processing technologies [1,2,3]. Defects inevitably occur during welding, so it is necessary to study the physical feature signals that accompany them. Acquiring these feature signals during welding relies on sensor signal acquisition technology. The sensors that collect feature signals from the welding area mainly include spectrometers [4], photoelectric sensors [5], high-speed cameras [6,7,8], and X-ray devices [9]. Photoelectric and visual monitoring methods are widely used [10,11,12]. A wealth of welding information can be obtained through optical and visual sensing, such as the morphological features of the molten pool, keyhole, spatter, and plume. The dynamic behavior of the weld pool, keyhole, and plume is closely related to the weld state and weld appearance [13, 14].

After reliable welding signals are obtained, machine learning methods are usually used to build welding detection models [15,16,17,18]. Chandrasekhar et al. [19] collected infrared thermal images of the molten pool, extracted image features such as its length and width, and trained an artificial neural network (ANN) model to predict weld width and penetration. Wang et al. [20] studied the reconstructed three-dimensional molten pool surface and established three intelligent prediction models of the weld depth state based on back-propagation neural networks (BPNN); their analysis shows that the proposed real-time monitoring model can predict the penetration state economically and accurately. Lee et al. [21] extracted statistical features of the emission spectrum, such as the mean, standard deviation, peak value, and skewness, and then used k-nearest neighbor (KNN) and support vector machine (SVM) methods to build welding monitoring models that successfully classify welding defects. Traditional neural networks suffer from long training times and a tendency to fall into local optima [22, 23]. In recent years, research on deep learning applications in welding monitoring has been increasing steadily [24,25,26,27]. The convolutional neural network (CNN) is one of the most widely used deep learning models. It is characterized by local connections, weight sharing, and pooling operations. Compared with traditional neural networks, a CNN reduces the number of training parameters, effectively lowers network complexity, and improves training performance [28, 29]. CNNs can be divided into one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) convolutions to process signals, images, and videos, respectively [30].

Distinguishing the defect state from the good quality state simply by comparing the values of various sensor signals is one-sided: when the weld states differ but the signal values are similar, judgment errors occur. Therefore, this paper uses photoelectric sensors and high-speed cameras to collect light radiation, keyhole, and plume information, and applies rolling mean and rolling standard deviation methods to obtain richer signal fluctuation information for fusion detection. The 1D CNN model is especially suitable for analyzing sensor time series data, requiring fewer training parameters while offering high computing performance. In this paper, the 1D CNN model is adopted and improved: a mechanism that dynamically adjusts the learning rate is added to obtain the best prediction model, thereby realizing quality prediction of laser welding.

2 Laser Welding Experiment

2.1 Experimental Equipment

Figure 1 shows the layout of the high-power laser welding test platform. The experiments used disc laser welding equipment (Trumpf-16002) with a maximum power of 16 kW. The diameter of the disc laser beam is 300 μm, the laser wavelength is 1030 nm, the focusing position ranges from −4 to 4 mm, the welding speed ranges from 2 to 4 m/min, and the actuator is a Yaskawa six-axis robot.

Fig. 1 Experimental equipment for laser welding

The sensor data acquisition system mainly comprises photoelectric sensors and high-speed cameras. The laser head uses a one-way reflector and a focusing lens to capture the optical radiation in the welding area and transmit it to the photoelectric sensor, which converts the optical radiation into visible light intensity and reflected light intensity signals. The sampling frequency of the photoelectric sensor is 500 kHz. A high-speed monochrome camera positioned at 60° from the horizontal and equipped with a narrowband filter (wavelength 976 nm) captures the keyhole image from above the weld. A high-speed color camera equipped with a visible light filter is installed perpendicular to the welding direction to capture clear, dynamic plume images. The acquisition frame rate of the high-speed cameras is 5000 frames/s, and the image resolution is 512 × 512 pixels.

2.2 Experimental Conditions

The material used in the welding tests is 780 MPa high-strength steel with a thickness of 12 mm. The shielding gas is argon at a flow rate of 30 L/min, the defocus is −3 mm, and the welding speed is 3 m/min. The laser power was varied from 6 to 14 kW in increments of 1 kW, giving nine experiments in total. Table 1 lists the test conditions.

Table 1 Laser welding test conditions

3 Image Feature Extraction

3.1 Preprocessing of Keyhole and Plume Image

Taking a laser power of 11 kW as an example, images of the keyhole region and the plume region were obtained by the high-speed monochrome camera and the high-speed color camera, respectively, and the 2000th frame was used for image processing and feature extraction.

As shown in Fig. 2a, the region of interest (ROI) was first extracted from the keyhole region captured by the monochrome high-speed camera. After binarization and median filtering removed the spatter noise, the keyhole image was obtained by image segmentation. The plume extraction process is similar. As shown in Fig. 2b, after ROI extraction, binarization, and median filtering, a tiny amount of small spatter still remains around the plume; an opening operation eliminates it and yields a clear plume image. The opening operation first erodes and then dilates the image. Erosion and dilation are a pair of opposite operations: erosion removes boundary points and shrinks the region inward, while dilation incorporates points in contact with the boundary and expands it outward. The opening operation therefore smooths the boundary and eliminates the remaining small spatter without noticeably changing the size of the target image.
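
For concreteness, this preprocessing chain can be sketched with OpenCV. The sketch below is an illustration only: the ROI coordinates, threshold, and kernel sizes are assumed values, not the authors' settings.

```python
import cv2
import numpy as np

def extract_region(frame, roi, thresh=60, median_ksize=5, open_ksize=3):
    """Binarize an ROI, median-filter out spatter, then apply an opening."""
    x, y, w, h = roi                     # illustrative ROI, not the paper's values
    patch = frame[y:y + h, x:x + w]
    if patch.ndim == 3:                  # color (plume) frames: convert to grayscale
        patch = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(patch, thresh, 255, cv2.THRESH_BINARY)
    binary = cv2.medianBlur(binary, median_ksize)        # remove isolated spatter
    kernel = np.ones((open_ksize, open_ksize), np.uint8)
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # erode, then dilate
```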

Fig. 2 Feature extraction. a Extraction of keyhole, b extraction of plume

3.2 Feature Normalization

The optical radiation signal is split into visible light and reflected light by the photoelectric sensor. After amplification, the visible light and reflected light signals are displayed on an oscilloscope, which exports one-dimensional data to a computer, yielding the two feature signals of visible light intensity and reflected light intensity. Using the contour area function of the open-source computer vision library OpenCV, the total number of pixels enclosed by the keyhole contour and the plume contour can be calculated. The keyhole area is estimated by multiplying the pixel count of the processed image by the pixel resolution of the monochrome camera, and the plume area by multiplying the pixel count by the pixel resolution of the color camera.
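
A hedged OpenCV sketch of this area estimation follows; pixel_resolution (the physical area represented by one pixel) is a placeholder for the camera calibration, which is not given here.

```python
import cv2

def estimate_area(segmented, pixel_resolution):
    """Physical area of the largest contour (keyhole or plume) in a binary image."""
    contours, _ = cv2.findContours(segmented, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0
    pixel_count = cv2.contourArea(max(contours, key=cv2.contourArea))
    return pixel_count * pixel_resolution   # scale pixel count by per-pixel area
```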

The feature values extracted by different sensors tend to have different dimensions, and their value ranges vary greatly. Features with large values have a more significant impact on the prediction model than features with small values. Therefore, to eliminate the dimensional influence of the features, the visible light intensity, reflected light intensity, keyhole area, and plume area extracted by the photoelectric sensor and high-speed cameras are normalized.

The data are normalized by the min-max normalization method so that the value of each feature is mapped onto the [0, 1] interval. The normalized feature can be expressed as

$$x'(t) = \frac{x(t) - x_{\min}}{x_{\max} - x_{\min}},\quad t = 1,2,\ldots,N$$
(1)

where xmax and xmin are the maximum and minimum values of x(t).
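
As a minimal NumPy sketch of Eq. (1):

```python
import numpy as np

def min_max_normalize(x):
    """Map a 1-D feature signal x(t) onto the [0, 1] interval (Eq. (1))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```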

The normalized feature curves are shown in Fig. 3. Quality defects appeared at both the front and the tail of the weld, and the four feature signal curves of visible light intensity, reflected light intensity, keyhole area, and plume area fluctuate at the defects. Notably, the feature values in the defect state may be close to those in the good quality state; the difference lies in the degree of fluctuation, which is significantly larger in the defect state than in the good quality state.

Fig. 3 Normalized values of optical and visual features

3.3 The Fluctuation Degree Indicators of Sensor Features

Distinguishing between the defect state and the good quality state simply by comparing feature values is one-sided, because the welding state can differ while the feature values are similar, leading to judgment errors. The fluctuation of the feature signal reflects the weld state more faithfully, so quantifying the degree of fluctuation helps distinguish weld states more clearly. Two statistics are mainly used to judge the stability of sequential data: the mean and the standard deviation (std). A set of sequential data is stable when its mean and standard deviation at each moment do not change significantly over time. To obtain the feature fluctuations at all local moments, the mean and standard deviation at each moment are calculated using the rolling mean and rolling standard deviation methods. Rolling refers to the sliding of a calculation window: assuming the window width is w, the rolling mean at a given moment is the mean of the previous w data points. After each rolling mean is calculated, the window slides backward by one unit; the rolling standard deviation is computed analogously. The rolling mean is expressed as

$$x_{mean}(n) = \frac{1}{w}\sum\limits_{i = n - w + 1}^{n} x_{i}$$
(2)

where n is a given moment, w is the size of the sliding window, and xi is the feature value at moment i.

The rolling standard deviation is expressed as

$$x_{std}(n) = \sqrt{\frac{1}{w - 1}\sum\limits_{i = n - w + 1}^{n} \left( x_{i} - x_{mean}(n) \right)^{2}}$$
(3)

where xmean(n) is the rolling mean, and the other parameters are the same as those in formula (2).
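
Both indicators can be computed with pandas rolling windows; in the sketch below, the window width w = 50 is an assumed value, as the paper does not report it.

```python
import pandas as pd

def fluctuation_indicators(x, w=50):
    """Rolling mean (Eq. (2)) and rolling standard deviation (Eq. (3))."""
    rolling = pd.Series(x).rolling(window=w)     # w = 50 is an assumed width
    return rolling.mean(), rolling.std(ddof=1)   # ddof=1 gives the 1/(w-1) form
```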

The rolling mean and rolling standard deviation of the normalized values of each feature are shown in Fig. 4. Defects appear at the front and rear of the weld, where the rolling means of all four features are larger than those of the stable state in the middle of the weld. Moreover, the rolling standard deviations of the visible light intensity and plume area also increase significantly, whereas those of the reflected light intensity and keyhole area change little. When the original feature values in the defect state and the good quality state are very close, comparing the degree of fluctuation of the features makes it easier to distinguish the weld state.

Fig. 4 The rolling mean and standard deviation of each feature

4 Convolutional Neural Network Prediction Model

4.1 Model Structure

The four continuous signals are one-dimensional time series. The 1D CNN model we built can take four or more one-dimensional continuous signals as input. Three 1D CNN models are established. The first takes the visible light intensity, reflected light intensity, keyhole area, and plume area as inputs. The second takes the rolling means and rolling standard deviations of the four continuous signals as inputs. The third takes the four continuous signals together with their rolling means and rolling standard deviations as inputs. The defect state and good quality state of the weld are the outputs. In total, 21,418 samples were collected as the training set and 4,300 samples as the test set.

The structure of the 1D CNN model is shown in Fig. 5. The model comprises four groups of convolution and pooling layers, and the last pooling layer connects to a fully connected layer. The kernel size of each convolution layer is 3 with a stride of 1; the kernel size of each pooling layer is 2 with a stride of 1. The numbers of filters in the first through fourth convolution layers are 32, 64, 128, and 256.
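
This architecture can be sketched in Keras as follows. The framework, padding mode, and hidden-layer activation are assumptions; the paper fixes only the kernel sizes, strides, filter counts, the 512-neuron hidden layer, the two-way Softmax output, and the initial learning rate of 0.0001.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_1d_cnn(timesteps, n_channels):
    """Four Conv1D/MaxPooling1D blocks, a 512-unit dense layer, and Softmax."""
    model = keras.Sequential([keras.Input(shape=(timesteps, n_channels))])
    for filters in (32, 64, 128, 256):           # filters per conv layer (Fig. 5)
        model.add(layers.Conv1D(filters, kernel_size=3, strides=1,
                                padding="same", activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2, strides=1))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(2, activation="softmax"))   # label 0: good, 1: defect
    # With two classes, this loss matches the binary cross-entropy of Eq. (9).
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

For models 1, 2, and 3, n_channels would be 4, 8, and 12, respectively (raw features, fluctuation indicators, or both).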

Fig. 5 Structure of 1D convolutional neural network model

The input of the model is one-dimensional feature data, and the convolution kernel of the one-dimensional convolution model slides in one-dimensional space. The one-dimensional convolution layer operation is expressed as

$$Z^{I} = f\left( \sum\limits_{i = 1}^{M} X_{i}^{I - 1} * W_{i}^{I} + b_{i}^{I} \right)$$
(4)

where the superscript I denotes the Ith convolution layer, i is the index of the convolution kernel, ZI is the output matrix of the Ith convolution layer, XiI−1 is the input matrix of the Ith layer, WiI is the weight matrix, biI is the bias, * is the convolution operator, and f(·) is the activation function. The convolution layers use the rectified linear unit (ReLU) activation function, which can be expressed as

$$f(x) = \max(0, x)$$
(5)

The down-sampling layer, also called the pooling layer, aims to reduce dimensionality, reduce the number of weight parameters, and prevent the model from over-fitting. The pooling layers adopt the max pooling method, expressed as

$$P^{I} = \mathop {\max }\limits_{w} \left\{ {Z^{I} } \right\}$$
(6)

where PI is the output of the Ith pooling layer, w is the width of the pooling window, and ZI is both the output of the Ith convolution layer and the input of the Ith pooling layer. After multiple convolution and pooling operations, the output of the last pooling layer is used as the input of the fully connected layer, which has one hidden layer with 512 neurons in total. The fully connected layer is calculated as

$$D = f(WX + b)$$
(7)

where D is the output matrix of the fully connected layer, W is the weight matrix of the fully connected layer, X is the input matrix of the fully connected layer, b is the bias of the fully connected layer, and f(·) is the activation function.

The activation function of the fully connected layer is the normalized exponential function, also known as the Softmax function. It can be expressed as

$$q_{j} = \frac{{\exp \left( {D_{j} } \right)}}{{\sum\nolimits_{i = 1}^{C} {\exp \left( {D_{i} } \right)} }}$$
(8)

where C is the number of weld state types; its value is 2, corresponding to the two weld states, the good quality state and the defect state. Di and Dj are the values calculated in the fully connected layer preceding the Softmax layer, with i and j each indexing a weld state, and qj is the predicted probability of the jth weld state. The input feature signals are propagated through each layer of the model, and the Softmax function finally yields the predicted probabilities of the two weld states, which sum to 1. The good quality state of the weld is defined as label 0, and the defect state as label 1. The model outputs the weld state with the larger predicted probability: label 0 if the predicted probability of the good quality state exceeds that of the defect state, and label 1 otherwise.

The weights and biases of the 1D CNN model are adjusted according to the loss function calculated after each training round. The loss function of the model is the binary cross-entropy loss, expressed as

$$loss = \sum\limits_{(x,y) \in M} \left[ -y\log(y') - (1 - y)\log(1 - y') \right]$$
(9)

where (x, y) ∈ M is the set of labeled samples and y is the sample label (category), which is either 0 or 1. y′ is the predicted probability for the feature set x, between 0 and 1.

4.2 The Settings of Hyperparameters and the Dynamic Adjustment Mechanism of Learning Rate

The primary hyperparameters of the 1D CNN model are set as follows: the number of training epochs is 50, the batch size is 64, and the initial learning rate is 0.0001.

The learning rate setting is essential to the model's predictive performance. An initial learning rate that predicts well can be chosen after multiple tests, but the optimal learning rate of the model is difficult to know in advance. An initial learning rate that is set too large speeds up learning but easily misses the optimal model corresponding to the lowest loss value. Therefore, a dynamic learning rate adjustment algorithm is added so that the model can learn quickly and still find the best-performing model. The dynamic adjustment of the learning rate is expressed as

$$\eta (k) = \beta \eta (k - 1)$$
(10)

where η(k) is the learning rate of the kth epoch, η(k−1) is the learning rate of the (k−1)th epoch, and β is the learning rate adjustment factor, set to 0.2. To improve the model, the learning rate is tied to the accuracy on the verification set. The number of tolerable epochs without adjusting the learning rate is set to 5: if the verification accuracy does not improve after five epochs of training, formula (10) is applied to adjust the learning rate once. After each training epoch, the model with the highest verification accuracy so far is saved.
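
If Keras is assumed (the paper does not name its framework), this mechanism maps closely onto the standard ReduceLROnPlateau and ModelCheckpoint callbacks:

```python
from tensorflow import keras

# Multiply the learning rate by beta = 0.2 (Eq. (10)) when verification accuracy
# has not improved for 5 epochs, and keep the best model seen so far.
callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", factor=0.2,
                                      patience=5, mode="max"),
    keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                                    save_best_only=True, mode="max"),
]

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, batch_size=64, callbacks=callbacks)
```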

4.3 Results Analysis of 1D CNN Prediction Model

Figure 6 shows the accuracy of each 1D CNN prediction model. When the input training set contains only the four sensor features of visible light intensity, reflected light intensity, keyhole area, and plume area, the best training accuracy of model 1 is only 88.43%, and the best verification accuracy is only 90.18%. When these four sensor features are replaced by their rolling means and rolling standard deviations, the best training accuracy of model 2 improves to 98.99%, and the best verification accuracy improves to 97.67%. After combining the input sets of model 1 and model 2, the best training accuracy of model 3 rises to 99.74%, and the best verification accuracy rises to 99.97%.

Fig. 6 Accuracy of each prediction model

The accuracy of the three models increases in turn. With only the laser welding sensor features as the input set, model 1 has the lowest training and verification accuracy. Model 2, which uses the fluctuation degree indicators of the sensor features as the input set, improves the training accuracy by 10.6 percentage points and the verification accuracy by 7.5 percentage points. Model 3, which relies on fusion detection of the sensor features and the fluctuation degree indicators, achieves a training accuracy 0.75 percentage points higher and a verification accuracy 2.3 percentage points higher than model 2.

The loss curves of the three 1D CNN prediction models are shown in Fig. 7. The loss curve of model 1 converges quickly and stops decreasing after 15 training epochs, with the training loss and verification loss eventually settling at 0.3 and 0.26. The loss curves of model 2 and model 3 begin to converge around the 30th epoch; the loss of model 2 finally approaches 0.05, while that of model 3 drops even lower, approaching 0.0015. The successively smaller loss values also indicate that the prediction performance improves from model 1 to model 3.

Fig. 7 Loss of each prediction model

The learning rates of the three 1D CNN prediction models are shown in Fig. 8. The final learning rates of the three models are similar: the learning rates of model 1 and model 2 are reduced to \(2.56 \times 10^{-10}\), and that of model 3 to \(3.2 \times 10^{-8}\). The learning rates of the three models are adjusted the same number of times (four), but model 2 adjusts its learning rate later than model 1, and model 3 later than model 2. The longer a learning rate value persists across epochs, the more features the model can learn under that value and the greater the room for performance improvement. The dynamic learning rate adjustment algorithm successfully avoids repeated premature convergence of the prediction model and saves the defect prediction model with the best performance.

Fig. 8 Learning rate of each prediction model

After the 1D CNN prediction models were trained on the three kinds of inputs, the reliability of their prediction accuracy was verified on the test data set. Figure 9 shows the three models' defect prediction results for a new weld seam. In the figure, ordinate 0 represents the good quality state and ordinate 1 the defect state. Model 1 can only predict the defect position of the weld nodules at the tail of the weld and can hardly distinguish the good quality state from the defect state. Model 2 accurately predicts the weld defects at the front and tail of the weld and most of the good quality positions, but a few good quality states are still misjudged as defect states. Model 3 predicts the state of the whole weld with high accuracy and only minimal error; very few good quality states are misjudged as defect states.

Fig. 9 Result of each prediction model

Table 2 shows the prediction accuracy of each model: 79.26% for model 1, 97.05% for model 2, and 99.21% for model 3. The prediction accuracy of model 1 is about 11 percentage points lower than its verification accuracy, indicating unsatisfactory generalization performance. The prediction accuracies of model 2 and model 3 are only slightly lower than their verification accuracies, indicating good generalization performance.

Table 2 Prediction accuracy of each model

The results show that prediction relying only on raw sensor values in the laser welding process has low reliability. The fluctuation degree indicators of the sensor features distinguish the weld quality states better than the sensor features themselves. Furthermore, the fusion detection model that combines the sensor features with the fluctuation degree indicators can reliably predict the state of the weld.

5 Conclusion

A state prediction model of high-power disc laser welding is established. The model predicts the welding state from the data collected by the laser welding sensors. The key findings can be summarized as follows:

(1) In the process of high-power disc laser welding, photoelectric features and keyhole and plume features were obtained using a photoelectric sensor and high-speed cameras. The fluctuation degree indicators of the sensor features were obtained using the rolling mean and rolling standard deviation methods. The observed regularity of the fluctuation degree shows that the rolling mean and rolling standard deviation of the sensor features increase when the weld seam has defects.

(2) A 1D CNN model with an improved dynamic learning rate adjustment algorithm was established. After training on the experimental data, the dynamic learning rate adjustment algorithm effectively avoids premature convergence and achieves the best performance.

(3) The training accuracy and verification accuracy of model 3 reach 99.74% and 99.97%, respectively. In the reliability test on a new weld, the prediction accuracy reaches 99.21%, which means that the fusion detection of sensor features and fluctuation degree indicators has high predictive accuracy.