Introduction

Laser scribing is a laser micromachining technique that uses laser scanning to make a shallow scribe line on a surface. It has been extensively studied using short-pulse lasers (i.e., picosecond and nanosecond) for solar cell applications (Ku et al., 2013; Wang et al., 2019; Zhao et al., 2014). Generally, higher pulse energy and shorter pulse duration mean higher productivity, but with some negative effects on scribe quality (Leitz et al., 2011; Wang et al., 2020). On the other hand, lower pulse energy means less energy waste and less melting of the processed material (Yao et al., 2005). To overcome thermally induced damage from melting, recast, and microcrack formation in laser scribing, process optimization through modeling is a viable approach.

Traditionally, scribing quality issues can be detected using optical and geometrical inspection. Variation in laser parameters such as pulse energy, pulse duration, repetition rate, and scanning speed can occur at any time scale. These variations may result in several scribe issues such as debris, cracks, missing pulses, or non-straight lines (Wang et al., 2021). Hence, it is very important to identify and prevent these defects during a scribing process. Scribing errors are easy to fix right after scribing because the defect locations are known (Roozbahani et al., 2018, 2019). Note that the usual inspection and testing of the final product (e.g., solar panels) do not help, since detected defects lead to scrapping the product.

Some recent studies have deployed image analysis to monitor laser-based manufacturing processes such as additive manufacturing (Chua et al., 2017; Delli & Chang, 2018; Fotovvati et al., 2018; Grasso et al., 2018; Grasso et al., 2017; Imani et al., 2018a, b, 2019; Najjartabar-Bisheh et al., 2021; Yuan et al., 2019) and laser welding (Gonzalez-Val et al., 2020; Grasso et al., 2017; Mayr et al., 2018; Shevchik et al., 2020; Shevchik et al., 2019; Zhang et al., 2020). For example, Imani et al. (2018b) attempted to relate pore size and location to laser powder bed fusion (LPBF) parameters. In their study, they built nine titanium alloy cylinders on a commercial LPBF machine (EOS M280) at different laser power, hatch spacing, and velocity conditions. Multifractal and spectral graph analysis enabled them to monitor and discriminate process deviations with around 80% statistical fidelity. Later, Imani et al. (2019) used a deep neural network (DNN) for inspection and quality control of 362 regions of interest (ROIs) representing 362 layers of a titanium alloy; their AlexNet-based DNN detected lack-of-fusion flaws with 92 percent accuracy.

Various defects can occur during selective laser melting (SLM) that could be detected during the process using images (e.g., improper heat conduction in overhang features, incorrect powder deposition due to a worn recoating blade, or improper heat conduction to the underlying powder at the connection between the bottom layers of the part and the supports) (Grasso et al., 2017). SLM process monitoring can be even more challenging for difficult-to-process materials such as zinc and its alloys (Grasso et al., 2018). Grasso et al. (2018) compared several image segmentation methods on zinc powder ROIs to detect stable and unstable melting conditions using multivariate control charts. Their study showed that process monitoring of some difficult-to-process materials could be completely automated using suitable image segmentation techniques.

In a laser-induced melting-solidification process, the quality of welded parts can be degraded by porosity, cracks, lack of fusion, and incomplete penetration (Zhang et al., 2020). Even though machine learning has been used and explored in laser welding more than in other applications of laser technology, challenges remain in making laser welding processes more stable using advanced quality monitoring techniques (Mayr et al., 2018). Recently, Gonzalez-Val et al. (2020) released the first large dataset of laser metal deposition (LMD) and laser welding images. The dataset includes 1.6 million images, of which 24,444 are labeled as defective.

Based on the general performance of convolutional neural networks (CNNs) on image data, Mayr et al. (2018) examined a shallow CNN to monitor irregular weld seams, recessed weld seams, undercuts, weld beads, holes, and spatter in laser welding. Their combined quality monitoring system was able to detect 209 out of 227 bad parts. Shevchik et al. (2020) used hard X-ray radiography images to train a supervised DNN to reveal the unique signature of sub-surface events in wavelet spectrograms from laser back-reflection and acoustic emission signals. Using 300 images for training and 100 for testing, their quality classification achieved an accuracy between 71 and 99%. Shevchik et al. (2019) adopted a graph support vector machine with a data-adaptive kernel and a dataset of 23 laser welds to achieve an accuracy ranging between 85.9 and 99.9%.

Current physical models are capable of predicting certain geometric aspects of laser scribing such as scribe width and depth. However, several other important quality measures cannot be obtained from the model. These quality measures include heat affected zone, debris, and micro-cracks (Roozbahani et al., 2017). Roozbahani et al. (2018) defined discontinuance as a kind of defect in laser scribing and tried to detect discontinuance area in copper indium gallium selenide solar panels using a particle analysis algorithm. However, laser scribes might also suffer from several other quality issues.

To the best of our knowledge, no existing models can predict all aspects of laser scribing quality, which can be attributed to three main reasons. First, laser scribing with short laser pulses is a very complicated process involving many laser parameters and various physical processes whose mechanisms are not completely understood. Second, ultrafast laser-matter interaction is a highly dynamic process in which the material is first pushed to a highly non-equilibrium state and then undergoes rapid hydrodynamic motion, resulting in material ejection. During this process the material experiences fast phase changes and property changes (physical, optical, mechanical, electrical, etc.), making it extremely difficult to obtain reliable material data to feed into a model. Finally, the uncertainties associated with physical equipment (e.g., laser power fluctuation) and environment (e.g., temperature, vibration) can derail a model from giving reliable predictions, since many of these process variations are treated as noise and thus not considered in a physical model.

With the advent of machine vision and machine learning (ML), an opportunity arises to use an Artificial Intelligence (AI) framework to monitor and characterize a laser scribing process with respect to multiple quality features, including debris, scribe width, and scribe straightness. The inputs to the proposed framework are images captured while scribing takes place, and the output is a classification report on the quality characteristics under consideration. We propose a deep learning method for this AI framework. Given that each new problem may require its own dataset and that labeling data for supervised ML with sufficient accuracy is costly, the main challenge is to train such a model with as few images as possible while still monitoring all of the aforementioned aspects of scribing quality.

Deep learning (DL) and convolutional neural networks (CNNs) have shown promising performance among ML methods in recent years in contexts ranging from autonomous driving to medical image analysis (Badrinarayanan et al., 2017; Sejnowski, 2019). However, these methods need a significant amount of data to train a model with adequate accuracy (Bauer & Kohavi, 1999; Bosch et al., 2019; Li et al., 2014). Collecting image data may not be a big challenge, but pre-processing and labeling these images for training is. This data preparation stage is the most time-consuming and costly step in any machine vision/DL application. One way to alleviate this problem is transfer learning (Li et al., 2017).

Existing supervised ML methods such as decision trees and other tree-based methods such as Random Forest (RF) or the Gradient Boosting Classifier (GBC) may not need as much training data as DL/CNN models do, but they still require a large amount of data. In addition, these traditional ML methods may not be able to handle complicated problems such as semantic segmentation with multiple quality characteristics, where several classes of objects must be classified, each possibly with a different geometrical shape and color (Li et al., 2014). A newer approach to alleviating the lack of labeled data is Transfer Learning (TL), in which the knowledge gained from a different yet similar problem is reused to solve another problem. Figure 1 demonstrates how knowledge can be transferred from a pre-trained model to a new model; the Softmax layer extends the logistic regression idea to the multi-class setting.
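As a reminder of this standard formulation (not specific to the present work), a Softmax layer over K classes assigns

$$P\left(y=k \mid \mathbf{z}\right)=\frac{e^{z_{k}}}{\sum_{c=1}^{K}e^{z_{c}}},\quad k=1,\dots ,K,$$

which reduces to the logistic (sigmoid) model when K = 2.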

Fig. 1

An example of feature transfer from a pre-trained model in TL

In the last few years, several studies have used image data to monitor laser-based manufacturing processes. Table 1 summarizes these studies, their research focuses, and their results. However, there is a lack of research on in-process monitoring of laser scribing quality; the relevant quality characteristics include scribe width, debris, crack length, scribe depth, width of the heat-affected zone, and straightness. In addition, most of this research was done under a specific experimental condition, and one open question is whether the results apply to other scribing conditions with different imaging systems. In this study we attempt to measure and monitor three important laser scribing characteristics (i.e., scribe width, debris, and straightness) using image data and a state-of-the-art transfer learning method based on a deep convolutional neural network, referred to here as TDCNN. The TDCNN enables us to leverage existing deep learning models from other domains and accelerate classification with a smaller amount of laser-scribing data.

Table 1 Applications and features addressed by the most related publications

Research methodology

Experimental setup

The experimental setup is depicted in Fig. 2, and the samples used throughout the study are <100>-oriented, 1-mm-thick intrinsic Si wafers with a resistivity of > 200 Ω cm. An IR laser is used to scribe lines on the surface of a silicon wafer. Specifically, the laser source (MWTech, PFL-1550) has a wavelength of λ = 1550 nm, produces pulses of length τ = 3.5 ns (full width at half-maximum), and can be operated at various repetition rates with a maximum pulse energy of 20 μJ. The output beam has a 1/e2 diameter of 6 mm.

Fig. 2

Experimental setup. P Polarizer, HWP Halfwave plate, PBS Polarized beam splitter, M Mirror

A half-wave plate in conjunction with a polarizing beam splitter is used to control the pulse energy by rotation of the wave plate. The beam is then focused onto the surface of the Si sample by a microscope objective (NA = 0.85, Olympus, Model LCPLN100XIR) that is corrected for spherical aberration. At focus, the beam has a theoretical diameter at \(1/{e}^{2}\) of \({2w}_{0}=1.22\lambda /NA=2.2 \mathrm{\mu m},\) with a Rayleigh length of \({y}_{R}\) = 2.6 μm in air. Parallel lines are scribed on the surface of the silicon samples. With three parameters under control and easy to change, we adopted a \(2^{3}\) factorial experimental design, in which each factor is tested at a high and a low level. Specifically, the low and high levels of the pulse energy are 1 and 2 µJ, respectively. The low and high levels of the repetition rate are 20 and 120 kHz. Finally, the low and high levels of the scanning speed are 0.5 and 10 mm/s, respectively. The scribing conditions are listed in Table 2. The images are obtained by separating the long scribe lines from each condition listed in Table 2 into multiple small segments.

Table 2 Scribing conditions used in our experiment

Image data and pre-processing

The first step of image processing for object identification is image segmentation. This segmentation task can be accomplished by an unsupervised ML model, which does not require a time-consuming labeling process. However, unsupervised ML has limited applicability and is not suitable for problems that need high accuracy. Figure 3(b) and (c) show the results of adaptive thresholding (AT) (Bradley & Roth, 2007) and Otsu's thresholding (OT) (Otsu, 1979), respectively. AT is a local intensity method while OT is a global one. As shown in Fig. 3, AT performs better than OT; however, the segmentation is still far from adequate for process monitoring, which requires a high level of segmentation accuracy that neither method achieves. Our goal is to measure and monitor scribe width, debris, and straightness, and none of these goals can be achieved using these thresholding methods.
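For reference, a minimal sketch of the two unsupervised baselines named above using OpenCV; the file name and the AT neighbourhood size/offset are illustrative assumptions, and OpenCV's mean-based adaptive threshold only approximates the Bradley-Roth integral-image method:

```python
import cv2

# Hypothetical input file; in practice this would be one of the scribe images.
gray = cv2.imread("scribe_sample.png", cv2.IMREAD_GRAYSCALE)

# Adaptive thresholding (AT): local mean over a neighbourhood minus an offset.
# Block size 35 and offset 5 are illustrative values only.
at = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 35, 5)

# Otsu's thresholding (OT): a single global threshold chosen from the
# intensity histogram of the whole image.
_, ot = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```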

Fig. 3

a A sample including two scribes, b clustering result using the adaptive thresholding (AT) method, c clustering result using Otsu's thresholding (OT) method

Image pre-processing can include renaming, resizing, de-noising, segmenting, edge smoothing, and finally labeling. In this study, the collected images are renamed, resized to 1024 × 1024, and labeled into the classes of scribe, debris, and part background. The initial goal is to train the model with sufficient accuracy using a minimum amount of data. To do this, a total of 21 images from 8 different scribes are collected and labeled into the three classes of debris, scribe, and silicon background. This dataset is split into training, validation, and testing sets with a 60-20-20 ratio. A valid concern, however, is that testing the model on a handful of images is not reliable, even if the accuracy is very high and the loss is low. To make sure the accuracy is reliable, we prepared a larger dataset of 154 images of the same size with more variety in scribe size, camera zoom, and defects; the purpose of this second dataset is to verify model performance. Specifically, the validation set is the data held back from training to estimate the model's capability while tuning hyper-parameters, and the testing set is the data held back from training to give an unbiased estimate of the final trained model (Xie et al., 2011). By verification we want to ensure that the model will not misbehave over a broader range of circumstances (Ding et al., 2021). All images were labeled carefully and manually using the MATLAB R2020a Image Labeler application. A pixel-wise region of interest was defined in the Image Labeler, where we assigned 1 to all silicon background pixels, 2 to all scribe pixels, and 3 to all debris pixels. The labeled ground truth was exported from MATLAB and then imported into Python for DL and image processing.
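The following sketch illustrates the kind of pre-processing and 60-20-20 split described above; the folder layout, the assumption that the MATLAB ground truth was exported as single-channel label images with values 1-3, and the use of scikit-learn for splitting are our illustrative choices, not the paper's exact pipeline:

```python
import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

SIZE = (1024, 1024)

# Hypothetical folders holding the scribe images and the exported label masks.
image_paths = sorted(glob.glob("images/*.png"))
mask_paths = sorted(glob.glob("labels/*.png"))

images = [cv2.resize(cv2.imread(p), SIZE) for p in image_paths]
masks = [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), SIZE,
                    interpolation=cv2.INTER_NEAREST)   # keep labels discrete
         for p in mask_paths]

X = np.stack(images).astype("float32") / 255.0   # normalised images
y = np.stack(masks).astype("int64") - 1          # labels 1/2/3 -> 0/1/2

# 60 % training, 20 % validation, 20 % testing.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
```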

Transfer learning model architecture

To address the traditional machine learning issues in image segmentation pointed out in the previous section, we designed a TL-based model. Figure 4 shows the details of the designed architecture, where the blue part (i.e., the first two and a half rows) represents the layers whose weights are transferred from the pre-trained VGG16 and the orange part (the remaining rows) is the proposed CNN classifier built on top of the pre-trained model. TL works best when a related pre-trained CNN is available to transfer knowledge from. There are several well-known pre-trained CNNs, such as Xception, VGG16, VGG19, ResNet50, Inception, and MobileNet, which have been trained on public datasets such as ImageNet, MNIST, and CIFAR (Mousavi et al., 2019; Noh et al., 2015a; Unnikrishnan et al., 2019; Uzkent et al., 2019; Zoph & Le, 2019). All of these datasets are designed for object detection, whose goal is simply to determine whether an image contains a specific object. To the best of our knowledge, however, no pre-trained pixel-wise semantic segmentation CNN is available. Among the aforementioned pre-trained image classification models, VGG16 is a very deep CNN trained on part of the ImageNet dataset with 2 million images and 1000 different classes of objects such as animals, furniture, sports, and plants (Ferguson et al., 2018). VGG16 has high accuracy for the objects it was trained on. Thus, VGG16 is chosen in this research to transfer the image feature knowledge.
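A minimal sketch of how the VGG16 convolutional blocks can be reused as a frozen feature extractor in TensorFlow/Keras. The exact cut point (block3_pool, which yields a 128 × 128 × 256 feature map for a 1024 × 1024 input, matching the dimensions given in the next subsection) is our assumption about the transferred portion:

```python
import tensorflow as tf

INPUT_SHAPE = (1024, 1024, 3)   # scribe images resized to 1024 x 1024

base = tf.keras.applications.VGG16(
    weights="imagenet",          # ImageNet knowledge to be transferred
    include_top=False,           # drop the original 1000-class classifier
    input_shape=INPUT_SHAPE,
)

# Keep only the early convolutional blocks and freeze their weights.
feature_extractor = tf.keras.Model(
    inputs=base.input,
    outputs=base.get_layer("block3_pool").output,   # 128 x 128 x 256
)
feature_extractor.trainable = False
```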

Fig. 4

The proposed TDCNN architecture

Note that the original VGG16 is trained on images with a 224 × 224 resolution, and its output is a single scalar value representing the classified object. However, the input, the desired output, and consequently the dimensions of all layers need to be changed for a different problem, since each problem may use a different image resolution for training. Thus, the main adjustment needed for the proposed framework is to change the input dimension, i.e., the image resolution. The topology of the proposed TDCNN is shown in Fig. 4. With appropriate fine-tuning and feature extraction, we trained the proposed TDCNN model. The new classifier built on top of the transferred layers can be a logistic regression or support vector machine model in the case of binary classification, or a deep CNN. Since the pre-trained model and the proposed model serve two different purposes (the former is for object detection and the latter is for semantic segmentation), another deep CNN is trained on a new, small dataset. To design this deep CNN, the idea of decoding and up-sampling from U-Net (Ronneberger et al., 2015) and SegNet (Badrinarayanan et al., 2017) is adopted. In a similar study, pre-trained DeconvNet (Noh et al., 2015b) layers were transferred on top of VGG16 for off-road autonomous driving without any batch normalization or dropout (Sharma et al., 2019).

The last blue block in Fig. 4 is the output of the feature extraction process. The output of feature extraction from the pre-trained model is a matrix with dimensions of 128 × 128 × 256, where 256 is the number of filters. This output (i.e., the output of the last blue block) is the input to the proposed CNN. Six blocks of convolutional layers are designed; each block contains a convolutional layer, batch normalization, ReLU, dropout, and max pooling (or unpooling). The dropout helps to reduce overfitting. There is only one max pooling layer, connected right after the first convolutional block, followed by four max unpooling layers that up-sample the feature maps to the desired dimension, which is 1024 × 1024 in this case. Finally, a fully connected layer followed by a Softmax layer gives the desired classification. Note that the weights in the blue part remain fixed during training because they were obtained by training on millions of images; the rationale of transfer learning is to leverage this knowledge of fundamental geometrical features of various objects. The orange part is the proposed module for the desired classification in a specific problem domain, in this case the scribed images.
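The sketch below illustrates one way to realize the orange part of Fig. 4 in Keras, following the description above: six convolutional blocks (convolution, batch normalization, ReLU, dropout), one max-pooling step after the first block, four up-sampling steps back to 1024 × 1024, and a per-pixel Softmax over the three classes. Filter counts and kernel sizes are illustrative assumptions; UpSampling2D stands in for max unpooling, and a 1 × 1 convolution plays the role of the final fully connected layer applied per pixel:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 3   # background, scribe, debris
DROPOUT = 0.4

def conv_block(x, filters):
    """Convolution + batch normalization + ReLU + dropout."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.Dropout(DROPOUT)(x)

features = layers.Input(shape=(128, 128, 256))   # output of the frozen VGG16 blocks

x = conv_block(features, 512)
x = layers.MaxPooling2D(2)(x)                    # 128 -> 64
for filters in (512, 256, 128, 64):              # four up-sampling blocks
    x = conv_block(x, filters)
    x = layers.UpSampling2D(2)(x)                # 64 -> 128 -> 256 -> 512 -> 1024
x = conv_block(x, 64)                            # sixth convolutional block

outputs = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)  # pixel-wise Softmax
head = models.Model(features, outputs, name="tdcnn_head")
```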

Despite its complex appearance, the proposed architecture is significantly less complicated than other well-known CNNs such as VGG16. As mentioned above, the blue section of the architecture consists of the layers transferred from the pre-trained VGG16; it does not include all the trainable layers of the original VGG16, and the weights of the transferred layers remain unchanged during training. Based on the filter sizes and number of layers, the proposed TDCNN trains only about 5 million parameters in each epoch for the new module, with 771 parameters in the last dense layer, roughly one-third of the trainable parameters of VGG16.

Model training and evaluation

The architecture in Fig. 4 was coded on Google Colaboratory using a P100 GPU and the TensorFlow environment. Model evaluation was done with two sets of image data: a validation set and a testing set. The validation set was the part of the sample data held back from training to estimate model performance while tuning the model's hyper-parameters, while the testing set was the part held back from training to estimate the model's final performance. We split the entire small dataset of 21 images into 14 images for training, 3 for validation, and 3 for testing. It is worth emphasizing that the main effort here is to train the model with a small number of images and still reach the desired accuracy. Since the training set was small, the test set was also small, and one might argue that the results are not conclusive. To address this concern, a larger dataset of 154 images was created and used later to verify the results and accuracy.

In the training phase, training and validation accuracy and losses were used to assess model performance. The model was trained for 30 epochs using the Adam optimizer (Jais et al., 2019) with a learning rate of 0.0001 and a mini-batch size of 4. Figure 5 shows the training performance of the designed model. Note that a dropout value of 0.4 was used to prevent possible overfitting during training. Both training and validation indexes (i.e., accuracy and loss) in Fig. 5 are very close to each other in each epoch, especially toward the final epochs, which demonstrates that the model was able to avoid overfitting.
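A minimal training sketch under the settings stated above (Adam optimizer, learning rate 0.0001, mini-batch size 4, 30 epochs); it reuses feature_extractor and head from the earlier sketches and the X_train/y_train arrays from the pre-processing sketch, and the per-pixel sparse categorical cross-entropy loss is our assumption (the paper does not state the loss function):

```python
import tensorflow as tf

# Stack the frozen VGG16 feature extractor and the new classifier head
# (defined in the earlier sketches) into one end-to-end model.
inputs = tf.keras.Input(shape=(1024, 1024, 3))
model = tf.keras.Model(inputs, head(feature_extractor(inputs)), name="tdcnn")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # learning rate 0.0001
    loss="sparse_categorical_crossentropy",                   # assumed per-pixel loss
    metrics=["accuracy"],
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=4,    # mini-batch size 4
    epochs=30,       # 30 epochs
)
```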

Fig. 5

Training and validation performance of TDCNN

While Fig. 5 shows training and validation performance during the model training process, the final trained model still needs to be examined on the test set. To do this, we calculated the F1-score for each class and the Overall Accuracy (OA) from the confusion matrix. Given the general structure of a confusion matrix with True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts, the F1-score is defined as the harmonic mean of precision and recall. Equations (1)–(4) give the details of how OA, precision, recall, and the F1-score are calculated. OA is the correct identification rate. Precision is the rate of correct positive observations out of all observations identified as positive; a high precision rate means a low false positive rate. Recall, on the other hand, is the ratio of correctly predicted positive observations to all positive cases; a high recall rate means few false negatives. Finally, the F1 score combines precision and recall. In short, precision is a measure of false positives, recall is a measure of false negatives, and the F1 score is an overall measure of both. OA is the correct identification rate when all categories are considered as a whole. For all four metrics, larger values are better.

$$\mathrm{OA}=\frac{TP+TN}{TP+ TN+FN+FP}$$
(1)
$$\mathrm{Precision}=\frac{TP}{FP+TP}$$
(2)
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
(3)
$$\mathrm{F1\ score}=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(4)
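As an illustration, the metrics in Eqs. (1)–(4) can be computed per class from flattened per-pixel labels, for example with scikit-learn (our tooling choice, not necessarily the one used in the study):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def pixel_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Overall accuracy (Eq. 1) and per-class precision, recall, F1 (Eqs. 2-4)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    oa = accuracy_score(y_true, y_pred)
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=list(labels), zero_division=0)
    return oa, precision, recall, f1, support
```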

Table 3 shows the OA and the detailed F1-scores over the test set. The last column in Table 3 is the number of pixels supporting each class. Precision here means, for example, that 64 percent of the pixels classified as debris are actually debris. This outcome is expected because debris pixels make up only 6.7% of all pixels examined, so many non-debris pixels are mistakenly identified as debris. The recall value for debris means the model was able to find and classify 85 percent of all debris pixels. The recall for the part background is over 97 percent, while the recall for the scribe is 94 percent, meaning the proposed model finds and classifies the pixels related to the part correctly at very high rates. This also suggests that most misclassifications involve debris and scribe pixels. Debris has the lowest F1-score; the combination of low precision and high recall for debris means the model found most of the debris, but many non-debris pixels were also classified as debris. We suspect these misclassified pixels come from the scribe; Fig. 7 supports this, since much of the debris is connected to the scribed line.

Table 3 Overall accuracy and F1-score of testing the model using the first data set

Since a test set of only 3 images (despite the high image resolution) might not be very reliable, we trained the model with 60 percent of the images in the large dataset and used the rest for testing and validation. Interestingly, accuracy improved to 97 percent and the loss went below 0.1 for both training and validation. Table 4 shows the results of training and testing the model; performance is just slightly better than with the small set. The F1-score for debris is still below 80 percent, probably because of the high class imbalance. Also, comparing Fig. 5 with the corresponding plot for the large dataset, there is no large gap between the training and validation loss, which confirms that the model was able to avoid overfitting.

Table 4 Overall accuracy and F1-score of testing the model using the large dataset

Generation of scribe width and straightness of laser scribes

Given a sample scribe such as the one in Fig. 6, we can measure debris as well as scribe width and straightness. The models trained on the small and large sets had almost the same performance; in this section we use the model trained on the small set. The output of the TDCNN is a 2D image in which the three categories of scribe, debris, and sample part are classified (see Fig. 7(b)). Debris can thus be measured directly by counting the pixels classified as debris. Using the same classified image, scribe width and straightness can also be measured following geometric dimensioning and tolerancing (GD&T) (GD&T Straightness, 2014; NADCA, 2015), with appropriate adjustments of the formulas for classified images.

Fig. 6

A scribed sample part showing measurement parameters

Fig. 7

a original scribed sample (top left), b classification using TDCNN (top right), c scribe classification values

After quantifying debris, the whole image is segmented into scribe and no-scribe pixels. Figure 7 shows the image and pixel values for an example image with 3 scribe lines. In the initial classification in Fig. 7(b), the model assigns ones to all scribe pixels, zeros to all background pixels, and twos to all debris pixels. After measuring the debris area and monitoring scribe width and straightness, all debris pixel values are replaced with zeros to obtain a binary classification of scribe and no-scribe.
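A minimal sketch of this post-processing step, assuming the TDCNN output is a 1024 × 1024 integer array with 0 for background, 1 for scribe, and 2 for debris:

```python
import numpy as np

def debris_area_and_binary_mask(classified: np.ndarray):
    """Count debris pixels, then collapse the map to scribe / no-scribe."""
    debris_pixels = int(np.sum(classified == 2))    # debris area in pixels
    binary = np.where(classified == 1, 1, 0)        # background and debris -> 0
    return debris_pixels, binary
```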

In Fig. 7(c) a pixel matrix is presented, in which ones (1's) represent all pixels classified as scribe and zeros represent the background and debris. Let \({x}_{ij}\) be the value of the pixel in row i and column j. To determine the width and straightness, the tolerance zones and the center line (CL) need to be identified. The width (W) is the distance between Jmax and Jmin, which are the maximum and minimum column positions of the effective area, respectively. The effective area consists of all columns j whose column sum \(\sum_{i=0}^{1023}{x}_{ij}\) is greater than 1024(1 − α), where α is the classification error and 1024 is the image resolution. Thus, the effective area's boundary and width are:

$$ J_{\min} \le \text{Effective area} \le J_{\max} $$
(5)

And,

$$ W = J_{\max } - J_{\min } $$
(6)

Let J be the number of columns that contain at least one scribe pixel (i.e., a value of 1). Then the center line (CL) is:

$$ \mathrm{CL} = \sum\limits_{j_{\min}}^{j_{\max}} j/J $$
(7)

Now define the upper bound (Ub) as the maximum column j containing at least one scribe pixel, and the lower bound (Lb) as the minimum column j containing at least one scribe pixel. Thus:

Tolerance Zone 1 is between line

$$ U_{b} \text{ and } J_{\max} $$
(8)

Tolerance Zone 2 is between line

$$ J_{\min} \text{ and } L_{b} $$
(9)

Finally, if the total number of columns in a tolerance zone is n, then for each row i the error (ei) within the tolerance zone is:

$$ e_{i} = \sum\limits_{\text{all } j \text{ in tolerance zone}} \left| x_{ij} - \bar{x}_{ij} \right| / n $$
(10)

where \({\overline{x}}_{ij}\) is the average of \({x}_{ij}\) over row i within the tolerance zone. Plotting ei gives a clear picture of the straightness of the scribe.
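The following sketch implements our reading of Eqs. (5)–(10) on the binary scribe mask for a single vertical scribe line; the tolerance value α and the interpretation of Eq. (7) as the mean index of the scribe-containing columns are assumptions:

```python
import numpy as np

def width_and_straightness(binary: np.ndarray, alpha: float = 0.05):
    """Scribe width, center line, and per-row straightness errors (Eqs. 5-10)."""
    rows, _ = binary.shape
    col_sums = binary.sum(axis=0)

    # Effective area: columns that are almost entirely scribe (Eq. 5); width (Eq. 6).
    effective = np.where(col_sums > rows * (1 - alpha))[0]
    j_min, j_max = int(effective.min()), int(effective.max())
    width = j_max - j_min

    # Columns containing any scribe pixel; center line as their mean index (Eq. 7).
    scribe_cols = np.where(col_sums > 0)[0]
    lb, ub = int(scribe_cols.min()), int(scribe_cols.max())
    cl = float(scribe_cols.mean())

    # Tolerance zones (Eqs. 8-9) and per-row straightness error (Eq. 10).
    def zone_error(zone):
        if len(zone) == 0:
            return np.zeros(rows)
        vals = binary[:, zone]
        row_mean = vals.mean(axis=1, keepdims=True)           # x-bar per row
        return np.abs(vals - row_mean).sum(axis=1) / len(zone)

    e_zone1 = zone_error(np.arange(j_max + 1, ub + 1))        # between J_max and U_b
    e_zone2 = zone_error(np.arange(lb, j_min))                # between L_b and J_min
    return width, cl, e_zone1, e_zone2
```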

Results and discussion

This study uses different scribing conditions that cause different quality issues such as debris, fluctuation, and very thin or very thick lines. The proposed TDCNN method was able to capture and help quantify the scribed lines. In the following sections, we discuss these findings.

Debris measurement

Figure 7(a) is an original image of 3 scribed lines, and Fig. 7(b) shows the classified image produced by the proposed TDCNN model. The green region in the classified image represents the background of the part, the yellow regions are scribes, and the purple ones are debris. Figure 7(c) shows the classification values around the middle line. Note that there are 1024 × 1024 = \(2^{20}\) pixels in the image. Based on this classification, pixel counting shows that the 110,893 purple pixels are debris, the 833,790 green pixels are background, and the 103,893 yellow pixels are scribes. This means 79.5 percent of the image is background, 9.9 percent is scribe, and 10.6 percent is debris (mostly because of the second scribe line).

Figure 8 shows straightness measurement plots for a normal (straight) line and a fluctuating line, respectively. These plots were obtained after classifying each line using the TDCNN and then quantifying straightness using Eqs. (5)–(10). The original scribe is shown on top of each plot, and ei is plotted for the first tolerance zone of each scribe.

Fig. 8

Error (ei) plot of a normal line and b fluctuating line

The error values for the straight line (Fig. 8(a)) ranged between 0 and 1, whereas those for the fluctuating line (Fig. 8(b)) ranged between 0 and 6. In addition, comparing Fig. 8(a) and (b) shows that a straight line has a smoother ei plot with lower variation.

Transferability to a new case

Most DL models are designed and trained under the assumption that experimental and environmental conditions are consistent. However, this assumption might not hold in real life: changes in scribing parameters and imaging conditions can make significant differences in the final picture. To handle this, the proposed model needs to be retrained to acquire knowledge of the new environment. A model without transfer learning might need a large dataset covering the new conditions to reach adequate accuracy, whereas the proposed model can accomplish this task with only a handful of images.

To test this hypothesis, we collected a new dataset of images under different laser and imaging conditions. This dataset comes from a group of lines scribed by a nanosecond laser with a wavelength of 1064 nm and a repetition rate of 10 Hz. Eight lines with different conditions were scribed; Table 5 gives the technical details of this second group of scribes.

Table 5 Scribing conditions of the second group of lines

To examine the trained TDCNN, we first classified the new images using the model as trained, without any change or retraining. Figure 9 shows the result: Fig. 9(a) is the original scribe image and Fig. 9(b) is the classification result, where the yellow part represents the scribe. Interestingly, the model was relatively successful in finding the background, but the misclassification rate for the scribe is high. This misclassification is a barrier to our goal of automatic characterization of debris, scribe width, and scribe straightness.

Fig. 9

Image classification on the new dataset: a original image with different imaging setup, b classification result without retraining, c classification result with retraining

To solve this problem, we retrained the same TDCNN, using 6 additional images to create a new dataset and train a new classification model. This way, the model learns extra knowledge on top of the knowledge it already has. The network architecture and all training hyper-parameters (e.g., learning rate) were kept the same as in the first training. We manually labeled the pixels in the middle of the scribe in Fig. 9(a) as scribe rather than background or debris. The training and validation accuracy after retraining is over 98 percent, which is even better than the first training result. Figure 9(c) shows the retrained model's performance on a new image. From this point, all the steps in the "Generation of scribe width and straightness of laser scribes" section can be repeated, and Eqs. (5)–(10) can be used to calculate scribe width and straightness.
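A minimal sketch of this retraining step: the previously trained TDCNN is loaded and fitted on the handful of newly labeled images with the VGG16 layers still frozen and the same hyper-parameters. The file name and the new_images/new_masks arrays are placeholders for the six additional labeled images:

```python
import tensorflow as tf

# Hypothetical path to the model trained on the original dataset.
model = tf.keras.models.load_model("tdcnn_trained.h5")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # same hyper-parameters
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Continue training on the six newly labeled images from the new scribing
# and imaging conditions; the frozen VGG16 layers remain fixed.
model.fit(new_images, new_masks, batch_size=4, epochs=30)
```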

Conclusions

In this study we proposed a novel deep transfer learning model to classify images from laser scribing with balanced performance for high accuracy and low overfitting. The classified images contain identified debris and scribes. Further image processing and algorithms are developed to quantify scribe width and straightness based on the classified images.

The proposed TDCNN has major advantages over a regular deep learning algorithm without transfer learning. First, whereas regular deep learning models require a huge amount of image data and long training times to reach good accuracy, the proposed model requires only a few image samples to adapt to new situations. This transfer learning feature saves substantial effort and time in data preparation and labeling and leads to a much shorter modeling and training phase. With only about 5 million trainable parameters, compared with well-known architectures such as VGG16 with more than 15 million trainable parameters, the proposed model has less complexity in its newly developed portion. Second, the proposed TL architecture is flexible in that new information from new scribing conditions can be added with minimum effort while retaining the same performance. Working with a small dataset usually increases the chance of overfitting; in the proposed TDCNN architecture, several layers of batch normalization and dropout were used to overcome this challenge, and they helped reduce (if not remove) overfitting.

The main idea in re-using pre-trained layers is to transfer general knowledge and patterns, such as edges, corners, and dots, learned from other labeled images. Thus, only the most common features are needed from the transferred layers; specifically, the number of filters was reduced to 64. For the new and specific problem at hand, 512 filters were added to capture larger combinations of patterns. For future research, more complicated pattern combinations may be captured and studied by adding new layer modules.

The proposed TDCNN model enables the characterization of laser scribing quality using just a small number of sample images and reaches accuracy as high as 96 percent. The scribe images in this study had three main features of concern: debris, width variation, and straightness. The quality measures on these characteristics pave the way to tracking all of these features automatically and make real-time process control a logical next step. Although the proposed method can measure and quantify debris for any scribe shape, quantifying straightness and width is currently limited to vertical lines, and further study is needed for non-vertical lines. This study focuses only on laser scribing on silicon wafers; we expect the same framework can be extended to other materials such as solar photovoltaic thin films, although different quality issues such as cracks may arise that will also require future research. Highly unbalanced data was a significant limitation in this study, which we plan to tackle with other state-of-the-art methods such as differentially private generative adversarial networks in future work. Finally, the case study in the "Transferability to a new case" section demonstrates that the trained TDCNN model can be transferred to a new case and retrained to match or exceed its original performance. We expect that trained models may also be transferred to other laser-related processes; further research is needed to confirm this hypothesis.