Introduction

Printed circuit boards (PCBs) are widely applied in electronics, communications, computers, security, medicine, industrial control, aerospace, and other fields [1]. As the carrier of modern large-scale integrated circuits, PCBs directly affect the operational efficiency and safety of modern circuits. Unfortunately, environmental influences and human factors during manufacturing give rise to five common types of welding defects, shown in Fig. 1: aiguille (welding spikes), interconnect pad, dissymmetry, holes, and solder residue. The ensuing short circuits, electrical fires, and other hazards can bring huge economic losses [2] and even casualties to PCB manufacturers and users. Detection of PCB defects is therefore essential.

Fig. 1 Common soldering defects on the surface of PCBs

Traditional methods for PCB defect detection include visual inspection, electrical testing [3], and Automatic Optical Inspection (AOI) [4]. Visual inspection is prone to missed and false detections that depend on the subjective state of the operator [5]. Electrical testing cannot detect short-circuit defects or extensive defects, and AOI is affected by the working environment, technical requirements, hardware equipment, and other factors, which limits its adaptability. With the development of image processing, template matching [6] has been applied to PCB surface defect detection, but a large number of targets are still falsely detected or missed. Machine learning has also been applied to PCB surface defect detection [2, 7], but characterizing the defect features is cumbersome. The emergence of Convolutional Neural Networks (CNNs) [8, 9] has provided new options, and CNN-based object detection algorithms have been applied to PCB surface defects. For instance, RAR-SSD [1] applies multi-scale feature fusion, relying on lightweight Receptive Field Block modules (RFB-s) and an attention mechanism to highlight the importance of different features; it achieves lightweight defect detection, but at the cost of limited detection accuracy. Feature fusion approaches, including TD-Net [10], TDD-Net [11], improved Faster-RCNN [12], LDD-Net [13], and the Edge Multiscale Reverse Attention Network (EMRA-Net) [14], employ multi-scale feature fusion to improve detection accuracy at the cost of huge memory consumption. Specifically, the five feature inputs used for feature fusion in LDD-Net and two-stage approaches such as TDD-Net inevitably reduce inference speed. These methods in particular lack effective interaction among the intra-scale features of defects, which is important for defect detection in complex contexts. Although the Focal Loss introduced by the extended feature pyramid model [15] addresses the imbalance between foreground and background, it ignores the imbalance between defect classes.

In summary, many deep learning algorithms are unsuitable for real industrial sites because of their large parameter counts and slow inference. Furthermore, most feature fusion approaches overlook intra-scale feature interaction for defects, even though the global information of intra-scale features, together with long-range dependencies, helps the network learn PCB surface defect features in complex contexts. Existing methods also often overlook the severe class imbalance in PCB surface defect detection, particularly in industrial long-tailed data. Long-tailed data reduces the detection accuracy of defect categories at the tail of the distribution or, worse, causes many tail-category defects to be missed. Finally, networks rarely employ auxiliary head supervision when learning PCB surface defects, although it provides the algorithm with more accurate defect feature information and thus helps it identify PCB surface defects.

The main contributions of this paper to PCB surface defect detection are as follows:

(1) The proposed EFF-Net performs intra-scale feature interaction and cross-scale feature fusion of defects, enabling the network to efficiently fuse long-range dependencies of PCB surface defect features with multi-scale feature information.

(2) An auxiliary head supervision strategy is designed to supervise the middle layers of the network, which helps the network learn PCB defect information accurately.

(3) The designed BCE-LRM loss mines hard samples to improve the detection accuracy of tail data in the defect data.

The paper is structured as follows: Section “Related work” reviews the state-of-the-art methods for target defect detection in recent years, Section “Proposed approaches” describes our overall approach in detail, Section “Experiment” experimentally validates the feasibility of the proposed method, and Section “Conclusion” summarizes the work in this paper, highlighting our improvements and future research directions.

Related work

As a practical engineering application of computer vision, defect detection mainly aims to localize and classify defects in industrial products. For example, pixel segmentation [16] is used to identify rail surface defects, YOLO-attention [17] detects defects in wire arc additive manufacturing, improved YOLOV4 [18] identifies surface defects in aluminum strips, improved YOLOV5s [19, 20] identifies small defects on the surface of ceramic tiles, and SR-ResNetYOLO [21] detects defects on the surface of gears. All these methods use multi-scale feature fusion to improve detection accuracy, but the resulting parameter counts and memory footprint make practical deployment difficult. More importantly, they focus only on the inter-layer relationships of defect features and ignore the intra-scale feature information of defects. For instance, SOD-YOLO [22] was not evaluated in complex contexts when detecting small defects in wind turbine blades, and generative adversarial networks [23] are difficult to deploy for identifying surface defects in wood. PEI-YOLOv5 [24] did not address intra-scale feature interaction or inter-class sample imbalance when detecting fabric defects. The cumbersome multi-scale feature fusion module applied to steel surface defect detection [25] increased training time.

In addition to traditional CNNs, Transformers are beginning to be widely used in defect detection. For example, DAT-Net [26] detects tool wear defects without considering the inter-class imbalance of defects, and DefectTR [27] detects defects in sewage pipe networks with poor accuracy. For roller surface defects, the multi-layer Transformer encoder used by the Cas-VSwin transformer [28] results in a huge amount of computation and a large number of parameters. DefT [29] suffered from slow inference when applied to industrial surface defects. LSwin Transformer [30] did not implement intra-scale feature interaction and cross-scale feature fusion for steel surface defects. Swin-MFINet [31] did not make full use of multi-scale feature information when detecting surface defects on manufactured materials. RDTor [32] detects PCB surface defects, a setting in which intra-scale interaction of low-level features is unnecessary because it risks duplication and confusion with high-level feature interactions. At the same time, all of the above approaches are difficult to deploy in practice: the Transformer framework usually entails a huge amount of computation, which makes it hard to meet the real-time and embedded-deployment requirements of industry. Nevertheless, the Transformer framework effectively encodes global information and efficiently learns the contextual information of PCB surface defect features. Consequently, it helps the algorithm recognize PCB surface defects with variable shapes and complex backgrounds.

PCB surface defect detection, as an important branch of industrial defect detection, has produced many inspection methods of exploratory and practical significance. Examples include improving the CIoU and feature pyramid of YOLOV5 [33], combining a lightweight YOLOX with a positional attention mechanism [34], adding a backbone feature layer to YOLOV3 [35], YOLOV4-Tiny [36], and YOLOV5 combined with a Transformer [37]. These methods do not consider the inter-category imbalance of PCB surface defects, so the tail categories in long-tailed data are easily confused. Adding a backbone feature layer to YOLOV3 greatly increases memory consumption and computational cost and ignores the intra-layer representation of features, whereas the long-range dependencies and global information of intra-layer representations are extremely important for detecting PCB surface defects in complex backgrounds. Few existing defect detection methods use auxiliary supervision, even though it is extremely beneficial when training lightweight networks. Auxiliary supervision typically provides more comprehensive and reliable abstract semantic feature information for detecting PCB surface defects. It also helps the algorithm exclude highly redundant features more accurately, thus purifying the features.

Proposed approaches

Fig. 2 Architecture of LLM-Net. The Efficient Hybrid Encoding Module (EHEM) performs attention-based interaction among the intra-scale features of PCB surface defects. The Efficient Feature Fusion Network ultimately achieves intra-scale feature interaction and cross-scale feature fusion

The LLM-Net structure shown in Fig. 2 mainly consists of a backbone feature extraction network, an efficient feature fusion network, and a detector. Inspired by YOLOV5 and DETR [38], we propose an Efficient Feature Fusion Network (EFFNet) that combines a simplified DETR encoder with the PAFPN network. Multi-scale features are converted into image features through intra-scale feature interaction and cross-scale feature fusion, which enables LLM-Net to efficiently fuse long-range dependencies of PCB surface defect features with multi-scale feature information. The proposed BCE-LRM lets LLM-Net learn hard samples better. Our auxiliary supervision strategy reduces the errors of LLM-Net during training, which shortens training time and at the same time gives LLM-Net a stronger ability to characterize PCB surface defects.

An efficient feature fusion network

The proposed feature fusion network consists mainly of an efficient hybrid encoding module (EHEM) and a multi-scale feature fusion network. The EHEM performs attention-based interaction among the intra-scale features of PCB surface defects to embed global information about defects efficiently. The multi-scale feature fusion network transforms the defect feature maps into feature layers for predicting defects of different sizes, enabling the network to adapt to PCB surface defects of different shapes and sizes.

Efficient hybrid encoding module

The deeper semantic features of PCB surface defects often contain richer abstract semantic information. We encode the deep semantic features provided by the backbone network with an efficient hybrid encoding module, which effectively exploits the semantic features of defects. The abstract feature information they carry is beneficial for both defect classification and regression. To identify PCB surface defects efficiently and accurately, we simplified the DETR encoder to a single Transformer encoder layer. The resulting hybrid encoding module is shown in Fig. 5.

Fig. 3 Scaled dot-product attention

The efficient hybrid encoding module, with the scaled dot-product attention of Fig. 3 at its core, improves the encoding of global information and learns the contextual information of PCB surface defect features to a certain extent, which in turn helps the algorithm detect PCB surface defects with variable shapes and complex backgrounds. The attention input consists of the queries and keys (of dimension \(d_{k}\)) and the values (of dimension \(d_{v}\)), and the operation is described in Eq. (1).

$$\begin{aligned} Att(K,Q,V)=Softmax\left( \frac{QK^{T}}{\sqrt{d_{k}}}\right) V \end{aligned}$$
(1)

where the three matrices K (key), Q (query), and V (value) are obtained by linear transformations, and \(d_{k}\) is the dimension of the key and query vectors. The query vector of each element is dot-multiplied with the key vectors of the other elements in the sequence, which establishes the dependency between that element and the others. Through the encoding of Eq. (1), relationships in order and spatial location are established between the defect features on the PCB surface. As a consequence, the algorithm extracts the positional features of defects better and locates defects more accurately.
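
The attention of Eq. (1) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the list-of-rows representation, and the toy input are assumptions made for illustration, and the arguments follow the conventional (Q, K, V) order.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Eq. (1): Softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Toy sequence of three 2-dimensional tokens (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = scaled_dot_product_attention(X, X, X)
```

Each output row is a convex combination of the value rows, so every element of the sequence receives information from all the others, which is the dependency described above.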

The multi-head attention mechanism is obtained by concatenating the outputs of multiple parallel scaled dot-product attention operations, as shown in Fig. 4. The module applies n linear transformations to Q, K, and V, where n is the number of heads, and processes them in parallel, yielding n sets of results, one per head. These results provide sequence information on defects and dependency information between elements. They are extremely helpful for the network to obtain contextual information on PCB surface defects and are indispensable for the subsequent classification and prediction of defects.

Fig. 4 Multi-head self attention

Fig. 5 Efficient hybrid encoding module

The efficient hybrid encoding module in Fig. 5 further reduces the computational redundancy of the Transformer encoder by performing intra-scale feature interaction only on P5. We argue that applying self-attention to high-level features with richer semantic concepts captures the connections between conceptual entities in PCB defect images, which helps the subsequent modules detect and recognize defects. Low-level features, by contrast, lack semantic concepts, so intra-scale interaction between them is unnecessary and risks duplication and confusion with the interactions between high-level features.

Efficient feature fusion network

The backbone network outputs several feature layers. The P3 layer contains fine-grained features and location information for recognizing subtle defects. The P4 layer contains more abstract semantic features suitable for recognizing medium-sized defects. The richest semantic features are found in the P5 layer, which plays a crucial role in classifying and recognizing defects. An efficient feature fusion network is constructed by combining the proposed EHEM with a multi-scale feature fusion network. EFF-Net processes the deep semantic feature P5 with multi-head attention and obtains the PCB surface defect feature F5, which contains long-range dependencies and contextual information. Subsequently, F5 is used for cross-scale feature fusion, through which the information it contains is passed to the other two feature scales. Ultimately, EFF-Net embeds multi-scale defect context information and long-range dependencies, which in turn effectively supports the detector in detecting and identifying defects in the image. Equation (2) characterizes our efficient feature fusion network.

$$\begin{aligned} Q&=K=V=Flatten\left( P5 \right) \\ F5&=Reshape\left( Att\left( K,Q,V \right) \right) \\ \left\{ feat1,feat2,feat3 \right\}&=EFF\left( \left\{ F3,F4,F5 \right\} \right) \end{aligned}$$
(2)

where Att represents the multi-head attention module, \(\left\{ feat1, feat2, feat3\right\} \) are the outputs of the efficient feature fusion network, Reshape and Flatten are inverse processes to each other, and P5 is the high-level feature of the backbone outputs. F3 and F4 are the shallow and middle outputs of the backbone network (equal to P3 and P4, respectively, in Fig. 2). The efficient feature fusion network is characterized by EFF.
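
The Flatten and Reshape steps of Eq. (2) can be sketched as follows. This is only an illustration under assumptions: the feature map is a nested list in channel, height, width order, the function names and the tiny stand-in for P5 are hypothetical, and the attention step between the two calls is omitted so that the round trip is visible.

```python
def flatten(feature_map):
    """Turn a C x H x W map into a sequence of H*W tokens of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    tokens = [
        [feature_map[c][i][j] for c in range(channels)]
        for i in range(height)
        for j in range(width)
    ]
    return tokens, (channels, height, width)


def reshape(tokens, shape):
    """Inverse of flatten: rebuild the C x H x W map from the token sequence."""
    channels, height, width = shape
    return [
        [[tokens[i * width + j][c] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


# Hypothetical 2-channel, 2 x 3 stand-in for P5 (values are illustrative only).
P5 = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
tokens, shape = flatten(P5)   # Q = K = V = tokens in Eq. (2)
F5 = reshape(tokens, shape)   # with attention omitted, F5 equals P5
```

Because each spatial position becomes one token, self-attention over the token sequence lets every location of P5 interact with every other location before the map is restored to its original shape.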

Assisted supervision strategy

Algorithm 1 Pseudo-code of the auxiliary head label assignment strategy

Deep supervision [39] is frequently employed in deep network training: auxiliary heads are usually added to the middle layers of the network to obtain an auxiliary supervision loss. To help the algorithm obtain more accurate feature information on PCB surface defects, we designed an auxiliary supervision strategy. The label assignment of the auxiliary head is independent, which prevents the auxiliary head loss from affecting the main detection head loss. For the auxiliary head we adopt a state-of-the-art soft label assignment strategy, which avoids the additional learning error introduced by the original hard labels and in turn helps the algorithm learn the defect features. The loss function of the auxiliary head is the same as that of the main detection head, which prevents the auxiliary head from ignoring hard samples and helps the algorithm converge faster during training. Algorithm 1 presents the pseudo-code for the core of the label assignment strategy in the auxiliary head.

Fig. 6 BCE-LRM. First, the loss of different scale features is calculated for each mini-batch. Then, the losses are ranked and stored in a vector. Next, the loss values in the vector are ranked in descending order, and the top \(\beta \) of the ranking is selected for each image. Finally, the averaged loss is obtained and utilized as the confidence loss for network prediction

The auxiliary head greatly enhances the ability of lightweight algorithms to learn PCB surface defects with variable morphology on long-tailed datasets. After the middle layers of the network are fully supervised, we add the supervised loss obtained from the main detection head and perform backpropagation and gradient updates. The designed auxiliary supervision strategy combines hard and soft labels to achieve a more effective defect target assignment. Additionally, auxiliary supervision can quickly correct learning errors of the lightweight defect detection network during training, so that the lightweight algorithm identifies PCB surface defects more effectively. The strategy is used only during training and is not involved in inference, so the inference efficiency of the algorithm is preserved.

Boosting for hard samples

In industrial defect detection scenarios, severe imbalance between categories seriously degrades detector performance on tail data. To improve detector performance, and inspired by [40, 41], we propose a hard sample mining method, BCE-LRM, that combines LRM with the binary cross-entropy loss. Unlike [40], where the original LRM is applied only to single-scale features, the proposed method applies BCE-LRM to features at all scales. Figure 6 presents the proposed BCE-LRM strategy. First, the loss of the features at each scale is calculated for each mini-batch. Then, the losses are ranked and stored in a vector. Next, the loss values in the vector are ranked in descending order, and the top \(\beta \) of the ranking is selected for each image. Finally, the selected losses are averaged and used as the confidence loss for network prediction.
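
The ranking and selection steps can be sketched in a few lines of Python. The sketch rests on assumptions that the text does not confirm: \(\beta \) is treated as a fraction in (0, 1], the losses of all scales of one image are pooled into a single list, and the function name and loss values are hypothetical.

```python
import math


def lrm_select(losses, beta):
    """Keep the largest `beta` fraction of per-location losses and average them."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    ranked = sorted(losses, reverse=True)          # descending: hardest first
    keep = max(1, math.ceil(beta * len(ranked)))   # always keep at least one
    selected = ranked[:keep]
    return sum(selected) / len(selected)


# Hypothetical per-location confidence losses gathered from three scales of one image.
scale_losses = {
    "P3": [0.05, 0.90, 0.10],
    "P4": [0.02, 0.60],
    "P5": [0.30],
}
all_losses = [v for scale in scale_losses.values() for v in scale]
confidence_loss = lrm_select(all_losses, beta=0.5)
```

With these illustrative values only the three largest losses contribute to the average, so the easy locations with small losses no longer dilute the training signal.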

Through this loss ranking and selection, the algorithm prioritizes the hard samples that it has learned poorly, enabling it to learn the hard samples among PCB surface defects effectively. Compared with the original binary cross-entropy loss, ranking and selecting the losses is more effective for learning poorly fitted and hard samples. In contrast to [41], we avoid the focal loss because it neglects easily detectable defect categories. Additionally, because a few high-quality PCB surface defect samples have a high intersection over union (IoU) but low confidence, we also avoid Varifocal Loss.

LRM is applied to the binary cross-entropy losses after feature mapping is complete, and the losses are ranked and selected only in the training phase. Equation (3) gives the binary cross-entropy loss.

$$\begin{aligned} \sigma \left( P_{i}\right)&= \frac{1}{1+e^{-P_{i}}} \\ BCEWithLogitsLoss&=-\frac{1}{N}\sum _{i=1}^{N}\left[ y_{i}\cdot \log \left( \sigma \left( P_{i}\right) \right) +(1-y_{i})\cdot \log \left( 1-\sigma \left( P_{i}\right) \right) \right] \end{aligned}$$
(3)

where \(P_{i}\) is the raw prediction (logit) for the i-th sample, \(\sigma \left( P_{i}\right) \) is the predicted probability that the sample is a positive example, \(y_{i}\) is the true label of the sample, and N is the number of samples.
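
A small numerical sketch shows how this loss behaves. It uses the standard logistic sigmoid, \(1/(1+e^{-P_{i}})\), and the leading minus sign of the usual binary cross-entropy; the function names and input values are illustrative assumptions, and the direct formula is not numerically stable for logits of very large magnitude.

```python
import math


def sigmoid(logit):
    """Logistic function: maps a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed from raw scores and 0/1 labels."""
    total = 0.0
    for p, y in zip(logits, labels):
        prob = sigmoid(p)
        total += y * math.log(prob) + (1 - y) * math.log(1.0 - prob)
    return -total / len(logits)


# Illustrative values: a confident correct score, a neutral score, a confident wrong score.
logits = [4.0, 0.0, -4.0]
labels = [1, 1, 1]
per_sample = [bce_with_logits([p], [y]) for p, y in zip(logits, labels)]
mean_loss = bce_with_logits(logits, labels)
```

The per-sample losses grow as the prediction moves away from the label, which is the ordering that the loss ranking relies on to single out hard samples.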

Equation (4) shows that the core of LRM is a binary mask matrix, Mask. The predicted defect feature map, Feat, is multiplied element-wise with Mask to obtain a new feature map, which is mainly used to determine whether a sample is a hard sample.

$$\begin{aligned} Feat_{out}=Feat\cdot Mask \end{aligned}$$
(4)

The binary values in Mask are not preset but are determined by the final prediction result. Elements of Mask that correspond to hard samples in the feature map are set to 1, and all other elements are set to 0. Backpropagation is then carried out, and the gradient used for parameter updating is shown in Eq. (5).

$$\begin{aligned} \widehat{\frac{\partial Loss}{\partial f_{i,j}^{c}}}=m_{i,j}^{c}\cdot \frac{\partial Loss}{\partial f_{i,j}^{c}}={\left\{ \begin{array}{ll} 0, & m_{i,j}^{c} = 0\\ \frac{\partial Loss}{\partial f_{i,j}^{c}}, & m_{i,j}^{c} = 1 \end{array}\right. } \end{aligned}$$
(5)

where \(f_{i,j}^{c}\) denotes the element at position (i, j) of channel c in the defect feature map f (corresponding to Feat), and \(m_{i,j}^{c}\) denotes the element at position (i, j) of channel c in Mask. With Mask, the algorithm suppresses the backpropagation of easy samples and ensures that it learns from hard defect samples.
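
The masking of Eqs. (4) and (5) can be sketched on a flattened feature map. The sketch assumes that the hard samples are simply the elements with the largest losses and that a fixed count of them is kept; the function names, the `keep` parameter, and all numbers are hypothetical.

```python
def build_mask(losses, keep):
    """Binary mask with 1 at the `keep` largest losses and 0 elsewhere."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    hard = set(order[:keep])
    return [1 if i in hard else 0 for i in range(len(losses))]


def apply_mask(values, mask):
    """Element-wise product used in both Eq. (4) and Eq. (5)."""
    return [v * m for v, m in zip(values, mask)]


# Hypothetical flattened losses and gradients for four feature-map elements.
losses = [0.10, 0.80, 0.05, 0.40]
grads = [0.30, -0.70, 0.20, 0.50]
mask = build_mask(losses, keep=2)
masked_grads = apply_mask(grads, mask)
```

Only the two elements with the largest losses keep their gradients; the gradients of the easy elements are zeroed and therefore do not update the network.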

Experiment

Dataset

Fig. 7 Diagram of the capture device

Figure 7 shows the PCB surface defect acquisition system that was built. The system comprises a strip light source, an industrial camera, a conveyor belt, and a host computer. The acquisition resolution of the industrial camera was set to 3072\(\times \)2048, and 300 PCB images were obtained, each containing five types of common soldering defects: aiguille, interconnect pad, dissymmetry, holes, and solder residue. To prevent model overfitting during training, we cropped the images with a fixed sliding window of 640\(\times \)640 pixels and a stride of 320 pixels, resulting in 940 defect images. Table 1 shows the statistics of the various types of defects in the dataset, and Table 2 displays the camera parameters at the time of acquisition. To ensure a more rigorous verification of the proposed method, we organized the dataset in MS-COCO format.
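
The sliding-window cropping can be sketched as below. The sketch assumes that only full windows are kept and that image edges are not padded; the text does not say how edges or defect-free crops were handled, and the function names are hypothetical.

```python
def window_origins(length, window, stride):
    """Start offsets of all full windows that fit along one axis."""
    return list(range(0, length - window + 1, stride))


def crop_boxes(width, height, window=640, stride=320):
    """(left, top, right, bottom) boxes of every full window in the image."""
    return [
        (x, y, x + window, y + window)
        for y in window_origins(height, window, stride)
        for x in window_origins(width, window, stride)
    ]


boxes = crop_boxes(3072, 2048)
```

Under these assumptions a 3072\(\times \)2048 image yields 8 \(\times \) 5 = 40 candidate windows. The reported total of 940 defect images suggests that further filtering was applied; that step is not described in the text and is not reproduced here.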

Table 1 The number of defects in various categories in the dataset
Table 2 The camera parameters during data collection

Experimental preparation and parameter settings

The experiments were run on an Ubuntu 23.10 machine with PyTorch 2.0. The machine had a 12th Gen Intel(R) Core(TM) i7-12700F 2.10 GHz CPU, 32 GB of RAM, and an Nvidia GeForce RTX 4080 GPU with 16 GB of memory. All models were trained without pre-trained weights and without freezing the backbone, ensuring fairness in the experiments. The hyperparameter settings during training are displayed in Table 3. To ensure data uniformity in the comparison, we maintained the same image size for training, validation, and testing. Additionally, all models were trained, tested, and validated on an equal number of images.

Table 3 Hyperparameter setting

Evaluation metrics

Practical industrial field inspection aims to balance the number of model parameters, inference speed, and detection accuracy. Objective evaluation metrics such as Precision, Recall, mAP@0.5, mAP@0.5:0.95, FPS, GFLOPs, and Parameters are utilized to evaluate the performance of LLM-Net, and the calculation methods of these evaluation metrics are as follows.

Precision is the ratio of positive samples correctly predicted by LLM-Net to the total number of samples predicted as positive. It is calculated as follows.

$$\begin{aligned} Precision=\frac{TP}{TP+FP} \end{aligned}$$
(6)

TP represents the number of correctly predicted positive samples, while FP represents the number of negative samples that were incorrectly predicted as positive.

Recall is the ratio of positive samples correctly predicted by LLM-Net to the total number of actual positive samples. It characterizes the ability of LLM-Net to identify all positive samples and is calculated as follows.

$$\begin{aligned} Recall=\frac{TP}{TP+FN} \end{aligned}$$
(7)

FN represents the number of positive samples that were incorrectly predicted as negative.

The mean average precision (mAP) is the average of the AP values over all N categories. The AP value of a category is the area under its Precision-Recall curve, P(R), and a larger value indicates better detection for that category.

$$\begin{aligned} \begin{matrix} AP&{}=&{}\displaystyle \int _{0}^{1}P(R)dR \\ mAP&{}=&{}\frac{ {\textstyle \sum _{i=1}^{N}}AP_{i}}{N} \end{matrix} \end{aligned}$$
(8)
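
The metrics of Eqs. (6) to (8) can be illustrated with a short sketch. It approximates the integral in Eq. (8) with the trapezoidal rule, which is an assumption: evaluation toolkits often use an interpolated Precision-Recall curve instead, and the text does not say which variant was used. All counts and curve points below are made up for illustration.

```python
def precision(tp, fp):
    """Eq. (6): fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Eq. (7): fraction of actual positives that are found."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(recalls, precisions):
    """Trapezoidal area under a precision-recall curve sorted by recall."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Eq. (8): mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative counts and curve points, not results from the paper.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
m_ap = mean_average_precision([ap, 0.625])
```

In this notation, mAP@0.5 counts a prediction as a true positive when its IoU with the ground truth is at least 0.5, and mAP@0.5:0.95 averages the result over a range of IoU thresholds.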

Ablation experiment

Validation of the effectiveness of the proposed method on baseline models

The purpose of the ablation experiments is to confirm the effectiveness of the proposed auxiliary supervision strategy, EFFNet, and BCE-LRM for PCB surface defect detection. A series of experiments were conducted on our self-built dataset, focusing on the mAP@0.5 metric. In PCB surface defect detection, accuracy is prioritized over speed once the frame rate meets the field requirements. Therefore, FPS, GFLOPs, and Parameters are considered secondary.

Table 4 Results of ablation study
Fig. 8 Visualization results of heatmaps for the baseline model and LLM-Net, where a more reddish color indicates a higher probability of recognition as a target, and conversely a more bluish color indicates a higher probability of recognition as a background

Table 5 The impact of different loss functions on the model accuracy

We use the YOLOV5-n model as the baseline to verify the effects of the auxiliary supervision strategy, EFFNet, and BCE-LRM on detection accuracy (mAP@0.5, mAP@0.5:0.95), FPS, GFLOPs, and Parameters; the results are shown in Table 4. The auxiliary supervision strategy effectively reduces the weight of errors during training, so that the algorithm focuses more on learning the positive sample features of PCB surface defects, which improves the mAP@0.5 and mAP@0.5:0.95 of the baseline model by 0.3% and 5.5%, respectively. With the auxiliary supervision strategy and EFFNet combined, the algorithm learns refined defect feature information as well as richer contextual information, and mAP@0.5 and mAP@0.5:0.95 improve by 0.3% and 6.1%, respectively, over the baseline. Finally, BCE-LRM improves the ability of the algorithm to learn from hard samples, with mAP@0.5 and mAP@0.5:0.95 improving by 1.4% and 6.2%, respectively, over the baseline. The algorithm that achieved the best accuracy is designated LLM-Net, and its heat map is visualized in Fig. 8. The closer a color is to blue, the higher the probability that the region is considered background; the closer it is to red, the higher the probability that it is considered a target. The figure shows that LLM-Net has a strong defect perception capability and accurately identifies the location of defects.

Table 6 Effect of auxiliary head supervision position on model performance
Fig. 9 Various auxiliary head supervision schemes, where (a) to (c) correspond to the ‘Method’ column in Table 6

Comparison of different loss functions

Similar methods for hard samples and sample imbalance include Focal Loss, Varifocal Loss, and Slide Loss, as listed in Table 5. For the hard samples among PCB surface defects, Focal Loss suppresses predicted boxes with high positional accuracy but low confidence and ignores the effect of the intersection over union (IoU), which degrades the model: mAP@0.5 and mAP@0.5:0.95 drop to 94.1% and 68.0%, respectively. Varifocal Loss spends most of its effort on the imbalance between foreground and background when compensating the loss, while the imbalance between samples still affects detection; mAP@0.5 and mAP@0.5:0.95 drop to 94.7% and 68.8%, respectively. Slide Loss takes the average IoU between the ground-truth (GT) and predicted boxes to decide whether a target is a hard sample and emphasizes boundary samples by weighting them, but it does not improve network performance much: it merely equals BCE in mAP@0.5 and lowers mAP@0.5:0.95 to 71.2%. Hard samples are usually likely to be missed, which makes the subsequent weighting in Slide Loss difficult. BCE-LRM ranks the sample losses and learns hard samples more effectively, giving the algorithm the best mAP@0.5 of 98.5%. At the same time, BCE-LRM is compatible with most object detectors and is easy to embed.

Table 7 Results compared to other methods
Fig. 10 Comprehensive performance comparison of each algorithm

Fig. 11 Visualization of detection results for each algorithm

Optimization of auxiliary supervision strategies

It was found that the auxiliary supervision strategy is effective in enhancing the performance of the algorithm. Three experiments were conducted to determine the best approach for this strategy, testing only the auxiliary head on LLM-Net. The results of these experiments are presented in Table 6, with the ‘Method’ column corresponding to the scheme number in Fig. 9. The performance of the network without the auxiliary supervision strategy shows only a small improvement compared to the baseline model. However, when the auxiliary head supervises the output of the backbone, it learns the error information of the shallow network. Consequently, the auxiliary head affects the prediction of the main detector head, resulting in no effective improvement in the performance of the network. The proposed scheme in this paper implements auxiliary supervision in the middle layer of the network to acquire more defective feature information, resulting in an improved accuracy index mAP@0.5 of 98.5%. Additionally, the combination of soft and hard labels enhances the adaptability of the network to PCB surface defects.

Comparison with other algorithms

Quantitative comparison with previous methods

In real industrial scenarios, PCB surface defect detection requires not only high accuracy and a reasonable inference speed but also a lightweight model. Therefore, we compare the accuracy metrics (mAP@0.5, mAP@0.5:0.95), FPS, GFLOPs, and parameter count of each network, treating accuracy as the primary criterion and the other metrics as secondary. In Table 7, we comprehensively compare the performance of LLM-Net with 13 representative current deep learning algorithms (SSD [42], CenterNet [43], FastestDet [44], RetinaNet [45], YOLO-PCB [33], SPD-Conv [46], YOLOV5-n [47], YOLOV7-Tiny [48], YOLOV8-n [49], CFPNet-n [50], CFPNet-s [50], YOLOX-n [51], and YOLOX-s [51]).
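For readers less familiar with the two accuracy metrics, the sketch below shows the IoU computation that underlies them and the threshold set conventionally used for mAP@0.5:0.95 (IoU thresholds from 0.50 to 0.95 in steps of 0.05, following the COCO convention). It only illustrates the matching criterion and is not a full mAP evaluation; the box format `(x1, y1, x2, y2)` and the helper names are assumptions.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds for mAP@0.5:0.95; mAP@0.5 uses only the first one.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value: float) -> int:
    """Number of thresholds at which a prediction counts as a true positive."""
    return sum(iou_value >= t for t in THRESHOLDS)


gt = (0, 0, 10, 10)
pred = (5, 0, 15, 10)          # overlaps half of the ground-truth box
print(iou(gt, pred))           # intersection 50, union 150
print(thresholds_passed(0.72)) # passes 0.50, 0.55, 0.60, 0.65, 0.70
```

A prediction with moderate overlap can count as correct for mAP@0.5 and still fail most of the stricter thresholds, which is why mAP@0.5:0.95 is consistently lower than mAP@0.5 in the reported results.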

Table 7 presents the results of the quantitative comparison between LLM-Net and other methods. The results show that LLM-Net significantly outperforms the other methods on both mAP@0.5 and mAP@0.5:0.95. Figure 10 shows that LLM-Net is well suited to PCB surface defect detection when detection accuracy is the priority, even though it does not achieve the best FPS or parameter count. CenterNet relies on heat-map prediction, which makes it unable to detect defects well against complex backgrounds. FastestDet improves detection speed and reduces the number of parameters at the cost of detection accuracy. YOLO-PCB and YOLOV7-Tiny learn the tail data insufficiently, which leads to many missed detections. Although SPD-Conv adopts a more reasonable downsampling scheme, the feature information of PCB surface defects is still severely lost after downsampling. The decoupled head of YOLOV8-n cannot accurately identify the tail data. The explicit visual center pyramid in CFPNet-n and CFPNet-s is not sufficient to extract the feature information of the defects, and their large computational cost makes them hard to deploy in practice. YOLOX-n and YOLOX-s fail to consider intra-scale and cross-scale feature interaction of defect features simultaneously. This results in the loss of global information about defects and of long-range dependencies, which in turn leads to suboptimal detection results. LLM-Net achieves intra-scale feature interaction and cross-scale feature fusion of PCB surface defects through EFF-Net, which efficiently embeds the long-range dependencies and multi-scale feature information of PCB surface defect features. The designed auxiliary head supervision strategy further helps the network learn PCB defect information accurately. Ultimately, LLM-Net achieved the best detection accuracy among several high-performing algorithms while maintaining an inference speed of 188 FPS. The experimental results show that LLM-Net is more suitable for PCB surface defect identification.
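As a quick sanity check on what 188 FPS means for deployment, the reported throughput can be converted into a per-image time budget. The helper name `latency_ms` is hypothetical, and the conversion assumes images are processed one at a time, which the text does not state.

```python
def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


print(f"{latency_ms(188):.2f} ms per image")  # prints "5.32 ms per image"
```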

Qualitative comparison with other approaches

Figure 11 shows the visualization of the detection results of the various methods. LLM-Net accurately identifies the long-tailed class (solder residue) and some subtle defects. Intra-scale feature interaction and inter-scale feature fusion provide LLM-Net with richer global information and long-range dependencies. The introduction of multiscale feature linkages also helps LLM-Net achieve more accurate defect recognition against complex backgrounds. Among the comparison algorithms, CenterNet is prone to missed and false detections, particularly for small defects such as the solder residue defects in the second column of the visualization results. FastestDet, despite using a lightweight backbone and feature fusion network, loses much of its feature extraction capability, resulting in a large number of missed targets. Excessive attention can decrease the accuracy of the algorithm while introducing false attention features. YOLOV7-Tiny produces false detections when identifying defects. SPD-Conv identifies defects relatively accurately but still cannot avoid false detections. YOLOV8-n ignores intra-scale feature interaction when extracting features, so it cannot identify subtle defects with a high degree of accuracy. YOLO-PCB, YOLOV7-Tiny, CFPNet-n, CFPNet-s, YOLOX-n, and YOLOX-s all fail to detect the tail class (solder residue), which is closely related to the fact that they do not consider mining the tail data.

Conclusions

The current methods for improving defect recognition accuracy have some limitations. Firstly, they rely on simple feature fusion alone to improve defect recognition accuracy, which results in large memory consumption while ignoring the importance of intra-layer feature interaction. Secondly, they neglect the long-tail problem in industrial data. Thirdly, most of the methods do not use an auxiliary supervision strategy for PCB surface defect recognition, which can provide accurate defect feature information to the algorithms. Fourthly, they ignore the importance of combining intra-layer and inter-layer feature interaction to improve defect recognition accuracy. This paper proposes an EFF-Net based on YOLOV5-n that performs both intra-layer and inter-layer interaction of defect features, capturing the global information of defects and embedding long-range dependencies. The algorithm is aided by an auxiliary supervision method that uses a soft-label assignment strategy to extract more accurate defect features, and BCE-LRM is designed to improve the detection of tail data.

Experiments were conducted to validate LLM-Net using a dataset of PCB surface soldering defects that we collected. The results demonstrate that LLM-Net has the highest detection accuracy and can perform real-time inference at 188 FPS. The visualization results indicate that LLM-Net has the best detection performance and shows no missed detections in randomly selected test images. Currently, we can detect five classes of soldering defects on the surface of printed circuit boards in real time. However, in industrial scenarios, it is crucial to ensure high detection efficiency. To improve the efficiency of the deep learning method in such scenarios, it is essential to enable defect detection based on tracking.