1 Introduction

As a basic component of transmission lines, insulators play an important role in electrical isolation and mechanical support. However, because they are often exposed to harsh natural environments, insulators are vulnerable to damage from external factors, which in severe cases may interrupt the power line. Insulator detection has therefore become an indispensable part of transmission line inspection. Insulator defect detection involves several complicated situations: the most common is a damaged insulator (as shown in Fig. 1a), but there may also be objects covering the insulator, such as animals (birds, reptiles, insects, etc.; Fig. 1b shows a snake on an insulator). The presence of an animal on the insulator is generally regarded as normal, whereas a bird's nest covering the insulator is considered a defect that needs to be addressed (Fig. 1c).

Fig. 1 Insulator condition diagram

Insulator defect detection is an active research field, and many researchers at home and abroad are exploring effective detection methods. With the development of deep learning and advances in computing hardware, deep learning-based insulator defect detection has gradually become the mainstream. As an application of object detection, insulator detection can build on mature object detection algorithms: deep neural networks extract features from insulator images and then classify and locate defects. Literature [1] uses an improved Faster R-CNN algorithm to detect insulator defects on power lines, but reporting results for only one category is not convincing. Literature [2] detects insulator defects with a U-Net network built on a generative adversarial network (GAN); detection accuracy increases significantly, but the backgrounds are complex and changeable and data augmentation is lacking. Literature [3] combines three features, contrast, variance, and mean value, to diagnose whether an insulator has self-exploded, but the method lacks adaptability in how regions are divided. In literature [4], the detected insulator region is divided into blocks, and chromaticity distance measurements are analyzed statistically; the method is sensitive to environmental changes and lacks sensitivity. Literature [5,6,7,8,9] reports the latest improvements to YOLO models; while individual metrics improve, the accuracy and mAP still show relatively large errors. Literature [10] applies a deep convolutional autoencoder (DCAE) that encodes and decodes the insulator image and then determines whether a defect exists from the reconstruction error and anomaly score. This method does not need a large amount of labeled data and can effectively extract image features, but to improve accuracy the image must be processed in slices and a suitable threshold must be set to separate normal and abnormal images, which makes the preparatory workload considerable; in addition, factors such as image quality and noise degrade the detection results. Literature [11] uses a deep learning insulator defect detection method based on the two-stage Fast R-CNN + FPN framework. Compared with one-stage networks, the two-stage structure handles multi-scale and small targets better, but the added complexity increases computation and degrades real-time performance, and the global RoI approach produces large feature maps that occupy more memory and computing resources and reduce efficiency. Therefore, developing more efficient and adaptive detection techniques for the diversity and complex backgrounds of insulators has become an urgent task.

2 Related work

In this study, the YOLOv5 model is used for insulator defect detection and is improved to address problems such as high missed- and false-detection rates, the diversity of insulators, and the inability to accurately identify small insulator defects. The improvements are as follows: (1) The C2f module of YOLOv8 replaces the C3 module in the backbone of YOLOv5. The C2f module is inspired by the design of the C3 module and the ELAN module; it fuses features along two paths while keeping the module lightweight. (2) The SimAM [12] attention mechanism is added to the head of YOLOv5; by adjusting the attention of the feature maps before each detection head, it effectively enhances feature representation and improves multi-scale adaptability. (3) The Wise-IoU [13] loss function is introduced; its dynamically updated normalization factor keeps the gradient gain at a high level overall, which optimizes the training of the model.

YOLOv5 and YOLOv8 are both advanced object detection algorithms in the field of deep learning. Both adopt the backbone-neck-head architecture, but they differ in some details. YOLOv5 has fewer model parameters and a smaller computational load, making it better suited to resource-constrained environments such as mobile devices or edge computing platforms. On both CPU and GPU, YOLOv5 achieves higher frames per second (FPS), which is an important advantage for applications that require real-time processing. YOLOv5 also does not require designing a large number of anchor boxes by hand: adaptive anchor computation derives suitable anchors from the training data, which simplifies the detection pipeline and is especially helpful for small target detection. For a specific task such as insulator detection, these characteristics make YOLOv5 the more suitable choice. Although YOLOv8 improves on some aspects, YOLOv5 retains significant advantages in lightness, speed, stability, and adaptability to specific scenarios.

YOLOv5, as a relatively advanced network architecture in the YOLO series and a single-stage object detection network, has achieved good results in detection accuracy and speed. The YOLOv5 algorithm model mainly consists of three parts: the backbone network, the feature fusion module (Neck), and the detection module (Detect). This structure provides an efficient and accurate solution for image object detection, and its detailed structure is shown in Fig. 2.

  1. (1)

    A concise backbone network: mainly used for extracting image features. It uses CSPNet [14] to split feature maps and cross-connect them at different stages, which reduces computational complexity and improves feature learning. It includes the Focus module, which halves the width and height of the input image through a slicing operation while quadrupling the number of channels, extracting image detail features more effectively without increasing computational cost (a minimal sketch of this slicing operation follows the list below). It also introduces adaptive anchor box calculation, which automatically adjusts anchor box sizes and proportions based on the training dataset.

  2. (2)

    Feature Fusion Module (Neck): used to fuse feature maps of different scales. YOLOv5 adopts the PANet (path aggregation network) structure, which enhances information flow and feature utilization by adding paths between low-level and high-level feature maps. The neck contains not only top-down feature fusion but also a bottom-up path, so high-resolution detail information is preserved while deep semantic information is exploited. It also includes a spatial pyramid pooling (SPP) module, which pools features at different scales and makes the model more robust to the size of the input image.

  3. (3)

    Detect: locates targets and predicts their classes. YOLOv5 uses anchor boxes to predict bounding boxes; in each anchor box it predicts the target class and position offsets. Because many overlapping prediction boxes arise during detection, YOLOv5 applies the NMS algorithm to remove overlapping boxes and keeps only those most likely to contain a target.
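As a concrete illustration of the Focus-style slicing mentioned in item (1), the following is a minimal PyTorch sketch (an assumption-based illustration of the slicing used in early YOLOv5 releases, not the authors' training code); the hypothetical Focus class shows how a 640×640×3 input becomes a 320×320 map with four times as many channels before a convolution restores the desired channel count.

```python
# Minimal sketch of Focus-style slicing (assumption: early YOLOv5 design).
import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, in_ch=3, out_ch=64, k=3):
        super().__init__()
        # After slicing, channels grow 4x, so the conv sees 4*in_ch inputs.
        self.conv = nn.Conv2d(in_ch * 4, out_ch, k, stride=1, padding=k // 2)

    def forward(self, x):
        # Sample every second pixel in four phase-shifted patterns and stack
        # them on the channel axis: (B, C, H, W) -> (B, 4C, H/2, W/2).
        patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(patches, dim=1))

if __name__ == "__main__":
    img = torch.randn(1, 3, 640, 640)
    print(Focus()(img).shape)  # torch.Size([1, 64, 320, 320])
```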

Fig. 2 YOLOv5 algorithm structure diagram

3 Methodology

This article takes YOLOv5 as the baseline model, replaces the C3 module in the YOLOv5 backbone with the C2f module of YOLOv8, adds the SimAM attention mechanism to the head of the original YOLOv5, and introduces the Wise-IoU loss function. The improved model structure is shown in Fig. 3.

Fig. 3 YOLOv5 algorithm improvement structure diagram

3.1 C2f module concept and introduction effect

The C2f module is a feature fusion layer derived from the C3 module; it draws on the ideas of ELAN and aims to improve gradient flow while remaining lightweight. In the improved network the C2f module works together with two other key components, SPP (spatial pyramid pooling) and PAN (path aggregation network), and through their synergy more effective feature representations are introduced into the neural network. First, SPP builds a pyramid of pooling layers at different scales to extract features from targets of different sizes effectively, giving the network better perception and the ability to recognize objects of different sizes. Next, PAN addresses information fusion between feature maps of different scales; it adopts multiple cross-stage partial (CSP) modules to fuse information from shallow and deep feature maps, which enlarges the perception range of the network and improves detection accuracy.

During drone-based insulator defect detection, the irregular sizes of defects and the varying distance between the drone and the target mean that accuracy may fall short of expectations. Therefore, this improvement replaces the original C3 module of YOLOv5 with C2f. With the C2f module, the model can exploit efficient feature merging and channel partitioning to reduce computation and accelerate inference. In addition, the C2f design optimizes the extraction and integration of multi-scale features, improving the model's performance on small targets. The C2f structure diagram is shown in Fig. 4:

Fig. 4 C2f structure
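To make the structure in Fig. 4 concrete, the following is a minimal PyTorch sketch of a C2f-style block (an assumption based on the public YOLOv8 design, in which the input is split into two branches and every intermediate bottleneck output is concatenated before a final 1×1 convolution); it is an illustration, not the exact module used in this paper.

```python
# Minimal sketch of a C2f-style block (assumption: public YOLOv8 design).
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, ch, shortcut=True):
        super().__init__()
        self.cv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.cv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(torch.relu(self.cv1(x)))
        return x + y if self.add else y

class C2f(nn.Module):
    def __init__(self, in_ch, out_ch, n=2, shortcut=True):
        super().__init__()
        self.hidden = out_ch // 2
        self.cv1 = nn.Conv2d(in_ch, 2 * self.hidden, 1)         # split into two branches
        self.m = nn.ModuleList(Bottleneck(self.hidden, shortcut) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * self.hidden, out_ch, 1)  # fuse all branches

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # two halves of the channels
        for m in self.m:
            y.append(m(y[-1]))                  # each bottleneck feeds the next
        return self.cv2(torch.cat(y, dim=1))    # concatenate every intermediate output

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(C2f(64, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```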

3.2 Adding the Wise-IoU loss function

In insulator defect detection, the loss function plays a crucial role. Its main goal is to improve detection accuracy by minimizing the positional error between the predicted box and the bounding box of the actual insulator defect. By reducing the difference between predicted and ground-truth boxes, optimizing the loss function enables the model to identify and locate defects on the insulator more accurately. Because the training data inevitably contain low-quality examples, geometric measures such as distance and aspect ratio aggravate the penalty on such examples and reduce the generalization performance of the model. A good loss function should weaken the geometric penalty when the anchor box and the target box overlap well, so that less intervention during training gives the model better generalization ability. Therefore, the Wise-IoU v1 loss function is adopted in this study to balance the contributions of training examples of different quality and obtain more accurate detection results. The parameters of Wise-IoU are shown in Fig. 5:

Fig. 5 Wise-IoU parameters diagram

IoU loss is defined as:

$$\begin{aligned} {L}_{\textrm{IoU}}=1-\frac{\left| {B} \cap {B}^{\textrm{gt}}\right| }{\left| {B} \cup {B}^{\textrm{gt}}\right| } \end{aligned}$$
(1)

In the formula, B is the predicted bounding box and \({B}^{\textrm{gt}}\) is the ground-truth bounding box.

The distance loss formula for the Wise-IoU loss function is:

$$\begin{aligned} {R}_{\text{ Wise-IoU } }= & {} \exp \left[ \frac{\left( {x}-{x}_{\textrm{gt}}\right) ^2 +\left( {y}-{y}_{\textrm{gt}}\right) ^2}{\left( {W}_{\textrm{g}}^2 +{H}_{\textrm{g}}^2\right) ^{*}}\right] \end{aligned}$$
(2)
$$\begin{aligned} {L}_{\text{ Wise-IoU } \text{ v1 } }= & {} {R}_{\text{ Wise-IoU } } {L}_{\text{ Iou } } \end{aligned}$$
(3)

In the formula, as shown in Fig. 5, \({W}_{\textrm{g}}\) and \({H}_{\textrm{g}}\) are the width and height of the minimum enclosing box; \(x\) and \(y\) are the coordinates of the center point of the predicted box; \({x}_{\textrm{gt}}\) and \({y}_{\textrm{gt}}\) are the coordinates of the center point of the ground-truth box.

\({R}_{\text{ Wise-IoU } } \in [1, {e})\), which significantly amplifies the \({L}_{\text{ IoU } }\) of ordinary-quality anchor boxes. \({L}_{\text{ IoU } } \in [0,1]\), which significantly reduces the \({R}_{\text{ Wise-IoU } }\) of high-quality anchor boxes and reduces their focus on the center-point distance when the anchor box and the target box overlap well.

To prevent \({R}_{\text{ Wise-IoU } }\) from producing gradients that hinder convergence, \({W}_{{g}}\) and \({H}_{{g}}\) are detached from the computational graph (the superscript * denotes this operation), which effectively eliminates factors that obstruct convergence.
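The following is a minimal PyTorch sketch of Eqs. (1)-(3) for axis-aligned boxes in (x1, y1, x2, y2) format (a from-scratch illustration under the stated definitions, not the training code used in this study); detach() plays the role of the superscript * by removing \(W_{g}\) and \(H_{g}\) from the computational graph.

```python
# Minimal sketch of Wise-IoU v1 (assumption: illustrative re-implementation).
import torch

def wise_iou_v1(pred, target, eps=1e-7):
    # Intersection over union, Eq. (1): L_IoU = 1 - IoU.
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou

    # Center points and the smallest enclosing box size (W_g, H_g).
    px, py = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tx, ty = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Eq. (2): detach() removes W_g and H_g from the computational graph,
    # matching the superscript * in the text.
    r_wiou = torch.exp(((px - tx) ** 2 + (py - ty) ** 2) /
                       (wg ** 2 + hg ** 2 + eps).detach())
    return r_wiou * l_iou  # Eq. (3)

if __name__ == "__main__":
    p = torch.tensor([[10., 10., 50., 50.]])
    g = torch.tensor([[12., 12., 52., 52.]])
    print(wise_iou_v1(p, g))
```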

After the above improvements, the recognition ability and robustness of the YOLOv5n model are improved, further enhancing its feature extraction and fusion capabilities.

3.3 Adding SimAM attention mechanism

SimAM is a novel attention module inspired by how the human visual system and brain jointly process signals. For insulator defect detection, where backgrounds are often similar, research has found that introducing an attention mechanism in the head can effectively suppress irrelevant interference. Therefore, this article introduces the parameter-free attention module SimAM. The mechanism is realized through an energy function related to spatial suppression in visual neuroscience: in visual processing, neurons that show significant spatial suppression effects should be given higher priority (i.e., importance). The simplest way to find these neurons is to measure the linear separability between a target neuron and the other neurons. Based on these neuroscience findings, the following energy function is defined for each neuron:

$$\begin{aligned} {e}_{{t}}\left( {w}_{{t}}, {b}_{{t}}, {y}, {x}_{{i}}\right)= & {} \frac{1}{{M}-1} \sum _{{i}=1}^{{M}-1}\left( -1-\left( {w}_{{t}} {x}_{{i}}+{b}_{{t}}\right) \right) ^2\nonumber \\{} & {} +\left( 1-\left( {w}_{{t}} {t}+{b}_{{t}}\right) \right) ^2+\lambda {w}_{{t}}^2 \end{aligned}$$
(4)

The weight and deviation obtained by solving formula (4) are shown in formula (5) and formula (6):

$$\begin{aligned} {w}_{{t}}= & {} -\frac{2\left( {t}-{u}_{{t}}\right) }{\left( {t}-{u}_{{t}}\right) ^2+2 \sigma _{{t}}^2+2 \lambda } \end{aligned}$$
(5)
$$\begin{aligned} {b}_{{t}}= & {} -\frac{1}{2}\left( {t}+{u}_{{t}}\right) {w}_{{t}} \end{aligned}$$
(6)

Among them, \({u}_{{t}}\) and \(\sigma _{{t}}^2\) denote the mean and variance computed over the other neurons in the same channel. The SimAM attention mechanism structure diagram is shown in Fig. 6:

Fig. 6 SimAM attention mechanism structure

The energy function indicates that the lower the energy, the more a neuron differs from its surrounding neurons and the more important it is. Based on this idea, the output formula of SimAM (the similarity-based attention module) can be derived. Here E groups the minimal energies over all spatial and channel positions, and A is the input feature map. A sigmoid function is applied to restrict overly large values of the attention weights 1/E, and the result is multiplied element-wise with the input feature map to yield the final output:

$$\begin{aligned} \widehat{\text {A}}={\text {Sigmoid}}\left( \frac{1}{\text {E}}\right) \odot \text {A} \end{aligned}$$
(7)

This experiment places SimAM in the head section to improve the model’s attention to important features in the image, thereby enhancing the performance of object detection. SimAM adaptively adjusts the information in the feature map to enable the model to focus more on useful features for the current insulator defect detection task.
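As an illustration, the following is a minimal PyTorch sketch of a parameter-free SimAM layer following the closed-form solution summarized in Eqs. (4)-(7) (the lambda_ value of 1e-4 is an assumption taken as a common default); such a layer would be placed before each detection head as described above.

```python
# Minimal sketch of a parameter-free SimAM layer (assumption: closed-form
# solution of the SimAM energy function; lambda_ default is assumed).
import torch
import torch.nn as nn

class SimAM(nn.Module):
    def __init__(self, lambda_=1e-4):
        super().__init__()
        self.lambda_ = lambda_

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1                                    # number of "other" neurons per channel
        d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2  # squared deviation from the channel mean
        v = d.sum(dim=(2, 3), keepdim=True) / n          # channel variance (sigma_t^2)
        e_inv = d / (4 * (v + self.lambda_)) + 0.5       # 1/E: inverse minimal energy, Eqs. (4)-(6)
        return x * torch.sigmoid(e_inv)                  # Eq. (7): reweight the input features

if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)
    print(SimAM()(feat).shape)  # torch.Size([1, 256, 20, 20])
```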

4 Experiments and results

4.1 Experimental environment construction

All experiments in this article were carried out in the same experimental environment, consisting of hardware and software configurations. The specific parameters are shown in Tables 1 and 2.

Table 1 Configuration of experimental hardware environment
Table 2 Experimental software environment configuration

4.2 Dataset collection and image enhancement

The experiment used the public Chinese Power Line Insulator Dataset (CPLID), with a total of 848 images and an input image size of \(640 \times 640\). Because the variety and quantity of insulator defect data are limited, and in order to improve the accuracy of model training and prevent overfitting, this study augmented the dataset with the following operations:

Flip (rotation): rotate each insulator image 45° clockwise and 45° counterclockwise, a range of 90° in total.

Brightness: use Roboflow to increase and decrease the brightness of each image by 30%.

Saturation: increase and decrease the saturation of each dataset image by 30%, a range of 60% in total.

After the above augmentation operations, Roboflow was used to export the dataset in YOLO format and to re-annotate the insulator and defect locations automatically. The final dataset was expanded to 5581 images in total. Before model training, the dataset was divided into 4906 training images, 506 test images, and 169 validation images.
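For readers who want to reproduce the augmentation offline, the following is a minimal torchvision sketch that approximates the operations above (±45° rotation, ±30% brightness and saturation). It is an assumption-based illustration: the paper itself performed these steps and the re-annotation in Roboflow, and the file path below is hypothetical.

```python
# Minimal sketch of equivalent offline augmentation (assumption: torchvision
# operations approximate the Roboflow settings described above).
from PIL import Image
import torchvision.transforms.functional as TF

def augment(img: Image.Image):
    """Return the augmented variants generated from one insulator image."""
    out = []
    for angle in (45, -45):              # rotate +-45 degrees
        out.append(TF.rotate(img, angle))
    for factor in (1.3, 0.7):            # brightness +-30%
        out.append(TF.adjust_brightness(img, factor))
    for factor in (1.3, 0.7):            # saturation +-30%
        out.append(TF.adjust_saturation(img, factor))
    return out

if __name__ == "__main__":
    variants = augment(Image.open("insulator.jpg"))  # hypothetical sample image
    print(len(variants))  # 6 augmented images per original
```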

To carry out comparative experiments showing that all of the proposed improvements are feasible, a custom dataset is also used: the ins defect dataset hosted on Roboflow. It contains 2089 images collected under different viewing angles, illumination, scales, backgrounds, and insulator types, including real-world occlusion and scale variation. For model training, the dataset was divided into 1497 training images and 395 validation images.

4.3 Comparative analysis of experimental results

4.3.1 C2f comparison experiment and effect analysis

A comparative experiment was conducted on the YOLOv5n model to verify the performance of the improved algorithm. Four groups of results were obtained, reporting accuracy, recall, mAP50, and mAP50-95 before and after the improvement.

The first group is the original YOLOv5n model; group 2 replaces the C3 module with the C2f module on the basis of group 1. Groups 1 and 2 were run on the public dataset, while groups 3 and 4 are the corresponding comparison on the custom Roboflow dataset. The P–R curves of the comparison experiments are shown in Fig. 7, and the results of the C2f comparison experiment are listed in Table 3.

Fig. 7 Precision–Recall curve

As can be seen from Table 3, after the C2f module replaces the C3 module, the recall of the improved model on the public dataset increases by 0.4%, mAP50 by 0.3%, and mAP50-95 by 2.5%; on the custom dataset, accuracy improves by 0.9% and mAP50-95 by 1.5%. These results show that replacing C3 with C2f in the original YOLOv5n model improves performance, verifying the feasibility of the improvement and its effectiveness for insulator defect detection.

Table 3 Results of C2f comparison experiment

4.3.2 Loss function comparison experiment and experimental results

To comprehensively evaluate the performance of the improved model, comparative experiments with different loss functions were designed on the public dataset.

To investigate the impact of the Wise-IoU loss function on the performance of the YOLOv5n model, this paper embeds five common loss functions into the YOLOv5n network. YOLOv5n uses the CIoU loss function by default, so the five loss functions listed in Table 4 form a comparison between the model before and after the improvement.

Table 4 Comparative experimental results of loss function
Fig. 8 Improved model results

For the detection of insulator defect images, different loss functions were added to the YOLOv5n model under the same hardware environment. As Table 4 shows, after introducing the Wise-IoU loss function, the mAP50 of the model reached a maximum of 98.9%, the best result among the loss functions compared here, which verifies the choice of Wise-IoU as the loss function. The results of the YOLOv5n model with the Wise-IoU loss function are shown in Fig. 8.

4.3.3 Comparative experiment and result analysis of attention mechanism

Of the thirteen attention mechanisms commonly used in improved detection models, this study selected seven for comparison with the SimAM attention mechanism to verify the effectiveness of this improvement. The comparative results are shown in Table 5.

Table 5 Experimental results of attention mechanism comparison

After embedding different attention mechanisms such as GAM, SK, and SGE into the final model and comparing them on the public dataset used in this article, it can be concluded that although the SimAM attention mechanism does not obtain the highest score in every comparison, its recall, mAP50, and mAP50-95 are balanced and above average, with stable performance. The other attention mechanisms each perform best in one respect but vary too much overall, which is inconsistent with the intent of this improvement. Therefore, the SimAM attention mechanism remains the best choice in terms of overall performance.

Table 6 Ablation results 1
Fig. 9 Insulator defect detection results

4.3.4 Ablation experiment

To verify the effectiveness of the improvements in this paper, namely replacing the C3 module of YOLOv5 with the C2f module, adding SimAM attention to the head, and adopting the Wise-IoU loss function, ablation experiments are carried out to evaluate the influence of each improved module on the detection algorithm. The ablation experiments take the original YOLOv5n results as the baseline; performance on the public dataset is shown in Table 6, and performance on the custom dataset is shown in Table 7.

Table 7 Ablation results 2

As can be seen from Tables 3 and 7, the original YOLOv5n obtains 94.9% precision, 91.1% recall, 94.2% mAP50, and 68.6% mAP50-95 on the ins defect dataset. After the three improved modules are added one by one, the detection indicators basically all improve, showing that each module contributes to insulator defect detection on transmission lines. This also validates the rationale behind this work: enhancing the gradient information flow of the model, reducing the influence of noisy features, reducing the competitiveness of high-quality anchors, and reducing the harmful gradients generated by low-quality samples. On both the public dataset and the custom dataset, introducing the three innovations significantly improves the accuracy and mAP50-95 of the YOLOv5n model, which demonstrates the feasibility and efficiency of the improved model and allows insulator defects to be detected with higher accuracy. The detection results are shown in Fig. 9.

Table 8 Comparison of results of different detection algorithms
Fig. 10 P–R curve before and after improvement

First, after replacing the C3 module of YOLOv5n with C2f, the accuracy on the public dataset improves from 97.1% to 97.7%, and the recall improves from 98.0% to 98.4%. The mAP50 increases from 98.7% to 99%, and the mAP50-95 value increases by 0.2%, which shows that the improved model helps to improve object detection accuracy in low-light environments. Second, introducing the SimAM attention mechanism increases the mAP50-95 value by 0.6%, which shows that SimAM strengthens the model's attention to target features through a simplified calculation process without significantly increasing the computational burden. After introducing the Wise-IoU loss function, mAP increases by 0.2% and mAP50-95 by 0.6%, which shows that the improved model optimizes bounding box regression through its dynamic focusing mechanism and thus improves detection accuracy. Finally, when the model integrates all three improvements, the accuracy rises to 97.7% and the mAP50-95 to 78.7%. These data clearly verify the feasibility of the improvements for the insulator defect detection task.

4.3.5 Comparison experiments of different detection algorithms

The improved lightweight detection model based on YOLOv5n is compared with other mainstream detection algorithms and existing improved lightweight YOLOv5 algorithms. The data reported in this study are the results of 300 training epochs; the specific results are shown in Table 8, and the P–R curves before and after the improvement are shown in Fig. 10.

5 Conclusion

This study is based on the original YOLOv5n algorithm. The C3 module is replaced with the C2f module of YOLOv8, the SimAM attention mechanism is introduced, and the loss function is changed to Wise-IoU to improve the detection of insulator defects on transmission lines. Compared with the original C3 structure, the C2f module borrowed from YOLOv8 is more lightweight, and the improvement and comparison experiments on the three parts above show that the improved algorithm significantly increases accuracy and recall for defect detection on both the public dataset and the custom dataset. This demonstrates the feasibility of the improvements and shows that the model can be applied to similarly complex and variable image detection environments.