1 Introduction

As the economy develops, the scale of the power sector is expanding, providing stable support for national livelihoods. However, this expansion also presents new challenges for power system management. As the amount of power equipment in the system grows, the shortcomings of traditional manual inspection have become increasingly apparent: low inspection speed, numerous blind spots, low efficiency, safety risks, and delayed information processing [1]. Technological advancements have offered new inspection methods, with intelligent unmanned aerial vehicles (UAVs) being one of them. UAVs can observe power equipment at close range, enabling comprehensive coverage of all equipment and the safe and efficient completion of inspection tasks. They can collect and feed back relevant data through functions such as infrared temperature measurement, image acquisition, and wireless communication, thereby supporting informatization management. Liu et al. [2] proposed using intelligent hangars as connecting points to achieve fully automated UAV inspections of steel pylons. Simulation showed that this method can effectively address existing inspection problems. Chang et al. [3] developed a UAV trajectory planning algorithm that autonomously optimizes flight paths during power grid inspections while tracking flight points at ground calibration positions. Nusantika et al. [4] used UAVs to detect ice cover on overhead power lines, designed a Canny method improved by hybrid technology, and validated its accuracy through simulation experiments. Using images of key fittings on power transmission lines that were captured during drone inspection and survey and then uploaded, Zai et al. [5] detected and identified hidden dangers in complex backgrounds by analyzing the processed images in the cloud. Experimental results demonstrated that this approach was both precise and fast.
This paper collected insulator images during power equipment inspections using intelligent UAVs. Then, an insulator detection method based on the You Only Look Once version 5 (YOLOv5) algorithm was investigated to realize the intelligent management of power equipment. An improved YOLOv5 model was designed, and its effectiveness was verified through experimental analysis. This article provides a new and reliable method for applying intelligent UAVs to power equipment inspection.

2 Intelligent UAV-based power equipment inspection

2.1 Intelligent UAV inspection

Because power equipment is diverse and complex, this paper focuses solely on insulators. Insulators play a vital role in ensuring the safety of the power system, and their operating conditions directly impact system reliability. Insulators are susceptible to failures, one of which is insulator self-explosion, that is, the rupture of an insulator during operation. If not promptly addressed, it can significantly impact the operation of the entire system. Traditional manual inspection of insulators involves working at height on live equipment, which is inefficient and poses significant risks. Such methods are only suitable for short-term, high-intensity inspections. Therefore, intelligent UAV inspection can be a viable alternative.

This paper used a DJI Phantom 4 Pro UAV for inspection of power equipment (Fig. 1). Insulator images were captured under various conditions, including sunny and cloudy days, to construct the dataset. All images were in JPG format and consisted of samples of ordinary and self-exploding insulators. Because self-exploding insulator samples were limited in number, data were collected from multiple inspection routes. In total, 5000 insulator images were gathered, with 3000 representing ordinary insulators and 2000 representing self-exploding insulators (Fig. 1).

Fig. 1 The UAV inspection process (left) and the self-exploding insulator photographed by a UAV (right)

2.2 Insulator recognition model based on the improved YOLOv5

2.2.1 YOLOv5 model

Intelligent UAV inspection of power equipment yields a large number of equipment images, and processing and analyzing them enables information-based management of the inspection. Determining from these images whether the equipment is faulty can be treated as a target detection task. Traditional target detection methods are mostly based on image processing techniques, which require image enhancement, contour extraction, and image segmentation to eliminate the background and obtain the targets. However, these methods perform poorly in complex backgrounds with multiple targets. Deep learning-based object detection extracts and classifies objects in images automatically, which greatly improves detection accuracy. At present, commonly used methods include the region-based convolutional neural network (R-CNN) series and the YOLO series [6]. The YOLO series, which takes an end-to-end approach, offers higher computational speed, and deployment on UAVs places high demands on the detection speed of the algorithm. Compared to previous versions of YOLO, YOLOv5 achieves a better balance between detection speed and accuracy, especially in detecting small targets, and it is among the most widely used algorithms in object detection. Therefore, this paper studied the YOLOv5 model among the YOLO series [7].

YOLOv5 comes in four variants: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. YOLOv5s is the smallest and fastest of these, while the other three are wider and deeper versions of YOLOv5s. Therefore, this paper selects YOLOv5s as the base model. The structure of YOLOv5s is presented in Fig. 2.

According to Fig. 2, YOLOv5 consists of four main components, described as follows:

  1. Input: Mosaic data augmentation is performed on the input data to enhance the diversity of the dataset.

  2. Backbone: It is responsible for extracting features from the input image. It includes several key elements: ① standard convolutional layer CBS: it consists of a convolutional module (Conv), a batch normalization module (BN), and the SiLU activation function; ② feature extraction layer C3: it comprises three convolutional layers; ③ pyramid pooling module, spatial pyramid pooling-fast (SPPF): max pooling is used to produce feature maps at different scales, which are concatenated and then fused by a convolutional layer.

  3. Neck: It is a feature fusion module that adopts a path aggregation network-feature pyramid network (PAN-FPN) structure. The PAN component passes localization information along a bottom-up path by downsampling, and the FPN component fuses the deep and shallow features by upsampling. The interaction between PAN and FPN enhances feature fusion.

  4. Head: It is the target detection module. Each prediction head generates an output at a different scale. The bounding-box regression loss function used is complete intersection over union (CIoU). The final output is the probability of each target category, computed with the Sigmoid function.
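The pooling-and-concatenation step of the SPPF module described above can be illustrated with a minimal NumPy sketch. It assumes three successive 5 × 5 max-pooling operations with stride 1 and same-size padding, which is the arrangement commonly used for SPPF in YOLOv5 but is not stated in this paper; the 1 × 1 convolutions before and after the pooling are omitted, and the function names `max_pool_same` and `sppf_concat` are hypothetical.

```python
import numpy as np

def max_pool_same(x, k=5):
    """k x k max pooling, stride 1, on a float array of shape (C, H, W).

    The borders are padded with -inf so the output keeps the input size.
    """
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)),
                    constant_values=-np.inf)
    _, height, width = x.shape
    out = np.empty_like(x)
    for i in range(height):
        for j in range(width):
            # Maximum over the k x k window centred on (i, j), per channel.
            out[:, i, j] = padded[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def sppf_concat(x, k=5):
    """Pool three times in sequence and concatenate along the channel axis."""
    y1 = max_pool_same(x, k)   # receptive field k
    y2 = max_pool_same(y1, k)  # larger receptive field
    y3 = max_pool_same(y2, k)  # largest receptive field
    return np.concatenate([x, y1, y2, y3], axis=0)

# Usage: a 2-channel 4 x 4 feature map becomes an 8-channel map of the same size.
features = np.arange(32, dtype=float).reshape(2, 4, 4)
print(sppf_concat(features).shape)
```

Chaining small pooling windows in this way covers progressively larger regions of the feature map, which is how one module can combine information at several scales before the fusing convolution.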

Fig. 2 The structure of the YOLOv5 model

2.2.2 Improved YOLOv5 model

To enhance the accuracy and speed of YOLOv5 in power equipment detection, this paper designs an improved YOLOv5 model, namely YOLOv5s + convolutional block attention module (CBAM) + efficient intersection over union (EIoU). First, to enhance the accuracy of insulator detection, the CBAM attention module [8] is added. The CBAM is a commonly used lightweight attention module, which can effectively improve model performance when added to different models and has been applied in many scenarios [9]. Introducing the CBAM module in YOLOv5 allows the model to pay more attention to important information related to insulator detection and suppress irrelevant features, thereby improving detection accuracy. The CBAM module contains the following two elements:

  1. Channel attention module (CAM)

The CAM applies average pooling and max pooling to the feature map \(F\), feeds both results into a shared multi-layer perceptron (MLP), sums the two outputs, and applies the Sigmoid activation function \(\sigma\) to obtain the channel attention map:

$${M}_{c}\left(F\right)=\sigma \left[\text{MLP}\left(\text{Avgpool}\left(F\right)\right)+\text{MLP}\left(\text{Maxpool}\left(F\right)\right)\right].$$
(1)

  2. Spatial attention module (SAM)

The SAM applies average pooling and max pooling again, this time to the feature map obtained from the CAM, combines the two pooled maps, convolves them, and applies the Sigmoid activation function to obtain the spatial attention weights used to produce the attention-weighted feature map:

$${M}_{s}\left(F\right)=\sigma \left[{f}^{7\times 7}\left(\text{Avgpool}\left(F\right);\text{Maxpool}\left(F\right)\right)\right],$$
(2)

where \(\sigma\) is the Sigmoid function and \({f}^{7\times 7}\) is a convolution operation with a filter size of \(7\times 7\).
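Equations (1) and (2) can be traced with a small NumPy sketch. This is an illustration under stated assumptions rather than the implementation used in this paper: the MLP is taken to be a shared two-layer perceptron with a ReLU hidden layer (as in the original CBAM design), pooling in the SAM is taken along the channel axis, the weights are random placeholders, and all function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Eq. (1). feat: (C, H, W); w1: (C_hidden, C); w2: (C, C_hidden)."""
    avg = feat.mean(axis=(1, 2))  # Avgpool(F), one value per channel
    mx = feat.max(axis=(1, 2))    # Maxpool(F), one value per channel

    def mlp(v):
        # Shared MLP; the ReLU hidden layer is an assumption of this sketch.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))  # shape (C,)

def spatial_attention(feat, kernel):
    """Eq. (2). feat: (C, H, W); kernel: (2, 7, 7)."""
    # Pool along the channel axis and stack the two maps: (2, H, W).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, height, width = feat.shape
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            # f^{7x7}: one output value per spatial position.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # shape (H, W)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, kernel)[None, :, :]

# Usage with random placeholder weights (8 channels, reduction to 2).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
out = cbam(feat,
           rng.standard_normal((2, 8)),
           rng.standard_normal((8, 2)),
           rng.standard_normal((2, 7, 7)))
print(out.shape)
```

Both attention maps take values between 0 and 1, so the module rescales the features and leaves their shape unchanged. This is why it can be inserted into an existing network such as YOLOv5s without altering the surrounding layers.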

As for the bounding-box regression loss function, CIoU converges slowly and does not describe the regression objective effectively. Building on CIoU, EIoU improves convergence speed and optimizes object detection by directly penalizing the differences in width and height between the predicted box and the ground-truth box [10]. Therefore, in this paper, EIoU is used instead of CIoU, and the loss is written as:

$${L}_{\text{EIoU}}=\left(1-\text{IoU}\right)+\frac{{\rho }^{2}\left({b}^{pr},{b}^{gt}\right)}{{c}^{2}}+\left[\frac{{\rho }^{2}\left({w}^{pr},{w}^{gt}\right)}{{\left({w}^{c}\right)}^{2}}+\frac{{\rho }^{2}\left({h}^{pr},{h}^{gt}\right)}{{\left({h}^{c}\right)}^{2}}\right],$$
(3)

where the superscripts \(pr\) and \(gt\) denote the predicted box and the ground-truth box, \(\text{IoU}\) is the intersection over union of the two boxes, \(\rho \left({b}^{pr},{b}^{gt}\right)\) is the distance between the centers of the two boxes, \(c\) is the diagonal length of the smallest rectangle enclosing both boxes, \(\rho \left({w}^{pr},{w}^{gt}\right)\) is the difference between the two box widths, \(\rho \left({h}^{pr},{h}^{gt}\right)\) is the difference between the two box heights, and \({w}^{c}\) and \({h}^{c}\) are the width and height of the smallest enclosing rectangle.
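As a worked illustration of Eq. (3), the following pure-Python sketch evaluates the EIoU loss for a single pair of axis-aligned boxes. The corner format `(x1, y1, x2, y2)`, the function name `eiou_loss`, and the small constant `eps` that guards against division by zero are assumptions of this sketch, not details given in the paper.

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU loss of Eq. (3) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Smallest enclosing rectangle: width w^c, height h^c, squared diagonal c^2.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    center_term = rho2 / (c2 + eps)
    width_term = (pw - gw) ** 2 / (cw ** 2 + eps)
    height_term = (ph - gh) ** 2 / (ch ** 2 + eps)
    return (1.0 - iou) + center_term + width_term + height_term

# Usage: two 2 x 2 boxes offset by one unit in each direction.
# IoU = 1/7, center term = 2/18, width and height terms = 0.
print(round(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.9683
```

In this example the boxes have the same size, so only the overlap and center-distance terms contribute. When the sizes differ, the last two terms penalize the width and height mismatch separately, which is the difference from CIoU noted above.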

3 Results and analysis

3.1 Experimental environment and data pre-processing

The experiment was conducted in a Windows 11 environment. The YOLOv5 model was implemented using PyTorch 1.7.1 and accelerated using CUDA 11.1. The parameters of the YOLOv5 model are presented in Table 1.

Table 1 The parameter setting of the YOLOv5 model

The images were uniformly scaled to a resolution of 1200 × 800 pixels using the Python Imaging Library (PIL). Moreover, normalization and denoising were performed. The captured images were manually labeled using the LabelImg tool [11]. The annotations were then converted with Python to the TXT format used by YOLOv5. The data were divided into training, validation, and test sets in a ratio of 8:1:1. The performance of the algorithm was evaluated based on precision (P), recall rate (R), mean average precision (mAP), and frames per second (FPS).
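The label conversion, the 8:1:1 split, and the P and R metrics can be sketched in plain Python as follows. The sketch assumes that each annotation provides pixel corner coordinates `(xmin, ymin, xmax, ymax)` and that the split is drawn by a seeded random shuffle; neither detail is specified in this paper. The class ids (0 for ordinary, 1 for self-exploding) and all function names are hypothetical.

```python
import random

def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to one YOLO TXT line.

    YOLO labels store: class x_center y_center width height,
    with the four box values normalized by the image size.
    """
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle and split items into training, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Usage on a 1200 x 800 image and a 5000-image dataset.
print(to_yolo_line(1, (300, 200, 900, 600), 1200, 800))
train, val, test = split_dataset(range(5000))
print(len(train), len(val), len(test))  # 4000 500 500
```

With 5000 images, the 8:1:1 ratio gives 4000 training, 500 validation, and 500 test images.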

3.2 Analysis of results

First, the different YOLOv5 variants were compared to justify the choice of YOLOv5s. Table 2 presents the model size, mAP, and FPS of the variants under the same parameter settings.

Table 2 Comparison of different YOLOv5 models

According to Table 2, the size of the YOLOv5 models decreased progressively from YOLOv5x to YOLOv5s. The YOLOv5s model was the smallest, at 27.12 MB, which was approximately 32% of the size of YOLOv5m and only about 8% of the size of YOLOv5x. All the models maintained a mAP of over 90%. The YOLOv5x model achieved the highest mAP (96.34%), followed by the YOLOv5l and YOLOv5m models. The YOLOv5s model had the lowest mAP, 92.37%, which was 3.75% lower than that of the YOLOv5m model. Regarding detection speed, the YOLOv5x model reached only 19.26 FPS despite having the largest size and the highest accuracy. In contrast, the YOLOv5s model achieved a detection speed of 121.33 FPS, roughly 6.3 times that of the YOLOv5x model.

In summary, the YOLOv5x, YOLOv5l, and YOLOv5m models achieved their higher accuracy at the expense of detection speed, and their large size also makes them unsuitable for power equipment inspection. The YOLOv5s model, by contrast, had a small size, high detection speed, and moderate detection accuracy, and its accuracy can be improved by adjusting the model. Therefore, the YOLOv5s model is suitable as the base model. The detection results are presented in Fig. 3.

Fig. 3 An example of the detected insulator

Then, the performance of the optimized YOLOv5 model was analyzed (Table 3).

Table 3 The performance analysis of the improved YOLOv5 model

According to Table 3, after CBAM was added to the YOLOv5 model, the P value for insulator detection increased by 3.1% compared to the YOLOv5 model, reaching 95.26%. The R value increased by 2.54% to 87.11%, and the mAP increased by 1.11 percentage points, from 92.37% to 93.48%. The FPS also improved by 6.62%, from 121.33 to 129.36. These results indicated that introducing CBAM effectively enhanced the model's ability to learn insulator features, improving detection accuracy. Furthermore, when EIoU was used to improve the YOLOv5 model further, both the P and R values of the model showed additional increases. The mAP reached 93.81%, which was 0.33 percentage points higher than that of the YOLOv5 + CBAM model. The FPS also improved significantly, rising from 129.36 to 145.64, an increase of 12.59% over the YOLOv5 + CBAM model. These outcomes demonstrated that replacing the original CIoU with EIoU not only further improved detection accuracy but also effectively enhanced the detection speed of the algorithm, thereby achieving better performance in the informatization management of power equipment inspection.
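The gains quoted for mAP and FPS are of two kinds: the mAP differences are absolute (percentage points), whereas the FPS differences are relative to the earlier model. A short sketch makes the arithmetic explicit, using the YOLOv5s and YOLOv5s + CBAM values quoted in the text; the function names are hypothetical.

```python
def absolute_gain(old, new):
    """Difference between two values, e.g. mAP in percentage points."""
    return new - old

def relative_gain(old, new):
    """Change relative to the old value, expressed in percent."""
    return (new - old) / old * 100.0

# mAP: 92.37% (YOLOv5s) -> 93.48% (YOLOv5s + CBAM)
print(f"{absolute_gain(92.37, 93.48):.2f}")    # 1.11

# FPS: 121.33 (YOLOv5s) -> 129.36 (YOLOv5s + CBAM)
print(f"{relative_gain(121.33, 129.36):.2f}")  # 6.62
```

The same two functions apply to any other pair of values reported in Tables 2 to 4.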

The optimized YOLOv5 model was compared with the other target detection methods (Table 4), including:

  1. the Faster R-CNN algorithm [12],

  2. the single shot multibox detector (SSD) algorithm [13],

  3. the YOLOv3 model [14].

Table 4 Comparison with other target detection algorithms

According to Table 4, the Faster R-CNN algorithm exhibited a low P value in insulator detection, indicating many false detections. It achieved a final mAP of 86.03% and an FPS of 78.36. The SSD algorithm displayed a lower R value in insulator detection, indicating many missed detections. It achieved a final mAP of 84.33% and an FPS of 88.57. These results suggested that both methods performed poorly in insulator detection. In contrast, the YOLOv3 model achieved a mAP of 90.12%, which was higher than that of the Faster R-CNN algorithm, and its FPS also improved to 118.94. However, the improved YOLOv5 model had higher precision and speed in insulator detection. It achieved a mAP of 93.81%, 3.69 percentage points higher than that of the YOLOv3 model, and an FPS of 145.64, a 22.45% increase over the 118.94 FPS of the YOLOv3 model. These results demonstrated that the improved YOLOv5 model was reliable for insulator detection, with good accuracy and real-time performance. It could effectively meet the informatization management needs of power equipment inspection by intelligent UAVs.

4 Conclusion

This paper focuses on the informatization management of power equipment inspection by UAVs. An improved YOLOv5 model was designed to detect ordinary and self-exploding insulators in images collected by UAVs. Experiments showed that the detection precision and speed of the YOLOv5 model improved after CBAM and EIoU were incorporated. Compared with the other target detection methods tested, the improved YOLOv5 model had the highest mAP (93.81%) and the highest detection speed (145.64 FPS). These findings validate the effectiveness of the proposed improvements to the YOLOv5 model. The improved YOLOv5 model can be applied in real-world power equipment inspection scenarios.