1 Introduction

At present, most high-voltage equipment in substations, such as transformers, voltage/current transformers and capacitors, uses insulating oil as the insulating material, which provides insulation, cooling and arc extinguishing. If oil-filled equipment leaks oil, the safe and stable operation of the power grid is affected and the service life of the equipment is reduced. It is therefore necessary to study a method for detecting oil leakage from substation equipment, so that leaks are detected in time and the stability of power grid operation is improved.

Substations are important nodes for stable and continuous power transmission. Traditional substation inspections rely on on-site inspections by professionals, resulting in high inspection costs and low inspection efficiency. In addition, the inspection process involves many unsafe factors that threaten the personal safety of inspectors. With the use of monitoring and inspection robots in substation inspections, the pressure of manual inspections has been greatly reduced, but such inspections rely on artificial intelligence recognition [1]. Combining deep learning technology with substation inspection can greatly improve detection efficiency [2]. Figure 1 shows oil leakage in several typical substation scenarios. The following problems can be summarized from the figure: (1) Many types of equipment leak oil, including transformers, voltage/current transformers, capacitors and other equipment. (2) Observable oil leakage appears not only on the surface of the equipment but also on the ground, with various shapes and degrees of transparency. (3) The leaking oil is transparent, similar in color to the shadows of the equipment, and has no fixed shape.

Therefore, this paper proposes an oil leakage detection technology for substation equipment based on fusion SLIC, which can highlight the location of oil leakage, reduce the influence of complex background, and improve the detection accuracy.

Fig. 1. Schematic diagram of oil leakage from substation equipment. In the picture, the background of the oil leakage part is complicated and the color is darker

At present, there is little research on detecting oil leakage from substation equipment. The traditional method relies on inspectors shining flashlights on leak-prone points such as casings and welds and inspecting them visually through reflection, but this method has limitations when inspecting warehouses and elevated equipment. Alternatively, the oil level gauge is observed regularly, but the timeliness of this approach is low.

Dong Baoguo [3] detected and segmented abnormal areas with a difference method based on the color of leaking oil, and compared the color characteristics of the abnormal areas in two images to determine whether oil had leaked. However, this method relies on earlier pictures of the same parts taken before they leaked. Wang Yan [4] combined the difference method with OTSU-based segmentation of the monitored target image to detect the oil leakage area, compared the images of the area before and after oil leakage, and judged the oil leakage area using the HS color histogram method. This method also relies on images taken before and after oil leakage, which limits it. To improve the detection rate of oil spill targets and reduce the influence of shadows and lighting on the detection model, Huang Wenli et al. [5] proposed an attention segmentation network based on edge fusion, which makes full use of the spatial background information of oil spill shapes and introduces a self-attention mechanism to improve the oil spill detection rate. Yang Minchen and Zhang Yan et al. [6] exploited the fluorescence of leaking oil by irradiating the leakage position with an ultraviolet flashlight. In a dark environment the leakage position appears purple and prominent, but this method only works in the dark and is limited in daytime sunlight. Wu et al. [2] studied a detection method based on visible-image information of oil leakage, trained a lightweight Mobilenet-SSD deep network model on oil leakage pictures, and deployed it on edge equipment to achieve intelligent positioning and detection of oil leakage. This method is highly practical. Although machine learning is highly capable of learning image features, it is limited by challenges such as the complex background and indistinct features of oil seeps.

Image segmentation [7] provides an effective way to handle images with complex background interference and is one of the key technologies in the computer vision (CV) field, especially color image segmentation [8], which can extract interesting or meaningful pixel sets and features from images [9]. The watershed segmentation algorithm [10], based on a similarity criterion, uses morphology and topological theory to traverse pixel sets and submerge pixels according to a threshold value. If the threshold is exceeded, a boundary is formed to classify the neighborhood pixels. This algorithm is susceptible to noise. In recent years, with the rapid development of artificial intelligence, image segmentation based on graph theory has also attracted widespread attention [11]. Image segmentation based on graph theory continuously optimizes the weights of the pixel edge set after segmentation to achieve a minimum cut. GrabCut is a typical segmentation method based on graph theory [12,13,14]. Users input a bounding box as the segmentation target location to separate the target from a complex background. However, this method suffers from high time complexity and poor results when target and background are similar. The Simple Linear Iterative Clustering (SLIC) algorithm has advantages in generating sub-images with good boundary adherence [15, 16]. SLIC is a superpixel algorithm based on K-means clustering, with the advantages of low time complexity and better edge fitting [17]. In addition, density-based spatial clustering of applications with noise (DBSCAN) [18, 19] performs well in grouping sub-images belonging to the same cluster.

Also, Vaswani et al. [20] first proposed the Transformer, establishing a new encoder-decoder architecture based on the multi-head self-attention mechanism and feedforward neural networks. Then, Dosovitskiy et al. proposed the Vision Transformer (ViT) [21], a pure Transformer that achieves superior performance in image classification when applied directly to sequences of image patches. Additionally, the training process can be greatly simplified thanks to the unique advantages of the deep learning method [22,23,24,25,26].

In this paper, oil leakage detection for oil-filled equipment in substation scenes is realized through research on superpixel segmentation and oil leakage detection. First, SLIC is used to perform superpixel segmentation on the oil leakage image. Then, DBSCAN is used to cluster the segmentation results to highlight the oil leakage area. Next, the image is recognized by ViT, and good recognition results are obtained. Finally, the effectiveness and feasibility of the proposed method are verified by experiments in substation scenarios. The flow of the oil leakage identification method is shown in Fig. 2.

Fig. 2. Flowchart of the oil leakage detection

2 Oil Leakage Detection Based on Fusion SLIC and Transformer

2.1 Superpixel Segmentation Based on SLIC

The SLIC algorithm divides the image into superpixels; each region initially has the same size, denoted S. The geometric center of each region is taken as the center of the superpixel, and the coordinates of the center are updated at each iteration. Pixels are assigned to superpixels according to two measures, the spatial distance \(d_s\) and the intensity distance \(d_c\):

$$ d_s = \sqrt {\left( {x_j - x_i } \right)^2 + \left( {y_j - y_i } \right)^2 } $$
(1)
$$ d_c = \sqrt {\left( {I_j - I_i } \right)^2 } $$
(2)

In the above formula, \(\left( {x,y} \right)\) represents the position of each pixel, and \(\left( {I_j ,I_i } \right)\) represents the normalized pixel intensity.

The total distance \(D\) combines the two measures \(d_s\) and \(d_c\) and is calculated as follows:

$$ D = \sqrt {d_c^2 + \left( {\frac{d_s }{S}} \right)^2 m^2 } $$
(3)

In the above formula, m represents the compactness coefficient. The larger m is, the more compact the generated superpixel regions; conversely, the smaller m is, the more closely the superpixels fit the contours in the image, but their size and shape become irregular. Figure 3 shows the results of SLIC superpixel segmentation on the oil leakage data.

Fig. 3. Superpixel segmentation results based on SLIC. (a) Pictures of oil leakage from equipment in substations; (b) Superpixel segmentation of images; (c) Local magnification of oil leakage
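The effect of the compactness coefficient m on the combined distance D can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `slic_distance` and the sample pixel, center, S and m values are hypothetical and chosen only for illustration.

```python
import numpy as np

def slic_distance(pixel, center, S, m):
    """Combined SLIC distance D between one pixel and one cluster center.

    pixel, center: (x, y, I) tuples, with I the normalized intensity.
    S: region size used to normalize the spatial term; m: compactness.
    """
    xi, yi, Ii = center
    xj, yj, Ij = pixel
    d_s = np.sqrt((xj - xi) ** 2 + (yj - yi) ** 2)  # spatial distance d_s
    d_c = np.sqrt((Ij - Ii) ** 2)                   # intensity distance d_c
    return np.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

# Same pixel/center pair under a small and a large compactness value
# (illustrative numbers only).
pixel, center = (13.0, 14.0, 0.5), (10.0, 10.0, 0.2)
d_low = slic_distance(pixel, center, S=10, m=0.1)    # intensity term dominates
d_high = slic_distance(pixel, center, S=10, m=10.0)  # spatial term dominates
print(d_low, d_high)
```

With a small m the distance is driven almost entirely by the intensity difference, so superpixels follow image contours; with a large m the spatial term dominates and regions stay compact. Library implementations such as scikit-image's `slic` expose a comparable `compactness` parameter.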

2.2 Superpixel Clustering Based on DBSCAN

The main idea of DBSCAN clustering is as follows: in two-dimensional space, the neighborhood within a given radius Eps of an object is called the Eps-neighborhood of the object, and if the Eps-neighborhood contains at least the minimum number MinPts of objects with similar attributes, the object is called a core object. Any sample lying in the Eps-neighborhood of a core object is said to be directly density-reachable from it. DBSCAN searches for clusters by examining the Eps-neighborhood of each point in the data set. If the Eps-neighborhood of a point p contains at least MinPts points, a new cluster with p as the core object is created. DBSCAN then iteratively collects the objects that are directly density-reachable from these core objects, which may involve merging several density-reachable clusters. The process terminates when no new point can be added to any cluster.

The oil leakage image can be regarded as a special spatial data set, in which each pixel has a position coordinate and a corresponding color value. By finding spatial clusters, clusters in the oil leakage image can be found effectively. Pixels with similar colors and spatial connections can be grouped together to form a segmented area. The difference between spatial clustering and pixel clustering is that image pixels are distributed not only in the spatial domain but also in other feature spaces such as color. Pixels assigned to the same cluster should be not only spatially connected but also similar in color. Table 1 shows the DBSCAN-based clustering process for oil leakage images.

Figure 4 shows the DBSCAN-based superpixel clustering result. As can be seen from the figure, after DBSCAN clustering, oil stains with similar features are clustered together, which eliminates unrelated features and suppresses the complex background, and is therefore conducive to the detection of oil leakage.

Table 1. Clustering process based on DBSCAN algorithm
Fig. 4. Superpixel clustering results based on DBSCAN. It can be seen from the figures that after DBSCAN processing, the background is weakened, and the oil leakage part is more prominent
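The DBSCAN procedure described in this section (core objects, density-reachable expansion, termination when no point can be added) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each superpixel is assumed to be summarized by a hypothetical normalized feature vector (x, y, color), and the `eps` and `min_pts` values are arbitrary.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Pairwise Euclidean distances; neighbors[i] is the Eps-neighborhood of i
    # (it includes i itself).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core object; stays noise unless claimed later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # density-reachable from a core object
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])  # j is a core object too
        cluster += 1
    return labels

# Hypothetical superpixel features: normalized (x, y, color).
features = np.array([
    [0.10, 0.10, 0.20], [0.12, 0.11, 0.22], [0.11, 0.13, 0.21],  # dark, adjacent
    [0.80, 0.80, 0.90], [0.82, 0.81, 0.88], [0.81, 0.83, 0.91],  # bright, adjacent
    [0.50, 0.10, 0.55],                                          # isolated
])
labels = dbscan(features, eps=0.1, min_pts=3)
print(labels.tolist())
```

Because position and color share one feature space here, two superpixels fall into the same cluster only when they are both spatially close and similar in color, which matches the requirement stated above.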

2.3 Oil Leakage Detection and Analysis Based on Transformer

The traditional Transformer is mainly composed of two parts: an encoder and a decoder. The multi-head attention mechanism is the core of the Transformer; it enables the model to retain the key information in the picture, much like human visual attention. The image is processed as a sequence following the method in [21]. First, the image is cut into several image blocks (patches); second, the image blocks are sent to a trainable linear projection layer, and position encoding is performed. Before the image is sent to the encoder, the extracted image features need to be position-encoded. The position code is generated with sine and cosine functions and then added to the feature map at the corresponding position. The position encoding function is:

$$ PE\left( {pos,2i} \right) = \sin \left( {\frac{pos}{{10000^{\frac{2i}{{d_{\text{model}} }}} }}} \right) $$
(4)
$$ PE\left( {pos,2i + 1} \right) = \cos \left( {\frac{pos}{{10000^{\frac{2i}{{d_{\text{model}} }}} }}} \right) $$
(5)

In the above formulas, pos is the absolute position of a pixel in the feature map, \(d_{\text{model}}\) is the embedding dimension, and \(2i\) and \(2i + 1\) index the even and odd dimensions, respectively.
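Equations (4) and (5) can be evaluated directly. The sketch below is illustrative only: the function name `positional_encoding` and the toy sizes (16 positions, dimension 8) are assumptions, and an even encoding dimension is assumed.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Sinusoidal position codes, shape (num_positions, d_model); d_model even."""
    pos = np.arange(num_positions)[:, None]           # (P, 1)
    two_i = np.arange(0, d_model, 2)[None, :]         # even indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)  # pos / 10000^(2i/d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle)  # Eq. (4): even dimensions
    pe[:, 1::2] = np.cos(angle)  # Eq. (5): odd dimensions
    return pe

pe = positional_encoding(num_positions=16, d_model=8)
print(pe.shape)
```

The resulting matrix has one row per position and is added element-wise to the patch embeddings before they enter the encoder.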

The patch embedding and the position encoding are added to obtain an embedded vector, which is sent to the Transformer encoding layer for processing. The Transformer encoding layer consists of multi-head attention and a multi-layer perceptron. As shown in Fig. 5, multi-head attention contains multiple attention heads, and a single head uses a query matrix, a key matrix and a value matrix, which are obtained by multiplying the embedding vector by the corresponding weight matrices:

$$ Q = X \times W^Q $$
(6)
$$ K = X \times W^K $$
(7)
$$ V = X \times W^V $$
(8)

In the above formulas, \(Q\) is the query matrix, \(K\) is the key matrix, \(V\) is the value matrix, \(X\) is the embedding vector, and \(W^Q\), \(W^K\) and \(W^V\) are the corresponding weight matrices. The final output of the self-attention mechanism is:

$$ Z = \operatorname{softmax}\left( {\frac{QK^T }{{\sqrt {d_k } }}} \right)V $$
(9)

In the above formula, \(d_k\) is the dimension of \(K\).
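Equations (6) to (9) amount to three matrix products followed by a row-wise softmax. The sketch below shows a single attention head with randomly initialized weights; the sizes (5 patch embeddings of dimension 8, projected to dimension 4) and the names `softmax` and `self_attention` are illustrative assumptions rather than the configuration used in this paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention following Eqs. (6)-(9)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (n_tokens, n_tokens)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 patch embeddings of dimension 8 (toy sizes)
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
Z, weights = self_attention(X, W_q, W_k, W_v)
print(Z.shape, weights.sum(axis=1))
```

Each row of `weights` sums to one, so every output row of `Z` is a weighted average of the value vectors. Multi-head attention runs several such heads in parallel with separate weight matrices.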

Fig. 5. Transformer Encoder structure

Fig. 6. Transformer Decoder structure diagram

Through the Transformer encoder, the features of the input image can be extracted. Unlike the RNN operation, there is no need for a convolutional neural network as the backbone network.

The key vectors and value vectors output by the encoder form a set of attention vectors, which is fed to the decoding module to help it focus on the relevant regions of the input oil leakage image. The decoder consists of multi-head attention and FFN layers. The location of oil leakage in the image is obtained by a three-layer linear transformation with ReLU in the FFN, and the category of the object is obtained by a single linear layer. The decoder is shown in Fig. 6.

The entire oil leakage identification model is divided into three parts: the backbone network extracts image features, the encoder-decoder performs information fusion, and the feedforward network performs prediction. As shown in Fig. 7, the backbone network learns to extract the features of the original image.

The encoder reduces the dimension of the input feature map and flattens the two-dimensional feature map into a one-dimensional sequence. The output of the top encoder is a set of attention vectors containing key vectors and value vectors. The decoder takes a small, fixed number (N) of query vectors as input, and different query vectors correspond to different output vectors. Each query vector is then decoded into box coordinates and a class label via the FFN, resulting in N final predictions. Figure 7 shows the ViT-based identification process for oil seepage.

Fig. 7. Oil leakage detection results based on Transformer. The detection process includes n encoding modules and n decoding modules; 8 encoding modules and 8 decoding modules are used in this paper
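The prediction step described in this section, in which N decoded query vectors pass through a three-layer FFN with ReLU for the box and a single linear layer for the class, can be sketched as below. Several details are assumptions not given in the text: the sigmoid that maps box outputs to normalized coordinates, the four-value box format, the three classes, the layer widths and the random weights are all hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prediction_heads(queries, box_layers, cls_layer):
    """Map N decoded query vectors to N (box, class-score) predictions."""
    (W1, b1), (W2, b2), (W3, b3) = box_layers
    h = relu(queries @ W1 + b1)      # linear layer 1 + ReLU
    h = relu(h @ W2 + b2)            # linear layer 2 + ReLU
    boxes = sigmoid(h @ W3 + b3)     # linear layer 3; sigmoid is an assumption
    W_c, b_c = cls_layer
    logits = queries @ W_c + b_c     # single linear layer for the class
    return boxes, logits

rng = np.random.default_rng(0)
N, d, n_classes = 10, 16, 3          # hypothetical sizes
queries = rng.normal(size=(N, d))    # stand-in for decoder outputs
box_layers = [(rng.normal(scale=0.1, size=(d, d)), np.zeros(d)),
              (rng.normal(scale=0.1, size=(d, d)), np.zeros(d)),
              (rng.normal(scale=0.1, size=(d, 4)), np.zeros(4))]
cls_layer = (rng.normal(scale=0.1, size=(d, n_classes)), np.zeros(n_classes))
boxes, logits = prediction_heads(queries, box_layers, cls_layer)
labels = logits.argmax(axis=1)       # one class label per query
print(boxes.shape, labels.shape)
```

Each of the N queries yields exactly one box and one label, which is why the model produces N final predictions regardless of how many leaks the image contains.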

3 Basis of Model Training

In this paper, an image recognition method based on fusion SLIC is applied to image data of oil leakage from oil-filled equipment at the substation end to verify its validity and accuracy. An image recognition experiment is designed to compare the accuracy of the proposed method with that of Transformer and Faster-RCNN.

3.1 Software and Hardware

The test conditions of this paper are as follows. Software: CentOS 8 (64-bit operating system), PyTorch framework, Python programming language. Hardware: desktop computer with an NVIDIA Tesla P100 GPU (32 GB video memory), an E5-2680 V4 CPU (maximum clock frequency 3.30 GHz) and 500 GB of disk capacity.

3.2 Data Set

The original data set consists of images taken during substation inspection, with a total of 4400 images of substation business scenes. Of these, 220 images (5% of the 4400) were randomly selected as the final test data, and the remaining 4180 images were used as the training data set. The original images contain two types of working conditions, oil seepage and oil leakage, which are not evenly distributed across the images. Images of the same scene are shot from multiple angles, and the background is complex. Examples of the two types of oil leakage conditions of oil-filled equipment used in this paper are shown in Fig. 8.

Fig. 8. Oil leakage images. (a) Oil spill image; oil spills are mainly distributed on the ground. (b) Oil seepage image; oil seepage is mainly distributed on the surface of the equipment
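The random 5% hold-out described in this section can be reproduced with a seeded shuffle. This is a minimal sketch: the function name `split_dataset`, the integer image identifiers and the seed are hypothetical, and the paper does not state how its random selection was implemented.

```python
import random

def split_dataset(image_ids, test_fraction=0.05, seed=0):
    """Shuffle the ids with a fixed seed and hold out a test fraction."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (training ids, test ids)

# 4400 placeholder ids standing in for the inspection images.
train_ids, test_ids = split_dataset(range(4400))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the test set identical across runs, so results from different models are compared on the same 220 images.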

3.3 Experimental Results

Precision (P), Recall (R) and Average Precision (AP) are used as evaluation indexes for the proposed method. The AP value is calculated following the method of Everingham et al. Precision and recall are calculated as follows:

$$ P = \frac{TP}{{TP + FP}} $$
(10)
$$ R = \frac{TP}{{TP + FN}} $$
(11)

where TP (True Positive) is a positive sample that is predicted to be positive, FP (False Positive) is a negative sample that is predicted to be positive, and FN (False Negative) is a positive sample that is predicted to be negative.
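As a worked example of Eqs. (10) and (11), the sketch below counts TP, FP and FN from a small set of made-up labels. The function name `precision_recall` and the label lists are illustrative only and do not come from the experiments in this paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one positive class, per Eqs. (10) and (11)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 1 = oil leakage present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # TP = 3, FP = 1, FN = 2, so P = 0.75 and R = 0.6
```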

In this paper, the data set of oil leakage working conditions of oil-filled equipment in the substation scenario is classified. The data set includes 4400 samples in total, and the number of samples for each working condition is shown in Table 2.

Table 2. Number of samples of defect categories of electric oil-filled equipment

This paper first tests the difference in image classification results between the original samples and the expanded samples. To this end, the image recognition models of the proposed method, Transformer and Faster-RCNN are trained on the original data set and on the expanded data sets. The training data set of Experiment 1 consisted of 2200 original images, that of Experiment 2 was an expanded data set of 3200 images, and that of Experiment 3 was an expanded data set of 4400 images. In each of the three experiments, 70% of the data were used as the training set and the remaining 30% as the validation set.

As can be seen from Tables 3, 4 and 5, the proposed method performs well in the experiments with 2200, 3200 and 4400 images. On average, its identification accuracy is 3.53% higher than that of Transformer and 11.50% higher than that of Faster-RCNN; for oil leakage, its identification accuracy is 2.00% higher than that of ViT and 15.9% higher than that of Faster-RCNN. Its average identification precision is 4.93% higher than that of ViT and 16.27% higher than that of Faster-RCNN, and, in the oil leakage category, 12.53% higher than that of ViT and 11.70% higher than that of Faster-RCNN. Its recall is 6.53% higher than that of ViT and 15.97% higher than that of Faster-RCNN, and, in the oil leakage class, 3.53% higher than that of ViT and 8.13% higher than that of Faster-RCNN (Figs. 9, 10 and 11).

Table 3. Accuracy comparison

Fig. 9. The accuracy of our method is better than that of Faster-RCNN and ViT on different datasets

Table 4. Precision comparison

Fig. 10. The precision of our method is better than that of Faster-RCNN and Transformer on different datasets, while the precision of Transformer-based oil spill detection is lower than that of Faster-RCNN

Table 5. Recall comparison

Fig. 11. On different datasets, the recall of our method is higher than that of Faster-RCNN and ViT

4 Conclusions

In this paper, to address the difficulty of identifying and detecting oil leakage from oil-filled equipment in daily substation inspection tasks, an oil leakage detection technology for substation equipment based on fusion SLIC is proposed. First, the SLIC method is used to segment the image into superpixels, separating the oil leakage part from the background. Second, the density-based DBSCAN method is used to cluster similar superpixels, so that the features of oil leakage images are clustered accurately and the interference of the background environment on oil leakage recognition is removed. Finally, a Vision Transformer deep learning network is trained on the oil leakage images collected in the substation field, yielding a stable and accurate oil leakage model. The proposed technology can effectively identify oil leakage conditions of oil-filled equipment in substation inspection tasks and provides strong support for the intelligent application of power business.