1 Introduction

The growing number of fires and power outages caused by aging power equipment highlights the importance of an efficient management system for electrical equipment. Partial discharge (PD), which often occurs in high-voltage power equipment such as switchboards, transformers, and switchgear, is the main cause of shortened insulator lifespan and dielectric breakdown [1, 2]. Feature extraction methods based on phase-resolved partial discharge (PRPD) data, using statistical, fractal, and moment parameters, are widely used [3, 4]. However, the randomness of partial discharges makes it difficult to characterize a discharge as a single type with a typical graph. Partial discharges can manifest in various forms, such as Protrusion Electrode (PE), Defective Insulator (DI), Floating Electrode (FE), and Noise (NS) patterns, depending on factors such as the material properties, geometry, and operating conditions of the insulation system [5]. Consequently, a single type of discharge may not exhibit a consistent pattern or signature, making it challenging to apply traditional feature extraction methods designed for typical discharges. Moreover, because these methods are often tailored to specific patterns commonly associated with typical discharges, they may not be sufficiently robust or adaptable to diagnose the diverse range of atypical discharges encountered in real-time scenarios [6].

Among the various Artificial Intelligence (AI) models, the Convolutional Neural Network (CNN) has been widely adopted in image recognition and processing following significant advances in its performance. This adoption is credited to its ability to extract meaningful features from complex image data and to learn the local characteristics of images effectively, thereby facilitating precise classification. The classification of partial discharge patterns is also of great importance, as it enables specific defects to be identified and accidents to be prevented in advance. Consequently, recent studies increasingly focus on CNN-based AI models for accurate recognition and classification of partial discharge patterns [7]. In this paper, we propose two CNN-based models, Visual Geometry Group (VGG) and Residual Neural Network (ResNet), for PD classification of PRPD patterns, and apply Gradient-weighted Class Activation Mapping (Grad-CAM) [5, 8] to VGG and ResNet so that humans can more effectively understand the reasons for the results.

2 Background Theory

This paper proposes two CNN-based models, Visual Geometry Group (VGG) and Residual Neural Network (ResNet), for PD classification of PRPD patterns. A CNN architecture consists of layers for feature extraction (convolutional and pooling layers), feature processing and dimensionality reduction (fully connected layer), and classification (softmax layer). This architecture has proven effective in various computer vision tasks, including image classification, object detection, and segmentation. The description of each layer is as follows:

  (1) Convolution layer

    • Convolutional layers apply filters to input image data to extract features. These filters detect patterns such as edges, textures, or shapes within the image.

    • The output of a convolutional layer consists of feature maps, which represent the presence of specific features across the input image.

  (2) Pooling layer

    • Pooling layers downsample the feature maps produced by convolutional layers, reducing their spatial dimensions.

    • Common pooling operations include max pooling and average pooling, which retain the most salient features while discarding redundant information.

  (3) Fully connected layer

    • The fully connected layer, also known as the dense layer, operates on the flattened output of the preceding layers.

    • It re-extracts features and performs dimensionality reduction by transforming the high-dimensional feature representation into a lower-dimensional space.

  (4) Softmax layer

    • The softmax layer is typically the final layer in a CNN and is used for classification tasks.

    • It applies the softmax function to the output of the preceding layer, producing a probability distribution over multiple classes.

    • Each output neuron corresponds to a class, and the softmax function ensures that the probabilities sum to one, making it suitable for multi-class classification (Fig. 1).

Fig. 1 Structure of CNN
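The four layer types described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this paper: the function names, the toy 5 × 5 image, the edge filter, and the dense-layer weights are all hypothetical, and the convolution is written as the cross-correlation commonly used in deep learning frameworks.

```python
import math

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax; the outputs sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy input: a 5x5 image containing a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

fmap = conv2d_valid(image, vertical_edge)      # 3x3 feature map
pooled = max_pool2d(fmap, size=2)              # 1x1 after 2x2 pooling
flat = [v for row in pooled for v in row]      # flatten before the dense layer

# Hypothetical dense weights mapping 1 feature to 4 classes.
weights = [[0.5], [-0.5], [0.1], [0.0]]
biases = [0.0, 0.0, 0.0, 0.0]
probs = softmax(dense(flat, weights, biases))
```

In this toy case the edge filter responds where the image changes from 0 to 1, pooling keeps the strongest response, and the softmax turns the four dense outputs into class probabilities.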

2.1 Overview of Proposed Models

VGG is a CNN comprising multiple convolutional layers stacked on top of each other. Its depth, 19 layers in total, allows it to learn complex hierarchical representations of input images. The model contains 16 convolutional layers, each followed by a rectified linear unit (ReLU) activation function that provides the non-linearity of the model. For input data x, the ReLU activation function can be expressed as (1):

$$ y = \mathrm{ReLU}(x) = \max(x, 0) $$
(1)

VGG utilizes max-pooling layers after certain convolutional blocks to downsample the spatial dimensions of the feature maps while preserving important features. Following the convolutional layers, three fully connected layers perform high-level feature extraction and classification. The final fully connected layer typically outputs the class probabilities using a softmax activation function. VGG uses small 3 × 3 convolution filters with a stride of 1 and zero-padding to maintain the spatial resolution of the feature maps. ResNet, in turn, introduces residual blocks to address the vanishing gradient problem: skip connections link a layer to a subsequent one through an addition operation. The ResNet model is created by stacking these residual blocks, which use ReLU activation functions and 2D convolutions. Figure 2 shows the overall flowchart for training the proposed models.

Fig. 2 Flowchart for learning proposed models
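The two building blocks described in this section, a 3 × 3 convolution with stride 1 and zero-padding and a residual block with an additive skip connection, can be sketched as follows. This is a single-channel illustration under simplifying assumptions (no bias, no batch normalization, hypothetical function names and kernels), not the configuration of the proposed models.

```python
def relu_map(fmap):
    """Apply ReLU, max(x, 0), element-wise to a 2D feature map."""
    return [[max(v, 0.0) for v in row] for row in fmap]

def conv3x3_same(fmap, kernel):
    """3x3 convolution, stride 1, zero-padding of 1: output size equals input size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    # Positions outside the map count as zeros (zero-padding).
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += fmap[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out

def residual_block(x, kernel_a, kernel_b):
    """y = ReLU(F(x) + x), where F is conv -> ReLU -> conv."""
    fx = conv3x3_same(relu_map(conv3x3_same(x, kernel_a)), kernel_b)
    summed = [[f + s for f, s in zip(f_row, x_row)]
              for f_row, x_row in zip(fx, x)]   # skip connection (addition)
    return relu_map(summed)

# Hypothetical 3x3 input feature map.
x = [[1.0, -2.0, 3.0],
     [0.5, 0.0, -1.0],
     [2.0, 1.0, 0.0]]

identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero_kernel = [[0.0] * 3 for _ in range(3)]

same_size = conv3x3_same(x, identity_kernel)     # reproduces x, same 3x3 size
y = residual_block(x, zero_kernel, zero_kernel)  # F(x) = 0, so y = ReLU(x)
```

With all-zero kernels the residual branch contributes nothing and the block reduces to ReLU(x), which illustrates why the skip connection lets the signal (and its gradient) pass even when the learned branch is uninformative.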

2.2 Grad-CAM

Figure 3 shows the architecture of Grad-CAM and the proposed models. Grad-CAM is a technique that removes the need for a Global Average Pooling (GAP) layer and instead generates a heatmap by weighting each feature map with its gradient. The effectiveness of Grad-CAM can be substantiated by comparing the formulas and heatmap generation process of both traditional CAM and Grad-CAM.

Fig. 3 Architecture of Grad-CAM and Proposed models

In contrast, traditional CAM replaces the flattening step that typically follows the last convolution layer with a GAP (Global Average Pooling) layer. This layer computes the average value of each feature map \(f_{k} (i,j)\) from the final convolutional layer, yielding a single numerical output per map. The association between the last convolution layer and the class is described by weights (ω), which are multiplied by \(f_{k} (i,j)\) to generate k heatmaps. These heatmaps are then summed to yield the final CAM image. Grad-CAM replaces these weights with gradient-based coefficients \(a_{k}^{c}\) and applies a ReLU to the weighted sum, which can be expressed as (2):

$$ L_{\text{Grad-CAM}}^{c}(i,j) = \mathrm{ReLU}\left( \sum_{k} a_{k}^{c} f_{k}(i,j) \right) $$
(2)

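where \(a_{k}^{c}\) is the weight of the kth feature map for class c. With Z the number of spatial locations in the feature map, \(A_{ij}^{k} = f_{k}(i,j)\) the value of the kth feature map at location (i, j), and \(y^{c}\) the score for class c, \(a_{k}^{c}\) can be expressed as (3):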

$$ a_{k}^{c} = \frac{1}{Z}\sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}} $$
(3)

Examining the formula shows that a ReLU function has been added and that the CAM weights have been replaced by the gradient-based coefficients \(a_{k}^{c}\). This illustrates that Grad-CAM, which does not require a GAP layer, is applicable across a range of CNN architectures. Moreover, Grad-CAM can be applied not only to the final convolution layer but also to intermediate layers, making it possible to examine the model's information processing at different stages.
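As a minimal numerical sketch of Eqs. (2) and (3), the following pure-Python function computes the coefficients and the heatmap from given feature maps and gradients. In practice the gradients of the class score with respect to the feature maps would come from backpropagation in a deep learning framework; here they are supplied as hypothetical toy values, and the function name `grad_cam` is an assumption of this example.

```python
def grad_cam(feature_maps, gradients):
    """Compute Grad-CAM coefficients and heatmap.

    feature_maps[k][i][j] holds f_k(i, j); gradients[k][i][j] holds the
    derivative of the class score y^c with respect to A^k_ij.
    """
    num_maps = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    z = h * w  # Z: number of spatial locations in one feature map

    # Eq. (3): average the gradient over all spatial locations of each map.
    alphas = [sum(sum(row) for row in gradients[k]) / z for k in range(num_maps)]

    # Eq. (2): ReLU of the coefficient-weighted sum of the feature maps.
    heatmap = [
        [
            max(sum(alphas[k] * feature_maps[k][i][j] for k in range(num_maps)), 0.0)
            for j in range(w)
        ]
        for i in range(h)
    ]
    return alphas, heatmap

# Hypothetical toy values: two 2x2 feature maps and their gradients.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # map 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # map 1 counts against the class
]

alphas, heatmap = grad_cam(feature_maps, gradients)
```

Locations where only the negatively weighted map is active are clipped to zero by the ReLU, so the heatmap highlights only regions that contribute positively to the class score.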

3 Data Preparation

3.1 Data Collection

To collect training data for the proposed models, an Ultra-High Frequency (UHF) sensor was used to measure PD activity in gas-insulated substations (GIS). Unlike the ultrasonic sensors or current sensors traditionally employed for PD measurement [9], the UHF sensor offers greater resistance to ambient noise, potentially enhancing the accuracy and reliability of PD detection in challenging environments. Pulses generated in the GIS are received through a Radio Frequency (RF) receiver connected to the UHF sensor. These pulses are then measured at a sampling rate of 128BIN within a frequency band ranging from 300 to 1500 MHz. This frequency band is chosen to capture the UHF signals associated with partial discharge activity within the GIS.

3.2 Data Preprocessing

In this paper, four partial discharge defect patterns were considered: PE, DI, FE, and NS. The data were preprocessed into 2D data using PRPD. As shown in Table 1, each PD defect mode produces a unique pattern; the NS pattern, for example, is broadly distributed along the bottom of the y-axis. For each pattern, the preprocessed data were divided into a training set (70%) and a test set (30%) for the experiment.

Table 1 PD defect modes and PRPD Data samples
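The preprocessing described above can be sketched in two steps: binning pulses into a 2D phase-amplitude grid, and splitting the samples of each class 70/30. The sketch below is an assumption-laden illustration, not the pipeline used in this paper: the pulse format `(phase_deg, amplitude)`, the bin counts, the normalized amplitude range, the per-class shuffling, and all function names are hypothetical.

```python
import random

def prpd_histogram(pulses, phase_bins=128, amp_bins=128, amp_max=1.0):
    """Count pulses per (amplitude bin, phase bin).

    pulses: iterable of (phase_deg, amplitude) pairs (hypothetical format).
    Returns grid[amp_bin][phase_bin]; amplitudes are clamped to [0, amp_max].
    """
    grid = [[0] * phase_bins for _ in range(amp_bins)]
    for phase_deg, amplitude in pulses:
        p = int((phase_deg % 360.0) / 360.0 * phase_bins)
        p = min(p, phase_bins - 1)
        a = int(amplitude / amp_max * amp_bins)
        a = max(0, min(a, amp_bins - 1))
        grid[a][p] += 1
    return grid

def split_per_class(samples_by_class, train_fraction=0.7, seed=0):
    """Shuffle each class separately and split it into train and test parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test

# Hypothetical pulses binned on a coarse 4x4 grid for readability.
pulses = [(90.0, 0.5), (90.0, 0.5), (270.0, 0.25)]
grid = prpd_histogram(pulses, phase_bins=4, amp_bins=4)

# Hypothetical dataset: 10 sample identifiers for each of the four patterns.
samples_by_class = {label: list(range(10)) for label in ("PE", "DI", "FE", "NS")}
train, test = split_per_class(samples_by_class)
```

Splitting each class separately keeps the 70/30 ratio within every pattern, which matters when one class (for example NS) has fewer or noisier samples than the others.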

4 Result and Discussion

4.1 Results of Classification for PRPD Patterns

Table 2 shows that the output of a traditional CNN classifier alone is insufficient for accurate analysis of the classification results. However, by applying Grad-CAM to the original CNN model, the classification results can be analyzed more comprehensively using activation images.

Table 2 Activation image with Grad CAM

Analyzing the data for each PRPD pattern with each proposed model, the accuracy of VGG was 96.5% on average, with high classification accuracy for the three patterns other than NS. For ResNet, the average accuracy was 93.13% and the classification rate for NS was 93%, which is higher than that of VGG for this pattern (Table 3).

Table 3 Classification accuracy for each PRPD pattern
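Per-pattern and average accuracies of the kind reported in Table 3 can be computed from test labels and predictions as sketched below. The labels and predictions are hypothetical toy values rather than results from this study, and taking the average as the unweighted mean over the four patterns is an assumption of this example.

```python
def per_class_accuracy(true_labels, predicted_labels, classes):
    """Fraction of correctly classified samples for each class."""
    correct = {c: 0 for c in classes}
    total = {c: 0 for c in classes}
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Classes without any test sample are left out to avoid division by zero.
    return {c: correct[c] / total[c] for c in classes if total[c] > 0}

def average_accuracy(class_accuracy):
    """Unweighted mean of the per-class accuracies."""
    return sum(class_accuracy.values()) / len(class_accuracy)

classes = ["PE", "DI", "FE", "NS"]

# Hypothetical test labels and model predictions.
true_labels = ["PE", "PE", "DI", "DI", "FE", "FE", "NS", "NS"]
predicted = ["PE", "PE", "DI", "DI", "FE", "FE", "NS", "FE"]

class_acc = per_class_accuracy(true_labels, predicted, classes)
mean_acc = average_accuracy(class_acc)
```

Reporting accuracy per pattern in addition to the average makes it visible when one pattern, such as NS, is classified less reliably than the others.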

4.2 Epoch Plot

Accuracy and loss over the training epochs were analyzed using the Python-based matplotlib library. These metrics are plotted on the y-axis, while the number of epochs is plotted on the x-axis. The plot allows practitioners to visualize how the model's performance changes over time and whether it is converging or diverging. It helps in determining the optimal number of epochs for training and in diagnosing issues such as overfitting or underfitting. Where \(y_{i}\) is the predicted value, \(t_{i}\) is the data label, and n is the number of data samples, the loss function can be expressed as (4):

$$ Loss = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - t_{i} } \right)^{2} } $$
(4)
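As a small worked example of Eq. (4), the loss can be evaluated for one sample with four class outputs and a one-hot label. The prediction values are hypothetical and the function name `mse_loss` is an assumption of this sketch.

```python
def mse_loss(predictions, targets):
    """Mean squared error: average of (y_i - t_i)^2 over n values, as in Eq. (4)."""
    n = len(predictions)
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / n

# Hypothetical output over (PE, DI, FE, NS) and a one-hot label for PE.
predictions = [0.5, 0.5, 0.0, 0.0]
targets = [1.0, 0.0, 0.0, 0.0]

loss = mse_loss(predictions, targets)      # (0.25 + 0.25 + 0 + 0) / 4
perfect = mse_loss(targets, targets)       # identical vectors give zero loss
```

The loss falls to zero only when every predicted value matches its label, which is why the loss curve is expected to approach 0 as training converges.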

Figure 4a, b shows the epoch plots for the VGG and ResNet models, respectively. Accuracy represents the model's capability to classify instances correctly and converges towards 1 as the error rate is minimized. Conversely, the loss converges towards 0 as errors are reduced. Inspecting the two plots shows that the pattern classification accuracy of each model improves as training proceeds.

Fig. 4 Loss and Accuracy plot of VGG (a) and ResNet (b)
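The way an epoch plot is read to choose the number of epochs and to spot overfitting can also be expressed in code. The sketch below assumes that both a training loss and a validation loss are recorded per epoch, which this paper does not state explicitly; the loss values, the `patience` parameter, and the function names are hypothetical.

```python
def best_epoch(val_losses):
    """1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

def overfitting_onset(train_losses, val_losses, patience=2):
    """First 1-based epoch of a run of `patience` consecutive epochs in which
    validation loss rises while training loss keeps falling; None if no such run."""
    streak = 0
    for e in range(1, len(val_losses)):
        rising_val = val_losses[e] > val_losses[e - 1]
        falling_train = train_losses[e] < train_losses[e - 1]
        streak = streak + 1 if (rising_val and falling_train) else 0
        if streak == patience:
            # The run started (patience - 1) epochs before index e.
            return e - patience + 2
    return None

# Hypothetical loss curves over five epochs.
train_losses = [1.0, 0.8, 0.6, 0.5, 0.4]
val_losses = [1.1, 0.9, 0.7, 0.8, 0.9]

chosen = best_epoch(val_losses)
onset = overfitting_onset(train_losses, val_losses)
no_onset = overfitting_onset(train_losses, [1.1, 0.9, 0.7, 0.6, 0.5])
```

In this toy history the validation loss is lowest at epoch 3 and starts rising from epoch 4 while the training loss still falls, which is the divergence pattern that an epoch plot makes visible.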

4.3 Discussion

The performance of two deep learning models, VGG and ResNet, in classifying patterns associated with partial discharge was compared. The results are summarized as follows:

  (1) The average accuracy of VGG was 96.5%.

  (2) VGG demonstrated high classification accuracy for the three patterns other than NS (Protrusion Electrode, Defective Insulator, and Floating Electrode), indicating that it effectively identified these patterns.

  (3) The average accuracy of ResNet was 93.13%. Despite this lower average accuracy, ResNet showed a higher classification rate for the NS pattern than VGG.

PRPD patterns are primarily characterized by signal features that occur in specific frequency bands in the frequency domain. However, typical convolutional neural networks such as VGG and ResNet focus on extracting features from images, which can make it difficult to adequately extract features in the frequency domain. VGG and ResNet use a fixed 3 × 3 convolutional filter size, which can be insufficient to capture the different features and scales of partial discharge patterns. Since partial discharge patterns vary in size and have features that are strongly influenced by the surrounding environment, a larger convolution filter size or filters of different scales may be needed. In addition, the presence of some vertically aligned points in the noise data resulted in misclassification as surface discharges; it is believed that this noise was introduced during the measurement. Therefore, more noise data should be acquired for further validation. To build an enhanced CNN model, the number of data samples should be increased and the model should be trained on partially noisy cases.

5 Conclusion

In this paper, a UHF sensor was installed in GIS to classify four types of partial discharge. The existing initial model showed low accuracy, whereas the CNN-based VGG and ResNet models achieved higher accuracy. Grad-CAM was applied to the PRPD patterns classified by the proposed models to verify the results, and it was effective in identifying inconsistencies and directions for improving the models. This study shows high accuracy in classifying the partial discharge characteristics of electric facilities. However, although the approach can be used to diagnose facilities from instantaneous values, it is insufficient for predicting their prognosis, and improvements need to be derived through continued research.