1 Introduction

Fault identification is essential to the diagnosis of power transformers, which play an important role in electrical power distribution. Dissolved gas analysis (DGA) is a powerful tool for diagnosing a power transformer through measurements on its insulating oil. It is considered the most reliable technique for detecting incipient faults within power transformers [1]. Several DGA interpretation methods have been proposed and are commonly used, such as the Key Gas Method, Rogers Ratio, Doernenburg Ratio, IEC Ratio, Duval Triangle Method, and Duval Pentagon Method. Among the existing interpretation methods, the Duval Triangle Method provides the most accurate and consistent analysis [2]. Consistent with previous studies, [3] confirmed that the Duval Triangle gave results that agreed most closely with the observed transformer condition. The Duval Triangle Method, along with the pentagon version, is the best method for more detailed fault diagnosis in transformers [4].

The use of computer-based technology in recent years has led to significant improvements in transformer condition assessment and monitoring. A fuzzy logic approach has been proposed for consistent interpretation of dissolved gas-in-oil analysis [5]. Support Vector Machine algorithms were developed to diagnose faults in power transformers in [6, 7]. A Random Forest model was proposed to predict the missing interfacial tension parameter in [8]. A neural network was proposed to diagnose faults in power transformers based on the IEC code in [9]. Multiple machine learning classifiers were implemented and compared for predicting the Health Index of power transformers in [10]. In recent years, graphical or chart-based interpretation has also been improved by computer-based technology [11]. The study in [12] proposed a new fault diagnostic method based on the Duval Triangle that combines fuzzy and evidential reasoning techniques. That study aimed to improve the Duval Triangle Method so that it is easier to understand and able to identify simultaneous faults. The study in [13] developed a Java-based software implementation of Duval Triangle 1. The study in [14] proposed an ANFIS-based Duval Triangle model for fault diagnosis of power transformers. The proposed ANFIS model produced higher accuracy than the ANN algorithm.

Machine learning (ML) is useful in power transformer assessment because it reduces the dependency on personnel expertise and improves consistency. Moreover, it is useful for assessing large fleets, where data from hundreds to thousands of transformers must be processed. However, there has been no thorough investigation of which machine learning algorithm is most suitable for an ML-based implementation of the Duval Triangle. Previous studies have also focused mostly on Duval Triangle 1, while the results in [4] indicate that using other Duval Triangles in combination can improve the consistency of the result. Therefore, this study aims to investigate and compare various popular ML algorithms and to propose the best performing model to support graphical DGA interpretation. ML-based Duval Triangles 1, 4, and 5 were proposed, and the combination method was implemented. The proposed method was evaluated using the validation dataset.

2 Methodology

The development of the ML-based DTM began with generating training and testing datasets for DTM1, DTM4, and DTM5. Subsequently, six ML algorithms were trained, namely decision tree (DT), support vector machine (SVM), random forest (RF), neural network (NN), Naïve Bayes (NB), and AdaBoost (AB). To evaluate the performance of the models, several parameters were used: Classification Accuracy, Area Under Curve, F1, Precision, and Recall. Confusion matrices were used to visualize the results. The developed model with the highest performance was selected. In addition, a validation dataset obtained from previously published articles was used to evaluate the proposed ML-based DGA interpretation against the graphical one. Finally, a combination method was implemented using the ML-based DTM1, DTM4, and DTM5. Figure 1 shows the flowchart of the methodology in this study.

Fig. 1 Flowchart of the methodology

2.1 Graphical dissolved gas analysis interpretation

The Duval triangle method (DTM), presented in [15], consists of several triangles. It is still considered the most reliable method to identify faults within power transformers [1,2,3,4, 6, 7]. Most in-service power transformers still use mineral oil insulation, and the DTMs suitable for those transformers are DTM1, DTM4, and DTM5. Therefore, those three DGA fault identification methods were investigated in this study.

A. Duval Triangle Method 1


DTM1 uses the percentages of three gases: CH4, C2H2, and C2H4. Table 1 shows the seven fault codes of DTM1, which are PD, D1, D2, T1, T2, T3, and DT. Figure 2 shows the graphical representation of DTM1.

Table 1 Fault codes and the description of DTM1
Fig. 2 Duval triangle 1 dissolved gas fault identification

B. Duval Triangle Method 4


DTM4 uses the percentages of three gases: H2, CH4, and C2H6. Table 2 shows the five fault codes of DTM4. DTM4 is a supplement to DTM1 and was used to obtain more information about low-temperature faults (PD, T1, or T2) after DTM1 [15]. If the result of DTM1 is D1, D2, or T3, DTM4 should not be used. A mixture of faults can be indicated when DTM4 and DTM5 do not agree. Figure 3 shows the graphical representation of DTM4.

Table 2 Fault codes and the description of DTM4
Fig. 3 Duval triangle 4 dissolved gas fault identification

C. Duval Triangle Method 5

Table 3 Fault codes and the description of DTM5

DTM5 uses the percentages of three gases: CH4, C2H4, and C2H6. Table 3 shows the seven fault codes of DTM5. DTM5 was employed to obtain more information after DTM1 identified T2 or T3. DTM5 should not be used when DTM1 indicates D1 or D2 faults. DTM5 can also be used to resolve uncertainty in the results of DTM1 and DTM4 [15, 16]. A mixture of faults can be indicated when DTM4 and DTM5 do not agree. Figure 4 shows the graphical representation of DTM5.

Fig. 4 Duval triangle 5 dissolved gas fault identification

2.2 Machine learning algorithms

The use of ML to assist power transformer condition assessment has been reported in several studies [17,18,19]. This study investigates six different machine learning algorithms to support the graphical dissolved gas analysis fault identification method. Those algorithms are Decision Tree, Support Vector Machine, Random Forest, Neural Network, Naïve Bayes, and AdaBoost.

A. Decision Tree


The decision tree is one of the most commonly used algorithms in model building. It generalizes well, is easy to prune, and is fast to fit [20]. In this algorithm, classification is carried out by recursively splitting the data into nodes of increasing class purity, with each split chosen by information gain. Several studies have applied decision trees to power transformer condition monitoring and diagnostics [20,21,22]. The study in [21] used a C4.5 decision tree to predict transformer faults from the gas values of online condition monitoring. The study in [22] used a decision tree to classify frequency response analysis results in order to diagnose faults within transformer windings. The study in [20] used a decision tree algorithm to derive fuzzy logic rules for estimating the degree of polymerization in an oil-paper insulation system.
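
As an illustration of the splitting criterion described above, the following minimal Python sketch computes the information gain of a candidate split from the class labels on each side. The function names and the toy fault codes are assumptions made for this example; they are not taken from the study or from any particular library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical fault codes at one node and two candidate splits.
parent = ["PD", "PD", "T1", "T1"]
pure_split = [["PD", "PD"], ["T1", "T1"]]    # each child holds one class
mixed_split = [["PD", "T1"], ["PD", "T1"]]   # children as mixed as the parent

gain_pure = information_gain(parent, pure_split)
gain_mixed = information_gain(parent, mixed_split)
print(gain_pure, gain_mixed)
```

A split that separates the classes completely recovers the full entropy of the parent node, while a split that leaves the class mix unchanged yields no gain, which is why the tree prefers the former.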

B. Support Vector Machine


Many studies have successfully applied the Support Vector Machine to power transformer condition monitoring and diagnostics [23,24,25,26]. This algorithm transforms the samples from the original input space to a higher dimensional space, then searches for the optimal separating hyperplane [23]. SVM is valued for its linearity and its flexibility in large-data settings. It has good generalization properties because it minimizes the structural misclassification risk during training [25, 26].

C. Random Forest


The random forest classifier is an ensemble learning method consisting of a set of decision trees, each developed from a bootstrap sample of the training data. The algorithm is a collection of tree-structured classifiers driven by independent, identically distributed random vectors. Each tree casts a unit vote for the most popular class at a given input [27]. Given an ensemble of classifiers, with the training data drawn randomly from the distribution of the random vector, a margin function is defined. Random forests do not overfit as more trees are added, and the generalization error converges to a limiting value [27]. Several studies have reported the successful use of the Random Forest classifier [10, 28, 29].
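
The two ingredients named above, bootstrap sampling and unit voting, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the fixed seed, and the example votes are hypothetical, and the training of the individual trees is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done once per tree."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the class that received the most unit votes."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)             # fixed seed, for reproducibility only
rows = list(range(10))             # stand-in for ten training samples
sample = bootstrap_sample(rows, rng)

# Hypothetical votes cast by five trees for one input sample.
votes = ["T1", "T2", "T1", "PD", "T1"]
winner = majority_vote(votes)
print(winner)  # T1
```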

D. Neural Network


Neural Network is one of the most widely used algorithms to build predictive models. This algorithm was inspired by the functionality of human brain which consists of neurons to process information in parallel [30]. This approach can reveal highly nonlinear input–output relationships and acquire knowledge directly from the training data through a learning process [31].

E. Naïve Bayes


This algorithm uses Bayes’ rule with assumption that the attributes are conditionally independent given the class. Naïve Bayes uses the information in the sample to estimate the posterior probability P(y|x) for each class y given an object x. Classification is done once such estimates are obtained [32].
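
Written out, and assuming the object x is described by n attributes x_1, …, x_n (symbols introduced here for illustration only), the conditional independence assumption gives the posterior up to a normalizing constant:

$$ P\left( {y \mid x} \right) \propto P\left( y \right)\prod\limits_{i = 1}^{n} {P\left( {x_{i} \mid y} \right)} $$

The predicted class is then the y that maximizes the right-hand side.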

F. AdaBoost


Adaptive Boosting (AdaBoost) is an approach to improve the prediction accuracy in machine learning by combining many relatively weak and inaccurate rules [33]. AdaBoost has been applied to power transformer condition monitoring and diagnostics in [34, 35].
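
One common way to write the combined rule for a multi-class problem is as a weighted vote over T weak classifiers h_t with weights α_t. These symbols are assumptions introduced for illustration, and the cited works may use a different variant:

$$ H\left( x \right) = \arg \mathop {\max }\limits_{k} \sum\limits_{t = 1}^{T} {\alpha_{t} \, \mathbf{1}\left[ {h_{t} \left( x \right) = k} \right]} $$

Weak classifiers with lower weighted training error receive larger α_t and therefore contribute more to the final decision.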

The performance evaluation of each model is described in the following section.

2.3 Evaluation

To evaluate the performance of the proposed method, several parameters were compared.

  • Classification accuracy (CA) This CA value is the accuracy of the classification compared to the target category. The highest CA is 1, and the lowest is 0. CA of 1 means that all of the classification results correspond to the actual category. CA was calculated using (1).

    $$ {\text{CA}} = \frac{{{\text{tp}} + {\text{tn}}}}{{\left( {{\text{tp}} + {\text{tn}} + {\text{fp}} + {\text{fn}}} \right)}} $$
    (1)
  • Precision This value measures the ratio of correctly predicted positive observations (tp) to the total predicted positive observations (tp + fp). Precision was calculated using (2).

    $$ {\text{Precision}} = \frac{{{\text{tp}}}}{{\left( {{\text{tp}} + {\text{fp}}} \right)}} $$
    (2)
  • Recall This measures the ratio of correctly predicted positive observations (tp) to the entire observations in the actual class (tp + fn). Recall was calculated using (3).

    $$ {\text{Recall}} = \frac{{{\text{tp}}}}{{\left( {{\text{tp}} + {\text{fn}}} \right)}} $$
    (3)
  • F1 This value is the harmonic mean of Recall and Precision. The best value of F1 is 1, and the worst is 0. Typically, the highest F1 possible is desired. F1 was calculated using (4).

    $$ F1 = \frac{{2*\left( {{\text{Precision}}*{\text{Recall}}} \right)}}{{\left( {{\text{Precision}} + {\text{Recall}}} \right)}} $$
    (4)
  • Area under curve (AUC) AUC value is the measure of classifier ability to differentiate between classes. The higher the value, the better the model performance in classifying different classes.

  • Confusion Matrix This table visualizes the performance of the proposed model to do classification. In this paper, each row represents the actual class from graphical DTM, and each column represents the predicted class from ML-based DTM.
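
Equations (1) to (4) can be checked with a short Python sketch. The function name and the example counts are hypothetical, and the sketch assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(4) from true/false positive and negative counts."""
    ca = (tp + tn) / (tp + tn + fp + fn)                   # Eq. (1)
    precision = tp / (tp + fp)                             # Eq. (2)
    recall = tp / (tp + fn)                                # Eq. (3)
    f1 = 2 * (precision * recall) / (precision + recall)   # Eq. (4)
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}


# Hypothetical counts for a single fault class treated as "positive".
metrics = classification_metrics(tp=8, tn=10, fp=2, fn=0)
print(metrics)
```

With these counts, CA is 18/20, Precision is 8/10, Recall is 8/8, and F1 is 1.6/1.8, which lies between Precision and Recall but closer to the smaller of the two.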

2.4 Datasets

Two types of datasets were used in this study. The first comprised the DGA fault identification training datasets, which were generated to train and test the ML-based DTM. The second was the validation dataset, which was used to evaluate the proposed model and compare its fault identification results to those of the graphical DTM.

A. DGA Fault Identification Training Datasets

To train the ML-based DTM model, three training datasets were developed. Table 4 shows the total of 1402 samples for DTM1 and the number of samples for each fault type. There were seven fault types as the target class in this dataset. Three features were used as input parameters: %CH4 (percentage of CH4), %C2H2, and %C2H4. The calculations of the gas percentages are shown in Eqs. 5–7. The same calculations were carried out for the DTM4 and DTM5 datasets, using different gas combinations.

$$ \% {\text{CH}}4 = \frac{{{\text{CH}}4_{{\left( {{\text{ppm}}} \right)}} \times 100\% }}{{{\text{CH}}4_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}2_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}4_{{\left( {{\text{ppm}}} \right)}} }} $$
(5)
$$ \% {\text{C}}2{\text{H}}2 = \frac{{{\text{C}}2{\text{H}}2_{{\left( {{\text{ppm}}} \right)}} \times 100\% }}{{{\text{CH}}4_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}2_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}4_{{\left( {{\text{ppm}}} \right)}} }} $$
(6)
$$ \% {\text{C}}2{\text{H}}4 = \frac{{{\text{C}}2{\text{H}}4_{{\left( {{\text{ppm}}} \right)}} \times 100\% }}{{{\text{CH}}4_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}2_{{\left( {{\text{ppm}}} \right)}} + {\text{C}}2{\text{H}}4_{{\left( {{\text{ppm}}} \right)}} }} $$
(7)
Table 4 Number of samples for DTM1 training data
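
The percentage calculation in Eqs. (5) to (7) can be sketched as a small Python function. The function name and the sample concentrations are hypothetical; passing a different set of three gases gives the corresponding inputs for DTM4 or DTM5.

```python
def relative_percentages(ppm):
    """Convert gas concentrations in ppm to relative percentages.

    `ppm` maps each of the three gases of one triangle to its concentration.
    The returned percentages sum to 100, following Eqs. (5)-(7).
    """
    total = sum(ppm.values())
    if total <= 0:
        raise ValueError("at least one gas concentration must be positive")
    return {gas: 100.0 * value / total for gas, value in ppm.items()}


# Hypothetical DTM1 sample (concentrations in ppm).
sample = {"CH4": 50.0, "C2H2": 25.0, "C2H4": 25.0}
pct = relative_percentages(sample)
print(pct)  # {'CH4': 50.0, 'C2H2': 25.0, 'C2H4': 25.0}
```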

Table 5 shows the total of 1555 samples for DTM4 and the number of samples for each fault type. There were five classes of fault types as the target in this dataset. Three features were used as input parameters: %H2, %CH4, and %C2H6. The gas percentages were calculated in the same way as in Eqs. 5–7, but using this gas combination.

Table 5 Number of samples for DTM4 training data

Table 6 shows the total of 1448 samples for DTM5 and the number of samples for each fault type. There were seven classes of fault types as the target in this dataset. The three features used as input parameters are %CH4, %C2H4, and %C2H6.

Table 6 Number of samples for DTM5 training data

B. Validation Dataset

A total of 1071 transformer dissolved gas records from previous studies [2, 4,5,6,7, 9, 19, 25, 36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72] were gathered and used to evaluate the proposed methods. All of the records contain the concentrations of five dissolved gases, in ppm (parts per million), in the mineral oil insulation of the transformer main tank. These data were used to validate the proposed model by comparing its results to those of the graphical DTM.

3 Results and analysis

This section discusses the results of the analysis. First, eighteen ML-based fault identification models (six algorithms for each of the three triangles) were developed and compared, and the best performing ML algorithm was selected. Subsequently, the proposed model was evaluated using the validation dataset. After that, the combined ML-based DTM was implemented to demonstrate the applicability of the proposed method for identifying faults within a transformer using dissolved gas-in-oil data.

3.1 ML-based dissolved gas analysis interpretation

The generated datasets summarized in Tables 4, 5 and 6 were used to train six machine learning algorithms, namely DT (Decision Tree), SVM (Support Vector Machine), RF (Random Forest), NN (Neural Network), NB (Naïve Bayes), and AB (AdaBoost). Tenfold cross-validation was employed to evaluate each trained model. Figure 5 shows the performance of each ML algorithm in classifying each DTM in terms of classification accuracy. The green bar shows the accuracy of the ML-based DTM1, the blue bar that of DTM4, and the yellow bar that of DTM5. The dark blue dot shows the mean accuracy of each ML algorithm in predicting the fault types of DTM1, DTM4, and DTM5. RF achieved the highest average accuracy of 0.998, followed by AdaBoost with 0.997. Decision Tree also obtained a relatively high accuracy of 0.990, while Support Vector Machine and Neural Network obtained similar results of 0.944. Naïve Bayes had the lowest average accuracy, 0.823.
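
The tenfold cross-validation procedure can be sketched in plain Python as follows. This is not the tooling used in the study, which the text does not specify: the function names are hypothetical, and the majority-class baseline merely stands in for a real classifier such as Random Forest so that the example is self-contained.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predicted = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x):
    """Placeholder model: always predict the most frequent training class."""
    most_common = max(sorted(set(train_y)), key=train_y.count)
    return [most_common] * len(test_x)


# Hypothetical toy data: 20 samples, one feature each, two fault codes.
features = [[float(i)] for i in range(20)]
labels = ["T1"] * 15 + ["PD"] * 5
accuracy = cross_val_accuracy(features, labels, majority_baseline)
print(accuracy)
```

Any classifier can be plugged in through the `fit_predict` argument, as long as it accepts training features, training labels, and test features and returns one predicted label per test sample.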

Fig. 5 Classification accuracy comparisons of six ML algorithms

Figures 6, 7, 8 and 9 show the comparison of area under curve (AUC), F1, Precision, and Recall of the six ML algorithms in classifying fault types from transformer dissolved gas analysis. Table 7 shows the performance of the six ML algorithms. Random Forest and AdaBoost performed better than the other algorithms compared in this study, with the Random Forest models slightly ahead. This result is consistent across all of the evaluation parameters. Therefore, the RF models were selected for further implementation.

Fig. 6 Area under curve comparisons of six ML algorithms

Fig. 7 F1 comparisons of six ML algorithms

Fig. 8 Precision comparisons of six ML algorithms

Fig. 9 Recall comparisons of six ML algorithms

Table 7 Performance of six ML algorithms

3.2 Evaluation

After developing the RF-based DTM1, DTM4, and DTM5, an evaluation was conducted using the validation dataset. A total of 1071 dissolved gas records were obtained and subsequently classified using the developed RF-based models. The fault identification results were then compared to those of the graphical DTM and analyzed. The resulting accuracy of the RF-based DTM on the validation dataset is shown in Table 8. All of the models performed well, and DTM4 obtained the highest accuracy of 96.2%.

Table 8 Accuracy of RF-Based DTM on Validation Dataset

Tables 9, 10 and 11 show the confusion matrices of the RF-based DTM on the validation dataset. The rows of each confusion matrix show the actual fault identifications by the graphical DTM, while the columns show the results of the RF-based DTM in classifying the dissolved gases into faults. The true positive rate is the ratio of correctly classified samples per class, while the false negative rate is the ratio of incorrectly classified samples per class. The results show that the RF-based DTM1 is excellent at identifying the partial discharge fault type, with a 100% true positive rate. Satisfactory results were also obtained for the D1, D2, and T2 fault types, with a 99% true positive rate. Meanwhile, the DT fault type had a 17% misclassification rate, which can be attributed to DT being a mixture of electrical and thermal faults.
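
The per-class true positive rate described above can be read off a confusion matrix by dividing each diagonal entry by its row total. The following Python sketch uses a made-up three-class matrix; the counts are illustrative only and are not taken from Tables 9, 10 or 11.

```python
def true_positive_rates(matrix, labels):
    """Per-class true positive rate: diagonal count over the row total.

    Rows are the actual classes (graphical DTM) and columns are the
    predicted classes (ML-based DTM), matching the convention in the text.
    """
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        rates[label] = matrix[i][i] / row_total if row_total else float("nan")
    return rates


# Hypothetical confusion matrix for three fault codes.
labels = ["PD", "T1", "DT"]
matrix = [
    [10, 0, 0],   # actual PD
    [1, 18, 1],   # actual T1
    [0, 2, 8],    # actual DT
]
rates = true_positive_rates(matrix, labels)
print(rates)
```

The false negative rate of each class is one minus its true positive rate, since every sample in a row is either classified correctly or assigned to another class.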

Table 9 Confusion matrix of RF-based DTM1 on validation dataset
Table 10 Confusion matrix of RF-based DTM4 on validation dataset
Table 11 Confusion matrix of RF-based DTM5 on validation dataset

Table 10 shows the confusion matrix of the RF-based DTM4 on the validation dataset. This model was used to obtain more information about low-temperature faults (PD, T1, or T2). The results show that the RF-based DTM4 is excellent at identifying overheating faults below 250 °C, with 99% accuracy. The other fault codes for this model were also identified well, with accuracies ranging from 95 to 96%. Table 11 shows the confusion matrix of the RF-based DTM5. This model was employed after DTM1 identified T2 or T3. The results show that DTM5 was excellent at identifying PD and T3. However, accuracies of 87% and 85% were obtained when identifying O and T2, respectively.

3.3 Combined method

The combination of DTM1, DTM4, and DTM5 is useful for further distinguishing faults inside the transformer other than the electrical faults D1 and D2. When low-energy or low-temperature faults were identified using DTM1 (PD, T1 or T2), DTM4 was used to obtain more information. When high or very high temperature faults were identified with DTM1 (T2 or T3), DTM5 was used to obtain more information. DTM4 distinguished between relatively minor faults, such as S, O, and PD, and the potentially more dangerous fault C, which involves possible carbonization of paper. DTM5 could be used to distinguish between the high temperature faults T3/T2 in mineral oil and the potentially more dangerous fault C involving possible carbonization of paper [11].
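
The selection rule stated in this paragraph can be sketched as a small Python function that, given the DTM1 fault code, returns which further triangles to consult. The function name is hypothetical, and the sketch covers only the rule as worded here; it does not reproduce the full flowchart of the combined method. Returning no follow-up for DT is an assumption, since the text does not state how DT is handled.

```python
def follow_up_triangles(dtm1_code):
    """Return the additional Duval triangles to consult after DTM1.

    PD, T1 and T2 lead to DTM4; T2 and T3 lead to DTM5; the electrical
    faults D1 and D2 need no follow-up. DT returning an empty list is an
    assumption made for this sketch.
    """
    follow_ups = []
    if dtm1_code in {"PD", "T1", "T2"}:
        follow_ups.append("DTM4")
    if dtm1_code in {"T2", "T3"}:
        follow_ups.append("DTM5")
    return follow_ups


for code in ["PD", "T1", "T2", "T3", "D1", "D2"]:
    print(code, follow_up_triangles(code))
```

Because T2 appears in both conditions, a T2 result from DTM1 leads to both DTM4 and DTM5, which is where disagreement between the two triangles can indicate a mixture of faults.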

The combination of the RF-based DTM1, DTM4, and DTM5 for fault identification was implemented in this study. The use of combined Duval Triangles could improve the consistency of the results [4]. Figure 10 shows the flowchart of DGA interpretation using the combination of DTM1, DTM4, and DTM5. These steps were applied to the dataset of 1071 dissolved gas records, using both the graphical DTM and the RF-based DTM. The comparison was carried out to evaluate the use of the proposed Random Forest model.

Fig. 10 Flowchart of DGA fault identification using combination of DTM1, DTM4, and DTM5

Table 12 shows the confusion matrix of the proposed RF-based combined DTM on the validation dataset. The rows represent the actual DGA interpretation using the graphical DTM, and the columns represent the predicted category using the combined RF-based DTM. The results show that the proposed method performs well when evaluated with the validation dataset. Ten categories were observed, with a total accuracy of 98.7%, as shown in Table 13. The resulting method performed well and is useful for supporting graphical DGA fault identification, especially for a large number of transformers.

Table 12 Confusion matrix of combined RF-based DTM1, 4, and 5 on validation dataset
Table 13 Total accuracy of the proposed RF-based combination method in DGA Fault Identification

4 Conclusion

This study investigated various machine learning algorithms to support dissolved gas analysis fault identification in power transformers. Duval Triangle Methods 1, 4, and 5 were employed, as they are the most commonly used and reliable DGA methods. Three datasets were developed and used to train six different machine learning algorithms. The results show that Random Forest performed best. Subsequently, the resulting RF models were evaluated using the validation dataset. A total of 1071 dissolved gas records from transformers were collected from previous studies and examined in this study. The resulting accuracy of the RF-based model is 95.5% for DTM1, 96.2% for DTM4, and 95.1% for DTM5. Then, the combination of DTM1, DTM4, and DTM5 was implemented. The results show that the developed RF-based method performed satisfactorily, with 98.7% accuracy. The proposed model is reliable and especially useful for helping asset managers assess data from a large number of transformers.