1 Introduction

The power transformer is an important piece of equipment in a power system, and any failure in it can cause an interruption of the power supply. Therefore, it is vital to detect transformer faults [1,2,3]. Dissolved gas analysis (DGA) has been widely recognized as an effective diagnostic technique for fault detection in power transformers. Analyzing the concentrations of specific gases dissolved in the transformer insulating oil yields knowledge about the condition of the transformer and allows the necessary preventive measures to be taken [4,5,6]. However, due to the variability of gas data and the nature of transformer operation, fault detection by conventional methods is not always an easy task.

To develop more accurate diagnostic tools based on DGA data, scholars have developed a number of artificial intelligence methods [7, 8]. With the development of machine learning, the fault diagnosis of power transformers has also been enhanced. To cope with the uncertainty in fault diagnosis, Huang et al. proposed a fuzzy logic-based fault diagnosis method for power transformers, where the technique can diagnose multiple faults in a transformer and quantitatively indicate the severity of each fault [9]. To reduce the redundant information of the data, Kari et al. proposed to reduce the dimensionality of the data with principal component analysis and detect power transformer faults using fuzzy C-means method [10].

However, the above methods do not take into account the dynamic nature of power transformer data; for continuously operating systems, they may fail to exploit valuable dynamic information about the process and may produce misleading monitoring results [11]. Canonical variate analysis (CVA) is widely used for dynamic processes to generate a state-space model from data by maximizing the correlation between the constructed “past” and “future” matrices [12]. To the best of the authors’ knowledge, the CVA method has not been applied to power transformer data for fault diagnosis.

Motivated by the above discussion, and considering the characteristics of CVA and the support vector machine (SVM), a new fault diagnosis method for the power transformer process is proposed by combining CVA and SVM. First, CVA extracts the dynamic features of the process data. Then, based on the extracted features, SVM is employed to classify the fault types, which addresses the non-Gaussian and nonlinear characteristics of the data. For the parameter optimization problem in SVM, this paper uses random grid search with cross-validation to improve the accuracy of the model.

This paper is organized as follows. Section 2 briefly reviews the CVA and SVM. Section 3 is devoted to describing the proposed CVA-SVM method. Section 4 presents the application of the proposed method in the real power transformer data. Finally, conclusions are given in Sect. 5.

2 Review of CVA and SVM

2.1 CVA

CVA is based on the so-called subspace identification, where process measurements are stacked to form the past and future spaces [13]. Denote \(\mathbf {x_k}\in \mathbb {R}^m\) (\(k=1,2,\dots ,N\)) as the normalized stacked vector at time instant k. For each k, the past data vector \(\textbf{x}_{p,k}\) and future data vector \(\textbf{x}_{f,k}\) are collected as

$$\begin{aligned} \textbf{x}_{p,k} =\begin{bmatrix} \textbf{x}_{k-1}\\ \textbf{x}_{k-2}\\ \vdots \\ \textbf{x}_{k-l} \end{bmatrix}\in \mathbb {R}^{ml} ,\textbf{x}_{f,k} =\begin{bmatrix} \textbf{x}_{k}\\ \textbf{x}_{k+1}\\ \vdots \\ \textbf{x}_{k+l-1} \end{bmatrix}\in \mathbb {R}^{ml} \end{aligned}$$
(1)

where l is the number of time lags. For a finite sequence with N samples, the past and future Hankel matrices \(\textbf{X}_p\) and \(\textbf{X}_f\) are constructed,

$$\begin{aligned} \begin{aligned} \begin{array}{l} \textbf{X}_p=[\textbf{x}_{p, l+1},\textbf{x}_{p, l+2},\dots , \textbf{x}_{p, l+M}]\in \mathbb {R}^{ml\times M}\\ \textbf{X}_f=[\textbf{x}_{f, l+1},\textbf{x}_{f, l+2},\dots , \textbf{x}_{f, l+M}]\in \mathbb {R}^{ml\times M} \end{array} \end{aligned} \end{aligned}$$
(2)

where \(M=N-2l+1\). The estimates of the sample covariance and cross-covariance of the past and future vector are expressed below,

$$\begin{aligned} \begin{bmatrix} \varSigma _{pp} &{} \varSigma _{pf} \\ \varSigma _{fp} &{} \varSigma _{ff} \end{bmatrix}=\frac{1}{M-1} \begin{bmatrix} \textbf{X}_p \textbf{X}^{\top }_p &{} \textbf{X}_p \textbf{X}^{\top }_f\\ \textbf{X}_f \textbf{X}^{\top }_p &{} \textbf{X}_f \textbf{X}^{\top }_f \end{bmatrix} \end{aligned}$$
(3)

In CVA, the projection matrices \(\textbf{J}\) and \(\textbf{L}\) can be computed by performing a singular value decomposition (SVD) on the Hankel matrix \(\textbf{H}\),

$$\begin{aligned} \textbf{H} =\varSigma _{ff}^{-1/2} \varSigma _{fp} \varSigma _{pp}^{-1/2} =\textbf{U}\varLambda \textbf{V}^{\top } \end{aligned}$$
(4)

Here, \(\textbf{U}\) and \(\textbf{V}\) are the left and right singular matrices of the matrix \(\textbf{H}\), respectively. \(\varLambda =\textrm{diag}(\sigma _1,\sigma _2,\dots ,\sigma _q)\) is the diagonal matrix containing all singular values, and q is the rank of \(\textbf{H}\).

From the result of the SVD, the projection matrices \(\textbf{J}\) and \(\textbf{L}\) can be calculated. The first r columns of \(\textbf{V}\) have the highest pairwise correlations with the first r columns of \(\textbf{U}\) [14]. Retaining them produces a pair of lower-dimensional matrices \(\textbf{U}_r\in \mathbb {R}^{ml\times r}\) and \(\textbf{V}_r\in \mathbb {R}^{ml\times r}\).

$$\begin{aligned} \begin{aligned} \begin{array}{l} \textbf{J} =\textbf{V}_r^{\top } \varSigma _{pp}^{-1/2} \in \mathbb {R}^{r\times ml}\\ \textbf{L} =(\textbf{I}-\textbf{V}_r\textbf{V}_r^{\top } )\varSigma _{pp}^{-1/2}\in \mathbb {R}^{ml\times ml} \end{array} \end{aligned} \end{aligned}$$
(5)

Finally, two matrices containing the state and residual vectors are derived as follows,

$$\begin{aligned} \begin{aligned} \begin{array}{l} \textbf{Z}=\textbf{J} \textbf{X}_p\in \mathbb {R}^{r\times M} \\ \textbf{E}=\textbf{L} \textbf{X}_p\in \mathbb {R}^{ml\times M} \end{array} \end{aligned} \end{aligned}$$
(6)
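Equations (1)–(6) can be sketched in a few lines of NumPy. The function below is an illustrative implementation, not the authors' code; the regularization inside the inverse square roots is an added numerical safeguard, and all names are chosen here for readability.

```python
import numpy as np

def cva_project(X, l, r, eps=1e-8):
    """Illustrative CVA feature extraction following Eqs. (1)-(6).

    X : (N, m) array of normalized measurements.
    l : number of time lags.
    r : number of retained canonical variates.
    Returns the state matrix Z (r x M) and residual matrix E (ml x M).
    """
    N, m = X.shape
    M = N - 2 * l + 1
    Xp = np.empty((m * l, M))  # past Hankel matrix, Eq. (2)
    Xf = np.empty((m * l, M))  # future Hankel matrix, Eq. (2)
    for i in range(M):
        Xp[:, i] = X[i:i + l][::-1].reshape(-1)    # x_{k-1}, ..., x_{k-l}
        Xf[:, i] = X[i + l:i + 2 * l].reshape(-1)  # x_k, ..., x_{k+l-1}

    def inv_sqrt(S):
        # Symmetric inverse square root, regularized for stability
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    Spp = Xp @ Xp.T / (M - 1)  # sample covariances, Eq. (3)
    Sff = Xf @ Xf.T / (M - 1)
    Sfp = Xf @ Xp.T / (M - 1)
    H = inv_sqrt(Sff) @ Sfp @ inv_sqrt(Spp)  # Eq. (4)
    U, s, Vt = np.linalg.svd(H)
    Vr = Vt[:r].T                            # first r right singular vectors
    J = Vr.T @ inv_sqrt(Spp)                 # Eq. (5)
    L = (np.eye(m * l) - Vr @ Vr.T) @ inv_sqrt(Spp)
    return J @ Xp, L @ Xp                    # Z and E, Eq. (6)
```

For N = 100 samples of m = 3 variables with l = 5 lags and r = 4 retained states, Z has shape (4, 91) and E has shape (15, 91), consistent with M = N − 2l + 1.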

2.2 SVM

As illustrated in Fig. 1, a Support Vector Machine aims to find an optimal hyperplane by maximally separating the margins between the hyperplane and the data [15, 16].

Fig. 1. Separation of two classes by SVM.

Given a data set \(F=\left\{ x_i, y_i\right\} ^m_{i=1}\), where m is the number of samples, \(x_i\in \mathbb {R}^n\) stands for the input vectors and \(y_i\in \{+1,-1\}\) denotes the two classes. When the two classes are linearly separable, the hyperplane \(f(x) = 0\) that separates the given data can be determined as

$$\begin{aligned} f(x)=w\cdot x+b=\sum _{k=1}^{n} w_k x_k+b=0 \end{aligned}$$
(7)

where w denotes the weight vector and b denotes the bias term. The separation hyperplane should satisfy the following constraints,

$$\begin{aligned} y_if(x_i)=y_i( w\cdot x_i+b)\ge 1,i=1,2,\dots m \end{aligned}$$
(8)
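As a quick numerical illustration of the constraint in Eq. (8), the sketch below checks feasibility of a hand-picked hyperplane on a made-up separable toy set; both the points and (w, b) are assumptions for illustration only.

```python
import numpy as np

# Toy check of the margin constraints in Eq. (8): every training point
# must satisfy y_i (w . x_i + b) >= 1 for the candidate hyperplane (w, b).
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([0.5, 0.5]), 0.0

margins = y * (X @ w + b)           # left-hand side of Eq. (8)
feasible = bool(np.all(margins >= 1))  # True for this separable toy set
```

Here the smallest margin is 1.5, so (w, b) is feasible but not necessarily the maximum-margin solution, which is what the optimization in Eq. (9) selects.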

To handle the linearly non-separable case, slack variables \(\zeta _i\) are introduced, and the optimization problem becomes,

$$\begin{aligned} \begin{aligned} \begin{array}{l} \min \quad \frac{1}{2} \left\| w \right\| ^2+C\sum _{i=1}^{m} \zeta _i\\ \mathrm {s.t.}\quad {\left\{ \begin{array}{ll}y_i( w\cdot x_i+b)\ge 1-\zeta _i \\ \zeta _i \ge 0,\quad i=1,\dots ,m \end{array}\right. } \end{array} \end{aligned} \end{aligned}$$
(9)

where C is the error penalty.

The above optimization problem is transformed into a dual quadratic optimization problem by introducing Lagrange multipliers \(\alpha _i\), i.e.

$$\begin{aligned} \begin{aligned} \begin{array}{l} \max \quad L(\alpha )=\sum _{i=1}^{m} \alpha _i-\frac{1}{2} \sum _{i,j=1}^{m} \alpha _i\alpha _j y_i y_j(x_i \cdot x_j)\\ \mathrm {s.t.}\quad \sum _{i=1}^{m} \alpha _i y_i=0,\alpha _i\ge 0,i=1,\cdots ,m \end{array} \end{aligned} \end{aligned}$$
(10)

The linear decision function is then obtained from the solution of this dual optimization problem,

$$\begin{aligned} f(x)=\textrm{sign}\left(\sum _{i=1}^{m} \alpha _i y_i (x_i \cdot x)+b\right) \end{aligned}$$
(11)

SVM can be used for nonlinear classification. By using a nonlinear mapping function, the original data x is mapped to a high-dimensional feature space in which linear classification can be performed. Then the decision function is transformed into,

$$\begin{aligned} f(x)=\textrm{sign}\left(\sum _{i=1}^{m} \alpha _i y_i K(x_i ,x)+b\right) \end{aligned}$$
(12)

In this paper, the Gaussian kernel is selected as the kernel function,

$$\begin{aligned} K(\textbf{x}_i,\textbf{x}_j)=(\phi (\textbf{x}_i)\cdot \phi (\textbf{x}_j))=\exp (-\parallel \textbf{x}_i-\textbf{x}_j\parallel ^2 /h) \end{aligned}$$
(13)

where \(\phi \) is a nonlinear mapping that maps data points to the high-dimensional feature space. To obtain a tighter boundary, an appropriate width parameter h of the Gaussian kernel function is selected.
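The soft-margin SVM of Eqs. (9)–(13) is available off the shelf; the sketch below uses scikit-learn's SVC, where gamma plays the role of 1/h in Eq. (13). The synthetic data and the parameter values here are illustrative assumptions, not the paper's DGA data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data standing in for standardized features
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C is the error penalty of Eq. (9); the RBF kernel computes
# K(xi, xj) = exp(-gamma * ||xi - xj||^2), i.e. gamma ~ 1/h in Eq. (13)
clf = SVC(C=1.0, kernel="rbf", gamma=0.5)
clf.fit(X, y)
train_acc = clf.score(X, y)
```

Smaller h (larger gamma) yields a tighter, more flexible boundary at the risk of overfitting, which is why h is tuned rather than fixed.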

3 CVA-SVM Based Fault Diagnosis

In the proposed CVA-SVM method, the space of canonical variables is divided into the state space and the residual space. The state space is then used as the input for building the SVM classifier. Finally, SVM fault classification is performed. The procedure of the CVA-SVM based fault diagnosis method is depicted in Fig. 2.

Fig. 2. Main steps of the CVA-SVM based fault diagnosis.

As shown in Fig. 2, two phases are included, offline training and online diagnosis. Specifically, the procedure of the CVA-SVM based fault diagnosis is described in detail as follows,

Offline training:

  • Step 1. Standardize the collected faulty measurements.

  • Step 2. Construct the Hankel matrices \(\textbf{X}_f\) and \(\textbf{X}_p\) with the determined time-lag l.

  • Step 3. Obtain the projection matrices \(\textbf{J}\) and \(\textbf{L}\) according to Eq. (5).

  • Step 4. Determine the state and residual matrices \(\textbf{Z}\) and \(\textbf{E}\) using Eq.(6).

  • Step 5. Build SVM model for \(\textbf{Z}\) with the determined C and h.

  • Step 6. Train the SVM classifier with the selected parameter values.

Online diagnosis:

  • Step 1. Obtain and standardize the test sample \(\textbf{x}^t_{k}\).

  • Step 2. Construct stacked vectors and calculate the state and residual vectors from \(\textbf{J}\) and \(\textbf{L}\),

    $$\begin{aligned} \begin{aligned} \begin{array}{l} \textbf{z}_k=\textbf{J} \textbf{x}^t_{p,k} \\ \textbf{e}_k=\textbf{L} \textbf{x}^t_{p,k} \end{array} \end{aligned} \end{aligned}$$
    (14)
  • Step 3. Input the state vector \(\textbf{z}_k\) into the trained SVM classifier.

  • Step 4. Obtain the diagnostic results.
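The online steps above can be condensed into a single function. This is a hedged sketch: the projection matrices J and L and the fitted classifier clf are assumed to come from the offline phase, and the function name and window convention are illustrative, not from the paper.

```python
import numpy as np

def diagnose(x_window, J, L, clf):
    """Online diagnosis sketch implementing Eq. (14).

    x_window : (l, m) array of the l most recent standardized samples,
               oldest first (so the last row is x_{k-1}).
    J, L     : projection matrices from the offline CVA step, Eq. (5).
    clf      : fitted classifier with a scikit-learn style predict().
    """
    # Stack x_{k-1}, ..., x_{k-l} top to bottom, as in Eq. (1)
    xp = x_window[::-1].reshape(-1, 1)
    z = (J @ xp).ravel()  # state vector z_k, Eq. (14)
    e = (L @ xp).ravel()  # residual vector e_k (available for monitoring)
    return clf.predict(z.reshape(1, -1))[0]
```

Only the state vector z_k is fed to the classifier here; the residual e_k is computed as in Eq. (14) and could additionally be monitored for anomalies.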

4 Case Study

To verify the effectiveness of the proposed CVA-SVM method, 188 faulty samples of dissolved gas content in power transformer oil were collected for the experiment. The data cover 6 types of fault states and 5 components of dissolved gas in oil; some samples are shown in Table 2. For computational convenience, the fault types of the dataset were coded and labeled, as shown in Table 3. The data were then split evenly into a training set and a test set, each accounting for \(50\%\) of the original data.

Applying the CVA algorithm to the gas data reduces the five variables of the transformer fault data to four. This facilitates the linear partitioning of the data by the SVM classifier and also improves the computational speed of the fault diagnosis system. In addition, to demonstrate the performance of the proposed method, it is compared with the traditional SVM method and the combined PCA-SVM algorithm. The classification results of the three methods are shown in Fig. 3.

The final classification accuracy of each model is summarized in Table 1. From Fig. 3, the SVM algorithm is less effective in identifying normal samples and has a lower detection rate for the medium-to-low temperature overheating and high temperature overheating faults; it also produces some false detections. The detection effectiveness of the PCA-SVM method is improved. Table 1 quantifies the detection performance of each model and shows that CVA-SVM achieves the highest accuracy on both the training set and the test set, giving the best classification results.

Table 1. Comparison of fault diagnosis results.
Table 2. Description of power transformer fault data.
Table 3. Codes for power transformer fault type.
Fig. 3. Diagnostic results of power transformer faults based on three methods: (a) SVM, (b) PCA-SVM, (c) CVA-SVM.

5 Conclusion

This paper proposes a power transformer fault diagnosis system based on an optimized SVM kernel function model, where the SVM model is trained on the concentrations of five gases generated by oil decomposition when a transformer fault occurs. Compared with the traditional SVM and PCA-SVM methods, the CVA-SVM method significantly improves the accuracy of transformer fault diagnosis. Because CVA extracts dynamic information from the data, the optimized SVM model is better suited to practical transformer fault diagnosis systems. Further investigation is recommended to extend power transformer fault identification methods.