
1 Introduction

Fault identification is a technology that evaluates the running state of system devices from already-collected device data, and it is therefore vital to ensuring the safe, steady, and reliable operation of industrial equipment. With the arrival of the digital information age, data-driven technology is widely used in industrial monitoring and diagnosis processes [1]. In particular, data-driven fault identification has become the mainstream approach.

Data-driven fault identification methods learn fault features directly from historical fault data collected while the system is running, and can therefore diagnose faults without an exact mathematical model of the system. However, most large-scale industrial devices are not permitted to run until a fault occurs, and devices degrade from normal to breakdown gradually, so actual target fault samples are scarce. Collecting such samples is costly, which greatly limits the application of traditional data-driven methods to fault diagnosis [2,3,4].

Recently, a large number of fault identification methods based on transfer learning have been proposed to address the lack of fault samples [5, 6]. The main idea of transfer-learning-based methods is that related, labeled data from one or more domains can serve as auxiliary knowledge to improve model performance in the target domain and to complete the target tasks there. Transfer learning methods thus have strong representation-learning ability and support end-to-end training. Lu et al. [7] proposed a deep neural network model with domain adaptation for fault diagnosis. Yang et al. [8] proposed a polynomial-kernel-induced distance measurement method for deep transfer fault diagnosis. These deep transfer fault diagnosis models essentially deal with the domain shift between the source and target domains. Researchers have also worked on the problems of unknown fault diagnosis [9] and unlabeled fault diagnosis [10]. Gupta et al. [9] proposed an early classification method for multivariate time series (MTS) based on semantic information: an attribute learning model obtains the semantic information of known fault categories, and the semantic information of the visible classes is then used to predict unknown faults. Although this method resembles zero-shot learning, it is not a zero-shot fault diagnosis problem in nature. For zero-shot fault diagnosis, Feng et al. [11] proposed an attribute transfer method based on fault descriptions: by transferring artificially defined fault descriptions to the target domain, a model trained on the source domain can be used for fault diagnosis in the target domain. However, this method requires manually specified fault attribute descriptions and an in-depth understanding of the fault data.

To address the above problems, this paper proposes a zero-shot fault identification method based on transfer learning. The method learns a latent fine-grained feature representation of the source-domain samples, uses it as a bridge for knowledge transfer, and constructs a discriminant attribute extractor for both the source and target domains. First, the method uses the PCA algorithm to extract the main fault attributes [12]. Then the fine-grained groups shared by all known fault categories are obtained from the fault data in the source domain and transferred to the representation in the target domain. The discriminant features in both the source and target domains are formed by discriminant matrices learnt from the shared fine-grained groups. Finally, these discriminant features are used for zero-shot fault diagnosis.

2 Methodology

This chapter introduces the framework of the proposed MK-CSE. When new data arrives, the framework embeds it into a reproducing kernel Hilbert space to obtain the optimal mapping kernel, and the MK-MMD between data blocks is maximized to ensure the diversity of the retained knowledge, which gives better robustness and accuracy than a single-kernel measurement.

2.1 Zero-Shot Fault Identification Definition

Assume there are k classes of fault samples; the corresponding data can be written as \(X\in R^{a\times b}\) and the labels as \(Y\in R^{a\times 1}\), where a and b are the number and dimension of the samples. Zero-shot fault identification means that the fault data to be diagnosed (the target domain) has no available samples for model training, so other related, available fault data (the source domain) is used as auxiliary knowledge to train the model instead. The source-domain data is thus defined as \(D_{O}=\{X_{O}\in R^{a_{O}\times b},Y_{O}\in R^{a_{O}\times 1}\}\), where \(x_{oi}\) and \(y_{oi}\) denote the ith sample and its label in the source domain, and \(p_{o}\) is the number of fault categories, \(y_{oi}=1,2,\dots ,p_{o}\).

Similarly, the target-domain data is defined as \(D_{T}=\{X_{T}\in R^{a_{T}\times b},Y_{T}\in R^{a_{T}\times 1}\}\), where \(x_{ti}\) and \(y_{ti}\) denote the ith sample and its label in the target domain, and \(p_{t}\) is the number of fault categories, \(y_{ti}=1,2,\dots ,p_{t}\). The zero-shot fault identification model is then described by the following formula:

$$\begin{aligned} \begin{aligned} \hat{Y_{T}}=func(X_{T}, D_{O}(X_{O}, Y_{O})), s.t.\hat{Y_{T}}=Y_{T} \end{aligned} \end{aligned}$$
(1)

where \(func(\cdot)\) is the mapping function from the source domain to the target domain.
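As a concrete, hypothetical illustration of the setting behind Formula (1), the sketch below builds stand-in source and target sets that share the TE feature space (b = 52 variables) but have disjoint fault label sets; all sample counts and class ranges are placeholders, not the paper's actual splits.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 52                                # TE process: 41 measured + 11 manipulated variables
X_O = rng.normal(size=(480, b))       # stand-in source-domain samples
y_O = rng.integers(1, 13, size=480)   # hypothetical source classes 1..12
X_T = rng.normal(size=(960, b))       # stand-in target-domain samples
y_T = rng.integers(13, 16, size=960)  # hypothetical target classes 13..15

# The zero-shot constraint behind Formula (1): no target class is seen in training.
assert set(np.unique(y_O)).isdisjoint(np.unique(y_T))
```

The key point is the assertion: unlike ordinary transfer learning, the source and target label sets have an empty intersection, so the mapping \(func(\cdot)\) must be learnt entirely from source-domain structure.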

2.2 Fault Fine-Grained Representation and Discriminant Feature Extraction

Fig. 1. Diagram of the zero-shot fault identification method based on transfer learning

The proposed zero-shot fault identification method based on transfer learning is shown in Fig. 1 and is divided into a training stage and a test stage. The blue part is the training stage, which learns the shared fine-grained representation of faults from the source-domain data and transfers it to the target domain, so that the discriminant features used for classification can be learnt. The red part is the test stage, which extracts the discriminant features according to the fine-grained representation from the training stage in order to classify faults. Because the labels in the source and target domains differ, the source-domain labels need to be projected into the corresponding labels under the fine-grained representation.

For the given large-scale source-domain fault data \(X_{O}\), PCA analysis extracts its main attributes \(P(X_{O})=X_{O}^{P}=\{f_{o1}^{p},f_{o2}^{p}, \dots ,f_{ob}^{p}\}\). Following the idea of Fig. 1, we first use \(X_{O}^{P}\) to learn the shared fine-grained attributes and discriminant features. \(X_{O}^{P}\) can then be decomposed into a shared fine-grained base and a discriminant matrix as shown:

$$\begin{aligned} \begin{aligned} \min _{G,C_{O}^{G}}\left| \left| X_{O}^{P}-GC_{O}^{G} \right| \right| _{F}^{2} \end{aligned} \end{aligned}$$
(2)

where \(G\in R^{a_{o}\times n}\) is the shared fine-grained base matrix and n is the number of atoms of the fine-grained base, which allows \(X_{O}^{P}\) to describe more fault categories. \(C_{O}^{G}\in R^{n\times b}\) is the discriminant matrix under the fine-grained base matrix G. We would normally use \(tr((C_{O}^{G})^{T}Y^{O}C_{O}^{G})\) as a constraint to strengthen the discriminant ability of the discriminant matrix; however, because the labels in the source and target domains differ greatly after mapping, the constraint is changed to \(tr((C_{O}^{G})^{T}C_{O}^{G})\). Formula (2) can thus be rewritten as:

$$\begin{aligned} \begin{aligned} \min _{G,C_{O}^{G}}\left| \left| X_{O}^{P}-GC_{O}^{G} \right| \right| _{F}^{2}+\left| \left| C_{O}^{G} \right| \right| _{2} \end{aligned} \end{aligned}$$
(3)

Through Formula (4), we obtain the discriminant features of the faults, \(X_{O}^{G}\):

$$\begin{aligned} \begin{aligned} X_{O}^{G}=X_{O}^{P}C_{O}^{G} \end{aligned} \end{aligned}$$
(4)
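To make Formulas (2)–(4) concrete, the sketch below factorizes a stand-in \(X_{O}^{P}\) into a base G and a discriminant matrix C by simple alternating least squares, with a ridge penalty standing in for the \(||C_{O}^{G}||_{2}\) term of Formula (3); the paper's actual solver is the K-SVD procedure described later, and the penalty weight and the transpose in the feature step are our assumptions, chosen so that all matrix shapes agree.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 200, 20, 35        # samples, PCA dimensions, granularity (Sect. 3 uses 20 and 35)
X = rng.normal(size=(a, b))  # stand-in for X_O^P

lam = 0.1                    # hypothetical weight for the regularizer of Formula (3)
G = rng.normal(size=(a, n))  # fine-grained base, initialized randomly
C = None
for _ in range(50):
    # C-step: ridge-regularized least squares, argmin ||X - GC||_F^2 + lam ||C||_F^2
    C = np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ X)
    # G-step: ordinary least squares, argmin ||X - GC||_F^2 with C fixed
    G = np.linalg.lstsq(C.T, X.T, rcond=None)[0].T

recon_err = np.linalg.norm(X - G @ C) / np.linalg.norm(X)
X_G = X @ C.T                # discriminant features of Formula (4), transposed so shapes agree
```

With \(C\in R^{n\times b}\) as in the paper, the product \(X_{O}^{P}(C_{O}^{G})^{T}\) yields one n-dimensional discriminant code per sample, which is the form used for classification downstream.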

The procedures above show how to obtain the shared fine-grained base matrix and the discriminant features in the source domain. In the same way, the base matrix is transferred to the target domain to obtain the target-domain discriminant matrix, as shown:

$$\begin{aligned} \begin{aligned} \min _{C_{T}^{G}}\left| \left| X_{T}^{P}-GC_{T}^{G} \right| \right| _{F}^{2}+\left| \left| C_{T}^{G} \right| \right| _{2} \end{aligned} \end{aligned}$$
(5)

Then the discriminant features are \(X_{T}^{G}=X_{T}^{P}C_{T}^{G}\).
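Assuming, as in standard dictionary learning, that the shared base is stored one atom per column in feature space so it can be reused for any number of target samples, the target coding step of Formula (5) can be sketched with scikit-learn's orthogonal matching pursuit; the sparsity level is an illustrative choice, not a value from the paper.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(2)
b, n, a_T = 20, 35, 100               # PCA dims, granularity, target samples (placeholders)
G = rng.normal(size=(b, n))           # shared fine-grained base, one atom per column
G /= np.linalg.norm(G, axis=0)        # OMP expects unit-norm atoms
X_T = rng.normal(size=(a_T, b))       # stand-in for X_T^P

# Sparse-code every target sample against the fixed shared base (Formula (5)).
C_T = orthogonal_mp(G, X_T.T, n_nonzero_coefs=5)   # shape (n, a_T)
X_T_G = C_T.T                          # per-sample discriminant codes
```

Because G is fixed, this step needs no target labels at all, which is what makes the transfer usable in the zero-shot setting.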

Optimization: Since our objective function is similar to the dictionary learning objective, we can solve it with the same procedure as dictionary learning. To obtain a more precise shared fine-grained base matrix, the proposed method uses the same optimization process as K-SVD [13]: fix one of the optimization variables \(C_{O}^{G}\) and G, use it to optimize the other, and repeat in a loop.

First, we initialize G and use the orthogonal matching pursuit algorithm to obtain \(C_{O}^{G}\). Then we update the shared fine-grained base matrix G and the discriminant matrix \(C_{O}^{G}\) atom by atom: \(g_{n}\) is the nth column vector of the shared fine-grained base matrix, and \(c_{n}^{T}\) is the nth row vector of the discriminant mapping matrix. Formula (2) can be rewritten as:

$$\begin{aligned} \begin{aligned} \left| \left| X_{O}^{P}-GC_{O}^{G} \right| \right| _{F}^{2}&=\left| \left| X_{O}^{P}-\sum _{i=1}^{n}g_{i}c_{i}^{T} \right| \right| _{F}^{2}\\&=\left| \left| E_{n}-g_{n}c_{n}^{T} \right| \right| _{F}^{2} \end{aligned} \end{aligned}$$
(6)

where \(E_{n}=X_{O}^{P}-\sum _{i\ne n}g_{i}c_{i}^{T}\). The optimization of the shared fine-grained base matrix G is then described as:

$$\begin{aligned} \begin{aligned} \min _{g_{n},c_{n}^{T}}\left| \left| E_{n}-g_{n}c_{n}^{T} \right| \right| _{F}^{2} \end{aligned} \end{aligned}$$
(7)

In Formula (7), the optimal \(g_{n}\) and \(c_{n}^{T}\) are obtained by SVD as the best rank-1 approximation of \(E_{n}\). However, the \(c_{n}^{T}\) obtained by directly applying SVD to \(E_{n}\) is not sparse, so we extract the positions at which the current \(c_{n}^{T}\) is nonzero and restructure a reduced matrix \(E_{n}^{'}\). We then decompose it by SVD as \(E_{n}^{'}=L\Sigma R^{T}\), take the first column of the left singular matrix L as \(g_{n}\), and take the product of the first singular value and the first column of the right singular matrix R as \(c_{n}^{T}\). After obtaining the shared fine-grained base matrix G from the source-domain data, \(C_{T}^{G}\) can be obtained with the orthogonal matching pursuit algorithm. The full optimization procedure is shown in Algorithm 1.
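The atom-by-atom update described above can be sketched as follows, using the standard K-SVD convention of samples as columns (X is b-by-a, G is b-by-n, C is n-by-a); this is a generic K-SVD step under those assumptions, not the paper's exact implementation.

```python
import numpy as np

def ksvd_atom_update(X, G, C, k):
    """Update atom g_k and its code row c_k^T by a rank-1 SVD of the residual,
    restricted to the samples that currently use the atom (K-SVD [13])."""
    idx = np.flatnonzero(C[k, :])          # samples whose code uses atom k
    if idx.size == 0:
        return G, C                        # unused atom: nothing to update
    # residual E'_n: remove all atoms' contributions, then add atom k's back
    E = X[:, idx] - G @ C[:, idx] + np.outer(G[:, k], C[k, idx])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    G[:, k] = U[:, 0]                      # new atom: first left singular vector
    C[k, idx] = s[0] * Vt[0, :]            # new codes: scaled first right singular vector
    return G, C
```

Restricting the SVD to the active columns is what preserves the sparsity pattern of \(c_{n}^{T}\): samples that did not use the atom keep a zero coefficient, and the rank-1 approximation never increases the reconstruction error.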

Algorithm 1

3 Experiments

To further evaluate the proposed method, we run experiments on the TE (Tennessee-Eastman) process dataset. The detailed experimental settings are given below.

Experiment Dataset: The TE process dataset is simulated from a real industrial process. The process consists of five major subsystems: a reactor, a condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. The dataset contains 21 fault types, each described by 41 measured variables and 11 process control variables. The TE data is divided into a training set and a test set. Each fault type in the training set contains 480 samples, for a total of 21\(\,\times \,\)480 training samples; each fault type in the test set contains 960 samples, for a total of 21\(\,\times \,\)960 test samples.

Experiment Settings: First, the proposed method extracts the relevant fault features by PCA, obtaining the main features of the source-domain faults \(X_{O}^{P}\) and the target-domain faults \(X_{T}^{P}\); for each fault attribute, 20 main features are extracted from the 52 original variables. Second, the granularity is set to 35, and both the shared fine-grained base and the source-domain discriminant matrix are learnt from the source-domain data. Third, the shared fine-grained base is used to obtain the target-domain discriminant matrix, and finally the discriminant features of the source and target domains. Because the labels in the source and target domains differ, the proposed method uses a linear mapping function to map the source-domain labels \(y_{O}\) and target-domain labels \(y_{T}\) to labels \(z_{O_{k}}, z_{T_{k}}=\left\{ l_{1}, l_{2},\dots , l_{k} \right\} \) that carry the same information. The linear mapping function is as follows:

$$\begin{aligned} \begin{aligned} y_{O}=wz_{O} \end{aligned} \end{aligned}$$
(8)

The mapping representation of the target-domain labels is then \(y_{T}=wz_{T}\). Since zero-shot fault identification assumes there are no available training samples for the fault categories to be diagnosed, the experimental settings in this paper follow those of [11], and only the training-set part is used. Because the last 6 of the 21 TE fault classes are rarely described in the dataset, only the first 15 fault classes are used: 80% of them for training and 20% for testing. The experiments are divided into 5 groups, each with 12 fault classes for training and 3 fault classes for testing. See Table 1 for the specific categories.
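The grouping protocol can be sketched as below; the actual class assignments are those of Table 1, so the contiguous split used here is only a stand-in that reproduces the 12/3 structure.

```python
import numpy as np

classes = np.arange(1, 16)            # the first 15 TE fault classes
# five groups, each holding out 3 classes for zero-shot testing
groups = [(np.setdiff1d(classes, test), test)
          for test in np.array_split(classes, 5)]

for train, test in groups:
    assert len(train) == 12 and len(test) == 3
    assert set(train).isdisjoint(test)   # held-out classes never appear in training
```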

In addition, this paper adopts four algorithms for fault classification: deep belief network (DBN) [14], support vector machine (SVM) [15], random forest (RF) [16], and naive Bayes (NB) [17]. The DBN has two layers of 100 units, the number of epochs is 100, the batch size is 120, and the learning rate is set to 1. The linear SVM (LSVM) parameters are set to (‘-s 0 -t 0 -c 1’), and the number of decision trees for RF is set to 50. We report the accuracy of the five experiment groups. The fault classification accuracy is computed as follows:

$$\begin{aligned} \begin{aligned} Acc=\frac{\hat{W}_{T} }{W_{T}} \end{aligned} \end{aligned}$$
(9)

where \(\hat{W}_{T}\) is the number of correctly classified samples and \(W_{T}\) is the total number of target fault samples in the test stage.

Results and Analysis: The diagnostic results of the five groups of experiments under the four algorithms are shown in Table 2. The DBN classifier performs best on group B at 72.64%; the LSVM classifier performs best on groups A, C, and E at 84.31%, 56.11%, and 44.03%; and the RF classifier performs best on group D at 63.68%. Because the salient characteristics of each fault differ, the highest fault identification accuracy of the four classifiers varies from 44.03% to 88.47%, so there is still room for improvement.

Table 1. Experiment groups
Table 2. Experiment results of 4 algorithms on 5 groups
Table 3. Comparison between our method and other SOTA methods

We also compare the proposed method with DAP [18], IAP [19], SJE [20], ESZSL [21], and Feng et al. [11]; the results are shown in Table 3. The proposed method achieves the best identification results on groups A, B, C, and D, and its average accuracy over the five groups is the highest at 64.99%. Table 3 shows that the shared fine-grained base transfer proposed in this paper can effectively solve the zero-shot fault identification problem.

The proposed method centers on transferring the shared fine-grained base matrix from the source domain, so we also examine the identification results of the representation transfer at different granularities. Comparison tests are run at five granularities, \(k=[20, 25, 30, 35, 40]\), with results shown in Fig. 2. The overall trend is that the larger the granularity, the more accurate the identification, which matches the practical theoretical interpretation: within a certain range, describing the fault features with more dimensions improves the classification.

Fig. 2. Experiment results under different granularities

4 Conclusion

In practical applications of large-scale complex devices, fault samples are usually scarce, hard to collect, and costly in human and material resources. For this problem, we propose a zero-shot fault identification method based on transfer learning. With no available target samples, the method transfers the fine-grained representation of fault data from other related domains to the target domain, and thereby diagnoses faults in the target domain. Compared with existing zero-shot identification methods, our method extracts discriminant features from a shared fine-grained base representation: the base is flexible in dimension, and the fine-grained description matrices are obtained automatically by the algorithm, without manually specified fine-grained descriptions. On the TE process dataset, the average identification accuracy is 64.99%, the highest among the compared methods, and the analyses of the different experiments also demonstrate the effectiveness of the proposed method.