
1 Introduction

With the increasing precision and operational complexity of modern industry, machinery fault diagnosis, which safeguards the operation of mechanical equipment, has attracted more and more attention. At present, data-driven methods play an important role in mechanical fault diagnosis. The vibration signal contains a wealth of information indicating the condition of machinery; therefore, methods based on vibration signal processing are widely used in fault diagnosis [1]. Time-frequency characteristics of the vibration signal, extracted by methods such as the Fourier transform or the wavelet transform, are used to establish the initial feature space, and a classifier is then constructed by further processing of this space [1]. Since the initial feature space is generally high-dimensional, dimensionality reduction is applied to obtain a low-dimensional feature representation. The frequently used methods, such as principal component analysis [2] and linear discriminant analysis [3], are linear; for nonlinear signals, linear methods are not the best choice.

Manifold learning has been applied in many fields since three seminal articles were published in Science [4–6]. Manifold learning is a nonlinear dimensionality reduction method used to extract the low-dimensional manifold structure embedded in a high-dimensional space, and it therefore offers a new way to diagnose machinery faults. Generally speaking, operation data of machinery in the same condition lie on the same manifold, while data from different conditions lie on different manifolds [7]. Based on this assumption, manifold learning can be applied to machinery fault diagnosis.

Typical manifold learning methods include Locally Linear Embedding (LLE) [5] and Isometric Mapping (ISOMAP) [6]. A proper choice of neighborhood size is critical in manifold learning. Using a fixed neighborhood size, as in LLE and ISOMAP, is inappropriate in engineering applications because the curvature of the manifold and the density of the sample points may differ across regions of the manifold. Sparse Manifold Clustering and Embedding (SMCE) was proposed in [8]; in SMCE, the neighborhood size is chosen automatically by solving a sparse optimization problem, which makes SMCE more suitable for engineering applications. In this paper, we briefly introduce the principle of SMCE and discuss its application to feature extraction and diagnosis of mechanical faults.

2 The principle of SMCE

SMCE performs manifold clustering and dimensionality reduction for multiple nonlinear manifolds simultaneously. The key difference between SMCE and other manifold learning methods is that SMCE automatically selects neighbors by solving a sparse optimization problem; a weight matrix is then established by computing weights between each point and its neighbors. Finally, spectral clustering [9] and the Laplacian eigenmap [10] can be applied for manifold clustering and dimensionality reduction, respectively.

Given a set of N samples \(\{\varvec{x}_i\in \mathbb {R}^D\}_{i=1}^N\) lying on n different manifolds \(\{\mathcal {M}_l\}^n_{l=1}\) with intrinsic dimensions \(\{d_l\}^n_{l=1}\), consider for each sample point \(\varvec{x}_i\in \mathcal {M}_l\) the smallest ball \(\mathcal {B}_i\subset \mathbb {R}^D\) that contains the \(d_l+1\) nearest neighbors of \(\varvec{x}_i\). Namely, an affine subspace of intrinsic dimension \(d_l\) is spanned by these \(d_l+1\) neighbors. This is the fundamental assumption of SMCE. That is,

$$\begin{aligned} ||\sum _{j\in \mathcal {B}_i}k_{ij}(\varvec{x}_j-\varvec{x}_i)||_2\leqslant \epsilon \quad and \quad \sum _{j\in \mathcal {B}_i}k_{ij}=1 \end{aligned}$$
(1)

where \(\varvec{x}_j\in \mathcal {B}_i\) and \(j\ne i\), \(\epsilon \geqslant 0\) is the upper bound of the error, and \(k_{ij}\) are coefficients. Since the diameter of \(\mathcal {B}_i\) is hard to know, Eq. (1) cannot be solved directly. In LLE and ISOMAP, which are local and global manifold learning algorithms respectively, the neighborhood size is selected according to empirical rules. In SMCE, this challenge is addressed by a sparse optimization problem. Consider a sample point \(\varvec{x}_i\) and the sample set \(\{\varvec{x}_j|j\ne i, j=1,2,\cdots ,N\}\); the column vector \(\varvec{c}_i\) of dimension \(N-1\) is obtained by solving Eq. (2).

$$\begin{aligned} ||[\varvec{x}_1-\varvec{x}_i,\cdots ,\varvec{x}_N-\varvec{x}_i]\varvec{c}_i||_2\leqslant \epsilon \quad and \quad \varvec{1}^T\varvec{c}_i=1 \end{aligned}$$
(2)

where the solution \(\varvec{c}_i\) is sparse, and its non-zero elements correspond to sample points lying on the same manifold as \(\varvec{x}_i\). This is the key difference between SMCE and other manifold learning algorithms.

In the case of a densely sampled set, this affine subspace coincides with the \(d_l\)-dimensional tangent space of \(\mathcal {M}_l\) at \(\varvec{x}_i\). However, the sample points corresponding to the non-zero elements of \(\varvec{c}_i\) may not be the closest points to \(\varvec{x}_i\) in \(\mathcal {M}_l\). Therefore, the vectors \(\{\varvec{x}_j-\varvec{x}_i\}_{j\ne i}\) are normalized; let

$$\begin{aligned} \varvec{X}_i :=\left[ \frac{\varvec{x}_1-\varvec{x}_i}{||\varvec{x}_1-\varvec{x}_i||_2},\cdots ,\frac{\varvec{x}_N-\varvec{x}_i}{||\varvec{x}_N-\varvec{x}_i||_2}\right] \in \mathbb {R}^{D\times (N-1)} \end{aligned}$$
(3)

Thus Eq. (2) takes the following form:

$$\begin{aligned} ||\varvec{X}_i\varvec{c}_i||_2\leqslant \epsilon \quad and \quad \varvec{1}^T\varvec{c}_i=1 \end{aligned}$$
(4)

In this way, among all the solutions of (4), the one that uses a few of the closest neighbors of \(\varvec{x}_i\) is sought by considering the following \(\ell ^1\) optimization problem:

$$\begin{aligned} min\ ||\varvec{Q}_i\varvec{c}_i||_1 \quad s.t.\ ||\varvec{X}_i\varvec{c}_i||_2\leqslant \epsilon ,\quad \varvec{1}^T\varvec{c}_i=1 \end{aligned}$$
(5)

where the proximity-inducing matrix \(\varvec{Q}_i\), a positive-definite diagonal matrix, is defined by its diagonal entries

$$\begin{aligned} \varvec{Q}_i(j,j)=\frac{||\varvec{x}_j-\varvec{x}_i||_2}{\sum _{t\ne i}||\varvec{x}_t-\varvec{x}_i||_2}\in (0,1] \end{aligned}$$
(6)

Another optimization problem which is related to (5) by the method of Lagrange multipliers is

$$\begin{aligned} min\ \lambda ||\varvec{Q}_i\varvec{c}_i||_1+\frac{1}{2}||\varvec{X}_i\varvec{c}_i||_2^2 \quad s.t. \quad \varvec{1}^T\varvec{c}_i=1 \end{aligned}$$
(7)

where the parameter \(\lambda \) sets the trade-off between the sparsity of the solution and the affine reconstruction error. Solving this optimization problem for each sample point yields the solution, written \(\varvec{c}_i^T:=[c_{i1},\cdots ,c_{iN}]\) with \(c_{ii}=0\). The weight vector \(\varvec{w}_i^T=[w_{i1},\cdots ,w_{iN}]\in \mathbb {R}^N\) is then defined as

$$\begin{aligned} w_{ii}=0, \quad w_{ij}:=\frac{c_{ij}/||\varvec{x}_j-\varvec{x}_i||_2}{\sum _{t\ne i}c_{it}/||\varvec{x}_t-\varvec{x}_i||_2},\quad j\ne i \end{aligned}$$
(8)
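As a concrete illustration, the neighbor-selection step of Eqs. (3), (6) and (7) can be sketched in Python on synthetic data. The projected-subgradient loop below is only a crude stand-in for the convex solver used in practice, and all names and data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 20
X = rng.standard_normal((D, N))          # synthetic samples, one per column
i = 0                                    # work on the first sample point

# Eq. (3): normalized difference vectors; Eq. (6): proximity weights
diffs = np.delete(X - X[:, [i]], i, axis=1)
norms = np.linalg.norm(diffs, axis=0)
Xi = diffs / norms                       # D x (N-1), unit columns
Q = np.diag(norms / norms.sum())         # diagonal proximity-inducing matrix

# Eq. (7): min lam*||Q c||_1 + 0.5*||Xi c||_2^2  s.t.  1^T c = 1,
# solved here by a crude projected-subgradient loop (illustration only)
lam = 10.0
c = np.full(N - 1, 1.0 / (N - 1))        # feasible uniform start

def objective(c):
    return lam * np.abs(Q @ c).sum() + 0.5 * np.linalg.norm(Xi @ c) ** 2

f0 = objective(c)
eta = 1e-3
for _ in range(5000):
    g = lam * Q @ np.sign(Q @ c) + Xi.T @ (Xi @ c)   # subgradient
    c -= eta * g
    c += (1.0 - c.sum()) / c.size        # project back onto 1^T c = 1
```

A dedicated \(\ell ^1\) solver (e.g. an ADMM or interior-point method) would be used on real data; the loop above merely shows the structure of the problem.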

A similarity graph \(G=(V,E)\) whose nodes represent the sample points is then built. Node i, corresponding to \(\varvec{x}_i\), is connected to node j, corresponding to \(\varvec{x}_j\), by an edge whose weight equals \(|w_{ij}|\). Each node i connects only to a few other nodes, its sparse neighbors, which correspond to the neighbors of \(\varvec{x}_i\) on the same manifold. Hence, the similarity matrix \(\varvec{W}\) is constructed from the weight vectors \(\varvec{w}_i\) and reflects the distances from the sparse neighbors to \(\varvec{x}_i\). The samples can be clustered by applying spectral clustering to \(\varvec{W}\) [9], or the original samples can be reduced in dimension with the Laplacian eigenmap [10], after which a classifier is constructed.
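The graph-construction and embedding steps can be sketched as follows, with hypothetical sparse coefficients standing in for the output of the \(\ell ^1\) stage: the weights of Eq. (8) are assembled into W, and the smallest nontrivial eigenvectors of the graph Laplacian give the Laplacian-eigenmap embedding:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
pts = rng.standard_normal((N, 2))        # synthetic sample points

# Hypothetical sparse coefficients: pretend the l1 step kept 3 neighbors
C = np.zeros((N, N))
for i in range(N):
    d = np.linalg.norm(pts - pts[i], axis=1)
    d[i] = np.inf
    nbrs = np.argsort(d)[:3]
    C[i, nbrs] = 1.0 / d[nbrs]

# Eq. (8): distance-weighted, row-normalized weight matrix
W = np.zeros_like(C)
for i in range(N):
    d = np.linalg.norm(pts - pts[i], axis=1)
    mask = C[i] != 0
    vals = C[i, mask] / d[mask]
    W[i, mask] = vals / vals.sum()

# Similarity graph -> Laplacian eigenmap embedding
A = np.abs(W) + np.abs(W).T              # symmetrized similarity matrix
L = np.diag(A.sum(axis=1)) - A           # graph Laplacian
evals, evecs = np.linalg.eigh(L)
embedding = evecs[:, 1:3]                # drop the constant eigenvector
```

Spectral clustering would instead run k-means on these leading eigenvectors; both steps consume the same matrix W.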

3 The Method of Feature Extraction Based on SMCE

The mathematical principles of SMCE suit engineering applications because the high-dimensional Euclidean space in SMCE corresponds to the feature space spanned by the feature vectors of the machinery. In practice, the manifolds embedded in the feature space have the following characteristics: the density of sample points differs across regions of the manifold, and the curvature also differs from point to point. Therefore, a fixed neighborhood size is inappropriate for extracting the manifold structure within mechanical operation data.

The initial feature space must be constructed before nonlinear dimensionality reduction by manifold learning. In engineering applications, the initial feature space can be spanned by feature vectors composed of time-domain characters, frequency-domain characters, or both [1]. In this paper, we combine time-domain and frequency-domain characters of the vibration signals into an initial feature vector. The time-domain characters are listed in Table 1, and the sub-band energy method is employed to establish the frequency-domain feature vector.

Table 1. Characters in time domain
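Table 1 itself is not reproduced here; as an illustration, the snippet below computes a typical set of time-domain characters of a vibration signal. The particular set of ten statistics is my own choice and is not claimed to match the paper's twelve:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)            # stand-in vibration signal

mean = x.mean()
std = x.std()
rms = np.sqrt(np.mean(x ** 2))           # root mean square
peak = np.max(np.abs(x))
skewness = np.mean((x - mean) ** 3) / std ** 3
kurtosis = np.mean((x - mean) ** 4) / std ** 4
crest = peak / rms                       # crest factor
clearance = peak / np.mean(np.sqrt(np.abs(x))) ** 2   # clearance factor
shape = rms / np.mean(np.abs(x))         # shape factor
impulse = peak / np.mean(np.abs(x))      # impulse factor

features = np.array([mean, std, rms, peak, skewness,
                     kurtosis, crest, clearance, shape, impulse])
```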

The energy distribution over different frequency sub-bands of the vibration signal indicates the operating condition of the machinery; thus the sub-band energy vector can be regarded as the frequency-domain feature vector. The vibration signal is divided into m sub-bands of constant bandwidth, and the frequency range of the i-th sub-band is given by (9).

$$\begin{aligned} f_i=f_{i-1}+\varDelta f \end{aligned}$$
(9)

where \(i=1,2,\cdots ,m\), \(f_i\) and \(f_{i-1}\) are the upper and lower limiting frequencies of the i-th sub-band, and \(\varDelta f\) is the bandwidth of each sub-band. After the Fast Fourier Transform (FFT), the number of frequency lines \(n_i\) in each sub-band is equal, that is,

$$\begin{aligned} n_i=\frac{f_i-f_{i-1}}{\xi } \end{aligned}$$
(10)

where \(\xi \) is the frequency resolution. Hence, all frequency lines can be divided into m equal partitions corresponding to the m sub-bands. Supposing that the frequency lines of the i-th sub-band have sequence indices from \(k_{i-1}\) to \(k_i\), the energy of the i-th sub-band is obtained by (11) according to Parseval's theorem.

$$\begin{aligned} E_i=\sum ^{k_{i}}_{k=k_{i-1}}|A_k|^2 \varDelta f \end{aligned}$$
(11)

where \(A_k\) is the amplitude of the frequency line with sequence index k. In this paper, the vibration signal is divided into 16 equal sub-bands, yielding a feature vector \(\varvec{y}_f\) of dimension 16, which is normalized by (12). In a similar way, the time-domain feature vector listed in Table 1 is normalized, giving a normalized time-domain feature vector \(\varvec{Y}_t\) of dimension 12. The combined feature vector \(\varvec{Y}=[\varvec{Y}_f,\varvec{Y}_t]\) of dimension 28 is then used to span the initial feature space.

$$\begin{aligned} \varvec{Y}_f=\frac{\varvec{y}_f-min(\varvec{y}_f)}{max(\varvec{y}_f)-min(\varvec{y}_f)} \end{aligned}$$
(12)
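The sub-band energy computation of Eqs. (9)–(12) can be sketched on a synthetic signal; the 48 kHz sampling rate and 2048-point length match the experiment later in the paper, while the 3 kHz test tone is an arbitrary assumption for illustration:

```python
import numpy as np

fs = 48_000                              # sampling frequency, Hz
n = 2048                                 # sample length
t = np.arange(n) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 3000 * t) + 0.1 * rng.standard_normal(n)

spec = np.fft.rfft(x)
amp = np.abs(spec) / n                   # line amplitudes A_k
m = 16                                   # number of sub-bands
df = fs / n                              # frequency resolution (xi)

# keep a line count divisible by m, then sum |A_k|^2 * df per band, Eq. (11)
n_lines = (len(amp) // m) * m
E = (amp[:n_lines] ** 2).reshape(m, -1).sum(axis=1) * df

Yf = (E - E.min()) / (E.max() - E.min()) # min-max normalization, Eq. (12)
```

With 1024 usable lines, each band spans 64 lines (1.5 kHz), so the 3 kHz tone lands in band index 2.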

The presented method of machinery fault diagnosis based on manifold learning proceeds in the following stages.

(1) For each vibration signal, a feature vector of dimension 28 is constructed.

(2) The similarity matrix W is built with an appropriate parameter \(\lambda \). Since there is no mature theory to guide the choice of the best \(\lambda \), it is found by an equal-step search: the diagnosis accuracy differs under different \(\lambda \), and we fix the \(\lambda \) corresponding to the best accuracy.

(3) The sample data are clustered by applying spectral clustering [9] to W; samples from the same machinery condition fall into one set and samples from different conditions into different sets. This is the unsupervised fault diagnosis approach. Alternatively, we apply the Laplacian eigenmap [10] for dimensionality reduction to obtain a low-dimensional representation of the original high-dimensional feature space, and a support vector machine (SVM) is then employed to classify the low-dimensional space. Because the SVM is a supervised classifier, this approach is named the supervised method of fault diagnosis.

4 Experiment

The Bearing Data Center (BDC) [11] of Case Western Reserve University has been used by many researchers [12, 13]. The validity of the BDC has been established, and it has become a standard database of vibration signals of bearing defects. In this paper, the deep groove ball bearing SKF 6205-2RS JEM with 9 rolling elements, manufactured by SKF of Sweden, is employed as the analysis object. In the experiment, the shaft speed is 1750 RPM and the sampling frequency is 48 kHz. The vibration signals collected by an accelerometer are shown in Fig. 1. Considering four conditions of the bearing, namely normal, inner-race defect, ball defect, and outer-race defect, 200 samples were chosen for each condition. Thus there are 800 samples in all, each of length 2048. The feature space is spanned by the 800 feature vectors, denoted \(\varvec{x}_1,\varvec{x}_2,\cdots ,\varvec{x}_{800}\in \mathbb {R}^{28}\), and the manifold structure in the feature space is extracted by the method described previously. The similarity matrix \(W\in \mathbb {R}^{800\times 800}\) is obtained with a fixed parameter \(\lambda \). Unsupervised fault diagnosis is then achieved by spectral clustering [9]. For supervised fault diagnosis, the Laplacian eigenmap [10] is applied to W, and the original feature space is mapped to a three-dimensional space. The samples of each condition in the three-dimensional space are then divided into two equal partitions: one half is taken as training samples, and the remainder as testing samples. The problem we face is to classify 4 conditions, but an SVM can separate only 2 classes at a time. A general way to solve this is to decompose the 4-class problem into binary problems: each time, one class of the training samples is taken as the positive class and the remaining classes as the negative class, so the 4-class classification is done by training the SVM 4 times.
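The one-vs-rest decomposition just described can be sketched with scikit-learn (assumed available); the well-separated 2-D blobs below merely stand in for the embedded bearing features, and the class count matches the four bearing conditions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 4 synthetic classes standing in for the 4 bearing conditions
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(np.arange(4), 50)

# one binary SVM per class: that class vs. the rest
scores = np.zeros((len(X), 4))
for k in range(4):
    clf = SVC(kernel="linear").fit(X, (y == k).astype(int))
    scores[:, k] = clf.decision_function(X)

pred = scores.argmax(axis=1)             # highest margin wins
acc = (pred == y).mean()
```

In scikit-learn this decomposition is also available ready-made via `sklearn.multiclass.OneVsRestClassifier`.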
Since the parameter \(\lambda \) plays an important role in diagnosis, we let \(\lambda \) traverse [10, 200] with a step of 5. The diagnosis accuracy of spectral clustering and of the support vector machine under different \(\lambda \) is shown in Fig. 2.

Fig. 1. The original signal of the bearing. (a) Normal condition. (b) Inner-race defect. (c) Ball defect. (d) Outer-race defect.

Fig. 2. The diagnosis accuracy with different \(\lambda \)

Fig. 3. Three-dimensional representation of the original feature space

As Fig. 2 shows, the best diagnosis accuracy with spectral clustering reaches 97.125 % when \(\lambda \) is set to 30. Meanwhile, a three-dimensional representation of the original 28-dimensional feature space is obtained under each \(\lambda \). The best diagnosis accuracy with the SVM reaches 98.75 % when \(\lambda \) is set to 130; the corresponding three-dimensional space is shown in Fig. 3. Spectral clustering performs the classification using only the geometric structure of the space spanned by the sample points, without sample labels; this is why it is called unsupervised diagnosis. Conversely, the SVM classifies with sample labels, so the SVM-based diagnosis is called supervised diagnosis. From Fig. 2, the diagnosis accuracy of the SVM is higher than that of spectral clustering, and both remain stable under different \(\lambda \): the difference between the maximum and minimum accuracy is 0.625 % for spectral clustering and 1 % for the SVM. In other words, the parameter \(\lambda \) is easily fixed, which demonstrates that the presented method is suitable for engineering applications.

To highlight the distinguishing characteristics of the presented method, fault diagnosis was also performed with LLE [5] in place of SMCE, with the classifier again established by the SVM. The diagnosis accuracy is shown in Fig. 4; it varies over a large range with different parameters. The comparison of diagnosis accuracy between SMCE and LLE is shown in Table 2. The diagnosis method with SMCE is more stable than that with LLE: a satisfactory result is reached with almost arbitrary parameters under SMCE, whereas the result with LLE is strongly affected by the parameters. Therefore, the convenience of parameter choice is one of the advantages of the presented method.

Fig. 4. The diagnosis accuracy with LLE and SVM

Table 2. The comparison of accuracy between SMCE and LLE

5 Conclusion

(1) The method of feature extraction based on SMCE can extract the low-dimensional manifold structure, embedded in the high-dimensional feature space, that indicates the nature of the mechanical condition.

(2) A feature vector consisting of time-domain characters and sub-band energies can be used to diagnose the condition of machinery.

(3) The diagnosis accuracy of the presented method is only slightly affected by its parameters; thanks to this convenience of parameter choice, it is well suited to engineering applications.