
1 Introduction

In today's global marketplace, manufacturing companies strive to cut costs and improve product quality to remain competitive. The health and availability of rotating machines in industry have a direct effect on production schedules, product quality, and production costs. Unforeseen machine failures may lead to unexpected downtime, accidents, and injuries. Condition Monitoring (CM) of rotating machines can play an important role in addressing these issues by reducing unplanned downtime, avoiding machine breakdown, and improving reliability and safety. Various components of a rotating machine can be monitored, e.g., bearings, shafts, and gearboxes. Of these, rolling element bearings are amongst the most important to monitor, as their failures may lead to more significant failures in the machine. CM techniques for rotating machinery monitor measurable signals, e.g., vibration and acoustic signals, that can be used individually or in combination to identify changes in machine condition [1]. Consequently, Condition Based Maintenance (CBM) can be scheduled, or other actions can be taken, to prevent machine breakdowns.

In general, the overall framework of rotating machine CM contains three main steps: (1) data acquisition; (2) signal processing; and (3) machine condition identification, as shown in Fig. 1. In the data acquisition step, vibration signals are widely used since various characteristic features can be observed from them; they can be acquired through vibration sensors, e.g., velocity sensors and accelerometers [2]. In the second step, the collected signals are analysed using signal processing techniques to extract features that represent the health condition of the monitored rotating machine. In the third step, an algorithm is used to identify the condition of the rotating machine.

Fig. 1. The overall framework of machine condition monitoring.

Sampling theorems, including the Shannon-Nyquist theorem, are at the core of current sensing systems. However, the Nyquist sampling rate, which is at least twice the highest frequency contained in the signal, is high for some emerging applications, e.g., industrial rotating machines, and results in a huge amount of measured data that must be transmitted, stored, and processed. Likewise, in some applications that involve wideband signals, it is often very costly to collect samples at the necessary rate.

Rather than processing the raw vibration signals directly, the common methodology is to identify a lower-dimensional feature space that can represent the large amount of acquired vibration data while retaining the important information about machine condition. Normally, vibration signals are analysed in three domains: the time domain, the frequency domain, and the time-frequency domain [3, 4]. Time-domain techniques extract features from the raw vibration signals using statistical parameters, e.g., impulse factor, skewness, kurtosis, crest factor, root mean square, peak-to-peak value, etc. [5]. Frequency-domain techniques, e.g., baseband auto-spectral density, linear frequency spectrum, and phase-averaged linear spectra, which can be produced by Fourier transforming the time series, can reveal substantial diagnostic information from frequency features that are not easily observed in the time domain [6]. Time-frequency domain techniques are used for non-stationary signals, which are very common when machinery faults occur. In the literature, several time-frequency techniques have been proposed and applied to vibration-based machinery fault diagnosis, e.g., Wavelet Transform (WT), Hilbert-Huang Transform (HHT) [7], Short Time Fourier Transform (STFT) [8], Empirical Mode Decomposition (EMD) [9], etc. Furthermore, Spectral Kurtosis and the Kurtogram have been investigated and effectively used with both vibration and acoustic emission signals for bearing defect identification [10].
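As an illustration of the time-domain statistical parameters listed above, the following is a minimal Python sketch using their common textbook definitions. The function name `time_domain_features` and the exact normalisations (population standard deviation, non-excess kurtosis) are assumptions made for this example and are not taken from [5].

```python
import numpy as np

def time_domain_features(x):
    """Return common statistical features of a 1-D vibration signal.

    Definitions are the usual textbook ones (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()                          # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))         # root mean square
    peak = np.max(np.abs(x))               # largest absolute amplitude
    centred = x - mean
    return {
        "rms": rms,
        "peak_to_peak": x.max() - x.min(),
        "crest_factor": peak / rms,
        "impulse_factor": peak / np.mean(np.abs(x)),
        "skewness": np.mean(centred ** 3) / std ** 3,
        "kurtosis": np.mean(centred ** 4) / std ** 4,  # equals 3 for Gaussian data
    }

# Usage on a synthetic sine wave (five full periods, 1000 samples).
features = time_domain_features(np.sin(2 * np.pi * 5 * np.arange(1000) / 1000))
```

Impulsive bearing faults tend to raise the kurtosis and crest factor relative to a healthy signal, which is one reason these parameters are popular as features.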

As far as the problem of the high dimensionality of the acquired vibration signals is concerned, dimensionality reduction techniques are employed as a data pre-processing stage or as part of the data analysis. Reducing the dimension of vibration signals is useful as it improves computational efficiency and may improve the accuracy of the analysis. Normally, a low-dimensional representation of vibration signals can be generated by selecting a subset of the original features or by transforming them into a new, reduced set of features. For instance, Sakthivel et al. [11] transformed statistical features extracted from vibration signals using several dimensionality reduction techniques, including Principal Component Analysis (PCA), Kernel PCA, Maximum Variance Unfolding (MVU), Local Tangent Space Analysis (LTSA), Diffusion Maps (DM), Laplacian Eigenmaps (LE), and Local Linear Embedding (LLE); the transformed features were then classified using Decision Tree (DT), Bayes Net (BN), and Naïve Bayes (NB) classifiers. Wang et al. [12] applied a PCA-based technique to defined time-frequency statistical features of rolling bearing vibration signals and evaluated fault diagnosis using a fuzzy C-means (FCM) model. Widodo et al. [13] studied the application of Independent Component Analysis (ICA) and Support Vector Machines (SVMs) to detect and diagnose induction motor faults using vibration signals. Ciabattoni et al. [14] presented a novel Linear Discriminant Analysis (LDA) based algorithm for dimension reduction of fault vibration data and for fault detection.

Compressive Sampling (CS) [15] is a technique that supports sampling below the Nyquist rate and shows great potential for reconstructing high-dimensional signals from fewer measurements using various signal reconstruction techniques, e.g., Compressive Sampling Matching Pursuit (CoSaMP) [16]. In recent years, several publications have documented the use of CS in machine fault diagnosis. For example, Wong et al. [17] demonstrated that it is possible to sample the vibration of roller element bearings at less than the Nyquist rate using the CS framework and recover the signal for fault classification. Similarly, Li et al. [18] showed that train rolling bearing faults can be diagnosed from vibration signals reconstructed using CS. Another significant aspect of CS is the use of compressed measurements to diagnose machine faults without reconstructing the original signal [19,20,21,22].

Motivated by the advantages of CS, PCA, LDA, and CCA, this paper proposes a new intelligent fault diagnosis method for rotating machinery. We demonstrate how to create a basis from CS based compressively-sampled data that contains highly correlated potential features of sufficient principal components and discriminative components for fault classification.

2 Compressive Sampling (CS)

The basic idea of CS is that many real-world signals that have sparse representations in some domain, e.g., the Fourier Transform (FT) domain, can be recovered from fewer measurements, provided certain conditions are satisfied. Using the standard CS framework, obtaining a compressed signal \( y \in R^{m \times 1} \) from a collected vibration signal \( x \in R^{n \times 1} \) can be explained as follows [23]:

$$ x = \psi s $$
(1)

where \( s \in R^{n \times 1} \) is a column vector with k nonzero coefficients that represents the sparse components of x, and \( \psi \in R^{n \times n} \) is the sparsifying transform, e.g., the Fast Fourier Transform (FFT) matrix. The compression process can be achieved through a measurement matrix \( \phi \in R^{m \times n} \) and a compressive sampling rate (α), where m = α * n and m ≪ n. The equation that describes the compressed signal \( y \in R^{m \times 1} \) is as follows:

$$ y = \phi \psi s $$
(2)

The measurement matrix \( \phi \) should satisfy the Restricted Isometry Property (RIP), i.e., incur minimal information loss, so that the compressed measurements retain the quality of the original signal. Random matrices, e.g., Gaussian random matrices, satisfy the RIP with high probability [24]. The possibility of reconstructing the original signal from these compressed samples indicates that the compressed samples possess the quality of the original signal.
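The compression step in Eqs. (1) and (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy two-tone signal, the unitary (orthonormal) FFT scaling used for \( \psi \), the \( 1/\sqrt{m} \) scaling of the Gaussian matrix, and all variable names are choices made for this example rather than details given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n, alpha = 1200, 0.1
m = int(round(alpha * n))                  # number of compressed measurements

# Toy signal (assumption): two sinusoids, hence sparse in the Fourier domain.
t = np.arange(n)
x = np.sin(2 * np.pi * 50 * t / n) + 0.5 * np.sin(2 * np.pi * 120 * t / n)

# Sparse coefficients s; with the unitary scaling, x = psi s is the inverse FFT.
s = np.fft.fft(x, norm="ortho")
x_from_s = np.fft.ifft(s, norm="ortho").real   # Eq. (1): x = psi s

# Gaussian random measurement matrix phi of size m x n.
phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Eq. (2): y = phi psi s, which equals phi applied to x.
y = phi @ x_from_s

# Number of non-negligible Fourier coefficients (the sparsity level k).
k = int(np.sum(np.abs(s) > 1e-8))
print(m, y.shape, k)
```

Here each real sinusoid contributes a pair of conjugate Fourier coefficients, so the toy signal is exactly sparse; real vibration signals are only approximately sparse.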

Having explained the CS framework, the following section gives brief descriptions of the PCA, LDA, and CCA algorithms.

3 Subspace Learning Techniques

This section presents brief descriptions of the subspace learning techniques used in this study, namely the PCA, LDA, and CCA algorithms.

3.1 PCA

PCA is an orthogonal linear feature projection algorithm that aims to find all the components (eigenvectors) in descending order of significance. The procedure of PCA involves the following steps.

  • Calculate the mean vector of the data.

  • Compute the covariance matrix of the data.

  • Obtain the eigenvalues and eigenvectors of the covariance matrix.

PCA can be employed to form a low-dimensional feature vector [25]. To reduce the dimensionality of the data by means of PCA, one discards the least significant components. Suppose that the input dataset \( Y = \left[ { y_{1} , y_{2} , \ldots , y_{L} } \right] \) consists of L observations in an m-dimensional space. PCA transforms Y to \( \hat{Y} = \left[ { \hat{y}_{1} , \hat{y}_{2} , \ldots , \hat{y}_{L} } \right] \) in a new m1-dimensional space that can be represented by the following equation:

$$ \hat{Y} = W^{T} Y $$
(3)

Here, \( W \) is the projection matrix whose columns are the eigenvectors corresponding to the \( m1 \) largest eigenvalues (\( m1 \ll m \)) of the covariance matrix C, which is computed as follows:

$$ C = \frac{1}{L}\sum\nolimits_{i = 1}^{L} {(y_{i} - \bar{y})(y_{i} - \bar{y})^{T} } $$
(4)

where

$$ \bar{y} = \frac{1}{L}\sum\nolimits_{i = 1}^{L} {y_{i} } $$
(5)
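The PCA procedure in Eqs. (3) to (5) can be sketched with NumPy as follows. The function name `pca_project` and the column-per-observation layout are assumptions of this example. The projection follows Eq. (3) as written, i.e., it is applied to Y itself; many implementations project the mean-centred data instead.

```python
import numpy as np

def pca_project(Y, m1):
    """Project Y (m x L, one observation per column) onto m1 principal components.

    Returns the projection matrix W (m x m1) and the projected data (m1 x L).
    """
    L = Y.shape[1]
    y_bar = Y.mean(axis=1, keepdims=True)        # Eq. (5): mean vector
    centred = Y - y_bar
    C = centred @ centred.T / L                  # Eq. (4): covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m1]       # indices of the m1 largest
    W = eigvecs[:, order]
    return W, W.T @ Y                            # Eq. (3): Y_hat = W^T Y

# Usage on synthetic data with decreasing variance along each axis.
rng = np.random.default_rng(0)
Y_demo = rng.standard_normal((5, 200)) * np.array([[5.0], [3.0], [1.0], [0.5], [0.1]])
W_demo, Y_hat_demo = pca_project(Y_demo, m1=2)
```

Because the covariance matrix is symmetric, `numpy.linalg.eigh` is used; it guarantees real eigenvalues and orthonormal eigenvectors.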

3.2 Fisher LDA

Unlike PCA, which searches for the most significant components of the samples, Fisher LDA [26] aims to discover discriminant components that distinguish samples of different classes. In effect, LDA groups samples from the same class together and expands the margin between samples from different classes. This method maximizes the Fisher criterion function J(W), i.e., the ratio of the between-class scatter \( \left( {S_{B} } \right) \) to the within-class scatter \( \left( {S_{w} } \right) \), such that

$$ J\left( W \right) = \frac{{\left| {W^{T} S_{B} W} \right|}}{{\left| {W^{T} S_{w} W} \right|}} $$
(6)
$$ S_{B} = \frac{1}{L}\sum\nolimits_{i = 1}^{c} {l_{i} \left( {\mu^{i} - \mu } \right)\left( {\mu^{i} - \mu } \right)^{T} } $$
(7)
$$ S_{w} = \frac{1}{L}\sum\nolimits_{i = 1}^{c} {\sum\nolimits_{j = 1}^{{l_{i} }} {(y_{j}^{i} - \mu^{i} )\left( {y_{j}^{i} - \mu^{i} } \right)^{T} } } $$
(8)

where L is the total number of observations, c is the number of classes, \( y \in R^{L \times m} \) is the training dataset, \( y_{j}^{i} \) is the j-th observation of the i-th class, \( l_{i} \) is the number of observations of the i-th class, \( \mu^{i} \) is the mean vector of class i, and \( \mu \) is the mean vector of the whole training dataset. LDA projects the original data space onto a (c − 1)-dimensional space by finding the optimal projection matrix W that maximizes J(W) in Eq. (6), such that

$$ \hat{W} = \arg \mathop {\max }\limits_{W} J\left( W \right) $$
(9)

Here, \( \hat{W} \) is composed of the eigenvectors (\( \hat{w}_{1} , \ldots , \hat{w}_{m2} \)) corresponding to the m2 largest eigenvalues (m2 = c − 1).
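A minimal NumPy sketch of Fisher LDA following Eqs. (6) to (9) is given below. Maximizing J(W) is treated here as the generalised eigenproblem \( S_{B} w = \lambda S_{w} w \), solved by whitening \( S_{w} \). The function name `fisher_lda` and the small ridge term `reg`, added to keep \( S_{w} \) invertible, are assumptions of this example and are not part of the formulation above.

```python
import numpy as np

def fisher_lda(Y, labels, reg=1e-6):
    """Return the (m x (c-1)) Fisher LDA projection for Y of shape (L, m)."""
    classes = np.unique(labels)
    L, m = Y.shape
    mu = Y.mean(axis=0)                          # overall mean vector
    S_B = np.zeros((m, m))
    S_w = np.zeros((m, m))
    for cls in classes:
        Y_i = Y[labels == cls]
        mu_i = Y_i.mean(axis=0)                  # class mean vector
        d = (mu_i - mu)[:, None]
        S_B += Y_i.shape[0] * (d @ d.T)          # Eq. (7), before the 1/L factor
        centred = Y_i - mu_i
        S_w += centred.T @ centred               # Eq. (8), before the 1/L factor
    S_B /= L
    S_w /= L
    # Whiten S_w (with an assumed ridge term) so the problem becomes symmetric.
    vals, vecs = np.linalg.eigh(S_w + reg * np.eye(m))
    S_w_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    M = S_w_inv_sqrt @ S_B @ S_w_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][: len(classes) - 1]   # m2 = c - 1 largest
    return S_w_inv_sqrt @ eigvecs[:, order]

# Usage on three synthetic, well-separated classes in four dimensions.
rng = np.random.default_rng(0)
demo_labels = np.repeat(np.arange(3), 50)
demo_means = np.array([[0.0, 0, 0, 0], [5.0, 0, 0, 0], [0.0, 5, 0, 0]])
demo_Y = demo_means[demo_labels] + rng.standard_normal((150, 4))
W_lda = fisher_lda(demo_Y, demo_labels)          # shape (4, 2)
```

The projected features are then obtained as `demo_Y @ W_lda`, giving c − 1 discriminant components per observation.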

3.3 CCA

Unlike PCA and LDA, which operate on a single dataset, CCA is a statistical method that finds linear combinations of two datasets that maximize their correlation [27]. For example, let \( y_{1} \in R^{m1} \) and \( y_{2} \in R^{m2} \) be two vectors with covariances \( \Sigma_{11} \) and \( \Sigma_{22} \) and cross-covariance \( \Sigma_{12} \). CCA finds linear combinations \( w_{1}^{T} y_{1} \) and \( w_{2}^{T} y_{2} \) that are maximally correlated, i.e., that maximise the following objective function:

$$ (w_{1} , w_{2} ) = \arg \mathop {\max }\limits_{{w_{1} , w_{2} }} {\text{corr}}\left( {w_{1}^{T} y_{1} , w_{2}^{T} y_{2} } \right) $$
(10)
$$ = \arg \mathop {\max }\limits_{{w_{1} , w_{2} }} \frac{{w_{1}^{T} \Sigma_{12} w_{2} }}{{\sqrt {w_{1}^{T} \Sigma_{11} w_{1} \, w_{2}^{T} \Sigma_{22} w_{2} } }} $$
(11)

where \( \Sigma_{11} \), \( \Sigma_{22} \), and \( \Sigma_{12} \) can be computed as \( \Sigma_{11} = \frac{1}{m} y_{1} y_{1}^{T} \), \( \Sigma_{22} = \frac{1}{m} y_{2} y_{2}^{T} \), and \( \Sigma_{12} = \frac{1}{m} y_{1} y_{2}^{T} \). Based on the theory of CCA, the objective function in (10) can be rewritten as follows:

$$ \begin{aligned} & \arg \mathop {\max }\limits_{{w_{1} , w_{2} }} w_{1}^{T} \Sigma_{12} w_{2} \\ & {\text{s.t.}}\quad \quad w_{1}^{T} \Sigma_{11} w_{1} = 1, \; w_{2}^{T} \Sigma_{22} w_{2} = 1. \\ \end{aligned} $$
(12)

More details of the mathematical formulation of CCA can be found in [26].
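One common way to solve the constrained problem in Eq. (12) is to whiten both datasets and take a singular value decomposition of the whitened cross-covariance. The sketch below illustrates this approach with NumPy. It is not necessarily the solver used in this study, and the function names, the mean-centring step, the normalisation by the number of observations, and the ridge term `reg` are all assumptions of this example.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(Y1, Y2, reg=1e-8):
    """CCA for Y1 (m1 x N) and Y2 (m2 x N), one observation per column.

    Returns W1 (m1 x d), W2 (m2 x d) and the d canonical correlations,
    where d = min(m1, m2).
    """
    N = Y1.shape[1]
    Y1 = Y1 - Y1.mean(axis=1, keepdims=True)     # centring (assumption)
    Y2 = Y2 - Y2.mean(axis=1, keepdims=True)
    S11 = Y1 @ Y1.T / N + reg * np.eye(Y1.shape[0])
    S22 = Y2 @ Y2.T / N + reg * np.eye(Y2.shape[0])
    S12 = Y1 @ Y2.T / N
    A, B = _inv_sqrt(S11), _inv_sqrt(S22)
    # Singular values of the whitened cross-covariance are the correlations.
    U, rho, Vt = np.linalg.svd(A @ S12 @ B, full_matrices=False)
    return A @ U, B @ Vt.T, rho

# Usage: two synthetic views that share one latent variable z.
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
view1 = np.vstack([z + 0.1 * rng.standard_normal(500),
                   rng.standard_normal(500),
                   rng.standard_normal(500)])
view2 = np.vstack([z + 0.1 * rng.standard_normal(500),
                   rng.standard_normal(500)])
W1, W2, rho = cca(view1, view2)
```

By construction the columns of `W1` and `W2` satisfy the unit-variance constraints of Eq. (12) up to the ridge term, and the number of canonical pairs equals the smaller of the two input dimensions.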

4 The Proposed Method

To classify the condition of a rotating machine from collected vibration signals, a novel condition monitoring method is proposed in this paper. The intention of this method is to reduce the dimensionality of the vibration signals as much as possible while retaining appropriate fault information, from which a basis of sufficient principal components (PCs) and enough discriminant components (DCs) is created for fault classification. As shown in Fig. 2, the proposed method employs CS to obtain compressed measurements that possess the quality of the original vibration signals. Then, a multi-step methodology of PCA, LDA, and CCA is used to extract further reduced features from the compressed measurements. Finally, an SVM uses these extracted features to classify the rotating machine health condition.

Fig. 2. The proposed method.

As shown in Fig. 3, to obtain compressed measurements using the CS framework in (2), we begin by obtaining the sparse representation \( s \in R^{n \times 1} \) of the raw vibration signal \( x \in R^{n \times 1} \) using the FFT. Then, the compression process is applied using a Gaussian random matrix and an appropriate compressive sampling rate (α) to produce the compressed measurement \( y \in R^{m \times 1} \). After the compression process, the compressively-sampled signals are used to create a further reduced feature vector of sufficient principal components and enough discriminant components using the proposed multi-step technique.

Fig. 3. The compression process of acquired vibration signal using CS framework.

In this technique, PCA and LDA are first applied individually to extract subspaces of principal components \( \hat{y}_{1} \in R^{m1 \times 1} \) and discriminant components \( \hat{y}_{2} \in R^{m2 \times 1} \) from the compressed samples \( y \in R^{m \times 1} \), where m1 is a chosen number of principal components and m2 = c − 1; here c is the number of classes, one for each machine condition. Once the PCA and LDA based subspaces have been extracted, CCA is used to maximize the correlation between \( \hat{y}_{1} \) and \( \hat{y}_{2} \) by finding the weighted linear composites \( {\text{w}}_{1} \) and \( {\text{w}}_{2} \). The number of components in \( {\text{w}}_{1} \) and \( {\text{w}}_{2} \) is determined by the smaller of the dimensions of \( \hat{y}_{1} \) and \( \hat{y}_{2} \), i.e., the smaller of m1 and m2. The resulting linear combinations \( \hat{y}_{{1_{CCA} }} = w_{1} *\hat{y}_{1} \) and \( \hat{y}_{{2_{CCA} }} = w_{2} *\hat{y}_{2} \) are maximally correlated.

Following this process, the learned features \( \hat{y}_{{1_{CCA} }} \) and \( \hat{y}_{{2_{CCA} }} \) are concatenated to obtain a vector that comprises highly correlated representations of the principal components and discriminant components. Finally, a multi-class SVM classifier is used to classify the machine condition.
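To make the dimensions at each stage concrete, the short Python sketch below tracks the feature sizes through the pipeline. It assumes that each CCA output has min(m1, m2) components and that the two outputs are simply concatenated, which is one reading of the description above. The function name `feature_dims` and the example values passed to it are illustrative.

```python
def feature_dims(n, alpha, m1, c):
    """Track feature dimensions through the CS, PCA, LDA and CCA stages."""
    m = round(alpha * n)         # compressed measurements per signal
    m2 = c - 1                   # LDA discriminant components
    d = min(m1, m2)              # components kept by CCA for each view (assumed)
    return {"m": m, "m1": m1, "m2": m2, "cca": d, "fused": 2 * d}

# Example values: 1200-point signals, alpha = 0.1, 10 PCs, 10 classes.
print(feature_dims(n=1200, alpha=0.1, m1=10, c=10))
```

Under these assumptions, a 1200-point signal is reduced first to 120 compressed measurements and finally to a fused feature vector of 18 values.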

Our proposed method is validated through computer experiments of a fault classification case study of rolling element bearings.

5 Experimental Study

The bearing dataset used in this study is provided by Case Western Reserve University (CWRU) [28]. The vibration signals were acquired from the drive end of a motor in a test rig (Fig. 4) under the normal condition (NO) and with outer race (OR), inner race (IR), and rolling element (RE) faults. The dataset is further grouped by fault width (0.18, 0.36, and 0.53 mm) under four motor loads (0, 1, 2, and 3 hp) with different shaft speeds (1797, 1772, 1750, and 1730 rpm). The data were sampled at 12 kHz. In this study, the motor bearing dataset comprises vibration signals for 10 bearing health conditions (the normal condition plus each of the three fault types at three fault widths), with 100 signal examples per health condition under each of the four load conditions. Therefore, the total dataset contains 400 examples for each health condition (4000 signal examples for all health conditions), with 1200 data points for each signal. Figure 5 depicts some typical time series plots for the ten health conditions. The dataset is described in Table 1.

Fig. 4. CWRU bearing test rig [28].

Fig. 5. Typical time series plots for the ten different health conditions.

Table 1. Description of bearing health conditions under four loads with different fault width.

To classify the motor bearing health conditions in this case study, we applied our proposed method to this bearing dataset. Half of the bearing vibration signal examples were selected randomly for training, and the other half were used for testing. To apply our method, we began by obtaining the compressed measurements from the training set of 2000 examples, each with 1200 time samples, using the CS framework. We used FFT-based sparse representations of the training signals and a random Gaussian matrix as the measurement matrix, with compressive sampling rates (α) of 0.05, 0.07, and 0.1.
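The random half-and-half split described above can be sketched as follows. Whether the split in this study was stratified by health condition is not stated, so this example uses a plain random permutation; the function name `random_half_split` and the seed are assumptions.

```python
import numpy as np

def random_half_split(num_examples, rng):
    """Randomly split example indices into two halves (train, test)."""
    perm = rng.permutation(num_examples)
    half = num_examples // 2
    return perm[:half], perm[half:]

rng = np.random.default_rng(0)
train_idx, test_idx = random_half_split(4000, rng)
print(len(train_idx), len(test_idx))
```

With 4000 examples this yields 2000 training and 2000 testing indices with no overlap.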

To learn features from the training set of compressed measurements, we applied our proposed multi-step approach using 9 components for LDA and 10 principal components for PCA for each of the compressed measurement sets described above. The concatenated features produced by this approach were used to train our classifier. For the classification problem, we applied an SVM to the learned features, i.e., the concatenated features, using the “fitcecoc” function [29]. It uses c(c − 1)/2 binary SVM models with a one-versus-one coding design, where c is the number of unique class labels. This returns a fully trained error-correcting output codes (ECOC) multiclass model, which was cross-validated using 10-fold cross-validation. The overall classification results and their related root mean square errors over 20 experiments are shown in Table 2.
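The number of binary learners in a one-versus-one design follows directly from the number of class pairs. The Python sketch below enumerates them; it only illustrates the counting argument and is not the “fitcecoc” implementation. The helper name `one_vs_one_pairs` and the integer class labels are assumptions.

```python
from itertools import combinations

def one_vs_one_pairs(class_labels):
    """List every unordered pair of distinct classes (one binary SVM each)."""
    return list(combinations(sorted(set(class_labels)), 2))

pairs = one_vs_one_pairs(range(10))   # ten health conditions, labelled 0..9
print(len(pairs))                     # c(c - 1)/2 with c = 10
```

For the ten health conditions considered here, this gives 10 × 9 / 2 = 45 binary SVM models.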

Table 2. Classification results with their corresponding root mean square (RMSE) and computational time using various compressive sampling rates (α).

As Table 2 shows, our proposed method delivers high classification accuracy for each value of α with a small RMSE. In particular, the testing classification accuracy for α = 0.1 is 99.9%, and the testing time required by our method is only 1.62 s. In general, the computational time increased slightly as α increased. For example, the total time for training and testing rose from 5.6 s for the smallest value, α = 0.05, to 6.45 s for the largest value, α = 0.1, an increase of less than 20%.

Table 3(a) and (b) show sample confusion matrices for the ten health conditions, for α = 0.1 and 0.07 respectively. With α = 0.1 in Table 3(a), only two signal examples of IR1 were misclassified as IR3, i.e., the proposed method misclassified only 1% of the IR1 testing examples. With α = 0.07 in Table 3(b), one IR1 example (0.5% of IR1 testing examples) was confused with IR3, two IR2 examples (1% of IR2 testing examples) were classified as IR1, and five IR3 examples (2.5% of IR3 testing examples) were classified as IR1.

Table 3. Sample confusion matrix

For additional assessment of the efficiency of our proposed method, several experiments were conducted for α = 0.1 with training sizes of 10% and 40%, with 20 trials for each experiment. The results of these experiments are compared with some recently published results [30,31,32,33]. Table 4 shows the comparisons. The first column gives the motor operation and load scenarios under which the bearing samples were acquired; these scenarios are referred to as fixed load and variable loads. The second column describes the methods used to classify bearing conditions. The third column gives the percentage of samples used for training. The fourth column gives the load of the data used with each method, and the fifth column presents the testing accuracies obtained using these methods.

Table 4. A comparison with results from literature on CWRU vibration dataset of roller bearings

It can be clearly seen from Table 4 that, under both load conditions, our method achieved the highest classification accuracies while using the smallest percentage (10%) of the original samples, i.e., signals compressed using α = 0.1. In particular, with a 10% training size, the method achieved 99.8% classification accuracy both for data collected under variable loads of 0, 1, 2, and 3 horsepower and for separate loads of 0 and 3 horsepower. With a 40% training size of data collected under variable loads, the proposed method achieved 99.9% classification accuracy. Taken together, these results show that the proposed method can classify bearing conditions with high accuracy compared with the results reported in [30,31,32,33].

6 Conclusion

A new method for rotating machinery condition monitoring has been proposed. In this method, CS is used to generate compressively-sampled signals. Then, a multi-step feature learning approach combining the advantages of PCA, LDA, and CCA is used to learn further reduced features for fault classification. With these learned features, a multi-class SVM classifier is employed for classification. The experimental results show that the proposed method achieves high classification accuracy with a significantly reduced dimension of the original signals, and its performance was benchmarked against some existing methods. Moreover, the high classification accuracy achieved under both fixed and variable loads suggests that the proposed method is suitable for rotating machine condition monitoring where overloads or unexpected load changes may cause rotating machine faults.