
1 Introduction

As critical equipment in many industrial fields, rotary machinery is integral to the safe and stable operation of mechanical systems. The rotor and its rotating parts are the two main components of rotary machinery. Failures in rotary machinery or its rotating units may have serious consequences, which are more detrimental in the absence of adequate monitoring. In addition, since condition-based maintenance is often required in modern machinery management, condition monitoring is necessary to eliminate excess maintenance and guarantee safe operation. For these reasons, research on fault diagnosis of rotary machinery has attracted great interest from both academic and industrial communities.

In the past, fault diagnosis was often achieved via equipment disassembly at the job site by experienced maintenance engineers. This manual method made it difficult to guarantee high accuracy and efficiency. Presently, with the rapid development of information technology, online monitoring and fault diagnosis of rotary machinery have become possible. Various unstable factors may exist during the operation of rotary machinery. Vibrations carry abundant status information, which makes vibration signal analysis a common and effective method for condition monitoring. Vibration signal analysis is typically performed in the time domain, frequency domain or time-frequency domain [1, 2]. In the time domain, statistical parameters are adopted to detect and predict faults. This method is easily implemented for fault detection; however, it cannot distinguish fault types with high precision [3]. Frequency analysis may be applied to extract fault features [4, 5], identifying fault types by highlighting characteristic fault frequencies in the spectral domain. Signals acquired by sensors, however, often contain noise, which complicates effective extraction of fault features. Time-frequency methods, e.g., empirical mode decomposition [6,7,8] and wavelet analysis [9, 10], have been developed to solve these issues. They are generally based on the Shannon sampling theory, in which the sampling frequency must be at least twice the maximum frequency of the signal. This requirement means that a large amount of data must be collected, creating an exceptional challenge for signal acquisition, transmission and processing. In addition, with the development of database technology and the increasing automation of large essential equipment, real-time monitoring techniques have been widely applied. Using this approach, operation data can be acquired by a distributed control system at a high collection rate, e.g., one data point per microsecond or even faster, to monitor changes in displacement, acceleration or other parameters. As a result, a large amount of data is collected, and a large-scale database or data warehouse is built to improve the accuracy and automation of monitoring.

However, the observed data and parameters are often disordered and unsystematic, i.e., the features needed for condition monitoring are not obvious. Meanwhile, complex equipment often generates a large-scale data set to be analysed. Generally, an intelligent fault diagnosis system is an information processing system that collects a large set of information about an object with the aid of sensing, information and data transmission technologies [11]. It must be able to accommodate a large amount of original fault information; however, it may also encounter low-quality data that could result in uncertain information. In particular, the problem of incomplete information is exacerbated by the limitations of current data acquisition and monitoring techniques as well as by the diversity of information from rotary machinery. The incompleteness and inconsistency of the data present new challenges to fault diagnosis and condition monitoring. In fault diagnosis, incomplete information primarily refers to missing attributes, incomplete data or uncertain information, which can lead to an inaccurate conclusion regarding a machine’s status. Therefore, how to deal with incomplete monitoring data and make a reasonable inference about a machine’s running condition has become a hot topic in intelligent monitoring of rotary machinery.

Moreover, condition monitoring of modern rotary machinery places high demands on real-time performance. A fault is expected to be discovered as soon as it appears. However, in a big data set generated by continuous monitoring, only a small portion of the data relates to a machine’s abnormality, since the large majority reflects healthy and stable operation. Thus, it would be much easier if the monitoring data could be greatly reduced while preserving the most useful information. The pressure on acquisition and post-processing would then be relieved without sacrificing diagnosis speed and accuracy. However, the traditional Shannon sampling principle [12] requires a large amount of sampled data for perfect post-processing of the observed data. Under that principle, it therefore seems impossible to achieve condition monitoring of rotary machinery from a reduced data set as suggested above.

Compressing large-scale monitoring data and detecting fault features directly from sparse samples is one way to address these challenges. The theory of compressive sensing provides a new insight for solving the above problems, i.e., condition monitoring from compressed samples or incomplete big data sets. The theory states that a sparse or compressible signal can still be recovered from only a few samples, even from under-sampled, incomplete data [13, 14]. It is a major breakthrough in the signal processing field and has attracted great attention since it was first proposed. Compressive sensing has been widely applied in various fields, e.g., magnetic resonance imaging [15] and seismic wave processing [16], yet many of the reported studies concern signal or image reconstruction.

For condition monitoring of rotary machinery, compressive sensing implies that operation information can be preserved with well-designed sampling. It is then possible to store and transmit a small number of samples, reconstruct the signal on the receiving side, and detect the fault features from only a few samples. Moreover, the fault features can usually be identified well before signal recovery is complete, so it is not necessary to recover the signal perfectly. The effectiveness of statistical inference based on compressive sensing has been verified in related fields [17,18,19,20], suggesting that certain characteristic parameters can be estimated from only a few compressed measurements without ever recovering the actual signals.

In the field of condition monitoring, some related work has also been reported [21, 22]. Chen et al. [23] built a learning dictionary frame to extract a fault-impact signal. Zhang et al. [24] performed a preliminary study on compressive detection of bearing faults. Tang et al. [25] developed a sparse classification method for rotating machinery faults based on a compressive sensing strategy. These studies validate the effectiveness of compressive sensing in machinery fault diagnosis; however, they focus primarily on sparse representation or reconstruction of fault signals.

Considering the complexity of both condition monitoring and compressive sensing, many obstacles must still be overcome, especially in the extraction of fault features from compressed signals. The motivation of this paper is to briefly introduce the compressive sensing theory and present some applications to the condition monitoring of rotary machinery. These compressive-sampling-based methods can help improve the efficiency of detecting rotary machinery faults from under-sampled signals, which provides new insights for this research field.

In this paper, roller bearings are used as an example to explain the main concept of the proposed strategy [26, 27]. Statistical inference based on compressive sensing has been studied in other fields [28,29,30], as mentioned above, yet many obstacles remain when it is applied to bearing fault detection. The bearing fault signal consists of impulses, and in the commonly used Fourier or wavelet domain its sparsity does not completely meet the requirements of compressive sensing, which makes the compressive sensing process more difficult. Two further issues remain to be resolved: which bearing fault features should be extracted from under-sampled signals, and how compressive sensing should be integrated into bearing fault diagnosis. In this study, we try to develop applicable condition monitoring strategies for bearing faults from under-sampled vibration signals, and to perform sampling and detection simultaneously without a complete recovery of the incomplete signal.

The rest of this paper is organized as follows. Section 2 states the fault detection problems in rotary machinery monitoring. Section 3 provides a brief introduction to compressive sensing. Section 4 presents three proposed methods and case studies for bearing fault detection with simulation and experiments. Conclusions are drawn in Sect. 5.

2 Problem Statement

In the condition monitoring of rotary machinery, a large number of signals are often collected to reveal the operation status accurately and comprehensively, including operating-condition signals (e.g., speed, pressure), vibration signals and environmental signals (e.g., temperature). These signals are mutually coupled, which complicates the relationships between them. The intensity trends of a vibration signal are often related to the operation state of the equipment, so they are important indicators of whether a machine is running properly. Furthermore, the intensity is also closely related to the working condition and the surroundings. Because these signals are closely linked to each other, none of them is dispensable. However, the limitations of the field environment often result in a lack of data, which adversely affects the judgment of the machine status. In addition, complex equipment usually has a complex signal transmission path, which leads to serious noise interference or even incorrect signals. The observed signals therefore have to be pre-processed, e.g., by eliminating invalid signals, which often renders the data incomplete.

In short, the big data related to rotary machinery are often disturbed by the surroundings, which makes it difficult to acquire the data comprehensively. Therefore, it is necessary to develop a strategy to deal with big but incomplete monitoring data. Ideally, the big data should be compressed or compressively sensed without losing important information.

Without loss of generality, a simple detection problem for bearing faults can be formulated as

$$x = s + n$$
(1)

where s is a known signal of interest, x denotes the observation signal, and n is mixed noise with interference signals from surrounding devices.

Provided s denotes a vibration signal related to a bearing fault, the fault detection problem is then to distinguish s from x. One common way to distinguish the fault component s from the mixture signal x is to work in a transform domain:

$$y = \varPhi x = \varPhi \varphi^{H} \varphi (s + n) = A\varphi (s + n)$$
(2)

where x is a signal of \(N \times 1\) dimension, \(\varPhi\) is an \(M \times N\) measurement matrix, \(M \le N\), and each row of \(\varPhi\) represents a sensor to measure x. \(\varphi\) is an \(N \times N\) column orthonormal basis matrix, and the superscript H denotes a conjugate transposition. \(A = \varPhi \varphi^{H}\) is often designated the sensing matrix to measure the transformed data \(u = \varphi x\). y is an \(M \times 1\) measurement vector denoting the observation \(y = Au\). When all N measurements are available, i.e., \(M = N\), then \(\varPhi^{H} \varPhi = \varPhi \varPhi^{H} = I_{N \times N}\), indicating that y is an observation of x with full sampling, which can be handled by many methods.

However, to facilitate data acquisition and bypass the limitations resulting from incomplete and imprecise knowledge, \(M \ll N\) is often encountered or desired. y is then called a compressive measurement of the signal x. It would be promising if the required information about the original signal x could be deduced from the compressed observation y without reconstruction; this is the compressed detection problem.
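As a minimal numerical sketch of Eq. (2), the following Python snippet forms a compressive measurement \(y = \varPhi x\) and checks that it coincides with \(Au\) for \(A = \varPhi \varphi^{H}\) and \(u = \varphi x\). The dimensions, the synthetic signal, the noise level and the random orthonormal basis obtained by QR factorization are illustrative assumptions, not values taken from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 32                       # assumed sizes, with M << N

# Synthetic observation x = s + n (Eq. 1); both terms are placeholders.
t = np.arange(N)
s = np.sin(2 * np.pi * 8 * t / N)    # stand-in for the fault-related component
n = 0.1 * rng.standard_normal(N)     # stand-in for noise and interference
x = s + n

# Random orthonormal basis (assumed): phi^H phi = I because q is orthogonal.
q, _ = np.linalg.qr(rng.standard_normal((N, N)))
phi = q.T

# Gaussian measurement matrix; each row plays the role of one sensor.
Phi = rng.standard_normal((M, N))

A = Phi @ phi.conj().T               # sensing matrix A = Phi phi^H
u = phi @ x                          # transformed data u = phi x
y = Phi @ x                          # compressive measurement of length M

print(y.shape, np.allclose(A @ u, y))
```

The check only confirms the algebra of Eq. (2); whether fault information survives in y depends on the sparsity and measurement conditions discussed in Sect. 3.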

3 Compressive Sensing Theory

3.1 Shannon’s Sampling Theory

Shannon’s sampling theory was first proposed by Shannon in 1949 [31]. According to the theory, a continuous band-limited signal can be completely represented by a set of samples taken at discrete times only if the sampling frequency is more than twice the highest frequency of the signal.

If the signal \(x(t)\) is sampled at the sampling frequency \(f_{s}\) (sampling interval \(T_{s} = 1/f_{s}\)), the sequence \(\left\{ { \ldots ,x\left( { - nT_{s} } \right), \ldots ,x\left( { - T_{s} } \right),x(0),x\left( {T_{s} } \right), \ldots ,x\left( {nT_{s} } \right), \ldots } \right\}\) is generated, and the sampled signal is

$$x_{s} (t) = \sum\limits_{n = - \infty }^{\infty } {x\left( {nT_{s} } \right) \cdot \delta \left( {t - nT_{s} } \right)} = x(t) \cdot \sum\limits_{n = - \infty }^{\infty } {\delta \left( {t - nT_{s} } \right)}$$
(3)

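where \(\delta \left( {t - nT_{s} } \right) = 1\) at \(t = nT_{s}\), and \(\delta \left( {t - nT_{s} } \right) = 0\) elsewhere. The theory requires \(f_{s}\) to exceed twice the highest frequency in \(x(t)\); this lower bound is called the Nyquist rate [31]. Equivalently, for a given \(f_{s}\), the maximum frequency in the signal \(x(t)\) that can be represented without aliasing is \(f_{s} /2\).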
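The \(f_{s} /2\) limit can be illustrated with a small Python sketch. The sampling frequency and the two tone frequencies below are arbitrary assumptions chosen for illustration: a 70 Hz tone sampled at 100 Hz yields exactly the same samples, up to sign, as a 30 Hz tone, so the two cannot be separated after sampling.

```python
import numpy as np

fs = 100.0                       # assumed sampling frequency in Hz
n = np.arange(50)                # sample indices
t = n / fs                       # sampling instants n * Ts

x_30 = np.sin(2 * np.pi * 30.0 * t)   # 30 Hz tone, below fs / 2
x_70 = np.sin(2 * np.pi * 70.0 * t)   # 70 Hz tone, above fs / 2

# At the sampling instants the 70 Hz tone equals the negated 30 Hz tone,
# so it aliases onto 30 Hz and its true frequency is lost.
aliased = np.allclose(x_70, -x_30)
print(aliased)
```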

The Nyquist condition must be satisfied when processing band-limited signals. However, with the development of information technology, signal bandwidths have expanded so widely that many new difficulties arise in data collection, transmission and storage.

In addition, the sampled data contain a lot of unimportant and redundant information. Storing and transmitting these data takes a large amount of time, and the time required for signal processing also increases.

3.2 Compressive Sensing

The theory of compressive sensing was developed in the field of signal processing. It offers a new approach to big data compression, incomplete data processing and rapid detection from small samples, and is regarded as a breakthrough beyond the Shannon sampling theorem. Here we give a brief introduction to the theory. For more detail, please refer to [13, 14].

Provided that the sensing matrix \(A = \varPhi \varphi^{H}\) satisfies the restricted isometry condition (Sect. 3.4), a signal x with the sparse representation \(u = \varphi x\) is measured as

$$y = Au$$
(4)

where x is an \(N \times 1\) signal vector, \(\varPhi\) is an \(M \times N\) measurement matrix, \(M \le N\), and each row of \(\varPhi\) represents a sensor to measure x. \(\varphi\) is an \(N \times N\) column orthonormal basis matrix and the superscript H denotes a conjugate transposition. \(A = \varPhi \varphi^{H}\) is often termed the sensing matrix to measure the transformed data \(u = \varphi x\), and y is an \(M \times 1\) measurement vector denoting a compressive measurement of the original full data.

Because \(M \le N\), Eq. (4) is an under-determined problem, whose solution can be approximately pursued as

$$\hbox{min} \left\| u \right\|_{0} \quad {\text{s}} . {\text{t}} .\quad y = Au = \varPhi \varphi^{H} (\varphi x)$$
(5)

Owing to the sparsity promotion strategy, if x is sparse in \(\varphi\), u and x can be recovered from the small number of observations y.

The theory employs a sparse space \(\varphi\) to represent the signal x and obtains a small amount of observation data y. In this way, signal sampling is converted into information sampling. Then, by solving an optimization problem, the original signal x can be recovered from the compressed observations y. With this theory, the amount of sampled data no longer depends on the bandwidth of the signal, but on the information structure and content of the signal. Compressive sensing makes it possible to solve inference problems with low sampling rates.

It also provides a new insight into condition monitoring of rotary machinery. According to the compressive sensing theory, a signal can be represented, or sufficiently approximated, by a linear combination of predefined atoms. The compression efficiency can then be greatly improved and processing costs largely reduced. Furthermore, for the condition monitoring of rotary machinery, extracting fault features from small samples is as important as extracting them from continuously measured large samples. If faults can be detected from only a few under-sampled signals, i.e., if the limitations of the traditional Shannon sampling theorem can be overcome, then the requirements on data acquisition and post-processing can be greatly reduced, as can the time costs of condition monitoring.

Generally, sparse representation, sampling schemes and the solution of underdetermined equations are the three key issues of the compressive sensing technique.

3.3 Sparse Representation of a Signal

According to the compressive sensing theory, sparse representation of a signal is a precondition for recovering the original signal. In many signal compression methods, the signal is first transformed into another domain with orthogonal projections. Then only the samples at positions with large absolute values in the transform domain are retained to obtain a compressed signal. This property is called sparsity, which means that a signal can be represented by a linear combination of a small number of elements. This signal representation theory was originally developed by Mallat and Zhang with a complete dictionary sparse decomposition [32].

In general, a set of functions \(\left\{ {\varphi_{i} } \right\}\) can be found in the Hilbert space \(L_{2} (R)\) so that the signal y can be expressed as a linear combination of the N basis functions \(\left\{ {\varphi_{i} } \right\}\):

$$y = \Phi x = \sum\limits_{i = 1}^{N} {x_{i} \varphi_{i} }$$
(6)

where \(x_{i}\) are the coefficients of y in the dictionary \(\Phi = (\varphi_{1} , \ldots ,\varphi_{N} )\). x and y are equivalent representations of the same signal; the difference is that y is in the time domain and x is in the dictionary \(\Phi\). The signal y is said to be K-sparse if the coefficient vector x in Eq. (6) has only K nonzero elements. In practice, x is considered compressible if it has a few large coefficients and many small coefficients.
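The notion of K-sparsity can be made concrete with a short Python sketch. It assumes a Fourier dictionary and a synthetic signal made of three sinusoids placed on exact frequency bins; the tolerance used to decide whether a coefficient counts as nonzero is also an assumption. The signal is dense in the time domain, yet only a handful of its Fourier coefficients are non-negligible.

```python
import numpy as np

N = 256
t = np.arange(N)
# Time-domain signal y: three sinusoids on exact Fourier bins (assumed).
y = (1.0 * np.sin(2 * np.pi * 5 * t / N)
     + 0.5 * np.sin(2 * np.pi * 20 * t / N)
     + 0.2 * np.sin(2 * np.pi * 50 * t / N))

# Coefficients x of y in the Fourier dictionary (orthonormal scaling).
x = np.fft.fft(y) / np.sqrt(N)

# Count coefficients that are non-negligible relative to the largest one.
tol = 1e-8 * np.abs(x).max()
K = int(np.sum(np.abs(x) > tol))

# For comparison: number of non-negligible samples in the time domain.
dense = int(np.sum(np.abs(y) > tol))
print(K, dense)
```

Each real sinusoid occupies a conjugate pair of bins, so K is twice the number of tones. Measured bearing signals are rarely this clean, which is the difficulty addressed in Sect. 4.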

Besides sparsity, the other key point of sparse representations is incoherence, which means that the bases must be clearly different from each other [33]. The coherence between two orthonormal bases \(\phi\) and \(\psi\), with columns \(\phi_{k}\) and \(\psi_{j}\), is defined as

$$\mu \left( {\phi ,\psi } \right) = \sqrt n \,\text{max}_{1 \le k,\,j \le n} \left| {\left\langle {\phi_{k} ,\psi_{j} } \right\rangle } \right|$$
(7)
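A brief Python sketch of Eq. (7), read as the largest inner product between a column of one basis and a column of the other, is given below. The choice of the identity (spike) basis and the unitary discrete Fourier basis, as well as the helper name `coherence`, are assumptions made for illustration.

```python
import numpy as np

def coherence(basis_a, basis_b):
    """Return sqrt(n) * max |<a_k, b_j>| over all column pairs."""
    n = basis_a.shape[0]
    gram = basis_a.conj().T @ basis_b      # all inner products <a_k, b_j>
    return np.sqrt(n) * np.abs(gram).max()

n = 64
spikes = np.eye(n)                            # time-domain (identity) basis
fourier = np.fft.fft(np.eye(n)) / np.sqrt(n)  # unitary DFT basis

mu_pair = coherence(spikes, fourier)   # spike/Fourier pair
mu_self = coherence(spikes, spikes)    # a basis compared with itself
print(mu_pair, mu_self)
```

The spike/Fourier pair reaches the smallest possible value of 1, whereas a basis compared with itself gives the largest value \(\sqrt n\).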

There are two main issues in sparse representation: how to build a redundant dictionary and how to design a decomposition method. At present, the dictionaries mainly include local cosine dictionary, over-complete wavelets, curvelets and Gabor dictionary [32]. Furthermore, the decomposition methods mainly include Matching Pursuit (MP) [34], Orthogonal Matching Pursuit (OMP) [35], Basis pursuit (BP) [36] and FOCUSS [37].

3.4 Sampling Method

Sampling is the first step of conducting condition monitoring of rotating machinery. Traditional sampling methods must obey the Nyquist sampling theorem to maintain essential features of the signal so that the analysis results via Fourier transform, wavelet transform and Hilbert transform make sense. This usually generates vast amounts of monitoring data. The sampling frequency based on the compressive sensing theory may be much lower while still maintaining the signal features well.

In compressive sensing theory, the measurement matrix is the key to sampling. To ensure that different sparse signals are not projected onto the same M-dimensional measurement vector, so that the sampled signal can be reconstructed, the measurement matrix must satisfy the restricted isometry property (RIP), which in practice requires that the measurement matrix \({\varPhi}\) be incoherent with the sparse representation basis \(\varphi\).

RIP can be described as follows. There exists an isometry constant \(\varepsilon \in (0,1)\) such that the following inequality holds for any K-sparse signal x, where \(\varPsi\) denotes the sparse representation basis:

$$(1 - \varepsilon )\left\| x \right\|_{2}^{2} \le \left\| {\varPhi \varPsi x} \right\|_{2}^{2} \le (1 + \varepsilon )\left\| x \right\|_{2}^{2}$$
(8)

Determining the sampling method essentially amounts to designing a measurement matrix Φ that meets RIP. Gaussian and Bernoulli random measurement matrices are often employed in compressive sensing. The entries of the former follow the \(N(0,1)\) normal distribution, and those of the latter follow the Bernoulli distribution. It has been proved that the Gaussian random matrix meets RIP with high probability. Such an irregular (random) sampling method is simple to design and usually performs well in the reconstruction of under-sampled data. Therefore, a Gaussian random matrix is employed here to conduct compressive measurement in condition monitoring and fault diagnosis of rotating machinery.
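The near-isometry behind RIP can be probed empirically, as in the following Python sketch. It is not a proof of RIP, which concerns all K-sparse signals at once; it only samples random K-sparse vectors. The sizes, the identity sparse basis and the \(1/\sqrt M\) scaling of the Gaussian matrix (added so that squared norms are preserved on average) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 1000, 200, 10          # assumed signal length, measurements, sparsity

# Gaussian measurement matrix. The 1/sqrt(M) scaling is an assumption made
# here so that squared norms are preserved on average.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

ratios = []
for _ in range(200):
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)   # random K-sparse support
    x[support] = rng.standard_normal(K)
    ratios.append(np.sum((Phi @ x) ** 2) / np.sum(x ** 2))

ratios = np.array(ratios)
# Empirical spread of ||Phi x||^2 / ||x||^2 over the sampled sparse signals.
print(ratios.min(), ratios.max())
```

The ratios cluster around 1, which is the behaviour that the constant \(\varepsilon\) in Eq. (8) bounds.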

3.5 Optimization Solving Strategy

The process of compressive measurement can be described as:

$$y = \varPhi x = \varPhi \varphi^{H} \varphi (s + n) = Au$$
(9)

The sparse solution can be approximately pursued as:

$$\hbox{min} \left\| u \right\|_{0} \quad {\text{s}} . {\text{t}} .\quad y = Au = \varPhi \varphi^{H} (\varphi x)$$
(10)

However, because \(M \le N\), Eq. (10), which is based on the minimum \(l_{0}\)-norm, is an under-determined problem whose solution is not unique in general. Convex optimization algorithms and greedy algorithms are the most commonly used methods to solve it. In the convex approach, the optimization objective is replaced by the \(l_{1}\)-norm, which transforms the problem into a linear program and, under suitable conditions, ensures the uniqueness of the solution:

$$\hbox{min} \left\| u \right\|_{1} \quad {\text{s}} . {\text{t}} .\quad y = Au = \varPhi \varphi^{H} (\varphi x)$$
(11)

Basis Pursuit (BP) is a typical algorithm based on \(l_{1}\)-norm optimization. Greedy algorithms instead select a locally optimal solution in each iteration to approximate the original signal, and have a lower computational complexity than convex optimization algorithms. Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP) and their improved variants are typically employed.
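The greedy strategy can be sketched in a few lines of Python. The following is a basic textbook form of OMP applied to a synthetic, noise-free problem; the function name `omp`, the problem sizes, the matrix scaling and the coefficient amplitudes are assumptions chosen for illustration and are not the settings used in this chapter.

```python
import numpy as np

def omp(A, y, k):
    """Greedy OMP: estimate a k-sparse u such that y is approximately A u."""
    col_norms = np.linalg.norm(A, axis=0)
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = 0.0          # never select an atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    u = np.zeros(A.shape[1])
    u[support] = coef
    return u

rng = np.random.default_rng(2)
N, M, K = 256, 128, 4                      # assumed sizes with M < N
A = rng.standard_normal((M, N)) / np.sqrt(M)

u_true = np.zeros(N)
idx = rng.choice(N, size=K, replace=False)
u_true[idx] = rng.choice([-1.0, 1.0], size=K) * rng.uniform(1.0, 2.0, size=K)

y = A @ u_true                             # noise-free compressed measurements
u_hat = omp(A, y, K)
error = np.linalg.norm(u_hat - u_true)
print(error)
```

Here the sparsity K is passed to the solver, which is rarely known in practice; a residual-based stopping rule is a common alternative.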

4 Proposed Strategies and Applications

4.1 Experiments

Experiments are carried out to validate the effectiveness of the proposed methods. The test rig, which is composed of a motor, a coupling, a rotor and a shaft with two roller bearings, is shown in Fig. 1 together with the faulty roller bearings. The experiments use roller bearings with a single fault in the outer race, inner race and rolling element, respectively. All faults have a width of 0.7 mm and a depth of 0.25 mm. The sampling frequency is 100 kHz, and the shaft speeds are 500, 900 and 1300 rpm. Vibration sensors are located near the bearings to mitigate the effects of signal attenuation, and the bearing housing is considered a superior location for sensor placement. Vibration signals are measured by an accelerometer located at the top of the bearing housing, and the theoretical values of the fault characteristic frequency are shown in Table 1. All data used in this paper are normalized.

Fig. 1 a Fault test rig of roller bearing, b outer-race fault, c inner-race fault, d rolling-element fault

Table 1 Theoretical values of the fault characteristic frequency

4.2 Reconstruction of Incomplete Vibration Signal

Continuous condition monitoring always leads to big data, which is a major challenge for fault diagnosis. Inspired by the compressive sensing theory, reconstruction from a limited number of samples provides a new idea for signal storage and transmission. If the original vibration signals can be reconstructed from a few samples, only a small number of samples need to be stored instead of the whole data set, and the raw vibration signals can be reconstructed from them when necessary. One of the key preconditions of the compressive sensing theory is that the analyzed signal must be sparse or compressible. Unfortunately, the vibration signals of rotary machinery are often insufficiently sparse in the common transform domains, which presents an obstacle to the application of compressive sensing in fault diagnosis. Here a compression and reconstruction strategy based on compressive sensing is presented to show the potential applications.

Vibration signals measured from faulty bearings are often buried in noise, which weakens their sparsity. Thus, in this section a sparsity-promoting approach based on segmentation threshold denoising is developed.

As shown in Fig. 2, the original vibration signal is first divided into several segments based on its peaks, which are the significant features in faulty vibration signals. A threshold is then set for denoising, which makes the vibration signal sparser. Once the vibration signal is adequately sparse, the identity matrix is selected as the sparse basis, while a Gaussian random matrix is selected as the measurement matrix to obtain random observations that meet the requirements of compressive sensing. Finally, signal denoising and recovery are achieved with a matching pursuit strategy.
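The denoising step can be sketched in Python as follows. The chapter does not specify the segmentation rule or the threshold, so this sketch makes loud assumptions: it uses equal-length segments instead of peak-based segmentation, sets the threshold to a fixed fraction of each segment's peak magnitude, and runs on a synthetic burst signal rather than measured bearing data. The function name `segment_threshold_denoise` is hypothetical.

```python
import numpy as np

def segment_threshold_denoise(signal, n_segments=5, ratio=0.5):
    """Zero every sample whose magnitude is below ratio * (segment peak)."""
    out = np.zeros_like(signal)
    for seg in np.array_split(np.arange(signal.size), n_segments):
        block = signal[seg]
        threshold = ratio * np.abs(block).max()      # per-segment threshold
        out[seg] = np.where(np.abs(block) >= threshold, block, 0.0)
    return out

rng = np.random.default_rng(3)
N = 1000

# Synthetic stand-in for a faulty-bearing signal: one decaying burst per
# 200 samples plus broadband noise (all values are illustrative).
clean = np.zeros(N)
k = np.arange(60)
for start in range(20, N, 200):
    clean[start:start + 60] += np.exp(-k / 10.0) * np.cos(2 * np.pi * k / 8.0)
noisy = clean + 0.05 * rng.standard_normal(N)

denoised = segment_threshold_denoise(noisy)
nonzero_before = int(np.count_nonzero(noisy))
nonzero_after = int(np.count_nonzero(denoised))
print(nonzero_before, nonzero_after)
```

After thresholding, only the dominant samples of each burst remain, so the signal is sparse in the time domain and the identity matrix can serve as the sparse basis for the subsequent random measurement and pursuit-based recovery.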

Fig. 2 The flowchart of the proposed compression and reconstruction strategy

A vibration signal of a roller bearing with an outer-race fault operating at 1300 rpm is shown in Fig. 3; the signal is not significantly sparse. Thus, the sparsity-promoting method based on segmentation threshold denoising is used to increase the sparsity of the original signal, as shown in Fig. 4. After segmentation threshold denoising, the vibration signal becomes much sparser and smoother, as shown in Fig. 5. Once the signal is sparse enough, the compressive sensing theory can be applied to reconstruct it. The dimension-reduced signal with 1000 samples, presented in Fig. 6, is obtained through random sampling. Applying a matching pursuit algorithm recovers the original signal, as shown in Fig. 7. Its envelope spectrum is shown in Fig. 8, from which the running status of the roller bearing can be judged according to the fault characteristic frequency.

Fig. 3 Original vibration signal of a roller bearing with an outer-race fault operating at 1300 rpm

Fig. 4 Segmentation threshold denoising: a1–a5 original segmentation, b1–b5 after segmentation threshold denoising

Fig. 5 Vibration signal after segmentation threshold denoising

Fig. 6 Compressed sampling with 1000 random samples

Fig. 7 Reconstructed signal by compressive sensing strategy

Fig. 8 Envelope spectrum of the recovered signal

4.3 Fault Classification of Rotating Machinery [25]

The compressive sensing theory has proved its capability in reconstruction, de-noising and feature extraction of rotating machinery vibration signals. Additionally, it can be applied to fault diagnosis and classification with only partial reconstruction. This section takes bearing signals as an example to introduce a rotating machinery fault classification method based on dimension-reduction sampling and sparse representation.

A redundant dictionary should contain all possible types of signals, so that any test signal can be described as a linear combination of the vectors in the dictionary. For bearing signals, the redundant dictionary consists of four signal types: the normal signal, the inner-ring fault signal, the outer-ring fault signal and the rolling-element fault signal. E is defined as the redundant dictionary composed of k categories of samples with the following configuration:

$$\begin{aligned} E & = \left[ {E_{1} ,E_{2} , \ldots ,E_{i} , \ldots ,E_{k} } \right] \\ & = [\nu_{11} , \ldots ,\nu_{{1N_{1} }} ,\nu_{21} , \ldots ,\nu_{{2N_{2} }} , \ldots ,\nu_{i1} , \ldots ,\nu_{{iN_{i} }} , \ldots ,\nu_{k1} , \ldots ,\nu_{{kN_{k} }} ] \in R^{M \times N} \\ \end{aligned}$$
(12)
$$N = N_{1} + N_{2} + N_{3} + \cdots + N_{k}$$
(13)

where \(E_{i} = [\nu_{i1} ,\nu_{i2} , \ldots ,\nu_{{iN_{i} }} ] \in R^{{M \times N_{i} }}\) collects the \(N_{i}\) training samples of the ith fault category.

After the redundant dictionary is configured, a test signal sample x of the ith category can be described as a linear combination of the vectors in the redundant dictionary,

$$x = Eu \in R^{M}$$
(14)

Thus, the bearing signal x is represented by the sparse vector \(u = [0, \ldots ,0,u_{i1} ,u_{i2} , \ldots ,u_{{iN_{i} }} ,0, \ldots ,0]^{T} \in R^{N}\) in the transform basis E, which consists of over-complete training samples.

A Gaussian random measurement matrix \(R \in R^{D \times M} (D \ll M)\) with i.i.d. \({\mathcal{N}} (0, 1)\) entries is applied to the bearing signal sample x and the redundant dictionary E for dimension reduction by random mapping, which provides the compressive observations \(y = Rx\) and the sensing matrix \(\widetilde{E} = RE\):

$$y = Rx = REu = \widetilde{E}u \in R^{D}$$
(15)

For each test sample x, its sparse representation u over the training set E can be obtained through the SP algorithm. A set of new sparse vectors \(\delta_{i} (u)\) is then defined such that \(u = \sum\nolimits_{i} {\delta_{i} (u)}\), where \(\delta_{i} (u)\) keeps the elements of u corresponding to the ith signal category and sets all other elements to zero. Thus, the mapping feature of the test sample x in the ith category is

$$\hat{y}_{i} = \widetilde{E}\delta_{i} (u)$$
(16)

The residual error between compressive observations y and feature value \(\hat{y}_{i}\) is calculated:

$$\text{min}_{i} \,r_{i} (y) = \left\| {y - \widetilde{E}\delta_{i} (u)} \right\|_{2}$$
(17)

The category of the test sample can be determined by the minimum residual error.
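The classification rule of Eqs. (15)–(17) can be sketched in Python as follows. Several elements are assumptions made for illustration: the training set is synthetic (each class is a random prototype plus small perturbations), the sizes M, D and the number of samples per class are arbitrary, and a basic OMP routine is used as a stand-in for the SP algorithm named in the text. The helper names `omp` and `classify` are hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Greedy sparse coding, used here as a stand-in for the SP algorithm."""
    norms = np.linalg.norm(A, axis=0)
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(A.T @ residual) / norms
        if support:
            scores[support] = 0.0
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    u = np.zeros(A.shape[1])
    u[support] = coef
    return u

def classify(y, E_tilde, labels, k=3):
    """Return the class whose atoms give the smallest residual (Eq. 17)."""
    u = omp(E_tilde, y, k)
    classes = np.unique(labels)
    # delta_i(u): keep the entries of class i, zero all the others.
    residuals = [np.linalg.norm(y - E_tilde @ np.where(labels == c, u, 0.0))
                 for c in classes]
    return int(classes[int(np.argmin(residuals))])

rng = np.random.default_rng(4)
M, D, n_classes, per_class = 256, 32, 4, 10     # assumed sizes, D << M

# Synthetic training set: each class is a random prototype plus small noise.
prototypes = rng.standard_normal((n_classes, M)) / np.sqrt(M)
labels = np.repeat(np.arange(n_classes), per_class)
noise = 0.1 * rng.standard_normal((labels.size, M)) / np.sqrt(M)
E = (prototypes[labels] + noise).T               # dictionary, M x N

R = rng.standard_normal((D, M)) / np.sqrt(D)     # Gaussian random projection
E_tilde = R @ E                                  # sensing matrix, D x N

true_class = 2
x = prototypes[true_class] + 0.1 * rng.standard_normal(M) / np.sqrt(M)
predicted = classify(R @ x, E_tilde, labels)
print(predicted, true_class)
```

Classification is carried out entirely on the D-dimensional observations; the original M-dimensional signal is never reconstructed.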

The flow chart of this sparse representation classification framework based on compressive sensing is shown in Fig. 9.

Fig. 9 Flowchart of the sparse representation classification framework

Each fault-signal test group consists of 500 test samples at three different speeds, namely 500, 900 and 1300 rpm. The fault identification and classification accuracy of the proposed method is presented in Table 2.

Table 2 Fault classification results at different rotating speeds

Traditional fault pattern recognition methods, such as back-propagation (BP) neural networks and SVM, usually rely on characteristic parameters in the time and frequency domains to achieve fault classification. In this section, a sparse representation classification (SRC) algorithm based on compressed sensing is proposed. A sparse solution vector is obtained by the SP algorithm, and the residual errors between the class-wise features and the observed values determine the category of the test signal. A comparison is shown in Fig. 10. The SRC method achieves a higher accuracy rate in rotating machinery fault classification than the traditional BP and SVM pattern recognition methods.

Fig. 10 Classification results of SRC method in comparison with BP and SVM

To investigate the effect of the length of the original signals on the sparse classification results, the length of each input signal is varied from 5000 to 50,000. The average classification accuracy of the proposed SRC algorithm increases gradually with signal length at the different rotating speeds. When the signal length exceeds 30,000, the average recognition rate of SRC can reach 99.67%; more details are shown in Fig. 11.

Fig. 11 Effect of the length of signal M on average classification accuracy

The sparse dimension reduction parameter D directly affects how well the features of the original signals are extracted and preserved, so it is necessary to discuss the influence of this measurement dimension D on the classification accuracy of the proposed SRC. Figure 12 indicates that the average classification accuracy increases with \(D = 2^{j} (j = 1,2, \ldots ,8)\), and when \(D \ge 2^{5} = 32\), the classification accuracy for the three kinds of bearing faults can reach 98.5%.

Fig. 12 Influence of variable measurement dimension D on average classification accuracy

4.4 Compressive Sensing of Bearing Fault via Characteristic Harmonic Detection

As mentioned above, the vibration signal of a roller bearing is insufficiently sparse in the Fourier domain to meet the requirement of compressive sensing. It is well known that when a defect occurs in a roller bearing, an impulse is generated each time the defective surface strikes another surface; the repetition rate of the resulting periodic impulses is termed the fault characteristic frequency. The inadequate sparsity of a vibration signal has a negative effect on signal reconstruction. If the gathered data is incomplete or compressed for some reason, it must be recovered before post-processing methods can be used to identify the condition of the roller bearing. This makes it difficult to meet the efficiency requirement of real-time condition monitoring. Ideally, fault detection would be performed directly on the compressed samples without complete recovery [26].

The sparsity of a signal is regarded as a priori information by most existing reconstruction algorithms based on compressive sensing theory. However, in practice, it is difficult to achieve a perfect sparse representation and obtain a specific sparsity. Therefore, the sparsity of a signal must be estimated correctly; otherwise, it becomes an obstacle to the application of compressive sensing.

In fault diagnosis of roller bearings, the objective is to extract fault features rather than to reconstruct the data. Thus, complete reconstruction of a signal is not always necessary. To our knowledge, the envelope signal of a roller bearing consists of a variety of harmonic waves as sub-components, which are related to the fault features. In addition, it is well known that a harmonic wave has a sparsity of 2 in the Fourier domain. If the harmonic waves related to the fault features can be detected in the Fourier domain, a decision can be made as to whether or not a fault exists in the roller bearing [26]. Based on this idea, a compressed fault detection method for roller bearings is developed in this work, and the fault detection flowchart is presented in Fig. 13.
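The sparsity value of 2 quoted for a harmonic wave can be checked with a short worked example. It assumes a real harmonic sampled at N points whose frequency falls exactly on DFT bin \(k_{0}\), with \(0 < k_{0} < N/2\); the symbols \(x[n]\), \(X[k]\) and \(k_{0}\) are introduced here for illustration only.

$$x[n] = \cos \left( {\frac{{2\pi k_{0} n}}{N}} \right) = \frac{1}{2}\left( {e^{{j2\pi k_{0} n/N}} + e^{{ - j2\pi k_{0} n/N}} } \right),\quad X[k] = \sum\limits_{n = 0}^{N - 1} {x[n]e^{ - j2\pi kn/N} } = \left\{ {\begin{array}{*{20}l} {N/2,} & {k = k_{0} \;{\text{or}}\;k = N - k_{0} } \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.$$

Only two Fourier coefficients are non-zero, which gives K = 2. When the frequency falls between two bins, spectral leakage spreads the energy over neighbouring bins and the harmonic is only approximately 2-sparse.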

Fig. 13 Scheme of the proposed fault detection strategy with compressive sensing of characteristic harmonic waves

Here the Fourier basis is selected for sparse representation, and a Gaussian random matrix is chosen as the measurement matrix to reduce the amount of bearing vibration data. Finally, a matching pursuit algorithm, such as orthogonal matching pursuit (OMP) or compressive sampling matching pursuit (CoSaMP), is utilized to detect the harmonic waves at the frequencies of interest.
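The detection idea can be illustrated with a small sketch. It is not the CoSaMP-based procedure used in this work: it is a simplified OMP-style greedy search in which each real harmonic is represented by one cosine/sine pair of a real Fourier dictionary. The synthetic signal, the sampling rate `fs`, the sizes `n` and `m`, and the names `fourier_atoms` and `detect_harmonics` are all assumptions made for illustration.

```python
import numpy as np

def fourier_atoms(n, fs):
    """Real Fourier dictionary: one cos and one sin column per DFT bin
    (DC and Nyquist bins are excluded)."""
    t = np.arange(n) / fs
    freqs = np.arange(1, n // 2) * fs / n
    arg = 2 * np.pi * np.outer(t, freqs)
    return freqs, np.cos(arg), np.sin(arg)

def detect_harmonics(y, Phi, fs, n_harmonics=1):
    """Greedily pick the frequencies whose compressed cos/sin atoms best
    explain the compressed measurements y = Phi @ x."""
    freqs, C, S = fourier_atoms(Phi.shape[1], fs)
    Ac, As = Phi @ C, Phi @ S                      # atoms seen through Phi
    Ac_n = Ac / np.linalg.norm(Ac, axis=0)
    As_n = As / np.linalg.norm(As, axis=0)
    resid, picked = y.copy(), []
    for _ in range(n_harmonics):
        # Approximate energy of the residual captured by each cos/sin pair.
        score = (Ac_n.T @ resid) ** 2 + (As_n.T @ resid) ** 2
        if picked:
            score[picked] = 0.0                    # do not pick a bin twice
        picked.append(int(np.argmax(score)))
        # Least-squares refit on all selected pairs (the "orthogonal" step).
        B = np.hstack([Ac[:, picked], As[:, picked]])
        coef = np.linalg.lstsq(B, y, rcond=None)[0]
        resid = y - B @ coef
    return freqs[picked]

# Synthetic envelope-like signal (assumed): a 100 Hz harmonic, its second
# harmonic, and a small amount of noise.
rng = np.random.default_rng(1)
n, m, fs = 1024, 200, 1024.0                       # samples, measurements, Hz
t = np.arange(n) / fs
x = (np.cos(2 * np.pi * 100 * t + 0.3)
     + 0.5 * np.cos(2 * np.pi * 200 * t + 1.0)
     + 0.05 * rng.standard_normal(n))
Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian measurement matrix
y = Phi @ x                                        # compressed observations

found = detect_harmonics(y, Phi, fs, n_harmonics=2)
print("detected frequencies (Hz):", found)
```

The signal itself is never reconstructed here; only the frequencies of the dominant harmonics are read off from the compressed observations, which is what makes a direct fault decision possible.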

The proposed detection strategy is implemented to extract the fault features of a bearing with a fault on the inner race at a shaft speed of 900 rpm. The time domain waveform with impulses is presented in Fig. 14. In general, it is difficult to extract fault characteristic frequencies from such a large number of samples. Therefore, the proposed compressed fault detection method is applied to extract the fault features. As mentioned above, the Gaussian random matrix is selected as the measurement matrix while the Fourier basis is chosen for sparse representation. Next, the detection method based on CoSaMP is used to extract the fault characteristic frequency, where the sparsity K is set to 2. With the measurement matrix, the number of samples can be compressed to 800, as shown in Fig. 15. The frequency of the first detected harmonic component is 100.6 Hz, as shown in Fig. 16, which is almost equal to the theoretical value. Furthermore, the component at twice the fault characteristic frequency can also be detected, as shown in Fig. 17. Therefore, it can be concluded that a fault exists on the inner race. A smaller measurement dimension of 400 is then used to further validate the effectiveness of the proposed method. From the results in Figs. 18, 19 and 20, it can be concluded that the proposed method can also detect the fault with 400 observations.

Fig. 14 Time domain waveform of a roller bearing with a fault on the inner race at 900 rpm

Fig. 15 Random sampling through compressed sensing with 800 observations

Fig. 16 Fault characteristic frequency of the first detected harmonic component from 800 samples

Fig. 17 2 × fault characteristic frequency of the detected harmonic wave from 800 samples

Fig. 18 Random sampling through compressed sensing with 400 observations

Fig. 19 Fault characteristic frequency of the first detected harmonic component from 400 samples

Fig. 20 2 × fault characteristic frequency of the detected harmonic wave from 400 samples

5 Conclusions

To address the problems of big data and incomplete small samples in condition monitoring of rotary machinery, this paper introduced the recently developed compressive sensing theory to the field of rotary machinery. A threshold denoising method is used to promote the sparsity of roller bearing vibration signals so that the signal can be reconstructed, which provides a new insight for signal storage and transmission. Furthermore, a fault classification method based on compressive sensing is developed in this work without designing a classifier. Compared with other classification methods such as BP and SVM, its success rate is higher. In addition, a compressed fault detection strategy is proposed to detect the fault features directly from limited samples, which can increase the efficiency of fault diagnosis. Reconstruction and detection can proceed simultaneously without complete recovery, and the resulting improvement in detection efficiency is validated by simulations and experiments. The strategy of compressed detection provides a new insight into condition monitoring of rotary machinery, making it possible to greatly reduce the size of data sets while preserving useful information for monitoring. However, many unsolved problems remain for future investigation. Eliminating more redundant information while preserving more useful samples will be the focus of our future work on compression strategy.