Introduction

Rolling element bearings are among the most important components of industrial setups, from motors, turbines and compressors to heavy ground and air vehicles [14]. During operation, different faults arise that generate vibration and acoustic emission (AE) signals with distinct characteristics; the differences stem from the complexity of the mechanical setup and the interactions between its mechanical components. Real-time monitoring and fault detection can avoid disastrous failures [38] and severe losses [66] and have therefore received considerable attention [39]. A variety of condition-based monitoring (CBM) methods have been developed and discussed in the literature, including but not limited to vibration, acoustic emission, oil debris, electrostatic and temperature analysis [30, 37, 44, 53].

The vibration signal can be used to detect and locate faults such as mass unbalance, misalignment, gear faults and cracks, along with their propagation in rotating shafts and gear wheels [27]. However, early detection of these cracks in shafts and gears is possible with acoustic emission [56], with typical frequencies ranging from 20 kHz to 1 MHz [29]. Sound is a vibration signal that propagates in the audible frequency range, while acoustic emission is generated as a transient elastic wave resulting from fast strain energy discharge due to damage within or on the material surface [48]. Audible sound ranges from 20 Hz to 20 kHz and hence requires far fewer samples than acoustic emission.

Vibration and AE signal analysis are effective tools for studying local defects in rolling elements and estimating their size. However, vibration signal analysis is less effective than acoustic emission for fault detection in low-speed rotating machines and for early fault detection, making AE one of the most extensively reported techniques in the literature [53]. These signals are complicated and non-stationary in nature, with heavy background noise [18]; therefore, a pattern recognition process is combined with different signal processing algorithms to detect and classify the faults. Signal characteristics to be used with artificial intelligence (AI) are extracted using signal processing techniques such as time domain, frequency domain and time–frequency domain analysis [13, 18, 67]. More recently, AI methods that can extract features directly from the original signals, such as deep neural networks, are being used in fault diagnosis [55, 60, 63, 68].

Time domain features of AE signals, such as energy, average signal level and duration count, have been used to detect shaft cracks [24], and the amplitude and energy of the AE signal have been exploited to detect defects in roller bearings [1]. Abdullah et al. [2] compared vibration and AE signals for bearing defects and their size, using the amplitude and root-mean-square (RMS) values of the signal under test. The authors in [13] analyzed vibration data of a slew bearing in detail using time domain kurtosis, wavelet domain kurtosis and the largest Lyapunov exponent (LLE) as features, combining these with kernel-based regression to detect incipient damage and estimate the remaining useful life of the slew bearing. Kurtosis and its variants, such as spectral kurtosis, short-time Fourier transform (STFT)-based kurtosis, the kurtogram and adaptive spectral kurtosis of vibration signals from rotating machinery, are discussed in detail in [65]. Antoni et al. [5] described a fast kurtogram algorithm, with computational complexity similar to that of the fast Fourier transform (FFT), for the detection of transient faults.

In the frequency domain, the Fourier transform is one of the most widely used signal processing techniques [42]; however, it lacks time information. As vibration signals are non-stationary in nature, time–frequency methods are used. The STFT was proposed to provide local feature information [26]; it calculates the local spectrum of the signal using a fixed window. A more flexible time–frequency approach is the wavelet transform, which can also be used to detect faults [7, 23, 31, 34]. The authors in [15] used time–frequency techniques such as the Hilbert–Huang transform (HHT) and the continuous wavelet transform (CWT) to detect coupling shaft cracks, misalignment and rotor–stator rub in vibration signals. HHT has been combined with a KNN classifier to effectively monitor bearing defects [49], and the wavelet transform has been used for early stage fault detection in the outer race [14]. At very low signal-to-noise ratio (SNR), faults at the incipient stage were detected in [14] by using the wavelet transform to de-noise the AE signals and envelope detection with autocorrelation to detect faulty patterns. The authors in [22] used the spectral contents of intrinsic mode functions (IMFs) to analyze both vibration and AE signals and detect broken bars, bearing defects and unbalanced load distribution in induction motors. However, this method has limitations, such as requiring a priori knowledge of the number of modes into which the signal is to be decomposed and localization of the fault frequencies. The authors in [67] used the empirical mode decomposition algorithm and convolutional neural networks (CNNs) to extract features from raw vibration signals; SVM and Softmax training algorithms were then used to classify the faults into outer race, inner race and ball faults.

To acquire the vibration and acoustic signals, sensors such as tachometers or accelerometers are directly attached to the machinery under test. In the case of complex machinery and/or high temperature and humidity, mounting sensors directly on the machinery becomes infeasible. Therefore, to reduce the installation and maintenance cost of the CBM system, the signals must be sensed remotely. One solution is to use noninvasive sensors such as microphones for recording machine sounds [32]. Using sound signals for maintenance is a cost-effective method that can easily be adopted by small-scale industries [4]. The microphones are generally mounted 2–10 cm away from and pointed toward the machinery under observation [11, 64]. In most cases, microphone arrays are used for recording machine sounds [6, 17, 25, 41, 43, 58]; the use of a single microphone for fault detection has received little attention because of its complexity [64] and the contamination of the sound by the surrounding environment, which requires complex source separation algorithms such as wavelet-based and blind source separation (BSS) [45, 70].

Sound recognition and classification using a single microphone have been reported in the literature for speaker and environmental sound recognition [62], using methods such as the wavelet packet transform (WPT), mel-frequency cepstral coefficients (MFCCs), hidden Markov models (HMMs) and Gaussian mixture models (GMMs), some of which have also been used in vibration-based fault diagnosis. These algorithms can be, and are being, used in machine fault diagnosis, as machine sounds are less dynamic than human speech [40]. In [46], the multi-scale fractal dimension (MFD) was successfully used to extract features from vibration signals of faulty bearings; these features were used with HMM and GMM classifiers, giving better results but with expensive computation. Similarly, a combination of MFCC and kurtosis for feature extraction with an HMM gave results similar to MFD with HMM [46]. The wavelet packet transform combined with the FFT and an artificial neural network (ANN) was used for fault diagnosis and prognosis in a blower, using low-frequency vibration recorded with three sensors [69]. In [52], MFCC and linear predictive coding (used in speech encoding), combined with Euclidean distance and KNN, achieved 92% and 94% accuracy, respectively, in vibration-based bearing fault detection. In [40], MFCC with KNN and a multivariate Gaussian distribution (MGD) was used to classify different machine sounds; features such as frequency cepstral (FFC) coefficients were extracted using linear, log and mel filters of different lengths, and the FFC features with a linear filter and MGD were identified as the most appropriate method. Similarly, certain time–frequency algorithms, such as the STFT [28, 35, 50] and the Wigner–Ville distribution (WVD) [10, 36, 51, 59], that were successfully used for vibration analysis have also been used for fault detection in a fuel injection system using an array of microphones [3]. The authors in [54] used statistical and histogram methods to extract features from sound signals in the audible frequency range; statistical features such as standard deviation, mean, mode, median and variance were combined with histogram features, a decision tree algorithm selected the best features, and an ANN classified the faults into unbalanced shaft, outer race and inner race faults. Their results show that the statistical features outperform the histogram-based features.

Therefore, sound signals recorded with microphones can be used for fault detection in rotating machines as a noninvasive, remote approach, reducing installation and maintenance complexity as well as the required sampling rate compared to AE signals. In this research work, a single microphone is used to record the sound signals of bearings with different faults. To the best of our knowledge, this is the first attempt to use the audio signal to detect bearing faults and classify them into ball, inner race and outer race faults. The audio signal is a mechanical/compression signal like the vibration and acoustic emission signals; therefore, an attempt is made to survey the full range of algorithms available for fault detection, and a subset of candidates from each domain, namely the time, frequency and time–frequency domains, was selected for testing. Different feature extraction methods from these domains were used with ML algorithms to detect and classify the faults. The candidate feature extraction methods include kurtosis, skewness, spectral kurtosis, envelope detection, STFT, FFT and power spectral density (PSD). These features were used to train different classifiers, namely KNN, SVM, kernel linear discriminant analysis (KLDA) and sparse discriminant analysis (SDA). The rest of the paper is organized as follows: Sect. 2 gives a brief introduction to the signal analysis and classification methods, such as kurtosis, FFT, STFT and envelope detection. In Sect. 3, the sound signals are analyzed using these methods and the signal graphs of the different faults are discussed. The classifiers and the classification of faults into inner race, outer race and ball faults are discussed in Sect. 4, while concluding remarks are given in Sect. 5.

Fig. 1 Filter bank procedure

Signal Processing Algorithms

This section summarizes the basic concepts of the signal processing algorithms used in the rest of the paper. As discussed above, sound signals have not been extensively analyzed for fault classification, yet they exhibit characteristics similar to vibration and acoustic emission signals except for the frequency range. Therefore, a subset of the analysis techniques previously used for vibration- or AE-based fault detection is considered here. To the best of our knowledge, passing the signals through different frequency bands and calculating the average FFT or PSD per band has not been discussed by the research community; moreover, this limits the size of the feature vector and the computational burden on the machine learning algorithm.

Statistical Features of Raw Signals

The given raw sound data were used to extract statistical features such as maximum, minimum, standard deviation, mean, median, variance, range, skewness, kurtosis, Petrosian fractal dimension [8], Fisher information ratio [8, 33] and entropy. The results for skewness, kurtosis and standard deviation are presented and discussed in the next sections. These features were also concatenated to obtain the final statistical feature representation of the signal under test, which was then fed to the different classification algorithms. The RMS values of the frequency domain features were also calculated; these are discussed in the frequency domain section.
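As an illustration, a minimal Python sketch of assembling such a statistical feature vector is given below, assuming NumPy/SciPy and a hypothetical raw signal array `x`; the Petrosian fractal dimension and Fisher information ratio are omitted for brevity.

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy

def statistical_features(x):
    """Concatenate basic statistical features of a raw sound signal x."""
    hist, _ = np.histogram(x, bins=64, density=True)  # for an entropy estimate
    return np.array([
        x.max(), x.min(), x.std(), x.mean(), np.median(x),
        x.var(), np.ptp(x),            # range (peak to peak)
        skew(x), kurtosis(x),          # shape of the amplitude distribution
        entropy(hist + 1e-12),         # histogram-based entropy estimate
    ])

# Example: feature vector for one recorded signal (hypothetical data)
x = np.random.randn(40000)             # one second at 40 kHz
features = statistical_features(x)
```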

Frequency Domain Features

The frequency domain is one of the important signal analysis domains, in which all the spectral components from which the raw signal is formed are analyzed. In the frequency domain analysis, the Fourier transform, envelope detection and power spectral density are calculated. The Fourier transform is computed using the fast Fourier transform algorithm, and the envelope is obtained via the Hilbert transform, as given in Eqs. 1 and 2, respectively, while the PSD can be calculated as the Fourier transform of the autocorrelation function. For further details, interested readers are referred to [47].

$$\begin{aligned} y[k]=\sum _{n=0}^{N-1} x[n]\, e^{-\frac{j2\pi nk}{N}}, \quad k=0,1,2,\ldots ,N-1 \end{aligned}$$
(1)

where y[k] is the Fourier transform and x[n] is the signal under test.

$$\begin{aligned} \hat{x}[k] = {\left\{ \begin{array}{ll} {\frac{2}{\pi }\sum \nolimits _{n\,\mathrm{odd}} \frac{x[n]}{k-n}} &{}\quad \text{ if } k \text{ is } \text{ even }\\ {\frac{2}{\pi }\sum \nolimits _{n\,\mathrm{even}} \frac{x[n]}{k-n}} &{}\quad \text{ if } k \text{ is } \text{ odd } \end{array}\right. } \end{aligned}$$
(2)

where \(\hat{x}[k]\) is the (discrete) Hilbert transform of x[n]. The magnitude of the analytic signal \(x[n] + j\hat{x}[n]\) gives the envelope of x[n], and taking the Fourier transform of this envelope yields the envelope spectrum.
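A minimal sketch of this envelope spectrum computation, assuming SciPy's `hilbert` routine (which returns the analytic signal directly) and a sampling rate `fs`; the DC removal step is our addition to make the resulting spectrum easier to read:

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs):
    """Envelope spectrum of x: magnitude of the analytic signal,
    followed by an FFT of the resulting envelope."""
    analytic = hilbert(x)              # x[n] + j*x_hat[n]
    env = np.abs(analytic)             # signal envelope
    env = env - env.mean()             # remove the DC offset
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spectrum
```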

To further analyze the behavior of the signals with and without faults in the frequency domain, the signals were passed through different bandpass filters, and the average values of the Fourier transform and PSD in those specific frequency bands were calculated. The generalized operation is shown in Fig. 1, where BPF\(_1\) to BPF\(_n\) are the bandpass filters, one per band. The number of bands depends on the range of frequencies of interest and is variable. The features of the signals are first calculated without any filtering, as discussed above. The same input signal is then passed through the different bandpass filters with the desired center frequencies and bandwidths, and the FFT and PSD of each filtered signal are averaged over the corresponding frequency range. To the best of our knowledge, this procedure of dividing the signal into different frequency bands and taking the average FFT and PSD as feature vectors for fault detection in rotating machinery has not been reported in the literature.
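A sketch of this filter bank procedure is given below, assuming Butterworth bandpass filters and Welch's method for the PSD from SciPy; the filter order and `nperseg` are illustrative choices, not settings reported in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

def band_features(x, fs, bands, order=4):
    """For each (low, high) band, band-pass filter x and average the
    FFT magnitude and the Welch PSD over that band (cf. Fig. 1)."""
    feats = []
    for lo, hi in bands:
        b, a = butter(order, [lo, hi], btype='bandpass', fs=fs)
        xf = filtfilt(b, a, x)                     # zero-phase filtering
        mag = np.abs(np.fft.rfft(xf))
        freqs = np.fft.rfftfreq(len(xf), d=1.0 / fs)
        in_band = (freqs >= lo) & (freqs <= hi)
        f_psd, psd = welch(xf, fs=fs, nperseg=2048)
        in_band_psd = (f_psd >= lo) & (f_psd <= hi)
        feats.append([mag[in_band].mean(), psd[in_band_psd].mean()])
    return np.array(feats).ravel()

# Bands centered at 15 Hz and its multiples, 30 Hz wide (as in Sect. 3)
bands = [(max(c - 15, 1), c + 15) for c in range(15, 1000, 15)]
```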

Time–Frequency Domain Features

The input signal was first divided into multiple segments of equal length using a Hanning window, and then the STFT of each segment was computed using Eq. 3

$$\begin{aligned} X(m,\omega )=\sum _{n = -\infty }^{\infty } x[n]\, w[n-m]\, e^{-j\omega n} \end{aligned}$$
(3)

where x[n] is the data, w[n] is the window function (a Hanning window in our simulations) and \(X(m,\omega )\) is the STFT of \(x[n] w[n-m]\). To compute the feature vector, the STFT values of the windowed segments were averaged to form a spectrogram-based feature vector.
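A sketch of this spectrogram-based feature, assuming SciPy's `stft` with a Hann window (the segment length is an illustrative choice):

```python
import numpy as np
from scipy.signal import stft

def stft_features(x, fs, nperseg=1024):
    """Average the STFT magnitude over time to obtain one value per
    frequency bin, as a spectrogram-based feature vector."""
    f, t, Z = stft(x, fs=fs, window='hann', nperseg=nperseg)
    return np.abs(Z).mean(axis=1)      # average across windowed segments
```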

Fig. 2 Test rig block diagram and specifications

Signal Analysis for Fault Detection

This section presents the statistical and signal processing analysis of the audio signals. Machine learning (ML) techniques can detect minute differences and hence classify the signals into different faults; however, to discuss the signals in depth, graphs that can be inspected visually are presented here. The sound of healthy and faulty bearings was recorded using a single microphone at a sampling rate of 40,000 samples per second, with the shaft rotating at 800 revolutions per minute. The microphone was mounted at distances of 2–3 cm from the machinery under test, with bearings having ball, inner race and outer race faults. The block diagram of the test rig and other specifications are given in Fig. 2.

In contrast to laboratory settings, sound signals in heavy industries are affected by noise contributed by different types of machinery; in such cases, the signal of interest must be filtered out from other unwanted signals. This limitation can be mitigated by physically installing the sound signal-based fault detection system so as to minimize the effects of unwanted noise. For example, in simple machines such as a vacuum cleaner, the same sound signal can be used to classify different faults [12] and no signal separation is required. Such physical isolation of sensors can work efficiently in simple mechanical setups; however, in heavy industry or complex environments, additional sound separation algorithms, e.g., independent component analysis (ICA) [20], are required. The installation of sensors for sound separation may vary from industry to industry depending on how the different mechanical systems are mounted. This paper is an initial attempt at shifting toward sound-based fault detection in rotating machinery only; with the help of signal separation algorithms, our method can also be extended to the classification of other faults, such as gear faults and shaft misalignment, in complex environments.

Fig. 3 Statistical analysis of healthy and faulty signals

Statistical Analysis

Statistical techniques are widely used for the analysis of almost every type of data, whether stationary or non-stationary. These are well-established techniques including median, mode, kurtosis, RMS, standard deviation, regression, correlation, etc. The time domain sound signals of faulty and healthy bearings were analyzed using skewness, standard deviation, RMS values and kurtosis.

The selected statistical analyses are compared in Fig. 3. In Fig. 3a, the skewness of the healthy sound signal is compared with that of the signals with ball, outer race and inner race faults. The healthy signal has the highest skewness in all cases, followed by the inner race fault, ball fault and outer race fault, and hence the signals can easily be differentiated. In Fig. 3b, the standard deviation of the same signals is shown; here a gradual decrease can be seen from the healthy signal to the signals with ball, inner race and outer race faults. Similar results were obtained using the RMS values of the signals, as shown in Fig. 3c. For the kurtosis, shown in Fig. 3d, the signal with the outer race fault has the maximum value, followed by the signal with the inner race fault, the healthy signal and the ball fault signal. The standard deviation and RMS support each other's results, while skewness and kurtosis give similar patterns except for the outer race fault.

These statistical values were used as feature vectors for training different machine learning algorithms to classify the signals into different fault groups, as discussed in Sect. 4.

Frequency Domain Analysis

In this section, the frequency domain analysis is carried out using the Fourier transform, envelope analysis and power spectral density (PSD) of the faulty and non-faulty signals. First, the FFT, envelope detection and PSD of the whole signal are calculated, as shown in Figs. 4, 5 and 6, respectively. Second, both the healthy and faulty signals were passed through bandpass filters and the average values of the FFT and PSD in those frequency bands were calculated, as shown in Fig. 7a, b, respectively. In all these figures, the x-axis shows frequency in Hz and the y-axis shows magnitude (in W/Hz for the PSD plots).

Fig. 4 FFT of healthy and faulty signals

Fig. 5 Envelope detection of the healthy and faulty signals

Figure 4 shows the frequency spectrum of the healthy signal and of the signals with ball, inner race and outer race faults; the plots are zoomed for readability. Figure 4a shows spikes at around 100, 200 and 300 Hz, while beyond 400 Hz the graphs become almost smooth due to high-frequency noise. The regions around 100, 200 and 300 Hz are shown zoomed in Fig. 4b–d, respectively. The healthy signal and the signals with outer and inner race faults are almost centered at 99.7 Hz, while the signal with the ball fault is centered at 100.3 Hz, as shown in Fig. 4b. The tall, narrow spike of the outer race signal suggests that the fault in the outer race is thin and hair-like, while the wider spike of the inner race signal suggests that the fault in the inner race is dent-like and relatively wider than the fault in the outer race. The signal with the ball fault is slightly shifted to the right, as the dent in the ball has changed the ball frequency. The fundamental frequency of the shaft is 800 rev/min, i.e., 13.33 Hz, and the frequencies at which these spikes occur are multiples of this fundamental frequency; however, all the spikes are slightly shifted to the right. The same patterns are repeated at almost 199.7 Hz and 299.7 Hz in Fig. 4c, d, respectively. There are also small spikes at 13.33 Hz and its multiples, corresponding to the fundamental shaft frequency; their effect is negligible compared to the spikes at the frequencies above. The cumulative effect of the spikes at the fundamental frequency and its multiples is analyzed using the PSD and the average FFT and average PSD discussed at the end of this section.

Figure 5 shows the envelope detection of the healthy and faulty signals; these plots are also zoomed for readability. The graphs show results comparable to those of the Fourier transform discussed above, with similar shapes and spikes. The spikes of the healthy and faulty signals at 99.7 Hz, 199.7 Hz and 299.7 Hz are given in Fig. 5b–d, respectively. Again, the spikes at the fundamental frequency and its multiples appear ineffective at this stage; their cumulative effect is discussed with the PSD and the average FFT and average PSD.

To further assess the behavior of these signals, the power associated with the spectral components (PSD) is shown in Fig. 6. Figure 6a shows spikes at around 100, 200 and 300 Hz; beyond 300 Hz, the graph becomes almost smooth with no further information. The spikes at the fundamental frequencies that were visible in the previous graphs are negligible here. The graph is zoomed at 100 Hz, 200 Hz and 300 Hz in Fig. 6b–d, respectively. In Fig. 6b, the spikes are clearer than in Figs. 4 and 5. The power of the signal with the outer race fault is concentrated in a very narrow frequency range and slightly leads the healthy signal in frequency. The power of the signal with the inner race fault is spread more widely than the other signals, spanning the combined frequency range of the healthy and outer race fault signals, while the signal with the ball fault lags all the others. Similar patterns are repeated at 199.7 and 299.7 Hz in Fig. 6c, d.

Fig. 6 Power spectral density of healthy and faulty signals

The differences between the graphs discussed above already allow the signals to be separated into healthy and faulty ones. However, to further analyze the signals in specific frequency bands, they were passed through a bank of bandpass filters with center frequencies at 15 Hz and its multiples, each with a bandwidth of 30 Hz. The averages of the FFT and PSD in those frequency bands were calculated and are shown in Fig. 7.

As shown in the previous graphs, most of the signal power is concentrated below 450 Hz; therefore, the signals were filtered only up to a maximum of 1000 Hz. The average FFT in Fig. 7a shows spikes at around 99.7, 199.7 and 299.7 Hz, while beyond 450 Hz the amplitude decreases rapidly. The average PSD in Fig. 7b is clearer, giving spikes at the same frequencies, with the maximum power associated with the healthy signal, followed by the signals with inner race, ball and outer race faults; here the power drops considerably after 350 Hz.

Fig. 7 Comparison of average FFT, PSD and power bins of the healthy and faulty signals

Similarly, the RMS values of the FFT and the PSD are shown in Fig. 8a, b, supporting the results of the averaged FFT and PSD graphs shown in Fig. 7.

Fig. 8 Comparison of RMS values of FFT and PSD of the healthy and faulty signals

Time–Frequency Analysis

Time–frequency analysis reveals the spectral components along with the times of their occurrence, using algorithms such as the short-time Fourier transform, the Hilbert–Huang transform and the wavelet transform. In this section, only the STFT of the signals is taken and discussed; other time–frequency techniques will produce different graphs and may give better results, but covering all time–frequency analysis algorithms would require a full, dedicated research paper. The STFT of the signals is given in Fig. 9, showing the healthy signal and the signals with ball, inner race and outer race faults in Fig. 9a–d, respectively. All the graphs support the previous results by giving spikes at the frequencies discussed above, while the smaller spikes at multiples of 13.33 Hz are negligible, except for the outer race fault as shown in Fig. 5a, d. It can be seen that the high-power frequency components are present between 200 Hz and 300 Hz and at frequencies near 100 Hz, explaining the high amplitude of the signal with the outer race fault in the previous figures.

Fig. 9 STFT of healthy and faulty signals

Classification of Faults

This section outlines the supervised machine learning techniques for fault classification with our proposed features extracted in the previous steps.

We begin by defining \({\mathbf {s}} = [s_1, s_2,\ldots ,s_p]^{\top } \in {\mathbb {R}}^{p}\) to be the p-dimensional feature vector that is constructed from the statistical and frequency domain features extracted from a sound signal. Let \({\mathbf {G}} = \{{{\mathbf {s}}}_j\}_{j=1}^{g} \in {\mathbb {R}}^{p\times g}\) be the training data matrix containing g normalized feature vectors. The number of fault categories (classes) is denoted with c, and the discrete class labels are represented by \({\mathbf {Y}} = \{y_j\}_{j=1}^{g}\).

The problem of sound-based fault classification involves estimating the label \(y_t\) of a test feature vector \({{\mathbf {s}}}_t \in {\mathbb {R}}^p\) (representing the sound signal) given the labeled training data \({\mathbf {G}}\). To demonstrate the effectiveness of our proposed features for fault classification, we perform experiments using well-known classical supervised learning algorithms including support vector machines (SVMs), Mahalanobis distance-based nearest neighbor (NN), sparse discriminant analysis (SDA) and kernel linear discriminant analysis (KLDA).

A brief summary of these learning methods is given below.

K-nearest neighbors (KNN)-based classification

A popular distance measure between two feature vectors \({\mathbf {s}}_i\) and \({\mathbf {s}}_j\) is the Mahalanobis distance which is defined as [61]

$$\begin{aligned} d = ({\mathbf {s}}_i - {\mathbf {s}}_j)^{\top }{\mathbf {C}}^{-1}({\mathbf {s}}_i - {\mathbf {s}}_j) \end{aligned}$$
(4)

where \({\mathbf {C}} \in {\mathbb {R}}^{p\times p} \) is the covariance matrix computed from the training feature vectors. For small training sample sizes, \({\mathbf {C}}\) is taken to be diagonal, with the diagonal elements equal to the feature variances. A test feature vector \({{\mathbf {s}}}_t\) is assigned the label of the training sample with the minimum distance d from \({{\mathbf {s}}}_t\). We then extend this strategy to voting-based K-nearest neighbor classification.
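A sketch of this classifier using scikit-learn's `KNeighborsClassifier` with the Mahalanobis metric is shown below (feature vectors stored as rows, unlike the column convention above); the conditioning test used to trigger the diagonal covariance fallback is our own heuristic.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mahalanobis_knn(G_train, y_train, k=3):
    """KNN with Mahalanobis distance; G_train is (g, p), one feature
    vector per row. Falls back to a diagonal covariance when the sample
    covariance is ill-conditioned (small training sets)."""
    C = np.cov(G_train, rowvar=False)
    if np.linalg.cond(C) > 1e12:               # small-sample fallback
        C = np.diag(np.maximum(np.diag(C), 1e-12))
    knn = KNeighborsClassifier(
        n_neighbors=k, metric='mahalanobis',
        metric_params={'VI': np.linalg.inv(C)})
    return knn.fit(G_train, y_train)
```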

Support Vector Machine (SVM)

Support vector machine is a popular supervised learning method originally designed for two-class classification problems (\(y_j \in \{1,-1\}\)). The SVM algorithm learns the parameters of an optimal hyperplane that separates the two classes with the largest possible margin, by optimizing the following objective function on the training data:

$$\begin{aligned}&\underset{{\mathbf {w}}, b, \xi }{\min } \left( \frac{1}{2}{\mathbf {w}}^{\top }{\mathbf {w}} + C \sum _j \xi _j \right) \nonumber \\&s.t. ~~ y_j({\mathbf {w}}^{\top }{\mathbf {s}}_j + b) \ge 1-\xi _j, \quad \xi _j \ge 0 \end{aligned}$$
(5)

where C is the regularization constant, \({\mathbf {w}}\) and b represent the hyperplane and the slack variables \(\xi _j\) accommodate non-separable cases. For nonlinear separation, the constraint \(y_j({\mathbf {w}}^{\top } \phi ({\mathbf {s}}_j) + b) \ge 1-\xi _j, \xi _j \ge 0\) can be introduced to perform the computation in an implicit higher-dimensional space. After computing the parameters of the optimal hyperplane, the label \(y_t\) of a test feature vector \({{\mathbf {s}}}_t\) is determined from the sign of \(\frac{{\mathbf {w}}^{\top }{\mathbf {s}}_t + b}{\Vert {\mathbf {w}}\Vert }\). Since we are classifying multiple fault categories (four classes), we extended the binary SVM to a multi-class SVM via a one-versus-all strategy. For a more elaborate treatment of the SVM method, readers are referred to [21, 57]. We used the LibSVM library [16] to compute the parameters of the hyperplane.
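A sketch of this multi-class extension, assuming scikit-learn's `SVC` (which wraps LIBSVM) with an explicit one-versus-all wrapper; the training data below are hypothetical placeholders.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical training data: g normalized feature vectors (rows), 4 classes
G_train = np.random.randn(80, 20)
y_train = np.repeat([0, 1, 2, 3], 20)   # healthy, ball, inner, outer race

# Linear-kernel SVM with C = 100 (as in the experimental settings),
# extended from binary to four classes via a one-versus-all strategy.
clf = OneVsRestClassifier(SVC(kernel='linear', C=100))
clf.fit(G_train, y_train)
y_pred = clf.predict(np.random.randn(5, 20))   # labels of test vectors
```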

Kernel Linear Discriminant Analysis (KLDA)

KLDA is used to represent the data more efficiently via supervised dimensionality reduction, suppressing features that are less useful for fault classification. KLDA is applied when the classes to be separated are not linearly separable in the original feature space. To achieve this, KLDA learns nonlinear projections that transform the p-dimensional training feature vectors into \((c-1)\)-dimensional vectors, which can then be classified by linear methods with higher accuracy. KLDA relies on a kernel matrix computed via dot products in a very high-dimensional feature space induced by a nonlinear mapping \( \phi :{\mathbb {R}}^{p} \mapsto {\mathcal {H}}\). Because of the huge cost of computing the mapping \(\phi \) explicitly, the kernel trick is used and the kernel matrix is computed in the original feature space with a valid kernel function: \({\mathbf {K}}(i,j) = k({\mathbf {s}}_i, {\mathbf {s}}_j)\). We employ the polynomial family of kernel functions \(k({\mathbf {s}}_i, {\mathbf {s}}_j)=({\mathbf {s}}_i\cdot {\mathbf {s}}_j)^{\beta }\), where \((\cdot )\) denotes the dot product and \(\beta \) is the order of the kernel. Given an input kernel \({\mathbf {K}}\), KLDA solves the following objective function [9]

$$\begin{aligned} \varvec{\alpha }_\mathrm{opt}=\arg \max \frac{{\varvec{\alpha }}^\top {{\mathbf {K}}{\mathbf {W}}{\mathbf {K}}}{\varvec{\alpha }}}{{\varvec{\alpha }}^{\top }{{\mathbf {K}}{\mathbf {K}}}{\varvec{\alpha }}}, \end{aligned}$$
(6)

where \(\varvec{\alpha } = [{\alpha }_1,\ldots ,{\alpha }_g]^\top \) and \({\mathbf {W}} \in {\mathbb {R}}^{g \times g}\) is a block-diagonal matrix \({\mathbf {W}}=\mathrm{diag}\{{\mathbf {W}}_1, {\mathbf {W}}_2,\ldots ,{\mathbf {W}}_c\}\), in which every element of \({\mathbf {W}}_j \in {\mathbb {R}}^{m_j \times m_j}\) equals \(\frac{1}{m_j}\) (\(m_j\) is the number of samples in class j). The dominant eigenvectors of \( ({\mathbf {K}}{\mathbf {K}}+ \epsilon {\mathbf {I}})^{-1}({\mathbf {K}}{\mathbf {W}}{\mathbf {K}})\varvec{\alpha } = \lambda \varvec{\alpha }\) give the optimal solution. A transformation matrix is then constructed from the \((c-1)\) dominant eigenvectors, \(\varvec{\varLambda } = [\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_{c-1}] \in {\mathbb {R}}^{g \times (c-1)}\), and the training data are projected onto \(\varvec{\varLambda }\) to perform dimensionality reduction. At the testing stage, the test feature vectors are also projected into the discriminative low-dimensional KLDA space, where any linear classifier can be deployed for label estimation; we used the nearest neighbor classifier for simplicity.
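A minimal NumPy sketch of this procedure, following our reading of Eq. (6); the ridge term \(\epsilon \), the eigen-solver and the data layout (samples as rows rather than columns) are implementation choices.

```python
import numpy as np

def klda_fit(G, y, beta=1, eps=1e-6):
    """KLDA per Eq. (6): polynomial kernel, block matrix W with 1/m_j
    entries within each class, dominant eigenvectors of
    (KK + eps*I)^(-1) (KWK). G is (g, p): samples as rows."""
    g = G.shape[0]
    K = (G @ G.T) ** beta                        # polynomial kernel matrix
    W = np.zeros((g, g))
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)     # 1/m_j within class j
    M = np.linalg.solve(K @ K + eps * np.eye(g), K @ W @ K)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    c = len(np.unique(y))
    A = vecs[:, order[:c - 1]].real              # (c-1) dominant eigenvectors
    return A, K @ A                              # projected training data

# A test vector s_t is projected as k_t @ A, where k_t[i] = (s_i . s_t)**beta.
```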

Sparse Discriminant Analysis (SDA)

SDA learns discriminative feature representations as sparse linear combinations of the given features. In our case, SDA learns sparse combinations of the statistical and frequency domain features to represent the sound signals more discriminatively. Let \({\mathbf {Q}} \in {\mathbb {R}}^{g \times c}\) be an indicator matrix, where \(Q_{ij}\) indicates whether the ith observation belongs to the jth class. The formulation of SDA combines the optimal scoring criterion with the elastic net [19]

$$\begin{aligned}&\underset{\varvec{\beta }_i, \varvec{\theta }_i}{\min } \left( \Vert {\mathbf {Q}}\varvec{\theta }_i - {\mathbf {G}}^{\top } \varvec{\beta }_i \Vert ^2 + \lambda \Vert \varvec{\beta }_i \Vert _1 + \gamma \Vert \varvec{\beta }_i \Vert ^2 \right) \nonumber \\&s.t.~~\frac{1}{g}\varvec{\theta }_i^{\top }{\mathbf {Q}}^{\top }{\mathbf {Q}}\varvec{\theta }_i = 1, \quad \varvec{\theta }_i^{\top }{\mathbf {Q}}^{\top }{\mathbf {Q}}\varvec{\theta }_l = 0 ~~ \forall l < i \end{aligned}$$
(7)

where \(\varvec{\theta }_i \in {\mathbb {R}}^{c}\) is the score vector and \(\varvec{\beta }_i \in {\mathbb {R}}^{p}\) is the coefficient vector. The SDA algorithm in [19] is used to compute the solution \({\mathbf {B}} = [\varvec{\beta }_1,\ldots ,\varvec{\beta }_{(c-1)}]\). The training features are then projected on \({\mathbf {B}}\) for a low-dimensional representation.

At the testing stage, the test feature vectors are also projected on \({\mathbf {B}}\) and a linear classifier is used for label estimation; we used the nearest neighbor classifier for simplicity.
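The test stage can be sketched as follows, assuming the sparse directions \({\mathbf {B}}\) have already been computed with the SDA solver of [19] (not reproduced here) and that feature vectors are stored as rows:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sda_classify(B, G_train, y_train, S_test):
    """Project training and test features on B (p x (c-1)) and
    estimate the test labels with a 1-nearest-neighbor classifier."""
    nn = KNeighborsClassifier(n_neighbors=1)
    nn.fit(G_train @ B, y_train)      # low-dimensional training features
    return nn.predict(S_test @ B)     # projected test features -> labels
```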

Experimental Settings

To simulate practical settings for sound-based fault classification, the classification was conducted on sound signals measured in different sessions. For every fault class, the sound signals measured in one session were randomly chosen as the testing data, and the signals measured in the remaining sessions were used as the training data. One hundred experiments were conducted by randomly generating different training and test combinations, and the average classification accuracy over these 100 experiments is reported. The simplest settings were chosen for the classification algorithms: the parameter \(\beta \) was set to 1 (linear kernel) in KLDA and SVM, and \(C=100\) in SVM. The experiments were performed with these base settings to assess the effectiveness of the proposed features; careful parameter tuning could further enhance the classification accuracy.
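A simplified sketch of this protocol is given below, assuming a hypothetical `sessions` array recording which session each signal came from and any scikit-learn-style classifier factory; the per-class session sampling of the actual experiments is collapsed here into a single random session per run.

```python
import numpy as np

def session_split_accuracy(features, labels, sessions, make_clf, runs=100):
    """Average accuracy over repeated session-wise splits: per run, one
    randomly chosen session is held out for testing and the classifier
    is trained on the remaining sessions."""
    rng = np.random.default_rng(0)
    accs = []
    for _ in range(runs):
        test_session = rng.choice(np.unique(sessions))
        test = sessions == test_session
        clf = make_clf().fit(features[~test], labels[~test])
        accs.append((clf.predict(features[test]) == labels[test]).mean())
    return float(np.mean(accs))
```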

Table 1 Classification of faults with statistical analysis

Results

The statistical, frequency and time–frequency feature vectors discussed in Sect. 3 were used to train the different classifiers discussed above. The results are tabulated in Tables 1, 2, 3 and 4 for the statistical features, the frequency domain features, the frequency domain features after passing the signal through bandpass filters and the time–frequency features, respectively. There are many other statistical, frequency and time–frequency features that could be utilized; however, in this research work only a subset from each domain is selected.

The statistical features, comprising kurtosis, skewness, standard deviation (STD) and the combination of the three, were passed to SVM, KNN, SDA and KLDA for classification into healthy, ball fault, inner race fault and outer race fault. For these statistical features, the results are not very encouraging; the highest accuracy is achieved by KLDA with the combination of all the statistical features, as shown in the last entry of Table 1.

The frequency domain features and their results with the different classifiers are given in Table 2. The features include the FFT, envelope detection, PSD and the combination of all three; the power of the FFT and envelope signals in the frequency ranges with high spikes was used as the feature vectors. In the first attempt, given in columns 1, 2 and 3 of Table 2, only the FFT, envelope and PSD data were forwarded to the classification algorithms. In the second attempt, all three types of data were concatenated into a single vector and forwarded to the classifiers. Considering the feature vectors of single entities such as the FFT, PSD or envelope, SDA outperforms all the other classification methods for the PSD, followed by KLDA with the PSD and envelope detection. When all these feature vectors are combined, SDA gives the best results, followed by KLDA, KNN and SVM.

Table 2 Classification of faults with frequency domain analysis

The results after passing the audio signal through the bandpass filters, using the averaged FFT and PSD values, the RMS values of the PSD, the RMS values of the FFT and the combination of all these features, are given in Table 3. In this case, KLDA gives the best results for the average FFT and for the combined features, followed by the average PSD values and the RMS of the PSD. The RMS values of the FFT give the worst results with all the classifiers compared to the other frequency and time–frequency features discussed below.

Table 3 Classification of faults with frequency domain using bandpass filter

The time–frequency results, using the STFT as the feature vector, are given in Table 4. A Hanning window is used in the STFT, as stated in Sect. 2, and the Fourier transform values resulting from each window are averaged before being passed to the classification algorithm, in order to reduce the number of comparison points and the computational burden. Here again, KLDA outperforms the other classifiers.

Table 4 Classification of faults with time–frequency (STFT) analysis

From the results given in these tables, it can be concluded that, among the classifiers, KLDA classifies the signals into the different faults most accurately, followed by SDA, KNN and SVM. As far as the feature vectors are concerned, the average FFT and the combination of all the features give the best results, followed by the average PSD and the RMS values of the PSD.

Conclusion

In this work, we employed audio signals to detect and classify faults in rotating machinery that uses bearings. The sounds of machinery with and without faults were recorded using a single microphone. Different statistical, frequency and time–frequency features of the signals were calculated and analyzed for the patterns arising from the different faults. These features were used to train different machine learning models, and the faults were successfully detected and classified into ball, inner race and outer race faults. The best results were achieved with the average FFT, followed by the average PSD and the RMS values of the PSD; the time–frequency features gave relatively lower accuracy, and the statistical features gave the worst. For the sound signals, KLDA outperformed the other classification algorithms, followed by SDA and KNN. From these results, it can be concluded that sound signals can be used as an alternative to vibration and acoustic emission analysis for fault detection and classification in rotating machinery, at much lower cost and with simple, remote installation.