1 Introduction

Rolling element bearings (REBs) are common machine elements used in all kinds of rotating machinery across many industries. Although they are designed for a very long service life, they are fragile mechanical parts whose lifetimes depend on the operating conditions. Therefore, accurate condition monitoring is required to avoid unexpected operational failures and to improve the reliability of plant operation (Sullivan et al. 2004). Moreover, REBs are a central target of condition monitoring because they are the most common cause of failure in induction motors (i.e., in more than 50% of cases) (Appana et al. 2017).

Fault signals from degraded bearings exhibit predictive patterns: successive oscillations carry periodic fault signatures that arise whenever a rolling component impacts the fault. To detect and diagnose bearing defects, studies have proposed both traditional methods and deep learning methods. Signal-based methods are popular among the traditional methods (Kang et al. 2016; Patil et al. 2016; Uddin et al. 2014; Zheng et al. 2016). In this approach, salient features are first extracted from the acquired complex signals in the time domain, frequency domain, or time–frequency domain (Appana et al. 2017; Flandrin et al. 2004; Kang et al. 2016; Yang et al. 2007; Zhou et al. 2012). Second, a discriminative feature vector is built from the selected features so that it can indicate anomalous fault conditions. Finally, the abnormality is classified using classifiers such as k-nearest neighbors (k-NN) (Kang et al. 2016), artificial neural networks (Patil et al. 2016), self-organized maps (SOMs) (Zheng et al. 2016), and support vector machines (Uddin et al. 2014). Unlike traditional signal-based techniques, which rely on handcrafted features based on domain expertise, techniques using deep neural networks (DNNs) (Ahmed et al. 2016; Chen et al. 2015, 2016; Ince et al. 2016; Janssens et al. 2016; Karpathy et al. 2014; Ruchika Malhotra 2017; Sanjeev Kumar 2017; Wang et al. 2009; Yu et al. 2017) automate the feature extraction process for effective fault classification (Karpathy et al. 2014). Many variants of DNN architectures for fault diagnosis have been proposed, such as deep Boltzmann machines (DBMs) (Chen et al. 2016), stacked deep auto-encoders (SDAs) (Ahmed et al. 2016), recurrent neural networks (RNNs) (Wang et al. 2009), and convolutional neural networks (CNNs) (Chen et al. 2015; Christian Gerber 2017; Guo et al. 2016; Ince et al. 2016; Janssens et al. 2016; Lee et al. 2016; Zhang et al. 2017).

For early fault detection, acoustic emission (AE) signals are very effective. Owing to their inherent properties, AE signals can capture the low-energy intrinsic symptoms of bearing faults in low-speed rotating machines (Li et al. 2012; Sun et al. 2015). Traditional signal analysis approaches (Kang et al. 2016; Patil et al. 2016; Uddin et al. 2014; Zheng et al. 2016) are readily applicable to AE signals, representing their characteristics in a feature space. In real scenarios, however, the rotational speed of the shaft is usually not constant because the working conditions of rotating machinery vary. Thus, the fault signal characteristics are subject to random variations, which makes the bearing signal nonstationary. For this reason, conventional designs must retrain a new classifier for each rotational speed. Substantial research is still needed to identify salient features that are invariant to changes in the working conditions of the machine and to variations in the rotational speed of its shaft.

Deep learning techniques, especially CNNs, can extract discriminative features from their inputs during the learning process using convolutional operations, and they can achieve high classification accuracy by minimizing an error criterion. Recently, researchers have proposed various uses of CNNs for fault diagnosis (Chen et al. 2015; Guo et al. 2016; Ince et al. 2016; Janssens et al. 2016; Lee et al. 2016; Zhang et al. 2017). Chen et al. (2015) extracted time-domain statistical feature parameters from ball-pass frequencies and combined them with the load and speed information of the original signals to construct feature vectors. They then classified gearbox faults independently of speed variations by employing a CNN solely as a classifier. Janssens et al. (2016) presented two ways to utilize a CNN: (1) features extracted from a preprocessed signal form a high-dimensional feature vector that is fed to the CNN as a classifier, and (2) the preprocessed signal is fed directly to the CNN as an automated feature learning architecture. However, when raw vibration data are applied directly to the CNN, the ConvNet stack architecture cannot match the classification accuracy of conventional signal processing-based techniques. Lee et al. (2016) experimentally investigated the effectiveness of different combinations of layer representations in CNN architectures when the input was corrupted with various levels of noise. However, the study did not address how to create a CNN model that learns to be invariant to rotational speed changes. Zhang et al. (2017) described an effective way to transform a one-dimensional (1-D) raw vibration signal into a two-dimensional (2-D) image representation so that the CNN architecture learns powerful features from its input. Guo et al. (2016) proposed an effective fault pattern recognition approach by adding an adaptive learning technique to a deep neural network. However, the complexity of the adaptive learning procedure increases manyfold when shaft speeds are unstable. Ince et al. (2016) proposed a modified CNN architecture that avoids reshaping the temporal data fed into the network. With a one-layer ConvNet architecture, 1-D convolutional operations are applied to the preprocessed signal to perform both feature extraction and fault classification of the motor current signal. This method achieved an overall accuracy of 97.4%. However, its accuracy degrades when it is applied to multiple faults and fluctuating operating conditions. These CNN techniques can be applied to classify faults. However, with raw time-domain signals, they cannot properly discriminate faults under varying RPMs because of the background noise inherent in the raw signals. Moreover, processing AE signals requires large memory because of the huge amount of data involved.

To address the problem, this study proposes a reliable fault diagnosis technique for bearings with varying rotational speeds using a CNN-based method that learns about the bearing faults from the envelope spectrums (ESs) of the raw AE signals, which contain characteristic fault frequencies. The proposed scheme is validated on a low-speed rolling element bearing dataset by training the CNN on one-rotational-speed signal samples and testing it on the remaining signal instances corresponding to various rotational speeds. The results demonstrate that the classification accuracy can withstand the nonstationary conditions of rotating machinery with varying rotational speed.

The rest of the paper is organized as follows. Section 2 describes the data acquisition from a fault machinery simulator that is equipped with various single and multiple bearing defects. Section 3 presents the proposed methodology for bearing fault diagnosis based on AE signal analysis, and Sect. 4 evaluates the performance of the proposed method. Finally, Sect. 5 provides the conclusions.

Fig. 1 Diagrammatic illustration of the fault simulator

Fig. 2 (a) Custom fault simulator, (b) 2-channel AE PCI board for the data acquisition system

Fig. 3 Bearings with different fault conditions: (a) BCI, (b) BCO, (c) BCR, (d) BCIO, (e) BCIR, (f) BCOR, and (g) BCIOR

2 Data acquisition

Figure 1 illustrates the experimental fault simulator used in this study. It comprises a three-phase induction motor connected to a drive-end shaft (DES), which transmits torque to a non-drive-end shaft (NDES) through a gearbox. A cylindrical rolling element bearing (FAG NJ206-E-TVP2) is mounted on each of the two shafts. A load is applied through adjustable blades connected to the NDES via a belt and pulley. The rotational speed of the bearing is measured by a displacement transducer placed on the unloaded side of the NDES. A general-purpose wideband AE sensor (PAC WSα), placed on top of the NDES bearing at a 21.48 mm displacement, acquires the data. A PCI-2-based system digitizes the continuous AE signals used in this study. An overall view of the data acquisition system is shown in Fig. 2.

A crack 12 mm long, 0.35 mm wide, and 0.3 mm deep is engraved on individual bearing components. For data acquisition, the bearings are configured with single and multiple fault conditions (see Fig. 3): no crack on the bearing (BNC), crack on the inner raceway (BCI), crack on the outer raceway (BCO), crack on the roller elements (BCR), crack on the inner raceway and roller elements (BCIR), crack on the inner and outer raceways (BCIO), crack on the outer raceway and rollers (BCOR), and crack on the inner and outer raceways and rollers (BCIOR). Each reading lasts one second and is sampled at 250 kHz; readings are collected for every fault condition at rotational speeds of 300, 350, 400, 450, and 500 RPM.

3 Proposed fault diagnosis scheme

The proposed methodology consists of two stages: (1) extraction of the envelope spectrum from the bearing AE signal and (2) fault classification by a CNN that takes the envelope spectrums as input. ES extraction serves as a preprocessing step for the CNN: it separates the effects of specific faults from the high-frequency AE bearing signals and represents them in a low-frequency spectrum. The CNN then learns about bearing faults from the characteristic defect frequencies and their variations under changing working conditions. Figure 4 shows the overall flow diagram of the bearing fault diagnosis.

Fig. 4 Overall flow diagram of the proposed bearing fault diagnosis method

Fig. 5 AE signals of a bearing rotating at 400 RPM under different fault conditions

Fig. 6 Variations in fault signatures of the bearing with faults in the outer raceway, roller elements, and inner raceway

3.1 Preprocessing raw AE signal as envelope spectrum

When a bearing with localized faults is operating, its signal contains a series of impulse responses, produced each time a rolling element strikes a fault (Randall and Antoni 2011). Therefore, the bearing tone (bearing signal) is distinct when one or more bearing components are defective. Figure 5 shows the bearing signals obtained in the data acquisition mode. Given the design of the bearing and its operation, the envelope spectrum (ES) can locate the fault at the component level through peaks at the bearing characteristic (defect) frequencies (BCFs).

In general, envelope analysis demodulates high-frequency resonances down to the low-frequency range of the fault frequencies, which gives good resolution for diagnosing bearing faults. A fault can therefore be observed in an ES at one of three BCFs, namely the ball-pass frequencies of the inner and outer raceways (BPFI, BPFO) and the ball-spin frequency (BSF), which are given by (Randall and Antoni 2011):

$$\begin{aligned} \hbox {BPFI}= & {} \frac{N_{\mathrm{roller}} \cdot F_{\mathrm{shaft}}}{2} \left( 1+\frac{B_\mathrm{d} }{P_\mathrm{d}}\cos \phi \right) \nonumber \\ \hbox {BPFO}= & {} \frac{N_{\mathrm{roller}}\cdot F_{\mathrm{shaft}}}{2} \left( 1-\frac{B_\mathrm{d} }{P_\mathrm{d}}\cos \phi \right) \nonumber \\ \hbox {BSF}= & {} \frac{P_\mathrm{d} \cdot F_{\mathrm{shaft}}}{2 B_\mathrm{d}}\left( 1-\left( {\frac{B_\mathrm{d}}{P_\mathrm{d}}\cos \phi }\right) ^{2}\right) , \end{aligned}$$
(1)

where \(\phi \) is the contact angle, \(N_{\mathrm{roller}}\) is the number of roller elements, \(F_{\mathrm{shaft}}\) is the shaft rotation frequency in hertz, and \(P_\mathrm{d}\) and \(B_\mathrm{d}\) are the pitch diameter and the roller diameter, respectively.
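To make Eq. (1) concrete, the short Python sketch below evaluates the three defect frequencies for a given shaft speed. The geometry values in the usage line (13 rollers, 9 mm roller diameter, 46 mm pitch diameter, zero contact angle) are hypothetical placeholders chosen for illustration, not the datasheet values of the FAG NJ206-E-TVP2 bearing, and the BSF line uses the factor \(P_\mathrm{d}/(2B_\mathrm{d})\) given by Randall and Antoni (2011).

```python
import math

def bearing_defect_frequencies(n_roller, f_shaft, b_d, p_d, phi_deg=0.0):
    """Return (BPFI, BPFO, BSF) in Hz following Eq. (1).

    n_roller: number of rolling elements
    f_shaft:  shaft rotation frequency in Hz (rpm / 60)
    b_d, p_d: roller and pitch diameters (same length unit)
    phi_deg:  contact angle in degrees (0 for a cylindrical roller bearing)
    """
    ratio = (b_d / p_d) * math.cos(math.radians(phi_deg))
    bpfi = n_roller * f_shaft / 2.0 * (1.0 + ratio)
    bpfo = n_roller * f_shaft / 2.0 * (1.0 - ratio)
    bsf = p_d * f_shaft / (2.0 * b_d) * (1.0 - ratio ** 2)
    return bpfi, bpfo, bsf

# Hypothetical geometry for illustration only (not the FAG NJ206-E-TVP2 datasheet values).
bpfi, bpfo, bsf = bearing_defect_frequencies(n_roller=13, f_shaft=400 / 60, b_d=9.0, p_d=46.0)
print(f"BPFI={bpfi:.2f} Hz, BPFO={bpfo:.2f} Hz, BSF={bsf:.2f} Hz")
```

A useful sanity check follows directly from Eq. (1): BPFI + BPFO always equals \(N_{\mathrm{roller}} \cdot F_{\mathrm{shaft}}\).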

The ES is obtained using a Hilbert transform-based extraction method (Ming et al. 2011; Rauber et al. 2015) via a two-step procedure:

Step 1: The Hilbert transform (HT) of the real signal x(t) is computed as

$$\begin{aligned} h(t)=H\{x(t)\}=\frac{1}{\pi }\int \limits _{-\infty }^\infty {\frac{x(\tau )}{t-\tau }\hbox {d}\tau }. \end{aligned}$$
(2)

The envelope is the magnitude of the analytic signal formed from x(t) and its Hilbert transform:

$$\begin{aligned} x_{ht} (t)=\left| x(t)+j \cdot h(t)\right| =\sqrt{x^{2}(t)+h^{2}(t)}. \end{aligned}$$
(3)

Step 2: To obtain the envelope spectrum, a Fourier transform is applied to this envelope:

$$\begin{aligned} E(f)=\left| \hbox {FT}\{x_{ht} (t)\}\right| . \end{aligned}$$
(4)
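The two-step procedure can be sketched with NumPy alone. The helper below builds the analytic signal by zeroing the negative-frequency half of the FFT (the same construction that `scipy.signal.hilbert` uses), takes its magnitude as in Eq. (3), and applies a one-sided FFT as in Eq. (4). Subtracting the envelope mean before the FFT is an added assumption, a common step that suppresses the DC peak; the synthetic test signal (a 10 kHz carrier amplitude-modulated at 37 Hz) is hypothetical and stands in for an AE record.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal x(t) + j*h(t)."""
    n = len(x)
    X = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist bin
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * weights)       # negative frequencies are zeroed

def envelope_spectrum(x, fs):
    """Return (freqs, E(f)) following Eqs. (2)-(4)."""
    envelope = np.abs(analytic_signal(x))             # Eq. (3)
    envelope = envelope - envelope.mean()             # assumption: remove DC before the FFT
    spectrum = np.abs(np.fft.rfft(envelope)) / len(x) # Eq. (4), one-sided magnitude
    freqs = np.arange(len(x) // 2 + 1) * fs / len(x)  # bin centres in Hz
    return freqs, spectrum

fs = 50_000
t = np.arange(fs) / fs                                # one second of samples
x = (1 + 0.8 * np.cos(2 * np.pi * 37 * t)) * np.cos(2 * np.pi * 10_000 * t)
freqs, E = envelope_spectrum(x, fs)
print(float(freqs[np.argmax(E)]))                     # 37.0
```

On a real one-second AE record sampled at 250 kHz, the same call yields a spectrum with 1 Hz bin spacing, in which the BCF peaks of Eq. (1) should appear.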

Because the peak amplitudes and frequencies indicate the severity and the source of the problem, respectively, different fault signatures can be obtained, as shown in Fig. 6. However, when the RPM changes, the frequency components also shift, because the number of fault impacts per unit time changes with the rotational speed (Felten 2003). Therefore, for a bearing signal within a given fault category, the ES retains the same pattern at different RPMs, but its peaks are shifted in frequency. The location of a peak (P) can be formulated as follows:

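$$\begin{aligned} P=\hbox {BCF}\cdot \frac{\hbox {rpm}}{60}, \end{aligned}$$
(5)

where BCF denotes the characteristic defect frequency per unit shaft frequency, i.e., Eq. (1) evaluated with \(F_{\mathrm{shaft}} = 1\) Hz.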

The resulting shifts of the peaks with rotational speed are illustrated in Fig. 7.
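Eq. (5) implies that the peaks move linearly with shaft speed. The sketch below lists where a single fault peak would fall at the five speeds recorded in Sect. 2; the fault order `bpfo_order` (defect events per shaft revolution) is a hypothetical value, not one measured on the test bearing.

```python
def fault_peak_hz(bcf_order, rpm):
    """Eq. (5): peak location P = BCF * rpm / 60, with BCF given per shaft revolution."""
    return bcf_order * rpm / 60.0

bpfo_order = 5.23   # hypothetical outer-race defect events per shaft revolution
for rpm in (300, 350, 400, 450, 500):
    print(f"{rpm} RPM -> {fault_peak_hz(bpfo_order, rpm):.2f} Hz")
```

With this order, each 50 RPM step moves the peak by about 4.36 Hz (5.23 × 50 / 60), and the k-th harmonic moves k times as far, which is why a classifier trained at one speed sees shifted patterns at the others.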

Fig. 7 Variations of the fault signature peaks (P) with rotational speed

3.2 Convolution neural network

A CNN comprises three basic blocks: a convolution layer (CL), a pooling layer (PL), and a fully connected layer (FC). CLs and PLs are applied to the input data in various combinations and numbers; in general, a deep CNN consists of multiple stacked CL and PL pairs. Guo et al. (2016) sought the architecture that gives the most stable overall classification accuracy for bearing fault diagnosis. Following their suggestions, the proposed CNN-based fault diagnosis network uses a two-layer stack of convolution and pooling layers,

$$\begin{aligned} {\hbox {INPUT}}\rightarrow \left[ [{\hbox {CL}}\rightarrow {\hbox {RELU}}] \rightarrow {\hbox {PL}}\right] \times 2\rightarrow {\hbox {FC}}, \end{aligned}$$
(6)

to learn from the envelope spectrums.

To apply a conventional CNN architecture to fault diagnosis, the original 1-D input signal must be reshaped into a 2-D array before it is fed to the network. However, this solution is not always suitable for real-life scenarios. To avoid this drawback, this study feeds the 1-D input ES directly to the network without reshaping, so both the kernels and the feature maps are 1-D.

In a CL, the quality of the extracted features depends on the choice of hyper-parameters such as the kernel size, stride, padding, and number of filters (Karpathy et al. 2014). The kernel size and stride are scalars and are kept small so that the learned features remain sensitive to minor variations in the characteristic frequencies of the bearing signal. The number of convolutional filters is empirically set to eight by monitoring the variations in error rate (Lee et al. 2016). No zero padding is used. Because the input to the CNN is not reshaped, all forward-propagation convolution operations are performed in 1-D. To reduce the likelihood of vanishing gradients and to maintain sparsity, a rectified linear unit (RELU) is used as the activation function of all the nodes in the network. To reduce the dimension of the extracted features, max pooling is performed in the PL. The FC is the last layer of the CNN; its neurons are connected to all the activation maps of the previous layer and feed the classifier. To avoid overfitting, dropout is applied during the learning process. The dropout rate is set following the suggestions of Srivastava et al. (2014), which give the highest variance and an equal probability for every subnet. To improve the performance of the supervised learning, the FC layer has 64 neurons with a dropout rate of 50% (0.5).
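The layer operations described above (valid 1-D convolution without zero padding, RELU, max pooling, and dropout) can be sketched in NumPy as follows. This is an illustrative forward-pass implementation, not the authors' code; the kernel size 3, stride 1, and pooling size 2 are hypothetical because the values in Table 1 are not reproduced here, and dropout is written in the common "inverted" form that rescales the surviving activations during training.

```python
import numpy as np

def conv1d_valid(x, kernels, bias, stride=1):
    """1-D convolution with no zero padding (cross-correlation, as in most CNN libraries).

    x: (in_channels, length); kernels: (out_channels, in_channels, k); bias: (out_channels,)
    Returns an array of shape (out_channels, (length - k) // stride + 1).
    """
    out_ch, _, k = kernels.shape
    n_out = (x.shape[1] - k) // stride + 1
    y = np.empty((out_ch, n_out))
    for i in range(n_out):
        window = x[:, i * stride:i * stride + k]                  # (in_channels, k)
        y[:, i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return y

def relu(z):
    """Rectified linear unit."""
    return np.maximum(z, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the last axis; a trailing remainder is dropped."""
    n = x.shape[-1] // size
    return x[..., :n * size].reshape(*x.shape[:-1], n, size).max(axis=-1)

def dropout(a, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

# One CL -> RELU -> PL stage on a single normalized ES of 256 bins (hypothetical sizes).
rng = np.random.default_rng(0)
es = rng.random((1, 256))
w = 0.1 * rng.standard_normal((8, 1, 3))         # 8 filters, kernel size 3 (assumed)
h = max_pool1d(relu(conv1d_valid(es, w, np.zeros(8))), size=2)
print(h.shape)                                   # (8, 127)

# Dropout at rate 0.5 on a hypothetical 64-unit FC activation during training.
fc_act = relu(rng.standard_normal(64))
fc_train = dropout(fc_act, rate=0.5, rng=rng)
```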

To accomplish the classification task, a Softmax classifier is used. The Softmax function takes a vector of arbitrary real-valued scores and maps it to a vector of posterior class probabilities between zero and one that sum to one (Karpathy et al. 2014). In this study, the Softmax layer has eight nodes, each computing the conditional probability of one of the eight bearing conditions.
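A numerically stable Softmax over the eight bearing conditions might look like this; the class order and the score vector are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Map real-valued class scores to probabilities in (0, 1) that sum to one."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

CLASSES = ["BNC", "BCI", "BCO", "BCR", "BCIO", "BCIR", "BCOR", "BCIOR"]
scores = np.array([0.2, 2.5, -1.0, 0.0, 0.3, 1.1, -0.4, 0.7])  # hypothetical FC-layer outputs
probs = softmax(scores)
print(CLASSES[int(np.argmax(probs))], round(float(probs.sum()), 6))  # BCI 1.0
```

Subtracting the maximum score leaves the probabilities unchanged but prevents overflow in `np.exp` for large scores.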

At the beginning of the learning process, the CNN parameters are randomly initialized. Then, to minimize the error at the output, the weights and biases are fine-tuned by back-propagating the error gradient across all layers of the CNN. The mathematical formulation of the 1-D back-propagation updates is given by Ince et al. (2016). Adam is a first-order gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments (moving averages of the gradient and of its square). It therefore adapts the effective learning rate of each parameter, although it requires more computation per parameter in each training step than some other methods. Moreover, the algorithm typically converges without extensive fine-tuning of its hyper-parameters. Details of the Adam algorithm and its mathematical formulation can be found in Kingma and Ba (2014). In this study, Adam controls the training process with a constant learning rate of 1e-3 while the preprocessed signals are fed to the network inputs.
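For reference, one Adam update with the default moment coefficients from Kingma and Ba (2014) (\(\beta_1 = 0.9\), \(\beta_2 = 0.999\), \(\epsilon = 10^{-8}\)) and the constant learning rate of 1e-3 used here can be written as below. The paper does not state which \(\beta\) or \(\epsilon\) values were used, so these defaults are assumptions, and the toy quadratic objective is only a hypothetical check that the update behaves as expected.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy check on f(theta) = sum(theta**2), whose gradient is 2 * theta (hypothetical objective).
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=1e-3)
print(theta)   # entries have moved toward the minimum at 0
```

On the first step the bias corrections make the update roughly `lr` times the sign of the gradient, so the constant learning rate directly bounds the early step size.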

Fig. 8 Architecture of the proposed CNN

To update the model parameters, mini-batches of 128 samples are used during training (Karpathy et al. 2014). The CNN architecture is shown in Fig. 8, and the hyper-parameter values used at each level of the architecture are given in Table 1.
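Because Table 1 is not reproduced in this text, the following sketch traces feature-map shapes and trainable-parameter counts through INPUT → [[CL → RELU] → PL] × 2 → FC → Softmax under hypothetical settings (256 input bins, 8 filters per CL, kernel size 3, stride 1, pooling size 2, 64 FC units, 8 classes). It also counts how many 128-sample mini-batches one pass over a 3600-instance RPM dataset needs. Substituting the actual Table 1 values would give the real dimensions.

```python
import math

def conv_out(length, kernel, stride=1):
    """Output length of a 1-D convolution with no zero padding."""
    return (length - kernel) // stride + 1

def pool_out(length, size):
    """Output length of non-overlapping max pooling."""
    return length // size

def trace_architecture(input_len=256, n_filters=8, kernel=3, stride=1, pool=2,
                       fc_units=64, n_classes=8):
    """Return per-layer (name, channels, length) and the trainable-parameter count."""
    length, channels, params = input_len, 1, 0
    shapes = [("input", channels, length)]
    for stage in (1, 2):
        length = conv_out(length, kernel, stride)
        params += n_filters * (channels * kernel + 1)   # kernel weights + one bias per filter
        channels = n_filters
        shapes.append((f"CL{stage}", channels, length))
        length = pool_out(length, pool)
        shapes.append((f"PL{stage}", channels, length))
    flat = channels * length                            # flattened input to the FC layer
    params += fc_units * (flat + 1)
    params += n_classes * (fc_units + 1)
    shapes += [("FC", fc_units, 1), ("softmax", n_classes, 1)]
    return shapes, params

shapes, n_params = trace_architecture()
for name, ch, ln in shapes:
    print(f"{name:8s} {ch:3d} x {ln}")
print("trainable parameters:", n_params)                       # 32560 with these settings
print("mini-batches per epoch for 3600 samples:", math.ceil(3600 / 128))  # 29
```

With these assumed settings, most parameters sit in the FC layer, which is consistent with applying dropout there.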

Table 1 Dimensions of the proposed convolution neural network

4 Performance evaluation

4.1 Experimental setup

Using the data acquisition procedure described in Sect. 2, 450 one-second samples for every bearing condition at one particular RPM are grouped into a dataset. Therefore, each dataset comprises 3600 (450 × 8) instances for a given RPM. Figure 5 shows the AE signals for all eight bearing conditions from the 400 RPM dataset. In total, five datasets with different rotational speeds are obtained. Three experiments are performed by splitting the available datasets into an analysis dataset (the instances of one particular RPM dataset) and an evaluation dataset (the merged samples of the remaining four RPM datasets). Table 2 details the analysis and evaluation datasets for each experiment. This configuration is chosen to verify that the CNN can automatically extract discriminative features from the available data and classify the bearing faults invariantly to fluctuations in rotational speed.
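The dataset grouping can be expressed as a small helper. Each sample is identified here by a hypothetical `(rpm, condition, index)` tuple; the three training speeds (350, 400, and 450 RPM) are those named in Sect. 4.4.

```python
RPMS = (300, 350, 400, 450, 500)
CONDITIONS = ("BNC", "BCI", "BCO", "BCR", "BCIO", "BCIR", "BCOR", "BCIOR")
SAMPLES_PER_CONDITION = 450

def make_experiment(train_rpm):
    """Analysis set: all samples at train_rpm. Evaluation set: the other four RPMs merged."""
    def samples(rpm):
        # Hypothetical sample identifiers: (rpm, condition, index within that condition).
        return [(rpm, c, i) for c in CONDITIONS for i in range(SAMPLES_PER_CONDITION)]
    train = samples(train_rpm)
    test = [s for rpm in RPMS if rpm != train_rpm for s in samples(rpm)]
    return train, test

for train_rpm in (350, 400, 450):
    train, test = make_experiment(train_rpm)
    print(f"train at {train_rpm} RPM: {len(train)} samples, test: {len(test)} samples")
```

Each experiment therefore trains on 3600 instances and evaluates on 14,400, the fourfold ratio noted in Sect. 4.4.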

Table 2 Experiments involving various datasets to evaluate the efficacy of the proposed method invariant to RPM

4.2 Effectiveness of envelope spectrum

As described in Sect. 3.1, extracting the ES from raw AE signals is a powerful simplification that retains the essential information in the signal instances. In each one-second sample, the high-frequency raw signal of 250,000 discrete time samples is represented by 256 low-range frequency samples containing the defect frequencies. For analysis, the spectrum is windowed around the bearing defect frequencies to create a low-range spectrum, and all windowed spectrums are normalized before being fed to the network. Figures 9 and 10 show that bearings with single and multiple faults have characteristic defect frequencies visible in the ES. Even when the rotational speed fluctuates, the fundamental bearing fault characteristics produce dominant frequency components whose peaks shift. Figure 9 marks the expected peaks and compares them with the observed peaks across different RPMs. Because the ES captures the key information about bearing failures, the CNN can learn to diagnose multiple defects of rolling element bearing parts under varying rotational speeds.
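A minimal sketch of the windowing and normalization step follows. It assumes that the low-range window consists of the first 256 envelope-spectrum bins (0–255 Hz, since a one-second record gives a 1 Hz bin spacing) and that min–max scaling to [0, 1] is used; the text specifies neither the exact window edges nor the normalization method, so both are assumptions.

```python
import numpy as np

def low_range_es(freqs, es, n_bins=256):
    """Keep the first n_bins envelope-spectrum bins and min-max normalize them to [0, 1]."""
    window = np.asarray(es[:n_bins], dtype=float)
    lo, hi = window.min(), window.max()
    normed = (window - lo) / (hi - lo) if hi > lo else np.zeros_like(window)
    return freqs[:n_bins], normed

fs = 250_000
n = fs                                        # one second of samples at 250 kHz
freqs = np.arange(n // 2 + 1) * fs / n        # 1 Hz bin spacing, 125,001 bins
es = np.random.default_rng(1).random(freqs.size)   # stand-in for a real envelope spectrum
f_win, x_in = low_range_es(freqs, es)
print(x_in.shape, f_win[-1])
```

The result is the 256-value vector that the 1-D CNN receives in place of the 250,000-sample raw record.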

Fig. 9 Envelope spectrum of raw signals obtained from a bearing with a single fault

Fig. 10 Envelope spectrum of raw signals obtained from a bearing with multiple faults

4.3 Effectiveness of convolutional neural network

To make the proposed fault diagnosis framework usable in real-time applications, all CNN operations are implemented in 1-D. The CNN learns about the bearing faults and the effects of rotational speed from the ES extracted from the AE signals. The two-layer ConvNet stack provides a stable output in terms of classification accuracy. The hyper-parameters, such as the empirical choice of 8 filters in each convolutional layer and the 50% dropout rate, were determined by training the network for different RPMs. Keeping the learning rate of the Adam optimizer constant reduces the complexity of the parameter update procedure. Short training appears to be adequate for achieving good fault classification accuracy because the layered learning process converges quickly. The number of training iterations that balances convergence against overfitting is determined experimentally: to reduce the classification error below 0.5%, the number of training iterations is set to 50 for all experiments.

4.4 Performance evaluation of proposed fault diagnosis scheme

The classification performance of the proposed methodology is validated using an average classification accuracy (ACA) metric, formulated as follows:

$$\begin{aligned} {\hbox {ACA}}=\frac{1}{N_{{\mathrm{classes}}}}\sum _{C=1}^{N_{{\mathrm{classes}}}} \frac{N_T}{N_N}, \end{aligned}$$
(7)

where \(N_{T}\) is the number of correctly classified samples in class C, \(N_{N}\) is the number of instances in class C, and \(N_{{\mathrm{classes}}}\) is the number of fault types (classes) in the study. The accuracy is estimated with 20-fold cross-validation.
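Eq. (7) averages per-class recall, so unlike plain accuracy it weights every bearing condition equally. A direct NumPy sketch with hypothetical labels is shown below; skipping classes that are absent from `y_true` is an added design choice.

```python
import numpy as np

def average_classification_accuracy(y_true, y_pred, n_classes):
    """Eq. (7): mean over classes C of N_T / N_N."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in range(n_classes):
        in_class = y_true == c                         # instances of class C (N_N)
        if in_class.any():                             # skip classes with no instances
            n_correct = np.sum(y_pred[in_class] == c)  # correctly classified (N_T)
            per_class.append(n_correct / in_class.sum())
    return float(np.mean(per_class))

aca = average_classification_accuracy([0, 0, 1, 1, 2, 2, 2, 2],
                                      [0, 1, 1, 1, 2, 2, 2, 2], n_classes=3)
print(round(aca, 4))   # 0.8333
```

For these labels the plain accuracy would be 7/8 = 0.875, while the ACA is 2.5/3 ≈ 0.833, because the single error falls in a two-sample class.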

To verify the classification capability of the proposed framework (Method 5), its performance is compared with that of four recent methods: two traditional signal-based methods (Methods 1–2) and two deep learning methods (Methods 3–4). Table 3 lists the classification accuracies for the various bearing fault conditions and the average classification accuracies obtained over the three experiments.

Fig. 11 Comparison of average classification accuracy for Methods 1–4 when training and testing samples are from the same RPM dataset

Figure 11 compares the average classification accuracies of Methods 1–4 when the training and testing samples correspond to the same rotational speed. In Method 1 (Kang et al. 2016), a hybrid feature subset is extracted from the fault signals and used to classify multiple bearing faults with the k-NN algorithm. The subset is extracted from time-domain signals of 5-s length each and comprises one time-domain feature [i.e., the square root of amplitude (SRA)] and three frequency-domain features [i.e., frequency center (FC), root-mean-square frequency (RMSF), and root variance frequency (RVF)].
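The four features of Method 1 can be sketched as follows, using definitions common in the bearing-diagnosis literature: SRA as the squared mean of \(\sqrt{|x|}\), and FC, RMSF, and RVF as the centroid, root-mean-square, and standard deviation of frequency weighted by the magnitude spectrum. Kang et al. (2016) may use slightly different normalizations, so treat this as an approximation of their feature set rather than a reproduction.

```python
import numpy as np

def hybrid_features(x, fs):
    """SRA (time domain) plus FC, RMSF, and RVF (frequency domain) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    sra = np.mean(np.sqrt(np.abs(x))) ** 2              # square root of amplitude
    s = np.abs(np.fft.rfft(x))                          # magnitude spectrum s(k)
    f = np.arange(s.size) * fs / x.size                 # bin frequencies in Hz
    w = s / s.sum()                                     # spectrum used as weights
    fc = np.sum(f * w)                                  # frequency center
    rmsf = np.sqrt(np.sum(f ** 2 * w))                  # root-mean-square frequency
    rvf = np.sqrt(np.sum((f - fc) ** 2 * w))            # root variance frequency
    return np.array([sra, fc, rmsf, rvf])

fs = 1000
tone = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)     # hypothetical 100 Hz test tone
feats = hybrid_features(tone, fs)
print(np.round(feats, 3))
```

For a pure tone, FC and RMSF coincide with the tone frequency and RVF is close to zero, which illustrates why these spectral statistics move with shaft speed.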

Figure 11 shows that when the training and testing samples come from the same dataset, Method 1 classifies perfectly. The statistical parameters in the frequency domain carry the dominant information about the bearing fault signatures. However, the frequency components are highly sensitive to rotational speed. Therefore, the accuracy of the k-NN-based approach drops from 100% to \(72\,\pm \,2\%\) in the rotational speed-invariant classification task. This feature combination gives the best classification accuracy when k (for k-NN) is set to 8.

In Method 2 (Kang et al. 2015), kernel discriminative feature analysis (KDFA) is applied to multiclass SVM-based classification of the bearing faults. First, wavelet-based fault features, namely wavelet energy and wavelet packet node entropy, are extracted from 5-s signals. Then, the discriminative features are selected to train individual classifiers on the samples of one RPM dataset. Because the energy levels vary with rotational speed, the resulting decision boundaries are unsuitable for samples from the datasets of other RPMs. As shown in Fig. 11 and Table 3, the classification accuracy drops from perfect classification to \(76\,\pm \,2\%\) when the training and testing samples are changed to the cross-RPM experimental scenarios.

Table 3 Classification accuracies of various fault conditions via 20 times cross-validation

In Method 3 (Zhang et al. 2017), the 1-D temporal vibration signal is converted into a 2-D image and fed to a single-body CNN architecture that both extracts the features and classifies the faults. Each fault signal is windowed and reshaped into a \(360 \times 360\) image, which generally contains more than two shaft revolutions of data.

Because the CNN extracts features from its inputs automatically and the AE signal contains much redundant information, many irrelevant features overlap with the actual bearing fault characteristics. This degrades the classifier's ability to learn features for accurate classification, as can be observed in Fig. 11 even when the classifier is trained and tested with samples from the same RPM dataset. For the classifier to learn efficiently, appropriate sampling-reduction techniques must be applied to the AE signals.

In Method 4 (Ahmed et al. 2016), a sparse autoencoder learns and extracts features from the raw signals, and a Softmax classifier appended at the end of the network classifies the faults. The network weights are fine-tuned using backpropagation. This method uses samples from the same rotational speed for training and testing.

The time-domain characteristics used in Methods 3 and 4 are not sufficient to discriminate bearing faults when the accuracy is tested on different RPM datasets. When the energy levels are low, the time-domain amplitudes vary only slightly between fault types under variable-speed conditions. Such weak features do not give the classifier discriminative variations, and the classification accuracy deteriorates. As shown in Fig. 11, even when trained and tested on individual RPMs, these two methods do not learn effectively, because the high sampling rate also affects the quality of the raw signal.

As Table 3 shows, the proposed fault diagnosis framework, with its specific deep learning configuration, provides more reliable diagnostic performance than the existing methods. In each experiment, the number of samples classified (the testing datasets, which correspond to four different RPMs) is four times the number of samples used for training (one RPM dataset). Even so, the proposed method achieves a robust \(86\,\pm \,1\%\) accuracy that is insensitive to fluctuations in rotational speed. Because the network is trained on only one of three RPM datasets (350, 400, or 450 RPM) and tested on the remaining datasets, whose speeds range from 300 to 500 RPM, the proposed approach demonstrates stronger adaptability to real-life applications. This result can be explained as follows: the bearing fault characteristics appear as defect frequencies in the ES, and this clear representation allows the CNN to learn about the defects and the variations in rotational speed with high precision.

5 Conclusions

This paper proposed a reliable fault diagnosis method for bearings with varying rotational speeds using envelope spectrums (ESs) and a CNN. In the proposed method, the ES is a simplified representation of the AE signal that the CNN uses to learn the fault signatures. ES extraction provides good resolution of the fault signatures because a high-sampling-rate signal is represented in the low-frequency range of the fault frequencies. We performed three experiments emulating real-life working conditions, in which the RPM was varied to represent changing working conditions. The proposed fault diagnosis framework improved the classification accuracy by 10 to 13% compared with traditional signal-based approaches for bearing faults under different rotational speeds, and it simplified the approach by automating both optimal feature extraction and classification. Moreover, the proposed method provides an AE analysis-based deep neural network approach with a classification accuracy of approximately 86% that is invariant to rotational speed.