1 Introduction

Modern engineered systems are becoming increasingly complex and operate under harsh and uncertain conditions, which makes them more vulnerable to breakdowns. At the same time, the past 10 years have seen a revolution in the Internet-of-Things (IoT), big data analytics, and artificial intelligence (AI). Through this revolution, ideas such as deep learning have risen from obscurity to provide a collection of innovative techniques that achieve human-level performance in image recognition and gaming [1]. These technical trends will undoubtedly have a profound effect on Prognostics and Health Management (PHM), an emerging discipline that is becoming a core technology for the 4th Industrial Revolution. PHM ensures cost-effective operation and management of engineered systems by protecting engineering assets from potential hazards and sudden breakdowns. PHM also increases the efficiency, reliability, and availability of those assets. To this end, PHM is concerned with detecting the presence of faults (fault detection), analyzing fault type and location (diagnosis), and forecasting the future health condition and remaining useful lifetime (RUL) (prognosis) [2,3,4,5,6,7,8,9,10]. PHM also examines decision-making and feedback to provide improved condition-based maintenance (CBM) strategies. A generalized representation of the main PHM processes is shown in Fig. 1.

Fig. 1 Main processes of prognostics and health management (PHM)

PHM, which aims to detect machine breakdown and prevent consequent accidents that bring economic losses, is a wide research domain. This paper focuses on reviewing and summarizing contemporary PHM techniques applied to rotating electrical machines (REMs). REMs are at the heart of most engineering processes (due to their relatively low price and operational ease [11, 12]), and REM failures are one of the foremost causes of breakdown in industry, leading to high maintenance costs. Furthermore, rolling element bearing (REB) faults are reported to account for 45–55% of REM failures [13, 14]; another study attributes about 41% of motor faults to REBs, followed by stator faults (37%) and rotor faults (10%) [15].

Existing model-based (physical/mathematical) studies on REB PHM suffer from many difficulties, because noisy and complex working conditions make it hard to develop a model with the required precision (Fig. 2a). In addition, existing models, especially physics-based models, cannot be updated in real time (on-line) with newly measured data. Data-driven approaches, as opposed to model-based approaches, are gaining in popularity because they are model-free techniques and because there have been significant advances in the development of sensors, sensor networks, and computing systems. Existing data-driven techniques, however, require extra information, more data acquisition equipment, and additional measurements, such as vibration, temperature, acoustic emission, sound, oil debris, laser displacement, stator current [16], or rotor speed signal monitoring [17]. The acquired signals contain the fault information and characteristics; they must be preprocessed first, and then different features are extracted to better understand the REB health status. It is crucial to recognize that those signals often have a low signal-to-noise ratio and non-stationary statistical parameters due to the harsh operating conditions in industry (e.g., high mechanical load, time-varying speed, mechanical shocks). These factors make standard data-driven REB PHM methods difficult to apply [4] and limit their effectiveness, performance, and flexibility. Therefore, PHM efforts for REBs have focused on extending and/or improving the existing standard data-driven REB PHM methods or on developing entirely different approaches, referred to as smart data-driven approaches. For example, shallow learning-based PHM (SL-based PHM) and deep learning-based PHM (DL-based PHM) techniques, as seen in Fig. 2b, c, have emerged as alternatives to model/physics-based approaches [18,19,20]. These data-driven approaches have become more and more attractive due to the widespread deployment of low-cost sensors and their connection to the internet, which has given rise to the phenomenon of big data. Thus, the aim of this paper is to review and summarize the most recent intelligent PHM techniques applied to REB fault detection, diagnosis, and prognosis, providing a reference for further studies on the related topics. To that end, this paper first discusses and classifies shallow learning algorithms, and then reviews the most advanced techniques, namely deep learning-based REB fault detection, diagnosis, and prognosis.

Fig. 2 Comparison between: a physics/math model-based PHM technique, b shallow learning-based PHM technique, and c deep learning PHM technique

Taking this into consideration, this paper will present a brief description of the different bearing failure modes, and a comprehensive description of the different health features (indexes, criteria) used for REB fault diagnostics and prognostics, with the goal of providing an overall platform for researchers, system engineers, and experts to select and adopt the best fit for their applications. This paper is organized as follows: Sect. 2 briefly introduces the different bearing failure modes and their causes, followed by a comprehensive representation of the different health features (indexes and criteria). The different existing shallow-learning algorithms for REB PHM are detailed in Sect. 3. Section 4 provides the most recent investigations and studies that are based on the hottest subfield, deep learning-based REB fault detection, diagnosis, and prognosis. Finally, a summary and concluding remarks are given in Sect. 5.

A prior survey paper [21] gives a review of the emerging research work related to deep learning and new trends related to its use in machine health monitoring for different applications and systems. In addition, the review paper of Zurita et al. [22] mainly reviewed the state-of-the-art vibration condition-based monitoring of gears and bearings that are based on advanced digital signal processing techniques and artificial intelligence methods. In contrast to these prior works, this paper focuses only on reviewing contemporary learning algorithms (i.e., the shallow learning algorithms and the deep learning algorithm and its variants) for REB fault detection, diagnosis, and prognosis techniques. Contemporary PHM techniques are summarized as follows.

Modern engineering systems are embracing more and more user-friendly data acquisition tools and low-cost sensors that are connected to the internet. Therefore, PHM researchers and practitioners are adopting contemporary techniques, i.e., smart data-driven approaches—SL-based PHM and DL-based PHM techniques—that have been developed in the last decade. These techniques aim to synthesize information available from the acquired data to better represent the system’s health condition. Further, the latter (i.e., DL-based PHM) extracts the best-suited features from big data and better represents the system health condition in a hierarchical architecture. With the propagation of acquired data, DL-based PHM techniques model the high-level representation of the complex multivariate nonlinear relationship behind the data without need for a profound understanding of the system physics; this eliminates the need for a significant amount of human labor. In contrast, SL-based PHM methods require a manual feature extraction step, which may require domain knowledge. Thus, these methods can face problems in extracting useful representations from big data.

SL-based PHM techniques are built on shallow structures (e.g., the artificial neural network (artificial NN), the support vector machine (SVM), etc.). They consist mainly of four phases, as shown in Fig. 2b: data processing, hand-designed feature extraction, feature selection (e.g., using principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), etc.), and model training. This process enables SL-based PHM techniques to achieve decent performance, specifically in fault detection and diagnosis [22]. However, the huge amount of data in the PHM field makes it difficult for these SL-based PHM approaches to determine which features are best suited to the task; these methods also face other challenges [23]. Their performance is limited because the features must be manually designed and the four phases cannot be optimized simultaneously. Many studies have used shallow learning, a term used here to describe conventional machine learning methods in contrast to deep learning methods, to detect, diagnose, and predict the various REB faults. In this paper, SL-based PHM techniques are classified into statistical methods, neural network methods, and combined approaches, as shown in Fig. 3. Our aim is to provide a state-of-the-art review of SL-based PHM techniques and their application to REB PHM, as given in Sect. 3.

Fig. 3 Classification of shallow learning-based PHM techniques

In light of the challenges of SL-based PHM techniques outlined above, DL-based PHM methods are considered to outperform the previous methods. DL-based PHM techniques seek to handle big data by modeling the high-level representation behind the data. They automatically extract the best features, which are highly nonlinear and complex, by stacking multiple layers in a hierarchical architecture, instead of handcrafting the optimum features using domain knowledge [21, 23]. Thus, they have four main advantages compared to other methods: (1) they achieve an end-to-end system, as shown in Fig. 2c; (2) they do not require the intensive human labor and knowledge that would otherwise be necessary to handcraft the feature design; (3) they construct models with highly hierarchical architectures and a nonlinear combination of multiple layers; and (4) in contrast to SL-based PHM techniques, all parameters of a DL-based PHM algorithm are optimized simultaneously. Our classification of the existing DL-based PHM methods for REB PHM applications follows that of Zhao et al. [21]: methods are classified into (deep) convolutional neural network (CNN) approaches, (deep) recurrent neural network (RNN) approaches, restricted Boltzmann machine (RBM)-based deep neural network (DNN) approaches, and autoencoder (AE)-based DNN approaches, as shown in Fig. 4. The aim is to provide a comprehensive review of DL-based PHM methods and their applications to REB PHM, as given in Sect. 4.

Fig. 4 Classification of deep learning-based PHM techniques

2 Fundamentals of rolling element bearing (REB) prognostics and health management (PHM)

In industry, the health of many machines depends on the robustness and reliability of the REBs. Failures may appear in REBs during operation or before (i.e., during the manufacturing process). From a prior FMECA (failure modes, effects, and criticality analysis) study of servo motors, which are the core component for mechanism control of electrical machinery, bearing faults were shown to have the highest frequency, severity, and criticality [24]. Therefore, detection, diagnosis, and prognosis of these defects are important for prognostics and health management, as well as for quality inspection of bearings [25].

2.1 Bearing failure modes

It does not require severe REB failure to induce vibration, noise, or even sudden breakdown of equipment; tiny faults such as cracks, crushes, wears, indentation, etc. will also cause breakdowns. These different faults can be caused by a wide range of factors. Flaking, pitting, spalling, rusting, corroding, creeping, and skewing can all lead to failure [26]. From ISO 15243 [27], the most common faults are fatigue, wear, corrosion, electrical erosion, plastic deformation, and fracture & cracking. Each is briefly introduced below.

  • Fatigue begins as a tiny crack on the bearing surface (rollers or races) due to a material structure change, which is caused by repeated stress in the contact areas.

  • Wear comes from the presence of dirt or foreign particles inside the bearing due to inaccurate sealing or inadequate lubrication (contamination).

  • Electrical erosion is damage (in the form of craters) to one of the bearing parts (rollers or races) caused by the passage of an electric current through the bearing.

  • Corrosion comes from the presence of water or corrosive agents inside the bearing due to damaged seals, acidic lubricants, or a sudden high change of operating temperature.

  • Plastic deformation occurs mainly when the bearing is subjected to an excessive load that results in indentation of the raceways.

  • Fracture and cracking result from the stress that comes from rough treatment (impacts) or from cyclic stress. They can also be caused by excessive heating (thermal cracking).

2.2 REB health features

Rolling element bearing PHM techniques often use different sensors to collect several raw physical signals (vibration, stator current, temperature, rotor speed, etc.); the result is the so-called big data phenomenon. Dozens of indices or criteria (i.e., features) are usually extracted from those raw signals in the time, frequency, and time–frequency domains to detect, diagnose, and predict the health condition of the REB system. Table 1 attempts to provide a complete list of those features, with the aim of providing a comprehensive platform for researchers, system engineers, and experts to identify and adopt those that best fit their needs.

Table 1 Various features used in REB PHM techniques

In addition, many signal processing techniques have been developed and applied. In the frequency domain [45,46,47,48], these include power spectrum analysis, the fast Fourier transform (FFT), the discrete Fourier transform (DFT), the Welch method, and noise cancellation techniques. In the time–frequency domain [49,50,51], well-known techniques are the short-time Fourier transform, the Wigner–Ville distribution, the continuous wavelet transform (CWT), the discrete wavelet transform (DWT), and the wavelet packet transform (WPT).
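As a concrete illustration of time-domain feature extraction, the following minimal sketch computes four common statistics (RMS, variance, skewness, and kurtosis) from a raw signal. The function name `time_domain_features`, the population (divide-by-n) form of the moments, and the synthetic test signal are assumptions made for illustration only; they are not taken from Table 1 or from any of the cited studies.

```python
import math


def time_domain_features(signal):
    """Return RMS, variance, skewness, and kurtosis of a 1-D signal."""
    n = len(signal)
    mean = sum(signal) / n
    # Central moments in population form (divided by n); an assumption.
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return {
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


# Hypothetical signals: a clean sine wave, and the same wave with a single
# added impulse standing in for the shock produced by a localized defect.
sine = [math.sin(2 * math.pi * k / 50) for k in range(500)]
impulsive = list(sine)
impulsive[125] += 5.0

features_sine = time_domain_features(sine)
features_impulsive = time_domain_features(impulsive)
```

In this toy case the kurtosis of the pure sine is 1.5, and it rises once the impulse is added, which is the kind of sensitivity to impulsive content that makes such statistics useful as health features.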

3 Shallow learning algorithms for REB PHM

This section presents a state-of-the-art review of SL-based PHM methods and their application to REB PHM. In an attempt to organize and classify the diverse SL-based REB PHM techniques, which may or may not originate from the artificial neural network (NN), three categories are proposed: statistical approaches, NN approaches, and combined methods. The statistical approaches are further sub-divided, according to the nature and the task of each algorithm, into LDA-based REB PHM, SVM-based REB PHM, K-nearest neighbor (KNN)-based REB PHM, extreme learning machine (ELM)-based REB PHM, and other non-NN algorithms applicable to REB PHM. The combined methods are those that pair a non-NN algorithm with an NN method, an NN algorithm with a signal processing technique, or a non-NN algorithm with a signal processing approach.

3.1 Statistical approaches for REB PHM

Several shallow learning algorithms are built on a shallow architecture that exploits the statistical properties of the data and uses this information to classify the data into already known groups [2]. This subsection provides a detailed description of those statistical SL-based techniques as applied to REB PHM. The typical structure of each algorithm is briefly introduced, and its application to REB PHM is outlined to highlight its challenges, its pros and cons, and its latest advancements.

3.1.1 LDA-based REB PHM

The LDA algorithm aims to find a linear combination of features that separates different classes well. It helps in the classification process by finding new projection directions such that, when data are projected onto them, the within-class distance decreases while the between-class distance increases [52]. In other words, it reduces the dimensionality by maximizing the ratio of between-class scatter to within-class scatter. Therefore, the main objectives of LDA are either to reduce the dimensionality or to perform classification. Figure 5 shows a descriptive example of LDA-based classification. If two classes are presented (red and blue), LDA maps them into a new feature space along the projection lines (a) and (b), in which they are more linearly separable; the corresponding between- and within-class scatters are given in Table 2.
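The scatter-ratio criterion can be made concrete with a minimal two-class sketch in two dimensions. It uses the classical Fisher solution, in which the projection direction is the inverse within-class scatter matrix applied to the difference of the class means. The sample points and all function names below are hypothetical; they are not the data of Fig. 5 or Table 2.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, mu):
    """2x2 scatter matrix: sum of outer products of the centered samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class Fisher direction w = Sw^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def fisher_ratio(class_a, class_b, w):
    """Between-class over within-class scatter of the projections onto w."""
    pa = [p[0] * w[0] + p[1] * w[1] for p in class_a]
    pb = [p[0] * w[0] + p[1] * w[1] for p in class_b]
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    within = sum((v - ma) ** 2 for v in pa) + sum((v - mb) ** 2 for v in pb)
    return (ma - mb) ** 2 / within


# Hypothetical, strongly correlated classes that overlap along both axes.
red = [(1.0, 2.0), (2.0, 2.8), (3.0, 4.1), (4.0, 5.0)]
blue = [(2.0, 1.0), (3.0, 2.2), (4.0, 2.9), (5.0, 4.1)]

w = fisher_direction(red, blue)
ratio_lda = fisher_ratio(red, blue, w)
ratio_x_axis = fisher_ratio(red, blue, [1.0, 0.0])
ratio_y_axis = fisher_ratio(red, blue, [0.0, 1.0])
```

Comparing `ratio_lda` with the ratios obtained along the original axes shows why a projection chosen by this criterion separates the classes better than either raw feature on its own.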

Fig. 5 Example of linear discriminant analysis (LDA)

Table 2 Between- and within-class scatters of Fig. 5

The LDA algorithm has been used to improve the classification of ball bearing faults according to their severity level [53]. LDA was also used as a dimensionality reduction technique to find the few dimensions that best discriminate among a set of features extracted from raw vibration signals [54]. Zhao et al. [55] proposed a trace ratio version of LDA, which uses the between-class scatter matrix to evaluate the separability of different classes and the within-class scatter matrix to evaluate the compactness within each class. This extended discriminative subspace learning method was used to deal with the trace ratio problem in linear discriminant analysis for REB fault detection and diagnosis. A trace ratio LDA algorithm was also introduced by Jin et al. [56] and used to reduce the dimension and then classify motor bearing health conditions arising from single-point faults and generalized-roughness faults. Another form of LDA, called ∆-LDA, was proposed by Ciabattoni et al. [57] to address dimension reduction of fault data and fault detection, with application to REB fault detection. ∆-LDA was designed to overcome the problem of a between-class scatter matrix whose trace is very close to zero, which is the case when detecting different bearing faults; it improved the classification accuracy when the classes overlapped. Finally, a bearing damage diagnosis method was proposed in [58] in which LDA is used to evaluate the current features generated by frequency selection in the stator current spectrum, and the fault diagnosis itself is performed by a Bayes classifier.

3.1.2 SVM-based REB PHM

The classifier in machine learning and statistics learns from the data input given to it and then uses this learning to classify a new observation. The same can be done to detect and then diagnose various bearing faults (outer-race fault (ORF), inner-race fault (IRF), and ball bearing fault (BBF) or cage faults). One of the most used classifier-based PHM techniques is the SVM.

The basic idea of SVM is to first map the input data nonlinearly into a feature space; in this feature space, a linear decision function is constructed. The inner product in the feature space is then expressed through a kernel function evaluated in the original space [59]. Thus, the main purposes of SVM are classification and estimation. In machine learning, SVM is considered a supervised learning model that is used for classification and regression analysis. For classification, SVM finds the optimal separating hyperplane with maximum margin to build a maximum margin classifier. The margin is defined as the perpendicular distance between the two class boundaries that pass through the support vectors, as can be seen in Fig. 6; only those support vectors are used to determine the boundaries.
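For reference, the maximum-margin idea can be written as the standard hard-margin optimization problem. The notation is assumed here rather than taken from [59]: training samples \(\mathbf{x}_i\) with labels \(y_i \in \{-1, +1\}\), weight vector \(\mathbf{w}\), bias \(b\), and \(N\) samples.

\[
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right)\ge 1,\qquad i=1,\dots,N .
\]

Under this scaling, the two boundaries \(\mathbf{w}^{\top}\mathbf{x} + b = \pm 1\) pass through the support vectors and are separated by a margin of \(2/\lVert\mathbf{w}\rVert\), so minimizing \(\lVert\mathbf{w}\rVert\) maximizes the margin. With a kernel \(K\) and multipliers \(\alpha_i\) (nonzero only for support vectors), the resulting decision function takes the form

\[
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{N} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right).
\]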

Fig. 6 Example of support vector machine (SVM)

Numerous researchers have modified SVM algorithms for various reasons. Sugumaran et al. [60] used the SVM and proximal-SVM (PSVM) classifiers to find the optimal number of time domain statistical and histogram features of a vibration signal. A hybrid, two-stage one-against-all SVM approach was proposed for REB fault diagnosis in [61] to predict the type of fault more accurately. In the first SVM stage, the vibration signal is classified as either normal or faulty; the fault types are then classified in the second SVM stage. In addition, a one-class ν-SVM, which uses only the normal-state data, was applied to automatic bearing fault diagnosis [62]. To fully exploit the advantages of SVM, two multi-layer kernel learning models, supervised incremental local tangent space alignment (SILTSA)-SVM and supervised linear local tangent space alignment (SLLTSA)-SVM, were proposed in [63] and applied to REB fault diagnosis. The proposed method combines the supervised method with the dimension reduction algorithms (ILTSA and LLTSA) [64]. Because the SVM parameters have a significant impact on classification performance, an improved ant colony optimization (IACO) algorithm was proposed to determine them, and the resulting IACO-SVM algorithm was applied to rolling element bearing fault detection [65]. More recent studies have further investigated the use of SVM for REB fault detection and diagnosis, including [66,67,68,69,70].

3.1.3 K-nearest neighbor (KNN)-based REB PHM

KNN is a non-parametric (i.e., the model structure is determined mainly from the data without any assumptions on the underlying data distribution), lazy algorithm (i.e., as opposed to an eager algorithm, it does not learn discriminative functions but uses all the training data in the classification step) used for classification, in which the existing (historical) data are grouped into several classes that are then used to classify new data. The main advantages of KNN are that the learning is very simple and easy to interpret (i.e., it has a physical meaning), and that it is an effective classification method for noisy training data and complex target functions, which makes it well suited to REB PHM [71]. However, KNN-based REB PHM also has disadvantages. Because it is a lazy algorithm, it must store the entire training dataset and compute the distance from each new sample to every training sample, which is time- and power-consuming.

A descriptive KNN example is shown in Fig. 7. The test data (green multiplication sign) are classified to class 1 (blue circles) if k = 3, but if k = 6 the test data are classified to class 2 (red circles). Therefore, determining the parameter k is critical for accuracy of the KNN-based REB fault detection and diagnosis. However, the best choice of k depends on the data. If k is large, the effect of noise is reduced, but the boundaries between classes are less distinct; whereas, if k is small, strict boundaries can be obtained, but analysis may be vulnerable to noise and outliers (i.e., the overfitting problem may occur); this is the case when dealing with bearing fault detection and diagnosis [71].
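The dependence of the result on k can be reproduced with a minimal sketch. The sample coordinates, class names, and the function `knn_classify` are hypothetical and are not the data of Fig. 7; Euclidean distance and simple majority voting are assumed.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training samples."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (feature vector, class label).
train = [
    ((0.5, 0.0), "class 1"),
    ((0.0, 0.6), "class 1"),
    ((3.0, 3.0), "class 1"),
    ((3.5, 3.0), "class 1"),
    ((0.7, 0.0), "class 2"),
    ((0.0, -1.0), "class 2"),
    ((-1.1, 0.0), "class 2"),
    ((1.2, 0.0), "class 2"),
]
query = (0.0, 0.0)

label_k3 = knn_classify(train, query, k=3)  # two of the three nearest are class 1
label_k6 = knn_classify(train, query, k=6)  # four of the six nearest are class 2
```

Because the whole training set is sorted for every query, the sketch also makes the storage and computation cost of the lazy approach visible.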

Fig. 7 Example of K-nearest neighbor

KNN was first used for fault detection and diagnosis of low-speed (≤ 100 rpm) REBs in 1992 [72]. A combination of weighted KNN (WKNN) classifiers was proposed by Lei et al. [73] to overcome the two previously mentioned disadvantages of KNN-based REB fault detection and diagnosis. KNN has also been combined with other methods to enhance the REB fault detection and diagnosis capability, such as SVM [74], kernel PCA (KPCA) [75], the fuzzy C-means method [76], the binary differential evolution algorithm [77], or the K-star classifier [78]. More recently, an optimal KNN model was combined with KPCA for bearing fault detection and diagnosis, in which the KNN was optimized using a particle swarm optimization method [79].

3.1.4 Extreme learning machine (ELM)-based REB PHM

ELM was proposed in 2006 by Huang et al. [80] to provide good generalization performance at an extremely fast learning speed. ELM improves on the learning speed of feedforward neural networks (FNNs), which are very slow, especially in real-time applications [80]. The slow learning speed of FNNs arises for two main reasons: FNNs are extensively trained with slow gradient-based learning algorithms, and with such algorithms all network parameters are tuned iteratively [80]. ELM uses a single hidden layer feedforward neural network (SLFNN), as shown in Fig. 8, that randomly chooses the hidden nodes and analytically determines the output weights wi of the SLFNN. Thus, the ELM can be represented as a linear system that employs an activation function, F(.), to generate the learned output y.
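The linear-system view can be sketched as follows. The output weights \(w_i\), the activation function \(F\), and the output \(y\) follow the text; the remaining symbols are assumptions introduced for illustration: \(L\) hidden nodes with randomly chosen input weights \(\mathbf{a}_i\) and biases \(b_i\), and \(N\) training pairs \((\mathbf{x}_k, t_k)\).

\[
y(\mathbf{x}) = \sum_{i=1}^{L} w_i\, F\!\left(\mathbf{a}_i^{\top}\mathbf{x} + b_i\right),
\qquad
\mathbf{H}\mathbf{w} = \mathbf{t},
\qquad
H_{ki} = F\!\left(\mathbf{a}_i^{\top}\mathbf{x}_k + b_i\right).
\]

Since \(\mathbf{a}_i\) and \(b_i\) are fixed after random initialization, the hidden-layer output matrix \(\mathbf{H}\) is known, and the output weights can be obtained in a single step as the least-squares solution

\[
\hat{\mathbf{w}} = \mathbf{H}^{\dagger}\mathbf{t},
\]

where \(\mathbf{H}^{\dagger}\) denotes the Moore–Penrose pseudo-inverse of \(\mathbf{H}\). No iterative gradient-based tuning is involved, which is where the speed advantage comes from.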

Fig. 8 Basic structure of extreme learning machine (ELM)

To the authors’ knowledge, ELM was first applied on its own to an REB fault diagnosis system by Razavi-Far and Saif [81], to support incremental learning in non-stationary environments and to detect and diagnose bearing faults under class imbalance. The proposed ELM methods adopted two state-of-the-art ensemble-based techniques: Learn++.CDS (Concept Drift with SMOTE) [82], which was used to overcome the class imbalance issue in non-stationary environments, and Learn++.NIE (nonstationary and imbalanced environment) [83], which was used to handle class-imbalanced data during the incremental phase in non-stationary environments. A more recent study that used ELM for REB condition monitoring was carried out by Mao et al. [84], who addressed the online imbalanced data problem that occurs when data are collected online sequentially and the fault data are far fewer than the normal data.

3.1.5 Other statistical algorithms for REB PHM

Sugumaran et al. [85] investigated the effectiveness of an automatic rule learning-based decision tree for classification when employing a fuzzy classifier. The decision tree was used to select among the statistical features extracted from the vibration signals, and multiple membership functions were then designed based on the generated ‘if–then’ rules. Finally, a fuzzy inference engine was built and used to classify the REB health conditions based on a predefined threshold. The same authors [86] later proposed a decision tree-based method that uses histogram features to improve the previous results when the data set contains few data points.

Other different non-NN methods were investigated to detect and diagnose an REB’s health state. Yu [87] proposed a supervised-learning-based local and nonlocal preserving projection (SLNPP) method; Kankar et al. [88] used learning vector quantization (LVQ) as a REB fault classifier. In Cao et al. [89], a novel fault diagnosis method based on semi-supervised fuzzy C-means (SFCM) cluster analysis was developed; and more recently, targeting the nonstationary and non-Gaussian characteristics of a vibration signal from a faulty rolling bearing, Han et al. [90] developed a VMD-AR (variational mode decomposition-autoregressive) model and investigated diagnosing REB faults using the random forest learning (RFL) classifier. The VMD was applied to decompose vibration signals where a series of stationary component signals were obtained, then, an AR model was established for each component mode. The models were used as fault characteristic vectors. Finally, a novel RFL classifier was considered for pattern recognition to diagnose different bearing faults.

Mohsenzadeh et al. [91] introduced a novel sparse Bayesian learning (SBL) algorithm called the relevance sample feature machine (RSFM), which can choose the relevant samples and the relevant features simultaneously for regression or classification problems. They concluded that the RSFM has the advantages of avoiding overfitting, reducing system complexity during the testing stage, and generalizing better. Wong et al. [92] successfully adopted a novel structure based on a pairwise-coupled sparse Bayesian extreme learning committee machine to intelligently and simultaneously diagnose bearing faults.

A bearing fault diagnosis technique was also presented by Shen et al. [93] based on a transfer learning (TL) technique, which was not limited to the same field [94]; it used singular value decomposition (SVD) [95] as its feature extraction tool. The authors describe the main idea of the proposed TL method as [93] “to utilize selective auxiliary data to assist target data classification, where a weight adjustment between them is involved in the TrAdaBoost algorithm for enhanced diagnostic capability. In addition, negative transfer is avoided through the similarity judgment, thus improving accuracy and relaxing computational load of the presented approach.”

Manifold learning (ML) [96] techniques are widely used in cluster analysis, image processing, bio-informatics, etc., [97,98,99]. However, ML techniques are rarely used for fault diagnosis, and were only used as a nonlinear time series noise reduction method applied to the analysis of gearbox vibration signals with snaggletooth in [100]. Recently, Wang et al. [101] proposed a novel machinery REB fault diagnosis approach based on a statistical locally linear embedding (S-LLE) manifold learning algorithm, which was an extension of LLE [102]. Another study, which applied the ML technique in combination with wavelet packet transform to detect weak transient signals for REB fault diagnosis, was carried out by Wang et al. [103]. This study proposed an extraction method, named waveform feature manifold (WFM), that used the binary wavelet packet transform to obtain the waveform feature space, which was then used to extract the weak signatures.

It should be noted that there are a few remaining learning techniques, such as the Bayesian learning (BL) [104] technique and the Widrow-Hoff learning (WHL) [105] algorithm. The authors did not find any study that applied these techniques to the bearing prognostics and health management field, although researchers may consider these techniques in the future.

3.2 Neural network approaches for REB PHM

Other SL-based REB PHM techniques that were constructed using a shallow structure originating from the artificial NN are grouped and reviewed in this subsection. It is worth noting that the deep learning methods originated as an extension of these NN-based techniques.

Artificial NNs are a statistical model inspired by the biological neural networks that constitute the human brain. The NNs typically consist of an input layer, a hidden layer, and an output layer, as shown in Fig. 9. The nodes, xi, in the input layer represent the normalized features extracted from the acquired signals. The output layer nodes, yj, are the two nodes that can have only binary levels when dealing only with bearing fault detection (i.e., they represent healthy and faulty bearings). For bearing fault diagnosis, more nodes must be added to the output layer to localize and identify the different bearing faults. The hidden layers are generated from the input layer based on imposing weights \(w_{ji}^{\left( l \right)}\). Using forward/backward propagation, proper weights, which minimize the cost value, are calculated from the labeled data. The output nodes are activated and their values are defined using activation functions. To train neural networks by gradient descent [106], the activation function should be differentiable; the nonlinear activation function adds nonlinear properties to the neural network. Different linear or nonlinear activation functions exist, such as the sigmoid function, the tanh function, the rectified linear unit (ReLU) function, the exponential linear unit (ELU) function, etc. [107].

Fig. 9 Example of 2-layer artificial neural network (ANN)

As stated above, the weights used in the network are calculated using forward/backward propagation. Gradient descent requires the gradient of the error with respect to each weight, which quantifies the influence of that weight on the final error. Backpropagation is an efficient way to compute these gradients of the cost function and is commonly used to train the network [108]. The training procedure can be summarized as follows: first, initialize the weights randomly; second, apply forward propagation through the neural network to obtain the output and the cost; third, apply backward propagation to calculate the influence of each weight on the cost (the error gradient); and finally, update the weights. The forward, backward, and update steps are repeated until the performance of the network is satisfactory.
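The procedure above can be sketched for a network with one hidden layer and a single output node. Everything specific in this sketch is an assumption made for illustration: sigmoid activations, a cross-entropy cost, full-batch gradient descent, a fixed number of epochs in place of a performance-based stopping rule, and a toy data set of two normalized features per sample.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, hidden=3, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network; return the cost after each epoch."""
    rng = random.Random(seed)
    n_in, n = len(samples[0]), len(samples)
    # Step 1: initialize the weights randomly.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gw2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, t in zip(samples, labels):
            # Step 2: forward propagation (output and cost).
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden)]
            y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            loss -= t * math.log(y) + (1 - t) * math.log(1 - y)
            # Step 3: backward propagation (error gradient of each weight).
            d_out = y - t  # gradient of the cost w.r.t. the output pre-activation
            gb2 += d_out
            for j in range(hidden):
                d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])
                gw2[j] += d_out * h[j]
                gb1[j] += d_hid
                for i in range(n_in):
                    gw1[j][i] += d_hid * x[i]
        losses.append(loss / n)
        # Step 4: update the weights with the batch-averaged gradients.
        b2 -= lr * gb2 / n
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i] / n
    return losses


# Hypothetical normalized features; label 0 = healthy, 1 = faulty.
samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2),
           (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.85, 0.95)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
losses = train(samples, labels)
```

Plotting or inspecting `losses` shows the cost falling as the forward, backward, and update steps are repeated; in practice the loop would stop once a validation criterion is met rather than after a fixed number of epochs.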

Neural network (NN) techniques have been applied in the PHM field to different engineered systems [109,110,111]. One of the earliest works that used an NN for motor REB fault diagnosis was performed by Li et al. [112]. Frequency domain features were first extracted from the vibration signal (i.e., using the FFT), and then an NN was trained to emulate the knowledge of vibration experts, whose services are very expensive. Thus, motor REB fault diagnosis was achieved more efficiently and at a reduced cost. Another study [113] used time domain features (Irms, \(I_{\sigma^{2}}\), Isk, and Ikur6) instead of frequency domain features for artificial NN-based bearing fault diagnosis. Pandya et al. [114] used time–frequency domain features for NN-based REB fault diagnosis; they used wavelet packet decomposition to extract features from the measured vibration signal. A comparison study [115] of three types of artificial NNs, the multilayer perceptron (MLP), the radial basis function (RBF) network, and the probabilistic neural network (PNN), was also performed for bearing fault detection. With the goal of automating feature extraction, fault detection, and fault identification for REMs, a matching pursuit analysis was used to extract time–frequency domain features, which were subsequently used as inputs to a feedforward neural network (FFNN) to classify the different bearing conditions (healthy, IRF, ORF, and BBF) [116]. Gebraeel et al. [117] proposed a way to predict the residual life from vibration-based degradation signals in order to estimate the bearing failure time. They developed two classes of models, a single-bearing and a clustered-bearing neural network, to perform REB fault prognosis. Different combinations of time, frequency, and time–frequency domain features were also used with NN-based approaches for REB fault detection and diagnosis [118,119,120]. A non-intrusive artificial NN approach that used stator current signals instead of vibration signals was also applied to REB fault detection and diagnosis for a three-phase induction motor [121].

Within the last 2 years, a comparative study [122] compared NN-based REB fault diagnosis to SVM-based REB fault diagnosis; the results showed that the SVM-based approach performed better. Because no formula exists for selecting the optimal NN structure and parameters, an assessment of their effect on REB fault diagnosis was carried out in [123]. A hybrid fault diagnosis method for REB faults in the field of gas turbine health management was investigated in [124]. This hybrid technique combined the S-transform algorithm [125] and the artificial NN method. The results showed that the S-transform could extract good time–frequency domain features from the raw vibration signals for REB fault detection and diagnosis.

3.3 Combined methods for REB PHM

Merging different techniques is a common way to develop new ones. On this basis, many researchers have combined different SL-based methods to deal with REB PHM. In this paper, these combined SL-based methods are classified into three groups: statistical algorithms with NN methods, NN algorithms with signal processing methods, and statistical algorithms with signal processing methods. Many papers were found [126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152]; selected papers for each group are discussed below. It should be noted that the papers discussed are just examples; selection does not indicate a preference or an endorsement by the authors. The only criterion applied for selection is recency: only the most recently published works were chosen for discussion. A summary of the classification of the SL-based REB PHM methods and a list of all reviewed SL-based REB PHM papers, with their principles, pros and cons, and applications, is presented in Table 3.

Table 3 Summary of the reviewed SL-based REB PHM methods

As an example of a statistical algorithm combined with an NN method, J. B. Ali et al. [126] combined PCA and LDA with a PNN and a simplified fuzzy adaptive resonance theory map (SFAM) neural network for early online diagnosis of naturally progressing bearing degradations. The former (PCA and LDA) were used for feature reduction and the latter (PNN and SFAM) were used for classification. An artificial NN method combined with a signal processing technique, the discrete wavelet transform (DWT), was investigated to detect and diagnose the bearing faults of an industrial robot [127]. An ensemble SVM, as a statistical algorithm, was combined with composite multiscale fuzzy entropy (CMFE), as a signal processing method, for REB fault detection and diagnosis [128]. Other studies, [129] and [130], combined the ELM algorithm with other methods to better detect and diagnose the different REB faults. In [129], the original vibration signals were first de-noised using a wavelet pre-filter; multi-scale intrinsic mode function permutation entropy was then used to extract the feature parameters, and the ELM served as the classifier. Tong et al. [130] proposed a fault diagnosis approach for REBs based on the redundant second-generation WPT and the ELM.

A more recent work, published in 2018, proposed a novel FDD method for REBs based on ensemble local characteristic-scale decomposition (ELCD) and the ELM algorithm (ELCD-ELM) [131]. First, numerous intrinsic scale components (ISCs) were obtained by decomposing the vibration signals using ELCD; then, different features of the ISCs (time domain, energy, and relative entropy features) were calculated to serve as the inputs to the ELM-based REB FDD. The proposed ELCD-ELM was found to be able to process nonstationary vibration signals and to overcome the mode-mixing phenomenon of the LCD method.

4 Deep learning for REB PHM

Many SL-based techniques have been applied in the PHM field to detect, diagnose, and (sometimes) predict rolling element bearing health conditions, as reviewed and summarized in the previous section. Those SL-based techniques achieved decent performance, especially when detecting and diagnosing REB faults. However, few studies dealing with REB fault prognosis were found. Further, the survey of SL-based REB PHM techniques above makes clear that their performance depends greatly on extracting the best-suited features, which were summarized in Table 1. Because SL-based PHM techniques rely on manually designed and extracted features, and because the data in the PHM field are both varied and voluminous, these techniques face significant challenges in determining the best-suited features to extract, especially in the big data scenario. SL-based PHM techniques also face other challenges that come from big data, such as the high dimensionality of the feature space, the proliferation of multimodal data, and multicollinearity among data measurements [23]. Moreover, the four phases of the SL-based PHM technique shown in Fig. 2b (data processing, feature extraction, feature selection, and model training) cannot be optimized simultaneously; they are usually carried out successively, which increases both the required processing time and the complexity. Therefore, as a technique that can bridge the big data from the machinery and the intelligent machine PHM methods, the DL-based PHM technique is being adapted to the REB PHM field. The DL-based REB PHM method classifies different patterns by stacking multiple layers in hierarchical architectures and can model the high-level representations behind data [21]. Further, DL-based techniques are gaining popularity in the PHM field because they can use the raw data directly as an input (without any preprocessing, as shown in Fig. 2c), i.e., they perform representation learning. They can learn complex and highly nonlinear representations from high-dimensional data [153].

Although deep learning is not a new concept, it has only recently started to gain more attention and to be successfully applied in different fields, such as computer vision, language and audio processing, and (automatic) recognition [153], [154]. It is only in the last few years that deep learning started to be applied to the PHM field [155,156,157]. To the authors’ knowledge, deep learning was first applied to rolling element bearing prognostics and health management in 2015, apart from a few works, such as the one from Liu et al. [158]. Liu et al. used sparse coding with a learned dictionary instead of a predefined one for adaptive feature extraction from the vibration signal for REB fault diagnosis; they introduced a natural extension of sparse coding, the shift-invariant sparse coding algorithm. In other work, Verma et al. [159] proposed intelligent condition-based monitoring of REMs using a sparse autoencoder method.

As mentioned in Sect. 3 and following the classification of deep learning methods in Zhao et al. [21], this paper thus classifies and reviews the existing studies on DL-based REB PHM into four groups: (deep) CNN-based, (deep) RNN-based, RBM-based DNN, and AE-based DNN approaches. In addition to briefly introducing the definition and principle of each algorithm with its typical structure, its application to REB PHM is outlined to highlight its challenges, its pros and cons, and its latest advancements.

4.1 (Deep) CNN-based REB PHM approaches

One deep learning technique is the convolutional neural network (CNN). By definition, a CNN is a multilayer feed-forward neural network that assumes its inputs are images [160]. It was inspired by the neurons of the human visual cortex and has two key features [153]. The first is local connections: because images are highly correlated within sub-regions and this correlation information is critical for recognizing them, filters connect each sub-region of the previous layer to a local patch in the feature maps. The second is shared weights: a pattern can appear at various locations in an image, and by convolving the same filter across the image, the pattern can be extracted independent of its location. Using the same filter across an image also reduces the number of parameters significantly. Nowadays, many open-source CNN models are available (e.g., GoogLeNet, AlexNet), which makes CNNs attractive to researchers.

A CNN is structured as a series of layers, in which the convolutional and pooling layers come first and the fully connected layers come last. A descriptive example of the CNN architecture is shown in Fig. 10. The convolutional layer is used to detect local correlations in the previous layer (for the first convolutional layer, the raw input). It has a number of hyper-parameters, such as the number of filters, the filter size, the stride, and the zero-padding. Usually, an activation function, which can be linear or nonlinear (e.g., the ReLU, which is the most widely used, or the sigmoid), is applied to the convolution output to generate invariant local features. The pooling layer is used to select the most salient of those local features or to merge several features into one. Different pooling operations exist, i.e., max pooling, average pooling, and L2-norm pooling; max pooling is the most widely used. In some studies, the pooling layer is omitted and larger strides are used instead. Feature learning is performed in a CNN by stacking alternating convolutional and pooling layers. The fully connected layer has full connections to all activations in the previous layer (i.e., the pooling layer). After the features are learned, the two-dimensional feature map is converted into a one-dimensional vector within the fully connected layer and then fed into a softmax function for model construction [23].

Fig. 10
figure 10

Descriptive example of convolutional neural network (CNN)
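The layer sequence described above (convolution, activation, pooling, flattening, fully connected layer, and softmax) can be sketched as a single forward pass. The sketch below is a simplified illustration that assumes a one-dimensional input signal rather than an image, a single convolutional layer, and hypothetical function names; it does not reproduce the architecture of Fig. 10 or of any reviewed study.

```python
import numpy as np

def conv1d(x, kernels, stride=1, pad=0):
    """Slide each filter over a 1-D signal (cross-correlation, which is
    what deep learning libraries usually call convolution).
    x: (n,) signal; kernels: (n_filters, k). Returns (n_out, n_filters)."""
    x = np.pad(x, pad)                       # zero-padding on both ends
    k = kernels.shape[1]
    n_out = (len(x) - k) // stride + 1       # output length
    windows = np.stack([x[i * stride:i * stride + k] for i in range(n_out)])
    return windows @ kernels.T               # shared weights at every position

def relu(z):
    return np.maximum(0.0, z)

def max_pool1d(fmap, size=2):
    """Keep the largest value in each non-overlapping window of `size`."""
    n = (fmap.shape[0] // size) * size       # drop an incomplete last window
    return fmap[:n].reshape(-1, size, fmap.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()

def cnn_forward(x, kernels, W_fc, b_fc):
    fmap = relu(conv1d(x, kernels))          # local features
    pooled = max_pool1d(fmap)                # most salient features
    flat = pooled.ravel()                    # 2-D map -> 1-D vector
    return softmax(flat @ W_fc + b_fc)       # class probabilities
```

For an input of length n, filter size k, padding p, and stride s, the convolution output has (n + 2p - k) // s + 1 positions, which is why larger strides can take over the down-sampling role of the pooling layer.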

In [161], convolutional neural networks (CNNs) with a deep structure of one to three layers were applied to raw signals to test their accuracy as classifiers of bearing fault data; their effectiveness was also investigated when the input signals were corrupted with noise. Other works dealt with REB PHM based on the DCNN (deep CNN) [156] or on modified DCNNs, i.e., the hierarchical adaptive DCNN [162] and energy-fluctuated multiscale feature learning with a deep ConvNet for intelligent spindle bearing fault diagnosis [163]. CNN-based bearing fault detection was proposed in [164], where the CNN was treated as a feature-learning model for condition monitoring, so that it can autonomously learn useful features for bearing fault detection from the data itself.

In 2017 alone, several papers [165,166,167,168,169,170,171] used CNN-based deep learning to detect and diagnose REB faults. There is thus a clear tendency toward applying such deep learning techniques to REB fault detection and diagnosis tasks; however, no study has yet considered the prognostic task, which remains to be pursued.

A hybrid method was proposed by You et al. that benefits from the feature-learning capability of the CNN, as a deep learning technique, and the generalization ability of support vector regression (SVR) [165]. In [166], the method was used to detect and diagnose different bearing faults as well as gear faults. The proposed hybrid model, CNN-SVR, was constructed by replacing the top layer of the traditional CNN with an SVR classifier; the resulting model was then built up layer by layer from convolutional and pooling layers. The hybrid model consists of 10 layers in total: the input layer, three convolutional layers, three pooling layers, two fully connected layers, and a support vector regression classifier as the top layer. In [167], [168], and [169], the DCNN was combined with other methods to deal with REB FDD. Zhang et al. [167] proposed a novel method named DCNN with wide first-layer kernels (WDCNN); Fuan et al. [168] utilized a DCNN with a particle swarm optimization method and the t-distributed stochastic neighbor embedding (t-SNE) technique; and Li et al. [169] proposed IDSCNN, which is based on an ensemble DCNN and an improved Dempster–Shafer theory-based evidence fusion technique. Xie and Zhang [170] combined the CNN with an EMD-based feature extraction algorithm, with attention to extracting distinguishing features (compressed features with spatial information), to handle the nonstationary characteristics of the original vibration signals. Finally, Lu et al. [171] investigated a new hierarchical CNN-based deep learning network for bearing fault diagnosis under fluctuating working conditions and noisy environments, making use of cognitive computing theory.

4.2 (Deep) RNN-based REB PHM approaches

When a deep NN is constructed with the same weights applied recursively (i.e., the weights are shared across the whole network), a recurrent neural network (RNN) is obtained [172]. An RNN is a neural network that has cyclic connections in its hidden units; these connections can hold past information. The RNN processes the input data sequentially, \(x_t\) at time step \(t\), and the past data are stored implicitly in a state vector, \(h_t\). Therefore, the output \(O_t\) at time step \(t\) depends on the current input and on all the past data. An RNN can be seen as a very deep network that reuses the same weights at every step, as can be seen in Fig. 11a, b, where \(W_{in}\) are the weights of the inputs, \(W\) are the weights of the state, and \(W_{out}\) are the weights of the output. An RNN is powerful in analyzing sequential information; however, it can suffer from the vanishing gradient problem during backpropagation for model training. To address this problem, the long short-term memory (LSTM) [173] and the gated recurrent unit (GRU) [174] techniques were developed.

Fig. 11
figure 11

a A general structure and b the detailed one of recurrent neural network (RNN)
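The recurrence described above can be written as a short forward pass in which the same three weight matrices are reused at every time step. This is a minimal sketch: the tanh state activation, the linear output, the zero initial state, and the name `rnn_forward` are assumptions of this example.

```python
import numpy as np

def rnn_forward(xs, W_in, W, W_out, h0=None):
    """Run a simple RNN over a sequence.
    xs: (T, n_in) inputs; W_in: (n_h, n_in); W: (n_h, n_h); W_out: (n_o, n_h).
    Returns the states (T, n_h) and the outputs (T, n_o)."""
    h = np.zeros(W.shape[0]) if h0 is None else h0
    states, outputs = [], []
    for x_t in xs:
        # The new state mixes the current input with the previous state,
        # so h implicitly summarizes all inputs seen so far.
        h = np.tanh(W_in @ x_t + W @ h)
        states.append(h)
        # The output at step t is read from the state at step t.
        outputs.append(W_out @ h)
    return np.array(states), np.array(outputs)
```

Because the state is multiplied by the same matrix at every step, gradients propagated back through many steps can shrink toward zero, which is the vanishing gradient problem mentioned in the text.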

Almost all of the papers found [175,176,177,178,179,180] used the RNN as a tool not only for REB fault diagnosis but also for prognosis; the exception is Abed et al. [175]. Abed et al. [175] used dynamic recurrent neural networks (DRNNs), which, unlike conventional static neural networks, can learn the dynamics of nonlinear systems. The DRNN was fed with orthogonal fuzzy neighborhood discriminant analysis (OFNDA) features and applied to real-time REB FDD.

Malhi et al. [176] preprocessed vibration signals from defect-seeded REBs using the CWT and then used a competitive learning-based approach built on the RNN algorithm for long-term prognosis. Different statistical parameters were utilized as inputs to the RNN; these were clustered based on the principle of competitive learning to effectively represent the bearing defect propagation. The results showed that the RNN did not work well for short-term prediction, but for long-term prediction it increased the training speed and achieved good prognostic results. Sharma et al. [177] proposed a robust fault analysis method to diagnose and predict the level of fault severity of a REB. They used the DWT for feature extraction and an orthogonal fuzzy neighborhood discriminative analysis (OFNDA) technique for feature reduction. Finally, a DRNN method was used to predict the REB conditions and classify their different faults. Xie and Zhang [178] used two methods, the echo state network (ESN) and the recurrent multilayer perceptron (RMLP), which are variants of the RNN, for vibration-based REB fault prognosis. Unlike the autoregressive moving average (ARMA) and SVM methods, the two methods were able to predict the REB health condition in a relatively short time and with only limited data available. More recently, the RNN was used as the main tool for REB prognosis [179], [180]; in [180], it was applied in both the time and the frequency domains, and the test results showed that the RNN can be used for fault prognosis in general and for bearing health conditions in particular. These prior studies showed promising results regarding the ability of RNNs to predict the RUL, which is an important factor in decision-making to alleviate emergency situations. Thus, the use of an RNN for REB fault prognosis is worth further in-depth study.

4.3 RBM-based DNN approaches for REB PHM

Deep neural networks (DNNs) belong to the category of artificial NNs, but they are generally superior because of their strong capability for representation learning. A DNN whose architecture is built with a deep learning technique, i.e., a layer-by-layer learning technique, can mitigate the problem of local optima when training the parameters of the network [111]. A DNN structure can be built either with the restricted Boltzmann machine (RBM) or with the autoencoder (AE) technique. In this subsection and the next, research on RBM-based DNNs and AE-based DNNs for REB PHM is reviewed, respectively. A brief description of the RBM is presented first, together with the variant models that use it as the basic learning module, i.e., the deep-belief network (DBN) and the deep Boltzmann machine (DBM). Then, a comprehensive review of existing studies examining RBM-based DNNs for REB PHM is presented.

An RBM is a network of symmetrically coupled stochastic binary units composed of a visible layer with visible nodes, \(v_i\), and a hidden layer with hidden nodes, \(h_j\). In an RBM, the visible and hidden units are connected symmetrically, and there are no connections between units within the same layer, as can be seen in Fig. 12a. Further details can be found in [181] and in [21].

Fig. 12
figure 12

Frameworks of a RBM architecture, and b RBM-based DBN structure

The DBN is constructed by stacking multiple RBMs, as can be seen in Fig. 12b. Thus, the DBN is a multilayer NN with stochastic latent variables (hidden units), and it is a generative graphical model [182]. The DBN is trained in two steps: first, an unsupervised layer-wise pre-training (RBM 1, RBM 2, and RBM 3 in Fig. 12b), and then a supervised fine-tuning (fully connected (FC) layer in Fig. 12b). In pre-training, each hidden layer serves as the visible layer for the next RBM.
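The layer-wise pre-training described above can be sketched as follows. The text does not specify how each RBM is trained; this example assumes one-step contrastive divergence (CD-1), a commonly used rule, and the class and function names, learning rate, and epoch count are hypothetical. The supervised fine-tuning step is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # symmetric weights
        self.a = np.zeros(n_visible)                            # visible biases
        self.b = np.zeros(n_hidden)                             # hidden biases
        self.rng = rng

    def hidden_prob(self, v):
        # No hidden-hidden links, so hidden units are independent given v.
        return sigmoid(v @ self.W + self.b)

    def visible_prob(self, h):
        # No visible-visible links, so visible units are independent given h.
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0, lr=0.1):
        # Assumed training rule: one step of contrastive divergence.
        ph0 = self.hidden_prob(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden
        v1 = self.visible_prob(h0)          # reconstruction (probabilities)
        ph1 = self.hidden_prob(v1)
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.a += lr * (v0 - v1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)

def pretrain_dbn(data, layer_sizes, epochs=20, seed=0):
    """Greedy, unsupervised, layer-wise pre-training of stacked RBMs."""
    rng = np.random.default_rng(seed)
    rbms, v = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(v.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(v)
        rbms.append(rbm)
        # The hidden activations become the visible layer of the next RBM.
        v = rbm.hidden_prob(v)
    return rbms
```

No labels are used anywhere in `pretrain_dbn`, which is why unlabeled data can be exploited at this stage.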

In contrast to a DBN, the DBM is built by grouping hidden units into a hierarchy of layers instead of a single one. Thus, the DBM is simply a deep structured RBM, where any adjacent layers can be connected, but non-adjacent layers cannot be connected. In addition, no connection is permitted within units of the same layer. The DBM adopts learning a complex, fully connected Boltzmann machine, in which each layer captures complicated, higher order correlations between the activities of hidden features in the layer below [183].

First, the RBM-based DNN structure that uses the RBM as the basic learning module, i.e., the deep-belief network (DBN), was employed in [184] as a bearing condition monitoring tool to overcome the presence of noise and transient impacts in the acquired vibration signals. Shao et al. [185] used an optimization DBN method to deal with REB fault diagnosis. Different research works have combined the DBN with other techniques to improve the REB detection, diagnosis, and prognosis capability. Wang et al. [186] proposed a bearing fault diagnosis method based on the Hilbert envelope spectrum and a DBN. Choosing the right parameters for the DBN is crucial; however, it can be time-consuming because of the training process. The study in [187] was proposed to deal with this issue and to avoid both the overfitting and the underfitting problems. Ma et al. [188] investigated an assessment of bearing degradation based on the Weibull distribution and a DBN. Bearing fault diagnosis based on a DBN and multi-sensor information fusion techniques was carried out in [189], using multiple vibration signals to adaptively fuse multi-feature data and identify various bearing faults. Yin et al. [190] developed a combined machine health assessment model based on Isomap and a DBN, which effectively evaluated the degradation of the bearing health condition because it was found to be more sensitive to incipient faults. A two-layer hierarchical diagnosis network (HDN) [191] that performs REB diagnosis in two stages was built using a wavelet packet energy feature. The bearing fault types were identified by the first layer, and their severity ranking was then recognized in the second layer. Finally, the HDN was compared to two similar networks constructed by SVM and to a backpropagation neural network (BPNN); according to the experimental results, the HDN could cope with the noise and disturbances that give rise to the overlapping problem among the different fault classes, and it was more reliable for precise, multi-stage diagnosis.

One critical challenge for performing prognosis of bearings in the era of the IoT and 4th Industrial Revolution is to automatically process massive amounts of data and accurately predict the RUL of bearings. Recently, a study of Deutsch et al. [192] addressed the limitations of SL-based REB prognostics, and presented a new method that integrates a DBN and a particle filter for RUL prediction of hybrid ceramic bearings; the study then compared the results with DBN and particle filter-based approaches. The validation and comparison results showed promising RUL prediction performance of the integrated method. Early bearing fault diagnosis using effective feature selection methods was proposed by Devendiran et al. [193]; these researchers used a DBN as one of the neural network classification algorithms. In contrast to the conventional fault diagnosis and classification methods, which usually do not consider the temporal coherence of time series data, Zhang et al. [194] proposed a REB FDD model based on a DBN. It can directly recognize raw time series sensor data without feature selection and signal processing. It also takes advantage of the temporal coherence of the data, thus, expertise in feature selection and signal processing is not required.

In the current year, 2018, three papers have already been published, all of which used a DBN-based deep learning method to deal with REB PHM. The first [195] presented a deep learning-based approach for RUL prediction of rotating components with big data, in contrast to the shallow learning methods, which require establishing explicit model equations and much prior knowledge (and are therefore limited in the age of big data, as explained in the previous section). The developed approach was a DBN-feedforward neural network (DBN-FNN) algorithm that takes advantage of the self-taught feature-learning capability of the DBN and the predicting power of the FNN; together, these overcome the above-mentioned limitations. A novel convolutional deep-belief network (CDBN) was proposed for REB PHM in [196]. First, an autoencoder was used to compress the data and reduce its dimension. Second, a novel CDBN was constructed with Gaussian visible units to learn the representative features. Finally, the exponential moving average (EMA) was applied to improve the performance of the constructed deep model. In another study, Oh et al. [197] developed a DBN-based deep learning method with vibration images as the inputs. The developed method was found to be scalable because the devised vibration imaging approach incorporates data from systems of various scales, such as small testbeds and real field-deployed systems. Further, the method was proposed for unsupervised feature engineering. The DBN was pre-trained for high-level feature extraction; because pre-training is unsupervised, a large amount of unlabeled field data can be incorporated. Then, the pre-trained DBN was fine-tuned by combining it with a multilayer perceptron (MLP), leading to a fault classifier. The pre-trained DBN could also be used for fault clustering by combining it with a self-organizing map (SOM).

Second, an RBM-based DNN structure that stacks multiple RBMs, i.e., the deep Boltzmann machine (DBM), was investigated and used for REB condition monitoring in [198]. In the study, several time, frequency, and time–frequency domain features were extracted from an acquired data set with seven fault patterns to assess the performance of the proposed DBM for REB fault diagnosis. The seven parameters were used as the input parameters of the DBM model. The results demonstrated the accuracy and reliability of the DBM model. An enhanced RBM with prognosability regularization was considered for prognostics and health assessment of REBs. The proposed DBM method was benchmarked against a deep structure of the regular RBM algorithm and against PCA [199]. A scoring method based on the benchmarking score was used to evaluate each PHM method in its ability to predict the RUL. He et al. [200] proposed a novel bearing diagnosis method based on the Gaussian restricted Boltzmann machine (Gaussian RBM) algorithm using vibration signal data. The envelope spectra were used directly as the feature vectors to represent the fault types of the bearing and were then classified using the proposed Gaussian RBM algorithm.

The deep-statistical feature learning (DSFL) of the machinery condition health monitoring can be constructed by Gaussian-Bernoulli deep Boltzmann machine (GDBM) [201], where in the GDBM, each neuron in the intermediate layers is connected to both top-down and bottom-up information, unlike in other RBM-based deep models, such as the DBN and the deep autoencoder. For deep learning of statistical features with unknown value boundaries, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) were stacked to develop the GDBM-based DSFL method in [202] and were applied for both bearing and gearbox systems. Deutsch and He [203] dealt with bearing RUL prediction with big data based on a deep learning technique that used the DBM, which predicts L steps ahead in the future, to predict the RUL by predicting the RMS values and the time of the bearing’s failure.
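To illustrate how a multi-step-ahead RMS forecast can be turned into an RUL estimate, a minimal sketch is given below. It is not the method of [203]: the fixed failure threshold, the step duration, and the function names are hypothetical, and the forecast itself (which in [203] comes from the deep model) is simply passed in as an array.

```python
import numpy as np

def rms(window):
    # Root mean square of one window of vibration samples.
    return float(np.sqrt(np.mean(np.square(window))))

def rul_from_forecast(predicted_rms, threshold, step_duration=1.0):
    """Estimate the RUL as the time until the predicted RMS first
    reaches an assumed failure threshold.
    predicted_rms[i] is the forecast (i + 1) steps ahead.
    Returns None if the threshold is not reached within the horizon."""
    above = np.flatnonzero(np.asarray(predicted_rms) >= threshold)
    if above.size == 0:
        return None
    return float(above[0] + 1) * step_duration
```

For example, `rul_from_forecast([0.2, 0.3, 0.5, 0.9], threshold=0.5)` returns 3.0, because the third predicted step is the first to reach the threshold.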

4.4 AE-based DNN approaches for REB PHM

The autoencoder (AE) [204] is an unsupervised machine learning structure; it is a feedforward NN that consists of two phases, the encoder phase and the decoder phase. Its main characteristic is that it is trained to produce an output equal to its input. A common AE architecture is shown in Fig. 13a and the AE-based DNN structure is shown in Fig. 13b. In the encoder phase, features are extracted from the input vector to form the hidden layer via a nonlinear mapping that uses a weight matrix \(W_1\). In the decoder phase, the output vector is computed in a similar way, using a weight matrix \(W_2\), to reconstruct the original input vector. It should be noted that the hidden layer can have a smaller dimension \(m\) than the input layer dimension \(n\) (i.e., \(m < n\)), so that the AE is forced to learn a compressed form of the input by finding the correlations in the input. The hidden layer can also have a larger dimension, \(m > n\); the AE can still learn an interesting structure if other constraints are imposed on the network.

Fig. 13
figure 13

Frameworks of a AE architecture, and b the AE-based DNN structure
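The encoder and decoder phases described above can be sketched as a small undercomplete autoencoder (m < n) trained to reconstruct its own input. The sigmoid encoder, the linear decoder, the mean squared reconstruction error, and the name `train_autoencoder` are assumptions of this example rather than details taken from [204].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, m, lr=0.5, epochs=300, seed=0):
    """Train an autoencoder with hidden dimension m on data X of shape (N, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n, m))   # encoder weights
    b1 = np.zeros(m)
    W2 = rng.normal(0.0, 0.1, (m, n))   # decoder weights
    b2 = np.zeros(n)
    errors = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encoder: nonlinear mapping to m features
        X_hat = H @ W2 + b2             # decoder: reconstruct the n inputs
        R = X_hat - X                   # the target is the input itself
        errors.append(float(np.mean(R ** 2)))
        # Gradients of the mean squared reconstruction error.
        dX_hat = 2.0 * R / R.size
        dW2 = H.T @ dX_hat
        db2 = dX_hat.sum(axis=0)
        dH = (dX_hat @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), errors
```

After training, the hidden activations `sigmoid(X @ W1 + b1)` serve as the learned features; no labels are needed at any point.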

The trained AE layers can be stacked into a new network, which is an AE-based DNN. By training and stacking various layers of AEs, diverse AE-based DNN structures can be generated to extract features that represent the health states of various engineered systems, as shown in Fig. 13b. Moreover, a deeper layered structure than the one in Fig. 13a is another widely used form of AE-based DNN, which can discover highly representative features from extremely complex signals.

W. Lu et al. [205] applied the AE to rolling element bearing fault diagnosis to extract features from the raw signal and to guarantee sensitivity to every fault category of interest, thereby avoiding incomplete diagnosis results and the appearance of unknown-category faults. Six classes of bearing faults were considered to evaluate the proposed method, after the data were preprocessed with the FFT to generate inputs 600 points in length. The resulting DNN had two hidden layers, with 800 and 400 neurons, constructed from AEs. The authors concluded that the AE-based DNN could extract useful features, and that further studies should be carried out to classify the fault categories with higher accuracy and to handle unknown-category fault cases. F. Jia et al. [206] stated that the AE-based DNN method could overcome two issues that hinder ANN-based intelligent fault diagnosis of rotating machinery: (1) the need for prior expertise and knowledge to manually extract fault features, and (2) the limited ability to learn the complex nonlinear relationships in fault diagnosis. Thus, F. Jia et al. [206] suggested AEs with deep, rather than shallow, architectures as a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Another study [207] took advantage of the learning capability of the AE and combined it with the digital wavelet frame (DWF) and a nonlinear soft threshold method, which were first used to de-noise the fault vibration signal. The study then applied a stacked autoencoder (SAE) to extract the features, which were the inputs to a BP network classifier. A deep learning stacked de-noising autoencoder (SDAE) method [208] was employed to extract features from the non-stationary and nonlinear characteristics of bearing vibration signals. The deep learning SDAE, combined with dropout, was found to be useful for learning good representations of those features and for improving the robustness of fault pattern classification.
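To make the FFT preprocessing step concrete, the sketch below turns a raw vibration segment into a 600-point magnitude spectrum of the kind that could feed a network with 800 and 400 hidden neurons. The segment length (1200 samples), the sampling rate, the Hann window, and the normalization are assumptions chosen for illustration; the source states only that the FFT was used to produce 600-point inputs.

```python
import numpy as np

def fft_magnitude(segment, n_points=600):
    """One-sided FFT magnitude spectrum, trimmed to n_points bins and
    scaled to [0, 1]. Window and scaling are illustrative assumptions."""
    segment = np.asarray(segment, dtype=float)
    if segment.size < 2 * n_points:
        raise ValueError("segment must contain at least 2 * n_points samples")
    # Remove the DC offset and taper the edges to limit spectral leakage.
    windowed = (segment - segment.mean()) * np.hanning(segment.size)
    spectrum = np.abs(np.fft.rfft(windowed))[:n_points]
    peak = spectrum.max()
    return spectrum / peak if peak > 0 else spectrum

# Synthetic test signal (assumption): 1200 samples at 12 kHz, so each
# FFT bin is 10 Hz wide and the 1500 Hz component falls in bin 150.
fs = 12_000
t = np.arange(1200) / fs
segment = np.sin(2 * np.pi * 1500 * t) + 0.1 * np.sin(2 * np.pi * 300 * t)
x = fft_magnitude(segment)   # 600-point input vector for the network
```

Each such vector would be one training sample; the spectrum is used instead of the raw waveform because fault signatures are usually easier to separate in the frequency domain.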

The AE-based DNN was also used to classify REB fault classes with two and three hidden layers in an unsupervised manner, using the encoder part of the AE [209]. L. Guo et al. [210] suggested a new multi-feature extraction and nonlinear dimension reduction algorithm based on deep learning for bearing condition recognition. Different time domain, frequency domain, and time–frequency domain features were calculated, and their dimension was then reduced. Finally, the different bearing faults were classified using a top-layer classifier applied to the outputs of the AE-based DNN.
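As an illustration of the kind of hand-computed features that such multi-feature approaches start from, the sketch below computes a few widely used time-domain indicators. The specific indicators and the function name `time_domain_features` are assumptions made for illustration and are not claimed to be the feature set used in [210].

```python
import numpy as np

def time_domain_features(signal):
    """Common time-domain condition indicators for a vibration segment.
    Assumes a non-constant signal (non-zero RMS and variance)."""
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    variance = np.mean(centered ** 2)
    return {
        "rms": rms,                                    # overall energy level
        "peak_to_peak": x.max() - x.min(),             # total excursion
        "crest_factor": peak / rms,                    # impulsiveness
        "kurtosis": np.mean(centered ** 4) / variance ** 2,  # tail heaviness
    }

# A pure sine wave serves as a reference: RMS = 1/sqrt(2),
# crest factor = sqrt(2), and (non-excess) kurtosis = 1.5.
reference = time_domain_features(np.sin(2 * np.pi * 10 * np.arange(1000) / 1000))
```

Impulsive bearing defects tend to raise the crest factor and kurtosis well above these sine-wave reference values, which is why such indicators are common inputs before dimension reduction.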

S. Tao et al. [211] combined the SAE with softmax regression, a classification method that generalizes logistic regression to multiclass problems, to examine the bearing fault diagnosis problem. Their results showed that combining the SAE with softmax regression was strongly robust and remarkably eliminated the impact of noise. H. Liu et al. [212] were the first to combine the short-time Fourier transform (STFT) and the stacked sparse autoencoder (SSAE) with softmax regression to automatically extract features from sound signals and to classify the different REB fault modes, respectively. The effectiveness of the proposed STFT-SSAE method was investigated and compared with empirical mode decomposition (EMD), the Teager energy operator (TEO), and the SSAE to evaluate its performance when the PCA technique was deployed for dimensionality reduction. Taking advantage of the high training speed of the ELM method and the feature extraction capability of the AE, Mao et al. [213] proposed another deep learning algorithm, named the “AE-ELM-based diagnosis method,” to build a universal feature extraction and fast-training method for different REB diagnosis issues. In [214], high-level features in the frequency domain and the wavelet packet transform domain were extracted using a DNN. First, the weights of the DNN were initialized using stacked denoising sparse autoencoders (SDSAE); then, those weights were fine-tuned based on softmax regression and centering towards the median. These high-level features were then classified using both the SVM and the random forest technique. In [215], the SDAE was also applied to remove random noise from the raw signals and to represent fault features in fault pattern diagnosis for both rolling bearing faults and gearbox faults. The SDAE was trained in a greedy, layer-wise fashion. The proposed method was compared with the DBN algorithm in a highly noisy environment, and the results showed its superiority for fault diagnosis.
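Softmax regression, used as the top-layer classifier in several of the studies above, can be sketched as follows. This is a minimal illustration assuming batch gradient descent on the cross-entropy loss; the function names, learning rate, and the synthetic two-dimensional "features" are hypothetical stand-ins for the learned AE features, not data from any reviewed paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y, n_classes, epochs=300, lr=0.5):
    """Multiclass logistic regression trained by batch gradient descent."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot class labels
    for _ in range(epochs):
        P = softmax(X @ W + b)               # predicted class probabilities
        grad = (P - Y) / n_samples           # gradient of mean cross-entropy
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Synthetic features for three well-separated "fault classes" (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y_demo = np.repeat(np.arange(3), 50)
X_demo = centers[y_demo] + rng.normal(0.0, 0.3, (150, 2))
W, b = fit_softmax_regression(X_demo, y_demo, n_classes=3)
accuracy = np.mean(predict(X_demo, W, b) == y_demo)
```

With two classes, the same model reduces to ordinary logistic regression, which is the sense in which softmax regression generalizes it.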

More recent works, accepted or published in 2017 and 2018, have investigated the AE method or its variants for REB fault diagnosis. Chen and Li [216] proposed multi-sensor feature fusion for bearing fault diagnosis using a sparse autoencoder and a DBN. Lu et al. [217] solved the health state identification problem in REB fault diagnosis using an SDAE method. Another deep learning method, named “automated AE correlation-based (AEC),” was developed by Hasani et al. [218]; it was used for health monitoring and prognostics of machine bearings. A hybrid feature pool method was proposed in [219] and combined with SAE-based DNNs to perform effective diagnosis of REB faults of multiple severities. The authors found that the hybrid feature pool could extract more discriminating information from the raw vibration signals to overcome the non-stationary behavior of the signals caused by multiple crack sizes; the proposed method outperformed the SVM and the BPNN. In [220], locality preserving projection (LPP) was adopted to fuse the deep features, and thus to build a new deep AE method constructed from a denoising autoencoder (DAE) and a contractive autoencoder (CAE), which enhanced the feature-learning ability for diagnosing REB faults. A hybrid deep model consisting of a multi-channel CNN followed by a stack of denoising autoencoders (MCNN-SDAE) was developed by A. Shaheryar et al. [221] for fault identification in rotary machines. In that study, the researchers explored the MCNN for unsupervised feature learning on vibration signals and the SDAE for extracting vibration features that are robust and invariant to noise in the vibration signals.

Two papers, [222] and [223], which proposed intelligent bearing fault diagnosis methods, were published in 2018. The method in [222] was built by combining compressed data acquisition and SSAE-based deep learning; in [223], the authors explored highly compressed measurements of REB vibration signals under different operating conditions, taking advantage of the compressed sensing (CS) method [225], which allows signals to be sampled below the Nyquist rate. A summary of the classification of the DL-based REB PHM methods and a list of all reviewed DL-based REB PHM papers, with the principles, the pros and cons, and the applications of these methods, are presented in Table 4.

Table 4 Summary of the reviewed DL-based REB PHM methods

5 Summary and concluding remarks

This paper has presented a comprehensive review and summary of recent techniques for REB fault detection, diagnosis, and prognosis, and of their applications. It organizes the widespread, contemporary REB PHM techniques into two main categories: shallow learning algorithms and deep learning methods. First, the different bearing failure modes were briefly described, focusing on fatigue, wear, plastic deformation, corrosion, electrical erosion, and fracture and cracking modes. Then, the different health features (indexes, criteria) used by these contemporary REB PHM techniques were thoroughly described, with their physical meaning where applicable (some of the features do not have any), to provide an overall background for researchers, system engineers, and experts, in both the general PHM field and the specific REB PHM field, to select and adopt the best fit for their specific applications.

Several SL-based algorithms have been applied to REB PHM systems; some originated from artificial neural networks (NN), and some did not. Thus, three categories were proposed in this paper: (1) statistical approaches, which were divided into LDA-based REB PHM, SVM-based REB PHM, KNN-based REB PHM, ELM-based REB PHM, and other statistical algorithms for REB PHM; (2) NN approaches; and (3) combined methods. Further, DL-based REB PHM techniques were reviewed and classified into four groups: (1) (deep) CNN methods; (2) (deep) RNN methods; (3) RBM-based DNN methods, subdivided into DBN-based REB PHM and DBM-based REB PHM; and (4) AE-based DNN methods. Furthermore, the principles, the pros and cons, the advancements, and the applications of these SL-based and DL-based REB PHM methods were reviewed and summarized.

From this survey study, several key points can be concluded, including:

  • Although both SL-based and DL-based REB PHM techniques achieve good results in detecting and diagnosing different REB faults (sometimes with a perfect accuracy of 100%), they are still not adopted in industry, owing to a lack of studies that consider how these contemporary techniques will be applied in practice. Thus, it would be valuable for academics and industrial experts to work together to adopt and study these strategies. Different scales, different fault modes (i.e., single failure modes as well as compound failures), and different bearing types, such as journal bearings and magnetic bearings, which are increasingly incorporated in real-world applications, should also be studied. Furthermore, it is recommended that companies and industry experts share their data (healthy, faulty, and run-to-failure data) with academics, who usually use only data collected from their in-lab test benches; this shared data would help to achieve advancements not only in research, but also in practical industry use.

  • It is well known in the PHM field that if a sufficiently accurate reference model exists, using a model-based technique for detecting, diagnosing, and predicting faults is the best choice. Thus, incorporating dynamic models of REBs could improve the accuracy of REB PHM methods. Further, since fault data are very rare and hard to obtain from modern engineered systems, researchers can benefit from the recently developed generative adversarial network (GAN) technique for generating faulty data.

  • Although there have been significant advancements in the development of both the SL-based and the DL-based REB PHM techniques, there is still no formula or law for selecting the optimal network geometry or hyper-parameter values (e.g., the number of layers) to achieve the best results in detecting and diagnosing bearing faults and (ultimately) predicting health conditions. Thus, providing a standardized platform, or at least a guideline on how deep those algorithms should be, while considering the fact that most companies lack the software, modeling capability, and expertise to deeply understand those algorithms and interpret their results, will enable integration of these contemporary techniques into real-world applications.

  • Finally, nearly all existing REB PHM studies, whether based on shallow learning or deep learning techniques, have targeted only the REB fault detection and diagnosis (condition monitoring) problem. Very few studies were found that deal with REB prognosis, that is, predicting the remaining useful lifetime (RUL) in order to provide a better condition-based maintenance (CBM) strategy. Strategies that begin to enable improved CBM will be of great interest to the rolling element bearing PHM field in particular, and to PHM for any modern engineered system in general, especially in the forthcoming years in the age of IoT and big data.