
1 Introduction

In modern industrial systems, there is a growing need for more reliable machines. Rotating machines are among the most commonly used mechanical equipment in industrial applications, accounting for more than 90% of industrial machines [1]. Because these machines usually operate under dynamic and harsh conditions for long periods, they often suffer from various types of mechanical failures. Any failure in a rotating machine, even a minor one, is unacceptable because it can significantly affect the entire system and can even lead to undesirable downtime, huge economic losses, and serious safety problems [2, 3]. Consequently, research on fault diagnosis is of practical significance for enhancing the reliability of machines, reducing economic losses, and avoiding safety problems [4, 5].

Over the past decades, numerous methods have been proposed to diagnose faults in rotating machines. These methods fall into three broad classes: model-based methods, statistical methods, and artificial intelligence-based methods [6]. Model-based methods build an explicit mathematical model from the physical characteristics of the monitored machine under the necessary assumptions [7]. However, establishing an exact mathematical model for a complex system is challenging [8], and the high complexity of industrial faults and the cost of model-based methods limit their applicability to machine fault diagnosis. Statistical methods assume that historical data can be used to establish the fault modes and the future failure mechanism of a machine [9]. However, this assumption might not hold in practice because machine failure mechanisms are complex and nonlinear and involve the coupling of different physical processes. Nowadays, artificial intelligence-based fault diagnosis methods are the focus of academic and industrial research aimed at overcoming the difficulties of diagnosing complex industrial machines [10]. The primary reason is their flexibility compared with other methods: artificial intelligence methods can be readily improved, extended, and modified. They can also be made adaptive by integrating new data [11].

Motivated by these advantages, intelligent fault diagnosis methods have gained great attention in recent decades. This paper therefore reviews artificial intelligence methods applied to the fault diagnosis of rotating machines, with special emphasis on deep learning methods published from 2017 to 2022. It analyzes the strengths and weaknesses of each method to give researchers guidance in selecting an appropriate intelligent method for a specific application rather than choosing arbitrarily. The research challenges in this field are also discussed to identify possible directions for further exploration.

The remainder of this paper is organized as follows. Section 2 provides a general overview of intelligent fault diagnosis of rotating machines. Section 3 presents a detailed review of the applications of deep learning methods in the fault diagnosis of rotating machines. Section 4 discusses the observations of the review, research challenges, and future directions in this area. Finally, conclusions are drawn in Section 5.

2 Overview of Intelligent Fault Diagnosis of Rotating Machines

Over the past decades, traditional machine learning methods have been widely applied to the intelligent fault diagnosis of rotating machines. These methods mainly comprise three consecutive steps: data acquisition, feature extraction, and fault classification [12, 13]. In the data acquisition stage, signals such as vibration, acoustic emission, noise, and temperature are acquired from the target machines by sensor systems [14]. In the feature extraction stage, fault-sensitive information is manually extracted from the sensor signals using various signal processing methods [15]. Diagnosis performance therefore depends heavily on this feature extraction step, which requires prior signal processing knowledge and diagnosis experience [16, 17]. Finally, the extracted features are fed into traditional machine learning classifiers [18, 19]. However, traditional machine learning methods are designed for specific types of faults or machines; they are therefore case-dependent and do not generalize to other applications [20]. Moreover, these methods are not efficient at processing high-dimensional data [21]. In general, traditional intelligent diagnosis methods perform poorly for machines that operate under adverse and complex conditions [22]. These limitations seriously restrict the applicability of traditional machine learning methods to rotating machine fault diagnosis.

Recently, deep learning methods, which overcome these limitations of traditional machine learning methods, have received great interest and achieved significant success in machine fault diagnosis [23, 24]. Deep learning-based fault diagnosis methods construct deep network architectures with multiple layers of linear and nonlinear transformations, which learn features directly from large amounts of sensor data and perform end-to-end fault diagnosis [25, 26]. The most common deep learning methods are discussed in the following section.

3 Deep Learning Methods in Fault Diagnosis of Rotating Machines

This section reviews the applications of the most common deep learning methods and their corresponding variants in fault diagnosis of rotating machines.

3.1 Convolutional Neural Network (CNN)

Convolutional neural network (CNN) is a biologically inspired feed-forward neural network used to extract local features from raw sensor data to perform classification [27]. A typical CNN model consists of multiple hidden layers, namely convolution layers, pooling layers, and fully connected layers [28]. A convolution layer is composed of a series of learnable filters (also known as kernels) that extract different features of the input data and generate new feature maps as the input to the next layer. A pooling layer is a down-sampling layer that reduces the size of the feature maps, and hence the number of parameters and computations in subsequent layers, which also helps prevent overfitting. The fully connected layer computes the class scores [29].
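The three layer types described above can be sketched as a plain-Python forward pass. This is a minimal, untrained illustration under assumed toy values (one input channel, one hand-picked kernel, hand-picked class weights); the function names are hypothetical, and a real CNN learns many kernels per layer by backpropagation.

```python
import math

def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as most DL libraries compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def max_pool1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def dense(features, weights, biases):
    """Fully connected layer: one score per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vibration-like segment and a difference kernel (assumed values).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
feature_map = relu(conv1d(signal, [1.0, -1.0]))  # -> [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
pooled = max_pool1d(feature_map)                 # -> [1.0, 1.0, 1.0]
probs = softmax(dense(pooled, [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]], [0.0, 0.0]))
```

Pooling shortens the 7-sample feature map to 3 values, so the fully connected layer needs only 3 weights per class instead of 7.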

CNN was originally designed for processing two-dimensional (2D) or three-dimensional (3D) input data such as images and video frames [30]. The traditional 2D CNN is not directly suitable for fault diagnosis of mechanical equipment because most measured signals are one-dimensional (1D). The input 1D data must therefore be converted into 2D data before feature extraction and classification [31]. Studies [32, 33] proposed CNN-based fault diagnosis methods that convert the original 1D signals into 2D images for different machine diagnosis tasks. However, this conversion is time-consuming and may cause the loss of fault-related information. The emergence of the one-dimensional convolutional neural network (1D-CNN) provides a feasible solution to these problems. Compared with 2D-CNN, 1D-CNN has a simpler and more compact network structure, and it can effectively diagnose machine faults with limited training data. Using one-dimensional vibration signals as input, researchers often utilize a 1D-CNN to diagnose faults in different rotating machine components, such as bearings [12, 20, 34], automobile engines [35], and gearboxes [36]. Abdeljaber et al. [37] used a 1D-CNN for structural damage detection based on vibration signals. Yin et al. [38] combined 1D-CNN with self-normalizing neural networks (SNN) to improve the diagnosis accuracy and generalization capability of rotating machine fault diagnosis.
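The 1D-to-2D conversion mentioned above can take many forms (time-frequency images are also common). The sketch below assumes one of the simplest: cutting an n*n-sample segment into n rows and min-max scaling it to 8-bit gray levels, so a 4096-sample segment becomes a 64 x 64 image. `signal_to_image` is a hypothetical helper, not the method of [32, 33]; its rounding to 256 gray levels is one place where such a conversion can discard fault-related detail.

```python
def signal_to_image(segment, n):
    """Reshape the first n*n samples into an n x n gray-level image (0..255)."""
    if len(segment) < n * n:
        raise ValueError("segment must contain at least n*n samples")
    seg = segment[: n * n]
    lo, hi = min(seg), max(seg)
    span = (hi - lo) or 1.0  # a constant segment maps to an all-zero image
    pixels = [round(255 * (x - lo) / span) for x in seg]  # 8-bit quantization
    return [pixels[r * n:(r + 1) * n] for r in range(n)]

image = signal_to_image([0.0, 1.0, 2.0, 3.0], 2)  # -> [[0, 85], [170, 255]]
```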

Many researchers have made significant efforts to develop novel CNN-based models and have achieved considerable progress. Jia et al. [39] developed a deep normalized CNN for imbalanced fault classification of machines from mechanical vibration signals. In [40, 41], an adaptive deep CNN model was developed to diagnose faults in rolling bearings. Kolar et al. [42] proposed a multi-channel deep CNN model for rotary machine fault diagnosis from raw vibration data. Sun et al. [43] presented a convolutional discriminative feature learning (CDFL) approach to diagnose motor faults. Dilated CNN methods have been used for bearing fault diagnosis from raw vibration signals [44, 45]. Liu [46] developed a dislocated time series CNN to diagnose faults in induction motors. Zhang et al. [47] utilized a CNN model with wide first-layer kernels for rolling bearing fault diagnosis using one-dimensional vibration data. Chen et al. [48] developed a novel deep capsule network with stochastic delta rule (DCN-SDR) for rolling bearing fault diagnosis. Ye and Yu [49] proposed a deep morphological CNN for gearbox fault diagnosis. Wang et al. [50] developed a novel multiple-input, multiple-task CNN method for roller bearing fault diagnosis. Studies [51,52,53] proposed hierarchical convolutional neural networks (HCNN) for fault diagnosis of different rotating machine components. Zhang et al. [54] developed an intelligent method based on multi-level information fusion and a hierarchical adaptive CNN to diagnose faults in centrifugal blowers. Jiang et al. [55] developed a multiscale convolutional neural network (MSCNN) for fault diagnosis of wind turbine gearboxes. Wang et al. [56] proposed a cascade CNN with progressive optimization for motor fault diagnosis under dynamic working conditions.
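Dilated convolutions, mentioned above for bearing diagnosis, space the kernel taps apart so that a stack of small kernels covers a long stretch of raw signal. A minimal sketch follows; the function names and the dilation schedule 1, 2, 4, 8 are illustrative assumptions, not details taken from [44, 45].

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D convolution whose kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # input samples seen by one output
    return [sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(signal) - span + 1)]

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(dilated_conv1d([0, 1, 2, 3, 4, 5], [1, 1], dilation=2))  # [2, 4, 6, 8]
print(receptive_field(3, [1, 2, 4, 8]))  # 31 samples with four 3-tap layers
print(receptive_field(3, [1, 1, 1, 1]))  # 9 samples without dilation
```

With the same number of weights, doubling the dilation at each layer makes the receptive field grow exponentially rather than linearly with depth.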

3.2 Recurrent Neural Network (RNN)

Recurrent neural network (RNN) is a class of deep neural network with both feedforward connections and internal feedback connections between network layers. Unlike feedforward neural networks such as CNN, an RNN can exploit temporal information in sequential data because of its internal memory: its neurons receive not only information from other neurons but also their own previous outputs, forming a network structure with loops. Because of this advantage in exploiting temporal information, RNNs have been widely utilized in machine fault diagnosis. Hu et al. [57] utilized an improved deep RNN for rotating machine fault diagnosis. Huang et al. [58] proposed an RNN-based variational auto-encoder (VAE) for motor fault detection. However, RNNs suffer from vanishing and exploding gradients and therefore have inherent limitations in capturing long-term dependencies [59, 60]. To overcome these limitations, researchers have proposed long short-term memory (LSTM) [61], the gated recurrent unit (GRU), and other improved RNN models.
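The feedback loop and the vanishing-gradient problem described above can be made concrete with an assumed toy: a scalar RNN with one input and one hidden unit. For h_t = tanh(w_x x_t + w_h h_{t-1} + b), the chain rule gives the derivative of the final state with respect to the initial state as the product of w_h(1 - h_t^2) over all steps, so when every factor has magnitude below one the gradient shrinks exponentially with sequence length. The function names and weights are hypothetical.

```python
import math

def rnn_forward(xs, w_x, w_h, b=0.0, h0=0.0):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # feedback: previous h re-enters
        states.append(h)
    return states

def grad_final_wrt_initial(states, w_h):
    """d h_T / d h_0 = prod_t w_h * (1 - h_t**2), by the chain rule."""
    g = 1.0
    for h in states:
        g *= w_h * (1.0 - h * h)
    return g

for T in (5, 20, 50):
    states = rnn_forward([1.0] * T, w_x=1.0, w_h=0.5)
    print(T, grad_final_wrt_initial(states, w_h=0.5))  # shrinks rapidly with T
```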

As an improved recurrent neural network, LSTM alleviates the vanishing gradient problem and captures long-term dependencies and nonlinear dynamics of time series data [62]. As a result, the LSTM model, with its memory function, has gained increasing attention in machine fault diagnosis. For instance, Yin et al. [63] developed an optimized fault diagnosis method for wind turbine gearboxes based on a cosine loss LSTM neural network. Yang et al. [64] developed an improved LSTM model to diagnose faults in electromechanical actuators. However, LSTM cannot make full use of the data because it processes the sequence in only one direction [65]. Furthermore, unidirectional LSTM has a relatively high network complexity, so its training takes a long time [66]. Bi-directional LSTM (Bi-LSTM) extends LSTM to address the one-directional limitation by extracting features in both the forward and backward directions. Cao et al. [67] developed a novel intelligent method based on deep Bi-LSTM to diagnose faults in wind turbine gearboxes. Han et al. [68] combined Bi-LSTM and a capsule network with a CNN for rotating machine fault diagnosis: the Bi-LSTM denoised and fused the features extracted by the CNN, and the capsule network performed fault diagnosis with insufficient training data. Li et al. [69] utilized Bi-LSTM to detect faults in rolling bearings. These studies indicate that Bi-LSTM can achieve higher diagnosis accuracy than unidirectional LSTM.
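The gating that lets LSTM retain long-term information, and the way a Bi-LSTM combines two passes over the sequence, can be sketched with a scalar cell. This follows the common formulation with input, forget, and output gates and an additive cell-state update; the parameter dictionary layout, the function names, and all weight values are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """One step of a scalar LSTM cell; p maps gate names to (w_x, w_h, b)."""
    def gate(name, act):
        w_x, w_h, b = p[name]
        return act(w_x * x + w_h * h + b)
    i = gate("input", sigmoid)   # how much new content to write
    f = gate("forget", sigmoid)  # how much old cell state to keep
    o = gate("output", sigmoid)  # how much of the cell state to expose
    g = gate("cell", math.tanh)  # candidate content
    c = f * c + i * g            # additive update eases gradient flow over time
    h = o * math.tanh(c)
    return h, c

def lstm(xs, p):
    h, c, out = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        out.append(h)
    return out

def bilstm(xs, p_fwd, p_bwd):
    """Pair forward and backward hidden states at each time step."""
    fwd = lstm(xs, p_fwd)
    bwd = lstm(xs[::-1], p_bwd)[::-1]  # run on the reversed sequence, then re-align
    return list(zip(fwd, bwd))

params = {name: (0.5, 0.5, 0.0) for name in ("input", "forget", "output", "cell")}
pairs = bilstm([1.0, 0.0, -1.0], params, params)  # (forward h_t, backward h_t) per step
```

At every time step the Bi-LSTM output carries a summary of the past (forward pass) and of the future (backward pass), at roughly twice the computation of a single LSTM.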

Compared with LSTM, the gated recurrent unit (GRU) has a simpler network structure with fewer parameters, so it can handle large training datasets with greater computational efficiency [70]. Liu et al. [71] utilized a GRU to diagnose faults in rolling bearings. Tao et al. [72] utilized a multilayer GRU method for spur gear fault diagnosis from vibration signals and, to verify its superiority, compared it with LSTM, multilayer LSTM, and support vector machine (SVM). Besides the basic GRU structure, the bidirectional GRU (Bi-GRU) has also been employed for fault diagnosis. Bi-GRU learns information from both the forward and backward directions of the input data at the same time. For this reason, Lv et al. [73] proposed a new heterogeneous Bi-GRU method based on fusion health indicators. Zhao et al. [74] utilized a local feature-based GRU network for bearing fault diagnosis, in which an enhanced bi-directional GRU extracts high-level features from vibration data.
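The parameter saving of GRU over LSTM follows from its structure: two gates (update and reset) plus a candidate state give three weight blocks, versus four in LSTM. The sketch below shows a scalar GRU step under one common convention and a parameter count that assumes one bias vector per block (some libraries keep two, which changes the totals slightly). Names and values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h, p):
    """One scalar GRU step; p maps names to (w_x, w_h, b)."""
    wz, uz, bz = p["update"]
    wr, ur, br = p["reset"]
    wn, un, bn = p["candidate"]
    z = sigmoid(wz * x + uz * h + bz)          # how much of the old state to keep
    r = sigmoid(wr * x + ur * h + br)          # how much old state feeds the candidate
    n = math.tanh(wn * x + un * (r * h) + bn)  # candidate state
    return (1.0 - z) * n + z * h               # interpolate old and candidate states

def recurrent_params(n_in, n_hidden, n_blocks):
    """Input weights + recurrent weights + one bias vector, per gate/candidate block."""
    return n_blocks * (n_hidden * (n_in + n_hidden) + n_hidden)

lstm_count = recurrent_params(1, 64, n_blocks=4)  # 16896
gru_count = recurrent_params(1, 64, n_blocks=3)   # 12672, i.e. 25% fewer
```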

RNNs and their improved variants have also been combined with other machine learning methods. Fu et al. [75] combined a CNN with an LSTM to monitor and provide early warning of wind turbine gearbox bearing faults using temperature data. Zhao et al. [76] combined Bi-LSTM with a CNN for tool wear prediction: the CNN extracted local features from the sequential input, and the Bi-LSTM encoded the temporal information. Qiao et al. [77] combined a deep CNN with LSTM to perform end-to-end bearing fault diagnosis under variable loads and different noise interferences. Liao et al. [78] developed a fault diagnosis method for hydroelectric generating units based on a 1D-CNN and a GRU, using raw vibration signals collected under different operating conditions.
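The division of labor in these hybrids (a convolutional front end for local features, a recurrent layer for temporal context) can be sketched as below. This is an assumed, untrained toy with a single kernel and a scalar RNN; the helper names and weights are hypothetical and do not reproduce any of the cited models.

```python
import math

def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation) with a single kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(xs, size):
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def rnn_last_state(xs, w_x, w_h):
    """Scalar RNN that summarizes a feature sequence in its final hidden state."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
    return h

def cnn_rnn_feature(signal, kernel, pool=2, w_x=1.0, w_h=0.5):
    local = [max(0.0, v) for v in conv1d(signal, kernel)]  # CNN stage: local features + ReLU
    seq = max_pool1d(local, pool)                          # shorter sequence for the RNN
    return rnn_last_state(seq, w_x, w_h)                   # RNN stage: temporal summary

impulse = [0.0] * 16
impulse[5] = 1.0  # a single fault-like impulse in an otherwise quiet segment
print(cnn_rnn_feature(impulse, [1.0, -1.0]))     # nonzero response
print(cnn_rnn_feature([0.0] * 16, [1.0, -1.0]))  # 0.0 for a silent segment
```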

3.3 Generative Adversarial Network (GAN)

In practical engineering scenarios, the fault data collected from a target machine are usually far scarcer than the normal data; that is, the training data are highly imbalanced. Deep learning models trained on imbalanced data are prone to poor generalization. The generative adversarial network (GAN) is a well-known generative model inspired by game theory that can address the data imbalance problem [79]. A GAN is mainly composed of a generator and a discriminator [80]. The generator typically maps random noise vectors to new samples whose distribution resembles that of the real data, thus expanding the training dataset. The generated samples and the real samples are fed to the discriminator, which predicts whether each input is real or fake [81]. GANs have been successfully employed in fault diagnosis. For instance, Liu et al. [82] developed a fault diagnosis method based on a global optimization GAN to solve the imbalanced data problem of rolling bearings. Ding et al. [83] proposed a novel GAN-based fault diagnosis method for rotating machines and validated its effectiveness on small-sample rolling bearing and gearbox datasets.
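The adversarial game described above is usually written as the minimax objective of the original GAN formulation, where D(x) is the discriminator's estimated probability that x is real, G(z) is the sample generated from a noise vector z drawn from a prior p_z, and p_data is the real-data distribution:

```latex
\min_{G}\max_{D}\;V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_{z}}\big[\log\big(1-D(G(z))\big)\big]
```

The discriminator is trained to maximize V and the generator to minimize it; in practice the generator is often trained to maximize log D(G(z)) instead, which gives stronger gradients early in training.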

Recently, researchers have made many improvements and developed a wide variety of GAN variants. For instance, Yan et al. [84] developed a fault detection and diagnosis method that utilizes a conditional Wasserstein GAN to overcome the imbalanced data problem for air handling units. Zheng et al. [85] proposed a conditional GAN model with a dual discriminator for imbalanced rolling bearing fault diagnosis. Studies [86,87,88,89] used a deep convolutional GAN for fault diagnosis of rotating machines with imbalanced data. Luo et al. [90] utilized a conditional deep convolutional GAN to address the data imbalance problem in machine fault diagnosis. Shao et al. [91] utilized an auxiliary classifier GAN to generate synthetic sensor signals, addressing imbalanced fault data, and to diagnose induction motor faults. Xiong et al. [92] utilized a Wasserstein gradient-penalty GAN with a deep auto-encoder (DAE) to diagnose faults in rolling bearings. Because the original GAN suffers from vanishing gradients and mode collapse, Li et al. [93] proposed a rotating machine fault diagnosis model for the imbalanced data problem based on a deep Wasserstein GAN with gradient penalty. Zareapoor et al. [94] developed a new model named minority oversampling GAN for class-imbalanced fault diagnosis. Zi et al. [95] proposed a novel multitask redundant lifting adversarial network (MRLAN) and confirmed its satisfactory performance under sharp speed fluctuations and with little data. Liu et al. [96] developed a variational auto-encoding GAN model with deep regret analysis for bearing fault diagnosis. The study [97] combined an auxiliary classifier GAN with a stacked denoising auto-encoder for rolling bearing fault diagnosis. Liu et al. [98] proposed a categorical adversarial auto-encoder (CatAAE) for rolling bearing fault diagnosis that achieved satisfactory performance and high clustering indicators under different working conditions.
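For reference, the Wasserstein GAN with gradient penalty (WGAN-GP) used in several of the studies above replaces the discriminator with a critic D that outputs an unbounded score. In its standard formulation the critic and generator losses are as follows, where the interpolate x̂ is sampled on straight lines between real and generated samples and λ is the penalty weight (λ = 10 in the original WGAN-GP paper):

```latex
\begin{aligned}
L_{D} &= \mathbb{E}_{z\sim p_{z}}\big[D(G(z))\big]-\mathbb{E}_{x\sim p_{\mathrm{data}}}\big[D(x)\big]
        +\lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_{2}-1\big)^{2}\Big],\\
\hat{x} &= \epsilon\,x+(1-\epsilon)\,G(z),\qquad \epsilon\sim U[0,1],\\
L_{G} &= -\,\mathbb{E}_{z\sim p_{z}}\big[D(G(z))\big].
\end{aligned}
```

The penalty term keeps the critic's gradient norm close to 1, a soft version of the 1-Lipschitz constraint that the Wasserstein distance estimate requires.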

While GAN and its extensions have achieved some success with imbalanced training datasets, some practical problems still need further exploration. For example, a GAN sometimes fails to generate plausible data because the deep features of the input data lack auxiliary information. In addition, generating sufficient fault data with a GAN consumes substantial computing resources and takes a long training time. It is therefore of practical significance to develop novel GAN-based fault diagnosis methods that overcome these problems.

3.4 Auto-Encoder (AE)

Auto-encoder (AE) is a feed-forward neural network trained with the backpropagation algorithm to learn discriminative features in an unsupervised manner by minimizing the reconstruction error between its input and output [99, 100]. A typical AE consists of an input layer, a hidden layer, and an output layer. The input and hidden layers form the encoder network, whereas the hidden and output layers form the decoder network [101]. The encoder transforms high-dimensional input data into low-dimensional hidden features, and the decoder reconstructs the input data from the learned hidden features [102].
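The encoder/decoder split and the reconstruction error can be sketched as follows. The sketch assumes a sigmoid encoder, a linear decoder, and mean squared error as the reconstruction loss (common choices, not the only ones); the weights are hand-picked and training by backpropagation is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    """Compute W @ x + b with nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(x, W_enc, b_enc):
    """High-dimensional input -> low-dimensional hidden features."""
    return [sigmoid(v) for v in affine(W_enc, x, b_enc)]

def decode(h, W_dec, b_dec):
    """Hidden features -> reconstruction of the input (linear output layer)."""
    return affine(W_dec, h, b_dec)

def reconstruction_mse(x, x_hat):
    """The quantity that training would minimize by backpropagation."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# 4-dimensional input compressed to 2 hidden units (toy, hand-picked weights).
x = [1.0, 1.0, 1.0, 1.0]
h = encode(x, W_enc=[[0.0] * 4, [0.0] * 4], b_enc=[0.0, 0.0])  # -> [0.5, 0.5]
x_hat = decode(h, W_dec=[[1.0, 1.0]] * 4, b_dec=[0.0] * 4)     # -> [1.0, 1.0, 1.0, 1.0]
loss = reconstruction_mse(x, x_hat)                             # -> 0.0
```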

Compared with supervised deep learning methods such as CNN and RNN, the auto-encoder offers unsupervised learning, efficient training, a simple structure, and easy implementation. As a result, auto-encoders have been applied to fault diagnosis of bearings [103], electric motors [104, 105], turbines [106], and other components. However, the standard AE has limited feature extraction capability for fault diagnosis tasks because it does not use label data [107]. Moreover, in most practical situations the measured signals are polluted by heavy background noise, which degrades the performance of the standard AE. To overcome these challenges, several AE variants have been introduced into rotating machine fault diagnosis. The common variants are the denoising auto-encoder (DAE), sparse auto-encoder (SAE), contractive auto-encoder (CAE), and variational auto-encoder (VAE) [108]. The AE, DAE, SAE, and CAE can be stacked to extract deep features with better representative ability; the resulting networks are named the stacked auto-encoder (SAE), stacked denoising auto-encoder (SDAE), stacked sparse auto-encoder (SSAE), and stacked contractive auto-encoder (SCAE), respectively. Note that the abbreviation SAE is used in the literature for both the stacked and the sparse auto-encoder. The following subsections review their applications in machine fault diagnosis.

Stacked Auto-Encoder (SAE).

The stacked auto-encoder (SAE) is composed of multiple stacked auto-encoders; compared with a single auto-encoder, it can extract more implicit features from high-dimensional complex data and further reduce the dimensionality of the input data [109]. In an SAE, the output of each hidden layer is used as the input to the next hidden layer [110]. Because SAE is an unsupervised learning method, it cannot directly perform fault classification; a classification layer is therefore usually added at the end of the network. In this context, Liu et al. [111] proposed an SAE-based deep learning method for gearbox fault diagnosis. Studies [112, 113] utilized SAE to develop new methods for roller bearing fault diagnosis. Karamti et al. [114] developed a fault diagnosis method based on stacked auto-encoders for diagnosing rotating system faults with imbalanced samples. An et al. [115] developed a batch-normalized stacked auto-encoder method for intelligent fault diagnosis of rotating machines and validated its effectiveness on motor bearing and gearbox datasets. Shao et al. [116] also proposed an improved SAE to diagnose faults in rotating machines and validated its effectiveness on sun gear and roller bearing datasets.
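The stacking rule above (each hidden layer's output feeds the next) plus the added classification layer can be sketched as a forward pass. This assumes sigmoid encoder layers that stand in for already pre-trained ones, a softmax output layer, and randomly initialized toy weights; greedy layer-wise training and fine-tuning are omitted, and the layer sizes are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encoder_layer(x, W, b):
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def stacked_encode(x, layers):
    """Feed the output of each (pre-trained) encoder into the next one."""
    for W, b in layers:
        x = encoder_layer(x, W, b)
    return x

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, layers, W_out, b_out):
    """Classification layer added on top of the stacked encoders."""
    h = stacked_encode(x, layers)
    return softmax([sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W_out, b_out)])

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
sizes = [8, 4, 2]  # input -> hidden 1 -> hidden 2 (hypothetical sizes)
layers = [(random_matrix(n_out, n_in, rng), [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]
sample = [rng.uniform(-1.0, 1.0) for _ in range(8)]
probs = classify(sample, layers, random_matrix(3, 2, rng), [0.0] * 3)  # 3 fault classes
```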

Denoising Auto-Encoder (DAE).

The denoising auto-encoder (DAE) is an AE trained to reconstruct the clean input from a version corrupted by noise with certain statistical characteristics, which increases its anti-noise capability [117]. Because a DAE can automatically extract robust features from corrupted and partially destroyed data, it is well suited to the fault diagnosis of different rotating machines. For instance, studies [118, 119] employed a DAE to diagnose faults in rolling bearings. Lu et al. [120] applied the stacked denoising auto-encoder (SDAE) for rolling bearing fault diagnosis. Zhao et al. [121] developed a deep learning method using an SDAE for motor fault diagnosis. Chen and Li [122] applied a deep neural network based on an SDAE to diagnose faults in a rotor system. J. Yu [123] proposed a manifold regularized SDAE (MRSDAE) for fault diagnosis from planetary gearbox vibration signals. Zhan et al. [124] utilized an SDAE combined with an SVM classifier for fault diagnosis of a permanent magnet synchronous motor used in an electric vehicle. Xu et al. [125] proposed an intelligent fault diagnosis method for metro traction motor bearings based on an improved SDAE. Xiao et al. [126] proposed a noisy domain adaptive marginal SDA for fault diagnosis of gears and motors using acoustic signals. Godói et al. [127] proposed a new denoising convolutional AE configuration for the condition monitoring of rotating machines. Zhao et al. [128] combined a one-dimensional denoising convolutional auto-encoder (DCAE) with a 1D-CNN for rotating machine fault diagnosis in noisy environments. Although DAE can extract robust features and achieves remarkable results in fault diagnosis, selecting the most suitable corruption level and generating the corrupted inputs takes additional time. Moreover, the features extracted by a DAE may include some that are useless for fault diagnosis.
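The corruption step that distinguishes a DAE can be sketched as below. Masking noise and additive Gaussian noise are two common corruption types, and the corruption level is the hyperparameter whose tuning the text notes as a cost. The key point is that the loss compares the reconstruction with the clean input. The function names are hypothetical, and `identity` is only a placeholder for a trained auto-encoder.

```python
import random

def mask_corrupt(x, corruption_level, rng):
    """Masking noise: each input value is zeroed with probability corruption_level."""
    return [0.0 if rng.random() < corruption_level else v for v in x]

def gaussian_corrupt(x, sigma, rng):
    """Additive Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def dae_loss(x_clean, reconstruct, corrupt):
    """Reconstruct from the corrupted input, but score against the clean input."""
    x_hat = reconstruct(corrupt(x_clean))
    return sum((a - b) ** 2 for a, b in zip(x_clean, x_hat)) / len(x_clean)

rng = random.Random(0)
x = [1.0] * 1000
corrupted = mask_corrupt(x, 0.3, rng)  # about 30% of the values become 0.0
identity = lambda v: v  # placeholder "auto-encoder" that just copies its input
loss = dae_loss(x, identity, lambda v: mask_corrupt(v, 0.3, rng))
```

A copy-through network scores badly under this loss, which is exactly what forces a DAE to learn structure that lets it fill in the corrupted values.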

Sparse Auto-Encoder (SAE).

As an extension of AE, the sparse auto-encoder (SAE) is a widely used auto-encoder that introduces a sparsity penalty term, constraining the hidden layer activations to obtain a concise representation of the input data [129]. Compared with other deep learning methods, SAE tends to extract sparser features that are highly discriminative and useful for classification. As a result, many researchers have used SAE for fault diagnosis of rotating machines. For instance, Xin et al. [130] combined a sparse auto-encoder with softmax regression to diagnose attachment faults on the blades of a marine current turbine. Zhao et al. [131] proposed a semi-supervised deep SAE with local and non-local information for fault diagnosis of rotating machines. Kim et al. [132] utilized a sparse stacked auto-encoder to develop a new gearbox fault diagnosis method. Qi et al. [133] developed an intelligent fault diagnosis method based on a stacked sparse auto-encoder (SSAE) and validated its effectiveness on rolling bearing and gearbox vibration datasets. Sun et al. [101] developed a novel SSAE-based intelligent diagnosis method for automatic feature learning and classification of rotating machines. Studies [134,135,136,137] proposed new bearing fault diagnosis methods based on sparse stacked denoising AE. Zhang et al. [138] also developed a stacked pruning sparse denoising AE method for rolling bearing fault diagnosis. Wen et al. [139] proposed a new fault diagnosis method based on a stacked pruning sparse denoising auto-encoder and CNN to detect and categorize actuator damage faults of an unmanned aerial vehicle, and it showed good fault diagnosis accuracy in an actual high-noise environment. Jia et al. [140] developed a local connection network (LCN) constructed by a normalized sparse auto-encoder (NSAE) for fault diagnosis of rotating machines and verified the superiority of the proposed NSAE-LCN on gearbox and bearing datasets.
However, the accuracy and generalization ability of the sparse stacked auto-encoder are affected by its hyperparameter settings, and there is no clear rule for determining the optimal hyperparameter values, which therefore depend heavily on experimental experience. Moreover, the standard learning method employed in the sparse stacked auto-encoder is time-consuming.
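The sparsity penalty is commonly implemented as a Kullback-Leibler divergence between a small target activation rho and each hidden unit's average activation rho_hat_j over a batch, added to the reconstruction error with a weight beta. The sketch below assumes that formulation; the values rho = 0.05 and beta = 3.0 are hypothetical defaults and are exactly the kind of hyperparameters whose choice the text describes as experience-dependent.

```python
import math

def mean_activations(hidden_batch):
    """rho_hat_j: average activation of hidden unit j over a batch (values in (0, 1))."""
    n = len(hidden_batch)
    return [sum(h[j] for h in hidden_batch) / n for j in range(len(hidden_batch[0]))]

def kl_sparsity_penalty(rho_hats, rho=0.05):
    """sum_j KL(rho || rho_hat_j) between Bernoulli distributions."""
    return sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
               for r in rho_hats)

def sparse_ae_loss(reconstruction_error, hidden_batch, rho=0.05, beta=3.0):
    """Reconstruction error plus the weighted sparsity penalty (hypothetical beta)."""
    return reconstruction_error + beta * kl_sparsity_penalty(mean_activations(hidden_batch), rho)

print(kl_sparsity_penalty([0.05, 0.05]))  # 0.0: activations already at the target
print(kl_sparsity_penalty([0.5, 0.5]))    # large: dense activations are penalized
```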

Contractive Auto-Encoder (CAE).

The contractive auto-encoder (CAE) is a well-known AE variant that automatically learns more robust features and is thus suitable for dealing with noise-overwhelmed signals. Its robustness is obtained by adding a contractive penalty to the reconstruction error function; this penalty penalizes the sensitivity of the learned features to variations in the input. CAE can handle noisy data without knowing the noise intensity and has been applied successfully to robust feature extraction and fault classification. Qi et al. [141] proposed a new deep fusion network that combined the SSAE and CAE for fault diagnosis of bearings and gearboxes. Fu et al. [142] also proposed a deep contractive auto-encoding network (DCAEN) for bearing fault diagnosis. Shen et al. [143] applied a stacked contractive auto-encoder (SCAE) for feature extraction and fault diagnosis of rotating machines. Gao et al. [144] proposed a new ensemble deep CAE method for machine fault diagnosis in noisy environments and verified its effectiveness on bearing, gearbox, and self-priming centrifugal pump datasets. However, CAE still has relatively high reconstruction errors during the encoding and decoding of input features, which makes it difficult to capture useful information in the feature space.
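For the common case of a sigmoid encoder h = sigmoid(Wx + b), the contractive penalty, i.e. the squared Frobenius norm of the Jacobian of h with respect to x, has a closed form because dh_j/dx_i = h_j(1 - h_j)W_ji. The sketch below computes it under that assumption; in a CAE it would be added to the reconstruction error with a weighting coefficient. The weights are arbitrary toy values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """Sigmoid encoder h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def contractive_penalty(x, W, b):
    """||J_h(x)||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W[j][i]^2 for a sigmoid encoder."""
    h = encode(x, W, b)
    return sum((hj * (1.0 - hj)) ** 2 * sum(w * w for w in row) for hj, row in zip(h, W))

W = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]]  # 3 inputs -> 2 hidden units
b = [0.1, -0.2]
x = [0.4, -0.3, 0.9]
penalty = contractive_penalty(x, W, b)  # small when h is insensitive to x
```

The closed form avoids computing the Jacobian numerically; a central finite-difference estimate of the same norm agrees with it to high precision.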

Variational Auto-Encoder (VAE).

As a generative model, the variational auto-encoder (VAE) can augment a dataset by generating meaningful synthetic data similar to the original real data, and it has been successfully employed in fault diagnosis of different rotating machines [145]. In [146,147,148], a VAE was employed to generate bearing fault data. Sun et al. [149] developed a novel fault diagnosis method called conditional variational auto-encoder generative adversarial network for planetary gearboxes to solve small sample problems. However, the data generated by a VAE are not always realistic. How to make VAE-generated samples more realistic therefore remains a challenge that requires further exploration.
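Two ingredients that make a VAE a generative model can be sketched directly: the reparameterization trick, which keeps sampling differentiable with respect to the encoder outputs, and the closed-form KL term that pulls the latent distribution toward a standard normal prior. New synthetic samples are then produced by decoding draws from that prior. The sketch assumes a diagonal Gaussian latent space; `decoder` is a hypothetical stand-in for a trained decoder network.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1); sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = -0.5 * sum(1 + log_var - mu^2 - sigma^2)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def sample_synthetic(decoder, latent_dim, n, rng):
    """Generation: draw z from the prior N(0, I) and decode it (decoder is hypothetical)."""
    return [decoder([rng.gauss(0.0, 1.0) for _ in range(latent_dim)]) for _ in range(n)]

rng = random.Random(0)
z = reparameterize([0.5, -0.5], [0.0, 0.0], rng)
samples = sample_synthetic(lambda v: [2.0 * t for t in v], latent_dim=3, n=5, rng=rng)
```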

3.5 Deep Belief Network (DBN)

Deep belief network (DBN) is a probabilistic generative model composed of several layers of restricted Boltzmann machines (RBMs), where the output of each hidden layer is used as the input of the next layer [150], and the last layer is a backpropagation neural network. The training process of DBN comprises two stages: forward unsupervised greedy layer-by-layer pre-training and backward supervised fine-tuning. The forward pre-training stage is an unsupervised process that extracts features layer by layer from bottom to top. After the multiple RBMs are pre-trained, the fine-tuning stage uses the backpropagation algorithm to optimize the parameters of the pre-trained network and further enhance classification accuracy. During fine-tuning, the weights and biases of every layer are updated iteratively until the iteration limit is reached [151].
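The unsupervised pre-training of each RBM layer is commonly done with contrastive divergence. The sketch below shows one CD-1 update for a binary RBM, assuming the usual variant that uses reconstruction probabilities rather than sampled visible states; the learning rate, layer sizes, and training pattern are toy assumptions. In a DBN, the hidden probabilities of a trained RBM would become the input data of the next RBM.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, c):
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]

def visible_probs(h, W, b):
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a binary RBM; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden states
    pv1 = visible_probs(h0, W, b)                          # reconstruction (probabilities)
    ph1 = hidden_probs(pv1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])  # data minus model statistics
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

def reconstruction_error(v, W, b, c):
    pv = visible_probs(hidden_probs(v, W, c), W, b)
    return sum((a - p) ** 2 for a, p in zip(v, pv))

rng = random.Random(0)
v0 = [1.0, 0.0, 1.0, 0.0]          # one toy binary pattern
W = [[0.0] * 3 for _ in range(4)]  # 4 visible x 3 hidden units
b, c = [0.0] * 4, [0.0] * 3
before = reconstruction_error(v0, W, b, c)  # 1.0 with all-zero parameters
for _ in range(200):
    cd1_update(v0, W, b, c, lr=0.1, rng=rng)
after = reconstruction_error(v0, W, b, c)
```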

Because DBN is suitable for processing one-dimensional data, its applications in fault diagnosis are reported frequently. For instance, Shang et al. [152] proposed a DBN-based diagnosis method for rolling bearings that simplifies the complicated network structure to some extent. Qin et al. [153] proposed a new fault diagnosis method using a DBN for planetary gearboxes of wind turbines. Yan et al. [154] also proposed a rotor unbalance fault diagnosis method using a multi-deep belief network model with multi-sensor information. Han et al. [155] combined the DBN model with wavelet packet energy entropy and multi-scale permutation entropy to diagnose gear faults. The authors of [156] proposed a new condition monitoring method for rolling bearings using a DBN model optimized by the multi-order fractional Fourier transform filtering algorithm and the sparrow search algorithm. Zhang et al. [157] applied a DBN algorithm to diagnose power system faults and improved feature extraction and fault classification by enhancing the network model. Yu et al. [158] proposed a novel fault diagnosis method that hybridizes DBN with Dempster-Shafer theory to diagnose wind turbine systems.

The performance of DBNs in machine fault diagnosis depends heavily on their structure. To obtain an optimal network structure with high performance and training speed, researchers have used various optimization techniques. In [159, 160], the network structure and learning rate of the DBN were optimized using the particle swarm optimization (PSO) algorithm, which improved diagnosis accuracy. Wen et al. [161] combined the DBN with a fuzzy mean clustering algorithm for rolling bearing fault diagnosis without data labels. Gao et al. [162] optimized the network architecture of a DBN using a salp swarm algorithm and applied it to rolling bearing fault diagnosis. Similarly, Kamada et al. [163] used neuron generation/annihilation and layer generation algorithms to propose an adaptive structure learning method for RBMs and DBNs, and achieved remarkable success. Shen et al. [164] developed an improved hierarchical adaptive DBN optimized by Nesterov momentum (NM) for bearing fault diagnosis.
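PSO, used in [159, 160] to tune the DBN structure and learning rate, can be sketched in a few lines. The sketch below is the basic global-best PSO with an inertia weight; the coefficient values are typical defaults, not those of the cited studies. In a real use, the objective would be an expensive function such as the validation error of a DBN trained with a candidate learning rate and layer size (integer parameters rounded before evaluation); here a cheap toy quadratic stands in for it, which is an assumption of this example.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is pulled toward its own best and the swarm's best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])  # cognitive term
                             + c2 * r2 * (gbest[d] - pos[k][d]))    # social term
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)  # clamp to bounds
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val

# Toy stand-in objective with its minimum at (1, -2).
best, best_val = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                              bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```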

4 Discussion, Existing Challenges and Future Directions

As this review shows, traditional machine learning methods and deep learning methods are both widely applied in intelligent machine fault diagnosis. Intelligent fault diagnosis methods based on traditional machine learning have been widely investigated for rotating machines, but they have limitations in processing massive amounts of data because useful features must be extracted manually using prior expert experience. In contrast, deep learning methods can extract abstract features from massive and heterogeneous mechanical signals through their multilayer nonlinear mapping ability and perform end-to-end fault diagnosis. Table 1 summarizes the strengths and weaknesses of the deep learning methods applied to fault diagnosis of industrial machines.

Table 1. Strengths and weaknesses of deep learning methods.

Although deep learning methods have achieved tremendous success in fault diagnosis, there are still some practical problems that need further exploration.

1. Most existing deep learning methods require a large amount of labeled data for model training. They achieve excellent results in laboratory experiments, where sufficient labeled data are available. In practical industrial scenarios, however, acquiring massive labeled fault data is difficult or even impossible because most machines operate in healthy conditions.

2. Existing deep learning methods recognize faults accurately under the assumption that the training and testing datasets are drawn from the same machine under the same working conditions. This assumption often does not hold in real cases because of variations in working conditions, interference from environmental noise, and similar factors, which leads to significant deterioration in diagnosis performance.

3. In practical industrial applications, the sensor signals collected from rotating machines are usually contaminated by various forms of noise, which degrades the performance of existing fault diagnosis methods.

4. For long-term monitoring, early fault detection in rotating machines is essential. In practice, however, it is quite difficult to detect faults at the earliest stage because the fault-induced impulses are weak and masked by environmental noise.

5. Furthermore, rotating machines operate for long periods under changeable conditions, so compound faults may occur, and multiple components may fail simultaneously. Most existing studies have ignored such simultaneous faults.
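
Challenges 3 and 4 can be illustrated numerically. The sketch below assumes a hypothetical bearing-like signal (periodic impacts at 100 Hz, each exciting a decaying 3 kHz resonance, sampled at 12 kHz), adds white Gaussian noise at a chosen signal-to-noise ratio (SNR), and tracks the kurtosis, a classical indicator of impulsiveness. As the SNR drops, the kurtosis falls toward 3, the value for Gaussian noise, showing how weak fault impulses become masked. All signal parameters and function names are illustrative assumptions.

```python
import numpy as np


def impulsive_signal(fs=12_000, duration=1.0, fault_freq=100.0,
                     resonance=3_000.0, decay=800.0):
    """Hypothetical fault signal: each impact excites an exponentially decaying resonance."""
    n = int(fs * duration)
    period = int(round(fs / fault_freq))       # samples between impacts
    t_since = (np.arange(n) % period) / fs     # time since the latest impact (s)
    return np.exp(-decay * t_since) * np.sin(2 * np.pi * resonance * t_since)


def add_awgn(signal, snr_db, seed=None):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    p_noise = np.mean(signal ** 2) / 10 ** (snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise


def kurtosis(x):
    """Fourth standardized moment: about 3 for Gaussian noise, larger for impulsive signals."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)


clean = impulsive_signal()
for snr_db in (10, 0, -5):
    noisy, _ = add_awgn(clean, snr_db, seed=0)
    print(f"SNR {snr_db:>3} dB: kurtosis {kurtosis(noisy):.2f}")
print(f"clean:       kurtosis {kurtosis(clean):.2f}")
```

The same noise-injection routine is a convenient way to test how quickly a diagnosis model degrades as the SNR decreases.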

Therefore, resolving these practical problems is of great significance for advancing intelligent diagnosis methods toward practical deployment in modern industrial applications. The following research directions are suggested for researchers, readers, and engineers who aim to contribute to the advancement of artificial intelligence in the fault diagnosis of rotating machines.

1. The emergence of transfer learning provides a feasible way to address several of the gaps above, in particular the scarcity of labeled data and the variation of working conditions. Unlike conventional deep learning, which assumes that training and testing data follow the same distribution, transfer learning aims to extract knowledge from a source domain and transfer it to solve a different but related task in a target domain. Transfer learning is therefore becoming an active research area in intelligent machine fault diagnosis.

2. As the review shows, some deep learning methods have strong feature extraction capabilities, whereas others are limited in fault classification. To overcome the limitations of any single method, there remains great potential for researchers to propose hybrid deep learning-based fault diagnosis methods for rotating machines.
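
Many transfer learning-based diagnosis methods reduce the distribution discrepancy between source-domain and target-domain features. The maximum mean discrepancy (MMD) with a Gaussian kernel is one widely used measure of that discrepancy; it is chosen here only as an example and is not prescribed by this review. The minimal sketch below computes the biased MMD² estimate; the kernel bandwidth `sigma` and the synthetic 8-dimensional "features" are assumptions for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    d2 = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))  # clip tiny negatives from rounding


def mmd2(xs, xt, sigma=1.0):
    """Biased estimate of squared MMD between source features xs and target features xt."""
    k_ss = gaussian_kernel(xs, xs, sigma).mean()
    k_tt = gaussian_kernel(xt, xt, sigma).mean()
    k_st = gaussian_kernel(xs, xt, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


# Synthetic features: one target domain drawn from the same distribution as the source,
# and one whose mean is shifted (standing in for a different working condition).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 8))
target_same = rng.normal(0.0, 1.0, size=(200, 8))
target_shifted = rng.normal(1.0, 1.0, size=(200, 8))
print(mmd2(source, target_same, sigma=3.0))     # close to zero
print(mmd2(source, target_shifted, sigma=3.0))  # clearly larger
```

In a domain adaptation model, a term like `mmd2` would be added to the classification loss on labeled source data so that the feature extractor is pushed toward domain-invariant representations; training through it requires a differentiable framework rather than plain NumPy.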

5 Conclusions

This paper reviewed the applications of artificial intelligence methods to the fault diagnosis of rotating machines and discussed the observations, research gaps, and new research prospects in this area. The review indicates that deep learning methods have better feature learning ability, better adaptability, and a more flexible network structure than conventional machine learning methods. However, their applicability in fault diagnosis is strongly restricted by the amount and quality of training data, variations in operating conditions, background noise, the weakness of early fault signatures, and the occurrence of hidden simultaneous faults. To address these limitations, transfer learning is becoming a hot research topic in machine fault diagnosis. In addition, new intelligent diagnosis methods that combine the advantages of different methods will be needed. In future work, the authors will review the applications of transfer learning to the fault diagnosis of rotating machines.