1 Introduction

The majority of industrial machines used in modern industries are either rotating machines or have rotating elements such as gears, bearings and shafts [1,2,3,4,5,6,7]. As these machines must work in harsh environmental conditions, the chance of faults developing is very high. The rotating elements are subject to a variety of faults such as chipped gear teeth, surface pitting, gear root cracks, bearing inner-race defects, outer-race defects, ball defects, cage defects, bent shafts, misalignment and unbalance [8]. Failure of any sub-component of these machines may lead to a complete shutdown of the whole system or plant. Hence, it is imperative to diagnose faults at their incipient stage to avoid catastrophic accidents, and also to track the evolution of fault severity over time [9,10,11,12,13].

The gearbox plays an essential role in modern machinery and is considered to be the heart of the plant. As a gearbox performs its tasks under varying load and speed conditions, the probability of incipient failure is very high. Gearbox failures have many causes, including insufficient lubrication, sudden and excessive application of load, misalignment and poor fitting. An appropriate maintenance strategy can be implemented to minimize unplanned downtime [14, 15]. Condition monitoring and fault diagnosis, a predictive maintenance strategy, plays a significant role in predicting and preventing catastrophic accidents. In routine industrial practice, \(\hbox {International Organization for Standardization (ISO)}\) standards related to vibration, lubrication and acoustics are followed to distinguish between a healthy and a faulty state of a machine. However, \(\hbox {ISO}\) standards fail to address the component-wise health status of a machine. To overcome this problem, industrial practitioners define statistical limits for a particular machine or its sub-components based on their past experience. However, these limits cannot be generalized, and hence their usage is limited.

With advances in computational power and the availability of new sensors, researchers have turned to advanced signal processing techniques for the health monitoring of industrial assets. With the help of \(\hbox {Artificial Intelligence (AI)}\), faults can be diagnosed automatically and the health condition of machines can be monitored easily [16,17,18]. The most commonly followed framework for the detection of incipient faults in rotating machines is shown in Fig. 1. The process starts with data acquisition, followed by signal pre-processing/denoising. Statistical feature processing is then carried out, and finally an \(\hbox {AI}\) technique is used to predict the health status of the machine or component under consideration.

Fig. 1 Framework for detection of incipient faults in rotating machines

Machine predictive maintenance has attracted increasing attention from academic researchers and the industrial sector in recent years. Figure 2 shows the rising number of publications on incipient fault diagnosis in rotating machinery over the past 30 years, from 1991 to 2021. The number of publications in the last decade, from 2011 to 2021, has increased by more than 75%, which shows that more and more research is being done to diagnose faults at the incipient stage in order to prevent catastrophic failure and financial losses.

Fig. 2 Bar chart representing the number of publications in the past 30 years used in this review paper for incipient fault diagnosis in rotating elements

There are some excellent review papers among these publications. Some focused on the prognostics of rotating machinery and gave a brief review of remaining useful life prediction approaches [19,20,21]. Others focused on the application of remaining useful life prediction in industrial settings [22, 23]. These papers give interesting reviews of machinery prognostics. However, they have some limitations. Most of the review articles [19,20,21,22,23] were published six years ago. Also, these articles focused only on the last block of machine health prediction, i.e., remaining useful life prediction using various \(\hbox {AI}\) methods. The other three blocks of predictive health monitoring, i.e., data acquisition, signal pre-processing and feature processing, were ignored. In conclusion, the literature lacks a systematic review covering the whole of machine predictive health monitoring and its advancements in recent years. This paper fills these gaps and gives a systematic overview covering all four blocks of the machine predictive health monitoring approach in order. Compared with the existing review papers, the major contributions of this paper are as follows.

  • This paper segments the whole process of machine predictive health monitoring into four blocks, i.e., data acquisition, signal pre-processing, feature processing and \(\hbox {AI}\), and reviews them systematically in order.

  • This paper provides a guide map for researchers and industrial personnel in the field of fault diagnosis and prognosis to help them select an appropriate method to identify incipient fault severity according to their research requirements.

  • This paper covers the signal pre-processing and feature processing segments, which are significant for machine fault diagnosis but have been ignored by the existing literature reviews.

The remainder of this paper is organized as follows: Sect. 2 gives a brief overview of data acquisition. Section 3 reviews the pre-processing of vibration signals using various signal processing methods. Section 4 covers feature extraction, selection and dimensionality reduction. Section 5 describes the \(\hbox {AI}\)-based techniques. The conclusions are drawn in Sect. 6.

2 Data Acquisition

Data acquisition is the process in which data is captured by sensors mounted on the machine and stored. As the first step of machine prognostics, it provides the basic condition monitoring information. The most commonly used sensors include accelerometers, acoustic emission sensors, infrared thermometers and current sensors. Among these, vibration sensors are mainly used in fault diagnosis of bearings [17, 24,25,26] and gears [27,28,29,30]. Faults at the incipient stage in bearings [31,32,33,34,35,36,37,38,39,40] and gears [41,42,43,44,45,46,47,48,49,50,51,52] can be detected by acoustic emission sensors when machines are working at low speed and the environmental noise is of low frequency. Khamisan et al. [53] used \(\hbox {Infrared Thermography (IRT)}\) for detection of incipient faults in bearings. Duan et al. [54] used \(\hbox {IRT}\) to detect faults in rotating machinery such as shaft misalignment, rotor radial rubbing, base looseness, coupling unbalance and misalignment faults. Faults in bearings [28, 55,56,57,58,59,60,61,62] such as outer race damage, inner race damage, roller element damage, multi-fault damage and wear damage can be detected by using \(\hbox {IRT}\). Artigao et al. [63] used current signature and vibration signals to diagnose faults in wind turbines. Naha et al. [64] proposed a low-complexity fault detection algorithm based on sub-Nyquist sampling of the analytic current signal. The algorithm has been tested successfully for the detection of a variety of localized bearing faults under different loads and supply frequency conditions. Antonino et al. [65] used the current signal for the detection of different types of mechanical faults, such as misalignment caused by loosened bolts and soft foot, as well as coupling unbalance. Park et al. [66] used \(\hbox {Motor Current Signature Analysis (MCSA)}\) for detection of faults in induction motors. Artigao et al. [63] used current signature to detect faults in doubly-fed induction generators. Aouabdi et al. and others [64, 67] used \(\hbox {MCSA}\) with \(\hbox {Principal Component Analysis (PCA)}\) to detect localized gear tooth defects, such as pitting, in gearboxes. Figure 3 summarizes the various sensor technologies and their applications to general mechanical components in a typical plant. Figure 4 summarizes each technology's capability to detect the symptoms related to a fault.

Fig. 3 Sensor technology and its proven applications

Fig. 4 Summary of general symptoms of faults and sensor technologies which can detect those symptoms

2.1 Epilog

Data is mainly captured in the form of vibration, acoustic, temperature and current signals. Among these, vibration analysis plays an important role in fault diagnosis of rotating elements due to its sensitivity to fault progression. Vibration transducers are generally mounted at the machine bearing locations to detect changes in vibration amplitude. However, some applications, such as hot and corrosive environments, do not allow a physical transducer to be placed on the surface of the machine. Acoustic transducers can be used for such applications. Contactless thermal sensing is another way of capturing data from the machine to correlate with its health status. For motor condition monitoring, current probes are used to capture and analyze the motor current for possible faults. Many researchers have used multiple sensor types to detect the presence of a fault, as different sensing technologies complement each other in confirming the fault. Hence, a variety of transducers is available to the user, depending on the conditions, to capture the raw data. A summary of each sensor technology and its application to fault detection in various machinery has been provided for the reader's ease.

3 Pre-processing (Signal De-noising)

Various mechanical components, particularly gearboxes, are designed to operate at variable speeds and sometimes at very low speed, which makes them difficult to analyze using raw transducer signals alone. These low-energy, low-frequency signals are masked by strong environmental noise, and it is then difficult to discover any fault signal hidden in them. In order to detect the presence of incipient faults in these mechanical components, various signal denoising techniques have been developed, tested and used in the past, and their efficacy has been documented in the literature. Signal pre-processing methods are classified as time-domain, frequency-domain and time-frequency domain methods, based on their specific utility for fault diagnosis, and are discussed in detail in the subsequent sub-sections.

3.1 Time Domain Methods

Time domain signal processing is generally used for fault diagnosis of components in which the fault produces periodic shocks or peaks in the time domain signal. However, its utility comes into question in variable-speed applications. McFadden [68] studied a technique for calculating time domain averages for a planetary gearbox. Wu et al. [69, 70] studied statistical indicators such as root mean square and standard deviation to distinguish faulty from healthy conditions in a helicopter planetary gearbox. Bartelmus and Zimroz [71] studied the impact of continuously varying load conditions on the vibration signals of a planetary gearbox. Yip [72] investigated the \(\hbox {Time Synchronous Averaging (TSA)}\) method for pre-processing vibration data; health indicators were then extracted from the pre-processed data to detect faults in planetary gearboxes used in oil sand operations. Keller and Grabill [73] used modified versions of parameters such as FM0 and FM4 for detection of faults in planetary gearboxes.

Fig. 5 Tachometer arrangement to collect TSA signal

Fig. 6 TSA signal extraction methodology

\(\hbox {TSA}\) is commonly used when attempting to diagnose gear faults. \(\hbox {TSA}\) averages away all of the vibration sources that are not synchronous with the tachometer pulse, which is taken from the shaft of the gear of interest, as shown in Figs. 5 and 6. This means that other sources of vibration, such as bearings, the motor and resonances, are largely removed, leaving a clean time waveform. \(\hbox {TSA}\) is a time-consuming test, since a large number of averages is required, and it is also time-consuming to set up in the first place. However, the results are worth the effort.
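The averaging step of \(\hbox {TSA}\) can be sketched in a few lines. The following Python example is a minimal illustration and not the procedure of any cited work. It assumes that the vibration signal has already been resampled so that every shaft revolution contains the same number of samples (in practice this resampling is driven by the tachometer pulses). The function name, the synthetic gear-mesh tone and the noise level are hypothetical.

```python
import math
import random


def time_synchronous_average(signal, samples_per_rev):
    """Average consecutive one-revolution segments of a vibration signal.

    Assumes every shaft revolution (one tachometer pulse to the next)
    spans exactly `samples_per_rev` samples.
    """
    n_revs = len(signal) // samples_per_rev
    if n_revs == 0:
        raise ValueError("signal is shorter than one revolution")
    average = [0.0] * samples_per_rev
    for rev in range(n_revs):
        start = rev * samples_per_rev
        for i in range(samples_per_rev):
            average[i] += signal[start + i]
    return [value / n_revs for value in average]


def rms_error(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic data: a shaft-synchronous tone (8 cycles per revolution)
# buried in Gaussian noise that is not synchronous with the shaft.
random.seed(0)
SAMPLES_PER_REV = 64
N_REVS = 400
clean_rev = [math.sin(2 * math.pi * 8 * i / SAMPLES_PER_REV)
             for i in range(SAMPLES_PER_REV)]
noisy = [clean_rev[i % SAMPLES_PER_REV] + random.gauss(0.0, 1.0)
         for i in range(SAMPLES_PER_REV * N_REVS)]

tsa = time_synchronous_average(noisy, SAMPLES_PER_REV)
raw_error = rms_error(noisy[:SAMPLES_PER_REV], clean_rev)  # one raw revolution
tsa_error = rms_error(tsa, clean_rev)                      # averaged revolution
print(f"raw error {raw_error:.3f}, TSA error {tsa_error:.3f}")
```

For noise that is uncorrelated with the shaft, the residual error of the averaged waveform falls roughly with the square root of the number of revolutions, which is why a large number of averages is needed.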

3.2 Frequency Domain Methods

With time domain signal processing methods it is difficult to identify the exact location of a fault among the sub-components of a machine, as the information is limited to the time domain. However, different components of a machine exhibit different characteristic frequencies when they run. Hence, frequency domain methods come into the picture to resolve the limitations of time domain methods, and many researchers have used them for pinpoint fault diagnosis and condition monitoring of mechanical components. To detect faults at the incipient stage, Mark et al. [74] investigated a frequency domain method which removes the amplitude changes caused by the transducers and the structural path. Sparis and Vachtsevanos [75] investigated index vectors using the \(\hbox {Fast Fourier Transform (FFT)}\) to differentiate between faulty and healthy planetary gearboxes. Hines et al. [76] investigated a frequency domain feature known as the energy ratio for diagnosis of planetary gearboxes; this feature is based on the \(\hbox {TSA}\) technique for pre-processing of data.
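To illustrate how a spectrum separates components that overlap in time, the sketch below computes a one-sided amplitude spectrum of a synthetic signal. It is an illustrative assumption rather than a method from the cited works: the two tones (100 Hz and 240 Hz), the sampling rate and the function name are hypothetical, and a direct discrete Fourier transform is used for clarity where an \(\hbox {FFT}\) routine would normally be used.

```python
import cmath
import math


def amplitude_spectrum(signal, sample_rate):
    """Return (frequencies, amplitudes) of the one-sided DFT amplitude spectrum.

    Direct O(N^2) DFT for clarity; an FFT gives the same values much faster.
    """
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # DC and Nyquist bins have no mirrored negative-frequency partner.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        freqs.append(k * sample_rate / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps


# Synthetic data: a strong 100 Hz component plus a weaker 240 Hz component,
# standing in for two machine elements with different characteristic frequencies.
SAMPLE_RATE = 1024  # Hz
N = 256             # gives a frequency resolution of 4 Hz
signal = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 240 * i / SAMPLE_RATE)
          for i in range(N)]

freqs, amps = amplitude_spectrum(signal, SAMPLE_RATE)
top_two = sorted(range(len(amps)), key=lambda k: amps[k], reverse=True)[:2]
peak_freqs = sorted(freqs[k] for k in top_two)
print(peak_freqs)
```

Both tone frequencies were chosen to fall exactly on spectral bins; for other frequencies, leakage spreads the energy over neighbouring bins and a window function is usually applied first.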

3.3 Time-Frequency Domain Methods

A limitation of frequency domain signal processing methods is that, in converting a time domain signal to the frequency domain, they average the amplitude of each frequency component over the whole record and discard its variation with time. Hence, a system whose dynamics change with time is hard to analyze in the frequency domain. To overcome this issue, time-frequency domain signal processing methods come into play, as they can represent a signal in both the time and frequency domains. There are many time-frequency domain methods, such as the Wigner-Ville distribution and wavelets, for diagnosis of planetary gearboxes. Zimroz et al. [77, 78] analysed instantaneous speed and vibration for a gearbox working under non-stationary conditions. Meltzer and Ivanov [79] used a time-frequency method for analysis of faults in planetary gearboxes in automobiles. Liu et al. [80] used \(\hbox {Local Mean Decomposition (LMD)}\) for diagnosis of crack faults in wind turbines. Saxena et al. [81] used Morlet wavelets to extract features that distinguish between faulty and healthy planetary gearboxes. Samuel and Pines [82] used a multiple-sensor methodology to separate the vibration signal, which was then analysed with the \(\hbox {Continuous Wavelet Transform (CWT)}\) to diagnose gear faults.
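The difference between a whole-record spectrum and a time-frequency representation can be seen with a short-time Fourier analysis, which is one of the simplest time-frequency methods. The sketch below is an illustrative assumption, not a method from the cited works: the signal, which switches from 64 Hz to 128 Hz halfway through, the Hann window, the frame length and the function names are all hypothetical.

```python
import cmath
import math


def dominant_frequency(frame, sample_rate):
    """Frequency (Hz) of the largest DFT bin of a Hann-windowed frame, ignoring DC."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def short_time_dominant_frequencies(signal, sample_rate, frame_len, hop):
    """Slide a window over the signal and track the dominant frequency per frame."""
    return [dominant_frequency(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Synthetic non-stationary signal: 64 Hz for the first second, 128 Hz afterwards.
SAMPLE_RATE = 512  # Hz
signal = [math.sin(2 * math.pi * (64 if i < 512 else 128) * i / SAMPLE_RATE)
          for i in range(1024)]

track = short_time_dominant_frequencies(signal, SAMPLE_RATE, frame_len=128, hop=128)
print(track)
```

A single spectrum of the full record would show both frequencies but not when each one occurs, whereas the per-frame track recovers the time at which the change happens. The frame length sets the usual trade-off: longer frames give finer frequency resolution but coarser time resolution.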

Jiang et al. [83] presented a denoising method based on adaptive Morlet wavelets and singular value decomposition to extract impulse features in a planetary gearbox. Zhang et al. [84,85,86,87] used a blind deconvolution denoising method to diagnose cracks in a planetary gearbox. Bonnardot et al. [88] denoised the signal with an unsupervised order tracking algorithm to detect bearing faults in a helicopter planetary gearbox. Lei et al. [89] used a denoising method named adaptive stochastic resonance to detect chipped-tooth and missing-tooth faults in gears.

Many adaptive mode decomposition methods have been developed, such as \(\hbox {Empirical Mode Decomposition (EMD)}\), \(\hbox {Ensemble Empirical Mode Decomposition (EEMD)}\), \(\hbox {LMD}\), \(\hbox {Empirical Wavelet Transform (EWT)}\) [90] and \(\hbox {Variational Mode Decomposition (VMD)}\) [91]. In \(\hbox {EMD}\), the vibration signal is decomposed into a series of \(\hbox {Intrinsic Mode Functions (IMFs)}\) [92]. These \(\hbox {IMFs}\) can be analysed to determine the health status of rotating machines. Dybala et al. [93] diagnosed faults at the incipient stage using \(\hbox {EMD}\): the signal was decomposed into \(\hbox {IMFs}\), and spectral analysis of the \(\hbox {IMFs}\) was then used to identify a faulty bearing. Li et al. [12] applied adaptive multiscale morphological analysis after using \(\hbox {EMD}\) to decompose the signal into \(\hbox {IMFs}\) to detect early faults in bearings. Zhao et al. [94] used \(\hbox {EMD}\) and approximate entropy for detection of incipient faults in bearings. Lv et al. [95] modified \(\hbox {EMD}\) to analyse the vibration signal for detection of faults at an early stage. Parey et al. [96] used \(\hbox {EMD}\) and a variable cosine window for gearbox fault diagnosis. The \(\hbox {EMD}\) method has several disadvantages, such as boundary effects [97], mode mixing [98] and over- and undershoot problems [99]. Hu et al. [100] pre-processed the vibration signal by \(\hbox {EMD}\) and then calculated the energy ratio to diagnose faults in an electric fan.

The drawbacks of \(\hbox {EMD}\) can be overcome by \(\hbox {EEMD}\), a noise-assisted method that adds white noise to the signal before decomposition [101]. Chen et al. [102] proposed \(\hbox {EEMD}\) for turbine gearbox fault diagnosis. Wang et al. [103] selected \(\hbox {IMFs}\) by the tunable Q-factor \(\hbox {Wavelet Transform (WT)}\) to diagnose incipient faults in bearings. Vokelj et al. [104] proposed Independent Component Analysis to select \(\hbox {IMFs}\) to diagnose early faults in bearings. Guo et al. [105] used an enhanced \(\hbox {EEMD}\) to diagnose early faults in bearings. Shifat et al. [106] used a denoising technique based on \(\hbox {EEMD}\), which decomposed vibration signals into \(\hbox {IMFs}\); the selected \(\hbox {IMFs}\) were then analyzed in the time-frequency domain by \(\hbox {CWT}\). Liang et al. [107] used \(\hbox {CWT}\) to diagnose faults in a gearbox.

In 2005, Smith proposed another widely used adaptive decomposition method named \(\hbox {LMD}\) [108]. \(\hbox {LMD}\) is a self-adaptive decomposition method which decomposes a complicated vibration signal into a series of \(\hbox {Product Functions (PFs)}\). The main advantage of \(\hbox {LMD}\) over \(\hbox {EMD}\) and \(\hbox {EEMD}\) is that the \(\hbox {Instantaneous Frequency (IF)}\) of each \(\hbox {PF}\) can be calculated directly, without using the Hilbert transform. Liu et al. [80] applied \(\hbox {LMD}\) to decompose the vibration signal and used the \(\hbox {IF}\) obtained from the frequency-modulated signals to diagnose gearbox faults in wind turbines. Li et al. [109] used a differential rational spline-based \(\hbox {LMD}\) method to decompose the vibration signal and diagnose incipient faults in gears and bearings. Li et al. [110] used \(\hbox {LMD}\) for pre-processing of vibration signals to detect gear faults at the incipient stage.

In 2013, Gilles [111] proposed a novel adaptive decomposition method known as \(\hbox {EWT}\), which can overcome the drawbacks of wavelets and \(\hbox {EMD}\). Chen et al. [112] identified weak faults and diagnosed compound faults using \(\hbox {EWT}\), in which signals were decomposed into mono-components under an orthogonal basis to extract inherent modulation information. Boualem et al. [113] used \(\hbox {EWT}\) with the Hilbert transform to detect incipient tooth crack faults. Lu et al. [114] used the kurtogram to denoise the signal and locate the fault frequency; the denoised signal was then filtered using \(\hbox {EWT}\). Dragomiretskiy et al. [91] proposed \(\hbox {VMD}\), which decomposes the vibration signal into an ensemble of band-limited \(\hbox {IMFs}\). Ma et al. [115] proposed adaptive scale spectrum segmentation to determine the \(\hbox {IMFs}\) obtained from the vibration signal by \(\hbox {VMD}\); the Teager energy operator was then applied to the effective \(\hbox {IMF}\) components to identify incipient faults.
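The Teager energy operator mentioned above has a simple discrete form, \(\psi [x(n)] = x(n)^2 - x(n-1)\,x(n+1)\), which responds strongly to sudden changes in amplitude or frequency. The sketch below applies it directly to a synthetic signal to show this behaviour. It is an illustration under assumed data, not the pipeline of [115], where the operator is applied to selected \(\hbox {IMFs}\). The carrier, the impact location and the function name are hypothetical.

```python
import math


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is defined for n = 1 .. len(signal) - 2, so it is two
    samples shorter than the input.
    """
    return [signal[n] ** 2 - signal[n - 1] * signal[n + 1]
            for n in range(1, len(signal) - 1)]


# Synthetic data: a low-amplitude sinusoidal carrier with one short impact.
omega = 2 * math.pi * 0.05  # radians per sample
carrier = [0.5 * math.sin(omega * n) for n in range(200)]
faulty = list(carrier)
faulty[100] += 2.0  # impulsive event at sample 100

energy_clean = teager_energy(carrier)
energy_faulty = teager_energy(faulty)

# Output index i corresponds to input sample i + 1.
impact_index = max(range(len(energy_faulty)), key=lambda i: energy_faulty[i]) + 1
print(impact_index)
```

For a pure sinusoid \(A\sin (\Omega n + \phi )\) the operator returns the constant \(A^2\sin ^2\Omega \), so any departure from a flat output marks a transient, which is what makes it useful for highlighting weak impacts.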

Li et al. [116] proposed combining \(\hbox {VMD}\) with improved autoregressive minimum entropy deconvolution for rotating machinery with multiple faults. Yang et al. [117] optimized \(\hbox {VMD}\) to decompose the vibration signal and diagnose incipient faults in milling operations with an online chatter identification method. Guo et al. [118] used \(\hbox {VMD}\) with a parameter optimization algorithm to extract weak features. Cao et al. [119] proposed an algorithm combining \(\hbox {VMD}\), \(\hbox {Permutation Entropy (PE)}\) and wavelet thresholding (\(\hbox {VMD}\)-\(\hbox {PE}\)-wavelet threshold) for denoising signals. The combination of \(\hbox {VMD}\) and \(\hbox {PE}\) solved the problem of mode aliasing and, together with wavelet threshold denoising, made noisy components easy to identify. Kumar et al. [120] detected faults in bearings by using a combination of \(\hbox {VMD}\), a genetic algorithm and kernel-based mutual information. Li et al. [121] proposed a denoising algorithm based on \(\hbox {VMD}\) and average periodic energy for power quality signals. Hu et al. [122] processed signals by \(\hbox {VMD}\) to diagnose faults in rotating machinery. Dibaj et al. [123] proposed a hybrid method of \(\hbox {VMD}\) and \(\hbox {Convolutional Neural Networks (CNN)}\) for diagnosis of faults in gears and bearings.

Fan et al. [124] used a wavelet-based approach to statistical signal detection to monitor and diagnose bearing faults at an incipient stage. Cui et al. [125] proposed \(\hbox {WT}\) and time-frequency analysis for incipient faults in bearings. Cui et al. [126] proposed \(\hbox {WT}\) for denoising the signal, with the grey correlation method used for identification of the fault. Wang et al. [127] used an adaptive wavelet stripping algorithm to detect incipient faults in bearings. Combet et al. [128] detected local damage in gears by \(\hbox {WT}\), using the instantaneous wavelet bicoherence as the fault feature. Chen et al. [129] selected sensitive frequency bands of multiwavelet packet coefficients by using the energy ratio to detect faults in rotating machines. He et al. [130] detected faults at the incipient stage by a maximal-overlap adaptive multiwavelet method. Fan et al. [131] denoised the signal by \(\hbox {Discrete Wavelet Transform (DWT)}\): the signal was decomposed into different levels, and statistical parameters were then used to detect gear faults. Kumar et al. [132] used the Haar wavelet to extract wavelet coefficients to diagnose faults in rotating machines. Yang et al. [133] denoised the signal by combining \(\hbox {EMD}\) and Wavelet Packet Decomposition to detect weak features and faults in bearings. Morsy et al. [134] detected incipient faults in bearings by applying the Morlet wavelet filter for pre-processing. Yiakopoulos [135] combined Morlet wavelets and morphological analysis for feature extraction to detect faults in bearings. Sharma et al. [136] used the Flexible Analytical \(\hbox {WT}\) and \(\hbox {DWT}\) to denoise signals for detecting faults in roller bearings in rotary machines. Younus et al. [137] used 2D-\(\hbox {DWT}\) for detection of faults such as mass unbalance, misalignment and bearing faults in rotating machinery. Dong et al. [138] proposed a hybrid method combining spectral graph \(\hbox {WT}\)s and detrended fluctuation analysis for denoising of non-stationary vibration signals. Jiang et al. [139] proposed a denoising method named nonconvex wavelet thresholding total variation to diagnose faults in planetary gearboxes.

Minhas et al. [140] denoised the vibration signal by the complementary \(\hbox {EEMD}\) method, in which the signal is decomposed into \(\hbox {IMFs}\) and the significant \(\hbox {IMFs}\) are identified by Hurst exponent threshold analysis, to detect faults in bearings. Cheng et al. [141] proposed an adaptive weighted symplectic geometry decomposition method to denoise signals and detect incipient faults in gears. This method defines cycle kurtosis and periodic impact intensity to measure the amount of fault information in each component, which gives it excellent denoising performance. Li et al. [142] used a denoising algorithm based on spectral characteristics and multipoint optimal minimum local mean entropy deconvolution adjusted to extract the characteristic frequencies of bearings under strong noise interference. Mukherjee et al. [143] used a hybrid method of \(\hbox {TSA}\) and the J48 algorithm to detect faults in a gearbox: \(\hbox {TSA}\) was used for denoising vibration signals and J48 for feature selection and classification. Mansi et al. [144] proposed the use of the Maximum Overlap Discrete Wavelet Transform to pre-process the vibration signal from a gearbox. This technique separates noise from the actual fault signal. Various \(\hbox {AI}\) techniques were then applied to accurately identify the fault stage in the gearbox. Afia et al. [145] used \(\hbox {Maximal Overlap Discrete Wavelet Packet Transform (MODWPT)}\) for pre-processing of vibration signals in gears. The signals were decomposed into nodes, and the autocorrelation was then computed to calculate kurtosis at each level of decomposition. Table 1 summarizes the methods available in the literature for fault diagnosis of various mechanical components at the incipient stage.

Table 1 Summary of various denoising techniques in incipient fault diagnosis in rotating elements
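Several of the methods reviewed in this section rely on wavelet threshold denoising: the signal is decomposed into wavelet coefficients, small detail coefficients are treated as noise and shrunk, and the signal is reconstructed. The sketch below is a minimal illustration of this idea and is not the algorithm of any cited work. It assumes the Haar wavelet, soft thresholding, a known noise standard deviation and the universal threshold \(\sigma \sqrt{2\ln N}\); the function names and the synthetic data are hypothetical.

```python
import math
import random


def haar_forward(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar DWT."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; magnitudes below the threshold become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def haar_denoise(x, levels, threshold):
    """Decompose `levels` times, soft-threshold every detail band, reconstruct."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


def rms_error(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


# Synthetic data: one slow sine cycle plus Gaussian noise of known level.
random.seed(1)
N = 256
SIGMA = 0.2  # noise standard deviation, assumed known for this sketch
clean = [math.sin(2 * math.pi * i / N) for i in range(N)]
noisy = [c + random.gauss(0.0, SIGMA) for c in clean]

threshold = SIGMA * math.sqrt(2 * math.log(N))  # universal threshold
denoised = haar_denoise(noisy, levels=3, threshold=threshold)
noisy_error = rms_error(noisy, clean)
denoised_error = rms_error(denoised, clean)
print(f"noisy {noisy_error:.3f}, denoised {denoised_error:.3f}")
```

The Haar wavelet keeps the example short but produces a staircase-like reconstruction; smoother wavelets, more levels and a noise level estimated from the data are the usual refinements.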

3.4 Epilog

Signals can be pre-processed in the time domain, frequency domain or time-frequency domain, depending on the component under investigation and the speed and loading conditions. Time domain signals are generally used to analyze faults in gearboxes. When speed and loading conditions vary, or when faults in other components such as bearings are also to be analyzed, time domain signals are not sufficient to detect faults. In such cases, frequency domain signals are used for fault detection. In industry, the speed of machines driven by variable-speed drives varies as needed, and hence time-frequency domain methods come into play to visualize how the \(\hbox {FFT}\) values vary with time.

4 Feature Processing

The main aim of feature processing is to extract the most relevant information, i.e., features whose values relate directly to increasing fault severity, in order to determine the health condition of machines. The steps involved in feature processing are feature extraction, feature selection and dimensionality reduction. Different features are generally extracted from the vibration signal to construct multi-dimensional feature sets. A dimension reduction algorithm is then used to generate more suitable features with fewer dimensions. Each step is described in detail in the subsequent sub-sections.

4.1 Feature Extraction

In feature extraction, features containing information about the health condition of machines are extracted. The features can be in the time domain, frequency domain or time-frequency domain. There are two categories of time domain features: dimensional and dimensionless. Dimensional features include mean, standard deviation, root amplitude, root mean square, peak value, etc., and these features are affected when the load and speed of the machine change. Dimensionless features include shape indicator, skewness, kurtosis, crest indicator, clearance indicator, impulse indicator, etc., and these features do not depend on the operating conditions of machines [146, 147]. Features extracted from the frequency spectrum are frequency domain features. These include mean frequency, frequency center, root mean square frequency, standard deviation frequency, etc. [146, 147]. Frequency domain features contain information that is not present in time domain features. Time-frequency domain features, such as energy entropy [146, 147], are usually extracted by \(\hbox {WT}\), \(\hbox {Wavelet Packet Transform (WPT)}\) or \(\hbox {EMD}\). Unlike purely time domain or frequency domain features, these features can reflect the health condition of machines under non-stationary conditions.

In the time domain, statistical features [148] such as mean, variance, root mean square, skewness and kurtosis are commonly used. In the frequency domain, \(\hbox {FFT}\) [149], discrete Fourier transform, power spectrum analysis, autoregressive models [150], eigenvector methods, envelope analysis and Welch's method are the frequently applied approaches for feature extraction. The most popular time-frequency domain methods include \(\hbox {Short-Time Fourier Transform (STFT)}\) [151], \(\hbox {EMD}\) and wavelet packet decomposition [152]; this category also includes the Hilbert-Huang transform, the Hilbert transform [153] and \(\hbox {WT}\) [137]. For non-linear and non-stationary signals, \(\hbox {EMD}\) and the Hilbert-Huang transform show better performance.

Li et al. [154] proposed a feature extraction method based on improved Multipoint Optimal Minimum Entropy Deconvolution Adjusted and the Teager-Kaiser energy operator for diagnosis of bearing faults. Minhas et al. [140] extracted fault features by weighted multiscale entropy methods to detect faults in bearings. Ekici et al. [155] proposed using \(\hbox {WPT}\) to extract fault features, with the energy and entropy of the wavelet packet coefficients calculated for each fault waveform in transmission lines. Liang et al. [156] proposed a new method called maximum average kurtosis deconvolution to extract fault signatures in rotating machines. It uses the average kurtosis, in which the kurtosis of each impulse is calculated and averaged, to evaluate the fault information in the signal. Gao et al. [157] proposed a hybrid method of L-kurtosis and enhanced clustering-based segmentation to extract fault features from background noise for detecting faults in hydraulic machines. Wei et al. [158] proposed a hybrid method combining a feature extraction technique called refined composite hierarchical fuzzy entropy with random forest to diagnose incipient faults in planetary gearboxes. Jing et al. [159] used the peak-to-average ratio to extract features for diagnosis of faults in gears.
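The distinction between dimensional and dimensionless time domain features can be made concrete with a short sketch. The definitions below are commonly used forms and are given as an assumption; individual papers differ in naming and normalisation, and the function name and test signal are hypothetical.

```python
import math


def time_domain_features(x):
    """Compute common dimensional and dimensionless time domain features.

    Assumes a non-constant signal, so that the standard deviation is non-zero.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    root_amplitude = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return {
        # Dimensional features: scale with the signal amplitude.
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "root_amplitude": root_amplitude,
        # Dimensionless features: ratios that cancel the amplitude scale.
        "skewness": sum(c ** 3 for c in centered) / (n * std ** 3),
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4),
        "crest": peak / rms,
        "shape": rms / mean_abs,
        "impulse": peak / mean_abs,
        "clearance": peak / root_amplitude,
    }


# Synthetic data: five full cycles of a unit sine, then the same signal doubled.
N = 1000
signal = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]
features = time_domain_features(signal)
scaled = time_domain_features([2.0 * v for v in signal])

for name in ("rms", "peak", "kurtosis", "crest"):
    print(name, round(features[name], 4), round(scaled[name], 4))
```

Doubling the amplitude doubles the dimensional features while leaving the dimensionless ones unchanged, which is the sense in which the latter are insensitive to changes in overall vibration level.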

4.2 Feature Selection

Health status of machines can be known by selecting sensitive features by feature selection methods such as filters, wrappers and embedded methods from the extracted features. In Filter-based methods, pre-processing of collected features can be done by Filters [160]. Some filters used by various researchers are Relief [161] and Relief-F [162] to determine the health condition of machines. Information gain and gain ratio [163], are methods for feature selection. The features having higher gain ratio and higher information gain were selected to diagnose of faults. Minimum Redundancy Maximum Relevance [164] was used to select features. Distance evaluation [165, 166] was used for feature selection by distance metric. In Wrapper-based methods, the feature set which was selected was assessed by the performance of classifiers. Las vegas wrapper [167] was used for feature selection in which the Las Vegas algorithm is employed to search for the feature subset. Zhao et al. [168] searched the optimal feature subset by Polyserial and Pearson correlation coefficients as evaluation metrics to diagnose gearbox faults. Liu et al. [169] used feature selection method as cosine similarity in the Gaussian radial basis kernel space as a criterion and with the sequential backward algorithm to detect fault in gearbox. Cheng et al. [170, 171] proposed two-sample Z-test which is used for feature selection for estimation of crack level to detect gearbox faults. Wand and Shao [172] proposed an improved hybrid feature selection technique and the features which were not relevant were removed by using the distance evaluation technique and Pearson’s correlation analysis as the evaluation metrics. Li et al. [34] used hybrid approach of multi-scale morphological filter and Laplacian score so that interruption which was not related to fault was removed and refined the fault features extracted using the modified hierarchical \(\hbox {PE}\). 
Dybala [173] selected a better feature subset using a novel noise-assisted feature subset evaluation method and used an nbv-based classifier for the final classification. Barkowiak and Zimroz [174] used the Lasso shrinkage operator to select features for gearbox fault diagnosis. For multi-dimensional features, both feature selection and feature extraction are therefore used for dimension reduction in gearbox fault diagnosis, and each has advantages and disadvantages. In feature selection methods, the selected features carry less information because the features that are not selected are discarded, whereas in feature extraction the information from all features is concentrated in the resulting features. A feature extraction-based algorithm is therefore used when one wants to retain most of the information in the original features. On the other hand, a feature selection method is used when one wants a low-dimensional feature set that keeps its physical meaning. Liu et al. [175] combined the advantages of both feature selection and feature extraction by proposing a hybrid method: the kernel feature selection method and kernel Fisher discriminant analysis were applied sequentially to remove irrelevant features and to build a more compact feature vector, with the Gaussian radial basis function as the kernel function.

4.3 Dimensionality Reduction

In dimensionality reduction, lower-dimensional features are generated using statistical dimensionality reduction strategies, which are of two types: linear and non-linear. Linear methods include \(\hbox {PCA}\) and \(\hbox {Linear Discriminant Analysis (LDA)}\). Zimroz et al. [176] used \(\hbox {PCA}\) and canonical discriminant analysis for dimensionality reduction of features to detect gearbox faults. Besides linear methods, there are non-linear methods for dimensionality reduction, namely kernel function-based and eigenvalue-based methods. Cheng et al. [177] proposed the kernel-based kernel \(\hbox {PCA}\) for dimensionality reduction of features to diagnose gear faults. Tang et al. [178] proposed an algorithm called orthogonal neighborhood preserving embedding for dimensionality reduction of features for the diagnosis of faults in wind turbines. Chen et al. [179] proposed the Laplacian eigenmaps algorithm to identify faults in gears. Many researchers have also developed specially designed features for dimensionality reduction. For example, Lei et al. [180, 181] used four statistical features (the root mean square of the filtered signal, the normalized summation of positive amplitudes of the difference spectrum, the accumulative amplitudes of carrier orders and the energy ratio based on difference spectra), Hu et al. [182] used the correlation dimension, and Liang et al. [183] used energy for the diagnosis of gear faults. Shifat et al. [106] used \(\hbox {PCA}\) for dimensionality reduction to diagnose incipient faults in a brushless DC motor.
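As an illustration of the linear case, a minimal \(\hbox {PCA}\) sketch based on the singular value decomposition is given below. The function name and the toy feature matrix are assumptions for illustration only; practical implementations would usually also standardize the features first and handle degenerate inputs.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project mean-centered features onto the leading principal directions."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    projected = centered @ vt[:n_components].T
    return projected, explained_ratio[:n_components]


# Hypothetical feature matrix: three features that all vary along one direction.
t = np.linspace(-1.0, 1.0, 11)
feature_matrix = np.outer(t, [1.0, 2.0, 2.0])

reduced, ratio = pca_reduce(feature_matrix, n_components=1)
print(reduced.shape, ratio)
```

The explained-variance ratio indicates how much of the statistical information is retained by the chosen number of components.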

4.4 Epilog

Feature processing consists of feature extraction, feature selection and dimensionality reduction. Feature extraction is a crucial stage in the fault detection process and can be done in the time domain, frequency domain or time-frequency domain. The sensitive features are then selected by various methods. Many features may provide the same information, and some may provide no meaningful information, so a dimensionality reduction method is used after feature selection. It reduces the dimensions of the dataset while preserving as much statistical information as possible. The resulting features are input to the \(\hbox {AI}\) algorithms described in the next section.

5 AI Based Techniques for Incipient Fault Diagnosis

In the final phase, various machine learning models are employed to predict faults based on \(\hbox {AI}\). The most widely used machine learning classifiers for fault diagnosis are the \(\hbox {Support Vector Machine (SVM)}\) model [184], the \(\hbox {Artificial Neural Network (ANN)}\) model [185], and \(\hbox {k-Nearest Neighbors (k-NN)}\) [186]. Many researchers have also used the AdaBoost [187] algorithm. Khazaee et al. [188] used \(\hbox {SVM}\) to classify different health conditions of a planetary gearbox: healthy, a ring gear with a worn tooth, and a planet gear with a worn tooth. Khawaja et al. [189] detected a growing crack in a planetary gearbox using Least Squares \(\hbox {SVM}\). Liu et al. [190] identified various types of damage in a planetary gearbox by combining \(\hbox {SVM}\) and \(\hbox {LDA}\). Qu et al. [191] used various \(\hbox {SVM}\)-based feature selection methods to classify faults. Samuel and Pines [191] used a self-organizing neural network classifier that automatically detects faults in the planetary gear of a helicopter. Li et al. [192] applied \(\hbox {k-NN}\) algorithms to fault features extracted from vibration and acoustic emission signals in order to diagnose faults in planetary gearboxes. \(\hbox {SVM}\) is a supervised machine learning method for fault diagnosis. Platt et al. [38] and Hsu et al. [193] compared the diagnosis accuracy of the one-against-all and one-against-one strategies. To overcome the weaknesses of one-against-all and one-against-one, the directed acyclic graph [194,195,196] and the binary tree [34, 38, 150, 178, 193, 197,198,199,200,201,202,203] were used. 
Modified versions of \(\hbox {SVM}\) have been applied to machine fault diagnosis, such as the least square \(\hbox {SVM}\) [9, 204,205,206,207,208], the proximal \(\hbox {SVM}\) [209,210,211], the one-class \(\hbox {SVM}\) [212], the hyper-sphere-structured \(\hbox {SVM}\) [213], the wavelet \(\hbox {SVM}\) [214,215,216,217], the ensemble \(\hbox {SVM}\) [218, 219], the fuzzy \(\hbox {SVM}\) [220, 221], the multi-kernel \(\hbox {SVM}\) [222, 223], and the relevance vector machine [224,225,226], which achieved better diagnosis performance than the conventional \(\hbox {SVM}\)-based approaches. To select the parameters of \(\hbox {SVM}\), various optimization algorithms were used, such as the sequential minimal optimization [31, 227,228,229,230], the genetic algorithm [231,232,233,234], the Particle Swarm Optimization (PSO) [235,236,237,238], and the ant colony optimization [239]. \(\hbox {SVM}\) has some disadvantages. First, \(\hbox {SVM}\) handles small datasets well but has difficulty fitting massive data. Second, the performance of \(\hbox {SVM}\)-based diagnosis models is sensitive to the kernel parameters: if the kernel parameters are not appropriate, reliable diagnosis results cannot be obtained. Third, the basic \(\hbox {SVM}\) algorithm directly solves only binary classification tasks. \(\hbox {k-NN}\) is a supervised machine learning classification method in which the k nearest neighbors are determined by calculating the Euclidean distance between testing and training samples [240]. In this method, the k nearest samples are found using a distance metric. Georgoulas et al. [241] used \(\hbox {k-NN}\) to locate faults in bearings after transforming the raw signal into a discrete component represented by a histogram. Gao et al. [242] extracted features by combining the S transform and morphological pattern spectrum to diagnose faults in bearings. Rajeshwari et al. 
[243] extracted features using \(\hbox {EEMD}\), reduced dimensionality by combining a hybrid binary bat algorithm with a machine learning algorithm to select the predominant features, and then detected the incipient faults using \(\hbox {k-NN}\). Geramifard et al. [244] preprocessed the vibration signal with a hidden Markov model and used the parameters of this model as the input of \(\hbox {k-NN}\) to detect faults in motors. Many researchers have used \(\hbox {k-NN}\) to diagnose incipient faults in various machine elements such as rolling element bearings [245,246,247,248,249,250,251,252], gears [253,254,255,256], and motors [257]. \(\hbox {k-NN}\) faces some problems: the neighborhood boundary is not always distinguishable, and it is difficult to select the optimal neighborhood parameter. Lei et al. [258] used a weighted \(\hbox {k-NN}\) method to detect faults in bearings, in which the extracted features were weighted to train the model to detect the health condition of machines. Zhao et al. [259] used a Euclidean weighted \(\hbox {k-NN}\) model, in which the extracted features were weighted using Euclidean distance, for bearing fault diagnosis. Li et al. [260] proposed the optimized evidence-theoretic \(\hbox {k-NN}\) classifier to improve the accuracy of fault detection in bearings. Dong et al. [261, 262] used a \(\hbox {k-NN}\) classifier optimized by the \(\hbox {PSO}\) algorithm, which achieved higher diagnosis accuracy than the method without optimization. Pandya et al. [263] modified the \(\hbox {k-NN}\) algorithm based on an asymmetric proximity function, which improved diagnosis accuracy for bearings. \(\hbox {k-NN}\)-based diagnosis is easy to implement, but its computational cost is high when handling large datasets. Shifat et al. [106] used \(\hbox {k-NN}\) for diagnosing incipient faults in a brushless direct current motor. Hu et al. 
[100] proposed an algorithm for the diagnosis of faults in an electric fan that uses Least square \(\hbox {SVM}\) as the primary classifier and \(\hbox {k-NN}\) as the secondary classifier. Lu et al. [264] proposed an enhanced \(\hbox {k-NN}\) for fault diagnosis in gears, in which the nearest neighbors were selected automatically and features were extracted by unsupervised methods. Cao et al. [265] proposed a novel deep learning method named Y-net for the diagnosis of faults in planetary gearboxes. Zhao et al. [266] proposed an unsupervised algorithm named L12 sparse filtering to extract fault features in bearings. Li et al. [267] used an unsupervised convolutional autoencoder model with the silhouette coefficient to detect faults in rotating machines, such as inner race faults in bearings and spalling and tooth faults in gears. Li et al. [268] used a k-means clustering approach to diagnose faults in rotating machinery. Chen et al. [269] used unsupervised networks to detect faults in bearings and gears. These networks improve the knowledge transfer from the labeled vibration signals (source domain) to the unlabeled vibration signals (target domain). Ali et al. [270] used an unsupervised network named adaptive resonance theory 2 to diagnose bearing faults such as inner race, outer race and ball defects. Pacheco et al. [271] proposed an unsupervised attribute clustering algorithm using Rough Set theory that combines the advantages of the k-means and \(\hbox {k-NN}\) algorithms to detect faults in bearings and can give better accuracy with unlabeled data. Tao et al. [272] proposed the ST-\(\hbox {Categorical Generative Adversarial Networks (CatGAN)}\) algorithm, which combines \(\hbox {STFT}\) and \(\hbox {CatGAN}\), for the fault diagnosis of rolling bearings. Raw 1-D vibration signals were transformed to 2-D by \(\hbox {STFT}\) and then served as input to \(\hbox {CatGAN}\). 
This algorithm shows a strong ability to extract features, with better robustness under different motor loads. Zhang et al. [273] compared various neural network-based approaches for fault diagnosis in bearings, including the basic neural network, deep neural network, stacked autoencoders, \(\hbox {CNN}\) and deep \(\hbox {CNN}\). Wu et al. [274] proposed a novel semi-supervised method to detect bearing faults such as inner race, outer race and ball defects. Wang et al. [275] proposed a novel ensemble extreme learning machine network that combines two sub-networks, the first extreme learning machine for clustering and the second for multi-label classification, for the detection of compound faults in rotating machinery. Sinistin et al. [276] proposed a hybrid model based on \(\hbox {CNN}\) and multilayer perceptron that uses different data types as input (both numerical data and images) for the diagnosis of rolling bearing faults. Zhang et al. [277] proposed a novel 2D deep \(\hbox {CNN}\) model which is trained on the augmented training set to learn a more discriminative feature set for fault diagnosis in rotating machines. Imamura et al. [278] proposed a recurrent neural network to identify imbalance faults in rotating machinery using vibration and current signals. Li et al. [279] proposed a novel model combining binarized deep neural networks with improved random forests for the diagnosis of faults in rotating machinery. Ince et al. [280] proposed a one-dimensional self-organised \(\hbox {Operational Neural Network (ONN)}\) (self-ONN) for detecting the severity of bearing faults at an early stage. Brusa et al. [281] used a novel machine learning model characterized by the introduction of eigen-spectrograms and randomized linear algebra in bearing fault diagnosis. Zhu et al. [282] proposed an improved LeNet-5 model whose hyperparameters are optimized automatically by \(\hbox {PSO}\). 
The resulting \(\hbox {PSO}\)-improved \(\hbox {CNN}\) model achieves higher accuracy and takes less time for training and testing. Liu et al. [283] proposed an unsupervised algorithm named categorical adversarial autoencoder for the diagnosis of faults in bearings. The categorical adversarial autoencoder method was validated to be feasible for unsupervised clustering on rolling bearings. Costa et al. [284] diagnosed faults in bearings using two unsupervised machine learning techniques called Shapley additive explanations and local depth-based feature importance for the isolation forest. Zhao et al. [285] proposed two new algorithms, an adaptive sparse contractive auto-encoder algorithm and an optimized unsupervised extreme learning machine classifier, for the diagnosis of faults in roller bearings. Qu et al. [286] proposed an unsupervised feature extraction method called disentangled tone mining to learn the fault level directly from the frequency spectra of the data. The disentangled tone mining method can disentangle uncorrelated noise and recover the discriminant features. Cheng et al. [287] proposed an unsupervised machine learning algorithm based on the Gaussian mixture model for the detection of incipient faults in industrial robots. Martin et al. [288] proposed unsupervised machine learning algorithms such as sparse coding and dictionary learning to detect faults in rotating machine elements like rolling element bearings and gears. Li et al. [289] presented a new roller bearing fault diagnosis algorithm based on a sparsity and neighborhood preserving deep extreme learning machine. Kong et al. [290] proposed a novel method, the attention recurrent auto-encoder hybrid model classification algorithm, for early fault diagnosis and severity detection of rotating machinery. He et al. 
[291] proposed an unsupervised \(\hbox {AI}\) method based on a deep belief network for the diagnosis of faults in a gear transmission chain; its fault classification accuracy for bearings and gears is higher than that of the back propagation neural network and \(\hbox {SVM}\). Table 2 lists various \(\hbox {AI}\) methods used for fault diagnosis of mechanical components.
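The Euclidean-distance \(\hbox {k-NN}\) rule described in this section, including the idea of weighting features before computing distances, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the optional `weights` argument and the two-feature toy data are hypothetical and do not reproduce any specific cited method.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3, weights=None):
    """Majority vote among the k training samples closest to the query."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    if weights is None:
        weights = np.ones(train_x.shape[1])  # plain Euclidean distance
    # Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - q_j)^2).
    squared = (train_x - query) ** 2 * np.asarray(weights, dtype=float)
    distances = np.sqrt(squared.sum(axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set (e.g. kurtosis, crest factor).
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["healthy", "healthy", "healthy", "faulty", "faulty", "faulty"]

print(knn_predict(train_x, train_y, [0.2, 0.1]))
print(knn_predict(train_x, train_y, [5.5, 5.2]))
```

Because every prediction computes the distance to all training samples, the cost grows with the size of the training set, which is the computational drawback noted above.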

Table 2 AI based techniques for incipient fault diagnosis in rotating elements

5.1 Epilog

In the last decade, there has been a growing need for \(\hbox {AI}\) to solve engineering problems. Earlier, these problems were considered hard to solve analytically or by mathematical modeling and required human intelligence. Machine learning is an application of \(\hbox {AI}\) that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning approaches are traditionally classified as supervised (dealing with labeled data) and unsupervised (dealing with unlabeled data). Labeled data on actual machine conditions from the field is scarce, because faults cannot be predicted in actual field conditions during data acquisition, and supervised machine learning techniques under-perform as a result. To overcome this scarcity of labeled data, researchers have investigated unsupervised machine learning techniques in the field of condition monitoring. Computational power was limited earlier, but it has grown with the advancement of technology. Nowadays, signal processing computation can be done on the sensors themselves, which is called edge computing. \(\hbox {AI}\) is therefore an emerging field for fault detection in various rotating machinery.

6 Conclusion

This paper has reviewed the state of the art of machinery prognostics following the four processes of the predictive maintenance program, namely data acquisition, pre-processing (denoising process), feature processing, and \(\hbox {AI}\). In the data acquisition section, various data acquisition sensors are discussed in detail, out of which the vibration-based data acquisition method was found to be prominent. However, acoustic emission is found to be one of the most advanced contactless methods in recent publications, in combination with vibration transducers. The signal pre-processing (denoising process) section summarizes various advanced signal processing techniques in the time domain, frequency domain and time-frequency domain. The choice among these techniques depends on the component under study and on the speed and loading conditions. The feature processing section is considered to be one of the most important parts of the predictive maintenance program. This section is divided into three parts: feature extraction, in which various features are extracted from signals; feature selection, in which the most relevant feature sets are selected based on statistical analysis; and finally dimensionality reduction, which helps the \(\hbox {AI}\) stage work efficiently. In the end, \(\hbox {AI}\)-based techniques for detecting incipient faults in rotating machines are discussed. A review of all the literature is impossible, and the omission of some papers is inevitable. Figure 7 shows the section- and sub-section-wise paper distribution for detection of incipient faults in rotating machinery.

Fig. 7 Literature distribution for various steps in detection of incipient faults in rotating machinery

Although considerable progress has been made in the field of condition monitoring and health assessment, several aspects still need further investigation. The final part of this paper therefore lists the challenges and opportunities in this field, which is expected to point out development directions and give some suggestions for researchers.

6.1 Challenges in Data Acquisition

  • How to deal with multiple sensors information

In Industry 4.0 settings, where multiple sensor sources are present from multiple machines, a new problem of handling large volumes of data arises. Under these circumstances, researchers and industrial personnel have to carefully identify the meaningful information related to faults. \(\hbox {AI}\), no doubt, can handle this scenario. However, researchers have to keep in mind the well-known principle ‘Garbage In, Garbage Out’. A sensor data fusion approach to investigating the health status of a machine may be a useful tool in such situations.

  • How to deal with limited data availability

Limited data is more troublesome than a large volume of data. There are situations where only limited data is available from a machine, such as a newly commissioned machine. In such situations, it is difficult to draw conclusions about the health status of the machine from a few data points. A transfer learning approach may be helpful when a mathematical model built earlier for a similar machine is already available.

6.2 Challenges in Signal Processing

  • How to deal with catastrophic failure analysis

One of the most painful situations in any plant is the sudden failure of a component without any prior symptoms. However, the vibration-based condition monitoring approach has proven its ability to diagnose faults a significant amount of time before failure. Various recent advanced signal processing techniques such as \(\hbox {Teager Kaiser Energy Operator (TKEO)}\) condition the signal and improve the fault detection capability of a system. Various other advanced signal processing methodologies have been investigated in the literature; however, their implementation in real-world situations is still at an initial stage.
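For reference, the discrete-time \(\hbox {TKEO}\) of a signal x is commonly written as ψ[x(n)] = x(n)² − x(n−1)·x(n+1). A minimal sketch is given below; the function name and the test signal are assumptions for illustration, and real applications would typically band-pass filter the vibration signal before applying the operator.

```python
import numpy as np


def teager_kaiser_energy(signal):
    """Discrete TKEO: psi[n] = x[n]^2 - x[n-1] * x[n+1] (interior samples only)."""
    x = np.asarray(signal, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# For a pure tone A*cos(omega*n + phase), the operator returns the constant
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
n = np.arange(200)
tone = amplitude * np.cos(omega * n + 0.4)
energy = teager_kaiser_energy(tone)
print(energy.min(), energy.max(), amplitude ** 2 * np.sin(omega) ** 2)
```

Because the output depends on both amplitude and instantaneous frequency, short impulsive events stand out sharply against a smooth carrier, which is one reason the operator is used to condition signals before fault detection.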

  • How to deal with the burden of computational time of signal processing

With the advancement of complex signal processing methods, the difficulty of increased computational time arises. This has restricted the implementation of such signal processing methods in real-time fault diagnosis applications. Edge computing resolves this problem by providing an additional processor on the sensor board itself, so that most of the computation is performed at the source sensor. Scripting languages such as Python also help to address these issues by integrating computational power into the sensors.

6.3 Challenges in Feature Processing

  • How to establish relation between feature threshold and fault severity

The relationship between fault severity, such as a growing crack or surface wear, and the value of a statistical feature is generally difficult to establish. Each machine is unique in its vibration properties, and the data available on fault progression is quite limited. This may be handled with the help of dynamic-modeling analysis of the mechanical system.

  • How to find new features which can enhance fault diagnosis at a very early stage of a fault

Tracking fault severity based on the amplitude of statistical parameters is the key to success for any condition monitoring program. The earlier the fault is detected, the longer the time interval available to take the necessary action, and hence the better the outcome of the condition monitoring program. However, statistical features need not follow a monotonic trend as fault severity increases, because the value of a statistical parameter depends on many other factors such as speed, load, lubrication and damping. Hence, there is a need to find new statistical feature sets which show a monotonic trend with fault progression. This can be achieved by using advanced signal processing methods which effectively filter out the noise from the actual fault signal. For example, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) shows very good results in extracting faulty vibration signals from a noisy environment [292].
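One simple way to quantify whether a candidate feature follows a monotonic trend with fault progression is to compare the number of increasing and decreasing steps in its time series. The sketch below uses this sign-based measure as an assumption; it is one common choice rather than a metric prescribed by this review, and the function name and sample series are hypothetical.

```python
import numpy as np


def monotonicity(feature_series):
    """|#increasing steps - #decreasing steps| / (N - 1), in the range [0, 1]."""
    values = np.asarray(feature_series, dtype=float)
    steps = np.diff(values)
    rising = np.sum(steps > 0)
    falling = np.sum(steps < 0)
    return abs(int(rising) - int(falling)) / (len(values) - 1)


# Hypothetical feature histories recorded as a fault grows.
steadily_rising = [1.0, 1.2, 1.5, 1.9, 2.4]
fluctuating = [1.0, 2.0, 1.0, 2.0, 1.0]

print(monotonicity(steadily_rising), monotonicity(fluctuating))
```

A score close to 1 suggests a feature that could be tracked against a severity threshold, while a score close to 0 indicates a feature dominated by operating-condition effects or noise.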

6.4 Challenges in Artificial Intelligence

  • How to predict multiple faults in a single component

Plenty of research work has been undertaken on the investigation of a single fault present at a time on a component. But in practical scenarios, multiple faults may be present in a single component. \(\hbox {AI}\) performs well when predicting results from a seen data set. However, it fails to perform in situations for which the \(\hbox {AI}\) model has not been trained. Hence, a robust \(\hbox {AI}\) model can be trained based on reinforcement learning, where the model may be updated as soon as it encounters a new known condition.

  • How to predict system health as a whole based on component level health monitoring data

Most of the time, data from a system is captured from its components such as bearings, gearbox and shaft. This means that any fault prediction based on the data will be most significant for the component itself. It is a challenge to predict the health of a whole machine based on the fault predictions from various sub-components of that system. Stacking ensemble machine learning is one way to tackle this issue, where information from multiple machine learning models can be combined and used to predict the health of a complete system.