1 Introduction

Rolling element bearings are widely used in rotating machines and in various industries such as steel, mining, aerospace, paper, railways, textile and renewable energies (Harris 2001). Failure of bearings is a major cause of machinery breakdown, economic losses and even loss of human lives. Undesirable vibrations can be caused by faulty installation, poor maintenance or surface spall that finally leads to development of REB failure (Howard 1994). There is a considerable amount of research in the field of diagnosis of REBs (Singh et al. 2015). One of the important issues is how to identify bearing fault before it reaches to the final failure state. Bearing failure is reported to be almost 40–50% of the cause of motors failure in industries (Nandi et al. 2005). Based on the critical role of REBs in the machines, it is important to anticipate its RUL in order to take a correct maintenance decision.

Prognostic methods are generally categorized as data-driven methods, model-based methods and hybrid methods (Hu et al. 2012). Data-driven methods are based on processing of vibration signal that is gathered from sensors like accelerometers. Data-driven methods do not require knowledge about the physics of failure. Some of the popular methods used in this category are artificial neural networks (Gebraeel and Lawley 2008; Gebraeel et al. 2004; Shao and Nezu 2000), regression analysis (Pham and Yang 2010) and neuro-fuzzy approaches (Wang 2007; Liu et al. 2009). Model-based methods by using mathematical models describe the evolution of defect and degradation of REBs like Paris law (Li et al. 1999; Marble and Morton 2006), contact stress analysis (Gebraeel and Pan 2008) and Kotzalas and Harris (2001). Model-based methods do not require massive sensory data as is the case in data-driven methods but they are suitable for only special defects where a mathematical model can be applied (Kim et al. 2012). Hybrid methods are combination of data-driven methods and model-based methods (Li et al. 2000; Goebel et al. 2006). Hybrid methods take advantage of both data-driven methods and model-based methods, but similar to model-based methods, hybrid methods are limited to the special defects and cases in which sufficient knowledge is available about physics of the failure.

Heng et al. (2009) published a review paper on the subject of prognosis in rotating machinery which mainly covers prognosis of REBs. After that, two other comprehensive review papers have been published which deal with the prognosis of REBs (Jammu and Kankar 2011; El-Thalji and Jantunen 2015). Si et al. (2011) reviewed statistical data-driven methods for RUL estimation of equipment based on available past data and statistical models. They classified statistical models to two broad types: models that depend on directly observed state information of the equipment and models that do not depend on directly observed state information. They pointed to four challenging problems in RUL prediction that shall be further studied in future works. Qian et al. (2017) used multi-time scale method for prediction of bearing RUL. They divided timescale into fast-time and slow-time scale. Multi-time scale modeling technique called phase space warping (PSW) has been used in damage identification, damage evolution tracking and then RUL of bearing predicted by Paris crack growth model. Qiu et al. (2003) used wavelet filter method to enhance weak signatures of fault and used self-organizing map (SOM) for REB performance degradation assessment. This method is suitable where defect features are impulse-like. Gebraeel et al. (2005) proposed two different exponential degradation model, two-parameter exponential model with multiplicative random error terms and two-parameter exponential model with multiplicative Brownian motion error. They used Bayesian approach for updating estimation of stochastic parameters in exponential models and then used these models for predicting residual life of bearings. Wang and Tsui (2017) proposed a statistical model of bearing degradation. They divided bearing degradation into two distinct stages: In stage 1 bearing stays in normal health condition and also health indicator is in stable level, but in stage 2 bearing defects occur and health indicator exhibits an exponential degradation trend. They used Bayesian inference method from Gebraeel et al. (2005) and extended this method to estimate RUL of bearings more accurately. Guo et al. (2017) estimated RUL of bearings by means of recurrent neural network (RNN) and health indicator. Features for neural network were selected among six related-similarity (RS) features and eight time–frequency features. The monotonicity and correlation metrics have been used to select the most sensitive fault features. Rai and Upadhyay (2018) predicted RUL of bearing based on self-organizing map (SOM) and support vector regression (SVR). They extracted time-domain and frequency-domain features from the raw bearing vibration signal, and then, by using self-organizing map-minimum quantization error evolution (SOM-MQEE), health index of bearing was achieved. The health index was fed as input to SVR to predict RUL of bearings. Peng et al. (2018) predicted RUL of bearings by using Gaussian mixture model (GMM) and distance evaluation technique (DET). Gaussian mixture model is used to cluster the health states and identify the abnormal data sets and also minimum description length (MDL) method is used to determine the number of clusters. After obtaining the health states, features are chosen based on distance evaluation technique, and then, RUL of the bearings was predicted by means of least square support vector machine (LS-SVM). Huang et al. (2007) used minimum quantization error (MQE) as a degradation indicator from a bearing’s incipient defect stage to its final failure stage that is derived from SOM. Then this index is used as an input to a back propagation neural network to predict the residual life of a REB. Hong et al. (2014) mentioned that nature of bearing’s vibration signals is non-stationary and has weak faulty signals with strong noise in background; therefore, time–frequency methods are appropriate for feature extraction from the raw signal. They used wavelet packet-empirical mode decomposition (WP-EMD) as a time–frequency method for feature extraction, after that, SOM was used for the condition assessment of the REB performance degradation. Loutas et al. (2013) proposed probabilistic support vector machine (SVM) for prediction of bearing RUL. They used wiener entropy or spectral flatness for condition monitoring of rolling element bearings. Kim et al. (2010) suggested SVM as a classifier to evaluate each health state. They used prior knowledge of the physical degradation of the machinery, failure patterns and information of prior maintenance. Their method needs historical data of various types of defects. Sloukia et al. (2013) used data-driven method for prognosis of REBs. They used SVM as a classification and mixture of Gaussian hidden Markov model (MOG-HMM) for predicting RUL. Liu et al. (2016) classified life cycle of bearings into three health states as normal state, degradation state and failure state. They used data-driven approach for RUL prediction based on multiple health state assessment. They used SVM in classification of health states and also in RUL prediction. Chen et al. (2013) mentioned that SVM is an effective approach for small samples and for univariate time series and to overcome these shortcomings they introduced multivariable support vector machine (MSVM) with relative features for life prediction. They used features like relative root mean square (RRMS) and relative kurtosis factor as input; they concluded that MSVM has a better performance than univariate SVM in RUL prediction when insufficient condition monitoring data exist. They concluded that complicated signal processing methods are required to assess the REB performance degradation and determine the initial defect as well as the final failure more accurately. Reuben and Mba (2014) used actual service data gathered from AH64D helicopters that was measured by accelerometer and used spectral analysis of the data in prognosis. They claimed that in both low- and high-frequency bandwidths, simple spectral analysis was effective for tracking progressive stages of bearing damage. They used regression models like exponential regression model and 5-parameter logistic model for RUL prediction. Ali et al. (2015) defined a new feature as root mean square entropy estimator (RMSEE) and used simplified fuzzy adaptive resonance theory map (SFAM), neural network and Weibull distribution (WD) to calculate RUL of REBs. Tian et al. (2010) used suspension data whereby machines are taken out of service before they fail, in addition to condition monitoring data from failure histories, and used them as inputs to a feedforward neural network (FFNN) for predicting RUL. They concluded that degradation process is important for correct prediction of RUL. Zhang et al. (2013) decomposed vibration signal into several signals containing one approximation and some details by wavelet transform (WT) and then transformed them to frequency domain using fast Fourier transform (FFT), and then they were used as inputs to train a back propagation neural network.

In this paper, a novel prognostic method is proposed that uses current and past vibration signal of the machine. In the proposed method, vibration signal is decomposed by wavelet packet transform (WPT). To obtain better result, best mother wavelet has been selected among 53 examined wavelet bases. Then optimal level and node have been chosen based on kurtosis value. Then features are extracted from decomposed signal and are fed to a designed artificial neural network. In this paper, multilayer perceptron (MLP) network with preprocessing data is employed for RUL prediction. The proposed method showed good accuracy in the prediction of RUL using data obtained in REB life test and proved its capability for practical applications.

This paper is organized as follows. In Sect. 2, the proposed method is explained. Experimental setup is described in Sect. 3. Section 4 explains wavelet transform and common criteria for selection of appropriate mother wavelet, optimal level and node. In Sect. 5, artificial neural networks are explained, and results of MLP are compared with experimental results. Conclusions are given in Sect. 6.

2 Method

Data-driven prognostic methods do not need to model evolution of defect and physics of defect growth. This method is based only on measured vibration signals. In this paper, we used wavelet packet transform and ANN to predict RUL of a deep groove ball bearing under controlled load. Schematic diagram of the proposed method is shown in Fig. 1.

Fig. 1
figure 1

Process of the proposed REB prognosis method

Feature extraction is based on presentation of vibration signal in different domains. In time domain, main features are based on statistical description of the vibration signal like peak, root mean square (RMS), crest factor, kurtosis. (Chen et al. 2013). Common time-domain features are defined in Table 1. Other features are calculated based on transformation of the signal into other domains like frequency domain by means of Fourier transform. However, simple FFT is not efficient in analysis of non-stationary signals, and time–frequency-domain features are supposed to provide more information. The main methods to transform signal into the time–frequency domain are short time Fourier transform (STFT) and wavelet transform (WT). STFT has problems in selecting window length and time–frequency resolution. WT has a compact dyadic presentation of the signal and is a good choice for data compressing, while WPT has more detailed information. Therefore, WPT has been selected for extracting appropriate features of REB vibration signal.

Table 1 Common statistical vibration features in time domain

Features in Table 1 can be described as follows: Mean is mathematical expectation of the random variable. RMS represents the average power of a signal. Absolute average value is used to get the absolute values of all the numbers, and then, average function is applied to calculate the result. Fourth moment is a specific quantitative measure, used in both mechanics and statistics of the shape of a set of points. Variance is the expectation of the squared deviation of a random variable from its mean, informally, it measures how far a set of random numbers are spread out from their average value. Peak is the maximum value, either positive or negative, that a waveform attains. Crest factor is the ratio of peak values to the RMS. In other words, crest factor indicates how extreme the peaks are in a waveform. Kurtosis is a measure of the tailedness of the probability distribution of a random variable in comparison with the normal distribution.

3 Experimental Setup

A test rig for accelerated life test of REBs at Sharif University of Technology is used to verify proposed prognosis method. The schematic of test rig is shown in Fig. 2. The vibrations data of each bearing consist of two channels from two B&K AS-65 accelerometers that are placed radially on the housing of the test bearing. The test bearing type is NTN 6804-ZZ, a double-shielded single-row deep groove REB. The experimental setup is shown in Fig. 3.

Fig. 2
figure 2

Schematic of test rig

Fig. 3
figure 3

Experimental setup for REB run-to-failure test

As shown in Fig. 3, two accelerometers were mounted in vertical and horizontal directions. The load was applied to the bearing radially in vertical direction by means of four screws and measured by a load indicator. Bearings are shielded so there is no need for external lubrication. Vibration signal was collected at two sampling frequencies of 51.2 kHz by an eight channel, 16 bits real-time data recorder and 25.6 kHz by a portable two channel vibration analyzer. In this experiment, six bearings have been tested as shown in Table 2. All tests have been done at constant speed (1500 rpm) and approximately constant radial load (4400 N) with a small variation. Bearing life depends on load but has stochastic behavior. Load changes in the tests were small; therefore, variation in bearings life is due to randomness in bearing life. According to the conducted tests and observations, acceleration threshold for end of the test was selected as 20 g rms. Inspection of internal parts of the bearing showed that bearing damage was fully developed at this acceleration level. Since run of the test beyond this level would have safety risks, this acceleration level was chosen as end of test. This acceleration threshold has been reported in other tests with the same size of bearing (Wang et al. 2015; Zhao et al. 2016). In addition, temperature of the bearings was measured in every 15 min by a thermometer during the test.

Table 2 Test specifications

The run-to-failure measured vibration signal of bearing test 6 is shown in Fig. 4. As shown in Fig. 4, amplitude of the acceleration increases until it reaches the threshold of 20 g. It shall be noted that at this acceleration level bearing defect has been grown entirely and in real-world applications the machine must be stopped earlier to avoid damage to other parts. Nevertheless, the test was continued to reach this level to observe defect growth in the bearing until near the end of the bearing life. Last portion of the RMS trend of vibration in bearing test 5 is plotted in Fig. 5. Damaged bearing in test no. 5 is shown in Fig. 6. Spalls can be seen on the inner race and balls. In real applications, the threshold for stopping the machine shall be below 20 g rms to avoid safety risks.

Fig. 4
figure 4

Bearing vibration signal in test no. 6

Fig. 5
figure 5

Trend of acceleration RMS (km/s2) versus hours of operation in bearing test no. 5

Fig. 6
figure 6

Inner race and rolling elements of the damaged bearing in test no. 5

4 Wavelet Transform (WT)

Classical diagnosis tools like fast Fourier transform (FFT) that are used for the analysis of stationary vibration signals are not suitable for analysis of bearing defects (Akbari et al. 2014). Vibration of defective bearings can be considered as cyclostationary (Antoni 2009), and time–frequency or bi-frequency transforms (Yiakopoulos and Antoniadis 2005) are more suitable for it. Wavelet theory developed as a signal processing tool which is suitable for analysis of both stationary and non-stationary signals. This transform has also been used successfully in diagnosis of bearings and feature extraction (Kankar et al. 2011). The commonly used wavelet transforms are continuous wavelet transform (CWT), discrete wavelet transform (DWT) and wavelet packet transform (WPT) (Kulkarni and Sahasrabudhe 2013).

CWT generates redundant data, and its calculation consumes more time than other wavelet transforms. In comparison, DWT is more compact and faster. This is the reason that DWT is more common in condition monitoring purpose (Akbari et al. 2014).

Continuous wavelet transform of signal x(t) is defined as:

$$ T\left( {a,b} \right) = \mathop \int \nolimits_{ - \infty }^{ + \infty } x\left( t \right) \psi_{{\left( {a,b} \right)}}^{*} \left( t \right) {\text{d}}t $$
(1)

The sign * is complex conjugate and \( \psi_{(a,b)} \) is:

$$ \psi_{{\left( {a,b} \right)}} \left( t \right) = \frac{1}{\sqrt a } \psi \left( {\frac{t - b}{a}} \right) $$
(2)

\( \psi_{(a,b)} (t) \) is a daughter wavelet that is obtained from mother wavelet, a and b are real parameters, representing scaling and translation, respectively.

Discrete wavelet transform is obtained by discretization of CWT. The most popular one is dyadic discretization given by:

$$ \psi_{{\left( {j,k} \right)}} \left( t \right) = \frac{1}{{\sqrt {2^{j} } }} \psi \left( {\frac{{t - 2^{j} k}}{{2^{j} }}} \right) $$
(3)

where a and b are replaced by \( 2^{j} \) and \( 2^{j} k \), respectively.

Wavelet packet transform is a generalized form of DWT where in addition to approximation coefficients, detail coefficients are also decomposed and all nodes at the same level have equal frequency bandwidth. WPT is calculated by wpdec command in MATLAB.

4.1 Selection of Best Mother Wavelet and Optimal Node

There are various methods for selecting mother wavelet and decomposition level. Yan (2007) proposed maximum energy to Shannon entropy ratio as a criterion for mother wavelet selection. Rafiee et al. (2009) used Daubechies (DB) mother wavelet and have chosen 4th level as an optimal level. Kankar et al. (2011) extracted statistical features by using complex Morlet wavelet based on minimum Shannon entropy criterion (MSEC). Bafroui and Ohadi (2014) decomposed vibration signals of a gearbox by using CWT and Morlet wavelet and determined the optimal range of wavelet scales based on the maximum energy to Shannon entropy ratio criterion. Kumar et al. (2014) used the fact that mother wavelet selection depends on the similarity between the shape of original signal and the mother wavelet and also focused on the mother wavelet selection for analyzing bearing vibration signals based on the two criteria including MSEC and maximum energy to Shannon entropy ratio criteria. Akbari et al. (2014) chose appropriate mother wavelet and level based on maximum energy to Shannon entropy ratio criteria.

Combination of two criteria of maximum energy and minimum Shannon entropy for mother wavelet selection leads to the maximum energy to Shannon entropy ratio criteria (Bafroui and Ohadi 2014):

$$ \xi \left( n \right) = \frac{E\left( n \right)}{{S_{\text{entropy}} \left( n \right)}} $$
(4)

That \( E(n) \) is energy at level n and \( S_{\text{entropy}} (n) \) is entropy at level n.

Energy at level n is given by:

$$ E\left( n \right) = \mathop \sum \limits_{i = 1}^{m} \left| {C_{n,i} } \right|^{2} $$
(5)

where \( C_{n,i} \) is the ith wavelet coefficient at level n, m is the number of wavelet coefficients.

Shannon entropy at level n is given by:

$$ S_{\text{entropy}} \left( n \right) = - \mathop \sum \limits_{i = 1}^{m} P_{i} *{ \log }P_{i} $$
(6)

where \( P_{i} \) is the distribution of the energy probability for the wavelet coefficients given by:

$$ P_{i} = \frac{{\left| {C_{n,i} } \right|^{2} }}{E\left( n \right)} $$
(7)

where \( \sum\nolimits_{i = 1}^{m} {P_{i} = 1} \). The value of \( P_{i} *{ \log }P_{i} \) is assumed zero when \( P_{i} = 0 \).

In this paper, maximum energy to Shannon entropy ratio criteria have been used for mother wavelet selection. Comparing 53 different mother wavelets as illustrated in Fig. 7 shows that bior 3.1 is the most suitable mother wavelet for REB feature extraction.

Fig. 7
figure 7

Comparison of energy to Shannon entropy ratio of different mother wavelets

Due to the fact that WPT is a generalization of DWT and provides richer information of the considered signal (Nikolaou and Antoniadis 2002; Peng and Chu 2004; Al-Badour et al. 2011), WPT has been used in this paper and both optimal level and optimal node are selected.

Kurtosis of acceleration signal is used widely in diagnosis of rolling bearings. Healthy bearing has a kurtosis of 3, while higher kurtosis values correspond to bearing defect. To select the optimal level and node, the value of kurtosis of decomposed signal in all nodes of each level is calculated. The node that has a larger value is selected as the optimal level and the optimal node.

By applying WPT on the bearing vibration signal and calculating the value of kurtosis at each node, as shown in Fig. 8, Node (4, 7), i.e., level 4 and node 7, are chosen as an optimal level and an optimal node. Two features, RMS and Kurtosis, are calculated from decomposed signal at this node and fed as input of the neural network.

Fig. 8
figure 8

Optimal level and optimal node selection based on kurtosis value

5 Artificial Neural Network (ANN)

ANN structure consists of one input layer, one or more hidden layer, one output layer, weights and biases. Process of determining the weights and the biases is known as training of the ANN (Rao et al. 2012). Schematic diagram of the multilayer ANN is shown in Fig. 9. The extracted features from the vibration signal act as network inputs, and target or output is the RUL of the bearing stated in percentage of total life. Selection of the features for ANN is based on the comparison of performance of eight features that are listed in Table 1. Root mean square of error (RMSE) is used to measure performance of each feature. Performance of the ANN for all combinations of inputs in groups of one, two and three features was calculated. The best features as single input were used in combination to form two and three inputs. Best results were obtained in groups of two features as shown in Table 3. It can be seen that combination of RMS and absolute average value has less error than others and consequently this combination was chosen as input to ANN.

Fig. 9
figure 9

Schematic diagram of the multilayer ANN

Table 3 Comparison of RMSE of various combinations of vibration features as input to ANN

A MLP neural network with seven neurons (chosen by comparing the results) in the hidden layer and one neuron in the output layer is used. Hyperbolic tangent sigmoid transfer function is selected for hidden layer and linear transfer function for output layer. Levenberg–Marquardt algorithm is used for ANN training, because it gives better performance among other network training algorithms. Cost function is selected as mean square of errors (MSE). The NN has been trained by RMS and absolute average value as inputs and the percentage of RUL as target. There were 487 data points in which 347 data points are selected randomly for training, and the remaining 70 random data points are used for validation and test. Result of the MLP for data test 5 is shown in Fig. 10.

Fig. 10
figure 10

Result of MLP training for data test 5

The prediction result obtained by MLP is not satisfactory. To improve the accuracy of MLP prediction, some preprocessing has been applied on the features obtained from decomposed signal by WPT. First, input data were smoothed by moving average method. The smoothed RMS is depicted in Fig. 11. It can be seen that RMS has nearly monotonic trend so it can be used in prognosis. Features that do not have monotonic behavior such as kurtosis are not good candidates for prognosis. Then, MLP has been trained by smoothed data, and the results as depicted in Fig. 12 show better behavior.

Fig. 11
figure 11

RMS was smoothed with moving average method

Fig. 12
figure 12

Result of MLP with smoothed data

To further improve results, after smoothing of inputs, a Weibull basis hazard function h(t) and exponential function were fitted to them. Weibull hazard function h(t) is defined as (Jardine et al. 1998):

$$ h(t) = \frac{{{\text{probabilty}}\; {\text{density}}\; {\text{function}}\;\left( {\text{PDF}} \right)}}{{1 - {\text{cumulative}}\; {\text{distribution}}\; {\text{function}}\;\left( {\text{CDF}} \right)}} $$
(8)
$$ h\left( t \right) = \frac{\beta }{{\eta^{\beta } }} t^{\beta - 1} $$
(9)

where β and η are shape and scale parameters of Weibull distribution function, respectively. The Weibull shape parameter β is also known as the Weibull slope because the value of β determines the initial slope of the curve. Change of scale parameter η is equivalent to the change in the abscissa scale.

Exponential function f(t) is defined as:

$$ f\left( t \right) = ae^{bt} + ce^{dt} $$
(10)

Fitting of RMS by Weibull basis hazard function is depicted in Fig. 13, the R-Square of this fitting is 0.7, and its root mean square of error (RMSE) is 0.0018. Fitting of RMS by exponential function is depicted in Fig. 14, the R-Square of this fitting is 0.77, and its root mean square of error (RMSE) is 0.0016, so exponential function has a better fitting rather than Weibull basis hazard function and it is used for fitting of features. Fittings of absolute average value by Weibull basis hazard function and exponential function are also depicted in Figs. 15 and 16. In this figure, exponential fitting also shows better performance than Weibull fitting.

Fig. 13
figure 13

Weibull basis hazard fitting of RMS

Fig. 14
figure 14

Exponential fitting of RMS

Fig. 15
figure 15

Weibull basis hazard fitting of absolute average

Fig. 16
figure 16

Exponential fitting of absolute average

The result of MLP network with smoothed and fitted data is depicted in Fig. 17. As shown in Fig. 17 and according to Table 3, the root mean square of error (RMSE) is 2.13 and there is a little error between targets and outputs.

Fig. 17
figure 17

Result of MLP with smoothed and exponential fitted data of test 5

Error of MLP network with raw input and preprocessed data is shown in Table 4. It can be seen from Table 4 that, as the preprocessing on the data is applied, accuracy of the network improves.

Table 4 Comparison of error in MLP with different preprocessing data

In the next step, the trained NN with data test 5 is used for predicting the RUL of data test 1 and data test 6. The training data for prediction of test 1 and test 6 contain data of test 5 and first half of the data of test 1 and first half of the data of test 6, respectively. This ensures that the proposed prognostic method can be used on similar machines after training of the network with the data of only one machine. However, the prediction can be improved if the data of more than one machine are used for training.

The result of MLP network prediction with smoothed and fitted data of data test 1 is depicted in Fig. 18, and the result of MLP network prediction with smoothed and fitted data of data test 6 is depicted in Fig. 19.

Fig. 18
figure 18

Relation between predicted RUL and actual RUL for data test 1 (only first 50% of the data is used for training)

Fig. 19
figure 19

Relation between predicted RUL and actual RUL for data test 6 (only first 50% of the data is used for training)

As shown in Figs. 18 and 19, the predicted RUL is approaching to the desired or actual RUL and the network predicts the last time of experiment with a small error. The error of MLP neural network with smoothed and fitted data for data test 1 and data test 6 is shown in Table 5. As shown in Table 5, there is little error between actual and predicted results.

Table 5 Comparison of error in MLP for other data tests

It is interesting to compare the results of the current method with other references. Because error is reported in various forms in different papers, it is not possible to compare the results directly. A descriptive comparison is given as follows:

In Gebraeel and Lawley (2008), the error in RUL prediction is about 7.56% and other benchmarks pointed to in that paper have a high error compared to current proposed method.

In Huang et al. (2007), errors were grouped into some classes (less than 10%, greater than 10% but not greater than 20% and greater than 20%), and generally error of their method is high.

In Kim et al. (2010), predicted RUL is based on health state and error of their method in the first part of the test is high. Error value decreases at the end of the test.

In Sloukia et al. (2013), mixture of hidden Markov model and support vector machine has been used and they reported relatively high error between estimated and real RUL (38.52 and 21.85%).

In Gebraeel et al. (2009), predicted RUL of bearing by using Bernstein fitting and despite the complexity of computation, the average prediction error is reported between 16.2 and 22.5%.

As can be seen, the proposed method has less error compared to the published results.

6 Conclusions

This paper is intended to present an efficient and accurate method for RUL prediction of REBs. In this work, vibration signal is decomposed by WPT. New methods for selecting appropriate mother wavelet, optimal level and optimal node are proposed. The selected features are extracted from the decomposed signal and then smoothed and fitted by Weibull hazard function as well as exponential function. Effect of preprocessing of the input data on the prediction results of the MLP network was investigated, and it was concluded that MLP with smoothed and fitted data has better RUL prediction than other methods. Additionally it was verified that the trained network for one test can be applied in prediction of other data tests successfully. In this study, it has been assumed that rotation speed and load are constant. If the operating conditions of the machine are not constant, the current approach shall be modified to account for variation in degradation rate of bearing. This case which can be observed in some applications is the subject of a new research. The proposed approach can be considered as the first investigative step since it concerns a single application of the method to specific type of machines and to unique specimens and therefore its effectiveness for other components and various cases has to be proved with further investigations.