Introduction

Rolling element bearings, which operate under harsh working conditions, are among the most critical components of rotating machinery. Accurate residual life prediction is therefore required so that fatal damage can be prevented and personal safety can be guaranteed. For this reason, research on improving the accuracy of residual life prediction for rolling element bearings is well worthwhile.

Residual life prediction is an important part of prognostics and health management (PHM). It rests on two key elements: the degradation indicator and the prediction model. A sensitive degradation indicator helps detect incipient faults in time and determine the incipient fault threshold, while a reasonable prediction model improves the prediction accuracy. Therefore, the key to improving prediction accuracy is to select a sensitive degradation indicator and to build the prediction model reasonably. In general, fault features such as kurtosis, RMS, and peak-to-peak value are used to evaluate the degradation performance. However, these features are effective only for certain defects at certain stages. Furthermore, under highly stochastic degradation they are neither sensitive to incipient faults nor able to produce a clear degradation trend. Many methods have been developed to address these problems. Qiu developed a new degradation indicator based on a self-organizing map neural network, while Jihong Yan used a BP neural network-based indicator to evaluate degradation performance [1, 2]. Yu developed new indicators based on generative topographic mapping (GTM), the Gaussian mixture model (GMM), and Bayesian methods [3, 4]. Liao used genetic programming to improve degradation assessment [5]. These methods are more sensitive to incipient faults than the general fault features, but they do not consider the stochastic characteristics that strongly affect the degradation performance in the time domain.

In addition to the development of degradation indicators, many prediction models have been proposed for residual life prediction. In [6–8], the authors proposed neural network-based prediction methods that slightly improved the long-term prediction accuracy. However, neural networks have some limitations: (1) the difficulty of determining the network structure and the number of nodes and (2) the slow convergence of the training process. In [9, 10], the authors proposed prognostic approaches based on Bayesian theory, which can predict the probability distribution of the residual life. In [11], the authors proposed a method that obtains the residual life by computing a Phase-Type (PH) distribution. The shortcoming of these approaches is obvious: they need a large amount of accurate prior probability distribution data, which is difficult to obtain in practical applications. Marcos E. proposed a fault prediction method based on particle filtering [12]. In [13, 14], the authors also used particle filtering to predict the residual life and obtained good results. However, these prediction models were mostly built from historical data and seldom incorporated the information in real-time data. Real-time data directly reflect the current development trend of a bearing's residual life, while historical data contain empirical information about the residual life. Using both historical and real-time data therefore offers theoretical advantages in terms of efficiency and accuracy.

The continuous hidden Markov model (CHMM), which has been widely used in many fields, can handle stochastic characteristics very well and is highly effective in comparison with the above-mentioned methods. In [15, 16], the authors used CHMM for fault diagnosis. Moreover, support vector regression (SVR) has also been used as a prediction model for RUL prediction; in [17, 18], SVR was used for fault prognosis.

Therefore, an adaptive prediction method based on a CHMM degradation indicator and support vector regression is proposed. The CHMM is used to construct the degradation indicator because of its ability to handle stochastic characteristics in the time domain. SVR, which handles small samples well, is used to build the prediction model, which consists of two sub-models based on historical data and online data, respectively. Finally, the ultimate prediction result is calculated as a weighted average of the two predictions produced by these sub-models, and the weights are adjusted by the least mean square (LMS) algorithm to enhance the prediction accuracy. The predicted results indicate that the proposed method is more effective than other common prediction methods.

The Principle of the Adaptive Prediction Method

The prediction method first constructs an effective indicator of the degradation performance. Then the prediction model is used to predict the trend of the indicator. Finally, the prediction result is obtained according to the predetermined threshold. In this paper, CHMM and SVR are used to build the degradation indicator and the prediction model, respectively.

The Principle of the Degradation Indicator Based on CHMM

A sensitive indicator with a significant trend is needed for monitoring, so that the failure threshold can be determined and severe faults can be prevented at the incipient stage. Because of the low effectiveness of the general fault features, CHMM is used to construct the indicator so that the stochastic behavior in the time domain can be handled.

The hidden Markov model (HMM) is a powerful statistical tool that has been applied in many areas. Its essential construction is as follows. Let \(S\) be the state alphabet set and \(O\) the observation alphabet set:

$$S = (s_{1} ,s_{2} , \ldots s_{N} ),\quad O = (o_{1} ,o_{2} , \ldots o_{M} ),$$
(1)

where \(N\) and \(M,\) respectively, denote the total number of the states and the observations. Then, the transition matrix \(A\) is given as follows:

$$A = \left[ {\begin{array}{*{20}l} {a_{11} } \hfill & {a_{12} } \hfill & \ldots \hfill & {a_{1N} } \hfill \\ {a_{21} } \hfill & {a_{22} } \hfill & \ldots \hfill & {a_{2N} } \hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \vdots \hfill \\ {a_{N1} } \hfill & {a_{N2} } \hfill & \cdots \hfill & {a_{NN} } \hfill \\ \end{array} } \right].$$

Thus

$$a_{ij} = P(q_{t + 1} = s_{j} |q_{t} = s_{i} ),\quad 1 \le i,j \le N,$$
(2)

where \(q_{t}\) denotes the state at time \(t\) and \(q_{t + 1}\) denotes the state at time \(t + 1\); the formula gives the probability that the state \(s_{i}\) at time \(t\) transitions to \(s_{j}\) at time \(t + 1\). The observation matrix \(B\) is denoted as follows:

$$B = \left[ {\begin{array}{*{20}l} {b_{11} } \hfill & {b_{12} } \hfill & \ldots \hfill & {b_{1M} } \hfill \\ {b_{21} } \hfill & {b_{22} } \hfill & \ldots \hfill & {b_{2M} } \hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \vdots \hfill \\ {b_{N1} } \hfill & {b_{N2} } \hfill & \cdots \hfill & {b_{NM} } \hfill \\ \end{array} } \right].$$
(3)

Thus \(b_{ik} = P(\theta_{t} = o_{k} |q_{t} = s_{i} ),\;1 \le i \le N,\;1 \le k \le M,\) which denotes the probability of observing \(o_{k}\) in state \(s_{i}\). \(\pi\) is the initial probability array, given as follows:

$$\pi = (\pi_{1} ,\pi_{2} , \ldots ,\pi_{N} )$$

and

$$\pi_{i} = P(q_{0} = s_{i} ),\quad 1 \le i \le N.$$
(4)

In summary, an HMM can be denoted by \(\lambda = (\pi ,A,B)\). A CHMM is an HMM whose observation matrix is continuous, \(B = \{ b_{jo} \}\), modeled by a Gaussian mixture model as follows:

$$\begin{aligned} b_{jo} & = \sum\limits_{k = 1}^{M} {c_{jk} N(o,\mu_{jk} ,U_{jk} )} \\ & = \sum\limits_{k = 1}^{M} {\frac{{c_{jk} }}{{(2\pi )^{D/2} \left| {U_{jk} } \right|^{1/2} }}\exp \left[ { - \frac{1}{2}(o - \mu_{jk} )^{T} U_{jk}^{ - 1} (o - \mu_{jk} )} \right]} , \\ & \quad j = 1,2, \ldots ,N \\ \end{aligned}.$$
(5)

Here, the variables are defined as follows: \(M\) is the number of Gaussian components; \(c_{jm}\) is the weight of the \(m\)th Gaussian component in state \(s_{j}\), satisfying \(c_{jm} \ge 0\) and \(\sum\nolimits_{m = 1}^{M} {c_{jm} } = 1,\;j = 1,2, \ldots ,N\); \(N( \cdot )\) is the Gaussian probability density function; \(o\) is the observation sequence of size \(D \times T\), where \(D\) denotes the dimension and \(T\) the length of the sequence; \(\mu_{jm}\) is the mean vector of the \(m\)th Gaussian component in state \(s_{j}\); and \(U_{jm}\) is the covariance matrix of the \(m\)th Gaussian component in state \(s_{j}\).

Hence, CHMM can be denoted by \(\lambda = (\pi ,A,C,\mu ,U)\).
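To make this parameterization concrete, the following sketch evaluates the mixture emission density of formula (5) for a single state. It assumes the weights, means, and covariance matrices of that state are stored as NumPy arrays; the function name and the toy values are illustrative only, not part of the original method.

```python
# A minimal sketch of the mixture emission density in formula (5) for one
# state j, assuming SciPy/NumPy; all names and toy values are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

def emission_density(o, weights, means, covs):
    """b_j(o) = sum_k c_jk * N(o; mu_jk, U_jk) for a single state j.

    o       : (D,) observation vector
    weights : (M,) mixture weights c_jk (non-negative, summing to 1)
    means   : (M, D) mean vectors mu_jk
    covs    : (M, D, D) covariance matrices U_jk
    """
    return sum(w * multivariate_normal.pdf(o, mean=mu, cov=U)
               for w, mu, U in zip(weights, means, covs))

# Toy example: a two-component mixture over 4-dimensional feature vectors
rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = rng.normal(size=(2, 4))
covs = np.stack([np.eye(4), 2.0 * np.eye(4)])
print(emission_density(rng.normal(size=4), weights, means, covs))
```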

To construct the degradation indicator, we introduce the evaluation problem of the HMM: given an HMM and a sequence of observations, compute the likelihood of the observation sequence. This problem can be viewed as assessing how likely it is that the given observation sequence was generated by the given model. The likelihood is given by

$$\begin{aligned} P(O|\lambda ) & = \sum\limits_{Q} {P(O|Q,\lambda )P(Q|\lambda )} \\ & = \sum\limits_{{q_{1} \ldots q_{T} }} {\pi_{{q_{1} }} b_{{q_{1} }} (o_{1} )a_{{q_{1} q_{2} }} b_{{q_{2} }} (o_{2} ) \ldots a_{{q_{T - 1} q_{T} }} b_{{q_{T} }} (o_{T} )} \\ \end{aligned}.$$
(6)

The above equation can be computed efficiently by the forward algorithm, which is fully presented in [19]. Because the model is trained with health data, the likelihood expresses how far a given sequence deviates from the healthy condition, so the likelihood can be used as a degradation indicator. Since the extracted data are continuous and multidimensional, the Gaussian mixture observation densities of the CHMM can describe the distribution of these complicated data. Thus the CHMM-based indicator can capture the characteristics of multiple features while reducing the stochastic influence in the time domain. In general, log-likelihood probability (LLP) values are smaller than zero; to improve interpretability, the negative LLP (NLLP) is used as the health quantization indicator in this study.
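As an illustration of how the NLLP indicator could be computed in practice, the following sketch trains a Gaussian-mixture HMM on healthy-condition feature vectors and scores sliding windows of the monitored features. It assumes the third-party hmmlearn package; the model sizes, window length, and array shapes are assumptions for illustration rather than the settings used in this paper.

```python
# A hedged sketch of the CHMM-based NLLP indicator, assuming the hmmlearn
# package; model sizes and the window length are illustrative assumptions.
import numpy as np
from hmmlearn import hmm

def train_chmm(health_features, n_states=5, n_mix=3):
    """Fit a Gaussian-mixture HMM to healthy-condition feature vectors
    (shape: n_samples x n_features)."""
    model = hmm.GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=100)
    model.fit(health_features)
    return model

def nllp_indicator(model, features, window=20):
    """Negative log-likelihood of each sliding window under the health model;
    larger values indicate a larger deviation from the healthy condition."""
    scores = []
    for i in range(len(features) - window + 1):
        scores.append(-model.score(features[i:i + window]))
    return np.asarray(scores)
```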

The Principle of the SVR-Based Prediction

The support vector machine (SVM) is a supervised learning model, with associated learning algorithms, used for classification and regression analysis. The version of SVM for regression is called support vector regression (SVR). The model produced by support vector classification depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by SVR depends only on a subset of the training data, because the cost function ignores any training data that are sufficiently close to the model prediction. The basic function for SVR is given by

$$y = f(x) = \omega^{T} \phi (x) + b,$$
(7)

where \(\omega\) and \(b\) are the coefficients, \(y\) is the target value, and \(\phi (x)\) is a non-linear mapping function that maps the input vector \(x\) into a high-dimensional space in which linear regression can be performed, so that non-linear SVR is achieved. Estimating these coefficients amounts to solving

$$\begin{array}{*{20}l} {\text{minimize}} \hfill & {\frac{1}{2}\left\| \omega \right\|^{2} } \hfill \\ {\text{subject to}} \hfill & {\left\{ \begin{aligned} y_{i} - \left\langle {\omega ,\phi (x_{i} )} \right\rangle - b \le \varepsilon \hfill \\ \left\langle {\omega ,\phi (x_{i} )} \right\rangle + b - y_{i} \le \varepsilon \hfill \\ \end{aligned} \right.} \hfill \\ \end{array},$$
(8)

where \(\varepsilon\) is a free parameter that serves as a threshold: all predictions have to lie within an \(\varepsilon\) range of the true target values. Slack variables are usually added to the above formulation to allow for errors when the problem is otherwise infeasible. More details about the estimation of the coefficients can be found in Ref. [20].
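As a concrete illustration of the \(\varepsilon\)-insensitive regression in formulas (7) and (8), the following sketch fits an \(\varepsilon\)-SVR to a toy one-dimensional series using scikit-learn; the kernel and parameter values are assumptions, not the settings used in this paper.

```python
# A minimal epsilon-SVR sketch corresponding to formulas (7)-(8), assuming
# scikit-learn; the toy data and hyperparameters are illustrative only.
import numpy as np
from sklearn.svm import SVR

x = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(x).ravel() + 0.1 * np.random.default_rng(0).normal(size=200)

# epsilon sets the width of the insensitive tube: points whose error is
# smaller than epsilon contribute nothing to the loss.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
model.fit(x, y)
y_hat = model.predict(x)
```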

The main principle of SVR-based prediction is the iterated multi-step life prediction strategy. When an \(n\)-point time series \(X = \{ x_{1} ,x_{2} , \ldots ,x_{n} \}\) is given, a training group \(Y = \{ y_{1} ,y_{2} , \ldots ,y_{m} \}\) can be obtained, where \(y_{m} = \{ (x_{m} ,x_{m + 1} , \ldots ,x_{m + l} ),x_{m + l + 1} \}\), with \((x_{m} ,x_{m + 1} , \ldots ,x_{m + l} )\) as the input and \(x_{m + l + 1}\) as the output. Once the SVR has been trained on this training group, prediction can be carried out using formula (7). The basic idea of iterated prediction is to use the last predicted value as a component of the input vector for the next prediction, so that future indicator values can be obtained. The remaining life is then computed from the number of predicted steps and the failure threshold.
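The sketch below illustrates this iterated multi-step strategy: it builds the training group from a one-dimensional indicator series and then feeds each prediction back into the input window. The window length, hyperparameters, and variable names are illustrative assumptions.

```python
# A hedged sketch of the iterated multi-step prediction strategy, assuming
# scikit-learn's SVR; window length and hyperparameters are illustrative.
import numpy as np
from sklearn.svm import SVR

def build_training_group(series, l):
    """Inputs are windows (x_m, ..., x_{m+l}); outputs are x_{m+l+1}."""
    X, y = [], []
    for m in range(len(series) - l - 1):
        X.append(series[m:m + l + 1])
        y.append(series[m + l + 1])
    return np.asarray(X), np.asarray(y)

def iterated_forecast(model, last_window, n_steps):
    """Feed each prediction back into the input window to look n_steps ahead."""
    window = list(last_window)
    preds = []
    for _ in range(n_steps):
        nxt = model.predict(np.asarray(window).reshape(1, -1))[0]
        preds.append(nxt)
        window = window[1:] + [nxt]          # slide the window forward
    return np.asarray(preds)

# Usage (indicator is a 1-D array of NLLP values):
# l = 10
# X, y = build_training_group(indicator, l)
# svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
# future = iterated_forecast(svr, indicator[-(l + 1):], n_steps=50)
```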

The Principle of the Prediction Model

The model based on historical data captures the overall trend of the full life cycle but has poor real-time performance. The model based on online data captures the trend of the real-time data, but its long-term prediction accuracy is low. Therefore, a robust prediction model should combine the information of historical data and online data. For that purpose, two SVR prediction models, based on online data and historical data respectively, are used for residual life prediction, yielding two predicted results. The ultimate prediction is then obtained by taking a weighted average of the two results. The residual life is computed as follows:

$$RUL = \beta \cdot l_{1} + (1 - \beta ) \cdot l_{2},$$
(9)

where \(\beta\) is the weight of the SVR model based on historical data with \(\beta \in [0,1]\), \(l_{1}\) denotes the remaining life predicted by the historical data-based SVR model, and \(l_{2}\) stands for the remaining life predicted by the online data-based SVR model.

If a constant \(\beta\) were used directly, the real-time performance of the prediction would be poor. In addition, the accuracy of the two SVR prediction models differs across degradation states. Therefore, the LMS algorithm, a self-adaptive filtering algorithm, is used to adjust the weight \(\beta\) according to the predicted and actual values, so that the prediction accuracy can be improved by dynamic weighting. The weight update is given by

$$\beta_{t} = \beta_{t - 1} + \mu_{t} \cdot e_{t} \cdot x_{t},$$
(10)

where \(\beta_{t}\) is the weight at time \(t\), \(x_{t}\) is the indicator value at time \(t\), \(\mu_{t}\) is the step size with \(0 < \mu_{t} < 1/x_{t}\), and \(e_{t}\) is the relative prediction error of the two models, computed from the predicted indicator values and the actual value. Let \(\lambda_{t}^{1}\) and \(\lambda_{t}^{2}\) be the indicator values predicted by the two SVR models at time \(t\), and \(\lambda_{t}\) the actual value at time \(t\). Then \(e_{t}\) is computed by

$$e_{t} = 0.1 \cdot \left( {\frac{{\left| {\lambda_{t}^{1} - \lambda_{t} } \right|}}{{\left| {\lambda_{t}^{2} - \lambda_{t} } \right|}} - 1} \right).$$
(11)

Here, \(\left| \cdot \right|\) denotes the Euclidean distance.
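To make the fusion step concrete, the following sketch combines the two predicted lives as in formula (9) and updates the weight with the LMS rule of formulas (10) and (11). The clipping of \(\beta\) to \([0, 1]\) and the small constant guarding the denominator are implementation assumptions not stated in the text.

```python
# A minimal sketch of the weighted fusion of formula (9) with the LMS update
# of formulas (10)-(11); the clipping and the epsilon guard are assumptions.
import numpy as np

def lms_update(beta, mu_t, x_t, pred1, pred2, actual, eps=1e-12):
    """One LMS step adjusting the weight of the historical-data SVR model."""
    # Relative prediction error of the two models, formula (11)
    e_t = 0.1 * (abs(pred1 - actual) / (abs(pred2 - actual) + eps) - 1.0)
    # Weight update, formula (10); beta is kept inside [0, 1] (assumption)
    return float(np.clip(beta + mu_t * e_t * x_t, 0.0, 1.0))

def fused_rul(beta, l1, l2):
    """Weighted residual life, formula (9)."""
    return beta * l1 + (1.0 - beta) * l2

# Example of one monitoring step (all inputs are hypothetical quantities):
# beta = lms_update(beta, mu_t=0.01, x_t=nllp_t,
#                   pred1=lam1_t, pred2=lam2_t, actual=lam_t)
# rul = fused_rul(beta, l1_hist, l2_online)
```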

The adaptive prediction proceeds as follows: first, general features are extracted from the vibration signal, and the CHMM-based indicator is constructed using the health portion of these features. Then the SVR models based on online data and historical data are built. When an incipient fault occurs, residual life prediction is started, and the ultimate prediction result is computed according to formula (9). The prediction flow chart is shown in Fig. 1.

Fig. 1
figure 1

The flow chart of the adaptive prediction method

Experimental Verification

The whole life cycle data of the rolling element bearings are from the NASA website. The bearing type is Rexnord ZA-2115. The rotating speed is 2000 r/min and the sampling frequency is 20 kHz. Each data segment is 1 s long, and the interval between segments is 10 min. Five groups of data are used in this paper: four of them, treated as historical data, are used to train the SVR model, while the last group is used for residual life prediction. The prediction error is defined by

$$e = \frac{1}{n}\sum\limits_{r = 1}^{n} {\frac{{\left| {x_{r} - d_{r} } \right|}}{{d_{r} }}} \times 100\%,$$
(12)

where \(x_{r}\) stands for the predicted value at step \(r\) and \(d_{r}\) for the actual value at step \(r\). Four general fault features are used for condition monitoring: kurtosis, root mean square (RMS) value, peak-to-peak value, and peak value. The whole life cycle maps of the five groups of data are shown in Figs. 2, 3, 4, and 5.
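For reference, the sketch below computes the four monitoring features and the prediction error of formula (12); it assumes each record is available as a one-dimensional NumPy array of vibration samples, and the helper names are illustrative.

```python
# A hedged sketch of the four monitoring features and the error of formula
# (12), assuming 1-D NumPy arrays of vibration samples; names are illustrative.
import numpy as np
from scipy.stats import kurtosis

def extract_features(segment):
    """Kurtosis, RMS, peak-to-peak, and peak value of one vibration segment."""
    return np.array([
        kurtosis(segment, fisher=False),      # kurtosis
        np.sqrt(np.mean(segment ** 2)),       # root mean square value
        np.ptp(segment),                      # peak-to-peak value
        np.max(np.abs(segment)),              # peak value
    ])

def prediction_error(predicted, actual):
    """Mean relative error of formula (12), expressed as a percentage."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.mean(np.abs(predicted - actual) / actual) * 100.0
```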

Fig. 2
figure 2

The kurtosis map of the full life

Fig. 3
figure 3

The RMS map of the full life

Fig. 4
figure 4

The peak-to-peak map of the full life

Fig. 5
figure 5

The peak map of the full life

The Construction of CHMM-Based NLLP Indicator

We first use the health data to train the CHMM. As shown in Figs. 2, 3, 4, and 5, the health data can be taken from the first data point to the 1000th data point. The resulting CHMM-based NLLP indicator is shown in Figs. 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. As can be seen from these figures, the trend of the first 1000 data points is difficult to judge, whereas the trend of the CHMM-based NLLP indicator from the 1500th data point to the end is obvious. This behavior can also be seen in the enlarged views in Figs. 7, 9, 11, 13, and 15. By contrast, Figs. 2, 3, 4, and 5 show a sudden increase rather than a significant degradation trend. Moreover, the trend of the proposed indicator is smoother and shows fewer outliers. From the figures of the CHMM-based NLLP indicator, the incipient fault threshold can be chosen according to the red line; prediction is therefore started when the NLLP indicator reaches the incipient fault threshold of about 300.

Fig. 6
figure 6

The NLLP indicator of the first group of data

Fig. 7
figure 7

Partial enlarged detail map of the NLLP indicator using the first group of data

Fig. 8
figure 8

The NLLP indicator of the second group of data

Fig. 9
figure 9

Partial enlarged detail map of the NLLP indicator using the second group of data

Fig. 10
figure 10

The NLLP indicator of the third group of data

Fig. 11
figure 11

Partial enlarged detail map of the NLLP indicator using the third group of data

Fig. 12
figure 12

The NLLP indicator of the fourth group of data

Fig. 13
figure 13

Partial enlarged detail map of the NLLP indicator using the fourth group of data

Fig. 14
figure 14

The NLLP indicator of the fifth group of data

Fig. 15
figure 15

Partial enlarged detail map of the NLLP indicator using the fifth group of data

The Adaptive Prediction

To verify the validity of the proposed method, three other methods are used for comparison: the historical data-based SVR prediction method, the online data-based SVR prediction method, and the prediction method of [2]. Four groups of data, treated as historical data, are used to train the historical data-based SVR model, while the last group, treated as online data, is used to train the online data-based SVR model and for monitoring. Figures 16, 17, 18, and 19 show, respectively, the adaptive prediction map, the prediction map of the model based on historical data, the prediction map of the model based on real-time data, and the prediction map of the algorithm of Ref. [2]; the corresponding prediction errors are 0.251, 0.371, 0.382, and 0.384. In these figures, the predicted curve of the proposed method is the closest to the real-life curve. Therefore, the experimental results show that the proposed method is effective.

Fig. 16
figure 16

The prediction map of the proposed prediction method

Fig. 17
figure 17

The prediction map of the historical data-based SVR prediction method

Fig. 18
figure 18

The prediction map of the online data-based SVR prediction method

Fig. 19
figure 19

The prediction map of the method in [2]

Conclusion

To improve the prediction accuracy, a new method is proposed. First, a CHMM is used to construct a degradation indicator that clearly reveals the incipient fault and the degradation trend. Then SVR and LMS are used to build an adaptive prediction model composed of a historical data-based SVR model and an online data-based SVR model. The experimental results show that neither the historical data-based model nor the online data-based model alone provides high accuracy, while the proposed model, which combines historical and online data, achieves greater precision than the other methods.