1 Introduction

The prevalence of health and usage monitoring systems (HUMS) on aircraft in the past decades has fuelled the growing use of condition monitoring (CM) data in degradation modeling for health assessment of critical systems [1]. A wide variety of approaches to using CM data for degradation modeling have been comprehensively reviewed by different authors [1–4]. These data-driven approaches are broadly classified into three main categories, namely physics-based methods, artificial intelligence methods and statistical methods. Physics-based methods use a deterministic model of the system and can be very complex to develop. Artificial intelligence methods such as neural networks and support vector machines can handle highly non-linear problems, but they require a large amount of training data, which is often not available in practice. Amongst the three, statistical methods are the most widely used in industry, where conventional statistical process control and trend extrapolation are most commonly applied [3]. Advanced statistical methods such as hidden Markov models and cluster analysis can classify faults better but are not widely used in practice, again due to the unavailability of training data. Most of the applications in the literature used experimental or simulated data for model training, and little work has been done with fielded applications [4]. In this paper, the switching Kalman filter (SKF) is investigated for fault detection and remaining useful life (RUL) prediction of rolling element bearings, and it is applied to both simulated data and actual CM data gathered from the AH64D helicopter.

2 Literature Review

The Kalman filter is a stochastic filtering process which recursively estimates the state of a dynamic system in the presence of measurement noise and process noise by minimizing the mean squared error [4]. The Kalman filter has been widely applied in navigation and is also used in fields such as signal processing and econometrics. It requires less training data than other statistical and AI techniques, as it relies mainly on the individual system’s measurement data. However, the dynamical behavior of the system must be known and represented as a state-space model. In prognostic applications, the Kalman filter has been applied to predict the RUL of electrical connections [5] and electrolytic capacitors [6]. In these applications, the Kalman filter was used to adaptively track changes in the degradation process of the system, and the dynamical model describing the degradation process was assumed to be time invariant. However, the degradation process in components can be uncertain and can evolve over time, as seen in bearing wear tests [7]. For example, in Fig. 1, the vibration measurement of a serviceable bearing can be stationary with measurement noise. When slow, stable wear from damage such as surface pitting occurs, the vibration can gradually rise as a linear function. When the accumulated damage is severe and unstable, the vibration rises rapidly as a higher-order function.

Fig. 1 Evolution of degradation process across time

As such, a single dynamical model may not adequately represent the different degradation processes. Consequently, predictions can diverge or fluctuate depending on whether the degradation process is under- or over-fitted. This constraint is often seen in works [8, 9] where only measurements above an established threshold are considered in the analysis, as those below it do not behave according to the assumed dynamical model. For such problems, the SKF can track the dynamics of the degradation process as it changes. RUL prediction is then performed based on the most probable dynamical model representing the degradation process. To do this, the SKF consists of multiple linear state-space models, each like the basic Kalman filter, and it can switch between these models through a weighted combination across time. It is popularly used to track multiple moving targets but has also been applied in meteorology [10] and econometrics [11]. The SKF is applied here to track the different bearing degradation processes shown in Fig. 1. By tracking the dynamical behavior of the different degradation processes, fault detection can be performed without using pre-established detection thresholds. It also helps maintainers to predict RUL more accurately by distinguishing between stable and unstable wear and performing prediction only when unstable wear is detected.

3 Background

This section provides a brief review of the extended Kalman filter, the dynamic Bayesian network and the SKF, and of their application to fault detection and RUL estimation for rolling element bearings.

3.1 Extended Kalman Filter

As mentioned, the Kalman filter recursively estimates the state mean and covariance of a linear process by minimizing the mean squared error. The extended Kalman filter (EKF) is a non-linear extension which uses a linear approximation of the non-linear function to estimate the state mean and covariance [12, 13]. The linear approximation is most commonly obtained from a first- or second-order Taylor series expansion of the non-linear function, and the first-order expansion is adopted here. The discrete state-space model describing a non-linear process is given by:

$$ x_{t} = f\left( {x_{t - 1} } \right) + q_{t - 1} ,\,y_{t} = h\left( {x_{t} } \right) + r_{t} $$
(1)

where \( x_{t} \) is the true but hidden state of the system and \( y_{t} \) is the observable measurement of the state. f(.) is the function describing the system dynamics and h(.) is the measurement function; both are assumed to be continuously differentiable. \( q_{t - 1} \sim N\left( {0,Q_{t - 1} } \right) \) is the process noise and \( r_{t} \sim N\left( {0,R_{t} } \right) \) is the measurement noise. The EKF estimates the value of \( x_{t} \) given the measurement \( y_{t} \) by filtering out the noise. This is carried out using the ‘Prediction’ and ‘Update’ steps, also known as the Riccati equations [13], which are shown as follows.

Prediction Step:

$$ \begin{aligned} {\text{Predicted}}\,{\text{state}}\,{\text{estimate}}:\quad \hat{x}_{t} & = f\left( {x_{t - 1} ,t - 1} \right) \\ {\text{Predicted}}\,{\text{estimate}}\,{\text{covariance}}\,\,\hat{P}_{t} & = F\left( {x_{t - 1} ,t - 1} \right)P_{t - 1} F^{ '} \left( {x_{t - 1} ,t - 1} \right) + Q_{t - 1} \\ \end{aligned} $$
(2)

Update Step:

$$ \begin{aligned} {\text{Measurement}}\,\,{\text{residual}}:\quad v_{t} & = y_{t} - h\left( {\hat{x}_{t} ,t} \right) \\ {\text{Residual}}\,\,{\text{covariance}}:\quad C_{t} & = H\left( {\hat{x}_{t} ,t} \right)\hat{P}_{t} H^{ '} \left( {\hat{x}_{t} ,t} \right) + R_{t} \\ {\text{Kalman}}\,\,{\text{gain}}:\quad K_{t} & = \hat{P}_{t} H^{ '} \left( {\hat{x}_{t} ,t} \right)C_{t}^{ - 1} \\ {\text{Updated}}\,\,{\text{state}}\,\,{\text{estimate}}:\quad x_{t} & = \hat{x}_{t} + K_{t} v_{t} \\ {\text{Updated}}\,\,{\text{estimate}}\,\,{\text{covariance}}:\quad P_{t} & = \left( {I - K_{t} H\left( {\hat{x}_{t} ,t} \right)} \right)\hat{P}_{t} \\ \end{aligned} $$
(3)

where F(.) and H(.) are the Jacobians of f(.) and h(.), given by

$$ F\left( {x_{t - 1} ,t - 1} \right) = \left. {\frac{{\partial f\left( {x_{t - 1} ,t - 1} \right)}}{\partial x}} \right|_{{\hat{x}_{t - 1|t - 1} }} ,\,H\left( {\hat{x}_{t} ,t} \right) = \left. {\frac{{\partial h\left( {x_{t} ,t} \right)}}{\partial x}} \right|_{{\hat{x}_{t|t - 1} }} , $$
(4)
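The prediction and update steps in Eqs. (2) to (4) can be sketched for a scalar state as follows. This is a minimal illustration rather than the implementation used in this work: the function name `ekf_step`, its argument names and the numerical values in the usage lines are assumptions made for the example.

```python
import math


def ekf_step(x_prev, p_prev, y, f, f_jac, h, h_jac, q, r):
    """One EKF cycle for a scalar state (illustrative sketch of Eqs. (2)-(4)).

    f, h         : state transition and measurement functions
    f_jac, h_jac : their derivatives (scalar Jacobians)
    q, r         : process and measurement noise variances
    """
    # Prediction step, Eq. (2)
    x_pred = f(x_prev)
    jac_f = f_jac(x_prev)              # F evaluated at the previous estimate
    p_pred = jac_f * p_prev * jac_f + q

    # Update step, Eq. (3)
    jac_h = h_jac(x_pred)              # H evaluated at the predicted state
    v = y - h(x_pred)                  # measurement residual
    c = jac_h * p_pred * jac_h + r     # residual covariance
    k = p_pred * jac_h / c             # Kalman gain
    x_new = x_pred + k * v
    p_new = (1.0 - k * jac_h) * p_pred
    return x_new, p_new, v, c


# Usage with an exponential growth model and a known rate (assumed values).
B, DT = 0.04, 1.0
x1, p1, v1, c1 = ekf_step(
    x_prev=1.0, p_prev=1.0, y=1.1,
    f=lambda x: x * math.exp(B * DT), f_jac=lambda x: math.exp(B * DT),
    h=lambda x: x, h_jac=lambda x: 1.0,
    q=0.0, r=0.08 ** 2,
)
```

The residual `v` and its covariance `c` are returned alongside the estimate because a switching filter needs them to score each model.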

3.2 Switching Kalman Filter

The switching Kalman filter may be represented as a dynamic Bayesian network. In each time step, both the model switch variable, \( S_{t} \), and the state variable, \( x_{t} \), are hidden and have to be inferred from the observations, \( y_{t} \). For a system with multiple dynamics described by n Kalman filters, the size of the belief state increases exponentially with each time step to \( n^{t} \). As such, inferring the probability of every state at each time step becomes intractable. To overcome this problem, an approximation method, the Generalised Pseudo Bayesian (GPB) algorithm described in [12], was adopted. In each time step, the state and covariance estimates from all the filters in the previous time step are combined with weights assigned according to the mixing probabilities of the model switch variable, \( S_{t}^{i|j} \), and the model transition probability, \( Z_{ij} \), as shown in Eqs. (5) and (6).

$$ {\text{Model switching probabilities:}} \quad S_{t}^{i|j} = \frac{{Z_{ij} S_{t - 1}^{i} }}{{\mathop \sum \nolimits_{i = 1}^{n} Z_{ij} S_{t - 1}^{i} }} $$
(5)

Weighted state and covariance estimates:

$$ \tilde{x}_{t - 1}^{j} = \mathop \sum \limits_{i = 1}^{n} S_{t}^{i|j} x_{t - 1}^{i} ,\,\,\tilde{P}_{t - 1}^{j} = \mathop \sum \limits_{i = 1}^{n} S_{t}^{i|j} \left\{ {P_{t - 1}^{i} + \left[ {x_{t - 1}^{i} - \tilde{x}_{t - 1}^{j} } \right]\left[ {x_{t - 1}^{i} - \tilde{x}_{t - 1}^{j} } \right]^{ '} } \right\} $$
(6)
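The mixing step of Eqs. (5) and (6) can be illustrated for scalar filter states with the short sketch below. The function name `mix_estimates` and the list-based layout are assumptions for the example, and the spread term is taken about the mixed mean, which is the usual moment-matching form.

```python
def mix_estimates(probs_prev, means_prev, vars_prev, z):
    """Mix the previous scalar estimates of n filters (sketch of Eqs. (5)-(6)).

    probs_prev[i] : model probability S_{t-1}^i
    means_prev[i] : state estimate of filter i at t-1
    vars_prev[i]  : state variance of filter i at t-1
    z[i][j]       : probability of switching from model i to model j
    """
    n = len(probs_prev)
    mix_probs, mixed_means, mixed_vars = [], [], []
    for j in range(n):
        norm = sum(z[i][j] * probs_prev[i] for i in range(n))
        # Eq. (5): weight of filter i in the input to filter j
        w = [z[i][j] * probs_prev[i] / norm for i in range(n)]
        # Eq. (6): weighted mean, then variance including the spread of means
        m = sum(w[i] * means_prev[i] for i in range(n))
        p = sum(w[i] * (vars_prev[i] + (means_prev[i] - m) ** 2)
                for i in range(n))
        mix_probs.append(w)
        mixed_means.append(m)
        mixed_vars.append(p)
    return mix_probs, mixed_means, mixed_vars


# Usage: two models with sticky transitions (assumed values).
weights, means, variances = mix_estimates(
    [0.5, 0.5], [0.0, 2.0], [1.0, 1.0], [[0.9, 0.1], [0.1, 0.9]]
)
```

Each filter j then starts its own prediction and update from `means[j]` and `variances[j]` instead of from its own previous estimate.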

With the weighted state and covariance estimates, the usual Kalman filter shown in Eqs. (2) and (3) is carried out for each filter model, each yielding a predicted state, \( \hat{x}_{t}^{j} \), and covariance, \( \hat{P}_{t}^{j} \), estimate. The likelihood of each filter is then determined with Eq. (7) using its measurement residual, \( v_{t}^{i} \). The probability of each model at the current time step can then be obtained as shown in Eq. (8). The weighted state and covariance estimate update for the current time can also be determined using Eq. (9). A detailed description of the SKF is available in [14], and a good demonstration of the SKF with the use of GPB is shown in [15].

$$ {\text{Likelihood}}\,{\text{of}}\,{\text{filter}}\,{\text{from}}\,{\text{measurement}}\,{\text{residual:}}\quad L_{t}^{i} = N(v_{t}^{i} ;0,C_{t}^{i} ) $$
(7)

Probability of each model:

$$ S_{t}^{j} = \frac{{L_{t}^{j} \left( {\mathop \sum \nolimits_{i = 1}^{n} Z_{ij} S_{t - 1}^{i} } \right)}}{{\mathop \sum \nolimits_{k = 1}^{n} \left( {L_{t}^{k} \mathop \sum \nolimits_{i = 1}^{n} Z_{ik} S_{t - 1}^{i} } \right)}} $$
(8)

The weighted state and covariance estimate update are computed as follows:

$$ x_{t} = \mathop \sum \limits_{i = 1}^{n} S_{t}^{i} x_{t}^{i} ,\,P_{t} = \mathop \sum \limits_{i = 1}^{n} S_{t}^{i} \left\{ {P_{t}^{i} + \left[ {x_{t}^{i} - x_{t} } \right]\left[ {x_{t}^{i} - x_{t} } \right]^{ '} } \right\} $$
(9)
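The model probability update and the collapse of the filter outputs in Eqs. (7) to (9) can be sketched for scalar states as follows. The function names are hypothetical, and the index convention assumed here is that `z[i][j]` is the probability of moving from model i to model j.

```python
import math


def gaussian_pdf(v, c):
    """Density of N(0, c) evaluated at v, used as the likelihood in Eq. (7)."""
    return math.exp(-0.5 * v * v / c) / math.sqrt(2.0 * math.pi * c)


def update_model_probs(probs_prev, z, residuals, res_vars):
    """Posterior model probabilities (sketch of Eq. (8))."""
    n = len(probs_prev)
    # Predicted probability of each model before seeing the measurement
    prior = [sum(z[i][j] * probs_prev[i] for i in range(n)) for j in range(n)]
    like = [gaussian_pdf(residuals[j], res_vars[j]) for j in range(n)]
    unnorm = [like[j] * prior[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def collapse(probs, means, variances):
    """Probability-weighted state and variance (sketch of Eq. (9))."""
    x = sum(s * m for s, m in zip(probs, means))
    p = sum(s * (v + (m - x) ** 2) for s, m, v in zip(probs, means, variances))
    return x, p


# Usage: the model with the smaller residual gains probability (assumed values).
posterior = update_model_probs(
    [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], residuals=[0.0, 2.0], res_vars=[1.0, 1.0]
)
x_comb, p_comb = collapse([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
```

If every likelihood underflows to zero, `total` becomes zero and the normalization fails, which is one reason to keep all prior probabilities strictly positive.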

4 SKF Formulation for Tracking Varying Degradation Processes

In this analysis, it is assumed that component degradation is monotonically increasing and that it evolves from normal operation to stable wear and then to unstable wear. For bearings, a linear, polynomial or exponential model is used to describe the different trends in the vibration-based degradation measure [16–18]. A Kalman filter is built for each of them, and they are used together in the SKF. For the exponential filter, the extended Kalman filter is applied due to its non-linear form. The state transition \( F_{i} (.) \) is obtained from the Jacobian of the state equations using Eq. (4). It is assumed that the process noise entering the system consists only of zero-mean white noise \( q_{a} \) and \( q_{b} \), which models the wear rate parameters \( a_{t} \) and \( b_{t} \) stochastically. The state, transition and process noise covariance for each filter are shown below, with subscripts 1, 2 and 3 denoting the zero-order, first-order and exponential Kalman filters respectively.

Zero Order polynomial model (Normal Operation)

$$ \begin{aligned} {\text{State}} & :x_{t} = x_{t - 1} \\ {\text{State}}\,\,{\text{Transition}} & :F_{1,t} = 1 \\ {\text{Process}}\,\,{\text{Noise}} & :Q_{1,t} = 0 \\ {\text{Measurement}} & :y_{t} = x_{t} + r_{t} ,\,H_{1,t} = 1 \\ \end{aligned} $$
(10)

1st Order polynomial model (Stable Wear)

$$ \begin{aligned} {\text{State}} & :x_{t} = x_{t - 1} + a_{t - 1}\Delta t,\,a_{t} = a_{t - 1} + q_{a} \\ {\text{State}}\,\,{\text{Transition}} & :F_{2,t} = \left[ {\begin{array}{*{20}c} 1 & {\Delta t} \\ 0 & 1 \\ \end{array} } \right] \\ {\text{Process}}\,\,{\text{Noise}} & :Q_{2,t} = \left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & {q_{a} } \\ \end{array} } \right] \\ {\text{Measurement}} & :y_{t} = x_{t} + r_{t} ,\,H_{2,t} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ \end{array} } \right]^{ '} \\ \end{aligned} $$
(11)

Exponential model (Unstable Wear)

$$ \begin{aligned} {\text{State}} & :x_{t} = x_{t - 1} e^{{b_{t - 1}\Delta t}} ,\,b_{t} = b_{t - 1} + q_{b} \\ {\text{State}}\,\,{\text{Transition}} & :F_{3,t} = \left[ {\begin{array}{*{20}c} {e^{{b_{t - 1}\Delta t}} } & {x_{t - 1}\Delta te^{{b_{t - 1}\Delta t}} } \\ 0 & 1 \\ \end{array} } \right] \\ {\text{Process}}\,\,{\text{Noise}} & :Q_{3,t} = \left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & {q_{b} } \\ \end{array} } \right] \\ {\text{Measurement}} & :y_{t} = x_{t} + r_{t} ,\,H_{3,t} = \left[ {\begin{array}{*{20}c} 1 & 0 \\ \end{array} } \right]^{ '} \\ \end{aligned} $$
(12)

Model transition matrix:

$$ Z = \left[ {\begin{array}{*{20}c} {0.99} & {0.005} & {0.005} \\ {\sim 0} & {0.99} & {0.01} \\ {\sim 0} & {\sim 0} & {\sim 1} \\ \end{array} } \right] $$
(13)
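As a quick check on the state transition of the exponential model, the analytic Jacobian \( F_{3,t} \) can be compared with a finite-difference approximation of Eq. (4). The sketch below is only illustrative; the function names and the evaluation point are assumptions.

```python
import math


def f_exp(x, b, dt):
    """Noise-free state equations of the exponential model: [x_t, b_t]."""
    return [x * math.exp(b * dt), b]


def jacobian_exp(x, b, dt):
    """Analytic Jacobian of f_exp with respect to [x, b]."""
    growth = math.exp(b * dt)
    return [[growth, x * dt * growth],
            [0.0, 1.0]]


def numerical_jacobian(x, b, dt, eps=1e-6):
    """Central-difference Jacobian of f_exp, used only as a cross-check."""
    cols = []
    for dx, db in ((eps, 0.0), (0.0, eps)):
        plus = f_exp(x + dx, b + db, dt)
        minus = f_exp(x - dx, b - db, dt)
        cols.append([(p - m) / (2.0 * eps) for p, m in zip(plus, minus)])
    # cols[k][r] is the derivative of output r with respect to state k
    return [[cols[0][0], cols[1][0]],
            [cols[0][1], cols[1][1]]]


# Evaluation point chosen arbitrarily for the comparison.
analytic = jacobian_exp(1.5, 0.04, 1.0)
numeric = numerical_jacobian(1.5, 0.04, 1.0)
max_diff = max(abs(analytic[r][c] - numeric[r][c])
               for r in range(2) for c in range(2))
```

A small `max_diff` indicates that the hand-derived entries agree with the state equations at that point.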

Initial model probabilities, state and covariance estimate:

$$ S_{0} = \left[ {\begin{array}{*{20}c} {0.98} & {0.01} & {0.01} \\ \end{array} } \right],\,x_{0} = y_{0} ,\,a_{0} = 0,\,b_{0} = 0,\,P_{0} = I $$
(14)

For the SKF, the model transition matrix Z is set such that the system tends to remain in its own state, with \( Z_{ii} \sim 1 \). It is also assumed that the degradation can only progress, i.e. from normal to stable and then unstable degradation, but not the reverse. However, \( Z_{ij} \) is assigned a value of approximately zero rather than exactly zero for i > j, as a value of zero can cause underflow problems in Eq. (8) when implemented in software. The initial model probability, \( S_{0} \), is set with a high probability that the system is in normal condition. The initial state estimate, \( x_{0} \), is initialized to the first measurement, and the initial parameters \( a_{0} \) and \( b_{0} \) are zero. The initial covariance matrix, \( P_{0} \), is set arbitrarily to an identity matrix, I.
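One way to set up such a transition matrix in software is sketched below. The small constant `eps` standing in for the near-zero entries and the row renormalization are assumptions for this example; the text does not state the value actually used.

```python
def build_transition_matrix(stay=0.99, eps=1e-10):
    """Model transition matrix with near-zero backward transitions.

    Rows are the current model (normal, stable wear, unstable wear) and
    columns are the next model. eps replaces exact zeros so that no model
    probability can become exactly zero.
    """
    raw = [
        [stay, (1.0 - stay) / 2.0, (1.0 - stay) / 2.0],
        [eps, stay, 1.0 - stay],
        [eps, eps, 1.0],
    ]
    # Renormalize so that every row is a valid probability distribution
    return [[value / sum(row) for value in row] for row in raw]


Z = build_transition_matrix()
S0 = [0.98, 0.01, 0.01]  # initial model probabilities from the text
```

Renormalizing changes the large entries only in roughly the tenth decimal place, so the matrix still matches the stated values to the precision given.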

5 Diagnostics of Evolving Degradation Processes Using Simulated Data

The SKF approach to tracking the degradation processes is demonstrated here using simulated data. Figure 2 shows two different evolving degradation processes: (1) normally operating to unstable wear at t = 150 h and (2) normally operating to stable wear at t = 100 h and then unstable wear at t = 200 h. The simulated degradation measurements are generated using the measurement equations from Eqs. (10–12). An additive measurement noise, \( r\sim N\left( {0,0.08^{2} } \right) \), is added to all three processes. For stable wear, a wear rate parameter, a = 0.01, is adopted with process noise, \( q_{a} \sim N\left( {0,0.001^{2} } \right) \). For unstable wear, a wear rate parameter, b = 0.04, is adopted with process noise, \( q_{b} \sim N\left( {0,0.004^{2} } \right) \).
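A simulation of this kind might be set up as in the sketch below. The function name `simulate`, the initial level `x0`, the 300 h duration, the 1 h time step and the random seed are assumptions, since they are not specified in the text; only the wear rates and noise levels are taken from it.

```python
import math
import random


def simulate(n_hours, t_stable, t_unstable, a=0.01, b=0.04, x0=1.0,
             sigma_r=0.08, sigma_a=0.001, sigma_b=0.004, dt=1.0, seed=0):
    """Generate noisy degradation measurements with model switches.

    t_stable / t_unstable are the onset times in hours (None to skip a phase).
    """
    rng = random.Random(seed)
    x, a_t, b_t = x0, a, b
    measurements = []
    for t in range(n_hours):
        if t_unstable is not None and t >= t_unstable:
            x = x * math.exp(b_t * dt)       # exponential (unstable) wear
            b_t += rng.gauss(0.0, sigma_b)   # random walk on the rate b
        elif t_stable is not None and t >= t_stable:
            x = x + a_t * dt                 # linear (stable) wear
            a_t += rng.gauss(0.0, sigma_a)   # random walk on the rate a
        # otherwise normal operation: the state stays constant
        measurements.append(x + rng.gauss(0.0, sigma_r))
    return measurements


scenario_1 = simulate(300, None, 150)   # normal -> unstable at 150 h
scenario_2 = simulate(300, 100, 200)    # normal -> stable -> unstable
```

Fixing the seed makes the generated series repeatable, which helps when comparing detection times across filter settings.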

Fig. 2 Simulated degradation processes with measurement and process noise: (1) normally operating to unstable wear at t = 150 h and (2) normally operating to stable wear at t = 100 h and unstable wear at t = 200 h

The ideal case, where the dynamical models of the degradation processes and their measurement and process noise are known, is shown here. Figures 3 and 4 show the SKF results in tracking the evolving degradation processes. It can be seen that the SKF is able to track and estimate the most probable degradation process well using the dynamical behavior of the measurement. For normal to unstable wear, the SKF detects the change at 158 h compared to the actual 150 h. For normal to stable and then unstable wear, the SKF detects the changes at 116 h and 208 h compared to the actual 100 h and 200 h respectively. The SKF lags behind the actual transition times as it performs the estimation in real time and requires adequate measurements from the new dynamical process. In addition, it can estimate the wear rate parameters, a and b, well at ~0.001 and ~0.04. It should be noted that the estimates will not converge to the exact parameter values due to the inherent noise added to the measurements.

Fig. 3 Normal to unstable wear: (top left) filtered state and most probable model, (bottom left) model probabilities, (top and bottom right) estimated parameters \( a_{t} \) and \( b_{t} \)

Fig. 4 Normal to stable and unstable wear: (top left) filtered state and most probable model, (bottom left) model probabilities, (top and bottom right) estimated parameters \( a_{t} \) and \( b_{t} \)

6 Case Study on AH64D Helicopter Tail Rotor Gearbox Bearing

The SKF approach is applied to vibration CM data from the AH64D Tail Rotor Gearboxes (TRGB) in a practical scenario. The bearing CM data and the results from the SKF are shown in Fig. 5. The measurement error, r = 3.2e−4, is obtained by taking the variance of the stationary measurements when the TRGB is in good condition, and this can vary between individual gearboxes. The process error, \( q_{s} \), captures the uncertainty of the filters in modeling the real world [13]. It is obtained by tuning the SKF model with similar defect cases and is assumed to be the same across gearbox bearings. The SKF formulations are applied with \( q_{s} \) set initially to a small percentage of the measurement error, R. The SKF model is then applied to the CM data and \( q_{s} \) is tuned until the model is acceptably consistent yet responsive to changes in the degradation processes. In this study, \( q_{s} \) = 5e−8 is obtained by tuning the model using CM data from other TRGBs with similar failures. From Fig. 5, it can be seen that the SKF can adaptively track the different bearing degradation processes with the process noise tuned from other gearboxes. However, when the CM measurements are not increasing monotonically at ~200 h, the SKF takes longer to converge. Instead of relying on the absolute value of the CM measurements, the SKF uses the dynamic behavior between the current and past measurements to diagnose the degradation state. Therefore, it does not depend on fixed thresholds, which are typically derived from statistical evaluation of large numbers of past failure cases. Another key advantage of this technique for diagnosis is that it provides the probability of each degradation process that the bearing may be in. In comparison, the widely used statistical process control (SPC) approach only triggers when the measurement is above a statistical limit, and no further information is available. The quantitative probability measure from the SKF gives maintenance engineers more support, as the probabilities of the bearing conditions can be compared in the event of an outlier measurement.
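The noise initialization described above might look like the following sketch. The helper names, the sample values and the starting fraction of 0.1 % are hypothetical; the text states only that the measurement variance comes from healthy stationary data and that the process noise starts as a small percentage of it before tuning.

```python
import statistics


def estimate_measurement_variance(healthy_measurements):
    """Sample variance of stationary measurements from a healthy gearbox."""
    return statistics.variance(healthy_measurements)


def initial_process_noise(measurement_variance, fraction=1e-3):
    """Starting value for the process noise before manual tuning.

    The fraction is a hypothetical choice, not a value from the study.
    """
    return fraction * measurement_variance


# Made-up healthy readings, for illustration only.
healthy = [0.101, 0.118, 0.095, 0.132, 0.087, 0.110]
r_hat = estimate_measurement_variance(healthy)
q_start = initial_process_noise(r_hat)
```

Because the healthy-state variance differs between gearboxes, `r_hat` would be recomputed per unit, while the tuned process noise is shared.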

Fig. 5 TRGB CM data: (top left) filtered state and most probable model, (bottom left) model probabilities, (top and bottom right) estimated parameters \( a_{t} \) and \( b_{t} \)

7 Prediction of Remaining Useful Life

The SKF infers the most probable dynamic model to be applied at each time step for prediction, and the RUL of the bearing is predicted whenever unstable wear is detected. The RUL is predicted by propagating the weighted state and covariance estimates obtained from Eq. (9) at each time step using Eq. (2) and determining the time at which the degradation state crosses the failure threshold. The α–λ metric [19] is applied to evaluate the prognostic performance, as shown in Fig. 6.
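A simplified version of this propagation, for the exponential model only, is sketched below. It steps the state mean forward and ignores the covariance, so it yields a point estimate without confidence bounds; the function name, the threshold value and the step cap are assumptions for the example.

```python
import math


def predict_rul(x, b, threshold, dt=1.0, max_steps=100000):
    """Hours until the exponential model first reaches the failure threshold.

    x, b      : current state and wear-rate estimates
    threshold : failure threshold on the degradation measure (assumed known)
    """
    if x >= threshold:
        return 0.0
    if b <= 0.0:
        return math.inf          # no growth, so the threshold is never reached
    steps = 0
    while x < threshold and steps < max_steps:
        x *= math.exp(b * dt)    # noise-free prediction, as in Eq. (2)
        steps += 1
    return steps * dt


# Usage with an arbitrary threshold of twice the current level.
rul_hours = predict_rul(x=1.0, b=0.04, threshold=2.0)
```

Propagating the covariance alongside the mean would give the confidence bounds on the crossing time that a point estimate cannot.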

Fig. 6 α–λ performance metric using 30 % accuracy bounds

The α–λ metric compares the actual RUL to the predicted RUL with converging α bounds that provide an accuracy region. The α bounds are application specific, and a prediction is correct if it falls within them. From Fig. 6, the prognostic algorithm performs well, as its accuracy improves quickly with time and stays within the 30 % bounds. However, there are points on the RUL trajectory that lie outside the accuracy zone towards the end of useful life, a behavior reportedly observed in [20] as well. This behavior could be attributed to unsteady vibration levels as the accumulated damage in the bearing becomes sizeable, and it could perhaps be addressed by lowering the failure threshold limit. Besides the RUL estimate, most of the lower confidence bounds, which are important for a conservative RUL prediction, are close to the lower 30 % accuracy bound as well.
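The accuracy check behind the α bounds can be sketched as follows, assuming the bounds are taken as ±α of the true RUL at each prediction time. The function names and the sample values are hypothetical and are not results from this study.

```python
def within_alpha_bounds(predicted_rul, true_rul, alpha=0.3):
    """True if the prediction lies within +/- alpha of the true RUL."""
    lower = (1.0 - alpha) * true_rul
    upper = (1.0 + alpha) * true_rul
    return lower <= predicted_rul <= upper


def alpha_accuracy(predicted, actual, alpha=0.3):
    """Fraction of predictions that fall inside the alpha bounds."""
    hits = [within_alpha_bounds(p, a, alpha) for p, a in zip(predicted, actual)]
    return sum(hits) / len(hits)


# Made-up predictions against a true RUL of 100 h, for illustration only.
score = alpha_accuracy([80.0, 60.0, 120.0, 140.0], [100.0] * 4)
```

Because the bounds scale with the true RUL, they narrow in absolute terms as the end of life approaches, which is why late predictions are harder to keep inside them.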

8 Conclusion

In this study, the SKF is applied to fault detection and RUL estimation. The method is applied to both simulated data and actual helicopter gearbox bearing data with promising results. The SKF model allows degradation processes to evolve through time, and the underlying dynamical process is inferred accordingly. The advantages of this approach are that it does not depend on a fixed threshold for fault detection and that it can model the different degradation processes as they evolve. This approach also provides maintainers with more information for decision-making, as a probabilistic measure of the state of bearing degradation is available. The prognostic performance metric showed that the RUL estimates have high accuracy when the degradation process is inferred to be unstable. This in turn can give maintainers higher confidence in the predicted RUL for maintenance planning. A drawback of this method is that it requires frequent acquisition of measurements for the filter estimation, which may not be readily available in practice.