1 Introduction

Because of the growing demands on equipment availability, performance, and maintenance, the scientific community has been developing methods for forecasting failures and for estimating the Remaining Useful Life (RUL) in order to schedule Condition-Based Maintenance (CBM) and Predictive Maintenance (PM). The National Aeronautics and Space Administration (NASA) was among the first to work on prognosis, because in the aerospace field, prognosis of failure can avoid catastrophes. Performance evaluation is a key element in fault prognosis, and several methods have been proposed, based on different evaluation criteria. The work presented in [63, 100, 101] moves toward a standardization of these criteria and proposes performance metrics applicable to different methods of fault prognosis. These metrics make it possible, on the one hand, to establish design requirements by quantifying acceptable performance limits and, on the other hand, to compare different methods. Reference [101] presents a structured synthesis of the metrics used to evaluate the performance of fault prognosis methods across different application domains, including Prognosis Horizon (PH), Alpha-Lambda Performance, Relative Accuracy (RA), and Convergence Rate. Fault prognosis methods are compared with respect to these metrics.
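To make two of the metrics named above concrete, the following minimal sketch assumes their commonly used forms: Relative Accuracy as one minus the relative RUL error, and the Alpha-Lambda check as a test that the prediction lies within ±α of the true RUL at a given instant. The function names and the default α are hypothetical choices for illustration and are not taken from [101].

```python
def relative_accuracy(true_rul, predicted_rul):
    """RA = 1 - |true - predicted| / true; 1.0 means a perfect prediction."""
    if true_rul <= 0:
        raise ValueError("true_rul must be positive")
    return 1.0 - abs(true_rul - predicted_rul) / true_rul


def within_alpha_bounds(true_rul, predicted_rul, alpha=0.2):
    """Alpha-Lambda style check at one time instant:
    is the prediction inside [(1 - alpha), (1 + alpha)] * true RUL?"""
    return (1.0 - alpha) * true_rul <= predicted_rul <= (1.0 + alpha) * true_rul


# Example: true RUL of 100 cycles, predicted 90 cycles.
ra = relative_accuracy(100.0, 90.0)
ok = within_alpha_bounds(100.0, 90.0, alpha=0.2)
```

Both functions evaluate a single prediction instant; a full evaluation would repeat them along the life of the unit.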

From the methodological viewpoint, several classifications of fault prognosis methods are proposed, like the pyramidal classification proposed in [19], which provides a classification into three approaches: expert approaches, physical model-based approaches, and data-driven approaches. The originality of this classification is related to the fact that these approaches are positioned in a pyramidal organization chart according to the scope of application, cost, and complexity of each approach. The evolution of hardware and software resources for data acquisition, storage, and processing has favored the widening of the application scope and the accuracy of the data-driven methods of fault prognosis. In addition, hybrid approaches have emerged to benefit from the combination of these approaches [35, 36].

This paper focuses on fault prognosis with a horizontal approach, which offers the advantage of relating fault diagnosis and prognosis. Unlike other existing reviews [45, 57, 103, 108], which focus only on RUL prediction, this paper reviews approaches that include fault diagnosis and that allow RUL estimation even when the only available data relate to normal operation. Not all health indices (HIs) generated by methods initially used for fault diagnosis are usable for failure prognosis. Therefore, methods for evaluating the properties that an HI must satisfy to be usable for RUL estimation, namely Monotonicity, Trendability, and Prognosability [10, 24], are presented and then used to assess whether the HIs generated by fault diagnosis methods are usable for failure prognosis. The methods are presented in their basic version to facilitate the understanding of the ideas and the formal analysis, followed by indications and references to their extensions for particular practical cases.
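As a rough illustration of the three HI properties, the sketch below assumes definitions that are often used for them: monotonicity from the sign balance of the HI increments, prognosability from the spread of the final HI values across units, and trendability from the smallest absolute correlation between units. These formulas are assumptions made for illustration; the definitions in [10, 24] may differ in detail. The trendability function also assumes that all runs have the same length.

```python
import numpy as np


def monotonicity(hi):
    """|#positive increments - #negative increments| / (n - 1), in [0, 1]."""
    d = np.diff(np.asarray(hi, dtype=float))
    return abs(np.sum(d > 0) - np.sum(d < 0)) / (len(hi) - 1)


def prognosability(hi_runs):
    """exp(-std(final values) / mean|initial - final|) over several units."""
    starts = np.array([run[0] for run in hi_runs], dtype=float)
    finals = np.array([run[-1] for run in hi_runs], dtype=float)
    return float(np.exp(-np.std(finals) / np.mean(np.abs(starts - finals))))


def trendability(hi_runs):
    """Smallest absolute correlation between any two runs (equal lengths assumed)."""
    corr = np.corrcoef(np.asarray(hi_runs, dtype=float))
    return float(np.min(np.abs(corr)))


# Two hypothetical run-to-failure HI trajectories.
runs = [[0.0, 1.0, 2.0], [0.0, 2.0, 4.0]]
scores = (monotonicity(runs[0]), prognosability(runs), trendability(runs))
```

Values close to 1 indicate an HI that is better suited to trend modeling.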

As illustrated in Fig. 8.1, the structure of the horizontal approaches has two main parts: HI generation based on condition monitoring and HI trend modeling for RUL estimation. Correspondingly, this paper is organized as follows: the studied framework and definitions are presented in Sect. 8.2. Section 8.3 is devoted to formal description of methods for the definition of HIs and an analysis of their use for the estimation of the RUL. Then, the techniques for the estimation of the RUL by trend modeling are presented in Sect. 8.4, with an analysis of their complexity and performance. The purpose is to provide an overview of existing techniques and guidance for choosing approaches according to the field of application and available knowledge (physical knowledge, expert knowledge, data-driven). In the horizontal approach, the methods used for generating the HIs can be completely different from the method used to model trends for RUL estimation. For this reason, in this work, we have opted for a separate classification of the two parts: a classification of the methods used for the generation of HIs and a classification of the HIs trend modeling techniques for RUL estimation. In both parts, metrics are proposed for performance evaluation.

Fig. 8.1 Structure of the horizontal approach of fault prognosis

2 Study Framework

By definition, a fault is an unauthorized and unexpected deviation from the normal condition, whereas a degradation refers to the deterioration of performance in an irreversible manner. Degradation becomes failure when performance falls below a critical threshold defined in the functional specification of the equipment: the system is no longer able to perform the required function. According to the international standard (ISO 13381-1:2004), fault prognosis is defined as the estimation of the Remaining Useful Life (RUL) or the End of Life (EoL), and the estimation of the risk of subsequent development or existence of one or more faulty modes. However, in the literature, the definition of the fault prognosis concept is adapted to the context, the objectives, and the field of application. These interpretations include:

  • Wang et al. [114]: In the industrial and manufacturing areas, prognosis is interpreted to answer the question: what is the RUL of a machine or a component once an impending failure condition is detected and identified.

  • Mathur et al. [78]: Prognosis is an assessment of the future health.

  • Lebold et al. [64]: Prognostics is the ability to perform a reliable and sufficiently accurate prediction of the RUL of equipment in service. The primary function of prognostics is the projection into the future of the current health state of equipment, taking into account the estimate of future usage profiles.

  • Byington et al. [19]: Prognostics is the ability to predict the future condition of a machinery based on the current diagnostic state of the machinery and its available operating and failure history data.

  • Jardine et al. [57]: Prognostics deals with fault prediction before it occurs. Fault prediction is a task to determine whether a fault is impending and estimate how soon and how likely a fault will occur.

  • Muller et al. [87]: Prognostics is the ability to predict the future state of an item from its present, its past, its degradation laws, and the maintenance actions to be investigated.

In recent publications, the notion of prognosis is increasingly associated with the estimation of the RUL:

  • Tobon et al. [109]: Fault prognostics can be defined as the prediction of when a failure might take place.

  • Gucik-Derigny [50]: The prognosis consists in predicting the evolution of the future state of health of a system and estimating the remaining lifetime of a system before one or more failures appear on the system.

  • Singleton et al. [104]: Effective diagnostic and prognostic tools are essential for timely fault detection and Remaining Useful Life prediction.

  • Sun et al. [107]: Prognostics usually focuses on the prediction of the failure time or the Remaining Useful Life of a system or component in service by analysis of data collected from sensors.

  • Lee et al. [66]: Prognostics can be interpreted as the process of health assessment and prediction, which includes detecting incipient failures and predicting RUL.

  • Lim et al. [69]: Prognostics is the analysis of the symptoms to predict future conditions and Remaining Useful Life.

It can be noticed that the references cited above define prognosis as the prediction of the RUL based on an analysis of the condition monitoring data and the current state of the system.

2.1 Formal Definitions of the RUL

The RUL is sometimes also called Remaining Service Life, Residual Life, or Remnant Life [57], and refers to the time left before observing a failure given the current machine age and condition, and the past and future operation profile. In [103], the RUL of an asset or system is defined as the time-span from the current time to the end of the useful life. In [105], the RUL at any time t is defined as the remaining lifetime of a unit given that it is running at time t and given all the available information related to the unit at time t. Two main mathematical definitions of the RUL can be found in the literature, depending on the method used for estimating this quantity and depending on the available information: a definition of the RUL as a function of the condition monitoring (CM) and a definition of the RUL as a function of the reliability function (RF).

2.1.1 Definition of the RUL as a Function of CM

The definitions of the RUL given above are in agreement with the formal definition given in Jardine et al. [57], where the RUL is defined as a function of the CM of the system (Z(t)), which gathers all the prior knowledge on the past operating state of the system as well as the co-variables that describe its current operating state, and is expressed as follows:

$$\displaystyle \begin{aligned} {\mathrm{RUL}}({t}|{Z_t}) = T - t\left| {T > t,\,Z\left( t \right)} \right. \end{aligned} $$
(8.1)

with:

  • T: random variable of time to failure

  • t: current age

  • Z(t): past condition profile up to the current time.

This expression is illustrated in Fig. 8.2. According to the knowledge in Z(t), the RUL can be calculated as a deterministic variable, a statistical variable (an expectation), or a probabilistic variable (a probability density function). The dashed gray envelope defines the margin of uncertainty about future operating conditions and the system environment.
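A minimal sketch of Eq. (8.1) in its statistical form is given below. It assumes that a sample of failure times T is available (for example from a fleet of similar units) and it conditions only on survival up to t; the condition profile Z(t) is deliberately ignored, which is a simplification. The function name and the percentile band are illustrative choices.

```python
import numpy as np


def empirical_rul(failure_times, t):
    """Estimate E[T - t | T > t] and a 5%-95% band from sampled failure times."""
    T = np.asarray(failure_times, dtype=float)
    survivors = T[T > t]  # keep only units still running at age t
    if survivors.size == 0:
        raise ValueError("no sampled unit survives beyond t")
    residual = survivors - t
    return residual.mean(), np.percentile(residual, [5, 95])


# Hypothetical failure ages; the unit under study has reached age 8.
mean_rul, band = empirical_rul([5.0, 10.0, 20.0, 30.0], t=8.0)
```

The mean corresponds to the expectation view of the RUL, while the band is a crude stand-in for the probabilistic view.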

Fig. 8.2 Illustration of RUL

2.1.2 Definition of the RUL as a Function of RF

In [8, 9], it is stated that information from condition monitoring can be included in reliability analysis by considering the hazard rate function as a probabilistic function. There are several methods for calculating the conditional and unconditional reliability functions (RFs) and for computing the Remaining Useful Life (RUL) as a function of the current conditions. In classical reliability, the RF is calculated mainly for two cases: as an unconditional RF, assuming that the item has not yet been put into operation (P(T > t)), and as a conditional RF, assuming that the item has not yet failed up to some time x (P(T > t|T > x)).

Assuming that the system is operating at time t, the RUL is expressed in [112] as the largest time v for which the probability that the state of the system Z has reached the failure threshold at time t + v, noted P[Z(t + v) ≥ L|Z(t)], does not exceed a probability of failure q assumed to be known. The RUL expression is given as follows:

$$\displaystyle \begin{aligned} {\mathrm{RUL}}(t, q) = \sup \{v : P[Z(t + v) \ge L|Z(t)] \le q\} \end{aligned} $$
(8.2)

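where L is the failure threshold and P[Z(t + v) ≥ L|Z(t)] is the conditional probability that the system has reached the failure threshold at time t + v, that is, the complement of the reliability of the system.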
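The sketch below evaluates Eq. (8.2) for one assumed degradation model: Z(t + v) is taken as Gaussian with mean Z(t) + drift·v and variance σ²·v. This model, the parameter names, and the bisection search are assumptions made for illustration and are not taken from [112]. With a positive drift and Z(t) < L, the failure probability increases with v, so the supremum is the point where it crosses q.

```python
import math


def failure_probability(v, z_t, L, drift, sigma):
    """P[Z(t+v) >= L | Z(t)=z_t] for Z(t+v) ~ Normal(z_t + drift*v, sigma^2 * v)."""
    if v <= 0.0:
        return 1.0 if z_t >= L else 0.0
    margin = (L - z_t - drift * v) / (sigma * math.sqrt(v))
    return 0.5 * math.erfc(margin / math.sqrt(2.0))


def rul_quantile(z_t, L, drift, sigma, q, tol=1e-9):
    """Largest v such that the failure probability stays at or below q."""
    if drift <= 0.0 or sigma <= 0.0 or not 0.0 < q < 1.0:
        raise ValueError("requires drift > 0, sigma > 0 and 0 < q < 1")
    if z_t >= L:
        return 0.0
    hi = 1.0
    while failure_probability(hi, z_t, L, drift, sigma) <= q:
        hi *= 2.0  # expand until the failure probability exceeds q
    lo = 0.0
    while hi - lo > tol:  # bisection on the monotone failure probability
        mid = 0.5 * (lo + hi)
        if failure_probability(mid, z_t, L, drift, sigma) <= q:
            lo = mid
        else:
            hi = mid
    return lo


rul_median = rul_quantile(z_t=2.0, L=10.0, drift=0.5, sigma=1.0, q=0.5)
rul_safe = rul_quantile(z_t=2.0, L=10.0, drift=0.5, sigma=1.0, q=0.05)
```

Choosing a smaller q yields a shorter, more conservative RUL than the median value (L − Z(t))/drift.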

3 Health Indices Definition Methods

Existing methods of fault diagnosis have been reviewed in recent years, for example in [45, 46, 108]. The most recent review is that of Gao et al. [45], in which the authors propose a first classification of fault diagnosis methods into two groups: hardware redundancy-based fault diagnosis and analytical redundancy-based fault diagnosis. The fault diagnosis techniques based on analytical redundancy are structured in five classes according to the mathematical tools and prior knowledge used: model-based fault diagnosis, signal-based fault diagnosis, knowledge-based fault diagnosis, hybrid fault diagnosis, and active fault diagnosis. In this work, the methods for defining health indices are gathered in two classes: physical model-based methods, and data-driven and signal processing methods. The focus is on methods that can be used for the generation of health indices that are sensitive to progressive degradation and whose trend can be modeled for RUL estimation. The presented methods of HI generation are summarized in Table 8.1.

Table 8.1 Analysis table of HI generation methods

3.1 Physical Model-Based Methods

These methods are based on a physical representation of the process. They require a good understanding of the behavior of the system but do not require the availability of data on the operation of the system in degraded modes. Physical models are usually described by partial differential equations [3, 67] or state representation equations [72, 98]. Once the physical model is available, the behavior of the current process is compared with that of the model in normal operation to detect the start of degradation. After degradation has been detected, trend models are used to predict the evolution of degradation over time until reaching a failure threshold, usually predefined. The RUL corresponds to the time taken by the degradation to reach this failure threshold.

3.1.1 Analytical Redundancy Relations

The analytical redundancy relations (ARRs) are obtained from an over-constrained system by eliminating the unknown variables [18], assuming that all functions are differentiable with respect to their arguments. For a nonlinear deterministic system (Eq. (8.3)) where \(x \in {\Re ^n}\) is the state vector, \(u \in {\Re ^{m_u}}\) is the control vector, \(d \in {\Re ^{m_d}}\) is the disturbance vector, \(\theta \in {\Re ^{m_{\theta}}}\) is a fault vector, and \(y \in {\Re ^p}\) is the measurement vector:

$$\displaystyle \begin{aligned} \begin{array}{l} \dot x = h(x,u,d,\theta ) \\ y = g(x,u,d,\theta ) \\ \end{array} {} \end{aligned} $$
(8.3)

stacking the output y and its derivatives up to order q into \({\bar y}^{(q)} \in \Re ^{(q+1)p}\) gives rise to the following set of (q + 1)p constraints:

$$\displaystyle \begin{aligned} {{\bar y}^{(q)}} = {\gamma ^{(q)}}\left( {x,{{\bar u}^{(q)}},{{\bar d}^{(q)}},{{\bar \theta}^{(q)}}} \right) {} \end{aligned} $$
(8.4)

where \({\bar u}^{(q)} \in \Re ^{(q+1)m_u}\), \({\bar d}^{(q)} \in \Re ^{(q+1)m_d}\), and \({\bar \theta }^{(q)} \in \Re ^{(q+1)m_{\theta }}\).

Under the condition that \((q + 1)p > n + (q + 1)m_d\) and that the Jacobian \(\left [ {\begin {array}{*{20}{c}} {\frac {{\partial {\gamma ^{(q)}}}}{{\partial x}}} & {\frac {{\partial {\gamma ^{(q)}}}}{{\partial {\bar d}^{(q)}}}} \\ \end {array}} \right ]\) is of rank \(n + (q + 1)m_d\) [22], both the state x and the unknown input d can be eliminated, leading to the set of ARRs:

$$\displaystyle \begin{aligned} HI= r\left( {{{\bar y}^{(q)}},{{\bar u}^{(q)}},{{\bar \theta}^{(q)}}} \right) = 0 {} \end{aligned} $$
(8.5)

In normal operation, Eq. (8.5) holds, whereas it does not in the presence of a fault. Equation (8.5) shows that the set of residuals r is a function of the set of parameters θ identified on the system in normal operation, which correspond to well-identified hardware components or physical phenomena. A progressive degradation in the system manifests itself as a progressive deviation of the affected parameter from its nominal value identified during normal operation. Thus, the residuals that are a function of this parameter progressively deviate from zero, enabling the detection of the start of degradation. The failure signature matrix makes it possible to check the isolability of the degradation: even if isolating the degraded component or physical phenomenon is not always possible, isolating the degrading subsystem in a complex system often is. Identifying the component or subsystem at the origin of the degradation is relevant knowledge for practical purposes, and can be exploited in modeling the degradation trend for RUL estimation. In addition, ARRs can be generated automatically using a bipartite graph [18] or a bond graph model [60], and are easy to implement once the parameters of the state model have been identified.
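The behavior of an ARR-based HI can be sketched on a deliberately simple case. The first-order system ẏ = −θy + u, its nominal parameter, the Euler discretization, and the drift profile below are all assumptions chosen for illustration. Eliminating the state gives the single ARR r = ẏ + θ_n·y − u, which is zero as long as θ equals its nominal value θ_n.

```python
import numpy as np


def simulate(theta, u, dt, y0=0.0):
    """Euler simulation of dy/dt = -theta[k] * y + u[k]."""
    y = np.empty(len(u) + 1)
    y[0] = y0
    for k in range(len(u)):
        y[k + 1] = y[k] + dt * (-theta[k] * y[k] + u[k])
    return y


def arr_residual(y, u, theta_nominal, dt):
    """ARR r = dy/dt + theta_n * y - u, with a forward-difference derivative."""
    dy = np.diff(y) / dt
    return dy + theta_nominal * y[:-1] - u


dt, n, theta_n = 0.01, 2000, 2.0
u = np.ones(n)
healthy = np.full(n, theta_n)
# Parameter drifts linearly by +0.5 over the second half of the record.
degrading = theta_n + np.concatenate([np.zeros(1000), 0.5 * np.linspace(0.0, 1.0, 1000)])

r_healthy = arr_residual(simulate(healthy, u, dt), u, theta_n, dt)
r_degraded = arr_residual(simulate(degrading, u, dt), u, theta_n, dt)
```

In this setting the residual equals (θ_n − θ)·y, so it stays at zero before the drift starts and then moves away from zero at the rate of the parameter drift.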

3.1.2 Parity Space

This method is applicable to linear state models and consists in eliminating the internal variables of the system by projection onto an input-output representation space, called parity space [48]. It is generally applied in discrete time, taking measurements over a time interval called the observation window. Information redundancy is thus created without resorting to successive differentiation of the measurements. Consider a system described by the following state model (linear or linearized around an operating point):

$$\displaystyle \begin{aligned} \left\{ {\begin{array}{l} {\dot x\left( t \right) = Ax\left( t \right) + Bu\left( t \right)} \\ {y\left( t \right) = Cx\left( t \right)} \\ \end{array}} \right. {} \end{aligned} $$
(8.6)

First, the observability matrix \(O_{obs}\) of the system is computed by Eq. (8.7), using the individual observability matrix of each sensor [73]:

$$\displaystyle \begin{aligned} \begin{array}{l} {O_{obs}} = \left( {\begin{array}{*{20}{c}} {{C_1}} & {{C_2}} & \cdots & {{C_p}} \\ \end{array}} \right) \\ {\mathrm{with}}\,\,{C_i} = \left( {\begin{array}{*{20}{c}} {{c_i}} & {{c_i}A} & \cdots & {{c_i}{A^{n - 1}}} \\ \end{array}} \right) \\ \end{array} {} \end{aligned} $$
(8.7)

The observability matrix is then used to calculate a left null matrix of \(O_{obs}\), noted W, which is not unique. In practice, it is not possible to calculate a matrix W that is perfectly orthogonal to the matrix \(O_{obs}\), which adds a further uncertainty to the structured and unstructured uncertainties of the model and gives rise to non-zero residuals.

After computing the matrix W, the observability matrix is reformulated in terms of inputs, outputs, and their derivatives. The derived observability for the ith output is

$$\displaystyle \begin{aligned} {O_i} = \left[ {\begin{array}{*{20}{c}} {y_i\left( t \right)} \\ {\frac{d}{{dt}}y_i\left( t \right)} \\ \vdots \\ {\frac{{{d^n}}}{{d{t^n}}}y_i\left( t \right)} \\ \end{array}} \right] - \left[ {\begin{array}{*{20}{c}} 0 & 0 & \cdots & 0 \\ {{c_i}B} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ {{c_i}{A^{n - 1}}B} & {{c_i}{A^{n - 2}}B} & \cdots & {{c_i}B} \\ \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {u\left( t \right)} \\ {\frac{d}{{dt}}u\left( t \right)} \\ \vdots \\ {\frac{{{d^{n - 1}}}}{{d{t^{n - 1}}}}u\left( t \right)} \\ \end{array}} \right] \end{aligned} $$
(8.8)

The residuals are calculated using Eq. (8.9), by multiplying the matrix W by the global derived observability, noted \(O_D\), computed for all outputs:

$$\displaystyle \begin{aligned} \begin{array}{l} HI= R = W{O_D} \\ {\mathrm{with}}\,\,{O_D} = {\left[ {\begin{array}{*{20}{c}} {{O_1}} & {{O_2}} & \cdots & {{O_p}} \\ \end{array}} \right]^T} \\ \end{array} {} \end{aligned} $$
(8.9)

An analysis of the equations, especially of the observability matrix \(O_{obs}\) and the global derived observability matrix \(O_D\), shows that the \(C_i\) and the \(O_i\) are functions of the state matrix A, whose parameters represent the physical elements of the system (physical components or physical phenomena). Thus, the occurrence of a system degradation causes a variation of the parameters of the matrix A and, consequently, a deviation of the residuals from their values in normal operation. The residuals are, therefore, sensitive to the degradation of the system. However, the causal relationship between the residuals R and the variations of the parameters of the matrix A is not explicit: it is lost in the projection onto the parity space. For this reason, the parity space is used only for the detection of sensor faults, with an extension to actuators, under the strong assumption that there are no system faults.
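The following sketch illustrates the parity space idea in its discrete-time form, where the stacked derivatives are replaced by samples over an observation window of s + 1 points. The system matrices, the window length, the input signal, and the sensor bias are hypothetical values chosen for illustration. The matrix W is taken from the left null space of the stacked observability matrix, obtained here by a singular value decomposition.

```python
import numpy as np


def parity_matrices(A, B, C, s):
    """Return W (left null space of the stacked observability matrix) and H (input map)."""
    p, m = C.shape[0], B.shape[1]
    O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(s + 1)])
    H = np.zeros(((s + 1) * p, (s + 1) * m))
    for i in range(1, s + 1):
        for j in range(i):
            H[i * p:(i + 1) * p, j * m:(j + 1) * m] = (
                C @ np.linalg.matrix_power(A, i - 1 - j) @ B)
    left, sv, _ = np.linalg.svd(O)
    rank = int(np.sum(sv > 1e-10))
    return left[:, rank:].T, H  # rows of W satisfy W @ O = 0


def parity_residuals(W, H, u, y, s):
    """r_k = W (Y_k - H U_k) over each sliding window of s + 1 samples."""
    return np.array([W @ (y[k:k + s + 1].ravel() - H @ u[k:k + s + 1].ravel())
                     for k in range(len(y) - s)])


def simulate(A, B, C, u):
    """x[k+1] = A x[k] + B u[k], y[k] = C x[k], starting from x = 0."""
    x, ys = np.zeros(A.shape[0]), []
    for uk in u:
        ys.append(C @ x)
        x = A @ x + B @ uk
    return np.array(ys)


A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
s = 2
u = np.sin(0.1 * np.arange(100)).reshape(-1, 1)
y = simulate(A, B, C, u)
W, H = parity_matrices(A, B, C, s)
r_nominal = parity_residuals(W, H, u, y, s)

y_biased = y.copy()
y_biased[50:] += 0.5  # additive sensor bias from sample 50 onward
r_fault = parity_residuals(W, H, u, y_biased, s)
```

The residual is insensitive to the state and the input, and reacts as soon as a biased sample enters the observation window, which matches the use of the parity space for sensor faults.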

3.1.3 Observer Methods

Observer theory is widely used in the literature for the estimation of observable but unmeasured states [43, 82]. It has been used in fault diagnosis for the generation of health indices through the development of unknown input observers. The stability and convergence analysis, the calculation of the gains, and the assumptions on matrix rank and model inversion have been the subject of several research works [82, 84, 86] and will not be detailed in this work. Only the relevance of health indices generated by an observer in the context of fault prognosis is analyzed.

Two kinds of models are most used for the synthesis of observers in the framework of fault diagnosis and prognosis. The first model, given in Eq. (8.10) below, allows the simultaneous description of the state of the system and the degradation. These models are interconnected and are of multiple time scales, in order to highlight the difference in the evolution between the fast dynamics of the system behavior and the slow evolution of the degradation:

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \dot{x} &\displaystyle =&\displaystyle f(x,\lambda(\theta),u)\\ \dot{\theta} &\displaystyle =&\displaystyle \epsilon g(x,\theta)\\ y &\displaystyle =&\displaystyle Cx+Du+v \end{array} \end{aligned} $$
(8.10)

where \(x\in \mathbb {R}^n\) is the set of state variables associated with the fast dynamics of the system; \(\theta \in \mathbb {R}^m\) is the set of slow-dynamic variables related to the degradation of the system; \(u\in \mathbb {R}^l\) is the input vector. The parameter vector \(\lambda \in \mathbb {R}^q\) is a function of θ. The rate constant \(0 < \epsilon \ll 1\) defines the time scale separation between fast dynamics and slow drift. \(y\in \mathbb {R}^p\) is the output vector and v is the measurement noise.

The general pattern of the observer-based fault prognosis begins with the joint estimation of the state and the unknown input, with precision, and in a finite time. Then, the estimation of the RUL is carried out by a time projection of the evolution of the slow and fast dynamics until the total failure. The finite-time convergence of an observer is a less common notion in the literature than asymptotic convergence; yet it is necessary in the context of fault prognosis. New conditions of stability and convergence in finite time have been developed in Lyapunov theory, and presented in [16, 17, 85] and [86]. Methods for the synthesis of observers with finite-time convergence have been proposed in [38, 56] and [82] for linear systems, and in [79,80,81] and [84] for nonlinear systems. In the case of observers with unknown input and finite-time convergence, synthesis work has been developed in [97] for the linear case, and in [43] for sliding mode observers.

Although the model of Eq. (8.10) is closest to the reality of the degradation phenomenon and its interaction with the state of the system, it is strongly nonlinear and, moreover, the interaction between the state and the degradation is difficult to formalize (to model). Thus, the most used model for the generation of health indices is the second type of model given in Eq. (8.11) below [71], where the nonlinear system considered consists of a linear part exploited for the synthesis of the observer’s gain, and a nonlinear part satisfying some more or less restrictive assumptions:

$$\displaystyle \begin{aligned} \left\{ {\begin{array}{*{20}{c}} {\dot x = l\left( {x,f,u} \right)} \\ {y = h\left( {x,u} \right) + W\left( u \right)f} \\ \end{array}} \right. {} \end{aligned} $$
(8.11)

where \(x \in \mathbb{R}^n\) is the state vector, \(u \in U\) is the known input vector, \(y \in \mathbb{R}^p\) is the output vector, and \(f \in \mathbb{R}^m\) is the vector of unknown inputs, whose number is less than or equal to the number of measurements (\(m \le p\)). W is the transfer matrix from the degradation to the output and is a function of the conditions of use (of the input u). Once the state x is estimated, an estimate of the unknown input can be derived as follows:

$$\displaystyle \begin{aligned} HI = \hat f = W_1^ + \left( u \right)\left( {{y_1} - {h_1}\left( {\hat x,u} \right)} \right) \end{aligned} $$
(8.12)

where \(W_1^ +\) is the pseudo-inverse of W.
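Equation (8.12) can be sketched numerically as follows. The output map h, the distribution matrix W(u), and the numerical values are hypothetical, the subscript 1 of Eq. (8.12) is dropped, and the state estimate is assumed to have already converged to the true state, so no observer is synthesized here.

```python
import numpy as np


def estimate_unknown_input(y, x_hat, u, h, W):
    """f_hat = pinv(W(u)) @ (y - h(x_hat, u)), the HI of Eq. (8.12)."""
    return np.linalg.pinv(W(u)) @ (y - h(x_hat, u))


def h(x, u):
    """Hypothetical output map: both states are measured directly."""
    return x


def W(u):
    """Hypothetical 2 x 1 matrix mapping the degradation to the outputs."""
    return np.array([[1.0], [u]])


x_true = np.array([0.4, -1.2])
u_now, f_true = 2.0, np.array([0.3])
y_meas = h(x_true, u_now) + W(u_now) @ f_true  # measurement affected by f

f_hat = estimate_unknown_input(y_meas, x_true, u_now, h, W)
```

With more measurements than unknown inputs, the pseudo-inverse returns the least-squares estimate of f, which is why W must have full column rank.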

Thus, finite-time convergence is a necessary condition for the use of the unknown input observer for fault prognosis. Equation (8.11) also shows that prior knowledge of the effect of degradation on the system is necessary, as it makes it possible to calculate the matrix W, which must be invertible. As the name implies, the degradation is treated as an unknown input, so that any change in the dynamics of the system is interpreted as a degradation. Since the degradation is introduced additively into the model of the system, it can be assumed that:

  • The health index aggregates all the faults that may occur in the system.

  • The prior knowledge necessary for the calculation of the matrix W can be used to identify the nature of the degradation.

3.1.4 Algebraic Methods

In the algebraic framework, the HIs are expressed as an algebraic equation of the system’s variables (u and y) and their derivatives. In Fliess’s theory [40], unlike in Kalman’s theory, a nonlinear system is defined by a finitely generated differential field extension \(L/K\), where L is the system field, which contains the system variables and the algebraic equations that link them, and K is the ground field, which contains the coefficients of the system.

The input u of a system \(L/K\) is a set \(u = \{u_1, \ldots, u_m\}\) of L such that the extension \(L/K\langle u \rangle\) is differentially algebraic, which means that any element \(\omega \in L\) satisfies an algebraic differential equation over \(K\langle u \rangle\) of the form \(P(\omega , {u_1},\ldots ,{u_m},\ldots ,{\dot u}_1,\ldots ,{\dot u}_m) = 0\). The output of a system \(L/K\) is a set \(y = \{y_1, \ldots, y_p\}\) of L. A transcendence basis \(x = \{x_1, \ldots, x_n\}\) of the differential field extension \(L/K\langle u \rangle\) is the state of the system \(L/K\). Any component of the derivative \(\dot x = \{ {{{\dot x}_1},\ldots ,{{\dot x}_n}} \}\) and of y is \(K\langle u \rangle\)-algebraic on x, which leads to the following generalized state variables representation:

$$\displaystyle \begin{aligned} \left\{ {\begin{array}{*{20}{c}} \begin{array}{l} {F_1}\left( {{{\dot x}_1},x,\ldots,u,\dot u,\ldots,{u^{({\alpha _1})}}} \right) = 0 \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ \end{array} \\ {{F_n}\left( {{{\dot x}_n},x,\ldots,u,\dot u,\ldots,{u^{({\alpha _n})}}} \right) = 0} \\ \begin{array}{l} {H_1}\left( {{y_1},x,\ldots,u,\dot u,\ldots,{u^{({\beta _1})}}} \right) = 0 \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ \end{array} \\ {{H_p}\left( {{y_p},x,\ldots,u,\dot u,\ldots,{u^{({\beta _p})}}} \right) = 0} \\ \end{array}} \right. {} \end{aligned} $$
(8.13)

where \(F_i\), \(H_j\) are polynomials over K and \(\alpha_i, \beta_j \in \mathbb{N}\); x is called a generalized state.

In the presence of faults (f), the nonlinear system is denoted as an algebraic differential field extension k(u : y, f)∕k(u, f) [27]. If the fault f satisfies a differential algebraic equation with coefficients in the field K(u, y), then it is said to be diagnosable. In other words, the fault variable can be written in polynomial form, as a function of the input variables, the output variables, and their respective derivatives, as follows:

$$\displaystyle \begin{aligned} f = h\left( {u,\dot u,\ddot u,\ldots,{u^{(m)}},y,\dot y,\ddot y,\ldots,{y^{(n)}}} \right) {} \end{aligned} $$
(8.14)

The residuals (\(\hat r\)) given by Eq. (8.15) below are used to evaluate the obtained fault indicator (Eq. (8.14)):

$$\displaystyle \begin{aligned} \hat r = {s^{ - n}}\frac{d^n}{{d{s^n}}}\hat F {} \end{aligned} $$
(8.15)

where the sign \(\left (\hat { }\right )\) means that the variable is written in the Laplace domain.

The following steps are used to obtain the residuals [11, 41]:

  • Put the fault indicator in Laplace domain,

  • Differentiate the result n times with respect to s in order to eliminate the initial conditions, which may be unknown,

  • Multiply by \(s^{-n}\) and return to the time domain.

These residuals are computed using integrals of the measured signals. In the case of noisy signals, these integrals produce a filtering effect. The derivative of order n with respect to s (\(\frac {{{d^n}}}{{d{s^n}}}\)) in the Laplace domain results in a multiplication by \((-1)^n t^n\) in the time domain, and the multiplication by \(s^{-n}\) in the Laplace domain corresponds to an integration of order n in the time domain.
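As a worked illustration of these steps, consider a hypothetical first-order fault indicator \(f = \dot y + a y - u\) with n = 1; this model is an assumption chosen for simplicity and is not taken from [11, 41]. Transforming to the Laplace domain, differentiating once with respect to s (which removes the unknown initial condition y(0)), and multiplying by \(s^{-1}\) gives:

$$\displaystyle \begin{aligned} \hat F(s) &= s\hat y(s) - y(0) + a\hat y(s) - \hat u(s) \\ \frac{d\hat F}{ds} &= \hat y + (s + a)\frac{d\hat y}{ds} - \frac{d\hat u}{ds} \\ \hat r = s^{-1}\frac{d\hat F}{ds} &= s^{-1}\hat y + \frac{d\hat y}{ds} + a\,s^{-1}\frac{d\hat y}{ds} - s^{-1}\frac{d\hat u}{ds} \end{aligned} $$

Returning to the time domain, where \(\frac{d}{ds}\) corresponds to a multiplication by \(-t\) and \(s^{-1}\) to an integration, the residual involves only integrals of the measured signals:

$$\displaystyle \begin{aligned} r(t) = \int_0^t y(\tau)\,d\tau - t\,y(t) - a\int_0^t \tau\,y(\tau)\,d\tau + \int_0^t \tau\,u(\tau)\,d\tau = -\int_0^t \tau\,f(\tau)\,d\tau \end{aligned} $$

The last equality follows from an integration by parts of \(\int_0^t \tau\,\dot y(\tau)\,d\tau\). It shows that r(t) is zero as long as f is zero, and that it otherwise accumulates a time-weighted integral of the fault indicator.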

The fault diagnosis based on the algebraic approach is mainly applied to linear systems, and some nonlinear systems for actuator and sensor fault diagnosis. In [15], the algebraic approach in association with a bond graph tool was extended to component fault diagnosis, under the assumption that the system inputs and outputs are fault free. The residuals of Eq. (8.15) reflect only the cumulative sum of the fault indicator from the degradation start until failure time, which means that the residuals do not reflect the degradation dynamics but only its effect. It should also be pointed out that this method does not need prior knowledge on the degradation nature and it has not been used yet for fault prognosis. Finally, this method can only handle additive faults and any change in the system dynamics can be considered as degradation.

3.1.5 Parameter Estimation Method

Fault diagnosis based on parameter estimation consists in identifying the parameters of the system model from the system inputs (u) and outputs (y), and in monitoring the estimated parameters. For a nonlinear system described by the following state-space model:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{l} \dot x = h\left( {x,u,\theta } \right) + d \\ y = g(x) \\ \end{array} \right. {} \end{aligned} $$
(8.16)

where x is the state vector, h is the state function, g is the output function, and d represents the system disturbances, assumed to be a bounded signal, the model parameters θ are assumed to vary with the occurrence of a fault on the physical system. In normal operation, θ takes the nominal values of the physical parameters; in faulty operation, the value of θ varies as a function of the fault severity on the physical system. The model of Eq. (8.16) is then used in an on-line nonlinear parameter estimation problem, in which the unknown fault parameters are estimated from the system inputs and outputs using appropriate approaches such as neural networks [115], fuzzy models [13], and Takagi–Sugeno (TS) models [88]; for linear systems, least-squares (LS) approaches are used. The estimation error (Eq. (8.17)) between the reference model parameters estimated in normal operation and the parameters estimated under faulty conditions is taken as the HI for diagnosis purposes [21].

$$\displaystyle \begin{aligned} HI = \hat \theta - {\theta _n} {} \end{aligned} $$
(8.17)

Fault diagnosis via parameter estimation can handle only additive faults on parameters with slow dynamics. The main limitation of this method is the difficulty of establishing fault isolability conditions, since the estimated parameters are model parameters and do not represent the physical parameters of the system. This problem has been partially solved by studying the influence of each physical parameter on the model parameters [55]. Since gradual degradation causes progressive changes in the system parameters from their nominal values, and hence a gradual deviation of the HIs from zero, the HI trend can be analyzed to construct prediction models for RUL estimation.
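For the linear case, the HI of Eq. (8.17) can be sketched with an ordinary least-squares fit. The discrete-time model y[k+1] = a·y[k] + b·u[k], the parameter values, and the noise-free data are assumptions made for illustration; in practice the fit would be repeated on a sliding window of recent data.

```python
import numpy as np


def estimate_parameters(y, u):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k]."""
    Phi = np.column_stack([y[:-1], u[:-1]])  # regressor matrix
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta


def simulate(a, b, u, y0=0.0):
    """Generate outputs of the assumed first-order discrete-time model."""
    y = np.empty(len(u))
    y[0] = y0
    for k in range(len(u) - 1):
        y[k + 1] = a * y[k] + b * u[k]
    return y


rng = np.random.default_rng(0)
u = rng.standard_normal(500)  # persistently exciting input

theta_n = estimate_parameters(simulate(0.90, 0.5, u), u)    # normal operation
theta_hat = estimate_parameters(simulate(0.85, 0.5, u), u)  # degraded parameter a
hi = theta_hat - theta_n  # HI = theta_hat - theta_n, as in Eq. (8.17)
```

Only the component of the HI associated with the degraded parameter moves away from zero, which is what makes the trend of this HI usable for RUL estimation.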

3.1.6 Practical Constraints

Data Availability

At the beginning of operation, data describing the system degradation process and expert knowledge are not available. In this case, the HIs generated from the physical knowledge are the most suitable. In the majority of practical cases, faulty operation is defined by thresholds that the parameters of the system must not exceed. These thresholds are used in the literature to estimate the failure thresholds for HIs whose parameters have a clear physical meaning [14, 31]. The RUL is then estimated using the trend modeling methods presented in the second part of this chapter, which do not require prior knowledge of the degradation dynamics.

System Instrumentation

Observability is a necessary condition for the implementation of the methods presented above; for the implementation of the analytical redundancy (AR) method, the system must, in addition, be over-constrained. These properties are easily verifiable on a state model or a bond graph model. The identification of an optimal placement of sensors to obtain an observable system (or an over-constrained one for the application of the AR method) is also possible. In practice, however, it is not always possible to place all the sensors necessary for the observability of a system, for reasons of cost, lack of space on the system, non-availability of the sensor, and the consequences of placing a given sensor on the system. Among the practical cases in which the authors of this paper have encountered instrumentation difficulties in applying the methods presented above are electric motors [14], where the torque sensor is rarely available, which makes it impossible to generate HIs in which the electrical part and the mechanical part are decoupled. Indeed, in the example of the HIs generated for the mechanical transmission system, the torque Γ(t) is not measured; rather, it is estimated using the model of the interaction between the electrical part and the mechanical part of the brushless motor, given as follows:

$$\displaystyle \begin{aligned} \begin{array}{l} e(t) = {k_e}\dot \theta (t) \\ \varGamma (t) = {k_t}i(t) \\ \end{array} \end{aligned} $$
(8.18)

where \(k_e\) is the back electromotive force (EMF) constant and \(k_t\) is the motor torque constant. Using the current to calculate the torque couples the HIs generated from the electrical part of the motor with the HI generated from the mechanical part. Consequently, it is not possible to locate the degraded subsystem. On thermal engines, in particular marine diesel engines [61], there are many sensors, but the system is not observable. The available sensors are mostly effort sensors (temperature, pressure), while flow sensors (volume flow, mass flow, heat flow, entropy flow) are not available. This is due to the unavailability of some sensors (such as entropy flow sensors) and to the consequences that sensor placement may have on the system (for example, a mass-flow sensor must be inserted in the pipe, which may promote fluid leaks). The same instrumentation constraints are encountered on electricity production and management systems and on industrial equipment [75, 122].

3.2 Data-Driven and Signal Processing Methods

Among the data-driven methods of HI generation for fault diagnosis, several are also used to generate HIs for failure prognosis [124]. Methods of multivariate analysis, such as principal component analysis (PCA) and its variants (IsoMap, kernel PCA, …), are widely used [2, 68, 90, 96] because, in addition to generating the HI, they allow a degradation profile to be extracted from raw data, assuming that this information is initially contained in the raw data. The time and frequency attributes of the measurement signals are also widely used, especially when the instrumentation of the system is poor and limited to just one or two sensors. These techniques separate the features contained in the signal, highlighting dynamics that are not perceptible in the raw signal. Features presenting progressive trends that are not related to the normal operation of the system are often related to the degradation process and can be exploited for failure prognosis [49, 51, 94]. The signal processing methods most used for generating HIs are statistical indices [74], empirical mode decomposition (EMD) [54], low-pass filters [92], the fast Fourier transform (FFT) [77], and wavelet decomposition [76]. In the area of failure prognosis, Ref. [74] uses statistical indices to extract features that are sensitive to failure and robust to noise from the vibration data of oil sands pumps; Ref. [44] also uses these statistical indices on raw data measured on bearings; Ref. [119] applies the EMD to bearing vibration signals to identify and diagnose faults. For the diagnosis of bearing faults from acceleration signals, Ref. [4] applies filtering with several bandwidth levels to improve the signal-to-noise ratio. Ref. [70] applies the wavelet transform to the voltage data of the rolling elements of a gearbox to characterize symptoms of early fatigue and cracking.

The principle of analytical redundancy can also be applied by creating redundancy through data-driven models, such as neural networks, support vector machines (SVMs), and auto-regressive models. This technique is applied in [31] for the failure prognosis of embedded electronic systems, where a NARX neural network is used for the estimation of the consumed power and an ARMAX model is used for the estimation of the temperature. These two estimates are then compared to the measured values to generate health indices for fault diagnosis and failure prognosis. A multilayer perceptron (MLP) neural network is used in [47] for health condition monitoring of a wind turbine gearbox, and a recurrent neural network (RNN) is proposed in [7] for early fault detection of gearbox bearings. In [102], an adaptive network-based fuzzy inference system (ANFIS) is implemented for wind turbine condition monitoring using normal behavior models.

3.2.1 Practical Constraints

Data Availability

The data-driven methods presented above all assume that data containing the degradation process are available. They are therefore complementary to the physical model-based methods, which only require physical knowledge of the system and data from its normal operation, used for parameter identification. Together, the two types of approaches cover a wide field of application.

Properties of the Generated HIs

Unlike HIs generated using physical model-based methods, the analysis of the properties (Monotonicity, Trendability, and Prognosability) of HIs generated by data-driven methods has been the subject of several research works [10, 24, 25, 52]. Monotonicity is related to the irreversibility assumption of degradation phenomena. Trendability is related to the degradation profile, i.e., to the fact that the HI value is representative of the degradation level at any moment of the evolution of the degradation. Prognosability is related to the amplitude of the HI corresponding to total failure; this property is respected when the HI threshold corresponding to total failure is constant. The metrics presented below are the ones proposed in [24] and [10], as their scores are easily interpretable (between 0 and 1), where 1 indicates the most satisfactory and 0 the least satisfactory level of the specific HI property:

$$\displaystyle \begin{aligned} {\mathrm{Monotonicity}}=\frac{1}{N}\left | \sum_{i=1}^{N} M_i \right | \end{aligned} $$
(8.19)

where \(M_i\) is the monotonicity of a single run-to-failure trajectory given by:

$$\displaystyle \begin{aligned} M_{i}=\frac{n_{i}^{+}}{n_{i}-1}-\frac{n_{i}^{-}}{n_{i}-1}, i=1,\ldots,N \end{aligned} $$
(8.20)

\(n_i\) is the total number of observations in the ith run-to-failure trajectory and \(n_{i}^{+}\) (\(n_{i}^{-}\)) the number of observations characterized by a positive (negative) first derivative.

$$\displaystyle \begin{aligned} {\mathrm{Trendability}}={\mathrm{min}}\left(\left|{corrcoef}_{ij}\right|\right), i, j=1,\ldots,N \end{aligned} $$
(8.21)

\({corrcoef}_{ij}\) is the linear correlation coefficient between the ith and the jth run-to-failure trajectories. The computation of the correlation coefficient between two vectors requires that they are formed by the same number of patterns.

$$\displaystyle \begin{aligned} {\mathrm{Prognosability}}={exp}\left(\frac{-std\left(HI_{fail}\right)}{mean\left|HI_{start}-HI_{fail}\right|}\right) \end{aligned} $$
(8.22)

where \(HI_{start}\) and \(HI_{fail}\) are the HI values at the beginning and end of the run-to-failure trajectories, respectively; \({std\left (HI_{fail}\right )}\) is the standard deviation of the HI values at the end of the trajectories, and \({mean\left |HI_{start}-HI_{fail}\right |}\) is the average variation of the HI values between the beginning and the end of the trajectories.
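
As a minimal sketch of how Eqs. (8.19)–(8.22) can be evaluated, the following Python functions compute the three scores for a set of run-to-failure trajectories given as lists of HI values. The function names and the example trajectories are illustrative assumptions. The sketch also assumes that all trajectories have already been resampled to the same length (as required for the correlation coefficient) and that std denotes the population standard deviation, which the text does not specify.

```python
import math
from statistics import mean, pstdev


def monotonicity(trajectories):
    """Eqs. (8.19)-(8.20): |mean over trajectories of (n+ - n-) / (n - 1)|."""
    scores = []
    for hi in trajectories:
        diffs = [b - a for a, b in zip(hi, hi[1:])]
        n_pos = sum(d > 0 for d in diffs)  # observations with positive derivative
        n_neg = sum(d < 0 for d in diffs)  # observations with negative derivative
        scores.append((n_pos - n_neg) / (len(hi) - 1))
    return abs(sum(scores)) / len(scores)


def _pearson(x, y):
    """Linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def trendability(trajectories):
    """Eq. (8.21): smallest |corrcoef| over all pairs (needs >= 2 trajectories)."""
    n = len(trajectories)
    return min(abs(_pearson(trajectories[i], trajectories[j]))
               for i in range(n) for j in range(i + 1, n))


def prognosability(trajectories):
    """Eq. (8.22): exp(-std(HI_fail) / mean|HI_start - HI_fail|)."""
    fail = [hi[-1] for hi in trajectories]
    variation = mean([abs(hi[0] - hi[-1]) for hi in trajectories])
    return math.exp(-pstdev(fail) / variation)  # population std: an assumption


if __name__ == "__main__":
    # Hypothetical trajectories, for illustration only.
    runs = [
        [0.0, 0.2, 0.5, 0.9, 1.0],
        [0.0, 0.1, 0.4, 0.8, 1.0],
        [0.1, 0.3, 0.6, 0.7, 1.0],
    ]
    print(monotonicity(runs), trendability(runs), prognosability(runs))
```

A constant trajectory has zero variance, so the correlation coefficient is undefined for it; such trajectories would have to be excluded before calling `trendability`.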

Recent research is directed towards methods that extract a set of features optimizing the scores of the three properties, as in [10], where HI identification is formulated as the problem of selecting the best combination of features, solved as a multi-objective optimization whose objectives are the metrics of Monotonicity, Trendability, and Prognosability. The proposed method is based on a binary differential evolution (BDE) algorithm for the multi-objective optimization.

The HI generation methods presented in this paper have been applied by the authors of this review paper on real cases. The details of the application of each method can be found in [14] and [35] for AR method, [33] for parity space method, [15] for algebraic methods, [36, 90] for PCA method, [89] for EMD method, [34, 89] for WD method, and [31, 32] for HI generated using machine learning methods. Table 8.1 summarizes the constraints of use, the advantages, and limitations of the methods presented above.

4 HI Trend Modeling for RUL Estimation

As illustrated in Fig. 8.3, the modeling approaches of HI trends for the estimation of RUL can be decomposed into three main families: physical approaches, data-driven approaches, and expert methods. Another classification is proposed in [2], where the RUL estimation methods are classified into reliability-based, similarity-based, model-based, and data-driven approaches. The most used physical model form is the differential one, whose order and parameters are identified according to the physical knowledge and data available on the degradation process. Updating the parameters makes it possible to compensate for modeling uncertainties and to adapt the model to changes in the degradation rate. Data-driven approaches are the most used and can be decomposed into five families: statistical models, stochastic models, deterministic models, probabilistic models, and machine learning models. The third family of trend modeling comprises approaches that formalize the knowledge of industry experts through the tools of fuzzy logic and Bayesian probability. Only data-driven methods able to include expert knowledge into the prediction models are presented in this paper.

Fig. 8.3 Classification of RUL estimation approaches

4.1 Data-Driven Models

Four kinds of models are presented in this section:

  • Stochastic models, especially continuous and discrete Markov processes.

  • Probabilistic models, based on Bayes probability theory.

  • Statistical models, with a focus on the auto-regressive (AR) models and the auto-regressive moving average (ARMA) model, which is representative of these methods.

  • Deterministic models, which are geometric models allowing to estimate the RUL as a deterministic variable.

4.1.1 Stochastic Models

Markov processes are the most used stochastic models for fault prognosis. These models describe processes without memory, where the probability of the future state \(X_n\) depends only on the current state \(X_{n-1}\), as shown in the following equation:

$$\displaystyle \begin{aligned} \mathrm{P}({X_n} = {x_n}|{X_{n - 1}} = {x_{n - 1}}, \ldots ,{X_0} = {x_0}) = \mathrm{P}({X_n} = {x_n}|{X_{n - 1}} = {x_{n - 1}}) \end{aligned} $$
(8.23)

\(x_0, \ldots, x_n\) correspond to the different states of the system. These processes can be divided into two categories: continuous Markov processes and discrete Markov processes.

Continuous Markov Processes

The most common continuous Markov processes used in the literature for prognosis are Wiener and Gamma processes. The hypothesis of independent increments leads these processes to Markov properties because: X(t + Δt) − X(t) is independent of X(t) and X(t + Δt) = X(t) + (X(t + Δt) − X(t)); the process {X(t), t ≥ 0} is, therefore, a Markov process [91].

  • Wiener processes are continuous Markov processes \(\{X_t, t > 0\}\) with a drift parameter μ and a variance parameter \(\sigma^2\), σ > 0. They are well suited to modeling degradation processes that vary over time with Gaussian noise. These processes are described as follows [103]:

    $$\displaystyle \begin{aligned} X_t = x_0 + \mu t + \sigma B(t) \end{aligned} $$
    (8.24)

    where B(t) is the Brownian motion. The RUL \(H_{t_i}\) at a time \(t_i\) is defined as the time taken by the variable \(X_t\), with \(t > t_i\), to reach a predefined failure threshold w, such that:

    $$\displaystyle \begin{aligned} H_{t_i} = \inf \{\varDelta_{t_i}: X_{t_i+\varDelta_{t_i}} \geq w | X_{t_i} < w\} \end{aligned} $$
    (8.25)

    In the literature, the RUL is often given with a confidence interval, obtained by the calculation of a probability density function given by the following expression [26]:

    $$\displaystyle \begin{aligned} f_{H_{t_i}} (h_{t_i}) = \frac{w - X_{t_i}}{\sqrt{2\pi h_{t_i}^3 \sigma^2}} {exp}\left(- \frac{(w - X_{t_i} - \mu h_{t_i})^2}{2h_{t_i}\sigma^2}\right) \end{aligned} $$
    (8.26)

    Many works apply this process and its variants [95, 105, 110, 111, 113], in particular [116], which proposes a Wiener process with an updated drift parameter \(\mu_t\).

  • The Gamma process is the most appropriate for modeling monotonic and gradual deterioration [91]. Reference [1] proposed using it as a model of deterioration occurring randomly over time [6, 23, 42, 62]. Mathematically, a random quantity X follows a Gamma distribution with a shape parameter υ > 0 and a scale parameter u > 0 if its probability density function (PDF) is given as follows:

    $$\displaystyle \begin{aligned} Ga(x|\upsilon,u) = \frac{u^{\upsilon}}{\varGamma(\upsilon)} x^{\upsilon-1} {exp}(-ux)I_{(0,\infty)}(x) \end{aligned} $$
    (8.27)

    where \(I_{(0,\infty)}(x) = 1\) for \(x \in (0,\infty)\) and \(I_{(0,\infty)}(x) = 0\) for \(x \notin (0,\infty)\), and \(\varGamma (\upsilon ) = \int _{z=0}^{\infty } z^{\upsilon -1}e^{-z}dz\) is the Gamma function for υ > 0.

    Given a non-decreasing function υ(t), the Gamma process with shape function υ(t) > 0 and scale parameter u > 0 is a continuous-time stochastic process with the following characteristics:

    1. X(0) = 0 with a probability of 1

    2. X(τ) − X(t) ∼ Ga(υ(τ) − υ(t), u) for all τ > t ≥ 0

    3. X(t) has independent increments.

    Let X(t) be the deterioration at time t, t ≥ 0: the PDF of X(t) is as follows:

    $$\displaystyle \begin{aligned} f_{X(t)}(x) = Ga(x| \upsilon(t), u) \end{aligned} $$
    (8.28)

    The expectation and variance of X(t) are as follows:

    $$\displaystyle \begin{aligned} E(X(t)) = \frac{\upsilon(t)}{u}, \quad VAR(X(t)) = \frac{\upsilon(t)}{u^2} \end{aligned} $$
    (8.29)

    A system is said to be faulty when its degradation reaches a predefined threshold S. From Eq. (8.28), the distribution of the failure time at time t is written as follows:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} F(t) &\displaystyle =&\displaystyle Pr\{ T_S \leq t \} = Pr\{ X(t) \geq S \} \\ &\displaystyle =&\displaystyle \int_{x=S}^{\infty} f_{X(t)}(x)dx = \frac{\varGamma(\upsilon(t),S u)}{\varGamma(\upsilon(t))} \end{array} \end{aligned} $$
    (8.30)

    where \(\varGamma (a,x) = \int _{z=x}^{\infty } z^{a-1}e^{-z}dz\) is the incomplete gamma function with x ≥ 0 and a > 0. The PDF of the failure time at time t is thus:

    $$\displaystyle \begin{aligned} f(t) = \frac{\partial}{\partial t} \Big[ \frac{\varGamma(\upsilon(t),S u)}{\varGamma(\upsilon(t))} \Big] \end{aligned} $$
    (8.31)

    The mean failure time and the average RUL are given in the following equations:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \mathcal{T}_t &\displaystyle = &\displaystyle \int_{t=0}^{+\infty} tf(t) dt \end{array} \end{aligned} $$
    (8.32)
    $$\displaystyle \begin{aligned} \begin{array}{rcl} RUL_t &\displaystyle =&\displaystyle \mathcal{T}_t - t \end{array} \end{aligned} $$
    (8.33)

    The two Markov processes presented above are widely used to model degradation, covering the majority of degradation profiles: linear and nonlinear, noisy and monotonic. However, these processes require the calculation of an HI X(t), which estimates the current level of degradation of the system and which can be calculated using one of the methods presented in Sect. 8.3. The main limitation is related to the central property of Markov models, the memoryless assumption, which is a relatively strong assumption and may thus lead to coarse approximations in real applications. To overcome this issue, a reliability model can be developed to account for changes in the operating modes of the system [89]. This model is based on two assumptions: (1) the future value of the HI is a function of the current state of the system, given by the present value of the HI, of time, of the operating modes, assumed known, and of external noise, assumed to be Gaussian; (2) the HI is non-negative and monotonic.

    Given these two hypotheses, the dynamics of the HI can be described as follows:

    $$\displaystyle \begin{aligned} \varDelta X_t = \frac{\beta t^{\beta-1}}{\eta^{\beta}} {exp}(\gamma Z_t + \varepsilon) \end{aligned} $$
    (8.34)

    where β > 0 is the shape parameter of the model, η > 0 is its scale parameter, and \(\gamma = [\gamma _1, \ldots , \gamma _m] \in \mathbb {R}^m\) is a vector of m elements describing the influence of changes in the operating modes \(Z_t = [Z_{1,t}, \ldots, Z_{m,t}]\) on the degradation. The uncertainties of the model are represented by the variable ε, assumed to follow a normal distribution N(0, Q). The first term \(\beta t^{\beta-1}/\eta^{\beta}\) depends on time and means that \(\varDelta X_t\) depends on the system aging.

    The HI evolution \(X_t\) is defined as the accumulation of all increments \(\varDelta X_t\):

    $$\displaystyle \begin{aligned} X_t = \sum_{\tau=0}^t \varDelta X_{\tau} \end{aligned} $$
    (8.35)

    Based on the linearity of mathematical expectation, the value of the mathematical expectation of X(t) is calculated as follows:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} E \big[X_t \big] &\displaystyle =&\displaystyle E \left[ \sum_{\tau=0}^t \varDelta X_{\tau} \right] \\ &\displaystyle =&\displaystyle \sum_{\tau=0}^t E \big[\varDelta X_{\tau}\big] \\ &\displaystyle =&\displaystyle \sum_{\tau=0}^t E \left[\frac{\beta \tau^{\beta-1}}{\eta^{\beta}} {exp}(\varepsilon) \right] \\ &\displaystyle =&\displaystyle \sum_{\tau=0}^t \frac{\beta \tau^{\beta-1}}{\eta^{\beta}} E \big[{exp}(\varepsilon) \big] \end{array} \end{aligned} $$
    (8.36)

    Since ε ∼ N(0, Q) is normally distributed, exp(ε) is log-normally distributed with mean exp(Q/2):

    $$\displaystyle \begin{aligned} \begin{array}{rcl} E \big[X_t \big] &\displaystyle =&\displaystyle \sum_{\tau=0}^t \frac{\beta \tau^{\beta-1}}{\eta^{\beta}} {exp}(Q/2) \\ &\displaystyle =&\displaystyle {exp}(Q/2) \sum_{\tau=0}^t \frac{\beta \tau^{\beta-1}}{\eta^{\beta}} \end{array} \end{aligned} $$
    (8.37)

    Assuming that k is the RUL and \(L > X_t\) is the predefined failure threshold, the RUL can be estimated as follows:

    $$\displaystyle \begin{aligned} \begin{array}{*{20}{c}} {P(k|{X_t} < L)} \hfill & = \hfill & {P({X_{t + k}} < L|{X_t} < L)} \hfill \\ {} \hfill & = \hfill & {P\left({X_t} + \sum_{i = t + 1}^{t + k} \varDelta {X_i} < L\right)} \hfill \\ {} \hfill & = \hfill & {P\left(\sum_{i = t + 1}^{t + k} \varDelta {X_i} < L - {X_t}\right)} \hfill \\ {} \hfill & = \hfill & {{F_{\sum_{i = t + 1}^{t + k} \varDelta {X_i}}}(L - {X_t})} \hfill \\ \end{array} \end{aligned} $$
    (8.38)

    \( F_{\sum _{i=t+1}^{t+k} \varDelta X_i} (L-X_t)\) is the cumulative distribution function of the sum \(S_k = \sum _{i=t+1}^{t+k} \varDelta X_i\) evaluated at \(L - X_t\).
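
To close the discussion of continuous processes, the Wiener-process RUL density of Eq. (8.26) can be checked numerically. The minimal sketch below writes the density with the RUL value h as its time argument and, for a positive drift μ, verifies by trapezoidal integration that it integrates to one and that its mean is (w − X)/μ, as expected for this first-passage (inverse Gaussian) density. The function names, the parameter values, and the integration grid are illustrative assumptions, not values taken from the cited works.

```python
import math


def wiener_rul_pdf(h, x_now, w, mu, sigma):
    """Density of the RUL h for X_t = x0 + mu*t + sigma*B(t), as in Eq. (8.26)."""
    if h <= 0.0:
        return 0.0
    gap = w - x_now  # remaining distance between the HI and the threshold w
    scale = gap / math.sqrt(2.0 * math.pi * h ** 3 * sigma ** 2)
    return scale * math.exp(-(gap - mu * h) ** 2 / (2.0 * h * sigma ** 2))


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * step)
    return total * step


# Hypothetical values: current HI, failure threshold, drift, diffusion.
X_NOW, W, MU, SIGMA = 2.0, 10.0, 0.5, 0.4


def pdf(h):
    return wiener_rul_pdf(h, X_NOW, W, MU, SIGMA)


# The upper bound 100 is far in the tail for these parameter values.
mass = trapezoid(pdf, 0.0, 100.0, 100_000)
mean_rul = trapezoid(lambda h: h * pdf(h), 0.0, 100.0, 100_000)

if __name__ == "__main__":
    print("integral of the density:", mass)
    print("mean RUL:", mean_rul, "vs (w - X)/mu =", (W - X_NOW) / MU)
```

A confidence interval on the RUL can be obtained in the same way, by accumulating the density until the desired probability levels are reached.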

Discrete Markov Processes

These methods are based on the principle of Markov chains for modeling processes that evolve through a finite number of states [37, 59]. By definition, it is assumed that the probability associated with each state, the probability associated with the transition from one state to another, and the probability of future failure can be estimated. The main property of Markov models is the assumption that the future state depends only on the current state, known as the conditional independence or memoryless assumption. The most commonly used models for fault prognosis are hidden Markov models (HMMs), characterized by two parameters, (1) the number of states of the system and (2) the number of observations per state, and by three probability distributions: (1) the probability distribution of transitions between states, (2) the probability distribution of observations, and (3) the initial probability distribution of states [12, 37, 83]. The HMM is an appropriate mathematical model for describing the failure mechanisms of systems that evolve through several degraded health states prior to failure, as it can estimate the unobservable health states using observable sensor signals. The word “hidden” refers to the fact that the states are hidden from direct observation and only manifest themselves through a probabilistic behavior. The HMM can capture the characteristics of each state of the failure process, which is the basis of HMM prognosis [37]. These methods thus make it possible to model several operating conditions of the system and failure scenarios. However, their implementation requires a large amount of data and knowledge for learning, and the computational cost, which is proportional to the number of states, can become significant for systems with several operating states.
The three basic issues in HMM implementation are: (1) Evaluation/Classification, i.e., computing the probability of an observation sequence given the model; (2) Decoding/Recognition, i.e., finding the sequence of hidden states that most probably generated the given sequence of observations; and (3) Learning/Training, i.e., adjusting the model parameters.

In the fault prognosis area, a widely used algorithm is the forward–backward algorithm, where the RUL at time n can be defined as:

$$\displaystyle \begin{aligned} {X_n} = \inf \left\{ {{x_n}:{Y_{n + {x_n}}} = N \mid {Y_n} \ne N} \right\} \end{aligned} $$
(8.39)

where \(Y_n\) is the nth observation. The calculation of RUL using Markov chains usually involves the use of the phase-type distribution. As a result, the distribution and the expectation of the RUL are given as:

$$\displaystyle \begin{aligned} \mathrm{Pr(}\mathrm{X}_{\mathrm{n}} = \mathrm{k}) = {\alpha _n}{\tilde P}^{\mathrm{k - 1}}(I - \tilde P) e {} \end{aligned} $$
(8.40)
$$\displaystyle \begin{aligned} \mathrm{E(}\mathrm{X}_{\mathrm{n}}) = \alpha _n(I - \tilde P)^{ - 1} e {} \end{aligned} $$
(8.41)

where

$$\displaystyle \begin{aligned} \begin{array}{*{20}{c}} \mathrm{P} = \left( {\begin{array}{*{20}{c}} \tilde P & {P_0} \\ 0 & 1 \\ \end{array}} \right) & \mathrm{with}\,\,\,\,{P_0} = \left( {I - \tilde P} \right)e \\ \end{array} \end{aligned} $$
(8.42)
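
The phase-type expressions of Eqs. (8.40)–(8.42) can be sketched in a few lines of Python. The sketch assumes the usual phase-type reading of the symbols, which the text does not define explicitly: \(\alpha_n\) is the probability vector over the non-failed states at time n, \(\tilde P\) is the transition matrix restricted to those states, and e is a column vector of ones. The function names and the two-state transition matrix are hypothetical.

```python
def vec_mat(vec, matrix):
    """Row vector times matrix."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(cols)]


def exit_vector(p_tilde):
    """P0 = (I - P~) e: one-step probability of failure from each state."""
    return [1.0 - sum(row) for row in p_tilde]


def rul_pmf(alpha, p_tilde, k):
    """Eq. (8.40): Pr(X_n = k) = alpha P~^(k-1) (I - P~) e, for k >= 1."""
    state = list(alpha)
    for _ in range(k - 1):
        state = vec_mat(state, p_tilde)
    return sum(s * p for s, p in zip(state, exit_vector(p_tilde)))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def expected_rul(alpha, p_tilde):
    """Eq. (8.41): E(X_n) = alpha (I - P~)^(-1) e, solving (I - P~) m = e."""
    n = len(p_tilde)
    i_minus_p = [[(1.0 if i == j else 0.0) - p_tilde[i][j] for j in range(n)]
                 for i in range(n)]
    m = solve(i_minus_p, [1.0] * n)
    return sum(a * mi for a, mi in zip(alpha, m))


if __name__ == "__main__":
    # Hypothetical chain: "healthy" -> "degraded" -> failure (absorbing).
    p_tilde = [[0.9, 0.1],
               [0.0, 0.8]]
    alpha = [1.0, 0.0]  # the system is currently known to be healthy
    print([rul_pmf(alpha, p_tilde, k) for k in range(1, 6)])
    print(expected_rul(alpha, p_tilde))
```

Solving the linear system rather than inverting \(I - \tilde P\) explicitly is a design choice of this sketch; both give the same expectation.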

HMM is suitable for nonlinear systems. It can estimate the data distribution of normal operation with nonlinear and multimodal characteristics, assuming that predictable fault patterns are not available. It is applicable to non-stationary systems. It has been widely applied in real applications. The main reason is that the plant operation condition can be divided into several meaningful states, such as “Good,” “OK,” “Minor Defects Only,” “Maintenance Required,” “Unserviceable,” so that the state definition is closer to what is used in industry. It can be used for fault and degradation diagnosis on non-stationary signals and dynamical systems. It is appropriate for multi-failure modes [37, 59, 66].

The main limitation is related to the property of Markov models, i.e. the memoryless assumption. The health state visit time is assumed to follow an exponential distribution, which could be inappropriate for some cases. The transition probability among the system states in Markov models is often determined by empirical knowledge or by a large number of samples, which is not always available. A large amount of data is needed for accurate modeling [37, 59, 66].

The hidden semi-Markov model (HSMM) is an improved HMM, which overcomes the inherent limitation of assuming exponential distributions. Unlike the HMM, the HSMM does not follow the unrealistic Markov chain assumption and therefore provides more powerful modeling and analysis capability for real problems. In addition, the HSMM allows modeling the time duration of the hidden states and therefore is well suited for fault prognosis. A practical example is given in [28], where an approach for RUL estimation from heterogeneous fleet data under variable operating conditions is proposed in three steps:

  • Identification of the degradation states of a homogeneous discrete-time finite-state semi-Markov model using an unsupervised ensemble clustering approach.

  • The maximum likelihood estimation (MLE) method and the Fisher information matrix (FIM) are used for parameter identification of the discrete Weibull distributions describing the transitions among the states and their uncertainties.

  • The direct Monte Carlo (MC) simulation based on the degradation model is used to estimate the RUL of fleet equipment.

The proposed approach is applied to two case studies: heterogeneous fleets of aluminum electrolytic capacitors and turbofan engines. Another solution proposed in [53] to overcome the lack of knowledge on the condition monitoring is the online updating of the parameters of the degradation model formulated as a first-order Markov process. The originality of this work consists of the combination of Particle Filtering (PF) technique with a Kernel Smoothing (KS) one, for simultaneously estimating the degradation state and the unknown parameters in the degradation model, while significantly overcoming the problem of particle impoverishment.

4.1.2 Conditional Probabilistic Models

These models are based on Bayes theorem, which describes the relationship between the conditional and marginal probabilities of two stochastic events A and B as follows:

$$\displaystyle \begin{aligned} P(A|B) = \frac{P(B|A)P(A)}{P(B)} \end{aligned} $$
(8.43)

These methods describe the current state as a conditional probability function and then apply Bayes theorem to update the probability assessment of future behavior. The most used modeling tool is the Bayesian network, which is a probabilistic graphical model representing random variables in the form of a directed acyclic graph. In the field of aeronautics, Ref. [39] uses such a network with variables such as aircraft weight, landing speed, and brake operation to predict brake failure. In other research work, Bayesian networks are associated with the Kalman filter [22, 69] or the particle filter [20, 93, 106] for failure prognosis.

4.1.3 Statistical Models

The ARMA, the ARIMA (Auto-Regressive Integrated Moving Average), and the ARMAX (Auto-Regressive Moving Average eXogenous inputs) models, initially used for time series prediction, have been used to estimate the RUL by considering the future value of the degradation as a linear function of system inputs, past observations, and random noise. To show how these methods are used for prognostics, let us take the example of the ARMA model. A time series \(\left \{ {{x_t}|t = 1,2,\ldots } \right \}\) is generated by an ARMA model (p, q) as follows:

$$\displaystyle \begin{aligned} x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=0}^q \theta_j \epsilon_{t-j}, \quad (\theta_0 = 1) \end{aligned} $$
(8.44)

where \(x_t\) is the value of the series at instant t, p and q are non-zero integers, p is the order of the auto-regressive part, q is the order of the moving average part, \(\left \{ {{\epsilon _t}} \right \}\) is the noise series, and \(\{\phi_i, i = 1, \ldots, p\}\) and \(\{\theta_j, j = 1, \ldots, q\}\) are the parameters to be estimated.

To use this model for fault prognosis, the variable \(x_t\) is taken as the HI, which represents the condition of the system, and the failure threshold D is assumed to be known. The RUL of the system at instant t is calculated by the following equation:

$$\displaystyle \begin{aligned} rul_t = {\mathrm{min}}\{\varDelta t : x_{t+\varDelta t} \geq D | x_{t} < D\} \end{aligned} $$
(8.45)
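
A minimal sketch of Eq. (8.45) is given below: the ARMA model of Eq. (8.44) is iterated forward, with each future noise term replaced by its zero mean, until the point forecast reaches the threshold D. The function name, the `max_horizon` safeguard, and the coefficient values in the example are illustrative assumptions; in practice the coefficients and residuals would come from fitting the model to the observed HI.

```python
def arma_rul(history, residuals, phi, theta, threshold, max_horizon=1000):
    """Smallest horizon at which the ARMA point forecast reaches the threshold.

    history   : observed HI values, most recent last (at least p values)
    residuals : one-step residuals aligned with history (at least q values)
    phi       : AR coefficients [phi_1, ..., phi_p]
    theta     : MA coefficients [theta_1, ..., theta_q]; theta_0 = 1 is implicit
    Returns None if the threshold is not reached within max_horizon steps.
    """
    x = list(history)
    eps = list(residuals)
    if x[-1] >= threshold:
        return 0  # the HI is already at or beyond the failure threshold
    for horizon in range(1, max_horizon + 1):
        ar_part = sum(c * x[-i] for i, c in enumerate(phi, start=1))
        ma_part = sum(c * eps[-j] for j, c in enumerate(theta, start=1))
        x.append(ar_part + ma_part)  # future noise replaced by its mean (zero)
        eps.append(0.0)
        if x[-1] >= threshold:
            return horizon
    return None


if __name__ == "__main__":
    # Hypothetical growing HI: x_t = 1.1 * x_{t-1}, current value 1.0, D = 2.0.
    print(arma_rul([1.0], [0.0], phi=[1.1], theta=[0.0], threshold=2.0))
```

Because only the point forecast is propagated, this sketch returns a single RUL value without a confidence interval.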

Yan et al. [118] have used an ARMA model for fault prognosis. An ARMA model is incorporated in software for data fusion and prognosis [65]. An extension of the ARMA model using bootstrap forecasting is proposed in [117].

These models are simple to use for prognosis. However, they assume that the future state of the system is a linear function of the system inputs, past observations, and noise, which is often not the case in reality. Moreover, their results are sensitive to the initial conditions, which leads to an accumulation of systematic errors in the prediction.

4.1.4 Deterministic Models

This approach is based on the calculation of the Euclidean distance (d) between the current status of the system, given by the current HI values, and the faulty HIs identified offline. The degradation speed (v), which indicates how fast the degradation moves from normal operation to faulty operation, is used to compute the RUL as follows:

$$\displaystyle \begin{aligned} rul\left( t \right) = \left| {\frac{{d\left( t \right)}}{{v\left( t \right)}}} \right| {} \end{aligned} $$
(8.46)

To compute the distance d(t) between the n HIs in real-time operation and the barycenter \(C_f(c_1, c_2, \ldots, c_n)\) of the identified faulty operating cluster, the following Euclidean metric is considered:

$$\displaystyle \begin{aligned} d\left( t \right) = \sqrt {{{\left( {{r_1}(t) - {c_1}} \right)}^2} + {{\left( {{r_2}(t) - {c_2}} \right)}^2} + \cdots + {{\left( {{r_n}(t) - {c_n}} \right)}^2}} {} \end{aligned} $$
(8.47)

where \(r_1(t), r_2(t), \ldots, r_n(t)\) is the set of HIs defining the monitoring space. This set can be generated using one of the HI generation methods presented in the previous section.

The numerical differentiation of the distance variable d is taken to compute the degradation speed v:

$$\displaystyle \begin{aligned} v\left( t \right) = \frac{{d\left( {t + {\varDelta T}} \right) - d\left( t\right)}}{{{\varDelta T}}} {} \end{aligned} $$
(8.48)

where ΔT is the sampling time for the degradation speed computation, whose value is chosen so that the noise is not amplified.
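
Equations (8.46)–(8.48) can be sketched as follows. Since the future sample d(t + ΔT) is not available on-line, the sketch assumes that the speed is computed as a backward difference between the two most recent samples; this choice, the function names, and the numerical values are illustrative assumptions.

```python
import math


def distance_to_fault(his, centroid):
    """Eq. (8.47): Euclidean distance from the HIs to the faulty barycenter."""
    return math.sqrt(sum((r - c) ** 2 for r, c in zip(his, centroid)))


def deterministic_rul(hi_prev, hi_now, centroid, delta_t):
    """Eq. (8.46), with the speed of Eq. (8.48) taken as a backward difference.

    hi_prev, hi_now : HI vectors at times t - delta_t and t
    centroid        : barycenter of the faulty operating cluster
    """
    d_prev = distance_to_fault(hi_prev, centroid)
    d_now = distance_to_fault(hi_now, centroid)
    speed = (d_now - d_prev) / delta_t
    if speed == 0.0:
        return math.inf  # no measurable drift, so no finite RUL estimate
    return abs(d_now / speed)


if __name__ == "__main__":
    # Hypothetical two-HI monitoring space with the faulty cluster at (10, 0).
    print(deterministic_rul((3.0, 0.0), (4.0, 0.0), (10.0, 0.0), delta_t=1.0))
```

Because of the absolute value in Eq. (8.46), a trajectory moving away from the faulty cluster also yields a finite value, so the sign of the speed may need to be checked separately.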

The main interest of this method is that no knowledge is required on the tendency or the pattern of degradation. It is accurate when the faulty operation is clearly identified, and it has been successfully applied to a wind turbine system [36] for RUL estimation.

Other deterministic models are used in the literature, especially when the degradation profile is known. The models are identified by using the fitting methods applied on the available profile of the degradation. Linear, exponential, and polynomial models are the most used [120].

4.1.5 Learning Techniques

Learning techniques are widely used in the literature for trend modeling in the field of failure prognosis. These regression models, such as neural networks and support vector regression (SVR), are scalable and able to accommodate nonlinear dynamics, but require a large amount of data for learning. For unsupervised learning cases, an example of using SVR for RUL prediction is proposed in [32] for failure prognosis of embedded systems. The SVR predicts the evolution of the health index one step ahead. The SVR expression is given as follows:

$$\displaystyle \begin{aligned} \hat{HI}(k+1)=\sum_{[i=(m-1)\tau+1]}^{N-1}\alpha_{i}^{*}K(HI(i),HI(k))+b^{*} \end{aligned} $$
(8.49)

where \(\alpha _{i}^{*}\) are Lagrange multipliers and τ is the delay. In this work, the standard SVR toolbox is used without any special adaptation for the prediction of time series. The free parameters C and 𝜖, the width of the (Gaussian) kernel, and the embedding dimension m are selected by an exhaustive search in the parameter space so as to optimize the prediction performance on the validation set. The N available observations are therefore split into a training set and a validation set, of respective sizes Ne and Nv. The values for which the one-step prediction error on the validation set is minimal are retained for the final prediction. Once the parameters are fixed, the prediction is made using all N available observations. Multi-step predictions, i.e., for the values k ≥ N + 1, are obtained by iterating the one-step prediction, using the vectors \(\hat {HI}(k)\) estimated at the previous iterations rather than the observations themselves.

In addition to machine learning techniques, deep learning techniques like long short-term memory (LSTM), which can remember information for long periods of time, are used for trend modeling and RUL prediction. An application case is proposed in [121], where a long short-term memory recurrent neural network is used for RUL prediction of lithium-ion batteries.

4.2 Physical Models

RUL estimation based on a physical model consists of considering that degradation follows a parametric trend, which can take the form of one of the following ordinary differential equations (ODEs):

$$\displaystyle \begin{aligned} \begin{array}{l} \dot F = {\beta _1}F \\ \dot F = {\beta _2}{F^2} \\ \ddot F = {\beta _3}\dot F + F + {\beta _4} \\ \ddot F = {\beta _5}{{\dot F}^2} + {\beta _6}F + {\beta _7} \\ \end{array} \end{aligned} $$
(8.50)

where F is the fault component value describing the degradation and \(\beta_i\) (i = 1, …, 7) are the degradation model coefficients, which are identified online using the least squares method [30] or a particle filter [29, 58]. For example, the RUL associated with the degradation model of the form \(\dot F = {\beta _1}F\) is given by:

$$\displaystyle \begin{aligned} {\mathrm{RUL}} = \frac{{\ln \left( {\frac{{1 - th}}{{1 - H{I_i}}}} \right)}}{{\beta _1}} - N{T_s} \end{aligned} $$
(8.51)

where N is the number of samples, \(T_s\) is the sampling time, and th is the failure threshold.
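As an illustration, the exponential case can be sketched in a few lines. The sketch rests on assumptions that the text does not state explicitly: the fault value is taken as F = 1 − HI, \(HI_i\) is read as the initial health index at time zero, and F is strictly positive so that its logarithm exists. The function names are hypothetical, and the coefficient is identified here by a batch least-squares fit rather than the on-line schemes of [30] or [29, 58].

```python
import math


def estimate_beta1(fault, ts):
    """Least-squares slope of ln F(k) against time k*ts.

    For dF/dt = beta1 * F the solution is F(t) = F(0) * exp(beta1 * t),
    so ln F is linear in t with slope beta1. Requires F(k) > 0.
    """
    t = [k * ts for k in range(len(fault))]
    y = [math.log(f) for f in fault]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def rul_exponential(beta1, hi_init, th, n, ts):
    """Eq. (8.51): time needed to reach the threshold minus elapsed time n*ts."""
    time_to_threshold = math.log((1.0 - th) / (1.0 - hi_init)) / beta1
    return time_to_threshold - n * ts
```

For example, with a fault value growing from 0.05 at a rate of 0.01 per time unit, a threshold th = 0.5, and 100 samples at a unit sampling time, `rul_exponential(0.01, 0.95, 0.5, 100, 1.0)` returns ln(10)/0.01 − 100, i.e., about 130 time units.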

Other trend modeling approaches for RUL estimation can be found in the literature, such as [99], where the RUL estimation is treated as an uncertainty propagation problem, [103], where the review focuses on statistical data-driven approaches relying only on available past observed data and statistical models, and [5], which provides practical options for prognostics so that beginners can select appropriate methods for their fields of application.

4.3 Practical Constraints

The purpose of the RUL estimate is to give maintenance experts two pieces of information: first, that the system will undergo a total failure, and second, a time horizon sufficient to plan a maintenance strategy. A metric proposed in [101], called Prognosis Horizon (PH), is used to evaluate this time horizon within a confidence interval that can be defined by the user. The PH ranges within \(\left[ 0,\infty \right)\) and is calculated as follows:

$$\displaystyle \begin{aligned} PH\left( i \right) = EoP - CT \end{aligned} $$
(8.52)

It represents the difference between the End of Prediction time index (EoP), obtained when the prediction crosses the failure threshold, and the Current Time index (CT). A practical example of the PH calculation is illustrated in Fig. 8.4a, which represents the PH calculated for a wind turbine system in presence of an unbalance fault caused by a progressive deformation of the blade. In this practical example, the HIs are generated using the PCA method and the trend modeling for RUL estimation is performed using a deterministic model based on the Euclidean distance [36]. Figure 8.4a shows that the obtained PH, for a confidence interval of 18%, is equal to 65 h. The maintenance expert can then decide whether or not this PH is sufficient to plan a maintenance strategy in good conditions. If the expert judges that this PH is insufficient, it can be increased, but only by widening the confidence interval.
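Eq. (8.52) can be sketched as follows. The sketch assumes that the health index increases with degradation and crosses the failure threshold from below, and that the first element of the predicted trajectory corresponds to time index CT + 1; the function names are hypothetical. Applying the same function to the lower and upper bounds of the predicted trajectory would give the bounds of the PH for a chosen confidence interval.

```python
def end_of_prediction(predicted_hi, threshold, ct):
    """Time index at which the predicted HI first reaches the threshold, or None."""
    for step, value in enumerate(predicted_hi, start=1):
        if value >= threshold:
            return ct + step
    return None  # the prediction never crosses the threshold


def prognosis_horizon(predicted_hi, threshold, ct):
    """Eq. (8.52): PH = EoP - CT, or None when no crossing is predicted."""
    eop = end_of_prediction(predicted_hi, threshold, ct)
    return None if eop is None else eop - ct
```

With a predicted trajectory `[0.2, 0.4, 0.6, 0.8, 1.0]`, a threshold of 0.75, and CT = 10, the threshold is reached at the fourth predicted step, so EoP = 14 and PH = 4 time indices.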

Fig. 8.4 Illustration of the PH and RA metrics calculated for a wind turbine system in presence of an unbalance fault caused by a progressive deformation of the blade

To give the user an easily interpretable measure of the confidence that can be placed in the PH metric, another metric is proposed in [101], in which the accuracy is quantified with respect to the real RUL. This metric is called relative accuracy (RA) and is expressed as follows:

$$\displaystyle \begin{aligned} RA\left( t \right) = 1 - \frac{{\left| {RU{L^*}\left( t \right) - {\mathrm{RUL}}\left( t \right)} \right|}}{{RU{L^*}\left( t \right)}} {} \end{aligned} $$
(8.53)

where \(RUL^{*}\) is the real RUL and RUL is the estimated RUL. The RA score lies in [0, 1], and the best score is close to 1. A practical example of the results of this metric applied to the estimated RUL before the total degradation of the wind turbine system is given in Fig. 8.4b. It shows that the RA is greater than 0.7 on average over the PH, but varies considerably. All these measures enable maintenance experts to assess the risks and make the right decisions for the maintenance of the systems.
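Eq. (8.53) and the two summary quantities discussed above (average RA and its variability over the PH) can be sketched as follows. The function names are hypothetical, and the spread is measured here simply as the difference between the largest and smallest RA, which is one possible choice among others.

```python
def relative_accuracy(rul_true, rul_est):
    """Eq. (8.53): RA = 1 - |RUL* - RUL| / RUL*, with RUL* the real RUL."""
    if rul_true <= 0:
        raise ValueError("the real RUL must be strictly positive")
    return 1.0 - abs(rul_true - rul_est) / rul_true


def ra_summary(rul_true_seq, rul_est_seq):
    """Mean RA and max-min spread over a sequence of prediction instants."""
    ras = [relative_accuracy(t, e) for t, e in zip(rul_true_seq, rul_est_seq)]
    mean_ra = sum(ras) / len(ras)
    spread = max(ras) - min(ras)
    return mean_ra, spread
```

For instance, an estimate of 80 h against a real RUL of 100 h gives RA = 0.8, and so does an estimate of 60 h against a real RUL of 50 h, since only the magnitude of the relative error matters.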

In addition, these two metrics can be used to compare the performance of different methods in a given context. A practical example is given in Fig. 8.5, where the PH and RA are calculated for four prognostic methods applied to RUL prediction on the mechanical transmission system presented in [14]. In that paper, the analytical redundancy method is used for HI generation and four trend modeling methods are applied for RUL estimation: an auto-regressive (AR) model, whose parameters are estimated using the least squares method; an updated Wiener process, whose drift parameter is updated using a Kalman filter; a first-order differential model, whose parameter is updated using a particle filter (PF); and a deterministic model based on the calculation of the Euclidean distance [14]. The performance results of the considered RUL estimation methods are given in Table 8.2, which shows that the AR model and the Wiener model have the largest PH, thus giving the user more time to react, whereas the Wiener model is less stable since it presents a greater variability in its RA.

Fig. 8.5 Practical illustration of the use of PH and RA metrics for the performance comparison of the RUL estimation methods

Table 8.2 Performance comparison of the considered RUL estimation methods by universal metrics

It would also be useful to evaluate the RUL prediction accuracy using the \(\alpha\)-\(\lambda\) accuracy and cumulative relative accuracy (CRA) metrics proposed in [100, 101].

5 Discussion and Future Challenges

5.1 Discussion

After analyzing the methods described above, one can see that discrete Markov models are the most complex to implement because they require expert knowledge and rich databases on the previous operation of the system and its failures; the uncertainty brought by expert knowledge is often taken into account using fuzzy logic. The memoryless assumption, which is the main property of discrete Markov processes and which they share with continuous processes, is a major limitation in the use of these processes for the estimation of the RUL. To overcome it, hidden semi-Markov models (HSMMs), which do not follow the unrealistic Markov chain assumption, can be used to provide more powerful modeling and analysis capabilities for real fault prognosis problems.

Continuous Markov processes, especially the Wiener and Gamma processes, are widely used in the literature because they are easy to implement and well suited to modeling the progressive dynamics of degradation phenomena. Updating the parameters with increasingly powerful techniques such as maximum likelihood, the Kalman filter, and the particle filter makes it possible to adapt the estimation to possible changes in the degradation rate and partly addresses the limitation related to the memoryless assumption. Research works have gone even further in modeling, drawing on the Cox model, by proposing a reliability function that takes into account the covariates representing changes in the operating modes of a system. The limitation of this model is that the evolution of the covariates must be known beforehand, which is difficult to achieve for systems such as energy and transport systems, where the covariates are uncontrolled environmental variables.

The representation of degradation processes by adaptive differential models makes it possible to take into account the physical knowledge available on these phenomena when choosing the order of the models. Continuous parameter updating adapts to changes in the degradation rate, and structured and unstructured uncertainties are taken into account by the generation of normal operating thresholds and total failure thresholds. The main limitations of this type of model are the amplification of noise caused by successive differentiation of the outputs and the lack of physical knowledge about the degradation processes, which generally leads to an arbitrary choice of the model order. Geometric models are efficient and accurate, but require complex classification work to identify clusters, using the physical model to generate the databases needed for learning.

The choice of the HI modeling approach depends on the context of use, the complexity of the system, and the information available on its previous operating modes, especially for the definition of the structure of the model as well as for the identification of its parameters. Table 8.3 summarizes a non-exhaustive set of criteria that can be used as a basis for choosing the HI modeling methodology for the estimation of RUL.

Table 8.3 Trend modeling approaches for RUL estimation

5.2 Future Challenges

The Remaining Useful Life (RUL) of a system can be expressed in a general form as a function of the time t, the current condition monitoring CM(t), and the current health state HS(t) (Eq. (8.54) below). However, in practice, the state of degradation is neither available nor measurable in the majority of cases, so health indices must be deduced from the physical knowledge, expert knowledge, and available measurements [90]:

$$\displaystyle \begin{aligned} {\mathrm{RUL}}= g(t, HS(t), CM(t)) {} \end{aligned} $$
(8.54)
  • The use of techniques initially developed for fault detection and isolation (FDI) to estimate the health state (HS) of the system is a good idea, but in the context of failure prognosis, early detection becomes a major issue, as it is necessary to estimate the RUL well in advance to allow maintenance operators to plan their maintenance interventions. FDI techniques treat uncertainties in a probabilistic or deterministic manner to generate thresholds that provide a better compromise between false alarms and non-detections. In the context of failure prognosis, it is also necessary to take into account the Prognosis Horizon [100, 101]. In addition, the problems related to the occurrence of multiple faults, their interaction, and their effect on the HS of the system remain open.

  • Condition monitoring (CM(t)), necessary for RUL estimation, is not always known, especially in systems operating in a randomly variable environment, such as offshore wind turbines and transportation systems. The solution proposed in the literature consists in taking account of the known or controlled CMs by using, for example, the modified Cox model [89], and in compensating for the lack of knowledge about the unknown CMs by an online update of the model parameters. This solution is effective in some application cases, but when these CMs vary strongly and continuously, the RUL estimate will also change considerably and continuously, which prevents its use for planning the maintenance. The solution may be to associate the RUL estimate with risk analysis methods that take into account several operating and degradation scenarios [123, 125].

6 Conclusion

This review of horizontal approaches for RUL estimation has highlighted the diversity of methods proposed in the literature as well as their formal description and context of use. The analysis shows that the two major stages of the procedure can be synthesized independently, but that the context of use, the complexity of the systems, and the history of available data and expertise are common elements that govern the relevance of the choice of the HI generation method and of the method used to model its trend for the estimation of the RUL. The diversity of methods for generating health indices can cover a wide range of applications. PCA makes it possible to simultaneously reduce the size of the data and generate health indices from large databases with linear, bilinear, or nonlinear dependencies. When the instrumentation of the systems is not rich, statistical, frequency, and time-frequency attributes can be extracted from the signals and then analyzed to derive HIs for the estimation of the RUL. When physical knowledge is relevant enough to make modeling assumptions and to build and validate physical models, the latter are then associated with health index generation methods such as analytical redundancy, observers, and parameter estimation. In this paper, we have also presented the trend modeling methods that are the simplest to implement and that are adapted to the physical properties of degradation processes, such as the progressive aspect and the influence of the environment and operating modes of the systems. The choice of the model depends on the context of use, the physical knowledge available on degradation processes, the expert feedback, and the available data. The RUL can be presented as a stochastic, probabilistic, or deterministic variable.