6.1 Introduction

Stochastic processes are used to describe a system's operation over time. There are two main types of stochastic processes: continuous and discrete. A continuous-time process describes a system's transitions from state to state as time evolves. The simplest such process, and the one discussed here, is the Markov process: given the current state of the process, its future behavior does not depend on the past. This chapter describes the concepts of stochastic processes including the Markov process, Poisson process, renewal process, quasi-renewal process, and nonhomogeneous Poisson process, and their applications in reliability and availability for degraded systems with repairable components.

6.2 Markov Processes

In this section, we will discuss discrete stochastic processes. As an introduction to the Markov process, let us examine the following example.

Example 6.1

Consider a parallel system consisting of two components (see Fig. 6.1). From a reliability point of view, the states of the system can be described by:

Fig. 6.1 A two-component parallel system

  • State 1: Full operation (both components operating).

  • State 2: One component operating - one component failed.

  • State 3: Both components have failed.

Define

$$ P_{i} (t) = P[X(t) = i] = P[{\text{system is in state}}\;i\;{\text{at time}}\;t] $$

and

$$P_{i} (t + dt) = P[X(t + dt) = i] = P[{\text{system is in state}}\;i\;{\text{at time}}\;t + dt].$$

Define a random variable X(t) which can assume the values 1, 2, or 3 corresponding to the above-mentioned states. Since X(t) is a random variable, one can discuss P[X(t) = 1], P[X(t) = 2], or the conditional probability P[X(t1) = 2 | X(t0) = 1]. Because X(t) is defined as a function of time t, the last conditional probability, P[X(t1) = 2 | X(t0) = 1], can be interpreted as the probability of being in state 2 at time t1, given that the system was in state 1 at time t0. In this example, the “state space” is discrete, i.e., 1, 2, 3, etc., and the parameter space (time) is continuous. The simple process described above is called a stochastic process, i.e., a process which develops in time (or space) in accordance with some probabilistic (stochastic) laws. There are many types of stochastic processes. In this section, the emphasis will be on Markov processes, which are a special type of stochastic process.

Definition 6.1

Let t0 < t1 < \( \cdots \) < tn. If

$$ \begin{aligned} P[X(t_{n} ) & = A_{n} |X(t_{{n - 1}} ) = A_{{n - 1}} ,X(t_{{n - 2}} ) = A_{{n - 2}} , \ldots ,X(t_{0} ) = A_{0} ] \\ & = P[X(t_{n} ) = A_{n} |X(t_{{n - 1}} ) = A_{{n - 1}} ] \\ \end{aligned} $$
(6.1)

then the process is called a Markov process. Given the present state of the process, its future behavior does not depend on past information of the process.

The essential characteristic of a Markov process is that it is a process that has no memory; its future is determined by the present and not the past. If, in addition to having no memory, the process is such that it depends only on the difference (t + dt) − t = dt and not the value of t, i.e., P[X(t + dt) = j | X(t) = i] is independent of t, then the process is Markov with stationary transition probabilities or homogeneous in time. This is the same property noted in exponential event times, and referring back to the graphical representation of X(t), the times between state changes would in fact be exponential if the process has stationary transition probabilities.

Thus, a time-homogeneous Markov process can be described as a process where events have exponential occurrence times. The random variable of the process is X(t), the state variable, rather than the time to failure as in the exponential failure density. To see the types of processes that can be described, a review of the exponential distribution and its properties will be made. Recall that if X1, X2, …, Xn are independent random variables, each with exponential density and mean equal to 1/λi, then min{X1, X2, …, Xn} has an exponential density with mean \(\left( {\sum {\lambda _{i} } } \right)^{{ - 1}}\).

The significance of the property is as follows:

  1. The failure behavior of the simultaneous operation of components can be characterized by an exponential density with a mean equal to the reciprocal of the sum of the failure rates.

  2. The joint failure/repair behavior of a system where components are operating and/or undergoing repair can be characterized by an exponential density with a mean equal to the reciprocal of the sum of the failure and repair rates.

  3. The failure/repair behavior of a system such as 2 above, but further complicated by active and dormant operating states and sensing and switching, can be characterized by an exponential density.

The above property means that almost all reliability and availability models can be characterized by a time homogeneous Markov process if the various failure times and repair times are exponential. The notation for the Markov process is {X(t), t > 0}, where X(t) is discrete (state space) and t is continuous (parameter space). By convention, this type of Markov process is called a continuous parameter Markov chain.
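As a quick illustration of the minimum-of-exponentials property, the short Python sketch below simulates min{X1, X2, X3} for independent exponential times and compares its empirical mean with \(\left( {\sum {\lambda _{i} } } \right)^{{ - 1}}\); the rate values are arbitrary assumptions for illustration only.

```python
# A minimal Monte Carlo sketch: the minimum of independent exponentials
# with rates lambda_i is exponential with mean 1 / sum(lambda_i).
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.5, 1.0, 2.0])        # assumed rates (per hour)
samples = rng.exponential(1.0 / lam, size=(200_000, lam.size))
first_event = samples.min(axis=1)      # time of the first event

print("empirical mean :", first_event.mean())
print("theoretical    :", 1.0 / lam.sum())   # (sum lambda_i)^(-1)
```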

From a reliability/availability viewpoint, there are two types of Markov processes. These are defined as follows:

  1. Absorbing Process: Contains what is called an “absorbing state,” which is a state from which the system can never leave once it has entered, e.g., a failure which aborts a flight or a mission.

  2. Ergodic Process: Contains no absorbing states, such that X(t) can move around indefinitely, e.g., the operation of a ground power plant where failure only temporarily disrupts the operation.

Pham (2000) presents a summary of the processes to be considered broken down by absorbing and ergodic categories. Both reliability and availability can be described in terms of the probability of the process or system being in defined “up” states, e.g., states 1 and 2 in the initial example. Likewise, the mean time between failures (MTBF) can be described as the total time in the “up” states before proceeding to the absorbing state or failure state.

Define the incremental transition probability as

$$ P_{{ij}} (dt) = P[X(t + dt) = j|X(t) = i] $$

This is the probability that the process (random variable X(t)) will go to state j during the increment t to (t + dt), given that it was in state i at time t. Since we are dealing with time homogeneous Markov processes, i.e., exponential failure and repair times, the incremental transition probabilities can be derived from an analysis of the exponential hazard function. In Sect. 2.1, it was shown that the hazard function for the exponential with mean 1/λ was just λ. This means that the limiting (as \(dt \to 0\)) conditional probability of an event occurrence between t and t + dt, given that an event had not occurred at time t, is just λ, i.e.,

$$ h(t) = \mathop {\lim }\limits_{{dt \to 0}} \frac{{P[t < X < t + dt|X > t]}}{{dt}} = \lambda $$

The equivalent statement for the random variable X(t) is

$$ h(t)dt = P[X(t + dt) = j|X(t) = i] = \lambda dt $$

Now, h(t)dt is in fact the incremental transition probability, thus the Pij(dt) can be stated in terms of the basic failure and/or repair rates. Define:

Pi(t): the probability that the system is in state i at time t

rij(t): transition rate from state i to state j

In general, the differential equations can be written as follows:

$$ \frac{{\partial P_{i} (t)}}{{\partial t}} = - \sum\limits_{j} {r_{{ij}} (t)P_{i} (t)} + \sum\limits_{j} {r_{{ji}} (t)P_{j} (t)} . $$
(6.2)

Solving the above differential equations, one can obtain the time-dependent probability of each state.
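Equation (6.2) can also be integrated numerically. The sketch below is a minimal example using scipy; the rate matrix encodes the two-component parallel system of Example 6.1 with an assumed value of λ.

```python
# A sketch of solving Eq. (6.2) numerically; R[i, j] holds the
# transition rate from state i to state j (assumed example values).
import numpy as np
from scipy.integrate import solve_ivp

lam = 0.01                              # assumed failure rate
R = np.array([[0.0, 2 * lam, 0.0],      # two-component parallel system
              [0.0, 0.0,     lam],
              [0.0, 0.0,     0.0]])

def dP(t, P):
    # dP_i/dt = -sum_j r_ij P_i + sum_j r_ji P_j
    return -R.sum(axis=1) * P + R.T @ P

sol = solve_ivp(dP, (0.0, 100.0), [1.0, 0.0, 0.0])
print("P(t=100) =", sol.y[:, -1])       # state probabilities at t = 100
```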

Example 6.2

Returning to Example 6.1, a state transition diagram can easily be constructed showing the incremental transition probabilities for the process between all possible states:

  • State 1: Both components operating.

  • State 2: One component up - one component down.

  • State 3: Both components down (absorbing state).

The loops (see Fig. 6.2) indicate the probability of remaining in the present state during the dt increment:

$$ \begin{array}{*{20}l} {P_{{11}} (dt) = 1 - 2\lambda dt} \hfill & {P_{{12}} (dt) = 2\lambda dt} \hfill & {P_{{13}} (dt) = 0} \hfill \\ {P_{{21}} (dt) = 0} \hfill & {P_{{22}} (dt) = 1 - \lambda dt} \hfill & {P_{{23}} (dt) = \lambda dt} \hfill \\ {P_{{31}} (dt) = 0} \hfill & {P_{{32}} (dt) = 0} \hfill & {P_{{33}} (dt) = 1} \hfill \\ \end{array} $$
Fig. 6.2 State transition diagram for a two-component system


The zeros on Pij, i > j, denote that the process cannot go backwards, i.e., this is not a repair process. The zero on P13 denotes that in a process of this type, the probability of more than one event (e.g., failure, repair, etc.) in the incremental time period dt approaches zero as dt approaches zero.

Except for the initial conditions of the process, i.e., the state in which the process starts, the process is completely specified by the incremental transition probabilities. The reason for this is that the assumption of exponential event (failure or repair) times allows the process to be characterized at any time t since it depends only on what happens between t and (t + dt). The incremental transition probabilities can be arranged into a matrix in a way which depicts all possible state-to-state movements. Thus, for the parallel configuration,

$$ [P_{{ij}} (dt)] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda dt} & {2\lambda dt} & 0 \\ 0 & {1 - \lambda dt} & {\lambda dt} \\ 0 & 0 & 1 \\ \end{array} } \right] $$

for i, j = 1, 2, or 3. The matrix [Pij(dt)] is called the incremental, one-step transition matrix. It is a stochastic matrix, i.e., the rows sum to 1.0. As mentioned earlier, this matrix along with the initial conditions completely describes the process.

Now, [Pij(dt)] gives the probabilities for either remaining or moving to all the various states during the interval t to t + dt, hence,

$$ \begin{aligned} & P_{1} (t + dt) = (1 - 2\lambda dt)P_{1} (t) \\ & P_{2} (t + dt) = 2\lambda dt\,P_{1} (t) + (1 - \lambda dt)P_{2} (t) \\ & P_{3} (t + dt) = \lambda dt\,P_{2} (t) + P_{3} (t) \\ \end{aligned} $$
(6.3)

By algebraic manipulation, we have

$$ \begin{aligned} & \frac{{[P_{1} (t{\text{ }} + {\text{ }}dt) - P_{1} (t)]}}{{dt}} = - 2\lambda P_{1} (t) \\ & \frac{{[P_{2} (t{\text{ }} + {\text{ }}dt) - P_{2} (t)]}}{{dt}} = 2\lambda P_{1} (t) - \lambda P_{2} (t) \\ & \frac{{[P_{3} (t{\text{ }} + {\text{ }}dt) - P_{3} (t)]}}{{dt}} = \lambda P_{2} (t) \\ \end{aligned} $$

Taking limits of both sides as \(dt \to 0\), we obtain (also see Fig. 6.3)

$$ \begin{aligned} & P_{1}^{\prime } (t) = - 2\lambda P_{1} (t) \\ & P_{2}^{\prime } (t) = 2\lambda P_{1} (t) - \lambda P_{2} (t) \\ & P_{3}^{\prime } (t) = \lambda P_{2} (t) \\ \end{aligned} $$
(6.4)
Fig. 6.3 Markov transition rate diagram for a two-component parallel system

The above system of linear first-order differential equations can be easily solved for P1(t) and P2(t), and therefore, the reliability of the configuration can be obtained:

$$ R(t) = \sum\limits_{{i = 1}}^{2} {P_{i} (t)} $$

Actually, there is no need to solve all three equations; only the first two are required, since P3(t) does not appear in them and P3(t) = 1 − P1(t) − P2(t). The system of linear, first-order differential equations can be solved by various means, both manual and machine. For purposes here, the manual method employing the Laplace transform (see Appendix B) will be used.

$$L[P_{i}(t)] = \int_{0}^{\infty } {e^{{ - st}} } P_{i}(t)dt = f_{i}(s)$$
(6.5)
$$L[P_{i}^{\prime } (t)] = \int_{0}^{\infty } {e^{{ - st}} } P_{i}^{\prime } (t)dt = s{\text{ }}f_{i}(s) - P_{i} (0)$$

The use of the Laplace transform will allow transformation of the system of linear, first-order differential equations into a system of linear algebraic equations which can easily be solved, and by means of the inverse transforms, solutions of Pi(t) can be determined.

Returning to the example, the initial condition of the parallel configuration is assumed to be “full-up” such that

$$ P_{1} (t = 0) = 1,P_{2} (t = 0) = 0,P_{3} (t = 0) = 0 $$

Transforming the equations for P′1(t) and P′2(t) gives

$$ \begin{aligned} & sf_{1} (s) - P_{1} (t)|_{{t = 0}} = - 2\lambda f_{1} (s) \\ & sf_{2} (s) - P_{2} (t)|_{{t = 0}} = 2\lambda f_{1} (s) - \lambda f_{2} (s) \\ \end{aligned} $$

Evaluating P1(t) and P2(t) at t = 0 gives

$$ \begin{aligned} & sf_{1} (s) - 1 = - 2\lambda f_{1} (s) \\ & sf_{2} (s) - 0 = 2\lambda f_{1} (s) - \lambda f_{2} (s) \\ \end{aligned} $$

from which we obtain

$$ \begin{aligned} & (s + 2\lambda )f_{1} (s) = 1 \\ & - 2\lambda f_{1} (s) + (s + \lambda )f_{2} (s) = 0 \\ \end{aligned} $$

Solving the above equations for f1(s) and f2(s), we have

$$ \begin{aligned} & f_{1} (s) = \frac{1}{{(s + 2\lambda )}} \\ & f_{2} (s) = \frac{{2\lambda }}{{[(s + 2\lambda )(s + \lambda )]}} \\ \end{aligned} $$

Using the inverse Laplace transforms from Appendix B,

$$ \begin{aligned} & P_{1} (t) = e^{{ - 2\lambda t}} \\ & P_{2} (t) = 2e^{{ - \lambda t}} - 2e^{{ - 2\lambda t}} \\ & R(t) = P_{1} (t) + P_{2} (t) = 2e^{{ - \lambda t}} - e^{{ - 2\lambda t}} \\ \end{aligned} $$
(6.6)

The example given above is that of a simple absorbing process where we are concerned with reliability. If repair capability in the form of a repair rate μ were added to the model, the methodology would remain the same, with only the final result changing.
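Before adding repair, the closed form (6.6) can be checked against a direct numerical integration of Eq. (6.4), as in the sketch below; λ is an assumed value.

```python
# A quick numerical check of Eq. (6.6) against integration of Eq. (6.4).
import numpy as np
from scipy.integrate import solve_ivp

lam = 0.02   # assumed failure rate
sol = solve_ivp(lambda t, P: [-2*lam*P[0], 2*lam*P[0] - lam*P[1]],
                (0.0, 50.0), [1.0, 0.0], rtol=1e-9, atol=1e-12)
R_numeric = sol.y[0, -1] + sol.y[1, -1]
R_closed = 2*np.exp(-lam*50) - np.exp(-2*lam*50)   # Eq. (6.6)
print(R_numeric, R_closed)    # the two values should agree closely
```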

Example 6.3

Continued from Example 6.2 with a repair rate μ added to the parallel configuration (see Fig. 6.4), the incremental transition matrix would be

$$ [P_{{ij}} (dt)] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda dt} & {2\lambda dt} & 0 \\ {\mu dt} & {1 - (\lambda + \mu )dt} & {\lambda dt} \\ 0 & 0 & 1 \\ \end{array} } \right] $$
Fig. 6.4 Markov transition rate diagram for a two-component repairable system

The differential equations would become

$$ \begin{array}{*{20}l} {P_{1} ^{\prime } (t) = - 2\lambda P_{1} (t) + \mu P_{2} (t)} \hfill \\ {P_{2}^{\prime } (t) = 2\lambda P_{1} (t) - (\lambda + \mu )P_{2} (t)} \hfill \\ \end{array} $$
(6.7)

and the transformed equations would become

$$ \begin{aligned} & (s + 2\lambda )f_{1} (s) - \mu f_{2} (s) = 1 \\ & - 2\lambda f_{1} (s) + (s + \lambda + \mu )f_{2} (s) = 0 \\ \end{aligned} $$

Hence, we obtain

$$ \begin{aligned} & f_{1} (s) = \frac{{(s + \lambda + \mu )}}{{(s - s_{1} )(s - s_{2} )}} \\ & f_{2} (s) = \frac{{2\lambda }}{{(s - s_{1} )(s - s_{2} )}} \\ \end{aligned} $$

where

$$ \begin{aligned} & s_{1} = \frac{{ - (3\lambda + \mu ) + \sqrt {(3\lambda + \mu )^{2} - 8\lambda ^{2} } }}{2} \\ & s_{2} = \frac{{ - (3\lambda + \mu ) - \sqrt {(3\lambda + \mu )^{2} - 8\lambda ^{2} } }}{2} \\ \end{aligned} $$
(6.8)

Using the inverse Laplace transform (see Appendix B), we obtain

$$ \begin{aligned} & P_{1} (t) = \frac{{(s_{1} + \lambda + \mu )e^{{s_{1} t}} }}{{(s_{1} - s_{2} )}} + \frac{{(s_{2} + \lambda + \mu )e^{{s_{2} t}} }}{{(s_{2} - s_{1} )}} \\ & P_{2} (t) = \frac{{2\lambda e^{{s_{1} t}} }}{{(s_{1} - s_{2} )}} + \frac{{2\lambda e^{{s_{2} t}} }}{{(s_{2} - s_{1} )}} \\ \end{aligned} $$

The reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t). Thus, the reliability of the two-component parallel system is given by

$$ \begin{aligned} R(t) & = P_{1} (t) + P_{2} (t) \\ & = \frac{{(s_{1} + 3\lambda + \mu )e^{{s_{1} t}} - (s_{2} + 3\lambda + \mu )e^{{s_{2} t}} }}{{(s_{1} - s_{2} )}} \\ \end{aligned} $$
(6.9)
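The sketch below evaluates Eq. (6.9) with assumed values of λ and μ; note that s1 and s2 from Eq. (6.8) are negative, so both exponential terms decay.

```python
# Reliability of the repairable two-component parallel system, Eq. (6.9);
# lam and mu are assumed values for illustration.
import numpy as np

lam, mu = 0.01, 0.5
disc = np.sqrt((3*lam + mu)**2 - 8*lam**2)
s1 = (-(3*lam + mu) + disc) / 2       # roots from Eq. (6.8)
s2 = (-(3*lam + mu) - disc) / 2

def R(t):
    return ((s1 + 3*lam + mu)*np.exp(s1*t)
            - (s2 + 3*lam + mu)*np.exp(s2*t)) / (s1 - s2)

print(R(100.0))   # mission reliability at t = 100 h
```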

6.2.1 Three Non-Identical Unit Load-Sharing Parallel Systems

Let fif(t), fih(t), and fie(t) be the pdfs for time to failure of unit i, for i = 1, 2, and 3, under the full-load, half-load, and equal-load conditions (the latter occurs when all three units are working), respectively. Also let Rif(t), Rih(t), and Rie(t) be the reliability of unit i under the full-load, half-load, and equal-load conditions, respectively. The following events would be considered for the three-unit load-sharing parallel system to work:

  • Event 1: All three units are working until the end of the mission time t.

  • Event 2: All three units are working until time t1; at time t1 one of the three units fails. The remaining two units are working until the end of the mission.

  • Event 3: All three units are working until time t1; at time t1 one of the three units fails. Then at time t2 the second unit fails, and the remaining unit is working until the end of the mission time t.

Example 6.4

Consider a three-unit shared-load parallel system where:

  • \(\lambda _{0}\) is the constant failure rate of a unit when all three units are operational;

  • \(\lambda _{h}\) is the constant failure rate of each of the two surviving units, each of which shares half of the total load; and

  • \(\lambda _{f}\) is the constant failure rate of a unit at full load.

For a shared-load parallel system to fail, all the units in the system must fail.

We now derive the reliability of a 3-unit shared-load parallel system using the Markov method. The following events would be considered for the three-unit load-sharing system to work:

  • Event 1: All three units are working until the end of the mission time t, where each unit shares one-third of the total load.

  • Event 2: All three units are working until time t1 (each shares one-third of the total load). At time t1, one of the units (say unit 1) fails, and the other two units (say units 2 and 3) continue to work until the mission time t. Once a unit fails at time t1, the two remaining working units each take half of the total load and have a constant failure rate \(\lambda _{h}\). Since all units are identical, there are 3 possibilities under this situation.

  • Event 3: All three units are working until time t1 (each shares one-third of the total load) when one (say unit 1) of the three units fails. At time t2 (t2 > t1) one more unit fails (say unit 2), and the remaining unit works until the end of the mission time t. Under this event, there are 6 possible orderings in which two units fail before time t and only one unit remains working until time t.

Define state i to represent that i components are working. Let Pi(t) denote the probability that the system is in state i at time t for i = 0, 1, 2, 3. Figure 6.5 shows the Markov diagram of the system.

Fig. 6.5 Markov diagram for a three-unit shared load parallel system

The Markov equations can be obtained as follows:

$$ \left\{ {\begin{array}{*{20}l} {\frac{{dP_{3} (t)}}{{dt}} = - 3\lambda _{0} P_{3} (t)} \hfill \\ {\frac{{dP_{2} (t)}}{{dt}} = 3\lambda _{0} P_{3} (t) - 2\lambda _{h} P_{2} (t)} \hfill \\ {\frac{{dP_{1} (t)}}{{dt}} = 2\lambda _{h} P_{2} (t) - \lambda _{f} P_{1} (t)} \hfill \\ {\frac{{dP_{0} (t)}}{{dt}} = \lambda _{f} P_{1} (t)} \hfill \\ {P_{3} (0) = 1} \hfill \\ {P_{j} (0) = 0,{\text{ j}} \ne 3} \hfill \\ {P_{0} (t) + P_{1} (t) + P_{2} (t) + P_{3} (t) = 1} \hfill \\ \end{array} } \right. $$

Solving the above differential equations using the Laplace transform method, we can easily obtain the following results:

$$ P_{3} (t){\text{ = e}}^{{ - {\text{3}}\;\lambda _{{\text{0}}} \;{\text{t}}}} $$
$$ P_{2} (t) = \frac{{3\lambda _{0} }}{{2\lambda _{h} - 3\lambda _{0} }}\left( {{\text{e}}^{{ - {\text{3}}\lambda _{{\text{0}}} t}} - {\text{e}}^{{ - {\text{2}}\lambda _{{\text{h}}} t}} } \right) $$
$$ P_{1} (t) = \frac{{6\lambda _{0} \lambda _{{\text{h}}} }}{{\left( {2\lambda _{h} - 3\lambda _{0} } \right)}}\left[ {\frac{{{\text{e}}^{{ - {\text{3}}\lambda _{{\text{0}}} t}} }}{{\left( {\lambda _{f} - 3\lambda _{0} } \right)}} - \frac{{{\text{e}}^{{ - {\text{2}}\lambda _{{\text{h}}} t}} }}{{\left( {\lambda _{f} - 2\lambda _{h} } \right)}} + \frac{{\left( {2\lambda _{h} - 3\lambda _{0} } \right){\text{e}}^{{ - \lambda _{{\text{f}}} t}} }}{{\left( {\lambda _{f} - 3\lambda _{0} } \right)\left( {\lambda _{f} - 2\lambda _{h} } \right)}}} \right] $$

Hence, the reliability of a three-unit shared-load parallel system is

$$ \begin{aligned} R(t) & = {\text{P}}_{{\text{3}}} (t) + {\text{P}}_{{\text{2}}} (t) + {\text{P}}_{{\text{1}}} (t) \\ & = {\text{e}}^{{ - {\text{3}}\lambda _{{\text{0}}} t}} + \frac{{3\lambda _{0} }}{{2\lambda _{h} - 3\lambda _{0} }}\left( {{\text{e}}^{{ - {\text{3}}\lambda _{{\text{0}}} t}} - {\text{e}}^{{ - {\text{2}}\lambda _{{\text{h}}} t}} } \right) \\ & + \frac{{6\lambda _{0} \lambda _{{\text{h}}} }}{{\left( {2\lambda _{h} - 3\lambda _{0} } \right)}}\left[ {\frac{{{\text{e}}^{{ - {\text{3}}\lambda _{{\text{0}}} t}} }}{{\left( {\lambda _{f} - 3\lambda _{0} } \right)}} - \frac{{{\text{e}}^{{ - {\text{2}}\lambda _{{\text{h}}} t}} }}{{\left( {\lambda _{f} - 2\lambda _{h} } \right)}} + \frac{{\left( {2\lambda _{h} - 3\lambda _{0} } \right){\text{e}}^{{ - \lambda _{{\text{f}}} t}} }}{{\left( {\lambda _{f} - 3\lambda _{0} } \right)\left( {\lambda _{f} - 2\lambda _{h} } \right)}}} \right] \\ \end{aligned} $$
(6.10)

which is the same as Eq. (6.60) in Chap. 4.
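A minimal evaluation of Eq. (6.10) is sketched below; the three load-dependent rates are assumed values chosen only so that the denominators are nonzero.

```python
# Reliability of the three-unit shared-load parallel system, Eq. (6.10);
# the rates below are illustrative assumptions (and must be distinct).
import numpy as np

l0, lh, lf = 0.001, 0.002, 0.005   # lambda_0, lambda_h, lambda_f

def R(t):
    p3 = np.exp(-3*l0*t)
    p2 = 3*l0/(2*lh - 3*l0) * (np.exp(-3*l0*t) - np.exp(-2*lh*t))
    p1 = (6*l0*lh/(2*lh - 3*l0)
          * (np.exp(-3*l0*t)/(lf - 3*l0)
             - np.exp(-2*lh*t)/(lf - 2*lh)
             + (2*lh - 3*l0)*np.exp(-lf*t)/((lf - 3*l0)*(lf - 2*lh))))
    return p3 + p2 + p1

print(R(100.0))   # mission reliability at t = 100 h
```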

6.2.2 System Mean Time Between Failures

Another parameter of interest in absorbing Markov processes is the mean time between failures (MTBF) (Pham et al. 1997). Recalling the previous Example 6.3 of a parallel configuration with repair, the differential equations P1′(t) and P2′(t) describing the process were (see Eq. 6.7):

$$ \begin{aligned} & P_{1} ^{\prime } (t) = - 2\lambda P_{1} (t) + \mu P_{2} (t) \\ & P_{2}^{\prime } (t) = 2\lambda P_{1} (t) - (\lambda + \mu )P_{2} (t). \\ \end{aligned} $$
(6.11)

Integrating both sides of the above equations yields

$$ \int\limits_{0}^{\infty } {P^{\prime } _{1} (t)dt = - 2\lambda } \int\limits_{0}^{\infty } {P_{1} (t)dt} + \mu \int\limits_{0}^{\infty } {P_{2} (t)dt} $$
$$ \int\limits_{0}^{\infty } {P_{2}^{\prime } (t)dt = 2\lambda } \int\limits_{0}^{\infty } {P_{1} (t)dt} - (\lambda + \mu )\int\limits_{0}^{\infty } {P_{2} (t)dt} $$

From Chap. 1,

$$ \int\limits_{0}^{\infty } {R(t)dt} = MTTF $$
(6.12)

Similarly,

$$ \begin{aligned} & \int\limits_{0}^{\infty } {P_{1} (t)dt} = {\text{mean}}\;{\text{time}}\;{\text{spent}}\;{\text{in}}\;{\text{state}}\;1,{\text{and}} \\ & \int\limits_{0}^{\infty } {P_{2} (t)dt} = {\text{mean}}\;{\text{time}}\;{\text{spent}}\;{\text{in}}\;{\text{state}}\;{\text{2}} \\ \end{aligned} $$

Designating these mean times as T1 and T2, respectively, we have

$$ \begin{aligned} & P_{1} (t)|_{0}^{\infty } = - 2\lambda T_{1} + \mu T_{2} \\ & P_{2} (t)|_{0}^{\infty } = 2\lambda T_{1} - (\lambda + \mu )T_{2} \\ \end{aligned} $$

But P1(t) = 0 as \(t \to \infty\) and P1(t) = 1 for t = 0. Likewise, P2(t) = 0 as \(t \to \infty\) and P2(t) = 0 for t = 0. Thus,

$$ \begin{aligned} - 1 & = - 2\lambda T_{1} + \mu T_{2} \\ 0 & = 2\lambda T_{1} - (\lambda + \mu )T_{2} \\ \end{aligned} $$

or, equivalently,

$$\left[ \begin{gathered} - 1 \\ 0 \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu \\ {2\lambda } & { - (\lambda + \mu )} \\ \end{array} } \right]\left[ \begin{gathered} T_{1} \\ T_{2} \\ \end{gathered} \right]$$

Therefore,

$$ \begin{array}{*{20}c} {T_{1} = \frac{{(\lambda + \mu )}}{{2\lambda ^{2} }}\quad \quad T_{2} = \frac{1}{\lambda }} \\ {MTTF = T_{1} + T_{2} = \frac{{(\lambda + \mu )}}{{2\lambda ^{2} }} + \frac{1}{\lambda } = \frac{{(3\lambda + \mu )}}{{2\lambda ^{2} }}} \\ \end{array} $$
(6.13)
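The linear system preceding Eq. (6.13) is also easy to solve numerically; the sketch below (assumed λ and μ) confirms the closed-form MTTF.

```python
# Solving [-1, 0]^T = Q [T1, T2]^T for the mean times in each state;
# lam and mu are assumed values.
import numpy as np

lam, mu = 0.01, 0.5
Q = np.array([[-2*lam,            mu],
              [ 2*lam, -(lam + mu)]])
T1, T2 = np.linalg.solve(Q, [-1.0, 0.0])
print(T1 + T2, (3*lam + mu)/(2*lam**2))   # MTTF; both should match
```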

The MTBF for non-maintained processes is developed exactly the same way as just shown. What remains under absorbing processes is the case of availability for maintained systems. The difference between reliability and availability for absorbing processes is somewhat subtle. A good example is that of a communication system where, if such a system failed temporarily, the mission would continue, but, if it failed permanently, the mission would be aborted.

Example 6.5

Consider the following cold-standby configuration consisting of two units: one main unit and one spare unit (see Fig. 6.6):

Fig. 6.6 A cold-standby system

  • State 1: Main unit operating—spare OK.

  • State 2: Main unit out—restoration underway.

  • State 3: Spare unit installed and operating.

  • State 4: Permanent failure (no spare available).

From Fig. 6.7, the incremental transition matrix is given by

$$ [P_{{ij}} (dt)] = \left[ {\begin{array}{*{20}c} {1 - \lambda dt} & {\lambda dt} & 0 & 0 \\ 0 & {1 - \mu dt} & {\mu dt} & 0 \\ 0 & 0 & {1 - \lambda dt} & {\lambda dt} \\ 0 & 0 & 0 & 1 \\ \end{array} } \right] $$
Fig. 6.7 State transition diagram for the cold-standby system

We obtain

$$ \begin{aligned} & P_{1}^{\prime } (t) = - \lambda P_{1} (t) \\ & P_{2}^{\prime } (t) = \lambda P_{1} (t) - \mu P_{2} (t) \\ & P_{3} ^{\prime } (t) = \mu P_{2} (t) - \lambda P_{3} (t) \\ \end{aligned} $$

Using the Laplace transform, we obtain

$$ \begin{aligned} sf_{1} (s) - 1 & = - \lambda f_{1} (s) \\ sf_{2} (s) & = \lambda f_{1} (s) - \mu f_{2} (s) \\ sf_{3} (s) & = \mu f_{2} (s) - \lambda f_{3} (s) \\ \end{aligned} $$

After simplifications,

$$ \begin{aligned} & f_{1} (s) = \frac{1}{{(s + \lambda )}} \\ & f_{2} (s) = \frac{\lambda }{{[(s + \lambda )(s + \mu )]}} \\ & f_{3} (s) = \frac{{\lambda \mu }}{{[(s + \lambda )^{2} (s + \mu )]}} \\ \end{aligned} $$

Therefore, the probability of full-up performance, P1(t), is given by

$$ P_{1} (t) = e^{{ - \lambda t}} $$
(6.14)

Similarly, the probability of the system being down and under repair, P2(t), is

$$ P_{2} (t) = \left[ {\frac{\lambda }{{(\lambda - \mu )}}} \right]\left( {e^{{ - \mu t}} - e^{{ - \lambda t}} } \right) $$
(6.15)

and the probability of the system being full-up but no spare available, P3(t), is

$$ P_{3} (t) = \left[ {\frac{{\lambda \mu }}{{(\lambda - \mu )^{2} }}} \right][e^{{ - \mu t}} - e^{{ - \lambda t}} - (\lambda - \mu )te^{{ - \lambda t}} ] $$
(6.16)

Hence, the point availability, A(t), is given by

$$ A(t) = P_{1} (t) + P_{3} (t) $$
(6.17)

If average or interval availability is required, this is achieved by

$$ A(T) = \left( {\frac{1}{T}} \right)\int_{0}^{T} {A(t)dt} = \left( {\frac{1}{T}} \right)\int_{0}^{T} {[P_{1} (t) + P_{3} (t)]dt} $$
(6.18)

where T is the interval of concern.
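The point and interval availabilities of the cold-standby system, Eqs. (6.14)–(6.18), can be evaluated as in the sketch below; λ and μ are assumed values with λ ≠ μ.

```python
# Point and interval availability of the cold-standby system.
import numpy as np
from scipy.integrate import quad

lam, mu = 0.01, 0.2   # assumed failure and repair rates

def A(t):                                  # Eq. (6.17): A = P1 + P3
    p1 = np.exp(-lam*t)
    p3 = (lam*mu/(lam - mu)**2) * (np.exp(-mu*t) - np.exp(-lam*t)
                                   - (lam - mu)*t*np.exp(-lam*t))
    return p1 + p3

T = 100.0
interval, _ = quad(A, 0.0, T)              # Eq. (6.18)
print("A(50) =", A(50.0), " interval availability =", interval / T)
```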

With the above example, cases of the absorbing process (both maintained and non-maintained) have been covered insofar as “manual” methods are concerned. In general, the methodology for treatment of absorbing Markov processes can be “packaged” in a fairly simplified form by utilizing matrix notation. Thus, for example, if the incremental transition matrix is defined as follows:

$$ [P_{{ij}} (dt)] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda dt} & {2\lambda dt} & 0 \\ {\mu dt} & {1 - (\lambda + \mu )dt} & {\lambda dt} \\ 0 & 0 & 1 \\ \end{array} } \right] $$

then if the dts are dropped and the last row and the last column are deleted, the remainder is designated as the matrix T:

$$ [T] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda } & {2\lambda } \\ \mu & {1 - (\lambda + \mu )} \\ \end{array} } \right] $$

Define [Q] = [T]′ − [I], where [T]′ is the transpose of [T] and [I] is the identity matrix:

$$ \begin{aligned} & [Q] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda } & \mu \\ {2\lambda } & {1 - (\lambda + \mu )} \\ \end{array} } \right] - \left[ {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ \end{array} } \right] \\ & \quad \quad = \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu \\ {2\lambda } & { - (\lambda + \mu )} \\ \end{array} } \right] \\ \end{aligned} $$

Further define [P(t)] and [P′(t)] as column vectors such that

$$ [P(t)] = \left[ \begin{gathered} P_{1} (t) \hfill \\ P_{2} (t) \hfill \\ \end{gathered} \right],\quad [P^{\prime } (t)] = \left[ \begin{gathered} P_{1} ^{\prime } (t) \hfill \\ P_{2} ^{\prime } (t) \hfill \\ \end{gathered} \right] $$

then

$$ \left[ {P^{\prime } (t)} \right] = \left[ Q \right]\left[ {P(t)} \right] $$

At this point, solution of the system of differential equations will produce solutions for P1(t) and P2(t). If the MTBF is desired, integration of both sides of the system produces

$$\begin{aligned} & \left[ \begin{gathered} - 1 \hfill \\ {\text{ }}0 \hfill \\ \end{gathered} \right] = [Q]\left[ \begin{gathered} T_{1} \\ T_{2} \\ \end{gathered} \right] \\ & \left[ \begin{gathered} - 1 \hfill \\ {\text{ }}0 \hfill \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu \\ {2\lambda } & { - (\lambda + \mu )} \\ \end{array} } \right]\left[ \begin{gathered} T_{1} \\ T_{2} \\ \end{gathered} \right]\quad {\text{or}} \\ & [Q]^{{ - 1}} \left[ \begin{gathered} - 1 \\ {\text{ }}0 \\ \end{gathered} \right] = \left[ \begin{gathered} T_{1} \\ T_{2} \\ \end{gathered} \right] \\ \end{aligned}$$

where [Q]−1 is the inverse of [Q] and the MTBF is given by

$$ {\text{MTBF}} = T_{1} + T_{2} = \frac{{3\lambda + \mu }}{{2\lambda ^{2} }} $$

In the more general MTBF case,

$$ [Q]^{{ - 1}} \left[ \begin{gathered} - 1 \\ 0 \\ \cdot \\ \cdot \\ \cdot \\ 0 \\ \end{gathered} \right] = \left[ \begin{gathered} T_{1} \\ T_{2} \\ \cdot \\ \cdot \\ \cdot \\ T_{{n - 1}} \\ \end{gathered} \right]\quad {\text{where}}\;\sum\limits_{{i = 1}}^{{n - 1}} {T_{i} = {\text{MTBF}}} $$

and (n − 1) is the number of non-absorbing states.

For the reliability/availability case, utilizing the Laplace transform, the system of linear, first-order differential equations is transformed to

$$\begin{aligned} s\left[ \begin{gathered} f_{1} (s) \\ f_{2} (s) \\ \end{gathered} \right] - \left[ \begin{gathered} 1 \\ 0 \\ \end{gathered} \right] & = [Q]\left[ \begin{gathered} f_{1} (s) \\ f_{2} (s) \\ \end{gathered} \right] \\ [sI - Q]\left[ \begin{gathered} f_{1} (s) \\ f_{2} (s) \\ \end{gathered} \right] & = \left[ \begin{gathered} 1 \\ 0 \\ \end{gathered} \right] \\ \left[ \begin{gathered} f_{1} (s) \\ f_{2} (s) \\ \end{gathered} \right] & = [sI - Q]^{{ - 1}} \left[ \begin{gathered} 1 \\ 0 \\ \end{gathered} \right] \\ L^{{ - 1}} \left[ \begin{gathered} f_{1} (s) \\ f_{2} (s) \\ \end{gathered} \right] & = L^{{ - 1}} \left\{ {[sI - Q]^{{ - 1}} \left[ \begin{gathered} 1 \\ 0 \\ \end{gathered} \right]} \right\} \\ \left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ \end{gathered} \right] & = L^{{ - 1}} \left\{ {[sI - Q]^{{ - 1}} \left[ \begin{gathered} 1 \\ 0 \\ \end{gathered} \right]} \right\} \\ \end{aligned}$$

Generalization of the latter to the case of (n − 1) non-absorbing states is straightforward.

Ergodic processes, as opposed to absorbing processes, do not have any absorbing states, and hence movement between states can go on indefinitely. For this reason, availability (point, steady-state, or interval) is the only meaningful measure. As an example of an ergodic process, a ground-based power unit configured in parallel will be selected.

Example 6.6

Consider a parallel system consisting of two identical units, each with exponential failure and repair times with constant rates λ and μ, respectively (see Fig. 6.8). Assume a two-repairman capability if required (both units down). Then:

Fig. 6.8 State transition diagram with repair for a parallel system

  • State 1: Full-up (both units operating).

  • State 2: One unit down and under repair (other unit up).

  • State 3: Both units down and under repair.

It should be noted that, as in the case of failure events, two or more repairs cannot be made in the dt interval.

$$ [P_{{ij}} (dt)] = \left[ {\begin{array}{*{20}c} {1 - 2\lambda dt} & {2\lambda dt} & 0 \\ {\mu dt} & {1 - (\lambda + \mu )dt} & {\lambda dt} \\ 0 & {2\mu dt} & {1 - 2\mu dt} \\ \end{array} } \right] $$
  • Case I: Point Availability—Ergodic Process. For an ergodic process, as \(t \to \infty\) the availability settles down to a constant level. Point availability gives a measure of things before the “settling down” and reflects the initial conditions on the process. Solution of the point availability is similar to the case for absorbing processes except that the last row and column of the transition matrix must be retained and entered into the system of equations. For example, the system of differential equations becomes

$$\left[ \begin{gathered} P_{1} ^{\prime } (t) \\ P_{2} ^{\prime } (t) \\ P_{3} ^{\prime } (t) \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu & 0 \\ {2\lambda } & { - (\lambda + \mu )} & {2\mu } \\ 0 & \lambda & { - 2\mu } \\ \end{array} } \right]\left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ P_{3} (t) \\ \end{gathered} \right]$$

Similar to the absorbing case, the method of the Laplace transform can be used to solve for P1(t), P2(t), and P3(t), with the point availability, A(t), given by

$$ A(t) = P_{1} (t) + P_{2} (t) $$
(6.19)
  • Case II: Interval Availability—Ergodic Process. This is the same as the absorbing case with integration over time period T of interest. The interval availability, A(T), is

$$ A(T) = \frac{1}{T}\int\limits_{0}^{T} {A(t)dt} $$
(6.20)
  • Case III: Steady State Availability—Ergodic Process. Here the process is examined as \(t \to \infty\) with complete “washout” of the initial conditions. Letting \(t \to \infty\) the system of differential equations can be transformed to linear algebraic equations. Thus,

    $$\mathop {\lim }\limits_{{t \to \infty }} \left[ \begin{gathered} P_{1} ^{\prime } (t) \\ P_{2} ^{\prime } (t) \\ P_{3} ^{\prime } (t) \\ \end{gathered} \right] = \mathop {\lim }\limits_{{t \to \infty }} \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu & 0 \\ {2\lambda } & { - (\lambda + \mu )} & {2\mu } \\ 0 & \lambda & { - 2\mu } \\ \end{array} } \right]\left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ P_{3} (t) \\ \end{gathered} \right]$$

As \(t \to \infty\), \(P_{i} (t) \to\) constant and \(P_{i} ^{\prime } (t) \to 0\). This leads to an unsolvable system, namely

$$\left[ \begin{gathered} 0 \\ 0 \\ 0 \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} { - 2\lambda } & \mu & 0 \\ {2\lambda } & { - (\lambda + \mu )} & {2\mu } \\ 0 & \lambda & { - 2\mu } \\ \end{array} } \right]\left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ P_{3} (t) \\ \end{gathered} \right]$$

To avoid the above difficulty, an additional equation is introduced:

$$ \sum\limits_{{i = 1}}^{3} {P_{i} (t) = 1} $$

With the introduction of the new equation, one of the original equations is deleted and a new system is formed:

$$\left[ \begin{gathered} 1 \\ 0 \\ 0 \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} 1 & 1 & 1 \\ { - 2\lambda } & \mu & 0 \\ {2\lambda } & { - (\lambda + \mu )} & {2\mu } \\ \end{array} } \right]\left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ P_{3} (t) \\ \end{gathered} \right]$$

or, equivalently,

$$\left[ \begin{gathered} P_{1} (t) \\ P_{2} (t) \\ P_{3} (t) \\ \end{gathered} \right] = \left[ {\begin{array}{*{20}c} 1 & 1 & 1 \\ { - 2\lambda } & \mu & 0 \\ {2\lambda } & { - (\lambda + \mu )} & {2\mu } \\ \end{array} } \right]^{{ - 1}} \left[ \begin{gathered} 1 \\ 0 \\ 0 \\ \end{gathered} \right]$$

We now obtain the following results:

$$ \begin{aligned} & P_{1} (t) = \frac{{\mu ^{2} }}{{(\mu + \lambda )^{2} }} \\ & P_{2} (t) = \frac{{2\lambda \mu }}{{(\mu + \lambda )^{2} }} \\ \end{aligned} $$

and

$$ P_{3} (t) = 1 - P_{1} (t) - P_{2} (t) = \frac{{\lambda ^{2} }}{{(\mu + \lambda )^{2} }} $$

Therefore, the steady state availability, A(∞), is given by

$$ A(\infty ) = P_{1} (t) + P_{2} (t) = \frac{{\mu (\mu + 2\lambda )}}{{(\mu + \lambda )^{2} }}. $$
(6.21)
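The augmented linear system above lends itself to a direct numerical solution; the sketch below (assumed λ and μ) reproduces Eq. (6.21).

```python
# Steady-state probabilities for the ergodic two-unit parallel system,
# solving the augmented system preceding Eq. (6.21); rates are assumed.
import numpy as np

lam, mu = 0.01, 0.5
M = np.array([[1.0,     1.0,          1.0],
              [-2*lam,  mu,           0.0],
              [ 2*lam, -(lam + mu),   2*mu]])
P = np.linalg.solve(M, [1.0, 0.0, 0.0])
print("A(inf) =", P[0] + P[1], " check:", mu*(mu + 2*lam)/(mu + lam)**2)
```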

Note that Markov methods can also be employed where failure or repair times are not exponential, but can be represented as the sum of exponential times with identical means (Erlang distribution or Gamma distribution with integer valued shape parameters). Basically, the method involves the introduction of “dummy” states which are of no particular interest in themselves, but serve the purpose of changing the hazard function from constant to increasing.

Example 6.7

We now discuss two Markov models (Cases 1 and 2 below) for the control systems of a nuclear power plant in reliability and safety analysis. A basic system transition diagram for both models is presented in Fig. 6.9. In both models, it is assumed that the control system is composed of a control rod and an associated safety system. The following assumptions apply in this example.

Fig. 6.9 System state diagram

  (i) All failures are statistically independent.

  (ii) Each unit has a constant failure rate.

  (iii) The control system fails when the control rod fails.

The following notation is associated with the system shown in Fig. 6.9.

i: ith state of the system: i = 1 (control and its associated safety system operating normally); i = 2 (control operating normally, safety system failed); i = 3 (control failed with an accident); i = 4 (control failed safely); i = 5 (control failed but its associated safety system operating normally).

Pi(t): probability that the control system is in state i at time t, i = 1, 2, …, 5.

λi: ith constant failure rate: i = s (state 1 to state 2), i = ci (state 2 to state 3), i = cs (state 2 to state 4), i = c (state 1 to state 5).

Pi(s): Laplace transform of the probability that the control system is in state i, i = 1, 2, …, 5.

s: Laplace transform variable.

  • Case 1: The system represented by Model 1 is shown in Fig. 6.9. Using the Markov approach, the system of differential equations (associated with Fig. 6.9) is given below:

$$ \begin{aligned} & P_{1}^{\prime } (t) = - \left( {\lambda _{s} + \lambda _{c} } \right)P_{1} (t) \\ & P_{2}^{\prime } (t) = \lambda _{s} P_{1} (t) - \left( {\lambda _{{ci}} + \lambda _{{cs}} } \right)P_{2} (t) \\ & P_{3}^{\prime } (t) = \lambda _{{ci}} P_{2} (t) \\ & P_{4}^{\prime } (t) = \lambda _{{cs}} P_{2} (t) \\ & P_{5}^{\prime } (t) = \lambda _{c} P_{1} (t) \\ \end{aligned} $$

Assume that at time t = 0, P1(0) = 1, and P2(0) = P3(0) = P4(0) = P5(0) = 0. Solving the above system of equations, we obtain

$$ \begin{aligned} & P_{1} (t) = e^{{ - At}} \\ & P_{2} (t) = \frac{{\lambda _{s} }}{B}\left( {e^{{ - Ct}} - e^{{ - At}} } \right) \\ & P_{3} (t) = \frac{{\lambda _{s} \lambda _{{ci}} }}{{AC}}\left( {1 - \frac{{Ae^{{ - Ct}} - Ce^{{ - At}} }}{B}} \right) \\ & P_{4} (t) = \frac{{\lambda _{s} \lambda _{{cs}} }}{{AC}}\left( {1 - \frac{{Ae^{{ - Ct}} - Ce^{{ - At}} }}{B}} \right) \\ & P_{5} (t) = \frac{{\lambda _{c} }}{A}\left( {1 - e^{{ - At}} } \right) \\ \end{aligned} $$

where

$$ A = \lambda _{s} + \lambda _{c} ;\quad B = \lambda _{s} + \lambda _{c} - \lambda _{{cs}} - \lambda _{{ci}} ;\quad C = \lambda _{{ci}} + \lambda _{{cs}} . $$

The reliability of both the control and its safety system working normally, Rcs, is given by

$$ R_{{cs}} (t) = P_{1} (t) = e^{{ - At}} . $$

The reliability of the control system working normally with or without the safety system functioning successfully is

$$ R_{{ss}} (t) = P_{1} (t) + P_{2} (t) = e^{{ - At}} + \frac{{\lambda _{s} }}{B}\left( {e^{{ - Ct}} - e^{{ - At}} } \right). $$

The mean time to failure (MTTF) of the control with the safety system up is

$$ MTTF_{{cs}} = \int\limits_{0}^{\infty } {R_{{cs}} (t)dt} = \frac{1}{A}. $$

Similarly, the MTTF of the control with the safety system up or down is

$$ MTTF_{{ss}} = \int\limits_{0}^{\infty } {R_{{ss}} (t)dt} = \frac{1}{A} + \frac{{\lambda _{s} }}{B}\left( {\frac{1}{C} - \frac{1}{A}} \right). $$
  • Case 2: This model is the same as Case 1 except that a repair is allowed when the safety system fails with a constant rate μ. The system of differential equations for this model is as follows:

    $$ \begin{aligned} & P_{1}^{\prime } (t) = \mu P_{2} (t) - AP_{1} (t) \\ & P_{2}^{\prime } (t) = \lambda _{s} P_{1} (t) - \left( {\lambda _{{ci}} + \lambda _{{cs}} + \mu } \right)P_{2} (t) \\ & P_{3}^{\prime } (t) = \lambda _{{ci}} P_{2} (t) \\ & P_{4}^{\prime } (t) = \lambda _{{cs}} P_{2} (t) \\ & P_{5}^{\prime } (t) = \lambda _{c} P_{1} (t) \\ \end{aligned} $$

We assume that at time t = 0, P1(0) = 1, and P2(0) = P3(0) = P4(0) = P5(0) = 0. Solving the above system of equations, we obtain

$$ \begin{aligned} & P_{1} (t) = e^{{ - At}} + \mu \lambda _{s} \left[ {\frac{{e^{{ - At}} }}{{\left( {r_{1} + A} \right)\left( {r_{2} + A} \right)}} + \frac{{e^{{r_{1} t}} }}{{\left( {r_{1} + A} \right)\left( {r_{1} - r_{2} } \right)}} + \frac{{e^{{r_{2} t}} }}{{\left( {r_{2} + A} \right)\left( {r_{2} - r_{1} } \right)}}} \right] \\ & P_{2} (t) = \lambda _{s} \frac{{e^{{r_{1} t}} - e^{{r_{2} t}} }}{{\left( {r_{1} - r_{2} } \right)}} \\ & P_{3} (t) = \frac{{\lambda _{s} \lambda _{{ci}} }}{{r_{1} r_{2} }}\left( {\frac{{r_{1} e^{{r_{2} t}} - r_{2} e^{{r_{1} t}} }}{{r_{2} - r_{1} }} + 1} \right) \\ & P_{4} (t) = \frac{{\lambda _{s} \lambda _{{cs}} }}{{r_{1} r_{2} }}\left( {\frac{{r_{1} e^{{r_{2} t}} - r_{2} e^{{r_{1} t}} }}{{r_{2} - r_{1} }} + 1} \right) \\ & P_{5} (t) = \frac{{\lambda _{c} }}{A}\left( {1 - e^{{ - At}} } \right) + \mu \lambda _{s} \lambda _{c} \\ & \quad \quad \left[ {\frac{1}{{r_{1} r_{2} A}} - \frac{{e^{{ - At}} }}{{A\left( {r_{1} + A} \right)\left( {r_{2} + A} \right)}} + \frac{{e^{{r_{1} t}} }}{{r_{1} \left( {r_{1} + A} \right)\left( {r_{1} - r_{2} } \right)}} + \frac{{e^{{r_{2} t}} }}{{r_{2} \left( {r_{2} + A} \right)\left( {r_{2} - r_{1} } \right)}}} \right] \\ \end{aligned} $$

where

$$ \begin{aligned} & r_{1} ,r_{2} = \frac{{ - a \pm \sqrt {a^{2} - 4b} }}{2}, \\ & a = A + C + \mu ,\quad b = \lambda _{{ci}} \lambda _{s} + \lambda _{{cs}} \lambda _{s} + \left( {\lambda _{{ci}} + \lambda _{{cs}} + \mu } \right)\lambda _{c} . \\ \end{aligned} $$

The reliability of both the control and its associated safety system working normally, with the safety system repairable, is

$$ R_{{cs}} (t) = e^{{ - At}} + \mu \lambda _{s} \left[ {\frac{{e^{{ - At}} }}{{\left( {r_{1} + A} \right)\left( {r_{2} + A} \right)}} + \frac{{e^{{r_{1} t}} }}{{\left( {r_{1} + A} \right)\left( {r_{1} - r_{2} } \right)}} + \frac{{e^{{r_{2} t}} }}{{\left( {r_{2} + A} \right)\left( {r_{2} - r_{1} } \right)}}} \right]. $$

The reliability of the control operating normally, with or without the safety system operating (but having safety system repair), is

$$ \begin{aligned} R_{{ss}} (t) & = e^{{ - At}} + \frac{{\lambda _{s} \left( {e^{{r_{1} t}} - e^{{r_{2} t}} } \right)}}{{\left( {r_{1} - r_{2} } \right)}} \\ & \quad + \mu \lambda _{s} \left[ {\frac{{e^{{ - At}} }}{{\left( {r_{1} + A} \right)\left( {r_{2} + A} \right)}} + \frac{{e^{{r_{1} t}} }}{{\left( {r_{1} + A} \right)\left( {r_{1} - r_{2} } \right)}} + \frac{{e^{{r_{2} t}} }}{{\left( {r_{2} + A} \right)\left( {r_{2} - r_{1} } \right)}}} \right]. \\ \end{aligned} $$

The MTTF of the control with the safety system operating is

$$ MTTF_{{cs}} = \int\limits_{0}^{\infty } {R_{{cs}} (t)dt = } \frac{1}{A}\left( {1 + \frac{{\mu \lambda _{s} }}{b}} \right). $$

We can see that the repair process has helped to improve the system's MTTF. Similarly, the MTTF of the control with the safety system up or down but with accessible repair is given by

$$ MTTF_{{ss}} = \int\limits_{0}^{\infty } {R_{{ss}} (t)dt = } \frac{1}{A}\left( {1 + \frac{{\mu \lambda _{s} }}{b}} \right) + \frac{{\lambda _{s} }}{b}. $$
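A small sketch comparing the two cases is given below; all rate values are assumed for illustration, not data from the text.

```python
# Evaluating the Case 1 and Case 2 MTTF results side by side.
ls, lci, lcs, lc = 1e-4, 5e-4, 2e-4, 3e-4   # lambda_s, _ci, _cs, _c (assumed)
mu = 1e-2                                   # assumed safety-system repair rate

A = ls + lc
B = ls + lc - lcs - lci
C = lci + lcs
b = lci*ls + lcs*ls + (lci + lcs + mu)*lc

mttf_cs_case1 = 1/A                              # Case 1: no repair
mttf_ss_case1 = 1/A + (ls/B)*(1/C - 1/A)
mttf_cs_case2 = (1/A)*(1 + mu*ls/b)              # Case 2: with repair
print(mttf_cs_case1, mttf_ss_case1, mttf_cs_case2)
```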

Example 6.8

A system is composed of eight identical active power supplies; at least seven of the eight are required for the system to function. In other words, when two of the eight power supplies fail, the system fails. When all eight power supplies are operating, each has a constant failure rate \(\lambda _{a}\) per hour. If one power supply fails, each remaining power supply has a failure rate \(\lambda _{b}\) per hour, where \(\lambda _{a} \le \lambda _{b}\). We assume that a failed power supply can be repaired with a constant rate \(\mu\) per hour. The system reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t). Here we wish to determine the system mean time to failure (MTTF).

Define:

  • State 0: All 8 units are working.

  • State 1: 7 units are working.

  • State 2: More than one unit has failed and the system does not work.

The initial condition:\(P_{0} (0) = 1,P_{1} (0) = P_{2} (0) = 0\).

The Markov differential equations (see Fig. 6.10) can be written as follows:

$$ \begin{aligned} & P_{0}^{\prime } (t) = - 8\lambda _{a} P_{0} (t) + \mu P_{1} (t) \\ & P_{1}^{\prime } (t) = 8\lambda _{a} P_{0} (t) - \left( {7\lambda _{b} + \mu } \right)P_{1} (t) \\ & P_{2}^{\prime } (t) = 7\lambda _{b} P_{1} (t) \\ \end{aligned} $$
(6.22)
Fig. 6.10 Markov transition rate diagram for a 7-out-of-8 dependent system

Using the Laplace transform, we obtain

$$ \left\{ {\begin{array}{*{20}l} {sF_{0} (s) - P_{0} (0) = - 8\lambda _{a} F_{0} (s) + \mu F_{1} (s)} \hfill \\ {sF_{1} (s) - P_{1} (0) = 8\lambda _{a} F_{0} (s) - \left( {7\lambda _{b} + \mu } \right)F_{1} (s)} \hfill \\ {sF_{2} (s) - P_{2} (0) = 7\lambda _{b} F_{1} (s)} \hfill \\ \end{array} } \right. $$
(6.23)

When s = 0:

$$ F_{i} (0) = \int\limits_{0}^{\infty } {P_{i} (t)dt} . $$

Thus, the system reliability function and system MTTF, respectively, are

$$ R(t) = P_{0} (t) + P_{1} (t). $$
(6.24)

and

$$ MTTF = \int\limits_{0}^{\infty } {R(t)dt} = \int\limits_{0}^{\infty } {\left[ {P_{0} (t) + P_{1} (t)} \right]dt} = \sum\limits_{{i = 0}}^{1} {F_{i} (0)} . $$
(6.25)

From Eq. (6.23), when s = 0, we have

$$ \left\{ {\begin{array}{*{20}l} { - 1 = - 8\lambda _{a} F_{0} (0) + \mu F_{1} (0)} \hfill \\ {0 = 8\lambda _{a} F_{0} (0) - \left( {7\lambda _{b} + \mu } \right)F_{1} (0)} \hfill \\ \end{array} } \right. $$
(6.26)

From Eq. (6.26), after some rearrangement, we can obtain

$$ 7\lambda _{b} F_{1} (0) = 1\quad \Rightarrow \quad F_{1} (0) = \frac{1}{{7\lambda _{b} }} $$

and

$$ \begin{aligned} F_{0} (0) & = \frac{{7\lambda _{b} + \mu }}{{8\lambda _{a} }}{\text{ }}F_{1} (0) \\ & = \frac{{7\lambda _{b} + \mu }}{{8\lambda _{a} }}{\text{ }}\frac{{\text{1}}}{{7\lambda _{b} }} = \frac{{7\lambda _{b} + \mu }}{{56\lambda _{a} \lambda _{b} }}. \\ \end{aligned} $$

From Eq. (6.25), the system MTTF can be obtained

$$ \begin{aligned} MTTF & = \int\limits_{0}^{\infty } {R(t)dt} = \int\limits_{0}^{\infty } {\left[ {P_{0} (t) + P_{1} (t)} \right]dt} = F_{0} (0) + F_{1} (0) \\ & = \frac{{7\lambda _{b} + \mu }}{{56\lambda _{a} \lambda _{b} }} + \frac{{\text{1}}}{{7\lambda _{b} }} = \frac{{\mu {\text{ }} + {\text{ 8}}\lambda _{a} {\text{ + 7}}\lambda _{b} }}{{56\lambda _{a} {\text{ }}\lambda _{b} }}. \\ \end{aligned} $$

Given \(\lambda _{a} = 3\, \times \,10^{{ - 3}} = 0.003\), \(\lambda _{b} = 5\, \times\, 10^{{ - 2}} = 0.05\), and \(\mu = 0.8\), the system mean time to failure is given by:

$$ \begin{aligned} MTTF & = {\text{ }}\frac{{\mu + {\text{8}}\lambda _{a} + {\text{7}}\lambda _{b} }}{{{\text{56 }}\lambda _{a} \,\lambda _{b} }} \\ & = \frac{{0.8{\text{ }} + {\text{ 8(0}}{\text{.003) }} + {\text{ 7}}\left( {0.05} \right)}}{{56\left( {0.003} \right)\left( {0.05} \right)}} = \frac{{{\text{1}}{\text{.174}}}}{{{\text{0}}{\text{.0084}}}} = {\text{139}}.762\;{\text{h}}. \\ \end{aligned} $$
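The same MTTF can be recovered by solving the s = 0 system (6.26) numerically, as in the sketch below, which uses the rates given in this example.

```python
# Reproducing Example 6.8's MTTF both by a linear solve of Eq. (6.26)
# and from the closed form.
import numpy as np

la, lb, mu = 0.003, 0.05, 0.8
Q = np.array([[-8*la,            mu],
              [ 8*la, -(7*lb + mu)]])
F0, F1 = np.linalg.solve(Q, [-1.0, 0.0])
print(F0 + F1)                                # MTTF by linear solve
print((mu + 8*la + 7*lb) / (56*la*lb))        # closed form, about 139.76 h
```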

Example 6.9

A system consists of two independent components operating in parallel (see Fig. 6.1) with a single repair facility, where repair may be completed for a failed component before the other component has failed. Both components are assumed to be functioning at time t = 0. When both components have failed, the system is considered to have failed and no recovery is possible. Assume component i has constant failure rate \(\lambda _{i}\) and repair rate \(\mu _{i}\) for i = 1 and 2. The system reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t).

  (a) Derive the system reliability function and the system mean time to failure (MTTF), and calculate the MTTF.

  (b) Assume that both components have the same failure rate \(\lambda\) and repair rate \(\mu\); that is, \(\lambda _{1} = \lambda _{2} = \lambda\) and \(\mu _{1} = \mu _{2} = \mu\). Calculate the reliability function and the system MTTF when \(\lambda = 0.003\) per hour, \(\mu = 0.1\) per hour, and t = 25 h.

Define:

  • State 1: both components are working.

  • State 2: component 1 failed, component 2 is working.

  • State 3: component 2 failed, component 1 is working.

  • State 4: Both components 1 and 2 failed.

The initial conditions: \(P_{1} (0) = 1,\quad P_{2} (0) = P_{3} (0) = P_{4} (0) = 0.\) From Fig. 6.11, the Markov differential equations can be written as follows:

$$ \left\{ {\begin{array}{*{20}l} {\frac{{dP_{1} (t)}}{{dt}} = - \left( {\lambda _{1} + \lambda _{2} } \right)P_{1} (t) + \mu _{1} P_{2} (t) + \mu _{2} P_{3} (t)} \hfill \\ {\frac{{dP_{2} (t)}}{{dt}} = \lambda _{1} P_{1} (t) - \left( {\lambda _{2} + \mu _{1} } \right)P_{2} (t)} \hfill \\ {\frac{{dP_{3} (t)}}{{dt}} = \lambda _{2} P_{1} (t) - \left( {\lambda _{1} + \mu _{2} } \right)P_{3} (t)} \hfill \\ {\frac{{dP_{4} (t)}}{{dt}} = \lambda _{2} P_{2} (t) + \lambda _{1} P_{3} (t)} \hfill \\ {P_{1} (0) = 1,\quad P_{j} (0) = 0,{\text{j}} \ne 1.} \hfill \\ \end{array} } \right. $$
(6.27)
Fig. 6.11 A degraded system rate diagram

Let \(\ell \left\{ {P_{i} (t)} \right\} = F_{i} (s).\) Then \(\ell \left\{ {\frac{{\partial P_{i} (t)}}{{\partial t}}} \right\} = sF_{i} (s) - P_{i} (0).\) Using the Laplace transform, we obtain

$$ \left\{ {\begin{array}{*{20}l} {sF_{1} (s) - 1 = - \left( {\lambda _{1} + \lambda _{2} } \right)F_{1} (s) + \mu _{1} F_{2} (s) + \mu _{2} F_{3} (s)} \hfill \\ {sF_{2} (s) = \lambda _{1} F_{1} (s) - \left( {\lambda _{2} + \mu _{1} } \right)F_{2} (s)} \hfill \\ {sF_{3} (s) = \lambda _{2} F_{1} (s) - \left( {\lambda _{1} + \mu _{2} } \right)F_{3} (s)} \hfill \\ {sF_{4} (s) = \lambda _{2} F_{2} (s) + \lambda _{1} F_{3} (s)} \hfill \\ \end{array} } \right. $$
(6.28)

From Eq. (6.28), we obtain

$$ \begin{aligned} & F_{1} (s) = \frac{{(s + a_{2} )(s + a_{3} )}}{{s^{3} + b_{1} s^{2} + c_{1} s + c_{2} }} \\ & F_{2} (s) = \frac{{\lambda _{1} }}{{s + a_{2} }}F_{1} (s) \\ & F_{3} (s) = \frac{{\lambda _{2} }}{{s + a_{3} }}F_{1} (s) \\ \end{aligned} $$

where

$$ \begin{aligned} & a_{1} = \lambda _{1} + \lambda _{2} {\text{;}}\quad a_{2} = \lambda _{2} + \mu _{1} ;\quad a_{3} = \lambda _{1} + \mu _{2} ; \\ & a_{4} = \lambda _{1} \mu _{1} {\text{;}}\quad a_{5} = \lambda _{2} \mu _{2} ;\quad b_{1} = a_{1} + a_{2} + a_{3} ; \\ & b_{2} = a_{1} a_{2} + a_{1} a_{3} + a_{2} a_{3} ;\quad b_{3} = a_{1} a_{2} a_{3} ; \\ & c_{1} = b_{2} - a_{4} - a_{5} ;\quad c_{2} = b_{3} - a_{3} a_{4} - a_{2} a_{5} . \\ \end{aligned} $$

Taking the inverse Laplace transform, \(P_{i} (t) = \ell ^{{ - 1}} \left\{ {F_{i} (s)} \right\}\), the system reliability function is

$$ R(t) = \sum\limits_{{i = 1}}^{3} {P_{i} (t)} . $$
(6.29)

When s = 0:

$$ F_{i} (0) = \int\limits_{0}^{\infty } {P_{i} (t)dt} . $$

Thus, the system MTTF is

$$ MTTF = \int\limits_{0}^{\infty } {R(t)dt = } \int\limits_{0}^{\infty } {\left[ {P_{1} (t) + P_{2} (t) + P_{3} (t)} \right]dt = } \sum\limits_{{i = 1}}^{3} {F_{i} (0)} . $$

Substituting s = 0 into Eq. (6.28), and using the final value theorem \(\lim _{s \to 0} sF_{4} (s) = P_{4} (\infty ) = 1\) for the last equation, we have

$$ \left\{ {\begin{array}{*{20}l} { - 1 = - \left( {\lambda _{1} + \lambda _{2} } \right)F_{1} (0) + \mu _{1} F_{2} (0) + \mu _{2} F_{3} (0)} \hfill \\ {0 = \lambda _{1} F_{1} (0) - \left( {\lambda _{2} + \mu _{1} } \right)F_{2} (0)} \hfill \\ {0 = \lambda _{2} F_{1} (0) - \left( {\lambda _{1} + \mu _{2} } \right)F_{3} (0)} \hfill \\ {1 = \lambda _{2} F_{2} (0) + \lambda _{1} F_{3} (0)} \hfill \\ \end{array} } \right. $$

Solving for Fi(0), we obtain

$$ \begin{aligned} & F_{1} (0) = \frac{{a_{2} a_{3} }}{{a_{1} a_{2} a_{3} - a_{3} a_{4} - a_{2} a_{5} }} \\ & F_{2} (0) = \frac{{a_{2} a_{3} \lambda _{1} }}{{a_{1} a_{2}^{2} a_{3} - a_{2} a_{3} a_{4} - a_{2}^{2} a_{5} }} \\ & F_{3} (0) = \frac{{a_{2} a_{3} \lambda _{2} }}{{a_{1} a_{2} a_{3}^{2} - a_{2} a_{3} a_{5} - a_{3}^{2} a_{4} }}. \\ \end{aligned} $$
(6.30)

Thus, the system MTTF is

$$ \begin{aligned} MTTF & = \sum\limits_{{i = 1}}^{3} {F_{i} (0)} \\ & = \frac{{a_{2} a_{3} }}{{a_{1} a_{2} a_{3} - a_{3} a_{4} - a_{2} a_{5} }} + \frac{{a_{2} a_{3} \lambda _{1} }}{{a_{1} a_{2}^{2} a_{3} - a_{2} a_{3} a_{4} - a_{2}^{2} a_{5} }} \\ & \quad + \frac{{a_{2} a_{3} \lambda _{2} }}{{a_{1} a_{2} a_{3}^{2} - a_{2} a_{3} a_{5} - a_{3}^{2} a_{4} }}. \\ \end{aligned} $$
(6.31)

When \(\lambda _{1} = \lambda _{2} = \lambda\) and \(\mu _{1} = \mu _{2} = \mu ,\) from Eqs. (6.29) and (6.31), we can show that the system reliability and the MTTF are given as follows:

$$ R(t) = \frac{{2\lambda ^{2} }}{{\alpha _{1} - \alpha _{2} }}\left( {\frac{{e^{{ - \alpha _{2} t}} }}{{\alpha _{2} }} - \frac{{e^{{ - \alpha _{1} t}} }}{{\alpha _{1} }}} \right) $$
(6.32)

where

$$ \alpha _{1} ,\alpha _{2} = \frac{{\left( {3\lambda + \mu } \right) \pm \sqrt {\lambda ^{2} + 6\lambda \mu + \mu ^{2} } }}{2} $$

and

$$ MTTF = \frac{3}{{2\lambda }} + \frac{\mu }{{2\lambda ^{2} }} $$
(6.33)

respectively.

(b) Calculate the reliability function and system MTTF when \(\lambda _{1} = \lambda _{2} = \lambda = 0.003\) per hour, and \(\mu _{1} = \mu _{2} = \mu = 0.1\) per hour, and t = 25 h.

Substituting \(\lambda = 0.003\) and \(\mu = 0.1\) into Eq. (6.32), we obtain

$$ \alpha _{1} = 0.1088346,\quad \alpha _{2} = 0.0001654 $$

Thus, the system reliability at the mission time t = 25 h is

$$ {\text{R}}({\text{t}} = 25) = 0.99722 $$

Similarly, from Eq. (6.33), we obtain the system MTTF as 6055.56 h.
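The part (b) numbers can be reproduced with a few lines of Python; the sketch below evaluates Eqs. (6.32) and (6.33) with the stated rates.

```python
# Example 6.9(b): reliability at t = 25 h and MTTF from Eqs. (6.32)-(6.33).
import numpy as np

lam, mu, t = 0.003, 0.1, 25.0
root = np.sqrt(lam**2 + 6*lam*mu + mu**2)
a1 = ((3*lam + mu) + root) / 2
a2 = ((3*lam + mu) - root) / 2
R = 2*lam**2/(a1 - a2) * (np.exp(-a2*t)/a2 - np.exp(-a1*t)/a1)
mttf = 3/(2*lam) + mu/(2*lam**2)
print(a1, a2)        # 0.1088346, 0.0001654
print(R, mttf)       # about 0.9972 and 6055.56 h
```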

6.2.3 Degraded Systems

In real life, many systems may continue to function in a degraded state (Pham et al. 1996, 1997; Li and Pham a, b). Such systems may still perform their function, but not at the full operational level. Define the states of the system, whose transition rate diagram is shown in Fig. 6.11, as follows:

  • State 1: operational state.

  • State 2: degraded state.

  • State 3: failed state.

We denote the probability of being in state i at time t as Pi(t).

From the rate diagram (Fig. 6.11) we can obtain the following differential equations:

$$ \left\{ {\begin{array}{*{20}l} {\frac{{dP_{1} (t)}}{{dt}} = - \left( {\lambda _{1} + \lambda _{2} } \right)P_{1} (t)} \hfill \\ {\frac{{dP_{2} (t)}}{{dt}} = \lambda _{2} P_{1} (t) - \lambda _{3} P_{2} (t)} \hfill \\ {\frac{{dP_{3} (t)}}{{dt}} = \lambda _{1} P_{1} (t) + \lambda _{3} P_{2} (t)} \hfill \\ {P_{1} (0) = 1,\quad P_{j} (0) = 0,{\text{ j}} \ne 1} \hfill \\ \end{array} } \right. $$
(6.34)

From Eq. (6.34), we can obtain the solution

$$ P_{1} (t) = e^{{ - \left( {\lambda _{1} + \lambda _{2} } \right)t}} $$
(6.35)

We can also show, from Eq. (6.34), that

$$ P_{2} (t) = \frac{{\lambda _{2} }}{{(\lambda _{1} + \lambda _{2} - \lambda _{3} )}}\left( {e^{{ - \lambda _{3} t}} - e^{{ - (\lambda _{1} + \lambda _{2} )t}} } \right) $$
(6.36)

Finally,

$$ P_{3} (t) = 1 - P_{1} (t) - P_{2} (t). $$

The system reliability is given by

$$ \begin{aligned} R(t) & = P_{1} (t) + P_{2} (t) \\ & = e^{{ - \left( {\lambda _{1} + \lambda _{2} } \right)t}} + \frac{{\lambda _{2} }}{{(\lambda _{1} + \lambda _{2} - \lambda _{3} )}}\left( {e^{{ - \lambda _{3} t}} - e^{{ - (\lambda _{1} + \lambda _{2} )t}} } \right) \\ \end{aligned} $$
(6.37)

The system mean time to a complete failure is

$$ \begin{aligned} MTTF & = \int\limits_{0}^{\infty } {R(t)dt} \\ & = \frac{1}{{\lambda _{1} + \lambda _{2} }} + \frac{{\lambda _{2} }}{{(\lambda _{1} + \lambda _{2} - \lambda _{3} )}}\left( {\frac{1}{{\lambda _{3} }} - \frac{1}{{\lambda _{1} + \lambda _{2} }}} \right). \\ \end{aligned} $$
(6.38)

Example 6.10

A computer system used in a data computing center experiences a degraded state and a complete failure state with the following rates:

\(\lambda _{1} = 0.0003\) per hour, \(\lambda _{2} = 0.00005\) per hour, and \(\lambda _{3} = 0.008\) per hour.

For example, when the system is in the degraded state, it fails at a constant rate of 0.008 per hour. From Eqs. (6.35)–(6.36), we obtain the following results; Table 6.1 shows the reliability results for various mission times.

$$ P_{1} (t) = e^{{ - \left( {0.0003 + 0.00005} \right)t}} $$
$$ P_{2} (t) = \frac{{0.00005}}{{(0.0003 + 0.00005 - 0.008)}}\left( {e^{{ - 0.008t}} - e^{{ - (0.0003 + 0.00005)t}} } \right) $$
Table 6.1 Reliability and MTTF of degraded computer system

From Eq. (6.38), the system MTTF is given by

$$ MTTF = \frac{1}{{\lambda _{1} + \lambda _{2} }} + \frac{{\lambda _{2} }}{{(\lambda _{1} + \lambda _{2} - \lambda _{3} )}}\left( {\frac{1}{{\lambda _{3} }} - \frac{1}{{\lambda _{1} + \lambda _{2} }}} \right) = 2875\;{\text{h}}. $$
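The reliability values in Table 6.1 and the MTTF can be reproduced as in the sketch below, using the rates of Example 6.10 and Eqs. (6.37)–(6.38); the mission times shown are illustrative.

```python
# Reliability and MTTF of the degraded computer system, Eqs. (6.37)-(6.38).
import numpy as np

l1, l2, l3 = 0.0003, 0.00005, 0.008   # rates from Example 6.10

def R(t):
    return (np.exp(-(l1 + l2)*t)
            + l2/(l1 + l2 - l3) * (np.exp(-l3*t) - np.exp(-(l1 + l2)*t)))

mttf = 1/(l1 + l2) + l2/(l1 + l2 - l3) * (1/l3 - 1/(l1 + l2))
for t in (100, 500, 1000):            # illustrative mission times (h)
    print(t, R(t))
print("MTTF =", mttf)                 # about 2875 h
```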

6.2.4 k-Out-Of-n Systems with Degradation

In some environments components may not fail fully but can degrade, and there may exist multiple states of degradation. In such cases, the efficiency and performance of the system may decrease (Pham et al. 1996; Yu et al. 2018). This section discusses the reliability of k-out-of-n systems considering that:

  (1) The system consists of n independent and identically distributed (i.i.d.) non-repairable components;

  (2) Each component can have d stages of degradation; degradation stage (d + 1) is a failed state and stage (d + 2) is a catastrophic failure state;

  (3) The system functions when at least k out of n components function;

  (4) The components may fail catastrophically and can reach the failed state directly from a good state as well as from a degraded state;

  (5) A component can survive either until its last degradation or until a catastrophic failure at any stage;

  (6) All transition rates (i.e., catastrophic and degradation) are constant; and

  (7) The degradation rate and catastrophic failure rate of a component depend on the state of the component.

Let λi be the transition (degradation) rate of the component from state i to state (i + 1) for i = 1, 2, …, d. Let μi be the transition rate of the component from state i to state (d + 2), i.e., the catastrophic failure state. A component may fail catastrophically while it is in any degraded state. A block diagram of such a component is shown in Fig. 6.12.

Fig. 6.12
figure 12

A component flow diagram (Pham 1996)

In general, the system consists of n i.i.d. non-repairable components, and at least k components are required for the system to function. Each component starts in a good state (“state 1”). The component continues to perform its function within the system in all of its d levels of degraded operation. The component no longer performs its function when it reaches its last degradation state (d + 1), at which point it has degraded to a failed state, or when it has failed catastrophically, i.e., reached state (d + 2) from any of its operational states of degradation. The rate at which a component degrades to a lower state of degradation or fails catastrophically increases as the component degrades from one state to a lower one. Components that have reached the failed state, either by degradation or by catastrophic failure, can no longer perform their function and cannot be repaired. In other words, once a component has reached a failed (degradation or catastrophic) state, it cannot be restored to a good state or any degradation state.

The successful operation of the entire system is expressed as a combination of component success and failure events. We can formulate the component reliability function using the Markov approach. Denote Pi(t) as the probability that a component is in state i at time t. From Fig. 6.12, we can easily obtain the following differential equations using the Markov approach:

$$ \begin{aligned} & \frac{{dP_{1} (t)}}{{dt}} = - \left( {\lambda _{1} + \mu _{1} } \right)P_{1} (t) \\ & \frac{{dP_{i} (t)}}{{dt}} = \lambda _{{i - 1}} P_{{i - 1}} (t) - \left( {\lambda _{i} + \mu _{i} } \right)P_{i} (t)\quad {\text{for}}\;i = 2,3, \ldots ,d \\ & \frac{{dP_{{d + 1}} (t)}}{{dt}} = \lambda _{d} P_{d} (t) \\ & \frac{{dP_{{d + 2}} (t)}}{{dt}} = \sum\limits_{{j = 1}}^{d} {\mu _{j} } P_{j} (t). \\ \end{aligned} $$
(6.39)

Solving the above system of differential equations, we obtain the state probability as follows:

$$ P_{m} (t) = \prod\limits_{{k = 1}}^{m} {\lambda _{{k - 1}} } \left( {\sum\limits_{{i = 1}}^{m} {\frac{{e^{{ - \left( {\lambda _{i} + \mu _{i} } \right)t}} }}{{\prod\limits_{{\begin{array}{*{20}c} {j = 1} \\ {j \ne i} \\ \end{array} }}^{m} {\left( {\lambda _{j} + \mu _{j} - \lambda _{i} - \mu _{i} } \right)} }}} } \right) $$
(6.40)

for m = 1, 2, …, d and \(\lambda _{0} = 1\). Thus, the component reliability is

$$ R_{c} (t) = \sum\limits_{{i = 1}}^{d} {B_{i} } e^{{ - \left( {\lambda _{i} + \mu _{i} } \right)t}} $$
(6.41)

where

$$ B_{i} = \sum\limits_{{m = i}}^{d} {\frac{{\prod\limits_{{k = 1}}^{m} {\lambda _{{k - 1}} } }}{{\prod\limits_{{\begin{array}{*{20}c} {j = 1} \\ {j \ne i} \\ \end{array} }}^{m} {\left( {\lambda _{j} + \mu _{j} - \lambda _{i} - \mu _{i} } \right)} }}} . $$
(6.42)

The mean time to failure of the component (MTTFC) is given by

$$ \begin{aligned} MTTF_{C} & = \int_{0}^{\infty } {R_{C} } (t)dt \\ & = \int_{0}^{\infty } {\sum\limits_{{i = 1}}^{d} {\left( {B_{i} e^{{ - \left( {\lambda _{i} + \mu _{i} } \right)t}} } \right)} } dt \\ & = \sum\limits_{{i = 1}}^{d} {\left( {\frac{{B_{i} }}{{\lambda _{i} + \mu _{i} }}} \right)} . \\ \end{aligned} $$
(6.43)

The k-out-of-n system reliability is

$$R_{S} (t) = \sum\limits_{{i = k}}^{n} {\left( {\begin{array}{*{20}l} n \\ i \\ \end{array} } \right)} \left[ {R_{C} (t)} \right]^{i} \left[ {1 - R_{C} (t)} \right]^{{n - i}}$$
(6.44)

where RC(t) is given in Eq. (6.41). After some algebraic simplification, we obtain the system reliability (Pham et al. 1996):

$$R_{s} (t) = \sum\limits_{{i = k}}^{n} i !\;A_{i} \sum\limits_{{\sum\limits_{{j = 1}}^{d} {i_{j} } = i}} {\left[ {\prod\limits_{{j = 1}}^{d} {\left( {\frac{{B_{j}^{{i_{j} }} }}{{i_{j} !}}} \right)} } \right]} \;e^{{ - \sum\limits_{{j = 1}}^{d} {i_{j} } \left( {\lambda _{j} + \mu _{j} } \right)t}}$$
(6.45)

where \(A_{i} = ( - 1)^{{i - k}} \left( {\begin{array}{*{20}l} {i - 1} \\ {k - 1} \\ \end{array} } \right)\left( {\begin{array}{*{20}l} n \\ i \\ \end{array} } \right)\) . The MTTF of the system is

$$ MTTF_{S} = \int_{0}^{\infty } {R_{s} } (t)dt $$

where Rs(t) is given in Eq. (6.45). Therefore, the system MTTF is

$$MTTF_{S} = \sum\limits_{{i = k}}^{n} i !\;A_{i} \sum\limits_{{\sum\limits_{{j = 1}}^{d} {i_{j} } = i}} {\left( {\frac{{\prod\limits_{{j = 1}}^{d} {\left( {\frac{{B_{j}^{{i_{j} }} }}{{i_{j} !}}} \right)} }}{{\sum\limits_{{j = 1}}^{d} {i_{j} } \left( {\lambda _{j} + \mu _{j} } \right)}}} \right)} .$$
(6.46)

For components without catastrophic failure, the catastrophic failure rate \(\mu _{i}\) in Eq. (6.45) becomes zero. From Eq. (6.41), we then obtain

$$ R_{C} (t) = \sum\limits_{{i = 1}}^{d} {B_{i} } e^{{ - \lambda _{i} t}} $$
(6.47)

where

$$ B_{i} = \prod\limits_{{\begin{array}{*{20}c} {j = 1} \\ {j \ne i} \\ \end{array} }}^{d} {\frac{{\lambda _{j} }}{{\lambda _{j} - \lambda _{i} }}} \quad \quad i = 1,2, \ldots ,d $$

Similarly, the component MTTF is

$$ MTTF_{C} = \sum\limits_{{i = 1}}^{d} {\frac{{B_{i} }}{{\lambda _{i} }}} = \sum\limits_{{i = 1}}^{d} {\frac{1}{{\lambda _{i} }}} $$
(6.48)

The system reliability and MTTF for this special case are computed from the general forms of Eqs. (6.45) and (6.46), respectively:

$$R_{s} (t) = \sum\limits_{{i = k}}^{n} i !\;A_{i} \sum\limits_{{\sum\limits_{{j = 1}}^{d} {i_{j} } = i}} {\left[ {\prod\limits_{{j = 1}}^{d} {\left( {\frac{{B_{j}^{{i_{j} }} }}{{i_{j} !}}} \right)} } \right]} \;e^{{ - \sum\limits_{{j = 1}}^{d} {i_{j} } \lambda _{j} t}}$$
(6.49)

where \(A_{i} = ( - 1)^{{i - k}} \left( {\begin{array}{*{20}l} {i - 1} \\ {k - 1} \\ \end{array} } \right)\left( {\begin{array}{*{20}l} n \\ i \\ \end{array} } \right)\) and

$$ MTTF_{S} = \sum\limits_{{i = k}}^{n} {i!} {\text{ }}A_{i} {\text{ }}\sum\limits_{{\sum\limits_{{j = 1}}^{d} {i_{j} = i} }} {\left[ {\frac{{\prod\limits_{{j = 1}}^{d} {\left( {\frac{{B_{j}^{{i_{j} }} }}{{i_{j} !}}} \right)} }}{{\sum\limits_{{j = 1}}^{{\text{d}}} {i_{j} \lambda _{j} } }}} \right]} $$
(6.50)

Example 6.11

Consider a 2-out-of-5 system where components consist of two stages of degradation (d = 2) with the following values:

$$ \lambda _{1} = 0.015/h,\quad \mu _{1} = 0.0001/h,\quad \lambda _{2} = 0.020/h{\text{,}}\;{\text{and}}\;\mu _{2} = 0.0002/h $$

With these parameter values and n = 5, k = 2, d = 2, there are two cases, as follows.

  • Case 1: Components can fail both by degradation and by catastrophic events. From Eq. (6.42):

$$ \begin{aligned} {\text{B}}_{1} & = {\uplambda }_{0} + \frac{{\lambda _{0} \lambda _{1} }}{{\lambda _{2} + \mu _{2} - \lambda _{1} - \mu _{1} }} \\ & = 1 + \frac{{(1)(0.015)}}{{0.02 + 0.0002 - 0.015 - 0.0001}} = 3.94 \\ \end{aligned} $$
$$ {\text{B}}_{2} = \frac{{\lambda _{0} \lambda _{1} }}{{\lambda _{1} + \mu _{1} - \lambda _{2} - \mu _{2} }} = \frac{{(1)(0.015)}}{{0.015 + 0.0001 - 0.020 - 0.0002}} = - 2.94 $$
$$ {\text{MTTF}}_{{\text{C}}} = \frac{{B_{1} }}{{\lambda _{1} + \mu _{1} }} + \frac{{B_{2} }}{{\lambda _{2} + \mu _{2} }} = 115.4 $$
$$ R_{C} (t) = \sum\limits_{{i = 1}}^{2} {B_{i} } e^{{ - \left( {\lambda _{i} + \mu _{i} } \right)t}} = B_{1} e^{{ - \left( {\lambda _{1} + \mu _{1} } \right)t}} + B_{2} e^{{ - \left( {\lambda _{2} + \mu _{2} } \right)t}} $$

For t = 1, we have

$$ R_{C} (t = 1) = 3.94\,e^{{ - (0.015 + 0.0001)(1)}} + ( - 2.94)\,e^{{ - (0.02 + 0.0002)(1)}} = 0.9998 $$

Similarly,

$$ R_{S} (t) = \sum\limits_{{i = 2}}^{5} {\left( {\begin{array}{*{20}l} 5 \hfill \\ i \hfill \\ \end{array} } \right)} \left[ {R_{C} (t)} \right]^{i} \left[ {1 - R_{C} (t)} \right]^{{5 - i}} $$

For t = 1, Rs(t = 1) ≈ 1 and MTTF = \(\int_{0}^{\infty } {{\text{R}}_{{\text{S}}} } ({\text{t}}){\text{dt}} =\) 144.5 h. Tables 6.2 and 6.3 present the reliability of the 2-out-of-5 system with and without catastrophic failures, respectively, for varying mission time t.

Table 6.2 Reliability of 2-out-of-5 system with catastrophic failures for varying time t
Table 6.3 Reliability of 2-out-of-5 system without catastrophic failures for varying time t
  • Case 2: Components can only fail by degradation (no catastrophic failures; μ1 = μ2 = 0):

    $$ {\text{B}}_{1} = \frac{{\lambda _{2} }}{{\lambda _{2} - \lambda _{1} }} = \frac{{0.02}}{{0.02 - 0.015}} = 4 $$
    $$ {\text{B}}_{2} = \frac{{\lambda _{1} }}{{\lambda _{1} - \lambda _{2} }} = \frac{{0.015}}{{0.015 - 0.02}} = - 3 $$

Then we can easily obtain the following:

$$ {\text{R}}_{{\text{C}}} ({\text{t}} = 1) = 0.9999;\quad {\text{R}}_{{\text{C}}} ({\text{t}} = 5) = 0.9965 $$
$$ {\text{MTTF}}_{{\text{C}}} = \frac{1}{{\lambda _{1} }} + \frac{1}{{\lambda _{2} }} = \frac{1}{{0.015}} + \frac{1}{{0.02}} = 116.67\;{\text{h}} $$

The system MTTF is 145.9 h. The catastrophic failure process has thus decreased both the component reliability and the system reliability.
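
Both cases can be reproduced numerically. The following Python sketch (an illustration of ours; the B-coefficient expressions are written out for d = 2, and the system MTTF is obtained by numerically integrating Eq. (6.44) rather than evaluating Eq. (6.46)) computes Case 1; setting the μi to zero reproduces Case 2:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb

n, k, d = 5, 2, 2                    # 2-out-of-5 system, two degradation stages
lam = np.array([0.015, 0.020])       # degradation rates lambda_1, lambda_2
mu = np.array([0.0001, 0.0002])      # catastrophic rates mu_1, mu_2 (zero for Case 2)
rate = lam + mu                      # combined exit rates lambda_i + mu_i

# B coefficients of Eq. (6.42), written out for d = 2 with lambda_0 = 1
B = np.array([1.0 + lam[0] / (rate[1] - rate[0]),   # B_1
              lam[0] / (rate[0] - rate[1])])        # B_2

def Rc(t):
    """Component reliability, Eq. (6.41)."""
    return float(np.sum(B * np.exp(-rate * t)))

def Rs(t):
    """k-out-of-n system reliability, Eq. (6.44)."""
    r = Rc(t)
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

mttf_c = float(np.sum(B / rate))     # component MTTF, Eq. (6.43)
mttf_s, _ = quad(Rs, 0, np.inf)      # system MTTF by numerical integration

print(f"B1 = {B[0]:.2f}, B2 = {B[1]:.2f}")
print(f"Rc(1) = {Rc(1):.4f}, MTTFc = {mttf_c:.1f} h, MTTFs = {mttf_s:.1f} h")
```

Running the sketch returns B1 = 3.94, B2 = −2.94, MTTFc ≈ 115.4 h, and MTTFs ≈ 144.5 h, matching the hand calculations above.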

6.2.5 Degraded Systems with Partial Repairs

In some environments, systems might not always fail fully but can degrade, and there can be multiple stages of degradation. In such cases, the efficiency of the system may decrease. After a certain stage of degradation, the efficiency of the system may fall below an acceptable limit, which can be considered a total failure (Pham et al. 1997). In addition, the system can fail partially from any stage and can then be repaired. The repair action cannot bring the system back to the good stage, but it can make the system operational again, with the same failure rate as before the failure. This section discusses a model for predicting the reliability and availability of multistage degraded systems with partial repairs, based on the results of Pham et al. (1997).

Initially, the system is in its good state. After some time, it can either move to the first degraded state upon degradation or go to a failed state upon a partial failure. If the system fails partially, the repair action starts immediately; after repair, the system is restored to the good state, is kept in operation, and the process repeats. However, if the system reaches the first degraded state (state 3), it can either move to the second degraded state (state 5) upon degradation or go to the failed state upon a partial failure, with increased transition rates. If the system fails partially at this stage, then after repair it is restored to the first degraded state and kept in operation. Figure 6.13 shows the system state transition diagram, where: State 1: good; State (2i − 1): degraded; State (2i): partially failed; and State (2d + 1): completely failed.

Fig. 6.13
figure 13

A system state diagram (Pham 1997)

Assumptions

  1. The system can have d stages of degradation (the dth stage is the complete failure state).

  2. The system might fail partially from a good state as well as from any degraded state.

  3. The system can be restored from a partially failed state to the state it occupied just before the failure.

  4. All transition rates are constant (i.e., degradation, partial failure, and partial repair rates).

  5. The degradation and repair rates of the system depend upon the state (i.e., degradation level) of the system.


Notation:

  • d: number of operational states

  • State 1: good state

  • State (2i − 1): degraded operational states, i = 2, 3, …, d

  • State (2i): partially failed states, i = 1, 2, …, d

  • αi: transition (degradation) rate from state (2i − 1) to state (2i + 1)

  • λi: transition (partial failure) rate from state (2i − 1) to state (2i)

  • μi: transition (partial repair) rate from state (2i) to state (2i − 1)

Using the Markov approach, we can obtain the following equations:

$$ \begin{aligned} & \frac{{dP_{1} (t)}}{{dt}} = - \left( {\alpha _{1} + \lambda _{1} } \right)P_{1} (t) + \mu _{1} P_{2} (t) \\ & \frac{{dP_{{(2i - 1)}} (t)}}{{dt}} = - \left( {\alpha _{i} + \lambda _{i} } \right)P_{{(2i - 1)}} (t) + \mu _{i} P_{{(2i)}} (t) + \alpha _{{i - 1}} P_{{(2i - 3)}} (t)\quad {\text{for}}\;i = 2,3, \ldots ,d \\ & \frac{{dP_{{(2i)}} (t)}}{{dt}} = - \mu _{i} P_{{(2i)}} (t) + \lambda _{i} P_{{(2i - 1)}} (t)\quad {\text{for}}\;i = 1,2, \ldots ,d \\ & \frac{{dP_{{(2d + 1)}} (t)}}{{dt}} = \alpha _{d} P_{{(2d - 1)}} (t). \\ \end{aligned} $$
(6.51)

Taking Laplace transforms of each of these equations and simplifying, we obtain the following equations:

$$ \begin{aligned} P_{{(2i - 1)}} (t) & = \sum\limits_{{k = 1}}^{i} {\left( {A_{{ik}} e^{{ - \beta _{k} t}} + B_{{ik}} e^{{ - \gamma _{k} t}} } \right)} \\ P_{{(2i)}} (t) & = \sum\limits_{{k = 1}}^{i} {\left( {C_{{ik}} e^{{ - \beta _{k} t}} + D_{{ik}} e^{{ - \gamma _{k} t}} } \right)} \\ \end{aligned} $$
(6.52)

where

$$ \begin{aligned} & \beta _{i} = \frac{{\left( {\alpha _{i} + \lambda _{i} + \mu _{i} } \right) + \sqrt {\left( {\alpha _{i} + \lambda _{i} + \mu _{i} } \right)^{2} - 4\alpha _{i} \mu _{i} } }}{2} \\ & \gamma _{i} = \frac{{\left( {\alpha _{i} + \lambda _{i} + \mu _{i} } \right) - \sqrt {\left( {\alpha _{i} + \lambda _{i} + \mu _{i} } \right)^{2} - 4\alpha _{i} \mu _{i} } }}{2}\quad {\text{for}}\;i = 1,2, \ldots ,d \\ \end{aligned} $$

and

$$ \begin{aligned} & A_{{ik}} = \left\{ {\begin{array}{*{20}l} {\frac{{\left( {\mu _{1} - \beta _{1} } \right)}}{{\left( {\gamma _{1} - \beta _{1} } \right)}}\quad {\text{for}}\;i = 1,k = 1} \hfill \\ {\left( {\prod\limits_{{m = 1}}^{{i - 1}} {\alpha _{m} } } \right)\left( {\prod\limits_{{m = 1}}^{i} {\left( {\mu _{m} - \beta _{k} } \right)} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne k} \\ \end{array} }}^{i} {\frac{1}{{\left( {\beta _{m} - \beta _{k} } \right)}}} } \right)\left( {\prod\limits_{{m = 1}}^{i} {\frac{1}{{\left( {\gamma _{m} - \beta _{k} } \right)}}} } \right)\quad {\text{for}}\;i = 2, \ldots ,d} \hfill \\ \end{array} } \right. \\ & B_{{ik}} = \left\{ {\begin{array}{*{20}l} {\frac{{\left( {\mu _{1} - \gamma _{1} } \right)}}{{\left( {\beta _{1} - \gamma _{1} } \right)}}\quad {\text{for}}\;i = 1,k = 1} \hfill \\ {\left( {\prod\limits_{{m = 1}}^{{i - 1}} {\alpha _{m} } } \right)\left( {\prod\limits_{{m = 1}}^{i} {\left( {\mu _{m} - \gamma _{k} } \right)} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne k} \\ \end{array} }}^{i} {\frac{1}{{\left( {\gamma _{m} - \gamma _{k} } \right)}}} } \right)\left( {\prod\limits_{{m = 1}}^{i} {\frac{1}{{\left( {\beta _{m} - \gamma _{k} } \right)}}} } \right)\quad {\text{for}}\;i = 2, \ldots ,d} \hfill \\ \end{array} } \right. \\ & C_{{ik}} = \left\{ {\begin{array}{*{20}l} {\frac{{\lambda _{1} }}{{\left( {\gamma _{1} - \beta _{1} } \right)}}\quad {\text{for}}\;i = 1,k = 1} \hfill \\ {\lambda _{i} \left( {\prod\limits_{{m = 1}}^{{i - 1}} {\alpha _{m} } } \right)\left( {\prod\limits_{{m = 1}}^{i} {\left( {\mu _{m} - \beta _{k} } \right)} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne k} \\ \end{array} }}^{i} {\frac{1}{{\left( {\beta _{m} - \beta _{k} } \right)}}} } \right)\left( {\prod\limits_{{m = 1}}^{i} {\frac{1}{{\left( {\gamma _{m} - \beta _{k} } \right)}}} } \right)\quad {\text{for}}\;i = 2, \ldots ,d} \hfill \\ \end{array} } \right. \\ & D_{{ik}} = \left\{ {\begin{array}{*{20}l} {\frac{{\lambda _{1} }}{{\left( {\beta _{1} - \gamma _{1} } \right)}}\quad {\text{for}}\;i = 1,k = 1} \hfill \\ {\lambda _{i} \left( {\prod\limits_{{m = 1}}^{{i - 1}} {\alpha _{m} } } \right)\left( {\prod\limits_{{m = 1}}^{i} {\left( {\mu _{m} - \gamma _{k} } \right)} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne k} \\ \end{array} }}^{i} {\frac{1}{{\left( {\gamma _{m} - \gamma _{k} } \right)}}} } \right)\left( {\prod\limits_{{m = 1}}^{i} {\frac{1}{{\left( {\beta _{m} - \gamma _{k} } \right)}}} } \right)\quad {\text{for}}\;i = 2, \ldots ,d} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(6.53)

The availability A(t) of the system (i.e., the probability that the system will be found in an operational (either good or degraded) state at time t) is given by:

$$ A(t) = \sum\limits_{{i = 1}}^{d} {P_{{(2i - 1)}} } (t) = \sum\limits_{{i = 1}}^{d} {\sum\limits_{{k = 1}}^{i} {\left( {A_{{ik}} e^{{ - \beta _{k} t}} + B_{{ik}} e^{{ - \gamma _{k} t}} } \right)} } . $$
(6.54)

The system unavailability due to partial failures is

$$ D(t) = \sum\limits_{{i = 1}}^{d} {P_{{(2i)}} } (t) = \sum\limits_{{i = 1}}^{d} {\sum\limits_{{k = 1}}^{i} {\left( {C_{{ik}} e^{{ - \beta _{k} t}} + D_{{ik}} e^{{ - \gamma _{k} t}} } \right)} } . $$
(6.55)

Thus, the probability that the system fails completely before time t is:

$$ \begin{aligned} F(t) & = 1 - A(t) - D(t) \\ & = 1 - \sum\limits_{{i = 1}}^{d} {\sum\limits_{{k = 1}}^{i} {\left[ {\left( {A_{{ik}} + C_{{ik}} } \right)e^{{ - \beta _{k} t}} + \left( {B_{{ik}} + D_{{ik}} } \right)e^{{ - \gamma _{k} t}} } \right]} } . \\ \end{aligned} $$
(6.56)

After simplifications, we obtain

$$ F(t) = 1 - \sum\limits_{{i = 1}}^{d} {\left( {X_{i} e^{{ - \beta _{i} t}} + Y_{i} e^{{ - \gamma _{i} t}} } \right)} $$
(6.57)

where

$$ \begin{aligned} & X_{i} = \frac{1}{{\beta _{i} }}\left( {\prod\limits_{{m = 1}}^{d} {\frac{{\alpha _{m} \left( {\mu _{m} - \beta _{i} } \right)}}{{\left( {\gamma _{m} - \beta _{i} } \right)}}} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne i} \\ \end{array} }}^{d} {\frac{1}{{\left( {\beta _{m} - \beta _{i} } \right)}}} } \right) \\ & Y_{i} = \frac{1}{{\gamma _{i} }}\left( {\prod\limits_{{m = 1}}^{d} {\frac{{\alpha _{m} \left( {\mu _{m} - \gamma _{i} } \right)}}{{\left( {\beta _{m} - \gamma _{i} } \right)}}} } \right)\left( {\prod\limits_{{\begin{array}{*{20}c} {m = 1} \\ {m \ne i} \\ \end{array} }}^{d} {\frac{1}{{\left( {\gamma _{m} - \gamma _{i} } \right)}}} } \right). \\ \end{aligned} $$
(6.58)

If the repair time tends to infinity (or the repair rate is zero), then the total operational time becomes the time to first failure. Therefore, the system reliability R(t) can be obtained from A(t) by substituting zeros for all repair rates. Thus, we obtain

$$ R(t) = \sum\limits_{{i = 1}}^{d} {L_{i} } e^{{ - \left( {\alpha _{i} + \lambda _{i} } \right)t}} $$
(6.59)

where

$$L_{i} = \sum\limits_{{m = i}}^{d} {\left( {\frac{{\prod\limits_{{k = 1}}^{m} {\alpha _{{k - 1}} } }}{{\prod\limits_{{\begin{array}{*{20}c} {j = 1} \\ {j \ne i} \\ \end{array} }}^{m} {\left( {\alpha _{j} + \lambda _{j} - \alpha _{i} - \lambda _{i} } \right)} }}} \right)} \quad {\text{for}}\;i = 1,2, \ldots ,d\;{\text{and}}\;\alpha_{0} = 1.$$
(6.60)

The mean time to first failure of the system (MTTF) is given by

$$ \begin{aligned} MTTF & = \int_{0}^{\infty } R (t)dt = \int_{0}^{\infty } {\left( {\sum\limits_{{i = 1}}^{d} {L_{i} } e^{{ - \left( {\alpha _{i} + \lambda _{i} } \right)t}} } \right)} dt \\ & = \sum\limits_{{i = 1}}^{d} {\frac{{L_{i} }}{{\alpha _{i} + \lambda _{i} }}} . \\ \end{aligned} $$
(6.61)

Example 6.12

Consider a multistage repairable system with d = 2 stages of degradation, degradation rates α1 = 0.001 and α2 = 0.002, partial failure rates λ1 = 0.01 and λ2 = 0.05, and repair rates μ1 = 0.02 and μ2 = 0.01. Calculate the system availability and reliability using Eqs. (6.54) and (6.59) (Fig. 6.14).

Fig. 6.14
figure 14

System flow diagram with d = 2

From Eqs. (6.54), (6.55), and (6.59), we obtain the reliability results as shown in Table 6.4. The system mean time to first failure (MTTF) is 92.7 (units of time).

Table 6.4 Reliability measures for various time t
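
Because the closed-form coefficients in Eq. (6.53) are tedious to evaluate by hand, one practical cross-check is to solve the state equations (6.51) directly with a matrix exponential. The Python sketch below (our own illustration; the state ordering and variable names are ours) does this for the d = 2 example and also evaluates the closed-form MTTF of Eq. (6.61):

```python
import numpy as np
from scipy.linalg import expm

# Example 6.12 rates (d = 2)
a1, a2 = 0.001, 0.002       # degradation rates alpha_1, alpha_2
l1, l2 = 0.01, 0.05         # partial failure rates lambda_1, lambda_2
m1, m2 = 0.02, 0.01         # partial repair rates mu_1, mu_2

# State order: 0 = good (1), 1 = partially failed (2), 2 = degraded (3),
#              3 = partially failed (4), 4 = completely failed (5)
Q = np.zeros((5, 5))
Q[0, 1], Q[0, 2] = l1, a1   # good -> partial failure, good -> degraded
Q[1, 0] = m1                # partial repair back to good
Q[2, 3], Q[2, 4] = l2, a2   # degraded -> partial failure / complete failure
Q[3, 2] = m2                # partial repair back to degraded
np.fill_diagonal(Q, -Q.sum(axis=1))   # generator matrix: rows sum to zero

p0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
for t in (10.0, 50.0, 100.0):
    p = p0 @ expm(Q * t)
    A = p[0] + p[2]          # availability, Eq. (6.54)
    D = p[1] + p[3]          # unavailability from partial failures, Eq. (6.55)
    print(f"t = {t:5.0f}: A = {A:.4f}, D = {D:.4f}, F = {p[4]:.4f}")

# Closed-form MTTF, Eqs. (6.60)-(6.61), with alpha_0 = 1 and d = 2
L1 = 1 + a1 / (a2 + l2 - a1 - l1)
L2 = a1 / (a1 + l1 - a2 - l2)
print(f"MTTF = {L1 / (a1 + l1) + L2 / (a2 + l2):.1f}")   # about 92.7
```

Using the matrix exponential avoids hand-evaluating the Aik, Bik, Cik, and Dik coefficients, and the printed MTTF agrees with the 92.7 reported above.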

6.3 Counting Processes

Among discrete stochastic processes, counting processes are widely used in reliability engineering to describe the occurrence of events in time, e.g., failures, number of perfect repairs, etc. The simplest counting process is the Poisson process, which plays a special role in many reliability applications (Pham 2000). A classic example of such an application is the decay of uranium: radioactive particles from nuclear material strike a certain target in accordance with a Poisson process of some fixed intensity. Another well-known counting process is the renewal process, described as a sequence of events the intervals between which are independent and identically distributed random variables. In reliability theory, this type of mathematical model is used to describe the number of occurrences of an event in a time interval. In this section we also discuss the quasi-renewal process and the non-homogeneous Poisson process.

A non-negative, integer-valued stochastic process, N(t), is called a counting process if N(t) represents the total number of occurrences of the event in the time interval [0, t] and satisfies these two properties:

  1. If t1 < t2, then N(t1) ≤ N(t2).

  2. If t1 < t2, then N(t2) − N(t1) is the number of occurrences of the event in the interval [t1, t2].

For example, if N(t) equals the number of persons who have entered a restaurant at or prior to time t, then N(t) is a counting process in which an event occurs whenever a person enters the restaurant.

6.3.1 Poisson Processes

One of the most important counting processes is the Poisson process.

Definition 6.2

A counting process, N(t), is said to be a Poisson process with intensity λ if:

  1. The failure process, N(t), has stationary independent increments;

  2. The number of failures in any time interval of length s has a Poisson distribution with mean λs, that is,

    $$ P\{ N(t + s) - N(t) = n\} = \frac{{e^{{ - \lambda s}} (\lambda s)^{n} }}{{n!}}\quad n = 0,1,2, \ldots $$
    (6.62)

  3. The initial condition is N(0) = 0.

This model is also called a homogeneous Poisson process, indicating that the failure rate λ does not depend on time t. In other words, the number of failures occurring during the time interval (t, t + s] does not depend on the current time t but only on the length s of the time interval. A counting process is said to possess independent increments if the numbers of events in disjoint time intervals are independent.

For a stochastic process with independent increments, the auto-covariance function is

$$ Cov[X(t_{1} ),X(t_{2} )] = \left\{ {\begin{array}{*{20}l} {Var[N(t_{1} + s) - N(t_{2} )]} \hfill & {{\text{for}}\quad 0 < t_{2} - t_{1} < s} \hfill \\ 0 \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right. $$

where

$$ X(t) = N(t + s) - N(t). $$

If X(t) is Poisson distributed then, since the variance of a Poisson random variable equals its mean, the auto-covariance becomes

$$ Cov[X(t_{1} ),X(t_{2} )] = \left\{ {\begin{array}{*{20}l} {\lambda [s - (t_{2} - t_{1} )]} \hfill & {{\text{for}}\quad 0 < t_{2} - t_{1} < s} \hfill \\ 0 \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right. $$

This result shows that the Poisson increment process is covariance stationary. We now present several properties of the Poisson process.

Property 6.1

The sum of independent Poisson processes, N1(t), N2(t), …., Nk(t), with mean values λ1t, λ2t, …., λkt respectively, is also a Poisson process with mean \(\left( {\sum\limits_{{i = 1}}^{k} {\lambda _{i} } } \right)t\). In other words, the sum of the independent Poisson processes is also a Poisson process with a mean that is equal to the sum of the individual Poisson process’ mean.
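
A quick Monte Carlo illustration of this superposition property (a sketch of ours; the rates and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
lam1, lam2, t, trials = 2.0, 3.0, 10.0, 100_000

# Counts of two independent Poisson processes on [0, t]
n1 = rng.poisson(lam1 * t, trials)
n2 = rng.poisson(lam2 * t, trials)
merged = n1 + n2

# For a Poisson((lam1 + lam2) * t) count, mean and variance both equal 50
print(f"mean = {merged.mean():.2f}, var = {merged.var():.2f}, "
      f"theory = {(lam1 + lam2) * t:.2f}")
```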

Property 6.2

The difference of two independent Poisson processes, N1(t), and N2(t), with mean λ1t and λ2t, respectively, is not a Poisson process. Instead, it has the probability mass function.

$$ P[N_{1} (t) - N_{2} (t) = k] = e^{{ - (\lambda _{1} + \lambda _{2} )t}} \left( {\frac{{\lambda _{1} }}{{\lambda _{2} }}} \right)^{{\frac{k}{2}}} I_{k} (2\sqrt {\lambda _{1} \lambda _{2} } t), $$
(6.63)

where Ik(.) is a modified Bessel function of order k.

Proof

Define N(t) = N1(t) - N2(t). We have

$$ P[N(t) = k] = \sum\limits_{{i = 0}}^{\infty } {P[N_{1} (t) = k + i]} {\text{ }}P[N_{2} (t) = i]. $$

Since Ni(t), i = 1, 2, is a Poisson process with mean λit, we have

$$ \begin{aligned} P[N(t) = k] & = \sum\limits_{{i = 0}}^{\infty } {\frac{{e^{{ - \lambda _{1} t}} \left( {\lambda _{1} t} \right)^{{k + i}} }}{{\left( {k + i} \right)!}}} \frac{{e^{{ - \lambda _{2} t}} \left( {\lambda _{2} t} \right)^{i} }}{{i!}} \\ & = e^{{ - (\lambda _{1} + \lambda _{2} )t}} \left( {\frac{{\lambda _{1} }}{{\lambda _{2} }}} \right)^{{\frac{k}{2}}} \sum\limits_{{i = 0}}^{\infty } {\frac{{\left( {\sqrt {\lambda _{1} \lambda _{2} } t} \right)^{{2i + k}} }}{{i!\left( {k + i} \right)!}}} \\ & = e^{{ - (\lambda _{1} + \lambda _{2} )t}} \left( {\frac{{\lambda _{1} }}{{\lambda _{2} }}} \right)^{{\frac{k}{2}}} I_{k} (2\sqrt {\lambda _{1} \lambda _{2} } t). \\ \end{aligned} $$

Property 6.3

If the Poisson process, N(t), with mean λt, is filtered such that each occurrence of the event is counted with a constant probability p, then the resulting process is a Poisson process with mean λpt.
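
Property 6.3 (thinning) can be illustrated the same way; in the sketch below (ours; the parameter values are arbitrary), each event of a simulated Poisson count is retained independently with probability p:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, t, trials = 4.0, 0.25, 5.0, 100_000

# Draw N(t) ~ Poisson(lam * t), then count each event independently w.p. p
n = rng.poisson(lam * t, trials)
counted = rng.binomial(n, p)       # binomial thinning of the Poisson counts

# Property 6.3: the filtered process is Poisson with mean lam * p * t = 5
print(f"mean = {counted.mean():.3f}, var = {counted.var():.3f}, "
      f"theory = {lam * p * t:.3f}")
```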

Property 6.4

Let N(t) be a Poisson process and Yi a family of independent and identically distributed random variables which are also independent of N(t). A stochastic process X(t) is said to be a compound Poisson process if it can be represented as.

$$ X(t) = \sum\limits_{{i = 1}}^{{N(t)}} {Y_{i} } . $$
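
As a small illustration (ours; the normal jump distribution is an arbitrary choice), the simulated moments of a compound Poisson process can be checked against the standard formulas E[X(t)] = λtE[Y] and Var[X(t)] = λtE[Y²]:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t, trials = 3.0, 2.0, 20_000
mu_y, sd_y = 1.5, 0.5              # mean and std of the i.i.d. jumps Y_i

# X(t) = Y_1 + ... + Y_N(t): draw N(t), then sum that many jumps
counts = rng.poisson(lam * t, trials)
x = np.array([rng.normal(mu_y, sd_y, size=k).sum() for k in counts])

# Compound-Poisson moments: E[X(t)] = lam*t*E[Y], Var[X(t)] = lam*t*E[Y^2]
print(f"mean = {x.mean():.2f} (theory {lam * t * mu_y:.2f})")
print(f"var  = {x.var():.2f} (theory {lam * t * (sd_y**2 + mu_y**2):.2f})")
```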

6.3.2 Renewal Processes

A renewal process is a more general case of the Poisson process in which the inter-arrival times of the process or the time between failures do not necessarily follow the exponential distribution. For convenience, we will call the occurrence of an event a renewal, the inter-arrival time the renewal period, and the waiting time or repair time the renewal time.

Definition 6.3

A counting process N(t) that represents the total number of occurrences of an event in the time interval (0, t] is called a renewal process if the times between failures are independent and identically distributed random variables.

The probability that there are exactly n failures occurring by time t can be written as

$$ P\{ N(t) = n\} = P\{ N(t) \ge n\} - P\{ N(t) > n\} $$
(6.64)

Note that the times between failures are T1, T2, …, Tn, so the time of the kth failure is

$$ W_{k} = \sum\limits_{{i = 1}}^{k} {T_{i} } $$

and

$$ T_{k} = W_{k} - W_{{k - 1}} $$

Thus,

$$ \begin{aligned} P\{ N(t) = n\} & = P\{ N(t) \ge n\} - P\{ N(t) > n\} \\ & = P\{ W_{n} \le t\} - P\{ W_{{n + 1}} \le t\} \\ & = F_{n} (t) - F_{{n + 1}} (t) \\ \end{aligned} $$
(6.65)

where Fn(t) is the cumulative distribution function for the time of the nth failure and n = 0,1,2, ….

Example 6.13

Consider a software testing model for which the time to find an error during the testing phase has an exponential distribution with failure rate λ. It can be shown that the time of the nth failure follows a gamma distribution with parameters λ and n, with probability density function \(f_{n} (t) = \frac{{\lambda ^{n} t^{{n - 1}} }}{{(n - 1)!}}e^{{ - \lambda t}}\) for t > 0. From Eq. (6.65) we obtain

$$ \begin{aligned} P\{ N(t) = n\} & = P\{ N(t) \le n\} - P\{ N(t) \le n - 1\} \\ & = \sum\limits_{{k = 0}}^{n} {\frac{{(\lambda t)^{k} }}{{k!}}} e^{{ - \lambda t}} - \sum\limits_{{k = 0}}^{{n - 1}} {\frac{{(\lambda t)^{k} }}{{k!}}} e^{{ - \lambda t}} \\ & = \frac{{(\lambda t)^{n} }}{{n!}}e^{{ - \lambda t}} \quad {\text{for}}\;n = 0,1,2, \ldots . \\ \end{aligned} $$
(6.66)

Several important properties of the renewal function are given below.

Property 6.5

The mean value function of the renewal process, denoted by M(t), is equal to the sum of the distribution function of all renewal times, that is,

$$ M(t) = E[N(t)] = \sum\limits_{{n = 1}}^{\infty } {F_{n} (t)} $$

Proof

The renewal function can be obtained as

$$ \begin{aligned} M(t) & = E[N(t)] \\ & = \sum\limits_{{n = 1}}^{\infty } {nP\{ N(t) = n\} } \\ & = \sum\limits_{{n = 1}}^{\infty } {n[F_{n} (t) - } F_{{n + 1}} (t)] \\ & = \sum\limits_{{n = 1}}^{\infty } {F_{n} (t)} . \\ \end{aligned} $$
(6.67)

The mean value function, M(t), of the renewal process is also called the renewal function. In other words, the mean value function represents the expected number of renewals in [0, t].
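
Property 6.5 can be verified numerically whenever Fn(t) is available in closed form. In the sketch below (ours; the gamma inter-arrival law is an illustrative choice), the renewal periods are gamma distributed, so the nth arrival time is again gamma distributed, and the truncated series ΣFn(t) can be compared with a simulated E[N(t)]:

```python
import numpy as np
from scipy.stats import gamma

# Renewal process with gamma(shape=a, scale=s) renewal periods.  The n-th
# arrival time W_n is gamma(n*a, s), so F_n(t) has a closed form and
# Property 6.5 gives M(t) as a (truncated) series of gamma cdfs.
a, s, t = 2.0, 1.0, 10.0
M_series = sum(gamma.cdf(t, n * a, scale=s) for n in range(1, 200))

# Monte Carlo check: average number of renewals in [0, t]
rng = np.random.default_rng(7)
counts = []
for _ in range(20_000):
    w, n = 0.0, 0
    while True:
        w += rng.gamma(a, s)     # next renewal period
        if w > t:
            break
        n += 1
    counts.append(n)

print(f"M({t:.0f}) = {M_series:.3f} (series) vs {np.mean(counts):.3f} (simulation)")
```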

Property 6.6

The renewal function, M(t), satisfies the following equation:

$$ M(t) = F(t) + \int\limits_{0}^{t} {M(t - s)dF(s)} $$
(6.68)

where F(t) is the distribution function of the inter-arrival time or the renewal period. The proof is left as an exercise for the reader (see Problem 8).

In general, let y(t) be an unknown function to be evaluated and x(t) be any non-negative and integrable function associated with the renewal process. Assume that F(t) is the distribution function of the renewal period. We can then obtain the following result.

Property 6.7

Let the renewal equation be.

$$ y(t) = x(t) + \int\limits_{0}^{t} {y(t - s)dF(s)} $$
(6.69)

then its solution is given by

$$ y(t) = x(t) + \int\limits_{0}^{t} {x(t - s)dM(s)} $$

where M(t) is the mean value function of the renewal process.

The proof of the above property can be easily derived using the Laplace transform. It is also noted that the integral equation given in Property 6.6 is a special case of Property 6.7.

Example 6.14

Let x(t) = a. Thus, from Property 6.7, the solution y(t) is given by

$$ \begin{aligned} y(t) & = x(t) + \int\limits_{0}^{t} {x(t - s)dM(s)} \\ & = a + \int\limits_{0}^{t} {a{\text{ }}dM(s)} \\ & = a(1 + E[N(t)]). \\ \end{aligned} $$

6.3.3 Quasi-Renewal Processes

In this section, a general renewal process, namely, the quasi-renewal process, is discussed. Let {N(t), t > 0} be a counting process and let Xn be the time between the (n − 1)th and the nth event of this process, n ≥ 1.

Definition 6.4

(Wang and Pham 1996): If the sequence of non-negative random variables {X1, X2, …} is independent and

$$ X_{i} = \alpha X_{{i - 1}} $$
(6.70)

for i ≥ 2, where α > 0 is a constant, then the counting process {N(t), t ≥ 0} is said to be a quasi-renewal process with parameter α and first inter-arrival time X1.

When α = 1, this process becomes the ordinary renewal process as discussed in Sect. 2.6.2. This quasi-renewal process can be used to model reliability growth processes in software testing phases and hardware burn-in stages for α > 1, and in hardware maintenance processes when α ≤ 1.

Assume that the probability density function, cumulative distribution function, survival function, and failure rate of the random variable X1 are f1(x), F1(x), s1(x), and r1(x), respectively. Then the pdf, cdf, survival function, and failure rate of Xn for n = 1, 2, 3, … are, respectively, given below (Wang and Pham 1996):

$$ \begin{aligned} & f_{n} (x) = \frac{1}{{\alpha ^{{n - 1}} }}f_{1} \left( {\frac{1}{{\alpha ^{{n - 1}} }}x} \right) \\ & F_{n} (x) = F_{1} \left( {\frac{1}{{\alpha ^{{n - 1}} }}x} \right) \\ & s_{n} (x) = s_{1} \left( {\frac{1}{{\alpha ^{{n - 1}} }}x} \right) \\ & r_{n} (x) = \frac{1}{{\alpha ^{{n - 1}} }}r_{1} \left( {\frac{1}{{\alpha ^{{n - 1}} }}x} \right). \\ \end{aligned} $$
(6.71)

Similarly, the mean and variance of Xn are given by

$$ \begin{aligned} & E(X_{n} ) = \alpha ^{{n - 1}} E(X_{1} ) \\ & Var(X_{n} ) = \alpha ^{{2n - 2}} Var(X_{1} ). \\ \end{aligned} $$

Because of the non-negativity of X1 and the fact that X1 is not identically 0, we obtain

$$ E(X_{1} ) = \mu _{1} \ne 0 $$
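
A small simulation (ours; the exponential choice for X1 and the parameter values are arbitrary) illustrates the scaling of the means, E(Xn) = α^(n−1)E(X1):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, mean_x1, trials = 1.2, 10.0, 100_000

# Draw X_1 (exponential with mean 10 here); Eq. (6.70) gives X_n = alpha^(n-1) X_1
x1 = rng.exponential(mean_x1, trials)
for n in (1, 3, 5):
    xn = alpha ** (n - 1) * x1
    print(f"E[X_{n}] = {xn.mean():6.2f} (theory {alpha**(n - 1) * mean_x1:6.2f})")
```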

Property 6.8

(Wang and Pham 1996): The shape parameters of Xn are the same for n = 1, 2, 3, … for a quasi-renewal process if X1 follows the gamma, Weibull, or log normal distribution.

This means that after “renewal”, the shape parameters of the inter-arrival time will not change. In software reliability, the assumption that the software debugging process does not change the error-free distribution type seems reasonable. Thus, the error-free times of software during the debugging phase modeled by a quasi-renewal process will have the same shape parameters. In this sense, a quasi-renewal process is suitable to model the software reliability growth. It is worthwhile to note that

$$ \begin{aligned} \mathop {\lim }\limits_{{n \to \infty }} \frac{{E(X_{1} + X_{2} + \ldots + X_{n} )}}{n} & = \mathop {\lim }\limits_{{n \to \infty }} \frac{{\mu _{1} (1 - \alpha ^{n} )}}{{(1 - \alpha )n}} \\ & = 0\quad {\text{if}}\;\alpha < 1 \\ & = \infty \quad {\text{if}}\;\alpha > 1 \\ \end{aligned} $$

Therefore, if the inter-arrival times represent the error-free times of a software system with α > 1, the average error-free time approaches infinity as the debugging process continues.

Distribution of N(t).

Consider a quasi-renewal process with parameter α and first inter-arrival time X1. Clearly, the total number of renewals, N(t), that have occurred up to time t and the arrival time of the nth renewal, SSn, have the following relationship:

$$ N(t) \ge n\;{\text{if}}\;{\text{and}}\;{\text{only}}\;{\text{if}}\;SS_{n} \le t $$

that is, N(t) is at least n if and only if the nth renewal occurs prior to time t. It is easily seen that

$$ SS_{n} = \sum\limits_{{i = 1}}^{n} {X_{i} = } \sum\limits_{{i = 1}}^{n} {\alpha ^{{i - 1}} X_{1} \quad {\text{for}}\quad n \ge 1} $$
(6.72)

Here, SS0 = 0. Thus, we have

$$ \begin{aligned} P\{ N(t) = n\} & = P\{ N(t) \ge n\} - P\{ N(t) \ge n + 1\} \\ & = P\{ SS_{n} \le t\} - P\{ SS_{{n + 1}} \le t\} \\ & = G_{n} (t) - G_{{n + 1}} (t) \\ \end{aligned} $$

where Gn(t) is the convolution of the inter-arrival distributions F1, F2, F3, …, Fn. In other words,

$$ G_{n} (t) = P\{ X_{1} + X_{2} + \cdots + X_{n} \le t\} $$

If the mean value of N(t) is defined as the renewal function M(t), then,

$$ \begin{aligned} M(t) & = E[N(t)] \\ & = \sum\limits_{{n = 1}}^{\infty } {P\{ N(t) \ge n\} } \\ & = \sum\limits_{{n = 1}}^{\infty } {P\{ SS_{n} \le t\} } \\ & = \sum\limits_{{n = 1}}^{\infty } {G_{n} (t)} . \\ \end{aligned} $$
(6.73)

The derivative of M(t) is known as the renewal density

$$ m(t) = M^{\prime } (t). $$

In renewal theory, the random variables representing the inter-arrival times assume only non-negative values, and the Laplace transform of the distribution F1(t) is defined by

$$ {\mathfrak{L}}\{ F_{1} (s)\} = \int\limits_{0}^{\infty } {e^{{ - sx}} dF_{1} (x)} $$

Therefore,

$$ {\mathfrak{L}}F_{n} (s) = \int\limits_{0}^{\infty } {e^{{ - \alpha ^{{n - 1}} st}} dF_{1} (t)} = {\mathfrak{L}}F_{1} (\alpha ^{{n - 1}} s) $$

and

$$ \begin{aligned} {\mathfrak{L}}M (s) & = \sum\limits_{{n = 1}}^{\infty } { {\mathfrak{L}}G_{n} (s)} \\ & = \sum\limits_{{n = 1}}^{\infty } { {\mathfrak{L}}F_{1} (s) {\mathfrak{L}}F_{1} (\alpha s) \cdots {\mathfrak{L}}F_{1} (\alpha ^{{n - 1}} s)} \\ \end{aligned} $$

Since there is a one-to-one correspondence between a distribution function and its Laplace transform, the following property holds.

Property 6.9

(Wang and Pham 1996): The first inter-arrival distribution of a quasi-renewal process uniquely determines its renewal function.

If the inter-arrival time represents the error-free time (time to first failure), a quasi-renewal process can be used to model reliability growth for both software and hardware.

Suppose that all faults of the software have the same chance of being detected. If the inter-arrival time of a quasi-renewal process represents the error-free time of a software system, then the expected number of software faults in the time interval [0, t] can be defined by the renewal function, M(t), with parameter α > 1. Denote by Mr(t) the number of remaining software faults at time t; it follows that

$$ M_{r} (t) = M(T_{c} ) - M(t), $$

where M(Tc) is the number of faults that will eventually be detected through a software lifecycle Tc.

6.3.4 Non-homogeneous Poisson Processes

In the non-homogeneous Poisson process (NHPP) model, the number of failures experienced up to time t is represented by a non-homogeneous Poisson process {N(t), t ≥ 0}. The main issue in the NHPP model is to determine an appropriate mean value function to denote the expected number of failures experienced up to a certain time (Pham 2006a).

With different assumptions, the model will end up with different functional forms of the mean value function. Note that in a renewal process, the exponential assumption for the inter-arrival time between failures is relaxed, and in the NHPP, the stationary assumption is relaxed.

The NHPP model is based on the following assumptions:

  • The failure process has independent increments, i.e., the number of failures during the time interval (t, t + s) depends on the current time t and the length s of the time interval, and does not depend on the past history of the process.

  • The failure rate of the process is given by

    $$ \begin{aligned} P\{ {\text{exactly}}\;{\text{one}}\;{\text{failure}}\;{\text{in}}(t,t + \Delta t)\} & = P\{ N(t + \Delta t) - N(t) = 1\} \\ & = \lambda (t)\Delta t + o(\Delta t) \\ \end{aligned} $$

where λ(t) is the intensity function.

  • During a small interval Δt, the probability of more than one failure is negligible, that is,

    $$ P\{ {\text{two}}\;{\text{or}}\;{\text{more}}\;{\text{failures}}\;{\text{in}}\;(t,t + \Delta t)\} = o(\Delta t) $$
  • The initial condition is N(0) = 0.

On the basis of these assumptions, the probability of exactly n failures occurring during the time interval (0, t) for the NHPP is given by

$$ \Pr \{ N(t) = n\} = \frac{{[m(t)]^{n} }}{{n!}}e^{{ - m(t)}} \quad n = 0,1,2, \ldots $$
(6.74)

where \(m(t) = E[N(t)] = \int\limits_{0}^{t} {\lambda (s)ds}\) and λ(t) is the intensity function. It can be easily shown that the mean value function m(t) is non-decreasing.

Reliability Function.

The reliability R(t), defined as the probability that there are no failures in the time interval (0, t), is given by

$$ \begin{aligned} R(t) & = P\{ N(t) = 0\} \\ & = e^{{ - m(t)}} \\ \end{aligned} $$

In general, the reliability R(x|t), the probability that there are no failures in the interval (t, t + x), is given by

$$ \begin{aligned} R(x|t) & = P\{ N(t + x) - N(t) = 0\} \\ & = e^{{ - [m(t + x) - m(t)]}} \\ \end{aligned} $$

and its density is given by

$$ f(x) = \lambda (t + x)e^{{ - [m(t + x) - m(t)]}} $$

where

$$ \lambda (x) = \frac{\partial }{{\partial x}}[m(x)] $$
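
These quantities are straightforward to compute once m(t) is specified. The sketch below (ours; the power-law mean value function m(t) = bt^c is a hypothetical choice, and any non-decreasing m(t) could be substituted) evaluates Eq. (6.74) and the conditional reliability R(x|t):

```python
import numpy as np
from math import factorial

# Hypothetical power-law intensity: lambda(t) = b*c*t^(c-1), so m(t) = b*t^c
b, c = 0.5, 1.5

def m(t):
    return b * t**c

def prob_n(n, t):
    """P{N(t) = n} for the NHPP, Eq. (6.74)."""
    return m(t)**n * np.exp(-m(t)) / factorial(n)

def rel(x, t):
    """R(x|t): probability of no failures in (t, t + x]."""
    return np.exp(-(m(t + x) - m(t)))

print(f"P(N(4) = 2) = {prob_n(2, 4.0):.4f}")
print(f"R(2|4)      = {rel(2.0, 4.0):.4f}")
```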

The variance of the NHPP can be obtained as follows:

$$ Var[N(t)] = \int\limits_{0}^{t} {\lambda (s)ds} $$

and the auto-correlation function is given by

$$ \begin{aligned} Cor[s] & = E[N(t)]E[N(t + s) - N(t)] + E[N^{2} (t)] \\ & = \int\limits_{0}^{t} {\lambda (s)ds} \int\limits_{0}^{{t + s}} {\lambda (s)ds} + \int\limits_{0}^{t} {\lambda (s)ds} \\ & = \int\limits_{0}^{t} {\lambda (s)ds} \left[ {1 + \int\limits_{0}^{{t + s}} {\lambda (s)ds} } \right] \\ \end{aligned} $$
(6.75)

Example 6.15

Assume that the intensity λ is a random variable with the pdf f(λ). Then the probability of exactly n failures occurring during the time interval (0, t) is given by

$$ P\{ N(t) = n\} = \int\limits_{0}^{\infty } {e^{{ - \lambda t}} \frac{{(\lambda t)^{n} }}{{n!}}f(\lambda )d\lambda } . $$

It can be shown that if the pdf f(λ) is given as the following gamma density function with parameters k and m,

$$ f(\lambda ) = \frac{1}{{\Gamma (m)}}k^{m} \lambda ^{{m - 1}} e^{{ - k\lambda }} \quad {\text{for}}\;\lambda \ge 0 $$

then

$$ P\left( {N(t) = n} \right) = \left( \begin{gathered} n + m - 1 \\ n \\ \end{gathered} \right)\left[ {p(t)} \right]^{m} \left[ {q(t)} \right]^{n} \quad n = 0,1,2, \ldots $$
(6.76)

is also called a negative binomial density function, where

$$ p(t) = \frac{k}{{t + k}}\quad {\text{and}}\quad q(t) = \frac{t}{{t + k}} = 1 - p(t). $$
(6.77)

Thus,

$$ P\left( {N(t) = n} \right) = \left( \begin{gathered} n + m - 1 \\ n \\ \end{gathered} \right)\left( {\frac{k}{{t + k}}} \right)^{m} \left( {\frac{t}{{t + k}}} \right)^{n} \quad n = 0,1,2, \ldots $$
(6.78)
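
Equation (6.78) can also be checked numerically by integrating the gamma-mixed Poisson probabilities and comparing them with the negative binomial pmf (a sketch of ours; the parameter values are arbitrary):

```python
import numpy as np
from math import factorial, gamma as gamma_fn
from scipy.integrate import quad
from scipy.stats import nbinom

k, m, t = 2.0, 3.0, 1.5        # gamma mixing parameters and mission time

def mixed_pmf(n):
    """Poisson(lam*t) pmf averaged over the gamma pdf of lam (left-hand side)."""
    integrand = lambda lam: (np.exp(-lam * t) * (lam * t)**n / factorial(n)
                             * k**m * lam**(m - 1) * np.exp(-k * lam) / gamma_fn(m))
    return quad(integrand, 0, np.inf)[0]

p = k / (t + k)                # Eq. (6.77)
for n in range(4):             # compare with the negative binomial of Eq. (6.78)
    print(f"n = {n}: integral = {mixed_pmf(n):.5f}, "
          f"negative binomial = {nbinom.pmf(n, m, p):.5f}")
```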

The reader interested in a deeper understanding of advanced probability theory and stochastic processes may consult the following highly recommended books:

Devore, J. L., Probability and Statistics for Engineering and the Sciences, 3rd edition, Brooks/Cole Pub. Co., Pacific Grove, 1991.

Gnedenko, B. V. and I. A. Ushakov, Probabilistic Reliability Engineering, Wiley, New York, 1995.

Feller, W., An Introduction to Probability Theory and Its Applications, 3rd edition, Wiley, New York, 1994.

6.4 Problems

  1. Calculate the reliability and MTTF of k-out-of-(2k − 1) systems when d = 3,

    $$ \lambda _{1} = 0.0025/h,\;\;\lambda _{2} = 0.005/h,\;\;\lambda _{3} = 0.01/h\;{\text{and}}\;\mu _{1} = \mu _{2} = \mu _{3} = 0 $$

    where k = 1, 2, 3, 4, and 5, for various time t. (Hint: use Eqs. (6.42) and (6.43).)

  2. In a nuclear power plant there are five identical and statistically independent channels to monitor the radioactivity of air in the ventilation system with the aim of alerting reactor operators to the need for reactor shutdown when a dangerous level of radioactivity is present. When at least three channels register a dangerous level of radioactivity, the reactor automatically shuts down. Furthermore, each channel contains three identical sensors and when at least two sensors register a dangerous level of radioactivity, the channel registers the dangerous level of radioactivity. The failure rate of each sensor in any channel is 0.001 per day. However, the common-cause failure rate of all sensors in a channel is 0.0005 per day. Obtain the sensor reliability, channel reliability, and the entire system reliability for various time t.

  3. A crucial system operates in a good state during an exponentially distributed time with expected value \(\frac{1}{\lambda }\). After leaving the good state, the system enters a degradation state. The system can still function properly in the degradation state during a fixed time a > 0, but a failure of the system occurs after this time. The system is inspected every T time units, where T > a. It is replaced by a new one when the inspection reveals that the system is not in the good state.

    (a) What is the probability of having a replacement because of a system failure?

    (b) What is the expected time between two replacements?

  4. A system consists of two independent components operating in parallel with a single repair facility, where repair of a failed component may be completed before the other component fails. Both components are assumed to be functioning at time t = 0. When both components have failed, the system is considered to have failed and no recovery is possible. Assume component i has constant failure rate \(\lambda _{i}\) and repair rate \(\mu _{i}\) for i = 1, 2. The system reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t).

    (a) Using Eqs. (6.29) and (6.31) and the Laplace transform, derive the reliability function for the system and obtain the system mean time to failure (MTTF).

    (b) Calculate (a) with \(\lambda _{1} = 0.003\) per hour, \(\lambda _{2} = 0.005\) per hour, \(\mu _{1} = 0.3\) per hour, \(\mu _{2} = 0.1\) per hour, and t = 25 h.

  5. A system is composed of 20 identical active power supplies; at least 19 of the power supplies are required for the system to function. In other words, when 2 of the 20 power supplies fail, the system fails. When all 20 power supplies are operating, each has a constant failure rate \(\lambda _{a}\) per hour. If one power supply fails, each remaining power supply has a failure rate \(\lambda _{b}\) per hour, where \(\lambda _{a} \le \lambda _{b}\). We assume that a failed power supply can be repaired at a constant rate \(\mu\) per hour. The system reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t).

    (a) Determine the system mean time to failure (MTTF).

    (b) Given \(\lambda _{a} = 0.0005\), \(\lambda _{b} = 0.004\), and \(\mu = 0.5\), calculate the system MTTF.

  6. A system is composed of 15 identical active power supplies; at least 14 of the power supplies are required for the system to function. In other words, when 2 of the 15 power supplies fail, the system fails. When all 15 power supplies are operating, each has a constant failure rate \(\lambda _{a}\) per hour. If one power supply fails, each remaining power supply has a failure rate \(\lambda _{b}\) per hour, where \(\lambda _{a} \le \lambda _{b}\). We assume that a failed power supply can be repaired at a constant rate \(\mu\) per hour. The system reliability function, R(t), is defined as the probability that the system continues to function throughout the interval (0, t).

    (a) Determine the system mean time to failure (MTTF).

    (b) Given \(\lambda _{a} = 0.0003\), \(\lambda _{b} = 0.005\), and \(\mu = 0.6\), calculate the system MTTF.

  7. Events occur according to an NHPP in which the mean value function is \(m(t) = t^{3} + 3t^{2} + 6t\), t > 0. What is the probability that n events occur between times t = 10 and t = 15?

  8. Show that the renewal function, M(t), can be written as follows:

$$ M(t) = F(t) + \int\limits_{0}^{t} {M(t - s)dF(s)} $$

where F(t) is the distribution function of the inter-arrival time or the renewal period.