1 Introduction

Let \(S\) be a system that is designed to perform a certain function and is put into field operation at time \(t = 0\). It is desired that the system perform well in field use. A natural way to measure its performance is through reliability characteristics such as survival probability, mean lifetime, mean residual life, or hazard rate. If, however, we look at the process dynamically, then we have to consider whether the system will still be functioning at any given time \(t > 0\).

No matter how reliable the system is, it will fail sooner or later, so a problem is how to deal with a failed system. Commonly, there are two ways to restore a failed system to operation. One is to repair it, if the system is repairable. Two types of repair are commonly studied in the reliability literature. The first is called perfect or complete repair, which restores the failed system to as good as new, i.e., the lifetime of the repaired system has the same distribution function as the original system \(S\). Of course, it is implicitly assumed that the lifetime of the repaired system is independent of that of the original system \(S\). The second is called imperfect or incomplete repair, under which the distribution function of the repaired system is not exactly the same as that of the original system. A special case of imperfect repair is minimal repair, which restores the failed system to the condition it was in just prior to failure. The other way to restore a failed system to operation is to replace it by a system that is iid as the original \(S\). Both perfect repair and replacement make the failed system as good as new; they are equivalent in this regard, and so we will use the two terms interchangeably later.

In addition to the type of repair, another related problem is how we can know whether a system has failed. In this respect, systems can be classified into two categories. Systems in one category can be continuously monitored, and consequently their failures are self-announcing, whereas systems in the other category cannot be continuously monitored because of either technical difficulties or high costs, and therefore their failures are not self-announcing. In the latter case, system failure can only be detected by inspection. Two inspection policies, the calendar-based and the age-based policy, will be described in Sect. 3.

Maintenance policies involving both repair/replacement and inspection will be considered. With the help of these maintenance policies, it becomes more meaningful to measure the likelihood that a system is functioning at any given time \(t > 0\). For this purpose, we consider a system which can be in one of two states, namely ‘unfailed’ (or ‘up’) and ‘failed’ (or ‘down’). By ‘up,’ we mean the system is still functioning, and by ‘down,’ we mean the system is not working. Suppose that the status of the system in field use can be revealed in some way. Depending on the status of the system, an unfailed system may be upgraded or modified, and a failed system will be repaired or replaced. Let a system (the original system) start operation at time \(t = 0\) and work until certain maintenance measures, which include but are not limited to minimal repair, perfect repair, and replacement, take place. At this time, the first up period is over and the first down period begins. The first up and down periods constitute the first cycle of the system. At the end of each subsequent down period, a cycle of the system is completed and the system resumes operating, and so on. Let \(U_{j}\) and \(D_{j}\) denote the duration of the \(j\)th up and down periods, respectively. Basically \(U_{j}\) is the lifetime of the system after the \((j - 1)\)th down period, while \(D_{j}\) is the length of time required to finish the planned maintenance such as repair or replacement. For any time \(t \ge 0\), we can use the binary random variable \(\xi (t)\) to indicate the status of the system, namely \(\xi (t) = 1\) meaning the system is unfailed or still working, and \(\xi (t) = 0\) meaning the system has failed or is not working. The probability that the system is working at time \(t\), called the instantaneous availability, is denoted as \(A(t) = P(\xi (t) = 1)\).

This review will focus on recent research on \(A(t)\) and some related quantities such as the steady-state availability \(A(\infty )\) and the limiting average availability \(A_{av} (\infty )\) defined later. The focus of Sect. 2 is the availability of systems whose failures are self-announcing, so that there is no need for inspections. Section 3 reviews major works on the availability of systems whose failures are not self-announcing and for which inspections are therefore necessary. The last section will mention some other works on system availability. There is a vast number of papers on the topic of system availability, so it is inevitable that our review misses some meaningful or even very significant works. Nevertheless, the authors hope this chapter provides readers with an overview of the progress made in recent years on the very important topic of system availability.

2 Availability of System Under Continuously Monitoring

It is assumed in this section that the failure of the system is self-announcing and thus inspection policy will not be involved.

Usually it is a formidable task to give an explicit formula for \(A(t)\) except in a few simple cases, so other measures have been proposed, and more attention is paid to the limiting behavior of these quantities; that is, engineers are more interested in the extent to which the system will be available after it has been running for a long time.

In the case when \(\{ (U_{j} ,D_{j} ),j \ge 1\}\) is a sequence of iid random vectors and \(U_{j}\) is independent of \(D_{j}\) for each \(j \ge 1\) (for convenience we will call this the IID Model in the following), some desirable properties have been obtained using results on alternating renewal processes. Let \(F\) and \(G\) denote the distribution functions of \(U_{j}\) and \(D_{j}\), respectively, and let \(\overline{F} = 1 - F\). For instance, it has been proved that \(A(t)\) is the unique solution of the renewal equation

$$A(t) = \overline{F} (t) + \int\limits_{0}^{t} {A(t - s){\text{d}}H(s),}$$

where \(H(t)\) is the convolution of \(F(t)\) and \(G(t)\) due to the assumed independence of \(U_{j}\) and \(D_{j}\), i.e., \(H(t) = \int_{0}^{t} F(t - x){\text{d}}G(x)\) for any \(t \ge 0\). The solution can actually be expressed explicitly as

$$A(t) = \left( {\bar{F}*\sum\limits_{n = 0}^{\infty } H^{(n)} } \right)(t),$$

where \(H^{(n)}\) is the \(n\)-fold convolution of \(H\). However, in most cases this expression does not help much. When both \(F\) and \(G\) have density functions \(f\) and \(g\), the function \(H\) also has a density given by \(h(t) = H^{\prime}(t) = \int_{0}^{t} g(t - x)f(x){\text{d}}x\), and consequently \(A(t)\) is the unique solution of the renewal equation

$$A(t) = \overline{F} (t) + \int\limits_{0}^{t} {A(t - s)h(s){\text{d}}s.}$$

Moreover, as \(t \to \infty\) both the instantaneous and the average availability

$$\bar{A}(t) = \frac{1}{t}\int\limits_{0}^{t} {A(u){\text{d}}u}$$
(1)

converge to a common limit \({\mathbb{E}}(U)/[{\mathbb{E}}(U) + {\mathbb{E}}(D)]\) where \((U,D)\) is iid as \((U_{1} ,D_{1} )\). More details can be found in Barlow and Proschan (1975).
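As a rough numerical illustration (not taken from the works reviewed here), the renewal equation with density \(h\) can be discretized on a uniform grid with the trapezoidal rule. The function name `solve_availability`, the grid size, and the exponential test case are assumptions made for this sketch. For exponential lifetimes with rate \(a\) and exponential repair times with rate \(b\), the well-known closed form \(A(t) = b/(a + b) + [a/(a + b)]{\text{e}}^{ - (a + b)t}\) serves as a check.

```python
import math

def solve_availability(f_bar, h, t_max, n):
    """Solve A(t) = f_bar(t) + int_0^t A(t - s) h(s) ds on a uniform grid
    with the trapezoidal rule (illustrative sketch, O(n^2) work)."""
    dt = t_max / n
    ts = [k * dt for k in range(n + 1)]
    hs = [h(t) for t in ts]
    A = [f_bar(0.0)]
    for m in range(1, n + 1):
        # End point s = t_m (weight 1/2) plus the interior nodes.
        acc = 0.5 * A[0] * hs[m]
        for k in range(1, m):
            acc += A[m - k] * hs[k]
        # The end point s = 0 involves the unknown A[m]; move it to the left side.
        A.append((f_bar(ts[m]) + dt * acc) / (1.0 - 0.5 * dt * hs[0]))
    return ts, A

# Hypothetical test case: exponential lifetimes (rate a) and repairs (rate b).
a, b = 1.0, 4.0
f_bar = lambda t: math.exp(-a * t)
h = lambda t: a * b / (b - a) * (math.exp(-a * t) - math.exp(-b * t))  # density of U + D
exact = lambda t: b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)

ts, A = solve_availability(f_bar, h, t_max=10.0, n=1000)
max_err = max(abs(x - exact(t)) for t, x in zip(ts, A))
```

In this test case the last grid value should be close to \({\mathbb{E}}(U)/[{\mathbb{E}}(U) + {\mathbb{E}}(D)] = 0.8\), the common limit stated above.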

In addition, Takács (1957), Rényi (1957), Rise (1979), and Gut and Janson (1983) discussed the asymptotic normality property of \(A(t)\) for the IID Model.

Mi (1995) studied the case when \(\{ (U_{j} ,D_{j} ),j \ge 1\}\) are independent but not necessarily identically distributed. The concepts that a sequence of random variables or their CDFs are dominated by a function and the average availability in the first \(n\) cycles defined by the ratio of accumulated up time in the first \(n\) cycles to the total length of time in the \(n\) cycles

$$\bar{A}_{n} = \frac{{\sum\nolimits_{j = 1}^{n} {U_{j} } }}{{\sum\nolimits_{j = 1}^{n} {U_{j} } + \sum\nolimits_{j = 1}^{n} {D_{j} } }}$$
(2)

were introduced there. Assuming that \(\{ (U_{j} ,D_{j} ),j \ge 1\}\) are dominated by a function and that

$$\frac{1}{n}\sum\limits_{j = 1}^{n} {\mathbb{E}}(U_{j} ) \to \mu ,\quad \frac{1}{n}\sum\limits_{j = 1}^{n} {\mathbb{E}}(D_{j} ) \to \nu \quad {\text{as}}\;\;n \to \infty$$
(3)

it was shown that

$$\mathop {\lim }\limits_{n \to \infty } \;\bar{A}_{n} = \frac{\mu }{\mu + \nu }\quad a.s.\;\;{\text{and}}\;\;L_{p}$$

and

$$A_{av} (\infty ) = \mathop {\lim }\limits_{t \to \infty } \;\bar{A}(t) = \frac{\mu }{\mu + \nu },\quad a.s.\;\;{\text{and}}\;\;L_{p} ,$$
(4)

where \(A_{av} (\infty )\) is called the limiting average availability.

Furthermore, under some additional mild conditions both \(\bar{A}_{n}\) and \(\bar{A}(t)\) are asymptotically normal as \(n \to \infty\) or \(t \to \infty\).
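The almost sure convergence of \(\bar{A}_{n}\) can be illustrated by simulation. The sketch below is not from Mi (1995); it assumes independent exponential up and down times with means \(\mu (1 + 1/j)\) and \(\nu (1 + 1/j)\) in the \(j\)th cycle, a hypothetical choice for which the Cesàro means in (3) converge to \(\mu\) and \(\nu\). Whether this sequence meets the domination condition of the paper is taken as an assumption here.

```python
import random

def average_availability(n, mu, nu, seed=0):
    """Simulate the ratio (2) for n cycles with non-identical exponential times."""
    rng = random.Random(seed)
    up = down = 0.0
    for j in range(1, n + 1):
        scale = 1.0 + 1.0 / j              # cycle-dependent mean inflation (assumed)
        up += rng.expovariate(1.0 / (mu * scale))
        down += rng.expovariate(1.0 / (nu * scale))
    return up / (up + down)

mu, nu = 3.0, 1.0
estimate = average_availability(200_000, mu, nu)
limit = mu / (mu + nu)
```

With a fixed seed the estimate is reproducible, and for this many cycles it should lie within sampling error of \(\mu /(\mu + \nu )\).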

Assuming the IID Model, Sarkar and Chaudhuri (1999) found the Fourier transform \(\tilde{b}(z)\) of the derivative \(b(t)\) of unavailability \(B(t) = 1 - A(t)\) defined by

$$\tilde{b}(z) = \int\limits_{ - \infty }^{\infty } {{\text{e}}^{izu} b(u){\text{d}}u,}$$
(5)

where \(i = \sqrt{ - 1}\) is the imaginary unit. Then they defined the function \(c_{u} (z) = {\text{e}}^{ - iuz} \tilde{b}(z)\) for any \(u > 0\). The function \(c_{u} (z)\) is analytic except at a finite number of isolated singularities, say \(z_{j} ,1 \le j \le k\), and the authors further expressed \(b(u)\) as a sum of residues

$$b(u) = - i\sum\limits_{{j:Im(z_{j} ) < 0}} {\text{Res}}(c_{u} ,z_{j} ),$$
(6)

where \(Im(z_{j} )\) is the imaginary part of the complex number \(z_{j}\), and \(Im(z_{j} ) < 0\) means that \(z_{j}\) lies in the lower half of the complex plane. Finally, the instantaneous availability \(A(t)\) was expressed in terms of the integral of \(b(u)\)

$$A(t) = 1 - \int\limits_{0}^{t} {b(u){\text{d}}u,\quad \forall t \ge 0}$$
(7)

In that paper, the Fourier transform is applied instead of the Laplace transform in order to avoid problems with inverting the Laplace transform of \(A(t)\). As an example, let the lifetime of the system have a gamma distribution with density

$$f(t) = \frac{{\lambda^{\alpha } }}{{\Gamma (\alpha )}}{\text{e}}^{ - \lambda t} t^{\alpha - 1} ,\quad t > 0,$$
(8)

where \(\alpha\) is a positive integer, and let the repair time have exponential distribution with density

$$g(t) = \lambda {\text{e}}^{ - \lambda t} ,\quad t > 0.$$
(9)

Then \(A(t)\) is obtained as

$$A(t) = \frac{\alpha }{\alpha + 1} - \frac{1}{\alpha + 1}\sum\limits_{j = 1}^{\alpha } \theta_{j} {\text{e}}^{{ - \lambda (1 - \theta_{j} )t}} ,$$
(10)

where \(\theta_{0} = 1,\theta_{1} , \ldots ,\theta_{\alpha }\) are the \((\alpha + 1)\)-th roots of 1. That is, \(\theta_{j} = [\exp \{ i2\pi /(\alpha + 1)\} ]^{j}\).
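Formula (10) is easy to evaluate with complex arithmetic. The following sketch (function name and parameter values are illustrative assumptions) computes \(A(t)\) from the roots of unity and returns the real and imaginary parts separately, so that one can confirm the imaginary part cancels. For \(\alpha = 1\), both distributions are exponential with rate \(\lambda\), and the result should agree with the familiar \(A(t) = \tfrac{1}{2} + \tfrac{1}{2}{\text{e}}^{ - 2\lambda t}\).

```python
import cmath
import math

def availability_gamma_exp(t, alpha, lam):
    """Evaluate (10): gamma(alpha, lam) lifetimes, exponential(lam) repair times."""
    n = alpha + 1
    total = 0j
    for j in range(1, alpha + 1):
        theta = cmath.exp(2j * math.pi * j / n)        # j-th power of the primitive root
        total += theta * cmath.exp(-lam * (1 - theta) * t)
    value = alpha / n - total / n
    return value.real, value.imag

lam = 0.7                                   # hypothetical rate
re1, im1 = availability_gamma_exp(1.3, 1, lam)
re3_start, im3_start = availability_gamma_exp(0.0, 3, lam)
re3_late, im3_late = availability_gamma_exp(80.0, 3, lam)
```

At \(t = 0\) the roots other than \(\theta_{0}\) sum to \(-1\), which gives \(A(0) = 1\); for large \(t\) every exponential term decays and \(A(t)\) approaches \(\alpha /(\alpha + 1)\).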

Example 2.1

Suppose that the lifetime \(U{ \sim }Gamma(4,\alpha )\) and the repair time \(D{ \sim }Gamma(2,\alpha )\).

In this case, we have

$$\begin{aligned} f(t) = & \frac{{\alpha^{4} }}{{\Gamma (4)}}{\text{e}}^{ - \alpha t} t^{3} ,\quad t > 0, \\ \tilde{f}(s) = & (1 - is/\alpha )^{ - 4} ,\quad - \infty < s < \infty , \\ g(t) = & \frac{{\alpha^{2} }}{{\Gamma (2)}}{\text{e}}^{ - \alpha t} t,\quad t > 0, \\ \tilde{g}(s) = & (1 - is/\alpha )^{ - 2} ,\quad - \infty < s < \infty , \\ c_{u} (z) = & {\text{e}}^{ - iuz} \tilde{f}(z)\frac{{1 - \tilde{g}(z)}}{{1 - \tilde{f}(z)\tilde{g}(z)}} \\ = & \frac{{{\text{e}}^{ - iuz} \alpha^{4} }}{{z^{4} + 4i\alpha z^{3} - 7\alpha^{2} z^{2} - 6i\alpha^{3} z + 3\alpha^{4} }}. \\ \end{aligned}$$

There are 4 singularities of \(c_{u} (z)\):

$$z_{1} = \frac{ - 3i + \sqrt{3}}{2}\alpha ,\quad z_{2} = \frac{ - 3i - \sqrt{3}}{2}\alpha \quad {\text{and}}\quad z_{3} = \frac{ - i + \sqrt{3}}{2}\alpha ,\quad z_{4} = \frac{ - i - \sqrt{3}}{2}\alpha .$$

We can calculate the residue at \(z = z_{1}\):

$$\text{Res}(c_{u} ,z_{1} ) = \mathop {\lim }\limits_{{z \to z_{1} }} (z - z_{1} )c_{u} (z) = - \frac{{\alpha {\text{e}}^{ - 3u\alpha /2} {\text{e}}^{ - iu\alpha \sqrt{3}/2} }}{\sqrt{3} + 3i}.$$

The residues at \(z = z_{2} ,z_{3}\) and \(z_{4}\) can be calculated similarly. Thus, we have

$$b(u) = - i(\text{Res}(c_{u} ,z_{1} ) + \text{Res}(c_{u} ,z_{2} ) + \text{Res}(c_{u} ,z_{3} ) + \text{Res}(c_{u} ,z_{4} ))$$

and

$$\begin{aligned} A(t) = & 1 - \int_{0}^{t} b(u)du \\ = & \frac{2}{3} + \frac{{\sqrt{3} e^{ - \alpha t/2} \sin (\sqrt{3} \alpha t/2)}}{3} + \frac{{e^{ - 3\alpha t/2} \cos (\sqrt{3} \alpha t/2)}}{3}. \\ \end{aligned}$$

Without the Fourier or Laplace transform, it seems unlikely that this expression could be obtained by directly solving the renewal equation mentioned above.
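The residue computation of Example 2.1 can be checked numerically. The sketch below is an independent verification, not code from Sarkar and Chaudhuri (1999). It uses the simplification \(\tilde{f}(z)[1 - \tilde{g}(z)]/[1 - \tilde{f}(z)\tilde{g}(z)] = 1/(w^{4} + w^{2} + 1)\) with \(w = 1 - iz/\alpha\), evaluates the residue at each simple pole as \({\text{e}}^{ - iuz_{j} }\) divided by the derivative of the denominator, forms \(b(u)\) by (6), integrates it with Simpson's rule as in (7), and compares the result with the closed-form \(A(t)\) of the example. The value of \(\alpha\) and all function names are assumptions.

```python
import cmath
import math

def poles(alpha):
    """The four singularities z_1..z_4 of c_u(z); all lie in the lower half-plane."""
    s3 = math.sqrt(3.0)
    return [complex(s3, -3.0) * alpha / 2, complex(-s3, -3.0) * alpha / 2,
            complex(s3, -1.0) * alpha / 2, complex(-s3, -1.0) * alpha / 2]

def residue(u, z, alpha):
    """Residue of exp(-i u z) / Q(w), Q(w) = w^4 + w^2 + 1, w = 1 - i z / alpha."""
    w = 1 - 1j * z / alpha
    dq_dz = (4 * w**3 + 2 * w) * (-1j / alpha)   # chain rule: dQ/dw * dw/dz
    return cmath.exp(-1j * u * z) / dq_dz

def b(u, alpha):
    """Derivative of the unavailability, b(u) = -i * (sum of residues)."""
    return (-1j * sum(residue(u, z, alpha) for z in poles(alpha))).real

def availability_from_residues(t, alpha, n=2000):
    """A(t) = 1 - int_0^t b(u) du by composite Simpson's rule (n even)."""
    h = t / n
    total = b(0.0, alpha) + b(t, alpha)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * b(k * h, alpha)
    return 1.0 - total * h / 3

def availability_closed_form(t, alpha):
    x = math.sqrt(3.0) * alpha * t / 2
    return (2 / 3 + math.sqrt(3.0) * math.exp(-alpha * t / 2) * math.sin(x) / 3
            + math.exp(-3 * alpha * t / 2) * math.cos(x) / 3)

alpha = 1.5  # hypothetical rate parameter
diffs = [abs(availability_from_residues(t, alpha) - availability_closed_form(t, alpha))
         for t in (0.5, 2.0, 5.0)]
```

The two evaluations should agree up to the quadrature error, and \(b(0) = 0\) reflects the fact that a gamma lifetime with shape 4 has zero density at the origin.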

Keeping the assumption of independence of all \(U_{j}\) and \(D_{j},\, j \ge 1\), Biswas and Sarkar (2000) modified the IID model as follows. A positive integer \(k\) is fixed in advance. At the \((k + 1)\)th failure of the system, either it is replaced by a new system that is iid as the original one and the replacement is finished instantly without taking any time (Model A), or it is perfectly repaired, which takes time \(D_{k + 1}\) (Model B). Obviously, in either case, the system is brought back to a condition as good as new and so the time when \(D_{k + 1}\) ends is a renewal point. Afterward, the process evolves in the same pattern. In other words, the two models allow \(k\) imperfect repairs before a complete repair or replacement that brings the process to a renewal point. It is natural to further assume that

$$F_{1} \mathop \ge \limits^{\text{st}} F_{2} \mathop \ge \limits^{\text{st}} \cdots \mathop \ge \limits^{\text{st}} F_{k + 1}$$
(11)

and

$$G_{1} \mathop \le \limits^{\text{st}} G_{2} \mathop \le \limits^{\text{st}} \cdots \mathop \le \limits^{\text{st}} G_{k + 1}$$
(12)

This paper employed the same Fourier transform approach as in Sarkar and Chaudhuri (1999). Denote by \(A_{j} (t)\) the instantaneous system availability when at time \(t = 0\) the system starts with lifetime \(U_{j}\), again taking the ending time of \(D_{k + 1}\) as the renewal point. The equations satisfied by the Fourier transforms of the derivatives \(b_{j} (t)\) of the unavailability \(B_{j} (t) = 1 - A_{j} (t),1 \le j \le k + 1\), were derived for both Model A and Model B. Upon determination of \(\tilde{b}(u) \equiv \tilde{b}_{1} (u)\), the desired availability \(A(t) \equiv A_{1} (t)\) can then be obtained by (7). The explicit expression of \(A(t)\) was shown for the case of exponential lifetimes and repair times.

In the above studies at each system failure, it is deterministic that the failed system undergoes either perfect repair or imperfect repair. Brown and Proschan (1983) considered a model according to which a perfect repair is implemented with probability \(p\) and an imperfect repair, which is actually a minimal repair restoring the failed system to its condition just prior to failure, is performed with probability \(1 - p\) at each system failure. Their model has been generalized by Block et al. (1985) to the case in which the probability of perfect repair is state dependent. Lim et al. (1998) proposed the Bayesian imperfect repair model, according to which the probability of performing a perfect repair is a random variable \(P\) with distribution function \(\varPi (p)\) on \((0,1] ,\) and the probability of applying minimal repair is \(1 - P\) at each system failure. Cha and Kim (2001) examined the same model under the assumptions that the perfect repair times are iid, the minimal repair times are iid, and these times are independent of each other. Under these assumptions the steady-state system availability \(A(\infty )\) was derived as

$$A(\infty ) = \frac{{\int_{0}^{1} {\int_{0}^{\infty } \exp \{ - p\Lambda (t)\} {\text{d}}t\,{\text{d}}\Pi (p)} }}{{\int_{0}^{1} {\int_{0}^{\infty } \exp \{ - p\Lambda (t)\} {\text{d}}t\,{\text{d}}\Pi (p)} + \nu_{2} \int_{0}^{1} \frac{1 - p}{p}{\text{d}}\Pi (p) + \nu_{1} }},$$
(13)

where \(\Lambda (t) = \int_{0}^{t} \lambda (x){\text{d}}x\), \(\lambda (x)\) is the failure rate function of the system, \(\nu_{1}\) is the mean perfect repair time, and \(\nu_{2}\) is the mean minimal repair time.

In the special case of \(P = 1\) with probability one, that is, only perfect repair is performed at each system failure, this model is reduced to the IID one. Certainly in this case \(\nu_{2} = 0\) and so \(A(\infty )\) becomes

$$A(\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(t){\text{d}}t}}{{\int_{0}^{\infty } \bar{F}(t){\text{d}}t + \nu_{1} }}$$
(14)

which is exactly the same as in the classic IID model since

$$\bar{F}(t) = \exp \{ - \Lambda (t)\} .$$
(15)
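To see how the pieces of the steady-state availability interact, the sketch below treats the special case in which the perfect-repair probability is a fixed number \(p\) (a point mass for \(\varPi\)), with a Weibull cumulative hazard \(\Lambda (t) = (t/\eta )^{k}\). Both choices, and all names in the code, are assumptions made for illustration. Within one renewal cycle the number of minimal repairs preceding the perfect repair is geometric with mean \((1 - p)/p\), so the code weights the mean minimal repair time by that factor and counts the mean perfect repair time once. The mean up time per cycle uses the identity \(\int_{0}^{\infty } \exp \{ - p(t/\eta )^{k} \} {\text{d}}t = \eta p^{ - 1/k} \Gamma (1 + 1/k)\).

```python
import math

def steady_state_availability(p, shape, scale, mean_perfect, mean_minimal):
    """A(inf) for a fixed perfect-repair probability p and Weibull lifetimes.

    mean_perfect / mean_minimal are the mean durations of a perfect and a
    minimal repair (hypothetical names for this sketch).
    """
    mean_up = scale * p ** (-1.0 / shape) * math.gamma(1.0 + 1.0 / shape)
    mean_down = mean_minimal * (1.0 - p) / p + mean_perfect
    return mean_up / (mean_up + mean_down)

shape, scale = 2.0, 10.0
mean_life = scale * math.gamma(1.0 + 1.0 / shape)      # E(U) for the Weibull lifetime

a_perfect_only = steady_state_availability(1.0, shape, scale, 2.0, 0.5)
a_mixed = steady_state_availability(0.3, shape, scale, 2.0, 0.5)
```

For \(p = 1\) the minimal-repair term drops out and the value reduces to the IID-model ratio of mean lifetime to mean lifetime plus mean perfect repair time.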

In the previous works on availability, only one type of failure was taken into consideration. Cha et al. (2004) generalized the study of Mi (1994) and considered a repairable system with two types of failure: Type I failure (minor failure), which occurs with probability \(1 - p(t)\), where \(t\) is the age of the system at failure, and Type II failure (catastrophic failure, i.e., the usual failure), which occurs with probability \(p(t)\). A system with a Type I failure can be restored to operation by a minimal repair, whereas a system with a Type II failure can be restored to operation only by a perfect repair (or a replacement). This model is called the general failure model. The study on availability in Cha et al. (2004) combined a burn-in time \(b\) and an age replacement time \(T\) and obtained the expression of the steady-state availability \(A(\infty )\) as follows.

Suppose that a new system is burned-in for time \(b\), and it will be put in field operation if it survives the burn-in. In the field use, the system is replaced by another system, which has also survived the same burn-in time \(b\), either at the use ‘‘age’’ \(T\) or at the time of the first Type II failure, whichever occurs first. However, for each Type I failure occurring during field use, only minimal repair will be performed.

It is further assumed that the repair times are not negligible. Let \(\nu_{1}\), \(\nu_{2}\), and \(\nu_{3}\) be the means of a minimal repair time, the time for an unplanned replacement caused by a Type II failure, and the time for a planned preventive replacement at the field use age \(T\), respectively. For technical reasons, it is required that \(\int_{0}^{\infty } p(t)r(t){\text{d}}t = \infty\), where \(r(t)\) is the hazard rate function of the lifetime of a new system. Under these assumptions, by arguments similar to those in Cha and Kim (2002), it can be shown that the steady-state availability of the system under the policy \((b,T)\) is given by

$$A(\infty ) = \frac{{\int_{0}^{T} \bar{G}_{b} (t){\text{d}}t}}{{\int_{0}^{T} \bar{G}_{b} (t){\text{d}}t + \left[ {\int_{0}^{T} r(b + t)\bar{G}_{b} (t){\text{d}}t - G_{b} (T)} \right]\nu_{1} + G_{b} (T)\nu_{2} + \bar{G}_{b} (T)\nu_{3} }}$$
(16)

where

$$\bar{G}_{b} (t) = \exp \{ - \int\limits_{0}^{t} {p(b + x)r(b + x){\text{d}}x\} }$$
(17)

Letting \(b = 0\) and \(p(t) = 1,\forall t \ge 0\), we see that \(\bar{G}_{b} (t) = \bar{F}(t)\). It also implies that there is only perfect repair but no minimal repair and so \(\nu_{1} = 0\), and \(\nu_{2} = \nu_{3} \equiv \nu\). Thus \(A(\infty )\) is reduced to

$$A(\infty ) = \frac{{\int_{0}^{T} \bar{F}(t){\text{d}}t}}{{\int_{0}^{T} \bar{F}(t){\text{d}}t + \nu }}.$$
(18)

If we further let the age replacement time \(T = \infty\), that is, replacement can only take place at system failure, then \(A(\infty )\) is finally obtained as

$$A(\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(t){\text{d}}t}}{{\int_{0}^{\infty } \bar{F}(t){\text{d}}t + \nu }} = \frac{\mu }{\mu + \nu }$$
(19)

which is exactly the result in the case of the IID Model.
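Formula (16) can be evaluated by simple quadrature. The sketch below assumes a Weibull hazard rate \(r(t)\) and a constant Type II probability \(p(t) \equiv p\), so that \(\bar{G}_{b} (t) = \exp \{ - p[\Lambda (b + t) - \Lambda (b)]\}\) with \(\Lambda (t) = (t/\eta )^{k}\). These modeling choices, the Simpson helper, and all parameter values are assumptions rather than part of Cha et al. (2004). The reduction to (18) for \(b = 0\) and \(p = 1\) is used as a sanity check: the bracketed minimal-repair term then vanishes, so the result does not depend on \(\nu_{1}\).

```python
import math

def simpson(fn, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = fn(lo) + fn(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * fn(lo + k * h)
    return total * h / 3

def steady_state_availability(b, T, p, shape, scale, nu1, nu2, nu3):
    """Evaluate (16) for a Weibull hazard and a constant Type II probability p."""
    cum_hazard = lambda t: (t / scale) ** shape
    hazard = lambda t: (shape / scale) * (t / scale) ** (shape - 1)
    g_bar = lambda t: math.exp(-p * (cum_hazard(b + t) - cum_hazard(b)))
    mean_up = simpson(g_bar, 0.0, T)
    g_T = 1.0 - g_bar(T)                                   # G_b(T)
    n_minimal = simpson(lambda t: hazard(b + t) * g_bar(t), 0.0, T) - g_T
    mean_down = n_minimal * nu1 + g_T * nu2 + g_bar(T) * nu3
    return mean_up / (mean_up + mean_down)

# Reduction check: b = 0, p = 1, nu2 = nu3 = nu should reproduce (18).
T, nu = 1.5, 0.3
reduced = steady_state_availability(0.0, T, 1.0, 2.0, 1.0, 5.0, nu, nu)
mean_up_18 = simpson(lambda t: math.exp(-t ** 2), 0.0, T)
expected_18 = mean_up_18 / (mean_up_18 + nu)

general = steady_state_availability(0.5, T, 0.4, 2.0, 1.0, 0.05, 0.5, 0.2)
```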

Mi (2006a, b) reconsidered the system with nonidentical lifetime distributions and nonidentical repair time distributions studied in Mi (1995). Let \(U_{j}\) and \(D_{j}\) have distribution functions \(F_{j}\) and \(G_{j} ,\) respectively, for each \(j \ge 1\). Assuming that both sequences \(\{ U_{j} ,j \ge 1\}\) and \(\{ D_{j} ,j \ge 1\}\) are dominated, that there exist two CDFs \(F\) and \(G\) such that \(F_{j} \to F\) and \(G_{j} \to G\) in distribution as \(j \to \infty\), and that some other technical requirements hold, Mi (2006a, b) gave three sets of conditions under which the steady-state availability \(A(\infty )\) exists and is given by

$$A(\infty ) = \frac{\mu }{\mu + \nu }$$

where

$$\mu = \int\limits_{0}^{\infty } {\bar{F}(t){\text{d}}t,} \quad \nu = \int\limits_{0}^{\infty } {\bar{G}(t){\text{d}}t}$$
(20)

Moreover, it was shown there that if there exists an integer \(k \ge 1\) such that \(F_{nk + j} (t) = F_{j} (t)\) and \(G_{nk + j} (t) = G_{j} (t)\) for any \(n \ge 1\), \(1 \le j \le k\), and \(t \ge 0\), then it holds that

$$A(\infty ) = \frac{{\sum\nolimits_{j = 1}^{k} {\mu_{j} } }}{{\sum\nolimits_{j = 1}^{k} {\mu_{j} } + \sum\nolimits_{j = 1}^{k} {\nu_{j} } }},$$
(21)

where \(\mu_{j}\) and \(\nu_{j}\) are the means associated with \(F_{j}\) and \(G_{j} ,\quad 1 \le j \le k\). Clearly, the results of both Model A and Model B discussed in Biswas and Sarkar (2000) can be obtained as special cases of this result in Mi (2006a, b).

In the models reviewed above, there is no spare system on cold standby and there is only one repair facility, so a failed system can be sent for repair without any waiting time. In the model considered in Sarkar and Li (2006), however, in addition to the original system there are \(s \ge 1\) identical spares on cold standby, and there are \(r \ge 1\) repair facilities which serve the failed systems in the order in which they join the repair queue. The lifetimes of the original system and the \(s\) spares are iid; the repair times at the \(r\) repair facilities are also iid; further, these lifetimes and repair times are independent of each other.

At time \(t = 0 ,\) the original system is put into operation; at its failure one spare is put into operation immediately, without taking any time, and the failed system is sent for repair. In general, at the instant of failure of an operating system, the failed system joins the repair queue and its repair starts as soon as one of the repair facilities is free; in the meantime one spare, if available, is put into operation immediately. If, however, at the failure of an operating system there is no spare available, that is, all the \((s + 1)\) systems are either undergoing or awaiting repair, then the entire system enters the down state. It can be assumed that \(r \le s + 1\), since otherwise some repair facilities would always remain idle.

Let the original system be supported by \(r \ge 1\) repair facilities and \(s \ge r - 1\) spare systems. Assuming that the lifetime distribution is exponential with mean \(\alpha^{ - 1}\) and repair time distribution is exponential with mean \(\beta^{ - 1}\), the authors derived the limiting average availability as

$$A_{av} (\infty ) = \frac{{r\rho \sum\nolimits_{j = 0}^{s} {\gamma_{j} \rho^{s - j} } }}{{1 + r\rho \sum\nolimits_{j = 0}^{s} {\gamma_{j} \rho^{s - j} } }},$$
(22)

where \(\rho = \beta /\alpha\) and

$$\gamma_{j} = \left\{ {\begin{array}{*{20}l} {\frac{{r!r^{s - r} }}{j!},} \hfill & {\quad j = 0,1, \ldots ,r - 1} \hfill \\ {r^{s - j} ,} \hfill & {\quad j = r,r + 1, \ldots ,s.\quad } \hfill \\ \end{array} } \right.$$
(23)
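Formulas (22) and (23) can be cross-checked against an elementary Markov-chain computation. The sketch below is our own verification, not the derivation of Sarkar and Li (2006). It assumes that the number of failed units is a birth-death chain on \(\{ 0,1, \ldots ,s + 1\}\) with failure rate \(\alpha\) whenever at least one unit is operable and repair rate \(\min (n,r)\beta\) when \(n\) units have failed; the long-run availability is then one minus the stationary probability of state \(s + 1\). Function names and test values are hypothetical.

```python
from math import factorial

def gamma_coeff(j, r, s):
    """Coefficient gamma_j from (23)."""
    if j <= r - 1:
        return factorial(r) * float(r) ** (s - r) / factorial(j)
    return float(r) ** (s - j)

def availability_formula(r, s, alpha, beta):
    """Limiting average availability from (22)."""
    rho = beta / alpha
    x = r * rho * sum(gamma_coeff(j, r, s) * rho ** (s - j) for j in range(s + 1))
    return x / (1.0 + x)

def availability_birth_death(r, s, alpha, beta):
    """Cross-check: stationary law of the number of failed units (assumed chain)."""
    weights = [1.0]
    for n in range(1, s + 2):
        # Detailed balance: pi_n * min(n, r) * beta = pi_{n-1} * alpha.
        weights.append(weights[-1] * alpha / (min(n, r) * beta))
    return 1.0 - weights[-1] / sum(weights)

cases = [(1, 0), (1, 3), (2, 1), (2, 5), (3, 2), (3, 7)]   # (r, s) with s >= r - 1
gaps = [abs(availability_formula(r, s, 0.8, 2.5) - availability_birth_death(r, s, 0.8, 2.5))
        for r, s in cases]
```

For \(r = 1\) and \(s = 0\) both functions reduce to \(\rho /(1 + \rho ) = \beta /(\alpha + \beta )\), the single-unit result.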

In a more general case, suppose again that there is at least one repair facility (\(r \ge 1\)) and that the repair time has an exponential distribution with mean \(\beta^{ - 1}\), but the number of spare systems satisfies \(s \ge \hbox{max} \{ 1,r - 1\}\) and the lifetime distribution of the systems is arbitrary apart from having a density. Under these assumptions, the limiting average availability was obtained as

$$A_{\text{av}} (\infty ) = \frac{{\mu (0, \ldots ,0,1)(I - Q)^{ - 1} (1, \ldots ,1)^{\prime}}}{{\mu (0, \ldots ,0,1)(I - Q)^{ - 1} (1, \ldots ,1)^{\prime} + (r\beta )^{ - 1} }},$$
(24)

where \(\mu\) denotes the mean system lifetime, \((1, \ldots ,1)^{\prime}\) is an \(s \times 1\) column vector with all components equal to 1, and the \(s \times s\) matrix \(Q\) is determined by equations given in Sarkar and Li (2006).

Sarkar and Biswas (2010) employed the same Fourier transformation approach proposed in Sarkar and Chaudhuri (1999) to the model studied in Sarkar and Li (2006). Keeping the same assumption of the exponential system lifetimes and repair times, the authors expressed the instantaneous availability \(A(t)\) as

$$A(t) = 1 - \int\limits_{0}^{t} {b_{0} (u){\text{d}}u}$$

for the cases \(r = 1\) and \(r = 2\) with \(s \ge 1\), where the function \(b_{0} (u)\) is the derivative of \(B_{0} (u)\), and \(B_{0} (u)\) denotes the unavailability of the system at time \(u > 0\) when there is no failure of spares. Actually, \(A(t) = 1 - B_{0} (t)\). It turns out that \(b_{0} (u)\) is the sum of residues of a complex-valued function that is analytic except at a finite number of isolated singularities. For details, the readers are referred to the Appendix of Sarkar and Biswas (2010).

At the end of this section, recall that it is usually difficult to obtain a closed-form expression for \(A(t)\), as mentioned before. As a matter of fact, the behavior of \(A(t)\) can also be very complicated, as shown in the following example.

Example 2.2

Consider a system that has \(U{ \sim }Gamma(p,\alpha )\) and \(D{ \sim }\ln {\mathcal{N}}(\mu ,\sigma )\) with density functions

$$f(t) = \frac{{\alpha^{p} }}{{\Gamma (p)}}e^{ - \alpha t} t^{p - 1} \quad {\text{and}}\quad g(t) = \frac{1}{{t\sigma \sqrt{2\pi }}}e^{{ - \frac{{(\ln t - \mu )^{2} }}{{2\sigma^{2} }}}} .$$

In the following figure, the left panel shows the system availability functions \(A(t)\) corresponding to the parameters \(p = 2,4,8\) and \(p = 10\). The right panel shows the availability functions for the parameters \(\sigma = 0.25,0.50\) and \(\sigma = 0.75\). The function \(A(t)\) in all these cases does not have a closed form and is thus obtained numerically.

3 Availability of System with Inspections

In this section, we review research on the availability of systems that are maintained through inspections. Inspection policies were proposed in Barlow and Proschan (1975) or even earlier. Inspections are important for systems whose failures are not self-announcing. Systems of this type are common in industry. For instance, industrial safety and protection systems such as circuit breakers, fire detectors, gas detectors, pressure detectors, and safety valves are installed to prevent various specific risks. Depending on the status of the system found at inspection, the system will be repaired, replaced, upgraded, or modified. The system is then restored to operation upon completion of this maintenance.

Two types of inspection policies are widely applied in practice. The first type, called the calendar-based inspection policy, schedules inspections at fixed calendar intervals, say at times \(\tau ,2\tau , \ldots ,\) where \(\tau > 0\) is a predetermined constant. This policy is also called the periodic inspection policy. According to the calendar-based inspection policy, a system starting its operation at time \(t = 0\) is inspected at time \(t = \tau\), then at time \(t = 2\tau\), and so on.

The second type of inspection policy, the age-based inspection policy, schedules inspections at fixed age intervals. Suppose that a constant \(\tau > 0\) is determined in advance. Let the system be inspected at time \(t = \tau\) and resume operation at time \(\tau + m ,\) where \(m\) represents the time required to complete the above-mentioned maintenance. According to the age-based inspection policy, the system will next be inspected at time \(t = 2\tau + m\), that is, after a further operating period of length \(\tau\), and this pattern continues in the same way.

Much has been done in studying the availability of systems that are maintained through inspection; see, for example, Wortman et al. (1994), Wortman and Klutke (1994), Yeh (1995), Klutke et al. (1996), Dieulle (1999), Vaurio (1999), Ito and Nakagawa (2000), Chelbi and Ait-Kadi (2000), Yang and Klutke (2000, 2001) and Yadavalli et al. (2002), among others. Here we focus on the following papers.

Sarkar and Sarkar (2000) studied two models: Model A and Model B. In both models the periodic inspection policy is applied and a failed system is repaired as good as new (i.e., the repair is complete or perfect), and the repair takes constant time \(\nu \in [0,\tau ]\).

Specifically, under Model A an unfailed system found by inspection is considered as good as new. That is, necessary actions such as upgrading or modifying are taken to make the unfailed system as good as new. This is equivalent to an instantaneous perfect repair and automatically holds if the lifetime distribution of the system is exponential due to its memoryless property; whereas, a failed system revealed by inspection is completely repaired or replaced by an iid system under Model A. Thereafter, the completely repaired/replaced system is immediately restored to operation. Model A extends the case of instantaneous repair with \(\nu = 0\) in Høyland and Rausand (1994).

On the other hand, under Model B an unfailed system continues its operation without any intervention, i.e., the system remains as good as it is; a failed system undergoes perfect repair or replacement as under Model A, but the operation of the repaired system starts at the next scheduled inspection time after the repair/replacement, rather than immediately as under Model A.

To present the results in Sarkar and Sarkar (2000), we denote the lifetime of a given system starting operation at time \(t = 0\) by \(U\) and the distribution function of \(U\) by \(F( \cdot )\). This notation will be kept in the rest of this chapter.

For Model A with constant repair/replacement time \(0 \le \nu \le \tau\) the availability \(A(k\tau )\) is given as

$$A(k\tau ) = \frac{{[\bar{F}(\tau ) - \bar{F}(\tau - \nu )]^{k} F(\tau ) + \bar{F}(\tau - \nu )}}{{F(\tau ) + \bar{F}(\tau - \nu )}}$$
(25)

Based on it the instantaneous availability \(A(t)\) is given as

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {{\text{if}}\quad 0 \le t \le \tau ;\quad } \hfill \\ {\bar{F}(\tau )\bar{F}(t - \tau ),} \hfill & {{\text{if}}\quad \tau < t < \tau + \nu ;\quad } \hfill \\ {\begin{array}{*{20}l} {A(k\tau )\bar{F}(t - k\tau )\quad } \hfill \\ { + [1 - A(k\tau )]\bar{F}(t - k\tau - \nu )} \hfill \\ \end{array} ,} \hfill & {\begin{array}{*{20}l} {{\text{if}}\quad k\tau + \nu \le t < (k + 1)\tau + \nu ,\quad } \hfill \\ {\quad \quad k = 1,2, \ldots } \hfill \\ \end{array} } \hfill \\ \end{array} } \right.$$
(26)

It is easy to see that when \(\nu = 0\) the expression of \(A(t)\) has the form

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {{\text{if}}\quad 0 \le t \le \tau \quad } \hfill \\ {\bar{F}(t - k\tau ) = \bar{F}\left( {t - \left\lfloor {\frac{t}{\tau }} \right\rfloor \tau } \right)} \hfill & {\begin{array}{*{20}l} {{\text{if}}\quad k\tau \le t < (k + 1)\tau ,} \hfill \\ {\quad \quad k = 1,2, \ldots } \hfill \\ \end{array} } \hfill \\ \end{array} } \right.$$
(27)

where \(\left\lfloor x \right\rfloor\) is the largest integer not exceeding \(x\). This is exactly the result in Høyland and Rausand (1994). For the same Model A, the limiting average availability of the system is

$$A_{av} (\infty ) = \tau^{ - 1} \left\{ {\phi [H(\tau + \nu ) - H(\nu )] + (1 - \phi )H(\tau )} \right\}$$
(28)

where

$$\phi = \frac{{\bar{F}(\tau - \nu )}}{{\bar{F}(\tau - \nu ) + F(\tau )}}$$
(29)

and

$$H(t) = \int\limits_{0}^{t} {\bar{F}(x){\text{d}}x}$$
(30)
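The values \(A(k\tau )\) in (25) and their limit \(\phi\) in (29) are straightforward to compute. The sketch below assumes a Weibull lifetime distribution and hypothetical values of \(\tau\) and \(\nu\). It also checks the one-step relation \(A((k + 1)\tau ) = A(k\tau )\bar{F}(\tau ) + [1 - A(k\tau )]\bar{F}(\tau - \nu )\), which follows algebraically from (25) and is stated here as our own observation rather than as a formula quoted from Sarkar and Sarkar (2000).

```python
import math

def weibull_cdf(t, shape=2.0, scale=10.0):
    """Hypothetical lifetime distribution F used for illustration."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def availability_at_inspections(F, tau, nu, k_max):
    """A(k * tau), k = 0..k_max, from (25) for Model A with 0 <= nu <= tau."""
    f_bar = lambda t: 1.0 - F(t)
    ratio = f_bar(tau) - f_bar(tau - nu)        # non-positive, |ratio| < 1
    denom = F(tau) + f_bar(tau - nu)
    return [(ratio ** k * F(tau) + f_bar(tau - nu)) / denom for k in range(k_max + 1)]

tau, nu = 5.0, 1.0
A = availability_at_inspections(weibull_cdf, tau, nu, 30)

f_bar_tau = 1.0 - weibull_cdf(tau)
f_bar_shift = 1.0 - weibull_cdf(tau - nu)
phi = f_bar_shift / (f_bar_shift + weibull_cdf(tau))        # limit (29)
recursion_gap = max(abs(A[k + 1] - (A[k] * f_bar_tau + (1.0 - A[k]) * f_bar_shift))
                    for k in range(30))
```

Because the ratio raised to the power \(k\) alternates in sign, the sequence \(A(k\tau )\) oscillates around \(\phi\) while converging to it geometrically.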

For Model B, in the case of \(\nu = 0\), the instantaneous availability \(A(t)\) is given as

$$A(t) = \sum\limits_{j = 0}^{k} c_{j} \bar{F}(t - j\tau ),\quad {\text{if}}\quad k\tau \le t < (k + 1)\tau ,\;\;k = 0,1, \ldots$$
(31)

where \(c_{0} = 1\) and \(c_{j} ,j \ge 1\) are determined recursively by

$$c_{j} = \sum\limits_{i = 1}^{j} p_{i} c_{j - i}$$

with \(p_{i} = F(i\tau ) - F((i - 1)\tau )\).
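The recursion for the weights \(c_{j}\) and the sum in (31) are easy to evaluate numerically. The following Python sketch is only an illustration and is not taken from Sarkar and Sarkar (2000): the function names, the exponential lifetime, and the value \(\tau = 0.5\) are assumptions made here. The sum in the recursion is started at \(i = 1\), which changes nothing because \(p_{0} = 0\) whenever \(F(t) = 0\) for \(t \le 0\).

```python
import math


def renewal_weights(cdf, tau, n):
    """Return [c_0, ..., c_n] with c_0 = 1 and c_j = sum_{i=1}^{j} p_i * c_{j-i}."""
    # p[i] = F(i*tau) - F((i-1)*tau) for i >= 1; p[0] is a placeholder.
    p = [0.0] + [cdf(i * tau) - cdf((i - 1) * tau) for i in range(1, n + 1)]
    c = [1.0]
    for j in range(1, n + 1):
        c.append(sum(p[i] * c[j - i] for i in range(1, j + 1)))
    return c


def availability_model_b(t, cdf, tau):
    """Evaluate (31): A(t) = sum_{j=0}^{k} c_j * Fbar(t - j*tau), k = floor(t/tau)."""
    k = int(math.floor(t / tau))
    c = renewal_weights(cdf, tau, k)
    return sum(c[j] * (1.0 - cdf(t - j * tau)) for j in range(k + 1))


def exp_cdf(x):
    """Hypothetical lifetime distribution: exponential with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


tau = 0.5                                   # assumed inspection interval
c = renewal_weights(exp_cdf, tau, 5)        # weights c_0, ..., c_5
a_mid = availability_model_b(1.25, exp_cdf, tau)
```

For an exponential lifetime with rate \(\lambda\) the weights are constant, \(c_{j} = 1 - {\text{e}}^{ - \lambda \tau }\) for \(j \ge 1\), and (31) collapses to \(\bar{F}(t - \left\lfloor {t/\tau } \right\rfloor \tau )\), which gives a quick sanity check for an implementation.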

In addition, the limiting average availability of the system is

$$A_{\text{av}} (\infty ) = \frac{{{\mathbb{E}}(U)}}{{\tau {\mathbb{E}}\left( {\left\lceil {\frac{U}{\tau }} \right\rceil } \right)}}$$
(32)

where \(\left\lceil x \right\rceil\) is the smallest integer satisfying \(\left\lceil x \right\rceil \ge x\).

In the case of \(\nu > 0\), without loss of generality it can be assumed that \(\nu = m\tau\) for some integer \(m \ge 1\). This holds because under Model B the failed system is restored to operation only at the next scheduled inspection time following its perfect repair or replacement; that is, if repair/replacement is completed during the time interval \(((m - 1)\tau ,m\tau ]\), then the repaired/replaced system is restored to operation at time \(m\tau\).

In this case, with \(\nu = m\tau\) for an integer \(m \ge 1\), it was shown that

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t)} \hfill & {{\text{if}}\quad 0 \le t < (m + 1)\tau } \hfill \\ {\sum\limits_{j = 0}^{k + m} d_{j} \bar{F}(t - j\tau ),} \hfill & {{\text{if}}\quad (k + m)\tau \le t < (k + m + 1)\tau ,\;\;k = 1,2, \ldots } \hfill \\ \end{array} } \right.$$
(33)

where \(d_{0} = 1\), \(d_{1} = d_{2} = \cdots = d_{m} = 0\), and \(d_{j}\), \(j \ge m + 1\), are determined by

$$d_{j} = \sum\limits_{i = 1}^{j} q_{i} d_{j - i} ,\quad \forall j \ge m + 1$$

with \(q_{1} = q_{2} = \cdots = q_{m} = 0\) and \(q_{m + i} = F(i\tau ) - F((i - 1)\tau ),\quad \forall i \ge 1\). Moreover, the limiting average availability of the system is

$$A_{av} (\infty ) = \frac{{{\mathbb{E}}(U)}}{{\tau {\mathbb{E}}\left( {m + \left\lceil {\frac{U}{\tau }} \right\rceil } \right)}} = \frac{{{\mathbb{E}}(U)}}{{\nu + {\tau {\mathbb{E}}}\left( {\left\lceil {\frac{U}{\tau }} \right\rceil } \right)}}$$
(34)
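As a numerical cross-check of (33) against (34), the weights \(d_{j}\) can be computed from the recursion and the average of \(A(t)\) over a long horizon compared with the limit. The Python sketch below is an illustration under assumptions made here (exponential lifetime with rate 1, \(\tau = 1\), \(m = 1\), a horizon of 400 inspection periods, and all function names). It uses \(\int_{0}^{N\tau } A(t){\text{d}}t = \sum\nolimits_{j = 0}^{N - 1} d_{j} H((N - j)\tau )\) with \(H\) as in (30), which follows from (33) by integrating term by term.

```python
import math


def delayed_renewal_weights(cdf, tau, m, n):
    """Return [d_0, ..., d_n] for nu = m*tau, following the recursion after (33)."""
    # q[i] = 0 for i <= m and q[m+i] = F(i*tau) - F((i-1)*tau) for i >= 1.
    q = [0.0] * (n + 1)
    for i in range(1, n - m + 1):
        q[m + i] = cdf(i * tau) - cdf((i - 1) * tau)
    d = [1.0] + [0.0] * n                    # d_0 = 1, d_1 = ... = d_m = 0
    for j in range(m + 1, n + 1):
        d[j] = sum(q[i] * d[j - i] for i in range(1, j + 1))
    return d


def time_average(cdf, H, tau, m, n_periods):
    """Average of A(t) from (33) over [0, n_periods * tau]."""
    d = delayed_renewal_weights(cdf, tau, m, n_periods)
    total = sum(d[j] * H((n_periods - j) * tau) for j in range(n_periods))
    return total / (n_periods * tau)


def limiting_average_34(mean_life, expected_ceiling, tau, m):
    """Evaluate (34): E(U) / (tau * (m + E(ceil(U/tau))))."""
    return mean_life / (tau * (m + expected_ceiling))


def exp_cdf(x):
    """Hypothetical lifetime distribution: exponential with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def exp_H(x):
    """H(x) = integral of exp(-u) over [0, x]."""
    return 1.0 - math.exp(-x)


tau, m = 1.0, 1
finite_avg = time_average(exp_cdf, exp_H, tau, m, 400)
# For an exponential lifetime, ceil(U/tau) is geometric with mean 1 / (1 - exp(-tau)).
limit_34 = limiting_average_34(1.0, 1.0 / (1.0 - math.exp(-tau)), tau, m)
```

The finite-horizon average approaches the limit only at rate \(1/N\), so a small discrepancy between `finite_avg` and `limit_34` is expected for any finite horizon.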

Example 3.1

Consider Model A in Sarkar and Sarkar (2000). The system availability \(A(t)\) is determined by (25) and (26).

Specifically, let \(\overline{F} (t) = {\text{e}}^{ - t}\) for all \(t \ge 0\), \(\nu = \ln 2\) and \(\tau = 2\ln 2\). According to (25)

$$\begin{aligned} A_{k} \equiv & A(2k\ln 2) = \frac{{[{\text{e}}^{ - 2\ln 2} - {\text{e}}^{ - \ln 2} ]^{k} (1 - {\text{e}}^{ - 2\ln 2} ) + {\text{e}}^{ - \ln 2} }}{{(1 - {\text{e}}^{ - 2\ln 2} ) + {\text{e}}^{ - \ln 2} }} \\ = & \frac{4}{5}\left[ {( - 1)^{k} \frac{3}{{4^{k + 1} }} + \frac{1}{2}} \right] = \frac{1}{5}\left[ {( - 1)^{k} \frac{3}{{4^{k} }} + 2} \right],\quad k = 1,2, \ldots . \\ \end{aligned}$$

From (26) it follows that

$$A(t) = \left\{ {\begin{array}{*{20}l} {{\text{e}}^{ - t} ,} \hfill & {{\text{if}}\quad 0 \le t < 3\ln 2;} \hfill \\ {2^{2k} {\text{e}}^{ - t} (2 - A_{k} ),} \hfill & {{\text{if}}\quad (2k + 1)\ln 2 \le t < (2k + 3)\ln 2,\;\;k = 1,2, \ldots } \hfill \\ \end{array} } \right.$$

Obviously \(A(t)\) does not converge as \(t \to \infty\) but the limiting average availability \(A_{av} (\infty )\) exists and is given by (28). We have

$$\begin{aligned} \phi = & \frac{{e^{ - \ln 2} }}{{{\text{e}}^{ - \ln 2} + (1 - {\text{e}}^{ - 2\ln 2} )}} = \frac{2}{5} \\ H(t) = & \int\limits_{0}^{t} {{\text{e}}^{ - x} {\text{d}}x = 1 - {\text{e}}^{ - t} } \\ \end{aligned}$$

and

$$\begin{aligned} A_{av} (\infty ) = & \tau^{ - 1} [\phi [H(\tau + \nu ) - H(\nu )] + (1 - \phi )H(\tau )] \\ = & (2\ln 2)^{ - 1} \left[ {\frac{2}{5}[{\text{e}}^{ - \ln 2} - {\text{e}}^{ - 3\ln 2} ] + \frac{3}{5}(1 - {\text{e}}^{ - \ln 4} )} \right] \\ = & \frac{3}{5}(2\ln 2)^{ - 1} \approx 0.4328. \\ \end{aligned}$$
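The numbers in Example 3.1 can be reproduced directly from (25), (26), and (28). The short Python sketch below is an illustration only; the function names are hypothetical, and the survival function is fixed to \(\bar{F}(t) = {\text{e}}^{ - t}\) as in the example.

```python
import math


def sf(t):
    """Survival function Fbar(t) = exp(-t) of Example 3.1 (equal to 1 for t <= 0)."""
    return math.exp(-t) if t > 0 else 1.0


def a_k(k, tau, nu):
    """A(k*tau) from (25)."""
    r = sf(tau) - sf(tau - nu)
    cdf_tau = 1.0 - sf(tau)
    return (r ** k * cdf_tau + sf(tau - nu)) / (cdf_tau + sf(tau - nu))


def availability(t, tau, nu):
    """A(t) from (26), assuming 0 <= nu <= tau."""
    if t <= tau:
        return sf(t)
    if t < tau + nu:
        return sf(tau) * sf(t - tau)
    k = int(math.floor((t - nu) / tau))      # k*tau + nu <= t < (k+1)*tau + nu
    ak = a_k(k, tau, nu)
    return ak * sf(t - k * tau) + (1.0 - ak) * sf(t - k * tau - nu)


def limiting_average(tau, nu):
    """A_av(infinity) from (28) with phi from (29) and H(t) = 1 - exp(-t)."""
    H = lambda x: 1.0 - math.exp(-x)
    phi = sf(tau - nu) / (sf(tau - nu) + 1.0 - sf(tau))
    return (phi * (H(tau + nu) - H(nu)) + (1.0 - phi) * H(tau)) / tau


tau, nu = 2 * math.log(2), math.log(2)
A1 = a_k(1, tau, nu)                          # (1/5) * (-3/4 + 2)
A2 = a_k(2, tau, nu)                          # (1/5) * (3/16 + 2)
A_at_4ln2 = availability(4 * math.log(2), tau, nu)
A_avg = limiting_average(tau, nu)             # (3/5) / (2 ln 2)
```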

Mi (2002) discussed a model similar to Model B with \(\nu = 0\) in Sarkar and Sarkar (2000), except that the system undergoes complete repair or replacement at time \(\eta \tau\) regardless of whether inspection finds it failed or unfailed, where \(\eta\) is either a positive integer or \(\eta = \infty\). Under this assumption, the limiting average availability of the system was derived as

$$A_{av} (\infty ) = \frac{{\int_{0}^{\eta \tau } \bar{F}(x){\text{d}}x}}{{\tau \sum\limits_{k = 0}^{\eta - 1} \bar{F}(k\tau )}}.$$
(35)

Note that, when \(\nu = 0\), the limiting average availability (32) under Model B in Sarkar and Sarkar (2000) is the special case \(\eta = \infty\) of (35) in Mi (2002). As a matter of fact, we have

$$\begin{aligned} {\mathbb{E}}\left( {\left\lceil {\frac{U}{\tau }} \right\rceil } \right) = & \sum\limits_{k = 1}^{\infty } \int_{(k - 1)\tau }^{k\tau } \left\lceil {\frac{x}{\tau }} \right\rceil {\text{d}}F(x) = \sum\limits_{k = 1}^{\infty } \int_{(k - 1)\tau }^{k\tau } k\,{\text{d}}F(x) \\ = & \sum\limits_{k = 1}^{\infty } k\left\{ {\bar{F}((k - 1)\tau ) - \bar{F}(k\tau )} \right\} = \sum\limits_{k = 0}^{\infty } \bar{F}(k\tau ). \\ \end{aligned}$$
(36)

Example 3.2

Let the system lifetime distribution be exponential, \(F(t) = 1 - \exp ( - \lambda t)\). The following figures show the limiting average availabilities obtained from (35) at fixed \(\lambda\) (left panel) and at fixed \(\tau\) (right panel).
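For readers who wish to recompute such values, (35) can be evaluated for the exponential lifetime in a few lines. The Python sketch below is an illustration; the function name and the parameter values \(\lambda = 0.2\), \(\tau = 3\) are assumptions made here and are not taken from Mi (2002).

```python
import math


def limiting_average_35(rate, tau, eta):
    """Evaluate (35) for F(t) = 1 - exp(-rate * t); eta is a positive integer."""
    # Numerator: integral of Fbar over [0, eta * tau].
    numerator = (1.0 - math.exp(-rate * eta * tau)) / rate
    # Denominator: tau * sum_{k=0}^{eta-1} Fbar(k * tau).
    denominator = tau * sum(math.exp(-rate * k * tau) for k in range(eta))
    return numerator / denominator


rate, tau = 0.2, 3.0                          # hypothetical parameter values
values = {eta: limiting_average_35(rate, tau, eta) for eta in (1, 5, 50)}
closed_form = (1.0 - math.exp(-rate * tau)) / (rate * tau)
```

In the exponential case the geometric sum in the denominator cancels the factor \(1 - {\text{e}}^{ - \lambda \eta \tau }\) in the numerator, so (35) reduces to \((1 - {\text{e}}^{ - \lambda \tau } )/(\lambda \tau )\) for every \(\eta\). This cancellation is specific to the exponential distribution.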

In the model studied by Sarkar and Sarkar (2001), the operating system is inspected periodically with inspection interval \(\tau > 0\) and is supported by an iid spare system kept in cold standby. At time \(t = \tau\), the spare takes over the operation regardless of whether the inspected system is found failed or unfailed. If the inspected system is found failed, it is sent for repair/replacement; otherwise, it is upgraded. Both repair and upgrade are perfect, meaning that the repaired or upgraded system becomes as good as new. At time \(t = 2\tau\), the inspection is performed again. If the repair/replacement or upgrade started at \(t = \tau\) has been completed before \(t = 2\tau\), the restored system takes over the operation, and the system inspected at \(t = 2\tau\) undergoes either repair/replacement or upgrade. Otherwise, the inspection is suspended, and the repaired or upgraded system takes over the operation at the first scheduled inspection time after its repair/replacement or upgrade is completed. Denote the random times needed for repair/replacement and for upgrade by \(Y_{r}\) and \(Y_{w}\), respectively. Because of this inspection policy, \(Y_{r}\) and \(Y_{w}\) can be assumed to have support \(\{ \tau ,2\tau , \ldots \}\). Furthermore, it is natural to assume that \(Y_{w} \mathop \le \limits^{st} Y_{r}\).

Let \(P(Y_{w} = i\tau ) = p_{i}\) and \(P(Y_{r} = i\tau ) = q_{i}\) for \(i \ge 1\). Under their model, the authors obtain the instantaneous system availability \(A(t)\) as follows:

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {\quad {\text{if}}\quad 0 \le t < \tau ;} \hfill \\ {\bar{F}(t - \tau ),} \hfill & {\quad {\text{if}}\quad \tau \le t < 2\tau ;} \hfill \\ {A_{1} (t - \tau ),} \hfill & {\quad {\text{if}}\quad 2\tau \le t < \infty .} \hfill \\ \end{array} } \right.$$
(37)

where \(A_{1} (s) = {\mathbb{P}}(\xi (s + \tau ) = 1)\) has the form

$$A_{1} (s) = \sum\limits_{j = 0}^{k} w_{kj} \bar{F}(s - j\tau ),\quad {\text{for}}\;\;k\tau \le s < (k + 1)\tau ,\;\;k = 0,1, \ldots$$
(38)

with \(w_{kj}\) determined by (2.1a), (2.1b) of Sarkar and Sarkar (2001). Based on the expression of \(A(t)\), the limiting average availability is also obtained as

$$A_{\text{av}} (\infty ) = \frac{{\sum\nolimits_{i = 0}^{\infty } {(1 - R_{i} )[H((i + 1)\tau ) - H(i\tau )]} }}{{\tau \sum\nolimits_{i = 0}^{\infty } {(1 - R_{i} )} }}$$
(39)

where \(R_{0} = 0\),

$$R_{i} = \sum\limits_{k = 1}^{i} [\bar{F}(\tau )p_{k} + F(\tau )q_{k} ],\quad \forall i \ge 1.$$
(40)

and

$$H(t) = \int\limits_{0}^{t} {\bar{F}(x){\text{d}}x}$$
(41)
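Formula (39) is simple to evaluate once the distributions of \(Y_{w}\) and \(Y_{r}\) are specified. The Python sketch below is an illustration under assumptions made here: \(Y_{w}\) and \(Y_{r}\) have finite support, so that \(R_{i} = 1\) beyond the largest support point and the sums in (39) terminate; the function name, the exponential lifetime, and the numerical values are hypothetical.

```python
import math


def limiting_average_39(sf, H, tau, p, q):
    """Evaluate (39) with R_i from (40).

    p[k-1] = P(Y_w = k*tau) and q[k-1] = P(Y_r = k*tau) are lists of equal
    length n, each summing to 1, so R_i = 1 for i >= n and only the terms
    i = 0, ..., n-1 contribute to the sums in (39).
    """
    n = len(p)
    surv_tau = sf(tau)
    fail_tau = 1.0 - surv_tau
    num = den = 0.0
    R = 0.0                                   # R_0 = 0
    for i in range(n):
        if i >= 1:
            R += surv_tau * p[i - 1] + fail_tau * q[i - 1]
        num += (1.0 - R) * (H((i + 1) * tau) - H(i * tau))
        den += 1.0 - R
    return num / (tau * den)


# Hypothetical illustration: exponential lifetime with rate 1 and tau = 1.
exp_sf = lambda x: math.exp(-x) if x > 0 else 1.0
exp_H = lambda x: 1.0 - math.exp(-x)

# Both upgrade and repair finish within one inspection interval.
fast = limiting_average_39(exp_sf, exp_H, 1.0, [1.0], [1.0])
# Upgrade takes tau, repair takes 2*tau.
slow_repair = limiting_average_39(exp_sf, exp_H, 1.0, [1.0, 0.0], [0.0, 1.0])
```

When both \(Y_{w}\) and \(Y_{r}\) equal \(\tau\) with probability one, only the term \(i = 0\) remains and (39) reduces to \(H(\tau )/\tau\).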

Cui and Xie (2001) investigated two models, Model A and Model B, similar to those of Sarkar and Sarkar (2000), but allowed a perfect repair or a replacement to take a random time \(Y\) with distribution \(G(y)\) and density function \(g(y)\). The special case in which \(Y\) is a constant \(\nu\), as assumed in Sarkar and Sarkar (2000), is also discussed in Cui and Xie (2001).

Using a random walk approach, Cui and Xie (2001) established the relationship between a random walk in the plane and the periodically inspected system. Based on this relationship, both explicit and recursive formulas for \(A(t)\) were given for the two models in the case of a constant perfect repair or replacement time. For a random time \(Y\), recursive formulas for \(A(t)\) were obtained for the two models as well. All these expressions are complicated, so readers are referred to their paper for details.

The model proposed in Biswas et al. (2003) assumes that, for \(1 \le i \le h\), where \(h \ge 1\) is a given integer, when the \(i\)th system failure is detected by inspection, the failed system undergoes an incomplete repair, so the lifetime distribution of the repaired system may differ from \(F\), the lifetime distribution of the original system. When the \((h + 1)\)th system failure is detected, the failed system is repaired perfectly or replaced by one whose lifetime is iid as that of the original system. Here, the times needed for incomplete repair, perfect repair, and replacement can be either constants or random variables. It is also assumed that the repaired system is restored to operation not immediately but at the next scheduled inspection time. As a consequence, the supports of the random perfect repair, replacement, and incomplete repair times can be restricted to the set \(\{ \tau ,2\tau , \ldots \}\), and constant repair/replacement times can be restricted to multiples of \(\tau\).

Under these assumptions, both the instantaneous system availability \(A(t)\) and the limiting average availability are obtained when there is only a single incomplete repair, i.e., \(h = 1\). The expression of \(A(t)\) is complicated, so we display only the limiting average availability \(A_{\text{av}} (\infty )\). In the deterministic case, that is, when the repair/replacement times are constant, \(A_{\text{av}} (\infty )\) is given as

$$A_{\text{av}} (\infty ) = \frac{{{\mathbb{E}}(U_{1} + U_{2} )}}{{\tau \left[ {{\mathbb{E}}\left( {\left\lceil {\frac{{U_{1} }}{\tau }} \right\rceil } \right) + {\mathbb{E}}\left( {\left\lceil {\frac{{U_{2} }}{\tau }} \right\rceil } \right) + m_{1} + m_{2} } \right]}},$$
(42)

where \(U_{1}\) is the lifetime of the original system and \(U_{2}\) is the lifetime of the system upon completion of the first incomplete repair, constants \(m_{1} \tau\) and \(m_{2} \tau\) are the required times for perfect repair (or replacement) and incomplete repair, respectively.
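To make (42) concrete, the expectations \({\mathbb{E}}(\left\lceil {U_{i} /\tau } \right\rceil )\) can be computed from the identity \({\mathbb{E}}(\left\lceil {U/\tau } \right\rceil ) = \sum\nolimits_{k = 0}^{\infty } \bar{F}(k\tau )\) (cf. (36) and (51)). The Python sketch below is an illustration only: the function names, the truncation tolerance, and the choice of exponential lifetimes with rates 1 and 2 for \(U_{1}\) and \(U_{2}\) are assumptions made here and do not come from Biswas et al. (2003).

```python
import math


def expected_ceiling(sf, tau, tol=1e-15, max_terms=10**6):
    """E(ceil(U/tau)) = sum_{k>=0} Fbar(k*tau), truncated once a term drops below tol."""
    total = 0.0
    for k in range(max_terms):
        term = sf(k * tau)
        total += term
        if term < tol:
            break
    return total


def limiting_average_42(mean1, mean2, sf1, sf2, tau, m1, m2):
    """Evaluate (42) for constant repair times m1*tau (perfect) and m2*tau (incomplete)."""
    denom = tau * (expected_ceiling(sf1, tau) + expected_ceiling(sf2, tau) + m1 + m2)
    return (mean1 + mean2) / denom


# Hypothetical lifetimes: U1 ~ exponential(1), U2 ~ exponential(2).
sf1 = lambda x: math.exp(-x) if x > 0 else 1.0
sf2 = lambda x: math.exp(-2.0 * x) if x > 0 else 1.0

tau, m1, m2 = 0.5, 2, 1
A_42 = limiting_average_42(1.0, 0.5, sf1, sf2, tau, m1, m2)

# With identical lifetimes and m1 = m2 = m, (42) reduces to (34).
A_same = limiting_average_42(1.0, 1.0, sf1, sf1, tau, 2, 2)
A_34 = 1.0 / (tau * (2 + expected_ceiling(sf1, tau)))
```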

In the stochastic case, the repair/replacement times are random; denote the times required for perfect repair and incomplete repair by the random variables \(D_{1}\) and \(D_{2}\), respectively. The expressions for \(A(t)\) and \(A_{\text{av}} (\infty )\) in this case were also derived in Biswas et al. (2003). In particular, the limiting average availability is

$$A_{\text{av}} (\infty ) = \frac{{{\mathbb{E}}(U_{1} + U_{2} )}}{{\tau \left[ {{\mathbb{E}}\left( {\left\lceil {\frac{{U_{1} }}{\tau }} \right\rceil } \right) + {\mathbb{E}}\left( {\left\lceil {\frac{{U_{2} }}{\tau }} \right\rceil } \right)} \right] + {\mathbb{E}}(D_{1} ) + {\mathbb{E}}(D_{2} )}}.$$
(43)

Clearly, if \({\mathbb{P}}(D_{i} = m_{i} \tau ) = 1,\quad i = 1,2\), for two integers \(m_{1}\) and \(m_{2}\), then \({\mathbb{E}}(D_{i} ) = m_{i} \tau\) and (43) reduces to (42).

Note that the expression of \(A_{\text{av}} (\infty )\) in (42) becomes the one in (34) of Sarkar and Sarkar (2000) if \(U_{1} \mathop = \limits^{st} U_{2}\) and \(m_{1} = m_{2}\), which is equivalent to the case \(h = 0\).

Cui and Xie (2005) assumed the following. A failed system found by inspection is completely repaired or replaced. The time required for a perfect repair or replacement is either a constant \(\nu \ge 0\) or a random variable \(Y\) with distribution \(G(y)\) and density function \(g(y)\). The repaired or replaced system is restored to operation immediately, i.e., restoration itself takes no time, and periodic inspection is resumed with the completion time of the perfect repair or replacement treated as a new starting point. In their Model A, it is also assumed that a system found unfailed by inspection is upgraded to be as good as new, whereas in Model B an unfailed system continues operation without any intervention.

Under Model A with constant repair/replacement time \(\nu\), the instantaneous system availability \(A(t)\) is determined recursively as follows

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {\quad {\text{if}}\quad 0 \le t \le \tau } \hfill \\ {\bar{F}(\tau )\bar{F}(t - \tau ),} \hfill & {\quad {\text{if}}\quad \tau < t < \tau + \nu } \hfill \\ {\bar{F}(\tau )A(t - \tau ) + F(\tau )A(t - \tau - \nu ),} \hfill & {\quad {\text{if}}\quad t \ge \tau + \nu } \hfill \\ \end{array} } \right.$$
(44)

From this recursive equation, it was shown that the limit of \(A(t)\) as \(t \to \infty\) does not exist if \(\nu = 0\), or if \(\nu > 0\) and \(\tau /\nu\) is a rational number; that is, the steady-state availability does not exist in these cases. If the repair/replacement time is random, \(A(t)\) can be determined by two different recursive equations:

$$\begin{aligned} A(t) = & (\bar{F}(\tau ))^{{\left\lceil {t/\tau } \right\rceil - 1}} \bar{F}\left( {t - \left( {\left\lceil {\frac{t}{\tau }} \right\rceil - 1} \right)\tau } \right) \\ & + \sum\limits_{i = 1}^{{\left\lfloor {t/\tau } \right\rfloor }} (\bar{F}(\tau ))^{i - 1} F(\tau )\int\limits_{0}^{t - i\tau } {A(t - y - i\tau )g(y){\text{d}}y} \\ \end{aligned}$$
(45)

and

$$A(t) = \bar{F}(\tau )A(t - \tau ) + F(\tau )\int\limits_{0}^{t - \tau } {A(t - y - \tau )g(y){\text{d}}y}$$
(46)

Moreover, in this case the steady-state availability and consequently the limiting average availability exist and are given as

$$A(\infty ) = A_{\text{av}} (\infty ) = \frac{{\tau - \int_{0}^{\tau } F(x){\text{d}}x}}{{\tau + F(\tau )\int_{0}^{\infty } \bar{G}(y){\text{d}}y}}.$$
(47)

For Model B with constant repair/replacement time \(\nu\), the instantaneous availability \(A(t)\) is given in a recursive way

$$A(t) = \bar{F}(t) + \sum\limits_{i = 1}^{{\left\lfloor {(t - \nu )/\tau } \right\rfloor }} A(t - i\tau - \nu )[F(i\tau ) - F((i - 1)\tau )]$$
(48)

On the other hand when the repair/replacement time \(Y\) has density function \(g(y)\) then \(A(t)\) satisfies equation

$$A(t) = \bar{F}(t) + \sum\limits_{i = 1}^{{\left\lfloor {t/\tau } \right\rfloor }} [F(i\tau ) - F((i - 1)\tau )]\int\limits_{0}^{t - i\tau } {A(t - y - i\tau )g(y){\text{d}}y}$$
(49)

The steady-state availability and consequently the limiting average availability exist and are given by

$$A(\infty ) = A_{\text{av}} (\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(x){\text{d}}x}}{{\tau \sum\nolimits_{i = 1}^{\infty } {i[F(i\tau ) - F((i - 1)\tau )] + \int_{0}^{\infty } \bar{G}(y){\text{d}}y} }}$$
(50)

It is interesting to note that if we let \(\eta = \infty\) in Mi (2002) and \(Y = 0\) in Cui and Xie (2005), and observe that

$$\sum\limits_{i = 1}^{\infty } i[F(i\tau ) - F((i - 1)\tau )] = \sum\limits_{i = 1}^{\infty } i[\bar{F}((i - 1)\tau ) - \bar{F}(i\tau )] = \sum\limits_{k = 0}^{\infty } \bar{F}(k\tau )$$
(51)

then the results (35) in Mi (2002) and (50) in Cui and Xie (2005) coincide.

Example 3.3

Assume that the system lifetime has the Weibull distribution \(F(t) = 1 - \exp ( - (t/\lambda )^{k} )\) with mean \(\lambda \Gamma (1 + 1/k)\), and that \(G(t) = 1 - \exp ( - \beta t)\). The following figures show the limiting average availabilities given by Eqs. (47) and (50) for different values of the parameters \(\tau ,\lambda\) and \(\beta\) when \(k = 2\).
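The quantities in this example can be recomputed with elementary numerical tools. The Python sketch below is an illustration under assumptions made here: the function names, the Simpson rule used for \(\int_{0}^{\tau } \bar{F}(x){\text{d}}x\), the truncation tolerance for the infinite sum, and the parameter values are all hypothetical. It uses \(\tau - \int_{0}^{\tau } F(x){\text{d}}x = \int_{0}^{\tau } \bar{F}(x){\text{d}}x\), \(\int_{0}^{\infty } \bar{G}(y){\text{d}}y = 1/\beta\), and the identity (51).

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def weibull_sf(t, lam, k):
    """Weibull survival function exp(-(t/lam)^k), equal to 1 for t <= 0."""
    return math.exp(-((t / lam) ** k)) if t > 0 else 1.0


def availability_47(tau, lam, k, beta):
    """Model A, (47): integral_0^tau Fbar / (tau + F(tau) * E(Y)) with E(Y) = 1/beta."""
    up = simpson(lambda x: weibull_sf(x, lam, k), 0.0, tau)
    return up / (tau + (1.0 - weibull_sf(tau, lam, k)) / beta)


def availability_50(tau, lam, k, beta, tol=1e-15):
    """Model B, (50), with the sum rewritten as sum_{j>=0} Fbar(j*tau) via (51)."""
    mean_life = lam * math.gamma(1.0 + 1.0 / k)
    total, j = 0.0, 0
    while True:
        term = weibull_sf(j * tau, lam, k)
        total += term
        if term < tol:
            break
        j += 1
    return mean_life / (tau * total + 1.0 / beta)


# Hypothetical parameter values with shape k = 2.
A_model_a = availability_47(tau=1.5, lam=1.0, k=2, beta=2.0)
A_model_b = availability_50(tau=1.5, lam=1.0, k=2, beta=2.0)
```

As \(\tau \to 0\) the value returned by `availability_50` approaches \({\mathbb{E}}(U)/({\mathbb{E}}(U) + 1/\beta )\), the familiar steady-state availability of a continuously monitored system, which is a convenient plausibility check.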

Unlike most previous work on system availability, Tang et al. (2013) considered both calendar-based and age-based inspection policies. Their study takes into account not only the downtime due to repair or replacement but also the downtime due to inspection. Only a few earlier works considered both nonnegligible times, for instance, Barroeta (2005), Jardine and Tsang (2006), Jiang and Jardine (2006), and Pak et al. (2006).

In the following, \(\nu_{w}\) denotes the constant downtime due to inspection when the system is found unfailed, and \(\nu_{f}\) denotes the constant total downtime when a failed system is detected, including the downtime due to its repair/replacement.

Model A studied in Tang et al. (2013) assumes that a system found unfailed by inspection is upgraded or modified to be as good as new, whereas Model B assumes that an unfailed system is put back into operation as it is, i.e., it remains as good as it was before the inspection.

For Model A, the instantaneous system availability \(A(t)\) with a calendar-based inspection policy is given recursively by

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {{\text{if}}\quad 0 \le t \le \tau ;\quad } \hfill \\ {0,} \hfill & {{\text{if}}\quad k\tau < t < k\tau + \nu_{w} ;\quad } \hfill \\ {\bar{F}(t - k\tau - \nu_{w} )A(k\tau ),} \hfill & {{\text{if}}\quad k\tau + \nu_{w} \le t\; < k\tau + \nu_{f} ;\quad } \hfill \\ {\begin{array}{*{20}l} {\bar{F}(t - k\tau - \nu_{w} )A(k\tau )\quad } \hfill \\ {\quad + \bar{F}(t - k\tau - \nu_{f} )(1 - A(k\tau ))} \hfill \\ \end{array} ,} \hfill & {{\text{if}}\quad k\tau + \nu_{f} \le t\; < (k + 1)\tau .\quad } \hfill \\ \end{array} } \right.$$
(52)

for \(k = 1,2, \ldots\), and the limiting average availability is given as

$$A_{av} (\infty ) = \tau^{ - 1} \left[ {\phi \int\limits_{0}^{{\tau - \nu_{w} }} {\bar{F}(x){\text{d}}x} + (1 - \phi )\int\limits_{0}^{{\tau - \nu_{f} }} {\bar{F}(x){\text{d}}x} } \right],$$
(53)

where

$$\phi = \mathop {\lim }\limits_{k \to \infty } A(k\tau ) = \frac{{\bar{F}(\tau - \nu_{f} )}}{{F(\tau - \nu_{w} ) + \bar{F}(\tau - \nu_{f} )}}.$$
(54)
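The limit in (54) can be illustrated by iterating (52) at the inspection times. Evaluating the last branch of (52) at the right end of its interval gives \(A((k + 1)\tau ) = \bar{F}(\tau - \nu_{w} )A(k\tau ) + \bar{F}(\tau - \nu_{f} )(1 - A(k\tau ))\), whose fixed point is \(\phi\). The Python sketch below is an illustration under assumptions made here: \(\nu_{w} \le \nu_{f} \le \tau\), the end-point reading of (52) just described, an exponential lifetime, and hypothetical function names and parameter values.

```python
import math


def inspection_availabilities(sf, tau, nu_w, nu_f, n):
    """Return [A(tau), A(2*tau), ..., A(n*tau)] implied by (52).

    Assumes nu_w <= nu_f <= tau and takes A((k+1)*tau) as the value of the
    last branch of (52) at the right end of its interval.
    """
    a = [sf(tau)]                              # A(tau) = Fbar(tau)
    for _ in range(n - 1):
        prev = a[-1]
        a.append(sf(tau - nu_w) * prev + sf(tau - nu_f) * (1.0 - prev))
    return a


def phi_54(sf, tau, nu_w, nu_f):
    """phi from (54)."""
    return sf(tau - nu_f) / ((1.0 - sf(tau - nu_w)) + sf(tau - nu_f))


# Hypothetical illustration: exponential lifetime with rate 0.05.
exp_sf = lambda x: math.exp(-0.05 * x) if x > 0 else 1.0
tau, nu_w, nu_f = 10.0, 1.0, 2.0

sequence = inspection_availabilities(exp_sf, tau, nu_w, nu_f, 50)
phi = phi_54(exp_sf, tau, nu_w, nu_f)
```

The iteration is a contraction with factor \(\bar{F}(\tau - \nu_{w} ) - \bar{F}(\tau - \nu_{f} )\), so the sequence \(A(k\tau )\) settles at \(\phi\) after a few inspections in this example.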

Under Model A if the age-based inspection policy is applied, then \(A(t)\) is recursively determined by the following equation:

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {\quad {\text{if}}\quad 0 \le t \le \tau ;} \hfill \\ {0,} \hfill & {\quad {\text{if}}\quad \tau < t < \tau + \nu_{w} ;} \hfill \\ {\bar{F}(\tau )\bar{F}(t - \tau - \nu_{w} ),} \hfill & {\quad {\text{if}}\quad \tau + \nu_{w} \le t < \tau + \nu_{f} ;} \hfill \\ {\bar{F}(\tau )A(t - \tau - \nu_{w} ) + F(\tau )A(t - \tau - \nu_{f} ),} \hfill & {\quad {\text{if}}\quad t \ge \tau + \nu_{f} ,} \hfill \\ \end{array} } \right.$$
(55)

and the limiting average availability is

$$A_{av} (\infty ) = \frac{{\int_{0}^{\tau } \bar{F}(x){\text{d}}x}}{{\tau + \nu_{w} \bar{F}(\tau ) + \nu_{f} F(\tau )}}$$
(56)

For Model B, the instantaneous system availability \(A(t)\) with a calendar-based inspection policy is determined recursively as follows:

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t),} \hfill & {{\text{if}}\quad 0 \le t < \tau ;} \hfill \\ {0,} \hfill & {{\text{if}}\quad k\tau < t < k\tau + \nu_{w} ;} \hfill \\ {\bar{F}(t - k\nu_{w} ) + \sum\limits_{i = 1}^{k - 1} B(t - i\tau - \nu_{f} )p_{i} ,} \hfill & {{\text{if}}\quad k\tau + \nu_{w} \le t < k\tau + \nu_{f} ;} \hfill \\ {\bar{F}(t - k\nu_{w} ) + \sum\limits_{i = 1}^{k} B(t - i\tau - \nu_{f} )p_{i} ,} \hfill & {{\text{if}}\quad k\tau + \nu_{f} \le t \le (k + 1)\tau .} \hfill \\ {} \hfill & {} \hfill \\ \end{array} } \right.$$
(57)

The equations determining \(B(s)\) in the above expression can be found in Tang et al. (2013) and are omitted here because they are lengthy. The limiting average availability in this case is obtained as

$$A_{av} (\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(x){\text{d}}x}}{{\tau \sum\nolimits_{k = 0}^{\infty } {\bar{F}(s_{k} )} }},$$
(58)

where \(s_{0} = 0\), and \(s_{k} = k(\tau - \nu_{w} ) + \nu_{w} - \nu_{f} ,\forall k \ge 1\).

If the age-based inspection policy is applied under Model B, then \(A(t)\) is determined recursively by

$$A(t) = \left\{ {\begin{array}{*{20}l} {\bar{F}(t - n(t)\nu_{w} ) + \sum\limits_{i = 1}^{m(t)} A(t - t_{i} - \nu_{f} )p_{i} ,} \hfill & {\quad {\text{if}}\quad 0 \le t \le n(t)(\tau + \nu_{w} ) + \tau ;} \hfill \\ {\sum\limits_{i = 1}^{m(t)} A(t - t_{i} - \nu_{f} )p_{i} ,} \hfill & {\quad {\text{if}}\quad t > n(t)(\tau + \nu_{w} ) + \tau } \hfill \\ \end{array} } \right.$$
(59)

where

$$m(t) = \left\lfloor {\frac{{t - \nu_{f} + \nu_{w} }}{{\tau + \nu_{w} }}} \right\rfloor ,\quad n(t) = \left\lfloor {\frac{t}{{\tau + \nu_{w} }}} \right\rfloor$$
(60)

and

$$t_{i} = (i - 1)(\tau + \nu_{w} ) + \tau ,\quad p_{i} = \bar{F}((i - 1)\tau ) - \bar{F}(i\tau ).$$
(61)

Under the same assumptions the limiting average availability is obtained as

$$A_{av} (\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(x){\text{d}}x}}{{(\tau + \nu_{w} )\sum\nolimits_{k = 0}^{\infty } {\bar{F}(k\tau ) + \nu_{f} - \nu_{w} } }}$$
(62)

Note that when \(\nu_{w} = 0\) it holds that

$$A_{av} (\infty ) = \frac{{\int_{0}^{\infty } \bar{F}(x){\text{d}}x}}{{\tau \sum\nolimits_{k = 0}^{\infty } {\bar{F}(k\tau ) + \nu_{f} } }}$$

and it coincides with the expression (34) of \(A_{av} (\infty )\) in Sarkar and Sarkar (2000) with \(\nu = \nu_{f}\).

Example 3.4

Suppose that the system lifetime has the exponential distribution \(F(t) = 1 - \exp ( - \alpha t)\). The following figures show the limiting average availabilities determined by Eqs. (53), (58) and (56), (62), respectively, when \(\alpha = 0.05,\nu_{w} = 1\) and \(\nu_{f} = 2\). The left panel corresponds to the calendar-based inspection policy, and the right panel corresponds to the age-based inspection policy.
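The four limiting average availabilities of this example can be evaluated with the Python sketch below. It is an illustration only: the function name, the restriction \(0 \le \nu_{w} \le \nu_{f} < \tau\), the truncation tolerance for the sum in (58), and the value \(\tau = 10\) are assumptions made here (the example itself varies \(\tau\)).

```python
import math


def tang_limiting_averages(alpha, tau, nu_w, nu_f, tol=1e-15):
    """Evaluate (53), (56), (58), (62) for F(t) = 1 - exp(-alpha * t)."""
    if not (0 <= nu_w <= nu_f < tau):
        raise ValueError("this sketch assumes 0 <= nu_w <= nu_f < tau")
    sf = lambda x: math.exp(-alpha * x) if x > 0 else 1.0
    H = lambda x: (1.0 - math.exp(-alpha * x)) / alpha   # integral of Fbar on [0, x]
    mean_life = 1.0 / alpha

    # Model A, calendar-based: (53) with phi from (54).
    phi = sf(tau - nu_f) / ((1.0 - sf(tau - nu_w)) + sf(tau - nu_f))
    a_cal = (phi * H(tau - nu_w) + (1.0 - phi) * H(tau - nu_f)) / tau

    # Model A, age-based: (56).
    a_age = H(tau) / (tau + nu_w * sf(tau) + nu_f * (1.0 - sf(tau)))

    # Model B, calendar-based: (58), s_0 = 0, s_k = k(tau - nu_w) + nu_w - nu_f.
    total, k = 1.0, 1
    while True:
        term = sf(k * (tau - nu_w) + nu_w - nu_f)
        total += term
        if term < tol:
            break
        k += 1
    b_cal = mean_life / (tau * total)

    # Model B, age-based: (62), with sum_{k>=0} Fbar(k*tau) in closed form.
    geo = 1.0 / (1.0 - math.exp(-alpha * tau))
    b_age = mean_life / ((tau + nu_w) * geo + nu_f - nu_w)

    return {"A_calendar": a_cal, "A_age": a_age, "B_calendar": b_cal, "B_age": b_age}


example = tang_limiting_averages(alpha=0.05, tau=10.0, nu_w=1.0, nu_f=2.0)
no_downtime = tang_limiting_averages(alpha=0.05, tau=10.0, nu_w=0.0, nu_f=0.0)
```

With exponential lifetimes the Model A and Model B values coincide under each policy in this sketch, which is consistent with the memoryless property; for other lifetime distributions the two models need not agree. When \(\nu_{w} = \nu_{f} = 0\), all four expressions reduce to \((1 - {\text{e}}^{ - \alpha \tau } )/(\alpha \tau )\).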

4 Other Works on System Availability

The previous two sections address the availability of systems without specifying their configurations. In the field of reliability, the \(k\)-out-of-\(n\) system is of particular importance since it is widely used in practice. Recent research on the availability of \(k\)-out-of-\(n\) systems includes, for example, Fawzi (1991), Frostig (2002), De Smidt-Destombes et al. (2004, 2006, 2007, 2009), Li et al. (2006), Yam et al. (2003), and Zhang et al. (2000, 2006), among others.

There are also many studies on the availability of various other systems appearing in industry, for instance, Berrade (2012), Chung (1994), Dhillon (1993), Dhillon and Yang (1992), Klutke et al. (1996, 2002), Lau et al. (2004), Mishra (2013), Pascual (2011), Pham-Gia and Turkkan (1999), Vaurio (1997, 1999), Wang et al. (2006), Wang and Chen (2009), and the works cited therein. These works consider warm standby, imperfect switch, reboot delay, common cause failures, and random deterioration. In addition, Mi (2006a, b) introduced the concept of pseudo availability, which differs from the traditional availabilities in that once the system is in the ‘up’ state, it remains there forever.

It is worth mentioning that, in addition to \(A(t)\), the interval availability and the steady-state interval availability are probably as important as the instantaneous system availability, or even more important in certain situations. For any \(w \ge 0\), the interval availability is defined as \(A_{w} (t) = P(\xi (s) = 1,\; t \le s \le t + w)\), that is, the probability that the system is functioning throughout the interval \([t,t + w]\). Of course, if \(w = 0\) then \(A_{w} (t)\) becomes the instantaneous system availability \(A(t)\). Mi (1999) and Huang and Mi (2013) discussed interval availability. We think it would be worthwhile to study the models mentioned above further in order to derive more results on interval availability.