1 Introduction

Analysis of interdependence among random variables and processes is abundantly available in the literature. The purpose of this paper is to study the reliability of systems with two or more interdependent components. Krishnamoorthy (2020) introduced the analysis of interdependence through a semi-Markov approach. He assumes that the evolution is according to a multi-dimensional semi-Markov process; the dimension depends on the number of processes involved. If there are n processes in our study, a few of them could be interdependent in groups; between distinct groups there is no interdependence. A few may remain neutral, meaning that they are independent of all other processes involved. More specifically, we have a finite number of processes forming distinct classes; members of the same class are interdependent, whereas those from distinct classes are not dependent on each other. The analysis of such systems is much more complex. Therefore we consider the case of interdependence among all the processes involved. Though interdependence of processes has been analysed extensively in the literature, the type of analysis we employ in this paper is a very recent introduction, as mentioned in Krishnamoorthy (2020).

First we give an overview of the work done very recently on system reliability with interdependent components. A brief review of the work done on different kinds of interdependent models in queueing theory can be found in Krishnamoorthy and Viswanath (2021). Some details on \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) systems with/without repair can be found in Krishnamoorthy et al. (1999) and in the references therein.

Bian et al. (2021) examine two reliability models for multi-component systems whose normal functioning is affected by dependent competing failure processes. All components in the systems are subject to two types of failures:

  1. Soft failure owing to internal degradation;

  2. Hard failure due to external shocks.

A soft failure happens when the total degradation of a component exceeds its soft failure threshold, whereas a hard failure occurs when a single shock load or the accumulated shock load exceeds its hard failure threshold. A component fails when a soft failure or a hard failure happens, whichever occurs first. The external shock arrivals are governed by a non-homogeneous Poisson process and impact all components of the system simultaneously. The soft and hard failure processes are dependent because an external shock can not only break down components stochastically but also accelerate their degradation. The analytic forms of the reliability functions for systems under the cumulative shock model and the extreme shock model are derived. The random point method in Monte Carlo simulation and vectorized programs in Matlab are applied to calculate the multiple integrals in the expressions of the reliability functions. Finally, a case study on Micro-Electro-Mechanical Systems with multiple distinct components is provided to illustrate the proposed model.

Bian et al. (1992) investigate the extreme and \( \delta \) shock models in which systems are subject to dependent hard and soft failure processes. Under the extreme shock model, a hard failure occurs if the magnitude of a single shock surpasses a critical level, while under the \( \delta \) shock model, a hard failure occurs if the time between two successive shocks is smaller than a threshold. Under both shock models, a soft failure occurs if the total quantum of degradation surpasses a given soft failure threshold. The hard and soft failure processes are interdependent because an external shock brings an abrupt increment in the degradation path of the system and, on the other hand, the total amount of degradation affects the hard failure threshold of the system. The system fails if a hard failure or a soft failure occurs, whichever happens first. The reliability expressions of the system subject to the two shocks are derived explicitly, and some reliability indices of the system are calculated. A case study of the sliding spool is provided to illustrate the proposed model.

Shi et al. (2020) consider reliability evaluation in the absence of failure data, using inaccurate empirical data given by experts: these data are regarded as uncertain variables, and the parameters in the uncertainty distribution function are also uncertain variables. The paper studies an extreme shock model with dependent competing failures, in which both internal natural degradation and external shocks can cause system failure, and an external shock causes a sudden increase in the amount of degradation. The degradation process is a linear uncertain process, and the external shocks are described by an uncertain renewal reward process. The reliability and the mean time to failure of the system are calculated by employing uncertainty theory. Using micro-electro-mechanical systems (MEMS) as an example, the sensitivity of the system reliability is simulated; the reliability of the system under uncertain parameters and under constants is compared, as is the reliability under the dependent and the independent competing failure models.

Shuyuan and Zhifang (2021) propose an effective dependent competing failure model for systems subject to shocks. Shocks of the same magnitude can bring larger sudden degradation increments when the system is more degraded, an effect ignored in most existing research. To address this, a time-dependent rate is included for the sudden degradation increments caused by shocks; this time-varying rate reflects the fact that system degradation is closely related to operation time. Two dependent competing failure processes, i.e., soft failure and hard failure, are involved in the model. The distribution of the total sudden degradation increments is deduced, and its accuracy is verified by Monte Carlo simulation. The developed reliability model is illustrated by the analysis of a micro-electro-mechanical system, and a sensitivity analysis of the important parameters is performed. The results show that the proposed time-varying model effectively captures the impact of system degradation on sudden degradation increments, so that the change in sudden degradation increments is well reflected under different system performances. These advantages make the reliability model more practical and help in deriving more effective maintenance policies.

With this brief survey, we proceed to the theme of this paper.

The salient features of this paper are:

  • It introduces a new approach to the analysis of reliability of systems with components (units) which are interdependent.

  • Distributions of the time to failure of serial, parallel and the more general \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) systems with interdependent units are derived.

  • Theoretical comparisons with the independent systems are provided.

  • It opens a new direction of research in reliability analysis.

Notations and abbreviations used:

MC: Markov chain

CTMC: Continuous time Markov chain

rv: Random variable

The section-wise breakup of this paper is as follows: in section 2, the mathematical formulation of the problem is provided. Distributions of the time to failure of distinct interdependent systems are derived in section 3, where comparisons with the corresponding independent systems are also provided. Finally, in the concluding section, we provide an outline of the future course of work.

2 Description and Mathematical Formulation

In the sequel we need the Phase type (PH) distribution, introduced by Marcel F. Neuts in the 1970s (see Neuts 1981). Consider a MC on the state space \(\{1,\ldots , m, m+1\} \) with the first m states transient and the state \( m+1 \) absorbing. The initial probability vector of this MC is \( (\alpha _1,\ldots , \alpha _m, \alpha _{m+1})= (\pmb {\alpha }, \alpha _{m+1}) \) and the one-step transition probability matrix is \( P = (p_{ij}) \). In most modeling problems \( \alpha _{m+1} \) is taken as zero, to avoid instantaneous absorption (this assumption also ensures that in phase type processes only a finite number of events takes place with positive probability in a finite time interval). Imagine a particle, starting from one of the transient states according to the initial probability vector, moving through the transient states and finally getting absorbed in state \( m+1 \). The sojourn time random variable in a transient state j depends on j as well as on the state to be visited next, say k (including \( k = m + 1 \)). This sojourn time is assumed to be exponentially distributed with parameter \( \mu _{jk} \). In general the sojourn time distribution may be arbitrary; such an evolution is referred to as a semi-Markov process (see Ross 1992, pp. 86–87). In this paper we restrict the sojourn times to be exponentially distributed; the general case will be investigated in a follow-up paper. We may regard \( p_{jk} \) and \( \mu _{jk} \) as connected via \( \mu _{jk} = \mu p_{jk} \) for some positive \( \mu \).

The distribution of the time T spent by the particle in the transient states before it escapes to \( m+1 \) is referred to as a Phase type distribution with representation \( (\pmb {\alpha },S) \) of order m. To obtain it, let \( p_j(t) \) be the probability that the particle is in the transient state j at time t, so that \( P(T>t) = \sum _{j=1}^{m} p_j(t) \). The row vector \( P(t) = (p_1(t),\ldots , p_m(t)) \) satisfies the matrix differential equation \( P'(t)= P(t)S \) with initial condition \( P(0) = \pmb {\alpha } \), where \( P'(t) = (p'_1(t),\ldots , p'_m(t)) \) is the vector of component-wise derivatives and S is the matrix of coefficients on the right hand side of the difference-differential equations satisfied by the \( p_j(t) \)'s (these coefficients are precisely the transition rates among the transient states). The solution is \( P(t) = \pmb {\alpha } e^{St} \). This means that \( P(T \le t) = 1 - \pmb {\alpha }e^{St}{\mathbf {e}} \), where \( {\mathbf {e}} \) stands for the m-component column vector of 1’s. Define \( S^0 = -S{\mathbf {e}} \); thus \( S^0 \) consists of the rates of absorption into \( m+1 \) from the transient states.
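To make this concrete, here is a minimal numerical sketch (ours, not from the paper) for a hypothetical 3-phase example. It interprets \( \mu _{jk} = \mu p_{jk} \) as the transition rates of the resulting CTMC, extracts the transient part S, and evaluates \( P(T>t) = \pmb {\alpha } e^{St}{\mathbf {e}} \); the matrix entries and the scipy-based routine are illustrative assumptions.

```python
# A minimal sketch (not from the paper) of evaluating a PH survival function
# P(T > t) = alpha * exp(S t) * e for a hypothetical example with m = 3
# transient states and state 4 absorbing; rates are mu_jk = mu * p_jk.
import numpy as np
from scipy.linalg import expm

mu = 2.0
P = np.array([            # hypothetical one-step transition matrix
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
m = 3
alpha = np.array([1.0, 0.0, 0.0])     # initial probability vector, alpha_{m+1} = 0

Q = mu * P                            # off-diagonal rates mu * p_jk
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))   # diagonal = negative total outflow rate

S = Q[:m, :m]                         # transient part of the generator
S0 = -S @ np.ones(m)                  # absorption rate vector S^0 = -S e

for t in (0.5, 1.0, 2.0):
    survival = alpha @ expm(S * t) @ np.ones(m)   # P(T > t) = alpha e^{St} e
    print(f"P(T > {t}) = {survival:.4f}")
```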

3 Derivation of Distributions of Time to Failure

In reliability, interdependence is considered in different forms; we have seen a few works on interdependence in the introduction. In a two-component system, if both components perform independently of each other, we can easily compute the time to system failure, irrespective of whether it is a serial or a parallel system. Let us consider an n-component system with identical or non-identical components. Assume that there are n Markov chains, each of which generates the lifetime of one component. When the components are interdependent, we have to consider the product space of these MCs. The Markov chain defined on this product space is our "starting set"; to it we associate an initial probability vector and a one-step transition probability matrix. Let the generating MCs for the first, second, ..., nth components be designated by \( \{Z_j, j=1,\ldots ,n\} \), on finite state spaces \( \{1, 2, \ldots , m_j\}, j = 1, 2,\ldots , n \). Look at the process that evolves according to a semi-Markov process on the product of these state spaces: \( \{( k_1,\ldots , k_n) \,|\, k_j \) belongs to the state space of the MC \( Z_j, j = 1,2,\ldots ,n \}\). The sojourn time in state \( (k_1,\ldots ,k_n) \) is assumed to have exponential distribution with parameter \( \lambda _{(k_1,\ldots ,k_n), (r_1,\ldots ,r_n)}\), where \( (r_1,\ldots ,r_n) \) is the state visited after the sojourn in \( (k_1,\ldots ,k_n)\), according to the Markov chain rule. Here none, one, two, ..., or all coordinates of \( (r_1,\ldots ,r_n) \) can differ from the corresponding ones in \( (k_1,\ldots , k_n) \). If the component life times are independent, then at most one coordinate change can take place with positive probability. The purpose of the assumption of exponentially distributed sojourn times is to obtain a continuous time Markov chain (CTMC). A transition could be to the same state as well, which is equivalent to not moving out. A transition can trigger an "event" of the type of the component Markov chains or may turn out to be a transition without any event occurrence; here by "event" we mean a component failure. To make such event occurrences more explicit, we can include one extra state (phase) in the state space of each of the MCs \(\{Z_j, j =1,2,\ldots ,n\} \). Transitions without event occurrence are referred to as those resulting in mere phase changes. If the MC does not have any specific absorbing state, then the starting phase of the next event(s) is the new coordinate resulting from the coordinate change. Note that when the components are interdependent, more than one coordinate change can take place with positive probability.

First we consider a 2-component system. Let the state spaces of the underlying MCs, indicating the evolution of the components, be labeled \( \{1, 2,\ldots , r, r+1\} \) and \( \{1,2,\ldots , s, s+1\} \), where the states \( r+1 \) and \( s+1 \) indicate the absorbing (failure) states of components 1 and 2, respectively.

We consider two cases: (1) the system fails when a state \( (i,s+1) \), \( (r+1,j) \) or \( (r+1,s+1) \) is reached; (2) the system fails only when the state \( (r+1,s+1) \) is reached. In reliability theory, a system with failure of the first type is called a serial system, whereas one which fails only in situation (2) is called a parallel system. We shall compute the reliability of such systems. For case (1), system failure can be the consequence of entering the state \( (r+1,s+1) \) (if it is an interdependent system in which more than one coordinate change is permitted due to a transition in an interval of very short duration), or of entering a state \( (i, s+1) \) or \( (r+1, j) \).

We can also consider the case of transitions restricted so that at most one coordinate of the second state differs from the corresponding coordinate of the first state in the pair. We shall discuss all these in the sequel.

Notice that, for systems with independent components, no more than one coordinate change takes place in a short time interval with positive probability. Therefore the infinitesimal generator of the system evolution contains a large number of zeros (it is a sparse matrix); the same is the case if we model the interdependent system with the property that a transition changes at most one coordinate with positive probability. If we allow the possibility of more than one coordinate change with positive probability, then the infinitesimal generator becomes less sparse (in the case of a two-component system, the infinitesimal generator will have all entries different from zero, provided all transitions are possible with positive probability). Thus the analysis of interdependent systems with more than two components becomes harder, depending on the assumptions we make.

To keep the analysis simple, consider the case \( n= 2 \) to start with. Let the life times of the two components be generated, respectively, by the Markov chains X and Y, described by the sequences \( \{X_n\} \) and \( \{Y_n\} \) and having state spaces labeled \( \{1,\ldots , r, r+1\} \) and \( \{1,2,\ldots ,s, s+1\} \), with the states \( r+1 \) and \( s+1 \) absorbing in the respective MCs. The initial probability vectors of these MCs are \( (\pmb {\alpha },\alpha _{r+1}) = (\alpha _1, \ldots , \alpha _r, \alpha _{r+1}) \) and \( (\pmb {\beta },\beta _{s+1}) = (\beta _1,\ldots ,\beta _s, \beta _{s+1}) \), respectively; in these, \( \pmb {\alpha } \) and \( \pmb {\beta } \) are the vectors formed by the first r and the first s components of the initial probability vectors. The last components \( \alpha _{r+1} \) and \( \beta _{s+1} \) of these vectors are taken to be zero to ensure that the two components do not have zero life times (instantaneous failure on starting). The MCs \( X = \{X_n\} \) and \( Y= \{Y_n\} \) are independent if

$$\begin{aligned} P\{(X_{n+1},Y_{n+1})\,|\,(X_n,Y_n)\} = P\{X_{n+1}|X_n\}\cdot P\{Y_{n+1}|Y_n\}; \end{aligned}$$

if the equality does not hold, the processes involved are interdependent.
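A toy numerical illustration (the numbers are ours, purely for illustration): suppose each chain has two states and

$$\begin{aligned} P\{X_{n+1}=2\,|\,X_n=1\}&= 0.3, \qquad P\{Y_{n+1}=2\,|\,Y_n=1\} = 0.4, \\ P\{(X_{n+1},Y_{n+1})=(2,2)\,|\,(X_n,Y_n)=(1,1)\}&= 0.25 \ne 0.3 \times 0.4 = 0.12; \end{aligned}$$

since the joint transition probability is not the product of the marginal ones, X and Y are interdependent.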

We consider a two-component system which starts its operation in one of the states described above, according to the initial probability vector, which can be regarded as

$$\begin{aligned} \pmb {\alpha }\otimes \pmb {\beta } = (\alpha _1 \beta _1, \alpha _1 \beta _2, \ldots ,\alpha _1\beta _{s+1}, \alpha _2 \beta _1, \ldots , \alpha _{r+1} \beta _{s+1}), \end{aligned}$$

an \( (r+1)(s+1) \)-component row vector. We continue to assume that \( \alpha _{r+1}=\beta _{s+1}= 0 \). We have a one-step transition probability matrix L for this MC. Assume that the sojourn time in a state (i, j) depends on that state and also on the state, say (p, q), to be visited next, according to the MC rule. Further assume that this sojourn time is exponentially distributed with parameter \( \lambda _{(i,j),(p,q)} \). This is how the process evolves. It is a particular case of a semi-Markov transition (the description given below for the general case assumes that the components behave independently of each other); a particular case because there is no need to assume the sojourn time to be exponentially distributed. That assumption is made to obtain a continuous time MC. However, the parameter should depend on the state currently occupied and the state to be visited next, as indicated earlier.
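As a toy illustration of this evolution, the sketch below (ours, not from the paper) assembles the product-space generator for two components with \( r=s=2 \) from a hypothetical rate function \( \lambda _{(i,j),(p,q)} \); zeroing the rates of simultaneous coordinate changes recovers the independent case, and the printed counts exhibit the sparsity contrast discussed earlier.

```python
# A toy sketch (not from the paper) of the generator of the CTMC on the
# product space {(i, j)} of a two-component system with r = s = 2. The rate
# function lam() is a hypothetical stand-in for lambda_{(i,j),(p,q)}; in the
# independent case, any transition changing both coordinates gets rate zero.
import numpy as np

r, s = 2, 2
states = [(i, j) for i in range(1, r + 2) for j in range(1, s + 2)]

def lam(u, v, interdependent):
    """Hypothetical rate for the transition u -> v (degrading moves only)."""
    (i, j), (p, q) = u, v
    changes = (i != p) + (j != q)
    if changes == 0 or (changes == 2 and not interdependent):
        return 0.0
    if p in (i, i + 1) and q in (j, j + 1):  # each coordinate moves up by at most 1
        return 0.5 * changes                 # toy values
    return 0.0

def generator(interdependent):
    Q = np.zeros((len(states), len(states)))
    for a, u in enumerate(states):
        for b, v in enumerate(states):
            if a != b:
                Q[a, b] = lam(u, v, interdependent)
        Q[a, a] = -Q[a].sum()                # diagonal = negative total outflow
    return Q

for flag, name in ((True, "interdependent"), (False, "independent")):
    Q = generator(flag)
    off = np.count_nonzero(Q - np.diag(np.diag(Q)))
    print(f"{name}: {off} nonzero off-diagonal rates out of {len(states)**2 - len(states)}")
```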

A brief analysis of the underlying semi-Markov process with generally distributed sojourn times in the transient states is given below. Assume that the sojourn time random variable of the first component in state i has the distribution \( G_{ij} (\cdot )\), where j is the state visited next according to the MC rule; upon visiting the state \( r+1 \), the component fails. Thus the semi-Markov kernel is \( G(t) = [G_{ij}(t)] \). Similarly, for the second component let \( F_{ij}(\cdot ) \) be the distribution of the sojourn time in state i before jumping to the next state j; its semi-Markov kernel is \( F(t) = [F_{ij}(t)] \). Then, for the 2-component system with independent components, the failure time distribution can be computed. To this end we proceed as follows. Suppose that the system starts in stage 1 of each component and the degradation proceeds in the order \( 1 \rightarrow 2 \rightarrow 3 \rightarrow \cdots \rightarrow r \rightarrow r+1 \) for the first component and \( 1 \rightarrow 2 \rightarrow 3 \rightarrow \cdots \rightarrow s \rightarrow s+1 \) for the second. Then we have

Theorem 3.1

Let \( T_1 \) and \( T_2 \) denote the component life times, distributed as \( G_{(12)}*G_{(23)}*\cdots *G_{(r,r+1)} \) and \( F_{(12)}*F_{(23)}*\cdots *F_{(s,s+1)} \) respectively (the components being non-identically distributed); the notation \( * \) stands for convolution. The time till failure of the serial system is \( \min \{T_1, T_2\} \), and that of the parallel system is \( \max \{T_1, T_2\} \).
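Since the component life times here are independent, the corresponding distribution functions can be written out explicitly: with \( {\tilde{G}} = G_{(12)}*\cdots *G_{(r,r+1)} \) and \( {\tilde{F}} = F_{(12)}*\cdots *F_{(s,s+1)} \),

$$\begin{aligned} P(\min \{T_1,T_2\} \le t) = 1 - \bigl (1-{\tilde{G}}(t)\bigr )\bigl (1-{\tilde{F}}(t)\bigr ), \qquad P(\max \{T_1,T_2\} \le t) = {\tilde{G}}(t)\,{\tilde{F}}(t), \end{aligned}$$

the first expression being the CDF of the serial (min) life time and the second that of the parallel (max) life time.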

The more general case, in which the system starts with initial states given by the IPVs \(\pmb {\alpha } \) and \(\pmb {\beta } \), can be treated in a similar fashion. The book by Dudin et al. (2020) gives a detailed analysis of queues with semi-Markov arrivals and semi-Markov services; however, the processes discussed therein are not interdependent. Now we turn to a few special cases of the general results derived above. We have the following

Corollary 1

If the MCs X and Y are independent, then

  a. for a two-component serial system with non-identical but mutually independent PH life time distributions with representations \( PH(\pmb {\alpha },T) \) and \( PH(\pmb {\beta }, S) \) of dimensions r and s, the life time is Phase type distributed with representation \( PH(\pmb {\alpha } \otimes \pmb {\beta },\; T \otimes I_s + I_r\otimes S) \) of order rs, where T and S are, respectively, the transient parts of the infinitesimal generators of the two component life times and \( I_m \) stands for the identity matrix of order m;

  b. for a two-component parallel system the failure time distribution is also PH, with representation \( PH(\Gamma ,L) \) of order \( rs+r+s \), where \( \Gamma = (\pmb {\alpha } \otimes \pmb {\beta }, \beta _{s+1} \pmb {\alpha } ,\alpha _{r+1}\pmb {\beta }) \) and L is the matrix

$$\begin{aligned} L = \begin{bmatrix} T\oplus S &{} I_r \otimes S^{0} &{} T^{0}\otimes I_s\\ 0 &{} T &{} 0 \\ 0 &{} 0 &{} S \end{bmatrix}, \end{aligned}$$

in which \( T^{0}= -T{\mathbf {e}} \) and \( S^{0}= -S {\mathbf {e}} \).

Proof of a: Note that, under the assumptions made, the life times of the two components are Phase type distributed with representations \((\pmb {\alpha },T) \) and \( (\pmb {\beta }, S) \), respectively, and are independent of each other. The minimum of two independent PH random variables is again PH, with the representation indicated in the statement of the corollary (see Theorem 2.2.9 of Neuts 1981).

Proof of b: In this case we have to write the infinitesimal generator of the system so as to accommodate the two phase type random variables (i.i.d. or merely independent). Writing the difference-differential equations satisfied by the probability of the system being in state (j, k) at time t and solving them using the initial condition, we get the stated result. Alternatively, we can use the fact that the maximum of two independent PH random variables is also PH distributed (see Theorem 2.2.9 of Neuts 1981).
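The constructions in Corollary 1 are easy to mechanize. The sketch below (ours, not from the paper) builds both representations for hypothetical \( (\pmb {\alpha },T) \) and \( (\pmb {\beta },S) \) with \( r=s=2 \) and evaluates the two survival probabilities; all matrix entries are illustrative assumptions.

```python
# A minimal sketch (not from the paper) of Corollary 1 with hypothetical
# PH representations (alpha, T) and (beta, S), r = s = 2. Serial system:
# PH(alpha (x) beta, T (x) I_s + I_r (x) S); parallel system: PH(Gamma, L)
# with the block matrix L of order rs + r + s from part (b).
import numpy as np
from scipy.linalg import expm

alpha = np.array([1.0, 0.0]); T = np.array([[-3.0, 2.0], [0.0, -1.5]])  # r = 2
beta  = np.array([0.7, 0.3]); S = np.array([[-2.0, 1.0], [0.5, -2.5]])  # s = 2
r, s = len(alpha), len(beta)
Ir, Is = np.eye(r), np.eye(s)
T0 = (-T @ np.ones(r)).reshape(-1, 1)   # absorption rates T^0 = -T e
S0 = (-S @ np.ones(s)).reshape(-1, 1)   # absorption rates S^0 = -S e

# Serial (min): both components must be alive, so track the pair of phases.
gamma_min = np.kron(alpha, beta)                  # alpha (x) beta
L_min = np.kron(T, Is) + np.kron(Ir, S)           # Kronecker sum T (+) S

# Parallel (max): after one component fails, the survivor evolves alone.
# The last two pieces of Gamma vanish since alpha_{r+1} = beta_{s+1} = 0.
Gamma = np.concatenate([np.kron(alpha, beta), np.zeros(r), np.zeros(s)])
L = np.block([
    [L_min,                np.kron(Ir, S0),   np.kron(T0, Is)],
    [np.zeros((r, r * s)), T,                 np.zeros((r, s))],
    [np.zeros((s, r * s)), np.zeros((s, r)),  S],
])

t = 1.0
print("serial   P(T > 1) =", gamma_min @ expm(L_min * t) @ np.ones(r * s))
print("parallel P(T > 1) =", Gamma @ expm(L * t) @ np.ones(r * s + r + s))
```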

Next we consider the case of the two chains being interdependent. Consider the product space of the two MCs, \( \{(i,j)\,|\, i=1,2,\ldots ,r,r+1;\; j= 1,2,\ldots ,s,s+1\} \), and the MC defined on it. We may consider the states \( r + 1 \) and \( s + 1 \) as absorbing states of the respective MCs. When we consider the MC on the product space, the states \( (r+1,j), (i,s+1), (r+1,s+1) \), for \( i=1,2,\ldots ,r \) and \( j=1,2,\ldots ,s \), could be absorbing, depending on the specific problem under consideration; for example, if the system is a serial one, then the states \( (r + 1, j), (i, s + 1), (r + 1, s + 1) \), for \( i = 1, 2, \ldots , r \) and \( j = 1, 2, \ldots , s \), are absorbing. Since in the interdependent case more than one coordinate change due to a transition is possible with positive probability, we first consider the case where a transition changes at most one coordinate with positive probability. In this case the resulting infinitesimal generator of the CTMC is sparse: the rates of transitions like \( (j,k)\rightarrow (j',k') \), where both entries of the right hand pair differ from the corresponding ones on the left, are zero.

First we consider the reliability of a 2-component serial system. Designate the rate of the transition \( (j,k)\rightarrow (j',k) \) by \( \lambda _{(j,k),(j',k)} \), and for transitions of the form \( (j,k) \rightarrow (j,k') \) use the notation \( \mu _{(j,k),(j,k')} \). Along the diagonal of the infinitesimal generator the typical element is \( -( \lambda _{(j,k),(j,k)}+ \mu _{(j,k),(j,k)})\), the negative of the total outflow rate from (j, k). These descriptions of the entries give a clear picture of what the infinitesimal generator looks like. For quick reference, the infinitesimal generator \( Q_1 \) of the process is given below:

$$ \begin{bmatrix} -(\lambda _{(1,1),(1,1)}+ \mu _{(1,1),(1,1)}) &{} \mu _{(1,1),(1,2)} &{} \cdots &{} 0 &{} \lambda _{(1,1),(2,1)} &{} \cdots &{} 0 &{} \cdots &{} 0 \\ \vdots &{} &{} \ddots &{} \ddots &{} &{} &{} &{} &{} \vdots \\ 0 &{} \cdots &{} \cdots &{} 0 &{} \cdots &{} -(\lambda _{(r,s),(r,s)}+ \mu _{(r,s),(r,s)}) &{} \mu _{(r,s),(r,s+1)} &{} \cdots &{} \lambda _{(r,s),(r+1,s)} \\ 0 &{} \cdots &{} &{} 0 &{} \cdots &{} &{} &{} \cdots &{} 0 \\ \end{bmatrix} $$

Assume that the initial probability vector of the system state (restricted to the transient part; the remaining states are assumed to be occupied by the process with zero probability) is \( \pmb {\gamma } = (\gamma _{11}, \gamma _{12},\ldots , \gamma _{1s} , \gamma _{21}, \ldots , \gamma _{2s}, \ldots , \gamma _{rs}) \). It is to be noted that there are several absorbing states for the serial system, namely \( (j,s+1) \) for \( j = 1,\ldots , r \) and \( (r+1,k) \) for \( k = 1,\ldots , s \).

Now form the difference-differential equations satisfied by the probabilities \( P_{(j,k)}(t) \) that the transient states (j, k) are occupied by the system at time t. Solving the resulting matrix differential equation, we get the solution given in the following

Theorem 3.2

Let U be the random variable designating the time until absorption (system failure). Then \( P(U > t) =\pmb {\gamma } e^{Q_1 t} {\mathbf {e}}\), where the entries of \( Q_1 \) are the rates of transitions among the transient states, with each diagonal entry equal to the negative of the sum of all off-diagonal rates of that row, including the rates into the absorbing states. The complement is the probability of system failure at or prior to time t. This distribution is also of PH type, with representation \( PH(\pmb {\gamma } , Q_1) \) of order rs.

Note: A comparison between the result in Corollary 1a and that in Theorem 3.2 shows the significant difference between the reliability of a 2-component independent serial system and that of a 2-component interdependent serial system.

Next we proceed to compute the reliability of an interdependent 2-component parallel system. In this case the states \( (j,s+1) \) and \( (r+1,k) \), for \( j = 1,\ldots , r \) and \( k = 1,\ldots , s \), are transient. These states did not appear in the initial probability vector of the transient part of the serial system discussed in Theorem 3.2 because they are absorbing states for that system. For the parallel system such states are not absorbing (they are rather "partially absorbing"). Nevertheless, we assume that the system is not in any of these states initially with positive probability. Let \( (\pmb {\sigma },\sigma _{r+1,s+1}) \) be the initial probability vector, in which \( \pmb {\sigma } \) corresponds to the transient states; it is assumed that \( \sigma _{r+1,s+1} = 0 \). In this case we designate the rate of a transition \( (j,k)\rightarrow (j',k') \) by the symbol \( \delta _{(j,k),(j',k')} \); these rates are zero whenever both coordinates of the second pair differ from the corresponding coordinates of the first pair, and also if \( j'\ne j+1 \) when the change is in the first coordinate, or \( k'\ne k+1 \) when the change is in the second coordinate (so that the process is skip-free to the right).

Writing the system of difference-differential equations satisfied by the system state probabilities (there are \( rs+r+s \) equations in the system), designating by V the matrix of coefficients on the right hand sides of these equations (as done in Theorem 3.2), and solving the resulting matrix differential equation, we arrive at the following

Theorem 3.3

The probability of the interdependent 2-component parallel system providing failure-free operation in the time interval [0, t] is given by \( P(T > t) = \pmb {\sigma } e^{Vt} {\mathbf {e}} \), where T is the time till failure (the time till absorption into the state \( (r+1,s+1) \)) and \( {\mathbf {e}} \) is a column vector of 1’s of order \( rs+r+s \). In other words, the distribution of the time until failure of the system is of PH type with representation \( PH(\pmb {\sigma }, V) \) of order \( rs+r+s \); in notation, \( P(T\le t) = 1 - \pmb {\sigma } e^ {Vt} {\mathbf {e}} \).

Next we pass on to the computation of the reliability of a \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) system in the two cases where the components are independent and where they are interdependent. This system has n components (units); their lifetimes could be i.i.d. random variables, merely independent but not identically distributed, or even mutually dependent (interdependent). The system is operational as long as at least k among the n units are in working condition. Note that in the independent case it is quite easy to compute the system reliability: we need the distribution of the time till \( n-k+1 \) units fail, irrespective of the life time distributions of the components. The computation of reliability becomes complex when the components work in groups that have some dependence characteristics.

For simplicity of exposition, hereafter we restrict our attention to the case of individual component life times distributed as Erlang of order r with parameter \( \lambda \) (this parameter is that of the exponentially distributed life time of each phase of the components); recall that an Erlang distribution of order r is the distribution of the sum of r i.i.d. exponential rvs. It is a particular case of the PH distribution in which the infinitesimal generator has entries \( -\lambda \) along the diagonal up to the rth row, the last diagonal entry being zero, and each entry along the superdiagonal equal to \( \lambda \). The advantage of this assumption is that the r stages \( 1,2,\ldots , r \) of the Erlang distribution can be regarded as the best, second best, ..., very poor working conditions of the component. From state r the component moves to state \( r+1 \), which represents the failed state. One could instead assume PH distributed component life times; however, this would not give a feel for the increase in deterioration as the component working state moves from j to some state u, because in a PH distribution transitions can be to any state from a transient state. The author of this paper has very recently devised an approach where each stage life time can be of phase type; this will appear in a special issue of Annals of Operations Research (2022/23). The assumption of Erlang distributed component life times is highly advantageous when we analyse the reliability of the \( k\hbox {-}out\hbox {-}of\hbox {-}n \) system with interdependent components. It is quite easy to extend the results to generalized Erlang life time distributions of the components, in which case at least two of the exponential rvs involved should have distinct parameters.
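For concreteness, a sketch of the \( r = 3 \) case: the transient part S of the generator just described and the corresponding absorption vector \( S^0 \) are

$$\begin{aligned} \pmb {\alpha } = (1,0,0), \qquad S = \begin{bmatrix} -\lambda &{} \lambda &{} 0\\ 0 &{} -\lambda &{} \lambda \\ 0 &{} 0 &{} -\lambda \end{bmatrix}, \qquad S^0 = -S{\mathbf {e}} = \begin{bmatrix} 0\\ 0\\ \lambda \end{bmatrix}, \end{aligned}$$

so absorption into the failed state \( r+1 \) can occur only from stage r.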

Each of the n components of the system has an \( Erlang-r \) distributed life time with parameter \( \lambda \) for the exponentially distributed stage life time. If n is large compared to r, we track the number of components occupying each state at any given time; on the other hand, if r is large compared to n, we track the state of each component at any given time. This helps in reducing the dimension of the problem, which is especially important in computation; theoretically also this has an advantage, as we see in the discussions to follow. When one of the components fails, the number of working components moves down from n to \( n-1 \), and it again drops by one each time a component failure takes place.

At time zero \( (t=0) \), all components are in the best of their conditions (each one occupying state/stage 1). With the passage of time the components start deteriorating. Because we have a CTMC describing the system state at any time, at most one state change \( j\rightarrow j+1 \) can take place with positive probability in a short interval, and that too in only one component (this is the case when the components behave independently of each other, because only one event can then take place with positive probability in a short time interval; in the interdependent case the scenario could be different, which, however, is to be decided by the researcher concerned).

Assume that \( n > r \). Then the state space of the process has the form

$$\begin{aligned} \{((1,n_1), (2,n_2),\ldots ,(r,n_r), s)| n_1+\cdots +n_r+ s = n\} \end{aligned}$$

where the last element s stands for the number of failed components at the time of observation and \( n_j \) is the number of components in operation whose stage of life is \( j,\; j=1,2,\ldots ,r \). This can also be represented in a much simpler form: \( \{(n_1, n_2,\ldots , n_r, s)\,|\, n_1+ \cdots +n_r+ s = n\} \). Initially \( s= 0 \), and on failure of the system s has the value \( n - k+1 \). This completely specifies the CTMC. Since a transition either leaves the state unchanged or changes the stage of exactly one component, the transition in which one stage-j component moves to stage \( j+1 \) has rate \( \lambda n_j \). Consequent to this transition, \( n_j \) becomes \( n_{j} -1 \), because one component moves to the next higher deterioration stage, and \( n_{j+1} \) changes to \( n_{j+1}+1 \). If \( n_j =0 \), then no such transition takes place with positive probability. A transition out of stage r indicates the failure of a component, and so the last element s goes up by 1 to \( s+1 \). With the above description we can write down the infinitesimal generator of the system evolution. Note that the transient states of the underlying MC are

$$\begin{aligned} \{((1,n_1), (2,n_2),\ldots ,(r,n_r), s)\,|\, n_1+\cdots +n_r+ s = n,\; s \le n-k\}. \end{aligned}$$
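As a toy illustration of these transition rates (the numbers are ours, purely for illustration), take \( n = 3 \), \( r = 2 \) and, say, \( k = 2 \). From the state \( (n_1, n_2, s) = (2, 1, 0) \), either one of the two stage-1 components degrades, or the single stage-2 component fails:

$$\begin{aligned} (2,1,0)\; \xrightarrow {\;2\lambda \;} \;(1,2,0), \qquad (2,1,0)\; \xrightarrow {\;\lambda \;} \;(2,0,1). \end{aligned}$$

The second transition raises s to 1; one further failure would bring s to \( n-k+1 = 2 \), that is, system failure.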

The infinitesimal generator of the above described system is:

$$ \begin{bmatrix} -n\lambda &{} n\lambda &{} 0 &{} \cdots &{} 0\\ 0 &{} -(n-1)\lambda &{} (n-1)\lambda &{} \cdots &{} 0\\ \vdots &{} &{} \ddots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} -\lambda &{} \lambda \\ 0 &{} 0 &{} 0 &{} \cdots &{} 0\\ \end{bmatrix} $$

Observe that the part of the infinitesimal generator corresponding to the transient states of the process is a matrix of order \( r^{n}.s \). Its entries are the coefficients (of the unknown probabilities) appearing on the right hand sides of the difference-differential equations satisfied by the system state probabilities \( P_{(j,n_j) }(t) \) of being in state j with \( n_j \) components in that state at time t. The initial probability vector of the MC is \( \pmb {\Pi } =(1,0,\ldots ,0,0) \). Of this we consider only the part corresponding to the transient states, which is the vector \( \pmb { \xi }=(1,0,\ldots ,0) \) of dimension r. The solution to the matrix differential equation gives the probability that the system remains in the transient states beyond time \( t \): \( P(T> t) = \pmb {\xi } e^{Wt} {\mathbf {e}} \), where W is the matrix of coefficients in the matrix differential equation and \( {\mathbf {e}} \) is a column vector of 1’s of order r. The complement gives the probability of system failure at or prior to time t. The above discussion leads to the

Theorem 3.4

The time T to failure of a \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) system with independent, identically \( Erlang-r \) distributed components satisfies \( P(T> t) = \pmb {\xi } e^{Wt}{\mathbf {e}} \), with complement \( P(T \le t)= 1-\pmb {\xi } e^{Wt} {\mathbf {e}} \); that is, T has a PH distribution represented as \( PH(\pmb {\xi },W) \) of order \( r^{n}.s - 1 \).
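As a numerical companion to Theorem 3.4, the following sketch (ours, not from the paper) enumerates the aggregated states \( (n_1,\ldots ,n_r,s) \) with \( s \le n-k \), builds the transient generator W from the rates \( \lambda n_j \) described above, and evaluates \( \pmb {\xi } e^{Wt}{\mathbf {e}} \); the sizes n, k, r and the rate are hypothetical.

```python
# A minimal sketch (not from the paper) of the aggregated-count CTMC for a
# k-out-of-n: G system with i.i.d. Erlang-r components: states (n_1,...,n_r, s)
# with n_1 + ... + n_r + s = n, transient while s <= n - k. A stage-j move
# occurs at rate lambda * n_j; a move out of stage r is a component failure.
import itertools
import numpy as np
from scipy.linalg import expm

n, k, r, lam = 4, 3, 2, 1.0            # hypothetical sizes and rate

states = [c + (s,)
          for s in range(n - k + 1)
          for c in itertools.product(range(n + 1), repeat=r)
          if sum(c) + s == n]
index = {st: a for a, st in enumerate(states)}

N = len(states)
W = np.zeros((N, N))
for a, st in enumerate(states):
    for j in range(r):
        if st[j] == 0:
            continue
        rate = lam * st[j]             # lambda * n_j
        nxt = list(st)
        nxt[j] -= 1
        if j < r - 1:
            nxt[j + 1] += 1            # one component degrades one stage
        else:
            nxt[r] += 1                # a stage-r component fails: s -> s + 1
        nxt = tuple(nxt)
        if nxt in index:               # target still transient
            W[a, index[nxt]] = rate
        W[a, a] -= rate                # diagonal: total outflow (incl. absorption)

xi = np.zeros(N)
xi[index[(n,) + (0,) * (r - 1) + (0,)]] = 1.0   # all components start in stage 1
for t in (1.0, 2.0):
    print(f"P(T > {t}) =", xi @ expm(W * t) @ np.ones(N))
```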

Next we pass on to a more complicated structure: what happens when the components of the above described \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) system are interdependent? As in the development of Theorems 3.2 and 3.3, we assume that there are n finite state space MCs which determine the evolution of the interdependent system. Consider the product space of these MCs and the resulting semi-Markov process, defined on that product space, which governs the evolution of the entire system. We assume that the transitions in the component states follow the order \( 1\rightarrow 2 \rightarrow 3 \rightarrow \cdots \rightarrow r\rightarrow \) the absorbing state \( r+1 \). Unlike the case when the components behave independently of each other, in the case of interdependence we have to consider the components separately. This results in a huge increase in the dimension of the state space as compared with the case described in Theorem 3.4. A typical element of the state space is \( (j_1,\ldots ,j_n) \), in which each coordinate takes values \( 1, 2,\ldots , r, r+1 \). Falling into state \( r+1 \) results in the failure of that component, and here again the failure of \( n - k + 1\) components results in the failure of the system. First we restrict to the case of at most one coordinate change with positive probability due to a transition. Note that any coordinate \( j_s \) can change to \( j_s +1 \), for \( s = 1,2,\ldots , n \), consequent to a transition; if \( j_s \) equals r, a further transition in that coordinate results in the failure of the corresponding component. Denote the rate of the transition \( (j_1,\ldots ,j_s,\ldots ,j_n) \rightarrow (j_1,\ldots, j_s +1,\ldots ,j_n) \) by \( \lambda _{(j_1,\ldots, j_s, \ldots ,j_n),(j_1, \ldots, j_s +1,\ldots ,j_n) }\). Form the difference-differential equations satisfied by the probability that the process, occupying the state \( (j_1,\ldots ,j_s,\ldots ,j_n) \) at time t, moves to \( (j_1,\ldots, j_s +1, \ldots ,j_n) \) in the interval \( (t,t+h) \), for \( s = 1,2,\ldots, n \). The infinitesimal generator, which is highly sparse, is represented below.

$$ \begin{bmatrix} -\lambda _{(1,\ldots ,1),(1,\ldots ,1)} &{} \lambda _{(1,\ldots ,1),(1,\ldots ,2)} &{} \cdots &{} \lambda _{(1,\ldots ,1),(2,1,\ldots ,1)} &{} 0 &{} \cdots &{} 0 &{} 0\\ \vdots &{} \ddots &{} \ddots &{} &{} &{} &{} &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 0 &{} \cdots &{} &{} -\lambda _{(r+1,\ldots, r+1,r),(r+1,\ldots, r+1,r)} &{} \lambda _{(r+1,\ldots, r+1,r),(r+1,\ldots ,r+1,r+1)} \end{bmatrix} $$

At time zero the system is new, so all components are in state 1 of their life times; this brings a considerable computational advantage. Denote by S the matrix of coefficients in the difference-differential equations; the order of this matrix is \( n^r.(n-k+1) \). From the discussion above, we notice that only two types of events have positive probability of occurrence in an interval \( (t,t+h) \) with h small: (i) no change in \( (t,t+h) \), and (ii) transitions with a change in only one coordinate. The remaining transitions have zero probability, leading to a highly sparse infinitesimal generator. Solve the system of equations with the initial condition that, at time zero, all components are in the best of their states (state/stage 1); therefore with probability 1 the system starts in state \((1, 1,\ldots ,1)\), and all other states are occupied initially with probability zero. Here again use the notation \( P_{(j_1,\ldots ,j_n)} (t) \) for the unconditional probability of the system occupying state \( (j_1,\ldots ,j_n) \) at time t; the time derivative of this probability is denoted by \( P'_{(j_1, \ldots , j_n)}(t) \). The vector of these probabilities is designated by P(t) and the corresponding vector of derivatives by \( P'(t) \). Solving the matrix differential equation \( P'(t) = P(t) S \) and using the initial condition indicated above, we get the solution given in

Theorem 3.5

The time T to failure of an interdependent \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) system, described in the above paragraph, is phase type distributed with representation \( (\pmb {\sigma },S) \) of order \( n^r.(n-k+1) \), where \( \pmb {\sigma } \) is the initial probability vector and S is the part of the infinitesimal generator corresponding to the transient states. The number of transient states is \( r^n n!/[(k-1)!(n-k+1)!] \).

Corollary 2

Comparing the distributions obtained in Theorems 3.4 and 3.5, we see that the two types of \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) systems, with independent and with interdependent components, differ significantly.

The proof for interdependent systems in which two or more coordinate changes can also take place with positive probability due to a transition proceeds along similar lines. Therefore we omit the details of arriving at the distribution of the time till failure of such a system. The result is given in the following:

Theorem 3.6

The waiting time \( W' \) to failure of the interdependent component \( k\hbox {-}out\hbox {-}of\hbox {-}n: G \) system is Phase type distributed with representation \( PH(\pmb {\eta },S) \) of appropriate order, where \( \pmb {\eta } \) is the part of the initial probability vector corresponding to the transient states and S is the part of the infinitesimal generator corresponding to the transient states. In fact \( \pmb {\eta } \) and \( \pmb {\sigma } \) differ only in dimension (their first block is the same, an n-vector of 1’s, and the remaining blocks are zero-vectors of order n each). S has all its entries non-zero because we assume that, due to a transition, no change, one change, two changes, ..., or n changes in coordinates are possible with positive probability.

4 Conclusion

In this paper we first analysed the reliability of independent two-component serial and parallel systems. Then we proceeded to derive the time to failure of such systems when their components are interdependent through a semi-Markov process. In all these cases we assumed that the life times of the components fall within a very general framework. However, the analysis of the reliability of the k-out-of-n: G system was restricted to the case of component degradation taking place in a graded fashion: from state (stage) 1 \( \rightarrow \) stage 2 \( \rightarrow \ldots \rightarrow \) stage r and then to the state \( r+1 \), representing the failure of the component. In this case also we obtained the distribution of the time to failure of both independent and interdependent component systems. The technique employed was semi-Markov analysis. In all cases we showed that the system failure time distribution is of phase type with an appropriate representation.

The approach employed in this paper is new in stochastic modeling in general, and in reliability analysis in particular. In fact, the scope of this method is so vast that it leads the author of this paper to make the following claim: given any finite number (at least two) of interdependent processes in any branch of study, it is possible to analyse the resulting processes using the procedure explained in this paper!