1 Introduction

Many modern computer systems carry out critical functionality, and their failure can lead to massive losses of money, time, environmental damage, and human life (John 2002). Safety-critical systems (SCSs) are used in domains such as medical systems, finance, nuclear power plants, and aviation (Pietrantuono and Russo 2013). An SCS comprises software and hardware, the essential factors in maintaining high performance and reliability. Performance and reliability are antagonistic in nature (Tokuno and Yamada 2009), and their combined study is called performability (Mo et al. 2018). Performability analysis depends on various parameters that affect the reliability and performance of the system, so during the analysis one must formally understand the interrelationships of these parameters and their possible effects. The combined study of performance and reliability is therefore essential to optimize overall quality.

Quality assurance is essential for safety-critical systems. Quality is an umbrella term used across many domains; in software, it means fitness for purpose, i.e., the system works according to its software requirements specification (SRS). Fitness for purpose alone, however, does not capture quality: parameters such as portability, maintainability, reusability, reliability, and performance also characterize the quality of a system.

Safety-critical systems generally consist of multiple components, and the failure of some parts does not necessarily collapse the entire system (Singh et al. 2012); instead, the system continues to work with degraded quality. The system therefore has various operational states besides the extreme ones (completely failed or fully operational). Such systems can be modelled with stochastic models such as Petri nets and Markov models, which efficiently handle performance and dependability issues (Gokhale et al. 2004). Markov models come in several types, i.e., the discrete-time Markov model, the continuous-time Markov model, the semi-Markov model, and the hidden Markov model, depending on the system's orientation and design. The Markov reward model extends the Markov chain (Kwon and Agha 2007); the reward can be any effect (loss, cost, penalty) and can be positive or negative.

Most researchers have used continuous-time and discrete-time Markov chains without rewards to find reliability or availability (Goel and Okumoto 1979; Trivedi et al. 2003; Wang 2004), while the combined study of performance and reliability for safety-critical systems needs more attention (Viktorova et al. 2018). This paper presents a continuous-time Markov reward model (MRM) for the performability analysis of such systems. In an MRM, the reward matrix changes according to the parameter under measurement, and the reward accumulated in a particular state gives the value of that parameter. A different reward matrix has therefore been derived for each performability measure, and the proposed methodology for performability analysis is illustrated with a case study.

The paper is organized as follows: related work is presented in Sect. 2, and a brief introduction to the Markov chain with reward is given in Sect. 3. Section 4 describes the Markov model representation of the system under study, and Sect. 5 lists the performability measures used. Section 6 presents the design and development of the proposed performability model and the steps taken to calculate the system's performability. Section 7 illustrates the case study to demonstrate performability measurement. Finally, Sect. 8 presents future work and concludes the paper.

2 Related work

In the current scenario, various systems carry out critical functionality, and their proper functioning depends mostly on software, which is prone to errors and malfunction. Safety-critical systems therefore need to be analyzed properly for their non-functional parameters. Reliability and performance are the most critical quality requirements of such systems, and improving only one of them may not be sufficient. For example, a fire alarm that is highly reliable but performs poorly causes severe damage to resources because the alarm signal is not raised in time; the same problem arises when performance is high but reliability is low. The combined study of performance and reliability is therefore significant and challenging for researchers.

There is a large body of research on finding, predicting, and estimating performance and reliability. Earlier, probability distributions were used that considered only the system's binary states (up, down). Modern systems, however, are more complex and multitasking, so they have various states besides up and down: the failure of a component does not mean the system fails completely, but rather that it works with degraded quality. Recently, multistate stochastic models have been used to determine performance and dependability parameters.

In (Viktorova et al. 2018), a complex system is used to study performance, reliability, and performability, but only the reliability-related parameters of the system are calculated. The reliability of a distributed computer system and its number of failures are studied by Jin-Long Wang (Wang 2004) using a discrete Markov chain: the reliability of a program running on a particular terminal and the overall system reliability are calculated, with a directed graph representing the distributed system taken as the case study. Two reliability parameters are used: Markov-chain distributed program reliability (MDPR), which captures the reliability of a particular program of the distributed system, and Markov-chain distributed system reliability (MDSR), which represents overall system reliability.

Lisnianski also derives the system's reliability under different capacities and demands (Lisnianski 2007). That paper finds the consolidated performance but lacks information about how much time is spent at the different levels. A general approach is suggested to compute commonly used reliability measures: a general Markov reward model is built so that, by determining the corresponding reward matrix, different reliability measures can be calculated. A performability model for wireless networks is presented in (Trivedi et al. 2003), where the Erlang loss model is used to create composite and hierarchical Markov chains and derive loss formulas for a system with channel failures; queuing theory is used to create these formulas. The Erlang-B formula is used for the blocking probability in a loss system (i.e., no waiting room, meaning the number of servers equals the number of customers), while the Erlang-C formula is used for the wait probability in a delay system. A reversible Markov process can be used to model a network of queues, as in the multidimensional Erlang-B formula for the blocking probability in a loss system with several classes of calls and various server occupations. In (Goel and Okumoto 1979), a Markovian model is presented for software failures whose errors are not removed, i.e., imperfect debugging. A compositional method for estimating the software reliability of multi-threaded programs is developed in (Kwon and Agha 2007), where reliability is calculated from the reliability of the individual components and the transitions among them. Only accurate data can be used to extract information for condition-based maintenance (CBM), which is based on sensors (Martins et al. 2023).

Many approaches are used for the performability analysis of safety-critical systems, but stochastic methods are efficient for measuring, predicting, and estimating the performability of complex systems. Various researchers have used the Markov model for individual parameters (reliability or availability calculation). This paper presents an approach based on the combined study of reliability and performance using the Markov reward model. The major contribution of this work is to analyze, in a combined manner, the performance and reliability of software systems that perform critical functionality. The study also helps reduce the probability of system failure and supports decisions for enhancing performability: because performability analysis is performed at an early stage of system development, such decisions can easily be incorporated into the system.

3 Markov chain/process with reward

A Markov chain is a stochastic model that shows all the possible events of a system (Norris 1998). A Markov process has states and transitions: the states represent the possible system conditions, and the transitions represent the events that carry the system from one state to another. The next transition depends only on the current state; the history of transitions taken to reach the present state has no effect. This property is known as the memoryless property (Mikosch and Kallenberg 1998). The state space of a Markov process is the set of all possible states, and the transition matrix gives the transition probabilities, each of whose rows sums to one. Based on time, there are two types of Markov chain, viz. the discrete-time Markov chain (DTMC) and the continuous-time Markov chain (CTMC). A hidden Markov model (HMM) is a statistical method that proceeds through a number of states that are 'hidden' from the observer (Sotelo et al. 2023).

A Markov chain can efficiently represent a real-world multistate stochastic problem (Bas 2019). If the time spent in a state (the sojourn time) does not follow the exponential distribution, the chain is called a semi-Markov chain. One interesting fact is that the choice of the next state must not depend on how long the current state has lasted, although the process does remember that duration. Such stochastic models can be used for collective and individual performance and reliability analysis.

The Markov reward method extends the Markov chain (Kwon and Agha 2007). The reward can be any effect, e.g., a loss, cost, or penalty, and can be positive or negative. A Markov model used with rewards for analysis is called a Markov reward model (MRM): a reward is associated with each state, and the reward variable gives the reward accumulated in a state up to time t. Using Markov models with rewards to measure performability parameters is more efficient than using them without rewards.

4 A Markov model representation of a system under study

The Markov model of a system has states and transitions among them; the transitions are governed by rate parameters (failure rate and repair rate) (Karlin and M. Taylor 1975). A system and its different states are designed to demonstrate the failure and repair scenario with the help of the Markov model: a circle denotes a state of the system, and arrows denote the transition paths between states, as illustrated in Fig. 1. Only the failure transitions are used for reliability-related measures, while both failure and repair transitions are used for availability and performance measurement. Suppose \(\lambda\) is the failure rate and \(\mu\) the repair rate of the system, and let X(t), t \(\ge 0\), represent the system's state at time t. The quality of a system depends on the quality of its sub-systems, so system quality can be expressed as a function of the state at time t (Mo et al. 2018). This case assumes a dual-redundant fault-tolerant system: p denotes the primary component, r the redundant component, and b the backup. A state diagram showing the system's different failure and repair scenarios is drawn below.

Fig. 1
figure 1

State diagram of a system with backup and redundancy

Here,

μp: repair rate of the primary component.

μr: repair rate of the redundant component.

μb: repair rate of the backup component.

λp: failure rate of the primary component.

λr: failure rate of the redundant component.

λb: failure rate of the backup component.

The transition from state \(i\) to state \(j\) takes place at some transition rate. In this study, two transition rates have been considered, i.e., the failure rate \(\lambda\) and the repair rate \(\mu\). These rates can also be represented as a matrix.

\(\Lambda =\left[{a}_{i,j}\right]\), where \(\Lambda\) is an \(n\times n\) matrix and \({a}_{i,i}=-\sum_{j,j\ne i}{a}_{i,j}\)

For a continuous-time Markov chain, each diagonal element of the matrix is the sum of the remaining elements of its row with a minus sign. A specific cost is accumulated in each state while staying there and while transitioning to another state.
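To make the diagonal rule concrete, a generator matrix can be assembled from the off-diagonal rates alone. The following sketch (a hypothetical two-state fail/repair example in NumPy, not part of the case study; the helper name `make_generator` is our own) fills each diagonal with the negated row sum so that every row sums to zero.

```python
import numpy as np

def make_generator(rates):
    """Build a CTMC generator from off-diagonal transition rates.

    rates[i][j] (i != j) is the transition rate from state i to state j;
    each diagonal is overwritten with the negated sum of its row's
    off-diagonal entries, so every row of the result sums to zero.
    """
    L = np.array(rates, dtype=float)
    np.fill_diagonal(L, 0.0)             # discard any supplied diagonal
    np.fill_diagonal(L, -L.sum(axis=1))  # a_ii = -sum_{j != i} a_ij
    return L

# Hypothetical two-state example: failure rate 0.01, repair rate 0.5
L = make_generator([[0.0, 0.01],
                    [0.5, 0.0]])
```

Rows of the resulting matrix sum to zero by construction, which is exactly the property the diagonal convention above guarantees.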

\(W=\left[{w}_{ij}\right]\), an \(n\times n\) matrix.

The reward matrix depends on the quality parameter under measurement. The reward \({w}_{ii}\) is the state's reward when there is no transition out of state \(i\), and \({w}_{ij}\) denotes the reward for the transition from state \(i\) to state \(j\). If all rewards are assigned zero, the MRM reduces to an ordinary Markov chain. The total reward accumulated up to time t under the given initial conditions represents the parameter of interest. The transition intensities do not depend on the time t; that is, they remain constant for homogeneous Markov processes.

The next step is to calculate the accumulated reward for state \(i\) using the Howard differential equation (A. Howard 1960).

$$\frac{d{V}_{i}\left(t\right)}{dt}= {w}_{ii}+\sum_{j=1,j\ne i}^{n}{a}_{ij}{w}_{ij}+\sum_{j=1}^{n}{a}_{ij}{V}_{j}\left(t\right), \quad i=1,\dots ,n$$
(1)

Here,

\({V}_{i}\left(t\right)\) denotes the accumulated reward at time \(t\) in state \(i\),

\({w}_{ii}\) is the reward for state \(i\), and

\({w}_{ij}\) is the transition reward from state \(i\) to state \(j\).

The transition and reward matrices are created to calculate the performability parameters, and the set of differential equations is written based on Eq. 1.
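Eq. 1 can be written in vector form as \(dV/dt = c + \Lambda V\) with \(c_i = w_{ii} + \sum_{j\ne i} a_{ij} w_{ij}\), and then integrated numerically. The sketch below assumes NumPy and SciPy are available; the helper name `accumulated_reward` and the two-state check at the end are our own illustrations, not part of the paper's case study.

```python
import numpy as np
from scipy.integrate import solve_ivp

def accumulated_reward(L, W, t_end):
    """Integrate the Howard equations dV/dt = c + L @ V (Eq. 1),
    starting from V(0) = 0, and return V(t_end) for every state."""
    L, W = np.asarray(L, float), np.asarray(W, float)
    off = L - np.diag(np.diag(L))               # off-diagonal rates a_ij
    c = np.diag(W) + (off * W).sum(axis=1)      # w_ii + sum_{j != i} a_ij * w_ij
    sol = solve_ivp(lambda t, v: c + L @ v,
                    (0.0, t_end), np.zeros(len(L)),
                    rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]

# Two-state check: absorbing failure at rate 1, reward 1 in the working
# state; the exact solution is V_1(t) = 1 - exp(-t).
V = accumulated_reward([[-1.0, 1.0], [0.0, 0.0]],
                       [[1.0, 0.0], [0.0, 0.0]], 1.0)
```

Agreement with the closed-form solution of the two-state chain gives a quick sanity check before the same routine is applied to larger models.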

5 The performability measures used for the study

For the performability analysis, various parameters have been defined for the study of a given system. Generally, performability is defined as a combination of performance and dependability (Mo et al. 2018). Dependability has four parameters, viz. reliability, availability, safety, and security; this paper considers two of them (reliability and availability). Table 1 shows the performability parameters used in this paper.

Table 1 The performability parameters and their definitions

The activity block diagram in Fig. 2 describes the process of the performability analysis.

Fig. 2
figure 2

Activity diagram for the performability using continuous-time Markov chain with reward

6 Design and development of the proposed performability model

For the system designed in Sect. 4 (Fig. 1), a methodology for performability measurement has been proposed in several steps. The model is depicted in the activity diagram shown in Fig. 2, and the steps are explained in the following subsections. Different dependability and performance measures are used to obtain performability; the reward process for the different transitions is specified in the respective reward matrices.

6.1 Reliability

For reliability, the first step is to define the system's reward and transition matrices. In addition, we consider the failed state as an absorbing state, so we delete all arcs in the Markov graph leading from the failed state back to a working state (Viktorova et al. 2018). The elements of the reward matrix are as follows.

For the reward matrix, \({w}_{ij}=\left\{\begin{array}{ll}1,& \mathrm{if}\ i=j\ \mathrm{and}\ i\in \mathrm{operational}\\ 0,& \mathrm{otherwise}\end{array}\right.\)

The reward matrix, according to this discussion, is

$$W=\left[\begin{array}{cccccccc}
1& 0& 0& 0& 0& 0& 0& 0\\
0& 1& 0& 0& 0& 0& 0& 0\\
0& 0& 1& 0& 0& 0& 0& 0\\
0& 0& 0& 1& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0
\end{array}\right]$$

For a safety-critical system, reliability should be very high. We consider the system acceptable if at least two components are working; otherwise, it is in a failed state. The updated transition matrix is as follows.

$$\Lambda =\left[\begin{array}{cccccccc}
-\left({\lambda }_{p}+{\lambda }_{r}+{\lambda }_{b}\right)& {\lambda }_{p}& {\lambda }_{r}& {\lambda }_{b}& 0& 0& 0& 0\\
0& -\left({\lambda }_{r}+{\lambda }_{b}\right)& 0& 0& {\lambda }_{r}& {\lambda }_{b}& 0& 0\\
0& 0& -\left({\lambda }_{p}+{\lambda }_{b}\right)& 0& {\lambda }_{p}& 0& {\lambda }_{b}& 0\\
0& 0& 0& -\left({\lambda }_{p}+{\lambda }_{r}\right)& 0& {\lambda }_{p}& {\lambda }_{r}& 0\\
0& 0& 0& 0& -{\lambda }_{b}& 0& 0& {\lambda }_{b}\\
0& 0& 0& 0& 0& -{\lambda }_{r}& 0& {\lambda }_{r}\\
0& 0& 0& 0& 0& 0& -{\lambda }_{p}& {\lambda }_{p}\\
0& 0& 0& 0& 0& 0& 0& 0
\end{array}\right]$$

The differential equations for the total accumulated reward are written down according to Eq. 1.

$$\begin{gathered} \frac{{dV_{1} \left( t \right)}}{dt} = 1 - \left( {\lambda_{p} + \lambda_{r} + \lambda_{b} } \right)V_{1} \left( t \right) + \lambda_{p} V_{2} \left( t \right) + \lambda_{r} V_{3} \left( t \right) + \lambda_{b} V_{4} \left( t \right) \hfill \\ \frac{{dV_{2} \left( t \right)}}{dt} = 1 - \left( {\lambda_{r} + \lambda_{b} } \right)V_{2} \left( t \right) + \lambda_{r} V_{5} \left( t \right) + \lambda_{b} V_{6} \left( t \right) \hfill \\ \frac{{dV_{3} \left( t \right)}}{dt} = 1 - \left( {\lambda_{p} + \lambda_{b} } \right)V_{3} \left( t \right) + \lambda_{p} V_{5} \left( t \right) + \lambda_{b} V_{7} \left( t \right) \hfill \\ \frac{{dV_{4} \left( t \right)}}{dt} = 1 - \left( {\lambda_{p} + \lambda_{r} } \right)V_{4} \left( t \right) + \lambda_{p} V_{6} \left( t \right) + \lambda_{r} V_{7} \left( t \right) \hfill \\ \frac{{dV_{5} \left( t \right)}}{dt} = - \left( {\lambda_{b} } \right)V_{5} \left( t \right) + \lambda_{b} V_{8} \left( t \right) \hfill \\ \frac{{dV_{6} \left( t \right)}}{dt} = - \left( {\lambda_{r} } \right)V_{6} \left( t \right) + \lambda_{r} V_{8} \left( t \right) \hfill \\ \frac{{dV_{7} \left( t \right)}}{dt} = - \left( {\lambda_{p} } \right)V_{7} \left( t \right) + \lambda_{p} V_{8} \left( t \right) \hfill \\ \frac{{dV_{8} \left( t \right)}}{dt} = 0 \hfill \\ \end{gathered}$$
(2)

Solving the differential equations yields each state's accumulated reward. At the initial time, we consider all components to be working, so the accumulated reward of state 1 shows the system's reliability (Kwon and Agha 2007).
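The system of Eq. 2 can be integrated directly. The sketch below uses SciPy with placeholder failure rates (illustrative values, not taken from the paper) and reads off the accumulated reward \(V_1(t)\) of the all-components-up state.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative failure rates (per hour) for the primary, redundant, and
# backup components -- placeholder values, not from the paper.
lp, lr, lb = 0.001, 0.002, 0.003

def rhs(t, V):
    """Right-hand side of the reliability ODE system of Eq. 2."""
    V1, V2, V3, V4, V5, V6, V7, V8 = V
    return [
        1 - (lp + lr + lb) * V1 + lp * V2 + lr * V3 + lb * V4,
        1 - (lr + lb) * V2 + lr * V5 + lb * V6,
        1 - (lp + lb) * V3 + lp * V5 + lb * V7,
        1 - (lp + lr) * V4 + lp * V6 + lr * V7,
        -lb * V5 + lb * V8,
        -lr * V6 + lr * V8,
        -lp * V7 + lp * V8,
        0.0,
    ]

t_end = 1000.0
sol = solve_ivp(rhs, (0.0, t_end), np.zeros(8), rtol=1e-8, atol=1e-10)
V1 = sol.y[0, -1]   # accumulated reward of state 1 (all components up)
```

With reward one in the acceptable states and the failed states absorbing, \(V_1(t)\) accumulates the expected time spent in acceptable states when starting from the all-up state; as t grows it approaches the mean time to reach the unacceptable set.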

6.2 Mean number of failures

For the mean number of failures, an element of the reward matrix is set to one if it corresponds to a transition from an operational state to the failed state; the failed state is considered absorbing (Viktorova et al. 2018). The differential equations are derived from Eq. (1).
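A minimal single-component sketch of this transition-reward idea (a hypothetical up/down chain with repair for illustration, not the eight-state model or the paper's absorbing variant): assigning reward 1 to the up-to-down transition makes the accumulated reward the expected number of failures in (0, t), which grows at the long-run failure frequency \(\lambda\mu/(\lambda+\mu)\).

```python
import numpy as np
from scipy.integrate import solve_ivp

lam, mu = 0.01, 0.5            # illustrative failure and repair rates
L = np.array([[-lam, lam],     # state 1: up, state 2: down
              [mu, -mu]])
W = np.array([[0.0, 1.0],      # reward 1 on the up -> down transition only
              [0.0, 0.0]])

off = L - np.diag(np.diag(L))               # off-diagonal rates a_ij
c = np.diag(W) + (off * W).sum(axis=1)      # constant term of Eq. 1
t_end = 10000.0
sol = solve_ivp(lambda t, v: c + L @ v, (0.0, t_end), [0.0, 0.0],
                rtol=1e-8, atol=1e-10)
mnf = sol.y[0, -1]   # expected number of failures in (0, t_end) from "up"
```

The paper's absorbing-state variant is recovered by deleting the repair transition, in which case the accumulated reward is simply the probability that a failure has occurred by time t.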

6.3 Availability

Availability is defined as the probability that the system is operational at a time instant t. To calculate it, we must calculate the average accumulated time spent in the operational states during the interval (0, t) (Lisnianski 2007). The repair rates are also included in the transition matrix.

For the reward matrix, \({w}_{ij}=\left\{\begin{array}{ll}1,& \mathrm{if}\ i=j\ \mathrm{and}\ i\in \mathrm{operational}\\ 0,& \mathrm{otherwise}\end{array}\right.\)

The reward matrix is given below for the availability calculation.

$$W=\left[\begin{array}{cccccccc}
1& 0& 0& 0& 0& 0& 0& 0\\
0& 1& 0& 0& 0& 0& 0& 0\\
0& 0& 1& 0& 0& 0& 0& 0\\
0& 0& 0& 1& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0\\
0& 0& 0& 0& 0& 0& 0& 0
\end{array}\right]$$

The transition matrix of the proposed Markov system is as follows.

$$\Lambda =\left[\begin{array}{cccccccc}
-\left({\lambda }_{p}+{\lambda }_{r}+{\lambda }_{b}\right)& {\lambda }_{p}& {\lambda }_{r}& {\lambda }_{b}& 0& 0& 0& 0\\
{\mu }_{p}& -\left({\lambda }_{r}+{\lambda }_{b}+{\mu }_{p}\right)& 0& 0& {\lambda }_{r}& {\lambda }_{b}& 0& 0\\
{\mu }_{r}& 0& -\left({\lambda }_{p}+{\lambda }_{b}+{\mu }_{r}\right)& 0& {\lambda }_{p}& 0& {\lambda }_{b}& 0\\
{\mu }_{b}& 0& 0& -\left({\lambda }_{p}+{\lambda }_{r}+{\mu }_{b}\right)& 0& {\lambda }_{p}& {\lambda }_{r}& 0\\
0& {\mu }_{r}& {\mu }_{p}& 0& -\left({\lambda }_{b}+{\mu }_{p}+{\mu }_{r}\right)& 0& 0& {\lambda }_{b}\\
0& {\mu }_{b}& 0& {\mu }_{p}& 0& -\left({\lambda }_{r}+{\mu }_{p}+{\mu }_{b}\right)& 0& {\lambda }_{r}\\
0& 0& {\mu }_{b}& {\mu }_{r}& 0& 0& -\left({\lambda }_{p}+{\mu }_{b}+{\mu }_{r}\right)& {\lambda }_{p}\\
0& 0& 0& 0& {\mu }_{b}& {\mu }_{r}& {\mu }_{p}& -\left({\mu }_{b}+{\mu }_{r}+{\mu }_{p}\right)
\end{array}\right]$$

The following differential equations, obtained using Eq. 1, give the accumulated reward for availability in each state.

$$\begin{gathered} \frac{{dV_{1} \left( t \right)}}{dt} = 1 - \left( {\lambda_{p} + \lambda_{r} + \lambda_{b} } \right)V_{1} \left( t \right) + \lambda_{p} V_{2} \left( t \right) + \lambda_{r} V_{3} \left( t \right) + \lambda_{b} V_{4} \left( t \right) \hfill \\ \frac{{dV_{2} \left( t \right)}}{dt} = 1 + \mu_{p} V_{1} \left( t \right) - \left( {\lambda_{r} + \lambda_{b} + \mu_{p} } \right)V_{2} \left( t \right) + \lambda_{r} V_{5} \left( t \right) + \lambda_{b} V_{6} \left( t \right) \hfill \\ \frac{{dV_{3} \left( t \right)}}{dt} = 1 + \mu_{r} V_{1} \left( t \right) - \left( {\lambda_{p} + \lambda_{b} + \mu_{r} } \right)V_{3} \left( t \right) + \lambda_{p} V_{5} \left( t \right) + \lambda_{b} V_{7} \left( t \right) \hfill \\ \frac{{dV_{4} \left( t \right)}}{dt} = 1 + \mu_{b} V_{1} \left( t \right) - \left( {\lambda_{p} + \lambda_{r} + \mu_{b} } \right)V_{4} \left( t \right) + \lambda_{p} V_{6} \left( t \right) + \lambda_{r} V_{7} \left( t \right) \hfill \\ \frac{{dV_{5} \left( t \right)}}{dt} = \mu_{r} V_{2} \left( t \right) + \mu_{p} V_{3} \left( t \right) - \left( {\lambda_{b} + \mu_{r} + \mu_{p} } \right)V_{5} \left( t \right) + \lambda_{b} V_{8} \left( t \right) \hfill \\ \frac{{dV_{6} \left( t \right)}}{dt} = \mu_{b} V_{2} \left( t \right) + \mu_{p} V_{4} \left( t \right) - \left( {\lambda_{r} + \mu_{b} + \mu_{p} } \right)V_{6} \left( t \right) + \lambda_{r} V_{8} \left( t \right) \hfill \\ \frac{{dV_{7} \left( t \right)}}{dt} = \mu_{b} V_{3} \left( t \right) + \mu_{r} V_{4} \left( t \right) - \left( {\lambda_{p} + \mu_{b} + \mu_{r} } \right)V_{7} \left( t \right) + \lambda_{p} V_{8} \left( t \right) \hfill \\ \frac{{dV_{8} \left( t \right)}}{dt} = \mu_{b} V_{5} \left( t \right) + \mu_{r} V_{6} \left( t \right) + \mu_{p} V_{7} \left( t \right) - \left( {\mu_{b} + \mu_{r} + \mu_{p} } \right)V_{8} \left( t \right) \hfill \\ \end{gathered}$$
(3)

Solving the above differential equations yields the accumulated reward \({V}_{i}\left(t\right)\) for each state. We consider that all components are working at the initial time, so V1(t) shows the availability accumulated by the system (Viktorova et al. 2018); the average availability is obtained by dividing the total accumulated reward by the time.

For steady-state availability, the system is considered to run for a long time. At steady state, the availability no longer changes with time, so the rates of change in the differential equations are zero; the left-hand sides of Eqs. 3 are therefore set to zero to calculate the availability at steady state.
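Equivalently, steady-state availability can be read off the stationary distribution \(\pi\) of the availability generator, solving \(\pi\Lambda = 0\) with \(\sum_i \pi_i = 1\) and summing \(\pi\) over the acceptable states. The sketch below uses the transition matrix of this subsection with placeholder rates (illustrative values, not from the paper).

```python
import numpy as np

# Placeholder rates (per hour): failure rates for primary, redundant,
# backup, and a common repair rate for all three -- illustrative only.
lp, lr, lb = 0.001, 0.002, 0.003
mp = mr = mb = 0.1

# 8-state availability generator (state 1 = all up, ..., state 8 = all down)
L = np.array([
    [-(lp + lr + lb), lp, lr, lb, 0, 0, 0, 0],
    [mp, -(lr + lb + mp), 0, 0, lr, lb, 0, 0],
    [mr, 0, -(lp + lb + mr), 0, lp, 0, lb, 0],
    [mb, 0, 0, -(lp + lr + mb), 0, lp, lr, 0],
    [0, mr, mp, 0, -(lb + mp + mr), 0, 0, lb],
    [0, mb, 0, mp, 0, -(lr + mp + mb), 0, lr],
    [0, 0, mb, mr, 0, 0, -(lp + mb + mr), lp],
    [0, 0, 0, 0, mb, mr, mp, -(mb + mr + mp)],
])

# Replace one balance equation of pi @ L = 0 with the normalisation
# condition sum(pi) = 1, then solve the resulting linear system.
A = np.vstack([L.T[:-1], np.ones(8)])
b = np.zeros(8)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

# Steady-state availability = probability mass on acceptable states 1-4
availability = pi[:4].sum()
```

This linear-system route gives the same limit that the differential equations of Eq. 3 approach as t grows large, without integrating them.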

6.4 Performance

First, the performance aspect of the system is described. Suppose a system has different performance levels \(l=\left\{{l}_{1},{l}_{2},{l}_{3},\dots ,{l}_{i},\dots ,{l}_{n}\right\}\), where \({l}_{i}\) is the performance level of the system in state \(i\), and \({\{L}_{i}\left(t\right)\ge 0, { L}_{i}\left(t\right)\in {l}_{j}\}\) (Goševa-Popstojanova and Trivedi 2000) is a stochastic process giving the performance level of state \(i\) at time \(t\). Suppose Q is the quality associated with each state of the system. As the state changes with time, the quality associated with it also changes (Goel and Okumoto 1979), so quality is a function of S(t), the state occupied at time t, i.e., \(Q\left(S\left(t\right)\right)={Q}_{cr}\left(t\right)\) or \({Q}_{cr}\left(t\right)={w}_{i}\), where \({Q}_{cr}\left(t\right)\) is the current value of the quality in the state (Toledano et al. 2016), i.e., the reward in state \(i\). The performance level and the demanded performance can therefore be represented by the stochastic processes shown in Fig. 3.

Fig. 3
figure 3

Stochastic processes showing the throughput level and demanded throughput of the system

Suppose we want to represent the weight of each state and the change of weight from the state \(i\) to state \(j\). The matrix W defines that:

\(W= \left[{w}_{i,j}\right]\), an \(n\times n\) matrix.

The reward \({w}_{i,j}\) is the effect on the system, in terms of quality, arising from the transition from state \(i\) to \(j\); for the transition from state \(i\) to itself, \({w}_{i,i}={w}_{i}\). Suppose we consider three performance levels: low, medium, and high. Another stochastic process, \({D}_{i}\left(t\right)\ge 0\), gives the demanded performance level in state \(i\). For each state there are two possibilities: the demand may or may not be satisfied (Temraz and El-Dmcese 2011). The whole state space can therefore be divided into two disjoint sets: states satisfying the demand are acceptable states, while states not fulfilling the demand are categorized as failed states. The acceptability function of a state is a function of the performance level and the demand (Temraz and El-Dmcese 2011).

$$Q\left\{{L}_{i}\left(t\right),{D}_{i}\left(t\right)\right\}=\left\{\begin{array}{ll}{L}_{i}\left(t\right)-{D}_{i}\left(t\right), & \left\{{L}_{i}\left(t\right)-{D}_{i}\left(t\right)\right\}\ge 0\\ 0, & \mathrm{otherwise}\end{array}\right.$$

In homogeneous Markov processes, the failure and repair rates remain independent of time t (Strielkina et al. 2018). Let us take throughput as the performance parameter and the demanded throughput as the demand. The demanded throughputs (D1, D2, and D3) and the throughputs provided by the system (T1, T2, and T3) are shown in Fig. 3. The demand and capacity stochastic matrices are merged to obtain the combined transition matrix shown in Fig. 4.

Fig. 4
figure 4

Stochastic process showing the demand and capacity of the system

The following transition matrix is created from the generating-capacity and demand transition matrices; the transition rates are assigned according to the rule in (Lisnianski 2007). Only horizontal and vertical transitions are considered; diagonal transitions are ignored for simplicity of calculating the transition rates.

$$\Lambda =\left[\begin{array}{ccccccccc}
-{y}_{1}& {b}_{1,2}& {b}_{1,3}& {a}_{1,2}& 0& 0& {a}_{1,3}& 0& 0\\
{b}_{2,1}& -{y}_{2}& {b}_{2,3}& 0& {a}_{1,2}& 0& 0& {a}_{1,3}& 0\\
{b}_{3,1}& {b}_{3,2}& -{y}_{3}& 0& 0& {a}_{1,2}& 0& 0& {a}_{1,3}\\
{a}_{2,1}& 0& 0& -{y}_{4}& {b}_{1,2}& {b}_{1,3}& {a}_{2,3}& 0& 0\\
0& {a}_{2,1}& 0& {b}_{2,1}& -{y}_{5}& {b}_{2,3}& 0& {a}_{2,3}& 0\\
0& 0& {a}_{2,1}& {b}_{3,1}& {b}_{3,2}& -{y}_{6}& 0& 0& {a}_{2,3}\\
{a}_{3,1}& 0& 0& {a}_{3,2}& 0& 0& -{y}_{7}& {b}_{1,2}& {b}_{1,3}\\
0& {a}_{3,1}& 0& 0& {a}_{3,2}& 0& {b}_{2,1}& -{y}_{8}& {b}_{2,3}\\
0& 0& {a}_{3,1}& 0& 0& {a}_{3,2}& {b}_{3,1}& {b}_{3,2}& -{y}_{9}
\end{array}\right]$$
$$\text{available throughput}=\mathrm{Pr}\left\{\phi \left(t\right)\ge 0\right\}$$

For the calculation, we define a reward for each state: acceptable states have a reward of one, while unacceptable states and transitions have a reward of zero. The reward accumulated in each state is determined using Eq. 1. Usually, the state with the greatest capacity level and the minimum demand is taken as the initial state (Temraz and El-Dmcese 2011).

$$\text{average available throughput } TH\left(T\right)=\sum_{i} \frac{{V}_{i}\left(T\right)}{T}$$

A similar approach can be used for the response time and other performance parameters. The accumulated reward determines how long the system remains at a particular performance level. The reward for states that meet or exceed the required throughput is defined as 1; otherwise, it is 0.
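As a hedged illustration of this reward-accumulation step (a hypothetical helper, not code from the cited papers), the sketch below Euler-integrates the Howard equations dV/dt = w + ΛV with V(0) = 0 for a toy two-state up/down chain, where a reward of 1 in the up state makes V_up(T)/T the average fraction of time at the acceptable level:

```python
# Hypothetical sketch (not the papers' code) of the reward-accumulation step:
# the expected reward V_i(T) accumulated over (0, T) starting from state i
# satisfies the Howard equations dV/dt = w + Lambda V, V(0) = 0, which are
# integrated here by forward Euler.

def accumulate_reward(Lam, w, T, steps=20000):
    n = len(Lam)
    V = [0.0] * n
    dt = T / steps
    for _ in range(steps):
        V = [V[i] + dt * (w[i] + sum(Lam[i][j] * V[j] for j in range(n)))
             for i in range(n)]
    return V

# Toy two-state up/down chain: failure rate 0.1, repair rate 0.5, reward 1
# while up, so V[0] / T is the average fraction of time at the acceptable level.
Lam = [[-0.1, 0.1], [0.5, -0.5]]
V = accumulate_reward(Lam, [1.0, 0.0], T=100.0)
# V[0] / 100 approaches mu / (lam + mu) = 0.5 / 0.6 for large T
```

The same function applies unchanged to the nine-state throughput model once Λ and the reward vector w of Eq. 1 are supplied.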

6.5 Performability

For safety–critical systems, performability is very important to ensure that the system's quality meets its specification (Eshragh and Kargahi 2013). In this paper, performability considers only the reliability and performance of the system. Sections 6.1 and 6.2 measure the system's reliability, while Sect. 6.3 measures the combined performance and availability. The parameters measured in these subsections characterize the performability of the system. The combined parameter is named the available throughput. Another important performability measure, the time spent at different performance levels during operation, is also calculated.

To explain how this methodology is beneficial, we must understand that a safety–critical system with high reliability and low performance, or low reliability and high performance, is of little worth when performing critical functionality. Performance describes the system's responsiveness, and reliability shows how accurately the system produces results, so the two should be studied together to obtain an optimal trade-off between software reliability and performance. Furthermore, software security has a direct impact on both reliability and performance. For example, security lapses may result in downtime, permit the entry of viruses that turn off essential functions, or give criminals access to data. However, overly complicated security mechanisms can cause the system to lag and prevent integration with other programs.

7 Case study

To illustrate the proposed methodology, the case study presented by (Kalaiarasi et al. 2017) is taken up. The Markov chain graph is shown in Fig. 1. If at least two of the three components are in operational mode, the system is considered operational; otherwise, it is in a failed state. Since the system under study has three components, there are eight possible states: four operational states and four failed states. The state set is therefore partitioned into sets of operational and failed states. Figure 5 shows all possible states of the system.

Fig. 5
figure 5

All possible states of the system under consideration

Assumptions

  • A system with components is presented using a CTMC, where states represent operational modes and state transitions indicate movement between states.

  • Component failure and repair rates are statistically independent and exponentially distributed.

  • Component failures and repairs trigger transitions from one state into another.

  • At most, one component can fail or repair at a particular time.

  • All the components are working perfectly at the initial time.

The transition rate triggers the transition from one state to another. There are two types of transition rates: failure rates and repair rates. The system has the finite state space \(s=\left\{0, 1, 2, 3, \dots , 7\right\}\). The homogeneous CTMC \(\left(X\left(t\right), t\ge 0\right)\) describes the stochastic behaviour, with transition rate matrix \(\Lambda\).

After identifying the operational and failed states, the Markov transition diagram of the non-repairable system was generated in Fig. 6 by removing all repair-rate transitions from Fig. 1. If at least two components are up, the state is considered operational; otherwise, it is a failed state. So, there are four operational states and four failed states, and an 8*8 transition matrix \(\Lambda\) is created. The failure rates are \({\lambda }_{1}=0.03+0.002*t\), \({\lambda }_{2}=0.03+0.001*t\), and \({\lambda }_{3}=0.01\); the repair rates are \({\mu }_{1}=0.1\), \({\mu }_{2}=0.1\), and \({\mu }_{3}=0.05\) (Wang 2004). The first two failure rates are time-dependent, meaning component failures become more likely as time passes.
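The 8*8 generator described above can be sketched as follows. This is an illustrative construction (not the authors' code), with the time-dependent rates λ1 and λ2 frozen at their t = 0 values for simplicity; a full analysis would rebuild the matrix at each time step:

```python
from itertools import product

# Illustrative construction (not the authors' code) of the 8-state generator,
# with the time-dependent rates lambda_1 and lambda_2 frozen at t = 0.
lam = [0.03, 0.03, 0.01]   # failure rates at t = 0
mu = [0.10, 0.10, 0.05]    # repair rates

states = list(product([1, 0], repeat=3))   # (1, 1, 1) = all components up
n = len(states)                            # 2^3 = 8 states

Q = [[0.0] * n for _ in range(n)]
for i, s in enumerate(states):
    for k in range(3):
        flipped = list(s)
        flipped[k] = 1 - flipped[k]        # at most one component changes
        j = states.index(tuple(flipped))
        Q[i][j] = lam[k] if s[k] == 1 else mu[k]
    Q[i][i] = -sum(Q[i])                   # generator rows sum to zero

# 2-of-3 rule: operational states have at least two components up
operational = [i for i, s in enumerate(states) if sum(s) >= 2]
```

Since at most one component can fail or be repaired at a time (per the assumptions), only single-bit flips receive a non-zero rate, and exactly four of the eight states satisfy the 2-of-3 rule.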

Fig. 6
figure 6

Transition diagram of the non-repairable system

7.1 Mean number of failures during the time interval (0, T)

The aim of calculating the mean number of failures is to capture the trend of system failures. Predicting the failure trend is a challenge, and this measure helps fault-tolerance techniques take proactive action to manage failures (Kalaiarasi et al. 2017). By substituting all the failure and repair rates, the accumulated reward obtained in each state is given in Table 1. At the initial time, all the components are up, so column zero shows the mean number of failures during the time (0, T) (Lisnianski 2007). Table 2 shows the mean number of failures during the time interval.

Table 2 Mean number of failures during the time interval (0, T)

As the table shows, the mean number of failures is zero at time instance zero. As time increases, the number of failures also increases. This measure helps the system manager understand the system's failure trend and estimate how much time remains before the system reaches the failure stage. The graph representing the mean number of failures is given in Fig. 7.

Fig. 7
figure 7

Accumulated number of failures

7.2 Reliability

In a multistate system, the components interact with one another, and this mutual interaction influences the reliability of the system. Only failure rates are considered for the reliability measurement (Lisnianski 2007). The set of Eq. 2 is used for the calculation of the reliability. Rewards related to all the operational states are set to 1, and those for all the failed states and transitions are set to 0. The accumulated reward up to time t is given in Table 3.

Table 3 Reliability of the system

Column zero shows the system's reliability because, at the initial time, all the components are working, which is represented by state 0 (Temraz and El-Dmcese 2011). The reliabilities of states 1, 2, and 3 are lower than that of state 0, as these states are more prone to transit into a failure state. The reliability graph is shown in Fig. 8. As time passes, the reliability decreases exponentially and reaches zero after about 140 years.
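Because the non-repairable 2-of-3 system with constant rates admits a closed form, the shape of this reliability curve can be sketched directly. This is a simplification (the case study's λ1 and λ2 actually grow with time, so the true curve decays faster):

```python
from math import exp

# Simplified closed form for the non-repairable 2-of-3 system, with the
# failure rates frozen at their t = 0 values (the case study's lambda_1 and
# lambda_2 actually grow with time, so the true curve decays faster).

def reliability(t, rates=(0.03, 0.03, 0.01)):
    p = [exp(-r * t) for r in rates]   # survival probability of each component
    q = [1.0 - x for x in p]
    # operational if all three components work or exactly one has failed
    return (p[0] * p[1] * p[2]
            + q[0] * p[1] * p[2]
            + p[0] * q[1] * p[2]
            + p[0] * p[1] * q[2])
```

Here `reliability(0)` is exactly 1 and the curve decays monotonically toward zero, matching the qualitative shape of Fig. 8.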

Fig. 8
figure 8

Reliability of the system

The reliability also depends greatly on various characteristics of the system. Software or specialized hardware monitoring can improve overall system reliability, and fault-tolerance features are added to the safety–critical system to tolerate various faults. In this case, the availability measure shows the overall availability of the system, since several failures and repairs occur during the operation time. Fault-tolerance features can eliminate or reduce service disruption whenever equipment fails by providing alternate routing and restoring lost connections (Wang 2004).

7.3 Availability

For the availability analysis, the time the system remains operational is calculated. The system transition matrix contains both repair and failure transitions (Khvatskin and Frenkel 2017). The reward matrix and transition matrix are defined in subsection 5.2, and the set of differential equations (3) is used for the calculation. Table 4 shows the accumulated reward for each state, i.e., the amount of time spent in it.

Table 4 Availability of the system

The system's initial state \({V}_{1}(t)\) shows that all the components are working. Using the differential Eqs. 3, we obtain the average accumulated time spent in the working states (Lisnianski and Frenkel 2009). As time passes, the availability decreases, as shown in Fig. 9; at steady state, the availability is 0.90. Similarly, the availability of state 2 is less than one, as state 2 is more prone to failure.
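The availability calculation can be sketched by integrating the Kolmogorov forward equations dp/dt = pQ for the repairable chain. As before, this is an approximation with the rates frozen at t = 0, so the steady value it yields (about 0.89) only approximates the paper's 0.90:

```python
from itertools import product

# Sketch of the availability analysis: build the repairable 8-state generator
# (rates frozen at t = 0, as an approximation) and integrate the Kolmogorov
# forward equations dp/dt = p Q by forward Euler.
lam = [0.03, 0.03, 0.01]
mu = [0.10, 0.10, 0.05]

states = list(product([1, 0], repeat=3))
n = len(states)
Q = [[0.0] * n for _ in range(n)]
for i, s in enumerate(states):
    for k in range(3):
        f = list(s)
        f[k] = 1 - f[k]
        Q[i][states.index(tuple(f))] = lam[k] if s[k] == 1 else mu[k]
    Q[i][i] = -sum(Q[i])

p = [1.0] + [0.0] * (n - 1)        # initial state: all components up
dt, T = 0.01, 200.0
for _ in range(int(T / dt)):
    p = [p[j] + dt * sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

# probability mass on the 2-of-3 operational states after the transient
availability = sum(p[i] for i, s in enumerate(states) if sum(s) >= 2)
# settles near 0.89 with these frozen rates (the paper reports about 0.90)
```

Forward Euler conserves total probability here because each row of Q sums to zero, so `sum(p)` stays at 1 throughout the integration.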

Fig. 9
figure 9

Availability of the system vs. time

7.4 Performance

Performance has various parameters; here, the average available throughput and the time spent at different performance levels are considered. These are discussed in the following subsections.

7.4.1 Average available throughput

Subsection 5.3 presents a system with three different generating and demand capacities. Demand and capacity are each represented by a stochastic process, as shown in Fig. 3. For calculating the available throughput, these two stochastic processes are combined into a single stochastic process (Fig. 4). The resulting process has nine states, whose failure and repair rates are given below. For any state, if the demand does not exceed the generating capacity, it is considered an acceptable state; otherwise, it is a failed state. The numerical base data are taken from (Lisnianski 2007); we extended the model from two transitions to three. Below are the transition matrices for the generating throughput (\({a}_{ij}\)) and the demanded throughput (\({b}_{ij}\)), followed by the combined transition matrix.

$${a}_{ij}=\left[\begin{array}{ccc}-500& 100& 400\\ 200& -1000& 800\\ 1& 10& -11\end{array}\right]$$
$${b}_{ij}=\left[\begin{array}{ccc}-547& 156& 391\\ 900& -1110& 210\\ 1000& 110& -1110\end{array}\right]$$

The combined stochastic transition matrix C is based on matrices a and b.

$$C_{{ij}} = \left[ {\begin{array}{*{20}c} { - 1047} & {156} & {391} & {100} & 0 & 0 & {400} & 0 & 0 \\ {900} & { - 1610} & {210} & 0 & {100} & 0 & 0 & {400} & 0 \\ {1000} & {110} & { - 1610} & 0 & 0 & {100} & 0 & 0 & {400} \\ {200} & 0 & 0 & { - 1547} & {156} & {391} & {800} & 0 & 0 \\ 0 & {200} & 0 & {900} & { - 2110} & {210} & 0 & {800} & 0 \\ 0 & 0 & {200} & {1000} & {110} & { - 2110} & 0 & 0 & {800} \\ 1 & 0 & 0 & {10} & 0 & 0 & { - 558} & {156} & {391} \\ 0 & 1 & 0 & 0 & {10} & 0 & {900} & { - 1121} & {210} \\ 0 & 0 & 1 & 0 & 0 & {10} & {1000} & {110} & { - 1121} \\ \end{array} } \right]$$
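Because the generating and demand processes are independent, the combined generator is their Kronecker sum, C = a ⊗ I + I ⊗ b, with combined state (i, j) mapped to index 3(i - 1) + j. The sketch below (an illustrative reconstruction, not the authors' code) rebuilds C entry by entry from a and b, with the diagonals of b set so that every row of a generator sums to zero:

```python
def kron_sum(A, B):
    # Generator of two independent CTMCs run jointly:
    # C = A (x) I + I (x) B, combined state (i, j) -> row index i*m + j.
    n, m = len(A), len(B)
    C = [[0.0] * (n * m) for _ in range(n * m)]
    for i in range(n):
        for j in range(m):
            r = i * m + j
            for k in range(n):
                C[r][k * m + j] += A[i][k]   # generating-capacity transitions
            for k in range(m):
                C[r][i * m + k] += B[j][k]   # demand transitions
    return C

a = [[-500, 100, 400], [200, -1000, 800], [1, 10, -11]]
b = [[-547, 156, 391], [900, -1110, 210], [1000, 110, -1110]]
C = kron_sum(a, b)   # 9x9 combined generator; every row sums to zero
```

Each off-diagonal entry of C changes exactly one of the two coordinates, which is why the combined matrix contains only "horizontal and vertical" transitions, as noted for Λ earlier.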

The throughput generating levels are L = {70, 89, 100}, and the demand throughput levels are D = {80, 85, 90} (Lisnianski 2007). When these values are substituted into the transition matrix represented in Sect. 6.3, the states where the demand is satisfied (acceptable states) are 4, 5, 7, 8, and 9, and the failed states are 1, 2, 3, and 6.

To find the average available throughput \(A(T)\) according to the introduced approach, we present the reward matrix w. Using differential Eq. 1, nine differential equations are generated. The initial state is the one where demand is minimum and generating capacity is high; in our case, this is state 7, so state 7 shows the available throughput of the system. Table 5 shows the accumulated reward of the states for the available throughput.
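This computation can be sketched end to end: integrate the forward equations for the combined nine-state chain starting from state 7 and accumulate the probability of the acceptable states. The matrices below reuse the rates above (with row-sum-zero diagonals), so the resulting average is an approximation of the paper's figure, not a reproduction of it:

```python
# End-to-end sketch of A(T): integrate the forward equations dp/dt = p C for
# the combined nine-state chain from the initial state 7 (highest capacity,
# lowest demand) and accumulate the probability of the acceptable states
# {4, 5, 7, 8, 9} (0-based indices {3, 4, 6, 7, 8}).
a = [[-500, 100, 400], [200, -1000, 800], [1, 10, -11]]
b = [[-547, 156, 391], [900, -1110, 210], [1000, 110, -1110]]

C = [[0.0] * 9 for _ in range(9)]
for i in range(3):
    for j in range(3):
        for k in range(3):
            C[3 * i + j][3 * k + j] += a[i][k]
            C[3 * i + j][3 * i + k] += b[j][k]

acceptable = {3, 4, 6, 7, 8}
p = [0.0] * 9
p[6] = 1.0                         # state 7 in the paper's 1-based numbering
dt, T, V = 1e-4, 1.0, 0.0
for _ in range(int(T / dt)):
    V += dt * sum(p[s] for s in acceptable)
    p = [p[j] + dt * sum(p[i] * C[i][j] for i in range(9)) for j in range(9)]

avg_throughput = V / T             # about 0.991 here; the paper reports ~0.9917
```

The small step size is needed because the transition rates are of order 10^3, which makes the chain relax to its steady state almost immediately.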

Table 5 Average available throughput

The table reveals that the available throughput is approximately 0.9917 as time passes. The throughput decreases slightly, as shown in Fig. 10; the range of the y-axis is (0.991745, 0.991775), with the y-axis values normalized. The available throughput remains unchanged after 20 years.

Fig. 10
figure 10

Average available throughput

7.4.2 Time spent at different performance levels

Complex systems work at different performance levels. How much time the system spends in the different working states is also an important measure of the performability of a safety–critical system (Januzaj et al. 2009). In this section, the system is classified into four performance levels (high, medium, low, and failed). The high-performance state is state 7; the medium-performance states are 4, 8, and 9; the low-performance state is state 5; and states 1, 2, 3, and 6 represent the failed states of the system.

To calculate the high-performance time, we assign the reward \({w}_{77}=1\), with all other rewards set to zero; the accumulated high-performance reward is shown in Table 6. For the medium performance level, \({w}_{44}={w}_{88}={w}_{99}=1\), and the accumulated reward is given in Table 7. For the low performance level, \({w}_{55}=1\), and the accumulated reward is given in Table 8. Similarly, for the time the system spends failed, the rewards of the failed states are set to 1, i.e., \({w}_{11}={w}_{22}={w}_{33}={w}_{66}=1\); the accumulated reward is given in Table 9.
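The reward assignment above amounts to one indicator vector per performance level. A small sketch (with a hypothetical `LEVELS` mapping, not from the cited papers) makes the partition explicit:

```python
# Hypothetical helper making the reward assignment explicit: each performance
# level gets an indicator reward vector over the nine states (1-based labels).
LEVELS = {
    "high": {7},
    "medium": {4, 8, 9},
    "low": {5},
    "failed": {1, 2, 3, 6},
}

def reward_vector(level, n=9):
    # w[s] = 1 while the chain occupies a state belonging to the level
    return [1.0 if s in LEVELS[level] else 0.0 for s in range(1, n + 1)]

# The four levels partition the nine states, which is why the per-level time
# fractions must sum to one at every instant (as the text checks at t = 1).
```

Because the four sets are disjoint and cover all nine states, the four reward vectors sum elementwise to the all-ones vector, which guarantees the per-level time fractions always add up to one.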

Table 6 High-Performance time spent in percentage
Table 7 Medium Performance Level (PL) time spent in percentage
Table 8 Low-Performance time spent in percentage
Table 9 Failed time spent in percentage

To calculate the percentage of time spent at the different performance levels, the total accumulated reward is divided by the time t. Table 6 shows that the high-performance percentage of time spent at the initial time is 0.629015. As time passes, the high-performance probability decreases and eventually reaches a steady state, as shown in Fig. 11. Similarly, the time spent at the medium, low, and failed levels is given in Tables 7, 8, and 9.

Fig. 11
figure 11

Time spent at the different performance levels

The time fraction at the medium performance level is approximately 0.3617, at the low level 0.001185, and in the failed states approximately 0.00824. The sum of the probabilities of all performance levels equals one at each time instant; e.g., at time instance 1, 0.629015 + 0.361573 + 0.001184 + 0.008228 = 1.

The high, medium, and low performance levels show that the system is operational; together they give the system's availability calculated in Sect. 7.4.1, while the failed-state fraction shows the system's unavailability during operation.

The corresponding line graph for the different performance-level probabilities is given in Fig. 11. Initially, the system is at the high performance level, so the high-performance curve in Fig. 11 decreases, while the other levels start at zero probability and increase with time. After some time, all the performance levels reach a steady state. The performability measures in this study show that the system performs with high availability and throughput during the operational time. The primary purpose of this study is to ensure that a safety–critical system accomplishes its tasks within the deadline.

8 Limitations and possible scalability of the proposed approach

Markov models are frequently used in investigations of predictability; for instance, they empirically predict future fields from the present and past, and it has been argued that utilizing Markov models to assess climate sensitivity can help solve some issues arising in time series research due to sampling errors. Although Markov models are widely used, they cannot be derived from deterministic, dynamical models in a precise way, which raises the question of when they are suitable for modelling dynamical systems. Another drawback is the exponential growth in the number of states with rising system complexity, which makes Markov-based state model generation for complex systems more difficult and increases computational resource consumption. This paper introduces a method based on decomposing the target system into independent sub-systems and adopting system-level failure rates of the sub-systems, estimated individually by the developed formulas. A straightforward model of the target system can then be created using the failure rates of the sub-systems.

9 Conclusion and future work

Safety–critical systems should have high performance and reliability. If the system's reliability is improved by adding some mechanism, its performance is affected, and vice versa. So, the combined study of performance and reliability (called performability) is essential and produces significant knowledge for improving overall system quality. In this paper, a combined study of performance and reliability is carried out based on a continuous-time Markov chain with rewards. The performability analysis follows the definition of performability, with dependability and performance as its major constituents. From dependability, the reliability and availability attributes are taken, and the mean number of failures is calculated for reliability. For performance, the overall available throughput of the system is calculated, and the time spent by the system at the different performance levels during operation has been measured.

Future research can be dedicated to developing a more general dependability model for safety–critical systems. If the state holding time does not follow the exponential distribution, the system cannot be modelled using an ordinary Markov chain; in practice, various systems do not obey the exponential distribution. This problem requires a more sophisticated modelling tool, such as semi-Markov modelling. Another research dimension is automatic performability modelling that selects suitable actions based on the reward. Reinforcement learning is the basic technique that can help address this challenge: in reinforcement learning, the system can make suitable decisions based on the system requirements and the reward.