Introduction

It is estimated that 85 % of the total life cycle cost is determined by decisions made during the operation, support and service stage. Hence, the focus is placed on critical deteriorating items so that repairs and replacements can be optimized through condition-based maintenance. Lad and Kulkarni [1] proposed a model for obtaining optimal preventive repair and replacement intervals for a machine tool sub-assembly considering the cost profile. The problem of integrated system and maintenance schedule design over the life cycle, from the perspective of Indian machine tool industries, was explored in [2]. The main objective of reliability centered maintenance (RCM) is to reduce maintenance cost by focusing on the critical functions of the system and avoiding maintenance actions that are not strictly necessary; however, more than 60 % of RCM schemes have failed to be implemented successfully [3]. The risk associated with RCM is quantified in terms of failure frequency and failure consequence. In many cases, failures occur while the system is far from being worn, fatigued or aged, and it has recently been understood that, among the many different system failure characteristics, only a small number of failures are age-related [4]. Hence, there is a need to look into various aspects of maintenance schemes in order to enhance their efficacy.

Eisinger et al. [5] proposed a strategy of using a “percentage-YES” answer leading to probabilistic RCM. Hauge et al. [6] used a risk assessment matrix (comprising three zones, i.e. acceptable, medium and unacceptable levels of risk) integrated with traditional RCM activities and brought out shortcomings in defining the safety risk involved in space shuttle programmes in the USA. Pexa et al. [7] proposed a methodology for the concurrent use of RCM, risk based inspection (RBI) and the safety instrumented functional process (method) for process equipment. Selvik et al. [8] shifted the methodological focus from reliability and probabilities (expected values) to reliability, uncertainty and risk, and named the methodology the risk and reliability centered maintenance (RRCM) scheme. The authors used the broader risk perspective advocated by Aven [9], in which probability is replaced with uncertainty in the definition of risk. Risk is stated as the combination of events and their consequences, together with the uncertainties about the occurrence of those events and their consequences. The uncertainty factors are identified in failure mode, effect and criticality analysis (FMECA) worksheets. They are then assessed with respect to degrees of uncertainty and sensitivity, and their relative importance is estimated. Fonseca [10] introduced the concept of likelihood coefficients for system failure modes in RCM analysis for the prioritization of critical failure modes. The failure modes are screened by fuzzifying the effects of the precipitating factors per failure mode. Three different types of precipitating factors (i.e. critical, important and related) are deliberated in [10]. When the likelihood coefficient of a failure mode is greater than the predetermined threshold value for that failure mode, the failure mode is included in the RCM analysis and maintenance action is taken. The analytic hierarchy process (AHP) has been utilised to estimate the relative worths of items and then to rank them accordingly [11]. The relative worth, being a measure of probability, specifies the probability of occurrence. The critical items of a turbo blower were ranked on the basis of the risk priority number (RPN) technique, using their weightages to obtain the likelihood coefficients for failure modes, which were compared with the threshold RPN to decide on further maintenance activities [12]. It is clear from the above discussion that the RRCM framework provides additional decision support to the traditional RCM process.
In this paper, the authors propose a quantified risk ranking methodology for the prioritization of critical items for a condition-based risk and reliability centered maintenance (CBRRCM) scheme.

Proposed Risk Coefficient (β i ) for Risk Ranking

In order to identify and carry out maintenance actions in time, it is necessary to identify the failure modes of the maintenance significant items (MSI) that are likely to precipitate under the working conditions of the system under scrutiny, and then to prioritise them according to the risk that they pose to the health of the system. The MSI are the potentially critical items with respect to functional failures that also have high failure rates and repair costs [3]. The maintenance significant precipitating factors (MSPF) are the critical failure modes that are likely to precipitate (i.e. develop into certain occurrences), and they are assessed on the basis of functional failures. The commonly used traditional RPN methodology for prioritizing risk for corrective action is not very objective, as its values are obtained by simple multiplication of different combinations of severity, occurrence and detection rankings. Thus, there is fuzziness in the process, from the collection of data to the calculation of results. To address this problem, a quantified risk ranking methodology is suggested that introduces a risk coefficient \(\beta_{i}\) (where i denotes the critical item) for the prioritization of critical items for the condition-based risk and reliability centered maintenance (CBRRCM) scheme. The methodology is explained below.

  1. (a)

    The critical items are identified and MSPF are found out using traditional RPN values estimated through FMEA.

  2. (b)

    Fuzzy methodology is adopted by treating the probability as a fuzzy number, defining reliability in terms of a possibility measure and considering failure as a fuzzy event. The precipitating factors are fuzzified and used as weightages, which are multiplied by the RPN values. Dividing this product by the threshold RPN value gives a coefficient called the likelihood risk coefficient (\(\beta_{1i}\)). It is an extension of the concept introduced by Fonseca [10].

  3. (c)

    The probability component of abstract risk is influenced by uncertainty and sensitivity, besides other factors [9, 13], and the risk can be quantified using these influencing factors and the corresponding cost components. The resulting coefficient is called the abstract risk coefficient (\(\beta_{2i}\)).

  4. (d)

    The authors introduce the hazardous risk coefficient (\(\beta_{3i}\)), which accounts for potential hazards that may occur in the future. The risk is deduced from criteria of consequences for safety, environmental, maintenance and economic risks, with the corresponding cost of the consequences.

  5. (e)

    Characteristic values of \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\) are obtained after a particular test. With a few more tests on the system, the values may change significantly within the controlling range of each coefficient. Random number simulation is used to simulate the random behavior of these characteristic values and to obtain one distinctive value for each coefficient.

  6. (f)

    The risk coefficients \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\) are then statistically added to obtain the risk coefficient \(\beta_{i}\) for each critical item.

    $$\beta_{i} = \beta_{1i} + \beta_{2i} + \beta_{3i}$$
    (1)

  7. (g)

    The final ranking of critical items is estimated based on the relative worth weightages of the critical items.

Likelihood Risk Coefficient (β 1i )

The failure modes whose RPN values are equal to or greater than the threshold RPN value are considered for further analysis, based on the fuzzification of the effects of the precipitating factors per failure mode. All identified precipitating factors are expressed as trapezoidal fuzzy numbers, so that their contributions to the development of a particular failure mode are quantified as fuzzy numbers between 0 and 1, which are taken as the magnitude limits. The slopes of the resulting trapezoidal numbers depend on the fuzzification of the precipitating factors. All the precipitating factors for a failure mode under scrutiny are evaluated based on the assigned weightages. The analytic hierarchy process (AHP) has been used to estimate the relative worths of items and then to rank them accordingly. The relative worth specifies the probability of occurrence, as it is a measure of probability. The comparative weightages are used to obtain the likelihood coefficients for failure modes with respect to the threshold RPN, in order to decide on further maintenance activities. Let \(f(x_{i})_{j}\) be the trapezoidal fuzzy number representing the precipitating factor for the jth MSPF of the ith critical item. The likelihood risk coefficient for the ith critical item is developed as given below [13].

$$\beta_{1i} = \frac{{(RPN)_{i} \sum\limits_{j = 1}^{n} f(x_{i} )_{j} }}{{RPN_{1th} }}$$
(2)

The number of MSPFs is given by n. Each \(f(x_{i})_{j}\) is a fuzzy number between 0 and 1 for the jth MSPF of the ith critical item. \(RPN_{1th}\) denotes the RPN threshold value above which all RPN values correspond to critical items.
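As an illustration of Eq. (2), the following is a minimal Python sketch, not the authors' implementation. The function names, the trapezoid corner points (0, 10, 20, 30) and the two monitored readings are hypothetical; the RPN of 210 and the threshold of 125 are borrowed from the case study later in the paper. The sketch follows Eq. (2) exactly as written, so the threshold RPN is not scaled by any weightage.

```python
from typing import Sequence


def trapezoidal_membership(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership grade in [0, 1] for a trapezoid with feet a, d and shoulders b, c."""
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    if x <= c:
        return 1.0  # plateau
    return (d - x) / (d - c)  # falling edge


def likelihood_risk_coefficient(
    rpn: float, factors: Sequence[float], rpn_threshold: float
) -> float:
    """Eq. (2): beta_1i = RPN_i * sum_j f(x_i)_j / RPN_1th."""
    if rpn_threshold <= 0:
        raise ValueError("threshold RPN must be positive")
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("each fuzzified factor must lie in [0, 1]")
    return rpn * sum(factors) / rpn_threshold


# Hypothetical readings mapped through a hypothetical trapezoid (0, 10, 20, 30).
f1 = trapezoidal_membership(7.0, 0.0, 10.0, 20.0, 30.0)   # on the rising edge
f2 = trapezoidal_membership(26.0, 0.0, 10.0, 20.0, 30.0)  # on the falling edge
beta_1 = likelihood_risk_coefficient(rpn=210, factors=[f1, f2], rpn_threshold=125)
print(f"f1={f1:.2f}, f2={f2:.2f}, beta_1={beta_1:.3f}")
```

A coefficient above 1 indicates that the weighted RPN of the item exceeds the threshold RPN.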

Abstract Risk Coefficient (β 2i )

Risk is defined in many ways under different perspectives, which makes it abstruse, and uncertainty is its most dominant factor. Most scholars believe that uncertainty and risk are two different concepts, that uncertainty gives rise to risk, and that it is vital for decision making [14]. The risk is estimated through potential financial losses in terms of expenditure on servicing, repair and maintenance, including the cost of materials and spare parts, if any, for each of the n MSPFs of a critical item, and can be expressed for the ith critical item as

$$|Risk|_{2i} = \sum\limits_{j = 1}^{n} C_{ji} W_{ji}$$
(3)

where \(C_{ji}\) and \(W_{ji}\) are the financial loss and the relative worth, respectively, of the jth precipitating factor of the ith critical item. A threshold value must also be considered when estimating the relative worth of each critical item. The MSPF values have a threshold based on the RPN values, as discussed earlier. Hence, the threshold value of abstract risk is given by

$$|Risk|_{2th} = C_{2th} \cdot W_{2th}$$
(4)

The abstract risk coefficient, β 2i for ith critical item is given by [13].

$$\beta_{2i} = \frac{{\left| {Risk} \right|_{2i} }}{{\left| {Risk} \right|_{2th} }}$$
(5)
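Equations (3) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and all cost and relative worth figures are hypothetical and are not taken from the case study.

```python
from typing import Sequence


def weighted_risk(costs: Sequence[float], worths: Sequence[float]) -> float:
    """Eq. (3): sum over precipitating factors of financial loss times relative worth."""
    if len(costs) != len(worths):
        raise ValueError("costs and worths must have the same length")
    return sum(c * w for c, w in zip(costs, worths))


def abstract_risk_coefficient(
    costs: Sequence[float],
    worths: Sequence[float],
    threshold_cost: float,
    threshold_worth: float,
) -> float:
    """Eq. (5): item risk divided by the threshold risk of Eq. (4)."""
    threshold_risk = threshold_cost * threshold_worth  # Eq. (4)
    if threshold_risk <= 0:
        raise ValueError("threshold risk must be positive")
    return weighted_risk(costs, worths) / threshold_risk


# Hypothetical item with two precipitating factors (losses in rupees).
beta_2 = abstract_risk_coefficient(
    costs=[4000.0, 2500.0],
    worths=[0.6, 0.4],
    threshold_cost=2000.0,
    threshold_worth=0.5,
)
print(f"beta_2 = {beta_2:.2f}")
```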

Hazardous Risk Coefficient (β 3i )

Some risks have the potential to become hazardous if they occur in the future, and their outcomes are termed ‘hazardous risks’. Hazardous risk is evaluated from four categories, namely safety risk, environmental risk, maintenance cost risk and economic risk, as shown in Fig. 1. Safety risk is a situation in which system failure includes the possibility of injury to personnel or damage to property. Environmental risk refers to a situation in which there is a threat to the environment and/or society. Maintenance cost risk refers to the loss due to downtime of the system and the repair and replacement of items. Besides the risk to humans and the environment, attention should also be paid to the economic impact that often results from any maintenance activity. The financial risks related to a possible interruption of the product output cannot be ignored, even if they are not associated with significant impacts on the other risk categories discussed earlier. The losses caused by interruption of the work output as a result of maintenance downtime may range from small reductions in profit to significant production losses, contract delays and damage to the system. Therefore, it is necessary to include their influence when carrying out risk assessment of maintenance activities. Hence, the economic risk has been considered separately to take care of system failures that cause delays and result in the target time not being met.

Fig. 1 Block diagram for hazardous risk

A method of risk evaluation that considers each failure mode for different categories of risk is proposed in [4]. It uses probabilities of occurrence based on subjective criteria, such as low, medium and high, which makes the analysis qualitative. In the present study, a novel concept for the quantitative evaluation of the different categories of hazardous risk, based on criteria of consequences, is introduced, as explained in Table 1. Each of the criteria for the hazardous risks stated above, e.g. \(a_{i}\), \(b_{i}\) et cetera, can be analysed through AHP to arrive at its relative worth. The criteria of consequences may be named “precipitating criteria”; they involve expenditure on manpower, material and spares (if needed), and “lost time” in the process of achieving targets. Such cost figures may be generated by maintaining a proper record of the various costs associated with each precipitating factor, for all four categories of risk, during any repair work done on the system. The hazardous risks are denoted by the following symbols and abbreviations.

$$\left| {{\text{Safety}}\,{\text{Risk}}} \right|_{i}\,=\,\left| {\text{SR}} \right|_{i}$$
$$\left| {{\text{Environmental}}\,{\text{Risk}}} \right|_{i}\,=\,\left| {\text{ER}} \right|_{i}$$
$$\left| {{\text{Maintenance}}\,{\text{Risk}}} \right|_{i}\,=\,\left| {\text{MR}} \right|_{i}$$
$$\left| {{\text{Economic}}\,{\text{Risk}}} \right|_{i}\,=\,\left| {\text{NR}} \right|_{i}$$
Table 1 Criteria of consequences for hazardous risks and their calculated relative worths

The corresponding critical item is denoted by i in the analysis of RPN values.
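The relative worths referred to above come from AHP, whose calculation is not reproduced in the paper. As a rough sketch only, the following Python function uses the row geometric-mean approximation to Saaty's eigenvector method, which coincides with the eigenvector result when the pairwise comparison matrix is perfectly consistent. The function name and the 3 × 3 comparison matrix are hypothetical.

```python
import math
from typing import List, Sequence


def ahp_relative_worths(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priorities from a reciprocal pairwise comparison matrix.

    Each row's geometric mean is normalised so that the worths sum to 1.
    """
    n = len(pairwise)
    if n == 0 or any(len(row) != n for row in pairwise):
        raise ValueError("pairwise matrix must be square and non-empty")
    if any(value <= 0 for row in pairwise for value in row):
        raise ValueError("pairwise judgements must be positive")
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgements for three criteria of consequences:
# criterion 1 is twice as important as criterion 2 and four times criterion 3.
comparison = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
worths = ahp_relative_worths(comparison)
print([round(w, 3) for w in worths])
```

A full AHP study would also check the consistency ratio of the judgements before using the worths; that step is omitted here for brevity.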

Next, the ‘Safety risk’ category of hazardous risk is to be determined, which is given by

$$\left| {Risk} \right|_{3i}^{\text{SR}} = \sum\limits_{j = 1}^{k} {W_{j} C_{j} }$$
(6)

where k denotes the number of criteria of consequences for ‘safety risk’, and \(W_{j}\) and \(C_{j}\) are the relative worth and the expected financial loss, respectively, for the jth precipitating factor of the ith critical item.

Now, the hazardous risk coefficient for category ‘safety risk’ is determined as

$$\left| {\beta _{{3i}} } \right|^{{SR}} = \frac{{\left| {Risk} \right|_{{3i}}^{{SR}} }}{{\left| {Risk} \right|_{{th}}^{{SR}} }}$$
(7)

where \(\left| {Risk} \right|_{th}^{\text{SR}}\) is the threshold risk for category ‘safety risk’ of hazardous risk.

In a similar manner, \(\left| {\beta_{3i} } \right|^{\text{ER}} ,\,\left| {\beta_{3i} } \right|^{\text{MR}} {\text{and}}\,\left| {\beta_{3i} } \right|^{\text{NR}}\) can be deduced.

Finally, the β 3i can be calculated as

$$\beta_{3i} = \left| {\beta_{3i} } \right|^{\text{SR}} + \, \left| {\beta_{3i} } \right|^{\text{ER}} + \, \left| {\beta_{3i} } \right|^{\text{MR}} + \left| {\beta_{3i} } \right|^{\text{NR}}$$
(8)
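The chain from Eq. (6) to Eq. (8) can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function names, the dictionary layout and every relative worth, cost and threshold value are hypothetical.

```python
from typing import Dict, Sequence, Tuple

CATEGORIES = ("SR", "ER", "MR", "NR")  # safety, environmental, maintenance, economic


def category_risk(worths: Sequence[float], costs: Sequence[float]) -> float:
    """Eq. (6): sum of relative worth times expected financial loss for one category."""
    if len(worths) != len(costs):
        raise ValueError("worths and costs must have the same length")
    return sum(w * c for w, c in zip(worths, costs))


def hazardous_risk_coefficient(
    item: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    thresholds: Dict[str, float],
) -> Dict[str, float]:
    """Per-category coefficients (Eq. 7) and their sum (Eq. 8) under the key 'total'."""
    result: Dict[str, float] = {}
    for cat in CATEGORIES:
        worths, costs = item[cat]
        if thresholds[cat] <= 0:
            raise ValueError(f"threshold risk for {cat} must be positive")
        result[cat] = category_risk(worths, costs) / thresholds[cat]  # Eq. (7)
    result["total"] = sum(result[cat] for cat in CATEGORIES)  # Eq. (8)
    return result


# Hypothetical critical item: (relative worths, expected losses) per category.
item_data = {
    "SR": ([0.5, 0.5], [1000.0, 3000.0]),
    "ER": ([1.0], [500.0]),
    "MR": ([0.25, 0.75], [800.0, 400.0]),
    "NR": ([1.0], [1200.0]),
}
threshold_risks = {"SR": 1000.0, "ER": 1000.0, "MR": 250.0, "NR": 800.0}
beta_3 = hazardous_risk_coefficient(item_data, threshold_risks)
print({name: round(value, 2) for name, value in beta_3.items()})
```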

Estimation of β i

  1. (a)

    Conduct at least 10 tests at various instants of time, as each \(\beta_{i}\) takes different values owing to the working conditions and the variability in MSPF values.

  2. (b)

    Calculate likelihood risk coefficient (β 1i ), abstract risk coefficient (β 2i ) and hazardous risk coefficient (β 3i ) for each critical item.

  3. (c)

    Calculate the probability density function (pdf) of \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\) for each critical item. Plot the pdf and cumulative distribution function (cdf) curves of \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\) for each critical item against the occurrence distributions.

  4. (d)

    Use Monte Carlo simulation for at least 20 random numbers for β 1i , β 2i and β 3i for each critical item using cdf curves.

  5. (e)

    Obtain simulated values for β 1i , β 2i and β 3i for each critical item and add them up to get β i for each critical item. Check that β i values for each critical item follow normal distribution.

  6. (f)

    Draw the probability density function (pdf) of \(\beta_{i}\) for each critical item against the occurrence distribution and estimate its mean value. The mean value gives the value of \(\beta_{i}\) for each critical item.

  7. (g)

    Calculate relative worths of all the critical items and obtain final ranking of critical items. The higher the relative worth, the higher is the ranking of critical item.
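Steps (a) to (g) above can be sketched in Python as follows. Several simplifications are assumptions of this sketch rather than part of the paper: the empirical cdf is built directly on the sorted test values instead of on binned occurrence intervals, Python's `random` module replaces a random number table, the final ordering uses the mean of the simulated values rather than AHP relative worths, and all test values are illustrative.

```python
import random
import statistics
from typing import Dict, List, Sequence


def sample_from_empirical_cdf(observations: Sequence[float], u: float) -> float:
    """Inverse-transform step: smallest observation whose cumulative probability is >= u."""
    ordered = sorted(observations)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if u <= rank / n:
            return value
    return ordered[-1]


def simulate_beta(
    tests: Dict[str, Sequence[float]], draws: int = 20, seed: int = 0
) -> List[float]:
    """Draw each coefficient from its own empirical cdf and add them, as in Eq. (1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(draws):
        totals.append(
            sum(sample_from_empirical_cdf(obs, rng.random()) for obs in tests.values())
        )
    return totals


def rank_items(mean_betas: Dict[str, float]) -> List[str]:
    """Order items from highest to lowest mean risk coefficient."""
    return sorted(mean_betas, key=mean_betas.get, reverse=True)


# Illustrative values from ten tests on one critical item.
item_tests = {
    "beta_1": [4.9, 4.2, 3.5, 5.1, 3.9, 4.4, 4.7, 3.6, 4.0, 4.5],
    "beta_2": [2.1, 1.9, 2.4, 1.7, 2.0, 2.2, 1.8, 2.3, 1.9, 2.1],
    "beta_3": [10.9, 9.8, 11.5, 10.2, 10.6, 11.1, 9.9, 10.4, 10.8, 11.0],
}
simulated = simulate_beta(item_tests, draws=20, seed=1)
print(f"mean beta = {statistics.mean(simulated):.2f}")
print(rank_items({"A": 2.0, "B": 5.0, "C": 3.5}))
```

Using a separate random draw for each coefficient mirrors the step in which the three coefficients are simulated separately before being added.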

A Case Study

The proposed method is explained through a case study on a hyperbaric chamber system (HCS), which provides a safe, reliable and carefully controlled habitable environment (pressure, temperature, breathable gas, etc.) in a closed chamber at pressures above atmospheric. Its mission requirements are the medical treatment of divers suffering from decompression sickness and of patients with high altitude pulmonary oedema or carbon monoxide poisoning. The schematic diagram of the system structure of the HCS relevant to this study is shown in Fig. 2. There are three critical sub-systems, viz. the air, oxygen and fire protection systems. The air system functions continuously during operation, and the oxygen system works intermittently as needed. If an accidental fire occurs inside the chamber facility, the fire protection system functions on demand and air is conditionally supplied through the oxygen system in place of oxygen. The proposed methodology is applied to quantify the risks and rank the critical items in order to finalise the activities of the CBRRCM scheme for the HCS.

Fig. 2 Schematic diagram of HCS

Calculations of Critical RPN

Risk prioritisation is carried out via the RPN method. A failure mode and effect analysis (FMEA) is carried out for the HCS. The detailed analysis is not reported here; only the risk priority number (RPN) values of the identified critical items, as obtained through the FMEA study, are given in Table 2. The threshold for pursuing failures in order to identify critical items may be ascertained by any statistical scale [11]. The RPN value of 125 (of the air filter) is taken as the threshold value above which all items are critical.

Table 2 RPN values and MSPF for critical items

Maintenance Significant Precipitating Factor (MSPF)

The MSPF of items can be derived using the RPN values of the various items of the total system. One item may have one or more MSPF. Table 2 gives the critical items and their maintenance significant precipitating factors (MSPF). The parameters that are monitored to judge their health at an assumed instant are also listed in Table 2. A cause and effect analysis of the hyperbaric chamber system is carried out, and the outcome, in the form of a fishbone diagram, is shown in Fig. 3.

Fig. 3 Cause and effect analysis for ‘HCS fails’

Calculation for Likelihood Risk Coefficient (β 1i )

Table 2 shows the observed values of the characteristic criteria of each MSPF after a particular test. The weightages of the MSPF for critical items, in terms of relative worths, are obtained and shown in Table 1. With subsequent tests of the system, however, the MSPF values may change, giving different values of the likelihood coefficients. Under such circumstances, which usually arise from successive trials or tests on the system, the likelihood risk coefficient of each of the items A, B, C, D and E takes a number of different values. In order to obtain one distinctive value of \(\beta_{1i}\) (where i = A, B, C, D, E) for each item, it is necessary to use random number simulation, popularly known as Monte Carlo simulation. The method is demonstrated by taking a number of tests on all the items. The weightages of the MSPF for the items (for example, \(f\left( {x_{A} } \right)_{1} = 0.7\)) are shown in Fig. 4. One such set of estimates is shown below:

$$\beta_{1A} = \frac{{\left( {RPN} \right)_{A} \left[ {f(x_{A} )_{1} + f(x_{A} )_{2} } \right]}}{{RPN_{1th} \left[ {f(x_{F} )_{1} } \right]}} = \frac{{210 \left( {0.7 + 0.4} \right)}}{125 \times 0.375} = 4.9$$
$$\beta_{1B} = \frac{{\left( {RPN} \right)_{B} \left[ {f(x_{B} )_{1} } \right]}}{{RPN_{1th} \left[ {f(x_{F} )_{1} } \right]}} = \frac{180 \times 0.5}{125 \times 0.375} = 1.92$$

Fig. 4 Weightages of MSPF for critical items

In a similar way, the remaining values are obtained as

$$\beta_{1C} = \, 2.12,\quad \beta_{1D} = \, 0.98\quad {\text{and}}\quad \beta_{1E} = \, 1.91$$
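The arithmetic for items A and B can be checked with a few lines of Python. Note that, unlike Eq. (2), the worked figures divide by the threshold RPN multiplied by the weightage of the threshold item F (0.375); the helper below mirrors that form. The function name is hypothetical, and the inputs are the values quoted in the equations above, with item B contributing its single weightage of 0.5.

```python
from typing import Sequence


def likelihood_coefficient_case_study(
    rpn: float,
    factors: Sequence[float],
    rpn_threshold: float,
    threshold_weightage: float,
) -> float:
    """RPN times summed weightages, divided by threshold RPN times threshold weightage."""
    return rpn * sum(factors) / (rpn_threshold * threshold_weightage)


beta_1a = likelihood_coefficient_case_study(210, [0.7, 0.4], 125, 0.375)
beta_1b = likelihood_coefficient_case_study(180, [0.5], 125, 0.375)
print(round(beta_1a, 2), round(beta_1b, 2))
```

The first value agrees with the reported 4.9 after rounding to one decimal place, and the second reproduces the reported 1.92.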

Calculation for Abstract Risk Coefficient (β 2i )

For each precipitating factor, the relative worth \(W_{ji}\) depends on the degree of uncertainty, degree of sensitivity, degree of importance and overall importance, which vary randomly. The relative worths of these four criteria have been estimated as 0.491, 0.111, 0.067 and 0.331 using Saaty’s method in [13]. The losses due to the abstract risk and hazardous risk consequences for each factor (a1, b1, c1, d1, …, a5, b5, c5, d5) are estimated in rupees on the basis of past experience, and the risk coefficients are calculated. Now, from Eq. (5),

$$\beta_{2A} = \frac{{\left| {Risk} \right|_{2A} }}{{\left| {Risk} \right|_{th} }} = 2.07,\quad \beta_{2B} = \frac{{\left| {Risk} \right|_{2B} }}{{\left| {Risk} \right|_{th} }} = 3.6$$

Similarly, β 2C  = 2.6, β 2D  = 0.64 and β 2E  = 0.25.

Calculation for Hazardous Risk Coefficient (β 3i)

There are four aspects of hazardous risk, which are calculated item-wise from the relative worths and the corresponding expected financial losses. The \(\beta_{3i}\) for each critical item is obtained by adding up the corresponding \(\left| {\beta_{3i} } \right|^{\text{SR}} , \, \left| {\beta_{3i} } \right|^{\text{ER}} , \, \left| {\beta_{3i} } \right|^{\text{MR}}\) and \(\left| {\beta_{3i} } \right|^{\text{NR}}\). Table 1 gives the relative worths of the criteria of consequences for the four categories, obtained using the analytic hierarchy process (AHP) [14]. The calculation procedure for the relative worths is not included in this paper. The ‘safety risk’ coefficients for the critical items are calculated using Eqs. (6) and (7), as given below.

$$\left| {\beta_{3A} } \right|^{SR}\,=\,\frac{{\left| {Risk} \right|_{3A}^{SR} }}{{\left| {Risk} \right|_{th}^{SR} }} \,= \, 2.29\quad \left| {\beta_{3B} } \right|^{SR}\,=\,\frac{{\left| {Risk} \right|_{3B}^{SR} }}{{\left| {Risk} \right|_{th}^{SR} }}\,=\,4.61\quad \left| {\beta_{3C} } \right|^{SR}\,=\,3.39\quad \left| {\beta_{3D} } \right|^{SR}\,=\,0.38\quad {\text{and}}\quad \left| {\beta_{3E} } \right|^{SR}\,=\,0.56$$

The risk coefficient \(\beta_{3A}\) for critical item A is calculated using Eq. (8) and is given in Table 3. Thereafter, \(\beta_{3B}\), \(\beta_{3C}\), \(\beta_{3D}\) and \(\beta_{3E}\) are calculated in a similar manner.

Table 3 Risk coefficients β1A, β2A and β3A

Estimation of β A , β B , β C , β D and β E

The methodology of estimation of β i is explained earlier. The same procedure is applied step by step for the case study taken up in this report.

  1. (a)

    Ten tests are conducted at various instants of time at which the MSPF values are found to vary.

  2. (b)

    The likelihood risk coefficient (\(\beta_{1i}\)), abstract risk coefficient (\(\beta_{2i}\)) and hazardous risk coefficient (\(\beta_{3i}\)) for critical items A, B, C, D and E are obtained for the ten tests, and the results for critical item A are given in Table 3.

  3. (c)

    The occurrence intervals are selected for all critical items for \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\). The density and cumulative probabilities are calculated. The cumulative probabilities (cdf) for all critical items are plotted against occurrences. Figure 5 shows this for item A.

    Fig. 5 Cdf versus number of occurrences for item A

  4. (d)

    Monte Carlo simulation is applied to items A, B, C, D and E separately, with 20 random numbers selected from a random number table [15], to obtain occurrences of \(\beta_{1i} , \beta_{2i}\) and \(\beta_{3i}\). The simulated values of occurrences for critical item A are noted in Table 4. For example, as shown in Fig. 5, the occurrence values of \(\beta_{1A} , \beta_{2A} , \beta_{3A}\) for item A for a random number of 50 are found to be 3.5, 1.9 and 10.9. The simulated values are then checked for normality using the Anderson–Darling test [16]. The distribution becomes normal (i.e. the P value is more than 0.05) when three of the twenty values are neglected for item A. The pdf values for \(\beta_{A}\) are shown along with the mean and standard deviation in Table 4.

    Table 4 Simulated values for item A using random numbers and Pdf curve for βA
  5. (e)

    The pdfs of occurrence of \(\beta_{B}\), \(\beta_{C}\), \(\beta_{D}\) and \(\beta_{E}\) are obtained in a similar manner, and the mean values of \(\beta_{i}\) for each critical item are also estimated.

  6. (f)

    The relative worths of the critical items are calculated, and the results are given in Table 5. Item B is found to have the highest ranking, followed by items A, D, C and E. The results obtained through the proposed method differ from those obtained using RPN values, where the ranking order of critical items is A, B, C, D and E (refer to Table 2) from highest to lowest. Since the RPN methodology of prioritizing critical items is based to a great extent on human judgment, it is likely to contain some personal error. This study shows that the risk coefficient (\(\beta_{i}\)) methodology removes this fuzziness and gives more clarity in risk ranking.

    Table 5 Final ranking of critical items
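The normality check in step (d) can be sketched with the Python standard library. This is an illustrative sketch under assumptions, not the procedure used to produce Table 4: it computes the Anderson–Darling statistic with the small-sample adjustment \(A^{2}(1 + 0.75/n + 2.25/n^{2})\) and compares it with the commonly tabulated 5 % critical value of 0.752 for the case where the mean and standard deviation are estimated from the sample, instead of reporting a P value. The function name and both samples are hypothetical.

```python
import math
import statistics
from typing import Sequence

CRITICAL_VALUE_5_PERCENT = 0.752  # assumed tabulated value for the adjusted statistic


def anderson_darling_normal(sample: Sequence[float]) -> float:
    """Adjusted Anderson-Darling statistic for normality with estimated mean and sd."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    if sd == 0:
        raise ValueError("sample has zero variance")
    std_normal = statistics.NormalDist()
    eps = 1e-12  # keeps the logarithms finite for extreme observations
    cdf = [
        min(max(std_normal.cdf((x - mean) / sd), eps), 1.0 - eps)
        for x in sorted(sample)
    ]
    weighted = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - weighted / n
    return a_squared * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Hypothetical samples: evenly spaced normal quantiles and a strongly skewed series.
near_normal = [statistics.NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [2.0 ** k for k in range(20)]
for name, data in (("near-normal", near_normal), ("skewed", skewed)):
    stat = anderson_darling_normal(data)
    verdict = "not rejected" if stat < CRITICAL_VALUE_5_PERCENT else "rejected"
    print(f"{name}: adjusted A^2 = {stat:.3f}, normality {verdict} at 5 %")
```

A statistic below the critical value corresponds to a P value above 0.05, which is the acceptance criterion quoted in step (d).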

Conclusion

In this paper, the authors discussed risk quantification methods and the evaluation of risks using the suggested decision parameters to rank critical items for condition-based risk and reliability centered maintenance (CBRRCM). The developed mathematical model for risk assessment and for prioritizing critical items should be useful in optimizing financial losses and the timing of maintenance actions. The main points of the paper are summarized below.

  1. (a)

    A new risk coefficient (\(\beta_{i}\)) is proposed for a quantified risk ranking model that prioritizes critical items for initiating actions under the CBRRCM scheme. It is the statistical summation of the likelihood risk coefficient (\(\beta_{1i}\)), abstract risk coefficient (\(\beta_{2i}\)) and hazardous risk coefficient (\(\beta_{3i}\)). In order to arrive at the correct decision, a number of tests are simulated and run through Monte Carlo simulation for each of \(\beta_{1i}\), \(\beta_{2i}\) and \(\beta_{3i}\) separately. The expected values of the decision criteria are obtained as mean values.

  2. (b)

    The hazardous risk is evaluated from four categories, namely safety risk, environmental risk, maintenance cost risk and economic risk. Each category has various criteria of consequences. The hazardous risk of each category is calculated from the expected financial losses and relative worths. This proposition is entirely the authors’ contribution towards the quantification of risk ranking of items.

  3. (c)

    The next logical step is to estimate the degradation values and the time intervals for ranked critical items in order to carry out predictive maintenance tasks. These aspects can be studied further as future work.