
1 Introduction

An asset management policy supports high-quality services by tracking inventory status, estimating life-cycle costs and planning financial investments. Asset maintenance covers all technical, administrative and managerial actions during the system life cycle, from preparation of the requirement specification, through realistic, time-bound and evidence-based procurement, installation and maintenance policy decisions, to decommissioning. The key factors to be considered are physical deterioration, obsolescence, operation and system maintenance. Asset management goals must be specific, measurable, actionable, realistic and time-bound. A Failure Reporting, Analysis and Corrective Action System (FRACAS) provides important information from failure analysis and corrective actions, based on management data reviews, for reliability data reports. Quantified Fault Tree Analysis (FTA) based on an efficient FRACAS helps to identify fault-prone modules in subsystems.

The main asset management tasks are: preparing a rigid, unambiguous requirement specification; allocating reliability and maintainability to subsystems according to their influence on the system function; calculating system availability; choosing an interchangeable system architecture with redundancy as required; fixing the sample size and acceptance criteria based on life-cycle cost; choosing a preventive or predictive maintenance policy; deciding between repair and replacement; locating spare parts; and maintaining a documented failure reporting, analysis and corrective action system. Supply contracts must include incentives for improved, documented reliability data. Asset management for safety-critical systems uses risk influencing factor, vulnerability and managerial oversight risk analyses. Acceptance testing should comply with reliability and stress testing targets.

Maintenance combines all administrative, technical and managerial actions during the life cycle and covers fault localization and restoration, enhanced Mean Time Between Failures (MTBF) for repairable and Mean Time To Failure (MTTF) for non-repairable components, reduced Mean Time To Repair (MTTR), resource management with least capital investment, and management of human skill and aptitude. It also reduces the duration of unplanned and planned outages. Asset management must identify the cause, probability and consequence of failures, estimate the life-cycle cost, prioritize the various renewal plans, and award and register extensive performance-based maintenance contracts.

System failures can be due to faults in the design, manufacturing, installation and commissioning, operation, maintenance and decommissioning processes. Asset management policy can be based on international standards such as ISO 55001 [1] and EN 16646 [2].

All the calculations in this paper are based on reliability engineering textbooks such as Introduction to Reliability and Maintainability Engineering by Ebeling [3] and Practical Reliability Engineering by O’Connor [4]. The following flowchart explains the various activities in QRAMS-based asset management.

figure a

2 Quantified RAMS Based Asset Management

2.1 Benefits of Quantified RAMS Management

This technique uses mathematical calculations to reduce the likelihood of frequent failures, downtime duration and capital investment, maintaining proper alignment of maintenance resources. Lifecycle cost calculations help in choosing best asset procurement, maintenance policy with renewal plan, logistics and supply chain management, audit and review of activities. Quantified analysis helps to identify the key performance indicators and mathematically decides whether requirement specifications or design and installation processes are to be modified.

2.2 Quantified Asset Management Plan Attributes

The planning must demonstrate, with actual data, compliance with regulatory standards and must be fully traceable to the RAMS specification goals. Calculations help to compare spare parts procurement and storage policies, to pick the best offer and to judge maintenance contracts. Renewal, i.e. restoring a system to a like-new condition, means replacing a failed component with one from the same population. However, this might not keep the MTTF constant or guarantee service equal to that of the replaced component. All assumptions made during replacement must be thoroughly analyzed and validated.

Treating all system hours as equivalent and all failures as equal ignores potential and likely real age effects. A better and less assuming approach to measure reliability is to analyze the data versus system age, i.e., apply time dependent reliability analysis.

The reliability issues for a system of multiple systems are:

  • The mean number of repairs, by time t

  • The mean repair rate for all systems

  • The expected variation in the mean number of repairs at a given time

  • The distribution of failures across the identical systems.

  • The expected time to 1st repair and kth repair

  • The mean repair cost and whether the trend is changing

  • The adequacy of spare parts

  • Serviceability of the system for repairs.

2.3 Goals of Maintenance Management

Maintenance must be treated as an investment rather than a cost. Sustainable strategic plans in economic, environmental and social dimensions, set the management goals and activities. The maintenance team must strictly fulfill their allotted responsibilities. Maintenance management goals can be classified as primary and secondary:

  • Primary goals cover reliability, fault localization, accessibility and diagnostics. They determine the repair level and the choice between repair and replacement, with the best repair option.

  • Secondary goals cover repair resources with spare parts levels, human factors with competency level, ergonomics.

2.4 Tasks Associated with RAMS Management Plan

  • A sustainable quantified RAMS plan covers preparation of a rigid and unambiguous requirement specification, reliability and maintainability allocation to subsystems, and availability calculation.

  • Choosing interchangeability oriented best system architecture. Procurement policy including fixing sample size and acceptance test duration and acceptance of the best offer from suppliers.

  • Choosing optimum life-cycle cost and adherence to specifications.

  • Adapting the best maintenance practice with adequate spare parts provision

  • Evaluating maintenance cost contract with suppliers, allowing incentive on improved reliability with documented data.

  • Checking for compliance with reliability and maintainability specifications.

3 Quantified Life-Cycle RAMS Activities

3.1 Preparation of Specification

The various requirements included in the specification are related to system functions, operational procedures, subsystem interfaces, installation and commissioning, maintenance with on-site testing, RAM targets, power supply (if needed), system performance metrics, lifetime sustainability design modifications, physical constructions, environmental conditions, diagnostics and documentation.

System requirement specifications must be cohesive, consistent, feasible, relevant, unique, unambiguous and verifiable, and must be validated. The requirement specification for equipment covers many aspects, each of which influences the performance of the system. The following requirements must be complied with:

  • Functional requirements

  • System requirements

  • System operation requirements

  • Procedural requirements

  • Interface requirements, if any

  • RAM requirements

  • Safety requirements, if the system is safety oriented

  • Environmental condition requirements

  • Maintenance requirements

  • Power supply requirements, if any

  • Installation, commissioning and on-site testing requirements

  • Lifetime sustainability requirements

  • Modification requirements

  • Performance requirements

  • Physical construction requirements

  • Diagnostic requirements

  • Documentation requirements.

The RAM specification should cover the target system reliability, availability and maintainability, the subsystem redundancy configuration, the target MTBF or MTTF (to be maximised) and MTTR (to be minimised), inspection intervals with durations, and the system integration RAM qualification test criteria. It must also include reliability and maintainability demonstration clauses as per MIL-HDBK-781 [5] and MIL-HDBK-471 [6], respectively. It must be kept in mind that reliability decreases with time. For example, with a constant failure rate of 3 per \(10^{7}\) h, \({\text{R}}({\text{t}}) = {\text{e}}^{-{\uplambda}{\text{t}}}\) gives a reliability of 0.99737 after one year (8760 h), which comes down to 0.98694 after 5 years (43,800 h).
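The decay of reliability under a constant failure rate can be checked with a few lines of Python. This is a minimal sketch assuming the exponential model R(t) = exp(-λt) and 8760 operating hours per year; the function name `reliability` is illustrative only. Under these assumptions the 0.99737 figure corresponds to about one year of operation.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: continuous operation


def reliability(failure_rate_per_hour, hours):
    """Reliability of an item with a constant failure rate (exponential model)."""
    return math.exp(-failure_rate_per_hour * hours)


failure_rate = 3e-7  # 3 failures per 10^7 h, as in the text

for years in (1, 5):
    r = reliability(failure_rate, years * HOURS_PER_YEAR)
    print(f"after {years} year(s): R = {r:.5f}")
```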

Requirements written in natural language could be interpreted differently by the different professionals engaged in developing a subsystem. Applying formal mathematical models such as Markov models and Petri nets can reduce this problem. While specifications are prepared, care must be taken to avoid over-specification, which might increase the complexity of the system.

3.2 RAMS Allocation

During concept phase of the life-cycle of a complex system, allocations of RAM targets should be specified. When the system reliability is decided, all the sub-systems must be allocated their individual reliabilities in a least cost manner.

Reliability is apportioned between the subsystems on the basis of complexity, criticality, achievable reliability or other factors deemed appropriate. It is usually allocated to the system, subsystem, module and component levels. The calculations depend on the system reliability block diagram. Each subsystem should be allotted an importance index to show its influence on the system reliability. The importance index wi for a particular subsystem is allotted after studying the failure history of similar systems installed and maintained earlier, and in consultation with expert systems, if any. All the constituent systems or subsystems are treated as independent and connected in series from the reliability point of view. Subsystems with a higher failure effect probability should be allocated a higher reliability.

Using the Advisory Group on Reliability of Electronic Equipment (AGREE) method, which allows the component operating time to be less than the system operating time, the individual subsystem failure rate λi is found for electronic subsystems using the following formula,

$$ \uplambda_{\text{i}} = -\frac{1}{{\text{t}}_{\text{i}}}\,\ln\left[ 1 - \frac{1 - ({\text{R}}^{*})^{{\text{n}}/{\text{N}}}}{{\text{w}}_{\text{i}}} \right] $$
(1)

where,

\({\text{t}}_{\text{i}}\): operation time,

\({\text{R}}^{*}\): system reliability,

N: total number of components,

n: number of similar components in the subsystem,

\({\text{w}}_{\text{i}}\): importance index.

Example:

Find the individual subsystem reliabilities for a train control system that must have a reliability R* of 0.99 after a system operation time (ti) of 10,000 h, given the importance index wi of each subsystem. There are 9 axle counters (wi = 0.95), 80 track circuits (wi = 0.85), 140 signals (wi = 0.95), 86 points (wi = 0.95) and 10 power supplies (wi = 1).

Answer:

Total number of units (N) = 325.

The individual failure rates λi are calculated as per Eq. (1). The resulting failure rates and reliabilities after 10,000 h of operation are shown in Table 1.

Table 1 Subsystem failure rates and availability after 10,000 h
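The AGREE allocation of Eq. (1) for the train control example can be sketched as follows. This is an illustrative calculation, not the authors' own code: the function name `agree_failure_rate` and the dictionary layout are assumptions, each subsystem type is treated as one group of n units, and the subsystem reliability is taken as exp(-λi·ti).

```python
import math


def agree_failure_rate(r_system, n_i, n_total, w_i, t_i):
    """Eq. (1): failure rate allocated to one subsystem by the AGREE method."""
    unreliability_share = 1.0 - r_system ** (n_i / n_total)
    return -math.log(1.0 - unreliability_share / w_i) / t_i


R_STAR = 0.99      # required system reliability
T_HOURS = 10_000   # operation time in hours

# subsystem name -> (number of units n, importance index w)
subsystems = {
    "axle counters": (9, 0.95),
    "track circuits": (80, 0.85),
    "signals": (140, 0.95),
    "points": (86, 0.95),
    "power supplies": (10, 1.0),
}

N_TOTAL = sum(n for n, _ in subsystems.values())  # 325 units in total

allocation = {}
for name, (n_i, w_i) in subsystems.items():
    lam = agree_failure_rate(R_STAR, n_i, N_TOTAL, w_i, T_HOURS)
    allocation[name] = (lam, math.exp(-lam * T_HOURS))
    print(f"{name:15s} lambda = {lam:.3e} /h, R(10,000 h) = {allocation[name][1]:.6f}")
```

For an importance index of 1, the allocated reliability reduces to R* raised to the power n/N, which is a convenient sanity check on the implementation.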

However, the following flaws remain. The AGREE method assumes an exponential failure distribution (constant failure rate), does not provide insight into the failure intensity function of the systems, ignores the life characteristics of the systems and subsystems, and cannot be used for mechanical or electromechanical systems. Moreover, it works well only if wi is 1 or very near to 1. For mechanical systems, the Weibull distribution is more suitable, since its shape parameter β captures the failure behaviour of mechanical components over continuous or cumulative operating time.

The relation between availability and the downtime duration per year is shown in Table 2.

Table 2 Availability versus Failure period per year

Availability depends on MTBF, MTTR and the inspection interval. For a periodically inspected system with exponentially distributed failures, it can be calculated, taking the inspection and repair times into account, as per the equation:

$$ {\text{A}}\left( {\text{T}} \right) = \frac{1 - {\text{e}}^{-{\uplambda}{\text{T}}}}{{\uplambda}\left[ {\text{T}} + {\text{t}}_{1} + {\text{t}}_{2}\left( 1 - {\text{e}}^{-{\uplambda}{\text{T}}} \right) \right]} $$
(2)

where T = inspection interval in h, \({\text{t}}_{1}\) = inspection duration in h and \({\text{t}}_{2}\) = MTTR in h.

The availability after a specific time period t can be calculated by the equation below, where \(\upmu = 1/{\text{MTTR}}\) is the repair rate:

$$ {\text{A}}({\text{t}}) = \frac{\upmu}{\uplambda + \upmu} + \frac{\uplambda}{\uplambda + \upmu}\,{\text{e}}^{-(\uplambda + \upmu){\text{t}}} . $$
(3)
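The time-dependent availability can be illustrated with the standard point-availability expression for a single repairable unit with constant failure rate λ and constant repair rate μ = 1/MTTR. The sketch below assumes that form; the numerical inputs (λ = 0.001/h, MTTR = 8 h) are hypothetical and chosen only to show the decay from 1 towards the steady-state value μ/(λ + μ).

```python
import math


def point_availability(failure_rate, mttr_hours, t_hours):
    """A(t) for one repairable unit with constant failure and repair rates."""
    mu = 1.0 / mttr_hours           # repair rate
    total = failure_rate + mu
    return mu / total + (failure_rate / total) * math.exp(-total * t_hours)


# Hypothetical inputs, for illustration only
LAMBDA = 1e-3   # failures per hour
MTTR = 8.0      # hours

for t in (0, 1, 10, 100, 1000):
    print(f"A({t:>4} h) = {point_availability(LAMBDA, MTTR, t):.6f}")

steady_state = (1.0 / MTTR) / (LAMBDA + 1.0 / MTTR)
print(f"steady state = {steady_state:.6f}")
```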

Maintainability depends on MTTR; it must be shown with 95% confidence that repairs are performed within the defined duration. The allowable MTTR can be calculated by the equation,

$$ {\text{MTTR}}_{\text{i}} \le \{ ({1} - {\text{A}}_{\text{i}} )/{\text{A}}_{\text{i}} \} \times {\text{MTBF}}_{\text{i}} $$
(4)

where, Ai = Individual availability.

Example:

A four-component system with constant failure rates has individual MTBFs of 2100, 3200, 5000 and 1700 h, and the system availability is specified to be 0.99. Find the individual MTTRs.

Answer:

The individual availability Ai, where i = 1, 2, 3, 4, is \((0.99)^{1/4}\) or 0.9975.

So, from Eq. (4), MTTR1 ≤ 5.26 h, MTTR2 ≤ 8.02 h, MTTR3 ≤ 12.53 h and MTTR4 ≤ 4.26 h.
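The MTTR limits of this example follow directly from Eq. (4). The sketch below assumes equal availability apportionment across the four series components and, like the hand calculation, rounds the individual availability to four decimals before applying Eq. (4); the function name `mttr_limits` is illustrative.

```python
def mttr_limits(system_availability, mtbfs, digits=4):
    """Equal availability apportionment followed by Eq. (4) for each component."""
    a_i = round(system_availability ** (1.0 / len(mtbfs)), digits)
    limits = [(1.0 - a_i) / a_i * mtbf for mtbf in mtbfs]
    return a_i, limits


a_i, limits = mttr_limits(0.99, [2100, 3200, 5000, 1700])
print(f"individual availability = {a_i}")
for index, limit in enumerate(limits, start=1):
    print(f"MTTR{index} <= {limit:.2f} h")
```

Without the intermediate rounding the limits come out slightly larger (for example about 5.28 h for the first component), so the rounded figures are marginally conservative.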

3.3 Procurement Policy

3.3.1 Type Approval

The procurement policy covers type approval for newly designed systems, cross acceptance for imported systems and suppliers’ assurance. The approval process covers manufacturing, maintenance requirements with spares, the test plan and quality control. Type approval remains valid until the equipment is obsolete, the design is changed or the performance is unsatisfactory. Reliability, and the supply of specific reliability data for the system, must be included in the supply contract along with other performance aspects.

The RAM points to be considered before type approval are:

  • Documented compliance to installation, maintenance and RAM guidelines of the organisation,

  • Interfacing with existing subsystems, supported by risk analysis, for a safety-critical system,

  • System integrity with interchangeability between subsystems procured from different suppliers,

  • Equipment manufacture under an accredited, documented quality control system,

  • Supply of extensive hardware Failure Mode, Effects and Criticality Analysis (FMECA) and Independent Software Verification and Validation reports for computer-based subsystems or systems,

  • If software development was partly done by a third party, a domain expert must be part of the development team.

Suppliers must inform the regulator regarding documented evidence of past or present product approval.

3.3.2 Acceptance Testing

The acceptance policy must be complied with to ensure proper functioning in local environments. Acceptance testing must comply with the reliability targets. Highly Accelerated Life Test (HALT), Highly Accelerated Stress Screening (HASS) and burn-in tests are part of acceptance testing for electronic components. If any software is involved, third-party independent verification and validation of the software also becomes part of the procurement policy. Failure rate calculations under thermal and electrical stress conditions are very important.

For systems with electronic components, reducing the room temperature reduces the failure rate and increases reliability. MIL-HDBK-217F [7] can be used to calculate component failure rates depending on temperature and component quality. For example, operating at 30 °C instead of 45 °C reduces the failure rate of the amplifier-rectifier card of an axle counter used in train control from 8.13884 to 5.7342 per \(10^{6}\) h, i.e. an improvement of 29.5%. If components of better quality are used (naturally a costlier option), the failure rate reduces further to 1.62183 per \(10^{6}\) h, i.e. by 80% relative to the original value. Whether these options are chosen depends on the criticality of the function of the card.

Burn-in does not reduce the failure rate of an electronic subsystem, but increases the lifetime. Components lost during the burn-in test must be discarded and replaced by new ones. For example, a component for an electronic system has a decreasing failure rate of \(0.0005\,({\text{t}}/1000)^{-0.5}\) per year, and a reliability of 0.9 is required at the design life T.

Integrating the failure rate from 0 to T gives \({\text{e}}^{-({\text{T}}/1000)^{0.5}} = 0.9\).

From this,

$$ \begin{aligned} {\text{T}} & = 1000\left\{ -\ln\left( 0.9 \right) \right\}^{2} = 1000 \times \left( 0.10536 \right)^{2} \\ & = 1000 \times 0.0111 = 11.1\;{\text{years}} \\ \end{aligned} $$

When a burn-in period of 6 months (0.5 year) is introduced, with the reliability still being 0.9, the conditional reliability after burn-in is used.

So, \({\text{e}}^{-\left( ({\text{T}} + 0.5)/1000 \right)^{0.5}} /{\text{e}}^{-\left( 0.5/1000 \right)^{0.5}} = 0.9.\)

From this,

$$ \begin{aligned} {\text{T}} & = 1000\left\{ 0.10536 + 0.02236 \right\}^{2} - 0.5 \\ & = 1000\left\{ 0.12772 \right\}^{2} - 0.5 = \left( 1000 \times 0.01631 \right) - 0.5 \\ & = 16.31 - 0.5 = 15.81\,{\text{years}}. \\ \end{aligned} $$

Thus, the designed life of the component is increased by (15.81 − 11.1) or 4.71 years.
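The burn-in calculation can be reproduced numerically. The sketch below assumes that the stated failure rate integrates to the reliability function R(t) = exp(-(t/1000)^0.5), with t in years, and that the design life after burn-in is defined by the conditional reliability R(T + T0)/R(T0); the function name `design_life` and its default parameters are illustrative.

```python
import math


def design_life(target_reliability, burn_in_years=0.0, scale=1000.0, shape=0.5):
    """Service years after burn-in at which conditional reliability equals the target.

    Assumes R(t) = exp(-(t / scale) ** shape), t in years.
    """
    hazard_at_burn_in = (burn_in_years / scale) ** shape
    hazard_at_end = -math.log(target_reliability) + hazard_at_burn_in
    return scale * hazard_at_end ** (1.0 / shape) - burn_in_years


without_burn_in = design_life(0.9)
with_burn_in = design_life(0.9, burn_in_years=0.5)

print(f"design life without burn-in: {without_burn_in:.2f} years")
print(f"design life with 0.5-year burn-in: {with_burn_in:.2f} years")
print(f"gain: {with_burn_in - without_burn_in:.2f} years")
```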

For software oriented systems, both static and dynamic testing are to be implemented during verification and validation task. Verification ensures that the output of a system life-cycle phase correctly reflects the inputs of that phase. It is performed for requirements, functional design, internal design, and coding phases. Validation checks whether the system matches user requirements. It covers software requirement review, software integration testing and system acceptance testing.

Maximum likelihood estimation of failures is carried out before asset acceptance, on a randomly selected sample of specified size. Suppose N samples are tested and X failures are observed by time T. If X is much less than the specified allowable number of failures, the material is accepted. The duration of acceptance testing is fixed by the asset managers.

Suppliers must inform the regulator regarding documented evidence of planned or current acceptance status in other similar organizations and report any non-compliance and any change in conditions of use. They must state that their offers meet the required RAM specifications, defined by the asset managers.

3.3.3 Sample Size

Asset managers fix the sample size depending on the margin of error, the confidence level (Z-score) and the standard deviation. It is calculated as per the equation

$$ {\text{Sample}}\;{\text{Size}} = \left( {\text{Z-score}} \right)^{2} *{\text{StdDev}}*\left( {{1} - {\text{StdDev}}} \right)/\left( {{\text{margin}}\;{\text{of}}\;{\text{error}}} \right)^{2} $$
(5)

Z-scores for the most common confidence levels:

90%: Z Score = 1.645,

95%: Z Score = 1.96, and

99%: Z Score = 2.576.

Example:

Find the sample size for a 95% confidence level, a standard deviation of 0.5 and a margin of error of ± 5%.

Answer:

The sample size would be calculated as per Eq. (5) and is

$$ \begin{aligned} & \left( {\left( {1.96} \right)^2 \times 0.5\left( {0.5} \right)} \right)/\left( {0.05} \right)^2 \\ & \quad = \left( {3.8416 \times 0.25} \right)/0.0025 \\ & \quad = 0.9604/0.0025 \\ & \quad = 384.16 \\ \end{aligned} $$

So, 385 samples are needed.
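Eq. (5) and the rounding-up step can be sketched in Python as follows. The lookup table `Z_SCORES` and the function name `sample_size` are illustrative assumptions; the Z-scores are the three values listed above.

```python
import math

# Z-scores for the common confidence levels listed in the text
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(confidence, std_dev, margin_of_error):
    """Eq. (5), rounded up to the next whole sample."""
    z = Z_SCORES[confidence]
    return math.ceil(z ** 2 * std_dev * (1.0 - std_dev) / margin_of_error ** 2)


n_samples = sample_size(0.95, 0.5, 0.05)
print(f"required sample size: {n_samples}")
```

Rounding up rather than to the nearest integer keeps the achieved margin of error within the requested one.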

Since subsystems may be procured from different suppliers in future, interchangeability must be there for replacement. Interchangeability means that system components can be procured from any supplier and replace any legacy component without any substantial change in functionality or performance. It also allows the system to adapt to technology evolutions without significant modifications to its architecture.

3.3.4 Total Life-Cycle Cost

Before an offer from a supplier is accepted, the total life-cycle cost of each offer, as stated by the suppliers, must be calculated and compared to find the best option. Annuity factors must be taken into consideration. Acquisition cost is only the visible part of the largely submerged life-cycle cost, which is calculated using the equation

$$ \begin{aligned} & {\text{Life}}\,{\text{cycle}}\,{\text{cost}} = ({\text{Acquisition}}\,{\text{cost}} \times {\text{No}}.\,{\text{of}}\,{\text{Units}}) + ({\text{Annuity}} \times {\text{Units}}) \\ & \quad [{\text{op}}.{\text{cost}} + \left( {{\text{op}}.{\text{time}} \times {\text{failure}}\,{\text{cost}}} \right)/{\text{MTBF}}] + {\text{fixed}}\,{\text{repair}}\,{\text{labour}}\,{\text{cost}} \\ \end{aligned} $$
(6)

Example:

Choose the better option from two offered Designs for supplying 10 units of a subsystem having a lifetime of 15 years and an annuity of 9.125. Offer A has acquisition cost ₹2.70 lakh, failure rate 0.00833/h, MTTR 8 h, availability 0.9285, repair cost ₹2.0 lakh/yr, operating cost ₹65,000/yr and Offer B has acquisition cost ₹3.15 lakh, failure rate 0.00677/h, MTTR 6 h, availability 0.993, repair cost ₹27,650/h, operating cost ₹65,000/yr. Choose the better offer from lifecycle cost angle.

Answer:

Applying Eq. (6), we get lifecycle cost for system A is ₹14,896,333 and system B is ₹19,108,344. So, though the acquisition cost is higher in system B, it would be having less lifecycle cost.
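The structure of Eq. (6) can be sketched as a small function for comparing offers. The parameter names are assumptions made for illustration, and the input values below are hypothetical; they are not the data of offers A and B, and the sketch does not attempt to reproduce the figures quoted above.

```python
def life_cycle_cost(acquisition_cost, units, annuity, operating_cost,
                    operating_time, failure_cost, mtbf, fixed_repair_labour_cost):
    """Eq. (6): acquisition plus discounted recurring cost plus fixed repair labour."""
    # Yearly recurring cost per unit: operation plus expected failure cost
    recurring_per_unit = operating_cost + operating_time * failure_cost / mtbf
    return (acquisition_cost * units
            + annuity * units * recurring_per_unit
            + fixed_repair_labour_cost)


# Hypothetical offers, for illustration only
offers = {
    "offer X": dict(acquisition_cost=100.0, units=2, annuity=10.0, operating_cost=5.0,
                    operating_time=1000.0, failure_cost=2.0, mtbf=500.0,
                    fixed_repair_labour_cost=50.0),
    "offer Y": dict(acquisition_cost=120.0, units=2, annuity=10.0, operating_cost=5.0,
                    operating_time=1000.0, failure_cost=2.0, mtbf=2000.0,
                    fixed_repair_labour_cost=50.0),
}

costs = {name: life_cycle_cost(**data) for name, data in offers.items()}
best_offer = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: life-cycle cost = {cost:.1f}")
print(f"lowest life-cycle cost: {best_offer}")
```

In this made-up comparison the offer with the higher acquisition cost wins because its longer MTBF lowers the recurring failure cost.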

The supplier must also provide documented support for life cycle maintenance cost after the warranty period and ensure critical spare part supply. The deliverables from system assurance are supplier's requirement specification compliance report, system assurance capability proof, design approval documents, system assurance report and system assurance audit report. Finally, asset management audits must be performed to check whether the specified strategies are being followed.

3.4 System Architecture for Electronic Systems

Computer-based system architecture is divided into hardware, software and power supply. In addition, systems connected in a network depend on the data transmission system as well.

Redundancy is used to increase availability. To avoid common mode failures, redundant elements must work independently of each other. Redundancy can be achieved by hardware, by software or by time. In the last case, inputs are fed to all the processors but with a time lag between them, to avoid any glitch or spike in the inputs. Redundant elements appear in parallel in the reliability block diagram. Redundancy can be implemented in different configurations. For a high MTBF and improved availability, the 2-out-of-3 mode of redundancy is chosen, where inputs are fed to three processors connected in parallel and outputs are passed through a voter circuit with a facility for reading back the output to check whether the intended output is actually available. Even if one processor fails, the system remains available, as the other two processors give the same output. The chosen redundancy configuration must be mentioned in the specification for tender documents to avoid offers with different configurations.

Though the system reliability reduces due to redundant components, the quality of service increases due to better availability.

3.5 Maintenance Policy

3.5.1 Managing the Maintenance Process

Reliability and availability depend on managing the maintenance process. All maintainability criteria including logistics and EMC related hazards must be considered. Inter-department interfacing guidelines must be strictly followed. MTTR comprises preparation time, verification time, fault localization time, parts procurement time, logistics time, final testing time and over all administrative time. MTTR must comply with 95% Confidence that repairs are performed within defined duration.

Maintenance can basically be classified as corrective, preventive and predictive. Corrective or reactive maintenance means that action is taken after a failure is reported.

Preventive maintenance takes action before the failure happens. The maintenance interval can be time based, volume-of-work based or condition based. Inspection interval and duration play vital roles in preventive maintenance. It needs properly trained personnel, regular inspection and service, and regular record keeping.

Since preventive maintenance also keeps the system down during maintenance, a degraded condition based predictive maintenance is the best option. Maintenance resources should be stored in three levels—depot, supervisor level and on-site maintenance persons.

3.5.2 Optimum Inspection Interval

The optimum inspection interval must be calculated from the failure rate, inspection duration and repair time. For example, an axle counter has a constant failure rate of 0.0000971/h (97.1 per \(10^{6}\) h). Any component found faulty during the periodic inspection, which takes 1 h, is restored. The repair or replacement time is 8 h in the worst case. The availability for a given inspection interval can be calculated by the equation,

$$ {\text{A}}\left( {\text{T}} \right) = \frac{1 - {\text{e}}^{-\lambda {\text{T}}}}{\lambda \left[ {\text{T}} + {\text{t}}_{1} + {\text{t}}_{2}\left( 1 - {\text{e}}^{-\lambda {\text{T}}} \right) \right]} $$
(7)

where \({\text{t}}_{1}\) = inspection duration, \({\text{t}}_{2}\) = MTTR and T = inspection periodicity.

Example:

A system has a constant failure rate of 0.0000971/h. Any failure identified during inspection is rectified. The inspection duration is 1 h and repair or replacement takes 8 h in the worst case. What should be the optimum inspection interval?

Answer:

Let us calculate the system availability at inspection intervals of 96, 168, 240, 336, 504 and 672 h, using Eq. (7). The results are given in Table 3.

Table 3 Inspection interval versus availability

The optimum gap between Periodic Inspections is found to be 168 h.
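The search over inspection intervals can be sketched as below. Two assumptions are made and should be read as such: the failure rate is taken as 9.71e-5 per hour (97.1 per 10^6 h), the reading under which the stated 168 h optimum is reproduced, and the decaying exponential exp(-λT) is used in both the numerator and the denominator of the availability expression.

```python
import math


def inspection_availability(failure_rate, interval, inspect_time, repair_time):
    """Availability of a periodically inspected item with a constant failure rate."""
    p_fail = 1.0 - math.exp(-failure_rate * interval)  # P(failure within one interval)
    cycle_length = interval + inspect_time + repair_time * p_fail
    return p_fail / (failure_rate * cycle_length)


LAMBDA = 9.71e-5     # per hour (assumed reading of the stated failure rate)
T_INSPECT = 1.0      # inspection duration in hours
T_REPAIR = 8.0       # worst-case repair or replacement time in hours

intervals = [96, 168, 240, 336, 504, 672]
availability = {
    T: inspection_availability(LAMBDA, T, T_INSPECT, T_REPAIR) for T in intervals
}
best_interval = max(availability, key=availability.get)

for T in intervals:
    print(f"T = {T:>3} h: A = {availability[T]:.5f}")
print(f"best interval among candidates: {best_interval} h")
```

Short intervals lose availability to frequent inspections, while long intervals lose it to undetected failures, which is why the curve peaks at an intermediate value.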

3.5.3 Adequacy of Spare Parts

The adequacy of spare parts and the location of stores, whether centralized or distributed, are to be specified after calculations to provide maximum availability. For example, for 6 systems with λ = 1 × \(10^{-5}\)/h, a cumulative operating time of 50,000 h and a required reliability ≥ 0.99, the number of spare parts with a centralized store will be 11. For 6 decentralized stores, each would need 10 spare parts. Thus, a centralized store needs fewer spare parts, but availability depends on the supply chain and logistics.

The adequacy of the available spare parts is checked with the cumulative Poisson probability that the number of failures does not exceed the number of spares held.

Example:

Suppose an electronic component has a failure rate of 0.000003/h and the repair shop has procured two spare components. If the designed life of the component is 20 years, what is the probability that the spares would be adequate for 10 such components?

Answer:

The expected number of failures during the lifetime for 10 components is 0.000003 × 10 × 20 × 8760 or 5.256.

The probability of ≤ 2 failures in 20 years is

$$ \begin{aligned} {\text{R}}_{(20)} & = \sum_{{\text{n}} = 0}^{2} {\text{e}}^{-5.256} \left( 5.256 \right)^{\text{n}} /{\text{n}}! \\ & = {\text{e}}^{-5.256} \left[ \left( 5.256 \right)^{0} /0! + \left( 5.256 \right)^{1} /1! + \left( 5.256 \right)^{2} /2! \right] \\ & = 0.005216\,(1 + 5.256 + 13.812768) \\ & = 0.005216 \times 20.068768 = 0.1046787 \\ \end{aligned} $$

It shows that the spares in stock have only about a 10.47% probability of being adequate.
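The Poisson check of spare adequacy can be sketched as follows, assuming 8760 operating hours per year and a failure rate of 0.000003/h as stated in the example; the function name `spares_adequacy` is illustrative.

```python
import math


def spares_adequacy(failure_rate_per_hour, units, years, spares, hours_per_year=8760):
    """Probability that at most `spares` failures occur (cumulative Poisson)."""
    expected_failures = failure_rate_per_hour * units * years * hours_per_year
    return sum(
        math.exp(-expected_failures) * expected_failures ** k / math.factorial(k)
        for k in range(spares + 1)
    )


probability = spares_adequacy(3e-6, units=10, years=20, spares=2)
print(f"probability that 2 spares are adequate: {probability:.4f}")

# Smallest stock that reaches a 95% adequacy target (the target is an assumed example)
stock = 0
while spares_adequacy(3e-6, 10, 20, stock) < 0.95:
    stock += 1
print(f"spares needed for 95% adequacy: {stock}")
```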

3.5.4 Repair and Renewal of Spare Parts

Maintenance using diagnostics and prognostics should be used and maximum availability is to be chosen after comparing repair and renewal of spare parts.

Replacement with a new unit enhances maintainability by reducing the restoration time and the requirements for maintenance skill, documentation and test equipment. For a fixed number of failures and increasing unit cost, repair is preferable. For a fixed unit cost and a reduced number of failures, replacement is advantageous.

Replacement is cheaper when,

$$ {\text{a}}_{\text{d}} + \left( {\text{c}} + {\text{b}}_{\text{d}} \right){\text{f}} \le {\text{a}}_{\text{r}} + \left( {\text{b}}_{\text{r}} + {\text{ck}} \right){\text{f}}, $$
(8)

where,

\({\text{a}}_{\text{d}}\): fixed cost of discarding,

\({\text{a}}_{\text{r}}\): fixed cost of repair logistics,

\({\text{b}}_{\text{d}}\): variable cost to remove and replace a discarded unit,

\({\text{b}}_{\text{r}}\): variable cost to repair a failure,

c: unit cost,

f: number of failures during unit life,

k: % of failures that cannot be repaired.
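The repair-versus-replacement test of Eq. (8) can be sketched as a direct comparison of the two sides of the inequality. All numerical inputs below are hypothetical, and k is entered as a fraction (0.1 for 10%), which is an assumption about how the percentage enters the formula.

```python
def compare_discard_and_repair(a_d, b_d, a_r, b_r, c, f, k):
    """Return both sides of Eq. (8) and whether replacement (discard) is cheaper."""
    discard_cost = a_d + (c + b_d) * f      # left-hand side of Eq. (8)
    repair_cost = a_r + (b_r + c * k) * f   # right-hand side of Eq. (8)
    return discard_cost, repair_cost, discard_cost <= repair_cost


# Hypothetical cost data, for illustration only
discard_cost, repair_cost, replace_is_cheaper = compare_discard_and_repair(
    a_d=1000.0,   # fixed cost of discarding
    b_d=50.0,     # variable cost to remove and replace a discarded unit
    a_r=20000.0,  # fixed cost of repair logistics
    b_r=400.0,    # variable cost to repair a failure
    c=500.0,      # unit cost
    f=10,         # number of failures during unit life
    k=0.1,        # fraction of failures that cannot be repaired
)
print(f"discard: {discard_cost:.0f}, repair: {repair_cost:.0f}")
print("replacement is cheaper" if replace_is_cheaper else "repair is cheaper")
```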

3.5.5 Sustainability and Resilience Policy

A stringent asset sustainability and resilience policy against disruption caused by climate change must be strictly followed, in the most cost-effective manner, in construction and asset renewal activities. To avoid any lapse in on-site maintenance activity, specific checklists, maintenance tools and drawings must be available. Though human errors causing system failures are difficult to analyze quantitatively, proper.

Reliability growth testing must be done to verify the stipulated MTBF or MTTF. This test improves the reliability of the design through root cause analysis of the observed failures to determine specific design modifications, verifying that failure modes have been removed or mitigated without introducing new failure modes. Updated training is mandatory for the on-site personnel concerned.

Cost models for maintenance optimization can be based on a Markov model with an inspection and replacement policy. To avoid any lapse in on-site maintenance activity, specific checklists, maintenance manuals, proper tools, measuring instruments and drawings must be available. Audits of employee competency must be properly planned and carried out at scheduled intervals. While performing scheduled maintenance of the system, compliance with routine tests must be documented. Care must be taken to avoid introducing any new hazard during maintenance tasks and, if one is introduced, to check whether it can be mitigated. Any process change, modification or replacement/repair action must be approved by the competent authority.

3.5.6 Reliability Growth Testing

Reliability growth testing must be done to verify the stipulated MTBF or MTTF. Compliance with the reliability and maintainability specifications must be checked, using the Fisher (F) distribution table at a defined confidence level, to verify that the suppliers’ documented claims are fulfilled.

For example, the specification for a part requires a reliability of 0.95 at 1000 operating hours. In a test of 50 parts, 1 part failed. Is the specification being met?

Answer:

Reliability at 1000 h, R(1000) = 1 − (r/n), where r is the number of failures and n is the sample size for the test.

$$ {\text{R}}_{(1000)} = 1 - \left( 1/50 \right) = 1 - 0.02 = 0.98, $$

For a 95% lower-bound interval, from the F-distribution table with 2(r + 1) = 4 and 2(n − r) = 98 degrees of freedom,

$$ \begin{aligned} {\text{F}}_{0.05,\,4,\,98} & = 2.48\;{\text{and}} \\ {\text{R}}_{\text{L}} & = \frac{1}{1 + \left( 2/49 \right) \times 2.48} = 0.908 \\ \end{aligned} $$

where the factor 2/49 is (r + 1)/(n − r). We are 95% confident that the reliability is at least 0.908.

Since 0.908 is below the required 0.95, we cannot say with certainty that the specification is being met.
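The lower-bound calculation can be sketched as below. It assumes the usual F-distribution form of the lower confidence bound, R_L = 1 / (1 + ((r + 1)/(n − r)) · F), with F read from a table for 2(r + 1) and 2(n − r) degrees of freedom; the table value 2.48 is taken from the example rather than computed, and the function names are illustrative.

```python
def reliability_lower_bound(sample_size, failures, f_value):
    """Lower confidence bound on reliability from a pass/fail test.

    f_value is the tabulated F quantile for 2(r + 1) and 2(n - r)
    degrees of freedom at the chosen confidence level.
    """
    ratio = (failures + 1) / (sample_size - failures)
    return 1.0 / (1.0 + ratio * f_value)


def specification_demonstrated(sample_size, failures, f_value, required_reliability):
    """True only if the lower bound itself reaches the required reliability."""
    return reliability_lower_bound(sample_size, failures, f_value) >= required_reliability


point_estimate = 1 - 1 / 50
lower_bound = reliability_lower_bound(sample_size=50, failures=1, f_value=2.48)
print(f"point estimate = {point_estimate:.2f}, 95% lower bound = {lower_bound:.3f}")
print("demonstrated:", specification_demonstrated(50, 1, 2.48, 0.95))
```

The point estimate of 0.98 exceeds the target, but the decision rests on the lower bound, which is why a larger sample would be needed to demonstrate compliance.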

Once the system has been installed and has worked for a reasonable time period, it must be checked whether the maintenance goal is achieved.

Example:

From a sample of 85 corrective maintenance repairs of a subsystem, 78 were completed within 5 h suppliers contract assured that 90% of repairs would be done within 5 h is the maintenance goal achieved?

Answer:

Sample proportion p* = 78/85 = 0.9176

A 95% lower confidence bound on the proportion is required:

$$ p_L = p^* - z_{\alpha} \sqrt{\frac{p^* (1 - p^*)}{n}} $$

From the table, for 95% confidence, zα = 1.66

$$ \begin{aligned} p_L & = 0.9176 - 1.66 \sqrt{\frac{0.9176 \times 0.0824}{85}} \\ & = 0.8681 \end{aligned} $$

We are now 95% confident that at least 86.81% of repairs are completed within 5 h. Since this lower bound is below the contracted 90%, the goal cannot be considered met.
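The same check can be scripted so that it is repeatable for each reporting period. This is a minimal sketch of the normal-approximation lower bound used in the example; the function name `proportion_lower_bound` is a hypothetical helper, and the critical value 1.66 is simply taken from the text.

```python
from math import sqrt


def proportion_lower_bound(successes: int, n: int, z: float) -> float:
    """Normal-approximation lower confidence bound on a proportion."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * standard_error


# Values from the example: 78 of 85 repairs within 5 h, critical value 1.66.
p_lower = proportion_lower_bound(successes=78, n=85, z=1.66)
contract_target = 0.90
print(f"lower bound = {p_lower:.4f}")
print("maintenance goal demonstrated:", p_lower >= contract_target)
```

Passing the critical value as a parameter keeps the choice of table (normal or t) explicit rather than hidden inside the function.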

3.5.7 Annual Maintenance Contract

When an annual maintenance contract (AMC) is awarded to a supplier, RAMS engineers must check whether the AMC cost is justifiable.

Example:

An item is procured in a batch of 50. Four items fail in the 1st year and 22 need repair within 5 years. The average cost of a repair is ₹1000. If Weibull-distributed failures are assumed, what is the actual cost of repair for the manufacturer?

Answer:

If β is the shape parameter and θ is the scale parameter,

$$ \begin{aligned} 1 - e^{-(1/\theta)^{\beta}} & = 4/50 = 0.08 \;\text{and} \\ 1 - e^{-(5/\theta)^{\beta}} & = 22/50 = 0.44 \end{aligned} $$

Or, \(1/\theta = [-\ln 0.92]^{1/\beta}\) and \(5/\theta = [-\ln 0.56]^{1/\beta}\)

From the two equations,

$$ 5\,[-\ln 0.92]^{1/\beta} = [-\ln 0.56]^{1/\beta} $$

So, raising both sides to the power β gives \(5^{\beta} = \ln 0.56/\ln 0.92 = 6.954\), and

$$ \begin{aligned} \beta & = \log 6.954/\log 5 = 0.8422/0.69897 = 1.205 \;\text{and} \\ 1/\theta & = [-\ln 0.92]^{1/\beta} = [0.0834]^{0.830} = 0.1272 \\ \theta & = 1/0.1272 = 7.86 \end{aligned} $$

Assuming that a failure occurring during the 1st year is covered by warranty, the expected cost per item of a failure in the 2nd year is

$$ 1000\,[e^{-(1/7.86)^{1.205}} - e^{-(2/7.86)^{1.205}}] = 1000\,[0.9200 - 0.8251] \approx \text{₹}\,94.9 $$

Thus, the cost claimed by the supplier is too high.
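The two-point Weibull fit in this example has a closed-form solution, so it is easy to recompute when the failure counts change. The sketch below assumes exactly the setup of the example (cumulative failure fractions observed at two times, and a fixed average repair cost); the function names `fit_weibull_two_points` and `weibull_cdf` are hypothetical helpers introduced for illustration.

```python
from math import exp, log


def fit_weibull_two_points(t1: float, f1: float, t2: float, f2: float):
    """Solve 1 - exp(-(t/theta)**beta) = f at two points for (beta, theta)."""
    h1 = -log(1.0 - f1)  # equals (t1/theta)**beta
    h2 = -log(1.0 - f2)  # equals (t2/theta)**beta
    beta = log(h2 / h1) / log(t2 / t1)
    theta = t1 / h1 ** (1.0 / beta)
    return beta, theta


def weibull_cdf(t: float, beta: float, theta: float) -> float:
    """Probability that an item has failed by time t."""
    return 1.0 - exp(-((t / theta) ** beta))


# Data from the example: 4 of 50 failed by year 1, 22 of 50 by year 5.
beta, theta = fit_weibull_two_points(t1=1.0, f1=4 / 50, t2=5.0, f2=22 / 50)

repair_cost = 1000.0  # average cost of one repair, in rupees
# Year 1 is assumed to be under warranty, so only year-2 failures are costed.
p_fail_year2 = weibull_cdf(2.0, beta, theta) - weibull_cdf(1.0, beta, theta)
expected_cost_year2 = repair_cost * p_fail_year2

print(f"beta = {beta:.4f}, theta = {theta:.4f} years")
print(f"expected repair cost per item in year 2 = {expected_cost_year2:.2f}")
```

Multiplying the per-item figure by the batch size gives a rough benchmark against which a quoted AMC price can be compared.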

3.6 RAM Metrics and Documentation

RAMS managers must have metrics to identify needs for improvement. Metrics are also extremely important for validating the maintenance team's activities. There are dozens of reliability metrics, so it is best to choose a few important KPIs.

Important maintenance metrics are:

  • Number of time-directed task schedules to be performed

  • Number of preventive maintenance tasks identified

  • Number of failure finding tasks identified

  • Number of run to failure tasks identified

  • Total number of preventive maintenance tasks identified

  • Preventive maintenance tasks complied

  • Preventive maintenance labour cost

  • Predictive maintenance labour cost

  • Emergency measure maintenance labour cost

  • Corrective maintenance labour cost

  • Total consumable cost

  • Total maintenance cost

  • Hours of scheduled and unscheduled downtime.

4 Future Strategies

Identification of reliability influencing factors depends on expert judgment applied to actual data collection and analysis. Cloud-based intelligent asset management can therefore implement expert systems, in which the views of several domain experts on failure removal are collected and the best one is adopted. Video conferencing with experts during an emergency can help the on-site maintenance staff. Video surveillance of construction sites can supervise the installation process, and sensor-based monitoring of degraded conditions would help predictive maintenance. Artificial Intelligence with machine learning can help in decision making. A computerized maintenance management system keeps a database of information about all maintenance operations and can produce status reports and detailed summaries of maintenance activities, allowing efficient asset management. This management process can be local or cloud based. Local server-based systems introduce higher costs, complex implementation and constant maintenance of the management system itself.

5 Conclusion

This paper describes the benefits, associated task plans and future strategies in quantified RAM-based management of train control assets. Some points are explained with mathematical equations and examples.