
1 Introduction

RAMS is an integrated discipline that includes measures and characteristics for system reliability, availability, maintainability, and safety that are tailored to operational and project objectives. It is commonly used in the operation and engineering of railway systems [1]. RAMS, according to Alstom, must be followed internationally by rolling stock suppliers for the development of Mass Rail Transit Systems (MRTS) [2]. It has become a widely discussed and researched discipline in recent years due to its robustness and flexibility in achieving defined objectives such as service time, safety, and cost limitation. Railway assets are designed for long-term use with high reliability and availability without compromising safety; therefore, asset maintenance must be carried out optimally and cost-effectively. This management tool has the potential to significantly improve the effectiveness and economic competitiveness of railway transportation in comparison to other modes of transportation. Modern railway systems are complex, incorporate multiple technologies, and operate in an environment where it is difficult to pinpoint precise system responses and behaviors. The use of today’s technology, such as computers, microprocessors, interconnected communication, and information technologies, in combination with historically developed electromechanical components, has dramatically increased the complexity of railway systems [3]. A primary goal of RAMS-related activities today is a railway system that is safe, highly dependable, and available, as well as innovative and sustainable. RAMS activities are also critical in this context for extending the lifespan of railway systems. Railway RAMS-related standards require railway manufacturers and operators to install a RAMS management system and verify compliance with specific safety and RAM requirements. While the standards only provide a general framework for RAMS activities, real-world implementation is still being researched [4].

This paper provides a general analysis of RAMS’s impact. The primary goal of this work is to inform engineers, industrialists, and researchers who are interested in RAMS as a promising technology in railway systems. The review covers publications from highly rated journals listed in scientific indexes, including the most recent articles.

As technology advances, the environment changes, and consumer needs change, railway system designers and operators are constantly upgrading their various operational tasks. A secure and dependable network with sufficient capacity and availability is required. A railway system’s goal is to achieve a specific level of rail traffic in a given time frame while remaining safe and within budget. The Railway RAMS procedure determines the system’s confidence in achieving this goal. Railway RAMS has a significant impact on customer service quality [5]. Table 1 shows the components of RAMS:

Table 1 RAMS components

Common tools used to assess RAMS include Fault Tree Analysis (FTA) [7], Failure Mode and Effects Analysis (FMEA) [8], and the Reliability Block Diagram (RBD) [9], among many others.

2 Railway Reliability

The reliability of a product is closely connected to its quality. This criterion is one of the most important considerations during the various stages of product design, testing, and operation [10]. Reliability is a function of time, and it decreases as the period of operation lengthens. Because the cost of procuring each railway asset is very high, a high reliability product or system over a long period of time is required.

Aside from the technical definition, railway reliability can also be defined in terms of train operation. Vromans, M. (2005) investigated the reliability of a railway using train punctuality [11]. A train is considered reliable when it can run properly all of the time, allowing goods and other services to be delivered on time.

According to Durivage, M. A., reliability is the “probability that an item will perform a required function without failure under stated conditions for a specific period of time” [12]. According to this interpretation, studying the reliability of a system or component entails investigating its failure behavior. Failure events are collected, tabulated, and plotted so that the relevant information can be understood and computed statistically.

To investigate the probability of failure, a population of products or systems must be observed over time. Gerokostopoulos et al. (2015) proposed an estimation approach as well as a risk control approach for calculating sample size for a reliability study [13]. A Probability Density Function (PDF) of a failure event could be developed with an adequate number of samples. The PDF is built from the Time To Failure (TTF) of the samples, which is the time it takes for an individual sample to fail. The collected data is known as life data, and it is used to calculate the product’s lifespan. In Life Data Analysis (LDA), a few failure distributions are known to fit most collected data: the exponential, lognormal, and Weibull distributions [14].

A histogram of failure numbers versus observed time is plotted, and the line of best fit is calculated. This TTF distribution, f(t), is used to calculate the reliability, R(t). The cumulative distribution function (CDF) is found by integrating the fitted TTF distribution. In this case, the CDF is also known as the probability of failure, F(t). R(t) is defined as the complement of F(t), that is, R(t) = 1 − F(t) [15, 16].
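As a minimal sketch of the relationship between F(t) and R(t), the following Python snippet estimates both directly from a complete (uncensored) set of observed times to failure, without fitting a distribution. The function name and the sample values are hypothetical and are not taken from the cited studies.

```python
def empirical_failure_and_reliability(times_to_failure, t):
    """Estimate F(t) and R(t) as simple fractions of a complete sample.

    F(t) is the fraction of units that failed at or before time t,
    and R(t) = 1 - F(t) is the fraction still working.
    """
    if not times_to_failure:
        raise ValueError("at least one time to failure is required")
    failed = sum(1 for ttf in times_to_failure if ttf <= t)
    f_t = failed / len(times_to_failure)
    return f_t, 1.0 - f_t


# Hypothetical life data in operating hours (illustrative values only).
ttf_hours = [500.0, 800.0, 1000.0, 1200.0, 1500.0]
f_1000, r_1000 = empirical_failure_and_reliability(ttf_hours, 1000.0)
print(f"F(1000 h) = {f_1000:.2f}, R(1000 h) = {r_1000:.2f}")
```

With real life data, censored observations (units that had not yet failed when observation ended) would need a dedicated estimator, which this sketch does not cover.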

The mean time to failure (MTTF) is the expected value of the TTF distribution and can be calculated from the PDF. For a constant failure rate, the failure rate λ is then the reciprocal of the MTTF:

$$ MTTF = \int_{0}^{\infty } {t\,f\left( t \right)dt} $$
(1)
$$ \lambda = \frac{1}{MTTF} $$
(2)
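In practice, the MTTF is often estimated as the sample mean of the observed times to failure, and the failure rate then follows from Eq. 2. The snippet below is a minimal sketch under that assumption; the function name and data values are hypothetical.

```python
from statistics import mean


def mttf_and_failure_rate(times_to_failure):
    """Return (MTTF, failure rate) estimated from observed times to failure.

    MTTF is estimated as the sample mean; the failure rate is its
    reciprocal, which is only meaningful for a constant failure rate.
    """
    if not times_to_failure:
        raise ValueError("at least one time to failure is required")
    mttf = mean(times_to_failure)
    return mttf, 1.0 / mttf


# Hypothetical life data in operating hours (illustrative values only).
ttf_hours = [1200.0, 800.0, 1500.0, 1000.0, 500.0]
mttf, failure_rate = mttf_and_failure_rate(ttf_hours)
print(f"MTTF = {mttf:.1f} h, failure rate = {failure_rate:.6f} per hour")
```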

Because of its constant failure rate, an exponential pdf is used to predict reliability when there is no historical data on the operation of a system [17].

The failure rate is typically high at the beginning and end of a system’s operation. At the beginning, when all the components start to work together, some minor adjustments may be required. After a long period of operation, the components begin to wear out and contribute to a higher failure rate, as shown in the bathtub curve [18, 19]. The useful operational period, which lies between the adjustment and wear-out periods, is characterized by a low and constant failure rate [20]. The PDF for exponentially distributed failure is:

$$ f\left( t \right) = \lambda e^{ - \lambda t} $$
(3)
$$ F\left( t \right) = \int_{0}^{t} {f\left( \tau \right)d\tau } = 1 - e^{ - \lambda t} $$
(4)
$$ R\left( t \right) = 1 - F\left( t \right) = e^{ - \lambda t} $$
(5)
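Equations 3 to 5 can be evaluated directly. The following Python sketch implements them for a constant failure rate; the function names and the numerical failure rate are illustrative assumptions, not values from the reviewed literature.

```python
import math


def exponential_pdf(t, failure_rate):
    """f(t) = lambda * exp(-lambda * t), Eq. 3."""
    return failure_rate * math.exp(-failure_rate * t)


def failure_probability(t, failure_rate):
    """F(t) = 1 - exp(-lambda * t), Eq. 4."""
    return 1.0 - math.exp(-failure_rate * t)


def reliability(t, failure_rate):
    """R(t) = exp(-lambda * t), Eq. 5."""
    return math.exp(-failure_rate * t)


# Hypothetical component with a failure rate of 0.001 failures per hour.
lam = 0.001
for hours in (100.0, 1000.0, 5000.0):
    print(f"t = {hours:6.0f} h  R(t) = {reliability(hours, lam):.4f}")
```

Note that at t = MTTF = 1/λ the reliability is e⁻¹, about 0.37, so an exponentially distributed component is more likely than not to have failed by its MTTF.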

Component reliability can also be calculated using historical data. If the component is highly reliable, a long period of operation is required to collect and capture adequate data on failure events. The failure probability density function could be plotted, and the best fit distribution found. With such a case study, the true reliability of the system or component could be discovered, and the value could be compared to the theoretical and predicted value.

Rail systems are made up of several subsystems, each with its own set of functionalities and reliability characteristics [21]. For example, an electro-pneumatic brake control system includes electric, pneumatic, and braking subsystems, a compressor, and other components [22]. To calculate the reliability of a system, all the subsystems that are interconnected must be determined and arranged in an orderly manner. Subsystem or component configurations are typically in series, parallel, or complex. The overall reliability of the system can then be calculated based on the configuration. This method is known as a reliability block diagram (RBD) [23].

2.1 Reliability Methodology

To successfully determine reliability using an RBD, the boundary of the system under review needs to be defined, and all the subsystems need to be arranged accordingly. FTA is widely used together with RBD for the reliability and risk analysis of a system [24]. The arrangement of the subsystems is then determined: series, parallel, or a mixed combination (Fig. 1). The calculation is straightforward and could be done using software such as ReliaSoft.

Fig. 1 Possible arrangement of subsystems

Figure 1(a) shows the series combination, 1(b) shows the parallel combination, and 1(c) shows a mixed combination. Assuming independent subsystems, the reliability of a series combination is \({R}_{T}= {R}_{1}*{R}_{2}*\dots * {R}_{n}\), and that of a parallel arrangement is \({R}_{T}= 1-\left(1-{R}_{1}\right)*\left(1-{R}_{2}\right)*\dots *\left(1-{R}_{n}\right)\). To compute a mixed combination, the system needs to be divided into smaller series or parallel subsystems, and the two equations above are then applied step by step to find the system’s reliability.
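The RBD calculation can be sketched in a few lines of Python. The sketch assumes statistically independent blocks: a series group works only if every block works, while a parallel (redundant) group fails only if every block fails, so its reliability is one minus the product of the block unreliabilities. The function names and reliability values are hypothetical.

```python
from math import prod


def series(reliabilities):
    """Series group: every block must work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel group: fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical mixed arrangement: two redundant blocks (0.90 each)
# connected in series with a third block (0.95).
r_redundant = parallel([0.90, 0.90])
r_system = series([r_redundant, 0.95])
print(f"Redundant pair: {r_redundant:.4f}, system: {r_system:.4f}")
```

The mixed case is handled by reducing the innermost groups first and then treating each result as a single block, which mirrors the decomposition described above.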

Fig. 2 Uptime and Downtime of Railway Operation

3 Railway Availability

The availability of the required and relevant systems is critical for a railway asset manager’s train operation. To keep the line running smoothly, the number of inoperable rolling stock must be kept to a minimum. Furthermore, the electrical section of the rolling stock, as well as power traction, must be available and operable. Availability is closely tied to reliability and maintainability [25, 26]. Availability is defined as the ratio of the total time the system is working properly (Uptime) to the sum of Uptime and the total time the system is not working properly (Downtime).

In the expanded availability formula (Eq. 6), OT is operating time, ST is standby time, TPM is total preventive maintenance time, TCM is total corrective maintenance time, and ALDT is administrative and logistic delay time. Figure 2 shows that availability is a combination of dependability (Uptime) and maintainability (Downtime) [27].

The time recorded when the train is in operation and in standby mode is referred to as uptime. Downtime, on the other hand, is the time recorded as maintenance time plus all administrative and logistical delays incurred to complete the maintenance schedule. Depending on the asset manager’s maintenance philosophy, the downtime could be much longer or shorter, which directly affects a system’s availability. Thus, the expanded formula for availability, A, is as follows:

$$ A = \frac{OT + ST}{{OT + ST + TPM + TCM + ALDT}} $$
(6)
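Equation 6 is a simple ratio and can be checked with a short calculation. The Python sketch below assumes all inputs are expressed in the same time unit; the function name and the yearly figures are hypothetical.

```python
def availability(ot, st, tpm, tcm, aldt):
    """Availability per Eq. 6: uptime divided by uptime plus downtime.

    ot/st: operating and standby time (uptime)
    tpm/tcm: total preventive and corrective maintenance time
    aldt: administrative and logistic delay time
    """
    uptime = ot + st
    downtime = tpm + tcm + aldt
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total


# Hypothetical year of 8760 hours for one train set.
a = availability(ot=6000.0, st=2000.0, tpm=400.0, tcm=200.0, aldt=160.0)
print(f"Availability = {a:.4f}")
```

Reducing any downtime term, for example ALDT through better spare-part logistics, raises availability without changing the reliability of the equipment itself.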

4 Railway Maintainability

The main objective of railway track maintenance and renewal is to ensure safety and meet quality standards [28]. The availability of a system or service is heavily influenced by an asset’s maintainability. The best maintained system is one that can always be relied on and is available when its service is required. To accomplish this, maintenance must be performed as much as needed but as infrequently as possible. The maintenance philosophy and methodology therefore need to be chosen wisely by the Train Operator Company (TOC).

There are several philosophies in planning and scheduling maintenance. Afzali et al. (2019) proposed a new model for reliability-centered maintenance (RCM) of electrical power distribution [29]. A reliability team thoroughly evaluates each critical component in this approach, and all failure modes are identified. The maintenance requirements will then be identified, and a preventive maintenance (PM) schedule will be created.

Su et al. (2019) investigated another approach known as condition-based maintenance (CBM) for railway track maintenance in the Netherlands [30]. This is not significantly different from RCM. While RCM determines the PM through failure analyses, CBM considers the machine’s condition as well. Maintenance is performed only when it is necessary, based on continuous observation of the system or item conditions. Although monitoring requires a small number of skilled workers at regular intervals, this method makes efficient use of the asset’s useful life [31].

4.1 Maintainability Methodology

There are several methodologies available to achieve the asset management philosophy. Two main methods are used in common practice: Corrective Maintenance (CM) and Preventive Maintenance (PM) [32]. CM restores a faulty system to its pre-failure state, while PM is carried out on a set schedule or at a predetermined time. PM could also be carried out based on the health of a specific system component, and it includes routine inspection and checking. To ensure high operational availability of the railway, PM is typically performed when the opportunity arises, such as late at night when the train is shut down. Opportunity maintenance is another name for this kind of maintenance [33].

Two other methods that are gaining interest in maintenance strategy are Proactive Maintenance (PaM) [34] and Predictive Maintenance (PdM) [35]. PaM focuses on resolving problems before they become failures. PdM, on the other hand, is a process that analyses and monitors machine performance and operational parameters to detect and diagnose developing problems before they cause failure and significant damage [36]. PdM can be performed using techniques such as oil analysis, mechanical ultrasound, vibration analysis, and wear particle analysis [37].

Total continuous monitoring is now possible thanks to advancements in information technology. The asset or plant under maintenance can be monitored entirely by sensors, which record a wide range of data on the physical behavior of a structure or piece of equipment, such as temperature, vibration, and conductivity. The Internet of Things (IoT) is an important part of the process because it allows multiple systems to work together to translate and analyze recorded data to forecast when maintenance should be performed [38]. Furthermore, new machine-learning technologies have the potential to increase the accuracy of predictive algorithms over time, resulting in even better performance [39].

Maintaining assets necessarily requires a maintenance team to repair or replace faulty components or systems. The component’s reliability would normally decrease once it was repaired or replaced [40]. Repair is classified into three levels: perfect repair, minimal repair, and imperfect repair. Perfect repair restores the component to its original, as-good-as-new state; minimal repair restores the component to the state it was in just before the failure; and imperfect repair restores the component to a state between the two [41].

Given all the maintenance philosophies and methodologies discussed above, it may appear that an asset could always be kept in the best condition. In reality, maintenance costs a lot of money. For example, Prasarana Malaysia Berhad, the operator of the Rapid Rail network, spends RM350 million per year on maintenance [42]. Around 70% of the maintenance cost, which also includes employee wages and other corporate and management costs, is used for technical maintenance. Manual inspection and monitoring then consume approximately 30% of the technical maintenance cost [43]. The goal of all asset managers is to have a reliable, highly available, and safe operation at the lowest possible cost. This requires the best and most efficient maintenance scheduling and activities. Maintenance cost analysis could be used to calculate and determine the rate and scheduling of maintenance [44].
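To give a sense of scale, the percentage shares can be applied to the annual budget figure. The sketch below assumes, purely for illustration, that the generic shares reported in [43] apply to the RM350 million figure from [42]; the source does not state this, and the function name is hypothetical.

```python
def maintenance_cost_breakdown(total, technical_pct, inspection_pct):
    """Split a maintenance budget using integer percentage shares.

    technical_pct is a share of the total budget;
    inspection_pct is a share of the technical maintenance cost.
    """
    technical = total * technical_pct / 100
    inspection = technical * inspection_pct / 100
    return technical, inspection


# Assumption: the 70% and 30% shares apply to the RM350 million budget.
technical_rm_m, inspection_rm_m = maintenance_cost_breakdown(350, 70, 30)
print(f"Technical maintenance: RM{technical_rm_m:.1f} million")
print(f"Manual inspection and monitoring: RM{inspection_rm_m:.1f} million")
```

Under this assumption, manual inspection and monitoring would account for roughly one fifth (0.70 × 0.30 = 0.21) of the total maintenance budget.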

5 Railway Safety

A robust design with high reliability and accessibility to maintenance, combined with good management, would result in a high safety standard for railway operation. EN 50129, as a guide for primarily electronic systems such as signaling, communication, and processing systems, discusses and assesses RAMS safety in railway [45]. EN 50126-2 is another standard that describes railway safety requirements [6]. This standard, which supplements EN 50129 on safety procedures, discusses other railway applications such as command, control, and signaling, rolling stock, and fixed installation. Wang et al. (2021) investigated a method for safety analysis using the cusp catastrophe model. This method describes the ever-changing process of railway system safety and considers the emergent property of safety [46]. Liu et al. (2020) proposed a comprehensive model that combines commonly used practices in safety analysis such as the Analytic Hierarchy Process (AHP) and the Maximum Entropy Method (MEM) [47].

5.1 Safety Methodology

The main subtopics in Safety Analysis are Risk Assessment and Hazard Control. EN 50126-2 proposes the Hourglass Model to describe the activities conducted in safety analysis.

Hazard identification, consequence analysis, and risk acceptance are all components of risk assessment. It defines high-level system safety requirements, more specifically safety requirements for the system under consideration from the perspective of the railway duty holder and operator. It considers operational safety, previous rail application experience, and regulatory requirements.

Activities such as Causal Analysis, Hazard Identification (refinement), Common Cause Analysis, and Show of Compliance, on the other hand, must be carried out in accordance with Hazard Control standards. Hazard Identification in Risk Assessment focuses on high-level hazards derived from system functions (black boxes) and related system operation, as well as the system’s environment, whereas Hazard Identification in Hazard Control focuses on the event’s cause. Because there could be several causes for a hazard to occur, an iterated hazard identification process would normally be carried out in Hazard Control. The Bowtie Model is frequently used as a methodology for safety analysis [48].

FTA, FMEA and event tree analysis (ETA) are common techniques used for system reliability and safety [49]. FTA is a powerful diagnostic technique that uses logical and functional links between components, processes, and subsystems to identify the underlying causes of potential risks. A fault tree (FT) is a model that logically and graphically shows the various combinations of likely events in a system, both faulty and normal, that result in unexpected events or states. FTs can be used to determine the source of potential hazards. Faults can be caused by hardware failure, software error, or human error. Traditional FTA involves events and gates and is based on Boolean algebra. Logic modelling is a visual representation of basic event relationships.
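The Boolean structure of a fault tree can be illustrated with a small quantitative sketch. The Python code below assumes statistically independent basic events and evaluates AND and OR gates bottom-up; the tree, event names, and probabilities are hypothetical and do not come from the cited references.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event occurs if the control software fails
# OR both redundant valves fail (basic events assumed independent).
p_software = 0.001
p_valve_a = 0.01
p_valve_b = 0.01

p_both_valves = and_gate([p_valve_a, p_valve_b])
p_top = or_gate([p_software, p_both_valves])
print(f"P(both valves fail) = {p_both_valves:.6f}")
print(f"P(top event)        = {p_top:.7f}")
```

In this example the single software event dominates the top-event probability, which is the kind of insight FTA is used for when deciding where redundancy or design effort is most effective.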

6 Sustainability of RAMS

RAMS is an extensive tool that covers the entire life cycle of a system, project, or component. This effective tool is widely used in the design and operation of Critical Infrastructure (CI) such as petroleum platforms, servers, critical public construction and building, and many others [50]. To ensure its sustainability, RAMS has specific requirements for suppliers, project executors, tendering departments, project owners, and rail operators. It would ensure the operation’s dependability and safety. Aside from that, the costs incurred from project concept to project design phase and final decommissioning phase could be calculated and determined [51]. RAMS, as previously stated, is becoming a common term among railway engineering practitioners. This is due to the railway’s history, which dates back more than 200 years, with the first railway track being built in 1825 [52]. As a result, considering the age of railway engineering, the implementation of RAMS in railway is relatively new.

RAMS is a tool for understanding how a system works, particularly a complex and multi-interface system like a railway. Predicting vulnerabilities, weaknesses, and potential failures and explaining how they affect the system’s quality and performance is a significant management task. RAMS is also used as a strategy to improve the operation’s quality, achieve optimal long-term availability, and choose the best maintenance solutions [53]. An asset manager can use it to predict and plan appropriate maintenance strategies and technologies to be integrated into the existing system. The manager could take appropriate action at the lowest possible cost to ensure that the service is not disrupted.

The railway-based vehicle transportation system is both mission and safety critical [54]. As a result, to ensure the safety and reliability of a railway system, all potential hazards that affect system components could be detected, analyzed, and controlled via RAMS. Safety and availability are higher level RAMS characteristics that can only be attained by meeting all reliability and maintainability requirements as well as controlling ongoing, long-term operation and maintenance activities. In short, RAMS provides a wide range of methodologies and approaches to engineers, system designers, clients, and system operators to address relevant concerns during the construction and operation of a specific system.

7 Life Cycle Cost Analysis

While RAMS addresses the technical parameters that determine reliability, availability, maintainability, and safety along a railway’s life cycle, Life Cycle Cost Analysis (LCCA) determines whether the investment in meeting the RAMS requirements is justified or whether there is a more meaningful alternative [55]. Like RAMS, LCCA could be performed from the conceptual phase through the operation and decommissioning phases. Any parameter and risk identified would have a cost attribute, so alternatives or other options need to be defined in RAMS [56]. Both LCCA and RAMS are needed to make the right decision in a railway project or operation.

Liden et al. (2016) investigated the availability of train operation and maintainability (RAMS parameters) versus cost impact (LCCA) [57]. Their study describes a model for assessing and dimensioning maintenance windows that considers marginal effects on both maintenance costs and predicted train traffic demand. Senaratne et al. (2020) assessed alternatives for railway support material using life cycle cost (LCC) analysis [58]. Banar et al. (2015) evaluated the investment benefit of the Turkish railway system for passengers [59].

Generally, there are several stages in performing LCCA. The first stage involves the definition of objectives, assumption development, gathering of all source materials, and preparation of input data. In the second stage, the RAMS parameters are prepared and analyzed for all the proposed variants. In the third stage, the LCC model is developed based on the RAMS parameters and the assumptions from the first stage. In the fourth stage, the model is analyzed and all the calculations take place. The result of the analysis is reviewed in the fifth stage. The model and calculation are then verified in the sixth and final stage, which requires continuous monitoring and reassessment with real operational data.

8 Conclusion

RAMS is a major theme in system design, project execution, and asset management. As stated in System Life Cycle, it controls the product or system life cycle from the requirement and design phase to the decommissioning phase. RAMS introduces a wide range of interdisciplinary engineering, technical, logistic, and cost-effective methods, making it the best candidate for complex system management tools.

Although the use of RAMS in century-old railway infrastructure is not as widespread as in newer industries such as petroleum, chemicals, and aviation, it is now becoming a new standard for the procurement and construction of new railway assets. As the application of RAMS is still in its early stages, it provides an excellent opportunity for engineers, consultants, and scientists to build and develop a better, standardized, and widely applicable RAMS methodology for the railway industry.

There are considerable limitations in implementing RAMS in the railway industry. Authorities, the supply chain, management, technical knowledge, and stakeholders’ awareness of the importance of RAMS all need a paradigm shift for RAMS to be fully implemented. A workable policy needs to be drafted by the government and followed by all railway stakeholders. The absence of such a policy would affect RAMS implementation throughout the railway life cycle. Often, when RAMS is being developed, significant information from suppliers is missing. Many suppliers do not reveal their design and technical specifications to the degree that a RAMS engineer could use for the calculation. A widely enforced standard needs to be developed to make sure suppliers meet the requirements for releasing information to relevant parties throughout railway construction and operation. These are a few areas of future work that need to be addressed and researched as a first step towards implementing RAMS.