1 Introduction

The study of games against nature is an important field of game theory. The payoff of the active player might depend on weather conditions, rainfall, currency exchange rates, or the timing of equipment breakdowns, to mention only a few factors. All these uncertain factors can be characterized with statistical methods, so probabilistic descriptions become available and the factors can be treated as random variables. There are usually two different ways to deal with such situations. In the first, the probabilities of unfavorable outcomes are limited or minimized by treating the situation as a zero-sum two-person game. In the second, the expected payoff is optimized. The second approach is the usual method in reliability and quality engineering, and we follow it in this paper.

In competing industries, units and machines are subject to failure through usage and time (Wang 2002). Performing perfect maintenance on all units is not always possible because of limited resources, budgets and time (Liu and Huang 2010). Maintenance can instead be performed imperfectly at different levels, returning the system to a state somewhere between as good as new and as bad as old. Different models of imperfect maintenance, such as virtual age models, have been developed. A comprehensive review is presented by Pham and Wang (1996).

Virtual age models were developed by Kijima (1989). In one model, the virtual age of a system after the \(n\)th maintenance is \(y_{n} = y_{n - 1} + \alpha X_{n}\), where \(y_{n - 1}\) is the virtual age of the system after the \((n - 1)\)th maintenance, \(\alpha\) is the level of maintenance \((0 \le \alpha \le 1)\) and \(X_{n}\) is the \(n\)th time to failure. In another model, the \(n\)th repair reduces all the damage accumulated up to the \(n\)th failure, \(y_{n} = \alpha \left( {y_{n - 1} + X_{n} } \right)\). Two extensions of the Kijima model, proportional age reduction (PAR) and proportional age setback (PAS), have been studied by different researchers (Sanchez et al. 2009; Martorell et al. 1999; Zhou et al. 2007). Ferreira et al. (2015) presented a Weibull-based generalized renewal process using a mixed virtual age model. Tanwar et al. (2014) provided a survey of imperfect repair models for repairable systems using the concepts of the generalized renewal process (GRP), arithmetic reduction of age (ARA), and arithmetic reduction of intensity (ARI).
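To make the difference between the two Kijima updates concrete, the following minimal Python sketch applies both recursions to the same sequence of times to failure. The function names and the sample values are hypothetical and serve only as an illustration; they are not taken from the cited works.

```python
def kijima_type1(y_prev, x_n, alpha):
    """First model: only the age accumulated since the last maintenance is scaled."""
    return y_prev + alpha * x_n


def kijima_type2(y_prev, x_n, alpha):
    """Second model: the whole accumulated virtual age is scaled."""
    return alpha * (y_prev + x_n)


def virtual_ages(times_to_failure, alpha, update):
    """Apply an update rule after every failure, starting from a new unit (age 0)."""
    ages, y = [], 0.0
    for x in times_to_failure:
        y = update(y, x, alpha)
        ages.append(y)
    return ages


# Hypothetical times to failure (in years) and maintenance level alpha = 0.5.
sojourns = [2.0, 1.0, 1.0]
type1 = virtual_ages(sojourns, 0.5, kijima_type1)
type2 = virtual_ages(sojourns, 0.5, kijima_type2)
print("model 1:", type1)
print("model 2:", type2)
```

Under the first recursion the virtual age can only grow between maintenances, whereas under the second it can stay level or shrink, because each repair also removes part of the previously accumulated age.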

Due to limitations in budget, resources, and time, maintenance might be performed at different levels, and managers should decide according to the actual condition of each unit. This sort of maintenance action is called selective maintenance (Cao et al. 2018; Rice 1999; Cassady et al. 2001) and is widely used in industry. Cassady et al. (2001) developed a mathematical programming model that selects a subset of maintenance actions for selective maintenance decisions, where component life lengths follow a Weibull distribution. Cassady et al. (2001) also established a framework for modeling and optimizing selective maintenance and concluded that different models result in different optimal selective maintenance decisions. Lüx et al. (2012) proposed a non-linear binary mathematical model for selective maintenance considering cannibalization and multiple maintenance actions. Pandey et al. (2013) addressed a selective maintenance model for a binary system under imperfect maintenance. They considered age reduction and hazard adjustment to make the model assumptions more realistic. Cao et al. (2017) proposed a simulation method for a selective maintenance model that maximizes system availability. Time and budget are the most frequent constraints in selective maintenance models, and they may be negligible, certain or uncertain (Ali et al. 2011). We treat the budget as the binding constraint when selecting the best subset of maintenance actions.

Optimizing the maintenance model of a system is more complex when the system consists of many components. Multi-unit maintenance models focus on optimal maintenance policies for a system with several units (Nicolai and Dekker 2008; Cho and Parlar 1991). A multi-unit system might be affected by competing risks, as modeled by Zhang and Yang (2015). The authors considered a repairable multi-component system in which the maintenance policy restores the entire system to an as-good-as-new state. Such assumptions are not realistic. The multi-unit system assumption is combined with the virtual age concept in (Liu and Huang 2010; Dao and Zuo 2017) to select optimal maintenance strategies. Liu et al. (2018) developed a selective maintenance model for choosing a subset of maintenance actions when the maintenance time is stochastic. The authors applied the model to a three-unit system and discussed its application in industry. A systematic review of selective maintenance models in multi-unit systems is presented by Cao et al. (2018).

In this paper, we introduce a model for a selective maintenance policy of a multi-unit system with different initial virtual ages and different maintenance levels. The objective is to find the optimal preventive maintenance level for each unit so as to minimize the total maintenance cost subject to budget constraints. The maintenance costs include the imperfect preventive maintenance cost, the replacement cost (for non-repairable failures) and the minimal maintenance cost (for repairable failures). We develop a binary integer programming model to analyze the proposed problem. To the best of our knowledge, this is the first study that optimizes the maintenance program for a multi-unit system while considering initial virtual ages, imperfect maintenance, and various maintenance levels.

This paper is organized as follows. In Sect. 2, we present the problem definition and problem formulation. Numerical examples, together with a sensitivity analysis, are given in Sect. 3 to illustrate the model and its efficiency. Section 3 also includes a real case study from the railway industry. Finally, Sect. 4 concludes the paper and provides future research directions.

2 Problem Definition

The basic problem is finding the optimal preventive maintenance policy for a multi-unit system with different initial virtual ages. In this problem, we assume that there are different preventive maintenance levels for each unit. First, we briefly describe the maintenance level concept.

In many industrial environments, there are different maintenance levels for different machines and units. As an example, based on the information on the Mobility Work website (Giorgio and Pulcini 2018), maintenance can be categorized into five levels. Level 1 includes simple maintenance actions that are necessary for operations, such as condition monitoring rounds and daily lubrication. Level 2 consists of simple procedures that are usually carried out by a qualified worker with brief training, such as controlling the operation parameters of equipment and controlling breaking and safety devices. Level 3 consists of operations that need complex procedures and a qualified technician following a detailed procedure. Level 4 maintenance, performed by a team, covers operations whose procedures use specific techniques or technologies, such as measuring and analyzing machine vibration. Level 5 maintenance, also called renovation or reconstruction, includes operations whose procedures apply particular know-how and need special techniques, technologies or processes, such as a complete inspection of dismantled machines. Here, the different maintenance levels for each unit need to be determined first, and the following formulation can then be used to determine the optimal maintenance policy.

2.1 Problem Formulation

The following notations are used throughout the paper.

\(k = (1,2, \ldots ,N)\):

Set of all units

\(M_{k}\):

Number of possible preventive maintenance levels for each unit k

\(i(0\, \le \,i\, \le \,M_{k} )\):

Preventive maintenance level i, which decreases the virtual age of each unit k from \(T_{k}\) to \(\alpha_{ki} T_{k}\) where \(\alpha_{ki} \in \left[ {0,1} \right]\)

\(C_{ki}^{(m)}\):

Preventive maintenance cost for unit k at maintenance level i

\(R_{k}\):

Number of repairable failure types for each unit k

\(\rho_{kj} (t)\) for \(j = 1,2, \ldots ,R_{k}\):

The failure rate of each repairable failure type j for each unit k

\(c_{kj}^{(r)}\):

Cost of minimal repair for repairable failure type j for each unit k

\(F_{k} (t)\):

The CDF of time to the first non-repairable failure for each unit k from zero virtual age

\(C_{k}^{\left( R \right)}\):

Cost of failure replacement for unit k

\(B^{(m)}\):

Preventive maintenance budget

\(B^{\left( R \right)}\):

Replacement budget

\(B^{(r)}\):

Minimal corrective repair budget

Consider a multi-unit system with N units and initial virtual ages \(T_{1} ,T_{2} , \ldots ,T_{N}\). The management wants to decide on the optimal preventive maintenance plan, which would minimize total expected cost. The preventive maintenance is performed at time zero and the planning horizon is the next T time periods. For each unit k, preventive maintenance with level i decreases the virtual age of the unit from \(T_{k}\) to \(\alpha_{ki} T_{k}\), where \(\alpha_{ki} \in \left[ {0,1} \right]\). There are \(M_{k}\) possible preventive maintenance levels, for each unit, i.e. \(0\, \le \,i\, \le \,M_{k}\). The preventive maintenance cost \(C_{ki}^{(m)}\) depends on the unit k and the maintenance level i. For each unit k, maintenance level 0 means that no preventive maintenance is performed with factor \(\alpha_{k0} = 1\) and cost \(C_{k0}^{(m)} = 0\), while the value \(\alpha_{ki} = 0\) refers to preventive replacement.

Each unit is subject to both non-repairable failure and \(R_{k}\) types of repairable failures, with failure rates \(\rho_{kj} (t)\) for \(j = 1,2, \ldots ,R_{k}\). Let \(c_{kj}^{(r)}\) and \(C_{k}^{\left( R \right)}\) be the cost of a minimal repair for repairable failure type j and the cost of a failure replacement, including possible damages, respectively. Thus three types of maintenance are considered for each unit: preventive maintenance, minimal repair for repairable failures, and replacement for non-repairable failures.

During any time period of length X, the expected number of type j repairable failures is clearly

$$\int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + X}} {\rho_{kj} (t)dt}$$
(1)

So the total expected repair cost becomes:

$$\mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + X}} {\rho_{kj} (t)dt}$$
(2)

Let t denote the time of the first non-repairable failure, measured from the start of the planning horizon. The conditional CDF, given the initial virtual age and the imperfect preventive maintenance, is then

$$F_{ki} (t) = \frac{{F_{k} (t + \alpha_{ki} T_{k} ) - F_{k} (\alpha_{ki} T_{k} )}}{{1 - F_{k} (\alpha_{ki} T_{k} )}}$$
(3)
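As an illustration of Eq. (3), the sketch below evaluates the conditional CDF for a single unit whose time to non-repairable failure is assumed to be Weibull distributed. The parameter values and the function names `weibull_cdf` and `conditional_cdf` are hypothetical and are chosen only to show how a lower \(\alpha_{ki}\) reduces the failure probability over a fixed interval.

```python
import math


def weibull_cdf(t, eta, beta):
    """F_k(t): CDF of the time to non-repairable failure from zero virtual age."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def conditional_cdf(t, alpha, age, eta, beta):
    """F_ki(t) from Eq. (3): failure within t, given survival to virtual age alpha * age."""
    v = alpha * age  # virtual age right after preventive maintenance
    survived = 1.0 - weibull_cdf(v, eta, beta)
    return (weibull_cdf(t + v, eta, beta) - weibull_cdf(v, eta, beta)) / survived


# Hypothetical unit: scale 10 years, shape 2, initial virtual age T_k = 4 years.
eta, beta, age = 10.0, 2.0, 4.0
for alpha in (1.0, 0.5, 0.0):
    print(alpha, round(conditional_cdf(3.0, alpha, age, eta, beta), 4))
```

With \(\alpha_{ki} = 0\) the function reduces to the unconditional CDF of a new unit, which matches the interpretation of that value as preventive replacement.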

For simplicity, we assume that at most one non-repairable failure might occur during the considered time period of length T. If it occurs at time \(X \in \left( {0,T} \right)\), then the unit becomes as new, so the expected number of repairable failures until the end of the planning horizon is

$$\int\limits_{0}^{T - X} {\rho_{kj} (t)dt}$$
(4)

for failure type j, so the expected total repair cost for all types of repairable failures after the non-repairable failure is

$$\mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \int\limits_{0}^{T - X} {\rho_{kj} (\tau )d\tau }$$
(5)

If the time of the non-repairable failure \(X \in \left( {0,T} \right)\) were known, then the total minimal repair cost for repairable failures would have the following form:

$$\varGamma_{ki} \left( X \right) = \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left[ {\int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + X}} {\rho_{kj} (t)dt} + \int\limits_{0}^{T - X} {\rho_{kj} (t)dt} } \right]$$
(6)
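The two integrals in Eq. (6) can be evaluated directly once a form for \(\rho_{kj} (t)\) is fixed. The sketch below assumes the Weibull (power-law) intensity of Eq. (13), for which the integral over an interval has a simple closed form. The helper names and the two failure modes are hypothetical illustration data.

```python
def cumulative_intensity(a, b, eta, beta):
    """Integral of (beta/eta) * (t/eta)**(beta - 1) over [a, b]."""
    return (b / eta) ** beta - (a / eta) ** beta


def repair_cost(x, horizon, alpha, age, modes):
    """Gamma_ki(X) of Eq. (6); modes holds one (cost, eta, beta) tuple per failure type j."""
    v = alpha * age  # virtual age after preventive maintenance
    total = 0.0
    for cost, eta, beta in modes:
        before = cumulative_intensity(v, v + x, eta, beta)         # aged unit until time X
        after = cumulative_intensity(0.0, horizon - x, eta, beta)  # renewed unit afterwards
        total += cost * (before + after)
    return total


# Hypothetical unit: T_k = 2, alpha = 0.5, horizon T = 3, two repairable failure types.
modes = [(10.0, 2.0, 2.0), (5.0, 4.0, 1.0)]
print(repair_cost(1.0, 3.0, 0.5, 2.0, modes))  # non-repairable failure at X = 1
print(repair_cost(3.0, 3.0, 0.5, 2.0, modes))  # no non-repairable failure before T
```

Setting X equal to the horizon makes the second integral vanish, which gives the term \(\varGamma_{ki} (T)\) used when no non-repairable failure occurs.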

Let \(f_{ki} (t) = F_{ki}^{\prime } (t)\). The total expected cost of preventive maintenance, minimal repairs and possible replacement during time period T is given as

$$\begin{aligned} \psi_{ki} \left( T \right) & = \int\limits_{0}^{T} {\varGamma_{ki} \left( X \right)f_{ki} \left( X \right)dX} + \varGamma_{ki} \left( T \right)\left( {1 - F_{ki} \left( T \right)} \right) + C_{k}^{\left( R \right)} F_{ki} \left( T \right) + C_{ki}^{(m)} \\ & = \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \int\limits_{0}^{T} {\left[ {\int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + X}} {\rho_{kj} (t)dt} + \int\limits_{0}^{T - X} {\rho_{kj} (t)dt} } \right]f_{ki} \left( X \right)dX} \\ & \quad + \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left[ {\int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + T}} {\rho_{kj} (t)dt} } \right]\left( {1 - F_{ki} \left( T \right)} \right) + C_{k}^{\left( R \right)} F_{ki} \left( T \right) + C_{ki}^{(m)} \\ \end{aligned}$$
(7)

Now, we can formulate the optimization problem:

$$\hbox{min} \mathop \sum \limits_{k = 1}^{N} \mathop \sum \limits_{i = 0}^{{M_{k} }} x_{ki} \psi_{ki} \left( T \right)$$
(8)

Subject to:

$$\mathop \sum \limits_{i = 0}^{{M_{k} }} x_{ki} = 1,\quad \forall \,k$$
(9)
$$\mathop \sum \limits_{k = 1}^{N} \mathop \sum \limits_{i = 0}^{{M_{k} }} x_{ki} C_{ki}^{(m)} \le B^{(m)}$$
(10)
$$\mathop \sum \limits_{k = 1}^{N} \mathop \sum \limits_{i = 0}^{{M_{k} }} x_{ki} \left( {\int\limits_{0}^{T} {\varGamma_{ki} \left( X \right)f_{ki} \left( X \right)dX + \varGamma_{ki} \left( T \right)\left( {1 - F_{ki} \left( T \right)} \right)} } \right) \le B^{(r)}$$
(11)
$$\mathop \sum \limits_{k = 1}^{N} \mathop \sum \limits_{i = 0}^{{M_{k} }} C_{k}^{\left( R \right)} x_{ki} F_{ki} \left( T \right) \le B^{\left( R \right)}$$
(12)

In the proposed model, the decision variables are as follows:

$$x_{ki} = \left\{ {\begin{array}{*{20}l} 1 \hfill \\ 0 \hfill \\ \end{array} \begin{array}{*{20}r} \hfill {{\text{if}}\;{\text{maintenance}}\;{\text{level}}\;i\;{\text{is}}\;{\text{chosen}}\;{\text{for}}\;{\text{unit}}\;k} \\ \hfill {\text{otherwise}} \\ \end{array} } \right.$$

Equation (8) is the objective function, the overall expected maintenance cost. Equation (9) requires that exactly one maintenance level be selected for each unit, and constraint (10) limits the total preventive maintenance cost. Inequalities (11) and (12) impose the budget limits on the total expected corrective repair cost and the total expected replacement cost, respectively.
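For very small instances, the structure of problem (8) to (12) can be checked by plain enumeration of all level assignments. The sketch below is such a check, not the solution method used in this paper. It assumes that the three cost components of \(\psi_{ki} (T)\) have already been computed for every unit and level, and all numbers are hypothetical.

```python
from itertools import product


def solve_by_enumeration(pm, repair, repl, budgets):
    """Try every assignment of one level per unit; return (best_cost, best_levels).

    pm[k][i], repair[k][i] and repl[k][i] are the preventive maintenance cost,
    expected minimal repair cost and expected replacement cost of unit k at level i.
    """
    b_m, b_r, b_R = budgets
    best_cost, best_levels = None, None
    for levels in product(*(range(len(row)) for row in pm)):  # constraint (9)
        chosen = list(enumerate(levels))
        pm_total = sum(pm[k][i] for k, i in chosen)
        repair_total = sum(repair[k][i] for k, i in chosen)
        repl_total = sum(repl[k][i] for k, i in chosen)
        # Budget constraints (10), (11) and (12).
        if pm_total > b_m or repair_total > b_r or repl_total > b_R:
            continue
        cost = pm_total + repair_total + repl_total  # objective (8)
        if best_cost is None or cost < best_cost:
            best_cost, best_levels = cost, levels
    return best_cost, best_levels


# Hypothetical two-unit instance with levels 0, 1, 2 (level 0 = no maintenance).
pm = [[0, 10, 25], [0, 8, 20]]
repair = [[60, 40, 30], [50, 35, 28]]
repl = [[20, 12, 5], [15, 10, 4]]
best_cost, best_levels = solve_by_enumeration(pm, repair, repl, (30, 100, 30))
print(best_cost, best_levels)
```

In this instance the cheapest unconstrained choice (level 2 for both units) exceeds the preventive maintenance budget, so the budget constraint changes the selected policy. The number of assignments grows as the product of \(M_{k} + 1\) over all units, so enumeration is only practical for small systems and an integer programming solver is the natural tool for larger ones.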

It should be noted that if the unit failure times follow Weibull distributions, then we have

$$\rho_{kj} (t) = \frac{{\beta_{kj} }}{{\eta_{kj} }}\left( {\frac{t}{{\eta_{kj} }}} \right)^{{\beta_{kj} - 1}} ,\forall \,k,j$$
(13)
$$F_{k} (t) = 1 - e^{{ - \left( {\frac{t}{{\eta_{k} }}} \right)^{{\beta_{k} }} }} ,{\forall }\,k$$
(14)
$$F_{ki} \left( X \right) = \frac{{F_{k} \left( {X + \alpha_{ki} T_{k} } \right) - F_{k} \left( {\alpha_{ki} T_{k} } \right)}}{{1 - F_{k} \left( {\alpha_{ki} T_{k} } \right)}},\forall \,k,i$$
(15)
$$\begin{aligned} f_{ki} \left( X \right) & = F_{ki}^{\prime } \left( X \right) = \frac{{f_{k} \left( {X + \alpha_{ki} T_{k} } \right)}}{{1 - F_{k} \left( {\alpha_{ki} T_{k} } \right)}} \\ & = \frac{1}{{1 - F_{k} \left( {\alpha_{ki} T_{k} } \right)}}\frac{{\beta_{k} }}{{\eta_{k} }}\left( {\frac{{X + \alpha_{ki} T_{k} }}{{\eta_{k} }}} \right)^{{\beta_{k} - 1}} e^{{ - \left( {\frac{{X + \alpha_{ki} T_{k} }}{{\eta_{k} }}} \right)^{{\beta_{k} }} }} \quad \forall \,k,i \\ \end{aligned}$$
(16)

So Eq. (6) can be rewritten as:

$$\begin{aligned} \varGamma_{ki} \left( X \right) & = \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left[ {\int\limits_{{\alpha_{ki} T_{k} }}^{{\alpha_{ki} T_{k} + X}} {\rho_{kj} (t)dt + } \int\limits_{0}^{T - X} {\rho_{kj} (t)dt} } \right] \\ & = \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left( {\frac{1}{{\eta_{kj} }}} \right)^{{\beta_{kj} }} \left[ {\left( {\alpha_{ki} T_{k} + X} \right)^{{\beta_{kj} }} - \left( {\alpha_{ki} T_{k} } \right)^{{\beta_{kj} }} + \left( {T - X} \right)^{{\beta_{kj} }} } \right] \\ \end{aligned}$$
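The step from the first to the second line uses the antiderivative of the Weibull intensity in Eq. (13). For generic limits \(0 \le a \le b\) (symbols introduced here only for illustration), it reads

$$\int\limits_{a}^{b} {\frac{{\beta_{kj} }}{{\eta_{kj} }}\left( {\frac{t}{{\eta_{kj} }}} \right)^{{\beta_{kj} - 1}} dt} = \left[ {\left( {\frac{t}{{\eta_{kj} }}} \right)^{{\beta_{kj} }} } \right]_{a}^{b} = \left( {\frac{1}{{\eta_{kj} }}} \right)^{{\beta_{kj} }} \left( {b^{{\beta_{kj} }} - a^{{\beta_{kj} }} } \right)$$

Applying this with \(\left( {a,b} \right) = \left( {\alpha_{ki} T_{k} ,\alpha_{ki} T_{k} + X} \right)\) and with \(\left( {a,b} \right) = \left( {0,T - X} \right)\) gives the three power terms inside the bracket.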

Then, we can obtain Eq. (7) as follows:

$$\begin{aligned} \psi_{ki} \left( T \right) & = \int\limits_{0}^{T} {\varGamma_{ki} \left( X \right)f_{ki} \left( X \right)dX} + \varGamma_{ki} \left( T \right)\left( {1 - F_{ki} \left( T \right)} \right) + C_{k}^{\left( R \right)} F_{ki} \left( T \right) + C_{ki}^{(m)} \\ & = \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left( {\frac{1}{{\eta_{kj} }}} \right)^{{\beta_{kj} }} \int\limits_{0}^{T} {\left[ {\left( {\alpha_{ki} T_{k} + X} \right)^{{\beta_{kj} }} - \left( {\alpha_{ki} T_{k} } \right)^{{\beta_{kj} }} + \left( {T - X} \right)^{{\beta_{kj} }} } \right]f_{ki} \left( X \right)dX} \\ & \quad + \mathop \sum \limits_{j = 1}^{{R_{k} }} c_{kj}^{(r)} \left( {\frac{1}{{\eta_{kj} }}} \right)^{{\beta_{kj} }} \left[ {\left( {\alpha_{ki} T_{k} + T} \right)^{{\beta_{kj} }} - \left( {\alpha_{ki} T_{k} } \right)^{{\beta_{kj} }} } \right]\left( {1 - F_{ki} \left( T \right)} \right) \\ & \quad + C_{k}^{\left( R \right)} F_{ki} \left( T \right) + C_{ki}^{(m)} \\ \end{aligned}$$
(17)
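The integral in Eq. (17) generally has no closed form, so it has to be evaluated numerically. The sketch below does this with a simple midpoint rule. The function name `psi`, the argument names, the step count and the data are hypothetical, and the quadrature rule is an assumption made for illustration rather than the procedure used for the numerical examples.

```python
import math


def psi(horizon, alpha, age, eta_nr, beta_nr, modes, repl_cost, pm_cost, steps=2000):
    """Expected cost psi_ki(T) of Eq. (17) for one unit and one maintenance level.

    eta_nr, beta_nr: Weibull parameters of the non-repairable failure, Eq. (14).
    modes: one (c_kj, eta_kj, beta_kj) tuple per repairable failure type j.
    """
    v = alpha * age  # virtual age after preventive maintenance

    def cdf(t):  # conditional CDF F_ki(t), Eq. (15)
        return 1.0 - math.exp(-(((t + v) / eta_nr) ** beta_nr - (v / eta_nr) ** beta_nr))

    def pdf(t):  # conditional density f_ki(t), Eq. (16)
        hazard = (beta_nr / eta_nr) * ((t + v) / eta_nr) ** (beta_nr - 1.0)
        return hazard * (1.0 - cdf(t))

    def gamma(x):  # minimal repair cost Gamma_ki(X) in Weibull form
        return sum(
            c * (((v + x) / e) ** b - (v / e) ** b + ((horizon - x) / e) ** b)
            for c, e, b in modes
        )

    h = horizon / steps
    integral = h * sum(gamma((n + 0.5) * h) * pdf((n + 0.5) * h) for n in range(steps))
    no_failure = gamma(horizon) * (1.0 - cdf(horizon))
    return integral + no_failure + repl_cost * cdf(horizon) + pm_cost


# Hypothetical unit: T_k = 4, alpha = 0.6, T = 5, two repairable failure types.
modes = [(100.0, 2.0, 1.0), (50.0, 5.0, 1.0)]
cost = psi(5.0, 0.6, 4.0, 20.0, 2.0, modes, repl_cost=1000.0, pm_cost=40.0)
print(round(cost, 2))
```

The example deliberately uses shape parameters equal to 1 for the repairable failures. In that special case \(\varGamma_{ki} (X)\) does not depend on X, so the result can be checked by hand against \(\varGamma_{ki} (T) + C_{k}^{\left( R \right)} F_{ki} (T) + C_{ki}^{(m)}\).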

The constraint expressions can be computed in the same way. The following numerical examples assume Weibull-distributed failure times.

3 Numerical Examples

In this section, we solve two simple numerical examples and one real world case study by utilizing CPLEX software to illustrate the efficiency of the proposed model.

3.1 Example 1

We consider a three-unit system, \(k = \left( {1,2,3} \right)\), with initial virtual ages \(T_{k} = \left( {1,1,1.5} \right)\) years. Each unit has 3 preventive maintenance levels and 4 repairable failure types, \(j = \left( {1,2,3,4} \right)\). The planning horizon is \(T = 3\) years, and the budgets for preventive maintenance, minimal repair and replacement are \(B^{(m)} = \$ 40\), \(B^{(r)} = \$ 1000\) and \(B^{\left( R \right)} = \$ 200\), respectively. The Weibull scale and shape parameters for each repairable failure type of each unit are presented in Tables 1 and 2, respectively. The replacement cost is \(C_{k}^{\left( R \right)} = \left( {5,5,5} \right)\), and the costs of minimal repair and preventive maintenance are presented in Tables 3 and 4, respectively. The effect of each preventive maintenance level on each unit is given in Table 5.

Table 1 Value of scale parameter \(\eta_{kj}\) of Weibull distribution for Example 1
Table 2 Value of shape parameter \(\beta_{kj}\) of Weibull distribution for Example 1
Table 3 Cost \(c_{kj}^{(r)}\) of minimal repair of failure of type j for each machine k for Example 1
Table 4 Preventive maintenance cost \(C_{ki}^{(m)}\) for unit k at maintenance level i for Example 1
Table 5 Age reduction coefficient \(\alpha_{ki}\) in virtual age model for Example 1

Applying the above parameters, we first compute the total expected maintenance costs, \(\psi_{ki} \left( T \right)\), and the results are presented in Table 6. The total maintenance cost includes preventive maintenance, minimal repair for all repairable failure types and replacement cost for non-repairable failure.

Table 6 The total expected cost \(\psi_{ki} \left( T \right)\) of preventive maintenance, minimal repairs and possible replacement for Example 1

Next, we minimize the objective function (8) subject to constraints (9) to (12) and obtain $222.318 as the optimal objective value. The optimal decision variables are given in Table 7. They show that the management should choose preventive maintenance level 1 for units 1 and 2, and level 2 for unit 3. Any other feasible combination of maintenance levels would lead to a higher total maintenance cost.

Table 7 The optimal preventive maintenance policy for Example 1

3.1.1 Discussion on Weibull Distribution Parameters

The Weibull parameters are critical in determining the optimal solution. We now examine how the values of the shape and scale parameters affect it. First, we solve the example for randomly chosen values of the shape parameter \(\beta_{kj}\). The results are shown in Table 8. Even a small change in the shape parameter changes the optimal solution. Likewise, we solve the optimization problem for various values of the scale parameter \(\eta_{kj}\) (Table 9). As with the shape parameter, any deviation in the scale parameter changes the optimal solution markedly. It is therefore very important for decision makers to estimate the Weibull parameters accurately if they want to obtain the true optimal solution.

Table 8 Optimal solution corresponding to various values of shape parameter \(\beta_{kj}\)
Table 9 Optimal solution corresponding to various values of scale parameter \(\eta_{kj}\)

There are different methods for estimating the distribution parameters. Generally, these methods fall into two groups: (1) graphical methods, such as probability plotting and hazard plotting, and (2) analytical methods, such as the method of moments (MOM), the least squares method (LSM), maximum likelihood estimation (MLE) and the density power method (DPM). All these methods depend on the quality of the data used to estimate the parameters. Many data analysis and statistical packages can compute the Weibull parameters from the given data.
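As one concrete instance of the least squares method, the sketch below fits a straight line on the Weibull probability plot, using Bernard's median-rank approximation for the plotting positions. It handles complete (uncensored) failure times only. The function name `weibull_lsm` and the sample are hypothetical, and the sample is synthetic: it is constructed to lie exactly on the fitted line, so the recovered parameters match by construction.

```python
import math


def weibull_lsm(failure_times):
    """Least squares fit on the Weibull probability plot; returns (beta, eta).

    Uses ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta) with median-rank estimates of F.
    """
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for rank, t in enumerate(times, start=1):
        p = (rank - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - p)))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                   # slope of the fitted line
    intercept = y_bar - beta * x_bar   # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Synthetic sample placed exactly at the median-rank quantiles of Weibull(beta=1.8, eta=5).
true_beta, true_eta, n = 1.8, 5.0, 10
ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
sample = [true_eta * (-math.log(1.0 - p)) ** (1.0 / true_beta) for p in ranks]
beta_hat, eta_hat = weibull_lsm(sample)
print(round(beta_hat, 3), round(eta_hat, 3))
```

With real failure records the points scatter around the line, and the quality of the fit reflects the quality of the data, as noted above.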

3.2 Example 2

In this example, we consider a system with 10 units, 5 preventive maintenance levels for each unit, and 8 types of possible repairable failures. The planning horizon is T = 5 years and we assume the maintenance budgets as \(B^{(m)} = \$ 4000,B^{\left( R \right)} = \$ 20000,\) and \(B^{(r)} = \$ 100000\). The rest of the parameters are presented in Tables 10, 11, 12, 13, 14 and 15.

Table 10 Value \(\eta_{kj}\) of scale parameter of Weibull distribution for Example 2
Table 11 Value \(\beta_{kj}\) of shape parameter of Weibull distribution for Example 2
Table 12 Cost \(c_{kj}^{(r)}\) of minimal repair of repairable failure of type j for unit k for Example 2
Table 13 Preventive maintenance cost \(C_{ki}^{(m)}\) for unit k at maintenance level i for Example 2
Table 14 Age reduction coefficient \(\alpha_{ki}\) in virtual age model for Example 2
Table 15 Values of \(C_{k}^{\left( R \right)}\) and \(T_{k}\)

We next compute the total expected maintenance costs for all units in all possible preventive maintenance levels, and the results can be seen in Table 16.

Table 16 The total expected cost \(\psi_{ki} \left( T \right)\) of preventive maintenance, minimal repairs and replacements for Example 2

We then solve the optimization problem. The results show that the minimum total maintenance cost for all units is $64768.95. The corresponding optimal decision variables are given in Table 17. Based on the results, we conclude that the management should apply maintenance level 1 to all units except units 7 and 10, for which maintenance level 2 is optimal.

Table 17 Optimal solutions for Example 2

Next, we perform a sensitivity analysis based on the data of Example 2. First, we vary the Weibull shape parameter \(\beta_{11}\) of failure type 1 for unit 1. The results are shown in Fig. 1.

Fig. 1 Total expected maintenance costs for unit 1 with varying Weibull shape parameter \(\beta_{11}\) of failure type 1

The total maintenance cost increases as the shape parameter increases. The same analysis can be carried out for the Weibull scale parameter \(\eta_{11}\) of failure type 1.

Next we repeat the analysis of the maintenance costs for different values of T, for all units at maintenance level 1. The results in Fig. 2 show that a longer life cycle for each unit leads to a higher maintenance cost. Thus, as the unit age considered becomes larger, the maintenance cost increases as well. From a managerial point of view, this implies that a unit replacement strategy becomes justified as the unit ages.

Fig. 2 Total expected maintenance costs for all units at maintenance level 1 for various values of T

3.3 Case Study

In this section we present a real-world application of the model to railroad tracks. Repairable track failures fall into two main categories: structural and geometrical failures. Structural defects arise from the structural condition of the track, including the rail, sleepers, fastenings, sub-grade and drainage system, whereas geometry failures relate to poor condition of the rail geometry parameters, such as profile and alignment (He et al. 2015). In this study, we consider three of the major repairable geometry failure types, as follows:

  • The first is the surface failure type, which measures any non-uniformity of the top surface of a single rail. As can be seen in Fig. 3, the surface measurement can be positive or negative when there is a hump or a dip, respectively.

    Fig. 3 Graphical representation of surface failure

  • The second repairable failure type, demonstrated in Fig. 4, is DIP, which measures a fall or a rise in the centerline of the track.

    Fig. 4 Graphical representation of DIP failure

  • The third is the cross level (X-level) failure type, which measures the difference in elevation between the top surfaces of the two rails at any specific point of the railroad track. The cross level measurement is mostly performed under load, since the rails can move up or down under a load. Figure 5 illustrates a cross level defect.

    Fig. 5 Graphical representation of cross level failure

Geometry cars equipped with sensors, GPS and measurement devices periodically inspect tracks and record track geometry characteristics such as alignment, elevation, curvature and surface. The data gathered by geometry cars include segment number, milepost, defect type, defect amplitude, and class. These variables are briefly defined as follows.

  • Segment number: Identifier of a segment, a stretch of track such as the one connecting two cities

  • Milepost: Point on the track segment

  • Defect type: Geometry defect types

  • Defect amplitude: Size of defect in inches or degrees

  • Class: All tracks get a number between one and five. Each class represents operating speed limits for passenger and freight traffic. Class one has the lowest speed limit and class five has the highest speed limit.

The Federal Railroad Administration (FRA) defines a defect amplitude threshold for each failure type, and a defect amplitude recorded by a geometry car is considered a failure if it exceeds the threshold. Such defects violate FRA safety standards and need immediate maintenance. The failure threshold for each failure type is presented in Table 18.

Table 18 Failure amplitude threshold for each failure type in different rail classes (inch)

In this study we consider segments as the units of the system, where each segment can have three types of repairable failure: DIP, surface and X-level. A unit is considered failed when the defect amplitude at one or more mileposts is greater than the FRA threshold. Each such milepost needs to be minimally repaired.
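The failure rule for a segment can be written as a simple filter over geometry car records. The sketch below is only an illustration: the record layout, the threshold values and the use of the absolute amplitude (so that humps and dips are treated alike) are assumptions, and the thresholds are placeholders rather than the FRA values of Table 18.

```python
def failed_mileposts(records, thresholds):
    """Return the mileposts whose defect amplitude exceeds the threshold of its type."""
    return [
        r["milepost"]
        for r in records
        if abs(r["amplitude"]) > thresholds[r["defect_type"]]
    ]


def segment_failed(records, thresholds):
    """A segment (unit) counts as failed if one or more mileposts exceed a threshold."""
    return len(failed_mileposts(records, thresholds)) > 0


# Placeholder thresholds in inches (hypothetical, NOT the FRA values).
thresholds = {"DIP": 1.0, "SURFACE": 1.25, "XLEVEL": 1.0}

# Hypothetical inspection records for one segment.
records = [
    {"milepost": 12.1, "defect_type": "DIP", "amplitude": 0.8},
    {"milepost": 12.4, "defect_type": "SURFACE", "amplitude": -1.5},
    {"milepost": 13.0, "defect_type": "XLEVEL", "amplitude": 0.9},
]
print(failed_mileposts(records, thresholds), segment_failed(records, thresholds))
```

The failure times obtained from such a filter are the kind of input needed to estimate a Weibull distribution for each failure type.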

To illustrate a real application of the model, the required data were obtained from the Burlington Northern and Santa Fe (BNSF) Railway Company. BNSF Railway is one of the major freight railroad networks in North America and one of the seven Class I railroads in the US. We consider track geometry failures from 2007 to 2013 on a track with three segments. For each segment, we analyze the failure time data recorded by geometry cars and estimate a Weibull distribution for each failure type in class 5 rails. Table 19 shows the results.

Table 19 Estimated Weibull parameters for different modes of failure in rail segments

The cost of minimal repair for each unit based on the failure type is presented in Table 20. The minimal repair cost is the same for each segment.

Table 20 Cost \(c_{kj}^{(r)}\) of minimal repair of failure type j in segment k

Two maintenance levels are available for preventively maintaining a track segment: tamping and stone blowing. In the tamping process, a tamping machine raises the sleepers and packs the ballast stone under them, while in the stone blowing process the existing ballast is left in place and additional stone is blown under the sleepers. The preventive maintenance cost for each segment is shown in Table 21. The age reduction coefficient \(\alpha_{ki}\) for all segments is 0.6 for tamping and 0.8 for stone blowing.

Table 21 Preventive maintenance cost \(C_{ki}^{(m)}\) for segment k at maintenance level i

The initial virtual ages of the segments are \({\text{T}}_{k} = \left( {4,5,3} \right)\) years, respectively. The replacement cost is the same for all segments and equals $480000. We assume a time horizon of \({\text{T}} = 5\) years and maintenance budgets of \(B^{(m)} = \$ 500000\), \(B^{(r)} = \$ 1000000\) and \(B^{\left( R \right)} = \$ 600000\).

Using the above data, the total maintenance cost for each segment is shown in Table 22.

Table 22 The total expected cost, \(\psi_{ki} \left( T \right)\) of maintenance, repairs and possible replacement

The optimal solution shows that, to minimize the total maintenance cost, all segments should receive maintenance level 1 (Table 23). The minimal total maintenance cost for the whole system is $410167.

Table 23 Optimal solutions for case study

4 Conclusion

In this paper, we developed a new model to study imperfect maintenance of a multi-unit system with different maintenance levels and different initial virtual ages. The mathematical formulation was given and numerical examples were presented. A real-world case study of rail tracks was also presented, reporting the optimal maintenance level for each unit and the total maintenance costs corresponding to each maintenance policy. Moreover, a sensitivity analysis was performed to study the impact of changing some model parameter values. These results can help management appreciate the importance of correctly estimating the model parameters.

The model introduced in this paper can be extended in several ways. The preventive maintenance time can be optimized in addition to the maintenance level. Instead of the expected cost, we could also consider the expected cost per unit time, where the cycle ends either at time T or at the time of the non-repairable failure, whichever occurs first. These model variants will be the subject of our next project.