1 Introduction

Most systems are made up of several subsystems, each of which comprises a number of components. The subsystems are connected in series, so that failure of a single subsystem leads to total system failure. To keep the failure of one component from bringing down the whole system, it is therefore advisable to place more than one component in parallel in each subsystem. Increasing the number of components in each subsystem, however, increases system cost, weight, and volume (Chern 1992). Hence, system designers should determine the number of redundant components such that the system configuration is optimal: costs are minimized, availability is maximized, and the constraints on system weight and volume are satisfied.

The Redundancy Allocation Problem (RAP) is one of the best-known reliability optimization problems. In such problems, several subsystems are assumed to be connected in series, and each subsystem, \(i \in \left\{ {1,2, \ldots ,N} \right\}\), can consist of several (\(n\)) parallel components (Fig. 1). In RAP, the aim is to select the optimal number of parallel components in each subsystem (Chern 1992). For the RAP to yield a solution that is useful to system designers, the features of the real system must be incorporated into the model. Some of the most important features are: availability (system repair), redundant system configuration (parallel and standby), time, and dependent failures (common cause failure (CCF) and load share).

Fig. 1 Series–parallel system of RAP

The term ‘availability’ refers to a property of systems that can be repaired or checked after each period of operation (mission). Integrating repair into reliability allocation models has a significant effect on their outcome: a higher repair rate increases availability and a lower repair rate decreases it (Lambert et al. 1971). In recent years, repair in RAP has received considerable attention. Arabi and Jahromi (2012) modeled the steady-state availability of a system with several redundant subsystems according to the cold standby strategy considering repairable components. They constructed their model using the Markov process. Lins and Droguett (2011) considered the system structure and the number of maintenance groups and provided a set of compromise solutions in their proposed methodology. This system was subject to imperfect repairs. They presented a Multi-Objective Genetic Algorithm (MOGA) in combination with discrete event simulation to solve the problem. Liu (2015) considered an availability optimization problem with repairable components under constraints such as weight, volume, the required level of component reliability, and cost. To solve the problem, he developed a redundancy allocation heuristic by combining four algorithms: tabu search, simulated annealing, non-equilibrium simulated annealing, and the genetic algorithm. He applied sensitivity analysis to the proposed approach under design limitations. A common categorization of RAP optimization is based on the reparability of the components in the system and comprises reliability optimization and availability optimization. Zoulfaghari et al. (2014a) evaluated a system containing both repairable and non-repairable components concurrently by presenting a Mixed Integer Nonlinear Programming (MINLP) model for the availability optimization problem. To solve their model, they developed a Genetic Algorithm (GA).

Guo et al. (2014) considered a parallel-series system with repairable components. In their paper, failure rate, repair rate, and comparative factors were considered as uncertain variables. To solve the Multi-Objective RAP (MORAP), they proposed an efficient Non-Dominated Sorting GA (NSGA-II). Ebrahimipour and Sheikhalishahi (2011) investigated a multi-objective reliability redundancy allocation problem and solved their proposed model to obtain the number of components and the reliability of each component in the subsystems. In their paper, parameters were considered as fuzzy numbers. To study the effects of the repair strategy, several approaches under various situations have been investigated. To increase system availability, non-identical multi-state components can be added as redundant components in the parallel configuration of a subsystem. Nourelfath et al. (2012) presented a model for investigating redundancy and imperfect preventive maintenance planning optimization in series–parallel multi-state degraded systems. They utilized a Markov process to model and analyze the repair rates. Xie et al. (2014) presented an operational availability maximization model with two decision variables, the component redundancy and the number of spares stocked, under cost and physical limitations. In their paper, a single repairable k-out-of-n system under various shut-off rules was modeled by a continuous-time Markov chain. Some main parameters of operational availability and spare parts availability were calculated using the model. Attar et al. (2017) solved a multi-objective Joint Availability Redundancy Allocation Problem (JARAP) in a series–parallel system using a Simulation-Based Optimization (SBO) method. With an emphasis on the developed SBO method, they considered distribution-free random failure and repair times under several standby conditions. To solve the model, two efficient algorithms were used: a Non-Dominated Sorting Genetic Algorithm (NSGA-II) and a Strength Pareto Evolutionary Algorithm (SPEA2). Khalili-Damghani et al. (2013) proposed a Dynamic Self-Adaptive Multi-Objective Particle Swarm Optimization (DSAMOPSO) method to solve the binary-state Multi-Objective Reliability Redundancy Allocation Problems (MORAPs). Different properties of their proposed method made it robust and competitive with other existing methods. Khalili-Damghani and Amiri (2012) proposed a procedure based on an extended version of the efficient \(\varepsilon\)-constraint method and Data Envelopment Analysis (DEA) to solve the binary-state multi-objective series–parallel RAP. Other innovative approaches have also been used to solve RAPs. Khalili-Damghani et al. (2014) proposed a Decision Support System (DSS) to efficiently solve Multiple Objective Decision Making (MODM) problems by producing a Pareto front with a resolution preferred by the Decision Maker (DM). The core of the proposed DSS is based on a TOPSIS module, a modified efficient \(\varepsilon\)-constraint module, and a DEA module. Different applications of metaheuristic methods and MODM methods in RAP can be found in the literature (Ardakan and Rezvan 2018; Yeh 2018; Dolatshahi-Zand and Khalili-Damghani 2015; Li et al. 2010; Taboada et al. 2007; Zio and Bazzo 2011).

A redundant system can have different structures, including parallel, k-out-of-n, hot standby, warm standby, and cold standby. The difference between the parallel structure and the others lies in the operating condition of the components. Chen and You (2005) considered a series–parallel redundant reliability problem, in which both the multiple component choices of each subsystem and the redundancy levels of every selected component were to be decided simultaneously so as to maximize the system reliability. In another study, Soylu and Ulusoy (2011) studied a bi-objective redundancy allocation problem on a series–parallel system with a component-level redundancy strategy. Their main aim was to maximize the minimum subsystem reliability while minimizing the overall system cost. Gen and Yun (2006) introduced several variations of the reliability design problem with parallel configurations, such as reliability design problems of redundant systems, reliability design problems with alternative designs, reliability design problems with time-dependent reliability, reliability design problems with interval coefficients, and reliable network design problems. They also described various GA-based approaches for these problems. Bhunia et al. (2010) studied a stochastic reliability optimization problem in series–parallel systems with various resource limitations. To solve the problem, they treated the reliability of each component as a fuzzy-random quantity with a known probability distribution and fuzzy membership function.

Chambari et al. (2012) presented a RAP model with two objectives: maximizing the reliability and minimizing the cost of the system. In their paper, two redundancy strategies were considered: active and cold standby. They solved the model effectively using NSGA-II and MOPSO. Kim and Kim (2017) addressed the Reliability Redundancy Allocation Problem (RRAP) with either active or cold standby components by considering an optimal redundancy strategy. For modeling purposes, they utilized Markov chains and solved their proposed model using a parallel GA.

There is a limited number of studies on a specific operational time in RAP. Specifically, a specific time implies that systems can or should be operational within a short period of time, i.e., it should respond to a situation instantaneously. For example, consider one-shot systems, which should operate at a specific and very short time. Therefore, it is crucial to consider operation time in RAP modeling. This can be achieved by utilizing Markov chains. Amiri and Ghassemi-Tari (2007) proposed a methodology based on continuous-time Markov chain for analyzing system availability with instantaneous responses. Their proposed method was applicable to series, parallel and k-out-of-n system configurations. An instantaneous availability model for repairable multi-state system (MSS) was studied by Yu et al. (2014). Their model was created from combination of both Markov process and Universal Generating Function (UGF). Xu and Hu (2013) investigated instantaneous availability of a kind of repairable system with preventive maintenance using Markov chain. They also developed a time-dependent solution, which is essential for analyzing the instantaneous availability of the repairable system.

Types of dependent failures in reliability engineering are introduced in (Mortazavi et al. 2016; Mortazavi et al. 2017). Load share and common cause failure (CCF) are among the most important dependent failures. da Costa Bueno (2005) utilized a mathematical expression to allocate spare components in a k-out-of-n system and assumed that components are dependent on each other. Li et al. (2010) studied the heterogeneous redundancy optimization of multi-state series–parallel systems. Their main aim was minimizing costs and finding an optimal level of redundancy. In this regard, they incorporated CCF into their proposed model and concluded that considering CCF results in a different redundancy allocation strategy compared to the case where CCF is absent. Ramirez-Marquez and Coit (2007) formulated a RAP in the presence of CCF and presented three non-linear optimization models categorized based on computational time, degree of similarity to the true system behavior and CCF modeling. After solving models, they concluded that considering CCF significantly influences the outcomes. Arabi and Jahromi (2013) designed a new model for RAP using Markov chain. Two types of decision variables, redundancy level and number of repairmen, are directly calculated in the objective function. In the designed model, load share was also considered, i.e., failure of a component in a subsystem affects the load on others.

To create an applicable model of the RAP, real-world features of the problem should be incorporated. In particular, the repair rate affects the availability of every system. Therefore, in this paper, a parameter named the work interference factor is defined to model the influence of the number of repairmen on the repair rate in real-world situations. As previously mentioned, redundant systems can have different structures. One of the most important structures, and the one considered least in RAP, is the k-out-of-n structure. In this structure, dependent failure decreases the availability of the system and should be considered in reliability analyses. The scarcity of research on RAP in k-out-of-n systems under realistic situations was the main motivation for this paper, which develops a dynamic RAP model under realistic assumptions, including time, the work interference factor, repairable components, and their dependent and independent failures.

2 Model development of RAP

As explained in Sect. 1, system designers seek to increase system availability; hence, in recent years, many researchers endeavored to increase system availability through various models and techniques. In this section, a model along with its solution algorithm for a RAP in a k-out-of-n system considering load share, reparability, interference factor, and instantaneous response availability will be discussed.

2.1 Problem description and assumptions

Assume a system with N different subsystems connected in series, each of which has n components in a k-out-of-n configuration. In addition, suppose that the failure time of each component in the ith subsystem follows an exponential distribution with parameter λ. The exponential distribution is commonly used for its mathematical simplicity and its sufficiently realistic description of lifetime and time to failure (Robinson and Neuts 1989; Rausand and Høyland 2004; Cui and Li 2007). For instance, Çekyay and Özekici (2015) and Kuo et al. (2014) assumed that both failure and repair times follow exponential distributions, considering a k-out-of-n configuration in which failure of k out of n components (in each subsystem) results in failure of the whole system.

Components in each subsystem are interdependent through load sharing (the subsystems themselves are independent). Load share is a kind of dependent failure in redundant systems in which the failure of one redundant component increases the load on the surviving ones, thus increasing their failure rate. The capacity flow model is a common functional model for computing the failure rate of redundant system components with constant failure rate and load share (Pozsgai et al. 2003). Equation (1) presents the formulation of this model for computing the failure rate in each subsystem.

$$\lambda_{ij} = \left( {\frac{{n_{i} }}{{n_{i} - j}}} \right)^{{\gamma_{i} }} \cdot \lambda_{i0} ,$$
(1)

where, \(n_{i}\) refers to the number of components in the ith subsystem, j is the number of failed components in the subsystem, \(\gamma_{i}\) is the load factor for the ith subsystem, \(\lambda_{i0}\) represents the initial failure rate for the ith subsystem and \(\lambda_{ij}\) denotes the failure rate of surviving components after failure of j components in the ith subsystem. Load share is common in most redundant systems, including electric generators, water pumps, cable-stayed bridges, CPUs, etc. (Shao and Lamberson 1991).
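As a minimal sketch, Eq. (1) can be evaluated directly. The function name and the numerical values below are illustrative assumptions and do not come from the paper's data.

```python
def load_share_failure_rate(n, j, gamma, lambda0):
    """Failure rate of each surviving component after j failures, per Eq. (1)."""
    if not 0 <= j < n:
        raise ValueError("j must satisfy 0 <= j < n")
    return (n / (n - j)) ** gamma * lambda0


# Hypothetical subsystem: 4 components, load factor 1.0, initial rate 0.01
rates = [load_share_failure_rate(4, j, 1.0, 0.01) for j in range(4)]
for j, rate in enumerate(rates):
    print(f"{j} failed -> per-component failure rate {rate:.5f}")
```

With a positive load factor, the rate of each survivor grows with every additional failure, which is the load-share effect described above.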

Components in each subsystem are repairable, and each component can be repaired by one or more repairmen. Each subsystem may suffer failure in one or several components; in that case, a single repairman or a team of repairmen repairs one failed component at a time. Here, we assume that each component has a constant repair rate and that the repair time follows an exponential distribution. In addition, suppose that transfer of repairmen among subsystems is not allowed, i.e., each repairman is assigned to, and repairs, one specific subsystem. Each subsystem can have \(y_{i}\) identical repairmen. There is a direct relationship between the number of repairmen and the repair rate. More specifically, if \(\mu_{i}\) represents the repair rate for each component in the ith subsystem and \(y_{i}\) specifies the number of repairmen for the ith subsystem, then, absent any interference, the repair rate of this subsystem is \(y_{i} \times \mu_{i}\). In real-world situations, due to work interference among repairmen, increasing the number of repairmen does not necessarily increase the repair rate. A high degree of work interference reduces the repair rate and thus diminishes system availability. Equation (2) is one way of modeling the relationship between the number of repairmen and the repair rate, where \(\alpha_{i} \in \left( {0,1} \right)\) is the work interference factor of the ith subsystem. A value close to 1 for the interference factor indicates high work interference among repairmen, while a value close to 0 indicates low work interference.

$$\mu_{i} \times y_{i} \times \left( {1 - \alpha_{i} } \right)^{{\left( {y_{i} - 1} \right)}} .$$
(2)

As an illustration, assume that the repair rate and the interference factor of a subsystem are equal to 0.02 and 0.2, respectively. Table 1 presents the repair rate versus the number of repairmen. By Eq. (2), the repair rate increases as repairmen are added up to four, is unchanged by a fifth repairman, and declines with six or more repairmen due to work interference. RAP modeling must consider work interference to prevent an unnecessary increase in the number of repairmen. Under high work interference, increasing the number of repairmen not only reduces the availability of the system but also imposes higher costs on the system (increased maintenance costs).

Table 1 Effect of the number of repairmen on the repair rate
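The illustration can be reproduced with a few lines of code. This is only a sketch of Eq. (2); the function name is an assumption, while the inputs 0.02 and 0.2 are the values quoted in the text.

```python
def effective_repair_rate(mu, y, alpha):
    """Subsystem repair rate with y repairmen, per Eq. (2)."""
    if y < 1:
        raise ValueError("at least one repairman is required")
    return mu * y * (1.0 - alpha) ** (y - 1)


# Values from the illustration in the text: mu = 0.02, alpha = 0.2
table = {y: effective_repair_rate(0.02, y, 0.2) for y in range(1, 9)}
for y, rate in table.items():
    print(f"{y} repairmen -> repair rate {rate:.5f}")
```

For these inputs the rate peaks at 0.04096 for both four and five repairmen and falls from six repairmen onward.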

It should be noted that the number of components in each subsystem is limited and varies between ȴmin and ȴmax. The fact that each subsystem has a k-out-of-n configuration necessitates the existence of at least two components in each subsystem, i.e. ȴmin ≥ 2. Similarly, the number of repairmen in each subsystem lies in the range [\(\eta_{\hbox{min} } ,\eta_{\hbox{max} }\)]. All components in the subsystem are repairable and are fully repaired; therefore, each subsystem requires at least one repairman, i.e. \(\eta_{\hbox{min} } \ge 1\).

Since k and n are decision variables, each subsystem can have a specific configuration; therefore, each system is thought to have different installation costs. Furthermore, since the number of repairmen may vary from one subsystem to another, repairman fees are considered to be different for each subsystem. Moreover, the components function in a binary (not multi-state) manner.

With the above descriptions and assumptions regarding the addressed RAP, model of the current problem can be formulated as follows.

2.2 Notation

See Table 2.

Table 2 Symbols

2.3 Mathematical model of RAP

Based on the foregoing assumptions and the symbols in Table 2, the mathematical RAP model is as follows:

$${\text{Max}}\;A\left( t \right) = \mathop \prod \limits_{i = 1}^{N} A_{i} \left( {n_{i} ,k_{i} ,y_{i} ,\lambda_{i0} ,\lambda_{ij} ,\gamma_{i} ,\mu_{i} ,\alpha_{i} ,t} \right),$$
(3)
$${\text{Min}} \mathop \sum \limits_{i = 1}^{N} \left( {c_{i} n_{i} + h_{i} y_{i} } \right),$$
(4)
$${\text{s}}.{\text{t}}.,$$
$$\mathop \sum \limits_{i = 1}^{N} w_{i} n_{i} \le W,$$
(5)
$$\mathop \sum \limits_{i = 1}^{N} v_{i} n_{i} \le V,$$
(6)
$${\text{ȴ}}_{{min} } \le n_{i} \le {\text{ȴ}}_{{max} } ,$$
(7)
$$\eta_{{min} } \le y_{i} \le \eta_{{max} } .$$
(8)

The two objective functions in the foregoing model conflict with each other. The first objective is to maximize total system availability at a specific time t and the second is to minimize the total cost. Since all subsystems are arranged in series, total system availability is determined by multiplying the \(A_{i} \left( t \right)\) of the subsystems. Hence, the \(A_{i} \left( t \right)\) of each subsystem must be computed before A(t). A great number of studies have developed various methods to compute the availability of k-out-of-n systems (Mortazavi et al. 2016; Carpitella et al. 2017; Li et al. 2016). In the present paper, the value of \(A_{i} \left( t \right)\), which is a function of the number of components (\(n_{i}\)), the number of failed components that brings the subsystem down (\(k_{i}\)), the number of repairmen (\(y_{i}\)), the dependent failure rate (\(\lambda_{ij}\)), the repair rate (\(\mu_{i}\)), the interference factor (\(\alpha_{i}\)), and the specific time (t), is computed using a Markov chain and its transition matrix. According to the foregoing discussion, each subsystem can have its own configuration. For system designers, it is of vital importance to ascertain the proper configuration and number of repairmen for each subsystem.

Figure 2 illustrates the transition diagram for a k-out-of-n system. In this diagram, each state presents the number of failed components in the subsystem. Hence, state 0 denotes absence of failed components in the system (all components are operating) and state n indicates failure of all subsystem components. It is worth mentioning that all components in each subsystem operate at moment 0.

Fig. 2 Transition diagram for k-out-of-n configuration

In Fig. 2, two events are possible in each state: if a component fails, the system enters the next state, and if a failed component is repaired, the system returns to the previous state. As mentioned previously, components in each subsystem are interdependent through load sharing. Failure of one component increases the load on the others and hence their failure rate. Under such circumstances, the failure rate varies from one state to the next and can be computed by Eq. (1). Besides, in state j there are \(n_{i} - j\) operating components in the subsystem, any of which may fail; therefore, the failure transition rate out of each state equals \((n_{i} - j) \times \lambda_{ij}\) (i represents the subsystem index and j is the number of failed components in the state).

The repair rate for any state depends on the number of repairmen and the interference factor, and is determined by Eq. (2). Because interference in maintenance activities increases carelessness and indiscipline, adding repairmen does not necessarily increase the repair rate and may even reduce it, resulting in reduced availability for each subsystem.

In k-out-of-n systems, if k out of n components fail, the whole system is considered failed. Given the transition diagram in Fig. 2, in which each state indicates the number of failed components, the availability of each subsystem is determined as follows:

$$A_{i} \left( t \right) = \mathop \sum \limits_{j = 0}^{k - 1} p_{ij} \left( t \right).$$
(9)

For instance, Fig. 3 illustrates the transition diagram for a 2-out-of-3 system. Availability of this system is equal to the sum of probabilities of states 0 and 1 (states marked in green).

Fig. 3 Transition diagram for 2-out-of-3 configuration

Computing the state probabilities requires converting the transition diagram in Fig. 2 into a transition rate matrix, named \(Q\) here. \(Q\) is a square matrix whose numbers of rows and columns both equal the number of states in the transition diagram. For example, \(Q_{12}\) is the transition rate from state 1 to state 2. The transition rate matrix for the diagram in Fig. 2 is as follows:

$$Q = \left[ {\begin{array}{*{20}c} { - n_{i} \lambda_{i0} } & {n_{i} \lambda_{i0} } & 0 & \cdots & 0 & 0 & 0 \\ {\tilde{\mu }_{i} } & { - \left( {\tilde{\mu }_{i} + (n_{i} - 1)\lambda_{i1} } \right)} & {(n_{i} - 1)\lambda_{i1} } & \cdots & 0 & 0 & 0 \\ 0 & {\tilde{\mu }_{i} } & { - \left( {\tilde{\mu }_{i} + (n_{i} - 2)\lambda_{i2} } \right)} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & { - \left( {\tilde{\mu }_{i} + 2\lambda_{{i(n_{i} - 2)}} } \right)} & {2\lambda_{{i(n_{i} - 2)}} } & 0 \\ 0 & 0 & 0 & \cdots & {\tilde{\mu }_{i} } & { - \left( {\tilde{\mu }_{i} + \lambda_{{i(n_{i} - 1)}} } \right)} & {\lambda_{{i(n_{i} - 1)}} } \\ 0 & 0 & 0 & \cdots & 0 & {\tilde{\mu }_{i} } & { - \tilde{\mu }_{i} } \\ \end{array} } \right],\quad \tilde{\mu }_{i} = \mu_{i} y_{i} (1 - \alpha_{i} )^{{(y_{i} - 1)}} ,$$
(10)
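The structure of \(Q\) can be sketched in code. The following assumes the failure transition rate \((n_{i} - j) \times \lambda_{ij}\) described for Fig. 2, the load-share rate of Eq. (1), and the repair rate of Eq. (2); the function name and the sample values are hypothetical.

```python
def build_rate_matrix(n, gamma, lambda0, mu, y, alpha):
    """Return the (n+1) x (n+1) transition rate matrix Q as nested lists.

    State j is the number of failed components (0..n).
    """
    repair = mu * y * (1.0 - alpha) ** (y - 1)           # Eq. (2)
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j < n:
            lam_j = (n / (n - j)) ** gamma * lambda0      # Eq. (1)
            q[j][j + 1] = (n - j) * lam_j                # one more failure
        if j > 0:
            q[j][j - 1] = repair                          # one repair completed
        q[j][j] = -sum(q[j])                              # each row sums to zero
    return q


# Hypothetical 3-component subsystem with two repairmen
Q = build_rate_matrix(n=3, gamma=1.0, lambda0=0.01, mu=0.02, y=2, alpha=0.2)
for row in Q:
    print(["%.3f" % value for value in row])
```

Only the diagonal and its two neighbours are non-zero, since a state can change by one failure or one repair at a time.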

Amiri and Ghassemi-Tari (2007) developed an equation to compute system availability (Eq. (11)). Using a Markov chain and its transition rate matrix, they determined time-based availability functions. Using Eq. (11), the availability functions of a k-out-of-n system versus time can be determined. The resulting matrix is denoted by \(P\left( t \right)\). Owing to the memoryless property of the exponential distribution, the state probabilities at time t, denoted by \(P_{n} \left( t \right)\), depend only on the state probabilities at time t = 0, denoted by \(P_{n} \left( 0 \right)\), and on the elapsed time t.

$$P\left( t \right) = p_{ij} \left( t \right) = e^{Q \times t} ,$$
(11)

where it is assumed that all n components function at moment t = 0. \(P_{n} \left( 0 \right)\) is a row vector whose entries are all zero except the first, which is one. Equation (12) gives the state probabilities at moment t, which are obtained by the multiplication \(P_{n} \left( 0 \right) \times P\left( t \right)\).

$$P_{n} \left( t \right) = P_{n} \left( 0 \right) \times e^{Q \times t} .$$
(12)

Availability of each k-out-of-n subsystem can be computed through Eq. (9) and the total availability is obtained by multiplication of \(A_{i} \left( t \right)\) of each subsystem.
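A self-contained sketch of Eqs. (9), (12) and (3) is given below. Instead of calling a matrix-exponential routine, it approximates \(P_{n}(0) \times e^{Q \times t}\) by uniformization, a standard numerical technique that is swapped in here and is not stated in the paper. All function names and the demonstration data are hypothetical.

```python
import math


def build_rate_matrix(n, gamma, lambda0, mu, y, alpha):
    """Birth-death rate matrix Q; state j = number of failed components."""
    repair = mu * y * (1.0 - alpha) ** (y - 1)           # Eq. (2)
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j < n:
            lam_j = (n / (n - j)) ** gamma * lambda0      # Eq. (1)
            q[j][j + 1] = (n - j) * lam_j
        if j > 0:
            q[j][j - 1] = repair
        q[j][j] = -sum(q[j])
    return q


def state_probabilities(q, t, tol=1e-12):
    """Approximate P_n(0) * exp(Q t) by uniformization, starting in state 0."""
    size = len(q)
    rate = max(-q[s][s] for s in range(size))             # uniformization rate
    start = [1.0] + [0.0] * (size - 1)                    # all components up at t = 0
    if rate * t == 0.0:
        return start
    if rate * t > 700.0:
        raise ValueError("rate * t is too large for this simple sketch")
    # One-step matrix of the uniformized discrete chain: K = I + Q / rate
    kernel = [[(1.0 if r == c else 0.0) + q[r][c] / rate for c in range(size)]
              for r in range(size)]
    vec = start
    weight = math.exp(-rate * t)                          # Poisson weight for m = 0
    result = [weight * v for v in vec]
    covered, m = weight, 0
    while 1.0 - covered > tol and m < 100000:
        m += 1
        vec = [sum(vec[r] * kernel[r][c] for r in range(size)) for c in range(size)]
        weight *= rate * t / m
        result = [acc + weight * v for acc, v in zip(result, vec)]
        covered += weight
    return result


def subsystem_availability(n, k, gamma, lambda0, mu, y, alpha, t):
    """Eq. (9): probability that fewer than k components are failed at time t."""
    q = build_rate_matrix(n, gamma, lambda0, mu, y, alpha)
    return sum(state_probabilities(q, t)[:k])


def system_availability(subsystems, t):
    """Eq. (3): product of subsystem availabilities for a series system."""
    total = 1.0
    for params in subsystems:
        total *= subsystem_availability(t=t, **params)
    return total


# Hypothetical data, for illustration only
demo = [dict(n=3, k=2, gamma=1.0, lambda0=0.01, mu=0.02, y=2, alpha=0.2),
        dict(n=4, k=3, gamma=0.5, lambda0=0.005, mu=0.03, y=1, alpha=0.1)]
print(system_availability(demo, t=100.0))
```

For a single repairable component the result can be checked against the closed form \(\mu /(\lambda + \mu ) + \lambda /(\lambda + \mu )\,e^{ - (\lambda + \mu )t}\), which is a convenient sanity test before using the routine inside an optimizer.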

3 NSGA II

Solving the multi-objective redundancy allocation problem (MORAP) with exact techniques is very difficult, and the MORAP has been proved to be an NP-hard problem (Alavi et al. 2017a, b). Many researchers have used metaheuristic algorithms, which have shown great efficiency in solving MORAPs (Eshraghniaye Jahromi and Feizabadi 2017). The Genetic Algorithm (GA) is a population-based evolutionary metaheuristic algorithm. Deb et al. (2002) introduced a modified version of the Non-Dominated Sorting Genetic Algorithm (NSGA) called NSGA-II.

In the NSGA-II algorithm, with the addition of two operators, the single-objective GA is converted into a multi-objective algorithm that offers a set of best solutions, known as the Pareto front, rather than a single best solution. NSGA-II is efficient in finding the Pareto front and many researchers have applied it to their optimization problems. The two necessary operators are:

  1. The operator which assigns an excellence criterion to the population members based on non-dominated sorting.

    A solution dominates another if it is no worse in every objective and strictly better in at least one; non-dominated sorting ranks the population by repeatedly extracting the solutions that no remaining solution dominates.

  2. The crowding distance, which maintains diversity among solutions with equal ranks.

    If two solutions are of the same rank, the solution with larger crowding distance is selected. Large average crowding distance will result in better diversity in the population (Alikar et al. 2017).
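The two operators can be sketched as follows for the two objectives of this problem, assuming each solution is summarized as a pair (availability, cost) with availability maximized and cost minimized. The function names and sample points are illustrative, and the crowding distance follows the usual normalized form of Deb et al. (2002).

```python
def dominates(a, b):
    """True if solution a dominates b; each is (availability, cost)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def first_front(points):
    """Indices of the non-dominated points (rank 1)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def crowding_distance(points):
    """Crowding distance of each point within a single front."""
    n = len(points)
    dist = [0.0] * n
    for m in range(2):                                    # two objectives
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")   # keep boundary points
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][m] - points[order[pos - 1]][m]
            dist[order[pos]] += gap / (hi - lo)
    return dist


# Hypothetical (availability, cost) pairs
population = [(0.90, 100), (0.95, 150), (0.99, 300), (0.92, 200)]
front = first_front(population)
distances = crowding_distance([population[i] for i in front])
```

Here the fourth point is dominated by the second (lower availability at higher cost), so it is excluded from the first front.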

In the following, steps of the algorithm are presented (Farrokhi-Asl et al. 2017):

Step 1 Initialization

An initial population is generated randomly

Step 2 Evaluation

The values of the fitness functions are calculated for each individual. Individuals are compared based on non-domination and a rank is assigned to each chromosome. Rank one is the best level, rank two is the next best level, and so on (the non-dominated solutions form the first front).

Step 3 Density estimation

For each point, the average distance between the two neighboring points on either side of it is estimated and assigned as its crowding distance.

Step 4 Selection

Parents are selected for participating in reproduction

Step 5 Crossover operator

Crossover operator is applied to create offspring from parents for a predetermined percentage of individuals selected

Step 6 Mutation operator

Mutation operator is applied to create individuals from a predetermined percentage of individuals selected

Step 7 Replacement

Old set of solutions and newly created solutions are merged to create a new population. The new population is sorted using the non-domination criterion with respect to elitism and crowding distance.

Step 8 Individuals with the non-domination level 1 are specified as Pareto solutions.

Step 9 These steps are repeated until a stopping condition is met

In this paper, to solve the proposed multi-objective problem, NSGA-II algorithm is applied to obtain the optimal solution.

3.1 Individual representation

An effective definition of the chromosome can help find a better result quickly. The chromosomes are decoded into meaningful solutions of the model (Farrokhi-Asl et al. 2017).

In the proposed GA, a solution is encoded as a 3 × n matrix, where n is the number of subsystems; a chromosome is created as a random arrangement of numbers that respects the bounds of the variables. The first row represents the number of components in each subsystem, the second row gives the value of k in the k-out-of-n configuration of each subsystem, and the final row represents the number of repairmen. Figure 4 presents a chromosome structure considered for this problem with n = 4.

Fig. 4 Chromosome definition

Defining the initial population is a primary and important part of any metaheuristic algorithm. In this paper, the initial population is generated randomly and only feasible individuals are retained. The initial population size is set to 100.

3.2 Constraint handling

To handle the constraints of the MORAP, a strict method called ‘remove infeasible individual’ is considered. To manage the limitations, infeasible individual solutions are eliminated after production. In this method, solutions are produced regardless of limitations and are investigated afterward in terms of feasibility so that the infeasible solutions are eliminated from the population. This method is, in fact, a strict penalty allocation method with very high penalties.

3.3 Selection

Parents are selected by tournament selection, in which k chromosomes are drawn randomly from the population and compared on two criteria. The first criterion is the rank of the selected solutions; the one with the lowest front rank is chosen. Second, if the ranks are equal, the crowding distances are compared and the solution with the higher crowding distance is selected (Alikar et al. 2017; Eshraghniaye Jahromi and Feizabadi 2017).
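A minimal sketch of this comparison is shown below, assuming the front ranks and crowding distances have already been computed. The function name, the default tournament size, and the sample data are assumptions for illustration.

```python
import random


def tournament_select(ranks, crowding, size=2, rng=random):
    """Return the index of the tournament winner.

    ranks[i] is the front rank (1 = best); crowding[i] is the crowding distance.
    """
    contenders = rng.sample(range(len(ranks)), size)
    # Lower rank wins; ties are broken by the larger crowding distance.
    return min(contenders, key=lambda i: (ranks[i], -crowding[i]))


# Hypothetical population of three ranked solutions
ranks = [1, 2, 1]
crowding = [0.5, 3.0, 1.5]
winner = tournament_select(ranks, crowding, size=3, rng=random.Random(0))
```

When all three individuals compete, the two rank-1 solutions tie on rank and the one with the larger crowding distance wins.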

3.4 Crossover operators

In the crossover operator, a new solution is produced by combining the information of two or more parents. Combining chromosomes to produce new chromosomes (offspring) prevents premature convergence and helps to explore the solution space thoroughly. The popular crossover methods are: (1) one-point crossover, (2) two-point crossover, (3) uniform crossover. In this paper, the two-point method is employed. After selecting parents, two random integers between 1 and the chromosome length (number of variables) are selected. The parents are divided into three distinct parts by these two integers (Tavana et al. 2016). Offspring are produced by swapping the middle part of the parents' chromosomes, while the other parts remain unchanged. Figure 5 depicts the crossover performed in the algorithm. After the crossover operator, a replacement strategy is adopted so that parents can be replaced by their corresponding offspring. A crossover rate of 0.8 is assumed here.

Fig. 5
figure 5

Example of two-point crossover
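A minimal sketch of the two-point crossover described above is given here. It assumes chromosomes are equal-length lists with at least three genes and draws the two cut points strictly inside the chromosome so that all three parts are non-empty; that restriction, the function name, and the `rate` parameter name are assumptions of this sketch.

```python
import random
from typing import List, Tuple


def two_point_crossover(
    parent1: List[int],
    parent2: List[int],
    rng: random.Random,
    rate: float = 0.8,  # crossover rate stated in the text
) -> Tuple[List[int], List[int]]:
    """Swap the middle segment of two parents with probability `rate`."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    n = len(parent1)
    # Too short to split into three parts, or crossover not triggered: copy parents.
    if n < 3 or rng.random() >= rate:
        return list(parent1), list(parent2)
    # Two distinct cut points in 1..n-1, sorted so that i < j.
    i, j = sorted(rng.sample(range(1, n), 2))
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2
```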

3.5 Mutation operators

The mutation operator helps to move toward a new point in the solution space. Mutation provides access to regions of the solution space that the crossover operator cannot reach. The main purpose of applying the mutation operator is to enhance diversity and avoid being trapped in local optima (Zoulfaghari et al. 2014b; Amiri et al. 2013).

In this paper, the adaptive feasible mutation method is used. This method searches the solution space widely by generating mutation directions, improves its performance over the course of the evolution, and mutates individuals subject to the constraints of the optimization problem. The following steps describe the method (Kumar 2010; Tavana et al. 2016).

  1. Select a chromosome randomly for mutation.

  2. Generate the mutation direction vector and the initial step size randomly.

  3. Generate the mutated individual from the direction vector and the step size. According to the direction vector, the step size is either added to or subtracted from the value of the gene selected for mutation.

  4. Check whether the mutated individual satisfies the constraints and bounds.

  5. If the individual is feasible, accept it as the mutated individual; otherwise, decrease the step size.

If the mutated individual lies in the infeasible region, the algorithm automatically reduces the step size and generates another mutated individual using the direction vector. The process repeats until a feasible individual is obtained.
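Steps 2 to 5 can be sketched for a single, already selected integer chromosome as follows. Several details are assumptions of this sketch rather than statements from the paper: only one gene is mutated, the direction is +1 or -1, the step size is halved (integer division) after each infeasible trial, and the parent is returned unchanged if no feasible mutant is found before the step size reaches zero.

```python
import random
from typing import Callable, List, Sequence


def adaptive_feasible_mutation(
    individual: Sequence[int],
    is_feasible: Callable[[List[int]], bool],
    lower: Sequence[int],
    upper: Sequence[int],
    rng: random.Random,
    initial_step: int = 4,  # hypothetical upper limit for the random first step
) -> List[int]:
    """Mutate one gene along a random direction, shrinking the step until feasible."""
    gene = rng.randrange(len(individual))   # gene selected for mutation
    direction = rng.choice((-1, 1))         # add or subtract the step
    step = rng.randint(1, initial_step)     # random initial step size
    while step >= 1:
        mutant = list(individual)
        mutant[gene] = individual[gene] + direction * step
        within_bounds = lower[gene] <= mutant[gene] <= upper[gene]
        if within_bounds and is_feasible(mutant):
            return mutant
        step //= 2                          # infeasible: reduce the step size
    return list(individual)                 # fallback: keep the parent
```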

3.6 Stopping condition

The proposed algorithm stops after a specified number of iterations that provides a stable Pareto front for the problem. The number of iterations is set to 1000 in the current problem.

4 Discussion

In this section, the model is validated through a numerical example. As shown in Table 3, eight subsystems (for example, electronic components) interconnected in series are assumed, each of which has a k-out-of-n configuration. Both k and n are regarded as decision variables. System designers can build a separate k-out-of-n configuration for each subsystem. Furthermore, considering the presence of load share between the components of each subsystem, different configurations can be built for each subsystem. Each subsystem configuration can have a different availability; therefore, selecting an appropriate and optimal configuration leads to a system with high availability. In addition, since the model presented in this paper is time-dependent, an appropriate mission time can be determined for the system. Table 3 presents the initial failure rate, repair rate, interference factor, cost of each component, repairman costs, weight of each component, and volume of each component.

Table 3 Parameters for each subsystem

The maximum permissible weight for the whole system is approximately W = 130,000 kg and the maximum permissible volume for the whole system is about V = 110,000 m³. Table 4 shows the maximum and minimum numbers of components and repairmen for each subsystem. The minimum number of components in each subsystem is assumed to be two, because the k-out-of-n configuration of the subsystems necessitates the presence of at least two components in each subsystem. Furthermore, since all the subsystems are repairable, each subsystem requires at least one repairman. The numerical example presented in this section is solved using NSGA II. The initial population size is 50, the number of iterations is 200, and the number of function evaluations is 11,250.

Table 4 Maximum and minimum number of components and repairmen in each subsystem

The model has been solved for three different mission times: t = 0.1 h, t = 0.5 h, and t = 1 h. Figure 6 illustrates the first Pareto front of the model, with the two objective functions of cost and availability, at these times. In this illustration, the red arrow marks the worst solution and the green arrow marks the best solution. Figure 6 clearly shows that availability decreases over time. For instance, consider a cost of 9000 monetary units. For this amount, the whole-system availability is approximately 72% at t = 0.1, 19.6% at t = 0.5, and 3% at t = 1. Figure 6 also shows that higher availability comes with higher whole-system costs.

Fig. 6
figure 6

RAP Pareto front
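For readers unfamiliar with the term, the first Pareto front for the two objectives (minimize cost, maximize availability) can be extracted with a simple dominance filter such as the sketch below. The function names are hypothetical, and the quadratic scan shown is a plain filter, not the fast non-dominated sorting procedure that NSGA II itself uses.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, availability)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def first_pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

On such a front, every increase in availability must be paid for with a higher cost, which is the trade-off visible in Fig. 6.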

Table 5 lists six different configurations from the Pareto front diagram and presents the data for the best and worst availability on each Pareto front (green arrow and red arrow). The table specifies the values of n, y, and k for each subsystem at t = 0.1, t = 0.5, and t = 1. More components are required to reach maximum availability. For example, 37 components are required in the best configuration at t = 0.1, 32 at t = 0.5, and only 23 at t = 1. Availability decreases over time; hence, the best attainable availability can be reached with a smaller number of components.

Table 5 Minimum and maximum points of Pareto front based on availability

As the number of components decreases, fewer repairmen are required. This is shown in Table 5 for the three states (i.e. t = 0.1, t = 0.5, and t = 1).

Figure 7 illustrates the availability curves for all six configurations in Table 5. In all the diagrams, system availability decreases over time. In other words, all systems except system (b) fail over time (they hardly resist failure). For system (b), availability equals 0.9974 at t = 0.1. The availability of system (b) decreases over time, but the system resists failure up to t = 3.5. System (b) is in a stable state at t = 3.5 and maintains an availability of approximately 0.062 (still resisting failure). This system can be used as an optimal system for very short operation times.

Fig. 7
figure 7

System availability for systems presented in Table 5

In light of the foregoing explanations, it is illogical to select systems (a), (c), (d), (e), or (f) as optimal systems, because these systems fail before reaching the stable state. The availability of systems (c) and (d) at t = 0.5 is 0.0947 and 0.7695, respectively. The availability of system (d) might appear acceptable, but this system also fails before reaching the stable state. The same holds for systems (e) and (f). As regards systems (d) and (f), if short-time operation is desirable, system (d) can be more effective than system (f); besides, the costs of system (d) are lower than those of system (f). However, as mentioned previously, for short-time operation, system (b) is the most efficient of all. In real-world settings, designers sometimes need to build new systems with very short operation times. Therefore, they must account for the effect of the passage of time on RAP. The best mission time that can be considered for the system is t = 0.1. Considering the existence of dependent failures (e.g. load share) in most redundant systems, it is recommended that these systems not be given long operation times, since their availability diminishes substantially over time.

As mentioned before, an increased number of repairmen does not necessarily lead to an increase in availability and, in case of work interference among the repairmen, an extreme increase in the number of repairmen does not result in an increase in the repair rate. The results presented in Table 5 show that the first subsystem has the lowest interference factor; therefore, it receives the greatest number of repairmen in all states (t = 0.1, t = 0.5, t = 1). In contrast, the third subsystem has the highest interference factor, and hence it receives the lowest number of repairmen. To better understand the effect of the interference factor in the model presented in this paper, a sensitivity analysis is carried out on this parameter. Assuming that the interference factor is zero for all the subsystems (no work interference among the repairmen), the model is solved again at t = 0.1. Table 6 presents the best solution for this model (other parameters are taken from Table 3 and the solution conditions are identical). Due to the lack of work interference among the repairmen, the number of repairmen has increased relative to the best solution in Table 5 at t = 0.1 (from 29 to 35). Moreover, the configuration of each subsystem has changed and the best availability at t = 0.1 has decreased (from 99 to 79%) in comparison to Table 5. Hence, it is very important to incorporate the work interference among repairmen into RAP, since it affects the model solution in several ways. Figure 8 illustrates the Pareto front for the model without work interference.

Table 6 Maximum point of Pareto front of Fig. 8 based on availability
Fig. 8
figure 8

RAP Pareto front for \(\varvec{\alpha}_{\varvec{i}} = 0\)

5 Conclusion

In the present study, a RAP is presented for the k-out-of-n system with reparability, time, load share, and interference factor. A Markov chain has been used to account for these features, and efforts have been made to integrate real conditions into the optimization model. The first objective function in the optimization model is the system availability function, which is built from the transition rates. Using the Markov chain equations, this function is obtained in terms of time. Considering the mission time in RAP modeling is very important, as it affects the number of redundant systems. Moreover, certain systems are designed for a very short time interval (such as one-shot systems). In these systems, the mission time interval is very sensitive and has a significant impact on the performance of the system. Additionally, reparability is an important feature that should be considered in creating an availability function. The repair rate is directly related to the number of repairmen. Typically, the repair rate increases with the number of repairmen, but in real conditions this is not always the case: as the number of repairmen increases, the interactions between them may also increase, and this affects the repair rate. To address such a situation, the interference factor has been defined and integrated into the optimization model. Another feature investigated in this paper for RAP is dependent failures. One of the most important dependent failures is the load share. This type of dependency makes availability extremely vulnerable in redundant systems. To take the load share into account, we used the capacity flow model.

The second objective function has been defined to reduce the system costs, which include the component costs and the repairmen's fees. As the problem is NP-hard, the genetic algorithm NSGA II has been selected for solving the optimization model, and the results have been analyzed for different mission times (t = 0.1, 0.5, 1).

Future studies can focus on developing methods for solving and choosing the system configurations (strategy selection). Other evolutionary methods can also be used to solve the model under uncertainty and to analyze the solutions. In addition, there are different configurations for redundant systems (such as cold standby, warm standby, etc.); by adding a decision variable to the optimization model, the configuration with the highest availability and the lowest cost can be identified. It is also suggested that probability distributions other than the exponential distribution be considered in the calculations.