Keywords

1 Introduction

In the manufacturing and service sectors, human labor has increasingly been replaced by machines. These machines deteriorate over time, which makes maintenance a critical issue in both sectors. As machinery and parts have become more sophisticated, maintenance costs have risen as well. For some companies, maintenance costs can account for up to 70% of total production costs (Bevilacqua & Braglia, 2000 [1]). Companies with better maintenance strategies can therefore be more competitive because of the cost savings they achieve. Firms that implement poor maintenance strategies not only spend more money on their machines and parts but also damage their reputation, as inadequate maintenance strategies can increase downtime by approximately 80% (Özcan et al., 2017 [2]). Moreover, ineffective maintenance strategies can cause significant losses, reducing a company's direct production capacity by up to 20% (Wollenhaupt, 2017 [3]).

Maintenance operations can significantly affect the daily operations of businesses. Machines that are not properly maintained may not function correctly, and downtime may increase. Even when no downtime occurs, machine efficiency may decrease. If the maintenance requirements of machines are not met regularly, machine failures and higher replacement costs can follow. Maintenance also affects workplace safety, because machines that are not regularly maintained can create dangerous situations. Businesses increasingly rely on technologically advanced machines, and as machines become more advanced, maintenance becomes more important. Such machines are more expensive and carry heavier workloads. Examples include high-tech lithography systems, MRI machines, and wind turbines. Even momentary interruptions of these machines can cost companies thousands or even millions of dollars. According to Sleptchenko and van der Heijden (2016) [4], an average aircraft can lose $10,000 for every hour it is out of service, while downtime of a high-tech lithography system can cost up to $100,000 per hour.

Companies use two primary maintenance strategies: time-based maintenance (TBM) and condition-based maintenance (CBM). Time-based maintenance performs maintenance activities at fixed intervals regardless of the equipment's condition. For instance, a machine may receive maintenance every eight months or every year, whether or not it shows signs of wear and tear. While this type of maintenance is typically easier to plan and schedule, it can lead to unnecessary maintenance and increased costs. Condition-based maintenance, on the other hand, monitors the equipment's condition and performs maintenance activities only when necessary. However, obtaining information about the equipment's condition requires various tools. CBM is more cost-effective because maintenance is performed only when necessary, but it requires more advanced planning and monitoring systems. CBM can reduce maintenance costs by more than 50% (Zhang, 2018 [5]). Although it is more complex to implement for technologically advanced machines, CBM is preferred for them because of the high cost of downtime and the higher replacement costs incurred when equipment is replaced more frequently. A comparison of CBM and TBM has shown that CBM is more effective in reducing maintenance costs (Elwany & Gebraeel, 2008 [6]).

As discussed above, making intervention decisions under a CBM strategy requires information about the condition of the equipment or system. Sensors can detect changes in equipment performance and provide information about the critical components or systems to which they are attached. For example, blades are critical components of wind turbine generators, and sensors can signal their condition based on various indicators. Zhang et al. (2018) [7] suggest that the level of degradation can be evaluated through factors such as the degree of corrosion, wear area, creep fatigue, and crack growth. However, since sensors are expensive, it is not economically feasible to install one on every component. We therefore install a sensor at the system level, which is more cost-effective, but the information it provides concerns the system rather than individual components. In other words, we obtain only partial information: when a failure occurs, we cannot tell which component failed or how many components failed.

As mentioned above, we gather information through sensors, in conjunction with the CBM strategy, to decide when to intervene in the system. Once we decide to perform maintenance, we need to replace the failed components. In doing so, we must decide how many components to bring, because these are high-tech components that are difficult to transport and require careful handling. If we bring too few, we must place an emergency order for the missing component(s). If we bring too many, we must return the extra components to inventory, which is generally less costly than an emergency order. Few papers address spare part selection: the decision is already difficult on its own, and integrating it with the maintenance decision makes finding an optimal policy even more complex. One of our main objectives is to contribute to the literature on spare part selection.

In this study, we aim to contribute to the literature with a wider design of experiments on spare part selection decisions for systems of identical components. We build a simulation model that jointly optimizes the timing of maintenance and the selection of spare parts under a CBM strategy, use it to find optimal policies under various scenarios, and draw managerial insights from these policies. The wide design of experiments is what makes this thesis unique.

The remainder of the paper consists of five sections. Section 2 reviews the literature on CBM, CBM-related sensors, and spare part selection, organized around single-component and multi-component systems. Section 3 defines the problem, including its critical elements, its costs, and how it is solved. Section 4 describes the solution methodology in detail, including how the model is validated and adapted to real-life scenarios. Section 5 discusses the inputs and outputs of the model, the scenarios developed, why they were selected and how realistic they are, and the data used. It then presents the scenario results and enriches them with managerial insights using graphics. Finally, Section 6 presents conclusions and directions for future research.

2 Literature Review

In reviewing the literature, we focus on CBM for single-component and multi-component systems. Rather than the broader CBM literature on topics such as disaster and management issues, we prioritize papers that deal with maintenance decisions and costs. Most of the papers we reviewed model sensor deterioration, but we ignore this effect. Our contributions to the literature are to find optimal policies using a simulation model of a multi-component system, to optimize spare part selection at the time of intervention, and to provide a wider design of experiments than previous studies.

Karabağ et al. (2022) [8] modeled a single-component system as a Markov decision process. They defined two naïve policies, a Corrective Policy and a Naïve Policy, and sought to minimize the long-run expected maintenance cost rate. They observed that this cost rate rises as the ratio of corrective to preventive maintenance cost and α increase, eventually reaching the cost rate of Naïve Policy 2. Similarly, the number of yellow signals received before maintenance decreases as the corrective maintenance cost increases, ultimately converging to 1.

Karabağ et al. (2020) [9] developed an integrated optimization model that combines condition-based maintenance and spare part selection for a multi-component system, and analyzed it as a partially observable Markov decision process. The optimal policy yields average cost reductions of 28% and 15%, respectively, in the two scenarios examined. The study also explores the value of complete information, obtained by dedicating a sensor to each component. On average, complete information reduces cost by 13% compared with the partial-information setting. The value of full information is found to be higher for cheaper, less reliable components than for more expensive, more reliable ones.

Wang et al. (2008) [10] created an analytical model for a condition-based order-replacement policy, which considers factors such as the cost rate, preventive replacement rate, and mean availability. The proposed maintenance policy's characteristics, including the trade-off among three performance criteria and the impact of the lead time of spare parts orders, were examined through numerical experiments. The findings suggest that the lead time has a detrimental effect on the policy's overall performance.

Maillart (2006) [11] investigates the problem of scheduling both perfect and imperfect observations and preventive maintenance actions for a multi-state, Markovian deteriorating system with self-announcing failures. The author develops a closed-form heuristic for the perfect-information problem, then modifies and applies it to a specific case of the imperfect-information problem.

Zheng et al. (2023) [12] studied a K-out-of-N system subject to failures during inspection intervals. Besides the system's degradation condition, maintenance schedules must account for the availability of spare parts. By integrating information on spare part orders, planners can better coordinate maintenance activities and spare part replenishment, thereby reducing operating costs. The study jointly optimizes condition-based replacement and spare part ordering for this multi-component system, taking failures during inspection intervals into account.

Wari et al. (2023) [13] introduce a partially observable Markov decision process (POMDP) model to optimize maintenance decisions based on the progression of corrosion in pipelines. Corrosion progress is assessed through inline inspections, which help determine the extent of pipeline corrosion. To compute the transition matrix required for the POMDP model, the researchers use both Monte Carlo simulation and a pure birth Markov process. This paper illustrates the breadth of CBM's application areas.

3 Problem Definition

We study a joint maintenance and spare part selection problem for a system observed in discrete periods. We focus on a machine with three or more critical components. Each component has a current deterioration level and a maximum deterioration level (MDL) it can reach; for these components, the MDL is 3. In each period, each component's deterioration level increases by 1 with probability α (alpha) and stays the same with probability 1 - α. Each component is classified as non-defective, defective, or failed. A component is non-defective if and only if its deterioration level is 0, defective if and only if its deterioration level is strictly between 0 and 3 (i.e., 1 or 2), and failed if and only if its deterioration level equals the MDL. If any component fails, the machine breaks down. A sensor is attached to the system to obtain information about the deterioration levels of the components. We assume that the sensor never degrades, i.e., it always gives correct information.
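The deterioration process above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that components deteriorate independently of each other and that a component stays at the MDL once it has failed (in the model, a failure triggers immediate maintenance anyway). The names `deteriorate` and `classify` are hypothetical.

```python
import random

MDL = 3  # maximum deterioration level; reaching it means the component has failed


def deteriorate(levels, alpha, rng):
    """Advance all components by one period.

    Assumption: each component independently moves up one level with
    probability alpha and otherwise stays where it is; levels never exceed MDL.
    """
    return [min(level + 1, MDL) if rng.random() < alpha else level
            for level in levels]


def classify(level):
    """Map a deterioration level to the component state used in the text."""
    if level == 0:
        return "non-defective"
    if level < MDL:
        return "defective"  # levels strictly between 0 and MDL
    return "failed"


# Example: three new components observed for five periods (alpha is illustrative).
rng = random.Random(2024)
levels = [0, 0, 0]
for _ in range(5):
    levels = deteriorate(levels, 0.1, rng)
print(levels, [classify(level) for level in levels])
```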

The sensor displays one of three colors: red if any component has failed; yellow if no component has failed but at least one component is defective; and green if every component is non-defective. As mentioned before, however, the sensor provides only partial information. A red signal tells us that at least one component's deterioration level has reached 3, and a yellow signal tells us that at least one component's deterioration level is at least 1, but neither signal tells us anything about the other components.
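The signal rule can be written as a small function. This is a sketch under the assumptions of the problem definition (MDL = 3, a perfectly reliable sensor); the name `sensor_signal` is hypothetical. The usage line shows why the information is partial: two quite different system states produce the same color.

```python
MDL = 3  # maximum deterioration level


def sensor_signal(levels):
    """System-level sensor reading.

    Only the color is returned: the caller learns neither which component
    triggered the signal nor how many components are affected.
    """
    if any(level == MDL for level in levels):
        return "red"     # at least one component has failed
    if any(level > 0 for level in levels):
        return "yellow"  # no failure, but at least one defective component
    return "green"       # every component is non-defective


# Two very different system states produce the same signal.
print(sensor_signal([0, 2, 0]), sensor_signal([1, 2, 2]))  # yellow yellow
```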

We make two decisions. The first is when to intervene in the machine. Once the sensor shows yellow, we start counting the number of periods in which it shows yellow, and we intervene once this count reaches a predetermined threshold. If the sensor turns red before this threshold is reached, we intervene immediately. We assume that maintenance takes a negligible amount of time. During maintenance, we observe the actual deterioration level of each component. All failed and defective components are replaced with new ones, so all components are non-defective after the intervention.
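A minimal sketch of this first decision rule, for one period. It assumes that the intervention happens in the same period in which the yellow count reaches the threshold (here called `mp`, the MP decision variable of Section 4). The function name and its return convention are hypothetical.

```python
def intervention_step(signal, yellow_count, mp):
    """Apply the intervention rule for one period.

    signal: sensor color observed this period ("green", "yellow" or "red").
    yellow_count: yellow signals counted since the last maintenance.
    mp: threshold number of yellow signals that triggers maintenance.
    Returns (updated_yellow_count, action), where action is None,
    "preventive" or "corrective".
    """
    if signal == "red":
        return yellow_count, "corrective"  # failure: maintain immediately
    if signal == "yellow":
        yellow_count += 1
        if yellow_count >= mp:
            return yellow_count, "preventive"  # threshold reached on a yellow signal
    return yellow_count, None  # green, or yellow below the threshold: keep waiting
```

After either action, all components are renewed to level 0 and the yellow counter restarts from 0.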

The second decision is how many spare components to bring when we intervene. Recall that all defective and failed components are replaced. If we bring too few, the missing components are ordered urgently, and the lead time of such emergency orders is assumed to be negligible. If we bring more components than we need to replace, the extra components are sent back to the warehouse.
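Settling spare parts at an intervention comes down to comparing the number of components that must be replaced with the number brought. This is a sketch that assumes the true levels become visible during maintenance, as stated in the text. `reconcile_spares` is a hypothetical name.

```python
def reconcile_spares(levels, brought):
    """Settle spare parts at an intervention.

    Every defective or failed component (level > 0) is replaced.
    Returns (needed, emergency_orders, returns).
    """
    needed = sum(1 for level in levels if level > 0)
    emergency_orders = max(needed - brought, 0)  # shortfall, ordered urgently
    returns = max(brought - needed, 0)           # surplus, sent back to the warehouse
    return needed, emergency_orders, returns


# Three of four components need replacement but only two were brought.
print(reconcile_spares([0, 2, 3, 1], brought=2))  # (3, 1, 0)
```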

There are four cost parameters in the system: two for the maintenance decision and two for spare part selection. For the maintenance decision, if we intervene while the signal is yellow, we pay the Preventive Maintenance Cost (PMC). If we intervene after a red signal, we pay the Corrective Maintenance Cost (CMC), which is much higher than the PMC. To keep these costs low, we want to perform maintenance as rarely as possible while still intervening before a red signal appears. If we intervene earlier than necessary, we pay PMC more often than needed. If we wait longer than necessary, we pay CMC more often and our total cost increases. PMC and CMC are fixed costs incurred at each maintenance. For spare part selection, we pay the Emergency Order Cost (EOC) for each component that is missing after the intervention decision, and the Return Cost (RC) for each excess component we brought. EOC and RC are variable costs, incurred per component.
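The cost of one intervention, broken down into the four cost types, can be sketched as follows. The numeric values in the example are purely illustrative and are not taken from Table 4. `intervention_cost` is a hypothetical helper.

```python
def intervention_cost(signal, emergency_orders, returns, pmc, cmc, eoc, rc):
    """Cost of a single intervention, broken down by cost type.

    PMC or CMC is a fixed charge per intervention (depending on the signal at
    the time of maintenance); EOC and RC are charged per component.
    """
    return {
        "PMC": pmc if signal == "yellow" else 0,
        "CMC": cmc if signal == "red" else 0,
        "EOC": eoc * emergency_orders,
        "RC": rc * returns,
    }


# Illustrative values only (not taken from Table 4).
breakdown = intervention_cost("red", emergency_orders=1, returns=0,
                              pmc=1, cmc=10, eoc=2, rc=1)
print(breakdown, sum(breakdown.values()))  # {'PMC': 0, 'CMC': 10, 'EOC': 2, 'RC': 0} 12
```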

We developed a simulation model in Python to optimize these decisions.
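A compact sketch of such a simulation follows. It combines the deterioration, sensor, intervention, and spare part rules above with the threshold rule for C_i described in Section 4, and it is not the authors' implementation. Assumptions: components deteriorate independently; within a period, deterioration happens before the sensor is read; the yellow count does not increase in a red period; and the function name `simulate` and its arguments are hypothetical.

```python
import random


def simulate(n_components, alpha, mp, thresholds, costs, periods, seed, mdl=3):
    """Estimate the average cost per period of the policy (mp, thresholds).

    thresholds: nondecreasing C_1 <= ... <= C_n; component i is brought if the
        yellow count at the intervention exceeds C_i.
    costs: dict with keys "PMC", "CMC", "EOC", "RC".
    Returns the average cost per period for each cost type.
    """
    rng = random.Random(seed)
    levels = [0] * n_components
    yellow = 0
    totals = {"PMC": 0.0, "CMC": 0.0, "EOC": 0.0, "RC": 0.0}
    for _ in range(periods):
        # 1. Each component independently deteriorates with probability alpha.
        levels = [min(level + 1, mdl) if rng.random() < alpha else level
                  for level in levels]
        # 2. The system-level sensor reports a single color.
        if any(level == mdl for level in levels):
            signal = "red"
        elif any(level > 0 for level in levels):
            signal = "yellow"
            yellow += 1
        else:
            signal = "green"
        # 3. Intervene on red, or once mp yellow signals have been counted.
        if signal == "green" or (signal == "yellow" and yellow < mp):
            continue
        # 4. Spare part selection and settlement.
        brought = sum(1 for c in thresholds if yellow > c)
        needed = sum(1 for level in levels if level > 0)
        kind = "CMC" if signal == "red" else "PMC"
        totals[kind] += costs[kind]
        totals["EOC"] += costs["EOC"] * max(needed - brought, 0)
        totals["RC"] += costs["RC"] * max(brought - needed, 0)
        # 5. All defective and failed components are replaced; the cycle restarts.
        levels = [0] * n_components
        yellow = 0
    return {kind: value / periods for kind, value in totals.items()}


# Illustrative parameters only.
print(simulate(3, 0.05, mp=4, thresholds=(0, 2, 4),
               costs={"PMC": 1, "CMC": 10, "EOC": 2, "RC": 1},
               periods=100_000, seed=7))
```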

4 Solution Methodology

The model is solved by simulation, and the decision variables are optimized by exhaustive search. To verify the model, we compared its results for single-component systems with the corresponding analytical solutions. The pseudocode of the model is given in Table 1, the system parameters in Table 2, and the decision variables of the model in Table 3.
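For a single component, the model is small enough to evaluate exactly, which is what makes such a comparison possible. The sketch below is our own renewal-reward calculation and is not necessarily the analytical solution the authors used. It assumes MDL = 3, deterioration before the sensor reading within each period, intervention on red or at the MP-th yellow signal, α > 0, and one spare brought if the yellow count exceeds C_1. The function name is hypothetical. With a single component, exactly one replacement is needed at every intervention, so RC never occurs and EOC occurs only when no spare was brought.

```python
def single_component_cost_rate(alpha, mp, c1, pmc, cmc, eoc, mdl=3):
    """Exact long-run cost per period for one component (renewal-reward).

    A cycle starts with a new component (level 0) and ends at maintenance;
    the cost rate is expected cycle cost / expected cycle length.
    """
    def settle(count):
        # One component needs replacing; an emergency order is needed only
        # if no spare was brought (a spare is brought iff count > C_1).
        return 0.0 if count > c1 else eoc

    exp_len = 1.0 / alpha  # green periods plus the first yellow period (geometric)
    exp_cost = 0.0
    dist = {1: 1.0}        # probability mass by level at the current yellow period
    count = 1              # number of yellow signals observed so far
    while count < mp:
        exp_len += sum(dist.values())  # every still-running cycle lasts one more period
        nxt = {}
        for level, p in dist.items():
            nxt[level] = nxt.get(level, 0.0) + p * (1 - alpha)  # stays: yellow again
            if level + 1 == mdl:
                # Fails: corrective maintenance with the current yellow count.
                exp_cost += p * alpha * (cmc + settle(count))
            else:
                nxt[level + 1] = nxt.get(level + 1, 0.0) + p * alpha
        dist = nxt
        count += 1
    # Remaining mass reaches the mp-th yellow signal: preventive maintenance.
    exp_cost += sum(dist.values()) * (pmc + settle(count))
    return exp_cost / exp_len


print(single_component_cost_rate(0.5, 3, 0, pmc=1, cmc=10, eoc=0))  # 0.8125
```

A long single-component simulation run with the same parameters should agree with this value up to sampling error, which is the kind of check described above.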

Table 1. Pseudocode of the Model
Table 2. System Parameters
Table 3. Decision Variables of a System

As mentioned, we use exhaustive search. Although exhaustive search may seem like the simplest approach, finding the optimal results can take a very long time because every possible combination of decision variables must be simulated. Two sets of decision variables are searched: MP and C_i. MP is the number of yellow signals after which we intervene. C_i is a threshold associated with the i-th component that determines how many components we bring when we intervene. The number of components to bring depends on how long the system has been in the yellow state. For example, if we intervene after 10 yellow signals, the number of components needed when a red signal arrives after 3 yellow signals differs from the number needed when it arrives after 8 yellow signals. We therefore set a threshold for each component: if the number of yellow signals observed exceeds C_i, we bring the i-th component. Since the components are identical, C_{i+1} is always greater than or equal to C_i. For each MP, the search enumerates all threshold combinations that satisfy this condition.
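The search space and the threshold rule can be sketched as follows. The cost evaluator is passed in as a function (in practice, a simulation run), so the example stays self-contained. Assumption: for a given MP, the thresholds range over 0..MP, because the yellow count at an intervention never exceeds MP and any C_i ≥ MP already means "never bring component i". The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def candidate_policies(mp_values, n_components):
    """Yield every (mp, thresholds) pair to be evaluated.

    combinations_with_replacement on a sorted range yields exactly the
    nondecreasing tuples C_1 <= C_2 <= ... <= C_n.
    """
    for mp in mp_values:
        for thresholds in combinations_with_replacement(range(mp + 1), n_components):
            yield mp, thresholds


def components_to_bring(yellow_count, thresholds):
    """Bring component i whenever the yellow count exceeds its threshold C_i."""
    return sum(1 for c in thresholds if yellow_count > c)


def exhaustive_search(evaluate, mp_values, n_components):
    """Return (cost, mp, thresholds) with the lowest evaluated cost.

    evaluate(mp, thresholds) is assumed to return the simulated average
    cost per period of that policy.
    """
    best = None
    for mp, thresholds in candidate_policies(mp_values, n_components):
        cost = evaluate(mp, thresholds)
        if best is None or cost < best[0]:
            best = (cost, mp, thresholds)
    return best


# The number of candidates grows quickly with MP and the number of components.
print(sum(1 for _ in candidate_policies(range(1, 11), 3)))  # 1000
```

Each candidate requires a full simulation run, which is why the exhaustive search becomes slow as the number of components grows.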

All candidate policies within a search were simulated with the same random seed, so that differences between the evaluated policies are not caused by different random numbers. The seed itself was chosen randomly. Each simulation run covers 1 million periods. Each experiment was run with at least two different seeds. If the results differed, the number of periods was doubled and the experiment was run once again.
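The seed-replication check can be sketched as a wrapper around the whole search. Assumptions: `search(periods, seed)` is a hypothetical function that runs the full exhaustive search with one seed shared by every candidate policy (common random numbers) and returns the optimal policy; "results differ" is taken to mean that the optimal policies differ; and the default seeds are placeholders.

```python
def replicate_until_stable(search, periods, seeds=(1, 2), max_doublings=1):
    """Check that the optimal policy does not depend on the random seed.

    If the seeds disagree, the run length is doubled and the check is
    repeated, at most max_doublings times.
    Returns (policy, periods), or (None, periods) if no agreement was reached.
    """
    for attempt in range(max_doublings + 1):
        optima = [search(periods, seed) for seed in seeds]
        if all(opt == optima[0] for opt in optima):
            return optima[0], periods
        if attempt < max_doublings:
            periods *= 2
    return None, periods
```

With `max_doublings=1`, the sketch matches the procedure in the text: one retry with twice as many periods.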

5 Numerical Experiments

For each experiment, we record the optimal decision variables, the average cost per period for each of the four cost types (PMC, CMC, EOC, and RC), and the share of each cost type in the total average cost.
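Turning accumulated costs into these outputs is a short computation. The totals in the example are hypothetical, and `cost_summary` is an illustrative name. The sum of the PMC and CMC shares gives the maintenance-decision share plotted later.

```python
def cost_summary(totals, periods):
    """Turn total costs per type into average cost per period and % shares."""
    average = {kind: value / periods for kind, value in totals.items()}
    overall = sum(average.values())
    share = {kind: (100.0 * value / overall if overall > 0 else 0.0)
             for kind, value in average.items()}
    return average, share


average, share = cost_summary({"PMC": 300, "CMC": 100, "EOC": 50, "RC": 50}, 100)
print(share)                        # {'PMC': 60.0, 'CMC': 20.0, 'EOC': 10.0, 'RC': 10.0}
print(share["PMC"] + share["CMC"])  # 80.0, share due to the maintenance decision
```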

The experiments examine how the number of components, CMC, and the RC-EOC pair affect the cost under the optimal policy, and they examine the settings studied in the literature from a wider perspective. The design of experiments is given in Table 4.

Table 4. Design of Experiments

In this study, we run a total of 96 experiments. CBM is generally applied to high-tech machines, which deteriorate infrequently, and in the literature the deterioration probability ranges from 1% to 12.5%. Unlike CMC and PMC, RC and EOC are variable (per-component) costs and are therefore relatively small. Since an emergency order is more costly than a return, EOC is set at least as high as RC. CMC, on the other hand, is much higher than PMC because it corresponds to the machine stopping completely. Since the literature contains no experiments with this many components, we chose this range of component counts. Design numbers are assigned so that the number of components varies first (increasing), then the RC-EOC pair (increasing), and finally CMC (decreasing).
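The numbering order described above corresponds to a full-factorial design in which the number of components varies fastest. The sketch below uses placeholder levels for illustration only; the actual levels are those in Table 4. Ordering the RC-EOC pairs lexicographically as "increasing" is an assumption, and `design_table` is a hypothetical name.

```python
from itertools import product


def design_table(noc_levels, rc_eoc_levels, cmc_levels):
    """Number a full-factorial design in the order described in the text.

    NoC varies fastest (increasing), then the RC-EOC pair (increasing), then
    CMC (decreasing). itertools.product varies its last argument fastest,
    so CMC is passed first and NoC last.
    """
    for rc, eoc in rc_eoc_levels:
        if eoc < rc:
            raise ValueError("EOC must be at least RC")
    rows = product(sorted(cmc_levels, reverse=True),
                   sorted(rc_eoc_levels),
                   sorted(noc_levels))
    return [{"design": i, "CMC": cmc, "RC": rc, "EOC": eoc, "NoC": noc}
            for i, (cmc, (rc, eoc), noc) in enumerate(rows, start=1)]


# Placeholder levels for illustration only; the actual levels are in Table 4.
for row in design_table([3, 5], [(1, 2), (1, 1)], [10, 20]):
    print(row)
```

The total number of experiments is the product of the number of levels of each factor.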

First, we note that we initially included an additional decision variable in the model. Its purpose was to test whether the signal color matters for spare part selection: these thresholds, separate from C_i, were used only when we intervened after a red signal. When they were optimized on a short data set, the optimal number of components to bring after a red signal in period X (counted from the first yellow signal) equaled the number to bring after a yellow signal in the same period. We therefore removed this decision variable from the model and conclude that the signal color does not affect spare part selection.

The first finding is that as CMC decreases and the number of components increases, the share of the total cost due to the maintenance decision gradually decreases and spare part selection becomes more important. This growing importance of spare part selection increases the nonlinearity of the system: the optimal MP does not change monotonically, which makes each solution unique. In Design Numbers 21–24, 69–72 and 93–96, where RC and EOC are both 15 and spare part selection is therefore relatively more important than in the other experiments, MP first increases and then decreases back to its earlier value. In the other experiments, MP first decreases and later returns to its former value. The main reason is that when RC and EOC, i.e., spare part selection, matter more, the system may intervene later to reduce these costs. In the other cases, the risk of paying CMC outweighs these costs, so the system intervenes earlier. Fig. 1 shows the percentage of the total cost due to maintenance decisions for RC-EOC values of 5–5 and 15–15, respectively. Fig. 2 shows how MP changes as the number of components increases.

Fig. 1. Maintenance decision cost as a percentage of total cost

Fig. 2. Relationship between the number of components (NoC) and MP

In our experiments, although the product is durable, its α is close to the maximum value used in the literature; that is, it is among the least durable of the relatively durable products. Because the MDL is not high, PMC dominates the other costs in most experiments. However, a decrease in CMC or an increase in the number of components affects the PMC share dramatically. Moreover, when the PMC share of total cost falls to between 40 and 50 percent, MP changes and the CMC share increases. Spare part selection therefore becomes a very important decision for machines that consist of many components. In addition, with a higher MDL or a lower α, we expect costs to shift mostly to CMC and PMC to become insignificant for the system. Fig. 3 shows CMC as a percentage of total cost.

Fig. 3. CMC as a percentage of total cost

6 Conclusion

For a multi-component system, we developed a simulation-based optimization model that jointly optimizes maintenance and spare part selection, and applied it to 96 different cases. Our results show that as spare part selection accounts for a larger share of the total cost, the system becomes increasingly nonlinear. A decrease in CMC or an increase in the number of components delays intervention and reduces the share of PMC in the total cost. Nevertheless, even in our setting, which represents nearly the least durable of the durable products, the maintenance decision still carries considerable weight compared with the spare part selection decision. We also want to investigate whether the change in MP is related to the MDL or to the number of components.

We plan a wider simulation study to see how this nonlinearity behaves for more durable products. We also want to examine how the shares of maintenance and spare part selection costs change as the MDL increases, and how MP changes with these shares. In addition to Exhaustive Search, we will compare simulation times with heuristics that shorten the optimization time.