Introduction

Organizations need to improve production systems to meet customer demands on time due to competition in work environments. In fact, organizations typically operate in accordance with their production plan and prevent the costly consequences of production line downtime. Preventive maintenance (PM) operations, human errors, equipment failures, and corrective maintenance (CM) operations are recognized as the major causes of production line downtime (Ayvaza and Alpay 2021). The implementation of PM and CM operations brings about temporary downtime for the production line, thereby affecting the amount of production and the organization’s inventory levels (Ait-El-Cadi et al. 2021). Moreover, maintenance planning and the frequency of PM operations during the examination period have a considerable effect on the equipment failure rate (Rivera-Gómez et al. 2021). Consequently, the proper implementation of PM and CM operations has a significant effect on the downtime of the production line and the overall performance of the production system (Alaswad and Xiang 2017).

Timely maintenance operations contribute to improving performance and reducing equipment failures (Guo et al. 2013). Consequently, organizations experience fewer equipment breakdowns and achieve a desirable level of production and inventory (Bouslah et al. 2018). Proper planning ensures the availability of the production system to meet customer demands even during the maintenance period (Modares et al. 2023a). Therefore, proper planning of PM operations plays a vital role in enhancing the performance of the production system. Although implementing maintenance activities is crucial for enhancing the reliability of systems, improper execution of these operations can result in equipment failures and excessive, abnormal shutdowns of the production line (Hobbs and Williamson 2003). Improper planning of maintenance operations increases the occurrence of unplanned failures and downtimes in the production line and reduces its overall efficiency. Besides, human error is one of the reasons for the improper implementation of maintenance activities, which disrupts the effectiveness of these operations (Hobbs 2021).

Human error is therefore a decisive factor in whether these operations are performed correctly (Chen 2013). Given its undeniable presence in maintenance operations, its impact on these activities is increasingly difficult to overlook. Human error plays a considerable role in incidents across various industries, resulting in adverse consequences and an increase in maintenance-related breakdowns (Hobbs 2021). It also obstructs the creation of successful equipment maintenance strategies (Hobbs and Williamson 2003) and leads to ineffective implementation of inspections and maintenance tasks. Furthermore, human error is a decisive factor in equipment failures and production line stoppages (Froger et al. 2018). Given the substantial expenses incurred due to breakdowns, it is essential to address the human errors associated with maintenance activities.

Despite the significant impact of human error on maintenance operations, little attention has been given to this aspect in previous studies. Most research has focused solely on non-human factors in maintenance and production operations, overlooking the role of human error. Although some studies have explored the integration of maintenance and production operations, none have considered the impact of human error. The aim of this study is to determine the optimal values of production rate, the frequency of PM operations, the level of PM operation in each period, and the human error probability so as to minimize costs subject to various constraints. Consequently, this study considers the integrated aspects of production planning, inventory management, maintenance operations, and human error associated with this task. Moreover, this research assesses the costs associated with maintenance operations, which are both time-dependent and time-independent costs, to establish a stronger relationship with the production department and minimize production line downtime. In addition, this research takes into account equipment setup costs associated with maintenance operations.

To address these challenges, a novel mathematical model is presented in this research for the first time, which enables simultaneous planning of production and maintenance operations while quantitatively accounting for human errors. Moreover, the costs associated with PM and CM operations are considered more comprehensively. Optimal and integrated maintenance and production planning that takes human error into account can significantly impact sustainability in several ways. This type of planning reduces the likelihood of equipment breakdowns and unplanned downtime, minimizing waste and improving resource efficiency. As a result, there is lower energy consumption, reduced material waste, and more efficient use of labor, all of which contribute to a more sustainable operation (Vrignat et al. 2022). In addition, incorporating human error into maintenance and production planning reduces the risk of accidents and injuries, benefiting employees' health, and reducing costs associated with lost time, worker compensation, and legal liabilities (Siew et al. 2020).

Human error can result in accidents, injuries, and fatalities, leading to increased healthcare costs, lost productivity, and decreased employee morale. Consequently, improving safety and preventing accidents are essential for creating a safer and healthier workplace, which is crucial for sustainability. Moreover, human error can lead to inefficient use of resources, such as water, energy, and raw materials. Companies can reduce costs and environmental impact by reducing human error and improving resource efficiency. Additionally, human error can bring about non-compliance with environmental, health, and safety regulations, leading to fines, legal liabilities, and damage to a company’s reputation. By implementing better training and controls, companies can reduce the risk of non-compliance and improve their sustainability (Jasiulewicz Kaczmarek and Saniuk 2015). Overall, an optimal and integrated maintenance and production planning approach that considers human error can positively impact sustainability by improving resource efficiency, reducing waste, enhancing safety, and maximizing operational effectiveness. Based on previous studies conducted in this field, we will highlight five factors which are the main contributions of this paper:

  • Based on a review of previous studies, it is evident that none of them have examined the cost of human error associated with maintenance and production quantitatively using a mathematical model. Furthermore, the impact of human error on the reduction coefficient of equipment virtual age has not been considered.

  • In the proposed model of this research, the cost function associated with the human error probability for maintenance tasks, including PM and CM operations, is estimated using the regression method.

  • Also, a model is proposed in order to minimize costs associated with production, maintenance, inventory control, and human error, while simultaneously satisfying constraints related to customer demand, budget, production rate, and capacity.

  • Additionally, to the best of our knowledge, no previous study has categorized the costs of maintenance operations; this paper categorizes them into two groups for the first time: time-dependent and time-independent costs.

  • In contrast to existing research in the literature, the model considers setup and equipment opportunity costs, which are determined based on the equipment downtime resulting from maintenance operations.

The rest of this paper is organized as follows: The second section provides an explanation of the theoretical background; the third section provides a brief overview of previous research on this topic. In the fourth section, the problem statement and a case study are introduced. The fifth section discusses the methods employed. The research findings are reported in the sixth section. To ensure the validity of the suggested model, a sensitivity analysis is conducted in the seventh section. In the eighth section, we discuss the research findings and compare the results of this paper with those of previous studies. The ninth section offers insights into managerial aspects. Finally, the tenth section provides conclusions and suggestions for future studies.

Conceptual Background

Maintenance

Maintenance operations impose significant costs on companies. For example, in manufacturing companies, approximately 15 to 40% of their expenses are allocated to maintenance operations costs (Wireman 2014), while for thermal power plants and offshore wind farms, these operations account for about 30% of the total expenses. As a result, the effective implementation of maintenance operations and strategies is a decisive and important factor in the profitability and competitiveness of companies (Gräber 2004). In PM operation, the machine’s lifetime is generally considered a benchmark for conducting PM operations. The equipment’s age is typically determined based on the scheduling and execution of PM and CM operations, as well as the type and number of components involved in each operation (Legat et al. 1996).

Human Error

Human error refers to unintentional failure in performing an action to achieve the desired outcome. According to this definition, human error occurs when there is no deliberate intent to make an error (Whittingham 2004). Human errors can occur due to a variety of reasons, such as lack of training, fatigue, stress, distraction, complexity of the task, or inadequate communication. Human error has been studied extensively in various fields, including aviation, healthcare, nuclear power plants, transportation, and manufacturing. Understanding the causes and consequences of human error is crucial to preventing accidents and improving safety in these industries (Reason 2000). Human error is a significant concern in industrial settings, where workers perform complex tasks that require high levels of attention, skill, and precision. Human errors in industrial settings can result in accidents, equipment damage, decreased productivity, and increased costs (Hagen 1980).

Human error is a critical concern in the maintenance industry, where workers perform complex and often hazardous tasks that require a high level of attention, skill, and precision (Bafandegan Emroozi et al. 2023). Human error in the maintenance industry can lead to accidents, equipment damage, reduced efficiency, and increased costs. Therefore, it is crucial to understand the causes and consequences of human error in maintenance operations and implement measures to prevent or mitigate it (Melchers 1995). Human error is examined as a significant aspect of human factors in maintenance operations (Bafandegan Emroozi and Fakoor 2023). The effective implementation of maintenance operations and related processes depends on the performance of human resources and their errors. Therefore, in addition to engineering aspects, attention should be given to human factors and human errors associated with maintenance operations in the effective implementation of maintenance operations.

Literature Review

While there is a vast amount of literature exclusively focused on maintenance and production operations, only a few studies have examined these operations simultaneously. Moreover, these studies have not investigated the impact of human error on maintenance and production operations. A few representative studies are summarized below.

The initial work in this field primarily focused on integrated preventive maintenance planning and production control problems in order to increase the availability of the production system and reduce overall costs (Boukas and Haurie 1990). Das and Sarkar (1999) developed a preventive maintenance model for a production system by identifying the probability distribution of machine failures and gathering information on system conditions. In their investigation of maintenance and production operations planning, Gharbi et al. (2007) introduced a combined approach that integrated PM operations and spare part inventory management in unreliable production environments. Their study took into consideration factors such as backorders and machine failures. Dehayem Nodem et al. (2011) considered the production planning problem by taking into account system failures and repair time.

In a study conducted by Moghaddam and Usher (2011), a mixed-integer nonlinear multi-objective optimization model was presented with the purpose of determining the optimal scheduling for PM and replacement. Mifdal et al. (2013) addressed optimal production and maintenance planning in a multi-product production system considering random demands to meet the demands of customers for each product. Aramon Bajestani et al. (2014) presented a combination of production and maintenance planning in deteriorating multi-machine production systems over multiple periods. In a study conducted by Assid et al. (2015), effective integrated policies for maintenance, setup, and production in a single-machine production system were developed. Emami-Mehrgani et al. (2016) investigated the effect of human errors on repairable production systems under conditions where the time horizon is unlimited and errors occur randomly. In this study, an optimal policy for minimizing production costs based on maintenance, machine repairs, and inventory management to meet market demands is discussed. Huang et al. (2019) introduced a mixed-integer programming model for integrating production and maintenance problems so as to minimize costs.

Kang and Subramaniam (2018) presented an integrated control of dynamic maintenance and production in deteriorating systems. Kim et al. (2019) evaluated the probabilistic perspective for optimal inspection and maintenance planning. Their approach encompassed both pre- and post-failure detection multi-objective optimization processes. Duffuaa et al. (2020) provided an integrated model for optimizing production, maintenance, and process control decisions for a single machine. In this research, a methodology was developed that optimized the scheduling of PM operations and incorporated an integrated model for production scheduling, inventory holding, maintenance and repair, and process control. Ghaleb et al. (2020) presented a mixed-integer stochastic mathematical model that integrated production and maintenance planning decisions in a single-machine deteriorating production environment. Rivera-Gómez et al. (2020) determined an appropriate PM and production policy, and also quality control rate so as to minimize costs.

Liu et al. (2020a) presented an integrated production and maintenance planning model considering production capacity and service level constraints. Liu et al. (2020b) presented an integrated model considering buffer inventory and imperfect PM in production systems. Adloor and Vassiliadis (2020) proposed an optimal control approach for maintenance and production planning. They introduced a multi-stage mixed-integer optimal control problem (MSMIOCP) and solved it using standard nonlinear optimization techniques. Zheng et al. (2021) pointed out that an economic production quantity and condition-based maintenance policy for a deteriorating production system is significantly more cost-effective. Sharifi and Taghipour (2021) introduced an integrated model for production and maintenance planning. This model was designed for single-machine production systems with multiple types of failures. Rivera-Gómez et al. (2021) presented a production control, sampling inspection, and maintenance planning policy based on machine age. The policies were examined for an unreliable production system with a deteriorating trend.

Ait-El-Cadi et al. (2021) addressed a novel combination of production, maintenance, and sampling inspection control policies for failure-prone production systems. uit het Broek et al. (2021) suggested a novel policy for production and maintenance that takes into account dynamic conditions. This approach combined condition-based production and condition-based maintenance policies. Li et al. (2022) offered a multi-objective optimization model which was applied to estimate maintenance performance taking into account maintenance costs and production losses. The research incorporated the perspective of probabilistic modeling. Bismut et al. (2022) improved maintenance and inspection strategies in conditions where the system had an acceptable level of reliability. Morato et al. (2022) provided an in-depth analysis of optimal maintenance and inspection planning approaches for deteriorating components through a dynamic Bayesian network and Markov decision process. Also, Hejazi and Roozkhosh (2019) proposed an optimization model for multi-stage system inspection. Their model focuses on cost minimization, even when faced with uncertainties in costs.

Azadeh et al. (2016) put forth a scenario-based approach that integrated historical data and simulation optimization to improve maintenance planning and policy frameworks. The authors’ methodology took human error and learning effects into account. The outcomes of their approach encompassed metrics such as reliability, machine availability, errors, and costs. These outcomes were analyzed using the analytical hierarchy process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) methods. Hameed et al. (2016) suggested a novel risk-based methodology that integrates human errors with degradation modeling for estimating shutdown inspection and maintenance intervals in a processing unit. The authors determined the number of shutdown intervals needed to achieve target reliability within a specified timeframe. Human error probability during the shutdown was assessed through the success likelihood index method (SLIM).

Carr and Christer (2003) proposed a mathematical framework for delay-time modeling in inspection maintenance, expanded to include human error. Their primary aim was to establish a method for quantifying the cost of human error, thereby assisting in making corrective decisions. Ighravwe and Ayoola Oke (2021) utilized multi-criteria decision-making methods to incorporate human factors into maintenance system evaluation. They employed fuzzy logic, AHP, GRA, and TOPSIS along with key indicators. Their paper’s novelty lies in merging safety, maintenance tasks, errors, human relations, and equipment factors through a fuzzy multi-criteria approach for maintenance system assessment. Aalipour et al. (2016) focused on improving maintenance human reliability in cable manufacturing. The authors employed three common HRA techniques (i.e., human error assessment and reduction technique, standardized plant analysis risk-human reliability, and Bayesian network) to estimate error probabilities consistently. The major causes of maintenance error they identified include time pressure, inexperience, and procedural issues. Table 1 presents some previous studies that have focused on maintenance and production operations simultaneously. However, it is important to note that these studies have not considered the influence of human factors on maintenance operations.

Table 1 Previous studies on maintenance

Overall, these studies offer significant insights into maintenance and production operations. However, they have failed to explore the impact of human error on these operations. The main weakness in these papers lies in their neglect of the influence of the human factor on maintenance, as they have solely focused on the effects of non-human factors on integrated maintenance and production operations. In order to bridge this gap, our research aims to investigate the integrated production and maintenance operations while considering the decisive factor of human error probability on the total cost and equipment's age resulting from these operations.

Problem Statement

The hydraulic steering box is an essential component in modern vehicles, enabling drivers to steer easily and safely. The manufacturing process of hydraulic steering boxes involves several stages, including designing, prototyping, testing, and production. The production stage involves the mass production of the steering boxes. The manufacturing process includes casting, machining, assembly, and testing. During the casting process, the raw materials are melted and poured into a mold to form the steering box's basic shape. The machining process involves cutting and shaping the steering box’s components to the required specifications. The assembly process involves combining all the components to form the final steering box. Finally, the steering box is tested to ensure that it meets the required standards for quality and performance. Common equipment and machinery used in the manufacturing process include casting machines, lathes, drilling machines, assembly machines, CNC machines, and testing machines.

In this research, we focus on planning and practical evaluation of maintenance operations on a computer numerical control (CNC) machine in an automobile parts manufacturing factory. The data for analyzing and optimizing integrated production and maintenance operations is focused only on a CNC machine for manufacturing automobile steering boxes. The reliable operation of CNC machines significantly influences production reliability and waste reduction within an organization. As equipment damage increases, the average time between device failures decreases, potentially leading to disruptions in the production process. Equipment failures due to damage not only result in lost production time but also require additional time for equipment setup and adjustments. To gather relevant reliability data, maintenance report books from the automobile parts manufacturing industry are examined.

Improper implementation of maintenance operations can lead to equipment breakdowns and disruptions in production. Consequently, conducting maintenance operations at inappropriate timing and frequency not only fails to enhance production operations but also incurs additional costs for the organization. Moreover, human error has a detrimental impact on the proper execution of maintenance operations. Therefore, achieving the correct implementation of these operations requires desirable planning and the optimization of human error. It is crucial to minimize human error to ensure that operations comply with predetermined targets and plans. Reducing human error involves improving contextual conditions that impose costs on the organization. Hence, establishing a proper balance between costs is essential for optimizing the occurrence of human error. Essentially, improving the levels of contextual conditions that influence human error, known as common performance conditions (CPCs), should be done at an optimal and cost-effective level. To address this challenge, a model for the optimal implementation of maintenance operations, considering human error, is presented.

Methods

This research presents a mathematical model for simultaneous optimization planning of production and maintenance, aiming to enhance operational efficiency and reduce downtime in the production line. To validate the proposed model, it has been implemented in a real-world case study. The study utilizes written documents from the organization, questionnaires, and interviews to gather the necessary parameters for the problem. Questionnaires and interviews with experts and professionals from the company are used to collect data related to factors that influence human error probability, known as CPCs. Additionally, this research involves estimating the cost function associated with human errors in maintenance activities. Historical data from the company is employed to estimate the function that represents the cost of maintenance-related human errors. The presented model considers a single product and equipment, based on the following assumptions:

Model Assumptions

  • The time horizon is limited.

  • The repair and replacement costs after failure are higher than the costs of the activities related to the PM operations.

  • Over time, the failure rate increases.

  • Opportunistic maintenance is not included.

  • The CM operation is carried out as soon as the equipment breaks down.

  • The time spent on PM and CM operations is considered part of the system’s operating time.

  • PM is performed in the range from perfect to minimal, and CM is minimal.

  • The CM does not affect the machine’s failure rate or virtual age; the failure time distribution always follows the Weibull distribution with specific parameters.

  • The time-independent cost of PM and CM in different periods is always constant.

  • As the number of PM operations performed increases over time, the time required to implement PM decreases (due to the increase in the skill of the personnel, following the learning curve).

  • Shortage costs depend on the average shortage.

  • Holding costs depend on the average inventory.

Mathematical Model

Nomenclature

Sets

\(T\): The set of periods indexed by t;

\(K\): The set of the types of levels (i.e., actions) of PM indexed by k;

Decision variables

\(p\): Human error probability in maintenance task.

\({z}_{kt}\): The binary variable is equal to 1 if a PM operation is performed for kth level in tth period; otherwise, it is equal to 0.

\({y}_t\): The binary variable is equal to 1 if the production operation is executed in tth period; otherwise, it is equal to 0.

\({a}_t\): The virtual age of the equipment in tth period.

\({t}_t^p\): Production time (available time) in tth period (hours/month).

\({t}_{kt}^{\left[m\right]}\): Preventive maintenance time for kth level in tth period (hours/month).

\({u}_t\): The production rate in tth period (number/month).

\({B}_t\): The shortage quantity in tth period.

\({I}_t\): The inventory quantity in tth period.

Parameters

\(S{h}_t\): Inventory shortage cost for each unit of product in tth period (currency/unit).

\({h}_t\): Inventory holding cost for each unit of product in tth period (currency/unit).

\({d}_t\): Demand for product in tth period (value).

\(Cse{t}_t\): Setup cost of production operation in tth period (currency).

\(C{v}_t\): Variable cost of each unit of product in tth period (currency/unit).

\(Cb\): Lost opportunity cost for CM or PM operations (currency).

\({t}_t^{\left[r\right]}\): Corrective maintenance time in tth period (hours/month).

\({g}_t\): Machine nominal production rate in tth period (value/month).

\(Wa{r}_t\): Product warehouse capacity in tth period (value).

\(N{m}_{kt}\): Number of required technicians for PM operation for kth level in tth period (number).

\(N{r}_t\): Number of required technicians for CM operation in tth period (number).

\(N{p}_t\): Number of required technicians for manufacturing each product unit in tth period (number).

\(H{m}_{kt}\): Human resource cost for PM operation for kth level in tth period (currency/month).

\(H{r}_t\): Human resource cost for CM operation in tth period (currency/month).

\(H{p}_t\): Human resource cost for manufacturing each product unit in tth period (currency/month).

\(Sp{r}_t\): The cost of required material and spare parts for CM in tth period (currency).

\(Sp{m}_{kt}\): The cost of required material and spare parts for the PM for kth level in tth period (currency).

\(H\): Length of the planning horizon (month).

\({u}_t^{\textrm{min}}\): Minimum production rate in tth period (value/month).

\({u}_t^{\textrm{max}}\): Maximum production rate in tth period (value/month).

\(\beta\): The shape parameter of Weibull distribution (machine failure time distribution).

\(\eta\): Scale parameter of Weibull distribution (machine failure time distribution).

\({\alpha}_{kt}\): The reduction coefficient of the virtual age of the equipment when implementing kth level of PM operation in tth period.

\(b{p}_t\): Selling price per unit of product in tth period (currency).

\(l\): The length of each time interval (month).

\(A{E}_t\): The minimum machine accessibility in tth period (hours/month).

\(TB\): The maximum available budget (currency).

\({p}_{current}\): Human error probability in maintenance task for the current state of the company.

Objective Function

The objective function is designed to maximize the overall profit of the organization. The first part of the objective function multiplies the profit of each unit of the product by the organization’s sales volume, which is the minimum of the production quantity and the product demand. This expression is assigned a positive coefficient in the objective function. Conversely, the other expressions in the objective function represent the organization’s costs and thus have negative coefficients. The second and third parts of the objective function account for inventory control costs. The second part specifically considers holding costs, while the third part represents the costs associated with product shortages. The fourth part of the objective function represents the costs of human errors in maintenance tasks. Lastly, the fifth, sixth, and seventh parts of the objective function correspond to the costs of production, CM operations, and PM operations, respectively. The objective function is presented in Eq. (1).

$$\max \sum_{t\in T}b{p}_t\left[\min\left\{{u}_t\;{t}_t^p,{d}_t\right\}\right]-\left(\begin{array}{l}\sum_{t\in T}{h}_t{\overline{I}}_t+\sum_{t\in T}S{h}_t{\overline{B}}_t+f(p)\\ {}+\sum_{t\in T}\left(\left(C{v}_t+H{p}_tN{p}_t\right){u}_t\;{t}_t^p+ Cse{t}_t\right){y}_t\\ {}+\sum_{t\in T}\left(\left(Cb+H{r}_tN{r}_t\right){t}_t^r+ Sp{r}_t+ Cse{t}_t\right)\left[\frac{{\left({a}_t+l\right)}^{\beta }-{a}_t^{\beta }}{\eta^{\beta }}\right]\\ {}+\sum_{k\in K}\sum_{t\in T}\left(\left(Cb+H{m}_{kt}N{m}_{kt}\right){t}_{kt}^m+ Sp{m}_{kt}+ Cse{t}_t\right){z}_{kt}\end{array}\right)$$
(1)
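
As a reading aid for the CM term of Eq. (1), the bracketed factor \(\left[{\left({a}_t+l\right)}^{\beta }-{a}_t^{\beta }\right]/{\eta}^{\beta }\) can be interpreted as the expected number of failures over an interval of length l for equipment of virtual age \({a}_t\), when the failure intensity is of Weibull (power-law) form and every repair is minimal. The following Python sketch evaluates this factor. The function name and the numerical values are illustrative assumptions and are not taken from the case study.

```python
def expected_failures(virtual_age: float, interval: float,
                      beta: float, eta: float) -> float:
    """Expected failures in [a, a + l] under minimal repair.

    Implements ((a + l)**beta - a**beta) / eta**beta, i.e. the bracketed
    factor of the CM term in Eq. (1). Argument names are hypothetical.
    """
    if virtual_age < 0 or interval < 0 or beta <= 0 or eta <= 0:
        raise ValueError("ages must be non-negative; beta and eta must be positive")
    return ((virtual_age + interval) ** beta - virtual_age ** beta) / eta ** beta


# Made-up parameters: beta > 1 means the failure rate increases with age.
young = expected_failures(virtual_age=0.0, interval=1.0, beta=2.0, eta=10.0)
old = expected_failures(virtual_age=5.0, interval=1.0, beta=2.0, eta=10.0)
print(young, old)  # the older machine is expected to fail more often
```

With a shape parameter greater than one, the same interval length yields more expected failures at a higher virtual age, which is why reducing the virtual age through PM lowers the expected CM cost in this term.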

The symbol \(b{p}_t\) signifies the profit per unit of the product. Consequently, the organization’s total profit results from multiplying the profit per unit by the quantity of the product sold. The sales volume depends on both production quantity and market demand; it is the lower of the two values. In order to linearize the minimum operator in Eq. (2), a new decision variable (\(P{S}_t\)) is added to the problem. This variable denotes the minimum of market demand and production quantity. Consequently, Eq. (3) is introduced to ensure the fulfillment of these conditions.

$$b{p}_t\;\left[\min\left\{{u}_t\;{t}_t^p,{d}_t\right\}\right]=b{p}_t\;P{S}_t$$
(2)
$$\min\left\{{u}_t\;{t}_t^p,{d}_t\right\}=P{S}_t,\quad P{S}_t\le {u}_t\;{t}_t^p,\quad P{S}_t\le {d}_t$$
(3)
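
The reasoning behind Eq. (3) is that, because the sales variable carries a positive coefficient in a maximization problem, an optimal solution pushes it up to the tighter of its two upper bounds, which is exactly the minimum of production and demand. The short Python sketch below illustrates this by brute force over integer quantities. It is only a check of the idea, not the solution method used in this research, and the function name is a hypothetical helper.

```python
def linearized_sales(production: int, demand: int) -> int:
    """Largest integer PS with PS <= production and PS <= demand.

    Mimics what a maximizing solver does with the two linear
    constraints of Eq. (3); brute force is used only for illustration.
    """
    upper = max(production, demand)
    feasible = [ps for ps in range(upper + 1)
                if ps <= production and ps <= demand]
    return max(feasible)


# Made-up quantities: production-limited and demand-limited periods.
limited_by_production = linearized_sales(production=80, demand=100)
limited_by_demand = linearized_sales(production=120, demand=100)
print(limited_by_production, limited_by_demand)
```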

Fig. 1 shows the modeling framework, including the objective function and constraints, as well as the steps for solving and validating the model.

Fig. 1 The modeling framework

The Inventory Control Costs

This part consists of two components related to holding and shortage costs. The inventory level (\({I}_t\)) is the difference between the quantity of the produced product and the demand level. Essentially, when the quantity of the manufactured product surpasses the demand level, the expression takes on a positive value, and the shortage level (\({B}_t\)) equals zero. Conversely, if the demand level exceeds the quantity of the produced product, the inventory level (\({I}_t\)) is reduced to zero, and the shortage level (\({B}_t\)) assumes a positive value. Eq. (4) shows how \({I}_t\) and \({B}_t\) are calculated.

$${\displaystyle \begin{array}{cc}{I}_t={u}_t\;{t}_t^p+{I}_{t-1}-{d}_t-{B}_{t-1},& {B}_t=-{u}_t\;{t}_t^p-{I}_{t-1}+{d}_t+{B}_{t-1}\end{array}}$$
(4)
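As written, the two expressions in Eq. (4) are negatives of each other, whereas the accompanying text states that one of them is zero whenever the other is positive. One way to reconcile the two, assumed here for illustration only, is to take the positive part of each expression. The function name and the numerical inputs below are hypothetical.

```python
def inventory_and_shortage(u, t_p, inv_prev, short_prev, demand):
    """Compute (I_t, B_t) from the net balance of Eq. (4).

    Assumption: I_t and B_t are the positive parts of the net balance
    and of its negative, so that at most one of them is non-zero.
    """
    net = u * t_p + inv_prev - demand - short_prev
    inventory = max(net, 0.0)   # surplus carried to the next period
    shortage = max(-net, 0.0)   # unmet demand carried to the next period
    return inventory, shortage


# Surplus case: 30 * 3.5 = 105 units produced against a demand of 100.
surplus_case = inventory_and_shortage(30, 3.5, 0.0, 0.0, 100)

# Shortage case: 20 * 3.5 = 70 units plus 5 in stock against a demand of 100.
shortage_case = inventory_and_shortage(20, 3.5, 5.0, 0.0, 100)
```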

The holding cost accounts for the expense of keeping inventory in excess of demand at the end of each period (Modares et al. 2023b). It is calculated by multiplying the average inventory level during each period by the per-unit holding cost. On the other hand, if the demand exceeds the production level, the cost of inventory shortage is incorporated into the overall cost. In this model, the total shortage cost is determined by multiplying the average shortage level by the cost per unit of shortage. These costs are formulated in Eq. (5):

$$\sum_{t=1}^T{h}_t{\overline{I}}_t+\sum_{t=1}^TS{h}_t{\overline{B}}_t.$$
(5)

where

$$\begin{array}{cc}{\overline{I}}_t=\frac{u_t\;{t}_t^p+{I}_{t-1}-{d}_t-{B}_{t-1}}{t_t^p},& {\overline{B}}_t=\frac{-{u}_t\;{t}_t^p-{I}_{t-1}+{d}_t+{B}_{t-1}}{t_t^p}\end{array}$$
(6)

Production Costs

The fifth part of the objective function represents the costs associated with product manufacturing, which include both fixed and variable costs. The variable costs vary depending on the number of units produced, while the fixed costs are not directly influenced by the production quantity but are incurred whenever production takes place in a period. The variable costs consist of expenses for raw materials, energy per unit of product, and labor. As these costs depend on the production quantity, they are multiplied by the production rate and the production time of each period to determine the total variable production cost. In this study, the fixed cost is considered equivalent to the equipment setup cost. These costs are calculated based on Eq. (7):

$$\sum_{t\in T}\;\left(\left(C{v}_t+H{p}_tN{p}_t\right){u}_t\;{t}_t^p+ Cse{t}_t\right)\;{y}_t$$
(7)

CM Costs

Due to the unpredictable nature of equipment breakdowns, they occur randomly, resulting in stochastic maintenance operations (Al-Naggar et al. 2021; Gbadamosi et al. 2021). As equipment and its components gradually deteriorate, failure becomes inevitable. In this study, equipment failure is modeled using the Weibull distribution, with parameter values determined based on degradation processes and failure events. The Weibull distribution is commonly employed in research to estimate equipment failure and consists of two parameters: shape (β) and scale (𝜂) (Niu et al. 2021; Chien et al. 2019; Montoya et al. 2019; Sgarbossa et al. 2018). Eq. (8) describes the probability density function of the Weibull distribution, wherein the shape parameter (β) and scale parameter (𝜂) are defined.

$$\begin{array}{cc}f(t)=\frac{\beta }{\eta }{\left(\frac{t}{\eta}\right)}^{\beta -1}\exp \left[-{\left(\frac{t}{\eta}\right)}^{\beta}\right]& \beta >1,\eta >0,t>0.\end{array}$$
(8)

In Eqs. (8) and (9), the value of the shape parameter (β) in the Weibull distribution is specified to be greater than one. When the shape parameter is greater than 1, the failure rate of the equipment increases as the equipment ages. Conversely, if the shape parameter is equal to 1, the failure rate remains constant over time. If the shape parameter is less than 1, the failure rate of the equipment decreases as time passes. In this study, following the typical assumption of equipment deterioration in most studies on production systems, the value of β is set to be greater than 1. Eq. (9) presents the cumulative distribution function of the Weibull distribution as follows:

$$\begin{array}{cc}F(t)=1-\exp \left[-{\left(\frac{t}{\eta}\right)}^{\beta}\right]& \beta >1,\eta >0,t>0.\end{array}$$
(9)

Therefore, the failure rate of the Weibull distribution is given by Eq. (10):

$$h(t)=\frac{f(t)}{1-F(t)}=\frac{\frac{\beta }{\eta }{\left(\frac{t}{\eta}\right)}^{\beta -1}\exp \left[-{\left(\frac{t}{\eta}\right)}^{\beta}\right]\;}{\exp \left[-{\left(\frac{t}{\eta}\right)}^{\beta}\right]} = \frac{\beta }{\eta }{\left(\frac{t}{\eta}\right)}^{\beta -1}.$$
(10)

Thus, the mean number of failures in each period of length l is given by Eq. (11):

$${\int}_{a_t}^{a_t+l}\frac{\beta }{\eta }{\left(\frac{t}{\eta}\right)}^{\beta -1} dt={\left(\frac{1}{\eta}\right)}^{\beta}\left[{\left({a}_t+l\right)}^{\beta }-{\left({a}_t\right)}^{\beta}\right].$$
(11)
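Eq. (11) can be checked numerically by integrating the failure rate of Eq. (10) over one period. The Python sketch below compares the closed form with a midpoint-rule approximation. The function names and the parameter values (β = 2, η = 10, a = 3.6, l = 4) are illustrative assumptions rather than the fitted case-study parameters.

```python
def weibull_hazard(t, beta, eta):
    """Failure rate h(t) of Eq. (10)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def expected_failures(a, l, beta, eta):
    """Closed-form mean number of failures over [a, a + l], Eq. (11)."""
    return ((a + l) ** beta - a ** beta) / eta ** beta


def expected_failures_numeric(a, l, beta, eta, steps=10_000):
    """Midpoint-rule integral of h(t) over [a, a + l], used as a check."""
    width = l / steps
    total = sum(
        weibull_hazard(a + (i + 0.5) * width, beta, eta) for i in range(steps)
    )
    return total * width


# Hypothetical parameters: beta = 2, eta = 10, virtual age 3.6, period length 4.
closed_form = expected_failures(3.6, 4.0, 2.0, 10.0)
numeric = expected_failures_numeric(3.6, 4.0, 2.0, 10.0)
```

With these assumed values the closed form evaluates to (7.6² − 3.6²)/10² = 0.448 failures per period, and the numerical integral agrees with it.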

In this study, according to the model assumptions, CM operations are minimal repairs, meaning that the virtual age of the equipment remains unchanged after a CM operation. The equipment’s condition therefore stays “as bad as old,” and no improvement in its condition is considered. Under this minimal-repair assumption, failures within each period follow a non-homogeneous Poisson process with rate h(t), defined in Eq. (10), and the mean number of failures in each period is given by Eq. (11). As a result, the total cost of CM operations is calculated by multiplying the expected number of equipment failures by their associated costs.

In this paper, the costs per failure include both time-dependent and time-independent costs. The time-dependent costs encompass the opportunity cost, which represents the loss incurred by the company due to production line downtime, and is related to the profit generated from each unit of product sales. Additionally, expenses related to human resources, such as wages, are considered, which depend on the hours required to repair the equipment failure. Furthermore, there are time-independent costs associated with the CM operation, including expenses for spare parts, materials, and equipment setup. These costs are independent of the frequency of CM operations. In this study, the time dedicated to CM is considered as part of the overall operating time. The CM time is assumed to be constant and does not depend on the number of CM occurrences or system failure time. Besides, only one level for the CM operation is assumed. Eq. (12) outlines the calculation for the total cost of the CM operation, as follows:

$$\sum_{t\in T} ((Cb+H{r}_tN{r}_t){t}_t^r+ Sp{r}_t+ Cse{t}_t)\left[\frac{\left({\left({a}_t+l\right)}^{\beta }-{a}_t^{\beta}\right)}{\eta^{\beta }}\right]$$
(12)

PM Costs

Similar to CM operations, costs associated with PM operations include both time-dependent and time-independent costs. The time required to perform PM at different levels is not fixed and varies depending on the frequency of PM operations. As employees gain more experience and knowledge, the time needed for PM operations decreases with an increase in the number of these operations. Eq. (13) defines the function for determining the time required for PM operations, where γ denotes the time required for the initial PM operation of each level, and r represents the percentage of the experience curve. Fig. 2 illustrates the impact of learning on performing PM operations.

Fig. 2 The learning influence of PM operations

$${t}_{kt}^m={\gamma}_k\;{z}_{kt}\;{\left(1+\sum_{\begin{array}{c}l\in T\\ {}l\le t-1\end{array}}{z}_{kl}\right)}^{\frac{\ln r}{\ln 2}}\forall k,t$$
(13)
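The learning effect in Eq. (13) can be illustrated with a short Python sketch. The function name and the values γ = 2.0 and r = 0.8 are hypothetical and chosen only to show the shape of the curve: with an 80% experience curve, each doubling of the cumulative number of operations multiplies the required time by 0.8.

```python
import math


def pm_time(gamma_k, r, prior_count, performed=1):
    """Time for a level-k PM operation according to Eq. (13).

    gamma_k     : time of the first PM operation at this level
    r           : experience-curve percentage, e.g. 0.8
    prior_count : number of level-k operations in earlier periods
    performed   : z_kt, 1 if the operation is carried out, else 0
    """
    exponent = math.log(r) / math.log(2)
    return gamma_k * performed * (1 + prior_count) ** exponent


first = pm_time(2.0, 0.8, prior_count=0)    # no experience yet
second = pm_time(2.0, 0.8, prior_count=1)   # one earlier operation
fourth = pm_time(2.0, 0.8, prior_count=3)   # three earlier operations
skipped = pm_time(2.0, 0.8, prior_count=3, performed=0)
```

Under these assumed values the first operation takes 2.0 time units, the second about 1.6, and the fourth about 1.28, while a period without the operation contributes no PM time.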

This paper investigates the impact of PM operations on reducing the equipment age. Fig. 3 provides a visual representation of the influence of PM operations on the equipment age. Moreover, Fig. 3 illustrates how the implementation time of PM is affected by the learning process. The coefficient α represents the age reduction coefficient, which determines the effectiveness of the PM operation in reducing the equipment age. As a result, the virtual age of the equipment is calculated using Eq. (14), which indicates that the virtual age at the beginning of each period is equal to the virtual age at the end of the previous period, plus the duration of the period minus the extent of improvement achieved through the performance of PM operations.

Fig. 3 The impact of different levels of PM operations and learning on the equipment’s age

In this research, three different levels of PM operations are considered for equipment maintenance: level 3 involves servicing and inspection of equipment, level 2 involves servicing and repairing certain components, and level 1 involves servicing and replacing specific components. The parameter αk, which represents the impact on machine performance and equipment virtual lifetime, varies for each level. If any of these levels of PM operations are implemented, the virtual lifetime of the equipment will decrease proportionally to the age reduction coefficient (αk) associated with each level. Without any PM operations, the age reduction coefficient becomes zero, resulting in no change to the machine’s lifetime. The overall virtual age of the equipment is determined by adding the virtual age of the previous period to the duration of the current period. This research emphasizes the influence of human error probability on the successful execution of each level of PM operations. Consequently, the age reduction coefficient (αk) is multiplied by the human resource reliability. In other words, the effective and proper execution of PM operations at each level depends on the human error probability.

If maintenance operations are limited to equipment inspection and service at level 3, they will have a minimal effect on the equipment’s lifespan, and the equipment will remain relatively deteriorated. When PM operations are performed at level 2, involving equipment servicing and repair of certain components, the impact of these preventive measures will be partial, resulting in the equipment's condition falling between “as bad as old” and “as good as new.” Finally, implementing PM operations at level 1, which includes equipment servicing and replacement of specific components, will restore the equipment to an “as good as new” state. In this scenario, the equipment’s lifespan is effectively reset to zero, treating it as if it is a new machine. As a result, the age reduction coefficient (αk) in this research varies depending on the level of PM operation implementation, assuming different values (zero, a value between zero and one, and one). It is important to note that besides the direct impact of the age reduction coefficient on the equipment’s lifespan, human error also plays a significant role in determining this value. The execution of each level of PM operations involves a certain probability of human error. Therefore, in the absence of human error, the age reduction coefficient can accurately reflect its impact on the equipment's lifespan.

Therefore, taking into account the influence of human error, the implementation of the third level of PM operations on the equipment will have a limited effect on its lifespan (i.e., the age reduction coefficient is 0.1). Even with the presence of human error associated with this level’s execution, it is not possible to extend the equipment’s lifespan beyond its actual age. Consequently, the implementation of this level will minimally impact the equipment’s age. When the second level of PM operations is performed on the equipment, the equipment’s age will be reduced but not completely (i.e., the age reduction coefficient is between zero and one). In the presence of human errors during the execution of this level, the impact of its implementation on the equipment’s age will be diminished (αk(1 − P)). Hence, the implementation of this level influences the equipment’s age, but the extent of its impact on the virtual age of the equipment depends on the value of the human error probability. The operations at this level are executed imperfectly, leading to a partial effect on the equipment's age.

If the first level of PM operation is implemented on the equipment, it will effectively restore the equipment to an “as good as new” state, resulting in an age of zero (i.e., the age reduction coefficient is one). However, in the presence of human errors during its execution, the impact of the first level of PM operation on the equipment's age will be diminished. Considering the human error probability, the equipment may not be fully restored to an “as good as new” state. Human error acts as a barrier to achieving a virtual age equivalent to an “as good as new” state for the equipment. This can only occur when the human error probability is zero. Therefore, implementing the first level will effectively impact the equipment’s age, and the degree of impact and reduction in the virtual age resulting from the execution of this level will depend on the human error probability. Although the operations at this level are executed perfectly, due to the presence of human error, the impact of executing this level on the equipment's age will consistently be imperfect. Fig. 3 illustrates the impact of different levels of PM operations on the equipment’s age, and it also demonstrates the influence of learning on the execution time of these operations. The parameter αk represents the age reduction coefficient, and thus the virtual age of the equipment is calculated according to Eq. (14).

$$\begin{array}{cc}{a}_t=\left({a}_{t-1}+l\right)\left(1-{z}_{kt}{\alpha}_k\left(1-p\right)\right),& 0\le {\alpha}_k\le 1,\forall t.\end{array}$$
(14)

Fig. 4 depicts the interrelation between age reduction coefficient αkt and the level of PM operation concisely and effectively.

Fig. 4 The interrelation between age reduction coefficient αkt and level of PM

The selection of PM operations to be performed in each period can be determined based on the equipment's condition, which encompasses factors such as its lifetime, production rate, noise levels, and other equipment-specific conditions. This study focuses on time-based maintenance, where the level of PM operations depends on the equipment’s age. Decision variables are utilized to establish threshold levels for the equipment’s age and the corresponding levels of PM operations. In this research, the symbols ξ and ξ′ represent the upper and lower threshold values, respectively. These values enable the determination of suitable levels of PM operations based on the equipment's age, considering their associated costs and effects on other expenses. The symbol μ denotes the maximum number of levels, which is set to 3 in this study, while ψ represents the levels of PM operation. The variables ψ, ξ, and ξ′ are decision variables within the problem. Eq. (15) represents the determination of the level of PM operation dependent on the equipment's age.

$$\begin{array}{cc}{\xi}^{\prime }-\frac{\xi^{\prime }-\xi }{\mu -2}\left({\psi}_t-1\right)\le {a}_{t-1}+l\le {\xi}^{\prime }-\frac{\xi^{\prime }-\xi }{\mu -2}\left({\psi}_t-2\right)& \mu >2,\forall t\end{array}$$
(15)

To determine the various levels of PM, it is necessary to consider Eqs. (14) and (15). Eq. (16) ensures that only one level of PM can be selected at each inspection point, based on the equipment’s condition, which is a logical requirement of the problem. Eq. (17) establishes a connection between Eqs. (15) and (16). In Eq. (15), the appropriate level of PM operation is determined by taking into account the predicted age of the equipment. The variable ψ represents the execution level for PM, and Eq. (17) defines the relationship between ψ and k.

$$\begin{array}{cc}\sum_{k\in K}{z}_{kt}=1& \forall t\end{array}$$
(16)
$$\begin{array}{cc}{\psi}_t=\sum_{k\in K}k\;{z}_{kt}& \forall t\end{array}$$
(17)

The total costs associated with PM operations are addressed in Eq. (18). The decision variable zkt, which is binary, represents the execution or non-execution of each level of PM operations in each period. The time function for executing each level of PM operations is based on Eq. (13).

$$\sum_{k\in K}\sum_{t\in T}\left(\left(Cb+H{m}_{kt}N{m}_{kt}\right){t}_{kt}^m+ Sp{m}_{kt}+ Cse{t}_t\right){z}_{kt}$$
(18)

The Human Error Probability Cost Associated with the Maintenance Task

Since human beings are responsible for conducting inspections, PM operations, and CM operations on equipment, there is always a risk of human error. This can lead to higher costs resulting from incorrect PM operations, faulty inspections, misdiagnosis during PM operations, and unrecognized PM needs that lead to higher CM costs. To estimate the cost function associated with human error, historical data and expert opinions are used to gather information on the costs incurred due to human error. Since it is not feasible to directly collect information on human error from experts, information regarding CPCs is collected, and the human error probability (HEP) is calculated based on Eq. (19).

$$HEP= HE{P}_0\times {e}^{k\; CII}=0.002236\times {e}^{-0.7629\; CII}$$
(19)

The CII index, introduced in Eq. (20), is the difference between the total number of factors that exhibit reduced performance (represented by CPCs in an undesirable condition) and the total number of factors that exhibit improved performance (represented by CPCs in a desirable condition).

$$CII=\sum \left( Reduced- Improvement\right)\kern0.24em$$
(20)

In order to include human error in the calculation of maintenance costs within the organization, a cost function that accounts for human error must be developed. To address this issue, maintenance cost data related to various human errors were collected, based on expert opinions. Regression analysis was used to estimate the relationship between the probability of human error and its corresponding costs, using historical data. Among several candidate functions (e.g., quadratic, cubic, exponential, and Fourier), the cubic function provided the closest fit to the collected data.

Regression algorithms are utilized to approximate the mapping function between input variables and continuous output variables. Several error metrics are employed to evaluate the performance of the model, and one commonly used metric is the mean squared error (MSE). MSE measures the error by squaring the difference between the actual value (yi) and the predicted value and then averaging it across the dataset. The average scores obtained when predicting the relationship between HEP and its cost with various functions are given in Table 2. According to Table 2, the cubic function achieves the best fit in terms of R-square and MSE and is therefore chosen to describe the cost function. MSE is computed as follows:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_i-f\left({x}_i;\omega \right)\right)}^2$$
(21)
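The use of Eq. (21) to rank candidate functions can be sketched in a few lines of Python. The data points and the two candidate functions below are toy placeholders chosen so that the arithmetic is easy to follow; they are not the expert-based cost data or the functions evaluated in Table 2.

```python
def mse(actual, predicted):
    """Mean squared error of Eq. (21)."""
    n = len(actual)
    return sum((y - f) ** 2 for y, f in zip(actual, predicted)) / n


# Toy observations (placeholders, not case-study data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 5.0, 10.0]


# Two hypothetical candidate regression functions.
def candidate_quadratic(x):
    return x ** 2 + 1


def candidate_linear(x):
    return 3 * x


scores = {
    "quadratic": mse(ys, [candidate_quadratic(x) for x in xs]),
    "linear": mse(ys, [candidate_linear(x) for x in xs]),
}
best = min(scores, key=scores.get)  # candidate with the smallest MSE
```

The candidate with the smallest MSE is retained; in this toy setting that is the quadratic, which reproduces the placeholder data exactly.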
Table 2 Evaluating different known functions (e.g., quadratic, exponential, Fourier, exp)

The goal of this method is to find the regression function f(xi; ω) that best fits the data, which is equivalent to finding the optimal parameter vector ω of the regression function. Fig. 5 shows the relationship between maintenance cost and human error probability.

Fig. 5 The relationship between maintenance cost and human error probability (cubic function)

The value of the coefficient and intercept of the cubic function is given in Table 3.

Table 3 Coefficient and intercept of the estimated cubic function

According to the results in Table 3, the total cost associated with the human error probability in maintenance operations is as follows:

$$f(p)=\left(\pm {\varphi}_0\pm {\varphi}_1p\pm {\varphi}_2{p}^2\pm {\varphi}_3{p}^3\right)=-1.022\;{p}^3+128.9\;{p}^2-55.41\;p+69.83$$
(22)
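For reference, the fitted cubic of Eq. (22) can be evaluated directly. The sketch below uses Horner's scheme with the coefficients reported in Eq. (22); the function name is an assumption introduced for illustration.

```python
def human_error_cost(p):
    """Evaluate f(p) = -1.022 p^3 + 128.9 p^2 - 55.41 p + 69.83 (Eq. (22))
    using Horner's scheme."""
    return ((-1.022 * p + 128.9) * p - 55.41) * p + 69.83


cost_at_zero = human_error_cost(0.0)
cost_at_lower_bound = human_error_cost(0.00005)  # lower bound on p in Eq. (35)
```

At p = 0.00005 the quadratic and cubic terms are negligible, so the value stays within a few thousandths of the intercept 69.83.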

Constraints

Eq. (23) is one of the capacity constraints of the model. This constraint ensures that the inventory level is less than the warehouse capacity. This constraint is formulated as follows:

$${I}_{t-1}+{u}_t{t}_t^p-{d}_t\le Wa{r}_t\forall t$$
(23)

where

$${I}_t={u}_t\;{t}_t^p+{I}_{t-1}-{d}_t-{B}_{t-1}$$
(24)

Eq. (25) requires that the sum of the previous period’s inventory level and the current period’s production quantity be at least equal to the predicted demand level in order to minimize shortages within the system. Shortages not only result in evident financial losses but also impose significant hidden costs on the company, such as diminished company credibility and customer satisfaction. Therefore, it is crucial to minimize the occurrence of shortages to the greatest extent possible.

$${I}_{t-1}+{u}_t\;{t}_t^p\ge {d}_t\forall t$$
(25)

Eq. (26) represents the production time for each period. The production time is calculated by subtracting the production line stoppage time, caused by PM operations and by CM operations after system breakdowns, from the duration of each period (l).

$${t}_t^p=l-\sum_{k\in K}{t}_{kt}^m-{t}_t^r\left[\frac{{\left({a}_t+l\right)}^{\beta }-{a}_t^{\beta}}{\eta^{\beta }}\right]\quad \forall t$$
(26)
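A small Python sketch of Eq. (26) shows how PM time and expected CM time reduce the time available for production. The function name and all numerical values (period length 4, one PM operation of 0.2 time units, repair time 0.5, β = 2, η = 10, virtual age 3.6) are illustrative assumptions.

```python
def production_time(l, pm_times, t_r, a, beta, eta):
    """Production time in one period according to Eq. (26).

    l        : period length
    pm_times : PM durations t_kt^m for the levels k carried out
    t_r      : time per CM operation
    a        : virtual age a_t
    beta/eta : Weibull shape and scale parameters
    """
    expected_failures = ((a + l) ** beta - a ** beta) / eta ** beta
    return l - sum(pm_times) - t_r * expected_failures


# Hypothetical period: 0.448 expected failures, each taking 0.5 time units.
t_p = production_time(l=4.0, pm_times=[0.2], t_r=0.5, a=3.6, beta=2.0, eta=10.0)
```

With these assumed inputs the production time is 4 − 0.2 − 0.5 × 0.448 = 3.576 time units.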

Eq. (27) defines the boundaries of the production rate range for the product. Since the production rate is a decision variable, it is essential to ensure that it does not exceed the nominal capacity of the machinery. Moreover, considering the economic aspect of the organization’s operations, the production rate should surpass a predetermined threshold. Therefore, a range is established to accommodate the production rate. In practice, the actual production rate of the machinery tends to be lower than its nominal capacity due to interruptions caused by breakdowns and maintenance operations. The coefficient ϑt captures this loss: it reflects the time taken up by machinery breakdown stoppages and maintenance operations, and it reduces the maximum production rate accordingly, as defined in Eqs. (28) and (29).

$${u}_t^{\textrm{min}}\le {u}_t\le {u}_t^{\textrm{max}}$$
(27)

where

$${u}_t^{\textrm{max}}={g}_t\left(1-{\vartheta}_t\right)$$
(28)

and

$${\vartheta}_t=\frac{t_{kt}^m\;{z}_{kt}}{H}+{t}_t^r\left[\frac{\left({\left({a}_t+l\right)}^{\beta }-{a}_t^{\beta}\right)}{\eta^{\beta }}\right]$$
(29)

If no production takes place during a specific period, then no setup costs are applied to the system, and the production rate is zero. This inherent constraint is represented by Eq. (30).

$$0\le {u}_t\le Cset\;{y}_t\forall t$$
(30)

By combining Eqs. (27) and (30), Eq. (31) is obtained, which enforces both constraints simultaneously.

$${u}_t^{\textrm{min}}\kern0.24em {y}_t\le {u}_t\le {u}_t^{\textrm{max}}\kern0.24em {y}_t\forall t$$
(31)

Eq. (32) indicates that during each period, only one level of PM operation can be performed on the machine.

$$\sum_{k\in K}{z}_{kt}=1\forall t$$
(32)

Eq. (33) describes that the machine availability must surpass the minimum accessibility level required for servicing in each period.

$${t}_t^p\ge A{E}_t \forall t$$
(33)

Eq. (34) indicates that the total organization costs should be less than the available budget amount.

$$\textrm{Total}\kern0.24em \textrm{Cost}\le TB$$
(34)

The final constraint outlines the specific range limitations of certain variables in the model, which, based on their nature, can only take on values within a particular range.

$$0.00005\le p\le {p}_{\textrm{current}},{z}_{kt}\in \left\{0,1\right\}$$
(35)

Findings

The model is converted to a mixed-integer non-linear programming model and solved using the DICOPT solver in the GAMS software. The computations are executed on a system with an AMD Ryzen 3 2200U processor running at 2.5 GHz, 8 GB of RAM, and a 64-bit operating system. The case study centers on a factory that manufactures automobile hydraulic steering boxes. The study focuses on the planning and practical evaluation of maintenance operations for a CNC machine in this automobile parts manufacturing facility, and the dataset used to analyze and optimize the integrated production and maintenance activities relates exclusively to that machine. The dependable functioning of CNC machines strongly influences production reliability and waste generation. As equipment damage escalates, the average interval between failures diminishes, potentially disrupting the production process. Failures arising from damage not only cause lost production time but also require additional time for equipment setup and adjustment.

The results obtained from solving the model, based on data from the case study involving a single piece of equipment (CNC lathe machine), one product (hydraulic steering box), and three different levels of PM operations over nine periods, are presented in Table 4. These results provide insights into the associated costs for each inspection period, the threshold associated with the equipment’s age, and the impact of each level of PM operations on the equipment’s age (αk). The equipment’s age in each period (at) is determined by the execution of each level of PM operations and the passage of time. The time required to execute each level of PM operations during the inspection period (\({t}_{kt}^{m}\)) is determined by whether that level of operation is implemented, as well as by the initial time required for that particular level. The production rate (ut) is also determined by taking into account maintenance costs, shortages, the demand level, and the equipment’s condition in terms of maintenance operations. The production time in each period is calculated based on the total available time and the downtime caused by PM and CM operations (Table 5).

Table 4 The values obtained from solving the model
Table 5 The various scenarios for conducting the sensitivity analysis

The results suggest that for the first, fourth, sixth, and eighth periods, it is preferable to implement level 3 PM operations. However, level 3 has only a minor impact on the equipment's age. Therefore, at the beginning of the second period, the equipment’s virtual age is 3.6 (while its actual age is 4). Considering the costs associated with each level of PM operations, their effects on the equipment's age, and the resulting failure rate and CM cost for the next period, it is advisable to perform level 2 PM operations in the second period. Level 2 PM operations roughly halve the virtual age of the equipment. Consequently, at the start of the third period, the equipment's virtual age is expected to be 3.8 (compared to an actual age of 8 after two periods of 4 time units each). Taking into account the costs involved in executing PM operations at different levels, the most favorable approach for the third period is to implement level 1 PM operations.

Implementing level 1 PM operations results in optimal PM operations, leading to a significant reduction in the equipment’s age, approaching zero (although a complete reduction to zero is not achieved due to the probability of human errors associated with these operations). This trend continues in subsequent periods. The equipment's accessibility time is determined by considering the overall duration and the time allocated to both PM and CM operations. The eighth period stands out with the highest level of equipment accessibility. During this period, level 3 PM operations were carried out, ensuring uninterrupted production line operation. Moreover, level 1 PM operations were performed in the previous period, resulting in a decrease in system failures for the next period. As a result, system downtime caused by equipment failures and PM operations can be minimized to the lowest feasible value.

For the next nine periods of the company, the model recommends first-level PM operations in three periods (Periods 3, 5, and 7), second-level PM operations in two periods (Periods 2 and 9), and third-level PM operations in four periods (Periods 1, 4, 6, and 8). In addition, the human error probability should be reduced to 0.00005.
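The virtual ages quoted in this section can be reproduced by applying Eq. (14) to the recommended PM schedule. The Python sketch below uses the base-case values stated in the text (α = 1, 0.5, and 0.1 for levels 1, 2, and 3, p = 0.00005, and a period length of 4). The initial age of zero and the function name are assumptions made for illustration.

```python
ALPHA = {1: 1.0, 2: 0.5, 3: 0.1}          # age reduction coefficient per level
SCHEDULE = [3, 2, 1, 3, 1, 3, 1, 3, 2]    # PM level chosen in Periods 1..9


def virtual_ages(schedule, alpha, p, l, a0=0.0):
    """Apply Eq. (14) period by period and return the virtual age after
    each period's PM operation. a0 = 0 (new equipment) is an assumption."""
    ages = []
    age = a0
    for level in schedule:
        age = (age + l) * (1 - alpha[level] * (1 - p))
        ages.append(age)
    return ages


ages = virtual_ages(SCHEDULE, ALPHA, p=0.00005, l=4.0)
for period, age in enumerate(ages, start=1):
    print(f"Period {period}: virtual age after PM = {age:.4f}")
```

Under these assumptions the virtual age is about 3.6 after Period 1 and about 3.8 after Period 2, matching the values discussed above, and it drops almost to zero after the level 1 operation in Period 3, with the small residual due to the human error probability.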

Sensitivity Analysis

This section employs a sensitivity analysis procedure to assess the accuracy and performance of the model. This is accomplished by manipulating various parameters to create different scenarios, involving both decreases and increases. By evaluating the model's performance in each scenario, a comprehensive understanding of its capabilities is obtained. The validation of the model’s results relies on the consistency between the outcomes of each scenario and the expected behavior. If the results demonstrate the desired behavior and exhibit logical performance in each scenario, it can be concluded that the model is representative and accurate. This validation process enhances the credibility of the model’s results. It is important to note that all other parameters were kept constant at their original values throughout each analysis. Through this sensitivity analysis, our goal was to gain deeper insights into how changes in the parameters impact the components of the objective function. This process aimed to provide valuable insights into the behavior of the model.

In this section, we conducted an extensive sensitivity analysis on specific model parameters, namely, the age reduction coefficient, profit per unit, setup cost, and length of each period. The objective was to examine how variations in these input parameters affect the objective function and decision variables. The results obtained from solving the model with different values of αk for each level are presented in Table 6. In the first scenario, we decreased the value of αk for the third level from 0.1 to 0, resulting in no impact on the virtual age of the equipment when the third level of PM actions is performed. The results indicate that executing the third level of PM actions is no longer cost effective. As a result, the outcomes for the first, fourth, sixth, and eighth periods, where PM actions were implemented at the third level, have changed. Consequently, it is advisable to carry out the first and second levels of PM actions during these periods (Fig. 6).

Table 6 The values obtained from the sensitivity analysis
Fig. 6 The changes of decision variables (zkt) in scenario 1 (α3t = 0)

In the second scenario, when the value of αk for the first level is reduced from 1 to 0.8, it becomes more favorable to execute the second level of PM actions. The first level becomes less effective at reducing the machine’s virtual age, so the second level is selected for these operations instead. Since the benefits of executing the second level of PM actions outweigh the associated costs, this level is chosen for more periods (Fig. 7).

Fig. 7 The changes of decision variable (zkt) in scenario 2 (α1t = 0.8)

In the third scenario, when the value of αkt for the second level decreases from 0.5 to 0.45, the possibility of opting for the second level of PM actions decreases. As evident from the results, either level 1 or level 3 consistently emerges as the optimal choice for PM actions across different periods (Fig. 8).

Fig. 8 The changes of decision variable (zkt) in scenario 3 (α2t = 0.45)

Moreover, we systematically adjusted the values of three parameters (i.e., the length of each period, the profit per unit, and the setup cost). These adjustments ranged from −5% to +5% of their original values. Table 6 shows the results for some selected values. For the basic model, bp, Csett, and L are 285, 20, and 4, respectively. As expected, if the profit per unit increases, the total profit also grows, and a decrease in setup cost likewise increases the total profit. The results presented in Table 7 therefore match the expected behavior of the model. Table 7 also illustrates the effect of changing the length of each period. Shorter inspection periods bring the system closer to continuous inspection. This leads to more frequent PM decisions and operations, so that PM can be matched more closely to the equipment's condition. Consequently, as the length of each period decreases, costs decrease due to the appropriate execution of PM operations.

Table 7 The values obtained from the sensitivity analysis for three parameters (i.e., Cset, bp, and L)

Discussion

Previous studies provide valuable insights into both maintenance and production operations. However, they have overlooked how human errors affect these operations. Their main deficiency is the omission of the human factor in maintenance; they concentrate exclusively on the effects of non-human factors on integrated maintenance and production operations. To the best of our knowledge, this paper is the first to consider the influence of human error probability on the overall cost and equipment lifespan resulting from these operations. Furthermore, this paper introduces an enhanced model that reduces costs across production, maintenance, inventory control, and human error while satisfying constraints on customer demand, budget, production rate, and capacity. In this study, maintenance operation costs are methodically classified into two distinct categories: time-dependent and time-independent costs. Besides, in contrast to existing research in the literature, the model takes into account setup and equipment opportunity costs, calculated from the downtime that maintenance activities impose on equipment.

The findings underscore the substantial role of human error in contributing to equipment failures. The results determined the optimal human error probability to be 0.00005. This value highlights the significant influence of human error on both overall costs and equipment lifespan, and it emphasizes the need to reduce human error substantially whenever viable. Our findings are consistent with prior research by Aalipour et al. (2016), Azadeh et al. (2016), Hameed et al. (2016), and Ighravwe and Ayoola Oke (2021). These studies considered the importance of human errors in maintenance and analyzed and prioritized such errors through MADM methods. However, as mentioned in the research background, none of them quantitatively computed the optimal value of human error in production, maintenance, and repair operations while accounting for additional costs. Moreover, our results share a number of similarities with those of Emami-Mehrgani et al. (2016). Their research confirms that human errors in maintenance activities increase the total production cost. The findings of this study also suggest that overall costs are lower when production and maintenance operations are optimized jointly than when each is examined individually. This concurs well with the findings of Rivera-Gómez et al. (2021), Sharifi and Taghipour (2021), Ghaleb et al. (2020), and Zheng et al. (2021).

In this study, the execution of PM and CM operations is always accompanied by a margin of human error, and human error is recognized as an integral factor affecting equipment lifespan. Consequently, none of the PM operation levels restores the equipment to an “as good as new” state; the machine’s age is never reset to zero. As a result, the outcomes of optimizing PM operations differ from certain published studies, e.g., Ait-El-Cadi et al. (2021), Li et al. (2022), and uit het Broek et al. (2021), whose results align with those anticipated under perfect PM conditions. In studies that assume perfect maintenance operations, the selection of maintenance levels depends on the costs of implementing those levels. Within the framework of this study, however, the identification of optimal maintenance levels goes beyond the sole consideration of PM operation costs. It also takes into account human errors and their associated costs.
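The reasoning above can be illustrated with a small numerical sketch. The functional form below is an assumption made purely for illustration and is not the equation used in this paper: it supposes that the age reduction coefficient of a PM level is scaled down by the human error probability, so that even a level with a coefficient of 1 leaves a small residual age. The names `age_after_pm`, `alpha`, and `human_error_prob` are hypothetical.

```python
def age_after_pm(age: float, alpha: float, human_error_prob: float) -> float:
    """Effective equipment age right after a PM action.

    ASSUMED form (illustration only, not the paper's model):
    the nominal age reduction coefficient `alpha` (1 = perfect PM)
    is degraded multiplicatively by the human error probability.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= human_error_prob <= 1.0:
        raise ValueError("human_error_prob must lie in [0, 1]")
    effective_alpha = alpha * (1.0 - human_error_prob)
    return age * (1.0 - effective_alpha)


if __name__ == "__main__":
    age = 10.0
    # Perfect PM without human error resets the age to zero.
    print(age_after_pm(age, alpha=1.0, human_error_prob=0.0))
    # With any positive human error probability, a residual age remains.
    print(age_after_pm(age, alpha=1.0, human_error_prob=0.00005))
    # A less effective PM level (smaller alpha) leaves a larger residual age.
    print(age_after_pm(age, alpha=0.8, human_error_prob=0.00005))
```

Under this assumed form, the benefit of a PM level depends on both its nominal coefficient and the human error probability, which is why the choice of level cannot be made from PM operation costs alone.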

Managerial Insights

This paper provides a solution for decreasing costs in industry while increasing equipment accessibility and reducing human error probability. The findings are important for managers and system designers who intend to implement policies that decrease human error probability and increase equipment accessibility. The results and optimal solutions can help industrial managers and decision-makers compete successfully in the market. Our research could be a useful aid for decision-makers because investigating maintenance and production operations simultaneously leads to more appropriate planning and a unified model. To mitigate the impact of human error on equipment, organizations should focus on the CPCs affecting human error probability and on oversight during maintenance activities. Managers and decision-makers can thus work to improve the factors that influence human error (i.e., CPCs), aiming to minimize costs by reaching the optimal value of human error. Overall, optimizing integrated production and maintenance planning in industrial settings should involve a comprehensive consideration of human error. Improving the contextual conditions that affect human error in maintenance activities, for example through proactive measures, training programs, error-proofing mechanisms, and advanced technologies, can enhance operational efficiency, reliability, and productivity in maintenance activities.

The presented model is beneficial to organizations in which PM performance is essential and in which human error and equipment failure incur high costs. The model determines the timing and frequency of PM operations in coordination with production operations. Moreover, by improving the level of the CPCs affecting human error, the human error probability can be brought to the desired level. Consequently, equipment maintenance and production costs are minimized by properly implementing maintenance and production operations at the optimal value of human error. To validate the presented model, sensitivity analysis was conducted on four parameters (i.e., αkt, Cset, bp, and L), and the effect of changes in these parameters on the decision variables and the total cost was presented.

Conclusions

In summary, this research has presented a model for the integrated planning of production and maintenance operations. An optimal and integrated maintenance and production planning approach that considers human error can positively impact sustainability by improving resource efficiency, reducing waste, enhancing safety, and maximizing operational effectiveness. The main purpose of the current study was to introduce a model so as to optimize the planning of these operations to minimize costs and enhance future work processes within the organization. This research simultaneously considers the costs associated with production operations, inventory control, PM and CM operations, as well as human errors related to maintenance tasks. One of the main advantages of the proposed model is its ability to quantify the impact of human errors and integrate them into the comprehensive planning of production and maintenance operations, an aspect that has been overlooked in previous research. Moreover, this is the first study to estimate the cost function associated with human errors in maintenance tasks using historical data and a regression method.

Because the human error probability has a considerable impact on the efficiency and suitability of maintenance operations, this study has demonstrated, for the first time, that human error can affect the age reduction coefficient of the equipment. Indeed, if the level of human error decreases, maintenance operations at different levels can be carried out more effectively and appropriately, which improves equipment condition. The findings of this paper identify the optimal production rate that minimizes holding and shortage costs, given that maintenance operations cause production line downtime. Moreover, the results indicate which periods and levels of PM operations are most suitable, taking into account their impact on the lifespan and condition of the equipment as well as their associated costs. Besides, the findings determine the frequency of execution for each level of PM operation.

This research investigated the effect of human error only on maintenance cost and equipment age. Future work can concentrate on the effect of human error on other operations simultaneously; human error in the various operations of an organization is an intriguing issue that could be usefully explored in further studies. The present study has only investigated the impact of personnel learning on PM operations. Therefore, future work on the current topic can examine this impact on CM operations. Also, in this paper, the time-independent cost of PM and CM operations is assumed to be constant across periods. Future research can relax this assumption and let these costs depend on other parameters, e.g., the inflation rate and the spare part inventory level. The links between human error and maintenance operations also need to be examined more closely. Besides, for CM operations, as for PM operations, it is essential to account for several levels with varying degrees of influence on equipment condition and age. This practice would bring the execution of these operations closer to real-world scenarios.

In the present study, maintenance operations have been carried out based on the equipment’s lifespan, which forms the basis for decisions on executing the different levels of preventive measures. So far, studies have planned preventive maintenance either solely on the basis of lifespan or on the basis of equipment condition. It is recommended that future research take a more integrated approach by simultaneously considering maintenance based on both condition and lifespan. This integration has the potential to improve the scheduling of preventive maintenance activities. In this study, the impact of human error on the equipment’s lifespan has been assessed exclusively through its influence on the age reduction coefficient. It is advisable that future research estimate the effect of human error on the equipment’s lifespan as an explicit function. Such an approach would represent its influence on the overall lifespan of the equipment more accurately. Moreover, integrating inventory control planning for spare parts would allow maintenance operations to be executed more efficiently.