1 Introduction

In today’s business world, quality is a key component of customer satisfaction (Konstantas et al. 2018). Producing reliable products requires a high-quality manufacturing process together with proper quality assurance techniques (He et al. 2018). Therefore, it is essential to monitor the key variables that are critical to the quality of processes and products. For proper monitoring, statistical tools are used to analyze anomalous process variations and improve the quality of the overall production system. Statistical process control (SPC) is a collection of statistical tools used to maintain process stability and prevent products and processes from deviating from the desired quality. SPC is not only used in manufacturing and industrial sectors but is also applied to monitor several non-industrial processes (Bersimis et al. 2018). In SPC, various types of control charts are used to monitor and control the necessary process parameters by detecting shifts or deviations from the desired condition. Conventional control charts are designed considering statistical criteria only; therefore, their performance can sometimes be unsatisfactory from an economic point of view. In contrast, economic control charts (ECCs) are designed considering both economic and statistical criteria (Sultana et al. 2018; Khadem and Moghadam 2019). Thus, ECCs are used not only for quality control but also to determine and evaluate design parameters that optimize overall costs (Pasha, Bameni Moghadam et al. 2018a, b).

While control charts focus on keeping process parameters within statistical control limits, the process itself can be interrupted by the sudden breakdown of a machine and then requires corrective maintenance before it can resume. In addition, before a complete breakdown, the machine may degrade from its desired working condition while the manufacturing process parameters remain within control limits, which deteriorates product quality and sometimes results in product rejections. These rejections, in turn, increase repair and replacement activities and the associated costs. Preventive maintenance (PM) can reduce these breakdown rates as well as improve a machine's performance and enhance productivity (Esmaeili et al. 2019). In addition, condition-based monitoring, a predictive maintenance program, can be launched to take necessary actions for failure-critical equipment (G. Q. Cheng et al. 2018). It guides top management in selecting and executing suitable maintenance plans to prevent premature equipment failures. Even the data from regular cleaning and repair actions, often termed ‘minor maintenance’, can be used to assess the current condition of an asset and thus improve the accuracy of the diagnostic and predictive techniques used in condition-based monitoring (Liang et al. 2020). Therefore, a suitable maintenance policy can help to reduce the number of breakdowns and process variations, which enhances quality levels. Hence, quality control and maintenance management are interrelated, and jointly optimizing the two may bring economic benefits that support the sustainable growth of the organization.

Therefore, economic control charts and maintenance policies should be implemented simultaneously by industries to maintain desired quality levels at minimum cost, which has led researchers to develop integrated economic models combining maintenance policies and control charts (B. Bouslah et al. 2016; X. Yang and Zeng 2018). However, comprehensive studies that develop a joint optimization model aiming to deliver the best-quality products while considering both quality- and maintenance-related costs are still scarce. Therefore, this research contributes to the existing literature by addressing the following objectives:

  • Proposing an integrated model for evaluating the optimal values of CUSUM chart parameters and preventive maintenance intervals to minimize costs.

  • Introducing the VSIFT sampling policy in conjunction with a CUSUM control chart. This will facilitate faster detection of any shift or process deviation; consequently, the out-of-control ARL will be minimized.

  • Computing the quality cost by precisely determining the in-control and out-of-control quality losses for both the process mean and variance, using the Taguchi loss function (TLF) and a modified linear loss function, respectively.

The remainder of the paper is organized as follows: Sect. 2 reviews the related literature. Section 3 presents the proposed model. Section 4 provides the numerical examples and compares the findings with the extant literature, followed by the sensitivity analysis in Sect. 5. Finally, the conclusions, comprising the research implications, limitations, and scope for future work, are discussed in Sect. 6.

2 Literature review

The extant literature sheds light on the necessity of integrating quality control and maintenance management and presents several control chart models with a focus on maintenance policies.

Many studies have proposed that process quality control and machine maintenance should be considered simultaneously. It has been established that integrating quality control and maintenance decisions can significantly reduce the overall production cost for a single machine unit as well as for a multi-stage production system (Duffuaa et al. 2020; G. Cheng and Li 2020). Rasay et al. (2018) considered SPC together with maintenance activities to evaluate and improve productivity and showed that the integrated economic design of SPC and maintenance planning is far more effective and lucrative for a production system than stand-alone models in which each component is designed separately. Salmasnia et al. (2018) demonstrated that integrating production planning, maintenance policy, and statistical process monitoring can contribute to significant cost savings in the production system. An integrated model of production, quality, and maintenance control for a multistage manufacturing system has also been developed by some researchers (Bassem Bouslah et al. 2018; Zheng et al. 2020). They demonstrated that system reliability is related to the incoming product’s quality in a multistage manufacturing system and that the final outgoing quality of a product depends on the production, quality, and maintenance control settings across all manufacturing stages.

Due to the correlation between quality control and maintenance, researchers initially developed joint economic models of control charts and maintenance using the \(\overline{X}\) control chart (Tagaras 1988; Ben-Daya 1999; Ben-Daya and Rahim 2000). Xiang (2013) integrated age-based preventive maintenance with an \(\overline{X}\) control chart to minimize the overall operational costs of a deteriorating production system. Besides, Liu et al. (2013) utilized an \(\overline{X}\) control chart with a condition-based maintenance program for a two-unit series system and concluded that quality loss can be reduced when statistical analysis is blended with maintenance policy.

Over time, exponentially weighted moving average (EWMA) control charts have received increased attention from researchers compared to \(\overline{X}\) charts (Serel and Moskowitz 2008; Haq et al. 2015). Moreover, researchers have focused on determining both in-control and out-of-control costs and on the proper selection of a sampling policy for control charts. For instance, Chou et al. (2008) used the variable sampling interval fixed time (VSIFT) control scheme in developing a framework for the design of an EWMA control chart. S. F. Yang (2010) compared a variable sampling interval (VSI) control chart with a fixed sampling interval control chart for monitoring quality variables of two dependent process steps and found that the VSI chart performed better in detecting small and medium shifts in the mean and variance. Pandey et al. (2012) introduced a new integrated approach for the joint optimization of preventive maintenance intervals and control chart parameters using the Taguchi loss function (TLF) and put forward a method to categorize machine and process failures. Shamsuzzaman et al. (2016) developed a model combining the Shewhart \(\overline{X}\) and EWMA charts to optimize their charting parameters. Sultana et al. (2018) illustrated an economic model of the EWMA control chart using the VSIFT sampling policy; they used the Taguchi quadratic loss function to determine both in-control and out-of-control quality losses, and their model optimizes both preventive maintenance intervals and quality control policies.

However, many researchers now prefer CUSUM and other control charts over \(\overline{X}\) and EWMA charts. The CUSUM chart shows a prompt and accurate response in detecting shifts within one or two standard deviations (De Vargas et al. 2004). Shrivastava et al. (2016) developed an integrated model using a CUSUM control chart to jointly optimize maintenance and quality control policies; they considered minimal corrective maintenance and an imperfect preventive maintenance policy. Li et al. (2018) proposed a recursive model integrating a CUSUM control chart and an age-based preventive maintenance policy; in their model, the sampling policy is developed under non-Markovian deterioration assumptions. Adeoti and Olaomi (2018) developed an SPC model to monitor the process mean using a new process capability index control chart and found that it performs better in detecting small shifts than existing control charts. Researchers have also attempted to develop integrated models of quality control and maintenance using time-between-events (TBE) control charts and a preventive maintenance (PM) policy to minimize product reliability degradation (He et al. 2019; Chen et al. 2020). According to the findings of these articles, superior manufacturing process quality is the prime requirement for ensuring final product reliability. It is also found that the joint PM and TBE chart performs better than the conventional periodic maintenance and SPC approach in ensuring product reliability under the same economic constraint.

Even though many of the above-mentioned integrated models focused on integrating quality parameters and maintenance policies, there are still some gaps that have not been addressed by contemporary researchers. We have identified the following research gaps in the existing literature on this topic:

  1. Most of the studies implemented \(\overline{X}\) and EWMA control charts to observe the process mean in joint-optimization models, whereas the application of the CUSUM chart, which is effective in detecting small shifts, has been limited in the literature.

  2. To the best of our knowledge, no study jointly monitors the mean and variance of a process using CUSUM charts. However, in the event of a machine/equipment failure, the process variance can deviate beyond acceptable limits while the process mean remains within the control limits.

  3. Although the VSIFT sampling policy has been found more effective than other sampling techniques in designing different control charts, it has never been used in conjunction with a CUSUM chart to develop integrated economic models.

  4. In-control quality losses due to deviation from the target value have rarely been considered. However, integrating in-control quality losses with out-of-control quality costs would give top management a broader perspective on the overall costs.

Hence, this study aims to address these research gaps and provide industrial managers with an all-in-one solution for obtaining economic benefits in their organizations.

3 Model development

3.1 Problem statement and assumptions

An integrated economic model is presented using CUSUM charts to determine the optimal values of eight parameters, i.e., sample size (n), fixed sampling interval (h), sampling sub-interval (\(\eta\)), control limit coefficient of the CUSUM-m chart (k), warning limit coefficient of the CUSUM-m chart (w), control limit coefficient of the CUSUM-S2 chart (k1), warning limit coefficient of the CUSUM-S2 chart (w1), and preventive maintenance interval (tpm), so as to minimize the expected total cost per unit time of the integrated model. Here, the VSIFT sampling policy is considered for both the CUSUM-m and CUSUM-S2 charts. The joint ARL for the mean and variance is computed using the absorbing Markov chain approach. In-control and out-of-control costs are determined using the Taguchi quadratic loss function and a linear loss function for the CUSUM-m and CUSUM-S2 charts, respectively.

We consider a production system that comprises one machine and produces the same item at a constant production rate. Here, a single component of a machine is considered as the operating part and the time to failure of this component follows the two-parameter Weibull distribution. Two failure modes are considered for machine failures:

  (i) Failure mode 1 (FM1): causes breakdown of the machine.

  (ii) Failure mode 2 (FM2): causes deterioration in process and product quality due to partial failure of the machine or due to some external causes.

Therefore, any breakdown is attributed to FM1, while a machine that keeps running in a deteriorated condition without breaking down is affected by FM2. Similar classifications were also used by Lad and Kulkarni (2008), who defined machine tool failures as any event that either causes the breakdown of a machine or keeps the machine running while producing a high percentage of defective products. According to Garg et al. (2013), this second failure mode results in unplanned maintenance actions and incurs repair and replacement costs. Therefore, due consideration of these failures and their costs is essential for making suitable maintenance planning decisions.

We make the following assumptions when developing the model.

  • A single product is manufactured on the machine with a single critical to quality (CTQ) characteristic.

  • Minimal corrective maintenance and imperfect preventive maintenance are considered here. Therefore, after corrective maintenance, the equipment is of the same age as it was at the time of failure. In contrast, after preventive maintenance, the machine's life is extended, but the machine is not considered as good as new.

  • Failure modes FM1 and FM2 are independent of each other, and in case of failure, the probability of failure mode 1 (PFM1) + the probability of failure mode 2 (PFM2) = 1, since these two modes are the only failure types considered here. These probabilities can be obtained from the failure reports maintained by maintenance personnel on the production lines.

  • The process is jointly monitored by a VSIFT CUSUM-m and CUSUM-S2 control chart.

  • Detection of assignable causes and restoration of the machine are carried out by temporarily shutting down the system. The whole system resumes production once the machine returns to perfect operating condition.

3.2 Problem description

Different types of failure modes and their associated costs have been discussed in the existing studies. Since we consider two different failure modes, we need to determine the costs incurred due to the complete failure (FM1) or partial failure (FM2) of the machine. According to Pandey et al. (2012), if FM1 occurs, the machine is stopped and prompt corrective measures are performed to fix it. Thus, the expected cost of corrective maintenance E[CCM]FM1 includes repair/restoration and downtime costs. If FM2 occurs, the process deviates from its desired condition, which increases the probability of product rejection; thus, corrective actions are taken to rectify the process deviation. Process deviation or deterioration of process performance occurs not only due to FM2 but can also take place due to some external causes (E), such as environmental conditions, inefficient operators, use of the wrong tools, lack of awareness, etc. After detecting an external cause E, the process is reset to the controlled condition. The occurrence of FM2 or the presence of an external cause may be detected by observing the process. In an ideal scenario, it is assumed that after a maintenance action the component returns to its brand-new condition. However, this is seldom achievable in practical settings. Instead, it is more accurate to consider the age of the renewed component to be a certain percentage of its original age, which is captured by the ‘restoration factor (RF)’. An RF value of 1 indicates that the repaired component is as good as new, and a value of 0 means that no restoration takes place after the maintenance action. RF is an important concept for top management when establishing a proper inspection and maintenance plan, particularly when the budget does not allow replacing the defective component with a new one.

We consider a VSIFT CUSUM control chart mechanism. The expected total cost of process failure, E[TCQ]process-failure, owing to machine failure and external causes, is computed considering downtime, rejection, repair/resetting, sampling, and inspection costs. Investigation costs of false/valid alarms and the cost of deviation from the target value are also considered. Along with these corrective actions, preventive maintenance is also necessary to minimize unexpected downtime losses.

Imperfect preventive maintenance is considered as the preventive maintenance policy, implying that, after maintenance, the equipment will be in a condition somewhere between as-good-as-new and as-bad-as-old. Therefore, the failure frequency and the out-of-control quality cost associated with FM2 can be reduced through proper use of PM. Since PM also causes some machine downtime while preventive actions are executed, the expected cost of PM (E [CPM]) includes both the downtime cost and the cost of performing preventive maintenance.

3.3 The VSIFT CUSUM chart

In the VSIFT sampling scheme, samples are taken at every fixed time interval (denoted as h) as long as there is no indication of process deviation. However, additional samples are taken when the sample points remain within the control limits but are not close to the target value, indicating a possible deviation of the process from the expected level. The fixed time interval h is divided into η subintervals of length d, where d = h/η. For example, if h = 1 h and η = 5 (i.e., d = 12 min) and the sample point does not indicate any problem, then samples are collected at one-hour intervals; however, if there is any indication of a process shift, the next sample is taken 12 min later.

3.3.1 CUSUM chart for the mean (CUSUM- m)

The ith sample statistic of a CUSUM chart for the mean is:

$$C_{i}^{ + } = \max \left[ {C_{i - 1}^{ + } + x_{i} - \left( {\mu_{0} + s} \right),\;0} \right],$$
(1)
$$C_{i}^{ - } = \max \left[ {C_{i - 1}^{ - } + \left( {\mu_{0} - s} \right) - x_{i} ,\;0} \right],$$
(2)

where \(C_{i}^{+}\) and \(C_{i}^{-}\) indicate the upper and lower one-sided CUSUM statistics for the ith sample, respectively. µ0 is the process target value, xi is the observed value or the average of the observed values of the subgroups of a sample, and s is the allowable slack, given by \(s = \frac{{\delta *\sigma_{x} }}{2}\). Since the primary aim of the CUSUM chart is to monitor small shifts in the process, the slack is usually kept within 0.5–1 standard deviations, so δ can take any value between 1 and 2. However, the default value of δ is usually set to 1 to ensure detection of the minimum shift and capture all the assignable causes in the vicinity (Woodall and Faltin 2019). The upper (UCL) and lower (LCL) control limits for the CUSUM-m chart are:

$$UCL = \mu_{0} + k\sigma_{x} ,$$
(3)
$$LCL = \mu_{0} - k\sigma_{x} .$$
(4)

Here, k is the control limit coefficient of the CUSUM-m chart and \(\sigma_{x}\) is the standard deviation of the sample mean, given as

$$\sigma_{x} = \frac{{\sigma^{\prime}}}{{\sqrt n }},$$
(5)

where σ′ is the process standard deviation and n is the sample size. The upper (UWL) and lower (LWL) warning limits for the CUSUM-m chart are:

$$UWL = \mu_{0} + w\sigma_{x} ,$$
(6)
$$LWL = \mu_{0} - w\sigma_{x} .$$
(7)

Here, w is the warning limit coefficient of the CUSUM chart for the mean.
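
To make the charting mechanics above concrete, the following minimal Python sketch implements Eqs. (1)–(7). The function names and the numerical values are purely illustrative and are not part of the model's notation.

```python
import numpy as np

def cusum_m_limits(mu0, sigma_p, n, k, w):
    """Control and warning limits of the CUSUM-m chart (Eqs. 3-7).

    sigma_p is the process standard deviation (sigma'), so the standard
    deviation of the sample mean is sigma_p / sqrt(n) (Eq. 5).
    """
    sigma_x = sigma_p / np.sqrt(n)
    ucl, lcl = mu0 + k * sigma_x, mu0 - k * sigma_x   # Eqs. (3)-(4)
    uwl, lwl = mu0 + w * sigma_x, mu0 - w * sigma_x   # Eqs. (6)-(7)
    return ucl, lcl, uwl, lwl, sigma_x

def cusum_m_step(c_plus, c_minus, x_bar, mu0, slack):
    """One update of the upper/lower one-sided CUSUM statistics (Eqs. 1-2)."""
    c_plus = max(c_plus + x_bar - (mu0 + slack), 0.0)
    c_minus = max(c_minus + (mu0 - slack) - x_bar, 0.0)
    return c_plus, c_minus

# Illustrative values only: mu0 = 10, sigma' = 2, n = 5, delta = 1, k = 5, w = 1
ucl, lcl, uwl, lwl, sigma_x = cusum_m_limits(10.0, 2.0, 5, k=5.0, w=1.0)
slack = 1.0 * sigma_x / 2.0                  # s = delta * sigma_x / 2
c_plus = c_minus = 0.0
for x_bar in [10.2, 10.6, 10.9, 11.1]:       # hypothetical sample means
    c_plus, c_minus = cusum_m_step(c_plus, c_minus, x_bar, 10.0, slack)
print(c_plus, c_minus, (lwl, uwl), (lcl, ucl))
```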

3.3.2 CUSUM chart for variance (CUSUM-S2)

Castagliola et al. (2009) proposed a new type of CUSUM chart to monitor and control the variance of a process. The CUSUM-S2 chart is used in this paper to monitor the variance.

Let xk,1,...,xk,n be n independent random variables used as a sample in plotting a control chart to observe process dispersion, with process mean µ0, nominal process standard deviation σ′, and subgroup number k. Since this chart is employed to observe process dispersion, the “out-of-control” condition occurs when the process mean matches the target mean but the standard deviation shifts from the desired level. Let Sk2 be the sample variance of subgroup k, i.e.

$$S_{k}^{2 } = \frac{1}{n - 1}\mathop \sum \limits_{j = 1}^{n} (x_{kj} - \overline{x}_{k} )^{2} .$$
(8)

Here, \(\overline{x}_{k}\) is the sample mean of subgroup k. Following Castagliola et al. (2009), the sample variance is transformed as

$$T_{k} = a + bln\left( {S_{k}^{2} + c} \right),$$
(9)

where \(b = B\left( n \right),\)

$$c = C\left( n \right)\sigma^{2} ,$$
$$a = A\left( n \right) - 2B\left( n \right)\ln \left( \sigma \right),$$

The values of \(A\left( n \right), B\left( n \right), C\left( n \right)\) can be obtained from Castagliola et al. (2009). The CUSUM statistics for the \(k^{th}\) subgroup, used to monitor the process variance, are

$$Z_{k}^{ + } = \max \left[ {0,\;Z_{k - 1}^{ + } + T_{k} - \left( {E\left( {T_{k} } \right) + L} \right)} \right],$$
(10)
$$Z_{k}^{ - } = \max \left[ {0,\;Z_{k - 1}^{ - } + \left( {E\left( {T_{k} } \right) - L} \right) - T_{k} } \right],$$
(11)
$$Z_{k} = \max \left( {Z_{k}^{ - } ,\;Z_{k}^{ + } } \right).$$
(12)

Here, E(\(T_{k}\)) is the expected value of Tk, and L ≥ 0 is a constant denoting the allowable slack for the CUSUM-S2 chart. The upper control limit (UCL) and lower control limit (LCL) for the CUSUM-S2 chart are:

$$UCL = k_{1} \sigma_{x}$$
(13)
$$LCL = 0,$$
(14)

where k1 is the control limit coefficient of the CUSUM-S2 chart and

$$\sigma_{x} = \frac{{\sigma^{\prime}}}{{\sqrt n }} .$$
(15)

The upper warning limit (denoted by UWL) for the chart is,

$$UWL = w_{1} \sigma_{x} ,$$
(16)

where \({w}_{1}\) is the warning limit coefficient of the CUSUM-S2 chart.
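
As a companion to the CUSUM-m chart, the following Python fragment sketches Eqs. (8)–(16). The constants A(n), B(n), C(n) and the in-control expectation E(T_k) are taken as user-supplied inputs, since their tabulated values from Castagliola et al. (2009) are not reproduced here; all function and variable names are illustrative.

```python
import numpy as np

def cusum_s2_step(z_plus, z_minus, sample, sigma_p, A_n, B_n, C_n, L, e_tk):
    """One update of the CUSUM-S2 statistics (Eqs. 8-12).

    sigma_p is the nominal process standard deviation (sigma'); A_n, B_n,
    C_n are the constants of Castagliola et al. (2009) for sample size n;
    e_tk is the in-control expected value E(T_k).  All are supplied by the user.
    """
    s2 = np.var(sample, ddof=1)                      # sample variance, Eq. (8)
    a = A_n - 2.0 * B_n * np.log(sigma_p)
    b = B_n
    c = C_n * sigma_p ** 2
    t_k = a + b * np.log(s2 + c)                     # transformed statistic, Eq. (9)
    z_plus = max(0.0, z_plus + t_k - (e_tk + L))     # Eq. (10)
    z_minus = max(0.0, z_minus + (e_tk - L) - t_k)   # Eq. (11)
    return z_plus, z_minus, max(z_minus, z_plus)     # Eq. (12)

def cusum_s2_limits(sigma_p, n, k1, w1):
    """UCL, LCL and UWL of the CUSUM-S2 chart (Eqs. 13-16)."""
    sigma_x = sigma_p / np.sqrt(n)                   # Eq. (15)
    return k1 * sigma_x, 0.0, w1 * sigma_x
```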

Thus, under the VSIFT sampling policy, if the sample point falls in the in-control region of both the mean and variance charts (i.e., LWL ≤ xi ≤ UWL), the next sample is taken after the fixed sampling interval h. If the last sample point falls in the warning region of at least one chart, i.e., CUSUM-m or CUSUM-S2 (UWL < xi ≤ UCL or LCL ≤ xi < LWL for the CUSUM-m chart, or the statistic exceeds the UWL of the CUSUM-S2 chart), the next sample is taken after the sampling sub-interval d. If any sample point falls outside the control limits, managers search for the assignable causes of the problem in order to rectify it.
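
The sampling rule just described can be expressed compactly as in the sketch below. The flag names are hypothetical and merely indicate in which region (safe, warning, or out-of-control) each chart's last point fell.

```python
def next_sampling_time(t_now, h, eta, mean_in_warning, var_in_warning,
                       mean_out_of_control, var_out_of_control):
    """Next sampling epoch under the VSIFT policy (h fixed interval, d = h / eta)."""
    if mean_out_of_control or var_out_of_control:
        return None                      # stop sampling; search for assignable causes
    d = h / eta
    if mean_in_warning or var_in_warning:
        return t_now + d                 # tighten sampling to the sub-interval d
    return t_now + h                     # both charts in the safe region

# Example: h = 1 hour, eta = 5, so d = 12 minutes
print(next_sampling_time(0.0, 1.0, 5,
                         mean_in_warning=True, var_in_warning=False,
                         mean_out_of_control=False, var_out_of_control=False))
# -> 0.2 (hours), i.e. the next sample is taken 12 minutes later
```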

3.4 Mathematical model

3.4.1 Joint ARL computation for mean and variance

The average run length (ARL) measures the expected number of consecutive samples that remain in the in-control region before the sample statistic falls outside the control limits. The value of the ARL depends on whether the process is in control or out of control. When multiple charts are used for process monitoring, the search for an assignable cause commences if any one of the charts shows an out-of-control signal. Therefore, when multiple charts are used simultaneously, the joint ARL of the control charts is more meaningful than the ARLs of the individual charts. The joint ARL of the mean and variance for the VSIFT CUSUM scheme can be computed using the absorbing Markov chain approach.

According to Morais and Pacheco (2000), to determine the ARL of a combined scheme, the run-length distributions of the mean and variance charts must first be obtained; from these, the complementary cumulative distribution functions can be computed. The joint ARL is:

$$ARL_{j} = \left[ {\sum\limits_{s = 0}^{\infty } {\overline{F}_{{RL_{\mu }^{i} }} \left( s \right)\,\overline{F}_{{RL_{{S^{2} }}^{j} }} \left( s \right)} } \right]_{\substack{i = 1, \ldots ,u + 1 \\ j = 1, \ldots ,v + 1}}$$
(17)
$$\overline{F}_{{RL_{\mu }^{i} }} \left( s \right) = \begin{cases} 1, & s < 1, \\ e_{\mu ,i}^{T} \left[ {Q_{\mu } } \right]^{s} {\mathbf{1}}_{\mu } , & s \ge 1, \end{cases}$$
(18)
$$\overline{F}_{{RL_{{S^{2} }}^{j} }} \left( s \right) = \begin{cases} 1, & s < 1, \\ e_{{S^{2} ,j}}^{T} \left[ {Q_{{S^{2} }} } \right]^{s} {\mathbf{1}}_{{S^{2} }} , & s \ge 1, \end{cases}$$
(19)

where \(\overline{F}_{{RL_{\mu }^{i} }} \left( s \right)\) and \(\overline{F}_{{RL_{{S^{2} }}^{j} }} \left( s \right)\) are the probabilities that the run lengths of the CUSUM-m and CUSUM-S2 charts are greater than or equal to s, respectively. Here, u + 1 and v + 1 are the numbers of in-control states of the CUSUM-m and CUSUM-S2 charts, respectively. \(e_{\mu ,i}^{T}\) and \(e_{{S^{2} ,j}}^{T}\) are the transposes of the corresponding vectors of the orthonormal bases of Ru+1 and Rv+1, respectively. \(Q_{\mu }\) and \(Q_{{S^{2} }}\) are the [u × u] and [v × v] matrices of initial transition probabilities for the CUSUM-m and CUSUM-S2 charts, respectively, and 1µ and 1S2 are column vectors of ones.
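
Equation (17) can be evaluated numerically by truncating the infinite sum once its terms become negligible, as in the sketch below. The two transition matrices and the starting states are placeholders rather than the matrices of Appendix 1.

```python
import numpy as np

def joint_arl(Q_mu, Q_s2, i=0, j=0, tol=1e-12, max_s=100000):
    """Approximate the joint ARL of Eq. (17).

    Q_mu and Q_s2 are the transient (in-control) transition matrices of the
    CUSUM-m and CUSUM-S2 Markov chains; i and j index the starting states.
    F_bar(s) = e^T Q^s 1 for s >= 1 and equals 1 for s = 0.
    """
    ones_mu, ones_s2 = np.ones(Q_mu.shape[0]), np.ones(Q_s2.shape[0])
    p_mu = np.zeros(Q_mu.shape[0]); p_mu[i] = 1.0     # e_{mu,i}^T
    p_s2 = np.zeros(Q_s2.shape[0]); p_s2[j] = 1.0     # e_{S2,j}^T
    arl, term = 0.0, 1.0                              # the s = 0 term equals 1
    for _ in range(max_s):
        arl += term
        p_mu, p_s2 = p_mu @ Q_mu, p_s2 @ Q_s2         # advance Q^s -> Q^(s+1)
        term = (p_mu @ ones_mu) * (p_s2 @ ones_s2)    # F_bar_mu(s) * F_bar_S2(s)
        if term < tol:
            break
    return arl

# Toy 2-state matrices (hypothetical, substochastic so the sum converges)
Q_mu = np.array([[0.90, 0.05], [0.10, 0.80]])
Q_s2 = np.array([[0.92, 0.04], [0.08, 0.85]])
print(joint_arl(Q_mu, Q_s2))
```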

3.4.2 Development of cost function

The cost function is developed following the model of Lorenzen and Vance (1986); the details of the cycle length and the different cost functions are given in Appendices 2 and 3, respectively.

The expected cycle length is,

$$E\left( {T_{cycle} } \right) = E\left( {T_{1} } \right) + E\left( {T_{2} } \right) + E\left( {T_{3} } \right) + E\left( {T_{4} } \right).$$
(20)

Here, \(E\left( {T_{1} } \right)\) = the expected in-control process time;

\(E\left( {T_{2} } \right)\) = the expected out-of-control time before declaring the process is out of control;

\(E\left( {T_{3} } \right)\) = the expected sampling time; and

\(E\left( {T_{4} } \right)\) = the expected time interval for searching for and correcting the assignable cause.

The expected cost of minimal corrective maintenance owing to FM1 is given as:

$$E\left[ {C_{CM} } \right]_{{FM_{1} }} = \left\{ {MTTR_{CM} \left[ {PR.C_{lp} + LC} \right] + C_{FCPCM} } \right\}*P_{{FM_{1} }} *N_{f} .$$
(21)

\(MTTR_{CM} \left[ {PR.C_{lp} + LC} \right]\) denotes the downtime cost owing to corrective maintenance.

The expected total cost of preventive maintenance action of the component will be

$$E\left[ {C_{PM} } \right] = \left\{ {MTTR_{PM} \left[ {PR.C_{lp} + LC} \right] + C_{FCPPM} } \right\}*\frac{{T_{eval} }}{{t_{PM} }} .$$
(22)

Here, \(MTTR_{PM} \left[ {PR.C_{lp} + LC} \right]\) is the downtime cost owing to preventive maintenance and \(C_{FCPPM}\) is the cost of performing preventive maintenance.

The expected cost of process failure per cycle is denoted as:

$$\begin{aligned} E\left[ {C_{process} } \right] = & \;E\left[ {C_{false} } \right] + E\left[ {{\text{Cost of Sampling}}} \right] \\ & + E\left[ {C_{resetting} } \right] + E\left[ {\left( {C_{repair} } \right)_{{FM_{2} }} } \right] \\ & + E\left[ {L_{{in{\text{-}}control}} } \right] + E\left[ {\left( {{\text{Cost of }}L_{{out{\text{-}}of{\text{-}}control}} } \right)_{m/c} } \right] \\ & + E\left[ {\left( {{\text{Cost of }}L_{{out{\text{-}}of{\text{-}}control}} } \right)_{E} } \right]. \end{aligned}$$
(23)

The process failure is considered a cyclic phenomenon: whenever the process goes into an out-of-control state, the problem is rectified and the process is restored to the in-control state. Therefore, a cycle is realized from the occurrence of an assignable cause to the return to the normal condition; this is known as a process failure cycle. If there are M process failure cycles in a given period, then the expected cost of quality control due to process failure for that period is:

$$E\left[ {TCQ} \right]_{process - failure} = E\left[ {C_{process} } \right]M,$$
(24)

where

$$M = \frac{{T_{eval} }}{{E\left[ {T_{cycle} } \right]}} .$$
(25)
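
Under the stated assumptions, the cost structure of Eqs. (21), (22), (24) and (25) can be assembled as in the following sketch. The parameter names mirror the notation above, and the per-cycle process-failure cost of Eq. (23) is assumed to be computed elsewhere and passed in.

```python
def expected_cm_cost(mttr_cm, pr, c_lp, lc, c_fcpcm, p_fm1, n_f):
    """Expected corrective-maintenance cost owing to FM1 (Eq. 21)."""
    downtime_cost = mttr_cm * (pr * c_lp + lc)
    return (downtime_cost + c_fcpcm) * p_fm1 * n_f

def expected_pm_cost(mttr_pm, pr, c_lp, lc, c_fcppm, t_eval, t_pm):
    """Expected preventive-maintenance cost over the evaluation period (Eq. 22)."""
    downtime_cost = mttr_pm * (pr * c_lp + lc)
    return (downtime_cost + c_fcppm) * (t_eval / t_pm)

def expected_process_failure_cost(c_process_per_cycle, t_eval, e_t_cycle):
    """Expected quality cost of process failures, Eqs. (24)-(25)."""
    m = t_eval / e_t_cycle           # expected number of process-failure cycles
    return c_process_per_cycle * m
```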

3.4.3 Optimization model

The main objective of this model is to minimize the expected total cost per unit time (ETCPUT) of the system. Therefore, the desired objective function for this model is,

$$Minimize\;\left[ {ETCPUT} \right] = \frac{{E\left[ {C_{CM} } \right]_{{FM_{1} }} + E\left[ {C_{PM} } \right] + E\left[ {TCQ} \right]_{process - failure} }}{{T_{eval} }} ,$$
(26)

where [ETCPUT] = f (n, h, \(\eta\), k, w, \(t_{pm}\), \(k_{1}\), \(w_{1}\)) and \(T_{eval}\) is the time taken to complete the observation (the evaluation period).
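
Although the model itself is implemented in Matlab (Sect. 4.1), the eight-variable search can be sketched in Python with SciPy's Nelder-Mead routine as shown below. The cost evaluator is a stand-in surrogate rather than the actual ETCPUT of Eq. (26), and the starting point is illustrative; in a full implementation the integer variables n and η would be rounded or handled by a discrete search.

```python
import numpy as np
from scipy.optimize import minimize

def etcput(x):
    """Placeholder for the expected total cost per unit time (Eq. 26).

    x = [n, h, eta, k, w, k1, w1, t_pm].  A real evaluator would compute the
    joint ARL (Eq. 17) and the cost terms of Eqs. (21)-(25); a smooth
    surrogate is used here only to keep the sketch runnable.
    """
    n, h, eta, k, w, k1, w1, t_pm = x
    return ((n - 5) ** 2 + (h - 1) ** 2 + (eta - 6) ** 2 + (k - 5) ** 2
            + (w - 1) ** 2 + (k1 - 4) ** 2 + (w1 - 1.5) ** 2
            + 0.01 * (t_pm - 120) ** 2)

x0 = np.array([4.0, 0.8, 5.0, 4.0, 0.8, 3.5, 1.2, 100.0])   # illustrative start
res = minimize(etcput, x0, method="Nelder-Mead",
               options={"xatol": 1e-4, "fatol": 1e-4, "maxiter": 5000})
print(res.x, res.fun)
```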

4 Results and discussions

4.1 Numerical illustrations

In this section, we present a numerical illustration to test the effectiveness of the model.

A component of a machine is considered here as the only operating part, and it is expected to operate six days a week, with three shifts of 7 h per day. The required time for preventive maintenance is taken as TPM = 7 time units with a restoration factor of RFPM = 0.6, which means that after a preventive maintenance action the component recovers 60% of its life, i.e., its effective age is reduced to 40% of its age before PM. The required time for corrective maintenance is taken as TCM = 12 time units with a restoration factor of RFCM = 0; that is, the component does not gain any extra life and remains of the same age as when it failed. This is called minimal corrective maintenance. In this model, the time to failure of the component is obtained through the simulation used by Pandey et al. (2012).
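
The failure-time and restoration behaviour described above can be mimicked with the short sketch below. The Weibull shape and scale values are hypothetical, and the age-update rule simply encodes the stated interpretation of RFPM = 0.6 and RFCM = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def weibull_ttf(shape, scale, size=1):
    """Draw time-to-failure samples from a two-parameter Weibull distribution."""
    return scale * rng.weibull(shape, size)

def age_after_maintenance(age_before, rf):
    """Effective age after an imperfect maintenance action.

    rf = 1 restores the component to as-good-as-new (effective age 0);
    rf = 0 leaves the age unchanged (minimal repair).
    """
    return (1.0 - rf) * age_before

# Hypothetical Weibull parameters: shape = 2.0, scale = 500 time units
print(weibull_ttf(2.0, 500.0, 3))
print(age_after_maintenance(200.0, 0.6))   # PM: effective age 200 -> 80
print(age_after_maintenance(200.0, 0.0))   # minimal CM: age unchanged at 200
```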

An example is also provided in this section to compute the joint ARL and to analyze the proposed integrated model. In this example, the process is considered to be running in an in-control state with mean µ0 and standard deviation σ′. Here, the deviation of the process mean from the target value owing to external causes is denoted by \(\delta_{E}\) and that owing to machine failure by \(\delta_{m/c}\). Similarly, the deviation of the process variance from the target value owing to external causes is denoted by \(\delta 1_{E}\) and that owing to machine failure by \(\delta 1_{m/c}\); these deviations occur at random. The initial transition probability matrices for the CUSUM-m and CUSUM-S2 charts are given in Appendix 1. The initial values of the parameters used for the integrated model are given in Table 1.

Table 1 Initial values of parameters

After setting the parameter values, we solve the objective function stated in the previous section. The proposed model is formulated in Matlab and optimized using the Nelder-Mead downhill simplex algorithm and the genetic algorithm (Gen and Cheng 1999).

Table 2 gives the optimization results obtained using the Nelder-Mead downhill simplex method. It is evident from Table 2 that every observation shows similar results, not only for the optimum cost but also for the values of the eight decision variables. Three different categories of parameters need to be set for the Nelder-Mead simplex method, as shown in Appendix 4. The parameters are set so as to achieve the maximum percentage of successful minimizations of our problem.

Table 2 Optimization using the Nelder-Mead downhill simplex method

Table 3 gives the results obtained using the genetic algorithm. The stall generation (G) and mutation rate (m) have been varied to provide more insight into the results, which are found to be similar to those of the former method. The parameter settings for the genetic algorithm are shown in Appendix 4. The genetic algorithm mainly consists of two phases, intensification and diversification, which are characterized by appropriate mutation and crossover operators (Fathollahi-Fard et al. 2018). We selected the parameters by trial and error to obtain the best performance of the operators.

Table 3 Optimization data using a genetic algorithm

Since the results of these two methods resemble each other, it can be concluded that the solutions obtained from the model attained global optimum values.

4.2 Comparative analysis of the research with existing literature

The final model of this study shows some variations from the existing literature. Most studies focusing on the joint integration of quality control and maintenance policy prefer the EWMA, \(\overline{X }\), or Shewhart p-chart to detect process shifts (Charongrattanasakul and Pongpullponsak 2011; Bahria et al. 2020). However, we have implemented the CUSUM chart and found that, for faster detection of the smallest shifts, it is much more effective than the other conventional quality charts. Moreover, we have found that the CUSUM chart with the VSIFT sampling policy can successfully track a process that keeps running with a partial failure, unlike the fixed sampling policy suggested by some researchers (Shrivastava et al. 2016; Li et al. 2018). Furthermore, most studies assume that only shifts of two or three process standard deviations constitute significant process variation and concentrate on the process mean to detect assignable causes (X. Yang and Zeng 2018; Bahria et al. 2020). In contrast, we consider that even a shift of one standard deviation can be significant for industry and focus on both the mean and the variance simultaneously. Consequently, our model can detect assignable causes within one standard deviation of the process and thus decreases the rate of premature equipment failure.

5 Sensitivity analysis

A sensitivity analysis is done based on the example mentioned in Sect. 4 to analyze the effects of different design parameters on the total cost. The variation of the final results with the change of sensitive parameters is displayed in Table 4. In Table 4, the basic level shows the dataset utilized to solve the example as illustrated in Sect. 4.1. Levels 1 and 2 mentioned in Table 4 represent the values corresponding to these parameters with a 10% decrease and a 10% increase from the basic level, respectively. The optimum objective function values for these three different levels are shown in three different columns in Table 4. The final column of Table 4 indicates how sensitive the optimum value is to the changes in the values of these parameters.
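
The ±10% one-at-a-time perturbation just described can be sketched as below; `optimize_model` is a placeholder for re-solving the model of Sect. 3.4 at each level, and the dummy evaluator in the usage lines exists only to make the fragment runnable.

```python
def sensitivity_table(base_params, optimize_model, names, step=0.10):
    """Re-optimize the model with each parameter at -10%, base, and +10% levels."""
    rows = []
    for name in names:
        costs = []
        for factor in (1.0 - step, 1.0, 1.0 + step):
            params = dict(base_params)              # one-at-a-time perturbation
            params[name] = base_params[name] * factor
            costs.append(optimize_model(params))    # optimum ETCPUT at this level
        rows.append((name, *costs))
    return rows

# Toy usage with a dummy cost evaluator (illustration only)
base = {"delta_E": 1.0, "delta1_E": 1.0, "lambda_1": 0.05, "A": 50.0}
dummy_cost = lambda p: 100 + 20 * p["delta_E"] + 5 * p["A"] / 50
for row in sensitivity_table(base, dummy_cost, list(base)):
    print(row)
```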

Table 4 Dataset and the results of the sensitivity analysis

It is evident from Table 4 that only four parameters (\(\delta_{E}\), \(\delta 1_{E}\), \({\uplambda }_{1}\) and \(A\)) contribute substantially to changes in the optimum cost, whereas the remaining parameters have little effect, which emphasizes the robustness of the proposed model. With changes in the magnitude of these four parameters, however, the optimum cost of the proposed model changes significantly, i.e., the model is highly sensitive to these four parameters.

To analyze the effect of these four parameters more precisely, an experiment is performed using a ½ fraction factorial analysis in Minitab, and the findings are illustrated in Table 5 and Fig. 1. From Table 5 and Fig. 1, it can be observed that the shift in the CUSUM-m chart due to external causes (\(\delta_{E}\)) is the most significant parameter; although \(A\) and the interaction \(\delta_{E} *A\) also contribute to the change in the optimum cost, their contributions are much smaller than that of \(\delta_{E}\). It can be concluded that the proposed model is highly sensitive to changes in \(\delta_{E}\), so the production process should be designed to restrict \(\delta_{E}\) as much as possible in order to minimize the overall cost.

Table 5 Results from ½ fraction factorial analysis
Fig. 1
figure 1

Pareto chart of the standardized effects in ½ fraction factorial analysis

To observe the impact of the variables on the total cost, an analysis of some significant decision variables was conducted. In total, 16 data points were used for each of the independent variables keeping all other variables constant. In this dataset, eight points were taken following an increasing trend, and another eight points were taken following a decreasing trend of the respective variables to better understand the effect of the variable change on the total cost.

It is evident from Fig. 2 that the cost increases with an increase in sample size and decreases with a decrease in sample size. This is expected, since a larger sample size increases the sampling and inspection costs, and vice versa.

Fig. 2
figure 2

Relationship between sample size variability and cost with (a) increasing n, (b) decreasing n

As shown in Fig. 3, with the increase in the control limit coefficient (k), the cost decreases rapidly, but at a value of 6 or higher the rate of cost reduction levels off. On the other hand, with a decrease in k, the cost increases rapidly at first before slowing down. One possible reason is that a higher control limit coefficient reduces the probability of rejection: since the standard deviation is fixed for this problem, increasing k widens the area between the two control limits, which, in turn, increases the probability that poorer products are accepted. Thus, the rejection, repair, and out-of-control costs decrease.

Fig. 3
figure 3

The relationship between the control limit coefficient of CUSUM-m chart variability and cost with (a) increasing k, (b) decreasing k

On the contrary, with the decrease of k, the area between the control limits shrinks, which increases the probability of rejection and demands a high-precision production system with little margin for error. Thus, the rejection cost and out-of-control cost increase. However, as k increases, the in-control cost also increases owing to the application of the TLF. That is why the cost rises much faster when k decreases than it falls when k increases.

The change in total cost with respect to the number of sampling sub-intervals (\(\eta\)) is shown in Fig. 4. For values of \(\eta\) between 5 and 16, the cost remains almost unchanged; above this range the cost increases, and for values below 5 it increases rapidly. When the number of sub-intervals becomes high, the sampling cost increases, and when \(\eta\) is very low, the out-of-control ARL increases, which, in turn, increases the probability of repair and rejection as well as the out-of-control costs.

Fig. 4
figure 4

Relationship between the number of subintervals between two consecutive sampling times’ variability and cost with (a) increasing \(\eta\), (b) decreasing \(\eta\)

Figure 5 shows that the cost decreases with an increase in \(t_{pm}\) and increases with a decrease in \(t_{pm}\), and the relationship is linear. This occurs because increasing \(t_{pm}\) reduces the preventive maintenance cost. However, the change in cost is very small, so a small change in \(t_{pm}\) from its optimum value has little effect on the cost.

Fig. 5
figure 5

Relationship between preventive maintenance interval variability and cost with (a) increasing tpm, (b) decreasing tpm

Figure 6 shows that, with the increase of \(k_{1}\), the cost remains almost unchanged, whereas with the decrease of \(k_{1}\), the cost remains constant at first, starts increasing below 3.75, and continues to increase as \(k_{1}\) decreases further. At very small values of \(k_{1}\), the margin of error for the sample variance becomes very small, which, in turn, increases the probability of rejection and, thus, the rejection cost.

Fig. 6
figure 6

The relationship between control limit coefficient of CUSUM-S2 chart variability and cost with (a) increasing k1, (b) decreasing k1

6 Conclusions

Combining SPC and maintenance management policies in a production system can lead to considerable economic gains. To address this, an integrated economic model of quality control and maintenance policy was developed. A CUSUM chart was employed to examine the process mean and variance because of its effectiveness in detecting small shifts and facilitating faster detection. Moreover, deviation from the desired mean or variance deteriorates product quality and incurs costs; therefore, calculating the joint ARL by integrating both mean and variance is justified. With the use of the VSIFT sampling policy, substantially faster detection of process shifts became possible, and the probability of running the manufacturing process in an out-of-control state was decreased. Moreover, incorporating the TLF and a modified version of Kapoor and Wang's linear loss function (C. H. Chen and Chou 2005) in the model helped to minimize the in-control and out-of-control costs for both the mean and the variance, performing better than conventional approaches. Industrial managers and practitioners may benefit from the CUSUM-m and CUSUM-S2 charts prescribed in this research by minimizing the joint costs of quality and maintenance actions when the overall system is sensitive to small shifts from the desired value.

6.1 Research implications

The final model developed in this research has several implications for manufacturing industries, textile mills, and other industries that regularly deal with machines and related equipment. Industrial managers can use it to monitor and control product dimensions and different process parameters to ensure the desired quality. Some implications of this research are discussed below.

  • Process failure cost owing to external factors and equipment failure has a significant contribution to the total cost. So, the top management should inspect and regulate the machine conditions as needed and take maintenance actions accordingly. For example, in the textile industry, dyeing machines can be inspected during operations to control different parameters, such as squeezing pressure, mangle pressure, dye bath temperatures, etc.

  • Overestimation or underestimation of sampling intervals may drastically increase the total costs. So, the top management should have the ability to analyze the sample data and choose a suitable sampling technique accordingly.

  • Frequent preventive maintenance activities may be used as a tracking tool for transport vehicles and other devices to prevent transportation failures. Here, tpm can be used as the required test parameter to monitor delivery chains and control delivery times for different destinations. However, instead of using univariate CUSUM in this model, incorporating multivariate CUSUM charts would be more effective.

6.2 Limitations and future research direction

In this study, a single unit is considered to be manufactured with a single quality characteristic at a time. However, the production system of the current decade is complex and it often involves multi-unit batch production. Even a single unit consists of various quality attributes and so suitable testing and inspection methods need to be included for each quality trait to make the research more robust and extensive.

In the future, one possible extension of this research is to incorporate the multivariate CUSUM chart and the multivariate EWMA chart, or the combined use of CUSUM and EWMA charts, to monitor both the mean and the variation when addressing quality control and preventive maintenance problems. Besides, the variable sampling rate (VSR) policy can be an effective alternative to VSIFT, as VSR allows both the sample size and the sampling interval to vary based on the previous values of the control statistics. Moreover, using a non-normal quality characteristic distribution, especially the Johnson distribution for quality variables (Pasha, Moghadam et al. 2018a, b), in conjunction with our proposed model could be a compelling subject for further research.