1 Introduction

In statistical quality control, the term profile refers to situations in which the quality of a process or product is represented by a functional relationship between a response and one or more predictors. When the response variable follows a distribution that belongs to the exponential family, the profile is called a generalized linear profile (GLP).

Because profiles are similar to regression models, monitoring methods are typically based on regression modelling. Although the maximum likelihood (ML) approach is the traditional way of estimating parameters, alternative estimation methods have been discussed in the literature due to the problems arising from multicollinearity among the predictors. Some of these alternative estimation methods have been integrated into profile monitoring.

A common way of monitoring GLPs with a count response is to apply Poisson profile monitoring methods. Skinner et al. (2003) proposed a technique based on the likelihood ratio statistic and used the deviance residuals from Poisson regression for monitoring purposes. Amiri et al. (2011) examined Hotelling’s \(T^2\) approach based on estimated model parameters. Asgari et al. (2014) developed a procedure involving a mixture of log and square-root link functions for modelling processes with a count response and monitored the process via proposed Shewhart and exponentially weighted moving average (EWMA) charts based on standardized residuals. Qi et al. (2016) suggested a control chart based on weighted likelihood ratio statistics for monitoring GLPs. Later, Marcondes Filho and Sant’Anna (2016) proposed a Shewhart-type residual control chart based on principal component (PC) scores for Poisson processes. The effect of the parameter estimator in Poisson profile monitoring was investigated by Maleki et al. (2019). Wen et al. (2021) proposed a regression-adjusted EWMA chart that adjusts and updates the expected values according to the situation to monitor the Poisson process. Mammadova and Özkale (2021a) studied the impact of the tuning parameter on ridge deviance-based control charts, and Iqbal et al. (2022) presented homogeneously weighted moving average control charts where the monitored observations are either deviance or standardized residuals of the generalized linear model (GLM).

The aforementioned approaches have been extended to monitoring COM-Poisson profiles since the Poisson distribution becomes unsuitable when the data set shows signs of over- or underdispersion. The flexible two-parameter COM-Poisson distribution was proposed by Conway and Maxwell (1962) to overcome the challenges caused by the difference between the mean and variance of a count data set. Park et al. (2018) adapted the principal components regression (PCR) approach for monitoring Poisson processes and constructed an r-chart for COM-Poisson profile monitoring. Park et al. (2020) examined COM-Poisson regression-based control charts and utilized randomized residuals to build a Shewhart chart. Rao et al. (2020) studied a mixed EWMA and cumulative sum (CUSUM) chart for COM-Poisson profiles, while Shewhart, EWMA, and CUSUM charts on the basis of ridge deviance residuals were developed by Mammadova and Özkale (2021b) for monitoring Poisson as well as COM-Poisson profiles. Jamal et al. (2021) monitored a real-time highway safety surveillance data set with CUSUM and EWMA charts and used randomized quantile and deviance residuals of the COM-Poisson regression model for the monitoring.

To address the multicollinearity problem, options other than the ridge estimator have been developed for GLMs. The iterative PCR estimator and the first-order approximated Liu estimator were proposed for GLMs by Marx and Smith (1990) and Kurtoğlu and Özkale (2016), respectively. Özkale (2019) studied a combination of the Liu and PCR estimators, the r–d class estimator, to minimize the effect of multicollinearity. Abbasi and Özkale (2021) developed the iterative r–k class estimator for GLMs by combining the ridge and PCR estimators. These authors showed the superiority of the proposed approaches over the ML and ridge estimators in terms of the mean squared error criterion through simulation studies. Apart from the mentioned studies, several studies are specifically devoted to the examination of the COM-Poisson distribution in the framework of GLMs (see Guikema and Goffelt (2008); Lord et al. (2008); Sellers and Shmueli (2010); Francis et al. (2012)). A reformulation of the distribution was suggested by Guikema and Goffelt (2008) to overcome computational limitations in modelling data sets. Characteristics of the COM-Poisson regression model, its estimation, diagnostics, and interpretation were discussed by Guikema and Goffelt (2008) and Sellers and Shmueli (2010); Guikema and Goffelt (2008) utilized the Bayesian technique for parameter estimation, whereas Sellers and Shmueli (2010) and Francis et al. (2012) used unconstrained optimization. Abdella et al. (2019) introduced a penalized likelihood technique in the form of ridge estimation, Mammadova and Özkale (2021b) provided an iterative closed-form solution to the ridge estimator given by Abdella et al. (2019), and Sami et al. (2022b) proposed a COM-Poisson ridge regression estimator, which is a ridge estimator obtained at the final iteration of the ML estimator. Recently, Sami et al. (2022a) proposed a modified one-parameter Liu estimator, and Rasheed et al. (2022) developed a modified jackknifed Liu-type estimator for COM-Poisson regression.

In this paper, we propose an extension of residual-based CUSUM and EWMA charts for identifying abnormalities in the COM-Poisson profile mean by using the PCR and r–k class estimators. We intend to address the multicollinearity problem and streamline the monitoring process by reducing the dimension of the data set while detecting out-of-control observations as quickly as possible by utilizing CUSUM and EWMA charts based on the PCR and r–k class estimators. Compared with the control charts previously proposed in the literature, our contributions can be summarized as follows:

  • CUSUM and EWMA control charts, which are used to detect small changes in the process, become more effective when combined with the r–k class estimator, an estimator that performs well under multicollinearity.

  • The CUSUM and EWMA control charts based on the r–k class estimator provide a general framework of CUSUM and EWMA control charts based on ridge and PCR estimators.

  • The CUSUM and EWMA control charts based on the r–k class estimator outperform the CUSUM and EWMA control charts based on the ML estimator in the presence of multicollinearity.

The following is the outline for this paper: Sect. 2 covers a brief description of the COM-Poisson distribution and estimation methods for the COM-Poisson regression model in the case of multicollinearity. Construction of the deviance-based CUSUM and EWMA charts for monitoring GLP with correlated predictors and dispersed count response is presented in Sect. 3. Section 4 provides a performance analysis of the proposed method through a simulation study. Section 5 delivers a real-life application that is carried out in the example of the SECOM data set. The concluding remarks are given in Sect. 6.

2 COM-Poisson modelling

2.1 COM-Poisson distribution

Introduced in the early 1960s, the COM-Poisson distribution has attracted increasing attention from researchers in the recent past due to its flexibility. Characterizations of the distribution have been thoroughly investigated by Boatwright et al. (2003), Shmueli et al. (2005), and Li et al. (2020).

The probability mass function of the COM-Poisson distribution is defined as

$$\begin{aligned} f(y_i)=\frac{\mu _i^{y_i}}{(y_i!)^v}\frac{1}{Z(\mu _i,v)},\quad y_i=0,1,2,\ldots ,\quad i=1,2,\ldots ,n \end{aligned}$$

where \(\mu _i\ge 0\) is the centering parameter, \(v \ge 0\) is the shape (dispersion) parameter, and \(Z(\mu _i,v)=\sum _{s=0}^{\infty }{\frac{\mu _i^s}{(s!)^v}}\) is the normalizing constant. Overdispersion in the data set is represented by \(v<1\), equidispersion by \(v=1\), and underdispersion by \(v>1\).
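As a minimal illustration (not part of the original derivation), the pmf can be evaluated in R by truncating the infinite sum in \(Z(\mu _i,v)\); the truncation bound `s_max` is an assumption made here for numerical purposes.

```r
# Normalizing constant Z(mu, v) via a truncated sum on the log scale;
# s_max is an illustrative truncation bound, not taken from the paper
com_poisson_logZ <- function(mu, v, s_max = 500) {
  s <- 0:s_max
  log_terms <- s * log(mu) - v * lgamma(s + 1)   # log of mu^s / (s!)^v
  m <- max(log_terms)
  m + log(sum(exp(log_terms - m)))
}

# COM-Poisson pmf: f(y) = mu^y / (y!)^v / Z(mu, v)
com_poisson_pmf <- function(y, mu, v) {
  exp(y * log(mu) - v * lgamma(y + 1) - com_poisson_logZ(mu, v))
}

# Overdispersed example (v < 1): probabilities for y = 0,...,5
sapply(0:5, com_poisson_pmf, mu = 2, v = 0.75)
```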

Depending on the values of the centering and shape parameters, the COM-Poisson distribution reduces to three well-known distributions as special or limiting cases. These are

  • Geometric distribution: \(v=0\) and \(\mu _i<1\);

  • Poisson distribution: \(v=1\);

  • Bernoulli distribution: \(v\rightarrow \infty \).

When \(v=0\) and \(\mu _i\ge 1\), the normalization parameter \(Z(\mu _i,v)\) does not converge and the distribution is undefined (Shmueli et al. 2005).

2.2 Parameter estimation in COM-Poisson regression

Let \(y_{n \times 1}=[y_1,y_2, \dots , y_n]'\) be the response vector with COM-Poisson distribution and \(X_{n \times p}=[x_1, x_2, \dots , x_n]'\) be the predictor matrix with \(x_i'=[x_{i1}, x_{i2}, \dots , x_{ip}]\), \(i=1,2, \dots , n\), being the i-th observation. The log-link function can be applied for modelling the relationship between y and X as \(\text {log}(\mu )=X\beta \), where \(\beta _{p \times 1}=[\beta _1,\beta _2,\dots , \beta _p]'\) is the vector of unknown parameters.

The \(\beta \) parameters are estimated with the help of the log-likelihood function of the COM-Poisson distribution that is provided by Sellers and Shmueli (2010) as

$$\begin{aligned} l(\beta ;y) =v\sum _{i=1}^n{y_i \text {log} (\mu _i)}-v\sum _{i=1}^n{\text {log}(y_i!)}-\sum _{i=1}^n{\text {log}\left( Z(\mu _i;v)\right) }. \end{aligned}$$
(1)

Sellers and Shmueli (2010) and Francis et al. (2012) proposed using the iteratively reweighted least squares (IRLS) technique which was presented by Nelder and Wedderburn (1972) and Wood (2017). Then, a closed form solution for the IRLS estimator known as the ML estimator was given by Mammadova and Özkale (2021b) as

$$\begin{aligned} \begin{aligned} {\hat{\beta }}^{(t)}_{ML}=\left( X'{\hat{W}}_{ML}^{(t-1)}X\right) ^{-1} X'{\hat{W}}_{ML}^{(t-1)}u_{ML}^{(t-1)} \end{aligned} \end{aligned}$$

where t is the iteration step, \(u_{ML}^{(t-1)}=X{\hat{\beta }}_{ML}^{(t-1)}+({\hat{W}}_{ML}^{(t-1)})^{-1}(y-{\hat{\mu }}^{(t-1)}_{ML})\) is the working response, and \({\hat{W}}_{ML}\) is the estimated weight matrix evaluated at \({\hat{\beta }}^{(t-1)}_{ML}\). After the iterations converge, \({\hat{\beta }}_{ML}\) is obtained as \({\hat{\beta }}_{ML}=\left( X'{\hat{W}}_{ML}X\right) ^{-1}X'{\hat{W}}_{ML}u_{ML}\).
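A schematic R sketch of this IRLS scheme is given below. The functions `W_fun` (returning \({\hat{W}}\) evaluated at the current \({\hat{\beta }}\), i.e., the weight matrix of Eq. (2)) and `mu_fun` (returning the fitted means under the log-link) are placeholders assumed to be supplied by the user; the sketch illustrates the update, not the authors' implementation.

```r
# Generic IRLS loop for the ML estimator: beta^(t) = (X' W X)^{-1} X' W u
irls_ml <- function(X, y, W_fun, mu_fun, beta0, tol = 1e-6, max_iter = 100) {
  beta <- beta0
  for (t in seq_len(max_iter)) {
    W  <- W_fun(beta)                       # diagonal weight matrix at current beta
    mu <- mu_fun(beta)                      # fitted means at current beta
    u  <- X %*% beta + solve(W, y - mu)     # working response
    beta_new <- solve(t(X) %*% W %*% X, t(X) %*% W %*% u)
    if (sqrt(sum((beta_new - beta)^2)) <= tol) {  # convergence check
      beta <- beta_new
      break
    }
    beta <- beta_new
  }
  drop(beta)
}
```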

The weight matrix for the COM-Poisson model was first introduced by Sellers and Shmueli (2010) and later elaborated by Francis et al. (2012) as \(W=\text {diag}(w_{ii})\), \(i=1,2,\ldots ,n\), where

$$\begin{aligned} \begin{aligned} w_{ii}&= \sum _{s=0}^{\infty }{\frac{\frac{v(v-1)s^2(\text {exp}(\mu _i ))^{2s}\left( \frac{\left( \text {exp}(\mu _i )\right) ^s}{s!}\right) ^{v-2}}{(s!)^2}+\frac{vs^2(\text {exp}(\mu _i ))^{s}\left( \frac{(\text {exp}(\mu _i ))^s}{s!}\right) ^{v-1}}{s!}}{\sum _{s=0}^{\infty }{\left( \frac{(\text {exp}(\mu _i ))^s}{s!}\right) }^v}}\\&\quad -\sum _{s=0}^{\infty }{\frac{\left[ \frac{vs(\text {exp}(\mu _i ))^{s}\left( \frac{(\text {exp}(\mu _i ))^s}{s!}\right) ^{v-1}}{s!}\right] ^2}{\sum _{s=0}^{\infty }{\left[ \left( \frac{(\text {exp}(\mu _i ))^s}{s!}\right) ^v\right] ^2}}}. \end{aligned} \end{aligned}$$
(2)

In the case of uncorrelated predictors, it is well known that the ML estimator is a reliable method. However, multicollinearity poses challenges with the computation of the inverse matrix of \(X'WX\), which is essential for ML estimation. Therefore, alternative approaches were proposed.

One of these alternatives is the iterative ridge estimator presented by Mammadova and Özkale (2021b) in COM-Poisson regression as

$$\begin{aligned} \begin{aligned} {\hat{\beta }}^{(t)}_{ridge}=&(X'{\hat{W}}_{ridge}^{(t-1)}X+kI_p)^{-1} X' {\hat{W}}_{ridge}^{(t-1)}u_{ridge}^{(t-1)} \end{aligned} \end{aligned}$$

where t refers to the iteration step, \(u_{ridge}^{(t-1)}=X{\hat{\beta }}^{(t-1)}_{ridge}+({\hat{W}}_{ridge}^{(t-1)})^{-1}(y-{\hat{\mu }}^{(t-1)}_{ridge})\) is the working response, \({\hat{W}}_{ridge}\) is the weight matrix in Eq. (2) evaluated at \({\hat{\beta }}^{(t-1)}_{ridge}\), and k is the tuning parameter. The ridge estimator of \(\beta \) at convergence has the form \({\hat{\beta }}_{ridge}=\left( X' {\hat{W}}_{ridge}X+kI_p\right) ^{-1}X'{\hat{W}}_{ridge}u_{ridge}\).

Another suitable alternative to ML estimation in the case of multicollinearity is PCR estimation. Unlike the ridge estimator, the PCR estimator does not require a tuning parameter; instead, it addresses the multicollinearity problem by generating a new set of uncorrelated variables using the singular value decomposition (SVD) technique discussed for GLMs by Marx and Smith (1990), Aguilera et al. (2006), Özkale and Arıcan (2016), and Abbasi and Özkale (2021). Jolliffe (2002) stated that the SVD is effective in terms of both computation and interpretation in PCR and also emphasized the importance of standardizing the predictors to zero mean and unit variance to eliminate the scale dependence of the PCs.

In brief, the SVD approach can be described as follows. Let the linear predictor \(\eta \) be expressed as \(\eta =X\beta = XTT'\beta =X^*\omega \) where \(X^*=XT\) and \(\omega =T'\beta \). Here T is the orthogonal matrix satisfying \(T'X'{\hat{W}}_{ML}XT=\Lambda = \text {diag} (\lambda _1,\lambda _2,\dots , \lambda _p)\), where \(\lambda _1=\lambda _{\text {max}} \ge \lambda _{2} \ge \dots \ge \lambda _{p}=\lambda _{\text {min}}\) are the eigenvalues of the \(X'{\hat{W}}_{ML}X\) matrix. The \(X^*\) matrix can be partitioned as \(X^*=[X^*_r \quad X^*_{p-r}]\), where \(X^*_r=XT_r\) \((r\le p)\) is the matrix of the PCs that will be retained in the model and \(T_r\) consists of the first r columns of T. Accordingly, \(\omega \), T, and \(\Lambda \) are partitioned as \(\omega = [\omega _r \quad \omega _{p-r}]\), \(T= [T_r \quad T_{p-r}]\), and \(\Lambda =\text {diag}(\Lambda _r, \Lambda _{p-r})\), where \(\Lambda _r=X^{*'}_r {\hat{W}}_{ML} X^*_r=\text {diag} (\lambda _1,\dots ,\lambda _r)\), \(\Lambda _{p-r}= X_{p-r}^{*'}{\hat{W}}_{ML}X_{p-r}^*=\text {diag} (\lambda _{r+1},\dots ,\lambda _p)\), and r is the number of PCs that will be included in the model.

By using the SVD, Abbasi and Özkale (2021) obtained the PCR and r–k class estimators in GLMs, building on the PCR estimator of Marx and Smith (1990) and the r–k class estimator introduced by Baye and Parker (1984) in linear regression. We adjust the PCR and r–k class estimators for the COM-Poisson model as

$$\begin{aligned} {\hat{\beta }}_{PCR}^{(t+1)}= T_r\left( T_r'X'{\hat{W}}_{ML}XT_r\right) ^{-1}T_r'X'{\hat{W}}_{ML}u_{PCR}^{(t)} \end{aligned}$$
(3)

where \(u_{PCR}^{(t)}=XT_rT_r'{\hat{\beta }}^{(t)}_{PCR}+\left( {\hat{W}}_{ML}\right) ^{-1}\left( y-\mu _{PCR}^{(t)}\right) \) is evaluated at \({\hat{\beta }}_{PCR}^{(t)}\) and

$$\begin{aligned} {\hat{\beta }}_{r-k}^{(t+1)}= T_r\left( T_r'X'{\hat{W}}_{ML}XT_r+kI_r\right) ^{-1}T_r'X'{\hat{W}}_{ML}u_{r-k}^{(t)} \end{aligned}$$
(4)

where \(u_{r-k}^{(t)}=XT_rT_r'{\hat{\beta }}^{(t)}_{r-k}+\left( {\hat{W}}_{ML}\right) ^{-1}\left( y-\mu ^{(t)}_{r-k}\right) \).

Abbasi and Özkale (2021) can be consulted for detailed information on obtaining the PCR and r–k class estimators in GLMs. To summarize, the general idea behind the PCR and r–k class estimators is that the linear predictor is reduced to a lower-dimensional linear predictor by deleting the PCs whose coefficient estimates have large variances, i.e., those associated with the smallest eigenvalues. Then, the IRLS idea is applied to this reduced linear predictor, and the resulting estimator is transformed back to the original parameter space to give the PCR estimator in GLMs. The r–k class estimator, on the other hand, is obtained by applying the ridge idea to the reduced linear predictor and transforming the resulting estimator back to the original parameter space. Note that in Eqs. (3) and (4) both estimators use the same number of PCs. The main difference is that the r–k class estimator further reduces the degree of multicollinearity and uses the tuning parameter for this purpose.

The PCR and r–k class estimators at convergence are respectively as

$$\begin{aligned} {\hat{\beta }}_{PCR}= T_r\left( T_r'X'{\hat{W}}_{ML}XT_r\right) ^{-1}T_r'X'{\hat{W}}_{ML}u_{PCR} \end{aligned}$$

and

$$\begin{aligned} {\hat{\beta }}_{r-k}= T_r\left( T_r'X'{\hat{W}}_{ML}XT_r+kI_r\right) ^{-1}T_r'X'{\hat{W}}_{ML}u_{r-k}. \end{aligned}$$
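For concreteness, a hedged R sketch of a single update of Eqs. (3) and (4) is given below; `W_hat` denotes \({\hat{W}}_{ML}\) held fixed at the ML fit and `T_r` the matrix of retained eigenvectors, both assumed to be available from the preceding computations.

```r
# One update of the PCR estimator (Eq. (3)) and the r-k class estimator (Eq. (4)).
# X: predictor matrix, y: response, T_r: retained eigenvectors (p x r),
# W_hat: estimated ML weight matrix, beta_t / mu_t: current estimate and fitted means.
update_pcr <- function(X, y, T_r, W_hat, beta_t, mu_t) {
  u <- X %*% T_r %*% t(T_r) %*% beta_t + solve(W_hat, y - mu_t)  # working response
  M <- t(T_r) %*% t(X) %*% W_hat                                 # T_r' X' W
  T_r %*% solve(M %*% X %*% T_r, M %*% u)                        # back-transform to beta
}

update_rk <- function(X, y, T_r, W_hat, beta_t, mu_t, k) {
  u <- X %*% T_r %*% t(T_r) %*% beta_t + solve(W_hat, y - mu_t)
  M <- t(T_r) %*% t(X) %*% W_hat
  r <- ncol(T_r)
  T_r %*% solve(M %*% X %*% T_r + k * diag(r), M %*% u)          # ridge on the reduced space
}
```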

2.2.1 Tuning parameter selection

The tuning parameter determines the effectiveness of the ridge estimator as well as the r–k class estimator since increasing the tuning parameter pulls the estimator further away from the true parameter value. Studies conducted by Hoerl and Kennard (1970), Hoerl et al. (1975), Lawless and Wang (1976), Kibria (2003), Alkhamisi et al. (2006), Alkhamisi and Shukur (2007), Månsson and Shukur (2011), Kibria et al. (2012), Algamal (2018), Zaldivar (2018), and others cover a wide range of approaches for calculating the tuning parameter of the ridge estimator in linear regression and GLMs.

Abbasi and Özkale (2021) adapted the tuning parameter selection method proposed by Hoerl and Kennard (1970) in linear regression for the estimation of the r–k class estimator in the GLMs. We adjust the same tuning parameter for the COM-Poisson regression models and obtain

$$\begin{aligned} k =\frac{rv}{{\hat{\beta }}^ {{(0)}^{'}}T_rT_r'{\hat{\beta }}^ {(0)}} \end{aligned}$$
(5)

where \({\hat{\beta }}^{(0)}\) is the initial value of the \(\beta \) parameter which is usually taken as the ordinary least squares (OLS) estimator.
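In R, with the notation above (the predictor matrix `X`, the response `y`, the retained eigenvectors `T_r`, and the dispersion value `v` assumed available), Eq. (5) could be computed roughly as follows; this is a sketch of the formula, not the authors' code.

```r
# Tuning parameter of Eq. (5): k = r * v / (beta0' T_r T_r' beta0)
beta0 <- solve(t(X) %*% X, t(X) %*% y)   # OLS initial value
k <- (ncol(T_r) * v) / drop(t(beta0) %*% T_r %*% t(T_r) %*% beta0)
```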

2.3 Deviance residuals

The deviance residual measures the discrepancy between the fitted value \({\hat{\mu }}_i\) (the estimate of \(E(y_i)\)) and the actual response \(y_i\) for a given observation i. Dobson (2002) defined the deviance residual for the i-th observation of a GLM as the square root of twice the difference between the log-likelihoods of the saturated and fitted models. The sign of the difference between the actual response and the fitted response determines the sign of the deviance residual.

The i-th deviance residual based on ML, ridge, PCR, and r–k class estimators can be expressed by using the log-likelihood functions of the fitted model provided in Eq. (1) as

$$\begin{aligned} d_{est,i}=&\text {sign}(y_i-\widehat{E(y_i)}) \times \sqrt{2 \left[ l(y_i, y_i; {\hat{v}}) - l(\widehat{E(y_i)},y_i; {\hat{v}}) \right] } \end{aligned}$$
(6)

where \(l(\widehat{E(y_i)},y_i; {\hat{v}})\) and \(l(y_i, y_i; {\hat{v}})\) are the log-likelihood functions of the fitted and saturated models, respectively, \(\widehat{E(y_i)}= {\hat{\mu }}_{est,i}^{1/v}-\frac{v-2}{2v}\), the subscript est designates the method employed for modelling, i.e. \(est\in \{ML, ridge, PCR, r-k\}\), and \({\hat{\mu }}_{est,i}\) is the fitted value obtained by using the corresponding estimator at convergence. The deviance residuals in Eq. (6) are constrained such that \(y_i > c\) for \(v < 1/\left( 2c+1\right) \), \(c \in N^+\).
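A schematic R sketch of Eq. (6) is shown below; `loglik_i` stands for the per-observation log-likelihood contribution implied by Eq. (1) and is passed in as a user-supplied function, so the sketch stays agnostic about the exact form used for \(Z\).

```r
# Deviance residual of Eq. (6):
# d_i = sign(y_i - Ehat_i) * sqrt(2 * [l(y_i, y_i; v) - l(Ehat_i, y_i; v)])
# loglik_i(mean, y, v): per-observation log-likelihood (assumed supplied by the user)
deviance_residuals <- function(y, Ehat, v, loglik_i) {
  sign(y - Ehat) * sqrt(2 * (loglik_i(y, y, v) - loglik_i(Ehat, y, v)))
}

# Usage sketch: d_ml <- deviance_residuals(y, Ehat_ml, v_hat, loglik_i)
```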

3 Monitoring of COM-Poisson profiles

Page (1954) proposed the CUSUM chart, which utilizes past information available from previously plotted points for effective monitoring. Later, Roberts (1959) introduced the EWMA chart as an alternative to the CUSUM chart. EWMA charts also accumulate current and past information from the observations, which makes the control chart sensitive to small shifts. Since then, several modifications and extensions of these control charts have been investigated. Montgomery (2020) describes the CUSUM and EWMA charts as alternatives to the Shewhart chart when detecting small shifts is important.

Mammadova and Özkale (2021b) extended the traditional CUSUM and EWMA charts by using deviance and ridge deviance residuals to define the control chart statistics, and gave the formulas in Table 1 to construct CUSUM and EWMA charts based on these residuals. In Table 1, the deviance-based charts are referred to as CUSUM\(_{ML}\) and EWMA\(_{ML}\), whereas the ridge deviance-based charts are referred to as CUSUM\(_{ridge}\) and EWMA\(_{ridge}\) to prevent confusion. \({\hat{\mu }}_{ML}^{0}\) and \({\hat{\mu }}_{ridge}^{0}\) are the in-control means, and \({\hat{\sigma }}_{ML}^{0}\) and \({\hat{\sigma }}_{ridge}^{0}\) are the in-control standard deviations of the deviance and ridge deviance residuals, respectively. K and h are the reference value and the decision value of the CUSUM chart, respectively. \(0<\lambda \le 1\) refers to the smoothing parameter of the EWMA chart, while L is the EWMA control limit constant.

Table 1 Construction of the deviance and ridge-deviance residual-based CUSUM and EWMA control charts

Although the ridge estimator is frequently used to handle multicollinearity, the r–k class estimator obtained by combining the ridge and PCR estimators gives better results than the ridge estimator when a multicollinearity problem exists. For this reason, we propose new control charts as alternatives to the ridge deviance-based charts of Mammadova and Özkale (2021b) by defining CUSUM and EWMA chart statistics based on PCR and r–k deviance residuals, which we denote respectively as CUSUM\(_{PCR}\), EWMA\(_{PCR}\), CUSUM\(_{r-k}\), and EWMA\(_{r-k}\). The difference between these charts and the ones given by Mammadova and Özkale (2021b) is that the charts introduced by Mammadova and Özkale (2021b) use the ridge deviance residuals, while the newly proposed charts use the PCR and r–k deviance residuals, respectively. Thus we obtain control charts based on the PCR deviance residuals that are as good as the control charts based on the ridge deviance residuals but easier to calculate because they do not depend on a tuning parameter. Furthermore, we obtain control charts based on the r–k class estimator, which give better results than the control charts depending on the PCR and ridge deviance residuals. In this way, we improve the control charts proposed by Mammadova and Özkale (2021b) to perform better in the case of multicollinearity.

3.1 The newly proposed control charts based on PCR and r–k deviance residuals

In this subsection, we define CUSUM and EWMA control charts based on PCR and r–k class deviance residuals.

We give the CUSUM\(_{PCR}\) and CUSUM\(_{r-k}\) charts’ statistics respectively as

$$\begin{aligned} \begin{aligned} C_{PCR,i}^-= \text {min}[0, {\hat{\mu }}^0_{PCR}-K-d_{PCR,i}+C_{PCR,i-1}^-],\quad i=1,2,\ldots ,n\\ C_{PCR,i}^+= \text {max}[0, d_{PCR,i}-{\hat{\mu }}^0_{PCR}-K+C_{PCR,i-1}^+ ],\quad i=1,2,\ldots ,n \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} C_{r-k,i}^-= \text {min}[0, {\hat{\mu }}^0_{r-k}-K-d_{r-k,i}+C_{r-k,i-1}^-],\quad i=1,2,\ldots ,n\\ C_{r-k,i}^+= \text {max}[0, d_{r-k,i}-{\hat{\mu }}^0_{r-k}-K+C_{r-k,i-1}^+ ],\quad i=1,2,\ldots ,n \end{aligned} \end{aligned}$$

where \({\hat{\mu }}^0_{PCR}\) and \({\hat{\mu }}^0_{r-k}\) correspond to the in-control mean of the PCR and r–k deviance residuals, \({\hat{\sigma }}^0_{PCR}\) and \({\hat{\sigma }}^0_{r-k}\) correspond to the in-control standard deviation of the PCR and r–k deviance residuals, respectively. The initial values are taken as zero: \(C_{PCR,0}^-=C_{PCR,0}^+=0\), \(C_{r-k,0}^-=C_{r-k,0}^+=0\).

The control limits for the CUSUM\(_{PCR}\) and CUSUM\(_{r-k}\) charts are as follows

$$\begin{aligned} CL_{PCR}=\pm h{\hat{\sigma }}^0_{PCR} \end{aligned}$$
(7)
$$\begin{aligned} CL_{r-k}=\pm h{\hat{\sigma }}^0_{r-k} . \end{aligned}$$
(8)

We define control chart statistics for the EWMA chart based on the PCR deviance residuals (EWMA\(_{PCR}\)) and EWMA chart based on the r–k deviance residuals (EWMA\(_{r-k}\)) as

$$\begin{aligned} z_{PCR,i}=\lambda d_{PCR,i}+(1-\lambda )z_{PCR,i-1},\quad i=1,2,\ldots ,n \end{aligned}$$
(9)
$$\begin{aligned} z_{r-k,i}=\lambda d_{r-k,i}+(1-\lambda )z_{r-k,i-1},\quad i=1,2,\ldots ,n \end{aligned}$$
(10)

where the initial values \(z_{PCR,0}\) and \(z_{r-k,0}\) are the in-control mean of the corresponding deviance residual.

The control limits for EWMA\(_{PCR}\) and EWMA\(_{r-k}\) are respectively as

$$\begin{aligned} CL_{PCR}={\hat{\mu }}^0_{PCR} \pm L{\hat{\sigma }}^0_{PCR} \sqrt{(\lambda /(2-\lambda ))\left( 1-(1-\lambda )^{2i}\right) },\quad i=1,2,\ldots ,n \end{aligned}$$
(11)
$$\begin{aligned} CL_{r-k}={\hat{\mu }}^0_{r-k} \pm L{\hat{\sigma }}^0_{r-k} \sqrt{(\lambda /(2-\lambda ))\left( 1-(1-\lambda )^{2i}\right) },\quad i=1,2,\ldots ,n . \end{aligned}$$
(12)
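For concreteness, the CUSUM\(_{PCR}\) recursions and the EWMA\(_{PCR}\) statistic with its limits can be computed in R as sketched below; the r–k versions are identical with \(d_{r-k}\), \({\hat{\mu }}^0_{r-k}\), and \({\hat{\sigma }}^0_{r-k}\) substituted. The constants `K`, `h`, `lambda`, and `L` are assumed to have been chosen to meet the target \(ARL_0\), and the function names are illustrative only.

```r
# One-sided CUSUM statistics C^+ and C^- for deviance residuals d,
# with in-control mean mu0 and reference value K; limits are +/- h * sigma0
cusum_stats <- function(d, mu0, K) {
  n <- length(d)
  Cp <- Cm <- numeric(n)
  for (i in seq_len(n)) {
    Cp_prev <- if (i == 1) 0 else Cp[i - 1]
    Cm_prev <- if (i == 1) 0 else Cm[i - 1]
    Cp[i] <- max(0, d[i] - mu0 - K + Cp_prev)
    Cm[i] <- min(0, mu0 - K - d[i] + Cm_prev)
  }
  list(C_plus = Cp, C_minus = Cm)
}

# EWMA statistic z_i = lambda * d_i + (1 - lambda) * z_{i-1}, with z_0 = mu0,
# and time-varying limits mu0 +/- L * sigma0 * sqrt(lambda/(2-lambda) * (1-(1-lambda)^(2i)))
ewma_stats <- function(d, mu0, sigma0, lambda, L) {
  n <- length(d)
  z <- numeric(n)
  z_prev <- mu0
  for (i in seq_len(n)) {
    z[i] <- lambda * d[i] + (1 - lambda) * z_prev
    z_prev <- z[i]
  }
  half_width <- L * sigma0 * sqrt(lambda / (2 - lambda) * (1 - (1 - lambda)^(2 * seq_len(n))))
  list(z = z, ucl = mu0 + half_width, lcl = mu0 - half_width)
}
```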

Since the combinations of K and h affect the performance of the control charts, the selection of K and h values for the CUSUM chart as well as the \(\lambda \) and L combinations for the EWMA chart is a sensitive task. It is usually recommended to select the values on the basis of the pre-specified in-control average run length value.

The run length (RL) is the number of observations until the first out-of-control observation is identified by the control chart. The average run length (ARL) is a commonly used metric to evaluate the performance of a control chart. This metric is classified as in-control ARL (\(ARL_0\)) and out-of-control ARL (\(ARL_1\)). \(ARL_0\) is expected to be large, while it is desirable for \(ARL_1\) to be small.

4 Simulation study

In this section, we present a simulation study to compare the performance of residual-based control charts under different settings. The simulation study is carried out in R software. The simulation study consists of two stages:

  • Stage 1: In this stage, the goal is to obtain the control chart constants that meet desired in-control \(ARL_0\).

  • Stage 2: The objective of this stage is to evaluate and compare the performance of the control charts.

Each stage is described by the algorithms given in Fig. 1, and details are provided in Sects. 4.1 and 4.2.

Fig. 1 Simulation algorithms for Stage 1 and Stage 2

4.1 Determination of control chart constants

For a fixed number of observations and correlation levels, different combinations of predictor number, dispersion levels, shift type, and shift sizes are considered for the simulation study and their values are given in Table 2.

Table 2 Simulation settings

Detailed information about the simulation settings is as follows:

  (i) The number of observations, dispersion, and correlation levels are fixed as given in Table 2. The \(\beta \) vector is set with elements \(\beta _i=1\), \(i=1,2,\dots ,p\). The target \(ARL_0\) is set to approximately 370.

  (ii) The correlated predictor matrix \(X_{n \times p}\) is generated using the formula presented by McDonald and Galarneau (1975) (a sketch of this step, together with step (vii), is given after the list):

    $$\begin{aligned} x_{ij}=(1-\rho ^2)^{1/2}s_{ij}+\rho s_{i,p+1}, \quad i=1,2,\dots ,n, \quad j=1,2,\dots ,p \end{aligned}$$

    where \(s_{ij}\) are independent standard normal random numbers and \(\rho ^2=0.95\) is the desired correlation between any two predictors. The predictors are then standardized by unit length standardization.

  (iii) The control chart constants are initialized as \(h=0\) for the CUSUM charts and \(L=0\) for the EWMA charts. Montgomery (2020) notes that \(K =0.5\), combined with a corresponding h, generally provides a CUSUM chart with good ARL performance against a \(1\sigma \) shift. Lucas and Saccucci (1990) presented optimal combinations of \(\lambda \) and L that effectively minimize the \(ARL_1\) of the EWMA chart. Montgomery (2020), on the other hand, emphasized that \(0.05\le \lambda \le 0.25\) performs efficiently in practice. Based on these results, we set the reference value for the CUSUM charts as \(K=0.5\) and the smoothing parameter for the EWMA charts as \(\lambda =0.05, 0.1, 0.2\) to see the impact of the smoothing parameter on the overall performance.

  (iv) The COM-Poisson distributed response variable y is generated as \(y \sim COM-Poisson(E(y),v)\), where \(E(y)=\text {exp}(X\beta )\) is the mean function.

  (v) The log-link function is used to model the relationship between the predictor matrix and the response variable.

  (vi) To calculate the ML and ridge parameters of the COM-Poisson regression model, the OLS estimator \({\hat{\beta }}^{(0)}=(X'X)^{-1}X'y\) is chosen as the initial value, while for the PCR and r–k class estimators \({\hat{\beta }}^{(0)}=T_rT_r'(X'X)^{-1}X'y\) is chosen as the initial value. The convergence criterion for the iteration is \(\Vert \hat{\beta }^{(t)}-{\hat{\beta }}^{(t-1)}\Vert \le 1\times 10^{-6}\).

  (vii) The number of PCs is determined by the percentage of total variation (PTV) criterion, defined by Jolliffe (2002) as \(PTV=\left( \frac{\sum _{i=1}^{r}{\hat{\zeta }}_{i}}{\sum _{i=1}^{p}{\hat{\zeta }}_{i}} \right) \times 100\), where \({\hat{\zeta }}_i\), \(i=1,\dots ,p\), are the eigenvalues of the \(X'WX\) matrix. This criterion selects the smallest number of PCs, taken in order of decreasing variance, for which the chosen percentage, in our case \(95\%\), is exceeded (see the sketch after this list).

  (viii) The tuning parameter given in Eq. (5) is used in obtaining the ridge and r–k class estimators.

  (ix) The deviance residuals are calculated as in Eq. (6). In obtaining the deviance residuals for the saturated model, we set the normalization constant Z equal to one when \(v<1\) and \(y_i=0\), \(i=1,2,\dots , n\), as suggested by Sellers and Shmueli (2010).

  (x) We conduct the simulation study to calculate the control chart statistics, then compare each control chart statistic to the control limit of the corresponding chart and obtain the RL.

  (xi) We reset the counter to zero and repeat steps (iv)–(x) 100 times.

  (xii) We calculate the average of the RLs, which is \(ARL_0\). If the desired \(ARL_0\) is not obtained, we increase the control chart constants (h for the CUSUM charts, L for the EWMA charts) by 0.001 and repeat steps (iv)–(xi) until the desired \(ARL_0 \approx 370\) is approximately attained.
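The following R sketch illustrates steps (ii) and (vii): generating the correlated predictors of McDonald and Galarneau (1975) and selecting the number of PCs by the PTV criterion. The weight matrix `W_hat` is assumed to be the estimated ML weight matrix (an identity placeholder is used here), and the seed and dimensions are illustrative.

```r
set.seed(1)                      # illustrative seed
n <- 100; p <- 4; rho2 <- 0.95   # settings as in Table 2 (rho^2 = 0.95)

# Step (ii): x_ij = sqrt(1 - rho^2) * s_ij + rho * s_{i,p+1}, rho = sqrt(rho2)
S <- matrix(rnorm(n * (p + 1)), n, p + 1)
X <- sqrt(1 - rho2) * S[, 1:p] + sqrt(rho2) * S[, p + 1]
X <- apply(X, 2, function(x) x / sqrt(sum(x^2)))   # unit length standardization

# Step (vii): number of PCs by the PTV criterion on the eigenvalues of X' W X
W_hat <- diag(n)                                   # placeholder for the ML weight matrix
eig <- eigen(t(X) %*% W_hat %*% X, symmetric = TRUE)
ptv <- cumsum(eig$values) / sum(eig$values) * 100
r <- which(ptv >= 95)[1]                           # smallest r exceeding 95%
T_r <- eig$vectors[, 1:r, drop = FALSE]            # retained eigenvectors
```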

Stage 1 results of the simulation study are given in Table 3. Table 3 presents the control chart constants for the CUSUM and EWMA charts corresponding to the specified values of n, p, \(\textit{v}\), K, \(\lambda \), and \(ARL_0\), together with the corresponding in-control means and in-control standard deviations that serve to compute the control limits. The control limits used in Stage 2 are then calculated from the constants in Table 3.

Table 3 CUSUM and EWMA chart constants and the corresponding \(ARL_0\) values, in-control mean and in-control standard deviations

4.2 Performance analysis

In Stage 2, the control charts are tested by introducing a pre-specified shift into the process mean; the two types of shift given in Table 2 are used in the simulation. The additive shift is formulated as \(\mu _1=\mu _0+\delta {\hat{\sigma }}^0_{ML}=\text {exp}(X\beta )+\delta {\hat{\sigma }}^0_{ML}\) and the multiplicative shift has the form \(\mu _1=\text {exp}(X(\beta +\delta {\hat{\sigma }}^0_{ML}))\), where \({\hat{\sigma }}^0_{ML}\) is the in-control standard deviation of the deviance residuals.
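A brief R sketch of how the two shift types could be injected is shown below; `sigma0_ML` denotes \({\hat{\sigma }}^0_{ML}\), `delta` the shift size, and the function name is illustrative.

```r
# Additive shift:        mu1 = exp(X beta) + delta * sigma0_ML
# Multiplicative shift:  mu1 = exp(X (beta + delta * sigma0_ML))
shifted_mean <- function(X, beta, delta, sigma0_ML,
                         type = c("additive", "multiplicative")) {
  type <- match.arg(type)
  if (type == "additive") {
    exp(X %*% beta) + delta * sigma0_ML
  } else {
    exp(X %*% (beta + delta * sigma0_ML))
  }
}
```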

The performance of the control charts is evaluated based on the RL. In Stage 2, we use \(ARL_1\) and standard deviation of run-length (SDRL) to assess the performance of control charts. The chart with the lowest \(ARL_1\) and SDRL values is considered as the best.

Stage 2 results of the simulation study are presented in Appendix as Tables 14, 15, 16, and 17, and the key findings are given in the following paragraphs based on the shift type for each control chart:

4.2.1 Results for CUSUM charts

  • \({\underline{Additive\, \,shift, \,\,in \,\,terms \,\,of \,\,ARL_1\,\,}}\)

    \(\circ \):

    In all combinations, the CUSUM\(_{r-k}\) chart outperformed the other control charts. For most of the combinations, CUSUM\(_{ML}\) shows the worst performance. The \(ARL_1\) values of the CUSUM\(_{PCR}\) chart follow those of either the CUSUM\(_{r-k}\) or the CUSUM\(_{ridge}\) chart. The CUSUM\(_{PCR}\) chart has the highest \(ARL_1\) values, especially when \(p=7, 10\) and \(\delta \ge 2\) for overdispersed responses. CUSUM\(_{ridge}\) has performance similar to CUSUM\(_{PCR}\); it mostly outperforms CUSUM\(_{ML}\) except for a few combinations when \(p=4,7\) and \(v=1, 1.5\).

    \(\circ \):

    At fixed shift size, as the number of predictors increases, the performance of all CUSUM charts improves for \(v<1\). For fixed \(\delta \) and p, \(ARL_1\) values decrease as dispersion level increases. The performance of the CUSUM charts increases as the shift size increases in all combinations of \(p=4,7\). When \(p=4\), an analogous increase in performance is valid for \(\delta \ge 1\).

    \(\circ \):

    An increase in p has a positive effect on performance by reducing the \(ARL_1\) values. Also, there is an increase in the performance of CUSUM charts when \(p=7\) for all \(\delta \) and when \(p=4, 10\) for \(\delta \ge 0.5\).

  • \({\underline{Additive \,\,shift, \,\,in \,\,terms \,\,of \,\,SDRL}}\)

    \(\circ \):

    The SDRL results of the control charts exhibit an essentially similar pattern to those of \(ARL_1\). For fixed p and \(\delta \), the SDRL values of all CUSUM charts decrease as the value of v increases, and for fixed p and v, the SDRL values of all CUSUM charts increase as the shift size increases. CUSUM\(_{r-k}\) outperforms other CUSUM charts in most combinations. In rare cases, CUSUM\(_{r-k}\) is exceeded by CUSUM\(_{ridge}\) or CUSUM\(_{PCR}\) charts.

  • \({\underline{Multiplicative\,\, shift, \,\,in \,\,terms \,\,of \,\,ARL_1\,}}\)

    \(\circ \):

    With very little variation in the \(ARL_1\) values, the performances of CUSUM charts are similar to the results when the response is additively shifted. CUSUM\(_{r-k}\) outperforms other CUSUM charts in all cases. CUSUM\(_{ridge}\) and CUSUM\(_{PCR}\) charts also show similar performance. The exceptions are \(p=7\) and \(v=1.5\) for CUSUM\(_{ridge}\) and \(p=4, 7\), and \(v=0.75\) for CUSUM\(_{PCR}\). In these combinations, \(ARL_1\) values of the control charts are the highest.

    \(\circ \):

    The increased dispersion level results in increased \(ARL_1\) values across all CUSUM charts regardless of the number of predictors and shift size.

    \(\circ \):

    Regardless of the dispersion level, the performance of the CUSUM charts increases as the number of predictors increases. For \(v=0.75\), with very minor changes, the performance of the CUSUM charts increases as the shift size increases, except for \(p=7\).

  • \({\underline{Multiplicative \,\,\,shift, \,\,in \,\,terms \,\,of \,\,SDRL}}\)

    \(\circ \):

    Like the \(ARL_1\) results, the CUSUM\(_{r-k}\) chart outperforms at least one of the CUSUM charts in terms of SDRL and is often followed by either CUSUM\(_{ridge}\) or CUSUM\(_{PCR}\). Both CUSUM\(_{ridge}\) and CUSUM\(_{PCR}\) often have smaller SDRL values compared to CUSUM\(_{ML}\). The SDRL values of the CUSUM charts increase as v increases. In general, the SDRL performance of the control charts tends to improve as the number of predictors increases. This is particularly evident when the response is underdispersed.

4.2.2 Results for EWMA charts

  • \({\underline{Additive \,\,shift, \,\,in \,\,terms \,\,of \,\,ARL_1}}\)

    \(\circ \):

    In the sense of \(ARL_1\), EWMA\(_{r-k}\) surpasses the other EWMA charts regardless of the smoothing parameter. In some combinations of \(v \ge 1\) with \(p=4,7\), EWMA\(_{r-k}\) (\(\lambda =0.05\)) is outperformed by the other EWMA charts. On the contrary, EWMA\(_{r-k}\) (\(\lambda =0.1\)) and EWMA\(_{r-k}\) (\(\lambda =0.2\)) are the best or second-best control charts. In cases where either EWMA\(_{r-k}\) (\(\lambda =0.1\)) or EWMA\(_{r-k}\) (\(\lambda =0.2\)) is the second best, EWMA\(_{PCR}\) is outperformed by all EWMA-type control charts. EWMA\(_{ridge}\) typically follows EWMA\(_{r-k}\) except for \(\lambda =0.1, p=10\) and \(\lambda =0.2, p=4,7\), both when the dispersion level is greater than or equal to one.

    \(\circ \):

    For cases where the response is overdispersed, the performances of EWMA charts increase as the shift size increases, except for \(p=7\). Apart from EWMA\(_{r-k}\), \(ARL_1\) values increase as the smoothing parameter for the EWMA chart gets larger, regardless of p, v, and \(\delta \).

    \(\circ \):

    While EWMA(\(\lambda =0.05\)) charts perform better for \(p=4\) with a significantly small shift size, EWMA charts with \(p=10\) perform much better in remaining combinations for \(v=1\) in terms of \(ARL_1\). An increase in the performance of EWMA charts in terms of \(ARL_1\) is especially noticeable when \(p=10\). An increase in the number of predictors increases the \(ARL_1\) performance of the EWMA charts when \(\delta \le 2\).

  • \({\underline{Additive \,\,shift, \,\,in \,\,terms \,\,of \,\,SDRL}}\)

    \(\circ \):

    In terms of SDRL, EWMA\(_{r-k}(\lambda =0.05)\) commonly surpasses the other EWMA charts, whereas EWMA\(_{ML}\) has the highest values compared to the others. The performance of EWMA\(_{r-k}\) is followed by that of the EWMA\(_{ridge}\) or EWMA\(_{PCR}\) chart. EWMA\(_{ML}\) attains the lowest SDRL values in more cases for \(\lambda =0.1\) than for \(\lambda =0.05, 0.2\). Based on the SDRL values, the performance of the control charts decreases as the dispersion increases and increases as the shift size increases in fixed \(\delta \) and \(p=10\) combinations, regardless of the smoothing parameter. With an increasing number of predictors and \(\delta \ge 0.75\), the SDRL values of the EWMA (\(\lambda =0.05\)) charts become smaller. When the response is overdispersed and \(\lambda =0.1, 0.2\), the performance of the EWMA charts improves as p increases. These results are inconsistent when \(v\le 1\).

  • \({\underline{Multiplicative \,\,shift, \,\,in \,\,terms \,\,of \,\,ARL_1}}\)

    \(\circ \):

    Even though the performance of the EWMA charts does not follow a clear pattern as the smoothing parameter increases, EWMA charts with \(\lambda =0.2\) often have better results than the corresponding charts with \(\lambda =0.05, 0.1\). However, when \(\lambda =0.1\), the number of cases where EWMA\(_{ML}\) outperforms the other charts is high. In some combinations with \(\lambda =0.2\), the EWMA\(_{ridge}\) and EWMA\(_{PCR}\) charts have the same performance. When \(\lambda =0.05, 0.1\), EWMA\(_{r-k}\) is either the best or the second-best control chart in terms of \(ARL_1\). This control chart has the highest values when \(\lambda =0.2\), \(v=1.5\), \(p=4,7\). EWMA\(_{ridge}(\lambda =0.05)\) is frequently positioned second or third, except where \(p=10\), \(v=0.75\), and \(\delta \le 0.5\). EWMA\(_{PCR}\) (\(\lambda =0.05\)), on the other hand, has high \(ARL_1\) values when \(p=7\) and the response is overdispersed, and when \(p = 7, 10\) and the response is underdispersed. Regardless of the smoothing parameter, EWMA\(_{ML}\) is the control chart with the worst performance.

    \(\circ \):

    For the EWMA charts with \(\lambda =0.05\), \(ARL_1\) values grow as v increases when \(p=4,10\). When \(\lambda =0.1, 0.2\), this pattern is observed in all combinations regardless of p.

    \(\circ \):

    Regardless of the v value, the performance of the EWMA charts with \(\lambda =0.1, 0.2\) increases as the number of predictors increases. Moreover, the performance of the EWMA charts increases as shift size increases. When \(p=10\), the \(ARL_1\) values of all EWMA charts get larger as the dispersion level goes from 0.75 to 1.5.

  • \({\underline{Multiplicative \,\,shift, \,\,in \,\,terms \,\,of \,\,SDRL}}\)

    \(\circ \):

    While in several combinations of the simulation inputs either EWMA\(_{r-k}\) (\(\lambda = 0.05\)) or EWMA\(_{r-k}\) (\(\lambda =0.1\)) has high values, it outperforms at least two EWMA charts in terms of SDRL. Except for a few combinations when \(p=10\), EWMA\(_{r-k}\) outperforms the other EWMA charts when \(\lambda =0.2\). In contrast, EWMA\(_{ML}\)(\(\lambda =0.05\)) and EWMA\(_{ML}\)(\(\lambda =0.2\)) provide the poorest outcomes in the majority of combinations. EWMA\(_{PCR}\)(\(\lambda =0.2\)) outperforms at least one EWMA chart in terms of SDRL. For fixed v and \(\delta \), the SDRL values of the EWMA charts decrease as the number of predictors increases. An increase in shift size also improves the SDRL performance. This change has a greater influence on the outcomes, especially when \(\lambda =0.2\) and \(p=10\). An increasing value of v causes an increase in the SDRL values for \(p = 10\).

The control charts with the best performance in terms of \(ARL_1\) and SDRL are presented in Table 4. Table 4 can be used as a reference to determine the best control chart for each scenario considered in the simulation.

Table 4 The control charts with the best performance based on the simulation results

5 Real life application

In this section, the proposed method for monitoring the COM-Poisson profile is illustrated via a case study by analyzing the SECOM data set. This data set is obtained from a semiconductor manufacturing process and is available on the UCI machine learning repository. The data set was provided by McCann and Johnston (2008).

5.1 Data information

A modern semiconductor manufacturing process is equipped with advanced technology. The monitoring of the process is carried out on a continuous basis through the collection of signals/variables from sensors and/or process measurement points. Each type of signal can be considered a feature that can be used to improve the quality of the semiconductors.

The data set presented in this section is generated from a similar process. After being measured by the 590 sensors, each of the 1567 instances of the production line was subjected to a Pass/Fail test. The test result associated with a specific date-time stamp is either \(-1\) or 1, where \(-1\) stands for pass and 1 for failure. Missing values are denoted by ‘NaN’. The final data set is a \(1567 \times 591\) matrix which consists of 590 features and a class label. Moldovan et al. (2017), Moldovan et al. (2018), and Kim et al. (2017) used the data set to illustrate different machine learning algorithms for classification problems. Cao et al. (2020) applied a mixture of two refinement methods to detect quality issues in the data set. Using the data set, Takahashi et al. (2019) demonstrated a simplified machine learning implementation to reduce the time spent on failure prediction in manufacturing processes. Kwon and Kim (2020) adapted iterative feature selection for failure prediction and used the data set to evaluate the performance of the proposed method.

Inspired by the aforementioned studies, we designed the analysis of the SECOM data set with deviance residual-based control charts. Our purpose is to determine whether the CUSUM and EWMA charts based on PCR and r–k deviance residuals outperform the CUSUM and EWMA charts based on ML and ridge deviance residuals. The analysis of the data is conducted in R by using the "COMPoissonReg" and "SPC" packages.

5.2 Data reconstruction

In this subsection, we rearrange the data to make it usable for our purposes, following the steps described below. The original data, with a binary response variable, were collected on certain days and the time of each data point was recorded, so steps 3–5 are used to convert them to count data.

  1. Null values are replaced with the previous value, which results in a \(1567\times 590\) X matrix;

  2. All categorical predictors are removed and only continuous predictors are kept. Then, the predictor matrix X is \(1567\times 444\);

  3. The 24-h period was divided into 8 groups, each called a time frame. The time frame variable is created based on the time stamp, and the corresponding time intervals are \(1 \rightarrow 01:00-03:59\), \(2 \rightarrow 04:00-06:59\), \(3 \rightarrow 07:00-09:59\), \(4 \rightarrow 10:00-12:59\), \(5 \rightarrow 13:00-15:59\), \(6 \rightarrow 16:00-18:59\), \(7 \rightarrow 19:00-21:59\), \(8 \rightarrow 22:00-01:59\) (next day);

  4. The date and the time frame are combined to create the “date and time frame (DTF)”, e.g. 19-7-2008-4 stands for date 19-7-2008 and time frame 4;

  5. The number of passes in each DTF is summed to create the response. The new predictor variable values are obtained by taking the average of each predictor variable within the corresponding DTF. In this way, a new data set based on DTF is created; the resulting matrix X is \(476 \times 444\) dimensional (a sketch of steps 3–5 is given after this list);

  6. Within the data set arranged in steps 1–5, features/variables whose absolute pairwise correlations lie between 0.948 and 0.952 were selected to be included in the analysis. Then, we get the predictor matrix X of dimension \(476\times 8\), where the predictors in the model are the values recorded by the sensors numbered 93, 95, 98, 103, 116, 198, 470, and 523. The pairwise correlation matrix is then obtained as seen in Table 5.
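As an illustration of steps 3–5, the time-frame construction and DTF aggregation could be carried out in R roughly as below. The object `secom`, its `timestamp` column, its recoded `pass` indicator (1 for pass, 0 for fail), and the vector of sensor column names `sensor_cols` are assumptions made for the sketch, not the actual variable names of the SECOM files.

```r
# Step 3: assign each record to one of 8 three-hour time frames
hour_of_day <- as.integer(format(secom$timestamp, "%H"))
secom$frame <- ifelse(hour_of_day %in% c(22, 23, 0), 8, (hour_of_day + 2) %/% 3)

# Step 4: date and time frame label (DTF), e.g. "19-7-2008-4"
secom$dtf <- paste(format(secom$timestamp, "%d-%m-%Y"), secom$frame, sep = "-")

# Step 5: count the passes per DTF and average each sensor within a DTF
y <- tapply(secom$pass, secom$dtf, sum)                               # count response
X <- aggregate(secom[, sensor_cols], by = list(dtf = secom$dtf), FUN = mean)
```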

Table 5 The correlation matrix

5.3 Data processing

In this subsection, we first determine the distribution of the data set created in Sect. 5.2, which consists of eight predictors and 476 samples. Since Kim et al. (2017) mentioned that the data distribution is very irregular in each feature, we scaled the X matrix prior to modelling the data set. Then, the ML estimator of the regression coefficients is obtained, which is then used for multicollinearity diagnostics, tuning parameter selection, and selection of the number of PCs.

Three count data models, the COM-Poisson, Negative Binomial, and Poisson, are compared to determine the best-fitting distribution for the response variable using the log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC). The log-likelihood, AIC, and BIC computed for the COM-Poisson, Negative Binomial, and Poisson models are given in Table 6. The COM-Poisson distribution is recognized as the best-fitting distribution due to its high log-likelihood and low AIC and BIC values. The estimated dispersion for the response variable is \(v = 0.2019\), which indicates overdispersion.

Table 6 Diagnostic analysis of best-fitted distribution for response variable

After it is seen that the response variable follows the COM-Poisson distribution, we estimate the parameters of the model. The ML estimator is first obtained using the OLS estimator as the initial value, with \(\Vert \hat{\beta }^{(t)}-{\hat{\beta }}^{(t-1)}\Vert \le 1\times 10^{-6} \) as the convergence criterion.

The scaled information matrix is examined for multicollinearity. We calculated the eigenvalues of the scaled information matrix as 64.271407, 56.797600, 54.648692, 48.305966, 1.584542, 1.432938, 1.418611, 1.326779 and the variance inflation factor (VIF) values as 28.49139, 28.50746, 29.49110, 29.49507, 28.37309, 28.51614, 28.56399, 28.34829. Since all VIF values are larger than 10, all the predictors are involved in multicollinearity, supporting step 6 in Sect. 5.2.
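A possible R sketch of these diagnostics is given below, where `W_hat` denotes the estimated weight matrix at the ML fit; the exact scaling of the information matrix used in the paper may differ, so this is only an illustration of the computation rather than a reproduction of the reported values.

```r
# Diagnostics from the information matrix X' W X
info <- t(X) %*% W_hat %*% X

# Eigenvalues: a large spread (large vs. very small values) indicates collinearity
eigen(info, symmetric = TRUE)$values

# VIFs from the correlation-form information matrix; values > 10 signal multicollinearity
D <- diag(1 / sqrt(diag(info)))
vif <- diag(solve(D %*% info %*% D))
```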

The tuning parameter for the ridge and r–k class estimators is obtained as \(k=0.001536528\) by using Eq. (5). The number of PCs explaining approximately 99\(\%\) of the overall variance is obtained as 7. Then, the ML, ridge, PCR, and r–k deviance residuals are calculated as in Eq. (6).

To see how the estimation methods affect the residuals, the deviance residuals are compared with each other. The absolute values of the pairwise differences between the residuals were computed, and the minimum and maximum of these values, together with the ten largest values, are tabulated in Table 7. Table 7 shows how the different estimation methods affect the residuals and for which observations the effect is largest.

Table 7 Comparison of deviance residuals calculated from the models with ML, ridge, PCR, and r–k class estimators

Figure 2 shows that the closest residual values are between \(d_{PCR}\) and \(d_{r-k}\), followed by \(d_{ML}\) and \(d_{ridge}\); the remaining pairwise differences are clearly larger than these two. The widest range is between \(d_{ML}\) and \(d_{r-k}\).

Fig. 2 Difference intervals for deviance residuals

5.4 Monitoring process: performed for CUSUM and EWMA separately

Having seen in Sect. 5.3 that the SECOM data set follows a COM-Poisson distribution with correlated predictors and that the residuals differ when different estimation methods are used, we turn to profile monitoring.

Due to the lack of prior information on the status of the data set, we split the data set into two sets: a training set and a test set. The first \(25\%\) of the data set (the first 119 observations) forms the training set and is used to create the data set of the in-control state, while the last \(75\%\) (the last 357 observations) forms the test set, which is used to examine the performance of the control charts.

After splitting the data into training and test sets, we use the training set to construct the control limits from the in-control data and the test set for the analysis. The steps followed for each set are as follows.

  1. Creating in-control data from the training set

     (a) Utilizing built-in functions of the "SPC" package with the mean and standard deviation of the deviance residuals of the training set, we obtained the control limits that met the \(ARL_0 \approx 200\) criterion;

     (b) We calculated the CUSUM chart statistics with \(K=0.5\) and the EWMA chart statistics with smoothing parameters \(\lambda =0.05, 0.1, 0.2\) for the ML, ridge, PCR, and r–k deviance-based control charts;

     (c) We compared the control chart statistics with the control limits of each control chart. The results are given in Tables 8 and 9 for the CUSUM and EWMA charts, respectively. Table 8 shows that 80 observations exceeded the upper control limit of the CUSUM\(_{ML}\) and CUSUM\(_{ridge}\) charts, while 79 observations exceeded that of the CUSUM\(_{PCR}\) and CUSUM\(_{r-k}\) charts. On the other hand, Table 9 shows that the EWMA (\(\lambda =0.05\)) charts detected two observations (the 2nd and 3rd) while the EWMA (\(\lambda =0.1\)) charts detected only one observation (the 2nd) as out-of-control. EWMA\(_{ML}\) and EWMA\(_{ridge}\) detected one observation (the 2nd), whereas EWMA\(_{PCR}(\lambda =0.2)\) and EWMA\(_{r-k}(\lambda =0.2)\) detected two observations (the 2nd and 88th);

     (d) We pooled the out-of-control observations detected by the ML, ridge, PCR, and r–k deviance-based control charts and removed the distinct ones to create the in-control training set. 80 observations were eliminated for monitoring with the CUSUM charts and three observations for the EWMA charts;

     (e) By obtaining the control limits meeting \(ARL_0=200\), we calculated the mean and standard deviation of each in-control training set; the results are given in Tables 10 and 11.

  2. Analysis of the test set

     (a) We calculated the control chart statistics for each chart by using the test data;

     (b) We compared the control limits obtained from the training set with the control chart statistics obtained in (a);

     (c) We analyzed the performance of the control charts in terms of \(ARL_1\), where the \(ARL_1\) values are obtained by using the appropriate functions of the "SPC" package; the results are presented in Table 12.

Table 8 Control limits and the number of out-of-control observations identified by each CUSUM chart
Table 9 Control limit constants and the number of out-of-control observations identified by each EWMA chart
Table 10 CUSUM chart limits, in-control mean and standard deviation of training set
Table 11 EWMA chart limit constants, in-control mean and standard deviation of training set
Table 12 \(ARL_1\) results of CUSUM and EWMA charts

The process steps from Sects. 5.3 to 5.4 can be summarized with the flowchart given in Fig. 3.

Fig. 3 Analyzing the SECOM data set

5.4.1 Performance evaluation of the control charts

To see the direction of the shift, the differences between the mean of the residuals in the in-control training set and the mean of the residuals in the test set are computed and the results are given in Table 13. The mean residual of the test set decreased compared with the mean of the residuals in the in-control training set. According to the results in Table 13, a negative shift has occurred. The negative change is particularly evident in Fig. 4, since the control chart statistics are positioned below the zero line, and in Figs. 5, 6 and 7, since they show a downward trend.

Table 13 Mean change \({\hat{\mu }}^1_{est}-{\hat{\mu }}^0_{est}\)

Table 12 shows that both CUSUM\(_{r-k}\) and EWMA\(_{r-k}\) are the best in comparison to the corresponding control charts based on the ML, ridge, and PCR deviance residuals in terms of \(ARL_1\). The CUSUM\(_{PCR}\) and EWMA\(_{PCR}\) charts follow CUSUM\(_{r-k}\) and EWMA\(_{r-k}\), respectively and the CUSUM\(_{ML}\) and EWMA\(_{ML}\) charts have the highest \(ARL_1\) values.

The visual analysis of the test set is conducted by plotting the CUSUM and EWMA chart statistics against the observation number for the test set, together with the control limits calculated from Tables 10 and 11; the outcomes are shown in Figs. 4, 5, 6, and 7. Figures 5, 6 and 7 show the effect of the smoothing parameter on the EWMA chart.

Fig. 4 Analysis of the test set with CUSUM charts

Fig. 5 Analysis of the test set with EWMA charts with \(\lambda =0.05\)

Fig. 6 Analysis of the test set with EWMA charts with \(\lambda =0.1\)

Fig. 7 Analysis of the test set with EWMA charts with \(\lambda =0.2\)

6 Conclusions

The traditional way to monitor count data in statistical process control is to use approaches designed for the Poisson distribution. However, the Poisson distribution is appropriate only in the case of equidispersion, and real-life data sets may be underdispersed or overdispersed. If indications of under- or overdispersion are present in the data set, it is more appropriate to use the COM-Poisson distribution.

This study presents CUSUM\(_{PCR}\), CUSUM\(_{r-k}\), EWMA\(_{PCR}\), and EWMA\(_{r-k}\) control charts based on deviance residuals to detect out-of-control observations in the COM-Poisson profile, addressing the issue of multicollinearity in profile monitoring. The proposed control charts are compared to the CUSUM and EWMA control charts based on the deviance and ridge deviance residuals through a simulation study and a real-life data set analysis, and their performances are evaluated in terms of the \(ARL_1\) and SDRL statistics.

Results showed that the r–k deviance-based charts mainly outperform the other control charts, while the deviance-based charts show the worst results in most combinations in the case of multicollinearity. Moreover, the CUSUM charts show better results compared to the EWMA charts. It is to be noted that the choice of the smoothing parameter affects the performance of the EWMA charts. Both types of charts are more effective in detecting additive shifts than multiplicative shifts in the majority of the combinations considered in the simulation. Furthermore, for a fixed dispersion value and shift size, the changes in the result values are clearer and more consistent in cases with a high number of predictors. The results are generally inconsistent when the response is underdispersed. In summary, in the case of multicollinearity, the CUSUM\(_{r-k}\) and EWMA\(_{r-k}\) charts based on deviance residuals perform better than the alternative CUSUM and EWMA charts in determining small shifts in the process. The best-performing chart varies according to parameters such as the number of predictors, the dispersion level, and the shift size.