1 Introduction

The performance of modern processes depends on several related quality characteristics. The statistical monitoring of such “high-dimensional” processes is known as multivariate statistical process control (MSPC; see Bersimis et al. 2007 for a comprehensive review of the MSPC literature). A critical task for an MSPC control scheme is assessing whether the multidimensional process is in-control (IC) or not. It is unlikely that all the quality characteristics shift simultaneously; more commonly, only a subset of variables experiences abnormal changes. It could therefore be more efficient to monitor only the potential out-of-control (OC) variables, which, however, are not known in advance. For this reason, recent developments in the MSPC framework propose using variable selection (VS) algorithms to identify the suspected variables and then charting only these characteristics to test whether the multidimensional process is in-control or not (see Wang and Jiang 2009; Zou and Qiu 2009; Zou et al. 2010; Capizzi and Masarotto 2011; Jiang et al. 2012). These VS-based approaches seem attractive since they offer a very satisfactory performance in OC scenarios involving a shift in one, two, …, all the monitored quality characteristics. They are also coupled with diagnostic tools to accurately identify the variables responsible for the change.

These recent proposals combine different multivariate control charts with different VS procedures. A “forward” selection algorithm (FVS) has been combined with a Shewhart-type chart and with a multivariate EWMA (MEWMA) chart by Wang and Jiang (2009) and Jiang et al. (2012), respectively. Other VS algorithms have also been combined with an MEWMA-based control chart: the Least Absolute Shrinkage and Selection Operator (LASSO, see Tibshirani 1996) by Zou and Qiu (2009) and Zou et al. (2010), and Least Angle Regression (LAR, see Efron et al. 2004) by Capizzi and Masarotto (2011). The suggested monitoring schemes differ not only in the VS algorithm but also in other aspects. In particular, the control charts based on stepwise regression assume that the number of variables that can be potentially OC is fixed a priori; this condition has been relaxed for the LASSO- and LAR-based schemes (LEWMA and LAR-EWMA, hereafter). Indeed, these control charts assume that any, proper or improper, subset of the monitored variables can potentially shift. Further, LEWMA and LAR-EWMA are based on two slightly different control statistics. In addition, LAR-EWMA is developed not only for testing the status of the process mean but also for detecting an increase in the total variability.

To provide some guidelines on how to choose between different VS-based multivariate control charts and give some suggestions for further research, we here compare and discuss some VS-based control charts recently proposed in the SPC literature. For a more objective comparison, we use for all the investigated control charts the general regression model introduced in Capizzi and Masarotto (2011) for the LAR-EWMA. Indeed, this more general regression framework makes it possible to handle a wide variety of multivariate scenarios, involving not only shifts in the components of a multivariate mean vector but also changes in a profile or in a multistage process.

The paper is organized as follows. Section 2 briefly describes the procedures based on the variable selection algorithms. Section 3 presents the main results concerning comparisons, in terms of average run length (ARL), between some control schemes based on different VS-based algorithms. Details on the multivariate OC scenarios, discussed in the comparisons, are given in the Appendix. Concluding remarks are given in Sect. 4.

2 Statistical Monitoring Based on Variable Selection Algorithms

2.1 Generalities

Assume that at each time \(t\), \(t = 1,2,\ldots\), independent observations on \(\boldsymbol{y}_{t}\), an n × 1 vector of quality characteristics, are available, and consider the following Gaussian change-point model

$$\displaystyle{ \boldsymbol{y}_{t} \sim \left \{\begin{array}{ll} N_{n}(\boldsymbol{\mu },\boldsymbol{\varSigma }) &\mbox{ if }t <\tau \mbox{ (in-control)} \\ N_{n}(\boldsymbol{\mu }+\boldsymbol{\delta },\boldsymbol{\varOmega })&\mbox{ if }t \geq \tau \mbox{ (out-of-control)} \end{array} \right. }$$
(1)

that is, at τ, an unknown instant of time, the mean vector and the covariance matrix shift leading the process to an OC state. Further, we suppose that the IC mean vector \(\boldsymbol{\mu }\) and the IC covariance matrix \(\boldsymbol{\varSigma }\) are known.

Concerning the OC mean vector, we assume that, at least approximately, the mean shift \(\boldsymbol{\delta }\) takes the form

$$\displaystyle{ \boldsymbol{\delta }=\boldsymbol{ F}\boldsymbol{\beta }, }$$
(2)

where \(\boldsymbol{\beta }\) is a p × 1 vector of unknown parameters and \(\boldsymbol{F}\) a suitable n × p matrix of known constants. Thus, the mean vector may shift along any vector in the subspace spanned by the columns of \(\boldsymbol{F}\), allowing for a multitude of potential shift directions. As shown in Capizzi and Masarotto (2011), formulation (2) is sufficiently flexible to encompass a wide variety of change-point scenarios. Further, suppose there is a practical interest only in detecting an increase in the total dispersion and assume that \(\boldsymbol{\varOmega }-\boldsymbol{\varSigma }\) is a positive definite matrix.

Suppose process observations are accumulated in the following MEWMA

$$\displaystyle{ \boldsymbol{z}_{t} = (1-\lambda )\boldsymbol{z}_{t-1} +\lambda (\boldsymbol{y}_{t}-\boldsymbol{\mu }) }$$
(3)

with \(\boldsymbol{z}_{0} =\boldsymbol{ 0}_{n},\;\;0 <\lambda \leq 1\). Assuming the following (approximated) linear model

$$\displaystyle{\boldsymbol{z}_{t} =\boldsymbol{ F}\boldsymbol{\beta } +\boldsymbol{ a}_{t},}$$

with \(\boldsymbol{a}_{t} \sim N_{n}\left (\boldsymbol{0}_{n},\lambda /(2-\lambda )\boldsymbol{\varSigma }\right )\), the stability of the process mean can be checked by testing the hypothesis system

$$\displaystyle{ \left \{\begin{array}{l} H_{0}:\boldsymbol{\beta }=\boldsymbol{ 0}_{p}, \\ H_{1}:\boldsymbol{\beta } \neq \boldsymbol{0}_{p}.\end{array} \right. }$$
(4)

Unfortunately, the standard test, described in any regression textbook, for the hypothesis system (4) can show a very low sensitivity when only a few components of \(\boldsymbol{\beta }\) effectively shift, and a much more efficient approach should consider alternative hypothesis systems on reduced subsets of the parameters.
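As a minimal sketch of the two ingredients introduced so far, recursion (3) and the unrestricted test of (4) might be coded as follows. The function names are illustrative and do not come from the cited papers; NumPy is assumed. The test statistic is taken to be the usual GLS quadratic form, and the scalar factor in the covariance of \(\boldsymbol{a}_{t}\) is dropped because it only rescales the statistic.

```python
import numpy as np

def mewma_update(z_prev, y, mu, lam):
    """One step of recursion (3): z_t = (1 - lam) * z_{t-1} + lam * (y_t - mu)."""
    return (1.0 - lam) * z_prev + lam * (y - mu)

def full_model_statistic(z, F, Sigma):
    """GLS quadratic form for H0: beta = 0 against the unrestricted H1.

    Assumption: the statistic is beta_hat' F' Sigma^{-1} F beta_hat, with
    beta_hat the GLS estimate from z = F beta + a (scalar factor omitted).
    """
    Si_F = np.linalg.solve(Sigma, F)   # Sigma^{-1} F
    A = F.T @ Si_F                     # F' Sigma^{-1} F
    b = Si_F.T @ z                     # F' Sigma^{-1} z (Sigma is symmetric)
    beta_hat = np.linalg.solve(A, b)
    return float(beta_hat @ A @ beta_hat)

# Toy illustration with n = p = 3, F = I and Sigma = I.
z1 = mewma_update(np.zeros(3), np.array([1.0, 2.0, 3.0]), np.zeros(3), lam=0.1)
q1 = full_model_statistic(z1, np.eye(3), np.eye(3))
```

Because this statistic sums the contributions of all p coefficients, the noise from the unshifted ones dilutes the signal when only a few components of \(\boldsymbol{\beta }\) change.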

A promising solution consists of using a suitable VS algorithm for determining subsets, having different sizes, of suspected variables, i.e., subsets of columns of \(\boldsymbol{F}\) corresponding to nonzero coefficients. In particular, for \(k = 1,\ldots,p\), denote with \(J_{k} =\{ j_{k,1},\ldots,j_{k,k}\}\) the indices of the selected predictors. Since the set of coefficients \(\{\beta _{j_{k,1}},\ldots,\beta _{j_{k,k}}\}\) corresponds to a plausible subset of possible out-of-control parameters, the VS-based control statistics, for \(k = 1,\ldots,p\), test the following hypothesis systems:

$$\displaystyle{ \left \{\begin{array}{ll} H_{0}': &\beta _{j} = 0\mbox{ for }j = 1,\ldots,p, \\ H_{1,k}':&\beta _{j}\neq 0\mbox{ if }j \in J_{k}\mbox{ and }\beta _{j} = 0\mbox{ if }j\not\in J_{k}. \end{array} \right. }$$
(5)

2.2 Three Different Approaches

Three distinct methods have been suggested for testing the hypothesis system (5).

In Wang and Jiang (2009) and Jiang et al. (2012), users are requested to choose in advance a suitable value for \(k\). Then, for \(t = 1,2,\ldots\), a standard forward search algorithm is used to select \(J_{k}\), and an OC alarm is signaled when the following control statistic

$$\displaystyle{ S_{t,k} =\boldsymbol{\hat{\beta }} _{t,k}'\boldsymbol{F}'\boldsymbol{\varSigma }^{-1}\boldsymbol{F}\boldsymbol{\hat{\beta }}_{ t,k} }$$
(6)

is greater than the control limit chosen to give a desired IC performance. Here, \(\boldsymbol{\hat{\beta }}_{t,k}\) denotes the GLS estimate of \(\boldsymbol{\beta }\) obtained under \(H_{1,k}'\), i.e., constraining to zero the coefficients of the predictors not in \(J_{k}\).
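The forward search and the statistic (6) can be sketched as below. This is an illustrative greedy implementation, not the code of the cited authors: at each step it adds the column of \(\boldsymbol{F}\) that most increases the GLS quadratic form, which is equivalent to the largest reduction in the weighted residual sum of squares. The function names and the toy data are assumptions.

```python
import numpy as np

def restricted_statistic(z, F, Sigma_inv, J):
    """S_{t,k} in (6): GLS fit using only the columns of F indexed by J."""
    FJ = F[:, J]
    A = FJ.T @ Sigma_inv @ FJ          # F_J' Sigma^{-1} F_J
    b = FJ.T @ Sigma_inv @ z           # F_J' Sigma^{-1} z
    beta_J = np.linalg.solve(A, b)     # nonzero part of the restricted estimate
    return float(beta_J @ A @ beta_J)

def forward_selection(z, F, Sigma, k):
    """Greedy forward search for J_k; returns (J_k, S_{t,k})."""
    Sigma_inv = np.linalg.inv(Sigma)
    selected = []
    for _ in range(k):
        candidates = [j for j in range(F.shape[1]) if j not in selected]
        # Add the predictor giving the largest statistic when joined to the set.
        best = max(candidates,
                   key=lambda j: restricted_statistic(z, F, Sigma_inv, selected + [j]))
        selected.append(best)
    return selected, restricted_statistic(z, F, Sigma_inv, selected)

# Toy illustration: with F = I and Sigma = I, the search picks the largest |z_j|.
J2, S2 = forward_selection(np.array([0.1, -2.0, 0.5]), np.eye(3), np.eye(3), k=2)
```

An alarm would be raised when the returned statistic exceeds the control limit calibrated for the chosen k.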

In Zou and Qiu (2009), \(J_{1},\ldots,J_{p}\) are determined using the LASSO algorithm. Then, for \(k = 1,\ldots,p\), the authors suggest computing the control statistic

$$\displaystyle{ V _{t,k} = \dfrac{(\boldsymbol{z}_{t}'\boldsymbol{\varSigma }^{-1}\boldsymbol{F}\tilde{\boldsymbol{\beta }}_{t,k})^{2}} {\tilde{\boldsymbol{\beta }}_{t,k}'\boldsymbol{F}'\boldsymbol{\varSigma }^{-1}\boldsymbol{F}\tilde{\boldsymbol{\beta }}_{t,k}}, }$$
(7)

where \(\tilde{\boldsymbol{\beta }}_{t,k}\) denotes the LASSO estimator of \(\boldsymbol{\beta }\) obtained under \(H_{1,k}'\). An OC alarm is given when the overall control statistic

$$\displaystyle{ W_{t} =\max _{k=1,\ldots,p}\dfrac{V _{t,k} - E[V _{t,k}]} {\sqrt{\mathit{Var } [V _{t,k } ]}} }$$
(8)

is greater than a suitable control limit. In (8), the mean and standard deviation of (7) are computed under the null hypothesis.
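The elementary statistic (7) can be illustrated with a short sketch that takes the estimate \(\tilde{\boldsymbol{\beta }}_{t,k}\) as given; no LASSO solver is implemented here, and the function name is an assumption. Note that (7) is unchanged when the estimate is multiplied by a nonzero constant, so only the direction of \(\boldsymbol{F}\tilde{\boldsymbol{\beta }}_{t,k}\) matters, not the amount of shrinkage.

```python
import numpy as np

def v_statistic(z, F, Sigma, beta_tilde):
    """V_{t,k} in (7) for a given (nonzero) coefficient estimate beta_tilde."""
    fitted = F @ beta_tilde                    # F beta_tilde
    Si_fitted = np.linalg.solve(Sigma, fitted) # Sigma^{-1} F beta_tilde
    num = float(z @ Si_fitted) ** 2            # (z' Sigma^{-1} F beta_tilde)^2
    den = float(fitted @ Si_fitted)            # beta_tilde' F' Sigma^{-1} F beta_tilde
    return num / den

# Toy illustration: a shrunken estimate gives the same value as an unshrunken one.
z = np.array([1.0, 2.0])
v_full = v_statistic(z, np.eye(2), np.eye(2), np.array([0.0, 1.0]))
v_shrunk = v_statistic(z, np.eye(2), np.eye(2), np.array([0.0, 0.5]))
```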

Alternatively, Capizzi and Masarotto (2011) suggest selecting \(J_{1},\ldots,J_{p}\) using the LAR algorithm and, for each \(k = 1,\ldots,p\), computing the statistic \(S_{t,k}\). Since it is important to detect not only changes in the process mean but also increases in the dispersion, Capizzi and Masarotto (2011) also consider the additional alternative hypothesis

$$\displaystyle{H_{1,p+1}':\boldsymbol{\beta }=\boldsymbol{ 0}_{p}\;\mbox{ and }E\{(\boldsymbol{y}_{t}-\boldsymbol{\mu })'\boldsymbol{\varSigma }^{-1}(\boldsymbol{y}_{ t}-\boldsymbol{\mu })\} > n,}$$

and the related one-sided EWMA statistic

$$\displaystyle{ S_{t,p+1} =\max \left (1,(1-\lambda )S_{t-1,p+1} +\lambda \dfrac{(\boldsymbol{y}_{t}-\boldsymbol{\mu })'\boldsymbol{\varSigma }^{-1}(\boldsymbol{y}_{ t}-\boldsymbol{\mu })} {n} \right ), }$$
(9)

with \(S_{0,p+1} = 1\). Then, the LAR-based EWMA, for jointly monitoring the process mean and dispersion, is given by the aggregation of the p + 1 statistics

$$\displaystyle{ M_{t} =\max _{k=1,\ldots,p+1}\dfrac{S_{t,k} - E[S_{t,k}]} {\sqrt{\mathrm{Var } [S_{t,k } ]}}, }$$
(10)

where \(S_{t,k}\) is given by (6) for \(k = 1,\ldots,p\), and by (9) for \(k = p + 1\). The combined control statistic (10) triggers an alarm when it exceeds a suitable control limit.
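A minimal sketch of the dispersion recursion (9) and of the aggregation step (10) is given below. The function names are hypothetical, and the IC means and standard deviations passed to the aggregation are placeholder numbers; in practice they would be obtained separately, for instance by simulation.

```python
import numpy as np

def dispersion_update(s_prev, y, mu, Sigma, lam):
    """One step of the one-sided EWMA (9), started from S_{0,p+1} = 1."""
    d = y - mu
    q = float(d @ np.linalg.solve(Sigma, d)) / d.size   # (y - mu)' Sigma^{-1} (y - mu) / n
    return max(1.0, (1.0 - lam) * s_prev + lam * q)

def aggregate(stats, ic_means, ic_sds):
    """Combined statistic (10): maximum of the standardized elementary statistics."""
    stats, ic_means, ic_sds = (np.asarray(a, dtype=float)
                               for a in (stats, ic_means, ic_sds))
    return float(np.max((stats - ic_means) / ic_sds))

# Toy illustration with n = 2 and placeholder IC moments.
s1 = dispersion_update(1.0, np.array([2.0, 2.0]), np.zeros(2), np.eye(2), lam=0.1)
m1 = aggregate([3.0, 5.0, s1], [1.0, 4.0, 1.0], [1.0, 2.0, 0.5])
```

The reflection at 1 in (9) keeps the dispersion statistic from drifting below its IC level, so that it reacts only to increases in total variability.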

2.3 First Recommendations and Open Questions

As shown in Zou and Qiu (2009), Capizzi and Masarotto (2011), and Jiang et al. (2012), control charts like \(W_{t}\) and \(M_{t}\) offer good protection against shifts occurring in one, two, …, all components. Although the resulting scheme is not necessarily the best for detecting a shift occurring in a fixed number of components, it is usually close to the best. Conversely, control charts using a fixed value of \(k\), such as those proposed by Wang and Jiang (2009) and Jiang et al. (2012), offer the best protection when shifts involve exactly \(k\) variables and unavoidably inferior protection when a shift occurs in a number of components different from the fixed value. Further, statistics such as \(W_{t}\) and \(M_{t}\) do not need an a priori choice of \(k\). Thus, we suggest using an aggregated control statistic.

In addition, we strongly recommend including a control statistic, like \(S_{t,p+1}\), designed for detecting a change in the dispersion. Indeed, joint monitoring of the process mean and dispersion is relevant per se but also provides some level of robustness against modeling errors and unforeseen behaviors. Further, as shown in the univariate case by Reynolds and Stoumbos (2005, 2006), the inclusion of a variance control statistic can be helpful for efficiently detecting large changes in the mean.

In the following, we study by simulation the ARL performance of VS-based control charts and address two additional issues: (1) Which variable selection algorithm should be used? (2) Which elementary control statistic is better for monitoring, \(S_{t,k}\) or \(V_{t,k}\)?

3 A Simulation Study

To address some of the issues discussed in the previous section, we compare five VS-based monitoring schemes. As recommended, all the schemes are based on a combination, similar to \(M_{t}\), of p elementary control statistics used for detecting a mean shift and of the control statistic \(S_{t,p+1}\), given by (9), for detecting increases in the total variation.

Details of the five control charts are given in Table 1. Note that, when a forward stepwise search is used, we have, for each \(k\), that \(S_{t,k} = V_{t,k}\). Thus, we present only one scheme for the forward VS algorithm. In contrast, the control statistics \(V_{t,k}\), given in (7), are here based on the LASSO- and LAR-based estimators of the vector \(\boldsymbol{\beta }\). Observe that in (6) and (7), at each stage \(k\), the nonzero elements obtained via these three different VS algorithms are not necessarily the same.

Table 1 Five VS-based control charts

Concerning the choice of the smoothing constant, as suggested in the literature (Lucas and Saccucci 1990; Prabhu and Runger 1997; Zou and Qiu 2009; Capizzi and Masarotto 2011; Jiang et al. 2012), a reasonable choice for normally distributed observations is between 0.1 and 0.3. The performance of the different VS-based schemes has been investigated for different values of λ and τ. Because results are comparable for all the choices of these tuning constants, the results reported below refer only to λ = 0.1 and τ = 1. The five VS-based control charts are compared in terms of out-of-control ARL evaluated using 500,000 Monte Carlo replications. The control limits, giving an in-control ARL equal to 500, have been computed using a stochastic approximation algorithm (Ruppert 1991; Polyak and Juditsky 1992). Within a reasonable number of iterations, the algorithm estimates the control limits with a given level of accuracy. Table 1 lists the estimates of the control limits for the five VS-based control charts. In addition, the mean and standard deviation of the elementary statistics \(S_{t,k}\) and \(V_{t,k}\), for \(k = 1,\ldots,p + 1\), were computed by simulation.
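To make the Monte Carlo evaluation of an ARL concrete, the sketch below estimates it for a plain MEWMA chart with identity covariance and τ = 1. It is deliberately simpler than the study described above: it does not implement any VS-based statistic or the stochastic approximation used for the control limits, and the function name, default values and replication count are assumptions chosen so that the example runs quickly.

```python
import numpy as np

def simulate_arl(limit, lam=0.1, n=3, shift=None, reps=200, max_steps=10_000, seed=0):
    """Monte Carlo ARL of a chart that signals when z_t' z_t > limit.

    Assumptions: Sigma = I, IC mean zero, shift present from t = 1 (tau = 1).
    Each replication uses its own seeded generator so that runs with
    different limits see exactly the same simulated paths.
    """
    shift = np.zeros(n) if shift is None else np.asarray(shift, dtype=float)
    lengths = []
    for r in range(reps):
        rng = np.random.default_rng([seed, r])
        z = np.zeros(n)
        for t in range(1, max_steps + 1):
            y = rng.standard_normal(n) + shift
            z = (1.0 - lam) * z + lam * y      # MEWMA recursion (3) with mu = 0
            if z @ z > limit:
                break                          # alarm at time t
        lengths.append(t)                      # run length (censored at max_steps)
    return float(np.mean(lengths))

arl_low = simulate_arl(0.2, reps=20)
arl_high = simulate_arl(0.5, reps=20)
```

Since the same paths are reused, raising the limit can only lengthen each run, which is the monotonicity that a control-limit search relies on when targeting a given IC ARL.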

Suitable choices of the matrix \(\boldsymbol{F}\) lead to several change-point models, such as the “unstructured” scenario, when changes directly involve the components of the multivariate mean vector, and several “structured” scenarios, such as those involving changes in a profile, that is, in the relationship between a response variable and one or more explanatory variables, and in a multistage process. Details for the components of the vector \(\boldsymbol{\beta }\) that are supposed to change are listed, for each change-point model, in the Appendix. In every case, several possible mean shifts are considered, including shifts in a single parameter, equal and different shifts in a pair of parameters, shifts of the same size in either even or odd components and shifts in variance. Here, we briefly describe the scenarios examined in the simulation study.

3.1 Unstructured

In this case \(p = n\), the matrix \(\boldsymbol{F}\) reduces to the identity matrix \(\boldsymbol{F} =\boldsymbol{ I}_{n}\) and the \(i\)-th element of \(\boldsymbol{\beta }\) directly points to a mean shift of the \(i\)-th quality characteristic, i.e., \(\delta _{i} =\beta _{i}\). Following the example in Zou and Qiu (2009), we consider \(p = n = 15\) and assume that the IC distribution is \(N_{n}(\boldsymbol{0}_{n},\boldsymbol{\varSigma })\) with \(\boldsymbol{\varSigma }= (\sigma _{ij}) = (0.75^{\vert i-j\vert })\) for \(i,j = 1,2,\ldots,n\) and the OC distribution \(N_{n}(\boldsymbol{\beta },\omega ^{2}\boldsymbol{\varSigma })\) with ω > 1.
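The unstructured setup can be written down directly. The sketch below builds the covariance matrix and draws one OC observation; the shift vector and the value of ω are arbitrary illustrative choices, not scenarios taken from the Appendix.

```python
import numpy as np

def ar1_type_covariance(n, rho):
    """Matrix with entries rho ** |i - j|, as in the unstructured scenario."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n = 15
Sigma = ar1_type_covariance(n, 0.75)
F = np.eye(n)                      # unstructured case: delta = beta

# One OC draw from N(beta, omega^2 * Sigma); beta and omega are illustrative.
beta = np.zeros(n)
beta[0] = 1.0
omega = 1.2
rng = np.random.default_rng(0)
y_oc = rng.multivariate_normal(F @ beta, omega**2 * Sigma)
```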

3.2 Linear and Cubic Profiles

Under this scenario, we assume that

$$\displaystyle{y_{t,i} = \left \{\begin{array}{@{}l@{\quad }l@{}} \epsilon _{t,i} \quad &\mbox{ if }t <\tau \\ \beta _{1} +\beta _{2}x_{i} + \cdots +\beta _{p}x_{i}^{p-1} +\epsilon _{t,i}\quad &\mbox{ if }t \geq \tau \end{array} \right.}$$

with \(x_{i} = (2i - n - 1)/(n - 1)\), for \(i = 1,\ldots,n\). Here, \(\epsilon _{t,i}\) are independent, zero-mean, Gaussian random variables, with the IC and OC variance equal to one and \(\omega ^{2} > 1\), respectively. Thus, in the described scenario, \(\boldsymbol{F} = (f_{i,j}) = (x_{i}^{j-1})\), \(\boldsymbol{\varSigma }=\boldsymbol{ I}_{n}\) and \(\boldsymbol{\varOmega }=\omega ^{2}\boldsymbol{I}_{n}\). In particular, we consider linear (p = 2) and cubic (p = 4) profiles with n = 4 and n = 8 observations, respectively.
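For the polynomial profiles, the design matrix follows directly from the definitions of \(x_{i}\) and \(f_{i,j}\). A short sketch, with a hypothetical function name:

```python
import numpy as np

def profile_design(n, p):
    """F = (x_i ** (j - 1)) with x_i = (2i - n - 1) / (n - 1), i = 1..n, j = 1..p."""
    i = np.arange(1, n + 1)
    x = (2 * i - n - 1) / (n - 1)              # equispaced points in [-1, 1]
    return np.vander(x, N=p, increasing=True)  # columns 1, x, x^2, ...

F_linear = profile_design(4, 2)   # linear profile: n = 4, p = 2
F_cubic = profile_design(8, 4)    # cubic profile:  n = 8, p = 4
```

In the linear case the second column is (-1, -1/3, 1/3, 1), so the intercept and slope columns are orthogonal.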

3.3 Nonparametric Profiles

To investigate the performance of the VS-based control chart for nonparametric monitoring of non-linear profiles, we use the same IC model considered by Zou et al. (2008), \(y_{t,i} = 1 -\exp (-x_{i}) +\epsilon _{t,i},\) where \(x_{i} = (i - 0.5)/20\), \(i = 1,\ldots,20\), and the following three OC models:

  I. \(y_{t,i} = 1 -\beta _{1}\exp (-x_{i}^{\beta _{2}}) +\epsilon _{t,i}\);

  II. \(y_{t,i} = 1 -\exp (-x_{i}) +\beta _{1}\cos (\beta _{2}\pi (x_{i} - 0.5)) +\epsilon _{t,i}\);

  III. \(y_{t,i} = 1 -\exp (-x_{i} -\beta _{1}\max (0,(x_{i} -\beta _{2})/(1 -\beta _{2}))^{2}) +\epsilon _{t,i}\).

Here, \(\epsilon _{t,i}\) are independent, zero-mean, Gaussian random variables, with the IC and OC variance equal to one and \(\omega ^{2} > 1\), respectively. In this case, we set \(\boldsymbol{F}\) equal to the basis matrix of a cubic spline with four equispaced knots within the interval [0, 1].
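The mean functions of the IC model and of the three OC models can be sketched as follows; the spline basis matrix is not constructed here. The function names are hypothetical, and the parameter values in the example are arbitrary. Each OC model reduces to the IC profile for a particular parameter value (\(\beta _{1} =\beta _{2} = 1\) for model I, \(\beta _{1} = 0\) for models II and III).

```python
import numpy as np

x = (np.arange(1, 21) - 0.5) / 20          # design points x_i, i = 1..20

def ic_mean(x):
    return 1.0 - np.exp(-x)

def oc_mean_I(x, b1, b2):
    return 1.0 - b1 * np.exp(-x ** b2)

def oc_mean_II(x, b1, b2):
    return 1.0 - np.exp(-x) + b1 * np.cos(b2 * np.pi * (x - 0.5))

def oc_mean_III(x, b1, b2):
    # Requires b2 < 1; the profile is distorted only for x_i > b2.
    bump = np.maximum(0.0, (x - b2) / (1.0 - b2)) ** 2
    return 1.0 - np.exp(-x - b1 * bump)

# One simulated OC profile from model III; parameter values are illustrative.
rng = np.random.default_rng(0)
omega = 1.0
y = oc_mean_III(x, 2.0, 0.5) + omega * rng.standard_normal(x.size)
```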

3.4 Multistage Processes

We consider an n-stage process representable by the linear state-space model

$$\displaystyle{\left \{\begin{array}{l} y_{t,i} =\mu _{i} + c_{i}x_{t,i} + v_{t,i} \\ x_{t,i} = d_{i}x_{t,i-1} +\beta _{i}I_{\{t\geq \tau \}} + w_{t,i}\end{array} \right.\quad (i = 1,\ldots,n)}$$

where \(v_{t,i}\) and \(w_{t,i}\) are independent normal random variables with zero mean. The n elements of the \(\boldsymbol{\beta }= (\beta _{i})\) vector define the magnitude of the shifts and the stages at which the shifts occur. It is easy to show that this model is a particular case of (1). See Capizzi and Masarotto (2011) for the details and, in particular, for the structure of the \(\boldsymbol{F}\) and \(\boldsymbol{\varSigma }\) matrices. In the simulation, we fix the number of stages to n = 10 and investigate the performance for different shift locations, occurring in one, two, five and all stages assuming that \(\mu _{i} = 0\), \(c_{i} = d_{i} = \mbox{ var}(w_{t,i}) = \mbox{ var}(v_{t,i}) = 1\) for every i.
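One way to see that the multistage model fits (1) is to unroll the state recursion over the stages. The sketch below does this under two assumptions that are not stated in the text above: the initial state \(x_{t,0}\) is zero, and the disturbances are independent across stages. It is a derivation for illustration only; the construction in Capizzi and Masarotto (2011) should be consulted for the authors' own formulation.

```python
import numpy as np

def multistage_matrices(c, d, var_w, var_v):
    """F and Sigma implied by unrolling the state equation over stages.

    Assumptions (ours): x_{t,0} = 0 and mutually independent w's and v's.
    Then E[y_i] - mu_i = c_i * sum_{j <= i} (d_{j+1} * ... * d_i) * beta_j.
    """
    c, d, var_w, var_v = (np.asarray(a, dtype=float) for a in (c, d, var_w, var_v))
    n = c.size
    G = np.zeros((n, n))                  # G[i, j] = product of d over stages j+1..i
    for i in range(n):
        G[i, i] = 1.0
        for j in range(i - 1, -1, -1):
            G[i, j] = G[i, j + 1] * d[j + 1]
    F = c[:, None] * G                    # lower-triangular shift-propagation matrix
    Sigma = F @ np.diag(var_w) @ F.T + np.diag(var_v)
    return F, Sigma

# Simulation setting of the text: n = 10 and all constants equal to one.
ones = np.ones(10)
F, Sigma = multistage_matrices(ones, ones, ones, ones)
```

With all constants equal to one, \(\boldsymbol{F}\) is a lower-triangular matrix of ones, so a shift introduced at stage j propagates unchanged to every later stage.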

3.5 Results

Results are summarized in Fig. 1, which shows the following percent relative differences

$$\displaystyle{ 100 \cdot \frac{\mathrm{ARL}_{\mathit{rs}} -\mathrm{ MARL}_{r}} {\mathrm{MARL}_{r}},\;\;s = 1,\ldots,5, }$$
(11)

i.e., the percent relative differences between \(\mathrm{ARL}_{\mathit{rs}}\), the OC ARL of the s-th control chart in the r-th OC scenario, and \(\mathrm{MARL}_{r}\), the mean of the five out-of-control ARL values, one for each control chart, obtained for the r-th OC scenario. Observe that the number of the OC scenarios is different for the different cases. In particular, \(r = 1,\ldots,18\) for the case of nonparametric and multistage process monitoring, \(r = 1,\ldots,23\) for the monitoring of linear and cubic profiles and \(r = 1,\ldots,30\) in the unstructured case (see the Appendix for a detailed description of each OC scenario).

Fig. 1 Relative ARL differences of five VS-based control charts for several OC scenarios (see the Appendix for labels in the x axis)

A negative (positive) value of (11) can be interpreted as a quicker (slower) reaction of the s-th control chart to the r-th OC situation, when compared to the other VS-based control charts.
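The summary measure (11) is a simple row-wise computation. In the sketch below the ARL values are made-up numbers used only to show the arithmetic; they are not results from the study.

```python
import numpy as np

def relative_arl_differences(arl):
    """Percent relative differences (11); rows are OC scenarios, columns are charts."""
    arl = np.asarray(arl, dtype=float)
    marl = arl.mean(axis=1, keepdims=True)     # MARL_r, one value per scenario
    return 100.0 * (arl - marl) / marl

# Hypothetical OC ARLs for two scenarios and five charts.
arl = [[10.0, 11.0, 9.0, 10.0, 10.0],
       [40.0, 50.0, 60.0, 50.0, 50.0]]
rel = relative_arl_differences(arl)
```

By construction the values in each row sum to zero, so the measure ranks the charts within a scenario rather than on an absolute scale.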

Results show that, regardless of the multivariate control chart used, the forward- and the LAR-based schemes behave similarly, and this behavior is quite stable throughout all the different practical contexts. In contrast, while LASSO shows an essentially negligible advantage for some out-of-control situations, it can also show a relatively large degradation for other applications, such as nonparametric profile monitoring. Further, monitoring schemes based on the same VS algorithm but on different elementary control statistics, i.e., on \(S_{t,k}\) or \(V_{t,k}\), offer essentially the same performance.

4 Conclusions

In this paper, we compared the performance of multivariate control charts based on three different variable selection procedures. In particular, the compared multivariate control charts have been implemented for detecting many out-of-control conditions, even involving increases in process variation, for several MSPC frameworks. Results show that whereas control charts consisting of the aggregation of several control statistics, such as those proposed by Zou and Qiu (2009) and Capizzi and Masarotto (2011), behave quite similarly for different OC applications, suggestions can be given to practitioners concerning the particular variable selection procedure to use. As discussed before, the LASSO-based control charts can show an unsatisfactory performance in detecting some particular OC situations. However, the forward and LAR-based procedures can be considered substantially equivalent in terms of OC ARL performance for the investigated practical applications. Thus, from a practical point of view, multivariate control charts based on forward selection could be more appealing to users since these charts are more intuitive and simpler to implement.