
1 Introduction

Statistical process control (SPC) charts are widely used for monitoring the stability of sequential processes in various disciplines, including manufacturing and healthcare systems. Typically, SPC charts assume that there are two causes of variability in the process measurements: “common cause” variability, which is due to unavoidable randomness, and “special cause” variability, which arises when an undesirable source of variation intervenes, for example, mechanical defects, improper handling of machines, human errors, the onset of certain medical conditions, etc. When the variability is due to common causes only, the process is said to be “in-control”. “In-control” process measurements can be considered as realizations of a random model, for example, independent and identically distributed (i.i.d.) observations from a cumulative distribution function (c.d.f.) F1. When a special cause interferes, the process measurements no longer appear as i.i.d. realizations of F1, and the system is said to be “out-of-control”. Practitioners typically divide SPC into two phases. Initially, a set of process measurements is analyzed in Phase I. If any “unusual” patterns are found in the process measurements, practitioners make necessary adjustments and fine-tune the system. After removing all such special causes, we have a clean set of process measurement data collected under stable operating conditions, which are representative of the actual process performance. The major goal of Phase II SPC control charts is to detect any change in the process distribution after an unknown time-point.

A change in the process distribution may not always be in location or scale only; it can be more general, for example, a change in the degrees of freedom of a chi-squared distribution. Furthermore, changes can either be isolated, i.e., the system goes “out-of-control” for a short time and then returns to being “in-control,” or persistent, i.e., once the system goes “out-of-control,” it remains “out-of-control” or drifts even further away from control until the special causes are removed. Among existing SPC charts in the literature, Shewhart-type charts [1] are used to detect isolated changes, and cumulative sum (CUSUM)-type charts (e.g., [2]) are used to detect persistent changes. However, most SPC charts consider shifts in location and/or scale, because such shifts are the most common and often capture other departures. In real-world problems, shifts in skewness and kurtosis can happen without much change in location and scale; for example, the shape of the process distribution may gradually change over time without much change in mean or variance. If we fail to detect those changes and let the process run, the situation can eventually become worse, and a shift in location and scale can creep in. Moreover, the special causes that initiated the change can cause more damage to the system, which may become a challenge to fix. If we can detect such a change in skewness, we can avoid subsequent troubles. Therefore, it is imperative to develop an SPC chart that can detect changes in the process distribution in general. The proposed chart focuses precisely on this.

Various types of SPC control charts have been proposed in the literature, including Shewhart-type charts [1], cumulative sum (CUSUM)-type charts (e.g., [2]), exponentially weighted moving average (EWMA) charts [3], change-point-based charts (e.g., [4, 5]), etc. Many control charts assume that the “in-control” process distribution F1 is either known or has a known parametric form (e.g., normal). In real-life problems, this is usually not the case. It has been demonstrated in the literature that SPC charts using a prespecified distribution in their designs may not be reliable in such cases (e.g., [6, 7]). To address this, a number of nonparametric or distribution-free SPC charts have been proposed; see, for example, [8,9,10,11,12,13,14,15,16,17,18]. An overview of nonparametric SPC charts is provided in [19]. Discussions of multivariate cases are provided by Qiu and Hawkins [20, 21] and Qiu [22]. Some SPC charts (e.g., [23,24,25]) monitor the process mean and variance jointly. Moustakides [26] proposes a method to detect distributional changes when both “in-control” and “out-of-control” distributions are known. Ross and Adams [27] propose two nonparametric control charts to detect arbitrary distributional changes under the change-point detection (CPD) framework when both “in-control” and “out-of-control” distributions are unknown. Mukherjee [28] also proposes a Phase II nonparametric SPC chart for detecting arbitrary distributional changes, but it requires a substantial amount of “in-control” Phase I data. A thorough literature review on SPC charts can be found in [29] as well as in [30].

Most existing SPC charts mentioned above focus on detecting changes in process location and scale but do not consider an arbitrary distributional change. Moreover, some methods require multiple observations at each time-point, and some others require “in-control” Phase I observations. These requirements may not be reasonable in many real-life problems. This chapter focuses on univariate continuous processes and proposes a p-value-based nonparametric SPC chart to detect an arbitrary distributional change when the collected observations can be assumed independent, the “in-control” c.d.f. F1 is unknown, and “in-control” Phase I data are unavailable. p-value-based SPC charts are a recent trend, as some researchers have already started developing them for better interpretability, e.g., [31, 32].

Additionally, the proposed chart combines the strengths of the Cramer-von Mises test and the Ansari-Bradley test and demonstrates better performance in detecting an arbitrary distributional change early. It has been demonstrated in the literature [27] that the power of the two-sample Cramer-von Mises test to detect a small change in standard deviation or scale is rather weak. The proposed chart overcomes that by integrating this test with the Ansari-Bradley test, a nonparametric test for detecting scale changes. The integration process is quite general in nature, and hence similar integration techniques can be used to design better control charts in many scenarios.

The major steps of the proposed SPC chart are the following. First, we estimate the possible change-point that gives the minimum p-value of the relevant two-sample Cramer-von Mises test statistic, using a computationally efficient method. Using that change-point, we calculate the p-value of the Ansari-Bradley test, take the smaller of the two p-values as the effective p-value, and use it as the charting statistic. Since it is demonstrated in the literature (e.g., [27]) that the Cramer-von Mises test does not have strong power to detect scale changes, if there is indeed a scale change in the process distribution, the Ansari-Bradley test gives a smaller p-value than the Cramer-von Mises test. Therefore, to detect such changes, it is better to use the smaller of the two p-values. If the process distribution changes without affecting the scale parameter, the Cramer-von Mises test gives a smaller p-value than the Ansari-Bradley test, so using the smaller of the two p-values is also better for detecting such a change. If the effective p-value is large enough and the effective sequence of observations is too long, we prune observations from the distant past and collect the next observation. The amount of pruning is a nondecreasing function of the effective p-value: the higher the p-value, the more likely it is that no distributional change has occurred, and it is therefore better to ignore information from the distant past to speed up computation. If the effective p-value is small enough, the chart signals a distributional change.

The remainder of the chapter is organized as follows. At the end of this paragraph, a nomenclature subsection is provided so that this chapter can be read smoothly. Next, brief descriptions of traditionally used SPC charts are provided in Sect. 19.2. The proposed control chart is described in Sect. 19.3. Numerical studies to evaluate its performance in comparison with several existing control charts are presented in Sect. 19.4. A climatological data analysis and a blood sugar monitoring data analysis by the proposed chart and its competitors are presented in Sect. 19.5. A few remarks in Sect. 19.6 conclude this chapter.

2 Traditionally Used SPC Charts

Statistical process control of a production process is roughly divided into two phases: Phase I and Phase II. In Phase I, i.e., in the initial stage, we usually do not have enough information about the performance of the production process, and our major goal in this stage is to adjust the production process so that it can run in a stable manner. First, we usually let the process produce a given number of products, and then the quality characteristics of these products are analyzed. If the statistical analysis of these data indicates that the process is not running stably, we try to figure out the root causes and make adjustments to the process so that it can run stably. After the adjustments, another set of data is collected and analyzed, and the production process is adjusted again, if necessary. The analysis and adjustment process is iterated several times until we are confident that the performance of the production process is stable. Once all “special causes” have been removed and the production process is “in-control,” we collect an “in-control” dataset from the products manufactured under the stable operating conditions, and the “in-control” data are used for estimating the “in-control” distribution of the quality characteristic(s) of interest. Based on the actual “in-control” distribution if known, or the estimated “in-control” distribution otherwise, a Phase II SPC control chart is designed. It is typically used for online monitoring of the production process. When the chart detects a significant change in the distribution of the quality characteristic(s) from the “in-control” distribution, it gives a signal, and the production process is stopped immediately for identification and removal of the root cause of the change. This online monitoring stage is often called Phase II SPC.

In both Phase I and Phase II of SPC, many statistical tools such as histograms, stem-and-leaf plots, regression, and design of experiments are very helpful. Among these, control charts are especially useful since they are constructed specifically to detect “out-of-control” performance of the production process. A charting statistic should be chosen such that it contains as much of the information in the observed data about the distribution of the quality characteristic(s) as possible and is sensitive to any distributional change. In the literature, different types of control charts have been developed, including the Shewhart charts, the cumulative sum (CUSUM) charts, the exponentially weighted moving average (EWMA) charts, charts based on change-point detection (CPD), and so on. Brief descriptions of some of these control charts are provided below.

2.1 Shewhart Chart

The first control chart was proposed by Shewhart [1] in 1931. The chart assumes that the quality variable X follows a normal distribution N(μ0, σ2) and that at each time-point we obtain m independent quality observations. Denote \(\left (X_{n1}, X_{n2}, \ldots , X_{nm}\right )\) to be the n-th batch of observations, where the batch size is m ≥ 2. A traditional z-test is used to check whether the process observations are “in-control” at the n-th time-point. The process is considered “out-of-control” when

$$\displaystyle \begin{aligned} \left|\overline{X}_n - \mu_0\right| > z_{1-\alpha/2}\, \frac{\sigma}{\sqrt{m}}, \end{aligned}$$
where \(\overline {X}_n\) is the sample mean of \(\left (X_{n1}, X_{n2}, \ldots , X_{nm}\right )\) and z1−α∕2 is the (1 − α∕2)-th quantile of the standard normal distribution. This version can be used when both μ0 and σ are known. However, this is usually not the case in reality. In that case, they have to be estimated from a dataset known to be “in-control.” Suppose \(\left (X^\ast _{i1}, X^\ast _{i2}, \ldots , X^\ast _{im}\right )\), i = 1, 2, …, M, is an “in-control” dataset. Let \(\overline {X}^\ast _i\) and \(R^\ast _i\) be the sample mean and sample range of the i-th batch of the “in-control” dataset, and let \(\overline {\overline {X}^\ast }\) and \(\overline {R}^\ast \) be the sample means of \(\{\overline {X}^\ast _i, i = 1, 2, \ldots , M\}\) and \(\{R^\ast _i, i = 1, 2, \ldots , M\}\), respectively. It can be easily verified that \(\overline {\overline {X}^\ast }\) is an unbiased estimator of μ0 and \(\overline {R}^\ast /d_1(m)\) is an unbiased estimator of σ, where \(d_1(m) = E(R_i^\ast /\sigma )\) is a constant depending on m. When m = 2, d1(m) = 1.128; when m = 5, d1(m) = 2.326. Values of d1(m) for many other commonly used m are provided in Table 3.1 of [30]. Therefore, the Shewhart chart signals a shift in process mean if

$$\displaystyle \begin{aligned} \left|\overline{X}_n - \overline{\overline{X}^\ast}\right| > z_{1-\alpha/2}\, \frac{\overline{R}^\ast}{d_1(m)\sqrt{m}}. \end{aligned}$$
(19.1)

Traditionally, the manufacturing industry uses α = 0.0027, and hence the (1 − α∕2)-th quantile of N(0, 1), i.e., z1−α∕2, is 3. Therefore, the chart signals a mean shift at time-point n if \(\overline {X}_n\) falls outside the interval of width six sigma centered at μ0, where sigma is the standard deviation of \(\overline {X}_n\). Thus, the terminology “six sigma” originated in the domain of quality control.

The performance of a control chart is traditionally measured by the average run length (ARL). Since the charts use control limits for making decisions on process performance, an “in-control” process sometimes gives false signals of a distributional shift. This phenomenon is analogous to a type I error in hypothesis testing. The number of samples or batches collected from the initial time-point of consideration to the occurrence of the first false “out-of-control” signal while the process remains “in-control” is called the “in-control” run length. The mean of this run length is called the “in-control” average run length, denoted as ARL0. On the other hand, the number of samples or batches collected from the time-point when the shift actually occurs to the time-point of the signal of shift is called the “out-of-control” run length. Its mean is called the “out-of-control” average run length, denoted as ARL1. Ideally, a control chart should have a large ARL0 value and a small ARL1 value. However, similar to type I and type II error probabilities in hypothesis testing, this is difficult to achieve simultaneously. Usually, when ARL0 is large, ARL1 is also relatively large, and vice versa. In the SPC literature, we usually fix the ARL0 value at a given level and compare the performances of control charts by comparing how small their ARL1 values are. In the \(\overline {X}\) Shewhart chart as described above, the distribution of the “in-control” run length is clearly geometric with parameter α, so α = 0.0027 gives ARL0 = 1∕α = 370.37. ARL1 can also be computed easily as a function of the shifted mean.
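
As a concrete illustration of the run-length idea, the following minimal R sketch estimates ARL0 of the \(\overline {X}\) Shewhart chart with known parameters by simulation; the values μ0 = 0, σ = 1, m = 5, and α = 0.0027 used here are only illustrative assumptions. Because the “in-control” run length is geometric with parameter α, the estimate should be close to 1∕α ≈ 370.

```r
# Monte Carlo estimate of ARL0 for the X-bar Shewhart chart with known
# parameters (illustrative values: mu0 = 0, sigma = 1, m = 5, alpha = 0.0027).
set.seed(1)
mu0 <- 0; sigma <- 1; m <- 5; alpha <- 0.0027
z <- qnorm(1 - alpha / 2)

one_run_length <- function() {
  n <- 0
  repeat {
    n <- n + 1
    xbar <- mean(rnorm(m, mean = mu0, sd = sigma))        # n-th "in-control" batch
    if (abs(xbar - mu0) > z * sigma / sqrt(m)) return(n)  # first false signal
  }
}

mean(replicate(2000, one_run_length()))   # should be close to 1/alpha = 370.37
```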

In the literature, there are many versions of Shewhart chart. One of them uses sample standard deviation to estimate σ rather than sample range. In this case, the chart gives a signal for mean shift when

$$\displaystyle \begin{aligned} \left|\overline{X}_n - \overline{\overline{X}^\ast}\right| > z_{1-\alpha/2}\, \frac{\overline{S}^\ast}{d_3(m)\sqrt{m}}, \end{aligned}$$
(19.2)

where \(\overline {S}^\ast \) is the sample mean of the batch-wise sample standard deviations \(\{S^\ast _i, i = 1, 2, \ldots , M\}\), i.e.,

$$\displaystyle \begin{aligned} \overline{S}^\ast = \frac{1}{M}\sum_{i=1}^{M} S^\ast_i, \end{aligned}$$
and \(d_3(m) = E(S_i^\ast /\sigma )\) is a constant depending on the value of m. Under the same setup, Shewhart charts were constructed to monitor process variability. Defining \(d_2(m) = \sqrt {\mbox{Var}\left (\frac {R_i^\ast }{\sigma }\right )}\), and using \(d_1(m) = E(R_i^\ast /\sigma )\), we can estimate \(\sigma _{R_i^\ast }\) by \(\frac {d_2(m)}{d_1(m)}\overline {R}^\ast \). Therefore, this version of Shewhart chart gives a signal for a change in variability if

$$\displaystyle \begin{aligned} R_n &> \overline{R}^\ast + z_{1-\alpha/2} \frac{d_2(m)}{d_1(m)} \overline{R}^\ast \\ &\mbox{or}\quad R_n < \overline{R}^\ast - z_{1-\alpha/2} \frac{d_2(m)}{d_1(m)} \overline{R}^\ast, \end{aligned} $$
(19.3)

where Rn is the sample range of the n-th batch \(\left (X_{n1}, X_{n2}, \ldots , X_{nm}\right )\).

Similarly, another version of Shewhart chart was constructed using sample standard deviation instead of sample range. Using the result \(\sigma _{S_i^\ast } = \sigma \sqrt {1 - d_3^2(m)}\) proved by Kenney and Keeping [34], \(\sigma _{S_i^\ast }\) can be estimated by \(\frac {\overline {S}^\ast }{d_3(m)} \sqrt {1 - d_3^2(m)}\). Therefore, the chart gives a signal of a change in variability if Sn, the sample standard deviation of \(\left (X_{n1}, X_{n2}, \ldots , X_{nm}\right )\), satisfies the following condition:

$$\displaystyle \begin{aligned} S_n &> \overline{S}^\ast + z_{1-\alpha/2} \frac{\sqrt{1 - d_3^2(m)}}{d_3(m)} \overline{S}^\ast \\ &\mbox{or}\quad S_n < \overline{S}^\ast - z_{1-\alpha/2} \frac{\sqrt{1 - d_3^2(m)}}{d_3(m)} \overline{S}^\ast. \end{aligned} $$
(19.4)

Using the result that \(\frac {(m-1)[S_i^\ast ]^2}{(\sigma ^2)} \sim \chi ^2_{m-1}\), another chart was constructed using sample variance instead of sample standard deviation. This chart gives a signal for a change in process variance when

$$\displaystyle \begin{aligned} S_n^2 > \frac{\overline{{S^\ast}^2}}{m-1}\, \chi^2_{1-\alpha/2,\, m-1} \quad \mbox{or}\quad S_n^2 < \frac{\overline{{S^\ast}^2}}{m-1}\, \chi^2_{\alpha/2,\, m-1}, \end{aligned}$$
(19.5)

where \(\overline {{S^\ast }^2}\) is the sample average of \(\{{S^\ast _i}^2, i = 1, 2, \ldots , M\}\).

Next, we discuss the \(\overline {X}\) Shewhart chart for monitoring individual observations rather than batched observations in Phase I SPC. The idea is to artificially create grouped data by grouping consecutive observations. First, we fix the size of each group \(\widetilde {m} > 1\). Then, the first \(\widetilde {m}\) observations form the first group, the next \(\widetilde {m}\) observations form the second group, and so on. Next, we can apply the \(\overline {X}\) Shewhart chart (19.1) to the grouped data. However, one problem here is that consecutive groups are \(\widetilde {m}\) time-points apart. Hence, it is difficult to know the process behavior at each time-point. To overcome this limitation, most people adopt the idea of moving windows. We artificially create grouped data as follows: Group 1 (X1, X2, …, \(X_{\widetilde {m}}\)), Group 2 (X2, X3, …, \(X_{\widetilde {m}+1}\)), and so on until Group \((n-\widetilde {m}+1)\) (\(X_{n-\widetilde {m}+1}\), \(X_{n-\widetilde {m}+2}\), …, Xn). Denote MR1, MR2, …, \(\mbox{MR}_{n-\widetilde {m}+1}\) to be the sample ranges of the \((n-\widetilde {m}+1)\) groups of data and \(\overline {\mbox{MR}}\) to be their sample mean. From the definition of \(d_1(\widetilde {m})\), we can estimate σ by \(\overline {\mbox{MR}} / d_1(\widetilde {m})\). Therefore, the \(\overline {X}\) Shewhart chart for monitoring individual observations gives a signal of mean shift when

(19.6)

Similarly, R Shewhart chart for monitoring individual observations gives a signal of a variability shift when

(19.7)

Here, we should check at all time-points that belong to the i-th group, i.e., from the i-th to the \((i + \widetilde {m} - 1)\)-th time-points. Other Shewhart charts for monitoring individual observations were constructed similarly.

In many applications, quality characteristics are categorical. Now, we describe some Shewhart charts for monitoring such characteristics. After certain products are randomly chosen for monitoring purposes, they are classified into conforming and nonconforming products based on requirements on the quality characteristics. We then monitor the proportion of nonconforming products over time. We assume that when a production process is “in-control,” the true proportion of nonconforming products is π0, and we obtain a random sample of m products at each time-point. Let Y  be the number of nonconforming products obtained at a given time-point. Therefore, Y ∼Binomial(m, π0) when the process is “in-control.” Let p = Y∕m be the sample proportion of nonconforming products at the time-point. When m is large, the probability distribution of p can be well approximated by \(N\left (\pi _0, \pi _0(1-\pi _0)/m\right )\) by the central limit theorem when the process is “in-control.” Hence, the process can be called “out-of-control” if

$$\displaystyle \begin{aligned} |p - \pi_0| > z_{1-\alpha/2} \sqrt{\pi_0(1-\pi_0)/m}. \end{aligned}$$

In practice, π0 is often unknown and should be estimated from collected Phase I data, just like the original version of the \(\overline {X}\) Shewhart chart uses estimated μ0 and σ. As before, we assume that we have M batches of Phase I data. Let \(p_i^\ast \) be the sample proportion of nonconforming products in the i-th batch of Phase I data for i = 1, 2, …, M, and let \(\overline {p}^\ast \) be their sample mean. Therefore, we can estimate π0 by \(\overline {p}^\ast \), and hence the p Shewhart chart gives a signal for a change in the proportion of nonconforming products at the n-th time-point when

$$\displaystyle \begin{aligned} |p_n - \overline{p}^\ast| > z_{1-\alpha/2} \sqrt{\overline{p}^\ast(1-\overline{p}^\ast)/m}, \end{aligned}$$
(19.8)

where pn is the sample proportion of nonconforming products in the batch obtained at the n-th time-point.
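
A minimal R sketch of the p chart decision rule described above follows; the function name and the numbers in the example are hypothetical and only illustrate the decision criterion, with \(\overline {p}^\ast \) assumed to be available from Phase I data.

```r
# Sketch of the p chart decision at the n-th time-point, assuming pbar_star
# has been estimated from Phase I data and the normal approximation holds.
p_chart_signal <- function(p_n, pbar_star, m, alpha = 0.0027) {
  z <- qnorm(1 - alpha / 2)
  abs(p_n - pbar_star) > z * sqrt(pbar_star * (1 - pbar_star) / m)
}

# Hypothetical example: 9 nonconforming items out of m = 100, pbar_star = 0.04
p_chart_signal(p_n = 9 / 100, pbar_star = 0.04, m = 100)
```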

There are other versions of Shewhart chart in the literature for monitoring count processes having distributions such as Poisson. Detailed descriptions of such charts are provided in Chapter 3 of [30].

Shewhart charts are good at detecting relatively large isolated shifts, but not so efficient in detecting relatively small but persistent shifts. This is because Shewhart charts evaluate the process performance based on the observed data collected at each individual time-point and ignore observations collected previously. Therefore, Shewhart charts are popular in Phase I SPC where large and isolated shifts are common but less commonly used in Phase II SPC.

As we mentioned before, the \(\overline {X}\), R, and S Shewhart charts are appropriate to use only in cases when the process distribution is normal and the observations are independent of each other. When the process distribution is not normal, the probability of a type I error, i.e., the probability of a false “out-of-control” signal when the process is actually “in-control,” can substantially differ from the prefixed value of α. If the type I error probability of a Shewhart chart is larger than α, then the chart will give false “out-of-control” signals more often than expected. Consequently, much time and many resources will be wasted in finding the causes of such signals and adjusting the related production process. On the other hand, if the type I error probability of a Shewhart chart is smaller than α, then real shifts will be missed more often than expected, and hence many nonconforming products could be manufactured. However, when the distribution of Xij is non-normal but the batch size m is large, the issue described above will not be serious, because the distribution of \(\overline {X}_n\) can be well approximated by a normal distribution due to the central limit theorem. In cases when the distribution of Xij is non-normal and the batch size m is small, two approaches are usually taken. One approach is to transform the non-normal data to normal and then apply the conventional Shewhart charts to the transformed data [35, 36], and the other approach is to use Shewhart charts that are constructed to monitor non-normal data.

If the performances of the Shewhart charts are evaluated by the “in-control” average run length ARL0 and the “out-of-control” average run length ARL1, then these performance measures will not be accurate if the observations are correlated [37,38,39]. In these cases, correlations have to be handled properly.

2.2 CUSUM Chart

A Shewhart chart decides whether a process is “in-control” at a given time-point using only the observations obtained at that time-point and ignoring all previous observations. Therefore, it is not very effective in Phase II monitoring in most cases, because previous observations contain helpful information about the process performance at present. Page [2] suggested the first cumulative sum (CUSUM) chart to overcome this limitation. Let us first describe the basic CUSUM chart for detecting a mean shift of a process following a normal distribution. Again, the chart assumes that the process observations X1, X2, X3, … follow N(μ0, σ2) and are independent. The CUSUM charting statistics are given by

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} C_n^+ = \max\left(0,\; C_{n-1}^+ + (X_n - \mu_0)/\sigma - k\right), \end{array} \end{aligned} $$
(19.9)
$$\displaystyle \begin{aligned} \begin{array}{rcl}{} C_n^- = \min\left(0,\; C_{n-1}^- + (X_n - \mu_0)/\sigma + k\right), \end{array} \end{aligned} $$
(19.10)

where \(C_0^+ = C_0^- = 0\), and k > 0 is an allowance constant. The CUSUM chart gives a signal of mean shift if

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} C_n^+ > \rho_c \quad \mbox{or} \quad C_n^- < -\rho_c, \end{array} \end{aligned} $$
(19.11)

where ρc > 0 is a control limit. The allowance constant k is prespecified, and the value of ρc is chosen so that the average run length when the process is “in-control,” denoted as ARL0, equals a given number, say, 200, 370, 500, etc. Table 4.1 of [30] provides the values of ρc for various values of the allowance constant k and ARL0. We can easily see that the charting statistics \(C_n^+\) and \(C_n^-\) make use of all available data up to the n-th time-point, and they restart from 0 when the cumulative charting statistics suggest no significant evidence for a mean shift, in the sense that \(C_{n-1}^+ + (X_n - \mu _0)/\sigma < k\) and \(C_{n-1}^- + (X_n - \mu _0)/\sigma > -k\). Because of this restarting mechanism, the CUSUM chart enjoys a good theoretical property: Moustakides [26] proved that the CUSUM chart with an allowance constant k has the shortest “out-of-control” ARL, denoted as ARL1, among all charts with a fixed ARL0 value for detecting a persistent shift of size δ = 2k.
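
The recursions (19.9)–(19.10) and the decision rule (19.11) are straightforward to implement. The following is a minimal R sketch, assuming known μ0 and σ and user-chosen k and ρc; in practice, ρc would be taken from a table such as Table 4.1 of [30] for the desired ARL0, and the numbers used below are only illustrative.

```r
# Minimal sketch of the CUSUM chart (19.9)-(19.11), assuming known mu0 and
# sigma; k and rho_c are user-chosen (illustrative values below).
cusum_signal_time <- function(x, mu0, sigma, k, rho_c) {
  Cp <- 0; Cm <- 0
  for (n in seq_along(x)) {
    u  <- (x[n] - mu0) / sigma        # standardized observation
    Cp <- max(0, Cp + u - k)          # upward CUSUM, restarts at 0
    Cm <- min(0, Cm + u + k)          # downward CUSUM, restarts at 0
    if (Cp > rho_c || Cm < -rho_c) return(n)   # first signal time
  }
  NA  # no signal within the observed sequence
}

set.seed(1)
x <- c(rnorm(50), rnorm(30, mean = 1))   # persistent mean shift after t = 50
cusum_signal_time(x, mu0 = 0, sigma = 1, k = 0.5, rho_c = 4.8)
```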

To be able to use the above CUSUM chart, the process observations should be independent before and after a potential shift, both “in-control” and “out-of-control” distributions have to be normal, and the parameters μ0 and σ of the “in-control” distribution have to be known. However, in practice these assumptions may not be reasonable.

If the observations are autocorrelated, then the actual value of ARL0 will be different from the specified value for which the control limit ρc is determined in the case of i.i.d. (independent and identically distributed) normal observations, as provided by Table 4.1 of [30]. For example, if the production process is autoregressive of order 1 and the autocorrelation is negative, then the actual value of ARL0 will be larger than the specified value for which ρc is determined. Consequently, the chart will not be sensitive enough to mean shifts, and a lot of nonconforming products can be manufactured. On the other hand, if the production process is autoregressive of order 1 with positive autocorrelation, then the actual value of ARL0 will be smaller than the specified value for which ρc is determined. That means the chart will give more false signals of mean shift than expected, the production process will have to be stopped unnecessarily too many times, and hence many resources will be wasted. One commonly used approach to accommodate possible autocorrelation among the observed data is to group neighboring observations into batches and then apply conventional CUSUM charts for independent data to the batch means. One major reason behind this idea is that possible autocorrelation in the original data will be mostly eliminated in the new batch means [40]. However, because of autocorrelation, the standard deviation of the standardized group means may not be 1, and it can differ very much from 1. Therefore, the actual ARL0 may be far away from the specified ARL0 value. To overcome this limitation of the grouping approach, the group means need to be scaled properly, which is difficult to do unless we know the nature of the correlation in the original data. For related descriptions, see [40, 41], and [42]. Another disadvantage of the grouping approach is that the control chart cannot detect a mean shift promptly, as it has to wait until all observations within a group are obtained. An alternative approach to the grouping idea is to describe the correlation by a statistical model such as an autoregressive moving average (ARMA) model. In many practical applications, appropriate special cases of the ARMA model, such as the first-order autoregressive model, can be used; otherwise, an appropriate model can be selected by a model selection procedure. After a time-series model is chosen and fitted by a routine procedure in time-series analysis, we can calculate the residuals. If the chosen time-series model describes the observed “in-control” data adequately, and the production process is “in-control” until the given time-point, the residuals should be approximately independent with a zero-mean common normal distribution whose variance can be estimated by an appropriate estimator. Then, we can apply the conventional CUSUM chart to the calculated residuals. However, we should be careful that an “out-of-control” signal may not always be due to a shift in mean; it can be due to a change in the correlation structure of the observations as well. Related discussions on model-based control charts for monitoring autocorrelated processes can be found in [43,44,45], and so on.

In cases when the observations are i.i.d. normal but the “in-control” mean μ0 and variance σ2 are unknown, a common approach is to estimate them from a large “in-control” dataset. However, even small randomness in these estimated values can affect the performance of the CUSUM charts significantly. Hawkins [46] explored this research question in detail. Since in many applications we cannot have an extremely large “in-control” dataset, Hawkins [46] proposed the self-starting CUSUM charts in which the “in-control” parameters are estimated from the observations collected in Phase II SPC. Assume that no “out-of-control” signal is given until the (n − 1)-th time-point. A new observation is collected at the n-th time-point, and we want to decide whether a signal of mean shift should be given at this time-point. Since no signal for mean shift is given at the (n − 1)-th time-point, we can consider all observations collected at that time-point and before, i.e., Xn−1, Xn−2, …, X1, as “in-control” observations. Therefore, μ0 and σ2 can be estimated by their sample mean \(\overline {X}_{n-1}\) and sample variance \(S^2_{n-1}\), respectively, as long as n ≥ 3. Therefore, for constructing a CUSUM chart, it is natural to replace (Xn − μ0)∕σ in (19.9) and (19.10) by \(T_n = (X_n - \overline {X}_{n-1})/S_{n-1}\). When X1, X2, …, Xn are i.i.d. N(μ0, σ2), which is the case when the process is “in-control” up to the n-th time-point, we can easily check that \(\left (\sqrt {(n-1)/n}\right )T_n \sim t_{n-2}\). As proved in [47], T1, T2, …, Tn are independent of each other when the process is “in-control” up to the n-th time-point. Therefore, in that case,

$$\displaystyle \begin{aligned} \begin{array}{rcl} Z_n = \Phi^{-1}\left[\Upsilon_{n-2} \left(\sqrt{\frac{n-1}{n}}T_n\right) \right] \end{array} \end{aligned} $$

are i.i.d. with the N(0, 1) distribution, where Φ and Υn−2 are the cumulative distribution functions (c.d.f.) of N(0, 1) and tn−2, respectively. Since Φ−1 and Υn−2 are increasing functions, a mean shift in the original observations Xi, 1 ≤ i ≤ n, indicates a mean shift in the transformed observations Zi, 1 ≤ i ≤ n, and vice versa. Therefore, detection of a mean shift in Phase II monitoring can be accomplished by using the transformed observations Zn, n ≥ 1, in place of (Xn − μ0)∕σ in (19.9) and (19.10). It has been demonstrated in the literature that if a persistent mean shift occurs within the first few observations in Phase II monitoring, the self-starting CUSUM chart as described above has weak power to detect it. Therefore, in practice, at least a dozen or more “in-control” observations should be collected in Phase II before using the self-starting CUSUM chart. Self-starting control charts are now popular in the literature ([48, 49], and so on).
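
The transformation above is easy to compute with standard distribution functions; a minimal R sketch is given below, assuming the running mean and standard deviation are updated from all previous observations (valid from n = 3 onward).

```r
# Sketch of the self-starting transformation: T_n uses the mean and standard
# deviation of the first (n - 1) observations, and Z_n is obtained through the
# t_{n-2} and standard normal c.d.f.s; Z_n, n >= 3, are i.i.d. N(0, 1) while
# the process is "in-control".
self_starting_z <- function(x) {
  n <- length(x)
  z <- rep(NA_real_, n)
  for (i in 3:n) {
    Tn   <- (x[i] - mean(x[1:(i - 1)])) / sd(x[1:(i - 1)])
    z[i] <- qnorm(pt(sqrt((i - 1) / i) * Tn, df = i - 2))
  }
  z
}

set.seed(1)
z <- self_starting_z(rnorm(30, mean = 10, sd = 2))
# z can now replace (X_n - mu0)/sigma in the CUSUM recursions (19.9)-(19.10)
```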

The traditional CUSUM chart (19.9)–(19.10) has an allowance parameter k, which should be set as δ∕2, where δ is the size of the potential mean shift. In practice, δ is often unknown at the time when we design the CUSUM chart, and hence the choice of k is not straightforward. Sparks [50] proposed two approaches to solve this issue. Sparks’ first approach is to use several CUSUM charts with different k values simultaneously, so that these charts target mean shifts of different sizes. Such a joint control scheme gives an “out-of-control” signal of mean shift if at least one of the CUSUM charts detects a mean shift. Of course, the ARL0 values of these CUSUM charts have to be the same prefixed number. If we have prior information about the potential mean shift, we should incorporate that information while determining the k values. Sparks’ second approach is to estimate the size of the mean shift δ recursively at each time-point and update the value of k accordingly. The control limit should also be updated at each time-point so that we can maintain the prespecified ARL0 value. These are called adaptive CUSUM charts.

When the “in-control” distribution of the production process is not normal, the traditional CUSUM charts should not be used. If we know that the “in-control” distribution is in the exponential family, such as the gamma or Weibull distributions, then we can construct CUSUM charts similarly by using the sequential probability ratio test. However, if the “in-control” distribution is completely unknown, we can use nonparametric control charts [11,12,13,14,15,16,17,18] that do not assume any “in-control” distribution.

The versions of CUSUM chart mentioned above were designed for detecting a step shift in process mean. However, in many applications, the process mean and/or variance changes gradually with or without a known parametric pattern, after the process becomes “out-of-control.” Such changes are called drifts. Gan [51], Davis and Woodall [52], and many other researchers proposed CUSUM charts for detecting linear drifts.

Recently, CUSUM charts with variable sampling rates have become popular ([53, 54] and many others). In this type of CUSUM chart, the sampling rate varies over time based on all observed data. There are many different types of variable sampling rates, such as variable sampling intervals, variable sample sizes, etc. One major advantage of variable sampling rate CUSUM charts over fixed sampling rate CUSUM charts is faster detection of small to moderate shifts in the process mean. Recently, Li and Qiu [32] suggested implementing a CUSUM chart using statistical p-values and proposed the concept of dynamic sampling.

In the literature, researchers have constructed CUSUM charts for monitoring the variance of the process distribution as well.

2.3 EWMA Chart

In spite of having good theoretical properties, CUSUM charts were difficult to use in the 1950s when there were no computers. A simpler chart, called the exponentially weighted moving average (EWMA) chart, was proposed by Roberts [55] in 1959. Under the same assumptions and notations as the CUSUM chart, the EWMA charting statistic is defined as

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} E_n = \lambda X_n + (1 - \lambda) E_{n-1} \end{array} \end{aligned} $$
(19.12)

where E0 = μ0 and 0 < λ ≤ 1 is a weighting parameter. We can easily check that

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} E_n = \lambda \sum_{i=1}^{n} (1 - \lambda)^{n-i} X_i + (1 - \lambda)^{n} \mu_0, \end{array} \end{aligned} $$
(19.13)

and when the process is “in-control,”

$$\displaystyle \begin{aligned} E_n \sim N\left(\mu_0, \frac{\lambda}{2 - \lambda} [1 - (1 - \lambda)^{2n}] \sigma^2 \right). \end{aligned}$$

That means En is a weighted average of μ0 and all observations up to time-point n, and the weight received by Xi decays exponentially fast when i moves away from n. Therefore, it becomes easy to study the properties of En when the process is “in-control.” From the probability distribution of En, EWMA chart gives a signal for mean shift when

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} |E_n - \mu_0| > \rho_e \sigma \sqrt{\frac{\lambda}{2 - \lambda} [1 - (1 - \lambda)^{2n}]}, \end{array} \end{aligned} $$
(19.14)

where ρe > 0 is a control limit. λ > 0 is chosen beforehand, and the value of ρe is determined such that a specified value of ARL0 is achieved.

To be able to use the EWMA chart in practice, we need to choose the λ value properly and then the value of ρe so that the prespecified ARL0 value is achieved. Just like the CUSUM charts, we need to specify a target shift size first, and then we can search for a λ value and the corresponding ρe value such that the prespecified ARL0 value is achieved and the ARL1 value for detecting a mean shift of the target size is minimized. As a general guideline, small λ values are good for detecting relatively small mean shifts, and large λ values are good for detecting relatively large mean shifts. Crowder [56] provides a discussion on this issue. While we assume that the “in-control” process distribution is normal, some researchers have demonstrated that the EWMA chart is quite robust to the normality assumption [3].
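
A minimal R sketch of the EWMA chart (19.12)–(19.14) follows, assuming known μ0 and σ; the values of λ and ρe below are only illustrative, and ρe would normally be tuned so that a prespecified ARL0 is attained.

```r
# Minimal sketch of the EWMA chart (19.12)-(19.14) with known mu0 and sigma.
ewma_signal_time <- function(x, mu0, sigma, lambda, rho_e) {
  E <- mu0                                        # E_0 = mu0
  for (n in seq_along(x)) {
    E <- lambda * x[n] + (1 - lambda) * E         # charting statistic (19.12)
    halfwidth <- rho_e * sigma *
      sqrt(lambda / (2 - lambda) * (1 - (1 - lambda)^(2 * n)))  # limit (19.14)
    if (abs(E - mu0) > halfwidth) return(n)       # first signal time
  }
  NA
}

set.seed(1)
x <- c(rnorm(100), rnorm(50, mean = 0.5))         # small persistent shift after t = 100
ewma_signal_time(x, mu0 = 0, sigma = 1, lambda = 0.1, rho_e = 2.7)
```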

In the case of autocorrelated observations, approaches similar to those described for the CUSUM chart can be implemented for EWMA charts as well. However, some researchers suggest applying the EWMA charts directly to the original data and then adjusting the control limits to reflect the impact of autocorrelation [57].

For using the Shewhart, CUSUM, or EWMA chart as described before, μ0 and σ have to be known or estimated before the monitoring starts. As we have discussed before, this is not convenient in many applications, and hence we should use self-starting control charts. Just like the self-starting CUSUM control charts, we first transform the original data to Zi, i ≥ 1, and then apply the traditional EWMA chart to the transformed data. In the literature, there are several methods, such as [58], where λ is chosen adaptively depending on the size of the potential mean shift. These are called adaptive EWMA charts.

In the literature, EWMA charts have been developed when the process distribution follows other parametric forms. Some researchers [59, 60] constructed EWMA charts for monitoring processes following Weibull distributions. Borror et al. [61], Gan [62], and some others discussed process monitoring when the process distribution is Poisson. Perry and Pignatiello [63], Sparks et al. [64], and many others discussed EWMA process monitoring with binomial or negative binomial distributions.

All versions of EWMA charts that we have discussed so far are designed for detecting a step shift in the process mean. In some practical situations, when the process becomes “out-of-control,” its mean departs gradually from the “in-control” level. It is important that we can detect such gradual departures, called drifts, as early as possible. In the literature, some researchers have modified EWMA charts to detect drifts efficiently ([65] and others).

Just like CUSUM charts, researchers have constructed EWMA charts to monitor the variance of the production processes as well.

2.4 Control Charts by Change-Point Detection (CPD)

In change-point detection (CPD), the distribution of the first part of a sequence of random variables is assumed to be the same, the distribution of the remaining part of the sequence is also assumed to be the same, but the distributions of the two parts are assumed to be different. The specific position in the sequence at which the distribution of the random variables changes from one to the other is called a change-point. Our major goal is to estimate the position of the change-point. Gombay [66], Hinkley [67], and many others in the literature provide detailed descriptions of this topic. In change-point detection problems, the sample size is usually fixed. In Phase I SPC, the sample size is also usually fixed, and then the change-point methods can be applied directly. In Phase II SPC, observations are obtained sequentially over time. Therefore, change-point methods must be adapted appropriately in such cases. Recently, change-point methods have been modified and applied to SPC problems ([4, 5], and others) as well. Change-point-based control charts are good at detecting small and persistent shifts and can estimate the position of the change-point efficiently.

Let us describe the change-point-based control chart proposed by Hawkins et al. [4]. It assumes that the observations X1, X2, …, Xn follow the change-point model

$$\displaystyle \begin{aligned} X_i = \begin{cases} \mu_0 + \varepsilon_i, & i = 1, 2, \ldots, r, \\ \mu_1 + \varepsilon_i, & i = r+1, \ldots, n, \end{cases} \end{aligned}$$

where r is the change-point, μ0 ≠ μ1, and εi, 1 ≤ i ≤ n, is a sequence of i.i.d. random variables having the common distribution N(0, σ2). For testing the existence of the change-point, the likelihood ratio test statistic is

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} T_{max,n} = \max_{1 \le j \le n-1} \sqrt{\frac{j(n-j)}{n}} \left|\overline{X}_j - \overline{X}_j^{\prime}\right|/\widetilde{S}_j \end{array} \end{aligned} $$
(19.15)

where \(\overline {X}_j\) and \(\overline {X}_j^{\prime }\) are the sample means of the first j and the remaining (n − j) observations, respectively, and \(\widetilde {S}_j^2 = \sum _{i=1}^{j} (X_i - \overline {X}_j)^2 + \sum _{i=j+1}^{n} (X_i - \overline {X}_j^{\prime })^2\). The change-point-based chart gives an “out-of-control” signal of mean shift when

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} T_{max,n} > \rho_n \end{array} \end{aligned} $$
(19.16)

where ρn > 0 is a control limit. If we have an “out-of-control” signal, then the position of the change-point r is estimated by the maximizing value of j in (19.15). Hawkins et al. [4] provided formulas to calculate the values of ρn for commonly used prespecified values of ARL0.
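
The statistic (19.15) and the estimated change-point are easy to compute directly from the formula; the following R sketch does this for a single sequence, with the control limit ρn left to the formulas in [4].

```r
# Sketch of T_{max,n} in (19.15); the maximizing j estimates the change-point
# position when the chart signals via (19.16).
tmax_stat <- function(x) {
  n  <- length(x)
  Tj <- sapply(1:(n - 1), function(j) {
    x1 <- x[1:j]; x2 <- x[(j + 1):n]
    S2 <- sum((x1 - mean(x1))^2) + sum((x2 - mean(x2))^2)   # S_j tilde squared
    sqrt(j * (n - j) / n) * abs(mean(x1) - mean(x2)) / sqrt(S2)
  })
  list(Tmax = max(Tj), r_hat = which.max(Tj))
}

set.seed(1)
tmax_stat(c(rnorm(40), rnorm(20, mean = 2)))   # r_hat should be near 40
```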

Just like other control charts, change-point-based charts have also been constructed to accommodate other situations, such as when the “in-control” process distribution is not normal but follows another known parametric distribution. Change-point-based charts have also been developed to monitor the process variance.

Although the change-point-based control charts have their advantages over the Shewhart, CUSUM, and EWMA charts, their computation is still relatively complex. This is because the estimator of the position of the change-point has to be recalculated each time a new observation is obtained, which involves a search over a sequence of n time-points. A systematic comparison between the performances of self-starting and adaptive traditional (CUSUM and EWMA) control charts and the change-point-based control charts is currently lacking in the literature.

In practice, many variables do not follow a normal distribution. For example, many economic indices, rainfall amounts, lifetimes of many products, etc. are usually skewed to the right. In most instances, it is difficult to find an appropriate parametric distribution for their modeling. Researchers use nonparametric methods to describe their distributions. When the normality assumption is not reasonable, several researchers have pointed out that the traditional control charts would be unreliable for process monitoring ([6, 7, 16], and many others). In such cases, nonparametric statistical methods based on the ranking or ordering information of the observations can be considered for making inferences about the underlying process distribution. Another approach is based on data categorization. Clearly, both approaches have the limitation of losing useful information during ranking and categorization. However, the methods based on data categorization seem to be more efficient in process monitoring, as they use some information about observation magnitudes. Nonparametric versions of conventional charts such as the Shewhart, CUSUM, EWMA, and change-point-based charts have been developed in the literature. Most nonparametric control charts in the literature are for Phase II SPC. However, there is only limited discussion on Phase I SPC when the process distribution does not follow any common parametric form [68, 69].

Many researchers have developed control charts to jointly monitor the process mean and variance as well. We skip such details in this chapter. We also omit descriptions of multivariate control charts. Interested readers should go through [30].

3 The Proposed SPC Chart

In this section, we describe our proposed Phase II chart for detecting persistent distributional changes in univariate continuous processes. We assume that the “in-control” probability distribution of the process is unknown, “in-control” Phase I data are unavailable, and the observations we collect are independent of each other. Let Y1, Y2, …, Yt, … be a sequence of independent observations during Phase II process monitoring. We start performing statistical monitoring right after time-point S0 ≥ 4, because we need to have at least a few observations so that statistical tests have enough power. This is a common practice, as discussed in [4, 27], etc. At each time t ≥ S0, we consider the following change-point framework. Assuming τ, 2 ≤ τ ≤ t − 2, to be a possible point of distributional change, we consider the two samples {Y1, Y2, …, Yτ} and {Yτ+1, Yτ+2, …, Yt} and perform the Cramer-von Mises test to check if they come from the same unknown continuous cumulative distribution function (cdf). For statistically testing whether two samples are from the same continuous distribution, two tests are commonly used: the Kolmogorov-Smirnov test and the Cramer-von Mises test. The Kolmogorov-Smirnov test does not work well if there are ties in the samples. Also, the Cramer-von Mises test usually has more power than the Kolmogorov-Smirnov test in many situations. Therefore, we use the Cramer-von Mises test rather than the Kolmogorov-Smirnov test. The two-sample Cramer-von Mises test statistic is given by

$$\displaystyle \begin{aligned} &C_{\tau}(t) = \frac{\tau(t-\tau)}{t^2}\left( \sum_{i=1}^{\tau}(F^\ast(Y_i)-G^\ast(Y_i))^2 \right. \\ &\left.\qquad \qquad + \sum_{j=\tau+1}^{t}(F^\ast(Y_j)-G^\ast(Y_j))^2 \right),\\ \end{aligned} $$
(19.17)

where \(F^\ast \) and \(G^\ast \) are the empirical distribution functions associated with the samples {Y1, Y2, …, Yτ} and {Yτ+1, Yτ+2, …, Yt}, respectively. Large values of Cτ(t) suggest that {Y1, Y2, …, Yτ} and {Yτ+1, Yτ+2, …, Yt} are possibly coming from different cdfs. Suppose \(P^{(\mathrm {CvM})}_{\tau }(t)\) is the p-value of the two-sample Cramer-von Mises test. The details of computing \(P^{(\mathrm {CvM})}_{\tau }(t)\) are provided in Sect. 19.3.5, along with other computational technicalities of the proposed chart. A natural approach in this framework is to (i) calculate the p-values of the two-sample Cramer-von Mises test for all possible values of τ, i.e., τ = 2, 3, …, t − 2, (ii) determine \(\tau ^\ast (t) = \arg \min _{\tau \in \{2, 3, \ldots , (t-2)\}} P^{(\mathrm {CvM})}_{\tau }(t)\), and (iii) use \(P^{(\mathrm {CvM})}_{\tau ^\ast (t)}(t)\) as the charting statistic.
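
The statistic (19.17) can be computed directly from the two empirical c.d.f.s; a small R sketch is given below. Converting Cτ(t) into a p-value requires its (asymptotic) null distribution, which is discussed in Sect. 19.3.5.

```r
# Sketch of the two-sample Cramer-von Mises statistic C_tau(t) in (19.17).
cvm_stat <- function(y, tau) {
  t    <- length(y)
  Fhat <- ecdf(y[1:tau])            # empirical c.d.f. of Y_1, ..., Y_tau
  Ghat <- ecdf(y[(tau + 1):t])      # empirical c.d.f. of Y_{tau+1}, ..., Y_t
  tau * (t - tau) / t^2 * sum((Fhat(y) - Ghat(y))^2)
}

set.seed(1)
y <- c(rnorm(30), rnorm(30, sd = 3))   # scale change after the 30th observation
cvm_stat(y, tau = 30)
```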

However, there are two major drawbacks of this approach. The first drawback is that the determination of τ*(t) can be computationally expensive because we have to compute \(P^{(\mathrm {CvM})}_{\tau }(t)\) for all possible values of τ, i.e., 2, 3, …, (t − 2). Moreover, we need to execute similar procedures for all values of t ≥ S0 until we get a signal. We run into this issue in the change-point framework whenever we need to compute the relevant statistic for each value of the possible change-point. To reduce computation, we consider the following two techniques, and both can be applied to the proposed SPC chart. The first technique is to estimate τ*(t) efficiently by the method proposed in Sect. 19.3.1, which can be applied to other change-point problems of a similar nature. The second technique is the pruning of data from the distant past based on p-values, as proposed by Mukherjee [28]. The second drawback is the weak power of the Cramer-von Mises test to detect a change in standard deviation or scale. The proposed chart addresses this issue by integrating the Cramer-von Mises test with the Ansari-Bradley test, which is a nonparametric test to detect scale differences. We describe these approaches below.

3.1 A Computationally Efficient Approach to Estimate τ*(t)

We note that the values of \(P^{(\mathrm {CvM})}_{\tau }(t)\) and \(P^{(\mathrm {CvM})}_{\tau '}(t)\) should not be too different as long as τ and τ′ are close. In other words, for each t, the values of \(P^{(\mathrm {CvM})}_{\tau }(t)\) for different τ are strongly autocorrelated. Therefore, we can make the procedure faster by first calculating \(P^{(\mathrm {CvM})}_{\tau }(t)\) for τ-values that are multiples of \(\lfloor \sqrt {t} \rfloor \) instead of all possible values of τ. For demonstration purposes, consider the case when t = 100. Calculate \(P^{(\mathrm {CvM})}_{\tau }(t)\) for τ = 10, 20, …, 90 instead of τ = 2, 3, 4, …, 98. Then, select the value of τ for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is minimum, and calculate \(P^{(\mathrm {CvM})}_{\tau }(t)\) for τ-values in the two adjoining intervals. In other words, if τ = 30 gives the smallest \(P^{(\mathrm {CvM})}_{\tau }(t)\)-value among τ = 10, 20, 30, …, 90, compute \(P^{(\mathrm {CvM})}_{\tau }(t)\) for τ = 21, 22, …, 29, 30, 31, …, 39 and select the τ-value for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is the smallest. The detailed procedure to estimate τ*(t) is described below.

For each t, instead of calculating \(P^{(\mathrm {CvM})}_{\tau }(t)\) for all possible values of τ, i.e., 2, 3, …, (t − 2), calculate the statistic for \(\tau = i\cdot \lfloor \sqrt {t} \rfloor \), where the possible values of i are 1, 2, …, I(t), and I(t) is the largest integer for which \(I(t)\cdot \lfloor \sqrt {t} \rfloor \le (t-2)\). Here, \(\lfloor \sqrt {t} \rfloor \) is the largest integer smaller than or equal to \(\sqrt {t}\). Since we start monitoring when the time-point t ≥ S0 ≥ 4, we always have at least one positive integer i for which we can compute \(P^{(\mathrm {CvM})}_{\tau }(t)\). For example, when t = S0 = 4, we have to compute \(P^{(\mathrm {CvM})}_{\tau }(t)\) for only one value of τ, i.e., when τ = 2. Next, among the candidate values \(\tau = i\cdot \lfloor \sqrt {t} \rfloor \), i = 1, 2, …, I(t), we find the value \(\tilde {\tau }\) for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is the smallest. Since τ*(t) should be close to \(\tilde {\tau }\), we calculate \(P^{(\mathrm {CvM})}_{\tau }(t)\) for all integer values of τ within \(\left [ \max { \{ (\tilde {\tau }\,{-}\,\lfloor \sqrt {t} \rfloor \,{+}\,1), 2 \}}, \min {\{ (\tilde {\tau }\,{+}\,\lfloor \sqrt {t} \rfloor \,{-}\,1), (t{-}2) \}} \right ]\) and pick the integer within that interval for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is minimum. This is our estimate of τ*(t), and we call it \(\widehat {\tau }^\ast (t)\). The method is summarized as follows, and an R sketch of this search is given after the list:

  (i) Calculate \(P^{(\mathrm {CvM})}_{\tau }(t)\) for \(\tau = i\cdot \lfloor \sqrt {t} \rfloor \), where the possible values of i are 1, 2, …, I(t), and I(t) is the largest integer for which \(I(t)\cdot \lfloor \sqrt {t} \rfloor \le (t-2)\).

  (ii) Find the τ-value \(\tilde {\tau }\) among \(\{i\cdot \lfloor \sqrt {t} \rfloor , i = 1, 2, \ldots , I(t)\}\) for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is the smallest.

  (iii) Calculate \(P^{(\mathrm {CvM})}_{\tau }(t)\) for all integer values of τ within the interval:

    $$\displaystyle \begin{aligned} \left[ \max{ \{ (\tilde{\tau}-\lfloor \sqrt{t} \rfloor+1), 2 \}}, \min{\{ (\tilde{\tau}+\lfloor \sqrt{t} \rfloor-1), (t-2) \}} \right]. \end{aligned}$$

  (iv) Estimate τ*(t) by the integer within the interval in (iii) for which \(P^{(\mathrm {CvM})}_{\tau }(t)\) is minimum.
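
The coarse-to-fine search above can be sketched in R as follows. The helper cvm_pvalue(y, tau), which returns the p-value of the two-sample Cramer-von Mises test splitting the sequence y at tau, is an assumed ingredient here; in our implementation it would be obtained from the asymptotic distribution as described in Sect. 19.3.5.

```r
# Coarse-to-fine search for tau*(t); cvm_pvalue(y, tau) is an assumed helper
# returning the two-sample Cramer-von Mises p-value for the split at tau.
estimate_tau_star <- function(y, cvm_pvalue) {
  t    <- length(y)
  step <- floor(sqrt(t))
  coarse    <- seq(step, t - 2, by = step)              # tau = i * floor(sqrt(t))
  p_coarse  <- sapply(coarse, function(tau) cvm_pvalue(y, tau))
  tau_tilde <- coarse[which.min(p_coarse)]
  fine   <- max(tau_tilde - step + 1, 2):min(tau_tilde + step - 1, t - 2)
  p_fine <- sapply(fine, function(tau) cvm_pvalue(y, tau))
  list(tau_hat = fine[which.min(p_fine)], p_cvm = min(p_fine))
}
```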

3.2 Integration of Ansari-Bradley Test with Cramer-von Mises Test

It is well documented in the literature (e.g., [27]) that the Cramer-von Mises test does not have high power to detect changes in scale parameters. However, it has high power to detect changes in location parameters. Because of this weakness, we integrate the Cramer-von Mises test with the Ansari-Bradley test [70] in the proposed control chart. The Ansari-Bradley test is a rank-based nonparametric test for detecting differences in scale parameters. The integration procedure is as follows.

For each t, once τ*(t) is estimated by the procedure in Sect. 19.3.1, we proceed as follows (an R sketch is given after the list):

  (i) Record the p-value of the two-sample Cramer-von Mises test for checking whether \(\{Y_1, Y_2, \ldots , Y_{\widehat {\tau }^\ast }\}\) and \(\{Y_{\widehat {\tau }^\ast +1}, Y_{\widehat {\tau }^\ast +2}, \ldots , Y_t\}\) are realizations from the same continuous cdf. Obviously, it is \(P^{(\mathrm {CvM})}_{\widehat {\tau }^\ast (t)}(t)\).

  (ii) Perform the Ansari-Bradley two-sample test for checking whether the scale parameters of \(\{Y_1, Y_2, \ldots , Y_{\widehat {\tau }^\ast }\}\) and \(\{Y_{\widehat {\tau }^\ast +1}, Y_{\widehat {\tau }^\ast +2}, \ldots , Y_t\}\) are the same. Call the p-value \(P^{(\mathrm {AB})}_{\widehat {\tau }^\ast (t)}(t)\).

  (iii) Calculate \(p^{(E)}(t) = \min \{P^{(\mathrm {CvM})}_{\widehat {\tau }^\ast (t)}(t), P^{(\mathrm {AB})}_{\widehat {\tau }^\ast (t)}(t)\}\), and use this as the charting statistic.
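
A small R sketch of this integration step is given below; cvm_pvalue() is the same assumed helper as in the sketch of Sect. 19.3.1, and ansari.test() is the Ansari-Bradley test available in base R (package stats).

```r
# Effective p-value p^(E)(t): the smaller of the Cramer-von Mises and
# Ansari-Bradley p-values at the estimated change-point tau_hat.
effective_pvalue <- function(y, tau_hat, cvm_pvalue) {
  p_cvm <- cvm_pvalue(y, tau_hat)
  p_ab  <- ansari.test(y[1:tau_hat], y[(tau_hat + 1):length(y)])$p.value
  min(p_cvm, p_ab)
}
```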

When we integrate two different tests in a change-point-based chart, it is natural to perform the change-point analysis for both tests, because the estimated change-points can be different for different tests. However, to avoid extra computation, we do not consider this approach. Numerical studies show that the proposed chart, based on the procedures described above, performs well in detecting changes in scale parameters. When there is a small change in scale parameters and the p-value of the Cramer-von Mises test is not small enough to give a signal, that p-value is still small, and hence the change-points estimated based on the two tests should be close. The proposed chart then detects such a change by integrating a more powerful test appropriately.

3.3 Data Pruning Based on P-Values

In case the proposed chart does not detect any distributional change for a long time, the sequence of “in-control” observations will be long, and hence the calculation of \(\widehat {\tau }^\ast (t)\) as described in Sect. 19.3.1 can be time-consuming. If the influx of observations is rapid, then the chart in its present form cannot be used to monitor such processes. To solve this problem, we consider data pruning from the distant past. If p(E)(t), calculated by the procedure described in Sect. 19.3.2, is large, it is very unlikely that a distributional change has taken place. Therefore, we prune a few observations from the distant past and focus on the more recent observations. In this way, we make sure that the sequence of “in-control” observations does not become too long. We provide the description of the method below.

When \(\tilde {t} \ge T_0\), where \(\tilde {t}\) is the current length of the sequence of observations and T0 is a threshold parameter of the SPC chart, we consider the possibility of data pruning. If p(E)(t) ≥ P0, we prune the oldest \(C(\tilde {t},p^{(E)}(t),P_0) = \Bigl \lfloor \tilde {t}\cdot \min \left (0.2, \left (\frac {p^{(E)}(t) - P_0}{1 - P_0}\right ) \right ) \Bigr \rfloor \) observations. Here, we make sure not to prune too much in one step by specifying that we cannot prune more than 20% of the current length of the sequence at one step. We set the maximum amount of pruning at one step to be 20% based on the performance of the chart in our simulation experiments, in terms of how fast it can compute and how early it can give a signal when a process goes “out-of-control.” While one could introduce a parameter for the maximum percentage pruned at one step and select its value based on a reasonable criterion, we set it equal to 20% for simplicity. Once we prune, we can estimate τ*(t) faster in the next step, i.e., after the arrival of the next observation. The data pruning procedure is summarized as follows (a small R sketch of the pruning rule follows the list):

  (i) When p(E)(t) < P0, give a signal for distributional change and stop process monitoring; otherwise, go to Step (ii).

  (ii) If \(\tilde {t} \ge T_0\), prune the oldest \(C(\tilde {t},p^{(E)}(t),P_0) = \Bigl \lfloor \tilde {t}\cdot \min \left (0.2, \left (\frac {p^{(E)}(t) - P_0}{1 - P_0}\right ) \right ) \Bigr \rfloor \) observations, and collect the next observation. Otherwise, go to Step (iii).

  (iii) Do not prune, and collect the next observation.
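
The pruning rule itself is a one-line computation; a small R sketch follows, with hypothetical values of \(\tilde {t}\), p(E)(t), and P0 used only for illustration.

```r
# Number of oldest observations to prune when p^(E)(t) >= P0 and the current
# sequence length t_tilde >= T0; pruning is capped at 20% of the sequence.
prune_count <- function(t_tilde, p_eff, P0) {
  floor(t_tilde * min(0.2, (p_eff - P0) / (1 - P0)))
}

prune_count(t_tilde = 600, p_eff = 0.9, P0 = 0.005)   # 120, i.e., the 20% cap
```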

3.4 The Algorithm of the Proposed Control Chart

The procedures to run the proposed control chart are summarized as follows (a schematic R sketch of the monitoring loop is given after the list):

  1. When \(\tilde {t} < S_0\), collect the next observation. Otherwise, go to Step 2.

  2. Calculate \(\widehat {\tau }^\ast (t)\), an estimate of τ*(t), by the method described in Sect. 19.3.1. Go to Step 3.

  3. Calculate \(P^{(\mathrm {CvM})}_{\widehat {\tau }^\ast (t)}(t)\), \(P^{(\mathrm {AB})}_{\widehat {\tau }^\ast (t)}(t)\), and p(E)(t) by the method described in Sect. 19.3.2. If p(E)(t) < P0, give a signal for distributional change and stop process monitoring; otherwise, go to Step 4.

  4. Perform the data pruning procedure as described in Sect. 19.3.3. Go to Step 1.
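
The following schematic R sketch puts these steps together in one monitoring loop. It relies on the helper functions sketched in the previous subsections (estimate_tau_star, effective_pvalue, prune_count, and the assumed cvm_pvalue), and the parameter values S0, T0, and P0 shown as defaults are hypothetical.

```r
# Schematic Phase II monitoring loop for the proposed chart; new_obs() is a
# user-supplied function returning the next observation of the process.
monitor <- function(new_obs, cvm_pvalue, S0 = 20, T0 = 500, P0 = 0.005) {
  y <- c(); n_total <- 0
  repeat {
    y <- c(y, new_obs()); n_total <- n_total + 1            # Step 1
    if (length(y) < S0) next
    est   <- estimate_tau_star(y, cvm_pvalue)               # Step 2
    p_eff <- effective_pvalue(y, est$tau_hat, cvm_pvalue)   # Step 3
    if (p_eff < P0)
      return(list(signal_time = n_total, tau_hat = est$tau_hat))
    if (length(y) >= T0)                                    # Step 4: prune
      y <- tail(y, length(y) - prune_count(length(y), p_eff, P0))
  }
}
```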

3.5 Implementation

Calculation of exact p-values for two-sample Cramer-von Mises tests is computationally expensive and time-consuming, especially when the sample sizes are large [71]. However, [72] provides the asymptotic distribution of the two-sample Cramer-von Mises criterion, and we also note that the convergence rate is rapid. Therefore, in our implementation, we use asymptotic p-values rather than the exact ones. Using the software R (https://www.r-project.org/) and the R-package CvM2SL2Test developed by Xiao [73], we extend the table provided by Anderson and Darling [72] to calculate p-values of two-sample Cramer-von Mises tests when they are smaller than 0.01. We use the same R-package CvM2SL2Test to calculate the two-sample Cramer-von Mises test statistics given in (19.17) and then calculate their asymptotic p-values. For calculating (19.17), the R-package uses a C++ program developed by Xiao et al. [74]. We use the R-function ansari.test() to perform Ansari-Bradley tests. It should be noted that approximate p-values are calculated even for small samples in the presence of ties. The proposed method is designed to monitor univariate continuous processes and hence cannot handle even a moderate number of ties in the observation sequence. However, the chart can still perform well in the presence of a small number of ties.

4 Numerical Studies

We perform several numerical studies to evaluate the performance of the proposed method in comparison with a number of state-of-the-art change-point-based control charts. Our main goal is to find a chart that shows overall better performance in detecting arbitrary distributional changes. We also study how the performance is affected by the number of “in-control” observations before the distributional change. Like most research articles in the literature, we evaluate the control charts by the average number of observations, called the “run length,” collected after the first observation from the changed distribution until a signal is given. We call this average ARL1; the smaller the value of ARL1, the better the performance. Of course, we set the control limits of all control charts such that the average number of “in-control” observations required to give a false signal when no distributional change takes place is a prefixed large number, which we call ARL0. In all our numerical simulations, we set ARL0 = 500 unless mentioned otherwise.

We compare the proposed method with four other change-point-based methods. The first, proposed by Hawkins and Deng [13], is based on the Mann-Whitney test and aims to detect location shifts; we call this method MW. The next competing control chart, proposed by Ross et al. [33], is based on the Mood test [75] and aims to detect scale changes in the process distribution; we call it Mood. The third competing chart, also proposed by Ross et al. [33], is based on the Lepage test [76]; it aims to detect changes in both location and scale of the process distribution, and we call it Lepage. Our fourth competing chart, proposed by Ross and Adams [27], aims to detect arbitrary distributional changes using the Cramer-von Mises test; we call this chart CvM.

Initially, we consider three “in-control” process distributions: the standard normal, N(0, 1); the standardized t-distribution with 2.5 degrees of freedom, ST(2.5); and the standardized log-normal distribution with parameters 1.0 and 0.5, SLN(1.0, 0.5). Note that the mean and standard deviation of the log-normal distribution with parameters 1.0 and 0.5 are 3.08 and 1.64, respectively; we approximate them by 3 and 1.6 while standardizing. We first consider shifts in location only, scale only, and both location and scale simultaneously. Finally, we consider arbitrary changes of various distributions. To compare the performances of the various methods, we consider two cases: when the distributional change occurs early, right after time-point τ = 50, and when it occurs late, right after time-point τ = 300. If a false signal occurs before the actual distributional change, we disregard that sequence in our simulation, as is common practice.
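A minimal sketch of the data-generating step used in Sects. 4.1-4.3 is given below, where δL and δS denote the location and scale changes considered there. The generator names are ours, the standardizing constants 3 and 1.6 are the approximations mentioned above, and post-change observations are obtained by multiplying by δS and/or adding δL after the change-point τ.

```r
# Standardized "in-control" generators (approximately zero mean, unit standard deviation).
r_st  <- function(n, df = 2.5) rt(n, df) / sqrt(df / (df - 2))  # standardized t, ST(2.5)
r_sln <- function(n) (rlnorm(n, 1.0, 0.5) - 3) / 1.6            # standardized log-normal, SLN(1.0, 0.5)

# One sequence of length N with a location/scale change right after time tau.
gen_sequence <- function(N, tau, rdist = rnorm, deltaL = 0, deltaS = 1) {
  x <- rdist(N)
  if (tau < N)
    x[(tau + 1):N] <- deltaS * x[(tau + 1):N] + deltaL
  x
}

set.seed(1)
x <- gen_sequence(400, tau = 50, rdist = r_sln, deltaL = 0.5, deltaS = 2.0)
```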

We use the R-package cpm to run the competing charts. For the proposed method, we need to select S0 and T0. For all methods, we select the startup time to be 20, and for the proposed method, we select S0 = 20 and T0 = ARL0 = 500. For comparing the performances of the methods when the distributional change is arbitrary, we run four versions of the proposed chart, called P250, P500, P1000, and P∞, with T0 = 250, 500, 1000, and ∞, respectively. From Table 19.6, we see that T0 = 500 is a reasonable choice, considering that larger values of T0 require more computing time. It should be noted that the proposed chart, designed to detect arbitrary distributional changes, may not outperform a chart designed to detect a specific type of distributional change when the actual change is of that particular type. For example, MW should perform better than the proposed chart when the distributional change involves a location change only. However, our goal is to find a chart that performs well for all types of distributional changes so that it can be used in various applications where the nature of the change is unknown.
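For reference, the competing charts can be run roughly as follows; this assumes the processStream() interface of the cpm package, so the argument names and cpmType strings should be checked against the package documentation, and gen_sequence() is the sketch from above.

```r
library(cpm)

set.seed(1)
x <- gen_sequence(400, tau = 50, deltaL = 1.0)   # N(0, 1) with a location shift after t = 50

# MW, Mood, Lepage, and CvM charts with ARL0 = 500 and startup time 20.
res_MW   <- processStream(x, cpmType = "Mann-Whitney",     ARL0 = 500, startup = 20)
res_Mood <- processStream(x, cpmType = "Mood",             ARL0 = 500, startup = 20)
res_Lep  <- processStream(x, cpmType = "Lepage",           ARL0 = 500, startup = 20)
res_CvM  <- processStream(x, cpmType = "Cramer-von-Mises", ARL0 = 500, startup = 20)

res_MW$detectionTimes    # time(s) at which the MW chart signals
res_MW$changePoints      # corresponding estimated change-point(s)
```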

4.1 Location Changes

First, we focus on changes in location only. We consider four different amounts of shift, δL = 0.25, 0.5, 1.0, and 2.0. For each of the three distributions N(0, 1), ST(2.5), and SLN(1.0, 0.5), we generate 50,000 sequences of observations where the post-change observations right after time-point τ are obtained by adding δL. From Table 19.1, we observe that

Table 19.1 Mean delay in detection of various location shifts δL occurring right after time τ. “In-control” distributions considered are N(0, 1), ST(2.5), and SLN(1.0, 0.5). The results are based on 50,000 random simulations when ARL0 = 500
  • MW and CvM are slightly better than the proposed method when δL is small or moderate. When δL is large, all methods except Mood perform well. The reason is that MW is designed to detect location changes only; therefore, it has better power when detecting location changes compared to the methods designed to detect arbitrary distributional changes. Mood is designed to detect scale changes; therefore, it cannot perform well in this case.

  • Detections of location changes are faster when the “in-control” distribution is ST(2.5) or SLN(1.0, 0.5) compared to the N(0, 1) case.

  • Changes occurring right after time-point τ = 300 are easier to detect compared to the changes occurring right after τ = 50.

4.2 Scale Changes

Now, we focus on changes in scale only. We consider four scale changes, namely, δS = 1.50, 0.50, 2.00, and 0.25. For each of the three distributions N(0, 1), ST(2.5), and SLN(1.0, 0.5), we generate 50,000 sequences of observations where the post-change observations are obtained by multiplying by δS. From Table 19.2, we observe that

Table 19.2 Mean delay in detection of various scale changes δS occurring right after time τ. “In-control” distributions considered are N(0, 1), ST(2.5), and SLN(1.0, 0.5). The results are based on 50,000 random simulations when ARL0 = 500
  • Mood and Lepage are slightly better than the proposed method in some cases. The reason is that Mood is designed to detect scale changes only and Lepage is designed to detect both location and scale changes; therefore, they have higher power for detecting scale changes than the methods designed to detect arbitrary distributional changes. MW is designed to detect location changes; therefore, it cannot perform well in this case. The proposed method performs much better than CvM, thanks to the incorporation of the Ansari-Bradley test alongside the Cramer-von Mises test.

  • MW performs fairly well when δS > 1.00 but very poorly when δS < 1.00. One explanation is that when δS > 1.00, extremely large or small observations become more likely, and MW interprets them as location changes. This phenomenon is also noted in [27] and [24].

  • Detections of scale changes are faster when the “in-control” distribution is ST(2.5) or SLN(1.0, 0.5) compared to the N(0, 1) case.

  • In this case also, changes occurring right after time-point τ = 300 are easier to detect compared to the changes occurring right after τ = 50. MW, however, cannot detect a decrease in scale at all.

4.3 Location-Scale Changes

Now, we focus on changes in location and scale simultaneously. We consider eight simultaneous changes in location and scale, namely, (δL, δS) =  (0.50, 1.50), (0.50, 0.50), (0.50, 2.00), (0.50, 0.25), (1.00, 1.50), (1.00, 0.50), (1.00, 2.00), and (1.00, 0.25). For each of the three distributions N(0, 1), ST(2.5), and SLN(1.0, 0.5), we generate 50,000 sequences of observations where the post-change observations are obtained by multiplying by δS and then adding δL. From Tables 19.3, 19.4, and 19.5, we observe that

Table 19.3 Mean delay in detection of various amounts of changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is N(0, 1). The results are based on 50,000 random simulations when ARL0 = 500
Table 19.4 Mean delay in detection of various amounts of changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is ST(2.5). The results are based on 50,000 random simulations when ARL0 = 500
Table 19.5 Mean delay in detection of various amounts of changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is SLN(1.0, 0.5). The results are based on 50,000 random simulations when ARL0 = 500
  • Lepage performs the best, as it should because it is designed to detect changes in location and scale simultaneously. Mood works well in some cases. The proposed method works well, better than CvM in many cases.

  • In these cases also, detections of simultaneous changes in location and scale are faster when the “in-control” distribution is ST(2.5) or SLN(1.0, 0.5) compared to the N(0, 1) case.

  • Changes occurring right after time-point τ = 300 are easier to detect compared to the changes occurring right after τ = 50. MW, however, cannot detect a decrease in scale at all.

4.4 General Distributional Changes

Finally, we consider general changes of various “in-control” distributions. The corresponding changes in mean and standard deviation are large in some cases (e.g., Weibull(1) to Weibull(3) and vice versa, Gamma(2,2) to Gamma(3,2) and vice versa) and small or zero in others (e.g., N(0, 1) to ST(2.5) and vice versa, N(0, 1) to SLN(1.0, 0.5) and vice versa). From Table 19.6, we observe that

  • No single method is uniformly best. P500 works well in all cases. Note that P500 is the proposed method with T0 = 500, as defined in the fourth paragraph of Sect. 4.

  • In cases where another method is the best, P500 is not far behind.

  • The performance of P500 is not far from that of the best choice of T0 for the proposed method. The larger the value of T0, the longer the computing time. Therefore, P500 strikes a good balance between computing time and performance, and we suggest using T0 = ARL0 in most applications.

  • In cases where location change is large, MW works well, and when scale change is large, Mood works well, as expected.

  • The distributional change from N(0, 1) to SLN(1.0, 0.5), which does not alter the mean or the standard deviation, is detected earlier by the proposed method than by the other charts.

  • When we are not sure about the nature of possible distributional change, the proposed method is a good choice.

  • Changes occurring right after time-point τ = 300 are easier to detect compared to the changes occurring right after τ = 50. MW cannot detect an arbitrary distributional change well if the median does not change by much.

Table 19.6 Mean delay in detection of various general changes in distribution. The results are based on 50,000 random simulations when ARL0 = 500

Numerical simulations when ARL0 = 200 are provided in Tables 19.7, 19.8, 19.9, 19.10, 19.11, and 19.12. Similar conclusions as provided above can be drawn from these tables as well.

Table 19.7 Mean delay in detection of various location shifts δL occurring right after time τ. “In-control” distributions considered are N(0, 1), ST(2.5), and SLN(1.0, 0.5). The results are based on 10,000 random simulations when ARL0 = 200
Table 19.8 Mean delay in detection of various scale changes δS occurring right after time τ. “In-control” distributions considered are N(0, 1), ST(2.5), and SLN(1.0, 0.5). The results are based on 10,000 random simulations when ARL0 = 200
Table 19.9 Mean delay in detection of various changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is N(0, 1). The results are based on 10,000 random simulations when ARL0 = 200
Table 19.10 Mean delay in detection of various changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is ST(2.5). The results are based on 10,000 random simulations when ARL0 = 200
Table 19.11 Mean delay in detection of various changes in location and scale simultaneously, i.e., (δL, δS), occurring right after time τ. “In-control” distribution considered here is SLN(1.0, 0.5). The results are based on 10,000 random simulations when ARL0 = 200
Table 19.12 Mean delay in detection of various general changes in distribution. The results are based on 10,000 random simulations when ARL0 = 200

From these simulation studies, we see that the proposed method works well in most applications. When we are trying to detect a specific type of distributional change, its performance is slightly worse than that of the chart designed to detect that particular type of change. However, in most cases, the differences are small. Therefore, when the nature of the distributional change is unknown, the proposed chart can be used with the expectation of good performance.

5 Analysis of Various Real-World Data

Now, we focus on applications of the proposed chart to real-world problems. We consider two datasets: a climate dataset and a blood glucose monitoring dataset.

5.1 Climate Data on Minneapolis, USA

Monitoring changes in the patterns of various climatological measurements, such as daily, monthly, and yearly maximum and minimum temperatures, amounts of rainfall and snow, snow depth, and the number of rainy or snowy days, is an emerging research area. Various statistical methods capable of analyzing such data have been demonstrated in the literature, and modern statistical process control (SPC) charts can also be used to monitor such climatological variables. In this context, we consider the mean daily maximum temperature in Fahrenheit in the month of January in Minneapolis, USA. We collect the data from http://www.dnr.state.mn.us/climate/twin_cities/listings.html. The data are from 1873 to 2017 with no missing values and are presented in Fig. 19.1. Before applying the control charts, we check whether our assumption of independence of the observations is reasonable. The Durbin-Watson test (R function: dwtest, R-package: lmtest) for the two-sided alternative gives a high p-value of 0.8359, showing no evidence of autocorrelation; therefore, we can assume that the observations are independent. Now, we apply the proposed method along with the competing methods when we set ARL0 = 500. For the proposed method, we set T0 = ARL0 = 500 as suggested earlier. The proposed method detects a distributional change at 1956, with the estimated change-point at 1947. All other methods, i.e., MW, Mood, Lepage, and CvM, do not detect any distributional change. We also run all the SPC charts with ARL0 = 200. In this case as well, the proposed method detects a distributional change at 1956 with the estimated change-point at 1947. Mood also detects the distributional change, but later, at 1960, again with the estimated change-point at 1947. MW, Lepage, and CvM still do not detect any distributional change. Now, we study the “in-control” distribution and the estimated “out-of-control” distribution. Table 19.13 shows the first four sample moments of the “in-control” observations from 1873 to 1947 and the “out-of-control” observations from 1948 to 1956. From Table 19.13, we see that the second moment changed considerably, but the other three moments did not. We carry out these calculations using the R-package moments.
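The independence check above can be reproduced along the following lines. A simulated placeholder series is used so that the sketch is self-contained, and regressing the series on an intercept only is one (assumed) way of applying dwtest() to the raw observations.

```r
library(lmtest)

# `temps` should hold the mean daily maximum January temperatures, 1873-2017;
# a simulated placeholder series of the same length (145 years) is used here.
set.seed(1)
temps <- rnorm(145, mean = 22, sd = 7)

# Durbin-Watson test against the two-sided alternative; a large p-value
# indicates no evidence of autocorrelation in the series.
dwtest(temps ~ 1, alternative = "two.sided")
```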

Fig. 19.1
figure 1

Mean daily maximum temperature in the month of January in Minneapolis, USA. Setting ARL0 = 500, the proposed chart detects a distributional change in the year 1956 estimating the change-point to be just after 1947. Other charts do not detect any distributional change

Table 19.13 First four sample moments of the observations from 1873 to 1947 and 1948 to 1956

5.2 Blood Glucose Monitoring Data

Monitoring the blood glucose level on a daily basis is essential for advanced diabetic patients. It indicates whether the particular lifestyle changes and the treatment procedure, including the medicine and its administered dosage, are working well for the patient. Monitoring such measurements is complicated because we are not focusing on the mean and the standard deviation only; we need to monitor the stability of the entire probability distribution as well. We obtain a dataset from the UCI Machine Learning Repository [77]. This directory contains data prepared for the participants of the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine. For our analysis, we pick the data from the first patient and choose to monitor the pre-breakfast blood glucose level on a daily basis. We prefer this over blood glucose measurements at other times of day because the latter depend too much on many variables, such as the type and amount of food. The data from the first patient contain daily pre-breakfast observations from 21 April 1991 to 3 September 1991, with the measurement on 18 August 1991 missing. The data are presented in Fig. 19.2, ignoring the single missing value. As before, we first check whether our assumption of independence of the observations is reasonable. The Durbin-Watson test for the two-sided alternative gives a high p-value of 0.7727, showing no evidence of autocorrelation; therefore, our assumption of independence is reasonable. Now, we apply the proposed method along with the competing methods when we set ARL0 = 500. For the proposed method, we set T0 = ARL0 = 500 as before. The proposed method detects a distributional change on 11 June 1991, with the estimated change-point on 5 June 1991. The other competing charts do not detect any distributional change. Running the charts with ARL0 = 200 gives similar results, except that the proposed chart detects the distributional change one day earlier, on 10 June 1991, with the estimated change-point the same as before. Table 19.14 shows the first four sample moments before and after the distributional change. The standard deviation decreases considerably, indicating more stable fasting blood glucose values, and the skewness also appears to have decreased considerably. Note that the phrases “in-control” and “out-of-control” are used here in the sense of standard SPC terminology, not in the sense of blood glucose control.
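The before/after comparison of Table 19.14 (and, analogously, Table 19.13) can be computed with the moments package along these lines; the data and change-point index below are placeholders so that the sketch runs on its own.

```r
library(moments)

# Placeholder data: in the analysis, `glucose` holds the patient's daily
# pre-breakfast readings and `cp` the index of the estimated change-point.
set.seed(1)
glucose <- c(rnorm(46, mean = 180, sd = 60), rnorm(60, mean = 160, sd = 25))
cp <- 46

# First four sample moments (mean, standard deviation, skewness, kurtosis).
moment_summary <- function(z) {
  c(mean = mean(z), sd = sd(z), skewness = skewness(z), kurtosis = kurtosis(z))
}

rbind(before = moment_summary(glucose[1:cp]),
      after  = moment_summary(glucose[(cp + 1):length(glucose)]))
```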

Fig. 19.2
figure 2

Daily pre-breakfast blood glucose measurements of a selected patient. Setting ARL0 = 500, the proposed chart detects a distributional change at the 52nd day estimating the change-point to be the 46th day. Other charts do not detect any distributional change

Table 19.14 First four sample moments of the observations from 21 April 1991 to 5 June 1991 and from 6 June 1991 to 11 June 1991

6 Concluding Remarks

This chapter first describes a few commonly used traditional statistical process control (SPC) charts, such as Shewhart, CUSUM, EWMA, and change-point-based (CPD) control charts, and discusses a number of situations where these charts should be appropriately modified for practical use. Next, it proposes a change-point-based nonparametric statistical process control chart for detecting arbitrary distributional changes when the process distribution is univariate continuous. The proposed chart makes two particularly important contributions. The first is the combination of the strengths of two statistical tests: the Cramer-von Mises test and the Ansari-Bradley test. The second is the introduction of a numerically efficient technique to estimate the possible change-point without sacrificing much accuracy. Both contributions are quite general in nature and have broad applications well beyond the numerical examples and real-world data analyses shown in this chapter, including the monitoring of fast data streams. Another important aspect of this control chart is that its run-length distribution is not geometric even when the process is “in-control,” because the probability of getting a signal is a function of the current run length. It appears that such charts can still be used in many applications; however, further research is required to fully understand their pros and cons.