Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Introduction

Our aim is to propose a new look at change detection tasks that arise when we observe a large number of parallel processes that may change in time. To motivate our approach, consider a sequence of images provided by a camera that follows the quality of a certain production process. Each image contains millions of pixels. Fixing our attention at a particular pixel we can observe fluctuations of its grey levels in time as one time series. Applying a change detector (e.g., EWMA, CUSUM etc.) to all time series arising from observing each pixel, for each instant of time we obtain a set of YES/NO decisions concerning the presence or absence of a change. It is clear that the change at one pixel only is rather unimportant from the view point of the production quality control. We should rather concentrate on more massive changes that arise in a spatially concentrated area at the same (or approximately the same) time.

From the statistical point of view quite similar change detection tasks arise when a bank observes the amount of money collected on accounts of its clients. Even a sharp change of deposits of one or several clients is usually not important. However, when we detect essential deposit changes of a larger group of clients approximately at the same time (and possibly in the same city or region), then it should be an indicator that something important may happened in the banking market.

One more example of a need for detecting massive changes of parallel processes comes from following intensity of the traffic in the Internet. The growth of the traffic intensity approximately at the same time to a group of web pages is well known indicator of hackers attack. In this case a geographical closeness may not appear, but attacked web pages may have similarities of other kind, e.g., the same owner.

In the same vain one can consider:

  • a health care system, when the growing number of patients in a certain area should be detected as a possible indicator of an epidemia,

  • a stock market – prices of shears of enterprises can fluctuate, but a rapid reduction of them at a certain area can be a symptom of certain economic changes.

All the above examples have the following common features:

  1. 1.

    observed processes run in parallel, but not necessarily independently, in time at different sites (spatial locations),

  2. 2.

    change detection along time axis at one or even a few site(s) (pixels) can be neglected, unless they have very high or very low values in comparison to typical (in-control) state,

  3. 3.

    moderate in size changes of observed variables, arising in time and at moderate area in space are typical cases to be detected,

  4. 4.

    smaller changes of observed variables, but arising at larger areas in the spatial domain should (or at least could) be considered as an alarm,

  5. 5.

    changes along time axis may arise not necessarily at the same time instant, but can be spread in a certain time interval.

Additionally, one should distinguish between up and down changes, because at a certain area the number of up changes may dominate largely the number of down changes and the former can be neglected.

Clearly, the terms used above like: “high” and “low” changes as well as “close” time instants and “larger area” are problem dependent and require to be defined precisely at scales relevant to an application at hand.

The above list of possible changes of interest can be named spatio-temporal change detection problems. One can try to solve some of them using the classic control charts and aggregating observations over the spatial domain. However, such approaches may lead to overlooking changes along time axis when the aggregation covers larger spatial regions. Furthermore, not all the above sketched problems can be solved using a spatial aggregation.

For these reasons it seems justified to consider new kinds of spatio-temporal change detectors. Apparently, it is not possible to propose change detectors for all the above mentioned problems in one paper.

We propose a simple change detector of changes in time that is well suited for parallel use at a large number of spatial sites. The idea is based on exponentially weighted moving average smoothing (EWMAS), but the detector itself is different than the one that is used in the classic EWMA chart. In particular, it allows to distinguish between jumps of a moderate size and those that are large. It keeps the main advantage of the classic EWMA chart, namely, there is no need to store historical data, i.e., for the current decision it suffices to have the present smoothed state and the current observation, which is crucial importance when we have to monitor millions of sites or pixels. The EWMAS is based on the idea of vertical weighting that was used for detecting changes in space (edges) in [14].

Detection of spatio-temporal changes may also include

  1. 1.

    changes along curves at the spatial domain that are observed at the same (or close) time instant(s) can also be of interest (e.g., as in edge detection tasks in image sequence processing),

  2. 2.

    changes of the observed variable that “travels” in time along curves at the spatial domain.

These tasks are much more difficult than those listed above and they are outside the scope of this paper.

Quality control of continuously running industrial production processes is the subject of research for many years (see [10] and the bibliography cited therein and [16, 18] for recently proposed nonparametric control charts). These charts as well as classic control charts like the Shewhart one, CUSUM, EWMA are well suited for detecting changes in time. In the stream of research called spatial statistics (see [3] and the bibliography cited therein) the topic of detecting changes in space domain is present. Somewhat unexpectedly, detecting changes simultaneously in time and space has not so rich bibliography as one might expect. The main contributions in this direction come from applications of image sequences processing and their applications in geoscience (see [1, 5, 6, 11, 17]). Quickest detection of significant changes in a sensor net that is based on a non-cooperative stopping game which is a model of the multivariate disorder detection has been proposed in [19]. The approach proposed in [12] also covers spatio-temporal changes as a special case of detecting jumps of time series with values in a Banach space.

In recent years one can observe a rapid development of relatively cheap, high resolution and high speed industrial cameras that are well suited for quality monitoring of such processes (see [7]). Simultaneously, a high speed, running in parallel computers and graphical processing units (GPU) made it possible to process sequences of high resolution images on-line. As a result, the stream of research on control charting with image data, which is closely related to this paper, is rapidly growing (see [9] for a stimulating review and [8, 13] for more recent contributions).

The paper is organized as follows. In the next section we describe our version of EWMAS temporal change detector and present its elementary properties. Then, we shall describe how a bank of such change detectors can be used to detect spatio-temporal changes. Finally, we present an example of application to quality control of a copper slab using images from a camera.

EWMA Smoothed Jump Detector

For simplicity, we shall describe our jump detector in 2D spatial case, but the extension to larger dimensions is immediate. Let x=(x (1),x (2))∈Ω denotes a spatial position (e.g., of a pixel or site) in a rectangular domainFootnote 1 (image) Ω. By t=1,2,… we denote time instants when observations are made (e.g., images from a camera are sampled). Observed real-valued random field (e.g., grey-level image) Y(x,t) results from observing an unknown function m(x,t) with zero mean, finite variance additive errors ε(x,t), i.e.,

$$\begin{aligned} Y(x,t)= m(x,t)+ \varepsilon (x,t), \quad x\in\varOmega,\ t=1,2,\dots . \end{aligned}$$
(48.1)

The probability distribution of ε(x,t) is unknown, but – for simplicity of the exposition – we assume that there exists its p.d.f., denoted by f ε , which does not depend on x and t and it is symmetric. We also assume that Y(x,t) and Y(x′,t′) are uncorrelated for tt′, t,t′=1,2,…, x,x′∈Ω, even if x=x′. However, for each t a spatial correlation is allowed. This assumption will be used only when theoretical properties of our jump detector are investigated.

Consider a symmetric and unimodal kernel K:RR + such that K(0)=1 and K(z)→0 as |z|→∞. In particular, the gaussian K G (z)=exp(−z 2/2) kernel and the uniform one: K U (z)=1 for |z|≤1 and K U (z)=0 for |z|>1 are of special interest.

Then, for m(x,t) we have the following nonlinear equation (see [15] for the proof that can be adopted to the case considered here):

$$\begin{aligned} &m(x,t)= \kappa^{-1} E \bigl[Y(x,t)K \bigl(\bigl(Y(x,t)-m(x,t)\bigr)/H \bigr) \bigr], \\ & \quad x\in\varOmega,\ t=1,2,\dots \end{aligned}$$
(48.2)

where \(\kappa \stackrel{\mathit{def}}{=} \int _{-\infty }^{\infty}K(z/H) f_{\varepsilon }(z) dz\). Equation (48.2) can be the source of many empirical versions for estimating m(x,t). We select one of the simplest that can be run in parallel w.r.t. time for each site (pixel) xΩ. Namely,

$$\begin{aligned} \hat{m}(x, t+1)=(1-\alpha) \hat{m}(x,t)+\frac{\alpha}{\hat{\kappa }} Y(x,t)K \bigl(\bigl(Y(x,t)-\hat{m}(x,t)\bigr)/H \bigr), \end{aligned}$$
(48.3)

where t=1,2,…, xΩ, while 0<α<1 is a smoothing parameter. Also K and H>0 are selected by the statistician. \(\hat{\kappa}\) can be estimated from residuals, because κ=E[K(ε(x,t)/H)]. If the variance of ε(x,t) is small in comparison to H 2, then κ is close to 1 and later on we take \(\hat{\kappa}=1\).

A really fast version of (48.3) one obtains for the uniform kernel:

$$\begin{aligned} \hat{m}(x, t+1)=(1-\alpha)\hat{m}(x,t)+ \textstyle\begin{cases} 0, & \mbox{if } |Y(x,t)-\hat{m}(x,t)| > H \cr \alpha Y(x,t)& \mbox{if } |Y(x,t)-\hat{m}(x,t)|\leq H \end{cases}\displaystyle \end{aligned}$$
(48.4)

From (48.4) it is clear that \(\hat{m}(x, t+1)\) is essentially updated only if there is no jump larger than H>0.

Spatio-Temporal Change Detector – Basic Version

Step 1 – Detection of changes in time. :

For current time instant t and for every xΩ calculate matrix B(x,t) of the same size as Ω in the following way:

$$\begin{aligned} B(x, t)= \textstyle\begin{cases} 1, & \mbox{if } |Y(x,t)-\hat{m}(x,t)| > H \cr 0, & \mbox{if } |Y(x,t)-\hat{m}(x,t)|\leq H \end{cases}\displaystyle \end{aligned}$$
(48.5)
Step 2 – First decision. :

If ∑ xΩ B(x,t)≤θ 0, set t=t+1, calculate (48.4) and go to Step 1, otherwise, go to Step 3. Here θ 0≥0 is a threshold preselected in such a way that if we met only a few number of sites with time changes, we can decide that there were no essential changes in the space–time domain.

Step 3 – Removing small clusters in the space domain. :

For fixed t one can interpret B(x,t) as a binary image and apply image processing tools like morphological erosion or blob analysis (see [2]) in order to remove single sites or small clusters of them by setting the corresponding B(x,t)=0.

Step 4 – Final decision. :

Select θ 1>θ 0 as a threshold for declaring essential spatio-temporal change. If ∑ xΩ B(x,t)≥θ 1, then declare essential change, set t=t+1 and go to Step 1. Otherwise, calculate (48.4) and also set t=t+1 and go to Step 1.

Remark

Only (48.4) and Step 1 can be run in parallel, but these are the most time consuming operations, because they are repeated for all xΩ and all t.

If we skip Step 2 and Step 3 and fix a particular xΩ, then we can compare the above algorithm for change detection in time with other control charts. As one can notice, (48.4) runs as the EWMA chart with two exceptions. Namely, \(\hat{m}(x,t)\) is updated only when new observation is close to it. Thus, \(\hat{m}(x,t)\) estimates the process mean, but only in-control states. In contrary, in EWMA chart \(\hat{m}(x,t)\) also jumps are incorporated into \(\hat{m}(x,t)\), if they were not detected. The second difference is in that in the classic EWMA chart \(\hat{m}(x,t)\) is compared to the threshold in order to detect jumps. Here, the decision is based on the difference \(Y(x,t)- \hat{m}(x,t)\), which resembles the Shewhart control chart, however with important difference that the smoothed in-control behavior \(\hat{m}(x,t)\) is the base for comparisons. One may hope that the proposed combination of the EWMA smoothing idea and the Shewhart chart gives a detector that will be useful for spatio-temporal change detection.

By simple modifications one can easily tune the above basic algorithm to a variety of particular applications.

  1. 1.

    When only jumps above the mean are of interest, i.e., the conditions in (48.4) and in (48.5) are replaced by \(Y(x,t)-\hat{m}(x,t)> H\) (resp. ≤H), then in Step 4 the final decision can take also jump heights into account as follows: \(\sum_{x\in\varOmega} (Y(x,t)- \hat{m}(x,t)) B(x,t)\geq \theta_{1}\).

  2. 2.

    In Step 4 the final decision takes into account changes detected at the same time instant. When sampling rate in time is high, one can consider also changes that occurred at several earlier time instants J≥1, i.e., \(\sum_{j=0}^{J} \sum_{x\in \varOmega} B(x, t-j) \geq\theta_{1}\) at all spatial points. This require to store (J+1)th previous B(x,tj) matrices.

  3. 3.

    One can replace the conditions in (48.5) by the following:

    $$\begin{aligned} \Biggl|Y(x,t)-z^{-1} \sum_{x\in Z(x)}\hat{m}(x,t)\Biggr| > H \quad (\leq H, \ \mbox{resp.}) , \end{aligned}$$

    where Z(x) is a neighborhood of x, while z is its cardinality. This version is less sensitive to false alarms, but not so easy to run on parallel processors as (48.5).

Some Properties of the Spatio-Temporal Change Detector

In this section we announce simple properties of the basic version of our spatio-temporal change detector. By the lack of space, we omit most of the proofs that will be published elsewhere. For simplicity, we assume that in the basic algorithm Step 3 and Step 4 is omitted.

For simplicity we assume that random errors are commonly bounded, i.e., there exists \(\mathscr {E}\) such that with probability 1, \(|\varepsilon (x,t)|\leq \mathscr {E}\). H is selected such that \(H\geq2 \mathscr {E}\), which means that there are no guarantees of detecting jumps smaller than \(2 \mathscr {E}\).

In Control Behavior

Let us assume for a while that there are no spatio-temporal jumps, i.e., Y(x,t)=M(x)+ε(x,t), xΩ, t=1,2,…, where M(x) is a stationary proper background process. If our algorithm starts from \(\hat{m}(x, 0)=Y(x, 0)\), then the following properties can be proved.

InC1:

\(E[\hat{m}(x,t)] = M(x)\), xΩ, t=1,2,… .

InC2:

The false alarm probability is zero for all xΩ and t>1. Notice that this is the consequence of the assumptions: \(|\varepsilon (x,t)|\leq \mathscr {E}\) and \(H\geq2 \mathscr {E}\).

InC3:

For xΩ and t=1,2,… , define \(\hat{\varepsilon }(x,t)=M(x)-\hat{m}(x,t)\). Then, for \(\hat{\varepsilon }\) the following recurrent relationships hold:

$$\begin{aligned} \hat{\varepsilon }(x,t)= (1-\alpha) \hat{\varepsilon }(x, t-1)+\alpha \varepsilon (x,t), \quad t=1,2,\dots \end{aligned}$$
(48.6)

with the initial condition \(\hat{\varepsilon }(x, 0)=\varepsilon (x, 0)\). Furthermore, (48.6) implies \({|\hat{\varepsilon }(x,t)|\leq \mathscr {E}}\).

InC4:

For xΩ and t=1,2,… we have \({|Y(x,t)-\hat{m}(x,t)| \leq2 \mathscr {E}}\).

Change Detection

We firstly consider change detection in time for arbitrary but fixed spatial site xΩ. To this end we assume that at a certain time instant t 0>1 for the first time

$$\begin{aligned} Y(x, t_0)=M(x)+r(x, t_0)+\varepsilon (x, t_0), \quad x\in\varOmega \end{aligned}$$
(48.7)

where r(x,t 0) is a jump to be detected, which is assumed to be persistent ((48.7) holds also for t>t 0) and bounded away from 0, i.e., there exists R>0 such that r(x,t 0)>R, xΩ.

We shall assume that R is known and \(R>3 \mathscr {E}\), because it defines the smallest jump that we are able to detect immediately, as we shall see below. Select H>0 such that

$$\begin{aligned} \mathscr {E} \leq H < R - 2 \mathscr {E}. \end{aligned}$$
(48.8)

Let us note that \(\hat{m}(x, t_{0})=M(x)+\hat{\varepsilon }(x, t_{0})\), according to (48.6), which can be invoked here, because there was no jump before t 0. Hence, using this equality and (48.7) we obtain

$$\begin{aligned} \bigl|Y(x, t_0)-\hat{m}(x, t_0)\bigr| = \bigl|r(x, t_0)+\varepsilon (x, t_0)-\hat {\varepsilon }(x, t_0)\bigr|\geq R-2 \mathscr {E}. \end{aligned}$$
(48.9)

The last inequality follows from the following facts: r(x,t 0)>R, \(|\varepsilon (x, t_{0})|\leq \mathscr {E}\), which also implies \(|\hat{\varepsilon }(x, t_{0})|\leq \mathscr {E}\) with probability one. In the worst case \(r(x, t_{0})+\varepsilon (x, t_{0})-\hat{\varepsilon }(x, t_{0})=R-\mathscr {E}-\mathscr {E}\), which finishes the proof of (48.9). According to (48.8) this implies \(|Y(x, t_{0})-\hat{m}(x, t_{0})|>H\).

Corollary 48.1

Under the above assumptions the jump is detected immediately after its occurrence at each site xΩ where it appears.

The above corollary was obtained under idealized assumptions. In practice, \(\mathscr {E}\) is not known and should be estimated from previous runs. If the errors have a distribution with infinite support, then one can select \(\mathscr {E}\) so as with probability 0<β<1 errors are contained in the interval \([-\mathscr {E}, \mathscr {E}]\). Then, repeating the above reasoning, we can say that with probability at least β jumps will be detected at time t 0 at all sites where they happened. Hence, if jumps appeared at K>θ 1 sites, then the probability that less than θ 1 of them will be detected at t 0 can easily be calculated from the binomial distribution, since the events of detecting or not detecting a jump at each site at t 0 are independent. If a jump in a certain site is not detected at t 0 it will be detected later with a high probability, but evaluating it is not so easy, because heights of the undetected jumps enter into \(\hat{m}(x, t_{0}+j)\), j=1,2,… .

Example

The above approach to spatio-temporal change detection can be used for quality control of continuously running processes like production of plain fabrics, paper, steel sheets, wires, slabs, uniformly painted surfaces etc. The idea is based on a simple constatation that it is very difficult to detect a motion of a uniformly painted or produced surface. In contrary, any defects, having different grey levels than the proper surface, are easier to detect as moving objects, because they are frequently visible at several subsequent images. Additional feature of our approach is its ability to follow slow changes of a background, caused, e.g., by changes of its temperature (see also [4]).

Exactly such circumstances appear when we want to detect defects (darker places) on a proper (bright) surface of a hot copper slab continuously moving before a camera. Slow changes of the proper surface temperature make the task more difficult. Applying the proposed approach with the uniform kernel K, α=0.5 and H=6 grey levels (scale [0,255]) provides the results shown in Fig. 48.1, where matrices B(x,t) and B(x,t+1) are displayed as images. As one can notice, the results are quite satisfactory – the same configuration of two cluster of defects was detected in two subsequent images (enclosed by ellipse) and in the next two, which are not displayed.

Fig. 48.1
figure 1

Defects detected on the copper slab – two subsequent frames (the left image was taken first)