1 Introduction

The quickest change detection/isolation (multidecision) problem is important in a variety of applications. Efficient statistical decision tools are needed for detecting and isolating abrupt changes in the properties of stochastic signals and dynamical systems, ranging from on-line fault diagnosis in complex technical systems (such as networks) to detection/classification in radar, infrared, and sonar signal processing. Early on-line fault diagnosis (detection/isolation) in industrial processes (SCADA systems) helps prevent catastrophic failures.

The quickest multidecision detection/isolation problem is the generalization of the quickest changepoint detection problem to the case of \(K-1\) post-change hypotheses. It is necessary to detect the change in distribution as soon as possible and to indicate which hypothesis is true after a change occurs. Both the rate of false alarms and the misidentification (misisolation) rate should be controlled at given levels.

2 Problem Statement

Let \(X_1, X_2, \ldots \) denote the series of observations, and let \(\nu \) be the serial number of the last pre-change observation. In the multiple-hypothesis case there are several possible post-change hypotheses \(\mathcal{H}_j\), \(j=1,2,\ldots ,K-1\). Let \(\mathbb {P}_k^j\) and \(\mathbb {E}_k^j\) denote the probability measure and the expectation when \(\nu =k\) and \(\mathcal{H}_j\) is the true post-change hypothesis, and let \(\mathbb {P}_\infty \) and \(\mathbb {E}_\infty =\mathbb {E}_0^0\) denote the same when \(\nu =\infty \), i.e., when there is no change. Let (see [1] for details)

$$\begin{aligned} {\mathbb {C}}_\gamma = \left\{ \delta =(T,d) :\min _{0 \le \ell \le K-1} \min _{1 \le j \ne \ell \le K-1}\mathbb {E}_0^{\ell }\left( \inf _{r\ge 1}\{T_r~:~d_r =j\}\right) \ge \gamma \right\} , \end{aligned}$$
(1)

where T is the stopping time, d is the final decision (the index of the accepted post-change hypothesis), and the event \(\{d_r=j\}\) denotes the first false alarm of the j-th type, be the class of detection and isolation procedures for which the average run length (ARL) to false alarm and false isolation is at least \(\gamma >1\). For detection–isolation procedures, the risk associated with the detection delay is defined analogously to Lorden's worst-worst-case criterion and is given by [1]

$$\begin{aligned} {\mathsf {ESADD}}(\delta ) = \max _{1 \le j \le K-1} \sup _{0 \le \nu <\infty } \left\{ \mathrm{esssup}\,\mathbb {E}_{\nu }^j[(T-\nu )^+|{ \mathcal {F}}_{\nu }]\right\} . \end{aligned}$$
(2)

Hence, the minimax optimization problem seeks to

$$\begin{aligned} \text {Find } \delta _{\mathrm {opt}}\in {\mathbb {C}}_\gamma \text { such that } {\mathsf {ESADD}}(\delta _{\mathrm {opt}})= \inf _{\delta \in {\mathbb {C}}_\gamma }{\mathsf {ESADD}}(\delta ) \text { for every } \gamma >1 , \end{aligned}$$
(3)

where \({\mathbb {C}}_\gamma \) is the class of detection and isolation procedures with the lower bound \(\gamma \) on the ARL to false alarm and false isolation defined in (1).
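To make the setting concrete, the observation model above can be sketched as follows. Gaussian laws that differ only in mean, as well as all numeric values, are illustrative assumptions and not part of the original text:

```python
import numpy as np

def simulate(n, nu, j, means, sigma=1.0, seed=None):
    """Observations X_1,...,X_n: the first nu samples follow the pre-change
    law P_0, the remaining ones follow P_j under hypothesis H_j.  Gaussian
    laws differing only in mean are an illustrative assumption."""
    rng = np.random.default_rng(seed)
    x = rng.normal(means[0], sigma, size=n)
    x[nu:] += means[j] - means[0]      # mean shift from sample nu+1 onwards
    return x

# change point nu = 20, hypothesis H_1 (mean 1.5) among K-1 = 2 alternatives
x = simulate(n=40, nu=20, j=1, means=[0.0, 1.5, -1.5], seed=0)
```

A detection/isolation procedure observes such a stream sequentially and must both raise an alarm shortly after sample \(\nu +1\) and name the correct post-change hypothesis.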

Another minimax approach to change detection and isolation is as follows [2, 3]. Unlike the definition of the class \({\mathbb {C}}_\gamma \) in (1), where the changepoint is fixed a priori at \(\nu =0\) in the definition of false isolation to simplify the theoretical analysis, the false isolation rate is now expressed by the maximal probability of false isolation \(\sup _{\nu \ge 0}{\mathbb {P}}^{\ell }_{\nu }(d = j \ne \ell | T > \nu )\). As usual, the level of false alarms is measured by the ARL to false alarm \(\mathbb {E}_{\infty } T\). Hence, define the class

$$\begin{aligned} {\mathbb {C}}_{\gamma ,\beta }=\left\{ \delta =(T,d) :\mathbb {E}_{\infty } T \ge \gamma , ~ \max _{1 \le \ell \le K-1} \max _{1\le j \ne \ell \le K-1}\sup _{\nu \ge 0}{\mathbb {P}}^{\ell }_{\nu }(d =j | T > \nu ) \le \beta \right\} . \end{aligned}$$
(4)

Sometimes Lorden’s worst-worst-case ADD is too conservative, especially for recursive change detection and isolation procedures, and another measure of the detection speed, namely the maximal conditional average delay to detection \({\mathsf {SADD}}(T)=\sup _\nu \mathbb {E}_{\nu }(T-\nu |T>\nu )\), is better suited for practical purposes. In the case of change detection and isolation, the SADD is given by

$$\begin{aligned} {\mathsf {SADD}}(\delta ) = \max _{1 \le j \le K-1}\sup _{0\le \nu <\infty }\mathbb {E}_{\nu }^j(T-\nu |T>\nu ) . \end{aligned}$$
(5)

We require that the \({\mathsf {SADD}}(\delta )\) should be as small as possible subject to the constraints on the ARL to false alarm and the maximum probability of false isolation. Therefore, this version of the minimax optimization problem seeks to

$$\begin{aligned} \text {Find } \delta _{\mathrm {opt}}\in {\mathbb {C}}_{\gamma ,\beta } \text { such that }&{\mathsf {SADD}}(\delta _{\mathrm {opt}})=\inf \nolimits _{\delta \in {\mathbb {C}}_{\gamma ,\beta }}{\mathsf {SADD}}(\delta ) \nonumber \\&\qquad \quad \text { for every } \gamma >1 \text { and } \beta \in (0,1). \end{aligned}$$
(6)

A detailed description of the developed theory and some practical examples can be found in the recently published book [4].

3 Efficient Procedures of Quickest Change Detection/Isolation

Asymptotic Theory. In this subsection we recall a lower bound for the worst-case mean detection/isolation delay over the class \({\mathbb {C}}_\gamma \) of sequential change detection/isolation tests proposed in [1]. We start with a technical result on sequential multiple-hypothesis tests and then give an asymptotic lower bound for \({\mathsf {ESADD}}(\delta )\).

Lemma 1

Let \((X_k)_{k\ge 1}\) be a sequence of i.i.d. random variables. Let \(\mathcal{H}_0,\ldots ,\mathcal{H}_{K-1}\) be \(K \ge 2\) hypotheses, where \(\mathcal{H}_i\) is the hypothesis that \(X_k\) has density \(f_i\) with respect to some probability measure \(\mu \), for \(i=0,\ldots ,K-1\), and assume that

$$ 0 < \rho _{ij} \mathop {\overset{{\tiny def.}}{=}}\nolimits \int f_i \log \frac{f_i}{f_j} \,d\mu < \infty , \;\;\;0 \le i\ne j \le K-1. $$

Let \(\mathbb {E}_i(N)\) be the average sample number (ASN) in a sequential test \((N,\delta )\) which chooses one of the K hypotheses subject to a \(K \times K\) error matrix \(A=\left[ a_{ij}\right] \), where \(a_{ij}=\mathbb {P}_i(\textit{accepting}\;\;\mathcal{H}_j),\;i,j= 0,\ldots ,K-1\).

Let us reparameterize the matrix A as follows:

$$ A = \left( \begin{array}{cccc} 1-\sum _{\ell =1}^{K-1}\alpha _\ell &{} \alpha _1 &{} \ldots &{} \alpha _{K-1} \\ \gamma _1 &{} 1-\sum _{\ell =2}^{K-1} \beta _{1,\ell } -\gamma _1 &{} \ldots &{} \beta _{1,K-1} \\ \gamma _2 &{} \beta _{2,1} &{} \ldots &{} \beta _{2,K-1} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \gamma _{K-1} &{} \beta _{K-1,1} &{} \ldots &{} 1-\sum _{\ell =1}^{K-2}\beta _{K-1,\ell }-\gamma _{K-1} \end{array} \right) . $$

Then a lower bound for the ASN \(\mathbb {E}_i(N)\) is given by the following formula:

$$\begin{aligned} \mathbb {E}_i(N) \ge \max \left\{ \frac{(1-\tilde{\gamma }_i)\log \left( \sum _{\ell =1}^{K-1}\alpha _\ell \right) ^{-1}-\log 2}{\rho _{i0}},\; \max _{1\le j \ne i \le K-1} \frac{(1-\tilde{\gamma }_i) \log \beta _{ji}^{-1}-\log 2}{\rho _{ij}} \right\} \end{aligned}$$

for \(i=1,\ldots ,K-1\), where

$$ \tilde{\gamma }_i = \gamma _i + \sum _{\ell =1,\,\ell \ne i}^{K-1}\beta _{i,\ell }. $$

Theorem 1

Let \((Y_k)_{k \ge 1}\) be an independent random sequence observed sequentially:

$$ \mathcal{L}({ Y}_k) = \left\{ \begin{array}{ll} {P}_0 &{} \text {if}\;\; k \le \nu \\ {P}_\ell &{} \text {if}\;\; k \ge \nu +1 \end{array} \right. ,\;\; \nu = 0,1,2,\ldots ,\;\;\text {for}\;\;1 \le \ell \le K-1. $$

The distribution \(P_\ell \) has density \(f_\ell \), \(\ell =0,\ldots ,K-1\). An asymptotic lower bound for \({\mathsf {ESADD}}(\delta )\), which extends the result of Lorden [5] to the multiple-hypothesis case, is:

$$ {\mathsf {ESADD}}(T;\gamma ) ~\gtrsim ~ \frac{\log \gamma }{\rho ^*} \;\;\text {as}\;\; \gamma \rightarrow \infty , $$

where

$$ \rho ^* \mathop {\overset{{\tiny def.}}{=}}\nolimits \min _{1 \le \ell \le K-1}~ \min _{0 \le j \ne \ell \le K-1}~ \rho _{\ell ,j},\;\;\text {where}\;\; 0 < \rho _{\ell ,j} \mathop {\overset{{\tiny def.}}{=}}\nolimits \mathbb {E}_1^{\ell }\left( \log \frac{f_{\ell }(Y_i)}{f_{j}(Y_i)} \right) < \infty $$

is the Kullback-Leibler (K-L) information.
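For Gaussian densities \(\mathcal{N}(\mu _\ell ,\sigma ^2)\), the K-L information has the closed form \(\rho _{\ell ,j}=(\mu _\ell -\mu _j)^2/(2\sigma ^2)\). A minimal sketch checking this closed form against a Monte Carlo estimate (all parameters are illustrative):

```python
import numpy as np

def kl_gauss(mu_l, mu_j, sigma):
    # closed-form K-L information between N(mu_l, sigma^2) and N(mu_j, sigma^2)
    return (mu_l - mu_j) ** 2 / (2 * sigma ** 2)

def kl_monte_carlo(mu_l, mu_j, sigma, n=200_000, seed=0):
    # E_l[log f_l(Y) / f_j(Y)] estimated from samples drawn under f_l
    y = np.random.default_rng(seed).normal(mu_l, sigma, size=n)
    log_lr = ((y - mu_j) ** 2 - (y - mu_l) ** 2) / (2 * sigma ** 2)
    return log_lr.mean()

print(kl_gauss(1.0, 0.0, 1.0))         # 0.5
print(kl_monte_carlo(1.0, 0.0, 1.0))   # close to 0.5
```

Since \(\rho ^*\) is the smallest of these pairwise K-L numbers, the hardest pair of hypotheses to distinguish governs the asymptotic delay.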

Generalized CUSUM Test. The generalized CUSUM (non-recursive) test asymptotically attains the above-mentioned lower bound [1]. Let us introduce the following stopping time and final decision

$$ \tilde{N}=\min \{\tilde{N}^1,\ldots ,\tilde{N}^{K-1}\};\;\; \tilde{d} = \mathrm{argmin}\{\tilde{N}^1,\ldots ,\tilde{N}^{K-1}\} $$

of the detection/isolation algorithm. The stopping time \(\tilde{N}^\ell \) is responsible for the detection of hypothesis \(\mathcal{H}_\ell \):

$$\begin{aligned} \tilde{N}^\ell&= \inf _{k \ge 1}\tilde{N}^\ell (k),\;\; \tilde{N}^\ell (k) = \inf \left\{ n \ge k : \min _{0\le j \ne \ell \le K-1}S_k^n(\ell ,j) \ge h \right\} ,\\ \tilde{N}^\ell&= \inf \left\{ n\ge 1~: \max _{1 \le k \le n}\min _{0\le j \ne \ell \le K-1}S_k^n(\ell ,j)\ge h\right\} ,\;\;S_k^n(\ell ,j) = \sum _{i=k}^n\log \frac{f_{\ell }({Y}_i)}{f_{j} ({Y}_i)}. \end{aligned}$$
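A minimal sketch of this stopping rule, assuming Gaussian mean-shift hypotheses with known variance and a noise-free data stream (the distributions, threshold, and data are invented for illustration):

```python
import numpy as np

def gcusum_stop(y, means, sigma, h):
    """Generalized (non-recursive) CUSUM test: stop at the first n such that
    max_{1<=k<=n} min_{0<=j!=l<=K-1} S_k^n(l, j) >= h for some l, and return
    (stopping time, accepted hypothesis).  Gaussian mean-shift likelihoods
    with known sigma are an illustrative assumption."""
    means = np.asarray(means, dtype=float)
    K = len(means)                            # hypotheses 0, 1, ..., K-1
    z = np.empty((0, K))                      # per-sample log-likelihoods
    for n, yn in enumerate(y, start=1):
        z = np.vstack([z, -((yn - means) ** 2) / (2 * sigma ** 2)])
        cums = z[::-1].cumsum(axis=0)[::-1]   # row k-1 holds sums over i=k..n
        for l in range(1, K):
            s = cums[:, l][:, None] - cums    # S_k^n(l, j) for all k and j
            other = [j for j in range(K) if j != l]
            if s[:, other].min(axis=1).max() >= h:
                return n, l
    return None, None

# noise-free stream: mean 0 up to nu = 20, then mean 1.5 (hypothesis 1)
y = [0.0] * 20 + [1.5] * 40
print(gcusum_stop(y, means=[0.0, 1.5, -1.5], sigma=1.0, h=8.0))  # (28, 1)
```

Since the inner maximization runs over all candidate changepoints k, the per-sample cost of this test grows with n; this is the practical motivation for recursive variants.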

The generalized matrix recursive CUSUM test, which also attains the asymptotic lower bound, has been considered in [6, 7]. Let us introduce the following stopping time and final decision

$$ \widehat{N}_r=\min \{\widehat{N}^1,\ldots ,\widehat{N}^{K-1}\};\;\; \widehat{d}_r = \mathrm{argmin}\{\widehat{N}^1,\ldots ,\widehat{N}^{K-1}\} $$

of the detection/isolation algorithm. The stopping time \(\widehat{N}^\ell \) is responsible for the detection of hypothesis \(\mathcal{H}_\ell \):

$$\begin{aligned} \widehat{N}^\ell&= \inf \left\{ n \ge 1: \min _{0\le j\ne \ell \le K-1}Q_n(\ell ,j)\ge h\right\} ,\\ Q_n(\ell ,j)&= \left( Q_{n-1}(\ell ,j) + Z_n(\ell ,j)\right) ^+,\;\;Z_n(\ell ,j) = \log \frac{f_{\ell }({Y}_n)}{f_{j} ({Y}_n)}. \end{aligned}$$
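A minimal sketch of the recursion \(Q_n(\ell ,j)=(Q_{n-1}(\ell ,j)+Z_n(\ell ,j))^+\), again assuming Gaussian mean-shift hypotheses and invented parameters:

```python
import numpy as np

def matrix_recursive_cusum(y, means, sigma, h):
    """Matrix recursive CUSUM: one recursion Q_n(l, j) = (Q_{n-1} + Z_n)^+
    per ordered pair (l, j); stop when min_{j != l} Q_n(l, j) >= h for some
    l >= 1.  Gaussian mean-shift likelihoods are an illustrative assumption."""
    means = np.asarray(means, dtype=float)
    K = len(means)
    Q = np.zeros((K, K))
    for n, yn in enumerate(y, start=1):
        ll = -((yn - means) ** 2) / (2 * sigma ** 2)
        Z = ll[:, None] - ll[None, :]         # Z_n(l, j) = log f_l / f_j
        Q = np.maximum(Q + Z, 0.0)            # elementwise (.)^+ recursion
        for l in range(1, K):
            if min(Q[l, j] for j in range(K) if j != l) >= h:
                return n, l
    return None, None

# noise-free stream: mean 0 up to nu = 20, then mean 1.5 (hypothesis 1)
y = [0.0] * 20 + [1.5] * 40
print(matrix_recursive_cusum(y, means=[0.0, 1.5, -1.5], sigma=1.0, h=8.0))  # (28, 1)
```

In contrast with the non-recursive statistic, the memory and per-sample cost here are constant: only the \(K \times K\) matrix \(Q_n\) is stored.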

For some safety-critical applications, a more tractable criterion consists in minimizing the maximal detection/isolation delay:

$$\begin{aligned} {\mathsf {SADD}}(\delta ) = \max _{1 \le j \le K-1}\sup _{0\le \nu <\infty }\mathbb {E}_{\nu }^j(T-\nu |T>\nu ) . \end{aligned}$$
(7)

subject to the constraint \(\delta \in {\mathbb {C}}_{\gamma ,\beta }\), where

$$ {\mathbb {C}}_{\gamma ,\beta }=\left\{ \delta ~: \mathbb {E}_{\infty } T \ge \gamma , ~ \max _{1 \le \ell \le K-1} \max _{1\le j \ne \ell \le K-1}\sup _{\nu \ge 0}{\mathbb {P}}^{\ell }_{\nu }(d =j | T > \nu ) \le \beta \right\} . $$

An asymptotic lower bound in this case is given by the following theorem [3].

Theorem 2

Let \((Y_k)_{k \ge 1}\) be an independent random sequence observed sequentially:

$$ \mathcal{L}({ Y}_k) = \left\{ \begin{array}{ll} {P}_0 &{} \text {if}\;\; k \le \nu \\ {P}_\ell &{} \text {if}\;\; k \ge \nu +1 \end{array} \right. ,\;\; \nu = 0,1,2,\ldots ,\;\;\text {for}\;\;1 \le \ell \le K-1. $$

Then

$$ {\mathsf {SADD}}(N;\gamma ,\beta ) \gtrsim \max \left\{ \frac{\log \gamma }{\rho ^*_{\mathrm {d}}},~ \frac{\log \beta ^{-1}}{\rho ^*_{\mathrm {i}}} \right\} \;\;\text {as}\;\;\min \{\gamma , \beta ^{-1}\} \rightarrow \infty , $$

where \(\rho ^*_{\mathrm {d}} = \min _{1 \le j \le K-1}\rho _{j,0}\) and \(\rho ^*_{\mathrm {i}} = \min _{1 \le \ell \le K-1}\min _{1 \le j \ne \ell \le K-1}\rho _{\ell ,j}\).

Vector Recursive CUSUM Test. If \(\gamma \rightarrow \infty \), \(\beta \rightarrow 0\), and \(\log \gamma \ge \log \beta ^{-1}(1+o(1))\), then the above-mentioned lower bound can be attained by the following recursive change detection/isolation algorithm [2]:

$$ {N}_{r}=\min _{1\le \ell \le K-1}\{N_{r}(\ell )\},\;\;\;\; {d}_{r}=\arg \min _{1\le \ell \le K-1}\{N_{r}(\ell )\}, $$

where \({N}_r(\ell )=\inf \left\{ n \ge 1:\min _{0\le j \ne \ell \le K-1}\left[ S_n(\ell ,j)- h_{\ell ,j}\right] \ge 0\right\} \),

$$ S_n(\ell ,j)= g_n(\ell ,0)- g_n(j,0),\;g_n(\ell ,0)=\left( {g}_{n-1}(\ell ,0)+Z_n(\ell ,0)\right) ^+, $$

with \(Z_n(\ell ,0)=\log \frac{f_{\ell }(Y_n)}{f_{0}(Y_n)}\), \({g}_0(\ell ,0)=0\) for every \(1 \le \ell \le K-1\) and \(g_n(0,0) \equiv 0\),

$$ h_{\ell ,j}=\left\{ \begin{array}{ll} h_{d} &{} \text {if}\;\; 1 \le \ell \le K-1 \;\text {and}\; j=0,\\ h_{i} &{} \text {if}\;\; 1 \le j,\ell \le K-1 \;\text {and}\; j \ne \ell . \end{array} \right. $$
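A minimal sketch of the vector recursive test; note that only the \(K-1\) statistics \(g_n(\ell ,0)\) are stored. Gaussian mean-shift hypotheses, the thresholds, and the data are invented for illustration:

```python
import numpy as np

def vector_recursive_cusum(y, means, sigma, h_d, h_i):
    """Vector recursive CUSUM: only the K-1 statistics g_n(l, 0) are updated;
    S_n(l, j) = g_n(l, 0) - g_n(j, 0) with g_n(0, 0) = 0, detection threshold
    h_d against j = 0 and isolation threshold h_i against j != 0.  Gaussian
    mean-shift likelihoods are an illustrative assumption."""
    K = len(means)
    g = np.zeros(K)                            # g[0] stays identically 0
    for n, yn in enumerate(y, start=1):
        for l in range(1, K):
            z = ((yn - means[0]) ** 2 - (yn - means[l]) ** 2) / (2 * sigma ** 2)
            g[l] = max(g[l] + z, 0.0)          # g_n(l,0) = (g_{n-1} + Z_n)^+
        for l in range(1, K):
            if all(g[l] - g[j] >= (h_d if j == 0 else h_i)
                   for j in range(K) if j != l):
                return n, l
    return None, None

# noise-free stream: mean 0 up to nu = 20, then mean 1.5 (hypothesis 1)
y = [0.0] * 20 + [1.5] * 40
print(vector_recursive_cusum(y, [0.0, 1.5, -1.5], 1.0, h_d=8.0, h_i=8.0))  # (28, 1)
```

Compared with the matrix recursive test, the state shrinks from a \(K \times K\) matrix to a vector of length \(K-1\), at the price of the separate thresholds \(h_d\) and \(h_i\).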

4 Applications to Network Monitoring

In this section the above-mentioned theoretical results are illustrated by applying the proposed detection/isolation procedures to the problem of network monitoring.

Let us consider a network composed of r nodes and n mono-directional links, where \(y_\ell \) denotes the volume of traffic on link \(\ell \) at discrete time k (see details in [8, 9]). For the sake of simplicity, the time subscript k is omitted for now. Let \(x_{i,j}\) be the Origin-Destination (OD) traffic demand from node i to node j at time k. The traffic matrix \(X=\{x_{i,j}\}\) is reordered in lexicographic order as a column vector \(X={(x_{(1)},\ldots ,x_{(m)})}^T\), where \(m=r^2\) is the number of OD flows.

Let us define an \(n \times m\) routing matrix \(A=\left[ a_{\ell ,k}\right] \), where \(0 \le a_{\ell ,k} \le 1\) represents the fraction of the volume of OD flow k that is routed through link \(\ell \). This leads to the linear model

$$ Y=A\,X, $$

where \(Y={(y_1,\ldots ,y_n)}^T\) is the vector of Simple Network Management Protocol (SNMP) measurements. Without loss of generality, the known matrix A is assumed to be of full row rank, i.e., \(\mathrm{rank}\,{A}=n\).
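A toy numeric illustration of the linear model \(Y=A\,X\) (all numbers invented): r = 2 nodes give \(m = r^2 = 4\) OD flows, observed through only n = 2 link loads, mirroring the fact that \(n \ll m\) in realistic networks:

```python
import numpy as np

# Invented toy routing: r = 2 nodes, m = r^2 = 4 OD flows, n = 2 links;
# a_{l,k} is the fraction of OD flow k routed through link l.
A = np.array([[1.0, 1.0, 0.0, 0.0],    # link 1 carries OD flows 1 and 2
              [0.0, 0.0, 1.0, 1.0]])   # link 2 carries OD flows 3 and 4
X = np.array([3.0, 1.0, 2.0, 4.0])     # OD demands, lexicographic order
Y = A @ X                              # SNMP link-load measurements
print(Y)                               # [4. 6.]
assert np.linalg.matrix_rank(A) == A.shape[0]   # full row rank, as assumed
```

Many different demand vectors X produce the same measurements Y here, which is exactly why an ambient-traffic model is needed before anomalies in individual OD flows can be isolated.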

The problem consists in detecting and isolating a significant volume anomaly in an OD flow \(x_{i,j}\) by using only the SNMP measurements \(y_1,\ldots ,y_n\). In fact, the main difficulty with the SNMP measurements is that \(n \ll m\). To overcome this difficulty, a parsimonious linear model of non-anomalous traffic has been developed in [10–17].

The derivation of this model includes two steps: (i) description of the ambient traffic by using a spatial stationary model and (ii) linear approximation of the model by using piecewise polynomial splines.

The idea of the spline model is that the non-anomalous (ambient) traffic at each time k can be represented by using a known family of basis functions superimposed with unknown coefficients, i.e., it is assumed that

$$ X_k \approx B\mu _k,\;\;k=1,2,\ldots , $$

where the \(m \times q\) matrix B is assumed to be known and \(\mu _k \in \mathbb {R}^q\) is a vector of unknown coefficients such that \(q < n\). Finally, it is assumed that the model residuals together with the natural variability of the OD flows follow a Gaussian distribution, which leads to the following equation:

$$\begin{aligned} X_k=B\mu _k+\xi _k \end{aligned}$$
(8)

where \(\xi _k \sim {\mathcal N}(0,\varSigma )\) is Gaussian noise with the \(m \times m\) diagonal covariance matrix \(\varSigma =\mathrm{diag}{(\sigma _1^2,\ldots ,\sigma _m^2)}\). The advantages of a detection algorithm based on a parametric model of the ambient traffic, and its comparison with a non-parametric approach, are discussed in [11, 14] (see also [18] for a PCA-based approach). Hence, the link-load measurement model is given by the following linear equation:

$$\begin{aligned} Y_k=A\,B\mu _k+A\xi _k=H\mu _k + \zeta _k+[\theta _\ell ], \end{aligned}$$
(9)

where \(Y_k={(y_1,\ldots ,y_n)_k}^T\) and \(\zeta _k \sim {\mathcal N}(0,A\varSigma A^T)\). Without loss of generality, the resulting matrix \(H=A\,B\) is assumed to be of full column rank. Typically, when an anomaly occurs in OD flow \(\ell \) at time \(\nu +1\) (the changepoint), the vector \(\theta _\ell \) has the form \(\theta _\ell =\varepsilon \,a(\ell )\), where \(a(\ell )\) is the \(\ell \)-th normalized column of A and \(\varepsilon \) is the intensity of the anomaly. The goal is to detect/isolate the presence of the anomalous vector \(\theta _\ell \) (the bracketed term in (9), present only after the change), which cannot be explained by the ambient traffic model \(X_k \approx B\mu _k\).

Therefore, after the de-correlation transformation, the change detection/isolation problem is based on the following model with the nuisance parameter \(X_k\):

$$\begin{aligned} Y_k =H X_k + \xi _k + \theta (k,\nu ),\;\;\xi _k \sim {\mathcal N}(0, \sigma ^2 I_n),\;\;k=1,2,\ldots , \end{aligned}$$
(10)

where H is a full-rank matrix of size \(n \times q\), \(n>q\), and \(\theta (k,\nu )\) is a change occurring at time \(\nu +1\), namely:

$$ \theta (k,\nu ) = \left\{ \begin{array}{ll} 0 &{} \text {if}\;\; k \le \nu \\ \theta _\ell &{} \text {if}\;\; k \ge \nu +1 \end{array}\right. ,\;\;1 \le \ell \le K-1. $$

This problem is invariant under the group \(G=\{Y \rightarrow g(Y)=Y+HX\}\) (see details in [19]). The invariant test is based on a maximal invariant statistic, obtained by projecting Y onto the orthogonal complement \(R(H)^{\bot }\) of the column space R(H) of the matrix H. The parity vector \(Z=W Y\) is a maximal invariant to the group G, where the matrix W satisfies

$$ WH=0,\;\;W^TW=P_{H}^{\bot }=I_n-H(H^T H)^{-1}H^T,\;\;WW^T=I_{n-q}. $$

The transformation by W removes the interference of the nuisance parameter X:

$$ Z=WY=W\xi \;(+\,W\theta ). $$

Hence, the sequential change detection/isolation problem can be re-written as

$$ Z_k=WY_k=W\xi _k+W\theta (k,\nu ),\;\;W\xi _k \sim {\mathcal N}(0, \sigma ^2 I_{n-q}),\;\;k=1,2,\ldots . $$
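The construction of W and the rejection of the nuisance can be sketched as follows. Building W from the SVD of H is one standard choice, not necessarily the one used in the cited works, and all dimensions and numeric inputs below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 6, 2
H = rng.normal(size=(n, q))            # hypothetical full-column-rank matrix H

# Rows of W: an orthonormal basis of the left null space of H, taken from
# the SVD, so that W H = 0 and W W^T = I_{n-q}.
U, _, _ = np.linalg.svd(H, full_matrices=True)
W = U[:, q:].T                         # (n - q) x n

x_nuis = rng.normal(size=q)            # nuisance parameter X_k
theta = rng.normal(size=n)             # anomaly vector theta_ell (invented)

z_clean = W @ (H @ x_nuis)             # nuisance only: parity vector is 0
z_anom = W @ (H @ x_nuis + theta)      # anomaly survives: equals W @ theta
```

Whatever the value of the nuisance parameter, the parity vector is zero in the absence of an anomaly and equals \(W\theta \) after the change, so the sequential tests of the previous section can be applied directly to \(Z_k\).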

Theorem 3

Let \((Y_k)_{k \ge 1}\) be the output of model (10) observed sequentially. Then the generalized CUSUM and matrix recursive CUSUM tests attain the lower bound corresponding to the minimax setup:

$$ {\mathsf {ESADD}}(N;\gamma ) ~\gtrsim ~ \frac{\log \gamma }{\overline{\rho ^*}} \;\;\text {as}\;\; \gamma \rightarrow \infty ,\;\;\overline{\rho ^*} \mathop {\overset{{\tiny def.}}{=}}\nolimits \inf _{X^\ell , X^j}\min _{1 \le \ell \le K-1}~ \min _{0 \le j \ne \ell \le K-1}~ \rho _{\ell ,j}(X^\ell ,X^j)$$

where \(X^\ell \) (resp. \(X^j\)) corresponds to the hypothesis \({\mathcal H}_\ell \) (resp. \({\mathcal H}_j\)). The vector recursive CUSUM test attains the lower bound

$$ {\mathsf {SADD}}(N;\gamma ,\beta ) \gtrsim \max \left\{ \frac{\log \gamma }{\overline{\rho ^*}_{\mathrm {d}}},~ \frac{\log \beta ^{-1}}{\overline{\rho ^*}_{\mathrm {i}}} \right\} \;\text {as}\;\gamma \rightarrow \infty ,\;\beta \rightarrow 0, \log \gamma \ge \log \beta ^{-1}(1+o(1)),$$

where

$$ \overline{\rho ^*}_{\mathrm {d}}\! =\inf _{X^j, X^0}\!\min _{1\! \le j \le K-1}\rho _{j,0}(X^j,X^0)\;\;\text {and}\;\;\overline{\rho ^*}_{\mathrm {i}} \!=\inf _{X^\ell , X^j}\!\min _{1 \le \ell \le K-1}\min _{1 \le j \ne \ell \le K-1}\rho _{\ell ,j}(X^\ell ,X^j). $$