Introduction

Quickest change-point detection is concerned with the design and analysis of reliable statistical machinery for rapid detection of changes that may spontaneously affect a “live” process, continuously monitored via sequentially made observations. See, e.g., [24] or [33, Part II]. A quickest change-point detection procedure is a stopping time adapted to the observed data, and is a rule whereby one is to stop and “sound an alarm” that the characteristics of the observed process may have (been) changed. A “good” (i.e., optimal or nearly optimal) detection procedure is one that minimizes (or nearly minimizes) the desired detection delay penalty, subject to a constraint on the false alarm risk. For an overview of the major optimality criteria see, e.g., [18, 23, 32, 38] or [33, Part II].

A problem particularly persistent in applied change-point detection (e.g., in quality control) is evaluation of detection procedures’ performance. To that end, the ideal would be to have the needed performance metrics expressed exactly and in a closed and simple form. However, this is generally quite difficult mathematically, if at all possible. Part of the reason is that the renewal equations that many popular performance metrics satisfy are Fredholm integral equations of the second kind (possibly written as equivalent differential equations), and such equations seldom allow for an analytical solution. As a result, the standard practice has been to evaluate the performance numerically (one particularly popular approach has been to devise an asymptotic approximation of some sort). Nevertheless, some exact performance formulae have been derived explicitly, although primarily for the “mainstream” detection methods. For instance, a number of characteristics of the celebrated CUSUM “inspection scheme” (due to [13]) have been expressed explicitly, e.g., in [1, 2, 6, 7, 25, 37],Footnote 1 although for only a handful of scenarios. Likewise, exact closed-form formulae for various performance metrics of the famous EWMA chart (due to [26]) in an exponential scenario have been established, e.g., in [3, 12, 21] (see footnote 1).

However, the corresponding progress made to date for the classical Shiryaev–Roberts (SR) procedure (due to [2729]) is far more modest (except for the continuous-time case), and especially little has been done for the Generalized SR (GSR) procedure, which was introduced recently in [11] as a “headstarted” version of the classical SR procedure. Since the latter is a special case of the GSR procedure (when the headstart is zero), from now on we will follow [34] and use the term “GSR procedure” to refer to both procedures. As a matter of fact, to the best of our knowledge, exact and explicit formulae for a small subset of characteristics of the GSR procedure have been obtained only in [4, 9, 10, 14, 22, 23, 35, 40]. The purpose of this work is to add on to this list. Specifically, we obtain an exact, closed-form formula for the standard (minimax) Average Run Length (ARL) to false alarm delivered by the GSR procedure devised to detect a jump in the common baseline mean of a sequence of independent exponentially distributed observations. The formula is found analytically, through direct solution of the respective renewal (integral) equation, and is valid for an arbitrary (nonnegative) headstart, with the detection threshold not restricted from above. Furthermore, the formula is remarkably simple (it is a function linear in the detection threshold and in the headstart) and, unlike its complicated and cumbersome CUSUM and EWMA counterparts, can be used to compute the GSR procedure’s ARL to false alarm (in the exponential scenario) essentially “by hand”. This would clearly be of aid to a practitioner.

Preliminaries

The centerpiece of this work is the (minimax) Average Run Length (ARL) to false alarm of the Generalized Shiryaev–Roberts (GSR) detection procedure (due to [11]) considered in the context of the basic minimax quickest change-point detection problem (see, e.g., [8, 14]). As a performance metric, the ARL to false alarm was apparently introduced in [13]; see also, e.g., [8].

Let f (x) and f 0(x) denote, respectively, the observations’ pdf in the pre- and post-change regime. Let Λ n f 0(X n )/f (X n ) be the “instantaneous” likelihood ratio (LR) for the n-th data point, X n . The GSR procedure (due to [11]) is then formally defined as the stopping time

$$ \mathscr{S}_{A}^r \triangleq \inf\bigl\{ n \ge1\colon R_n^r \ge A\bigr\} , \quad \mbox{such that } \inf\{\varnothing\} =\infty, $$
(7.1)

where A>0 is a detection threshold used to control the false alarm risk, and

$$ R_{n+1}^r = \bigl(1+R_{n}^r \bigr)\varLambda _{n+1} \quad \mbox{for } n=0,1,\ldots\ \mbox{with }R_0^r=r\ge0, $$
(7.2)

is the GSR detection statistic. We remark that \(R_{0}^{r}=r\ge0\) is a design parameter referred to as the headstart and, in particular, when \(R_{0}^{r}=r=0\), the GSR procedure is equivalent to the classical Shiryaev–Roberts (SR) procedure (due to [2729]); a brief account of the SR procedure’s history may be found, e.g., in [16]. Albeit “young” (the GSR procedure was proposed in 2011), it has already been shown (see, e.g., [17, 22, 30, 34, 35]) to possess very strong optimality properties, not exhibited by the CUSUM scheme or the EWMA chart; in fact, in certain scenarios, the latter two charts have been found experimentally to be inferior to the GSR procedure.

Let \(\mathbb {P}_{\infty}\) (\(\mathbb {E}_{\infty}\)) be the probability measure (expectation) induced by the observations in the pre-change regime, i.e., when X n f (x) for all n≥1. The ARL to false alarm of the GSR procedure is defined as \(\mathrm {ARL}(\mathscr{S}_{A}^{r})\triangleq \mathbb {E}_{\infty}[\mathscr{S}_{A}^{r}]\). A key property of the GSR statistic (7.2) is that the sequence \(\{R_{n}^{r}-n-r\} _{n\ge0}\) is a zero-mean \(\mathbb {P}_{\infty}\)-martingale, i.e., \(\mathbb {E}_{\infty}[R_{n}^{r}-n-r]=0\) for all n≥0 and all r. This and Doob’s Optional stopping (sampling) theorem (see, e.g., [33, Theorem 2.3.1, p. 31]) imply that \(\mathbb {E}_{\infty}[R_{\mathscr{S}_{A}^{r}}-\mathscr{S}_{A}^{r}-r]=0\), so that \(\mathrm {ARL}(\mathscr{S}_{A}^{r})=\mathbb {E}_{\infty}[R_{\mathscr{S}_{A}^{r}}]-r\ge A-r\). As a result, to ensure that \(\mathrm {ARL}(\mathscr{S}_{A}^{r})\ge\gamma\) for a desired γ>1, it suffices to pick A and r from the solution set of the inequality Arγ and such that A>0 and r≥0.

A more accurate result is the approximation \(\mathrm {ARL}(\mathscr {S}_{A}^{r})\approx (A/\xi)-r\) valid for sufficiently large A>0; see, e.g., [15, Theorem 1] or [34]. To define ξ, let \(S_{n}\triangleq\sum_{i=1}^{n}\log \varLambda _{n}\) for n≥1, and let τ a ≜inf{n≥1:S n a} for a>0 (again, with the understanding that inf{∅}=∞). Then \(\kappa_{a}\triangleq S_{\tau_{a}}-a\) is the so-called “overshoot” (excess over the level a>0 at stopping), and \(\xi\triangleq\lim_{a\to\infty} \mathbb {E}_{0}[e^{-\kappa_{a}}]\), and is referred to as the “limiting average exponential overshoot”; here \(\mathbb {E}_{0}\) denotes the expectation under the probability measure induced by the observations in the post-change regime, i.e., when X n f 0(x) for all n≥1. In general, ξ is clearly between 0 and 1, and is a model-dependent constant, which falls within the scope of nonlinear renewal theory; see, e.g., [39], [38, Section II.C] or [33, Section 2.6].

We now state the main equation that we shall deal with (and, in fact, solve analytically) in the next section in a certain exponential scenario. Let \(P_{\infty}^{\varLambda }(t)\triangleq \mathbb {P}_{\infty}({\varLambda _{1}\le t})\), t≥0, be the cdf of the LR under probability measure \(\mathbb {P}_{\infty}\). Let \(R_{0}^{r=x}=r=x\ge0\) be fixed and define

$$ \mathscr{K}_\infty(x,y) \triangleq \frac{\partial}{\partial y} \mathbb {P}_\infty\bigl(R_{n+1}^r\le y \big|R_n^r=x \bigr) = \frac{\partial}{\partial y}P_{\infty}^{\varLambda } \biggl(\frac{y}{1+x} \biggr), \quad \mbox{for } x,y\ge0, $$
(7.3)

i.e., the transition probability density kernel for the homogeneous Markov process \(\{R_{n}^{r}\}_{n\ge0}\) under probability measure \(\mathbb {P}_{\infty}\).

From now on, let \(\ell(x,A)\triangleq \mathrm {ARL}(\mathscr{S}_{A}^{r=x})\). It is shown, e.g., in [11], that (x,A) is governed by the renewal equation

$$ \ell(x,A) = 1+\int_0^A \mathscr{K}_{\infty}(x,y) \ell(y,A)\,dy, $$
(7.4)

where x≥0 and A>0. The question of existence and uniqueness of solution for this equation has been answered in the affirmative, e.g., in [11]. It is this equation, viz. the exact solution thereof in a specific exponential scenario, that is the centerpiece of this work.

Equation (7.4) is a Fredholm (linear) integral equation of the second kind. Since for such equations an analytical solution is rarely a possibility, they are usually solved numerically. Numerical schemes specifically for Eq. (7.4) have been developed and applied, e.g., in [11, 20, 36]. However, it turns out that in a certain exponential scenario it is possible to solve (7.4) analytically, and, more importantly, the solution is a simple linear function of x and A, just as one would expect from the approximation \(\mathrm {ARL}(\mathscr{S}_{A}^{r})\approx (A/\xi)-r\) mentioned earlier. This is the main result of this paper, it generalizes [5, Proposition 1], and the details are given in the next section.

The Main Result

We are now in a position to establish the main result of this work, i.e., derive analytically an exact closed-form formula for the ARL to false alarm exhibited by the GSR procedure (7.1)–(7.2) “tasked” to detect a change in the baseline (common) mean of a series of independent exponentially distributed observations. More concretely, suppose the observations’ pre- and post-change pdf’s are

(7.5)

respectively, where θ>0, a known parameter with an obvious interpretation: it is the magnitude of the shift in the mean of the exponential distribution, so that the higher (lower) the value of θ, the more (less) contrast the mean shift is, and the easier (harder) it is to detect. We shall from now on refer to this scenario as the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model, to reflect not only the throughout “exponentiality” of the data, but also that their mean is 1 pre-change and 1+θ>1 post-change. For a motivation to consider this model, see, e.g., [4, 31], or [33, Section 3.1.6].

To “tailor” the general equation (7.4) on the ARL to false alarm to the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model, the first step is to find Λ n f 0(X n )/f (X n ). To that end, it is easy to see from (7.5) that

$$ \varLambda _n = \frac{1}{1+\theta}\exp \biggl\{ \frac{\theta}{1+\theta} X_n \biggr\} , \quad n\ge1, $$
(7.6)

and we note that since X n ≥0 w.p. 1 for all n≥1 under any probability measure, it can be deduced that Λ n ≥1/(1+θ) w.p. 1 for all n≥1, also under any probability measure. The latter inequality is a circumstance with consequences, which are illustrated in the following two results.

Lemma 7.1

For the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model (7.5), the pre-change transition probability density kernel, 𝒦(x,y), defined by (7.3), is given by the formula:

(7.7)

where it is understood that x≥0.

Proof

The desired result can be established directly from (7.3), i.e., the definition of the pre-change transition probability density kernel, 𝒦(x,y), combined with (7.6), i.e., the formula for the LR specific to the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model (7.5). The presence of the indicator function in the right-hand side of (7.7) is an implication of the aforementioned inequality Λ n ≥1/(1+θ) valid w.p. 1 for all n≥1 and under any probability measure. □

Now, with (7.7) put in place of 𝒦(x,y) in the general equation (7.4) the latter takes on the form

$$ \ell(x,A) = 1+\theta^{-1}(1+\theta)^{-1/\theta}(1+x)^{1+1/\theta} \int_{(1+x)/(1+\theta)}^Ay^{-2-1/\theta}\ell(y,A)\,dy, $$
(7.8)

where x≥0 and A>0, and we recall that \(\ell(x,A)\triangleq \mathbb {E}_{\infty}[\mathscr{S}_{A}^{r=x}]\). It is this equation that we shall now attempt solve explicitly. To that end, a natural point of departure here would be the aforementioned approximation \(\mathrm {ARL}(\mathscr {S}_{A}^{r})\approx (A/\xi)-r\), where ξ is the limiting average exponential overshoot formally defined in the preceding section. It is known (see, e.g., [31]) that ξ=1/(1+θ)∈(0,1) for the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta )\) model (7.5). Hence, at least for large enough A’s, the solution to (7.8) should behave roughly as (x,A)≈A(1+θ)−x. As will be shown shortly, this is, in fact, precisely the behavior of the solution, without A having to be large. However, the aforementioned fact that Λ n ≥1/(1+θ) w.p. 1 under any measure makes things a bit complicated.

Lemma 7.2

For the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model (7.5), at each epoch n≥0 and under any probability measure, the GSR statistic \(R_{n}^{r}\) has a deterministic lower bound, i.e., \(R_{n}^{r}\ge B_{n}^{r}\) w.p. 1, for each n≥0 and under any probability measure, where

$$ B_n^r \triangleq \frac{1}{\theta} \biggl[1-\frac{1}{(1+\theta)^n} \biggr]+\frac{r}{(1+\theta )^n}, \quad n\ge0, $$
(7.9)

and r is the GSR statistic’s headstart, i.e., \(R_{0}^{r}=r\ge0\).

Proof

It is merely a matter of “unfolding” the recursion \(R_{n}^{r}=(1+R_{n-1}^{r})\varLambda _{n}\), n≥1, one term at a time, and applying, at each step, the inequality Λ n ≥1/(1+θ) valid w.p. 1 under any probability measure. □

At this point note that since 1+θ>1, the lower bound sequence \(\{ B_{n}^{r}\}_{n\ge0}\) given by (7.9) is such that (a) for r≤1/θ, it increases monotonically with n, i.e., \(r\equiv B_{0}^{r}\le B_{1}^{r}\le B_{2}^{r}\le\ldots\), when r≤1/θ, and (b) \(\lim_{n\to\infty}B_{n}^{r}=1/\theta\), irrespective of \(R_{0}^{r}=r\ge0\). Hence, when A<1/θ, the GSR statistic, \(\{R_{n}^{r}\}_{n\ge0}\), is guaranteed to either hit or exceed the level A>0 within at most m steps, where mm(r,A,θ) is found from the inequality \(B_{m}^{r}\ge A\), i.e.,

with ⌈x⌉ denoting the usual “ceiling” function. Therefore, the general solution to (7.8) is dependent upon whether A<1/θ or A≥1/θ. In the latter case, the (exact) solution is given by the following theorem, which is the main result of this paper.

Theorem 7.1

For the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model (7.5), if the detection threshold, A>0, is set so that A≥1/θ, then the ARL to false alarm of the GSR procedure is given by the formula:

(7.10)

and it is understood that x≥0.

Proof

It is sufficient to insert (7.10) into Eq. (7.8) and directly verify that the latter does, in fact, “check out”. The condition that A≥1/θ “protects” against the situation described in Lemma 7.2 and in the discussion following it. □

The special case of Theorem 7.1 when \(R_{0}^{r=x}=r=x\ge0\) (i.e., when there is no headstart) was previously established in [4, Proposition 1] using the memorylessness of the exponential distribution. It is also noteworthy that formula (7.10) as well as Eq. (7.8) are actually valid for x≥−1; the same can also be said about the general equation (7.4).

We conclude this section with a brief analysis of the case when A<1/θ. Recall that the integral in the right-hand side of (7.8) plays no role, unless (1+x)/(1+θ)<A. For this condition to hold when A<1/θ, it must be the case that (1+x)/(1+θ)<1/θ, i.e., that x<1/θ. Hence, if A<1/θ, then (x,A)≡1 for all x≥1/θ. To obtain (x,A) explicitly for x<1/θ, note that if x<1/θ, the function h(x)≜(1+x)/(1+θ), i.e., the lower limit of integration in the integral in the right-hand side of (7.8), is such that h(x)≥x. As a result, the nature of the integral equation becomes such that the unknown function, (x,A), is dependent solely upon the values it assumes for higher x’s, and since (x,A)≡1 for x≥1/θ, one can iteratively work out backwards the solution for any x≥0. However, this process involves formidable integrals, and only the first few steps seem to be feasible to actually carry out.

While an explicit formula for the ARL to false alarm of the GSR procedure when A<1/θ turned out to be problematic to get, from a practical standpoint it might not be worthwhile altogether, for the formula for A≥1/θ alone, i.e., Theorem 7.1, is sufficient. Specifically, since \(\mathrm {ARL}(\mathscr{S}_{A}^{r})\ge A-r\), the formula for the ARL to false alarm when A>1/θ, i.e., formula (7.10), will never yield ARL’s lower than (1/θ)−r. However, the size of this “blind spot” is not necessarily large, unless θ is very small, which is to say that the change in the mean in the \(\mathscr {E}(1)\)-to-\(\mathscr {E}(1+\theta)\) model (7.5) is faint and not worthy of detection to begin with. As an illustration of this point, consider the original SR procedure (r=0) and suppose that θ is 0.01, which, from a practical standpoint, can hardly be considered a “change” in the first place. Yet, since 1/θ in this case is 100, the linear formula for the ARL to false alarm will never yield a value of 100 or less. However, this is unlikely to be of inconvenience to a practitioner, as in most applications the ARL to false alarm is set to be at least in the hundreds, and, when θ=0.01, these levels of the ARL to false alarms would be obtainable through formula (7.10).

Concluding Remarks

This contribution is part of the authors’ ongoing effort (manifested, e.g., in [19, 20], and, with other collaborators, e.g., in [11, 22, 34, 35]) to “pave the way” for further research on the theory and application of the GSR procedure. To that end, case studies involving “stress-testing” the GSR procedure on real data are still an “uncharted territory” and would be of particular interest. Hopefully, the result obtained in this work, the data-analytic advantages pointed out in [5], and the strong optimality properties established, e.g., in [17, 22, 30, 34, 35], will help the GSR procedure rightly stand out as the top tool for change-point detection.