1 Introduction

Let \((X_n)\) be a strictly stationary sequence with common cumulative distribution function (cdf) F. Introduce the following

Definition 1

[11]. The sequence \((X_n)\) is said to have the extremal index \(\theta \in [0,1],\) if for each \(\tau >0\) there exists a sequence of real numbers \(u_n = u_n(\tau )\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } n(1 - F(u_n)) = \tau , \quad \lim _{n\rightarrow \infty } P(M_n \le u_n) = e^{-\theta \tau }, \end{aligned}$$

where \(M_n = \max (X_1, \ldots , X_n).\)

The extremal index exists for a wide class of stationary sequences and reflects the cluster structure of the underlying sequence and its local dependence properties. The extremal index of an independent sequence equals 1, but the converse is not true. In particular, if Berman's condition holds, then a stationary Gaussian sequence has extremal index equal to 1, [11].
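
To make Definition 1 concrete, the following Monte Carlo sketch (an illustration added here, not part of the original exposition; all variable names are illustrative) simulates the moving-maximum process \(X_i = \max (Z_i, Z_{i-1})\) with i.i.d. unit Fréchet innovations, a standard example with \(\theta = 1/2\) and marginal cdf \(F(x) = e^{-2/x},\) and compares \(-\ln P(M_n\le u_n)/\tau \) with \(\theta .\)

```python
import numpy as np

# Monte Carlo illustration of Definition 1 (illustrative sketch, not from the paper).
# The moving-maximum process X_i = max(Z_i, Z_{i-1}) with i.i.d. unit-Frechet
# innovations has extremal index theta = 1/2; its marginal cdf is F(x) = exp(-2/x).

rng = np.random.default_rng(0)
n, tau, n_rep = 10_000, 1.0, 2_000

# level u_n solving n * (1 - F(u_n)) = tau
u_n = -2.0 / np.log1p(-tau / n)

hits = 0.0
for _ in range(n_rep):
    z = 1.0 / -np.log(rng.uniform(size=n + 1))   # unit-Frechet sample
    x = np.maximum(z[1:], z[:-1])                # moving maximum, theta = 1/2
    hits += (x.max() <= u_n)
p_hat = hits / n_rep

print("estimated P(M_n <= u_n):", p_hat)
print("implied extremal index :", -np.log(p_hat) / tau)   # should be close to 0.5
```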

We consider the problem of non-parametric estimation of the extremal index. It is important to note that all non-parametric extremal index estimators require the selection of the threshold parameter u and/or the block size b or another declustering parameter. The well-known blocks estimator [1] and its later modifications [19] depend on the choice of both b and u. Another classical one, the runs estimator [24], requires the selection of u and of the number r of consecutive observations running below u that separate two clusters. The intervals [8] and K-gaps [21] estimators require the choice of u only, whereas the sliding blocks estimators proposed in [16] and simplified in [2] depend on the block size only.

Less attention in the literature is devoted to methods of selecting the mentioned parameters, in particular, the threshold parameter u. Usually, the value of u is taken among high quantiles of the underlying sequence \((X_n)\) or selected visually from a stability plot of the values of some estimate \(\hat{\theta }(u)\) with respect to u. Fukutome et al. [9] proposed a procedure for selecting among pairs (u, K) for the K-gaps estimator based on the information matrix test.

Markovich and Rodionov [15] proposed a non-parametric tool to select one of the parameters necessary for extremal index estimation. Although the proposed method can be applied to select any of the aforementioned parameters, they focused on the selection of the threshold parameter u. The developed method is an automatic procedure of extremal index estimation when it is based on estimators requiring the choice of only one parameter, in particular, the intervals and K-gaps estimators. However, in [15] this procedure was justified only if the proportion of the largest order statistics of a sample that are used vanishes as \(n\rightarrow \infty \) (more precisely, see Theorem 3.3 [15]). The aim of this work is to investigate whether the method of Markovich and Rodionov can be justified when the mentioned proportion tends to some positive constant c. The problem of goodness-of-fit testing of distribution tails is also studied.

2 Preliminaries

2.1 Inter-exceedance Times and Their Asymptotic Behavior

Let us discuss the properties of a stationary sequence \((X_n)\) and its extremal index \(\theta .\) Let L be the number of exceedances of level u by the sequence \((X_i)_{i=1}^n\) and \(S_j(u)\) be the j-th exceedance time, that is,

$$\begin{aligned} S_j(u) = \min \{k> S_{j-1}(u): X_k>u\},\quad j=1, \ldots , L, \end{aligned}$$

where \(S_0=0.\) Define the inter-exceedance times as

$$\begin{aligned} T_j(u) = S_{j+1}(u) - S_{j}(u), \quad j=1, \ldots , L-1 \end{aligned}$$

and for convenience assume that their number is equal to L.
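
The exceedance and inter-exceedance times are straightforward to compute; the following minimal sketch (illustrative, not the paper's code) extracts them from a sample.

```python
import numpy as np

# Extract the exceedance times S_j(u) and inter-exceedance times T_j(u)
# from a sample x and a threshold u (illustrative sketch).

def inter_exceedance_times(x: np.ndarray, u: float) -> np.ndarray:
    """Return T_j(u) = S_{j+1}(u) - S_j(u) for the sample x and threshold u."""
    s = np.flatnonzero(x > u) + 1     # exceedance times S_1(u), ..., S_L(u) (1-based)
    return np.diff(s)                 # inter-exceedance times

# usage on synthetic data
rng = np.random.default_rng(1)
x = rng.exponential(size=1000)
u = np.quantile(x, 0.95)
T = inter_exceedance_times(x, u)
print(len(T), T[:10])
```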

Introduce the following \(\varphi \)-mixing condition.

Definition 2

[8]. For real u and integers \(1\le k\le l\), let \(\mathcal {F}_{k,l}(u)\) be the \(\sigma \)-field generated by the events \(\{X_i>u\}\), \(k\le i\le l\). Introduce the mixing coefficients \(\alpha _{n,q}(u)\),

$$\begin{aligned}\alpha _{n,q}(u)= \max _{1\le k\le n-q}\sup |P(B|A)-P(B)|, \end{aligned}$$

where the supremum is taken over all sets \(A\in \mathcal {F}_{1,k}(u)\) with \(P(A)> 0\) and \(B\in \mathcal {F}_{k+q,n}(u).\)

The next theorem states that for some sequence of levels \((u_n)\) it holds

$$\begin{aligned} \overline{F}(u_n)T_1(u_n)\,{\mathop {\rightarrow }\limits ^{d}}\, T_{\theta }=\left\{ \begin{array}{ll} \eta , & \text{ with probability } \theta , \\ 0, & \text{ with probability } 1-\theta , \end{array} \right. \end{aligned}$$

where \(\eta \) is exponential with mean \(\theta ^{-1}.\)

Theorem 1

[8]. Let the positive integers \((r_n)\) and the thresholds \((u_n)\), \(n\ge 1\), be such that \(r_n\rightarrow \infty \), \(r_n\overline{F}(u_n)\rightarrow \tau \) and \(P\{M_{r_n}\le u_n\}\rightarrow \exp (-\theta \tau )\) as \(n\rightarrow \infty \) for some \(\tau \in (0,\infty )\) and \(\theta \in (0,1]\). If there are positive integers \(q_n=o(r_n)\) such that \(\alpha _{cr_n,q_n}(u_n)=o(1)\) for all \(c>0\), then for \(t>0\)

$$\begin{aligned} P\{\overline{F}(u_n)T_1(u_n)>t\}\rightarrow \theta \exp (-\theta t)=: 1 - F_\theta (t),\qquad n\rightarrow \infty .\end{aligned}$$
(1)
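
For later reference we record the explicit form implied by (1) and by the description of \(T_\theta \) above (it is not stated explicitly in the text, but is used below): \(F_\theta \) has an atom of mass \(1-\theta \) at zero, is continuous on \((0,\infty )\) and satisfies

$$\begin{aligned} F_\theta (t) = 1 - \theta e^{-\theta t},\quad t>0, \qquad F_\theta (0) = 1-\theta , \qquad F^{\leftarrow }_\theta (s) = \frac{1}{\theta }\log \frac{\theta }{1-s},\quad s\in (1-\theta ,1). \end{aligned}$$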

The well-known intervals estimator of the extremal index is based on inter-exceedance times and is obtained via the method of moments applied to the limit distribution (1). It is defined as ([8], see also [1], p. 391)

$$\begin{aligned} \hat{\theta }_n(u_n)= \left\{ \begin{array}{ll} \min (1,\hat{\theta }_n^1(u_n)), & \text{ if } \max \{T_i(u_n),\ 1 \le i \le L\} \le 2,\\ \min (1,\hat{\theta }_n^2(u_n)), & \text{ if } \max \{T_i(u_n),\ 1 \le i \le L\} > 2, \end{array} \right. \end{aligned}$$
(2)

where

$$\begin{aligned} \hat{\theta }_n^1(u_n)=\frac{2\left( \sum _{i=1}^{L}T_i(u_n)\right) ^2}{L\sum _{i=1}^{L}(T_i(u_n))^2} \quad \text{ and } \quad \hat{\theta }_n^2(u_n)=\frac{2\left( \sum _{i=1}^{L}(T_i(u_n)-1)\right) ^2}{L\sum _{i=1}^{L}(T_i(u_n)-1)(T_i(u_n)-2)}. \end{aligned}$$
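
A direct transcription of the intervals estimator (2) may be helpful; the sketch below is illustrative and assumes that T contains the inter-exceedance times \(T_i(u_n)\).

```python
import numpy as np

# Intervals estimator (2) of Ferro and Segers [8] (illustrative sketch).
# T is the array of inter-exceedance times T_i(u_n).

def intervals_estimator(T: np.ndarray) -> float:
    T = np.asarray(T, dtype=float)
    L = len(T)
    if T.max() <= 2:
        theta = 2.0 * T.sum() ** 2 / (L * (T ** 2).sum())
    else:
        theta = 2.0 * (T - 1).sum() ** 2 / (L * ((T - 1) * (T - 2)).sum())
    return min(1.0, theta)

# usage: combine with the inter-exceedance times extracted above
# theta_hat = intervals_estimator(inter_exceedance_times(x, u))
```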

It is known that

$$\begin{aligned} \sqrt{L}(\hat{\theta }_n(u_n) - \theta ) {\mathop {\longrightarrow }\limits ^{d}} N(0,\theta ^3 v(\theta )),\end{aligned}$$
(3)

where \(v(\theta )\) is the second moment of the cluster size distribution \(\{\pi (m)\}_{m\ge 1},\) [18]. Moreover, Theorem 2.4 [18] states that non-zero elements of the sequence

$$\begin{aligned} Z_i = \overline{F}(u_n)T_i(u_n), \quad i=1,\ldots , L,\end{aligned}$$
(4)

are asymptotically independent under the assumptions of Theorem 1 and some assumptions on the cluster structure of the initial stationary sequence \((X_n).\)

To be able to use these properties, Markovich and Rodionov [15] assume that there exists a sequence \((E_i)_{i=1}^l,\) \(l=[\theta L],\) of independent exponentially distributed random variables with mean \(\theta ^{-1}\) such that

$$\begin{aligned} Z_{(L-k)} - E_{(l-k)} = o\left( \frac{1}{\sqrt{k}}\right) \end{aligned}$$
(5)

uniformly for all \(k\rightarrow \infty \) with \(k/L\rightarrow 0\) as \(L\rightarrow \infty .\) This assumption is based on the following reasoning. It follows from Theorem 3.2 [18] that the limit distribution of the statistic

$$\begin{aligned} \sqrt{L}\left( \sum _{i=1}^{L} f(Z_i) - E f(Z_1)\right) \end{aligned}$$

for some class of continuous functions f does not change, under some regularity conditions, if the set of r.v.s \(\{Z_i\}_{i=1}^{L}\) is replaced by a set of r.v.s \(\{Z^*_i\}_{i=1}^{L}\) with cdf \(F_\theta .\) Moreover, for these r.v.s Theorem 2.2.1 and Lemma 2.2.3 [7] imply that if \(k/L\rightarrow c,\ c\in [0,\theta )\) as \(k\rightarrow \infty ,\) \(L\rightarrow \infty ,\) then

$$\begin{aligned} \sqrt{k}(E_{(l-k)} - \ln (l/k)/\theta ) = O_P(1).\end{aligned}$$

2.2 Discrepancy Method

The method proposed in [15] is based on the so-called discrepancy method, initially introduced in [12] and [22] (see also [13]) for optimal bandwidth selection in the density estimation problem and applied for the first time to extremal index estimation in [14]. Let \(\rho (\cdot , \cdot )\) be some distance on the space of probability measures, \(\hat{F}_n\) be the empirical cdf of the sequence \((X_i)_{i=1}^n\) and \(\{F_u, u\in U\}\) be a family of cdfs parametrized by a one-dimensional parameter u. Then the optimal value of u can be found as a solution of the discrepancy equation

$$\begin{aligned} \rho (\hat{F}_n, F_u) = \delta ,\end{aligned}$$
(6)

where \(\delta ,\) the so-called discrepancy value, is defined by the choice of \(\rho .\) The statistic of the Cramér–von Mises–Smirnov goodness-of-fit test (CMS statistic)

$$\begin{aligned} \omega _n^2 = n\int _{\mathbb {R}}(\hat{F}_n(x) - F_0(x))^2 \,dF_0(x) \end{aligned}$$

was chosen as \(\rho \) in [15], though the statistics of other goodness-of-fit tests, e.g., the Kolmogorov and Anderson–Darling tests, can be applied to the problem under discussion. Then quantiles of the limit distribution of the CMS statistic can be used as \(\delta .\) Note that choosing the parameter u as \(\hat{u} = \mathrm{argmin}_u \rho (\hat{F}_n, F_u)\) is usually not optimal.
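
As a schematic illustration of the difference between solving (6) and minimizing the distance, one may scan a grid of candidate values of u and keep the points where \(\rho (\hat{F}_n, F_u) - \delta \) changes sign; in the sketch below rho_of_u is a user-supplied function and all names are illustrative.

```python
import numpy as np

# Schematic sketch of the discrepancy method (6): keep the candidate values of u
# at which rho(F_n_hat, F_u) crosses the level delta, rather than the minimizer.
# rho_of_u is a user-supplied function u -> rho(F_n_hat, F_u).

def discrepancy_solutions(u_grid, rho_of_u, delta):
    rho = np.array([rho_of_u(u) for u in u_grid])
    sign = np.sign(rho - delta)
    crossings = np.flatnonzero(sign[:-1] * sign[1:] < 0)   # sign changes of rho(u) - delta
    return [u_grid[i] for i in crossings]                  # approximate roots of (6)
```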

3 Main Results

In this section we consider the application of the discrepancy method to the choice of the threshold/block-size parameter of an extremal index estimator. For simplicity, assume that we choose the threshold parameter u. It may seem that for this purpose one can take \(F_u\) equal to the cdf \(F_{\hat{\theta }(u)}\) of \(T_{\hat{\theta }(u)},\) \(\hat{F}_n\) equal to the empirical cdf of the sequence \(\{Z_i\}_{i=1}^L\) and \(\rho \) equal to the \(\omega ^2\) distance in (6). However, we cannot directly apply the discrepancy method coupled with the \(\omega ^2\) distance to this problem, since \(T_\theta \) does not have a continuous distribution and thus the limit distribution of the CMS statistic would depend on \(\theta \). To overcome this difficulty, we introduce a modification of the CMS statistic based only on the largest order statistics corresponding to \(\{Z_i\}_{i=1}^L,\) since, as was mentioned in Sect. 2.1, the largest elements of this sequence are continuously distributed and asymptotically independent. Thus we face the problem of goodness-of-fit testing for left-censored data and, in particular, for distribution tails; see [20] for the principles of testing of distribution tails.

Let \((Y_i)_{i=1}^n\) be independent identically distributed random variables with common continuous cdf \(F_Y.\) Recall that if the hypothesis \(H_0: F_Y = F_0\) for continuous cdf \(F_0\) holds, then the CMS statistic can be rewritten as

$$\begin{aligned} \omega ^2_n = \sum _{i=1}^n \left( F_0(Y_{(i)}) - \frac{i-0.5}{n}\right) ^2 + \frac{1}{12 n}, \end{aligned}$$

where \(Y_{(1)}\le \ldots \le Y_{(n)}\) are the order statistics corresponding to \((Y_i)_{i=1}^n.\) It is well-known that the limit distribution of \(\omega ^2_n\) (denote its cdf as \(A_1\)) under \(H_0\) does not depend on \(F_0.\)

Goodness-of-fit procedures for various types of left- and right-censored data were proposed in a large number of works; we refer to the classical monograph [4] and the recent monograph [23]. But to the best of the author's knowledge, there are no works in the literature proposing modifications of goodness-of-fit statistics for censored data that have the same limit distribution as their full-sample analogues. Introduce the following modification of the CMS statistic

$$\begin{aligned} \hat{\omega }_k^2 = \sum _{i=0}^{k-1} \left( \frac{F_0(Y_{(n-i)}) - F_0(Y_{(n-k)})}{1 - F_0(Y_{(n-k)})} - \frac{k-i-0.5}{k}\right) ^2 + \frac{1}{12 k}. \end{aligned}$$
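
The statistic \(\hat{\omega }_k^2\) is easy to compute from the k largest order statistics; the following sketch is a direct, illustrative transcription (F0 is assumed to be a vectorized cdf).

```python
import numpy as np

# Modified CMS statistic omega_hat_k^2 based on the k largest order statistics
# and a hypothesized cdf F0 (illustrative transcription; requires 1 <= k < len(y)).

def omega2_tail(y: np.ndarray, k: int, F0) -> float:
    y_sorted = np.sort(y)
    t = F0(y_sorted[-k - 1])                      # F0(Y_(n-k))
    # F0 of the top k order statistics Y_(n-k+1) <= ... <= Y_(n), rescaled to [0, 1]
    v = (F0(y_sorted[-k:]) - t) / (1.0 - t)
    i = np.arange(1, k + 1)
    return float(np.sum((v - (i - 0.5) / k) ** 2) + 1.0 / (12.0 * k))
```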

Theorem 2

Let the hypothesis \(H_0^t: \{F_Y(x) = F_0(x)\ \mathrm{for}\ \mathrm{all}\ \mathrm{large}\ x\}\) hold. Then there is \(c>0\) such that

$$\begin{aligned} \hat{\omega }_k^2 {\mathop {\rightarrow }\limits ^{d}} \xi \sim A_1 \end{aligned}$$

as \(k\rightarrow \infty ,\) \(k/n < c,\) \(n\rightarrow \infty .\)

Theorem 3.1 [15] is a particular case of the latter theorem for \(F_0 = F_\theta ,\) where \(F_\theta \) is defined in (1). It is worth noting that there is no need to require the continuity of \(F_0\) for all real x; we need it only for all sufficiently large x. Theorem 2 allows us to propose a goodness-of-fit test for a continuous distribution tail with significance level \(\alpha \) in the following way:

$$\begin{aligned} \text{ if } \hat{\omega }_k^2 > a_{1-\alpha }, \text{ then } \text{ reject } H_0^t, \end{aligned}$$

where \(a_{1-\alpha }\) is the \((1-\alpha )\)-quantile of \(A_1.\) Additionally, one can show that this test is consistent as \(k\rightarrow \infty ,\) \(k/n \rightarrow 0,\) \(n\rightarrow \infty .\) Clearly, only the largest order statistics of a sample can be used for testing distribution tail hypotheses, both if \(k/n\rightarrow 0\) and if \(k/n<c.\) This setting is natural both when only the upper tail of the distribution is of interest and when only the largest order statistics of a sample are available.
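
A usage sketch of this tail test is given below; it reuses omega2_tail from the previous sketch, and the critical value 0.461 is the commonly tabulated 0.95-quantile of the classical \(\omega ^2\) limit law \(A_1,\) quoted here only for illustration.

```python
import numpy as np

# Usage sketch of the tail goodness-of-fit test (illustrative); 0.461 is the commonly
# tabulated 0.95-quantile of the classical omega^2 limit law A_1.

rng = np.random.default_rng(2)
y = rng.standard_normal(5_000)                 # data whose upper tail is tested
k = 200                                        # number of upper order statistics used
F0 = lambda x: 1.0 - np.exp(-x)                # H_0^t: standard exponential tail
stat = omega2_tail(y, k, F0)                   # reuses the function defined above
print("omega_hat_k^2 =", stat,
      "-> reject H_0^t" if stat > 0.461 else "-> do not reject H_0^t")
```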

Let us return to the problem of extremal index estimation. Consider

$$\begin{aligned} \widetilde{\omega }^2_L(\theta ) = \sum _{i=0}^{k-1} \left( \frac{F_\theta (Z_{(L-i)}) - F_\theta (Z_{(L-k)})}{1 - F_\theta (Z_{(L-k)})} - \frac{k-i-0.5}{k}\right) ^2 + \frac{1}{12 k}. \end{aligned}$$

The following theorem states that under some mild conditions the statistic \(\widetilde{\omega }^2_L(\theta )\) with some estimator \(\widehat{\theta }_{n}\) substituted for \(\theta \) has the same limit distribution as the statistic \(\hat{\omega }^2_k\) in Theorem 2 and the classical CMS statistic.

Theorem 3

[15]. Let the assumptions of Theorem 1 and the condition (5) hold. Assume the extremal index estimator \(\widehat{\theta } = \widehat{\theta }_{n}\) is such that

$$\begin{aligned} \sqrt{m_n}(\widehat{\theta }_{n} - \theta ) {\mathop {\rightarrow }\limits ^{d}} \zeta ,\quad n\rightarrow \infty , \end{aligned}$$

where \(\zeta \) has a non-degenerate cdf H. Assume the sequence of integers \((m_n)\) is such that

$$\begin{aligned} \frac{k}{m_n} = o(1)\ \text{ and } \ \frac{(\ln L)^2}{m_n} = o(1) \end{aligned}$$

as \(n\rightarrow \infty \). Then

$$\begin{aligned} \widetilde{\omega }^2_L(\widehat{\theta }_{n})\,{\mathop {\rightarrow }\limits ^{d}}\,\xi \sim A_1.\end{aligned}$$

Remark 1

All extremal index estimators mentioned in the Introduction satisfy the assumptions of Theorem 3 with H equal to the normal cdf with zero mean.

Theorem 3 guarantees the correctness of the discrepancy method

$$\begin{aligned} \widetilde{\omega }^2_L(\widehat{\theta }_{n}) = \delta ,\end{aligned}$$
(7)

where \(\delta \) can be selected equal to 0.05, the mode of \(A_1,\) and \(k/L\rightarrow 0\) as \(k\rightarrow \infty \) and \(L\rightarrow \infty .\) The simulation study provided in [15] shows that \(u_{\max } = \max \{u_1, \ldots , u_d\}\) is the best choice of the threshold parameter both for the intervals and the K-gaps estimators of the extremal index, where \(\{u_1, \ldots , u_d\}\) are the solutions of the discrepancy equation (7). A numerical comparison of the proposed method with other methods of threshold selection shows a significant advantage of the developed procedure on a wide class of model processes, see [15] for details. Although the limit distribution of the statistic \(\widetilde{\omega }^2_L(\widehat{\theta }_{n})\) does not depend on k, the selection of k for samples of moderate size remains a problem. The choice \(k = \min (\hat{\theta }_0 L, L^\beta )\) with \(\beta \in (0,1),\) where \(\hat{\theta }_0\) is some pilot estimate, has proven in a simulation study to be the most suitable. However, if \(k/L\rightarrow c>0\) as \(k\rightarrow \infty ,\) \(L\rightarrow \infty ,\) then, in contrast to the case \(c=0,\) the distribution of \(\widehat{\theta }_{n}\) affects the limit distribution of the modified CMS statistic \(\widetilde{\omega }^2_L(\widehat{\theta }_{n}),\) and thus this limit distribution differs from \(A_1\).
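
The resulting automatic threshold selection can be sketched as follows. The code reuses inter_exceedance_times, intervals_estimator and omega2_tail from the previous sketches, estimates \(\overline{F}(u)\) empirically, and uses \(\delta = 0.05\) and \(k = \min (\hat{\theta }_0 L, L^{\beta })\) as in the text; the grid of thresholds and the value \(\beta = 2/3\) are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of the automatic threshold selection of Markovich and Rodionov [15] as
# described above; reuses inter_exceedance_times, intervals_estimator and omega2_tail
# from the earlier sketches. Illustrative, not the authors' code.

def omega2_tilde(z: np.ndarray, theta: float, k: int) -> float:
    # plug the cdf F_theta from (1) into the tail statistic: omega_tilde^2_L(theta)
    F_theta = lambda t: np.where(t > 0, 1.0 - theta * np.exp(-theta * t), 1.0 - theta)
    return omega2_tail(z, k, F_theta)

def select_threshold(x: np.ndarray, u_grid: np.ndarray,
                     delta: float = 0.05, beta: float = 2.0 / 3.0) -> float:
    d = []
    for u in u_grid:
        T = inter_exceedance_times(x, u)
        z = np.mean(x > u) * T                   # Z_i = F_bar(u) T_i(u), F_bar estimated empirically
        theta_hat = intervals_estimator(T)
        L = len(z)
        k = max(2, min(int(min(theta_hat * L, L ** beta)), L - 1))
        d.append(omega2_tilde(z, theta_hat, k) - delta)
    d = np.asarray(d)
    roots = np.flatnonzero(d[:-1] * d[1:] < 0)   # grid cells where equation (7) has a solution
    return u_grid[roots].max() if len(roots) else u_grid[np.argmin(np.abs(d))]
```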

The asymptotic distributions of goodness-of-fit test statistics with parameters of the underlying distribution being estimated have been intensively studied in the literature. The starting point of this classical theory were the works [5] and [10], whereas the common method to derive the limit distribution was proposed in [6]. However, this method, based on a multivariate central limit theorem and convergence in the Skorokhod space, cannot be directly applied to the problem of evaluating the limit distribution of the statistic \(\widetilde{\omega }^2_L(\widehat{\theta }_{n})\) when the assumption \(k/m_n = o(1)\) does not hold (note that \(m_n = O(L)\) for the intervals and K-gaps estimators, so this assumption essentially reduces to \(k/L = o(1)\)). For this purpose we consider another modification of the CMS statistic, the first analogue of which was introduced in [17],

$$\begin{aligned} \omega ^2_{L,c}(\theta ) = L \int _{x(c)}^\infty (F_L^*(x) - F_\theta (x))^2 dF_\theta (x),\end{aligned}$$
(8)

where \(F_L^*(x)\) is the empirical distribution function of the sequence \(\{Z_i\}_{i\le L}\) and \(x(c) = \inf \{x: F_\theta (x) \ge 1-c\} =: F^{\leftarrow }_\theta (1-c),\) \(c\in (0,1).\) In the sequel, we will assume \(0<c<\theta ,\) therefore \(F_\theta (x(c)) = 1 - c.\) It follows from the results derived in [17] and the assumption (5) that the statistic \(\omega ^2_{L,c}\) converges in distribution to \(\omega ^2(c),\) where

$$\begin{aligned} \omega ^2(c) = \int _{1-c}^1 B^2(t) dt \end{aligned}$$

and B(t) is the standard Brownian bridge, i.e. the Gaussian process on the interval [0, 1] with mean zero and covariance function \(\mathrm{cov}(B_t, B_s) = \min (t,s) - ts.\)
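
If quantiles of \(\omega ^2(c)\) are needed, they can be approximated by simulating a discretized standard Brownian bridge; a minimal Monte Carlo sketch (illustrative) follows.

```python
import numpy as np

# Monte Carlo approximation of the quantiles of omega^2(c) = int_{1-c}^1 B^2(t) dt,
# obtained by simulating a discretized standard Brownian bridge (illustrative sketch).

def omega2_c_quantile(c: float, q: float, n_grid: int = 1000, n_rep: int = 10_000,
                      seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid + 1)
    dt = 1.0 / n_grid
    stats = np.empty(n_rep)
    for r in range(n_rep):
        w = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n_grid)) * np.sqrt(dt)))
        b = w - t * w[-1]                        # Brownian bridge B(t) = W(t) - t W(1)
        stats[r] = np.sum(b[t >= 1.0 - c] ** 2) * dt
    return float(np.quantile(stats, q))

# e.g. omega2_c_quantile(0.2, 0.95)
```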

Denote \(\ell _c(t) = \max (t - (1-c), 0).\) Following the ideas of [6] we introduce the sample process

$$\begin{aligned} y_{L,c}(t, \theta ) = \sqrt{L} \left( \hat{F}_{L,c}(t, \theta ) - \ell _c(t)\right) , \quad t\in [0,1], \end{aligned}$$

where

$$\begin{aligned} \hat{F}_{L,c}(t, \theta ) = \frac{1}{L} \sum _{i=1}^L I(1 - c < F_\theta (Z_i) \le t), \end{aligned}$$

call it the truncated empirical distribution function of the sequence \(\{Z_i\}_{i\le L}.\) Clearly, since \(\theta >c,\) it holds that

$$\begin{aligned} \int _{0}^{1} y^2_{L,c}(t, \theta ) dt = \omega ^2_{L,c}(\theta ). \end{aligned}$$

Denote by D the Skorokhod space, i.e. the space of right-continuous functions with left-hand limits on [0, 1], equipped with a metric d(x, y) (see, e.g., [3], p. 111). The following theorem allows us to find the asymptotic distribution of the statistic \(\omega ^2_{L,c}(\hat{\theta }_n),\) where \(\hat{\theta }_n\) is the intervals estimator (2).

Theorem 4

Let the sequence \(\{Z_i\}\) defined by (4) satisfy the assumptions of Theorem 3.2 [18]. Assume \(\theta >0.\) Then for every \(c\in (0, \theta )\) the estimated sample process \(\hat{y}_{c}(t) := y_{L,c}(t, \hat{\theta }_n)\) converges weakly in D as \(n\rightarrow \infty \) to the Gaussian process \(X(t),\ t\in [1-c,1],\) with mean zero and covariance function

$$\begin{aligned} C(t, s)={}& \ell _c(\min (t,s)) - (2-2/\theta )\ell _c(t)\ell _c(s) \\ & - \frac{1}{2\theta ^2}h_{c}(t)(2 h_c(s) + \tilde{h}_{c}(s)) - \frac{1}{2\theta ^2}h_{c}(s)(2 h_c(t) + \tilde{h}_{c}(t)) + \frac{v(\theta )}{\theta } h_{c}(t) h_{c}(s), \end{aligned}$$
(9)

where

$$\begin{aligned} h_{c}(t) = (1-t)\log \left( \frac{1-t}{\theta }\right) - c\log \left( \frac{c}{\theta }\right) , \quad \tilde{h}_{c}(t) = (1-t)\log ^2\left( \frac{1-t}{\theta }\right) - c\log ^2\left( \frac{c}{\theta }\right) \end{aligned}$$

and \(v(\theta )\) is defined in (3).

We see that the covariance function of the process X(t) depends on \(\theta \) and c. This fact makes the use of quantiles of the limit distribution of the statistic \(\omega ^2_{L,c}(\hat{\theta }_n)\) (or of some appropriate normalization of it) as \(\delta \) in the discrepancy method (7) rather inconvenient in practice. However, the high efficiency of the discrepancy method suggests that the values of these quantiles do not depend strongly on the values of \(\theta \) and c.
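
For completeness, the covariance function (9) can be transcribed directly and the quantiles of the limit of \(\omega ^2_{L,c}(\hat{\theta }_n)\) approximated by simulating the Gaussian process X on a grid of \([1-c,1].\) In the sketch below the value of \(v(\theta )\) must be supplied by the user, and the small eigenvalue clipping only guards against numerical non-positive-definiteness of the discretized covariance matrix; everything here is illustrative.

```python
import numpy as np

# Covariance function (9) and a Monte Carlo approximation of the quantiles of
# int_{1-c}^1 X^2(t) dt (illustrative sketch; v_theta must be supplied).

def cov_C(t, s, theta, c, v_theta):
    ell = lambda x: np.maximum(x - (1.0 - c), 0.0)
    h = lambda x: (1 - x) * np.log((1 - x) / theta) - c * np.log(c / theta)
    h2 = lambda x: (1 - x) * np.log((1 - x) / theta) ** 2 - c * np.log(c / theta) ** 2
    return (ell(np.minimum(t, s)) - (2 - 2 / theta) * ell(t) * ell(s)
            - (h(t) * (2 * h(s) + h2(s)) + h(s) * (2 * h(t) + h2(t))) / (2 * theta ** 2)
            + v_theta / theta * h(t) * h(s))

def limit_quantile(theta, c, v_theta, q=0.95, n_grid=200, n_rep=20_000, seed=0):
    rng = np.random.default_rng(seed)
    t = 1.0 - c + (np.arange(n_grid) + 0.5) * (c / n_grid)   # midpoint grid of [1-c, 1]
    C = cov_C(t[:, None], t[None, :], theta, c, v_theta)
    vals, vecs = np.linalg.eigh((C + C.T) / 2)
    A = vecs * np.sqrt(np.clip(vals, 0.0, None))             # factor with A @ A.T ~ C
    X = rng.standard_normal((n_rep, n_grid)) @ A.T            # samples of X on the grid
    stats = np.sum(X ** 2, axis=1) * (c / n_grid)             # int_{1-c}^1 X^2(t) dt
    return float(np.quantile(stats, q))
```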

4 Proofs

4.1 Proof of Theorem 2

To prove Theorem 2, we need the following

Lemma 1

(Lemma 3.4.1, [7]). Let \(X, X_1, X_2, \ldots , X_n\) be i.i.d. random variables with common cdf F, and let \(X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}\) be the corresponding order statistics. The joint distribution of \(\{X_{(i)}\}^n_{i=n-k+1}\) given \(X_{(n-k)} = t\), for some \(k= 1,\ldots , n-1\), equals the joint distribution of the set of order statistics \(\{X^*_{(i)}\}^k_{i=1}\) of i.i.d. r.v.s \(\{X^*_{i}\}^k_{i=1}\) with cdf

$$\begin{aligned}F_t(x)= P\{X \le x\,|\,X> t\} = \frac{F(x) - F(t)}{1-F(t)}, \qquad x>t.\end{aligned}$$

Assume \(F_Y(x) = F_0(x)\) for all \(x> x_0\) and set \(c = 1 - F_0(x_0)-\varepsilon \) for some small \(\varepsilon >0.\) Clearly, since \(k/n< c,\) we have \(P(Y_{(n-k)}> x_0) \rightarrow 1\) under the assumptions of the theorem.

Consider the conditional distribution of \(\hat{\omega }^2_k\) given \(F_0(Y_{(n-k)}) = t,\ t>1-c.\) By Lemma 1 and the assumption \(k/n < c\), the conditional joint distribution of the set of order statistics \(\{F_0(Y_{(i)})\}_{i=n-k+1}^n\) coincides with the joint distribution of the set of order statistics \(\{U^*_{(i)}\}_{i=1}^k\) of a sample \(\{U^*_i\}_{i=1}^k\) from the uniform distribution on [t, 1]. Therefore, it holds

$$\begin{aligned}\hat{\omega }^2_k {\mathop {=}\limits ^{d}} \frac{1}{(1 - t)^2} \left( \sum \limits _{i=1}^k\left( U^*_{(i)} - t-\frac{i-0.5}{k}(1-t)\right) ^2\right) + \frac{1}{12k}.\end{aligned}$$

Next, \(V^*_{(i)} = U^*_{(i)}-t,\) \(1\le i \le k,\) are the order statistics of a sample \(\{V^*_i\}_{i=1}^{k}\) from the uniform distribution on \([0, 1-t]\). Hence, it follows

$$\hat{\omega }^2_k {\mathop {=}\limits ^{d}} \frac{1}{(1 - t)^2} \left( \sum \limits _{i=1}^k\left( V^*_{(i)} - \frac{i-0.5}{k}(1-t)\right) ^2\right) + \frac{1}{12k}.$$

Finally, \(W^*_{(i)} = V^*_{(i)}/(1-t),\) \(1\le i \le k,\) are the order statistics of a sample \(\{W^*_i\}_{i=1}^k\) from the uniform distribution on [0, 1]. Therefore, we get

$$\begin{aligned}\hat{\omega }^2_k {\mathop {=}\limits ^{d}} \sum \limits _{i=1}^k\left( W^*_{(i)} - \frac{i-0.5}{k}\right) ^2 + \frac{1}{12k}.\end{aligned}$$

It is easy to see that the latter expression is exactly the CMS statistic and that it converges in distribution to a random variable \(\xi \) with cdf \(A_1,\) independently of the value of t.
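
The argument above can also be checked numerically: under \(H_0^t\) the statistic \(\hat{\omega }_k^2\) computed from the k largest order statistics should be distributed approximately as the classical CMS statistic of a uniform sample of size k. The following Monte Carlo sketch (illustrative, reusing omega2_tail from Sect. 3) compares their empirical quantiles.

```python
import numpy as np

# Monte Carlo check of Theorem 2 (illustrative): compare the distribution of
# omega_hat_k^2 under H_0^t with that of the classical CMS statistic of size k.

rng = np.random.default_rng(3)
n, k, n_rep = 5_000, 200, 2_000
F0 = lambda x: 1.0 - np.exp(-x)                       # true (and hypothesized) tail

tail_stats, full_stats = np.empty(n_rep), np.empty(n_rep)
for r in range(n_rep):
    tail_stats[r] = omega2_tail(rng.exponential(size=n), k, F0)   # reuses omega2_tail
    u = np.sort(rng.uniform(size=k))
    i = np.arange(1, k + 1)
    full_stats[r] = np.sum((u - (i - 0.5) / k) ** 2) + 1.0 / (12.0 * k)  # classical CMS statistic

print(np.quantile(tail_stats, [0.5, 0.9, 0.95]))
print(np.quantile(full_stats, [0.5, 0.9, 0.95]))      # the two rows should be close
```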

4.2 Proof of Theorem 4

Assume for convenience that \(\hat{\theta }_n = \hat{\theta }^2_n(u_n),\) where \(\hat{\theta }^2_n(u_n)\) is defined in (2). For brevity, write \(\hat{\theta }\) instead of \(\hat{\theta }_n\). We need the following

Lemma 2

Under the assumptions of Theorem 4,

$$\begin{aligned} \sqrt{L}(\hat{\theta }- \theta ) = \frac{\theta }{\sqrt{L}} \sum _{i=1}^L \left( 2Z_i - \frac{\theta }{2}Z_i^2 - 1\right) + o_P(1), \end{aligned}$$

where \(o_P(1)\) denotes a sequence of random variables vanishing in probability.

Proof

(of Lemma 2)

Observe that

$$\begin{aligned} n - \sum _{i=1}^L T_i(u_n) \le T_{L+1}(u_n) {\mathop {=}\limits ^{d}} T_1(u_n), \end{aligned}$$

where the last relation holds by stationarity and definition of inter-exceedance times. Denote \(r_n = n \overline{F}(u_n).\) Theorem 3.2 [18] implies that

$$\begin{aligned} \sqrt{r_n}\left( L/r_n - 1\right) {\mathop {\longrightarrow }\limits ^{d}} N(0, \theta v(\theta )),\end{aligned}$$
(10)

thus for all \(\varepsilon >0\)

$$\begin{aligned} P\left( \sqrt{r_n}\Big (1 - \frac{1}{r_n}\sum _{i=1}^L Z_i\Big )> \varepsilon \right) ={}& P\left( \sqrt{r_n} \frac{\overline{F}(u_n)}{r_n} \Big (n - \sum _{i=1}^L T_i(u_n)\Big )> \varepsilon \right) \\ \le {}& P\left( \frac{\overline{F}(u_n)}{\sqrt{r_n}} T_{L+1}(u_n)> \varepsilon \right) = P\left( \frac{1}{\sqrt{r_n}} Z_{1} > \varepsilon \right) \rightarrow 0, \end{aligned}$$
(11)

where the last relation holds by Theorem 1. Note that by (10) \(L/r_n {\mathop {\longrightarrow }\limits ^{P}} 1,\) thus

$$\begin{aligned} \sqrt{L}(\hat{\theta }- \theta ) = \sqrt{r_n}(\hat{\theta }- \theta ) + o_P(1). \end{aligned}$$

The latter relations imply that

$$\begin{aligned} \sqrt{L}(\hat{\theta }- \theta ) = \sqrt{r_n}\left( \frac{2r_n/L}{\frac{1}{r_n}\sum _{i=1}^L Z_i^2} - \theta \right) + o_P(1) = \theta \sqrt{r_n}\left( \frac{2r_n/(\theta L) - \frac{1}{r_n}\sum _{i=1}^L Z_i^2}{\frac{1}{r_n}\sum _{i=1}^L Z_i^2}\right) + o_P(1). \end{aligned}$$

Next, it follows from Lemma B.7 [18] that

$$\begin{aligned} \frac{1}{r_n}\sum _{i=1}^L Z_i^2 {\mathop {\longrightarrow }\limits ^{P}} \frac{2}{\theta }, \end{aligned}$$

therefore

$$\begin{aligned} \sqrt{L}(\hat{\theta }- \theta ) = \frac{\theta ^2}{2}\sqrt{r_n}\left( \frac{2r_n}{\theta L} - \frac{1}{r_n}\sum _{i=1}^L Z_i^2\right) + o_P(1). \end{aligned}$$

It immediately follows from (10) and the delta method (applied to the function \(g(x) = 1/x\)) that

$$\begin{aligned} \sqrt{r_n}(r_n/L - 1) = -\sqrt{r_n}(L/r_n - 1) + o_P(1). \end{aligned}$$

Finally, we obtain from (11) and the latter

$$\begin{aligned}\sqrt{r_n}\left( \frac{2r_n}{\theta L} - \frac{1}{r_n}\sum _{i=1}^L Z_i^2\right) ={}& \sqrt{r_n}\left( \frac{4}{\theta } - \frac{2L}{\theta r_n} - \frac{1}{r_n}\sum _{i=1}^L Z_i^2\right) + o_P(1)\\ ={}& \frac{4}{\theta }\sqrt{r_n} \left( 1 - \frac{1}{r_n}\sum _{i=1}^L Z_i\right) + \frac{2}{\theta }\sqrt{r_n}\left( \frac{2}{r_n}\sum _{i=1}^L Z_i - \frac{1}{r_n}\sum _{i=1}^L 1 - \frac{\theta }{2 r_n}\sum _{i=1}^L Z_i^2\right) + o_P(1)\\ ={}& \frac{2}{\sqrt{r_n} \theta }\sum _{i=1}^L \left( 2 Z_i - \frac{\theta }{2}Z_i^2-1\right) +o_P(1).\end{aligned}$$

Applying again \(L/r_n {\mathop {\rightarrow }\limits ^{P}} 1,\) we derive the result.

First of all, we observe that by (11)

$$\begin{aligned} y_{L,c}(t, \theta )={}& \sqrt{L}\left( \frac{1}{L} \sum _{i=1}^L I(1 - c< F_\theta (Z_i) \le t) - \ell _c(t)\right) \\ ={}& \frac{1}{\sqrt{L}} \sum _{i=1}^L \big (I(1 - c< F_\theta (Z_i) \le t) - \ell _c(t) Z_i\big ) + \ell _c(t) \sqrt{L} \left( \frac{1}{L} \sum _{i=1}^L Z_i - 1\right) \\ ={}& \frac{1}{\sqrt{L}} \sum _{i=1}^L \big (I(1 - c < F_\theta (Z_i) \le t) - \ell _c(t) Z_i\big ) + o_P(1). \end{aligned}$$
(12)

It is also worth noting that all expressions of the form \(\frac{1}{\sqrt{L}}\sum _{i=1}^L \xi _i\) appearing in our proof, with \(E\xi _i = 0\) for all i, can be replaced by expressions of the form \(\frac{1}{\sqrt{r_n}}\sum _{i=1}^{r_n} \xi _i,\) using the same argument as in the proof of Theorem 3.2 [18] based on the formula (10). It means that the randomness of L does not affect the asymptotics of \(\hat{y}_c(t).\)

We follow the ideas of the proof of Theorem 2 in [6]. Denote

$$\begin{aligned} \hat{t}(t) = F_\theta (F^\leftarrow _{\hat{\theta }}(t)),\quad t\in [1-c, 1]. \end{aligned}$$

Since \(\hat{\theta }\) is a consistent estimator of \(\theta ,\) see [18], we have

$$\begin{aligned} P(\hat{t}(t) \ge 1 - c) \rightarrow 1\end{aligned}$$
(13)

for all \(t \in (1-c,1],\) and for \(t = 1-c\) the latter probability tends to 1/2. First, we will show that

$$\begin{aligned} y_{L,c}(\hat{t}(t), \theta ) - y_{L,c}(t, \theta ) {\mathop {\longrightarrow }\limits ^{P}} 0\end{aligned}$$
(14)

uniformly for \(t\in [1-c, 1].\) We restrict ourselves to the case \(t\in (1-c, 1];\) the case \(t = 1-c\) is similar. Denote \(x_1(t) = F^{\leftarrow }_{\theta _1}(t)\) and \(\tilde{t}(t) = F_{\theta _2}(x_1(t))\) for \(\theta _1, \theta _2\ge c\) and \(t\in (1-c, 1].\) We have

$$\begin{aligned}\sup _{t\in (1-c, 1]} |\tilde{t} - t|\le {}& \sup _{t\in (1-c, 1]}|F_{\theta _2}(x_1(t)) - F_{\theta _1}(x_1(t))| \\ ={}& \sup _{x > F^{\leftarrow }_{\theta _1}(1-c)} \left| (\theta _2 - \theta _1) \left. \frac{\partial F_\theta (x)}{\partial \theta }\right| _{\theta = \theta ^*}\right| , \end{aligned}$$

where \(\theta ^*\) is between \(\theta _1\) and \(\theta _2.\) Since

$$\begin{aligned} \left| \frac{\partial F_\theta (x)}{\partial \theta }\right| = \big |(\theta ^2-1) e^{-\theta x}\big | \le 1 \end{aligned}$$

for all \(\theta \in [0,1]\) and \(x> F^{\leftarrow }_{\theta _1}(1-c),\) we derive that \(\tilde{t}\) converges uniformly to t as \(\theta _1\rightarrow \theta \) and \(\theta _2\rightarrow \theta .\) Therefore, since \(\hat{\theta }\) converges to \(\theta \) in probability,

$$\begin{aligned} \sup _{t\in (1-c, 1]} |\hat{t}(t) - t| {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

An appeal to Lemma B.7 [18] gives us that \(y_{L,c}(t,\theta ) {\mathop {\rightarrow }\limits ^{d}} y(t),\) \(t\in (1-c,1],\) in D where y(t) is the Gaussian random process with mean zero and covariance function

$$\begin{aligned} \mathrm{cov}(y(t),y(s)) = \ell _c(\min (t,s)) - (2-2/\theta )\ell _c(t)\ell _c(s). \end{aligned}$$

The rest of the proof of (14) coincides with the corresponding steps in the proof of Lemma 1 [6].

Now let us show that

$$\begin{aligned} \hat{y}_c(t) = y_{L,c}(t,\theta ) - \sqrt{L}(\hat{\theta }- \theta ) (g(t,\theta ) - g(1-c,\theta )) + o_P(1), \quad t\in [1-c,1],\end{aligned}$$
(15)

where

$$\begin{aligned} g(t, \theta ) = \frac{1-t}{\theta ^2}\log \left( \frac{1-t}{\theta }\right) . \end{aligned}$$

Note that by definition \(\hat{y}_c(1-c) = 0\) a.s., thus it remains to show (15) for \(t\in (1-c,1].\) First we find the explicit form of the “estimated” empirical cdf \(\hat{F}_{L,c}(\hat{t}(t), \theta ).\) We have

$$\begin{aligned}\hat{F}_{L,c}(\hat{t}(t), \theta )={}& \frac{1}{L}\sum _{i=1}^L I\left( 1 - c \le F_\theta (Z_i)< F_\theta (F^{\leftarrow }_{\hat{\theta }}(t))\right) \\ ={}& \frac{1}{L}\sum _{i=1}^L I\left( F_{\hat{\theta }}(F^{\leftarrow }_{\theta }(1 - c))< F_{\hat{\theta }}(Z_i) \le t\right) \\ ={}& F_{L,c}(t, \hat{\theta }) - \frac{1}{L}\sum _{i=1}^L I\left( 1-c \le F_{\hat{\theta }}(Z_i)< F_{\hat{\theta }}(F^{\leftarrow }_{\theta }(1 - c)) \right) \\ ={}& F_{L,c}(t, \hat{\theta }) - \frac{1}{L}\sum _{i=1}^L I\left( \hat{t}(1-c) < F_{\theta }(Z_i) \le 1 - c \right) . \end{aligned}$$

Denote

$$\begin{aligned} \tilde{F}_{L,c}(t, \theta ) = \frac{1}{L}\sum _{i=1}^L I\left( t < F_{\theta }(Z_i) \le 1 - c \right) , \quad t\le 1-c. \end{aligned}$$

Therefore we derive for the estimated sample process \(\hat{y}_c(t)\)

$$\begin{aligned} \hat{y}_c(t)={}& \sqrt{L}(F_{L,c}(t, \hat{\theta }) - \ell _c(t)) \\ ={}& \sqrt{L}\Big (F_{L,c}(\hat{t}(t), \theta ) - \ell _c(\hat{t}(t))\Big ) + \sqrt{L}\Big (\ell _c(\hat{t}(t)) - \ell _c(t)\Big ) + \sqrt{L}\tilde{F}_{L,c}(\hat{t}(1-c), \theta )\\ ={}& y_{L,c}(\hat{t}(t), \theta ) + \sqrt{L}\Big (\ell _c(\hat{t}(t)) - \ell _c(t)\Big ) + \sqrt{L}\tilde{F}_{L,c}(\hat{t}(1-c), \theta ). \end{aligned}$$

Consider the third summand on the right-hand side. Fix \(c_1\in (c, \theta ).\) Note that (14) remains true for all \(c_1<\theta ,\) thus we derive

$$\begin{aligned} y_{L,c_1}(\hat{t}(1-c), \theta ) - y_{L,c_1}(1-c, \theta ) {\mathop {\longrightarrow }\limits ^{P}} 0. \end{aligned}$$

On the other hand,

$$\begin{aligned} y_{L,c_1}(\hat{t}(1-c), \theta ) - y_{L,c_1}(1-c, \theta ) ={}& \sqrt{L}(F_{L,c_1}(\hat{t}(1-c), \hat{\theta }) - \ell _{c_1}(\hat{t}(1-c))) - \sqrt{L}(F_{L,c_1}(1-c, \hat{\theta }) - \ell _{c_1}(1-c))\\ ={}& \sqrt{L} (1-c - \hat{t}(1-c)) - \sqrt{L}\tilde{F}_{L,c}(\hat{t}(1-c), \theta ), \end{aligned}$$

therefore we derive

$$\begin{aligned} \sqrt{L}\tilde{F}_{L,c}(\hat{t}(1-c), \theta ) = \sqrt{L} (1-c - \hat{t}(1-c)) + o_P(1). \end{aligned}$$

Next, (13) implies that \(\sqrt{L}\Big (\ell _c(\hat{t}(t)) - \ell _c(t)\Big ) = \sqrt{L}(\hat{t}(t) - t) + o_P(1)\) for \(t\in (1-c,1].\) We have

$$\begin{aligned} \sqrt{L}\Big (\ell _c(\hat{t}(t)) - \ell _c(t)\Big )={}& \sqrt{L}\Big (F_\theta (F^{\leftarrow }_{\hat{\theta }}(t)) - F_{\hat{\theta }}(F^{\leftarrow }_{\hat{\theta }}(t))\Big ) + o_P(1)\\ ={}& \sqrt{L}(\theta - \hat{\theta }) \left. \frac{\partial F_\gamma (x)}{\partial \gamma }\right| _{{\mathop {\gamma = \theta ^*}\limits ^{x = F^{\leftarrow }_{\hat{\theta }}(t)}}} + o_P(1), \end{aligned}$$

where \(\theta ^*\) is between \(\theta \) and \(\hat{\theta }.\) Similarly to the corresponding steps in the proof of Lemma 2 [6] we can show that

$$\begin{aligned} \left. \frac{\partial F_\gamma (x)}{\partial \gamma }\right| _{{\mathop {\gamma = \theta ^*}\limits ^{x = F^{\leftarrow }_{\hat{\theta }}(t)}}} = \left. \frac{\partial F_\gamma (x)}{\partial \gamma }\right| _{{\mathop {\gamma = \theta }\limits ^{x = F^{\leftarrow }_{\theta }(t)}}} + o_P(1), \end{aligned}$$

since \(\partial F_\gamma (x)/\partial \gamma \) is continuous with respect to \((x, \gamma )\) for all \(x> F^{\leftarrow }_\theta (1-c)\) and \(\gamma \in (0,1].\) Clearly,

$$\begin{aligned}\left. \frac{\partial F_\gamma (x)}{\partial \gamma }\right| _{{\mathop {\gamma = \theta }\limits ^{x = F^{\leftarrow }_{\theta }(t)}}}= \left. (x-1) e^{-\gamma x}\right| _{{\mathop {\gamma = \theta }\limits ^{x = F^{\leftarrow }_{\theta }(t)}}} = \frac{1-t}{\theta ^2}\log \left( \frac{1-t}{\theta }\right) = g(t, \theta ). \end{aligned}$$

Note that the relation

$$\begin{aligned} \sqrt{L}(\hat{t}(t) - t) = \sqrt{L}(\theta - \hat{\theta })g(t,\theta ) + o_P(1) \end{aligned}$$

derived above for \(t\in (1-c, 1]\) remains true also for \(t=1-c.\) Finally, combining the previous relations and using (14), we derive (15).

Define the empirical process

$$\begin{aligned} z_L(t) = \frac{1}{\sqrt{L}} \sum _{i=1}^L\Big (I(1-c< F_\theta (Z_i) \le t) - \ell _c(t) Z_i - \theta \big (2Z_i - \frac{\theta }{2}Z_i^2-1\big )(g(t,\theta ) - g(1-c,\theta ))\Big ) \end{aligned}$$

and notice that \(\hat{y}_c(t) = z_L(t) + o_P(1)\) by Lemma 2, (12) and (15). To complete the proof of Theorem 4 we need to prove that

$$\begin{aligned} (z_L(t_1), \ldots , z_L(t_k)) {\mathop {\longrightarrow }\limits ^{d}} (X(t_1), \ldots , X(t_k))\quad \text{ for } \text{ all } 1-c\le t_1< \ldots < t_k\le 1, \end{aligned}$$

where X(t) is the Gaussian process on \([1-c, 1]\) with mean zero and covariance function (9), and justify that the sequence of random elements \((z_L)\) is tight. These parts of the proof are carried out similarly to the proofs of Lemma 3 and Lemma 4 [6], respectively.

5 Conclusion

The paper provides a study of the properties of the new threshold selection method for non-parametric estimation of the extremal index of stationary sequences proposed in [15]. We consider a specific normalization of the discrepancy statistic based on modifications of the Cramér–von Mises–Smirnov statistic \(\omega ^2\) that are calculated using only the k largest order statistics of a sample. We show that the asymptotic distribution of the truncated Cramér–von Mises–Smirnov statistic (8) as \(k \rightarrow \infty ,\ k/L \rightarrow c,\ L\rightarrow \infty \) depends both on c and on the limit distribution of the extremal index estimator substituted into the statistic. We also develop a goodness-of-fit test for distribution tails based on the modification of the \(\omega ^2\) statistic, whose limit distribution coincides with the limit distribution of the classical Cramér–von Mises–Smirnov statistic under the null hypothesis.