Abstract
We study the detection of change-points in time series. The classical CUSUM statistic for detection of jumps in the mean is known to be sensitive to outliers. We thus propose a robust test based on the Wilcoxon two-sample test statistic. The asymptotic distribution of this test can be derived from a functional central limit theorem for two-sample U-statistics. We extend a theorem of Csörgő and Horváth to the case of dependent data.
1 Introduction
Change-point tests address the question of whether a stochastic process is stationary during the entire observation period. In the case of independent data, there is a well-developed theory; see the book by Csörgő and Horváth [6] for an excellent survey. When the data are dependent, much less is known. The CUSUM statistic has been studied intensively, even for dependent data; see again Csörgő and Horváth [6]. The CUSUM test, however, is not robust against outliers in the data. In the present paper, we study a robust test based on the two-sample Wilcoxon test statistic. Simulations show that this test outperforms the CUSUM test in the case of heavy-tailed data.
In order to derive the asymptotic distribution of the test, we study the stochastic process
where \(h: \mathbb{R}^{2} \rightarrow \mathbb{R}\) is a kernel function. In the case of independent data, the asymptotic distribution of this process has been studied by Csörgő and Horváth [5]. In the present paper, we extend their result to short range dependent data \((X_{i})_{i\geq 1}\). Similar results have been obtained for long range dependent data by Dehling, Rooch and Taqqu [10], albeit with completely different methods.
U-statistics were introduced by Hoeffding [14], who established asymptotic normality both for one-sample and for two-sample U-statistics in the case of independent data. The asymptotic distribution of one-sample U-statistics of dependent data was studied by Sen [18, 19], Yoshihara [22], Denker and Keller [12, 13] and Borovkova, Burton and Dehling [3] in the so-called non-degenerate case, and by Babbel [1] and Leucht [16] in the degenerate case. For two-sample U-statistics, Dehling and Fried [8] established the asymptotic normality of \(\sum _{i=1}^{n_{1}}\sum _{ j=n_{1}+1}^{n_{1}+n_{2}}h(X_{ i},X_{j})\) for dependent data, as \(n_{1},n_{2} \rightarrow \infty\). The main theoretical result of the present paper is a functional version of this limit theorem.
In our paper, we focus on data that can be represented as functionals of a mixing process. In this way, we cover most examples from time series analysis, such as ARMA and ARCH processes, but also data from chaotic dynamical systems. For a survey of processes that have a representation as functional of a mixing process, see e.g. Borovkova, Burton and Dehling [3]. Earlier references can be found in Ibragimov and Linnik [15], Denker [11] and Billingsley [2].
2 Definitions and Main Results
Given the samples \(X_{1},\ldots,X_{n_{1}}\) and \(Y _{1},\ldots,Y _{n_{2}}\), and a kernel h(x, y), we define the two-sample U-statistic
More generally, one can define U-statistics with multivariate kernels \(h: \mathbb{R}^{k} \times \mathbb{R}^{l} \rightarrow \mathbb{R}\). In the present paper, for the ease of exposition, we will restrict attention to bivariate kernels h(x, y). The main results, however, can easily be extended to the multivariate case.
Assuming that \((X_{i})_{i\geq 1}\) and \((Y _{i})_{i\geq 1}\) are stationary processes with one-dimensional marginal distribution functions F and G, respectively, we can test the hypothesis H: F = G using the two-sample U-statistic. For example, the kernel \(h(x,y) = y - x\) leads to the U-statistic
and thus to the familiar two-sample Gauß-test. Similarly, the kernel \(h(x,y) = 1_{\{x\leq y\}}\) leads to the U-statistic
and thus to the two-sample Mann-Whitney-Wilcoxon test.
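As an illustrative sketch (our own code, not from the chapter; the function name and the normalization by the number of pairs are our assumptions), the two-sample U-statistic with the Wilcoxon kernel \(h(x,y) = 1_{\{x\leq y\}}\) can be computed as:

```python
import numpy as np

def two_sample_u(x, y, kernel):
    """Two-sample U-statistic: average of h(X_i, Y_j) over all pairs (i, j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Evaluate the kernel on the full grid of (X_i, Y_j) pairs via broadcasting.
    return kernel(x[:, None], y[None, :]).mean()

# Wilcoxon kernel h(x, y) = 1{x <= y}; 5 of the 6 pairs satisfy X_i <= Y_j here.
u_wilcoxon = two_sample_u([1.0, 2.0, 3.0], [2.5, 3.5],
                          lambda x, y: (x <= y).astype(float))
```

A value near 1/2 indicates no stochastic ordering between the two samples, which is the idea behind using this statistic to test F = G.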
In the present paper, we investigate tests for a change-point in the mean of a stochastic process \((X_{i})_{i\geq 1}\). We consider the model
where \((\mu _{i})_{i\geq 1}\) are unknown constants and where \((\xi _{i})_{i\geq 1}\) is a stochastic process. We want to test the hypothesis \(H:\;\mu _{1} =\ldots =\mu _{n}\) against the alternative that there exists \(1 \leq k \leq n - 1\) such that \(\mu _{1} =\ldots =\mu _{k}\neq \mu _{k+1} =\ldots =\mu _{n}\).
Tests for the change-point problem are often derived from two-sample tests applied to the samples \(X_{1},\ldots,X_{k}\) and \(X_{k+1},\ldots,X_{n}\), for all possible \(1 \leq k \leq n - 1\). For two-sample tests based on U-statistics with kernel h(x, y), this leads to the test statistic \(\sum _{i=1}^{k}\sum _{j=k+1}^{n}h(X_{i},X_{j})\), \(1 \leq k \leq n\), and thus to the process
In this paper, we will derive a functional limit theorem for the process \((U_{n}(\lambda ))_{0\leq \lambda \leq 1}\), n ≥ 1. Specifically, we will show that under certain technical assumptions on the kernel h and on the process (X i ) i ≥ 1, a properly centered and renormalized version of \((U_{n}(\lambda ))_{0\leq \lambda \leq 1}\) converges to a Gaussian process.
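To make this construction concrete, the following sketch (our own, with hypothetical names) evaluates the unnormalized process \(U_n(k) = \sum_{i=1}^{k}\sum_{j=k+1}^{n} h(X_i, X_j)\) for every split point k:

```python
import numpy as np

def changepoint_process(x, kernel):
    """U_n(k) = sum_{i <= k} sum_{j > k} h(X_i, X_j), for k = 1, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = kernel(x[:, None], x[None, :])     # all pairwise kernel values h(X_i, X_j)
    return np.array([h[:k, k:].sum() for k in range(1, n)])

# With the kernel h(x, y) = y - x, a level shift produces a peak of |U_n(k)|
# at the true change point (here after the second observation).
u = changepoint_process([0.0, 0.1, 5.0, 5.1], lambda a, b: b - a)
```

The location of the maximum of |U_n(k)| is the natural estimate of the change point, which motivates taking the supremum over k in the test statistics below.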
In our paper, we will assume that the process \((\xi _{i})_{i\geq 1}\) is weakly dependent. More specifically, we will assume that \((\xi _{i})_{i\geq 1}\) can be represented as a functional of an absolutely regular process.
Definition 1.
-
(i)
Given a stochastic process \((X_{n})_{n\in \mathbb{Z}}\), we denote by \(\mathcal{A}_{k}^{l}\) the σ-algebra generated by \(X_{k},\ldots,X_{l}\). The process is called absolutely regular if
$$\displaystyle{ \beta (k) =\sup _{n}\left \{\sup \sum _{j=1}^{J}\sum _{ i=1}^{I}\vert P(A_{ i} \cap B_{j}) - P(A_{i})P(B_{j})\vert \right \}\rightarrow 0, }$$(7)
as \(k \rightarrow \infty\), where the last supremum is taken over all finite \(\mathcal{A}_{1}^{n}\)-measurable partitions \((A_{1},\ldots,A_{I})\) and all finite \(\mathcal{A}_{n+k}^{\infty }\)-measurable partitions \((B_{1},\ldots,B_{J})\).
-
(ii)
The process (X n ) n ≥ 1 is called a two-sided functional of an absolutely regular sequence if there exists an absolutely regular process \((Z_{n})_{n\in \mathbb{Z}}\) and a measurable function \(f: \mathbb{R}^{\mathbb{Z}} \rightarrow \mathbb{R}\) such that
$$\displaystyle{X_{i} = f((Z_{i+n})_{n\in \mathbb{Z}}).}$$Analogously, (X n ) n ≥ 1 is called a one-sided functional if \(X_{i} = f((Z_{i+n})_{n\geq 0})\).
-
(iii)
The process (X n ) n ≥ 1 is called 1-approximating functional with coefficients (a k ) k ≥ 1 if
$$\displaystyle{ E\left \vert X_{i} - E(X_{i}\vert Z_{i-k},\ldots,Z_{i+k})\right \vert \leq a_{k}. }$$(8)
In addition to weak dependence conditions on the process (X i ) i ≥ 1, the asymptotic analysis of the process (6) requires some continuity assumptions on the kernel functions h(x, y). We use the notion of 1-continuity, which was introduced by Borovkova, Burton and Dehling [3]. Alternative continuity conditions have been used by Denker and Keller [13].
Definition 2.
The kernel h(x, y) is called 1-continuous, if there exists a function \(\phi: (0,\infty ) \rightarrow (0,\infty )\) with ϕ(ε) = o(1) as \(\epsilon \rightarrow 0\) such that for all ε > 0
for all random variables \(X,X^{{\prime}},Y\) and \(Y ^{{\prime}}\) having the same marginal distribution as X 1, and such that X, Y are either independent or have joint distribution \(P_{(X_{1},X_{k})}\), for some integer k.
The most important technical tool in the study of U-statistics is Hoeffding’s decomposition, originally introduced by Hoeffding [14]. If \(E\vert h(X,Y )\vert <\infty\) for two independent random variables X and Y with the same distribution as X 1, we can write
where the terms on the right-hand side are defined as follows:
Here, F denotes the distribution function of the random variables X i . Observe that, by Fubini’s theorem,
In addition, the kernel g(x, y) is degenerate in the sense of the following definition.
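For reference, the standard two-sample Hoeffding decomposition in the present notation reads as follows (a textbook formulation stated here for the reader's convenience; it should agree with the displayed definitions of the original):

```latex
\begin{align*}
h(x,y) &= \theta + h_1(x) + h_2(y) + g(x,y), &
\theta &= \iint h(x,y)\,dF(x)\,dF(y),\\
h_1(x) &= \int h(x,y)\,dF(y) - \theta, &
h_2(y) &= \int h(x,y)\,dF(x) - \theta,\\
g(x,y) &= h(x,y) - h_1(x) - h_2(y) - \theta, &
&\text{so that } E\,h_1(X_1) = E\,h_2(X_1) = 0 \text{ by Fubini's theorem.}
\end{align*}
```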
Definition 3.
Let (X i ) i ≥ 1 be a stationary process, and let g(x, y) be a measurable function. We say that g(x, y) is degenerate if
for all \(x,y \in \mathbb{R}\).
The following theorem, a functional central limit theorem for two-sample U-statistics of dependent data, is the main theoretical result of the present paper.
Theorem 1.
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 , and let h(x,y) be a 1-continuous bounded kernel, satisfying
Then, as \(n \rightarrow \infty\) , the D[0,1]-valued process
converges in distribution towards a mean-zero Gaussian process with representation
where \((W_{1}(\lambda ),W_{2}(\lambda ))_{0\leq \lambda \leq 1}\) is a two-dimensional Brownian motion with mean zero and covariance function \(\mathop{\mathrm{Cov}}\nolimits (W_{k}(s),W_{l}(t)) =\min (s,t)\sigma _{kl}\) , where
Remark 1.
-
(i)
In the case of i.i.d. data, Theorem 1 was established by Csörgő and Horváth [5]. In the case of long-range dependent data, weak convergence of the process \((T_{n}(\lambda ))_{0\leq \lambda \leq 1}\) has been studied by Dehling, Rooch and Taqqu [10] and by Rooch [17], albeit with a normalization different from \(n^{3/2}\).
-
(ii)
Using the representation (15), one can calculate the autocovariance function of the process \((Z(\lambda ))_{0\leq \lambda \leq 1}\). We obtain
$$\displaystyle\begin{array}{rcl} \mathop{\mathrm{Cov}}\nolimits (Z(\lambda ),Z(\mu ))& =& \sigma _{11}[(1-\lambda )(1-\mu )\min \{\lambda,\mu \}] {}\\ & & +\sigma _{22}[\lambda \mu (1 -\mu -\lambda +\min \{\lambda,\mu \})] {}\\ & & +\sigma _{12}[\mu (1-\lambda )(\lambda -\min \{\lambda,\mu \}) +\lambda (1-\mu )(\mu -\min \{\lambda,\mu \})]. {}\\ \end{array}$$ -
(iii)
We conjecture that a similar theorem also holds for unbounded kernels under some moment conditions and faster mixing rates (similar to Theorem 2.7 of Sharipov and Wendler [20]). As our main application is the Wilcoxon test, where the kernel is bounded, we restrict the theorem to the case of bounded kernels.
-
(iv)
For the kernel \(h(x,y) = y - x\), we can analyze the asymptotic behavior of the process T n (λ) using the functional central limit theorem (FCLT). Note that, since \(X_{j} - X_{i} = (X_{j} - E(X_{j})) - (X_{i} - E(X_{i}))\), we may assume without loss of generality that X i has mean zero. Then we get the representation
$$\displaystyle\begin{array}{rcl} T_{n}(\lambda )& =& \frac{1} {n^{3/2}}\sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}(X_{ j} - X_{i}) \\ & =& \frac{[n\lambda ]} {n} \frac{1} {\sqrt{n}}\sum _{i=1}^{n}X_{ i} - \frac{1} {\sqrt{n}}\sum _{i=1}^{[n\lambda ]}X_{ i}. {}\end{array}$$(17)Thus, weak convergence of \((T_{n}(\lambda ))_{0\leq \lambda \leq 1}\) can be derived from the FCLT for the partial sum process \(\frac{1} {\sqrt{n}}\sum _{i=1}^{[n\lambda ]}X_{i}\). Such FCLTs have been proved under a wide range of conditions, e.g. for functionals of uniformly mixing data in Billingsley [2].
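Identity (17) is purely algebraic and easy to verify numerically; the following sketch (our own) compares the double sum with the partial-sum representation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.standard_normal(n)

def lhs(k):
    # n^{-3/2} * sum_{i <= k} sum_{j > k} (X_j - X_i)
    return sum(x[j] - x[i] for i in range(k) for j in range(k, n)) / n ** 1.5

def rhs(k):
    # ([n lambda]/n) * n^{-1/2} * S_n  -  n^{-1/2} * S_{[n lambda]}, as in (17)
    return (k / n) * x.sum() / np.sqrt(n) - x[:k].sum() / np.sqrt(n)

# The two sides agree for every split point k up to rounding error.
max_err = max(abs(lhs(k) - rhs(k)) for k in range(n + 1))
```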
Finally, we state an important special case of Theorem 1, namely that of an anti-symmetric kernel, i.e. \(h(x,y) = -h(y,x)\). Kernels that occur in connection with change-point tests usually have this property. For anti-symmetric kernels, the limit process has a much simpler structure; moreover, one can give a simpler, direct proof in this case. Note that for independent random variables X, Y we have by anti-symmetry that \(Eh(X,Y ) = -Eh(Y,X) = -Eh(X,Y )\), and so \(\theta = Eh(X,Y ) = 0\).
Theorem 2.
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 , and let h(x,y) be a 1-continuous bounded anti-symmetric kernel, such that (13) holds. Then, as \(n \rightarrow \infty\) , the D[0,1]-valued process
converges in distribution towards the mean-zero Gaussian process \(\sigma \,W^{(0)}(\lambda ),\;0 \leq \lambda \leq 1\), where \((W^{(0)}(\lambda ))_{0\leq \lambda \leq 1}\) is a standard Brownian bridge and
3 Application to Change Point Problems
In this section, we will apply Theorem 1 in order to derive the asymptotic distribution of two change-point test statistics. Specifically, we wish to test the hypothesis
against the alternative of a level shift at an unknown point in time, i.e.
We consider the following two test statistics,
Theorem 3.
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 , satisfying (13) , and assume that X 1 has a distribution function F(x) with bounded density. Then, under the null hypothesis H 0 ,
where (W (0) (λ)) 0≤λ≤1 denotes the standard Brownian bridge process, and where
Assuming that \(E\vert X_{i}\vert ^{2+\delta } <\infty\) , \(\beta (k) = O(k^{-(2+\delta )/\delta })\) and \(a_{k} = O(k^{-(1+\delta )/2\delta })\) , and under the null hypothesis H 0 ,
where
Proof.
We will establish weak convergence of T 1, n . In order to do so, we will apply Theorem 1 to the kernel h(x, y) = 1{x < y}. Borovkova, Burton and Dehling [3] showed that this kernel is 1-continuous. By continuity of the distribution function of X 1, we get that \(\theta =\int \!\!\int 1_{\{x<y\}}dF(x)dF(y) = 1/2\). Moreover, we get
Note that \(h_{2}(x) = -h_{1}(x)\). Hence \(W_{2}(\lambda ) = -W_{1}(\lambda )\), and thus the limit process in Theorem 1 has the representation
Here W 1(λ) is a Brownian motion with variance \(\sigma _{1}^{2}\). Weak convergence of T 2, n can be shown directly from the functional central limit theorem for the partial sum process; see Corollary 3.2 of Wooldridge and White [21]. We have to check the L 2-near epoch dependence. Note that by our assumptions
so the condition of Corollary 3.2 of Wooldridge and White [21] holds. Hence, the partial sum process \(( \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt]}X_{i})_{0\leq t\leq 1}\) converges in distribution to \((\sigma _{2}\,W(t))_{0\leq t\leq 1}\), where W is standard Brownian motion. Convergence in distribution of T 2, n follows by an application of the continuous mapping theorem.
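As an illustration, the following sketch (our own code; it computes unstudentized versions of Wilcoxon- and CUSUM-type change-point statistics, which in practice would still be divided by estimates of \(\sigma_1\) and \(\sigma_2\) as in Theorem 3) shows how the two tests react to a level shift:

```python
import numpy as np

def wilcoxon_cp_stat(x):
    """max_k n^{-3/2} | sum_{i<=k} sum_{j>k} (1{X_i <= X_j} - 1/2) |, unscaled by sigma_1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = (x[:, None] <= x[None, :]).astype(float) - 0.5
    return max(abs(h[:k, k:].sum()) for k in range(1, n)) / n ** 1.5

def cusum_cp_stat(x):
    """max_k n^{-1/2} | S_k - (k/n) S_n |, unscaled by sigma_2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n)
    return float(np.max(np.abs(s[:-1] - (k / n) * s[-1]))) / np.sqrt(n)

# A shift of height 2 after observation 100 inflates both statistics.
rng = np.random.default_rng(1)
noise = rng.standard_normal(200)
shifted = noise + np.r_[np.zeros(100), 2.0 * np.ones(100)]
t1_null, t1_alt = wilcoxon_cp_stat(noise), wilcoxon_cp_stat(shifted)
t2_null, t2_alt = cusum_cp_stat(noise), cusum_cp_stat(shifted)
```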
Remark 2.
-
(i)
The distribution of \(\sup _{0\leq \lambda \leq 1}\vert W^{(0)}(\lambda )\vert\) is the well-known Kolmogorov-Smirnov distribution. Quantiles of the Kolmogorov-Smirnov distribution can be found in most statistical tables.
-
(ii)
In order to apply Theorem 3, we need to estimate the variances \(\sigma _{1}^{2}\) and \(\sigma _{2}^{2}\). Regarding \(\sigma _{2}^{2}\) given in expression (27), we apply the non-overlapping subsampling estimator
$$\displaystyle{ \hat{\sigma }_{2}^{2} = \frac{1} {[n/l_{n}]}\sum _{i=1}^{[n/l_{n}]} \frac{1} {l_{n}}\left (\sum _{j=(i-1)l_{n}+1}^{il_{n} }X_{j} -\frac{l_{n}} {n} \sum _{j=1}^{n}X_{ j}\right )^{2} }$$(29)
investigated by Carlstein [4] for α-mixing data. In the case of AR(1) processes, Carlstein derives
$$\displaystyle{ l_{n} =\max (\lceil n^{1/3}(2\rho /(1 -\rho ^{2}))^{2/3}\rceil,1) }$$(30)
as the choice of block length which asymptotically minimizes the MSE, with ρ being the autocorrelation coefficient at lag 1.
Regarding \(\sigma _{1}^{2}\) given in (25), one faces the additional challenge that the distribution function F is unknown. This problem has been addressed, e.g. in Dehling, Fried, Sharipov, Vogel and Wornowizki [9], for the case of functionals of absolutely regular processes and F being estimated by the empirical distribution function F n . The authors find the subsampling estimator for \(\sigma _{1}^{2}\)
employing non-overlapping subsampling, to give smaller biases but somewhat larger MSEs than the corresponding overlapping subsampling estimator. The adaptive choice of the block length l n proposed by Carlstein worked well in their simulations if the data were generated from a stationary ARMA(1,1) model and an estimate of ρ was plugged in. In the next section, we will explore this and other proposals in situations with level shifts and normal or heavy-tailed innovations.
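As a sketch (our own function names; the guard for \(\rho \leq 0\) is a practical simplification, since rule (30) is meant for positive autocorrelation), the non-overlapping subsampling estimator (29) and Carlstein's block-length rule (30) can be implemented as:

```python
import numpy as np

def carlstein_var_est(x, l):
    """Non-overlapping subsampling estimator (29) of the long-run variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = n // l                                   # number of blocks [n / l_n]
    block_sums = x[: m * l].reshape(m, l).sum(axis=1)
    # (l_n / n) * sum_j X_j is the block length times the overall mean.
    return np.mean((block_sums - l * x.sum() / n) ** 2 / l)

def carlstein_block_length(n, rho):
    """Asymptotically MSE-optimal block length (30) for an AR(1) process."""
    if rho <= 0:                                 # simplification: the rule targets rho > 0
        return 1
    return max(int(np.ceil(n ** (1 / 3) * (2 * rho / (1 - rho ** 2)) ** (2 / 3))), 1)

# On i.i.d. N(0,1) data the long-run variance is 1.
rng = np.random.default_rng(0)
sigma2_hat = carlstein_var_est(rng.standard_normal(4000), 10)
```

For n = 200, rule (30) gives block lengths 6 for ρ = 0.4 and 16 for ρ = 0.8, consistent with the starting values reported in the discussion of the simulations.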
4 Simulation Results
The assumptions regarding the underlying process (X i ) in Theorem 1 are satisfied by a wide range of time series, such as AR and ARMA processes. To illustrate the results and to investigate the finite sample behavior and the power of the tests based on T 1, n and T 2, n , we will give some simulation results. We study the underlying change-point model
Within this model, the hypothesis of no change is equivalent to μ = 0. We assume that the noise follows an AR(1) process, i.e. that
where \(-1 <\rho < 1\), and where the innovations \(\varepsilon_i\) are i.i.d. random variables with mean zero, bounded density and finite second moments. The innovations \(\varepsilon_i\) are generated from a standard normal or a \(t_\nu\)-distribution with ν = 3 degrees of freedom, scaled to have the same 84.13% percentile as the standard normal, which is 1. The autoregression coefficient is varied over ρ ∈ {0.0, 0.4, 0.8}, corresponding to zero, moderate and strong positive autocorrelation, and the sample size is n = 200. For the choice of the block length we used Carlstein's adaptive rule outlined above, or a fixed block length of \(l_n = 9\), which is in good agreement with the empirical findings of Dehling et al. [10] for larger sample sizes, and with their theoretical result that \(l_n\) should be chosen as \(o(\sqrt{n})\) to achieve consistency. For comparison, we also include tests employing overlapping subsampling for estimation of the asymptotic variance, applying the same block lengths as the non-overlapping versions.
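The AR(1) noise in this design can be generated as follows (a minimal sketch with hypothetical names; the scaling of the \(t_3\) innovations to match the normal 84.13% percentile is omitted, as it requires the \(t_3\) quantile function):

```python
import numpy as np

def ar1(rho, innovations):
    """Generate xi_i = rho * xi_{i-1} + eps_i, started from xi_0 = 0."""
    xi = np.empty(len(innovations))
    prev = 0.0
    for i, eps in enumerate(innovations):
        prev = rho * prev + eps
        xi[i] = prev
    return xi

rng = np.random.default_rng(3)
eps = rng.standard_normal(500)
x = ar1(0.8, eps)
lag1_corr = float(np.corrcoef(x[:-1], x[1:])[0, 1])   # close to the true rho = 0.8
```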
Table 1 contains the empirical levels (i.e. the fraction of rejections) of the tests with an asymptotic level of 5%, obtained from 4000 simulation runs for each situation. Note that the tests developed under the assumption of independence, which do not adjust for autocorrelation, become strongly oversized with increasingly positive autocorrelation, i.e. they reject a true null hypothesis far too often, and are practically useless already for ρ = 0.4. The performance of the adjusted tests is much better in this respect and in good agreement with the asymptotic results. Only if the autocorrelation is strong (ρ = 0.8) do the tests with a fixed block length become somewhat anti-conservative (oversized), and even more so the CUSUM test. Longer block lengths are needed for stronger positive autocorrelations, and Carlstein's adaptive block length (30) adjusts for this. There is little difference between the tests employing overlapping and non-overlapping subsampling here.
In order to investigate the power of the tests under the alternative, we consider shifts of increasing height μ, generating 400 data sets for each situation. The sample size is again n = 200, and the change point is at observation number \(\tau = [\lambda n] = 100\).
Figure 1 illustrates the power of the different versions of the tests in case of Gaussian or \(t_3\)-distributed innovations and several autocorrelation coefficients ρ. Under normality, the CUSUM test T 2, n is somewhat more powerful than the test T 1, n based on the Wilcoxon statistic, while under the \(t_3\)-distribution it is the other way round. The CUSUM test with the fixed block length considered here becomes strongly oversized if ρ is large, while this effect is less severe for the test based on the Wilcoxon statistic. Carlstein's adaptive choice of the block length increases the power if ρ is small and improves the size of the test substantially if ρ is large. The tests employing overlapping subsampling (not shown here) are even slightly more powerful in case of zero or moderate autocorrelation, but much less powerful in case of strong autocorrelation. We have also considered the case of negative autocorrelation (\(\rho = -0.4\), not shown here). We obtained similar results for the power of the test based on the Wilcoxon statistic relative to that of the CUSUM test, and little difference between using a fixed or the adaptive block length.
The tests with Carlstein's adaptive choice of the block length could be improved further by using a more sophisticated estimate of ρ than the ordinary sample autocorrelation used here. The latter is positively biased in the presence of a shift, which leads to overly large block lengths. This negative effect becomes more severe for larger values of ρ, since the plug-in estimate of the asymptotically MSE-optimal choice of \(l_n\) increases more rapidly if \(\hat{\rho }\) is close to 1, while it is rather stable for moderate and small values of \(\hat{\rho }\). In our study, for ρ = 0 the average value chosen for \(l_n\) increases only from about 2 to about 3 as the height of the shift increases, while it increases from about 6 to about 9 if ρ = 0.4, and even from about 16 to about 24 if ρ = 0.8. An estimate of the autocorrelation coefficient which resists shifts could be used, e.g. by applying a stepwise procedure which estimates the possible time of occurrence of a shift before calculating \(\hat{\rho }\) from the corrected data, but this will not be pursued here.
5 Data Example
For illustration we apply the tests to time series data representing the monthly average daily minimum temperatures in Potsdam, Germany, measured between January 1893 and December 1992. The 1200 data points for these 100 years have been deseasonalized by subtracting the median value from each calendar month, see Fig. 2. Our interest is in whether the level of this time series is constant or whether there is a monotonic change. Such a systematic change is likely to show a trend-like behavior and not a sharp shift, but nevertheless we would like a change-point test to detect such a change if its null hypothesis is a constant level.
The empirical autocorrelation and partial autocorrelation functions suggest a first-order autoregressive model with lag-one autocorrelation of about 0.25 for the deseasonalized data. The test statistics take their maximum values after time point 595, i.e. rather in the middle of the time series. The resulting p-values are 0.23 and 0.16 for the CUSUM test with the fixed and the adaptive block length, respectively. In contrast, both versions of the Wilcoxon-based test become significant, as the corresponding p-values are 0.04 and 0.015, respectively. The differences between the results agree with the better power behavior of the Wilcoxon-based test relative to the CUSUM test in case of the (left-)skewed distributions of minimum temperatures, and with the better power of the versions employing the adaptive block length over those with the fixed block length considered here in case of small positive autocorrelations. The sample median of the second time period is about 0.4 degrees larger than that of the first period.
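The deseasonalization step described above, subtracting the per-calendar-month median, can be sketched as follows (our own code, not the authors'):

```python
import numpy as np

def deseasonalize_monthly(x):
    """Subtract the per-calendar-month median from a monthly series (month m at indices m, m+12, ...)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for m in range(12):
        out[m::12] -= np.median(x[m::12])
    return out

# Synthetic example: 100 years of monthly data with a sinusoidal seasonal pattern.
rng = np.random.default_rng(2)
seasonal = np.tile(5.0 * np.sin(2 * np.pi * np.arange(12) / 12), 100)
series = seasonal + rng.standard_normal(1200)
deseason = deseasonalize_monthly(series)
```

After this step each calendar month has median zero, so a remaining level change reflects a genuine shift rather than seasonality.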
6 Auxiliary Results
In this section, we will prove some auxiliary results which will play a crucial role in the proof of Theorem 1. The main result of this section is the following proposition, which essentially shows that the degenerate part in the Hoeffding decomposition of the U-statistic T n (λ) is uniformly negligible.
Proposition 1.
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 , satisfying
Moreover, let g(x,y) be a 1-continuous bounded degenerate kernel. Then, as \(n \rightarrow \infty\) ,
in probability.
The proof of Proposition 1 requires some moment bounds for increments of U-statistics of degenerate kernels, which we will now state as separate lemmas.
Lemma 1.
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 , satisfying
Moreover, let g(x,y) be a 1-continuous bounded degenerate kernel. Then, there exists a constant C 1 such that
Proof.
We can write
The elements of the first sum are all bounded, hence
Concerning the second sum, by Lemma 5, we get
with \(k =\max \{ \vert i_{2} - i_{1}\vert,\vert j_{2} - j_{1}\vert \}\). We first treat the summands with \(k = i_{2} - i_{1}\). Suppose for the moment that k is fixed; we bound the number of index tuples that appear in the sum. In this case we have [n λ] ways to choose \(i_1\); once \(i_1\) is chosen, there is exactly one way to pick \(i_2\), because \(i_{2} = i_{1} + k\). For \(j_1\) we have, as before, n − [n λ] choices, and for each \(j_1\), the index \(j_2\) must lie in the interval \([j_{1}, j_{1} + k]\), and there are exactly k integers in this interval.
Analogously, we can bound the terms with \(k = i_{1} - i_{2}\), \(k = j_{2} - j_{1}\) and \(k = j_{1} - j_{2}\), using the summability conditions.
We now define the process G(λ), 0 ≤ λ ≤ 1, by
Lemma 2.
Under the conditions of Lemma 1 , there exists a constant C such that
for all 0 ≤μ ≤η ≤ 1.
Proof.
We can write
using the stationarity of the process \((X_{n})_{n\in \mathbb{N}}\) and Lemma 1.
Proof of Proposition 1.
From Lemma 2 we obtain, using Chebyshev’s inequality,
for all ε > 0. Thus we get for 0 ≤ k ≤ m ≤ n with \(k,m,n \in \mathbb{N}\)
as m − k ≤ n. Now consider the variables
and suppose that \(S_{i} =\zeta _{1} +\zeta _{2} +\ldots +\zeta _{i}\) with \(S_{0} = 0\); then \(S_{i} = G_{n}( \frac{i} {n})\). Consequently, inequality (46) is equivalent to
So the assumptions of Theorem 7 are satisfied with the variables (47) in the role of the \(\xi _{i}\), \(\beta = 1/2\), \(\alpha = 2/3\), \(u_{l} = C^{3/4}/n^{5/4}\) and \(u_{0} = 0\), and hence
where K depends only on α and β. Thus, (35) holds as \(n \rightarrow \infty\). □
7 Proof of Main Results
In this section, we will prove Theorems 1 and 2. Note that Theorem 2 is a direct consequence of Theorem 1, applied to anti-symmetric kernels. We will nevertheless present a direct proof of Theorem 2, since this proof is much simpler than the proof in the general case. Moreover, Theorem 2 covers those cases that are most relevant in applications.
The first part of the proof is identical for both Theorems 1 and 2. Note that, for each λ ∈ [0, 1], the statistic T n (λ) is a two-sample U-statistic. Thus, using the Hoeffding decomposition (11), we can write T n (λ) as
By Proposition 1, we know that
in probability. Thus, by Slutsky’s lemma, it suffices to show that the sum of the first two terms, i.e.
converges in distribution to the desired limit process.
Proof of Theorem 2.
It remains to show that (51) converges in distribution to \(\sigma W^{(0)}(\lambda ),0 \leq \lambda \leq 1\), where (W (0)(λ))0 ≤ λ ≤ 1 is standard Brownian bridge on [0, 1], and where \(\sigma ^{2}\) is defined in (19). By antisymmetry of the kernel h(x, y), we obtain that \(h_{2}(x) = -h_{1}(x)\). Hence, in this case, (51) can be rewritten as
By Proposition 2.11 and Lemma 2.15 of Borovkova, Burton and Dehling [3], the sequence (h 1(X i )) i ≥ 1 is a 1-approximating functional with approximating constant \(C\sqrt{a_{k}}\). Since h 1(X i ) is bounded, the L 2-near epoch dependence in the sense of Wooldridge and White [21] also holds, with the same constants. Moreover, the underlying process (Z n ) n ≥ 1 is absolutely regular, and hence also strongly mixing. Thus we may apply the invariance principle in Corollary 3.2 of Wooldridge and White [21], and obtain that the partial sum process
converges weakly to Brownian motion (W(λ))0 ≤ λ ≤ 1 with \(\mathop{\mathrm{Var}}\nolimits (W(1)) =\sigma ^{2}\). The statement of the Theorem follows with the continuous mapping theorem for the mapping \(x(t)\mapsto x(t) - tx(1),\;0 \leq t \leq 1\).
The proof of Theorem 1 requires an invariance principle for the partial sum process of \(\mathbb{R}^{2}\)-valued dependent random variables; see Proposition 2 below. For mixing processes, such invariance principles have been established even for partial sums of Hilbert space valued random vectors, e.g. by Dehling [7]. In this paper, we provide an extension of these results to functionals of mixing processes.
Proposition 2.
Let \((X_{n})_{n\in \mathbb{N}}\) be a 1-approximating functional of an absolutely regular process with mixing coefficients (β(k)) and let h 1 (⋅), h 2 (⋅) be bounded 1-continuous functions with mean zero, such that
Then, as \(n \rightarrow \infty\) ,
where (W 1 (t),W 2 (t)) 0≤t≤1 is a two-dimensional Brownian motion with mean zero and covariance \(E(W_{k}(s)\,W_{l}(t)) =\min (s,t)\sigma _{kl}\), where \(\sigma _{kl}\) is as defined in (16).
Proof.
To prove (54), we need to establish finite dimensional convergence and tightness. Concerning finite-dimensional convergence, by the Cramér-Wold device it suffices to show the convergence in distribution of a linear combination of the coordinates of the vector
for \(0 = t_{0} <t_{1} <\ldots <t_{j} <\ldots <t_{k} = 1\). Any such linear combination can be expressed as
for \((a_{j},b_{j})_{j=1}^{k} \in \mathbb{R}^{2\,k}\). By using the Cramér-Wold device again, the weak convergence of this sum is equivalent to the weak convergence of the vector
to
Since (X n ) n ≥ 1 is a 1-approximating functional, it can be coupled with a process consisting of independent blocks. Given integers \(L:= L_{n} = [n^{3/4}]\) and \(l_{n} = [n^{1/2}]\), we introduce the (l, L) blocking (B m ) m ≥ 0 of the variables (a j h 1(X i ) + b j h 2(X i )) with \(i = [nt_{j-1}] + 1,\ldots,[nt_{j}]\), j = 1, …, k, and
and separating blocks
By Theorem 5 there exists a sequence of independent blocks \((B_{m}^{{\prime}})\) with the same blockwise marginal distribution as (B m ) and such that
where \(\alpha _{l}:= (2\sum _{k=[l_{n}/3]}^{\infty }a_{k})^{1/2}\). We can express the components of our vector (57) as a sum of blocks
where R j denotes the set of indices not contained in the blocks. Observe that, by Lemma 3, for any set \(A \subset \{ 1,\ldots,n\}\)
and hence
so it follows with the Chebyshev inequality that this term is negligible. For the last summand, we have that
Furthermore, we need to show that we can replace the blocks B m by the independent coupled blocks \(B_{m}^{{\prime}}\):
as \(n \rightarrow \infty\), by our conditions on the mixing coefficients and approximation constants. Here we used the fact that \(\alpha _{n} \rightarrow 0\) and thus, for almost all \(n \in \mathbb{N}\),
With the above arguments the result holds if we show the convergence of
Since this vector has independent components, we only need to show the one-dimensional convergence, which is a consequence of Theorem 4, using the summability condition (53).
We now turn to the question of tightness and show that, for each ε and η, there exists a δ, 0 < δ < 1, and an integer n 0 such that, for 0 ≤ t ≤ 1,
with
(h 2 can be treated in the same way) and, by Theorem 8, this condition reduces to the following: for each positive ε there exists an α > 1 and an integer n 0 such that
Let t ≥ s, s, t ∈ [0, 1]. By Lemma 4 we get
and this implies
By Theorem 7
and we get the assertion. Thus we have established tightness of each of the two coordinates of the partial sum process, which implies tightness of the vector-valued process.
Proof of Theorem 1.
From Proposition 2 we obtain that
in distribution on the space (D([0, 1]))2. We consider the functional given by
This is a continuous mapping from (D[0, 1])2 to D[0, 1], so we may apply the continuous mapping theorem to (73), and obtain
Together with the remarks at the beginning of this section, this proves Theorem 1.
References
Babbel, B.: Invariance principles for U-statistics and von Mises functionals. J. Stat. Plan. Inference 22, 337–354 (1989)
Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)
Borovkova, S.A., Burton, R.M., Dehling, H.G.: Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Am. Math. Soc. 353, 4261–4318 (2001)
Carlstein, E.: The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Stat. 14, 1171–1179 (1986)
Csörgő, M., Horváth, L.: Invariance principles for changepoint problems. J. Multivar. Anal. 27, 151–168 (1988)
Csörgő, M., Horváth, L.: Limit Theorems in Change Point Analysis. Wiley, New York (1997)
Dehling, H.: Limit theorems for sums of weakly dependent Banach space valued random variables. Z. für Wahrscheinlichkeitstheorie und verwandte Gebiete 63, 393–432 (1983)
Dehling, H., Fried, R.: Asymptotic distribution of two-sample empirical U-quantiles with applications to robust tests for shifts in location. J. Multivar. Anal. 105, 124–140 (2012)
Dehling, H., Fried, R., Sharipov, O.Sh., Vogel, D., Wornowizki, M.: Estimation of the variance of partial sums of dependent processes. Stat. Probab. Lett. 83, 141–147 (2013)
Dehling, H., Rooch, A., Taqqu, M.S.: Nonparametric change-point tests for long-range dependent data. Scand. J. Stat. 40, 153–173 (2013)
Denker, M.: Asymptotic Distribution Theory in Nonparametric Statistics. Vieweg Verlag, Braunschweig/Wiesbaden (1985)
Denker, M., Keller, G.: On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. für Wahrscheinlichkeitstheorie und verwandte Gebiete 64, 505–522 (1983)
Denker, M., Keller, G.: Rigorous statistical procedures for data from dynamical systems. J. Stat. Phys. 44, 67–93 (1986)
Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19, 293–325 (1948)
Ibragimov, I.A., Linnik, Yu.V.: Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen (1971)
Leucht, A.: Degenerate U- and V-statistics under weak dependence: asymptotic theory and bootstrap consistency. Bernoulli 18, 552–585 (2012)
Rooch, A.: Change-point tests for long-range dependent data. Dissertation, Ruhr-Universität Bochum (2012)
Sen, P.K.: On the properties of U-statistics when the observations are not independent. I. Estimation of non-serial parameters in some stationary stochastic processes. Calcutta Stat. Assoc. Bull. 12, 69–92 (1963)
Sen, P.K.: Limiting behavior of regular functionals of empirical distributions for stationary ∗-mixing processes. Z. für Wahrscheinlichkeitstheorie und Verwandte Gebiete 25, 71–82 (1972)
Sharipov, O.Sh., Wendler, M.: Bootstrap for the sample mean and for U-statistics of mixing and near-epoch dependent processes. J. Nonparametric Stat. 24, 317–342 (2012)
Wooldridge, J.M., White, H.: Some invariance principles and central limit theorems for dependent heterogeneous processes. Econ. Theory 4, 210–230 (1988)
Yoshihara, K.-I.: Limiting behavior of U-statistics for stationary, absolutely regular processes. Z. für Wahrscheinlichkeitstheorie und verwandte Gebiete 35, 237–252 (1976)
Acknowledgements
The authors wish to thank the referees for their very careful reading of an earlier version of this manuscript, and for their many thoughtful comments that helped to improve the presentation of the paper. This research was supported by the Collaborative Research Center 823, Project C3 Analysis of Structural Change in Dynamic Processes, of the German Research Foundation DFG.
Appendix: Some Auxiliary Results from the Literature
In this section, we collect some known lemmas and theorems for weakly dependent data. We start with some results on the behaviour of partial sums:
Lemma 3 (Borovkova, Burton, Dehling [3], Lemma 2.23).
Let \((X_{k})_{k\in \mathbb{Z}}\) be a 1-approximating functional with constants (ak)k≥0 of an absolutely regular process with mixing coefficients (β(k))k≥0. Suppose moreover that EXi = 0 and that one of the following two conditions holds:
1. \(X_{0}\) is bounded a.s. and \(\sum _{k=0}^{\infty }(a_{k} +\beta (k)) <\infty.\)
2. \(E\vert X_{0}\vert ^{2+\delta } <\infty\) and \(\sum _{k=0}^{\infty }(a_{k}^{ \frac{\delta }{ 1+\delta } } +\beta ^{ \frac{\delta }{1+\delta } }(k)) <\infty.\)
Then, as \(N \rightarrow \infty\) ,
and the sum on the r.h.s. converges absolutely.
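(The limit display of Lemma 3 was lost in the conversion of this chapter; matching the long-run variance \(\sigma^{2}\) that appears in Theorem 4 below, it presumably reads

\[
\frac{1}{N}\, E\Big(\sum_{i=1}^{N} X_{i}\Big)^{2} \;\longrightarrow\; E X_{0}^{2} + 2\sum_{j=1}^{\infty} E(X_{0}X_{j}),
\]

the series on the right-hand side being the sum referred to above.)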
Lemma 4 (Borovkova, Burton, Dehling [3], Lemma 2.24).
Let \((X_{k})_{k\in \mathbb{Z}}\) be a 1-approximating functional with constants (a k ) of an absolutely regular process with mixing coefficients (β(k)) k≥0 . Suppose moreover that EX i = 0 and that one of the following two conditions holds:
1. \(X_{0}\) is bounded a.s. and \(\sum _{k=0}^{\infty }k^{2}(a_{k} +\beta (k)) <\infty.\)
2. \(E\vert X_{0}\vert ^{4+\delta } <\infty\) and \(\sum _{k=0}^{\infty }k^{2}(a_{k}^{ \frac{\delta }{ 3+\delta } } +\beta ^{ \frac{\delta }{4+\delta } }(k)) <\infty.\)
Then there exists a constant C such that
Theorem 4 (Borovkova, Burton, Dehling [3], Theorem 4).
Let \((X_{k})_{k\in \mathbb{Z}}\) be a 1-approximating functional with constants (a k ) k≥0 of an absolutely regular process with mixing coefficients (β(k)) k≥0 . Suppose moreover that EX i = 0, \(E\vert X_{0}\vert ^{4+\delta } <\infty\) and that
for some δ > 0. Then, as \(n \rightarrow \infty,\)
where \(\sigma ^{2} = EX_{0}^{2} + 2\sum _{j=1}^{\infty }E(X_{0}X_{j}).\) In case \(\sigma ^{2} = 0\) , \(\mathcal{N}(0,0)\) denotes the point mass at the origin. If X 0 is bounded, the CLT continues to hold if (77) is replaced by the condition that \(\sum _{k=0}^{\infty }k^{2}(a_{k} +\beta (k)) <\infty\) .
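As an illustration of Theorem 4 (not part of the original text), the following Python sketch simulates an AR(1) process, a standard example of a 1-approximating functional of an absolutely regular process, and compares the Monte Carlo variance of \(S_{n}/\sqrt{n}\) with the closed-form long-run variance \(\sigma ^{2} = EX_{0}^{2} + 2\sum _{j=1}^{\infty }E(X_{0}X_{j}) = 1/(1-\varphi )^{2}\). All function and variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_sample(n, phi=0.5, burn=200):
    """Centred AR(1) series X_t = phi * X_{t-1} + e_t with i.i.d.
    standard normal innovations; a burn-in makes it nearly stationary."""
    e = rng.standard_normal(n + burn)
    x = np.empty(n + burn)
    x[0] = e[0]
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]

# Long-run variance from Theorem 4:
#   sigma^2 = E X_0^2 + 2 * sum_{j>=1} E(X_0 X_j).
# For AR(1): gamma_0 = 1/(1-phi^2), gamma_j = phi^j * gamma_0,
# which sums to sigma^2 = 1/(1-phi)^2.
phi = 0.5
sigma2_true = 1.0 / (1.0 - phi) ** 2

# Monte Carlo check that Var(S_n / sqrt(n)) approaches sigma^2.
n, reps = 2000, 500
norm_sums = np.array([ar1_sample(n, phi).sum() / np.sqrt(n)
                      for _ in range(reps)])
sigma2_mc = norm_sums.var()
print(sigma2_mc, sigma2_true)
```

With 500 replications the Monte Carlo estimate typically lands within a few percent of the true value 4.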
An important tool to derive asymptotic results for weakly dependent data are coupling methods. We will apply this method in the proof of Proposition 2.
Theorem 5 (Borovkova, Burton, Dehling [3], Theorem 3).
Let \((X_{n})_{n\in \mathbb{N}}\) be a 1-approximating functional with summable constants (a k ) k≥0 of an absolutely regular process with mixing rate (β(k)) k≥0 . Then, given integers K, L and N, we can approximate the sequence of \((K + 2L,N)\) -blocks (B s ) s≥1 by a sequence of independent blocks \((B_{s}^{{\prime}})_{s\geq 1}\) with the same marginal distribution in such a way that
where \(\alpha _{L}:= \left (2\sum _{l=L}^{\infty }a_{l}\right )^{1/2}.\)
In statistical applications, the question of how to estimate \(\sigma ^{2}\) is important. In the situation when the observations are a functional of an α-mixing process, Dehling et al. [9] propose estimating the variance of partial sums of dependent processes by the subsampling estimator
with \(\hat{T}_{i}(l) =\sum _{ j=(i-1)l+1}^{il}F_{n}(X_{j})\) and \(\tilde{U}_{n} = \frac{1} {n}\sum _{j=1}^{n}F_{ n}(X_{j})\), where F n (⋅ ) is the empirical distribution function.
Theorem 6 (Dehling, Fried, Sharipov, Vogel, Wornowizki [9], Theorem 1.2).
Let (X k ) k≥1 be a stationary, 1-approximating functional of an α-mixing process. Suppose that for some δ > 0, \(E\vert X_{1}\vert ^{2+\delta } <\infty\) , and that the mixing coefficients (α k ) k≥1 and the approximation constants (a k ) k≥1 satisfy
In addition, we assume that F is Lipschitz-continuous, that \(\alpha _{k} = O(k^{-8})\) and that \(a_{m} = O(m^{-12})\) . Then, if \(l_{n} \rightarrow \infty\) and \(l_{n} = o(\sqrt{n})\) as \(n \rightarrow \infty\) , we have \(\hat{D}_{n}\longrightarrow \sigma\) in L 2 .
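The defining display for \(\hat{D}_{n}\) did not survive the conversion of this chapter, so the following Python sketch shows only one common variant of a non-overlapping-block subsampling estimator of σ: the mean absolute deviation of centred block sums (quantities of the type \(\hat{T}_{i}(l) - l\tilde{U}_{n}\)), rescaled by \(\sqrt{\pi /2}\) under a Gaussian limit. The function name and the exact normalisation are our assumptions, not the paper's formula, and for simplicity the estimator is applied to the raw observations rather than to \(F_{n}(X_{j})\).

```python
import numpy as np

def subsampling_sigma(x, l):
    """Sketch of a non-overlapping-block subsampling estimator of the
    long-run standard deviation sigma (assumed variant, see lead-in):
    sqrt(pi/2) * mean |block sum - l * overall mean| / sqrt(l)."""
    x = np.asarray(x, dtype=float)
    k = len(x) // l                      # number of complete blocks
    xbar = x[: k * l].mean()
    block_sums = x[: k * l].reshape(k, l).sum(axis=1)
    centred = block_sums - l * xbar      # analogue of T_i(l) - l * U_n
    return np.sqrt(np.pi / 2.0) * np.abs(centred).mean() / np.sqrt(l)

rng = np.random.default_rng(1)
x = rng.standard_normal(20000)           # i.i.d. N(0,1): sigma = 1
est = subsampling_sigma(x, l=100)
print(est)
```

For i.i.d. standard normal data the long-run standard deviation is 1, and with block length \(l = o(\sqrt{n})\) the estimate should be close to that value.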
To deal with the degenerate kernel g, we need to find upper bounds for the expectations \(E\left (g(X_{i_{1}},X_{j_{1}})g(X_{i_{2}},X_{j_{2}})\right )\), in terms of the maximal distance among the indices. Since 1 ≤ i 1 < i 2 ≤ [n λ] and \([n\lambda ] + 1 \leq j_{1} <j_{2} \leq n\), we get i 1 < i 2 < j 1 < j 2.
Lemma 5 (Dehling, Fried [8], Proposition 6.1).
Let (X n ) n≥1 be a 1-approximating functional with constants (a k ) k≥1 of an absolutely regular process with mixing coefficients (β(k)) k≥1 and let g(x,y) be a 1-continuous bounded degenerate kernel. Then we have
where \(S = \vert \sup _{x,y}g(x,y)\vert\) and \(k =\max \left \{i_{2} - i_{1},j_{1} - i_{2},j_{2} - j_{1}\right \}\) .
The following two results are useful for proving tightness of a stochastic process. The first one is used to control the fluctuations of the maximum of partial sums. Let ξ 1 ,…,ξ n be random variables, and define \(S_{k} =\xi _{1} +\ldots +\xi _{k}\) (S 0 = 0) and \(M_{n} =\max _{0\leq k\leq n}\vert S_{k}\vert\) .
Theorem 7 (Billingsley [2], Theorem 10.2).
Suppose that β ≥ 0 and α > 1∕2 and that there exist nonnegative numbers u 1 ,…,u n such that for all positive λ
then for all positive λ
where K β,α is a constant depending only on β and α.
Theorem 8 (Billingsley [2], Theorem 8.4).
The sequence {Y n }, defined by
is tight if for each ε > 0 there exist λ > 1 and \(n_{0} \in \mathbb{N}\) such that for n ≥ n 0
© 2015 Springer Science+Business Media New York
Dehling, H., Fried, R., Garcia, I., Wendler, M. (2015). Change-Point Detection Under Dependence Based on Two-Sample U-Statistics. In: Dawson, D., Kulik, R., Ould Haye, M., Szyszkowicz, B., Zhao, Y. (eds) Asymptotic Laws and Methods in Stochastics. Fields Institute Communications, vol 76. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-3076-0_12
Print ISBN: 978-1-4939-3075-3
Online ISBN: 978-1-4939-3076-0