
1 Introduction

Change-point tests address the question of whether a stochastic process is stationary over the entire observation period. In the case of independent data, there is a well-developed theory; see the book by Csörgő and Horváth [6] for an excellent survey. When the data are dependent, much less is known. The CUSUM statistic has been intensively studied, even for dependent data; see again Csörgő and Horváth [6]. The CUSUM test, however, is not robust against outliers in the data. In the present paper, we study a robust test which is based on the two-sample Wilcoxon test statistic. Simulations show that this test outperforms the CUSUM test in the case of heavy-tailed data.

In order to derive the asymptotic distribution of the test, we study the stochastic process

$$\displaystyle{ \sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}h(X_{ i},X_{j}),\;0 \leq \lambda \leq 1, }$$
(1)

where \(h: \mathbb{R}^{2} \rightarrow \mathbb{R}\) is a kernel function. In the case of independent data, the asymptotic distribution of this process has been studied by Csörgő and Horváth [5]. In the present paper, we extend their result to short range dependent data \((X_{i})_{i\geq 1}\). Similar results have been obtained for long range dependent data by Dehling, Rooch and Taqqu [10], albeit with completely different methods.

U-statistics were introduced by Hoeffding [14], who established asymptotic normality for both the one-sample and the two-sample U-statistic in the case of independent data. The asymptotic distribution of one-sample U-statistics of dependent data was studied by Sen [18, 19], Yoshihara [22], Denker and Keller [12, 13] and by Borovkova, Burton and Dehling [3] in the so-called non-degenerate case, and by Babbel [1] and Leucht [16] in the degenerate case. For two-sample U-statistics, Dehling and Fried [8] established the asymptotic normality of \(\sum _{i=1}^{n_{1}}\sum _{ j=n_{1}+1}^{n_{1}+n_{2}}h(X_{ i},X_{j})\) for dependent data, when \(n_{1},n_{2} \rightarrow \infty\). The main theoretical result of the present paper is a functional version of this limit theorem.

In our paper, we focus on data that can be represented as functionals of a mixing process. In this way, we cover most examples from time series analysis, such as ARMA and ARCH processes, but also data from chaotic dynamical systems. For a survey of processes that have a representation as functional of a mixing process, see e.g. Borovkova, Burton and Dehling [3]. Earlier references can be found in Ibragimov and Linnik [15], Denker [11] and Billingsley [2].

2 Definitions and Main Results

Given the samples \(X_{1},\ldots,X_{n_{1}}\) and \(Y _{1},\ldots,Y _{n_{2}}\), and a kernel h(x, y), we define the two-sample U-statistic

$$\displaystyle{ U_{n_{1},n_{2}}:= \frac{1} {n_{1}\,n_{2}}\sum _{i=1}^{n_{1} }\sum _{j=1}^{n_{2} }h(X_{i},Y _{j}). }$$
(2)

More generally, one can define U-statistics with multivariate kernels \(h: \mathbb{R}^{k} \times \mathbb{R}^{l} \rightarrow \mathbb{R}\). In the present paper, for the ease of exposition, we will restrict attention to bivariate kernels h(x, y). The main results, however, can easily be extended to the multivariate case.

Assuming that \((X_{i})_{i\geq 1}\) and \((Y _{i})_{i\geq 1}\) are stationary processes with one-dimensional marginal distribution functions F and G, respectively, we can test the hypothesis H: F = G using the two-sample U-statistic. For example, the kernel \(h(x,y) = y - x\) leads to the U-statistic

$$\displaystyle{ U_{n_{1},n_{2}} = \frac{1} {n_{1}\,n_{2}}\sum _{i=1}^{n_{1} }\sum _{j=1}^{n_{2} }(Y _{j} - X_{i}) = \frac{1} {n_{2}}\sum _{j=1}^{n_{2} }Y _{j} - \frac{1} {n_{1}}\sum _{i=1}^{n_{1} }X_{i}, }$$
(3)

and thus to the familiar two-sample Gauß-test. Similarly, the kernel \(h(x,y) = 1_{\{x\leq y\}}\) leads to the U-statistic

$$\displaystyle{ U_{n_{1},n_{2}} = \frac{1} {n_{1}\,n_{2}}\sum _{i=1}^{n_{1} }\sum _{j=1}^{n_{2} }1_{\{X_{i}\leq Y _{j}\}}, }$$
(4)

and thus to the two-sample Mann-Whitney-Wilcoxon test.
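
To make the definitions concrete, the following minimal Python sketch (an illustration of ours with arbitrarily chosen sample sizes, not code from the paper) evaluates (2) by direct summation and specializes it to the kernels of (3) and (4).

```python
import numpy as np

def two_sample_u(x, y, kernel):
    """Evaluate U_{n1,n2} = (n1*n2)^{-1} * sum_i sum_j h(x_i, y_j) by direct summation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Broadcasting builds the full n1 x n2 matrix of kernel values h(x_i, y_j).
    return kernel(x[:, None], y[None, :]).mean()

rng = np.random.default_rng(0)
x, y = rng.normal(size=50), rng.normal(loc=0.3, size=60)

u_diff = two_sample_u(x, y, lambda xi, yj: yj - xi)               # kernel of (3)
u_wilcoxon = two_sample_u(x, y, lambda xi, yj: (xi <= yj) * 1.0)  # kernel of (4)
```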

In the present paper, we investigate tests for a change-point in the mean of a stochastic process \((X_{i})_{i\geq 1}\). We consider the model

$$\displaystyle{ X_{i} =\mu _{i} +\xi _{i},\;i \geq 1, }$$
(5)

where \((\mu _{i})_{i\geq 1}\) are unknown constants and where \((\xi _{i})_{i\geq 1}\) is a stochastic process. We want to test the hypothesis \(H:\;\mu _{1} =\ldots =\mu _{n}\) against the alternative that there exists 1 ≤ k ≤ n − 1 such that \(\mu _{1} =\ldots =\mu _{k}\neq \mu _{k+1} =\ldots =\mu _{n}\).

Tests for the change-point problem are often derived from two-sample tests applied to the samples \(X_{1},\ldots,X_{k}\) and \(X_{k+1},\ldots,X_{n}\), for all possible 1 ≤ k ≤ n − 1. For two-sample tests based on U-statistics with kernel h(x, y), this leads to the test statistic \(\sum _{i=1}^{k}\sum _{j=k+1}^{n}h(X_{i},X_{j})\), 1 ≤ k ≤ n, and thus to the process

$$\displaystyle{ U_{n}(\lambda ) =\sum _{ i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}h(X_{ i},X_{j}),\;0 \leq \lambda \leq 1. }$$
(6)

In this paper, we will derive a functional limit theorem for the process \((U_{n}(\lambda ))_{0\leq \lambda \leq 1}\), n ≥ 1. Specifically, we will show that under certain technical assumptions on the kernel h and on the process \((X_{i})_{i\geq 1}\), a properly centered and renormalized version of \((U_{n}(\lambda ))_{0\leq \lambda \leq 1}\) converges to a Gaussian process.

In our paper, we will assume that the process \((\xi _{i})_{i\geq 1}\) is weakly dependent. More specifically, we will assume that \((\xi _{i})_{i\geq 1}\) can be represented as a functional of an absolutely regular process.

Definition 1.

  1. (i)

    Given a stochastic process \((X_{n})_{n\in \mathbb{Z}}\), we denote by \(\mathcal{A}_{k}^{l}\) the \(\sigma\)-algebra generated by \(X_{k},\ldots,X_{l}\). The process is called absolutely regular if

    $$\displaystyle{ \beta (k) =\sup _{n}\left \{\sup \sum _{j=1}^{J}\sum _{ i=1}^{I}\vert P(A_{ i} \cap B_{j}) - P(A_{i})P(B_{j})\vert \right \}\rightarrow 0, }$$
    (7)

    as \(k \rightarrow \infty\), where the last supremum is over all finite \(\mathcal{A}_{1}^{n}\)-measurable partitions \((A_{1},\ldots,A_{I})\) and all finite \(\mathcal{A}_{n+k}^{\infty }\)-measurable partitions \((B_{1},\ldots,B_{J})\).

  2. (ii)

    The process \((X_{n})_{n\geq 1}\) is called a two-sided functional of an absolutely regular sequence if there exists an absolutely regular process \((Z_{n})_{n\in \mathbb{Z}}\) and a measurable function \(f: \mathbb{R}^{\mathbb{Z}} \rightarrow \mathbb{R}\) such that

    $$\displaystyle{X_{i} = f((Z_{i+n})_{n\in \mathbb{Z}}).}$$

    Analogously, \((X_{n})_{n\geq 1}\) is called a one-sided functional if \(X_{i} = f((Z_{i+n})_{n\geq 0})\).

  3. (iii)

    The process \((X_{n})_{n\geq 1}\) is called a 1-approximating functional with coefficients \((a_{k})_{k\geq 1}\) if

    $$\displaystyle{ E\left \vert X_{i} - E(X_{i}\vert Z_{i-k},\ldots,Z_{i+k})\right \vert \leq a_{k}. }$$
    (8)

In addition to weak dependence conditions on the process \((X_{i})_{i\geq 1}\), the asymptotic analysis of the process (6) requires some continuity assumptions on the kernel function h(x, y). We use the notion of 1-continuity, which was introduced by Borovkova, Burton and Dehling [3]. Alternative continuity conditions have been used by Denker and Keller [13].

Definition 2.

The kernel h(x, y) is called 1-continuous, if there exists a function \(\phi: (0,\infty ) \rightarrow (0,\infty )\) with ϕ(ε) = o(1) as \(\epsilon \rightarrow 0\) such that for all ε > 0

$$\displaystyle\begin{array}{rcl} E(\vert h(X^{{\prime}},Y ) - h(X,Y )\vert 1_{\{ \vert X-X^{{\prime}}\vert \leq \epsilon \}}) \leq \phi (\epsilon )& &{}\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} E(\vert h(X,Y ^{{\prime}}) - h(X,Y )\vert 1_{\{ \vert Y -Y ^{{\prime}}\vert \leq \epsilon \}}) \leq \phi (\epsilon )& &{}\end{array}$$
(10)

for all random variables \(X,X^{{\prime}},Y\) and \(Y ^{{\prime}}\) having the same marginal distribution as \(X_{1}\), and such that X, Y are either independent or have joint distribution \(P_{(X_{1},X_{k})}\), for some integer k.
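
As a simple illustration (a sketch of the independent case only; the dependent pairs allowed in the definition are treated in Borovkova, Burton and Dehling [3]), suppose that \(X_{1}\) has a density bounded by a constant C, as assumed in Theorem 3 below. For the kernel \(h(x,y) = 1_{\{x\leq y\}}\) and independent X, Y, the difference \(\vert h(X^{{\prime}},Y ) - h(X,Y )\vert\) is nonzero only if Y lies between X and \(X^{{\prime}}\), so that on the event \(\{\vert X - X^{{\prime}}\vert \leq \epsilon \}\)

$$\displaystyle{ E(\vert h(X^{{\prime}},Y ) - h(X,Y )\vert 1_{\{\vert X-X^{{\prime}}\vert \leq \epsilon \}}) \leq P(\vert Y - X\vert \leq \epsilon ) = E(F(X+\epsilon ) - F(X-\epsilon )) \leq 2C\epsilon, }$$

and hence (9) holds with \(\phi (\epsilon ) = 2C\epsilon\).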

The most important technical tool in the study of U-statistics is Hoeffding’s decomposition, originally introduced by Hoeffding [14]. If \(E\vert h(X,Y )\vert <\infty\) for two independent random variables X and Y with the same distribution as \(X_{1}\), we can write

$$\displaystyle{ h(x,y) =\theta +h_{1}(x) + h_{2}(y) + g(x,y), }$$
(11)

where the terms on the right-hand side are defined as follows:

$$\displaystyle\begin{array}{rcl} \theta & =& \int \!\!\int h(x,y)dF(x)dF(y) {}\\ h_{1}(x)& =& \int h(x,y)dF(y) -\theta {}\\ h_{2}(y)& =& \int h(x,y)dF(x) -\theta {}\\ g(x,y)& =& h(x,y) - h_{1}(x) - h_{2}(y) -\theta. {}\\ \end{array}$$

Here, F denotes the distribution function of the random variables \(X_{i}\). Observe that, by Fubini’s theorem,

$$\displaystyle{E(h_{1}(X)) = E(h_{2}(X)) = 0.}$$

In addition, the kernel g(x, y) is degenerate in the sense of the following definition.

Definition 3.

Let \((X_{i})_{i\geq 1}\) be a stationary process, and let g(x, y) be a measurable function. We say that g(x, y) is degenerate if

$$\displaystyle{ E(g(x,X_{1})) = E(g(X_{1},y)) = 0, }$$
(12)

for all \(x,y \in \mathbb{R}\).
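
As a simple illustration of the decomposition (11), consider the kernel \(h(x,y) = y - x\) from (3) and write \(\mu = E(X_{1})\). Then

$$\displaystyle\begin{array}{rcl} \theta & =& \int \!\!\int (y - x)dF(x)dF(y) = 0,\qquad h_{1}(x) =\mu -x,\qquad h_{2}(y) = y-\mu, {}\\ g(x,y)& =& (y - x) - (\mu -x) - (y-\mu ) - 0 = 0, {}\\ \end{array}$$

so the degenerate part vanishes identically; this is consistent with the partial-sum representation (17) obtained in Remark 1 (iv) below.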

The following theorem, a functional central limit theorem for two-sample U-statistics of dependent data, is the main theoretical result of the present paper.

Theorem 1.

Let \((X_{n})_{n\geq 1}\) be a 1-approximating functional with constants \((a_{k})_{k\geq 1}\) of an absolutely regular process with mixing coefficients \((\beta (k))_{k\geq 1}\), and let h(x,y) be a 1-continuous bounded kernel satisfying

$$\displaystyle{ \sum _{k=1}^{\infty }k^{2}(\beta (k) + \sqrt{a_{ k}} +\phi (a_{k})) <\infty. }$$
(13)

Then, as \(n \rightarrow \infty\) , the D[0,1]-valued process

$$\displaystyle{ T_{n}(\lambda ):= \frac{1} {n^{3/2}}\sum _{i=1}^{[\lambda n]}\sum _{ j=[\lambda n]+1}^{n}(h(X_{ i},X_{j})-\theta ),\;0 \leq \lambda \leq 1, }$$
(14)

converges in distribution towards a mean-zero Gaussian process with representation

$$\displaystyle{ Z(\lambda ) = (1-\lambda )W_{1}(\lambda ) +\lambda (W_{2}(1) - W_{2}(\lambda )),\;0 \leq \lambda \leq 1, }$$
(15)

where \((W_{1}(\lambda ),W_{2}(\lambda ))_{0\leq \lambda \leq 1}\) is a two-dimensional Brownian motion with mean zero and covariance function \(\mathop{\mathrm{Cov}}\nolimits (W_{k}(s),W_{l}(t)) =\min (s,t)\sigma _{kl}\) , where

$$\displaystyle{ \sigma _{kl} = E(h_{k}(X_{0})h_{l}(X_{0})) + 2\,\sum _{j=1}^{\infty }\mathop{\mathrm{Cov}}\nolimits (h_{ k}(X_{0}),h_{l}(X_{j})),\;k,l = 1,2. }$$
(16)

Remark 1.

  1. (i)

    In the case of i.i.d. data, Theorem 1 was established by Csörgő and Horváth [5]. In the case of long-range dependent data, weak convergence of the process \((T_{n}(\lambda ))_{0\leq \lambda \leq 1}\) has been studied by Dehling, Rooch and Taqqu [10] and by Rooch [17], albeit with a normalization different from \(n^{3/2}\).

  2. (ii)

    Using the representation (15), one can calculate the autocovariance function of the process \((Z(\lambda ))_{0\leq \lambda \leq 1}\). We obtain

    $$\displaystyle\begin{array}{rcl} \mathop{\mathrm{Cov}}\nolimits (Z(\lambda ),Z(\mu ))& =& \sigma _{11}[(1-\lambda )(1-\mu )\min \{\lambda,\mu \}] {}\\ & & +\sigma _{22}[\lambda \mu (1 -\mu -\lambda +\min \{\lambda,\mu \})] {}\\ & & +\sigma _{12}[\mu (1-\lambda )(\lambda -\min \{\lambda,\mu \}) +\lambda (1-\mu )(\mu -\min \{\lambda,\mu \})]. {}\\ \end{array}$$
  3. (iii)

    We conjecture that a similar theorem also holds for unbounded kernels under some moment conditions and faster mixing rates (similar to Theorem 2.7 of Sharipov and Wendler [20]). As our main application is the Wilcoxon test, where the kernel is bounded, we restrict the theorem to the case of bounded kernels.

  4. (iv)

    For the kernel \(h(x,y) = y - x\), we can analyze the asymptotic behavior of the process \(T_{n}(\lambda )\) using the functional central limit theorem (FCLT). Note that, since \(X_{j} - X_{i} = (X_{j} - E(X_{j})) - (X_{i} - E(X_{i}))\), we may assume without loss of generality that \(X_{i}\) has mean zero. Then we get the representation

    $$\displaystyle\begin{array}{rcl} T_{n}(\lambda )& =& \frac{1} {n^{3/2}}\sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}(X_{ j} - X_{i}) \\ & =& \frac{[n\lambda ]} {n} \frac{1} {\sqrt{n}}\sum _{i=1}^{n}X_{ i} - \frac{1} {\sqrt{n}}\sum _{i=1}^{[n\lambda ]}X_{ i}. {}\end{array}$$
    (17)

    Thus, weak convergence of \((T_{n}(\lambda ))_{0\leq \lambda \leq 1}\) can be derived from the FCLT for the partial sum process \(\frac{1} {\sqrt{n}}\sum _{i=1}^{[n\lambda ]}X_{i}\). Such FCLTs have been proved under a wide range of conditions, e.g. for functionals of uniformly mixing data in Billingsley [2].

We finally want to state an important special case of Theorem 1, namely when the kernel is anti-symmetric, i.e. when \(h(x,y) = -h(y,x)\). Kernels that occur in connection with change-point tests usually have this property. For anti-symmetric kernels, the limit process has a much simpler structure; moreover one can give a simpler direct proof in this case. Note that for independent random variables X, Y we have by anti-symmetry that \(Eh(X,Y ) = -Eh(Y,X) = -Eh(X,Y )\) and so \(\theta = Eh(X,Y ) = 0\).

Theorem 2.

Let \((X_{n})_{n\geq 1}\) be a 1-approximating functional with constants \((a_{k})_{k\geq 1}\) of an absolutely regular process with mixing coefficients \((\beta (k))_{k\geq 1}\), and let h(x,y) be a 1-continuous bounded anti-symmetric kernel such that (13) holds. Then, as \(n \rightarrow \infty\), the D[0,1]-valued process

$$\displaystyle{ T_{n}(\lambda ):= \frac{1} {n^{3/2}}\sum _{i=1}^{[\lambda n]}\sum _{ j=[\lambda n]+1}^{n}h(X_{ i},X_{j}),\;0 \leq \lambda \leq 1, }$$
(18)

converges in distribution towards the mean-zero Gaussian process \(\sigma \,W^{(0)}(\lambda ),\;0 \leq \lambda \leq 1\), where \((W^{(0)}(\lambda ))_{0\leq \lambda \leq 1}\) is a standard Brownian bridge and

$$\displaystyle{ \sigma ^{2} =\mathop{ \mathrm{Var}}\nolimits (h_{ 1}(X_{1})) + 2\sum _{k=2}^{\infty }\mathop{\mathrm{Cov}}\nolimits (h_{ 1}(X_{1}),h_{1}(X_{k})). }$$
(19)

3 Application to Change Point Problems

In this section, we will apply Theorem 1 in order to derive the asymptotic distribution of two change-point test statistics. Specifically, we wish to test the hypothesis

$$\displaystyle{ H_{0}:\mu _{1} =\ldots =\mu _{n} }$$
(20)

against the alternative of a level shift at an unknown point in time, i.e.

$$\displaystyle{ H_{A}:\mu _{1} =\ldots =\mu _{k}\neq \mu _{k+1} =\ldots =\mu _{n},\text{ for some }k \in \{ 1,\ldots,n - 1\}. }$$
(21)

We consider the following two test statistics,

$$\displaystyle\begin{array}{rcl} T_{1,n}& =& \max _{1\leq k<n}\left \vert \frac{1} {n^{3/2}}\sum _{i=1}^{k}\sum _{ j=k+1}^{n}\left (1_{\{ X_{i}<X_{j}\}} - 1/2\right )\right \vert {}\end{array}$$
(22)
$$\displaystyle\begin{array}{rcl} T_{2,n}& =& \max _{1\leq k<n}\left \vert \frac{1} {n^{3/2}}\sum _{i=1}^{k}\sum _{ j=k+1}^{n}\left (X_{ j} - X_{i}\right )\right \vert.{}\end{array}$$
(23)
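
Both statistics are straightforward to compute; the following Python sketch (our own illustration, not code from the paper) evaluates \(T_{2,n}\) via the partial-sum representation (17) and \(T_{1,n}\) via column-wise cumulative sums of the comparison matrix.

```python
import numpy as np

def change_point_statistics(x):
    """Return (T1, T2): the Wilcoxon-type statistic (22) and the CUSUM statistic (23)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n)                      # candidate change points k = 1, ..., n-1

    # CUSUM statistic: sum_{i<=k} sum_{j>k} (x_j - x_i) = k*S_n - n*S_k, cf. (17).
    s = np.cumsum(x)
    t2 = np.max(np.abs(k * s[-1] - n * s[:-1])) / n**1.5

    # Wilcoxon statistic: W(k) = #{(i, j): i <= k < j, x_i < x_j}.
    less = (x[:, None] < x[None, :]).astype(float)
    col_cum = np.cumsum(less, axis=0)        # col_cum[m-1, j] = #{i <= m : x_i < x_j}
    w = np.array([col_cum[m - 1, m:].sum() for m in k])
    t1 = np.max(np.abs(w - k * (n - k) / 2.0)) / n**1.5
    return t1, t2
```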

Theorem 3.

Let \((X_{n})_{n\geq 1}\) be a 1-approximating functional with constants \((a_{k})_{k\geq 1}\) of an absolutely regular process with mixing coefficients \((\beta (k))_{k\geq 1}\), satisfying (13), and assume that \(X_{1}\) has a distribution function F(x) with bounded density. Then, under the null hypothesis \(H_{0}\),

$$\displaystyle{ T_{1,n} \rightarrow \sigma _{1}\sup _{0\leq \lambda \leq 1}\vert W^{(0)}(\lambda )\vert, }$$
(24)

where \((W^{(0)}(\lambda ))_{0\leq \lambda \leq 1}\) denotes the standard Brownian bridge process, and where

$$\displaystyle{ \sigma _{1}^{2} =\mathop{ \mathrm{Var}}\nolimits (F(X_{ 1})) + 2\,\sum _{k=2}^{\infty }\mathop{\mathrm{Cov}}\nolimits (F(X_{ 1}),F(X_{k})). }$$
(25)

Assuming that \(E\vert X_{i}\vert ^{2+\delta } <\infty\), \(\beta (k) = O(k^{-(2+\delta )/\delta })\) and \(a_{k} = O(k^{-(1+\delta )/(2\delta )})\), we have, under the null hypothesis \(H_{0}\),

$$\displaystyle{ T_{2,n} \rightarrow \sigma _{2}\sup _{0\leq \lambda \leq 1}\vert W^{(0)}(\lambda )\vert, }$$
(26)

where

$$\displaystyle{ \sigma _{2}^{2} =\mathop{ \mathrm{Var}}\nolimits (X_{ 1}) + 2\,\sum _{k=2}^{\infty }\mathop{\mathrm{Cov}}\nolimits (X_{ 1},X_{k}). }$$
(27)

Proof.

We will establish weak convergence of \(T_{1,n}\). In order to do so, we will apply Theorem 1 to the kernel \(h(x,y) = 1_{\{x<y\}}\). Borovkova, Burton and Dehling [3] showed that this kernel is 1-continuous. By continuity of the distribution function of \(X_{1}\), we get that \(\theta =\int \!\!\int 1_{\{x<y\}}dF(x)dF(y) = 1/2\). Moreover, we get

$$\displaystyle\begin{array}{rcl} h_{1}(x)& =& P(x <X_{1}) -\frac{1} {2} = \frac{1} {2} - F(x) {}\\ h_{2}(x)& =& P(X_{1} <x) -\frac{1} {2} = F(x) -\frac{1} {2}. {}\\ \end{array}$$

Note that \(h_{2}(x) = -h_{1}(x)\). Hence \(W_{2}(\lambda ) = -W_{1}(\lambda )\), and thus the limit process in Theorem 1 has the representation

$$\displaystyle{Z(\lambda ) = (1-\lambda )W_{1}(\lambda ) +\lambda (W_{2}(1) - W_{2}(\lambda )) = W_{1}(\lambda ) -\lambda W_{1}(1).}$$

Here \(W_{1}(\lambda )\) is a Brownian motion with variance \(\mathop{\mathrm{Var}}\nolimits (W_{1}(1)) =\sigma _{1}^{2}\), so that \(Z(\lambda )\) has the same distribution as \(\sigma _{1}W^{(0)}(\lambda )\), and the convergence of \(T_{1,n}\) follows by the continuous mapping theorem applied to the supremum functional. Weak convergence of \(T_{2,n}\) can be shown directly from the functional central limit theorem for the partial sum process; see Corollary 3.2 of Wooldridge and White [21]. We have to check the \(L_{2}\)-near epoch dependence. Note that by our assumptions

$$\displaystyle\begin{array}{rcl} & & E\left \vert X_{0} - E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\right \vert ^{2} \\ & & \ = E\left [\left \vert X_{0} - E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\right \vert ^{2}1_{ \{\vert X_{0}-E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\vert \leq a_{l}^{- \frac{1} {1+\delta } }\}}\right ] \\ & & \quad \qquad \qquad + E\left [\left \vert X_{0} - E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\right \vert ^{2}1_{ \{\vert X_{0}-E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\vert>a_{l}^{- \frac{1} {1+\delta } }\}}\right ] \\ & & \leq a_{l}^{- \frac{1} {1+\delta } }E\left \vert X_{0} - E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\right \vert + a_{l}^{ \frac{\delta }{ 1+\delta } }E\left \vert X_{0} - E[X_{0}\vert Z_{-l},\ldots,Z_{l}]\right \vert ^{2+\delta } \\ & & \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \leq Ca_{l}^{ \frac{\delta }{ 1+\delta } } = O(l^{-1/2}), {}\end{array}$$
(28)

so the condition of Corollary 3.2 of Wooldridge and White [21] holds. Hence, the partial sum process \(( \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt]}X_{i})_{0\leq t\leq 1}\) converges in distribution to \((\sigma _{2}\,W(t))_{0\leq t\leq 1}\), where W is standard Brownian motion. Convergence in distribution of \(T_{2,n}\) follows by an application of the continuous mapping theorem.

Remark 2.

  1. (i)

    The distribution of \(\sup _{0\leq \lambda \leq 1}\vert W^{(0)}(\lambda )\vert\) is the well-known Kolmogorov-Smirnov distribution. Quantiles of the Kolmogorov-Smirnov distribution can be found in most statistical tables.

  2. (ii)

    In order to apply Theorem 3, we need to estimate the variances \(\sigma _{1}^{2}\) and \(\sigma _{2}^{2}\). Regarding \(\sigma _{2}^{2}\) given in expression (27), we apply the non-overlapping subsampling estimator

    $$\displaystyle{ \hat{\sigma }_{2}^{2} = \frac{1} {[n/l_{n}]}\sum _{i=1}^{[n/l_{n}]} \frac{1} {l_{n}}\left (\sum _{j=(i-1)l_{n}+1}^{il_{n} }X_{j} -\frac{l_{n}} {n} \sum _{j=1}^{n}X_{ j}\right )^{2} }$$
    (29)

    investigated by Carlstein [4] for α-mixing data. In the case of AR(1) processes, Carlstein derives

    $$\displaystyle{ l_{n} =\max (\lceil n^{1/3}(2\rho /(1 -\rho ^{2}))^{2/3}\rceil,1) }$$
    (30)

    as the choice of the block length which minimizes the MSE asymptotically, with ρ being the autocorrelation coefficient at lag 1.

Regarding \(\sigma _{1}^{2}\) given in (25), one faces the additional challenge that the distribution function F is unknown. This problem has been addressed, e.g. in Dehling, Fried, Sharipov, Vogel and Wornowizki [9], for the case of functionals of absolutely regular processes, with F estimated by the empirical distribution function \(F_{n}\). The authors find the subsampling estimator for \(\sigma _{1}\)

$$\displaystyle{ \hat{\sigma }_{1} = \frac{1} {[n/l_{n}]}\sqrt{ \frac{\pi } {2}}\sum _{i=1}^{[n/l_{n}]} \frac{1} {\sqrt{l_{n}}}\left \vert \sum _{j=(i-1)l_{n}+1}^{il_{n} }F_{n}(X_{j}) -\frac{l_{n}} {n} \sum _{j=1}^{n}F_{ n}(X_{j})\right \vert, }$$
(31)

which employs non-overlapping subsampling, to have a smaller bias, but a somewhat larger MSE, than the corresponding overlapping subsampling estimator. The adaptive choice of the block length \(l_{n}\) proposed by Carlstein worked well in their simulations if the data were generated from a stationary ARMA(1,1) model and an estimate of ρ was plugged in. In the next section, we will explore this and other proposals in situations with level shifts and normally distributed or heavy-tailed innovations.
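
For concreteness, here is a minimal Python sketch (our own implementation, not the authors' code) of the non-overlapping subsampling estimators (29) and (31) and of Carlstein's block-length rule (30); the absolute value in the rule is only a guard against negative \(\hat{\rho }\).

```python
import numpy as np

def carlstein_block_length(x):
    """Carlstein's rule (30), with the lag-one sample autocorrelation plugged in for rho."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    rho = np.sum(xc[:-1] * xc[1:]) / np.sum(xc**2)
    l = np.ceil(len(x) ** (1 / 3) * np.abs(2 * rho / (1 - rho**2)) ** (2 / 3))
    return max(int(l), 1)

def sigma2_hat_squared(x, l):
    """Non-overlapping subsampling estimator (29) for sigma_2^2."""
    x = np.asarray(x, dtype=float)
    m = len(x) // l                                    # number of blocks [n / l_n]
    blocks = x[:m * l].reshape(m, l).sum(axis=1)
    return np.mean((blocks - l * x.mean()) ** 2) / l

def sigma1_hat(x, l):
    """Non-overlapping subsampling estimator (31) for sigma_1, based on F_n(X_j)."""
    x = np.asarray(x, dtype=float)
    n, m = len(x), len(x) // l
    fn = np.searchsorted(np.sort(x), x, side="right") / n   # empirical d.f. at the data
    blocks = fn[:m * l].reshape(m, l).sum(axis=1)
    return np.sqrt(np.pi / 2) * np.mean(np.abs(blocks - l * fn.mean())) / np.sqrt(l)
```

Given \(T_{1,n}\) and \(\hat{\sigma }_{1}\), an asymptotic p-value can then be read off the Kolmogorov-Smirnov distribution, e.g. via scipy.stats.kstwobign.sf(t1 / sigma1_hat(x, l)).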

4 Simulation Results

The assumptions regarding the underlying process \((X_{i})\) in Theorem 1 are satisfied by a wide range of time series, such as AR and ARMA processes. To illustrate the results and to investigate the finite sample behavior and the power of the tests based on \(T_{1,n}\) and \(T_{2,n}\), we will give some simulation results. We study the underlying change-point model

$$\displaystyle{ X_{i} = \left \{\begin{array}{cl} \xi _{i} &\text{if }i = 1,\ldots,[n\lambda ] \\ \mu +\xi _{i}&\text{if }i = [n\lambda ] + 1,\ldots,n. \end{array} \right. }$$
(32)

Within this model, the hypothesis of no change is equivalent to μ = 0. We assume that the noise follows an AR(1) process, i.e. that

$$\displaystyle{ \xi _{i} =\rho \,\xi _{i-1} +\epsilon _{i}, }$$
(33)

where − 1 < ρ < 1, and where the innovations \(\epsilon _{i}\) are i.i.d. random variables with mean zero, bounded density and finite second moments. The innovations \(\epsilon _{i}\) are generated from a standard normal or a \(t_{\nu }\)-distribution with ν = 3 degrees of freedom, scaled to have the same 84.13 % percentile as the standard normal, which is 1. The autoregression coefficient is varied over ρ ∈ {0.0, 0.4, 0.8}, corresponding to zero, moderate or strong positive autocorrelation, and the sample size is n = 200. For the choice of the block length we used Carlstein’s adaptive rule outlined above, or a fixed block length of \(l_{n} = 9\), which is in good agreement with the empirical findings of Dehling et al. [10] for larger sample sizes, and with their theoretical result that \(l_{n}\) should be chosen as \(o(\sqrt{n})\) to achieve consistency. For comparison, we also include tests employing overlapping subsampling for estimation of the asymptotic variance, applying the same block lengths as the non-overlapping versions.
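
The data-generating mechanism just described can be reproduced along the following lines (a Python sketch under the stated settings; the scaling constant for the \(t_{3}\) innovations is the reciprocal of the 84.13 % quantile of the \(t_{3}\)-distribution, and starting the AR(1) recursion in its stationary scale is our choice).

```python
import numpy as np
from scipy import stats

def simulate_change_point_series(n=200, rho=0.4, dist="normal", mu=0.0,
                                 change_at=None, rng=None):
    """AR(1) noise (33) with standard normal or rescaled t_3 innovations,
    plus an optional level shift of height mu after observation change_at, cf. (32)."""
    rng = np.random.default_rng(rng)
    if dist == "normal":
        eps = rng.standard_normal(n)
    else:  # t_3 innovations, rescaled so that their 84.13 % quantile equals 1
        eps = rng.standard_t(df=3, size=n) / stats.t.ppf(0.8413, df=3)
    xi = np.empty(n)
    xi[0] = eps[0] / np.sqrt(1.0 - rho**2)    # start with the stationary variance
    for i in range(1, n):
        xi[i] = rho * xi[i - 1] + eps[i]
    x = xi.copy()
    if change_at is not None:
        x[change_at:] += mu                   # level shift after the change point
    return x
```

Combining this generator with the statistics and variance estimators sketched above gives Monte Carlo approximations of the empirical levels and powers discussed below.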

Table 1 contains the empirical levels (i.e. the fraction of rejections) of the tests with an asymptotic level of 5 %, obtained from 4000 simulation runs for each situation. Note that the tests developed under the assumption of independence, which do not adjust for autocorrelation, become strongly oversized with increasing positive autocorrelation, i.e. they reject a true null hypothesis far too often, and are practically useless already for ρ = 0.4. The performance of the adjusted tests is much better in this respect and in good agreement with the asymptotic results. Only if the autocorrelation is strong (ρ = 0.8) do the tests with a fixed block length become somewhat anti-conservative (oversized), and even more so the CUSUM test. Longer block lengths are needed for stronger positive autocorrelations, and Carlstein’s adaptive block length (30) adjusts for this. There is little difference between the tests employing overlapping and non-overlapping subsampling here.

Table 1 Empirical level of the tests based on \(T_{1,n}\) and \(T_{2,n}\), for n = 200, with fixed or adaptive subsampling block length \(l_{n}\) and overlapping (ol) or non-overlapping (nol) subsampling. The results are for AR(1) observations with different lag-one autocorrelations ρ and standard normal or \(t_{3}\)-distributed innovations, and are based on 4000 simulation runs each

In order to investigate the powers of the tests under the alternative, we consider shifts of increasing height μ, generating 400 data sets for each situation. The sample size is again n = 200, and the change point is at observation number \(\tau = [\lambda n] = 100\).

Figure 1 illustrates the powers of the different versions of the tests in case of Gaussian or \(t_{3}\)-distributed innovations and several autocorrelation coefficients ρ. Under normality, the CUSUM test \(T_{2,n}\) is somewhat more powerful than the test \(T_{1,n}\) based on the Wilcoxon statistic, while under the \(t_{3}\)-distribution it is the other way round. The CUSUM test with the fixed block length considered here becomes strongly oversized if ρ is large, while this effect is less severe for the test based on the Wilcoxon statistic. Carlstein’s adaptive choice of the block length increases the power if ρ is small and improves the size of the test substantially if ρ is large. The tests employing overlapping subsampling (not shown here) are even slightly more powerful in case of zero or moderate autocorrelations, but much less powerful in case of strong autocorrelations. We have also considered the case of negative autocorrelation (\(\rho = -0.4\), not shown here). We obtained similar results for the power of the test based on the Wilcoxon statistic relative to that of the CUSUM test, and little difference between using a fixed or the adaptive block length.

Fig. 1

Power of the tests in case of a shift in the middle of an AR(1) process with Gaussian (left) or \(t_{3}\)-innovations (right) and different lag-one correlations ρ = 0.0 (top), ρ = 0.4 (middle) or ρ = 0.8 (bottom), n = 200. Wilcoxon test \(T_{1,n}\) (bold lines) and CUSUM test \(T_{2,n}\) (thin lines). Adjustment by non-overlapping subsampling with fixed (black) or adaptive block length (dashed)

The tests with Carlstein’s adaptive choice of the block length could be improved further by using a more sophisticated estimate of ρ than the ordinary sample autocorrelation used here. The latter is positively biased in the presence of a shift, which leads to block lengths that are too large. This negative effect becomes more severe for larger values of ρ, since the plug-in estimate of the asymptotically MSE-optimal choice of \(l_{n}\) increases more rapidly if \(\hat{\rho }\) is close to 1, while it is rather stable for moderate and small values of \(\hat{\rho }\). In our study, for ρ = 0 the average value chosen for \(l_{n}\) increases only from about 2 to about 3 as the height of the shift increases, while it increases from about 6 to about 9 if ρ = 0.4, and even from about 16 to about 24 if ρ = 0.8. An estimate of the autocorrelation coefficient which resists shifts could be used, e.g. by applying a stepwise procedure which estimates the possible time of occurrence of a shift before calculating \(\hat{\rho }\) from the corrected data, but this will not be pursued here.
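
One possible implementation of such a stepwise correction (a sketch of our own; the paper does not pursue this) first locates the most likely shift via the CUSUM process, centers each segment at its own mean, and only then computes \(\hat{\rho }\).

```python
import numpy as np

def shift_corrected_rho(x):
    """Lag-one sample autocorrelation computed after removing an estimated level shift."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n)
    k_hat = k[np.argmax(np.abs(k * s[-1] - n * s[:-1]))]   # estimated shift location
    corrected = np.concatenate([x[:k_hat] - x[:k_hat].mean(),
                                x[k_hat:] - x[k_hat:].mean()])
    return np.sum(corrected[:-1] * corrected[1:]) / np.sum(corrected**2)
```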

5 Data Example

For illustration we apply the tests to time series data representing the monthly average daily minimum temperatures in Potsdam, Germany, measured between January 1893 and December 1992. The 1200 data points for these 100 years have been deseasonalized by subtracting from each observation the median of its calendar month; see Fig. 2. Our interest is in whether the level of this time series is constant or whether there is a monotonic change. Such a systematic change is likely to show a trend-like behavior and not a sharp shift, but nevertheless we would like a change-point test to detect such a change if its null hypothesis is a constant level.
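
The deseasonalization step can be written compactly as follows (a sketch assuming the series is stored month by month, starting in January; we do not have access to the original data).

```python
import numpy as np

def deseasonalize_monthly(x):
    """Subtract from each observation the median of its calendar month."""
    x = np.asarray(x, dtype=float)
    months = np.arange(len(x)) % 12                    # 0 = January, ..., 11 = December
    medians = np.array([np.median(x[months == m]) for m in range(12)])
    return x - medians[months]
```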

Fig. 2

Deseasonalized time series representing the monthly average daily minimum temperatures in Potsdam, Germany

The empirical autocorrelation and partial autocorrelation functions suggest a first order autoregressive model with lag-one autocorrelation of about 0.25 for the deseasonalized data. The test statistics take their maximum values after time point 595, i.e. rather in the middle of the time series. The resulting p-values are 0.23 and 0.16 for the CUSUM test with the fixed and the adaptive block length, respectively. In contrast, both versions of the Wilcoxon-based test are significant, with p-values of 0.04 and 0.015, respectively. The differences between the results agree with the better power behavior of the Wilcoxon-based test relative to the CUSUM test in case of the (left-)skewed distributions of minimum temperatures, and the better power of the versions employing the adaptive block length over those with the fixed block length considered here in case of small positive autocorrelations. The sample median of the second time period is about 0.4 degrees larger than that of the first period.

6 Auxiliary Results

In this section, we will prove some auxiliary results which will play a crucial role in the proof of Theorem 1. The main result of this section is the following proposition, which essentially shows that the degenerate part in the Hoeffding decomposition of the U-statistic \(T_{n}(\lambda )\) is uniformly negligible.

Proposition 1.

Let \((X_{n})_{n\geq 1}\) be a 1-approximating functional with constants \((a_{k})_{k\geq 1}\) of an absolutely regular process with mixing coefficients \((\beta (k))_{k\geq 1}\), satisfying

$$\displaystyle{ \sum _{k=1}^{\infty }k(\beta (k) + \sqrt{a_{ k}} +\phi (a_{k})) <\infty. }$$
(34)

Moreover, let g(x,y) be a 1-continuous bounded degenerate kernel. Then, as \(n \rightarrow \infty\) ,

$$\displaystyle{ \frac{1} {n^{3/2}}\sup _{0\leq \lambda \leq 1}\left \vert \sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}g(X_{ i},X_{j})\right \vert \rightarrow 0 }$$
(35)

in probability.

The proof of Proposition 1 requires some moment bounds for increments of U-statistics of degenerate kernels, which we will now state as separate lemmas.

Lemma 1.

Let \((X_{n})_{n\geq 1}\) be a 1-approximating functional with constants \((a_{k})_{k\geq 1}\) of an absolutely regular process with mixing coefficients \((\beta (k))_{k\geq 1}\), satisfying

$$\displaystyle{ \sum _{k=1}^{\infty }k(\beta (k) + \sqrt{a_{ k}} +\phi (a_{k})) <\infty. }$$
(36)

Moreover, let g(x,y) be a 1-continuous bounded degenerate kernel. Then, there exists a constant \(C_{1}\) such that

$$\displaystyle{ E\left (\sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}g(X_{ i},X_{j})\right )^{2} \leq C_{ 1}[n\lambda ](n - [n\lambda ]). }$$
(37)

Proof.

We can write

$$\displaystyle\begin{array}{rcl} & & E\left (\sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}g(X_{ i},X_{j})\right )^{2} =\sum _{ i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}E(g(X_{ i},X_{j}))^{2} \\ & & \qquad \qquad \qquad \qquad + 2\sum _{1\leq i_{1}\neq i_{2}\leq [n\lambda ]}\sum _{[n\lambda ]+1\leq j_{1}\neq j_{2}\leq n}E\left (g(X_{i_{1}},X_{j_{1}})g(X_{i_{2}},X_{j_{2}})\right ) {}\end{array}$$
(38)

The terms in the first sum are all bounded, hence

$$\displaystyle{ \sum _{i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}E(g(X_{ i},X_{j}))^{2} \leq C[n\lambda ](n - [n\lambda ]). }$$
(39)

Concerning the second sum, by Lemma 5, we get

$$\displaystyle\begin{array}{rcl} & & \sum _{1\leq i_{1}<i_{2}\leq [n\lambda ]}\sum _{[n\lambda ]+1\leq j_{1}<j_{2}\leq n}E\left (g(X_{i_{1}},X_{j_{1}})g(X_{i_{2}},X_{j_{2}})\right ) \\ & & \quad \qquad \qquad \leq 4\,S\sum _{1\leq i_{1}<i_{2}\leq [n\lambda ]}\sum _{[n\lambda ]+1\leq j_{1}\leq j_{2}\leq n}\phi (a_{[k/3]}) \\ & & \qquad \qquad \qquad \qquad \ + 8\,S^{2}\sum _{ 1\leq i_{1}<i_{2}\leq [n\lambda ]}\sum _{[n\lambda ]+1\leq j_{1}\leq j_{2}\leq n}(\sqrt{a_{[k/3]}} +\beta ([k/3])){}\end{array}$$
(40)

with \(k =\max \{ \vert i_{2} - i_{1}\vert,\vert j_{2} - j_{1}\vert \}\). We first treat the summands with \(k = i_{2} - i_{1}\). Suppose for the moment that k is fixed, and let us bound the number of index tuples that appear in the sum. In this case there are \([n\lambda ]\) ways to choose \(i_{1}\); once \(i_{1}\) is chosen, there is only one way to pick \(i_{2}\), because \(i_{2} = i_{1} + k\). For \(j_{1}\) we have, as before, \(n - [n\lambda ]\) ways to pick this index, and then for each \(j_{1}\), the index \(j_{2}\) must lie in the interval \([j_{1},j_{1} + k]\), which contains at most k admissible integers. Hence

$$\displaystyle\begin{array}{rcl} & & \sum _{1\leq i_{1}<i_{2}\leq [n\lambda ]}\sum _{[n\lambda ]+1\leq j_{1}<j_{2}\leq n}\left (4S\phi (a_{[k/3]}) + 8S^{2}\sqrt{a_{ [k/3]}} + 8S^{2}\beta ([k/3])\right ) \\ & & \leq C[n\lambda ](n - [n\lambda ])\left (\sum _{k=1}^{n}k\phi (a_{ k}) +\sum _{ k=1}^{n}k\sqrt{a_{ k}} +\sum _{ k=1}^{n}k\beta (k)\right ) \leq C[n\lambda ](n - [n\lambda ]){}\end{array}$$
(41)

Analogously, we can bound the terms with \(k = i_{1} - i_{2}\), \(k = j_{2} - j_{1}\) and \(k = j_{1} - j_{2}\), using the summability conditions. □

We now define the process \(G_{n}(\lambda )\), 0 ≤ λ ≤ 1, by

$$\displaystyle{ G_{n}(\lambda ):= n^{-3/2}\sum _{ i=1}^{[n\lambda ]}\sum _{ j=[n\lambda ]+1}^{n}g(X_{ i},X_{j}),\quad 0 \leq \lambda \leq 1. }$$
(42)

Lemma 2.

Under the conditions of Lemma 1 , there exists a constant C such that

$$\displaystyle{ E(\vert G_{n}(\eta ) - G_{n}(\mu )\vert ^{2}) \leq \frac{C} {n} (\eta -\mu ), }$$
(43)

for all 0 ≤μ ≤η ≤ 1.

Proof.

We can write

$$\displaystyle\begin{array}{rcl} & & E(\vert G_{n}(\eta ) - G_{n}(\mu )\vert ^{2}) {}\notag \\ & & \quad \ \leq \frac{2} {n^{3}}E\left (\sum _{i=1}^{[n\mu ]}\sum _{ j=[n\mu ]+1}^{[n\eta ]}g(X_{ i},X_{j})\right )^{2} + \frac{2} {n^{3}}E\left (\sum _{i=[n\mu ]+1}^{[n\eta ]}\sum _{ j=[n\eta ]+1}^{n}g(X_{ i},X_{j})\right )^{2} \\ & & \quad \ = \frac{2} {n^{3}}E\left (\sum _{i=1}^{[n\mu ]}\sum _{ j=[n\mu ]+1}^{[n\eta ]}g(X_{ i},X_{j})\right )^{2} + \frac{2} {n^{3}}E\left (\sum _{i=1}^{[n\eta ]-[n\mu ]}\sum _{ j=[n\eta ]-[n\mu ]+1}^{n-[n\mu ]}g(X_{ i},X_{j})\right )^{2} \\ & & \quad \ \leq C \frac{1} {n^{3}}\left ([n\mu ]([n\eta ] - [n\mu ]) + ([n\eta ] - [n\mu ])(n - [n\eta ])\right ) \leq \frac{C} {n} (\eta -\mu ) \\ \end{array}$$
(44)

using the stationarity of the process \((X_{n})_{n\in \mathbb{N}}\) and Lemma 1.

Proof of Proposition 1.

From Lemma 2 we obtain, using Chebyshev’s inequality,

$$\displaystyle{ P\left (\vert G_{n}(\eta ) - G_{n}(\mu )\vert \geq \epsilon \right ) \leq \frac{1} {\epsilon ^{2}} \frac{C} {n} (\eta -\mu ), }$$
(45)

for all ε > 0. Thus we get for 0 ≤ k ≤ m ≤ n with \(k,m,n \in \mathbb{N}\)

$$\displaystyle\begin{array}{rcl} P\left (\left \vert G_{n}\left (\frac{m} {n} \right ) - G_{n}\left (\frac{k} {n}\right )\right \vert \geq \epsilon \right )& \leq & \frac{1} {\epsilon ^{2}} E\left (G_{n}\left (\frac{m} {n} \right ) - G_{n}\left (\frac{k} {n}\right )\right )^{2} \\ & \leq & \frac{1} {\epsilon ^{2}} \frac{C} {n^{2}}(m - k) \leq \frac{1} {\epsilon ^{2}} \frac{C} {n^{5/3}}(m - k)^{4/3}{}\end{array}$$
(46)

as \(m - k \leq n\). Now consider the variables

$$\displaystyle{ \zeta _{i} = \left \{\begin{array}{cl} G_{n}\left ( \frac{i} {n}\right ) - G_{n}\left (\frac{i-1} {n} \right )&\text{if }i = 1,\ldots,n - 1\\ 0 &\text{else } \end{array} \right. }$$
(47)

and set \(S_{i} =\zeta _{1} +\zeta _{2} +\ldots +\zeta _{i}\) with \(S_{0} = 0\); then \(S_{i} = G_{n}( \frac{i} {n})\). Consequently, the inequality (46) is equivalent to

$$\displaystyle{ P(\vert S_{m} - S_{k}\vert \geq \epsilon ) \leq \frac{1} {\epsilon ^{2}} \left [\frac{C^{3/4}} {n^{5/4}} (m - k)\right ]^{4/3}\quad \text{for}\quad 0 \leq k \leq m \leq n. }$$
(48)

So the assumptions of Theorem 7 are satisfied with the variables (47) in the role of the \(\xi _{i}\), \(\beta = 1/2\), \(\alpha = 2/3\), \(u_{l} = C^{3/4}/n^{5/4}\) and \(u_{0} = 0\), and hence

$$\displaystyle{ P\left (\max _{1\leq i\leq n-1}\vert S_{i}\vert \geq \epsilon \right ) \leq \frac{K} {\epsilon ^{2}} \left [\frac{C^{3/4}} {n^{5/4}} (n - 1)\right ]^{4/3} \leq \frac{KC} {\epsilon ^{2}n^{1/3}} }$$
(49)

where K depends only on α and β. Thus, (35) holds as \(n \rightarrow \infty\). □

7 Proof of Main Results

In this section, we will prove Theorems 1 and 2. Note that Theorem 2 is a direct consequence of Theorem 1, applied to anti-symmetric kernels. We will nevertheless present a direct proof of Theorem 2, since this proof is much simpler than the proof in the general case. Moreover, Theorem 2 covers those cases that are most relevant in applications.

The first part of the proof is identical for both Theorems 1 and 2. Note that, for each λ ∈ [0, 1], the statistic \(T_{n}(\lambda )\) is a two-sample U-statistic. Thus, using the Hoeffding decomposition (11), we can write \(T_{n}(\lambda )\) as

$$\displaystyle\begin{array}{rcl} T_{n}(\lambda )& =& \frac{1} {n^{3/2}}\left (\sum _{i=1}^{[\lambda n]}\sum _{ j=[\lambda n]+1}^{n}(h_{ 1}(X_{i}) + h_{2}(X_{j}) + g(X_{i},X_{j}))\right ) \\ & =& \frac{1} {n^{3/2}}\left ((n - [n\lambda ])\sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i}) + [n\lambda ]\sum _{j=[n\lambda ]+1}^{n}h_{ 2}(X_{j}) +\sum _{ i=1}^{[\lambda n]}\sum _{ j=[\lambda n]+1}^{n}g(X_{ i},X_{j})\right ){}\end{array}$$
(50)

By Proposition 1, we know that

$$\displaystyle{ \frac{1} {n^{3/2}}\sup _{0\leq \lambda \leq 1}\left \vert \sum _{i=1}^{[\lambda n]}\sum _{ j=[\lambda n]+1}^{n}g(X_{ i},X_{j})\right \vert \rightarrow 0}$$

in probability. Thus, by Slutsky’s lemma, it suffices to show that the sum of the first two terms, i.e.

$$\displaystyle{ \left (\frac{n - [n\lambda ]} {n^{3/2}} \sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i}) + \frac{[n\lambda ]} {n^{3/2}}\sum _{j=[n\lambda ]+1}^{n}h_{ 2}(X_{j})\right )_{0\leq \lambda \leq 1} }$$
(51)

converges in distribution to the desired limit process.

Proof of Theorem 2.

It remains to show that (51) converges in distribution to \(\sigma W^{(0)}(\lambda ),0 \leq \lambda \leq 1\), where \((W^{(0)}(\lambda ))_{0\leq \lambda \leq 1}\) is a standard Brownian bridge on [0, 1], and where \(\sigma ^{2}\) is defined in (19). By anti-symmetry of the kernel h(x, y), we obtain that \(h_{2}(x) = -h_{1}(x)\). Hence, in this case, (51) can be rewritten as

$$\displaystyle{\frac{n - [n\lambda ]} {n^{3/2}} \sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i})- \frac{[n\lambda ]} {n^{3/2}}\sum _{i=[n\lambda ]+1}^{n}h_{ 1}(X_{i}) = \frac{1} {n^{1/2}}\sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i})- \frac{[n\lambda ]} {n^{3/2}}\sum _{i=1}^{n}h_{ 1}(X_{i}).}$$

By Proposition 2.11 and Lemma 2.15 of Borovkova, Burton and Dehling [3], the sequence \((h_{1}(X_{i}))_{i\geq 1}\) is a 1-approximating functional with approximating constants \(C\sqrt{a_{k}}\). Since \(h_{1}(X_{i})\) is bounded, the \(L_{2}\)-near epoch dependence in the sense of Wooldridge and White [21] also holds, with the same constants. Moreover, the underlying process \((Z_{n})_{n\in \mathbb{Z}}\) is absolutely regular, and hence also strongly mixing. Thus we may apply the invariance principle in Corollary 3.2 of Wooldridge and White [21], and obtain that the partial sum process

$$\displaystyle{ \left ( \frac{1} {n^{1/2}}\sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i})\right )_{0\leq \lambda \leq 1} }$$
(52)

converges weakly to a Brownian motion \((W(\lambda ))_{0\leq \lambda \leq 1}\) with \(\mathop{\mathrm{Var}}\nolimits (W(1)) =\sigma ^{2}\). The statement of the theorem follows from the continuous mapping theorem applied to the mapping \(x(t)\mapsto x(t) - tx(1),\;0 \leq t \leq 1\). □

The proof of Theorem 1 requires an invariance principle for the partial sum process of \(\mathbb{R}^{2}\)-valued dependent random variables; see Proposition 2 below. For mixing processes, such invariance principles have been established even for partial sums of Hilbert space valued random vectors, e.g. by Dehling [7]. In this paper, we provide an extension of these results to functionals of mixing processes.

Proposition 2.

Let \((X_{n})_{n\in \mathbb{N}}\) be a 1-approximating functional of an absolutely regular process with mixing coefficients (β(k)), and let \(h_{1}(\cdot )\), \(h_{2}(\cdot )\) be bounded, 1-continuous functions with mean zero, such that

$$\displaystyle{ \sum _{k}k^{2}(\beta (k) + a_{ k} +\phi (a_{k})) <\infty. }$$
(53)

Then, as \(n \rightarrow \infty\) ,

$$\displaystyle{ \left ( \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt]}\left (\begin{array}{*{10}c} h_{1}(X_{i}) \\ h_{2}(X_{i}) \end{array} \right )\right )_{0\leq t\leq 1}\longrightarrow \left (\begin{array}{*{10}c} W_{1}(t) \\ W_{2}(t) \end{array} \right )_{0\leq t\leq 1} }$$
(54)

where \((W_{1}(t),W_{2}(t))_{0\leq t\leq 1}\) is a two-dimensional Brownian motion with mean zero and covariance \(E(W_{k}(s)\,W_{l}(t)) =\min (s,t)\sigma _{kl}\), where \(\sigma _{kl}\) is defined in (16).

Proof.

To prove (54), we need to establish finite dimensional convergence and tightness. Concerning finite-dimensional convergence, by the Cramér-Wold device it suffices to show the convergence in distribution of a linear combination of the coordinates of the vector

$$\displaystyle\begin{array}{rcl} & & \left ( \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt_{1}]}h_{ 1}(X_{i}), \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt_{1}]}h_{ 2}(X_{i}),\ldots, \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt_{j}]}h_{ 1}(X_{i}), \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt_{j}]}h_{ 2}(X_{i}),\right. \\ & & \qquad \qquad \qquad \qquad \qquad \ \quad \qquad \left.\ldots, \frac{1} {\sqrt{n}}\sum _{i=1}^{n}h_{ 1}(X_{i}), \frac{1} {\sqrt{n}}\sum _{i=1}^{n}h_{ 2}(X_{i})\right ), {}\end{array}$$
(55)

for \(0 = t_{0} <t_{1} <\ldots <t_{j} <\ldots <t_{k} = 1\). Any such linear combination can be expressed as

$$\displaystyle{ \sum _{j=1}^{k} \frac{1} {\sqrt{n}}\sum _{i=[nt_{j-1}]+1}^{[nt_{j}]}(a_{ j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})), }$$
(56)

for \((a_{j},b_{j})_{j=1}^{k} \in \mathbb{R}^{2\,k}\). By using the Cramér-Wold device again, the weak convergence of this sum is equivalent to the weak convergence of the vector

$$\displaystyle\begin{array}{rcl} & & \left ( \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt_{1}]}(a_{ 1}h_{1}(X_{i}) + b_{1}h_{2}(X_{i})),\ldots, \frac{1} {\sqrt{n}}\sum _{i=[nt_{j-1}]+1}^{[nt_{j}]}(a_{ j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})),\right. \\ & & \qquad \qquad \qquad \qquad \qquad \quad \qquad \left.\ldots, \frac{1} {\sqrt{n}}\sum _{i=[nt_{k-1}]+1}^{n}(a_{ k}h_{1}(X_{i}) + b_{k}h_{2}(X_{i}))\right ) {}\end{array}$$
(57)

to

$$\displaystyle\begin{array}{rcl} & & \big(a_{1}(W_{1}(t_{1}) - W_{1}(t_{0})) + b_{1}(W_{2}(t_{1}) - W_{2}(t_{0})),\ldots, \\ & & \qquad \qquad \qquad \qquad \qquad a_{k}(W_{1}(t_{k}) - W_{1}(t_{k-1})) + b_{k}(W_{2}(t_{k}) - W_{2}(t_{k-1}))\big).{}\end{array}$$
(58)

Since \((X_{n})_{n\geq 1}\) is a 1-approximating functional, it can be coupled with a process consisting of independent blocks. Given integers \(L:= L_{n} = [n^{3/4}]\) and \(l_{n} = [n^{1/2}]\), we introduce the (l,L)-blocking \((B_{m})_{m\geq 1}\) of the variables \(a_{j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})\) with \(i = [nt_{j-1}] + 1,\ldots,[nt_{j}]\), \(j = 1,\ldots,k\), and

$$\displaystyle{ B_{m}:=\sum _{ i=(m-1)(L_{n}+l_{n})+1}^{mL_{n}+(m-1)l_{n}}(a_{ j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})) }$$
(59)

and separating blocks

$$\displaystyle{ \tilde{B}_{m}:=\sum _{ i=mL_{n}+(m-1)l_{n}+1}^{m(L_{n}+l_{n})}(a_{ j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})). }$$
(60)

By Theorem 5 there exists a sequence of independent blocks \((B_{m}^{{\prime}})\) with the same blockwise marginal distribution as \((B_{m})\) and such that

$$\displaystyle{P\left (\vert B_{m} - B_{m}^{{\prime}}\vert \leq 2\alpha _{ l}\right ) \geq 1 -\beta (l) - 2\alpha _{l},}$$

where \(\alpha _{l}:= (2\sum _{k=[l_{n}/3]}^{\infty }a_{k})^{1/2}\). We can express the components of our vector (57) as a sum of blocks

$$\displaystyle\begin{array}{rcl} & & \sum _{i=[nt_{j}]+1}^{[nt_{j+1}]}(a_{ j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})) \\ & & \qquad \qquad =\sum _{ m=\left [ \frac{nt_{j}} {L+l}\right ]+1}^{\left [\frac{nt_{j+1}} {L+l} \right ]}B_{m} +\sum _{ m=\left [ \frac{nt_{j}} {L+l}\right ]+1}^{\left [\frac{nt_{j+1}} {L+l} \right ]}\tilde{B}_{m} +\sum _{R_{ j}}(a_{j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i})),{}\end{array}$$
(61)

where \(R_{j}\) denotes the set of indices not contained in the blocks. Observe that, by Lemma 3, for any set \(A \subset \{ 1,\ldots,n\}\)

$$\displaystyle{ E\left (\sum _{i\in A}(a_{j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i}))\right )^{2} \leq C\#A }$$
(62)

and hence

$$\displaystyle{ E\left (\sum _{m=\left [ \frac{nt_{j}} {L+l}\right ]+1}^{\left [\frac{nt_{j+1}} {L+l} \right ]}\tilde{B}_{m}\right )^{2} \leq C \frac{n} {L_{n} + l_{n}}l_{n} \leq Cn^{3/4}, }$$
(63)

so it follows from Chebyshev’s inequality that this term is negligible. For the last summand, we have that

$$\displaystyle{ E\left (\sum _{R_{j}}(a_{j}h_{1}(X_{i}) + b_{j}h_{2}(X_{i}))\right )^{2} \leq C2(L_{ n} + l_{n}) \leq Cn^{3/4}. }$$
(64)

Furthermore, we need to show that we can replace the blocks \(B_{m}\) by the independent coupled blocks \(B_{m}^{{\prime}}\):

$$\displaystyle\begin{array}{rcl} P\left (\left \vert \frac{1} {\sqrt{n}}\sum _{m=\left [ \frac{nt_{j}} {L+l}\right ]+1}^{\left [\frac{nt_{j+1}} {L+l} \right ]}(B_{m} - B_{m}^{{\prime}})\right \vert>\epsilon \right )& \leq & \sum _{ m=\left [ \frac{nt_{j}} {L+l}\right ]+1}^{\left [\frac{nt_{j+1}} {L+l} \right ]}P\left (\vert B_{m} - B_{m}^{{\prime}}\vert> \frac{\epsilon \sqrt{n}} {n^{1/4}}\right ) {}\\ & \leq & n^{\frac{1} {4} }\left (\beta ([\frac{l_{n}} {3} ]) +\alpha _{[\frac{l_{n}} {3} ]}\right ) \rightarrow 0 {}\\ \end{array}$$

as \(n \rightarrow \infty\) by our conditions on the mixing coefficients and approximation constants. Here we used the fact that \(\alpha _{n} \rightarrow 0\) and thus, for almost all \(n \in \mathbb{N}\),

$$\displaystyle{ P\left (\vert B_{m} - B_{m}^{{\prime}}\vert>\epsilon n^{1/4}\right ) \leq P\left (\vert B_{ m} - B_{m}^{{\prime}}\vert> 2\alpha _{ l_{n}}\right ). }$$
(65)

With the above arguments the result holds if we show the convergence of

$$\displaystyle{ \frac{1} {\sqrt{n}}\left (\sum _{m=\left [ \frac{nt_{0}} {L+l}\right ]+1}^{\left [ \frac{nt_{1}} {L+l}\right ]}B_{m}^{{\prime}},\ldots,\sum _{ m=\left [\frac{nt_{k-1}} {L+l} \right ]+1}^{\left [ \frac{nt_{k}} {L+l}\right ]}B_{m}^{{\prime}}\right ). }$$
(66)

Since this vector has independent components, we only need to show the one-dimensional convergence, which is a consequence of Theorem 4, using the summability condition (53).

We now turn to the question of tightness and show that, for each ε and η, there exist a δ, 0 < δ < 1, and an integer \(n_{0}\) such that, for 0 ≤ t ≤ 1,

$$\displaystyle{ \frac{1} {\delta } P\left (\sup _{t\leq s\leq t+\delta }\vert Y _{n}(s) - Y _{n}(t)\vert \geq \epsilon \right ) \leq \eta,\quad n \geq n_{0} }$$
(67)

with

$$\displaystyle{ Y _{n}(t) = \frac{1} {\sigma \sqrt{n}}\sum _{i=1}^{[nt]}h_{ 1}(X_{i}) + (nt - [nt]) \frac{1} {\sigma \sqrt{n}}h_{1}(X_{[nt]+1}) }$$
(68)

(\(h_{2}\) can be treated in the same way), and by Theorem 8 this condition reduces to the following: for each positive ε there exist a λ > 1 and an integer \(n_{0}\) such that

$$\displaystyle{ P\left (\max _{i\leq n}\left \vert \sum _{j=1}^{i}h_{ 1}(X_{j})\right \vert \geq \lambda \sqrt{n}\right ) \leq \frac{\epsilon } {\lambda ^{2}},\quad n \geq n_{0}. }$$
(69)

Let t ≥ s, s, t ∈ [0, 1]. By Lemma 4 we get

$$\displaystyle\begin{array}{rcl} E\left (\left \vert \frac{1} {\sqrt{n}}\sum _{i=1}^{[nt]}h_{ 1}(X_{i}) - \frac{1} {\sqrt{n}}\sum _{i=1}^{[ns]}h_{ 1}(X_{i})\right \vert ^{4}\right )& =& \frac{1} {n^{2}}E\left (\sum _{i=[ns]+1}^{[nt]}h_{ 1}(X_{i})\right )^{4} \\ & \leq & \frac{1} {n^{2}}(([nt] - [ns])^{2}C) {}\end{array}$$
(70)

and this implies

$$\displaystyle{ P\left (\left \vert \frac{1} {\sqrt{n}}\sum _{i=1}^{m}h_{ 1}(X_{i}) - \frac{1} {\sqrt{n}}\sum _{i=1}^{k}h_{ 1}(X_{i})\right \vert \geq \epsilon \right ) \leq \frac{1} {\epsilon ^{4}} \left (\frac{C^{1/2}} {n} (m - k)\right )^{2}. }$$
(71)

By Theorem 7

$$\displaystyle{ P\left (\max _{i\leq n}\left \vert \sum _{j=1}^{i}h_{ 1}(X_{j})\right \vert \geq \epsilon \sqrt{n}\right ) \leq \frac{K} {\epsilon ^{4}} \left (\frac{C^{1/2}} {n} (n - 1)\right )^{2} }$$
(72)

and we get the assertion. Thus we have established tightness of each of the two coordinates of the partial sum process, which implies tightness of the vector-valued process.

Proof of Theorem 1.

From Proposition 2 we obtain that

$$\displaystyle{ \left ( \frac{1} {\sqrt{n}}\sum _{i=1}^{[n\lambda ]}\left (\begin{array}{*{10}c} h_{1}(X_{i}) \\ h_{2}(X_{i}) \end{array} \right )\right )_{0\leq \lambda \leq 1}\longrightarrow \left (\begin{array}{*{10}c} W_{1}(\lambda ) \\ W_{2}(\lambda ) \end{array} \right )_{0\leq \lambda \leq 1}, }$$
(73)

in distribution on the space \((D[0,1])^{2}\). We consider the functional given by

$$\displaystyle{ \left (\begin{array}{*{10}c} x_{1}(t) \\ x_{2}(t) \end{array} \right )\mapsto (1-t)x_{1}(t)+t(x_{2}(1)-x_{2}(t)),\quad 0 \leq t \leq 1. }$$
(74)

This is a continuous mapping from \((D[0,1])^{2}\) to D[0, 1], so we may apply the continuous mapping theorem to (73), and obtain

$$\displaystyle\begin{array}{rcl} & & \left (\frac{n - [n\lambda ]} {n^{3/2}} \sum _{i=1}^{[n\lambda ]}h_{ 1}(X_{i}) + \frac{[n\lambda ]} {n^{3/2}}\sum _{j=[n\lambda ]+1}^{n}h_{ 2}(X_{j})\right )_{0\leq \lambda \leq 1} {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad \ \ \longrightarrow \left ((1-\lambda )W_{1}(\lambda ) +\lambda (W_{2}(1) - W_{2}(\lambda ))\right )_{0\leq \lambda \leq 1}. {}\\ \end{array}$$

Together with the remarks at the beginning of this section, this proves Theorem 1.