1 Introduction

We are interested in detecting possible differences between the distributions of real-valued random variables \(X_1, X_2, \ldots ,X_n\). In practice, this issue can be of primary importance for data from industrial quality control, financial markets, medical diagnostics, hydrology, climatology, etc.

This statistical matter is known as the change-point problem, whose theory, well developed for independent data, has been studied extensively in the literature from both parametric and non-parametric points of view. Since Page (1954), many tests have been proposed for testing changes in the distribution of iid data. Among others, Chernoff and Zacks (1964) propose test statistics for detecting shifts in the mean of a normal distribution function. Their results are generalized to the exponential family by Kander and Zacks (1966), Gardner (1969) and MacNeill (1974). Matthews et al. (1985) study maximal score statistics for testing a constant hazard against a change-point alternative. Haccou et al. (1988) propose a likelihood ratio test for a change-point in a sequence of independent exponentially distributed random variables, and prove that their test is optimal in the sense of Bahadur. Yao and Davis (1986) consider the asymptotic behavior of the likelihood ratio statistic for testing a shift in mean in a sequence of independent Gaussian random variables. Csörgő and Horváth (1987) propose statistics based on processes of linear rank statistics with quantile scores. A review of non-parametric procedures is given by Wolfe and Schechtman (1984), who summarize, among others, pioneering works on change-points such as Page (1954, 1955), Bhattacharya and Zhou (2017), Sen and Srivastava (1975) and Pettitt (1979).

In recent years there has been a growing interest in the study of change-points in time series data. Most of the techniques and approaches used are based either on testing for the existence of changes or for their locations, or on estimating the locations. Some relevant references on change-point estimation are Härdle and Tsybakov (1997), Härdle et al. (1998), Bardet and Wintenberger (2009), Döring (2010, 2011), Ciuperca (2011), Bardet and Kengne (2014), Amano (2012), Horváth and Hušková (2005), Yang and Song (2014), Mohr and Selk (2020) and Yang et al. (2020). A non-exhaustive list of references on testing approaches includes, among others, Kengne (2012), Chen et al. (2011), Dehling et al. (2013), Dehling et al. (2015), Wang and Phillips (2012), Francq and Zakoïan (2012), Bardet et al. (2012), Zhou (2014), Fotopoulos et al. (2009), Huh (2010), Enikeeva et al. (2018), Gombay (2008), Gombay and Serban (2009), Hlávka et al. (2020) and Ma et al. (2020), the last of which uses both estimation and testing. Meintanis (2016) gives an interesting survey of testing procedures based on the empirical characteristic function. We would also like to mention the interesting and related work of Rackauskas and Wendler (2020), who deal with a robust test based on the Wilcoxon statistic for detecting epidemic changes. While the asymptotic behavior of their test statistic is studied under the null hypothesis, with techniques close to ours, its consistency is only discussed.

With the possible exceptions of Fotopoulos et al. (2009), Khakhubia (1987), Bhattacharyya and Johnson (1968), Dehling et al. (2013), Dehling et al. (2017a) and Dehling et al. (2017b), the local power is rarely studied in the existing testing papers. This issue is considered in the present paper, where the tests studied are derived from basic processes of the general form

$$\begin{aligned} Z_n^*(\lambda )= n^{-3/2}\sum _{i=1}^{[n\lambda ]}\sum _{j=[n\lambda ]+1}^{n}h(X_{i},X_{j}), \ 0\le \lambda \le 1, \end{aligned}$$

with \(h:{\mathbb {R}}^{2}\rightarrow {\mathbb {R}}\) a kernel function.

The asymptotic distribution of a related process has been studied in a Hölder space by Rackauskas and Wendler (2020) for stationary mixing data and antisymmetric h, while this process has been studied in a Skorohod space, for instance, by Csörgő and Horváth (1988) for iid data, and by Dehling et al. (2015) for data assumed to be functions of mixing random variables satisfying some further conditions such as 1-approximating functionals, 1-sided functionals and bounded density functions, with kernels h being 1-continuous. These conditions are avoided here by our mixing assumption on the data, and our study is done in a Skorohod space. Furthermore, besides the Kolmogorov-Smirnov (KS) type test usually studied in the literature, we study a Cramér-von Mises (CM) version, which has the advantage that its theoretical limiting distribution under the null hypothesis can be approximated for any kernel h. We restrict our study to the classical case of one change-point detection, but our results can be generalized to multiple change-point detection, which we postpone to a future paper.

The paper is organized as follows. In Sect. 2, we define useful quantities such as the test statistics, and we list some assumptions. In Sect. 3 we study the asymptotic properties of our test statistics under the null hypothesis, under a sequence of local alternatives and under fixed alternatives. Practical considerations are presented and discussed in Sect. 4, while the last section contains the proofs of the main results.

2 General definitions and assumptions

For cumulative distribution functions Q and R, denote by \(\theta (Q,R)\) the following real number

$$\begin{aligned} \theta (Q,R)=\int \int h(x,y)dQ(x)dR(y). \end{aligned}$$

For any \(i=1,2, \ldots ,n\), let \(F_i\) be the cumulative distribution function of \(X_i\). We aim to check possible differences between the \(F_i\)’s. We restrict ourselves to checking if there exists only one index \(i_0\) for which \(F_{i_0}\) and \(F_{i_0+1}\) are different. We study this problem by testing the hypothesis \({\mathcal {H}}_0\) against the alternative \({\mathcal {H}}_1\), defined respectively by

$$\begin{aligned}&{\mathcal {H}}_0: F_1(x)=F_2(x)= \ldots =F_n(x), \ x\in {\mathbb {R}}\\&{\mathcal {H}}_1: \exists \ \lambda _0\in (0,1) \text{ such } \text{ that } F_1(x)=F_2(x)= \ldots =F_{[n\lambda _0]}(x)=F(x), x\in {\mathbb {R}}\ \text {and}\\&\quad F_{[n\lambda _0]+1}(x)= \ldots =F_n(x)=G(x), x\in {\mathbb {R}},\ \text {and}\ \theta (F,F)\ne \theta (F,G). \end{aligned}$$

Figure 1 exhibits the chronograms of some time series, each of size 200 and each with a change-point at \(t=100\). The first graphic in the first row shows a change in the mean of a shifted Gaussian white noise, while the one to its right shows a change in its variance. The first graphic in the second row shows a change in both the mean and the variance of a shifted Gaussian white noise, and the second shows a change in the autocorrelation of a Gaussian AR(1) model.

In order to evaluate the capacity of the tests to detect weak changes, we also consider the local alternatives \({\mathcal {H}}_{1,n}\) of the form

$$\begin{aligned}&{\mathcal {H}}_{1,n}: \exists \ \lambda _0\in (0,1) \text{ such } \text{ that } F_1(x)=F_2(x)= \ldots =F_{[n\lambda _0]}(x)=F(x),\ \text {and}\\&\quad F_{[n\lambda _0]+1}(x)= \ldots =F_n(x)=G^{(n)}(x), x\in {\mathbb {R}},\ \text {and}\\&\quad \theta (F,G^{(n)})= \theta (F,F) +n^{-1/2}[A+o(1)],\ \text {for some}\ A\in {\mathbb {R}}^*. \end{aligned}$$

Remark 1

Particular examples of local alternatives \({\mathcal {H}}_{1,n}\) are those for which there exists a constant \(\gamma \) such that \(G^{(n)}(x)=F(x+n^{-1/2}\gamma )\) and the kernel function h is twice differentiable with finite integral \(\int \int (\partial h(x,y)/\partial y)dF(x)dF(y)\) and bounded second-order derivative \(\partial ^2h(x,y)/\partial y^2\). This can be checked easily by a suitable application of the Taylor-Young formula. With this example, one finds

$$\begin{aligned} A=-\gamma \int \int {\partial h \over \partial u}(x,u)dF(x)dF(u). \end{aligned}$$
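
For the reader's convenience, here is a sketch of the Taylor-Young step leading to this value of A. If \(Y\sim G^{(n)}\), one can write \(Y=U-n^{-1/2}\gamma \) with \(U\sim F\), so that

$$\begin{aligned} \theta (F,G^{(n)})=\int \int h(x,u-n^{-1/2}\gamma )dF(x)dF(u) =\theta (F,F)-n^{-1/2}\gamma \int \int {\partial h \over \partial u}(x,u)dF(x)dF(u)+{\mathcal {O}}(n^{-1}), \end{aligned}$$

where the remainder is uniform because the second-order derivative is bounded; this is exactly \(\theta (F,F)+n^{-1/2}[A+o(1)]\) with the above A.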

For the purpose of solving our testing problem, the tests we use are based on the following KS and CM statistics

$$\begin{aligned} T_{1,n}&= \max _{1 \le k \le n-1}\left| n^{-3/2}\sum _{i=1}^k \sum _{j=k+1}^{n}\big \{h(X_{i},X_{j})-\theta ({{\widehat{F}}}, {{\widehat{F}}})\big \} \right| \end{aligned}$$
(1)
$$\begin{aligned} T_{2,n}&= {1 \over n}\sum _{1 \le k \le n-1}\left\{ n^{-3/2}\sum _{i=1}^k \sum _{j=k+1}^{n}\left[ h(X_{i},X_{j})-\theta ({{\widehat{F}}}, {{\widehat{F}}})\right] \right\} ^2, \end{aligned}$$
(2)

where \({{\widehat{F}}}\) stands for any consistent estimator of F, a simple example being the empirical cumulative distribution function.
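
For concreteness, the following Python sketch shows one possible way of computing the statistics (1) and (2) for a generic kernel h; the function names and the illustrative indicator kernel below are ours and are not part of the original text.

```python
import numpy as np

def theta_hat(x, kernel):
    """Plug-in estimate of theta(F, F) based on the empirical distribution:
    the average of kernel(x_i, x_j) over all ordered pairs (i, j)."""
    k = kernel(x[:, None], x[None, :])        # n x n matrix of kernel values
    return k.mean()

def ks_cm_statistics(x, kernel):
    """Compute the KS type statistic T_{1,n} (eq. 1) and the CM type
    statistic T_{2,n} (eq. 2) for a sample x and a kernel h."""
    n = len(x)
    k = kernel(x[:, None], x[None, :]) - theta_hat(x, kernel)  # centred kernel values
    col_cumsum = np.cumsum(k, axis=0)          # partial sums over i, for each j
    # s[c-1] = sum_{i<=c} sum_{j>c} of the centred kernel values, c = 1, ..., n-1
    s = np.array([col_cumsum[c - 1, c:].sum() for c in range(1, n)])
    z = n ** (-1.5) * s                        # values of the process Z_n on the grid k/n
    t1 = np.max(np.abs(z))                     # T_{1,n}
    t2 = (z ** 2).sum() / n                    # T_{2,n}
    return t1, t2

# Example with the "indicator" kernel h(x, y) = 1{x < y}
indicator = lambda x, y: (x < y).astype(float)
rng = np.random.default_rng(0)
print(ks_cm_statistics(rng.standard_normal(200), indicator))
```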

Denote by [x] the integer part of any real number x. Noting that for any \(k \in \{1, \ldots , n-1 \}\), there exists \(\lambda _* \in [0,1]\) such that \(k=[\lambda _*n]\), one can write, asymptotically,

$$\begin{aligned} T_{1,n}&= \sup _{\lambda \in [0,1]}\left| Z_n(\lambda )\right| \\ T_{2,n}&= \int _0^1 Z_n^2(\lambda ) d \lambda , \end{aligned}$$

where \(Z_n\) stands for the following stochastic process

$$\begin{aligned} Z_n(\lambda )= n^{-3/2}\sum _{i=1}^{[n\lambda ]}\sum _{j=[n\lambda ]+1}^{n}\left[ h(X_{i},X_{j})-\theta (F,F)\right] , \ 0\le \lambda \le 1. \end{aligned}$$
(3)

Define the following U-statistic \(U_n\) with kernel h, and the following functions

$$\begin{aligned}&U_n =\frac{2}{n(n - 1)}\sum _{1\le i< j\le n}h(X_i,X_j) \\&h_{F,1}(x)=\int h(x,y)dF(y)-\theta (F,F)\\&h_{F,2}(y)=\int h(x,y)dF(x)-\theta (F,F)\\&h_{G,1}(x)=\int h(x,y)dG(y)- \theta (F,G)\\&h_{G,2}(y)=\int h(x,y)dF(x)- \theta (F,G)\\&g_F(x,y)=h(x,y)-h_{F,1}(x)-h_{F,2}(y)-\theta (F,F)\\&g_G(x,y)=h(x,y)- h_{G,1}(x)- h_{G,2}(y)- \theta (F,G). \end{aligned}$$

Consider the Hoeffding decomposition of \(U_n\) under \({\mathcal {H}}_0\)

$$\begin{aligned} U_n = \theta (F,F) + U^{(1)}_{n,1} + U^{(1)}_{n,2} + U^{(2)}_n, \end{aligned}$$
(4)

where

$$\begin{aligned}&U^{(1)}_{n,1} = n^{-1}\sum ^n_{i=1}h_{F,1}(X_i)\\&U^{(1)}_{n,2} = n^{-1}\sum ^n_{i=1}h_{F,2}(X_i)\\&U^{(2)}_n = \frac{2}{n(n - 1)}\sum _{1\le i< j\le n}\left[ h(X_i,X_j) - h_{F,1}(X_i) - h_{F,2}(X_j)\right] - \theta (F,F). \end{aligned}$$

Also, define the following numbers

$$\begin{aligned} \sigma _{kl} = {\mathbb {E}}\left[ h_{F,k}(X_1)h_{F,l}(X_1)\right] + 2\sum _{j=1}^{\infty }{\mathbb {C}}\text{ ov } \left( h_{F,k}(X_1), h_{F,l}(X_{1+j})\right) , \ k, l = 1, 2. \end{aligned}$$

We make the following assumptions:

  1. (A1)

    The sequence \(\{{X}_i\}_{i\in {\mathbb {N}}}\) is absolutely regular with the rate

    $$\begin{aligned} \beta (k)={\mathcal {O}}(\tau ^k), \ \ 0<\tau <1. \end{aligned}$$
    (5)
  2. (A2)

    \(\{{X}_i\}_{i\in {\mathbb {N}}}\) is stationary.

  3. (A3)

    We consider \((Y_{i})_{1\le i\le n}\) a sequence of stationary and absolutely regular random variables with rate (5). We assume the cumulative distribution function of the \(Y_i\)’s is G.

  4. (A4)

    We consider \((Y_{ni})_{1\le i\le n, n\ge 1}\) a sequence of stationary and absolutely regular random variables with cumulative distribution function \(G^{(n)}(x)=F(x+\eta n^{-{1\over 2}})\). We assume the cumulative distribution functions \(G_{ij}^{(n)}\) and \(G_{ij}^{*(n)}\) of the \((Y_{ni}, Y_{nj})\)’s and \((X_i, Y_{nj})\)’s respectively satisfy

    $$\begin{aligned} \lim _{n \rightarrow \infty }G_{ij}^{(n)}(x, y)=F_{ij}(x, y) \text{ and } \lim _{n \rightarrow \infty }G_{ij}^{*(n)}(x, y)= F_{ij}(x, y), \ 1\le i<j \le n, \end{aligned}$$

    where \(F_{ij}\) is the cumulative distribution function of \((X_i,X_j)\).

    • We recall from Harel and Puri (1994) that a not necessarily stationary triangular sequence \(\{{\mathcal {V}}_{ni}, 1\le i \le n,n \ge 1\}\) is absolutely regular if \(\beta (k) \longrightarrow 0\), as \(k \rightarrow \infty \), where

      $$\begin{aligned} \beta (k)=\sup _{n\in {\mathbb {N}}}\sup _{k \le n}\max _{1\le j\le n-k} {\mathbb {E}}\left( \sup _{A\in {\mathcal {A}}_{n,j+k}^\infty }\left| {\mathbb {P}}(A\mid {\mathcal {A}}_{n,0}^j)-{\mathbb {P}}(A)\right| \right) , \end{aligned}$$

      with \({\mathcal {A}}_{n,i}^j\) standing for the \(\sigma \)-algebra generated by \({\mathcal {V}}_{ni},\ldots ,{\mathcal {V}}_{nj}, \ i, j \in {\mathbb {N}}\cup \{\infty \}\). It will be said to be strong mixing or \(\alpha \)-mixing if \(\alpha (k) \longrightarrow 0\) as \(k \rightarrow \infty \), where

      $$\begin{aligned} \alpha (k)=\sup _{n\in {\mathbb {N}}}\max _{1\le j\le n-k} \sup _{A\in {\mathcal {A}}_{n,j+k}^\infty , B \in {\mathcal {A}}_{n,0}^j}\left| {\mathbb {P}}(A \cap B)-{\mathbb {P}}(A) {\mathbb {P}}(B)\right| . \end{aligned}$$
    • Note that a \(\beta \)-mixing sequence of random variables is also \(\alpha \)-mixing.

Fig. 1

First row: change in the mean and change in the variance of a shifted white noise. Second row: change in both the mean and the variance of a shifted white noise, and change in the autocorrelation of an AR(1) model. The red lines represent the means. The green parts of the chronograms correspond to the observations before the change occurred, and the blue parts to those after the change occurred

Remark 2

We assume a geometric mixing rate for convenience. We believe that our results can also be established for arithmetic mixing rates, which remain to be determined.

3 Asymptotics

In this section, we state all our theoretical results. Most of the proofs are postponed to the last section.

Theorem 1

Assume that assumptions (A1)–(A3) hold. Under \({\mathcal {H}}_0\), if \(\max \{\sup _{i,j} {\mathbb {E}}\left[ |h(X_i, X_j)|^{2+\delta } \right] , \int \int _{{\mathbb {R}}^2}|h(x, y)|^{2+\delta }dF(x)dF(y) \} < \infty \) for some \(\delta > 0\) and the absolute regularity condition (5) is satisfied, then \(\sigma _{kl} < \infty \) for any \(k,l=1,2\).

If in addition \(\sigma _{kl} >0\), \(1\le k,l \le 2\), then the sequence of processes \(\{Z_{n}(\lambda ); 0\le \lambda \le 1\}_{n\in {\mathbb {N}}}\) converges in distribution towards a zero-mean Gaussian process with representation

$$\begin{aligned} Z(\lambda ) = (1 - \lambda )W_1(\lambda ) + \lambda (W_2(1) - W_2(\lambda )), \ 0 \le \lambda \le 1, \end{aligned}$$

where \(\{W_1(\lambda ), W_2(\lambda )\}_{0\le \lambda \le 1}\) is a two-dimensional zero-mean Brownian motion with covariance kernel matrix with entries \({\mathbb {C}}{ov}(W_k(s), W_l(t)) =\min (s, t)\sigma _{kl}, \ k,l=1,2.\)

Proof

See Appendix.

Remark 3

The covariance kernel of the Gaussian process Z defined in Theorem 1 is given for all \(s, t \in [0,1]\) by

$$\begin{aligned} \varDelta (s,t)&= {\mathbb {C}}\text{ ov }(Z(s), Z(t)) \nonumber \\&= \sigma _{11}(1-s)(1-t) \min (s, t)+\sigma _{22}st[1-s-t+\min (s, t)]\nonumber \\&\quad +\sigma _{12}\{t(1-s)[s-\min (s, t)]+s(1-t)[t-\min (s, t)]\}. \end{aligned}$$
(6)
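
As an illustration of how this covariance kernel can be used, here is a possible Monte-Carlo sketch (ours, not taken from the paper) that simulates the limit process Z on a grid through a Cholesky factorisation of (6), and thereby approximates the distributions of \(\sup _{\lambda }|Z(\lambda )|\) and \(\int _0^1 Z^2(\lambda )d\lambda \), which appear below as the limits of \(T_{1,n}\) and \(T_{2,n}\) under \({\mathcal {H}}_0\); the argument sigma stands for an estimate of the matrix \((\sigma _{kl})\).

```python
import numpy as np

def simulate_limit_statistics(sigma, n_grid=200, n_rep=5000, rng=None):
    """Simulate the limiting Gaussian process Z of Theorem 1 on a grid of (0, 1)
    using the covariance kernel (6), and return Monte-Carlo draws of
    sup |Z| and of the integral of Z^2 (approximated by a Riemann sum)."""
    rng = np.random.default_rng(0) if rng is None else rng
    lam = np.arange(1, n_grid) / n_grid                  # interior grid points
    s, t = np.meshgrid(lam, lam, indexing="ij")
    m = np.minimum(s, t)
    cov = (sigma[0, 0] * (1 - s) * (1 - t) * m
           + sigma[1, 1] * s * t * (1 - s - t + m)
           + sigma[0, 1] * (t * (1 - s) * (s - m) + s * (1 - t) * (t - m)))
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(lam)))   # small jitter for stability
    z = rng.standard_normal((n_rep, len(lam))) @ chol.T          # draws of Z on the grid
    sup_abs = np.abs(z).max(axis=1)                              # approximates sup |Z|
    int_sq = (z ** 2).mean(axis=1)                               # approximates int_0^1 Z^2
    return sup_abs, int_sq
```

Empirical quantiles of these draws then give approximate critical values for \(T_{1,n}\) and \(T_{2,n}\).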

Theorem 2

Assume (A1), (A2) and (A4) hold, h is twice differentiable with bounded second-order derivatives \(\partial ^{2}h(x,y)/\partial x\partial y\), and the integral \(\int \int (\partial h(x,y)/\partial y)dF(x)dF(y)\) is finite. Then, under \({\mathcal {H}}_{1,n}\), if

$$\begin{aligned}&\sup _{1\le i,j \le n}{\mathbb {E}}\left[ |h(X_i,X_j)|^{2+\delta } \right] , \ \ \sup _{n \ge 1}\sup _{i,j}{\mathbb {E}}\left[ |h(Y_{ni},Y_{nj})|^{2+\delta } \right] ,\\&\sup _{n \ge 1} \sup _{1\le i,j \le n}{\mathbb {E}}\left[ |h(X_i,Y_{nj})|^{2+\delta }\right] , \ \ \int \int _{{\mathbb {R}}^2}|h(x,y)|^{2+\delta }dF(x)dF(y),\\&\sup _{n \ge 1} \int \int _{{\mathbb {R}}^2}|h(x,y)|^{2+\delta }dG^{(n)}(x)dG^{(n)}(y), \ \ \sup _{n \ge 1}\int \int _{{\mathbb {R}}^2}|h(x,y)|^{2+\delta } dF(x)dG^{(n)}(y) \end{aligned}$$

are finite for some \(\delta > 0\), if for any \(k,l=1,2\), \(\sigma _{kl} >0\), then the sequence of processes \(\{Z_{n}(\lambda ); 0\le \lambda \le 1\}_{n\in {\mathbb {N}}}\) converges in distribution towards a Gaussian process \({{\widetilde{Z}}}\) with mean \((1-\lambda )\lambda A\) and representation

$$\begin{aligned} {{\widetilde{Z}}}(\lambda )=(1- \lambda )\lambda A+ Z(\lambda ), \ \ 0\le \lambda \le 1, \end{aligned}$$

where \(\{Z(\lambda )\}_{0\le \lambda \le 1}\) is the zero-mean Gaussian process defined in Theorem 1.

Proof

See Appendix. \(\square \)

Theorem 3

Assume (A1)–(A3) hold and that under \({\mathcal {H}}_1\), the integrability conditions in Theorem 2 hold. Then,

$$\begin{aligned} n^{-1/2}Z_n^*(t){\mathop {\longrightarrow }\limits ^{ a.s. }}_{n\rightarrow \infty } \left\{ \begin{array}{ll} \theta (F,F) t(\lambda _0-t)+\theta (F,G) t(1-\lambda _0), &{} 0\le t\le \lambda _0\\ \theta (G,G)(t-\lambda _0)(1-t)+\theta (F,G)\lambda _0(1-t), &{} \lambda _0\le t<1. \end{array} \right. \end{aligned}$$
(7)

Proof

See Appendix.

Theorem 4

Assume that the assumptions of Theorem 2 hold. Let \((Z(\lambda ) : 0 \le \lambda \le 1)\) be the limiting process defined in Theorems 1 and 2, and \(\varDelta \) its covariance kernel. Then

  1. (i)

    Under \({\mathcal {H}}_0\), as n tends to infinity, one has the following convergence in distribution,

    $$\begin{aligned}&T_{1,n} \longrightarrow \sup _{\lambda \in [0,1]}\left| Z(\lambda ) \right| \\&T_{2,n} \longrightarrow \sum _{j \ge 1} \zeta _j \chi _j^2, \end{aligned}$$

    where the \( \chi _j^{2}\)’s are iid chi-square random variables with one degree of freedom and the \(\zeta _j\)’s stand for the eigenvalues of the linear integral operator \(\nabla \) defined for any square integrable function \(\tau \) on [0, 1] by

    $$\begin{aligned} \nabla [\tau (\cdot )]= \int _0^1 \varDelta (\cdot ,s) \tau (s)ds. \end{aligned}$$
    (8)
  2. (ii)

    Under \({\mathcal {H}}_{1,n}\), as n tends to infinity, one has the following convergence in distribution,

    $$\begin{aligned}&T_{1,n} \longrightarrow \sup _{\lambda \in [0,1]}\left| (1-\lambda ) \lambda A+ Z(\lambda ) \right| \\&T_{2,n} \longrightarrow \sum _{j \ge 1} \zeta _j \chi _j^{*2}, \end{aligned}$$

    where the \( \chi _j^{*2}\)’s are iid non-central chi-square random variables with one degree of freedom and non-centrality parameters \(\rho _j^2\zeta _j^{-1}\), with the \(e_j\)’s standing for the eigenfunctions of the integral operator \(\nabla \) associated with the eigenvalue \(\zeta _j\), and

    $$\begin{aligned} \rho _j=A \int _0^1 \lambda (1-\lambda ) e_j(\lambda ) d \lambda . \end{aligned}$$
  3. (iii)

    Under \({\mathcal {H}}_1\), as n tends to infinity, one has the following convergence in probability,

    $$\begin{aligned} T_{1,n} \longrightarrow \infty , \ \ \ T_{2,n} \longrightarrow \infty . \end{aligned}$$

Proof

(i):

From Theorem 1 and the continuous mapping theorem, \(T_{1,n}\) and \(T_{2,n}\) converge in distribution respectively to \(\sup _{\lambda \in [0,1]} |Z(\lambda )|\) and \(\int _0^1 Z^2(\lambda )d\lambda \).

Now, we show that \(\int _0^1 Z^2(\lambda )d\lambda \) has the same distribution as a weighted sum of iid chi-square random variables with one degree of freedom. Noting that \(\varDelta \) is a Mercer kernel (this is easy to prove), it follows from Riesz and Nagy (1972) that the integral operator defined by (8) admits eigenvalues \(\zeta _1\ge \zeta _2\ge \ldots \ge 0\) with associated eigenfunctions \(e_1,e_2, \ldots \) forming an orthonormal basis of \(L^2[0,1]\), the set of square integrable functions on [0, 1]. From this result, it follows easily that the zero-mean Gaussian process Z, viewed as a random element of \(L^2[0,1]\), has the Karhunen-Loève representation

$$\begin{aligned} Z(\lambda )=\sum _{j \ge 1} N_j e_j(\lambda ), \lambda \in [0,1], \end{aligned}$$

with the independent random variables \(N_j\) defined as \(N_j=\int _{[0,1]}Z(\lambda )e_j(\lambda )d \lambda \sim {\mathcal {N}}(0, \zeta _j)\). One easily deduces from this that, in distribution,

$$\begin{aligned} \int _0^1 Z^2(\lambda )d\lambda =\sum _{j \ge 1} \zeta _j \chi _j^2, \end{aligned}$$

where the \(\chi _j^2\)’s are iid chi-square random variables with one degree of freedom.

(ii):

From Theorem 2 and the continuous mapping theorem, \(T_{1,n}\) and \(T_{2,n}\) converge in distribution respectively to \(\sup _{\lambda \in [0,1]} |(1-\lambda ) \lambda A+Z(\lambda )|\) and \(\int _0^1 |(1-\lambda ) \lambda A+Z(\lambda )|^2 d\lambda \).

For the same reasons as above, one has the decomposition

$$\begin{aligned} (1-\lambda ) \lambda A+Z(\lambda )=\sum _{j \ge 1} {{\widetilde{N}}}_j e_j(\lambda ), \lambda \in [0,1], \end{aligned}$$

with the independent random variables \({{\widetilde{N}}}_j\) defined as \({{\widetilde{N}}}_j=\int _0^1\left[ (1-\lambda )\lambda A+Z(\lambda )\right] e_j(\lambda )d \lambda \sim {\mathcal {N}}(\rho _j, \zeta _j)\).

It follows from this that, in distribution,

$$\begin{aligned} \int _0^1 | (1-\lambda ) \lambda A+Z(\lambda )|^2d\lambda =\sum _{j \ge 1} \zeta _j \chi _j^{*2}, \end{aligned}$$

where the \(\chi _j^{*2}\)’s are non-central iid chi-square random variables with one degree of freedom and non-centrality parameter \(\rho _j^2 \zeta _j^{-1}\).

(iii):

The last part is an easy consequence of Theorem 3.

\(\square \)

Define \(\sigma ^2\) by

$$\begin{aligned} \sigma ^2={\mathbb {V}}\text{ ar }(h_{F,1}(X_1))+2 \sum _{j\ge 1} {\mathbb {C}}\text{ ov } (h_{F,1}(X_1),h_{F,1}(X_{1+j})). \end{aligned}$$

Corollary 1

Assume that the assumptions of Theorem 2 hold, and that h is such that its associated \(h_{F,1}\) and \(h_{F,2}\) satisfy \(h_{F,1}(x)=-h_{F,2}(x)\). Then

  1. (i)

    Under \(H_0\), as n tends to infinity, one has the following convergences in distribution

    $$\begin{aligned}&T_{1,n} \longrightarrow \sigma \sup _{\lambda \in [0,1]}\left| W^0(\lambda ) \right| \\&T_{2,n} \longrightarrow \sigma ^2 \sum _{j \ge 1} {1 \over j^2 \pi ^2} \chi _j^2. \end{aligned}$$
  2. (ii)

    Under \(H_{1,n}\), as n tends to infinity, one has the following convergences in distribution

    $$\begin{aligned}&T_{1,n} \longrightarrow \sup _{\lambda \in [0,1]}\left| (1-\lambda ) \lambda A+ \sigma W^0(\lambda ) \right| \\&T_{2,n} \longrightarrow \sum _{j \ge 1} {1 \over j^2 \pi ^2} \chi _j^{*2}, \end{aligned}$$

where \(W^0\) is the Brownian bridge on [0, 1], and the \( \chi _j^{2}\)’s and \( \chi _j^{*2}\)’s are as in Theorem 4, but with non-centrality parameters \(2A^2 \left\{ {2[1-(-1)^j] / j\pi } \right\} ^2 \sigma ^{-2}\).

Proof

i- If h is such that its associated \(h_{F,1}\) and \(h_{F,2}\) satisfy \(h_{F,1}(x)=-h_{F,2}(x)\), then from the proof of Theorem 1 one sees that \(W_1(\lambda )=-W_2(\lambda )\). Whence, the representation of the limit process Z in that theorem reduces to

$$\begin{aligned} Z(\lambda )=W_1(\lambda )-\lambda W_1(1), \ \lambda \in [0,1], \end{aligned}$$

where for any \( \lambda \in [0,1]\), \(W_1(\lambda )= \sigma W^0(\lambda )\), with \(W^0(\lambda )\) standing for the Brownian bridge on [0, 1]. Thus, again from the continuous mapping theorem, one has that \(T_{1,n}\) and \(T_{2,n}\) converge in distribution respectively to \(\sup _{\lambda \in [0,1]} |\sigma W^0(\lambda ) |\) and \(\sigma ^2 \int _0^1| W^0(\lambda )|^2d\lambda \). Using the Karhunen-Loève expansion of the Brownian bridge given e.g. in Shorack and Wellner (1986) or Pycke (2001), one has \(\zeta _j={1 \over j^2 \pi ^2}\) and \(e_j(\lambda )=\sqrt{2} \sin (j \pi \lambda )\), \(j \ge 1\), and the convergence in distribution of \(T_{2,n}\) stated in the corollary follows by elementary computations.

ii- From Theorem 4, \(T_{1,n}\) and \(T_{2,n}\) converge in distribution respectively to \(\sup _{\lambda \in [0,1]} |(1-\lambda ) \lambda A+\sigma W^0(\lambda )|\) and \(\int _0^1 |(1-\lambda ) \lambda A+\sigma W^0(\lambda )|^2 d\lambda \).

The convergence stated for \(T_{2,n}\) in the corollary follows from the Karhunen-Loève expansion of the Gaussian process \((1-\lambda ) \lambda A+\sigma W^0(\lambda )\), as sketched in Sect. 4, and for which more details can be found in Ngatchou-Wandji (2009).

Remark 4

It is easy to check that anti-symmetric kernels h are such that their associated \(h_{F,1}\) and \(h_{F,2}\) satisfy the property \(h_{F,1}(x)=-h_{F,2}(x)\).

Remark 5

In the context of Corollary 1, under \({\mathcal {H}}_0\), the asymptotic distributions of \(T_{1,n}\) and \(T_{2,n}\) can be approximated, as done in Sect. 4. Under \({\mathcal {H}}_{1,n}\), it is not easy to do this for the asymptotic distribution of \(T_{1,n}\). This is a serious disadvantage compared with \(T_{2,n}\), whose asymptotic distribution under \({\mathcal {H}}_0\) as well as under \({\mathcal {H}}_{1,n}\) can be approximated, even for more general kernels h, by using Theorem 4 and proceeding in the way fully described in Ngatchou-Wandji (2009).

4 Practical considerations

4.1 Numerical simulations

Here, we apply our results to detecting a change in the mean and/or in the variance and/or in the correlation of data from some simple models. Concretely, we check the difference between the distributions of the observations only by checking the differences between their means and/or variances and/or autocorrelations. For this purpose, we consider the kernels \(h(x,y)= \mathrm {1\!l}(x<y)\) and \(h(x,y)= x-y\). For \(h(x,y)= \mathrm {1\!l}(x<y)\), simple computations show that \(\theta (F,F)=1/2\) and \(h_{F,1}(x)=1/2-F(x)=-h_{F,2}(x)\). For \(h(x,y)= x-y\), it is a trivial matter that \(\theta (F,F)=0\) and that, by Remark 4, \(h_{F,1}(x)=-h_{F,2}(x)\) since h is anti-symmetric. Consequently, Corollary 1 holds for these two kernels, whose corresponding \(\sigma ^2\) are respectively

$$\begin{aligned} \sigma _1^2=\sigma ^2= {\mathbb {V}}\text{ ar }\left[ F(X_1) \right] +2 \sum _{j\ge 1} {\mathbb {C}}\text{ ov }(F(X_1), F(X_{1+j})) \end{aligned}$$

and

$$\begin{aligned} \sigma _2^2=\sigma ^2= {\mathbb {V}}\text{ ar }\left( X_1 \right) +2 \sum _{j\ge 1} {\mathbb {C}}\text{ ov }(X_1, X_{1+j}). \end{aligned}$$
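
When the observations are dependent, \(\sigma _1^2\) and \(\sigma _2^2\) involve infinite series of autocovariances and have to be estimated. The paper does not prescribe a particular estimator; purely as an illustration, the following sketch uses a Bartlett-kernel (Newey-West type) long-run variance estimate with an arbitrary bandwidth rule, and is not taken from the original text.

```python
import numpy as np

def long_run_variance(u, bandwidth=None):
    """Bartlett-kernel estimate of Var(u_1) + 2 * sum_{j>=1} Cov(u_1, u_{1+j})
    for a stationary sequence u; the bandwidth rule n**(1/3) is arbitrary."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    u = u - u.mean()
    if bandwidth is None:
        bandwidth = int(n ** (1 / 3))
    lrv = np.dot(u, u) / n                                  # lag-0 autocovariance
    for j in range(1, bandwidth + 1):
        gamma_j = np.dot(u[:-j], u[j:]) / n                 # lag-j autocovariance
        lrv += 2 * (1 - j / (bandwidth + 1)) * gamma_j
    return lrv

def sigma1_sq(x):
    """sigma_1^2 uses F(X_i); here F is replaced by the empirical cdf (ranks / n)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return long_run_variance(ranks / len(x))

def sigma2_sq(x):
    """sigma_2^2 uses the raw observations themselves."""
    return long_run_variance(x)
```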

We sampled 1000 sets of \(n=200\) data \(X_1, X_2, \ldots , X_n\) from the model

$$\begin{aligned} X_i= \left\{ \begin{array}{cc} \varepsilon _i &{} i=1, \dots , 100\\ \mu +\rho X_{i-1}+\omega \varepsilon _i&{} i=101, \dots , 200, \end{array} \right. \end{aligned}$$
(9)

where \(\mu \) is a real number, \(\omega \) is a positive number, the \(\varepsilon _i\)’s are iid and for all \(i=1, \ldots , 200\), \(\varepsilon _i \sim {\mathcal {N}}(0,1)\), or \(\varepsilon _i \sim {\mathcal {T}}(3)\) (Student distribution with 3 degrees of freedom), or \(\varepsilon _i = {\mathcal {E}}_i-1\) with \({\mathcal {E}}_i \sim {\mathcal {E}}(1)\) (\({\mathcal {E}}(1)\) exponential distribution with parameter 1).
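
As a minimal illustration, one possible way to draw a sample from model (9) in Python is sketched below; the function name and its defaults are ours.

```python
import numpy as np

def simulate_model_9(mu=0.0, rho=0.0, omega=1.0, n=200, change=100,
                     noise="gaussian", rng=None):
    """Draw one sample of size n from model (9): white noise up to the
    change-point, then an AR(1)-type recursion with intercept mu,
    coefficient rho and scale omega."""
    rng = np.random.default_rng() if rng is None else rng
    if noise == "gaussian":
        eps = rng.standard_normal(n)
    elif noise == "student3":
        eps = rng.standard_t(df=3, size=n)
    else:                                     # centred exponential noise
        eps = rng.exponential(1.0, size=n) - 1.0
    x = np.empty(n)
    x[:change] = eps[:change]                 # X_i = eps_i before the change
    for i in range(change, n):                # after the change
        x[i] = mu + rho * x[i - 1] + omega * eps[i]
    return x
```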

We first apply our Kolmogorov-Smirnov and Cramér-von Mises type tests to testing \(\mu =0\) against \(\mu \ne 0\) for \(\omega =1\) and \(\rho =0\) (testing a change in the mean of a shifted white noise). Next, we apply the two tests to testing \(\omega =1\) against \(\omega \ne 1\) for \(\mu =0\) and \(\rho =0\) (testing change in the variance of a white noise). Finally, we consider testing \(\rho =0\) against \(\rho \ne 0\) for \(\mu =0\) and \(\omega =1\) (testing a change in the autocorrelation of an AR(1) model).

For \(j=1,2\) let \(c_{\alpha j}\) be the critical value, at level of significance \( \alpha \in [0,1]\), of the test based on \(T_{j,n}\). Then the empirical power of the test can be computed as the ratio of the number of samples for which \(T_{j,n} > c_{\alpha j}\) over the number of replications (taken here to be 1000). However, the critical values of our tests are difficult to compute. For this reason, as in Ngatchou-Wandji (2009), we use the p-value method as follows: instead of counting the number of samples for which \(T_{j,n} > c_{\alpha j}\), we rather count the number of samples for which the p-value is less than \(\alpha \). Denoting by \({{\widehat{\sigma }}}_j\) an estimator of \(\sigma _j\) when this quantity is unknown, from Corollary 1, the p-values of our tests are given respectively by

$$\begin{aligned} p_1={\mathbb {P}}\left( \sup _{\lambda \in [0,1]}|W^0(\lambda )| > {T_{1,n} \over {{\widehat{\sigma }}}_1} \right) \end{aligned}$$

and

$$\begin{aligned} p_2={\mathbb {P}}\left( \sum _{j \ge 1} {1 \over j^2 \pi ^2} \chi _j^2 > {T_{2,n} \over {{\widehat{\sigma }}}_2^2} \right) , \end{aligned}$$

where we recall that \(W^0\) is the Brownian bridge on [0, 1] and the \(\chi _j^2\)’s are iid chi-square random variables with one degree of freedom. Note that for iid \(X_i\)’s the covariance terms in the expressions of \(\sigma ^2\) vanish, so that under \({\mathcal {H}}_0\), for \(h(x,y)= \mathrm {1\!l}(x<y)\), \(\sigma ^2={\mathbb {V}}\text{ ar }[F(X_1)]=1/12\) and need not be estimated, while for \(h(x,y)=x-y\), \(\sigma ^2={\mathbb {V}}\text{ ar }(X_1)\), which can be estimated by the sample variance, as done in the trials.

From Billingsley (1999) p. 103, one has

$$\begin{aligned} p_1=-2 \sum _{j=1}^{\infty } (-1)^j \exp \left[ -2j^2 \left( {T_{1,n} \over {{\widehat{\sigma }}}_1}\right) ^2 \right] . \end{aligned}$$

Truncating the sum to its most significant terms yields an approximation of \(p_1\). Likewise, after truncating the sum in the expression of \(p_2\), many well-known results can be used to approximate \(p_2\); here, we use those of Imhof (1961).
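
As an illustration, here is one possible numerical sketch of both approximations; the truncation orders are arbitrary, and for \(p_2\) we use a simple Monte-Carlo approximation of the truncated weighted chi-square series rather than Imhof's (1961) exact inversion.

```python
import numpy as np

def p_value_ks(t1, sigma1_hat, n_terms=100):
    """Truncated alternating series for p_1 = P(sup |W0| > t1 / sigma1_hat)."""
    x = t1 / sigma1_hat
    j = np.arange(1, n_terms + 1)
    p = -2.0 * np.sum((-1.0) ** j * np.exp(-2.0 * j ** 2 * x ** 2))
    return float(min(1.0, max(0.0, p)))        # clip to [0, 1] for very small x

def p_value_cm(t2, sigma2_hat_sq, n_terms=100, n_mc=50_000, rng=None):
    """Monte-Carlo approximation of p_2 = P(sum_j chi_j^2 / (j pi)^2 > t2 / sigma2_hat_sq),
    with the weighted chi-square series truncated after n_terms terms."""
    rng = np.random.default_rng(0) if rng is None else rng
    weights = 1.0 / (np.arange(1, n_terms + 1) ** 2 * np.pi ** 2)
    chi2 = rng.chisquare(df=1, size=(n_mc, n_terms))
    series = chi2 @ weights                    # Monte-Carlo draws of the truncated series
    return float(np.mean(series > t2 / sigma2_hat_sq))
```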

In all the graphics below, the green color represents the level of significance of the test. In Fig. 2, the blue color represents the empirical power of the test based on \(T_{1,n}\), while the red one represents that of the test based on \(T_{2,n}\). The upper graphics in Fig. 2 display the empirical power functions of both tests as functions of \(\mu \in \{0, 0.1, 0.2, \ldots , 1.1\}\), for observations from (9) with \(\rho =0\), \(\omega =1\) and standard Gaussian noises. The first graphic corresponds to the kernel \(h(x,y)= \mathrm {1\!l}(x<y)\), while the second corresponds to the kernel \(h(x,y)= x-y\). On both graphics, one can observe that at \(\mu =0\) the test based on \(T_{2,n}\) estimates the nominal level of the test more accurately than the one based on \(T_{1,n}\). Moreover, it has larger power for smaller values of the mean.

The lower graphics in Fig. 2 display the power functions for the same kernels, for data from (9) with \(\rho =0\) and \(\omega =1\), with Student noise with 3 degrees of freedom and centered exponential noise (from the exponential distribution with parameter 1), respectively. One can see from these graphics that the powers grow a bit more slowly than those in the upper graphics.

The upper graphics in Fig. 3 present the power functions of both tests as functions of \(\omega ^2 \in \{1, 1.2, 1.4, \ldots , 2, 2.2, \ldots , 2.8 \}\), for observations from (9) with \(\mu =0\) and \(\rho =0\). From these graphics, one sees that for the kernel \(h(x,y)= \mathrm {1\!l}(x<y)\) neither test is sensitive to a change in the variance, while both are for \(h(x,y)= x-y\). Here, in contrast to the previous situations, the test based on \(T_{1,n}\) does better than the one based on \(T_{2,n}\).

The lower graphics in Fig. 3 exhibit the power functions of both tests as functions of \(\rho \in \{0,.05,.1,.2,.3,.4,.5,.6,.7,.8,.85,.9,.95\}\), for observations from (9) with \(\mu =0\) and \(\omega =1\). One can see that for the kernels \(h(x,y)= \mathrm {1\!l}(x<y)\) and \(h(x,y)= x-y\), both tests are able to detect a change in the autocorrelation. In the vicinity of the null hypothesis the test based on \(T_{2,n}\) does better than the one based on \(T_{1,n}\), while far from this hypothesis, the test based on \(T_{1,n}\) does better than that based on \(T_{2,n}\).

We carried out many other trials. In particular, we studied the detection of a change in the mean of a shifted Student white noise with 3, 4 and 5 degrees of freedom (\({\mathcal {T}}(j), j=3,4,5\)) associated with \(h(x,y)= x-y\). Our tests did not detect a mean change in the \({\mathcal {T}}(3)\) case. This is likely due to the fact that, with this kernel, the theoretical results assume the existence of moments of order larger than 2, which is not the case for \({\mathcal {T}}(3)\). For the \({\mathcal {T}}(4)\) and \({\mathcal {T}}(5)\) cases, a mean change was detected. We do not present these results as they were very similar to those already presented.

Fig. 2

Empirical power of the CM test (red), empirical power of the KS test (blue), nominal level 5% (green). First row: change in the mean of a shifted Gaussian white noise, respectively with the "indicator" and "difference" kernels. Second row: change in the mean of a shifted Student white noise with the "indicator" kernel, and change in the mean of a shifted centered exponential white noise with the "difference" kernel

Fig. 3

Empirical power of the CM test (red), empirical power of the KS test (blue), nominal level 5% (green). First row: change in the variance of a shifted Gaussian white noise, respectively with the "indicator" and "difference" kernels. Second row: change in the correlation of an AR(1) model, respectively with the "indicator" and "difference" kernels

4.2 Concluding remarks

The main theoretical results in this paper are similar to those of Dehling et al. (2015), but they are established for more general kernels. In addition to the traditional Kolmogorov-Smirnov (KS) type test used for change-point detection, a Cramér-von Mises (CM) type test is studied. For the kernels and the data considered, the CM test seems to have better power properties than the KS test for detecting small changes in the mean of a shifted Gaussian white noise. For the kernel \(h(x,y)=\mathrm {1\!l}(x<y)\), neither test is sensitive to a change in the variance of the observations studied. This may be explained by the fact that this function involves the ranks of the observations rather than the observations themselves. Furthermore, the corresponding tests are associated with uniform random variables (\(F(X_j) \sim {\mathcal {U}}[0,1]\)) through \(\sigma ^2\). This is not the case for the tests based on \(h(x,y)=x-y\), which involves the observations directly and is related to their variance through \(\sigma ^2\), and for which a change in the variance can be detected by the tests studied here (with an advantage for KS). Our study also shows that, for each of these kernels, our tests are able to detect changes in the autocorrelation of an AR(1) model. The KS test does better far from the null hypothesis, while the CM test does better in the vicinity of the null hypothesis. However, the results seem to show that the tests are better suited to detecting changes in the mean than changes in the variance or the autocorrelation. Indeed, when testing the mean, the power of the tests grows quickly to 1 as one moves away from the null hypothesis.