
15.1 Introduction

This chapter is concerned with testing the equality of two autoregressive functions against two-sided alternatives when observing two independent, strictly stationary and ergodic autoregressive time series of order one. More precisely, let \(Y_{1,i},\,Y_{2,i},\, i\in \mathbb{Z}:= \{0,\pm 1,\cdots\}\), be two observable autoregressive time series such that for some real-valued functions \(\mu_1\) and \(\mu_2\), and for some positive functions \(\sigma_1,\,\sigma_2\),

$$Y_{1,i}=\mu_1(Y_{1,i-1})+ \sigma_1(Y_{1,i-1})\varepsilon_{1,i}, \qquad Y_{2,i}=\mu_2(Y_{2,i-1})+ \sigma_2(Y_{2,i-1})\varepsilon_{2,i}.$$
(15.1)

The errors \(\{\varepsilon_{1,i},\, i\in\mathbb{Z}\}\) and \(\{\varepsilon_{2,i},\, i\in \mathbb{Z}\}\) are assumed to be two independent sequences of independent and identically distributed (i.i.d.) r.v.’s with mean zero and unit variance. Moreover, \(\varepsilon_{1,i},\,i\ge 1\), are independent of \(Y_{1,0}\), and \(\varepsilon_{2,i},\, i\ge 1\), are independent of \(Y_{2,0}\). Both time series are assumed to be strictly stationary and ergodic.
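To fix ideas, the following sketch (in R, the language also used for the simulations of Sect. 15.3) generates two independent series from model (15.1); the particular autoregressive and scale functions, the sample sizes, and the function names are hypothetical choices made only for illustration.

```r
## Illustrative simulation from model (15.1); mu, sig below are hypothetical choices.
sim_ar1 <- function(n, mu, sig, y0 = 0) {
  y <- numeric(n + 1)
  y[1] <- y0                              # Y_{i,0}
  eps <- rnorm(n)                         # i.i.d. innovations with mean 0, variance 1
  for (t in 1:n) y[t + 1] <- mu(y[t]) + sig(y[t]) * eps[t]
  y                                       # Y_{i,0}, Y_{i,1}, ..., Y_{i,n}
}

mu1 <- function(x) 1 + 0.5 * x;  sig1 <- function(x) rep(1, length(x))
mu2 <- function(x) 1 + 0.4 * x;  sig2 <- function(x) rep(1, length(x))
set.seed(1)
Y1 <- sim_ar1(300, mu1, sig1)             # Y_{1,0}, ..., Y_{1,n_1}
Y2 <- sim_ar1(400, mu2, sig2)             # Y_{2,0}, ..., Y_{2,n_2}
```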

Consider a bounded interval \([a, b]\) of \({\mathbb{R}}\). The problem of interest is to test the null hypothesis:

$$\begin{aligned} H_0:\, \mu_1(x)=\mu_2(x),\qquad \forall\, x\in [a,b],\end{aligned}$$

against the two-sided alternative hypothesis:

$$H_1: \, \mu_1(x) \neq \mu_2(x), \quad \mbox{for some } x\in [a,b],$$
(15.2)

based on the data set \(Y_{1,0}, Y_{1,1},\cdots,Y_{1,n_1}\), \(Y_{2,0}, Y_{2,1},\cdots,Y_{2,n_2}\).

In hydrology, autoregressive time series are often used to model water reservoirs; see, e.g., Bloomfield (1992). The above testing problem could be applied to comparing the water levels of two rivers.

Few related studies have been conducted in the two-sample autoregressive setting. Koul and Li (2005) adapted the covariate-matching idea used in the regression setting to one-sided tests for superiority between two time series. Li (2009) studied the same testing problem, but with a test based on the difference of two sums of quasi-residuals; that method is an extension of the statistic \(T_2\) in Koul and Schick (1997) from the regression setting to the autoregressive setting.


The papers that address the above two-sided testing problem in the regression setting include Hall and Hart (1990), Delgado (1993), Kulasekera (1995), and Scheike (2000). In particular, Delgado (1993) used the absolute difference of the cumulative regression functions for the same problem, assuming a common design in the two regression models. Kulasekera (1995) used quasi-residuals to test the difference between two regression curves under conditions that do not require common design points or equal sample sizes. The current chapter adapts Delgado’s idea of using a partial-sum process and Kulasekera’s idea of using quasi-residuals to construct tests for the difference between two autoregressive functions.

As in Delgado (1993), let

$$\Delta(t):= \int_{a}^{t} \Big(\mu_1(x)-\mu_2(x)\Big)\left(f_1(x)+f_2(x)\right)\, dx, \forall \, a\le t\le b,$$
(15.3)

where \(\mu_1,\,\mu_2\) are assumed to be continuous on \([a,b]\) and \(f_1,\, f_2\) are the stationary densities of the two time series \(Y_{1,i}\) and \(Y_{2,i}\), respectively. We also assume that \(f_1,\, f_2\) are continuous and positive on \([a,b]\). It is easy to show that \(\Delta(t)\equiv 0\) when the null hypothesis holds, and that \(\Delta(t)\neq 0\) for some t under the alternative \(H_1\). This suggests constructing tests of \(H_0\) vs. \(H_1\) based on consistent estimators of \(\Delta(t)\). One such estimator is obtained as follows.

First, as in Kulasekera (1995), we define quasi-residuals

$$e_{1,i} = Y_{1,i}- \hat{\mu}_2(Y_{1,i-1}), \qquad i=1,\cdots,n_1,$$
(15.4)

and

$$e_{2,j}=Y_{2,j}- \hat{\mu}_1(Y_{2,j-1}),\qquad j=1,\cdots, n_2.$$
(15.5)

Here, \(\hat{\mu}_1\) and \(\hat{\mu}_2\) are appropriate estimators of \(\mu_1\) and \(\mu_2\), such as the Nadaraya–Watson estimators used in this chapter; see Nadaraya (1964) and Watson (1964).
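As an illustration, a minimal sketch of the Nadaraya–Watson estimator (see (15.12) below) and of the quasi-residuals (15.4) and (15.5), continuing the R example above, could be written as follows; the Epanechnikov kernel matches the one used in Sect. 15.3, while the bandwidths and function names are arbitrary illustrative choices.

```r
## Epanechnikov kernel with compact support [-1, 1], as in Sect. 15.3.
K <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

## Nadaraya-Watson estimate of the autoregressive function of the series Y,
## evaluated at the points x (the factor 1/h in K_h cancels in the ratio).
nw_est <- function(x, Y, h) {
  lagY  <- Y[-length(Y)]                      # Y_{i,j-1}, j = 1, ..., n_i
  respY <- Y[-1]                              # Y_{i,j}
  sapply(x, function(x0) {
    w <- K((lagY - x0) / h)
    sum(w * respY) / sum(w)
  })
}

## Quasi-residuals (15.4) and (15.5): each sample is centred by the estimated
## autoregressive function of the *other* sample.
h1 <- 0.2; h2 <- 0.2                          # illustrative bandwidths
e1 <- Y1[-1] - nw_est(Y1[-length(Y1)], Y2, h2)   # e_{1,i} = Y_{1,i} - mu2_hat(Y_{1,i-1})
e2 <- Y2[-1] - nw_est(Y2[-length(Y2)], Y1, h1)   # e_{2,j} = Y_{2,j} - mu1_hat(Y_{2,j-1})
```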

Now, let

$$U_n(t) = \frac{1}{n_1} \sum_{i=1}^{n_1} e_{1,i}1_{[a \le Y_{1,i-1}\le t]}- \frac{1}{n_2} \sum_{j=1}^{n_2} e_{2,j}1_{[a\le Y_{2,j-1} \le t]},$$
(15.6)

where the subscript n, here and throughout the chapter, indicates dependence on \(n_1\) and \(n_2\). With uniformly consistent estimators \(\hat{\mu}_1\) and \(\hat{\mu}_2\) of \(\mu_1\) and \(\mu_2\), such as kernel estimators, and under a suitable mixing condition on the time series \(Y_{1,i}\) and \(Y_{2,j}\), such as strong \(\alpha\)-mixing, \(U_n(t)\) can be written as \(U_{1n}(t)+U_{2n}(t)+U_{3n}(t)\) with

$$\begin{aligned} U_{1n}(t) &= \frac{1}{n_1} \sum_{i=1}^{n_1} \sigma_1(Y_{1,i-1})\varepsilon_{1,i}1_{[a \le Y_{1,i-1}\le t]}\nonumber\\ &\quad- \frac{1}{n_2} \sum_{j=1}^{n_2}\sigma_2(Y_{2,j-1}) \varepsilon_{2,j}1_{[a\le Y_{2,j-1} \le t]}=o_P(1),\\ {U_{2n}(t)}&= \frac{1}{n_1} \sum_{i=1}^{n_1} (\mu_1(Y_{1,i-1})-\mu_2(Y_{1,i-1}))1_{[a \le Y_{1,i-1}\le t]}\\ &\quad- \frac{1}{n_2} \sum_{j=1}^{n_2} (\mu_2(Y_{2,j-1})-\mu_1(Y_{2,j-1}))1_{[a\le Y_{2,j-1} \le t]}\\ &= \int_{a}^{t} \Big(\mu_1(x)-\mu_2(x)\Big)\left(f_1(x)+f_2(x)\right)\, dx +o_P(1),\\ {U_{3n}(t)}&= \frac{1}{n_1} \sum_{i=1}^{n_1} (\mu_2(Y_{1,i-1})- \hat{\mu}_2(Y_{1,i-1}))1_{[a \le Y_{1,i-1}\le t]}\\ &\quad- \frac{1}{n_2} \sum_{j=1}^{n_2} (\mu_1(Y_{2,j-1})-\hat{\mu}_1(Y_{2,j-1}))1_{[a\le Y_{2,j-1} \le t]} = \textit{o}_{\textit{P}}(1),\end{aligned}$$

uniformly in \(t\in [a,b]\). Thus, \(U_n(t)\) provides a uniformly consistent estimator of \(\Delta(t)\). This suggests basing tests of \(H_0\) on suitable functionals of this process. In this chapter, we focus on the Kolmogorov–Smirnov type test based on \(\sup_{a\le t\le b}|U_n(t)|\).
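Continuing the sketch, \(U_n(t)\) of (15.6) and the statistic \(\sup_{a\le t\le b}|U_n(t)|\) can be evaluated on a grid of t values; the interval \([2,4]\) is the one used in Sect. 15.3, while the grid size is an arbitrary choice.

```r
## U_n(t) of (15.6) at a single value of t.
U_n <- function(t, a, Y1, Y2, e1, e2) {
  lag1 <- Y1[-length(Y1)]; lag2 <- Y2[-length(Y2)]
  sum(e1[lag1 >= a & lag1 <= t]) / length(e1) -
    sum(e2[lag2 >= a & lag2 <= t]) / length(e2)
}

a <- 2; b <- 4                                # interval [a, b], as in Sect. 15.3
tgrid  <- seq(a, b, length.out = 201)         # grid over [a, b]
Un     <- sapply(tgrid, U_n, a = a, Y1 = Y1, Y2 = Y2, e1 = e1, e2 = e2)
sup_Un <- max(abs(Un))                        # sup_{a <= t <= b} |U_n(t)|
```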

To determine the large sample distribution of the process \(U_n(t)\), one needs to normalize this process suitably. Let

$$\begin{aligned} \tau_n^2(t) &= q_1E \left\{\sigma_1^2(Y_{1,0})\left(1 + \frac{f_2(Y_{1,0})}{f_1(Y_{1,0})}\right)^2 1_{[a\le Y_{1,0}\le t]}\right\}\nonumber\\ &\quad + q_2 E\left\{\sigma_2^2(Y_{2,0})\left(1+ \frac{f_1(Y_{2,0})}{f_2(Y_{2,0})}\right)^2 1_{[a \le Y_{2,0}\le t]}\right\},\end{aligned}$$
(15.7)

where, \(q_1=\frac{N}{n_1}=\frac{n_2}{n_1+n_2}\), \(q_2=\frac{N}{n_2}=\frac{n_1}{n_1+n_2}\) and \(N=\frac{n_1 n_2}{n_1+n_2}\).

We consider the following normalized test statistic:

$$T: = \sup_{a\le t\le b} \Big | \frac{N^{1/2} U_n(t)}{\sqrt{\tau_n^2(b)}} \Big|.$$
(15.8)

When the \(\sigma_i\)’s and \(f_i\)’s are known, tests of \(H_0\) can be based on T, rejecting for large values of T. Usually, however, these functions are unknown, which renders T of little practical use. This suggests replacing \(\tau_n^2\) with an estimate \(\hat{\tau}_n^2\) satisfying

$$\frac{\hat{\tau}_n^2(b)}{\tau_n^2(b)}\rightarrow_P 1.$$
(15.9)

An example of such an estimator \(\hat{\tau}_n^2(t)\) of \(\tau_n^2(t)\) is

$$\begin{aligned} {\hat{\tau}_n^2(t)} &=q_1\frac{1}{n_1}\,\sum_{i=1}^{n_1}\left\{\left(Y_{1,i}-\tilde{\mu}_1(Y_{1,i-1})\right)^2\left(1+\frac{\hat{f}_2(Y_{1,i-1})}{\hat{f}_1(Y_{1,i-1})}\right)^2 1_{[a\le Y_{1,i-1}\le t]} \right\}\nonumber\\ &\quad + q_2\frac{1}{n_2}\,\sum_{j=1}^{n_2}\left\{\left(Y_{2,j}-\tilde{\mu}_2(Y_{2,j-1})\right)^2\left(1+\frac{\hat{f}_1(Y_{2,j-1})}{\hat{f}_2(Y_{2,j-1})}\right)^2 1_{[a\le Y_{2,j-1}\le t]}\right\},\end{aligned}$$
(15.10)

where \(\tilde{\mu}_i\) and \(\hat{f}_i\) are appropriate estimators of \(\mu_i\) and \(f_i\), such as the kernel estimators used in this chapter. The proposed test is therefore based on the adaptive version of T, namely

$$\hat{T}: =\sup_{a\le t\le b} \Big | \frac{N^{1/2}U_n(t)}{\sqrt{\hat{\tau}_n^2(b)}} \Big|$$
(15.11)
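For illustration, a sketch of the normalizer \(\hat{\tau}_n^2(b)\) of (15.10) and of the resulting statistic \(\hat{T}\), continuing the code above, is given next; it uses a standard kernel density estimate for \(\hat f_i\) and the Nadaraya–Watson estimate for \(\tilde{\mu}_i\), with bandwidths \(n_i^{-1/5}\) (cf. Remark 2.1 below). This is only a sketch of the computation, not the authors' implementation.

```r
## Kernel density estimate of f_i, based on the lagged values of one sample.
f_hat <- function(x, Y, h) {
  lagY <- Y[-length(Y)]
  sapply(x, function(x0) mean(K((lagY - x0) / h)) / h)
}

## tau_n^2 hat of (15.10), evaluated at t = b.
tau2_hat <- function(a, b, Y1, Y2, h1, h2) {
  n1 <- length(Y1) - 1; n2 <- length(Y2) - 1
  q1 <- n2 / (n1 + n2); q2 <- n1 / (n1 + n2)
  lag1 <- Y1[-length(Y1)]; lag2 <- Y2[-length(Y2)]
  in1 <- lag1 >= a & lag1 <= b; in2 <- lag2 >= a & lag2 <= b
  r1 <- Y1[-1] - nw_est(lag1, Y1, h1)        # Y_{1,i} - mu1_tilde(Y_{1,i-1})
  r2 <- Y2[-1] - nw_est(lag2, Y2, h2)        # Y_{2,j} - mu2_tilde(Y_{2,j-1})
  ratio1 <- f_hat(lag1[in1], Y2, h2) / f_hat(lag1[in1], Y1, h1)  # f2_hat/f1_hat
  ratio2 <- f_hat(lag2[in2], Y1, h1) / f_hat(lag2[in2], Y2, h2)  # f1_hat/f2_hat
  q1 * sum(r1[in1]^2 * (1 + ratio1)^2) / n1 +
    q2 * sum(r2[in2]^2 * (1 + ratio2)^2) / n2
}

n1 <- length(Y1) - 1; n2 <- length(Y2) - 1
N  <- n1 * n2 / (n1 + n2)
T_hat <- sqrt(N) * sup_Un / sqrt(tau2_hat(a, b, Y1, Y2, n1^(-1/5), n2^(-1/5)))
```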

We shall study the asymptotic behavior of \(\hat{T}\) as the sample sizes \(n_1\) and \(n_2\) tend to infinity. Theorem 2.1 of Sect. 15.2 shows that, under \(H_0\), some general assumptions, and with \(\hat{\mu}_1\) and \(\hat{\mu}_2\) being the Nadaraya–Watson estimators of \(\mu_1\) and \(\mu_2\), the statistic \(T\) converges weakly to the supremum of the absolute value of a Brownian motion on [0, 1]. Then, in Corollary 2.1, under some general assumptions on the estimates \(\tilde{\mu}_1,\, \tilde{\mu}_2\) and \(\hat{f}_1,\, \hat{f}_2\), we derive the same asymptotic distribution for \(\hat{T}\) under \(H_0\). Remark 2.2 shows that the power of the test based on \(\hat{T}\) converges to 1 at the fixed alternative (15.2), and even at alternatives that converge to \(H_0\) at a rate slower than \(N^{-1/2}\). In Sect. 15.3, we conduct a Monte Carlo simulation study of the finite sample level and power behavior of the proposed test \(\hat{T}\). The simulation results are consistent with the asymptotic theory at the moderate sample sizes considered. In Sect. 15.4, we study some properties of kernel smoothers and the weak convergence of empirical and marked empirical processes. These results facilitate the proofs of the main results of Sect. 15.2, but they may also be of independent interest; hence they are formulated and proved separately in Sect. 15.4. The remaining proofs are deferred to Sect. 15.5.

15.2 Asymptotic Behavior of T and \(\hat{T}\)

This section investigates the asymptotic behavior of \(T\) given in (15.8) and of the adaptive statistic \(\hat{T}\) given in (15.11), under the null hypothesis and under the alternatives (15.2). We write P for the underlying probability measure and E for the corresponding expectation. In this chapter we consider the Nadaraya–Watson estimators \(\hat{\mu}_1,\, \hat{\mu}_2\) of \(\mu_1\) and \(\mu_2\), i.e.,

$$\hat{\mu}_i(x)= \frac{\sum_{j=1}^{n_i}Y_{i,j}K_{h_i}(Y_{i,j-1}-x)} {\sum_{j=1}^{n_i}K_{h_i}(Y_{i,j-1}-x)}, i=1,2,$$
(15.12)

where \(K_{h_i}(x)=\frac{1}{h_i}K(\frac{x}{h_i})\), \(K\) is a kernel density function on the real line with compact support \([-1,1]\), and \(h_1,\, h_2> 0\) are bandwidths. First, we recall the following definition from Bosq (1998):

Definition 2.1

For any real discrete time process \((X_i, i\in \mathbb{Z})\), define the strong mixing coefficients

$$\begin{aligned} \alpha (k):= \sup_{t\in \mathbb{Z}} \alpha(\sigma\mbox{-field}(X_i, i\le t),\,\,\sigma\mbox{-field} (X_i, i\ge t+k)); k=1,2,\dots\end{aligned}$$

where, for any two sub σ-fields \(\mathcal{B}\) and \(\mathcal{C}\),

$$\begin{aligned} \alpha(\mathcal{B}, \mathcal{C})=\sup_{B\in\mathcal{B}, \, C\in\mathcal{C}}|P(B\cap C)-P(B)P(C)|.\end{aligned}$$

Definition 2.2

The process \((X_i, i\in \mathbb{Z})\) is said to be geometrically strong mixing (GSM) if there exist \(c_0>0\) and \(\rho\in [0,1)\) such that \(\alpha(k)\le c_0\rho^k\) for all \(k\ge 1\).

The following assumptions are needed in this chapter.

(A.1) The autoregressive functions \(\mu_1,\, \mu_2\) are continuous on an open interval containing \([a, b]\) and have continuous derivatives on \([a, b]\).

(A.2) The kernel function \(K(x)\) is a symmetric Lipschitz-continuous density on \({\mathbb{R}}\) with compact support \([-1,1]\).

(A.3) The bandwidths \(h_1,\, h_2\) are chosen such that \(h_i^2\, N^{1-c} \rightarrow \infty\) for some \(c>0\) and \(h_i^4\, N \rightarrow 0\).

(A.4) The densities \(f_1\) and \(f_2\) are bounded and their restrictions to \([a,b]\) are positive. Moreover, they have continuous second derivatives on an open interval containing \([a,\, b]\).

(A.5) The conditional variance functions \(\sigma_1^2\) and \(\sigma_2^2\) are positive on \([a,b]\) and continuous on an open interval containing \([a,b]\).

(A.6) \(Y_{1,i}\), \(Y_{2,i}\), \(i\in \mathbb{Z}\), are GSM processes.

(A.7) For some \(M<\infty\), \(E(\varepsilon^4_{i,1})\le M\), \(i=1,2\).

(A.8) For \(i=1,2\), the joint densities \(g_{i,l}\) of \((Y_{i,0}, Y_{i,l})\), \(l\ge 1\), are uniformly bounded on an open interval \(\mathcal{I}_0\) containing \(\mathcal{I}:=[a,b]\), i.e., \(\sup_{l\ge 1}\sup_{x,y \in \mathcal{I}_0}g_{i,l}(x,y)<\infty\).

(A.9) The densities \(g_1\) and \(g_2\) of the innovations \(\varepsilon_{1,1}\) and \(\varepsilon_{2,1}\) are bounded.

Let \({\mathcal K}(y)=\int_{-1}^y K(t)\,dt\) be the distribution function corresponding to the kernel density K(y) on \([-1,\,1]\) and let

$$\begin{aligned} V_n(t) & = \frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\,\sigma_1(Y_{1,i-1})\left(1_{[a\le Y_{1,i-1}\le t]}+\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}\left({\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right)\right.\right.\nonumber\\ &\quad\left.\!\left.-\,{\mathcal K}\left(\frac{a-Y_{1,i-1}}{h_2}\right)\right) \right)\nonumber\\ &\quad-\frac{1}{n_2}\sum_{j=1}^{n_2}\varepsilon_{2,j}\,\sigma_2(Y_{2,j-1})\left(1_{[a\le Y_{2,j-1}\le t]}+\frac{f_1(Y_{2,j-1})}{f_2(Y_{2,j-1})}\left({\mathcal K}\left(\frac{t-Y_{2,j-1}}{h_1}\right)\right.\right.\nonumber\\ &\quad\left.\!\left.-\,{\mathcal K}\left(\frac{a-Y_{2,j-1}}{h_1}\right)\right) \right)\end{aligned}$$
(15.13)

and

$$\begin{aligned}W_n(t) &= \frac{1}{n_1}\sum_{i=1}^{n_1} (\mu_1(Y_{1,i-1})-\mu_2(Y_{1,i-1}))1_{[a \le Y_{1,i-1}\le t]}\nonumber\\ & +\, \frac{1}{n_2}\sum_{j=1}^{n_2} (\mu_1(Y_{2,j-1})-\mu_2(Y_{2,j-1}))1_{[a \le Y_{2,j-1}\le t]}\end{aligned}$$
(15.14)

We are now ready to state the main result.

Theorem 2.1

Suppose the conditions (A.1)–(A.9) hold true. Then, under both the null and the alternative hypotheses, as \(n_1 \wedge n_2 \rightarrow \infty\),

$$\sup_{a\le t \le b}\left| \frac{N^{1/2}}{\sqrt{\tau_n^2(b)}} \left(U_n(t)-V_n(t)-W_n(t)\right) \right|=o_P(1).$$
(15.15)

Here, \(U_n\) is given in (15.6) with \(\hat{\mu}_1,\, \hat{\mu}_2\) of (15.12), and \(V_n\) and \(W_n\) are given in (15.13) and (15.14), respectively. Consequently,

$$\frac{N^{1/2}}{\sqrt{\tau_n^2(b)}} \left(U_n(t)-W_n(t)\right) \Longrightarrow B\circ \varphi(t), \varphi(t)=\lim_{n_1\wedge n_2\rightarrow\infty} \frac{\tau_n^2(t)}{\tau_n^2(b)},$$
(15.16)

in the Skorohod space \(D[a,b]\), where \(B\circ \varphi\) is a continuous Brownian motion on \([a,b]\) with respect to the time \(\varphi\). Therefore, under \(H_0\), \(T\) of (15.8) satisfies

$$\begin{aligned} T \Longrightarrow \sup_{0\le t\le 1} |B(t)|,\end{aligned}$$

where \(B(t)\) is a continuous Brownian motion on \({\mathbb{R}}\).

Proof:

The proof is given in Sect. 15.5.

Next, we need the following additional assumption to obtain the asymptotic distribution of \(\hat{T}\) given in (15.11).

Assumption 2.1

Let \(\tilde{\mu}_i\), \(\hat{f}_i\) be estimators of \(\mu_i\) and \(f_i\), respectively, satisfying

$$\begin{aligned} \sup_{a \le x \le b}|\tilde{\mu}_i(x)-\mu_i(x)|=o_P(1), \quad\sup_{a \le x \le b}| \hat{f}_i(x)-f_i(x)|=o_P(1), \qquad i=1,2,\end{aligned}$$

under both null and alternative hypotheses.

Corollary 2.1

Suppose the conditions of Theorem 2.1 hold true. In addition, suppose that the estimates \(\tilde{\mu}_i\) and \(\hat{f}_i\) in (15.10) satisfy Assumption 2.1. Then, as \(n_1 \wedge n_2 \rightarrow \infty\) and under \(H_0\), \(\hat{T}\) of (15.11) satisfies

$$\begin{aligned} \hat{T} \Longrightarrow \sup_{0\le t\le 1} |B(t)|.\end{aligned}$$

Proof:

It suffices to prove (15.9). Let

$$\begin{aligned} O_1=E \left\{\sigma_1^2(Y_{1,0})\left(1 + \frac{f_2(Y_{1,0})}{f_1(Y_{1,0})}\right)^2 1_{[a \le Y_{1,0}\le b]}\right\}\end{aligned}$$

and

$$\begin{aligned} O_{1n}= \frac{1}{n_1}\sum_{i=1}^{n_1}\left\{\left(Y_{1,i}-\tilde{\mu}_1(Y_{1,i-1})\right)^2\left(1+\frac{\hat{f}_2(Y_{1,i-1})}{\hat{f}_1(Y_{1,i-1})}\right)^2 1_{[a \le Y_{1,i-1}\le b]} \right\}.\end{aligned}$$

By Assumption 2.1, (A.4), (A.5) and Chebyshev’s inequality, it can be derived that

$$O_{1n} = O_1+o_P(1).$$
(15.17)

Similarly, we obtain

$$O_{2n} = O_2+o_P(1),$$
(15.18)

with

$$\begin{aligned} O_2=E \left\{\sigma_2^2(Y_{2,0})\left(1 + \frac{f_1(Y_{2,0})}{f_2(Y_{2,0})}\right)^2 1_{[a \le Y_{2,0}\le b]}\right\}\end{aligned}$$

and

$$\begin{aligned} O_{2n}= \frac{1}{n_2}\sum_{j=1}^{n_2}\left\{\left(Y_{2,j}-\tilde{\mu}_2(Y_{2,j-1})\right)^2\left(1+\frac{\hat{f}_1(Y_{2,j-1})}{\hat{f}_2(Y_{2,j-1})}\right)^2 1_{[a \le Y_{2,j-1}\le b]} \right\}.\end{aligned}$$

From (15.17), (15.18) and the fact that \(O_1,\, O_2\) are some positive constants, we have

$$\begin{aligned} \frac{\hat{\tau}_n^2(b)- \hat{\tau}_n^2(a)}{\tau_n^2(b)-\tau_n^2(a)} =\frac{q_1(O_1+o_P(1))+q_2(O_2+o_P(1))}{q_1O_1+q_2O_2} \rightarrow_P 1,\end{aligned}$$

which completes the proof of the corollary. ☐

Remark 2.1

An example of estimates \(\tilde{\mu}_i\) and \(\hat{f}_i\) satisfying Assumption 2.1 is \(\tilde{\mu}_i= \hat{\mu}_i\) of (15.12) and

$$\hat{f}_i(x)= \frac{1}{n_i}\,\sum_{j=1}^{n_i}K_{h_i}(Y_{i,j-1}-x), i=1,2,$$
(15.19)

with \(h_1, h_2\) being appropriate bandwidths, which may differ from those used to construct \(\hat{\mu}_i\) in \(U_n\) of (15.6). For example, here one can take \(h_i=O(n_i^{-1/5})\); see Bosq (1998). To construct \(\hat{\mu}_i\) in \(U_n\) of (15.6), however, the bandwidths must satisfy (A.3).

Remark 2.2 Testing property of \(\hat{T}\):

Under the model (15.1), consider the following alternative that is the same as in (15.2):

$$\begin{aligned} H_a:\quad \mu_1(x)- \mu_2(x) = {\delta}(x) \neq 0 \quad \mbox{for some } x\in [a,b],\end{aligned}$$

where \({\delta}\) is continuous on \([a,b]\) since \(\mu_1, \, \mu_2\) are continuous.

Theorem 2.1 and its corollary suggest rejecting the null hypothesis for large values of \(\hat{T}\) given in (15.11), provided Assumption 2.1 holds.

Let

$$\begin{aligned} \hat{T}(t) = \frac{N^{1/2}}{\sqrt{\hat{\tau}_n^2(b)}} \,U_n(t), \qquad \hat{T}_1(t)=\frac{N^{1/2}}{\sqrt{\hat{\tau}_n^2(b)}}(U_n(t)-W_n(t))\end{aligned}$$

Then,

$$\begin{aligned}\hat{T}(t)= \hat{T}_1(t) + h(t), h(t)= \frac{N^{1/2}}{\sqrt{\hat{\tau}_n^2(b)}}W_n(t)\end{aligned}$$

By the ergodic theorem,

$$\begin{aligned} W_n(t) &\rightarrow_P \int_{a}^{t} {\delta}(x)\left(f_1(x)+f_2(x)\right)\, dx,\nonumber\\ h(t) &\sim \frac{N^{1/2}}{\sqrt{\hat{\tau}_n^2(b)}} \, \int_{a}^{t} {\delta}(x)\left(f_1(x)+f_2(x)\right)\, dx.\end{aligned}$$
(15.20)

This, together with \(\frac{N^{1/2}}{\sqrt{\tau_n^2(b)}} \rightarrow \infty\), (15.9), and the fact that \(\int_{a}^{t} {\delta}(x)\left(f_1(x)+f_2(x)\right)\, dx\) is nonzero for some \(a\le t \le b\), implies

$$\sup_{a\le t \le b}|h(t)| \rightarrow_P \infty.$$
(15.21)

Hence, in view of (15.16) and (15.9),

$$\hat{T} = \sup_{a\le t \le b} | \hat{T}(t)|= \sup_{a\le t \le b}| \hat{T}_1(t)+h(t)| \rightarrow_P \infty.$$
(15.22)

This, together with Corollary 2.1, indicates that the test based on \(\hat{T}\) is consistent against \(H_a\).

Note:

By using the same arguments as above, we can even claim that, under Assumption 2.1, the test based on \(\hat{T}\) is consistent against alternatives converging to the null hypothesis at any rate \(\alpha_n\) slower than \(N^{-1/2}\), that is, with \(N^{1/2}\alpha_n\rightarrow\infty\), since (15.21) is still satisfied when \({\delta}(x)\) is replaced by \({\delta}(x)\alpha_n\). Furthermore, under \(H_1: \,\,\mu_1(x)-\mu_2(x)=\frac{\sqrt{\tau_n^2(b)}}{N^{1/2}} {\delta}(x),\ x \in [a,b],\) the limiting power of the asymptotic level-\(\alpha\) test based on \(\hat{T}\) is computed as

$$\begin{aligned} \lim_{n_1\wedge n_2\rightarrow \infty}P(\hat{T}> b_{\alpha}) = P\left(\sup_{a\le t\le b} \left|B\circ\varphi(t)+g(t)\right|>b_{\alpha} \right),\end{aligned}$$

where \(g(t)=\int_a^t {\delta}(x)\left(f_1(x)+f_2(x)\right)\,dx\) and \(b_{\alpha}\) is defined such that

$$\begin{aligned} P\left(\sup_{0\le t\le 1} \left|B(t)\right|>b_{\alpha} \right)= \alpha.\end{aligned}$$

15.3 Simulation

In this section, we investigate the finite sample level of the proposed test \(\hat{T}\) under \(H_0\) and its power against some nonparametric alternatives. We choose the moderate sample sizes \(n_1=n_2=n=50,\, 100,\, 300,\) \(600, \, 1{,}000\), and 2,000, with each simulation repeated 1,000 times. The data are simulated from model (15.1), where the two autoregressive functions are chosen to be \(\mu_2(x)=1+x/2\) and \(\mu_1(x)=\mu_2(x)+{\delta}(x)\), and the innovations \(\{\varepsilon_{1,i}\}\) and \(\{\varepsilon_{2,i}\}\) are taken to be independent standard normal \({\mathcal{N}}(0,1)\). We choose \({\delta}(x)=0\), corresponding to \(H_0\), and \({\delta}(x)= 1\), \(2(x-3)/x\), and \(2\left(\frac{n_1n_2}{n_1+n_2}\right)^{-1/2}=2~N^{-1/2}\), corresponding to \(H_a\). Note that the second choice of \({\delta}\) is negative for \(x<3\), positive for \(x>3\), and converges to 0 as \(x\rightarrow 3\); the last choice of \({\delta}\) corresponds to local alternatives of the same order as \(\sqrt{\tau_n^2(b)}/N^{1/2}\), with \(\tau_n^2\) of (15.7). For simplicity, the conditional variance functions \(\sigma_1\) and \(\sigma_2\) are chosen to be (i) \(\sigma_1(x)=\sigma_2(x)=1\) and (ii) \(\sigma_1(x)=\sigma_2(x)=3/\sqrt{1+x^2}\). Finally, the interval \([a,\,b]\) in (15.2) is taken to be \([2,\,4]\).

Table 15.1 The critical values \(b_\alpha\)

To construct the test statistic \(\hat{T}\) of (15.11), we consider the Nadaraya–Watson estimators \(\hat{\mu}_1,\,\hat{\mu}_2\) of (15.12) with kernel \(K(x)=\frac{3}{4}(1-x^2)\mathbf{1}(|x|\le 1)\) and three different bandwidths \(h_1=h_2=0.15\), 0.2, and 0.25. The estimates \(\tilde{\mu}_1,\, \tilde{\mu}_2\) and \(\hat{f}_1,\, \hat{f}_2\) in \(\hat{\tau}_n\) of (15.10) are those of Remark 2.1 with \(h_i=n_i^{-1/5},\,i=1,2\). Let \(b_\alpha\) satisfy \(P\big(\sup_{0\le t\le 1}|B(t)|>b_\alpha \big) =\alpha\). Then, the empirical size (power) is computed as the proportion of rejections \(\frac{\#\{\hat{T}>b_\alpha\}}{1000}\).

In Table 15.1, we give the critical values \(b_\alpha\) obtained from the formula \(P\left(\sup_{0\le t\le 1}|B(t)|<b \right)= P(\,|B(1)|<b\,)+ 2\sum_{i=1}^\infty (-1)^i P(\,(2i-1)b<B(1)<(2i+1)b\,)\) given on page 553 of the book by Resnick (1992).
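A short sketch of how such critical values can be computed numerically from this expansion, truncating the series at a finite number of terms, is the following.

```r
## P( sup_{0<=t<=1} |B(t)| < b ) via the series quoted above, truncated at `terms` terms.
p_sup_absBM <- function(b, terms = 100) {
  i <- 1:terms
  (pnorm(b) - pnorm(-b)) +
    2 * sum((-1)^i * (pnorm((2 * i + 1) * b) - pnorm((2 * i - 1) * b)))
}

## Critical value b_alpha solving P( sup |B(t)| > b_alpha ) = alpha.
b_alpha <- function(alpha) {
  uniroot(function(b) p_sup_absBM(b) - (1 - alpha), interval = c(0.1, 10))$root
}

b_alpha(0.05)    # roughly 2.24, cf. Table 15.1
b_alpha(0.025)   # roughly 2.50
```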

The simulation programming was done using R. To generate each of the two samples, we first generated \((500+n)\) error variables from \({\mathcal{N}}(0,1)\). Using these errors and model (15.1) with the initial value \(Y_{i,0}\) randomly chosen from \({\mathcal{N}}(0,1)\), we generated \((501+n)\) observations. The last \((n+1)\) observations from the data thus generated are used in carrying out the simulation study.
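The sketch below mirrors this data-generation scheme (a burn-in of 500 observations is discarded) and computes the proportion of rejections over Monte Carlo replicates, reusing the functions from the earlier sketches; it only illustrates the mechanics of the study and is not meant to reproduce Table 15.2.

```r
## One sample of length n+1 from model (15.1), after a burn-in of 500 observations.
sim_burnin <- function(n, mu, sig) {
  y <- rnorm(1)                               # initial value Y_{i,0} ~ N(0,1)
  eps <- rnorm(500 + n)                       # (500 + n) innovations
  for (t in 1:(500 + n)) y[t + 1] <- mu(y[t]) + sig(y[t]) * eps[t]
  tail(y, n + 1)                              # keep the last (n + 1) observations
}

## Empirical rejection rate of the test based on T_hat at level alpha.
reject_rate <- function(nrep, n, mu1, mu2, sig, a = 2, b = 4, h = 0.2, alpha = 0.05) {
  crit  <- b_alpha(alpha)
  tgrid <- seq(a, b, length.out = 201)
  N <- n * n / (n + n)
  mean(replicate(nrep, {
    Y1 <- sim_burnin(n, mu1, sig); Y2 <- sim_burnin(n, mu2, sig)
    e1 <- Y1[-1] - nw_est(Y1[-length(Y1)], Y2, h)
    e2 <- Y2[-1] - nw_est(Y2[-length(Y2)], Y1, h)
    Un <- sapply(tgrid, U_n, a = a, Y1 = Y1, Y2 = Y2, e1 = e1, e2 = e2)
    That <- sqrt(N) * max(abs(Un)) / sqrt(tau2_hat(a, b, Y1, Y2, n^(-1/5), n^(-1/5)))
    That > crit
  }))
}

## Example: empirical size under H_0 with mu_1 = mu_2 = 1 + x/2 and sigma_1 = sigma_2 = 1.
mu0 <- function(x) 1 + x / 2
reject_rate(200, 300, mu0, mu0, function(x) 1)
```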

The results of the simulation study are shown in Table 15.2 below. Three rows correspond to each choice of \({\delta}(x)\), with the first row corresponding to bandwidth 0.15, the second to 0.2, and the third to 0.25. The finite sample level and power behavior of the test is quite stable across the various choices of the bandwidth. One sees that for both choices of \(\sigma_1\) and \(\sigma_2\), the empirical sizes of the test are not far from the nominal levels for most moderate sample sizes, and they get closer to the nominal levels as the sample size increases. The simulated powers under the fixed alternative \({\delta}(x)=1\) are close to 1 for all moderate sample sizes, even at level \(\alpha=.025\). The simulated powers under the fixed alternative \({\delta}(x)=2(x-3)/x\) increase quickly with n and are quite large for \(n\geq 600\). The simulated powers under the local alternative \({\delta}(x)=2~N^{-1/2}\) are stable for most moderate sample sizes. In summary, the simulated levels and powers are consistent with the asymptotic theory at most of the moderate sample sizes considered.

Table 15.2 Proportion of rejections \(\{\hat{T}>2.24241 \,(2.49771)\}\) at level \(\alpha=.05 \,(.025)\) for (i) \(\sigma_1=1=\sigma_2\) and (ii) \(\sigma_1(x)=3/\sqrt{1+x^2}=\sigma_2(x)\). Three rows correspond to each choice of \({\delta}(x)\), with the first row corresponding to bandwidth 0.15, the second to 0.2, and the third to 0.25

15.4 Properties of Kernel Smoothers and Weak Convergence of Empirical Processes

In this section, we first study the asymptotic behavior of the following kernel smoothers over \([a,b]\) for \(i=1,2\):

$$\begin{aligned}\hat{f}_i(x)&=\frac{1}{n_i}\sum_{j=1}^{n_i} K_{h_i}(Y_{i, j-1}-x),\!\! \Lambda_{i,n}(x) =\frac{1}{n_i}\sum_{j=1}^{n_i} \sigma_i(Y_{i,j-1})\varepsilon_{i,j}K_{h_i}(Y_{i,j-1}-x),\end{aligned}$$
(15.23)
$$\begin{aligned} \Psi_i(x,y)&=\frac{1}{n_i}\sum_{j=1}^{n_i} \frac{K_{h_i}(Y_{i,j-1}-x)}{\psi_i(Y_{i,j-1})}1_{[Y_{i,j-1}\le y]}, \qquad \psi_1=f_2,\ \psi_2=f_1,\end{aligned}$$
(15.24)
$$\begin{aligned} \Gamma_1(y)&=\frac{1}{n_1}\sum_{i=1}^{n_1} \varepsilon_{1,i}\Psi_2(Y_{1,i-1},y),\,\, \Gamma_2(y)=\frac{1}{n_2}\sum_{i=1}^{n_2} \varepsilon_{2,i}\Psi_1(Y_{2,i-1},y),\end{aligned}$$
(15.25)
$$\begin{aligned}H_{i,j}(y)&=\frac{1}{n_i}\sum_{k=1}^{n_i} K_{h_i}(Y_{i,k-1}-y)(Y_{i,k-1}-y)^j, i, j=1,2.\end{aligned}$$
(15.26)

By Lemmas 1–4 in Li (2008), we have the following results.

Lemma 4.1

Suppose conditions (A.2), (A.3) and (A.4)–(A.8) hold. Then,

$$\begin{aligned} \sup_{a\le x\le b}\left| \Lambda_{i,n}(x) \right| =O_P\left(\sqrt{\frac{\log n_i}{n_ih_i}}\right), i=1,2,\end{aligned}$$

where \(\Lambda_{i,n}(x)\) is given in (15.23).

Lemma 4.2

Suppose conditions (A.2), (A.3), (A.4), (A.6) and (A.8) hold. Then \(\hat{f}_i\) of (15.23) satisfies

$$\begin{aligned} \sup_{a\le x\le b}| \hat{f}_i(x)-f_i(x)|=O_P\left(\sqrt{\frac{\log n_i}{n_ih_i}}\right) +O(h_i^2),\quad i=1,2.\end{aligned}$$

Lemma 4.3

Suppose conditions (A.2), (A.3), (A.4), (A.6) and (A.8) hold. Then, \(\Psi_i(x,y)\) of (15.24) satisfies

$$\begin{aligned} \sup_{\begin{array}{c} a-h_i \le x\le b+h_i\\ a\le y\le b\\\end{array}} \textit{Var}\{\Psi_i(x,y)\}=O(\frac{1}{n_ih_i}), \qquad i=1,2.\end{aligned}$$

Lemma 4.4

Suppose conditions (A.2)–(A.4), (A.6) and (A.8) hold. Then \(H_{i,j}\) of (15.26) satisfies

$$\begin{aligned} H_{i,j}(y) &=h_i^j f_i(y) u_j +O_P\left(h_i^{j+1}+h_i^j\sqrt{\frac{\log n_i}{n_ih_i}}\right),\\ u_j &=\int_{-1}^1~K(u)u^j\,du,\qquad i,j=1,2,\end{aligned}$$

uniformly on \(a\le y \le b\).

Next, we study the properties of some empirical processes. The weak convergence of marked empirical processes proved in Theorem 2.2.6 of Koul (2002) implies the following lemma:

Lemma 4.5

Suppose conditions (A.4), (A.6), (A.7) and (A.9) hold. Then, for \(i=1,2\),

$$\begin{aligned} \sup_{a\le y\le b} \left| \frac{1}{n_i}\sum_{j=1}^{n_i} (|\varepsilon_{i,j}|-E|\varepsilon_{i,j}|)1_{[Y_{i,j-1}\le y]} \right|=O_P(n_i^{-1/2}).\end{aligned}$$

Next, recall that \({\mathcal K}(y)=\int_{-1}^y K(t)\,dt\) is the distribution function corresponding to the kernel density K(y) on \([-1,\,1]\). We prove the following lemma:

Lemma 4.6

Suppose conditions (A.2), (A.3), (A.4), (A.6) and (A.8) hold. Then \(\Gamma_i\) of (15.25) satisfies

$$\begin{aligned} \sup_{a\le y\le b}\left| N^{1/2}\left(\Gamma_1(y)-\frac{1}{n_1}\sum_{i=1}^{n_1} \varepsilon_{1,i}\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}{\mathcal K}\left(\frac{y-Y_{1,i-1}}{h_2}\right)\right)\right|&=o_P(1),\\ \sup_{a\le y\le b}\left| N^{1/2}\left(\Gamma_2(y)-\frac{1}{n_2}\sum_{i=1}^{n_2} \varepsilon_{2,i}\frac{f_1(Y_{2,i-1})}{f_2(Y_{2,i-1})}{\mathcal K}\left(\frac{y-Y_{2,i-1}}{h_1}\right)\right)\right|&=o_P(1).\end{aligned}$$

Proof:

The proof is similar to that of Lemma 1 of Li (2008), which is in turn similar to that of Lemma 6.1 of Fan and Yao (2003). It is sufficient to prove the first equality. Let C denote a generic constant, which can vary from one place to another. Also let

$$\begin{aligned} N^{1/2}\left(\Gamma_1(y)-\frac{1}{n_1}\sum_{i=1}^{n_1} \varepsilon_{1,i}\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}{\mathcal K} \left(\frac{y-Y_{1,i-1}}{h_2}\right)\right)=A_n(y)\end{aligned}$$

Now, decompose \(A_n(y)\) into \(A_{1,n}(y)+A_{2,n}(y)\) with

$$\begin{aligned} A_{1,n}(y) &=N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1} \varepsilon_{1,i}\left(\Psi_2(Y_{1,i-1},y)-E(\Psi_2(Y_{1,i-1},y))\right),\\ A_{2,n}(y) &=N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1} \varepsilon_{1,i}\left(E(\Psi_2(Y_{1,i-1},y))-\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}{\mathcal K}\left(\frac{y-Y_{1,i-1}}{h_2}\right)\right).\end{aligned}$$

First, we show \(\sup_{a\le y\le b}|A_{2,n}(y)|=o_P(1)\). For some \(Y_{1,i-1}^*\in [a-2h_2,\,b+2h_2]\), by Taylor expansion, we have

$$\begin{aligned} &A_{2,n}(y)\\ &= N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\!\left(\int_{-1}^{\frac{y-Y_{1,i-1}}{h_2}} K(u)\left(\frac{f_2(Y_{1,i-1}+h_2u)}{f_1(Y_{1,i-1}+h_2u)}-\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}\right)\textit{du}\!\right)\\ &= N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\!\left(\!h_2\int_{-1}^{\frac{y-Y_{1,i-1}}{h_2}}\!\!\!u K(u) \frac{f_2^\prime(Y_{1,i-1})f_1(Y_{1,i-1})-f_1^\prime(Y_{1,i-1})f_2(Y_{1,i-1})}{f_1(Y_{1,i-1}+h_2u)f_1(Y_{1,i-1})}\,\textit{du}\right.\\ & + \left.\frac{h_2^2}{2}\int_{-1}^{\frac{y-Y_{1,i-1}}{h_2}}u^2~K(u) \frac{f_2^{\prime\prime}(Y_{1,i-1}^*)f_1(Y_{1,i-1})-f_1^{\prime\prime}(Y_{1,i-1}^*)f_2(Y_{1,i-1})}{f_1(Y_{1,i-1}+h_2u)f_1(Y_{1,i-1})}\,\textit{du}\right)\\ &\le N^{1/2}h_2\frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\int_{-1}^{\frac{y-Y_{1,i-1}}{h_2}}u K(u) \frac{f_2^\prime(Y_{1,i-1})f_1(Y_{1,i-1})-f_1^\prime(Y_{1,i-1})f_2(Y_{1,i-1})}{f_1(Y_{1,i-1}+h_2u)f_1(Y_{1,i-1})}\,\textit{du}\\ &\quad+ N^{1/2}h_2^2\frac{1}{n_1}\sum_{i=1}^{n_1}|\varepsilon_{1,i}|\cdot C,\qquad \mbox{by (A.2) and (A.4)},\end{aligned}$$

uniformly over \([a,\, b]\).

By an argument similar to that used in proving Lemma 4.1 or Lemma 1 of Li (2008), it can be shown that

$$\begin{aligned} \sup_{a\le y \le b}&\left| \frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\int_{-1}^{\frac{y-Y_{1,i-1}}{h_2}}u K(u) \frac{f_2^\prime(Y_{1,i-1})f_1(Y_{1,i-1})-f_1^\prime(Y_{1,i-1})f_2(Y_{1,i-1})}{f_1(Y_{1,i-1}+h_2u)f_1(Y_{1,i-1})}\,\textit{du}\right|\\ &=O_P\left(\sqrt{\frac{\log n_1}{n_1h_2}}\right).\end{aligned}$$

Also, \(N^{1/2}h_2^2\frac{1}{n_1}\sum_{i=1}^{n_1}|\varepsilon_{1,i}|=O_P(N^{1/2}h_2^2)=o_P(1)\) by (A.3). Hence, by (A.3), we have

$$\begin{aligned}\sup_{a\le y \le b}|A_{2,n}(y)|&=O_P\left(\sqrt{q_1h_2\log n_1} \right)+o_P(1)\nonumber\\ &=O_P\left(\sqrt{q_1h_2\log\frac{N}{q_1}} \right)+o_P(1)=o_P(1).\end{aligned}$$
(15.27)

It remains to prove

$$\sup_{a\le y \le b}|A_{1,n}(y)|=o_P(1).$$
(15.28)

The proof, which is slightly simpler than those of Lemma 1 in Li (2008) and Lemma 6.1 in Fan and Yao (2003), consists of the following two steps:

(a):

(Discretization). Partition the interval \([a,\, b]\), of length \(L=b-a\), into \(M=[(N^{1+c})^{1/2}]\) subintervals \(\{I_k\}\) of equal length, and let \(y_k\) denote the center of \(I_k\). Then

$$\sup_{a\le y\le b}|A_{1,n}(y)|\le \underset{1\le k\le M}{\max} |A_{1,n}(y_k)| +o_P(1).$$
(15.29)
(b):

(Maximum deviation for discretized series). For any small ϵ,

$$P\left(\underset{1\le j\le M}{\max}|A_{1,n}(y_j)|>\epsilon\right)\rightarrow 0.$$
(15.30)

Let \(G_{i,n}(y)=\sqrt{n_i}\left(\frac{1}{n_i}\,\sum_{j=1}^{n_i}1_{[Y_{i,j-1}\le y]}-P(Y_{i,j-1}\le y)\right)\). The strong approximation theorem for the empirical process of a stationary sequence of strong mixing random variables, established in Berkes and Philipp (1997) and in Theorem 4.3 of the monograph edited by Dehling et al. (2002), implies

$$\sup_{1\le k\le M}\sup_{y\in I_k} \left|G_{1,n}(y)-G_{1,n}(y_k)\right|=o_P(1).$$
(15.31)

Now, we prove part (a). First, for any \(1 \le k\le M\) and all \(y\in I_k\), we decompose \(A_{1,n}(y)-A_{1,n}(y_k)\) as \(D_{1,n}(y)+ D_{2,n}(y)\) with

$$\begin{aligned} D_{1,n}(y)&=N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\left(\Psi_2(Y_{1,i-1},y)-\Psi_2(Y_{1,i-1},y_k))\right),\\ D_{2,n}(y)&=N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\left(E(\Psi_2(Y_{1,i-1},y))-E(\Psi_2(Y_{1,i-1},y_k))\right),\end{aligned}$$

Without loss of generality, it suffices to consider \(y\in I_k\) with \(y\ge y_k\). It is easy to see that

$$\begin{aligned} &|D_{1,n}(y)|\\ &\le C N^{1/2}\left(\frac{1}{n_1}\sum_{i=1}^{n_1}|\varepsilon_{1,i}|1_{[y_k-h_2 \le Y_{1,i-1}\le y+h_2]}\right) \left(\frac{1}{n_2 h_2}\sum_{i=1}^{n_2}1_{[y_k \le Y_{2,i-1}\le y]}\right)\\ &\le C N^{1/2}\!\left(\frac{1}{n_1}\sum_{i=1}^{n_1}(|\varepsilon_{1,i}|-E|\varepsilon_{1,i}|)1_{[y_k-h_2 \le Y_{1,i-1}\le y+h_2]}+\frac{1}{n_1}\sum_{i=1}^{n_1}1_{[y_k-h_2 \le Y_{1,i-1}\le y+h_2]}\!\right)\\ &\qquad \left(\frac{1}{h_2}\left[\frac{1}{\sqrt{n_2}}(G_{2,n}(y)-G_{2,n}(y_k))+P(y_k\le Y_{2,i-1}\le y)\right]\right)\\ & = C\frac{N^{1/2}}{h_2}\Big (O_P\left(\frac{1}{\sqrt{n_1}}\right)+\frac{1}{\sqrt{n_1}}(G_{1,n}(y+h_2)-G_{1,n}(y_k-h_2))\\ &\quad+P(y_k-h_2\le y_{1, i-1}\le y+h_2)\Big)\!\left(\!{o}_{P}(\frac{1}{\sqrt{n_2}})+O_P(\frac{1}{M})\!\right) \mbox{by Lemma 4.5, (15.31)}\\ & = C\frac{N^{1/2}}{h_2}\left(O_P\left(\frac{1}{\sqrt{n_1}}\right) +O_P(h_2) \right) \left(o_P(\frac{1}{\sqrt{n_2}})+O_P(\frac{1}{M})\right) \,\mbox{again by (15.31)}\\ &= o_P(1),\end{aligned}$$

and similarly,

$$\begin{aligned} |D_{2,n}(y)| & \le C N^{1/2}\left(\frac{1}{n_1}\sum_{i=1}^{n_1}|\varepsilon_{1,i}|1_{[y_k-h_2 \le Y_{1,i-1}\le y+h_2]}\right) \left(\frac{1}{h_2}P(y_k \le Y_{2,i-1}\le y)\right)\\ & = C\frac{N^{1/2}}{h_2}O_P(h_2)O_P\left(\frac{1}{N^{1/2+c/2}}\right) = o_P(1).\end{aligned}$$

Hence, we have

$$\begin{aligned} \sup_{1\le k\le M} \sup_{y\in I_k} |A_{1,n}(y)-A_{1,n}(y_k)|=o_P(1).\end{aligned}$$

This proves part (a). Next,

$$\begin{aligned} &P(\underset{1\le k\le M}{\max}|A_{1,n}(y_k)|>\epsilon) \le M \underset{1\le k\le M}{\max}E\left(A_{1,n}^2(y_k)\right)/\epsilon^2\\ &\quad= N^{1/2+c/2} N \frac{1}{n_1}O\left(\frac{1}{n_2 h_2}\right) \, \rightarrow 0, \qquad \mbox{by Lemma 4.3 and (A.3)}.\end{aligned}$$

This proves part (b) and hence finishes the proof of (15.28) and the Lemma. ☐

15.5 Proofs

Here, we shall give the proof of our main result, Theorem 2.1. The lemmas proved in Sect. 15.4 will facilitate the proof of this theorem. As usual, let C be a generic constant. It suffices to prove (15.15) and (15.16). Now consider \(N^{1/2} U_n(t)\) for all \(a\le t\le b\). We decompose \(N^{1/2} U_n(t)\) as \(B_{1,n}(t)-B_{2,n}(t)\) with

$$\begin{aligned} B_{1,n}(t) &=N^{1/2}\frac{1}{n_1}\sum_{i=1}^{n_1}(Y_{1, i}-\hat{\mu}_2(Y_{1,i-1}))1_{[a \le Y_{1,i-1}\le t]},\\ B_{2,n}(t) &=N^{1/2}\frac{1}{n_2}\sum_{i=1}^{n_2}(Y_{2, i}-\hat{\mu}_1(Y_{2,i-1}))1_{[a \le Y_{2,i-1}\le t]}.\end{aligned}$$

We first consider \(B_{1,n}(t)\). Recall definitions (15.12) and (15.23)–(15.25). By decomposition and simple algebra, we rewrite \(B_{1,n}(t)\) as \(I(t)-\textit{II}(t)+\textit{III}(t)\) with

$$\begin{aligned}\!\!\textit{I}(t) &= \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}(\varepsilon_{1,i}\sigma_1(Y_{1,i-1})+(\mu_1(Y_{1,i-1})-\mu_2(Y_{1,i-1}))) {\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\end{aligned}$$
(15.32)
$$\begin{aligned} \!\!\textit{II}(t) &= \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}\frac{\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1})K_{h_2}(Y_{2,j-1}-Y_{1,i-1})}{n_2 \hat{f}_2(Y_{1,i-1})}{\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\end{aligned}$$
(15.33)
$$\begin{aligned} \!\!\textit{III}(t) &= \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}\frac{\sum_{j=1}^{n_2} (\mu_2(Y_{1,i-1})-\mu_2(Y_{2,j-1})) K_{h_2}(Y_{2,j-1}-Y_{1,i-1})}{n_2 \hat{f}_2(Y_{1,i-1})}{\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\end{aligned}$$
(15.34)

Now we consider \(\textit{II}(t)\). By decomposition, we rewrite it as \(\textit{II}_1(t)+\textit{II}_2(t)\) with

$$\begin{aligned}\textit{II}_1(t) &= N^{1/2}\frac{1}{n_2}\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1})(\Psi_1(Y_{2,j-1},t)-\Psi_1(Y_{2,j-1},a))\nonumber\\ &=N^{1/2}\frac{1}{n_2}\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1})\frac{f_1(Y_{2,j-1})}{f_2(Y_{2,j-1})}\left({\mathcal K}\left(\frac{t-Y_{2,j-1}}{h_1}\right)\right.\nonumber\\ &\quad-\left.{\mathcal K}\left(\frac{a-Y_{2,j-1}}{h_1}\right)\right) +o_P(1),\end{aligned}$$
(15.35)

uniformly on \(a\le t\le b\), by Lemma 4.6 and its proof. Also, uniformly on \(a \le t \le b\),

$$\begin{aligned} \textit{II}_2(t) &= \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}\frac{\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1}) K_{h_2}(Y_{2,j-1}-Y_{1,i-1})}{n_2 f_2(Y_{1,i-1})}{\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\\ &\frac{f_2(Y_{1,i-1})- \hat{f}_2(Y_{1,i-1})}{\hat{f}_2(Y_{1,i-1})}\\ & \le C \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1} \left|\frac{1}{n_2}\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1}) K_{h_2}(Y_{2,j-1}-Y_{1,i-1})\right| {\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\\ & \;\cdot \sup_{a\le Y_{1, i-1}\le t}\left| \frac{f_2(Y_{1,i-1})- \hat{f}_2(Y_{1,i-1})}{\hat{f}_2(Y_{1,i-1})}\right|\\ & = C N^{1/2} \textit{O}_{\textit{P}}\!\left(\!\sqrt{\frac{\log{n_2}}{n_2 h_2}}\right)\cdot \left(\!\textit{O}_{\textit{P}}\!\left(\!\sqrt{\frac{\log{n_2}}{n_2 h_2}}\right)+O_P(h_2^2)\!\right)\!{,}\,\mbox{by Lemma 4.1 and 4.2}\\ & = o_P(1),\qquad \mbox{by (A.3)}.\end{aligned}$$

Next, we consider \(\textit{III}(t)\). Let \(\mu_2^{(1)}\) denote the first derivative of μ2. Then, by (A.1), uniformly on \(a\le t \le b\),

$$\begin{aligned}\textit{III}(t) &\le \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}\frac{\mu_2^{(1)}(Y_{1,i-1}) |H_{2,1}(Y_{1,i-1})|+C|H_{2,2}(Y_{1,i-1})|}{\hat{f}_2(Y_{1,i-1})}{\mathbf 1}_ {[a\le Y_{1,i-1}\le t]}\nonumber\\ &=O_P\left(N^{1/2}\left(h_2^2+h_2\sqrt{\frac{\log n_2}{n_2h_2}}\right)\right) = o_P(1),\quad\mbox{by Lemma 4.4 and (A.3)}.\end{aligned}$$
(15.36)

Hence, by (15.32) and (15.35)–(15.36), we have uniformly on \(a\le t\le b\),

$$\begin{aligned} B_{1,n}(t) &= \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1}(\varepsilon_{1,i}\sigma_1(Y_{1,i-1})+(\mu_1(Y_{1,i-1})-\mu_2(Y_{1,i-1}))) {\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\nonumber\\ &\quad-\frac{N^{1/2}}{n_2}\sum_{j=1}^{n_2}\varepsilon_{2,j}\sigma_2(Y_{2,j-1})\frac{f_1(Y_{2,j-1})}{f_2(Y_{2,j-1})}\left({\mathcal K}\left(\frac{t-Y_{2,j-1}}{h_1}\right)\right.\nonumber\\ &\quad-\left.{\mathcal K}\left(\frac{a-Y_{2,j-1}}{h_1}\right)\right) +o_P(1)\end{aligned}$$
(15.37)

Similarly, we have uniformly on \(a\le t\le b\),

$$\begin{aligned}B_{2,n}(t) &= \frac{N^{1/2}}{n_2} \sum_{j=1}^{n_2}(\varepsilon_{2,j}\sigma_2(Y_{2,j-1})+(\mu_2(Y_{2,j-1})-\mu_1(Y_{2,j-1}))) {\mathbf 1} _{[a\le Y_{2,j-1}\le t]}\nonumber\\ &\quad-\frac{N^{1/2}}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\sigma_1(Y_{1,i-1})\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})} \left({\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right)\right.\nonumber\\ &\quad-\left.{\mathcal K}\left(\frac{a-Y_{1,i-1}}{h_2}\right)\right) +o_P(1)\end{aligned}$$
(15.38)

By (15.37) and (15.38), (15.15) is proved.

Now, we need to prove (15.16). Applying the CLT for martingales [Hall and Heyde (1980), Corollary 3.1], one first shows that the finite-dimensional distributions of \(\frac{N^{1/2}}{\sqrt{\tau_n^2(b)}}V_n(t)\) tend to the right limit. Then, applying a theorem on weak convergence in function spaces [Hall and Heyde (1980), Theorem A.2], it remains to prove the tightness of \(\frac{N^{1/2}}{\sqrt{\tau_n^2(b)}}V_n(t)\). It suffices to prove the tightness of

$$\begin{aligned} \frac{N^{1/2}}{n_1} \sum_{i=1}^{n_1} \varepsilon_{1,i}\sigma_1(Y_{1,i-1}) {\mathbf 1} _{[a\le Y_{1,i-1}\le t]}\end{aligned}$$

and

$$\begin{aligned} \frac{N^{1/2}}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\sigma_1(Y_{1,i-1})\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}{\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right).\end{aligned}$$

The tightness of the first sequence is implied by the weak convergence of a marked empirical process [Koul and Stute (1999), Lemma 3.1].

Since \({\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right)=1\) for \(Y_{1,i-1}\le t-h_2\) and \({\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right)1_{[t-h_2 \le Y_{1,i-1}\le t+h_2]}\) just behaves like \(h_2 K_{h_2}(t-Y_{1,i-1})\), the second sequence can be rewritten as

$$\begin{aligned} &\frac{N^{1/2}}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\sigma_1(Y_{1,i-1})\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}1_{[Y_{1,i-1}\le t-h_2]}\\ &\quad+\frac{N^{1/2}}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\sigma_1(Y_{1,i-1})\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}{\mathcal K}\left(\frac{t-Y_{1,i-1}}{h_2}\right)1_{[t-h_2 \le Y_{1,i-1}\le t+h_2]},\end{aligned}$$

with the second term being \(o_P(1)\) uniformly on \(a\le t\le b\), by a proof similar to that of Lemma 4.1. Again, by the weak convergence of a marked empirical process [Koul and Stute (1999), Lemma 3.1], we can prove the tightness of \(\frac{N^{1/2}}{n_1}\sum_{i=1}^{n_1}\varepsilon_{1,i}\sigma_1(Y_{1,i-1})\frac{f_2(Y_{1,i-1})}{f_1(Y_{1,i-1})}1_{[Y_{1,i-1}\le t-h_2]}\). This completes the proof of the main theorem.