1 Introduction

Let \((X_1,Y_1)\) and \((X_2,Y_2)\) be two independent random vectors. Assume that they satisfy the general nonparametric regression models,

$$\begin{aligned} Y_k=m_k(X_k)+\sigma _k(X_k)\varepsilon _k, \end{aligned}$$
(1)

where \( m_k(x) = E(Y_k \mid X_k=x) \) is the regression function, \(\sigma _k^2(x) = Var(Y_k \mid X_k=x)\) is the conditional variance function and \(\varepsilon _k\) is the regression error, which is assumed to be independent of \(X_k\), \(k=1,2\). By construction, \(E(\varepsilon _k)=0\) and \(Var(\varepsilon _k)=1\). The regression functions, the variance functions, the error distributions and the covariate distributions are unknown, and no parametric models are assumed for them. We are interested in testing for the equality of the error distributions, that is, in tests of the null hypothesis

$$\begin{aligned} H_0: F_1= F_2, \end{aligned}$$

versus the alternative

$$\begin{aligned} H_1: F_1\ne F_2, \end{aligned}$$

where \(F_1\), \(F_2\) stand for the cumulative distribution function (CDF) of \(\varepsilon _1\) and \(\varepsilon _2\), respectively. In view of the uniqueness of the characteristic function (CF), the null hypothesis can equivalently be stated as

$$\begin{aligned} H_0: C_1=C_2, \end{aligned}$$

versus the alternative

$$\begin{aligned} H_1: C_1 \ne C_2, \end{aligned}$$

where \(C_k\) denotes the CF corresponding to \(F_k\), that is, \(C_k(t)=\int \exp (\mathrm{i}tx)\,dF_k(x)=R_{k}(t)+\mathrm{i}I_{k}(t)\), \(k=1,2\).

The equality of the error distributions is a usual assumption in several statistical problems, such as testing for the equality of regression curves (see, for example, Young and Bowman 1995; Hall and Hart 1990; Kulasekera and Wang 2001). The equality of the error distributions may considerably simplify some procedures. For instance, under \(H_0\), the asymptotic null distribution of the test statistic for the equality of regression functions in Pardo-Fernández et al. (2015a) coincides with that of the classical ANOVA test for comparing means (this also requires the equality of the densities of the covariates). Another example is given by the asymptotic null distribution of the test statistic for the equality of variance functions in Pardo-Fernández et al. (2015b), based on CFs. The authors prove that, when the covariates are identically distributed and the error distributions are equal, the asymptotic distribution coincides with that of the classical Levene test for comparing variances. Thus, the problem of testing \(H_0\) is of considerable practical interest.

The problem of testing whether two samples come from the same population has generated a considerable amount of research. Many different approaches have been proposed to deal with this problem when the data are observable (see, for example, the references in Meintanis 2005; Alba-Fernández et al. 2008; Hobza et al. 2014; Baringhaus and Kolbe 2015; Modarres 2016). In our setting the errors are not observable and the inference must be based on the residuals, which are not independent even if the original data are. The number of proposals dealing with this case is considerably smaller. Mora (2005) proposed tests for \(H_0\) when the regression models are linear; for the more general model given in (1), Pardo-Fernández (2007), PF07 from now on, also proposed tests for \(H_0\). These two papers study Kolmogorov–Smirnov (KS) and Cramér–von Mises (CvM) type test statistics based on the empirical CDFs of the residuals. Since the null distribution of these test statistics is unknown, both papers use a smooth bootstrap to approximate the critical values. These procedures have two main drawbacks: they impose strong conditions on the error distributions, which, among other things, are assumed to have a smooth density; in addition, although quite easy to implement, the bootstrap can become computationally expensive as the sample sizes increase.

This paper proposes a test for \(H_0\) that is based on comparing the empirical CF (ECF) of the residuals in samples from the models. It can be seen as a residual version of the test in Alba-Fernández et al. (2008), designed for the two-sample problem when observable independent and identically distributed (IID) data are available from each population. A weighted bootstrap (WB) estimator, in the sense of Burke (2000), is proposed to approximate the critical values. This method has been previously used in Kojadinovic and Yan (2012) and Ghoudi and Rémillard (2014), to approximate the null distribution of goodness-of-fit tests based on the empirical CDF, in Jiménez-Gamero and Kim (2015), to approximate the null distribution of goodness-of-fit tests based on the ECF, in Quessy and Éthier (2012), for the two-sample problem for dependent data, and in Jiménez-Gamero et al. (2016), for the two-sample problem for observable independent data, among others. In view of the good properties of the WB in these and other papers, it is also expected to work well for approximating the null distribution of the test statistic considered in this paper.

Compared to the tests in PF07, the procedure suggested in this paper has two main advantages. First, it requires less stringent assumptions on the distribution of the regression errors. Specifically, we do not assume that the error distribution has a probability density function, so the method can be applied when that distribution is arbitrary: continuous, discrete or mixed. Secondly, the WB approximation is computationally more efficient than the one based on the smooth bootstrap.

The paper is organized as follows. Section 2 describes the test statistic. The problem of approximating the null distribution of the proposed test statistic is dealt with in Sect. 3, where the use of a WB estimator is studied. The consistency of the resulting null distribution estimator is proved. It is also shown that the resulting test is consistent, in the sense of being able to detect any alternative. Some practical issues are addressed in Sect. 4. Section 5 reports the results of some simulation experiments designed to study the finite sample performance of the proposed approximation and to compare it with other methods, as well as a real data application. From this numerical study it is concluded that the WB approximation works, in the sense of providing levels close to the nominal values, and that the power of the test is comparable to, or even greater than, the power of the test based on the empirical CDF. Section 6 shows how the proposed test can be extended to the general case of comparing \(d>2\) error distributions. All proofs and technical details are deferred to the last section.

The following notation will be used throughout the paper: all vectors are column vectors; the superscript \(^T\) denotes transpose; \(1_n\in \mathbb {R}^n\) has all its components equal to 1; if \(x \in \mathbb {R}^k\), with \(x^T=(x_1, \ldots , x_k)\), then diag(x) is the \(k \times k\) diagonal matrix whose \((i,i)\) entry is \(x_i\), \(1\le i \le k\); \(P_0\), \(E_0\) and \(Cov_0\) denote probability, expectation and covariance, respectively, by assuming that the null hypothesis is true; \(P_*\), \(E_*\) and \(Cov_*\) denote the conditional probability law, expectation and covariance, given the data, respectively; \(\mathop {\rightarrow }\limits ^{\mathcal {L}}\) denotes convergence in distribution; \(\mathop {\rightarrow }\limits ^{P}\) denotes convergence in probability; \(\mathop {\rightarrow }\limits ^{a.s.}\) denotes almost sure convergence; for any complex number \(z=a+\text{ i }b\), |z| is its modulus; an unspecified integral denotes integration over the whole real line \(\mathbb {R}\); for a given non-negative real-valued function \(\omega \), \(\Vert \cdot \Vert _{\omega }\) and \(\langle \cdot , \cdot \rangle _\omega \) denote the norm and the scalar product, respectively, in the Hilbert space \(L^2(\omega )=\{g:\mathbb {R} \rightarrow \mathbb {C}, \, \int |g(t)|^2\omega (t)dt<\infty \}\).

2 The test statistic

Let \((X_{kj},Y_{kj})\), \(j=1,2,\ldots ,n_k\), be an IID sample from \((X_k,Y_k)\) satisfying (1), and let \(\varepsilon _{k1},\ldots ,\varepsilon _{kn_k}\) denote the associated errors, \(k=1,2\). Since the hypothesis \(H_0\) establishes the equality of the distributions of the errors \(\varepsilon _{kj}\) and they are not observable, the inference must be based on the residuals,

$$\begin{aligned} \hat{\varepsilon }_{kj}=\frac{Y_{kj}-\hat{m}_k(X_{kj})}{\hat{\sigma }_k(X_{kj})}, \quad j=1,2,\ldots ,n_k, \end{aligned}$$
(2)

where \(\hat{m}_k\) and \(\hat{\sigma }_k\) are estimators of \({m}_k\) and \(\sigma _k\), respectively, \(k=1,2\). Several choices are possible for \(\hat{m}_k\) and \(\hat{\sigma }_k\). Here, as in PF07, we use the following kernel estimators for the regression function \(m_k\) and the variance function \(\sigma _k^2\),

$$\begin{aligned} \hat{m}_k(x)= \sum _{j=1}^{n_k} W_{kj}(x;h_{k})Y_{kj}, \quad x \in S, \end{aligned}$$
$$\begin{aligned} \hat{\sigma }_k^2(x)=\sum _{j=1}^{n_k} W_{kj}(x;h_{k})\left\{ Y_{kj}-\hat{m}_k(x)\right\} ^2, \quad x \in S, \end{aligned}$$

where

$$\begin{aligned} W_{kj}(x;h_{k})=\frac{K_{h_{k}}\left( X_{kj}-x\right) }{\sum _{s=1}^{n_k} K_{h_{k}}\left( X_{ks}-x\right) }, \quad x \in S, \end{aligned}$$

\(K_{h}(\cdot )=\frac{1}{h}K(\frac{\cdot }{h})\), \(K(\cdot )\) is a kernel and \(h_{k}\) is the bandwidth, satisfying certain conditions that will be specified later. The proposed test statistic takes the form

$$\begin{aligned} T_{n_1,n_2}=\Vert \hat{C}_1-\hat{C}_2\Vert ^2_\omega , \end{aligned}$$

where

$$\begin{aligned} \hat{C}_k(t)=\frac{1}{n_k}\sum _{j=1}^{n_k} \exp (\mathrm{i }t\hat{\varepsilon }_{kj})=\hat{R}_k(t)+\mathrm{i} \hat{I}_k(t), \end{aligned}$$
(3)
$$\begin{aligned} \hat{R}_k(t)=\frac{1}{n_k}\sum _{j=1}^{n_k}\cos (t\hat{\varepsilon }_{kj}),\quad \hat{I}_k(t)=\frac{1}{n_k}\sum _{j=1}^{n_k} \sin (t\hat{\varepsilon }_{kj}), \end{aligned}$$

\(k=1,2,\) and \(\omega (t)\) is a non-negative function.
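For illustration, the residuals in (2) and the ECFs in (3) can be obtained along the following lines. This is a minimal R sketch, assuming the Epanechnikov kernel used later in Sect. 5; the bandwidth h and the grid of t values are left as arguments and are not prescribed by the theory beyond Assumption (A.3).

```r
## Nadaraya-Watson estimators of m_k and sigma_k^2 and the residuals in (2)
epa <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)   # Epanechnikov kernel

nw_residuals <- function(x, y, h) {
  n <- length(x)
  eps <- numeric(n)
  for (j in 1:n) {
    w <- epa((x - x[j]) / h)
    w <- w / sum(w)                    # weights W_kj(x; h_k)
    mhat  <- sum(w * y)                # hat m_k(x_j)
    s2hat <- sum(w * (y - mhat)^2)     # hat sigma_k^2(x_j)
    eps[j] <- (y[j] - mhat) / sqrt(s2hat)
  }
  eps
}

## Empirical CF of the residuals, hat C_k(t), on a grid of t values
ecf <- function(eps, tgrid) {
  complex(real      = sapply(tgrid, function(t) mean(cos(t * eps))),
          imaginary = sapply(tgrid, function(t) mean(sin(t * eps))))
}
```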

In order to study properties of \(T_{n_1,n_2}\) some assumptions will be required. Next we list them.

  1. (A.1)

    All limits in this paper are taken when \(n_1,n_2 \rightarrow \infty \) in such a way that

    $$\begin{aligned} \lim \frac{n_1}{N}=\tau , \quad \text {for some} \quad \tau \in (0,1), \end{aligned}$$

    where \(N=n_1+n_2\).

  2. (A.2)

    The weight function \(\omega (t)\) is a non-negative symmetric function,

    $$\begin{aligned} \omega (t)=\omega (-t), \quad \forall t, \end{aligned}$$

    and \(\int t^4 \omega (t)dt<\infty \).

There is no loss of generality in assuming that the weight function \(\omega (t)\) is symmetric because otherwise, by defining \(\omega _1(t)=0.5\{\omega (t)+\omega (-t)\}\), which is clearly symmetric, we have that

$$\begin{aligned} \Vert C_1-C_2\Vert _{\omega }=\Vert C_1-C_2\Vert _{\omega _1}, \end{aligned}$$

for any two CFs \(C_1\) and \(C_2\). Note that the symmetry of \(\omega \) implies that

$$\begin{aligned} T_{n_1,n_2}=\Vert \hat{R}_{1}-\hat{R}_{2}+\hat{I}_{1}-\hat{I}_{2}\Vert ^2_\omega . \end{aligned}$$

The following assumption will be required to ensure that \(\hat{m}_k\) and \(\hat{\sigma }_k\) provide consistent estimators of \(m_k\) and \(\sigma _k\), respectively.

  1. (A.3)

    For \(k=1,2\),

    1. (i)

      \(E(\varepsilon _{k}^4)<\infty \).

    2. (ii)

      \(X_k\) is absolutely continuous with compact support S and density function \(f_{k}\).

    3. (iii)

      \(f_k, m_k\) and \(\sigma _k\) are twice continuously differentiable.

    4. (iv)

      \(\inf _{x\in S}f_k>0\) and \(\inf _{x\in S}\sigma _k>0\).

    5. (v)

      \(n_kh_{k}^4\rightarrow 0\) and \(n_kh_{k}^2/\ln n_k \rightarrow \infty \).

    6. (vi)

      K is a symmetric density function with compact support and twice continuously differentiable.

For simplicity we assume that the same kernel function, K, is used in both populations. Nevertheless, the results to be stated remain true if different kernels are used, whenever they satisfy Assumption (A.3)(vi).

In order to give a sound justification of \(T_{n_1,n_2}\) as a test statistic for testing \(H_0\) we next derive its limit.

Theorem 1

Suppose that Assumptions (A.1)–(A.3) hold, then \(T_{n_1,n_2}\mathop {\rightarrow }\limits ^{P} \kappa =\Vert C_1-C_2\Vert ^2_{\omega }.\)

Note that \(\kappa \ge 0\). If \(H_0\) is true then \(\kappa =0\). Since two distinct characteristic functions can coincide on a finite interval (Feller 1971, p. 506), a general way to ensure that \(\kappa =0\) if and only if \(H_0\) is true is to take \(\omega \) positive for almost all (with respect to the Lebesgue measure) points in \(\mathbb {R}\). Thus, a reasonable test for \(H_0\) should reject the null hypothesis for large values of \(T_{n_1,n_2}\). Now, in order to decide which values are large, we must calculate its null distribution, or at least an approximation to it. This is the topic of the next section.

3 Approximating the null distribution

The null distribution of \(T_{n_1,n_2}\) is clearly unknown, so it must be approximated. We first try to estimate it by means of its asymptotic null distribution.

Let

$$\begin{aligned} R'_k(t)=-\int x \sin (tx)dF_k(x), \quad I'_k(t)= \int x \cos (tx)dF_k(x), \end{aligned}$$

\(k=1,2\). Note that under the null hypothesis \(R_1(t)=R_2(t)=R(t)\), \(I_1(t)=I_2(t)=I(t)\), \(R'_1(t)=R'_2(t)=R'(t)\) and \(I'_1(t)=I'_2(t)=I'(t)\).

Theorem 2

Suppose that Assumptions (A.1)–(A.3) hold. If \(H_0\) is true, then

$$\begin{aligned} \frac{n_1n_2}{N}T_{n_1,n_2}\mathop {\longrightarrow }\limits ^{\mathcal {L}}\Vert Z\Vert ^2_{\omega }, \end{aligned}$$

where \(\{Z(t), \, t\in \mathbb {R}\}\) is a centered Gaussian process on \(L_2(\omega )\) with covariance kernel \(\varrho _0(t,s)=Cov_0\{Z_0(\varepsilon ;t),Z_0(\varepsilon ;s)\}\) and

$$\begin{aligned} Z_0(\varepsilon ;t)&= \cos (t\varepsilon )+t\varepsilon I(t)-t\frac{\varepsilon ^2-1}{2}R'(t)-R(t) \\&\quad + \sin (t\varepsilon )-t\varepsilon R(t)-t\frac{\varepsilon ^2-1}{2}I'(t)-I(t). \end{aligned}$$

Remark 1

The asymptotic null distribution of \(\frac{n_1n_2}{N}T_{n_1,n_2}\) can be expressed as

$$\begin{aligned} \Vert Z\Vert _{\omega }^2 \mathop {=}\limits ^{d} \sum _{j\ge 1} \lambda _j Z_j^2, \end{aligned}$$
(4)

where \(\mathop {=}\limits ^{d}\) stands for equality in distribution, \(Z_1, Z_2, \ldots \) are independent standard normal variables and \(\{ \lambda _j\}_{j \ge 1}\) are the non-zero eigenvalues of the integral equation

$$\begin{aligned} \int \varrho _0(t,s)g_j(t)\omega (t)dt=\lambda _j g_j(s), \end{aligned}$$

with corresponding eigenfunctions \(\{g_j(\cdot )\}_{j \ge 1}\).

From Remark 1 it becomes evident that the asymptotic null distribution of \(T_{n_1,n_2}\) does not provide a useful approximation to its null distribution since it depends on the unknown common distribution. So, we next study another way of approximating it by means of a WB estimator.

Let

$$\begin{aligned} C_\tau (t)= \tau C_1(t)+(1-\tau )C_2(t)=R_\tau (t)+\mathrm{i}I_\tau (t). \end{aligned}$$

Let \(\xi _{11},\ldots ,\xi _{1n_1},\xi _{21},\ldots ,\xi _{2n_2}\) be IID random variates with mean 0 and variance 1, which are independent of \((X_{11},Y_{11}),\ldots ,(X_{1n_1},Y_{1n_1}),(X_{21},Y_{21}),\ldots ,\)\((X_{2n_2},Y_{2n_2})\). We define the following WB version of \(T_{n_1,n_2}\),

$$\begin{aligned} T_{1,n_1,n_2}^*=\Vert C_{1}^*-C_{2}^*\Vert ^2_\omega , \end{aligned}$$

where

$$\begin{aligned} {C}_{k}^*(t)=\frac{1}{n_k}\sum _{j=1}^{n_k}\xi _{kj}Z_{k,\tau }(\varepsilon _{kj};t), \end{aligned}$$
$$\begin{aligned} Z_{k,\tau }(\varepsilon ; t)&= \cos (t\varepsilon )+t\varepsilon I_{k}(t)-t\frac{\varepsilon ^2-1}{2}R_{k}'(t)-R_\tau (t)\\&\quad +\sin (t\varepsilon )-t\varepsilon R_{k}(t)-t\frac{\varepsilon ^2-1}{2}I_{k}'(t)-I_\tau (t), \end{aligned}$$
(5)

\(k=1,2\). The next result gives the weak limit of the conditional distribution of \(T_{1,n_1,n_2}^*\), given the data \((X_{11},Y_{11}),\ldots ,\)\((X_{1n_1},Y_{1n_1}),(X_{21},Y_{21}),\ldots , \)\((X_{2n_2},Y_{2n_2})\).

Theorem 3

Suppose that Assumptions (A.1)–(A.3) hold, then

$$\begin{aligned} \sup _x \left| P_*\left\{ \frac{n_1n_2}{N}T_{1,n_1,n_2}^*\le x\right\} -P\left\{ T_\tau \le x \right\} \right| \mathop {\longrightarrow }\limits ^{P}0, \end{aligned}$$

where \(T_\tau =\Vert Z_\tau \Vert ^2_\omega \), \(\{Z_\tau (t), \, t\in \mathbb {R}\}\) is a centered Gaussian process on \(L_2(\omega )\) with covariance kernel \(\varrho _\tau (t,s)=(1-\tau )\varrho _{1,\tau }(t,s)+\tau \varrho _{2,\tau }(t,s)\) and \(\varrho _{k,\tau }(t,s)=E\{Z_{k,\tau }(\varepsilon _k;t)Z_{k,\tau }(\varepsilon _k;s)\}\), \(k=1,2\).

The result in Theorem 3 is valid whether the null hypothesis is true or not. If \(H_0\) holds, then the kernels \(\varrho _0(t,s)\) and \(\varrho _\tau (t,s)\) coincide. Therefore, a direct consequence of Theorems 2 and 3 is that the conditional distribution of \(T_{1,n_1,n_2}^*\), given the data, provides a consistent estimator of the distribution of \(T_{n_1,n_2}\) when \(H_0\) is true. However, from a practical point of view, this result is useless because the function \(Z_{k,\tau }(\varepsilon _{kj};t)\) depends on the non-observable error \(\varepsilon _{kj}\) and on the unknown values of the functions \(R_{k}(t)\), \(I_{k}(t)\), \(R_{k}'(t)\) and \(I_{k}'(t)\), \(j=1,\ldots ,n_k\), \(k=1,2\). To overcome these difficulties we replace \(\varepsilon _{kj}\) by \(\hat{\varepsilon }_{kj}\), \(R_{k}(t)\) by \(\hat{R}_{k}(t)\), \(I_{k}(t)\) by \(\hat{I}_{k}(t)\), \(R_{k}'(t)\) by \(\hat{R}_{k}'(t)\) and \(I_{k}'(t)\) by \(\hat{I}_{k}'(t)\), with

$$\begin{aligned} \hat{R}'_{k}(t)= -\frac{1}{n_k}\sum _{j=1}^{n_k} \hat{\varepsilon }_{kj}\sin (t\hat{\varepsilon }_{kj}), \quad \hat{I}'_{k}(t)=\frac{1}{n_k}\sum _{j=1}^{n_k} \hat{\varepsilon }_{kj}\cos (t\hat{\varepsilon }_{kj}). \end{aligned}$$

So, instead of \(T_{1,n_1,n_2}^*\), now we consider

$$\begin{aligned} T_{2,n_1,n_2}^*= \Vert \hat{U}^*_1-\hat{U}^*_2\Vert ^2_\omega , \end{aligned}$$

where

$$\begin{aligned} \hat{U}^*_{k}(t)=\frac{1}{n_k}\sum _{j=1}^{n_k}\bigg \{&\cos (t\hat{\varepsilon }_{kj})+t\hat{\varepsilon }_{kj} \hat{I}_{k}(t)-t\frac{\hat{\varepsilon }_{kj}^2-1}{2}\hat{R}_{k}'(t)-\hat{R}_\tau (t) \\&+\sin (t\hat{\varepsilon }_{kj})-t\hat{\varepsilon }_{kj} \hat{R}_{k}(t)-t\frac{\hat{\varepsilon }_{kj}^2-1}{2}\hat{I}_{k}'(t)-\hat{I}_\tau (t)\bigg \}\xi _{kj}, \end{aligned}$$
(6)

\(k=1,2\), and

$$\begin{aligned} \hat{R}_\tau (t) =\frac{n_1}{N} \hat{R}_{1}(t)+ \frac{n_2}{N} \hat{R}_{2}(t), \quad \hat{I}_\tau (t) =\frac{n_1}{N} \hat{I}_{1}(t)+ \frac{n_2}{N} \hat{I}_{2}(t). \end{aligned}$$

The next theorem states that replacing \(\varepsilon _{kj}\) by \(\hat{\varepsilon }_{kj}, \ldots , \, I_{k}'(t)\) by \(\hat{I}_{k}'(t)\) in the expression of \(T_{1,n_1,n_2}^*\) has no asymptotic effect, in the sense that both \(T_{1,n_1,n_2}^*\) and \(T_{2,n_1,n_2}^*\) have the same conditional asymptotic distribution, given the data. Observe that all quantities involved in the definition of \(T_{2,n_1,n_2}^*\) are known; thus, in principle, one can compute, or at least accurately approximate, its conditional distribution given the data. This practical issue is handled in Sect. 4.

Theorem 4

Suppose that Assumptions (A.1)–(A.3) hold, then

$$\begin{aligned} \sup _x \left| P_*\left\{ \frac{n_1n_2}{N}T_{2,n_1,n_2}^* \le x \right\} -P\left\{ T_\tau \le x \right\} \right| \mathop {\longrightarrow }\limits ^{P} 0, \end{aligned}$$

where \(T_\tau \) is as defined in Theorem 3.

The result in Theorem 4 is valid whether the null hypothesis \(H_0\) is true or not. As observed before for \(T_{1,n_1,n_2}^*\), an immediate consequence of this fact and Theorem 2 is the following.

Corollary 1

If \(H_0\) is true and the assumptions in Theorem 4 hold, then

$$\begin{aligned} \sup _x \left| P_*\left\{ \frac{n_1n_2}{N}T_{2,n_1,n_2}^* \le x\right\} -P\left\{ \frac{n_1n_2}{N}T_{n_1,n_2}\le x\right\} \right| \mathop {\longrightarrow }\limits ^{P}0. \end{aligned}$$

Let \(\alpha \in (0,1)\). For testing \(H_0\) we consider

$$\begin{aligned} \Psi _*=\left\{ \begin{array}{lll} 1, &{} \quad \text {if} \quad T_{n_1,n_2}\ge t_{2,n_1,n_2,\alpha }^*,\\ 0, &{}\quad \text {otherwise,} \end{array}\right. \end{aligned}$$

where \( t_{2,n_1,n_2,\alpha }^*\) is the \(1-\alpha \) percentile of the conditional distribution of \(T_{2,n_1,n_2}^*\), or equivalently, \(\Psi _*=1\) if \(p^*\le \alpha \), where \(p^*=P_* \left\{ T_{2,n_1,n_2}^* \ge T_{n_1,n_2,obs}\right\} \) and \(T_{n_1,n_2,obs}\) is the observed value of the test statistic. The result in Corollary 1 states that \(\Psi _*\) is asymptotically correct, in the sense that its type I error probability is asymptotically equal to the nominal value \(\alpha \).

From Theorems 1, 2 and 4, the next result readily follows.

Corollary 2

Suppose that \(H_0\) is not true, the assumptions in Theorem 4 hold and \(\omega \) is such that

$$\begin{aligned} \kappa =\Vert C_1-C_2\Vert ^2_{\omega }>0, \end{aligned}$$
(7)

then \(P(\Psi _*=1)\rightarrow 1\).

Corollary 2 shows that, if \(\omega \) is such that (7) holds, then the test \(\Psi _*\) is consistent in the sense of being able to asymptotically detect any (fixed) alternative. As discussed before, a general way to ensure (7) is to take \(\omega \) positive for almost all (with respect to the Lebesgue measure) points in \(\mathbb {R}\).

Remark 2

The results stated so far remain true if, instead of the raw multipliers, \(\xi _{11},\ldots ,\xi _{1n_1}\), \(\xi _{21},\ldots ,\xi _{2n_2}\), we use the centered multipliers, \(\xi _{11}-\bar{\xi }_1,\ldots ,\xi _{1n_1}-\bar{\xi }_1\), \(\xi _{21}-\bar{\xi }_2,\ldots ,\xi _{2n_2}-\bar{\xi }_2\), where \(\bar{\xi }_k=\frac{1}{n_k}\sum _{j=1}^{n_k}\xi _{kj}\), \(k=1,2\), as suggested in Burke (2000) and Kojadinovic and Yan (2012).

Remark 3

In Remark 1 we saw that the asymptotic null distribution of \(T_{n_1,n_2}\) is a linear combination of independent \(\chi ^2\) variables, the weights in that linear combination being the eigenvalues \(\{ \lambda _j\}_{j \ge 1}\) of a certain integral equation. Routine algebra shows that the conditional distribution of \(T_{2,n_1,n_2}^*\), given the data, can also be expressed as a linear combination of a finite set of variables, \(\sum _{j=1}^N \hat{\lambda }_j W_j^2\), where \(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}\) are the eigenvalues of the symmetric \(N \times N\) matrix \(M_2\) defined in the next section (see Eqs. (9) and (10)), and \((W_1, \ldots , W_{N})=(\xi _{11}, \ldots , \xi _{1n_1},\xi _{21}, \ldots , \xi _{2n_2})H\), H being the matrix containing the eigenvectors associated with the eigenvalues \(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}\), that is, \(M_2=H \, diag(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}) H^T\). Moreover, the set \(\{\hat{\lambda }_j\}_{j =1}^N\) converges to \(\{ \lambda _j\}_{j \ge 1}\) (see Dehling and Mikosch 1994).

Remark 4

From Remark 3 it becomes evident that the conditional distribution of \(T_{2,n_1,n_2}^*\), given the data, depends on the distribution of \((W_1, \ldots , W_{N})\). The distribution of this random vector is, in general, unknown. For the special case where the multipliers come from a standard normal distribution, the vector \((W_1, \ldots , W_{N})\) has independent components distributed according to a standard normal distribution, and thus the conditional distribution of \(T_{2,n_1,n_2}^*\), given the data, is a finite linear combination of independent \(\chi ^2\) variables, where the weights in the linear combination are \(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}\). Note that in this special case the WB distribution of the test statistic is of the same type as its asymptotic null distribution.

4 On the practical calculation

This section describes some computational issues related to the calculation of the test statistic \(T_{n_1,n_2}\) and the WB approximation to its null distribution.

4.1 Calculation of the test statistic

Let

$$\begin{aligned} M=\left( \begin{array}{cc} M_{11} &{}\quad M_{12}\\ M_{21} &{}\quad M_{22}\end{array}\right) , \quad A=\left( \begin{array}{cc} \frac{1}{n_1^2}1_{n_1}1_{n_1}^T &{}\quad \frac{-1}{n_1n_2}1_{n_1}1_{n_2}^T\\ \frac{-1}{n_1 n_2}1_{n_2}1_{n_1}^T &{}\quad \frac{1}{n_2^2}1_{n_2}1_{n_2}^T\end{array}\right) , \end{aligned}$$

with \(M_{rs}=(\varphi _{\omega }(\hat{\varepsilon }_{rj}-\hat{\varepsilon }_{sv}))_{1\le j \le n_r,\, 1\le v \le n_s}\), \(r,s=1,2\), and

$$\begin{aligned} \varphi _{\omega }(x)=\int \cos (tx)\omega (t)dt. \end{aligned}$$
(8)

Let v be the vector of \(\mathbb {R}^{N}\) with the first \(n_1\) components equal to \(1/n_1\) and the rest equal to \(-1/n_2\). In practice, the test statistic \(T_{n_1,n_2}\) can be computed by using the following expression (see Lemma 1 in Alba-Fernández et al. 2008)

$$\begin{aligned} T_{n_1,n_2}&= \frac{1}{n_1^2}\sum _{j,r=1}^{n_1}\varphi _{\omega }(\hat{\varepsilon }_{1j}-\hat{\varepsilon }_{1r})+ \frac{1}{n_2^2}\sum _{l,v=1}^{n_2}\varphi _{\omega }(\hat{\varepsilon }_{2l}-\hat{\varepsilon }_{2v})\\&\quad - \frac{2}{n_1n_2}\sum _{j=1}^{n_1}\sum _{l=1}^{n_2}\varphi _{\omega }(\hat{\varepsilon }_{1j}-\hat{\varepsilon }_{2l})\\&= v^TMv\\&= 1_{N}^TM_11_{N}, \end{aligned}$$

with \(M_1=M \odot A\), \(\odot \) denoting the Hadamard product.
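As an illustration, for the Gaussian weight \(\omega (t)=\exp (-\beta t^2)\) used in Sect. 5 one has \(\varphi _{\omega }(x)=\sqrt{\pi /\beta }\,\exp \{-x^2/(4\beta )\}\), and the above expression can be coded directly. This is a sketch; the default value of \(\beta \) is only illustrative.

```r
## T_{n1,n2} = v^T M v for the Gaussian weight omega(t) = exp(-beta t^2),
## for which phi_omega(x) = sqrt(pi/beta) * exp(-x^2 / (4 beta))
Tstat <- function(eps1, eps2, beta = 0.15) {
  n1 <- length(eps1); n2 <- length(eps2)
  phi <- function(x) sqrt(pi / beta) * exp(-x^2 / (4 * beta))
  M <- outer(c(eps1, eps2), c(eps1, eps2), function(a, b) phi(a - b))
  v <- c(rep(1 / n1, n1), rep(-1 / n2, n2))
  drop(t(v) %*% M %*% v)
}
```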

The WB version of \(T_{n_1,n_2}\), \(T_{2,n_1,n_2}^*\), can be expressed as

$$\begin{aligned} T_{2,n_1,n_2}^{*}=\xi ^TM_2 \xi , \end{aligned}$$

with \(\xi ^T=(\xi _{11}, \ldots , \xi _{1n_1},\xi _{21}, \ldots , \xi _{2n_2})\) and

$$\begin{aligned} M_2=M_3\odot A, \quad M_3=\left( \begin{array}{cc} M_{3,11} &{}\quad M_{3,12}\\ M_{3,21} &{}\quad M_{3,22}\end{array}\right) , \end{aligned}$$
(9)
$$\begin{aligned} M_{3,rs}=\left( \int Z_{r,\tau }(\hat{\varepsilon }_{rj};t)Z_{s,\tau }(\hat{\varepsilon }_{sv};t)\omega (t)dt\right) _{1\le j \le n_r,\, 1\le v \le n_s}, \end{aligned}$$
(10)

\( r,s=1,2\). An explicit expression for \(M_3\) is given in the Appendix.
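The explicit expression of \(M_3\) in the Appendix avoids numerical integration. Purely for illustration, the entries of \(M_3\) can also be approximated numerically on a grid of t values, since every quantity in (10) is computable from the residuals; the following sketch does so for the Gaussian weight, with the grid and \(\beta \) as illustrative choices.

```r
## Numerical-integration sketch of M_3 in (10) for omega(t) = exp(-beta t^2)
build_M3 <- function(eps1, eps2, beta = 0.15, tgrid = seq(-20, 20, by = 0.01)) {
  n1 <- length(eps1); n2 <- length(eps2); N <- n1 + n2
  w  <- exp(-beta * tgrid^2)          # omega(t) on the grid
  dt <- tgrid[2] - tgrid[1]
  ecf_parts <- function(eps) {        # hat R_k, hat I_k, hat R'_k, hat I'_k
    list(R  = sapply(tgrid, function(t)  mean(cos(t * eps))),
         I  = sapply(tgrid, function(t)  mean(sin(t * eps))),
         Rp = sapply(tgrid, function(t) -mean(eps * sin(t * eps))),
         Ip = sapply(tgrid, function(t)  mean(eps * cos(t * eps))))
  }
  p1 <- ecf_parts(eps1); p2 <- ecf_parts(eps2)
  Rtau <- (n1 * p1$R + n2 * p2$R) / N
  Itau <- (n1 * p1$I + n2 * p2$I) / N
  Zrow <- function(e, p)              # hat Z_{k,tau}(hat eps; t) on the grid
    cos(tgrid * e) + tgrid * e * p$I - tgrid * (e^2 - 1) / 2 * p$Rp - Rtau +
    sin(tgrid * e) - tgrid * e * p$R - tgrid * (e^2 - 1) / 2 * p$Ip - Itau
  Z <- rbind(t(sapply(eps1, Zrow, p = p1)),
             t(sapply(eps2, Zrow, p = p2)))
  Z %*% (w * dt * t(Z))               # (j, v) entry ~ int Z_r Z_s omega(t) dt
}
```

The matrix \(M_2\) in (9) is then obtained as the Hadamard product of this matrix with A.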

4.2 Calculation of the WB distribution of the test statistic

Normal multipliers. As observed in Remark 4, if the multipliers have a normal distribution then, conditional on the data, \(T_{2,n_1,n_2}^{*}\) is distributed as \(W=\sum _{j=1}^{N}\hat{\lambda }_j\chi ^2_{1,j}\), where \(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}\) are the eigenvalues of \(M_2\) and \(\chi ^2_{1,1}, \ldots , \chi ^2_{1,N}\) are independent variables having a chi-squared distribution with 1 degree of freedom. The law of W can be numerically approximated by using, for example, Imhof’s method (Imhof 1961). In this case, the WB estimator of the p-value can be calculated as follows:

Algorithm 1

  1. 1.

    Calculate the residuals \(\hat{\varepsilon }_{11}, \ldots ,\hat{\varepsilon }_{1n_1}\),\(\hat{\varepsilon }_{21}, \ldots ,\hat{\varepsilon }_{2n_2}\).

  2. 2.

    Calculate the observed value of the test statistic \(T_{n_1,n_2}\), \(T_{n_1,n_2,obs}\).

  3. 3.

    Calculate the eigenvalues of \(M_2\), \(\hat{\lambda }_1, \ldots , \hat{\lambda }_{N}\).

  4. 4.

    Approximate the p-value by \(\hat{p}^{*}=P_*\left( \sum _{j=1}^{N}\hat{\lambda }_j\chi ^2_{1,j}>T_{n_1,n_2,obs}\right) \).
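A sketch of Algorithm 1 in R, assuming that \(M_2\) in (9) and the observed statistic \(T_{n_1,n_2,obs}\) have already been computed; the function imhof from the CompQuadForm package (see Sect. 5) evaluates the tail probability of a linear combination of independent \(\chi ^2_1\) variables.

```r
library(CompQuadForm)

## Steps 3 and 4 of Algorithm 1: eigenvalues of M2 and Imhof's tail probability
wb_pvalue_normal <- function(M2, T_obs) {
  lambda <- eigen(M2, symmetric = TRUE, only.values = TRUE)$values
  lambda <- lambda[lambda > 1e-12]    # keep the non-null eigenvalues
  imhof(T_obs, lambda)$Qq             # P( sum_j lambda_j chi^2_{1,j} > T_obs )
}
```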

Arbitrary multipliers. As also observed in Remark 4, for arbitrary multipliers the WB distribution of the test statistic is not available in closed form. Nevertheless, the WB p-value estimator can be easily approximated by simulation as follows. Let \(\Delta (u)=1\) if \(u>0\) and \(\Delta (u)=0\) if \(u\le 0\).

Algorithm 2

  1. 1.

    Calculate the residuals \(\hat{\varepsilon }_{11}, \ldots ,\hat{\varepsilon }_{1n_1}\),\(\hat{\varepsilon }_{21}, \ldots ,\hat{\varepsilon }_{2n_2}\).

  2. 2.

    Calculate the observed value of the test statistic \(T_{n_1,n_2}\), \(T_{n_1,n_2,obs}\).

  3. 3.

    Calculate \(M_{2}\).

  4. 4.

    For some large integer B, repeat the following steps for every \(b \in \{1,\ldots ,B\}\):

    1. (a)

      Generate \(\xi _{11}, \ldots , \xi _{1n_1},\xi _{21}, \ldots , \xi _{2n_2}\) IID variables with mean 0 and variance 1.

    2. (b)

      Calculate \(T^{*b}_{2,n_1,n_2}=\xi ^TM_2\xi \).

  5. 5.

    Approximate the p-value by \(\hat{p}^*=\frac{1}{B}\sum _{b=1}^B \Delta (T^{*b}_{2,n_1,n_2}-T_{n_1,n_2,obs})\).
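A sketch of Algorithm 2 with centered standard normal multipliers (the per-sample centering follows Remark 2), again assuming that \(M_2\) and \(T_{n_1,n_2,obs}\) are available.

```r
## Steps 4 and 5 of Algorithm 2
wb_pvalue_sim <- function(M2, T_obs, n1, n2, B = 1000) {
  Tstar <- numeric(B)
  for (b in 1:B) {
    xi1 <- rnorm(n1); xi2 <- rnorm(n2)
    xi  <- c(xi1 - mean(xi1), xi2 - mean(xi2))   # centered multipliers
    Tstar[b] <- drop(t(xi) %*% M2 %*% xi)        # replicate of T*_{2,n1,n2}
  }
  mean(Tstar > T_obs)                            # hat p*
}
```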

5 Numerical results

5.1 Finite sample performance

The properties so far studied are asymptotic. In order to empirically investigate the performance of the proposed test for finite sample sizes, we carried out a simulation experiment. The objective of this experiment is fourfold: first, to study the goodness of the WB approximation to the null distribution of the test statistic; second, to analyze the WB approximation in terms of power, comparing it to the power that results when the bootstrap employed in PF07 is used to approximate the null distribution of the proposed test statistic (denoted as Boot in the tables); third, to compare the power of the proposed test to that of the CvM type test in PF07 (denoted as CM in the tables; the KS test is not considered in our simulation study because, in the simulations carried out in PF07, it was less powerful than the CvM test); and finally, to compare the WB and the bootstrap approximations in terms of the required CPU time. This section reports and summarizes the obtained results. All computations in this paper were performed using programs written in the R language (R Core Team 2015). Specifically, to numerically approximate the WB p-value by Imhof's method, the R package CompQuadForm (Duchesne and Lafaye de Micheaux 2010) was used.

Three specifications for the functions \(m_k\) and \(\sigma _k\) were considered:

  1. S1:

    \(Y_{kj}=X_{kj}+X_{kj}^2+(X_{kj}+0.5)\varepsilon _{kj}\), \(1\le j \le n_k\), \(k=1,2\),

  2. S2:

    \(Y_{kj}=X_{kj}+0.5\varepsilon _{kj}\), \(1\le j \le n_k\), \(k=1,2\),

  3. S3:

    \(Y_{1j}=X_{1j}+X_{1j}^2+(X_{1j}+0.5)\varepsilon _{1j}\), \(1\le j \le n_1\), and \(Y_{2j}=X_{2j}+0.5\varepsilon _{2j}\), \(1\le j \le n_2\),

with \(X_{kj}\sim U(0,1)\), \(1\le j \le n_k\), \(k=1,2\). For each of these specifications for \(m_k\) and \(\sigma _k\), the following three cases were considered for the error distribution:

  1. (i)

    \(\varepsilon _1, \varepsilon _2 \sim N(0,1)\),

  2. (ii)

    \(\varepsilon _1\sim N(0,1)\), \(\varepsilon _2 \sim E(1)-1\),

  3. (iii)

    \(\varepsilon _1\sim N(0,1)\), \(\varepsilon _2 \sim U(-\sqrt{3},\sqrt{3})\),

where E(1) stands for a negative exponential law with mean 1. Case (i) corresponds to the null hypothesis, while cases (ii) and (iii) are alternatives.
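For concreteness, one Monte Carlo data set under specification S1 with error case (ii) can be generated as follows (a sketch; the sample sizes are illustrative).

```r
## Specification S1 with error case (ii): covariates U(0,1), errors N(0,1)
## in the first sample and centered unit exponential, E(1) - 1, in the second
n1 <- 100; n2 <- 100
x1 <- runif(n1); e1 <- rnorm(n1)
x2 <- runif(n2); e2 <- rexp(n2, rate = 1) - 1
y1 <- x1 + x1^2 + (x1 + 0.5) * e1
y2 <- x2 + x2^2 + (x2 + 0.5) * e2
```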

To estimate the regression function and the conditional variance, the Epanechnikov kernel \(K(u)=0.75(1-u^2)\), \(|u|\le 1\), was employed.

As weight function for \(T_{n_1,n_2}\) we took \(\omega (t)=\exp (-\beta t^2)\). This weight function has been considered in many other test statistics involving ECFs (see, for example, the tests in Alba-Fernández et al. 2008; Meintanis 2005; Meintanis et al. 2015; Pardo-Fernández et al. 2015a, b, among others).

Another issue is the choice of the bandwidth. Since the choice of the bandwidth for tests based on smoothing remains an open issue (see, for example, de Uña-Álvarez 2013; González-Manteiga and Crujeiras 2013; Sperlich 2013), we proceeded as in the simulation studies in PF07 and Pardo-Fernández et al. (2015a): we took \(h_k=c\times n_k^{-a}\), where c and a are real constants. To decide the values of a and c, we performed an extensive simulation experiment with the aim of selecting those values giving type I error probabilities closest to the nominal values. We also tried several values for \(\beta \), specifically \(\beta \in \{0.05,0.10,0.15,0.20,0.25\}\). In general, better results, in the sense of agreement between the observed type I error probabilities and the target values, were obtained for \(\beta =0.15\). For this reason we fixed \(\beta =0.15\) in all simulations.

For each case and each specification of \(m_k\) and \(\sigma _k\), 1000 samples with sizes \(n_1,n_2 \in \{50,100\}\) were generated. For each sample, to approximate the WB p-value of the observed value of the test statistic, we applied Algorithm 2, with raw and centered multipliers generated from a standard normal distribution and \(B=1000\), as well as Algorithm 1. In the simulations we observed that, as expected, these approximations provided quite similar values. Nevertheless, the WB with centered multipliers gives slightly better results, in the sense of yielding type I error probabilities a bit closer to the nominal values than the other two. For this reason, we recommend its use. All results displayed in the tables were obtained by using Algorithm 2 with centered multipliers. To approximate the bootstrap p-value we proceeded as in PF07, generating 200 bootstrap samples. The tables report the fractions of p-values less than or equal to 0.05 and 0.10.

Tables 1, 2, 3 display the results for the level. Looking at them it can be concluded that for \(n_1, \, n_2 = 100\) all choices for a and c in these tables give values very similar to the true value of \(\alpha \), for all specifications and for all tests. In general, \(a=0.3\) and \(c=1.0\) give quite reasonable results, so we set these values for a and c to study the power.

Table 4 displays the results for the power. In case (ii) all tests have similar power for all considered specifications; in case (iii) the test proposed in this paper exhibits larger power than the one based on the empirical CDF. As for the WB and bootstrap approximations to the null distribution of \(T_{n_1,n_2}\), the bootstrap test is slightly more powerful than the one based on the WB approximation. Nevertheless, as the sample size increases, the powers of both tests become closer. This fact was also observed in Kojadinovic and Yan (2012) and Ghoudi and Rémillard (2014) for goodness-of-fit tests based on the empirical CDF. As will be seen below, the practical importance of this fact lies in that, for larger sample sizes, the bootstrap becomes extremely time consuming.

Table 1 Percentage of rejections at the significance levels \(5\%\) (upper entry) and \(10\%\) (lower entry) for case (i) and specification S1
Table 2 Percentage of rejections at the significance levels \(5\%\) (upper entry) and \(10\%\) (lower entry) for case (i) and specification S2
Table 3 Percentage of rejections at the significance levels \(5\%\) (upper entry) and \(10\%\) (lower entry) for case (i) and specification S3
Table 4 Percentage of rejections at the significance levels \(5\%\) and \(10\%\) for cases (ii) and (iii) and specifications S1–S3

We also compared the bootstrap and the WB approximations in terms of the required computing time. To calculate the WB approximation we used Algorithm 1 (denoted as WB1 in Table 5) and Algorithm 2 (denoted as WB2 in Table 5). For the comparison to be fair, we took \(B=1000\) for both the bootstrap and the WB2 estimators. As for the raw and the centered multipliers, the difference in the required computing time is negligible. Table 5 shows the CPU time, in seconds, needed to obtain a p-value for testing the equality of the error distributions for several sample sizes. Looking at this table it becomes evident that WB2 is more efficient than the bootstrap approximation in terms of the required computing time, especially for larger sample sizes. The difference between WB1 and WB2 is really small. The gain in computational efficiency of the WB over the bootstrap stems from the fact that one does not have to re-estimate \(m_k\) and \(\sigma _k\) at each iteration, which slows down the process considerably; by contrast, for the WB approximation, once the matrix \(M_3\) is calculated, the WB replicates \(T_{2,n_1,n_2}^{*1},\ldots , T_{2,n_1,n_2}^{*B}\) are obtained very rapidly.

Table 5 CPU time in seconds for the calculation of a p-value

Finally, we ran simulations where the error distributions are of mixed type. Specifically, we considered the following cases:

  1. (iv)
    $$\begin{aligned} \varepsilon _1, \varepsilon _2 \sim \left\{ \begin{array}{l@{\quad }l} 0, &{} \text {with probability} \ 0.20, \\ N(0,\sqrt{5/4}) &{} \text {with probability} \ 0.80, \end{array}\right. \end{aligned}$$
  2. (v)
    $$\begin{aligned} \varepsilon _1 \sim \left\{ \begin{array}{l@{\quad }l} 0, &{} \text {with probability} \ 0.20, \\ N(0,\sqrt{5/4}) &{} \text {with probability} \ 0.80, \end{array}\right. \end{aligned}$$
    $$\begin{aligned} \varepsilon _2 \sim \left\{ \begin{array}{l@{\quad }l} 0, &{} \text {with probability} \ 0.50,\\ N(0,\sqrt{2}) &{} \text {with probability} \ 0.50. \end{array}\right. \end{aligned}$$

Case (iv) corresponds to the null hypothesis and case (v) is an alternative. In practice, these cases could model a situation where the observations come from two devices, one of them with no measurement error. The test in PF07 cannot be applied in this setting because it requires the error distribution to have a smooth density. Table 6 displays the obtained results for the test proposed in this paper for \(n_1,n_2=100\). Again the empirical levels are close to the target values and the test has power against the alternative.
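As an illustration, errors from the mixture in case (iv) can be generated as below; the scale \(\sqrt{5/4}\) of the normal component makes the mixture have mean 0 and variance 1, as required by model (1).

```r
## Case (iv): point mass at 0 with probability 0.2, N(0, sd = sqrt(5/4)) with
## probability 0.8, so that E(eps) = 0 and Var(eps) = 0.8 * 5/4 = 1
r_case_iv <- function(n) ifelse(runif(n) < 0.2, 0, rnorm(n, sd = sqrt(5/4)))
```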

Table 6 Percentage of rejections at the significance levels \(5\%\) (upper entry) and \(10\%\) (lower entry) for cases (iv) and (v) and specifications S1–S3

5.2 Real data analysis

Finally, we applied the proposed test to a real data set, which we briefly describe next. To estimate the p-value we applied Algorithm 2 with \(B=1000\); several values of a and c were tried.

Young and Bowman (1995) proposed a method for testing the equality and parallelism of two or more smooth curves. Their method assumes that the errors are equally distributed in each population. They applied it to a data set consisting of the yield (g/plant) and density (plants/\(m^2\)) of White Spanish Onions from two South Australian localities, namely Purnong Landing (first group, 42 observations) and Virginia (second group, 42 observations). This data set is available in the R package sm (Bowman and Azzalini 2014). Table 7 displays the estimated p-values when the test proposed in this paper is used for testing the equality of the error distributions. As in Young and Bowman (1995), the test was applied to the logarithm of the data. Looking at this table we see that the equality of the error distributions cannot be rejected.
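A sketch of the data preparation for this example is given below. The object and column names (onions, Density, Yield, Locality) are assumptions about how the data set is exposed in the sm package and may differ across package versions; as in Young and Bowman (1995), the logarithm of the data is used.

```r
library(sm)
## Hypothetical column names; the two locality codes identify the two samples
pl <- onions$Locality == unique(onions$Locality)[1]     # Purnong Landing
x1 <- log(onions$Density[pl]);  y1 <- log(onions$Yield[pl])
x2 <- log(onions$Density[!pl]); y2 <- log(onions$Yield[!pl])
## residuals, T_{n1,n2} and the WB p-value then follow Sects. 2 and 4, e.g.
## eps1 <- nw_residuals(x1, y1, h = c * length(x1)^(-a)), and so on
```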

Table 7 p-values for the data set

6 Testing for the equality of \(d>2\) error distributions

The proposed test can be extended to testing for the equality of \(d>2\) error distributions as follows. Let \((X_k,Y_k)\), \(1\le k \le d\), be d independent random vectors satisfying the general nonparametric regression model (1), \( 1\le k \le d\). Let \(F_k\) and \(C_k=R_k+\mathrm{i} I_k\) denote the CDF and the CF of \(\varepsilon _k\), respectively, \(1\le k \le d\). Suppose that independent samples are available from each population: \((X_{k1}, Y_{k1}), \ldots , (X_{kn_k},Y_{kn_k})\), \(1\le k \le d\). Let \(N=n_1+\cdots +n_d\). For testing

$$\begin{aligned} H_{0d}: \,F_1=\cdots =F_d \quad \Leftrightarrow \quad C_1=\cdots =C_d, \end{aligned}$$

against the general alternative

$$\begin{aligned} H_{1d}: \, H_{0d} \, \text{ is } \text{ not } \text{ true }, \end{aligned}$$

for observable data, Hušková and Meintanis (2008) proposed to compare the ECF associated with the sample from each population to the ECF of all the available data which, under \(H_{0d}\), estimates the common CF, say \(C=C_1=\cdots =C_d\). A residual version of such a test can be used for testing \(H_{0d}\) in our setting. Specifically, let \(\hat{\varepsilon }_{kj}\), \(1\le j \le n_k\), \(1\le k \le d\), be defined as in (2) and let

$$\begin{aligned} T_N=\sum _{k=1}^d n_k\Vert \hat{C}_k-\hat{C}\Vert _{\omega }^2, \end{aligned}$$

where \(\hat{C}_k\) is as defined in (3) and

$$\begin{aligned} \hat{C}=\frac{1}{N}\sum _{k=1}^d n_k \hat{C}_k. \end{aligned}$$

Analogous results to those given in Theorems 1, 2 and 4 can be stated for \(T_N\). We state them next without proofs because they closely follow those provided for \(d=2\).

Theorem 5

Suppose that \(n_k/N \rightarrow \tau _k>0\) and that Assumptions (A.2) and (A.3) hold for all \(1\le k \le d\); then \(\frac{1}{N}T_{N}\mathop {\rightarrow }\limits ^{P} \sum _{k=1}^d \tau _k\Vert {C}_k-C_0\Vert ^2_{\omega },\) with \(C_0=\sum _{k=1}^d \tau _k{C}_k\).

Theorem 6

Suppose that the assumptions in Theorem 5 hold. If \(H_{0d}\) is true, then

$$\begin{aligned} T_{N}\mathop {\longrightarrow }\limits ^{\mathcal {L}}\sum _{k=1}^d \Vert Z_k-\sqrt{\tau _k}Z_0\Vert ^2_{\omega }, \end{aligned}$$

where \(\{Z_k(t), \, t\in \mathbb {R}\}\), \(k=1,\ldots ,d\), are d IID centered Gaussian processes on \(L_2(\omega )\) with covariance kernel \(\varrho _0(t,s)\) as defined in Theorem 2 and \(Z_0=\sum _{k=1}^d \sqrt{\tau _k}{Z}_k\).

Now let \(\xi _{1,1},\ldots ,\xi _{1,n_1}, \ldots , \xi _{d,1},\ldots ,\xi _{d,n_d}\) be IID random variates with mean 0 and variance 1, which are independent of the data, \((X_{k1}, Y_{k1}), \ldots , (X_{kn_k},\)\(Y_{kn_k})\), \(1\le k \le d\). Let

$$\begin{aligned} T^*_N=\sum _{k=1}^d n_k\Vert \hat{U}^*_k-\hat{U}^*_0\Vert _{\omega }^2, \end{aligned}$$

where \(\hat{U}^*_k\) is as defined in (6) with \(\hat{R}_{\tau }\) and \(\hat{I}_{\tau }\) replaced by \(\hat{R}_{\tau _1,\ldots , \tau _d}\) and \(\hat{I}_{\tau _1,\ldots , \tau _d}\), respectively,

$$\begin{aligned} \hat{R}_{\tau _1,\ldots , \tau _d}(t) =\frac{1}{N}\sum _{k=1}^d n_k \hat{R}_{k}(t), \quad \hat{I}_{\tau _1,\ldots , \tau _d}(t) =\frac{1}{N}\sum _{k=1}^d n_k \hat{I}_{k}(t), \end{aligned}$$

and

$$\begin{aligned} \hat{U}^*_0=\frac{1}{N}\sum _{k=1}^d n_k \hat{U}^*_k. \end{aligned}$$

Theorem 7

Suppose that assumptions in Theorem 5 hold, then

$$\begin{aligned} \sup _x \left| P_*\left\{ T_{N}^* \le x \right\} -P\left\{ T_{\tau _1,\ldots , \tau _d} \le x \right\} \right| \mathop {\longrightarrow }\limits ^{P} 0, \end{aligned}$$

where \(T_{\tau _1,\ldots , \tau _d}=\sum _{k=1}^d \Vert Z_{k,\tau _1,\ldots ,\tau _d}-\sqrt{\tau _k}Z_{0,\tau _1,\ldots ,\tau _d}\Vert ^2_{\omega }\), \(\{Z_{k,\tau _1,\ldots ,\tau _d}(t), \, t\in \mathbb {R}\}\), \(1 \le k \le d\), are independent centered Gaussian processes on \(L_2(\omega )\) with covariance kernel

$$\begin{aligned} \varrho _{k,\tau _1,\ldots ,\tau _d}(t,s)=E\{Z_{k,\tau _1,\ldots ,\tau _d}(\varepsilon _k;t)Z_{k,\tau _1,\ldots ,\tau _d}(\varepsilon _k;s)\}, \end{aligned}$$

\(Z_{k,\tau _1,\ldots ,\tau _d}(\varepsilon _k;t)\) is defined as in (5) with \(R_{\tau }\) and \(I_{\tau }\) replaced by \({R}_{\tau _1,\ldots , \tau _d}\) and \({I}_{\tau _1,\ldots , \tau _d}\), respectively,

$$\begin{aligned} {R}_{\tau _1,\ldots , \tau _d}(t) =\sum _{k=1}^d\tau _k {R}_{k}(t), \quad {I}_{\tau _1,\ldots , \tau _d}(t) =\sum _{k=1}^d\tau _k {I}_{k}(t), \end{aligned}$$

and \(Z_{0,\tau _1,\ldots ,\tau _d}=\sum _{k=1}^d \sqrt{\tau _k}{Z}_{k,\tau _1,\ldots ,\tau _d}\).

Similar results to those stated in Corollaries 1 and 2 for \(T_{n_1,n_2}^*\) can be given for \(T^*_N\). To save space we omit them.
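On the practical side, \(T_N\) can be evaluated with the same device as in Sect. 4.1: writing \(\hat{C}_k-\hat{C}\) as a weighted sum of complex exponentials of the residuals, \(\Vert \hat{C}_k-\hat{C}\Vert ^2_{\omega }\) becomes a quadratic form in \(\varphi _{\omega }\). The following sketch uses the Gaussian weight, for which \(\varphi _{\omega }(x)=\sqrt{\pi /\beta }\,\exp \{-x^2/(4\beta )\}\); eps_list is assumed to be a list containing the d residual vectors.

```r
## T_N = sum_k n_k || hat C_k - hat C ||^2_omega for the Gaussian weight
TN_stat <- function(eps_list, beta = 0.15) {
  eps <- unlist(eps_list)
  N   <- length(eps)
  phi <- function(x) sqrt(pi / beta) * exp(-x^2 / (4 * beta))
  M   <- outer(eps, eps, function(a, b) phi(a - b))
  idx <- rep(seq_along(eps_list), lengths(eps_list))   # population labels
  TN  <- 0
  for (k in seq_along(eps_list)) {
    nk <- length(eps_list[[k]])
    u  <- (idx == k) / nk - 1 / N          # weights of hat C_k - hat C
    TN <- TN + nk * drop(t(u) %*% M %*% u)
  }
  TN
}
```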