1 Introduction

Testing for normality is arguably the most widely used and discussed goodness-of-fit problem, motivated by the assumption of normality underlying many classical statistical models. To be specific, let \(X, X_1, X_2, \ldots \) be real-valued independent and identically distributed (iid.) random variables. The problem of interest is to test the hypothesis

$$\begin{aligned} H_0: \mathbb {P}^X \in {\mathcal {N}}=\{ {\mathcal {N}}(\mu , \sigma ^2) \, | \, (\mu , \sigma ^2) \in \mathbb {R}\times (0,\infty ) \} \end{aligned}$$
(1)

against general alternatives. So far, a great variety of goodness-of-fit tests have been proposed, and research is of ongoing interest, as witnessed by the recent papers of Bera et al. (2016), Villaseñor-Alva and González-Estrada (2015) and comparative studies such as those of Romão et al. (2010) and Yap and Sim (2011). Classical procedures in goodness-of-fit methodology such as the Kolmogorov–Smirnov and the Cramér–von Mises test approach the testing problem by measuring the distance of the empirical distribution function to the estimated representative of \({\mathcal {N}}\). For a theoretical approach to goodness-of-fit tests to a family of distributions, see del Barrio et al. (2000), Neuhaus (1979). Other methods are based on skewness and kurtosis, as, for instance, proposed by Pearson et al. (1977) (known to lead to inconsistent procedures), the empirical characteristic function, see Epps and Pulley (1983), the Wasserstein distance, see del Barrio et al. (2000), del Barrio et al. (1999), the sample entropy, see Vasicek (1976), the integrated empirical distribution function, see Klar (2001), or correlation and regression tests, such as the famous Shapiro–Wilk test, see Shapiro and Wilk (1965), among others. For a survey of classical methods, see del Barrio et al. (2000), Sect. 3, and Henze (1994), and for the problem of testing multivariate normality, we refer to Henze (2002), Mecklin and Mundfrom (2004).

Another natural approach to assess the distance of the distribution of a real-valued random variable X to the normal distribution is to calculate the difference between \(\mathbb {E}h(X)\) and \(\mathbb {E}h(N)\), where \(\mathbb {P}^N = {\mathcal {N}}(0,1)\), over some large class of functions \(h : \mathbb {R}\rightarrow \mathbb {R}\). The class \(\{x \mapsto e^{itx}\, | \, t \in \mathbb {R}\}\) leads to characteristic functions, for which proofs of limit theorems rely heavily on the assumption of independence. In an attempt to give an alternative proof of the central limit theorem, Charles Stein considered a different class of test functions (see, e.g. Stein 1972). Stating that X has a standard normal distribution if, and only if,

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X) \big ] = \mathbb {E}\big [ X f(X) \big ] \end{aligned}$$
(2)

holds for each absolutely continuous function f for which the expectations exist, it appears reasonable to regard \(\mathbb {E}\big [ f^{\prime }(X) - X f(X) \big ]\), for a suitable function f, as an estimate of \(\mathbb {E}h(X) - \mathbb {E}h(N)\) since both terms ought to be small whenever the distribution of X is close to standard normal. In practice, solving the differential equation

$$\begin{aligned} f^{\prime }(x) - x f(x) = h(x) - \mathbb {E}h(N) \end{aligned}$$
(3)

for absolutely continuous functions h, evaluating at X and taking expectations, the problem reduces to appraising \(\mathbb {E}\big [ f_h^{\prime }(X) - X f_h(X)\big ]\), with \(f_h\) being the solution of (3). A commonly used tool to handle these terms is the so-called zero-bias transformation introduced by Goldstein and Reinert (1997). Namely, if \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\), a random variable \(X^*\) is said to have the X-zero-bias distribution if

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X^*)\big ] = \mathbb {E}\big [ X f(X) \big ] \end{aligned}$$
(4)

holds for all absolutely continuous functions f for which these expectations exist. The use of this distribution, if it exists, lends itself easily to the purpose of distributional approximation. For instance, starting with the solution of (3), the mean value theorem gives

$$\begin{aligned} | \mathbb {E}h(X) - \mathbb {E}h(N) | = | \mathbb {E}[f_h^{\prime }(X) - f_h^{\prime }(X^*)] | \le \left||f_h^{\prime \prime } \right||_{\infty } \, \mathbb {E}| X - X^* | . \end{aligned}$$

Thus, the problem reduces to bounding the derivatives of the solution \(f_h\) of (3) and constructing \(X^*\) such that \(\mathbb {E}| X - X^* |\) is accessible. Bounds on \(f_h\) and its derivatives are well known, and a comprehensive treatment as well as explicit constructions for \(X^*\) may be found in Chen et al. (2011) (for the bounds, see also Stein 1986). For a general introduction to Stein’s method, see Chen et al. (2011), Ross (2011). One of the main reasons Stein’s method, particularly for the normal distribution, has been studied to such a remarkable extent is the variety of central limit-type results it yields, including convergence rates, even in dependency settings.

It seems reasonable to ask whether Stein’s characterization (2) may be used to construct a goodness-of-fit statistic. However, we can hardly evaluate a quantity for all absolutely continuous functions, which makes the direct application of equation (2) rather complicated (cf. Liu et al. 2016). Instead, we propose a test based on the zero-bias distribution. To this end, we first recall the explicit formula for the density and distribution function of the zero-bias distribution.

Lemma 1

If X is a centred, real-valued random variable with \(\mathbb {V}(X) = 1\), the X-zero-bias distribution exists and is unique. Moreover, it is absolutely continuous with respect to the Lebesgue measure with density

$$\begin{aligned} d^X (t) = \mathbb {E}[X \mathbb {1}\{ X > t \}] = - \mathbb {E}[X \, \mathbb {1}\{ X \le t \}] \end{aligned}$$

and distribution function

$$\begin{aligned} F^X (t) = \mathbb {E}[X (X - t) \mathbb {1}\{ X \le t \}] . \end{aligned}$$

A proof can be found in Chen et al. (2011) or in the original treatment (Goldstein and Reinert 1997). Now, interpreting (4) as a distributional transformation \(\mathbb {P}^X \mapsto \mathbb {P}^{X^*}\), the standard normal distribution is characterized as the unique fixed point of this transformation [see also Goldstein and Reinert 1997, Lemma 2.1 (i)]. Writing this in terms of the formula from Lemma 1, the characterization reads as follows.

Theorem 1

A random variable X with distribution function F and \(\mathbb {E}X = 0\), \(\mathbb {V}(X) = 1\) has the standard normal distribution if, and only if,

$$\begin{aligned} F^X = F \end{aligned}$$

which in turn holds if, and only if,

$$\begin{aligned} F^X = \varPhi , \end{aligned}$$

where \(\varPhi \) is the distribution function of the standard normal distribution.

Proof

By Lemma 1 and the presumptions on X, the zero-bias distribution \(\mathbb {P}^{X^*}\) of \(\mathbb {P}^X\) exists, is unique and has distribution function \(F^X\). Hence, if \(\mathbb {P}^X\) is the standard normal distribution, (2) is satisfied and the definition of the zero-bias distribution through formula (4) and its uniqueness imply \(\mathbb {P}^{X^*} = \mathbb {P}^X\), that is, \(F^X = F\). Conversely, if \(F^X = F\), Lemma 1 implies \(\mathbb {P}^{X^*} = \mathbb {P}^X\) and Stein’s characterization (2) yields that X is standard Gaussian.

For the second equivalence note that if X follows the standard normal law, \(F^X = F = \varPhi \) by the first part. Finally, assume that \(F^X = \varPhi \). Since \(F^X\) is the distribution function of \(X^*\), \(\mathbb {P}^{X^*}\) is the standard normal distribution, so

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X^*) \big ] = \mathbb {E}\big [ X^* f(X^*) \big ] \end{aligned}$$

holds for each absolutely continuous function f for which these expectations exist. The definition of the zero-bias distribution implies that for any such function f,

$$\begin{aligned} \mathbb {E}\big [ X f(X) \big ] = \mathbb {E}\big [ X^* f(X^*) \big ]. \end{aligned}$$

Noticing that these functions include any monomial (since \(X^*\) has the standard normal distribution for which moments of all orders exist) and that the normal distribution is uniquely determined through its sequence of moments (see Theorem 30.1 of Billingsley 1995), this last equation shows \(\mathbb {P}^X = \mathbb {P}^{X^*} = {\mathcal {N}}(0,1)\). \(\square \)

This theorem paves the way for the construction of goodness-of-fit tests using a measure of deviation between an empirical version of \(F^X\) and \(\varPhi \) or the empirical distribution, respectively. Heuristically, the above characterization indicates that the difference between these empirical quantities ought to be small when the underlying sample comes from a normal distribution and large whenever it does not. Thus, tests based on this characterization should be able to detect deviations from the normality hypothesis (1).
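To illustrate the characterization numerically, the following R sketch (our illustration, not part of the original argument; the function name zero_bias_cdf is ours) approximates \(F^X(t) = \mathbb {E}[X (X - t) \mathbb {1}\{ X \le t \}]\) from Lemma 1 by Monte Carlo and compares it with \(\varPhi (t)\) for a standard normal and for a standardized uniform distribution.

```r
# Monte Carlo approximation of the zero-bias distribution function from Lemma 1
zero_bias_cdf <- function(x, t) mean(x * (x - t) * (x <= t))

set.seed(1)
x_norm <- rnorm(1e6)                      # standard normal sample
x_unif <- runif(1e6, -sqrt(3), sqrt(3))   # standardized uniform sample (mean 0, variance 1)
t <- 0.5
# For the normal sample the value is close to pnorm(0.5) = 0.691, while for the
# uniform sample it differs (roughly 0.71), reflecting the fixed-point property of Theorem 1.
c(normal = zero_bias_cdf(x_norm, t), uniform = zero_bias_cdf(x_unif, t), Phi = pnorm(t))
```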

In Sect. 2, we use a weighted \(L^2\)-measure to construct two statistics for our testing problem (1). We derive the limit null distributions in Sect. 3 and study the behaviour under contiguous alternatives in Sect. 4. The consistency of these classes of tests is established in Sect. 5, and we obtain the limit distributions of the statistics under fixed alternatives. To analyse the actual performance, empirical results in the form of a power study are presented in Sect. 6. Conclusions and outlines complete the article.

2 The new test statistics

Let \(X, X_1, X_2, \ldots \) be real-valued iid. random variables defined on an underlying probability space \((\varOmega , {\mathcal {A}}, \mathbb {P})\). Further, let F be the distribution function of X and assume that \(\mathbb {E}[X^2] < \infty \). To reflect the invariance of the family of normal distributions \({\mathcal {N}}\) with respect to affine transformations, the proposed statistics only depend on the so-called scaled residuals, namely \(Y_{n,1}, \ldots , Y_{n,n}\),

$$\begin{aligned} Y_{n,j} = \frac{X_j - {\overline{X}}_n}{S_n}, \end{aligned}$$

where \({\overline{X}}_n = n^{-1} \sum _{k=1}^{n} X_k\) and \(S_n^2 = n^{-1} \sum _{k=1}^{n}(X_k - {\overline{X}}_n)^2\) are the sample mean and variance, respectively. This way, the values of our statistics themselves and thus the tests based on them are invariant under affine transformations of the data. We note that if X has a normal distribution with some parameters \(\mu \) and \(\sigma ^2\), \(Y_{n,1}\) is approximately standard normal since \(({\overline{X}}_n, S_n^2)\) is a strongly consistent estimator of \((\mu , \sigma ^2)\). Due to the affine invariance, we assume, w.l.o.g., \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\).

In view of Theorem 1 and the heuristics given thereafter, we suggest the Cramér–von Mises-type (or weighted \(L^2\)-type) test statistics

$$\begin{aligned} G_n^{(1)} = n \int _{\mathbb {R}} \left( \frac{1}{n} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \} - \frac{1}{n} \sum _{j=1}^{n} \mathbb {1}\{ Y_{n,j} \le t \} \right) ^2 \omega (t) \, \mathrm {d}t\quad \end{aligned}$$
(5)

and

$$\begin{aligned} G_n^{(2)} = n \int _{\mathbb {R}} \left( \frac{1}{n} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \} - \varPhi (t) \right) ^2 \omega (t) \, \mathrm {d}t . \end{aligned}$$
(6)

Here, \(n^{-1} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \}\) is an empirical version of the zero-bias distribution function and \(n^{-1} \sum _{j=1}^{n} \mathbb {1}\{ Y_{n,j} \le t \}\) is the empirical distribution function of \(Y_{n,1}, \ldots , Y_{n,n}\). By \(\omega : \mathbb {R}\rightarrow \mathbb {R}\), we denote a positive, continuous weight function satisfying

$$\begin{aligned} \int _{\mathbb {R}} t^6 \, \omega (t) \, \mathrm {d}t < \infty \end{aligned}$$
(7)

and

$$\begin{aligned} n \int _{\mathbb {R}} \left| \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) - \omega (s) \right| ^{3} \big ( \omega (s) \big )^{-2} \mathrm {d}s = o_{\mathbb {P}}(1), \end{aligned}$$
(8)

where \(o_{\mathbb {P}}(1)\) denotes convergence to 0 in probability as \(n \rightarrow \infty \). A test based on \(G_{n}^{(1)}\) or \(G_{n}^{(2)}\) rejects \(H_0\) for large values of the statistic. For the implementation of our tests, we need to specify the weight function \(\omega \). To that end, we use the density function of a centred normal distribution

$$\begin{aligned} \omega _a (t) = \frac{1}{\sqrt{2 \pi a}} \, e^{- \frac{t^2}{2 a}}, \end{aligned}$$

where the variance is chosen to be some tuning parameter \(a > 0\). We prove in Lemma 2 of “Appendix A” that \(\omega _a\) satisfies the above conditions. Note that this type of weight has also been employed by Henze and Zirkler (1990). For this explicit function, our statistics have the expressions

$$\begin{aligned} G_{n,a}^{(1)}&= \frac{2}{n} \sum \limits _{1 \le j < k \le n} \left\{ \phantom {\exp \left( - \tfrac{Y_{(k)}^2}{2 a}\right) } \left( 1 - \varPhi \left( \tfrac{Y_{(k)}}{\sqrt{a}} \right) \right) \left( (Y_{(j)}^2 - 1)(Y_{(k)}^2 - 1) + a Y_{(j)} Y_{(k)} \right) \right. \nonumber \\&\quad \left. \, + \frac{a}{\sqrt{2 \pi a}} \, \exp \left( - \tfrac{Y_{(k)}^2}{2 a}\right) \left( - Y_{(j)}^2 Y_{(k)} + Y_{(k)} + Y_{(j)} \right) \right\} \nonumber \\&\quad + \frac{1}{n} \sum \limits _{j=1}^{n} \left\{ \phantom {\frac{a}{\sqrt{2 \pi a}}} \left( 1 - \varPhi \left( \tfrac{Y_{j}}{\sqrt{a}}\right) \right) \left( Y_j^4 + (a - 2) Y_j^2 + 1 \right) \right. \nonumber \\&\quad \left. + \frac{a}{\sqrt{2 \pi a}} \, \exp \left( - \tfrac{Y_j^2}{2 a}\right) \left( 2 Y_j - Y_j^3 \right) \right\} \end{aligned}$$
(9)

and

$$\begin{aligned} G_{n,a}^{(2)}&= \frac{2}{n} \sum \limits _{1 \le j < k \le n} \left\{ Y_{(j)} Y_{(k)} \left[ \left( Y_{(j)} Y_{(k)} + a \right) \left( 1 - \varPhi \left( \tfrac{Y_{(k)}}{\sqrt{a}} \right) \right) - a Y_{(j)} \, \omega _a(Y_{(k)}) \right] \right\} \nonumber \\&\quad + \sum \limits _{j=1}^{n} \Bigg \{\frac{Y_j^2}{n} \left[ (Y_j^2 + a) \left( 1 - \varPhi \left( \tfrac{Y_j}{\sqrt{a}} \right) \right) - a Y_j \, \omega _a(Y_j) \right] \nonumber \\&\quad - 2 Y_j \Bigg [Y_j \int _{Y_j}^{\infty } \varPhi (t) \, \omega _a (t) \, \mathrm {d}t - a \varPhi (Y_j) \, \omega _a(Y_j) \nonumber \\&\quad - \frac{a}{\sqrt{2 \pi (1+a)}} \left( 1 - \varPhi \left( \sqrt{\tfrac{1 + a}{a}} \, Y_j \right) \right) \Bigg ] \Bigg \} \nonumber \\&\quad + n \int _{\mathbb {R}} \varPhi (t)^2 \, \omega _a(t) \, \mathrm {d}t , \end{aligned}$$
(10)

where \(Y_{1}, \ldots , Y_{n}\) is shorthand for the normalized sample \(Y_{n, 1}, \ldots , Y_{n,n}\) and \(Y_{(1)} \le \cdots \le Y_{(n)}\) is the ordered sample. Those expressions make the statistics amenable to computations and, with critical values like those given in Sect. 6, the tests can be implemented immediately for any fixed \(a > 0\).
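As a cross-check of these closed-form expressions, the following R sketch (our illustration, not the implementation used in the paper; the function name G_stat is ours) evaluates \(G_{n,a}^{(1)}\) and \(G_{n,a}^{(2)}\) directly from the defining integrals (5) and (6) by numerical quadrature; up to quadrature error, the values should agree with (9) and (10).

```r
# Direct numerical evaluation of the statistics (5) and (6) with the Gaussian weight w_a
G_stat <- function(x, a = 1, type = 1) {
  n <- length(x)
  y <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))     # scaled residuals Y_{n,1}, ..., Y_{n,n}
  integrand <- function(t) sapply(t, function(s) {
    Fzb <- mean(y * (y - s) * (y <= s))                # empirical zero-bias distribution function
    ref <- if (type == 1) mean(y <= s) else pnorm(s)   # empirical d.f. (type 1) or Phi (type 2)
    (Fzb - ref)^2 * dnorm(s, sd = sqrt(a))             # squared difference times weight w_a
  })
  n * integrate(integrand, -Inf, Inf)$value
}

# Example: both statistics for a standard normal sample of size 50 with a = 1
set.seed(1)
x <- rnorm(50)
c(G1 = G_stat(x, a = 1, type = 1), G2 = G_stat(x, a = 1, type = 2))
```

In practice, the closed forms (9) and (10) are preferable for speed; the quadrature version is mainly useful as a sanity check.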

The tuning parameter a determines the decay of the weight function. For tests based on the Laplace or the Fourier transform, the properties that those transformations reflect on the underlying distribution often give a good heuristic for which values of the tuning parameter lead to a high power of the test (see Baringhaus et al. 2000 for examples and explanations). Since the zero-bias transformation is known to preserve many properties of the original distribution, we expect that, at least for our first statistic, the tuning parameter will have little influence on the power for most alternative distributions. Indeed, we will observe that both tests are very stable in this regard. Nevertheless, for some (symmetric) alternative distributions the choice of the tuning parameter is crucial; therefore, we additionally implement our test with an adaptive, data-dependent choice as proposed by Allison and Santana (2015). Particularly interesting is the case \(a \searrow 0\). Here, Baringhaus et al. (2000) have shown that, after suitable rescaling, this limit can be obtained explicitly for many test statistics by using an Abelian theorem for the Laplace transform. (Note that due to different parametrization they let \(a \rightarrow \infty \).) For our statistics, we have

$$\begin{aligned} 2 n \lim \limits _{a \, \searrow \, 0} G_{n, a}^{(1)} = \left( \sum _{j = 1}^{n} (Y_{j}^2 - 1) \, \mathbb {1}\{ Y_{j} \le 0 \} \right) ^2 + \left( \sum _{j = 1}^{n} (Y_{j}^2 - 1) \, \mathbb {1}\{ Y_{j} < 0 \} \right) ^2 \end{aligned}$$
(11)

and

$$\begin{aligned} 2 n \lim \limits _{a \, \searrow \, 0} G_{n, a}^{(2)} = \left( \frac{n}{2} - \sum _{j = 1}^{n} Y_{j}^2 \, \mathbb {1}\{ Y_{j} \le 0 \} \right) ^2 + \left( \frac{n}{2} - \sum _{j = 1}^{n} Y_{j}^2 \, \mathbb {1}\{ Y_{j} < 0 \} \right) ^2, \end{aligned}$$
(12)

that is, in the limit \(a \searrow 0\), \(G_{n,a}^{(1)}\) and \(G_{n,a}^{(2)}\) reject the normality hypothesis for large values of the respective limits in (11) and (12). A proof of those limit relations is given in “Appendix C”. If the underlying distribution of X is continuous, the indicator functions in the above limits are equal almost surely, and the terms can be simplified. A related question is the limit for \(a \rightarrow \infty \). Starting from (9), direct but tedious calculations, mostly involving L’Hospital’s rule, give

$$\begin{aligned} \sqrt{2 \pi }\, n \lim \limits _{a \, \rightarrow \, \infty } \sqrt{a} \, G_{n,a}^{(1)} =&\sum \limits _{1 \le j < k \le n} \Big \{ 2 Y_{(j)}^2 Y_{(k)} - Y_{(j)} Y_{(k)}^2 + \frac{1}{3} Y_{(j)} Y_{(k)}^4 - Y_{(j)}^2 Y_{(k)}^3 \Big \} \\&+ \sum \limits _{j=1}^{n} \left\{ - \frac{1}{3} Y_{(j)}^5 + j \, Y_{(j)}^3 - 2(j - 1) Y_{(j)} \right\} \end{aligned}$$

We omit the calculations as they provide no further insight. It remains open whether a similar limit exists for the second statistic \(G_{n,a}^{(2)}\).
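For completeness, the rescaled limit statistics in (11) and (12) are straightforward to evaluate; the following R lines are a direct transliteration of those formulas (the function name limit_a0 is ours) for a standardized sample y.

```r
# Rescaled limits of the statistics as a -> 0; the returned values equal
# 2n * lim_{a -> 0} G_{n,a}^(k), cf. (11) for type = 1 and (12) for type = 2
limit_a0 <- function(y, type = 1) {
  n <- length(y)
  if (type == 1) {
    sum((y^2 - 1) * (y <= 0))^2 + sum((y^2 - 1) * (y < 0))^2
  } else {
    (n / 2 - sum(y^2 * (y <= 0)))^2 + (n / 2 - sum(y^2 * (y < 0)))^2
  }
}
```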

Having discussed the framework for the implementation of our tests, it remains to introduce the setting for our theoretical studies. Namely, to develop the asymptotic theory, we let \({\mathcal {B}}\) be the Borel-\(\sigma \)-field of \(\mathbb {R}\) and \({\mathcal {L}}^1\) the Lebesgue measure on \(\mathbb {R}\), and consider the Hilbert space

$$\begin{aligned} {\mathcal {H}} = L^2(\mathbb {R}, {\mathcal {B}}, \omega \, \mathrm {d} {\mathcal {L}}^1) \end{aligned}$$

of measurable, square-integrable functions \(f: \mathbb {R}\rightarrow \mathbb {R}\). Notice that the functions figuring within the integral in the definition of \(G_n^{(1)}\) and \(G_n^{(2)}\) are \(({\mathcal {A}}\,\otimes \,{\mathcal {B}}, {\mathcal {B}})\)-measurable and random elements of \({\mathcal {H}}\). We denote by

$$\begin{aligned} \left||f \right||_{{\mathcal {H}}} = \left( \int _{\mathbb {R}} \big |f(t)\big |^2 \, \omega (t) \, \mathrm {d}t \right) ^{1/2}, \qquad \langle f, g \rangle _{{\mathcal {H}}}=\int _{\mathbb {R}} f(t)g(t) \, \omega (t) \, \mathrm {d}t \end{aligned}$$

the usual norm as well as the usual inner product in \({\mathcal {H}}\). Furthermore, we write \({\mathcal {U}}_n(s) \approx {\mathcal {V}}_n(s)\) whenever

$$\begin{aligned} \left||{\mathcal {U}}_n - {\mathcal {V}}_n \right||_{{\mathcal {H}}} = o_{\mathbb {P}}(1) . \end{aligned}$$

Here, \({\mathcal {U}}_n\) and \({\mathcal {V}}_n\) are random elements of our Hilbert space. For the approximations associated with this notation, Lemma 3 stated in “Appendix A” will be essential. We have also deferred some asymptotic expansions to “Appendix B” so it is easier to grasp the main ideas of the proofs. In the following, we denote convergence in distribution by \(\overset{{\mathcal {D}}}{\longrightarrow }\) and write \(O_{\mathbb {P}}(1)\) for boundedness in probability.

3 The limit null distributions

Our first results for the statistics concern the study of their behaviour under the hypothesis. In particular, we derive the limit distributions for \(n \rightarrow \infty \) when the normality hypothesis (1) holds. Therefore, we assume in this section that \(X, X_1, X_2, \ldots \) are iid. random variables with \(\mathbb {P}^X = {\mathcal {N}}(0, 1)\). By \(\varphi \), we denote the density function of the standard normal law.

Theorem 2

There exists a centred Gaussian element \({\mathcal {W}}^{(2)}\) of \({\mathcal {H}}\) with covariance kernel

$$\begin{aligned} {\mathcal {K}}^{(2)} (s,t)= & {} \big (2(s + t) - (st + 3)(s \wedge t) + (s + t)(s \wedge t)^2 - (s \wedge t)^3\big ) \varphi (s \wedge t)\\&{}+{} (st + 3) \big (\varPhi (s \wedge t) - \varPhi (s) \varPhi (t)\big ) + (t - 2s) \varphi (t) \varPhi (s)\\&{}+{} (s - 2t) \varphi (s) \varPhi (t) - \frac{st}{2} \varphi (s) \varphi (t) - 4 \varphi (s) \varphi (t), \quad s, t \in \mathbb {R}, \end{aligned}$$

where \(s \wedge t = \min \{ s, t \}\), such that, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Proof

Note that a simple change of variable in the integral gives

$$\begin{aligned} G_n^{(2)} = \frac{1}{S_n} \int _{\mathbb {R}} \big |\sqrt{n} \, U_n(s)\big |^2 \, \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) \mathrm {d}s , \end{aligned}$$
(13)

where

$$\begin{aligned} U_n(s) = {\widehat{F}}_n^X (s) - \varPhi \left( \frac{s - {\overline{X}}_n}{S_n} \right) \end{aligned}$$

and

$$\begin{aligned} {\widehat{F}}_n^X (s) = \frac{1}{n} \sum _{j=1}^{n} \frac{X_j - {\overline{X}}_n}{S_n^2} \, (X_j - s) \, \mathbb {1}\{X_j \le s\}, \quad s \in \mathbb {R}. \end{aligned}$$

The idea of the proof is to show that \(\sqrt{n} \, U_n\) converges weakly to the Gaussian element of \({\mathcal {H}}\) stated in the theorem and to use Lemma 3 from “Appendix A” to replace the shifted weight function in the integral above by \(\omega (s)\). Indeed, with (26) from Lemmata 4 and 5 we have

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx&\frac{1}{S_n^2} \, \sqrt{n} \left\{ \frac{1}{n} \sum \limits _{j=1}^{n} X_j (X_j - s) \mathbb {1}\{ X_j \le s \} - {\overline{X}}_n \, \mathbb {E}\big [ (X - s) \mathbb {1}\{X \le s\} \big ] \right. \\&\left. \, - \frac{1}{n} \sum \limits _{j=1}^{n} X_j^2 \, \varPhi (s) - S_n \, \varphi (s) \left( (1 - S_n) \cdot s - {\overline{X}}_n \right) \right\} . \end{aligned}$$

Since

$$\begin{aligned} \sqrt{n} \, (1 - S_n) = \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \frac{1}{2} \left( 1 - X_j^2 \right) + o_\mathbb {P}(1) , \end{aligned}$$

we obtain

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx \frac{1}{S_n^2} \, \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} W_j(s), \end{aligned}$$
(14)

where

$$\begin{aligned} W_j(s) =&\, X_j (X_j - s) \mathbb {1}\{ X_j \le s \} + X_j \big ( \varphi (s) + s \varPhi (s) \big ) \\&- X_j^2 \, \varPhi (s) - \left( \tfrac{1}{2} \, (1 - X_j^2) \cdot s - X_j \right) \varphi (s) . \end{aligned}$$

Notice that \(W_1, \ldots , W_n\) are iid. random elements of \({\mathcal {H}}\) with \(\mathbb {E}W_1 = 0\) (as \(F^X = \varPhi \) under \(H_0\), cf. Theorem 1) and \(\mathbb {E}\left||W_1 \right||_{\mathcal {H}}^2 < \infty \). The central limit theorem for separable Hilbert spaces, see Corollary 10.9 in Ledoux and Talagrand (2011), provides the existence of a centred Gaussian element \({\mathcal {W}}^{(2)} \in {\mathcal {H}}\) with

$$\begin{aligned} \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} W_j \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {W}}^{(2)} \quad \text {in } {\mathcal {H}} . \end{aligned}$$

By (14), \(\left||\sqrt{n} \, U_n \right||_{{\mathcal {H}}} = O_\mathbb {P}(1)\) and Lemma 4 implies \(\mathop {\sup }\nolimits _{s \, \in \, \mathbb {R}} \big | U_n(s) \big | \le 2\) \(\mathbb {P}\)-almost surely (a.s.) for each \(n \in \mathbb {N}\). Thus, with Lemma 3, (13) reads as

$$\begin{aligned} G_n^{(2)} = \frac{1}{S_n} \left||\sqrt{n} \, U_n \right||_{{\mathcal {H}}}^2 + o_{\mathbb {P}}(1) . \end{aligned}$$

The continuous mapping theorem and Slutsky’s lemma imply

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Since the function \({\mathcal {K}}^{(2)}\) defined in the statement of the theorem satisfies \({\mathcal {K}}^{(2)}(s,t) = \mathbb {E}[ W_1(s) \, W_1(t) ]\), it is the covariance kernel of \({\mathcal {W}}^{(2)}\) and we are done. \(\square \)

For \(G_n^{(1)}\), the limit distribution under the hypothesis can be obtained in a similar manner. Starting with

$$\begin{aligned} G_n^{(1)} = \frac{n}{S_n} \int _{\mathbb {R}} \left( {\widehat{F}}_n^X (s) - \frac{1}{n} \sum _{j=1}^{n} \mathbb {1}\{ X_j \le s \} \right) ^2 \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) \, \mathrm {d}s , \end{aligned}$$
(15)

the reasoning closely parallels that of the proof of Theorem 2.

Corollary 1

There exists a centred Gaussian element \({\mathcal {W}}^{(1)}\) of \({\mathcal {H}}\) with covariance kernel

$$\begin{aligned} {\mathcal {K}}^{(1)} (s,t)= & {} \big ((s + t) - (st + 1)(s \wedge t) + (s + t)(s \wedge t)^2 - (s \wedge t)^3\big ) \varphi (s \wedge t)\\&{}+{} (st + 2)\big (\varPhi (s \wedge t) - \varPhi (s) \varPhi (t)\big ) - s \varPhi (s) \varphi (t) - t \varPhi (t) \varphi (s) \\&{}-{} \varphi (s) \varphi (t), \quad s,t \in \mathbb {R}, \end{aligned}$$

such that, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(1)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(1)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Remark

The distribution of \(\left||{\mathcal {W}}^{(k)} \right||_{{\mathcal {H}}}^2\), \(k = 1, 2\), that is, the limit distribution of \(G_n^{(k)}\) under the hypothesis, is that of \(\sum _{j=1}^{\infty } \lambda _j^{(k)} N_j^2\). Here, \(N_1, N_2, \ldots \) are independent standard Gaussian random variables and \(\lambda _1^{(k)}, \lambda _2^{(k)}, \ldots \) are the nonzero eigenvalues of the operator

$$\begin{aligned} {\mathcal {H}} \rightarrow {\mathcal {H}}, \quad f \longmapsto \int _{\mathbb {R}} {\mathcal {K}}^{(k)} (\cdot , t) \, f(t) \, \omega (t) \, \mathrm {d}t , \end{aligned}$$

\(k = 1, 2\). Considering the complexity of \({\mathcal {K}}^{(k)}\), it does not seem possible to determine \(\lambda _j^{(k)}\) explicitly. Thus, in practice, critical values are obtained by simulation rather than by using asymptotic results. An alternative approach to obtain theoretically justified (approximate) critical values is to calculate the first four moments of the limit null distribution and fit a representative of the Pearson- or Johnson-family of distributions to those moments (see Henze 1990 for an example). Since we do not face any complications in computing the critical values, we will only pursue the empirical approach.

4 Contiguous alternatives

Adapting the arguments of Henze and Wagner (1997), we will derive non-degenerate limit distributions for our statistics under contiguous alternatives converging to the normal distribution at rate \(n^{-1/2}\). To this end, we introduce a triangular array of row-wise iid. random variables \(X_{n,1}, \ldots , X_{n,n}\), \(n \in \mathbb {N}\), with Lebesgue density

$$\begin{aligned} p_n (x) = \varphi (x) \cdot \left( 1 + \tfrac{1}{\sqrt{n}} \, c(x) \right) , \quad x \in \mathbb {R}. \end{aligned}$$

Here, \(c : \mathbb {R}\rightarrow \mathbb {R}\) is a measurable, bounded function satisfying

$$\begin{aligned} \int _{\mathbb {R}} c(x) \, \varphi (x) \, \mathrm {d}x = 0 . \end{aligned}$$

Notice that, by the boundedness of c, we may assume n to be large enough to ensure \(p_n \ge 0\). We set

$$\begin{aligned} \mu _n = \bigotimes \limits _{j=1}^{n} (\varphi {\mathcal {L}}^1) , \quad \nu _n = \bigotimes \limits _{j=1}^{n} (p_n {\mathcal {L}}^1) \end{aligned}$$

which are measures on \((\mathbb {R}^n, {\mathcal {B}}^n)\), where \({\mathcal {B}}^n\) is the Borel-\(\sigma \)-field of \(\mathbb {R}^n\). Clearly, \(\nu _n\) is absolutely continuous with respect to \(\mu _n\), and we may consider the Radon–Nikodym derivative \(L_n = \frac{\mathrm {d}\nu _n}{\mathrm {d}\mu _n}\). By a Taylor expansion,

$$\begin{aligned} \log \left( L_n (X_{n,1}, \ldots , X_{n,n}) \right)&= \sum \limits _{j=1}^{n} \log \left( 1 + \frac{1}{\sqrt{n}} \, c(X_{n,j}) \right) \\&= \sum \limits _{j=1}^{n} \left( \frac{1}{\sqrt{n}} \, c(X_{n,j}) - \frac{1}{2n} \, c(X_{n,j})^2 \right) + o_{\mathbb {P}}(1) \end{aligned}$$

whenever \((X_{n,1}, \ldots , X_{n,n})\) has distribution \(\mu _n\). (Note that in this case the triangular array essentially reduces to a sequence of iid. random variables with density \(\varphi \).) Therefore, viewing \(L_n\) as a random element \((\mathbb {R}^n, {\mathcal {B}}^n, \mu _n) \rightarrow (\mathbb {R}, {\mathcal {B}})\), the central limit theorem and the law of large numbers give

$$\begin{aligned} \log \left( L_n (X_{n,1}, \ldots , X_{n,n}) \right) \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {N}}\Big ( - \tfrac{\tau ^2}{2}, \, \tau ^2 \Big ) , \end{aligned}$$

where

$$\begin{aligned} \tau ^2 = \int _{\mathbb {R}} c(x)^2 \, \varphi (x) \, \mathrm {d}x \end{aligned}$$

and \(\overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow }\) denotes convergence in distribution under \(\mu _n\). By LeCam’s first Lemma (see, for instance, Hájek et al. 1999, p. 253, Corollary 1), \(\nu _n\) is contiguous to \(\mu _n\). Interpreting \(U_n\) from the proof of Theorem 2 as \(U_n : \mathbb {R}^n \rightarrow {\mathcal {H}}\), we have shown that \(\sqrt{n} \, U_n \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {W}}^{(2)}\) and (14) reads as

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx \frac{1}{S_n^2} \, W_n^*(s) , \quad s \in \mathbb {R}, \end{aligned}$$

where

$$\begin{aligned} W_n^*(s) = \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \eta (s, X_{n,j}) \end{aligned}$$

and

$$\begin{aligned} \eta (s,x) = x (x - s) \mathbb {1}\{ x \le s \} + x \big ( \varphi (s) + s \varPhi (s) \big ) - x^2 \, \varPhi (s) - \left( \tfrac{1}{2} \, (1 - x^2) \cdot s - x \right) \varphi (s) . \end{aligned}$$

Thus, by contiguity, the approximation \(\sqrt{n} \, U_n \approx S_n^{-2} \, W_n^*\) continues to hold under \(\nu _n\) and, in particular,

$$\begin{aligned} G_n^{(2)} = \left||W_n^* \right||_{{\mathcal {H}}}^2 + o_{\mathbb {P}}(1) \quad \text {under } \nu _n . \end{aligned}$$
(16)

Defining

$$\begin{aligned} \zeta (s) = \int _{\mathbb {R}} \eta (s,x) \, c(x) \, \varphi (x) \, \mathrm {d}x , \quad s \in \mathbb {R}, \end{aligned}$$

we have, under \(\mu _n\),

$$\begin{aligned} \mathbb {E}\big [ \eta (s, X_{n,1}) \, c(X_{n,1}) \big ] = \zeta (s) , \quad s \in \mathbb {R}, \end{aligned}$$

with \(\zeta \in {\mathcal {H}}\). Consequently, for any \(k \in \mathbb {N}\), \(v \in \mathbb {R}^k\) and \(s_1, \ldots , s_k \in \mathbb {R}\), the multivariate central limit theorem, the law of large numbers and Slutsky’s lemma imply

Here, \(\varSigma = \left( {\mathcal {K}}^{(2)}(s_i, s_j) \right) _{1 \le i,j \le k}\), with \({\mathcal {K}}^{(2)}\) the covariance kernel of \({\mathcal {W}}^{(2)}\) from Theorem 2, and \(\zeta _k = \big ( \zeta (s_1), \ldots , \zeta (s_k) \big )^\top \). Therefore,

and LeCam’s third Lemma (see Hájek et al. 1999, p. 259, Lemma 2) implies

$$\begin{aligned} W_n^* \overset{\mathrm {fdd}}{\longrightarrow } {\mathcal {W}}^{(2)} + \zeta , \end{aligned}$$
(17)

where \(\overset{\mathrm {fdd}}{\longrightarrow }\) denotes convergence of the finite-dimensional distributions (under \(\nu _n\)). In the proof of Theorem 2, we have shown that \(W_n^* \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {W}}^{(2)}\) in \({\mathcal {H}}\), which entails the tightness of \(\{W_n^* \, | \, n \in \mathbb {N}\}\) under \(\mu _n\). As \(\{W_n^* \, | \, n \in \mathbb {N}\}\) remains tight under \(\nu _n\) by contiguity, (17) yields

$$\begin{aligned} W_n^* \overset{{\mathcal {D}}_{\nu _n}}{\longrightarrow } {\mathcal {W}}^{(2)} + \zeta \quad \text {in } {\mathcal {H}} . \end{aligned}$$

Combining this with (16), we have shown the following theorem.

Theorem 3

Under the triangular array \(X_{n,1} , \ldots , X_{n,n}\) with \(\mathbb {P}^{X_{n,1}} = p_n {\mathcal {L}}^1\), we have, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} + \zeta \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Since Corollary 1 is obtained with the same line of proof used in Theorem 2, we can likewise conclude

$$\begin{aligned} G_n^{(1)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(1)} + {\widetilde{\zeta }} \right||_{{\mathcal {H}}}^2 , \end{aligned}$$

where \({\widetilde{\zeta }} (s) = \int {\widetilde{\eta }}(s,x) \, c(x) \, \varphi (x) \, \mathrm {d}x\) and

$$\begin{aligned} {\widetilde{\eta }}(s,x) = \big (x(x - s) - 1\big ) \mathbb {1}\{ x \le s \} + \left( 1 + x \cdot s - x^2\right) \varPhi (s) + x \varphi (s). \end{aligned}$$

From these statements, we discern that tests based on any of our statistics are able to detect contiguous alternatives which converge, at rate \(n^{-1/2}\), to the class of normal distributions. For further insights on contiguity, we refer to Roussas (1972) and Sen (1981).

5 Consistency and limit distributions under fixed alternatives

The major goal of this section is to establish that our test procedures can detect any fixed alternative satisfying a weak moment condition. For some of those distributions, we can even extend the consistency results and derive weak limits of our statistics.

We return to the nonparametric setting of Sect. 2 and let \(X, X_1, X_2, \ldots \) be iid. random variables with distribution function F and \(\mathbb {E}[X^2] < \infty \). Further, we assume \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\).

Theorem 4

As \(n \rightarrow \infty \), we have

$$\begin{aligned} \frac{G_n^{(1)}}{n} \longrightarrow \int _{\mathbb {R}} \left( F^X (s) - F(s) \right) ^2 \omega (s) \, \mathrm {d}s = \left||F^X - F \right||_{{\mathcal {H}}}^2 = \varDelta ^{(1)} \end{aligned}$$

and

$$\begin{aligned} \frac{G_n^{(2)}}{n} \longrightarrow \int _{\mathbb {R}} \left( F^X (s) - \varPhi (s) \right) ^2 \omega (s) \, \mathrm {d}s = \left||F^X - \varPhi \right||_{{\mathcal {H}}}^2 = \varDelta ^{(2)} , \end{aligned}$$

where each convergence is in probability.

Proof

We denote by \({\widehat{F}}_n\) the empirical distribution function of \(X_1, \ldots , X_n\). By the classical Glivenko–Cantelli theorem and (25) from Lemma 4,

$$\begin{aligned}&\left| \left||{\widehat{F}}_n^X - {\widehat{F}}_n \right||_{{\mathcal {H}}}^2 - \varDelta ^{(1)} \right| \\&\quad \le 4 \left( \sup \limits _{s \, \in \, \mathbb {R}} \Big | {\widehat{F}}_n^X(s) - F^X(s) \Big | + \sup \limits _{s \, \in \, \mathbb {R}} \Big | {\widehat{F}}_n(s) - F(s) \Big | \right) \int _{\mathbb {R}} \omega (s) \, \mathrm {d}s \\&\quad \longrightarrow 0 \end{aligned}$$

\(\mathbb {P}\)-a.s., as \(n \rightarrow \infty \). Rewriting \(G_n^{(1)}\) as in (15) and applying the second part of Lemma 3, we have

$$\begin{aligned} \frac{G_n^{(1)}}{n} = \left||{\widehat{F}}_n^X - {\widehat{F}}_n \right||_{{\mathcal {H}}}^2 + o_\mathbb {P}(1). \end{aligned}$$

The proof of the second claim is almost identical. \(\square \)

We recall that a level-\(\alpha \)-test based on \(G_n^{(k)}\), \(k = 1, 2\), rejects the hypothesis if \(G_n^{(k)} > c_n^{(k)}\), where \(c_n^{(k)}\) is the \((1 - \alpha )\)-quantile of the distribution of \(G_n^{(k)}\) under \(H_0\). Theorem 2 (and Corollary 1) ensures that \(\sup _{n \, \in \, \mathbb {N}} c_n^{(k)} < \infty \). Now, by Theorem 1, the limits \(\varDelta ^{(1)}\) and \(\varDelta ^{(2)}\) figuring in Theorem 4 are positive if X has a non-normal distribution. Consequently,

$$\begin{aligned} \mathbb {P}\left( G_n^{(k)}> c_n^{(k)} \right) \ge \mathbb {P}\left( n^{-1} G_n^{(k)} > n^{-1} \sup _{n \, \in \, \mathbb {N}} c_n^{(k)} \right) \longrightarrow 1, \end{aligned}$$

as \(n \rightarrow \infty \), and a level-\(\alpha \)-test based on \(G_n^{(1)}\) or \(G_n^{(2)}\) is consistent against each alternative with existing second moment.

The following theorem concerns the limit distributions of our statistics under fixed alternatives.

Theorem 5

Let \(X, X_1, X_2, \ldots \) be iid., non-normal random variables with distribution function F and \(\mathbb {E}X^4 < \infty \). Assume that X has a continuously differentiable density function p satisfying \(\sup _{s \, \in \, \mathbb {R}} |p(s)| \le K_1 < \infty \) and \(\sup _{s \, \in \, \mathbb {R}} |p^{\prime }(s)| \le K_2 < \infty \). W.l.o.g., \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\). Then, as \(n \rightarrow \infty \),

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}\big ( 0, \tau _{(1)}^2 \big ) , \end{aligned}$$
(18)

where

$$\begin{aligned} \tau _{(1)}^2 = 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(1)}(s,t) \left( F^X(s) - F(s) \right) \left( F^X(t) - F(t) \right) \, \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t \end{aligned}$$

with

$$\begin{aligned} {\mathfrak {C}}^{(1)}(s,t) = \mathbb {E}\big [ C^{(1)}(s) \, C^{(1)}(t) \big ], \quad s,t \in \mathbb {R}, \end{aligned}$$

and

$$\begin{aligned} C^{(1)}(s) =&\big (X (X - s) - 1\big ) \mathbb {1}\{ X \le s \} - X^2 F^X(s) + \big ( 1 + X \cdot s \big ) F(s) \\&- \left( \tfrac{1}{2} (1 - X^2) \cdot s - X \right) p(s) + \left( 2X - \tfrac{1}{2} (1 - X^2) \cdot s \right) d^X(s) , \quad s \in \mathbb {R}. \end{aligned}$$

Proof

Setting

$$\begin{aligned} V_n (s) = {\widehat{F}}_n^X (s) - \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F^X \left( \frac{s - {\overline{X}}_n}{S_n} \right) + F \left( \frac{s - {\overline{X}}_n}{S_n} \right) ,\quad s \in \mathbb {R}, \end{aligned}$$

a change of variable in both integrals and an integral decomposition, as used by Chapman (1958), gives

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) =&~ \frac{1}{S_n} \left\{ 2 \int _{\mathbb {R}} \sqrt{n} \, V_n(s) \cdot \left[ F^X ({\widetilde{s}}) - F ({\widetilde{s}}) \right] \omega ({\widetilde{s}}) \, \mathrm {d}s \right. \nonumber \\&\left. + \frac{1}{\sqrt{n}} \int _{\mathbb {R}} \big | \sqrt{n} \, V_n(s) \big |^2 \, \omega ({\widetilde{s}}) \, \mathrm {d}s \right\} , \end{aligned}$$

where \({\widetilde{s}} = (s - {\overline{X}}_n) / S_n\). Under the assumption \(\left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}} = O_\mathbb {P}(1)\), Hölder’s inequality, Lebesgue’s theorem and Slutsky’s lemma give

$$\begin{aligned}&\left| \int _{\mathbb {R}} \sqrt{n} \, V_n(s) \cdot \left[ F^X ({\widetilde{s}}) - F ({\widetilde{s}}) \right] \omega (s) \, \mathrm {d}s - \big \langle \sqrt{n} \, V_n , \, F^X - F \big \rangle _{{\mathcal {H}}} \right| \\&\quad \le \left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}} \left( \int _{\mathbb {R}} \big | F^X({\widetilde{s}}) - F^X(s) + F(s) - F({\widetilde{s}}) \big |^2 \omega (s) \, \mathrm {d}s \right) ^{1/2} \\&\quad = o_\mathbb {P}(1). \end{aligned}$$

Here, we used that both \(F^X\) and F are continuous distribution functions. Using that \(\sup _{s \, \in \, \mathbb {R}} \big |V_n(s)\big | \le 4\)\(\mathbb {P}\)-a.s. for each \(n \in \mathbb {N}\) (note that \({\widehat{F}}_n^X\) is a distribution function as well by Lemma 4), Lemma 3 from “Appendix A” yields

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) = 2 \, \big \langle \sqrt{n} \, V_n , \, F^X - F \big \rangle _{{\mathcal {H}}} + \frac{1}{\sqrt{n}} \left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}}^2 + o_\mathbb {P}(1). \end{aligned}$$
(19)

To verify the above assumption, we show that \(\sqrt{n} \, V_n\) converges in distribution in \({\mathcal {H}}\). In this regard, (26) and Lemma 5 imply

$$\begin{aligned} \sqrt{n} \, V_n(s) \approx&\, \frac{\sqrt{n}}{S_n^2} \left\{ \frac{1}{n} \sum \limits _{j=1}^{n} X_j (X_j - s) \mathbb {1}\{ X_j \le s \} - {\overline{X}}_n \, \mathbb {E}\big [ (X - s) \mathbb {1}\{X \le s\} \big ] \right. \\&\left. - S_n^2 \left( \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F(s) \right) - \frac{1}{n} \sum \limits _{j=1}^{n} X_j^2 \, F^X(s) \right. \\&\left. - S_n \left( (1 - S_n) \cdot s - {\overline{X}}_n \right) \left( d^X(s) - p(s) \right) \phantom {\sum \limits _{j}^{n}}\right\} . \end{aligned}$$

By the classical Glivenko–Cantelli theorem and \(\sqrt{n} \, (S_n^2 - 1) = O_{\mathbb {P}}(1)\),

$$\begin{aligned} \sqrt{n} \, S_n^2 \left( \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F(s) \right) \approx \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \left( \mathbb {1}\{ X_j \le s \} - F(s) \right) . \end{aligned}$$

Together with the expansion \(\sqrt{n} \, (1 - S_n) = n^{-1/2} \sum _{j=1}^{n} \frac{1}{2} (1 - X_j^2) + o_\mathbb {P}(1)\), we have

$$\begin{aligned} \sqrt{n} \, V_n(s) \approx \frac{1}{S_n^2} \, \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} Z_j(s) , \end{aligned}$$
(20)

where

$$\begin{aligned} Z_j(s) =&\, X_j (X_j - s) \mathbb {1}\{ X_j \le s \} + X_j \left( d^X(s) + s F(s) \right) - X_j^2 \, F^X(s) \\&- \mathbb {1}\{ X_j \le s \} + F(s) - \left( \tfrac{1}{2} (1 - X_j^2) \cdot s - X_j \right) \left( d^X(s) - p(s) \right) . \end{aligned}$$

Since \(Z_1, \ldots , Z_n\) are iid. random elements of \({\mathcal {H}}\) with \(\mathbb {E}Z_1 = 0\) as well as \(\mathbb {E}\left||Z_1 \right||_{\mathcal {H}}^2 < \infty \), the central limit theorem for separable Hilbert spaces implies

$$\begin{aligned} \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} Z_j \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {Z}} \quad \text {in } {\mathcal {H}} , \end{aligned}$$

where \({\mathcal {Z}} \in {\mathcal {H}}\) is a centred Gaussian element. In particular, \(\left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}}\) is bounded in probability by (20), and (19) holds. The continuous mapping theorem and Slutsky’s lemma imply

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) \overset{{\mathcal {D}}}{\longrightarrow } 2 \, \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}} . \end{aligned}$$

Denoting the covariance kernel of \({\mathcal {Z}}\) by

$$\begin{aligned} {\mathfrak {C}}^{(1)}(s,t) = \mathbb {E}\big [ {\mathcal {Z}}(s) \, {\mathcal {Z}}(t) \big ], \end{aligned}$$

the limiting random variable \(2 \, \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}}\) has the normal distribution \({\mathcal {N}}(0, \tau _{(1)}^2)\), where

$$\begin{aligned} \tau _{(1)}^2&= 4 \, \mathbb {E}\Big [ \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}}^2 \Big ] \\&= 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(1)}(s,t) \left( F^X(s) - F(s) \right) \left( F^X(t) - F(t) \right) \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t . \end{aligned}$$

\(\square \)

Applying the reasoning of Theorem 5 to \(G_n^{(2)}\), the function \(\varPhi \) drops out of the decomposition of the integrals. Proceeding with the remaining terms exactly as before, we obtain an analogous statement for the second statistic under slightly weaker conditions.

Corollary 2

Let \(X, X_1, X_2, \ldots \) be iid., non-normal random variables with distribution function F, Lebesgue density p and \(\mathbb {E}X^4 < \infty \). Further, assume \(\sup _{s \, \in \, \mathbb {R}} |p(s)| < \infty \) and \(\mathbb {E}X = 0\), \(\mathbb {V}(X) = 1\). Then, as \(n \rightarrow \infty \),

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(2)}}{n} - \varDelta ^{(2)} \right) \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}\big ( 0, \tau _{(2)}^2 \big ) , \end{aligned}$$
(21)

where

$$\begin{aligned} \tau _{(2)}^2 = 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(2)}(s,t) \left( F^X(s) - \varPhi (s) \right) \left( F^X(t) - \varPhi (t) \right) \, \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t \end{aligned}$$

with

$$\begin{aligned} {\mathfrak {C}}^{(2)}(s,t) = \mathbb {E}\big [ C^{(2)}(s) \, C^{(2)}(t) \big ],\quad s,t\in \mathbb {R}, \end{aligned}$$

and

$$\begin{aligned} C^{(2)}(s) =&\, X (X - s) \mathbb {1}\{ X \le s \} - X^2 \, F^X(s) + X \cdot s F(s) \\&+ \left( 2 X - \tfrac{1}{2} (1 - X^2) \cdot s \right) d^X(s), \quad s \in \mathbb {R}. \end{aligned}$$

Remark

Note that for Theorem 5, we adapted a line of proof put forward by Baringhaus et al. (2017). The asymptotic normality also qualifies our statistics for the applications they propose (see also Baringhaus and Henze 2017).

First, we fix \(\alpha \in (0,1)\) and denote by \(q_{\alpha } = \varPhi ^{-1} (1 - \alpha /2)\) the \((1 - \alpha /2)\)-quantile of the standard normal distribution. Letting

$$\begin{aligned} {\widehat{\tau }}_{(k),n}^2 = {\widehat{\tau }}_{(k),n}^2 (X_1, \ldots , X_n) \end{aligned}$$

be a (weakly) consistent estimator of \(\tau _{(k)}^2\), \(k = 1,2\), figuring in Theorem 5 and Corollary 2, respectively, (18) and (21) immediately indicate that

$$\begin{aligned} I_n = \left[ \frac{G_n^{(k)}}{n} - \frac{q_{\alpha } \, {\widehat{\tau }}_{(k),n}}{\sqrt{n}} , \, \frac{G_n^{(k)}}{n} + \frac{q_{\alpha } \, {\widehat{\tau }}_{(k),n}}{\sqrt{n}} \right] \end{aligned}$$
(22)

is an asymptotic confidence interval for \(\varDelta ^{(k)} = \varDelta ^{(k)}(F)\) at level \(1 - \alpha \). Here, F satisfies the assumptions of Theorem 5 (or Corollary 2). As was briefly explained in the introduction, one objective of Stein’s method for the normal distribution is to assess how close a given distribution is to being normal. Thus, seeing \(\varDelta ^{(1)}(F)\) and \(\varDelta ^{(2)}(F)\) as ‘measures’ of how far F differs from the standard normal distribution, we have also developed a procedure for empirical assessments of this kind (a small code sketch is given at the end of this remark).

Second, we emphasize that our statistics can be employed for inverse testing problems. Namely, if \(\varDelta _0 > 0\) is a given distance of tolerance, tests that reject \(H_{\varDelta _0}\) if

$$\begin{aligned} \frac{G_n^{(k)}}{n} \le \varDelta _0 - \frac{{\widehat{\tau }}_{(k),n}}{\sqrt{n}} \, \varPhi ^{-1}(1 - \alpha ) \end{aligned}$$

are asymptotic level-\(\alpha \)-tests for the problem

$$\begin{aligned} H_{\varDelta _0} : \varDelta ^{(k)}(F) \ge \varDelta _0 \text { against } K_{\varDelta _0} : \varDelta ^{(k)}(F) < \varDelta _0 . \end{aligned}$$

These tests are consistent against each alternative and aim at validating a whole nonparametric neighbourhood of the hypothesized, underlying normality. Unfortunately, the direct approach to obtain estimators for \(\tau ^2_{(k)}\) does not lead to feasible results.

Finally, we suppose \(\{ c_n^{(k)} \} \subset (0, \infty )\) is the sequence of critical values for a level-\(\alpha \)-test based on \(G_n^{(k)}\), \(k = 1,2\). For an alternative distribution F satisfying the relevant prerequisites of Theorem 5 or Corollary 2, we can approximate the power of the test against this alternative by

$$\begin{aligned} \mathbb {P}\left( G_n^{(k)} > c_n^{(k)} \right) \approx 1 - \varPhi \left( \frac{\sqrt{n}}{\tau _{(k)}} \left( \frac{c_n^{(k)}}{n} - \varDelta ^{(k)} \right) \right) . \end{aligned}$$
(23)

Note that this last application does not (in theory) require an estimator of \(\tau _{(k)}^2\). Instead, \(\tau _{(k)}^2\) and \(\varDelta ^{(k)}\) have to be calculated for the particular fixed alternative.
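Since no consistent estimator of \(\tau _{(k)}^2\) is available, one simple possibility, sketched here in R under the assumption that a nonparametric bootstrap adequately captures the sampling variability of \(G_n^{(k)}/n\), is to plug a bootstrap estimate of \(\tau _{(k)}\) into (22); this is in the spirit of the bootstrap procedure used for Table 7 in Sect. 6, the function G_stat() refers to the sketch in Sect. 2, and the name ci_delta and its defaults are ours.

```r
# Bootstrap-based version of the asymptotic confidence interval (22) for Delta^(k);
# tau_(k) is replaced by a nonparametric bootstrap estimate (an assumption, not a
# procedure with proven consistency).
ci_delta <- function(x, a = 1, type = 1, alpha = 0.1, B = 500) {
  n <- length(x)
  delta_hat <- G_stat(x, a = a, type = type) / n
  boot <- replicate(B, G_stat(sample(x, replace = TRUE), a = a, type = type) / n)
  tau_hat <- sqrt(n) * sd(boot)                  # bootstrap estimate of tau_(k)
  half <- qnorm(1 - alpha / 2) * tau_hat / sqrt(n)
  c(lower = delta_hat - half, upper = delta_hat + half)
}
```

Whether such a bootstrap estimate is consistent remains, as noted above, an open question.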

6 Empirical results

In this section, we investigate the behaviour of our two statistics given through the explicit formulas (9) and (10). It is organized as follows: First, we compare the two tests over a range of possible tuning parameters and alternative distributions. Based on these results, we choose our final procedure and describe its implementation. Then, a brief summary of the competing tests for an empirical power study is given. We display the performance of our test in comparison with the established tests in a finite-sample power study. Finally, we add results for the applications from the last section (as described in the Remark) for three alternative distributions. The simulations are performed using the statistical computing environment R, see R Core Team (2017). Notice that there are several comparative simulation studies for testing normality in the literature, as witnessed by Baringhaus et al. (1989), Farrell and Rogers-Stewart (2006), Landry and Lepage (1992), Pearson et al. (1977), Romão et al. (2010), Shapiro et al. (1968), Yap and Sim (2011) and others.

Since we consider two new families of tests both depending on the choice of the tuning parameter a, we will calculate the finite-sample power for a range of different parameters. In each simulation, we consider the sample sizes \(n = 20\), \(n = 50\) and \(n = 100\), and fix the nominal level of significance \(\alpha \) to 0.05. To implement the tests for any of the (fixed) values \(a \in \{0.1, 0.25, 0.5, 1, 1.5, 2, 3\}\), we calculate the critical values by a Monte Carlo simulation with 100,000 repetitions. These critical values for \(G_{n,a}^{(k)}\), \(k = 1, 2\), can be found in Tables 1 and 2 and are taken from there throughout the simulations.
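The Monte Carlo approximation of these critical values can be sketched in a few lines of R (our sketch, using G_stat() from Sect. 2; the number of replications is reduced here compared to the 100,000 used for Tables 1 and 2).

```r
# Monte Carlo approximation of the empirical (1 - alpha)-quantile of G_{n,a}^(k) under H_0
critical_value <- function(n, a, type = 1, alpha = 0.05, reps = 10000) {
  quantile(replicate(reps, G_stat(rnorm(n), a = a, type = type)), 1 - alpha)
}
# Example (slow due to numerical integration): critical_value(n = 20, a = 1)
```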

Table 1 Empirical 0.95 quantiles for \(G_{n,a}^{(1)}\) under \(H_0\) (100,000 replications)
Table 2 Empirical 0.95 quantiles for \(G_{n,a}^{(2)}\) under \(H_0\) (100,000 replications)

We choose the alternative distributions to fit the extensive power study of normality tests by Romão et al. (2010), in order to ease the comparison to other tests. Namely, we choose as symmetric distributions the Student \(t_\nu \)-distribution with \(\nu \in \{3, 5, 10\}\) degrees of freedom, as well as the uniform distribution \({\mathcal {U}}(-\sqrt{3}, \sqrt{3})\). The asymmetric distributions are the \(\chi ^2_\nu \)-distribution with \(\nu \in \{5, 15\}\) degrees of freedom, the Beta distributions B(1, 4) and B(2, 5), the Gamma distributions \(\varGamma (1, 5)\) and \(\varGamma (5, 1)\) parametrized by their shape and rate parameter, the Gumbel distribution Gum(1, 2) with location parameter 1 and scale parameter 2, the lognormal distribution LN(0, 1) as well as the Weibull distribution W(1, 0.5) with scale parameter 1 and shape parameter 0.5. As representatives of bimodal distributions, we take the mixture of normal distributions Mix\({\mathcal {N}}(p, \mu , \sigma ^2)\), where the random variables are generated by

$$\begin{aligned} (1 - p) \, {\mathcal {N}}(0, 1) + p \, {\mathcal {N}}(\mu , \sigma ^2), \quad p \in (0, 1), \, \mu \in \mathbb {R}, \, \sigma > 0. \end{aligned}$$

Each entry in Table 3 referring to the finite-sample power of the tests is based on 10,000 replications. From the results in this table, we infer that for asymmetric alternative distributions our tests perform almost identically and are extremely stable over the range of tuning parameters. For the symmetric and bimodal alternatives, however, the choice of a can have considerable influence on the power of the test. Moreover, with particular focus on the uniform distribution and the normal mixtures, the test based on \(G_{n,a}^{(1)}\) shows a significantly better performance than the one based on \(G_{n,a}^{(2)}\).

Table 3 Empirical rejection rates for \(G_{n,a}^{(k)}\), \(k = 1, 2\) (\(\alpha = 0.05\), 10,000 replications)

Taking this superiority of \(G_{n,a}^{(1)}\), and the fact that the choice of tuning parameter influences the power, into account, we propose as a final procedure a test based on \(G_{n,a}^{(1)}\) calculated by (9) with a data-dependent choice of the tuning parameter a. To implement the latter, we use the algorithm from Allison and Santana (2015), which has already been applied in the recent simulation study by Allison et al. (2017) for tests of exponentiality. Given the standardized sample \(Y_1, \ldots , Y_n\) as in Sect. 2, our test is carried out as follows (a code sketch follows the enumeration):

(a):

Fix a grid of possible tuning parameters \(a \in \{ a_1, \ldots , a_\ell \}\) (here: \(a \in \{ 0.1, 0.25, 0.5, 1, 1.5, 2, 3 \}\)).

(b):

Sample from \(Y_1, \ldots , Y_n\) with replacement and, for the obtained bootstrap sample, calculate \(G_{n,a_i}^{(1)}\), \(i = 1, \ldots , \ell \), via (9).

(c):

Repeat step (b) B times (here: \(B = 400\)) and denote the resulting values of the statistic by \(G_{1,a_i}^*, \ldots , G_{B,a_i}^*\), \(i = 1, \ldots , \ell \).

(d):

Calculate the bootstrap powers by \({\widehat{P}}_{a_i} = B^{-1} \sum _{b = 1}^{B} \mathbb {1}\{ G_{b,a_i}^* > c_{n, a_i}(\alpha ) \}\), \(i = 1, \ldots , \ell \), where \(c_{n, a_i}(\alpha )\) is the critical value for a level-\(\alpha \)-test based on \(G_{n,a_i}^{(1)}\) (for the Monte Carlo approximations, see Table 1).

(e):

Choose as the tuning parameter \({\widehat{a}} = \mathrm {arg} \max \{{\widehat{P}}_{a}|a\in \{a_{1},\cdots ,a_{\ell }\}\}\) and apply the test based on \(G_{n, {\widehat{a}}}^{(1)}\) to \(Y_1, \ldots , Y_n\).
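The following R sketch summarizes steps (a)–(e); it is our illustration, assuming G_stat() from the sketch in Sect. 2 and a vector crit of critical values \(c_{n, a_i}(\alpha )\) matching a_grid (cf. Table 1), with \(B = 400\) as in the text.

```r
# Data-dependent choice of the tuning parameter a via the bootstrap power criterion
adaptive_a <- function(y, a_grid, crit, B = 400) {
  exceed <- replicate(B, {
    yb <- sample(y, replace = TRUE)                           # step (b): bootstrap sample
    sapply(a_grid, function(a) G_stat(yb, a = a, type = 1))   # statistics for all a_i
  })                                                          # matrix: length(a_grid) x B
  p_hat <- rowMeans(exceed > crit)                            # step (d): bootstrap powers
  a_grid[which.max(p_hat)]                                    # step (e): selected a_hat
}
```

The chosen value adaptive_a(y, a_grid, crit) is then plugged into the test based on \(G_{n, {\widehat{a}}}^{(1)}\) applied to the original sample.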

We consider the following competitors to this test. As classical and well-known tests, we include the Shapiro–Wilk test (SW), see Shapiro and Wilk (1965), the Shapiro–Francia test (SF), see Shapiro and Francia (1972), and the Anderson–Darling test (AD), see Anderson and Darling (1952). For the implementation of these tests in R, we refer to the package nortest by Gross and Ligges (2015). Tests based on the empirical characteristic function are represented by the Baringhaus–Henze–Epps–Pulley test (BHEP), see Baringhaus and Henze (1988), Epps and Pulley (1983). The BHEP test with tuning parameter \(\beta > 0\) is based on

$$\begin{aligned} \hbox {BHEP} =&\frac{1}{n} \sum _{j, k = 1}^{n} \exp \left( - \frac{\beta ^2}{2} \left( Y_{j} - Y_{k}\right) ^2 \right) \\&\, - \frac{2}{\sqrt{1 + \beta ^2}} \sum _{j = 1}^{n} \exp \left( - \frac{\beta ^2}{2(1 + \beta ^2)} \, Y_{j}^2 \right) + \frac{n}{\sqrt{1 + 2\beta ^2}}, \end{aligned}$$

where \(Y_1, \ldots , Y_n\) is the standardized sample. We fix \(\beta = 1\) and take the critical values from Henze (1990) but also restate them in Table 4.
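For reference, the BHEP statistic as displayed above translates directly into R (our transliteration; y denotes the standardized sample).

```r
# BHEP statistic with tuning parameter beta, cf. the displayed formula
bhep_stat <- function(y, beta = 1) {
  n <- length(y)
  d <- outer(y, y, "-")                    # all pairwise differences Y_j - Y_k
  sum(exp(-beta^2 * d^2 / 2)) / n -
    2 / sqrt(1 + beta^2) * sum(exp(-beta^2 * y^2 / (2 * (1 + beta^2)))) +
    n / sqrt(1 + 2 * beta^2)
}
```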

Furthermore, we include the quantile correlation test of del Barrio–Cuesta-Albertos–Matrán–Rodríguez-Rodríguez (BCMR) based on the \(L^2\)-Wasserstein distance, see del Barrio et al. (1999) and Sect. 3.3 of del Barrio et al. (2000). The BCMR statistic is given by

$$\begin{aligned} \hbox {BCMR} = n \left( 1 - \frac{1}{S_n^2} \left( \sum _{k = 1}^n X_{(k)} \int _{\frac{k - 1}{n}}^{\frac{k}{n}} \varPhi ^{-1}(t) \, \mathrm {d}t \right) ^2 \right) - \int _{\frac{1}{n + 1}}^{\frac{n}{n + 1}} \frac{t (1 - t)}{\left( \varphi \left( \varPhi ^{-1}(t) \right) \right) ^2} \, \mathrm {d}t, \end{aligned}$$

where \(X_{(k)}\) is the k-th order statistic of \(X_1, \ldots , X_n\), \(S_n^2\) is the sample variance and \(\varPhi ^{-1}\) is the quantile function of the standard normal distribution. Simulated critical values can be found in the work of Krauczi (2009) or in Table 4.

The Henze–Jiménez-Gamero test (HJG), see Henze and Jiménez-Gamero (2018), uses a weighted \(L^2\)-distance between the empirical moment-generating function of the standardized sample and the moment-generating function of the standard normal distribution. The test is based on

$$\begin{aligned} \hbox {HJG}_\beta&= \frac{1}{n\sqrt{\beta }} \sum _{j,k = 1}^n \exp \left( \frac{(Y_j + Y_k)^2}{4 \beta } \right) - \frac{2}{\sqrt{\beta - 1/2}}\sum _{j = 1}^n \exp \left( \frac{Y_j^2}{4 \beta - 2}\right) \\&\quad + \frac{n}{\sqrt{\beta - 1}} \end{aligned}$$

with \(\beta > 2\). We consider the tuning parameters \(\beta \in \{2.5, 5, 10\}\). Since Henze and Jiménez-Gamero (2018) did not simulate critical values in the univariate case, we provide the empirical critical values in Table 4. This test was proposed only recently, so it is not yet included in any other power study. All of the simulated critical values displayed in Table 4 have been confirmed in a simulation with 100,000 replications (compare to Henze 1990; Krauczi 2009).
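Analogously, the HJG\(_\beta \) statistic can be transliterated as follows (our sketch; y is again the standardized sample and \(\beta > 2\)).

```r
# HJG statistic with tuning parameter beta > 2, cf. the displayed formula
hjg_stat <- function(y, beta = 2.5) {
  n <- length(y)
  s <- outer(y, y, "+")                    # all pairwise sums Y_j + Y_k
  sum(exp(s^2 / (4 * beta))) / (n * sqrt(beta)) -
    2 / sqrt(beta - 0.5) * sum(exp(y^2 / (4 * beta - 2))) +
    n / sqrt(beta - 1)
}
```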

Table 4 Empirical 0.95 quantiles for BCMR, BHEP and HJG\(_{\beta }\) under \(H_0\) (100,000 replications)
Table 5 Empirical rejection rates for competing procedures (\(\alpha = 0.05\), 10,000 replications)

In Table 5, we display the results of the competitive simulation study, where our test based on the steps (a)–(e) (with bootstrap size \(B = 400\) and values for a as before) is denoted by BE\(_{{\widehat{a}}}\). Each entry is based on 10,000 Monte Carlo replications, and the best-performing test for each distribution and sample size is highlighted for easy reference.

Starting with the symmetric distributions, we see that the SF and SW tests perform best for these models. Interestingly, the HJG\(_\beta \) test has the highest power against Student’s \(t_{10}\)-distribution but completely fails to detect the uniform alternative. The finite-sample power of our new test for the \(t_\nu \)-distributions is comparable to the BHEP test, but the uniform distribution seems to be a weak spot. Bimodal distributions are best detected by the AD test. The performance of the SW, BCMR, BHEP and SF tests is comparable, while the new BE\(_{{\widehat{a}}}\) procedure has a slightly weaker power and the HJG\(_\beta \) test is clearly inferior for those distributions. Considering the asymmetric alternatives, our new procedure shows its potential by dominating all other procedures for the \(\chi ^2\)-, the Gamma as well as the Gumbel distributions. All procedures do a good job in rejecting the Weibull and the lognormal alternatives.

To conclude the simulation study, we investigate the confidence interval (22) and the power approximation in (23) for the fixed tuning parameter \(a = 1\). As an example, we examine three alternatives also partially considered by Baringhaus et al. (2017), namely the uniform distribution \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \) and the Laplace distribution \(L \left( 0, 1 / \sqrt{2} \right) \) with density \(p(x) = \tfrac{1}{\sqrt{2}} \exp \left( - \sqrt{2} |x| \right) \), \(x \in \mathbb {R}\), as well as the Logistic distribution \(Lo \left( 0, \sqrt{3} / \pi \right) \) with density \(p(x) = \frac{\pi \exp \left( - \pi x /\sqrt{3} \right) }{\sqrt{3} \left( 1+\exp \left( - \pi x /\sqrt{3} \right) \right) ^2} \), \(x \in \mathbb {R}\). Notice that these alternatives are standardized. The values of \(\varDelta ^{(k)}\) are 0.004167 \((k = 1)\) and 0.000985 \((k = 2)\) for \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \), 0.005225 \((k = 1)\) and 0.001367 \((k = 2)\) for \(L \left( 0, 1 / \sqrt{2} \right) \), and 0.000819 \((k=1)\) and 0.000241 \((k=2)\) for \(Lo \left( 0, \sqrt{3} / \pi \right) \).

Note that the uniform and the Laplace distribution do not satisfy the differentiability condition of Theorem 5, but since they do satisfy all conditions of Corollary 2, we include simulations for \(G_{n,1}^{(1)}\) in those cases as well. The results in Tables 6 and 7, however, when compared to the Logistic distribution or the second statistic \(G_{n,1}^{(2)}\), where all requirements are formally fulfilled, indicate that Theorem 5 should also cover the uniform and, in particular, the Laplace distribution, and thus the result might hold under weaker conditions.

Since the limit variance \(\tau _{(k)}^2\), \(k = 1, 2\), seems inaccessible by computation in each case, we decided to estimate \(\tau _{(k)}^2\), \(k = 1, 2\), by means of simulation. For a sample size of \(n = 1000\) with 10,000 repetitions, the estimated values are 0.000302 \((k = 1)\) and 0.000016 \((k = 2)\) for \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \), 0.002452 \((k = 1)\) and 0.000565 \((k = 2)\) for \(L \left( 0, 1 / \sqrt{2} \right) \), as well as 0.000430 \((k=1)\) and 0.000125 \((k=2)\) for \(Lo \left( 0, \sqrt{3} / \pi \right) \). Table 6 displays the empirical power (in %) of \(G_{n,1}^{(1)}\) and \(G_{n,1}^{(2)}\) against the three alternatives. The nominal level is \(1 - \alpha = 0.95\), which explains the considerably smaller power compared to Table 2 from Baringhaus et al. (2017). The columns denoted by ‘Apr’ show the corresponding approximations given by (23), while ‘MC’ stands for the empirical rejection rates for 10,000 repetitions. Table 6 shows that the approximate power function (23) often appears as a lower bound to the power of the test statistics, confirming the observations by Baringhaus et al. (2017).
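The simulation-based estimation of \(\tau _{(k)}^2\) just described can be sketched as follows (our illustration, using G_stat() from Sect. 2; rdist generates a standardized sample from the alternative, delta is the corresponding \(\varDelta ^{(k)}\), e.g. 0.004167 for the uniform law with \(k = 1\) and \(a = 1\), and the number of repetitions is reduced compared to the 10,000 used in the text).

```r
# Simulation-based estimate of the limit variance tau_(k)^2 under a fixed alternative,
# exploiting sqrt(n) * (G_n^(k)/n - Delta^(k)) -> N(0, tau_(k)^2)
estimate_tau2 <- function(rdist, delta, n = 1000, reps = 1000, a = 1, type = 1) {
  var(replicate(reps, sqrt(n) * (G_stat(rdist(n), a = a, type = type) / n - delta)))
}
# Example: estimate_tau2(function(n) runif(n, -sqrt(3), sqrt(3)), delta = 0.004167)
```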

Table 6 Empirical power and approximation (23) against three alternatives

Because no consistent estimator \({\widehat{\tau }}_{(k),n}^2 (X_1, \ldots , X_n)\) of \(\tau _{(k)}^2\), \(k = 1, 2\), is known, a nonparametric bootstrap procedure with \(B = 500\) bootstrap samples (drawn with replacement) is implemented to calculate the empirical coverage probabilities shown in Table 7. The confidence level is set to \(1 - \alpha = 0.9\), and each value is based on 10,000 repetitions. Obviously, the empirical coverage probabilities are higher than the confidence level, indicating that the approximate lower and upper bounds of the confidence intervals are too conservative. This might be an effect of the bootstrap variance estimation procedure, but regarding the results of the power approximation, we rather think that an appropriate error-correction term in (22) and (23) will lead to better results.

Table 7 Empirical coverage probabilities of \(I_n\) from (22) for \(\varDelta ^{(k)}\), \(k = 1, 2\), (at nominal level 0.9, with 10,000 replications)

7 Conclusions and outlines

Starting with Charles Stein’s insight that a random variable X has a standard normal distribution if, and only if,

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X)\big ] = \mathbb {E}\big [ Xf(X) \big ] \end{aligned}$$

holds for any absolutely continuous function, we developed two classes of goodness-of-fit statistics for testing the normality hypothesis. We utilized the zero-bias transformation to bypass the problem of evaluating an empirical counterpart of this identity for all absolutely continuous functions. An advantage of the underlying zero-bias identity over many other types of transformation applied in goodness-of-fit testing, like the characteristic function or the Laplace transform, is that the distribution inserted into the mapping is not associated with a purely analytic quantity but is mapped to another distribution and, thereby, stays accessible to a stochastically intuitive examination (cf. Lemmata 1 and 4). The conducted power study suggests that our tests are serious competitors to established tests and even set new benchmarks in terms of the highest power achieved for many asymmetric alternatives. Both procedures are consistent against any alternative distribution satisfying a weak moment condition.

We want to emphasize that some problems remain open for further research. One issue concerns our choice of weight function. The integrals figuring in the second sum of \(G_{n,a}^{(2)}\) in (10), though they are accessible to stable numerical integration, are a slight drawback in terms of calculation time as compared to \(G_{n,a}^{(1)}\). It is conceivable to replace the term \(\omega (t) \mathrm {d}t\) in (5) and (6) by \(\mathrm {d}F(t)\) and to estimate F by the empirical distribution function. However, this type of test is not included in the framework for our theoretical results. Another question is whether there is some limiting statistic, as \(a \rightarrow \infty \), when considering \(G_{n, a}^{(2)}\) from Sect. 2. Finally, since we have not succeeded in calculating consistent estimators for \(\tau _{(1)}^2\) and \(\tau _{(2)}^2\) (see the Remark in Sect. 5), it remains to derive appropriate estimators and, in view of the results in Tables 6 and 7, to find better power approximations as well as suitable confidence intervals.