1 Introduction

Testing for normality is arguably the most widely used and discussed goodness-of-fit problem, motivated by the assumption of normality underlying many classical statistical models. To be specific, let \(X, X_1, X_2, \ldots \) be real-valued independent and identically distributed (iid.) random variables. The problem of interest is to test the hypothesis

$$\begin{aligned} H_0: \mathbb {P}^X \in {\mathcal {N}}=\{ {\mathcal {N}}(\mu , \sigma ^2) \, | \, (\mu , \sigma ^2) \in \mathbb {R}\times (0,\infty ) \} \end{aligned}$$
(1)

against general alternatives. So far, a great variety of goodness-of-fit tests have been proposed, and research is of ongoing interest, as witnessed by the recent papers of Bera et al. (2016), Villaseñor-Alva and González-Estrada (2015) and comparative studies such as those of Romão et al. (2010) and Yap and Sim (2011). Classical procedures in goodness-of-fit methodology such as the Kolmogorov–Smirnov and the Cramér–von Mises test approach the testing problem by measuring the distance of the empirical distribution function to the estimated representative of \({\mathcal {N}}\). For a theoretical approach to goodness-of-fit tests to a family of distributions, see del Barrio et al. (2000), Neuhaus (1979). Other methods are based on skewness and kurtosis, as, for instance, proposed by Pearson et al. (1977) (known to lead to inconsistent procedures), the empirical characteristic function, see Epps and Pulley (1983), the Wasserstein distance, see del Barrio et al. (2000), del Barrio et al. (1999), the sample entropy, see Vasicek (1976), the integrated empirical distribution function, see Klar (2001), or correlation and regression tests, such as the famous Shapiro–Wilk test, see Shapiro and Wilk (1965), among others. For a survey of classical methods, see del Barrio et al. (2000), Sect. 3, and Henze (1994), and for the problem of testing multivariate normality, we refer to Henze (2002), Mecklin and Mundfrom (2004).

Another natural approach to assess the distance of the distribution of a real-valued random variable X to the normal distribution is to calculate the difference between \(\mathbb {E}h(X)\) and \(\mathbb {E}h(N)\), where \(\mathbb {P}^N = {\mathcal {N}}(0,1)\), over some large class of functions \(h : \mathbb {R}\rightarrow \mathbb {R}\). The class \(\{x \mapsto e^{itx}\, | \, t \in \mathbb {R}\}\) leads to characteristic functions, for which proofs of limit theorems rely heavily on the assumption of independence. In an attempt to give an alternative proof of the central limit theorem, Charles Stein considered a different class of test functions (see, e.g. Stein 1972). Stating that X has a standard normal distribution if, and only if,

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X) \big ] = \mathbb {E}\big [ X f(X) \big ] \end{aligned}$$
(2)

holds for each absolutely continuous function f for which the expectations exist, it appears reasonable to regard \(\mathbb {E}\big [ f^{\prime }(X) - X f(X) \big ]\), for a suitable function f, as an estimate of \(\mathbb {E}h(X) - \mathbb {E}h(N)\) since both terms ought to be small whenever the distribution of X is close to standard normal. In practice, solving the differential equation

$$\begin{aligned} f^{\prime }(x) - x f(x) = h(x) - \mathbb {E}h(N) \end{aligned}$$
(3)

for absolutely continuous functions h, evaluating at X and taking expectations, the problem reduces to appraising \(\mathbb {E}\big [ f_h^{\prime }(X) - X f_h(X)\big ]\), with \(f_h\) being the solution of (3). A commonly used tool to handle these terms is the so-called zero-bias transformation introduced by Goldstein and Reinert (1997). Namely, if \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\), a random variable \(X^*\) is said to have the X-zero-bias distribution if

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X^*)\big ] = \mathbb {E}\big [ X f(X) \big ] \end{aligned}$$
(4)

holds for all absolutely continuous functions f for which these expectations exist. The use of this distribution, if it exists, lends itself easily to the purpose of distributional approximation. For instance, starting with the solution of (3), the mean value theorem gives

$$\begin{aligned} | \mathbb {E}h(X) - \mathbb {E}h(N) | = | \mathbb {E}[f_h^{\prime }(X) - f_h^{\prime }(X^*)] | \le \left||f_h^{\prime \prime } \right||_{\infty } \, \mathbb {E}| X - X^* | . \end{aligned}$$

Thus, the problem reduces to bounding the derivatives of the solution \(f_h\) of (3) and constructing \(X^*\) such that \(\mathbb {E}| X - X^* |\) is accessible. Bounds on \(f_h\) and its derivatives are well known, and a comprehensive treatment as well as explicit constructions for \(X^*\) may be found in Chen et al. (2011) (for the bounds, see also Stein 1986). For a general introduction to Stein’s method, see Chen et al. (2011), Ross (2011). One of the main reasons Stein’s method, particularly for the normal distribution, has been studied to such a remarkable extent is the variety of central limit-type results it yields, including convergence rates, even in dependency settings.

It seems reasonable to ask whether Stein’s characterization (2) may be used to construct a goodness-of-fit statistic. However, we can hardly evaluate a quantity for all absolutely continuous functions, which makes the direct application of equation (2) rather complicated (cf. Liu et al. 2016). Instead, we propose a test based on the zero-bias distribution. To this end, we first recall the explicit formula for the density and distribution function of the zero-bias distribution.

Lemma 1

If X is a centred, real-valued random variable with \(\mathbb {V}(X) = 1\), the X-zero-bias distribution exists and is unique. Moreover, it is absolutely continuous with respect to the Lebesgue measure with density

$$\begin{aligned} d^X (t) = \mathbb {E}[X \mathbb {1}\{ X > t \}] = - \mathbb {E}[X \, \mathbb {1}\{ X \le t \}] \end{aligned}$$

and distribution function

$$\begin{aligned} F^X (t) = \mathbb {E}[X (X - t) \mathbb {1}\{ X \le t \}] . \end{aligned}$$

A proof can be found in Chen et al. (2011) or in the original treatment (Goldstein and Reinert 1997). Now, interpreting (4) as a distributional transformation \(\mathbb {P}^X \mapsto \mathbb {P}^{X^*}\), the standard normal distribution is characterized as the unique fixed point of this transformation [see also Goldstein and Reinert 1997, Lemma 2.1 (i)]. Writing this in terms of the formula from Lemma 1, the characterization reads as follows.

Theorem 1

A random variable X with distribution function F and \(\mathbb {E}X = 0\), \(\mathbb {V}(X) = 1\) has the standard normal distribution if, and only if,

$$\begin{aligned} F^X = F \end{aligned}$$

which in turn holds if, and only if,

$$\begin{aligned} F^X = \varPhi , \end{aligned}$$

where \(\varPhi \) is the distribution function of the standard normal distribution.

Proof

By Lemma 1 and the presumptions on X, the zero-bias distribution \(\mathbb {P}^{X^*}\) of \(\mathbb {P}^X\) exists, is unique and has distribution function \(F^X\). Hence, if \(\mathbb {P}^X\) is the standard normal distribution, (2) is satisfied and the definition of the zero-bias distribution through formula (4) and its uniqueness imply \(\mathbb {P}^{X^*} = \mathbb {P}^X\), that is, \(F^X = F\). Conversely, if \(F^X = F\), Lemma 1 implies \(\mathbb {P}^{X^*} = \mathbb {P}^X\) and Stein’s characterization (2) yields that X is standard Gaussian.

For the second equivalence note that if X follows the standard normal law, \(F^X = F = \varPhi \) by the first part. Finally, assume that \(F^X = \varPhi \). Since \(F^X\) is the distribution function of \(X^*\), \(\mathbb {P}^{X^*}\) is the standard normal distribution, so

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X^*) \big ] = \mathbb {E}\big [ X^* f(X^*) \big ] \end{aligned}$$

holds for each absolutely continuous function f for which these expectations exist. The definition of the zero-bias distribution implies that for any such function f,

$$\begin{aligned} \mathbb {E}\big [ X f(X) \big ] = \mathbb {E}\big [ X^* f(X^*) \big ]. \end{aligned}$$

Noticing that these functions include any monomial (since \(X^*\) has the standard normal distribution for which moments of all orders exist) and that the normal distribution is uniquely determined through its sequence of moments (see Theorem 30.1 of Billingsley 1995), this last equation shows \(\mathbb {P}^X = \mathbb {P}^{X^*} = {\mathcal {N}}(0,1)\). \(\square \)

This theorem paves the way for the construction of goodness-of-fit tests using a measure of deviation between an empirical version of \(F^X\) and \(\varPhi \) or the empirical distribution, respectively. Heuristically, the above characterization indicates that the difference between these empirical quantities ought to be small when the underlying sample comes from a normal distribution and large whenever it does not. Thus, tests based on this characterization should be able to detect deviations from the normality hypothesis (1).
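To illustrate the characterization numerically, the following R sketch (our illustration, not part of the original argument; the function name zero_bias_cdf is ours) approximates \(F^X(t) = \mathbb {E}[X (X - t) \mathbb {1}\{ X \le t \}]\) from Lemma 1 by Monte Carlo and compares it with \(\varPhi (t)\) for a standard normal and for a standardized uniform distribution.

```r
# Monte Carlo approximation of the zero-bias distribution function from Lemma 1
zero_bias_cdf <- function(x, t) mean(x * (x - t) * (x <= t))

set.seed(1)
x_norm <- rnorm(1e6)                      # standard normal sample
x_unif <- runif(1e6, -sqrt(3), sqrt(3))   # standardized uniform sample (mean 0, variance 1)
t <- 0.5
# For the normal sample the value is close to pnorm(0.5) = 0.691, while for the
# uniform sample it differs (roughly 0.71), reflecting the fixed-point property of Theorem 1.
c(normal = zero_bias_cdf(x_norm, t), uniform = zero_bias_cdf(x_unif, t), Phi = pnorm(t))
```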

In Sect. 2, we use a weighted \(L^2\)-measure to construct two statistics for our testing problem (1). We derive the limit null distributions in Sect. 3 and study the behaviour under contiguous alternatives in Sect. 4. The consistency of these classes of tests is established in Sect. 5, and we obtain the limit distributions of the statistics under fixed alternatives. To analyse the actual performance, empirical results in the form of a power study are presented in Sect. 6. Conclusions and outlines complete the article.

2 The new test statistics

Let \(X, X_1, X_2, \ldots \) be real-valued iid. random variables defined on an underlying probability space \((\varOmega , {\mathcal {A}}, \mathbb {P})\). Further, let F be the distribution function of X and assume that \(\mathbb {E}[X^2] < \infty \). To reflect the invariance of the family of normal distributions \({\mathcal {N}}\) with respect to affine transformations, the proposed statistics only depend on the so-called scaled residuals, namely \(Y_{n,1}, \ldots , Y_{n,n}\),

$$\begin{aligned} Y_{n,j} = \frac{X_j - {\overline{X}}_n}{S_n}, \end{aligned}$$

where \({\overline{X}}_n = n^{-1} \sum _{k=1}^{n} X_k\) and \(S_n^2 = n^{-1} \sum _{k=1}^{n}(X_k - {\overline{X}}_n)^2\) are the sample mean and variance, respectively. This way, the values of our statistics themselves and thus the tests based on them are invariant under affine transformations of the data. We note that if X has a normal distribution with some parameters \(\mu \) and \(\sigma ^2\), \(Y_{n,1}\) is approximately standard normal since \(({\overline{X}}_n, S_n^2)\) is a strongly consistent estimator of \((\mu , \sigma ^2)\). Due to the affine invariance, we assume, w.l.o.g., \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\).

In view of Theorem 1 and the heuristics given thereafter, we suggest the Cramér–von Mises-type (or weighted \(L^2\)-type) test statistics

$$\begin{aligned} G_n^{(1)} = n \int _{\mathbb {R}} \left( \frac{1}{n} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \} - \frac{1}{n} \sum _{j=1}^{n} \mathbb {1}\{ Y_{n,j} \le t \} \right) ^2 \omega (t) \, \mathrm {d}t\quad \end{aligned}$$
(5)

and

$$\begin{aligned} G_n^{(2)} = n \int _{\mathbb {R}} \left( \frac{1}{n} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \} - \varPhi (t) \right) ^2 \omega (t) \, \mathrm {d}t . \end{aligned}$$
(6)

Here, \(n^{-1} \sum _{j=1}^{n} Y_{n,j} (Y_{n,j} - t) \mathbb {1}\{ Y_{n,j} \le t \}\) is an empirical version of the zero-bias distribution function and \(n^{-1} \sum _{j=1}^{n} \mathbb {1}\{ Y_{n,j} \le t \}\) is the empirical distribution function of \(Y_{n,1}, \ldots , Y_{n,n}\). By \(\omega : \mathbb {R}\rightarrow \mathbb {R}\), we denote a positive, continuous weight function satisfying

$$\begin{aligned} \int _{\mathbb {R}} t^6 \, \omega (t) \, \mathrm {d}t < \infty \end{aligned}$$
(7)

and

$$\begin{aligned} n \int _{\mathbb {R}} \left| \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) - \omega (s) \right| ^{3} \big ( \omega (s) \big )^{-2} \mathrm {d}s = o_{\mathbb {P}}(1), \end{aligned}$$
(8)

where \(o_{\mathbb {P}}(1)\) denotes convergence to 0 in probability as \(n \rightarrow \infty \). A test based on \(G_{n}^{(1)}\) or \(G_{n}^{(2)}\) rejects \(H_0\) for large values of the statistic. For the implementation of our tests, we need to specify the weight function \(\omega \). To that end, we use the density function of a centred normal distribution

$$\begin{aligned} \omega _a (t) = \frac{1}{\sqrt{2 \pi a}} \, e^{- \frac{t^2}{2 a}}, \end{aligned}$$

where the variance is chosen to be some tuning parameter \(a > 0\). We prove in Lemma 2 of “Appendix A” that \(\omega _a\) satisfies the above conditions. Note that this type of weight has also been employed by Henze and Zirkler (1990). For this explicit function, our statistics have the expressions

$$\begin{aligned} G_{n,a}^{(1)}&= \frac{2}{n} \sum \limits _{1 \le j < k \le n} \left\{ \phantom {\exp \left( - \tfrac{Y_{(k)}^2}{2 a}\right) } \left( 1 - \varPhi \left( \tfrac{Y_{(k)}}{\sqrt{a}} \right) \right) \left( (Y_{(j)}^2 - 1)(Y_{(k)}^2 - 1) + a Y_{(j)} Y_{(k)} \right) \right. \nonumber \\&\quad \left. \, + \frac{a}{\sqrt{2 \pi a}} \, \exp \left( - \tfrac{Y_{(k)}^2}{2 a}\right) \left( - Y_{(j)}^2 Y_{(k)} + Y_{(k)} + Y_{(j)} \right) \right\} \nonumber \\&\quad + \frac{1}{n} \sum \limits _{j=1}^{n} \left\{ \phantom {\frac{a}{\sqrt{2 \pi a}}} \left( 1 - \varPhi \left( \tfrac{Y_{j}}{\sqrt{a}}\right) \right) \left( Y_j^4 + (a - 2) Y_j^2 + 1 \right) \right. \nonumber \\&\quad \left. + \frac{a}{\sqrt{2 \pi a}} \, \exp \left( - \tfrac{Y_j^2}{2 a}\right) \left( 2 Y_j - Y_j^3 \right) \right\} \end{aligned}$$
(9)

and

$$\begin{aligned} G_{n,a}^{(2)}&= \frac{2}{n} \sum \limits _{1 \le j < k \le n} \left\{ Y_{(j)} Y_{(k)} \left[ \left( Y_{(j)} Y_{(k)} + a \right) \left( 1 - \varPhi \left( \tfrac{Y_{(k)}}{\sqrt{a}} \right) \right) - a Y_{(j)} \, \omega _a(Y_{(k)}) \right] \right\} \nonumber \\&\quad + \sum \limits _{j=1}^{n} \Bigg \{\frac{Y_j^2}{n} \left[ (Y_j^2 + a) \left( 1 - \varPhi \left( \tfrac{Y_j}{\sqrt{a}} \right) \right) - a Y_j \, \omega _a(Y_j) \right] \nonumber \\&\quad - 2 Y_j \Bigg [Y_j \int _{Y_j}^{\infty } \varPhi (t) \, \omega _a (t) \, \mathrm {d}t - a \varPhi (Y_j) \, \omega _a(Y_j) \nonumber \\&\quad - \frac{a}{\sqrt{2 \pi (1+a)}} \left( 1 - \varPhi \left( \sqrt{\tfrac{1 + a}{a}} \, Y_j \right) \right) \Bigg ] \Bigg \} \nonumber \\&\quad + n \int _{\mathbb {R}} \varPhi (t)^2 \, \omega _a(t) \, \mathrm {d}t , \end{aligned}$$
(10)

where \(Y_{1}, \ldots , Y_{n}\) is shorthand for the normalized sample \(Y_{n, 1}, \ldots , Y_{n,n}\) and \(Y_{(1)} \le \cdots \le Y_{(n)}\) is the ordered sample. Those expressions make the statistics amenable to computations and, with critical values like those given in Sect. 6, the tests can be implemented immediately for any fixed \(a > 0\).
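As a cross-check of these closed-form expressions, the following R sketch (our illustration, not the implementation used in the paper; the function name G_stat is ours) evaluates \(G_{n,a}^{(1)}\) and \(G_{n,a}^{(2)}\) directly from the defining integrals (5) and (6) by numerical quadrature; up to quadrature error, the values should agree with (9) and (10).

```r
# Direct numerical evaluation of the statistics (5) and (6) with the Gaussian weight w_a
G_stat <- function(x, a = 1, type = 1) {
  n <- length(x)
  y <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))     # scaled residuals Y_{n,1}, ..., Y_{n,n}
  integrand <- function(t) sapply(t, function(s) {
    Fzb <- mean(y * (y - s) * (y <= s))                # empirical zero-bias distribution function
    ref <- if (type == 1) mean(y <= s) else pnorm(s)   # empirical d.f. (type 1) or Phi (type 2)
    (Fzb - ref)^2 * dnorm(s, sd = sqrt(a))             # squared difference times weight w_a
  })
  n * integrate(integrand, -Inf, Inf)$value
}

# Example: both statistics for a standard normal sample of size 50 with a = 1
set.seed(1)
x <- rnorm(50)
c(G1 = G_stat(x, a = 1, type = 1), G2 = G_stat(x, a = 1, type = 2))
```

In practice, the closed forms (9) and (10) are preferable for speed; the quadrature version is mainly useful as a sanity check.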

The tuning parameter a determines the decay of the weight function. For tests based on the Laplace or the Fourier transform, the properties that those transformations reflect on the underlying distribution often give a good heuristic for which values of the tuning parameter lead to a high power of the test (see Baringhaus et al. 2000 for examples and explanations). Since the zero-bias transformation is known to preserve many properties of the original distribution, we expect that, at least for our first statistic, the tuning parameter will have little influence on the power for most alternative distributions. Indeed, we will observe that both tests are very stable in this regard. Nevertheless, for some (symmetric) alternative distributions the choice of the tuning parameter is crucial; therefore, we additionally implement our test with an adaptive, data-dependent choice as proposed by Allison and Santana (2015). Particularly interesting is the case \(a \searrow 0\). Here, Baringhaus et al. (2000) have shown that, after suitable rescaling, this limit can be obtained explicitly for many test statistics by using an Abelian theorem for the Laplace transform. (Note that due to different parametrization they let \(a \rightarrow \infty \).) For our statistics, we have

$$\begin{aligned} 2 n \lim \limits _{a \, \searrow \, 0} G_{n, a}^{(1)} = \left( \sum _{j = 1}^{n} (Y_{j}^2 - 1) \, \mathbb {1}\{ Y_{j} \le 0 \} \right) ^2 + \left( \sum _{j = 1}^{n} (Y_{j}^2 - 1) \, \mathbb {1}\{ Y_{j} < 0 \} \right) ^2 \end{aligned}$$
(11)

and

$$\begin{aligned} 2 n \lim \limits _{a \, \searrow \, 0} G_{n, a}^{(2)} = \left( \frac{n}{2} - \sum _{j = 1}^{n} Y_{j}^2 \, \mathbb {1}\{ Y_{j} \le 0 \} \right) ^2 + \left( \frac{n}{2} - \sum _{j = 1}^{n} Y_{j}^2 \, \mathbb {1}\{ Y_{j} < 0 \} \right) ^2, \end{aligned}$$
(12)

that is, in the limit \(a \searrow 0\), \(G_{n,a}^{(1)}\) and \(G_{n,a}^{(2)}\) reject the normality hypothesis for large values of the respective limits in (11) and (12). A proof of those limit relations is given in “Appendix C”. If the underlying distribution of X is continuous, the indicator functions in the above limits are equal almost surely, and the terms can be simplified. A related question is the limit for \(a \rightarrow \infty \). Starting from (9), direct but tedious calculations, mostly involving L’Hospital’s rule, give

$$\begin{aligned} \sqrt{2 \pi }\, n \lim \limits _{a \, \rightarrow \, \infty } \sqrt{a} \, G_{n,a}^{(1)} =&\sum \limits _{1 \le j < k \le n} \Big \{ 2 Y_{(j)}^2 Y_{(k)} - Y_{(j)} Y_{(k)}^2 + \frac{1}{3} Y_{(j)} Y_{(k)}^4 - Y_{(j)}^2 Y_{(k)}^3 \Big \} \\&+ \sum \limits _{j=1}^{n} \left\{ - \frac{1}{3} Y_{(j)}^5 + j \, Y_{(j)}^3 - 2(j - 1) Y_{(j)} \right\} \end{aligned}$$

We omit the calculations as they provide no further insight. It remains open whether a similar limit exists for the second statistic \(G_{n,a}^{(2)}\).
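For completeness, the rescaled limit statistics in (11) and (12) are straightforward to evaluate; the following R lines are a direct transliteration of those formulas (the function name limit_a0 is ours) for a standardized sample y.

```r
# Rescaled limits of the statistics as a -> 0; the returned values equal
# 2n * lim_{a -> 0} G_{n,a}^(k), cf. (11) for type = 1 and (12) for type = 2
limit_a0 <- function(y, type = 1) {
  n <- length(y)
  if (type == 1) {
    sum((y^2 - 1) * (y <= 0))^2 + sum((y^2 - 1) * (y < 0))^2
  } else {
    (n / 2 - sum(y^2 * (y <= 0)))^2 + (n / 2 - sum(y^2 * (y < 0)))^2
  }
}
```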

Having discussed the framework for the implementation of our tests, it remains to introduce the setting for our theoretical studies. Namely, to develop the asymptotic theory, we let \({\mathcal {B}}\) be the Borel-\(\sigma \)-field of \(\mathbb {R}\) and \({\mathcal {L}}^1\) the Lebesgue measure on \(\mathbb {R}\), and consider the Hilbert space

$$\begin{aligned} {\mathcal {H}} = L^2(\mathbb {R}, {\mathcal {B}}, \omega \, \mathrm {d} {\mathcal {L}}^1) \end{aligned}$$

of measurable, square-integrable functions \(f: \mathbb {R}\rightarrow \mathbb {R}\). Notice that the functions figuring within the integral in the definition of \(G_n^{(1)}\) and \(G_n^{(2)}\) are \(({\mathcal {A}}\,\otimes \,{\mathcal {B}}, {\mathcal {B}})\)-measurable and random elements of \({\mathcal {H}}\). We denote by

$$\begin{aligned} \left||f \right||_{{\mathcal {H}}} = \left( \int _{\mathbb {R}} \big |f(t)\big |^2 \, \omega (t) \, \mathrm {d}t \right) ^{1/2}, \qquad \langle f, g \rangle _{{\mathcal {H}}}=\int _{\mathbb {R}} f(t)g(t) \, \omega (t) \, \mathrm {d}t \end{aligned}$$

the usual norm as well as the usual inner product in \({\mathcal {H}}\). Furthermore, we write \({\mathcal {U}}_n(s) \approx {\mathcal {V}}_n(s)\) whenever

$$\begin{aligned} \left||{\mathcal {U}}_n - {\mathcal {V}}_n \right||_{{\mathcal {H}}} = o_{\mathbb {P}}(1) . \end{aligned}$$

Here, \({\mathcal {U}}_n\) and \({\mathcal {V}}_n\) are random elements of our Hilbert space. For the approximations associated with this notation, Lemma 3 stated in “Appendix A” will be essential. We have also deferred some asymptotic expansions to “Appendix B” so it is easier to grasp the main ideas of the proofs. In the following, we denote convergence in distribution by \(\overset{{\mathcal {D}}}{\longrightarrow }\) and write \(O_{\mathbb {P}}(1)\) for boundedness in probability.

3 The limit null distributions

Our first results for the statistics concern the study of their behaviour under the hypothesis. In particular, we derive the limit distributions for \(n \rightarrow \infty \) when the normality hypothesis (1) holds. Therefore, we assume in this section that \(X, X_1, X_2, \ldots \) are iid. random variables with \(\mathbb {P}^X = {\mathcal {N}}(0, 1)\). By \(\varphi \), we denote the density function of the standard normal law.

Theorem 2

There exists a centred Gaussian element \({\mathcal {W}}^{(2)}\) of \({\mathcal {H}}\) with covariance kernel

$$\begin{aligned} {\mathcal {K}}^{(2)} (s,t)= & {} \big (2(s + t) - (st + 3)(s \wedge t) + (s + t)(s \wedge t)^2 - (s \wedge t)^3\big ) \varphi (s \wedge t)\\&{}+{} (st + 3) \big (\varPhi (s \wedge t) - \varPhi (s) \varPhi (t)\big ) + (t - 2s) \varphi (t) \varPhi (s)\\&{}+{} (s - 2t) \varphi (s) \varPhi (t) - \frac{st}{2} \varphi (s) \varphi (t) - 4 \varphi (s) \varphi (t), \quad s, t \in \mathbb {R}, \end{aligned}$$

where \(s \wedge t = \min \{ s, t \}\), such that, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Proof

Note that a simple change of variable in the integral gives

$$\begin{aligned} G_n^{(2)} = \frac{1}{S_n} \int _{\mathbb {R}} \big |\sqrt{n} \, U_n(s)\big |^2 \, \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) \mathrm {d}s , \end{aligned}$$
(13)

where

$$\begin{aligned} U_n(s) = {\widehat{F}}_n^X (s) - \varPhi \left( \frac{s - {\overline{X}}_n}{S_n} \right) \end{aligned}$$

and

$$\begin{aligned} {\widehat{F}}_n^X (s) = \frac{1}{n} \sum _{j=1}^{n} \frac{X_j - {\overline{X}}_n}{S_n^2} \, (X_j - s) \, \mathbb {1}\{X_j \le s\}, \quad s \in \mathbb {R}. \end{aligned}$$

The idea of the proof is to show that \(\sqrt{n} \, U_n\) converges weakly to the Gaussian element of \({\mathcal {H}}\) stated in the theorem and to use Lemma 3 from “Appendix A” to replace the shifted weight function in the integral above by \(\omega (s)\). Indeed, with (26) from Lemmata 4 and 5 we have

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx&\frac{1}{S_n^2} \, \sqrt{n} \left\{ \frac{1}{n} \sum \limits _{j=1}^{n} X_j (X_j - s) \mathbb {1}\{ X_j \le s \} - {\overline{X}}_n \, \mathbb {E}\big [ (X - s) \mathbb {1}\{X \le s\} \big ] \right. \\&\left. \, - \frac{1}{n} \sum \limits _{j=1}^{n} X_j^2 \, \varPhi (s) - S_n \, \varphi (s) \left( (1 - S_n) \cdot s - {\overline{X}}_n \right) \right\} . \end{aligned}$$

Since

$$\begin{aligned} \sqrt{n} \, (1 - S_n) = \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \frac{1}{2} \left( 1 - X_j^2 \right) + o_\mathbb {P}(1) , \end{aligned}$$

we obtain

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx \frac{1}{S_n^2} \, \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} W_j(s), \end{aligned}$$
(14)

where

$$\begin{aligned} W_j(s) =&\, X_j (X_j - s) \mathbb {1}\{ X_j \le s \} + X_j \big ( \varphi (s) + s \varPhi (s) \big ) \\&- X_j^2 \, \varPhi (s) - \left( \tfrac{1}{2} \, (1 - X_j^2) \cdot s - X_j \right) \varphi (s) . \end{aligned}$$

Notice that \(W_1, \ldots , W_n\) are iid. random elements of \({\mathcal {H}}\) with \(\mathbb {E}W_1 = 0\) (as \(F^X = \varPhi \) under \(H_0\), cf. Theorem 1) and \(\mathbb {E}\left||W_1 \right||_{\mathcal {H}}^2 < \infty \). The central limit theorem for separable Hilbert spaces, see Corollary 10.9 in Ledoux and Talagrand (2011), provides the existence of a centred Gaussian element \({\mathcal {W}}^{(2)} \in {\mathcal {H}}\) with

$$\begin{aligned} \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} W_j \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {W}}^{(2)} \quad \text {in } {\mathcal {H}} . \end{aligned}$$

By (14), \(\left||\sqrt{n} \, U_n \right||_{{\mathcal {H}}} = O_\mathbb {P}(1)\) and Lemma 4 implies \(\mathop {\sup }\nolimits _{s \, \in \, \mathbb {R}} \big | U_n(s) \big | \le 2\) \(\mathbb {P}\)-almost surely (a.s.) for each \(n \in \mathbb {N}\). Thus, with Lemma 3, (13) reads as

$$\begin{aligned} G_n^{(2)} = \frac{1}{S_n} \left||\sqrt{n} \, U_n \right||_{{\mathcal {H}}}^2 + o_{\mathbb {P}}(1) . \end{aligned}$$

The continuous mapping theorem and Slutsky’s lemma imply

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Since the function \({\mathcal {K}}^{(2)}\) defined in the statement of the theorem satisfies \({\mathcal {K}}^{(2)}(s,t) = \mathbb {E}[ W_1(s) \, W_1(t) ]\), it is the covariance kernel of \({\mathcal {W}}^{(2)}\) and we are done. \(\square \)

For \(G_n^{(1)}\), the limit distribution under the hypothesis can be obtained in a similar manner. Starting with

$$\begin{aligned} G_n^{(1)} = \frac{n}{S_n} \int _{\mathbb {R}} \left( {\widehat{F}}_n^X (s) - \frac{1}{n} \sum _{j=1}^{n} \mathbb {1}\{ X_j \le s \} \right) ^2 \omega \left( \frac{s - {\overline{X}}_n}{S_n} \right) \, \mathrm {d}s , \end{aligned}$$
(15)

the reasoning closely parallels that of the proof of Theorem 2.

Corollary 1

There exists a centred Gaussian element \({\mathcal {W}}^{(1)}\) of \({\mathcal {H}}\) with covariance kernel

$$\begin{aligned} {\mathcal {K}}^{(1)} (s,t)= & {} \big ((s + t) - (st + 1)(s \wedge t) + (s + t)(s \wedge t)^2 - (s \wedge t)^3\big ) \varphi (s \wedge t)\\&{}+{} (st + 2)\big (\varPhi (s \wedge t) - \varPhi (s) \varPhi (t)\big ) - s \varPhi (s) \varphi (t) - t \varPhi (t) \varphi (s) \\&{}-{} \varphi (s) \varphi (t), \quad s,t \in \mathbb {R}, \end{aligned}$$

such that, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(1)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(1)} \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Remark

The distribution of \(\left||{\mathcal {W}}^{(k)} \right||_{{\mathcal {H}}}^2\), \(k = 1, 2\), that is, the limit distribution of \(G_n^{(k)}\) under the hypothesis, is that of \(\sum _{j=1}^{\infty } \lambda _j^{(k)} N_j^2\). Here, \(N_1, N_2, \ldots \) are independent standard Gaussian random variables and \(\lambda _1^{(k)}, \lambda _2^{(k)}, \ldots \) are the nonzero eigenvalues of the operator

$$\begin{aligned} {\mathcal {H}} \rightarrow {\mathcal {H}}, \quad f \longmapsto \int _{\mathbb {R}} {\mathcal {K}}^{(k)} (\cdot , t) \, f(t) \, \omega (t) \, \mathrm {d}t , \end{aligned}$$

\(k = 1, 2\). Considering the complexity of \({\mathcal {K}}^{(k)}\), it does not seem possible to determine \(\lambda _j^{(k)}\) explicitly. Thus, in practice, critical values are obtained by simulation rather than by using asymptotic results. An alternative approach to obtain theoretically justified (approximate) critical values is to calculate the first four moments of the limit null distribution and fit a representative of the Pearson- or Johnson-family of distributions to those moments (see Henze 1990 for an example). Since we do not face any complications in computing the critical values, we will only pursue the empirical approach.

4 Contiguous alternatives

Adapting the arguments of Henze and Wagner (1997), we will derive non-degenerate limit distributions for our statistics under contiguous alternatives converging to the normal distribution at rate \(n^{-1/2}\). To this end, we introduce a triangular array of row-wise iid. random variables \(X_{n,1}, \ldots , X_{n,n}\), \(n \in \mathbb {N}\), with Lebesgue density

$$\begin{aligned} p_n (x) = \varphi (x) \cdot \left( 1 + \tfrac{1}{\sqrt{n}} \, c(x) \right) , \quad x \in \mathbb {R}. \end{aligned}$$

Here, \(c : \mathbb {R}\rightarrow \mathbb {R}\) is a measurable, bounded function satisfying

$$\begin{aligned} \int _{\mathbb {R}} c(x) \, \varphi (x) \, \mathrm {d}x = 0 . \end{aligned}$$

Notice that, by the boundedness of c, we may assume n to be large enough to ensure \(p_n \ge 0\). We set

$$\begin{aligned} \mu _n = \bigotimes \limits _{j=1}^{n} (\varphi {\mathcal {L}}^1) , \quad \nu _n = \bigotimes \limits _{j=1}^{n} (p_n {\mathcal {L}}^1) \end{aligned}$$

which are measures on \((\mathbb {R}^n, {\mathcal {B}}^n)\), where \({\mathcal {B}}^n\) is the Borel-\(\sigma \)-field of \(\mathbb {R}^n\). Clearly, \(\nu _n\) is absolutely continuous with respect to \(\mu _n\), and we may consider the Radon–Nikodym derivative \(L_n = \frac{\mathrm {d}\nu _n}{\mathrm {d}\mu _n}\). By a Taylor expansion,

$$\begin{aligned} \log \left( L_n (X_{n,1}, \ldots , X_{n,n}) \right)&= \sum \limits _{j=1}^{n} \log \left( 1 + \frac{1}{\sqrt{n}} \, c(X_{n,j}) \right) \\&= \sum \limits _{j=1}^{n} \left( \frac{1}{\sqrt{n}} \, c(X_{n,j}) - \frac{1}{2n} \, c(X_{n,j})^2 \right) + o_{\mathbb {P}}(1) \end{aligned}$$

whenever \((X_{n,1}, \ldots , X_{n,n})\) has distribution \(\mu _n\). (Note that in this case the triangular array essentially reduces to a sequence of iid. random variables with density \(\varphi \).) Therefore, viewing \(L_n\) as a random element \((\mathbb {R}^n, {\mathcal {B}}^n, \mu _n) \rightarrow (\mathbb {R}, {\mathcal {B}})\), the central limit theorem and the law of large numbers give

$$\begin{aligned} \log \left( L_n (X_{n,1}, \ldots , X_{n,n}) \right) \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {N}}\Big ( - \tfrac{\tau ^2}{2}, \, \tau ^2 \Big ) , \end{aligned}$$

where

$$\begin{aligned} \tau ^2 = \int _{\mathbb {R}} c(x)^2 \, \varphi (x) \, \mathrm {d}x \end{aligned}$$

and \(\overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow }\) denotes convergence in distribution under \(\mu _n\). By LeCam’s first Lemma (see, for instance, Hájek et al. 1999, p. 253, Corollary 1), \(\nu _n\) is contiguous to \(\mu _n\). Interpreting \(U_n\) from the proof of Theorem 2 as \(U_n : \mathbb {R}^n \rightarrow {\mathcal {H}}\), we have shown that \(\sqrt{n} \, U_n \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {W}}^{(2)}\) and (14) reads as

$$\begin{aligned} \sqrt{n} \, U_n(s) \approx \frac{1}{S_n^2} \, W_n^*(s) , \quad s \in \mathbb {R}, \end{aligned}$$

where

$$\begin{aligned} W_n^*(s) = \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \eta (s, X_{n,j}) \end{aligned}$$

and

$$\begin{aligned} \eta (s,x) = x (x - s) \mathbb {1}\{ x \le s \} + x \big ( \varphi (s) + s \varPhi (s) \big ) - x^2 \, \varPhi (s) - \left( \tfrac{1}{2} \, (1 - x^2) \cdot s - x \right) \varphi (s) . \end{aligned}$$

Thus, by contiguity, the approximation \(\sqrt{n} \, U_n \approx S_n^{-2} \, W_n^*\) continues to hold under \(\nu _n\) and, in particular,

$$\begin{aligned} G_n^{(2)} = \left||W_n^* \right||_{{\mathcal {H}}}^2 + o_{\mathbb {P}}(1) \quad \text {under } \nu _n . \end{aligned}$$
(16)

Defining

$$\begin{aligned} \zeta (s) = \int _{\mathbb {R}} \eta (s,x) \, c(x) \, \varphi (x) \, \mathrm {d}x , \quad s \in \mathbb {R}, \end{aligned}$$

we have, under \(\mu _n\),

$$\begin{aligned} \mathbb {E}\big [ \eta (s, X_{n,1}) \, c(X_{n,1}) \big ] = \zeta (s) , \quad s \in \mathbb {R}, \end{aligned}$$

with \(\zeta \in {\mathcal {H}}\). Consequently, for any \(k \in \mathbb {N}\), \(v \in \mathbb {R}^k\) and \(s_1, \ldots , s_k \in \mathbb {R}\), the multivariate central limit theorem, the law of large numbers and Slutsky’s lemma imply

Here, \(\varSigma = \left( {\mathcal {K}}^{(2)}(s_i, s_j) \right) _{1 \le i,j \le k}\), with \({\mathcal {K}}^{(2)}\) the covariance kernel of \({\mathcal {W}}^{(2)}\) from Theorem 2, and \(\zeta _k = \big ( \zeta (s_1), \ldots , \zeta (s_k) \big )^\top \). Therefore,

and LeCam’s third Lemma (see Hájek et al. 1999, p. 259, Lemma 2) implies

$$\begin{aligned} W_n^* \overset{\mathrm {fdd}}{\longrightarrow } {\mathcal {W}}^{(2)} + \zeta , \end{aligned}$$
(17)

where \(\overset{\mathrm {fdd}}{\longrightarrow }\) denotes convergence of the finite-dimensional distributions (under \(\nu _n\)). In the proof of Theorem 2, we have shown that \(W_n^* \overset{{\mathcal {D}}_{\mu _n}}{\longrightarrow } {\mathcal {W}}^{(2)}\) in \({\mathcal {H}}\), which entails the tightness of \(\{W_n^* \, | \, n \in \mathbb {N}\}\) under \(\mu _n\). As \(\{W_n^* \, | \, n \in \mathbb {N}\}\) remains tight under \(\nu _n\) by contiguity, (17) yields

$$\begin{aligned} W_n^* \overset{{\mathcal {D}}_{\nu _n}}{\longrightarrow } {\mathcal {W}}^{(2)} + \zeta \quad \text {in } {\mathcal {H}} . \end{aligned}$$

Combining this with (16), we have shown the following theorem.

Theorem 3

Under the triangular array \(X_{n,1} , \ldots , X_{n,n}\) with \(\mathbb {P}^{X_{n,1}} = p_n {\mathcal {L}}^1\), we have, as \(n \rightarrow \infty \),

$$\begin{aligned} G_n^{(2)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(2)} + \zeta \right||_{{\mathcal {H}}}^2 . \end{aligned}$$

Since Corollary 1 is obtained with the same line of proof used in Theorem 2, we can likewise conclude

$$\begin{aligned} G_n^{(1)} \overset{{\mathcal {D}}}{\longrightarrow } \left||{\mathcal {W}}^{(1)} + {\widetilde{\zeta }} \right||_{{\mathcal {H}}}^2 , \end{aligned}$$

where \({\widetilde{\zeta }} (s) = \int {\widetilde{\eta }}(s,x) \, c(x) \, \varphi (x) \, \mathrm {d}x\) and

$$\begin{aligned} {\widetilde{\eta }}(s,x) = \big (x(x - s) - 1\big ) \mathbb {1}\{ x \le s \} + \left( 1 + x \cdot s - x^2\right) \varPhi (s) + x \varphi (s). \end{aligned}$$

From these statements, we discern that tests based on any of our statistics are able to detect contiguous alternatives which converge, at rate \(n^{-1/2}\), to the class of normal distributions. For further insights on contiguity, we refer to Roussas (1972) and Sen (1981).

5 Consistency and limit distributions under fixed alternatives

The major goal of this section is to establish that our test procedures can detect any fixed alternative satisfying a weak moment condition. For some of those distributions, we can even extend the consistency results and derive weak limits of our statistics.

We return to the nonparametric setting of Sect. 2 and let \(X, X_1, X_2, \ldots \) be iid. random variables with distribution function F and \(\mathbb {E}[X^2] < \infty \). Further, we assume \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\).

Theorem 4

As \(n \rightarrow \infty \), we have

$$\begin{aligned} \frac{G_n^{(1)}}{n} \longrightarrow \int _{\mathbb {R}} \left( F^X (s) - F(s) \right) ^2 \omega (s) \, \mathrm {d}s = \left||F^X - F \right||_{{\mathcal {H}}}^2 = \varDelta ^{(1)} \end{aligned}$$

and

$$\begin{aligned} \frac{G_n^{(2)}}{n} \longrightarrow \int _{\mathbb {R}} \left( F^X (s) - \varPhi (s) \right) ^2 \omega (s) \, \mathrm {d}s = \left||F^X - \varPhi \right||_{{\mathcal {H}}}^2 = \varDelta ^{(2)} , \end{aligned}$$

where each convergence is in probability.

Proof

We denote by \({\widehat{F}}_n\) the empirical distribution function of \(X_1, \ldots , X_n\). By the classical Glivenko–Cantelli theorem and (25) from Lemma 4,

$$\begin{aligned}&\left| \left||{\widehat{F}}_n^X - {\widehat{F}}_n \right||_{{\mathcal {H}}}^2 - \varDelta ^{(1)} \right| \\&\quad \le 4 \left( \sup \limits _{s \, \in \, \mathbb {R}} \Big | {\widehat{F}}_n^X(s) - F^X(s) \Big | + \sup \limits _{s \, \in \, \mathbb {R}} \Big | {\widehat{F}}_n(s) - F(s) \Big | \right) \int _{\mathbb {R}} \omega (s) \, \mathrm {d}s \\&\quad \longrightarrow 0 \end{aligned}$$

\(\mathbb {P}\)-a.s., as \(n \rightarrow \infty \). Rewriting \(G_n^{(1)}\) as in (15) and applying the second part of Lemma 3, we have

$$\begin{aligned} \frac{G_n^{(1)}}{n} = \left||{\widehat{F}}_n^X - {\widehat{F}}_n \right||_{{\mathcal {H}}}^2 + o_\mathbb {P}(1). \end{aligned}$$

The proof of the second claim is almost identical. \(\square \)

We recall that a level-\(\alpha \)-test based on \(G_n^{(k)}\), \(k = 1, 2\), rejects the hypothesis if \(G_n^{(k)} > c_n^{(k)}\), where \(c_n^{(k)}\) is the \((1 - \alpha )\)-quantile of the distribution of \(G_n^{(k)}\) under \(H_0\). Theorem 2 (and Corollary 1) ensures that \(\sup _{n \, \in \, \mathbb {N}} c_n^{(k)} < \infty \). Now, by Theorem 1, the limits \(\varDelta ^{(1)}\) and \(\varDelta ^{(2)}\) figuring in Theorem 4 are positive if X has a non-normal distribution. Consequently,

$$\begin{aligned} \mathbb {P}\left( G_n^{(k)}> c_n^{(k)} \right) \ge \mathbb {P}\left( n^{-1} G_n^{(k)} > n^{-1} \sup _{n \, \in \, \mathbb {N}} c_n^{(k)} \right) \longrightarrow 1, \end{aligned}$$

as \(n \rightarrow \infty \), and a level-\(\alpha \)-test based on \(G_n^{(1)}\) or \(G_n^{(2)}\) is consistent against each alternative with existing second moment.

The following theorem concerns the limit distributions of our statistics under fixed alternatives.

Theorem 5

Let \(X, X_1, X_2, \ldots \) be iid., non-normal random variables with distribution function F and \(\mathbb {E}X^4 < \infty \). Assume that X has a continuously differentiable density function p satisfying \(\sup _{s \, \in \, \mathbb {R}} |p(s)| \le K_1 < \infty \) and \(\sup _{s \, \in \, \mathbb {R}} |p^{\prime }(s)| \le K_2 < \infty \). W.l.o.g., \(\mathbb {E}X = 0\) and \(\mathbb {V}(X) = 1\). Then, as \(n \rightarrow \infty \),

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}\big ( 0, \tau _{(1)}^2 \big ) , \end{aligned}$$
(18)

where

$$\begin{aligned} \tau _{(1)}^2 = 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(1)}(s,t) \left( F^X(s) - F(s) \right) \left( F^X(t) - F(t) \right) \, \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t \end{aligned}$$

with

$$\begin{aligned} {\mathfrak {C}}^{(1)}(s,t) = \mathbb {E}\big [ C^{(1)}(s) \, C^{(1)}(t) \big ], \quad s,t \in \mathbb {R}, \end{aligned}$$

and

$$\begin{aligned} C^{(1)}(s) =&\big (X (X - s) - 1\big ) \mathbb {1}\{ X \le s \} - X^2 F^X(s) + \big ( 1 + X \cdot s \big ) F(s) \\&- \left( \tfrac{1}{2} (1 - X^2) \cdot s - X \right) p(s) + \left( 2X - \tfrac{1}{2} (1 - X^2) \cdot s \right) d^X(s) , \quad s \in \mathbb {R}. \end{aligned}$$

Proof

Setting

$$\begin{aligned} V_n (s) = {\widehat{F}}_n^X (s) - \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F^X \left( \frac{s - {\overline{X}}_n}{S_n} \right) + F \left( \frac{s - {\overline{X}}_n}{S_n} \right) ,\quad s \in \mathbb {R}, \end{aligned}$$

a change of variable in both integrals and an integral decomposition, as used by Chapman (1958), gives

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) =&~ \frac{1}{S_n} \left\{ 2 \int _{\mathbb {R}} \sqrt{n} \, V_n(s) \cdot \left[ F^X ({\widetilde{s}}) - F ({\widetilde{s}}) \right] \omega ({\widetilde{s}}) \, \mathrm {d}s \right. \nonumber \\&\left. + \frac{1}{\sqrt{n}} \int _{\mathbb {R}} \big | \sqrt{n} \, V_n(s) \big |^2 \, \omega ({\widetilde{s}}) \, \mathrm {d}s \right\} , \end{aligned}$$

where \({\widetilde{s}} = (s - {\overline{X}}_n) / S_n\). Under the assumption \(\left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}} = O_\mathbb {P}(1)\), Hölder’s inequality, Lebesgue’s theorem and Slutsky’s lemma give

$$\begin{aligned}&\left| \int _{\mathbb {R}} \sqrt{n} \, V_n(s) \cdot \left[ F^X ({\widetilde{s}}) - F ({\widetilde{s}}) \right] \omega (s) \, \mathrm {d}s - \big \langle \sqrt{n} \, V_n , \, F^X - F \big \rangle _{{\mathcal {H}}} \right| \\&\quad \le \left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}} \left( \int _{\mathbb {R}} \big | F^X({\widetilde{s}}) - F^X(s) + F(s) - F({\widetilde{s}}) \big |^2 \omega (s) \, \mathrm {d}s \right) ^{1/2} \\&\quad = o_\mathbb {P}(1). \end{aligned}$$

Here, we used that both \(F^X\) and F are continuous distribution functions. Using that \(\sup _{s \, \in \, \mathbb {R}} \big |V_n(s)\big | \le 4\)\(\mathbb {P}\)-a.s. for each \(n \in \mathbb {N}\) (note that \({\widehat{F}}_n^X\) is a distribution function as well by Lemma 4), Lemma 3 from “Appendix A” yields

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) = 2 \, \big \langle \sqrt{n} \, V_n , \, F^X - F \big \rangle _{{\mathcal {H}}} + \frac{1}{\sqrt{n}} \left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}}^2 + o_\mathbb {P}(1). \end{aligned}$$
(19)

To verify the above assumption, we show that \(\sqrt{n} \, V_n\) converges in distribution in \({\mathcal {H}}\). In this regard, (26) and Lemma 5 imply

$$\begin{aligned} \sqrt{n} \, V_n(s) \approx&\, \frac{\sqrt{n}}{S_n^2} \left\{ \frac{1}{n} \sum \limits _{j=1}^{n} X_j (X_j - s) \mathbb {1}\{ X_j \le s \} - {\overline{X}}_n \, \mathbb {E}\big [ (X - s) \mathbb {1}\{X \le s\} \big ] \right. \\&\left. - S_n^2 \left( \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F(s) \right) - \frac{1}{n} \sum \limits _{j=1}^{n} X_j^2 \, F^X(s) \right. \\&\left. - S_n \left( (1 - S_n) \cdot s - {\overline{X}}_n \right) \left( d^X(s) - p(s) \right) \phantom {\sum \limits _{j}^{n}}\right\} . \end{aligned}$$

By the classical Glivenko–Cantelli theorem and \(\sqrt{n} \, (S_n^2 - 1) = O_{\mathbb {P}}(1)\),

$$\begin{aligned} \sqrt{n} \, S_n^2 \left( \frac{1}{n} \sum \limits _{j=1}^{n} \mathbb {1}\{ X_j \le s \} - F(s) \right) \approx \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} \left( \mathbb {1}\{ X_j \le s \} - F(s) \right) . \end{aligned}$$

Together with the expansion \(\sqrt{n} \, (1 - S_n) = n^{-1/2} \sum _{j=1}^{n} \frac{1}{2} (1 - X_j^2) + o_\mathbb {P}(1)\), we have

$$\begin{aligned} \sqrt{n} \, V_n(s) \approx \frac{1}{S_n^2} \, \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} Z_j(s) , \end{aligned}$$
(20)

where

$$\begin{aligned} Z_j(s) =&\, X_j (X_j - s) \mathbb {1}\{ X_j \le s \} + X_j \left( d^X(s) + s F(s) \right) - X_j^2 \, F^X(s) \\&- \mathbb {1}\{ X_j \le s \} + F(s) - \left( \tfrac{1}{2} (1 - X_j^2) \cdot s - X_j \right) \left( d^X(s) - p(s) \right) . \end{aligned}$$

Since \(Z_1, \ldots , Z_n\) are iid. random elements of \({\mathcal {H}}\) with \(\mathbb {E}Z_1 = 0\) as well as \(\mathbb {E}\left||Z_1 \right||_{\mathcal {H}}^2 < \infty \), the central limit theorem for separable Hilbert spaces implies

$$\begin{aligned} \frac{1}{\sqrt{n}} \sum \limits _{j=1}^{n} Z_j \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {Z}} \quad \text {in } {\mathcal {H}} , \end{aligned}$$

where \({\mathcal {Z}} \in {\mathcal {H}}\) is a centred Gaussian element. In particular, \(\left||\sqrt{n} \, V_n \right||_{{\mathcal {H}}}\) is bounded in probability by (20), and (19) holds. The continuous mapping theorem and Slutsky’s lemma imply

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(1)}}{n} - \varDelta ^{(1)} \right) \overset{{\mathcal {D}}}{\longrightarrow } 2 \, \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}} . \end{aligned}$$

Denoting the covariance kernel of \({\mathcal {Z}}\) by

$$\begin{aligned} {\mathfrak {C}}^{(1)}(s,t) = \mathbb {E}\big [ {\mathcal {Z}}(s) \, {\mathcal {Z}}(t) \big ], \end{aligned}$$

the limiting random variable \(2 \, \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}}\) has the normal distribution \({\mathcal {N}}(0, \tau _{(1)}^2)\), where

$$\begin{aligned} \tau _{(1)}^2&= 4 \, \mathbb {E}\Big [ \big \langle {\mathcal {Z}}, \, F^X - F \big \rangle _{{\mathcal {H}}}^2 \Big ] \\&= 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(1)}(s,t) \left( F^X(s) - F(s) \right) \left( F^X(t) - F(t) \right) \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t . \end{aligned}$$

\(\square \)

Applying the reasoning of Theorem 5 to \(G_n^{(2)}\), the function \(\varPhi \) drops out of the decomposition of the integrals. Proceeding with the remaining terms exactly as before, we obtain an analogous statement for the second statistic under slightly weaker conditions.

Corollary 2

Let \(X, X_1, X_2, \ldots \) be iid., non-normal random variables with distribution function F, Lebesgue density p and \(\mathbb {E}X^4 < \infty \). Further, assume \(\sup _{s \, \in \, \mathbb {R}} |p(s)| < \infty \) and \(\mathbb {E}X = 0\), \(\mathbb {V}(X) = 1\). Then, as \(n \rightarrow \infty \),

$$\begin{aligned} \sqrt{n} \left( \frac{G_n^{(2)}}{n} - \varDelta ^{(2)} \right) \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}\big ( 0, \tau _{(2)}^2 \big ) , \end{aligned}$$
(21)

where

$$\begin{aligned} \tau _{(2)}^2 = 4 \int _{\mathbb {R}} \int _{\mathbb {R}} {\mathfrak {C}}^{(2)}(s,t) \left( F^X(s) - \varPhi (s) \right) \left( F^X(t) - \varPhi (t) \right) \, \omega (s) \, \omega (t) \, \mathrm {d}s \, \mathrm {d}t \end{aligned}$$

with

$$\begin{aligned} {\mathfrak {C}}^{(2)}(s,t) = \mathbb {E}\big [ C^{(2)}(s) \, C^{(2)}(t) \big ],\quad s,t\in \mathbb {R}, \end{aligned}$$

and

$$\begin{aligned} C^{(2)}(s) =&\, X (X - s) \mathbb {1}\{ X \le s \} - X^2 \, F^X(s) + X \cdot s F(s) \\&+ \left( 2 X - \tfrac{1}{2} (1 - X^2) \cdot s \right) d^X(s), \quad s \in \mathbb {R}. \end{aligned}$$

Remark

Note that for Theorem 5, we adapted a line of proof put forward by Baringhaus et al. (2017). The asymptotic normality also qualifies our statistics for the applications they propose (see also Baringhaus and Henze 2017).

First, we fix \(\alpha \in (0,1)\) and denote by \(q_{\alpha } = \varPhi ^{-1} (1 - \alpha /2)\) the \((1 - \alpha /2)\)-quantile of the standard normal distribution. Letting

$$\begin{aligned} {\widehat{\tau }}_{(k),n}^2 = {\widehat{\tau }}_{(k),n}^2 (X_1, \ldots , X_n) \end{aligned}$$

be a (weakly) consistent estimator of \(\tau _{(k)}^2\), \(k = 1,2\), figuring in Theorem 5 and Corollary 2, respectively, (18) and (21) immediately indicate that

$$\begin{aligned} I_n = \left[ \frac{G_n^{(k)}}{n} - \frac{q_{\alpha } \, {\widehat{\tau }}_{(k),n}}{\sqrt{n}} , \, \frac{G_n^{(k)}}{n} + \frac{q_{\alpha } \, {\widehat{\tau }}_{(k),n}}{\sqrt{n}} \right] \end{aligned}$$
(22)

is an asymptotic confidence interval for \(\varDelta ^{(k)} = \varDelta ^{(k)}(F)\) at level \(1 - \alpha \). Here, F satisfies the assumptions of Theorem 5 (or Corollary 2). As was briefly explained in the introduction, one objective of Stein’s method for the normal distribution is to assess how close a given distribution is to being normal. Thus, seeing \(\varDelta ^{(1)}(F)\) and \(\varDelta ^{(2)}(F)\) as ‘measures’ of how far F differs from the standard normal distribution, we have also developed a procedure for empirical assessments of this kind (a small code sketch is given at the end of this remark).

Second, we emphasize that our statistics can be employed for inverse testing problems. Namely, if \(\varDelta _0 > 0\) is a given distance of tolerance, tests that reject \(H_{\varDelta _0}\) if

$$\begin{aligned} \frac{G_n^{(k)}}{n} \le \varDelta _0 - \frac{{\widehat{\tau }}_{(k),n}}{\sqrt{n}} \, \varPhi ^{-1}(1 - \alpha ) \end{aligned}$$

are asymptotic level-\(\alpha \)-tests for the problem

$$\begin{aligned} H_{\varDelta _0} : \varDelta ^{(k)}(F) \ge \varDelta _0 \text { against } K_{\varDelta _0} : \varDelta ^{(k)}(F) < \varDelta _0 . \end{aligned}$$

These tests are consistent against each alternative and aim at validating a whole nonparametric neighbourhood of the hypothesized, underlying normality. Unfortunately, the direct approach to obtain estimators for \(\tau ^2_{(k)}\) does not lead to feasible results.

Finally, we suppose \(\{ c_n^{(k)} \} \subset (0, \infty )\) is the sequence of critical values for a level-\(\alpha \)-test based on \(G_n^{(k)}\), \(k = 1,2\). For an alternative distribution F satisfying the relevant prerequisites of Theorem 5 or Corollary 2, we can approximate the power of the test against this alternative by

$$\begin{aligned} \mathbb {P}\left( G_n^{(k)} > c_n^{(k)} \right) \approx 1 - \varPhi \left( \frac{\sqrt{n}}{\tau _{(k)}} \left( \frac{c_n^{(k)}}{n} - \varDelta ^{(k)} \right) \right) . \end{aligned}$$
(23)

Note that this last application does not (in theory) require an estimator of \(\tau _{(k)}^2\). Instead, \(\tau _{(k)}^2\) and \(\varDelta ^{(k)}\) have to be calculated for the particular fixed alternative.
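Since no consistent estimator of \(\tau _{(k)}^2\) is available, one simple possibility, sketched here in R under the assumption that a nonparametric bootstrap adequately captures the sampling variability of \(G_n^{(k)}/n\), is to plug a bootstrap estimate of \(\tau _{(k)}\) into (22); this is in the spirit of the bootstrap procedure used for Table 7 in Sect. 6, the function G_stat() refers to the sketch in Sect. 2, and the name ci_delta and its defaults are ours.

```r
# Bootstrap-based version of the asymptotic confidence interval (22) for Delta^(k);
# tau_(k) is replaced by a nonparametric bootstrap estimate (an assumption, not a
# procedure with proven consistency).
ci_delta <- function(x, a = 1, type = 1, alpha = 0.1, B = 500) {
  n <- length(x)
  delta_hat <- G_stat(x, a = a, type = type) / n
  boot <- replicate(B, G_stat(sample(x, replace = TRUE), a = a, type = type) / n)
  tau_hat <- sqrt(n) * sd(boot)                  # bootstrap estimate of tau_(k)
  half <- qnorm(1 - alpha / 2) * tau_hat / sqrt(n)
  c(lower = delta_hat - half, upper = delta_hat + half)
}
```

Whether such a bootstrap estimate is consistent remains, as noted above, an open question.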

6 Empirical results

In this section, we investigate the behaviour of our two statistics given through the explicit formulas (9) and (10). It is organized as follows: First, we compare the two tests over a range of possible tuning parameters and alternative distributions. Based on these results, we choose our final procedure and describe its implementation. Then, a brief summary of the competing tests for an empirical power study is given. We display the performance of our test in comparison with the established tests in a finite-sample power study. Finally, we add results for the applications from the last section (as described in the Remark) for three alternative distributions. The simulations are performed using the statistical computing environment R, see R Core Team (2017). Notice that there are several comparative simulation studies for testing normality in the literature, as witnessed by Baringhaus et al. (1989), Farrell and Rogers-Stewart (2006), Landry and Lepage (1992), Pearson et al. (1977), Romão et al. (2010), Shapiro et al. (1968), Yap and Sim (2011) and others.

Since we consider two new families of tests both depending on the choice of the tuning parameter a, we will calculate the finite-sample power for a range of different parameters. In each simulation, we consider the sample sizes \(n = 20\), \(n = 50\) and \(n = 100\), and fix the nominal level of significance \(\alpha \) to 0.05. To implement the tests for any of the (fixed) values \(a \in \{0.1, 0.25, 0.5, 1, 1.5, 2, 3\}\), we calculate the critical values by a Monte Carlo simulation with 100,000 repetitions. These critical values for \(G_{n,a}^{(k)}\), \(k = 1, 2\), can be found in Tables 1 and 2 and are taken from there throughout the simulations.
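The Monte Carlo approximation of these critical values can be sketched in a few lines of R (our sketch, using G_stat() from Sect. 2; the number of replications is reduced here compared to the 100,000 used for Tables 1 and 2).

```r
# Monte Carlo approximation of the empirical (1 - alpha)-quantile of G_{n,a}^(k) under H_0
critical_value <- function(n, a, type = 1, alpha = 0.05, reps = 10000) {
  quantile(replicate(reps, G_stat(rnorm(n), a = a, type = type)), 1 - alpha)
}
# Example (slow due to numerical integration): critical_value(n = 20, a = 1)
```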

Table 1 Empirical 0.95 quantiles for \(G_{n,a}^{(1)}\) under \(H_0\) (100,000 replications)
Table 2 Empirical 0.95 quantiles for \(G_{n,a}^{(2)}\) under \(H_0\) (100,000 replications)

We choose the alternative distributions to fit the extensive power study of normality tests by Romão et al. (2010), in order to ease the comparison to other tests. Namely, we choose as symmetric distributions the Student \(t_\nu \)-distribution with \(\nu \in \{3, 5, 10\}\) degrees of freedom, as well as the uniform distribution \({\mathcal {U}}(-\sqrt{3}, \sqrt{3})\). The asymmetric distributions are the \(\chi ^2_\nu \)-distribution with \(\nu \in \{5, 15\}\) degrees of freedom, the Beta distributions B(1, 4) and B(2, 5), the Gamma distributions \(\varGamma (1, 5)\) and \(\varGamma (5, 1)\) parametrized by their shape and rate parameter, the Gumbel distribution Gum(1, 2) with location parameter 1 and scale parameter 2, the lognormal distribution LN(0, 1) as well as the Weibull distribution W(1, 0.5) with scale parameter 1 and shape parameter 0.5. As representatives of bimodal distributions, we take the mixture of normal distributions Mix\({\mathcal {N}}(p, \mu , \sigma ^2)\), where the random variables are generated by

$$\begin{aligned} (1 - p) \, {\mathcal {N}}(0, 1) + p \, {\mathcal {N}}(\mu , \sigma ^2), \quad p \in (0, 1), \, \mu \in \mathbb {R}, \, \sigma > 0. \end{aligned}$$

Each entry in Table 3 referring to the finite-sample power of the tests is based on 10,000 replications. From the results in this table, we infer that for asymmetric alternative distributions our tests perform almost identically and are extremely stable over the range of tuning parameters. For the symmetric and bimodal alternatives, however, the choice of a can have considerable influence on the power of the test. Moreover, with particular focus on the uniform distribution and the normal mixtures, the test based on \(G_{n,a}^{(1)}\) shows a significantly better performance than the one based on \(G_{n,a}^{(2)}\).

Table 3 Empirical rejection rates for \(G_{n,a}^{(k)}\), \(k = 1, 2\) (\(\alpha = 0.05\), 10,000 replications)

Taking this superiority of \(G_{n,a}^{(1)}\), and the fact that the choice of tuning parameter influences the power, into account, we propose as a final procedure a test based on \(G_{n,a}^{(1)}\) calculated by (9) with a data-dependent choice of the tuning parameter a. To implement the latter, we use the algorithm from Allison and Santana (2015), which has already been applied in the recent simulation study by Allison et al. (2017) for tests of exponentiality. Given the standardized sample \(Y_1, \ldots , Y_n\) as in Sect. 2, our test is carried out as follows (a code sketch follows the enumeration):

(a):

Fix a grid of possible tuning parameters \(a \in \{ a_1, \ldots , a_\ell \}\) (here: \(a \in \{ 0.1, 0.25, 0.5, 1, 1.5, 2, 3 \}\)).

(b):

Sample from \(Y_1, \ldots , Y_n\) with replacement and, for the obtained bootstrap sample, calculate \(G_{n,a_i}^{(1)}\), \(i = 1, \ldots , \ell \), via (9).

(c):

Repeat step (b) B times (here: \(B = 400\)) and denote the resulting values of the statistic by \(G_{1,a_i}^*, \ldots , G_{B,a_i}^*\), \(i = 1, \ldots , \ell \).

(d):

Calculate the bootstrap powers by \({\widehat{P}}_{a_i} = B^{-1} \sum _{b = 1}^{B} \mathbb {1}\{ G_{b,a_i}^* > c_{n, a_i}(\alpha ) \}\), \(i = 1, \ldots , \ell \), where \(c_{n, a_i}(\alpha )\) is the critical value for a level-\(\alpha \)-test based on \(G_{n,a_i}^{(1)}\) (for the Monte Carlo approximations, see Table 1).

(e):

Choose as the tuning parameter \({\widehat{a}} = \mathrm {arg} \max \{{\widehat{P}}_{a}|a\in \{a_{1},\cdots ,a_{\ell }\}\}\) and apply the test based on \(G_{n, {\widehat{a}}}^{(1)}\) to \(Y_1, \ldots , Y_n\).
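The following R sketch summarizes steps (a)–(e); it is our illustration, assuming G_stat() from the sketch in Sect. 2 and a vector crit of critical values \(c_{n, a_i}(\alpha )\) matching a_grid (cf. Table 1), with \(B = 400\) as in the text.

```r
# Data-dependent choice of the tuning parameter a via the bootstrap power criterion
adaptive_a <- function(y, a_grid, crit, B = 400) {
  exceed <- replicate(B, {
    yb <- sample(y, replace = TRUE)                           # step (b): bootstrap sample
    sapply(a_grid, function(a) G_stat(yb, a = a, type = 1))   # statistics for all a_i
  })                                                          # matrix: length(a_grid) x B
  p_hat <- rowMeans(exceed > crit)                            # step (d): bootstrap powers
  a_grid[which.max(p_hat)]                                    # step (e): selected a_hat
}
```

The chosen value adaptive_a(y, a_grid, crit) is then plugged into the test based on \(G_{n, {\widehat{a}}}^{(1)}\) applied to the original sample.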

We consider the following competitors to this test. As classical and well-known tests, we include the Shapiro–Wilk test (SW), see Shapiro and Wilk (1965), the Shapiro–Francia test (SF), see Shapiro and Francia (1972), and the Anderson–Darling test (AD), see Anderson and Darling (1952). For the implementation of these tests in R, we refer to the package nortest by Gross and Ligges (2015). Tests based on the empirical characteristic function are represented by the Baringhaus–Henze–Epps–Pulley test (BHEP), see Baringhaus and Henze (1988), Epps and Pulley (1983). The BHEP test with tuning parameter \(\beta > 0\) is based on

$$\begin{aligned} \hbox {BHEP} =&\frac{1}{n} \sum _{j, k = 1}^{n} \exp \left( - \frac{\beta ^2}{2} \left( Y_{j} - Y_{k}\right) ^2 \right) \\&\, - \frac{2}{\sqrt{1 + \beta ^2}} \sum _{j = 1}^{n} \exp \left( - \frac{\beta ^2}{2(1 + \beta ^2)} \, Y_{j}^2 \right) + \frac{n}{\sqrt{1 + 2\beta ^2}}, \end{aligned}$$

where \(Y_1, \ldots , Y_n\) is the standardized sample. We fix \(\beta = 1\) and take the critical values from Henze (1990) but also restate them in Table 4.
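For reference, the BHEP statistic as displayed above translates directly into R (our transliteration; y denotes the standardized sample).

```r
# BHEP statistic with tuning parameter beta, cf. the displayed formula
bhep_stat <- function(y, beta = 1) {
  n <- length(y)
  d <- outer(y, y, "-")                    # all pairwise differences Y_j - Y_k
  sum(exp(-beta^2 * d^2 / 2)) / n -
    2 / sqrt(1 + beta^2) * sum(exp(-beta^2 * y^2 / (2 * (1 + beta^2)))) +
    n / sqrt(1 + 2 * beta^2)
}
```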

Furthermore, we include the quantile correlation test of del Barrio–Cuesta-Albertos–Matrán–Rodríguez-Rodríguez (BCMR) based on the \(L^2\)-Wasserstein distance, see del Barrio et al. (1999) and Sect. 3.3 of del Barrio et al. (2000). The BCMR statistic is given by

$$\begin{aligned} \hbox {BCMR} = n \left( 1 - \frac{1}{S_n^2} \left( \sum _{k = 1}^n X_{(k)} \int _{\frac{k - 1}{n}}^{\frac{k}{n}} \varPhi ^{-1}(t) \, \mathrm {d}t \right) ^2 \right) - \int _{\frac{1}{n + 1}}^{\frac{n}{n + 1}} \frac{t (1 - t)}{\left( \varphi \left( \varPhi ^{-1}(t) \right) \right) ^2} \, \mathrm {d}t, \end{aligned}$$

where \(X_{(k)}\) is the k-th order statistic of \(X_1, \ldots , X_n\), \(S_n^2\) is the sample variance and \(\varPhi ^{-1}\) is the quantile function of the standard normal distribution. Simulated critical values can be found in the work of Krauczi (2009) or in Table 4.

The Henze–Jiménez-Gamero test (HJG), see Henze and Jiménez-Gamero (2018), uses a weighted \(L^2\)-distance between the empirical moment-generating function of the standardized sample and the moment-generating function of the standard normal distribution. The test is based on

$$\begin{aligned} \hbox {HJG}_\beta&= \frac{1}{n\sqrt{\beta }} \sum _{j,k = 1}^n \exp \left( \frac{(Y_j + Y_k)^2}{4 \beta } \right) - \frac{2}{\sqrt{\beta - 1/2}}\sum _{j = 1}^n \exp \left( \frac{Y_j^2}{4 \beta - 2}\right) \\&\quad + \frac{n}{\sqrt{\beta - 1}} \end{aligned}$$

with \(\beta > 2\). We consider the tuning parameters \(\beta \in \{2.5, 5, 10\}\). Since Henze and Jiménez-Gamero (2018) did not simulate critical values in the univariate case, we provide the empirical critical values in Table 4. This test was proposed only recently, so it is not yet included in any other power study. All of the simulated critical values displayed in Table 4 have been confirmed in a simulation with 100,000 replications (compare to Henze 1990; Krauczi 2009).
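Analogously, the HJG\(_\beta \) statistic can be transliterated as follows (our sketch; y is again the standardized sample and \(\beta > 2\)).

```r
# HJG statistic with tuning parameter beta > 2, cf. the displayed formula
hjg_stat <- function(y, beta = 2.5) {
  n <- length(y)
  s <- outer(y, y, "+")                    # all pairwise sums Y_j + Y_k
  sum(exp(s^2 / (4 * beta))) / (n * sqrt(beta)) -
    2 / sqrt(beta - 0.5) * sum(exp(y^2 / (4 * beta - 2))) +
    n / sqrt(beta - 1)
}
```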

Table 4 Empirical 0.95 quantiles for BCMR, BHEP and HJG\(_{\beta }\) under \(H_0\) (100,000 replications)
Table 5 Empirical rejection rates for competing procedures (\(\alpha = 0.05\), 10,000 replications)

In Table 5, we display the results of the competitive simulation study, where our test based on the steps (a)–(e) (with bootstrap size \(B = 400\) and values for a as before) is denoted by BE\(_{{\widehat{a}}}\). Each entry is based on 10,000 Monte Carlo replications, and the best-performing test for each distribution and sample size is highlighted for easy reference.

Starting with the symmetric distributions, we see that the SF and SW tests perform best for these models. Interestingly, the HJG\(_\beta \) test has the highest power against Student’s \(t_{10}\)-distribution but completely fails to detect the uniform alternative. The finite-sample power of our new test for the \(t_\nu \)-distributions is comparable to the BHEP test, but the uniform distribution seems to be a weak spot. Bimodal distributions are best detected by the AD test. The performance of the SW, BCMR, BHEP and SF tests is comparable, while the new BE\(_{{\widehat{a}}}\) procedure has a slightly weaker power and the HJG\(_\beta \) test is clearly inferior for those distributions. Considering the asymmetric alternatives, our new procedure shows its potential by dominating all other procedures for the \(\chi ^2\)-, the Gamma as well as the Gumbel distributions. All procedures do a good job in rejecting the Weibull and the lognormal alternatives.

To conclude the simulation study, we investigate the confidence interval (22) and the power approximation in (23) for the fixed tuning parameter \(a = 1\). As an example, we examine three alternatives also partially considered by Baringhaus et al. (2017), namely the uniform distribution \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \) and the Laplace distribution \(L \left( 0, 1 / \sqrt{2} \right) \) with density \(p(x) = \tfrac{1}{\sqrt{2}} \exp \left( - \sqrt{2} |x| \right) \), \(x \in \mathbb {R}\), as well as the Logistic distribution \(Lo \left( 0, \sqrt{3} / \pi \right) \) with density \(p(x) = \frac{\pi \exp \left( - \pi x /\sqrt{3} \right) }{\sqrt{3} \left( 1+\exp \left( - \pi x /\sqrt{3} \right) \right) ^2} \), \(x \in \mathbb {R}\). Notice that these alternatives are standardized. The values of \(\varDelta ^{(k)}\) are 0.004167 \((k = 1)\) and 0.000985 \((k = 2)\) for \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \), 0.005225 \((k = 1)\) and 0.001367 \((k = 2)\) for \(L \left( 0, 1 / \sqrt{2} \right) \), and 0.000819 \((k=1)\) and 0.000241 \((k=2)\) for \(Lo \left( 0, \sqrt{3} / \pi \right) \).

Note that the uniform and the Laplace distribution do not satisfy the differentiability condition of Theorem 5, but since they do satisfy all conditions of Corollary 2, we include simulations for \(G_{n,1}^{(1)}\) in those cases as well. The results in Tables 6 and 7, however, when compared to the Logistic distribution or the second statistic \(G_{n,1}^{(2)}\), where all requirements are formally fulfilled, indicate that Theorem 5 should also cover the uniform and, in particular, the Laplace distribution, and thus the result might hold under weaker conditions.

Since the limit variance \(\tau _{(k)}^2\), \(k = 1, 2\), seems inaccessible by computation in each case, we decided to estimate \(\tau _{(k)}^2\), \(k = 1, 2\), by means of simulation. For a sample size of \(n = 1000\) with 10,000 repetitions, the estimated values are 0.000302 \((k = 1)\) and 0.000016 \((k = 2)\) for \({\mathcal {U}} \left( -\sqrt{3}, \sqrt{3} \right) \), 0.002452 \((k = 1)\) and 0.000565 \((k = 2)\) for \(L \left( 0, 1 / \sqrt{2} \right) \), as well as 0.000430 \((k=1)\) and 0.000125 \((k=2)\) for \(Lo \left( 0, \sqrt{3} / \pi \right) \). Table 6 displays the empirical power (in %) of \(G_{n,1}^{(1)}\) and \(G_{n,1}^{(2)}\) against the three alternatives. The nominal level is \(1 - \alpha = 0.95\), which explains the considerably smaller power compared to Table 2 from Baringhaus et al. (2017). The columns denoted by ‘Apr’ show the corresponding approximations given by (23), while ‘MC’ stands for the empirical rejection rates for 10,000 repetitions. Table 6 shows that the approximate power function (23) often appears as a lower bound to the power of the test statistics, confirming the observations by Baringhaus et al. (2017).
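The simulation-based estimation of \(\tau _{(k)}^2\) just described can be sketched as follows (our illustration, using G_stat() from Sect. 2; rdist generates a standardized sample from the alternative, delta is the corresponding \(\varDelta ^{(k)}\), e.g. 0.004167 for the uniform law with \(k = 1\) and \(a = 1\), and the number of repetitions is reduced compared to the 10,000 used in the text).

```r
# Simulation-based estimate of the limit variance tau_(k)^2 under a fixed alternative,
# exploiting sqrt(n) * (G_n^(k)/n - Delta^(k)) -> N(0, tau_(k)^2)
estimate_tau2 <- function(rdist, delta, n = 1000, reps = 1000, a = 1, type = 1) {
  var(replicate(reps, sqrt(n) * (G_stat(rdist(n), a = a, type = type) / n - delta)))
}
# Example: estimate_tau2(function(n) runif(n, -sqrt(3), sqrt(3)), delta = 0.004167)
```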

Table 6 Empirical power and approximation (23) against three alternatives

Because no consistent estimator \({\widehat{\tau }}_{(k),n}^2 (X_1, \ldots , X_n)\) of \(\tau _{(k)}^2\), \(k = 1, 2\), is known, a nonparametric bootstrap procedure with \(B = 500\) bootstrap samples (drawn with replacement) is implemented to calculate the empirical coverage probabilities shown in Table 7. The confidence level is set to \(1 - \alpha = 0.9\), and each value is based on 10,000 repetitions. Obviously, the empirical coverage probabilities are higher than the confidence level, indicating that the approximate lower and upper bounds of the confidence intervals are too conservative. This might be an effect of the bootstrap variance estimation procedure, but regarding the results of the power approximation, we rather think that an appropriate error-correction term in (22) and (23) will lead to better results.

Table 7 Empirical coverage probabilities of \(I_n\) from (22) for \(\varDelta ^{(k)}\), \(k = 1, 2\), (at nominal level 0.9, with 10,000 replications)

7 Conclusions and outlines

Starting with Charles Stein’s insight that a random variable X has a standard normal distribution if, and only if,

$$\begin{aligned} \mathbb {E}\big [ f^{\prime }(X)\big ] = \mathbb {E}\big [ Xf(X) \big ] \end{aligned}$$

holds for any absolutely continuous function, we developed two classes of goodness-of-fit statistics for testing the normality hypothesis. We utilized the zero-bias transformation to bypass the problem of evaluating an empirical counterpart of this identity for all absolutely continuous functions. An advantage of the underlying zero-bias identity over many other types of transformation applied in goodness-of-fit testing, like the characteristic function or the Laplace transform, is that the distribution inserted into the mapping is not associated with a purely analytic quantity but is mapped to another distribution and, thereby, stays accessible to a stochastically intuitive examination (cf. Lemmata 1 and 4). The conducted power study suggests that our tests are serious competitors to established tests and even set new benchmarks in terms of the highest power achieved for many asymmetric alternatives. Both procedures are consistent against any alternative distribution satisfying a weak moment condition.

We want to emphasize that some problems remain open for further research. One issue concerns our choice of weight function. The integrals figuring in the second sum of \(G_{n,a}^{(2)}\) in (10), though they are accessible to stable numerical integration, are a slight drawback in terms of calculation time as compared to \(G_{n,a}^{(1)}\). It is conceivable to replace the term \(\omega (t) \mathrm {d}t\) in (5) and (6) by \(\mathrm {d}F(t)\) and to estimate F by the empirical distribution function. However, this type of test is not included in the framework for our theoretical results. Another question is whether there is some limiting statistic, as \(a \rightarrow \infty \), when considering \(G_{n, a}^{(2)}\) from Sect. 2. Finally, since we have not succeeded in calculating consistent estimators for \(\tau _{(1)}^2\) and \(\tau _{(2)}^2\) (see the Remark in Sect. 5), it remains to derive appropriate estimators and, in view of the results in Tables 6 and 7, to find better power approximations as well as suitable confidence intervals.