1 Introduction

We revisit the use of Stein’s method to prove uniform Berry–Esseen (B–E) bounds for Studentized nonlinear statistics. Let \(X_1, \dots , X_n\) be independent random variables that serve as the raw data, and for each \(i =1, \dots , n\), let

$$\begin{aligned} \xi _i \equiv g_{n, i} (X_i) \end{aligned}$$
(1.1)

for a function \(g_{n, i}(\cdot )\) that can also depend on i and n, such that

$$\begin{aligned} {\mathbb {E}}[\xi _i] = 0 \text { for all }i \text { and }\sum _{i=1}^n{\mathbb {E}}[\xi _i^2] = 1. \end{aligned}$$
(1.2)

A Studentized nonlinear statistic is an asymptotically normal statistic that can be represented in the general form

$$\begin{aligned} T_{SN} \equiv \frac{W_n + D_{1n}}{(1+ D_{2n})^{1/2}}, \end{aligned}$$
(1.3)

with \(W_n \equiv \sum _{i=1}^n \xi _i\), where the “remainder” terms

$$\begin{aligned} D_{1n} = D_{1n} (X_1, \dots , X_n) \text { and } D_{2n} = D_{2n} (X_1, \dots , X_n) \end{aligned}$$
(1.4)

are some functions of the data, with the additional properties that

$$\begin{aligned} D_{1n}, D_{2n} \longrightarrow 0 \text { in probability as }n\text { tends to }\infty , \text { and } D_{2n} \ge -1 \text { almost surely}.\nonumber \\ \end{aligned}$$
(1.5)

We adopt the convention that if \(1 + D_{2n} = 0\), the value of \(T_{SN}\) is taken to be 0, \(+ \infty \) or \(-\infty \) depending on the sign of \(W_n + D_{1n}\). Such a statistic is a generalization of the classical Student’s t-statistic [13], where the denominator \(1 + D_{2n}\) acts as a data-driven “self-normalizer” for the numerator \(W_n + D_{1n}\).
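For concreteness, the classical one-sample t-statistic fits the template (1.3): with \(\xi _i = X_i/(\sigma \sqrt{n})\) for a known population standard deviation \(\sigma \), one may take \(D_{1n} = 0\) and \(D_{2n} = s^2/\sigma ^2 - 1 \ge -1\), where \(s^2\) is the sample variance. The following minimal numerical sketch (the data, variance, and seed are illustrative assumptions, not from the paper) verifies this algebra:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 2.0
x = rng.normal(0.0, sigma, size=n)       # mean-zero raw data (illustrative)

# Linear parts: xi_i = X_i / (sigma sqrt(n)), so E[xi_i] = 0 and sum E[xi_i^2] = 1
xi = x / (sigma * np.sqrt(n))
W_n = xi.sum()

# One-sample t-statistic computed directly
s = x.std(ddof=1)
t_direct = np.sqrt(n) * x.mean() / s

# The same statistic in the form (1.3), with D_1n = 0 and D_2n = s^2/sigma^2 - 1
D_1n, D_2n = 0.0, s**2 / sigma**2 - 1.0
t_general = (W_n + D_1n) / np.sqrt(1.0 + D_2n)

assert np.isclose(t_direct, t_general)
```

Note that \(D_{2n} = s^2/\sigma ^2 - 1 \ge -1\) automatically, matching the requirement in (1.5).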

Many statistics used in practice can be seen as examples of (1.3); hence, developing a general Berry–Esseen-type inequality for \(T_{SN}\) is relevant to many applications. The first such attempt based on Stein’s method can be found in the semi-review article of Shao et al. [9], whose proof critically relies upon an exponential-type randomized concentration inequality first appearing in Shao [8]. However, while their methodology is sound, there are numerous gaps; most notably, Shao et al. [9] overlooked that the original exponential-type randomized concentration inequality of Shao [8] was developed for a sum of independent random variables with mean zero, which is not well suited for their proof, wherein the truncated summands generally do not have mean zero. In fact, truncation itself is an insufficient device to carry the arguments involved, as will be explained in this article.

Our contributions are twofold. First, we put the methodology of Shao et al. [9] on solid footing; this, among other things, is accomplished by adopting variable censoring instead of truncation, as well as developing a modified randomized concentration inequality for a sum of censored variables, to rectify the gaps in their arguments. We also present a more user-friendly B–E bound for the statistic \(T_\textrm{SN}\) when the denominator remainder \(D_{2n}\) admits a certain standard form. Second, as an application to a prototypical example of Studentized nonlinear statistics, we establish a uniform B–E bound of the rate \(1/\sqrt{n}\) for Studentized U-statistics whose dependence on the degree of the kernel is also explicit; all prior works in this vein only treat the simplest case with a kernel of degree 2. This bound is the sharpest known to date and serves to complete the literature on uniform B–E bounds for Studentized U-statistics.

Notation. \(\Phi (\cdot )\) is the standard normal distribution function and \({\bar{\Phi }}(\cdot ) = 1- \Phi (\cdot )\). The indicator function is denoted by \(I(\cdot )\). For \(p \ge 1\), \(\Vert Y\Vert _p \equiv ({\mathbb {E}}[|Y|^p])^{1/p}\) for a random variable Y. For any \(a, b \in {\mathbb {R}}\), \(a \vee b = \max (a, b)\) and \(a \wedge b = \min (a, b)\). \(C, C_1, C_2, \dots \) denote positive absolute constants that may differ in value from place to place, but do not depend on other quantities nor on the distributions of the random variables. For two (possibly multivariate) random variables \(Y_1\) and \(Y_2\), “\(Y_1 =_d Y_2\)” means \(Y_1\) and \(Y_2\) have the same distribution.

2 General Berry–Esseen Bounds for Studentized Nonlinear Statistics

Let \(\xi _1, \dots , \xi _n\) be as in Sect. 1, satisfying the assumptions in (1.2). For each \(i = 1, \dots , n\), define

$$\begin{aligned} \xi _{b, i} \equiv \xi _i I(|\xi _i| \le 1) + I(\xi _i >1) - I(\xi _i <-1), \end{aligned}$$
(2.1)

an upper-and-lower censored version of \(\xi _i\), and their sum

$$\begin{aligned} W_b = W_{b, n}\equiv \sum _{i=1}^n \xi _{b, i}. \end{aligned}$$
(2.2)
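In code, the three-term censoring map (2.1) is simply clipping to \([-1, 1]\); a quick sketch with an arbitrary heavy-tailed sample (purely illustrative) confirms this:

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.standard_t(df=3, size=1000)     # heavy tails, so some |xi_i| > 1

# (2.1) written out term by term ...
xi_b = xi * (np.abs(xi) <= 1) + 1.0 * (xi > 1) - 1.0 * (xi < -1)

# ... coincides with clipping to [-1, 1]
assert np.array_equal(xi_b, np.clip(xi, -1.0, 1.0))
```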

Moreover, for each \(i = 1, \dots , n\), we define \(W_b^{(i)} \equiv W_b - \xi _{b, i}\) and \( W_n^{(i)} \equiv W_n - \xi _i\). We also let

$$\begin{aligned} \beta _2 \equiv \sum _{i=1}^n {\mathbb {E}}[\xi _i^2 I(|\xi _i| > 1)] \text { and } \beta _3 \equiv \sum _{i=1}^n {\mathbb {E}}[\xi _i^3 I(|\xi _i| \le 1)]. \end{aligned}$$

For any \(x \in {\mathbb {R}}\),

$$\begin{aligned} f_x (w) \equiv {\left\{ \begin{array}{ll} \sqrt{2 \pi } e^{w^2/2} \Phi (w){\bar{\Phi }}(x) &{} \quad w \le x\\ \sqrt{2 \pi } e^{w^2/2} \Phi (x) {\bar{\Phi }}(w) &{} \quad w > x \end{array}\right. } \end{aligned}$$
(2.3)

is the solution to the Stein equation [12]

$$\begin{aligned} f_x'(w) - wf_x(w) = I(w \le x) - \Phi (x). \end{aligned}$$
(2.4)
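As a sanity check, the closed form (2.3) can be verified against (2.4) by finite differences, staying away from the kink at \(w = x\) (the evaluation points and step size below are arbitrary):

```python
import math

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def f_x(w, x):
    """Solution (2.3) of the Stein equation (2.4)."""
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    return c * (Phi(w) * (1.0 - Phi(x)) if w <= x else Phi(x) * (1.0 - Phi(w)))

x, h = 0.7, 1e-6
for w in (-2.0, -0.5, 0.3, 0.69, 0.71, 1.5):   # all points clear of w = x
    deriv = (f_x(w + h, x) - f_x(w - h, x)) / (2.0 * h)
    lhs = deriv - w * f_x(w, x)                # f_x'(w) - w f_x(w)
    rhs = (1.0 if w <= x else 0.0) - Phi(x)    # I(w <= x) - Phi(x)
    assert abs(lhs - rhs) < 1e-5
```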

Our first result is the following uniform Berry–Esseen bound for the Studentized nonlinear statistic in (1.3):

Theorem 2.1

(Uniform B–E bound for Studentized nonlinear statistics) Let \(X_1, \dots , X_n\) be independent random variables. Consider the Studentized nonlinear statistic \(T_\textrm{SN}\) in (1.3), constructed with the linear summands in (1.1) that satisfy the condition in (1.2), and the remainder terms in (1.4) that satisfy the condition in (1.5). There exists a positive absolute constant \(C > 0\) such that

$$\begin{aligned}{} & {} \sup _{x \in {\mathbb {R}}} \Big |P(T_{SN} \le x) - \Phi (x)\Big | \le \sum _{j=1}^2P(|D_{jn}| > 1/2) \nonumber \\{} & {} \quad + C \Bigg \{\beta _2 + \beta _3 + \Vert {\bar{D}}_{1n}\Vert _2 + {\mathbb {E}}\Big [(1 +e^{W_b}) {\bar{D}}_{2n}^2\Big ] + \sup _{x \ge 0} \Big |x {\mathbb {E}}[{\bar{D}}_{2n} f_x(W_b)]\Big | \nonumber \\{} & {} \quad + \sum _{j=1}^2 \sum _{i=1}^n \Big ( {\mathbb {E}}[\xi ^2_{b, i}] \Big \Vert (1+ e^{W_b^{(i)}})({\bar{D}}_{jn} - {\bar{D}}_{jn}^{(i)} )\Big \Vert _1 \nonumber \\{} & {} \quad + \Big \Vert \xi _{b, i} ( 1+e^{W_b^{(i)}/2} ) ( {\bar{D}}_{jn} - {\bar{D}}_{jn}^{(i)}) \Big \Vert _1\Big ) \Bigg \} , \end{aligned}$$
(2.5)

where for each \(j \in \{1, 2\}\) and each \(i \in \{1, \dots , n\}\),

  • \(D_{jn}^{(i)} \equiv D_{jn}^{(i)}(X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n)\) is any function of the raw data excluding \(X_i\);

  • \({\bar{D}}_{jn}\) is a censored version of \(D_{jn}\) defined as

    $$\begin{aligned} {\bar{D}}_{jn}\equiv D_{jn}I\Big (|D_{jn}| \le \frac{1}{2} \Big ) + \frac{1}{2}I\Big (D_{jn} > \frac{1}{2}\Big ) - \frac{1}{2} I\Big (D_{jn} <- \frac{1}{2}\Big ); \end{aligned}$$
  • \({\bar{D}}^{(i)}_{jn}\) is a censored version of \(D^{(i)}_{jn}\) defined as

    $$\begin{aligned} {\bar{D}}_{jn}^{(i)} \equiv D_{jn}^{(i)}I\Big (|D_{jn}^{(i)}| \le \frac{1}{2}\Big ) +\frac{1}{2} I\Big (D_{jn}^{(i)} > \frac{1}{2}\Big ) - \frac{1}{2}I\Big (D_{jn}^{(i)} <- \frac{1}{2}\Big ). \end{aligned}$$

In applications, \(D_{1n}^{(i)}\) and \(D_{2n}^{(i)}\) are typically taken as “leave-one-out” quantities constructed in an almost identical manner as \(D_{1n}\) and \(D_{2n}\), respectively, but without any terms involving the datum \(X_i\); for instance, compare \(D_{1n}\) and \(D_{1n}^{(i)}\) in (3.12) and (3.27) for the case of a U-statistic. The proof of Theorem 2.1 (“Appendix C”) bypasses the gaps in the proof of the original B–E bound for \(T_\textrm{SN}\) stated in [9, Theorem 3.1]. As a key step in the Stein’s-method approach of Shao et al. [9] to proving their Theorem 3.1, the exponential-type randomized concentration inequality developed in Shao [8, Theorem 2.7] is applied to control a probability of the type

$$\begin{aligned} P\left( \Delta _1 \le \sum _{i=1}^n \xi _i I( |\xi _i| \le 1 ) \le \Delta _2\right) , \end{aligned}$$

where \(\Delta _1\) and \(\Delta _2\) are some context-dependent random quantities. Unfortunately, Shao et al. [9] overlooked that Shao [8, Theorem 2.7] was originally developed for a sum of mean-zero random variables, such as \(W_n\), rather than the sum \(\sum _{i=1}^n \xi _i I( |\xi _i| \le 1 )\) appearing in the display above, whose truncated summands do not have mean zero in general. The latter needs to be addressed in some way to mend their arguments, which leads to the exponential randomized concentration inequality (Lemma B.1) developed in this work for the sum \(W_b\) in (2.2). Here, the censored summands \(\xi _{b, i}\) are considered instead so that the new inequality can still be proved in much the same way as Shao [8, Theorem 2.7]; replacing the truncated \(\xi _i I( |\xi _i| \le 1 )\) with the censored \(\xi _{b, i}\) is otherwise permissible, because only the boundedness of the summands is essential under this approach.

The B–E bound stated in Theorem 2.1 is in a primitive form. When applied to specific examples of \(T_{SN}\), various terms in (2.5) have to be further estimated to render a more explicit bound. In that respect, the following elementary properties of censoring will become very useful:

Property 2.2

(Properties of variable censoring) Let Y and Z be any two real-valued random variables. The following facts hold:

  (i)

    Suppose, for some \(a, b \in {\mathbb {R}}\cup \{-\infty , \infty \}\) with \(a \le b\),

    $$\begin{aligned} {\bar{Y}} \equiv a I(Y < a) + Y I(a \le Y \le b)+bI(Y>b) \end{aligned}$$

    and

    $$\begin{aligned} {\bar{Z}} \equiv a I(Z < a) + Z I(a \le Z \le b)+bI(Z>b). \end{aligned}$$

    Then it must be that \( |{\bar{Y}} - {\bar{Z}} | \le |Y- Z|. \)

  (ii)

    If Y is a non-negative random variable, then it must also be true that

    $$\begin{aligned} Y I(0 \le Y \le b)+bI(Y>b) \le Y \text { for any } b \in (0, \infty ), \end{aligned}$$

    i.e., the upper-censored version of Y is always no larger than Y itself.
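Both parts of Property 2.2 are straightforward to confirm numerically (the samples and the censoring interval \([a, b]\) below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
y, z = 3.0 * rng.normal(size=10_000), 3.0 * rng.normal(size=10_000)
a, b = -0.5, 0.5

ybar, zbar = np.clip(y, a, b), np.clip(z, a, b)   # censoring as in (i)

# (i): censoring to a common interval is a contraction
assert np.all(np.abs(ybar - zbar) <= np.abs(y - z) + 1e-12)

# (ii): upper censoring never increases a non-negative variable
w = np.abs(y)
assert np.all(np.clip(w, 0.0, b) <= w)
```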

In applications of Theorem 2.1, the fact that \({\bar{D}}_{1n}\) and \({\bar{D}}_{1n}^{(i)}\) are lower-and-upper censored to the same interval \([-1/2, 1/2]\) implies the bound

$$\begin{aligned} |{\bar{D}}_{1n} - {\bar{D}}_{1n}^{(i)}| \le |{D}_{1n} - {D}_{1n}^{(i)}|, \end{aligned}$$
(2.6)

by virtue of Property 2.2(i), as well as

$$\begin{aligned} |{\bar{D}}_{1n}| \le |D_{1n}| \end{aligned}$$
(2.7)

by virtue of Property 2.2(ii), because \(|{\bar{D}}_{1n}|\) is essentially the non-negative variable \(|D_{1n}|\) upper censored at 1/2. These bounds imply one can form the further norm estimates

$$\begin{aligned}{} & {} \Vert (1+ e^{W_b^{(i)}})({\bar{D}}_{1n} - {\bar{D}}_{1n}^{(i)} )\Vert _1 \le C \Vert D_{1n} - D_{1n}^{(i)}\Vert _2, \end{aligned}$$
(2.8)
$$\begin{aligned}{} & {} \Vert \xi _{b, i}( 1+ e^{W_b^{(i)}/2} ) ( {\bar{D}}_{1n} - {\bar{D}}_{1n}^{(i)}) \Vert _1 \le C \Vert \xi _i \Vert _2 \Vert D_{1n} - D_{1n}^{(i)}\Vert _2 \end{aligned}$$
(2.9)

and

$$\begin{aligned} \Vert {\bar{D}}_{1n}\Vert _2 \le \Vert D_{1n}\Vert _2, \end{aligned}$$
(2.10)

for the terms in (2.5) related to the numerator remainder \(D_{1n}\); see “Appendix D” for the simple arguments leading to these bounds. The right-hand sides of (2.8)–(2.10) are then amenable to direct second-moment calculations that render more explicit terms. We also remark that if, instead, the truncated remainder terms

$$\begin{aligned} D_{jn}I\Big (|D_{jn}| \le \frac{1}{2} \Big ) \text { and } D_{jn}^{(i)}I\Big (|D_{jn}^{(i)}| \le \frac{1}{2} \Big ), \text { for } j =1, 2, \end{aligned}$$
(2.11)

are adopted as in Shao et al. [9, Theorem 3.1], a bound analogous to (2.6) does not hold in general; this also attests to censoring as a useful tool for developing nice B–E bounds under the current approach.
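The failure of truncation is easy to exhibit concretely: perturbing a variable across the threshold by \(10^{-9}\) moves its truncated value by \(1/2\), while the censored value moves by no more than the perturbation. A minimal counterexample (the helper names are ours):

```python
def trunc(d, c=0.5):
    """Truncation d * I(|d| <= c), as in (2.11)."""
    return d if abs(d) <= c else 0.0

def cens(d, c=0.5):
    """Censoring to [-c, c], as used for the D-bar terms in Theorem 2.1."""
    return max(-c, min(c, d))

y, z = 0.5, 0.5 + 1e-9
assert abs(trunc(y) - trunc(z)) == 0.5          # jump of 1/2 despite |y - z| = 1e-9
assert abs(cens(y) - cens(z)) <= abs(y - z)     # the contraction (2.6) survives
```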

In comparison with the terms related to \(D_{1n}\), some of the terms related to \(D_{2n}\) in (2.5), such as

$$\begin{aligned} \sup _{x \ge 0}|x {\mathbb {E}}[{\bar{D}}_{2n} f_x(W_b)]| \text { and }{\mathbb {E}}[e^{W_b}{\bar{D}}_{2n}^2] , \end{aligned}$$

are more obscure and have to be estimated on a case-by-case basis for specific examples of \(T_\textrm{SN}\). However, in certain applications, the denominator remainder can be perceivably manipulated into the form

$$\begin{aligned} D_{2n} = \max \Big (-1, \quad \Pi _1+ \Pi _2\Big ) \end{aligned}$$
(2.12)

lower censored at \(-1\), where \(\Pi _1\) is defined as

$$\begin{aligned} \Pi _1 \equiv \sum _{i =1}^n \Big ( \xi _{b, i}^2 -{\mathbb {E}}[\xi _{b,i}^2] \Big ), \end{aligned}$$
(2.13)

and \(\Pi _2 \equiv \Pi _2 (X_1, \dots , X_n)\) is another data-dependent term. For instance, if a non-negative self-normalizer \(1 + D_{2n}\) can be written in the intuitive form

$$\begin{aligned} 1 + D_{2n} = \sum _{i=1}^n \xi _i^2 + E \end{aligned}$$

for a data-dependent term \(E \equiv E(X_1, \dots , X_n)\) of perceivably smaller order, then \(D_{2n}\) can be cast into the form (2.12) because \(\sum _{i=1}^n({\mathbb {E}}[\xi _{b,i}^2] + {\mathbb {E}}[ (\xi _i^2 - 1) I(|\xi _i|> 1)] )=\sum _{i=1}^n {\mathbb {E}}[\xi _i^2]= 1\) and one can take

$$\begin{aligned} \Pi _2 = E - \sum _{i=1}^n{\mathbb {E}}[ (\xi _i^2 - 1) I(|\xi _i|> 1)] + \sum _{i=1}^n (\xi _i^2 - 1) I(|\xi _i|> 1). \end{aligned}$$

We now present a more refined version of Theorem 2.1 for Studentized nonlinear statistics whose \(D_{2n}\) admits the form (2.12) under an absolute third-moment assumption on \(\xi _i\); the proof is included in “Appendix D”.

Theorem 2.3

(Uniform B–E bound for Studentized nonlinear statistics with the denominator remainder (2.12) under a third moment assumption) Suppose all the conditions in Theorem 2.1 are met, and that \({\mathbb {E}}[|\xi _i|^3] < \infty \) for all \(1 \le i \le n\). In addition, assume \(D_{2n}\) takes the specific form (2.12) with \(\Pi _1\) defined in (2.13) and \(\Pi _2 \equiv \Pi _2(X_1, \dots , X_n)\) being a function of the raw data \(X_1, \dots , X_n\). For each \(i = 1, \dots , n\), let

$$\begin{aligned} \Pi _2^{(i)} \equiv \Pi _2^{(i)}(X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n) \end{aligned}$$

be any function of the raw data excluding \(X_i\). Then

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} \Big |P(T_{SN} \le x) - \Phi (x)\Big |\le & {} C \Bigg \{ \sum _{i=1}^n {\mathbb {E}}[|\xi _i|^3] + \Vert D_{1n}\Vert _2 + \Vert \Pi _2\Vert _2 \nonumber \\{} & {} + \sum _{i=1}^n \Vert \xi _i\Vert _2 \Vert D_{1n}- D_{1n}^{(i)}\Vert _2\nonumber \\{} & {} + \sum _{i=1}^n \Vert \xi _i\Vert _2\Vert \Pi _2 - \Pi _2^{(i)}\Vert _2 \Bigg \}, \end{aligned}$$
(2.14)

where \(D_{1n}^{(i)} \equiv D_{1n}^{(i)}(X_1, \dots , X_{i-1}, X_{i+1}, \dots , X_n)\) is as in Theorem 2.1.

The \(\Vert \cdot \Vert _2\) terms in (2.14) are now amenable to direct second moment calculations. Hence, if one can cast the denominator remainder into the form (2.12), Theorem 2.3 provides a user-friendly framework to establish B–E bounds for such instances of \(T_\textrm{SN}\).

3 Uniform Berry–Esseen Bound for Studentized U-Statistics

We will apply Theorem 2.3 to establish a uniform B–E bound of the rate \(1/\sqrt{n}\) for Studentized U-statistics of any degree; all prior works in this vein [2, 4, 5, 9, 15] only offer bounds for Studentized U-statistics of degree 2. We refer the reader to Shao et al. [9] and Jing et al. [5] for other examples of applications, including L-statistics, random sums, and functions of nonlinear statistics.

Given independent and identically distributed random variables \(X_1, \dots , X_n\) taking values in a measure space \(({\mathcal {X}}, \Sigma _{{\mathcal {X}}})\), a U-statistic of degree \(m \in {\mathbb {N}}_{\ge 1}\) takes the form

$$\begin{aligned} U_n = {n \atopwithdelims ()m}^{-1} \sum _{1 \le i_1< \dots < i_m \le n} h(X_{i_1} , \dots , X_{i_m}), \end{aligned}$$

where \(h : {\mathcal {X}}^m \longrightarrow {\mathbb {R}}\) is a real-valued function symmetric in its m arguments, also known as the kernel of \(U_n\); throughout, we will assume that

$$\begin{aligned} {\mathbb {E}}[h(X_1, \dots , X_m)] = 0, \end{aligned}$$
(3.1)

as well as

$$\begin{aligned} 2m <n. \end{aligned}$$
(3.2)

An important related function of \(h(\cdot ) \) is the canonical function

$$\begin{aligned} g(x) = {\mathbb {E}}[h( X_1, \dots , X_{m-1}, x)] = {\mathbb {E}}[h(X_1, \dots , X_m)|X_m = x], \end{aligned}$$

which determines the first-order asymptotic behavior of the U-statistic. We will only consider non-degenerate U-statistics, which are U-statistics with the property that

$$\begin{aligned} \sigma _g^2 \equiv \text {var}[g(X_1)] > 0. \end{aligned}$$

It is well known that when \({\mathbb {E}}[h^2(X_1, \dots , X_m)] < \infty \), \(\frac{\sqrt{n} U_n}{m \sigma _g}\) converges weakly to the standard normal distribution as n tends to infinity [6, Theorem 4.2.1]; however, the limiting variance \(\sigma _g^2\) is typically unknown and has to be substituted with a data-driven estimate. By constructing

$$\begin{aligned} q_i \equiv \frac{1}{{n-1 \atopwithdelims ()m-1}} \sum _{\begin{array}{c} 1 \le i_1< \dots < i_{m-1} \le n\\ i_l \ne i \text { for } l = 1, \dots , m-1 \end{array}} h(X_i, X_{i_1}, \dots , X_{i_{m-1}}), \qquad i = 1, \dots , n, \end{aligned}$$

as natural proxies for \(g(X_1), \dots , g(X_n)\), the most common jackknife estimator for \(\sigma _g^2\) is

$$\begin{aligned} s_n^ 2\equiv \frac{n-1}{(n-m)^2} \sum _{i=1}^n (q_i - U_n)^2 \end{aligned}$$

[1], which gives rise to the Studentized U-statistic

$$\begin{aligned} T_n \equiv \frac{\sqrt{n} U_n}{m s_n}. \end{aligned}$$
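For orientation, here is a brute-force implementation of \(T_n\) that enumerates all \(m\)-subsets (feasible only for small n; the helper name and test data are illustrative assumptions, not from the paper). For \(m = 1\) with \(h(x) = x\), it reduces to the classical one-sample t-statistic:

```python
import itertools
import math
import numpy as np

def studentized_u(x, h, m):
    """T_n = sqrt(n) U_n / (m s_n), with the jackknife estimate s_n^2,
    computed by enumerating all m-subsets (illustrative brute force)."""
    n = len(x)
    subsets = list(itertools.combinations(range(n), m))
    U = np.mean([h(*(x[j] for j in s)) for s in subsets])
    # q_i = average of h over the subsets containing i (h is symmetric)
    q = np.array([np.mean([h(*(x[j] for j in s)) for s in subsets if i in s])
                  for i in range(n)])
    s2 = (n - 1) / (n - m) ** 2 * np.sum((q - U) ** 2)
    return math.sqrt(n) * U / (m * math.sqrt(s2))

rng = np.random.default_rng(4)
x = rng.normal(size=12)
t_u = studentized_u(x, lambda v: v, m=1)
t_classic = math.sqrt(len(x)) * x.mean() / x.std(ddof=1)
assert np.isclose(t_u, t_classic)
```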

Without any loss of generality, we will assume that

$$\begin{aligned} \sigma _g^2 = 1, \end{aligned}$$
(3.3)

as one can always replace \(h(\cdot )\) and \(g(\cdot )\), respectively, by \(h(\cdot )/\sigma _g\) and \(g(\cdot )/\sigma _g\) without changing the definition of \(T_n\). Moreover, for \(s^*_n\) defined as

$$\begin{aligned} {s^*_n}^2 \equiv \frac{n-1}{(n-m)^2} \sum _{i=1}^n q_i^2, \end{aligned}$$

we will also consider the statistic

$$\begin{aligned} T_n^* \equiv \frac{\sqrt{n} U_n}{m s^*_n}. \end{aligned}$$
(3.4)

For any \(x\in {\mathbb {R}}\), the event-equivalence relationship

$$\begin{aligned} \{T_n> x\} = \left\{ T_n^* > \frac{x }{\left( 1 + \frac{m^2 (n-1)x^2}{(n-m)^2}\right) ^{1/2}} \right\} \end{aligned}$$
(3.5)

is known in the literature; see [7, 10] for instance.
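The equivalence (3.5) amounts to the exact relation \(T_n^* = T_n (1 + b_n T_n^2)^{-1/2}\) with \(b_n = m^2(n-1)/(n-m)^2\), together with the strict monotonicity of \(t \mapsto t(1+b_n t^2)^{-1/2}\). A numerical sketch for a degree-2 kernel (the kernel, data, and seed are arbitrary illustrative choices):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(5)
n, m = 10, 2
x = rng.normal(size=n)
h = lambda a, b: (a + b) / 2.0                       # symmetric degree-2 kernel

subsets = list(itertools.combinations(range(n), m))
U = np.mean([h(*(x[j] for j in s)) for s in subsets])
q = np.array([np.mean([h(*(x[j] for j in s)) for s in subsets if i in s])
              for i in range(n)])
s2 = (n - 1) / (n - m) ** 2 * np.sum((q - U) ** 2)       # jackknife s_n^2
s2_star = (n - 1) / (n - m) ** 2 * np.sum(q ** 2)        # s_n*^2
T = math.sqrt(n) * U / (m * math.sqrt(s2))
T_star = math.sqrt(n) * U / (m * math.sqrt(s2_star))

b_n = m**2 * (n - 1) / (n - m) ** 2
assert np.isclose(T_star, T / math.sqrt(1 + b_n * T**2))
for t in np.linspace(-3.0, 3.0, 25):                 # the two events in (3.5) agree
    assert (T > t) == (T_star > t / math.sqrt(1 + b_n * t**2))
```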

We now state a uniform Berry–Esseen bound for \(T_n\) and \(T_n^*\). In the sequel, for any \(k \in \{1, \dots , n\}\) and \(p \ge 1\), where no ambiguity arises, we may use \({\mathbb {E}}[\ell ]\) and \(\Vert \ell \Vert _p\) as the respective shorthands for \({\mathbb {E}}[\ell (X_1, \dots , X_k)]\) and \(\Vert \ell (X_1, \dots , X_k)\Vert _p\), for a given function \(\ell : {\mathcal {X}}^k \longrightarrow {\mathbb {R}}\) in k arguments. For example, we may use \({\mathbb {E}}[|h|^3]\) and \(\Vert h\Vert _3\) to, respectively, denote the third absolute moment and 3-norm of \(h(X_1, \dots , X_m)\) with inserted data, and \({\mathbb {E}}[g^2] = \Vert g\Vert _2^2 = \sigma _g^2 = 1\) under (3.1) and (3.3).

Theorem 3.1

(Berry–Esseen bound for Studentized U-statistics) Let \(X_1, \dots , X_n\) be independent and identically distributed random variables taking values in a measure space \(({\mathcal {X}}, \Sigma _{{\mathcal {X}}})\). Assume (3.1)–(3.3) and

$$\begin{aligned} {\mathbb {E}}[|h|^3] < \infty , \end{aligned}$$
(3.6)

Then the following Berry–Esseen bound holds:

$$\begin{aligned} \sup _{x \in {\mathbb {R}}}|P(T_n \le x) - \Phi (x)| \le C\frac{ {\mathbb {E}}[|g|^3] + m( {\mathbb {E}}[h^2] + \Vert g\Vert _3\Vert h\Vert _3)}{\sqrt{n}} \end{aligned}$$
(3.7)

for a positive absolute constant C; (3.7) also holds with \(T_n\) replaced by \(T_n^*\).

To the best of our knowledge, this bound is the sharpest to date in the following sense: improving upon the preceding works [2, 4, 15], for Studentized U-statistics of degree 2 under the same assumptions as Theorem 3.1, Jing et al. [5, Theorem 3.1] state a bound of the form

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} |P(T_n \le x) - \Phi (x)| \le C \frac{ {\mathbb {E}}[|h(X_1, X_2)|^3]}{\sqrt{n}} \end{aligned}$$

for an absolute constant \(C > 0\). In comparison, (3.7) is sharper for \(m =2\) because the moment quantities

$$\begin{aligned} {\mathbb {E}}[|g(X_1)|^3] , \quad {\mathbb {E}}[|h(X_1, X_2)|^2] \text { and } \Vert g(X_1)\Vert _3\Vert h(X_1, X_2)\Vert _3 \end{aligned}$$

from (3.7) are all no larger than \({\mathbb {E}}[|h(X_1, X_2)|^3]\), given the standard moment properties for U-statistics; see (3.10).

In addition, we remark that the original B–E bound for Studentized U-statistics of degree 2 in Shao et al. [9, Theorem 4.2 & Remark 4.1] may have been incorrectly stated. Given (3.1)–(3.3), for an absolute constant \(C >0\), they stated a seemingly better bound (than (3.7)) of the form

$$\begin{aligned} \sup _{x \in {\mathbb {R}}} |P(T_n \le x) - \Phi (x)| \le C \frac{\Vert h(X_1, X_2)\Vert _2+ {\mathbb {E}}[|g(X_1)|^3]}{\sqrt{n}}, \end{aligned}$$

under the assumption, weaker than (3.6), that \(\Vert g(X_1)\Vert _3\vee \Vert h(X_1, X_2)\Vert _2 < \infty \). Unfortunately, the latter assumption is inadequate under the current approach based on Stein’s method. The main issue is that Shao et al. [9] have ignored crucial calculations that require forming estimates of the rate O(1/n) for an expectation of the type

$$\begin{aligned} {\mathbb {E}}[\xi _{b, 1} \xi _{b, 2} {\bar{h}}_2 (X_{i_1}, X_{i_2}) {\bar{h}}_2(X_{j_1}, X_{j_2})], \end{aligned}$$

where \(1 \le i_1 < i_2\le n\) and \(1 \le j_1 < j_2\le n\) are two pairs of sample indices, and \({\bar{h}}_2(\cdot )\) is the second-order canonical function in the Hoeffding decomposition of \(U_n\) for \(m = 2\); see (3.9). To do so, we believe one cannot do away with a third moment assumption on the kernel as in (3.6); the eager reader can skip ahead to Lemma E.1(iii) and (iv) for a preview of our estimates. Our proof of Theorem 3.1 rectifies such errors; moreover, it generalizes to a kernel of any degree m, for which the enumerative calculations needed are considerably more involved.

We first set the scene for establishing Theorem 3.1, by letting

$$\begin{aligned} \xi _i = \frac{g(X_i)}{\sqrt{n}} \end{aligned}$$
(3.8)

and defining

$$\begin{aligned} {\bar{h}}_k(x_1, \dots , x_k) = h_k(x_1, \dots , x_k) - \sum _{i=1}^k g(x_i) \text { for } k = 1, \dots , m, \end{aligned}$$
(3.9)

where

$$\begin{aligned} h_k(x_1, \dots , x_k) = {\mathbb {E}}[h(X_1, \dots , X_m) |X_1 = x_1, \dots , X_k = x_k ]; \end{aligned}$$

in particular, \(g(x) = h_1(x)\) and \(h(x_1, \dots , x_m) = h_m(x_1, \dots , x_m)\). An important property of the functions \(h_k\) is that

$$\begin{aligned} {\mathbb {E}}\big [ |h_k|^p\big ] \le {\mathbb {E}}\big [ |h_{k'}|^p\big ] \text { for any } p \ge 1 \text { and } k \le k', \end{aligned}$$
(3.10)

which is a consequence of Jensen’s inequality:

$$\begin{aligned} {\mathbb {E}}\Big [ |h_{k}(X_1, \dots , X_k )|^p\Big ]&= {\mathbb {E}}\Big [ |{\mathbb {E}}[h(X_1, \dots , X_m) |X_1, \dots , X_k]|^p\Big ] \\&= {\mathbb {E}}\Big [ \Big |{\mathbb {E}}[h_{k'}(X_1, \dots , X_{k'})| X_1, \dots , X_k]\Big |^p\Big ] \\&\le {\mathbb {E}}\Big [ {\mathbb {E}}\Big [|h_{k'}(X_1, \dots , X_{k'})|^p \mid X_1, \dots , X_k\Big ]\Big ] = {\mathbb {E}}\Big [ |h_{k'}(X_1, \dots , X_{k'})|^p\Big ]. \end{aligned}$$
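The monotonicity (3.10) can be checked exactly on a small discrete distribution, where the conditional expectation defining \(h_1 = g\) is a finite average (the support and kernel below are arbitrary illustrative choices, with \(m = 2\) and \(p = 3\)):

```python
import itertools
import numpy as np

support = np.array([-1.0, 0.0, 2.0])          # X uniform on this set
h = lambda a, b: a * b + a + b                # symmetric kernel of degree 2

# h_1(x) = E[h(X, x)], computed exactly as an average over the support
h1 = lambda b: np.mean([h(a, b) for a in support])

p = 3.0
E_h1 = np.mean([abs(h1(b)) ** p for b in support])
E_h2 = np.mean([abs(h(a, b)) ** p
                for a, b in itertools.product(support, support)])
assert E_h1 <= E_h2    # (3.10): conditioning can only shrink absolute moments
```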

One can then write the part of (3.4) without the Studentizer \(s^*_n\) as

$$\begin{aligned} \frac{\sqrt{n} U_n}{m} = W_n + D_{1 n}, \end{aligned}$$
(3.11)

where \(W_n \equiv \sum _{i=1}^n \xi _i\) and

$$\begin{aligned} D_{1n} \equiv {n -1 \atopwithdelims ()m -1}^{-1} \sum _{1 \le i_1< \dots < i_m \le n} \frac{{\bar{h}}_m(X_{i_1}, X_{i_2}, \dots , X_{i_m}) }{\sqrt{n}}, \end{aligned}$$
(3.12)

are considered as the numerator components under the framework of (1.3). To handle \(s^*_n\), we shall first define

$$\begin{aligned} \Psi _{n, i} = \sum _{\begin{array}{c} 1 \le i_1< \dots < i_{m-1} \le n\\ i_l \ne i \text { for } l = 1, \dots , m-1 \end{array}} \frac{{\bar{h}}_m(X_i, X_{i_1}, \dots , X_{i_{m-1}}) }{\sqrt{n}} \end{aligned}$$

and write

$$\begin{aligned} q_i&= \frac{1}{{n-1 \atopwithdelims ()m-1}} \sum _{\begin{array}{c} 1 \le i_1< \dots < i_{m-1} \le n\\ i_l \ne i \text { for } l = 1, \dots , m-1 \end{array}} \left[ g(X_i) + \sum _{l=1}^{m-1} g(X_{i_l})+ {\bar{h}}_m(X_i, X_{i_1}, \dots , X_{i_{m-1}}) \right] \\&= \sqrt{n}\left[ \left( \frac{n-m}{n-1}\right) \xi _i+ \frac{m-1}{n-1}W_n\right] + \frac{\sqrt{n}}{{n-1 \atopwithdelims ()m-1}} \Psi _{n, i} \end{aligned}$$

for each i. By further letting

$$\begin{aligned} \Lambda _n^2 = \sum _{i=1}^n \Psi _{n, i}^2 \quad \text { and } \quad V_n^2 = \sum _{i=1} ^n \xi _i^2, \end{aligned}$$

the sum \(\sum _{i=1}^n q_i^2\) can be consequently written as

$$\begin{aligned} \sum _{i=1}^n q_i^2= & {} n \left( \frac{n-m}{n-1}\right) ^2 V_n^2 + \left[ n^2 \left( \frac{m-1}{n-1}\right) ^2 + \frac{2n(n-m)(m-1)}{(n-1)^2}\right] W_n^2 \\{} & {} + \frac{n}{{n-1 \atopwithdelims ()m-1}^2} \Lambda _n^2 + 2 n \left( \frac{n-m}{n-1}\right) {n-1 \atopwithdelims ()m-1}^{-1} \sum _{i=1}^n \xi _i \Psi _{n,i} + \frac{2 n (m-1)}{(n-1){n-1 \atopwithdelims ()m-1}} \sum _{i=1}^nW_n \Psi _{n, i}, \end{aligned}$$

which implies one can re-express \({s_n^*}^2\) as

$$\begin{aligned} {s_n^*}^2 = d_n^2 (V_n^2 + \delta _{1n} + \delta _{2n}) \quad \text { for }\quad d_n^2 \equiv \frac{n}{n-1} \end{aligned}$$
(3.13)

for

$$\begin{aligned} \delta _{1n}= & {} \left[ \frac{ n(m-1)^2}{(n-m)^2} + \frac{2(m-1)}{(n-m)}\right] W_n^2 + \frac{(n-1)^2}{ {n-1 \atopwithdelims ()m-1}^2 (n-m)^2} \Lambda _n^2 \nonumber \\{} & {} + \frac{2(n-1)(m-1) }{(n-m)^2 {n-1 \atopwithdelims ()m-1}} \sum _{i = 1}^n W_n \Psi _{n, i} \end{aligned}$$
(3.14)

and

$$\begin{aligned} \delta _{2n} \equiv \frac{2 (n-1) }{(n-m)} {n-1 \atopwithdelims ()m-1}^{-1} \sum _{i=1}^n \xi _i \Psi _{n,i}. \end{aligned}$$
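The decomposition (3.13)–(3.14) is an exact pathwise identity; the sketch below verifies it for \(m = 2\) on a discrete population where \(g\) and \({\bar{h}}_2\) are available in closed form (the population, kernel, and seed are illustrative assumptions, not from the paper):

```python
import itertools
import math
import numpy as np

support = np.array([-1.0, 0.0, 2.0])               # X uniform on this set
h_raw = lambda a, b: a * b + a + b
Eh = np.mean([h_raw(a, b) for a, b in itertools.product(support, support)])
h = lambda a, b: h_raw(a, b) - Eh                  # center so that (3.1) holds
g = lambda v: np.mean([h(a, v) for a in support])  # canonical function
hbar = lambda a, b: h(a, b) - g(a) - g(b)          # (3.9) with m = 2

rng = np.random.default_rng(6)
n, m = 9, 2
x = rng.choice(support, size=n)

xi = np.array([g(v) for v in x]) / math.sqrt(n)    # (3.8)
W = xi.sum()
Psi = np.array([sum(hbar(x[i], x[j]) for j in range(n) if j != i)
                for i in range(n)]) / math.sqrt(n)
V2, Lam2 = np.sum(xi**2), np.sum(Psi**2)
c = math.comb(n - 1, m - 1)

delta1 = ((n * (m - 1)**2 / (n - m)**2 + 2 * (m - 1) / (n - m)) * W**2
          + (n - 1)**2 / (c**2 * (n - m)**2) * Lam2
          + 2 * (n - 1) * (m - 1) / ((n - m)**2 * c) * W * Psi.sum())   # (3.14)
delta2 = 2 * (n - 1) / ((n - m) * c) * np.sum(xi * Psi)

# s_n*^2 computed directly from the q_i ...
q = np.array([np.mean([h(x[i], x[j]) for j in range(n) if j != i])
              for i in range(n)])
s2_star = (n - 1) / (n - m)**2 * np.sum(q**2)

# ... matches the decomposition (3.13), with d_n^2 = n/(n-1)
assert np.isclose(s2_star, n / (n - 1) * (V2 + delta1 + delta2))
```

Note that \(\delta _{1n} \ge 0\), as claimed in (3.20), also holds on this sample path.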

We now present the proof of Theorem 3.1.

Proof of Theorem 3.1

It suffices to consider \(x \ge 0\) since otherwise one can replace \(h(\cdot )\) by \(- h(\cdot )\). Defining

$$\begin{aligned} b_n = \frac{ m^2(n-1)}{ (n-m)^2} \text { and } a_{n, x} = a_{n}(x) = \frac{1}{ (1 + b_n x^2)^{1/2}}, \end{aligned}$$

we first simplify the problem using the bound

$$\begin{aligned} | {\bar{\Phi }}(x a_n(x)) - {\bar{\Phi }}(x)| \le \min \left( \frac{ m^2(n-1)x^3}{\sqrt{2 \pi } (n-m)^2} , \frac{2}{\max (2 , \sqrt{2 \pi } x a_{n, x})} \right) \exp \left( \frac{-x^2 a_{n, x}^2}{2}\right) ,\nonumber \\ \end{aligned}$$
(3.15)

which will be shown by a “bridging argument” borrowed from Jing et al. [5] at the end of this section. Then, by the triangle inequality, (3.5) and (3.15),

$$\begin{aligned}&|P(T_n \le x) - \Phi (x)|\nonumber \\&\quad = |P(T_n> x) - {\bar{\Phi }}(x)| \nonumber \\&\quad \le |P( T_n^*> x a_n(x) ) - {\bar{\Phi }}(x a_n(x))| + | {\bar{\Phi }}(x a_n(x)) - {\bar{\Phi }}(x)| \nonumber \\&\quad \le |P( T_n^*> x a_n(x) ) - {\bar{\Phi }}(x a_n(x))| \nonumber \\&\qquad + \min \left( \frac{ m^2(n-1)x^3}{\sqrt{2 \pi } (n-m)^2} , \frac{2}{\max (2 , \sqrt{2 \pi } x a_{n, x})} \right) \exp \left( \frac{-x^2 a_{n, x}^2}{2}\right) \nonumber \\&\quad \le |P( T_n^* > x a_n(x) ) - {\bar{\Phi }}(x a_n(x))| +C\frac{m^2}{\sqrt{n}}, \end{aligned}$$
(3.16)

where the last inequality in (3.16) holds as follows: for \(0 \le x \le n^{1/6}\), we have

$$\begin{aligned} \frac{ m^2(n-1)x^3}{\sqrt{2 \pi } (n-m)^2} \le \frac{m^2(n-1) \sqrt{n}}{\sqrt{2 \pi } (n-m)^2} \le \frac{m^2(n-1) \sqrt{n}}{\sqrt{2 \pi } (n-n/2)^2} \le \frac{2 \sqrt{2} m^2}{\sqrt{\pi n}}. \end{aligned}$$

For \(n^{1/6}< x < \infty \), since \(x a_n(x)\) is strictly increasing in \(x \in [0, \infty )\), we have that

$$\begin{aligned} \exp (-x^2 a_{n, x}^2 /2)&\le \exp (-n^{1/3} (1 + b_n n^{1/3})^{-1} /2)\\&\le \exp \bigg (-\frac{n^{1/3}}{2} \left( 1 + \frac{4 m^2 (n-1)n^{1/3}}{n^2}\right) ^{-1}\bigg ) \\&\underbrace{\le }_{\text {by }(\text {3.2})} \exp \left( - \frac{n^{1/3}}{2 (1 + (2m)^{4/3})} \right) \le \exp \big (- \frac{n^{1/3}}{8m^{4/3}}\big ) \le \frac{Cm^2}{\sqrt{n}}. \end{aligned}$$

Since

$$\begin{aligned} m = m{\mathbb {E}}[g^2] \le {\mathbb {E}}[h^2] \end{aligned}$$
(3.17)

by (3.3) and a classical U-statistic moment bound [6, Lemma 1.1.4], in light of (3.16), to prove (3.7) it suffices to show

$$\begin{aligned} |P(T_n^* > x) - {\bar{\Phi }}(x)| \le C\frac{{\mathbb {E}}[|g|^3]+ m ( {\mathbb {E}}[h^2] + \Vert g\Vert _3\Vert h\Vert _3)}{\sqrt{n}}, \end{aligned}$$
(3.18)

which is also the bound claimed for \(T_n^*\) in Theorem 3.1.

Note that since \(2 |W_n \sum _{i=1}^n \Psi _{n, i}| \le 2 \sqrt{n} |W_n| \Lambda _n\) by the Cauchy–Schwarz inequality,

$$\begin{aligned}{} & {} \frac{2(n-1)(m-1) }{(n-m)^2 {n-1 \atopwithdelims ()m-1}} \Bigg |\sum _{i = 1}^n W_n \Psi _{n, i}\Bigg | \le 2 \Bigg \{ \frac{\sqrt{n}(m-1)}{n-m}|W_n| \Bigg \} \Bigg \{\frac{(n-1)}{{n-1 \atopwithdelims ()m-1}(n-m) } \Lambda _n \Bigg \}\nonumber \\{} & {} \quad \le \frac{n (m-1)^2}{(n-m)^2} W_n^2 + \frac{(n-1)^2}{{n-1 \atopwithdelims ()m-1}^2 (n-m)^2 } \Lambda _n^2, \end{aligned}$$
(3.19)

and hence we can deduce from (3.14) that

$$\begin{aligned} \delta _{1n} \ge 0. \end{aligned}$$
(3.20)

With (3.11) and (3.13), one can then rewrite \(T_n^*\) as

$$\begin{aligned} T_n^*&= \frac{W_n + D_{1n} }{ d_n \sqrt{V_n^2 + \delta _{1n} + \delta _{2n}}}. \end{aligned}$$

Now, consider the related statistic

$$\begin{aligned} {\tilde{T}}^*_n = \frac{W_n + D_{1n}}{ \{\max (0, V_{n,b}^2 + \delta _{1n, b} + \delta _{2n, b})\}^{1/2}}, \end{aligned}$$

with suitably censored components in the denominator defined as

$$\begin{aligned} V^2_{n, b}= & {} \sum _{i =1}^n \xi _{b, i}^2, \quad \delta _{1n, b} = \min (\delta _{1n}, n^{-1/2}) \quad \text { and } \quad \delta _{2n, b} = \frac{2 (n-1)}{(n-m)} {n-1 \atopwithdelims ()m-1}^{-1} \sum _{i=1}^n \xi _{b, i} \Psi _{n,i}. \end{aligned}$$

Note that \(T_n^* \) and \({\tilde{T}}^*_n \) can be related by the inclusions of events

$$\begin{aligned} \{{\tilde{T}}^*_n \le d_n x\}\backslash {\mathcal {E}}\subset \{T^{*}_n \le x\} \subset \{{\tilde{T}}^*_n \le d_n x\}\cup {\mathcal {E}}, \end{aligned}$$

where \({\mathcal {E}}\equiv \{\max _{1\le i \le n} |\xi _i|> 1\} \cup \{|\delta _{1n}| > n^{-1/2}\}\). The latter fact implies

$$\begin{aligned} |P(T^{*}_n \le x) - \Phi (x)|&\le |P({\tilde{T}}^*_n \le d_n x) - \Phi (x)| + P({\mathcal {E}}) \nonumber \\&\le |P({\tilde{T}}^*_n \le d_n x) - \Phi (x)| + \sum _{i=1}^n P(|\xi _i|> 1) + P(|\delta _{1n}| > n^{-1/2})\nonumber \\&\le |P({\tilde{T}}^*_n \le d_n x) - \Phi (x)| + \beta _2 + \sqrt{n}{\mathbb {E}}[ |\delta _{1n}|] \nonumber \\&\le |P({\tilde{T}}^*_n \le d_n x) - \Phi (x)| + \frac{ {\mathbb {E}}[|g|^3]}{\sqrt{n}}+C\frac{ m {\mathbb {E}}[ h^2]}{\sqrt{n}} , \end{aligned}$$
(3.21)

with (3.21) coming from \(\beta _2 \le \sum _{i=1}^n {\mathbb {E}}[|\xi _i|^3] = {\mathbb {E}}[|g|^3]/\sqrt{n}\), as well as combining (3.19) with (3.14) as:

$$\begin{aligned}&{\mathbb {E}}[|\delta _{1n} |]\\&\quad \le 2 \left[ \frac{ m (m-1) (n -1)}{(n-m)^2} \right] {\mathbb {E}}[W_n^2] + \frac{2(n-1)^2}{ {n-1 \atopwithdelims ()m-1}^ 2 (n-m)^2 }{\mathbb {E}}[\Lambda _n^2 ]\\&\quad = 2 \left[ \frac{ m (m-1) (n -1)}{(n-m)^2} \right] \\&\qquad + \frac{2(n-1)^2}{ {n-1 \atopwithdelims ()m-1}^ 2 (n-m)^2 } {\mathbb {E}}\left[ \left( \sum _{2 \le i_1< \dots < i_{m-1} \le n}{\bar{h}}_m(X_1, X_{i_1}, \dots , X_{i_{m-1}}) \right) ^2\right] \\&\quad \le \Bigg (\frac{8m}{n} + \frac{4(n-1)^2(m-1)^2}{ (n-m)^2(n-m+1)m } \Bigg ) {\mathbb {E}}[ h^2], \end{aligned}$$

where the last inequality follows from (3.17) and \(2m < n\), as well as a standard U-statistic bound in Lemma E.1(ii).

In light of (3.21), to prove (3.18), it suffices to bound \(|P({\tilde{T}}^*_n \le d_n x) - \Phi (x)|\). To this end, we first define

$$\begin{aligned} \check{T}^{**}_n = \frac{W_n + D_{1n}}{ \{\max (0, V_{n,b}^2 + \delta _{2n, b})\}^{1/2}} \end{aligned}$$

and

$$\begin{aligned} {\hat{T}}^{**}_n = \frac{W_n + D_{1n}}{ \{\max (0, V_{n,b}^2 + n^{-1/2} + \delta _{2n, b})\}^{1/2}}, \end{aligned}$$

which, by (3.20), have the property

$$\begin{aligned} P(\check{T}^{**}_n \le d_n x) \le P({\tilde{T}}^*_n \le d_n x) \le P({\hat{T}}^{**}_n \le d_n x ). \end{aligned}$$
(3.22)

Hence, to establish a bound for \(|P({\tilde{T}}^*_n \le d_n x) - \Phi (x)|\), our strategy is to prove the same bound for both \(|P(\check{T}^{**}_n \le d_n x) - \Phi (d_n x)|\) and \(|P({\hat{T}}^{**}_n \le d_n x) - \Phi (d_n x)|\), together with the bound

$$\begin{aligned} |\Phi (d_n x) - \Phi (x)| = \phi (x')|d_n x - x| \le C (d_n- 1) \le C n^{-1/2}, \end{aligned}$$
(3.23)

coming from the mean value theorem, where \(x'\) lies between \(x\) and \(d_n x\), and \(x\phi (x')\) is a bounded function in \(x \in [0, \infty )\) because \(\phi (x') \le \phi (x)\) for \(x' \ge x \ge 0\) and \(\sup _{x \ge 0} x \phi (x) = (2\pi e)^{-1/2}\). To simplify notation, we will put \(\check{T}^{**}_n\) and \({\hat{T}}^{**}_n\) under one umbrella and define their common placeholder
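The boundedness of \(x \phi (x')\) on \([0, \infty )\) drives the constant in (3.23); since \(\phi (x') \le \phi (x)\) for \(x' \ge x \ge 0\), it reduces to \(\sup _{x \ge 0} x\phi (x) < \infty \). A minimal numerical check (the grid is an illustrative choice, not part of the proof):

```python
import math

# phi is the standard normal density; we verify numerically that
# sup_{x >= 0} x * phi(x) = (2*pi*e)**(-1/2) ~ 0.2420, attained at x = 1.
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

grid = [i / 10000 for i in range(200001)]  # x in [0, 20], step 1e-4
sup_val = max(x * phi(x) for x in grid)
closed_form = 1 / math.sqrt(2 * math.pi * math.e)
print(sup_val, closed_form)
```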

$$\begin{aligned} T_n^{**} = \frac{W_n + D_{1n}}{(1 + D_{2n})^{1/2}}, \end{aligned}$$
(3.24)

where

$$\begin{aligned} D_{2n} \equiv \max (-1 , V_{n,b}^2 - 1+ (n^{-1/2}|0)+ \delta _{2n, b}) \end{aligned}$$
(3.25)

and for \(a, b \in {\mathbb {R}}\), (a|b) represents either a or b; so \(T_n^{**}\) is either \({\hat{T}}_n^{**}\) or \(\check{T}_n^{**}\).

Now, we cast (3.25) into the form (2.12) by taking \(\Pi _1 = V_{n, b}^2 - \sum _{i=1}^n {\mathbb {E}}[\xi _{b,i}^2]\) and

$$\begin{aligned} \Pi _2 = \delta _{2n, b} + (n^{-1/2}|0) - \sum _{i=1}^n {\mathbb {E}}[ (\xi _i^2 - 1)I(|\xi _i|> 1) ]. \end{aligned}$$
(3.26)

In order to apply Theorem 2.3 to bound \(|P(T^{**}_n \le d_n x) - \Phi (d_n x)|\), we let \(D_{1n}^{(i)}\) and \(\Pi _2^{(i)}\), respectively, be the “leave-one-out” versions of \(D_{1n}\) and \(\Pi _2\) in (3.12) and (3.26) that omit all the terms involving \(X_i\), i.e.,

$$\begin{aligned} D_{1n}^{(i)} \equiv {n -1 \atopwithdelims ()m -1}^{-1} \sum _{\begin{array}{c} 1 \le i_1< \dots < i_m \le n\\ i_l \ne i \text { for } l = 1, \dots , m \end{array}} \frac{{\bar{h}}_m(X_{i_1}, X_{i_2}, \dots , X_{i_m}) }{\sqrt{n}} \end{aligned}$$
(3.27)

and

$$\begin{aligned} \Pi _2^{(i)} \equiv \delta _{2n, b}^{(i)} + (n^{-1/2}|0) - \sum _{\begin{array}{c} j =1\\ j \ne i \end{array}}^n{\mathbb {E}}[ (\xi _j^2 - 1)I(|\xi _j|> 1) ] \end{aligned}$$
(3.28)

for

$$\begin{aligned} \delta _{2n, b}^{(i)} \equiv \frac{2 (n-1) }{\sqrt{n}(n-m)} {n-1 \atopwithdelims ()m-1}^{-1} \sum _{\begin{array}{c} j =1\\ j \ne i \end{array}}^n \xi _{b, j} \sum _{\begin{array}{c} 1 \le i_1< \dots < i_{m-1} \le n \\ i_l \ne j, i \text { for } l = 1, \dots , m-1 \end{array}} {\bar{h}}_m (X_j, X_{i_1}, \dots , X_{i_{m-1}}). \end{aligned}$$

We also need the following bounds:

Lemma 3.2

(Moment bounds related to \(D_{1n}\) in (3.12)) Let \(D_{1n}\) and \(D_{1n}^{(i)}\) be defined as in (3.12) and (3.27). Under the assumptions of Theorem 3.1, the following hold:

$$\begin{aligned} \Vert D_{1n}\Vert _2 \le \frac{(m-1) \Vert h\Vert _2}{\sqrt{m(n-m+1)}}, \end{aligned}$$
(3.29)

and

$$\begin{aligned} \Vert D_{1n} - D_{1n}^{(i)} \Vert _2 \le \frac{\sqrt{2} (m-1) \Vert h\Vert _2}{\sqrt{nm(n-m+1)}}. \end{aligned}$$
(3.30)

Proof of Lemma 3.2

This is known in the literature. Refer to Chen et al. [3, Lemma 10.1] for a proof. \(\square \)

Lemma 3.3

(Moment bounds related to \(\Pi _2\) in (3.26)) Consider \(\Pi _2\) and \(\Pi _2^{(i)}\) defined in (3.26) and (3.28). Under the assumptions of Theorem 3.1, the following bounds hold:

  1. (i)
    $$\begin{aligned} \Vert \Pi _2\Vert _2 \le C\frac{\Vert g\Vert _3^3 + m\Vert g\Vert _3 \Vert h\Vert _3}{\sqrt{n}}, \end{aligned}$$

    and

  2. (ii)
    $$\begin{aligned} \Vert \Pi _2 - \Pi _2^{(i)} \Vert _2 \le C \frac{m\Vert g\Vert _3\Vert h\Vert _3 + m^{1.5} \sqrt{\Vert h\Vert _2}}{n}. \end{aligned}$$

The proof of Lemma 3.3 is deferred to Appendix E. One can then apply Theorem 2.3, along with Lemmas 3.2 and 3.3 as well as (3.17), to give the bound

$$\begin{aligned} |P(T^{**}_n \le d_n x) - \Phi (d_nx)| \le C\frac{ {\mathbb {E}}[|g|^3] + m(\Vert g\Vert _3 \Vert h\Vert _3 + \Vert h\Vert _2^{3/2})}{\sqrt{n}}, \end{aligned}$$
(3.31)

where we have used the fact that \(\sigma _g^2 = 1\) in (3.3) and \(\sigma _g \le \Vert h\Vert _2\) by virtue of (3.10). From (3.31), one can establish (3.18) with (3.21)–(3.24) and the fact that \(\Vert h\Vert _2^{3/2} \le {\mathbb {E}}[h^2]\), which holds because \(\Vert h\Vert _2 \ge \sigma _g = 1\).

It remains to finish the proof of (3.15). First, it can be seen that

$$\begin{aligned} 0 < a_{n, x} \le 1. \end{aligned}$$
(3.32)

Because of (3.32), we have

$$\begin{aligned} |x a_{n, x} - x|&= \left| \frac{ (a_{n, x}^2 - 1)x}{a_{n,x } +1}\right| = \left| \left( \frac{b_n }{ 1 + b_n x^2} \right) \left( \frac{x^3}{ a_{n, x} + 1} \right) \right| \le b_n x^3 = \frac{ m^2(n-1)x^3}{ (n-m)^2}, \end{aligned}$$

which implies, by the mean value theorem, that

$$\begin{aligned} |\Phi (x a_{n, x}) - \Phi (x)| \le \phi (x a_{n, x} )\frac{ m^2(n-1)x^3}{ (n-m)^2} =\frac{ m^2(n-1)x^3}{\sqrt{2 \pi } (n-m)^2} \exp \left( \frac{-x^2 a_{n, x}^2}{2}\right) . \end{aligned}$$
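The elementary inequality \(|x a_{n, x} - x| \le b_n x^3\) used above can be spot-checked numerically; the sketch below assumes, as the preceding display suggests, that \(a_{n, x} = (1 + b_n x^2)^{-1/2}\) with \(b_n = m^2(n-1)/(n-m)^2\) (this form of \(a_{n, x}\) is read off from the algebra, not quoted from the text):

```python
import math

# Spot-check |x * a - x| <= b * x**3 for a = (1 + b*x**2)**(-1/2),
# with b = m**2 * (n - 1) / (n - m)**2; a negative "gap" means the
# claimed bound holds with room to spare at that point.
def worst_gap(m, n, xs):
    b = m**2 * (n - 1) / (n - m) ** 2
    return max(abs(x / math.sqrt(1 + b * x * x) - x) - b * x**3 for x in xs)

xs = [i / 50 for i in range(1, 501)]  # x in (0, 10]
gaps = [worst_gap(m, n, xs) for (m, n) in [(2, 10), (3, 50), (5, 1000)]]
print(max(gaps))  # negative: the bound holds at every sampled point
```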

At the same time, we also have, by the well-known normal tail bound and (3.32),

$$\begin{aligned} |\Phi (x a_{n, x} ) - \Phi (x)| \le {\bar{\Phi }}(x a_{n, x} ) +{\bar{\Phi }}(x) \le \frac{2}{\max (2 , \sqrt{2 \pi } x a_{n, x})} \exp \left( \frac{- x^2 a_{n, x}^2}{2} \right) . \end{aligned}$$

\(\square \)
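The combined tail bound in the final display, \({\bar{\Phi }}(y) \le \exp (-y^2/2)/\max (2, \sqrt{2\pi } y)\) for \(y \ge 0\), is the minimum of the Chernoff bound \({\bar{\Phi }}(y) \le \tfrac{1}{2} e^{-y^2/2}\) and the Mills-ratio bound \({\bar{\Phi }}(y) \le \phi (y)/y\); a minimal numerical check, using \({\bar{\Phi }}(y) = \tfrac{1}{2}\,\mathrm {erfc}(y/\sqrt{2})\):

```python
import math

# bar_Phi(y) = P(Z > y) for Z ~ N(0, 1), computed via the complementary
# error function; tail_bound is the right-hand side of the display.
def bar_Phi(y):
    return 0.5 * math.erfc(y / math.sqrt(2))

def tail_bound(y):
    return math.exp(-y * y / 2) / max(2.0, math.sqrt(2 * math.pi) * y)

ys = [i / 100 for i in range(1001)]  # y in [0, 10]
max_gap = max(bar_Phi(y) - tail_bound(y) for y in ys)
print(max_gap)  # <= 0: the bound holds on the whole grid
```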