1 Introduction

As the number of observations n in a statistical experiment goes to infinity, many statistics of interest, once adequately centered and scaled, converge weakly to a \(\mathcal {N}(0,1)\) distribution; see, e.g., van der Vaart (2000, Chapter 5) for a thorough introduction. Hence, when little is known about the distribution of a statistic for a fixed sample size, a classical approach to conducting inference on the parameters of the statistical model is to approximate that distribution by its tractable Gaussian limit. A recurring theme in statistics and probability is thus to quantify the distance between those two distributions for a given n.

In this article, we present some refined results in the canonical case of a standardized sum of independent random variables. We consider independent but not necessarily identically distributed random variables to encompass a broader range of applications. For instance, certain bootstrap schemes, such as the multiplier ones (see Chapter 9 in van der Vaart (1996) or Chapter 10 in Kosorok (2006)), boil down to studying a sequence of mutually independent, not necessarily identically distributed (i.n.i.d.) random variables conditionally on the initial sample.

More formally, let \({(X_i)_{i=1, \dots , n}}\) be a sequence of i.n.i.d. random variables satisfying for every \({i \in \{1, \dots ,n\}}\), \({\mathbb {E}[X_i] = 0}\) and \({\gamma _i {:=} \mathbb {E}[ X_i^4 ] < +\infty }\). We also define the standard deviation \(B_n\) of the sum of the \(X_i\)’s, i.e., \({B_n {:=} \sqrt{\sum _{i=1}^n\mathbb {E}[X_i^2]}},\) so that the standardized sum can be written as \({S_n {:=} \sum _{i=1}^n X_i/B_n}\). Finally, we define the average individual standard deviation \({\overline{B}_n {:=} B_n/\sqrt{n}}\) and the average standardized third raw moment \({\lambda _{3,n} {:=} \frac{1}{n}\sum _{i=1}^n\mathbb {E}[X_i^3]/\overline{B}_n^3}\). The main results of this article are of the form

$$\begin{aligned} \underbrace{\sup _{x \in \mathbb {R}} \left| \mathbb {P}(S_n \le x) - \Phi (x) - \frac{\lambda _{3,n}}{6\sqrt{n}}(1-x^2)\varphi (x) \right| }_{\displaystyle =:\Delta _{n,\text {E}}} \le \delta _n, \end{aligned}$$
(1)

where \(\Phi \) is the cumulative distribution function of a standard Gaussian random variable, \(\varphi \) its density function and \(\delta _n\) is a positive sequence that depends on the first four moments of \((X_i)_{i=1,\dots ,n}\) and tends to zero under some regularity conditions. In the following, we use the notation \(G_n(x) {:=} \Phi (x) + \lambda _{3,n} (6\sqrt{n})^{-1} (1-x^2)\varphi (x)\).
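For concreteness, \(G_n\) is immediate to evaluate numerically. The following minimal R sketch (the helper name edgeworth_cdf is ours, purely for illustration) transcribes the definition:

```r
# One-term Edgeworth approximation of P(S_n <= x):
# G_n(x) = Phi(x) + lambda_{3,n} / (6 * sqrt(n)) * (1 - x^2) * phi(x).
edgeworth_cdf <- function(x, n, lambda3) {
  pnorm(x) + lambda3 / (6 * sqrt(n)) * (1 - x^2) * dnorm(x)
}
```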

The quantity \(G_n(x)\) is usually called the one-term Edgeworth expansion of \(\mathbb {P}\left( S_n \le x\right) \), hence the letter E in the notation \(\Delta _{n,\text {E}}\). Controlling the uniform distance between \(\mathbb {P}\left( S_n \le \cdot \right) \) and \(G_n(\cdot )\) has a long tradition in statistics and probability, see for instance Esseen (1945) and the books by Cramer (1962) and Bhattacharya and Ranga Rao (1976). As early as Esseen (1945), it was established that in the independent and identically distributed (i.i.d.) case, \(\Delta _{n,\text {E}}\) is of order \(n^{-1/2}\) in general and of order \(n^{-1}\) if \((X_i)_{i=1,\dots ,n}\) has a nonzero continuous component. These results were then extended in a wide variety of directions, often in connection with bootstrap procedures, see for instance Hall (1992) and Lahiri (2003) for the dependent case.

A one-term Edgeworth expansion can be seen as a refinement of the so-called Berry-Esseen inequality (Berry (1941); Esseen (1942)), whose goal is to bound

$$\begin{aligned} \Delta _{n,\text {B}} {:=} \sup _{x \in \mathbb {R}}\big | \mathbb {P}(S_n \le x) - \Phi (x) \big |. \end{aligned}$$
(2)

The refinement stems from the fact that in \(\Delta _{n,\text {E}},\) the distance between \(\mathbb {P}\left( S_n \le \cdot \right) \) and \(\Phi \) is adjusted for the presence of non-asymptotic skewness in the distribution of \(S_n\). In contrast with the literature on Edgeworth expansions, there is a substantial amount of work devoted to explicit constants in the Berry-Esseen inequality and its extensions, see, e.g., Bentkus and Götze (1996); Bentkus (2003); Pinelis and Molzon (2016); Chernozhukov et al. (2017); Raič (2018, 2019). The sharpest known result in the i.n.i.d. univariate framework is due to Shevtsova (2013), who shows that for every \({n \in \mathbb {N}^{*}}\), if \({\mathbb {E}[|X_i|^3]<+\infty }\) for every \({i \in \{1,...,n\}}\), then \(\Delta _{n,\text {B}} \le 0.5583 \, K_{3,n} / \sqrt{n}\) where \(K_{p,n} {:=} n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^p]/(\overline{B}_n)^p\), for \({p \in \mathbb {N}^{*}}\), denotes the average standardized p-th absolute moment. \(K_{p,n}\) measures tail thickness, with \(K_{2,n}\) normalized to 1 and \(K_{4,n}\) the kurtosis. An analogous result is given in Shevtsova (2013) under the i.i.d. assumption, where 0.5583 is replaced with 0.4690. A close lower bound is due to Esseen (1956): there exists a distribution such that \(\Delta _{n,\text {B}} = (C_B/\sqrt{n}) \left( n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^3]/\overline{B}_n^3\right) \) with \({C_B \approx 0.4098}\). Another line of research applies Edgeworth expansions in order to get a bound on \(\Delta _{n,\text {B}}\) that contains higher-order terms; see Adell and Lekuona (2008); Boutsikas (2011) and Zhilova (2020).
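As an illustration, both \(K_{p,n}\) and the resulting Berry-Esseen bounds are direct transcriptions of the displayed formulas; the following R sketch (helper names ours) assumes the individual absolute moments \(\mathbb{E}|X_i|^p\) and variances are available:

```r
# K_{p,n} = n^{-1} * sum_i E|X_i|^p / (B_n / sqrt(n))^p, computed from the
# vectors mp[i] = E|X_i|^p and sigma2[i] = E[X_i^2].
K_pn <- function(mp, sigma2, p) mean(mp) / sqrt(mean(sigma2))^p

# Shevtsova (2013)'s Berry-Esseen bounds on Delta_{n,B}:
be_inid <- function(n, K3) 0.5583 * K3 / sqrt(n)  # i.n.i.d. case
be_iid  <- function(n, K3) 0.4690 * K3 / sqrt(n)  # i.i.d. case
```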

Despite the breadth of those theoretical advances, there remain some obstacles to taking full advantage of those results even in simple statistical applications, for instance, when conducting inference on the expectation of a real random variable. If we focus on Berry-Esseen inequalities, we show in Section 5.2 that even the sharpest upper bound to date on \(\Delta _{n,\text {B}}\) can be uninformative when conducting inference on an expectation, even for n larger than 59,000. Therefore, it is natural to wonder whether bounds derived from a one-term Edgeworth expansion could be tighter in moderately large samples (such as a few thousands). In the i.i.d. case and under some smoothness conditions, Senatov (2011) obtains such improved bounds. To our knowledge, the question is nevertheless still open in the i.n.i.d. setup, as well as in the general setup when no condition on the characteristic function is assumed. In particular, most articles that present results of the form of (1) do not provide a fully explicit value for \(\delta _n\); that is, \(\delta _n\) is defined up to some “universal” but unknown constant, see for instance Cramer (1962) and Bentkus and Götze (1996), among others.

In this article, we derive novel inequalities of the form of (1) that aim to be relevant in practical applications. Such “user-friendly” bounds seek to achieve two goals. First, we provide explicit values for \(\delta _n\), which are implemented in the new R package BoundEdgeworth (Derumigny et al., 2023) through the function Bound_EE1 (the function Bound_BE provides a bound on \({\Delta _{n,B}}\)). Second, the bounds \(\delta _n\) should be small enough to be informative even with small (\({n \approx }\) hundreds) to moderate (\({n \approx }\) thousands) sample sizes. We obtain these bounds in an i.i.d. setting and in a more general i.n.i.d. case, only assuming finite fourth moments.
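In R, a session using the package could look as follows. Only the package and function names (Bound_EE1, Bound_BE) come from the text above; the argument shown is an assumption and the exact interface should be checked in the package documentation:

```r
# Hedged sketch: the function names are from the text; the argument list
# below is an assumption, not the documented API -- see ?Bound_EE1.
# install.packages("BoundEdgeworth")
library(BoundEdgeworth)
# delta_E <- Bound_EE1(n = 1000)  # explicit bound delta_n on Delta_{n,E}
# delta_B <- Bound_BE(n = 1000)   # explicit bound on Delta_{n,B}
```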

We give improved bounds on \(\Delta _{n,\text {E}}\) under some regularity assumptions on the tail behavior of the characteristic function \({f_{S_{n}}}\) of \(S_n\). Such conditions are related to the continuity of the distribution of \(S_n\) and the differentiability of the corresponding density (with respect to Lebesgue’s measure). These are well-known conditions required for the Edgeworth expansion to be a good approximation of \({\mathbb {P}(S_n\le \cdot \,)}\) with fast rates. Our main results are summed up in Table 1.

Table 1 Summary of the new bounds on \(\Delta _{n,\text {E}}\) under different scenarios

In the rest of this section, we introduce the notation used throughout the paper. Section 2 presents our bounds on \({\Delta _{n,E}}\) under moment conditions only, in i.n.i.d. or i.i.d. settings. In Section 3, we develop tighter bounds under regularity assumptions on the characteristic function of \(S_n\). They rely on an alternative control of \({\Delta _{n,E}}\) that involves the integral of \({f_{S_{n}}}\), enabling us to use additional regularity assumptions on the tails of that function. In Section 4, we discuss practical aspects related to our bounds, namely how to choose or estimate the moments of the distribution of \(S_n\) that enter their computation. We also perform numerical comparisons between our bounds and existing ones for some particular distributions (Student and Gamma). In Section 5, we apply our results to analyze several aspects of one-sided tests based on the normal approximation of a sample mean. In particular, based on our bounds, we propose a new method to compute sufficient sample sizes for experimental design with a given effect size to be detected and a given nominal power. All proofs are postponed to the appendix. The proofs of the main results are gathered in Appendix A, relying on the computations of Appendix B. Useful lemmas are given in Appendix C.

Additional notation. \(\vee \) (resp. \(\wedge \)) denotes the maximum (resp. minimum) operator. For a random variable X, we denote its probability distribution by \(P_X\). For a distribution P, let \(f_P\) denote its characteristic function; similarly, for a random variable X, we denote by \(f_X\) its characteristic function. We recall that \(f_{\mathcal {N}(0,1)}(t)=e^{-t^2/2}\). We denote the (extended) lower incomplete Gamma function by \(\gamma (a, x) {:=} \int _0^x |u|^{a-1} e^{-u} du\) (for \(a > 0\) and \(x \in \mathbb {R}\)), the upper incomplete Gamma function by \(\Gamma (a,x) {:=} \int _x^{+\infty } u^{a-1} e^{-u} du\) (for \(a \ge 0\) and \(x > 0\)) and the standard gamma function by \(\Gamma (a) {:=} \Gamma (a,0) = \int _0^{+\infty } u^{a-1} e^{-u} du\) (for \(a > 0\)). For two sequences \((a_n),\) \((b_n),\) we write \(a_n = O(b_n)\) whenever there exists \(C>0\) such that \({a_n \le C b_n}\); \(a_n = o(b_n)\) whenever \(a_n / b_n \rightarrow 0\); and \(a_n \asymp b_n\) whenever \(a_n = O(b_n)\) and \(b_n = O(a_n)\). We denote by \(\chi _1\) the constant \(\chi _1 {:=} \sup _{x>0} x^{-3} |\cos (x)-1+x^2/2| \approx 0.099\) (Shevtsova, 2010), and by \(\theta _1^*\) the unique root in \((0,2\pi )\) of the equation \(\theta ^2+2\theta \sin (\theta )+6(\cos (\theta )-1)=0\). We also define \(t_1^* {:=} \theta _1^* / (2\pi ) \approx 0.64\) (Shevtsova, 2010). For every \({i \in \mathbb {N}^{*}}\), we define the individual standard deviation \({\sigma _{i} {:=} \sqrt{\mathbb {E}[X_i^2]}}\). Henceforth, we reason for a fixed arbitrary sample size \({n \in \mathbb {N}^{*}}\). Densities and continuous distributions are always assumed implicitly to be with respect to Lebesgue’s measure.
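Since the bounds below repeatedly involve the incomplete Gamma functions, it may help to record how they map to base R primitives; the sketch below (helper names ours) uses the identity \(\gamma(a,x) = \Gamma(a)\,\mathrm{pgamma}(x,a)\) for \(x \ge 0\) and falls back to numerical integration otherwise:

```r
# (Extended) lower incomplete gamma: gamma(a, x) = int_0^x |u|^(a-1) e^(-u) du.
g_lower <- function(a, x) {
  if (x >= 0) return(gamma(a) * pgamma(x, shape = a))
  # x < 0: the integral from 0 to x is minus the integral from x to 0.
  -integrate(function(u) abs(u)^(a - 1) * exp(-u), lower = x, upper = 0)$value
}

# Upper incomplete gamma; Gamma(0, x) is the exponential integral E_1(x),
# which pgamma does not cover, hence the numerical fallback.
g_upper <- function(a, x) {
  if (a > 0) return(gamma(a) * (1 - pgamma(x, shape = a)))
  integrate(function(u) exp(-u) / u, lower = x, upper = Inf)$value
}
```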

For clarity, we define below the concept of an explicit expression. In the rest of the article, the goal is to find bounds on \({\Delta _{n,E}}\) that are explicit expressions in the sense of Definition 1.

Definition 1

An expression is called explicit if it can be written as a finite sequence of terms. A term is defined as

  • either a numerical constant (i.e. a computable real number),

  • or one of the parameters of the framework (such as n, \(\lambda _{3,n}\), \(K_{4,n}\) and so on),

  • or one of the standard functions (rational functions, exponential functions, logarithmic functions, incomplete Gamma functions, indicator functions, absolute value, maximum or minimum) applied to a finite set of terms,

  • or, recursively, as an explicit expression itself.

2 Control of \(\varvec{\Delta _{n,\text {E}}}\) under moment conditions only

We start by introducing two versions of our basic assumptions on the distribution of the variables \((X_i)_{i=1, \dots , n}\).

Assumption 1

(Moment conditions in the i.n.i.d. framework) \((X_i)_{i=1, \dots , n}\) are independent and centered random variables such that for every \(i=1,\dots ,n\), the fourth raw individual moment \(\gamma _{i} {:=} \mathbb {E}[ X_i^4 ]\) is positive and finite.

Assumption 2

(Moment conditions in the i.i.d. framework) \((X_i)_{i=1,\dots ,n}\) are i.i.d. centered random variables such that the fourth raw moment \(\gamma _{n} {:=} \mathbb {E}[ X_n^4 ]\) is positive and finite.

Assumption 2 corresponds to the classical i.i.d. sampling with finite fourth moment, while Assumption 1 is its generalization to the i.n.i.d. framework. Those two assumptions primarily ensure that enough moments of \((X_i)_{i=1,\dots ,n}\) exist to build a non-asymptotic upper bound on \(\Delta _{n,\text {E}}.\) In some applications, such as the bootstrap, it is required to consider an array of random variables \((X_{i,n})_{i=1,\dots ,n}\) instead of a sequence. For example, Efron (1979)’s nonparametric bootstrap procedure consists in drawing n elements from the random sample \((X_{1,n},...,X_{n,n})\) with replacement. Conditional on \((X_{i,n})_{i=1,\dots ,n},\) the n values drawn with replacement can be seen as a sequence of n i.i.d. random variables with distribution \(\frac{1}{n}\sum _{i=1}^n\delta _{\{X_{i,n}\}}\), denoting by \(\delta _{\{a\}}\) the Dirac measure at a given point \({a \in \mathbb {R}}\), as illustrated below. Our results encompass these situations directly. Nonetheless, we do not use the array terminology here as our results hold non-asymptotically, i.e., for any fixed sample size n.
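A minimal R illustration of this conditional viewpoint (with a placeholder Gaussian sample; the variable names are ours):

```r
# Conditional on the observed sample x, one bootstrap draw is a sequence of
# n i.i.d. variables from the empirical distribution (1/n) * sum_i delta_{x_i}.
set.seed(1)
x      <- rnorm(100)                                   # placeholder sample
x_star <- sample(x, size = length(x), replace = TRUE)  # Efron's bootstrap draw
```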

To state our first theorem, remember that \({\overline{B}_n {:=} (1 / \sqrt{n}) \sqrt{\sum _{i=1}^n\sigma _{i}^2}}\), for \(p \in \mathbb {N}^{*}\), \(K_{p,n} {:=} n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^p]/ \overline{B}_n^p\), and let us introduce \(\widetilde{K}_{3,n} {:=} K_{3,n} + \frac{1}{n}\sum _{i=1}^n\mathbb {E}|X_i|\sigma _{i}^2 / \overline{B}_n^3\), \(\Delta {:=} (1 - 4 \chi _1 - \sqrt{K_{4,n}/n}) / 2\), and the terms \(r^{\text {inid,skew}}_{1,n}\), \(r^\text {inid,noskew}_{1,n}\), \(r^\text {iid,skew}_{1,n}\) and \(r^\text {iid,noskew}_{1,n}\).

These remainder terms are defined by:

$$\begin{aligned} r^{\text {inid,skew}}_{1,n}&{:=} \frac{(14.1961 + 67.0415) \, {\tilde{K}_{3,n}}^4}{16\pi ^4 n^2} + \frac{4.3394 \, |{\lambda _{3,n}}| {\tilde{K}_{3,n}}^3}{8 \pi ^3 n^2}+ \frac{1.0435 K_{4,n}^{5/4}}{n^{5/4}} \nonumber \\&+\! \frac{1.1101 K_{4,n}^{3/2} + 31.9921 |\lambda _{3,n}| \times K_{4,n}}{n^{3/2}} + \frac{0.6087 K_{4,n}^{7/4}}{n^{7/4}} + \frac{9.8197K_{4,n}^{2}}{n^2} \nonumber \\&+\! \frac{ |\lambda _{3,n}| \big ( \Gamma ( 3/2 , \sqrt{0.2} (n/K_{4,n})^{1/4}\! \wedge 2 \sqrt{n} / \tilde{K}_{3,n})\! -\! \Gamma ( 3/2 , 2 \sqrt{n} / \tilde{K}_{3,n}) \big ) }{\sqrt{n}} \nonumber \\&+\! \frac{1.0253 K_{3,n}}{6\pi \sqrt{n}} \Big \{ 0.5|\Delta |^{-3/2}\mathbbm {1}_{\{\Delta \ne 0\}} \times \big |\gamma (3/2, 4 \Delta n / \tilde{K}_{3,n}^2) \nonumber \\&\qquad \qquad \quad \; -\! \gamma \big (3/2, 2\Delta ( 0.1 (n/K_{4,n})^{1/2} \wedge 2 n / \tilde{K}_{3,n}^2 ) \big ) \big | \nonumber \\&\qquad \qquad \quad \; +\! \mathbbm {1}_{\{\Delta = 0\}} \frac{ ( 2 \sqrt{n} \! / \! \tilde{K}_{3,n} \!)^3\! -\! (\! \sqrt{0.2} (\! n/K_{4,n}\! )^{1/4}\! \wedge \! 2 \sqrt{n} \! /\! \tilde{K}_{3,n}\! )^3 }{3} \Big \}, \end{aligned}$$
(3)
$$\begin{aligned} r_{1,n}^{\text {inid,noskew}}&{:=} \frac{(14.1961 + 67.0415) \, \tilde{K}_{3,n}^4}{16\pi ^4 n^2} + \frac{0.6661 K_{4,n}^{3/2}}{n^{3/2}} + \frac{6.1361 K_{4,n}^{2}}{n^{2}} \nonumber \\&+ \frac{1.0253K_{4,n}}{6\pi n} \Big \{ 0.5|\Delta |^{-2}\mathbbm {1}_{\{\Delta \ne 0\}} \times \big |\gamma (2, 4 \Delta n / \tilde{K}_{3,n}^2) \nonumber \\&\qquad \quad \;\; - \gamma \big (2, 2\Delta ( 0.1 (n/K_{4,n})^{1/2} \wedge 2 n / \tilde{K}_{3,n}^2 ) \big ) \big | \nonumber \\&\qquad \quad \;\; + \mathbbm {1}_{\{\Delta = 0\}} \frac{ ( 2 \sqrt{n} /\! \tilde{K}_{3,n} )^4\! -\! (\!\sqrt{0.2} (n/\! K_{4,n})^{1/4}\! \wedge \! 2 \sqrt{n} /\! \tilde{K}_{3,n}\! )^4 }{4} \Big \}, \end{aligned}$$
(4)
$$\begin{aligned} r_{1,n}^{\text {iid,skew}}&{:=} \frac{(14.1961 + 67.0415) \, \tilde{K}_{3,n}^4}{16\pi ^4 n^2} + \frac{4.3394 \, |\lambda _{3,n}| \tilde{K}_{3,n}^3}{8 \pi ^3 n^2} + \overline{R}_n^{\text {iid,skew}} \nonumber \\&+ \frac{1.306 \big ( e_{2,n} - 1.006792 \big ) \lambda _{3,n}^2}{36 n} \nonumber \\&+ \frac{ |\lambda _{3,n}| \big ( \Gamma ( 3/2 , \sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 2 \sqrt{n} / \tilde{K}_{3,n}) - \Gamma ( 3/2 , 2 \sqrt{n} / \tilde{K}_{3,n}) \big ) }{\sqrt{n}} \nonumber \\&+ \frac{1.0253 \times 2^{5/2} \, K_{3,n}}{3 \pi \sqrt{n}} \big ( \Gamma \big ( 3/2, \big \{ \sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 2\sqrt{n}/\tilde{K}_{3,n} \big \}^2/8 \big ) \nonumber \\&\qquad \qquad \qquad \qquad \qquad - \Gamma \big ( 3/2, 4 n / (8 \tilde{K}_{3,n}^2) \big ) \big ), \end{aligned}$$
(5)

and

$$\begin{aligned} r_{1,n}^{\text {iid,noskew}}&{:=} \frac{(14.1961 + 67.0415) \, \tilde{K}_{3,n}^4}{16\pi ^4 n^2} + \overline{R}^\textrm{iid,noskew}_n \nonumber \\&+ \frac{16 \times 1.0253K_{4,n}}{3\pi n} \big ( \Gamma \big ( 2 , \big \{ \sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 2\sqrt{n}/\tilde{K}_{3,n} \big \}^2/8 \big ) \nonumber \\&\qquad \qquad \qquad \qquad \qquad - \Gamma \big ( 2 , 4 n / (8 \tilde{K}_{3,n}^2) \big ) \big ), \end{aligned}$$
(6)

where

$$\begin{aligned} \overline{R}_n^{\text {iid,skew}}\!{} & {} {:=} \dfrac{0.06957 |\lambda _{3,n}|}{n^{1.5}} + \dfrac{0.6661 K_{4,n}}{n^{2}} + \dfrac{0.4441 \lambda _{3,n}^{2}}{n^{2}} + \dfrac{0.6087 |\lambda _{3,n}| \times K_{4,n}}{n^{2.5}} \nonumber \\{} & {} + \dfrac{0.2221 K_{4,n}^{2}}{n^{3}} \nonumber \\{} & {} + e_{2,n} \times \Big (\dfrac{0.1088 K_{4,n}^{2}}{n^{2}} + \dfrac{1.3321 K_{4,n}}{n^{2}} + \dfrac{0.3972 |\lambda _{3,n}| \times K_{4,n}^{0.75}}{n^{2.25}} \nonumber \\{} & {} \qquad + \dfrac{0.04441 K_{4,n}^{1.5}}{n^{2.5}} + \dfrac{0.02961 K_{4,n}^{0.5} \times \lambda _{3,n}^{2}}{n^{2.5}} + \dfrac{0.006620 |\lambda _{3,n}| \times K_{4,n}^{1.25}}{n^{2.75}} \nonumber \\{} & {} + \dfrac{0.0003701 K_{4,n}^{2}}{n^{3}} + \dfrac{4.0779 }{n^{2}} + \dfrac{2.4316 |\lambda _{3,n}| \times K_{4,n}^{-0.25}}{n^{2.25}} + \dfrac{0.2719 K_{4,n}^{0.5}}{n^{2.5}} \nonumber \\{} & {} + \dfrac{0.1813 K_{4,n}^{-0.5} \times \lambda _{3,n}^{2}}{n^{2.5}} + \dfrac{0.1216 |\lambda _{3,n}| \times K_{4,n}^{0.25}}{n^{2.75}} + \dfrac{0.002266 K_{4,n}}{n^{3}} \nonumber \\{} & {} + \dfrac{0.3625 |\lambda _{3,n}|^{2} \times K_{4,n}^{-0.5}}{n^{2.5}} + \dfrac{0.05404 |\lambda _{3,n}| \times K_{4,n}^{-0.75} \times \lambda _{3,n}^{2}}{n^{2.75}} \nonumber \\{} & {} + \dfrac{0.01209 |\lambda _{3,n}|^{2} \times K_{4,n}^{0}}{n^{3}} + \dfrac{0.002027 |\lambda _{3,n}| \times K_{4,n}^{0.75}}{n^{3.25}} + \dfrac{0.004531 K_{4,n}}{n^{3}} \nonumber \\{} & {} + \dfrac{0.006042 K_{4,n}^{0} \times \lambda _{3,n}^{2}}{n^{3}} + \dfrac{7.552 \times 10^{-5} K_{4,n}^{1.5}}{n^{3.5}} \nonumber \\{} & {} + \dfrac{0.002014 K_{4,n}^{-1} \times \lambda _{3,n}^{4}}{n^{3}} + \dfrac{0.0009006 |\lambda _{3,n}| \times K_{4,n}^{-0.25} \times \lambda _{3,n}^{2}}{n^{3.25}} \nonumber \\{} & {} + \dfrac{5.035 \times 10^{-5} K_{4,n}^{0.5} \times \lambda _{3,n}^{2}}{n^{3.5}} \nonumber \\{} & {} + \!\dfrac{0.0001007 |\lambda _{3,n}|^{2} \!\times \! K_{4,n}^{0.5}}{n^{3.5}} \!+\! \dfrac{1.126 \!\times \! 10^{-5} |\lambda _{3,n}| \!\times \! K_{4,n}^{1.25}}{n^{3.75}} \nonumber \\{} & {} +\! \dfrac{3.147 \!\times \! 10^{-7} K_{4,n}^{2}}{n^{4}} + \dfrac{0.2983 |\lambda _{3,n}| \times K_{4,n}}{n^{1.5}} \nonumber \\{} & {} + \dfrac{1.8261 |\lambda _{3,n}|}{n^{1.5}} + \dfrac{0.5445 |\lambda _{3,n}|^{2} \times K_{4,n}^{-0.25}}{n^{1.75}} + \dfrac{0.06087 |\lambda _{3,n}| \times K_{4,n}^{0.5}}{n^{2}} \nonumber \\{} & {} + \dfrac{0.04058 |\lambda _{3,n}| \times K_{4,n}^{-0.5} \times \lambda _{3,n}^{2}}{n^{2}} \nonumber \\{} & {} + \dfrac{0.009074 |\lambda _{3,n}|^{2} \times K_{4,n}^{0.25}}{n^{2.25}} + \dfrac{0.0005073 |\lambda _{3,n}| \times K_{4,n}}{n^{2.5}}\Big ), \end{aligned}$$
(7)
$$\begin{aligned} \overline{R}^\textrm{iid,noskew}_n&{:=} \dfrac{0.6661 K_{4,n}}{n^{2}} + \dfrac{0.2221 K_{4,n}^{2}}{n^{3}} + e_{2,n} \times \Big (\dfrac{0.1088 K_{4,n}^{2}}{n^{2}} \nonumber \\&+ \dfrac{1.3321 K_{4,n}}{n^{2}} + \dfrac{0.04441 K_{4,n}^{1.5}}{n^{2.5}} \nonumber \\&+ \dfrac{0.0003701 K_{4,n}^{2}}{n^{3}} + \dfrac{4.0779 }{n^{2}} + \dfrac{0.2719 K_{4,n}^{0.5}}{n^{2.5}} + \dfrac{0.002266 K_{4,n}}{n^{3}} \nonumber \\&+ \dfrac{0.004531 K_{4,n}}{n^{3}} + \dfrac{7.552 \times 10^{-5} K_{4,n}^{1.5}}{n^{3.5}} + \dfrac{3.147 \times 10^{-7} K_{4,n}^{2}}{n^{4}} \Big ), \end{aligned}$$
(8)

and

$$\begin{aligned} e_{2,n}&{:=} \exp \Big (0.0119 + 0.000071 \times \big ( \frac{42.9326|\lambda _{3,n}|}{(K_{4,n}^{1/4}n^{1/4})} + 4.8 \left( \frac{K_{4,n}}{n}\right) ^{1/2} \\&+ \frac{3.2\lambda _{3,n}^2}{(K_{4,n}n)^{1/2}} + \frac{0.7156K_{4,n}^{1/4}|\lambda _{3,n}|}{n^{3/4}}+ \frac{0.04K_{4,n}}{n} \big ) \Big ). \end{aligned}$$

The following theorem is proved in Appendix A, separately for the “i.n.i.d.” and “i.i.d.” cases.

Theorem 1

(Control of the one-term Edgeworth expansion with bounded moments of order four) If Assumption 1 (resp. Assumption 2) holds and \(n \ge 3\), we have the bound

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.1995 \, \widetilde{K}_{3,n}}{\sqrt{n}} + \frac{0.031 \, \widetilde{K}_{3,n}^2 + 0.195 \, K_{4,n} + 0.054 \, |\lambda _{3,n}|\widetilde{K}_{3,n} + 0.03757 \, \lambda _{3,n}^2}{n} + r_{1,n} \, , \end{aligned}$$
(9)

where \(r_{1,n}\) is one of the four possible remainders \(r_{1,n}^{\text {inid,skew}}\), \(r_{1,n}^{\text {inid,noskew}}\), \(r_{1,n}^{\text {iid,skew}}\) or \(r_{1,n}^{\text {iid,noskew}}\), depending on whether Assumption 1 (“i.n.i.d.” case) or Assumption 2 (“i.i.d.” case) is satisfied and whether \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\) (“noskew” case) or not (“skew” case).
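For illustration, the explicit part of the bound (9) is straightforward to evaluate; the sketch below (helper name ours) deliberately omits the remainder \(r_{1,n}\) of Eqs. (3)-(6), which must be added to obtain the full, valid bound:

```r
# Leading terms of the bound (9) on Delta_{n,E}; the appropriate remainder
# r_{1,n} (Eqs. 3-6) still needs to be added.
thm1_main <- function(n, K3tilde, K4, lambda3) {
  0.1995 * K3tilde / sqrt(n) +
    (0.031 * K3tilde^2 + 0.195 * K4 +
       0.054 * abs(lambda3) * K3tilde + 0.03757 * lambda3^2) / n
}
```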

Remark 1

Assume that there exists a constant \(K_4\) such that \(K_{4,n} \le K_4\) for all \(n \ge 3\) (this is the case, for example, if the data is an i.i.d. sample from a given infinite homogeneous population). Then there exist constants \(C_1, C_2, C_3, C_4 > 0\) such that the remainder terms can be bounded in the following way: \(|r^{\textrm{inid,skew}}_{1,n}| \le C_1 n^{-5/4}\), \(|r^{\textrm{inid,noskew}}_{1,n}| \le C_2 n^{-3/2}\), \(|r^{\textrm{iid,skew}}_{1,n}| \le C_3 n^{-5/4}\), and \(|r^{\textrm{iid,noskew}}_{1,n}| \le C_4 n^{-2}\), for every \(n \ge 3\). This can be seen directly from the previous equations: in each case, one identifies the dominant term and bounds all the others by the required power of n.

Remark 2

In the regime where \(K_{4,n}\) tends to infinity faster than \(\sqrt{n}\), our bounds do not tend to 0. This is the case in particular for the term that is multiplied by \(\mathbbm {1}_{\{\Delta \ne 0\}}\). The bounds given by Theorem 1 remain valid in this regime; the right-hand side may simply exceed 1, in which case the inequality holds trivially. This can be interpreted in the following sense: the average kurtosis of the distribution increases too fast for the distance to the first-order Edgeworth expansion to be controlled by our techniques.

Note that it is possible to replace \(\tilde{K}_{3,n}\) by the simpler upper bound \(2 K_{3,n}\) under Assumption 1 (respectively by \(K_{3,n}+1\) under Assumption 2). This theorem displays a bound of order \(n^{-1/2}\) on \({\Delta _{n,E}}\) in the regime where \(K_{4,n}\) is bounded by a fixed constant. The rate \(n^{-1/2}\) cannot be improved when only assuming moment conditions on \((X_i)_{i=1,\dots ,n}\) (Esseen (1945), Cramer (1962)). Another nice aspect of those bounds is their dependence on \(\lambda _{3,n}\). For many classes of distributions, \(\lambda _{3,n}\) can, in fact, be exactly zero. This is the case if for every \(i = 1,\dots ,n\), \(X_i\) has a non-skewed distribution, such as any distribution that is symmetric around its expectation. More generally, \(|\lambda _{3,n}|\) can be substantially smaller than \(K_{3,n}\), decreasing the related terms.

As mentioned in the Introduction, we are not aware of explicit bounds on \({\Delta _{n,E}}\) under moment conditions only. It is thus difficult to assess how our bounds compare to the literature. On the other hand, there exist well-established bounds on \(\Delta _{n,B}\). Combining Theorem 1 with the bound \(|1-x^2|\varphi (x) / 6 \le \varphi (0)/6 \le 0.0665 \) for all \(x \in \mathbb {R}\) and the triangle inequality, we can control \(\Delta _{n,B}\) as well. More precisely, for every \(n \ge 3\), we have

$$\begin{aligned}&\Delta _{n,B} \le \frac{0.1995\widetilde{K}_{3,n}+0.0665|\lambda _{3,n}|}{\sqrt{n}} + {\frac{C_5}{n}}, \end{aligned}$$
(10)

for some constant \(C_5 > 0\). Under Assumption 1, \(\tilde{K}_{3,n} \le 2 K_{3,n} \). Combined with the refined inequality \(|\lambda _{3,n}| \le 0.621K_{3,n}\) (Pinelis, 2011, Theorem 1), and since \(0.1995 \times 2 + 0.0665 \times 0.621 \le 0.4403\), we can derive a simpler bound that involves only \(K_{3,n}\):

$$\begin{aligned} \frac{0.1995\widetilde{K}_{3,n} + 0.0665|\lambda _{3,n}|}{\sqrt{n}} \le \frac{0.4403 K_{3,n}}{\sqrt{n}}. \end{aligned}$$

The bound \(\Delta _{n,B} \le 0.4403 K_{3,n} / \sqrt{n} + {C_5/n}\) is already tighter than the sharpest known Berry-Esseen inequality in the i.n.i.d. framework, \(\Delta _{n,B} \le 0.5583 K_{3,n}/\sqrt{n}\), as soon as the remainder term \(C_5 / n\) is smaller than the difference \(0.118 K_{3,n}/\sqrt{n}\). This bound is also tighter than the sharpest known Berry-Esseen inequality in the i.i.d. case, \(\Delta _{n,B} \le 0.4690 K_{3,n}/\sqrt{n}\), up to a \({C_5/n}\) term. We recall that the sharpest existing bounds (Shevtsova , 2013) only require a finite third moment while we use further regularity in the form of a finite fourth moment. We refer to Example 1 and Fig. 1 for a numerical comparison, showing improvements for n of the order of a few thousands. The most striking improvement is obtained in the unskewed case when \({\mathbb {E}[X_i^3] = 0}\) for every integer i. In this case, Theorem 1 and the inequality \({\tilde{K}_{3,n} \le 2 K_{3,n}}\) yield \({\Delta _{n,B} \le 0.3990 K_{3,n} / \sqrt{n} + {C_5/n}}\). Note that this result does not contradict Esseen (1956)’s lower bound \({0.4098 K_{3,n} / \sqrt{n}}\) as the distribution he constructs does not satisfy \(\mathbb {E}[X_i^3] = 0\) for every i.

Under Assumption 2, \(\tilde{K}_{3,n} \le K_{3,n}+1\) and we can combine this with (10) and the inequality \(|\lambda _{3,n}| \le 0.621K_{3,n}\), so that we obtain

$$\begin{aligned} \Delta _{n,B}&\le \frac{0.1995 (K_{3,n} + 1) + 0.0665 \times 0.621 K_{3,n}}{\sqrt{n}} + {\frac{C_5}{n}} \\&\le \frac{0.2408 K_{3,n} + 0.1995}{\sqrt{n}} + {\frac{C_5}{n}}. \end{aligned}$$

As in the i.n.i.d. case discussed above, the numerical constant in front of \(K_{3,n}\) in the leading term is smaller than the lower-bound constant \({C_B{:=}0.4098}\) derived in Esseen (1956). This point is addressed in detail in Shevtsova (2012), where the author explains that the constant coming from Esseen (1956) cannot be improved only when one seeks a control of \(\Delta _{n,B}\) whose leading term is of the form \(c_1 K_{3,n}/\sqrt{n}\) for some \(c_1 > 0\). In contrast, our bound on \(\Delta _{n,B}\) exhibits a leading term of the form \((c_1 K_{3,n} + c_2)/\sqrt{n}\) for positive constants \(c_1\) and \(c_2\).

Fig. 1

Comparison between existing (Shevtsova, 2013) and new (Theorem 1) Berry-Esseen upper bounds on \(\Delta _{n,\text {B}} {:=} \sup _{x \in \mathbb {R}}\left| \mathbb {P}(S_n \le x) - \Phi (x) \right| \) for different sample sizes under moment conditions only (log-log scale). As remarked by a reviewer, the improvement we obtain should not come as a surprise since our results require boundedness of 4th-order moments while Shevtsova (2013)’s bounds remain valid under boundedness of 3rd-order moments only. In that respect, the comparison is somewhat unfair

Example 1

(Implementation of our bounds on \(\Delta _{n,B}\)) Theorem 1 provides new tools to control \(\Delta _{n,B}\), and we compare them with existing results. To compute our bounds, we need numerical values for \(\tilde{K}_{3,n}\), \(\lambda _{3,n}\), and \(K_{4,n}\) or upper bounds thereon. As discussed in Section 4.1, controlling \(K_{4,n}\) is in fact sufficient to bound \({\Delta _{n,E}}\) and \(\Delta _{n,B}\). In that section, we also explain that the choice \(K_{4,n} \le 9\) is reasonable in practice as it covers a wide range of commonly encountered distributions. Consequently, we stick to this value in our numerical examples.

The different bounds, with or without the assumption of an unskewed distribution (\(\lambda _{3,n} = 0\)), are plotted as functions of \(n\) in Fig. 1:

  • Shevtsova (2013) i.n.i.d.: \(\frac{0.5583 }{\sqrt{n}} K_{3,n}\)

  • Shevtsova (2013) i.i.d.: \(\frac{0.4690}{\sqrt{n}} K_{3,n}\)

  • Theorem 1, i.n.i.d.: \(\frac{0.4403 }{\sqrt{n}} K_{3,n} + r_{1,n}\)

  • Theorem 1, i.n.i.d. (unskewed): \(\frac{0.3990}{\sqrt{n}} K_{3,n} + r_{1,n}\)

  • Theorem 1, i.i.d.: \(\frac{0.2408 K_{3,n} + 0.1995}{\sqrt{n}} + r_{1,n}\)

  • Theorem 1, i.i.d. (unskewed): \(\frac{0.1995 (K_{3,n} + 1)}{\sqrt{n}} + r_{1,n}\),

where the explicit expressions of \(r_{1,n}\), according to the set-up, are given in Eqs. 3, 4, 5, and 6.

As previously mentioned, our bound in the baseline i.n.i.d. case gets close to and even improves upon the best known Berry-Esseen bound in the i.i.d. setup (Shevtsova, 2013) for n of the order of tens of thousands. When \({\lambda _{3,n} = 0}\), our bounds are smaller, highlighting improvements of the Berry-Esseen bounds for unskewed distributions. In parallel, the bounds are also reduced in the i.i.d. framework.
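A sketch of this comparison in R could look as follows; it only plots the leading terms listed above (the remainders \(r_{1,n}\) are omitted, so the curves for the new bounds slightly understate the full ones):

```r
# Leading terms of the six bounds of Example 1, under K_{4,n} <= 9 and hence
# K_{3,n} <= 9^{3/4}; remainders r_{1,n} omitted in this sketch.
n  <- 10^seq(2, 6, length.out = 200)
K3 <- 9^(3 / 4)
bounds <- cbind(
  shevtsova_inid   = 0.5583 * K3 / sqrt(n),
  shevtsova_iid    = 0.4690 * K3 / sqrt(n),
  thm1_inid        = 0.4403 * K3 / sqrt(n),
  thm1_inid_noskew = 0.3990 * K3 / sqrt(n),
  thm1_iid         = (0.2408 * K3 + 0.1995) / sqrt(n),
  thm1_iid_noskew  = 0.1995 * (K3 + 1) / sqrt(n)
)
matplot(n, bounds, type = "l", log = "xy",
        xlab = "n", ylab = "bound on Delta_B")
```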

3 Improved bounds on \({\varvec{\Delta _{n,E}}}\) under assumptions on the tail behavior of \(\varvec{f_{S_{n}}}\)

In this section, we derive tighter bounds on \({\Delta _{n,E}}\) under additional regularity conditions on the tail behavior of the characteristic function of \(S_n\). They follow from Theorem 2, which provides an alternative upper bound on \({\Delta _{n,E}}\) that involves the tail behavior of \(f_{S_{n}}\). To state this theorem, let us introduce the terms \(r_{2,n}^{\text {inid,skew}}\), \(r_{2,n}^{\text {inid,noskew}}\), \(r_{2,n}^{\text {iid,skew}}\) and \(r_{2,n}^{\text {iid,noskew}}\):

$$\begin{aligned} r_{2,n}^{\text {inid,skew}}{} & {} {:=} \frac{1.2533 \, \tilde{K}_{3,n}^4}{16\pi ^4n^2} + \frac{0.3334 \, \tilde{K}_{3,n}^4 \, |\lambda _{3,n}|}{16\pi ^4n^{5/2}} + \frac{14.1961 \, \tilde{K}_{3,n}^{16}}{(2\pi )^{16}n^8} + \frac{4.3394 \, |\lambda _{3,n}| \, \tilde{K}_{3,n}^{12}}{(2\pi )^{12} n^{13/2}} \nonumber \\{} & {} + \frac{|\lambda _{3,n}| \big ( \Gamma ( 3/2 , \sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4) - \Gamma ( 3/2 , 16\pi ^3n^2/\tilde{K}_{3,n}^4) \big )}{\sqrt{n}} \nonumber \\{} & {} + \frac{1.0435 K_{4,n}^{5/4}}{n^{5/4}} + \frac{1.1101 K_{4,n}^{3/2} + 8.2383 |\lambda _{3,n}| \times K_{4,n}}{n^{3/2}} + \frac{0.6087 K_{4,n}^{7/4}}{n^{7/4}} \nonumber \\{} & {} + \frac{9.8197K_{4,n}^{2}}{n^2} \nonumber \\{} & {} + \frac{1.0253K_{3,n}}{6\pi \sqrt{n}} \Big \{ 0.5|\Delta |^{-3/2}\mathbbm {1}_{\{\Delta \ne 0\}} \times \big |\gamma (3/2, 2^8\pi ^6 \Delta n^4 / \tilde{K}_{3,n}^8) \nonumber \\{} & {} \qquad \qquad \qquad \;\; - \gamma \big (3/2, \Delta ( 0.2 (n/K_{4,n})^{1/2} \wedge 2^8\pi ^6 n^4 / \tilde{K}_{3,n}^8 ) \big ) \big | \nonumber \\{} & {} \qquad \qquad \qquad + \mathbbm {1}_{\{\Delta = 0\}} \frac{(16 \pi ^3 n^2 / \tilde{K}_{3,n}^4)^3 - (\sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 16 \pi ^3 n^2 / \tilde{K}_{3,n}^4)^3}{3} \Big \} \nonumber \\{} & {} + \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / (2\pi ^2) \right) \right. \nonumber \\{} & {} \left. \qquad \qquad \quad - \Gamma \left( 0 , (4t_1^{*2}\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^6n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / 2 \right) \right) \nonumber \\{} & {} + \frac{1.0253}{\pi } \left( \!\Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8) / (2\pi ^2) \right) \!-\! \Gamma \left( 0 , 144\pi ^6n^4/\tilde{K}_{3,n}^8 \right) \right) \!, \end{aligned}$$
(11)
$$\begin{aligned} r_{2,n}^{\text {inid,noskew}}&{:=} \frac{1.2533 \, \tilde{K}_{3,n}^4}{16\pi ^4n^2} + \frac{14.1961 \, \tilde{K}_{3,n}^{16}}{(2\pi )^{16}n^8} \nonumber \\&+ \frac{1.0253 K_{4,n}}{6\pi n} \Big \{ 0.5|\Delta |^{-2}\mathbbm {1}_{\{\Delta \ne 0\}} \times \big |\gamma (2, 2^8\pi ^6 \Delta n^4 / \tilde{K}_{3,n}^8) \nonumber \\&\qquad \qquad \qquad \;\; - \gamma \big (2, \Delta ( 0.2 (n/K_{4,n})^{1/2} \wedge 2^8\pi ^6 n^4 / \tilde{K}_{3,n}^8 ) \big ) \big | \nonumber \\&\qquad \qquad \quad + \mathbbm {1}_{\{\Delta = 0\}} \frac{(16 \pi ^3 n^2 / \tilde{K}_{3,n}^4)^4 - (\sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 16 \pi ^3 n^2 / \tilde{K}_{3,n}^4)^4}{4} \Big \} \nonumber \\&+ \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / (2\pi ^2) \right) \right. \nonumber \\&\left. \qquad \qquad \quad - \Gamma \left( 0 , (4t_1^{*2}\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^6n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / 2 \right) \right) \nonumber \\&+ \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8) / (2\pi ^2) \right) \right. \nonumber \\&\left. \qquad \qquad \quad \!-\! \Gamma \left( 0 , 144\pi ^6n^4/\tilde{K}_{3,n}^8 \right) \right) . \end{aligned}$$
(12)
$$\begin{aligned} r^\textrm{iid,skew}_{2,n}&{:=} \frac{1.2533 \, \tilde{K}_{3,n}^4}{16\pi ^4n^2} + \frac{0.3334 \, \tilde{K}_{3,n}^4 \, |\lambda _{3,n}|}{16\pi ^4n^{5/2}} + \frac{14.1961 \, \tilde{K}_{3,n}^{16}}{(2\pi )^{16}n^8} + \frac{4.3394 \, |\lambda _{3,n}| \, \tilde{K}_{3,n}^{12}}{(2\pi )^{12} n^{13/2}} \nonumber \\&+ \frac{|\lambda _{3,n}| \big ( \Gamma ( 3/2 , \sqrt{0.2} (n/K_{4,n})^{1/4} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4) - \Gamma ( 3/2 , 16\pi ^3n^2/\tilde{K}_{3,n}^4) \big )}{\sqrt{n}} \nonumber \\&+ \overline{R}_n^{\text {iid,skew}} \nonumber \\&+ \frac{1.0253 \times 2^{5/2} \, K_{3,n}}{3\pi \sqrt{n}} \big | \Gamma (3/2, 2^5\pi ^6n^4/\tilde{K}_{3,n}^8) \nonumber \\&\qquad \qquad \qquad \qquad \quad \; - \Gamma (3/2, 0.1\sqrt{n/(16K_{4,n})} \wedge 2^5\pi ^6n^4/\tilde{K}_{3,n}^8) \big | \nonumber \\&+ \frac{1.306 \big ( e_{2,n}(0.1) - e_3(0.1) \big ) \lambda _{3,n}^2}{36 n} \nonumber \\&+ \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / (2\pi ^2) \right) \right. \nonumber \\&\left. \qquad \qquad \quad - \Gamma \left( 0 , (4t_1^{*2}\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^6n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / 2 \right) \right) \nonumber \\&+ \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8) / (2\pi ^2) \right) - \Gamma \left( 0 , 144\pi ^6n^4/\tilde{K}_{3,n}^8 \right) \right) , \end{aligned}$$
(13)

and

$$\begin{aligned} r_{2,n}^{\text {iid,noskew}}{} & {} {:=} \frac{1.2533 \, \tilde{K}_{3,n}^4}{16\pi ^4n^2} + \frac{14.1961 \, \tilde{K}_{3,n}^{16}}{(2\pi )^{16}n^8} + \overline{R}^\textrm{iid,noskew}_n \nonumber \\{} & {} + \frac{16 \!\times \! 1.0253 \, K_{3,n} \big | \Gamma (2, 2^5\pi ^6n^4/\tilde{K}_{3,n}^8) \!-\! \Gamma (2, 0.1\sqrt{n/(16K_{4,n})} \wedge 2^5\pi ^6n^4/\tilde{K}_{3,n}^8) \big |}{3\pi n} \nonumber \\{} & {} + \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / (2\pi ^2) \right) \right. \nonumber \\{} & {} \left. \qquad \qquad \quad - \Gamma \left( 0 , (4t_1^{*2}\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^6n^4/\tilde{K}_{3,n}^8)(1-4\pi \chi _1t_1^*) / 2 \right) \right) \nonumber \\{} & {} + \frac{1.0253}{\pi } \left( \Gamma \left( 0 , (4\pi ^2n/\tilde{K}_{3,n}^2 \wedge 144\pi ^8n^4/\tilde{K}_{3,n}^8) / (2\pi ^2) \right) \right. \nonumber \\{} & {} \left. \qquad \qquad \quad - \Gamma \left( 0 , 144\pi ^6n^4/\tilde{K}_{3,n}^8 \right) \right) . \end{aligned}$$
(14)

Recall also that \(t_1^* \approx 0.64\) and let \(a_n {:=} 2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4\) and \(b_n {:=} 16\pi ^4n^2/\tilde{K}_{3,n}^4\). In practice, even for fairly small \(n\), \(a_n\) is equal to \(2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n}\).
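These integration limits are explicit and cheap to compute; in R (helper names ours):

```r
t1_star <- 0.64  # approximate value of t_1^*; see the notation paragraph
a_n <- function(n, K3tilde) {
  pmin(2 * t1_star * pi * sqrt(n) / K3tilde, 16 * pi^3 * n^2 / K3tilde^4)
}
b_n <- function(n, K3tilde) 16 * pi^4 * n^2 / K3tilde^4
```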

Theorem 2

If Assumption 1 (resp. Assumption 2) holds and \(n \ge 3\), we have the bound

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253}{\pi } \int _{a_n}^{b_n} \frac{|f_{S_{n}}(t)|}{t}dt + r_{2,n} \end{aligned}$$
(15)

where \(r_{2,n}\) is one of the four possible remainders \(r^{\text {inid,skew}}_{2,n}\), \(r^{\text {inid,noskew}}_{2,n}\), \(r^{\text {iid,skew}}_{2,n}\) or \(r^{\text {iid,noskew}}_{2,n}\), depending on whether Assumption 1 (“i.n.i.d.” case) or Assumption 2 (“i.i.d.” case) is satisfied and whether \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\) (“noskew” case) or not (“skew” case).

Remark 3

Assume that there exists a constant \(K_4\) such that \(K_{4,n} \le K_4\) for all \(n \ge 3\) (this is the case, for example, if the data is an i.i.d. sample from a given infinite homogeneous population). Then there exist constants \(C_6, C_7, C_8, C_9 > 0\) such that the remainder terms can be bounded in the following way: \(|r_{2,n}^{\textrm{inid,skew}}| \le C_6 n^{-5/4}\), \(|r_{2,n}^{\textrm{inid,noskew}}| \le C_7 n^{-3/2}\), \(|r_{2,n}^{\textrm{iid,skew}}| \le C_8 n^{-5/4}\), and \(|r_{2,n}^{\textrm{iid,noskew}}| \le C_9 n^{-2}\), for every \(n \ge 3\).

This theorem is proved in Section A.4 under Assumption 1 (resp. in Section A.5 under Assumption 2). The first term contains quantities that were already present in the term of order 1/n in the bound of Theorem 1: \(0.195K_{4,n}\) and \(0.038 \lambda _{3,n}^2\). By contrast, the other terms are encompassed in the integral term and in the remainder. Indeed, a careful reading of the proofs (see notably Section A.1, which outlines the structure of the proofs of all theorems) shows that the leading term \(0.1995 \, \widetilde{K}_{3,n} / \sqrt{n}\) in the bound (9) comes from choosing a free tuning parameter T of the order of \(\sqrt{n}\). Here, we make another choice of T such that this term becomes negligible. The cost of this change is the introduction of the integral term involving \(f_{S_{n}}\). The leading term of the bound thus depends on the tail behavior of \(f_{S_{n}}\).

Note that the result is obtained under the same conditions as Theorem 1, namely under moment conditions only. Nonetheless, it is mainly interesting when combined with some assumptions on \({f_{S_{n}}}\) over the interval \([a_n, b_n]\); otherwise, we do not have an explicit control of the integral term involving \({f_{S_{n}}}\). In the rest of this section, we present two possible assumptions on \({f_{S_{n}}}\) that yield such a control.

3.1 Polynomial tail decay on \(|f_{S_{n}}|\).

As a first regularity condition on \({f_{S_{n}}}\), we can assume a polynomial rate of decrease. Corollary 3 presents the resulting bound in the i.n.i.d. case. In fact, a similar condition could be invoked with i.i.d. data by requiring a polynomial decrease of the characteristic function of \(X_n / \sigma _n\). However, we present in Section 3.2 milder assumptions in the i.i.d. case that remain sufficient to obtain an explicit control of the tails of \(f_{S_{n}}\).

Corollary 3

Let \(n \ge 3\). If Assumption 1 holds and if there exist some positive constants \(C_0, p\) such that for all \(|t| \ge a_n\), \(|f_{S_{n}}(t)| \le C_0 |t|^{-p}\), then

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253 \, C_0 a_n^{-p}}{\pi } + r_{3,n} \end{aligned}$$

where \(r_{3,n} {:=} r_{2,n} - 1.0253 \, C_0 b_n^{-p}/\pi \).

Besides moment conditions, Corollary 3 requires a uniform control of \(f_{S_{n}}\) outside the interval \((-a_n,a_n)\). When \(\tilde{K}_{3,n} = o(\sqrt{n})\), \(a_n\) goes to infinity. In this case, the condition is a tail control of the characteristic function of \(S_n\) in a neighborhood of infinity, which makes it a weaker condition to impose.

Placing restrictions on the tails of \(f_{S_{n}}\) is not very common in statistical applications. However, this notion is closely related to the smoothness of the underlying distribution of \(S_n\). Proposition 21 in the Appendix (which builds upon classical results such as (Ushakov, 2011, Theorem 1.2.6)) shows that the tail condition on \(f_{S_{n}}\) is satisfied with \(p \ge 1\) whenever \(P_{S_n}\) has a density \(g_{S_n}\) that is \(p-1\) times differentiable and such that its \((p-1)\)-th derivative is of bounded variation with total variation \(V_n {:=} \textrm{Vari}[g_{S_n}^{(p-1)}]\) uniformly bounded in n. In such situations, we can take \(C_0 = 1 \vee \sup _{n \in \mathbb {N}^{*}}V_n\).

Although Corollary 3 is valid for every positive p, it is only an improvement on the results of the previous section under the stricter condition \({p > 1}\), a situation in which \(P_{S_n}\) admits a density with respect to Lebesgue’s measure (second part of Proposition 21). In particular when \(p=2\), \(a_n^{-p}\) is exactly proportional to \(n^{-1}\) and we obtain

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2 + 1.0253 \, C_0 \pi ^{-1} }{n} + {C_{10} n^{-5/4}}, \end{aligned}$$

for every \(n \ge 3\) and for some constant \(C_{10} > 0\). When \({p > 2}\), \(a_n^{-p}\) becomes negligible compared to \(n^{-5/4}\), so that there exists a constant \(C_{11} > C_{10}\) satisfying

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + {C_{11} n^{-5/4}}. \end{aligned}$$

Combining these bounds on \({\Delta _{n,E}}\) with the expression of the Edgeworth expansion translates into upper bounds on \(\Delta _{n,B}\) of the form

$$\begin{aligned} \Delta _{n,B} \le \frac{0.0665 \, |\lambda _{3,n}|}{\sqrt{n}} + {\frac{C_{12}}{n}} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + {\frac{C_{12}}{n}}, \end{aligned}$$

for some constant \(C_{12} > 0\). As soon as the term \(C_{12}/n\) gets smaller than \(0.0413 K_{3,n}/\sqrt{n}\), the bound on \(\Delta _{n,B}\) becomes much better than \(0.5583 K_{3,n}/\sqrt{n}\) or \(0.4690 K_{3,n}/\sqrt{n}\). This can happen even for sample sizes n of the order of a few thousands, assuming that \(K_{3,n}\) and \(K_{4,n}\) are reasonable (e.g. \(K_{4,n} \le 9\)). When \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\), we remark that \(\Delta _{n,B} = {\Delta _{n,E}}\), meaning that we obtain a bound on \(\Delta _{n,B}\) of order \(n^{-1}\).

We confirm these rates through a numerical application in Example 2 for the specific choices \(C_0 = 1\) and \(p = 2\). These choices are satisfied for common distributions such as the Laplace distribution (for which these values of \(C_0\) and p are sharp) and the Gaussian distribution. This actually opens the way for another restriction on the tails of \(f_{S_{n}}\): we could impose \(|f_{S_{n}}(t)| \le \max _{1 \le r \le M}|\rho _r(t)|\) for all \(|t| \ge a_n\) and for \((\rho _r)_{r=1, \dots , M}\) a family of known characteristic functions. This second suggestion boils down to a semiparametric assumption on \(P_{S_n}\): \(f_{S_{n}}\) is assumed to be controlled in a neighborhood of \(\pm \infty \) by the behavior of at least one of the M characteristic functions \((\rho _r)_{r=1, \dots , M},\) but \(f_{S_{n}}\) need not be exactly one of those M characteristic functions. This semiparametric restriction becomes less and less stringent as n increases since we need to control \(f_{S_{n}}\) on a region that vanishes as n goes to infinity. Since \(S_n\) is centered and of variance 1 by definition, the choice of possible \(\rho _r\) is naturally restricted to the set of characteristic functions that correspond to such standardized distributions.

3.2 Alternative control of \(|f_{S_{n}}|\) in the i.i.d. case.

We state a second corollary that deals with the i.i.d. framework. We define the following quantity \(\kappa _n {:=} \sup _{t: \, |t| \ge a_n/\sqrt{n}} |f_{X_n/\sigma _n}(t)|\) and let \(c_n {:=} b_n/a_n\). Under Assumption 2, we remark that \(\sup _{t: \, |t|\ge a_n} |f_{S_{n}}(t)| = \kappa _n^n.\)

Corollary 4

Let \(n \ge 3\). Under Assumption 2,

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253 \, \kappa _n^n\log (c_n)}{\pi } + r_{2,n} \,. \end{aligned}$$

Furthermore, \(\kappa _n < 1\) as soon as \(P_{X_n/\sigma _n}\) has an absolutely continuous component.

Note that for any given \(s > 0\) and any random variable Z, \(\sup _{t: |t| \ge s} |f_Z(t)| = 1\) if and only if \(P_{Z}\) is a lattice distribution, i.e., concentrated on a set of the form \(\{a + nh, n \in \mathbb {Z} \}\) (Ushakov, 2011, Theorem 1.1.3). Therefore, \(\kappa _n < 1\) as soon as the distribution is not lattice, which is the case for any distribution with an absolutely continuous component.

In Corollary 4, the first term on the right-hand side of the inequality as well as \(r_{2,n}\) are unchanged compared to Theorem 2 and Corollary 3. The second term on the right-hand side of the inequality, \((1.0253/\pi )\kappa _n^n\log (c_n),\) corresponds to an upper bound on the integral term of Eq. 15 in Theorem 2. Imposing \(K_{4,n} \le K_4\), we can only claim that \(1.0253 \, \kappa _n^n\log (c_n)/\pi \le C_{13} \kappa _n^n\log n\) for some constant \(C_{13} > 0\), which does not provide an explicit rate on \({\Delta _{n,E}}\). If we also assume \(\sup _{n\ge 3}\kappa _n<1\), then we can write

$$\begin{aligned} {\Delta _{n,E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} {+ C_{14} n^{-5/4}}, \end{aligned}$$

for some constant \(C_{14} > 0\), and

$$\begin{aligned}&\Delta _{n,B} \le \frac{0.0665 \, |\lambda _{3,n}|}{\sqrt{n}} { + \frac{C_{15}}{n}} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} { + \frac{C_{15}}{n}}, \end{aligned}$$

for some constant \(C_{15} > 0\).

When is the assumption \(\sup _{n\ge 3}\kappa _n < 1\) reasonable? First, it always holds in the i.i.d. setting when the distribution of the \((X_i)_{i = 1, \ldots , n}\) is continuous and does not depend on n. By definition of \(a_n\) and by the fact that \(\tilde{K}_{3,n} \ge 1\), \(a_n/\sqrt{n}\) is larger than \(2 t_1^* \pi \) for n large enough. Consequently, \(\kappa _n\) is upper bounded by \(\kappa {:=} \sup _{t: \, |t| \ge 2t_1^*\pi } |f_{X_1/\sigma _1}(t)|\) for n large enough. In this case, if \(P_{X_1/\sigma _1}\) has an absolutely continuous component, then \(\kappa < 1\). For smaller \(n\), we use the fact that \(\kappa _n < 1\) for every \(n\), as explained right after Corollary 4. The value of \(\kappa \) depends on the distribution \(P_{X_1/\sigma _1}\). The closer \(\kappa \) gets to one, the less regular \(P_{X_1/\sigma _1}\) is, in the sense that it becomes hardly distinguishable from a lattice distribution.

Second, we could impose that the characteristic function \(f_{X_n/\sigma _n}\) be controlled by some finite family of known characteristic functions \(\rho _1, \dots , \rho _M\) (independent of n) beyond \(a_n / \sqrt{n}\). This follows the suggestion mentioned after Corollary 3, except that we now obtain an exponential upper bound instead of a polynomial one. Indeed, for n large enough, \(\kappa _n \le \kappa {:=} \sup _{t:|t|\ge 2t_1^*\pi }\max _{1\le m \le M}|\rho _m(t)|\) and \(\kappa < 1\) provided that \((\rho _m)_{m=1,\dots ,M}\) are characteristic functions of continuous distributions.

In Example 2, we plot our bounds on \(\Delta _{n,B}\) under the restriction \(\kappa _n \le 0.99\), which we argue is a very reasonable choice. To justify this claim, we compare our restriction to the value of \(\kappa _n\) we would get if \(X_n/\sigma _n\) were standard Laplace, a distribution whose characteristic function has much fatter tails than that of the standard Gaussian or Logistic, for instance. In fact, if we were to compute \(\sup _{t:|t|\ge 2t_1^*\pi }|\rho (t)|\) with \(\rho \) the characteristic function of a standard Laplace distribution, we would get \(\kappa _n < 0.11\). Despite our fairly conservative bound on \(\kappa _n\), we witness considerable improvements of our bounds compared to those given in Section 2.
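The value 0.11 can be recovered in one line: a unit-variance Laplace distribution has characteristic function \(t \mapsto 1/(1 + t^2/2)\), which is decreasing in \(|t|\), so the supremum is attained at the left endpoint:

```r
t0 <- 2 * 0.64 * pi  # the point 2 * t_1^* * pi, approximately 4.02
1 / (1 + t0^2 / 2)   # approximately 0.11, consistent with the claim above
```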

Example 2

(Implementation of our bounds on \(\Delta _{n,B}\)) We compare the bounds on \(\Delta _{n,B}\) obtained in Corollaries 3 and 4 to \(0.5583 K_{3,n}/\sqrt{n}\) and \(0.4690 K_{3,n}/\sqrt{n}.\) As in Example 1, we fix \(K_{4,n} \le 9\), which is enough to control \(K_{3,n}\) (see Section 4.1). As explained above, we set \(p = 2\) and \(C_0 = 1\) to apply Corollary 3 and \(\kappa = 0.99\) for Corollary 4.

  • Corollary 3, i.n.i.d.: \(\Delta _{n,B} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + \frac{0.195 \, K_{4,n} + 0.0147 \, K_{3,n}^2}{n} + \frac{1.0253}{\pi } a_n^{-2} + r_{3,n}\)

  • Corollary 3, i.n.i.d. unskewed: \(\Delta _{n,B} \le \frac{0.195 \, K_{4,n}}{n} + \frac{1.0253}{\pi } a_n^{-2} + r_{3,n}\)

  • Corollary 4, i.i.d.: \(\Delta _{n,B} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + \frac{0.195 \, K_{4,n} + 0.0147 \, K_{3,n}^2}{n} + \frac{1.0253 \, \kappa _n^n\log (c_n)}{\pi } + r_{2,n}\)

  • Corollary 4, i.i.d. unskewed: \(\Delta _{n,B} \le \frac{0.195 \, K_{4,n}}{n} + \frac{1.0253 \, \kappa _n^n\log (c_n)}{\pi } + r_{2,n}\)

Figure 2 displays the different bounds that we obtain as a function of the sample size n, alongside the existing bounds (Shevtsova, 2013), which do not assume such regularity conditions. The new bounds take advantage of these regularity conditions and are therefore tighter in all settings for \(n\) larger than \(10,000\). In the unskewed case, the improvement arises for much smaller \(n\), and the rate of convergence improves from \(1/\sqrt{n}\) to 1/n.

4 Practical considerations

Fig. 2

Comparison between existing (Shevtsova, 2013) and new (Corollaries 3 and 4) Berry-Esseen upper bounds on \(\Delta _{n,\text {B}} {:=} \sup _{x \in \mathbb {R}}\left| \mathbb {P}(S_n \le x) - \Phi (x) \right| \) for different sample sizes with an additional regularity assumption on \(f_{S_{n}}\) (log-log scale). Note that, compared to existing ones, the new bounds make use of the regularity assumption and of the boundedness of the 4th-order moments

4.1 Default value \(K_{4,n} \le 9\) or “Plug-in” approach


As seen in the previous examples, explicit values of, or bounds on, some functionals of \(P_{S_n}\) are required to compute our non-asymptotic bounds on a standardized sample mean. This phenomenon is not unique to our bounds and arises for any Berry-Esseen- or Edgeworth-type bound. A value of or a bound on \(K_{3,n}\) is indeed required to compute existing Berry-Esseen bounds, as in the seminal works of Berry (1941) and Esseen (1942) and their recent improvements (e.g., Shevtsova (2013)). Like ours, the recent extensions of these bounds proposed in Adell and Lekuona (2008), Boutsikas (2011) and Zhilova (2020) also depend on several (potentially unknown) moments of the distributions.

Under moment conditions only, the main term and remainder \(r_{1,n}\) of Theorem 1 solely depend on \(\lambda _{3,n}\), \(K_{3,n}\) or \(\tilde{K}_{3,n}\), and \(K_{4,n}\). As a matter of fact, a bound on \(K_{4,n}\) is sufficient to control all those quantities: Pinelis (2011) ensures \(|\lambda _{3,n}| \le 0.621 K_{3,n}\), and a convexity argument yields \(K_{3,n} \le K_{4,n}^{3/4}\) (and remember that \(\tilde{K}_{3,n}\) is at most \(2 K_{3,n}\) in the i.n.i.d. case and \(K_{3,n} + 1\) in the i.i.d. case). Having access to a bound on \(K_{4,n}\) is thus crucial to compute our bounds in practice.

First, in some situations, one may rely on auxiliary information about the distribution. In the i.i.d. case in particular, we note that imposing the bound \({K_{4,n} \le 9}\) allows for a wide family of distributions used in practice: any Gaussian, Gumbel, Laplace, Uniform, or Logistic distribution satisfies it, as well as any Student with at least 5 degrees of freedom, any Gamma or Weibull with shape parameter at least 1. In this case, remember that \(K_{4,n}\) is the kurtosis of \(X_n\), a natural and well-studied feature of a distribution.

In the i.n.i.d. case, \(K_{4,n}\) can be rewritten as a weighted average of the individual kurtoses. In that respect, the bound \({K_{4,n} \le 9}\) indicates that, on average, the individual kurtoses are at most \(9\).

Second, if a bound on \(K_{4,n}\) is not available, a “plug-in” approach remains applicable. The idea is to estimate the moments \(\lambda _{3,n}\), \(K_{3,n}\) and \(K_{4,n}\) by their empirical counterparts in the data (method of moments estimation), and then compute \(\delta _n\) by replacing the unknown needed quantities with those estimates. We acknowledge that this type of “plug-in” approach is only approximately valid, although somewhat unavoidable when bounds on the unknown moments are not given to the researcher.
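In the i.i.d. case, such plug-in estimates are one-liners; a minimal sketch (helper name ours), transcribing the definitions of the standardized moments:

```r
# Method-of-moments estimates of lambda_{3,n}, K_{3,n}, K_{4,n} and
# K-tilde_{3,n} from an i.i.d. sample x.
plugin_moments <- function(x) {
  z <- x - mean(x)       # center the sample
  s <- sqrt(mean(z^2))   # empirical standard deviation
  list(lambda3 = mean(z^3) / s^3,
       K3      = mean(abs(z)^3) / s^3,
       K4      = mean(z^4) / s^4,
       K3tilde = mean(abs(z)^3) / s^3 + mean(abs(z)) / s)
}
```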

In addition to the dependence on these moment bounds, Theorem 2 involves the integral \(\int _{a_n}^{b_n} |f_{S_{n}}(t)| / t \, dt\) that depends on the a priori unknown characteristic function of \(S_n\). The application of the resulting Corollaries 3 and 4 requires a control on the tail of this characteristic function through the quantities \(C_0\) and \(p\) in the i.n.i.d. case (respectively \(\kappa _n\) in the i.i.d. case), which can be given using expert knowledge of the regularity of the density of \(S_n\), as discussed in Section 3. It is also possible to estimate the integral directly, for instance using the empirical characteristic function (Ushakov, 2011, Chapter 3).

4.2 Numerical comparisons of our bounds on \(\mathbb {P}(S_n \le x)\) and existing ones

To give a better sense of the accuracy of our results, we perform a comparison between our bounds on \(x \mapsto \mathbb {P}(S_n \le x)\) and the existing ones (Shevtsova, 2013). Indeed, a control \(\delta _n\) on \({\Delta _{n,E}}\) (respectively \(\Delta _{n,B}\)) naturally yields upper and lower brackets on \(\mathbb {P}(S_n \le x)\) of the form \(\left[ \Phi (x) + \lambda _{3,n} / (6 \sqrt{n}) \times (1 - x^2) \varphi (x)\right] \pm \delta _n\) (respectively \(\Phi (x) \pm \delta _n\)), for any real \(x\). We plot those upper and lower brackets in the i.i.d. framework for three distinct distributions: Student distributions with 5 (Figure 3) or 8 (Figure 4) degrees of freedom and an Exponential distribution with expectation equal to 1, re-centered to fall in our framework (Figure 5). These three distributions are continuous with respect to Lebesgue’s measure, which allows us to resort to our sharpest i.i.d. bounds, namely those presented in Corollary 4 (compared to Figures 1 and 2, we only report those improved bounds here). On the contrary, remember that the existing bounds (Shevtsova, 2013) assume finite third-order moments only; hence, they do not leverage the additional information about skewness and regularity of the considered distributions.

The bound \(\delta _n\) depends on various features of the distribution of \(S_n\). In line with Example 2, we set \(\kappa = 0.99\), which happens to be a conservative choice for those distributions, as \(\kappa = 0.42\) for a Student(df = 8), 0.54 for a Student(df = 5), and 0.63 for the Exponential distribution we consider. In the following comparisons, we focus on the impact of the unknown moments \(K_{4,n}\), \(K_{3,n}\), and \(\lambda _{3,n}\) on the accuracy of our bounds.
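Computing the brackets themselves is immediate once a bound \(\delta _n\) is available; a minimal R sketch (helper name ours), clipping to \([0,1]\) since the bracketed quantity is a probability:

```r
# Lower and upper brackets on P(S_n <= x) from a bound delta_n on Delta_{n,E}.
edgeworth_bracket <- function(x, n, lambda3, delta_n) {
  center <- pnorm(x) + lambda3 / (6 * sqrt(n)) * (1 - x^2) * dnorm(x)
  cbind(lower = pmax(center - delta_n, 0), upper = pmin(center + delta_n, 1))
}
```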

Fig. 3

Setting: i.i.d. unskewed (\(\lambda _{3,n} = 0\)) with \(X_n \sim \text {Student}(\text {df} = 5)\) and \(n = 5,\!000\).

Blue line: \(\mathbb {P}(S_n \le x)\) as a function of x.

Continuous green lines: bounds \(\Phi (x) \pm \delta _n^{\text {new}}\) where \(\delta _n^{\text {new}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\), \(K_{4,n} \le 9\), and \(K_{3,n} \le 9^{3/4}\).

Dashed green lines: bounds \(\Phi (x) \pm \delta _n^{\text {new, oracle}}\), where \(\delta _n^{\text {new, oracle}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\) and using the true (oracle) values of \(K_{4,n} = 9\) and \(K_{3,n} \approx 2.1\).

Continuous red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the bound \(K_{3,n} \le 9^{3/4} \approx 5.2\).

Dashed red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the true value \(K_{3,n} \approx 2.1\)

Fig. 4

Setting: i.i.d. unskewed (\(\lambda _{3,n} = 0\)) with \(X_n \sim \text {Student}(\text {df} = 8)\) and \(n = 5,\!000\).

Blue line: \(\mathbb {P}(S_n \le x)\) as a function of x.

Continuous green lines: bounds \(\Phi (x) \pm \delta _n^{\text {new}}\) where \(\delta _n^{\text {new}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\), \(K_{4,n} \le 9\), and \(K_{3,n} \le 9^{3/4}\).

Dashed green lines: bounds \(\Phi (x) \pm \delta _n^{\text {new, oracle}}\), where \(\delta _n^{\text {new, oracle}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\) and using the true (oracle) values of \(K_{4,n} = 4.5\) and \(K_{3,n} \approx 1.8\).

Continuous red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the bound \(K_{3,n} \le 9^{3/4} \approx 5.2\).

Dashed red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the true value \(K_{3,n} \approx 1.8\)

The Student distributions illustrate the unskewed case, where our bounds exploit the information \(\lambda _{3,n} = 0\). Figures 3 and 4 report several bounds, contrasting the suggested practical choice \(K_{4,n} \le 9\), which deals with the fact that the moments are unknown, with the “oracle” bounds that use the true values of \(\lambda _{3,n}\), \(K_{3,n}\), and \(K_{4,n}\) (computed or approximated by Monte Carlo). As a comparison, we also report two versions of the existing bound: a “practical” one using \(K_{3,n} \le K_{4,n}^{3/4} \le 9^{3/4}\), and an “oracle” version using the true value of \(K_{3,n}\). The kurtosis of a Student distribution with \(\text {df} > 4\) degrees of freedom is equal to \(3 + 6 / (\text {df} - 4)\). Therefore, for any Student distribution with at least 5 degrees of freedom, the upper bound \(K_{4,n} \le 9\) is valid, but increasingly conservative as \(\text {df}\) grows. We consider two different values of \(\text {df}\) to assess the loss of accuracy of our bounds when the discrepancy between the actual \(K_{4,n}\) and our suggested default choice of 9 increases.

In Figure 4, we choose \(\text {df} = 8\), so that the true value is \(K_{4,n} = 4.5\) and the proposed bound \(K_{4,n} \le 9\) is thus conservative. On the contrary, in Figure 3, because \(\text {df} = 5\), the true value of \(K_{4,n}\) equals the suggested choice of 9, which becomes sharp. In that respect, it is a more favorable situation. Nonetheless, note that a difference remains between the “practical” and “oracle” versions of our bounds: the latter uses the true value of \(K_{3,n}\) (here, approximately equal to 2.1) while the former controls \(K_{3,n}\) by \(9^{3/4} \approx 5.2\).

The Exponential distribution displayed in Figure 5 illustrates our bounds for a skewed distribution. We choose an Exponential distribution with expectation equal to 1. This distribution has kurtosis \(K_{4,n} = 9\), so that the main difference with Figure 3 can be expected to stem from the presence of skewness. In line with the Student case, we report two versions of Shevtsova’s bounds and of ours: a practical version that uses only the information \(K_{4,n} \le 9\), and an “oracle” one based on knowledge of \(\lambda _{3,n}\), \(K_{3,n}\) and \(K_{4,n}\). We recall that \(\Delta _{n,B} \ne {\Delta _{n,E}}\) when \(\lambda _{3,n} \ne 0\). What is more, the existing bounds (plotted in red) are bounds on \(\Delta _{n,B}\), whereas ours (in green) originate from a control of \(\Delta _{n,E}\).

The “oracle” version can be interpreted as a noise-free implementation of the plug-in approach. We remark that the oracle versions of both the existing bounds and ours are twice as accurate as their counterparts that rely on \(K_{4,n} \le 9\). These oracle bounds use by definition the true values of the moments, and therefore correspond to the most favorable case in terms of the tightness of the bounds.

Fig. 5

Setting: i.i.d. skewed (\(\lambda _{3,n} \ne 0\)) with \(X_n \sim \text {Exp}(1) - 1\) and \(n = 100,\!000\).

Blue line: \(\mathbb {P}(S_n \le x)\) as a function of x.

Continuous green lines: bounds \(\Phi (x) \pm \Big ( 0.621 \times 9^{3/4} / (6 \sqrt{n}) \times (1 - x^2) \varphi (x) + \delta _n^{\text {new}} \Big )\) where \(\delta _n^{\text {new}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\) and \(K_{4,n} \le 9\) (as in Example 2).

Dashed green lines: bounds \(\Phi (x) + \lambda _{3,n} / (6 \sqrt{n}) \times (1 - x^2) \varphi (x) \pm \delta _n^{\text {new, oracle}}\), where \(\delta _n^{\text {new, oracle}}\) denotes the right-hand side of Corollary 4 with \(\kappa _n = 0.99\) and using the true (oracle) values of \(K_{4,n}\), \(K_{3,n}\) and \(\lambda _{3,n}\).

Continuous red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the bound \(K_{3,n} \le 9^{3/4} \approx 5.2\).

Dashed red lines: bounds \(\Phi (x) \pm 0.4690 K_{3,n} / \sqrt{n}\) using the true value \(K_{3,n} \approx 2.45\)

5 Non-asymptotic behavior of one-sided tests

We now examine some implications of our theoretical results for the non-asymptotic validity of one-sided statistical tests based on the Gaussian approximation of the distribution of a sample mean using i.i.d. data.

Let \((Y_i)_{i=1, \dots , n}\) be an i.i.d. sequence of random variables with expectation \(\mu \), known variance \(\sigma ^2\), and finite fourth moment, with \(K_4 {:=} \mathbb {E}\left[ (Y_n-\mu )^4\right] /\sigma ^4\) the kurtosis of the distribution of \(Y_n\). We want to test the null hypothesis \({H_0 : \mu \le \mu _0}\), for some fixed real number \(\mu _0\), against the alternative \({H_1 : \mu > \mu _0}\), with a type I error at most \(\alpha \in (0,1)\), and ideally equal to \(\alpha \). The classical approach to this problem (the Gauss test) amounts to comparing \({S_n = \sum _{i=1}^n X_i / \sqrt{n}}\), where \(X_i {:=} (Y_i - \mu _0) / \sigma \), with the \(1-\alpha \) quantile of the \(\mathcal {N}(0,1)\) distribution, \(q_{\mathcal {N}(0,1)}(1-\alpha )\), and rejecting \(H_0\) if \(S_n\) is larger. We study this Gauss test in a general non-asymptotic framework, without imposing Gaussianity of the data distribution, and we control the departure from normality using the bounds developed in the previous sections.
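For concreteness, a minimal sketch of this test (the function name is ours):

    import numpy as np
    from scipy.stats import norm

    def gauss_test(y, mu0, sigma, alpha=0.05):
        # One-sided Gauss test of H0: mu <= mu0 against H1: mu > mu0,
        # with known standard deviation sigma; returns the decision and S_n.
        y = np.asarray(y, dtype=float)
        s_n = (y - mu0).sum() / (sigma * np.sqrt(len(y)))
        return s_n > norm.ppf(1 - alpha), s_n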

5.1 Computation of sufficient sample sizes

In certain fields such as medicine or economics, researchers routinely set up experiments that seek to answer a specific question about an outcome variable \(Y\). The number of individuals included in the experiment has to be carefully justified, as large-scale analyses are very costly. This is typically done through the construction of a so-called “pre-analysis plan”, which presents the sample size needed to detect a given effect with a pre-specified testing power \(\beta \in (0,1)\).

In the Gauss test setting considered here, the researcher determines the effect of interest by fixing a particular alternative hypothesis \(H_{1, \eta }: \mu = \mu _0 + \sigma \eta \) (with \(\mu >\mu _0\)). The quantity \(\eta {:=} (\mu - \mu _0) / \sigma \) is a positive number called the effect size, which indicates how far away (in terms of standard deviations) the alternative hypothesis is from the null hypothesis \(H_0: \mu \le \mu _0\). Remark that in our framework, \(H_{1, \eta }\) is formally the set of all distributions with mean \(\mu \) and variance \(\sigma ^2\) that satisfy our additional moment and regularity conditions. \(H_{1, \eta }\) can thus be seen as a nonparametric class of distributions at a fixed distance \(\eta \) from the null hypothesis.

Researchers usually rely on an asymptotic normal approximation to infer the sample size needed to detect a given effect at power \(\beta \). Our results allow us to bypass this asymptotic approximation and to propose a procedure to choose the sample size n of the experiment such that

$$\begin{aligned} \mathbb {P} \Big ( \text {Rejection of } H_0 \Big ) {:=} \mathbb {P} \Big (\sum _{i=1}^n (Y_i - \mu _0) / \sqrt{n \sigma ^2} > q_{\mathcal {N}(0,1)}(1-\alpha ) \Big ) \ge \beta , \end{aligned}$$
(16)

for any distribution belonging to the alternative hypothesis space. Any n that satisfies Eq. 16 for all distributions in the alternative hypothesis is called a (non-asymptotic) sufficient sample size for the effect size \(\eta \) at power \(\beta \).

Observe that

$$\begin{aligned} \mathbb {P} \Big ( \text {Rejection of } H_0 \Big )&= \mathbb {P} \Big (\sum _{i=1}^n (Y_i - \mu + \mu - \mu _0) / \sqrt{n \sigma ^2}> q_{\mathcal {N}(0,1)}(1-\alpha ) \Big ) \\&= \mathbb {P} \Big (\sum _{i=1}^n X_i / \sqrt{n} > x_n \Big ), \end{aligned}$$

where \(X_i {:=} (Y_i - \mu ) / \sigma \) have mean 0 and variance 1, and \(x_n {:=} q_{\mathcal {N}(0,1)}(1-\alpha ) - \eta \sqrt{n}\). We remind the reader that the general result from Theorem 1 or Corollary 4 implies the following upper and lower bounds: for every \(x \in \mathbb {R}\) and \(n \ge 3\),

$$\begin{aligned} \frac{\lambda _{3,n}}{6\sqrt{n}}(1-x^2)\varphi (x) - \delta _n \le \mathbb {P}(S_n \le x) - \Phi (x) \le \frac{\lambda _{3,n}}{6\sqrt{n}}(1-x^2)\varphi (x) + \delta _n, \end{aligned}$$
(17)

where \(\delta _n\) is the corresponding bound on \(\Delta _{n,E}\). From Eq. 17, we thus obtain

$$\begin{aligned} 1 - \mathbb {P}\Big (\sum _{i=1}^n X_i / \sqrt{n} > x_n \Big ) - \Phi (x_n) \le \frac{\lambda _{3,n}}{6\sqrt{n}}(1 - x_n^2) \varphi (x_n) + \delta _n. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathbb {P}\Big (\sum _{i=1}^n X_i / \sqrt{n} > x_n \Big ) \ge 1 - \Phi (x_n) - \frac{\lambda _{3,n}}{6\sqrt{n}}(1 - x_n^2) \varphi (x_n) - \delta _n. \end{aligned}$$

As a consequence, the sample size \(n = n_{\eta , \beta }\) defined as the solution of the following equation

$$\begin{aligned} 1 - \Phi \Big ( q_{\mathcal {N}(0,1)}(1-\alpha ) - \eta \sqrt{n} \Big )&- \frac{\lambda _{3,n} \times \left( 1 - \left( q_{\mathcal {N}(0,1)}(1-\alpha ) - \eta \sqrt{n} \right) ^2 \right) }{6\sqrt{n}} \\&\times \varphi \Big ( q_{\mathcal {N}(0,1)}(1-\alpha ) - \eta \sqrt{n} \Big ) - \delta _n = \beta , \end{aligned}$$

is a non-asymptotic sufficient sample size. Note that the same reasoning also applies if we only impose an upper bound on \(\lambda _{3,n}\). In particular, if we only know \(K_{4,n}\), we can use the bound \(0.621 K_{4,n}^{3/4}\) on \(\lambda _{3,n}\), and a sufficient sample size n can then be found as the solution to

$$\begin{aligned} 1 \!-\! \Phi \Big ( q_{\mathcal {N}(0,1)}(1-\alpha ) \!-\! \eta \sqrt{n} \Big )&\!-\! \frac{0.621 K_{4,n}^{3/4} \times \left( 1 \!-\! \left( q_{\mathcal {N}(0,1)}(1-\alpha ) \!-\! \eta \sqrt{n} \right) ^2 \right) }{6\sqrt{n}} \nonumber \\&\times \varphi \Big ( q_{\mathcal {N}(0,1)}(1-\alpha ) \!-\! \eta \sqrt{n} \Big ) \!-\! \delta _n = \beta . \end{aligned}$$
(18)
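A sketch of a numerical solution of Eq. 18, treating \(\delta _n\) as a user-supplied function of n (for instance, the right-hand side of Corollary 4 for chosen values of \(K_{4,n}\) and \(\kappa _n\), not restated here) and assuming the left-hand side crosses \(\beta \) exactly once on the search interval:

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    def power_lower_bound(n, eta, alpha, K4, delta):
        # Left-hand side of Eq. 18: guaranteed power at effect size eta,
        # using the skewness bound 0.621 * K4**(3/4) and the bound delta(n)
        # on Delta_{n,E}.
        x_n = norm.ppf(1 - alpha) - eta * np.sqrt(n)
        edgeworth = (0.621 * K4 ** 0.75 * (1 - x_n ** 2) * norm.pdf(x_n)
                     / (6 * np.sqrt(n)))
        return 1 - norm.cdf(x_n) - edgeworth - delta(n)

    def sufficient_sample_size(eta, beta, alpha, K4, delta, n_hi=10 ** 8):
        # Solve Eq. 18 on a continuous relaxation of n, then round up.
        f = lambda n: power_lower_bound(n, eta, alpha, K4, delta) - beta
        return int(np.ceil(brentq(f, 3, n_hi)))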
Table 2 Sufficient sample sizes for the experiment to be well-powered for a nominal power \(\beta \) for the detection of an effect size \(\eta \)

Numerical applications can be found in Table 2, which displays the computed sample sizes for different choices of the effect size \(\eta \) and of the power \(\beta \). In this experiment, we choose \(K_{4,n} \le 9\) and \(\kappa \le 0.99\), as before. As expected, \(n_{\eta , \beta }\) increases with \(\beta \) and decreases with \(\eta \). For \(\eta \) large enough, \(n_{\eta , \beta }\) becomes approximately constant in \(\eta \), as Eq. 18 simplifies to \(1 - \delta _n = \beta \). Conversely, it is also possible to use Eq. 18 directly to compute the power for different effect sizes and sample sizes. The results are displayed in Table 3.

Table 3 Lower bound (18) on the power \(\beta \) (%) as a function of the effect size \(\eta \) and sample size n, with our bounds from Corollary 4, \(K_{4,n} \le 9\), and \(\kappa \le 0.99\)

5.2 Assessing the lack of information

As explained below, the non-asymptotic bounds introduced in Sections 2 and 3 can be used to evaluate the actual level, for a finite sample size, of our one-sided test of interest.

Recall that Berry-Esseen-type inequalities aim to bound \(\Delta _{n,B}\), defined in (2), the uniform distance between \(\mathbb {P}(S_n \le \cdot )\) and \(\Phi (\cdot )\). In particular, for a nominal level \(\alpha \), we thus have

$$\begin{aligned} \Big | \mathbb {P}\big ( S_n \le q_{\mathcal {N}(0,1)}(1-\alpha ) \big ) - (1-\alpha ) \Big | \le \Delta _{n,B}, \end{aligned}$$

where the probability is to be understood under any data-generating process on the boundary of the null hypothesis, i.e., with \(\mu = \mu _0\), so as to be as close as possible to the alternative hypothesis \(H_1\). Both “classical” Berry-Esseen inequalities and ours, obtained through an Edgeworth expansion, provide bounds on \(\Delta _{n,B}\) (see the different bounds displayed in Examples 1 and 2 in the i.i.d. case). In this context, a bound on \(\Delta _{n,B}\) is said to be uninformative when it is larger than \(\alpha \). Indeed, in that case, we cannot exclude that \(\mathbb {P}\!\left( S_n \le q_{\mathcal {N}(0,1)}(1-\alpha ) \right) \) is arbitrarily close to 1, or equivalently, that the probability of rejecting \(H_0\) is arbitrarily close to 0, and therefore that the test is arbitrarily conservative (type I error arbitrarily smaller than the nominal level \(\alpha \)). We denote by \(n_{\max }(\alpha )\) the largest sample size n for which the bound is uninformative. Intuitively, \(n_{\max }(\alpha )\) indicates the sample size above which the asymptotic normal approximation to the distribution of \(S_n\) becomes sensible under the assumptions used to bound \(\Delta _{n,B}\). Indeed, \(n_{\max }(\alpha )\) is specific to the bound \(\delta _n\) used, which itself depends on various features of the distribution: number of finite moments, (lack of) skewness, regularity, etc. Table 4 reports the value of \(n_{\max }(\alpha )\) for different Berry-Esseen bounds and usual nominal levels \(\alpha \in \{0.10, 0.05, 0.01\}\).
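A sketch for computing \(n_{\max }(\alpha )\), given any bound \(\delta (n)\) on \(\Delta _{n,B}\) and assuming \(\delta \) is decreasing in n:

    def n_max(alpha, delta):
        # Largest n for which the bound delta(n) is uninformative (>= alpha),
        # assuming delta(.) is decreasing in n.
        if delta(3) < alpha:        # informative from the smallest sample size on
            return None
        lo, hi = 3, 6
        while delta(hi) >= alpha:   # exponential search for an informative n
            lo, hi = hi, 2 * hi
        while hi - lo > 1:          # bisection: delta(lo) >= alpha > delta(hi)
            mid = (lo + hi) // 2
            if delta(mid) >= alpha:
                lo = mid
            else:
                hi = mid
        return lo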

Table 4 \(n_{\max }(\alpha )\), for different assumptions and Berry-Esseen bounds: the bound of Shevtsova (2013) with finite third moment (Existing), our bound with finite fourth moment (Theorem 1), and our bound with an additional regularity condition on \(f_{X_n / \sigma _n}\) (Corollary 4)

For each bound, \(n_{\max }(\alpha )\) is decreasing in \(\alpha \). For \(\alpha = 0.01\) in particular, the situation deteriorates strikingly except in the most favorable case of a regular and unskewed distribution. With our bounds, the presence or absence of skewness strongly influences \(n_{\max }(\alpha )\). We also remark that imposing the additional regularity assumption introduced in Section 3 significantly lowers \(n_{\max }(\alpha )\).

5.3 Distortions of the level of the test and of the p-values

We now explain how our non-asymptotic bounds on the Edgeworth expansion can be used to detect whether the test is conservative or liberal. This goes one step further than merely checking whether it is arbitrarily conservative or not. Eq. 17 shows that \(\mathbb {P}(S_n\le x)\) belongs to the interval

$$ \mathcal {I}_{n,x} {:=} \left[ \Phi (x) + \lambda _{3,n} (1-x^2)\varphi (x) / (6\sqrt{n}) \pm \delta _n \right] , $$

which is not centered at \(\Phi (x)\) whenever \(\lambda _{3,n} \ne 0\) and \(x \ne \pm \, 1\). The length of the interval does not depend on x and shrinks at speed \(\delta _n\). On the contrary, its location depends on x. For a given nonzero skewness \(\lambda _{3,n}\) and sample size \(n\), the midpoint of \(\mathcal {I}_{n,x}\) is shifted further away from the asymptotic approximation \(\Phi (x)\) the larger \( (1-x^2)\varphi (x)\) is in absolute value. The function \(x \mapsto (1-x^2)\varphi (x)\) attains its global maximum at \(x=0\) and its minima at \(x \approx \pm \, 1.73\). Consequently, irrespective of n, the largest gaps between \(\mathbb {P}(S_n\le x)\) and \(\Phi (x)\) may be expected around \(x=0\) or \(x \approx \pm \, 1.73\). \(\Phi (x)\) could even lie outside \(\mathcal {I}_{n,x}\), in which case \(\mathbb {P}( S_n \le x )\) has to be strictly smaller or strictly larger than \(\Phi (x)\). More precisely, \(\mathbb {P}( S_n \le x )\) lies further from its normal approximation \(\Phi (x)\) the larger the skewness \(\lambda _{3,n}\) is in absolute value; whether \(\mathbb {P}( S_n \le x )\) is strictly smaller or larger than \(\Phi (x)\) depends on the sign of \(1 - x^2\), as detailed in Table 5.
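These critical points follow from a one-line computation: since \(\varphi '(x) = -x \varphi (x)\),

$$\begin{aligned} \frac{d}{dx}\left[ (1-x^2)\varphi (x)\right] = -2x\varphi (x) - x(1-x^2)\varphi (x) = x(x^2 - 3)\varphi (x), \end{aligned}$$

which vanishes at \(x = 0\) and \(x = \pm \sqrt{3} \approx \pm \, 1.73\), with respective values \(\varphi (0) \approx 0.399\) at the maximum and \(-2\varphi (\sqrt{3}) \approx -0.178\) at the two minima.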

Table 5 Cases and conditions on the skewness \(\lambda _{3,n}\) under which \(\mathbb {P}( S_n \le x )\) is either strictly smaller or larger than its normal approximation \(\Phi (x)\) for any given sample size \(n \ge 3\)

These observations allow us to quantify possible non-asymptotic distortions between the nominal level and the actual rejection rate of the one-sided test we consider. Let us set \(x = q_{\mathcal {N}(0,1)}(1-\alpha )\) (henceforth denoted \(q_{1-\alpha }\) to lighten notation), which implies \(\Phi (x) = 1- \alpha \). We focus here on the case \(|q_{1-\alpha }|>1\), which encompasses all tests with nominal level \(\alpha \le 0.15\), and thus in particular the conventional levels 10%, 5%, and 1%. When \(\lambda _{3,n} > 6\sqrt{n}\delta _n/\big ((q_{1-\alpha }^2-1)\varphi (q_{1-\alpha })\big )\), we conclude that \(\mathbb {P}\left( S_n \le q_{1-\alpha } \right) < 1-\alpha \). Since the event \( \{ S_n \le q_{1-\alpha } \} \) is the complement of the rejection region, the probability of rejecting \(H_0\) under the null exceeds \(\alpha \); in other words, the test cannot guarantee its stated control \(\alpha \) of the type I error and is said to be liberal. Conversely, when \(\lambda _{3,n} < 6\sqrt{n}\delta _n/\big ((1-q_{1-\alpha }^2)\varphi (q_{1-\alpha })\big )\), the probability \(\mathbb {P}\left( S_n \le q_{1-\alpha } \right) \) has to be larger than \(1-\alpha \); equivalently, the probability of rejecting under the null is below \(\alpha \), so that the test is conservative.
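A sketch computing these two skewness thresholds for a given level \(\alpha \le 0.15\), sample size n, and bound \(\delta _n\):

    import numpy as np
    from scipy.stats import norm

    def skewness_thresholds(alpha, n, delta_n):
        # Thresholds on lambda_{3,n}: the test is provably liberal when
        # lambda3 > liberal, and provably conservative when
        # lambda3 < conservative (valid whenever |q_{1-alpha}| > 1).
        q = norm.ppf(1 - alpha)
        liberal = 6 * np.sqrt(n) * delta_n / ((q ** 2 - 1) * norm.pdf(q))
        conservative = 6 * np.sqrt(n) * delta_n / ((1 - q ** 2) * norm.pdf(q))
        return liberal, conservative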

The distortion can also be seen in terms of p-values. For the one-sided test we consider, the p-value is \({pval {:=} 1 - \mathbb {P}(S_n \le s_n)}\), with \(s_n\) the observed value of \(S_n\) in the sample. In contrast, the approximate p-value is \({\widetilde{pval} {:=} 1 - \Phi (s_n)}\). Setting \(x = s_n\) in Eq. 17 yields

$$\begin{aligned} \frac{\lambda _{3,n}}{6\sqrt{n}}(1-s_n^2)\varphi (s_n) - \delta _n \le (1 - pval) - (1 - \widetilde{pval}) \le \frac{\lambda _{3,n}}{6\sqrt{n}}(1-s_n^2)\varphi (s_n) + \delta _n. \end{aligned}$$

Therefore,

$$\begin{aligned} \widetilde{pval} - \frac{\lambda _{3,n}}{6\sqrt{n}}(1 - s_n^2)\varphi (s_n) - \delta _n \le pval \le \widetilde{pval} - \frac{ \lambda _{3,n}}{6\sqrt{n}}(1 - s_n^2)\varphi (s_n) + \delta _n. \end{aligned}$$
(19)

In line with the explanations preceding Table 5, \(\widetilde{pval}\) is strictly smaller or larger than \(pval\) when the skewness is sufficiently large in absolute value relative to \(\delta _n\). Indeed, if \(\lambda _{3,n} \ne 0\), the interval from Eq. 19 that contains the true p-value \(pval\) is not centered at the approximate p-value \(\widetilde{pval}\). Under additional regularity assumptions (see Corollary 4 in the i.i.d. case), the remainder term satisfies \(\delta _n = O(n^{-1})\), whereas the “bias” term involving \(\lambda _{3,n}\) vanishes at the slower rate \(n^{-1/2}\). As a result, as n increases, the midpoint of the interval moves closer to \(\widetilde{pval}\) while its width shrinks to zero at an even faster rate.
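A sketch of the resulting bracket on the exact p-value, given the observed statistic \(s_n\), the skewness \(\lambda _{3,n}\), and a bound \(\delta _n\):

    import numpy as np
    from scipy.stats import norm

    def pvalue_bracket(s_n, n, lambda3, delta_n):
        # Bracket (19) on the exact p-value around the approximate p-value
        # 1 - Phi(s_n), shifted by the Edgeworth skewness term.
        approx_pval = 1 - norm.cdf(s_n)
        shift = lambda3 / (6 * np.sqrt(n)) * (1 - s_n ** 2) * norm.pdf(s_n)
        return approx_pval - shift - delta_n, approx_pval - shift + delta_n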

Finally, we stress that such distortions of rejection rates and p-values are specific to one-sided tests. For two-sided tests, the skewness of the distribution enters symmetrically in the approximation error and cancels out, thanks to the evenness of the function \( x \mapsto (1 - x^2) \varphi (x) \).