Abstract
In this article, we obtain explicit bounds on the uniform distance between the cumulative distribution function of a standardized sum \(S_n\) of \(n\) independent centered random variables with moments of order four and its first-order Edgeworth expansion. Those bounds are valid for any sample size, with rate \(n^{-1/2}\) under moment conditions only and rate \(n^{-1}\) under additional regularity constraints on the tail behavior of the characteristic function of \(S_n\). In both cases, the bounds are further sharpened if the variables involved in \(S_n\) are unskewed. We also derive new Berry-Esseen-type bounds from our results and discuss their links with existing ones. Following these theoretical results, we discuss the practical use of our bounds, which depend on possibly unknown moments of the distribution of \(S_n\). Finally, we apply our bounds to investigate several aspects of the non-asymptotic behavior of one-sided tests: informativeness, sufficient sample sizes in experimental design, and distortions in terms of levels and p-values.
1 Introduction
As the number of observations n in a statistical experiment goes to infinity, many statistics of interest converge weakly, once adequately centered and scaled, to a \(\mathcal {N}(0,1)\) distribution; see, e.g., van der Vaart (2000, Chapter 5) for a thorough introduction. Hence, when little is known about the distribution of a statistic for a fixed sample size, a classical approach to conduct inference on the parameters of the statistical model amounts to approximating that distribution by its tractable Gaussian limit. A recurring theme in statistics and probability is thus to quantify the distance between those two distributions for a given n.
In this article, we present some refined results in the canonical case of a standardized sum of independent random variables. We consider independent but not necessarily identically distributed random variables to encompass a broader range of applications. For instance, certain bootstrap schemes such as the multiplier ones (see Chapter 9 in van der Vaart (1996) or Chapter 10 in Kosorok (2006)) boil down to studying a sequence of mutually independent not necessarily identically distributed (i.n.i.d.) random variables conditional on the initial sample.
More formally, let \({(X_i)_{i=1, \dots , n}}\) be a sequence of i.n.i.d. random variables satisfying for every \({i \in \{1, \dots ,n\}}\), \({\mathbb {E}[X_i] = 0}\) and \({\gamma _i {:=} \mathbb {E}[ X_i^4 ] < +\infty }\). We also define the standard deviation \(B_n\) of the sum of the \(X_i\)’s, i.e., \({B_n {:=} \sqrt{\sum _{i=1}^n\mathbb {E}[X_i^2]}},\) so that the standardized sum can be written as \({S_n {:=} \sum _{i=1}^n X_i/B_n}\). Finally, we define the average individual standard deviation \({\overline{B}_n {:=} B_n/\sqrt{n}}\) and the average standardized third raw moment \({\lambda _{3,n} {:=} \frac{1}{n}\sum _{i=1}^n\mathbb {E}[X_i^3]/\overline{B}_n^3}\). The main results of this article are of the form
\[\Delta _{n,\text {E}} {:=} \sup _{x \in \mathbb {R}} \left| \mathbb {P}\left( S_n \le x \right) - \Phi (x) - \lambda _{3,n} (6\sqrt{n})^{-1} (1-x^2)\varphi (x) \right| \le \delta _n, \qquad (1)\]
where \(\Phi \) is the cumulative distribution function of a standard Gaussian random variable, \(\varphi \) its density function and \(\delta _n\) is a positive sequence that depends on the first four moments of \((X_i)_{i=1,\dots ,n}\) and tends to zero under some regularity conditions. In the following, we use the notation \(G_n(x) {:=} \Phi (x) + \lambda _{3,n} (6\sqrt{n})^{-1} (1-x^2)\varphi (x)\).
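For concreteness, the one-term Edgeworth expansion \(G_n\) is straightforward to evaluate numerically. The following Python sketch (illustrative only, not part of the R package mentioned later) implements \(G_n\) from the quantities defined above:

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def edgeworth_1(x, n, lambda_3n):
    """One-term Edgeworth expansion:
    G_n(x) = Phi(x) + lambda_{3,n} (6 sqrt(n))^{-1} (1 - x^2) phi(x)."""
    return Phi(x) + lambda_3n / (6 * math.sqrt(n)) * (1 - x * x) * phi(x)
```

Note that \(G_n\) coincides with \(\Phi \) whenever \(\lambda _{3,n} = 0\), and at \(x = \pm 1\) regardless of \(\lambda _{3,n}\).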
The quantity \(G_n(x)\) is usually called the one-term Edgeworth expansion of \(\mathbb {P}\left( S_n \le x\right) \), hence the letter E in the notation \(\Delta _{n,\text {E}}\). Controlling the uniform distance between \(\mathbb {P}\left( S_n \le \cdot \right) \) and \(G_n(\cdot )\) has a long tradition in statistics and probability, see for instance Esseen (1945) and the books by Cramer (1962) and Bhattacharya and Ranga Rao (1976). As early as Esseen (1945), it was acknowledged that in the independent and identically distributed (i.i.d.) case, \(\Delta _{n,\text {E}}\) is of order \(n^{-1/2}\) in general and of order \(n^{-1}\) if the distribution of the \(X_i\)’s has a nonzero continuous component. These results were then extended in a wide variety of directions, often in connection with bootstrap procedures, see for instance Hall (1992) and Lahiri (2003) for the dependent case.
A one-term Edgeworth expansion can be seen as a refinement of the so-called Berry-Esseen inequality (Berry (1941); Esseen (1942)), whose goal is to bound
\[\Delta _{n,\text {B}} {:=} \sup _{x \in \mathbb {R}} \left| \mathbb {P}\left( S_n \le x \right) - \Phi (x) \right| .\]
The refinement stems from the fact that in \(\Delta _{n,\text {E}},\) the distance between \(\mathbb {P}\left( S_n \le \cdot \right) \) and \(\Phi \) is adjusted for the presence of non-asymptotic skewness in the distribution of \(S_n\). In contrast with the literature on Edgeworth expansions, there is a substantial amount of work devoted to explicit constants in the Berry-Esseen inequality and its extensions, see, e.g., Bentkus and Götze (1996); Bentkus (2003); Pinelis and Molzon (2016); Chernozhukov et al. (2017); Raič (2018, 2019). The sharpest known result in the i.n.i.d. univariate framework is due to Shevtsova (2013), who shows that for every \({n \in \mathbb {N}^{*}}\), if \({\mathbb {E}[|X_i|^3]<+\infty }\) for every \({i \in \{1,...,n\}}\), then \(\Delta _{n,\text {B}} \le 0.5583 \, K_{3,n} / \sqrt{n}\), where \(K_{p,n} {:=} n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^p]/(\overline{B}_n)^p\), for \({p \in \mathbb {N}^{*}}\), denotes the average standardized p-th absolute moment. \(K_{p,n}\) measures tail thickness, with \(K_{2,n}\) normalized to 1 and \(K_{4,n}\) the kurtosis. An analogous result is given in Shevtsova (2013) under the i.i.d. assumption, where 0.5583 is replaced with 0.4690. A close lower bound is due to Esseen (1956): there exists a distribution such that \(\Delta _{n,\text {B}} = (C_B/\sqrt{n}) \left( n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^3]/\overline{B}_n^3\right) \) with \({C_B \approx 0.4098}\). Another line of research applies Edgeworth expansions in order to get a bound on \(\Delta _{n,\text {B}}\) that contains higher-order terms, see Adell and Lekuona (2008); Boutsikas (2011) and Zhilova (2020).
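For illustration, the moment functionals \(K_{p,n}\) and the Shevtsova (2013) i.n.i.d. bound \(0.5583\,K_{3,n}/\sqrt{n}\) can be computed directly from the individual moments. A Python sketch with illustrative inputs (the moment values passed in are the user's responsibility):

```python
import math

def K_pn(abs_moments_p, variances, p):
    """Average standardized p-th absolute moment:
    K_{p,n} = n^{-1} sum_i E|X_i|^p / (bar B_n)^p."""
    n = len(variances)
    B_bar = math.sqrt(sum(variances) / n)   # average individual std dev bar B_n
    return sum(abs_moments_p) / n / B_bar ** p

def shevtsova_inid_bound(abs_third_moments, variances):
    """Berry-Esseen bound 0.5583 * K_{3,n} / sqrt(n) (Shevtsova, 2013, i.n.i.d.)."""
    n = len(variances)
    return 0.5583 * K_pn(abs_third_moments, variances, 3) / math.sqrt(n)
```

For instance, for n = 100 i.i.d. Rademacher variables (\(\mathbb {E}|X_i|^3 = \sigma _i^2 = 1\)), \(K_{3,n} = 1\) and the bound equals \(0.5583/10\).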
Despite the breadth of those theoretical advances, there remain obstacles to taking full advantage of those results, even in simple statistical applications, for instance, when conducting inference on the expectation of a real random variable. If we focus on Berry-Esseen inequalities, we show in Section 5.2 that even the sharpest upper bound to date on \(\Delta _{n,\text {B}}\) can be uninformative when conducting inference on an expectation, even for n larger than 59,000. Therefore, it is natural to wonder whether bounds derived from a one-term Edgeworth expansion could be tighter in moderately large samples (such as a few thousand). In the i.i.d. case and under some smoothness conditions, Senatov (2011) obtains such improved bounds. To our knowledge, the question nevertheless remains open in the i.n.i.d. setup, as well as in the general setup when no condition on the characteristic function is assumed. In particular, most articles that present results of the form of (1) do not provide a fully explicit value for \(\delta _n\); that is, \(\delta _n\) is defined up to some “universal” but unknown constant, see for instance Cramer (1962) and Bentkus and Götze (1996), among others.
In this article, we derive novel inequalities of the form of (1) that aim to be relevant in practical applications. Such “user-friendly” bounds seek to achieve two goals. First, we provide explicit values for \(\delta _n\), which are implemented in the new R package BoundEdgeworth (Derumigny et al., 2023) through the function Bound_EE1 (the function Bound_BE provides a bound on \({\Delta _{n,B}}\)). Second, the bounds \(\delta _n\) should be small enough to be informative even with small (\({n \approx }\) hundreds) to moderate (\({n \approx }\) thousands) sample sizes. We obtain these bounds in an i.i.d. setting and in a more general i.n.i.d. case, assuming only finite fourth moments.
We give improved bounds on \(\Delta _{n,\text {E}}\) under some regularity assumptions on the tail behavior of the characteristic function \({f_{S_{n}}}\) of \(S_n\). Such conditions are related to the continuity of the distribution of \(S_n\) and the differentiability of the corresponding density (with respect to Lebesgue’s measure). These are well-known conditions required for the Edgeworth expansion to be a good approximation of \({\mathbb {P}(S_n\le \cdot \,)}\) with fast rates. Our main results are summed up in Table 1.
In the rest of this section, we introduce notation used throughout the paper. Section 2 presents our bounds on \({\Delta _{n,E}}\) under moment conditions only, in i.n.i.d. or i.i.d. settings. In Section 3, we develop tighter bounds under regularity assumptions on the characteristic function of \(S_n\). They rely on an alternative control of \({\Delta _{n,E}}\) that involves the integral of \({f_{S_{n}}}\), enabling us to use additional regularity assumptions on the tails of that function. In Section 4, we discuss practical aspects related to our bounds, namely how to choose or estimate the moments of the distribution of \(S_n\) involved in their computation. We also perform numerical comparisons between our bounds and existing ones for some particular distributions (Student and Gamma). In Section 5, we apply our results to analyze several aspects of one-sided tests based on the normal approximation of a sample mean. In particular, based on our bounds, we propose a new method to compute sufficient sample sizes for experimental design with a given effect size to be detected and a given nominal power. All proofs are postponed to the appendix. The proofs of the main results are gathered in Appendix A, relying on the computations of Appendix B. Useful lemmas are given in Appendix C.
Additional notation. \(\vee \) (resp. \(\wedge \)) denotes the maximum (resp. minimum) operator. For a random variable X, we denote its probability distribution by \(P_X\). For a distribution P, let \(f_P\) denote its characteristic function; similarly, for a random variable X, we denote by \(f_X\) its characteristic function. We recall that \(f_{\mathcal {N}(0,1)}(t)=e^{-t^2/2}\). We denote the (extended) lower incomplete Gamma function by \(\gamma (a, x) {:=} \int _0^x |u|^{a-1} e^{-u} du\) (for \(a > 0\) and \(x \in \mathbb {R}\)), the upper incomplete Gamma function by \(\Gamma (a,x) {:=} \int _x^{+\infty } u^{a-1} e^{-u} du\) (for \(a \ge 0\) and \(x > 0\)) and the standard Gamma function by \(\Gamma (a) {:=} \Gamma (a,0) = \int _0^{+\infty } u^{a-1} e^{-u} du\) (for \(a > 0\)). For two sequences \((a_n),\) \((b_n),\) we write \(a_n = O(b_n)\) whenever there exists \(C>0\) such that \({a_n \le C b_n}\); \(a_n = o(b_n)\) whenever \(a_n / b_n \rightarrow 0\); and \(a_n \asymp b_n\) whenever \(a_n = O(b_n)\) and \(b_n = O(a_n)\). We denote by \(\chi _1\) the constant \(\chi _1 {:=} \sup _{x>0} x^{-3} |\cos (x)-1+x^2/2| \approx 0.099\) (Shevtsova, 2010), and by \(\theta _1^*\) the unique root in \((0,2\pi )\) of the equation \(\theta ^2+2\theta \sin (\theta )+6(\cos (\theta )-1)=0\). We also define \(t_1^* {:=} \theta _1^* / (2\pi ) \approx 0.64\) (Shevtsova, 2010). For every \({i \in \mathbb {N}^{*}}\), we define the individual standard deviation \({\sigma _{i} {:=} \sqrt{\mathbb {E}[X_i^2]}}\). Henceforth, we reason for a fixed arbitrary sample size \({n \in \mathbb {N}^{*}}\). Densities and continuous distributions are always implicitly understood with respect to Lebesgue’s measure.
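The numerical constants \(\chi _1\) and \(t_1^*\) can be checked directly from their definitions; a sketch using a grid search for the supremum and bisection for \(\theta _1^*\):

```python
import math

# chi_1 = sup_{x > 0} x^{-3} |cos(x) - 1 + x^2/2|, approximated on a fine grid
# (the function vanishes as x -> 0 and as x -> infinity, with a peak near x = 4).
chi1 = max(abs(math.cos(x) - 1 + x * x / 2) / x ** 3
           for x in (k / 1000 for k in range(1, 30000)))

# theta_1^*: unique root in (0, 2 pi) of theta^2 + 2 theta sin(theta) + 6 (cos(theta) - 1).
def g(theta):
    return theta ** 2 + 2 * theta * math.sin(theta) + 6 * (math.cos(theta) - 1)

lo, hi = 1.0, 2 * math.pi   # g(1) < 0 < g(2 pi): bisection converges to the root
for _ in range(100):
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
theta1_star = (lo + hi) / 2
t1_star = theta1_star / (2 * math.pi)   # matches t_1^* = theta_1^* / (2 pi)
```

Both approximations agree with the values \(\chi _1 \approx 0.099\) and \(t_1^* \approx 0.64\) quoted above.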
For clarity, we define below the concept of an explicit expression. In the rest of the article, the goal is to find bounds on \({\Delta _{n,E}}\) that are explicit expressions in the sense of Definition 1.
Definition 1
An expression is called explicit if it can be written as a finite sequence of terms. A term is defined as
- either a numerical constant (i.e., a computable real number),
- or one of the parameters of the framework (such as n, \(\lambda _{3,n}\), \(K_{4,n}\) and so on),
- or one of the standard functions (rational functions, exponential functions, logarithmic functions, incomplete Gamma functions, indicator functions, absolute value, maximum or minimum) applied to a finite set of terms,
- or, recursively, an explicit expression itself.
2 Control of \(\varvec{\Delta _{n,\text {E}}}\) under moment conditions only
We start by introducing two versions of our basic assumptions on the distribution of the variables \((X_i)_{i=1, \dots , n}\).
Assumption 1
(Moment conditions in the i.n.i.d. framework) \((X_i)_{i=1, \dots , n}\) are independent and centered random variables such that for every \(i=1,\dots ,n\), the fourth raw individual moment \(\gamma _{i} {:=} \mathbb {E}[ X_i^4 ]\) is positive and finite.
Assumption 2
(Moment conditions in the i.i.d. framework) \((X_i)_{i=1,\dots ,n}\) are i.i.d. centered random variables such that the fourth raw moment \(\gamma _{n} {:=} \mathbb {E}[ X_n^4 ]\) is positive and finite.
Assumption 2 corresponds to the classical i.i.d. sampling with finite fourth moment while Assumption 1 is its generalization in the i.n.i.d. framework. Those two assumptions primarily ensure that enough moments of \((X_i)_{i=1,\dots ,n}\) exist to build a non-asymptotic upper bound on \(\Delta _{n,\text {E}}.\) In some applications, such as the bootstrap, it is required to consider an array of random variables \((X_{i,n})_{i=1,\dots ,n}\) instead of a sequence. For example, Efron (1979)’s nonparametric bootstrap procedure consists in drawing n elements in the random sample \((X_{1,n},...,X_{n,n})\) with replacement. Conditional on \((X_{i,n})_{i=1,\dots ,n},\) the n values drawn with replacement can be seen as a sequence of n i.i.d. random variables with distribution \(\frac{1}{n}\sum _{i=1}^n\delta _{\{X_{i,n}\}}\), denoting by \(\delta _{\{a\}}\) the Dirac measure at a given point \({a \in \mathbb {R}}\). Our results encompass these situations directly. Nonetheless, we do not use the array terminology here as our results hold non-asymptotically, i.e., for any fixed sample size n.
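To illustrate the bootstrap case, note that conditional on the data, the bootstrap draws are i.i.d. from the empirical distribution \(\frac{1}{n}\sum _{i=1}^n\delta _{\{X_{i,n}\}}\), so the moments entering our bounds are the centered empirical moments. A Python sketch with an arbitrary illustrative sample:

```python
def empirical_centered_moments(sample):
    """Centered moments (orders 2 to 4) of the empirical distribution,
    i.e., of the conditional distribution of one bootstrap draw."""
    n = len(sample)
    mean = sum(sample) / n
    centered = [x - mean for x in sample]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return m2, m3, m4

# Illustrative sample; since the bootstrap draws are i.i.d. conditional on the
# data, lambda_{3,n} = m3 / m2^{3/2} and K_{4,n} = m4 / m2^2 in that conditional world.
m2, m3, m4 = empirical_centered_moments([0.1, 1.4, -0.7, 2.3, -1.1, 0.5])
lambda_3 = m3 / m2 ** 1.5
K4 = m4 / m2 ** 2
```

By the Cauchy-Schwarz inequality, \(K_{4,n} \ge 1\) always holds, which the sketch reflects.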
To state our first theorem, remember that \({\overline{B}_n {:=} (1 / \sqrt{n}) \sqrt{\sum _{i=1}^n\sigma _{i}^2}}\), for \(p \in \mathbb {N}^{*}\), \(K_{p,n} {:=} n^{-1} \sum _{i=1}^n\mathbb {E}[|X_i|^p]/ \overline{B}_n^p\), and let us introduce \(\widetilde{K}_{3,n} {:=} K_{3,n} + \frac{1}{n}\sum _{i=1}^n\mathbb {E}|X_i|\sigma _{i}^2 / \overline{B}_n^3\), \(\Delta {:=} (1 - 4 \chi _1 - \sqrt{K_{4,n}/n}) / 2\), and the terms \(r^{\text {inid,skew}}_{1,n}\), \(r^\text {inid,noskew}_{1,n}\), \(r^\text {iid,skew}_{1,n}\) and \(r^\text {iid,noskew}_{1,n}\).
These remainder terms are defined by:
and
where
and
The following theorem is proved in Appendix A, where the “i.n.i.d.” and “i.i.d.” cases are treated in separate sections.
Theorem 1
(Control of the one-term Edgeworth expansion with bounded moments of order four) If Assumption 1 (resp. Assumption 2) holds and \(n \ge 3\), we have the bound
where \(r_{1,n}\) is one of the four possible remainders \(r_{1,n}^{\text {inid,skew}}\), \(r_{1,n}^{\text {inid,noskew}}\), \(r_{1,n}^{\text {iid,skew}}\) or \(r_{1,n}^{\text {iid,noskew}}\), depending on whether Assumption 1 (“i.n.i.d.” case) or Assumption 2 (“i.i.d.” case) is satisfied and whether \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\) (“noskew” case) or not (“skew” case).
Remark 1
Assume that there exists a constant \(K_4\) such that \(K_{4,n} \le K_4\) for all \(n \ge 3\) (this is the case, for example, if the data is an i.i.d. sample from a given infinite homogeneous population). Then there exist constants \(C_1, C_2, C_3, C_4 > 0\) such that the remainder terms can be bounded in the following way: \(|r^\textrm{inid,skew}_{1,n}| \le C_1 n^{-5/4}\), \(|r^\textrm{inid,noskew}_{1,n}| \le C_2 n^{-3/2}\), \(|r^\textrm{iid,skew}_{1,n}| \le C_3 n^{-5/4}\), and \(|r^\textrm{iid,noskew}_{1,n}| \le C_4 n^{-2}\), for every \(n \ge 3\). This can be seen directly from the previous equations: it is always possible to isolate the main term and then bound all the others by the required powers of n.
Remark 2
In the regime where \(K_{4,n}\) tends to infinity faster than \(\sqrt{n}\), our bounds do not tend to 0. This is the case in particular for the term that is multiplied by \(\mathbbm {1}_{\{\Delta \ne 0\}}\). In this case, the bounds given by Theorem 1 are still valid; in some cases, the right-hand side will be larger than 1 and therefore the inequality trivially still holds. This can be interpreted in the following sense: the average kurtosis of the distribution increases too fast for the distance to the first-order Edgeworth expansion to be controlled by our techniques.
Note that it is possible to replace \(\tilde{K}_{3,n}\) by the simpler upper bound \(2 K_{3,n}\) under Assumption 1 (respectively by \(K_{3,n}+1\) under Assumption 2). This theorem displays a bound of order \(n^{-1/2}\) on \({\Delta _{n,E}}\) in the regime where \(K_{4,n}\) is bounded by a fixed constant. The rate \(n^{-1/2}\) cannot be improved when only assuming moment conditions on \((X_i)_{i=1,\dots ,n}\) (Esseen (1945), Cramer (1962)). Another nice aspect of those bounds is their dependence on \(\lambda _{3,n}\). For many classes of distributions, \(\lambda _{3,n}\) can, in fact, be exactly zero. This is the case if for every \(i = 1,\dots ,n\), \(X_i\) has a non-skewed distribution, such as any distribution that is symmetric around its expectation. More generally, \(|\lambda _{3,n}|\) can be substantially smaller than \(K_{3,n}\), decreasing the related terms.
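To illustrate the role of \(\lambda _{3,n}\), the following sketch contrasts a symmetric case (where \(\lambda _{3,n} = 0\)) with a skewed one (centered Bernoulli variables, whose moments are available in closed form):

```python
import math

def lambda_3n(third_moments, variances):
    """Average standardized third raw moment lambda_{3,n}."""
    n = len(variances)
    B_bar = math.sqrt(sum(variances) / n)   # average individual standard deviation
    return sum(third_moments) / n / B_bar ** 3

# Symmetric case: each X_i uniform on {-1, +1}, so E[X_i^3] = 0 and lambda_{3,n} = 0.
sym = lambda_3n([0.0] * 50, [1.0] * 50)

# Skewed case: centered Bernoulli(p), i.e., X_i = B_i - p with B_i ~ Bernoulli(p);
# then E[X_i^2] = p(1-p) and E[X_i^3] = p(1-p)(1-2p).
p = 0.1
m2, m3 = p * (1 - p), p * (1 - p) * (1 - 2 * p)
skewed = lambda_3n([m3] * 50, [m2] * 50)   # = (1-2p)/sqrt(p(1-p))
```

In the skewed case, \(\lambda _{3,n} = (1-2p)/\sqrt{p(1-p)}\), which grows as p approaches 0 or 1.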
As mentioned in the Introduction, we are not aware of explicit bounds on \({\Delta _{n,E}}\) under moment conditions only. It is thus difficult to assess how our bounds compare to the literature. On the other hand, there exist well-established bounds on \(\Delta _{n,B}\). Using Theorem 1, the bound \((1-x^2)\varphi (x) / 6 \le \varphi (0)/6 \le 0.0665 \) for all \(x \in \mathbb {R}\), and applying the triangle inequality, we can control \(\Delta _{n,B}\) as well. More precisely, for every \(n \ge 3\), we have
\[\Delta _{n,B} \le \frac{0.1995 \, \widetilde{K}_{3,n} + 0.0665 \, |\lambda _{3,n}|}{\sqrt{n}} + \frac{C_5}{n},\]
for some constant \(C_5 > 0\). Under Assumption 1, \(\widetilde{K}_{3,n} \le 2 K_{3,n} \). Combined with the refined inequality \(|\lambda _{3,n}| \le 0.621K_{3,n}\) (Pinelis, 2011, Theorem 1), we can derive a simpler bound that involves only \(K_{3,n}\):
\[\Delta _{n,B} \le \frac{0.4403 \, K_{3,n}}{\sqrt{n}} + \frac{C_5}{n}.\]
The bound \(\Delta _{n,B} \le 0.4403 K_{3,n} / \sqrt{n} + {C_5/n}\) is already tighter than the sharpest known Berry-Esseen inequality in the i.n.i.d. framework, \(\Delta _{n,B} \le 0.5583 K_{3,n}/\sqrt{n}\), as soon as the remainder term \(C_5 / n\) is smaller than the difference \(0.118 K_{3,n}/\sqrt{n}\). This bound is also tighter than the sharpest known Berry-Esseen inequality in the i.i.d. case, \(\Delta _{n,B} \le 0.4690 K_{3,n}/\sqrt{n}\), up to a \({C_5/n}\) term. We recall that the sharpest existing bounds (Shevtsova, 2013) only require a finite third moment while we use further regularity in the form of a finite fourth moment. We refer to Example 1 and Fig. 1 for a numerical comparison, showing improvements for n of the order of a few thousand. The most striking improvement is obtained in the unskewed case when \({\mathbb {E}[X_i^3] = 0}\) for every integer i. In this case, Theorem 1 and the inequality \({\tilde{K}_{3,n} \le 2 K_{3,n}}\) yield \({\Delta _{n,B} \le 0.3990 K_{3,n} / \sqrt{n} + {C_5/n}}\). Note that this result does not contradict Esseen (1956)’s lower bound \({0.4098 K_{3,n} / \sqrt{n}}\) as the distribution he constructs does not satisfy \(\mathbb {E}[X_i^3] = 0\) for every i.
Under Assumption 2, \(\tilde{K}_{3,n} \le K_{3,n}+1\) and we can combine this with (10) and the inequality \(|\lambda _{3,n}| \le 0.621K_{3,n}\), so that we obtain
\[\Delta _{n,B} \le \frac{0.2408 \, K_{3,n} + 0.1995}{\sqrt{n}} + \frac{C_5}{n}.\]
As in the i.n.i.d. case discussed above, the numerical constant in front of \(K_{3,n}\) in the leading term is smaller than the lower-bound constant \({C_B{:=}0.4098}\) derived in Esseen (1956). This point is addressed in detail in Shevtsova (2012), where the author explains that the constant coming from Esseen (1956) cannot be improved only within the class of bounds on \(\Delta _{n,B}\) whose leading term is of the form \(c_1 K_{3,n}/\sqrt{n}\) for some \(c_1 > 0\). In contrast, our bound on \(\Delta _{n,B}\) exhibits a leading term of the form \((c_1 K_{3,n} + c_2)/\sqrt{n}\) for positive constants \(c_1\) and \(c_2\).
Example 1
(Implementation of our bounds on \(\Delta _{n,B}\)) Theorem 1 provides new tools to control \(\Delta _{n,B}\), and we compare them with existing results. To compute our bounds, we need numerical values for \(\tilde{K}_{3,n}\), \(\lambda _{3,n}\), and \(K_{4,n}\) or upper bounds thereon. As discussed in Section 4.1, controlling \(K_{4,n}\) is in fact sufficient to bound \({\Delta _{n,E}}\) and \(\Delta _{n,B}\). In that section, we also explain that the choice \(K_{4,n} \le 9\) is reasonable in practice as it covers a wide range of commonly encountered distributions. Consequently, we stick to this value in our numerical examples.
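One standard way to control \(K_{3,n}\) through \(K_{4,n}\) is Lyapunov's inequality, \(K_{3,n} \le K_{4,n}^{3/4}\) (the precise route taken in Section 4.1 may differ); with \(K_{4,n} \le 9\), this gives \(K_{3,n} \le 9^{3/4} \approx 5.196\). A minimal sketch:

```python
def K3_from_K4(K4):
    """Lyapunov's inequality gives K_{3,n} <= K_{4,n}^{3/4}, so a bound on the
    average kurtosis K_{4,n} also controls K_{3,n}."""
    return K4 ** 0.75

K3_max = K3_from_K4(9)   # = 3^{3/2}, approximately 5.196
```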
The different bounds, without or with the assumption of an unskewed distribution (\(\lambda _{3,n} = 0\)), are plotted as a function of \(n\) in Fig. 1:
- Shevtsova (2013), i.n.i.d.: \(\frac{0.5583 }{\sqrt{n}} K_{3,n}\);
- Shevtsova (2013), i.i.d.: \(\frac{0.4690}{\sqrt{n}} K_{3,n}\);
- Theorem 1, i.n.i.d.: \(\frac{0.4403 }{\sqrt{n}} K_{3,n} + r_{1,n}\);
- Theorem 1, i.n.i.d. (unskewed): \(\frac{0.3990}{\sqrt{n}} K_{3,n} + r_{1,n}\);
- Theorem 1, i.i.d.: \(\frac{0.2408 K_{3,n} + 0.1995}{\sqrt{n}} + r_{1,n}\);
- Theorem 1, i.i.d. (unskewed): \(\frac{0.1995 (K_{3,n} + 1)}{\sqrt{n}} + r_{1,n}\),
where the explicit expressions of \(r_{1,n}\), according to the set-up, are given in Eqs. 3, 4, 5, and 6.
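Setting aside the remainders \(r_{1,n}\), the leading terms of the bounds listed above can be compared numerically. A sketch with the illustrative values \(n = 10{,}000\) and \(K_{3,n} = 9^{3/4}\) (the value implied by \(K_{4,n} \le 9\)):

```python
import math

def leading_terms(n, K3):
    """Leading terms (without the remainders r_{1,n}) of the bounds above."""
    s = math.sqrt(n)
    return {
        "Shevtsova i.n.i.d.":        0.5583 * K3 / s,
        "Shevtsova i.i.d.":          0.4690 * K3 / s,
        "Thm 1 i.n.i.d.":            0.4403 * K3 / s,
        "Thm 1 i.n.i.d. (unskewed)": 0.3990 * K3 / s,
        "Thm 1 i.i.d.":              (0.2408 * K3 + 0.1995) / s,
        "Thm 1 i.i.d. (unskewed)":   0.1995 * (K3 + 1) / s,
    }

vals = leading_terms(10_000, 9 ** 0.75)
```

The full comparison, including the remainders, is what Fig. 1 displays.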
As previously mentioned, our bound in the baseline i.n.i.d. case gets close to and even improves upon the best known Berry-Esseen bound in the i.i.d. setup (Shevtsova, 2013) for n of the order of tens of thousands. When \({\lambda _{3,n} = 0}\), our bounds are smaller, highlighting improvements of the Berry-Esseen bounds for unskewed distributions. In parallel, the bounds are also reduced in the i.i.d. framework.
3 Improved bounds on \({\varvec{\Delta _{n,E}}}\) under assumptions on the tail behavior of \(\varvec{f_{S_{n}}}\)
In this section, we derive tighter bounds on \({\Delta _{n,E}}\) under additional regularity conditions on the tail behavior of the characteristic function of \(S_n\). They follow from Theorem 2, which provides an alternative upper bound on \({\Delta _{n,E}}\) that involves the tail behavior of \(f_{S_{n}}\). To state this theorem, let us introduce the terms \(r_{2,n}^{\text {inid,skew}}\), \(r_{2,n}^{\text {inid,noskew}}\), \(r_{2,n}^{\text {iid,skew}}\) and \(r_{2,n}^{\text {iid,noskew}}\)
and
Recall also that \(t_1^* \approx 0.64\) and let \(a_n {:=} 2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4\) and \(b_n {:=} 16\pi ^4n^2/\tilde{K}_{3,n}^4\). In practice, even for fairly small \(n\), \(a_n\) is equal to \(2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n}\).
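A quick computation illustrates the claim that the first branch of the minimum defines \(a_n\) even for fairly small \(n\); a sketch with the illustrative values \(n = 100\) and \(\tilde{K}_{3,n} = 2\):

```python
import math

T1_STAR = 0.64   # t_1^*, approximately 0.64 (Shevtsova, 2010)

def a_b(n, K3_tilde):
    """a_n = min(2 t_1^* pi sqrt(n) / K3_tilde, 16 pi^3 n^2 / K3_tilde^4)
    and b_n = 16 pi^4 n^2 / K3_tilde^4, as defined in the text."""
    first = 2 * T1_STAR * math.pi * math.sqrt(n) / K3_tilde
    second = 16 * math.pi ** 3 * n ** 2 / K3_tilde ** 4
    return min(first, second), 16 * math.pi ** 4 * n ** 2 / K3_tilde ** 4

a_n, b_n = a_b(100, 2.0)   # here the first branch (of order sqrt(n)) is the minimum
```

With these values, \(a_n = 6.4\pi \approx 20.1\) while the second branch is of order \(10^5\), so the minimum is indeed attained by the \(\sqrt{n}\) branch.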
Theorem 2
If Assumption 1 (resp. Assumption 2) holds and \(n \ge 3\), we have the bound
\[\Delta _{n,\text {E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253}{\pi } \int _{a_n}^{b_n} \frac{\left| f_{S_{n}}(t)\right| }{t} \, dt + r_{2,n},\]
where \(r_{2,n}\) is one of the four possible remainders \(r^{\text {inid,skew}}_{2,n}\), \(r^{\text {inid,noskew}}_{2,n}\), \(r^{\text {iid,skew}}_{2,n}\) or \(r^{\text {iid,noskew}}_{2,n}\), depending on whether Assumption 1 (“i.n.i.d.” case) or Assumption 2 (“i.i.d.” case) is satisfied and whether \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\) (“noskew” case) or not (“skew” case).
Remark 3
Assume that there exists a constant \(K_4\) such that \(K_{4,n} \le K_4\) for all \(n \ge 3\) (this is the case, for example, if the data is an i.i.d. sample from a given infinite homogeneous population). Then there exist constants \(C_6, C_7, C_8, C_9 > 0\) such that the remainder terms can be bounded in the following way: \(|r_{2,n}^{\textrm{inid,skew}}| \le C_6 n^{-5/4}\), \(|r_{2,n}^{\textrm{inid,noskew}}| \le C_7 n^{-3/2}\), \(|r_{2,n}^{\textrm{iid,skew}}| \le C_8 n^{-5/4}\), and \(|r_{2,n}^{\textrm{iid,noskew}}| \le C_9 n^{-2}\), for every \(n \ge 3\).
This theorem is proved in Section A.4 under Assumption 1 (resp. in Section A.5 under Assumption 2). The first term contains quantities that were already present in the term of order 1/n in the bound of Theorem 1: \(0.195K_{4,n}\) and \(0.038 \lambda _{3,n}^2\). By contrast, the other terms are encompassed in the integral term and in the remainder. Indeed, a careful reading of the proofs (see notably Section A.1, which outlines the structure of the proofs of all theorems) shows that the leading term \(0.1995 \, \widetilde{K}_{3,n} / \sqrt{n}\) in the bound (9) comes from choosing a free tuning parameter T of the order of \(\sqrt{n}\). Here, we make another choice of T such that this term becomes negligible. The cost of this change of T is the introduction of the integral term involving \(f_{S_{n}}\). The leading term of the bound thus depends on the tail behavior of \(f_{S_{n}}\).
Note that the result is obtained under the same conditions as Theorem 1, namely under moment conditions only. Nonetheless, it is mainly interesting combined with some assumptions on \({f_{S_{n}}}\) over the interval \([a_n, b_n]\), otherwise we do not have an explicit control on the integral term involving \({f_{S_{n}}}\). In the rest of this section, we present two possible assumptions on \({f_{S_{n}}}\) that yield such a control.
3.1 Polynomial tail decay on \(|f_{S_{n}}|\).
As a first regularity condition on \({f_{S_{n}}}\), we can assume a polynomial rate of decrease. Corollary 3 presents the resulting bound in the i.n.i.d. case. In fact, a similar condition could be invoked with i.i.d. data by requiring a polynomial decrease of the characteristic function of \(X_n / \sigma _n\). However, we present in Section 3.2 milder assumptions in the i.i.d. case that remain sufficient to obtain an explicit control of the tails of \(f_{S_{n}}\).
Corollary 3
Let \(n \ge 3\). If Assumption 1 holds and if there exist some positive constants \(C_0, p\) such that for all \(|t| \ge a_n\), \(|f_{S_{n}}(t)| \le C_0 |t|^{-p}\), then
\[\Delta _{n,\text {E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253 \, C_0}{\pi } \, a_n^{-p} + r_{3,n},\]
where \(r_{3,n} {:=} r_{2,n} - 1.0253 \, C_0 b_n^{-p}/\pi \).
Besides moment conditions, Corollary 3 requires a uniform control of \(f_{S_{n}}\) outside the interval \((-a_n,a_n)\). When \(\tilde{K}_{3,n} = o(\sqrt{n})\), \(a_n\) goes to infinity. In this case, the condition only bears on the tail of the characteristic function of \(S_n\) in a neighborhood of infinity, which makes it a weaker condition to impose.
Placing restrictions on the tails of \(f_{S_{n}}\) is not very common in statistical applications. However, this notion is closely related to the smoothness of the underlying distribution of \(S_n\). Proposition 21 in the Appendix (which builds upon classical results such as (Ushakov, 2011, Theorem 1.2.6)) shows that the tail condition on \(f_{S_{n}}\) is satisfied with \(p \ge 1\) whenever \(P_{S_n}\) has a density \(g_{S_n}\) that is \(p-1\) times differentiable and such that its \((p-1)\)-th derivative is of bounded variation with total variation \(V_n {:=} \textrm{Vari}[g_{S_n}^{(p-1)}]\) uniformly bounded in n. In such situations, we can take \(C_0 = 1 \vee \sup _{n \in \mathbb {N}^{*}}V_n\).
Although Corollary 3 is valid for every positive p, it is only an improvement on the results of the previous section under the stricter condition \({p > 1}\), a situation in which \(P_{S_n}\) admits a density with respect to Lebesgue’s measure (second part of Proposition 21). In particular when \(p=2\), \(a_n^{-p}\) is exactly proportional to \(n^{-1}\) and we obtain
for every \(n \ge 3\) and for some constant \(C_9 > 0\). When \({p > 2}\), \(a_n^{-p}\) becomes negligible compared to \(n^{-5/4}\) so that there exists a constant \(C_{11} > C_{10}\) satisfying
Combining these bounds on \({\Delta _{n,E}}\) with the expression of the Edgeworth expansion translates into upper bounds on \(\Delta _{n,B}\) of the form
\[\Delta _{n,B} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + \frac{C_{12}}{n},\]
for some constant \(C_{12} > 0\). As soon as the term \(C_{12}/n\) gets smaller than \(0.0413 K_{3,n}/\sqrt{n}\), the bound on \(\Delta _{n,B}\) becomes much better than \(0.5583 K_{3,n}/\sqrt{n}\) or \(0.4690 K_{3,n}/\sqrt{n}\). This can happen even for sample sizes n of the order of a few thousands, assuming that \(K_{3,n}\) and \(K_{4,n}\) are reasonable (e.g. \(K_{4,n} \le 9\)). When \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\), we remark that \(\Delta _{n,B} = {\Delta _{n,E}}\), meaning that we obtain a bound on \(\Delta _{n,B}\) of order \(n^{-1}\).
We confirm these rates through a numerical application in Example 2 for the specific choices \(C_0 = 1\) and \(p = 2\). These choices are satisfied for common distributions such as the Laplace distribution (for which these values of \(C_0\) and p are sharp) and the Gaussian distribution. This actually opens the way for another restriction on the tails of \(f_{S_{n}}\): we could impose \(|f_{S_{n}}(t)| \le \max _{1 \le r \le M}|\rho _r(t)|\) for all \(|t| \ge a_n\) and for \((\rho _r)_{r=1, \dots , M}\) a family of known characteristic functions. This second suggestion boils down to a semiparametric assumption on \(P_{S_n}\): \(f_{S_{n}}\) is assumed to be controlled in a neighborhood of \(\pm \infty \) by the behavior of at least one of the M characteristic functions \((\rho _r)_{r=1, \dots , M},\) but \(f_{S_{n}}\) need not be exactly one of those M characteristic functions. This semiparametric restriction becomes less and less stringent as n increases since we need to control \(f_{S_{n}}\) on a region that vanishes as n goes to infinity. Since \(S_n\) is centered and of variance 1 by definition, the choice of possible \(\rho _r\) is naturally restricted to the set of characteristic functions that correspond to such standardized distributions.
3.2 Alternative control of \(|f_{S_{n}}|\) in the i.i.d. case.
We state a second corollary that deals with the i.i.d. framework. We define the following quantity \(\kappa _n {:=} \sup _{t: \, |t| \ge a_n/\sqrt{n}} |f_{X_n/\sigma _n}(t)|\) and let \(c_n {:=} b_n/a_n\). Under Assumption 2, we remark that \(\sup _{t: \, |t|\ge a_n} |f_{S_{n}}(t)| = \kappa _n^n.\)
Corollary 4
Let \(n \ge 3\). Under Assumption 2,
\[\Delta _{n,\text {E}} \le \frac{0.195 \, K_{4,n} + 0.038 \, \lambda _{3,n}^2}{n} + \frac{1.0253}{\pi } \, \kappa _n^n \log (c_n) + r_{2,n}.\]
Furthermore, \(\kappa _n < 1\) as soon as \(P_{X_n/\sigma _n}\) has an absolutely continuous component.
Note that for any given \(s > 0\) and any random variable Z, \(\sup _{t: |t| \ge s} |f_Z(t)| = 1\) if and only if \(P_{Z}\) is a lattice distribution, i.e., concentrated on a set of the form \(\{a + nh, n \in \mathbb {Z} \}\) (Ushakov, 2011, Theorem 1.1.3). Therefore, \(\kappa _n < 1\) as soon as the distribution is not lattice, which is the case for any distribution with an absolutely continuous component.
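This dichotomy is easy to visualize numerically: for the (lattice) Rademacher distribution, \(f_Z(t) = \cos (t)\) and the supremum equals 1, while for the standard Gaussian it equals \(e^{-s^2/2} < 1\). A grid-based sketch (the grid parameters are illustrative):

```python
import math

def sup_abs_cf(cf, s, t_max=50.0, steps=200_000):
    """Grid approximation of sup_{|t| >= s} |f_Z(t)| for an even |f_Z|,
    restricted to the window [s, t_max]."""
    return max(abs(cf(s + (t_max - s) * k / steps)) for k in range(steps + 1))

rademacher = sup_abs_cf(math.cos, s=1.0)                       # lattice on {-1, 1}: sup = 1
gaussian = sup_abs_cf(lambda t: math.exp(-t * t / 2), s=1.0)   # = e^{-1/2}, about 0.607
```

The Gaussian supremum is attained at the threshold t = s because \(|f_Z|\) is decreasing there, whereas \(|\cos (t)|\) returns to 1 at every multiple of \(\pi \).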
In Corollary 4, the first term on the right-hand side of the inequality as well as \(r_{2,n}\) are unchanged compared to Theorem 2 and Corollary 3. The second term on the right-hand side of the inequality, \((1.0253/\pi )\kappa _n^n\log (c_n),\) corresponds to an upper bound on the integral term of Eq. 15 in Theorem 2. Imposing \(K_{4,n} \le K_4\), we can only claim that \(1.0253 \, \kappa _n^n\log (c_n)/\pi \le C_{13} \kappa _n^n\log n\) for some constant \(C_{13} > 0\) which does not provide an explicit rate on \({\Delta _{n,E}}\). If we also assume \(\sup _{n\ge 3}\kappa _n<1\) then we can write
for some constant \(C_{14} > 0\), and
for some constant \(C_{15} > 0\).
When is the assumption \(\sup _{n\ge 3}\kappa _n < 1\) reasonable? First, it always holds in the i.i.d. setting when the distribution of the \((X_i)_{i = 1, \ldots , n}\) is continuous and does not depend on n. By definition of \(a_n\) and by the fact that \(\tilde{K}_{3,n} \ge 1\), \(a_n/\sqrt{n}\) is larger than \(2 t_1^* \pi \) for n large enough. Consequently, \(\kappa _n\) is upper bounded by \(\kappa {:=} \sup _{t: \, |t| \ge 2t_1^*\pi } |f_{X_1/\sigma _1}(t)|\) for n large enough. In this case, if \(P_{X_1/\sigma _1}\) has an absolutely continuous component, \(\kappa < 1\). For smaller \(n\), we use the fact that \(\kappa _n < 1\) for every \(n\), as explained right after Corollary 4. The value of \(\kappa \) depends on the distribution \(P_{X_1/\sigma _1}\). The closer \(\kappa \) gets to one, the less regular \(P_{X_1/\sigma _1}\) is, in the sense that the latter becomes hardly distinguishable from a lattice distribution.
Second, we could impose that the characteristic function \(f_{X_n/\sigma _n}\) be controlled by some finite family of known characteristic functions \(\rho _1, \dots , \rho _M\) (independent of n) beyond \(a_n / \sqrt{n}\). This follows the suggestion mentioned after Corollary 3, except that we now obtain an exponential upper bound instead of a polynomial one. Indeed, for n large enough, \(\kappa _n \le \kappa {:=} \sup _{t:|t|\ge 2t_1^*\pi }\max _{1\le m \le M}|\rho _m(t)|\) and \(\kappa < 1\) provided that \((\rho _m)_{m=1,\dots ,M}\) are characteristic functions of continuous distributions.
In Example 2, we plot our bounds on \(\Delta _{n,B}\) by imposing the restriction \(\kappa _n \le 0.99\), which we argue is a very reasonable choice. To justify this claim, we compare our restriction to the value of \(\kappa _n\) we would get if \(X_n/\sigma _n\) were standard Laplace, a distribution whose characteristic function has much fatter tails than that of the standard Gaussian or Logistic, for instance. In fact, if we were to compute \(\sup _{t:|t|\ge 2t_1^*\pi }|\rho (t)|\) with \(\rho \) the characteristic function of a standard Laplace distribution, we would get \(\kappa _n < 0.11\). Despite our fairly conservative bound on \(\kappa _n\), we witness considerable improvements of our bounds compared to those given in Section 2.
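This tail supremum is easy to compute in the Laplace case: the standard Laplace characteristic function is \(\rho(t) = 1/(1+t^2)\), which is even and decreasing in \(|t|\), so the supremum over \(|t| \ge s\) is attained at \(t = s\). A minimal sketch (the numerical threshold below is a stand-in for the paper's \(2 t_1^* \pi\), which is defined earlier in the article):

```python
def laplace_cf_tail_sup(s):
    """Sup of |rho(t)| over |t| >= s for the standard Laplace
    characteristic function rho(t) = 1/(1 + t^2).  Since |rho| is even
    and decreasing in |t|, the supremum is attained at t = s."""
    return 1.0 / (1.0 + s * s)

# With a threshold around 2.85 (a placeholder for 2 * t1_star * pi),
# the tail supremum falls below 0.11, far below the cap 0.99.
print(laplace_cf_tail_sup(2.85))
```

This illustrates how far the conservative restriction \(\kappa_n \le 0.99\) sits from the value delivered by a fairly heavy-tailed characteristic function.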
Example 2
(Implementation of our bounds on \(\Delta _{n,B}\)) We compare the bounds on \(\Delta _{n,B}\) obtained in Corollaries 3 and 4 to \(0.5583 K_{3,n}/\sqrt{n}\) and \(0.4690 K_{3,n}/\sqrt{n}.\) As in Example 1, we fix \(K_{4,n} \le 9\), which is enough to control \(K_{3,n}\) (see Section 4.1). As explained above, we set \(p = 2\) and \(C_0 = 1\) to apply Corollary 3 and \(\kappa = 0.99\) for Corollary 4.
- Corollary 3 (i.n.i.d.): \(\Delta _{n,B} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + \frac{0.195 \, K_{4,n} + 0.0147 \, K_{3,n}^2}{n} + \frac{1.0253}{\pi } a_n^{-2} + r_{3,n}\)
- Corollary 3 (i.n.i.d., unskewed): \(\Delta _{n,B} \le \frac{0.195 \, K_{4,n}}{n} + \frac{1.0253}{\pi } a_n^{-2} + r_{3,n}\)
- Corollary 4 (i.i.d.): \(\Delta _{n,B} \le \frac{0.0413 \, K_{3,n}}{\sqrt{n}} + \frac{0.195 \, K_{4,n} + 0.0147 \, K_{3,n}^2}{n} + \frac{1.0253 \, \kappa _n^n\log (c_n)}{\pi } + r_{2,n}\)
- Corollary 4 (i.i.d., unskewed): \(\Delta _{n,B} \le \frac{0.195 \, K_{4,n}}{n} + \frac{1.0253 \, \kappa _n^n\log (c_n)}{\pi } + r_{2,n}\)
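The main terms of the Corollary 4 (i.i.d.) bounds can be evaluated numerically once the moment bounds are fixed. A sketch, under the following assumptions: the remainder \(r_{2,n}\) is omitted, and \(\log(c_n)\), a quantity defined in the paper, is treated as a user-supplied input:

```python
import math

def corollary4_main_terms(n, K3, K4, kappa, log_cn, unskewed=False):
    """Main terms of the Corollary 4 (i.i.d.) bound on Delta_{n,B}.
    The remainder r_{2,n} is omitted; log_cn stands for log(c_n),
    which is defined in the paper and treated as an input here."""
    regularity = (1.0253 / math.pi) * kappa**n * log_cn
    if unskewed:
        return 0.195 * K4 / n + regularity
    return (0.0413 * K3 / math.sqrt(n)
            + (0.195 * K4 + 0.0147 * K3**2) / n
            + regularity)
```

With \(\kappa = 0.99\), the regularity term \(\kappa^n \log(c_n)\) decays geometrically, which is why the bounds in Figure 2 improve sharply for moderate and large \(n\).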
Figure 2 displays the different bounds that we obtain as a function of the sample size n, alongside the existing bounds (Shevtsova, 2013) that do not assume such regularity conditions. The new bounds take advantage of these regularity conditions and are therefore tighter in all settings for \(n\) larger than \(10,000\). In the unskewed case, the improvement arises for much smaller \(n\) and the rate of convergence improves from \(1/\sqrt{n}\) to 1/n.
4 Practical considerations
4.1 Default value \(K_{4,n} \le 9\) or “Plug-in” approach
As seen in the previous examples, explicit values or bounds on some functionals of \(P_{S_n}\) are required to compute our non-asymptotic bounds on a standardized sample mean. This requirement is not unique to our bounds and arises for any Berry-Esseen- or Edgeworth-type bound. A value of, or a bound on, \(K_{3,n}\) is indeed required to compute existing Berry-Esseen bounds, from the seminal works of Berry (1941) and Esseen (1942) to their recent improvements (e.g., Shevtsova (2013)). Like ours, the recent extensions of these bounds proposed in Adell and Lekuona (2008), Boutsikas (2011) and Zhilova (2020) also depend on several (potentially unknown) moments of the distributions.
Under moment conditions only, the main term and remainder \(r_{1,n}\) of Theorem 1 solely depend on \(\lambda _{3,n}\), \(K_{3,n}\) or \(\tilde{K}_{3,n}\), and \(K_{4,n}\). As a matter of fact, a bound on \(K_{4,n}\) is sufficient to control all those quantities: Pinelis (2011) ensures \(|\lambda _{3,n}| \le 0.621 K_{3,n}\), and a convexity argument yields \(K_{3,n} \le K_{4,n}^{3/4}\) (and remember that \(\tilde{K}_{3,n}\) is lower than \(2 K_{3,n}\) in the i.n.i.d. case and \(K_{3,n} + 1\) in the i.i.d. case). Having access to a bound on \(K_{4,n}\) is thus crucial to compute our bounds in practice.
First, in some situations, one may rely on auxiliary information about the distribution. In the i.i.d. case in particular, we note that imposing the bound \({K_{4,n} \le 9}\) allows for a wide family of distributions used in practice: any Gaussian, Gumbel, Laplace, Uniform, or Logistic distribution satisfies it, as well as any Student with at least 5 degrees of freedom, any Gamma or Weibull with shape parameter at least 1. In this case, remember that \(K_{4,n}\) is the kurtosis of \(X_n\), a natural and well-studied feature of a distribution.
In the i.n.i.d. case, \(K_{4,n}\) can be rewritten as a weighted average of the individual kurtoses. In that respect, the bound \({K_{4,n} \le 9}\) indicates that, on average, the individual kurtoses are lower than \(9\).
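The kurtosis values behind the default bound \(K_{4,n} \le 9\) can be checked numerically; note that scipy reports excess kurtosis, so we add 3:

```python
from scipy import stats

# Kurtosis (= excess kurtosis + 3) of distributions covered by the
# default bound K_{4,n} <= 9.
distributions = {
    "Gaussian": stats.norm(),
    "Gumbel": stats.gumbel_r(),
    "Laplace": stats.laplace(),
    "Uniform": stats.uniform(),
    "Logistic": stats.logistic(),
    "Student(df=5)": stats.t(5),       # kurtosis 3 + 6/(5 - 4) = 9, sharp
    "Gamma(shape=1)": stats.gamma(1),  # = Exponential(1), kurtosis 9, sharp
}
for name, dist in distributions.items():
    k4 = float(dist.stats(moments="k")) + 3  # scipy returns excess kurtosis
    assert k4 <= 9 + 1e-9, name
    print(f"{name}: K4 = {k4:.2f}")
```

The Student(df = 5) and Gamma(shape = 1) cases show that the constant 9 is attained, so it cannot be lowered without excluding distributions from the list above.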
Second, if a bound on \(K_{4,n}\) is not available, a “plug-in” approach remains applicable. The idea is to estimate the moments \(\lambda _{3,n}\), \(K_{3,n}\) and \(K_{4,n}\) by their empirical counterparts in the data (method of moments estimation), and then compute \(\delta _n\) by replacing the unknown needed quantities with those estimates. We acknowledge that this type of “plug-in” approach is only approximately valid, although somewhat unavoidable when bounds on the unknown moments are not given to the researcher.
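A plug-in implementation of these moment estimates might look as follows (a sketch: simple method-of-moments counterparts of \(\lambda_{3,n}\), \(K_{3,n}\), and \(K_{4,n}\) in the i.i.d. case):

```python
import numpy as np

def plugin_moments(y):
    """Empirical counterparts of lambda_{3,n} (skewness),
    K_{3,n} (standardized absolute third moment) and
    K_{4,n} (kurtosis), computed from an i.i.d. sample y."""
    x = y - np.mean(y)
    sigma = np.sqrt(np.mean(x**2))
    z = x / sigma
    lam3 = np.mean(z**3)         # skewness estimate
    K3 = np.mean(np.abs(z)**3)   # absolute third moment estimate
    K4 = np.mean(z**4)           # kurtosis estimate
    return lam3, K3, K4

rng = np.random.default_rng(0)
lam3, K3, K4 = plugin_moments(rng.normal(size=100_000))
# For Gaussian data: lam3 near 0, K3 near 2*sqrt(2/pi) ~ 1.60, K4 near 3.
print(lam3, K3, K4)
```

The resulting estimates are then substituted for the unknown moments when evaluating \(\delta_n\), with the caveat on approximate validity noted above.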
In addition to the dependence on these moment bounds, Theorem 2 involves the integral \(\int _{a_n}^{b_n} |f_{S_{n}}(t)| / t \, dt\) that depends on the a priori unknown characteristic function of \(S_n\). The application of the resulting Corollaries 3 and 4 requires a control on the tail of this characteristic function through the quantities \(C_0\) and \(p\) in the i.n.i.d. case (respectively \(\kappa _n\) in the i.i.d. case), which can be given using expert knowledge of the regularity of the density of \(S_n\), as discussed in Section 3. It is also possible to estimate the integral directly, for instance using the empirical characteristic function (Ushakov, 2011, Chapter 3).
4.2 Numerical comparisons of our bounds on \(\mathbb {P}(S_n \le x)\) and existing ones
To give a better sense of the accuracy of our results, we compare our bounds on \(x \mapsto \mathbb {P}(S_n \le x)\) with the existing ones (Shevtsova, 2013). Indeed, a control \(\delta _n\) on \({\Delta _{n,E}}\) (respectively \(\Delta _{n,B}\)) naturally yields upper and lower brackets on \(\mathbb {P}(S_n \le x)\) of the form \(\left[ \Phi (x) + \lambda _{3,n} / (6 \sqrt{n}) \times (1 - x^2) \varphi (x)\right] \pm \delta _n\) (respectively \(\Phi (x) \pm \delta _n\)), for any real \(x\). We plot those upper and lower brackets in the i.i.d. framework for three distinct distributions: Student distributions with 5 (Figure 3) or 8 (Figure 4) degrees of freedom and an Exponential distribution with expectation equal to 1, re-centered to fall in our framework (Figure 5). These three distributions are absolutely continuous with respect to the Lebesgue measure, which allows us to resort to our sharpest i.i.d. bounds, namely those presented in Corollary 4 (compared to Figures 1 and 2, we only report those improved bounds here). On the contrary, remember that the existing bounds (Shevtsova, 2013) assume finite third-order moments only; hence, they do not leverage the additional information about the skewness and regularity of the considered distributions.
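The brackets above can be computed directly; a minimal sketch, where `delta_n` is whichever bound on \(\Delta_{n,E}\) is being used:

```python
import math

def edgeworth_bracket(x, lam3, n, delta_n):
    """Bracket [center - delta_n, center + delta_n] on P(S_n <= x),
    where the center is the first-order Edgeworth expansion
    Phi(x) + lam3 / (6 sqrt(n)) * (1 - x^2) * phi(x)."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # N(0,1) density
    Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))          # N(0,1) cdf
    center = Phi + lam3 / (6 * math.sqrt(n)) * (1 - x * x) * phi
    return center - delta_n, center + delta_n
```

For \(\lambda_{3,n} = 0\) the bracket is centered at \(\Phi(x)\), which is the situation plotted for the Student distributions.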
The bound \(\delta _n\) depends on various features of the distribution of \(S_n\). In line with Example 2, we set \(\kappa = 0.99\), which happens to be a conservative choice with those distributions as \(\kappa = 0.42\) for a Student(df = 8), 0.54 for a Student(df = 5), and 0.63 for the Exponential distributions we consider. In the following comparisons, we focus on the impact of the unknown moments \(K_{4,n}\), \(K_{3,n}\), and \(\lambda _{3,n}\) on the accuracy of our bounds.
The Student distributions illustrate the unskewed case, where our bounds use the information \(\lambda _{3,n} = 0\). Figures 3 and 4 report several bounds contrasting the suggested practical choice \(K_{4,n} \le 9\), which deals with the fact that moments are unknown, with the “oracle” bounds where we use the true values of \(\lambda _{3,n}\), \(K_{3,n}\), and \(K_{4,n}\) (computed or approximated by Monte-Carlo). As a comparison, we also report two versions of the existing bound: a “practical” one using \(K_{3,n} \le K_{4,n}^{3/4} \le 9^{3/4}\), and an “oracle” version using the true value of \(K_{3,n}\). The kurtosis of a Student distribution is equal to \(3 + 6/(\text {df} - 4)\), where \(\text {df} > 4\) denotes its number of degrees of freedom. Therefore, for any Student with at least 5 degrees of freedom, the upper bound \(K_{4,n} \le 9\) is valid, but increasingly conservative as \(\text {df}\) grows. We consider two different values of \(\text {df}\) to assess the loss of accuracy of our bounds when the discrepancy between the actual \(K_{4,n}\) and our suggested default choice of \(9\) increases.
In Fig. 4, we choose \(\text {df} = 8\) so that the true value is \(K_{4,n} = 4.5\) and the proposed bound \(K_{4,n} \le 9\) is thus conservative. On the contrary, in Figure 3, because \(\text {df} = 5\), the true value of \(K_{4,n}\) is equal to the suggested choice of \(9\), which becomes sharp. In that respect, it is a more favorable situation. Nonetheless, remark that there remains a difference between the “practical” and “oracle” versions of our bounds: the latter uses the true value of \(K_{3,n}\) (here, approximately equal to \(1.8\)) while the former controls \(K_{3,n}\) by \(9^{3/4} \approx 5.2\).
The Exponential distribution displayed in Figure 5 illustrates our bounds for a skewed distribution. We choose an Exponential distribution with expectation equal to 1. This distribution has a kurtosis \(K_{4,n} = 9\) so that the main difference with Figure 3 can be expected to stem from the presence of skewness. In line with the Student case, we report two versions of Shevtsova’s bounds and ours, a practical version which uses only the information \(K_{4,n} \le 9\) and an “oracle” one based on knowledge of \(\lambda _{3,n}\), \(K_{3,n}\) and \(K_{4,n}\). We recall that \(\Delta _{n,B} \ne {\Delta _{n,E}}\) when \(\lambda _{3,n} \ne 0\). What is more, the existing bounds (plotted in red) are bounds on \(\Delta _{n,B}\) whereas ours (in green) originate from a control of \(\Delta _{n,E}\).
The “oracle” version can be interpreted as a noise-free implementation of the plug-in approach. We remark that oracle versions of existing bounds and ours are twice as accurate as their counterparts which rely on \(K_{4,n} \le 9\). These oracle bounds use by definition the true values of the moments, and therefore correspond to the most favorable case, in the sense of the tightness of the bounds.
5 Non-asymptotic behavior of one-sided tests
We now examine some implications of our theoretical results for the non-asymptotic validity of one-sided statistical tests based on the Gaussian approximation of the distribution of a sample mean using i.i.d. data.
Let \((Y_i)_{i=1, \dots , n}\) be an i.i.d. sequence of random variables with expectation \(\mu \), known variance \(\sigma ^2\), and finite fourth moment, with \(K_4 {:=} \mathbb {E}\left[ (Y_n-\mu )^4\right] /\sigma ^4\) the kurtosis of the distribution of \(Y_n\). We want to conduct a test of the null hypothesis \({H_0 : \mu \le \mu _0}\), for some fixed real number \(\mu _0\), against the alternative \({H_1 : \mu > \mu _0}\) with a type I error at most \(\alpha \in (0,1)\), and ideally equal to \(\alpha \). The classical approach to this problem (the Gauss test) amounts to comparing \({S_n = \sum _{i=1}^n X_i / \sqrt{n}}\), where \(X_i {:=} (Y_i - \mu _0) / \sigma \), with the \(1-\alpha \) quantile of the \(\mathcal {N}(0,1)\) distribution, \(q_{\mathcal {N}(0,1)}(1-\alpha )\), and rejecting \(H_0\) if \(S_n\) is larger. We study this Gauss test in a general non-asymptotic framework without imposing Gaussianity of the data distribution, and we control the difference with respect to normality using the bounds developed in the previous sections.
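For concreteness, the Gauss test just described can be sketched as:

```python
import math
from scipy.stats import norm

def gauss_test(y, mu0, sigma, alpha=0.05):
    """One-sided Gauss test of H0: mu <= mu0 against H1: mu > mu0,
    assuming known variance sigma^2: reject when the standardized
    sum S_n exceeds the (1 - alpha) standard normal quantile."""
    n = len(y)
    s_n = sum((yi - mu0) / sigma for yi in y) / math.sqrt(n)
    return s_n > norm.ppf(1 - alpha), s_n

# Example: a sample clearly above mu0 = 0 leads to rejection,
# since S_n = 25 / sqrt(25) = 5 > 1.645.
reject, s_n = gauss_test([1.0] * 25, mu0=0.0, sigma=1.0)
print(reject, s_n)
```

The rest of this section quantifies how far the actual rejection probability of this test can be from its nominal level \(\alpha\).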
5.1 Computation of sufficient sample sizes
In certain fields such as medicine or economics, researchers routinely set up experiments that seek to answer a specific question on an explained variable \(Y\). The number of individuals included in the experiment has to be carefully justified as large-scale analyses are very costly. This is typically done through the construction of a so-called “pre-analysis plan” which presents the sample size needed to detect a given effect with a pre-specified testing power \(\beta \in (0,1)\).
In the Gauss test setting considered here, the researcher determines the effect of interest by fixing a particular alternative hypothesis \(H_{1, \eta }: \mu = \mu _0 + \sigma \eta \) (with \(\mu >\mu _0\)). The quantity \(\eta {:=} (\mu - \mu _0) / \sigma \) is a positive number called the effect size that indicates how far away (in terms of standard deviations) the alternative hypothesis is, compared to the null hypothesis \(H_0: \mu \le \mu _0\). Remark that in our framework, \(H_{1, \eta }\) is formally the set of all distributions with mean \(\mu \), variance \(\sigma ^2\), that satisfy our additional moment and regularity conditions. \(H_{1, \eta }\) can be seen as a nonparametric class of distributions at a fixed distance \(\eta \) of the null hypothesis.
Researchers usually rely on an asymptotic normal approximation to infer the sample size needed to detect a given effect at power \(\beta \). Our results allow us to bypass this asymptotic approximation and to propose a procedure to choose the sample size n of the experiment such that
for any distribution belonging to the alternative hypothesis space. Any n that satisfies Eq. 16 for all distributions in the alternative hypothesis is called a (non-asymptotic) sufficient sample size for the effect size \(\eta \) at power \(\beta \).
Observe that
where \(X_i {:=} (Y_i - \mu ) / \sigma \) are centered with mean 0 and variance 1 and \(x_n {:=} q_{\mathcal {N}(0,1)}(1-\alpha ) - \eta \sqrt{n}\). We remind the reader that the general result from Theorem 1 or Corollary 4 implies the following upper and lower bounds for every \(x \in \mathbb {R}\) and \(n \ge 3\),
where \(\delta _n\) is the corresponding bound on \(\Delta _{n,E}\). From Eq. 17, we thus obtain
Therefore,
As a consequence, the sample size \(n = n_{\eta , \beta }\) defined as the solution of the following equation
is a non-asymptotic sufficient sample size. Note that the same reasoning can also be applied if we only impose an upper bound on \(\lambda _{3,n}\). In particular, if we only know \(K_{4,n}\), we can use the bound \(0.621 K_{4,n}^{3/4}\), and a sufficient sample size n can then be found as the solution to
Numerical applications can be found in Table 2, which displays the computed sample sizes for different choices of the effect size \(\eta \) and of the power \(\beta \). In this experiment, we choose \(K_{4,n} \le 9\) and \(\kappa \le 0.99\), as before. We observe that, as expected, \(n_{\eta , \beta }\) increases with \(\beta \) and decreases with \(\eta \). For \(\eta \) large enough, \(n_{\eta , \beta }\) becomes approximately constant in \(\eta \) as Eq. 18 simplifies to \(1 - \delta _n = \beta .\) Conversely, it is also possible to use Eq. 18 directly to compute the power for different effects and sample sizes. The results are displayed in Table 3.
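The search for a sufficient sample size can be sketched as follows. This is a simplified version of the procedure above: `delta_n` is any user-supplied bound on \(\Delta_{n,E}\), and the skewness term of the Edgeworth expansion is assumed absorbed into it (as in the unskewed case), so the power requirement reads \(\Phi(\eta\sqrt{n} - q_{1-\alpha}) - \delta_n \ge \beta\):

```python
from scipy.stats import norm

def sufficient_sample_size(eta, beta, alpha, delta_n, n_max=10**6):
    """Smallest n with Phi(eta*sqrt(n) - q_{1-alpha}) - delta_n(n) >= beta.
    delta_n is a callable returning a valid bound on Delta_{n,E} at
    sample size n (skewness term assumed absorbed into delta_n)."""
    q = norm.ppf(1 - alpha)
    for n in range(2, n_max):
        if norm.cdf(eta * n**0.5 - q) - delta_n(n) >= beta:
            return n
    return None

# With delta_n = 0 we recover the textbook Gaussian power calculation:
print(sufficient_sample_size(0.5, 0.8, 0.05, lambda n: 0.0))  # 25
```

Passing a nonzero \(\delta_n\) (for instance one of the Corollary 4 bounds) inflates the required sample size and makes it a genuinely non-asymptotic guarantee.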
5.2 Assessing the lack of information
As explained below, the non-asymptotic bounds introduced in Sections 2 and 3 can be used to evaluate the actual (for a finite sample size) level of our one-sided test of interest.
Recall that Berry-Esseen-type inequalities aim to bound \(\Delta _{n,B}\), defined in (2), the uniform distance between \(\mathbb {P}(S_n \le \cdot )\) and \(\Phi (\cdot )\). In particular, for a nominal level \(\alpha \), we thus have
where the probability operator is to be understood under any data-generating process such that \(\mu = \mu _0\), to be as close as possible to the alternative hypothesis \(H_1\). Either “classical” Berry-Esseen inequalities or ours obtained through an Edgeworth expansion provide bounds on \(\Delta _{n,B}\) (see the different bounds displayed in Examples 1 and 2 in the i.i.d. case). In this context, a bound on \(\Delta _{n,B}\) is said to be uninformative when it is larger than \(\alpha \). Indeed, in that case, we cannot exclude that \(\mathbb {P}\!\left( S_n \le q_{\mathcal {N}(0,1)}(1-\alpha ) \right) \) is arbitrarily close to 1, or equivalently, that the probability to reject \(H_0\) is arbitrarily close to 0, and therefore that the test is arbitrarily conservative (type I error arbitrarily smaller than the nominal level \(\alpha \)). We denote by \(n_{\max }(\alpha )\) the largest sample size n for which the bound is uninformative. Intuitively, \(n_{\max }(\alpha )\) indicates the sample size above which the asymptotic normal approximation to the distribution of \(S_n\) becomes sensible under the assumptions used to bound \(\Delta _{n,B}\). Indeed, \(n_{\max }(\alpha )\) is specific to the bound \(\delta _n\) used, which itself depends on various features of the distribution: number of finite moments, (lack of) skewness, regularity, etc. Table 4 reports the value of \(n_{\max }(\alpha )\) for different Berry-Esseen bounds and usual nominal levels \(\alpha \in \{0.10, 0.05, 0.01\}\).
For each bound, \(n_{\max }(\alpha )\) is decreasing in \(\alpha \). For \(\alpha = 0.01\) in particular, the situation deteriorates strikingly except in the most favorable case of a regular and unskewed distribution. With our bounds, the presence or absence of skewness strongly influences \(n_{\max }(\alpha )\). We also remark that imposing the additional regularity assumption introduced in Section 3 significantly lowers \(n_{\max }(\alpha )\).
5.3 Distortions of the level of the test and of the p-values
We now explain that our non-asymptotic bounds on the Edgeworth expansion can be used to detect whether the test is conservative or liberal. This goes one step further than merely checking whether it is arbitrarily conservative or not. Eq. 17 shows that \(\mathbb {P}(S_n\le x)\) belongs to the interval
which is not centered at \(\Phi (x)\) whenever \(\lambda _{3,n} \ne 0\) and \(x \ne \pm \, 1\). The length of the interval does not depend on x and shrinks at speed \(\delta _n\). On the contrary, its location depends on x. For a given nonzero skewness \(\lambda _{3,n}\) and sample size \(n\), the middle point of \(\mathcal {I}_{n,x}\) is all the more shifted away from the asymptotic approximation \(\Phi (x)\) as \( (1-x^2)\varphi (x)\) is large in absolute value. The function \(x \mapsto (1-x^2)\varphi (x)\) has a global maximum at \(x=0\) and minima at the points \(x \approx \pm \, 1.73\). Consequently, irrespective of n, the largest gaps between \(\mathbb {P}(S_n\le x)\) and \(\Phi (x)\) may be expected around \(x=0\) or \(x = \pm \, 1.73\). \(\Phi (x)\) could even lie outside \(\mathcal {I}_{n,x}\), in which case \(\mathbb {P}( S_n \le x )\) has to be either strictly smaller or larger than \(\Phi (x)\). More precisely, \(\mathbb {P}( S_n \le x )\) is all the further from its normal approximation \(\Phi (x)\) as the skewness \(\lambda _{3,n}\) is large in absolute value; whether \(\mathbb {P}( S_n \le x )\) is strictly smaller or larger than \(\Phi (x)\) depends on the sign of \(1 - x^2\), as developed in Table 5.
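The location of these extrema can be checked directly: differentiating gives \(\frac{d}{dx}\big[(1-x^2)\varphi(x)\big] = x(x^2-3)\varphi(x)\), which vanishes at \(x = 0\) and \(x = \pm\sqrt{3} \approx \pm 1.73\). A quick numerical verification:

```python
import numpy as np

# g(x) = (1 - x^2) * phi(x): global maximum at 0, minima at +/- sqrt(3).
x = np.linspace(-4, 4, 800_001)
g = (1 - x**2) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

print(x[np.argmax(g)])  # ~ 0
print(x[np.argmin(g)])  # ~ -1.732 (and +1.732 by symmetry)
```

This confirms that the shift of the interval \(\mathcal{I}_{n,x}\) is most pronounced around \(x = 0\) and \(x \approx \pm 1.73\).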
These observations allow us to quantify possible non-asymptotic distortions between the nominal level and actual rejection rate of the one-sided test we consider. Let us set \(x = q_{\mathcal {N}(0,1)}(1-\alpha )\) (henceforth denoted \(q_{1-\alpha }\) to lighten notation), which implies that \(\Phi (x) = 1- \alpha \). Here, we focus solely on the case \(|q_{1-\alpha }|>1\) to encompass all tests with nominal level \(\alpha \le 0.15\), thus in particular the conventional levels 10%, 5%, and 1%. When \(\lambda _{3,n} > 6\sqrt{n}\delta _n/\big ((q_{1-\alpha }^2-1)\varphi (q_{1-\alpha })\big )\), we conclude that \(\mathbb {P}\left( S_n \le q_{1-\alpha } \right) < 1-\alpha \). Since the event \( \{ S_n \le q_{1-\alpha } \} \) is the complement of the rejection region, the probability of rejecting \(H_0\) under the null exceeds \(\alpha \); in other words, the test cannot guarantee its stated control \(\alpha \) on the type I error and is said to be liberal. Conversely, when \(\lambda _{3,n} < 6\sqrt{n}\delta _n/\big ((1-q_{1-\alpha }^2)\varphi (q_{1-\alpha })\big )\), the probability \(\mathbb {P}\left( S_n \le q_{1-\alpha } \right) \) has to be larger than \(1-\alpha \); equivalently, the probability to reject under the null is below \(\alpha \) so that the test is conservative.
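The two skewness thresholds above are straightforward to evaluate; a sketch:

```python
from scipy.stats import norm

def skewness_thresholds(n, alpha, delta_n):
    """Thresholds on lambda_{3,n} beyond which the one-sided test is
    provably liberal (skewness above t_lib) or conservative (skewness
    below t_cons), valid for alpha <= 0.15 so that q_{1-alpha} > 1."""
    q = norm.ppf(1 - alpha)
    t_lib = 6 * n**0.5 * delta_n / ((q**2 - 1) * norm.pdf(q))
    t_cons = 6 * n**0.5 * delta_n / ((1 - q**2) * norm.pdf(q))  # = -t_lib
    return t_lib, t_cons

t_lib, t_cons = skewness_thresholds(n=100, alpha=0.05, delta_n=0.01)
print(t_lib, t_cons)
```

The two thresholds are symmetric about zero; the interval between them is the region where the sign of the level distortion cannot be determined from the bound \(\delta_n\) alone.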
The distortion can also be seen in terms of p-values. In the one-sided test we consider, the p-value is \({pval {:=} 1 - \mathbb {P}(S_n \le s_n)}\), with \(s_n\) the observed value of \(S_n\) in the sample. In contrast, the approximated p-value is \({\widetilde{pval} {:=} 1 - \Phi (s_n)}\). Setting \(x = s_n\) in Eq. 17 yields
Therefore,
In line with the explanations preceding Table 5, \(\widetilde{pval}\) is strictly smaller or larger than \(pval\) when the skewness is sufficiently large in absolute value relative to \(\delta _n\). Indeed, if \(\lambda _{3,n} \ne 0\), the interval from Eq. 19 that contains the true p-value \(pval\) is not centered at the approximated p-value \(\widetilde{pval}\). Under additional regularity assumptions (see Corollary 4 in the i.i.d. case), the remainder term \(\delta _n = O(n^{-1})\) whereas the “bias” term involving \(\lambda _{3,n}\) vanishes at rate \(n^{-1/2}\). As a result, the interval locates closer to \(\widetilde{pval}\) as n increases and its width shrinks to zero at an even faster rate.
Finally, we stress that such distortions regarding rejection rates and p-values are specific to one-sided tests. For two-sided tests, the skewness of the distribution enters symmetrically in the approximation error and cancels out, since \( x \mapsto (1 - x^2) \varphi (x) \) is an even function.
Code Available
The code is available as supplementary material and in the GitHub repository https://github.com/AlexisDerumigny/Reproducibility-BoundsDistanceEdgeworth. It is based on the package BoundEdgeworth (Derumigny et al., 2023), developed along with this article.
Notes
In this article, we only give results for standardized sums of random variables, i.e., sums that are rescaled by their standard deviation. In practice, the variance is unknown and has to be replaced with some empirical counterpart, leading to what is usually called a self-normalized sum. This is an important question in practice that we leave aside for future research. There exist numerous results on self-normalized sums in the fields of Edgeworth expansions and Berry-Esseen inequalities (Hall (1987); de la Peña et al. (2009)). However, the practical limitations of existing results that we point out in our work still prevail.
Bounds instead of equalities in the sense that we round up to the fourth digit the obtained numerical constants.
References
Abramowitz, M., & Stegun, I.A. (1972). Handbook of mathematical functions with formulas, graphs and mathematical tables (Vol. 55). National Bureau of Standards, Applied Mathematics Series.
Adell, J.A., & Lekuona, A. (2008). Shortening the distance between Edgeworth and Berry-Esseen in the classical case. Journal of Statistical Planning and Inference, 138 (4), 1167–1178.
Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference, 113 (2), 385–402.
Bentkus, V., & Götze, F. (1996). The Berry-Esseen bound for Student's statistic. Ann. Probab., 24 (1), 491–503.
Bentkus, V., Götze, F., & van Zwet, W.R. (1997). An Edgeworth expansion for symmetric statistics. Ann. Statist., 25 (2), 851–896.
Berry, A. (1941). The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49, 122–136.
Bhattacharya, R.N., & Ranga Rao, R. (1976). Normal approximation and asymptotic expansions. New York: Wiley.
Boutsikas, M.V. (2011). Asymptotically optimal Berry-Esseen-type bounds for distributions with an absolutely continuous part. Journal of Statistical Planning and Inference, 141 (3), 1250–1268.
Chernozhukov, V., Chetverikov, D., & Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. Ann. Probab., 45 (4), 2309–2352.
Cramér, H. (1962). Random variables and probability distributions (2nd ed.). Cambridge University Press.
de la Peña, V.H., Leung Lai, T., Shao, Q.-M. (2009). Self-normalized processes: Limit theory and statistical applications. Springer-Verlag, Berlin Heidelberg.
Derumigny, A., Girard, L., Guyonvarch, Y. (2023). BoundEdgeworth: Bound on the error of the first-order edgeworth expansion [Computer software manual]. (R package version 0.1.2. Available at https://github.com/AlexisDerumigny/BoundEdgeworth.)
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist., 7 (1), 1–26.
Esseen, C.-G. (1942). On the Liapunoff limit of error in the theory of probability. Arkiv för Matematik, Astronomi och Fysik.
Esseen, C.-G. (1945). Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math., 77, 1–125.
Esseen, C.-G. (1956). A moment inequality with an application to the central limit theorem. Scandinavian Actuarial Journal, 1956 (2), 160–170.
Gil-Pelaez, J. (1951). Note on the inversion theorem. Biometrika, 38 (3–4), 481–482.
Goulet, V. (2016). expint: Exponential integral and incomplete gamma function [Computer software manual]. Retrieved from https://cran.r-project.org/package=expint (R package)
Hall, P. (1987). Edgeworth expansion for Student’s statistic under minimal moment conditions. Ann. Probab., 15 (3), 920–931.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer-Verlag.
Kosorok, M. (2006). Introduction to empirical processes and semiparametric inference. Springer Verlag New York.
Lahiri, S.N. (2003). Resampling methods for dependent data. Springer Science & Business Media.
Narasimhan, B., Johnson, S.G., Hahn, T., Bouvier, A., Kiêu, K. (2020). cubature: Adaptive multivariate integration over hypercubes [Computer software manual]. Retrieved from https://cran.r-project.org/package=cubature (R package version 2.0.4.1)
Pinelis, I. (2011). Relations between the first four moments.
Pinelis, I., & Molzon, R. (2016). Optimal-order bounds on the rate of convergence to normality in the multivariate delta method. Electronic Journal of Statistics, 10 (1), 1001–1063.
Prawitz, H. (1972). Limits for a distribution, if the characteristic function is given in a finite domain. Scandinavian Actuarial Journal, 1972 (2), 138–154.
Prawitz, H. (1975). On the remainder in the central limit theorem. Scandinavian Actuarial Journal, 1975 (3), 145–156.
Raič, M. (2018). A multivariate central limit theorem for Lipschitz and smooth test functions. arXiv:1812.08268.
Raič, M. (2019). A multivariate Berry-Esseen theorem with explicit constants. Bernoulli, 25 (4A), 2824–2853.
Senatov, V.V. (2011). On the real accuracy of approximation in the central limit theorem. Siberian Mathematical Journal, 52 (4), 19–38.
Shevtsova, I. (2010). Refinement of estimates for the rate of convergence in Lyapunov’s theorem. Dokl. Akad. Nauk, 435 (1), 26–28.
Shevtsova, I. (2012). Moment-type estimates with asymptotically optimal structure for the accuracy of the normal approximation. Annales Mathematicae et Informaticae, 39 , 241–307.
Shevtsova, I. (2013). On the absolute constants in the Berry-Esseen inequality and its structural and nonuniform improvements. Informatika i Ee Primeneniya [Informatics and its Applications], 7 (1), 124–125.
Ushakov, N.G. (2011). Selected topics in characteristic functions. Berlin, Boston: De Gruyter.
Ushakov, N.G., & Ushakov, V.G. (1999). Some inequalities for characteristic functions of densities with bounded variation. Preprint series. Statistical Research Report http://urn.nb.no/URN:NBN:no-23420.
van der Vaart, A. (2000). Asymptotic statistics. Cambridge University Press.
van der Vaart, A., & Wellner, J. (1996). Weak convergence of empirical processes: with applications to statistics. Springer-Verlag New York.
Zhilova, M. (2020). New Edgeworth-type expansions with finite sample guarantees. arXiv:2006.03959.
Acknowledgements
We thank professors Victor-Emmanuel Brunel and Xavier D’Haultfœuille for insightful discussions as well as seminar participants at CREST, University of Surrey, Université Paris-Saclay, and CIREQ Montreal Econometrics Conference.
Funding
Part of this article was written while A.D. was employed by the University of Twente and Y.G. was employed by University Paris-Sud and then by Télécom Paris. No specific funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Supplementary information
The supplementary material is composed of several files: a MATLAB file, seven .Rmd files that allow one to reproduce all results and figures presented in this paper, and the seven corresponding .pdf outputs.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of the main theorems
1.1 A.1 Outline of the proofs of Theorems 1 and 2
We start by presenting a lemma derived in Prawitz (1975), which is central to the proofs of our theorems. This result helps control the distance between the cumulative distribution function F of a random variable with skewness v and its first-order Edgeworth expansion \(G_v(x){:=}\Phi (x)+\frac{v}{6}(1-x^2) \varphi (x)\) in terms of their respective Fourier transforms.
Lemma 5
Let F be an arbitrary cumulative distribution function with characteristic function f and skewness v. Let \(\tau , T > 0\). Then we have
where
and \(\Psi (t){:=} \frac{1}{2} \left( 1-|t|+i\left[ (1-|t|)\cot (\pi t)+\frac{\textrm{sign}(t)}{\pi }\right] \right) \mathbbm {1}\left\{ |t| \le 1 \right\} \).
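Prawitz's kernel \(\Psi\) can be implemented directly from the formula above (a sketch; the singularity at \(t = 0\) is excluded):

```python
import math

def prawitz_psi(t):
    """Prawitz's kernel Psi(t) = (1/2) * (1 - |t|
    + i * [(1 - |t|) * cot(pi t) + sign(t)/pi]) for 0 < |t| <= 1,
    and 0 for |t| > 1.  The point t = 0 is excluded."""
    if abs(t) > 1:
        return 0j
    cot = math.cos(math.pi * t) / math.sin(math.pi * t)
    imag = (1 - abs(t)) * cot + math.copysign(1 / math.pi, t)
    return 0.5 * (1 - abs(t) + 1j * imag)

# At t = 1/2, cot(pi/2) = 0, so Psi(1/2) = 1/4 + i/(2*pi).
print(prawitz_psi(0.5))
# Psi(-t) equals the complex conjugate of Psi(t).
print(prawitz_psi(-0.5))
```

The conjugate symmetry \(\Psi(-t) = \overline{\Psi(t)}\) is what makes the smoothing-inequality integrands in the proofs real-valued.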
For the sake of completeness, we give a proof of this lemma in Section 1. We also use the following properties of the function \(\Psi \) (Prawitz, 1975, Equations (I.29) and (I.30))
Lemma 5 is valid for any positive values \(T\) and \(\tau \). The latter are free parameters whose values determine which terms are the dominant ones among \(\Omega _1\) to \(\Omega _4\).
Theorem 1 written in the body of the article synthesizes Theorems 6 and 7, stated and proven below in the i.n.i.d. and the i.i.d. cases respectively. Likewise, Theorem 2 corresponds to Theorems 8 (i.n.i.d. case) and 9 (i.i.d. case). The four proofs start by applying Lemma 5 with F the cdf of \(S_n\) and thus \(v = \lambda _{3,n} / \sqrt{n}\). Then, for specific values of \(T\) and \(\tau \), we derive upper bounds on each of the four terms of Eq. A1.
In all our theorems, we set
where \(\epsilon \) is a dimensionless free parameter. It is not obvious to optimize our bounds over that parameter. Consequently, Theorems 6 to 9 are proven for any \(\epsilon \in (0, 1/3)\) and, in the body of the article, we present the results with \(\epsilon = 0.1\), a sensible value according to our numerical comparisons.
Unlike \(\tau \), we vary the rate of \(T\) across theorems. In Theorems 6 and 7, we choose
The resulting bound is interesting under moment conditions only (Assumption 1 for i.n.i.d. cases and 2 for i.i.d. cases).
In Theorems 8 and 9, we make a different choice, namely
These last two theorems present alternative bounds, also valid under moment conditions only. They improve on Theorems 6 and 7 under regularity conditions on the tail behavior of the characteristic function \({f_{S_{n}}}\) of \(S_n\). Examples of such conditions are to be found in Corollaries 3 (i.n.i.d. case) and 4 (i.i.d. case).
A.2 Proof of Theorem 1 under Assumption 1
In this section, we state and prove a more general theorem (Theorem 6 below). We recover Theorem 1 when we set \(\epsilon = 0.1\).
Theorem 6
(One-term Edgeworth expansion under Assumption 1)
(i) Under Assumption 1, for every \(\epsilon \in (0,1/3)\) and every \(n \ge 1\), we have the bound
where \(e_{1,n}(\epsilon )\) is given in Eq. C32 and \(r^\textrm{inid,skew}_{1,n}(\epsilon )\) is given in Eq. A6.
(ii) If we further impose \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\), the upper bound reduces to
where \(r^{\textrm{inid,noskew}}_{1,n}(\epsilon )\) is given in Eq. A7.
(iii) Finally, when \(K_{4,n} = O(1)\) as \(n \rightarrow \infty \), we obtain \(r^\textrm{inid,skew}_{1,n}(\epsilon ) = O(n^{-5/4})\) and \( r^{\text {inid,noskew}}_{1,n}(\epsilon ) = O(n^{-3/2})\).
Using Theorem 6, we can finish the proof of Theorem 1 by plugging in our choice \(\epsilon = 0.1\) and computing the numerical constants. In particular, the computation of \(e_{1,n}(0.1)\) gives the upper bound \({e_{1,n}(0.1) \le 1.0157}\).
In the general case with skewness, using the computations for \(\overline{R}^{\text {inid}}_n(0.1)\) carried out in Section 1, the remainder \( r^\textrm{inid,skew}_{1,n}(0.1) \) is bounded by the explicit expression given in Eq. 3.
In the no-skewness case, the remainder \( r^{\text {inid,noskew}}_{1,n}(0.1) \) is bounded by the explicit expression given in Eq. 4, where we use the expression of \(\overline{R}^\textrm{inid}_n(\epsilon )\) in Eq. C48 and the computations when \(\epsilon = 0.1\) that follow Eq. C48.
Proof of Theorem 6
We first prove (i). We apply Lemma 5 with F denoting the cdf of \(S_n\) and obtain
Let \(T {:=} 2 \pi \sqrt{n} / \tilde{K}_{3,n}\), \(v {:=} \lambda _{3,n} / \sqrt{n}\) and \(\tau {:=} \sqrt{2\epsilon } (n/K_{4,n})^{1/4}\). We now combine Lemma 10 (control of \(\Omega _1\)), Eq. B25 (control of \(\Omega _2\)), Lemma 12 (control of \(\Omega _3\)), and Lemma 13(i) (control of \(\Omega _4\)) so that we get
Bounding \((1.0253/\pi )\times \int _0^{\tau \wedge T/\pi } u e^{-u^2/2} R^\textrm{inid}_n(u,\epsilon )\) by \(\overline{R}^\textrm{inid}_n(\epsilon ) {:=} (1.0253/\pi )\times \int _0^{+ \infty } u e^{-u^2/2} R^\textrm{inid}_n(u,\epsilon )\), bounding \(J_2\) by Lemma 19, and replacing T and \(\tau \) by their values, we obtain
where
and \(\Delta {:=} (1 - 4 \chi _1 - \sqrt{K_{4,n}/n}) / 2\).
We obtain the result of Eq. A4 by computing all numerical constants; for instance, \( 1.2533 / (2\pi ) \approx 0.19947 < 0.1995 \).
We now prove (ii). In the no-skewness case, namely when \(\mathbb {E}[X_i^3] = 0\) for every \(i = 1, \ldots , n\), the start of the proof is identical except that Lemma 13(ii) is used in lieu of Lemma 13(i) to control \(\Omega _4\). This yields
Bounding \(J_2\) by Lemma 19 and replacing T and \(\tau \) by their values, we obtain
where
We obtain the result of Eq. A5 by computing all the numerical constants.
We finally prove (iii). When \(K_{4,n} = O(1)\), we remark that \(\lambda _{3,n}\), \(K_{3,n}\), and \(\tilde{K}_{3,n}\) are bounded as well. Given the detailed analysis of \(\overline{R}^\textrm{inid}_n(\epsilon )\) carried out in Section 1 (in particular Eqs. (C47) and (C48)), boundedness of these moments ensures that \(\overline{R}^\textrm{inid}_n(\epsilon ) = O(n^{-5/4})\) in general and \(\overline{R}^\textrm{inid}_n(\epsilon ) = O(n^{-3/2})\) in the no-skewness case.
We can also see (remember that \(\chi _1 \approx 0.099 \)) that \(\Delta > 0\) for n large enough when \(K_{4,n} = O(1)\). Consequently, for n large enough, we can write in the general case
and, in the no-skewness case,
This reasoning enables us to obtain a difference of Gamma functions and therefore apply the asymptotic expansion \(\Gamma (a,x) = x^{a-1} e^{-x} (1 + O((a-1) / x))\) which is valid for every fixed a in the regime \(x\rightarrow \infty \), see Equation (6.5.32) in Abramowitz and Stegun (1972). We also use this asymptotic expansion for the term
Consequently, we get the stated rates \(r^\textrm{inid,skew}_{1,n}(\epsilon ) = O(n^{-5/4})\) in the general case and \(r^{\textrm{inid,noskew}}_{1,n}(\epsilon ) = O(n^{-3/2})\) in the no-skewness case.
A.3 Proof of Theorem 1 under Assumption 2
We present and prove a more general result, Theorem 7, and choose \(\epsilon = 0.1\) to recover Theorem 1 under Assumption 2.
Theorem 7
(One-term Edgeworth expansion under Assumption 2)
(i) Under Assumption 2, for every \(\epsilon \in (0,1/3)\) and every \(n \ge 3\), we have the bound
where \(r^{\textrm{iid,skew}}_{1,n}(\epsilon )\) is given in Eq. A10 and \(e_{3}(\epsilon ) = e^{\epsilon ^2/6 + \epsilon ^2 / (2(1-3 \epsilon ) )^2}\).
(ii) If we further impose \(\mathbb {E}[X_n^3]=0\), the upper bound reduces to
where \(r^{\textrm{iid,noskew}}_{1,n}(\epsilon )\) is given in Eq. A11.
(iii) Finally, when \(K_{4,n}=O(1)\) as \(n \rightarrow \infty \), we obtain \(r^{\textrm{iid,skew}}_{1,n}(\epsilon ) = O(n^{-5/4})\) and \(r^{\textrm{iid,noskew}}_{1,n}(\epsilon ) = O(n^{-2})\).
We use this result to finish the proof of Theorem 1, which corresponds to the case \( \epsilon = 0.1 \), by computing the numerical constants. In particular, the computation of \(e_{3}(0.1)\) gives the upper bound \(e_{3}(0.1) \le 1.0068 \). Note that in the statement of Theorem 1, to obtain a more concise presentation, we control \(e_{3}(0.1)\) from above by the slightly larger bound 1.0157 used in the i.n.i.d. case to upper bound \(e_{1,n}(0.1)\).
In this case, we obtain the bound on \( r^{\textrm{iid,skew}}_{1,n}(0.1) \) given in Eq. 5, where \(\overline{R}^\textrm{iid,noskew}_n\) is defined in Eq. 8.
Proof of Theorem 7
The overall scheme of the proof is similar to that of Theorem 6 except for some improvements obtained in the i.i.d. set-up.
We first prove (i). We apply Lemma 5 with F the cdf of \(S_n\) and obtain
Let \(T = 2 \pi \sqrt{n} / \tilde{K}_{3,n}\), \(v = \lambda _{3,n} / \sqrt{n}\) and \(\tau = \sqrt{2\epsilon } (n/K_{4,n})^{1/4}\). We combine Lemma 10 (control of \(\Omega _1\)), Eq. B25 (control of \(\Omega _2\)), Lemma 12 (control of \(\Omega _3\)), Lemma 13(iii) (control of \(\Omega _4\)) to get
Bounding \((1.0253/\pi )\times \int _0^{\tau \wedge T/\pi } u e^{-u^2/2} R^\textrm{iid}_n(u,\epsilon )\) by \(\overline{R}^\textrm{iid}_n(\epsilon ) {:=} (1.0253/\pi )\times \int _0^{+ \infty } u e^{-u^2/2} R^\textrm{iid}_n(u,\epsilon )\), bounding \(J_3\) by Lemma 20, and replacing T and \(\tau \) by their values, we obtain
where
We obtain the result of Eq. A8 by computing the numerical constants.
We now prove (ii). In the no-skewness case, namely when \(\mathbb {E}[X_n^3] = 0\), the start of the proof is identical except that Lemma 13(iv) is used in lieu of Lemma 13(iii) to control \(\Omega _4\). This yields
Bounding \(J_3\) by Lemma 20 and replacing T and \(\tau \) by their values, we obtain
where
We obtain the result of Eq. A9 by computing all the numerical constants.
We finally prove (iii). Following the same line of proof as in Section 1, we can show that \(K_{4,n} = O(1)\) ensures the standardized moments \(\lambda _{3,n}\), \(K_{3,n}\), and \(\tilde{K}_{3,n}\) are bounded as well. Given the detailed analysis of \(\overline{R}^\textrm{iid}_n(\epsilon )\) carried out in Section 1 (in particular Eq. C49), boundedness of these moments ensures that \(\overline{R}^\textrm{iid}_n(\epsilon ) = O(n^{-3/2})\) in general and \(\overline{R}^\textrm{iid}_n(\epsilon ) = O(n^{-2})\) in the no-skewness case.
From the definitions of \(e_{2,n}\) and \(e_{3}\) in Eqs. C43 and C44, we note that the term
Applying the asymptotic expansion \(\Gamma (a,x) = x^{a-1} e^{-x} (1 + O((a-1) / x))\), we can claim
and
As a result, we obtain \(r^{\textrm{iid,skew}}_{1,n}(\epsilon ) = O(n^{-5/4})\) in general and \(r^{\textrm{iid,noskew}}_{1,n}(\epsilon ) = O(n^{-2})\) in the no-skewness case, as claimed.
A.4 Proof of Theorem 2 under Assumption 1
We use Theorem 8, proved below, with the choice \(\epsilon = 0.1\). Recall that \(t_1^* = \theta _1^*/(2\pi ) \approx 0.64\) where \(\theta _1^*\) is the unique root in \((0,2\pi )\) of the equation \(\theta ^2+2\theta \sin (\theta )+6(\cos (\theta )-1)=0.\) Recall also that \(a_n {:=} 2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4\) and \(b_n {:=} 16\pi ^4n^2/\tilde{K}_{3,n}^4\).
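The constant \(t_1^*\) can be reproduced numerically. A short Python sketch using SciPy root-finding (the bracket \([1, 6]\) isolates the sign change of the defining function on \((0, 2\pi )\)):

```python
import numpy as np
from scipy.optimize import brentq

def g(theta):
    # Defining equation of theta_1^*
    return theta ** 2 + 2 * theta * np.sin(theta) + 6 * (np.cos(theta) - 1)

theta_1_star = brentq(g, 1.0, 6.0)  # g(1) < 0 < g(6)
t_1_star = theta_1_star / (2 * np.pi)  # approximately 0.64, as recalled above
```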
Theorem 8
(Alternative one-term Edgeworth expansion under Assumption 1)
(i) Under Assumption 1, for every \(\epsilon \in (0,1/3)\) and every \(n \ge 1\), we have the bound
where \(r^{\textrm{inid,skew}}_{2,n}(\epsilon )\) is given in Eq. A14.
(ii) If we further impose \(\mathbb {E}[X_i^3]=0\) for every \(i = 1,\dots ,n\), the upper bound reduces to
where \(r^{\textrm{inid,noskew}}_{2,n}(\epsilon )\) is given in Eq. A15.
(iii) Finally, when \(K_{4,n}=O(1)\) as \(n \rightarrow \infty \), we obtain \(r^{\textrm{inid,skew}}_{2,n}(\epsilon ) = O(n^{-5/4})\) and \(r^{\textrm{inid,noskew}}_{2,n}(\epsilon ) = O(n^{-3/2})\).
Using Theorem 8, we can finish the proof of Theorem 2 by setting \(\epsilon = 0.1\), computing the numerical constants and using the upper bounds on \(\overline{R}^\textrm{inid}_n(0.1)\) computed in Section 1. In particular, \( r^{\text {inid,noskew}}_{2,n}(0.1) \) is bounded by the explicit expression given in Eq. 11.
Proof of Theorem 8
We first prove (i). We apply Lemma 5 with F the cdf of \(S_n\) and obtain
Let \(T = 16 \pi ^4 n^2 / \tilde{K}_{3,n}^4\), \(v = \lambda _{3,n} / \sqrt{n}\) and \(\tau = \sqrt{2\epsilon } (n/K_{4,n})^{1/4}\). We combine Lemma 10 (control of \(\Omega _1\)), Lemma 12 (control of \(\Omega _3\)), Lemma 14 and then Lemma 13(i) (control of \(\Omega _4\)) to get
Bounding \((1.0253/\pi )\times \int _0^{\tau \wedge T/\pi } u e^{-u^2/2} R^\textrm{inid}_n(u,\epsilon )\) by \(\overline{R}^\textrm{inid}_n(\epsilon ) {:=} (1.0253/\pi )\times \int _0^{+ \infty } u e^{-u^2/2} R^\textrm{inid}_n(u,\epsilon )\), bounding \(J_2\) by Lemma 19, and replacing T and \(\tau \) by their values, we obtain
where \(a_n {:=} 2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4\) and \(b_n {:=} 16\pi ^4n^2/\tilde{K}_{3,n}^4\),
and \(\Delta {:=} (1 - 4 \chi _1 - \sqrt{K_{4,n}/n}) / 2\).
We now prove (ii). The proof is exactly the same as that of (i) above, except that Lemma 13(i) is replaced with Lemma 13(ii). Consequently,
where
We finally prove (iii). The reasoning is completely analogous to the proof of Theorem 6.(iii). Leading terms in \(r^{\textrm{inid,skew}}_{2,n}(\epsilon )\) (resp. \(r^{\textrm{inid,noskew}}_{2,n}(\epsilon )\)) stem from \(\overline{R}^\textrm{inid}_n(\epsilon )\). This term appeared in \(r^\textrm{inid,skew}_{1,n}(\epsilon )\) and \(r^{\textrm{inid,noskew}}_{1,n}(\epsilon )\) and we showed \(\overline{R}^\textrm{inid}_n(\epsilon ) = O(n^{-5/4})\) in the general case and \(\overline{R}^\textrm{inid}_n(\epsilon ) = O(n^{-3/2})\) in the no-skewness case.
A.5 Proof of Theorem 2 under Assumption 2
We use Theorem 9, proved below, with the choice \(\epsilon = 0.1\). Recall that \(t_1^* = \theta _1^*/(2\pi ) \approx 0.64\) where \(\theta _1^*\) is the unique root in \((0,2\pi )\) of the equation \(\theta ^2+2\theta \sin (\theta )+6(\cos (\theta )-1)=0.\) Recall also that \(a_n {:=} 2t_1^*\pi \sqrt{n}/\tilde{K}_{3,n} \wedge 16\pi ^3n^2/\tilde{K}_{3,n}^4\) and \(b_n {:=} 16\pi ^4n^2/\tilde{K}_{3,n}^4\).
Theorem 9
(Alternative one-term Edgeworth expansion under Assumption 2)
(i) Under Assumption 2, for every \(\epsilon \in (0,1/3)\) and every \(n \ge 3\), we have the bound
where \(r^\textrm{iid,skew}_{2,n}(\epsilon )\) is given in Eq. A18 and \(e_{3}(\epsilon ) = e^{\epsilon ^2/6 + \epsilon ^2 / (2(1-3 \epsilon ) )^2}\).
(ii) If we further impose \(\mathbb {E}[X_n^3]=0\), the upper bound reduces to
where \(r^{\textrm{iid,noskew}}_{2,n}(\epsilon )\) is given in Eq. A19.
(iii) Finally, when \(K_{4,n} = O(1)\) as \(n \rightarrow \infty \), we obtain \(r^\textrm{iid,skew}_{2,n}(\epsilon ) = O(n^{-5/4})\) and \(r^{\textrm{iid,noskew}}_{2,n}(\epsilon ) = O(n^{-2})\).
We can use this result to wrap up the proof of Theorem 2. We set \(\epsilon = 0.1\), use the upper bound \(\overline{R}^\textrm{iid}_n(0.1) \le e_{2,n}\) in the general case (resp. \(\overline{R}^\textrm{iid}_n(0.1) \le \overline{R}^\textrm{iid,noskew}_n\) in the no-skewness case) and compute all the numerical constants depending on \(\epsilon \). This gives us the explicit expression written in Eq. 13 as an upper bound on \(r^\textrm{iid,skew}_{2,n}(0.1)\).
Proof of Theorem 9
We first prove (i). The proof is similar to that of Theorem 8 except that we use Lemma 13(iii) instead of Lemma 13(i) (and the second part of Lemma 12). This leads to
Using Lemma 20 instead of Lemma 19, we arrive at
where
We now prove (ii). The proof is the same as that of Result (i), except that we use Lemma 13(iv) instead of Lemma 13(iii). We conclude
where
We finally prove (iii). \(\overline{R}^\textrm{iid}_n(\epsilon )\) is the leading term in both \(r^\textrm{iid,skew}_{2,n}(\epsilon )\) and \(r^{\textrm{iid,noskew}}_{2,n}(\epsilon )\). In the proof of Theorem 7, \(\overline{R}^\textrm{iid}_n(\epsilon )\) was shown to be of order \(n^{-5/4}\) in general and \(n^{-2}\) in the no-skewness case.
A.6 Proof of Lemma 5
Let us denote by “p.v. \( \int \,\)” Cauchy’s principal value, defined by
where f is a measurable function on \([-a,a] \backslash \{0\}\) for a given \(a > 0\). In what follows, we use the following inequalities, which are due to Prawitz (1972)
Note that these inequalities hold for every distribution F with characteristic function f, without any assumption. However, they only involve values of the characteristic function f on the interval \([-T, T]\) (independently of the fact that f may be non-zero elsewhere).
Therefore,
Note that the Gil-Pelaez inversion formula (see Gil-Pelaez (1951)) is valid for any bounded-variation function. Formally, for every bounded-variation function \(G(x)=\int _{-\infty }^x g(t) dt\), denoting the Fourier transform of a given function g by \(\check{g} {:=} \int _{-\infty }^{+\infty } e^{ixu} g(u) du\), we have
Therefore, applying Eq. A22 to the function \(G_v(x) {:=} \Phi (x) + v(1-x^2) \varphi (x) / 6\) whose (generalized) density has the Fourier transform \((1 - vix^3 / 6) e^{- x^2 / 2}\), we get
Combining this expression of \(G_v(x)\) with the bounds Eqs. A20 and A21, we get
where we resort to the triangle inequality and to the fact that the principal value of the integral of a positive function is the (usual) integral of that function. Combining \(\Psi (-u) = \overline{\Psi }(u)\) and \(f(-u) = \overline{f(u)}\) with basic properties of the conjugate and modulus, we get
Using this symmetry with respect to u, we obtain
By distinguishing the cases \(u \le T\) and \(u \ge T\), we obtain
We merge the last two terms together as they correspond to the same integrand, integrated from \(T/\pi \) to \(+ \infty \).
We use the triangle inequality to break the first integral into two parts
We successively split the first term into two integrals, and apply the triangle inequality to break the first integral into two parts
Appendix B: Control of \((\Omega _\ell )_{\ell =1}^4\)
B.1 Control of the term \(\varvec{\Omega _1}\)
The following lemma enables us to control the term \(\Omega _1\). The same control is used in all cases (i.i.d. and i.n.i.d., Theorems 1 and 2).
Lemma 10
For every \(T > 0\), we have
Proof
We can decompose \(\Omega _1(T, v, \tau )\) as
where
To compute \(I_{1,3}\) and \(I_{1,4}\), we used the change of variable \(u = (tT)^2/2\) and the incomplete Gamma function \(\Gamma (a,x) {:=} \int _x^{+\infty } u^{a-1} e^{-u} du\), which can be computed numerically using the R package expint (Goulet, 2016). We estimate the first two integrals numerically using the R package cubature (Narasimhan et al., 2020) and optimize them using the optim function with the L-BFGS-B method; we find the following upper bounds:
which can be used to bound the first four terms.
By Lemma 18, we obtain
as claimed.
Note that
where we apply the asymptotic expansion \(\Gamma (a,x) = x^{a-1} e^{-x} (1 + O((a-1) / x))\) which is valid for every fixed a in the regime \(x\rightarrow \infty \), see Equation (6.5.32) in Abramowitz and Stegun (1972).
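Both the incomplete Gamma function and this expansion are easy to check numerically. In Python, \(\Gamma (a,x)\) can be obtained from SciPy's regularized function gammaincc rescaled by \(\Gamma (a)\), playing the role of the R package expint used above:

```python
import numpy as np
from scipy.special import gamma, gammaincc

def upper_incomplete_gamma(a, x):
    # Gamma(a, x) = int_x^infty u^{a-1} e^{-u} du;
    # gammaincc is the regularized version Gamma(a, x) / Gamma(a)
    return gammaincc(a, x) * gamma(a)

# Check Gamma(a, x) = x^{a-1} e^{-x} (1 + O((a-1)/x)) for fixed a = 2:
# here Gamma(2, x) = (x + 1) e^{-x}, so the ratio below equals 1 + 1/x
a = 2.0
for x in (20.0, 50.0, 100.0):
    ratio = upper_incomplete_gamma(a, x) / (x ** (a - 1) * np.exp(-x))
    assert abs(ratio - 1.0) <= 2 * (a - 1) / x
```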
Note that the first term on the right-hand side of (B23) is of leading order as soon as \(|\lambda _{3,n}|/\sqrt{n} = o(1)\) and \(T = T(n) = o(1).\) Our approach is related to the one used in Shevtsova (2012), except that we do not upper bound \(\Omega _1\) analytically, which allows us to get a sharper control on this term. To further highlight the gains from using numerical approximations instead of direct analytical upper bounds, we remark that from \(\left| \Psi (t)-\frac{i}{2\pi t}\right| \le \frac{1}{2}\left( 1-|t|+\frac{\pi ^2t^2}{18}\right) \) and some integration steps, we get
whose main term is approximately twice as large as the numerical bound 1.2533 that we obtained before.
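Using the definition of \(\Psi \) given in Lemma 5, the analytic bound on \(\left| \Psi (t)-\frac{i}{2\pi t}\right| \) can itself be verified on a grid of \((0,1)\); a minimal Python check:

```python
import numpy as np

def Psi(t):
    # Prawitz's kernel on (0, 1): for t > 0, sign(t)/pi = 1/pi
    # and cot(pi * t) = 1 / tan(pi * t)
    return 0.5 * ((1 - t) + 1j * ((1 - t) / np.tan(np.pi * t) + 1.0 / np.pi))

t = np.linspace(0.01, 0.99, 99)
lhs = np.abs(Psi(t) - 1j / (2 * np.pi * t))
rhs = 0.5 * (1 - t + np.pi ** 2 * t ** 2 / 18)
assert np.all(lhs <= rhs + 1e-9)  # the analytic bound holds on the whole grid
```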
B.2 Control of the term \(\varvec{\Omega _2}\)
In this section, we control \(\Omega _2(T) = 2\int _{1/\pi }^1|\Psi (t)|\,|f_{S_{n}}(Tt)|dt.\) The control used in Theorem 2 comes directly from the upper bound on the absolute value of \(\Psi \) (Eq. A2):
In Theorem 1, we derive a bound based on the following lemma.
Lemma 11
Let \(t_1^* = \theta _1^* / (2\pi )\) where \(\theta _1^*\) is the unique root in \((0,2\pi )\) of the equation \(\theta ^2+2\theta \sin (\theta )+6(\cos (\theta )-1)=0\) and \(\xi _n {:=} \tilde{K}_{3,n}/\sqrt{n}\). We obtain
Proof of Lemma 11
Applying Theorem 2.2 in Shevtsova (2012) with \(\delta =1\), we get for all \(u \in \mathbb {R}\)
where \(\epsilon _n {:=} n^{-1/2} \tilde{K}_{3,n}\) and, for any real \(u\) and any \(\epsilon > 0\)
Therefore,
Choosing \(u=2\pi t/\xi _n\), multiplying by \(|\Psi |\), integrating from \(1/\pi \) to 1 and separating the two cases yields the claimed inequalities. \(\Box \)
Recall that under moment conditions only, we choose \( T = \frac{2\pi }{\xi _n} = \frac{2 \pi \sqrt{n}}{\tilde{K}_{3,n}} \). Combining this with the two inequalities (i) and (ii) of Lemma 11 yields
where
Note that the difference in the two exponents of T in the above definitions may seem surprising as these two integrals look similar. However, they have very different behaviors: the first one decays much faster than the second. In line with Section 1, we compute these integrals numerically using the R package cubature (Narasimhan et al., 2020) and optimize them using the optim function with the L-BFGS-B method. This gives
Finally, we arrive at
B.3 Control of the term \(\varvec{\Omega _3}\)
We recall that \(\tau \) is defined as \( \tau = \sqrt{2\epsilon } (n/K_{4,n})^{1/4} \) (see Eq. A3).
Lemma 12
Under Assumption 1, we have for any \(\epsilon \in (0, 1/3)\) and any \(T > 0\),
where the functions \(R^\textrm{inid}_n\) and \(e_{1,n}\) are defined in Eqs. C31 and C32 respectively.
Under Assumption 2, we have
where the functions \(R^\textrm{iid}_n\) and \(e_{2,n}\) are defined in Eqs. C41 and C43 respectively.
Proof of Lemma 12
First, assume that Assumption 1 holds. Lemma 15 enables us to write
where the function \(J_1\) is defined in Eq. C50. Using Lemma 18, we obtain the bounds \(J_{1}(4, 0,+\infty ,T)\le 0.327\) and \(J_{1}(6, 0,+\infty ,T) \le 1.306.\) Besides, by the first inequality in Eq. A2, we get
showing Eq. B26 as claimed.
Assume now that Assumption 2 holds. The integrand of \(I_{4,1}(T)\) can be upper bounded thanks to Lemma 16. We obtain
This completes the proof of Eq. B27. \(\Box \)
B.4 Control of the term \(\varvec{\Omega _4}\)
In this section, we bound the fourth term of Eq. A1, which is
for \(f = f_{S_{n}}\).
We prove a bound on \(\Omega _4(\sqrt{2\epsilon }(n/K_{4,n})^{1/4} \wedge T/\pi , T/\pi , T)\) under four different sets of assumptions.
Lemma 13
Let \(-\infty< a \ne b < +\infty \) and \(T > 0\). Then
-
(i)
Under Assumption 1, we have
$$\begin{aligned} \big | \Omega _4(a,b, T) \big | \le \frac{K_{3,n}}{3\sqrt{n}} \Big | J_{2} \big (3, a, b , 2\sqrt{n}/\tilde{K}_{3,n} , T \big ) \Big |, \end{aligned}$$where \(J_2\) is defined in Eq. C51.
-
(ii)
Under Assumption 1 and assuming \(\mathbb {E}[X_i^3]=0\) for all \(i= 1,\dots ,n\), we get the improved bound
$$\begin{aligned} \big | \Omega _4(a, b, T) \big | \le \frac{K_{4,n}}{3n} \Big | J_{2} \big (4, a, b , 2\sqrt{n}/\tilde{K}_{3,n} , T \big ) \Big |, \end{aligned}$$ -
(iii)
Under Assumption 2, we have
$$\begin{aligned} \big | \Omega _4(a, b, T) \big | \le \frac{K_{3,n}}{3 \sqrt{n}} \Big | J_{3}(3, a, b, 2\sqrt{n}/\tilde{K}_{3,n} , T) \Big |, \end{aligned}$$where \(J_3\) is defined in Eq. C52.
-
(iv)
Under Assumption 2 and assuming \(\mathbb {E}[X_i^3]=0\) for all \(i= 1,\dots ,n\), we get the improved bound
$$\begin{aligned} \big | \Omega _4(a, b, T) \big |&\le \frac{K_{4,n}}{3 n} \Big | J_{3}(4, a, b, 2\sqrt{n}/\tilde{K}_{3,n}, T) \Big |. \end{aligned}$$
Remark that if \(a < b\), the four inequalities hold without absolute values since \(\Omega _4\), \(J_2\) and \(J_3\) are then non-negative.
Proof of Lemma 13(i)
Let \(t \in \mathbb {R}\). As in the proof of Lemma 2.7 in Shevtsova (2012) with \(\delta = 1\), using the fact that for every \(i= 1,\dots ,n\), we have
so that
By Eq. C34, we have \(\max _{1\le i\le n}\sigma _i^2 \le B_n^2 \times (K_{4,n}/n)^{1/2}\) so that we obtain
Applying Lemma 2.8 in Shevtsova (2012), we get that for every variable X such that \(\mathbb {E}[|X|^3]\) is finite, \(|f(t) - e^{-\sigma ^2 t^2/2}| \le \mathbb {E}[|X|^3] \times |t|^3 / 6\). Therefore,
Integrating the latter inequality, we have
as claimed.
Proof of Lemma 13(ii)
This second part of the proof mostly follows the reasoning of the first one, with suitable modifications.
First, using a Taylor expansion of order 3 of \(f_{P_{X_i}}\) around 0 (with explicit Lagrange remainder) and the inequality \(\left| e^{-x} - 1 + x \right| \le x^2/2,\) we can claim for every real t
Reasoning as in the proof of Lemma 2.7 in Shevtsova (2012) with \(\delta = 1\), we obtain
Plugging this into the definition of \(I_{3,2}(T)\), we can write
as claimed.
Proof of Lemma 13(iii)
Under the i.i.d. assumption, we can prove that, for every real t,
following the method of Lemma 13(i). Multiplying by \(|\Psi (t)|\) and integrating this, we get the claimed inequality.
Proof of Lemma 13(iv)
This can be recovered using the same techniques as in the proof of Lemma 13(ii). \(\square \)
In Section 3, we want to give improved bounds that use the tail behavior of \(f_{S_{n}}\) via the integral \(\int |f_{S_{n}}(u)| u^{-1} du\). The following lemma is therefore used to control \(\Omega _4\) in Theorem 2.
Lemma 14
Let \(T = 16 \pi ^4 n^2 / \tilde{K}_{3,n}^4 \). Then,
Note that the first term of this inequality will be bounded by Lemma 13. The second and fourth terms decrease to zero faster than polynomially with n (see Abramowitz and Stegun (1972) and the discussion at the end of Subsection 1). Finally, the term containing the integral of \(u^{-1}|f_{S_{n}}(u)|\) is the dominant one and allows us to use the assumption on the tail behavior of \(f_{S_{n}}\) to obtain Corollaries 3 (i.n.i.d. case) and 4 (i.i.d. case).
Proof of Lemma 14
We decompose \(\Omega _4\) in two parts
Note that the second term of this inequality can be bounded as
where
By the first inequality of Eq. B24 and our choice of T, we know \(|f_{S_{n}}(T^{1/4}v)|\) can be upper bounded by \(\exp (-T^{1/2}v^2(1-4\pi \chi _1|v|)/2)\) when \(v \in [1/\pi ,t_1^*].\) Using the properties of \(u \mapsto \Psi (u)\) in Eq. A2, the fact that \(1-4\pi \chi _1t_1^* > 0\) and a change of variable, we get
To control \(J_5(T)\), we use Eq. A2 to write
To control \(J_1(0, T^{1/4} / \pi , T / \pi , T)\), we use Eq. A2 and a change of variable
Appendix C: Technical lemmas
C.1 Control of the residual term in an Edgeworth expansion under Assumption 1
For \(\epsilon \in (0, 1/3)\) and \(t \ge 0\), let us define the following quantities:
We want to show the following lemma:
Lemma 15
Under Assumption 1, for every \(\epsilon \in (0,1/3)\) and t such that \(|t|\le \sqrt{2\epsilon }(n/K_{4,n})^{1/4},\) we have
Proof of Lemma 15
Remember that \(\gamma _j {:=} \mathbb {E}[X_j^4]\), \(\sigma _j {:=} \sqrt{\mathbb {E}[X_j^2]}\), \(B_n {:=} \sqrt{\sum _{i=1}^n \mathbb {E}[X_i^2]}\) and \(K_{4,n}{:=} n^{-1} \sum _{i=1}^n\mathbb {E}[X_i^4] \, / \left( n^{-1} B_n^2 \right) ^{2}\). Applying the Cauchy-Schwarz inequality, we get
and
Combining (C34), (C35) and (C36), we observe that for every \(\epsilon \in (0,1)\) and t such that \(|t| \le \sqrt{2\epsilon }(n/K_{4,n})^{1/4},\)
As we assume that \(X_j\) has a moment of order four for every \(j=1,\dots ,n\), the characteristic functions \((f_{P_{X_j}})_{j=1,\dots ,n}\) are four times differentiable on \(\mathbb {R}\). Applying a Taylor-Lagrange expansion, we get the existence of a complex number \(\theta _{1,j,n}(t)\) such that \(|\theta _{1,j,n}(t)|\le 1\) and
for every \(t \in \mathbb {R}\) and \(j = 1,\dots ,n\). Let \(\log \) stand for the principal branch of the complex logarithm function. For every \(\epsilon \in (0,1/3)\) and t such that \(|t| \le \sqrt{2 \epsilon }(n / K_{4,n})^{1/4},\) Equation (C37) shows that \(|U_{j,n}(t)| \le 3\epsilon < 1\), so that we can use another Taylor-Lagrange expansion. This ensures existence of a complex number \(\theta _{2,j,n}(t)\) such that \(|\theta _{2,j,n}(t)| \le 1\) and
Summing over \(j= 1,\dots ,n\) and exponentiating, we can claim that under the same conditions on t and \(\epsilon ,\)
A third Taylor-Lagrange expansion guarantees existence of a complex number \(\theta _{3,n}(t)\) with modulus at most \(\exp \big ( \frac{t^4K_{4,n}}{24n} + \sum _{j=1}^n\frac{|U_{j,n}(t)|^2}{2|1+\theta _{2,j,n}(t)U_{j,n}(t)|^2} \big )\) such that
Using the triangle inequality and its reverse version, as well as the restriction on \(|t| \le \sqrt{2\epsilon }(n/K_{4,n})^{1/4},\) we can write
We now control \(\sum _{j=1}^n |U_{j,n}(t)|^2\). We first expand the squares, giving the decomposition
Using Eqs. C34-C36, we can bound the right-hand side of Equation (C39) in the following manner
and
Moreover, we have \(\sum _{j=1}^n U_{j,n}(t)^2\le \frac{t^4K_{4,n}}{n} P_{1,n}(\epsilon )\) under our conditions on \(\epsilon \) and t. Combining Equation (C38), the decomposition (C39) and the previous three bounds, and grouping similar terms together, we conclude that for every \(\epsilon \in (0,1/3)\) and t such that \({|t|\le \sqrt{2\epsilon }(n/K_{4,n})^{1/4},}\)
where \(e_{1,n}(\epsilon ) {:=} \exp \left( \epsilon ^2\left( \frac{1}{6}+\frac{2P_{1,n}(\epsilon )}{(1-3\epsilon )^2} \right) \right) .\) Combining this with the definition of \(R^\textrm{inid}_n(t,\epsilon )\) finishes the proof. \(\square \)
C.2 Control of the residual term in an Edgeworth expansion under Assumption 2
Lemma 15 can be improved in the i.i.d. framework. To do so, we introduce analogues of \(R^\textrm{inid}_n(t,\epsilon ),\) \(P_{1,n}(\epsilon ),\) \(e_{1,n}(\epsilon )\) and \(U_{1,2,n}(t)\) defined by
Note that
where
Lemma 16
Under Assumption 2, for every \(\epsilon \in (0,1/3)\) and t such that \(|t|\le \sqrt{2\epsilon } (n/K_{4,n})^{1/4},\)
Proof of Lemma 16
This proof is very similar to that of Lemma 15. We note that \(B_n = \sigma \sqrt{n}.\) As before, using two Taylor-Lagrange expansions successively, we can write that for every \(\epsilon \in (0,1/3)\) and t such that \(|t| \le \sqrt{2\epsilon } (n/K_{4,n})^{1/4}\)
where
and \(\theta _{1,n}(t)\) and \(\theta _{2,n}(t)\) are two complex numbers with modulus bounded by 1. Using a third Taylor-Lagrange expansion, we can write that for some complex \(\theta _{3,n}(t)\) with modulus bounded by \(\exp \left( \frac{K_{4,n}t^4}{24n} + \frac{n|U_{1,n}(t)|^2}{2(1-3\epsilon )^2} \right) ,\) the following holds
Using the triangle inequality and its reverse version in addition to the condition \(|t| \le \sqrt{2\epsilon } (n/K_{4,n})^{1/4},\) we obtain
We can decompose \(n U_{1,n}(t)^2\) as
Combining Eqs. C45 and C46 and grouping terms, we conclude that for every \({\epsilon \in (0,1/3)}\) and t such that \(|t| \le \sqrt{2\epsilon }(n/K_{4,n})^{1/4},\)
\(\Box \)
C.3 Bound on integrated \(R^\textrm{inid}_n\) and \(R^\textrm{iid}_n\)
C.3.1 Bound on integrated \(R^\textrm{inid}_n\)
Our goal in this section is to compute a bound on
where
where
Lemma 17
For any \(p > 0\), \(\int _0^{+\infty } u^p e^{-u^2/2} du = 2^{(p-1)/2} \Gamma \big ((p+1)/2 \big )\).
Proof
We use the change of variable \(v = u^2/2\), \(u = \sqrt{2v}\), \(dv = u du\), \(du = dv/\sqrt{2v}\), so that
and, by definition of \(\Gamma (\cdot )\), this is equal to \(2^{(p-1)/2} \Gamma \big ((p+1)/2 \big )\) as claimed.
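Lemma 17 can be confirmed by numerical quadrature; a small Python sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def half_gaussian_moment(p):
    # int_0^infty u^p exp(-u^2 / 2) du, computed by quadrature
    val, _ = quad(lambda u: u ** p * np.exp(-u ** 2 / 2), 0.0, np.inf)
    return val

# Compare against the closed form 2^{(p-1)/2} Gamma((p+1)/2) of Lemma 17
for p in (1.0, 2.5, 4.0, 6.0):
    closed_form = 2 ** ((p - 1) / 2) * gamma((p + 1) / 2)
    assert abs(half_gaussian_moment(p) - closed_form) < 1e-8
```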
By Lemma 17, we get the following equalities
When skewness is not ruled out, \(\overline{R}^\textrm{inid}_n(\epsilon )\) can be written as a polynomial in n with coefficients \(a_{k,n}\) that still depend on n, but only through the moments \(\lambda _{3,n}\) and \(K_{4,n}\) (and are therefore constant when the distribution of the observations does not change with the sample size)
When \(\mathbb {E}[X_i^3] = 0\) for every i, which implies \(\lambda _{3,n} = 0\), simplifications occur so that we get
C.3.2 Bound on integrated \(R^\textrm{iid}_n\)
Our goal in this section is to compute a bound on
where
By Lemma 17, we get
When skewness is not ruled out and \(K_{4,n} = O(1)\), the previous equalities show that \(\overline{R}^\textrm{iid}_n(\epsilon )\) is of order \(n^{-3/2}\) for any \(\epsilon \in (0,1/3)\). When \(\lambda _{3,n} = 0\), we get an improved rate equal to \(n^{-2}\).
In our main theorems, we set \(\epsilon = 0.1\). In that case, we can get two explicit bounds on \(\overline{R}^\textrm{iid}_n(0.1)\). When skewness is not ruled out, the bound \(e_{2,n}\) can be written as in Eq. 7. The quantity \(e_{2,n}(0.1)\) that appears in the two previous expressions can be upper bounded by
where \(P_{2,n}(0.1)\) itself satisfies
C.4 Bounding incomplete Gamma-like integrals
For every \(p \ge 1\), \(0 \le l, m \le q\) and \(T > 0\), we define \(J_1\), \(J_2\), and \(J_3\) by
We show now that all these integrals can be bounded by differences of incomplete Gamma functions.
Lemma 18
We have
Proof of Lemma 18
Without loss of generality, we assume \(l \le m\). By the first inequality in (A2), we get
where we used the change of variable \(v=u^2/2\), \(dv = u du\), so that \(du = dv / \sqrt{2v}\). The proof is completed by recognizing that the last integral can be written as a difference of two incomplete Gamma functions. \(\square \)
Lemma 19
Let \(\Delta {:=} (1 - 4 \chi _1 - \sqrt{K_{4,n}/n}) / 2\) and \(\gamma (a, x) {:=} \int _0^x |v|^{a-1} \exp (-v)\, dv\). We have
Proof of Lemma 19
Without loss of generality, we assume \(l \le m\). Using the first inequality in (A2) and the fact that \(0 \le u/q \le 1\) when \(u \in [l,m]\), we get
We can then write
If \(\Delta \ne 0\), we do the change of variable \(v = u^2 \Delta \), \(dv = 2 \Delta u du\), \(u = \sqrt{v / \Delta }\), \(du = (2 \sqrt{v \Delta })^{-1} dv\), and get
where we remarked that \(v / \Delta > 0\) in the sense that either \(\Delta > 0\) and in this case \(v > 0\) as well; or \(\Delta < 0\) and \(v < 0\) as well. Finally, we get
If \(\Delta \ne 0\), the bound can be rewritten as
Lemma 20
If \(n \ge 3\), then
Proof of Lemma 20
Without loss of generality, we assume \(l \le m\). Using the first inequality in (A2), we get
We bound \(u/q\) by 1, so that
Note that \(1 - 4 \chi _1 - 1/n > 1/4\) when \(n \ge 3\). When this is the case, using the same change of variable and computations, we get the same result as for the previous lemma. \(\square \)
C.5 Statement and proof of Proposition 21
A bound on the tail of the characteristic function is nearly equivalent to a regularity condition on the density. We detail this in the following proposition. The first part of this proposition is taken from (Ushakov, 2011, Theorem 2.5.4) (see also Ushakov and Ushakov (1999)).
Proposition 21
Let \(p \ge 1\) be an integer, Q be a probability measure that admits a density q with respect to Lebesgue’s measure, and \(f_Q\) its corresponding characteristic function.
-
1.
If q is \((p-1)\) times differentiable and \(q^{(p-1)}\) is a function with bounded variation, then
$$\begin{aligned} |f_Q(t)| \le \frac{\textrm{Vari}[q^{(p-1)}]}{|t|^p}, \end{aligned}$$where \(\textrm{Vari}[\psi ]\) denotes the total variation of a function \(\psi \).
-
2.
If \(t \mapsto |t|^{p-1} |f_Q(t)|\) is integrable on a neighborhood of \(+ \infty \), then q is \((p-1)\) times differentiable.
Remark that the existence of \(C>0\) and \(\beta > 1\) such that \(|f_Q(t)| \le C / \big ( |t|^p \log (|t|)^\beta \big )\) is sufficient to satisfy the integrability condition in the second part of Proposition 21.
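As an illustration of the first part, take Q uniform on \([0,1]\) with \(p = 1\): the density jumps up by 1 at 0 and down by 1 at 1, so \(\textrm{Vari}[q] = 2\) and the proposition gives \(|f_Q(t)| \le 2/|t|\). A quick numerical check in Python:

```python
import numpy as np

def f_uniform(t):
    # Characteristic function of the uniform distribution on [0, 1]
    return (np.exp(1j * t) - 1.0) / (1j * t)

t = np.linspace(0.5, 200.0, 2000)
# Proposition 21 with p = 1 and Vari[q] = 2 predicts |f_Q(t)| <= 2 / |t|
assert np.all(np.abs(f_uniform(t)) <= 2.0 / np.abs(t) + 1e-12)
```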
Proof of Proposition 21, part 2
The assumed integrability condition implies that \(f_Q\) is absolutely integrable, and therefore we can apply the inversion formula (Ushakov, 2011, Theorem 1.2.6) so that for any \(x \in \mathbb {R}\),
where \(r(x,t) {:=} \dfrac{1}{2 \pi } e^{-itx} f_Q(t)\). Note that r is infinitely differentiable with respect to x, and that
which is integrable with respect to t by assumption. Differentiation under the integral sign then shows that q is \((p-1)\) times differentiable, concluding the proof.
Derumigny, A., Girard, L. & Guyonvarch, Y. Explicit Non-Asymptotic Bounds for the Distance to the First-Order Edgeworth Expansion. Sankhya A 86, 261–336 (2024). https://doi.org/10.1007/s13171-023-00320-y