1 Quadratic Forms

All three gems of probability theory—the law of large numbers, the central limit theorem and the law of the iterated logarithm—concern the asymptotic behavior of sums of random variables. It is natural to extend these results to functionals of the sums, in particular to quadratic forms. Moreover, in mathematical statistics there are numerous asymptotic problems which can be formulated in terms of quadratic or almost quadratic forms. In this article we review the corresponding results with rates of convergence. Some of these results are optimal and cannot be further improved without additional conditions. The review does not pretend to illuminate completely the present state of the area under consideration; it reflects mainly the authors' interests.

Let \(X,X_{1},X_{2},\ldots\) be independent identically distributed random elements with values in a real separable Hilbert space H. The dimension of H, say dim(H), may be either infinite or finite. Let (x, y) for x, y ∈ H denote the inner product in H and put \(\vert x\vert = {(x,x)}^{1/2}\). We assume that \(\mathbf{E}\vert X_{1}{\vert }^{2} < \infty \) and denote by V the covariance operator of X_1:

$$(V x,y) = \mathbf{E}(X_{1} -\mathbf{E}X_{1},x)(X_{1} -\mathbf{E}X_{1},y).$$

Let \(\sigma _{1}^{2} \geq \sigma _{2}^{2} \geq \ldots\) be the eigenvalues of V and let \(e_{1},e_{2},\ldots\) be the corresponding eigenvectors which we assume to be orthonormal.

For any integer k > 0 we put

$$\displaystyle\begin{array}{rcl} c_{k}(V ) =\displaystyle\prod _{i=1}^{k}\sigma _{i}^{-1},\qquad \overline{c}_{k}(V ) = {\Bigl(\displaystyle\prod _{i=1}^{k}\sigma _{i}^{-1}\Bigr)}^{(k-1)/k}.& &\end{array}$$
(1)

In what follows we use c and c( ⋅ ), with or without indices, to denote absolute constants and constants depending on the parameters in brackets. Except for c_i(V) and \(\overline{c}_{i}(V )\), the same symbol may be used for different constants.
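
For concreteness, the constants in (1) are straightforward to compute numerically; below is a minimal sketch (Python with NumPy; the eigenvalue sequence is a hypothetical example):

```python
import numpy as np

# Hypothetical leading eigenvalues sigma_1^2 >= sigma_2^2 >= ... of V
sigma_sq = np.array([1.0, 0.7, 0.5, 0.3, 0.2, 0.1])

def c_k(sigma_sq, k):
    """c_k(V) = prod_{i<=k} sigma_i^{-1}, cf. (1)."""
    return float(np.prod(sigma_sq[:k] ** -0.5))

def c_bar_k(sigma_sq, k):
    """bar c_k(V) = (prod_{i<=k} sigma_i^{-1})^{(k-1)/k}, cf. (1)."""
    return c_k(sigma_sq, k) ** ((k - 1) / k)

print(c_k(sigma_sq, 6), c_bar_k(sigma_sq, 6))
```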

We define

$$S_{n} = {n}^{-1/2}{\sigma }^{-1}\displaystyle\sum _{i=1}^{n}(X_{i} -\mathbf{E}X_{i}),$$

where \(\sigma^{2} = \mathbf{E}\vert X_{1} -\mathbf{E}X_{1}{\vert }^{2}\). Without loss of generality we may assume that \(\mathbf{E}X_{1} = 0\) and \(\mathbf{E}\vert X_{1}{\vert }^{2} = 1\); the general case reduces to this one by considering \((X_{i} -\mathbf{E}X_{i})/\sigma\) instead of X_i, \(i = 1,2,\ldots\). Let Y be an H-valued Gaussian (0, V) random element. We denote the distributions of S_n and Y by P_n and Q respectively.

The central limit theorem asserts that

$$P_{n}(B) - Q(B) \rightarrow 0$$

for any Borel set B in H provided Q(∂B) = 0, where ∂B is the boundary of B. An estimate of the rate of convergence in the central limit theorem is a bound on the quantity \(\sup _{\mathcal{A}}\vert P_{n}(A) - Q(A)\vert \) for various classes \(\mathcal{A}\) of measurable sets A.

The most famous is the Berry-Esseen bound (see [5, 9]) when H = R, i.e. dim(H) = 1, and \(\mathcal{A} = \mathcal{A}_{1} =\{ (-\infty ,x),\,x \in \mathbb{R}\}\):

$$\displaystyle\begin{array}{rcl} \sup _{\mathcal{A}_{1}}\vert P_{n}(A) - Q(A)\vert \leq c\,\frac{\mathbf{E}\vert X_{1}{\vert }^{3}} {\sqrt{n}}.& &\end{array}$$
(2)

The bound is optimal with respect to the dependence on n and on the moments of X_1. A lower bound for the constant c in (2) is known (see [11]):

$$c \geq \frac{3 + \sqrt{10}} {6\sqrt{2\pi }} = 0.40\ldots$$

The best current upper bounds for c, c ≤ 0.47 (see [41, 43]), still differ slightly from the lower bound.
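
To see the Berry-Esseen rate numerically, here is a hedged Monte Carlo sketch (Python with NumPy; the Rademacher choice of X is a hypothetical example, and the estimate itself carries sampling error):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def kolmogorov_gap(n, reps=100_000):
    """Monte Carlo estimate of sup_x |P(S_n <= x) - Phi(x)| for Rademacher
    summands X_i = +-1, for which sigma = E|X_1|^3 = 1."""
    s = np.sort(rng.choice([-1.0, 1.0], size=(reps, n)).sum(axis=1) / sqrt(n))
    phi = 0.5 * (1.0 + np.vectorize(erf)(s / sqrt(2.0)))  # Phi at sample points
    i = np.arange(reps)
    return max(np.max((i + 1) / reps - phi), np.max(phi - i / reps))

for n in (4, 16, 64):
    print(n, kolmogorov_gap(n), 0.47 / sqrt(n))  # observed gap vs the bound (2)
```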

In the multidimensional case, when H = R^d, i.e. dim(H) = d > 1, it is possible to extend the class \(\mathcal{A}\) to the class of all convex Borel sets in H and to get the bound (see e.g. [2, 33])

$$\sup _{\mathcal{A}}\vert P_{n}(A) - Q(A)\vert \leq c\,\sqrt{d}\,\frac{\mathbf{E}\vert X_{1}{\vert }^{3}} {\sigma _{d}^{3}\,\sqrt{n}}.$$

If we consider an infinite dimensional space H and take \(\mathcal{A}\) to be the class of all half-spaces in H, then one can show (see e.g. pp. 69–70 in [34]) that there exists a distribution of X_1 such that

$$\displaystyle\begin{array}{rcl} \sup _{\mathcal{A}}\vert P_{n}(A) - Q(A)\vert \geq 1/2.& &\end{array}$$
(3)

Therefore, in the infinite dimensional case upper bounds for \(\sup _{\mathcal{A}}\vert P_{n}(A) - Q(A)\vert \) can be constructed only for relatively narrow classes \(\mathcal{A}\), e.g. the class of all balls \(B(a,x) =\{ y : y \in H\ \mbox{and}\ \vert y - a{\vert }^{2} \leq x\}\) with fixed center a, or the class of all balls with fixed bounded radius \(\sqrt{x}\). The good news, however, is that numerous asymptotic problems in statistics can be reformulated in terms of these or similar classes (see e.g. Sect. 2).

Put for any a ∈ H

$$F(x) = P_{n}(B(a,x)),\,\,F_{0}(x) = Q(B(a,x)),\,\,\delta _{n}(a) =\sup _{x}\vert F(x) - F_{0}(x)\vert.$$

According to (3), it is impossible to prove an upper bound for \(\sup _{a}\delta _{n}(a)\) which tends to 0 as n → ∞. The upper bound for δ_n(a) has to depend on a and, in general, deteriorates as | a | grows.
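
To make δ_n(a) concrete, the following Monte Carlo sketch (Python with NumPy; the dimension, the sample size and the distribution of X are hypothetical choices) compares the empirical laws of |S_n − a|² and |Y − a|²; the printed value approximates δ_n(a) up to sampling error:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, reps = 3, 50, 20_000
a = np.array([1.0, 0.0, 0.0])  # a fixed center

# Hypothetical distribution: independent +-1/sqrt(d) coordinates, so that
# E X = 0, E|X|^2 = 1 and V = I/d.
X = rng.choice([-1.0, 1.0], size=(reps, n, d)) / np.sqrt(d)
S = X.sum(axis=1) / np.sqrt(n)                           # samples of S_n
Y = rng.normal(scale=1.0 / np.sqrt(d), size=(reps, d))   # Gaussian (0, V)

F = np.sort(((S - a) ** 2).sum(axis=1))    # samples of |S_n - a|^2
F0 = np.sort(((Y - a) ** 2).sum(axis=1))   # samples of |Y - a|^2

# Kolmogorov distance between the two empirical laws ~ delta_n(a)
grid = np.linspace(0.0, 10.0, 4001)
gap = np.abs(np.searchsorted(F, grid) - np.searchsorted(F0, grid)) / reps
print(f"Monte Carlo estimate of delta_n(a): {gap.max():.3f}")
```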

The history of constructing bounds for δ n (a) in the infinite dimensional case can be divided roughly into three phases: proving bounds with optimal

  • Dependence on n;

  • Moment conditions;

  • Dependence on the eigenvalues of V.

The first phase started in the mid-1960s with bounds of logarithmic order for δ_n(a) (see [27]) and ended with the result

$$\delta _{n}(a) = \mathcal{O}({n}^{-1/2}),$$

due to Götze [12], which was based on a Weyl type symmetrization inequality (see Lemma 3.37 (i) in [12]):

Let X, Y, Z be independent random elements in H. Then

$$\displaystyle\begin{array}{rcl} \bigl \vert \mathbf{E}\exp \{i\tau \vert X + Y + Z{\vert }^{2}\}\bigr \vert \leq {\bigl (\mathbf{E}\exp \{2i\tau (\widetilde{X},\widetilde{Y })\}\bigr )}^{1/4},& &\end{array}$$
(4)

where \(\widetilde{X}\) is the symmetrization of X, i.e. \(\widetilde{X} = X - X^\prime\), where X′ is an independent copy of X. The main point of the inequality is that it reduces the original problem, in which the exponent depends on X non-linearly, to a problem with linear dependence on X. The inequality has since been successfully applied and developed by a number of authors.
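
A quick numerical sanity check of (4) can be run as follows (a sketch only; the standard normal choice of X, Y, Z in R² and the value of τ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps, tau = 2, 400_000, 0.7

# Hypothetical choice: X, Y, Z independent standard normal in R^d
X, Y, Z = (rng.normal(size=(reps, d)) for _ in range(3))
lhs = abs(np.mean(np.exp(1j * tau * ((X + Y + Z) ** 2).sum(axis=1))))

# Symmetrizations: X~ = X - X', Y~ = Y - Y' with independent copies X', Y'
Xs = X - rng.normal(size=(reps, d))
Ys = Y - rng.normal(size=(reps, d))
# E exp{2i tau (X~, Y~)} is real and non-negative in theory; take the real
# part of the Monte Carlo mean and clip at zero before the 1/4 power.
inner = np.mean(np.exp(2j * tau * (Xs * Ys).sum(axis=1))).real
rhs = max(inner, 0.0) ** 0.25

print(f"LHS {lhs:.4f} <= RHS {rhs:.4f}")  # illustrates inequality (4)
```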

The second phase ended with a paper by Yurinskii [48], who proved

$$\delta _{n}(a) \leq \frac{c(V )} {\sqrt{n}} \;(1 + \vert a{\vert }^{3})\;\mathbf{E}\vert X_{ 1}{\vert }^{3},$$

where c(V) denotes a constant depending on V only. Yurinskii's result has the optimal dependence on n under a minimal moment condition, but the dependence of c(V) on the characteristics of the operator V was still unsatisfactory.

At the end of the third phase it was proved (see [28, 36, 39]) that

$$\displaystyle\begin{array}{rcl} \delta _{n}(a) \leq \frac{c\;c_{6}(V )} {\sqrt{n}} \;(1 + \vert a{\vert }^{3})\;\mathbf{E}\vert X_{ 1}{\vert }^{3},& &\end{array}$$
(5)

where c_6(V) is defined in (1). It is known (see Example 3 in [38]) that for any c_0 > 0 and for any given eigenvalues \(\sigma _{1}^{2},\ldots ,\sigma _{6}^{2} > 0\) of a covariance operator V there exist a vector a ∈ H = R^7, | a | > c_0, and a sequence \(X_{1},X_{2},\ldots\) of i.i.d. random elements in H = R^7 with zero mean and covariance operator V such that

$$\displaystyle\begin{array}{rcl} \liminf _{n\rightarrow \infty }\sqrt{ n}\;\delta _{n}(a) \geq c\;c_{6}(V )\;(1 + \vert a{\vert }^{3})\;\mathbf{E}\vert X_{ 1}{\vert }^{3}.& &\end{array}$$
(6)

Due to (6), the bound (5) is the best possible in the case of a finite third moment of | X_1 |. For further refinements see e.g. [40]; for results in the case of non-identically distributed random elements in H see [44].

At the same time, better approximations for F(x) are available if we include an additional term, say F_1(x), of its asymptotic expansion. This term F_1(x) is defined as the unique function satisfying \(F_{1}(-\infty ) = 0\) whose Fourier–Stieltjes transform equals

$$\displaystyle\begin{array}{rcl} \hat{F}_{1}(t)& =& -\frac{2{t}^{2}}{3\sqrt{n}}\,\mathbf{E}\,e\{t\vert Y - a{\vert }^{2}\}\left (3(X,Y - a)\vert X{\vert }^{2} + 2it\,{(X,Y - a)}^{3}\right ).\end{array}$$
(7)

Here and in the following X and Y are independent and we write \(e\{x\} =\exp \{ ix\}.\)

In the case dim(H) < ∞ the term F_1(x) can be defined in terms of the density function of the normal distribution (see [6]). Let \(\varphi\) denote the standard normal density in R^d. Then the density function p(y) of the normal distribution Q is given by \(p(y) =\varphi ({V }^{-1/2}y)/\sqrt{\det V},\,\,y \in {\mathbf{R}}^{d}\). We have

$$F_{1}(x) = \frac{1} {6\,\sqrt{n}}\,\chi (A_{x}),\quad A_{x} =\{ u \in {\mathbf{R}}^{d} : \vert u - a{\vert }^{2} \leq x\}$$

with the signed measure

$$\chi (A) =\displaystyle\int _{A}\mathbf{E}\,{p}^{\prime\prime}(y)\,{X}^{3}\,dy\quad \mbox{ for the Borel sets}\quad A \subset {\mathbf{R}}^{d}$$

and

$${p}^{\prime\prime}(y)\,{u}^{3} = p(y)(3({V }^{-1}u,u)({V }^{-1}y,u) - {({V }^{-1}y,u)}^{3})$$

is the third Fréchet derivative of p in the direction u.
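
For instance, in the standard Gaussian case V = I_d direct substitution gives

$$p(y) =\varphi (y),\qquad {p}^{\prime\prime}(y)\,{u}^{3} =\varphi (y)\bigl(3\vert u{\vert }^{2}(y,u) - {(y,u)}^{3}\bigr).$$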

Introduce the error

$$\Delta _{n}(a) =\sup _{x}\vert F(x) - F_{0}(x) - F_{1}(x)\vert.$$

Note that \(\hat{F}_{1}(t) = 0\), and hence F_1(x) = 0, when a = 0 or when X has a symmetric distribution, i.e. when X and − X are identically distributed. Therefore, we get

$$\Delta _{n}(0) =\delta _{n}(0).$$

Similarly to the development of the bounds for δ_n(a), the first task consisted in deriving bounds for \(\Delta _{n}(a)\) with the optimal dependence on n. The starting point was a seminal paper by Esseen [10], who proved for the finite dimensional spaces H = R^d, d < ∞, that

$$\displaystyle\begin{array}{rcl} \Delta _{n}(0) = \mathcal{O}({n}^{-d/(d+1)}),& &\end{array}$$
(8)

A comparable bound

$$\Delta _{n}(0) = \mathcal{O}({n}^{-\gamma })$$

with \(\gamma = 1 - \epsilon \) for any \(\epsilon > 0\) was finally proved in [12, 13], based on the Weyl type inequalities mentioned above. Further refinements and generalizations in the case a ≠ 0 and γ < 1 are due to Nagaev and Chebotarev [29] and to Sazonov et al. [35].

Note, however, that the results in the infinite dimensional case did not even yield (8) as a corollary when \(\sigma _{d+1} = 0\), i.e. when dim(H) = d. Only 50 years after Esseen's result were the optimal bounds (in n) finally established in [3]:

$$\displaystyle\begin{array}{rcl} \Delta _{n}(0) \leq \frac{c(9,V )} {n} \;\mathbf{E}\vert X_{1}{\vert }^{4},& &\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} \Delta _{n}(a) \leq \frac{c(13,V )} {n} \;(1 + \vert a{\vert }^{6})\;\mathbf{E}\vert X_{ 1}{\vert }^{4},& &\end{array}$$
(10)

where \(c(i,V ) \leq \exp \{ c\sigma _{i}^{-2}\},\;\;i = 9,13\); in the case of the bound (9) it was additionally assumed that the distribution of X_1 is symmetric. In order to derive these bounds, new techniques were developed, in particular the so-called multiplicative inequality for characteristic functions (see Lemma 3.2, Theorem 10.1 and formulas (10.7)–(10.8) in [4]):

Let \(\varphi (t),\;t \geq 0,\) denote a continuous function such that \(0 \leq \varphi \leq 1\). Assume that

$$\varphi (t)\;\varphi (t+\tau ) \leq \theta \,{\mathcal{M}}^{d}(\tau ,N)$$

for all t ≥ 0 and τ > 0 with some θ ≥ 1 independent of t and τ, where

$$\mathcal{M}(t,n) = 1/\sqrt{\vert t\vert n} + \sqrt{\vert t\vert }\,\mbox{ for}\,\vert t\vert > 0.$$

Then for any 0 < B ≤ 1 and N ≥ 1

$$\displaystyle\begin{array}{rcl} \displaystyle\int _{B/\sqrt{N}}^{1}\frac{\varphi (t)} {t} \;dt \leq c(d)\;\theta \;({N}^{-1} + {(B\sqrt{N})}^{-d/2})\;\;\;\mbox{ for}\;\;\;d > 8.& &\end{array}$$
(11)

The earlier Weyl type inequality (4) gave bounds for the integrals

$$\displaystyle\int _{D(n,\gamma )}\frac{\mathbf{E}e\{t\vert S_{n} - a{\vert }^{2}\}} {\vert t\vert } \,dt$$

for the regions \(D(n,\gamma ) =\{ t : {n}^{1/2} < \vert t\vert \leq {n}^{\gamma }\}\) with γ < 1 only, while (11) enables one to extend the region of integration up to γ = 1.

The bounds (9) and (10) are optimal with respect to the dependence on n [14] and on the moments. The bound (9) also improves Esseen's result (8) for the Euclidean spaces R^d with d > 8. However, the dependence on the covariance operator V in (9) and (10) could be improved. Nagaev and Chebotarev [30] considered the case a = 0 and obtained a bound of type (9) with c(9, V) replaced by the following function c(V):

$$c(V ) = c\;\left (\overline{c}_{13}(V ) + {(c_{9}(V ))}^{4/9}\sigma _{ 9}^{-6}\right ),$$

where \(\overline{c}_{13}(V )\) and c_9(V) are defined by (1). The general case a ≠ 0 was considered in [31] (see their Theorem 1.2). The Nagaev and Chebotarev results improve the dependence on the eigenvalues of V (compared to (10)) but still require that σ_13 > 0 instead of the weaker condition σ_9 > 0 in (9). However, it follows from Lemma 2.6 in [17] that for any given eigenvalues \(\sigma _{1}^{2},\ldots ,\sigma _{12}^{2} > 0\) of a covariance operator V there exist a ∈ H = R^13, | a | > 1, and a sequence \(X_{1},X_{2},\ldots\) of i.i.d. random elements in H = R^13 with zero mean and covariance operator V such that

$$\displaystyle\begin{array}{rcl} \liminf _{n\rightarrow \infty }n\;\Delta _{n}(a) \geq c\;c_{12}(V )\;(1 + \vert a{\vert }^{6})\;\mathbf{E}\vert X_{ 1}{\vert }^{4}.& &\end{array}$$
(12)

A bound depending only on the 12 largest eigenvalues of the operator V was obtained in [46] (for a first version see Corollary 1.3 in [17]). Moreover, in [46] the dependence on the eigenvalues is given in explicit form, which coincides with the form given by the lower bound (12):

Theorem 1.1.

There exists an absolute constant c such that for any a ∈ H

$$\displaystyle\begin{array}{rcl} \Delta _{n}(a)& \leq & \frac{c} {n} \cdot c_{12}(V ) \cdot {\bigl (\mathbf{E}\vert X_{1}{\vert }^{4} + \mathbf{E}{(X_{ 1},a)}^{4}\bigr )} \\ & & \,\,\,\times {\bigl (1 + (V a,a)\bigr )}, \end{array}$$
(13)

where c_12(V) is defined in (1).

According to the lower bound (12) the estimate (13) is the best possible in the following sense:

  • It is impossible for \(\Delta _{n}(a)\) to be of order \(\mathcal{O}({n}^{-1})\) uniformly over all distributions of X_1 with arbitrary eigenvalues \(\sigma _{1}^{2},\sigma _{2}^{2},\ldots\);

  • The form of the dependence of the right-hand side of (13) on the eigenvalues of V, on n and on \(\mathbf{E}\vert X_{1}{\vert }^{4}\) coincides with the one given in the lower bound.

For earlier versions of this result on the optimality of 12 eigenvalues and a detailed discussion of the connection of the rate problems in the central limit theorem with classical lattice point problems in analytic number theory, see the ICM-1998 Proceedings paper by Götze [14], and also Götze and Ulyanov [17].

Note, however, that in special 'symmetric' cases of the distribution of X_1, or of the center a of the ball, the number of eigenvalues necessary for optimal bounds may well decrease below 12. For example, when E(X, b)^3 = 0 for all b ∈ H, by Corollary 2.7 in [17], for any given eigenvalues \(\sigma _{1}^{2},\ldots ,\sigma _{8}^{2} > 0\) of a covariance operator V there exist a center a ∈ H = R^9, | a | > 1, and a sequence \(X,X_{1},X_{2},\ldots\) of i.i.d. random elements in H = R^9 with zero mean and covariance operator V such that

$$\liminf _{n\rightarrow \infty }n\;\Delta _{n}(a) \geq c\;c_{8}(V )\;(1 + \vert a{\vert }^{4})\;\mathbf{E}\vert X_{ 1}{\vert }^{4}.$$

Hence, in this case an upper bound of order \(\mathcal{O}({n}^{-1})\) for \(\Delta _{n}(a)\) has to involve at least the eight largest eigenvalues of V.

Furthermore, lower bounds for \(n\Delta _{n}(a)\) in the case a = 0 are not available. A conjecture (see [14]) said that in this case the first five eigenvalues of V suffice. That conjecture was confirmed by Theorem 1.1 in [19], with the result \(\Delta _{n}(0) = \mathcal{O}({n}^{-1})\) provided only that σ_5 > 0. Note that for some centered ellipsoids in R^d with d ≥ 5 bounds of order \(\mathcal{O}({n}^{-1})\) were obtained in [18]. Moreover, it was proved recently (see Corollary 2.4 in [20]) that even for a ≠ 0 we have \(\Delta _{n}(a) = \mathcal{O}({n}^{-1})\) when H = R^d, 5 ≤ d < ∞, and the upper bound for \(\Delta _{n}(a)\) is written in explicit form and depends on the smallest eigenvalue σ_d (see also Theorem 1.4 in [21]). It is necessary to emphasize that (13) implies \(\Delta _{n}(a) = \mathcal{O}({n}^{-1})\) for a general infinite dimensional space H with dependence on the first twelve eigenvalues of V only.

The proofs of the recent results due to Götze, Ulyanov and Zaitsev are based on the reduction of the original problem to lattice valued random vectors and on the symmetrization techniques developed in a number of papers, see e.g. Götze [12], Yurinskii [48], Sazonov et al. [35–37], Götze and Ulyanov [17], Bogatyrev et al. [7]. The proofs also use the new inequalities obtained in Lemma 6.5 in [20] and in [16] (see Lemma 8.2 in [20]). In fact, the bounds in [20] are constructed for more general quadratic forms of the type (ℚx, x) with a non-degenerate linear symmetric bounded operator ℚ in R^d.

One of the basic lemmas used to prove (13) is the following (see Lemma 2.2 in [17]):

Let T > 0, b ∈ R^1, b ≠ 0, and let l ≥ 1 be an integer. Let \(Y = (Y _{1},\ldots ,Y _{2l})\) be a Gaussian random vector with values in \({\mathbf{R}}^{2l}\), where \(Y _{1},\ldots ,Y _{2l}\) are independent with E Y_i = 0 and \(\mathbf{E}Y _{i}^{2} =\sigma _{ i}^{2}\) for \(i = 1,2,\ldots ,2l\); let \(\sigma _{1}^{2} \geq \sigma _{2}^{2} \geq \ldots \geq \sigma _{2l}^{2} > 0\) and a ∈ R^{2l}. Then there exists a positive constant c = c(l) such that

$$\Bigl \vert \displaystyle\int _{-T}^{T}{s}^{l-1}\mathbf{E}\exp \{is\vert Y + a{\vert }^{2}\}{e}^{ibs}\,ds\Bigr \vert \leq c\displaystyle\prod _{j=1}^{2l}\sigma _{j}^{-1}.$$

For non-uniform bounds involving 12 eigenvalues of the covariance operator V see [7].

For estimates of the characteristic functions of polynomials (of order higher than 2) in asymptotically normal random variables see [22]; for related results see also [23].

2 Applications in Statistics: Almost Quadratic Forms

In this section we consider the accuracy of approximations for the distributions of sums of independent random elements in (k − 1)-dimensional Euclidean space. The approximation is considered on a class of sets which are "similar" to ellipsoids. This class arises in the study of the asymptotic behavior of goodness-of-fit test statistics, namely the power divergence family of statistics.

Consider a vector \({(Y _{1},\ldots ,Y _{k})}^{T}\) with multinomial distribution M k (n, π), i. e.

$$\Pr (Y _{1} = n_{1},\ldots ,Y _{k} = n_{k}) = \left \{\begin{array}{@{}l@{\quad }l@{}} n!\displaystyle\prod \nolimits _{j=1}^{k}(\pi _{j}^{n_{j}}/n_{j}!),\quad &n_{j} = 0,1,\ldots ,n\ (j = 1,\ldots ,k) \\ \quad &\mbox{ and}\displaystyle\sum \nolimits _{j=1}^{k}n_{j} = n, \\ 0, \quad &\mbox{ otherwise,} \end{array} \right.$$

where \(\pi = {(\pi _{1},\ldots ,\pi _{k})}^{T},\pi _{j} > 0,\sum _{j=1}^{k}\pi _{j} = 1\). From this point on we assume the validity of the hypothesis \(H_{0}: \pi =\boldsymbol{ p}\). Since the Y_j sum to n, we can express this multinomial distribution in terms of the vector \(\boldsymbol{Y } = (Y _{1},\ldots ,Y _{k-1})\); we denote its covariance matrix by \(\Omega \). It is known that \(\Omega = (\delta _{i}^{j}\,p_{i} - p_{i}p_{j}) \in {\mathbf{R}}^{(k-1)\times (k-1)}\), where \(\delta _{i}^{j}\) is the Kronecker delta. The main object of the current study is the power divergence family of goodness-of-fit test statistics:

$$t_{\lambda }(\boldsymbol{Y }) = \frac{2} {\lambda (\lambda +1)}\displaystyle\sum _{j=1}^{k}Y _{j}\left [{\left ( \frac{Y _{j}} {np_{j}}\right )}^{\lambda } - 1\right ],\ \lambda \in \mathbf{R}.$$
(14)

When \(\lambda = 0\) or \(\lambda = -1\), this notation should be understood as the result of passage to the limit.

These statistics were first introduced in [8] and [32]. Putting \(\lambda = 1\), \(\lambda = -1/2\) and λ = 0 we obtain the chi-squared statistic, the Freeman–Tukey statistic, and the log-likelihood ratio statistic respectively.
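
As an illustration, here is a minimal sketch of the family (Python with NumPy; the observed counts and cell probabilities are hypothetical, and the λ = 0, −1 branches implement the limits mentioned above):

```python
import numpy as np

def power_divergence(y, p, lam):
    """t_lambda(Y) of (14) for observed counts y and hypothesized cell
    probabilities p; lam = 0 and lam = -1 are treated as limits."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    n = y.sum()
    if lam == 0:    # limit: log-likelihood ratio statistic 2*sum y log(y/np)
        m = y > 0
        return 2.0 * (y[m] * np.log(y[m] / (n * p[m]))).sum()
    if lam == -1:   # limit: 2*sum np log(np/y), requires all y > 0
        return 2.0 * (n * p * np.log(n * p / y)).sum()
    return 2.0 / (lam * (lam + 1)) * (y * ((y / (n * p)) ** lam - 1.0)).sum()

y = [18, 25, 32, 25]                 # hypothetical observed counts
p = [0.25, 0.25, 0.25, 0.25]
for lam in (1.0, -0.5, 0.0):         # Pearson, Freeman-Tukey, log-likelihood
    print(lam, power_divergence(y, p, lam))
```

SciPy's scipy.stats.power_divergence implements the same family and can serve as a cross-check.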

We consider the transformation

$$X_{j} = (Y _{j} -\mathit{np}_{j})/\sqrt{n},\ j = 1,\ldots ,k,\ r = k - 1,\ \boldsymbol{X} = {(X_{1},\ldots ,X_{r})}^{T}.$$

The vector \(\boldsymbol{X}\) takes values on the lattice

$$L = \left \{\boldsymbol{x} = {(x_{1},\ldots ,x_{r})}^{T} :\ \boldsymbol{x} = \frac{\boldsymbol{m} - n\boldsymbol{p}} {\sqrt{n}},\ \boldsymbol{p} = {(p_{1},\ldots ,p_{r})}^{T},\ \boldsymbol{m} = {(n_{1},\ldots ,n_{r})}^{T}\right \},$$

where the n_j are non-negative integers.

The statistic \(t_{\lambda }(\boldsymbol{Y })\) can be expressed as a function of \(\boldsymbol{X}\) in the form

$$T_{\lambda }(\boldsymbol{X}) = \frac{2n} {\lambda (\lambda +1)}\left [\displaystyle\sum _{j=1}^{k}p_{ j}\left ({\left (1 + \frac{X_{j}} {\sqrt{n}p_{j}}\right )}^{\lambda +1} - 1\right )\right ],$$
(15)

and then, via the Taylor expansion, transformed to the form

$$T_{\lambda }(\boldsymbol{X}) =\displaystyle\sum _{i=1}^{k}\left (\frac{X_{i}^{2}} {p_{i}} + \frac{(\lambda -1)X_{i}^{3}} {3\sqrt{n}\,p_{i}^{2}} + \frac{(\lambda -1)(\lambda -2)X_{i}^{4}} {12\,p_{i}^{3}\,n}\right ) + O\left ({n}^{-3/2}\right ).$$

As we see, the statistic \(T_{\lambda }(\boldsymbol{X})\) is "close" to the quadratic form

$$T_{1}(\boldsymbol{X}) =\displaystyle\sum _{ i=1}^{k}\frac{X_{i}^{2}} {p_{i}} ,$$

considered in Sect. 1.
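
This closeness is easy to observe numerically; in the following sketch (Python with NumPy; the probabilities and λ are hypothetical choices) the mean absolute gap between T_λ and T_1 shrinks roughly like n^{−1/2}, in line with the Taylor expansion above:

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.3, 0.3, 0.2, 0.2])  # hypothetical cell probabilities
lam = -0.5                          # Freeman-Tukey; any real lambda works

for n in (100, 400, 1600):
    y = rng.multinomial(n, p, size=50_000).astype(float)
    t1 = (((y - n * p) ** 2) / (n * p)).sum(axis=1)  # quadratic form T_1
    tl = 2.0 / (lam * (lam + 1)) * (y * ((y / (n * p)) ** lam - 1.0)).sum(axis=1)
    print(n, np.abs(tl - t1).mean())  # decreases roughly like n^{-1/2}
```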

We call a set B ⊂ R^r an extended convex set if for all \(l = 1,\ldots ,r\) it can be expressed in the form:

$$B =\{\boldsymbol{ x} = {(x_{1},\ldots ,x_{r})}^{T} : \lambda _{l}({x}^{{\ast}}) < x_{l} < \theta _{l}({x}^{{\ast}})\ \mbox{ and }\ {x}^{{\ast}} = {(x_{1},\ldots ,x_{l-1},x_{l+1},\ldots ,x_{r})}^{T} \in B_{l}\},$$

where B_l is some subset of R^{r−1} and \(\lambda _{l}({x}^{{\ast}}),\theta _{l}({x}^{{\ast}})\) are continuous functions on R^{r−1}. Additionally, we introduce the following notation:

$$[h(\boldsymbol{x})]_{\lambda _{l}({x}^{{\ast}})}^{\theta _{l}({x}^{{\ast}})} = h(x_{1},\ldots ,x_{l-1},\theta _{l}({x}^{{\ast}}),x_{l+1},\ldots ,x_{r}) - h(x_{1},\ldots ,x_{l-1},\lambda _{l}({x}^{{\ast}}),x_{l+1},\ldots ,x_{r}).$$

It is known that the distributions of all statistics in the family converge to the chi-squared distribution with k − 1 degrees of freedom (see e.g. [8], p. 443). A more intriguing problem, however, is to find the rate of convergence to the limiting distribution.

For any bounded extended convex set B an asymptotic expansion was obtained in [47], which in [42] was converted to

$$\displaystyle\begin{array}{rcl} \Pr (\boldsymbol{X} \in B) = J_{1} + J_{2} + O({n}^{-1}).& &\end{array}$$
(16)

with

$$\displaystyle\begin{array}{rcl} & & J_{1} =\displaystyle\int \cdots \displaystyle\int \nolimits _{B}\phi (\boldsymbol{x})\left \{1 + \frac{1} {\sqrt{n}}\,h_{1}(\boldsymbol{x}) + \frac{1} {n}\,h_{2}(\boldsymbol{x})\right \}\,dx,\mbox{ where} \\ & & \qquad \qquad h_{1}(\boldsymbol{x}) = -\frac{1} {2}\displaystyle\sum _{j=1}^{k}\frac{x_{j}} {p_{j}} + \frac{1} {6}\displaystyle\sum _{j=1}^{k}x_{ j}{\left (\frac{x_{j}} {p_{j}}\right )}^{2}, \\ & & h_{2}(\boldsymbol{x}) = \frac{1} {2}\,h_{1}{(\boldsymbol{x})}^{2} + \frac{1} {12}\left (1 -\displaystyle\sum _{j=1}^{k} \frac{1} {p_{j}}\right ) + \frac{1} {4}\displaystyle\sum _{j=1}^{k}{\left (\frac{x_{j}} {p_{j}}\right )}^{2} - \frac{1} {12}\displaystyle\sum _{j=1}^{k}x_{ j}{\left (\frac{x_{j}} {p_{j}}\right )}^{3}; \\ \end{array}$$
$$\displaystyle\begin{array}{rcl} & & J_{2} = -\frac{1} {\sqrt{n}}\displaystyle\sum _{l=1}^{r}{n}^{-(r-l)/2}\displaystyle\sum _{x_{l+1}\in L_{l+1}}\cdots \displaystyle\sum _{x_{r}\in L_{r}} \\ & & \qquad \qquad \left [\displaystyle\int \cdots \displaystyle\int _{B_{l}}[S_{1}(\sqrt{n}\,x_{l} + np_{l})\,\phi (\boldsymbol{x})]_{\lambda _{l}({x}^{{\ast}})}^{\theta _{l}({x}^{{\ast}})}\,dx_{1}\cdots dx_{l-1}\right ];\end{array}$$
(17)
$$\displaystyle\begin{array}{rcl} & & L_{j} = \left \{\boldsymbol{x}: \,x_{j} = \frac{n_{j} -\mathit{np}_{j}} {\sqrt{n}} ,\ n_{j}\mbox{ and }p_{j}\mbox{ defined as before}\right \}; \\ & & S_{1}(x) = x -\lfloor x\rfloor - 1/2,\mbox{ $\lfloor x\rfloor $ is the integer part of $x$}; \\ & & \phi (\boldsymbol{x}) = \frac{1} {{(2\pi )}^{r/2}\vert \Omega {\vert }^{1/2}}\exp \left (-\frac{1} {2}\boldsymbol{{x}}^{T}{\Omega }^{-1}\boldsymbol{x}\right ).\end{array}$$

In [47] it was shown that \(J_{2} = O({n}^{-1/2})\).

Using elementary transformations it is easily shown that the determinant of the matrix \(\Omega \) equals \(\prod _{i=1}^{k}p_{i}\).
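
Indeed, \(\Omega = D -\boldsymbol{p}\,\boldsymbol{p}^{T}\) with \(D = \mathrm{diag}(p_{1},\ldots ,p_{k-1})\) and \(\boldsymbol{p} = {(p_{1},\ldots ,p_{k-1})}^{T}\), so the matrix determinant lemma gives

$$\det \Omega =\det D\,\bigl (1 -\boldsymbol{p}^{T}{D}^{-1}\boldsymbol{p}\bigr ) = \Bigl(\prod _{i=1}^{k-1}p_{i}\Bigr)\Bigl(1 -\sum _{i=1}^{k-1}p_{i}\Bigr) =\prod _{i=1}^{k}p_{i}.$$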

The expansion for the best known power divergence statistic, the chi-squared statistic, was also examined in [47]. Put \({B}^{\lambda } =\{\boldsymbol{ x}\mid T_{\lambda }(\boldsymbol{x}) < c\}\). It is easy to show that B^1 is an ellipsoid, which is a particular case of a bounded extended convex set. Yarnold managed to simplify the term (17) in this simple case and converted the expansion (16) to

$$\displaystyle\begin{array}{rcl} \Pr (\boldsymbol{X} \in {B}^{1})& =& G_{ r}(c) + ({N}^{1} - {n}^{r/2}{V }^{1}){e}^{-c/2}\Big/{\left ({(2\pi n)}^{r}\displaystyle\prod \nolimits _{ j=1}^{k}p_{ j}\right )}^{1/2} \\ & & +O({n}^{-1}), \end{array}$$
(18)

where G_r(c) is the chi-squared distribution function with r degrees of freedom, N^1 is the number of points of the lattice L in B^1, and V^1 is the volume of B^1. Using the result of Esseen [10], Yarnold obtained an estimate of the second term in (18) of the form \(O({n}^{-(k-1)/k})\). If we estimate the second term in (18) using the result of Götze [15] instead of Esseen's, we get (see [18]), in the case of the Pearson chi-squared statistic, i.e. when λ = 1, that for r ≥ 5

$$\displaystyle\begin{array}{rcl} \Pr (\boldsymbol{X} \in {B}^{1}) = G_{ r}(c) + O({n}^{-1}).& & \\ \end{array}$$
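
The second term in (18) is governed by the discrepancy N^1 − n^{r/2}V^1 between a lattice point count and a volume. The following sketch (Python with NumPy; the probabilities, the threshold c and the values of n are hypothetical) makes this discrepancy visible for r = 2, using V^1 = πc√(p_1 p_2 p_3), which follows from det Ω = ∏ p_i:

```python
import numpy as np
from math import pi, sqrt

p = np.array([0.3, 0.3, 0.4])  # hypothetical probabilities, k = 3, r = 2
c = 5.99                       # roughly the 95% point of chi-squared, 2 df

def count_vs_volume(n):
    """N^1 = #(L intersect B^1) versus n^{r/2} V^1 for the ellipsoid
    B^1 = {x : T_1(x) < c}; here V^1 = pi * c * sqrt(p_1 p_2 p_3)."""
    m1, m2 = np.meshgrid(np.arange(n + 1), np.arange(n + 1), indexing="ij")
    x1 = (m1 - n * p[0]) / sqrt(n)
    x2 = (m2 - n * p[1]) / sqrt(n)
    # X_3 = -(X_1 + X_2), so T_1 has the third term (x1 + x2)^2 / p_3
    t1 = x1**2 / p[0] + x2**2 / p[1] + (x1 + x2) ** 2 / p[2]
    return int(np.count_nonzero(t1 < c)), n * pi * c * sqrt(p.prod())

for n in (50, 200, 800):
    N, vol = count_vs_volume(n)
    print(n, N, round(vol, 1), N - vol)  # the discrepancy grows slower than n
```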

In [42] it was shown that, when \(\lambda = 0\) or \(\lambda = -1/2\), we have

$$\displaystyle\begin{array}{rcl} & & \qquad \qquad \qquad J_{1} = G_{r}(c) + O({n}^{-1}) \\ & & J_{2} = ({N}^{\lambda } - {n}^{r/2}{V }^{\lambda })\left.{e}^{-c/2}\right /{\left ({(2\pi n)}^{r}\displaystyle\prod \nolimits _{ j=1}^{k}p_{ j}\right )}^{1/2} + o(1), \\ & & \qquad \qquad \qquad {V }^{\lambda } = {V }^{1} + O({n}^{-1}). \end{array}$$
(19)

These results were extended by Read to the general case λ ∈ R. In particular, Theorem 3.1 in [32] implies

$$\Pr \left (T_{\lambda } < c\right ) =\Pr \left (\chi _{r}^{2} < c\right ) + J_{ 2} + O\left ({n}^{-1}\right ).$$
(20)

This reduces the problem to the estimation of the order of J 2.

It is worth mentioning that neither [42] nor [32] gives an estimate for the residual in (19). Consequently, it is impossible to construct estimates of the rate of convergence of the statistics T_λ to the limiting distribution based on the simple representation for J_2 initially suggested by Yarnold.

In [45] and in [1] the rate of convergence for the residual in (19) was obtained for any power divergence statistic. Then we constructed an estimate for J_2, based on the fundamental number theory results of Hlawka [25] and Huxley [26] on the approximation of the number of integer points in convex sets (more general than ellipsoids) by the Lebesgue measure of the set.

Therefore, one of the main points is to investigate the applicability of the aforementioned theorems from number theory to the set B^λ.

In [45] it is shown that \({B}^{\lambda } =\{\boldsymbol{ x}\mid T_{\lambda }(\boldsymbol{x}) < c\}\) is a bounded extended convex (strictly convex) set. As already mentioned, by the results of Yarnold [47]

$$J_{2} = O\left ({n}^{-1/2}\right ).$$

For the specific case of r = 2 this estimate has been considerably refined in [1]:

$$J_{2} = O\left ({n}^{-3/4+\epsilon }{(\log n)}^{315/146}\right )$$
(21)

with \(\epsilon = 3/4 - 50/73 < 0.0651\). As follows from (19), the rate of convergence of J_2 to 0 cannot be better than the known results in the lattice point problem for ellipsoids in number theory, where for the case r = 2 there is a lower bound of order \(O\left ({n}^{-3/4}\log \log n\right )\) (see [24]). Therefore, relation (21) gives for J_2 an order that is not far from the optimal one.

The following theorem from Huxley [26] was used in [1]:

Theorem 2.2.

Let D be a two-dimensional convex set with area A, bounded by a simple closed curve C which is divided into a finite number of pieces, each of them three times continuously differentiable in the following sense: on each piece C_i the radius of curvature ρ is positive (and finite), continuous, and continuously differentiable with respect to the angle of contingence ψ. Then in a set obtained from D by translation and linear expansion of order M, the number of integer points equals

$$N = A{M}^{2} + O\left (I{M}^{K}{(\log M)}^{\Lambda }\right ),\qquad K = \frac{46} {73},\quad \Lambda = \frac{315} {146},$$

where I is a number depending only on the properties of the curve C, but not on the parameters M or A.

In [45] the results of Asylbekov et al. [1] were generalized to arbitrary dimension. The main reason why the cases r = 2 and r ≥ 3 are considered separately is that for r ≥ 3 it is much more difficult than for r = 2 to check the applicability of the number theory results to B^λ. In [45] we used the following result from Hlawka [25]:

Theorem 2.3.

Let D be a compact convex set in R^m with the origin as an inner point. We denote the volume of this set by A. Assume that the boundary C of this set is an (m − 1)-dimensional surface of class C^∞, with Gaussian curvature non-zero and finite everywhere on the surface. Also assume that a specially defined "canonical" map from the unit sphere to D is one-to-one and of class C^∞. Then in the set obtained from the initial one by translation along an arbitrary vector and by linear expansion with factor M, the number of integer points is

$$N = A{M}^{m} + O\left (I{M}^{m-2+ \frac{2} {m+1} }\right )$$

where the constant I depends only on the properties of the surface C, but not on the parameters M or A.

For m = 2 the statement of Theorem 2.3 is weaker than Huxley's result.

The above theorem is applied in [45] with \(M = \sqrt{n}\). Therefore, for any fixed λ we have to deal not with a single set, but rather with a sequence of sets B^λ(n), which are, however, "close" to the limiting set B^1 for all sufficiently large n (see the representation for \(T_{\lambda }(\boldsymbol{X})\) after (15)). It is necessary to emphasize that the constant I in our case is, generally speaking, I(n), i.e. it depends on n. Only after verifying the inequality

$$\vert I(n)\vert \leq C_{0},$$

where C_0 is an absolute constant, can we apply Theorem 2.3 without changing the overall order of the error with respect to n.

In [45] we prove the following estimate for J_2 in any fixed dimension r ≥ 3.

Theorem 2.4.

For the term J_2 from the decomposition (20) the following estimate holds:

$$J_{2} = O\left ({n}^{-r/(r+1)}\right ),\ r \geq 3.$$

The theorem implies that for the statistics \(t_{\lambda }(\boldsymbol{Y })\) and \(T_{\lambda }(\boldsymbol{X})\) (see formulas (14) and (15)) it holds that

$$\Pr (t_{\lambda }(\boldsymbol{Y }) < c) =\Pr (T_{\lambda }(\boldsymbol{X}) < c) = G_{r}(c) + O\left ({n}^{-1+ \frac{1} {r+1} }\right ),\ r \geq 3.$$