8.1 The Law of Large Numbers

Theorem 8.1.1

(Khintchin’s Law of Large Numbers)

Let \({\{ \xi_{n} \} }_{n=1}^{\infty}\) be a sequence of independent identically distributed random variables having a finite expectation \(\mathbf{E}\xi_{n} =a\), and let \(S_{n} :=\xi_{1}+\cdots+\xi_{n}\). Then

$$\frac{S_n}{n} \stackrel{p}{\to}a\quad {\textit{as}}\ n \to\infty. $$

The above assertion together with Theorems 6.1.6 and 6.1.7 implies the following.

Corollary 8.1.1

Under the conditions of Theorem 8.1.1, along with the convergence of \(S_{n}/n\) in probability, convergence in mean also takes place:

$$\mathbf{E}\biggl\vert \frac{S_n}{n} - a\biggr\vert \to0 \quad {\textit{as}} \ n\to\infty. $$

Note that the condition of independence of ξ k and the very assertion of the theorem assume that all the random variables ξ k are given on a common probability space.

From the physical point of view, the stated law of large numbers is the simplest ergodic theorem which means, roughly speaking, that for random variables their “time averages” and “space averages” coincide. This applies to an even greater extent to the strong law of large numbers, by virtue of which \(S_{n}/n \to a\) with probability 1.

Under more strict assumptions (existence of variance) Theorem 8.1.1 was obtained in Sect. 4.7 as a consequence of Chebyshev’s inequality.

Proof of Theorem 8.1.1

We have to prove that, for any ε>0,

$$\mathbf{P} \biggl( \biggl| \frac{S_n}{n} -a \biggr| >\varepsilon \biggr) \to0 $$

as n→∞. The above relation is equivalent to the weak convergence of distributions \(S_{n}/n \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow} \mathbf{I}_{a}\). Therefore, by the continuity theorem and Example 7.1.1 it suffices to show that, for any fixed t,

$$\varphi_{ S_n /n} (t) \to e^{iat} . $$

The ch.f. φ(t) of the random variable ξ k has, in a certain neighbourhood of 0, the property |φ(t)−1|<1/2. Therefore for such t one can define the function l(t)=lnφ(t) (we take the principal value of the logarithm). Since ξ n has finite expectation, the derivative

$$l'(0) =\frac{\varphi'(0)}{\varphi(0)} =ia $$

exists. For each fixed t and sufficiently large n, the value of l(t/n) is defined and

$$\varphi_{S_n /n} (t) =\varphi^n (t/n) =e^{l(t/n)n} . $$

Since l(0)=0, one has

$$e^{l(t/n)n} =\exp \biggl\{ t \frac{l(t/n)-l(0)}{t/n} \biggr\} \to e^{l'(0)t} =e^{iat} $$

as n→∞. The theorem is proved. □
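As a numerical aside (a minimal simulation sketch, not part of the original argument; the exponential distribution, NumPy, and the sample sizes are our arbitrary choices), the convergence \(S_n/n \stackrel{p}{\to} a\) is easy to observe:

```python
import numpy as np

# Simulation of Khintchin's law of large numbers for i.i.d.
# exponential variables with expectation a = 2 (an arbitrary choice).
rng = np.random.default_rng(0)
a = 2.0
xi = rng.exponential(scale=a, size=100_000)
running_mean = np.cumsum(xi) / np.arange(1, xi.size + 1)  # S_n / n
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])  # the running mean approaches a = 2
```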

8.2 The Central Limit Theorem for Identically Distributed Random Variables

Let, as before, {ξ n } be a sequence of independent identically distributed random variables. But now we assume, along with the expectation E ξ n =a, the existence of the variance \(\operatorname{Var}\xi_{n} =\sigma^{2}\). We retain the notation S n =ξ 1+⋯+ξ n for sums of our random variables and Φ(x) for the normal distribution function with parameters (0,1). Introduce the sequence of random variables

$$\zeta_n =\frac{S_n -an}{\sigma\sqrt{n} } . $$

Theorem 8.2.1

If \(0<\sigma^{2}<\infty\), then \(\mathbf{P}(\zeta_{n} <x)\to\varPhi(x)\) uniformly in x (−∞<x<∞) as n→∞.

In such a case, the sequence {ζ n } is said to be asymptotically normal.

It follows from \(\zeta_{n} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow} \zeta \mathbin{\subset\!\!\!\!\!=} {\boldsymbol{\Phi}}_{0,1}\), \(\zeta^{2}_{n}\ge0\), \(\mathbf{E}\zeta^{2}_{n}=\mathbf{E}\zeta^{2}=1\) and from Lemma 6.2.3 that the sequence \(\{\zeta^{2}_{n}\}\) is uniformly integrable. Therefore, along with the weak convergence \(\zeta_{n} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow} \zeta\), \(\zeta \mathbin{\subset\!\!\!\!\!=} {\boldsymbol{\Phi}}_{0,1}\) (\(\mathbf{E}f(\zeta_{n})\to\mathbf{E}f(\zeta)\) for any bounded continuous f), one also has convergence \(\mathbf{E}f(\zeta_{n})\to\mathbf{E}f(\zeta)\) for any continuous f such that \(|f(x)|<c(1+x^{2})\) (see Theorem 6.2.3).

Proof of Theorem 8.2.1

The uniform convergence is a consequence of the weak convergence and continuity of Φ(x). Further, we may assume without loss of generality that a=0, for otherwise we could consider the sequence \({\{ {\xi'}_{n} =\xi_{n} -a \} }_{n=1}^{\infty}\) without changing the sequence {ζ n }. Therefore, to prove the required convergence, it suffices to show that \(\varphi_{\zeta_{n} } (t) \to e^{{-t^{2}}/2}\) when a=0. We have

$$\varphi_{\zeta_n } (t) =\varphi^n \biggl(\frac{t}{\sigma \sqrt{n}} \biggr),\quad\mbox{where } \varphi(t) =\varphi_{\xi_k } (t). $$

Since \(\mathbf{E}\xi_{n}^{2}\) exists, φ″(t) also exists and, as t→0, one has

$$ \varphi(t) =\varphi(0) +t\varphi'(0) +\frac{t^2}{2} \varphi''(0) +o\bigl(t^2\bigr)= 1- \frac{t^2 \sigma^2 }{2} +o\bigl(t^2\bigr) . $$
(8.2.1)

Therefore, as n→∞,

$$\begin{aligned} \ln\varphi_{\zeta_n } (t) =&n\ln \biggl[ 1-\frac{\sigma^2}{2} { \biggl( \frac{t}{\sigma\sqrt{n}} \biggr) }^2 +o \biggl( \frac{t^2}{n} \biggr) \biggr] \\= & n \biggl[ -\frac{t^2}{2n} +o \biggl( \frac{t^2}{n} \biggr) \biggr] =-\frac{t^2}{2} +o(1) \to-\frac{t^2}{2} . \end{aligned}$$

The theorem is proved. □
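As a concrete check of expansion (8.2.1) (an illustration of ours, not from the original text), let ξ k be Bernoulli with parameter p, so that a=p and \(\sigma^{2}=p(1-p)\). The ch.f. of the centred variable ξ k −a is

$$\varphi(t) =p e^{it(1-p)} +(1-p) e^{-itp}, $$

for which φ(0)=1, \(\varphi'(0)=i\bigl[p(1-p)-(1-p)p\bigr]=0\) and \(\varphi''(0)=-\bigl[p(1-p)^{2}+(1-p)p^{2}\bigr]=-p(1-p)=-\sigma^{2}\), so that indeed \(\varphi(t)=1-\frac{t^{2}\sigma^{2}}{2}+o(t^{2})\).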

8.3 The Law of Large Numbers for Arbitrary Independent Random Variables

Now we proceed to elucidating conditions under which the law of large numbers and the central limit theorem will hold in the case when ξ k are independent but not necessarily identically distributed. The problem will not become more complicated if, from the very beginning, we consider a more general situation where one is given an arbitrary series ξ 1,n ,…,ξ n,n , n=1,2,… of independent random variables, where the distributions of ξ k,n may depend on n. This is the so-called triangular array scheme.

Put

$$\zeta_n :=\sum_{k=1}^{n} \xi_{k,n}. $$

From the viewpoint of the results to follow, we can assume without loss of generality that

$$ \mathbf{E}\xi_{k,n} =0. $$
(8.3.1)

Assume that the following condition is met: as n→∞,

$$ [D_1]\qquad D_1 := \sum_{k=1}^{n} \mathbf{E}\min \bigl( |\xi_{k,n}|,\; \xi_{k,n}^2 \bigr) \to0. $$

Theorem 8.3.1

(The Law of Large Numbers)

If conditions (8.3.1) and [D 1] are satisfied, then ζ n ⊂=⇒I 0 or, which is the same, \(\zeta_{n} \stackrel{p}{\to}0\) as n→∞.

Example 8.3.1

Assume that \(\xi_{k} =\xi_{k,n}\) do not depend on n, \(\mathbf{E}\xi_{k} =0\) and \(\mathbf{E}|\xi_{k}|^{s} = m_{s} <\infty\) for some \(1<s\le2\). For such s, there exists a sequence \(b(n)=o(n)\) such that \(n=o(b^{s}(n))\). Since, for \(\xi_{k,n} =\xi_{k}/b(n)\),

$$\begin{aligned} \mathbf{E}\min \bigl(|\xi_{k,n}|,\,\xi_{k,n}^2 \bigr) =& \mathbf{E} \biggl[\biggl \vert \frac{\xi_k}{b(n)}\biggr \vert ^2;\,|\xi_k|\leq b(n) \biggr]+\mathbf{E} \biggl[ \frac{|\xi_k|}{b(n)};|\xi _k|>b(n) \biggr] \\\leq& \mathbf{E} \biggl[\biggl \vert \frac{\xi_k}{b(n)}\biggr \vert ^s;\, |\xi _k|\leq b(n) \biggr]+\mathbf{E} \biggl[\biggl \vert \frac{\xi_k}{b(n)}\biggr \vert ^s;\, |\xi _k|>b(n) \biggr] \\=& m_sb^{-s}(n), \end{aligned}$$

we have

$$D_1\leq nm_sb^{-s}(n)\to0, $$

and hence \(S_{n}/b(n)\stackrel{p}{\to}0\).
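For instance (a concrete choice of ours), one can take \(b(n)=n^{\gamma}\) with any \(\gamma\in(1/s,1)\): then \(b(n)=o(n)\), while \(b^{s}(n)=n^{\gamma s}\) with \(\gamma s>1\), so that \(n=o(b^{s}(n))\) and the above bound gives \(D_1\le m_s n^{1-\gamma s}\to0\).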

A more general sufficient condition (compared to m s <∞) for the law of large numbers is contained in Theorem 8.3.3 below. Theorem 8.1.1 is an evident corollary of that theorem.

Now consider condition [D 1] in more detail. It can clearly also be written in the form

$$D_1= \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |; | \xi_{k,n}| >1 \bigr) + \sum _{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |^2 ; | \xi_{k,n}| \le1 \bigr) \to 0 . $$

Next introduce the condition

$$ M_1 := \sum_{k=1}^{n} \mathbf{E}| \xi_{k,n}| \le c < \infty $$
(8.3.2)

and the condition

$$ [M_1]\qquad M_1 (\tau) := \sum_{k=1}^{n} \mathbf{E}\bigl( |\xi_{k,n}|;\; |\xi_{k,n}| >\tau\bigr) \to0 $$

for any τ>0 as n→∞. Condition [M 1] could be called a Lindeberg type condition (the Lindeberg condition [M 2] will be introduced in Sect. 8.4).

The following lemma explains the relationship between the introduced conditions.

Lemma 8.3.1

1. {[M 1]∩(8.3.2)}⊂[D 1]. 2. [D 1]⊂[M 1].

That is, conditions [M 1] and (8.3.2) imply [D 1], and condition [D 1] implies [M 1].

It follows from Lemma 8.3.1 that under condition (8.3.2), conditions [D 1] and [M 1] are equivalent.

Proof of Lemma 8.3.1

1. Let conditions (8.3.2) and [M 1] be met. Then, for \(\tau\le1\) and \(g_{1}(x)=\min(|x|,|x|^{2})\), one has

$$\begin{aligned} D_1 =& \sum_{k=1}^{n} \mathbf{E}g_1 (\xi_{k,n} ) \le\sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |; | \xi_{k,n}| >\tau\bigr) + \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |^2; | \xi_{k,n}| \le\tau\bigr) \\\le & M_1 (\tau) +\tau\sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |; | \xi_{k,n}| \le\tau\bigr) \le M_1 (\tau) +\tau M_1 (0) . \end{aligned}$$
(8.3.3)

Since \(M_{1}(0)=M_{1}\le c\) and τ can be arbitrarily small, we have D 1→0 as n→∞.

2. Conversely, let condition [D 1] be met. Then, for τ≤1,

$$\begin{aligned} M_1 (\tau) \le & \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |; | \xi_{k,n}| >1 \bigr) \\&{} + \tau^{-1} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n} |^2; \tau< | \xi_{k,n}| \le1 \bigr) \le\tau^{-1} D_1 \to0 \end{aligned}$$
(8.3.4)

as n→∞ for any τ>0. The lemma is proved. □

Let us show that condition [M 1] (as well as [D 1]) is essential for the law of large numbers to hold.

Consider the random variables

$$\xi_{k,n}=\left \{ \begin{array}{l@{\quad }l} 1-\frac{1}{n} & \mbox{with probability}\ \frac{1}{n},\\ -\frac{1}{n} & \mbox{with probability}\ 1-\frac{1}{n}. \end{array} \right . $$

For them, \(\mathbf{E}\xi_{k,n}=0\), \(\mathbf{E}|\xi_{k,n}|=\frac{2(n-1)}{n^{2}}\sim\frac{2}{n}\), \(M_{1}\le2\), condition (8.3.2) is met, but \(M_{1}(\tau)=\frac{n-1}{n}>\frac{1}{2}\) for n>2, τ<1/2, and thus condition [M 1] is not satisfied. Here the number ν n of positive ξ k,n , \(1\le k\le n\), converges in distribution to a random variable ν having the Poisson distribution with parameter λ=1. The sum of the remaining ξ k,n s is equal to \(-\frac{(n-\nu_{n})}{n}\overset{p}{\longrightarrow}-1\). Therefore, ζ n +1⊂=⇒Π 1 and the law of large numbers does not hold.
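A simulation sketch (ours, assuming NumPy) makes this limit visible. Observe that here \(\zeta_n=\nu_n-1\) exactly, since \(\zeta_n=\nu_n(1-\frac{1}{n})-(n-\nu_n)\frac{1}{n}=\nu_n-1\), and ν n has the binomial distribution with parameters (n,1/n):

```python
import numpy as np
from math import exp, factorial

# zeta_n = nu_n - 1 exactly, where nu_n ~ Binomial(n, 1/n),
# so zeta_n + 1 is approximately Poisson(1) for large n.
rng = np.random.default_rng(1)
n, trials = 1_000, 200_000
nu = rng.binomial(n, 1.0 / n, size=trials)        # number of positive summands
freq = np.bincount(nu, minlength=5)[:5] / trials  # empirical law of zeta_n + 1
poisson = [exp(-1) / factorial(k) for k in range(5)]
print(freq)     # close to the Poisson(1) values below
print(poisson)  # [0.3679, 0.3679, 0.1839, 0.0613, 0.0153]
```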

Each of the conditions [D 1] and [M 1] implies the uniform smallness of \(\mathbf{E}|\xi_{k,n}|\):

$$ \max_{1\le k \le n} \mathbf{E}|\xi_{k,n} | \to0 \quad\mbox{as}\ n \to\infty. $$
(8.3.5)

Indeed, condition [M 1] means that there exists a sufficiently slowly decreasing sequence τ n →0 such that M 1(τ n )→0. Therefore

$$ \max_{ k \le n} \mathbf{E}|\xi_{k,n} | \le\max_{k \le n} \bigl[ \tau_n + \mathbf{E} \bigl(| \xi_{k,n} |; |\xi_{k,n} | > \tau_n \bigr) \bigr] \le \tau_n +M_1(\tau_n ) \to0 . $$
(8.3.6)

In particular, (8.3.5) implies the negligibility of the summands ξ k,n .

We will say that ξ k,n are negligible, or, equivalently, have property [S], if, for any ε>0,

$$ [S]\qquad \max_{k\le n} \mathbf{P}\bigl( |\xi_{k,n}| >\varepsilon\bigr) \to0 \quad\mbox{as}\ n\to\infty. $$

Property [S] could also be called uniform convergence of ξ k,n in probability to zero. Property [S] follows immediately from (8.3.5) and Chebyshev’s inequality. It also follows from stronger relations implied by [M 1]:

$$\sum_{k=1}^{n} \mathbf{P}\bigl( |\xi_{k,n}| >\varepsilon\bigr) \le \frac{1}{\varepsilon}\, M_1 (\varepsilon) \to0. $$

We now turn to proving the law of large numbers. We will give two versions of the proof. The first one illustrates the classical method of characteristic functions. The second version is based on elementary inequalities and leads to a stronger assertion about convergence in mean.

Here is the first version.

Proof of Theorem 8.3.1

Put

$$\varphi_{k,n } (t) :=\mathbf{E}e^{it\xi_{k,n }} ,\qquad \varDelta _k (t) :=\varphi _{k,n } (t)-1. $$

One has to prove that, for each t,

$$\varphi_{\zeta_n } (t) =\mathbf{E}e^{it\zeta_{n } } =\prod _{k=1}^{n} \varphi _{k,n } (t) \to1, $$

as n→∞. By Lemma 7.4.2

$$\begin{aligned} \bigl|\varphi_{\zeta_n } (t) -1 \bigr| =& \Biggl \vert \prod _{k=1}^{n} \varphi_{k,n } (t)-\prod _{k=1}^{n} 1 \Biggr \vert \le\sum _{k=1}^{n} |\varDelta_k (t)| \\=&\sum_{k=1}^{n} \bigl|\mathbf{E}e^{it\xi_{k,n }} -1 \bigr| =\sum_{k=1}^{n} \bigl| \mathbf{E}\bigl( e^{it\xi_{k,n }} -1 - it\xi_{k,n }\bigr) \bigr|. \end{aligned}$$

By Lemma 7.4.1 we have (for g 1(x)=min(|x|,x 2))

$$\bigl|e^{itx} -1-itx\bigr|\le\min \bigl( 2|tx|, t^2x^2/2 \bigr)\le 2g_1(tx)\le2 h(t)g_1(x), $$

where \(h(t)=\max(|t|,|t|^{2})\). Therefore

$$\bigl|\varphi_{\zeta_n } (t) -1 \bigr| \le2h(t) \sum_{k=1}^{n} \mathbf{E}g_1 ( \xi _{k,n }) =2h(t)D_1 \to0. $$

The theorem is proved. □

The last inequality shows that \(| \varphi_{\zeta_{n}} (t) - 1 |\) admits a bound in terms of D 1. It turns out that E|ζ n | also admits a bound in terms of D 1. Now we will give the second version of the proof that actually leads to a stronger variant of the law of large numbers.

Theorem 8.3.2

Under conditions (8.3.1) and [D 1] one has E|ζ n |→0 (i.e. \(\zeta_{n} \stackrel {(1)}{\longrightarrow} 0 \)).

The assertion of Theorem 8.3.2 clearly means the uniform integrability of {ζ n }; it implies Theorem 8.3.1, for

$$\mathbf{P}\bigl( | \zeta_n | > \varepsilon\bigr) \leq\mathbf{E}| \zeta _n|/\varepsilon \to0 \quad\mbox{as} \ n \to\infty. $$

Proof of Theorem 8.3.2

Put

$$\xi'_{k,n}: = \left \{ \begin{array}{l@{\quad }l} \xi_{k,n} &\mbox{if}\ | \xi_{k,n} | \leq1, \\ 0 &\mbox{otherwise}, \end{array} \right . $$

and \(\xi''_{k,n}: = \xi_{k,n} - \xi'_{k,n}\). Then \(\xi_{k,n} = \xi'_{k,n} + \xi''_{k,n}\) and \(\zeta_{n} = \zeta'_{n} + \zeta''_{n}\) with an obvious convention for the notations \(\zeta'_{n}\), \(\zeta''_{n}\). By the Cauchy–Bunjakovsky inequality,

$$\begin{aligned} \mathbf{E}| \zeta_n| \leq& \mathbf{E}\bigl| \zeta'_n - \mathbf{E}\zeta'_n \bigr| + \mathbf{E}\bigl| \zeta''_n - \mathbf{E} \zeta''_n \bigr| \leq \sqrt{\mathbf{E} \bigl(\zeta'_n - \mathbf{E}\zeta'_n \bigr)^2} + \mathbf{E}\bigl| \zeta''_n\bigr| + \bigl| \mathbf{E}\zeta ''_n\bigr | \\\leq& \sqrt{\sum\operatorname{Var}\bigl( \xi'_{k,n}\bigr)} + 2 \sum \mathbf{E}\bigl|\xi ''_{k,n} \bigr| \leq \sqrt{\sum \mathbf{E}\bigl(\xi'_{k,n}\bigr)^2} + 2 \sum\mathbf{E}\bigl| \xi ''_{k,n} \bigr| \\=& \Bigl[ \sum\ \mathbf{E}\bigl(\xi^2_{k,n}; |\xi_{k,n} | \leq1\bigr) \Bigr]^{1/2} \\&{} +2 \sum \mathbf{E}\bigl(| \xi_{k,n}|; |\xi_{k,n}| > 1\bigr) \leq \sqrt{D_1} + 2 D_1 \to0, \end{aligned}$$

if D 1→0. The theorem is proved. □

Remark 8.3.1

It can be seen from the proof of Theorem 8.3.2 that the argument will remain valid if we replace the independence of ξ k,n by the weaker condition that the \(\xi'_{k,n}\) are uncorrelated. It will also be valid if the \(\xi'_{k,n}\) are only weakly correlated so that

$$\mathbf{E}\bigl(\zeta'_n - \mathbf{E} \zeta'_n\bigr)^2 \leq c \sum \operatorname{Var}\bigl(\xi'_{k,n} \bigr), \quad c < \infty. $$

If {ξ k } is a given fixed (not dependent on n) sequence of independent random variables, \(S_{n} = \sum^{n}_{k=1} \xi_{k}\) and E ξ k =a k , then one looks at the applicability of the law of large numbers to the sequences

$$ \xi_{k,n} = \frac{\xi_k - a_k}{b (n)}, \qquad \zeta_n = \sum\xi _{k,n} = \frac{1}{b (n)} \Biggl(S_n - \sum^n_{k=1} a_k \Biggr), $$
(8.3.7)

where ξ k,n satisfy (8.3.1), and b(n) is an unboundedly increasing sequence. In some cases it is natural to take \(b(n) = \sum^{n}_{k=1} \mathbf{E}|\xi_{k}|\) if this sum increases unboundedly. Without loss of generality we can set a k =0. The next assertion follows from Theorem 8.3.2.

Corollary 8.3.1

If, as n→∞,

$$D_1:= \frac{1}{b(n)} \sum\mathbf{E}\min \bigl(| \xi_k|,\, \xi_k^2/b(n) \bigr) \to0 $$

or, for any τ>0,

$$ M_1 (\tau) = \frac{1}{b(n)} \sum \mathbf{E}\bigl(|\xi_k|; |\xi_k| > \tau b(n)\bigr) \to0, \quad b(n) = \sum^n_{k=1} \mathbf{E} |\xi_k| \to\infty, $$
(8.3.8)

then \(\zeta_{n} \stackrel{(1)}{\longrightarrow} 0\).

Now we will present an important sufficient condition for the law of large numbers that is very close to condition (8.3.8) and which explains to some extent its essence. In addition, in many cases this condition is easier to check. Let b k =E|ξ k |, \(\overline{b}_{n}=\max_{k\leq n}b_{k}\), and, as before,

$$S_n=\sum_{k=1}^n \xi_k,\qquad b(n)=\sum_{k=1}^n b_k. $$

The following assertion is a direct generalisation of Theorem 8.1.1 and Corollary 8.1.1.

Theorem 8.3.3

Let E ξ k =0, the sequence of normalised random variables ξ k /b k be uniformly integrable and \(\overline{b}_{n}=o (b(n) )\) as n→∞. Then

$$\frac{S_n}{b(n)}\overset{(1)}{\longrightarrow}0. $$

If \(\overline{b}_{n}\leq b<\infty\) then b(n)≤bn and \(\frac{S_{n}}{n} \stackrel{(1)}{\longrightarrow}0\).

Proof

Since

$$ \mathbf{E} \bigl(|\xi_k|;|\xi_k|> \tau b(n) \bigr)\leq b_k\mathbf{E} \biggl(\biggl \vert \frac{\xi_k}{b_k}\biggr \vert ; \biggl \vert \frac{\xi_k}{b_k}\biggr \vert >\tau\frac{b(n)}{\overline {b}_n} \biggr) $$
(8.3.9)

and \(\frac{b(n)}{\overline{b}_{n}}\to\infty\), the uniform integrability of \(\{\frac{\xi_{k}}{b_{k}} \}\) implies that the right-hand side of (8.3.9) is o(b k ) uniformly in k (i.e. it admits a bound ε(n)b k , where ε(n)→0 as n→∞ and does not depend on k). Therefore

$$M_1(\tau)=\frac{1}{b(n)}\sum_{k=1}^n \mathbf{E} \bigl(|\xi_k|;|\xi _k|>\tau b(n) \bigr)\to0 $$

as n→∞, and condition (8.3.8) is met. The theorem is proved. □

Remark 8.3.2

If, in the context of the law of large numbers, we are interested in convergence in probability only, then Theorem 8.3.3 can be generalised. In particular, convergence

$$\frac{S_n}{b(n)}\stackrel{p}{\to}0 $$

will still hold if a finite number of the summands ξ k (e.g., for \(k\le l\), l being fixed) are completely arbitrary (they can even fail to have expectations) and the sequence \(\xi_{k}^{*}=\xi_{k+l}\), k≥1, satisfies the conditions of Theorem 8.3.3, where b(n) is defined for the variables \(\xi_{k}^{*}\) and has the property \(\frac {b(n-1)}{b(n)}\to1\) as n→∞.

This assertion follows from the fact that

$$\frac{S_n}{b(n)}=\frac{S_l}{b(n)}+\frac{S_n-S_l}{b(n-l)}\cdot \frac{b(n-l)}{b(n)},\qquad \frac{S_l}{b(n)}\,\overset{p}{\longrightarrow}\,0, \qquad \frac{b(n-l)}{b(n)}\to1, $$

and by Theorem 8.3.3

$$\frac{S_n-S_l}{b(n-l)}\overset{p}{\longrightarrow}\,0\quad \mbox{as}\ n\to \infty. $$

Now we will show that the uniform integrability condition in Theorem 8.3.3 (as well as condition M 1(τ)→0) is essential for convergence \(\zeta_{n} \stackrel{p}{\to}0\). Consider a sequence of random variables

$$\xi_j=\left \{ \begin{array}{l@{\quad }l} 2^s-1 &\mbox{with probability}\ 2^{-s},\\ -1 &\mbox{with probability}\ 1-2^{-s} \end{array} \right . $$

for \(j\in I_{s} :=(2^{s-1},2^{s}]\), s=1,2,…; ξ 1=0. Then E ξ j =0, \(\mathbf{E}|\xi_{j}|=2(1-2^{-s})\) for \(j\in I_{s}\), and, for \(n=2^{k}\), one has

$$b(n)=\sum_{s=1}^k 2\bigl(1-2^{-s} \bigr)|I_s|, $$

where \(|I_{s}|=2^{s}-2^{s-1}=2^{s-1}\) is the number of points in \(I_{s}\). Hence, as k→∞,

$$\begin{aligned} b(n) \sim&2 \bigl[\bigl(1-2^{-k}\bigr)2^{k-1}+ \bigl(1-2^{-k+1}\bigr)2^{k-2}+\cdots \bigr]\\\sim & 2^k+2^{k-1}+\ldots\sim2^{k+1}=2n. \end{aligned}$$

Observe that the uniform integrability condition is clearly not met here. The distribution of the number ν (s) of jumps of magnitude \(2^{s}-1\) on the interval I s converges, as s→∞, to the Poisson distribution with parameter \(1/2=\lim_{s\to\infty}2^{-s}|I_{s}|\), while the distribution of \(2^{-s}(S_{2^{s}}-S_{2^{s-1}})\) converges to the distribution of ν−1/2, where ν⊂=Π 1/2. Hence, assuming that \(n=2^{k}\), and partitioning the segment [2,n] into the intervals \((2^{s-1},2^{s}]\), s=1,…,k, we obtain that the distribution of S n /n converges, as k→∞, to the distribution of

$$\frac{S_n}{n}=2^{-k}\sum_{s=1}^k \frac{S_{2^s}-S_{2^{s-1}}}{2^s}2^s \Rightarrow \sum _{l=0}^\infty(\nu_l-1/2) 2^{-l}=:\zeta, $$

where ν l , l=0,1,…, are independent copies of ν. Clearly, \(\zeta\not\equiv0\), and so convergence \(\frac {S_{n}}{n}\stackrel{p}{\to} 0\) fails to take place.

Let us return to arbitrary ξ k,n . In order for [D 1] to hold it suffices that the following condition is met: for some s, 2≥s>1,

$$ [L_s]\qquad L_s := \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^{s} \to0. $$

This assertion is evident, since g 1(x)≤|x|s for 2≥s>1. Conditions [L s ] could be called the modified Lyapunov conditions (cf. the Lyapunov condition [L s ] in Sect. 8.4).

To prove Theorem 8.3.2, we used the so-called “truncated versions” \(\xi'_{k,n}\) of the random variables ξ k,n . Now we will consider yet another variant of the law of large numbers, in which conditions are expressed in terms of truncated random variables.

Denote by ξ (N) the result of truncation of the random variable ξ at level N:

$$\xi^{(N)} = \max\bigl[ - N, \min(N, \xi)\bigr]. $$
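In computational terms (a trivial sketch of ours, assuming NumPy), this truncation is just componentwise clipping:

```python
import numpy as np

# xi^(N) = max(-N, min(N, xi)) is exactly componentwise clipping at level N.
xi = np.array([-5.0, -0.3, 2.0, 7.5])
print(np.clip(xi, -3.0, 3.0))  # [-3.  -0.3  2.   3. ]
```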

Theorem 8.3.4

Let the sequence of random variables {ξ k } in (8.3.7) satisfy the following condition: for any given ε>0, there exist N k such that

$$\frac{1}{b (n)} \sum^n_{k=1} \mathbf{E}\bigl \vert \xi_k - \xi _k^{(N_k)}\bigr \vert < \varepsilon, \quad \frac{1}{b (n)} \sum ^n_{k=1} N_k < N < \infty. $$

Then the sequence {ζ n } converges to zero in mean: \(\zeta_{n} \stackrel{(1)}{\longrightarrow} 0\).

Proof

Clearly \(a_{k}^{(N_{k})}:= \mathbf{E}\xi_{k}^{(N_{k})} \to a_{k}\) as N k →∞ and \(|a_{k}^{(N_{k})} | \le N_{k}\). Further, we have

$$\begin{aligned} \mathbf{E}|\zeta_n| = \frac{1}{b (n)} \mathbf{E}\Bigl \vert \sum(\xi_k - a_k) \Bigr \vert \le& \frac{1}{ b(n)} \sum\mathbf{E}\bigl \vert \xi_k - \xi_k^{(N_k)}\bigr \vert \\&{} +\mathbf{E}\biggl \vert \sum\frac{\xi_k^{(N_k)} - a_k^{(N_k)}}{ b (n)} \biggr \vert + \frac{1}{b (n)} \sum\bigl \vert a_k^{(N_k)} - a_k\bigr \vert . \end{aligned}$$

Here the second term on the right-hand side converges to zero, since the sum under the expectation satisfies the conditions of Theorem 8.3.1 and is bounded. But the first and the last terms do not exceed ε. Since the left-hand side does not depend on ε, we have E|ζ n |→0 as n→∞. □

Corollary 8.3.2

If b(n)=n and, for sufficiently large N and all \(k\le n\),

$$\mathbf{E} \bigl|\xi_k - \xi_k^{(N)} \bigr| < \varepsilon, $$

then \(\zeta_{n} \stackrel{(1)}{\longrightarrow} 0\).

The corollary follows from Theorem 8.3.4, since the conditions of the corollary clearly imply the conditions of Theorem 8.3.4.

It is obvious that, for identically distributed ξ k , the conditions of Corollary 8.3.2 are always met, and we again obtain a generalisation of Theorem 8.1.1 and Corollary 8.1.1.

If \(\mathbf{E}|\xi_{k}|^{r}<\infty\) for r≥1, then we can also establish in a similar way that

$$\frac{S_n}{ n} \stackrel{(r)}{\longrightarrow} a. $$

Remark 8.3.3

Condition [D 1] (or [M 1]) is not necessary for convergence \(\zeta_{n} \stackrel{p}{\to}0\) even when (8.3.2) and (8.3.5) hold, as the following example demonstrates. Let ξ k,n assume the values −n, 0, and n with probabilities 1/n 2, 1−2/n 2, and 1/n 2, respectively. Here \(\zeta_{n} \stackrel{p}{\to}0\), since P(ζ n ≠0)≤P(⋃{ξ k,n ≠0})≤2/n→0, E|ξ k,n |=2/n→0 and M 1=∑E|ξ k,n |=2<∞. At the same time, \(\sum\mathbf{E}( | \xi_{k,n }|; | \xi_{k,n }| \ge1 ) =2 \not\to 0\), so that conditions [D 1] and [M 1] are not satisfied.

However, if we require that

$$ \xi_{k,n} \ge-\varepsilon_{k,n},\qquad \varepsilon_{k,n}\ge0,\quad \max_{k\le n}\varepsilon_{k,n}\to0,\quad \sum_{k=1}^{n}\varepsilon_{k,n}\le c<\infty, $$
(8.3.10)

then condition [D 1] will become necessary for convergence \(\zeta_{n}\stackrel{p}{\to}0\).

Before proving that assertion we will establish several auxiliary relations that will be useful in the sequel. As above, put Δ k (t):=φ k,n (t)−1.

Lemma 8.3.2

One has

$$\sum_{k=1}^n\bigl|\varDelta _k(t)\bigr| \le|t|M_1. $$

If condition [S] holds, then for each t, as n→∞,

$$\max_{k\le n}\bigl|\varDelta _k(t)\bigr| \to0. $$

If a random variable ξ with E ξ=0 is bounded from the left: ξ>−c, c>0, then E|ξ|≤2c.

Proof

By Lemma 7.4.1,

$$\begin{aligned} \bigl|\varDelta _k(t)\bigr|\le\mathbf{E}\bigl|e^{it\xi_{k,n}}-1\bigr|\le|t|\,\mathbf {E}| \xi _{k,n}|,\qquad \sum\bigl|\varDelta _{k}(t)\bigr|\le|t| M_1. \end{aligned}$$

Further,

$$\begin{aligned} \bigl|\varDelta _k(t)\bigr| \le&\mathbf{E} \bigl(\bigl|e^{it\xi_{k,n}}-1\bigr|; | \xi_{k,n}|\le\varepsilon \bigr) + \mathbf{E} \bigl(\bigl|e^{it\xi _{k,n}}-1\bigr|; |\xi_{k,n}|>\varepsilon \bigr) \\\le&|t| \varepsilon+2\mathbf{P} \bigl(|\xi_{k,n}|>\varepsilon \bigr). \end{aligned}$$

Since ε is arbitrary here, the second assertion of the lemma now follows from condition [S].

Put

$$\xi^{+} :=\max(0;\xi) \ge0,\qquad\xi^{-}: =-\bigl(\xi- \xi^{+} \bigr) \ge 0. $$

Then \(\mathbf{E}\xi=\mathbf{E}\xi^{+} -\mathbf{E}\xi^{-} =0\) and \(\mathbf{E}|\xi|=\mathbf{E}\xi^{+} +\mathbf{E}\xi^{-} =2\mathbf{E}\xi^{-} \le2c\). The lemma is proved. □

From the last assertion of the lemma it follows that (8.3.10) implies (8.3.2) and (8.3.5).

Lemma 8.3.3

Let conditions [S] and (8.3.2) be satisfied. A necessary and sufficient condition for convergence \(\varphi_{\zeta _{n}}(t)\to \varphi(t)\) is that

$$\sum_{k=1}^{n} \varDelta _k (t) \to\ln\varphi(t). $$

Proof

Observe that

$$\operatorname{Re} \varDelta _k (t) =\operatorname{Re} \bigl(\varphi_{k,n} (t) -1\bigr) \le0,\quad\bigl|e^{\varDelta _k (t)}\bigr| \le1 , $$

and therefore, by Lemma 7.4.2,

$$\begin{aligned} \bigl| \varphi_{\zeta_n } (t) -e^{\sum \varDelta _k (t)} \bigr| =& \Biggl|\, \prod _{k=1}^{n} \varphi_{k,n} (t) -\prod _{k=1}^{n} e^{\varDelta _k (t)}\Biggr| \\\le& \sum_{k=1}^{n} \bigl| \varphi_{k,n} (t) -e^{\varDelta _k (t)} \bigr| = \sum _{k=1}^{n} \bigl|e^{\varDelta _k (t)} -1 - \varDelta _k (t) \bigr| \\\le& \frac{1}{2} \sum_{k=1}^{n} \bigl|\varDelta _k (t)\bigr|^2 \le\frac{1}{2} \max _k \bigl|\varDelta _k (t)\bigr|\sum _{k=1}^{n} \bigl|\varDelta _k (t)\bigr|. \end{aligned}$$

By Lemma 8.3.2 and conditions [S] and (8.3.2), the expression on the left-hand side converges to 0 as n→∞. Therefore, if \(\varphi_{\zeta_{n} } (t) \to\varphi (t)\) then exp{∑Δ k (t)}→φ(t), and vice versa. The lemma is proved. □

The next assertion complements Theorem 8.3.1.

Theorem 8.3.5

Assume that relations (8.3.1) and (8.3.10) hold. Then condition [D 1] (or condition [M 1]) is necessary for the law of large numbers.

Proof

If the law of large numbers holds then \(\varphi_{\zeta_{n} } (t) \to1\) and, hence by Lemma 8.3.3 (recall that (8.3.10) implies (8.3.2), (8.3.5) and [S])

$$\sum_{k=1}^{n} \varDelta _k (t) = \sum_{k=1}^{n} \mathbf{E} \bigl(e^{it\xi_{k,n}} -1 -it\xi _{k,n} \bigr) \to0 . $$

Moreover, by Lemma 7.4.1

$$\begin{aligned} & \sum_{k=1}^{n} \mathbf{E} \bigl( \bigl|e^{it\xi_{k,n}} -1 -it\xi_{k,n} \bigr| ; |\xi_{k,n}| \le \varepsilon_{k,n} \bigr) \\&\quad {} \le \frac{t^2}{2} \sum_{k=1}^{n} \mathbf{E}\bigl( | \xi_{k,n} |^2 ; |\xi _{k,n} | \le \varepsilon_{k,n} \bigr) \le\frac{t^2}{2}\sum_{k=1}^{n} \varepsilon_{k,n}^{2} \le \frac{t^2}{2}\max_k \varepsilon_{k,n} \sum_{k=1}^{n} \varepsilon_{k,n} \to0 . \end{aligned}$$

Therefore, if the law of large numbers holds, then by virtue of (8.3.10)

$$\sum_{k=1}^{n} \mathbf{E} \bigl( e^{it\xi_{k,n}} -1 -it\xi_{k,n} ; \xi_{k,n} > \varepsilon_{k,n} \bigr) \to0. $$

Consider the function \(\alpha(x)=(e^{ix}-1)/(ix)\). It is not hard to see that the inequality |α(x)|≤1 proved in Lemma 7.4.1 is strict for x>ε>0, and hence there exists a δ(τ)>0 for τ>0 such that \(\operatorname{Re}(1-\alpha(x)) \ge\delta(\tau)\) for x>τ. This is equivalent to \(\operatorname{Im} (1+ix -e^{ix}) \ge\delta (\tau )x\), so that

$$x\le\frac{1}{\delta(\tau)} \operatorname{Im} \bigl(1+ix -e^{ix}\bigr)\quad \mbox{for}\ x > \tau. $$

From this we find that

$$\begin{aligned} M_1 (\tau) =& \sum_{k=1}^{n} \mathbf{E}\bigl( | \xi_{k,n} | ;|\xi_{k,n} |>\tau\bigr) = \sum _{k=1}^{n} \mathbf{E}( \xi_{k,n} ;\xi_{k,n} >\tau) \\\le& \frac{1}{\delta(\tau)}\operatorname{Im} \sum_{k=1}^{n} \mathbf{E} \bigl(1+i\xi_{k,n} - e^{i\xi_{k,n}}; \xi_{k,n} > \varepsilon _{k,n} \bigr) \to0 . \end{aligned}$$

Thus condition [M 1] holds. Together with relation (8.3.2), that follows from (8.3.10), this condition implies [D 1]. The theorem is proved. □

There seem to exist some conditions that are wider than (8.3.10) and under which condition [D 1] is necessary for convergence \(\zeta_{n}\stackrel{(1)}{\longrightarrow}0\) in mean (condition (8.3.10) is too restrictive).

8.4 The Central Limit Theorem for Sums of Arbitrary Independent Random Variables

As in Sect. 8.3, we consider here a triangular array of random variables ξ 1,n ,…,ξ n,n and their sums

$$ \zeta_n =\sum_{k=1}^{n} \xi_{k,n} . $$
(8.4.1)

We will assume that ξ k,n have finite second moments:

$$\sigma_{k,n}^{2} :=\operatorname{Var}(\xi_{k,n}) < \infty, $$

and suppose, without loss of generality, that

$$ \mathbf{E}\xi_{k,n} =0 ,\qquad\sum _{k=1}^{n} \sigma_{k,n}^{2} = \operatorname{Var}(\zeta_n)=1. $$
(8.4.2)

We introduce the following condition: for some s>2,

$$ [D_2]\qquad D_2 := \sum_{k=1}^{n} \mathbf{E}\min \bigl( \xi_{k,n}^2,\; |\xi_{k,n}|^{s} \bigr) \to0, $$

which is to play an important role in what follows. Our arguments related to condition [D 2] and also to conditions [M 2] and [L s ] to be introduced below will be quite similar to the ones from Sect. 8.3 that were related to conditions [D 1], [M 1] and [L s ].

We also introduce the Lindeberg condition: for any τ>0, as n→∞,

$$ [M_2]\qquad M_2 (\tau) := \sum_{k=1}^{n} \mathbf{E}\bigl( \xi_{k,n}^2;\; |\xi_{k,n}| >\tau\bigr) \to0. $$

The following assertion is an analogue of Lemma 8.3.1.

Lemma 8.4.1

1. {[M 2]∩(8.4.2)}⊂[D 2]. 2. [D 2]⊂[M 2].

That is, conditions [M 2] and (8.4.2) imply [D 2], and condition [D 2] implies [M 2].

From Lemma 8.4.1 it follows that, under condition (8.4.2), conditions [D 2] and [M 2] are equivalent.

Proof of Lemma 8.4.1

1. Let conditions [M 2] and (8.4.2) be met. Put

$$g_2(x):=\min\bigl(x^2,|x|^s\bigr), \quad s>2. $$

Then (cf. (8.3.3), (8.3.4); τ≤1)

$$\begin{aligned} D_2=\sum _{k=1}^{n} \mathbf{E}g_2(\xi_{k,n}) \le& \sum _{k=1}^{n} \mathbf{E} \bigl(\xi^2_{k,n};|\xi _{k,n}|>\tau \bigr)+\sum _{k=1}^{n} \mathbf{E}\bigl(| \xi_{k,n}|^s;|\xi _{k,n}|\le\tau\bigr) \\\le& M_2(\tau)+\tau^{s-2}M_2(0)=M_2( \tau)+\tau^{s-2}. \end{aligned}$$

Since τ is arbitrary, we have D 2→0 as n→∞.

2. Conversely, suppose that [D 2] holds. Then

$$M_2(\tau)\le\sum _{k=1}^{n} \mathbf{E} \bigl(\xi ^2_{k,n};|\xi _{k,n}|>1 \bigr) +\frac{1}{\tau^{s-2}}\sum _{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^s; \tau<|\xi _{k,n}|\le1 \bigr) \le\frac{1}{\tau^{s-2}}D_2 \to0 $$

for any τ>0, as n→∞. The lemma is proved. □

Lemma 8.4.1 also implies that if (8.4.2) holds, then condition [D 2] is “invariant” with respect to s>2.

Condition [D 2] can be stated in a more general form:

$$\sum_{k=1}^{n} \mathbf{E} \xi_{k,n}^2 h \bigl(|\xi_{k,n}| \bigr) \to0, $$

where h(x) is any function for which h(x)>0 for x>0, h(x)↑, h(x)→0 as x→0, and h(x)→c<∞ as x→∞. All the key properties of condition [D 2] will then be preserved. The Lindeberg condition clarifies the meaning of condition [D 2] from a somewhat different point of view. In Lindeberg’s condition, \(h(x)=\mathbf{I}_{(\tau,\infty)}(x)\), τ∈(0,1). A similar remark may be made with regard to conditions [D 1] and [M 1] in Sect. 8.3.

In a way similar to what we did in Sect. 8.3 when discussing condition [M 1], one can easily verify that condition [M 2] implies convergence (see (8.3.6))

$$ \max_{k \le n} \operatorname{Var}( \xi_{k,n}) \to0 $$
(8.4.3)

and the negligibility of ξ k,n (property [S]). Moreover, one obviously has the inequality

$$M_1 (\tau) \le\frac{1}{\tau} M_2 (\tau). $$

For a given fixed (independent of n) sequence {ξ k } of independent random variables,

$$ S_n =\sum_{k=1}^{n} \xi_k,\qquad\mathbf{E}\xi_k =a_k, \qquad \operatorname{Var}( \xi_k) =\sigma_k^2, $$
(8.4.4)

one considers the asymptotic behaviour of the normed sums

$$ \zeta_n =\frac{1}{B_n} \Biggl(S_n - \sum_{k=1}^{n} a_k \Biggr), \qquad B_n^2 =\sum_{k=1}^{n} \sigma_k^2 , $$
(8.4.5)

that are clearly also of the form (8.4.1) with ξ k,n =(ξ k a k )/B n .

Conditions [D 2] and [M 2] for ξ k will take the form

$$ \displaystyle \begin{array}{c} \displaystyle D_2 =\frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E} \min \biggl( (\xi_k -a_k )^2, \frac{|\xi_k -a_k |^s}{B_n^{s-2}} \biggr) \to0,\quad s>2;\\ \displaystyle M_2 (\tau) =\frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E}\bigl((\xi_k -a_k)^2; |\xi_k -a_k | >\tau B_n \bigr) \to0,\quad\tau>0. \end{array} $$
(8.4.6)

Theorem 8.4.1

(The Central Limit Theorem)

If the sequences of random variables \({\{ \xi_{k,n} \} }_{k=1}^{\infty}\), n=1,2,…, satisfy conditions (8.4.2) and [D 2] (or [M 2]) then, as n→∞, P(ζ n <x)→Φ(x) uniformly in x.

Proof

It suffices to verify that

$$\varphi_{\zeta_n } (t) =\prod_{k=1}^{n} \varphi_{k,n} (t) \to e^{- t^2/2}. $$

By Lemma 7.4.2,

$$\begin{aligned} \bigl|\varphi_{\zeta_n } (t) -e^{-{t^2}/2 }\bigr| =& \Biggl| \prod _{k=1}^{n } \varphi_{k,n} (t)- \prod _{k=1}^{n } e^{-t^2 \sigma_{k,n}^2/2 } \Biggr| \\ \le& \sum_{k=1}^{n} \bigl| \varphi_{k,n} (t) - e^{- t^2 \sigma_{k,n}^2/2 } \bigr| \\\le& \sum_{k=1}^{n} \biggl| \varphi_{k,n} (t) -1 + \frac{1}{2} t^2 \sigma_{k,n}^{2} \biggr| \\&{} + \sum_{k=1}^n \biggl| e^{-t^2 \sigma_{k,n}^2/2 } -1+ \frac{1}{2} t^2 \sigma_{k,n}^{2} \biggr|. \end{aligned}$$
(8.4.7)

Since by Lemma 7.4.1, for s≤3,

$$\biggl| e^{ix} -1 -ix +\frac{x^2 }{2} \biggr| \le\min \biggl( x^2 ,\frac{|x^3|}{6} \biggr) \le g_2 (x) $$

(see the definition of the function g 2 in the beginning of the proof of Lemma 8.4.1), the first sum on the right-hand side of (8.4.7) does not exceed

$$\begin{aligned} & \sum_{k=1}^{n} \biggl| \mathbf{E} \biggl( e^{it\xi_{k,n}}-1 -it\xi_{k,n} + \frac{1}{2} t^2 \xi_{k,n}^{2} \biggr) \biggr| \\&\quad {} \le \sum_{k=1}^{n} \mathbf{E}g_2 \bigl(|t\xi_{k,n}|\bigr) \le h(t) \sum _{k=1}^{n} \mathbf{E}g_2 \bigl(| \xi_{k,n}|\bigr) \le h(t) D_2 \to0, \end{aligned}$$

where h(t)=max(t 2,|t|3). The last sum in (8.4.7) (again by Lemma 7.4.1) does not exceed (see (8.4.2) and (8.4.3))

$${t^4\over8}\sum _{k=1}^{n} \sigma ^4_{k,n}\le\frac{t^4}{8}\,\max_k \sigma_{k,n}^2\sum_{k=1}^n \sigma _{k,n}^2\le {t^4\over8} \max _{k} \sigma^2_{k,n}\to0\quad \hbox{as}\ n\to\infty. $$

The theorem is proved. □

If we change the second relation in (8.4.2) to \(\operatorname{Var}(\zeta_{n}) \to\sigma^{2} >0\), then, introducing the new random variables \(\xi'_{k,n}=\xi_{k,n}/\sqrt{\operatorname{Var}\zeta_{n}}\) and using continuity theorems, it is not hard to obtain from Theorem 8.4.1 (see e.g. Lemma 6.2.2), the following assertion, which sometimes proves to be more useful in applications than Theorem 8.4.1.

Corollary 8.4.1

Assume that E ξ k,n =0, \(\operatorname{Var}( \zeta_{n}) \to\sigma^{2} >0\), and condition [D 2] (or [M 2]) is satisfied. Then \(\zeta_{n} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow} {\boldsymbol{\Phi}}_{0,\sigma^{2} }\).

Remark 8.4.1

A sufficient condition for [D 2] and [M 2] is provided by the more restrictive Lyapunov condition, the verification of which is sometimes easier. Assume that (8.4.2) holds. For s>2, the quantity

$$L_s:=\sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^s $$

is called the Lyapunov fraction of the s-th order. The condition

$$ [L_s]\qquad L_s \to0 \quad\mbox{as}\ n\to\infty $$

is called the Lyapunov condition.

The quantity L s is called a fraction since for \(\xi_{k,n} =(\xi_{k} -a_{k})/B_{n}\) (where a k =E ξ k , \(B_{n}^{2} =\sum_{k=1}^{n}\operatorname{Var} (\xi_{k})\) and ξ k do not depend on n), it has the form

$$L_s =\frac{1}{B_n^s}\sum_{k=1}^{n} \mathbf{E}|\xi_{k} -a_k|^s. $$

If the ξ k are identically distributed, a k =a, \(\operatorname {Var}(\xi_{k}) =\sigma^{2}\), and \(\mathbf{E}|\xi_{k} -a|^{s}=\mu<\infty\), then

$$L_s =\frac{\mu}{ \sigma^s n^{(s-2)/2}}\to0. $$

The sufficiency of the Lyapunov condition follows from the obvious inequalities \(g_{2}(x)\le|x|^{s}\) and \(D_{2} \le L_{s}\).

In the case of (8.4.4) and (8.4.5) we can give a sufficient condition for the integral limit theorem that is very close to the Lindeberg condition [M 2]; the former condition elucidates to some extent the essence of the latter (cf. Theorem 8.3.3), and in many cases it is easier to verify. Put \(\overline{\sigma}_{n}=\max_{k\leq n}\sigma_{k}\). Theorem 8.4.1 implies the following assertion which is a direct extension of Theorem 8.2.1.

Theorem 8.4.2

Let conditions (8.4.4) and (8.4.5) be satisfied, the sequence of normalised random variables \(\xi_{k}^{2}/\sigma_{k}^{2}\) be uniformly integrable and \(\overline{\sigma}_{n}=o(B_{n})\) as n→∞. Then \(\zeta_{n}\subset\hspace{-6pt}\Rightarrow\,{\boldsymbol{\Phi}}_{0,1}\).

Proof of Theorem 8.4.2

repeats, to some extent, the proof of Theorem 8.3.3. For simplicity assume that a k =0. Then

$$ \mathbf{E} \bigl(\xi_k^2;| \xi_k|>\tau B_n \bigr)\leq \sigma_k^2 \mathbf{E} \biggl(\frac{\xi^2_k}{\sigma_k^2}; \biggl \vert \frac{\xi_k}{\sigma_k}\biggr \vert >\tau\,\frac{B_n}{\overline {\sigma}_n} \biggr), $$
(8.4.8)

where \(B_{n}/\overline{\sigma}_{n}\to\infty\). Hence, it follows from the uniform integrability of \(\{\frac{\xi_{k}^{2}}{\sigma_{k}^{2}} \}\) that the right-hand side of (8.4.8) is \(o(\sigma_{k}^{2})\) uniformly in k. This means that

$$M_2(\tau)=\frac{1}{B_n^2}\sum_{k=1}^n \mathbf{E} \bigl(\xi_k^2; |\xi _k|>\tau B_n \bigr)\to0 $$

as n→∞ and condition (8.4.6) (or condition [M 2]) is satisfied. The theorem is proved. □

Remark 8.4.2

We can generalise the assertion of Theorem 8.4.2 (cf. Remark 8.3.3). In particular, convergence ζ n ⊂=⇒Φ 0,1 still takes place if a finite number of summands ξ k (e.g., for kl, l being fixed) are completely arbitrary, and the sequence \(\xi_{k}^{*}:=\xi_{k+l}\), k≥1, satisfies the conditions of Theorem 8.4.2, in which we put \(\sigma_{k}^{2}=\operatorname{Var}(\xi_{k}^{*})\), \(B_{n}^{2}=\sum _{k=1}^{n}\sigma_{k}^{2}\), and it is also assumed that \(\frac{B_{n-1}}{B_{n}}\to1\) as n→∞.

This assertion follows from the fact that

$$\frac{S_n}{B_n}=\frac{S_l}{B_n}+\frac{S_n-S_l}{B_{n-l}}\cdot \frac{B_{n-l}}{B_n}, $$

where \(\frac{S_{l}}{B_{n}}\stackrel{p}{\to}0\), \(\frac {B_{n-l}}{B_{n}}\to1\) and, by Theorem 8.4.2, \(\frac{S_{n}-S_{l}}{B_{n-l}}\mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow}{\boldsymbol{\Phi}}_{0,1}\) as n→∞.

Remark 8.4.3

The uniform integrability condition that was used in Theorem 8.4.2 can be used for the triangular array scheme as well. In this more general case the uniform integrability should mean the following: the sequences η 1,n ,…,η n,n , n=1,2,…, in the triangular array scheme are uniformly integrable if there exists a function ε(N)↓0 as N↑∞ such that, for all n,

$$\max_{j\leq n}\mathbf{E} \bigl(|\eta_{j,n}|;| \eta_{j,n}|>N \bigr)\leq \varepsilon(N). $$

It is not hard to see that, with such an interpretation of uniform integrability, the assertion of Theorem 8.4.2 holds true for the triangular array scheme as well provided that the sequence \(\{\frac{\xi_{j,n}^{2}}{\sigma_{j,n}^{2}} \}\) is uniformly integrable and max jn σ j,n =o(1) as n→∞.

Example 8.4.1

We will clarify the difference between the Lindeberg condition and uniform integrability of \(\{\frac{\xi_{k}^{2}}{\sigma_{k}^{2}} \}\) in the following example. Let η k be independent bounded identically distributed random variables, Eη k =0, D η k =1 and \(g(k)>\sqrt{2}\) be an arbitrary function. Put

$$\xi_{k}:=\left \{ \begin{array}{l@{\quad }l} \eta_k & \mbox{with probability}\ 1-2g^{-2}(k),\\ \pm g(k) & \mbox{with probability}\ g^{-2}(k). \end{array} \right . $$

Then clearly E ξ k =0, \(\sigma_{k}^{2}:=\mathbf{D}\xi_{k}=3-2g^{-2}(k)\in(2,3)\) and \(B_{n}^{2}\in (2n,3n)\). The uniform integrability of \(\{\frac{\xi_{k}^{2}}{\sigma_{k}^{2}} \}\), or the uniform integrability of \(\{\xi_{k}^{2}\}\) which means the same in our case, excludes the case where g(k)→∞ as k→∞. The Lindeberg condition is wider and allows the growth of g(k), except for the case where \(g(k)>c\sqrt{k}\). If \(g(k)=o (\sqrt{k} )\), then the Lindeberg condition is satisfied because, for any fixed τ>0,

$$\mathbf{E} \bigl(\xi_k^2;\,|\xi_k|>\tau\sqrt{k} \bigr)=0 $$

for all large enough k.

Remark 8.4.4

Let us show that condition [M 2] (or [D 2]) is essential for the central limit theorem. Consider random variables

$$\xi_{k,n}=\left \{ \begin{array}{l@{\quad }l} \pm\frac{1}{\sqrt{2}} & \mbox{with probability}\ \frac{1}{n},\\ 0 & \mbox{with probability}\ 1-\frac{2}{n}. \end{array} \right . $$

They satisfy conditions (8.4.2), [S], but not the Lindeberg condition as M 2(τ)=1 for \(\tau<\frac{1}{\sqrt{2}}\). The number ν n of non-zero summands converges in distribution to a random variable ν having the Poisson distribution with parameter 2. Therefore, ζ n will clearly converge in distribution not to the normal law, but to \(\sum_{j=1}^{\nu}\gamma_{j}\), where γ j are independent and take values ±1 with probability 1/2.

Note also that conditions [D 2] or [M 2] are not necessary for convergence of the distributions of ζ n to the normal distribution. Indeed, consider the following example: ξ 1,n ⊂=Φ 0,1, ξ 2,n =⋯=ξ n,n =0. Conditions (8.4.2) are clearly met, P(ζ n <x)=Φ(x), but the variables ξ k,n are not negligible and therefore do not satisfy conditions [D 2] and [M 2].

If, however, as well as convergence ζ n ⊂=⇒Φ 0,1 we require that the ξ k,n are negligible, then conditions [D 2] and [M 2] become necessary.

Theorem 8.4.3

Suppose that the sequences of independent random variables \(\{\xi_{k,n}\}^{n}_{k=1}\) satisfy conditions (8.4.2) and [S]. Then condition [D 2] (or [M 2]) is necessary and sufficient for convergence ζ n ⊂=⇒Φ 0,1.

First note that the assertions of Lemmas 8.3.2 and 8.3.3 remain true, up to some inessential modifications, if we substitute conditions (8.3.2) and [S] with (8.4.2) and [S].

Lemma 8.4.2

Let conditions (8.4.2) and [S] hold. Then (Δ k (t)=φ k,n (t)−1)

$$\max_{k\le n}\bigl|\varDelta _k(t)\bigr|\to0,\quad \sum\bigl| \varDelta _k(t)\bigr|\le\frac{t^2}{2}, $$

and the assertion of Lemma 8.3.3, that the convergence \(\sum_{k=1}^{n}\varDelta _k(t)\to\ln\varphi(t)\) is necessary and sufficient for convergence \(\varphi_{\zeta_{n}}(t)\to\varphi(t)\), remain completely true.

Proof

We can retain all the arguments in the proofs of Lemmas 8.3.2 and 8.3.3 except for one place where ∑|Δ k (t)| is bounded. Under the new conditions, by Lemma 7.4.1, we have

$$\bigl|\varDelta _k(t)\bigr|=\bigl|\varphi_{k,n}(t)-1-it\mathbf{E} \xi_{k,n}\bigr|\le \mathbf{E}\bigl|e^{it\xi_{k,n}}-1-it\xi_{k,n}\bigr|\le \frac{t^2}{2} \mathbf{E}\,\xi^2_{k,n}, $$

so that

$$\sum\bigl|\varDelta _k(t)\bigr|\le\frac{t^2}{2}. $$

No other changes in the proofs of Lemmas 8.3.2 and 8.3.3 are needed. □

Proof of Theorem 8.4.3

Sufficiency is already proved. To prove necessity, we make use of Lemma 8.4.2. If \(\varphi _{\zeta _{n} } (t) \to e^{-{t^{2}}/2} \), then by virtue of that lemma, for Δ k (t)=φ k,n (t)−1, one has

$$\sum_{k=1}^{n} \varDelta _k (t) \to\ln\varphi(t) =-\frac{t^2}{2} . $$

For t=1 the above relation can be written in the form

$$ R_n :=\sum_{k=1}^{n} \mathbf{E} \biggl( e^{i\xi_{k,n}} -1 -i\xi_{k,n} +\frac{1}{2} \xi_{k,n}^2 \biggr) \to0. $$
(8.4.9)

Put \(\alpha(x):=(e^{ix}-1-ix)/x^{2}\). It is not hard to see that the inequality |α(x)|≤1/2 proved in Lemma 7.4.1 is strict for x≠0, and

$$\sup_{|x| \ge\tau} \bigl|\alpha(x)\bigr| < \frac{1}{2} -\delta(\tau), $$

where δ(τ)>0 for τ>0. This means that, for |x|≥τ>0,

$$\operatorname{Re} \biggl( e^{ix} -1 -ix +\frac{x^2}{2} \biggr) \ge\delta(\tau)\, x^2 , $$

and hence by virtue of (8.4.9), for any τ>0,

$$M_2 (\tau) \le\frac{1}{\delta(\tau) } |R_n| \to0 $$

as n→∞. The theorem is proved. □

Corollary 8.4.2

Assume that (8.4.2) holds and

$$ \max_{k \le n} \operatorname{Var}( \xi_{k,n}) \to0 . $$
(8.4.10)

Then a necessary and sufficient condition for convergence ζ n ⊂=⇒Φ 0,1 is that

$$\eta_n := \sum_{k=1}^{n} \xi_{k,n}^{2} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow}\mathbf{I}_1 $$

(or that \(\eta_{n} \stackrel{p}{\to}1\)).

Proof

Let η n ⊂=⇒I 1. The random variables \(\xi'_{k,n} =\xi_{k,n}^{2} -\sigma_{k,n}^{2} \) satisfy, by virtue of (8.4.10), condition (8.3.10) and satisfy the law of large numbers:

$$\sum_{k=1}^{n} \xi'_{k,n} =\eta_n -1 \stackrel{p}{\to}0. $$

Therefore, by Theorem 8.3.5, the \(\xi'_{k,n}\) satisfy condition [M 1]: for any τ>0,

$$ \sum_{k=1}^{n} \mathbf{E} \bigl(\bigl|\xi_{k,n}^{2} - \sigma_{k,n}^{2} \bigr|; \bigl|\xi_{k,n}^{2} - \sigma_{k,n}^{2} \bigr| >\tau\bigr) \to0. $$
(8.4.11)

But by (8.4.10) this condition is clearly equivalent to condition [M 2] for ξ k,n , and hence ζ n ⊂=⇒Φ 0,1.

Conversely, if ζ n ⊂=⇒Φ 0,1, then [M 2] holds for ξ k,n which implies (8.4.11). Since, moreover,

$$\sum_{k=1}^{n} \mathbf{E}\bigl| \xi'_{k,n} \bigr| \le2\sum_{k=1}^{n} \operatorname{Var}(\xi _{k,n}) =2, $$

relation (8.3.2) holds for \(\xi'_{k,n}\), and by Theorem 8.3.1

$$\sum_{k=1}^{n} {\xi'}_{k,n} =\eta_n -1 \stackrel{p}{\to}0. $$

The corollary is proved. □

Example 8.4.2

Let ξ k , k=1,2,…, be independent random variables with distributions

$$\mathbf{P}\bigl(\xi_k=k^\alpha\bigr)=\mathbf{P}\bigl( \xi_k=-k^\alpha\bigr)=\frac{1}{2}. $$

Evidently, ξ k can be represented as ξ k =k α η k , where \(\eta_{k}\stackrel{d}{=}\eta\) are independent,

$$\mathbf{P}(\eta=1)=\mathbf{P}(\eta=-1)=\frac{1}{2},\qquad \operatorname{Var}(\eta)=1,\qquad\sigma_k^2= \operatorname{Var}(\xi _k)=k^{2\alpha}. $$

Let us show that, for all α≥−1/2, the random variables S n /B n are asymptotically normal. Since

$$\frac{\xi_k^2}{\sigma_k^2}\stackrel{d}{=} \eta^2 $$

are uniformly integrable, by Theorem 8.4.2 it suffices to verify the condition

$$\overline{\sigma}_n=\max _{k\leq n}\sigma_k=o(B_n). $$

In our case \(\overline{\sigma}_{n}=\max(1,n^{\alpha})\) and, for α>−1/2,

$$B_n^2=\sum_{k=1}^n k^{2\alpha}\sim\int_0^n x^{2\alpha}\,dx=\frac{n^{2\alpha+1}}{2\alpha+1}. $$

For α=−1/2, one has

$$B_n^2 =\sum_{k=1}^{n} k^{-1} \sim\ln n . $$

Clearly, in these cases \(\overline{\sigma}_{n}=o(B_{n})\) and the asymptotical normality of \(S_{n}/B_{n}\) holds.

If α<−1/2 then the sequence B n converges, condition \(\overline{\sigma}_{n}=1=o(B_{n})\) is not satisfied and the asymptotical normality of S n /B n fails to take place.

Note that, for α=−1/2, the random variable

$$S_n=\sum _{k=1}^n \frac{\eta_k}{\sqrt{k}} $$

will be “comparable” with \(\sqrt{\ln n}\) with a high probability, while the sums

$$\sum _{k=1}^n\frac{(-1)^k}{\sqrt{k}} $$

converge to a constant.

A rather graphical and well-known illustration of the above theorems is the scattering of shells when shooting at a target. The fact is that the trajectory of a shell is influenced by a large number of independent factors of which the individual effects are small. These are deviations in the amount of gun powder, in the weight and size of a shell, variations in the humidity and temperature of the air, wind direction and velocities at different altitudes and so on. As a result, the deviation of a shell from the aiming point is described by the normal law with an amazing accuracy.

Similar observations could be made about errors in measurements when their accuracy is affected by many “small” factors. (There even exists a theory of errors of which the crucial element is the central limit theorem.)

On the whole, the central limit theorem has a lot of applications in various areas. This is due to its universality and robustness under small deviations from the assumptions of the theorem, and its relatively high accuracy even for moderate values of n. The first two noted qualities mean that:

(1) the theorem is applicable to variables ξ k,n with any distributions so long as the variances of ξ k,n exist and are “negligible”;

(2) the presence of a “moderate” dependence between ξ k,n does not change the normality of the limiting distribution.

To illustrate the accuracy of the normal approximation, consider the following example. Let \(F_{n} (x) =\mathbf{P}(S_{n} /\sqrt{n} <x)\) be the distribution function of the normalised sum S n of independent variables ξ k uniformly distributed over \([-\sqrt{3},\sqrt{3}]\), so that \(\operatorname{Var}( \xi_{k} )=1\). Then it turns out that already for n=5 (!) the maximum of |F n (x)−Φ(x)| over the whole axis of x-values does not exceed 0.006 (the maximum is attained near the points x=±0.7).
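This figure is easy to reproduce; the following Monte Carlo sketch (ours, assuming NumPy; the grid and sample size are arbitrary choices) estimates \(\sup_x |F_5(x)-\varPhi(x)|\):

```python
import numpy as np
from math import erf, sqrt

# Monte Carlo estimate of max |F_n(x) - Phi(x)| for n = 5 summands
# uniform on [-sqrt(3), sqrt(3)] (so that Var(xi_k) = 1).
rng = np.random.default_rng(2)
n, samples = 5, 2_000_000
zeta = rng.uniform(-sqrt(3), sqrt(3), size=(samples, n)).sum(axis=1) / sqrt(n)
zeta.sort()
xs = np.linspace(-4.0, 4.0, 801)
F_n = np.searchsorted(zeta, xs) / samples                   # empirical d.f.
Phi = np.array([0.5 * (1 + erf(x / sqrt(2))) for x in xs])
print(np.abs(F_n - Phi).max())  # about 0.006 (up to Monte Carlo error),
                                # attained near x = +-0.7
```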

And still, despite the above circumstances, one has to be careful when applying the central limit theorem. For instance, one cannot expect high accuracy from the normal approximation when estimating probabilities of rare events, say when studying large deviation probabilities (this issue has already been discussed in Sect. 5.3). After all, the theorem only ensures the smallness of the difference

$$ \bigl|\varPhi(x) -\mathbf{P}(\zeta_n <x )\bigr| $$
(8.4.12)

for large n. Suppose we want to use the normal approximation to find an x 0 such that the event {ζ n >x 0} would occur on average once in 1000 trials (a problem of this sort could be encountered by an experimenter who wants to ensure that, in a single experiment, such an event will not occur). Even if the difference (8.4.12) does not exceed 0.02 (which can be a good approximation) then, using the normal approximation, we risk making a serious error. It can turn out, say, that 1−Φ(x 0)=10−3 while \(\mathbf{P}(\zeta_{n} \ge x_{0})\approx0.02\), and then the event {ζ n >x 0} will occur much more often (on average, once in each 50 trials).

In Chap. 9 we will consider the problem of large deviation probabilities that enables one to handle such situations. In that case one looks for a function P(n,x) such that \(\mathbf{P}(\zeta_{n} \ge x)/P(n,x)\to1\) as n→∞, x→∞. The function P(n,x) turns out to be, generally speaking, different from 1−Φ(x). We should note however that using the approximation P(n,x) requires more restrictive conditions on {ξ k,n }.

In Sect. 8.7 we will consider the so-called integro-local and local limit theorems that establish convergence of the density of ζ n to that of the normal law and enable one to estimate probabilities of rare events of another sort—say, of the form {a<ζ n <b} where a and b are close to each other.

8.5 * Another Approach to Proving Limit Theorems. Estimating Approximation Rates

The approach to proving the principal limit theorems for the distributions of sums of random variables that we considered in Sects. 8.1–8.4 was based on the use of ch.f.s. However, this is by far not the only method of proof of such assertions. Nowadays there exist several rather simple proofs of both the laws of large numbers and the central limit theorem that do not use the apparatus of ch.f.s. (This, however, does not belittle that powerful, well-developed, and rather universal tool.) Moreover, these proofs sometimes enable one to obtain more general results. As an illustration, we will give below a proof of the central limit theorem that extends, in a certain sense, Theorems 8.4.1 and 8.4.3 and provides an estimate of the convergence rate (although not the best one).

Along with the random variables ξ k,n in the triangular array scheme under assumption (8.4.2), consider mutually independent and independent of the sequence \({ \{ \xi_{k,n} \} }_{k=1}^{n}\) random variables \(\eta_{k,n} \mathbin{\subset\!\!\!\!\!=}{\boldsymbol{\Phi }}_{0,\sigma_{k,n}^{2}}\), \(\sigma_{k,n}^{2} :=\operatorname{Var}(\xi_{k,n })\), so that

$$\eta_n :=\sum_{k=1}^{n} \eta_{k,n} \mathbin{\subset\!\!\!\!\!=}{\boldsymbol{\Phi}}_{0,1}. $$

Set

$$\mu_{k,n} :=\mathbf{E}|\xi_{k,n}|^3,\qquad \nu_{k,n} :=\mathbf{E}|\eta_{k,n}|^3=c_3 \sigma_{k,n}^3 \le c_3 \mu_{k,n}, $$
$$\mu_{k,n}^0 :=\int|x|^3 \bigl|\,d \bigl(F_{k,n}(x) -\varPhi_{k,n} (x)\bigr)\bigr| \le\mu _{k,n} + \nu_{k,n }, $$
$$L_3 :=\sum_{k=1}^{n} \mu_{k,n},\qquad N_3 :=\sum_{k=1}^{n} \nu_{k,n}, \qquad L_3^0 :=\sum _{k=1}^{n} \mu_{k,n}^0 \le L_3 +N_3 \le (1+c_3)L_3. $$

Here F k,n and Φ k,n are the distribution functions of ξ k,n and η k,n , respectively. The quantities L 3 and N 3 are the third order Lyapunov fractions for the sequences {ξ k,n } and {η k,n }. The quantities \(\mu_{k,n}^{0}\) are called the third order pseudomoments and \(L_{s}^{0}\) the Lyapunov fractions for pseudomoments. Clearly, \(N_{3} \le c_{3} L_{3}\to0\), provided that the Lyapunov condition holds. As we have already noted, for ξ k,n =(ξ k −a k )/B n , where a k =E ξ k , \(B_{n}^{2} =\sum_{1}^{n} \operatorname{Var} (\xi_{k}) \), and ξ k do not depend on n, one has

$$L_3 =\frac{1}{B_n^3 } \sum_{k=1}^n \mu_k,\quad\mu_k =\mathbf {E}|\xi_k -a_k |^3. $$

If, moreover, ξ k are identically distributed, then

$$L_3 =\frac{\mu_1}{\sigma^3 \sqrt{n}}. $$

Our first task here is to estimate the closeness of E f(ζ n ) to E f(η n ) for sufficiently smooth f. This problem could be of independent interest. Assume that f belongs to the class C 3 of all bounded functions with uniformly continuous and bounded third derivatives: sup x |f (3)(x)|≤f 3.

Theorem 8.5.1

If fC 3 then

$$ \bigl|\mathbf{E}f(\zeta_n )-\mathbf{E}f( \eta_n )\bigr| \le\frac{f_3 L_3^0}{6} \le\frac{f_3 }{6} (L_3 + N_3). $$
(8.5.1)

Proof

Put, for \(1<l\le n\),

$$\begin{aligned} X_l :=& \xi_{1,n} +\cdots+\xi_{l-1,n}+ \eta_{l,n} +\cdots+\eta_{n,n}, \\Z_l :=& \xi_{1,n} +\cdots+\xi_{l-1,n}+ \eta_{l+1,n} +\cdots+\eta_{n,n}, \\X_1 :=& \eta_n,\qquad X_{n+1} = \zeta_n. \end{aligned}$$

Then

$$ X_{l+1} =Z_l +\xi_{l,n} , \qquad X_l =Z_l +\eta_{l,n}, $$
(8.5.2)
$$ f(\zeta_n) -f(\eta_n) =\sum _{l=1}^n \bigl[f(X_{l+1})-f(X_l)\bigr]. $$
(8.5.3)

Now we will make use of the following lemma.

Lemma 8.5.1

Let fC 3 and Z, ξ and η be independent random variables with

$$\mathbf{E}\xi= \mathbf{E}\eta=a,\qquad\mathbf{E}\xi^2 = \mathbf {E} \eta^2 =\sigma^2,\qquad \mu^0 = \int|x|^3 \bigl|d\bigl(F_{\xi} (x) -F_{\eta} (x)\bigr)\bigr|< \infty. $$

Then

$$ \bigl|\mathbf{E}f(Z+\xi) -\mathbf{E}f(Z+\eta)\bigr| \le\frac{ f_3 \mu^0}{6}. $$
(8.5.4)

Applying this lemma to (8.5.3), we get

$$\bigl|\mathbf{E}\bigl[f(X_{l+1 }) -f(X_l)\bigr]\bigr| \le \frac{ f_3 \mu^0_{l,n}}{6} $$

which after summation gives (8.5.1). The theorem is proved.  □

Thus to complete the argument proving Theorem 8.5.1 it remains to prove Lemma 8.5.1.

Proof of Lemma 8.5.1

Set g(x):=E f(Z+x). It is evident that g, being the result of the averaging of f, has all the smoothness properties of f and, in particular, |g‴(x)|≤f 3. By virtue of the independence of Z, ξ and η, we have

$$ \mathbf{E}f(Z+\xi) -\mathbf{E}f(Z+\eta) =\int g(x)\, d \bigl(F_{\xi} (x) -F_{\eta} (x)\bigr). $$
(8.5.5)

For the integrand, we make use of the expansion

$$g(x) =g(0) +xg' (0) +\frac{x^2}{2} g'' (0)+\frac{x^3}{6} g''' ( \theta_x), \quad\theta_x \in[0,x]. $$

Since the first and second moments of ξ coincide with those of η, we obtain for the right-hand side of (8.5.5) the bound

$$\biggl \vert \int \frac{x^3}{6} g''' (\theta_x ) \,d \bigl(F_{\xi} (x) -F_{\eta} (x) \bigr) \biggr \vert \le\frac{f_3 \mu^0}{6} . $$

The lemma is proved. □

Remark 8.5.1

In exactly the same way one can establish the representation

$$ \bigl|\mathbf{E}f(\zeta_n )-\mathbf{E}f( \eta_n )\bigr| \le\frac{g''' (0)}{6} \sum_{k=1}^n \mathbf{E}\bigl(\xi_{k,n}^3 -\eta_{k,n}^3 \bigr) +\frac{f_4 L_4^0 }{24}, $$
(8.5.6)

under obvious conventions for the notations f 4 and \(L^{0}_{4}\). This bound can improve upon (8.5.1) if the differences \(\mathbf{E} (\xi_{k,n}^{3} -\eta_{k,n}^{3} )\) are small. If, for instance, \(\xi_{k,n} = (\xi_{k} -a)/(\sigma\sqrt{n})\), ξ k are identically distributed, and the third moments of ξ k,n and η k,n coincide, then on the right-hand side of (8.5.6) we will have a quantity of the order 1/n.

Theorem 8.5.1 extends Theorem 8.4.1 in the case when s=3. The extension is that, to establish convergence ζ n ⊂=⇒Φ 0,1, one no longer needs the negligibility of ξ k,n . If, for example, ξ 1,n ⊂=Φ 0,1/2 (in that case \(\mu_{1,n}^{0} =0\)) and \(L_{3}^{0} \to0\), then E f(ζ n )→E f(η), η⊂=Φ 0,1, for any f from the class C 3. Since C 3 is a distribution determining class (see Chap. 6), it remains to make use of Corollary 6.3.2.

We can strengthen the above assertion.

Theorem 8.5.2

For any \(x\in\mathbb{R}\),

$$ \bigl|\mathbf{P}(\zeta_n <x) -\varPhi(x)\bigr| \le c \bigl(L_3^0 \bigr)^{1/4}, $$
(8.5.7)

where c is an absolute constant.

Proof

Take an arbitrary function \(h\in C_{3}\), 0≤h≤1, such that h(x)=1 for x≤0 and h(x)=0 for x≥1, and put \(h_{3}=\sup_{x} |h'''(x)|\). Then, for the function f(x)=h((x−t)/ε), we will have \(f_{3} =\sup_{x} |f'''(x)| \le h_{3}/\varepsilon^{3}\), and by Theorem 8.5.1

$$\begin{aligned} \mathbf{P}(\zeta_n <t) \le& \mathbf{E}f( \zeta_n ) \le \mathbf{E}f(\eta) + \frac{f_3 L_3^0}{6} \\\le& \mathbf{P}(\eta< t +\varepsilon)+\frac{h_3 L_3^0}{6\varepsilon^3 } \le\mathbf{P} ( \eta<t) +\frac{\varepsilon}{\sqrt{2\pi}}+\frac{h_3 L_3^0}{6\varepsilon^3 }. \end{aligned}$$

The last inequality holds since the maximum of the derivative of the normal distribution function Φ(t)=P(η<t) is equal to \(1/\sqrt{2\pi}\). Establishing in the same way the converse inequality and putting \(\varepsilon=(L_{3}^{0})^{1/4}\), we arrive at (8.5.7). The theorem is proved. □
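The choice of ε here is a balancing act: the two error terms \(\varepsilon/\sqrt{2\pi}\) and \(h_3 L_3^0/(6\varepsilon^{3})\) are of the same order precisely when ε is of order \((L_3^0)^{1/4}\), and then both are \(O((L_3^0)^{1/4})\); this is where the exponent 1/4 in (8.5.7) comes from.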

The bound in Theorem 8.5.2 is, of course, not the best one. And yet inequality (8.5.7) shows that we will have a good normal approximation for P(ζ n <x) in the large deviations range (i.e. for |x|→∞) as well—at least for those x for which

$$ \bigl(1-\varPhi\bigl(|x|\bigr)\bigr) \bigl(L_3^0 \bigr)^{-1/4} \to\infty $$
(8.5.8)

as n→∞. Indeed, in that case, say, for x=|x|>0,

$$\biggl \vert \frac{\mathbf{P}(\zeta_n \geq x)}{1-\varPhi(x)} -1 \biggr \vert \le \frac{c(L_3^0)^{1/4}}{1-\varPhi(x)} \to0. $$

Since by L’Hospital’s rule

$$1-\varPhi(x) =\frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{{-t^2}/2}\, dt \sim\frac{1}{\sqrt{2\pi} x} e^{{-x^2}/2} \quad\mbox{as}\ x \to \infty, $$

(8.5.8) holds for \(|x| < c_{1} \sqrt{-\ln L_{3}^{0}}\) with an appropriately chosen constant c 1.
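Indeed (a worked step, easily checked), for \(|x| = c_1\sqrt{-\ln L_3^0}\) the above asymptotic relation gives

$$\bigl(1-\varPhi\bigl(|x|\bigr)\bigr) \bigl(L_3^0\bigr)^{-1/4} \sim\frac{(L_3^0)^{c_1^2/2 -1/4}}{\sqrt{2\pi}\,|x|} \to\infty \quad\mbox{as}\ L_3^0 \to0, $$

provided that \(c_1 <1/\sqrt{2}\): the exponent \(c_1^{2}/2-1/4\) is then negative, while the factor \(1/|x|\) decays more slowly than any power of \(L_3^0\).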

In Chap. 20 we will obtain an extension of Theorems 8.5.1 and 8.5.2.

The problem of refinements and approximation rate bounds in the central limit theorem and other limit theorems is one of the most important in probability theory, because solving it will tell us how precise and efficient the applications of these theorems to practical problems will be. First of all, one has to find the true order of the decay of

$$\varDelta _n =\sup_x \bigl|\mathbf{P}( \zeta_n <x) -\varPhi(x) \bigr| $$

in n (or, say, in L 3 in the case of non-identically distributed variables). There exist at least two approaches to finding sharp bounds for Δ n . The first one, the so-called method of characteristic functions, is based on the unimprovable bound for the closeness of the ch.f.s

$$\biggl \vert \ln\varphi_{\zeta_n} (t) +\frac{t^2}{2}\biggr \vert <cL_3 $$

that the reader can obtain by him/herself, using Lemma 7.4.1 and somewhat modifying the argument in the proof of Theorem 8.4.1. The principal technical difficulties here are in deriving, using the inversion formula, the same order of smallness for Δ n .

The second approach, the so-called method of compositions, has been illustrated in the present section in Theorem 8.5.1 (the idea of the method is expressed, to a certain extent, by relation (8.5.3)). It will be using just that method that we will prove in Appendix 5 the following general result (Cramér–Berry–Esseen):

Theorem 8.5.3

If ξ k,n =(ξ k a k )/B n , where ξ k do not depend on n, then

$$\sup_x \bigl|\mathbf{P}(\zeta_n <x) -\varPhi(x)\bigr | \le cL_3, $$

where c is an absolute constant.

In the case of identically distributed ξ k the right-hand side of the above inequality becomes \(c\mu_{3}/(\sigma^{3} \sqrt{n})\), where μ 3=E|ξ 1−a|3. It was established that in this case \((2\pi)^{-1/2}<c<0.4774\), while in the case of non-identically distributed summands c<0.5591.

One should keep in mind that the above theorems and the bounds for the constant c are universal and therefore hold under the most unfavourable conditions (from the point of view of the approximation). In real problems, the convergence rate is usually much better.
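This can be seen in a simple numerical experiment. The following Monte Carlo sketch (ours, not from the book; it assumes numpy and scipy are available) estimates Δ n for Bernoulli(p) summands, for which μ 3=p(1−p)(p 2+(1−p)2), and compares it with the bound 0.4774 μ 3/(σ 3√n) quoted above; the empirical distance typically lies far below the universal bound.

```python
# Monte Carlo sketch (not from the book): compare the empirical value of
# Delta_n = sup_x |P(zeta_n < x) - Phi(x)| with the Berry-Esseen bound
# c*mu_3/(sigma^3*sqrt(n)), c = 0.4774, for Bernoulli(p) summands.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p = 0.3
sigma = (p * (1 - p)) ** 0.5
mu3 = p * (1 - p) * (p**2 + (1 - p)**2)      # E|xi - p|^3

for n in (10, 100, 1000):
    s = rng.binomial(n, p, size=200_000)     # samples of S_n
    zeta = np.sort((s - n * p) / (sigma * n**0.5))
    xs = np.linspace(-4.0, 4.0, 801)
    # empirical P(zeta_n < x) on a grid of x values
    emp_cdf = np.searchsorted(zeta, xs, side='left') / zeta.size
    delta_n = np.abs(emp_cdf - norm.cdf(xs)).max()
    bound = 0.4774 * mu3 / (sigma**3 * n**0.5)
    print(f"n={n:5d}  Delta_n~{delta_n:.4f}  bound={bound:.4f}")
```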

8.6 The Law of Large Numbers and the Central Limit Theorem in the Multivariate Case

In this section we assume that ξ 1,n , … ,ξ n,n are random vectors in the triangular array scheme,

$$\mathbf{E} \xi_{k,n}=0,\qquad\zeta_n=\sum ^n_{k=1}\xi_{k,n}. $$

The law of large numbers \(\zeta_{n}\overset{p}{\to}\, 0\) follows immediately from Theorem 8.3.1, if we assume that the components of ξ k,n satisfy the conditions of that theorem. Thus we can assume that Theorem 8.3.1 was formulated and proved for vectors.

Dealing with the central limit theorem is somewhat more complicated. Here we will assume that E|ξ k,n |2<∞, where |x|2=(x,x) is the square of the norm of x. Let

$$\sigma^2_{k,n}:=\mathbf{E}\,\xi^T_{k,n} \xi_{k,n},\qquad \sigma^2_n:=\sum ^n_{k=1}\sigma^2_{k,n} $$

(the superscript T denotes transposition, so that \(\xi^{T}_{k,n}\) is a column vector).

Introduce the condition

$$ [\mathbf{D}_2]\quad \sum^n_{k=1}\mathbf{E}\, g_2 \bigl(|\xi_{k,n}| \bigr)\to0,\quad\mbox{where}\ g_2(x)=\min \bigl(x^2,|x|^s \bigr),\ s>2, $$

and the Lindeberg condition

$$ [\mathbf{M}_2]\quad \sum^n_{k=1}\mathbf{E} \bigl(|\xi_{k,n}|^2;\,|\xi_{k,n}|\geq\tau \bigr)\to0 $$

as n→∞ for any τ>0. As in the univariate case, we can easily verify that conditions [D 2] and [M 2] are equivalent provided that \(\operatorname{tr} \sigma^{2}_{n} := \sum^{d}_{j = 1}(\sigma^{2}_{n})_{jj}<c<\infty\).

Theorem 8.6.1

If \(\sigma^{2}_{n}\to\sigma^{2}\), where σ 2 is a positive definite matrix, and condition [D 2] (or [M 2]) is met, then

$$\zeta_n \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow}{\boldsymbol{\Phi}}_{0,\sigma^2}. $$

Corollary 8.6.1

(“The conventional” central limit theorem)

If ξ 1,ξ 2, … is a sequence of independent identically distributed random vectors, E ξ k =0, \(\sigma^{2}=\mathbf{E}\,\xi^{T}_{k}\xi_{k}\) and \(S_{n}=\sum ^{n}_{k=1}\xi_{k}\) then, as n→∞,

$$\frac{S_n}{\sqrt{n}} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow}{\boldsymbol{\Phi }}_{0,\sigma^2}. $$

This assertion is a consequence of Theorem 8.6.1, since the random variables \(\xi_{k,n}=\xi_{k}/\sqrt{n}\) satisfy its conditions.
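As an illustration, the following simulation sketch (ours; numpy assumed) checks the covariance structure predicted by Corollary 8.6.1 for two-dimensional vectors with dependent coordinates ξ k =(U k ,U k +V k ), where U k and V k are independent and uniform on [−1,1], so that σ 2 has entries 1/3, 1/3, 1/3, 2/3.

```python
# Simulation sketch of Corollary 8.6.1 in R^2: the empirical covariance of
# S_n/sqrt(n) should approach sigma^2 = [[1/3, 1/3], [1/3, 2/3]].
import numpy as np

rng = np.random.default_rng(4)
n, trials = 400, 20_000
u = rng.uniform(-1, 1, size=(trials, n))     # first coordinates U_k
v = rng.uniform(-1, 1, size=(trials, n))     # independent V_k
s = np.stack([u.sum(axis=1), (u + v).sum(axis=1)], axis=1)   # S_n per trial
zeta = s / np.sqrt(n)
print(np.cov(zeta, rowvar=False))   # ~ [[0.333, 0.333], [0.333, 0.667]]
```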

Proof of Theorem 8.6.1

Consider the characteristic functions

$$\varphi_{k,n}(t):=\mathbf{E} e^{i(t,\xi_{k,n})},\qquad \varphi_{n}(t):=\mathbf{E} e^{i(t,\zeta_{n})}=\prod ^n_{k=1}\varphi_{k,n}(t). $$

In order to prove the theorem we have to verify that, for any t, as n→∞,

$$\varphi_n(t)\to\exp \biggl\{-\frac{1}{2}t \sigma^2t^T \biggr\}. $$

We make use of Theorem 8.4.1. We can interpret φ k,n (t) and φ n (t) as the ch.f.s

$$\varphi^\theta_{k,n} (v)=\mathbf{E}\,\exp\bigl(iv \xi^\theta _{k,n}\bigr),\qquad \varphi^\theta_{n} (v)=\mathbf{E}\exp\bigl(iv\zeta^\theta_{n}\bigr) $$

of the random variables \(\xi^{\theta}_{k,n}=(\xi_{k,n},\theta)\), \(\zeta^{\theta}_{n}=(\zeta_{n},\theta)\), where θ=t/|t|, v=|t|. Let us show that the scalar random variables \(\xi^{\theta}_{k,n}\) satisfy the conditions of Theorem 8.4.1 (or Corollary 8.4.1) for the univariate case. Clearly,

$$\begin{aligned} \mathbf{E}\,\xi^\theta_{k,n} =&0,\qquad \sum ^n_{k=1}\mathbf{E}\,\bigl(\xi^\theta_{k,n} \bigr)^2= \sum^n_{k=1}\mathbf {E}(\xi_{k,n}, \theta)^2 =\theta\sigma^2_n \theta^T\to\theta\sigma^2\theta^T>0. \end{aligned}$$

That condition [D 2] is satisfied follows from the obvious inequalities

$$\begin{aligned} (\xi_{k,n}, \theta)^2 =\bigl|\xi^\theta_{k,n}\bigr|^2 \le |\xi_{k,n}|^2,\qquad\sum^n_{k=1} \mathbf{E}\, g_2 \bigl(\xi^\theta _{k,n}\bigr) \le \sum^n_{k=1}\mathbf{E} g_2 \bigl(|\xi_{k,n}| \bigr), \end{aligned}$$

where g 2(x)=min(x 2,|x|s), s>2. Thus, for any v and θ (i.e., for any t), by Corollary 8.4.1 of Theorem 8.4.1

$$\varphi_n(t)=\mathbf{E}\exp\bigl\{iv\zeta^\theta_n \bigr\}\to\exp \biggl\{-\frac{1}{2} v^2\theta \sigma^2\theta^T \biggr\}= \exp \biggl\{- \frac{1}{2} t\sigma^2t^T \biggr\}. $$

The theorem is proved. □

Theorem 8.6.1 does not cover the case where the entries of the matrix \(\sigma^{2}_{n}\) grow unboundedly or behave in such a way that the rank of the limiting matrix σ 2 becomes less than the dimension of the vectors ξ k,n . This can happen when the variances of different components of ξ k,n have different orders of decay (or growth). In such a case, one should consider the transformed sums \(\zeta'_{n} = \zeta_{n} \sigma^{-1}_{n}\) instead of ζ n . Theorem 8.6.1 is actually a consequence of the following more general assertion which, in turn, follows from Theorem 8.6.1.

Theorem 8.6.2

If the random variables \(\xi'_{k,n} = \xi_{k,n} \sigma^{-1}_{n}\) satisfy condition [D 2] (or [M 2]) then \(\zeta'_{n} \mathbin{\subset\!\!\!\!\!=\!\!\!\!\Rightarrow}{\boldsymbol{\Phi }}_{0,E}\), where E is the identity matrix.

8.7 Integro-Local and Local Limit Theorems for Sums of Identically Distributed Random Variables with Finite Variance

Theorem 8.2.1 from Sect. 8.2 is called the integral limit theorem. To understand the reasons for using such a name, one should compare this assertion with (more accurate) limit theorems of another type, that describe the asymptotic behaviour of the densities of the distributions of S n (if any) or the asymptotics of the probabilities of sums S n hitting a fixed interval. It is natural to call the theorems for densities local theorems. Theorems similar to Theorem 8.2.1 can be obtained from the local ones (if the densities exist) by integrating, and it is natural to call them integral theorems. Assertions about the asymptotics of the probabilities of S n hitting an interval are “intermediate” between the local and integral theorems, and it is natural to call them integro-local theorems. In the literature, such statements are often also referred to as local, apparently because they describe the probability of the localisation of the sum S n in a given interval.

8.7.1 Integro-Local Theorems

Integro-local theorems describe the asymptotics of

$$\mathbf{P} \bigl(S_n\in[x,x+\varDelta ) \bigr) $$

as n→∞ for a fixed Δ>0. Probabilities of this type for increasing Δ (or for Δ=∞) can clearly be obtained by summing the corresponding probabilities for fixed Δ.

We will derive the integro-local and local theorems using the inversion formulas from Sect. 7.2.

For the sake of brevity, put

$$\varDelta [x)=[x,x+\varDelta ) $$

and denote by ϕ(x)=ϕ 0,1(x) the density of the standard normal distribution. Below we will restrict ourselves to the investigation of the sums S n =ξ 1+⋯+ξ n of independent identically distributed random variables \(\xi_{k}\stackrel{d}{=} \xi\).

Theorem 8.7.1

(The Stone–Shepp integro-local theorem)

Let ξ be a non-lattice random variable, Eξ=0 and Eξ 2=σ 2<∞. Then, for any fixed Δ>0, as n→∞,

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \frac{\varDelta }{\sigma\sqrt{n}}\,\phi \biggl(\frac{x}{\sigma\sqrt {n}} \biggr)+o \biggl( \frac{1}{\sqrt{n}} \biggr), $$
(8.7.1)

where the remainder term \(o (1/\sqrt{n} )\) is uniform in x.

Remark 8.7.1

Since relation (8.7.1) is valid for any fixed Δ, it will also be valid when Δ=Δ n →0 slowly enough as n→∞. If Δ=Δ n grows then the asymptotics of P(S n Δ[x)) can be obtained by summing the right-hand sides of (8.7.1) for, say, Δ=1 (if Δ n →∞ is integer-valued). Thus the integral theorem follows from the integro-local one but not vice versa.

Remark 8.7.2

By virtue of the properties of densities (see Sect. 3.2), the right-hand side of representation (8.7.1) has the same form as if the random variable \(\zeta_{n}={S_{n}}/{(\sigma\sqrt{n})}\) had the density ϕ(v)+o(1), although the existence of the density of S n (or ζ n ) is not assumed in the theorem.

Proof of Theorem 8.7.1

We first prove the theorem under the simplifying assumption that the condition

$$ \limsup_{|t|\to\infty} \bigl|\varphi(t) \bigr|<1 $$
(8.7.2)

is satisfied (the Cramér condition on the ch.f.). Property 11 of ch.f.s (see Sect. 7.1) implies that this condition is always met if the distribution of the sum S m , for some m≥1, has a positive absolutely continuous component. The proof of Theorem 8.7.1 in its general form is more complicated and will be given at the end of this section, in Sect. 8.7.3.

In order to use the inversion formula (7.2.8), we employ the “smoothing method” and consider, along with S n , the sums

$$ Z_n=S_n+\eta_{\delta}, $$
(8.7.3)

where η δ ⊂=U δ,0. Since the ch.f. \(\varphi_{\eta_{\delta}}(t)\) of the random variable η δ , being equal to

$$ \varphi_{\eta_{\delta}}(t)=\frac{1-e^{-it\delta}}{it\delta}, $$
(8.7.4)

possesses the property that the function \({\varphi_{\eta_{\delta}}(t)}/{t}\) is integrable at infinity, we can use formula (7.2.8) for the increments of the distribution function G n (x) of the random variable Z n (its ch.f. divided by t is integrable, too):

$$\begin{aligned} G_n(x+\varDelta )-G_n(x)=\mathbf{P} \bigl(Z_n\in \varDelta [x) \bigr) =& \frac{1}{2\pi}\int e^{-itx} \frac{1-e^{-it\varDelta }}{it}\ \,\varphi^n(t)\varphi_{\eta_{\delta}}(t)\,dt \\=&\frac{\varDelta }{2\pi}\int e^{-itx}\varphi^n(t)\widehat{ \varphi}(t)\,dt, \end{aligned}$$
(8.7.5)

where \(\widehat{\varphi}(t)=\varphi_{\eta_{\delta}}(t)\varphi_{\eta _{\varDelta }}(t)\) (cf. (7.2.8)) is the ch.f. of the sum of independent random variables η δ and η Δ . We obtain that the difference G n (x+Δ)−G n (x), up to the factor Δ, is nothing else but the value of the density of the random variable S n +η δ +η Δ at the point x.

Split the integral on the right-hand side of (8.7.5) into two subintegrals: one over the domain |t|<γ for some γ<1, and the other over the complementary domain. Put \(x=v\sqrt{n}\) and consider first

$$\begin{aligned} I_1:=\int_{|t|<\gamma}e^{-itv\sqrt{n}} \varphi^n(t)\widehat{\varphi}(t)\,dt= \frac{1}{\sqrt{n}}\int _{|u|<\gamma\sqrt{n}} e^{-iuv}\varphi^n \biggl( \frac{u}{\sqrt{n}} \biggr)\widehat{\varphi} \biggl(\frac{u}{\sqrt{n}} \biggr)\,du. \end{aligned}$$

Without loss of generality we can assume σ=1, and by (8.2.1) obtain that

$$\begin{aligned} 1-\varphi(t) =&\frac{t^2}{2}+o\bigl(t^2\bigr), \\\ln\varphi(t) =&\ln \bigl[1- \bigl(1-\varphi(t) \bigr) \bigr]= - \frac{t^2}{2}+o\bigl(t^2\bigr)\quad\mbox{as}\ t\to0. \end{aligned}$$
(8.7.6)

Hence

$$ n\ln\varphi \biggl(\frac{u}{\sqrt{n}} \biggr)=-\frac{u^2}{2}+h_n(u), $$
(8.7.7)

where h n (u)→0 for any fixed u as n→∞. Moreover, for γ small enough, in the domain \(|u|<\gamma\sqrt{n}\) we have

$$\bigl|h_n(u) \bigr|\leq\frac{u^2}{6}, $$

so the right-hand side of (8.7.7) does not exceed −u 2/3. Now we can rewrite I 1 in the form

$$ I_1=\frac{1}{\sqrt{n}}\int _{|u|<\gamma\sqrt{n}}\exp \biggl\{ -iuv-\frac{u^2}{2}+h_n(u) \biggr\} \widehat{\varphi} \biggl(\frac{u}{\sqrt{n}} \biggr)\,du, $$
(8.7.8)

where \(\vert \widehat{\varphi} ({u}/{\sqrt{n}} )\vert \leq1\) and \(\widehat{\varphi} ({u}/{\sqrt{n}}\, )\to1\) for any fixed u as n→∞. Therefore, by virtue of the dominated convergence theorem,

$$ \sqrt{n}\,I_1\to\int\exp \biggl\{-iuv- \frac{u^2}{2} \biggr\}\,du $$
(8.7.9)

uniformly in v, since the integral on the right-hand side of (8.7.8) is uniformly continuous in v. But the integral on the right-hand side of (8.7.9) is simply (up to the factor 1/(2π)) the result of applying the inversion formula to the ch.f. of the normal distribution, so that

$$\begin{aligned} \lim_{n\to\infty}\sqrt{n}\,I_1 =&\sqrt{2\pi} \,e^{-{v^2}/{2}}. \end{aligned}$$
(8.7.10)

It remains to consider the integral

$$I_2:=\int_{|t|\geq \gamma}e^{-itv\sqrt{n}} \varphi^n(t)\widehat{\varphi}(t)\,dt. $$

By virtue of (8.7.2) and non-latticeness of the distribution of ξ,

$$ q:=\sup_{|t|\geq\gamma} \bigl|\varphi(t) \bigr|<1 $$
(8.7.11)

and therefore

$$\begin{aligned} |I_2|\leq q^n\int_{|t|\geq\gamma}\bigl |\widehat{ \varphi}(t) \bigr|\,dt \leq& q^nc(\varDelta ,\delta),\qquad \lim _{n\to\infty}\sqrt{n} I_2=0 \end{aligned}$$
(8.7.12)

uniformly in v, where c(Δ,δ) depends on Δ and δ only. We have established that, for \(x=v\sqrt{n}\), as n→∞, the relations

$$ \displaystyle \begin{array}{rcl} \displaystyle I_1+I_2&=&\sqrt{ \frac{2\pi}{n}}\,e^{-v^2/2}+o \biggl(\frac{1}{\sqrt {n}} \biggr), \\\displaystyle \mathbf{P} \bigl(Z_n\in \varDelta [x) \bigr)&=&\frac{\varDelta }{\sqrt{2\pi n}} e^{-{x^2}/{(2n)}}+o \biggl(\frac{1}{\sqrt{n}} \biggr) \end{array} $$
(8.7.13)

hold uniformly in v (see (8.7.5)). This means that representation (8.7.13) holds uniformly for all x.

Further, by (8.7.3),

$$ \bigl\{Z_n\in[x,x+\varDelta -\delta) \bigr\}\subset \bigl \{S_n\in \varDelta [x) \bigr\}\subset \bigl\{Z_n\in[x- \delta,x+\varDelta ) \bigr\} $$
(8.7.14)

and, so, in particular,

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)\leq\frac{\varDelta +\delta }{\sqrt {2\pi n}} e^{-{(x-\delta)^2}/{(2n)}}+o \biggl(\frac{1}{\sqrt{n}} \biggr)= \frac{\varDelta +\delta}{\sqrt{2\pi n}} e^{-{x^2}/{(2n)}}+o \biggl(\frac{1}{\sqrt{n}} \biggr). \end{aligned}$$

By (8.7.14) an analogous converse inequality also holds. Since δ is arbitrary, this is possible only if

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \frac{\varDelta }{\sqrt{2\pi n}}e^{-{x^2}/{(2n)}}+o \biggl(\frac{1}{\sqrt{n}} \biggr). $$
(8.7.15)

The theorem is proved. □

8.7.2 Local Theorems

If the distribution of S n has a density then we can obtain local theorems on the asymptotics of this density.

Theorem 8.7.2

Let Eξ=0, Eξ 2=σ 2<∞ and suppose there exists an m≥1 such that at least one of the following three conditions is met:

  1. (a)

    the distribution of S m has a bounded density;

  2. (b)

    the distribution of S m has a density from L 2;

  3. (c)

    the ch.f. φ m(t) of the sum S m is integrable.

Then, for nm, the distribution of the sum S n has density \(f_{S_{n}}(x)\) for which the representation

$$ f_{S_n}(x)=\frac{1}{\sqrt{2\pi n}\sigma} \exp \biggl\{-\frac{x^2}{2n\sigma^2} \biggr\}+o \biggl(\frac{1}{\sqrt{n}} \biggr) $$
(8.7.16)

holds uniformly in x as n→∞.

Conditions (a)–(c) are equivalent to each other (possibly with different values of m).

Proof

We first establish the equivalence of (a)–(c). The fact that a bounded density belongs to L 2 was proved in Sect. 7.2.3. Conversely, if fL 2 then

$$\begin{aligned} \bigl|f^{(2)*}(t) \bigr| =&\biggl \vert \int f(u)f(t-u)\,du\biggr \vert \\\leq &\biggl[\int f^2(u)\,du\times\int f^2(t-u)\,du \biggr]^{1/2}=\int f^2(u)\,du<\infty. \end{aligned}$$

Hence the relationship \(f_{S_{m}}\in L_{2}\) implies the boundedness of \(f_{S_{2m}}\), and thus (a) and (b) are equivalent.

If φ m is integrable then by Theorem 7.2.2 the density \(f_{S_{m}}\) exists and is bounded. Conversely, if \(f_{S_{m}}\) is bounded then \(f_{S_{m}}\in L_{2}\), \(\varphi_{S_{m}}\in L_{2}\) and \(\varphi_{S_{2m}}\in L_{1}\) (see Sect. 7.2). This proves the equivalence of (a) and (c).

We will now prove (8.7.16). By the inversion formula (7.2.1),

$$f_{S_n}(x)=\frac{1}{2\pi}\int e^{-itx} \varphi^{n}(t)\,dt. $$

Here the integral on the right-hand side does not “qualitatively” differ from the integral on the right-hand side of (8.7.5); we only have to put \(\widehat{\varphi}(t)\equiv1\) in the part I 1 of the integral (8.7.5) (the integral over the set |t|<γ) and, in the part I 2 (over the set |t|≥γ), to replace the integrable function \(\widehat{\varphi}(t)\) with the integrable function φ m(t) and the function φ n(t) with \(\varphi^{n-m}(t)\). After these changes the whole argument in the proof of relation (8.7.13) remains valid, and therefore the same relation (up to the factor Δ) will hold for

$$f_{S_n}(x)=\frac{1}{\sqrt{2\pi n}\,\sigma}\, \exp \biggl\{-\frac{x^2}{2n\sigma^2} \biggr\}+o \biggl(\frac{1}{\sqrt {n}} \biggr). $$

The theorem is proved. □

Theorem 8.7.2 implies that the density \(f_{\zeta_{n}}\) of the random variable \(\zeta_{n}=\frac{S_{n}}{\sigma\sqrt{n}}\) converges to the density ϕ of the standard normal law:

$$f_{\zeta_n}(v)\to\phi(v) $$

uniformly in v as n→∞.

For instance, the density of the uniform distribution over [−1,1] satisfies the conditions of this theorem, and hence the density of S n at the point \(x=v\sqrt{n}\) (σ 2=1/3) will behave as \(\frac{1}{\sigma\sqrt{2\pi n}}\,e^{-{v^{2}}/{(2\sigma^{2})}}\) (cf. the remark to Example 3.6.1).
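This example is easy to check numerically. The sketch below (ours; numpy and scipy assumed) estimates P(S n ∈Δ[x)) by simulation for uniform summands and compares it with the main term of (8.7.1).

```python
# Simulation sketch of the Stone-Shepp approximation (8.7.1) for xi uniform
# on [-1, 1] (sigma^2 = 1/3): empirical P(S_n in [x, x+Delta)) versus
# Delta/(sigma*sqrt(n)) * phi(x/(sigma*sqrt(n))).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, Delta = 50, 0.5
sigma = (1 / 3) ** 0.5
s = rng.uniform(-1, 1, size=(200_000, n)).sum(axis=1)   # samples of S_n
for x in (0.0, 2.0, 5.0):
    emp = np.mean((s >= x) & (s < x + Delta))
    approx = Delta / (sigma * n**0.5) * norm.pdf(x / (sigma * n**0.5))
    print(f"x={x}  empirical={emp:.5f}  approximation={approx:.5f}")
```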

In the arithmetic case, where the random variable ξ is integer-valued and the greatest common divisor of all possible values of ξ equals 1 (see Sect. 7.1), it is the asymptotics of the probabilities P(S n =x) for integer x that become the subject of interest for local theorems. In this case we cannot assume without loss of generality that E ξ=0.

Theorem 8.7.3

(Gnedenko)

Let Eξ=a, Eξ 2=σ 2<∞ and ξ have an arithmetic distribution. Then, uniformly over all integers x, as n→∞,

$$ \mathbf{P}(S_n=x)= \frac{1}{\sqrt{2\pi n}\sigma} \exp \biggl\{-\frac{(x-an)^2}{2n\sigma^2} \biggr\} +o \biggl( \frac{1}{\sqrt{n}} \biggr). $$
(8.7.17)

Proof

When proving limit theorems for arithmetic ξ, it is more convenient to use the generating functions (see Sects. 7.1, 7.7)

$$p(z)\equiv p_{\xi}(z):=\mathbf{E}\,z^\xi,\quad|z|=1, $$

so that p(e it)=φ(t), where φ is the ch.f. of ξ.

In this case the inversion formulas take the following form (see (7.2.10)): for integer x,

$$\begin{aligned} \mathbf{P}(\xi=x) =&\frac{1}{2\pi i}\int_{|z|=1}z^{-x-1}p(z)\,dz, \\\mathbf{P}(S_n=x) =&\frac{1}{2\pi i}\int_{|z|=1}z^{-x-1}p^n(z)\,dz= \frac{1}{2\pi}\int_{-\pi}^\pi e^{-itx} \varphi^n(t)\,dt. \end{aligned}$$

As in the proof of Theorem 8.7.1, here we split the integral on the right-hand side into two subintegrals: over the domain |t|<γ and over the complementary set. The treatment of the first subintegral

$$I_1:=\int_{|t|<\gamma}e^{-itx} \varphi^n(t)\,dt= \int_{|t|<\gamma}e^{-ity} \bigl[e^{-ita}\varphi(t) \bigr]^n \,dt $$

for y=xan differs from the considerations for I 1 in Theorem 8.7.1 only in that it is simpler and yields (see (8.7.10))

$$I_1=\frac{\sqrt{2\pi}}{\sigma\sqrt{n}}\,\exp \biggl\{-\frac {y^2}{2n\sigma^2} \biggr \}+ o \biggl(\frac{1}{\sqrt{n}} \biggr). $$

Similarly, the treatment of the second subintegral differs from that of I 2 in Theorem 8.7.1 in that it is simpler, since the range of integration here is compact and on it one has

$$ \bigl|\varphi(t) \bigr|\leq q( \gamma)<1. $$
(8.7.18)

Therefore, as in Theorem 8.7.1,

$$\begin{aligned} I_2=o \biggl(\frac{1}{\sqrt{n}} \biggr),\qquad\mathbf{P} (S_n=x)=\frac{1}{\sqrt{2\pi n}\sigma}\exp \biggl\{-\frac{y^2}{2n\sigma^2} \biggr \}+o \biggl(\frac{1}{\sqrt{n}} \biggr). \end{aligned}$$

The theorem is proved. □

Evidently, for the values of y of order \(\sqrt{n}\) Theorem 8.7.3 is a generalisation of the local limit theorem for the Bernoulli scheme (see Corollary 5.2.1).
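In the Bernoulli case both sides of (8.7.17) can be written out explicitly, which gives a quick numerical illustration (ours; here ξ⊂=B p , so that a=p, σ 2=p(1−p), and P(S n =x) is the binomial probability):

```python
# Numerical sketch of Gnedenko's theorem (8.7.17) for Bernoulli(p) summands:
# exact binomial P(S_n = x) versus the local normal approximation.
from math import comb, exp, pi, sqrt

n, p = 200, 0.3
var = p * (1 - p)                             # sigma^2
for x in (50, 60, 70):
    exact = comb(n, x) * p**x * (1 - p) ** (n - x)
    approx = exp(-((x - n * p) ** 2) / (2 * n * var)) / sqrt(2 * pi * n * var)
    print(x, f"{exact:.6f}", f"{approx:.6f}")
```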

8.7.3 The Proof of Theorem 8.7.1 in the General Case

To prove Theorem 8.7.1 in the general case we will use the same approach as in Sect. 8.7.1. We will again employ the smoothing method, but now, when specifying the random variable Z n in (8.7.3), we will take θη instead of η δ , where θ=const and η is a random variable with the ch.f. from Example 7.2.1 (see the end of Sect. 7.2) equal to

$$\varphi_{\eta}(t)=\left \{ \begin{array}{l@{\quad }l} 1-|t|, &|t|\leq1;\\ 0, & |t|>1, \end{array} \right . $$

so that for Z n =S n +θη, similarly to (8.7.5), we have

$$ \mathbf{P} \bigl(Z_n\in \varDelta [x) \bigr)= \frac{\varDelta }{2\pi}\int _{|t|\leq\frac{1}{\theta}} e^{-itx}\varphi^n(t) \varphi_{\eta_{\varDelta }}(t)\varphi_{\theta\eta}(t)\,dt, $$
(8.7.19)

where φ θη (t)=max(0,1−θ|t|). As in Sect. 8.7.1, split the integral on the right-hand side of (8.7.19) into two subintegrals: I 1 over the domain |t|<γ and I 2 over the domain γ≤|t|≤1/θ. The asymptotic behaviour of these integrals is investigated in almost the same way as in Sect. 8.7.1, but is somewhat simpler, since the domain of integration in I 2 is compact, and so, by the non-latticeness of ξ, one has on it the upper bound

$$ q:=\sup_{\gamma\leq|t|\leq1/\theta} \bigl|\varphi(t) \bigr|<1. $$
(8.7.20)

Therefore, to bound I 2 we no longer need condition (8.7.2).

Thus we have established, as above, relation (8.7.13).

To derive from this fact the required relation (8.7.15) we will need the following.

Lemma 8.7.1

Let f(y) be a bounded uniformly continuous function, η an arbitrary proper random variable independent of S n and b(n)→∞ as n→∞. If, for any fixed Δ>0 and θ>0, as n→∞, we have

$$ \mathbf{P} \bigl(S_n+\theta\eta\in \varDelta [x) \bigr)= \frac{\varDelta }{b(n)} \biggl[f \biggl(\frac{x}{b(n)} \biggr)+o(1) \biggr], $$
(8.7.21)

then

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \frac{\varDelta }{b(n)} \biggl[f \biggl(\frac{x}{b(n)} \biggr)+o(1) \biggr]. $$
(8.7.22)

In this assertion we can take S n to be any sequence of random variables satisfying (8.7.21). In this section we will set b(n) to be equal to \(\sqrt{n}\), but later (see the proof of Theorem A7.2.1 in Appendix 7) we will need some other sequences as well.

Proof

Put θ:=δ 2 Δ, where δ>0 will be chosen later, Δ ±:=(1±2δ)Δ, Δ ±[x):=[x,x+Δ ±) and f 0:=maxf(y). We first obtain an upper bound for P(S n Δ[x)). We have

$$\mathbf{P} \bigl(Z_n\in \varDelta _+[x-\varDelta \delta) \bigr)\geq\mathbf{P} \bigl(Z_n\in \varDelta _+[x-\varDelta \delta);|\eta|<1/\delta \bigr). $$

On the event |η|<1/δ one has −δΔ<θη<δΔ, and hence on this event

$$\bigl\{Z_n\in \varDelta _+[x-\varDelta \delta) \bigr\}\supset \bigl \{S_n\in \varDelta [x) \bigr\}. $$

Thus, by independence of η and S n ,

$$\mathbf{P} \bigl(Z_n\in \varDelta _+[x-\varDelta \delta) \bigr)\geq\mathbf{P} \bigl(S_n\in \varDelta [x);\,|\eta|<1/\delta \bigr)= \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr) \bigl(1-h(\delta) \bigr), $$

where h(δ):=P(|η|≥1/δ)→0 as δ→0. By condition (8.7.21) and the uniform continuity of f we obtain

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr) \leq& \mathbf{P} \bigl(Z_n\in \varDelta _+[x-\varDelta \delta) \bigr) \bigl(1-h(\delta ) \bigr)^{-1} \\\leq &\biggl[\frac{\varDelta }{b(n)}\,f \biggl(\frac{x}{b(n)} \biggr)+ \frac {2\delta \varDelta f_0}{b(n)}+o \biggl(\frac{1}{b(n)} \biggr) \biggr] \bigl(1-h(\delta ) \bigr)^{-1}. \\ \end{aligned}$$
(8.7.23)

If, for a given ε>0, we choose δ>0 such that

$$\bigl(1-h(\delta) \bigr)^{-1}\leq1+\frac{\varepsilon \varDelta }{3}, \qquad 2\delta f_0\leq\frac{\varepsilon}{3}, $$

then we derive from (8.7.23) that, for all n large enough and ε small enough,

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)\leq \frac{\varDelta }{b(n)} \biggl(f \biggl(\frac{x}{b(n)} \biggr)+ \varepsilon \biggr). $$
(8.7.24)

This implies, in particular, that for all x,

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)\leq \frac{\varDelta }{b(n)} (f_0+\varepsilon). $$
(8.7.25)

Now we will obtain a lower bound for P(S n Δ[x)). For the event

$$A:= \bigl\{Z_n\in \varDelta _-[x+\varDelta \delta) \bigr\} $$

we have

$$ \mathbf{P}(A)=\mathbf{P} \bigl(A;|\eta|<1/\delta \bigr)+\mathbf {P} \bigl(A;\,| \eta|\geq 1/\delta \bigr). $$
(8.7.26)

On the event |η|<1/δ we have

$$\bigl\{Z_n\in \varDelta _-[x+\varDelta \delta) \bigr\}\subset \bigl \{S_n\in \varDelta [x) \bigr\}, $$

and hence

$$ \mathbf{P} \bigl(A;|\eta|<1/\delta \bigr)\leq\mathbf{P} \bigl(S_n \in \varDelta [x)\bigr ). $$
(8.7.27)

Further, by independence of η and S n and inequality (8.7.25),

$$\begin{aligned} \mathbf{P} \bigl(A;| \eta|\geq1/\delta \bigr) =&\mathbf{E}\, \bigl[\mathbf{P} (A\mid \eta);|\eta|\geq 1/\delta \bigr] \\=&\mathbf{E}\bigl[\mathbf{P} \bigl(S_n\in \varDelta _-[x+\theta\eta+\varDelta \delta)\mid \eta \bigr); |\eta|\geq 1/\delta \bigr] \\\leq&\frac{\varDelta }{b(n)}(f_0+\varepsilon)h(\delta). \end{aligned}$$

Therefore, combining (8.7.26), (8.7.27) and (8.7.21), we get

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)\geq\frac{\varDelta }{b(n)}f \biggl(\frac {x}{b(n)} \biggr)- \frac{2\delta \varDelta f_0}{b(n)}+o \biggl( \frac{1}{b(n)} \biggr)- \frac{\varDelta }{b(n)}(f_0+\varepsilon)h( \delta). \end{aligned}$$

In addition, choosing δ such that

$$f_0h(\delta)<\frac{\varepsilon}{3},\qquad2\delta f_0< \frac{\varepsilon}{3}, $$

we obtain that, for all n large enough and ε small enough,

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)\geq \frac{\varDelta }{b(n)} \biggl(f \biggl(\frac{x}{b(n)} \biggr)-\varepsilon \biggr). $$
(8.7.28)

Since ε is arbitrarily small, inequalities (8.7.24) and (8.7.28) prove the required relation (8.7.22). The lemma is proved. □

To prove the theorem it remains to apply Lemma 8.7.1 in the case (see (8.7.13)) where f=ϕ and \(b(n)=\sqrt{n}\). Theorem 8.7.1 is proved.  □

8.7.4 Uniform Versions of Theorems 8.7.1–8.7.3 for Random Variables Depending on a Parameter

In the next chapter, we will need uniform versions of Theorems 8.7.1–8.7.3, where the summands ξ k depend on a parameter λ. Denote such summands by ξ (λ)k , the corresponding distributions by F (λ), and put

$$S_{(\lambda)n}:=\sum_{k=1}^n \xi_{(\lambda)k}, $$

where ξ (λ)k are independent copies of ξ (λ)⊂=F (λ). If λ is determined only by the number of summands n then we will be dealing with the triangular array scheme considered in Sects. 8.3–8.6 (the summands there were denoted by ξ k,n ). In the general case we will take the segment [0,λ 1] for some λ 1>0 as the parametric set, keeping in mind that λ∈[0,λ 1] may depend on n (in the triangular array scheme one can put λ=1/n).

We will be interested in what conditions must be imposed on a family of distributions F (λ) for the assertions of Theorems 8.7.1–8.7.3 to hold uniformly in λ∈[0,λ 1]. We introduce the following notation:

$$a(\lambda)=\mathbf{E}\xi_{(\lambda)},\qquad\sigma^2(\lambda )= \operatorname{Var}(\xi _{(\lambda)}), \qquad\varphi_{(\lambda)}(t)= \mathbf{E}e^{it\xi_{(\lambda)}}. $$

The next assertion is an analogue of Theorem 8.7.1.

Theorem 8.7.1A

Let the distributions F (λ) satisfy the following properties: 0<σ 1<σ(λ)<σ 2<∞, where σ 1 and σ 2 do not depend on λ:

  1. (a)

    the relation

    $$ \varphi_{(\lambda)}(t)-1-ia(\lambda)t+\frac{t^2m_2(\lambda)}{2} =o \bigl(t^2\bigr),\quad m_2(\lambda):=\mathbf{E}\, \xi_{(\lambda)}^2, $$
    (8.7.29)

    holds uniformly in λ∈[0,λ 1] as t→0, i.e. there exist a t 0>0 and a function ε(t)→0 as t→0, independent of λ, such that, for all |t|≤t 0, the absolute value of the left-hand side of (8.7.29) does not exceed ε(t)t 2;

  2. (b)

    for any fixed 0<θ 1<θ 2<∞,

    $$ q_{(\lambda)}:=\sup_{\theta_1\leq|t|\leq\theta_2} \bigl|\varphi _{(\lambda)}(t) \bigr|\leq q<1, $$
    (8.7.30)

    where q does not depend on λ.

Then, for each fixed Δ>0,

$$ \mathbf{P} \bigl(S_{(\lambda)n}-na(\lambda)\in \varDelta [x) \bigr)= \frac{\varDelta }{\sigma(\lambda)\sqrt{n}}\,\phi \biggl(\frac {x}{\sigma(\lambda)\sqrt{n}} \biggr)+ o \biggl( \frac{1}{\sqrt{n}} \biggr), $$
(8.7.31)

where the remainder term \(o ({1}/{\sqrt{n}} )\) is uniform in x and λ∈[0,λ 1].

Proof

Going through the proof of Theorem 8.7.1 in its general form (see Sect. 8.7.3), we see that, to ensure the validity of all the proofs of the intermediate assertions in their uniform forms, it suffices to have uniformity in the following two places:

(a) the uniformity in λ of the estimate o(t 2) as t→0 in relation (8.7.6) for the expansion of the ch.f. of the random variable \(\xi=\frac{\xi_{(\lambda)}-a(\lambda)}{\sigma(\lambda)}\);

(b) the uniformity in relation (8.7.20) for the same ch.f.

We verify the uniformity in (8.7.6). For φ(t)=Ee itξ, we have by (8.7.29)

$$\begin{aligned} \ln\varphi(t) =& -\frac{ita(\lambda)}{\sigma(\lambda)}+\ln\varphi _{(\lambda)} \biggl( \frac{t}{\sigma(\lambda)} \biggr)\\=& -\frac{t^2(m_2(\lambda)-a^2(\lambda))}{2\sigma^2(\lambda)}+o\bigl(t^2 \bigr){=} -\frac{t^2}{2}+o\bigl(t^2\bigr), \end{aligned}$$

where the remainder term is uniform in λ.

The uniformity in relation (8.7.20) clearly follows from condition (b), since σ(λ) is uniformly separated from both 0 and ∞. The theorem is proved. □

Remark 8.7.3

Conditions (a) and (b) of Theorem 8.7.1A are essential for (8.7.31) to hold. To see this, consider random variables ξ and η with fixed distributions, Eξ=E η=0 and E ξ 2=E η 2=1. Let λ∈[0,1] and the random variable ξ (λ) be defined by

$$ \xi_{(\lambda)}:=\left \{ \begin{array}{l@{\quad }l} \xi &\mbox{with probability}\ 1-\lambda,\\ \frac{\eta}{\sqrt{\lambda}} &\mbox{with probability}\ \lambda, \end{array} \right . $$
(8.7.32)

so that Eξ (λ)=0 and \(\operatorname{Var}(\xi _{(\lambda )})=2-\lambda\) (in the case of the triangular array scheme one can put λ=1/n). Then, under the obvious notational conventions, for λ=t 2, t→0, we have

$$\begin{aligned} \varphi_{(\lambda)}(t)=(1-\lambda)\varphi_{\xi}(t)+ \lambda \varphi_{\eta} \biggl(\frac{t}{\sqrt{\lambda}} \biggr)= 1- \frac{3t^2}{2}+o\bigl(t^2\bigr)+ t^2 \varphi_{\eta}(1). \end{aligned}$$

This implies that (8.7.29) does not hold and hence condition (a) is not met for the values of λ in the vicinity of zero. At the same time, the uniform versions of relation (8.7.31) and the central limit theorem will fail to hold. Indeed, putting λ=1/n, we obtain the triangular array scheme, in which the number ν n of the summands of the form \({\eta_{i}}/{\sqrt{\lambda}}\) in the sum \(S_{(\lambda)n}=\sum_{i=1}^{n}\xi_{(\lambda)i}\) converges in distribution to ν⊂=Π 1 and

$$\frac{1}{\sqrt{n(2-\lambda)}}\,S_{(\lambda)n}\stackrel{d}{=} \frac{S_{n-\nu_n}}{\sqrt{2n-1}}+ \frac{H_{\nu_n}}{\sqrt {2-1/n}},\quad\mbox{where } H_k=\sum _{i=1}^k\eta_i. $$

The first term on the right-hand side converges in distribution to ζ⊂=Φ 0,1/2, while the second term converges to \({H_{\nu}}/{\sqrt{2}}\). Clearly, the sum of these independent summands is, generally speaking, not distributed normally with parameters (0,1).

To see that condition (b) is also essential, consider an arithmetic random variable ξ with Eξ=0 and \(\operatorname {Var}(\xi)=1\), take η to be a random variable with the uniform distribution U −1,1, and put

$$\xi_{(\lambda)}:=\left \{ \begin{array}{l@{\quad }l} \xi &\mbox{with probability}\ 1-\lambda,\\ \eta &\mbox{with probability} \ \lambda. \end{array} \right . $$

Here the random variable ξ (λ) is non-lattice (its distribution has an absolutely continuous component), but

$$\varphi_{(\lambda)}(2\pi)=(1-\lambda)+\lambda\varphi_{\eta}(2\pi ), \qquad q_{(\lambda)}\geq1-2\lambda. $$

Again putting λ=1/n, we get the triangular array scheme for which condition (b) is not met. Relation (8.7.31) does not hold either, since, in the previous notation, the sum S (λ)n is integer-valued with probability P(ν n =0)=(1−1/n) n →e −1, so that its distribution will have atoms at integer points with probabilities comparable, by Theorem 8.7.3, with the right-hand side of (8.7.31). This clearly contradicts (8.7.31).

If we put λ=1/n 2 then the sum S (λ)n will be integer-valued with probability (1−1/n 2)n→1, and the failure of relation (8.7.31) becomes even more evident.

Uniform versions of the local Theorems 8.7.2 and 8.7.3 are established in a completely analogous way.

Theorem 8.7.2A

Let the distributions F (λ) satisfy the conditions of Theorem 8.7.1A with θ 2=∞ and the conditions of Theorem 8.7.2, in which conditions (a)–(c) are understood in the uniform sense (i.e., \(\max_{x} f_{S_{(\lambda)m}}(x)\) or the norm of \(f_{S_{(\lambda)m}}\) in L 2 or \(\int |\varphi_{(\lambda)}^{m}(t) |\,dt\) are bounded uniformly in λ∈[0,λ 1]).

Then representation (8.7.16) holds for \(f_{S_{(\lambda)n}}(x)\) uniformly in x and λ, provided that on its right-hand side we replace σ by σ(λ).

Proof

The conditions of Theorem 8.7.2A are such that they enable one to obtain the proof of the uniform version without any noticeable changes in the arguments proving Theorems 8.7.1A and 8.7.2. □

The following assertion is established in the same way.

Theorem 8.7.3A

Let the arithmetic distributions  F (λ) satisfy the conditions of Theorem 8.7.1A for θ 2=π. Then representation (8.7.17) holds uniformly in x and λ, provided that a and σ on its right-hand side are replaced with a(λ) and σ(λ), respectively.

Remark 8.7.3 applies to Theorems 8.7.2A and 8.7.3A as well.

8.8 Convergence to Other Limiting Laws

As we saw in previous sections, the normal law occupies a special place among all distributions—it is the limiting law for normed sums of arbitrarily distributed random variables. There arises the natural question of whether there exist any other limiting laws for sums of independent random variables.

It is clear from the proof of Theorem 8.2.1 for identically distributed random variables that the character of the limiting law is determined by the behaviour of the ch.f. of the summands in the vicinity of 0. If E ξ=0 and E ξ 2=σ 2=−φ″(0) exist, then

$$\varphi \biggl( \frac{t}{\sqrt{n}} \biggr) =1+\frac{\varphi'' (0) t^2 }{2n} + o \biggl( \frac{1}{n} \biggr), $$

and this determines the asymptotic behaviour of the ch.f. of \({S_{n}} /\sqrt{n}\), equal to \(\varphi^{n} (t/\sqrt{n} )\), which leads to the normal limiting law. Therefore, if one is looking for different limiting laws for the sums S n =ξ 1+⋯+ξ n , it is necessary to renounce the condition that the variance is finite or, which is the same, that φ″(0) exists. In this case, however, we will have to impose some conditions on the regular variation of the functions F +(x)=P(ξ≥x) and/or F −(x)=P(ξ<−x) as x→∞, which we will call the right and the left tail of the distribution of ξ, respectively. We will need the following concepts.

Definition 8.8.1

A positive (Lebesgue) measurable function L(t) is called a slowly varying function (s.v.f.) as t→∞, if, for any fixed v>0,

$$ \frac{L(vt)}{L(t)}\to1 \quad\mbox{as} \ t\to\infty. $$
(8.8.1)

A function V(t) is called a regularly varying function (r.v.f.) (of index −β) as t→∞ if it can be represented as

$$ V(t)=t^{-\beta}L(t), $$
(8.8.2)

where L(t) is an s.v.f. as t→∞.

One can easily see that, similarly to (8.8.1), the characteristic property of regularly varying functions is the convergence

$$ \frac{V(vt)}{V(t)}\to v^{-\beta}\quad\mbox{as}\ t\to\infty $$
(8.8.3)

for any fixed v>0. Thus an s.v.f. is an r.v.f. of index zero.

Among typical representatives of the class of s.v.f.s are the logarithmic function and its powers lnγ t, \(\gamma\in\mathbb{R}\), linear combinations thereof, multiple logarithms, functions with the property that L(t)→L=const≠0 as t→∞, etc. As an example of a bounded oscillating s.v.f. we mention

$$L_0(t) = 2 + \sin(\ln\ln t),\quad t>1. $$
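That L 0 is indeed slowly varying is seen from a two-line check: for any fixed v>0,

$$\ln\ln(vt)-\ln\ln t=\ln \biggl(1+\frac{\ln v}{\ln t} \biggr)\to0 \quad\mbox{as}\ t\to\infty, $$

so that \(\sin(\ln\ln(vt))-\sin(\ln\ln t)\to0\) and L 0(vt)/L 0(t)→1, although L 0(t) itself has no limit as t→∞.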

The main properties of r.v.f.s are given in Appendix 6.

As has already been noted, for S n /b(n) to converge to a “nondegenerate” limiting law under a suitable normalisation b(n), we will have to impose conditions on the regular variation of the distribution tails of ξ. More precisely, we will need a regular variation of the “two-sided tail”

$$F_0(t)=F_-(t)+F_+(t)=\mathbf{P} \bigl(\xi\notin[-t,t) \bigr). $$

We will assume that the following condition is satisfied for some β∈(0,2], ρ∈[−1,1]:

[R β,ρ ] The two-sided tail F 0(x)=F (x)+F +(x) is an r.v.f. as x→∞, i.e. it can be represented as

$$ F_0(x) =x^{-\beta} L_{F_0}(x), \quad\beta\in(0,2], $$
(8.8.4)

where \(L_{F_{0}}(x)\) is an s.v.f., and the following limit exists

$$ \rho_+ :=\lim_{x\to\infty} \frac{F_+ (x)}{F_0(x)}\in [0,1],\qquad \rho:=2\rho_+-1. $$
(8.8.5)

If ρ +>0, then clearly the right tail F +(x) is an r.v.f. like F 0(x), i.e. it can be represented as

$$F_+ (x) =V(x):= x^{-\beta} L (x), \quad\beta\in(0,2], \ L(x)\sim \rho_+ L_{F_0} (x). $$

(Here, and likewise in Appendix 6, we use the symbol V to denote an r.v.f.) If ρ +=0, then the right tail F +(x)=o(F 0(x)) is not assumed to be regularly varying.

Relation (8.8.5) implies that the following limit also exists

$$\rho_-:=\lim_{x\to\infty} \frac{F_- (x)}{F_0(x)}=1 - \rho_+. $$

If ρ >0, then, similarly to the case of the right tail, the left tail F (x) can be represented as

$$F_- (x) = W(x):= x^{-\beta} L_W (x), \quad\beta\in(0,2], \ L_W (x)\sim\rho_- L_{F_0} (x). $$

If ρ =0, then the left tail F (x)=o(F 0(x)) is not assumed to be regularly varying.

The parameters ρ ± are related to the parameter ρ in the notation [R β,ρ ] through the equalities

$$\rho= \rho_+ - \rho_- = 2\rho_+ - 1\in[-1,1]. $$

Clearly, in the case β<2 we have E ξ 2=∞, so that the representation

$$\varphi(t)=1-\frac{t^2\sigma^2}{2}+o\bigl(t^2\bigr)\quad\mbox{as}\ t\to0 $$

no longer holds, and the central limit theorem is not applicable. If E ξ exists and is finite then everywhere in what follows it will be assumed without loss of generality that

$$\mathbf{E}\xi=0. $$

Since F 0(x) is non-increasing, there always exists the “generalised” inverse function \(F_{0}^{(-1)}(u)\) understood as

$$F_0^{(-1)}(u):=\inf\bigl\{x:\, F_0(x)<u\bigr\}. $$

If the function F 0 is strictly monotone and continuous then \(b=F_{0}^{(-1)}(u)\) is the unique solution to the equation

$$F_0(b)=u, \quad u\in(0,1). $$

Set

$$\zeta_n :=\frac{S_n}{b(n)}, $$

where, in the case β<2, we define the normalising factor b(n) by

$$ b(n):=F_0^{(-1)} (1/n). $$
(8.8.6)

For β=2 put

$$ b(n):=Y^{(-1)}(1/n), $$
(8.8.7)

where

$$\begin{aligned} Y(x) :=& 2x^{-2} \int_0^x yF_0(y)\, dy = 2x^{-2} \biggl[\int_0^x yF_+(y)\, dy + \int_0^x yF_-(y)\, dy \biggr] \\= & x^{-2} \mathbf{E}\bigl(\xi^2; \,-x \le \xi<x\bigr)=x^{-2}L_Y(x), \end{aligned}$$
(8.8.8)

L Y is an s.v.f. (see Theorem A6.2.1(iv) in Appendix 6). It follows from Theorem A6.2.1(v) in Appendix 6 that, under condition (8.8.4), we have

$$b(n)=n^{1/\beta}L_b(n), \quad\beta\le2, $$

where L b is an s.v.f.
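For a pure power tail the normalising factor can be computed explicitly: if, say, F 0(x)=cx −β for all x large enough and some c>0, then solving F 0(b)=1/n gives

$$b(n)=F_0^{(-1)}(1/n)=(cn)^{1/\beta}, $$

so that in this case L b (n)≡c 1/β is just a constant.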

We introduce the functions

$$V_I(x) = \int_0^x V(y)\, dy, \qquad V^I(x) = \int_x^\infty V(y)\, dy. $$

8.8.1 The Integral Theorem

Theorem 8.8.1

Let condition [R β,ρ ] be satisfied. Then the following assertions hold true.

  1. (i)

    For β∈(0,2), β≠1 and the normalising factor (8.8.6), as n→∞,

    $$ \zeta_n\Rightarrow\zeta^{(\beta,\rho)}. $$
    (8.8.9)

    The distribution F β,ρ of the random variable ζ (β,ρ) depends on parameters β and ρ only and has a ch.f. φ (β,ρ)(t), given by

    $$ \varphi^{(\beta,\rho)}(t):=\mathbf{E}e^{it \zeta^{(\beta,\rho)}} = \exp\bigl \{|t|^\beta B(\beta,\rho,\vartheta) \bigr\}, $$
    (8.8.10)

    where \(\vartheta={\rm sign}\, t\),

    $$ B(\beta,\rho,\vartheta) = \varGamma(1-\beta) \biggl[ i\rho\vartheta\sin \frac{\beta\pi}{2} -\cos\frac{\beta\pi}{2} \biggr] $$
    (8.8.11)

    and, for β∈(1,2), we put Γ(1−β)=Γ(2−β)/(1−β).

  2. (ii)

    When β=1, for the sequence ζ n with the normalising factor (8.8.6) to converge to a limiting law, the former, generally speaking, needs to be centred. More precisely, as n→∞, the following convergence takes place:

    $$ \zeta_n-A_n\Rightarrow\zeta^{(1,\rho)}, $$
    (8.8.12)

    where

    $$ A_n = \frac{n}{b(n)} \bigl[V_I \bigl(b(n)\bigr) - W_I \bigl(b(n)\bigr) \bigr] - \rho\,C, $$
    (8.8.13)

    C≈0.5772 is the Euler constant, and

    $$ \varphi^{(1,\rho)}(t) = \mathbf{E}e^{it \zeta^{(1,\rho)}} =\exp \biggl\{- \frac{\pi|t|}{2} - i\rho t\ln|t| \biggr\}. $$
    (8.8.14)

    If n[V I (b(n))−W I (b(n))]=o(b(n)), then ρ=0 and we can put A n =0.

    If E ξ exists and equals zero then

    $$A_n = \frac{n}{b(n)} \bigl[ W^I \bigl(b(n)\bigr) - V^I \bigl(b(n)\bigr) \bigr] - \rho\, C. $$

    If E ξ=0 and ρ≠0 then ρA n →−∞ as n→∞.

  3. (iii)

    For β=2 and the normalising factor (8.8.7), as n→∞,

    $$\zeta_n \Rightarrow\zeta^{(2,\rho)}, \qquad\varphi^{(2,\rho)}(t ):=\mathbf{E}e^{it \zeta^{(2,\rho )}}=e^{- {t^2}/{2}}, $$

    so that ζ (2,ρ) has the standard normal distribution that is independent of ρ.

The Proof of Theorem 8.8.1

is based on the same considerations as the proof of Theorem 8.2.1, i.e. on using the asymptotic behaviour of the ch.f. φ(t) in the vicinity of zero. But here it will be somewhat more difficult from the technical viewpoint. This is why the proof of Theorem 8.8.1 appears in Appendix 7. □

Remark 8.8.1

The last assertion of the theorem (for β=2) shows that the limiting distribution may be normal even in the case of infinite variance of ξ.

Besides the normal distribution, we also note the “extreme” limiting distributions, corresponding to ρ=±1, for which the ch.f. φ (β,ρ) (or the respective Laplace transform) takes a very simple form. Let, for example, ρ=−1. Since \(e^{i\pi\vartheta/2}=\vartheta i\), we have, for β≠1,2,

$$\begin{aligned} B(\beta,-1,\vartheta) =&-\varGamma(1-\beta) \biggl[i\sin\frac{\beta \pi\vartheta}{2}+ \cos\frac{\beta\pi\vartheta}{2} \biggr] \\=&-\varGamma(1-\beta)e^{{i\beta\pi\vartheta}/{2}}= -\varGamma(1-\beta) (i \vartheta)^\beta, \\\varphi^{(\beta,-1)}(t) =&\exp \bigl\{-\varGamma(1-\beta) (it)^\beta \bigr\}, \\\mathbf{E}\,e^{\lambda\zeta^{(\beta,-1)}} =&\exp \bigl\{-\varGamma (1-\beta ) \lambda^\beta \bigr\}, \quad\operatorname{Re}\lambda\geq0. \end{aligned}$$

Similarly, for β=1, by (8.8.14) and the equalities \(-\frac{\pi\vartheta}{2}=i\,\frac{i\pi\vartheta}{2}=i\ln i\vartheta\) we have

$$\begin{aligned} \ln\varphi^{(1,-1)}(t) =&-\frac{\pi\vartheta t}{2}+it\ln|t|= it\ln i \vartheta+it\ln|t|=it\ln it, \\\mathbf{E}\,e^{\lambda\zeta^{(1,-1)}} =&\exp\{\lambda\ln\lambda\} ,\quad \operatorname{Re} \lambda\geq0. \end{aligned}$$

A similar formula is valid for ρ=1.

Remark 8.8.2

If β<2, then by virtue of the properties of s.v.f.s (see Theorem A6.2.1(iv) in Appendix 6), as x→∞,

$$\int_0^x yF_0(y)\, dy = \int _0^x y^{1-\beta} L_{F_0}(y)\, dy\sim \frac{1}{2-\beta}\,x^{2-\beta}L_{F_0}(x)= \frac{1}{2-\beta}\,x^2 F_0(x). $$

Therefore, for β<2, we have Y(x)∼2(2−β)−1 F 0(x),

$$Y^{(-1)} (1/n) \sim F_0^{(-1)} \biggl( \frac{2-\beta}{2n} \biggr) \sim \biggl(\frac{2}{2-\beta} \biggr)^{1/\beta} F_0^{(-1)}(1/n) $$

(cf. (8.8.6)). On the other hand, for β=2 and σ 2:=E ξ 2<∞ one has

$$Y(x)\sim x^{-2}\sigma^2,\qquad b(n)=Y^{(-1)}(1/n) \sim\sigma\sqrt{n}. $$

Thus normalisation (8.8.7) is “transitional” from normalisation (8.8.6) (up to the constant factor (2/(2−β))1/β) to the standard normalisation \(\sigma\sqrt{n}\) in the central limit theorem in the case where E ξ 2<∞. This also means that normalisation (8.8.7) is “universal” and can be used for all β≤2 (as it is done in many textbooks on probability theory). However, as we will see below, in the case β<2 normalisation (8.8.6) is easier and simpler to deal with, and therefore we will use that scaling.

Recall that F β,ρ denotes the distribution of the random variable ζ (β,ρ). The parameter β takes values in the interval (0,2], the parameter ρ=ρ +ρ can assume any value from [−1,1]. The role of the parameters β and ρ will be clarified below.

Theorem 8.8.1 implies that each of the laws F β,ρ , 0<β≤2 and −1≤ρ≤1 is limiting for the distributions of suitably normalised sums of independent identically distributed random variables. It follows from the law of large numbers that the degenerate distribution I a concentrated at the point a is also a limiting one. Denote the set of all such distributions by \(\mathfrak{S}_{0}\). Furthermore, it is not hard to see that if F is a distribution from the class \(\mathfrak{S}_{0}\) then the law that differs from F by scaling and shifting, i.e. the distribution F {a,b} defined, for some fixed b>0 and a, by the relation

$$\mathbf{F}_{\{a,b\}} (B) := \mathbf{F} \biggl(\frac{B-a}{b} \biggr), \quad \mbox{where} \ \frac{B-a}{b}=\{u\in\mathbb{R}: ub+a\in B \}, $$

is also limiting for the distributions of sums of random variables (S n a n )/b n as n→∞ for appropriate {a n } and {b n }.

It turns out that the class of distributions \(\mathfrak{S}\) obtained by the above extension from \(\mathfrak{S}_{0} \) exhausts all the limiting laws for sums of identically distributed independent random variables.

Another characterisation of the class of limiting laws \(\mathfrak{S}\) is also possible.

Definition 8.8.2

We call a distribution F stable if, for any a 1, a 2, b 1>0, b 2>0, there exist a and b>0 such that

$$\mathbf{F}_{\{a_1,b_1\}} * \mathbf{F}_{\{a_2,b_2\}} = \mathbf{F}_{\{a,b\}}. $$

This definition means that the convolution of a stable distribution F with itself again yields the same distribution F, up to a scaling and shift (or, which is the same, for independent random variables ξ i ⊂=F we have (ξ 1+ξ 2a)/b⊂=F for appropriate a and b).

In terms of the ch.f. φ, the stability property has the following form: for any b 1>0 and b 2>0, there exist a and b>0 such that

$$ \varphi(t b_1)\varphi(t b_2) = e^{it a} \varphi(t b), \quad t \in\mathbb{R}. $$
(8.8.15)
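For example, for the normal ch.f. \(\varphi(t)=e^{-t^2/2}\) relation (8.8.15) holds with a=0 and \(b=\sqrt{b_1^2+b_2^2}\):

$$\varphi(tb_1)\varphi(tb_2)=e^{-t^2(b_1^2+b_2^2)/2}=\varphi(tb). $$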

Denote the class of all stable laws by \(\mathfrak{S}^{S}\). The remarkable fact is that the class of all limiting laws \(\mathfrak{S}\) (for (S n a n )/b n for some a n and b n ) and the class of all stable laws \(\mathfrak{S}^{\,S}\) coincide.

If, under a suitable normalisation, as n→∞,

$$\zeta_n\Rightarrow\zeta^{(\beta,\rho)}, $$

then one says that the distribution F of the summands ξ belongs to the domain of attraction of the stable law F β,ρ .

Theorem 8.8.1 means that, if F satisfies condition [R β,ρ ], then F belongs to the domain of attraction of the stable law F β,ρ .

One can prove the converse assertion (see e.g. Chap. XVII, § 5 in [30]): if F belongs to the domain of attraction of a stable law F β,ρ for β<2, then [R β,ρ ] is satisfied.

As for the role of the parameters β and ρ, note the following. The parameter β characterises the rate of convergence to zero as x→∞ for the functions

$$F_{\beta,\rho, -}(x) :=\mathbf{F}_{\beta,\rho}\bigl((-\infty, -x)\bigr) \quad \mbox{and}\quad F_{\beta,\rho, +}(x) :=\mathbf{F}_{\beta,\rho}\bigl([x,\infty )\bigr). $$

One can prove that, for ρ +>0, as t→∞,

$$ F_{\beta,\rho, +}(t) \sim\rho_+ t^{-\beta}, $$
(8.8.16)

and, for ρ >0, as t→∞,

$$ F_{\beta,\rho, -}(t) \sim\rho_- t^{-\beta}. $$
(8.8.17)

Note that, for ξ⊂=F β,ρ , the asymptotic relations in Theorem 8.8.1 turn into precise equalities provided that we replace in them b(n) with b n :=n 1/β. In particular,

$$ \mathbf{P} \biggl(\frac{S_n}{b_n} \geq t \biggr) = F_{\beta,\rho, +}(t). $$
(8.8.18)

This follows from the fact that [φ (β,ρ)(t/b n )]n coincides with φ (β,ρ)(t) (see (8.8.10)) and hence the distribution of the normalised sum S n /b n coincides with the distribution of the random variable ξ.
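The coincidence of the ch.f.s is a one-line computation: by (8.8.10), for \(b_n=n^{1/\beta}\),

$$\bigl[\varphi^{(\beta,\rho)}(t/b_n) \bigr]^n= \exp \bigl\{n|t/b_n|^{\beta}B(\beta,\rho,\vartheta) \bigr\}= \exp \bigl\{|t|^{\beta}B(\beta,\rho,\vartheta) \bigr\}= \varphi^{(\beta,\rho)}(t), $$

since \(\vartheta=\operatorname{sign}(t/b_n)=\operatorname{sign} t\).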

The parameter ρ taking values in [−1,1] is the measure of asymmetry of the distribution F β,ρ . If, for instance, ρ=1 (ρ =0), then, for β<1, the distribution F β,1 is concentrated entirely on the positive half-line. This is evident from the fact that in this case F β,1 can be considered as the limiting distribution for the normalised sums of independent identically distributed random variables ξ k ≥0 (with F (0)=0). Since all the prelimit distributions are concentrated on the positive half-line, so is the limiting distribution.

Similarly, for ρ=−1 and β<1, the distribution F β,−1 is entirely concentrated on the negative half-line. For ρ=0 (ρ +=ρ =1/2) the ch.f. of the distribution F β,0 will be real, and the distribution F β,0 itself is symmetric.

As we saw above, the ch.f.s φ (β,ρ)(t) of stable laws F β,ρ admit closed-form representations. They are clearly integrable over \(\mathbb{R}\), and the same is true for the functions t k φ (β,ρ)(t) for any k≥1. Therefore all the stable distributions have densities that are differentiable arbitrarily many times (see e.g. the inversion formula (7.2.1)). As for explicit forms of these densities, they are only known for a few laws. Among them are:

1. The normal law F 2,ρ (which does not depend on ρ).

2. The Cauchy distribution F 1,0 with density 2/(π 2+4x 2), −∞<x<∞. Scaling the x-axis with a factor of π/2 transforms this density into the form \(1/(\pi(1+x^{2}))\) corresponding to K 0,1.

3. The Lévy distribution. This law can be obtained from the explicit form for the distribution of the maximum of the Wiener process. This will be the distribution F 1/2,1 with parameters 1/2,1 and density (up to scaling; cf. (8.8.16))

$$f^{(1/2,1)}(x)=\frac{1}{\sqrt{2\pi}x^{3/2}}\,e^{- {1}/{(2x)}}, \quad x>0 $$

(this is the density of the first hitting time of level 1 by the standard Wiener process; see Theorem 19.2.2).
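The law with this density is easy to simulate: it is the distribution of 1/Z 2 for a standard normal Z, a representation consistent with the hitting-time interpretation above, since \(\mathbf{P}(1/Z^{2}\le x)=2(1-\varPhi(x^{-1/2}))\) has exactly this density. The sketch below (ours; numpy assumed) compares histogram values for 1/Z 2 with f (1/2,1) at a few points.

```python
# Simulation sketch: the Levy density f^{(1/2,1)}(x) above is the density of
# 1/Z^2 with Z standard normal; compare empirical histogram values with it.
import numpy as np

rng = np.random.default_rng(2)
x = 1.0 / rng.standard_normal(1_000_000) ** 2
grid = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
dens = np.exp(-1 / (2 * grid)) / (np.sqrt(2 * np.pi) * grid**1.5)
h = 0.05                                     # histogram bin width
emp = np.array([np.mean((x >= g) & (x < g + h)) / h for g in grid])
print(np.round(dens, 4))
print(np.round(emp, 4))
```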

8.8.2 The Integro-Local and Local Theorems

Under the conditions of this section we can also obtain integro-local and local theorems in the same way as in Sect. 8.7 in the case of convergence to the normal law. As in Sect. 8.7, integro-local theorems deal here with the asymptotics of

$$\mathbf{P} \bigl(S_n\in \varDelta [x) \bigr),\qquad \varDelta [x)=[x,x+\varDelta ) $$

as n→∞ for a fixed Δ>0.

As we can see from Theorem 8.8.1, the ch.f. φ (β,ρ)(t) of the stable law F β,ρ is integrable, and hence, by the inversion formula, there exists a uniformly continuous density f (β,ρ) of the distribution F β,ρ . (As has already been noted, it is not difficult to show that f (β,ρ) is differentiable arbitrarily many times, see Sect. 7.2.)

Theorem 8.8.2

(The Stone integro-local theorem)

Let ξ be a non-lattice random variable and the conditions of Theorem 8.8.1 be met. Then, for any fixed Δ>0, as n→∞,

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \frac{\varDelta }{b(n)}f^{(\beta,\rho)} \biggl(\frac{x}{b(n)} \biggr)+o \biggl( \frac{1}{b(n)} \biggr), $$
(8.8.19)

where the remainder term \(o (\frac{1}{b(n)} )\) is uniform over x.

If β=1 and E|ξ| does not exist then, on the right-hand side of (8.8.19), we must replace \(f^{(\beta,\rho)} (\frac{x}{b(n)} )\) with \(f^{(\beta,\rho)} (\frac{x}{b(n)}-A_{n} )\), where A n is defined in (8.8.13).

All the remarks to the integro-local Theorem 8.7.1 hold true here as well, with evident changes.

If the distribution of S n has a density then we can find the asymptotics of that density.

Theorem 8.8.3

Let there exist an m≥1 such that at least one of conditions (a)(c) of Theorem 8.7.2 is satisfied. Moreover, let the conditions of Theorem 8.8.1 be met. Then for the density \(f_{S_{n}}(x)\) of the distribution of S n one has the representation

$$ f_{S_n}(x)=\frac{1}{b(n)}f^{(\beta,\rho)} \biggl( \frac{x}{b(n)} \biggr)+ o \biggl(\frac{1}{b(n)} \biggr) $$
(8.8.20)

which holds uniformly in x as n→∞.

If β=1 and E|ξ| does not exist then, on the right-hand side of (8.8.20), we must replace \(f^{(\beta,\rho)} (\frac{x}{b(n)} )\) with \(f^{(\beta,\rho)} (\frac{x}{b(n)}-A_{n} )\), where A n is defined in (8.8.13).

The assertion of Theorem 8.8.3 can be rewritten for \(\zeta_{n}=\frac{S_{n}}{b(n)}-A_{n}\) as

$$f_{\zeta_n}(v)\to f^{(\beta,\rho)}(v) $$

for any v as n→∞.

For integer-valued ξ k the following theorem holds true.

Theorem 8.8.4

Let the distribution of ξ be arithmetic and the conditions of Theorem 8.8.1 be met. Then, uniformly for all integers x, as n→∞,

$$ \mathbf{P} (S_n=x)=\frac{1}{b(n)}\,f^{(\beta,\rho)} \biggl(\frac {x-an}{b(n)} \biggr)+ o \biggl(\frac{1}{b(n)} \biggr), $$
(8.8.21)

where a=Eξ if E |ξ| exists, and a=0 if E |ξ| does not exist and β≠1. If β=1 and E|ξ| does not exist then, on the right-hand side of (8.8.21), we must replace \(f^{(\beta,\rho)} (\frac{x-an}{b(n)} )\) with \(f^{(\beta,\rho)} (\frac{x}{b(n)}-A_{n} )\).

The proofs of Theorems 8.8.2–8.8.4 mostly repeat those of Theorems 8.7.1–8.7.3 and can be found in Appendix 7.

8.8.3 An Example

In conclusion we will consider an example.

In Sect. 12.8 we will see that in the fair game considered in Example 4.2.3 the ruin time η(z) of a gambler with an initial capital of z units satisfies the relation \(\mathbf{P}(\eta(z) \ge n) \sim z\sqrt{2/{\pi n}}\) as n→∞. In particular, for z=1,

$$ \mathbf{P}\bigl(\eta(1) \ge n\bigr) \sim\sqrt{2/{\pi n}} . $$
(8.8.22)
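This tail asymptotics is easy to check by direct simulation of the walk; the sketch below (ours; numpy assumed) estimates the probability that the walk started at 1 has not yet hit 0 after n steps and compares it with \(\sqrt{2/(\pi n)}\).

```python
# Simulation sketch of (8.8.22): for the fair +-1 walk started at 1, the
# probability of not hitting 0 within n steps is close to sqrt(2/(pi*n)).
import numpy as np

rng = np.random.default_rng(3)
trials, horizon = 20_000, 1600
steps = rng.choice(np.array([-1, 1], dtype=np.int16), size=(trials, horizon))
paths = 1 + np.cumsum(steps, axis=1, dtype=np.int16)   # positions S_1..S_n
for n in (100, 400, 1600):
    surv = ~(paths[:, :n] == 0).any(axis=1)            # eta(1) > n
    print(n, round(surv.mean(), 4), round((2 / (np.pi * n)) ** 0.5, 4))
```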

It is not hard to see (for more detail, see also Chap. 12) that η(z) has the same distribution as η 1+η 2+⋯+η z , where η j are independent and distributed as η(1). Thus for studying the distribution of η(z) when z is large, by virtue of (8.8.22), one can make use of Theorem 8.8.4 (with β=1/2, b(n)=2n 2/π), by which

$$ \lim_{z \to\infty} \mathbf{P} \biggl( \frac{\pi\eta(z)}{2 z^2 } <x \biggr) =F_{1/2,1} (x) $$
(8.8.23)

where F 1/2,1 is the Lévy stable law with parameters β=1/2 and ρ=1. Moreover, for integer x and z→∞,

$$\mathbf{P} \bigl(\eta(z)=x \bigr)= \frac{\pi}{2z^2}\,f^{(1/2,1)} \biggl(\frac{x\pi}{2z^2} \biggr)+o \biggl(\frac{1}{z^2} \biggr). $$

These assertions enable one to obtain the limiting distribution for the number of crossings of an arbitrary strip [u,v] by the trajectory S 1,…,S n in the case where

$$\mathbf{P}(\xi_k = 1)=\mathbf{P}(\xi_k = - 1)=1/2. $$

Indeed, let for simplicity u=0. By the first positive crossing of the strip [0,v] we will mean the Markov time

$$\eta_+ := \min\{ k : S_k = v \}. $$

The first negative crossing of the strip is then defined as the time η ++η , where

$$\eta_-: = \min\{ k :S_{\eta_+ + k} = 0 \}. $$

The time η 1=η ++η will also be the time of the “double crossing” of [0,v]. The variables η ± are distributed as η(v) and are independent, so that η 1 has the same distribution as η(2v). The variable H k =η 1(2v)+⋯+η k (2v), where η i (2v) have the same distribution as η(2v) and are independent, is the time of the k-th double crossing. Therefore

$$\nu(n) := \max\{ k : H_k \leq n \} = \min\{ k : H_k > n \} - 1 $$

is the number of double crossings of the strip [0,v] by time n. Now we can prove the following assertion:

$$ \lim_{n \to\infty} \mathbf{P} \biggl( \frac{\nu(n)}{\sqrt{n}} \geq x \biggr) = F_{1/2, 1} \biggl(\frac{\pi}{8 v^2 x^2} \biggr). $$
(8.8.24)

To prove it, we will make use of the following relation (which will play, in its more general form, an important role in Chap. 10):

$$\bigl\{ \nu(n) \geq k \bigr\} = \{ H_k \leq n \}, $$

where H k is distributed as η(2vk). If n/k 2s 2 as n→∞, then by virtue of (8.8.23)

$$\mathbf{P}(H_k \leq n) =\mathbf{P} \biggl( \frac{\pi H_k}{2(2 v k )^2} \leq \frac{\pi n}{2(2 v k)^2} \biggr) \to F_{1/2,1} \biggl(\frac{\pi s^2}{8v^2} \biggr), $$

and therefore

$$\mathbf{P} \biggl(\frac{\nu(n)}{\sqrt{n}} \geq x \biggr) = \mathbf {P}\bigl(\nu(n) \geq x \sqrt{n}\bigr) = \mathbf{P} (H_{\lfloor x \sqrt{n}\rfloor} \leq n) \to F_{1/2, 1} \biggl( \frac{\pi}{8 v^2 x^2} \biggr). $$

(Here for \(k = \lfloor x \sqrt{n}\rfloor\) one has n/k 2s 2=1/x 2.) Relation (8.8.24) is proved.  □

Assertion (8.8.24) will clearly remain true for the number of crossings of the strip [u,v], u≠0; one just has to replace v with vu on the right-hand side of (8.8.24). It is also clear that (8.8.24) enables one to find the limiting distribution of the number of “simple” (not double) crossings of [u,v] since the latter is equal to 2ν(n) or 2ν(n)+1.