Large Deviation Probabilities for Sums of Independent Random Variables

Borovkov, Alexandr A.

doi:10.1007/978-1-4471-5201-9_9

Alexandr A. Borovkov²

Part of the book series: Universitext ((UTX))

181k Accesses

Abstract

The material presented in this chapter is unique to the present text. After an introductory discussion of the concept and importance of large deviation probabilities, Cramér’s condition is introduced and the main properties of the Cramér and Laplace transforms are discussed in Sect. 9.1. A separate subsection is devoted to an in-depth analysis of the key properties of the large deviation rate function, followed by Sect. 9.2 establishing the fundamental relationship between large deviation probabilities for sums of random variables and those for sums of their Cramér transforms, and discussing the probabilistic meaning of the rate function. Then the logarithmic Large Deviations Principle is established. Section 9.3 presents integro-local, integral and local theorems on the exact asymptotic behaviour of the large deviation probabilities in the so-called Cramér range of deviations. Section 9.4 is devoted to analysing various types of the asymptotic behaviours of the large deviation probabilities for deviations at the boundary of the Cramér range that emerge under different assumptions on the distributions of the random summands. In Sect. 9.5, the behaviour of the large deviation probabilities is found in the case of heavy-tailed distributions, namely, when the distributions tails are regularly varying at infinity. These results are used in Sect. 9.6 to find the asymptotics of the large deviation probabilities beyond the Cramér range of deviations, under special assumptions on the distribution tails of the summands.

Access provided by Autonomous University of Puebla. Download chapter PDF

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Let ξ,ξ ₁,ξ ₂,… be a sequence of independent identically distributed random variables,

$$\mathbf{E}\xi_k=0,\qquad\mathbf{E}\xi_k^2= \sigma^2<\infty ,\qquad S_n=\sum _{k=1}^n\xi_k. $$

Suppose that we have to evaluate the probability P(S _n≥x). If $x\sim v\sqrt{n}$ as n→∞, v=const, then by the integral limit theorem

$$ \mathbf{P}(S_n\geq x)\sim1-\varPhi\biggl( \frac{v}{\sigma}\biggr) $$

(9.0.1)

as n→∞. But if $x\gg\sqrt{n}$, then the integral limit theorem enables one only to conclude that P(S _n≥x)→0 as n→∞, which in fact contains no quantitative information on the probability we are after. Essentially the same can happen for fixed but “relatively” large values of v/σ. For example, for v/σ≥3 and the values of n around 100, the relative accuracy of the approximation in (9.0.1) becomes, generally speaking, bad (the true value of the left-hand side can be several times greater or smaller than that of the right-hand side). Studying the asymptotic behaviour of P(S _n≥x) for $x\gg\sqrt{n}$ as n→∞, which is not known to us yet, could fill these gaps. This problem is highly relevant since questions of just this kind arise in many problems of mathematical statistics, insurance theory, the theory of queueing systems, etc. For instance, in mathematical statistics, finding small probabilities of errors of the first and second kind of statistical tests when the sample size n is large leads to such problems (e.g. see [7]). In these problems, we have to find explicit functions P(n,x) such that

$$ \mathbf{P}(S_n\geq x)=P(n,x) \bigl(1+o(1) \bigr) $$

(9.0.2)

as n→∞. Thus, unlike the case of normal approximation (9.0.1), here we are looking for approximations P(n,x) with a relatively small error rather than an absolutely small error. If P(n,x)→0 in (9.0.2) as n→∞, then we will speak of the probabilities of rare events, or of the probabilities of large deviations of sums S _n. Deviations of the order $\sqrt{n}$ are called normal deviations.

In order to study large deviation probabilities, we will need some notions and assertions.

9.1 Laplace’s and Cramér’s Transforms. The Rate Function

9.1.1 The Cramér Condition. Laplace’s and Cramér’s Transforms

In all the sections of this chapter, except for Sect. 9.5, the following Cramér condition will play an important role.

[C] :: There exists a λ≠0 such that
$$ \mathbf{E} e^{\lambda\xi}=\int e^{\lambda y}\mathbf{F}(dy)< \infty. $$
(9.1.1)

We will say that the right-side (left-side) Cramér condition holds if λ>0 (λ<0) in (9.1.1). If (9.1.1) is valid for some negative and positive λ (i.e. in a neighbourhood of the point λ=0), then we will say that the two-sided Cramér’s condition is satisfied.

The Cramér condition can be interpreted as characterising a fast (at least exponentially fast) rate of decay of the tails F _±(t) of the distribution F. If, for instance, we have (9.1.1) for λ>0, then by Chebyshev’s inequality, for t>0,

$$F_+(t):=\mathbf{P}(\xi\geq t)\leq e^{-\lambda t}\mathbf{E} e^{\lambda\xi}, $$

i.e. F ₊(t) decreases at least exponentially fast. Conversely, if, for some μ>0, one has F ₊(t)≤ce ^−μt, t>0, then, for λ∈(0,μ),

$$\begin{aligned} \int_0^\infty e^{\lambda y}\mathbf{F}(dy) =&- \int_0^\infty e^{\lambda y}\,dF_+(y)=F_+(0)+\lambda \int_0^\infty e^{\lambda y}F_+(y)\,dy \\\leq& F_+(0)+c\lambda\int_0^\infty e^{(\lambda-\mu)y}dy= F_+(0)+\frac{c\lambda}{\mu-\lambda}<\infty. \end{aligned}$$

Since the integral $\int_{-\infty}^{0} e^{\lambda y}\mathbf{F}(dy)$ is finite for any λ>0, we have E e ^λξ<∞ for λ∈(0,μ).

The situation is similar for the left tail F ₋(t):=P(ξ<−t) provided that (9.1.1) holds for some λ<0.

Set

$$\lambda_+:=\sup\bigl\{\lambda:\mathbf{E} e^{\lambda\xi}<\infty\bigr\} , \qquad \lambda_-:=\inf\bigl\{\lambda:\mathbf{E} e^{\lambda\xi}<\infty\bigr \}. $$

Condition [C] is equivalent to λ ₊>λ ₋. The right-side Cramér condition means that λ ₊>0; the two-sided condition means that λ ₊>0>λ ₋. Clearly, the ch.f. φ(t)=E e ^itξ is analytic in the complex plane in the strip $-\lambda_{+}<{\rm Im}\,t<-\lambda_{-}$. This follows from the differentiability of φ(t) in this region of the complex plane, since the integral ∫|ye ^ity|F(dy) for the said values of $\operatorname{Im} t$ converges uniformly in $\operatorname{Re}t$.

Here and henceforth by the Laplace transform (Laplace–Stieltjes or Laplace–Lebesgue) of the distribution F of the random variable ξ we shall mean the function

$$\psi(\lambda):=\mathbf{E} e^{\lambda\xi}=\varphi(-i\lambda), $$

which conflicts with Sect. 7.1.1 (and the terminology of mathematical analysis), according to which the term Laplace’s transform refers to the function E e ^−λξ=φ(iλ). The reason for such a slight inconsistency in terminology (only the sign of the argument differs, this changes almost nothing) is our reluctance to introduce new notation or to complicate the old notation. Nowhere below will it cause confusion.^{Footnote 1}

As well as condition [C], we will also assume that the random variable ξ is nondegenerate, i.e. $\xi\not\equiv\,{\rm const}$ or, which is the same, $\operatorname{Var}\xi>0$.

The main properties of Laplace’s transform.

As was already noted in Sect. 7.1.1, Laplace’s transform, like the ch.f., uniquely characterises the distribution F. Moreover, it has the following properties, which are similar to the corresponding properties of ch.f.s (see Sect. 7.1). Under obvious conventions of notation,

(Ψ1):

ψ _a+bξ(λ)=e ^λa ψ _ξ(bλ), if a and b are constant.

(Ψ2):

If ξ ₁,…,ξ _n are independent and $S_{n}=\sum_{j=1}^{n}\xi_{j}$, then

$$\psi_{S_n}(\lambda)=\prod_{j=1}^n \psi_{\xi_j}(\lambda). $$

(Ψ3):

If E|ξ|^k<∞ and the right-side Cramér condition is satisfied then the function ψ _ξ is k-times right differentiable at the point λ=0,

$$\psi_\xi^{(k)}(0)=\mathbf{E} \xi^k=:m_k $$

and, as λ↓0,

$$\psi_\xi(\lambda)=1+\sum_{j=1}^k \frac{\lambda^j}{j!} m_j+o\bigl(\lambda^k\bigr). $$

This also implies that, as λ↓0, the representation

$$ \ln\psi_\xi(\lambda)=\sum_{j=1}^k \frac{\gamma_j\lambda ^j}{j!}+o\bigl(\lambda^k\bigr), $$

(9.1.2)

holds, where γ _j are the so-called semi-invariants (or cumulants) of order j of the random variable ξ. One can easily verify that

$$ \gamma_1=m_1,\quad\gamma_2=m_2^0= \sigma^2,\quad\gamma_3=m_3^0, \quad\ldots, $$

(9.1.3)

where $m_{k}^{0}=\mathbf{E} (\xi-m_{1})^{k}$ is the central moment of order k.

Definition 9.1.1

Let condition [C] be met. The Cramér transform at the point λ of the distribution F is the distribution^{Footnote 2}

$$ \mathbf{F}_{(\lambda)}(dy)=\frac{e^{\lambda y}\mathbf{F}(dy)}{\psi(\lambda)}. $$

(9.1.4)

Clearly, the distributions F and F _(λ) are mutually absolutely continuous (see Sect. 3.5 of Appendix 3) with density

$$\frac{\mathbf{F}_{(\lambda)}(dy)}{\mathbf{F}(dy)}=\frac{e^{\lambda y}}{\psi(\lambda)}. $$

Denote a random variable with distribution F _(λ) by ξ _(λ).

The Laplace transform of the distribution F _(λ) is obviously equal to

$$ \mathbf{E} e^{\mu\xi_{(\lambda)}}=\frac{\psi(\lambda+\mu )}{\psi (\lambda)}. $$

(9.1.5)

Clearly,

$$\begin{aligned} \mathbf{E} \xi_{(\lambda)} =&\frac{\psi'(\lambda)}{\psi (\lambda )}= \bigl(\ln\psi(\lambda) \bigr)', \qquad\mathbf{E} \xi_{(\lambda)}^2= \frac{\psi''(\lambda)}{\psi (\lambda)}, \\\operatorname{Var}(\xi_{(\lambda)}) =&\frac{\psi''(\lambda)}{\psi (\lambda)}- \biggl( \frac{\psi'(\lambda)}{\psi(\lambda} \biggr)^2 = \bigl(\ln\psi(\lambda) \bigr)''. \end{aligned}$$

Since ψ″(λ)>0 and $\operatorname{Var}(\xi_{(\lambda )})>0$, the foregoing implies one more important property of the Laplace transform.

(Ψ4):

The functions ψ(λ) and lnψ(λ) are strictly convex, and

$$\mathbf{E} \xi_{(\lambda)}=\frac{\psi'(\lambda)}{\psi(\lambda)} $$

strictly increases on (λ ₋,λ ₊).

The analyticity of ψ(λ) in the strip $\operatorname{Re}\lambda\in(\lambda_{-},\lambda_{+})$ can be supplemented by the following “extended” continuity property on the segment [λ ₋,λ ₊] (in the strip ${\rm Re}\,\lambda\in[\lambda_{-},\lambda_{+}]$).

(Ψ5):: The function ψ(λ) is continuous “inside” [λ ₋,λ ₊], i.e. ψ(λ _±∓0)=ψ(λ _±) (where the cases ψ(λ _±)=∞ are not excluded).

Outside the segment [λ ₋,λ ₊] such continuity, generally speaking, does not hold as, for example, is the case when ψ(λ ₊)<∞ and ψ(λ ₊+0)=∞, which takes place, say, for the distribution F with density $f(x)=cx^{-3}e^{-\lambda_{+}x}$ for x≥1, c=const.

9.1.2 The Large Deviation Rate Function

Under condition [C], the large deviation rate function will play the determining role in the description of asymptotics of probabilities P(S _n≥x).

Definition 9.1.2

The large deviation rate function (or, for brevity, simply the rate function) Λ of a random variable ξ is defined by

$$ \varLambda(\alpha):=\sup_{\lambda} \bigl(\alpha \lambda-\ln\psi (\lambda) \bigr). $$

(9.1.6)

The meaning of the name will become clear later. In classical analysis, the right-hand side of (9.1.6) is known as the Legendre transform of the function lnψ(λ).

Consider the function A(α,λ)=αλ−lnψ(λ) of the supremum appearing in (9.1.6). The function −lnψ(λ) is strictly concave (see property (Ψ4)), and hence so is the function A(α,λ) (note also that A(α,λ)=−lnψ _α(λ), where ψ _α(λ)=e ^−λα ψ(λ) is the Laplace transform of the distribution of the random variable ξ−α and, therefore, from the “qualitative point of view”, A(α,λ) possesses all the properties of the function −lnψ(λ)). The foregoing implies that there always exists a unique point λ=λ(α) (on the “extended” real line [−∞,∞]) at which the supremum in (9.1.6) is attained. As α grows, the value of A(α,λ) for λ>0 increases (proportionally to λ), and for λ<0 it decreases. Therefore, the graph of A(α,λ) as the function of λ will, roughly speaking, “roll over” to the right as α grows. This means that the maximum point λ(α) will also move to the right (or stay at the same place if λ(α)=λ ₊).

We now turn to more precise formulations. On the interval [λ ₋,λ ₊], there exists the derivative (respectively, the right and the left derivative at the endpoints λ _±)

$$ A'_\lambda(\alpha,\lambda)= \alpha-\frac{\psi'(\lambda)}{\psi(\lambda)}. $$

(9.1.7)

The parameters

$$ \alpha_\pm=\frac{\psi'(\lambda_\pm\mp0)}{\psi(\lambda_\pm\mp 0)},\qquad \alpha_-<\alpha_+, $$

(9.1.8)

will play an important role in what follows. The value of α ₊ determines the angle at which the curve lnψ(λ) “sticks” into the point (λ ₊,lnψ(λ ₊)). The quantity α ₋ has a similar meaning. If α∈[α ₋,α ₊] then the equation $A'_{\lambda}(\alpha,\lambda){=}0$, or (see (9.1.7))

$$ \frac{\psi'(\lambda)}{\psi(\lambda)}=\alpha, $$

(9.1.9)

always has a unique solution λ(α) on the segment [λ ₋,λ ₊] (λ _± can be infinite). This solution λ(α), being the inverse of an analytical and strictly increasing function $\frac{\psi'(\lambda)}{\psi(\lambda )}$ on (λ ₋,λ ₊) (see (9.1.9)), is also analytical and strictly increasing on (α ₋,α ₊),

$$ \lambda(\alpha)\uparrow\lambda_+\quad\mbox{as}\ \alpha \uparrow \alpha_+;\qquad \lambda(\alpha)\downarrow\lambda_-\quad\mbox{as}\ \alpha \downarrow\alpha_-. $$

(9.1.10)

The equalities

$$ \varLambda(\alpha)=\alpha\lambda(\alpha)-\ln\psi \bigl(\lambda ( \alpha) \bigr),\qquad \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))}=\alpha $$

(9.1.11)

yield

$$\varLambda'(\alpha)=\lambda(\alpha)+\alpha\lambda'( \alpha)- \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))}\lambda '(\alpha) =\lambda(\alpha). $$

Recalling that

$$\frac{\psi'(0)}{\psi(0)}=m_1=\mathbf{ E}\xi,\quad 0\in[\lambda_-, \lambda_+],\ m_1\in[\alpha_-,\alpha_+], $$

we obtain the following representation for the function Λ:

(Λ1):

If α ₀∈[α ₋,α ₊], α∈[α ₋,α ₊] then

$$ \varLambda(\alpha)=\varLambda(\alpha_0)+\int _{\alpha_0}^\alpha\lambda(v)\,dv. $$

(9.1.12)

Since λ(m ₁)=Λ(m ₁)=0 (this follows from (9.1.9) and (9.1.11)), we obtain, in particular, for α ₀=m ₁, that

$$ \varLambda(\alpha)=\int_{m_1}^\alpha \lambda(v)\,dv. $$

(9.1.13)

The functions λ(α) and Λ(α) are analytic on (α ₋,α ₊).

Now consider what happens outside the segment [α ₋,α ₊]. Assume for definiteness that λ ₊>0. We will study the behaviour of the functions λ(α) and Λ(α) near the point α ₊ and for α>α ₊. Similar results hold true in the vicinity of the point α ₋ in the case λ ₋<0.

First let λ ₊=∞, i.e. the function lnψ(λ) is analytic on the whole semiaxis λ>0, and the tail F ₊(t) decays as t→∞ faster than any exponential function. Denote by

$$s_\pm=\pm\sup \bigl\{t:F_\pm(t)>0 \bigr\} $$

the boundaries of the support of F. Without loss of generality, we will assume that

$$ s_+>0,\quad s_-<0. $$

(9.1.14)

This can always be achieved by shifting the random variable, similarly to our assuming, without loss of generality, E ξ=0 in many theorems of Chap. 8, where we used the fact that the problem of studying the distribution of S _n is “invariant” with respect to a shift. (We can also note that Λ _ξ−a(α−a)=Λ _ξ(α), see property (Λ4) below, and that (9.1.14) always holds provided that E ξ=0.)

(Λ2):: (i) If λ ₊=∞ then α ₊=s ₊.

Hence, for s ₊=∞, we always have α ₊=∞ and so for any α≥α ₋ we are dealing with the already considered “regular” case, where (9.1.12) and (9.1.13) hold true.

(ii) If s ₊<∞ then λ ₊=∞, α ₊=s ₊,

$$\varLambda(\alpha_+)=-\ln\mathbf{P}(\xi=s_+),\qquad \varLambda(\alpha)=\infty \quad\mbox{for}\ \alpha>\alpha_+. $$

Similar assertions hold true for s ₋, α ₋, λ ₋.

Proof

(i) First let s ₊<∞. Then the asymptotics of ψ(λ) and ψ′(λ) as λ→∞ is determined by the integrals in a neighbourhood of the point s ₊: for any fixed ε>0,

$$\psi(\lambda)\sim\mathbf{E} \bigl(e^{\lambda\xi};\xi >s_+-\varepsilon \bigr),\qquad \psi'(\lambda)\sim\mathbf{E} \bigl(\xi e^{\lambda\xi};\xi >s_+-\varepsilon\bigr) $$

as λ→∞. Hence

$$\alpha_+=\lim_{\lambda\to\infty}\frac{\psi'(\lambda)}{\psi (\lambda)}= \lim _{\lambda\to\infty} \frac{\mathbf{E} (\xi e^{\lambda\xi};\xi>s_+-\varepsilon )}{\mathbf{E} (e^{\lambda\xi};\xi>s_+-\varepsilon)}=s_+. $$

If s ₊=∞, then lnψ(λ) grows as λ→∞ faster than any linear function and therefore the derivative (lnψ(λ))′ increases unboundedly, α ₊=∞.

(ii) The first two assertions are obvious. Further, let p ₊=P(ξ=s ₊)>0. Then

$$\begin{aligned} &\psi(\lambda)\sim p_+ e^{\lambda s_+}, \\& \alpha\lambda-\ln\psi(\lambda)=\alpha\lambda-\ln p_+ -\lambda s_+ +o(1)=( \alpha-\alpha_+)\lambda-\ln p_+ +o(1) \end{aligned}$$

as λ→∞. This and (9.1.11) imply that

$$\varLambda(\alpha)=\begin{cases} -\ln p_+ &\mbox{for}\ \alpha=\alpha_+,\\ \infty&\mbox{for}\ \alpha>\alpha_+. \end{cases} $$

If p ₊=0, then the relation $\psi(\lambda)=o(e^{\lambda s_{+}})$ as λ→∞ similarly implies Λ(α ₊)=∞. Property (Λ2) is proved. □

Now let 0<λ ₊<∞. If α ₊<∞, then necessarily ψ(λ ₊)<∞, ψ(λ ₊+0)=∞ and ψ′(λ ₊)<∞ (here we mean the left derivative). If we assume that ψ(λ ₊)=∞, then lnψ(λ ₊)=∞, (lnψ(λ))′→∞ as λ↑λ ₊ and α ₊=∞, which contradicts the assumption α ₊<∞. Since ψ(λ)=∞ for λ>λ ₊, the point λ(α), having reached the value λ ₊ as α grows, will stop at that point. So, for α≥α ₊, we have

$$ \lambda(\alpha)=\lambda_+,\qquad\varLambda(\alpha) =\alpha \lambda_+-\ln\psi(\lambda_+)= \varLambda(\alpha_+)+\lambda_+(\alpha-\alpha_+). $$

(9.1.15)

Thus, in this case, for α≥α ₊ the function λ(α) remains constant, while Λ(α) grows linearly. Relations (9.1.12) and (9.1.13) remain true.

If α ₊=∞, then α<α ₊ for all finite α≥α ₋, and we again deal with the “regular” case that we considered earlier (see (9.1.12) and (9.1.13)). Since λ(α) does not decrease, these relations imply the convexity of Λ(α).

In summary, we can formulate the following property.

(Λ3):: The functions λ(α) and Λ(α) can only be discontinuous at the points s _± and under the condition P(ξ=s _±)>0. These points separate the domain (s ₋,s ₊) where the function Λ is finite and continuous (in the extended sense) from the domain α∉[s ₋,s ₊] where Λ(α)=∞. In the domain [s ₋,s ₊] the function Λ is convex. (If we define convexity in the “extended” sense, i.e. including infinite values as well, then Λ is convex on the entire real line.) The function Λ is analytic in the interval (α ₋,α ₊). If λ ₊<∞ and α ₊<∞, then on the half-line (α ₊,∞) the function Λ(α) is linear with slope λ ₊; at the boundary point α ₊ the continuity of the first derivatives persists. If λ ₊=∞, then Λ(α)=∞ on (α ₊,∞). The function Λ(α) possesses a similar property on (−∞,α ₋).

If λ ₋=0, then α ₋=m ₁ and λ(α)=Λ(α)=0 for α≤m ₁.

Indeed, since λ(m ₁)=0 and ψ(λ)=∞ for λ<λ ₋=0=λ(m ₁), as the value of α decreases to α ₋=m ₁, the point λ(α), having reached the value 0, will stop, and λ(α)=0 for α≤α ₋=m ₁. This and the first identity in (9.1.11) also imply that Λ(α)=0 for α≤m ₁.

If λ ₋=λ ₊=0 (condition [C] is not met), then λ(α)=Λ(α)≡0 for all α. This is obvious, since the value of the function under the sup sign in (9.1.6) equals −∞ for all λ≠0. In this case the limit theorems presented in the forthcoming sections will be of little substance.

We will also need the following properties of the function Λ.

(Λ4):: Under obvious notational conventions, for independent random variables ξ and η, we have
$$\begin{aligned} \varLambda_{\xi+\eta}(\alpha) =&\sup_{\lambda} \bigl(\alpha \lambda -\ln\psi_\xi(\lambda)- \ln\psi_\eta(\lambda) \bigr)= \inf_\gamma \bigl(\varLambda_\xi(\gamma)+ \varLambda_\eta(\alpha-\gamma ) \bigr), \\\varLambda_{c\xi+b}(\alpha) =&\sup_\lambda \bigl(\alpha \lambda -\lambda b-\ln\psi_\xi(\lambda c) \bigr)=\varLambda_\xi \biggl(\frac{\alpha-b}{c} \biggr). \end{aligned}$$

Clearly, inf_γ in the former relation is attained at the point γ at which λ _ξ(γ)=λ _η(α−γ). If ξ and η are identically distributed then γ=α/2 and therefore

$$\varLambda_{\xi+\eta}(\alpha)= \varLambda_\xi \biggl( \frac{\alpha}{2} \biggr)+\varLambda_\eta \biggl(\frac{\alpha}{2} \biggr)= 2\varLambda_\xi \biggl(\frac{\alpha}{2} \biggr). $$

(Λ5):

The function Λ(α) attains its minimal value 0 at the point α=E ξ=m ₁. For definiteness, assume that α ₊>0. If m ₁=0 and E|ξ ^k|<∞, then

$$ \lambda(0)=\varLambda(0)=\varLambda'(0)=0,\quad \varLambda''(0)=\frac{1}{\gamma_2},\quad \varLambda'''(0)= -\frac{\gamma_3}{\gamma_2^2}, \quad\ldots $$

(9.1.16)

(In the case α ₋=0 the right derivatives are intended.) As α↓0, one has the representation

$$ \varLambda(\alpha)=\sum _{j=2}^k\frac{\varLambda^{(j)}(0)}{j!}\alpha ^j+o \bigl(\alpha^k\bigr). $$

(9.1.17)

The semi-invariants γ _j were defined in (9.1.2) and (9.1.3).

If the two-sided Cramér condition is satisfied then the series expansion (9.1.17) of the function Λ(α) holds for k=∞. This series is called the Cramér series.

Verifying properties (Λ4) and (Λ5) is not difficult, and is left to the reader.

(Λ6):: The following inversion formula is valid: for λ∈(λ ₋,λ ₊),
$$ \ln\psi(\lambda)=\sup_\alpha \bigl(\alpha \lambda-\varLambda(\alpha ) \bigr). $$
(9.1.18)

This means that the rate function uniquely determines the Laplace transform ψ(λ) and hence the distribution F as well. Formula (9.1.18) also means that subsequent double applications of the Legendre transform to the convex function lnψ(λ) leads to the same original function.

Proof

We denote by T(λ) the right-hand side of (9.1.18) and show that T(λ)=lnψ(λ) for λ∈(λ ₋,λ ₊). If, in order to find the supremum in (9.1.18), we equate to zero the derivative in α of the function under the sup sign, then we will get the equation

$$ \lambda=\varLambda'(\alpha)=\lambda(\alpha). $$

(9.1.19)

Since λ(α), α∈(α ₋,α ₊), is the function inverse to (lnψ(λ))′ (see (9.1.9)), for λ∈(λ ₋,λ ₊) Eq. (9.1.19) clearly has the solution

$$ \alpha=a(\lambda):= \bigl(\ln\psi(\lambda) \bigr)'. $$

(9.1.20)

Taking into account the fact that λ(a(λ))≡λ, we obtain

$$\begin{aligned} T(\lambda) =&\lambda a(\lambda)-\varLambda \bigl(a(\lambda) \bigr), \\T'(\lambda) =&a(\lambda)+\lambda a'(\lambda)-\lambda \bigl(a(\lambda) \bigr)a'(\lambda)=a(\lambda). \end{aligned}$$

Since a(0)=m ₁ and T(0)=−Λ(m ₁)=0, we have

$$ T(\lambda)=\int_0^\lambda a(u)\,du=\ln \psi(\lambda). $$

(9.1.21)

The assertion is proved, and so is yet another inversion formula (the last equality in (9.1.21), which expresses lnψ(λ) as the integral of the function a(λ) inverse to λ(α)). □

(Λ7):: The exponential Chebyshev inequality. For α≥m ₁, we have
$$\mathbf{P}(S_n\geq\alpha n)\leq e^{-n\varLambda(\alpha)}. $$

Proof

If α≥m ₁, then λ(α)≥0. For λ=λ(α)≥0, we have

$$\begin{aligned} \psi^n(\lambda) \geq&\mathbf{E} \bigl(e^{\lambda S_n}; \,S_n\geq\alpha n\bigr)\geq e^{\lambda\alpha n}\mathbf{P}(S_n \geq\alpha n); \\\mathbf{P}(S_n\geq\alpha n) \leq& e^{-\alpha n\lambda(\alpha)+n\ln\psi(\lambda(\alpha))}=e^{-n\varLambda(\alpha)}. \end{aligned}$$

□

We now consider a few examples, where the values of λ _±, α _±, and the functions ψ(λ), λ(α), Λ(α) can be calculated in an explicit form.

Example 9.1.1

If ξ⊂=Φ _0,1, then

$$\psi(\lambda)=e^{{\lambda^2}/{2}},\qquad|\lambda_\pm|=|\alpha _\pm|=\infty,\qquad \lambda(\alpha)=\alpha,\qquad\varLambda(\alpha)= \frac{\alpha^2}{2}. $$

Example 9.1.2

For the Bernoulli scheme ξ⊂=B _p, we have

$$\begin{aligned} \psi(\lambda) =&pe^{\lambda}+q,\qquad|\lambda_\pm|=\infty,\quad \alpha_+=1,\quad\alpha_-=0,\quad m_1=\mathbf{E} \xi=p, \\\lambda(\alpha) =&\ln\frac{\alpha(1-p)}{p(1-\alpha)}, \quad \varLambda(\alpha)=\alpha\ln \frac{\alpha}{p}+(1-\alpha)\ln\frac {1-\alpha}{1-p} \quad\mbox{for }\alpha\in(0,1), \\\varLambda(0) =&-\ln(1-p),\quad\varLambda(1)=-\ln p,\quad \varLambda(\alpha)= \infty\quad\mbox{for}\ \alpha\notin[0,1]. \end{aligned}$$

Thus the function H(α)=Λ(α), which described large deviation probabilities for S _n in the local Theorem 5.2.1 for the Bernoulli scheme, is nothing else but the rate function. Below, in Sect. 9.3, we will obtain generalisations of Theorem 5.2.1 for arbitrary arithmetic distributions.

Example 9.1.3

For the exponential distribution Γ _β, we have

$$\begin{aligned} \psi(\lambda) =&\frac{\beta}{\beta-\lambda},\quad\lambda_+= \beta,\quad\lambda_-=-\infty, \quad\alpha_+=\infty,\quad\alpha_-=0,\quad m_1=\frac{1}{\beta}, \\\lambda(\alpha) =&\beta-\frac{1}{\alpha},\quad \varLambda(\alpha)=\alpha \beta-1-\ln\alpha\beta\quad\mbox{for}\ \alpha>0. \end{aligned}$$

Example 9.1.4

For the centred Poisson distribution with parameter β, we have

$$\begin{aligned} \psi(\lambda) =&\exp \bigl\{\beta\bigl[e^\lambda-1-\lambda\bigr] \bigr\} ,\quad|\lambda_\pm|=\infty,\quad \alpha_-=-\beta,\quad\alpha_+= \infty,\quad m_1=0, \\\lambda(\alpha) =&\ln\frac{\beta+\alpha}{\beta},\quad \varLambda(\alpha)=(\alpha+\beta) \ln\frac{\alpha+\beta}{\beta }-\alpha\quad\mbox{for }\alpha>-\beta. \end{aligned}$$

9.2 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms. The Probabilistic Meaning of the Rate Function

9.2.1 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms

Consider the Cramér transform of F at the point λ=λ(α) for α∈[α ₋,α ₊] and introduce the notation ξ ^(α):=ξ _(λ(α)),

$$S_n^{(\alpha)}:=\sum_{i=1}^n \xi^{(\alpha)}_i, $$

where $\xi_{i}^{(\alpha)}$ are independent copies of ξ ^(α). The distribution F ^(α):=F _(λ(α)) of the random variable ξ ^(α) is called the Cramér transform of F with parameter α. The random variables ξ ^(α) are also called Cramér transforms, but of the original random variable ξ. The relationship between the distributions of S _n and $S_{n}^{(\alpha)}$ is established in the following assertion.

Theorem 9.2.1

For x=nα, α∈(α ₋,α ₊), and any t>0, one has

$$ \mathbf{P} \bigl(S_n\in[x,x+t) \bigr)=e^{-n\varLambda(\alpha)} \int_0^te^{-\lambda(\alpha)z}\mathbf{P} \bigl(S_n^{(\alpha)}-\alpha n\in dz \bigr). $$

(9.2.1)

Proof

The Laplace transform of the distribution of the sum $S_{n}^{(\alpha)}$ is clearly equal to

$$ \mathbf{E} e^{\mu S_n^{(\alpha)}}= \biggl[ \frac{\psi(\mu+\lambda(\alpha))}{\psi(\lambda(\alpha))} \biggr]^n $$

(9.2.2)

(see (9.1.5)). On the other hand, consider the Cramér transform (S _n)_(λ(α)) of S _n at the point λ(α). Applying (9.1.5) to the distribution of S _n, we obtain

$$\mathbf{E} e^{\mu(S_n)_{(\lambda(\alpha))}}= \frac{\psi^n(\mu+\lambda(\alpha))}{\psi^n(\lambda(\alpha))}. $$

Since this expression coincides with (9.2.2), the Cramér transform of S _n at the point λ(α) coincides in distribution with the sum $S_{n}^{(\alpha)}$ of the transforms $\xi_{i}^{(\alpha)}$. In other words,

$$ \frac{\mathbf{P}(S_n\in dv)e^{\lambda(\alpha)v}}{\psi^n(\lambda (\alpha))}= \mathbf{P} \bigl(S_n^{(\alpha)}\in dv\bigr) $$

(9.2.3)

or, which is the same,

$$\begin{aligned} \mathbf{P}(S_n\in dv) =&e^{-\lambda(\alpha)v+n\ln\psi(\lambda(\alpha))}\mathbf{P} \bigl(S_n^{(\alpha)}\in dv\bigr)=e^{-n\varLambda(\alpha)+\lambda(\alpha)(n\alpha-v)}\mathbf{P} \bigl(S_n^{(\alpha)}\in dv\bigr). \end{aligned}$$

Integrating this equality in ν from x to x+t, letting x:=nα and making the change of variables v−nα=z, we get

$$\begin{aligned} \mathbf{P} \bigl(S_n\in[x,x+t) \bigr) =&e^{-n\varLambda(\alpha)}\int _x^{x+t} e^{\lambda(\alpha)(n\alpha-v)}\mathbf{P} \bigl(S_n^{(\alpha)}\in dv\bigr) \\=& e^{-n\varLambda(\alpha)}\int_0^t e^{-\lambda(\alpha)z} \mathbf{P}\bigl(S_n^{(\alpha)}-\alpha n\in dz\bigr). \end{aligned}$$

The theorem is proved. □

Since for α∈[α ₋,α ₊] we have

$$\mathbf{E} \xi^{(\alpha)}=\frac{\psi'(\lambda(\alpha))}{\psi (\lambda (\alpha))}=\alpha $$

(see (9.1.11)), one has $\mathbf{E} (S_{n}^{(\alpha)}-\alpha n)=0$ and so for $t\leq c\sqrt{n}$ we have probabilities of normal deviations of $S_{n}^{(\alpha)}-\alpha n$ on the right-hand side of (9.2.1). This allows us to reduce the problem on large deviations of S _n to the problem on normal deviations of $S^{(\alpha)}_{n}$. If α>α ₊, then formula (9.2.1) is still rather useful, as will be shown in Sects. 9.4 and 9.5.

9.2.2 The Probabilistic Meaning of the Rate Function

In this section we will prove the following assertion, which clarifies the probabilistic meaning of the function Λ(α).

Denote by Δ[α):=[α,α+Δ) the interval of length Δ with the left end at the point α. The notation Δ _n[α), where Δ _n depends on n, will have a similar meaning.

Theorem 9.2.2

For each fixed α and all sequences Δ _n converging to 0 as n→∞ slowly enough, one has

$$ \varLambda(\alpha)=-\lim_{n\to\infty} \frac{1}{n}\,\ln\mathbf {P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha) \biggr). $$

(9.2.4)

This relation can also be written as

$$\mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha) \biggr)= e^{-n\varLambda(\alpha)+o(n)}. $$

Proof of Theorem 9.2.2

First let α∈(α ₋,α ₊). Then

$$\mathbf{E} \xi^{(\alpha)}=\alpha,\qquad \operatorname{Var} \xi^{(\alpha)}= \bigl(\ln\psi(\lambda) \bigr)''_{\lambda =\lambda(\alpha)}< \infty $$

and hence, as n→∞ and Δ _n→0 slowly enough (e.g., for Δ _n≥n ^−1/3), by the central limit theorem we have

$$\mathbf{P} \bigl(S_n^{(\alpha)}-\alpha n\in[0, \varDelta _n n) \bigr)\to1/2. $$

Therefore, by Theorem 9.2.1 for t=Δ _n n, x=αn and by the mean value theorem,

$$\begin{aligned} \mathbf{P} \bigl(S_n\in[x,x+t) \bigr) =& \biggl( \frac{1}{2}+o(1) \biggr)\, e^{-n\varLambda(\alpha)-\lambda(\alpha)\varDelta _n n\theta},\quad\theta\in(0,1); \\\frac{1}{n}\,\ln\mathbf{P} \bigl(S_n\in[x,x+t) \bigr) = &- \varLambda(\alpha)-\lambda(\alpha)\theta \varDelta _n+o(1)=-\varLambda ( \alpha)+o(1) \end{aligned}$$

as n→∞. This proves (9.2.4) for α∈(α ₋,α ₊).

The further proof is divided into three stages.

(1) The upper bound in the general case. Now let α be arbitrary and |λ(α)|<∞. By Theorem 9.2.1 for t=nΔ _n, we have

$$\mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha) \biggr)\leq \exp \bigl\{-n\varLambda(\alpha)+\max \bigl(\big|\lambda(0)\big|,\big|\lambda (\alpha)\big| \bigr)n\varDelta _n \bigr\}. $$

If Δ _n→0 then

$$ \limsup_{n\to\infty} \frac{1}{n}\ln \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha ) \biggr)\leq - \varLambda(\alpha). $$

(9.2.5)

(This inequality can also be obtained from the exponential Chebyshev’s inequality (Λ7).)

(2) The lower bound in the general case. Let |λ(α)|<∞ and |s _±|=∞. Introduce “truncated” random variables ^(N) ξ with the distribution

$$\mathbf{P}\bigl({}^{(N)}\xi\in B\bigr)=\frac{\mathbf{P}(\xi\in B;\,|\xi|<N)}{\mathbf{P}(|\xi|<N)}=\mathbf{P} \bigl( \xi\in B \bigm| |\xi|<N \bigr) $$

and endow all the symbols that correspond to ^(N) ξ with the left superscript (N). Then clearly, for each λ,

$$\mathbf{E} \bigl(e^{\lambda\xi};|\xi|<N \bigr)\uparrow\psi (\lambda ), \qquad \mathbf{P} \bigl(|\xi|<N \bigr)\uparrow1 $$

as N→∞, so that

$${}^{(N\!)}\!\psi(\lambda)=\frac{\mathbf{E} (e^{\lambda\xi};\, |\xi |<N)}{\mathbf{P}(|\xi|<N)} \to\psi(\lambda). $$

The functions ^(N) Λ(α) and Λ(α) are the upper bounds for the concave functions αλ−ln^(N) ψ(λ) and αλ−lnψ(λ), respectively. Therefore for each α we also have convergence ^(N) Λ(α)→Λ(α) as N→∞.

Further,

$$\begin{aligned} \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha) \biggr) \geq& \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha); |\xi_j|<N, j=1,\ldots,N \biggr) \\=& \mathbf{P}^n \bigl(|\xi|<N \bigr) \mathbf{P} \biggl(\frac{{}^{(N)}\!S_n}{n} \in \varDelta _n[\alpha) \biggr). \end{aligned}$$

Since s _±=±∞, one has ^(N) α _±=±N and, for N large enough, we have α∈(^(N) α ₋,^(N) α ₊). Hence we can apply the first part of the proof of the theorem by virtue of which, as Δ _n→0,

$$\begin{aligned} \frac{1}{n}\,\ln\mathbf{P} \biggl(\frac{{}^{(N)}\!S_n}{n}\in \varDelta _n[\alpha) \biggr) =& -{}^{(N)}\!\varLambda(\alpha)+o(1), \\\frac{1}{n}\,\ln\mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha ) \biggr) \geq& -{}^{(N)}\!\varLambda( \alpha)+o(1)+\ln\mathbf{P} \bigl(|\xi|<N \bigr). \end{aligned}$$

The right-hand side of the last inequality can be made arbitrarily close to −Λ(α) by choosing a suitable N. Since the left-hand side of this inequality does not depend on N, we have

$$ \liminf_{n\to\infty}\frac{1}{n}\,\ln\mathbf{P} \biggl(\frac {S_n}{n}\in \varDelta _n[\alpha) \biggr) \geq- \varLambda(\alpha). $$

(9.2.6)

Together with (9.2.5), this proves (9.2.4).

(3) It remains to remove the restrictions stated at the beginning of stages (1) and (2) of the proof, i.e. to consider the cases |λ(α)|=∞ and min|s _±|<∞. These two relations are connected with each other since, for instance, the equality λ(α)=λ ₊=∞ can only hold if α≥α ₊=s ₊<∞ (see property (Λ2)). For α>s ₊, relation (9.2.4) is evident, since P(S _n/n∈Δ _n[α))=0 and Λ(α)=∞. For α=α ₊=s ₊ and p ₊=P(ξ=s ₊), we have, for any Δ>0,

$$ \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta [\alpha_+) \biggr)= \mathbf{P} (S_n=n\alpha_+)=p_+^n. $$

(9.2.7)

Since in this case Λ(α ₊)=−lnp ₊ (see (Λ2)), the equality (9.2.4) holds true.

The case λ(α)=λ ₋=−∞ with s ₋>−∞ is considered in a similar way. However, due to the asymmetry of the interval Δ[α) with respect to the point α, there are small differences. Instead of an equality in (9.2.7) we only have the inequality

$$ \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[ \alpha_-) \biggr)\geq \mathbf{P}(S_n=n\alpha_-)=p^n_-, \qquad p_-=\mathbf{P}(\xi=\alpha_-). $$

(9.2.8)

Therefore we also have to use the exponential Chebyshev’s inequality (see (Λ7)) applying it to −S _n for s ₋=α ₋<0:

$$ \mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[ \alpha_-) \biggr)\leq \mathbf{P} \biggl(\frac{S_n}{n}<\alpha_-+ \varDelta _n \biggr)\leq e^{-n\varLambda(\alpha_-+\varDelta _n)}. $$

(9.2.9)

Relations (9.2.8), (9.2.9), the equality Λ(α ₋)=−lnp ₋, and the right continuity of Λ(α) at the point α ₋ imply (9.2.4) for α=α ₋. The theorem is proved. □

9.2.3 The Large Deviations Principle

It is not hard to derive from Theorem 9.2.2 a corollary on the asymptotics of the probabilities of S _n/n hitting an arbitrary Borel set. Denote by (B) and [B] the interior and the closure of B, respectively ((B) is the union of all open intervals contained in B). Put

$$\varLambda(B):=\inf_{\alpha\in B}\varLambda(\alpha). $$

Theorem 9.2.3

For any Borel set B, the following inequalities hold:

$$\begin{aligned} \liminf_{n\to\infty}\frac{1}{n}\ln\mathbf{P} \biggl( \frac {S_n}{n}\in B \biggr) \geq& -\varLambda \bigl((B) \bigr), \end{aligned}$$

(9.2.10)

$$\begin{aligned} \limsup_{n\to\infty}\frac{1}{n}\ln\mathbf{P} \biggl( \frac {S_n}{n}\in B \biggr) \leq& -\varLambda \bigl([B] \bigr). \end{aligned}$$

(9.2.11)

If Λ((B))=Λ([B]), then the following limit exists:

$$ \lim_{n\to\infty}\frac{1}{n}\,\ln\mathbf{P} \biggl(\frac {S_n}{n}\in B \biggr)=-\varLambda(B). $$

(9.2.12)

This assertion is called the large deviation principle. It is one of the so-called “rough” (“logarithmic”) limit theorems that describe the asymptotic behaviour of lnP(S _n/n∈B). It is usually impossible to derive from this assertion the asymptotics of the probability P(S _n/n∈B) itself. (In the equality P(S _n/n∈B)=exp{−nΛ(B)+o(n)}, the term o(n) may grow in absolute value.)

Proof

Without losing generality, we can assume that B⊂[s ₋,s ₊] (since Λ(α)=∞ outside that domain).

We first prove (9.2.10). Let α _(B) be such that

$$\varLambda \bigl((B) \bigr)\equiv\inf _{\alpha\in (B)}\varLambda(\alpha)=\varLambda( \alpha_{(B)}) $$

(recall that Λ(α) is continuous on [s ₋,s ₊]). Then there exist a sequence of points α _k and a sequence of intervals (α _k−δ _k,α _k+δ _k), where δ _k→0, lying in (B) and converging to the point α _(B), such that

$$\varLambda \bigl((B) \bigr)=\inf_k\,\varLambda \bigl(( \alpha_k-\delta _k,\alpha_k+ \delta_k) \bigr). $$

Here clearly

$$\inf_k\,\varLambda \bigl((\alpha_k- \delta_k,\alpha_k+\delta_k) \bigr)= \inf _k\,\varLambda(\alpha_k), $$

and for a given ε>0, there exists a k=K such that Λ(α _K)<Λ((B))+ε. Since Δ _n[α _k)⊂(α _k−δ _k,α _k+δ _k) for large enough n (here Δ _n[α _k) is from Theorem 9.2.2), we have by Theorem 9.2.2 that, as n→∞,

$$\begin{aligned} \frac{1}{n}\ln\mathbf{P} \biggl(\frac{S_n}{n}\in B \biggr) \geq & \frac{1}{n}\ln\mathbf{P} \biggl(\frac{S_n}{n}\in(B) \biggr)\\\geq& \frac{1}{n}\ln\mathbf{P} \biggl(\frac{S_n}{n}\in( \alpha_K-\delta _K,\alpha_K+ \delta_K) \biggr) \\\geq& \frac{1}{n}\ln\mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta _n[\alpha _K) \biggr)\geq -\varLambda( \alpha_K)+o(1)\\\geq& -\varLambda \bigl((B) \bigr)-\varepsilon+o(1). \end{aligned}$$

As the left-hand side of this inequality does not depend on ε, inequality (9.2.10) is proved.

We now prove inequality (9.2.11). Denote by α _[B] the point at which inf_α∈[B] Λ(α)=Λ(α _[B]) is attained (this point always belongs to [B] since [B] is closed). If Λ(α _[B])=0, then the inequality is evident. Now let Λ(α _[B])>0. By convexity of Λ the equation Λ(α)=Λ(α _[B]) can have a second solution $\alpha'_{[B]}$. Assume it exists and, for definiteness, $\alpha'_{[B]}<\alpha_{[B]}$. The relation Λ([B])=Λ(α _[B]) means that the set [B] does not intersect with $(\alpha'_{[B]},\alpha_{[B]})$ and

$$ \mathbf{P} \biggl(\frac{S_n}{n}\in B \biggr)\leq\mathbf{P} \biggl(\frac{S_n}{n}\in [B] \biggr)\leq\mathbf{P} \biggl(\frac{S_n}{n} \leq\alpha '_{[B]} \biggr)+ \mathbf{P} \biggl( \frac{S_n}{n}\geq\alpha_{[B]} \biggr). $$

(9.2.13)

Moreover, in this case $m_{1}\in(\alpha'_{[B]},\alpha_{[B]})$ and each of the probabilities on the right-hand side of (9.2.13) can be bounded using the exponential Chebyshev’s inequality (see (Λ7)) by the value $e^{-n\varLambda(\alpha_{[B]})}$. This implies (9.2.11).

If the second solution $\alpha'_{[B]}$ does not exist, then one of the summands on the right-hand side of (9.2.13) equals zero, and we obtain the same result.

The second assertion of the theorem (Eq. (9.2.12)) is evident.

The theorem is proved. □

Using Theorem 9.2.3, we can complement Theorem 9.2.2 with the following assertion.

Corollary 9.2.1

The following limit always exists

$$ \lim_{\varDelta \to0}\lim_{n\to\infty} \frac{1}{n}\, \ln\mathbf{P} \biggl(\frac{S_n}{n}\in \varDelta [\alpha ) \biggr)=-\varLambda (\alpha). $$

(9.2.14)

Proof

Take the set B in Theorem 9.2.3 to be the interval B=Δ[α). If α∉[s ₋,s ₊] then the assertion is obvious (since both sides of (9.2.14) are equal to −∞). If α=s _± then (9.2.14) is already proved in (9.2.7), (9.2.8) and (9.2.9).

It remains to consider points α∈(s ₋,s ₊). For such α, the function Λ(α) is continuous and α+Δ is also a point of continuity of Λ for Δ small enough, and hence

$$\varLambda \bigl((B) \bigr)=\varLambda \bigl([B] \bigr)\to\varLambda(\alpha) $$

as Δ→0. Therefore by Theorem 9.2.3 the inner limit in (9.2.14) exists and converges to −Λ(α) as Δ→0.

The corollary is proved. □

Note that the assertions of Theorems 9.2.2 and 9.2.3 and their corollaries are “universal”—they contain no restrictions on the distribution F.

9.3 Integro-Local, Integral and Local Theorems on Large Deviation Probabilities in the Cramér Range

9.3.1 Integro-Local and Integral Theorems

In this subsection, under the assumption that the Cramér condition λ ₊>0 is met, we will find the asymptotics of probabilities P(S _n∈Δ[x)) for scaled deviations α=x/n from the so-called Cramér (or regular) range, i.e. for the range α∈(α ₋,α ₊) in which the rate function Λ(α) is analytic.

In the non-lattice case, in addition to the condition λ ₊>0, we will assume without loss of generality that E ξ=0. In this case necessarily

$$\alpha_-\leq0,\qquad \alpha_+=\frac{\psi'(\lambda_+)}{\psi(\lambda_+)}>0,\qquad \lambda(0)=0. $$

The length Δ of the interval may depend on n in some cases. In such cases, we will write Δ _n instead of Δ, as we did earlier. The value

$$ \sigma_\alpha^2= \frac{\psi''(\lambda(\alpha))}{\psi(\lambda (\alpha))}-\alpha^2 $$

(9.3.1)

is clearly equal to $\operatorname{Var}(\xi^{(\alpha)})$ (see (9.1.5) and the definition of ξ ^(α) in Sect. 9.2).

Theorem 9.3.1

Let λ ₊>0, α∈[0,α ₊), ξ be a non-lattice random variable, E ξ=0 and E ξ ²<∞. If Δ _n→0 slowly enough as n→∞, then

$$ \mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)= \frac{\varDelta _n}{\sigma _\alpha \sqrt{2\pi n}}e^{-n\varLambda(\alpha)} \bigl(1+o(1) \bigr), $$

(9.3.2)

where α=x/n, and, for each fixed α ₁∈(0,α ₊), the remainder term o(1) is uniform in α∈[0,α ₁] for any fixed α ₁∈(0,α ₊).

A similar assertion is valid in the case when λ ₋<0 and α∈(α ₋,0].

Proof

The proof is based on Theorems 9.2.1 and 8.7.1A. Since the conditions of Theorem 9.2.1 are satisfied, we have

$$\mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)=e^{-n\varLambda(\alpha)}\int_0^{\varDelta _n} e^{-\lambda(\alpha)z} \mathbf{P}\bigl(S_n^{(\alpha)}-\alpha n\in dz\bigr). $$

As λ(α)≤λ(α ₊−ε)<∞ and Δ _n→0, one has e ^−λ(α)z→1 uniformly in z∈Δ _n[0) and hence, as n→∞,

$$ \mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)=e^{-n\varLambda(\alpha )}\mathbf{P} \bigl(S^{(\alpha)}_n-\alpha n\in \varDelta _n[0) \bigr) \bigl(1+o(1) \bigr) $$

(9.3.3)

uniformly in α∈[0,α ₊−ε].

We now show that Theorem 8.7.1A is applicable to the random variables ξ ^(α)=ξ _(λ(α)). That σ _α=σ(λ(α)) is bounded away from 0 and from ∞ for α∈[0,α ₁] is evident. (The same is true of all the theorems in this section.) Therefore, it remains to verify whether conditions (a) and (b) of Theorem 8.7.1A are met for λ=λ(α)∈[0,λ ₁], λ ₁:=λ(α ₁)<λ ₊ and $\varphi_{(\lambda)}(t)=\frac{\psi(\lambda+it)}{\psi(\lambda)}$ (see (9.1.5)). We have

$$\psi(\lambda+it)=\psi(\lambda)+it\psi'(\lambda)- \frac{t^2}{2}\,\psi''(\lambda)+o \bigl(t^2\bigr) $$

as t→0, where the remainder term is uniform in λ if the function ψ″(λ+iu) is uniformly continuous in u. The required uniform continuity can easily be proved by imitating the corresponding result for ch.f.s (see property 4 in Sect. 7.1). This proves condition (a) in Theorem 8.7.1A with

$$a(\lambda)=\frac{\psi'(\lambda)}{\psi(\lambda)},\qquad m_2(\lambda)= \frac{\psi''(\lambda)}{\psi(\lambda)}. $$

Now we will verify condition (b) in Theorem 8.7.1A. Assume the contrary: there exists a sequence λ _k∈[0,λ ₁] such that

$$q_{\lambda_k}:= \sup_{\theta_1\leq|t|\leq\theta_2} \frac{ |\psi(\lambda_k+it) |}{\psi(\lambda_k)}\to1 $$

as k→∞. By the uniform continuity of ψ in that domain, there exist points t _k∈[θ ₁,θ ₂] such that, as k→∞,

$$\frac{\psi(\lambda_k+it_k)}{\psi(\lambda_k)}\to1. $$

Since the region λ∈[0,λ ₁], |t|∈[θ ₁,θ ₂] is compact, there exists a subsequence (λ _k′,t _k′)→(λ ₀,t ₀) as k′→∞. Again using the continuity of ψ, we obtain the equality

$$ \frac{|\psi(\lambda_0+it_0)|}{\psi(\lambda_0)}=1, $$

(9.3.4)

which contradicts the non-latticeness of $\xi_{(\lambda_{0})}$. Property (b) is proved.

Thus we can now apply Theorem 8.7.1A to the probability on the right-hand side of (9.3.3). Since E ξ ^(α)=α and $\mathbf{E} (\xi^{(\alpha)})^{2}=\frac{\psi''(\lambda(\alpha ))}{\psi (\lambda(\alpha))}$, this yields

(9.3.5)

uniformly in α∈[0,α ₁] (or in x∈[0,α ₁ n]), where the values of

$$\sigma_\alpha^2=\mathbf{E} \bigl(\xi^{(\alpha)}-\alpha \bigr)^2= \frac{\psi''(\lambda(\alpha))}{\psi(\lambda(\alpha))}-\alpha^2 $$

are bounded away from 0 and from ∞. The theorem is proved. □

From Theorem 9.3.1 we can now derive integro-local theorems and integral theorems for fixed or growing Δ. Since in the normal deviation range (when x is comparable with $\sqrt{n}$) we have already obtained such results, to simplify the exposition we will consider here large deviations only, when $x\gg\sqrt{n}$ or, which is the same, $\alpha={x}/{n}\gg{1}/{\sqrt{n}}$. To be more precise, we will assume that there exists a function N(n)→∞, $N(n)=o (\sqrt{n})$ as n→∞, such that $x\geq N(n)\sqrt{n}$ $(\alpha\geq {N(n)}/{\sqrt{n}} )$.

Theorem 9.3.2

Let λ ₊>0, α∈[0,α ₊), ξ be non-lattice, E ξ=0 and E ξ ²<∞. Then, for any Δ≥Δ ₀>0, $x\geq N(n)=o(\sqrt{n\,})$, N(n)→∞ as n→∞, one has

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt {2\pi n}} \bigl(1-e^{-\lambda(\alpha)\varDelta }\bigr) \bigl(1+o(1) \bigr), $$

(9.3.6)

o(1) being uniform in $\alpha={x}/{n}\in [{N(n)}/{\sqrt{n}},\alpha_{1} ]$ and Δ≥Δ ₀ for each fixed α ₁∈(0,α ₊).

In particular (for Δ=∞),

$$ \mathbf{P}(S_n\geq x)=\frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\lambda(\alpha )\sqrt{2\pi n}} \bigl(1+o(1) \bigr). $$

(9.3.7)

Proof

Partition the interval Δ[x) into subintervals Δ _n[x+kΔ _n), k=0,…,Δ/Δ _n−1, where Δ _n→0 and, for simplicity, we assume that M=Δ/Δ _n is an integer. Then, by Theorem 9.2.1, as Δ _n→0,

$$\begin{aligned} & \mathbf{P} \bigl(S_n\in \varDelta _n[x+k \varDelta _n) \bigr) \\&\quad {} = \mathbf{P} \bigl(S_n\in\bigl[x,x+(k+1) \varDelta _n\bigr) \bigr)-\mathbf{P} \bigl(S_n\in [x,x+k \varDelta _n) \bigr) \\&\quad {} =e^{-n\varLambda(\alpha)}\int_{k\varDelta _n}^{(k+1)\varDelta _n}e^{-\lambda(\alpha)z} \mathbf{P} \bigl(S^{(\alpha)}_n-\alpha n\in dz \bigr) \\&\quad {} =e^{-n\varLambda(\alpha)-\lambda(\alpha)k\varDelta _n} \mathbf{P} \bigl(S^{(\alpha)}_n-\alpha n\in \varDelta _n [k\varDelta _n ) \bigr) \bigl(1+o(1) \bigr) \end{aligned}$$

(9.3.8)

uniformly in α∈[0,α ₁]. Here, similarly to (9.3.5), by Theorem 8.7.1A we have

$$ \mathbf{P} \bigl(S^{(\alpha)}_n-\alpha n\in \varDelta _n [k\varDelta _n ) \bigr)=\frac{\varDelta _n}{\sigma _\alpha\sqrt{n}}\phi \biggl(\frac{k\varDelta _n}{\sigma_\alpha\sqrt{n}} \biggr)+o \biggl(\frac{1}{\sqrt{n}} \biggr) $$

(9.3.9)

uniformly in k and α. Since

$$\mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)= \sum _{k=0}^{M-1}\mathbf{P} \bigl(S_n\in \varDelta _n[x+k\varDelta ) \bigr), $$

substituting the values (9.3.8) and (9.3.9) into the right-hand side of the last equality, we obtain

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr) =& \frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\sqrt{n}} \sum_{k=0}^{M-1} \varDelta _ne^{-\lambda(\alpha)k\varDelta _n} \biggl(\phi \biggl(\frac{k\varDelta _n}{\sigma_\alpha\sqrt{n}} \biggr)+o(1) \biggr) \\=&\frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\sqrt{n}} \int_0^{\varDelta -\varDelta _n}e^{-\lambda(\alpha)z} \biggl(\phi \biggl(\frac{z}{\sigma_\alpha\sqrt{n}} \biggr)+o(1) \biggr)\,dz. \\\end{aligned}$$

(9.3.10)

After the variable change λ(α)z=u, the right-hand side can be rewritten as

$$ \frac{e^{-n\lambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt{n}} \int_0^{(\varDelta -\varDelta _n)\lambda(\alpha)} e^{-u} \biggl(\phi \biggl(\frac{u}{\sigma_\alpha\lambda(\alpha)\sqrt {n}} \biggr)+o(1) \biggr)\,du, $$

(9.3.11)

where the remainder term o(1) is uniform in α∈[0,α ₁], Δ≥Δ ₀, and u from the integration range. Since λ(α)∼α/σ ² for small α (see (9.1.12) and (9.1.16)), for $\alpha\geq{N(n)}/{\sqrt{n}}$ we have

$$\lambda(\alpha)>\frac{N(n)}{\sigma^2\sqrt{n}}\, \bigl(1+o(1) \bigr),\qquad \sigma_\alpha\lambda(\alpha)\sqrt{n}>\frac{\sigma_\alpha N(n)}{\sigma^2}\to\infty. $$

Therefore, for any fixed u, one has

$$\phi \biggl(\frac{u}{\sigma_\alpha\lambda(\alpha)\sqrt{n}} \biggr)\to\phi(0)= \frac{1}{\sqrt{2\pi}}. $$

Moreover, $\phi(v)\leq{1}/{\sqrt{2\pi}}$ for all v. Hence, by (9.3.10) and (9.3.11),

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr) =& \frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\lambda(\alpha)\sqrt {2\pi n}}\int _0^{\lambda(\alpha)\varDelta }e^{-u}du \bigl(1+o(1) \bigr) \\=&\frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\lambda(\alpha )\sqrt{2\pi n}} \bigl(1-e^{-\lambda(\alpha)\varDelta } \bigr) \bigl(1+o(1) \bigr) \end{aligned}$$

uniformly in α∈[0,α ₁] and Δ≥Δ ₀. Relation (9.3.7) clearly follows from (9.3.6) with Δ=∞. The theorem is proved. □

Note that if E|ξ|^k<∞ (for λ ₊>0 this is a restriction on the rate of decay of the left tails P(ξ<−t), t>0), then expansion (9.1.17) is valid and, for deviations x=o(n) (α=o(1)) such that $n\alpha^{k}={x^{k}}/{n^{k-1}}\leq c={\rm const}$, we can change the exponent nΛ(α) in (9.3.6) and (9.3.7) to

$$ n\varLambda(\alpha)=n\sum_{j=2}^k \frac{\varLambda^{(j)}(0)}{j!}\alpha ^j+o\bigl(n\alpha^k\bigr), $$

(9.3.12)

where Λ ^(j)(0) are found in (9.1.16). For k=3, the foregoing implies the following.

Corollary 9.3.1

Let λ ₊>0, E|ξ|³<∞, ξ be non-lattice, E ξ=0, E ξ ²=σ ², $x{\gg}\sqrt {n}$ and x=o(n ^2/3) as n→∞. Then

$$ \mathbf{P}(S_n\geq x)\sim\frac{\sigma\sqrt{n}}{x\sqrt{2\pi}}\,\exp \biggl\{-\frac {x^2}{2n\sigma^2} \biggr\}\sim \varPhi \biggl(-\frac{x}{\sigma\sqrt{n}} \biggr). $$

(9.3.13)

In the last relation we used the symmetry of the standard normal law, i.e. the equality 1−Φ(t)=Φ(−t). Assertion (9.3.13) shows that in the case λ ₊>0 and E|ξ|³<∞ the asymptotic equivalence

$$\mathbf{P}(S_n\geq x)\sim\varPhi \biggl(-\frac{x}{\sigma\sqrt {n}} \biggr) $$

persists outside the range of normal deviations as well, up to the values x=o(n ^2/3). If E ξ ³=0 and E ξ ⁴<∞, then this equivalence holds true up to the values x=o(n ^3/4). For larger x this equivalence, generally speaking, no longer holds.

Proof of Corollary 9.3.1

The first relation in (9.3.13) follows from Theorem 9.3.2 and (9.3.12). The second follows from the asymptotic equivalence

$$\int_x^\infty e^{-\frac{u^2}{2}}du\sim \frac{e^{-{x^2}/{2}}}{x}, $$

which is easy to establish, using, for example, l’Hospital’s rule. □

9.3.2 Local Theorems

In this subsection we will obtain analogues of the local Theorems 8.7.2 and 8.7.3 for large deviations in the Cramér range. To simplify the exposition, we will formulate the theorem for densities, assuming that the following condition is satisfied:

[D] :: The distribution F has a bounded density f(x) such that
$$\begin{aligned} f(x) =&e^{-\lambda_+x+o(x)}\quad\mbox{as}\ x\to\infty,\ \mbox{if}\ \lambda_+<\infty; \end{aligned}$$
(9.3.14)

$$\begin{aligned} f(x) \leq& ce^{-\lambda x}\quad\mbox{for any fixed}\ \lambda>0,\ c=c(\lambda), \ \mbox{if}\ \lambda_+=\infty. \end{aligned}$$
(9.3.15)

Since inequalities of the form (9.3.14) and (9.3.15) always hold, by the exponential Chebyshev inequality, for the right tails

$$F_+(x)=\int_x^\infty f(u)\,du, $$

condition [D] is not too restrictive. It only eliminates sharp “bursts” of f(x) as x→∞.

Denote by f _n(x) the density of the distribution of S _n.

Theorem 9.3.3

Let

$$\mathbf{E} \xi=0,\qquad\mathbf{E} \xi^2<\infty,\qquad\lambda _+>0,\qquad \alpha=\frac{x}{n}\in[0,\alpha_+), $$

and condition [D] be met. Then

$$f_n(x)=\frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\sqrt{2\pi\, n}}\, \bigl(1+o(1) \bigr), $$

where the remainder term o(1) is uniform in α∈[0,α ₁] for any fixed α ₁ ∈ (0,α ₊).

Proof

The proof is based on Theorems 9.2.1 and 8.7.2A. Denote by $f_{n}^{(\alpha)}(x)$ the density of the distribution of $S_{n}^{(\alpha)}$. Relation (9.2.3) implies that, for x=αn, α∈[α ₋,α ₊], we have

$$ f_n(x)=e^{-\lambda(\alpha)x}\psi^{n} \bigl( \lambda(\alpha) \bigr) f_n^{(\alpha)}(x)= e^{-n\varLambda(\alpha)}f_n^{(\alpha)}(x). $$

(9.3.16)

Since E ξ ^(α)=α, we see that $\mathbf{E} (S_{n}^{(\alpha)}-x )=0$ and the density value $f_{n}^{(\alpha)}(x)$ coincides with the density of the distribution of the sum $S_{n}^{(\alpha)}-\alpha n$ at the point 0. In order to use Theorems 8.7.1A and 8.7.2A, we have to verify conditions (a) and (b) for θ ₂=∞ in these theorems and also the uniform boundedness in α∈[0,α ₁] of

$$ \int \bigl|\varphi_{(\lambda(\alpha))}(t) \bigr|^m dt $$

(9.3.17)

for some integer m≥1, where φ _(λ(α)) is the ch.f. of ξ ^(α) (the uniform version of condition (c) in Theorem 8.7.2). By condition [D] the density

$$f^{(\alpha)}(v)=\frac{e^{\lambda(\alpha)v}f(v)}{\psi (\lambda (\alpha) )} $$

in bounded uniformly in α∈[0,α ₁] (for such α one has λ(α)∈[0,λ ₁], λ ₁=λ(α ₁)<λ ₊). Hence the integral

$$\int \bigl(f^{(\alpha)}(v) \bigr)^2\,dv $$

is also uniformly bounded, and so, by virtue of Parseval’s identity (see Sect. 7.2), is the integral

$$\int \bigl|\varphi_{(\lambda(\alpha))}(t) \bigr|^2 dt. $$

This means that the required uniform boundedness of integral (9.3.17) is proved for m=2.

Conditions (a) and (b) for θ ₂<∞ were verified in the proof of Theorem 9.3.1. It remains to extend the verification of condition (b) to the case θ ₂=∞. This can be done by following an argument very similar to the one used in the proof of Theorem 9.3.1 in the case of finite θ ₂. Let θ ₂=∞. If we assume that there exist sequences λ _k∈[0,λ _+,ε] and |t _k|≥θ ₁ such that

$$\frac{ |\psi(\lambda_k+it_k) |}{\psi(\lambda_k)}\to1, $$

then, by compactness of [0,λ _+,ε], there will exist sequences $\lambda_{k}'\to\lambda_{0}\in [0,\lambda_{+,\varepsilon} ]$ and $t'_{k}$ such that

$$ \frac{ |\psi(\lambda'_k+it'_k) |}{\psi(\lambda_0)}\to1. $$

(9.3.18)

But by virtue of condition [D] the family of functions ψ(λ+it), $t\in\mathbb{R}$, is equicontinuous in λ∈[0,λ _+,ε]. Therefore, along with (9.3.18), we also have convergence

$$\frac{ |\psi(\lambda_0+it'_k) |}{\psi(\lambda_0)}\to 1,\qquad |t_k|\geq\theta_1>0, $$

which contradicts the inequality

$$\sup_{|t|\geq\theta_1}\frac{ |\psi(\lambda_0+it) |}{\psi (\lambda_0)}<1 $$

that follows from the existence of density.

Thus property (b) is proved for θ ₂=∞, and we can use Theorem 8.7.2A, which implies that

$$f_n^{(\alpha)}(x)= \frac{1}{\sigma (\lambda(\alpha) )\sqrt{2\pi\,n}}\, \bigl(1+o(1) \bigr). $$

This, together with (9.3.16), proves Theorem 9.3.3. □

Remark 9.3.1

We can see from the proof that, in Theorem 9.3.3, as a more general condition instead of condition [D] one could also consider the integrability of ψ ^m(λ+it) for any fixed λ∈[0,λ ₁], λ ₁<λ ₊, or condition [D] imposed on S _m for some m≥1.

For arithmetic distributions we cannot assume without loss of generality that m ₁=E ξ=0, but that does not change much in the formulations of the assertions. If λ ₊>0, then α ₊=ψ′(λ ₊)/ψ(λ ₊)>m ₁ and the scaled deviations α=x/n for the Cramér range must lie in the region [m ₁,α ₊).

Theorem 9.3.4

Let λ ₊>0, E ξ ²<∞ and the distribution of ξ be arithmetic. Then, for integer x,

$$\mathbf{P}(S_n=x)=\frac{e^{-n\varLambda(\alpha)}}{\sigma_\alpha\sqrt {2\pi n}} \bigl(1+o(1) \bigr), $$

where the remainder term o(1) is uniform in α=x/n∈[m ₁,α ₁] for any fixed α ₁∈(m ₁,α ₊).

A similar assertion is valid in the case when λ ₋<0 and α∈(α ₋,m ₁].

Proof

The proof does not differ much from that of Theorem 9.3.1. By (9.2.3),

$$\mathbf{P}(S_n=x)=e^{-\lambda(\alpha)x}\psi^{-n} \bigl(\lambda (\alpha ) \bigr) \mathbf{P}\bigl(S^{(\alpha)}_n=x \bigr)=e^{-n\varLambda(\alpha)}\mathbf {P}\bigl(S_n^{(\alpha)}=x\bigr), $$

where E ξ ^(α)=α for α∈[m ₁,α ₊). In order to compute $\mathbf{P}(S^{(\alpha)}_{n}=x)$ we have to use Theorem 8.7.3A. The verification of conditions (a) and (b) of Theorem 8.7.1A, which are assumed to hold in Theorem 8.7.3A, is done in the same way as in the proof of Theorem 9.3.1, the only difference being that relation (9.3.4) for t ₀∈[θ ₁,π] will contradict the arithmeticity of the distribution of ξ. Since a(λ(α))=E ξ ^(α)=α, by Theorem 8.7.3A we have

$$\mathbf{P}\bigl(S^{(\alpha)}_n=x\bigr)=\frac{1}{\sigma_\alpha\sqrt{2\pi n}}\, \bigl(1+o(1) \bigr) $$

uniformly in α=x/n∈[m ₁,α ₁]. The theorem is proved. □

9.4 Integro-Local Theorems at the Boundary of the Cramér Range

9.4.1 Introduction

In this section we again assume that Cramér’s condition λ ₊>0 is met. If α ₊=∞ then the theorems of Sect. 9.3 describe the large deviation probabilities for any α=x/n. But if α ₊<∞ then the approaches of Sect. 9.3 do not enable one to find the asymptotics of probabilities of large deviations of S _n for scaled deviations α=x/n in the vicinity of the point α ₊.

In this section we consider the case α ₊<∞. If in this case λ ₊=∞, then, by property (Λ2)(i), we have α ₊=s ₊=sup{t:F ₊(t)>0}, and therefore the random variables ξ _k are bounded from above by the value α ₊, P(S _n≥x)=0 for α=x/n>α ₊. We will not consider this case in what follows. Thus we will study the case α ₊<∞, λ ₊<∞.

In the present and the next sections, we will confine ourselves to considering integro-local theorems in the non-lattice case with Δ=Δ _n→0 since, as we saw in the previous section, local theorems differ from the integro-local theorems only in that they are simpler. As in Sect. 9.3, the integral theorems can be easily obtained from the integro-local theorems.

9.4.2 The Probabilities of Large Deviations of S _n in an o(n)-Vicinity of the Point α ₊ n; the Case ψ″(λ ₊)<∞

In this subsection we will study the asymptotics of P(S _n∈Δ[x)), x=αn, when α lies in the vicinity of the point α ₊<∞ and, moreover, ψ″(λ ₊)<∞. (The case of distributions F, for which λ ₊<∞, α ₊<∞ and ψ″(λ ₊)<∞, will be illustrated later, in Lemma 9.4.1.) Under the above-mentioned conditions, the Cramér transform $\mathbf{F}_{(\lambda_{+})}$ is well defined at the point λ ₊, and the random variable $\xi^{(\alpha_{+})}$ with the distribution $\mathbf{F}_{(\lambda_{+})}$ has mean α ₊ and a finite variance:

$$ \mathbf{E} \xi^{(\alpha_+)}=\frac{\psi'(\lambda_+)}{\psi (\lambda _+)}=\alpha_+, \qquad \operatorname{Var} \bigl(\xi^{(\alpha_+)} \bigr)=\sigma_{\alpha_+}^2= \frac{\psi''(\lambda_+)}{\psi(\lambda_+)}-\alpha_+^2 $$

(9.4.1)

(cf. (9.3.1)).

Theorem 9.4.1

Let ξ be a non-lattice random variable,

$$\lambda_+\in(0,\infty),\qquad\psi''(\lambda_+)< \infty,\qquad y=x-\alpha_+n=o(n). $$

If Δ _n→0 slowly enough as n→∞ then

$$\mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)= \frac{\varDelta _n}{\sigma _{\alpha _+}\sqrt{2\pi n}}\,e^{-n\varLambda(\alpha_+)-\lambda_+y} \biggl(\exp \biggl\{-\frac {y^2}{\sigma_{\alpha_+}^2 n} \biggr\}+o(1) \biggr), $$

where

$$\alpha=\frac{x}{n},\qquad \sigma_{\alpha_+}^2= \frac{\psi''(\lambda_+)}{\psi(\lambda _+)}-\alpha_+^2, $$

and the remainder term o(1) is uniform in y.

Proof

As in the proof of Theorem 9.3.1, we use the Cramér transform, but now at the fixed point λ ₊, so there will be no triangular array scheme when analysing the sums $S_{n}^{(\alpha_{+})}$. In this case the following analogue of Theorem 9.2.1 holds true.

Theorem 9.2.1A

Let λ ₊∈(0,∞), α ₊<∞ and y=x−nα ₊. Then, for x=nα and any fixed Δ>0, the following representation is valid:

$$ \mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)=e^{-n\varLambda(\alpha _+)-\lambda_+y} \int_0^\varDelta e^{-\lambda_+ z} \mathbf{P}\bigl(S^{(\alpha_+)}_n-\alpha n\in dz\bigr). $$

(9.4.2)

Proof of Theorem 9.2.1A

repeats that of Theorem 9.2.1 the only difference being that, as was already noted, the Cramér transform is now applied at the fixed point λ ₊ which does not depend on α=x/n. In this case, by (9.2.3),

$$\mathbf{P}(S_n\in dv)=e^{-\lambda_+v+n\ln\psi(\lambda_+)} \mathbf{P} \bigl(S_n^{(\alpha_+)}\in dv\bigr)=e^{-n\varLambda(\alpha_+)+\lambda_+(\alpha_+ n-v)}\mathbf{P} \bigl(S_n^{(\alpha_+)}\in dv\bigr). $$

Integrating this equality in v from x to x+Δ, changing the variable v=x+z (x=nα), and noting that α ₊ n−v=−y−z, we obtain (9.4.2).

The theorem is proved. □

Let us return to the proof of Theorem 9.4.1. Assuming that Δ=Δ _n→0, we obtain, by Theorem 9.2.1A, that

$$ \mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)=e^{-n\varLambda(\alpha _+)-\lambda_+ y}\,\mathbf{P} \bigl(S^{(\alpha_+)}_n-\alpha_+n\in \varDelta _n[y) \bigr) \bigl(1+o(1) \bigr). $$

(9.4.3)

By virtue of (9.4.1), we can apply Theorem 8.7.1 to evaluate the probability on the right-hand side of (9.4.3). This theorem implies that, as Δ _n→0 slowly enough,

$$\begin{aligned} \mathbf{P} \bigl(S_n^{(\alpha_+)}-\alpha_+ n\in \varDelta _n[y) \bigr) =& \frac{\varDelta _n}{\sigma_{\alpha_+}\sqrt{n}}\, \phi \biggl( \frac{y}{\sigma_{\alpha_+}\sqrt{n}} \biggr)+o \biggl(\frac{1}{\sqrt{n}} \biggr) \\=&\frac{\varDelta _n}{\sigma_{\alpha_+}\sqrt{2\pi n}}\,\exp \biggl\{-\frac{y^2}{\sigma^2_{\alpha_+}n} \biggr\}+o \biggl( \frac{1}{\sqrt{n}} \biggr) \end{aligned}$$

uniformly in y. This, together with (9.4.3), proves Theorem 9.4.1. □

9.4.3 The Class of Distributions $\mathcal{ER}$. The Probability of Large Deviations of S _n in an o(n)-Vicinity of the Point α ₊ n for Distributions F from the Class $\mathcal{ER}$ in Case ψ″(λ ₊)=∞

When studying the asymptotics of P(S _n≥αn) (or P(S _n∈Δ[αn))) in the case where ψ″(λ ₊)=∞ and α is in the vicinity of the point α ₊<∞, we have to impose additional conditions on the distribution F similarly to what was done in Sect. 8.8 when studying convergence to stable laws.

To formulate these additional conditions it will be convenient to introduce certain classes of distributions. If λ ₊<∞, then it is natural to represent the right tails F ₊(t) as

$$ F_+(t)=e^{-\lambda_+ t}V(t), $$

(9.4.4)

where, by the exponential Chebyshev inequality, V(t)=e ^o(t) as t→∞.

Definition 9.4.1

We will say that the distribution F of a random variable ξ (or the random variable ξ itself) belongs to the class $\mathcal{R}$ if its right tail F ₊(t) is a regularly varying function, i.e. can be represented as

$$ F_+(t)=t^{-\beta}L(t), $$

(9.4.5)

where L is a slowly varying function as t→∞ (see also Sect. 8.8 and Appendix 6).

We will say that the distribution F (or the random variable ξ) belongs to the class $\mathcal{ER}$ if, in the representation (9.4.4), the function V is regularly varying (which will also be denoted as $V\in\mathcal{R}$).

Distributions from the class $\mathcal{R}$ have already appeared in Sect. 8.8.

The following assertion explains which distributions from $\mathcal{ER}$ correspond to the cases α ₊=∞, α ₊<∞, ψ″(λ ₊)=∞ and ψ″(λ ₊)<∞.

Lemma 9.4.1

Let $\mathbf{F}\in\mathcal{ER}$. For α ₊ to be finite it is necessary and sufficient that

$$\int_1^\infty tV(t)\,dt<\infty. $$

For ψ″(λ ₊) to be finite, it is necessary and sufficient that

$$\int_1^\infty t^2V(t)\,dt<\infty. $$

The assertion of the lemma means that α ₊<∞ if β>2 in the representation V(t)=t ^−β L(t), where L is an s.v.f. and α ₊=∞ if β<2. For β=2, the finiteness of α ₊ is equivalent to the finiteness of $\int_{1}^{\infty}t^{-1}L(t)\,dt$. The same is true for the finiteness of ψ″(λ ₊).

Proof of Lemma 9.4.1

We first prove the assertion concerning α ₊. Since

$$\alpha_+=\frac{\psi'(\lambda_+)}{\psi(\lambda_+)}, $$

we have to estimate the values of ψ′(λ ₊) and ψ(λ ₊). The finiteness of ψ′(λ ₊) is equivalent to that of

$$ -\int_1^\infty te^{\lambda_+ t}dF_+(t)=\int_1^\infty t \bigl( \lambda_+V(t)\,dt-dV(t) \bigr), $$

(9.4.6)

where, for V(t)=o(1/t),

$$-\int_1^\infty t\, dV(t)=V(1)+\int _1^\infty V(t)\,dt. $$

Hence the finiteness of the integral on the left-hand side of (9.4.6) is equivalent to that of the sum

$$\lambda_+\int_1^\infty tV(t)\,dt+\int _1^\infty V(t)\,dt $$

or, which is the same, to the finiteness of the integral $\int_{1}^{\infty}tV(t)\,dt$. Similarly we see that the finiteness of ψ(λ ₊) is equivalent to that of $\int_{1}^{\infty}V(t)\,dt$. This implies the assertion of the lemma in the case $\int_{1}^{\infty}V(t)\,dt<\infty$, where one has V(t)=o(1/t). If $\int_{1}^{\infty}V(t)\,dt=\infty$, then ψ(λ ₊)=∞, lnψ(λ)→∞ as λ↑λ ₊ and hence $\alpha_{+}=\lim_{\lambda\uparrow\lambda_{+}} (\ln\psi (\lambda) )'=\infty$.

The assertion concerning ψ″(λ ₊) can be proved in exactly the same way. The lemma is proved. □

The lemma implies the following:

(a)
If β<2 or β=2 and $\int_{1}^{\infty}t^{-1}L(t)=\infty$, then α ₊=∞ and the theorems of the previous section are applicable to P(S _n≥x).
(b)
If β>3 or β=3 and $\int_{1}^{\infty}t^{-1} L(t)\,dt<\infty$, then α ₊<∞, ψ″(λ ₊)<∞ and we can apply Theorem 9.4.1.

It remains to consider the case

(c)
β∈[2,3], where the integral $\int_{1}^{\infty}t^{-1}L(t)\,dt$ is finite for β=2 and is infinite for β=3.

It is obvious that in case (c) we have α ₊<∞ and ψ″(λ ₊)=∞.

Put

$$V_+(t):=\frac{\lambda_+ tV(t)}{\beta\psi(\lambda_+)},\qquad b(n):=V_+^{(-1)} \biggl( \frac{1}{n} \biggr), $$

where $V_{+}^{(-1)} ({1}/{n} )$ is the value of the function inverse to V ₊ at the point 1/n.

Theorem 9.4.2

Let ξ be a non-lattice random variable, $\mathbf{F}\in\mathcal{ER}$ and condition (c) hold. If Δ _n→0 slowly enough as n→∞, then, for y=x−α ₊ n=o(n),

$$\mathbf{P} \bigl(S_n\in \varDelta _n[x) \bigr)= \frac{\varDelta _n e^{-n\varLambda(\alpha_+)- \lambda_+y}}{b(n)} \biggl(f^{(\beta-1,1)} \biggl(\frac{y}{b(n)} \biggr)+o(1) \biggr), $$

where f ^(β−1,1) is the density of the stable law F _(β−1,1) with parameters β−1,1, and the remainder term o(1) is uniform in y.

We will see from the proof of the theorem that studying the probabilities of large deviations in the case where α ₊<∞ and ψ″(λ ₊)=∞ is basically impossible outside the class $\mathcal{ER}$, since it is impossible to find theorems on the limiting distribution of S _n in the case $\operatorname{Var}(\xi )=\infty$ without the conditions [R _γ,ρ] of Sect. 8.8 being satisfied.

Proof of Theorem 9.4.2

Condition (c) implies that $\alpha_{+}=\mathbf{E} \xi^{(\alpha_{+})}<\infty$ and $\operatorname{Var}(\xi^{(\alpha_{+})})=\infty$. We will use Theorem 9.2.1A. For Δ _n→0 slowly enough we will obtain, as in the proof of Theorem 9.4.1, that relation (9.4.3) holds true. But now, in contrast to Theorem 9.4.1, in order to calculate the probability on the right-hand side of (9.4.3), we have to employ the integro-local Theorem 8.8.3 on convergence to a stable law. In our case, by the properties of r.v.f.s, one has

$$\begin{aligned} \mathbf{P} \bigl(\xi^{(\alpha_+)}\geq t \bigr) =& -\frac{1}{\psi(\lambda_+)}\int _t^\infty e^{\lambda_+u} dF_+(u)= \frac{1}{\psi(\lambda_+)}\int_t^\infty \bigl(\lambda _+V(u)du-dV(u) \bigr) \\=&\frac{\lambda_+}{\beta\psi(\lambda_+)}\,t^{-\beta+1}L_+(t)\sim V_+(t), \end{aligned}$$

(9.4.7)

where L ₊(t)∼L(t) is a slowly varying function. Moreover, the left tail of the distribution $\mathbf{F}^{(\alpha_{+})}$ decays at least exponentially fast. By virtue of the results of Sect. 8.8, this means that, for $b(n)=V_{+}^{(-1)}(1/n)$, we have convergence of the distributions of $\frac{S_{n}^{(\alpha_{+})}-\alpha_{+} n}{b(n)}$ to the stable law F _β−1,1 with parameters β−1∈[1,2] and 1. It remains to use representation (9.4.3) and Theorem 8.8.3 which implies that, provided Δ _n→0 slowly enough, one has

$$\mathbf{P} \bigl(S_n^{(\alpha_+)}-\alpha_+ n\in \varDelta _n[y) \bigr)= \frac{\varDelta _n}{b(n)}\,f^{(\beta-1,1)} \biggl( \frac{y}{b(n)} \biggr)+ o \biggl(\frac{1}{b(n)} \biggr) $$

uniformly in y. The theorem is proved. □

Theorem 9.4.2 concludes the study of probabilities of large deviations of S _n/n in the vicinity of the point α ₊ for distributions from the class $\mathcal{ER}$.

9.4.4 On the Large Deviation Probabilities in the Range α>α ₊ for Distributions from the Class $\mathcal{ER}$

Now assume that the deviations x of S _n are such that α=x/n>α ₊, and y=x−α ₊ n grows fast enough (faster than $\sqrt{n}$ under the conditions of Theorem 9.4.1 and faster than b(n) under the conditions of Theorem 9.4.2). Then, for the probability

$$ \mathbf{P} \bigl(S^{(\alpha_+)}-\alpha_+ n\in \varDelta _n[y) \bigr), $$

(9.4.8)

the deviations y (see representation (9.4.3)) will belong to the zone of large deviations, so applying Theorems 8.7.1 and 8.8.3 to evaluate such probabilities does not make much sense. Relation (9.4.7) implies that, in the case $\mathbf{F}\in\mathcal{ER}$, we have $\mathbf{F}^{(\alpha_{+})}\in\mathcal{R}$. Therefore, we will know the asymptotics of the probability (9.4.8) (and hence also of the probability P(S _n∈Δ _n[x)), see (9.4.3)) if we obtain integro-local theorems for the probabilities of large deviations of the sums S _n, in the case where the summands belong to the class $\mathcal{R}$. Such theorems are also of independent interest in the present chapter, and the next section will be devoted to them. After that, in Sect. 9.6 we will return to the problem on large deviation probabilities in the class $\mathcal{ER}$ mentioned in the title of this section.

9.5 Integral and Integro-Local Theorems on Large Deviation Probabilities for Sums S _n when the Cramér Condition Is not Met

If E ξ=0 and the right-side Cramér condition is not met (λ ₊=0), then the rate function Λ(α) degenerates on the right semiaxis: Λ(α)=λ(α)=0 for α≥0, and the results of Sects. 9.1–9.4 on the probabilities of large deviations of S _n are of little substance. In this case, in order to find the asymptotics of P(S _n≥x) and P(S _n∈Δ[x)), we need completely different approaches, while finding these asymptotics is only possible under additional conditions on the behaviour of the tail F ₊(t) of the distribution F, similarly to what happened in Sect. 8.8 when studying convergence to stable laws.

The above-mentioned additional conditions consist of the assumption that the tail F ₊(t) behaves regularly enough. In this section we will assume that $F_{+}(t)=V(t)\in\mathcal{R}$, where $\mathcal{R}$ is the class of regularly varying functions introduced in the previous section (see also Appendix 6). To make the exposition more homogeneous, we will confine ourselves to the case β>2, $\operatorname{Var}(\xi)<\infty$, where −β is the power exponent in the function $V\in\mathcal{R}$ (see (9.4.5)). Studying the case β∈[1,2] ($\operatorname{Var}(\xi)=\infty$) does not differ much from the exposition below, but it would significantly increase the volume of the exposition and complicate the text, and therefore is omitted. Results for the case β∈(0,2] can be found in [8, Chap. 3].

9.5.1 Integral Theorems

Integral theorems for probabilities of large deviations of S _n and maxima $\overline{S}_{n}=\max_{k\leq n}S_{k}$ in the case E ξ=0, $\operatorname{Var}(\xi)<\infty$, $\mathbf {F}\in\mathcal{R}$, β>2, follow immediately from the bounds obtained in Appendix 8. In particular, Corollaries A8.2.1 and A8.3.1 of Appendix 8 imply the following result.

Theorem 9.5.1

Let E ξ=0, $\operatorname{Var}(\xi)<\infty$, $\mathbf{F}\in\mathcal{R}$ and β>2. Then, for $x\gg\sqrt{n\ln n}$,

$$ \mathbf{P}(\overline{S}_n\geq x)\sim \mathbf{P}(S_n\geq x)\sim nV(x). $$

(9.5.1)

Under an additional condition [D ₀] to be introduced below, the assertion of this theorem will also follow from the integro-local Theorem 9.5.2 (see below).

Comparing Theorem 9.5.1 with the results of Sects. 9.2–9.4 shows that the nature of the large deviation probabilities is completely different here. Under the Cramér condition and for α=x/n∈(0,α ₊), the large deviations of S _n are, roughly speaking, “equally contributed to by all the summands” ξ _k, k≤n. This is confirmed by the fact that, for a fixed α, the limiting conditional distribution of ξ _k, k≤n, given that S _n∈Δ[x) (or S _n≥x) for x=αn, Δ=1, as n→∞ coincides with the distribution F ^(α) of the random variable ξ ^(α). The reader can verify this himself/herself using Theorem 9.3.2. In other words, the conditions {S _n∈Δ[x)} (or {S _n≥x}), x=αn, change equally (from F to F ^(α)) the distributions of all the summands.

However, if the Cramér condition is not met, then under the conditions of Theorem 9.5.1 the large deviations of S _n are essentially due to one large (comparable with x) jump. This is seen from the fact that the value of nV(x) on the right-hand side of (9.5.1) is nothing else but the main term of the asymptotics for $\mathbf{P}(\overline{\xi}_{n}\geq x)$, where $\overline{\xi}_{n}=\max_{k\leq n}\xi_{k}$. Indeed, if nV(x)→0 then

$$\begin{aligned} \mathbf{P}(\overline{\xi}_n<x) =& \bigl(1-V(x) \bigr)^n=1-nV(x)+O \bigl( \bigl(nV(x) \bigr)^2 \bigr), \\\mathbf{P}(\overline{\xi}_n\geq x) =&nV(x)+O \bigl( \bigl(nV(x) \bigr)^2 \bigr)\sim nV(x). \end{aligned}$$

In other words, the probabilities of large deviations of S _n, $\overline{S}_{n}$ and $\overline{\xi}_{n}$ are asymptotically the same. The fact that the probabilities of the events {ξ _j≥y} for y∼x play the determining role in finding the asymptotics of P(S _n≥x) can easily be discovered in the bounds from Appendix 8.

Thus, while the asymptotics of P(S _n≥x) for $x=\alpha n\gg\sqrt{n}$ in the Cramér case is determined by “the whole distribution F” (as the rate function Λ(α) depends on the “the whole distribution F”), these asymptotics in the case $\mathbf{F}\in\mathcal{R}$ are determined by the right tail F ₊(t)=V(t) only and do not depend on the “remaining part” of the distribution F (for the fixed value of E ξ=0).

9.5.2 Integro-Local Theorems

In this section we will study the asymptotics of P(S _n∈Δ[x)) in the case where

$$ \mathbf{E} \xi=0,\qquad\operatorname{Var}\xi^2< \infty,\qquad \mathbf{F}\in\mathcal{R}, \qquad\beta>2,\qquad x\gg\sqrt{n\ln n}. $$

(9.5.2)

These asymptotics are of independent interest and are also useful, for example, in finding the asymptotics of integrals of type E(g(S _n); S _n≥x) for $x\gg\sqrt{n\ln n}$ for a wide class of functions g. As was already noted (see Subsection 4.4), in the next section we will use the results from the present section to obtain integro-local theorems under the Cramér condition (for summands from the class $\mathcal{ER}$) for deviations outside the Cramér zone.

In order to obtain integro-local theorems in this section, we will need additional conditions. Besides condition $\mathbf{F}\in\mathcal{R}$, we will also assume that the following holds:

Condition [D ₀] For each fixed Δ, as t→∞,

$$V(t)-V(t+\varDelta )=v(t) \bigl(\varDelta +o(1) \bigr),\qquad v(t)=\frac{\beta V(t)}{t}. $$

It is clear that if the function L(t) in representation (9.4.5) (or the function V(t)) is differentiable for t large enough and L′(t)=o(L(t)/t) as t→∞ (all sufficiently smooth s.v.f.s possess this property; cf. e.g., polynomials of lnt etc.), then condition [D ₀] will be satisfied, and the derivative −V′(t)∼v(t) will play the role of the function v(t).

Theorem 9.5.2

Let conditions (9.5.2) and [D ₀] be met. Then

$$\mathbf{P} \bigl(S_n\in \varDelta [x) \bigr)=\varDelta n v(x) \bigl(1+o(1) \bigr),\qquad v(x)=\frac{\beta V(x)}{x}, $$

where the remainder term o(1) is uniform in $x\geq N\sqrt{n\ln n}$ and Δ∈[Δ ₁,Δ ₂] for any fixed Δ ₂>Δ ₁>0 and any fixed sequence N→∞.

Note that in Theorems 9.5.1 and 9.5.2 we do not assume that n→∞. The assumption that x→∞ is contained in (9.5.2).

Proof

For y<x, introduce the events

$$ G_n := \bigl\{S_n\in \varDelta [x) \bigr\}, \qquad B_j:= \{\xi_j < y \},\qquad B:=\bigcap _{j=1}^n B_j. $$

(9.5.3)

Then

$$ \mathbf{P}(G_n)=\mathbf{P}(G_nB)+\mathbf{P}(G_n \overline{B}),\qquad \overline{B}=\bigcup _{j=1}^n \overline{B}_j, $$

(9.5.4)

where

$$ \sum_{j=1}^n\mathbf{P}(G_n \overline{B}_j)\ge\mathbf{P}(G_n\overline{B})\ge\sum _{j=1}^n\mathbf{P} (G_n \overline{B}_j)- \sum_{i < j \le n} \mathbf{P}(G_n\overline{B}_i\overline{B}_j) $$

(9.5.5)

(see property 8 in Sect. 9.2.2).

The proof is divided into three stages: the bounding of P(G _n B), that of $\mathbf{P}(G_{n}\overline{B}_{i}\overline{B}_{j})$, i≠j, and the evaluation of $\mathbf{P}(G_{n}\overline{B}_{j})$.

(1) A bound on P(G _n B). We will make use of the rough inequality

$$ \mathbf{P}(G_nB)\le\mathbf{P}(S_n \ge x; B ) $$

(9.5.6)

and Theorem A8.2.1 of Appendix 8 which implies that, for x=ry with a fixed r>2, any δ>0, and $x\ge N \sqrt{n\ln n}$, N→∞, we have

$$ \mathbf{P}(S_n\ge x;B )\le\bigl(nV(y) \bigr)^{r-\delta}. $$

(9.5.7)

Here we can always choose r such that

$$ \bigl(nV(x) \bigr)^{r-\delta}\ll n\varDelta v(x) $$

(9.5.8)

for $x\gg\sqrt{n}$. Indeed, putting n:=x ² and comparing the powers of x on the right-hand and left-hand sides of (9.5.8), we obtain that for (9.5.8) to hold it suffices to choose r such that

$$(2-\beta) (r-\delta) < 1 - \beta, $$

which is equivalent, for β>2, to the inequality.

$$r >\frac{\beta-1}{\beta-2}. $$

For such r, we will have that, by (9.5.6)–(9.5.8),

$$ \mathbf{P}(G_nB)=o \bigl(n\varDelta v(x) \bigr). $$

(9.5.9)

Since r−δ>1, we see that, for n≪x ², relations (9.5.8) and (9.5.9) will hold true all the more.

(2) A bound for $\mathbf{P}(G_{n}\overline{B}_{i}\overline{B}_{j} )$. It is sufficient to bound $\mathbf{P}(G_{n}\overline{B}_{n-1}\overline{B}_{n} )$. Set

$$\delta:=\frac{1}{r}<\frac{1}{2},\qquad H_k:= \bigl\{v: \,v< (1-k\delta )x+\varDelta \bigr\},\quad k=1,2. $$

Then

$$\begin{aligned} \mathbf{P}(G_n\overline{B}_{n-1}\overline{B}_n ) =& \int_{H_2}\mathbf{P}(S_{n-2}\in dz ) \\&{} \times\int_{H_1} \mathbf{P}(z+\xi\in dv, \xi\ge\delta x) \mathbf{P}\bigl(v+\xi\in \varDelta [x), \xi\ge\delta x\bigr). \\ \end{aligned}$$

(9.5.10)

Since in the domain H ₁ we have x−v>δx−Δ, the last factor on the right-hand side of (9.5.10) has, by condition [D ₀], the form Δv(x−v)(1+o(1))≤cΔv(x) as x→∞, so the integral over H ₁ in (9.5.10), for x large enough, does not exceed

$$c\varDelta v(x)\mathbf{P}(z+\xi\in H_1;\xi\ge\delta x )\le c\varDelta v(x)V(\delta x). $$

The integral over the domain H ₂ in (9.5.10) evidently allows a similar bound. Since nV(x)→0, we obtain that

$$ \sum_{i< j\le n} \mathbf{P}(G_n\overline{B}_i\overline{B}_j )\le c_1\varDelta n^2v(x)V(x)=o \bigl(\varDelta n v(x) \bigr). $$

(9.5.11)

(3) The evaluation of $\mathbf{P}(G_{n}\overline{B}_{j} )$ is based on the relation

(9.5.12)

which yields

$$\begin{aligned} \mathbf{P}(G_n\overline{B}_n) \le& \varDelta \mathbf{E} \bigl[v(x-S_{n-1}); S_{n-1} < (1-\delta) x+\varDelta \bigr] \bigl(1+o(1) \bigr) \\= & \varDelta v(x) \bigl(1+o(1)\bigr). \end{aligned}$$

(9.5.13)

The last relation is valid for $x\gg\sqrt{n}$, since, by Chebyshev’s inequality, $\mathbf{E} [ v(x-S_{n-1} );|S_{n-1}| \le M\sqrt {n}]\sim v(x)$ as M→∞, $M\sqrt{n}=o(x)$ and, moreover, the following evident bounds hold:

$$\mathbf{E} \bigl[v (x-S_{n-1} ); S_{n-1}\in\bigl(M \sqrt{n}, (1-\delta ) x+\varDelta \bigr) \bigr]=o \bigl(v(x) \bigr), $$

$$\mathbf{E} \bigl[v (x-S_{n-1} );\, S_{n-1}\in(-\infty, -M \sqrt{n}\, ) \bigr]=o \bigl(v(x) \bigr) $$

as M→∞.

Similarly, by (virtue of (9.5.12)) we get

$$ \mathbf{P}(G_n\overline{B}_n ) \ge \int _{-\infty}^{(1-\delta)x} \mathbf{P}(S_{n-1}\in dz )\, \mathbf{P} \bigl( \xi\in \varDelta [x-z) \bigr) \sim \varDelta v(x). $$

(9.5.14)

From (9.5.13) and (9.5.14) we obtain that

$$\mathbf{P}(G_n\overline{B}_n )=\varDelta v(x) \bigl(1+o(1) \bigr). $$

This, together with (9.5.4), (9.5.9) and (9.5.11), yields the representation

$$\mathbf{P}(G_n)=\varDelta nv(x) \bigl(1+o(1) \bigr). $$

The required uniformity of the term o(1) clearly follows from the preceding argument. The theorem is proved. □

Theorem 9.5.2 implies the following

Corollary 9.5.1

Let the conditions of Theorem 9.5.2 be satisfied. Then there exists a fixed sequence Δ _N converging to zero slowly enough as N→∞ such that the assertion of Theorem 9.5.2 remains true when the segment [Δ ₁,Δ ₂] is replaced in it with [Δ _N,Δ ₂].

9.6 Integro-Local Theorems on the Probabilities of Large Deviations of S _n Outside the Cramér Range (Under the Cramér Condition)

We return to the case where the Cramér condition is met. In Sects. 9.3 and 9.4 we obtained integro-local theorems for deviations inside and on the boundary of the Cramér range. It remains to study the asymptotics of P(S _n∈Δ[x)) outside the Cramér range, i.e. for α=x/n>α ₊. Preliminary observations concerning this problem were made in Sect. 9.4.4 where it was reduced to integro-local theorems for the sums S _n when Cramér’s condition is not satisfied. Recall that in that case we had to restrict ourselves to considering distributions from the class $\mathcal{ER} $ defined in Sect. 9.4.3 (see (9.4.4)).

Theorem 9.6.1

Let $\mathbf{F}\in\mathcal{ER}$, β>3, α=x/n>α ₊ and $y=x-\alpha_{+} n\gg\sqrt{n}$. Then there exists a fixed sequence Δ _N converging to zero slowly enough as N→∞, such that

$$\begin{aligned} \mathbf{P} \bigl(S_n\in \varDelta _N[x) \bigr) =&e^{-n\varLambda(\alpha _+)-\lambda_+ y}n\varDelta _N v_+(y) \bigl(1+o(1) \bigr) \\=&e^{-n\varLambda(\alpha)}n\varDelta _N v_+(y) \bigl(1+o(1) \bigr), \end{aligned}$$

where v ₊(y)=λ ₊ V(y)/ψ(λ ₊), the remainder term o(1) is uniform in x and n such that $y\gg N\sqrt{n\ln n}$, N being an arbitrary fixed sequence tending to ∞.

Proof

By Theorem 9.2.1A there exists a sequence Δ _N converging to zero slowly enough such that (cf. (9.4.3))

$$ \mathbf{P} \bigl(S_n\in \varDelta _N[x) \bigr)=e^{-n\varLambda(\alpha _+)-\lambda_+ y}\,\mathbf{P} \bigl(S_n^{(\alpha_+)}-\alpha_+n\in \varDelta _N[y) \bigr). $$

(9.6.1)

Since by properties (Λ1) and (Λ2) the function Λ(α) is linear for α>α ₊:

$$\varLambda(\alpha)=\varLambda(\alpha_+)+(\alpha-\alpha_+)\lambda_+, $$

the exponent in (9.6.1) can be rewritten as

$$-n\varLambda(\alpha_+)-\lambda_+y=-n\varLambda(\alpha). $$

The right tail of the distribution of $\xi^{(\alpha_{+})}$ has the form (see (9.4.7))

$$\mathbf{P}\bigl(\xi^{(\alpha_+)}\geq t\bigr)=\frac{\lambda_+}{\psi(\lambda_+)}\int _{t}^\infty V(u)\,du+V(t). $$

By the properties of regularly varying functions (see Appendix 6),

$$V(t)-V(t-u)=o( \bigl(V(t) \bigr) $$

as t→∞ for any fixed u. This implies that condition [D ₀] of Sect. 9.5 is satisfied for the distribution of $\xi^{(\alpha_{+})}$.

This means that, in order to calculate the probability on the right-hand side of (9.6.1), we can use Theorem 9.5.2 and Corollary 9.5.1, by virtue of which, as Δ _N→0 slowly enough,

$$\mathbf{P} \bigl(S_n^{(\alpha_+)}-\alpha_+ n\in \varDelta _N[y) \bigr)=n\varDelta _Nv_+(y) \bigl(1+o(1) \bigr), $$

where the remainder term o(1) is uniform in all x and n such that $y\gg N\sqrt{n\ln n}$, N→∞.

The theorem is proved. □

Since P(S _n∈Δ _N[x)) decreases exponentially fast as x (or y) grows (note the factor $e^{-\lambda_{+} y}$ in (9.6.1)), Theorem 9.6.1 immediately implies the following integral theorem.

Corollary 9.6.1

Under the conditions of Theorem 9.6.1,

$$\mathbf{P}(S_n\geq x)=e^{-n\varLambda(\alpha)}\frac{nV(y)}{\psi(\lambda_+)} \bigl(1+o(1) \bigr). $$

Proof

Represent the probability P(S _n≥x) as the sum

$$\begin{aligned} \mathbf{P}(S_n\geq x) =&\sum_{k=0}^\infty \mathbf{P} \bigl(S_n\in \varDelta _N[x+k \varDelta _N) \bigr)\\\sim & e^{-n\varLambda(\alpha)}\frac{n\lambda_+}{\psi(\lambda_+)}\sum _{k=0}^\infty \varDelta _N V(y+ \varDelta _N k) e^{-\lambda_+\varDelta _N k}. \end{aligned}$$

Here the series on the right-hand side is asymptotically equivalent, as N→∞, to the integral

$$V(y)\int_0^\infty e^{-\lambda_+ t}dt= \frac{V(y)}{\lambda_+}. $$

The corollary is proved. □

Note that a similar corollary (i.e. the integral theorem) can be obtained under the conditions of Theorem 9.4.2 as well.

In the range of deviations $\alpha=\frac{x}{n}>\alpha_{+}$, only the case $\mathbf{F}\in\mathcal{ER}$, β∈[2,3] (recall that α ₊=∞ for β<2) has not been considered in this text. As we have already said, it could also be considered, but that would significantly increase the length and complexity of the exposition. Results dealing with this case can be found in [8]; one can also find there a more complete study of large deviation probabilities.

Notes

1.
In the literature, the function E e ^λξ is sometimes called the “moment generating function”.
2.
In some publications the transform (9.1.4) is also called the Esscher transform. However, the systematic use of transform (9.1.4) for the study of large deviations was first done by Cramér.
If we study the probabilities of large deviations of sums of random variables using the inversion formula, similarly to what was done for normal deviations in Chap. 8, then we will necessarily come to employ the so-called saddle-point method, which consists of moving the contour of integration so that it passes through the so-called saddle point, at which the exponent in the integrand function, as we move along the imaginary axis, attains its minimum (and, along the real axis, attains its maximum; this explains the name “saddle point”). Cramér’s transform does essentially the same, making such a translation of the contour of integration even before applying the inversion formula, and reduces the large deviation problem to the normal deviation problem, where the inversion formula is not needed if we use the results of Chap. 8. It is this technique that we will follow in the present chapter.

References

Borovkov, A.A.: Mathematical Statistics. Gordon & Breach, Amsterdam (1998)
Google Scholar
Borovkov, A.A., Borovkov, K.A.: Asymptotic Analysis of Random Walks. Heavy-Tailed Distributions. Cambridge University Press, Cambridge (2008)
Google Scholar

Download references

Author information

Authors and Affiliations

Sobolev Institute of Mathematics and Novosibirsk State University, Novosibirsk, Russia
Alexandr A. Borovkov

Authors

Alexandr A. Borovkov
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Borovkov, A.A. (2013). Large Deviation Probabilities for Sums of Independent Random Variables. In: Probability Theory. Universitext. Springer, London. https://doi.org/10.1007/978-1-4471-5201-9_9

Download citation

DOI: https://doi.org/10.1007/978-1-4471-5201-9_9
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5200-2
Online ISBN: 978-1-4471-5201-9
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Large Deviation Probabilities for Sums of Independent Random Variables

Abstract

Keywords

9.1 Laplace’s and Cramér’s Transforms. The Rate Function

9.1.1 The Cramér Condition. Laplace’s and Cramér’s Transforms

Definition 9.1.1

9.1.2 The Large Deviation Rate Function

Definition 9.1.2

Proof

Proof

Proof

Example 9.1.1

Example 9.1.2

Example 9.1.3

Example 9.1.4

9.2 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms. The Probabilistic Meaning of the Rate Function

9.2.1 A Relationship Between Large Deviation Probabilities for Sums of Random Variables and Those for Sums of Their Cramér Transforms

Theorem 9.2.1

Proof

9.2.2 The Probabilistic Meaning of the Rate Function

Theorem 9.2.2

Proof of Theorem 9.2.2

9.2.3 The Large Deviations Principle

Theorem 9.2.3

Proof

Corollary 9.2.1

Proof

9.3 Integro-Local, Integral and Local Theorems on Large Deviation Probabilities in the Cramér Range

9.3.1 Integro-Local and Integral Theorems

Theorem 9.3.1

Proof

Theorem 9.3.2

Proof

Corollary 9.3.1

Proof of Corollary 9.3.1

9.3.2 Local Theorems

Theorem 9.3.3

Proof

Remark 9.3.1

Theorem 9.3.4

Proof

9.4 Integro-Local Theorems at the Boundary of the Cramér Range

9.4.1 Introduction

9.4.2 The Probabilities of Large Deviations of S n in an o(n)-Vicinity of the Point α + n; the Case ψ″(λ +)<∞

Theorem 9.4.1

Proof

Theorem 9.2.1A

Proof of Theorem 9.2.1A

9.4.3 The Class of Distributions \(\mathcal{ER}\). The Probability of Large Deviations of S n in an o(n)-Vicinity of the Point α + n for Distributions F from the Class \(\mathcal{ER}\) in Case ψ″(λ +)=∞

Definition 9.4.1

Lemma 9.4.1

Proof of Lemma 9.4.1

Theorem 9.4.2

Proof of Theorem 9.4.2

9.4.4 On the Large Deviation Probabilities in the Range α>α + for Distributions from the Class \(\mathcal{ER}\)

9.5 Integral and Integro-Local Theorems on Large Deviation Probabilities for Sums S n when the Cramér Condition Is not Met

9.5.1 Integral Theorems

Theorem 9.5.1

9.5.2 Integro-Local Theorems

Theorem 9.5.2

Proof

Corollary 9.5.1

9.6 Integro-Local Theorems on the Probabilities of Large Deviations of S n Outside the Cramér Range (Under the Cramér Condition)

Theorem 9.6.1

Proof

Corollary 9.6.1

Proof

Notes

References

Author information

Authors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation

9.4.2 The Probabilities of Large Deviations of S _n in an o(n)-Vicinity of the Point α ₊ n; the Case ψ″(λ ₊)<∞

9.4.3 The Class of Distributions \(\mathcal{ER}\). The Probability of Large Deviations of S _n in an o(n)-Vicinity of the Point α ₊ n for Distributions F from the Class \(\mathcal{ER}\) in Case ψ″(λ ₊)=∞

9.4.4 On the Large Deviation Probabilities in the Range α>α ₊ for Distributions from the Class \(\mathcal{ER}\)

9.5 Integral and Integro-Local Theorems on Large Deviation Probabilities for Sums S _n when the Cramér Condition Is not Met

9.6 Integro-Local Theorems on the Probabilities of Large Deviations of S _n Outside the Cramér Range (Under the Cramér Condition)