
In this chapter we sum independent random variables \(X_i\) and discuss what happens to the distribution of their sum, \(Y = \sum _i X_i\). We shall see that the distribution of Y is given by the convolution of the distributions of the individual \(X_i\)’s and that, as the number of terms tends to infinity—under certain conditions—the distribution of Y tends to a stable distribution, relevant for the processes of random walks.

1 Convolution of Continuous Distributions

What is the distribution of \(Z=X+Y\) if continuous random variables X and Y correspond to densities \(f_X(x)\) and \(f_Y(y)\)? We are interested in the probability that the sum \(x+y\) falls within the interval \([z,z+\mathrm {d}z]\), where x and y are arbitrary within their own definition domains. All points fulfilling this requirement are represented by the oblique shaded area in the figure.

 

figure a

One must add up all contributions to the probability within this band. The infinitesimal area \(\mathrm {d}x\,\mathrm {d}z\) (shaded rhomboid) corresponds to the infinitesimal probability \(f_X(x)f_Y(y)\,\mathrm {d}x\,\mathrm {d}z\). By integrating over x we obtain the probability \(f_Z(z)\,\mathrm {d}z\). Let us write only its density and insert \(y=z-x\):

$$\begin{aligned} f_Z(z) = (f_X *f_Y)(z) = \int _{-\infty }^\infty f_X(x) f_Y(z-x) \, \mathrm {d}x . \end{aligned}$$
(6.1)

This operation is called the convolution of distributions and we denote it by the symbol \(*\). If you do not trust this geometric argument, one can also reason as follows:

$$\begin{aligned} f_Z(z)\,\mathrm {d}z= & {} P( z\le Z\le z+\mathrm {d}z ) = P( z\le X+Y\le z+\mathrm {d}z ) \\= & {} \int _{-\infty }^\infty \mathrm {d}x \int _{z-x}^{z-x+\mathrm {d}z} f_X(x) f_Y(y) \, \mathrm {d}y = \int _{-\infty }^\infty f_X(x) \mathrm {d}x \underbrace{\int _{z-x}^{z-x+\mathrm {d}z} f_Y(y) \, \mathrm {d}y}_{f_Y(z-x)\,\mathrm {d}z} , \end{aligned}$$

whence (6.1) follows immediately. Convolution is a symmetric operation:

$$ \bigl (f_X *f_Y\bigr )(z) = \int _{-\infty }^\infty f_X(x) f_Y(z-x) \,\mathrm {d}x = \int _{-\infty }^\infty f_X(z-y) f_Y(y) \,\mathrm {d}y = \bigl (f_Y *f_X\bigr )(z) . $$

A convolution of three probability distributions is calculated as follows:

$$ \bigl (f_1 *f_2 *f_3\bigr )(z) = \int _{-\infty }^\infty \int _{-\infty }^\infty f_1(x_1) \, f_2(x_2) \, f_3(z-x_1-x_2) \, \mathrm {d}x_1 \, \mathrm {d}x_2 , $$

and generalizations of higher order are obvious.

Fig. 6.1  Twofold consecutive convolution of \(U(-1/2,1/2)\) with itself

Example

What do we obtain after two consecutive convolutions of a symmetric uniform distribution \(U(-1/2,1/2)\), corresponding to the “box” probability density f shown in Fig. 6.1? The first convolution yields

$$ g(x) = \bigl ( f *f \bigr )(x) = \int \limits _{-\infty }^\infty f(x') \, f(x-x') \, \mathrm {d}x' = \left\{ \begin{array}{lcl} \displaystyle \int \limits _{-1/2}^{x+1/2} \mathrm {d}x' = 1+x \!\!&{};&{}\!\!\quad -1 \le x \le 0 , \\ \displaystyle \int \limits _{x-1/2}^{1/2} \mathrm {d}x' = 1-x \!\!&{};&{}\!\! \,0 \le x \le 1 , \end{array} \right. $$

which is a triangular distribution (\(f*f\) in the figure). The second convolution gives

$$\begin{aligned} \bigl ( f *f *f \bigr )(x)= & {} \int \limits _{-\infty }^\infty f(x') \, g(x-x') \, \mathrm {d}x' \\= & {} \left\{ \begin{array}{lcl} \displaystyle \int \limits _{-1/2}^{x+1} \bigl [ \bullet \bigr ] \,\mathrm {d}x' = {9\over 8} + {3x\over 2} + {x^2\over 2} &{};&{} {-{3\over 2} \le x \le -{1\over 2}} , \\ \displaystyle \int \limits _{-1/2}^{x} \bigl [ \bullet \bigr ]\,\mathrm {d}x' +\displaystyle \int \limits _{x}^{1/2} \bigl [ \bullet \bigr ]\,\mathrm {d}x' = - {x^2\over 2} + {3\over 4} &{};&{} \,\,\qquad |x| \le {1\over 2} , \\ \displaystyle \int \limits _{x-1}^{1/2} \bigl [ \bullet \bigr ] \,\mathrm {d}x' = {9\over 8} - {3x\over 2} + {x^2\over 2} &{};&{} {\,\,{1\over 2} \le x \le {3\over 2}} , \\ \end{array} \right. \end{aligned}$$

where \(\bullet = 1 + (x-x')\). This density is denoted by \(f*f*f\) in the figure. Try to proceed yet another step and calculate \(f*f*f*f\)! (You shall see in an instant where this is leading.)     \(\triangleleft \)
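These piecewise expressions are easy to verify numerically. The sketch below is not from the book; it approximates the continuous convolution (6.1) by a discrete convolution on a uniform grid (numpy assumed), scaled by the grid spacing:

```python
import numpy as np

# Approximate the continuous convolution (6.1) on a uniform grid:
# (f*g)(x) ~ h * sum_k f(x_k) g(x - x_k), where h is the grid spacing.
h = 1e-3
x = np.arange(-2.0, 2.0, h)
f = np.where(np.abs(x) <= 0.5, 1.0, 0.0)      # density of U(-1/2, 1/2)

ff = np.convolve(f, f, mode="same") * h        # triangular density f*f
fff = np.convolve(ff, f, mode="same") * h      # piecewise-quadratic f*f*f

i0 = np.argmin(np.abs(x))                      # grid point closest to x = 0
print(ff[i0], fff[i0])                         # ~1 and ~3/4, up to O(h) error
```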

Example

What about the convolution of an asymmetric distribution? For instance, what is the distribution of the variable \(Y = X_1 + X_2 + \cdots + X_n\) if all \(X_i\) are uniformly distributed on [0, 1], i.e. \(X_i \sim U(0,1)\)?

The density of the variable Y for arbitrary \(n \ge 1\) is

$$\begin{aligned} f_Y(x) = {1\over (n-1)!} \sum _{h=0}^{\lfloor {x}\rfloor } \left( {n\atop h} \right) (-1)^h (x-h)^{n-1} , \qquad n \ge 1 , \end{aligned}$$
(6.2)

and is shown in Fig. 6.2 for \(n=1\) (original distribution), \(n=2\) (single convolution), \(n=3\), \(n=6\) and \(n=12\). As in the previous example, the density after several convolutions starts to resemble something “bell-shaped”; one suspects the normal distribution. Moreover, the distribution of the sum variable creeps away from the origin: this is a cue for the following subsection.     \(\triangleleft \)
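Formula (6.2) can be checked against a direct simulation; the following sketch (illustrative only, assuming numpy) draws \(10^5\) sums of \(n=6\) uniform variables and compares the histogram with the closed-form density at a few points:

```python
import numpy as np
from math import comb, factorial, floor

def f_Y(x, n):
    """Density (6.2) of the sum of n independent U(0,1) variables."""
    return sum((-1)**h * comb(n, h) * (x - h)**(n - 1)
               for h in range(floor(x) + 1)) / factorial(n - 1)

rng = np.random.default_rng(1)
n = 6
samples = rng.random((100_000, n)).sum(axis=1)    # 10^5 sums of six U(0,1)

hist, edges = np.histogram(samples, bins=60, range=(0, n), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for x in (1.0, 2.5, 3.0):
    i = np.argmin(np.abs(centers - x))
    print(f"x={x}: formula {f_Y(x, n):.4f}, histogram {hist[i]:.4f}")
```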

Fig. 6.2  Multiple convolutions of the U(0, 1) distribution with itself

1.1 The Effect of Convolution on Distribution Moments

First consider what happens to the average of the sum of two random variables:

$$\begin{aligned} \overline{Z} \!= & {} \! \int _{-\infty }^\infty z \, f_Z(z) \,\mathrm {d}z = \int _{-\infty }^\infty z \left[ \int _{-\infty }^\infty f_X(x) f_Y(z-x) \,\mathrm {d}x \right] \,\mathrm {d}z \\ \!= & {} \! \int _{-\infty }^\infty \left[ \int _{-\infty }^\infty z \, f_Y(z-x) \,\mathrm {d}z \right] f_X(x) \,\mathrm {d}x = \int _{-\infty }^\infty \left[ \int _{-\infty }^\infty (x+y) f_Y(y) \,\mathrm {d}y \right] f_X(x) \,\mathrm {d}x \\ \!= & {} \! \int _{-\infty }^\infty x \left[ \int _{-\infty }^\infty f_Y(y) \,\mathrm {d}y \right] f_X(x) \,\mathrm {d}x +\int _{-\infty }^\infty f_X(x) \,\mathrm {d}x \int _{-\infty }^\infty y\,f_Y(y) \,\mathrm {d}y = \overline{X} + \overline{Y} , \end{aligned}$$

thus

$$ \overline{X+Y} = \overline{X} + \overline{Y} $$

or \(E[X+Y] = E[X] + E[Y]\), which we already know from (4.6). Now let us also calculate the variance of Z! We must average the expression

$$ \bigl ( Z - \overline{Z} \bigr )^2 = \bigl [ \bigl ( X - \overline{X} \bigr ) +\bigl ( Y - \overline{Y} \bigr ) \bigr ]^2 = \bigl ( X - \overline{X} \bigr )^2 +2\bigl ( X - \overline{X} \bigr )\bigl ( Y - \overline{Y} \bigr ) + \bigl ( Y - \overline{Y} \bigr )^2 . $$

Because X and Y are independent, the expected value of the second term is zero, so we are left with only

$$ \sigma _{X+Y}^2 = \sigma _X^2 + \sigma _Y^2 $$

or \(\mathrm {var}[X+Y] = \mathrm {var}[X] + \mathrm {var}[Y]\). We know this too, from (4.20), and in a slightly different garb from (4.25) if one sets \(Y=X_1+X_2\). As an exercise, check what happens to the third and fourth moments upon convolution: you will find that \(M_{3,X+Y} = M_{3,X} + M_{3,Y}\), so the third moments of distributions are additive. By taking into account the definition of skewness \(\rho = M_3 / \sigma ^3\) (see (4.18)) this can also be written as

$$ \rho _{X+Y} \sigma _{X+Y}^3 = \rho _{X} \sigma _{X}^3 +\rho _{Y} \sigma _{Y}^3 . $$

The fourth moments are not additive, since \(M_{4,X+Y} = M_{4,X} + M_{4,Y} + 6 M_{2,X} M_{2,Y}\), but by using (4.19) this can be simplified to

$$ \varepsilon _{X+Y} \sigma _{X+Y}^4 = \varepsilon _{X} \sigma _{X}^4 +\varepsilon _{Y} \sigma _{Y}^4 . $$
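These relations are easy to test by a quick Monte Carlo experiment. The sketch below (not from the book, numpy assumed) uses an exponential and a uniform variable, whose central moments are known:

```python
import numpy as np

# Check how central moments combine under convolution:
# M3 is additive, while M4_{X+Y} = M4_X + M4_Y + 6 M2_X M2_Y.
rng = np.random.default_rng(0)
N = 2_000_000
X = rng.exponential(1.0, N)          # Exp(1): M2 = 1, M3 = 2, M4 = 9
Y = rng.uniform(0.0, 1.0, N)         # U(0,1): M2 = 1/12, M3 = 0, M4 = 1/80

def central_moment(a, k):
    return np.mean((a - a.mean())**k)

M2X, M2Y = central_moment(X, 2), central_moment(Y, 2)
print(central_moment(X + Y, 3),
      central_moment(X, 3) + central_moment(Y, 3))                   # both ~2
print(central_moment(X + Y, 4),
      central_moment(X, 4) + central_moment(Y, 4) + 6 * M2X * M2Y)   # both ~9.5
```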

Example

It appears as if even the most “weird” distribution evolves into something “bell-shaped” when it is convoluted with itself a couple of times. Figure 6.3 shows an example in which upon just two convolutions a rather irregular density (fulfilling all requirements for a probability density) turns into a density closely resembling the standardized normal distribution.     \(\triangleleft \)

Fig. 6.3  As few as two convolutions may be needed to turn a relatively irregular distribution into a distribution that looks almost like the standardized normal. (We have subtracted the expected value of the distribution obtained at each step and rescaled the variance)

Example

Still, convolution does not perform miracles. Let us calculate the n-fold convolution of the Cauchy distribution with itself! We obtain

$$\begin{aligned} f^{(1)}(x)= & {} f(x) = {1\over \pi } {1\over 1+x^2} , \nonumber \\ f^{(2)}(x)= & {} \bigl (f*f\bigr )(x) = {1\over \pi } {2\over 4+x^2} , \nonumber \\ f^{(3)}(x)= & {} \bigl (f*f*f\bigr )(x) = {1\over \pi } {3\over 9+x^2} , \nonumber \\&\vdots&\nonumber \\ f^{(n)}(x)= & {} \bigl ( \underbrace{f*f*\cdots *f}_{n} \bigr )(x) = {1\over \pi } {n\over n^2+x^2} . \end{aligned}$$
(6.3)

Certainly \(f^{(n)}\) does not approach the density of the normal distribution; rather, it remains faithful to its ancestry. Consecutive convolutions yield just further Cauchy distributions! We say that the Cauchy distribution is stable with respect to convolution. The reasons for this behaviour will be discussed below.     \(\triangleleft \)

2 Convolution of Discrete Distributions

The discrete analog of the continuous convolution formula (6.1) for the summation of independent discrete random variables X and Y is at hand: if X takes the value i, then Y must be \(n-i\) if their sum is to be n. Since X and Y are independent, the probabilities for such an “event” should be multiplied, thus

$$\begin{aligned} P(X+Y=n) = \sum _i P(X=i,Y=n-i) = \sum _i P(X=i)P(Y=n-i) \end{aligned}$$
(6.4)

or

$$ f_{X+Y}(n) = \sum _i f_X(i) f_Y(n-i) . $$

Example

Let us demonstrate that the convolution of two Poisson distributions is still a Poisson distribution! Let \(X \sim \mathrm {Poisson}(\lambda )\) and \(Y \sim \mathrm {Poisson}(\mu )\) be mutually independent Poisson variables with parameters \(\lambda \) and \(\mu \). For their sum \(Z=X+Y\) one then has

$$\begin{aligned} P(Z=n)= & {} \sum _{i=0}^n P(X=i,Y=n-i) = \sum _{i=0}^n P(X=i)P(Y=n-i) \\= & {} \sum _{i=0}^n {\lambda ^i \mathrm {e}^{-\lambda } \over i!} {\mu ^{(n-i)} \mathrm {e}^{-\mu } \over (n-i)!} \!=\! {\mathrm {e}^{-(\lambda +\mu )} \over n!} \sum _{i=0}^n {n!\over i!(n-i)!} \lambda ^i \mu ^{n-i} \!=\! {\mathrm {e}^{-(\lambda +\mu )} (\lambda +\mu )^n \over n!} , \end{aligned}$$

thus indeed \(Z \sim \mathrm {Poisson}(\lambda +\mu )\). A more elegant solution of this problem will be given by the Example on p. 369 in Appendix B.1.     \(\triangleleft \)
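The same conclusion can be verified numerically by convolving the two probability mass functions, e.g. with the following sketch (scipy assumed; the supports are truncated at 39):

```python
import numpy as np
from scipy.stats import poisson

# Discrete convolution of two Poisson pmfs vs. the Poisson(lam + mu) pmf.
lam, mu = 2.0, 3.5
n = np.arange(0, 40)
pX, pY = poisson.pmf(n, lam), poisson.pmf(n, mu)

pZ = np.convolve(pX, pY)[:n.size]      # pmf of Z = X + Y on 0..39
print(np.max(np.abs(pZ - poisson.pmf(n, lam + mu))))   # ~1e-16
```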

Example

Let us compute the probability distribution of the sum \(Z=X+Y\) of independent discrete random variables X and Y, distributed according to

$$ f_n = P(X\!=\!n) = \left\{ \begin{array}{lcl} 0.15 &{};&{} n = -3 , \\ 0.25 &{};&{} n = -1 , \\ 0.1 &{};&{} n = 2 , \\ 0.3 &{};&{} n = 6 , \\ 0.2 &{};&{} n = 8 , \\ 0 &{};&{} \mathrm {otherwise} , \end{array} \right. \quad g_n = P(Y\!=\!n) = \left\{ \begin{array}{lcl} 0.2 &{};&{} n = -2 , \\ 0.1 &{};&{} n = 1 , \\ 0.3 &{};&{} n = 5 , \\ 0.4 &{};&{} n = 8 , \\ 0 &{};&{} \mathrm {otherwise} . \end{array} \right. $$

The distributions are shown in Fig. 6.4 (left) [1].

Fig. 6.4  Discrete convolution in the case when the distributions have different supports. [Left] Distributions f and g. [Right] Convolution of f and g

In principle we are supposed to find all values \(P(Z=z)\), so we must compute the convolution sum \(\{h\} = \{f\} *\{g\}\) for each n separately:

$$ h_n = P(Z=n) = \sum _{j=-\infty }^\infty f_j g_{n-j} . $$

To make the point, let us just calculate the probability that \(X+Y=4\). We need

$$ h_4 = \sum _{j=-\infty }^\infty f_j \, g_{4-j} = \underbrace{f_{-1}\,g_{5}}_{0.25\,\cdot \,0.3} + \underbrace{f_{6}\,g_{-2}}_{0.3\,\cdot \,0.2} = 0.135 ; $$

all other products \(f_j g_{4-j}\) vanish, since whenever \(f_j \ne 0\) the factor \(g_{4-j}\) is zero (and vice versa). Such a procedure must be repeated for each n: a direct calculation of convolutions may become somewhat tedious. The problem can also be solved by using generating functions, as demonstrated by the Example on p. 374 in Appendix B.2.     \(\triangleleft \)
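For completeness, a minimal sketch (not the book's code) that carries out the whole discrete convolution with plain Python dictionaries keyed by the support points:

```python
# Supports and probabilities of f and g from the example above.
f = {-3: 0.15, -1: 0.25, 2: 0.10, 6: 0.30, 8: 0.20}
g = {-2: 0.20,  1: 0.10, 5: 0.30, 8: 0.40}

h = {}                                   # h_n = P(X + Y = n)
for i, pf in f.items():
    for j, pg in g.items():
        h[i + j] = h.get(i + j, 0.0) + pf * pg

print(h[4])                              # 0.25*0.3 + 0.3*0.2 = 0.135
print(sum(h.values()))                   # 1.0 (sanity check)
```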

3 Central Limit Theorem

Let \(X_1,X_2,\ldots ,X_n\) be real, independent and identically distributed random variables with probability density \(f_X\), whose expected value \(\mu _X = E[X_i]\) and variance \(\sigma _X^2 = E[(X_i - \mu _X)^2]\) are bounded. Define the sum of random variables \(Y_n=\sum _{i=1}^n X_i\). By (4.6) and (4.22), the expected value and variance of \(Y_n\) are \(E[Y_n] = \mu _{Y_n} = n \mu _X\) and \(\sigma ^2_{Y_n} = n \sigma _X^2\), respectively. The probability density \(f_Y\) of the sum variable \(Y_n\) is given by the n-fold convolution of the densities of the \(X_i\)’s,

$$ f_{Y_n} = \underbrace{f_X *f_X *\cdots *f_X}_{n} . $$

The example in Fig. 6.2 has revealed that the average of the probability density, calculated by consecutive convolutions of the original density, kept on increasing: in that case, the average in the limit \(n\rightarrow \infty \) even diverges! One sees that the variance keeps on growing as well. Both problems can be avoided by defining a rescaled variable

$$ Z_n = {Y_n-\mu _{Y_n} \over \sigma _{Y_n}} = {Y_n-n\mu _X \over \sqrt{n} \sigma _X} . $$

This ensures that upon subsequent convolutions, the average of the currently obtained density is subtracted and its variance is rescaled: see Fig. 6.3. In the limit \(n\rightarrow \infty \) the distribution function of the variable \(Z_n\) then converges to the distribution function of the standardized normal distribution N(0, 1),

$$ \lim _{n\rightarrow \infty } P(Z_n \le z) = \Phi (z) = \frac{1}{\sqrt{2\pi }} \int _{-\infty }^z \mathrm {e}^{-t^2/2} \, \mathrm {d}t , $$

or, in the language of probability densities,

$$ \lim _{n\rightarrow \infty } \sigma _{Y_n} f_{Y_n} \left( \sigma _{Y_n} z + \mu _{Y_n} \right) = \frac{1}{\sqrt{2\pi }} \, \mathrm {e}^{-z^2/2} . $$

In other words, the dimensionless probability density \(\sigma _{Y_n} f_{Y_n}\) converges to the standardized normal probability density in the limit \(n\rightarrow \infty \), which is known as the central limit theorem (CLT).

3.1 Proof of the Central Limit Theorem

The central limit theorem can be proven in many ways: one way is to exploit our knowledge of moment-generating functions from Appendix B.2. Suppose that the moment-generating function of the variables \(X_i\) exists and is finite for all t in some neighborhood of \(t=0\). Then for each standardized variable \(U_i = (X_i - \mu _X)/\sigma _X\), for which \(E[U_i]=0\) and \(\mathrm {var}[U_i] = 1\) (thus also \(E[U_i^2]=1\)), there exists a corresponding moment-generating function

$$ M_{U_i}(t) = E\bigl [ \mathrm {e}^{tU_i} \bigr ] , $$

which is the same for all \(U_i\). Its Taylor expansion in the vicinity of \(t=0\) is

$$\begin{aligned} M_U(t) = 1 + t \underbrace{E[U]}_{0} + {t^2\over 2!} \, \underbrace{E[U^2]}_{1} + {t^3\over 3!} \, E[U^3] + \cdots = 1 + {t^2\over 2} + {\mathcal{O}}\bigl ( t^3 \bigr ) . \end{aligned}$$
(6.5)

Let us introduce the standardized variable

$$ Z_n = (U_1 + U_2 + \cdots + U_n) / \sqrt{n} = (X_1 + X_2 + \cdots + X_n - n \mu _X) / (\sigma _X\sqrt{n}) . $$

Its moment-generating function is \(M_{Z_n}(t) = E\bigl [ \mathrm {e}^{tZ_n} \bigr ]\). Since the variables \(X_i\) are mutually independent, this also applies to the rescaled variables \(U_i\), therefore, by formula (B.16), we get

$$ E\bigl [ \mathrm {e}^{tZ_n} \bigr ] = E\bigl [ \mathrm {e}^{t(U_1+U_2+\cdots +U_n)/\sqrt{n}} \bigr ] = E\bigl [ \mathrm {e}^{(t/\sqrt{n})U_1} \bigr ] E\bigl [ \mathrm {e}^{(t/\sqrt{n})U_2} \bigr ] \cdots E\bigl [ \mathrm {e}^{(t/\sqrt{n})U_n} \bigr ] $$

or

$$ M_{Z_n}(t) = \left[ M_U\bigl ( t/\sqrt{n} \bigr ) \right] ^n , \qquad n = 1, 2, \ldots $$

By using the expansion of \(M_U\), truncated at second order, we get

$$ M_{Z_n}(t) = \left[ 1 + {t^2\over 2n} + {\mathcal{O}}\bigl ( t^3/n^{3/2} \bigr ) \right] ^n , \qquad n=1,2,\ldots $$

Hence

$$ \lim _{n\rightarrow \infty } M_{Z_n}(t) = \lim _{n\rightarrow \infty } \left( 1 + {t^2\over 2n} \right) ^n = \mathrm {e}^{t^2/2} . $$

We know from (B.13) that this is precisely the moment-generating function corresponding to the normal distribution N(0, 1), so indeed

$$ f_Z(z) = {1\over \sqrt{2\pi }} \, \mathrm {e}^{-z^2/2} , $$

which we set out to prove. A direct proof (avoiding the use of generating functions) can be found in [2]; it proceeds along the same lines as the proof of the Laplace limit theorem in Appendix B.3.1.

The speed of convergence to the standardized normal distribution N(0, 1) with the distribution function \(\Phi (z)\) is quantified by the Berry–Esséen theorem [2]. If the third moment of \(|X - \mu _X|\) is bounded (\(\rho = E[|X - \mu _X|^3] < \infty \)), it holds that

$$ \left| \, P(Z_n \le z) - \Phi (z) \, \right| \le \frac{C \rho }{\sqrt{n}\sigma _X^3} , $$

where \(0.4097 \lesssim C \lesssim 0.4748\) [3]. Now we also realize why consecutive convolutions in (6.3) have not led us to the normal distribution: no moments exist for the Cauchy distribution (see Sect. 4.7.1), so the condition \(\rho < \infty \) is not fulfilled. Moreover, the Taylor expansion (6.5) cannot even be written down, since the Cauchy distribution possesses no moment-generating function.

The central limit theorem and the form of the bound on the speed of convergence remain valid when summing variables \(X_i\) distributed according to different (non-identical) probability distributions, if the variables are not too “dispersed” (Lindeberg criterion, see [2]). An excellent (and critical) commentary on “why normal distributions are normal” is also given by [4].

Example

Let us revisit the convolution of the uniform distribution in Fig. 6.2. We sum twelve mutually independent variables \(X_i \sim U(0,1)\) and subtract 6,

$$\begin{aligned} Y = \sum _{i=1}^{12} X_i - 6 . \end{aligned}$$
(6.6)

What are we supposed to get? The averages of all \(X_i\) are 1 / 2, \(E[X_i]=1/2\), while their variances are \(\mathrm {var}[X_i] = 1/12\) (see Table 4.1). Hence, Y should also have an average of zero and a variance of \(\mathrm {var}[Y] = \mathrm {var}[X_1] + \cdots + \mathrm {var}[X_{12}] = 12/12 = 1\). By the central limit theorem, Y should be almost normally distributed, if we believe that \(12 \approx \infty \). How well this holds is shown in Fig. 6.5.
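A few lines of numpy reproduce this generator; the sketch below (illustrative, not the book's code) confirms the mean and variance and shows that the tails of Y are thinner than those of N(0, 1):

```python
import numpy as np
from scipy.stats import norm

# "Convolution" generator (6.6): the sum of twelve U(0,1) variables minus 6.
rng = np.random.default_rng(42)
Y = rng.random((1_000_000, 12)).sum(axis=1) - 6.0

print(Y.mean(), Y.var())                 # ~0 and ~1, as required
# the generated tail is thinner than the normal tail 1 - Phi(3) ~ 1.35e-3
print(np.mean(Y > 3.0), 1.0 - norm.cdf(3.0))
```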

Fig. 6.5  Histogram of \(10^7\) values Y, randomly generated according to (6.6), compared to the density of the standardized normal distribution (3.10). In effect, the figure also shows the deviation of (6.2) from the normal density. The sharp cut-offs at \(\approx -5\) and \(\approx 4.7\) are random: by drawing a larger number of values the histogram would fill the whole interval \([-6,6]\)

We have thus created a primitive “convolution” generator of approximately normally distributed numbers, but with its tails cut off since Y can never exceed 6 and can never drop below \(-6\). It is a practical generator—which does not mean that it is good. How a “decent” generator of normally distributed random numbers can be devised will be discussed in Sect. C.2.5.     \(\triangleleft \)

Example

(Adapted from [5].) The mass M of granules of a pharmaceutical ingredient is a random variable, distributed according to the probability density

$$\begin{aligned} f_M(m) = {1\over 24 m_0^5} \, m^4 \mathrm {e}^{-m/m_0} , \quad m \ge 0 , \quad m_0 = 40\,\mathrm {mg} . \end{aligned}$$
(6.7)

To analyze the granulate, we acquire a sample of 30 granules. What is the probability that the total mass of the granules in the sample exceeds its average value by more than \(10\%\)?

The average mass of a single granule and its variance are

$$ \overline{M} = \int _0^\infty m f_M(m) \,\mathrm {d}m = 5 m_0 , \qquad \sigma _M^2 = \int _0^\infty \bigl ( m - \overline{M} \bigr )^2 f_M(m) \, \mathrm {d}m = 5 m_0^2 . $$

The probability density \(f_X\) of the total sample mass X, which is also a random variable, is a convolution of thirty densities of the form (6.7); this number is large enough to invoke the central limit theorem, so the density \(f_X\) is almost normal, with average \(\overline{X} = 30\,\overline{M} = 150\,m_0\) and variance \(\sigma _X^2 = 30\,\sigma _M^2 = 150\,m_0^2\):

$$ f_X(x) \approx f_{\mathrm {norm}}\bigl ( x; \overline{X}, \sigma _X^2 \bigr ) = f_{\mathrm {norm}}\bigl ( x; 150\,m_0, 150\,m_0^2 \bigr ) . $$

The desired probability is then

$$ P( X \! > \! 165\,m_0 ) \approx \int \limits _{165\,m_0}^\infty f_{\mathrm {norm}}\bigl ( x; \overline{X}, \sigma _X^2 \bigr ) \,\mathrm {d}x = {1\over 2} \left[ \! 1 - \mathrm {erf}\left( { (165-150)m_0 \over \sqrt{2}\sqrt{150}\,m_0 } \right) \! \right] \approx 11\% , $$

where we have used Table D.2.     \(\triangleleft \)
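The density (6.7) is a gamma density with shape 5 and scale \(m_0\), so the total mass of 30 granules is exactly gamma-distributed with shape 150; this provides an independent check of the CLT approximation. A sketch assuming scipy, with masses expressed in units of \(m_0\):

```python
from math import erf, sqrt
from scipy.stats import gamma

# CLT approximation used above: mean 150, variance 150 (in units of m0).
p_clt = 0.5 * (1.0 - erf((165.0 - 150.0) / (sqrt(2.0) * sqrt(150.0))))

# Exact: each granule mass is Gamma(shape 5, scale m0), so the total mass of
# 30 granules is Gamma(shape 150, scale m0).
p_exact = gamma.sf(165.0, a=150.0)

print(p_clt, p_exact)    # ~0.110 and ~0.113: the CLT answer rounds to ~11%
```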

4 Stable Distributions \(\star \)

The normal distribution as the limit distribution of the sum of independent random variables can be generalized by the concept of stable distributions [6, 7].

Suppose we are dealing with independent random variables \(X_1\), \(X_2\) and \(X_3\) with the same distribution over the sample space \(\Omega \). We say that such a distribution is stable if for each pair of positive numbers a and b there exist a positive number c and a real number d such that the distribution of the linear combination \(aX_1+bX_2\) is equal to the distribution of \(cX_3+d\), that is,

$$ P\bigl ( a X_1 + b X_2 \in A \bigr ) = P\bigl ( c X_3 + d \in A \bigr ) \qquad \forall A \subset \Omega . $$

Such random variables are also called ‘stable’; a superposition of stable random variables is a linear function of a stable random variable with the same distribution.

Stable distributions are most commonly described by their characteristic functions (see Appendix B.3). Among many possible notations we follow [6]. We say that a random variable X has a stable distribution \(f_{\mathrm {stab}}(x;\alpha ,\beta ,\gamma ,\delta )\), if the logarithm of its characteristic function (B.17) has the form

$$ \log \phi _X(t) = {\mathrm {i}}\delta t - \gamma ^\alpha |t|^\alpha \bigl [ 1 - {\mathrm {i}}\beta \Phi _\alpha (t) \bigr ] , $$

where

$$ \Phi _\alpha (t) = \left\{ \begin{array}{rcl} \mathrm {sign}(t) \, \tan (\pi \alpha /2) &{};&{} \alpha \ne 1 , \\ -\frac{2}{\pi } \, \mathrm {sign}(t) \log |t| &{};&{} \alpha = 1 . \end{array} \right. $$

The parameter \(\alpha \in (0,2]\) is the stability index or characteristic exponent, the parameter \(\beta \in [-1,1]\) describes the skewness of the distribution, and two further parameters \(\gamma >0\) and \(\delta \in {\mathbb {R}}\) correspond to the distribution scale and location, respectively. For \(\alpha \in (1,2]\) the expected value exists and is equal to \(E[X] = \delta \). For general \(\alpha \in (0,2]\) there exist moments \(E[|X|^p]\), where \(p\in [0,\alpha )\).

Fig. 6.6  Stable distributions \(f_{\mathrm {stab}}(x;\alpha ,\beta ,\gamma ,\delta )\). [Top left and right] Dependence on parameter \(\alpha \) at \(\beta =0.5\) and 1.0. [Bottom left and right] Dependence on parameter \(\beta \) at \(\alpha =0.5\) and 1.0. At \(\alpha \ne 1\) the independent variable is shifted by \(c_{\alpha ,\beta } = \beta \tan (\pi \alpha /2)\)

It is convenient to express X by another random variable Z,

$$ X = \left\{ \begin{array}{lcl} \gamma Z + \delta &{} ; &{} \alpha \ne 1 , \\ \gamma \left( Z + \frac{2}{\pi } \beta \log \gamma \right) + \delta &{} ; &{} \alpha = 1 . \end{array} \right. $$

Namely, the characteristic function of Z is somewhat simpler,

$$ \log \phi _Z(t) = -|t|^\alpha \bigl [ 1 - {\mathrm {i}}\beta \Phi _\alpha (t) \bigr ] , $$

as it depends only on two parameters, \(\alpha \) and \(\beta \). The probability density \(f_Z\) of the variable Z is calculated by the inverse Fourier transformation of the characteristic function \(\phi _Z\):

$$ f_Z(z; \alpha ,\beta ) = \frac{1}{\pi } {\int _0^\infty } \exp (-t^\alpha ) \cos \bigl ( z t - t^\alpha \beta \Phi _\alpha (t) \bigr ) \, \mathrm {d}t , $$

where \(f_Z(-z;\alpha ,\beta ) = f_Z(z;\alpha ,-\beta )\). The values of \(f_Z\) and \(f_X\) can be computed by using integrators tailored to rapidly oscillating integrands: see [8], p. 660; a modest software support for stable distributions can also be found in [9]. With respect to \(\alpha \) and \(\beta \), the definition domains of \(f_Z\) are

$$ z \in \left\{ \begin{array}{lcl} {(-\infty ,0]} &{} ; &{} {\alpha< 1} , \,\, {\beta = -1} , \\ {[0,\infty )} &{} ; &{} \alpha < 1 , \,\, \beta = 1 , \\ {\mathbb {R}}&{} ; &{} \mathrm {otherwise} . \end{array} \right. $$

The dependence of \(f_{\mathrm {stab}}\) (\(f_X\) or \(f_Z\) with appropriate scaling) on the parameter \(\alpha \) is shown in Fig. 6.6 (top left and right), while the dependence on \(\beta \) is shown in the same figure at bottom left and right.
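The integral above can also be evaluated directly with an adaptive quadrature; the sketch below (not the book's code, scipy assumed) implements it for \(\alpha \ne 1\) and checks the result at \(z=0\) against \(f_Z(0;\alpha ,0) = \Gamma (1+1/\alpha )/\pi \), which follows from the formula with \(\beta =0\):

```python
import numpy as np
from math import pi, tan, gamma
from scipy.integrate import quad

def f_Z(z, alpha, beta):
    """Stable density f_Z(z; alpha, beta) by direct integration (alpha != 1)."""
    Phi = tan(pi * alpha / 2.0)                 # Phi_alpha(t) for t > 0
    integrand = lambda t: np.exp(-t**alpha) * np.cos(z * t - beta * Phi * t**alpha)
    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return val / pi

# check against f_Z(0; alpha, 0) = Gamma(1 + 1/alpha) / pi
for alpha in (2.0, 1.5, 0.8):
    print(f_Z(0.0, alpha, 0.0), gamma(1.0 + 1.0 / alpha) / pi)
```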

By a suitable choice of parameters such a general formulation allows for all possible stable distributions. The most relevant ones are the normal distribution (\(\alpha = 2\)), the Cauchy distribution (\(\alpha = 1\), \(\beta = 0\)) and the Lévy distribution (\(\alpha = 1/2\), \(\beta = 1\)), the only cases in which the probability density can be written in closed form.

Stable distributions with \(\alpha \in (0,2)\) have a characteristic behaviour of probability densities known as power or fat tails. The cumulative probabilities satisfy the asymptotic relations

$$\begin{aligned} P( X > x ) \sim {1+\beta \over 2} \, c_\alpha \, \gamma ^\alpha \, x^{-\alpha } , \qquad P( X < -x ) \sim {1-\beta \over 2} \, c_\alpha \, \gamma ^\alpha \, x^{-\alpha } , \qquad x \rightarrow \infty , \end{aligned}$$
(6.8)

where \(c_\alpha = 2 \sin (\pi \alpha /2)\Gamma (\alpha )/\pi \). For \(\beta \in (-1,1)\) such asymptotic behaviour is valid in both limits, \(x\rightarrow \pm \infty \). Note that the probability density has the asymptotics \(\mathcal{O}(|x|^{-\alpha -1})\) if the cumulative probability goes as \(\mathcal{O}(|x|^{-\alpha })\) .

5 Generalized Central Limit Theorem \(\star \)

Having introduced stable distributions (Sect. 6.4) one can formulate the generalized central (or Lévy’s) limit theorem, elaborated more closely in [2]. Here we just convey its essence.

Suppose we have a sequence of independent, identically distributed random variables \(\{ X_i \}_{i\in {\mathbb {N}}}\), from which we form the partial sum

$$ {Y_n} = {X_1} + {X_2} + \cdots + {X_n} . $$

Assume that their distribution has power tails, so that for \(\alpha \in (0,2]\) the following limits exist:

$$ \lim _{x\rightarrow \infty } |x|^\alpha \, P(X > x) = d_+ , \qquad \lim _{x\rightarrow -\infty } |x|^\alpha \, P(X < x) = d_- , $$

and \(d = d_+ + d_- > 0\). Then real coefficients \(a_n > 0\) and \(b_n\) exist such that the rescaled partial sum

$$ Z_n = \frac{Y_n - n b_n}{a_n} $$

in the limit \(n\rightarrow \infty \) is stable, and its probability density is \(f_{\mathrm {stab}}(x; \alpha ,\beta ,1,0)\). Its skewness is given by \(\beta = (d_+ - d_-)/(d_+ + d_-)\), while \(a_n\) and \(b_n\) are

$$\begin{aligned} a_n= & {} \left\{ \begin{array}{lcl} (d\,n/c_\alpha )^{{1/\alpha }} &{};&{} \alpha \in (0,2) , \\ \sqrt{(d \, n\log n)/2} &{};&{} \alpha = 2 , \end{array} \right. \\ b_n= & {} \left\{ \begin{array}{lcl} E\bigl [X_i\bigr ] &{};&{} \alpha \in (1,2] , \\ E\bigl [X_i \,H\bigl (|X_i|-a_n\bigr )\bigr ] &{};&{} \text {otherwise} , \end{array} \right. \end{aligned}$$

where H is the Heaviside function. The constant \(c_\alpha \) is defined next to (6.8). The coefficient \(a_n\) for \(\alpha < 2\) diverges with increasing n as \(\mathcal{O}(n^{1/\alpha })\).

The generalized central limit theorem is useful in analyzing the process of random walk, in which the partial sums \(Y_n\) of random variables are extended step by step. Such processes are discussed in Sects. 6.7 and 6.8. The convergence to the stable distribution as \(n\rightarrow \infty \) becomes more and more “capricious” as \(\alpha \) decreases.
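The scaling \(a_n \sim n^{1/\alpha }\) can be observed in a simple simulation. The sketch below (an illustration under the stated assumptions, not the book's code) sums symmetric steps whose magnitude is Pareto distributed with \(P(|X|>x) = x^{-\alpha }\) and estimates how a robust measure of the spread of \(Y_n\) grows with n:

```python
import numpy as np

# Steps: random sign times a Pareto variable with P(|X| > x) = x**(-alpha),
# x >= 1, so d+ = d- = 1/2 and the limiting stable law is symmetric (beta = 0).
rng = np.random.default_rng(3)
alpha, reps = 1.5, 3000
ns = np.array([30, 100, 300, 1000, 3000])

spread = []
for n in ns:
    X = rng.choice([-1.0, 1.0], (reps, n)) * (1.0 + rng.pareto(alpha, (reps, n)))
    Yn = X.sum(axis=1)
    q10, q90 = np.percentile(Yn, [10, 90])
    spread.append(q90 - q10)              # robust measure of the scale of Y_n

slope = np.polyfit(np.log(ns), np.log(spread), 1)[0]
print(slope, 1.0 / alpha)                 # fitted growth exponent ~2/3 = 1/alpha
```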

6 Extreme-Value Distributions \(\star \)

In Sects. 6.3 and 6.4 we have discussed the distributions of values obtained in summing independent, identically distributed random variables \(\{X_i\}_{i=1}^n\). Now we are interested in statistical properties of their maximal and minimal values, i.e. the behaviour of the quantities

$$\begin{aligned} M_n= & {} \mathrm {max}\{ X_1, X_2, \ldots , X_n \} , \\ \widetilde{M}_n= & {} \mathrm {min}\{ X_1, X_2, \ldots , X_n \} , \end{aligned}$$

when \(n\rightarrow \infty \). We thereby learn something about the probability of extreme events, such as exceptionally strong earthquakes, unusually extensive floods or inconceivably large amounts of precipitation: “It rained for four years, eleven months, and two days.” (See [10], p. 315.) The variables \(X_i\) are the values of the process, usually recorded at constant time intervals—for example, \(n=365\) daily temperature averages on Mt. Blanc—while \(M_n\) is the corresponding annual maximum. We are interested in, say, the probability that on top of Mt. Blanc the temperature of \(+42\,^\circ \mathrm {C}\) will be exceeded on any one day in the next ten years.

In principle, we have already answered these questions—about both the maximal and minimal value—in Problem 2.11.6: if \(F_X\) is the distribution function of the individual \(X_i\)’s, the maximal values \(M_n\) are distributed according to

$$\begin{aligned} F_{M_n}(x) = P\bigl ( M_n \le x \bigr ) = \bigl [ F_X(x) \bigr ]^n , \end{aligned}$$
(6.9)

and the minimal as

$$ 1 - F_{\widetilde{M}_n}(x) = 1 - P\bigl (\widetilde{M}_n \le x \bigr ) = P\bigl (\widetilde{M}_n > x \bigr ) = \bigl [ 1 - F_X(x) \bigr ]^n . $$

But this does not help much, as \(F_X\) is usually not known! A statistical analysis of the observations may result in an approximate functional form of \(F_X\), but even small errors in \(F_X\) (particularly in its tails) may imply large deviations in \(F_X^n\). We therefore accept the fact that \(F_X\) is unknown and try to find families of functions \(F_X^n\), by which extreme data can be modeled directly [11, 12].

 

figure b

There is another problem. Define \(x_+\) as the smallest value x, for which \(F_X(x)=1\). Then for any \(x < x_+\) we get \(F_X^n(x) \rightarrow 0\), when \(n\rightarrow \infty \), so that the distribution function of \(M_n\) degenerates into a “step” at \(x_+\). The figure above shows this in the case of uniformly distributed variables \(X_i\sim U(0,1)\) with probability density \(f_X(x)=1\) (\(0\le x\le 1\)) and distribution function \(F_X(x)=x\) (\(0\le x\le x_+ = 1\)). When \(n\rightarrow \infty \), the distribution function \(F_X^n\) tends to the step (Heaviside) function at \(x=1\), while its derivative (probability density) resembles the delta “function” at the same point. Our goal is to find a non-degenerate distribution function. We will show that this can be accomplished by a rescaling of the variable \(M_n\),

$$\begin{aligned} M_n^*= {M_n - b_n \over a_n} , \end{aligned}$$
(6.10)

where \(a_n > 0\) and \(b_n\) are constants. Illustrations of a suitable choice of these constants or of their calculation are given by the following Example and Exercise in Sect. 6.9.5. A general method to determine these constants is discussed in [13, 14].

Example

Let \(X_1, X_2, \ldots , X_n\) be a sequence of independent, exponentially distributed variables, thus \(F_X(x) = 1 - \mathrm {e}^{-x}\). Let \(a_n = 1\) and \(b_n = \log n\). Then

$$\begin{aligned} P\left( { {M_n} - {b_n} \over {a_n}} \le x \right)= & {} P\bigl ( M_n \le a_n x + b_n \bigr ) = P \bigl ( M_n \le x + \log n \bigr ) \\= & {} \left[ F_X\bigl ( x + \log n \bigr ) \right] ^n = \left[ 1 - \mathrm {e}^{-(x + \log n)} \right] ^n = \Bigl [ 1 - {1\over n} \, \mathrm {e}^{-x} \Bigr ]^n \\\rightarrow & {} \exp \bigl ( -\exp (-x) \bigr ) , \qquad x \in {\mathbb {R}}, \end{aligned}$$

when \(n\rightarrow \infty \). By a suitable choice of \(a_n\) and \(b_n\) we have therefore stabilized the location and scale of the distributions of \(M_n^*\) in the limit \(n\rightarrow \infty \).

Let us repeat this calculation for independent variables with the distribution function \(F_X(x) = \mathrm {e}^{-1/x}\) and for uniformly distributed variables, \(F_X(x) = x\)! In the first case we set \(a_n=n\) and \(b_n=0\), and get \(P(M_n^*\le x) = \mathrm {e}^{-1/x}\) (\(x > 0\)). In the second case a good choice is \(a_n = 1/n\) and \(b_n = 1\), yielding \(P(M_n^*\le x) \rightarrow \mathrm {e}^x\) (\(x < 0\)) in the limit \(n\rightarrow \infty \). Plot all three functions \(F_X(x)\) of this Example and elaborate why one or the other is more or less sensible for the actual physical description of extreme phenomena!     \(\triangleleft \)
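The first case is easily verified by simulation; a sketch assuming numpy (here with \(n=365\), reminiscent of annual maxima of daily values):

```python
import numpy as np

# Maxima of n exponential variables, shifted by b_n = log(n) (with a_n = 1):
# their distribution function should approach exp(-exp(-x)).
rng = np.random.default_rng(7)
n, reps = 365, 20_000
M = rng.exponential(1.0, (reps, n)).max(axis=1) - np.log(n)

for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, np.mean(M <= x), np.exp(-np.exp(-x)))   # empirical vs. limiting value
```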

6.1 Fisher–Tippett–Gnedenko Theorem

Apparently the choice of constants \(a_n\) and \(b_n\) is crucial if we wish the distribution of \(M_n^*\) in the limit \(n\rightarrow \infty \) to be non-trivial (not degenerated into a point); the basic formalism for a correct determination of these constants is discussed e.g. in [14]. In the following we assume that such constants can be found; one can then namely invoke the Fisher–Tippett–Gnedenko theorem [15, 16], which is the extreme-value analog of the central limit theorem of Sect. 6.3: if there exist sequences of constants \(\{ a_n > 0 \}\) and \(\{ b_n \}\) such that in the limit \(n\rightarrow \infty \) we have

$$ P\left( {M_n - b_n\over a_n} \le x \right) \rightarrow G(x) $$

for a non-degenerate distribution function G, then G belongs to the family

$$\begin{aligned} G(x) = \exp \left\{ - \left[ 1 + \xi \left( {x-\mu \over \sigma } \right) \right] ^{-1/\xi } \right\} , \end{aligned}$$
(6.11)

defined on the set of points \(\{ x: 1 + \xi (x-\mu )/\sigma > 0 \}\), where \(-\infty< \mu < \infty \), \(\sigma > 0\) and \(-\infty< \xi < \infty \). Formula (6.11) defines the family of generalized extreme-value distributions (GEV). An individual distribution is described by the location parameter \(\mu \) (a sort of average of extreme values), the scale parameter \(\sigma \) (their dispersion), and the shape parameter \(\xi \). The value of \(\xi \) characterizes three sub-families of the GEV set—Fréchet (\(\xi > 0\)), Gumbel (\(\xi = 0\)) and Weibull (\(\xi < 0\))—differing by the location of the point \(x_+\) and asymptotics. The Gumbel-type distributions must be understood as the \(\xi \rightarrow 0\) limit of (6.11):

$$\begin{aligned} G(x) = \exp \left\{ -\exp \left[ -\left( {x-\mu \over \sigma } \right) \right] \right\} , \qquad -\infty< x < \infty . \end{aligned}$$
(6.12)

The corresponding probability density in the case \(\xi \ne 0\) is

$$\begin{aligned} g(x) = G'(x) = {1\over \sigma } \bigl [ t(x) \bigr ]^{1+\xi } \,\mathrm {e}^{-t(x)} , \qquad t(x) = \left[ 1 + \xi \, {x-\mu \over \sigma } \right] ^{-1/\xi } \end{aligned}$$
(6.13)

while for \(\xi = 0\) it is

$$ g(x) = {1\over \sigma } \exp \left[ - {x-\mu \over \sigma } - \exp \left( -{x-\mu \over \sigma } \right) \right] . $$

The predictive power of the Fisher–Tippett–Gnedenko theorem does not lag behind that of the central limit theorem: if one is able to find suitable \(\{ a_n\}\) and \(\{ b_n\}\), the limiting extreme-value distribution is always of the type (6.11), regardless of the parent distribution \(F_X\) that generated these extreme values in the first place! Different choices of \(\{ a_n\}\) and \(\{ b_n\}\) lead to GEV-type distributions with different \(\mu \) and \(\sigma \), but with the same shape parameter \(\xi \), which is the essential parameter of the distribution.

Example

Figure 6.7 (left) shows the annual rainfall maxima, measured over 151 years (1864–2014) in the Swiss town of Engelberg [17]. Each data point represents the extreme one-day total (the wettest day in the year): we are therefore already looking at the extreme values and we are interested in their distribution, not the distribution of all non-zero daily quantities: that is most likely normal!

Fig. 6.7  Rainfall in Engelberg (1864–2014). [Left] Time series of annual extreme values. [Right] Histogram of extremes, the corresponding probability density g (dashed curve) and the GEV distribution function (full curve). The optimal parameters \(\widehat{\mu }\), \(\widehat{\sigma }\) and \(\widehat{\xi }\) have been determined by fitting g to the histogram

Fig. 6.8  Return values for extreme rainfall in Engelberg (period 1864–2014). The full curve is the model prediction with parameters from Fig. 6.7, and the dashed curve is the model with parameters obtained by the maximum likelihood method

Figure 6.7 (right) shows the histogram of 151 extreme one-day totals, normalized such that the sum over all bins is equal to one. It can therefore be directly fitted by the density (6.13) (dashed curve), resulting in the distribution parameters \(\widehat{\mu }= 53.9\,\mathrm {mm}\), \(\widehat{\sigma }= 14.8\,\mathrm {mm}\), \(\widehat{\xi }= 0.077\) (Fréchet family). The corresponding distribution function is shown by the full curve.     \(\triangleleft \)

6.2 Return Values and Return Periods

The extreme-value distribution and its asymptotic behaviour can be nicely illustrated by a return-level plot. Suppose that we have measured \(n=365\) daily rainfall amounts \(x_i\) over a period of N consecutive years, so that their annual maxima are also available:

$$ \underbrace{x_1,x_2,\ldots ,x_n}_{\displaystyle {M_{n,1}}}, \, \underbrace{x_{n+1},x_{n+2},\ldots ,x_{2n}}_{\displaystyle {M_{n,2}}},\,\ldots ,\, \underbrace{x_{(N-1)n+1},x_{(N-1)n+2},\ldots ,x_{Nn}}_{\displaystyle {M_{n,N}}} . $$

The quantiles of the annual extremes distribution are obtained by inverting (6.11):

$$ x_p = \left\{ \begin{array}{lcl} \mu - \displaystyle {\sigma \over \xi } \left[ 1 - \bigl (-\log (1-p)\bigr )^{-\xi } \right] &{};&{} \xi \ne 0 , \\ \mu - \sigma \log \bigl ( -\log (1-p) \bigr ) &{};&{} \xi = 0 , \end{array} \right. $$

where \(G(x_p) = 1-p\). We call \(x_p\) the return level corresponding to the return period \(T=1/p\): one may expect the value \(x_p\) to be exceeded on average once every 1/p years, i.e. the annual maximum will exceed the value \(x_p\) in any given year with probability \(p=1/T\). From these definitions it follows that

$$\begin{aligned} T = {1\over p} = {1\over 1-G(x_p)} . \end{aligned}$$
(6.14)

The model dependence of \(x_p\) on T in the case of Engelberg rainfall is shown in Fig. 6.8 by the full curve. On the abscissa one usually uses a logarithmic scale; one thereby shrinks the region of “extreme extreme” values and obtains a clearer picture of the asymptotics in terms of \(\xi \). We must also plot the actually measured extreme observations \(M_{n,1}, M_{n,2}, \ldots , M_{n,N}\). In general, these are not sorted, so—in the spirit of (6.14)—individual extremes \(M_{n,i}\) are mapped to their return periods:

$$ T_i = {N \over N+1 - \mathrm {rank}(M_{n,i})} , \qquad i=1,2,\ldots , N . $$

The points \((T_i, M_{n,i})\) are denoted by circles in the figure. The maximum one-day total of \(111.2\,\mathrm {mm}\), recorded in 2005, has an expected return period of 31 years, while the deluge witnessed in 1874 may reoccur every \(\approx \)150 years on the average.
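With the parameters fitted in the previous Example (\(\widehat{\mu }= 53.9\,\mathrm {mm}\), \(\widehat{\sigma }= 14.8\,\mathrm {mm}\), \(\widehat{\xi }= 0.077\)) the quantile formula and relation (6.14) can be evaluated directly. A small sketch (the printed values are approximate and tied to these fitted parameters):

```python
import numpy as np

# GEV parameters fitted to the Engelberg annual maxima (in mm).
mu, sigma, xi = 53.9, 14.8, 0.077

def return_level(T):
    p = 1.0 / T
    return mu - sigma / xi * (1.0 - (-np.log(1.0 - p))**(-xi))

def return_period(x):
    G = np.exp(-(1.0 + xi * (x - mu) / sigma)**(-1.0 / xi))
    return 1.0 / (1.0 - G)

print(return_period(111.2))   # ~30 yr for the 111.2 mm total (text quotes ~31)
print(return_level(100.0))    # ~136 mm: model extrapolation to a 100-yr period
```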

The fitting of the probability density to the data as in the previous Example depends on the number of bins in the histogram (see Sect. 9.2), so this is not the best way to pin down the optimal parameters. In Problem 8.8.3 the parameters of the GEV distribution and their uncertainties will be determined for the same data set by the method of maximum likelihood. In Fig. 6.8 the corresponding model is shown by the dashed curve.

6.3 Asymptotics of Minimal Values

So far we have only discussed the distributions of maximal values, most frequently occurring in practice. On the other hand, the distributions of extremely small values, i.e. the asymptotic behaviour of the quantities

$$ \widetilde{M}_n = \mathrm {min}\{ X_1, X_2, \ldots , X_n \} $$

when \(n\rightarrow \infty \), are also important, in particular in modeling critical failures in systems, where the lifetime of the whole system, \(\widetilde{M}_n\), is equal to the minimal lifetime of one of its components \(\{ X_i \}\).

There is no need to derive new formulas for minimal values; we can simply use the maximal-value results. Define \(Y_i = -X_i\) for \(i=1,2,\ldots ,n\), so that small values of \(X_i\) correspond to large values of \(Y_i\). Thus if \(\widetilde{M}_n = \mathrm {min}\{ X_1, X_2, \ldots , X_n \}\) and \({M}_n = \mathrm {max}\{ Y_1, Y_2, \ldots , Y_n \}\), we also have

$$ \widetilde{M}_n = - M_n . $$

In the limit \(n\rightarrow \infty \) we therefore obtain

$$\begin{aligned} P\bigl ( \widetilde{M}_n \le x \bigr )= & {} P\bigl ( -M_n \le x \bigr ) = P\bigl ( M_n \ge -x \bigr ) = 1 - P\bigl ( M_n \le -x \bigr ) \nonumber \\\rightarrow & {} 1 - \exp \left\{ - \left[ 1 + \xi \left( {-x-\mu \over \sigma } \right) \right] ^{-1/\xi } \right\} \nonumber \\= & {} 1 - \exp \left\{ - \left[ 1 - \xi \left( {x-\widetilde{\mu }\over \sigma } \right) \right] ^{-1/\xi } \right\} \end{aligned}$$
(6.15)

on \(\{ x: 1-\xi (x-\widetilde{\mu })/\sigma > 0 \}\), where \(\widetilde{\mu } = -\mu \). This means that a minimal-value distribution can be modeled either by directly fitting (6.15) to the observations or by using (6.11) and considering the symmetry exposed above: if, for example, we wish to model the data \(x_1, x_2, \ldots , x_n\) by a minimal-value distribution (parameters \(\widetilde{\mu }, \sigma , \xi \)), this is equivalent to modeling the data \(-x_1, -x_2, \ldots , -x_n\) by a maximal-value distribution with the same \(\sigma \) and \(\xi \), but with \(\widetilde{\mu } = -\mu \).

7 Discrete-Time Random Walks \(\star \)

Random walks are non-stationary random processes used to model a variety of physical processes. A random or stochastic process is a generalization of the concept of a random variable: instead of drawing a single value, one “draws” a whole time series (signal), representing one possible realization of the random process or its sample path. The non-stationarity of the process means that its statistical properties change with time. (A detailed classification of random processes can be found in [8].) In this subsection we discuss discrete-time random walks [2, 18, 19], while the next subsection is devoted to their continuous-time counterparts [18–21]. See also Chap. 12.

Imagine a discrete-time random process X, observed as a sequence of random variables \(\{X(t)\}_{t\in {\mathbb {N}}}\). The partial sums of this sequence are

$$\begin{aligned} Y(t) = Y(0) + \sum _{i=1}^t X(i) = Y(t-1) + X(t) \end{aligned}$$
(6.16)

and represent a new discrete-time random process Y, i.e. a sequence of random variables \(\{Y(t)\}_{t\in {\mathbb {N}}_0}\). The process Y is called a random walk, whose individual step is the process X(t). Let the sample space \(\Omega \) of X and Y be continuous. We are interested in the time evolution of the probability density \(f_{Y(t)}\) of the random variable Y if the initial density \(f_{Y(0)}\) is known.

If we assume that Y is a process in which the state of each point depends only on the state of the previous point, the time evolution of \(f_{Y(t)}\) is determined by

$$ f_{Y(t)}(y) = \int _\Omega f\bigl ( Y(t) = y \,|\, Y(t-1) = x \bigr ) \, f_{Y(t-1)}(x) \,\mathrm {d}x , $$

where \(f\bigl ( Y(t) = y \,|\, Y(t-1) = x \bigr )\) is the conditional probability density that Y goes from value x at time \(t-1\) to value y at time t. Let us also assume that the process X is independent of the previous states, so that \(f\bigl ( X(t) = x \,|\, Y(t-1) = y-x) = f_{X(t)}(x)\). By considering (6.16) and substituting \(z=y-x\) we get

$$ f_{Y(t)}(y) = \int _\Omega f_{X(t)}(z) f_{Y(t-1)}(y-z) \, \mathrm {d}z = \bigl ( f_{X(t)} *f_{Y(t-1)} \bigr ) (y) . $$

By using this formula \(f_{Y(t)}\) can be expressed as a convolution of the initial distribution \(f_{Y(0)}\) and the distribution of the sum of steps until time t, \(f_{X}^{*t}\):

$$ f_{Y(t)} = f_{Y(0)} *f_{X}^{*t},\qquad f_{X}^{*t} = f_{X(1)} *f_{X(2)} *\cdots *f_{X(t)} . $$

The time evolution \(f_{Y(t)}(y)\) is most easily realized in Fourier space, where it is given by the product of Fourier transforms \(\mathcal{F}\) of the probability densities,

$$ \mathcal{F} \left[ f_{Y(t)} \right] = \mathcal{F} \left[ f_{Y(0)} \right] \prod _{i=1}^t \mathcal{F} \left[ f_{X(i)} \right] . $$

One often assumes that at time zero the value of the process Y is zero and that \(f_{Y(0)}(y) = \delta (y)\). This assumption is useful in particular when one is interested in the qualitative behaviour of \(f_{Y(t)}\) at long times.

7.1 Asymptotics

To understand the time asymptotics of the distribution \(f_{Y(t)}\) it is sufficient to study one-dimensional random walks. Assume that the steps are identically distributed, with the density \(f_{X(t)} = f_X\), and \(Y(0)=0\). The distribution corresponding to the process Y is therefore determined by the formula

$$ f_{Y(t)} = \mathcal{F}^{-1} \left[ \Bigl ( \mathcal{F} \left[ f_X \right] \Bigr )^t \right] $$

for all times t. The behaviour of \(f_{Y(t)}\) in the limit \(t\rightarrow \infty \) is determined by the central limit theorem (Sect. 6.3) and its generalization (Sect. 6.5). The theorems tell us that at large t, \(f_{Y(t)}\) converges to the limiting (or asymptotic) distribution which can be expressed by one of the stable distributions \(f_{\mathrm {stab}}\), such that

$$ f_{Y(t)}(y) \sim \frac{1}{L(t)} \, f_\mathrm {stab}\!\left( \frac{y - t \mu (t)}{L(t)} \right) $$

with suitably chosen functions L and \(\mu \). The function L represents the effective width of the central part of the distribution \(f_{Y(t)}\), where the bulk of the probability is concentrated, and is called the characteristic spatial scale of the distribution. The function \(\mu \) has the role of the distribution average.

Furthermore, if the distribution of steps, \(f_X\), has a bounded variance, \(\sigma _X^2 < \infty \), the central limit theorem tells us that \(f_{Y(t)}\) tends to the normal distribution with a standard deviation of

$$ L=\sigma _{Y(t)} \sim t^{1/2} . $$

Such asymptotic dependence of the spatial scale on time defines normal diffusion, and this regime of random walks is named accordingly (Fig. 6.9 (left)) .

Fig. 6.9  Dependence of the characteristic spatial scale L on time t. [Left] Discrete-time random walks. [Right] Continuous-time random walks

If the probability density \(f_X\) asymptotically behaves as

$$ f_X(x) \sim \frac{C_\pm }{|x|^{\alpha +1}} , \qquad x\rightarrow \pm \infty , $$

where \(C_\pm \) are constants, we say that the distribution has power or fat tails, a concept familiar from Sect. 6.4. For \(\alpha \in (0,2)\), the second moment of the distribution no longer exists, and \(f_{Y(t)}\) at large t tends to a distribution with scale

$$ L \sim t^{1/\alpha } . $$

Because in this case the characteristic scale changes faster than in the case of normal diffusion, we are referring to super-diffusion. The dynamics of the process Y in this regime is known as Lévy flights. The diffusion with \(\alpha = 1\) is called ballistic: particles propagate with no restrictions with given velocities, so that

$$ L\sim t . $$

Near \(\alpha =2\) we have \(L(t) \sim (t \log t)^{1/2}\), a regime called log-normal diffusion.

These properties can be easily generalized to multiple space dimensions. We observe the projection of the walk, \(\hat{n}^\mathrm {T} Y(t)\), along the direction defined by the unit vector \(\hat{n}\), and its probability density \(f_{\hat{n}^\mathrm {T} Y(t)}\). For each \(\hat{n}\) we apply the central limit theorem or its generalization and determine the scale \(L_{\hat{n}}\). A random walk possesses a particular direction \(\hat{n}^*\) along which the scale is largest or increases fastest with time. We may take \(L_{\hat{n}^*}\) to be the characteristic scale of the distribution \(f_{Y(t)}\). An example of a simulation of a two-dimensional random walk where the steps in x and y directions are independent, is shown in Fig. 6.10.
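A sketch of the walk generation described in the caption of Fig. 6.10 (not the book's code). Note that the step length \(|X|^{-\mu }\) with \(X \sim U(-1,1)\) has a power tail with index \(\alpha = 1/\mu \), so \(\mu = 0.25\) corresponds to normal diffusion, while \(\mu = 0.75\) produces Lévy flights:

```python
import numpy as np

# Walk of the type in Fig. 6.10: steps sign(X)|X|**(-mu) with X ~ U(-1, 1).
rng = np.random.default_rng(5)
mu, nsteps = 0.75, 10_000

X = rng.uniform(-1.0, 1.0, nsteps)
Y = rng.uniform(-1.0, 1.0, nsteps)
x = np.cumsum(np.sign(X) * np.abs(X)**(-mu))   # positions after each step,
y = np.cumsum(np.sign(Y) * np.abs(Y)**(-mu))   # starting from x = y = 0

print(x[-1], y[-1])                            # final position of the walk
print(np.abs(X).min()**(-mu))                  # length of the longest single jump
```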

Fig. 6.10  Examples of random walks \((x_t,y_t)\) with \(10^4\) steps, generated according to \(x_{t+1} = x_t + \mathrm {sign}(X)|X|^{-\mu }\) and \(y_{t+1} = y_t + \mathrm {sign} (Y)|Y|^{-\mu }\), where X and Y are independent random variables, uniformly distributed on \([-1,1]\). [Left] \(\mu =0.25\). [Right] \(\mu =0.75\). The circles denote the initial position of the walks, \(x=y=0\)

If the densities \(f_{X(t)}\) have power tails, \(f_{Y(t)}\) also has them. This applies regardless of the central limit theorem or its generalization. Suppose that in the limit \(t\rightarrow \infty \) we have \(f_{X(t)}(x)\sim C_{\pm ,t} |x|^{-\alpha -1}\). When the walk “generates” the density \(f_{Y(t)}\), the tails add up, so \(f_{Y(t)}(x)\sim \bigl (\sum \nolimits _{i=1}^t C_{\pm ,i}\bigr ) |x|^{-\alpha -1}\) when \(x\rightarrow \pm \infty \). This means that the probability of extreme events in Y(t) increases with time, since

$$ P\bigl ( |Y(t)|> y \bigr ) \sim \sum _{i=1}^t P\bigl ( |X(i)| > y \bigr ) , \qquad \mathrm {when~} y\rightarrow \infty . $$

To estimate the variance of such processes we therefore apply methods of robust statistics (Sect. 7.4). Instead of calculating the standard deviation \(\sigma _{Y(t)}\) of such super-diffusive random walks, for example, one is better off using the MAD (7.23).

8 Continuous-Time Random Walks \(\star \)

In continuous-time random walks [18–21] the number of steps N(t) taken until time t itself becomes a random variable. The definition of a discrete-time random walk (6.16) should therefore be rewritten as

$$ Y(t) = Y(0) + \sum _{i=1}^{N(t)} X(i) . $$

The expression for Y(t) cannot be cast in the iterative form \(Y(t) = Y(t-1) + \cdots \) as in (6.16). The number of steps N(t) has a probability distribution \(F_{N(t)}\). Suppose that N(t) and X(i) are independent processes—which is not always true, as it is not possible to take arbitrarily many steps within a given time [22, 23]. If the X(i) at different times are independent and correspond to probability densities \(f_{X(i)}\), the probability density of the random variable Y(t) is

$$ f_{Y(t)}(y) = \sum _{n=0}^\infty F_{N(t)}(n) \bigl ( f_{Y(0)}*f_{X}^{*n} \bigr ) (y), $$

where

$$ f_{X}^{*n} = f_{X(1)} *f_{X(2)} *\cdots *f_{X(n)} . $$

In the interpretation of such random walks and the choice of distribution \(F_{N(t)}\) we follow [20]. A walk is envisioned as a sequence of steps whose lengths X(i) and waiting time T(i) between the steps are randomly drawn. After N steps the walk makes it to the point \(\mathcal{X}(N)\) and the elapsed time is \(\mathcal{T}(N)\), so that

$$ \mathcal{X}(N) = \sum _{i=1}^N X(i) ,\quad \mathcal{T}(N) = \sum _{i=1}^N T(i) ,\quad \mathcal{X}(0) = \mathcal{T}(0) = 0 . $$

Within given time, a specific point can be reached in different numbers of steps N. If the step lengths X(i) and waiting times T(i) are independent, the number of steps N(t) taken until time t is determined by the process of drawing the waiting times. Let us introduce the probability that the ith step does not occur before time t,

$$ F_{T(i)}(t) = \int _t^\infty f_{T(i)}(t') \, \mathrm {d}t' , $$

where \(f_{T(i)}\) is the probability density corresponding to the distribution of waiting times. The probability of making n steps within the time interval [0, t] is then

$$ F_{N(t)}(n) = \int _0^t f_{T}^{*n}(t') F_{T(n+1)}(t-t') \, \mathrm {d}t' = \left( f_{T}^{*n} *F_{T(n+1)} \right) (t) , $$

where

$$ f_{T}^{*n} = f_{T(1)}*f_{T(2)}*\cdots *f_{T(n)} . $$

The distribution \(F_{N(t)}\) can be calculated by using the Laplace transformation in the time domain and the Fourier transformation in the spatial domain: this allows one to avoid convolutions and operate with products of functions in the transform spaces. The procedure, which we cannot discuss in detail, leads to the Montroll–Weiss equation [20]; it helps us identify four parameter regions, corresponding to distributions of step lengths (density \(f_X\)) and waiting times (density \(f_T\)), with different dependences of the scale L on time t, which determine the diffusion properties of the random walk. These regions are shown in Fig. 6.9 (right) and quantified below. We assume that the distributions of step lengths and waiting times do not change during the walk, so that \(f_{X(i)} = f_X\) and \(f_{T(i)} = f_T\).

Normal diffusion with spatial scale \(L \sim t^{1/2}\) is obtained when \(E[T] < \infty \), \(\sigma _X <\infty \).

Sub-diffusion with scale \(L \sim t^{\beta /2}\) is obtained with \(E[T] = \infty \), \(\sigma _X < \infty \) and distribution of waiting times

$$ f_T (t) \sim \frac{1}{t^{1+\beta }} , \qquad \beta \in (0,1) . $$

Super-diffusion with \(L\sim t^{1/\alpha }\) is obtained with \(E[T] < \infty \), \(\sigma _X=\infty \) and distribution of step lengths

$$ f_X (x) \sim \frac{1}{x^{1+\alpha }} , \qquad \alpha \in (0,2) . $$

When \(E[T] = \infty \) and \(\sigma _X=\infty \), and when

$$ f_X (x) \sim \frac{1}{x^{1+\alpha }} , \quad f_T (t) \sim \frac{1}{t^{1+\beta }} , \qquad \alpha \in (0,2) , \quad \beta \in (0,1) , $$

the scale is \(L\sim t^{\beta /\alpha }\). The walks are super-diffusive if \(2\beta >\alpha \) and sub-diffusive otherwise. Processes for which \(E[T]=\infty \) are deeply non-Markovian: this means that the values of the process at given time depend on its whole history, not just on the immediately preceding state. Further reading can be found in [19, 21].

9 Problems

9.1 Convolutions with the Normal Distribution

Calculate the convolution of the normal distribution with the uniform, normal and exponential distributions!

The convolution of the uniform distribution with the density \(f_X(x) = 1/(b-a)\) (see (3.1)) and the normal distribution with the density \(f_Y\) (Definition (3.7)) is

$$\begin{aligned} f_Z(z)= & {} \int _a^b f_X(x) f_Y(z-x) \, \mathrm {d}x = {1\over b-a} {1\over \sqrt{2\pi }\,\sigma } \int _a^b \exp \left[ -{(z-x)^2\over 2\sigma ^2} \right] \,\mathrm {d}x \\= & {} {1\over b-a} {1\over \sqrt{2\pi }} \int _{(a-z)/\sigma }^{(b-z)/\sigma } \mathrm {e}^{-u^2/2} \,\mathrm {d}u = {1\over 2(b-a)} \left[ \mathrm {erf} \left( {b-z\over \sqrt{2}\,\sigma } \right) - \mathrm {erf} \left( {a-z\over \sqrt{2}\,\sigma } \right) \right] . \end{aligned}$$

This function is shown in Fig. 6.11 (left).

Fig. 6.11  [Left] Convolution of the uniform distribution \(U(-3,3)\) and the standardized normal distribution N(0, 1). [Right] Convolution of the exponential distribution with parameter \(\lambda = 0.2\) and the standardized normal distribution

The second part, the convolution of two normal distributions, is easily solved by using characteristic functions (B.20) and property (B.22):

$$ \phi _Z(t) = \phi _{X+Y}(t) = \phi _{X}(t)\phi _{Y}(t) = \mathrm {e}^{\mathrm {i} \, \left( \mu _X+\mu _Y\right) t} \mathrm {e}^{-\left( \sigma _X^2+\sigma _Y^2\right) t^2 / 2} = \mathrm {e}^{\mathrm {i} \, \mu _Z t} \mathrm {e}^{-\sigma _Z^2 t^2 / 2} . $$

From this it is clear that the convolution of two normal distributions with means \(\mu _X\) and \(\mu _Y\) and variances \(\sigma _X^2\) and \(\sigma _Y^2\) is also a normal distribution, with mean \(\mu _Z = \mu _X+\mu _Y\) and variance \(\sigma _Z^2 = \sigma _X^2 + \sigma _Y^2\).

The third part requires us to convolve the exponential distribution, with probability density \(f_X(x) = \lambda \, \mathrm {exp}(-\lambda x)\) (see (3.4)), and the normal distribution, where we set \(\mu =0\):

$$ f_Z(z) = {\lambda \over \sqrt{2\pi }\,\sigma } \int _{-\infty }^z \mathrm {e}^{-\lambda (z-y)} \mathrm {e}^{-y^2/(2\sigma ^2)} \, \mathrm {d}y . $$

Upon rearranging the exponent,

$$\begin{aligned} -\lambda (z-y) - {y^2\over 2\sigma ^2}&= -{\lambda \over 2\sigma ^2}\biggl ( 2\sigma ^2(z-y) + {y^2\over \lambda } + \lambda \sigma ^4 - \lambda \sigma ^4 \biggr ) \\&= -\lambda z + {\lambda ^2\sigma ^2\over 2} -{1\over 2\sigma ^2} \left( y - \lambda \sigma ^2 \right) ^2 , \end{aligned}$$

it follows that

$$\begin{aligned} f_Z(z)= & {} {\lambda \over \sqrt{2\pi }\,\sigma } \exp \left( -\lambda z + {\lambda ^2\sigma ^2\over 2} \right) \int _{-\infty }^{(z-\lambda \sigma ^2)/\sigma } \mathrm {e}^{-u^2/2} \, \mathrm {d}u \\= & {} \lambda \exp \left( -\lambda z + {\lambda ^2\sigma ^2\over 2} \right) {1\over 2} \left[ 1 + \mathrm {erf} \left( {z - \lambda \sigma ^2 \over \sqrt{2}\,\sigma } \right) \right] . \end{aligned}$$

This function is shown in Fig. 6.11 (right).
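The result can be cross-checked by drawing the sum of an exponential and a normal variable directly; a sketch assuming numpy and scipy, with \(\lambda = 0.2\) and \(\sigma = 1\) as in Fig. 6.11 (right):

```python
import numpy as np
from scipy.special import erf

# Monte Carlo sample of Z = exponential(lam) + normal(0, sigma).
lam, sigma = 0.2, 1.0
rng = np.random.default_rng(11)
Z = rng.exponential(1.0 / lam, 1_000_000) + rng.normal(0.0, sigma, 1_000_000)

def f_Z(z):
    """Density derived above (exponentially modified Gaussian)."""
    return (lam * np.exp(-lam * z + lam**2 * sigma**2 / 2.0)
            * 0.5 * (1.0 + erf((z - lam * sigma**2) / (np.sqrt(2.0) * sigma))))

hist, edges = np.histogram(Z, bins=200, range=(-5.0, 30.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_Z(centers))))   # ~1e-3: agreement within MC noise
```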

9.2 Spectral Line Width

The lines in emission spectra of atoms and molecules have finite widths [24]. Line broadening has three contributions: the natural width (N), collisional broadening (C) due to inelastic collisions of radiating particles, and Doppler broadening (D). Calculate a realistic spectral line profile by convoluting these three distributions.

The natural width of the line in spontaneous emission—usually the smallest contribution to broadening—has a Lorentz (Cauchy) profile with a width of \(\Delta \nu _\mathrm {N}\),

$$ \phi _\mathrm {N}(\nu ) = {1\over \pi } { \Delta \nu _\mathrm {N}/2 \over (\nu -\nu _0)^2 + (\Delta \nu _\mathrm {N}/2)^2 } . $$

As noted in the discussion of (3.19), such a profile arises from the Fourier transformation of the exponential time dependence of the decays.

The broadening due to inelastic collisions depends on pressure and temperature—approximately one has \(\Delta \nu _\mathrm {C} \propto p/\sqrt{T}\)—and has a Cauchy profile as well:

$$\begin{aligned} \phi _\mathrm {C}(\nu ) = {1\over \pi } { \Delta \nu _\mathrm {C}/2 \over (\nu -\nu _0)^2 + (\Delta \nu _\mathrm {C}/2)^2 } . \end{aligned}$$
(6.17)

A convolution of two Cauchy distributions is again a Cauchy distribution,

$$ \phi _\mathrm {N+C}(\nu ) = \int _{-\infty }^{\infty } \phi _\mathrm {N}(\rho ) \phi _\mathrm {C}(\nu -\rho ) \, \mathrm {d}\rho = {1\over \pi } { (\Delta \nu _\mathrm {N} + \Delta \nu _\mathrm {C}) / 2 \over (\nu - \nu _0)^2 + (\Delta \nu _\mathrm {N} + \Delta \nu _\mathrm {C})^2 / 4 } , $$

where we have shifted the origin of \(\phi _\mathrm {C}\) before integrating by setting \(\nu _0 = 0\) in (6.17). If we had failed to do that, the peak of the convolved distribution \(\phi _\mathrm {N+C}\) would have shifted from \(\nu _0\) to \(2\nu _0\)—see Sect. 6.1.1!
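The width-addition rule, and the importance of centring \(\phi _\mathrm {C}\) at zero before integrating, can be verified by direct numerical quadrature. A minimal Python/SciPy sketch; the widths \(\Delta \nu _\mathrm {N} = 0.3\), \(\Delta \nu _\mathrm {C} = 0.8\) and the centre \(\nu _0 = 5\) are made-up illustrative numbers:

import numpy as np
from scipy.integrate import quad

def cauchy(nu, nu0, fwhm):
    g = 0.5*fwhm                       # half width at half maximum
    return g/np.pi / ((nu - nu0)**2 + g**2)

nu0, dN, dC = 5.0, 0.3, 0.8
for nu in (4.0, 5.0, 6.5):
    # phi_C is centred at 0 before convolving, so the peak of the result stays at nu0
    conv, _ = quad(lambda r: cauchy(r, nu0, dN)*cauchy(nu - r, 0.0, dC), -np.inf, np.inf)
    print(conv, cauchy(nu, nu0, dN + dC))   # the two values should agree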

The Doppler shift is proportional to the velocity of the radiating particles, which is normally distributed (see (3.14) for a single velocity component), hence the corresponding contribution to the line profile has the form

$$ \phi _\mathrm {D}(\nu ) = {2\sqrt{\log 2}\over \sqrt{\pi }\Delta \nu _\mathrm {D}} \mathrm {exp}\left\{ -\left( {2\sqrt{\log 2}\over \Delta \nu _\mathrm {D}}(\nu -\nu _0) \right) ^2 \right\} . $$

The final spectral line shape is calculated by the convolution of the distributions \(\phi _\mathrm {N+C}\) and \(\phi _\mathrm {D}\), where again the origin must be shifted. We obtain

$$ \phi _\mathrm {V}(\nu ) = \int _{-\infty }^\infty \phi _\mathrm {N+C}(\rho )\phi _{\mathrm {D}}(\nu -\rho )\,\mathrm {d}\rho = {2\sqrt{\log 2}\over \sqrt{\pi }\Delta \nu _\mathrm {D}} \left\{ {a\over \pi } \int _{-\infty }^\infty {\mathrm {e}^{-x^2}\over (w-x)^2 + a^2} \, \mathrm {d}x \right\} , $$

where

$$ a = \sqrt{\log 2}\,{\Delta \nu _\mathrm {N}+\Delta \nu _\mathrm {C} \over \Delta \nu _\mathrm {D}} , \qquad w = 2\sqrt{\log 2}\,{\nu -\nu _0\over \Delta \nu _\mathrm {D}} . $$

This is called the Voigt distribution. The natural width is usually neglected because \(\Delta \nu _\mathrm {N} \ll \Delta \nu _\mathrm {C}, \Delta \nu _\mathrm {D}\). How well \(\phi _\mathrm {V}\) describes an actual line shape (as compared to the Cauchy and Gaussian profiles) is shown in Fig. 6.12.

Fig. 6.12
figure 12

Description of the Si (III) emission line at the wavelength of \(254.182\,\mathrm {nm}\) (compare to Fig. 3.6) by a Gaussian (normal), Cauchy (Lorentz) and Voigt distribution with added constant background
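In numerical practice the bracketed integral is rarely computed directly; for \(a>0\) it equals the real part of the Faddeeva function, \(K(w,a) = (a/\pi )\int \mathrm {e}^{-x^2}/\bigl ((w-x)^2+a^2\bigr )\,\mathrm {d}x = \mathrm {Re}\,w(w+\mathrm {i}a)\), which SciPy provides as wofz. A minimal Python sketch; the widths in the cross-check are arbitrary illustrative values, not the fitted parameters of Fig. 6.12:

import numpy as np
from scipy.special import wofz
from scipy.integrate import quad

def voigt_profile(nu, nu0, dnu_L, dnu_D):
    # dnu_L = dnu_N + dnu_C (total Lorentzian FWHM), dnu_D = Doppler FWHM
    a = np.sqrt(np.log(2)) * dnu_L / dnu_D
    w = 2*np.sqrt(np.log(2)) * (nu - nu0) / dnu_D
    K = wofz(w + 1j*a).real                          # K(w, a) = Re w(w + i a)
    return 2*np.sqrt(np.log(2)/np.pi) / dnu_D * K

# cross-check the Faddeeva identity against the defining integral at one point
a, w = 0.7, 1.3
K_int, _ = quad(lambda x: np.exp(-x**2)/((w - x)**2 + a**2), -np.inf, np.inf)
print(a/np.pi * K_int, wofz(w + 1j*a).real)          # the two numbers should agree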

9.3 Random Structure of Polymer Molecules

(Adapted from [5].) A polymer molecule can be envisioned as a chain consisting of a large number of equal, rigid, thin segments of length L. Starting at the origin, a molecule grows by attaching to the current terminal point further and further segments in arbitrary directions in space. What is the probability distribution for the position of the terminal point? Calculate the expected distance \(\overline{R}\) between the initial and terminal point of the chain and \(\overline{R^2}\)!

When a new segment is attached to the chain, it “chooses” its orientation at random: the directional distribution is therefore isotropic, \(f_\Theta (\cos \theta ) = \mathrm {d}F_\Theta /\mathrm {d}(\cos \theta ) = 1/2\). For a projection of a single segment onto an arbitrary direction (e.g. x) we have

$$\begin{aligned} \overline{X_1} = L \, \overline{\cos \Theta }= & {} L \int _0^\pi \cos \theta \, f_\Theta (\cos \theta ) \,\sin \theta \, \mathrm {d}\theta = 0 , \nonumber \\ \sigma _{X_1}^2 = \overline{X_1^2} = L^2 \, \overline{\cos ^2\Theta }= & {} L^2 \int _0^\pi \cos ^2\theta \, f_\Theta (\cos \theta ) \,\sin \theta \, \mathrm {d}\theta = {L^2\over 3} . \end{aligned}$$
(6.18)

The X-coordinate of the terminal point of an N-segment chain is the sum of N independent, identically distributed random variables of the type \(X_1\), so, by the central limit theorem, it is approximately normally distributed at large N, with expected value \(\overline{X} = N\overline{X_1} = 0\) and variance \(\sigma _X^2 = \overline{X^2} = N\sigma _{X_1}^2 = NL^2/3\). The corresponding probability density is

$$ f_X(x) = {1\over \sqrt{2\pi }\,\sigma _X} \, \exp \left( -{x^2\over 2\sigma _X^2}\right) = \sqrt{3\over 2\pi NL^2} \, \exp \left( -{3x^2\over 2NL^2}\right) . $$

The x, y and z projections of a single segment are not independent, but they are uncorrelated, and after many attachments the joint distribution of the summed coordinates approximately factorizes, so the same reasoning applies to the Y and Z coordinates. Since \(R^2 = X^2 + Y^2 + Z^2\), the probability density for the position of the terminal point of the chain, expressed as a function of its distance r from the origin, is

$$\begin{aligned} f_R(r) = f_X(x) f_X(y) f_X(z) = \left( {3\over 2\pi NL^2} \right) ^{3/2} \exp \left( -{3r^2\over 2NL^2}\right) . \end{aligned}$$
(6.19)

This can be used to calculate the expected values of R and \(R^2\):

$$ \overline{R} = \int _0^\infty r f_R(r) \, 4\pi r^2 \, \mathrm {d}r = L \sqrt{8N\over 3\pi } , \qquad \overline{R^2} = \int _0^\infty r^2 f_R(r) \, 4\pi r^2 \, \mathrm {d}r = NL^2 . $$

The latter can also be derived by recalling (6.18), since

$$ \overline{R^2} = \overline{X^2 + Y^2 + Z^2} = \overline{X^2} + \overline{Y^2} + \overline{Z^2} = 3 \, {NL^2\over 3} = NL^2 . $$

There is yet another path to the same result. Each segment (\(n=1,2,\ldots ,N\)) is defined by a vector \({{\varvec{r}}}_n = (x_n, y_n, z_n)^\mathrm {T}\). We are interested in the average square of the sum vector,

$$ R^2 = |{{\varvec{R}}}|^2 = \left( \sum _{m=1}^N {{\varvec{r}}}_m \right) ^\mathrm {T} \left( \sum _{n=1}^N {{\varvec{r}}}_n \right) = \sum _{n} {{\varvec{r}}}_n^2 + \sum _{m\ne n} {{\varvec{r}}}_m^\mathrm {T} {{\varvec{r}}}_n . $$

Averaging the second sum yields zero due to random orientations, \(\overline{{{\varvec{r}}}_m^\mathrm {T} {{\varvec{r}}}_n} = 0\), hence

$$ \overline{R^2} = \sum _{n=1}^N \overline{{{\varvec{r}}}_n^2} = N \overline{{{\varvec{r}}}_1^2} = NL^2 . $$
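Both expectation values are easy to probe by simulating random chains directly. A minimal Python/NumPy sketch; N, L and the number of chains are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(2)
N, L, chains = 200, 1.0, 10**4

# isotropic directions: cos(theta) uniform on [-1, 1], phi uniform on [0, 2*pi)
cos_t = rng.uniform(-1.0, 1.0, (chains, N))
phi = rng.uniform(0.0, 2*np.pi, (chains, N))
sin_t = np.sqrt(1.0 - cos_t**2)
segments = L * np.stack([sin_t*np.cos(phi), sin_t*np.sin(phi), cos_t], axis=-1)

R = np.linalg.norm(segments.sum(axis=1), axis=1)     # end-to-end distances of all chains
print(R.mean(), L*np.sqrt(8*N/(3*np.pi)))            # mean distance vs. L*sqrt(8N/(3*pi))
print((R**2).mean(), N*L**2)                         # mean squared distance vs. N*L^2 (exact for any N)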

9.4 Scattering of Thermal Neutrons in Lead

(Adapted from [5].) A neutron moves with velocity v in lead and scatters elastically off lead nuclei. The average time between collisions is \(\tau \), corresponding to the mean free path \(\lambda = v\tau \). The times between consecutive collisions are mutually independent, and each scattering is isotropic. What is the (spatial) probability distribution of neutrons at long times? Calculate the average distance \(\overline{R}\) of neutrons from the origin and \(\overline{R^2}\)! Demonstrate that \(\overline{R^2}\) is proportional to time, so the process has the usual diffusive nature! The diffusion coefficient D is defined by the relation \(\overline{R^2} = 6 Dt\). How does D depend on \(\lambda \) and v?

Isotropic scattering implies \(f_\Theta (\cos \theta ) = \mathrm {d}F_\Theta /\mathrm {d}(\cos \theta ) = 1/2\). But we must also take into account the times between collisions or, equivalently, the distances l traversed by the neutron between collisions. Since \(f_T(t) = \mathrm {d}F_T/\mathrm {d}t = \tau ^{-1}\exp (-t/\tau )\), we have

$$ f_L(l) = {\mathrm {d}F_L\over \mathrm {d}l} = {\mathrm {d}F_T\over \mathrm {d}t} {\mathrm {d}t\over \mathrm {d}l} = {1\over \tau } \, \mathrm {e}^{-t/\tau } {1\over v} = {1\over \lambda } \, \mathrm {e}^{-l/\lambda } , $$

where \(l=vt\). The joint probability density of the linear and angular variable, relevant to each collision, is therefore

$$ f_{L,\Theta }(l,\cos \theta ) = {1\over 2\lambda } \, \mathrm {e}^{-l/\lambda } . $$

The expected value of the projection of the neutron trajectory between two collisions onto the x-axis and the corresponding variance are

$$\begin{aligned} \overline{X_1} = \overline{L\cos \Theta }= & {} \int _0^\infty \int _0^\pi l \cos \theta \, f_{L,\Theta }(l,\cos \theta ) \, \mathrm {d}l \, \sin \theta \, \mathrm {d}\theta = 0 , \\ \sigma _{X_1}^2 = \overline{X_1^2} = \overline{L^2\cos ^2\Theta }= & {} \int _0^\infty \int _0^\pi l^2 \cos ^2\theta \, f_{L,\Theta }(l,\cos \theta ) \, \mathrm {d}l \, \sin \theta \, \mathrm {d}\theta = {2\lambda ^2\over 3} . \end{aligned}$$

Hence, as in Sect. 6.9.3, \(\overline{X} = N\overline{X_1} = 0\) after N scatterings, while \(\sigma _X^2 = N \sigma _{X_1}^2 = 2N\lambda ^2/3\). Therefore the probability density for the distribution of R (distance from the origin to the current collision point) has the same functional form as in (6.19),

$$ f_R(r) = \left( {3\over 4\pi N\lambda ^2} \right) ^{3/2} \exp \left( -{3r^2\over 4N\lambda ^2}\right) , $$

one only needs to insert the variance \(2N\lambda ^2/3\) instead of \(NL^2/3\). It follows that

$$ \overline{R} = \int _0^\infty r f_R(r) \, 4\pi r^2 \, \mathrm {d}r = \lambda \sqrt{16N\over 3\pi } , \qquad \overline{R^2} = \int _0^\infty r^2 f_R(r) \, 4\pi r^2 \, \mathrm {d}r = 2N\lambda ^2 . $$

The elapsed time after N collisions is \(t = N\lambda /v\), so that indeed \(\overline{R^2}\) is proportional to time, \(\overline{R^2} = 2N\lambda ^2 = 2 (vt/\lambda )\lambda ^2 = 2vt\lambda \). From the definition \(\overline{R^2} = 6Dt\) it follows that

$$ D = {\lambda v\over 3} . $$
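The diffusive scaling can be checked with the same kind of simulation, with the fixed segment length replaced by exponentially distributed free paths. A minimal Python/NumPy sketch; \(\lambda \), v, N and the number of histories are arbitrary illustrative values:

import numpy as np

rng = np.random.default_rng(3)
N, lam, v, walkers = 100, 1.0, 1.0, 2*10**4

l = rng.exponential(lam, (walkers, N))               # free paths, exponential with mean lambda
cos_t = rng.uniform(-1.0, 1.0, (walkers, N))         # isotropic scattering
phi = rng.uniform(0.0, 2*np.pi, (walkers, N))
sin_t = np.sqrt(1.0 - cos_t**2)
steps = l[..., None] * np.stack([sin_t*np.cos(phi), sin_t*np.sin(phi), cos_t], axis=-1)

R2 = (steps.sum(axis=1)**2).sum(axis=1)              # squared distance from the origin after N collisions
t = N*lam/v                                          # mean elapsed time
print(R2.mean(), 2*N*lam**2)                         # mean R^2 vs. 2*N*lambda^2
print(R2.mean()/(6*t), lam*v/3)                      # estimated vs. predicted diffusion coefficient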

9.5 Distribution of Extreme Values of Normal Variables \(\star \)

Let continuous random variables \(X_i\) (\(i=1,2,\ldots ,n\)) be normally distributed, \(X_i \sim N(0,1)\), with the corresponding distribution function \(F_X(x) = \Phi (x)\) and probability density \(f_X(x) = \phi (x)\) for each variable:

$$ \Phi (x) = {1\over \sqrt{2\pi }} \int _{-\infty }^x \mathrm {e}^{-t^2/2} \, \mathrm {d}t , \qquad \phi (x) = {1\over \sqrt{2\pi }} \mathrm {e}^{-x^2/2} . $$

What is the distribution function \(F_{M_n}\) of the values \(M_n = \mathrm {max}\{ X_1, X_2, \ldots , X_n \}\)? This Problem [13] is a continuation of the Example on p. 157 and represents a method to determine the parameters \(a_n\) and \(b_n\) for the scaling formula (6.10) such that the limiting distribution (6.9) is non-degenerate.

Let \(0 \le \tau \le \infty \) and let \(\{ u_n \}\) be a sequence of real numbers such that

$$\begin{aligned} 1 - F_X(u_n) \rightarrow {\tau \over n} \quad \mathrm {when~} n \rightarrow \infty . \end{aligned}$$
(6.20)

Using the limit representation of the exponential function, \((1 - \tau /n)^n \rightarrow \mathrm {e}^{-\tau }\) as \(n \rightarrow \infty \), we get

$$ F_{M_n}(u_n) = P\bigl ( M_n \le u_n \bigr ) = F_X^n(u_n) = \bigl [ 1 - \bigl ( 1 - F_X(u_n) \bigr ) \bigr ]^n = \left[ 1 - { \tau \over n } + o(1/n) \right] ^n \sim \mathrm {e}^{-\tau } , $$

when \(n\rightarrow \infty \). The leading dependence of the distribution function, \(F_{M_n} \sim \exp (-\tau )\), follows without explicit reference to the parent distribution \(F_X(x)\) being normal! A motivation for a specific form of \(\tau \) can then be found in the asymptotic property of the normal distribution

$$\begin{aligned} 1 - \Phi (z) \sim { \phi (z) \over z } , \quad z\rightarrow \infty . \end{aligned}$$
(6.21)

Let \(\tau = \mathrm {e}^{-x}\). The reason for this choice, which is fully consistent with (6.20), will become clear shortly: it is the only way to obtain a linear dependence on x in the rescaled argument of the distribution function in the final expression. By comparing (6.20) to (6.21) we obtain

$$ 1 - \Phi (u_n) \sim { \mathrm {e}^{-x} \over n} \sim { \phi (u_n) \over u_n } \quad \Rightarrow \quad {1\over n} \, \mathrm {e}^{-x} { u_n \over \phi (u_n) } \rightarrow 1 . $$

Taking the logarithm we get \(-\log n - x + \log u_n - \log \phi (u_n) \rightarrow 0\) or

$$\begin{aligned} -\log n - x + \log u_n + \textstyle {1\over 2} \log 2\pi + \textstyle {1\over 2} \, u_n^2 \rightarrow 0 . \end{aligned}$$
(6.22)

For fixed x in the limit \(n\rightarrow \infty \) one therefore has \(u_n^2 / (2\log n) \rightarrow 1\), so that taking the logarithm again yields \(2\log u_n - \log 2 - \log \log n \rightarrow 0\) or

$$ \log u_n = \textstyle {1\over 2} \bigl ( \log 2 + \log \log n \bigr ) + {\mathcal{O}}(1) . $$

Inserting this in (6.22), we get \({1\over 2}u_n^2 = x + \log n - {1\over 2} \log 4\pi - {1\over 2}\log \log n + {\mathcal{O}}(1)\), hence

$$ u_n^2 = 2\log n \left[ 1 + { x - {1\over 2} \log 4\pi - {1\over 2} \log \log n \over \log n } + {\mathcal{O}}\left( {1\over \log n} \right) \right] , $$

and finally, after taking the square root,

$$ u_n = \sqrt{2\log n} \left[ 1 + { x - {1\over 2} \log 4\pi - {1\over 2} \log \log n \over 2 \log n } + {\mathcal{O}}\left( {1\over \log n} \right) \right] . $$

This expression has the form

$$ u_n = a_n x + b_n + {\mathcal{O}}\bigl (({\log n})^{-1/2}\bigr ) = a_n x + b_n + {\mathcal{O}}\bigl (a_n\bigr ) , $$

whence we read off the normalization constants \(a_n\) and \(b_n\):

$$\begin{aligned} a_n = {1\over \sqrt{2\log n}} , \qquad b_n = \sqrt{2\log n} - { \log \log n + \log 4\pi \over 2\sqrt{2\log n} } . \end{aligned}$$
(6.23)

These constants imply \(P \bigl ( M_n \le a_n x + b_n + {\mathcal{O}}(a_n) \bigr ) \rightarrow \exp \left( -\mathrm {e}^{-x} \right) \), that is,

$$ F_{M_n}(x) = P \left( { M_n - b_n \over a_n } + {\mathcal{O}}(1) \le x \right) \rightarrow \exp \left( -\mathrm {e}^{-x} \right) . $$

The distribution of extreme values of normally distributed variables is therefore of the Gumbel type (6.12) with normalization constants (6.23).
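A numerical illustration: draw maxima of n standard normal variables, rescale them with \(a_n\) and \(b_n\) from (6.23), and compare the empirical distribution function with \(\exp (-\mathrm {e}^{-x})\). The agreement improves only slowly with n, as the slowly decaying \({\mathcal{O}}(a_n)\) terms above suggest. A minimal Python/NumPy sketch with arbitrary n and sample size:

import numpy as np

rng = np.random.default_rng(4)
n, trials = 10**4, 5000

a_n = 1.0/np.sqrt(2*np.log(n))
b_n = np.sqrt(2*np.log(n)) - (np.log(np.log(n)) + np.log(4*np.pi))/(2*np.sqrt(2*np.log(n)))

M = np.array([rng.standard_normal(n).max() for _ in range(trials)])   # maxima M_n of n N(0,1) variables
for x in (-1.0, 0.0, 1.0, 2.0, 3.0):
    empirical = np.mean((M - b_n)/a_n <= x)                           # estimate of P((M_n - b_n)/a_n <= x)
    print(f"x = {x:+.1f}   empirical {empirical:.3f}   Gumbel {np.exp(-np.exp(-x)):.3f}")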