16.1 Basic Notions

Let \(\langle\varOmega,\mathfrak{F},\mathbf{P}\rangle\) be a probability space and \(\boldsymbol{\xi}=(\xi_0,\xi_1,\ldots)\) an infinite sequence of random variables given on it.

Definition 16.1.1

A sequence \(\boldsymbol{\xi}\) is said to be strictly stationary if, for any \(k\), the distribution of the vector \((\xi_n,\ldots,\xi_{n+k})\) does not depend on \(n\), \(n\ge0\).

Along with the sequence \(\boldsymbol{\xi}\), consider the sequence \((\xi_n,\xi_{n+1},\ldots)\). Since the finite-dimensional distributions of these sequences (i.e. the distributions of the vectors \((\xi_m,\ldots,\xi_{m+k})\)) coincide, the distributions of the sequences will also coincide (one has to make use of the measure extension theorem (see Appendix 1) or the Kolmogorov theorem (see Sect. 3.5)). In other words, for a stationary sequence \(\boldsymbol{\xi}\), for any \(n\) and \(\mathbf{B}\in\mathfrak{B}^{\infty}\) (for notation see Sect. 3.5), one has

$${\mathbf{P}} ({\boldsymbol{\xi}} \in\mathbf{B}) = \mathbf{P}\bigl((\xi _n, \xi_{n+1}, \ldots) \in \mathbf{B}\bigr). $$

The simplest example of a stationary sequence is given by a sequence of independent identically distributed random variables \(\boldsymbol{\zeta}=(\zeta_0,\zeta_1,\ldots)\). It is evident that the sequence \(\xi_k=\alpha_0\zeta_k+\cdots+\alpha_s\zeta_{k+s}\), \(k=0,1,2,\ldots\), will also be stationary, but the variables \(\xi_k\) will no longer be independent. The same holds for sequences of the form

$$\xi_k=\sum_{j=0}^\infty \alpha_j\zeta_{k+j}, $$

provided that \(\mathbf{E}|\zeta_j|<\infty\), \(\sum|\alpha_j|<\infty\), or if \(\mathbf{E}\zeta_k=0\), \(\operatorname{Var}(\zeta_k)<\infty\), \(\sum\alpha_j^2<\infty\) (the latter ensures a.s. convergence of the series of random variables, see Sect. 10.2). In a similar way one can consider stationary sequences \(\xi_k=g(\zeta_k,\zeta_{k+1},\ldots)\) “generated” by \(\boldsymbol{\zeta}\), where \(g(x)\) is an arbitrary measurable functional \(\mathbb{R}^{\infty}\mapsto\mathbb{R}\).
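
For readers who wish to experiment, here is a short numerical sketch in Python (the weights \(\alpha_j\) and the normal distribution of the \(\zeta_k\) are our illustrative choices, and NumPy is assumed): it simulates the moving average \(\xi_k=\alpha_0\zeta_k+\cdots+\alpha_s\zeta_{k+s}\) and checks empirically that the distribution of \((\xi_n,\xi_{n+1})\) does not depend on \(n\).

```python
# Illustrative sketch: a moving average of i.i.d. variables is stationary.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 0.5, 0.25])           # weights alpha_0, ..., alpha_s (our choice)
n_paths, length = 100_000, 50

zeta = rng.standard_normal((n_paths, length + len(alpha)))
# xi[:, k] = sum_j alpha_j * zeta[:, k + j], for k = 0, ..., length - 1
xi = sum(a * zeta[:, j:j + length] for j, a in enumerate(alpha))

for n in (0, 10, 40):                        # shifted pairs (xi_n, xi_{n+1})
    pair = xi[:, [n, n + 1]]
    print(n, pair.mean(axis=0).round(3), np.cov(pair.T).round(3))
# The printed means and covariance matrices agree, up to Monte Carlo error,
# for every n, as strict stationarity requires.
```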

Another example is given by stationary Markov chains. If \(\{X_n\}\) is a real-valued Markov chain with invariant measure \(\boldsymbol{\pi}\) and transition probability \(P(\cdot,\cdot)\), then the chain \(\{X_n\}\) with \(X_0\) distributed according to \(\boldsymbol{\pi}\) will form a stationary sequence, because the distribution

$$\mathbf{P}(X_n\in B_0,\ldots,X_{n+k}\in B_k)=\int_{B_0}\boldsymbol{\pi}(dx_0) \int_{B_1}P(x_0,dx_1)\cdots \int _{B_k}P(x_{k-1},dx_k) $$

will not depend on n.
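
The following small Python sketch illustrates this (the two-state transition matrix and its invariant distribution are our toy choices): starting the chain from \(\boldsymbol{\pi}\) makes the joint probabilities independent of \(n\).

```python
# Illustrative sketch: a Markov chain started from its invariant
# distribution pi is strictly stationary.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                   # toy transition matrix P(x, .)
pi = np.array([0.8, 0.2])                    # invariant: pi @ P == pi

rng = np.random.default_rng(1)
n_paths, length = 200_000, 6
X = np.empty((n_paths, length), dtype=int)
X[:, 0] = rng.choice(2, size=n_paths, p=pi)  # X_0 distributed according to pi
for k in range(1, length):
    u = rng.random(n_paths)
    # jump to state 1 with probability P[x, 1] = 1 - P[x, 0]
    X[:, k] = (u > P[X[:, k - 1], 0]).astype(int)

for n in range(length - 1):
    # P(X_n = 0, X_{n+1} = 1) = pi(0) * P(0, 1) = 0.08 for every n
    print(n, np.mean((X[:, n] == 0) & (X[:, n + 1] == 1)))
```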

Any stationary sequence \(\boldsymbol{\xi}=(\xi_0,\xi_1,\ldots)\) can always be extended to a stationary sequence \(\overline{\boldsymbol{\xi}}=(\ldots,\xi_{-1},\xi_{0},\xi_{1},\ldots)\) given on the “whole axis”.

Indeed, for any \(n\), \(-\infty<n<\infty\), and \(k\ge0\), define the joint distributions of \((\xi_n,\ldots,\xi_{n+k})\) as those of \((\xi_0,\ldots,\xi_k)\). These distributions will clearly be consistent (see Sect. 3.5) and, by the Kolmogorov theorem, there will exist a unique probability distribution on \(\mathbb{R}^{\infty}_{-\infty}=\prod_{k=-\infty}^{\infty}\mathbb{R}_{k}\) with the respective \(\sigma\)-algebra such that any finite-dimensional distribution is a projection of that distribution onto the corresponding subspace. It remains to take the random element \(\overline{\boldsymbol{\xi}}\) to be the identity mapping of \(\mathbb{R}^{\infty}_{-\infty}\) onto itself.

In some of the subsequent sections it will be convenient for us to use stationary sequences given on the whole axis.

Let \(\overline{\boldsymbol{\xi}}\) be such a sequence. Define a transformation \(\theta\) of the space \(\mathbb{R}^{\infty}_{-\infty}\) onto itself by the relation

$$ (\theta{\boldsymbol{x}})_k=({\boldsymbol{x}})_{k+1}=x_{k+1}, $$
(16.1.1)

where \((\boldsymbol{x})_k\) is the \(k\)-th component of the vector \(\boldsymbol{x}\in\mathbb{R}^{\infty}_{-\infty}\), \(-\infty<k<\infty\). The transformation \(\theta\) clearly has the following properties:

1. It is a one-to-one mapping, and \(\theta^{-1}\) is defined by

$$\bigl(\theta^{-1} \boldsymbol{x}\bigr)_k=x_{k-1}. $$

2. The sequence \(\theta\overline{\boldsymbol{\xi}}\) is also stationary, its distribution coinciding with that of \(\overline{\boldsymbol{\xi}}\):

$$\mathbf{P}(\theta\overline{\boldsymbol{\xi}}\in \mathbf{B})=\mathbf {P}(\overline{ \boldsymbol{\xi}}\in \mathbf{B}). $$

It is natural to call the last property of the transformation θ the “measure preserving” property.

The above remarks explain to some extent why, historically, the study of stationary sequences followed the route of studying measure preserving transformations. Studies in that area constitute a substantial part of modern analysis. In what follows, we will relate the construction of stationary sequences to measure preserving transformations, and it will be more convenient to regard the latter as “primary” objects.

Definition 16.1.2

Let \(\langle\varOmega,\mathfrak{F},\mathbf{P}\rangle\) be the basic probability space. A transformation T of Ω into itself is said to be measure preserving if:

(1) \(T\) is measurable, i.e. \(T^{-1}A=\{\omega:T\omega\in A\} \in\mathfrak{F}\) for any \(A\in\mathfrak{F}\); and

(2) \(T\) preserves the measure: \(\mathbf{P}(T^{-1}A)=\mathbf{P}(A)\) for any \(A\in\mathfrak{F}\).

Let \(T\) be a measure preserving transformation, \(T^n\) its \(n\)-th iteration, and \(\xi=\xi(\omega)\) a random variable. Put \(U\xi(\omega)=\xi(T\omega)\), so that \(U\) is a transformation of random variables, and \(U^k\xi(\omega)=\xi(T^k\omega)\). Then

$$ \boldsymbol{\xi}= \bigl\{U^n\xi(\omega) \bigr \}^\infty_0= \bigl\{\xi \bigl(T^n\omega \bigr) \bigr\}^\infty_0 $$
(16.1.2)

is a stationary sequence of random variables.

Proof

Indeed, let \(A=\{\omega:\boldsymbol{\xi}\in\mathbf{B}\}\), \(\mathbf{B}\in\mathfrak{B}^{\infty}\), and \(A_1=\{\omega:\theta\boldsymbol{\xi}\in\mathbf{B}\}\). We have

$${\boldsymbol{\xi}}= \bigl(\xi(\omega),\xi(T\omega),\ldots \bigr),\qquad \theta{ \boldsymbol{\xi}}= \bigl(\xi(T\omega),\xi\bigl(T^2\omega \bigr),\ldots \bigr). $$

Therefore \(\omega\in A_1\) if and only if \(T\omega\in A\), i.e. \(A_1=T^{-1}A\). But \(\mathbf{P}(T^{-1}A)=\mathbf{P}(A)\) and hence \(\mathbf{P}(A_1)=\mathbf{P}(A)\), so that \(\mathbf{P}(A_n)=\mathbf{P}(A)\) for any \(n\ge1\) as well, where \(A_n=\{\omega:\theta^n\boldsymbol{\xi}\in\mathbf{B}\}\). □

Stationary sequences defined by (16.1.2) will be referred to as sequences generated by the transformation T.

To be able to construct stationary sequences on the whole axis, we will need measure preserving transformations acting both in “positive” and “negative” directions.

Definition 16.1.3

A transformation T is said to be bidirectional measure preserving if:

(1) \(T\) is a one-to-one transformation, and the domain and range of \(T\) coincide with the whole \(\varOmega\);

(2) the transformations \(T\) and \(T^{-1}\) are measurable, i.e.

$$T^{-1}A=\{\omega:T\omega\in A\}\in\mathfrak{F},\qquad TA=\{ T\omega: \omega\in A\}\in\mathfrak{F} $$

for any \(A\in\mathfrak{F}\);

(3) the transformation \(T\) preserves the measure: \(\mathbf{P}(T^{-1}A)=\mathbf{P}(A)\), and therefore \(\mathbf{P}(A)=\mathbf{P}(TA)\), for any \(A\in\mathfrak{F}\).

For such transformations we can, as before, construct stationary sequences ξ defined on the whole axis:

$${\boldsymbol{\xi}}= \bigl\{U^n\xi(\omega) \bigr\}_{-\infty}^\infty= \bigl\{\xi\bigl(T^n\omega\bigr) \bigr\}^\infty_{-\infty}. $$

The argument before Definition 16.1.2 shows that this approach “exhausts” all stationary sequences given on \(\langle\varOmega,\mathfrak{F},\mathbf{P}\rangle\), i.e. to any stationary sequence \(\boldsymbol{\xi}\) we can relate a measure preserving transformation \(T\) and a random variable \(\xi=\xi_0\) such that \(\xi_k(\omega)=\xi_0(T^k\omega)\). In this construction, we consider the “sample probability space” \(\langle\mathbb{R}^{\infty},\mathfrak{B}^{\infty},\mathbf{P}\rangle\), for which \(\xi(\omega)=\omega_0\) and \(T=\theta\). The transformation \(\theta=T\) (that is, transformation (16.1.1)) will be called the pathwise shift transformation. It always exists and “generates” any stationary sequence.

Now we will give some simpler examples of (bidirectional) measure preserving transformations.

Example 16.1.1

Let \(\varOmega=\{\omega_1,\ldots,\omega_d\}\), \(d\ge2\), be a finite set, \(\mathfrak{F}\) be the \(\sigma\)-algebra of all its subsets, \(T\omega_i=\omega_{i+1}\) for \(1\le i\le d-1\), and \(T\omega_d=\omega_1\). If \(\mathbf{P}(\omega_i)=1/d\) then \(T\) and \(T^{-1}\) are measure preserving transformations.

Example 16.1.2

Let \(\varOmega=[0,1)\), \(\mathfrak{F}\) be the \(\sigma\)-algebra of Borel sets, \(\mathbf{P}\) the Lebesgue measure and \(s\) a fixed number. Then \(T\omega=\omega+s\ (\mathrm{mod}\ 1)\) is a bidirectional measure preserving transformation.
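
A quick Monte Carlo check of the measure preserving property (the interval \(A\) and the shift \(s\) below are arbitrary choices for the demonstration):

```python
# Sanity check: the rotation T(w) = w + s (mod 1) preserves Lebesgue measure.
import numpy as np

rng = np.random.default_rng(2)
s = 0.3737                                   # any fixed shift
a, b = 0.25, 0.4                             # A = [a, b), P(A) = 0.15

omega = rng.random(1_000_000)                # sample from Lebesgue measure on [0, 1)
in_A = (omega >= a) & (omega < b)            # frequency estimates P(A)
T_omega = (omega + s) % 1.0
in_preimage = (T_omega >= a) & (T_omega < b) # omega lies in T^{-1}A
print(in_A.mean(), in_preimage.mean())       # both approximately 0.15
```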

In these examples, the spaces \(\varOmega\) are rather small, which allows one to construct on them only stationary sequences with deterministic or almost deterministic dependence between their elements. If we choose in Example 16.1.1 the variable \(\xi\) so that all the \(\xi(\omega_i)\) are different, then the value \(\xi_k(\omega)=\xi(T^k\omega)\) will uniquely determine \(T^k\omega\) and thereby \(T^{k+1}\omega\) and \(\xi_{k+1}(\omega)\). The same can be said of Example 16.1.2 in the case when \(\xi(\omega)\), \(\omega\in[0,1)\), is a monotone function of \(\omega\).

As our argument at the beginning of the section shows, the space \(\varOmega=\mathbb{R}^{\infty}\) is large enough to construct on it any stationary sequence.

Thus, we see that the concept of a measure preserving transformation arises in a natural way when studying stationary processes. But not only in that case. It also arises, for instance, while studying the dynamics of some physical systems. Indeed, the whole above argument remains valid if we consider on \(\langle\varOmega,\mathfrak{F}\rangle\) an arbitrary measure μ instead of the probability P. For example, for \(\varOmega=\mathbb{R}^{\infty}\), the value μ(A), \(A\in\mathfrak{F}\), could be the Lebesgue measure (volume) of the set A. The measure preserving property of the transformation T will mean that any set A, after the transform T has acted on it (which, say, corresponds to the change of the physical system’s state in one unit of time), will retain its volume. This property is rather natural for incompressible liquids. Many laws to be established below will be equally applicable to such physical systems.

Returning to probabilistic models, i.e. to the case when the measure is a probability distribution, it turns out that, for any set \(A\) with \(\mathbf{P}(A)>0\), the “trajectory” \(T^n\omega\) will visit \(A\) infinitely often for almost all (with respect to the measure \(\mathbf{P}\)) \(\omega\in A\).

Theorem 16.1.1

(Poincaré)

Let \(T\) be a measure preserving transformation and \(A\in\mathfrak{F}\). Then, for almost all \(\omega\in A\), the relation \(T^n\omega\in A\) holds for infinitely many \(n\ge1\).

Proof

Put \(N:=\{\omega\in A: T^n\omega\notin A\ \mbox{for all}\ n\ge1\}\). Because \(\{\omega: T^{n}\omega\in A\}\in\mathfrak{F}\), it is not hard to see that \(N\in\mathfrak{F}\). Clearly, \(N\cap T^{-n}N=\varnothing\) for any \(n\ge1\), and \(T^{-m}N\cap T^{-(m+n)}N=T^{-m}(N\cap T^{-n}N)=\varnothing\). This means that we have infinitely many sets \(T^{-n}N\), \(n=0,1,2,\ldots\), which are disjoint and have one and the same probability. This evidently implies that \(\mathbf{P}(N)=0\).

Thus, for each \(\omega\in A\setminus N\), there exists an \(n_1=n_1(\omega)\) such that \(T^{n_1}\omega\in A\). Now we apply this assertion to the measure preserving mapping \(T_k=T^k\), \(k\ge1\). Then, for each \(\omega\in A\setminus N_k\), \(\mathbf{P}(N_k)=0\), there exists an \(n_k=n_k(\omega)\ge1\) such that \((T^k)^{n_k}\omega\in A\). Since \(kn_k\ge k\), the theorem is proved. □
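
A concrete illustration of the recurrence (using the rotation of Example 16.1.2, with an interval \(A\) and a starting point of our choosing):

```python
# Poincare recurrence for the irrational rotation T(w) = w + s (mod 1):
# an orbit starting in A returns to A again and again.
import math

s = math.sqrt(2) - 1                         # irrational shift
a, b = 0.1, 0.2                              # A = [a, b), P(A) = 0.1
omega = 0.15                                 # a point of A

visits = [n for n in range(1, 2000) if a <= (omega + n * s) % 1.0 < b]
print(len(visits), visits[:10])              # roughly 200 returns; first few times
```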

Corollary 16.1.1

Let \(\xi(\omega)\ge0\) and \(A=\{\omega:\xi(\omega)>0\}\). Then, for almost all \(\omega\in A\),

$$\sum_{n=0}^\infty\xi\bigl(T^n \omega\bigr)=\infty. $$

Proof

Put \(A_k=\{\omega:\xi(\omega)\ge1/k\}\subset A\). Then by Theorem 16.1.1 the above series diverges for almost all \(\omega\in A_k\). It remains to notice that \(A=\bigcup_k A_k\). □

Remark 16.1.1

Formally, one does not need the condition \(\mathbf{P}(A)>0\) in Theorem 16.1.1 and Corollary 16.1.1. However, in the absence of that condition, the assertions may become meaningless, since the set \(A\setminus N\) in the proof of Theorem 16.1.1 can turn out to be empty. Suppose, for example, that, in the conditions of Example 16.1.2, \(A\) is a one-point set: \(A=\{\omega\}\), \(\omega\in[0,1)\). If \(s\) is irrational, then \(T^k\omega\) will never be in \(A\) for \(k\ge1\). Indeed, if we assume the contrary, then we will infer that there exist integers \(k\) and \(m\) such that \(\omega+sk-m=\omega\), i.e. \(s=m/k\), which contradicts the irrationality of \(s\).

16.2 Ergodicity (Metric Transitivity), Mixing and Weak Dependence

Definition 16.2.1

A set \(A\in\mathfrak{F}\) is said to be invariant (with respect to a measure preserving transformation \(T\)) if \(T^{-1}A=A\). A set \(A\in\mathfrak{F}\) is said to be almost invariant if the sets \(T^{-1}A\) and \(A\) differ from each other by a set of probability zero: \(\mathbf{P}(A\oplus T^{-1}A)=0\), where \(A\oplus B=A\overline{B}\cup\overline{A}B\) is the symmetric difference.

It is evident that the class of all invariant (almost invariant) sets forms a σ-algebra which will be denoted by \(\mathfrak{I}\) (\(\mathfrak{I}^{*}\)).

Lemma 16.2.1

If \(A\) is an almost invariant set then there exists an invariant set \(B\) such that \(\mathbf{P}(A\oplus B)=0\).

Proof

Put \(B=\limsup_{n\to\infty} T^{-n}A\) (recall that \(\limsup_{n\to\infty}A_{n}=\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty}A_{k}\) is the set of all points which belong to infinitely many of the sets \(A_k\)). Then

$$T^{-1}B=\limsup_{n\to\infty}T^{-(n+1)}A=B, $$

i.e. \(B\in\mathfrak{I}\). It is not hard to see that

$$A\oplus B\subset\bigcup_{k=0}^\infty \bigl(T^{-k}A\oplus T^{-(k+1)}A\bigr). $$

Since

$$\mathbf{P}\bigl(T^{-k}A\oplus T^{-(k+1)}A\bigr)=\mathbf{P} \bigl(A\oplus T^{-1}A\bigr)=0, $$

we have \(\mathbf{P}(A\oplus B)=0\). The lemma is proved. □

Definition 16.2.2

A measure preserving transformation T is said to be ergodic (or metric transitive) if each invariant set has probability zero or one.

A stationary sequence \(\{\xi_k\}\) associated with such a \(T\) (i.e. the sequence which generated \(T\) or was generated by \(T\)) is also said to be ergodic (metric transitive).

Lemma 16.2.2

A transformation T is ergodic if and only if each almost invariant set has probability 0 or 1.

Proof

Let \(T\) be ergodic and \(A\in\mathfrak{I}^{*}\). Then by Lemma 16.2.1 there exists an invariant set \(B\) such that \(\mathbf{P}(A\oplus B)=0\). Because \(\mathbf{P}(B)\) is 0 or 1, so is \(\mathbf{P}(A)\). The converse assertion is obvious. □

Definition 16.2.3

A random variable \(\zeta=\zeta(\omega)\) is said to be invariant (almost invariant) if \(\zeta(\omega)=\zeta(T\omega)\) for all \(\omega\in\varOmega\) (for almost all \(\omega\in\varOmega\)).

Theorem 16.2.1

Let T be a measure preserving transformation. The following three conditions are equivalent:

(1) \(T\) is ergodic;

(2) each almost invariant random variable is a.s. constant;

(3) each invariant random variable is a.s. constant.

Proof

(1) ⇒ (2). Assume that \(T\) is ergodic and \(\xi\) is almost invariant, i.e. \(\xi(\omega)=\xi(T\omega)\) a.s. Then, for any \(v\in\mathbb{R}\), we have \(A_{v}:=\{\omega:\xi(\omega)\leq v\}\in\mathfrak{I}^{*}\) and, by Lemma 16.2.2, \(\mathbf{P}(A_v)\) equals 0 or 1. Put \(V:=\sup\{v:\mathbf{P}(A_v)=0\}\). Since \(A_v\uparrow\varOmega\) as \(v\uparrow\infty\) and \(A_v\downarrow\varnothing\) as \(v\downarrow-\infty\), one has \(|V|<\infty\) and

$$\mathbf{P} \bigl(\xi(\omega)<V \bigr)= \mathbf{P} \Biggl(\,\bigcup _{n=1}^\infty \biggl\{\xi(\omega)<V-\frac {1}{n} \biggr\} \Biggr)=0. $$

Similarly, P(ξ(ω)>V)=0. Therefore P(ξ(ω)=V)=1.

(2) ⇒ (3). Obvious.

(3) ⇒ (1). Let \(A\in\mathfrak{I}\). Then the indicator function \(\mathrm{I}_A\) is an invariant random variable, and since it is constant, one has either \(\mathrm{I}_A=0\) or \(\mathrm{I}_A=1\) a.s. This implies that \(\mathbf{P}(A)\) equals 0 or 1. The theorem is proved. □

The assertion of the theorem clearly remains valid if one considers in (3) only bounded random variables. Moreover, if \(\xi\) is invariant, then the truncated variable \(\xi^{(N)}=\min(\xi,N)\) is also invariant.

Returning to Examples 16.1.1 and 16.1.2, in Example 16.1.1,

$$\varOmega=(\omega_1,\ldots,\omega_d),\qquad T \omega_i=\omega_{i+1\ (\mathrm{mod}\ d)}, \qquad\mathbf{P}(\omega_i)=1/d. $$

The transformation T is obviously metric transitive.

In Example 16.1.2, \(\varOmega=[0,1)\), \(T\omega=\omega+s\ (\mathrm{mod}\ 1)\), and \(\mathbf{P}\) is the Lebesgue measure. We will now show that \(T\) is ergodic if and only if \(s\) is irrational.

Consider a square integrable random variable \(\xi=\xi(\omega)\): \(\mathbf{E}\xi^2(\omega)<\infty\). Then, by the Parseval equality, the Fourier series

$$\xi(\omega)=\sum_{n=0}^\infty a_ne^{2\pi i n\omega} $$

for this function has the property \(\sum_{n=0}^{\infty}|a_{n}|^{2}<\infty\). Assume that \(s\) is irrational, while \(\xi\) is invariant. Then

$$\begin{aligned} a_n =&\mathbf{E}\xi(\omega)e^{-2\pi i n\omega}=\mathbf{E}\xi (T \omega)e^{-2\pi i nT\omega} \\=&e^{-2\pi i n s}\mathbf{E}\xi(T\omega)e^{-2\pi i n\omega}= e^{-2\pi i n s} \mathbf{E}\xi(\omega)e^{-2\pi i n\omega}=e^{-2\pi i n s}a_n. \end{aligned}$$

For irrational \(s\), this equality is only possible when \(a_n=0\), \(n\ge1\), and \(\xi(\omega)=a_0=\mathrm{const}\). By Theorem 16.2.1 this means that \(T\) is ergodic.

Now let s=m/n be rational (m and n are integers). Then the set

$$A=\bigcup_{k=0}^{n-1} \biggl\{\omega:\, \frac{2k}{2n}\leq\omega <\frac{2k+1}{2n} \biggr\} $$

will be invariant and P(A)=1/2. This means that T is not ergodic.  □
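
Anticipating the ergodic theorem of Sect. 16.3, one can also see this dichotomy numerically. In the Python sketch below (the test function and the shifts are our choices), time averages along orbits for irrational \(s\) approach the space average, whereas for rational \(s\) they depend on the starting point.

```python
# Time averages along orbits of the rotation T(w) = w + s (mod 1).
import numpy as np

def time_average(s, omega, f, n=100_000):
    k = np.arange(n)
    return f((omega + k * s) % 1.0).mean()

f = lambda x: (x < 0.25).astype(float)       # indicator of [0, 1/4); E f = 1/4

# irrational s: the average is ~1/4 regardless of the starting point
print(time_average(np.sqrt(2), 0.1, f), time_average(np.sqrt(2), 0.7, f))
# rational s = 1/2: the averages depend on omega (0.5 vs 0.0), no ergodicity
print(time_average(0.5, 0.1, f), time_average(0.5, 0.3, f))
```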

Definition 16.2.4

A measure preserving transformation T is called mixing if, for any \(A_{1}, A_{2}\in\mathfrak{F}\), as n→∞,

$$ \mathbf{P}\bigl(A_1\cap T^{-n}A_2 \bigr)\to\mathbf{P}(A_1)\mathbf{P}(A_2). $$
(16.2.1)

Now consider the stationary sequence \(\boldsymbol{\xi}=(\xi_0,\xi_1,\ldots)\) generated by the transformation \(T\): \(\xi_k(\omega)=\xi_0(T^k\omega)\).

Definition 16.2.5

A stationary sequence \(\boldsymbol{\xi}\) is said to be weakly dependent if \(\xi_k\) and \(\xi_{k+n}\) are asymptotically independent as \(n\to\infty\), i.e. for any \(B_{1}, B_{2}\in\mathfrak{B}\),

$$ \mathbf{P}(\xi_k\in B_1, \xi_{k+n}\in B_2)\to\mathbf{P}(\xi_0\in B_1)\mathbf{P}(\xi _0\in B_2). $$
(16.2.2)

Theorem 16.2.2

A measure preserving transformation T is mixing if and only if any stationary sequence ξ generated by T is weakly dependent.

Proof

Let T be mixing. Put \(A_{i}:=\xi_{0}^{-1}(B_{i})\), i=1,2, and set k=0 in (16.2.2). Then

$$\mathbf{P}(\xi_0\in B_1, \xi_n\in B_2)=\mathbf{P}\bigl(A_1\cap T^{-n} A_2\bigr)\to\mathbf{P}(A_1)\mathbf{P}(A_2). $$

Now assume any sequence generated by T is weakly dependent. For any given \(A_{1}, A_{2}\in\mathfrak{F}\), define the random variable

$$\xi(\omega) = \begin{cases} 0 & \hbox{if}\ \omega\notin A_1 \cup A_2;\cr1 & \hbox{if}\ \omega\in A_1 \overline{A}_2;\cr 2& \hbox{if}\ \omega\in A_1 A_2;\cr 3 & \hbox{if}\ \omega\in\overline{A}_1 A_2; \end{cases} $$

and put \(\xi_k(\omega):=\xi(T^k\omega)\). Then, as \(n\to\infty\),

$$\begin{aligned} \mathbf{P}\bigl(A_1\cap T^{-n}A_2\bigr) =& \mathbf{P}(0<\xi_0<3,\ \xi_n>1)\to\mathbf{P}(0< \xi_0<3)\mathbf{P}(\xi_0>1)\\=& \mathbf {P}(A_1) \mathbf{P}(A_2). \end{aligned}$$

The theorem is proved. □

Let \(\{X_n\}\) be a stationary real-valued Markov chain with an invariant distribution \(\boldsymbol{\pi}\) that satisfies the conditions of the ergodic theorem, i.e. such that, for any \(B\in\mathfrak{B}\) and \(x\in\mathbb{R}\), as \(n\to\infty\),

$$\mathbf{P}(X_n\in B\mid X_0=x)\to\boldsymbol{\pi}(B). $$

Then \(\{X_n\}\) is weakly dependent and therefore, by Theorem 16.2.2, the respective transformation \(T\) is mixing. Indeed,

$$\mathbf{P}(X_0\in B_1, X_n\in B_2)=\mathbf{E}\mathrm{I}(X_0\in B_1) \mathbf{P}(X_n\in B_2\mid X_0), $$

where the last factor converges to \(\boldsymbol{\pi}(B_2)\) for each value of \(X_0\). Therefore, by the dominated convergence theorem, the above probability tends to \(\boldsymbol{\pi}(B_1)\boldsymbol{\pi}(B_2)\).
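
For a finite chain this convergence can be computed exactly. A small sketch (reusing the toy two-state chain introduced after Sect. 16.1, an assumption of ours):

```python
# P(X_0 = 0, X_n = 1) = pi(0) * P^n(0, 1) approaches pi(0) * pi(1) as n grows.
import numpy as np

P = np.array([[0.9, 0.1], [0.4, 0.6]])      # toy transition matrix
pi = np.array([0.8, 0.2])                   # its invariant distribution

for n in (1, 5, 25):
    Pn = np.linalg.matrix_power(P, n)       # n-step transition probabilities
    print(n, pi[0] * Pn[0, 1], pi[0] * pi[1])   # joint prob. -> 0.16 = pi(0)*pi(1)
```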

Further characterisations of the mixing property will be given in Theorems 16.2.4 and 16.2.5.

Now we will introduce some notions that are somewhat broader than those from Definitions 16.2.4 and 16.2.5.

Definition 16.2.6

A transformation T is called mixing on the average if, as n→∞,

$$ \frac{1}{n}\sum_{k=1}^n \mathbf{P}\bigl(A_1\cap T^{-k}A_2\bigr)\to \mathbf {P}(A_1)\mathbf{P}(A_2). $$
(16.2.3)

A stationary sequence ξ is said to be weakly dependent on the average if

$$ \frac{1}{n}\sum_{k=1}^n \mathbf{P}(\xi_0\in B_1,\xi_k\in B_2)\to \mathbf{P}(\xi _0\in B_1) \mathbf{P}(\xi_0\in B_2). $$
(16.2.4)

Theorem 16.2.3

A measure preserving transformation T is mixing on the average if and only if any stationary sequence ξ generated by T is weakly dependent on the average.

The Proof is the same as for Theorem 16.2.2, and is left to the reader.  □

If \(\{X_n\}\) is a periodic real-valued Markov chain with period \(d\) such that each of the embedded sub-chains \(\{X_{i+nd}\}_{n=0}^{\infty}\), \(i=0,\ldots,d-1\), satisfies the ergodicity conditions with invariant distributions \(\boldsymbol{\pi}^{(i)}\) on disjoint sets \(\mathcal{X}_{0},\ldots,\mathcal{X}_{d-1}\), then the “common” invariant distribution \(\boldsymbol{\pi}\) will be equal to \(d^{-1}\sum_{i=0}^{d-1}\boldsymbol{\pi}^{(i)}\), and the chain \(\{X_n\}\) will be weakly dependent on the average. At the same time, it will clearly not be weakly dependent for \(d>1\).
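
A toy computation for \(d=2\) (the deterministic flip below is our own example) makes the distinction visible:

```python
# Period-2 chain 0 -> 1 -> 0: not weakly dependent, but weakly dependent
# on the average.
import numpy as np

P = np.array([[0.0, 1.0], [1.0, 0.0]])      # deterministic flip
pi = np.array([0.5, 0.5])                   # invariant distribution

probs = [pi[0] * np.linalg.matrix_power(P, n)[0, 0] for n in range(1, 41)]
print(probs[:4])                            # 0.0, 0.5, 0.0, 0.5, ... no limit
print(np.mean(probs), pi[0] ** 2)           # Cesaro average -> 0.25 = pi(0)^2
```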

Theorem 16.2.4

A measure preserving transformation T is ergodic if and only if it is mixing on the average.

Proof

Let \(T\) be mixing on the average, and let \(A_{1}\in\mathfrak{F}\), \(A_{2}\in\mathfrak{I}\). Then \(A_2=T^{-k}A_2\) and hence \(\mathbf{P}(A_1\cap T^{-k}A_2)=\mathbf{P}(A_1A_2)\) for all \(k\ge1\). Therefore, (16.2.3) means that \(\mathbf{P}(A_1A_2)=\mathbf{P}(A_1)\mathbf{P}(A_2)\). For \(A_1=A_2\) we get \(\mathbf{P}(A_2)=\mathbf{P}^2(A_2)\), and consequently \(\mathbf{P}(A_2)\) equals 0 or 1.

We postpone the proof of the converse assertion until the next section. □

Now we will give one more important property of ergodic transforms.

Theorem 16.2.5

A measure preserving transformation T is ergodic if and only if, for any \(A\in\mathfrak{F}\) with P(A)>0, one has

$$ \mathbf{P} \Biggl(\,\bigcup_{n=0}^\infty T^{-n}A \Biggr)=1. $$
(16.2.5)

Note that property (16.2.5) means that the sets \(T^{-n}A\), \(n=0,1,\ldots\), “exhaust” the whole space \(\varOmega\), which accords well with the term “mixing”.

Proof

Let \(T\) be ergodic. Put \(B:=\bigcup_{n=0}^{\infty}T^{-n}A\). Then \(T^{-1}B\subset B\). Because \(T\) is measure preserving, one also has \(\mathbf{P}(T^{-1}B)=\mathbf{P}(B)\). From this it follows that \(T^{-1}B=B\) up to a set of measure 0, and therefore \(B\) is almost invariant. Since \(T\) is ergodic, \(\mathbf{P}(B)\) equals 0 or 1. But \(\mathbf{P}(B)\ge\mathbf{P}(A)>0\), and hence \(\mathbf{P}(B)=1\).

Conversely, if \(T\) is not ergodic, then there exists an invariant set \(A\) such that \(0<\mathbf{P}(A)<1\); for this set \(T^{-n}A=A\) holds and

$$\mathbf{P}(B)=\mathbf{P}(A)<1. $$

The theorem is proved. □

Remark 16.2.1

In Sects. 16.1 and 16.2 we tacitly or explicitly assumed (mainly for the sake of simplicity of exposition) that the components \(\xi_k\) of the stationary sequence \(\boldsymbol{\xi}\) are real. However, we never actually used this, and so we could, as we did while studying Markov chains, assume that the state space \(\mathcal{X}\), in which the \(\xi_k\) take their values, is an arbitrary measurable space. In the next section we will substantially use the fact that the \(\xi_k\) are real- or vector-valued.

16.3 The Ergodic Theorem

For a sequence \(\xi_0,\xi_1,\ldots\) of independent identically distributed random variables we proved in Chap. 11 the strong law of large numbers:

$$\frac{S_n}{n}\stackrel{\mathit{a.s.}}{\longrightarrow} \mathbf{E}\xi_0,\quad\mbox{where}\ S_n=\sum _{k=0}^{n-1}\xi_k. $$

Now we will prove the same assertion under much broader assumptions—for stationary ergodic sequences, i.e. for sequences that are weakly dependent on the average.

Let \(\{\xi_k\}\) be an arbitrary strictly stationary sequence, \(T\) the associated measure preserving transformation, and \(\mathfrak{I}\) the \(\sigma\)-algebra of invariant sets.

Theorem 16.3.1

(Birkhoff–Khintchin)

If \(\mathbf{E}|\xi_0|<\infty\) then

$$ \frac{1}{n}\sum_{k=0}^{n-1} \xi_k\stackrel{\mathit {a.s.}}{\longrightarrow} \mathbf{E}( \xi_0\mid \mathfrak{I}). $$
(16.3.1)

If the sequence {ξ k } (or transformation T) is ergodic, then

$$ \frac{1}{n}\sum_{k=0}^{n-1} \xi_k\stackrel{\mathit {a.s.}}{\longrightarrow} \mathbf{E} \xi_0. $$
(16.3.2)
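
Before turning to the proof, here is a one-path numerical illustration of (16.3.2) (the ergodic moving-average sequence below is our own choice):

```python
# Birkhoff averages along a single path of xi_k = zeta_k + 0.5 * zeta_{k+1}
# with i.i.d. N(0,1) innovations; here E xi_0 = 0.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
zeta = rng.standard_normal(n + 1)
xi = zeta[:-1] + 0.5 * zeta[1:]

running = np.cumsum(xi) / np.arange(1, n + 1)   # n^{-1} (xi_0 + ... + xi_{n-1})
print(running[[999, 99_999, 999_999]])          # approaches E xi_0 = 0
```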

Below we will be using the representation \(\xi_k=\xi(T^k\omega)\) with \(\xi=\xi_0\). We will need the following auxiliary result.

Lemma 16.3.1

Set

$$S_n (\omega): = \sum ^{n-1}_{k=0} \xi \bigl(T^k \omega\bigr), \qquad M_k(\omega) := \max \bigl\{ 0, S_1 (\omega), \ldots, S_k (\omega) \bigr\}. $$

Then, under the conditions of Theorem 16.3.1,

$$\mathbf{E} \bigl[ \xi(\omega) \mathrm{I}_{ \{M_n > 0\}} (\omega) \bigr] \ge0 $$

for any n≥1.

Proof

For all \(k\le n\), one has \(S_k(T\omega)\le M_n(T\omega)\), and hence

$$\xi(\omega)+M_n(T\omega)\geq\xi(\omega)+S_k(T\omega )=S_{k+1}(\omega). $$

Because \(\xi(\omega)\geq S_1(\omega)-M_n(T\omega)\), we have

$$\xi(\omega)\geq\max \bigl(S_1(\omega),\ldots,S_n(\omega) \bigr)-M_n(T\omega). $$

Further, since

$$\bigl\{M_n(\omega)>0 \bigr\}= \bigl\{\max \bigl(S_1( \omega),\ldots ,S_n(\omega) \bigr)>0 \bigr\}, $$

we obtain that

$$\begin{aligned} {\mathbf{E}} \bigl[ \xi(\omega) \mathrm{I}_{ \{M_n > 0 \} } (\omega) \bigr] \ge& { \mathbf{E}} \bigl(\max\bigl(S_1 (\omega), \ldots, S_n ( \omega)\bigr) - M_n (T \omega) \bigr) \ \mathrm{I}_{ \{M_n > 0 \} } ( \omega) \\\ge &{\mathbf{E}} \bigl(M_n (\omega) - M_n (T \omega) \bigr)\mathrm{I}_{ \{M_n > 0 \} } (\omega)\\\ge& {\mathbf{E}} \bigl(M_n ( \omega) - M_n (T \omega) \bigr) = 0. \end{aligned}$$

The lemma is proved. □

Proof of Theorem 16.3.1

Assertion (16.3.2) is an evident consequence of (16.3.1), because, for ergodic T, the σ-algebra \(\mathfrak{I}\) is trivial and \(\mathbf{E}(\xi|\mathfrak{I})=\mathbf{E}\xi\) a.s. Hence, it suffices to prove (16.3.1).

Without loss of generality, we can assume that \(\mathbf{E}(\xi|\mathfrak{I})=0\), for one can always consider \(\xi-\mathbf{E}(\xi|\mathfrak{I})\) instead of ξ.

Let \(\overline{S}:=\limsup_{n\to\infty}n^{-1}S_{n}\) and \(\underline{S}:=\liminf_{n\to\infty}n^{-1}S_{n}\). To prove the theorem, it suffices to establish that

$$ 0\leq\underline{S}\leq\overline{S}\leq0\quad\mbox{a.s.} $$
(16.3.3)

Since \(\overline{S}(\omega)=\overline{S}(T\omega)\), the random variable \(\overline{S}\) is invariant, and hence the set \(A_{\varepsilon}= \{\overline{S}(\omega)>\varepsilon \}\) is also invariant for any \(\varepsilon>0\). Introduce the variables

$$\begin{aligned} \xi^* (\omega) :=& \bigl(\xi(\omega) - \varepsilon\bigr)\mathrm{I}_{A_{\varepsilon}} ( \omega), \\S^*_k (\omega) :=& \xi^* (\omega) + \cdots+ \xi^* \bigl(T^{k-1}\omega\bigr), \\M^*_k (\omega) := &\max \bigl(0, S^*_1, \ldots, S^*_k \bigr). \end{aligned}$$

Then, by Lemma 16.3.1, for any n≥1, one has

$$\mathbf{E}\xi^* \mathrm{I}_{ \{ M^*_n > 0 \} } \ge0. $$

But, as n→∞,

$$\begin{aligned} \bigl\{ M^*_n > 0 \bigr\} = &\Bigl\{\, \max _{1 \le k \le n} S^*_k > 0 \Bigr\} \uparrow \Bigl\{\, \sup _{k \ge1} S^*_k > 0 \Bigr\} \\=& \biggl\{\, \sup _{k \ge1} {S^*_k\over k} > 0 \biggr\} = \biggl\{\, \sup _{ k \ge1} {S_k\over k} > \varepsilon \biggr\} \cap A_{\varepsilon} = A_{\varepsilon}. \end{aligned}$$

The last equality follows from the observation that

$$A_{\varepsilon} = \{ \overline{S} > \varepsilon\} \subset \biggl\{ \,\sup _ {k \ge1} {S_k\over k} > \varepsilon \biggr\}. $$

Further, \(\mathbf{E}|\xi^*|\le\mathbf{E}|\xi|+\varepsilon\). Hence, by the dominated convergence theorem,

$$0 \le\mathbf{E}\xi^* \mathrm{I}_{ \{ M^*_n > 0 \} } \to\mathbf{E}\xi ^* \mathrm{I}_ {A_{\varepsilon}}. $$

Consequently,

$$\begin{aligned} 0 \le\mathbf{E}\xi^* \mathrm{I}_{A_{\varepsilon}} = \mathbf{E}(\xi- \varepsilon) \mathrm{I}_{A_{\varepsilon}} = \mathbf{E}\xi \mathrm{I}_{A_{\varepsilon}} - \varepsilon \mathbf{P}(A_{\varepsilon}) \\= \mathbf{E} \mathrm{I}_{A_{\varepsilon}} \mathbf{E}(\xi\mid \mathfrak{I}) - \varepsilon\mathbf{P}(A_{\varepsilon}) = - \varepsilon\mathbf{P} (A_{\varepsilon}). \end{aligned}$$

This implies that \(\mathbf{P}(A_\varepsilon)=0\) for any \(\varepsilon>0\), and therefore \(\mathbf{P}(\overline{S}\leq0)=1\).

In a similar way, considering the variables −ξ instead of ξ, we obtain that

$$\limsup_{n\to\infty} \biggl(- {S_n\over n} \biggr) = - \liminf_{n\to\infty} {S_n\over n} = - \underline{S}, $$

and \(\mathbf{P}(-\underline{S}\le0) = 1\), \(\mathbf{P}(\underline {S}\ge0)=1\). The required inequalities (16.3.3), and therefore the theorem itself, are proved. □

Now we can complete the

Proof of Theorem 16.2.4

It remains to show that the ergodicity of T implies mixing on the average. Indeed, let T be ergodic and \(A_{1}, A_{2}\in\mathfrak{F}\). Then, by Theorem 16.3.1, we have

$$\zeta_n=\frac{1}{n}\sum_{k=1}^n{ \rm I}\bigl(T^{-k}A_2\bigr)\stackrel{\mathit{a.s.}}{ \longrightarrow}\mathbf {P}(A_2),\qquad\mathrm{I}(A_1) \zeta_n\stackrel{\mathit{a.s.}}{\longrightarrow} \mathrm{I}(A_1)\mathbf{P}(A_2). $$

Since the \(\zeta_n\mathrm{I}(A_1)\) are bounded, one also has the convergence

$$\mathbf{E}\zeta_n\mathrm{I}(A_1)\to\mathbf{P}(A_2) \cdot\mathbf{P}(A_1). $$

Therefore

$$\frac{1}{n} \sum ^n_{k=1} \mathbf{P} \bigl(A_1 \cap T^{-k} A_2\bigr) = \mathbf{E} \mathrm{I} (A_1) \zeta_n \to\mathbf{P}(A_1) \mathbf{P}(A_2). $$

The theorem is proved. □

Now we will show that convergence in mean also holds in (16.3.1) and (16.3.2).

Theorem 16.3.2

Under the assumptions of Theorem 16.3.1, one has along with (16.3.1) and (16.3.2) that, respectively,

$$ \mathbf{E} \biggl| {1\over n} \sum ^{n-1}_{k=0} \xi_k - \mathbf{E}( \xi_0 | \mathfrak{I} ) \biggr| \to0 $$
(16.3.4)

and

$$ \mathbf{E} \biggl| {1\over n} \sum ^{n-1}_{k=0} \xi_k - \mathbf{E} \xi_0 \biggr|\to0 $$
(16.3.5)

as n→∞.

Proof

The assertion of the theorem follows in an obvious way from Theorems 16.3.1, 6.1.7 and the uniform integrability of the sums

$$\frac{1}{n}\sum_{k=0}^{n-1} \xi_k, $$

which follows from Theorem 6.1.6. □

Corollary 16.3.1

If \(\{\xi_k\}\) is a stationary metric transitive sequence and \(a=\mathbf{E}\xi_k<0\), then \(S(\omega)=\sup_{k\ge0}S_k(\omega)\) is a proper random variable.

The proof is obvious since, for \(0<\varepsilon<-a\), by the ergodic theorem one has \(S_k<(a+\varepsilon)k<0\) for all \(k\ge n(\omega)\), where \(n(\omega)<\infty\) a.s. □

An unusual feature of Theorem 16.3.1 when compared with the strong law of large numbers from Chap. 11 is that the limit of

$$\frac{1}{n}\sum_{k=0}^{n-1} \xi_k $$

can be a random variable. For instance, let \(T\omega_k:=\omega_{k+2\ (\mathrm{mod}\ d)}\) and \(d=2l\) be even in the situation of Example 16.1.1. Then the transformation \(T\) will not be ergodic, since the set \(A=\{\omega_1,\omega_3,\ldots,\omega_{d-1}\}\) will be invariant, while \(\mathbf{P}(A)=1/2\).

On the other hand, it is evident that, for any function ξ(ω), the sum

$$\frac{1}{n}\sum_{k=0}^{n-1}\xi \bigl(T^k\omega\bigr) $$

will converge with probability 1/2 to

$$\frac{2}{d}\sum_{j=0}^{l-1}\xi( \omega_{2j+1}) $$

(if \(\omega=\omega_i\) and \(i\) is odd) and with probability 1/2 to

$$\frac{2}{d}\sum_{j=1}^l\xi( \omega_{2j}) $$

(if \(\omega=\omega_i\) and \(i\) is even). This limiting distribution is just the distribution of \(\mathbf{E}(\xi\mid\mathfrak{I})\).
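
This random limit is easy to observe directly. In the sketch below (with \(d=4\) and an assumed \(\xi\)), the Birkhoff average takes one of two values depending on the parity of the starting index:

```python
# The shift T(omega_k) = omega_{k+2} on d = 4 points: the time average exists,
# but it is random: it depends on whether the orbit covers odd or even indices.
import numpy as np

d = 4
xi = np.array([1.0, 2.0, 3.0, 4.0])         # xi(omega_i), i = 1, ..., d (our choice)
for i0 in range(d):                         # start the orbit at omega_{i0+1}
    orbit = [(i0 + 2 * k) % d for k in range(1000)]
    print(i0 + 1, np.mean(xi[orbit]))       # 2.0 for odd i, 3.0 for even i
```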