1 Introduction

There are several contexts in the theory of Markov processes in which the term ergodicity is used, but in all of these, assertions of the form

$$\displaystyle{ \lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=1}^{n}h(X_{ k}) =\int hd\pi, }$$
(6.1)

or in continuous time,

$$\displaystyle{ \lim _{t\rightarrow \infty }\frac{1} {t}\int _{0}^{t}h(X(s))ds =\int hd\pi, }$$
(6.2)

for some probability measure, π, appear. Limits of this form are essentially laws of large numbers, and given such a limit, it is natural to ask about rates of convergence or fluctuations, in particular, to explore the behavior of the rescaled deviations,

$$\displaystyle{\sqrt{n}\left ( \frac{1} {n}\sum _{k=1}^{n}h(X_{ k}) -\int hd\pi \right )\mbox{ or }\sqrt{t}\left (\frac{1} {t}\int _{0}^{t}h(X(s))ds -\int hd\pi \right ).}$$

Many times during his career, Rabi has studied problems of this form. The goal of these brief comments is to review some of his results and provide some of the background needed to read his papers.
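Limits of this kind are easy to observe numerically. The following sketch (a hypothetical two-state chain, not an example from the text) simulates the time average in (6.1) and the rescaled deviation, which remains of order one as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state chain on {0, 1}; pi solves pi P = pi.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])
h = np.array([1.0, -1.0])
pi_h = pi @ h                        # integral of h d pi = 1/3

n = 200_000
u = rng.random(n)
x, total = 0, 0.0
for k in range(n):
    x = 1 if u[k] < P[x, 1] else 0   # move to 1 with probability P(x, 1)
    total += h[x]
avg = total / n                      # (1/n) sum_k h(X_k)  ->  pi h
dev = np.sqrt(n) * (avg - pi_h)     # rescaled deviation, O(1)
```

The average concentrates near πh while the √n-scaled deviation fluctuates on a fixed scale, which is the behavior the central limit theorems below quantify.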

All processes we consider will take values in a complete separable metric space (E, r). They will be temporally homogeneous and Markov in discrete or continuous time. In discrete time, the transition function will be denoted by P(x, Γ), that is, there is a filtration \(\{\mathcal{F}_{k}\}\) such that the process of interest \(X =\{ X_{k},k = 0,1,\ldots \}\) satisfies

$$\displaystyle{ P\{X_{k+1} \in \varGamma \vert \mathcal{F}_{k}\} = P(X_{k},\varGamma ),\quad k = 0,1,\ldots,\varGamma \in \mathcal{B}(E), }$$
(6.3)

where \(\mathcal{B}(E)\) denotes the Borel subsets of E. The filtration may be larger than the filtration generated by {X k }. When X and \(\{\mathcal{F}_{k}\}\) satisfy (6.3), we will say that X is \(\{\mathcal{F}_{k}\}\)-Markov with transition function P.

In continuous time, the transition function will be denoted by P(t, x, Γ) and there will be a filtration \(\{\mathcal{F}_{t}\}\) such that the process {X(t), t ≥ 0} satisfies

$$\displaystyle{ P\{X(s + t) \in \varGamma \vert \mathcal{F}_{s}\} = P(t,X(s),\varGamma ),\quad s,t \geq 0,\varGamma \in \mathcal{B}(E). }$$
(6.4)

Setting

$$\displaystyle{T(t)f(x) =\int _{E}f(y)P(t,x,dy),\quad f \in B(E),}$$

where B(E) is the space of bounded, Borel measurable functions on E, the Markov property implies {T(t)} is a semigroup, that is

$$\displaystyle{T(s)T(t)f = T(s + t)f.}$$

The semigroup can (and will) be defined for larger classes of functions as convenient.

The notion of an operator A being a generator for a Markov process can be defined in a variety of ways, but essentially always implies

$$\displaystyle{T(t)f = f +\int _{ 0}^{t}T(s)Afds,}$$

which in turn implies

$$\displaystyle{ f(X(t)) - f(X(0)) -\int _{0}^{t}Af(X(s))ds }$$
(6.5)

is a martingale for any filtration satisfying (6.4).

The analog in discrete time to the continuous-time semigroup is obtained by defining the linear operator

$$\displaystyle{Pf(x) =\int _{E}f(y)P(x,dy)}$$

and observing that

$$\displaystyle{E[f(X_{k+n})\vert \mathcal{F}_{k}] = P^{n}f(X_{ k}),}$$

and that

$$\displaystyle{ M_{n} = f(X_{n}) - f(X_{0}) -\sum _{k=0}^{n-1}(Pf(X_{ k}) - f(X_{k})) }$$
(6.6)

is a martingale for every f ∈ B(E). Consequently,

$$\displaystyle{A = P - I}$$

in discrete time plays the role of the generator in continuous time. The martingale properties (6.5) and (6.6) are central to the study of Markov processes and are the basis for the central limit theorems that Rabi and others have given.

By the initial distribution of a Markov process, we mean the distribution of X 0 in the discrete case and of X(0) in the continuous-time case. The finite dimensional distributions of a Markov process are determined by its initial distribution and its transition function. If we want to emphasize the initial distribution μ of the process, we will write \(\{X_{k}^{\mu }\}\) or {X μ(t)}.

The following lemma will prove useful in studying discrete time Markov processes.

Lemma 1.

Let P(x,Γ) be a transition function on E. There exists a measurable space \((U,\mathcal{U})\) , a measurable mapping \(\alpha: U \times E \rightarrow E\) , and a probability distribution ν on \((U,\mathcal{U})\) such that if ξ has distribution ν, then α(ξ,x) has distribution P(x,⋅).

Consequently, if X 0 has distribution \(\mu \in \mathcal{P}(E)\) and \(\xi _{1},\xi _{2},\ldots\) is a sequence of independent, ν-distributed, U-valued random variables that is independent of X 0 , then for \(\mathcal{F}_{k} =\sigma (X_{0},\xi _{1},\ldots,\xi _{k})\) , {X k } defined recursively by

$$\displaystyle{X_{k+1} =\alpha (\xi _{k+1},X_{k}),\quad k = 0,1,\ldots,}$$

is a \(\{\mathcal{F}_{k}\}\) -Markov process with initial distribution μ and transition function P(x,Γ).

Proof.

The construction in [8] gives α for ξ uniformly distributed on [0, 1] × [0, 1]. A slight modification allows ξ to be uniform on [0, 1].

Remark 1.

If the mapping \(x \in E \rightarrow P(x,\cdot ) \in \mathcal{P}(E)\) is continuous, taking the weak topology on \(\mathcal{P}(E)\), then α given by the Blackwell and Dubins construction has the property that for each x 0 ∈ E, the mapping \(x \in E \rightarrow \alpha (\xi,x)\) is almost surely continuous at x 0.
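For a finite state space, the α of Lemma 1 can be written down explicitly with ξ uniform on [0, 1] via inverse transform sampling. A minimal sketch (the three-state transition matrix is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Transition function on E = {0, 1, 2} as a stochastic matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
C = np.cumsum(P, axis=1)             # row-wise cumulative distributions

def alpha(u, x):
    """If u ~ Uniform[0,1], then alpha(u, x) has distribution P(x, .)."""
    return int(np.searchsorted(C[x], u, side="right"))

# Drive the recursion X_{k+1} = alpha(xi_{k+1}, X_k) of Lemma 1.
n = 100_000
X = np.empty(n + 1, dtype=int)
X[0] = 0
xi = rng.random(n)
for k in range(n):
    X[k + 1] = alpha(xi[k], X[k])

# Empirical transition frequencies should approximate P.
counts = np.zeros((3, 3))
for x, y in zip(X[:-1], X[1:]):
    counts[x, y] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
```

The empirical frequencies `P_hat` recover P, confirming that the random-mapping recursion reproduces the chain.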

The next section reviews ideas of ergodicity of Markov processes and gives some of the basic results. The final section considers central limit theorems exploiting the martingale properties mentioned above. We assume that all continuous-time Markov processes considered are cadlag.

2 Ergodicity for Markov Processes

Ideas of ergodicity for Markov processes all relate to the existence of stationary distributions for the processes. In discrete time, \(\pi \in \mathcal{P}(E)\) is a stationary distribution if

$$\displaystyle{\int _{E}P(x,\varGamma )\pi (dx) =\pi (\varGamma ),\quad \varGamma \in \mathcal{B}(E),}$$

and in continuous time, if

$$\displaystyle{\int _{E}T(t)f(x)\pi (dx) =\int _{E}f(x)\pi (dx),\quad f \in B(E),t \geq 0,}$$

which is equivalent to requiring

$$\displaystyle{\int _{E}Af(x)\pi (dx) = 0}$$

for a sufficiently large class of f.

If π is a stationary distribution and we take π to be the initial distribution for the process, then \(\{X_{k}^{\pi }\}\) (or {X π(t)}) will be a stationary process. If \(\{X_{k}^{\pi }\}\) is ergodic as defined generally for stationary processes, that is, the tail σ-field

$$\displaystyle{\mathcal{T} = \cap _{n}\sigma (X_{k}^{\pi },k \geq n)}$$

only contains events of probability zero or one, we will say that π is an ergodic stationary distribution. If there is only one stationary distribution, it must be ergodic. If π is an ergodic stationary distribution, then taking X = X π, (6.1) or (6.2) holds for all h ∈ B(E), or more generally, for all h ∈ L 1(π).

The questions of existence and uniqueness of stationary distributions are among the fundamental questions in the study of Markov processes. If, as is typically the case,

$$\displaystyle{ P: C_{b}(E) \rightarrow C_{b}(E), }$$
(6.1)

or

$$\displaystyle{ T(t): C_{b}(E) \rightarrow C_{b}(E),\quad t \geq 0, }$$
(6.2)

where C b (E) is the space of bounded continuous functions on E, the proof of existence of a stationary distribution reduces to proving relative compactness of a sequence of probability measures.

Theorem 1.

Assume that {T(t)} satisfies ( 6.2 ) and for a corresponding Markov process {X μ (t)}, define a family of probability measures {ν t } by

$$\displaystyle\begin{array}{rcl} \nu _{t}f = E[\frac{1} {t}\int _{0}^{t}f(X(s))ds]& =& \frac{1} {t}\int _{0}^{t}E[f(X(s))]ds \\ & =& \frac{1} {t}\int _{0}^{t}\int _{ E}T(s)fd\mu ds,\quad f \in B(E).{}\end{array}$$
(6.3)

Then as \(t \rightarrow \infty\) , any weak limit point of {ν t } is a stationary distribution for {T(t)}.

Similarly, in the discrete-time case, if P satisfies ( 6.1 ), any weak limit point of {ν n } defined by

$$\displaystyle{ \nu _{n}f = \frac{1} {n}\sum _{k=0}^{n-1}E[f(X_{ k})] }$$
(6.4)

is a stationary distribution for P.

Proof.

Suppose \(t_{n} \rightarrow \infty\) and \(\{\nu _{t_{n}}\}\) converges weakly to π. Observe that for each f ∈ C b (E) and t > 0,

$$\displaystyle{ \frac{1} {t_{n}}\int _{t}^{t+t_{n} }E[f(X(s))]ds =\nu _{t_{n}}T(t)f}$$

has the same limit as \(\nu _{t_{n}}f\). Consequently, π f = π T(t)f, and π is a stationary distribution.

The proof in the discrete case is essentially the same.

The natural approach to proving the existence of such a convergent sequence \(\{\nu _{t_{n}}\}\) is to prove relative compactness of {ν t }. Since relative compactness in \(\mathcal{P}(E)\) is equivalent to tightness, we have the following.

Corollary 1.

Let E be compact. If {T(t)} satisfies ( 6.2 ), then there exists at least one stationary distribution for {T(t)}. Similarly, if P satisfies ( 6.1 ), then there exists at least one stationary distribution for P.

More generally, relative compactness is usually proved by obtaining a Lyapunov function for the process. In particular, we want to find a function \(\psi: E \rightarrow [0,\infty )\) such that for each a ≥ 0, the level set

$$\displaystyle{\varGamma _{a} =\{ x \in E:\psi (x) \leq a\}}$$

is compact and for some initial distribution μ,

$$\displaystyle{K \equiv \sup _{t\geq 0}E[\psi (X^{\mu }(t))] <\infty.}$$

It follows that

$$\displaystyle{P\{X^{\mu }(t)\notin \varGamma _{a}\} = P\{\psi (X^{\mu }(t))> a\} \leq \frac{K} {a} }$$

and that

$$\displaystyle{\nu _{t}(\varGamma _{a}^{c}) \leq \frac{\nu _{t}\psi } {a} \leq \frac{K} {a},}$$

so {ν t } is tight and hence relatively compact.

The notion of a stochastic Lyapunov function was developed in [14] and reflects ideas dating back to [10] and [13]. There is a large literature on constructing such functions. In discrete time, we have the following simple condition.

Lemma 2.

Let \(\psi: E \rightarrow [0,\infty )\) . Suppose that there exist a ≥ 0 and 0 ≤ b < 1 such that

$$\displaystyle{ P\psi (x) \leq a + b\psi (x). }$$
(6.5)

Then for each n

$$\displaystyle{P^{n}\psi (x) \leq a\frac{1 - b^{n}} {1 - b} + b^{n}\psi (x),}$$

and hence for \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞,

$$\displaystyle{\sup _{n}E[\psi (X_{n}^{\mu })] \leq \frac{a} {1 - b} +\int _{E}\psi d\mu <\infty.}$$

Consequently, if ψ has compact level sets and P satisfies ( 6.1 ), then there exists at least one stationary distribution for P.
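To see Lemma 2 in action, consider the hypothetical linear chain \(X_{k+1} = cX_k + \xi_{k+1}\) with |c| < 1 and ξ standard normal. For ψ(x) = x², \(P\psi(x) = c^2x^2 + 1\), so (6.5) holds with a = 1 and b = c², in fact with equality, and the moment sequence can be iterated exactly:

```python
# Hypothetical linear chain X_{k+1} = c X_k + xi_{k+1}, xi ~ N(0, 1),
# with Lyapunov function psi(x) = x^2:
#   P psi(x) = E[(c x + xi)^2] = c^2 x^2 + 1,
# i.e. (6.5) with a = 1 and b = c^2 < 1, here with equality, so
# m_n = E[psi(X_n)] satisfies m_{n+1} = b m_n + a exactly.
c = 0.8
a, b = 1.0, c * c
psi0 = 5.0 ** 2                      # psi(X_0) for X_0 = 5

m, ms = psi0, [psi0]
for _ in range(200):
    m = b * m + a                    # exact recursion for E[psi(X_n)]
    ms.append(m)

# Bounds asserted in Lemma 2.
bound_n = [a * (1 - b**n) / (1 - b) + b**n * psi0 for n in range(201)]
sup_bound = a / (1 - b) + psi0
```

The iterates track the bound \(a(1-b^n)/(1-b) + b^n\psi(x)\) and stay below \(a/(1-b) + \psi(x_0)\), converging to a/(1 − b).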

The analogous result in the continuous-time case is somewhat more delicate. Rewriting (6.5) as

$$\displaystyle{(P - I)\psi (x) \leq a - (1 - b)\psi (x)}$$

and recalling that PI plays a role analogous to the generator A suggests looking for ψ satisfying

$$\displaystyle{A\psi (x) \leq a -\epsilon \psi (x),}$$

for some positive a and ε. To this point, we have only considered A to be defined so that f and Af are in B(E). For many Markov processes, for example, diffusions, the extension of the generator to a large class of unbounded ψ is clear, but even in the diffusion setting with smooth ψ, in general we can only claim that

$$\displaystyle{\psi (X(t)) -\psi (X(0)) -\int _{0}^{t}A\psi (X(s))ds}$$

is a local martingale, not a martingale. Note, however, that if ψ is bounded below and Aψ is bounded above, this local martingale will also be a supermartingale. With that observation in mind, the following lemma provides the desired extension.

Its proof is essentially an application of Fatou’s lemma.

Lemma 3.

For n = 1,2,…, let f n ,Af n ∈ B(E), and

$$\displaystyle{f_{n}(X(t)) - f_{n}(X(0)) -\int _{0}^{t}Af_{ n}(X(s))ds}$$

be a martingale. Suppose \(f_{n} \geq 0\), \(\sup _{n,x}Af_{n}(x) <\infty\), and for each x ∈ E, {f n (x)} and {Af n (x)} converge. Denote the limits by ψ and Aψ. Then

$$\displaystyle{ \psi (X(t)) -\psi (X(0)) -\int _{0}^{t}A\psi (X(s))ds }$$
(6.6)

is a supermartingale.

The supermartingale property is exactly what is needed to give the continuous-time analog of Lemma 2.

Lemma 4.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose

$$\displaystyle{A\psi (x) \leq a -\epsilon \psi (x)}$$

for some a ≥ 0 and ε > 0. Then

$$\displaystyle{ E[\psi (X^{\mu }(t))] \leq \frac{a} {\epsilon } \vee \int _{E}\psi d\mu. }$$
(6.7)

Consequently, if ψ has compact level sets and {T(t)} satisfies ( 6.2 ), then there exists at least one stationary distribution for {T(t)}.

Proof.

Let Z μ denote the supermartingale. Then

$$\displaystyle\begin{array}{rcl} e^{\epsilon t}\psi (X^{\mu }(t))& =& \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}d\psi (X^{\mu }(s)) +\int _{ 0}^{t}\epsilon e^{\epsilon s}\psi (X^{\mu }(s))ds {}\\ & =& \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}dZ^{\mu }(s) +\int _{ 0}^{t}e^{\epsilon s}(\epsilon \psi (X^{\mu }(s)) + A\psi (X^{\mu }(s)))ds {}\\ & \leq & \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}dZ^{\mu }(s) +\int _{ 0}^{t}e^{\epsilon s}ads. {}\\ \end{array}$$

Since \(E[\int _{0}^{t}e^{\epsilon s}dZ^{\mu }(s)] \leq 0\),

$$\displaystyle{E[\psi (X^{\mu }(t))] \leq e^{-\epsilon t}\int _{ E}\psi d\mu + \frac{a} {\epsilon } (1 - e^{-\epsilon t}),}$$

and the lemma follows.
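As a concrete check (a standard example, not taken from the text), take the Ornstein–Uhlenbeck process dX = −X dt + dW. With ψ(x) = x², the generator gives Aψ(x) = 1 − 2x², that is, a = 1 and ε = 2; the second moment is known in closed form and here matches the bound in the proof exactly, since Aψ = a − εψ with equality:

```python
import numpy as np

# Ornstein-Uhlenbeck process dX = -X dt + dW:  A f(x) = -x f'(x) + f''(x)/2.
# With psi(x) = x^2, A psi(x) = 1 - 2 x^2, i.e. a = 1 and eps = 2, and
# E[psi(X^x(t))] = x^2 e^{-2t} + (1 - e^{-2t})/2 in closed form.
a, eps = 1.0, 2.0

def second_moment(x0, t):
    """Closed-form E[X(t)^2] for the OU process started at x0."""
    return x0**2 * np.exp(-2 * t) + 0.5 * (1 - np.exp(-2 * t))

ts = np.linspace(0.0, 10.0, 401)
results = {}
for x0 in (0.1, 3.0):
    vals = second_moment(x0, ts)
    # bound from the proof: e^{-eps t} psi(x0) + (a/eps)(1 - e^{-eps t});
    # for this example it coincides with the closed form
    proof_bound = np.exp(-eps * ts) * x0**2 + (a / eps) * (1 - np.exp(-eps * ts))
    crude = max(a / eps, x0**2)      # right-hand side of (6.7)
    results[x0] = (vals, proof_bound, crude)
```

For both a small and a large starting point, E[ψ(Xᵡ(t))] never exceeds (a/ε) ∨ ψ(x₀), as (6.7) asserts.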

We can relax the conditions of Lemma 4 and still obtain relative compactness of {ν t } but without the moment estimate (6.7).

Lemma 5.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and K = supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose that for each a > 0

$$\displaystyle{\varGamma _{a} =\{ x: A\psi (x) \geq -a\}}$$

is compact. Then {ν t } defined by ( 6.3 ) is relatively compact, and if {T(t)} satisfies ( 6.2 ), there exists at least one stationary distribution for {T(t)}.

If E is locally compact and {T(t)} satisfies ( 6.2 ), then existence of a stationary distribution holds as long as Γ a is compact for some a > 0.

Remark 2.

The assumption that Γ a is compact only for some a > 0 is, in general, not enough to ensure relative compactness of {ν t }. If, however, the process is Harris recurrent (see Section 6.2.2), then existence of a stationary distribution implies convergence of {ν t }.

Proof.

The supermartingale property implies

$$\displaystyle{-\int _{E}A\psi d\nu _{t} = -\frac{1} {t}E[\int _{0}^{t}A\psi (X(s))ds] \leq \frac{1} {t}\int _{E}\psi d\mu -\frac{1} {t}E[\psi (X(t))] \leq \frac{1} {t}\int _{E}\psi d\mu,}$$

and for a > 0,

$$\displaystyle{a\nu _{t}(\varGamma _{a}^{c}) \leq K + \frac{1} {t}\int _{E}\psi d\mu,}$$

giving tightness and hence relative compactness for {ν t }.

The second part of the lemma follows by the observation that {ν t } is relatively compact as a probability measure on the one-point compactification of E and the compactness of Γ a for some a > 0 implies that any limit point ν satisfies ν (Γ a ) > 0 and hence ν (E) > 0. Normalizing the restriction of ν to E to be a probability measure gives a stationary distribution for {T(t)}. See Theorem 4.9.9 of [9].

The following lemma gives conditions which, coupled with some kind of irreducibility, imply recurrence, but not necessarily positive recurrence.

Lemma 6.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and K = supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose that for each a > 0,

$$\displaystyle{\varGamma _{a} =\{ x:\psi (x) \leq a\}}$$

is compact and that there exists a 0 such that

$$\displaystyle{\sup _{x\in \varGamma _{a_{ 0}}^{c}}A\psi \leq 0.}$$

Let \(\tau _{0} =\inf \{ t \geq 0: X^{\mu }(t) \in \varGamma _{a_{0}}\}\) and \(\gamma _{a} =\inf \{ t \geq 0: X^{\mu }(t)\notin \varGamma _{a}\}\) . Then

$$\displaystyle{ \lim _{a\rightarrow \infty }P\{\gamma _{a} \leq \tau _{0}\} = 0. }$$
(6.8)

Proof.

It is at least not immediately obvious that \(\gamma _{a} <\infty\) implies ψ(X μ(γ a )) ≥ a, so some randomization may be necessary for a complete proof, but assuming this inequality holds, the supermartingale property implies

$$\displaystyle{aP\{\gamma _{a} \leq \tau _{0}\} \leq E[\psi (X^{\mu }(\gamma _{a} \wedge \tau _{0}))] \leq \int _{E}\psi d\mu,}$$

and (6.8) follows.

Example 1.

In [3], Rabi gives a class of ψ of the form \(\psi (x) = F(\vert x - z\vert )\) for nondegenerate diffusion processes which satisfy the conditions of Lemma 6. (Actually, in Rabi’s notation, we need to set \(\psi (x) = -F(\vert x - z\vert )\).) The non-degeneracy assumption then ensures Harris recurrence (see below). He also formulates similar conditions that imply transience and gives a construction of an F such that \(\psi (x) = -F(\vert x - z\vert )\) satisfies the conditions of the second part of Lemma 5.

A central idea in the study of uniqueness of stationary distributions is the notion of Harris recurrence.

2.1 Harris Recurrence

Harris irreducibility requires the existence of a measure \(\varphi\) on \(\mathcal{B}(E)\) such that \(\varphi (B)> 0\) implies that the Markov process visits B with positive probability, regardless of the initial distribution. If the process visits every such B infinitely often with probability one (in the continuous-time case, if the process visits B at arbitrarily large times, that is, τ n  = inf{t > n: X(t) ∈ B} is finite almost surely for each n), the process is Harris recurrent. As long as \(\varphi\) is σ-finite, without loss of generality, we can and will assume \(\varphi\) is a probability measure. In discrete time, the classical conditions for Harris recurrence can be formulated under the assumption that there exists a function \(\varepsilon: E \rightarrow [0,1]\) such that the transition function satisfies

$$\displaystyle{ P(x,B) \geq \varepsilon (x)\varphi (B) }$$
(6.9)

and that for each initial condition μ, the Markov process satisfies

$$\displaystyle{ P\{\sum _{k=1}^{\infty }\varepsilon (X_{ k}^{\mu }) = \infty \} = 1. }$$
(6.10)

The following lemma illustrates the significance of these conditions.

Lemma 7.

Let \(\mu \in \mathcal{P}(E)\) , and suppose that ( 6.9 ) and ( 6.10 ) hold. Then there exists a probability space with a process X μ , a filtration \(\{\mathcal{F}_{k}^{\mu }\}\) , and a \(\{\mathcal{F}_{k}^{\mu }\}\) -stopping time τ μ such that X μ is \(\{\mathcal{F}_{k}^{\mu }\}\) -Markov with initial distribution μ and transition function P(x,Γ) and the distribution of \(X_{\tau ^{\mu }}^{\mu }\) is \(\varphi\) .

Proof.

We enlarge the state space to be E ×{−1, 1} and define the new transition function by

$$\displaystyle{Q(x,\theta,\varGamma \times \{\theta \}) = P(x,\varGamma ) -\varepsilon (x)\varphi (\varGamma )}$$

and

$$\displaystyle{Q(x,\theta,\varGamma \times \{-\theta \}) =\varepsilon (x)\varphi (\varGamma ).}$$

If (X μ, Θ) is a Markov process with this transition function such that X 0 μ has distribution μ, then X μ is a Markov process with transition function P(x, Γ) and initial distribution μ, and the desired stopping time is \(\tau ^{\mu } =\min \{ k:\theta _{k}\neq \theta _{k-1}\}\). Note that

$$\displaystyle{P\{\tau ^{\mu }> n\} = E[\prod _{k=0}^{n-1}(1 -\varepsilon (X_{ k}^{\mu }))]}$$

and (6.10) implies \(P\{\tau ^{\mu } <\infty \} = 1\).
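The split-chain construction can be simulated directly. In the sketch below (the matrix, φ, and the choice ε(x) ≡ 1/2 are hypothetical, chosen so that (6.9) holds), each step flips θ with probability ε(X_k), in which case the next state is drawn from φ; the state at the first flip is then a sample from φ, as in Lemma 7:

```python
import numpy as np

rng = np.random.default_rng(2)

# A chain on {0, 1, 2} with minorization P(x, .) >= eps(x) phi(.):
# here eps(x) = 0.5 and 0.5 * phi is dominated entrywise by every row of P.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
phi = np.array([0.4, 0.3, 0.3])
eps = np.array([0.5, 0.5, 0.5])
R = (P - eps[:, None] * phi) / (1 - eps)[:, None]   # residual kernel

def split_step(x):
    """One step of the split chain; True marks a theta-flip (regeneration)."""
    if rng.random() < eps[x]:
        return int(rng.choice(3, p=phi)), True      # draw from phi
    return int(rng.choice(3, p=R[x])), False        # draw from residual

# The state at the first flip time tau should have distribution phi.
samples = []
for _ in range(20_000):
    x, flipped = 0, False
    while not flipped:
        x, flipped = split_step(x)
    samples.append(x)
freq = np.bincount(samples, minlength=3) / len(samples)
```

The empirical distribution of \(X_\tau\) matches φ, which is the regeneration property the renewal arguments below exploit.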

Much of the work on Harris recurrence is done under weaker conditions of the form

$$\displaystyle{\sum _{n=1}^{\infty }a_{ n}(x)P^{n}(x,\varGamma ) \geq \varepsilon (x)\varphi (\varGamma ),}$$

where a n (x) ≥ 0, \(\sum _{n=1}^{\infty }a_{n}(x) = 1\), or in continuous time,

$$\displaystyle{\int _{0}^{\infty }P(t,x,\varGamma )a_{ x}(dt) \geq \varepsilon (x)\varphi (\varGamma ),}$$

where a x is a probability distribution on \((0,\infty )\), and typically, \(\varepsilon (x)\) has the form \(\varepsilon \mathbf{1}_{C}(x)\) for some constant \(\varepsilon> 0\) and \(C \in \mathcal{B}(E)\). The analog of Lemma 7 holds under these conditions, at least if (6.10) is replaced by

$$\displaystyle{P\{\sum _{k=1}^{\infty }\varepsilon (X_{ k}^{\mu })^{2} = \infty \} = 1\mbox{ or }P\{\int _{ 0}^{\infty }\varepsilon (X^{\mu }(s))^{2}ds = \infty \} = 1.}$$

The existence of these stopping times implies the desired uniqueness of the stationary distribution and convergence in total variation of ν n and ν t .

Lemma 8.

Let \(\varphi \in \mathcal{P}(E)\) . Suppose that for each \(\mu \in \mathcal{P}(E)\) , on some probability space, there exists a process X μ , a filtration \(\{\mathcal{F}_{k}^{\mu }\}\) , and a \(\{\mathcal{F}_{k}^{\mu }\}\) -stopping time τ μ such that X μ is \(\{\mathcal{F}_{k}^{\mu }\}\) -Markov with initial distribution μ and transition function P and \(X_{\tau ^{\mu }}^{\mu }\) has distribution \(\varphi\) . Then there is at most one stationary distribution for P.

If there is a stationary distribution π, then for each initial distribution \(\mu \in \mathcal{P}(E)\) , {ν n } defined by ( 6.4 ) converges in total variation to π.

The analogous result holds in continuous time.

Proof.

Suppose π 1 and π 2 are stationary distributions for P. Let \(X^{\pi _{1}}\) and \(X^{\pi _{2}}\) satisfy the hypotheses of the lemma. By the ergodic theorem, for each h ∈ B(E), we can define

$$\displaystyle{\;H_{\pi _{i}}^{h} =\lim _{ n\rightarrow \infty }\frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\pi _{i} }) =\lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=\tau ^{\pi _{i}}}^{\tau ^{\pi _{i}}+n-1 }h(X_{k}^{\pi _{i} }) =\lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=0}^{n-1}h(X_{\tau ^{\pi _{ i}}+k}^{\pi _{i} }).}$$

By the strong Markov property, \(H_{\pi _{1}}^{h}\) and \(H_{\pi _{2}}^{h}\) must have the same distribution. Since \(\pi _{i}h = E[H_{\pi _{i}}^{h}]\), π 1 = π 2.

Under the hypotheses of the lemma, for h ∈ B(E) and \(\mu \in \mathcal{P}(E)\),

$$\displaystyle{\vert E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\mu })] - E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\varphi })]\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> k\} + \frac{2k} {n} ),}$$

and hence

$$\displaystyle{ \vert E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\mu })] -\pi h\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> k\} + P\{\tau ^{\pi }> k\} + \frac{4k} {n} ), }$$
(6.11)

and taking the sup over h ∈ B(E) with \(\Vert h\Vert \leq 1\) gives the convergence in total variation.

If \(\tau ^{\varphi }\) satisfies \(0 <E[\tau ^{\varphi }] <\infty\), then τ 1 = 0 and \(\tau _{2} =\tau ^{\varphi }\) provide an example of τ 1 and τ 2 in the following lemma.

Lemma 9.

Let X be \(\{\mathcal{F}_{k}\}\) -Markov with transition function P, and let τ 1 and τ 2 be stopping times satisfying τ 1 ≤τ 2 and \(0 <E[\tau _{2} -\tau _{1}] <\infty\) such that \(X_{\tau _{1}}\) and \(X_{\tau _{2}}\) have the same distribution. Then π defined by

$$\displaystyle{\pi h = \frac{E[\sum _{k=\tau _{1}+1}^{\tau _{2}}h(X_{k})]} {E[\tau _{2} -\tau _{1}]} }$$

is a stationary distribution for P.

In continuous time,

$$\displaystyle{\pi h = \frac{E[\int _{\tau _{1}}^{\tau _{2}}h(X(s))ds]} {E[\tau _{2} -\tau _{1}]}.}$$

Remark 3.

In the case \(0 <E[\tau ^{\varphi }] <\infty\), this observation is essentially the renewal argument of [1] and [16].

Proof.

Since

$$\displaystyle{M_{n} = h(X_{n}) - h(X_{0}) -\sum _{k=0}^{n-1}(Ph(X_{ k}) - h(X_{k}))}$$

is a martingale,

$$\displaystyle{0 = E[h(X_{\tau _{2}}) - h(X_{\tau _{1}})] = E[\sum _{k=\tau _{1}}^{\tau _{2}-1}(Ph(X_{ k}) - h(X_{k}))],}$$

and hence,

$$\displaystyle{\pi Ph =\pi h,}$$

so π is a stationary distribution for P.

Example 2.

In [6], Rabi and Mukul Majumdar consider processes in E = (0, 1) of the form

$$\displaystyle{X_{n+1} =\xi _{n+1}X_{n}(1 - X_{n}),}$$

where the {ξ k } are iid with values in (0, 4). Clearly, this process satisfies (6.1). Under the assumption that the distribution of ξ has an absolutely continuous part with a density that is strictly positive on some interval, they give conditions for Harris recurrence.
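A quick numerical sketch of this example (the choice of ξ uniform on (0, 4) is ours, for illustration only): time averages of h(x) = x computed from different starting points agree, consistent with the time-average limit (6.1) for a common π:

```python
import numpy as np

rng = np.random.default_rng(3)

# Random logistic maps: X_{n+1} = xi_{n+1} X_n (1 - X_n) with xi iid
# uniform on (0, 4); the state stays in (0, 1).  If (6.1) holds with a
# common pi, time averages from different starting points must agree.
def time_average(x0, n):
    x, total = x0, 0.0
    for xi in rng.uniform(0.0, 4.0, size=n):
        x = xi * x * (1.0 - x)
        total += x
    return total / n

n = 500_000
avg1 = time_average(0.1, n)   # started near 0
avg2 = time_average(0.9, n)   # started near 1
```

The two long-run averages coincide to within Monte Carlo error, which is the behavior Harris recurrence guarantees.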

Example 3.

The inequality in (6.11) and the analogous inequality in continuous time,

$$\displaystyle{ \vert E[\frac{1} {t}\int _{0}^{t}h(X^{\mu }(s))ds] -\pi h\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> r\} + P\{\tau ^{\pi }> r\} + \frac{4r} {t} ), }$$
(6.12)

actually give rates of convergence. Under aperiodicity assumptions, one can replace the average by E[h(X μ(t))] and eliminate the O(t −1) term. In [7], Rabi and Aramian Wasielak give conditions under which this can be done for a class of diffusion processes.

2.2 Conditions without Harris Recurrence

Harris recurrence is very useful when it holds, or perhaps more to the point, when it can be shown that it holds. In general, it does not hold, even in relatively simple settings. Perhaps the best known example is the “Markov process” in [0, 1) given by the recursion

$$\displaystyle{X_{n+1} = X_{n} + z\mbox{ mod }1,}$$

for some irrational z.
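A short computation makes the point concrete: orbits of the rotation started from distinct points never meet, so no nontrivial minorizing measure is possible, yet time averages of continuous h still converge, here to \(\int_0^1 \cos(2\pi x)dx = 0\) by Weyl equidistribution:

```python
import math

# Rotation by an irrational z on [0, 1): deterministic dynamics, so
# P(x, .) is a point mass and paths from distinct points never meet
# (no Harris recurrence), yet Cesaro averages of continuous h converge
# to the Lebesgue integral by Weyl equidistribution.
z = math.sqrt(2) - 1

def time_average(x0, n):
    x, total = x0, 0.0
    for _ in range(n):
        x = (x + z) % 1.0
        total += math.cos(2 * math.pi * x)
    return total / n

avg = time_average(0.3, 100_000)   # target: integral of cos(2 pi x) dx = 0
```

The partial sums are bounded (a geometric sum of unimodular numbers), so the average is O(1/n) even though the process is not Harris recurrent.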

For an example with more interesting probabilistic structure, let \(E =\{ -1,1\}^{\infty }\), and consider a generator of the form

$$\displaystyle{ Af(x) =\sum _{ k=1}^{\infty }\lambda _{ k}(f(\eta _{k}x) - f(x)), }$$
(6.13)

where \(\lambda _{k}> 0\), \(\sum _{k}\lambda _{k} <\infty\), and η k x is obtained from x by replacing x k by − x k . If x, y ∈ E differ on infinitely many components, then P(t, x, ⋅ ) and P(t, y, ⋅ ) are mutually singular for all t, but for all x ∈ E, P(t, x, ⋅ ) converges weakly to the distribution under which the components are independent symmetric Bernoulli.

In general, infinite dimensional processes provide a source of examples that are not Harris recurrent even if ergodic. We will not address any more examples of this type, but see [11] for recent work in this direction.

There is a need for techniques for studying ergodicity for processes that are not Harris recurrent. One approach that appears frequently in Rabi’s work involves the notion of splitting and is discussed in the paper by Ed Waymire in this volume. A second approach considered by Rabi and Gopal Basak in [2] is by verifying asymptotic flatness, that is, by showing that X x and X y can be coupled in such a way that for each compact K ⊂ E and \(\varepsilon> 0\),

$$\displaystyle{\lim _{t\rightarrow \infty }\sup _{x,y\in K}P\{\vert X^{x}(t) - X^{y}(t)\vert>\varepsilon \}= 0.}$$

For example, if one rewrites the generator in (6.13) as

$$\displaystyle{ Af(x) =\sum _{ k=1}^{\infty }2\lambda _{ k}(\frac{1} {2}f(\eta _{k}^{1}x) + \frac{1} {2}f(\eta _{k}^{-1}x) - f(x)), }$$
(6.14)

where η k 1 x is obtained from x by replacing x k by 1 and η k −1 x is obtained by replacing x k by − 1, then the coupling can be obtained using independent Poisson processes {N k }, N k with intensity 2λ k , and at the lth jump of N k replacing x k by ξ kl , where the {ξ kl } are independent symmetric Bernoulli.

Example 4.

In [2], Rabi and Gopal Basak consider diffusions of the form

$$\displaystyle{X^{x}(t) = x +\int _{ 0}^{t}BX^{x}(s)ds +\int _{ 0}^{t}\sigma (X^{x}(s))dW(s).}$$

One has a natural coupling simply by using the same Brownian motion W for both X x and X y. Lyapunov-type arguments are again employed but with analytic estimates rather than simply compactness arguments. In particular, the arguments employ ψ (v in the notation of the paper) of the form

$$\displaystyle{\psi (x) = (x \cdot Cx)^{1-\varepsilon },}$$

for appropriately chosen positive definite C and \(\varepsilon \in [0,1)\). Different choices of C are applied to ψ(X x(t)) to ensure the existence of a stationary distribution and to \(\psi (X^{x}(t) - X^{y}(t))\) to give the asymptotic flatness.
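The effect of the synchronous coupling is easy to see in a scalar sketch (B = −1 and constant σ, a hypothetical special case, not the setting of [2]): with the same Brownian increments driving both paths, the difference Xˣ − Xʸ decays deterministically:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synchronous coupling for dX = -X dt + dW: both Euler paths use the
# SAME Brownian increments, so d(X^x - X^y) = -(X^x - X^y) dt and the
# gap contracts at rate e^{-t} -- asymptotic flatness.
dt, T = 0.01, 10.0
x, y = 1.0, -1.0
for _ in range(int(T / dt)):
    dW = rng.normal(0.0, np.sqrt(dt))   # shared noise increment
    x += -x * dt + dW
    y += -y * dt + dW
gap = abs(x - y)                         # about |1 - (-1)| * e^{-T}
```

With state-dependent σ the cancellation is no longer exact, which is where the Lyapunov-type estimates of [2] come in.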

3 Central Limit Theorems

There are many versions of the martingale central limit theorem. See, for example, [15, 17, 12]. The following version is from Theorem 7.1.4 of [9].

Theorem 2.

Let {M n } be a sequence of cadlag, \(\mathbb{R}^{d}\) -valued martingales, with M n (0) = 0, and let A n = [M n ] be the matrix of covariations, that is,

$$\displaystyle{A_{n}^{ij}(t) = [M_{ n}^{i},M_{ n}^{j}]_{ t}.}$$

Suppose that for each t ≥ 0,

$$\displaystyle{\lim _{n\rightarrow \infty }E[\sup _{s\leq t}\vert M_{n}(s) - M_{n}(s-)\vert ] = 0}$$

and

$$\displaystyle{\lim _{n\rightarrow \infty }A_{n}(t) = A(t),}$$

where A is deterministic and continuous. Then {M n } converges in distribution to a Gaussian process M such that M has independent increments, E[M(t)] = 0, and \([M^{i},M^{j}]_{t} = E[M^{i}(t)M^{j}(t)] = A^{ij}(t)\) .

If

$$\displaystyle{A(t) =\int _{ 0}^{t}\sigma (s)\sigma (s)^{T}ds,}$$

for some d × m-matrix-valued function σ, then we can write

$$\displaystyle{M(t) =\int _{ 0}^{t}\sigma (s)dW(s),}$$

where W is an \(\mathbb{R}^{m}\) -valued standard Brownian motion.

Example 5.

Let π be an ergodic stationary distribution for a Markov semigroup {T(t)}. Then {T(t)} extends to L 2(π) and is strongly continuous on L 2(π). Let A be the Hille-Yosida generator for the semigroup on L 2(π). Then for each \(f \in \mathcal{D}(A)\), the domain of A,

$$\displaystyle{M^{f}(t) = f(X^{\pi }(t)) - f(X^{\pi }(0)) -\int _{ 0}^{t}Af(X^{\pi }(s))ds}$$

is a square integrable martingale.

Then, for h ∈ L 2(π), ergodicity implies

$$\displaystyle{\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}h(X^{\pi }(s))ds =\pi ht,}$$

and Theorem 2.1 of [4] gives the functional central limit theorem for the scaled deviations,

$$\displaystyle{Z_{n}^{h}(t) = \frac{1} {\sqrt{n}}\int _{0}^{nt}(h(X^{\pi }(s)) -\pi h)ds.}$$

The key assumption is that there exists \(f \in \mathcal{D}(A)\) such that \(Af = h -\pi h\). Then

$$\displaystyle{Z_{n}^{h}(t) = \frac{1} {\sqrt{n}}(f(X^{\pi }(nt)) - f(X^{\pi }(0))) - \frac{1} {\sqrt{n}}M^{f}(nt).}$$

Consequently, we have the functional central limit theorem for \(\{Z_{n}^{h}\}\) provided we can prove the functional central limit theorem for \(\{ \frac{1} {\sqrt{n}}M^{f}(n\cdot )\}\). Observe that the quadratic variation of \(\frac{1} {\sqrt{n}}M^{f}(n\cdot )\) is the same as the quadratic variation of \(U_{n}(t) = \frac{1} {\sqrt{n}}f(X^{\pi }(nt))\) and that by Itô’s formula,

$$\displaystyle\begin{array}{rcl} [U_{n}]_{t}& =& U_{n}(t)^{2} - U_{n}(0)^{2} - \frac{1} {n}\int _{0}^{t}2f(X^{\pi }(ns-))df(X^{\pi }(ns)) \\ & =& U_{n}(t)^{2} - U_{n}(0)^{2} - \frac{1} {n}\int _{0}^{t}2f(X^{\pi }(ns-))dM^{f}(ns) -\int _{ 0}^{t}2f(X^{\pi }(ns))Af(X^{\pi }(ns))ds \\ & \rightarrow & -t\int _{E}2f(x)Af(x)\pi (dx). {}\end{array}$$
(6.1)

By Theorem 2, the convergence of \(Z_{n}^{h}\) follows. Of course, under the assumptions of Lemma 8, the same result will hold for X μ for all \(\mu \in \mathcal{P}(E)\).

If f is smooth and X π is an \(\mathbb{R}^{d}\)-valued diffusion,

$$\displaystyle{X^{\pi }(t) = X^{\pi }(0) +\int _{ 0}^{t}\sigma (X^{\pi }(s))dW(s) +\int _{ 0}^{t}b(X^{\pi }(s))ds,}$$

then

$$\displaystyle{f(X^{\pi }(t)) = f(X^{\pi }(0)) +\int _{ 0}^{t}\nabla f(X^{\pi }(s))^{T}\sigma (X^{\pi }(s))dW(s) + R(t),}$$

where R is continuous with finite variation, so we can also write

$$\displaystyle\begin{array}{rcl} [U_{n}]_{t}& =& \frac{1} {n}\int _{0}^{nt}\nabla f(X^{\pi }(s))^{T}\sigma (X^{\pi }(s))\sigma (X^{\pi }(s))^{T}\nabla f(X^{\pi }(s))ds \\ & \rightarrow & t\int _{\mathbb{R}^{d}}\nabla f(x)^{T}\sigma (x)\sigma (x)^{T}\nabla f(x)\pi (dx). {}\end{array}$$
(6.2)
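The program of Example 5 can be carried out explicitly for a two-state discrete-time chain (the numbers below are hypothetical): solve the Poisson equation f − Pf = h − πh, so that the centered sums are a martingale plus a vanishing boundary term, and compare the predicted asymptotic variance π(f²) − π((Pf)²) of the martingale increments with a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(5)

# Discrete-time two-state stand-in for Example 5 (hypothetical numbers).
# With f solving f - Pf = h - pi h, the scaled sums S_n / sqrt(n) are a
# martingale plus O(1/sqrt(n)), with asymptotic variance
#     sigma^2 = pi(f^2) - pi((Pf)^2).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])                   # solves pi P = pi
h = np.array([1.0, 0.0])
hbar = h - pi @ h

# Fundamental-matrix solution of the Poisson equation (pi f = 0).
Pi = np.outer(np.ones(2), pi)
f = np.linalg.solve(np.eye(2) - P + Pi, hbar)
sigma2 = pi @ f**2 - pi @ (P @ f)**2        # predicted asymptotic variance

# Monte Carlo: many stationary paths, each contributing S_n / sqrt(n).
n, reps = 2000, 2000
x = (rng.random(reps) < pi[1]).astype(int)  # X_0 ~ pi for each path
s = hbar[x].copy()
for _ in range(n - 1):
    x = (rng.random(reps) < P[x, 1]).astype(int)
    s += hbar[x]
emp_var = (s / np.sqrt(n)).var()
```

The empirical variance of the scaled sums agrees with σ², illustrating how the Poisson equation converts the ergodic average into a martingale problem.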

Example 6.

In [5], Rabi considers diffusions in \(\mathbb{R}^{d}\) of the form

$$\displaystyle{X(t) = X(0) +\int _{ 0}^{t}u_{ 0}b(X(s))ds +\int _{ 0}^{t}\sigma (X(s))dW(s),}$$

under the assumption that σ is the square root of a positive definite matrix and σ and b are periodic in the sense that

$$\displaystyle{\sigma (x + z) =\sigma (x)\mbox{ and }b(x + z) = b(x)\quad z \in \mathbb{Z}^{d}.}$$

At least under additional regularity assumptions on σ and b, Y (t) = X(t)  mod 1, \(\mathbf{1} \in \mathbb{Z}^{d}\), the vector with each component 1, is a Markov process in [0, 1)d which has a unique, ergodic stationary distribution π. Then

$$\displaystyle{\lim _{n\rightarrow \infty }\frac{1} {n}X(nt) =\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}u_{ 0}b(X(s))ds =\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}u_{ 0}b(Y (s))ds = u_{0}\bar{b}t,}$$

where \(\bar{b} =\pi b\). Rabi gives the corresponding central limit theorem showing the convergence of

$$\displaystyle{V _{n}(t) = \frac{1} {\sqrt{n}}(X(nt) - nu_{0}\bar{b}t).}$$

For simplicity, assume X(0) = 0. Setting

$$\displaystyle{M_{1}^{n}(t) = \frac{1} {\sqrt{n}}(X(nt) -\int _{0}^{nt}u_{ 0}b(Y (s))ds) = \frac{1} {\sqrt{n}}\int _{0}^{nt}\sigma (Y (s))dW(s),}$$

the convergence of \(M_{1}^{n}\) follows from Theorem 2 and the ergodicity of Y.

Note that \(V _{n} = M_{1}^{n} + Z^{n}\), where

$$\displaystyle{Z^{n}(t) = \frac{1} {\sqrt{n}}u_{0}\int _{0}^{nt}(b(Y (s)) -\bar{ b})ds.}$$

Then Z n is of the form treated in [4], Example 5. Let \(\hat{A}\) denote the generator for Y, which will satisfy \(\hat{A}f(x\mbox{ mod }\mathbf{1}) = Af(x)\), if f extends periodically to an element in the domain of A. Rabi shows the existence of a twice continuously differentiable g satisfying

$$\displaystyle{\hat{A}g(y) = b(y) -\bar{ b},}$$

and setting

$$\displaystyle{M_{2}^{n}(t) = \frac{u_{0}} {\sqrt{n}}(g(Y (nt)) - g(0) -\int _{0}^{nt}(b(Y (s)) -\bar{ b})ds),}$$

we have

$$\displaystyle{V _{n}(t) = M_{1}^{n}(t) - M_{ 2}^{n}(t) + \frac{u_{0}} {\sqrt{n}}(g(Y (nt)) - g(0)).}$$

Since

$$\displaystyle{M_{2}^{n}(t) = \frac{u_{0}} {\sqrt{n}}\int _{0}^{nt}\nabla g(X(s))^{T}\sigma (X(s))dW(s) + R_{ n}(t),}$$

where R n is continuous with finite variation,

$$\displaystyle{[M_{1}^{n}-M_{ 2}^{n}]_{ t} = \frac{1} {n}\int _{0}^{nt}(I -u_{ 0}\nabla g(X(s))^{T})\sigma (X(s))\sigma (X(s))^{T}(I -u_{ 0}\nabla g(X(s)))ds.}$$

Setting a = σ σ T,

$$\displaystyle{D =\int _{[0,1)^{d}}(I - u_{0}\nabla g(y)^{T})a(y)(I - u_{ 0}\nabla g(y))\pi (dy),}$$

we conclude that V n converges in distribution to a mean zero Brownian motion with covariance matrix D. The form of D derived here differs from the form in [5], but compare (6.1) and (6.2).