1 Introduction

There are several contexts in the theory of Markov processes in which the term ergodicity is used, but in all of these, assertions of the form

$$\displaystyle{ \lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=1}^{n}h(X_{ k}) =\int hd\pi, }$$
(6.1)

or in continuous time,

$$\displaystyle{ \lim _{t\rightarrow \infty }\frac{1} {t}\int _{0}^{t}h(X(s))ds =\int hd\pi, }$$
(6.2)

for some probability measure, π, appear. Limits of this form are essentially laws of large numbers, and given such a limit, it is natural to ask about rates of convergence or fluctuations, in particular, to explore the behavior of the rescaled deviations,

$$\displaystyle{\sqrt{n}\left ( \frac{1} {n}\sum _{k=1}^{n}h(X_{ k}) -\int hd\pi \right )\mbox{ or }\sqrt{t}\left (\frac{1} {t}\int _{0}^{t}h(X(s))ds -\int hd\pi \right ).}$$

Many times during his career, Rabi has studied problems of this form. The goal of these brief comments is to review some of his results and provide some of the background needed to read his papers.
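Limits of this kind are easy to observe numerically. The following sketch (a hypothetical two-state chain, not an example from the text) simulates the time average in (6.1) and the rescaled deviation, which remains of order one as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state chain on {0, 1}; pi solves pi P = pi.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])
h = np.array([1.0, -1.0])
pi_h = pi @ h                        # integral of h d pi = 1/3

n = 200_000
u = rng.random(n)
x, total = 0, 0.0
for k in range(n):
    x = 1 if u[k] < P[x, 1] else 0   # move to 1 with probability P(x, 1)
    total += h[x]
avg = total / n                      # (1/n) sum_k h(X_k)  ->  pi h
dev = np.sqrt(n) * (avg - pi_h)     # rescaled deviation, O(1)
```

The average concentrates near πh while the √n-scaled deviation fluctuates on a fixed scale, which is the behavior the central limit theorems below quantify.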

All processes we consider will take values in a complete separable metric space (E, r). They will be temporally homogeneous and Markov in discrete or continuous time. In discrete time, the transition function will be denoted by P(x, Γ), that is, there is a filtration \(\{\mathcal{F}_{k}\}\) such that the process of interest \(X =\{ X_{k},k = 0,1,\ldots \}\) satisfies

$$\displaystyle{ P\{X_{k+1} \in \varGamma \vert \mathcal{F}_{k}\} = P(X_{k},\varGamma ),\quad k = 0,1,\ldots,\varGamma \in \mathcal{B}(E), }$$
(6.3)

where \(\mathcal{B}(E)\) denotes the Borel subsets of E. The filtration may be larger than the filtration generated by {X k }. When X and \(\{\mathcal{F}_{k}\}\) satisfy (6.3), we will say that X is \(\{\mathcal{F}_{k}\}\)-Markov with transition function P.

In continuous time, the transition function will be denoted by P(t, x, Γ) and there will be a filtration \(\{\mathcal{F}_{t}\}\) such that the process {X(t), t ≥ 0} satisfies

$$\displaystyle{ P\{X(s + t) \in \varGamma \vert \mathcal{F}_{s}\} = P(t,X(s),\varGamma ),\quad s,t \geq 0,\varGamma \in \mathcal{B}(E). }$$
(6.4)

Setting

$$\displaystyle{T(t)f(x) =\int _{E}f(y)P(t,x,dy),\quad f \in B(E),}$$

where B(E) is the space of bounded, Borel measurable functions on E, the Markov property implies {T(t)} is a semigroup, that is

$$\displaystyle{T(s)T(t)f = T(s + t)f.}$$

The semigroup can (and will) be defined for larger classes of functions as convenient.

The notion of an operator A being a generator for a Markov process can be defined in a variety of ways, but essentially always implies

$$\displaystyle{T(t)f = f +\int _{ 0}^{t}T(s)Afds,}$$

which in turn implies

$$\displaystyle{ f(X(t)) - f(X(0)) -\int _{0}^{t}Af(X(s))ds }$$
(6.5)

is a martingale for any filtration satisfying (6.4).

The analog in discrete time to the continuous-time semigroup is obtained by defining the linear operator

$$\displaystyle{Pf(x) =\int _{E}f(y)P(x,dy)}$$

and observing that

$$\displaystyle{E[f(X_{k+n})\vert \mathcal{F}_{k}] = P^{n}f(X_{ k}),}$$

and that

$$\displaystyle{ M_{n} = f(X_{n}) - f(X_{0}) -\sum _{k=0}^{n-1}(Pf(X_{ k}) - f(X_{k})) }$$
(6.6)

is a martingale for every f ∈ B(E). Consequently,

$$\displaystyle{A = P - I}$$

in discrete time plays the role of the generator in continuous time. The martingale properties (6.5) and (6.6) are central to the study of Markov processes and are the basis for the central limit theorems that Rabi and others have given.

By the initial distribution of a Markov process, we mean the distribution of X 0 in the discrete case and of X(0) in the continuous-time case. The finite dimensional distributions of a Markov process are determined by its initial distribution and its transition function. If we want to emphasize the initial distribution μ of the process, we will write \(\{X_{k}^{\mu }\}\) or {X μ(t)}.

The following lemma will prove useful in studying discrete time Markov processes.

Lemma 1.

Let P(x,Γ) be a transition function on E. There exists a measurable space \((U,\mathcal{U})\) , a measurable mapping \(\alpha: U \times E \rightarrow E\) , and a probability distribution ν on \((U,\mathcal{U})\) such that if ξ has distribution ν, then α(ξ,x) has distribution P(x,⋅).

Consequently, if X 0 has distribution \(\mu \in \mathcal{P}(E)\) and \(\xi _{1},\xi _{2},\ldots\) is a sequence of independent, ν-distributed, U-valued random variables that is independent of X 0 , then for \(\mathcal{F}_{k} =\sigma (X_{0},\xi _{1},\ldots,\xi _{k})\) , {X k } defined recursively by

$$\displaystyle{X_{k+1} =\alpha (\xi _{k+1},X_{k}),\quad k = 0,1,\ldots,}$$

is a \(\{\mathcal{F}_{k}\}\) -Markov process with initial distribution μ and transition function P(x,Γ).

Proof.

The construction in [8] gives α for ξ uniformly distributed on [0, 1] × [0, 1]. A slight modification allows ξ to be uniform on [0, 1].

Remark 1.

If the mapping \(x \in E \rightarrow P(x,\cdot ) \in \mathcal{P}(E)\) is continuous, taking the weak topology on \(\mathcal{P}(E)\), then α given by the Blackwell and Dubins construction has the property that for each x 0 ∈ E, the mapping \(x \in E \rightarrow \alpha (\xi,x)\) is almost surely continuous at x 0.
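For a finite state space, the α of Lemma 1 can be written down explicitly with ξ uniform on [0, 1] via inverse transform sampling. A minimal sketch (the three-state transition matrix is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Transition function on E = {0, 1, 2} as a stochastic matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
C = np.cumsum(P, axis=1)             # row-wise cumulative distributions

def alpha(u, x):
    """If u ~ Uniform[0,1], then alpha(u, x) has distribution P(x, .)."""
    return int(np.searchsorted(C[x], u, side="right"))

# Drive the recursion X_{k+1} = alpha(xi_{k+1}, X_k) of Lemma 1.
n = 100_000
X = np.empty(n + 1, dtype=int)
X[0] = 0
xi = rng.random(n)
for k in range(n):
    X[k + 1] = alpha(xi[k], X[k])

# Empirical transition frequencies should approximate P.
counts = np.zeros((3, 3))
for x, y in zip(X[:-1], X[1:]):
    counts[x, y] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
```

The empirical frequencies `P_hat` recover P, confirming that the random-mapping recursion reproduces the chain.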

The next section reviews ideas of ergodicity of Markov processes and gives some of the basic results. The final section considers central limit theorems exploiting the martingale properties mentioned above. We assume that all continuous-time Markov processes considered are cadlag.

2 Ergodicity for Markov Processes

Ideas of ergodicity for Markov processes all relate to the existence of stationary distributions for the processes. In discrete time, \(\pi \in \mathcal{P}(E)\) is a stationary distribution if

$$\displaystyle{\int _{E}P(x,\varGamma )\pi (dx) =\pi (\varGamma ),\quad \varGamma \in \mathcal{B}(E),}$$

and in continuous time, if

$$\displaystyle{\int _{E}T(t)f(x)\pi (dx) =\int _{E}f(x)\pi (dx),\quad f \in B(E),t \geq 0,}$$

which is equivalent to requiring

$$\displaystyle{\int _{E}Af(x)\pi (dx) = 0}$$

for a sufficiently large class of f.

If π is a stationary distribution and we take π to be the initial distribution for the process, then \(\{X_{k}^{\pi }\}\) (or {X π(t)}) will be a stationary process. If \(\{X_{k}^{\pi }\}\) is ergodic as defined generally for stationary processes, that is, the tail σ-field

$$\displaystyle{\mathcal{T} = \cap _{n}\sigma (X_{k}^{\pi },k \geq n)}$$

only contains events of probability zero or one, we will say that π is an ergodic stationary distribution. If there is only one stationary distribution, it must be ergodic. If π is an ergodic stationary distribution, then taking X = X π, (6.1) or (6.2) holds for all h ∈ B(E), or more generally, for all h ∈ L 1(π).

The questions of existence and uniqueness of stationary distributions are among the fundamental questions in the study of Markov processes. If, as is typically the case,

$$\displaystyle{ P: C_{b}(E) \rightarrow C_{b}(E), }$$
(6.1)

or

$$\displaystyle{ T(t): C_{b}(E) \rightarrow C_{b}(E),\quad t \geq 0, }$$
(6.2)

where C b (E) is the space of bounded continuous functions on E, the proof of existence of a stationary distribution reduces to proving relative compactness of a sequence of probability measures.

Theorem 1.

Assume that {T(t)} satisfies ( 6.2 ) and for a corresponding Markov process {X μ (t)}, define a family of probability measures {ν t } by

$$\displaystyle\begin{array}{rcl} \nu _{t}f = E[\frac{1} {t}\int _{0}^{t}f(X(s))ds]& =& \frac{1} {t}\int _{0}^{t}E[f(X(s))]ds \\ & =& \frac{1} {t}\int _{0}^{t}\int _{ E}T(s)fd\mu ds,\quad f \in B(E).{}\end{array}$$
(6.3)

Then as \(t \rightarrow \infty\) , any weak limit point of {ν t } is a stationary distribution for {T(t)}.

Similarly, in the discrete-time case, if P satisfies ( 6.1 ), any weak limit point of {ν n } defined by

$$\displaystyle{ \nu _{n}f = \frac{1} {n}\sum _{k=0}^{n-1}E[f(X_{ k})] }$$
(6.4)

is a stationary distribution for P.

Proof.

Suppose \(t_{n} \rightarrow \infty\) and \(\{\nu _{t_{n}}\}\) converges weakly to π. Observe that for each f ∈ C b (E) and t > 0,

$$\displaystyle{ \frac{1} {t_{n}}\int _{t}^{t+t_{n} }E[f(X(s))]ds =\nu _{t_{n}}T(t)f}$$

has the same limit as \(\nu _{t_{n}}f\). Consequently, π f = π T(t)f, and π is a stationary distribution.

The proof in the discrete case is essentially the same.

The natural approach to proving the existence of such a convergent sequence \(\{\nu _{t_{n}}\}\) is to prove relative compactness of {ν t }. Since relative compactness in \(\mathcal{P}(E)\) is equivalent to tightness, we have the following.

Corollary 1.

Let E be compact. If {T(t)} satisfies ( 6.2 ), then there exists at least one stationary distribution for {T(t)}. Similarly, if P satisfies ( 6.1 ), then there exists at least one stationary distribution for P.

More generally, relative compactness is usually proved by obtaining a Lyapunov function for the process. In particular, we want to find a function \(\psi: E \rightarrow [0,\infty )\) such that for each a ≥ 0, the level set

$$\displaystyle{\varGamma _{a} =\{ x \in E:\psi (x) \leq a\}}$$

is compact and for some initial distribution μ,

$$\displaystyle{K \equiv \sup _{t\geq 0}E[\psi (X^{\mu }(t))] <\infty.}$$

It follows that

$$\displaystyle{P\{X^{\mu }(t)\notin \varGamma _{a}\} = P\{\psi (X^{\mu }(t))> a\} \leq \frac{K} {a} }$$

and that

$$\displaystyle{\nu _{t}(\varGamma _{a}^{c}) \leq \frac{\nu _{t}\psi } {a} \leq \frac{K} {a},}$$

so {ν t } is tight and hence relatively compact.

The notion of a stochastic Lyapunov function was developed in [14] and reflects ideas dating back to [10] and [13]. There is a large literature on constructing such functions. In discrete time, we have the following simple condition.

Lemma 2.

Let \(\psi: E \rightarrow [0,\infty )\) . Suppose that there exist a ≥ 0 and 0 ≤ b < 1 such that

$$\displaystyle{ P\psi (x) \leq a + b\psi (x). }$$
(6.5)

Then for each n

$$\displaystyle{P^{n}\psi (x) \leq a\frac{1 - b^{n}} {1 - b} + b^{n}\psi (x),}$$

and hence for \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞,

$$\displaystyle{\sup _{n}E[\psi (X_{n}^{\mu })] \leq \frac{a} {1 - b} +\int _{E}\psi d\mu <\infty.}$$

Consequently, if ψ has compact level sets and P satisfies ( 6.1 ), then there exists at least one stationary distribution for P.
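To see Lemma 2 in action, consider the hypothetical linear chain \(X_{k+1} = cX_k + \xi_{k+1}\) with |c| < 1 and ξ standard normal. For ψ(x) = x², \(P\psi(x) = c^2x^2 + 1\), so (6.5) holds with a = 1 and b = c², in fact with equality, and the moment sequence can be iterated exactly:

```python
# Hypothetical linear chain X_{k+1} = c X_k + xi_{k+1}, xi ~ N(0, 1),
# with Lyapunov function psi(x) = x^2:
#   P psi(x) = E[(c x + xi)^2] = c^2 x^2 + 1,
# i.e. (6.5) with a = 1 and b = c^2 < 1, here with equality, so
# m_n = E[psi(X_n)] satisfies m_{n+1} = b m_n + a exactly.
c = 0.8
a, b = 1.0, c * c
psi0 = 5.0 ** 2                      # psi(X_0) for X_0 = 5

m, ms = psi0, [psi0]
for _ in range(200):
    m = b * m + a                    # exact recursion for E[psi(X_n)]
    ms.append(m)

# Bounds asserted in Lemma 2.
bound_n = [a * (1 - b**n) / (1 - b) + b**n * psi0 for n in range(201)]
sup_bound = a / (1 - b) + psi0
```

The iterates track the bound \(a(1-b^n)/(1-b) + b^n\psi(x)\) and stay below \(a/(1-b) + \psi(x_0)\), converging to a/(1 − b).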

The analogous result in the continuous-time case is somewhat more delicate. Rewriting (6.5) as

$$\displaystyle{(P - I)\psi (x) \leq a - (1 - b)\psi (x)}$$

and recalling that PI plays a role analogous to the generator A suggests looking for ψ satisfying

$$\displaystyle{A\psi (x) \leq a -\epsilon \psi (x),}$$

for some positive a and ε. To this point, we have only considered A to be defined so that f and Af are in B(E). For many Markov processes, for example, diffusions, the extension of the generator to a large class of unbounded ψ is clear, but even in the diffusion setting with smooth ψ, in general we can only claim that

$$\displaystyle{\psi (X(t)) -\psi (X(0)) -\int _{0}^{t}A\psi (X(s))ds}$$

is a local martingale, not a martingale. Note, however, that if ψ is bounded below and Aψ is bounded above, this local martingale will also be a supermartingale. With that observation in mind, the following lemma provides the desired extension.

Its proof is essentially an application of Fatou’s lemma.

Lemma 3.

For n = 1,2,…, let f n ,Af n ∈ B(E), and

$$\displaystyle{f_{n}(X(t)) - f_{n}(X(0)) -\int _{0}^{t}Af_{ n}(X(s))ds}$$

be a martingale. Suppose \(f_{n} \geq 0\), \(\sup _{n,x}Af_{n}(x) <\infty\), and for each x ∈ E, {f n (x)} and {Af n (x)} converge. Denote the limits by ψ and Aψ. Then

$$\displaystyle{ \psi (X(t)) -\psi (X(0)) -\int _{0}^{t}A\psi (X(s))ds }$$
(6.6)

is a supermartingale.

The supermartingale property is exactly what is needed to give the continuous-time analog of Lemma 2.

Lemma 4.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose

$$\displaystyle{A\psi (x) \leq a -\epsilon \psi (x)}$$

for some a ≥ 0 and ε > 0. Then

$$\displaystyle{ E[\psi (X^{\mu }(t))] \leq \frac{a} {\epsilon } \vee \int _{E}\psi d\mu. }$$
(6.7)

Consequently, if ψ has compact level sets and {T(t)} satisfies ( 6.2 ), then there exists at least one stationary distribution for {T(t)}.

Proof.

Let Z μ denote the supermartingale. Then

$$\displaystyle\begin{array}{rcl} e^{\epsilon t}\psi (X^{\mu }(t))& =& \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}d\psi (X^{\mu }(s)) +\int _{ 0}^{t}\epsilon e^{\epsilon s}\psi (X^{\mu }(s))ds {}\\ & =& \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}dZ^{\mu }(s) +\int _{ 0}^{t}e^{\epsilon s}(\epsilon \psi (X^{\mu }(s)) + A\psi (X^{\mu }(s)))ds {}\\ & \leq & \psi (X^{\mu }(0)) +\int _{ 0}^{t}e^{\epsilon s}dZ^{\mu }(s) +\int _{ 0}^{t}e^{\epsilon s}ads. {}\\ \end{array}$$

Since \(E[\int _{0}^{t}e^{\epsilon s}dZ^{\mu }(s)] \leq 0\),

$$\displaystyle{E[\psi (X^{\mu }(t))] \leq e^{-\epsilon t}\int _{ E}\psi d\mu + \frac{a} {\epsilon } (1 - e^{-\epsilon t}),}$$

and the lemma follows.
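As a concrete check (a standard example, not taken from the text), take the Ornstein–Uhlenbeck process dX = −X dt + dW. With ψ(x) = x², the generator gives Aψ(x) = 1 − 2x², that is, a = 1 and ε = 2; the second moment is known in closed form and here matches the bound in the proof exactly, since Aψ = a − εψ with equality:

```python
import numpy as np

# Ornstein-Uhlenbeck process dX = -X dt + dW:  A f(x) = -x f'(x) + f''(x)/2.
# With psi(x) = x^2, A psi(x) = 1 - 2 x^2, i.e. a = 1 and eps = 2, and
# E[psi(X^x(t))] = x^2 e^{-2t} + (1 - e^{-2t})/2 in closed form.
a, eps = 1.0, 2.0

def second_moment(x0, t):
    """Closed-form E[X(t)^2] for the OU process started at x0."""
    return x0**2 * np.exp(-2 * t) + 0.5 * (1 - np.exp(-2 * t))

ts = np.linspace(0.0, 10.0, 401)
results = {}
for x0 in (0.1, 3.0):
    vals = second_moment(x0, ts)
    # bound from the proof: e^{-eps t} psi(x0) + (a/eps)(1 - e^{-eps t});
    # for this example it coincides with the closed form
    proof_bound = np.exp(-eps * ts) * x0**2 + (a / eps) * (1 - np.exp(-eps * ts))
    crude = max(a / eps, x0**2)      # right-hand side of (6.7)
    results[x0] = (vals, proof_bound, crude)
```

For both a small and a large starting point, E[ψ(Xᵡ(t))] never exceeds (a/ε) ∨ ψ(x₀), as (6.7) asserts.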

We can relax the conditions of Lemma 4 and still obtain relative compactness of {ν t } but without the moment estimate (6.7).

Lemma 5.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and K = supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose that for each a > 0

$$\displaystyle{\varGamma _{a} =\{ x: A\psi (x) \geq -a\}}$$

is compact. Then {ν t } defined by ( 6.3 ) is relatively compact, and if {T(t)} satisfies ( 6.2 ), there exists at least one stationary distribution for {T(t)}.

If E is locally compact and {T(t)} satisfies ( 6.2 ), then existence of a stationary distribution holds as long as Γ a is compact for some a > 0.

Remark 2.

The assumption that Γ a is compact only for some a > 0 is, in general, not enough to ensure relative compactness of {ν t }. If, however, the process is Harris recurrent (see Section 6.2.2), then existence of a stationary distribution implies convergence of {ν t }.

Proof.

The supermartingale property implies

$$\displaystyle{-\int _{E}A\psi d\nu _{t} = -\frac{1} {t}E[\int _{0}^{t}A\psi (X(s))ds] \leq \frac{1} {t}\int _{E}\psi d\mu -\frac{1} {t}E[\psi (X(t))] \leq \frac{1} {t}\int _{E}\psi d\mu,}$$

and for a > 0,

$$\displaystyle{a\nu _{t}(\varGamma _{a}^{c}) \leq K + \frac{1} {t}\int _{E}\psi d\mu,}$$

giving tightness and hence relative compactness for {ν t }.

The second part of the lemma follows by the observation that {ν t } is relatively compact as a probability measure on the one-point compactification of E and the compactness of Γ a for some a > 0 implies that any limit point ν satisfies ν (Γ a ) > 0 and hence ν (E) > 0. Normalizing the restriction of ν to E to be a probability measure gives a stationary distribution for {T(t)}. See Theorem 4.9.9 of [9].

The following lemma gives conditions which, coupled with some kind of irreducibility, imply recurrence, but not necessarily positive recurrence.

Lemma 6.

Let measurable functions \(\psi,A\psi: E \rightarrow \mathbb{R}\) satisfy ψ ≥ 0 and K = supx∈E Aψ(x) < ∞. For \(\mu \in \mathcal{P}(E)\) satisfying ∫ E ψdμ < ∞, assume that ( 6.6 ) with X replaced by X μ is a supermartingale. Suppose that for each a > 0,

$$\displaystyle{\varGamma _{a} =\{ x:\psi (x) \leq a\}}$$

is compact and that there exists a 0 such that

$$\displaystyle{\sup _{x\in \varGamma _{a_{ 0}}^{c}}A\psi \leq 0.}$$

Let \(\tau _{0} =\inf \{ t \geq 0: X^{\mu }(t) \in \varGamma _{a_{0}}\}\) and \(\gamma _{a} =\inf \{ t \geq 0: X^{\mu }(t)\notin \varGamma _{a}\}\) . Then

$$\displaystyle{ \lim _{a\rightarrow \infty }P\{\gamma _{a} \leq \tau _{0}\} = 0. }$$
(6.8)

Proof.

It is at least not immediately obvious that \(\gamma _{a} <\infty\) implies ψ(X μ(γ a )) ≥ a, so some randomization may be necessary for a complete proof, but assuming this inequality holds, the supermartingale property implies

$$\displaystyle{aP\{\gamma _{a} \leq \tau _{0}\} \leq E[\psi (X^{\mu }(\gamma _{a} \wedge \tau _{0}))] \leq \int _{E}\psi d\mu,}$$

and (6.8) follows.

Example 1.

In [3], Rabi gives a class of ψ of the form \(\psi (x) = F(\vert x - z\vert )\) for nondegenerate diffusion processes which satisfy the conditions of Lemma 6. (Actually, in Rabi’s notation, we need to set \(\psi (x) = -F(\vert x - z\vert )\).) The non-degeneracy assumption then ensures Harris recurrence (see below). He also formulates similar conditions that imply transience and gives a construction of an F such that \(\psi (x) = -F(\vert x - z\vert )\) satisfies the conditions of the second part of Lemma 5.

A central idea in the study of uniqueness of stationary distributions is the notion of Harris recurrence.

2.1 Harris Recurrence

Harris irreducibility requires the existence of a measure \(\varphi\) on \(\mathcal{B}(E)\) such that \(\varphi (B)> 0\) implies that the Markov process visits B with positive probability, regardless of the initial distribution. If the process visits every such B infinitely often with probability one (in the continuous-time case, if the process visits B at arbitrarily large times, that is, τ n  = inf{t > n: X(t) ∈ B} is finite almost surely for each n), the process is Harris recurrent. As long as \(\varphi\) is σ-finite, without loss of generality, we can and will assume \(\varphi\) is a probability measure. In discrete time, the classical conditions for Harris recurrence can be formulated under the assumption that there exists a function \(\varepsilon: E \rightarrow [0,1]\) such that the transition function satisfies

$$\displaystyle{ P(x,B) \geq \varepsilon (x)\varphi (B) }$$
(6.9)

and that for each initial condition μ, the Markov process satisfies

$$\displaystyle{ P\{\sum _{k=1}^{\infty }\varepsilon (X_{ k}^{\mu }) = \infty \} = 1. }$$
(6.10)

The following lemma illustrates the significance of these conditions.

Lemma 7.

Let \(\mu \in \mathcal{P}(E)\) , and suppose that ( 6.9 ) and ( 6.10 ) hold. Then there exists a probability space with a process X μ , a filtration \(\{\mathcal{F}_{k}^{\mu }\}\) , and a \(\{\mathcal{F}_{k}^{\mu }\}\) -stopping time τ μ such that X μ is \(\{\mathcal{F}_{k}^{\mu }\}\) -Markov with initial distribution μ and transition function P(x,Γ) and the distribution of \(X_{\tau ^{\mu }}^{\mu }\) is \(\varphi\) .

Proof.

We enlarge the state space to be E ×{−1, 1} and define the new transition function by

$$\displaystyle{Q(x,\theta,\varGamma \times \{\theta \}) = P(x,\varGamma ) -\varepsilon (x)\varphi (\varGamma )}$$

and

$$\displaystyle{Q(x,\theta,\varGamma \times \{-\theta \}) =\varepsilon (x)\varphi (\varGamma ).}$$

If (X μ, Θ) is a Markov process with this transition function such that X 0 μ has distribution μ, then X μ is a Markov process with transition function P(x, Γ) and initial distribution μ, and the desired stopping time is \(\tau ^{\mu } =\min \{ k:\theta _{k}\neq \theta _{k-1}\}\). Note that

$$\displaystyle{P\{\tau ^{\mu }> n\} = E[\prod _{k=0}^{n-1}(1 -\varepsilon (X_{ k}^{\mu }))]}$$

and (6.10) implies \(P\{\tau ^{\mu } <\infty \} = 1\).
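The split-chain construction can be simulated directly. In the sketch below (the matrix, φ, and the choice ε(x) ≡ 1/2 are hypothetical, chosen so that (6.9) holds), each step flips θ with probability ε(X_k), in which case the next state is drawn from φ; the state at the first flip is then a sample from φ, as in Lemma 7:

```python
import numpy as np

rng = np.random.default_rng(2)

# A chain on {0, 1, 2} with minorization P(x, .) >= eps(x) phi(.):
# here eps(x) = 0.5 and 0.5 * phi is dominated entrywise by every row of P.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
phi = np.array([0.4, 0.3, 0.3])
eps = np.array([0.5, 0.5, 0.5])
R = (P - eps[:, None] * phi) / (1 - eps)[:, None]   # residual kernel

def split_step(x):
    """One step of the split chain; True marks a theta-flip (regeneration)."""
    if rng.random() < eps[x]:
        return int(rng.choice(3, p=phi)), True      # draw from phi
    return int(rng.choice(3, p=R[x])), False        # draw from residual

# The state at the first flip time tau should have distribution phi.
samples = []
for _ in range(20_000):
    x, flipped = 0, False
    while not flipped:
        x, flipped = split_step(x)
    samples.append(x)
freq = np.bincount(samples, minlength=3) / len(samples)
```

The empirical distribution of \(X_\tau\) matches φ, which is the regeneration property the renewal arguments below exploit.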

Much of the work on Harris recurrence is done under weaker conditions of the form

$$\displaystyle{\sum _{n=1}^{\infty }a_{ n}(x)P^{n}(x,\varGamma ) \geq \varepsilon (x)\varphi (\varGamma ),}$$

where a n (x) ≥ 0, \(\sum _{n=1}^{\infty }a_{n}(x) = 1\), or in continuous time,

$$\displaystyle{\int _{0}^{\infty }P(t,x,\varGamma )a_{ x}(dt) \geq \varepsilon (x)\varphi (\varGamma ),}$$

where a x is a probability distribution on \((0,\infty )\), and typically, \(\varepsilon (x)\) has the form \(\varepsilon \mathbf{1}_{C}(x)\) for some constant \(\varepsilon> 0\) and \(C \in \mathcal{B}(E)\). The analog of Lemma 7 holds under these conditions, at least if (6.10) is replaced by

$$\displaystyle{P\{\sum _{k=1}^{\infty }\varepsilon (X_{ k}^{\mu })^{2} = \infty \} = 1\mbox{ or }P\{\int _{ 0}^{\infty }\varepsilon (X^{\mu }(s))^{2}ds = \infty \} = 1.}$$

The existence of these stopping times implies the desired uniqueness of the stationary distribution and convergence in total variation of ν n and ν t .

Lemma 8.

Let \(\varphi \in \mathcal{P}(E)\) . Suppose that for each \(\mu \in \mathcal{P}(E)\) , on some probability space, there exists a process X μ , a filtration \(\{\mathcal{F}_{k}^{\mu }\}\) , and a \(\{\mathcal{F}_{k}^{\mu }\}\) -stopping time τ μ such that X μ is \(\{\mathcal{F}_{k}^{\mu }\}\) -Markov with initial distribution μ and transition function P and \(X_{\tau ^{\mu }}^{\mu }\) has distribution \(\varphi\) . Then there is at most one stationary distribution for P.

If there is a stationary distribution π, then for each initial distribution \(\mu \in \mathcal{P}(E)\) , {ν n } defined by ( 6.4 ) converges in total variation to π.

The analogous result holds in continuous time.

Proof.

Suppose π 1 and π 2 are stationary distributions for P. Let \(X^{\pi _{1}}\) and \(X^{\pi _{2}}\) satisfy the hypotheses of the lemma. By the ergodic theorem, for each h ∈ B(E), we can define

$$\displaystyle{\;H_{\pi _{i}}^{h} =\lim _{ n\rightarrow \infty }\frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\pi _{i} }) =\lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=\tau ^{\pi _{i}}}^{\tau ^{\pi _{i}}+n-1 }h(X_{k}^{\pi _{i} }) =\lim _{n\rightarrow \infty }\frac{1} {n}\sum _{k=0}^{n-1}h(X_{\tau ^{\pi _{ i}}+k}^{\pi _{i} }).}$$

By the strong Markov property, \(H_{\pi _{1}}^{h}\) and \(H_{\pi _{2}}^{h}\) must have the same distribution. Since \(\pi _{i}h = E[H_{\pi _{i}}^{h}]\), π 1 = π 2.

Under the hypotheses of the lemma, for h ∈ B(E) and \(\mu \in \mathcal{P}(E)\),

$$\displaystyle{\vert E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\mu })] - E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\varphi })]\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> k\} + \frac{2k} {n} ),}$$

and hence

$$\displaystyle{ \vert E[ \frac{1} {n}\sum _{k=0}^{n-1}h(X_{ k}^{\mu })] -\pi h\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> k\} + P\{\tau ^{\pi }> k\} + \frac{4k} {n} ), }$$
(6.11)

and taking the sup over h ∈ B(E) with \(\Vert h\Vert \leq 1\) gives the convergence in total variation.

If \(\tau ^{\varphi }\) satisfies \(0 <E[\tau ^{\varphi }] <\infty\), then τ 1 = 0 and \(\tau _{2} =\tau ^{\varphi }\) provide an example of τ 1 and τ 2 in the following lemma.

Lemma 9.

Let X be \(\{\mathcal{F}_{k}\}\) -Markov with transition function P, and let τ 1 and τ 2 be stopping times satisfying τ 1 ≤τ 2 and \(0 <E[\tau _{2} -\tau _{1}] <\infty\) such that \(X_{\tau _{1}}\) and \(X_{\tau _{2}}\) have the same distribution. Then π defined by

$$\displaystyle{\pi h = \frac{E[\sum _{k=\tau _{1}+1}^{\tau _{2}}h(X_{k})]} {E[\tau _{2} -\tau _{1}]} }$$

is a stationary distribution for P.

In continuous time,

$$\displaystyle{\pi h = \frac{E[\int _{\tau _{1}}^{\tau _{2}}h(X(s))ds]} {E[\tau _{2} -\tau _{1}]}.}$$

Remark 3.

In the case \(0 <E[\tau ^{\varphi }] <\infty\), this observation is essentially the renewal argument of [1] and [16].

Proof.

Since

$$\displaystyle{M_{n} = h(X_{n}) - h(X_{0}) -\sum _{k=0}^{n-1}(Ph(X_{ k}) - h(X_{k}))}$$

is a martingale,

$$\displaystyle{0 = E[h(X_{\tau _{2}}) - h(X_{\tau _{1}})] = E[\sum _{k=\tau _{1}}^{\tau _{2}-1}(Ph(X_{ k}) - h(X_{k}))],}$$

and hence,

$$\displaystyle{\pi Ph =\pi h,}$$

so π is a stationary distribution for P.

Example 2.

In [6], Rabi and Mukul Majumdar consider processes in E = (0, 1) of the form

$$\displaystyle{X_{n+1} =\xi _{n+1}X_{n}(1 - X_{n}),}$$

where the {ξ k } are iid with values in (0, 4). Clearly, this process satisfies (6.1). Under the assumption that the distribution of ξ has an absolutely continuous part with a density that is strictly positive on some interval, they give conditions for Harris recurrence.
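A quick numerical sketch of this example (the choice of ξ uniform on (0, 4) is ours, for illustration only): time averages of h(x) = x computed from different starting points agree, consistent with the time-average limit (6.1) for a common π:

```python
import numpy as np

rng = np.random.default_rng(3)

# Random logistic maps: X_{n+1} = xi_{n+1} X_n (1 - X_n) with xi iid
# uniform on (0, 4); the state stays in (0, 1).  If (6.1) holds with a
# common pi, time averages from different starting points must agree.
def time_average(x0, n):
    x, total = x0, 0.0
    for xi in rng.uniform(0.0, 4.0, size=n):
        x = xi * x * (1.0 - x)
        total += x
    return total / n

n = 500_000
avg1 = time_average(0.1, n)   # started near 0
avg2 = time_average(0.9, n)   # started near 1
```

The two long-run averages coincide to within Monte Carlo error, which is the behavior Harris recurrence guarantees.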

Example 3.

The inequality in (6.11) and the analogous inequality in continuous time,

$$\displaystyle{ \vert E[\frac{1} {t}\int _{0}^{t}h(X^{\mu }(s))ds] -\pi h\vert \leq \Vert h\Vert (P\{\tau ^{\mu }> r\} + P\{\tau ^{\pi }> r\} + \frac{4r} {t} ), }$$
(6.12)

actually give rates of convergence. Under aperiodicity assumptions, one can replace the average by E[h(X μ(t))] and eliminate the O(t −1) term. In [7], Rabi and Aramian Wasielak give conditions under which this can be done for a class of diffusion processes.

2.2 Conditions without Harris Recurrence

Harris recurrence is very useful when it holds, or perhaps more to the point, when it can be shown that it holds. In general, it does not hold, even in relatively simple settings. Perhaps the best known example is the “Markov process” in [0, 1) given by the recursion

$$\displaystyle{X_{n+1} = X_{n} + z\mbox{ mod }1,}$$

for some irrational z.
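A short computation makes the point concrete: orbits of the rotation started from distinct points never meet, so no nontrivial minorizing measure is possible, yet time averages of continuous h still converge, here to \(\int_0^1 \cos(2\pi x)dx = 0\) by Weyl equidistribution:

```python
import math

# Rotation by an irrational z on [0, 1): deterministic dynamics, so
# P(x, .) is a point mass and paths from distinct points never meet
# (no Harris recurrence), yet Cesaro averages of continuous h converge
# to the Lebesgue integral by Weyl equidistribution.
z = math.sqrt(2) - 1

def time_average(x0, n):
    x, total = x0, 0.0
    for _ in range(n):
        x = (x + z) % 1.0
        total += math.cos(2 * math.pi * x)
    return total / n

avg = time_average(0.3, 100_000)   # target: integral of cos(2 pi x) dx = 0
```

The partial sums are bounded (a geometric sum of unimodular numbers), so the average is O(1/n) even though the process is not Harris recurrent.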

For an example with more interesting probabilistic structure, let \(E =\{ -1,1\}^{\infty }\), and consider a generator of the form

$$\displaystyle{ Af(x) =\sum _{ k=1}^{\infty }\lambda _{ k}(f(\eta _{k}x) - f(x)), }$$
(6.13)

where \(\lambda _{k}> 0\), \(\sum _{k}\lambda _{k} <\infty\), and η k x is obtained from x by replacing x k by − x k . If x, y ∈ E differ on infinitely many components, then P(t, x, ⋅ ) and P(t, y, ⋅ ) are mutually singular for all t, but for all x ∈ E, P(t, x, ⋅ ) converges weakly to the distribution under which the components are independent symmetric Bernoulli.

In general, infinite dimensional processes provide a source of examples that are not Harris recurrent even if ergodic. We will not address any more examples of this type, but see [11] for recent work in this direction.

There is a need for techniques for studying ergodicity for processes that are not Harris recurrent. One approach that appears frequently in Rabi’s work involves the notion of splitting and is discussed in the paper by Ed Waymire in this volume. A second approach considered by Rabi and Gopal Basak in [2] is by verifying asymptotic flatness, that is, by showing that X x and X y can be coupled in such a way that for each compact K ⊂ E and \(\varepsilon> 0\),

$$\displaystyle{\lim _{t\rightarrow \infty }\sup _{x,y\in K}P\{\vert X^{x}(t) - X^{y}(t)\vert>\varepsilon \}= 0.}$$

For example, if one rewrites the generator in (6.13) as

$$\displaystyle{ Af(x) =\sum _{ k=1}^{\infty }2\lambda _{ k}(\frac{1} {2}f(\eta _{k}^{1}x) + \frac{1} {2}f(\eta _{k}^{-1}x) - f(x)), }$$
(6.14)

where η k 1 x is obtained from x by replacing x k by 1 and η k −1 x is obtained by replacing x k by − 1, then the coupling can be obtained using independent Poisson processes {N k }, N k with intensity 2λ k , and at the lth jump of N k replacing x k by ξ kl , where the {ξ kl } are independent symmetric Bernoulli.

Example 4.

In [2], Rabi and Gopal Basak consider diffusions of the form

$$\displaystyle{X^{x}(t) = x +\int _{ 0}^{t}BX^{x}(s)ds +\int _{ 0}^{t}\sigma (X^{x}(s))dW(s).}$$

One has a natural coupling simply by using the same Brownian motion W for both X x and X y. Lyapunov-type arguments are again employed but with analytic estimates rather than simply compactness arguments. In particular, the arguments employ ψ (v in the notation of the paper) of the form

$$\displaystyle{\psi (x) = (x \cdot Cx)^{1-\varepsilon },}$$

for appropriately chosen positive definite C and \(\varepsilon \in [0,1)\). Different choices of C are applied to ψ(X x(t)) to ensure the existence of a stationary distribution and to \(\psi (X^{x}(t) - X^{y}(t))\) to give the asymptotic flatness.
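The effect of the synchronous coupling is easy to see in a scalar sketch (B = −1 and constant σ, a hypothetical special case, not the setting of [2]): with the same Brownian increments driving both paths, the difference Xˣ − Xʸ decays deterministically:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synchronous coupling for dX = -X dt + dW: both Euler paths use the
# SAME Brownian increments, so d(X^x - X^y) = -(X^x - X^y) dt and the
# gap contracts at rate e^{-t} -- asymptotic flatness.
dt, T = 0.01, 10.0
x, y = 1.0, -1.0
for _ in range(int(T / dt)):
    dW = rng.normal(0.0, np.sqrt(dt))   # shared noise increment
    x += -x * dt + dW
    y += -y * dt + dW
gap = abs(x - y)                         # about |1 - (-1)| * e^{-T}
```

With state-dependent σ the cancellation is no longer exact, which is where the Lyapunov-type estimates of [2] come in.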

3 Central Limit Theorems

There are many versions of the martingale central limit theorem. See, for example, [15, 17, 12]. The following version is from Theorem 7.1.4 of [9].

Theorem 2.

Let {M n } be a sequence of cadlag, \(\mathbb{R}^{d}\) -valued martingales, with M n (0) = 0, and let A n = [M n ] be the matrix of covariations, that is,

$$\displaystyle{A_{n}^{ij}(t) = [M_{ n}^{i},M_{ n}^{j}]_{ t}.}$$

Suppose that for each t ≥ 0,

$$\displaystyle{\lim _{n\rightarrow \infty }E[\sup _{s\leq t}\vert M_{n}(s) - M_{n}(s-)\vert ] = 0}$$

and

$$\displaystyle{\lim _{n\rightarrow \infty }A_{n}(t) = A(t),}$$

where A is deterministic and continuous. Then {M n } converges in distribution to a Gaussian process M such that M has independent increments, E[M(t)] = 0, and \([M^{i},M^{j}]_{t} = E[M^{i}(t)M^{j}(t)] = A^{ij}(t)\) .

If

$$\displaystyle{A(t) =\int _{ 0}^{t}\sigma (s)\sigma (s)^{T}ds,}$$

for some d × m-matrix-valued function σ, then we can write

$$\displaystyle{M(t) =\int _{ 0}^{t}\sigma (s)dW(s),}$$

where W is an \(\mathbb{R}^{m}\) -valued standard Brownian motion.

Example 5.

Let π be an ergodic stationary distribution for a Markov semigroup {T(t)}. Then {T(t)} extends to L 2(π) and is strongly continuous on L 2(π). Let A be the Hille-Yosida generator for the semigroup on L 2(π). Then for each \(f \in \mathcal{D}(A)\), the domain of A,

$$\displaystyle{M^{f}(t) = f(X^{\pi }(t)) - f(X^{\pi }(0)) -\int _{ 0}^{t}Af(X^{\pi }(s))ds}$$

is a square integrable martingale.

Then, for h ∈ L 2(π), ergodicity implies

$$\displaystyle{\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}h(X^{\pi }(s))ds =\pi ht,}$$

and Theorem 2.1 of [4] gives the functional central limit theorem for the scaled deviations,

$$\displaystyle{Z_{n}^{h}(t) = \frac{1} {\sqrt{n}}\int _{0}^{nt}(h(X^{\pi }(s)) -\pi h)ds.}$$

The key assumption is that there exists \(f \in \mathcal{D}(A)\) such that \(Af = h -\pi h\). Then

$$\displaystyle{Z_{n}^{h}(t) = \frac{1} {\sqrt{n}}(f(X^{\pi }(nt)) - f(X^{\pi }(0))) - \frac{1} {\sqrt{n}}M^{f}(nt).}$$

Consequently, we have the functional central limit theorem for \(\{Z_{n}^{h}\}\) provided we can prove the functional central limit theorem for \(\{ \frac{1} {\sqrt{n}}M^{f}(n\cdot )\}\). Observe that the quadratic variation of \(\frac{1} {\sqrt{n}}M^{f}(n\cdot )\) is the same as the quadratic variation of \(U_{n}(t) = \frac{1} {\sqrt{n}}f(X^{\pi }(nt))\) and that by Itô’s formula,

$$\displaystyle\begin{array}{rcl} [U_{n}]_{t}& =& U_{n}(t)^{2} - U_{n}(0)^{2} - \frac{1} {n}\int _{0}^{t}2f(X^{\pi }(ns-))df(X^{\pi }(ns)) \\ & =& U_{n}(t)^{2} - U_{n}(0)^{2} - \frac{1} {n}\int _{0}^{t}2f(X^{\pi }(ns-))dM^{f}(ns) -\int _{ 0}^{t}2f(X^{\pi }(ns))Af(X^{\pi }(ns))ds \\ & \rightarrow & -t\int _{E}2f(x)Af(x)\pi (dx). {}\end{array}$$
(6.1)

By Theorem 2, the convergence of \(Z_{n}^{h}\) follows. Of course, under the assumptions of Lemma 8, the same result will hold for X μ for all \(\mu \in \mathcal{P}(E)\).

If f is smooth and X π is an \(\mathbb{R}^{d}\)-valued diffusion,

$$\displaystyle{X^{\pi }(t) = X^{\pi }(0) +\int _{ 0}^{t}\sigma (X^{\pi }(s))dW(s) +\int _{ 0}^{t}b(X^{\pi }(s))ds,}$$

then

$$\displaystyle{f(X^{\pi }(t)) = f(X^{\pi }(0)) +\int _{ 0}^{t}\nabla f(X^{\pi }(s))^{T}\sigma (X^{\pi }(s))dW(s) + R(t),}$$

where R is continuous with finite variation, so we can also write

$$\displaystyle\begin{array}{rcl} [U_{n}]_{t}& =& \frac{1} {n}\int _{0}^{nt}\nabla f(X^{\pi }(s))^{T}\sigma (X^{\pi }(s))\sigma (X^{\pi }(s))^{T}\nabla f(X^{\pi }(s))ds \\ & \rightarrow & t\int _{\mathbb{R}^{d}}\nabla f(x)^{T}\sigma (x)\sigma (x)^{T}\nabla f(x)\pi (dx). {}\end{array}$$
(6.2)
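The program of Example 5 can be carried out explicitly for a two-state discrete-time chain (the numbers below are hypothetical): solve the Poisson equation f − Pf = h − πh, so that the centered sums are a martingale plus a vanishing boundary term, and compare the predicted asymptotic variance π(f²) − π((Pf)²) of the martingale increments with a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(5)

# Discrete-time two-state stand-in for Example 5 (hypothetical numbers).
# With f solving f - Pf = h - pi h, the scaled sums S_n / sqrt(n) are a
# martingale plus O(1/sqrt(n)), with asymptotic variance
#     sigma^2 = pi(f^2) - pi((Pf)^2).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])                   # solves pi P = pi
h = np.array([1.0, 0.0])
hbar = h - pi @ h

# Fundamental-matrix solution of the Poisson equation (pi f = 0).
Pi = np.outer(np.ones(2), pi)
f = np.linalg.solve(np.eye(2) - P + Pi, hbar)
sigma2 = pi @ f**2 - pi @ (P @ f)**2        # predicted asymptotic variance

# Monte Carlo: many stationary paths, each contributing S_n / sqrt(n).
n, reps = 2000, 2000
x = (rng.random(reps) < pi[1]).astype(int)  # X_0 ~ pi for each path
s = hbar[x].copy()
for _ in range(n - 1):
    x = (rng.random(reps) < P[x, 1]).astype(int)
    s += hbar[x]
emp_var = (s / np.sqrt(n)).var()
```

The empirical variance of the scaled sums agrees with σ², illustrating how the Poisson equation converts the ergodic average into a martingale problem.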

Example 6.

In [5], Rabi considers diffusions in \(\mathbb{R}^{d}\) of the form

$$\displaystyle{X(t) = X(0) +\int _{ 0}^{t}u_{ 0}b(X(s))ds +\int _{ 0}^{t}\sigma (X(s))dW(s),}$$

under the assumption that σ is the square root of a positive definite matrix and σ and b are periodic in the sense that

$$\displaystyle{\sigma (x + z) =\sigma (x)\mbox{ and }b(x + z) = b(x)\quad z \in \mathbb{Z}^{d}.}$$

At least under additional regularity assumptions on σ and b, Y (t) = X(t)  mod 1, \(\mathbf{1} \in \mathbb{Z}^{d}\), the vector with each component 1, is a Markov process in [0, 1)d which has a unique, ergodic stationary distribution π. Then

$$\displaystyle{\lim _{n\rightarrow \infty }\frac{1} {n}X(nt) =\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}u_{ 0}b(X(s))ds =\lim _{n\rightarrow \infty }\frac{1} {n}\int _{0}^{nt}u_{ 0}b(Y (s))ds = u_{0}\bar{b}t,}$$

where \(\bar{b} =\pi b\). Rabi gives the corresponding central limit theorem showing the convergence of

$$\displaystyle{V _{n}(t) = \frac{1} {\sqrt{n}}(X(nt) - nu_{0}\bar{b}t).}$$

For simplicity, assume X(0) = 0. Setting

$$\displaystyle{M_{1}^{n}(t) = \frac{1} {\sqrt{n}}(X(nt) -\int _{0}^{nt}u_{ 0}b(Y (s))ds) = \frac{1} {\sqrt{n}}\int _{0}^{nt}\sigma (Y (s))dW(s),}$$

the convergence of \(M_{1}^{n}\) follows from Theorem 2 and the ergodicity of Y.

Note that \(V _{n} = M_{1}^{n} + Z^{n}\), where

$$\displaystyle{Z^{n}(t) = \frac{1} {\sqrt{n}}u_{0}\int _{0}^{nt}(b(Y (s)) -\bar{ b})ds.}$$

Then Z n is of the form treated in [4], Example 5. Let \(\hat{A}\) denote the generator for Y, which will satisfy \(\hat{A}f(x\mbox{ mod }\mathbf{1}) = Af(x)\), if f extends periodically to an element in the domain of A. Rabi shows the existence of a twice continuously differentiable g satisfying

$$\displaystyle{\hat{A}g(y) = b(y) -\bar{ b},}$$

and setting

$$\displaystyle{M_{2}^{n}(t) = \frac{u_{0}} {\sqrt{n}}(g(Y (nt)) - g(0) -\int _{0}^{nt}(b(Y (s)) -\bar{ b})ds),}$$

we have

$$\displaystyle{V _{n}(t) = M_{1}^{n}(t) - M_{ 2}^{n}(t) + \frac{u_{0}} {\sqrt{n}}(g(Y (nt)) - g(0)).}$$

Since

$$\displaystyle{M_{2}^{n}(t) = \frac{u_{0}} {\sqrt{n}}\int _{0}^{nt}\nabla g(X(s))^{T}\sigma (X(s))dW(s) + R_{ n}(t),}$$

where R n is continuous with finite variation,

$$\displaystyle{[M_{1}^{n}-M_{ 2}^{n}]_{ t} = \frac{1} {n}\int _{0}^{nt}(I -u_{ 0}\nabla g(X(s))^{T})\sigma (X(s))\sigma (X(s))^{T}(I -u_{ 0}\nabla g(X(s)))ds.}$$

Setting a = σ σ T,

$$\displaystyle{D =\int _{[0,1)^{d}}(I - u_{0}\nabla g(y)^{T})a(y)(I - u_{ 0}\nabla g(y))\pi (dy),}$$

we conclude that V n converges in distribution to a mean zero Brownian motion with covariance matrix D. The form of D derived here differs from the form in [5], but compare (6.1) and (6.2).