The focus of this chapter is that of discrete parameter Markov processes on a general (measurable) state space S. Some general considerations for the existence and uniqueness of invariant probabilities are provided.

Consider a discrete parameter stochastic process {X n}n≥0 with general state space S equipped with a σ-field \(\mathcal S\) of subsets of S. Here the measurable space \((S,{\mathcal S})\) may be a countable set and \(\mathcal {S}\) its power set as in the previous chapter, or more generally, S may be a metric space and \(\mathcal S\) its (Borel) σ-field generated by the open subsets of S, for example. Unless stated otherwise, a map f on S into a metric space M is (implicitly) referred to as measurable (or more explicitly, Borel measurable) if it is measurable with respect to \(\mathcal S\) on S and the Borel σ-field on M, i.e., if \(f^{-1} (B) \in {\mathcal S}\) ∀ Borel sets B in M. Also, for any family {Y λ : λ ∈ Λ} of random variables on a probability space \((\varOmega ,\mathcal {F},P)\), σ{Y λ : λ ∈ Λ} is the σ-field generated by the family i.e., it is the smallest σ-field (\(\subset \mathcal {F}\)) with respect to which Y λ is measurable for each λ ∈ Λ.

FormalPara Definition 8.1

A stochastic process {X 0, X 1, …, X n, …} having state space S equipped with a σ-field \(\mathcal S\) has the Markov property with regular transition probabilities if for each n ≥ 0,

$$\displaystyle \begin{aligned} P(X_{n+1} \in B|\sigma\{X_0,X_1,\ldots,X_n\}) = p_n(X_n,B), \quad B\in \mathcal{S}, {} \end{aligned} $$
(8.1)

where for each n,

  1. 1.

    For each \(B\in \mathcal {S}\), x → p n(x, B) is a measurable function on S.

  2. 2.

    For each x ∈ S, B → p n(x, B) is a probability measure on \(\mathcal S\).

A stochastic process having the Markov property is called a Markov process with transition probabilities p n(x, dy), n ≥ 0. The Markov process is said to be homogeneous if the transition probabilities p n(x, dy) are the same for all n = 0, 1, 2, … , say p(x, dy).

For the most part we will consider Markov processes with homogeneous transition probabilities and, unless explicitly stated otherwise, by a Markov process we will mean a Markov process with homogeneous transition probabilities from here on.

The special case in which there is a σ-finite measure ν on \((S,{\mathcal S})\) and a nonnegative measurable function p(x, y) on S × S such that ∫S p(x, y)ν(dy) = 1 and p(x, B) =∫B p(x, y)ν(dy), for all x ∈ S, occurs often. In this case, (8.5) and (8.7) below are iterated integrals of p(x 0, x 1)p(x 1, x 2)…p(x n−1, x n). This was the case, for example, in Chapter 7, where S is finite or countable and ν is the counting measure.
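When S is finite and ν is counting measure, the density formulation can be probed directly; the following minimal Python sketch (the three-state kernel and the helper name `p_set` are ours, purely for illustration) checks that each row of the density integrates to one and evaluates p(x, B) as in the display above.

```python
# Finite S with counting measure nu: the density p(x, y) is a stochastic
# matrix, and p(x, B) = sum over y in B of p(x, y).
S = [0, 1, 2]
p = {
    0: {0: 0.5, 1: 0.5, 2: 0.0},
    1: {0: 0.1, 1: 0.6, 2: 0.3},
    2: {0: 0.0, 1: 0.2, 2: 0.8},
}

# Each row integrates (sums) to 1 against counting measure.
for x in S:
    assert abs(sum(p[x][y] for y in S) - 1.0) < 1e-12

def p_set(x, B):
    """p(x, B) = integral of p(x, y) over B with respect to nu."""
    return sum(p[x][y] for y in B)

assert abs(p_set(1, {1, 2}) - 0.9) < 1e-12
```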

The first task is to establish the following construction.

FormalPara Proposition 8.1

Given an initial distribution μ and a transition probability p(x, dy), there is a unique probability measure P μ on the canonical space \((S^\infty , {\mathcal S}^{\otimes \infty })\) with the property (8.1) for the coordinate projections X n(x) = x n (n = 0, 1, 2, … ), where x = (x 0, x 1, x 2, … ) ∈ \(S^\infty \equiv S^{\{0,1,2,\dots \}}\), and \({\mathcal S}^{\otimes \infty }\) is the product σ-field, i.e., the smallest σ-field on S ∞ for which each of the respective coordinate projection maps is measurable. Moreover,

$$\displaystyle \begin{aligned} P_\mu(X_{n+1}\in B_{n+1}\vert \sigma(X_0,\dots,X_n)) = p(X_n,B_{n+1}),\quad B_{n+1}\in{\mathcal S}. \end{aligned} $$
(8.2)
FormalPara Proof

To begin, one constructs a collection of probability measures P μ,n, n = 0, 1, 2, …, on the finite dimensional product spaces \((S^{n+1},{\mathcal S}^{\otimes (n+1)})\), where S n+1 = S ×⋯ × S, and \({\mathcal S}^{\otimes (n+1)} = {\mathcal S}\otimes \cdots \otimes {\mathcal S}\), ((n + 1)-fold), to represent the respective distributions of (X 0, X 1, …, X n), n = 0, 1, 2….

To this end, for a bounded \({\mathcal S}^{\otimes (n+1)}\)-measurable function f on S n+1, define integrations on the product spaces iteratively, beginning with the innermost integral (with respect to p(x n−1, dx n)), keeping all variables except the last one, namely x n, fixed; that is,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} & \ &\displaystyle \int_S\cdots\int_S f(x_0,x_1,\dots,x_{n-1},x_n) p(x_{n-1},dx_{n})\cdots p(x_0,dx_1)\mu(dx_0) \\ & &\displaystyle \quad = \int_S\cdots\int_Sf_1(x_0,x_1,\dots,x_{n-1}) p(x_{n-2},dx_{n-1})\cdots p(x_0,dx_1)\mu(dx_0)\\ & &\displaystyle \quad = \int_S\cdots\int_Sf_2(x_0,x_1,\dots,x_{n-2}) p(x_{n-3},dx_{n-2})\cdots p(x_0,dx_1)\mu(dx_0)\\ & &\displaystyle \quad \vdots \\ & &\displaystyle \quad = \int_S\int_Sf_{n-1}(x_0,x_1) p(x_0,dx_1)\mu(dx_0)\\ & &\displaystyle \quad = \int_Sf_n(x_0)\mu(dx_0), \end{array} \end{aligned} $$
(8.3)

where

$$\displaystyle \begin{aligned} f_1(x_0,x_1,\dots,x_{n-1}) = \int_Sf(x_0,x_1,\dots,x_{n-1},x_n)p(x_{n-1},dx_n), \end{aligned}$$
$$\displaystyle \begin{aligned} f_2(x_0,x_1,\dots,x_{n-2}) = \int_Sf_1(x_0,x_1,\dots,x_{n-1})p(x_{n-2},dx_{n-1}), \end{aligned}$$

and so on. To justify this integration, first observe that y → f(x 0, …, x n−1, y) is \({\mathcal S}\)-measurable for any fixed (x 0, …, x n−1) ∈ S n; namely, this is clear for indicators of measurable rectangles C = B 0 ×⋯ × B n, (\(B_i\in {\mathcal S}, i = 0,1,\dots ,n\)), and therefore, by the π − λ theorem, it holds for all \(C\in {\mathcal S}^{\otimes (n+1)}\). Since every bounded measurable function is a pointwise (uniform) limit of a sequence of simple functions, the measurability of y → f(x 0, …, x n−1, y) follows for all bounded \({\mathcal S}^{\otimes (n+1)}\)-measurable functions f on S n+1. The \(\mathcal {S}^{\otimes n}\)-measurability of the map (x 0, …, x n−1) → f 1(x 0, …, x n−1) =∫S f(x 0, x 1, …, x n−1, y)p(x n−1, dy) then follows by the same argument.

Now, to define P μ,n, take \(f= {\mathbf {1}}_C, C\in {\mathcal S}^{\otimes (n+1)}\). Writing \(C_{x_0,\dots ,x_{n-1}} = \{y\in S: (x_0,\dots ,x_{n-1},y)\in C\}\), so that \(f_1(x_0,x_1,\dots ,x_{n-1}) = p(x_{n-1},C_{x_0,\dots ,x_{n-1}})\), define

$$\displaystyle \begin{aligned} P_{\mu,0} = \mu, \quad P_{\mu,1}(C) = \int_S\int_S{\mathbf{1}}_{C_{x_0}}(x_1) p(x_0,dx_1)\mu(dx_0),\quad C\in{\mathcal S}^{\otimes2}. \end{aligned} $$
(8.4)

More generally, define for n ≥ 2

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} P_{\mu,n}(C) & =&\displaystyle \int_S\cdots\int_S {\mathbf{1}}_{C_{x_0,\dots,x_{n-1}}}(x_n)p(x_{n-1},dx_n) p(x_{n-2},dx_{n-1})\cdots p(x_0,dx_1)\mu(dx_0)\\ & =&\displaystyle \int_S\cdots\int_Sp(x_{n-1},C_{x_0,\dots,x_{n-1}}) p(x_{n-2},dx_{n-1}) \cdots p(x_0,dx_1)\mu(dx_0). \end{array} \end{aligned} $$
(8.5)

In particular, for a measurable rectangle C = B 0 ×⋯ × B n, (8.5) reduces to

$$\displaystyle \begin{aligned} P_{\mu,1}(B_0\times B_1) = \int_{B_0}p(x_0,B_1)\mu(dx_0), \end{aligned}$$

and

$$\displaystyle \begin{aligned} P_{\mu,n}(B_0\times\cdots\times B_n) = \int_{B_0}\cdots\int_{B_{n-1}}p(x_{n-1},B_n)p(x_{n-2},dx_{n-1})\cdots p(x_0,dx_1)\mu(dx_0). \end{aligned}$$
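On a finite state space these iterated integrals become nested sums, computed innermost first. The sketch below (a hypothetical two-state chain; the helper name `P_mu` is ours) checks the iterated-integral recipe against a brute-force sum over all paths.

```python
# Iterated-integral computation of P_{mu,n}(B_0 x ... x B_n) on a finite
# state space: integrate out the last coordinate first, as in the text.
S = [0, 1]
mu = {0: 0.3, 1: 0.7}                            # initial distribution
p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # transition probabilities

def P_mu(Bs):
    """P_{mu,n}(B_0 x ... x B_n), computed by the iterated sums."""
    def integrate(x_prev, k):
        if k == len(Bs):
            return 1.0
        return sum(p[x_prev][y] * integrate(y, k + 1) for y in Bs[k])
    return sum(mu[x0] * integrate(x0, 1) for x0 in Bs[0])

# Brute-force enumeration of all paths through B_0 x B_1 x B_2 agrees.
brute = sum(mu[a] * p[a][b] * p[b][c]
            for a in {0} for b in {0, 1} for c in {1})
assert abs(P_mu([{0}, {0, 1}, {1}]) - brute) < 1e-12
```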

Finite additivity of P μ,n follows by writing \({\mathbf {1}}_C = \sum _{j=1}^m{\mathbf {1}}_{C_j}\) for a finite disjoint union \(C = \cup _{j=1}^m C_j\), \(C_j\in {\mathcal S}^{\otimes (n+1)}\), and countable additivity then follows by the monotone convergence theorem. This completes the construction of the finite dimensional distributions. It is simple to check that this collection of probability measures is consistent in the sense of the Kolmogorov existence theorem.Footnote 1 Thus, by the Kolmogorov existence theorem in the case that S is also a Borel subset of a Polish space or, more generally, by Tulcea’s theoremFootnote 2 for an arbitrary measurable space \((S,{\mathcal S})\), one arrives at a probability measure P μ on \((S^\infty ,{\mathcal S}^{\otimes \infty })\) such that the distribution of the projection (X 0, X 1, …, X n) : S ∞→ S n+1 is P μ,n, n = 0, 1, 2, …. For this, let \(B_{n+1}\in {\mathcal S}\). If \(C\in {\mathcal S}^{\otimes (n+1)}\), then it follows from (8.5) that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} & &\displaystyle P_{\mu,n+1}(C\times B_{n+1})\\ & &\displaystyle \quad = \int_S\cdots\int_S{\mathbf{1}}_C(x_0,\dots,x_n){\mathbf{1}}_{B_{n+1}}(x_{n+1}) P_{\mu,n+1}(dx_0\times\cdots\times dx_{n+1})\\ & &\displaystyle \quad = \int_{C} p(x_n,B_{n+1})P_{\mu,n}(dx_0\times\cdots\times dx_n). \end{array} \end{aligned} $$
(8.6)

If one takes B n+1 = S, then the consistency is checked: P μ,n+1(C × S) = P μ,n(C). This completes the construction of P μ on \((S^\infty ,{\mathcal S}^{\otimes \infty })\). Finally, note that (8.6) also expresses the Markov property in the form:

$$\displaystyle \begin{aligned} P_\mu(X_{n+1}\in B_{n+1}\vert \sigma(X_0,\dots,X_n)) = p(X_n,B_{n+1}).\end{aligned} $$

\(\hfill \blacksquare \)

With the construction carried out in the proof of Proposition 8.1, given a transition probability p(x, dy) and an initial distribution μ(dx), one may make the following definition.

FormalPara Definition 8.2

A stochastic process {X 0, X 1, … } on an arbitrary probability space \((\varOmega ,{\mathcal F},P)\) is Markov with transition probability p(x, dy) and initial distribution μ(dx) if its distribution is P μ on \((S^\infty ,{\mathcal S}^{\otimes \infty }).\) If μ = δ x, then we write P x for \(P_{\delta _x}\).

FormalPara Note:

To avoid a clutter of symbols, for a Markov process X = {X n : n = 0, 1, … } defined on a (possibly non-canonical) probability space \((\varOmega ,{\mathcal F},P)\), we will often abuse notation by writing P μ(X ∈ B) to indicate that X 0 has distribution μ. That is, we may use the expression for the corresponding probability on the canonical space, where X n is the nth coordinate projection on S ∞.

FormalPara Proposition 8.2

If \(\{X_n\}_{n\geq 0}\) has the Markov property with transition probability p(x, dy), then one may obtain the distribution at m ≥ 1 time points into the future inductively for \(B_1,B_2,\ldots ,B_m \in {\mathcal S}\), as

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle P(X_{n+1}\in B_1,\ldots, X_{n+m}\in B_m\vert \sigma\{X_0,X_1,\ldots, X_n\}) \\ & &\displaystyle \quad = P(X_{n+1}\in B_1,\ldots,X_{n+m}\in B_m\vert \sigma\{X_n\}) \\ & &\displaystyle \quad = \int_{B_1} \cdots \int_{B_m} p(x_{m-1}, dx_m) \ldots p(x_{1}, dx_{2})p(X_n,dx_1)\\ & &\displaystyle \quad = P_{x}(X_1\in B_1,\dots,X_m\in B_m)|{}_{x=X_n}, {} \end{array} \end{aligned} $$
(8.7)

where the integration is an iterated integral.

FormalPara Proof

The first equality follows from the Markov property. The second follows from (8.6), as does the last. To prove this in detail, simply condition the left hand side of (8.7) on the larger σ-field σ(X 0, …, X n+m−1) and use the smoothing property of conditional expectations. Since

$$\displaystyle \begin{aligned}\int_{B_m} p(x_{m-1}, dx_m) = p(x_{m-1},B_m),\end{aligned}$$

the first integration yields

$$\displaystyle \begin{aligned}f_1(X_{n+1},\dots,X_{n+m-1}) = {\mathbf{1}}[X_{n+1}\in B_1,\dots,X_{n+m-1}\in B_{m-1}]\cdot p(X_{n+m-1}, B_m),\end{aligned}$$

in the notation introduced earlier. That is, the left side of (8.7) equals

$$\displaystyle \begin{aligned} {\mathbb E}(f_1(X_{n+1},\dots,X_{n+m-1})\vert \sigma(X_0,\dots,X_n)). \end{aligned} $$
(8.8)

Next, conditioning (8.8) first on the larger σ-field σ(X 0, …, X n+m−2) and smoothing again, one sees that (8.8) equals

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle {\mathbb E}(f_2(X_{n+1},\dots, X_{n+m-2})\vert \sigma(X_0,\dots,X_n)),\quad \text{where} \\ & &\displaystyle f_2(X_{n+1},\dots,X_{n+m-2}) = {\mathbf{1}}[X_{n+1}\in B_1,\dots,X_{n+m-2}\in B_{m-2}] \int_{B_{m-1}}p(x_{m-1},B_m)p(X_{n+m-2},dx_{m-1}). \end{array} \end{aligned} $$

Continuing in this way, the probability is ultimately given by a function of X n of the form:

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle P(X_{n+m}\in B_m,\ldots, X_{n+1} \in B_1|\sigma\{X_0,X_1,\ldots,X_n\}) \\ & &\displaystyle \quad = {\mathbb E}(f_{m-1}(X_{n+1})\vert\sigma(X_0,\dots,X_n))\\ & &\displaystyle \quad = {\mathbb E}(f_{m-1}(X_{n+1})\vert\sigma(X_n)) = \int_{B_1}f_{m-1}(x_1) p(X_n,dx_1). \end{array} \end{aligned} $$
(8.9)

\(\hfill \blacksquare \)

Observe that taking \(B_m = B \in {\mathcal S}\), B 1 = B 2 = ⋯ = B m−1 = S in (8.7), one has

$$\displaystyle \begin{aligned} P(X_{n+m}\in B|\sigma\{X_0,\ldots,X_n\}) = p^{(m)} (X_n,B), {} \end{aligned} $$
(8.10)

where the m-step transition probabilities are recursively given by

$$\displaystyle \begin{aligned} p^{(m+1)} (x,B) = \int_S p^{(m)} (y,B) p(x,dy), \quad B\in {\mathcal S},\, x\in S, {} \end{aligned} $$
(8.11)

with p (1)(x, B) ≡ p(x, B). Notice that by taking successive conditional expectations of 1[X 0 ∈ B 0, …, X n−1 ∈ B n−1, X n ∈ B n] given σ(X 0, …, X n−1), σ(X 0, …, X n−2), …, one obtains

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle P_\mu(X_{0}\in B_0,\dots, X_{n-1}\in B_{n-1}, X_{n}\in B_n)\\ & &\displaystyle \quad = \int_{B_0}\int_{B_1}\cdots\int_{B_n}p(x_{n-1},dx_n)p(x_{n-2},dx_{n-1}) \cdots p(x_0,dx_1)\mu(dx_0)\\ & &\displaystyle \quad = P_{\mu,n}(B_0\times B_1\times\cdots\times B_n), \end{array} \end{aligned} $$

where μ is the initial distribution of the process as defined by

$$\displaystyle \begin{aligned} \mu(B) = P(X_0 \in B), \quad B\in {\mathcal S}. {}\end{aligned} $$
(8.12)
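For a finite state space, the recursion (8.11) for the m-step transition probabilities is just repeated matrix multiplication; the sketch below (hypothetical two-state kernel, helper name `p_m` ours) checks p (2)(x, B) against the explicit two-step sum.

```python
# Chapman-Kolmogorov recursion (8.11) on a finite state space:
# p^(m+1)(x, B) = sum over y of p(x, y) p^(m)(y, B).
S = [0, 1]
p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def p_m(m, x, B):
    """m-step transition probability p^(m)(x, B), with p^(1) = p."""
    if m == 1:
        return sum(p[x][y] for y in B)
    return sum(p[x][y] * p_m(m - 1, y, B) for y in S)

# Explicit two-step sum: p^(2)(0, {1}) = sum over y of p(0, y) p(y, {1}).
two_step = sum(p[0][y] * p[y][1] for y in S)
assert abs(p_m(2, 0, {1}) - two_step) < 1e-12
```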

The following is another equivalent version of the Markov property that is commonly used.

FormalPara Proposition 8.3

Let {X n}n≥0 be a Markov process with a transition probability p(x, dy) and some initial distribution μ. Then the conditional distribution of the after-n process defined by \(X^+_n := (X_{n},X_{n+1},\dots )\), given σ{X 0, …, X n}, is \(P_{X_n}\). That is, it equals P x on the event [X n = x] ⊂ Ω.

FormalPara Proof

In view of (8.7), one has for finite dimensional events of the form \(C = B_0\times B_1\times \cdots \times B_m\times S^{\infty }, B_i\in {\mathcal S}, 0\le i\le m,\) that \(P(X_n^+\in C | \sigma \{X_0,\dots ,X_n\}) = P_{X_n}((X_0,X_1,\dots )\in C).\) Now observe that the collection of sets \(C\in {\mathcal S}^{\otimes \infty }\) such that this equation holds is a λ-system, which contains the π-system of finite dimensional events of this form. The assertion thus follows from an application of the π − λ theorem.\(\hfill \blacksquare \)

Situations in which there is an initial probability π for the Markov process that is invariant under the evolution are of particular interest, especially because, when unique, it represents the long term behavior of the process regardless of how it is initiated.

FormalPara Definition 8.3

A probability π on \(\mathcal S\) is said to be an invariant probability or steady state distribution for a Markov process {X n}n≥0 with transition probabilities p(x, dy) if

$$\displaystyle \begin{aligned} \int_S p(x,B) \pi(dx) = \pi(B) \quad \text{for all}\ B\in {\mathcal S}. {}\end{aligned} $$
(8.13)

Notice that if π is an invariant initial probability for {X n}n≥0, i.e., X 0 has the invariant distribution π, then the left side of (8.13) is P(X 1 ∈ B). One then has P(X n ∈ B) = π(B) for all n ≥ 0, \(B\in {\mathcal S}\), that is,

$$\displaystyle \begin{aligned} P(X_{n+1}\in B) = \int_S p(x,B)P(X_n\in dx) = \int_S p(x,B) \pi (dx) = \pi(B), {} \end{aligned} $$
(8.14)

which for n = 0 is true by definition, and the general case follows by the Markov property and induction. In addition to questions of

  1. (i)

    Existence

  2. (ii)

    Uniqueness

of invariant probabilities, one also seeks

  1. (iii)

    Basins of Attraction

i.e., initial distributions under which convergence to a given invariant probability will hold, and

  1. (iv)

    Rates of Convergence

to name a few of the central topics of the theory. In view of (8.7), if X 0 has an invariant initial distribution π, then

$$\displaystyle \begin{aligned} P(X_{n+1} \in B_1,\ldots, X_{n+m}\in B_m) = P(X_1 \in B_1,\ldots, X_m\in B_m), \quad B_i \in {\mathcal S}. {} \end{aligned} $$
(8.15)

In particular, in this context the distribution of the after-n process \(X^{+}_n \equiv \{X_{n+m}: m=0,1,2,\ldots \}\) coincides with that of \(X_0^+ = \{X_m: m=0,1,2,\ldots \}\) for each n = 1, 2, …, a property referred to as stationarity of the process {X 0, X 1, … }, as discussed in the earlier chapters of the text. From this perspective, theorems providing

  1. (v)

    Law of Averages

and

  1. (vi)

    Fluctuation Law

in the forms of a strong law of large numbers (ergodic theorem) and a central limit theorem for averages of the form \({1\over n}\sum _{j=0}^{n-1}f(X_j)\) in the presence of a unique invariant initial distribution are also essential to a complete theory.
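On a finite state space, the defining condition (8.13) and the law-of-averages behavior in (v) are easy to probe numerically; the two-state kernel and the candidate π below are hypothetical, chosen for illustration.

```python
import random

# Checking the invariance condition (8.13) for a hypothetical two-state
# chain, and illustrating the law-of-averages behavior numerically.
p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
pi = {0: 0.8, 1: 0.2}   # candidate invariant probability

# (8.13): sum over x of p(x, B) pi(x) = pi(B), for B = {0} and B = {1}.
for B in ({0}, {1}):
    lhs = sum(sum(p[x][y] for y in B) * pi[x] for x in p)
    assert abs(lhs - sum(pi[y] for y in B)) < 1e-12

# Occupation fractions along a long simulated path hover near pi.
random.seed(0)
x, visits = 0, {0: 0, 1: 0}
for _ in range(200_000):
    visits[x] += 1
    x = 0 if random.random() < p[x][0] else 1
frac0 = visits[0] / 200_000
print(frac0)   # close to 0.8
```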

For the next definition and elsewhere in the book, B(S) denotes the set of all real-valued bounded measurable functions on S, equipped with the sup-norm \(\|f\| = \sup _{x\in S}|f(x)|\).

FormalPara Definition 8.4

Given a transition probability p(x, dy) on \((S,{\mathcal S})\), the transition operator T is the map on B(S) (into B(S)) defined by

$$\displaystyle \begin{aligned} Tf(x) = \int_S f(y)p(x,dy), \quad f\in B(S). \end{aligned} $$
(8.16)

Note that the measurability of x → p(x, B) for every \(B\in \mathcal {S}\) implies the measurability of Tf (Exercise 1). The transition operator is a positive linear contraction on B(S) such that T 1 = 1, where 1 ∈ B(S) is the constant function on S with constant numerical value one (Exercise 1). In particular, given any such transition operator, one may recover \(p(x,B) = T{\mathbf {1}}_B(x), x\in S, B\in {\mathcal S}.\)
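The transition operator is easy to realize on a finite state space; the sketch below (hypothetical two-state kernel) checks T1 = 1, the contraction property, and the recovery \(p(x,B) = T{\mathbf {1}}_B(x)\).

```python
# Transition operator (8.16) on a finite state space: functions are
# dicts indexed by states, and (Tf)(x) = sum over y of f(y) p(x, y).
S = [0, 1]
p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

def T(f):
    return {x: sum(f[y] * p[x][y] for y in S) for x in S}

one = {x: 1.0 for x in S}
assert all(abs(T(one)[x] - 1.0) < 1e-12 for x in S)   # T1 = 1

f = {0: -3.0, 1: 5.0}
sup = lambda g: max(abs(v) for v in g.values())
assert sup(T(f)) <= sup(f)                            # contraction in sup-norm

ind_B = {0: 0.0, 1: 1.0}                              # indicator of B = {1}
assert abs(T(ind_B)[0] - p[0][1]) < 1e-12             # p(x, B) = T 1_B (x)
```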

Note that for a Markov process {X n : n = 0, 1, 2, … } with transition probability p(x, dy), one has \(Tf(x) = {\mathbb E}(f(X_1)\vert X_0=x)\equiv {\mathbb E}(f(X_1)\vert \sigma (X_0))|{ }_{X_0=x}\), which may be expressed as \({\mathbb E}_xf(X_1)\). In fact, by the Markov property and induction,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbb E}_xf(X_{n+1}) & =&\displaystyle {\mathbb E}_x{\mathbb E}_x[f(X_{n+1})\mid\sigma(X_0,\dots,X_n)]\\ & =&\displaystyle {\mathbb E}_x{\mathbb E}_{X_n}f(X_1)\\ & =&\displaystyle {\mathbb E}_xTf(X_n)\\ & =&\displaystyle {\mathbb E}_x {\mathbb E}(Tf(X_n)\vert\sigma(X_{n-1}))\\ & =&\displaystyle {\mathbb E}_xTTf(X_{n-1})={\mathbb E}_xT^2f(X_{n-1})=\cdots={\mathbb E}_xT^nf(X_1) \\ & =&\displaystyle T^{n+1}f(x),\quad x\in S,\ n \ge 0, \end{array} \end{aligned} $$

i.e., T n is the transition operator defined by the n-step transition probability p (n)(x, dy). The relation (8.13) implies (and is, therefore, equivalent to)

$$\displaystyle \begin{aligned} \int_S(Tf)(x)\pi(dx) = \int_S f(x) \pi(dx) \quad \ \text{for all} \ \, f\in B(S). \end{aligned} $$
(8.17)

Thus (8.17) also defines the convenient notation “π(dy) =∫S p(x, dy)π(dx).” Just as (8.13) implies (8.17), (8.14) implies

$$\displaystyle \begin{aligned} \int_S(T^nf)(x)\pi(dx) = \int_S f(x) \pi(dx) \quad \ \text{for all} \ \, f\in B(S), \ \text{for all} \ n\ge 1. \end{aligned} $$
(8.18)

One approach to obtaining invariant probabilities is by consideration of long time steady state distributions. In particular, one might anticipate that if for some x ∈ S the distribution p (n)(x, dy) of the state X n converges weakly, as n →∞, to some limit probability distribution π x, then π x should be invariant under continued evolution. However, as the following example shows, one must be careful.

FormalPara Example 1 (Liggett)

Let \(S = \{0,1,{1\over 2}, {2\over 3}, {3\over 4}, \ldots , {m\over {m+1}},\ldots \}\), and let \(\mathcal S\) be the power set of S. Then S is a metric space with the usual distance on the line, d(x, y) = |x − y|, and \(\mathcal S\) is the Borel σ-field. Fix 0 < θ ≤ 1 and define

$$\displaystyle \begin{aligned} \begin{array}{ll} p\left(0, \left\{{1\over 2}\right\}\right) = \theta, & p\left(0, \left\{ {2\over 3}\right\}\right) = 1-\theta \\ p\left( {m\over{m+1}}, \left\{ \frac{m+1}{m+2}\right\}\right) = \theta, & p\left( {m\over{m+1}}, \left\{ \frac{m+2}{m+3}\right\}\right) = 1-\theta, \quad m=1,2,\dots\,\, . \\ p(1,\{0\}) =\theta, & p\left( 1, \left\{ {1\over 2}\right\}\right) = 1-\theta. \end{array} \end{aligned}$$

Then p (n)(x, dy) converges weakly to δ {1} for every x ∈ S, since starting from any state the chain moves strictly up the sequence m∕(m + 1) toward 1 without ever reaching 1 itself; but δ {1} is clearly not an invariant probability, since p(1, {1}) = 0.
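A short simulation makes the phenomenon vivid. Below we encode the state m∕(m + 1) by its index m (with the point 1 kept as a separate state) and take θ = 1∕2; the code is a sketch of this particular example.

```python
import random

# Liggett's example with theta = 1/2: encode the state m/(m+1) by the
# index m; the point 1 is a separate state, never visited from below.
random.seed(1)
theta = 0.5

def step(m):
    # From m/(m+1) the chain moves to (m+1)/(m+2) w.p. theta, else to
    # (m+2)/(m+3); the index only increases, so the value climbs toward 1.
    return m + 1 if random.random() < theta else m + 2

m = 0
for _ in range(200):
    m = step(m)
value = m / (m + 1)
assert value > 0.99        # p^(n)(0, dy) piles up near 1

# Yet delta_{1} is not invariant: from 1 the chain jumps to 0 with
# probability theta and to 1/2 with probability 1 - theta.
p_from_1 = {0.0: theta, 0.5: 1 - theta}
assert p_from_1.get(1.0, 0.0) == 0.0
```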

Note that weak convergence of p (n+1)(x, dy) requires convergence of the sequences of integrals ∫S f(y)p (n+1)(x, dy) = T n+1 f(x) = T n(Tf)(x) for all f ∈ C b(S), as n →∞; recall that C b(S) ⊂ B(S) denotes the subset of all bounded continuous functions on the metric space S, equipped with the uniform norm. As usual, whenever S is a metric space, we take \(\mathcal S\) to be the Borel σ-field on S.

FormalPara Definition 8.5

A transition probability p(x, dy) on a metric space S is said to be Feller continuous, or weakly Feller continuous, if for every f ∈ C b(S), Tf ∈ C b(S). In this case one also says p(x, dy) has the (weak) Feller property.

Observe that Feller continuity of p(x, dy) means that the map x → p(x, dy), on S into the set \(\mathcal {P}(S)\) of all probability measures on \((S,{\mathcal S})\), is weakly continuous. Moreover, T is a positive, linear contraction operator on C b(S) with T1 = 1. Conversely, if S is a compact metric space, then any such operator on C b(S) uniquely determines Feller transition probabilities p(x, dy) by applying the Riesz Representation TheoremFootnote 3 from analysis to the bounded linear functional f → Tf(x), f ∈ C b(S), for each x ∈ S (Exercise 6).

Notice that since C b(S) is measure-determining,Footnote 4 the condition (8.17) defining an invariant probability may be restricted to f ∈ C b(S).

Another obstacle to this approach to the determination of invariant probabilities is evident in the simple two-state example p 01 = p 10 = 1, p 00 = p 11 = 0. In this case, π 0 = π 1 = 1∕2 is the unique invariant probability, but \(p_{01}^{(n)}\) oscillates between 1 and 0 as a function of n. However, these oscillations can be averaged out by consideration of \({1\over 2n+1}\sum _{r=0}^{2n}p_{0j}^{(r)}\to 1/2,\) as n →∞, for j = 0, 1. So this example suggests that time averaging may be required, and Liggett’s Example 1 shows that the hypothesis of Feller continuity in Proposition 8.4 below cannot in general be dispensed with in the weak convergence approach to invariant probabilities (Exercise 5). From a probabilistic perspective, note also that

$$\displaystyle \begin{aligned} {1\over m}\sum_{r=0}^{m-1}p^{(r)}(x,B) = {1\over m}\sum_{r=0}^{m-1}{\mathbb E}_x{\mathbf{1}}[X_r\in B] = {\mathbb E}_x\left({\sum_{r=0}^{m-1}{\mathbf{1}}[X_r\in B]\over m}\right) \end{aligned} $$
(8.19)

is the expected proportion of visits to the set \(B\in {\mathcal S}\) in time 0 to m − 1, starting from x.
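For the two-state flip chain above this averaging can be checked in a couple of lines; the helper name below is ours.

```python
# Two-state chain with p_01 = p_10 = 1: p^(n)(0, {1}) oscillates
# between 1 (n odd) and 0 (n even), but the Cesaro averages, as in
# (8.19), settle at the invariant value 1/2.
def p_n_01(n):
    return 1.0 if n % 2 == 1 else 0.0

m = 1001
cesaro = sum(p_n_01(r) for r in range(m)) / m
assert abs(cesaro - 0.5) < 1e-3      # equals 500/1001, close to 1/2
```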

FormalPara Proposition 8.4

Suppose S is a metric space and p(x, dy) is a Feller continuous transition probability on \((S,{\mathcal S})\). (a) If for some x ∈ S there is a sequence of integers 1 ≤ n 1 < n 2 < … such that, as k →,

$$\displaystyle \begin{aligned} {1\over{n_k}} \sum^{n_k-1}_{r=0} p^{(r)} (x,dy) \,\,\text{converges weakly to}\ \pi_x(dy) \end{aligned} $$
(8.20)

for some probability measure π x, then π x is an invariant probability for p(x, dy).

(b) If, for some sequence 1 ≤ n 1 < n 2 < ⋯, (8.20) holds for every x ∈ S with the same limit π x = π for all x, then π is the unique invariant probability.

FormalPara Proof
  1. (a)

    The relation (8.20) says that

    $$\displaystyle \begin{aligned} \frac{1}{n_k} \sum^{n_k-1}_{r=0} (T^rf)(x) \longrightarrow \int_S f(y) \pi_x(dy) \quad \ \text{for all} \ \, f\in C_b(S). \end{aligned} $$
    (8.21)

    Replacing f by Tf (which belongs to C b(S) by hypothesis), one gets

    $$\displaystyle \begin{aligned} \frac{1}{n_k} \sum^{n_k}_{r=1} T^rf(x) \longrightarrow \int_S Tf(y) \pi_x(dy) \quad \ \text{for all} \ \, f\in C_b(S). \end{aligned} $$
    (8.22)

    But the difference between the left sides of (8.21) and (8.22) equals in magnitude \(|(T^{n_k}f)(x) - f(x)|/n_k \leq 2 \sup \{|f(y)|:y\in S\}/n_k\), which goes to zero as k →∞. Hence the limits in (8.21) and (8.22) are the same. Thus, since C b(S) is measure-determining, one has π x(dz) =∫S p(y, dz)π x(dy); see Lemma 1 below.

  2. (b)

    By (a), π is invariant. Suppose that, under the hypothesis of part (b), π′ is another invariant probability. Integrating both sides of (8.21) with respect to π′(dx), one obtains

    $$\displaystyle \begin{aligned} \frac{1}{n_k} \sum^{n_k-1}_{r=0} \int_S(T^rf)(x) \pi'(dx) \longrightarrow \int_S \left[\int_S f(y) \pi(dy) \right] \pi'(dx). \end{aligned} $$
    (8.23)

    By invariance of π′, the left side equals \(\int fd \pi '\) (see (8.18)), while the right side is \(\int fd\pi \). Thus \(\int fd\pi ' = \int fd\pi \) for every f ∈ C b(S), implying π′ = π since C b(S) is measure-determining.

\(\hfill \blacksquare \)

FormalPara Lemma 1

If Q 1 and Q 2 are probability measures on the Borel σ-field of a metric space S such that ∫S fdQ 1 =∫S fdQ 2 for all bounded continuous real-valued functions f on S, then Q 1 = Q 2.

FormalPara Proof

Let \({\mathcal C}\) be the collection of Borel sets B such that Q 1(B) = Q 2(B). Then it is simple to check that \({\mathcal C}\) is a λ-system. Since the class of closed sets is a π-system that generates the Borel σ-field \({\mathcal B}\), it is sufficient, by the π − λ theorem, to show that \({\mathcal C}\) contains all closed sets. For this, it is enough to show that for each closed F ⊂ S, there exists a sequence of nonnegative functions {f n}⊂ C b(S) such that f n ↓1 F as n↑∞. Since F is closed, x ∈ F if and only if ρ(x, F) = 0, where \(\rho (x,F) :=\inf \{\rho (x,y):y\in F\}\) and ρ denotes the metric on S. Let h n(r) = 1 − nr for 0 ≤ r ≤ 1∕n, h n(r) = 0 for r ≥ 1∕n, and take f n(x) = h n(ρ(x, F)). In particular, 1 F(x) =limn f n(x), x ∈ S, and Lebesgue’s dominated convergence theorem applies to show \(F\in {\mathcal C}\).\(\hfill \blacksquare \)
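The approximating functions in this proof are quite concrete; the sketch below takes S = ℝ and the closed set F = [0, 1] (both choices ours, for illustration) and checks the pointwise decrease of f n to the indicator of F.

```python
# f_n = h_n(rho(., F)) for F = [0, 1] on the real line, where
# h_n(r) = max(1 - n r, 0); f_n decreases pointwise to 1_F.
def rho(x, a=0.0, b=1.0):
    """Distance from x to the closed interval F = [a, b]."""
    return max(a - x, 0.0, x - b)

def f(n, x):
    return max(1.0 - n * rho(x), 0.0)

assert f(10, 0.5) == 1.0                  # on F, f_n = 1 for every n
assert abs(f(10, 1.05) - 0.5) < 1e-9      # within 1/n of F: between 0 and 1
assert f(1000, 1.05) == 0.0               # farther than 1/n from F: f_n = 0
assert f(1000, 1.05) <= f(10, 1.05)       # decreasing in n
```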

One gets the following immediate corollary:

FormalPara Corollary 8.5

If a transition probability p(x, dy) on a metric space has the (weak) Feller property and there exists a Cesàro limit in the weak topology, namely,

$$\displaystyle \begin{aligned} \lim_{n\to\infty} {1\over n} \sum^{n-1}_{r=0} p^{(r)} (x,dy) = \pi(dy) \end{aligned} $$
(8.24)

such that the probability measure π does not depend on x, then π is the unique invariant probability.

Many important Markov processes do not admit an invariant probability; such is the case, for example, for a random walk on \(\mathbb {R}^k\) with an arbitrary step size distribution Q ≠ δ {0} (Exercise 8). There is one case, namely that of a compact state space, in which every Feller transition probability admits at least one invariant probability.

FormalPara Proposition 8.6

Let S be a compact metric space and \(\mathcal S\) its Borel σ-field. If p(x, dy) is a Feller transition probability on \((S,{\mathcal S})\), then it admits an invariant probability.

FormalPara Proof

Fix x ∈ S and consider the sequence of probability measures μ n, n ≥ 1, given by \(\mu _n(B) = (1/n) \sum ^n_{m=1} p^{(m)}(x,B)\), \(B\in {\mathcal S}\). Since \(\mathcal {P}(S)\) is a weakly compact metric space,Footnote 5 there exists a subsequence \(\{\mu _{n_k} :k=1,2,\dots \}\), which converges weakly to a probability measure π x, say. By Proposition 8.4, π x is invariant.\(\hfill \blacksquare \)

As a simple corollary, we get the following result for finite Markov chains.

FormalPara Corollary 8.7

A Markov chain on a finite state space S has at least one invariant probability.

FormalPara Proof

This follows from Proposition 8.6 by making S a compact metric space with the metric d(x, y) = 1 if x ≠ y, d(x, x) = 0. Then \({\mathcal S} \equiv \mathcal {B}(S)\) is the class of all subsets of S, and every real-valued function on S is continuous.\(\hfill \blacksquare \)

FormalPara Remark 8.1

A direct proof of Corollary 8.7, which does not use Proposition 8.6, will be given later (see Corollary 13.8 in Chapter 13).

The next approach to invariant probabilities is based on symmetries. For the definition below, consider a stationary Markov process {X n}n≥0 on a state space \((S,{\mathcal S})\) with transition probability p(x, dy) and invariant initial distribution π. Since the distribution of such a process is invariant under time shift, i.e., {X n+k}n≥0 has the same distribution as {X n}n≥0 for each k ≥ 0, one may use Kolmogorov’s existence theorem to construct a stationary Markov process \(\{Z_n\}_{-\infty <n<\infty }\) having the same transition probability p and the same invariant probability π.

FormalPara Definition 8.6

A Markov process with a transition probability p and an invariant probability π is said to be time-reversible if the stationary Markov process \(\{Z_n\}_{-\infty <n<\infty }\), with this transition probability and this invariant distribution, has the same distribution as the time-reversed process \(\{Y_n\}_{-\infty <n<\infty }\), where \(Y_n := Z_{-n}\) (−∞ < n < ∞). We refer to \(\{Z_n\}_{-\infty <n<\infty }\) as the double-sided version.

In the context of the “movie metaphor,” the statistics of a stationary data stream do not depend on when viewing begins, while a time-reversible data stream looks statistically the same whether it is viewed forward or backward.

For the propositions below, assume that the transition probability p(x, dy) has a density p(x, y) with respect to a σ-finite measure μ on \((S,{\mathcal S})\), with (x, y) → p(x, y) measurable (on \((S\times S, {\mathcal S}\otimes {\mathcal S})\) into \(([0,\infty ), \mathcal {B}_{[0,\infty )})\)).

FormalPara Proposition 8.8 (Detailed Balance Condition)

Let π(dy) be a probability measure on \((S,{\mathcal S})\) with a density π(y) with respect to μ. (a) If

$$\displaystyle \begin{aligned} \pi(x)p(x,y) = \pi(y)p(y,x) \quad \text{a.e. }\,(\mu\times \mu), {} \end{aligned} $$
(8.25)

then π is a time-reversible invariant probability for the Markov process. (b) For a Markov process with transition probability density p(x, y) and invariant probability density π(y), (8.25) is necessary for the process to be time-reversible.

FormalPara Proof
  1. (a)

    Let p and π satisfy (8.25). Then for every bounded Borel measurable f,

    $$\displaystyle \begin{aligned}\begin{array}{rcl} \int_STf(x)\pi(dx) &=& \int_S Tf(x)\pi(x)\mu(dx) = \int_S\left( \int_S f(y)p(x,y)\mu(dy)\right)\pi(x)\mu(dx)\\ &=& \int_S\int_Sf(y)p(x,y)\pi(x)\mu(dy)\mu(dx)\\ &=& \int_S\left(\int_S f(y)p(y,x)\mu(dx)\right)\pi(y)\mu(dy)\\ & = & \int_S f(y) \pi(y) \mu(dy) = \int_Sf(y)\pi(dy), \end{array}\end{aligned}$$

    implying π is invariant. Let {X n}n≥0 be a stationary Markov process with transition probability p and invariant initial distribution π. Then, by the Markov property, the joint density of (X n, X n+1, …, X n+k), with respect to μ ×⋯ × μ, at (y 0, y 1, …, y k) ∈ S k+1 is

    $$\displaystyle \begin{aligned} g(y_0,y_1,\ldots, y_k) := \pi (y_0) p(y_0,y_1)p(y_1,y_2)\ldots p(y_{k-1}, y_k), {} \end{aligned} $$
    (8.26)

    while the joint density of (X n+k, X n+k−1, …, X n), at the same point (y 0, y 1, …, y k) ∈ S k+1, is

    $$\displaystyle \begin{aligned}\begin{array}{rcl} h(y_0,y_1,\ldots, y_k) &:=& g(y_k,y_{k-1},\ldots, y_1,y_0) \\ &=& \pi(y_k) p(y_k,y_{k-1}) p(y_{k-1}, y_{k-2}) \cdots p(y_1,y_0) \\ & = & \pi(y_{k-1}) p(y_{k-1}, y_k) p(y_{k-1}, y_{k-2}) \cdots p (y_1,y_0) \\ & = & p(y_{k-1},y_k) \pi(y_{k-1}) p(y_{k-1}, y_{k-2}) \cdots p(y_1,y_0) \\ & = & p(y_{k-1}, y_k) \pi(y_{k-2}) p(y_{k-2}, y_{k-1}) \cdots p(y_1,y_0)\\ & \vdots & \\ & = & p(y_{k-1},y_k) p(y_{k-2},y_{k-1}) \cdots p(y_1,y_2) \pi (y_1) p(y_1,y_0) \\ & = & p(y_{k-1},y_k) p(y_{k-2},y_{k-1}) \cdots p(y_1,y_2) \pi (y_0) p(y_0,y_1) \\ & = & g(y_0,y_1,\ldots,y_k). \end{array}\end{aligned}$$

    Since this is true for all k ≥ 1, the finite dimensional distributions of the double-sided version \(\{Z_n\}_{-\infty <n<\infty }\) and of its time-reversal \(\{Y_n\}_{-\infty <n<\infty }\), with \(Y_n := Z_{-n}\) (−∞ < n < ∞), described in Definition 8.6, coincide. Thus, using the π − λ theorem,Footnote 6 it follows that the process \(\{Z_n\}_{-\infty <n<\infty }\) and its time-reversal \(\{Y_n\}_{-\infty <n<\infty }\) have the same distribution.

  2. (b)

    For the stationary Markov process \(\{Z_n\}_{-\infty <n<\infty }\) to be time-reversible, it is necessary that the distribution of (Z 0, Z 1) is the same as that of (Y 0, Y 1) ≡ (Z 0, Z −1). But the latter has the same distribution as (Z 1, Z 0), by stationarity. The left side of (8.25) is the p.d.f. of (Z 0, Z 1) at (x, y), while the right side is the p.d.f. of (Z 1, Z 0) at (x, y). Thus for time-reversibility, (8.25) must hold.

\(\hfill \blacksquare \)
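The detailed balance condition is easy to verify numerically; the reflecting birth-death kernel on {0, 1, 2} below is a hypothetical illustration (densities with respect to counting measure).

```python
# Verifying the detailed balance condition (8.25) for a hypothetical
# reflecting birth-death chain on {0, 1, 2}.
p = {
    0: {0: 0.5, 1: 0.5, 2: 0.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {0: 0.0, 1: 0.5, 2: 0.5},
}
pi = {0: 0.25, 1: 0.5, 2: 0.25}

# (8.25): pi(x) p(x, y) = pi(y) p(y, x) for all x, y.
for x in p:
    for y in p:
        assert abs(pi[x] * p[x][y] - pi[y] * p[y][x]) < 1e-12

# Detailed balance implies invariance, (8.13):
for y in p:
    assert abs(sum(pi[x] * p[x][y] for x in p) - pi[y]) < 1e-12
```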

If π is an invariant probability, then by Jensen’s inequality,

$$\displaystyle \begin{aligned}\int_S(\int_S|f(y)|p(x,dy))^2\pi(dx)\le\int_S\int_Sf^2(y)p(x,dy)\pi(dx) = \int_Sf^2(y)\pi(dy),\end{aligned}$$

so that one may extend the transition operator T to L 2(S, π) ⊇ B(S). We will now show that, in analytical terms, time-reversibility of a Markov process means that the transition operator T is self-adjoint on L 2(S, π), i.e.,

$$\displaystyle \begin{aligned} \langle Tf,g \rangle = \langle f,Tg\rangle \qquad \ \text{for all} \ \, f,g \in L^2(S,\pi). {} \end{aligned} $$
(8.27)

Here 〈⋅, ⋅〉 is the inner product on the Hilbert space L 2(S, π),

$$\displaystyle \begin{aligned}\langle g,h \rangle = \int_S g(y)h(y) \pi(dy).\end{aligned}$$

We will denote by ∥g∥ the L 2-norm: \(\|g\|^2 = \langle g,g\rangle \).

FormalPara Proposition 8.9

Let π be an invariant probability of a Markov process. (a) The transition operator T is a contraction on L 2(S, π). (b) If π is a time-reversible invariant probability, then T is self-adjoint.

FormalPara Proof

(a) This is proved by the Jensen inequality computation preceding (8.27).

(b) Let f, g ∈ L 2(S, π), and assume π is time-reversible in the sense of Definition 8.6. Then, conditioning on X 0, one has

$$\displaystyle \begin{aligned} \begin{array}{rcl} \langle Tf, g \rangle & = &\displaystyle \int_S \left( \int_S f(y)p(x,dy)\right)g(x)\pi(dx)\\ & =&\displaystyle {\mathbb E}\big({\mathbb E}(f(X_1)\vert\sigma(X_0))g(X_0)\big) = {\mathbb E}f(X_1)g(X_0) \\ & =&\displaystyle {\mathbb E}f(X_0)g(X_1) = \langle f, Tg \rangle. \end{array} \end{aligned} $$

\(\hfill \blacksquare \)
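Both parts of Proposition 8.9 can be verified numerically in the finite-state case, where T acts as the transition matrix on vectors and the inner product is weighted by π. A minimal sketch with a hypothetical reversible three-state chain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reversible three-state chain with invariant pi.
pi = np.array([0.25, 0.5, 0.25])
p = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

def inner(f, g):
    return float(np.sum(f * g * pi))  # <f, g> on L^2(S, pi)

f = rng.normal(size=3)
g = rng.normal(size=3)
Tf, Tg = p @ f, p @ g                 # (Tf)(x) = sum_y f(y) p(x, y)

self_adjoint_gap = inner(Tf, g) - inner(f, Tg)      # part (b): should vanish
contraction = inner(Tf, Tf) <= inner(f, f) + 1e-12  # part (a): ||Tf|| <= ||f||
```

In matrix terms, reversibility says that diag(π) p is a symmetric matrix, which is exactly self-adjointness of p in the π-weighted inner product.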

Recall in the case of finite S that, given an initial distribution μ for X 0, the distribution μ 1 of X 1 may be obtained by the transformation μ → μ 1 = p ′μ, where p ′ is the transpose matrix, see (7.12). More generally, one may define an adjoint operator as follows.

FormalPara Definition 8.7

Given a transition probability p(x, dy) on \((S,{\mathcal S})\), the adjoint linear operator T ∗ is defined on the linear space \(\mathcal {M}(S)\) of all finite signed measures on \((S,{\mathcal S})\) by

$$\displaystyle \begin{aligned} (T^*\mu)(B) = \int_S p(x,B) \mu(dx) \qquad (B\in {\mathcal S}, \mu\in \mathcal{M}(S)). \end{aligned} $$
(8.28)

In general, if μ is a probability measure, then T ∗ μ is the distribution of X 1 when X 0 has distribution μ. In particular, π is an invariant probability if and only if

$$\displaystyle \begin{aligned} T^*\pi =\pi. \end{aligned} $$
(8.29)

To see the connection between the L 2(S, π)-adjoint of T and this more general operator, irrespective of (8.27), identify f ∈ L 2(S, π) with the signed measure fdπ, and note that T ∗(fdπ)(dy) is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \int_Sg(y)T^*(fd\pi)(dy) & =&\displaystyle \int_S\int_S g(y)p(x,dy)f(x)\pi(dx) = \int_STg(x)f(x)\pi(dx) \\ & =&\displaystyle \langle Tg,f\rangle = \langle g,T^*f\rangle, g\in L^2(S,\pi), \end{array} \end{aligned} $$
(8.30)

where, by an obvious abuse of notation, T ∗ f ∈ L 2(S, π) is given by the L 2(S, π)-adjoint operator to T. In the interpretation of T as an operator on L 2(S, π), 1 is an eigenvalue of T with the constant eigenvector f(⋅) ≡ 1. For the adjoint operator on \(\mathcal {M}(S)\), this eigenvector corresponds to the invariant measure π(dy) = 1 ⋅ π(dy).

Define \(T^{*n}\mu = T^*(T^{*(n-1)}\mu )\) iteratively on \(\mathcal {M}(S)\). In the case μ is a probability measure, and X 0 has distribution μ, T ∗ μ is the distribution of X 1, T ∗2 μ is the distribution of X 2, …, T ∗n μ is the distribution of X n. Thus, whereas iterates of the transition operator T govern the “evolution of states” via \(T^nf(x) = {\mathbb E}_xf(X_n),\) the iterates of the adjoint T ∗ govern the “evolution of probability distributions” of the Markov process.

Now, T ∗n is a linear operator on \(\mathcal {M}(S)\), as is T n on C b(S). The term adjoint operator given to T ∗ (or, T ∗n) is more fully justified in terms of the basic identities

$$\displaystyle \begin{aligned} \int_S Tf(x) \mu(dx) = \int_S f(x) (T^*\mu)(dx), \ \int_ST^nf (x) \mu(dx) = \int_S f(x) (T^{*n} \mu)(dx). \end{aligned} $$
(8.31)

The first equality in (8.31) follows from (8.28), first for simple functions f and then by approximating f ∈ B(S) uniformly by simple functions. The second equality in (8.31) follows by induction on n. When μ is a probability measure, then the second equality says \({\mathbb E}_\mu [{\mathbb E}(f(X_n)\mid X_0)] = {\mathbb E}_{\mu }f(X_n)\), with X 0 having distribution μ.
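In the finite-state case the identities (8.31) reduce to associativity of matrix products: T n f corresponds to the column action p n f, and T ∗n μ to the row action μ p n. A quick sketch with hypothetical numbers:

```python
import numpy as np

p = np.array([[0.2, 0.8],
              [0.6, 0.4]])            # hypothetical transition matrix
mu = np.array([0.3, 0.7])             # initial distribution of X_0
f = np.array([1.0, -2.0])             # a bounded observable

n = 5
pn = np.linalg.matrix_power(p, n)
lhs = mu @ (pn @ f)                   # int_S T^n f dmu: evolve the state function
rhs = (mu @ pn) @ f                   # int_S f d(T*^n mu): evolve the distribution
```

Both sides compute \({\mathbb E}_\mu f(X_n)\); one evolves the observable forward, the other evolves the distribution.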

FormalPara Example 2

S = {0, 1}, \({\mathbf {p}} = \left [ \begin {array}{cc} a & 1-a \\ 1-b & b \end {array} \right ]\), 0 ≤ a, b ≤ 1. S is a metric space with the discrete metric d(0, 0) = d(1, 1) = 0, d(1, 0) = d(0, 1) = 1, \(\mathcal S\) is the power set, and every function \(f:S \to \mathbb {R}\) is a bounded, continuous function. By the computation at the end of the previous chapter, one has for a + b < 2, with a + b ≠ 0,

$$\displaystyle \begin{aligned} \lim_{n\to \infty} {\mathbf{p}}^{(n)} = \left[ \begin{array}{cc} \frac{1-b}{2-a-b} & \frac{1-a}{2-a-b} \\ \frac{1-b}{2-a-b} & \frac{1-a}{2-a-b} \end{array}\right]\, . \end{aligned}$$

In particular, the invariant probability (vector) π = (π 0, π 1) is given by

$$\displaystyle \begin{aligned}\begin{array}{rcl} \pi_0 & = & \lim_{n\to \infty} p^{(n)}_{i0} = \frac{1-b}{2-a-b} \\ \pi_1 & = & \lim_{n\to\infty} p^{(n)}_{i1} = \frac{1-a}{2-a-b}\,\,, \qquad i=0,1. \end{array}\end{aligned}$$

Alternatively, one may determine π 0, π 1 from time-reversibility via the detailed balance and total probability one equations. In this case one could then use the L 2(S, π) theory for self-adjoint operators to obtain convergence (Exercise 7).

In the cases a + b = 2 and a + b = 0, one has \({\mathbf {p}} = \left [ \begin {array}{cc} 1 & 0 \\ 0 & 1 \end {array}\right ]\) and \({\mathbf {p}} = \left [ \begin {array}{cc} 0 & 1 \\ 1 & 0 \end {array} \right ]\), respectively. In the former case every probability on S = {0, 1} is an invariant probability, and hence there are infinitely many invariant probabilities. In the latter case there is a unique invariant probability given by \(\pi _0 = \pi _1 = {1\over 2}\); but p n does not have a limit, although (8.24) holds.
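The limit formulas of Example 2 are easy to confirm numerically; the values of a and b below are arbitrary illustrative choices:

```python
import numpy as np

a, b = 0.7, 0.4                       # illustrative values with a + b < 2, a + b != 0
p = np.array([[a, 1 - a],
              [1 - b, b]])

pn = np.linalg.matrix_power(p, 200)   # p^(n) for a large n
pi = np.array([(1 - b) / (2 - a - b), (1 - a) / (2 - a - b)])

row_gap = np.abs(pn - pi).max()       # each row of p^(n) tends to (pi_0, pi_1)
invariance_gap = np.abs(pi @ p - pi).max()   # pi is invariant: pi p = pi
```

The convergence is geometric at rate |a + b − 1| per step, so n = 200 is far more than enough here.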

FormalPara Example 3

Suppose \(S= \mathbb {R}\) with Borel σ-field \(\mathcal S\). Let ε 1, ε 2, … be an i.i.d. sequence of standard normal random variables and let \(b \in \mathbb {R}\). Consider the sequence of random variables

$$\displaystyle \begin{aligned}X_{n+1} = bX_n + \varepsilon_{n+1}, \quad n=0,1,2,\ldots\, .\end{aligned}$$

Iterating the recursion, one has that

$$\displaystyle \begin{aligned}X_{n+1} = b^{n+1} X_0 + \sum^n_{j=0} b^j \varepsilon_{n+1-j}.\end{aligned}$$

In particular, the m-step transition probability p (m)(x, dy) is given by the Gaussian distribution with mean b m x and variance \(\sum ^{m-1}_{j=0} b^{2j} = \frac {b^{2m}-1}{b^2-1}\) if |b|≠ 1. In the case b = ±1, p (m)(x, dy) is Gaussian with mean (±1)m x and variance m. In any case p(x, dy) clearly has the (weak) Feller property, and in particular, if |b| < 1, then p (m)(x, dy) converges (weakly) as m → ∞ to the invariant distribution \(\pi (dy) = \frac {1}{\sqrt {2\pi (1-b^2)^{-1}}} e^{- {{1-b^2}\over {2}} y^2} dy\), y ∈ S = (−∞, ∞), possessing a Gaussian density with respect to Lebesgue measure. Note that for |b| < 1, if X is N(0, σ 2) and Z is standard normal and independent of X, then X =dist bX + Z (equality in distribution) if and only if σ 2 = (1 − b 2)−1. One may check that π is also a time-reversible invariant probability (Exercise 13).
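A simulation sketch of Example 3: iterate the recursion X n+1 = bX n + ε n+1 and compare the long-run sample variance with the invariant value (1 − b 2)−1 (the seed, b, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)       # seed is an arbitrary choice
b = 0.5
n = 200_000

eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0                            # any starting point works for |b| < 1
for k in range(1, n):
    x[k] = b * x[k - 1] + eps[k]      # X_k = b X_{k-1} + eps_k

sample = x[1000:]                     # discard a burn-in period
var_target = 1.0 / (1.0 - b**2)       # invariant variance (1 - b^2)^{-1}
```

The empirical mean should be near 0 and the empirical variance near 4∕3 for b = 1∕2, up to Monte Carlo error.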

FormalPara Example 4 (Random Walk on a Finite Graph)

A finite graph consists of a finite set S = {v 1, …, v k} of k vertices together with a relation \({\mathcal E}\subset \{1,\dots ,k\}\times \{1,\dots ,k\}\), with the property that \((i,j)\in {\mathcal E}\) if and only if \((j,i)\in {\mathcal E}\), defining edges as follows: there is an edge e ij connecting vertices v i and v j if and only if \((i,j)\in {\mathcal E}\), denoted by means of the obvious abuse of notation \(e_{ij}\in {\mathcal E}\). The graph is said to be connected if for any pair of distinct vertices v i, v j there is a path of m ≥ 1 edges \(e_{ii_1}, e_{i_1i_2},\dots e_{i_{m-1}i_m}\) with i m = j. For a fixed vertex v i, the integer \(d_i := \mathop {\textit {card}} \nolimits \{j: e_{ij}\in \mathcal {E}\}\) is called the degree of v i. A random walk on a finite connected graph \((S,{\mathcal E})\) may be defined as a Markov chain with state space S and transition probabilities given by \(p_{v_i,v_j} = 1/d_i\) if and only if \(e_{ij}\in {\mathcal E}\), else \(p_{v_i,v_j} = 0\). It is straightforward to check that up to normalization, the vertex degrees define the unique time-reversible invariant probability for a random walk on a finite connected graph (see Exercise 12).
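The claim of Example 4, that the normalized vertex degrees give a time-reversible invariant probability, can be checked on any small connected graph; the 4-vertex graph below is a hypothetical example:

```python
import numpy as np

# Hypothetical 4-vertex connected graph; edges stored as ordered pairs both ways.
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0), (2, 3), (3, 2)}
k = 4
deg = np.array([sum((i, j) in edges for j in range(k)) for i in range(k)])
p = np.array([[1.0 / deg[i] if (i, j) in edges else 0.0 for j in range(k)]
              for i in range(k)])

pi = deg / deg.sum()                  # normalized vertex degrees

invariance_gap = np.abs(pi @ p - pi).max()
flux = pi[:, None] * p                # detailed balance: pi_i p_ij = pi_j p_ji
balance_gap = np.abs(flux - flux.T).max()
```

Detailed balance holds because π i p ij = (d i∕∑d)(1∕d i) = 1∕∑d is the same for every edge, regardless of orientation.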

An approach to the construction of invariant probabilities, similar to that in Proposition 8.4 but which is valid without the Feller property, is given below.

FormalPara Proposition 8.10

Let p(x, dy) be a transition probability on a state space \((S,\mathcal {S})\).

  1. (a)

    If for some x ∈ S there is a sequence of integers 1 ≤ n 1 < n 2 < ⋯ such that, as k → ∞,

    $$\displaystyle \begin{aligned} \frac{1}{n_k} \sum^{n_k-1}_{r=0} p^{(r)} (x,B) \longrightarrow \pi_x(B) \quad \ \text{for all} \ B\in {\mathcal S} \end{aligned} $$
    (8.32)

    for some probability measure π x on \((S,{\mathcal S})\), then π x is an invariant probability for p.

  2. (b)

    If, for some sequence {n k : k ≥ 1}, (8.32) holds for every x ∈ S with the same limit π x = π for all x, then π is the unique invariant probability for p.

FormalPara Proof

The proof is essentially the same as that of Proposition 8.4, with C b(S) replaced by B(S)—the space of real-valued bounded measurable functions on S. Note that (8.32) implies

$$\displaystyle \begin{aligned}{1\over{ n_k}} \sum^{n_k-1}_{r=0} (T^rf)(x) \longrightarrow \int_S f(y) \pi_x (dy) \quad \ \text{for all} \ \, f\in B(S).\end{aligned} $$

\(\hfill \blacksquare \)
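Proposition 8.10 covers, e.g., the periodic case a + b = 0 of Example 2: p n itself oscillates, but the averages in (8.32) converge to π = (1∕2, 1∕2). A numerical sketch:

```python
import numpy as np

p = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # the a + b = 0 chain: deterministic flip

n = 1000
avg = sum(np.linalg.matrix_power(p, r) for r in range(n)) / n

pi = np.array([0.5, 0.5])
gap = np.abs(avg - pi).max()          # Cesaro averages converge although p^n does not
```

Here the powers of p alternate between the identity and p itself, so the averages are exactly (1∕2, 1∕2) in each row whenever n is even.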

To close this chapter, let us note that a large class of examples of Markov chains occur as functions of a given, perhaps more primitive Markov chain. Of course, one-to-one functions are simply relabelings of the states and do not affect the dependence structure. A more general class of functions can be obtained as follows.Footnote 7

FormalPara Definition 8.8

A measurable function φ on (S, \({\mathcal S}\)) to a measurable space (S ′, \({\mathcal S}^{\prime }\)) is said to be an invariant function of a group G of transformations on S if (i) φ(gx) = φ(x) for all g ∈ G, x ∈ S. If, in addition, (ii) every measurable invariant function is a measurable function of φ, then φ is said to be a maximal invariant.

FormalPara Example 5
  1. 1.

    For each x ∈ S, the orbit of x under G is defined by o(x) = {gx : g ∈ G}. Note that each invariant function is constant on orbits. Let S and S ′ be metric spaces and φ : S → S ′ a measurable surjection such that (a) φ is constant on orbits, (b) φ(x) ≠ φ(y) if o(x) ≠ o(y), i.e., φ(x) is a relabeling of o(x), and S ′ may be viewed as a relabeling of the space of orbits. Then φ is a maximal invariant, since (i) invariance is obvious by (a), and (ii) if φ(x) = φ(y), then ρ(x) = ρ(y) for any invariant function ρ by (b), i.e., ρ is a function of φ.

  2. 2.

    φ(x) = |x| is a maximal invariant of the reflection group {e, −e}, where \(ex=x, (-e)x = -x, x\in S={\mathbb R}, S^{\prime } = [0,\infty ).\)

FormalPara Proposition 8.11

Suppose X = {X n} is a Markov process on S whose transition probabilities are invariant under a group G of transformations on S, i.e.,

$$\displaystyle \begin{aligned}p(gx,g(B)) = p(x,B), \quad \forall\ x\in S, g\in G, B\in{\mathcal S}.\end{aligned}$$

If φ is a maximal invariant, then {φ(X n)} is Markov.

FormalPara Proof

Take conditional expectations with respect to the larger σ(X m : m ≤ n), followed by the smaller σ(φ(X m) : m ≤ n), n = 1, 2, …, to get

$$\displaystyle \begin{aligned} P(\varphi(X_{n+1})\in B\vert \varphi(X_m),m\le n) = {\mathbb E}\{p(X_n,\varphi^{-1}(B))\vert \varphi(X_m),m\le n\}. \end{aligned} $$
(8.33)

By invariance of φ, φ ∘ g = φ, and one has from (i),

$$\displaystyle \begin{aligned} \begin{array}{rcl} p(x,\varphi^{-1}(B)) & =&\displaystyle p(g^{-1}x,g^{-1}(\varphi^{-1}(B)))\\ & =&\displaystyle p(g^{-1}x,\varphi^{-1}(B)). \end{array} \end{aligned} $$
(8.34)

That is, the function x → p(x, φ −1(B)) is invariant. By (ii), therefore, it is a function q(φ(x), B), say, of φ. Thus,

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(\varphi(X_{n+1})\in B\vert \varphi(X_m),m\le n) & =&\displaystyle {\mathbb E}\{p(X_n,\varphi^{-1}(B))\vert \varphi(X_m),m\le n\}\\ & =&\displaystyle {\mathbb E}\{q(\varphi(X_n),B)\vert \varphi(X_m),m\le n\}\\ & =&\displaystyle q(\varphi(X_n),B). \end{array} \end{aligned} $$
(8.35)

Thus, {φ(X n)} is Markov with one-step transition probabilities q(φ(x), B).\(\hfill \blacksquare \)

FormalPara Example 6 (Reflecting Simple Symmetric Random Walk)

Consider the unrestricted simple symmetric random walk on \({\mathbb Z}\) starting at the origin, defined by \(S_n := \sum _{j=1}^nX_j, n\ge 1, \ S_0 = 0\), where the displacements X n : n ≥ 1 are i.i.d. ± 1-valued symmetric Bernoulli random variables. Then, since \(p_{ij} = {1\over 2}\delta _{i-1}(j) +{1\over 2}\delta _{i+1}(j), i,j\in {\mathbb Z}\) is invariant under the reflection group G = {e, −e} where e(i) = i, \(i\in {\mathbb Z}\), it follows that {R n := |S n|}n is a Markov chain. (Also see Example 11.1.)
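One may compute the induced kernel q of Proposition 8.11 for Example 6 directly from q(φ(x), B) = p(x, φ −1(B)), with φ(x) = |x|. The sketch below truncates \({\mathbb Z}\) to a finite window purely for computation (the window size is an arbitrary choice):

```python
import numpy as np

# Truncate Z to the window {-m, ..., m}; the truncation is a computational
# convenience only (interior rows of the SRW kernel are unaffected).
m = 5
states = list(range(-m, m + 1))
idx = {z: k for k, z in enumerate(states)}
p = np.zeros((len(states), len(states)))
for z in range(-m + 1, m):            # interior states of the window
    p[idx[z], idx[z - 1]] = 0.5
    p[idx[z], idx[z + 1]] = 0.5

def q(i, j):
    """Induced kernel for R_n = |S_n|: q(i, j) = p(i, {j, -j})."""
    return sum(p[idx[i], idx[t]] for t in {j, -j} if -m <= t <= m)

# From 0 the reflected walk must move to 1; from an interior i > 0 it moves
# to i - 1 or i + 1 with probability 1/2 each.
vals = (q(0, 1), q(2, 1), q(2, 3))
```

By the reflection symmetry p(−x, −B) = p(x, B), the value p(x, φ −1(B)) depends on x only through |x|, which is why q is well defined.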

Exercises

  1. 1.

    Prove that for a transition probability p the measurability x → p(x, B) \(\ \text{for all} \ B\in {\mathcal S}\) implies that x → Tf(x) is measurable \(\ \text{for all} \ f\in \mathbb {B}(S)\). Show that (i) T is a linear operator on \(\mathbb {B}(S)\), (ii) \(||Tf|| \le ||f||, f\in \mathbb {B}(S), ||f|| = \sup _{x\in S}|f(x)|\), (iii)T 1 = 1, where 1(x) = 1, for all x ∈ S, and (iv) Tf ≥ 0 on S if \(f\in \mathbb {B}(S)\) is a nonnegative function.

  2. 2.

    Let p(x, dy) be a transition probability on \((S,{\mathcal S})\).

    1. (a)

      Show that x → P x(B) is \({\mathcal S}\)-measurable for all \(B\in {\mathcal S}^{\otimes \infty }\), and, letting X denote the identity map on S ∞, the function \(y\to {\mathbb E}_yf({\mathbf X})\) is \({\mathcal S}\)-measurable for all bounded measurable \(f:S^\infty \to {\mathbb R}\).

    2. (b)

      For every bounded \({\mathcal S}^{\otimes (n+1)}\)-measurable function f on S n+1, show that the function (x 0, x 1, …, x n−1) →∫S f(x 0, x 1, …, x n−1, y)p(x n−1, dy) is \({\mathcal S}^{\otimes n}\)-measurable.

  3. 3.

    Express time-reversibility and detailed balance without requiring densities. [Hint: Detailed balance may be stated as π(dx)p(x, dy) = π(dy)p(y, dx), suitably interpreted, and the consequent time-reversibility compares the joint distribution of (X 0, …, X k) with that of (X k, X k−1, …, X 0).]

  4. 4.

    Prove that \(T^nf(x) = \int _S f(y)p^{(n)}(x,dy) = {\mathbb E}(f(X_n)\vert X_0=x), x\in S\), for all bounded, measurable functions f on S, and n ≥ 1.

  5. 5.

    Show that the transition probability p in Example 1 (i) is not Feller continuous, and (ii) does not satisfy the hypothesis of Proposition 8.10. [Hint: (i) S has the relative topology of [0, 1], so that all points m∕(m + 1), m = 0, 1, …, are isolated and only 1 is a point of accumulation. Hence a real-valued function f on S is continuous if and only if \(f({m\over m+1})\to f(1)\) as m → ∞. For (ii), p (n)(x, dy) → δ 1(dy) weakly as n → ∞ for all x ∈ S.]

  6. 6.

    Let S be a locally compact separable metric space. Suppose that T is a positive linear contraction operator on C b(S) with T1 = 1. Use the Riesz Representation Theorem to show that T uniquely determines Feller transition probabilities p(x, dy) such that Tf(x) =∫S f(y)p(x, dy).

  7. 7.

    In the case of the two-state Markov chain in Example 2, use time-reversibility to compute the invariant probability π and establish convergence by an appeal to the spectral theorem for self-adjoint linear operators on L 2(S, π).

  8. 8.

    Let X 0, X n := X 0 + Z 1 + ⋯ + Z n (n ≥ 1) be a (general) random walk on \(S=\mathbb {R}^k\) with step-size distribution Q, i.e., {Z n : n ≥ 1} is an i.i.d. sequence with common distribution Q on \((\mathbb {R}^k, \mathcal {B}^k)\), independent of X 0. Show that (i) {X n : n ≥ 0} is Markov and (ii) no invariant probability exists if Q ≠ δ {0}.

  9. 9.

    (Birth–Death Chain with Two Reflecting Boundaries) Let S = {0, 1, 2, …, d} (d > 1), p(x, x + 1) = β x, p(x, x − 1) = δ x ≡ 1 − β x, with 0 < β x < 1 (x = 1, 2, …, d − 1), β 0 ≡ p(0, 1) = 1, δ d ≡ p(d, d − 1) = 1. Prove that there exists a unique invariant probability π, and the Markov process with this initial distribution is time-reversible. [Hint: There is a unique probability π for which (8.25) holds (with μ as counting measure). To solve for π, note that (8.25) implies π(x + 1)∕π(x) = β xδ x+1 (x = 0, 1, …, d − 1). Check that this must be true of the ratios for any invariant probability.]

  10. 10.

    (Time-Reversed Stationary Markov Process) Let {X n : n ≥ 0} be a stationary Markov process on \((S,\mathcal {S})\) with a transition probability density p(x, y) (w.r.t. a σ-finite measure μ) and an invariant probability density π(y), y ∈ S. Let {Z n : −∞ < n < ∞} be a stationary Markov process such that {Z n : n ≥ 0} has the same distribution as {X n : n ≥ 0}. Show that \(\{Y_n :=Z_{-n} :n\in \mathbb {Z}\}\) is (i) stationary and (ii) Markov with the transition probability density q(x, y) := (π(y)∕π(x))p(y, x); note that this is simply “Bayes formula” for (X n, X n+1). [Hint: Check that π is an invariant probability for q, and then construct a stationary double-sided Markov process {R n}−∞<n<∞ with these transition probabilities and invariant probability. Then check that the processes \(\{Y_n:n\in \mathbb {Z}\}\) and \(\{R_n:n\in \mathbb {Z}\}\) have the same distribution by considering finite dimensional events.]

  11. 11.

    Let {X n : n ≥ 0} be a Markov chain on the countable state space S. Assume that for any i, j ∈ S, one has \(p_{ij}^{(n)} > 0\) for some n ≥ 1. We say that p = ((p ij)) is irreducible in this case. Define Y n = (X n, X n+1), n = 0, 1, 2, ….

    1. (a)

      Show that {Y n : n ≥ 0} is a Markov chain on S ′ = {(i, j) ∈ S × S : p ij > 0}.

    2. (b)

      Show that if {X n : n ≥ 0} is irreducible, then so is {Y n : n ≥ 0}.

    3. (c)

      Show that if {X n : n ≥ 0} has invariant distribution π = (π i), then { Y n : n ≥ 0} has invariant distribution (π i p ij).

    4. (d)

      Show that an irreducible Markov chain on a state space S with an invariant initial distribution π is time-reversible if and only if (Kolmogorov Condition):

      $$\displaystyle \begin{aligned}p_{ii_1}p_{i_1 i_2}\cdots p_{i_k i}=p_{ii_k}p_{i_k i_{k-1}}\cdots p_{i_1i}\qquad \mbox{for all}\ i,i_1,\ldots, i_k \in S, \quad k\ge 1.\end{aligned}$$
    5. (e)

      If there is a j ∈ S such that p ij > 0 for all i ≠ j in (d), then for time-reversibility it is both necessary and sufficient that p ij p jk p ki = p ik p kj p ji for all i, j, k.

  12. 12.

    (A General Finite State Space Graph) Let {X n : n ≥ 0} be an irreducible Markov chain on a finite state space S; i.e., for each i, j ∈ S, there is an n ≥ 1 such that \(p_{ij}^{(n)} > 0.\) Define a graph G having states of S as vertices with edges joining i and j if and only if either p ij > 0 or p ji > 0.

    1. (a)

      Show that G is connected; i.e., for any two sites i and j, there is a path of edges from i to j.

    2. (b)

      Show that if {X n : n ≥ 0} has an invariant distribution π, then for any A ⊂ S,

      $$\displaystyle \begin{aligned}\sum_{i\in A} \sum_{j \in S \setminus A}\pi_i p_{ij}=\sum_{i \in A}\sum_{j \in S \setminus A}\pi_j p_{ji}\end{aligned}$$

      i.e., the net probability flux across a cut of S into complementary subsets A, S ∖ A is in balance. [Hint: Notice that \(\sum _{i\in A}\sum _{j\in S} \pi _i p_{ij}=\sum _{i\in A}\sum _{j\in S} \pi _j p_{ji}\), by invariance of π.]

    3. (c)

      Show that if G contains no cycles of three or more vertices, i.e., m = 3 or more distinct vertices v 1, …, v m such that v i and v i+1 are joined by an edge for i = 1, …, m and v m+1 = v 1, then the process is time-reversible started with π. A connected graph without cycles is called a tree graph. [Hint: Proceed inductively on the number of states.]

    4. (d)

      Give a graphical proof that an invariant probability for a birth–death Markov chain on {0, 1, …, N} with reflecting boundaries at 0, N must be time-reversible.

  13. 13.

    Prove the time-reversibility of Example 3 when |b| < 1 and {𝜖 n : n ≥ 1} is an i.i.d. standard normal sequence.

  14. 14.

    Consider a Markov process {X n : n = 0, 1, 2, … } on a metric space S defined recursively by X n+1 = g(X n, 𝜖 n+1), n ≥ 0, where (i) {𝜖 n : n ≥ 1} is a sequence of i.i.d. random variables with values in a metric space U, and independent of X 0, and (ii) g : S × U → S is continuous. Show that the Markov process {X n : n = 0, 1, 2, … } has the Feller property.