1 Introduction

These notes are based on the course of six lectures given by the first named author at the workshop organised at IIT-Delhi in December 2017. The lectures were intended to be self-contained, covering some basic facts in ergodic theory, including a discussion of the Birkhoff ergodic theorem which, in a sense, heralded the beginning of ergodic theory. Since the audience mainly consisted of graduate students with varied mathematical backgrounds, the lectures began with a quick recap of the construction of the Lebesgue measure on \(\mathbb {R}\) and progressed gradually to a discussion of more general measures. After setting up the groundwork on measure preserving transformations and flows on measure spaces, the notion of ergodicity was introduced.

Following a brief look at a couple of illustrative examples of dynamical systems, the focus shifted to one of the early interesting examples of an ergodic system, namely the geodesic flow on closed surfaces of constant negative curvature. This necessitated a working recapitulation of the geometry of the upper-half plane with respect to the hyperbolic metric; the lectures culminated in a sketch of a proof, due to Eberhard Hopf, of the ergodicity of the geodesic flow in this setting.

The notes naturally reflect the dynamics that the lectures carried, and also include some historical titbits in an attempt to capture the significance of the exciting developments that have shaped this field of study.

The first named author would like to record his deep gratitude to the organisers of this extremely well-run workshop, and to Nikita Agarwal, who cheerfully conducted the afternoon tutorials at the workshop with great energy and a lot of prior planning. Both authors thank the efficient editors of this volume for their invitation to turn the sketchy lecture notes into a coherent narrative, and the anonymous referees, whose careful comments and suggestions to add a few explanatory lines at a couple of places helped weed out several inadvertent typos and improve the readability. The authors take full responsibility for any errors that may still remain despite their sincere efforts to make these notes error free.

2 Measure Theoretic Preliminaries

This section seeks to develop some rudimentary aspects of measure, starting with the illuminating case of the Lebesgue measure on the real line. Finding the measure of a set means obtaining an estimate of its size. A finite set can be measured by its cardinality, whereas what distinguishes an infinite set from a finite set is its intriguing property of being in bijective correspondence with a proper subset of itself. This raises the question: how would one determine the size of an infinite subset of the real line \(\mathbb {R}\)?

For a subset which is an interval \(I = (a, b) \subset \mathbb {R}\), its length \(|I| = b - a\) seems a natural and reasonable estimate of its size. In fact, the seminal investigation that Henri Lebesgue undertook, culminating in the description of the so-called Lebesgue measure by exploiting the notion of length, appears in his fundamental paper of 1904 [10].

The basic idea of the Lebesgue measure on \(\mathbb {R}\) stems from an effort to adapt the notion of length to an arbitrary subset of \(\mathbb {R}\). This turns out to be a very profitable enterprise, as building on finer and subtler variants of this notion allows one to describe a whole family of s-dimensional Hausdorff measures for each \(s \in (0, 1]\), in turn giving rise to the notion of Hausdorff dimension of a given subset. We shall quickly uncover the main facets in this section, particularly mentioning the succinct and elegant work of Caratheodory [4].

We begin by first recalling the notion of outer measure.

Definition 2.1

If \(A \subseteq \mathbb {R}\), the (Lebesgue) outer measure of A is

$$\begin{aligned} \mu ^{*}(A)= & {} \inf \Bigg \{ \sum _{k = 1}^{\infty } | I_{k} |\, :\, A \subseteq \bigcup _{k = 1}^\infty I_{k},\ \text {where}\ ( I_{k} )_{k = 1}^{\infty }\ \text {is} \\&\text {a collection of open intervals} \Bigg \}. \end{aligned}$$

The completeness property of the reals ensures that if at least one of the members of the above set is finite, then \(\mu ^{*}(A)\) is a finite non-negative real number. If no such finite number exists, then the outer measure of A is said to be infinite.

Definition 2.2

If \(A \subseteq \mathbb {R}\) and \(h \in \mathbb {R}\), the translate of A by h is

$$\begin{aligned} A + h = \{ x + h\, :\, x \in A \}. \end{aligned}$$

The outer measure on \(\mathbb {R}\) exhibits the following properties which can easily be derived from first principles.

Theorem 2.3

The outer measure on \(\mathbb {R}\) satisfies the following basic properties.

  1. 1.

    (Non-negativity)    \(0 \le \mu ^{*} (A) \le + \infty \).

  2. 2.

    (Monotonicity)    \(A \subseteq B \Longrightarrow \mu ^{*} (A) \le \mu ^{*} (B)\).

  3. 3.

    (Countable subadditivity)    \(A \subseteq \bigcup _{n = 1}^{\infty } A_{n} \Longrightarrow \mu ^{*} (A) \le \sum _{n = 1}^{\infty } \mu ^{*} (A_{n})\).

  4. 4.

    (Translation invariance)    \(\mu ^{*} (A + h) = \mu ^{*} (A)\).

  5. 5.

    \(\mu ^{*} (A) = |A|\), the length of A, if A is an interval.

While the above-mentioned properties follow readily from the definition, one other natural and desirable property to expect is that the outer measures of two disjoint sets A and B add up to the outer measure of their disjoint union \(A \cup B\). This expectation lies at the heart of our discussion and, in a sense, the real essence of the theory lies in understanding this rather innocuous requirement.

A moment’s reflection shows what the finite additivity property would ensure. If \(\{A_{i}\}_{i = 1}^{\infty }\) is a countable collection of pairwise disjoint subsets of \(\mathbb {R}\), then finite additivity would give

$$\begin{aligned} \sum _{i = 1}^{n} \mu ^{*}(A_{i}) = \mu ^{*} \left( \bigcup _{i = 1}^{n} A_{i} \right) \le \mu ^{*} \left( \bigcup _{i = 1}^{\infty } A_{i} \right) \le \sum _{i = 1}^{\infty } \mu ^{*} (A_{i}). \end{aligned}$$

Letting \(n \rightarrow \infty \) on the left then forces equality throughout; that is, finite additivity of the outer measure would imply its countable additivity.

But the outer measure \(\mu ^{*}\) defined above has a singular shortcoming: it is not finitely additive! One way to see this fact, a posteriori, is from Vitali’s construction in 1905 [12] of a non-measurable subset of \(\mathbb {R}\). Recall that Vitali exhibited a non-empty proper subset C of \(\mathbb {R}\) whose rational translates form a countable collection of pairwise disjoint subsets of \(\mathbb {R}\). It is on this collection that the outer measure \(\mu ^{*}\) cannot be countably additive. In particular, there are disjoint subsets A and B of \(\mathbb {R}\) such that \(\mu ^{*} (A \cup B) \ne \mu ^{*} (A) + \mu ^{*} (B)\).

In other words, there are subsets X and O of \(\mathbb {R}\) such that for the partition of X by O into the disjoint subsets \(X \cap O\) and \(X \cap O^{c}\), one has

$$\begin{aligned} \mu ^{*}(X) \ne \mu ^{*}(X \cap O) + \mu ^{*}(X \cap O^{c}). \end{aligned}$$

To see this, consider disjoint sets A and B and take \(X = A \cup B\) and \(O = A\). Therefore, \(\mu ^{*} (X) = \mu ^{*} (A \cup B) \ne \mu ^{*} (A) + \mu ^{*} (B) = \mu ^{*} (X \cap O) + \mu ^{*} (X \cap O^{c})\).

Consequently, one looks at the collection, \(\mathfrak {M}\), of all those sets \(E \subseteq \mathbb {R}\) such that

$$\begin{aligned} \mu ^{*}(A) = \mu ^{*} (A \cap E) + \mu ^{*} (A \cap E^{c}),\; \forall A \subseteq \mathbb {R}. \end{aligned}$$
(1)

On this collection \(\mathfrak {M}\), the outer measure \(\mu ^{*}\) is countably additive. The collection \(\mathfrak {M}\), which includes the open intervals, constitutes a \(\sigma \)-algebra, and the outer measure restricted to \(\mathfrak {M}\) is called the Lebesgue measure on \(\mathfrak {M}\). The expression (1) is termed the Caratheodory criterion and naturally leads to the definition of a (Lebesgue) measurable set. The next two definitions make this observation precise.

Definition 2.4

A family \(\mathfrak {M}\) of subsets of a set X is said to be a \(\sigma \)-algebra if the following hold:

  1. 1.

    \(X \in \mathfrak {M}\);

  2. 2.

    \(A \in \mathfrak {M} \implies A^{c} \in \mathfrak {M}\);

  3. 3.

    \(\{ A_{i} \}_{i = 1}^{\infty } \subseteq \mathfrak {M} \implies \bigcup \limits _{i = 1}^{\infty } A_{i} \in \mathfrak {M} \).

Definition 2.5

A set \(E \subseteq \mathbb {R}\) is said to be Lebesgue measurable or measurable if the Caratheodory criterion (1) holds with respect to E.

In light of the preceding definitions, the conclusions of the next proposition can be deduced using properties of the outer measure given in Theorem 2.3.

Proposition 2.6

  1. 1.

    If I is an interval, then \(I \in \mathfrak {M}\) and \(\mu ^{*} (I) = |I|\).

  2. 2.

    If \(A \in \mathfrak {M}\), then \(A^{c} \in \mathfrak {M}\).

  3. 3.

    If \(A, B \in \mathfrak {M}\), then \(A \cup B,\ A \cap B \in \mathfrak {M}\).

  4. 4.

    If pairwise disjoint sets \(A_{1}, A_{2},\ldots , A_{N} \in \mathfrak {M}\) and \(E \subseteq \mathbb {R}\), then

    $$\begin{aligned} \mu ^{*} \left( E \cap \left( \bigcup _{k = 1}^{N} A_{k} \right) \right) = \sum _{k = 1}^{N} \mu ^{*} (E \cap A_{k}). \end{aligned}$$
  5. 5.

    (Countable additivity or \(\sigma \)-additivity) If \(\{ A_{n} \}_{n = 1}^{\infty }\) is any sequence of measurable sets, then \(\bigcap \limits _{n = 1}^{\infty } A_{n}\) and \(\bigcup \limits _{n = 1}^{\infty } A_{n}\) are also measurable. Further, if \(\{ A_{n} \}_{n = 1}^{\infty }\) is a sequence of pairwise disjoint measurable sets, then \(\bigcup _{n = 1}^{\infty } A_{n} \in \mathfrak {M}\) and

    $$\begin{aligned} \mu ^{*} \left( \bigcup _{n = 1}^{\infty } A_{n} \right) = \sum _{n = 1}^{\infty } \mu ^{*} (A_{n}). \end{aligned}$$

Definition 2.7

Suppose \(A \in \mathfrak {M}\). Then its (Lebesgue) measure, \(\mu (A)\) is defined to be its outer measure: \(\mu (A) = \mu ^{*} (A)\).

Remark 2.8

  • The reason for needing two different concepts is that each has its own disadvantage.

  • \(\mu \) is an additive measure, but is not defined for all subsets of \(\mathbb {R}\).

  • \(\mu ^{*}\) is defined for all subsets of \(\mathbb {R}\), but is not additive, as demonstrated by Vitali’s construction.

A more restricted class of Lebesgue measurable sets is that of the Borel measurable sets.

Definition 2.9

If X is any topological space (in this case \(\mathbb {R}\)), then the members of the \(\sigma \)-algebra \(\mathfrak {B}\) generated by the class of open sets in X (resp. open intervals in \(\mathbb {R}\)) are called the Borel sets of X (resp. \(\mathbb {R}\)).

Remark 2.10

It can be easily shown that the Borel \(\sigma \)-algebra for \(\mathbb {R}\) includes the half-open intervals such as \([a, b)\) as well as the closed intervals, and further that every Borel set is (Lebesgue) measurable.

The important properties of the outer measure \(\mu ^{*}\) continue to hold on replacing \(\mu ^{*}\) by \(\mu \) whenever \(A \in \mathfrak {M}\).

Theorem 2.11

Here, we enlist some additional properties of measurable sets.

  1. 1.

    Continuity: Suppose \(A_{1} \supseteq A_{2} \supseteq A_{3} \cdots \) and \(B_{1} \subseteq B_{2} \subseteq B_{3} \cdots \) are sequences of measurable sets, and \(\mu (A_{1}) < \infty \). Then,

    $$\begin{aligned} \mu \left( \bigcap _{n = 1}^{\infty } A_{n} \right) = \lim _{n \rightarrow \infty } \mu (A_{n}) \quad \text {and} \quad \mu \left( \bigcup _{n = 1}^{\infty } B_{n} \right) = \lim _{n \rightarrow \infty } \mu (B_{n}). \end{aligned}$$
  2. 2.

    Approximation: If \(A \in \mathfrak {M}\) and \(\mu (A) < \infty \), then for all \(\epsilon > 0\) there exist a bounded closed set B and an open set C such that \(B \subseteq A \subseteq C\) and \(\mu (C \cap B^{c}) < \epsilon \).
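The continuity property above can be seen concretely (a numerical sketch, not part of the original notes) by taking the decreasing sets \(A_{n} = (0, 1/n)\) and the increasing sets \(B_{n} = (0, 1 - 1/n)\), using the fact that the Lebesgue measure of an interval is its length. The function names `mu_A` and `mu_B` are ours.

```python
# Continuity of measure: mu(A_n) -> mu(intersection) and
# mu(B_n) -> mu(union), computed exactly via interval lengths.

def mu_A(n):
    """mu(A_n) = length of A_n = (0, 1/n)."""
    return 1.0 / n

def mu_B(n):
    """mu(B_n) = length of B_n = (0, 1 - 1/n)."""
    return 1.0 - 1.0 / n

# The intersection of all A_n is empty; the union of all B_n is (0, 1).
assert abs(mu_A(10 ** 6) - 0.0) < 1e-5   # lim mu(A_n) = mu(empty set) = 0
assert abs(mu_B(10 ** 6) - 1.0) < 1e-5   # lim mu(B_n) = mu((0, 1)) = 1
```

The hypothesis \(\mu (A_{1}) < \infty \) matters for the decreasing case: with \(A_{n} = (n, \infty )\) every \(A_{n}\) has infinite measure, while the intersection is empty.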

The discussion sketched above of the construction of the Lebesgue measure on \(\mathbb {R}\), starting from the notion of outer measure, is, in a sense, a prototype for the construction of measures more generally on complete metric spaces. In the setting of a metric space X with distance function d, one starts with the notion of a ‘metric outer measure’, which estimates the size of a subset A by considering covers of A by countably many open balls; the radii of these balls then provide an appropriate measure of their sizes, analogously replacing the lengths of intervals.

We shall elaborate more on this later when discussing Hausdorff measures, but will now proceed to a discussion of measures in general.

Definition 2.12

A measure space is a triple \((X, \mathfrak {M}, \mu )\), where X is any set, \(\mathfrak {M}\) is a \(\sigma \)-algebra of measurable sets and \(\mu \) is a \(\sigma \)-additive measure on \(\mathfrak {M}\).

A measurable space is just the pair \((X, \mathfrak {M})\) with no specification about the measure. The concept of \(\sigma \)-finiteness is another desirable property for a measure to possess.

Definition 2.13

A measure space \((X, \mathfrak {M}, \mu )\) is said to be \(\sigma \)-finite if X can be written as a countable union of measurable sets of finite measure i.e., \(X = \bigcup \limits _{n = 1}^{\infty } A_{n}\) with \(\mu (A_{n}) < + \infty \), for all n. \(\mu \) is then said to be a \(\sigma \)-finite measure.

Definition 2.14

Given a measure space \((X, \mathfrak {M}, \mu )\), a set \(A \subset X\) is said to be a null set or a set of measure zero if there exists a set \(A_{1} \in \mathfrak {M}\) so that \(A \subseteq A_{1}\) and \(\mu (A_{1}) = 0\). Furthermore, two sets \(A_{1}, A_{2} \subset X\) are said to be equivalent mod 0 if their symmetric difference, \(A_{1} \Delta A_{2}\) i.e., \((A_{1} \setminus A_{2}) \cup (A_{2} \setminus A_{1})\) has measure zero and this is denoted as \(A_{1} \equiv A_{2} \pmod {0}\).

Remark 2.15

  1. 1.

    It should be noted in this context that not every measurable set is a Borel set. In fact, one can construct sets of measure zero which are Lebesgue measurable but not Borel measurable. Thus, the Lebesgue measure serves as the completion of the Borel measure.

  2. 2.

    Note that a more formal definition of a complete measure is as follows: Given a measure space \((X, \mathfrak {M}, \mu ),\ \mu \) is complete if and only if for any \(N \in \mathfrak {M}\) where \(\mu (N) = 0,\ E \subseteq N\) implies \(E \in \mathfrak {M}\). The Lebesgue measure is complete precisely in the above sense.

Another example of a finite measure space is the probability space, which is the space of choice for ergodic theory. For a measure space \((X, \mathfrak {M}, \mu )\), if \(\mu (X) = 1\), then X is said to be a probability space and \(\mu \) a probability measure.

Measure zero sets are very useful in characterising properties in measure theory.

Definition 2.16

A property P of points of a set \(A \subseteq X\) is said to hold almost everywhere (a.e.) if the set of points of A which do not satisfy P form a set of measure zero.

2.1 Measurable Functions and Transformations

We now move on to the notion of a measurable function which closely mirrors the topological definition of a continuous function. The first definition is formulated in the setting of general measure spaces.

Definition 2.17

(Measurable functions or transformations) If \((X, \mathfrak {M})\) and \((Y, \mathfrak {N})\) are two measurable spaces, then a map \(f : X \longrightarrow Y\) is measurable if \(f^{-1} (A)\) is measurable i.e., \(f^{-1}(A) \in \mathfrak {M}\) for every \(A \in \mathfrak {N}\). Further, if X and Y are topological spaces, then \(f : X \longrightarrow Y\) is said to be (Borel-) measurable if it is measurable with respect to the Borel \(\sigma \)-algebras of X and Y.

Remark 2.18

The above definition implies that every continuous function is (Borel-) measurable.

In the sequel, we use the extended real line \( \bar{\mathbb {R}} = \mathbb {R} \cup \{-\infty , \infty \}\) with the usual conventions. To keep things simple, in the remaining part of this section, we restrict ourselves to extended real-valued functions defined on \(\mathbb {R}\) (equipped with the usual Lebesgue measure), unless otherwise explicitly stated, although the statements hold in the more general setting of complete measure spaces.

Remark 2.19

In particular, if \(f : (\mathbb {R}, \mathfrak {L}) \longrightarrow (\bar{\mathbb {R}}, \mathfrak {B})\), where \(\mathfrak {L}\) is the Lebesgue \(\sigma \)-algebra, is measurable as in Definition 2.17, then f is said to be Lebesgue measurable.

For extended real-valued functions f, g, denote

$$\begin{aligned} (f \wedge g)(x) = \min \{ f(x), g(x) \}, \quad (f \vee g)(x) = \max \{ f(x), g(x) \}. \end{aligned}$$

Proposition 2.20

Measurable functions satisfy the following notable properties:

  1. 1.

    Suppose f, g are measurable functions and \(c \in \mathbb {R}\). Then \(c f,\ f + g,\ f g,\ |f|,\ f \wedge g,\ f \vee g\) are measurable.

  2. 2.

    Suppose \(\{ f_{n} \}_{n = 1}^{\infty }\) is a sequence of measurable functions and \(\lim _{n \rightarrow \infty } f_{n} (x) = f(x)\) for every x. Then f is measurable.

  3. 3.

    Suppose \(\{ f_{n} \}_{ n =1}^{\infty }\) is a sequence of measurable functions. Let \(g(x) = \inf \{ f_{n} (x) \}\) and \(h(x) = \sup \{ f_{n} (x) \}\). Then g and h are measurable.

Definition 2.21

The indicator function of a set \(A \subseteq \mathbb {R}\) is the function

$$ \chi _{A} (x) = {\left\{ \begin{array}{ll} 1 &{} \text {if}\ x \in A, \\ 0 &{} \text {if}\ x \notin A. \end{array}\right. } $$

Definition 2.22

A simple function is a function of the form

$$\begin{aligned} f = a_{1} \chi _{A_{1}} + \cdots + a_{n} \chi _{A_{n}} \quad \text {where}\ a_{i} \in \mathbb {R}, A_{i} \in \mathfrak {M}\ \text {and}\ \mu (A_{i}) < \infty . \end{aligned}$$

Definition 2.23

The integral of a simple function \(f = \sum _{i = 1}^{n} a_{i} \chi _{A_{i}}\) is

$$\begin{aligned} \int f\, d \mu = \int _{\mathbb {R}} f\, d \mu = \sum _{i = 1}^{n} a_{i} \mu (A_{i}). \end{aligned}$$

Definition 2.24

(Integral of nonnegative measurable functions) If \(f : \mathbb {R} \longrightarrow \mathbb {R}\) is a nonnegative measurable function, then its integral is

$$\begin{aligned} \int f\, d \mu = \sup \left\{ \int g\, d \mu \ :\ g\ \text {is a simple function such that}\ 0 \le g \le f \right\} . \end{aligned}$$

Proposition 2.25

If f, g are nonnegative measurable functions and \(a > 0\), then

$$\begin{aligned} \int a f\, d \mu = a \int f\, d \mu , \quad \int (f + g)\, d \mu = \int f\, d \mu + \int g\, d \mu . \end{aligned}$$

Moreover, if \(f \le g\), then

$$\begin{aligned} \int f\, d \mu \le \int g\, d \mu . \end{aligned}$$

This additivity property will allow us to extend the definition of integration to functions that change sign.

Definition 2.26

For an extended real-valued function f, define functions

$$ f^{+} (x) = {\left\{ \begin{array}{ll} f(x) &{} \text {if}\ f(x) > 0, \\ 0 &{} \text {if}\ f(x) \le 0; \end{array}\right. } \qquad f^{-} (x) = {\left\{ \begin{array}{ll} - f(x) &{} \text {if}\ f(x) < 0, \\ 0 &{} \text {if}\ f(x) \ge 0. \end{array}\right. } $$

Note that \(f^{+}\) and \(f^{-}\) are nonnegative. They are measurable if f is, and \(f = f^{+} - f^{-},\ \left| f \right| = f^{+} + f^{-}\).

Definition 2.27

A measurable function f is integrable if \(\int \left| f \right| \, d \mu < + \infty \).

Definition 2.28

If f is an integrable function, its integral is

$$\begin{aligned} \int f\, d \mu = \int f^{+}\, d \mu - \int f^{-}\, d \mu . \end{aligned}$$

Definition 2.29

The limit supremum of a sequence is the least upper bound of the set of all subsequential limits of the sequence. That is,

$$\begin{aligned} \limsup _{n \rightarrow \infty } a_{n} := \lim _{n \rightarrow \infty } \left( \sup \{ a_{m} : m \ge n \} \right) = \inf _{n \ge 0} \left( \sup _{m \ge n} a_{m} \right) . \end{aligned}$$

Similarly, we define

$$\begin{aligned} \liminf _{n \rightarrow \infty } a_{n} := \lim _{n \rightarrow \infty } \left( \inf \{ a_{m} : m \ge n \} \right) . \end{aligned}$$

Theorem 2.30

(Fundamental convergence theorems) Here, we record the fundamental convergence theorems in analysis that we use in the sequel.

  1. 1.

    (Lebesgue’s dominated convergence theorem) Suppose \((f_{n})_{n = 1}^{\infty }\) is a sequence of measurable functions and \(\lim \limits _{n \rightarrow \infty } f_{n} (x) = f (x)\), for all \(x \in \mathbb {R}\), and \(|f_{n}(x)| \le g(x)\) for all \(n \in \mathbb {N},\, x \in \mathbb {R}\) where g is an integrable function. Then,

    $$\begin{aligned} \lim _{n \rightarrow \infty } \int f_{n}\, d \mu = \int f\, d \mu . \end{aligned}$$
  2. 2.

    (Monotone Convergence Theorem) Suppose \((f_{n})_{n = 1}^{\infty }\) is a non-decreasing sequence of non-negative measurable functions \(0 \le f_{1} \le f_{2} \le \cdots \). Let \(f(x) = \lim \limits _{n \rightarrow \infty } f_n (x)\). Then,

    $$\begin{aligned} \lim _{n \rightarrow \infty } \int f_{n}\, d \mu = \int f\, d \mu . \end{aligned}$$
  3. 3.

    (Fatou’s Lemma) If \((f_{n})_{n = 1}^{\infty }\) is a sequence of nonnegative measurable functions, then

    $$\begin{aligned} \int \liminf _{n \rightarrow \infty } f_{n}\, d \mu \le \liminf _{n \rightarrow \infty } \int f_{n}\, d \mu . \end{aligned}$$
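The inequality in Fatou’s Lemma can be strict, as the classical example \(f_{n} = n\, \chi _{(0, 1/n]}\) shows: \(\int f_{n}\, d\mu = 1\) for every n, while \(f_{n} \rightarrow 0\) pointwise on (0, 1], so \(\int \liminf f_{n}\, d\mu = 0 < 1 = \liminf \int f_{n}\, d\mu \). A small Python sketch of this computation (ours, with the integrals of these simple functions evaluated exactly as value times measure):

```python
# Fatou's lemma can be strict: f_n = n on (0, 1/n], 0 elsewhere.

def integral_f(n):
    """Integral of f_n: value n on a set of measure 1/n."""
    return n * (1.0 / n)

for n in (1, 10, 1000):
    assert abs(integral_f(n) - 1.0) < 1e-12   # every integral equals 1

def f(n, x):
    """Pointwise value of f_n at x."""
    return n if 0 < x <= 1 / n else 0

# At any fixed x in (0, 1], f_n(x) = 0 once n > 1/x, so liminf f_n = 0.
x = 0.3
assert all(f(n, x) == 0 for n in range(4, 100))
```

Note this sequence also shows why the dominated convergence theorem needs an integrable dominating function g: no such g dominates all the \(f_{n}\) here.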

Definition 2.31

Two functions f and g are said to be equal almost everywhere, written \(f = g\) a.e., if \(\{ x\ :\ f(x) \ne g(x) \}\) is a set of measure zero.

Proposition 2.32

If f is a function on a Lebesgue measurable set E and \(g = f\) a.e., then g is Lebesgue measurable if and only if f is Lebesgue measurable.

Definition 2.33

Consider the set of all integrable functions on \(\mathbb {R}\). The function space \(L^{1}\) is the set of all equivalence classes of integrable functions on \(\mathbb {R}\), where we set \(f \simeq g\) if \(f = g\) a.e. The \(L^{1}\) norm is given by

$$\begin{aligned} \left\| f \right\| _{1} := \int \left| f \right| \, d \mu . \end{aligned}$$

Theorem 2.34

\(L^{1}\) is complete, i.e., given a Cauchy sequence \(\{ f_{n} \}_{n = 1}^{\infty }\) in \(L^{1}\), there exists \(f \in L^{1}\) such that \(\lim \limits _{n \rightarrow \infty } \left\| f_{n} - f \right\| _{1} = 0\).

Generalising the \(L^{1}\) notion to functions on arbitrary complete measure spaces, we have the following definition.

Definition 2.35

Let \((X, \mathfrak {M}, \mu )\) be a complete measure space and \(f : X \longrightarrow \bar{\mathbb {R}}\) a measurable function. Then for each real number \(p \ge 1\), we say that \(f \in L^{p} (\mu )\) if

$$\begin{aligned} \int \limits _{X} \left| f \right| ^{p}\, d \mu < \infty . \end{aligned}$$

For any such \(f \in L^{p} (\mu )\), we may define the \(L^{p}\)-norm as

$$\begin{aligned} \left\| f \right\| _{p} := \left( \int \limits _{X} \left| f \right| ^{p}\, d \mu \right) ^{\frac{1}{p}}. \end{aligned}$$

Identifying functions whose values agree a.e. allows one to define a metric on the space \(L^{p} (\mu )\) by means of the \(L^{p}\)-norm. We treat \(L^{p}(\mu )\) as the set of equivalence classes of functions which coincide a.e. Thus, \(L^{p}(\mu )\) becomes a Banach space for \(1 \le p < \infty \). In particular, \(L^{2} (\mu )\) is a Hilbert space with the inner product defined by

$$\begin{aligned} \langle f, g \rangle := \int \limits _{X} f g\, d \mu . \end{aligned}$$
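As a concrete instance of the \(L^{p}\)-norm (a numerical sketch, not part of the original notes): for \(f(x) = x\) on [0, 1] with Lebesgue measure, \(\Vert f \Vert _{p} = (1/(p+1))^{1/p}\). The helper `lp_norm_riemann` below is ours; it approximates the integral by a midpoint Riemann sum, which suffices here since the integrand is smooth.

```python
# L^p norm of f(x) = x on [0, 1], approximated by a midpoint Riemann sum
# and compared with the closed form (1/(p+1))**(1/p).

def lp_norm_riemann(p, n=100_000):
    h = 1.0 / n
    s = sum(((k + 0.5) * h) ** p for k in range(n)) * h  # midpoint rule
    return s ** (1.0 / p)

for p in (1, 2, 3):
    exact = (1.0 / (p + 1)) ** (1.0 / p)
    assert abs(lp_norm_riemann(p) - exact) < 1e-4
```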

Definition 2.36

\(f : X \longrightarrow \mathbb {R}\) is said to be compactly supported if the closure of the set of points in X where the value of f is non-zero, is a compact subset of X.

Notation 2.37

We denote the set of all compactly supported (real-valued) continuous functions on X as \(C_{c}(X)\).

Theorem 2.38

(Lusin’s Theorem) If X is a locally compact Hausdorff topological space and if \(f : X \longrightarrow \bar{\mathbb {R}}\) is a measurable function such that \(f(x) = 0\), for all \(x \notin A \subset X,\) where \(\mu (A) < \infty \), then given \(\epsilon > 0\), there exists a \(g \in C_{c}(X)\) so that

$$\begin{aligned} \mu \left( \left\{ x\ :\ f(x) \ne g(x) \right\} \right) < \epsilon . \end{aligned}$$

Theorem 2.39

For \(1 \le p < \infty ,\ C_{c}(X)\) is dense in \(L^{p}(\mu )\).

Definition 2.40

Let \((X, \mathfrak {M})\) be a measurable space and \(\mu ,\ \nu : \mathfrak {M} \longrightarrow [0, \infty )\) be two measures on \(\mathfrak {M}\). We say that \(\mu \) is absolutely continuous with respect to \(\nu \) if \(A \in \mathfrak {M}\) and \(\nu (A) = 0\) implies \(\mu (A) = 0\). This is denoted as \(\mu \ll \nu \).

Theorem 2.41

(Radon–Nikodym) If \((X, \mathfrak {M}, \nu )\) is a \(\sigma \)-finite measure space, then \(\mu \ll \nu \) if and only if there exists a function \(f \in L^{1}(\nu )\) such that

$$\begin{aligned} \mu (A) = \int \limits _{A} f\, d \nu \ \ \text {for every}\ A \in \mathfrak {M}. \end{aligned}$$

The function f is unique a.e. with respect to \(\nu \) and is written as \(\frac{d \mu }{d \nu }\), called the Radon–Nikodym derivative of \(\mu \) w.r.t. \(\nu \).

2.2 Hausdorff Measures

In this subsection, we outline the notion of more general measures, called Hausdorff measures, that subsume the Lebesgue measure. It is assumed that (X, d) is a non-empty metric space. The notion of Hausdorff dimension of a subset \(A \subset X\) arises from the construction of Hausdorff measures [6].

Definition 2.42

A function \(\mu ^{*}\) defined on \(\mathcal {P}(X)\) is called a metric outer measure if it satisfies the following:

  1. 1.

    \(\mu ^{*}(A) \ge 0\), for all \(A \in \mathcal {P}(X)\);

  2. 2.

    \(\mu ^{*}(\varnothing ) = 0\);

  3. 3.

    (Monotonicity) \(A_{1} \subseteq A_{2} \implies \mu ^{*}(A_{1}) \le \mu ^{*}(A_{2})\);

  4. 4.

    (Countable subadditivity) if \(\{ A_{n} \}_{n = 1}^{\infty }\) is a countable collection of members of \(\mathcal {P}(X)\), then \(\mu ^{*} \left( \bigcup \limits _{n = 1}^{\infty } A_{n} \right) \le \sum \limits _{n = 1}^{\infty } \mu ^{*} (A_{n})\);

  5. 5.

    if \(A_{1}, A_{2} \in \mathcal {P}(X)\) with \(d(A_{1}, A_{2}) > 0\), then \(\mu ^{*} (A_{1} \cup A_{2}) = \mu ^{*} (A_{1}) + \mu ^{*} (A_{2})\).

A familiar example of such an outer measure is the Lebesgue outer measure discussed in the earlier sections. Before defining the Hausdorff measure, we remark that as in the case of \(\mathbb {R}\), a subset E of a space X is said to be measurable if

$$\begin{aligned} \mu ^{*} (A) = \mu ^{*} (A \cap E) + \mu ^{*} (A \cap E^{c}),\ \ \forall A \in \mathcal {P}(X). \end{aligned}$$

The class of measurable sets in X evidently forms a \(\sigma \)-algebra \(\mathfrak {P}\), so that \(\mu ^{*}\), when restricted to \(\mathfrak {P}\), is countably additive and thus a measure in the usual sense. We henceforth use the usual notation \(\mu \) for this measure.

Definition 2.43

Given a metric space (X, d) and \(A \subset X\), the diameter of A is given as \(\delta (A) := \sup \{ d(x, y)\ :\ x, y \in A \}\).

Let (X, d) be a metric space, let \(\alpha > 0\) be a real number and let \(A \subset X\). Given \(\epsilon > 0\), consider

$$\begin{aligned} H^{\epsilon }_{\alpha } (A) = \inf \left\{ \sum \limits _{k = 1}^{\infty } \delta (A_{k})^{\alpha }\ :\ A \subseteq \bigcup \limits _{k = 1}^{\infty } A_{k}\ \text {where}\ \delta (A_{k}) < \epsilon \; \forall k \right\} , \end{aligned}$$

the infimum being taken over all countable covers of the set A whose members have diameter less than \(\epsilon \). Note that if \(\epsilon _{1} < \epsilon \), then \(H_{\alpha }^{\epsilon _{1}} (A) \ge H_{\alpha }^{\epsilon } (A)\). Therefore, \(\lim \limits _{\epsilon \rightarrow 0} H_{\alpha }^{\epsilon } (A)\) exists, though it may be infinite, and we write \(H_{\alpha } (A) = \lim \limits _{\epsilon \rightarrow 0} H_{\alpha }^{\epsilon } (A)\).

Theorem 2.44

For each \(\alpha > 0,\ H_{\alpha }\) is a metric outer measure on X called the Hausdorff outer measure of dimension \(\alpha \) and when restricted to the \(\sigma \)-algebra of measurable sets, is called the Hausdorff measure of dimension \(\alpha \) on X.

Note that if \(\alpha = 0\), then \(H_{\alpha }\) is merely the counting measure.

Theorem 2.45

  1. (i)

    If \(H_{\alpha } (A) < \infty \), then \(H_{\beta } (A) = 0\) for \(\beta > \alpha \).

  2. (ii)

    If \(H_{\alpha } (A) > 0\), then \(H_{\beta } (A) = \infty \) for \(\beta < \alpha \).

Proof

It is easy to see that (i) and (ii) are equivalent. Therefore, we prove (i). Suppose \(A \subseteq \bigcup \limits _{k = 1}^{\infty } A_{k}\), with \(\delta (A_{k}) < \epsilon \) for every k. If \(\beta > \alpha \), then

$$\begin{aligned} H^{\epsilon }_{\beta } (A) \le \sum \limits _{k = 1}^{\infty } \delta (A_{k})^{\beta } \le \epsilon ^{\beta - \alpha } \sum \limits _{k = 1}^{\infty } \delta (A_{k})^{\alpha }. \end{aligned}$$

That is, taking the infimum over all such covers, \(H_{\beta }^{\epsilon } (A) \le \epsilon ^{\beta - \alpha } H^{\epsilon }_{\alpha } (A)\). Letting \(\epsilon \rightarrow 0\), we see that \(H_{\beta } (A) = 0\) if \(H_{\alpha } (A) < \infty \).    \(\square \)

As a consequence of the above theorem, for \(A \subset X\), there exists a unique \(d \in [0, \infty ]\) such that

$$ {\left\{ \begin{array}{ll} H_{m} (A) = 0 &{} \text {if}\ m > d, \\ H_{m} (A) = \infty &{} \text {if}\ m < d. \end{array}\right. } $$

The d, obtained as above, is called the Hausdorff dimension of the set A, denoted by \(\mathcal {H}_{dim}(A)\).

Example 2.46

  1. 1.

    If A is any countable set then, \(\mathcal {H}_{dim}(A) = 0\).

  2. 2.

    If \(X = \mathbb {R}\) and \(\alpha = 1\), then it is straightforward to check that \(H_{1}\) is the Lebesgue measure.

  3. 3.

    The Cantor ternary set is an example of an uncountable set of Lebesgue measure zero, as opposed to countable sets, which are automatically of Lebesgue measure zero. It can be shown that its Hausdorff dimension is \(\displaystyle {\frac{\ln {2}}{\ln {3}}}\).
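The value \(\ln 2 / \ln 3\) can be made plausible by a covering count (a numerical sketch, not a proof, and not part of the original notes): at level k of the ternary construction, the Cantor set is covered by \(2^{k}\) intervals of common diameter \(3^{-k}\), and the ratio \(\log N_{k} / \log (1/\epsilon _{k})\) recovers \(\ln 2 / \ln 3\) at every level.

```python
# Covering-count sketch for the Cantor ternary set: at level k it is
# covered by N_k = 2**k intervals of diameter eps_k = 3**(-k), and
# log(N_k) / log(1/eps_k) equals ln 2 / ln 3 at every level.

import math

for k in range(1, 20):
    N_k = 2 ** k            # number of covering intervals at level k
    eps = 3.0 ** (-k)       # their common diameter
    dim_est = math.log(N_k) / math.log(1 / eps)
    assert abs(dim_est - math.log(2) / math.log(3)) < 1e-9
```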

If \(X = \mathbb {R}^{n},\ n > 1\), then \(H_{n}\) is not the same as the Lebesgue measure, but is comparable to it, a fact elucidated in the next theorem.

Theorem 2.47

Let \(A \subset \mathbb {R}^{n}\).

  1. 1.

    Then there exist positive constants \(C_{1}\) and \(C_{2}\), depending only on the dimension n, such that

    $$\begin{aligned} C_{1} H_{n} (A) \le \lambda (A) \le C_{2} H_{n} (A), \end{aligned}$$

    for \(A \subset \mathbb {R}^{n},\ \lambda \) being the Lebesgue measure on \(\mathbb {R}^{n}\).

  2. 2.

    If \(\alpha > n\), then \(H_{\alpha } (A) = 0\), for every \(A \subset \mathbb {R}^{n}\).

3 Recurrence and Ergodic Theorems

Let \((X, \mathfrak {M}, \mu )\) be a measure space. A transformation \(T : X \longrightarrow X\) is said to be a measurable transformation (with respect to \(\mu \)) if the inverse image of every \(\mu \)-measurable set is \(\mu \)-measurable. A \(\mu \)-measurable transformation T of X into itself is said to be measure preserving if \(\mu (T^{-1} (E)) = \mu (E)\) for every \(\mu \)-measurable subset E of X.

Example 3.1

  1. 1.

    Let \(X = [0, 1)\) and \(\lambda \) be the Lebesgue measure on X. Let \(c \in X\) be any point. Then the transformation \(T : X \longrightarrow X\) defined by \(T(x) = x + c \pmod {1}\) is measure preserving.

  2. 2.

    Let \(X = [0, 1)\) and \(\lambda \) be the Lebesgue measure on X. Define \(T : X \longrightarrow X\) as

    $$ T(x) = {\left\{ \begin{array}{ll} 2x &{} \text {for}\ 0 \le x< \frac{1}{2} \\ 2x - 1 &{} \text {for}\ \frac{1}{2} \le x < 1. \end{array}\right. } $$

    It can be easily verified that T as defined above is a measure preserving transformation.

  3. 3.

    Let \(a = (a_{1}, a_{2}, \ldots , a_{n}) \in \mathbb {R}^{n}\), where \(\mathbb {R}^{n}\) is equipped with the usual Lebesgue measure. The translation \(T : \mathbb {R}^{n} \longrightarrow \mathbb {R}^{n}\) defined as \(T(x) = x + a\) is invertible and measure preserving.
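For the doubling map of Example 3.1 (2), measure preservation can be checked directly on intervals (a numerical sketch, ours, not part of the original notes): the preimage of [a, b) is the disjoint union \([a/2, b/2) \cup [(a+1)/2, (b+1)/2)\), whose total length is \(b - a\). Note that T halves lengths going forward; it is the measure of the *preimage* that is preserved.

```python
# Check measure preservation for the doubling map T(x) = 2x mod 1 on
# [0, 1): the preimage of [a, b) consists of two disjoint intervals
# whose lengths sum to b - a.

def preimage_length(a, b):
    """Total length of T^{-1}([a, b)) for the doubling map."""
    assert 0 <= a < b <= 1
    pieces = [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]
    return sum(hi - lo for lo, hi in pieces)

for (a, b) in [(0.0, 0.5), (0.25, 0.3), (0.1, 0.9)]:
    assert abs(preimage_length(a, b) - (b - a)) < 1e-12
```

Since the intervals generate the Borel \(\sigma \)-algebra, agreement on intervals is what drives the general statement.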

In the context of ergodic theory, a measure space \((X, \mathfrak {M}, \mu )\) equipped with a measure preserving transformation T constitutes a dynamical system, denoted by \((X, \mathfrak {M}, \mu , T)\).

3.1 Recurrence

In the sequel, we assume that \((X, \mathfrak {M}, \mu )\) is a probability space, i.e., \(\mu (X) = 1\). Given a measure preserving transformation T on a measure space \((X, \mathfrak {M}, \mu )\), T is said to be recurrent if for any given set \(A \subset X\) of positive measure, almost every point of A returns to A under some iterate of T.

Theorem 3.2

(Poincaré recurrence theorem) Let \((X, \mathfrak {M}, \mu )\) be a probability space and \(T : X \longrightarrow X\) be a measure preserving transformation. Given \(A \in \mathfrak {M}\), let \(A_{0}\) be the set of points \(x \in A\) such that \(T^{n}(x) \in A\) for infinitely many \(n \ge 0\). Then \(A_{0} \in \mathfrak {M}\) and \(\mu (A_{0}) = \mu (A)\).

Proof

Let

$$\begin{aligned} C_{n} = \left\{ x \in A\ :\ T^{k} (x) \notin A\ \forall k \ge n \right\} . \end{aligned}$$

Therefore \(A_{0} = A \setminus \bigcup \limits _{n = 1}^{\infty } C_{n}\). In order to prove the theorem, it is enough to show that

  1. 1.

    \(C_{n} \in \mathfrak {M}\) and

  2. 2.

    \(\mu (C_{n}) = 0\) for every \(n \ge 1\).

  1. 1.

    Now, \(C_{n} = A \setminus \bigcup \limits _{k \ge n} T^{-k}(A)\). Since \(T^{-k}(A) \in \mathfrak {M}\) for every \(k \ge 1\), we see that \(C_{n} \in \mathfrak {M}\).

  2. 2.

    Also,

    $$\begin{aligned} C_{n}\subset & {} \bigcup \limits _{k \ge 0} T^{-k}(A) \setminus \bigcup \limits _{k \ge n} T^{-k}(A) \\ \Longrightarrow \ \ \mu (C_{n})\le & {} \mu \left( \bigcup \limits _{k \ge 0} T^{-k}(A) \right) - \mu \left( \bigcup \limits _{k \ge n} T^{-k}(A) \right) . \end{aligned}$$

    Now, observe that \(\bigcup \limits _{k \ge n} T^{-k}(A) = T^{-n} \left( \bigcup \limits _{k \ge 0} T^{-k}(A) \right) \). Since T is measure preserving, this implies

    $$\begin{aligned} \mu \left( \bigcup _{k \ge 0} T^{-k}(A) \right) = \mu \left( \bigcup _{k \ge n} T^{-k}(A) \right) . \end{aligned}$$

    Therefore \(\mu (C_{n}) = 0\).    \(\square \)
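The theorem can be watched in action numerically. The sketch below (ours, not from the notes) takes the irrational rotation \(T(x) = x + c \pmod {1}\) and \(A = [0, 0.1)\), and records the first return time to A of a sample of points of A; every sampled point returns.

```python
import math, random

# Poincaré recurrence for the circle rotation T(x) = x + c (mod 1)
# with c irrational and A = [0, 0.1): every sampled point of A
# is observed to return to A within the allotted iterations.

c = math.sqrt(2) - 1           # an irrational rotation number
in_A = lambda x: 0.0 <= x < 0.1

random.seed(1)
points = [random.uniform(0.0, 0.1) for _ in range(100)]

def first_return(x, max_iter=1000):
    y = x
    for n in range(1, max_iter + 1):
        y = (y + c) % 1.0
        if in_A(y):
            return n
    return None

returns = [first_return(x) for x in points]
assert all(r is not None for r in returns)
print("every sampled point of A returned; max observed return time:", max(returns))
```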

3.2 Birkhoff Ergodic Theorem and the Notion of Ergodicity

Let \((X, \mathfrak {M}, \mu )\) be a probability space and \(T : X \longrightarrow X\) be a measure preserving transformation. Let \(E \in \mathfrak {M}\). Given \(x \in X\), one would like to ask with what frequency do the elements of the set \(\{ x, Tx, T^{2}x, \ldots \}\) lie in the set E?

Clearly \(T^{i}x \in E\) if and only if \(\chi _{E}(T^{i} x) = 1\); therefore the number of elements of \(\{ x, Tx, T^{2} x, \ldots , T^{n - 1} x \}\) lying in E is \(\sum \limits _{k = 0}^{n - 1} \chi _{E} (T^{k} x)\), and the relative frequency with which \(\{ x, Tx, \ldots , T^{n - 1} x \}\) visits E is \(\displaystyle {\frac{1}{n} \sum \limits _{k = 0}^{n - 1} \chi _{E} (T^{k} x)}\).

Around the turn of the twentieth century, the work of Boltzmann and Gibbs on statistical mechanics raised a mathematical problem which can be stated as follows: Given a measure preserving transformation T of a probability space X and an integrable function \(f : X \longrightarrow \mathbb {R}\), find conditions under which

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{f(x) + f(T x) + \cdots + f(T^{n - 1} x)}{n} \end{aligned}$$

exists and is constant almost everywhere.

In 1931 [3], Birkhoff proved that for any T and f, the above limit exists almost everywhere. From this, he concluded that a necessary and sufficient condition for its value to be constant almost everywhere is that there exists no set \(A \in \mathfrak {M}\) such that \(0< \mu (A) < 1\) and \(T^{-1} A = A\). As we will see later, the fact that this limit is constant easily implies that it is equal to the integral of f over X. Transformations T which satisfy this condition are called ergodic, and ergodic theory is essentially the study of such transformations. The Birkhoff Ergodic theorem is the first fundamental result that sets the tone for much of what follows.

Theorem 3.3

(Birkhoff Ergodic Theorem) Let \((X, \mathfrak {M}, \mu )\) be a probability space and \(T : X \longrightarrow X\) be a measure preserving transformation. If \(f \in L^{1} (\mu )\) then the limit

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f(T^{k} (x)) = \widetilde{f}(x), \end{aligned}$$

exists for almost every point \(x \in X,\ \widetilde{f} \in L^{1}(\mu )\) and \(\widetilde{f} \circ T = \widetilde{f}\) almost everywhere. Furthermore,

$$\begin{aligned} \int _{X} \widetilde{f}\, d \mu \ \ =\ \ \int _{X} f\, d \mu . \end{aligned}$$

If f is any measurable function, let \(g(x) = f(T x)\). Since T is measurable, the function g is measurable so that, writing \(g (x) = U f(x)\), the transformation U assigns to each measurable function f, a measurable function g. Clearly, U is linear and g is non-negative if f is so. Moreover, we have:

Theorem 3.4

If \(1 \le p \le \infty \) and \(\Vert f \Vert _{p}\) denotes the \(L^{p}\)-norm of f, then \(\left\| g \right\| _{p} = \left\| f \right\| _{p}\) for \(g = U f\).

Proof

Let \(E \in \mathfrak {M}\) and \(f = \chi _{E}\). Then \(g(x) = U f(x) = \chi _{E}(T x) = \chi _{T^{-1}(E)}(x)\). Therefore,

$$\begin{aligned} \left\| g \right\| _{p}^{p}\ =\ \mu (T^{-1}(E))\ =\ \mu (E)\ =\ \left\| f \right\| _{p}^{p}. \end{aligned}$$

It follows, since a non-negative simple function is a finite linear combination of characteristic functions of disjoint measurable sets and \(T^{-1}\) preserves both disjointness and measure, that \(\left\| U s \right\| _{p} = \left\| s \right\| _{p}\) for every non-negative simple function s. If f is any non-negative measurable function, there exists a sequence of simple, non-negative measurable functions \(\{ s_{n} \}_{n = 1}^{\infty }\) such that \(s_{n} \rightarrow f\), as \(n \rightarrow \infty \), with \(s_{1} \le s_{2} \le \cdots \le f\). Now, since \(t_{n} = U s_{n}\) is also an increasing sequence of simple functions, converging to g, the monotone convergence theorem implies that

$$\begin{aligned} \left\| g \right\| _{p}\ \ =\ \ \lim _{n \rightarrow \infty } \left\| t_{n} \right\| _{p}\ \ =\ \ \lim _{n \rightarrow \infty } \left\| s_{n} \right\| _{p}\ \ =\ \ \left\| f \right\| _{p}. \end{aligned}$$

The general case of f now follows by writing \(f = f^{+} - f^{-}\) and applying the above conclusion to \(f^{+}\) and \(f^{-}\) separately.    \(\square \)

In particular, if \(f \in L^{2} (\mu )\) we have shown that \(g(x) = U f(x) = f(Tx)\) is also in \(L^{2} (\mu )\) and that \(\left\| g \right\| _{2} = \left\| f \right\| _{2}\). In other words, U is an isometric transformation of the Hilbert space \(L^{2} (\mu )\) into itself.

If, in addition, T is invertible (i.e., there exists a measure preserving transformation \(S : X \longrightarrow X\) such that \(S T = T S = Id_{X}\)) and if V is the isometric transformation in \(L^{2}(\mu )\) corresponding to its inverse, then \(U V = V U\) is the identity transformation in \(L^{2}(\mu )\). Therefore, U maps onto the whole of \(L^{2}(\mu )\); in other words, U is a unitary transformation in \(L^{2}(\mu )\) and V is its inverse. Thus, an invertible measure preserving transformation on a measure space \((X, \mu )\) induces an invertible unitary transformation in the Hilbert space \(L^{2}(\mu )\).

Therefore, in so far as it concerns functions \(f \in L^{2}(\mu )\), the existence of the limit of the averages reduces to the existence of the limit, as \(n \rightarrow \infty \), of \(\displaystyle {\frac{1}{n} \sum _{k = 0}^{n - 1} U^{k} f}\), where U is an isometric transformation in the Hilbert space \(L^{2}(\mu )\). Precisely this convergence, known as the mean ergodic theorem, was proved by J. von Neumann in 1932 [13].

Theorem 3.5

(Mean ergodic theorem) If U is an isometric transformation in an arbitrary Hilbert space H and if P is the orthogonal projection on the closed linear subspace of all \(f \in H\) satisfying \(U f = f\), then \(\displaystyle {\frac{1}{n} \sum _{k = 0}^{n - 1} U^{k} f}\) converges in norm as \(n \rightarrow \infty \) to Pf for all \(f \in H\).
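A minimal finite-dimensional sketch of the theorem (our illustration; the theorem itself concerns an arbitrary Hilbert space H): in \(H = \mathbb {R}^{3}\), the cyclic shift of coordinates is unitary, its fixed vectors are the constant ones, and the projection P replaces a vector by its coordinate mean, so the averages \(\frac{1}{n} \sum _{k = 0}^{n - 1} U^{k} f\) should approach Pf.

```python
# Mean ergodic theorem in H = R^3 with U the cyclic shift of
# coordinates (a unitary map): the ergodic averages of any vector f
# converge to its orthogonal projection P f onto the fixed subspace,
# i.e. to the constant vector with the same coordinate mean.

def U(v):                        # cyclic shift: a unitary map on R^3
    return [v[-1]] + v[:-1]

def averages(f, n):
    acc, v = [0.0, 0.0, 0.0], list(f)
    for _ in range(n):
        acc = [a + x for a, x in zip(acc, v)]
        v = U(v)
    return [a / n for a in acc]  # (1/n) sum_{k<n} U^k f

f = [1.0, 2.0, 6.0]
Pf = [sum(f) / 3] * 3            # projection onto the fixed subspace
approx = averages(f, 300)
assert all(abs(a - b) < 1e-9 for a, b in zip(approx, Pf))
print("ergodic averages converge to P f =", Pf)
```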

We will skip a proof of this and prove the more general Birkhoff ergodic theorem (BET, for short). We prove the first part of the BET and prove the more general \(L^{p}\) version of the second part as a corollary. The key step in the proof of BET is itself a useful lemma known as the Maximal ergodic theorem.

Lemma 3.6

(Maximal ergodic theorem) Given \(f \in L^{1}(\mu )\), put

$$\begin{aligned} E(f)\ \ =\ \ \left\{ x\ :\ \max _{n \ge 0} \left( \sum _{k = 0}^{n - 1} f(T^{k} x) \right) > 0 \right\} . \end{aligned}$$

Then \(\int _{E(f)} f\, d \mu \ge 0\).

Proof

Define

$$\begin{aligned} f_{0}:= & {} 0, \\ f_{n}:= & {} f + f \circ T + f \circ T^{2} + \cdots + f \circ T^{n - 1} \\= & {} f + U f + U^{2} f + \cdots + U^{n - 1} f. \end{aligned}$$

Let \(F_{n} = \max \limits _{0 \le k \le n} f_{k}\). Therefore

$$\begin{aligned} E(f) = \bigcup \limits _{n = 1}^{\infty } \left\{ x\ :\ F_{n} (x) > 0 \right\} = \bigcup \limits _{n = 1}^{\infty } E_{n}. \end{aligned}$$

Clearly, \(F_{n} \in L^{1} (\mu )\) and, for \(0 \le k \le n\), we have \(F_{n} \ge f_{k}\). Therefore \(U F_{n} \ge U f_{k}\) because \(U : L^{1} (\mu ) \longrightarrow L^{1} (\mu )\) is a positive linear operator (i.e., \(f \ge 0\) implies \(U f \ge 0\)) and hence,

$$\begin{aligned} U F_{n} + f \ge U f_{k} + f = f_{k + 1}. \end{aligned}$$

In other words,

$$\begin{aligned} U F_{n} + f \ge \max _{1 \le k \le n} f_{k} (x)\ \ =\ \ \max _{0 \le k \le n} f_{k} (x)\ \ =\ \ F_{n} (x)\ \text {when}\ F_{n}(x)\ >\ 0. \end{aligned}$$

That is, \(f \ge F_{n} - U F_{n}\) on \(\{ x\ :\ F_{n} (x) > 0 \} = E_{n}\). Therefore,

$$\begin{aligned} \int _{E_{n}} f\, d \mu\ge & {} \int _{E_{n}} F_{n}\, d \mu - \int _{E_{n}} U F_{n}\, d \mu \\= & {} \int _{X} F_{n}\, d \mu - \int _{E_{n}} U F_{n}\, d \mu \\\ge & {} \int _{X} F_{n}\, d \mu - \int _{X} U F_{n}\, d \mu \\= & {} 0. \end{aligned}$$

The second equality above holds because \(F_{n} = 0\) on \(X \setminus E_{n}\), the third inequality holds because \(F_{n} \ge 0\) implies \(U F_{n} \ge 0\), and the last equality holds because \(\left\| U F_{n} \right\| _{1} = \left\| F_{n} \right\| _{1}\) by Theorem 3.4, both functions being non-negative. Finally, since \(E_{1} \subseteq E_{2} \subseteq \cdots \) increases to E(f), the dominated convergence theorem gives \(\int _{E(f)} f\, d \mu = \lim \limits _{n \rightarrow \infty } \int _{E_{n}} f\, d \mu \ge 0\), and we are done.    \(\square \)
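The inequality \(\int _{E_{n}} f\, d \mu \ge 0\) obtained in the proof can be checked by simulation. The sketch below (ours, with an arbitrarily chosen f and finite horizon N) estimates this integral for the doubling map by Monte Carlo; the estimate stays non-negative up to sampling error even though \(\int _{X} f\, d \mu = - 0.2 < 0\).

```python
import math, random

# Monte Carlo estimate of the integral of f over the finite-horizon
# maximal set E_N = { x : max_{1<=n<=N} sum_{k<n} f(T^k x) > 0 } for
# the doubling map T(x) = 2x mod 1 and f(x) = cos(2*pi*x) - 0.2.

random.seed(2)
f = lambda x: math.cos(2 * math.pi * x) - 0.2
N, samples = 20, 100_000

total = 0.0
for _ in range(samples):
    x = random.random()
    s, smax, y = 0.0, float("-inf"), x
    for _ in range(N):
        s += f(y)
        smax = max(smax, s)
        y = (2 * y) % 1.0
    if smax > 0:                  # x lies in E_N
        total += f(x)

estimate = total / samples        # Monte Carlo estimate of the integral
assert estimate > -0.01           # non-negative up to sampling error
print("estimated integral of f over E_N:", round(estimate, 4))
```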

Corollary 3.7

If \(A \subset E(f),\ A \in \mathfrak {M}\) and \(T^{-1} A = A\), then,

$$\begin{aligned} \int _{A} f\, d \mu \ge 0. \end{aligned}$$

Proof

Since \(T^{-1} A = A\), we see that \(E(f \chi _{A}) = A\). Therefore, the lemma above implies \(0 \le \int _{E (f \chi _{A})} f \chi _{A}\, d \mu = \int _{A} f \chi _{A}\, d \mu = \int _{A} f\, d \mu \).    \(\square \)

Theorem 3.8

Let \((X, \mathfrak {M}, \mu )\) be a probability space and \(T : X \longrightarrow X\) be a measure preserving transformation. If \(f \in L^{1} (\mu )\), then the limit

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \end{aligned}$$

exists for almost every point \(x \in X\).

Proof

For each \(\alpha ,\ \beta \in \mathbb {R}\) with \(\alpha < \beta \), let

$$\begin{aligned} E_{\alpha , \beta }\ \ =\ \ \left\{ x \in X\ :\ \liminf _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f(T^{k} x)< \alpha< \beta < \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f(T^{k} x) \right\} . \end{aligned}$$

Clearly, \(E_{\alpha , \beta } \in \mathfrak {M}\). We will show that \(\mu (E_{\alpha , \beta }) = 0\) for each \(\alpha ,\ \beta \). This would imply that \(\bigcup E_{\alpha , \beta }\), where \(\alpha ,\ \beta \in \mathbb {R}\) such that \(\alpha < \beta \), has measure zero and hence the limit exists almost everywhere.

Put \(\displaystyle {f^{*} (x) = \sup \limits _{n \ge 1} \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x)}\) and \(\displaystyle {f_{*} (x) = \inf \limits _{n \ge 1} \frac{1}{n} \sum _{k = 0}^{n - 1} f(T^{k} x)}\). Therefore,

$$\begin{aligned} E_{\alpha , \beta } \subset \left\{ x\ :\ f^{*} (x)> \beta \right\} = \left\{ x\ :\ (f^{*} - \beta ) (x) > 0 \right\} = E (f - \beta ) \end{aligned}$$

and \(E_{\alpha , \beta } \subset \left\{ x\ :\ f_{*}(x) < \alpha \right\} \).

We first show that \(E_{\alpha , \beta }\) is T-invariant. That is, we show that \(T^{-1} (E_{\alpha , \beta }) = E_{\alpha , \beta }\).

Let \(\displaystyle {a_{n} (x) = \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x)}\). Then, \(\displaystyle {\frac{n + 1}{n} a_{n + 1} (x) - a_{n} (T x) = \frac{f(x)}{n}}\). Therefore,

$$\begin{aligned} \limsup _{n \rightarrow \infty } (a_{n + 1} (x) + \frac{1}{n} a_{n + 1} (x) - a_{n} (T x)) = \limsup _{n \rightarrow \infty } \frac{f(x)}{n}. \end{aligned}$$

This implies that \(\limsup \limits _{n \rightarrow \infty } (a_{n + 1} (x) - a_{n} (T x)) = 0\). That is, \(\limsup \limits _{n \rightarrow \infty } (a_{n + 1} (x)) = \limsup \limits _{n \rightarrow \infty } (a_{n} (T x))\). Similarly, \(\liminf \limits _{n \rightarrow \infty } (a_{n + 1} (x)) = \liminf \limits _{n \rightarrow \infty } (a_{n} (T x))\).

Therefore, \(T^{-1} (E_{\alpha , \beta }) = E_{\alpha , \beta }\).

By Corollary 3.7, we get \(\int _{E_{\alpha , \beta }} (f - \beta )\, d \mu \ge 0\) or \(\int _{E_{\alpha , \beta }} f\, d \mu \ge \beta \mu (E_{\alpha , \beta })\). Now \(E_{\alpha , \beta } \subset \left\{ x\ :\ f_{*} (x) < \alpha \right\} = \left\{ x\ :\ - f_{*}> - \alpha \right\} = \left\{ x\ :\ (- f)^{*} > - \alpha \right\} \).

Therefore, by the maximal ergodic theorem 3.6, \(\int _{E_{\alpha , \beta }} (- f)\, d \mu \ge - \alpha \mu (E_{\alpha , \beta })\) or \(\int _{E_{\alpha , \beta }} f\, d \mu \le \alpha \mu (E_{\alpha , \beta })\). Thus, \(\beta \mu (E_{\alpha , \beta }) \le \int _{E_{\alpha , \beta }} f\, d \mu \le \alpha \mu (E_{\alpha , \beta })\).

But \(\alpha < \beta \). Therefore, the above inequality holds only if \(\mu (E_{\alpha , \beta }) = 0\).    \(\square \)

Corollary 3.9

  1. (i)

    If \(f \in L^{p} (\mu ),\ 1 \le p \le \infty \), the function \(\widetilde{f}\) defined by,

    $$\begin{aligned} \widetilde{f} (x) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \end{aligned}$$

    is in \(L^{p} (\mu )\) and satisfies

    $$\begin{aligned} \lim _{n \rightarrow \infty } \left\| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f \circ T^{k} \right\| _{p} = 0. \end{aligned}$$
  2. (ii)

    \(\widetilde{f}(T x) = \widetilde{f}(x)\).

  3. (iii)

    For \(f \in L^{p}(\mu ),\ \int _{X} \widetilde{f}\, d \mu = \int _{X} f\, d \mu \).

Proof

(i) Since X is a probability space, \(\mu (X) = 1\). Therefore, \(f \in L^{1} (\mu )\) and \(\widetilde{f}(x)\) makes sense. Moreover, \(\left| f \right| \in L^{1} (\mu )\) and \(\displaystyle {\left| \widetilde{f} (x) \right| \le \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| }\) for a.e. x (this limit exists since \(\left| f \right| \in L^{1} (\mu )\)). That is, \(\displaystyle {\left| \widetilde{f}(x) \right| ^{p} \le \lim _{n \rightarrow \infty } \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f(T^{k} x) \right| \right) ^{p}}\). Since \(\left| \widetilde{f} \right| ^{p} \ge 0\),

$$\begin{aligned} \left\| \widetilde{f} \right\| _{p}^{p} = \int _{X} \left| \widetilde{f} \right| ^{p}\, d \mu= & {} \int _{X} \lim _{n \rightarrow \infty } \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f(T^{k} x) \right| \right) ^{p}\, d \mu \\= & {} \int _{X} \liminf _{n \rightarrow \infty } \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| \right) ^{p}\, d \mu \\\le & {} \liminf _{n \rightarrow \infty } \int _{X} \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| \right) ^{p}\, d \mu .\ {(\text {Fatou's Lemma})} \end{aligned}$$

Now

$$\begin{aligned} \int _{X} \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| \right) ^{p}\, d \mu= & {} \left\| \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| \right\| _{p}^{p} \\\le & {} \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left\| f(T^{k} x) \right\| _{p} \right) ^{p} \\= & {} \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left\| f \right\| _{p} \right) ^{p} {(T^{k}\ \text {is measure preserving})} \\= & {} \left\| f \right\| _{p}^{p}. \end{aligned}$$

Therefore

$$\begin{aligned} \left\| \widetilde{f} \right\| _{p}^{p} \le \liminf \limits _{n \rightarrow \infty } \int _{X} \left( \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f(T^{k} x) \right| \right) ^{p}\, d \mu \le \liminf _{n \rightarrow \infty } \left\| f \right\| _{p}^{p} = \left\| f \right\| _{p}^{p} < \infty , \end{aligned}$$

since \(f \in L^{p} (\mu )\). Therefore \(\widetilde{f} \in L^{p} (\mu )\).    \(\square \)

Remark 3.10

(Convergence in the \(L^{p}\)-norm) We now establish the norm convergence asserted in Corollary 3.9 (i). Consider first the case \(f \in L^{\infty }(\mu )\), i.e., \(\left| f(x) \right| \le \left\| f \right\| _{\infty } < \infty \) for a.e. x. Clearly, \(f \in L^{1} (\mu )\) and the sequence of functions

$$\begin{aligned} \left| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right| ^{p} \end{aligned}$$

converges to 0 a.e. Moreover,

$$\begin{aligned} \left| \widetilde{f} (x) \right| \ \le \ \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \left| f (T^{k} x) \right| \ \le \ \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \left\| f \right\| _{\infty } \ =\ \left\| f \right\| _{\infty }\ \ \text {for a.e.}\ x. \end{aligned}$$

Therefore,

$$ \left| \widetilde{f}(x) - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right| ^{p} \le \left| \left\| f \right\| _{\infty } + \frac{1}{n} \sum _{k = 0}^{n - 1} \left\| f \circ T^{k} \right\| _{\infty } \right| ^{p} \le \left( 2 \left\| f \right\| _{\infty } \right) ^{p} = \text {constant}. $$

Hence, by the dominated convergence theorem,

$$\begin{aligned} \int _{X} \left| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right| ^{p}\, d \mu \ \rightarrow \ 0\ \ \text {as}\ \ n \rightarrow \infty . \end{aligned}$$

That is, for \(\displaystyle {f \in L^{\infty } (\mu ),\ \lim _{n \rightarrow \infty } \left\| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f \circ T^{k} \right\| _{p} = 0}\). Now, let \(f \in L^{p} (\mu )\) and let \(\varepsilon > 0\). Since \(L^{\infty } (\mu )\) is dense in \(L^{p} (\mu )\), there is an \(f_{0} \in L^{\infty } (\mu )\) such that \(\left\| f - f_{0} \right\| _{p} \le \varepsilon /3\) and, by the case just treated, there exists an \(N > 0\) such that \(\displaystyle {\left\| \widetilde{f}_{0} - \frac{1}{n} \sum _{k = 0}^{n - 1} f_{0} \circ T^{k} \right\| _{p} \le \varepsilon /3}\) for \(n \ge N\).

Then,

$$\begin{aligned}&\left\| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right\| _{p} \\\le & {} \left\| \widetilde{f} - \widetilde{f}_{0} \right\| _{p}\ +\ \left\| \widetilde{f}_{0} - \frac{1}{n} \sum _{k = 0}^{n - 1} f_{0} (T^{k} x) \right\| _{p}\ +\ \left\| \frac{1}{n} \sum _{k = 0}^{n - 1} (f_{0} - f) (T^{k} x) \right\| _{p}. \end{aligned}$$

Now, \(\widetilde{f} - \widetilde{f}_{0} = \widetilde{f - f_{0}}\) and hence,

$$\begin{aligned} \left\| \widetilde{f} - \widetilde{f}_{0} \right\| _{p}\ =\ \left\| \widetilde{f - f_{0}} \right\| _{p}\ \le \ \left\| f - f_{0} \right\| _{p}\ \le \ \frac{\varepsilon }{3}, \end{aligned}$$

and

$$\begin{aligned} \left\| \frac{1}{n} \sum _{k = 0}^{n - 1} (f_{0} - f) (T^{k} x) \right\| _{p}\ \le \ \frac{1}{n} \sum _{k = 0}^{n - 1} \left\| f_{0} - f \right\| _{p}\ =\ \left\| f_{0} - f \right\| _{p}\ \le \ \frac{\varepsilon }{3}. \end{aligned}$$

Therefore, for \(n \ge N\),

$$\begin{aligned} \left\| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right\| _{p}\ \ <\ \ \varepsilon , \end{aligned}$$

which implies that

$$\begin{aligned} \lim _{n \rightarrow \infty } \left\| \widetilde{f} - \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) \right\| _{p}\ \ =\ \ 0. \end{aligned}$$

We now prove the remainder of the statements in Corollary 3.9.

(ii)

$$\begin{aligned} \widetilde{f}(T x)= & {} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} (T x)) \\= & {} \lim _{n \rightarrow \infty } \left( \frac{1}{n} \sum _{k = 0}^{n} f (T^{k} x) - \frac{1}{n} f(x) \right) \\= & {} \lim _{n \rightarrow \infty } \frac{n + 1}{n} \frac{1}{n + 1} \sum _{k = 0}^{n} f (T^{k} x) - \lim _{n \rightarrow \infty } \frac{1}{n} f(x) \\= & {} \lim _{n \rightarrow \infty } \frac{1}{n + 1} \sum _{k = 0}^{n} f (T^{k} x) \\= & {} \widetilde{f}(x). \end{aligned}$$

(iii) If \(f \in L^{p} (\mu )\), note that by (i), the sequence \(\displaystyle {\frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x)}\) converges to \(\widetilde{f}\) in \(L^{1} (\mu )\). Hence,

$$\begin{aligned} \int _{X} \widetilde{f}\, d \mu = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \int _{X} f (T^{k} x)\, d \mu = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \int _{X} f\, d \mu = \int _{X} f\, d \mu . \end{aligned}$$

   \(\square \)

In the Birkhoff Ergodic Theorem, suppose the limit \(\widetilde{f}(x) = c\) a.e., where c is a constant. Then,

$$\begin{aligned} \int _{X} f\, d \mu = \int _{X} \widetilde{f}\, d \mu = c \mu (X). \end{aligned}$$

That is,

$$\begin{aligned} c = \widetilde{f}(x) = \frac{1}{\mu (X)} \int _{X} f\, d \mu . \end{aligned}$$

In other words, we see that

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x) = \frac{1}{\mu (X)} \int _{X} f\, d \mu . \end{aligned}$$

The left hand side is the time average of f and the right hand side is the space average of f. This is what physicists call the ergodic hypothesis: the equality of the time and space averages of f.
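For the irrational rotation of the circle, which will be seen to be ergodic in Example 3.19, this equality of time and space averages can be observed numerically (our illustration, not part of the notes):

```python
import math

# Time average along one orbit of the irrational rotation
# T(x) = x + c (mod 1) versus the space average of f.

c = (math.sqrt(5) - 1) / 2           # golden rotation number, irrational
f = lambda x: math.cos(2 * math.pi * x) ** 2
space_avg = 0.5                      # integral of cos^2(2 pi x) over [0,1]

x, n, s = 0.3, 10_000, 0.0
y = x
for _ in range(n):
    s += f(y)
    y = (y + c) % 1.0
time_avg = s / n

assert abs(time_avg - space_avg) < 1e-3
print("time average", round(time_avg, 5), "vs space average", space_avg)
```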

Proposition 3.11

Let T be an invertible measure preserving transformation of \(X,\ f \in L^{1} (\mu )\) and let

$$\begin{aligned} f^{+}_{n} (x)\ =\ \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{k} x)\ \ \ \ \text {and}\ \ \ \ f^{-}_{n} (x)\ =\ \frac{1}{n} \sum _{k = 0}^{n - 1} f (T^{-k} x). \end{aligned}$$

Then, \(\widetilde{f}^{+} = \lim \limits _{n \rightarrow \infty } f^{+}_{n}\) and \(\widetilde{f}^{-} = \lim \limits _{n \rightarrow \infty } f^{-}_{n}\) exist and are equal almost everywhere, i.e., \(\widetilde{f}^{+} = \widetilde{f}^{-}\) a.e.

Proof

We first observe that

$$\begin{aligned} f^{+}_{N} \circ T^{-(N - 1)} (x) = \frac{1}{N} \sum _{k = 0}^{N - 1} f (T^{k} (T^{-(N - 1)} x))\ \ =\ \ \frac{1}{N} \sum _{k = 0}^{N - 1} f (T^{-k} x) = f_{N}^{-}(x). \end{aligned}$$

Also, since \(\widetilde{f}^{+}_{N} \circ T = \widetilde{f}^{+}_{N}\) and \(\widetilde{f}^{-}_{N} \circ T^{- 1} = \widetilde{f}^{-}_{N}\), we get \(\widetilde{f}^{+}_{N} \circ T^{- 1} = \widetilde{f}^{+}_{N}\) and hence, \(\widetilde{f}^{+}_{N} \circ T^{- k} = \widetilde{f}^{+}_{N}\) for all \(k \in \mathbb {N}\). Therefore,

$$\begin{aligned} \widetilde{f}^{+}_{N} (x) = \widetilde{f}^{+}_{N} \circ T^{- (N - 1)} (x)= & {} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f^{+}_{N} (T^{k} (T^{- (N - 1)} x)) \\= & {} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f^{+}_{N} \circ T^{- (N - 1)} (T^{k} x) \\= & {} \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} f^{-}_{N} (T^{k}x ) \\= & {} \widetilde{f}^{-}_{N} (x). \end{aligned}$$

Hence \(\widetilde{f}^{+} = \lim _{n \rightarrow \infty } \widetilde{f}^{+}_{n} = \lim \limits _{n \rightarrow \infty } \widetilde{f}^{-}_{n} = \widetilde{f}^{-}\) (this holds because, \(f_{n} \rightarrow f\) implies \(\widetilde{f}_{n} \rightarrow \widetilde{f}\)).    \(\square \)
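A quick numerical check of the proposition (ours, not from the notes) for the invertible irrational rotation \(T(x) = x + c \pmod {1}\): the forward and backward averages agree and both tend to \(\int f\, d \lambda \).

```python
import math

# Forward and backward ergodic averages for the invertible rotation
# T(x) = x + c (mod 1): both should converge to the same limit,
# here the space average of f, which equals 1.

c = math.sqrt(3) - 1                 # irrational rotation number
f = lambda x: math.sin(2 * math.pi * x) + 1.0

def avg(x, n, step):
    s, y = 0.0, x
    for _ in range(n):
        s += f(y)
        y = (y + step) % 1.0
    return s / n

x, n = 0.2, 50_000
fp = avg(x, n, c)                    # f_n^+ uses T
fm = avg(x, n, -c)                   # f_n^- uses T^{-1}
assert abs(fp - fm) < 1e-3
assert abs(fp - 1.0) < 1e-3          # both near the space average 1
print("forward and backward averages:", round(fp, 5), round(fm, 5))
```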

Definition 3.12

  1. 1.

    A measurable flow in a measure space \((X, \mathfrak {M}, \mu )\) is a map \(\tau : X \times \mathbb {R} \longrightarrow X\) that satisfies the following two conditions:

    1. (a)

      \(\tau \) is measurable with respect to the product measure \(\mu \times \lambda \) on \(X \times \mathbb {R}\) and the measure \(\mu \) on X. Here, \(\lambda \) is the Lebesgue measure on \(\mathbb {R}\).

    2. (b)

      For \(t \in \mathbb {R}\), the maps \(\tau _{t} (x) := \tau (x, t)\) form a one-parameter group of transformations of X to itself with \(\tau _{0} = \) identity on X and \(\tau _{t + s} = \tau _{t} \circ \tau _{s}\) for \(t, s \in \mathbb {R}\).

  2. 2.

    A measurable flow \(\tau _{t}\) is measure preserving or is \(\mu \)-invariant if \(\mu (\tau _{t} A) = \mu (A)\) for every \(t \in \mathbb {R}\) and every \(A \in \mathfrak {M}\).

Remark 3.13

If \(\tau _{t}\) is a measure preserving flow on a finite measure space \((X, \mathfrak {M}, \mu )\) and if \(f \in L^{1} (\mu )\), then the limits

$$\begin{aligned} f^{+} = \lim _{T \rightarrow \infty } \frac{1}{T} \int _{0}^{T} f (\tau _{t} x)\, dt\ \ \ \text {and}\ \ \ f^{-} = \lim _{T \rightarrow \infty } \frac{1}{T} \int _{0}^{T} f (\tau _{- t} x)\, dt \end{aligned}$$

exist and are equal for \(\mu \)-a.e. x.

Proof

Let \(F (x) = \int _{0}^{1} f (\tau _{t} x)\, dt\). Since f and \(\tau \) are measurable, \((x, t) \mapsto f(\tau _{t} x)\) is measurable and, by Fubini's theorem, F is \(\mu \)-measurable and lies in \(L^{1} (\mu )\) since \(f \in L^{1} (\mu )\).

Now

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } \frac{1}{n} \int _{0}^{n} f (\tau _{t} x)\, dt = \lim \limits _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} F (\tau _{1}^{k} (x)) \end{aligned}$$

(where \(\tau _{1} : X \longrightarrow X\) is the time-one map \(\tau _{1} (x) = \tau (x, 1)\)) exists for \(\mu \)-a.e. x by the Birkhoff ergodic theorem.

Let

$$\begin{aligned} \widetilde{f}(x) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} F( \tau _{1}^{k} (x) ) = \lim _{n \rightarrow \infty } \frac{1}{n} \int _{0}^{n} f(\tau _{t} x)\, dt. \end{aligned}$$

If \(t > 0\) and \(n \in \mathbb {N} \cup \{ 0 \}\) is such that \(n \le t < n + 1\), then

$$\begin{aligned} \left| \int _{0}^{t} f (\tau _{s} x)\, ds - \int _{0}^{n} f (\tau _{s} x)\, ds \right|= & {} \left| \int _{n}^{t} f (\tau _{s} x)\, ds \right| \\\le & {} \int _{n}^{t} \left| f(\tau _{s} x) \right| \, ds \\\le & {} \int _{n}^{n + 1} \left| f(\tau _{s} x) \right| \, ds \\= & {} \int _{0}^{1} \left| f(\tau _{s} (\tau _{1}^{n} x)) \right| \, ds\ \ =\ \ G (\tau _{1}^{n} x), \end{aligned}$$

where \(G (x) = \int _{0}^{1} \left| f (\tau _{s} x) \right| \, ds\) belongs to \(L^{1} (\mu )\), by the same argument as for F.

By the Birkhoff ergodic theorem applied to G, the averages \(\frac{1}{n} \sum _{k = 0}^{n - 1} G (\tau _{1}^{k} x)\) converge for \(\mu \)-a.e. x, and comparing consecutive averages gives \(\frac{1}{n}\, G (\tau _{1}^{n} x) \rightarrow 0\) as \(n \rightarrow \infty \), for \(\mu \)-a.e. x. Therefore,

$$\begin{aligned} \frac{1}{t} \left| \int _{n}^{t} f (\tau _{s} x)\, ds \right| \ \ \le \ \ \frac{1}{n}\, G (\tau _{1}^{n} x)\ \rightarrow \ 0\ \ \text {as}\ \ n \rightarrow \infty . \end{aligned}$$

Since \(n \rightarrow \infty \) as \(t \rightarrow \infty \), it follows that

$$\begin{aligned} \frac{1}{t} \int _{n}^{t} f (\tau _{s} x)\, ds \rightarrow 0\ \ \text {as}\ \ t \rightarrow \infty , \end{aligned}$$

and hence

$$\begin{aligned} \frac{1}{t} \int _{0}^{t} f (\tau _{s} x)\, ds \rightarrow \widetilde{f} (x)\ \ \text {as}\ \ t \rightarrow \infty . \end{aligned}$$

Applying the same argument to the backward averages and invoking the preceding Proposition 3.11 for the invertible time-one map \(\tau _{1}\), the remark follows.    \(\square \)

Definition 3.14

  1. 1.

    Let \((X, \mathfrak {M}, \mu )\) be a probability space. If \(A \in \mathfrak {M}\) and T is a measure preserving transformation of X, then A is said to be T-invariant if \(\mu (T^{-1} A \Delta A) = 0\). A is said to be strictly T-invariant if \(T^{- 1} A = A\).

  2. 2.

    A measurable function \(f : X {\longrightarrow } \mathbb {R}\) is T-invariant if \(\mu \left( \! \left\{ x\ :\ f(T x) {\ne } f(x) \!\right\} \right) = 0\). f is strictly T-invariant if \(f(T x) = f(x)\) for all x.

The next two observations seek to bridge the divide between T-invariant and strictly T-invariant sets (or functions).

Lemma 3.15

  1. 1.

    If \(A \in \mathfrak {M}\) is a T-invariant set, then there is a strictly T-invariant set \(A_{\infty }\) such that \(\mu \left( A_{\infty } \right) = \mu (A)\).

  2. 2.

    If f is a T-invariant function, then there is a strictly T-invariant function \(\bar{f}\) such that \(\bar{f} (x) = f (x)\) a.e.

Proof

  1. 1.

    Let

    $$\begin{aligned} A_{\infty }\ \ =\ \ \bigcap _{n = 0}^{\infty } \bigcup _{i = n}^{\infty } T^{- i} A. \end{aligned}$$

    It is easy to check that \(A_{\infty } \in \mathfrak {M},\ T^{- 1} A_{\infty } = A_{\infty }\) and \(\mu \left( A_{\infty } \right) = \mu (A)\).

  2. 2.

    Let

    $$\begin{aligned} A_{f}\ \ =\ \ \left\{ x\ :\ f (T^{k} x) = f(x)\ \text {for some}\ k \in \mathbb {N} \right\} . \end{aligned}$$

    Clearly, \(A_{f}\) has measure 1, since the set \(\left\{ x\ :\ f (T x) = f (x) \right\} \) is contained in \(A_{f}\). Let

    $$ \bar{f} (x)\ \ =\ \ {\left\{ \begin{array}{ll} f(y) &{} \text {if}\ y = T^{k} (x) \in A_{f}\ \text {for some}\ k \in \mathbb {N} \\ 0 &{} \text {otherwise}. \end{array}\right. } $$

    It is easy to see that \(\bar{f}\) is well-defined, strictly T-invariant and \(\bar{f} = f\) a.e.    \(\square \)

Let us find out the conditions under which the limit \(\widetilde{f}(x)\) in the ergodic theorem is constant a.e. for every \(f \in L^{1}(\mu )\).

Suppose \(\widetilde{f}\) is constant a.e. for every \(f \in L^{1}(\mu )\). Let \(A \in \mathfrak {M}\) be a strictly T-invariant set and let \(\chi _{A}\) be the characteristic function of A.

The ergodic theorem for \(\chi _{A}\) implies \(\int _{X} \widetilde{\chi }_{A}\, d \mu = \int _{X} \chi _{A}\, d \mu = \mu (A)\). Now

$$\begin{aligned} \widetilde{\chi }_{A} (x) = \lim _{n \rightarrow \infty } \frac{1}{n} \sum _{k = 0}^{n - 1} \chi _{A} (T^{k} x). \end{aligned}$$

Since \(A = T^{- 1} A\), we have \(T x \in A\) if and only if \(x \in T^{- 1} A = A\); more generally, \(T^{k} x \in A\) if and only if \(x \in T^{- k} A = A\) for \(k \in \mathbb {N}\). Therefore,

$$ \widetilde{\chi }_{A} (x) = {\left\{ \begin{array}{ll} 1 &{} \text {if}\ x \in A \\ 0 &{} \text {if}\ x \notin A. \end{array}\right. } $$

By assumption, \(\widetilde{\chi }_{A}\) is constant a.e. Therefore, either \(\widetilde{\chi }_{A} = 0\) a.e. or \(\widetilde{\chi }_{A} = 1\) a.e. This implies \(\mu (A) = 0\) or 1. That is, every T-invariant set has measure either 0 or 1.

Conversely, suppose that if \(A \in \mathfrak {M}\) is T-invariant then \(\mu (A) = 0\) or 1. Let \(f \in L^{1} (\mu )\) and let \(\widetilde{f} (x)\) be the limit as in the ergodic theorem. By the ergodic theorem, \(\widetilde{f} \circ T = \widetilde{f}\) a.e. on X.

Let

$$\begin{aligned} A(k, n)\ \ =\ \ \left\{ x\ :\ \frac{k}{2^{n}} \le \widetilde{f} (x) < \frac{k + 1}{2^{n}} \right\} \ \ \text {for}\ \ k \in \mathbb {Z},\ n \in \mathbb {N}. \end{aligned}$$

Now

$$\begin{aligned} T^{- 1} (A(k, n)) \Delta A(k, n) \subset \left\{ x\ :\ \widetilde{f} \circ T (x) \ne \widetilde{f} (x) \right\} . \end{aligned}$$

Therefore,

$$\begin{aligned} \mu (T^{- 1} (A(k, n)) \Delta A(k, n)) = 0 \end{aligned}$$

and hence, A(k, n) is a T-invariant set and therefore \(\mu (A(k, n)) = 0\) or 1.

Now, for a fixed \(n \in \mathbb {N},\ \bigcup \limits _{k \in \mathbb {Z}} A(k, n) = X\) is a disjoint union. Therefore, for each \(n \in \mathbb {N}\), there exists a unique \(k_{n} \in \mathbb {Z}\) such that \(\mu (A(k_{n}, n)) = 1\).

Let \(Y = \bigcap \limits _{n = 1}^{\infty } A(k_{n}, n)\). Then \(\mu (Y) = 1\) (because \(\mu (Y^{c}) = 0\), being a countable union of null sets). On Y, the value of \(\widetilde{f}\) is pinned down to within \(2^{-n}\) for every n, so \(\widetilde{f}\) is constant on Y and hence constant a.e. on X.

Definition 3.16

A measure preserving transformation \(T : X \longrightarrow X\), where \((X, \mathfrak {M}, \mu )\) is a probability measure space, is said to be ergodic if for every set \(A \in \mathfrak {M}\) which is T-invariant, one has \(\mu (A) = 0\) or 1.

Indeed we have shown that a measure preserving transformation T is ergodic if and only if every T-invariant function f is constant a.e. on X.

Proposition 3.17

Let X be a second countable topological space and \(\mu \) a Borel probability measure on X such that every non-empty open subset of X has positive measure. If \(T : X \longrightarrow X\) is an ergodic measure preserving transformation, then

$$\begin{aligned} \mu \left( \left\{ x\ :\ \left\{ T^{n} x\ :\ n \ge 0 \right\} \ \text {is dense in}\ X \right\} \right) = 1. \end{aligned}$$

That is, almost all points in X have dense orbits.

Proof

Let \(\left\{ U_{n} \right\} _{n = 1}^{\infty }\) be a countable basis for the topology of X. Let

$$\begin{aligned} Y\ \ =\ \ \left\{ x\ :\ \left\{ T^{n} x\ :\ n \ge 0 \right\} \ \text {is dense in}\ X \right\} . \end{aligned}$$

Clearly \(x \notin Y\) if and only if there is a basic open set \(U_{k}\) such that \(x \in \bigcap \limits _{n = 0}^{\infty } (X \setminus T^{- n} (U_{k})) =: P_{k}\). It is easy to see that \(P_{k} \subset T^{- 1} (P_{k})\). Since T is measure preserving and \(P_{k} \in \mathfrak {M},\ \mu (T^{- 1} P_{k}) = \mu (P_{k})\). Therefore, \(T^{- 1} P_{k} \equiv P_{k} \pmod {0}\) and hence, \(P_{k}\) is T-invariant, so that \(\mu (P_{k}) = 0\) or 1 by ergodicity. Now \(U_{k} \cap P_{k} = \varnothing \) and \(\mu (U_{k}) > 0\), so \(\mu (P_{k}) \ne 1\) and hence \(\mu (P_{k}) = 0\). Since \(Y^{c} = \bigcup \limits _{k} P_{k}\) is a countable union of null sets, \(\mu (Y) = 1\); that is, almost every point of X has a dense T-orbit.    \(\square \)
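The proposition is illustrated numerically below (our sketch, not from the notes): a single orbit of an irrational rotation visits every interval of a uniform partition of [0, 1).

```python
import math

# A single orbit { T^n x } of the irrational rotation T(x) = x + c
# (mod 1) visits every cell of a uniform partition of [0, 1) into
# 50 bins, consistent with almost every orbit being dense.

c = (math.sqrt(5) - 1) / 2       # golden rotation number
x, bins, visited = 0.123, 50, set()
y = x
for _ in range(1000):
    visited.add(int(y * bins))   # index of the bin containing T^n x
    y = (y + c) % 1.0

assert visited == set(range(bins))
print("orbit visited all", bins, "bins of the partition")
```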

Example 3.18

Let \(X = [0, 1)\) be equipped with the Lebesgue measure. For \(c \in \mathbb {R}\), define the map \(T_{c} : X \longrightarrow X\) by

$$\begin{aligned} T_{c} (x)\ =\ x + c \pmod {1}\ =\ \left\{ x + c \right\} ,\ \ \text {the fractional part of}\ x + c. \end{aligned}$$

It is clear that \(T_{c}\) preserves the Lebesgue measure, and it is easy to see that if \(c = p/q \in \mathbb {Q}\) in lowest terms, then \(T_{c}^{q} = \mathrm{Id}\) and every orbit is finite with exactly q points. Moreover, \(f(x) = e^{2 \pi i q x}\) is then a non-constant \(T_{c}\)-invariant function. Therefore, \(T_{c}\) is not ergodic when c is rational.
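A quick numerical check of the rational case (a sketch; the value \(c = 3/8\) and the starting point are arbitrary choices): every orbit returns after q steps, and \(f(x) = e^{2 \pi i q x}\) is a non-constant invariant function, witnessing non-ergodicity.

```python
import cmath

# Sketch: for rational c = p/q (in lowest terms) the rotation T_c is
# q-periodic, and f(x) = exp(2*pi*i*q*x) is a non-constant T_c-invariant
# function, so T_c is not ergodic.
p, q = 3, 8
c = p / q
T = lambda x: (x + c) % 1.0

x = 0.1
for _ in range(q):
    x = T(x)
period_q = abs(x - 0.1) < 1e-12               # T^q(x) == x

f = lambda x: cmath.exp(2j * cmath.pi * q * x)
invariant = abs(f(T(0.1)) - f(0.1)) < 1e-12   # f o T = f
```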

Example 3.19

If X is the circle \(\mathbb {S} = \left\{ \! z {\in } \mathbb {C}\ :\ \left| z \right| {=} 1 \right\} \) with the normalised Lebesgue measure, then \(T : \mathbb {S} \longrightarrow \mathbb {S}\) defined as \(T(z) = a z\), where \(a \in \mathbb {S}\), is measure preserving, as can be easily verified. Then T is ergodic if and only if a is not a root of unity. For, suppose a is a root of unity, i.e., \(a^{p} = 1\) for some integer \(p \ne 0\), and consider \(f(z) = z^{p}\). Clearly \(f \circ T = f\), but f is not constant a.e. Therefore, T is not ergodic.

Conversely, suppose a is not a root of unity, let \(f \in L^{2} (\mathbb {S})\) be T-invariant, and let \(f(z) = \sum \limits _{n = - \infty }^{\infty } b_{n} z^{n}\) be its Fourier expansion. Now, \(f \circ T = f\) implies \(\sum \limits _{n = - \infty }^{\infty } b_{n} a^{n} z^{n} = \sum \limits _{n = - \infty }^{\infty } b_{n} z^{n}\). Hence, \(b_{n} (a^{n} - 1) = 0\) for every n. As \(a^{n} \ne 1\) for any \(n \ne 0\), we must have \(b_{n} = 0\) for all \(n \ne 0\). Consequently, it follows that f is constant a.e. and that T is ergodic. Alternatively, writing \(a = e^{2 \pi i c}\), T is ergodic if and only if c is irrational.
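When a is not a root of unity, ergodicity manifests numerically through Birkhoff averages: along the orbit of any point, the time average of a continuous function approaches its space average. A sketch (the choices of c, the test function \(f(z) = \mathrm{Re}(z)\) and the starting point are ours):

```python
import cmath, math

# Sketch: for a = e^{2 pi i c} with c irrational, Birkhoff averages of
# f(z) = Re(z) along an orbit of T(z) = a z approach the space average 0.
c = math.sqrt(2) - 1
a = cmath.exp(2j * math.pi * c)

z = cmath.exp(0.7j)     # an arbitrary point on the unit circle
N = 100000
s = 0.0
for _ in range(N):
    s += z.real
    z = a * z
birkhoff_avg = s / N    # close to the mean of Re(z) over the circle, i.e. 0
```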

4 Geodesic Flows on Closed Surfaces

Let M be a compact or, more generally, a complete, smooth manifold endowed with a Riemannian metric g, and let SM denote the associated unit tangent bundle. That is,

$$\begin{aligned} SM\ \ =\ \ \left\{ (x, v)\ :\ x \in M,\ v\ \text {is a unit tangent vector to}\ M\ \text {at}\ x \right\} . \end{aligned}$$

For each \(t \in \mathbb {R}\), consider the transformation \(\phi ^{t} : SM \longrightarrow SM\) defined as follows: Given \((x, v) \in SM\), let \(\gamma _{v}\) be the unique geodesic in M passing through the point \(x \in M\) with v as its tangent vector at x. Since M is complete, \(\gamma _{v}\) is defined on all of \(\mathbb {R}\); moreover, by the Hopf-Rinow theorem, any two points \(p, q \in M\) are joined by a geodesic that realises the distance between them. Now set

$$\begin{aligned} \phi ^{t} (x, v)\ \ =\ \ (\gamma _{v} (t), \gamma '_{v} (t)). \end{aligned}$$
(2)

It is easy to verify that \(\phi ^{t}\) as defined above for all \(t \in \mathbb {R}\) constitutes a 1-parameter group of transformations, called the geodesic flow, and satisfies the following properties:

  1. 1.

    \(\phi ^{t} \circ \phi ^{s} = \phi ^{t + s} = \phi ^{s + t} = \phi ^{s} \circ \phi ^{t}\) and \(\phi ^{0} = \mathrm{Id}|_{SM}\).

  2. 2.

    \(\phi ^{t}\) is measure preserving, where the measure under consideration is the Liouville measure, given locally by the product of the Riemannian volume form on M (i.e., \(\sqrt{ \mathrm{det} (g_{i j}) }\; d x_{1} \wedge \cdots \wedge d x_{n}\)), also called the Riemannian measure, and the usual Lebesgue measure on the unit sphere of tangent directions.

It would be illuminating to look at a simple example of the geodesic flow.

Example 4.1

Suppose \(M = \mathbb {S}^2\), the unit 2-sphere; then M admits a metric of constant positive curvature. All of its geodesics are great circles, so every orbit of the geodesic flow is periodic. The flow is therefore not ergodic: the union of the orbits emanating from a small open set of unit vectors is an invariant set of measure strictly between 0 and 1.

Following up on the previous example, the question of ergodicity of the geodesic flow on closed surfaces of constant negative curvature is treated in the sequel.

By the Gauss-Bonnet theorem, a compact surface admitting a Riemannian metric of constant negative curvature must have negative Euler characteristic, i.e., genus \(\ge 2\); conversely, the uniformisation theorem guarantees that every compact Riemann surface of genus \(\ge 2\) admits such a metric.

We shall initially see how to define such a metric on these surfaces. The universal cover of the surface is, in fact, the upper half plane \(\mathbb {H}^{2}\), where \(\mathbb {H}^{2} = \left\{ z \in \mathbb {C}\ :\ \mathrm{Im} (z) > 0 \right\} \), equipped with the metric \(\displaystyle {ds = \frac{\sqrt{dx^{2} + dy^{2}}}{y}}\), which is a metric of constant negative curvature, called the hyperbolic metric. Therefore, we first discuss the geometry of the upper half plane.

4.1 Isometries and Geodesics of \(\mathbb {H}^{2}\)

Let \(\gamma : I \longrightarrow \mathbb {H}^{2}\) be a piecewise differentiable path parametrised as

$$\begin{aligned} \gamma (t)\ \ =\ \ \left\{ z(t)\ =\ x(t) + i y(t) \in \mathbb {H}^{2}\ :\ t \in I \right\} ,\ \ \text {where}\ \ I = [0, 1]. \end{aligned}$$

Then, the hyperbolic length \(l(\gamma )\) of the path is given by

$$\begin{aligned} l(\gamma )\ \ =\ \ \int \limits _{0}^{1} \frac{\sqrt{(\frac{dx}{dt})^{2} + (\frac{dy}{dt})^{2}}}{y(t)}\, d t\ \ =\ \ \int \limits _{0}^{1} \frac{\left| \frac{dz}{dt} \right| }{y(t)}\, d t. \end{aligned}$$
(3)

The hyperbolic distance \(\rho _{h} (z, w)\) between any two points \(z, w \in \mathbb {H}^{2}\) is given as \(\rho _{h} (z, w) = \inf l(\gamma )\), where the infimum is taken over all piecewise differentiable paths \(\gamma \) joining z and w in \(\mathbb {H}^{2}\).
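Formula (3) lends itself to direct numerical evaluation. The following sketch (the discretisation scheme and the particular path are our own choices) approximates the hyperbolic length of the vertical segment from i to ie, for which the integral can be computed exactly: \(\int _{0}^{1} \frac{e - 1}{1 + t(e - 1)}\, dt = \ln e = 1\).

```python
import math

# Sketch: numerical evaluation of the hyperbolic length (3) of a
# parametrised path z(t), t in [0, 1], by a simple midpoint scheme.
def hyperbolic_length(z, n=20000):
    total = 0.0
    for k in range(n):
        z0, z1 = z(k / n), z((k + 1) / n)
        # chord length |dz| divided by the midpoint value of y
        total += abs(z1 - z0) / ((z0.imag + z1.imag) / 2)
    return total

# Vertical segment from i to i*e; the exact length is ln(e) = 1.
e = math.e
seg = lambda t: complex(0.0, 1.0 + t * (e - 1.0))
ell = hyperbolic_length(seg)
```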

A natural question is to look at the isometries of \(\mathbb {H}^{2}\); i.e., transformations on \(\mathbb {H}^{2}\) preserving the hyperbolic distance \(\rho _{h}\) defined above. This leads us to a particular group of matrices denoted as \(\mathrm{PSL}(2, \mathbb {R})\).

In order to place the elements in \(\mathrm{PSL}(2, \mathbb {R})\), we first look at the group of matrices \(\mathrm{SL}(2, \mathbb {R})\) consisting of all \(2 \times 2\) real matrices of the form

$$\begin{aligned} g\ \ =\ \ \begin{pmatrix} a &{} b \\ c &{} d \end{pmatrix}\ \ \text {where}\ \ \mathrm{det} (g) = 1. \end{aligned}$$
(4)

Quite clearly, the above group of matrices corresponds to the group of all fractional linear transformations of \(\mathbb {C}\) onto itself of the form

$$\begin{aligned} \bigg \{ z \longmapsto \frac{a z + b}{c z + d}\ :\ a d - b c = 1;\; a, b, c, d\, \in \mathbb {R} \bigg \} \end{aligned}$$

with the product of two such transformations being equivalent to the product of two corresponding matrices in \(\mathrm{SL}(2, \mathbb {R})\) and the inverse of a given transformation corresponding to the inverse matrix.

However, the correspondence is not one-to-one; rather, each fractional linear transformation is represented by the pair of matrices \(\pm g \). Ergo, the group of all fractional linear transformations, henceforth identified with \(\mathrm{PSL}(2, \mathbb {R})\), is isomorphic to \(\mathrm{SL}(2, \mathbb {R})/ \{\pm I\}\), where I is the \(2 \times 2\) identity matrix. The identity transformation in \(\mathrm{PSL}(2, \mathbb {R})\) will be denoted by Id.
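The correspondence just described can be checked numerically (a sketch with arbitrarily chosen determinant-one matrices): composing two fractional linear transformations matches multiplying the corresponding matrices, and \(\pm g\) induce the same transformation.

```python
# Sketch: composition of fractional linear transformations corresponds to
# matrix multiplication in SL(2, R), and the matrices g and -g induce the
# same transformation.
def mobius(g, z):
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)

def matmul(g, h):
    (a, b), (c, d) = g
    (p, q), (r, s) = h
    return ((a * p + b * r, a * q + b * s), (c * p + d * r, c * q + d * s))

A = ((2.0, 1.0), (1.0, 1.0))          # det = 1
B = ((1.0, 3.0), (0.0, 1.0))          # det = 1
negA = ((-2.0, -1.0), (-1.0, -1.0))   # -A
z = 1.0 + 2.0j

composition_matches = abs(mobius(A, mobius(B, z)) - mobius(matmul(A, B), z)) < 1e-12
negation_matches = abs(mobius(A, z) - mobius(negA, z)) < 1e-12
```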

Remark 4.2

Note that \(\mathrm{PSL}(2, \mathbb {R})\) contains all fractional linear transformations of the form \(z \longmapsto \frac{a z + b}{c z + d} \), where \(a d - b c = \Delta > 0\), as dividing the numerator and denominator by \(\sqrt{\Delta }\) gives a matrix of determinant 1 inducing the same transformation on \(\mathbb {H}^{2}\). In particular, \(\mathrm{PSL}(2, \mathbb {R})\) contains transformations of the form \(z \longmapsto a z + b,\ a, b \in \mathbb {R},\ a > 0\) and those of the form \(\displaystyle {z \longmapsto \frac{-1}{z}}\).

Remark 4.3

\(\mathrm{PSL}(2, \mathbb {R})\) acts on \(\mathbb {H}^{2}\) by homeomorphisms. In fact, \(\mathrm{PSL}(2, \mathbb {R}) \subset \mathrm{Isom} (\mathbb {H}^{2})\), the group of all isometries of \(\mathbb {H}^{2}\) (i.e., transformations of \(\mathbb {H}^{2}\) onto itself preserving the hyperbolic distance on \(\mathbb {H}^{2}\)).

Proof

Firstly, any transformation of the form \(\displaystyle {z \longmapsto \frac{a z + b}{c z + d}}\) on \(\mathbb {C}\) maps \(\mathbb {H}^{2}\) onto itself. Given any \(T \in \mathrm{PSL}(2, \mathbb {R})\), let \(\displaystyle {w = T(z) = \frac{a z + b}{c z + d}}\). Then,

$$\begin{aligned} w\ \ =\ \ \frac{(a z + b)(c \overline{z} + d)}{\left| c z + d \right| ^{2}}\ \ =\ \ \frac{a c \left| z \right| ^{2} + a d z + b c \overline{z} + b d}{\left| c z + d \right| ^{2}}. \end{aligned}$$

Hence, the imaginary part \(\mathrm{Im}(w)\) of w is,

$$\begin{aligned} \mathrm{Im}(w)\ \ =\ \ \frac{w - \overline{w}}{2i}\ \ =\ \ \frac{(a d - b c)(z - \overline{z})}{2 i \left| c z + d \right| ^{2}}\ \ =\ \ \frac{\mathrm{Im}(z)}{\left| c z + d \right| ^{2}}, \end{aligned}$$

using \(a d - b c = 1\).

Therefore, \(\mathrm{Im}(z)> 0 \iff \mathrm{Im}(w) > 0\). As T is continuous and its inverse exists, we conclude that T is a homeomorphism of \(\mathbb {H}^{2}\) onto itself.

To show that \(T \in \mathrm{PSL}(2, \mathbb {R})\) is an isometry of \(\mathbb {H}^{2}\) onto itself, we show that if \(\gamma : I \longrightarrow \mathbb {H}^{2}\) is a piecewise differentiable path in \(\mathbb {H}^{2}\), then \(l \left( T(\gamma ) \right) = l (\gamma )\). Therefore, suppose \(\gamma := z(t) = x(t) + i y(t)\), and \(T(\gamma )\) is given by \(w(t) = T (z(t)) = u(t) + i v(t)\). Now

$$\begin{aligned} \frac{dw}{dz}\ \ =\ \ \frac{a(c z + d) - c(a z + b)}{(c z + d)^{2}}\ \ =\ \ \frac{1}{(c z + d)^{2}}. \end{aligned}$$

Since \(\displaystyle {v = \frac{y}{\left| c z + d \right| ^{2}}}\), we have \(\displaystyle {\left| \frac{dw}{dz} \right| = \frac{v}{y}}\). Therefore,

$$\begin{aligned} l (T(\gamma ))= & {} \int \limits _{0}^{1} \frac{\left| \frac{dw}{dt} \right| }{v(t)}\, d t\ \ = \ \ \int \limits _{0}^{1} \frac{\left| \frac{dw}{dz} \frac{dz}{dt} \right| }{v(t)}\, d t \\= & {} \int \limits _{0}^{1} \frac{\left| \frac{dw}{dz} \right| \left| \frac{dz}{dt} \right| }{v(t)}\, d t\ \ =\ \ \int \limits _{0}^{1} \frac{\left| \frac{dz}{dt} \right| }{y(t)}\, d t\ \ =\ \ l(\gamma ). \end{aligned}$$

   \(\square \)
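The isometry property can also be tested numerically. The sketch below relies on the classical closed form for the hyperbolic distance on the upper half plane, \(\rho _{h} (z, w) = \cosh ^{-1} \left( 1 + \frac{\left| z - w \right| ^{2}}{2\, \mathrm{Im}(z)\, \mathrm{Im}(w)} \right) \) (a standard identity not derived in these notes), and on an arbitrarily chosen element of \(\mathrm{PSL}(2, \mathbb {R})\).

```python
import math

# Sketch: an element of PSL(2, R) preserves the hyperbolic distance.
# We use the classical closed form (stated, not derived, here):
#   rho(z, w) = arccosh(1 + |z - w|^2 / (2 Im(z) Im(w))).
def rho(z, w):
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

T = lambda z: (2 * z + 1) / (z + 1)   # a = 2, b = c = d = 1, det = 1

z, w = 1j, 2 + 3j
distance_preserved = abs(rho(T(z), T(w)) - rho(z, w)) < 1e-12
```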

It is a fact that isometries take geodesics to geodesics and hence any transformation in \(\mathrm{PSL}(2, \mathbb {R})\) maps geodesics to geodesics. We now determine the geodesics on the hyperbolic plane.

Theorem 4.4

The geodesics in \(\mathbb {H}^{2}\) are semicircles and straight lines orthogonal to the real axis.

Proof

Let \(z_{1}, z_{2} \in \mathbb {H}^{2}\). First suppose \(z_{1} = i a\) and \(z_{2} = i b\) with \(b > a\) which are two points on the imaginary axis. If \(\gamma : [0, 1] \longrightarrow \mathbb {H}^{2}\) is any path joining ia to ib, with \(\gamma (t) = x(t) + i y(t)\), then

$$\begin{aligned} l(\gamma ) = \int \limits _{0}^{1} \frac{\sqrt{(\frac{dx}{dt})^{2} + (\frac{dy}{dt})^{2}}}{y(t)}\, d t \ge \int \limits _{0}^{1} \frac{\left| \frac{dy}{dt} \right| }{y(t)}\, d t \ge \int \limits _{a}^{b} \frac{dy}{y} = \ln {\frac{b}{a}}. \end{aligned}$$

It is easy to verify that equality in the above expression is realised by the segment of the imaginary axis joining ia to ib, whose hyperbolic length is exactly \(\displaystyle {\ln {\frac{b}{a}}}\); hence the geodesic joining the points ia and ib is the segment of the imaginary axis between them.

If \(z_{1}, z_{2} \in \mathbb {H}^{2}\) are arbitrary, let L be the unique Euclidean semicircle or straight line orthogonal to the real axis passing through \(z_{1}\) and \(z_{2}\); we claim there is a transformation in \(\mathrm{PSL}(2, \mathbb {R})\) which maps L onto the imaginary axis. Suppose L is a semicircle meeting the real axis at the points a and b (a vertical line is handled simply by a translation). The transformation \(\displaystyle {T(z) = \frac{- 1}{z - a}}\) takes a to \(\infty \) and b to \(\displaystyle {\frac{- 1}{b - a}\ (< 0)}\), and the transformation \(\displaystyle {S(z) = z + \frac{1}{b - a} = z + c}\), where \(\displaystyle {c = \frac{1}{b - a}}\), takes \(\infty \) to \(\infty \) and \(- c\) to 0. Thus,

$$\begin{aligned} S \circ T = \begin{pmatrix} 1 &{} c \\ 0 &{} 1 \end{pmatrix}\ \begin{pmatrix} 0 &{} - 1 \\ 1 &{} - a \end{pmatrix}\ =\ \begin{pmatrix} c &{} - 1 - a c \\ 1 &{} - a \end{pmatrix} \end{aligned}$$

is the transformation in \(\mathrm{PSL}(2, \mathbb {R})\) that takes (a, b) to \((\infty , 0)\), and hence maps L onto the imaginary axis. Since each element of \(\mathrm{PSL}(2, \mathbb {R})\) is an isometry of \(\mathbb {H}^{2}\) and segments of the imaginary axis are geodesics, we conclude that the geodesic joining \(z_{1}\) and \(z_{2}\) is the segment of L joining them.    \(\square \)
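As a numerical sanity check (the feet \(a = 1,\ b = 3\) of the semicircle are arbitrary choices), one may verify that the map \(z \longmapsto \frac{c z - (1 + a c)}{z - a}\) with \(c = \frac{1}{b - a}\), whose matrix has determinant 1, sends a to \(\infty \), b to 0, and points of the semicircle with feet a and b onto the imaginary axis.

```python
# Sketch: the map z -> (c z - (1 + a c)) / (z - a), c = 1/(b - a), has
# determinant c*(-a) + (1 + a*c) = 1 and carries the semicircle with feet
# a and b onto the imaginary axis, sending (a, b) to (infinity, 0).
a, b = 1.0, 3.0
c = 1.0 / (b - a)
ST = lambda z: (c * z - (1 + a * c)) / (z - a)

b_to_zero = abs(ST(b)) < 1e-12
# The top of the semicircle (centre (a+b)/2, radius (b-a)/2) lands on the
# imaginary axis:
top = complex((a + b) / 2, (b - a) / 2)
on_imaginary_axis = abs(ST(top).real) < 1e-12
```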

Since \(\mathrm{PSL}(2, \mathbb {R})\) acts by isometries on \(\mathbb {H}^{2}\), it acts on the unit tangent bundle \(S\mathbb {H}^{2}\) as

$$\begin{aligned} g (z, \zeta ) = \left( g(z), D_{z} g (\zeta ) \right) = \left( g(z), \frac{\zeta }{(c z + d)^{2}} \right) , \end{aligned}$$

where \(z \in \mathbb {H}^{2},\ \zeta \in T_{z} \mathbb {H}^{2}\) such that \(\left\| \zeta \right\| = 1\) and \(g = \begin{pmatrix} a &{} b \\ c &{} d \end{pmatrix} \in \mathrm{PSL}(2, \mathbb {R})\).

Lemma 4.5

The action of \(\mathrm{PSL}(2, \mathbb {R})\) on \(S \mathbb {H}^{2}\) is transitive and free, i.e., all isotropy groups are trivial.

Proof

Let \(z_{0} = i\) and \(\zeta _{0}\) be the unit tangent vector at \(z_{0}\) pointing in the positive direction of the imaginary axis. Let \((z, \zeta ) \in S \mathbb {H}^{2}\) and \(\sigma \) be the positive imaginary half axis starting from \(z_{0}\). Let L be the unique geodesic determined by \((z, \zeta )\). Let \(g \in \mathrm{PSL}(2, \mathbb {R})\) be the transformation taking \(\sigma \) to L, i.e., \(g (\sigma ) = L\), with \(g(z_{0}) = z\). Since transformations of \(\mathrm{PSL}(2, \mathbb {R})\) have positive determinant, they preserve orientation and hence the condition that \(D_{z_{0}} g (\zeta _{0}) = \zeta \) forces g to be unique; we will, therefore, denote it by \(g_{z \zeta }\).    \(\square \)

Remark 4.6

In the above lemma, taking \((z, \zeta ) \in S \mathbb {H}^{2}\) to \(g_{z \zeta } \in \mathrm{PSL}(2, \mathbb {R})\), sets up a bijection F between \(S \mathbb {H}^{2}\) and \(\mathrm{PSL}(2, \mathbb {R})\), and is easily seen to be a diffeomorphism.

Let \(z_{0} = i\) and \(\zeta _{0}\) be as in the proof of Lemma 4.5. Given an arbitrary \((z, \zeta ) \in S \mathbb {H}^{2}\), let \(g_{z \zeta }\) be the unique element of \(\mathrm{PSL}(2, \mathbb {R})\) (which exists by virtue of the lemma) that takes \((z_{0}, \zeta _{0})\) to \((z, \zeta )\) in \(S \mathbb {H}^{2}\). The uniqueness of the element \(g_{z \zeta }\) shows that the diffeomorphism F intertwines the action of \(\mathrm{PSL}(2, \mathbb {R})\) on \(S \mathbb {H}^{2}\) with the left multiplication in the group. That is,

$$\begin{aligned} F \left( g (z, \zeta ) \right) = g \cdot g_{z \zeta }\ \ \forall g \in \mathrm{PSL}(2, \mathbb {R}). \end{aligned}$$

Proposition 4.7

The geodesic flow on \(S \mathbb {H}^{2}\) corresponds to the flow on the group \(\mathrm{PSL}(2, \mathbb {R})\) given by the right translation

$$\begin{aligned} g \longmapsto g \cdot g_{t},\ \ \text {where}\ \ g_{t} = \begin{pmatrix} e^{\frac{t}{2}} &{} 0 \\ 0 &{} e^{\frac{- t}{2}} \end{pmatrix}\ \ \forall t \in \mathbb {R}. \end{aligned}$$

Proof

It is clear that \(\phi ^{t} (z_{0}, \zeta _{0}) = g_{t} (z_{0}, \zeta _{0})\), where \(\phi ^{t}\) is the geodesic flow. Therefore, for \((z, \zeta ) \in S \mathbb {H}^{2}\),

$$\begin{aligned} \phi ^{t} (z, \zeta ) = \phi ^{t} \left( g_{z \zeta } (z_{0}, \zeta _{0}) \right) = g_{z \zeta } \left( \phi ^{t} (z_{0}, \zeta _{0}) \right) = g_{z \zeta } \left( g_{t} (z_{0}, \zeta _{0}) \right) = g_{z \zeta } g_{t}. \end{aligned}$$

The second equality is a result of the fact that the action of \(\mathrm{PSL}(2, \mathbb {R})\) on \(\mathbb {H}^{2}\) is by isometries, and hence takes geodesics to geodesics as described in the proof of Lemma 4.5.    \(\square \)
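A small sketch (the time \(t = 0.8\) is an arbitrary choice) confirming that the Möbius action of \(g_{t}\) moves the base point up the imaginary axis at unit hyperbolic speed, i.e., \(g_{t} (i) = i e^{t}\), and the hyperbolic distance from i is \(\ln (e^{t}/1) = t\).

```python
import math

# Sketch: the Mobius action of g_t = diag(e^{t/2}, e^{-t/2}) sends i to
# i e^t, moving the base point up the imaginary axis; the hyperbolic
# distance ln(e^t / 1) = t confirms unit speed.
def g_t(t, z):
    return (math.exp(t / 2) * z) / math.exp(-t / 2)

t = 0.8
z = g_t(t, 1j)
moves_up_axis = abs(z - 1j * math.exp(t)) < 1e-12
unit_speed = abs(math.log(z.imag) - t) < 1e-12   # distance ln(y/1) equals t
```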

Let \(\Sigma \) be a compact Riemann surface of genus \(g \ge 2\). Then \(\Sigma \) has \(\mathbb {H}^{2}\) as its universal cover, i.e., if \(\Gamma = \pi _{1} (\Sigma )\), the fundamental group of \(\Sigma \), then \(\Gamma \) acts freely and properly discontinuously on \(\mathbb {H}^{2}\) by deck transformations. Consequently, \(\Gamma \) can be identified with a discrete subgroup of \(\mathrm{PSL}(2, \mathbb {R})\) such that the quotient space \(\Sigma = \mathbb {H}^{2}/ \Gamma \) is compact. Further, \(\Sigma \) is a Riemannian manifold with constant negative curvature \(- 1\) with respect to the metric induced from \(\mathbb {H}^{2}\) via the quotient map. The pictures on this page roughly serve to illustrate this procedure.


Proposition 4.8

The identification of \(S\mathbb {H}^{2}\) with \(\mathrm{PSL}(2, \mathbb {R})\) induces an identification \(S \left( \mathbb {H}^{2}/\Gamma \right) \cong \Gamma \!\setminus \!\mathrm{PSL}(2, \mathbb {R})\). The geodesic flow on \(S \Sigma \) corresponds to the flow

$$\begin{aligned} \Gamma \setminus \mathrm{PSL}(2, \mathbb {R}) \longrightarrow \Gamma \!\setminus \!\mathrm{PSL}(2, \mathbb {R}),\ \ \ \ \Gamma g \longmapsto \Gamma gg_t, \end{aligned}$$

where \(g_{t} = \begin{pmatrix} e^{\frac{t}{2}} &{} 0 \\ 0 &{} e^{\frac{- t}{2}} \end{pmatrix}\).

Proof

Since \((z, \zeta ) \longmapsto g_{z \zeta }\) intertwines the action of \(\mathrm{PSL}(2, \mathbb {R})\), the proof follows from the previous proposition and is left as an exercise to the reader.    \(\square \)

4.2 Hopf’s Proof of Ergodicity

In this section, we sketch a proof of the ergodicity of the geodesic flow \(g_{t}\) on \(\Gamma \!\setminus \!\mathrm{PSL}(2, \mathbb {R})\) that was originally presented by E. Hopf [9]. In this context, we introduce the notion of horocycles, the simplest examples of which are the lines parallel to the x-axis in \(\mathbb {H}^{2}\). As we shall soon discover, horocycles have a very special role in the study of the dynamics of the geodesic flow.

Lines parallel to the x-axis can also be viewed as orbits of points in \(\mathbb {H}^{2}\) under the action of the 1-parameter subgroup of \(\mathrm{PSL}(2, \mathbb {R})\) consisting of matrices of the form \(H_{s}^{+} = \begin{pmatrix} 1 &{} s \\ 0 &{} 1 \end{pmatrix}\); that is, transformations of the form \(z \longmapsto z + s\). These lines are orthogonal to the lines parallel to the y-axis in \(\mathbb {H}^{2}\), and it turns out that their images under a typical element of \(\mathrm {PSL}(2, \mathbb {R})\) taking \(\infty \) to a point \(x_{0}\) on the x-axis are the Euclidean circles in \(\mathbb {H}^{2}\) tangent to the x-axis at the point \(x_{0}\).

Moving a step further, and using the identification of \(\mathrm {PSL}(2, \mathbb {R})\) with \(S\mathbb {H}^{2}\), we see that the 1-parameter subgroup \(H^{+}_{s}\), of \(\mathrm {PSL}(2, \mathbb {R})\), defines a measure preserving flow on \(S\mathbb {H}^{2}\). In a similar fashion, we observe that the 1-parameter subgroup \(H_{r}^{-} = \begin{pmatrix} 1 &{} 0 \\ r &{} 1 \end{pmatrix}\) of \(\mathrm{PSL}(2, \mathbb {R})\) also defines a measure preserving flow on \(S \mathbb {H}^{2}\). The flow \(H^{+}_{s}\) is termed the stable horocycle flow while \(H_{r}^{-}\) is termed the unstable horocycle flow.

The next figure serves to illustrate the orbits of a vector \(v \in S \mathbb {H}^{2}\) under the dynamics of the two horocycle flows, in relation to the geodesic flow.

Fig. 1

Geodesic and horocycle flows

The two horocycle flows determine vector fields on \(S\mathbb {H}^{2}\) which are linearly independent, i.e., at any given point of \(S \mathbb {H}^{2}\), the tangent vectors of the corresponding vector fields are linearly independent and hence, together with the tangent vector given by the geodesic flow vector field, span the tangent space to \(S \mathbb {H}^{2}\) at that point.

4.2.1 A Historical Interlude

Eberhard Hopf exploited the interrelation between the stable and unstable horocycle flows and the geodesic flow in his proof. Historically, it was G.A. Hedlund [7] who, in 1934, first proved that the geodesic flow on closed surfaces of constant negative curvature is ergodic (ergodicity was called metric transitivity at that time). In 1936, E. Hopf gave another proof of ergodicity in the case considered by Hedlund. Hedlund was also the first to recognise the importance of the close relationship between horocycle and geodesic flows. Later, in 1939, Hedlund proved [8] stronger properties (like mixing) for the geodesic flow on surfaces of finite area and constant negative curvature. Ergodicity was extended to arbitrary dimensions for manifolds of constant negative curvature by Hopf in 1939. In the same paper [9], Hopf also proved that the geodesic flow is ergodic for a surface of finite area and of variable negative curvature under the restriction that the curvature and its first derivatives are bounded in absolute value (Fig. 1).

Gelfand and Fomin, in 1952 [5], provided the next impetus by proving the stronger property of mixing for the case of manifolds of higher dimension and constant negative curvature. Their approach was generalised by Mautner in 1957 [11] to prove ergodicity of the geodesic flow on locally symmetric spaces of negative curvature and arbitrary dimension.

However, the question remained open in the case of variable curvature in arbitrary dimension until the 1960s, when the work of Anosov and Sinai [2] led Anosov to prove ergodicity for closed manifolds of negative curvature and arbitrary dimension [1]. The approach adopted in the work of Anosov and Sinai enabled Anosov to overcome the difficulty faced by Hopf, and Anosov proved ergodicity for manifolds of finite volume and variable negative curvature under exactly the same hypothesis considered by Hopf in 1939 [9], namely that the covariant derivative of the curvature tensor is bounded in absolute value.

Remark 4.9

For manifolds of finite volume and variable negative curvature without the boundedness assumption on the first derivatives of curvature, to the best of our knowledge, the question of ergodicity is still an outstanding open problem (even for surfaces!).

Resuming the sketch of Hopf’s proof, let \(f : S \Sigma \longrightarrow \mathbb {R}\) be a continuous function with compact support, where \(\Sigma \) is a surface of genus \(g \ge 2\) with the hyperbolic metric. Note that, as a consequence of Theorem 2.39, it suffices to consider continuous functions with compact support. We will show that the time average \(\widetilde{f}\) of f under the flow \(g_{t}\) is constant a.e., which yields the ergodicity of \(g_{t}\).

For the three smooth flows \(g_{t}, H_{s}^{+}\) and \(H_{r}^{-}\) on \(\mathrm{PSL}(2, \mathbb {R})\), a routine computation shows that

$$\begin{aligned} H^{+}_{s} g_{t}\ =\ g_{t} H^{+}_{e^{- t} s}\ \ \text {and}\ \ H^{-}_{r} g_{- t}\ =\ g_{- t} H^{-}_{e^{- t} r}. \end{aligned}$$

From this, it follows that

$$\begin{aligned} f(x H^{+}_{s} g_{t})\ =\ f(x g_{t} H^{+}_{e^{- t} s})\ \ \text {and}\ \ f(x H^{-}_{r} g_{- t})\ =\ f(x g_{- t} H^{-}_{e^{- t} r}). \end{aligned}$$

Uniform continuity of f then implies that

$$\begin{aligned} \lim \limits _{t \rightarrow \infty } \left( f( x H^{+}_{s} g_{t} ) - f( x g_{t} ) \right) \ \ =\ \ \lim \limits _{t \rightarrow \infty } \left( f( x g_{t} H^{+}_{e^{- t} s}) - f( x g_{t}) \right) \ \ =\ \ 0 \end{aligned}$$

and

$$\begin{aligned} \lim \limits _{t \rightarrow \infty } \left( f( x H^{-}_{r} g_{- t} ) - f( x g_{- t}) \right) \ \ =\ \ \lim \limits _{t \rightarrow \infty } \left( f( x g_{- t} H^{-}_{e^{- t} r} ) - f( x g_{- t} ) \right) \ \ =\ \ 0. \end{aligned}$$

Therefore,

$$\begin{aligned} \lim \limits _{\tau \rightarrow \infty } \frac{1}{\tau } \int \limits _{0}^{\tau } \left( f( x g_{t} ) - f( x H_{s}^{+} g_{t} ) \right) \, d t\ \ =\ \ 0. \end{aligned}$$

Similarly,

$$\begin{aligned} \lim \limits _{\tau \rightarrow \infty } \frac{1}{\tau } \int \limits _{0}^{\tau } \left( f( x g_{- t} ) - f( x H_{r}^{-} g_{- t} ) \right) \, d t\ \ =\ \ 0. \end{aligned}$$
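The commutation relations underlying the two limits above, \(H^{+}_{s} g_{t} = g_{t} H^{+}_{e^{-t} s}\) and \(H^{-}_{r} g_{-t} = g_{-t} H^{-}_{e^{-t} r}\), can be verified by direct matrix multiplication; the parameter values in this sketch are arbitrary.

```python
import math

# Sketch: direct matrix verification of the contraction relations
#   H+_s g_t  = g_t  H+_{e^{-t} s}   and   H-_r g_{-t} = g_{-t} H-_{e^{-t} r}.
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B):
    return all(abs(A[i][j] - B[i][j]) < 1e-12 for i in range(2) for j in range(2))

def Hp(s): return [[1.0, s], [0.0, 1.0]]                      # stable horocycle
def Hm(r): return [[1.0, 0.0], [r, 1.0]]                      # unstable horocycle
def g(t):  return [[math.exp(t / 2), 0.0], [0.0, math.exp(-t / 2)]]

s, r, t = 0.7, -0.3, 1.1
stable_ok   = close(mul(Hp(s), g(t)),  mul(g(t),  Hp(math.exp(-t) * s)))
unstable_ok = close(mul(Hm(r), g(-t)), mul(g(-t), Hm(math.exp(-t) * r)))
```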

With the notation introduced in an earlier remark in this chapter, we note that \(\widetilde{f}^{+} ( x H_{s}^{+} )\) and \(\widetilde{f}^{-} ( x H_{r}^{-} )\) exist whenever \(\widetilde{f}^{+} (x)\) and \(\widetilde{f}^{-} (x)\) exist. Further, we conclude from the above that \(\widetilde{f}^{+} (x) = \widetilde{f}^{+} ( x H_{s}^{+} )\) and \(\widetilde{f}^{-} (x) = \widetilde{f}^{-} ( x H_{r}^{-} )\); moreover, \(\widetilde{f}^{+}\) and \(\widetilde{f}^{-}\) are equal a.e., the forward and backward time averages of f coinciding a.e. by the Birkhoff ergodic theorem.

Let \(x_{0} \in S\Sigma \). We will construct an open neighbourhood of \(x_{0}\) as follows. Let \(\delta _{1}, \delta _{2}, \delta _{3} > 0\) be sufficiently small. Construct a smooth curve \(\gamma _{\delta _{1}} (x_{0})\) through \(x_{0}\) by defining

$$\begin{aligned} \gamma _{\delta _{1}} (x_{0})\ \ =\ \ \left\{ x_{0} H_{r}^{-}\ :\ \left| r \right| < \delta _{1} \right\} \end{aligned}$$

and then construct an open smooth surface \(\sigma _{\delta _{1}, \delta _{2}} (x_{0})\) by defining

$$\begin{aligned} \sigma _{\delta _{1}, \delta _{2}} (x_{0})= & {} \left\{ x_{0} H_{r}^{-} g_{t}\ :\ \left| r \right|< \delta _{1}, \left| t \right|< \delta _{2} \right\} \\= & {} \bigcup \limits _{\left| t \right| < \delta _{2}} \left( \gamma _{\delta _{1}} (x_{0}) \right) g_{t}. \end{aligned}$$

Finally, construct an open neighbourhood \(U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\) by

$$\begin{aligned} U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\ \ =\ \ \bigcup \limits _{\left| s \right| < \delta _{3}} \left( \sigma _{\delta _{1}, \delta _{2}} (x_{0}) \right) H_{s}^{+}. \end{aligned}$$

It follows from the smoothness of the corresponding vector fields that for sufficiently small \(\delta _{1}, \delta _{2}, \delta _{3}\), the surfaces \(\left( \sigma _{\delta _{1}, \delta _{2}} (x_{0}) \right) H_{s}^{+}\) are disjoint for distinct s with \(\left| s \right| < \delta _{3}\) and for the point

$$\begin{aligned} x\ \ =\ \ x_{0} H_{r}^{-} g_{t} H_{s}^{+}\ \ \in \ \ U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0}), \end{aligned}$$

the numbers (r, t, s) are smooth coordinates in \(U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\). In fact, as \(x_{0}\) varies over a compact set in \(S \Sigma \), all of \(\delta _{1}, \delta _{2}, \delta _{3}\) can be chosen to be independent of \(x_{0}\). Now, the Liouville measure on \(S \Sigma \) induces conditional measures on each of the surfaces \(\left( \sigma _{\delta _{1}, \delta _{2}} (x_{0}) \right) H_{s}^{+}\), for all s, and invoking Fubini’s theorem shows that for a.e. \(y \in \sigma _{\delta _{1}, \delta _{2}} (x_{0})\) (with respect to the induced conditional measure), one has \(\widetilde{f}^{+} (y) = \widetilde{f}^{-} (y)\); and this holds for a.e. \(x_{0}\) in \(S \Sigma \) (with respect to \(\mu \)).

We will now show that \(\widetilde{f} (x)\) is constant for \(x (= x_{0} H_{r}^{-} g_{t} H_{s}^{+})\) a.e. in \(U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\). To this end, let

$$\begin{aligned} \widetilde{U}= & {} \bigg \{ x \in U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\ :\ \widetilde{f}^{+} (x)\ \text {exists and} \\&\qquad \qquad \qquad \qquad \qquad \text {for}\ y = x_{0} H_{r}^{-} g_{t} \in \sigma _{\delta _{1}, \delta _{2}} (x_{0}),\ \widetilde{f}^{+} (y) = \widetilde{f}^{-} (y) \bigg \}. \end{aligned}$$

Since the vector fields are smooth, it follows from Fubini’s theorem that \(\widetilde{U}\) has full measure in \(U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\). Further, if \(x_{1}, x_{2} \in \widetilde{U}\), with \(x_{1} = x_{0} H_{r_{1}}^{-} g_{t_{1}} H_{s_{1}}^{+}\) and \(x_{2} = x_{0} H_{r_{2}}^{-} g_{t_{2}} H_{s_{2}}^{+}\), and if \(y_{1}, y_{2}, z_{1}, z_{2}\) denote \(x_{0} H_{r_{1}}^{-} g_{t_{1}},\ x_{0} H_{r_{2}}^{-} g_{t_{2}},\ x_{0} H_{r_{1}}^{-}\) and \(x_{0} H_{r_{2}}^{-}\) respectively, then we have,

$$\begin{aligned} \widetilde{f}^{+} (x_{1})= & {} \widetilde{f}^{+} (y_{1})\ \ =\ \ \widetilde{f}^{-} (y_{1})\ \ =\ \ \widetilde{f}^{-} (z_{1}) \\= & {} \widetilde{f}^{-} (z_{2})\ \ =\ \ \widetilde{f}^{-} (y_{2})\ \ =\ \ \widetilde{f}^{+} (y_{2})\ \ =\ \ \widetilde{f}^{+} (x_{2}). \end{aligned}$$

Thus \(\widetilde{f}^{+}\) is constant in \(\widetilde{U}\), i.e., \(\widetilde{f}^{+}\) is constant a.e. in \(U_{\delta _{1}, \delta _{2}, \delta _{3}} (x_{0})\). Covering \(S \Sigma \) by countably many such neighbourhoods and using the connectedness of \(S \Sigma \), we conclude that \(\widetilde{f}^{+}\) is constant a.e. on \(S \Sigma \), which proves the ergodicity of \(g_{t}\).