6.1 Introduction

This chapter is one of three on the ergodicity of the Weil–Petersson (WP) geodesic flow. The first of these is a summary and outline of the work by Burns–Masur–Wilkinson, and the second one describes in depth the implementation of the Hopf argument on which that work relies. This chapter covers some of the aspects related to moduli spaces of Riemann surfaces (and Teichmüller theory) in the proofs of the ergodicity of WP flow [BMW] (see also Theorem 1 below) and the recent results of Burns, Masur, Wilkinson and the author [BMMW] on the rates of mixing of WP flow (see also Theorem 2 below).

6.1.1 An Overview of the Dynamics of WP Flow

Before giving precise definitions of the terms introduced above (e.g., moduli spaces of Riemann surfaces, Weil–Petersson geodesic flow, etc.), let us list and compare some properties of the WP flow and its close cousin the Teichmüller (geodesic) flow (see [Zo]) in order to get a flavor of their dynamical behaviors.

 

|     | Teichmüller flow | WP flow |
| --- | --- | --- |
| (a) | Comes from a Finsler metric | Comes from a Riemannian metric |
| (b) | Complete | Incomplete |
| (c) | Is part of an \(SL(2, \mathbb{R})\)-action | Is not part of an \(SL(2, \mathbb{R})\)-action |
| (d) | Non-uniformly hyperbolic | Singular hyperbolic |
| (e) | Related to flat geometry of Riemann surfaces | Related to hyperbolic geometry of Riemann surfaces |
| (f) | Transitive | Transitive |
| (g) | Periodic orbits are dense | Periodic orbits are dense |
| (h) | Finite topological entropy | Infinite topological entropy |
| (i) | Ergodic for the Liouville measure \(\mu_{T}\) | Ergodic for the Liouville measure \(\mu_{WP}\) |
| (j) | Metric entropy \(0 < h(\mu_{T}) < \infty\) | Metric entropy \(0 < h(\mu_{WP}) < \infty\) |
| (k) | Exponential rate of mixing | Mixing at most polynomial (in general) |

Let us make some comments on both the common features and the significant differences between the Teichmüller and WP flows highlighted in the items above.

The Teichmüller flow is associated to a Finsler metric (i.e., a continuous family of norms) on the fibers of the cotangent bundle of the moduli spaces (see footnote 1), while the WP flow is associated to a Riemannian (and, actually, Kähler) metric called the Weil–Petersson (WP) metric. In particular, item (a) says that the WP flow comes from a metric that is smoother than the metric generating the Teichmüller flow. We will come back to this point later when defining the WP metric.

On the other hand, item (b) says that the dynamics of the WP flow is not so nice because it is incomplete, that is, there are certain WP geodesics that “go to infinity” in finite time. In particular, the WP flow is not defined for all time \(t \in \mathbb{R}\) when we start from certain initial data. We will make more comments on this later. Nevertheless, Wolpert [Wo03] showed that the WP flow is defined for all time \(t \in \mathbb{R}\) for almost every initial datum with respect to the Liouville (volume) measure induced by the WP metric, and, thus, the WP flow is a legitimate flow from the point of view of Ergodic Theory.

Item (c) says that the WP flow is less algebraic than the Teichmüller flow because the former is not part of an \(SL(2, \mathbb{R})\)-action while the latter corresponds to the diagonal subgroup \(g_{t} = \text{diag}(e^{t}, e^{-t})\) of \(SL(2, \mathbb{R})\) acting (in a natural way) on the unit cotangent bundle of the moduli spaces of Riemann surfaces. Here, it is worth mentioning that the mere fact that the Teichmüller flow is part of an \(SL(2, \mathbb{R})\)-action makes its dynamics very rich: for instance, once one shows that the Teichmüller flow is ergodic (with respect to some \(SL(2, \mathbb{R})\)-invariant probability measure), it is possible to apply the Howe–Moore Theorem (or variants of it) to improve ergodicity into mixing (and, actually, exponential mixing) of the Teichmüller flow (see, e.g., [AG] and [AGY] for more details).

Item (d) says that the WP and Teichmüller flows (morally) are nonuniformly hyperbolic in the sense of Pesin theory [Pe2], but they are so for distinct reasons. The nonuniform hyperbolicity of the Teichmüller flow was shown by Veech [Ve] (for “volume”/Masur–Veech measure) and Forni [Fo] (for arbitrary invariant probability measures) and it follows from uniform estimates for the derivative of the Teichmüller flow on compact sets. On the other hand, the nonuniform hyperbolicity of the WP flow requires a slightly different argument because some sectional curvatures of the WP metric approach \(-\infty\) or 0 at certain places near the “boundary” of the moduli spaces. We will return to this point later.

Item (e) partly explains the interest of several authors in the Teichmüller and WP flows. Indeed, since their introduction by Bernhard Riemann in 1851 (in his PhD thesis), the study of Riemann surfaces and their moduli spaces became an important topic of research in both Mathematics and Physics (for reasons whose explanations are beyond the scope of these notes). In particular, the fact that the properties of the Teichmüller and WP flows on moduli spaces allow one to recover geometrical information about Riemann surfaces motivated part of the literature on the dynamics of these flows. Concerning applications of these flows to the investigation of Riemann surfaces, it is natural to study the Teichmüller flow whenever one is interested in the properties of flat metrics with conical singularities on Riemann surfaces (cf. Zorich’s survey [Zo]), while it is more natural to study the WP metric/flow whenever one is interested in the properties of hyperbolic metrics on Riemann surfaces: for instance, Wolpert [Wo08] showed that the hyperbolic length of a closed geodesic in a fixed free homotopy class is a convex function along orbits of the WP flow, Mirzakhani [Mi08] proved that the growth of the hyperbolic lengths of simple geodesics on hyperbolic surfaces is related to the WP volume of the moduli space, and, after the works of Bridgeman [Bri2010], McMullen [McM08] and more recently Bridgeman–Canary–Labourie–Sambarino [BCLS] (among other authors), we know that the Weil–Petersson metric is intimately related to thermodynamical invariants (entropy, pressure, etc.) of the geodesic flow on hyperbolic surfaces.

Concerning items (f) to (h), Pollicott–Weiss–Wolpert [PWW10] showed the transitivity and denseness of periodic orbits of the WP flow in the particular case of the unit cotangent bundle of the moduli space \(\mathcal{M}_{1,1}\) (of once-punctured tori). In general, the transitivity, the denseness of periodic orbits and the infinitude of the topological entropy of the WP flow on the unit cotangent bundle of the moduli space \(\mathcal{M}_{g,n}\) of genus g Riemann surfaces with n marked points (for any g ≥ 1, n ≥ 1) were shown by Brock–Masur–Minsky [BMM10]. Moreover, Hamenstädt [Ham] proved the ergodic version of the denseness of periodic orbits, i.e., the denseness of the subset of ergodic probability measures supported on periodic orbits in the set of all ergodic WP flow invariant probability measures.

The ergodicity of WP flow (mentioned in item (i)) was first studied by Pollicott–Weiss [PW09] in the particular case of the unit cotangent bundle \(T^{1}\mathcal{M}_{1,1}\) of the moduli space \(\mathcal{M}_{1,1}\) of once-punctured tori: they showed that if the first two derivatives of the WP flow on \(T^{1}\mathcal{M}_{1,1}\) are suitably bounded, then this flow is ergodic. More recently, Burns–Masur–Wilkinson [BMW] were able to control in general the first derivatives of WP flow and they used their estimates to show the following theorem:

Theorem 1 (Burns–Masur–Wilkinson)

The WP flow on the unit cotangent bundle \(T^{1}\mathcal{M}_{g,n}\) of the moduli space \(\mathcal{M}_{g,n}\) of Riemann surfaces of genus g with n marked points is ergodic with respect to the Liouville measure \(\mu_{WP}\) of the WP metric whenever 3g − 3 + n ≥ 1. Actually, it is Bernoulli (i.e., it is measurably isomorphic to a Bernoulli shift) and, a fortiori, mixing. Furthermore, its measure-theoretic entropy \(h(\mu_{WP})\) is positive and finite.

The Teichmüller-theoretical aspects of this theorem will occupy the next two sections of this text. For now, we will just try to describe the general lines of Burns–Masur–Wilkinson arguments in Sect. 6.1.2 below.

However, before passing to this topic, let us make some comments about item (k) above on the rate of mixing of Teichmüller and WP flows.

Generally speaking, it is expected that the rate of mixing of a system (diffeomorphism or flow) displaying a “reasonable” amount of hyperbolicity is exponential: for example, the property of exponential rate of mixing was shown by Dolgopyat [Dol] (see also the article of Liverani [Liv]) for a large class of contact Anosov flows (see footnote 2), and by Avila–Gouëzel–Yoccoz [AGY] and Avila–Gouëzel [AG] for the Teichmüller flow equipped with “nice” measures.

Here, we recall that the rate of mixing/decay of correlations of a mixing flow \(\psi^{t}\) is the speed of convergence to zero of the correlation functions \(C_{t}(\,f,g):=\int f \cdot g \circ \psi ^{t} -\left (\int f\right )\left (\int g\right )\) as \(t \rightarrow \infty\) (for choices of “sufficiently smooth” observables f and g). Intuitively, the rate of mixing is a quantitative measurement of how fast the flow \(\psi^{t}\) mixes distinct regions of the phase space (such as the supports of the observables f and g). See, e.g., Sect. 6.16 of Hasselblatt’s lecture notes [Ha] for more comments.
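To make the notion of decay of correlations concrete, here is a minimal numerical sketch on a toy system that is not the WP flow: the doubling map \(T(x) = 2x \bmod 1\) on [0, 1) with Lebesgue measure, in discrete time. The choice of system, the observables f(x) = g(x) = x and the helper name `correlation` are assumptions made only for illustration. For this choice, a direct computation gives \(C_{n}(f,g) = 1/(12 \cdot 2^{n})\), i.e., an exponential rate, and the midpoint-rule estimate below can be compared with this value.

```python
# Toy model (assumption of this sketch): doubling map T(x) = 2x mod 1 on [0, 1).

def correlation(f, g, n, grid=2**14):
    """Midpoint-rule estimate of C_n(f, g) = int f.(g o T^n) - (int f)(int g)."""
    k = 2 ** n                       # T^n(x) = k * x mod 1
    h = 1.0 / grid
    xs = [(i + 0.5) * h for i in range(grid)]
    mean_f = sum(f(x) for x in xs) * h
    mean_g = sum(g(x) for x in xs) * h
    cross = sum(f(x) * g((k * x) % 1.0) for x in xs) * h
    return cross - mean_f * mean_g


def identity(x):
    return x


# Compare the numerical estimate with the hand-computed value 1 / (12 * 2^n).
for n in range(0, 9):
    print(n, correlation(identity, identity, n), 1.0 / (12 * 2 ** n))
```

The grid size is a power of two so that the discontinuities of \(x \mapsto 2^{n}x \bmod 1\) fall on cell boundaries, which keeps the midpoint rule accurate.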

In this context, given the ergodicity and mixing theorem of Burns–Masur–Wilkinson stated above, it is natural to try to “determine” the rate of mixing of WP flow. In this direction, we obtained the following result (cf. [BMMW]):

Theorem 2 (Burns–Masur–Matheus–Wilkinson)

The rate of mixing of WP flow on \(T^{1}\mathcal{M}_{g,n}\) (for “reasonably smooth” observables) is

  • at most polynomial for 3g − 3 + n > 1 and

  • rapid (superpolynomial) for 3g − 3 + n = 1.

We will present a sketch of proof of this result in the last section of this text. For now, we will content ourselves with a vague description of the geometrical reason for the difference in the rate of mixing of the Teichmüller and WP flows in Sect. 6.1.3 below.

6.1.2 Ergodicity of WP Flow: Outline of Proof

The initial idea to prove Burns–Masur–Wilkinson Theorem is the “usual” argument for the proof of ergodicity of a system exhibiting some hyperbolicity, namely, Hopf’s argument.

6.1.2.1 A Quick Review of Hopf’s Argument

Traditionally, Hopf’s argument runs as follows (cf. Sect. 4.3 of Hasselblatt’s lecture notes [Ha]). Given a smooth flow \((\psi ^{t})_{t\in \mathbb{R}}: X \rightarrow X\) on a compact Riemannian manifold (X, d) preserving the corresponding volume measure μ and a continuous observable \(f: X \rightarrow \mathbb{R}\), we consider the future and past Birkhoff averages:

$$\displaystyle{ f^{+}(x):=\lim \limits _{ T\rightarrow +\infty }\frac{1} {T}\int _{0}^{T}f(\psi ^{s}(x))\,ds\quad \text{and}\quad f^{-}(x):=\lim \limits _{ T\rightarrow -\infty }\frac{1} {T}\int _{0}^{T}f(\psi ^{s}(x))\,ds }$$

By the Birkhoff Ergodic Theorem (cf. Sect. 6.3 of [Ha]), for μ-almost every \(x \in X\), the quantities \(f^{+}(x)\) and \(f^{-}(x)\) exist and, actually, they coincide: \(f^{+}(x) = f^{-}(x):= \tilde{f} (x)\). In the literature, a point x such that \(f^{+}(x)\), \(f^{-}(x)\) exist and \(f^{+}(x) = f^{-}(x) = \tilde{f} (x)\) is called a Birkhoff generic point (with respect to μ).

By definition, the ergodicity of \(\psi^{t}\) (with respect to μ) is equivalent to the fact that, for every continuous observable f, the functions \(f^{+}\) and \(f^{-}\) are constant at μ-almost every point.
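As a quick numerical illustration of future and past Birkhoff averages, the following sketch uses a toy discrete-time system chosen only for convenience: the rotation \(R(x) = x + \alpha \bmod 1\) of the circle by the golden-ratio angle, which is ergodic for Lebesgue measure (but not hyperbolic). The system, the observable and the function name `birkhoff_average` are assumptions of this sketch. Both averages should be close to the space average of the observable, here 1/2.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2       # irrational rotation angle (toy choice)


def birkhoff_average(f, x, steps):
    """Average of f along |steps| iterates of R(x) = x + ALPHA mod 1.

    A negative value of `steps` averages along the backward orbit instead.
    """
    sign = 1 if steps > 0 else -1
    count = abs(steps)
    total = sum(f((x + sign * k * ALPHA) % 1.0) for k in range(count))
    return total / count


def observable(x):
    # cos^2(2*pi*x) has space average 1/2 with respect to Lebesgue measure
    return math.cos(2 * math.pi * x) ** 2


forward = birkhoff_average(observable, 0.1, 20000)
backward = birkhoff_average(observable, 0.1, -20000)
print(forward, backward)
```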

In order to show the ergodicity of a flow \(\psi^{t}\) with some hyperbolicity, Hopf [Ho] observes that the function \(f^{+}\), resp. \(f^{-}\), is constant along stable, resp. unstable, sets

$$\displaystyle{ W^{s}(x):=\{ y:\lim \limits _{ t\rightarrow +\infty }d(\psi ^{t}(\,y),\psi ^{t}(x)) = 0\},\text{resp.}W^{u}(x) =\{ y:\lim \limits _{ t\rightarrow -\infty }d(\psi ^{t}(\,y),\psi ^{t}(x))\,=\,0\}, }$$

i.e., \(f^{+}(x) = f^{+}(y)\) whenever \(y \in W^{s}(x)\), resp. \(f^{-}(x) = f^{-}(z)\) whenever \(z \in W^{u}(x)\). We leave the verification of this fact as an exercise to the reader.

In the case of an Anosov flow \(\psi^{t}\) on X, we know that the stable and unstable sets are immersed submanifolds (cf. Sect. 5.5 of Hasselblatt’s notes [Ha]). Moreover, if one forgets about the flow direction, the stable and unstable manifolds have complementary dimensions and intersect transversely. Hence, given two points \(p, q \in X\) (lying in distinct orbits of \(\psi^{t}\)), we can connect them using pieces of stable and unstable manifolds as shown in Fig. 6.1.

In particular, this indicates that a volume-preserving Anosov flow \(\psi^{t}\) is ergodic because the functions \(f^{+}\) and \(f^{-}\) are constant along stable and unstable manifolds, they coincide almost everywhere and any pair of points can be connected via pieces of stable and unstable manifolds. However, this argument towards ergodicity of \(\psi^{t}\) is not complete yet: indeed, one needs to know that the intersection points \(z_{1},\ldots,z_{n}\) between the pieces of stable and unstable manifolds connecting p and q are Birkhoff generic in order to conclude that \(\tilde{f} (\,p) = \tilde{f} (z_{1}) =\ldots = \tilde{f} (z_{n}) = \tilde{f} (q)\).

In the original context of his article, Hopf [Ho] studies the geodesic flow \(\psi^{t}\) of a compact surface of constant negative curvature, and he uses the fact that the stable and unstable manifolds form \(C^{1}\) foliations to deduce that the intersection points \(z_{1},\ldots,z_{n}\) can be taken to be Birkhoff generic points. Indeed, since the invariant foliations are \(C^{1}\) in his context, Hopf applies the Fubini Theorem to the set \(\mathcal{B}\) of full μ-volume consisting of Birkhoff generic points in order to ensure that almost all stable and unstable manifolds \(W^{s}(x)\) and \(W^{u}(x)\) intersect \(\mathcal{B}\) in a subset of full length measure of \(W^{s}(x)\) and \(W^{u}(x)\) (compare with the proof of Proposition 4.10 of [Ha]).

On the other hand, it is known that the stable and unstable manifolds of a general Anosov flow (such as geodesic flows on compact manifolds of variable negative curvature) do not necessarily form \(C^{1}\) foliations, but only Hölder-continuous foliations (see, e.g., the papers of Anosov [A] and/or Hasselblatt [Ha94] for concrete examples). In particular, this is an obstacle to the argument à la Fubini of the previous paragraph. Nevertheless, Anosov [A] showed that the stable and unstable foliations of a smooth Anosov flow are always absolutely continuous, so that one can still apply the Fubini Theorem to conclude ergodicity along the lines of Hopf’s argument presented above.

In summary, we know that a smooth (\(C^{2}\)) volume-preserving Anosov flow on a compact manifold is ergodic thanks to Hopf’s argument and the absolute continuity of stable and unstable foliations.

Remark 1

Robinson–Young [RoY] showed that the stable and unstable foliations of a \(C^{1}\) Anosov system are not necessarily absolutely continuous. In particular, the smoothness (\(C^{2}\)) assumption on the Anosov flow is necessary for the ergodicity argument described above.

Remark 2

The absolute continuity of a foliation invariant under some system depends on some hyperbolicity. In fact, Shub–Wilkinson [SW] constructed examples of invariant central foliations (along which the dynamics is neutral) of certain partially hyperbolic diffeomorphisms failing to satisfy the Fubini Theorem: each leaf of these central foliations intersects a set of full volume exactly at one point! This phenomenon is sometimes referred to as Fubini’s nightmare in the literature (see, e.g., the article of Milnor [Mil]) and sometimes a foliation “failing” the Fubini Theorem is called a pathological foliation.

After this brief sketch of Hopf’s argument for the ergodicity of smooth volume-preserving Anosov flows on compact manifolds, let us explain the difficulties of extending this argument to the setting of WP flow.

6.1.2.2 Hopf’s Argument in the Context of WP Flow

As we already mentioned (cf. item (d) of the table above), the WP flow is singular hyperbolic. In a nutshell, this means that, even though WP flow is not Anosov, it is (morally) nonuniformly hyperbolic in the sense of Pesin theory and it satisfies some hyperbolicity estimates along pieces of orbits staying in compact parts of moduli space.

In particular, thanks to (the Katok–Strelcyn [KS] version of) Pesin’s Stable-Manifold Theorem [Pe2], the stable and unstable sets of almost every point are immersed submanifolds, and, if we forget about the flow direction, the stable and unstable manifolds have complementary dimensions. Furthermore, the stable and unstable manifolds are part of absolutely continuous laminations. Here, it is important that the dynamics is sufficiently smooth (see, e.g., the paper of Pugh [P] and the paper of Bonatti–Crovisier–Shinohara [BCS]).

Thus, this gives hopes that Hopf’s argument could be applied to show the ergodicity of volume-preserving nonuniformly hyperbolic systems.

However, by inspecting Fig. 6.1, we see that Hopf’s argument relies on the fact that stable and unstable manifolds of Anosov flows have a nice, well-controlled, geometry.

Fig. 6.1 Connecting p and q with pieces of stable and unstable manifolds

For instance, if we start with a point p and we want to connect it with pieces of stable and unstable manifolds to a point q at a large distance, we have to make sure that the pieces of stable and unstable manifolds used in Fig. 6.1 are “uniform”, e.g., they are graphs of definite size and bounded curvature with respect to the splitting into stable and unstable directions, and, moreover, the angles between the stable and unstable directions are uniformly bounded away from zero.

Indeed, if the pieces of stable and unstable manifolds get shorter and shorter, and/or if they “curve” a lot, and/or the angles between stable and unstable directions are not bounded away from zero, one might not be able to reach/access q from p with stable and unstable manifolds.

As it turns out, while these kinds of nonuniformity do not occur for Anosov flows, they can actually occur for certain nonuniformly hyperbolic systems. More precisely, the sizes and curvatures of stable and unstable manifolds, and the angles between stable and unstable directions of a general nonuniformly hyperbolic system vary only measurably from point to point (Fig. 6.2).

Fig. 6.2 Pesin stable and unstable manifolds with “bad” geometry

In particular, this excludes a priori a naive generalization of Hopf’s ergodicity argument for nonuniformly hyperbolic systems, and, in fact, there are concrete examples (see footnote 3) by Dolgopyat–Hu–Pesin [DHP] of volume-preserving nonuniformly hyperbolic systems with countably many ergodic components consisting of invariant sets of positive volumes that are essentially open.

In summary, the ergodicity of a nonuniformly hyperbolic system depends on the particular dynamical features of the given system.

In this direction, there is an important literature dedicated to the construction of large classes of ergodic nonuniformly hyperbolic systems: for example, the ergodicity of several classes of billiards was shown by Sinai [S70], Bunimovich [Bu74], Bunimovich–Chernov–Sinai [BCS91] among others (see also Chernov–Markarian’s book [CM]) and the ergodicity of nonuniformly hyperbolic systems exhibiting partial hyperbolicity (or dominated splitting) was shown by Pugh–Shub [PS89], Rodriguez Hertz [RH], Tahzibi [T], Burns–Wilkinson [BW], Rodriguez Hertz–Rodriguez Hertz–Ures [RHRHU] among others.

For the proof of their ergodicity result for the WP flow, Burns–Masur–Wilkinson take part of their inspiration from the work of Katok–Strelcyn [KS] where Pesin’s theory [Pe2] (of existence and absolute continuity of stable manifolds) is extended to singular hyperbolic systems.

In a nutshell, the basic philosophy behind Katok–Strelcyn’s work is the following. Given a nonuniformly hyperbolic system with some nontrivial singular set, all dynamical features predicted by Pesin theory in virtue of the (nonuniform) exponential contraction and expansion are not affected if the loss of control on the system is at most polynomial as one approaches the singular set. In other terms, the exponential (hyperbolic) behavior of a singular system is not disturbed by the presence of a singular set where the first two derivatives of the system lose control in a polynomial way. In particular, this hints that Hopf’s argument can be extended to singular hyperbolic systems with polynomially bad singular sets.

In this context, Burns–Masur–Wilkinson show the following ergodicity criterion for singular hyperbolic geodesic flows (cf. Theorem 3.1 of [BMW]).

Let N be the quotient \(N = M/\Gamma\) of a contractible, negatively curved, possibly incomplete, Riemannian manifold M by a subgroup Γ of isometries of M acting freely and properly discontinuously. By slightly abusing notation, we denote by d the metrics on N and M induced by the Riemannian metric of M.

We consider \(\overline{N}\) the (Cauchy) metric completion of the metric space (N, d), i.e., the (complete) metric space consisting of all equivalence classes of Cauchy sequences \(\{x_{n}\} \subset N\) under the relation \(\{x_{n}\} \sim \{y_{n}\}\) if and only if \(\lim \limits _{n\rightarrow \infty }d(x_{n},y_{n}) = 0\) equipped with the metric \(d(\{x_{n}\},\{z_{n}\}) =\lim \limits _{n\rightarrow \infty }d(x_{n},z_{n})\), and we define the (Cauchy) boundary \(\partial N:= \overline{N} - N\).

Theorem 3 (Burns–Masur–Wilkinson Ergodicity Criterion for Geodesic Flows)

 Let \(N = M/\Gamma\) be a manifold as above. Suppose that:

  (I) the universal cover M of N is geodesically convex, i.e., for every \(p, q \in M\), there exists a unique geodesic segment in M connecting p and q.

  (II) the metric completion \(\overline{N}\) of (N, d) is compact.

  (III) the boundary ∂N is volumetrically cusplike, i.e., for some constants C > 1 and ν > 0, the volume of a ρ-neighborhood of the boundary satisfies

    $$\displaystyle{ \mathit{\text{Vol}}(\{x \in N: d(x,\partial N) <\rho \}) \leq C\rho ^{2+\nu } }$$

    for every ρ > 0.

  (IV) N has polynomially controlled curvature, i.e., there are constants C > 1 and β > 0 such that the curvature tensor R of N and its first two derivatives satisfy the following polynomial bound

    $$\displaystyle{ \max \{\|R(x)\|,\|\nabla R(x)\|,\|\nabla ^{2}R(x)\|\} \leq Cd(x,\partial N)^{-\beta } }$$

    for every \(x \in N\).

  (V) N has polynomially controlled injectivity radius, i.e., there are constants C > 1 and β > 0 such that

    $$\displaystyle{ \mathit{\text{inj}}(x) \geq (1/C)d(x,\partial N)^{\beta } }$$

    for every \(x \in N\) (where inj(x) denotes the injectivity radius at x).

  (VI) the first derivative of the geodesic flow \(\varphi_{t}\) is polynomially controlled, i.e., there are constants C > 1 and β > 0 such that, for every infinite geodesic γ on N and every t ∈ [0, 1]:

    $$\displaystyle{ \|D_{\dot{\gamma }(0)}\varphi _{t}\| \leq Cd(\gamma ([-t,t]),\partial N)^{-\beta } }$$

Then, the Liouville (volume) measure m of N is finite, the geodesic flow \(\varphi_{t}\) on the unit cotangent bundle \(T^{1}N\) of N is defined at m-almost every point for all time t, and the geodesic flow \(\varphi_{t}\) is nonuniformly hyperbolic (in the sense of Pesin’s theory) and ergodic.

Actually, the geodesic flow \(\varphi_{t}\) is Bernoulli and, furthermore, its measure-theoretic entropy \(h(\varphi_{t})\) is positive, finite and given by Pesin’s entropy formula (i.e., \(h(\varphi_{t})\) is the sum of the positive Lyapunov exponents of \(\varphi_{t}\) counted with multiplicities).

The proof of this ergodicity criterion for geodesic flows was one of the main motivations of Burns’ lectures (see [Bu]) and, for this reason, we will not discuss it here. Instead, we will always assume Theorem 3 in the sequel, so that the proof of Theorem 1 (ergodicity of the WP flow) will be complete (see footnote 4) once we show that the moduli space of Riemann surfaces equipped with the WP metric satisfies the six items (I) to (VI) above.

6.1.2.3 A Brief Comment on the Verification of the Ergodicity Criterion for WP Flow

In comparison with previously known results in the literature, some of the main novelties in Burns–Masur–Wilkinson work [BMW] concern the verification of items (IV) and (VI) for the WP metric: in fact, those items are the most delicate to check and their verifications are strongly based on important previous works of McMullen [McM00] and Wolpert [Wo03, Wo08, Wo09, Wo11].

In any case, this completes our outline of the proof of Burns–Masur–Wilkinson Theorem on the ergodicity of WP flow.

6.1.3 Rates of Mixing of WP Flow

As we mentioned above, both the Teichmüller and WP flows are uniformly hyperbolic in compact parts of the moduli space of curves. Since a uniformly hyperbolic system is (usually) exponentially mixing, the sole obstacle preventing an exponential rate of mixing for these flows is the possibility that a “big” set of orbits spends a “lot” of time near infinity (or rather the boundary of the moduli space) before coming back to the compact parts.

In the case of the Teichmüller flow, the volume in the Teichmüller metric of a ρ-neighborhood of the boundary of moduli space is exponentially small (see footnote 5).

Intuitively, this says that the “probability” that an orbit spends a long time near the boundary of moduli space is exponentially small (cf. Theorem 2.15 of the Avila–Gouëzel–Yoccoz paper [AGY]). In particular, the excursions near infinity of most orbits are not long enough to disrupt the exponential rate of mixing “imposed” by the hyperbolic dynamics of the Teichmüller flow on compact parts. Of course, this is merely a vague intuition behind the exponential mixing of the Teichmüller flow and the curious reader is encouraged to consult the articles of Avila–Gouëzel–Yoccoz [AGY] and Avila–Gouëzel [AG] for detailed explanations.

On the other hand, in the context of the WP flow, we will see that the volume in the WP metric of a ρ-neighborhood of the boundary of moduli space is \(\simeq \rho^{4}\) (compare with Lemma 6.1 of [BMW]).
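A rough way to get a feeling for the exponent 4 is the following sketch, which rests on a modelling assumption that is not made in the text above: near the cusp of \(\mathcal{M}_{1,1}\), the WP metric is often compared with the surface of revolution of the profile \(y = x^{3}\). Under this assumption (and replacing the distance to the cusp by the coordinate x, which is harmless for small ρ), the area of the region 0 < x < ρ has a closed form, and the ratio area/\(\rho^{4}\) should approach π/2. The function name `cusp_area` is hypothetical.

```python
import math


def cusp_area(rho):
    """Area of {0 < x < rho} on the surface of revolution of y = x**3.

    Uses 2*pi * int_0^rho x^3 sqrt(1 + 9 x^4) dx
        = (pi / 27) * ((1 + 9 rho^4)^(3/2) - 1),
    evaluated with expm1/log1p to avoid cancellation for small rho.
    """
    return (math.pi / 27.0) * math.expm1(1.5 * math.log1p(9.0 * rho ** 4))


# In this model the ratio area / rho^4 tends to pi / 2 as rho -> 0.
for rho in (0.2, 0.1, 0.05, 0.01):
    print(rho, cusp_area(rho) / rho ** 4)
```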

Therefore, the “probability” that an orbit of WP flow spends a long time near infinity could be only polynomially small but not exponentially small. In particular, this possibility might conspire against an exponential mixing of WP flow.

In fact, in our joint work [BMMW] with Burns, Masur and Wilkinson, we construct a subset \(A_{\rho}\) of volume \(\simeq \rho^{8}\) of orbits of the WP flow staying near infinity for a time \(\simeq 1/\rho\) (at least). To this end, we use some estimates of Wolpert [Wo09] (see also Propositions 4.11, 4.12 and 4.13 in the Burns–Masur–Wilkinson paper [BMW]) saying that the geometry of the WP metric on the moduli space of Riemann surfaces of genus g ≥ 2 looks like a product of the WP metrics on the moduli spaces of curves of lower genera \(1 \leq g' < g\). In particular, the set \(A_{\rho}\) is chosen to correspond to geodesics travelling almost parallel to one of the factors of the product for a relatively long time.

Of course, the existence of such sets \(A_{\rho}\) means that the rate of mixing of the WP flow \(\psi^{t}\) cannot be very fast. Indeed, by taking \(g_{\rho}\) a “smooth approximation” of the characteristic function of \(A_{\rho}\) (i.e., \(0 \leq g_{\rho} \leq 1\) supported on \(A_{\rho}\) and \(\int g_{\rho} \simeq \rho^{8}\)), and by letting f be a fixed smooth function supported on the compact part (away from infinity), we see that

$$\displaystyle{ \vert C_{t}(\,f,g_{\rho })\vert:= \left \vert \int f \cdot g_{\rho } \circ \psi ^{t} -\left (\int f\right )\left (\int g_{\rho }\right )\right \vert = \left (\int f\right )\left (\int g_{\rho }\right ) \simeq \rho ^{8} }$$

for \(0 \leq t \leq 1/\rho\). In fact, the second equality follows because f is supported in the compact part of the moduli space, \(g_{\rho} \circ \psi^{t}\) is supported on \(\psi^{-t}(A_{\rho})\) and the set \(\psi^{-t}(A_{\rho})\) is disjoint from the compact part for \(0 \leq t \leq 1/\rho\) (by construction of \(A_{\rho}\)), so that \(f \cdot g_{\rho} \circ \psi^{t} \equiv 0\) for \(0 \leq t \leq 1/\rho\). Therefore, at time \(t = 1/\rho\), we deduce that \(\vert C_{t}(\,f,g_{\rho})\vert \simeq 1/t^{8}\), and, hence, the correlation functions associated to the WP flow \(\psi^{t}\) cannot decay faster than \(1/t^{8}\) (i.e., not like a polynomial function of degree > 8 of \(1/t\)) as the time \(t \rightarrow \infty\). In particular, this explains the first part of the statement of Theorem 2.
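The arithmetic behind this conclusion can be checked with a few lines of code. The sketch below drops all implied constants and uses hypothetical helper names (`lower_bound`, `violates`): it compares a candidate upper bound \(K t^{-a}\) with the lower bound \(t^{-8}\) obtained from the sets \(A_{\rho}\) at time \(t = 1/\rho\). For any exponent a > 8, the candidate bound eventually falls below the lower bound, whatever the constant K is.

```python
def lower_bound(t):
    """Lower bound t**-8 coming from A_rho with rho = 1/t (constants dropped)."""
    return float(t) ** -8


def violates(exponent, constant, t):
    """True if the candidate bound constant * t**-exponent is below the lower bound."""
    return constant * float(t) ** -exponent < lower_bound(t)


# With exponent 9 and constant 1000 the candidate bound fails once t > 1000.
print(violates(9, 1000.0, 500.0))    # candidate bound still compatible
print(violates(9, 1000.0, 2000.0))   # candidate bound contradicted
print(violates(8, 1000.0, 1e6))      # exponent 8 is never contradicted here
```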

Finally, let us remark that this argument does not work in genus g = 1 because the crucial fact (in the construction of the set \(A_{\rho}\)) that the WP metric looks like the product of WP metrics in moduli spaces of lower genera breaks down in genus g = 1. Indeed, in this situation, the moduli space is naturally compactified by adding a single point (because the moduli space in lower genus g = 0 is trivial) and so the WP metric does not behave like a product (or, more precisely, no sectional curvature approaches zero as we get close to infinity). In this case, one can exploit this “absence of zero curvatures at infinity” to show that the rate of mixing of the WP flow on the moduli space of tori is rapid, i.e., faster than any polynomial function of \(1/t\). In particular, this explains the second part of the statement of Theorem 2.

Concluding this Subsection, let us observe that Theorem 2 does not claim that the rate of mixing of the WP flow on moduli space of curves of genus g ≥ 2 is genuinely polynomial.

Indeed, recall that the naive intuition says that the rate of mixing is polynomial if we can show that most orbits do not spend a long time near infinity.

Of course, this would not be the case if the WP metric is very close to a product metric, or, more precisely, if some sectional curvatures of WP metric are very close to zero: in fact, the structure of a product metric near infinity would allow for several orbits to travel almost parallel to the factors of the product (and, hence, near infinity) for a long time.

So, we need estimates saying how fast the sectional curvatures of WP metric approach zero as one gets close to infinity, and, unfortunately, the best formulas for the sectional curvatures of WP metric near infinity available so far (due to Wolpert [Wo09]) do not give this type of information (because of certain potential cancellations in Wolpert’s calculations).

6.1.4 Organization of the Text

The remainder of these lecture notes is divided into three sections. Section 6.2 contains introductory material on moduli spaces and WP metrics. Section 6.3 is dedicated to the proof of Theorem 1. Finally, Sect. 6.4 gives a sketch of the proof of Theorem 2.

6.2 Moduli Spaces of Riemann Surfaces and the Weil–Petersson Metric

The main purposes of this section are the following. In the next seven subsections, we recall the definitions and basic properties of the moduli spaces of Riemann surfaces and their cotangent bundles, and we introduce the Weil–Petersson (and Teichmüller) metric(s). In particular, the definition of the main actor of these lecture notes, namely the Weil–Petersson geodesic flow, is presented in detail in Sect. 6.2.7. The basic reference for these subsections is Hubbard’s book [Hu].

Finally, we fulfill in the last subsection the promise made in footnote 4 to explain the subtle point in the reduction of the ergodicity of WP flow (Theorem 1) to the ergodicity criterion for geodesic flows (Theorem 3) related to the orbifoldic nature of moduli spaces (cf. Sect. 6.2.8). Of course, this is a technicality about moduli spaces and the reader might wish to skip this subsection in a first reading of this text.

6.2.1 Definition and Examples of Moduli Spaces

Let S be a fixed topological surface of genus g ≥ 0 with n ≥ 0 punctures. The moduli space \(\mathcal{M}(S) =\mathcal{ M}_{g,n}\) is the set of Riemann surface structures on S modulo biholomorphisms (conformal equivalences).

Example 1 (Moduli Space of Triply Punctured Spheres)

The moduli space \(\mathcal{M}_{0,3}\) of triply punctured spheres consists of a single point

$$\displaystyle{ \mathcal{M}_{0,3} =\{ \overline{\mathbb{C}} -\{ 0,1,\infty \}\} }$$

where \(\overline{\mathbb{C}}\) denotes the Riemann sphere. Indeed, this is a consequence of the fact that the group of biholomorphisms (Möbius transformations) of the Riemann sphere \(\overline{\mathbb{C}}\) is simply 3-transitive, i.e., given 3 distinct points \(x,y,z \in \overline{\mathbb{C}}\), there exists a unique biholomorphism of \(\overline{\mathbb{C}}\) sending x, y and z (resp.) to 0, 1 and \(\infty\) (resp.).
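For three distinct finite points, the Möbius transformation in question can be written explicitly as the cross-ratio \(w \mapsto \frac{(w-x)(y-z)}{(w-z)(y-x)}\). The following minimal sketch computes its coefficients and checks the normalization numerically; the function names `to_zero_one_infinity` and `apply_mobius` are hypothetical, and the case where one of x, y, z is \(\infty\) is deliberately left out.

```python
def to_zero_one_infinity(x, y, z):
    """Coefficients (a, b, c, d) of w -> (a*w + b)/(c*w + d) with x, y, z -> 0, 1, infinity.

    Assumes x, y, z are distinct finite complex numbers.
    """
    a = y - z
    b = -x * (y - z)
    c = y - x
    d = -z * (y - x)
    return a, b, c, d


def apply_mobius(coefficients, w):
    """Evaluate the Moebius transformation at a finite point w (not the pole)."""
    a, b, c, d = coefficients
    return (a * w + b) / (c * w + d)


x, y, z = 2 + 1j, -1j, 3
m = to_zero_one_infinity(x, y, z)
print(apply_mobius(m, x), apply_mobius(m, y))   # close to 0 and 1
print(m[2] * z + m[3])                          # denominator vanishes at z
```

Uniqueness corresponds to the fact that a Möbius transformation fixing 0, 1 and \(\infty\) is the identity.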

Example 2 (Moduli Space of Once Punctured Tori)

The moduli space \(\mathcal{M}_{1,1}\) of once punctured tori is

$$\displaystyle{ \mathcal{M}_{1,1} = \mathbb{H}/SL(2, \mathbb{Z}) }$$

where \(SL(2, \mathbb{Z})\) acts on the hyperbolic half-plane \(\mathbb{H}:=\{ z \in \mathbb{C}: \text{Im}(z)> 0\}\) via Möbius transformations, i.e., \(\left (\begin{array}{cc} a&b\\ c &d \end{array} \right ) \in SL(2, \mathbb{Z})\) acts on \(\mathbb{H}\) via

$$\displaystyle{ \left (\begin{array}{cc} a&b\\ c &d \end{array} \right )z:=\frac{az + b} {cz + d} }$$

Indeed, this follows from the facts that:

  • a complex torus with a marked point is biholomorphic to a “normalized” lattice \(\mathbb{C}/(\mathbb{Z} \oplus \mathbb{Z}z)\) for some \(z \in \mathbb{H}\) (with the marked point corresponding to the origin), and

  • two “normalized” lattices \(\mathbb{C}/(\mathbb{Z} \oplus \mathbb{Z}z)\) and \(\mathbb{C}/(\mathbb{Z} \oplus \mathbb{Z}w)\) are biholomorphic if and only if \(w = \frac{az+b} {cz+d}\) for some \(\left (\begin{array}{cc} a&b\\ c &d\end{array} \right ) \in SL(2, \mathbb{Z})\).

The second example reveals an interesting feature of \(\mathcal{M}_{1,1}\): it is not a manifold, but only an orbifold. In fact, counting modulo the central element \(-Id\) (which acts trivially on \(\mathbb{H}\)), the stabilizer of the action of \(SL(2, \mathbb{Z})\) on \(\mathbb{H}\) at a typical point is trivial, but it has order 2 at \(i \in \mathbb{H}\) and order 3 at \(\exp (\pi i/3) \in \mathbb{H}\) (this happens because a typical torus has no symmetry, but the square and hexagonal tori have some symmetries). In particular, \(\mathcal{M}_{1,1}\) is topologically a once-punctured sphere with two conical singularities at i and exp(πi∕3). A classical fundamental domain of the action of \(SL(2, \mathbb{Z})\) on \(\mathbb{H}\), together with the actions of the matrices \(T = \left (\begin{array}{cc} 1&1\\ 0 &1 \end{array} \right )\) and \(J = \left (\begin{array}{cc} 0& - 1\\ 1 & 0 \end{array} \right )\), is shown in Fig. 6.3.

Fig. 6.3 Fundamental domain \(\{z \in \mathbb{H}: \vert \text{Re}(z)\vert \leq 1/2,\vert z\vert \geq 1\}\) for \(\mathbb{H}/SL(2, \mathbb{Z})\)
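The fact that every point of \(\mathbb{H}\) can be moved into this fundamental domain by T and J can be illustrated by the classical reduction procedure. The Python sketch below is only an illustration: the function name and the `max_steps` safeguard are our own choices, and floating-point arithmetic means that points on the boundary of the domain are only handled approximately.

```python
def reduce_to_fundamental_domain(z, max_steps=1000):
    """Move z (with Im z > 0) into {|Re z| <= 1/2, |z| >= 1} using T and J.

    Returns the reduced point and the list of moves applied, where
    ("T", k) stands for z -> z + k and ("J", 1) stands for z -> -1/z.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    word = []
    for _ in range(max_steps):
        n = round(z.real)          # nearest integer to Re z
        if n != 0:
            z -= n                 # apply T^(-n), so that |Re z| <= 1/2
            word.append(("T", -n))
        if abs(z) >= 1:
            return z, word
        z = -1 / z                 # apply J; this increases Im z when |z| < 1
        word.append(("J", 1))
    raise RuntimeError("did not converge; increase max_steps")
```

For instance, `reduce_to_fundamental_domain(complex(7.3, 0.02))` returns a point of the fundamental domain together with the word in T and J that was used to reach it.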

As it turns out, all moduli spaces \(\mathcal{M}_{g,n}\) are complex orbifolds. In order to see this fact, we need to introduce some auxiliary structures (including the notions of Teichmüller spaces and mapping-class groups).

Remark 3

From now on, we will restrict our attention to the case of a topological surface S of genus g ≥ 0 with n ≥ 0 punctures such that 3g − 3 + n > 0. In this case, the uniformization theorem says that a Riemann surface structure X on S is conformally equivalent to a quotient \(\mathbb{H}/\varGamma\) of the hyperbolic upper half-plane \(\mathbb{H}\) by a discrete subgroup \(\varGamma\) of \(SL(2, \mathbb{R})\) (isomorphic to the fundamental group of S). Moreover, the hyperbolic metric \(\tilde{\rho }= \frac{\vert dz\vert } {\text{Im}(z)}\) on \(\mathbb{H}\) descends to a finite area hyperbolic metric ρ on \(\mathbb{H}/\varGamma\) and, in fact, ρ is the unique Riemannian metric of constant curvature − 1 on X inducing the same conformal structure. (See, e.g., Hubbard’s book [Hu] for more details.)

6.2.2 Teichmüller Metric

Let us start by endowing the moduli spaces with the structure of complete metric spaces.

By definition, a metric on \(\mathcal{M}(S)\) corresponds to a way to measure the distance between two points in \(\mathcal{M}(S)\). A natural way of measuring how far apart two conformal structures on S are is provided by quasiconformal maps.

Very roughly speaking, the idea is that even though by definition there are no conformal maps (biholomorphisms) between conformal structures \(S_{0}\) and \(S_{1}\) corresponding to two distinct points of \(\mathcal{M}(S)\), one has several quasiconformal maps between them, that is, maps \(f: S_{0} \to S_{1}\) such that the quantity

$$\displaystyle{ K(\,f) =\sup \limits _{x\in S_{0}} \frac{\vert \partial f(x)/\partial z\vert + \vert \partial f(x)/\partial \overline{z}\vert } {\vert \partial f(x)/\partial z\vert -\vert \partial f(x)/\partial \overline{z}\vert } \geq 1 }$$

is finite.

Here, it is worth pointing out that K( f) measures the largest possible eccentricity among all infinitesimal ellipses in the tangent planes \(T_{f(x)}S_{1}\) obtained as images under Df(x) of infinitesimal circles on the tangent planes \(T_{x}S_{0}\), and, moreover, \(f: S_{0} \to S_{1}\) is conformal if and only if K( f) = 1. See Hubbard’s book [Hu] for details (including some pictures of the geometrical meaning of K( f)).

This motivates measuring the “distance” between \(S_{0}\) and \(S_{1}\) via the formula:

$$\displaystyle{ d_{T}(S_{0},S_{1}) =\inf _{f: S_{ 0}\rightarrow S_{1}\text{ quasiconformal }}\log K(\,f) }$$

This function \(d_{T}(\cdot,\cdot)\) is the so-called Teichmüller metric and, as the nomenclature suggests, it can be shown that \(d_{T}(\cdot,\cdot)\) is a metric on \(\mathcal{M}(S)\).

The moduli space \(\mathcal{M}(S)\) endowed with \(d_{T}(\cdot,\cdot)\) is a complete metric space.
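To get a feeling for the quantity K( f), it may help to evaluate it on the simplest possible maps, namely real-linear maps of the plane \(f(z) = az + b\overline{z}\), for which \(\partial f/\partial z = a\) and \(\partial f/\partial \overline{z} = b\) are constant. The Python sketch below uses function names of our own choosing and assumes \(\vert b\vert < \vert a\vert\) (orientation-preserving maps).

```python
def dilatation_of_linear_map(a, b):
    """K(f) for the real-linear map f(z) = a*z + b*conj(z), assuming |b| < |a|."""
    if abs(b) >= abs(a):
        raise ValueError("need |b| < |a| for an orientation-preserving map")
    return (abs(a) + abs(b)) / (abs(a) - abs(b))


def stretch_coefficients(lam):
    """Coefficients (a, b) of the horizontal stretch x + iy -> lam*x + iy, lam > 0."""
    return (lam + 1) / 2, (lam - 1) / 2


# A horizontal stretch by a factor 3 maps circles to ellipses of eccentricity 3,
# while a rotation z -> i*z is conformal and has K = 1.
k_stretch = dilatation_of_linear_map(*stretch_coefficients(3.0))
k_rotation = dilatation_of_linear_map(1j, 0)
```

Since \(d_{T}\) is defined as an infimum over all quasiconformal maps, the value log K( f) of one particular map only gives an upper bound for the distance.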

Example 3

The Teichmüller metric on the moduli space \(\mathcal{M}_{1,1} = \mathbb{H}/SL(2, \mathbb{Z})\) of once-punctured tori can be shown to coincide with the hyperbolic metric induced by Poincaré’s metric on \(\mathbb{H}\) (see Hubbard’s book).

6.2.3 Teichmüller Spaces and Mapping-Class Groups

Once we know that the moduli spaces are topological spaces (and, actually, complete metric spaces), we can start the discussion of their (orbifold) universal covers.

In this direction, we need to describe the “fiber” in the universal cover of a point X of \(\mathcal{M}(S)\) (i.e., a Riemann surface structure on S). In other terms, we need to add “extra information” to X. As it turns out, this “extra information” has topological nature and it is called a marking.

More precisely, a marked complex structure (on S) is the data of a Riemann surface X together with a homeomorphism \(f: S \to X\) (called a marking).

By analogy with the notion of moduli spaces, we define the Teichmüller space Teich(S) as the set of Teichmüller equivalence classes of marked complex structures, where two marked complex structures \(f: S \to X_{1}\) and \(g: S \to X_{2}\) are Teichmüller equivalent whenever there exists a conformal map \(h: X_{1} \to X_{2}\) isotopic to \(g \circ f^{-1}\). In other words, the Teichmüller space is the “moduli space of marked complex structures”.

The Teichmüller metric \(d_{T}(\cdot,\cdot)\) also makes sense on the Teichmüller space Teich(S) and the metric space \((Teich(S), d_{T})\) is also complete.

From the definitions, we see that one can recover the moduli space from the Teichmüller space by forgetting the “extra information” given by the markings. Equivalently, \(\mathcal{M}(S) = Teich(S)/MCG(S)\) where \(MCG(S) = MCG_{g,n}\) is the so-called mapping-class group of isotopy classes of orientation-preserving homeomorphisms of S.

The mapping-class group is a discrete group acting on Teich(S) by isometries of the Teichmüller metric \(d_{T}\). Moreover, by Hurwitz’s theorem (and our standing assumption that 3g − 3 + n > 0), the MCG(S)-stabilizer of any point of Teich(S) is finite (of cardinality ≤ 84(g − 1) when g > 1), but it might vary from point to point because some Riemann surfaces are more symmetric than others (see, e.g., the paragraph after Example 2 above).

Example 4

The Teichmüller space \(Teich_{1,1}\) of once-punctured tori is

$$\displaystyle{ Teich_{1,1} \simeq \mathbb{H}. }$$

Indeed, as we already mentioned (cf. Example 2), the set of once-punctured tori is parametrized by normalized lattices \(\varLambda (w) = \mathbb{Z} \oplus \mathbb{Z}w\), \(w \in \mathbb{H}\), and there is a conformal map between \(\mathbb{C}/\varLambda (w)\) and \(\mathbb{C}/\varLambda (w')\) if and only if \(w' = \frac{aw+b} {cw+d}\), \(\left (\begin{array}{cc} a&b\\ c &d \end{array} \right ) \in SL(2, \mathbb{Z})\). From this, one can check that \(Teich_{1,1} = \mathbb{H}\) and \(MCG_{1,1} = SL(2, \mathbb{Z})\) (because the conformal map associated to \(\left (\begin{array}{cc} a&b\\ c &d \end{array} \right )\) is isotopic to the identity if and only if \(\left (\begin{array}{cc} a&b\\ c &d \end{array} \right ) = Id\)).

The Teichmüller space Teich(S) is the (orbifold) universal cover of \(\mathcal{M}(S)\) and MCG(S) is the (orbifold) fundamental group of \(\mathcal{M}(S)\) (compare with the example above). A common way to see this fact passes through showing that Teich(S) is simply connected (and even contractible) because it admits a global system of coordinates called Fenchel–Nielsen coordinates (providing a homeomorphism between Teich(S) and \(\mathbb{R}^{6g-6+2n}\)). The discussion of these coordinates is the topic of the next subsection.

6.2.4 Fenchel–Nielsen Coordinates

In order to introduce the Fenchel–Nielsen coordinates, we need the notion of pants decomposition. A pants (trouser) decomposition of S is a collection \(\{\alpha_{1}, \ldots, \alpha_{3g-3+n}\}\) of 3g − 3 + n simple closed curves on S that are pairwise disjoint, homotopically nontrivial (i.e., not homotopic to a point) and nonperipheral (i.e., not homotopic to a small loop around one of the possible punctures of S). The picture below illustrates a pants decomposition of a compact surface of genus 2:

The nomenclature “pants decomposition” comes from the fact that if we cut S along the curves \(\alpha_{j}\), \(j = 1, \ldots, 3g-3+n\) (i.e., we consider the connected components of the complement of these curves), then we see “pairs of pants” (topologically equivalent to a triply punctured sphere):

A remarkable fact about pairs of pants is that hyperbolic structures on them are uniquely determined by the lengths of their boundary components. In other terms, a trouser with j boundary circles ( j = 1, 2 or 3) has a j-dimensional space of hyperbolic structures (parametrized by the lengths of these j circles). Alternatively, one can construct trousers out of right-angled hexagons in the hyperbolic plane (see, e.g., Theorem 3.5.8 in Hubbard’s book [Hu]).

In this setting, the Fenchel–Nielsen coordinates can be described as follows. We fix a pants decomposition \(P =\{\alpha_{1}, \ldots, \alpha_{3g-3+n}\}\) and we consider

$$\displaystyle{ \mathcal{FN}_{P}: Teich(S) \rightarrow (\mathbb{R}_{+} \times \mathbb{R})^{3g-3+n} }$$

defined by \(\mathcal{FN}_{P}(\,f: S \rightarrow X) = (\ell_{\alpha _{1}},\tau _{\alpha _{1}},\ldots,\ell_{\alpha _{3g-3+n}},\tau _{\alpha _{3g-3+n}})\), where \(\ell_{\alpha}\) is the hyperbolic length of \(\alpha \in P\) with respect to the hyperbolic structure associated to the marked complex structure \(f: S \to X\), and \(\tau_{\alpha}\) is a twist parameter measuring the “relative displacement” of the pairs of pants glued at α.

A detailed description of twist parameters can be found in Sect. 7.6 of Hubbard’s book [Hu], but, for now, let us just make some quick comments about them. First, we fix (in an arbitrary way) a collection of simple arcs joining the boundaries of the pairs of pants determined by P such that these arcs land at the same point whenever they come from opposite sides of \(\alpha_{j} \in P\).

From these arcs, we get a collection \(P^{*}\) of simple closed curves on S looking like this:

Consider now a pair of trousers sharing a curve \(\alpha \in P\) (they might be the same trouser) and let γ be an arc of a curve in \(P^{*}\) joining two boundary components A(γ) and B(γ) of the union of these trousers:

Given a marked complex structure \(f: S \to X\), consider the unique arc α(γ) on X homotopic to f(γ) (relative to the boundary of the union of the pair of trousers) consisting of two minimal geodesic arcs connecting \(\alpha \in P\) to A(γ) and B(γ) and an immersed geodesic δ(γ) moving inside \(\alpha \in P\). We define the twist parameter \(\tau_{\alpha}(f: S \to X)\) as the oriented length of δ(γ) counted as positive if it turns to the right and negative if it turns to the left.

Remark 4

Since the definition of twist parameters depends on the choice of \(P^{*}\), these parameters are well-defined only up to an additive constant. Nevertheless, this technical difficulty does not lead to any serious issue.

Figure 6.4 below illustrates two markings \(f: S \to X\) and \(g: S \to Y\) whose twist parameters differ by

$$\displaystyle{ \tau _{\alpha }(g: S \rightarrow Y ) =\tau _{\alpha }(\,f: S \rightarrow X) + 2\ell_{\alpha }(\,f: S \rightarrow X) }$$

Fig. 6.4 Concrete calculation of twist parameters

In any case, it is possible to show that the Fenchel–Nielsen coordinate map \(\mathcal{FN}_{P}\) associated to any pants decomposition P is a global homeomorphism (see, e.g., Theorem 7.6.3 in Hubbard’s book [Hu]). In particular, the Teichmüller space Teich(S) is simply connected (as it is homeomorphic to \(\mathbb{R}^{6g-6+2n}\)). Hence, it is the orbifold universal cover of the moduli space \(\mathcal{M}(S)\) (and the mapping-class group MCG(S) is the orbifold fundamental group of \(\mathcal{M}(S) = Teich(S)/MCG(S)\)).

This partly explains why one discusses the properties of \(\mathcal{M}(S)\) and Teich(S) at the same time.
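The bookkeeping behind the dimension 6g − 6 + 2n is elementary and can be sketched in a few lines of Python. The function name below is ours, and the count 2g − 2 + n of pairs of pants is not stated in the text: it is the standard Euler characteristic count (each pair of pants has Euler characteristic −1 and S has Euler characteristic 2 − 2g − n).

```python
def fenchel_nielsen_counts(g, n):
    """Counts attached to a pants decomposition of a genus g surface with n punctures."""
    curves = 3 * g - 3 + n
    if curves <= 0:
        raise ValueError("need 3g - 3 + n > 0")
    return {
        "curves": curves,               # curves alpha_1, ..., alpha_{3g-3+n}
        "pants": 2 * g - 2 + n,         # pairs of pants obtained after cutting
        "real_dimension": 2 * curves,   # one length and one twist per curve
    }
```

For example, `fenchel_nielsen_counts(1, 1)` gives one curve and real dimension 2, in agreement with the identification of the Teichmüller space of once-punctured tori with \(\mathbb{H}\).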

6.2.5 Cotangent Bundle to Moduli Spaces of Riemann Surfaces

Another reason for studying \(\mathcal{M}(S)\) and Teich(S) together is that Teich(S) is a manifold while \(\mathcal{M}(S)\) is only an orbifold. In fact, the Teichmüller spaces Teich(S) are real-analytic manifolds. Indeed, the real-analytic structure on Teich(S) comes from the uniformization theorem. More precisely, given a marked complex structure \(f: S \to X\), we can apply the uniformization theorem to write \(X = \mathbb{H}/\varGamma\) where \(\varGamma \subset SL(2, \mathbb{R})\) is a discrete subgroup isomorphic to the fundamental group \(\pi_{1}(S)\) of S. In other words, from a marked complex structure \(f: S \to X\), we have a representation of \(\pi_{1}(S)\) in \(SL(2, \mathbb{R})\) (well-defined modulo conjugation), and this allows one to identify Teich(S) with an open component of the character variety of homomorphisms from \(\pi_{1}(S)\) to \(SL(2, \mathbb{R})\) modulo conjugacy. In particular, one can pull back the real-analytic structure of this representation variety to endow Teich(S) with its own real-analytic structure.

Actually, as it turns out, this real-analytic structure of Teich(S) can be “upgraded” to a complex-analytic structure. One way of seeing this uses a “generalization” of the construction of the real-analytic structure above based on the complex-analytic structure on the representation variety of \(\pi_{1}(S)\) in \(SL(2, \mathbb{C})\) and Bers’ simultaneous uniformization theorem [Bers]. We will discuss this point later (in Sect. 6.3).

Remark 5

This should be compared with the following “toy model” situation.

Let E be a real vector space of dimension 2n and denote by \(\mathcal{J}(E)\) the set of linear complex structures Footnote 6 on E. It is possible to check that a linear complex structure on E is equivalent to the data of a complex subspace \(K \subset \mathbb{C} \otimes _{\mathbb{R}}E\) of the complexification \(\mathbb{C} \otimes _{\mathbb{R}}E\) of E such that \(\text{dim}_{\mathbb{C}}K = n\) and \(K \cap \overline{K} =\{ 0\}\) (i.e., \(\mathbb{C} \otimes _{\mathbb{R}}E = K \oplus \overline{K}\)) where \(\overline{K}\) is the complex conjugate of K.

Since the Grassmannian manifold \(Gr_{n}(\mathbb{C} \otimes _{\mathbb{R}}E)\) of complex subspaces of \(\mathbb{C} \otimes _{\mathbb{R}}E\) of complex dimension n is naturally a complex manifold and the condition \(K \cap \overline{K} =\{ 0\}\) is open in \(Gr_{n}(\mathbb{C} \otimes _{\mathbb{R}}E)\), we obtain that the set \(\mathcal{J}(E)\) parametrizing complex structures on E is itself a complex manifold.

Let us now sketch the relationship between the quadratic differentials on Riemann surfaces and the cotangent bundle to Teichmüller and moduli spaces.

6.2.6 Integrable Quadratic Differentials

The Teichmüller metric was defined via the notion of quasiconformal mappings \(f: S_{0} \to S_{1}\). By inspecting the nature of this notion, we see that the quantities \(k(\,f,x) = \dfrac{\vert \partial f(x)/\partial \overline{z}\vert } {\vert \partial f(x)/\partial z\vert }\) (related to the eccentricities of infinitesimal ellipses obtained as the images under Df(x) of infinitesimal circles) play an important role in the definition of the Teichmüller distance between \(S_{0}\) and \(S_{1}\).

The measurable Riemann Mapping Theorem of Ahlfors and Bers (see, e.g., page 149 of Hubbard’s book [Hu]) says that the quasiconformal map f can be recovered from the quantities k( f, x) up to composition with conformal maps. More precisely, by collecting the quantities k( f, x) in a globally defined tensor of type (−1, 1)

$$\displaystyle{ \mu (x) = \frac{(\partial f(x)/\partial \overline{z})d\overline{z}} {(\partial f(x)/\partial z)dz} }$$

with \(\|\mu \|_{L^{\infty }} <1\) called a Beltrami differential, one can “recover” f by solving Beltrami’s equation

$$\displaystyle{ (\partial f/\partial \overline{z}) =\mu \cdot (\partial f/\partial z) }$$

in the sense that there is always a solution to the equation and, furthermore, two solutions f and g differ by a conformal map φ (i.e., \(g =\varphi \circ f\)).

In other terms, the deformations of complex structures are intimately related to Beltrami differentials and it is not surprising that Beltrami differentials can be used to describe the tangent bundle of Teich(S). In this setting, we can obtain the cotangent bundle \(T^{{\ast}}Teich(S)\) by noticing that there is a natural pairing between bounded (\(L^{\infty }\)) Beltrami differentials μ and integrable (\(L^{1}\)) quadratic differentials q (i.e., tensors of type (2, 0), \(q = q(z)dz^{2}\)):

$$\displaystyle{ \langle \mu,q\rangle =\int \mu q =\int \mu (z)q(z)\frac{d\overline{z}} {dz}dz^{2} =\int \mu (z)q(z)dz\,d\overline{z} }$$

because \(dz\,d\overline{z}\) is an area form and μ(z)q(z) is integrable. In this way, it can be shown that the cotangent space \(T_{X}^{{\ast}}Teich(S)\) at a point \(f: S \to X\) of Teich(S) is naturally identified with the space Q(X) of integrable quadratic differentials on X.

Note that the space of integrable quadratic differentials Q(X) provides a concrete way of manipulating the complex structure of Teich(S): in this setting, the complex structure is just the multiplication by i on the space of quadratic differentials.

Remark 6

By a theorem of Royden (see Hubbard’s book), the mapping-class group MCG(S) is the group of complex-analytic automorphisms of Teich(S). In particular, the moduli space \(\mathcal{M}(S) = Teich(S)/MCG(S)\) is a complex orbifold.

6.2.7 Teichmüller and Weil–Petersson Metrics

The description of the cotangent bundle of Teich(S) in terms of quadratic differentials allows us to define the Teichmüller and Weil–Petersson metrics in the following way.

Given a point \(f: S \to X\) of Teich(S), we endow the cotangent space \(T_{X}^{{\ast}}Teich(S) \simeq Q(X)\) with the \(L^{p}\)-norm:

$$\displaystyle{ \|\psi \|_{p}:= \left (\int \rho ^{2-2p}\vert \psi \vert ^{p}\right )^{1/p} }$$

where ρ is the hyperbolic metric associated to the conformal structure X and ψ is a quadratic differential (i.e., a tensor of type (2, 0)).

Remark 7

More generally, we define the \(L^{p}\)-norm of a tensor ψ of type (r, s) (i.e., \(\psi =\psi (z)dz^{r}\,d\overline{z}^{s}\)) as:

$$\displaystyle{ \|\psi \|_{p}:= \left (\int \rho ^{2-p(r+s)}\vert \psi \vert ^{p}\right )^{1/p} }$$

In this notation, the infinitesimal Teichmüller metric is the family of \(L^{1}\)-norms on the fibers \(T_{X}^{{\ast}}Teich(S)\) of the cotangent bundle of Teich(S). Here, the nomenclature “infinitesimal Teichmüller metric” is justified by the fact that the “global” Teichmüller metric (defined by the infimum of the eccentricity factors K( f) of quasiconformal maps \(f: X_{0} \to X_{1}\)) is the Finsler metric induced by the “infinitesimal” Teichmüller metric (see, e.g., Theorem 6.6.5 of Hubbard’s book).

In a similar vein, the Weil–Petersson (WP) metric is the family of \(L^{2}\)-norms on the fibers \(T_{X}^{{\ast}}Teich(S)\) of the cotangent bundle of Teich(S).

Remark 8

In the definition of the WP metric, it was implicit that an integrable quadratic differential has finite \(L^{2}\)-norm (and, actually, all \(L^{p}\)-norms are finite, 1 ≤ p ≤ ∞). This fact is obvious when S is compact, but it requires a (simple) computation when S has punctures. See, e.g., Proposition 5.4.3 of Hubbard’s book for the details.

For later use, we will denote the (infinitesimal) Teichmüller metric, resp. Weil–Petersson metric, by \(\|.\|_{T}\), resp. \(\|.\|_{WP}\).

The Teichmüller metric \(\|.\|_{T}\) is a Finsler metric: the family of \(L^{1}\)-norms on the fibers of \(T^{{\ast}}Teich(S)\) varies in a \(C^{1}\) but not \(C^{2}\) way (cf. Lemma 7.4.3 and Proposition 7.4.4 in Hubbard’s book).

Remark 9

The first derivative of the Teichmüller metric is not hard to compute. Given two cotangent vectors \(p,q \in Q(X)\) with \(\|q\|_{T}\neq 0\), we claim that

$$\displaystyle{ D\|.\|_{T}(q) \cdot p =\int _{X}\text{Re}\left ( \frac{\overline{q}} {\vert q\vert }p\right ) }$$

Indeed, the first derivative is \(D\|.\|_{T}(q) \cdot p:=\lim \limits _{t\rightarrow 0}\frac{1} {t} \int _{X}(\vert q + tp\vert -\vert q\vert )\). Since \(\big \vert \vert q + tp\vert -\vert q\vert \big \vert \leq \vert t\vert \,\vert p\vert\) and \(p \in Q(X)\) is bounded (i.e., its \(L^{\infty }\) norm is finite), we can use the Dominated-Convergence Theorem to obtain that

$$\displaystyle{ D\|.\|_{T}(q) \cdot p =\int _{X}\lim \limits _{t\rightarrow 0}\frac{\vert q + tp\vert -\vert q\vert } {t} =\int _{X}\text{Re}\left ( \frac{\overline{q}} {\vert q\vert }p\right ) }$$
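The formula of Remark 9 can be tested numerically on a crude discretization. In the Python sketch below, the “surface” is replaced by N sample points of equal weight, and the sample values of q and p are arbitrary choices of ours (with q nowhere vanishing); this is only a sanity check of the pointwise identity \(\frac{d}{dt}\vert q + tp\vert \big \vert _{t=0} = \text{Re}(\overline{q}p/\vert q\vert )\), not a computation on an actual Riemann surface.

```python
import cmath
import math


def l1_norm(values, weight):
    """Discrete analogue of the integral of |q| over X."""
    return sum(abs(v) for v in values) * weight


def derivative_formula(q, p, weight):
    """Discrete analogue of the integral of Re(conj(q) * p / |q|) over X."""
    return sum((v.conjugate() / abs(v) * w).real for v, w in zip(q, p)) * weight


N = 50
weight = 1.0 / N
# Arbitrary sample values; |q_k| >= 1, so q never vanishes.
q = [(1 + k / 10) * cmath.exp(1j * k) for k in range(N)]
p = [complex(math.cos(2 * k), math.sin(3 * k)) for k in range(N)]

t = 1e-6
q_plus_tp = [v + t * w for v, w in zip(q, p)]
finite_difference = (l1_norm(q_plus_tp, weight) - l1_norm(q, weight)) / t
exact = derivative_formula(q, p, weight)
```

The two numbers `finite_difference` and `exact` agree up to an error of order t, as expected from a first-order difference quotient.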

The Weil–Petersson metric ∥. ∥ WP is induced by the Hermitian inner product

$$\displaystyle{ \langle q_{1},q_{2}\rangle _{WP}:=\int _{X}\frac{\overline{q_{1}}q_{2}} {\rho ^{2}} }$$

As usual, the real part \(g_{WP}:= \text{Re}\langle \cdot,\cdot \rangle _{WP}\) induces a real inner product (also inducing the WP metric), while the imaginary part \(\omega _{WP}:= \text{Im}\langle \cdot,\cdot \rangle _{WP}\) induces a symplectic form (i.e., an antisymmetric bilinear form).

By definition, the Weil–Petersson metric \(g_{WP}\) relates to the Weil–Petersson symplectic form \(\omega _{WP}\) and the complex structure J on Teich(S) (i.e., multiplication by i of elements of Q(X)) via:

$$\displaystyle{ g_{WP}(q_{1},q_{2}) =\omega _{WP}(q_{1},Jq_{2}) }$$

Furthermore, as was first discovered by Weil [Weil] by means of a “simple-minded calculation” (“calcul idiot”) and later confirmed by others, it is possible to show that the Weil–Petersson metric is Kähler, i.e., the Weil–Petersson symplectic form \(\omega _{WP}\) is closed (that is, its exterior derivative vanishes: \(d\omega _{WP} = 0\)). See, e.g., Sect. 7.7 of Hubbard’s book for more details.

We will come back later (in Sect. 6.3) to the Kähler property of the WP metric, but for now let us just mention that this property enters into the proof of a beautiful theorem of Wolpert [Wo83] saying that the Weil–Petersson symplectic form has a simple expression in terms of Fenchel–Nielsen coordinates:

$$\displaystyle{ \omega _{WP} = \frac{1} {2}\sum \limits _{\alpha \in P}d\ell_{\alpha } \wedge d\tau _{\alpha } }$$

where P is an arbitrary pants decomposition of S. Here, it is worth mentioning that an important step in the proof of this formula (cf. Step 2 in the proof of Theorem 7.8.1 in Hubbard’s book [Hu]) is the fact discovered by Wolpert that the infinitesimal generator \(\partial /\partial \tau _{\alpha }\) of the Dehn twist about α is the symplectic gradient of the Hamiltonian function \(\frac{1} {2}\ell_{\alpha }\), that is,

$$\displaystyle{ \frac{1} {2}d\ell_{\alpha } =\omega _{WP}(.,\partial /\partial \tau _{\alpha })\quad (i.e.,\ \text{grad}\,\ell_{\alpha } = -2J(\partial /\partial \tau _{\alpha })) }$$

This equation is the starting point of several of Wolpert’s expansion formulas for the Weil–Petersson metric that we will discuss later in this text.

Before proceeding further, let us briefly discuss the Teichmüller and WP metrics on the moduli space of once-punctured tori \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\).

Example 5

The Teichmüller metric on \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\) is the quotient of the hyperbolic metric \(\rho (z) = \frac{\vert dz\vert } {\vert \text{Im}(z)\vert }\) of \(\mathbb{H}\).

On the other hand, the Fenchel–Nielsen coordinates \((\ell,\tau )\) on \(Teich_{1,1}\) have first-order expansion

$$\displaystyle{ \ell(z) \sim \frac{1} {\text{Im}(z)} = \frac{1} {y}\quad \text{and}\quad \tau (z) \sim \frac{\text{Re}(z)} {\text{Im}(z)} = \frac{x} {y} }$$

where z = x + iy. Thus, we see from Wolpert’s formula that

$$\displaystyle{ \omega _{WP} = \frac{1} {2}d\ell \wedge d\tau \sim \frac{1} {2}\left (-\frac{1} {y^{2}}dy\right ) \wedge \left (\frac{1} {y}dx - \frac{x} {y^{2}}dy\right ) = \frac{1} {2y^{3}}dx \wedge dy = \frac{i} {4\,\text{Im}(z)^{3}}dz \wedge d\overline{z}. }$$

Since the complex structure on \(Teich_{1,1}\) is the standard complex structure of \(\mathbb{H}\), we see that, up to a multiplicative constant, the Weil–Petersson metric \(g_{WP}\) has asymptotic expansion
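The wedge product \(d\ell \wedge d\tau\) is the Jacobian determinant of the change of variables \((x,y)\mapsto (\ell,\tau )\) times \(dx \wedge dy\), so the factor \(1/y^{3}\) can be double-checked numerically. The Python sketch below takes the first-order expansions \(\ell = 1/y\) and \(\tau = x/y\) as exact (an assumption made only for this check) and uses central finite differences; the function names and the step size h are ours.

```python
def fn_coordinates(x, y):
    """Model Fenchel-Nielsen coordinates (l, tau) = (1/y, x/y) near the cusp."""
    return 1.0 / y, x / y


def jacobian_determinant(x, y, h=1e-6):
    """Central-difference approximation of det d(l, tau)/d(x, y)."""
    l_xp, t_xp = fn_coordinates(x + h, y)
    l_xm, t_xm = fn_coordinates(x - h, y)
    l_yp, t_yp = fn_coordinates(x, y + h)
    l_ym, t_ym = fn_coordinates(x, y - h)
    dl_dx = (l_xp - l_xm) / (2 * h)
    dt_dx = (t_xp - t_xm) / (2 * h)
    dl_dy = (l_yp - l_ym) / (2 * h)
    dt_dy = (t_yp - t_ym) / (2 * h)
    return dl_dx * dt_dy - dl_dy * dt_dx


# At (x, y) = (0.3, 2.0) the determinant should be close to 1 / y**3 = 0.125.
det_at_sample_point = jacobian_determinant(0.3, 2.0)
```

Note that the determinant does not depend on x, in agreement with the invariance of the model under the translation \(z\mapsto z + 1\).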

$$\displaystyle{ g_{WP}^{2} \sim \frac{\vert dz\vert ^{2}} {\text{Im}(z)^{3}}, }$$

that is, the Weil–Petersson metric \(g_{WP}\) on the moduli space \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\) near the cusp at infinity is modeled (see Footnote 7) by the surface of revolution obtained by rotating the curve \(v = u^{3}\) (for 0 < u ≤ 1, say).

This is in contrast with the fact that the Teichmüller metric is the hyperbolic metric and hence it is modeled by the surface of revolution obtained by rotating the curve \(v = e^{-u}\) (for 1 < u < ∞, say).

From this asymptotic expansion of \(g_{WP}\), we see that it is incomplete: indeed, a vertical ray to the cusp at infinity starting at a point z in the line \(\text{Im}(z) = y_{0}\) has Weil–Petersson length \(\sim 2y_{0}^{-1/2} \sim 2\ell(z)^{1/2}\). Moreover, the curvature K satisfies \(K(z) \sim -3/(2\ell(z))\), and, in particular, K → −∞ as Im(z) → ∞.
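Both asymptotics follow from a short computation with the model metric \(\vert dz\vert ^{2}/\text{Im}(z)^{3}\). We sketch it here, ignoring multiplicative constants in the expansion of the WP metric and using the standard formula \(K = -\lambda ^{-2}\varDelta \log \lambda\) for the curvature of a conformal metric \(\lambda (z)^{2}\vert dz\vert ^{2}\) (a formula that is not recalled in the text). With \(\lambda = y^{-3/2}\), one has

$$\displaystyle{ \int _{y_{0}}^{\infty }y^{-3/2}\,dy = 2y_{0}^{-1/2},\qquad \varDelta \log \lambda = -\frac{3} {2}\varDelta \log y = \frac{3} {2y^{2}},\qquad K = -y^{3} \cdot \frac{3} {2y^{2}} = -\frac{3} {2}y. }$$

Since the length coordinate satisfies \(\ell(z) \sim 1/y\), the first identity gives a length \(\sim 2\ell^{1/2}\) and the last one gives a curvature \(\sim -3/(2\ell)\).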

The previous example (WP metric on \(\mathcal{M}_{1,1}\)) already contains several features of the WP metric on general moduli spaces \(\mathcal{M}_{g,n}\). For example, we will see later that the Weil–Petersson metric is incomplete because it is possible to shrink a simple closed curve α to a point and leave Teichmüller space along a Weil–Petersson geodesic in time \(\sim \ell_{\alpha }^{1/2}\). Also, some sectional curvatures might approach −∞ as one leaves Teichmüller space.

Nevertheless, an interesting feature of the Weil–Petersson metric in \(Teich_{g,n}\) and \(\mathcal{M}_{g,n}\) for 3g − 3 + n > 1 not occurring in the case of \(\mathcal{M}_{1,1}\) is the fact that some sectional curvatures might also approach 0 as one leaves Teichmüller space. Indeed, as we will see later, this happens because the “boundary” of \(\mathcal{M}_{g,n}\) is sufficiently “large” when 3g − 3 + n > 1 so that it is possible for some Weil–Petersson geodesics to travel “almost parallel” to certain parts of the “boundary” for a certain time (while the same is not possible for \(\mathcal{M}_{1,1}\) because the “boundary” consists of a single point).

Concluding this subsection, let us mention that our main dynamical object in these notes—the Weil–Petersson geodesic flow—is simply the geodesic flow induced by the WP metric on the unit cotangent bundle to \(\mathcal{M}_{g,n}\).

6.2.8 Ergodicity of WP Flow: Outline of Proof Revisited

By the end of Sect. 6.1.2 above, we mentioned that the proof of the Burns–Masur–Wilkinson theorem of ergodicity of the WP geodesic flow (Theorem 1) can be essentially reduced to showing that the WP metric satisfies the six conditions of the Burns–Masur–Wilkinson ergodicity criterion for geodesic flows (Theorem 3).

Indeed, at first sight, it is tempting to say that Theorem 1 follows from Theorem 3 after checking items (I) to (VI) of the latter theorem for the case \(M = T^{1}Teich_{g,n}\) (the unit cotangent bundle of \(Teich_{g,n}\)), \(N = T^{1}\mathcal{M}_{g,n}\) (the unit cotangent bundle of \(\mathcal{M}_{g,n}\)) and \(\varGamma = MCG_{g,n}\) (the mapping-class group).

However, a closer inspection of the statement of the ergodicity criterion (Theorem 3) reveals that this is not quite true: the moduli spaces \(\mathcal{M}_{g,n}\) and their unit cotangent bundles \(N = T^{1}\mathcal{M}_{g,n}\) are not manifolds but only orbifolds, while the ergodicity criterion (Theorem 3) assumes that the phase space N of the geodesic flow is a manifold.

In other words, the orbifoldic nature of moduli spaces imposes a technical difficulty in the reduction of Theorem 1 to Theorem 3. Fortunately, a solution to this technical issue is well-known to algebraic geometers and it consists of taking an adequate finite cover of the moduli space in order to “kill” the orbifold points (i.e., points with large stabilizers for the mapping-class group).

More precisely, for each \(k \in \mathbb{N}\), one considers the following finite-index subgroup of the mapping-class group MCG(S):

$$\displaystyle{ MCG(S)[k] =\{\varphi \in MCG(S):\varphi _{{\ast}} = \text{Id}\text{ acting on }H_{1}(S, \mathbb{Z}/k\mathbb{Z})\} }$$

where \(\varphi _{{\ast}}\) is the action on homology of φ. Equivalently, an element φ of MCG(S) belongs to MCG(S)[k] whenever its action \(\varphi _{{\ast}}\) on the absolute homology group \(H_{1}(S, \mathbb{Z})\) corresponds to a (symplectic) integral 2g × 2g matrix congruent to the identity matrix modulo k.

Example 6

In the case of once-punctured tori, the mapping-class group is \(MCG_{1,1} = SL(2, \mathbb{Z})\) and

$$\displaystyle{ MCG_{1,1}[k] = \left \{\left (\begin{array}{cc} a&b\\ c &d \end{array} \right ) \in SL(2, \mathbb{Z}): a \equiv d \equiv 1(\text{mod }k),b \equiv c \equiv 0(\text{mod }k)\right \} }$$

In the literature, \(MCG_{1,1}[k]\) is called the principal congruence subgroup of \(SL(2, \mathbb{Z})\) of level k.
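Membership in the principal congruence subgroup of level k is a finite check on the four entries of the matrix. A minimal Python sketch follows; the function name is ours and matrices are represented as nested tuples of integers.

```python
def in_principal_congruence_subgroup(matrix, k):
    """Decide whether a matrix of SL(2, Z) is congruent to the identity modulo k."""
    (a, b), (c, d) = matrix
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    return (a - 1) % k == 0 and (d - 1) % k == 0 and b % k == 0 and c % k == 0


T = ((1, 1), (0, 1))
T_cubed = ((1, 3), (0, 1))
minus_identity = ((-1, 0), (0, -1))
```

With these definitions, `T_cubed` belongs to the subgroup of level 3 while `T` does not, and `minus_identity` belongs to the subgroup of level 2 but not to the subgroup of level 3.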

Remark 10

The index of \(MCG_{g,n}[k]\) in \(MCG_{g,n}\) can be computed explicitly. For instance, the natural map from \(MCG_{g}\) to \(Sp(2g, \mathbb{Z})\) is surjective (see, e.g., Farb–Margalit’s book), so that the index of \(MCG_{g}[k]\) is the cardinality of \(Sp(2g, \mathbb{Z}/k\mathbb{Z})\), and, for k = p prime, one has

$$\displaystyle{ \#Sp(2g, \mathbb{Z}/k\mathbb{Z}) = p^{g^{2} }(\,p^{2} - 1)(\,p^{4} - 1)\ldots (\,p^{2g} - 1) = p^{2g^{2}+g } + O(\,p^{2g^{2}+g-2 }), }$$

cf. Dickson’s paper [Di].
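The order formula is easy to evaluate, and for g = 1 it can be compared with a brute-force count, since \(Sp(2, \mathbb{Z}/p\mathbb{Z}) = SL(2, \mathbb{Z}/p\mathbb{Z})\). The Python sketch below uses function names of our own; the brute-force enumeration is only practical for small primes p.

```python
from itertools import product


def symplectic_group_order(g, p):
    """Order of Sp(2g, Z/pZ) for p prime: p^(g^2) * (p^2 - 1) * ... * (p^(2g) - 1)."""
    order = p ** (g * g)
    for i in range(1, g + 1):
        order *= p ** (2 * i) - 1
    return order


def brute_force_sl2_order(p):
    """Count the 2x2 matrices over Z/pZ with determinant 1."""
    return sum(
        1
        for a, b, c, d in product(range(p), repeat=4)
        if (a * d - b * c) % p == 1
    )
```

For p = 3 both functions return 24, which is therefore the index of \(MCG_{1,1}[3]\) in \(MCG_{1,1} = SL(2, \mathbb{Z})\).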

It was shown by Serre (see [Se] for the original proof or Farb–Margalit’s book [FaMa] for an alternative exposition) that MCG(S)[k] is torsion-free for k ≥ 3 and, a fortiori, it acts freely and properly discontinuously on Teich(S) for k ≥ 3. In other terms, the finite cover of \(\mathcal{M}(S) = Teich(S)/MCG(S)\) given by

$$\displaystyle{ \mathcal{M}(S)[k] = Teich(S)/MCG(S)[k] }$$

is a manifold for k ≥ 3.

Remark 11

Serre’s result is sharp: the principal congruence subgroup \(MCG_{1,1}[2]\) of level 2 of \(SL(2, \mathbb{Z})\) contains the torsion element − Id.

Once one has at one’s disposal an appropriate manifold \(\mathcal{M}(S)[3]\) finitely covering the moduli space \(\mathcal{M}(S)\), the reduction of Theorem 1 to Theorem 3 consists of two steps:

  (a) the verification of items (I) to (VI) in the statement of Theorem 3 in the case of the unit cotangent bundle \(N = T^{1}\mathcal{M}(S)[3]\) of \(\mathcal{M}(S)[3]\);

  (b) the deduction of the ergodicity (and mixing, Bernoullicity, and positivity and finiteness of measure-theoretic entropy) of the Weil–Petersson geodesic flow on \(T^{1}\mathcal{M}(S)\) from the corresponding fact(s) for the Weil–Petersson geodesic flow on \(T^{1}\mathcal{M}(S)[3]\).

For the remainder of this section, we will discuss item (b) while leaving item (a) (i.e., items (I) to (VI) of Theorem 3 for \(N = T^{1}\mathcal{M}(S)[3]\)) for the next section.

For ease of notation, we will denote \(Teich(S) =\mathcal{ T}\), \(\mathcal{M}(S) =\mathcal{ M}\) and \(\mathcal{M}(S)[3] =\mathcal{ M}[3]\). Assuming that the Weil–Petersson flow is ergodic (and Bernoulli, and its measure-theoretic entropy is positive and finite) on \(T^{1}\mathcal{M}[3]\), the “obstruction” to showing the same fact(s) for the Weil–Petersson flow on \(T^{1}\mathcal{M}\) is the possibility that the orbifold points of \(\mathcal{M}\) form a “large” set.

Indeed, if we can show that the set of orbifold points of \(\mathcal{M}\) is “small” (e.g., they form a set of zero measure), then the geodesic flow on \(T^{1}\mathcal{M}[3]\) covers the geodesic flow on \(T^{1}\mathcal{M}\) on a set of full measure. In particular, if E is a (Weil–Petersson flow) invariant set of positive measure on \(T^{1}\mathcal{M}\), then its lift \(\widetilde{E}\) to \(T^{1}\mathcal{M}[3]\) is also a (Weil–Petersson flow) invariant set of positive measure. Therefore, by the ergodicity of the Weil–Petersson flow on \(T^{1}\mathcal{M}[3]\), we have that \(\widetilde{E}\) has full measure, and, a fortiori, E has full measure. Moreover, the fact that the Weil–Petersson flow on \(T^{1}\mathcal{M}[3]\) covers the Weil–Petersson flow on \(T^{1}\mathcal{M}\) on a full measure set also allows one to deduce Bernoullicity and positivity and finiteness of measure-theoretic entropy of the latter flow from the corresponding properties for the former flow.

At this point, this subsection is complete once we check that the orbifold points of \(\mathcal{M}(S)\) form a subset of zero measure (for the Liouville/volume measure of the Weil–Petersson metric). This is an immediate consequence of the following lemma:

Lemma 1

Let F be the subset of Teich(S) corresponding to orbifoldic points, i.e., F is the (countable) union of the subsets F(h) of fixed points of the natural action on Teich(S) of all elements h ∈ MCG(S) of finite order, excluding the genus 2 hyperelliptic involution. Then, F is a closed subset of real codimension ≥ 2.

Proof

For each h ∈ MCG(S) of finite order, F(h) is the Teichmüller space of the quotient orbifold X∕〈h〉. From this, one can show that:

  • if S is compact and h is not the hyperelliptic involution in genus 2, then F(h) has complex dimension ≤ 3g − 5;

  • if S has punctures, then F(h) has complex dimension ≤ 3g − 4;

  • if h is the hyperelliptic involution in genus 2, then F(h) = Teich(S).

See, e.g., Rauch’s paper [Ra] for more details.

In particular, the proof of the lemma is complete once we verify that F is a locally finite union of the real codimension ≥ 2 subsets F(h), h ∈ MCG(S).

Keeping this goal in mind, we fix a compact subset K of Teich(S) and we recall that the mapping-class group MCG(S) acts in a properly discontinuous manner on Teich(S). Since any h with \(F(h) \cap K\neq \varnothing\) fixes a point of K and, hence, satisfies \(h(K) \cap K\neq \varnothing\), it is not possible for an infinite sequence \((h_{n})_{n\in \mathbb{N}} \subset MCG(S)\) of distinct finite order elements to satisfy \(F(h_{n}) \cap K\neq \varnothing\) for all \(n \in \mathbb{N}\). In other words, F ∩ K is contained in the union of finitely many F(h), i.e., F is a locally finite union of F(h), h ∈ MCG(S).

Example 7

In the case of once-punctured tori, the subset \(F \subset Teich_{1,1}\) consists of the \(SL(2, \mathbb{Z})\)-orbits of the points \(i \in \mathbb{H}\) and \(j =\exp (2\pi i/3) \in \mathbb{H}\).

6.3 Geometry of the Weil–Petersson Metric

This section is devoted to the verification of items (I) to (VI) of the Burns–Masur–Wilkinson ergodicity criterion (Theorem 3) in the context of the Weil–Petersson metric on Teich(S) and \(\mathcal{M}(S)[3]\). In other words, as explained in Sect. 6.2.8 above, this section covers (some of) the Teichmüller-theoretical aspects of the proof of the Burns–Masur–Wilkinson theorem on the ergodicity of the WP geodesic flow on moduli spaces (Theorem 1), assuming the Burns–Masur–Wilkinson ergodicity criterion (Theorem 3).

6.3.1 Items (I) and (II) of Theorem 3 for WP Metric

The item (I) in the statement of Theorem 3 in the context of the Weil–Petersson metric (i.e., the geodesic convexity of the WP metric on Teich(S)) was proved by Wolpert [Wo08], but we will not discuss this topic here, in order to leave room for comments on other aspects of the geometry of the WP metric.

Next, let us discuss the item (II) of Theorem 3 in the context of the WP metric, that is, the compactness of the metric completions of moduli spaces \(\mathcal{M}(S)\) equipped with WP metrics.

We start by recalling that the metric completion of the Teichmüller space Teich(S) with respect to the WP metric was determined by Masur [Masur]. Indeed, Masur exploited the fact that we can leave Teich(S) along a WP geodesic in finite time of order \(\sim \ell_{\alpha }^{1/2}\) by pinching a closed geodesic α of hyperbolic length \(\ell_{\alpha }\) to show that the WP metric completion of Teich(S) is the so-called augmented Teichmüller space \(\overline{Teich}(S)\).

The augmented Teichmüller space \(\overline{Teich}(S)\) is a stratified space obtained by adjoining lower-dimensional Teichmüller spaces of noded Riemann surfaces. The combinatorial structure of the stratification of \(\overline{Teich}(S)\) is encoded by the curve complex \(\mathcal{C}(S)\) (sometimes also called complex of curves or graph of curves).

More precisely, the curve complex \(\mathcal{C}(S)\) is a (3g − 4 + n)-dimensional simplicial complex defined as follows. The vertices of \(\mathcal{C}(S)\) are homotopy classes of homotopically nontrivial, nonperipheral, simple closed curves on S. We put an edge between two vertices whenever the corresponding homotopy classes have disjoint representatives. In general, a k-simplex \(\sigma \in \mathcal{ C}(S)\) consists of k + 1 distinct vertices possessing mutually disjoint representatives.

Remark 12

\(\mathcal{C}(S)\) is a (3g − 4 + n)-dimensional simplicial complex because a maximal collection P of distinct vertices possessing disjoint representatives is a pants decomposition of S and, hence, #P = 3g − 3 + n.
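
The count #P = 3g − 3 + n can be recovered from an Euler characteristic computation. The following is only a sketch, assuming 2g − 2 + n > 0 and writing N for the (auxiliary) number of pairs of pants cut out by P. Each pair of pants has Euler characteristic −1 and three boundary components, each curve of P accounts for two of these boundary components, and each puncture accounts for one:

$$\displaystyle{ -N =\chi (S) = 2 - 2g - n,\qquad 3N = 2\,\#P + n\quad \Longrightarrow \quad \#P = \frac{3(2g - 2 + n) - n} {2} = 3g - 3 + n }$$

For instance, for (g, n) = (2, 0) one gets #P = 3, so that \(\mathcal{C}(S)\) has dimension 2, while for (g, n) = (1, 1) or (0, 4) one gets #P = 1 and \(\mathcal{C}(S)\) has dimension 0.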

Example 8

In the case of once-punctured tori, the curve complex \(\mathcal{C}(S)\) consists of an infinite discrete set of vertices (because there is no pair of disjoint homotopically distinct curves). However, some authors define the curve complex \(\mathcal{C}(S)\) of once-punctured tori by putting an edge between vertices corresponding to curves intersecting minimally (i.e., only once). In this alternative setting, the curve complex of once-punctured tori becomes the Farey graph.

The curve complex \(\mathcal{C}(S)\) is a connected locally infinite complex, except for the cases (g, n) = (0, 4) or (1, 1). Also, the mapping-class group MCG(S) naturally acts on \(\mathcal{C}(S)\). Moreover, Masur–Minsky [MaMi] showed that \(\mathcal{C}(S)\) is a δ-hyperbolic metric space for some δ = δ(S) > 0.

Using the curve complex \(\mathcal{C}(S)\), we can define the augmented Teichmüller space \(\overline{Teich}(S)\) as follows.

A noded Riemann surface is a compact topological surface equipped with the structure of a complex space with at most isolated singularities, called nodes, such that each of these singularities possesses a neighborhood biholomorphic to a neighborhood of (0, 0) in the singular curve

$$\displaystyle{ \{(z,w) \in \mathbb{C}^{2}: zw = 0\} }$$

Removing the nodes of a noded Riemann surface Y yields a possibly disconnected Riemann surface denoted by \(\widehat{Y }\). The connected components of \(\widehat{Y }\) are called the pieces of Y.

For example, the noded Riemann surface of genus g of the figure below has two pieces (of genera g − 1 and 1 resp.).

Given a simplex \(\sigma \in \mathcal{ C}(S)\), we will adjoin a Teichmüller space \(\mathcal{T}_{\sigma }\) to Teich(S) in the following way. A marked noded Riemann surface with nodes at σ is a noded Riemann surface \(X_{\sigma }\) equipped with a continuous map \(f: S \rightarrow X_{\sigma }\) such that the restriction of f to \(S -\sigma\) is a homeomorphism onto \(\widehat{X_{\sigma }}\). We say that two marked noded Riemann surfaces \(f: S \rightarrow X_{\sigma }^{1}\) and \(g: S \rightarrow X_{\sigma }^{2}\) are Teichmüller equivalent if there exists a biholomorphic node-preserving map \(h: X_{\sigma }^{1} \rightarrow X_{\sigma }^{2}\) such that \(h \circ f\) is isotopic to g. The Teichmüller space \(\mathcal{T}_{\sigma }\) associated to σ is the set of Teichmüller equivalence classes of marked noded Riemann surfaces \(f: S \rightarrow X_{\sigma }\) with nodes at σ.

In this context, the augmented Teichmüller space is

$$\displaystyle{ \overline{Teich}(S) = Teich(S) \cup \bigcup \limits _{\sigma \in \mathcal{C}(S)}\mathcal{T}_{\sigma } }$$

The topology on \(\overline{Teich}(S)\) is given by the following neighborhoods of points \(f: S \rightarrow X_{\sigma }\). Given \(\sigma \in \mathcal{ C}(S)\), we consider a maximal simplex P (pants decomposition of S) containing σ and we let \((\ell_{\alpha },\tau _{\alpha })_{\alpha \in P}\) be the corresponding Fenchel–Nielsen coordinates on Teich(S). We extend these coordinates by allowing \(\ell_{\alpha } = 0\) whenever α is pinched to a node and we take the quotient by identifying noded Riemann surfaces corresponding to parameters \((\ell_{\alpha },\tau _{\alpha }) = (0,t)\) and \((\ell_{\alpha },\tau _{\alpha }) = (0,t')\) whenever α ∈ σ.

Remark 13

The augmented Teichmüller space \(\overline{Teich}(S)\) is not locally compact: indeed, a neighborhood of a noded Riemann surface allows for arbitrary twists \(\tau _{\alpha }\) corresponding to curves α ∈ σ.

The quotient of \(\overline{Teich}(S)\) by the natural action of MCG(S) (through the corresponding action on \(\mathcal{C}(S)\)) is the so-called Deligne–Mumford compactification \(\overline{\mathcal{M}}(S) = \overline{Teich}(S)/MCG(S)\) of the moduli space \(\mathcal{M}(S)\). The space \(\overline{\mathcal{M}}(S)\) was originally introduced by Deligne–Mumford [DM] and, as the nomenclature suggests, \(\overline{\mathcal{M}}(S)\) is compact (see also Hubbard–Koch’s paper [HuKo] for more details).

Since \(\overline{Teich}(S)\) is the metric completion of Teich(S) with respect to the WP metric and MCG(S)[k] is a finite-index subgroup of MCG(S), it follows from the compactness of \(\overline{\mathcal{M}}(S)\) that the metric completion \(\overline{Teich}(S)/MCG(S)[k]\) of \(\mathcal{M}(S)[k]\) with respect to the WP metric is also compact (because it is a finite cover of \(\overline{\mathcal{M}}(S)\)).

In particular, \(\mathcal{M}(S)[3]\) satisfies the item (II) in the statement of Theorem 3.

Remark 14

It is worth noticing that the boundary of the Deligne–Mumford compactification in the case of once-punctured tori is just one pointFootnote 8 while it is stratified into nontrivial lower-dimensional moduli spaces in general. Moreover, as we will see later, some asymptotic formulas of Wolpert tell us that the WP metric “looks” like a product of the WP metrics on these lower-dimensional moduli spaces.

In particular, as we will discuss in the last section of this text, some WP geodesics travel “almost parallel” to these lower-dimensional moduli spaces for a long time, and this forces the rate of mixing of this flow to be at most polynomial in general. On the other hand, since it is not possible to travel almost parallel to a point for a long time, this argument breaks down for the WP metric on the moduli space of once-punctured tori.

6.3.2 Item (III) of Theorem 3 for WP Metric

Let us now quickly check that \(\mathcal{M}(S)[3]\) also satisfies the item (III) in the statement of Theorem 3, i.e., its boundary \(\partial \mathcal{M}(S)[3]\) is volumetrically cusp-like.

In this direction, given X ∈ Teich(S), let us denote by \(\rho _{0}(X)\) the Weil–Petersson distance between X and \(\partial Teich(S):= \overline{Teich}(S) - Teich(S)\). Our current task is to prove that there are constants C > 0 and ν > 0 such that

$$\displaystyle{ \text{vol}(E_{\rho }) \leq C\rho ^{2+\nu } }$$

where \(E_{\rho }:=\{ X \in Teich(S)/MCG(S)[3]:\rho _{0}(X) \leq \rho \}\).

As we are going to see now, one can actually take ν = 2 in the estimate above thanks to some asymptotic formulas of Wolpert for the Weil–Petersson metric near the boundary \(\partial \mathcal{T} =\bigcup \limits _{\sigma \in \mathcal{C}(S)}\mathcal{T}_{\sigma }\) of augmented Teichmüller space.

Lemma 2

One has \(\text{vol}(E_{\rho }) \simeq \rho ^{4}\).

Proof

It was shown by Wolpert (on page 284 of [Wo08]) that the Weil–Petersson metric \(g_{WP}\) has asymptotic expansion

$$\displaystyle{ g_{WP} \sim \sum \limits _{\alpha \in \sigma }(4\,dx_{\alpha }^{2} + x_{\alpha }^{6}d\tau _{\alpha }^{2}) }$$

near \(\mathcal{T}_{\sigma }\), where \(x_{\alpha } =\ell_{ \alpha }^{1/2}/\sqrt{2\pi ^{2}}\) and \(\ell_{\alpha }\), \(\tau _{\alpha }\) are the Fenchel–Nielsen coordinates associated to α ∈ σ.

This gives that the volume element \(\sqrt{ \det (g_{WP})}\) of the Weil–Petersson metric near \(\mathcal{T}_{\sigma }\) is \(\sim \prod \limits _{\alpha \in \sigma }x_{\alpha }^{3}\). Furthermore, this asymptotic expansion of \(g_{WP}\) also says that the distance \(\rho _{0}(X)\) between X and \(\mathcal{T}_{\sigma }\) is comparable to \(\min _{\alpha \in \sigma }x_{\alpha }(X)\). By putting these two facts together, we see that

$$\displaystyle{ \text{vol}(E_{\rho }) \simeq \rho ^{4} }$$

This proves the lemma.
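
The computation behind the last step can be sketched in the simplest case where σ consists of a single curve α. We make the simplifying assumptions (not spelled out above) that, after passing to the quotient, the twist parameter \(\tau _{\alpha }\) ranges over an interval of bounded length T, that the remaining coordinates range over a set of bounded volume, and that \(\rho _{0}\) is replaced by the comparable quantity \(x_{\alpha }\). The model metric \(4\,dx_{\alpha }^{2} + x_{\alpha }^{6}d\tau _{\alpha }^{2}\) has volume element \(2x_{\alpha }^{3}\,dx_{\alpha }\,d\tau _{\alpha }\), so that

$$\displaystyle{ \text{vol}(\{x_{\alpha } \leq \rho \}) \simeq \int _{0}^{T}\int _{ 0}^{\rho }2x^{3}\,dx\,d\tau = \frac{T} {2} \rho ^{4} }$$

This is consistent with the choice ν = 2 in the bound \(\text{vol}(E_{\rho }) \leq C\rho ^{2+\nu }\).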

Remark 15

The properties that \(\overline{\mathcal{M}}(S)\) is compact and \(\mathcal{M}(S)\) is volumetrically cusp-like imply that the Liouville measure (volume) is finite.

Recently, Mirzakhani [Mi13] studied the total mass \(V _{g,n}\) of \(\mathcal{M}(S)\) with respect to the WP metric and she showed that there exists a constant M > 0 such that

$$\displaystyle{ g^{-M} \leq \frac{V _{g,n}} {(4\pi ^{2})^{2g+n-3}(2g + n - 3)!} \leq g^{M} }$$

6.3.3 Item (IV) of Theorem 3 for WP Metric

Recall that the item (IV) of Theorem 3 asks for polynomial bounds in the sectional curvatures and their first two derivatives.

In the context of the Weil–Petersson (WP) metric, the desired polynomial bounds on the sectional curvatures themselves follow from the work of Wolpert.

6.3.3.1 Wolpert’s Formulas for the Curvatures of the WP Metric

We now give an account of some estimates of Wolpert for the behavior of the WP metric near the boundary \(\partial \mathcal{T}\) of the Teichmüller space \(\mathcal{T} = Teich(S)\).

Before stating Wolpert’s formulas, we need an adapted system of coordinates (called combined length basis in the literature) near the strata \(\mathcal{T}_{\sigma }\), \(\sigma \in \mathcal{ C}(S)\), of \(\partial \mathcal{T}\), where \(\mathcal{C}(S)\) is the curve complex of S.

Denote by \(\mathcal{B}\) the set of pairs (“basis”) (σ, χ) where \(\sigma \in \mathcal{ C}(S)\) is a simplex of the curve complex and χ is a collection of simple closed curves such that each β ∈ χ is disjoint from all α ∈ σ. Here, we allow two curves β, β′ ∈ χ to intersect (i.e., one might have \(\beta \cap \beta '\neq \varnothing\)), and the case \(\chi = \varnothing\) is not excluded.

Following the nomenclature introduced by Wolpert, we say that \((\sigma,\chi ) \in \mathcal{ B}\) is a combined length basis at a point \(X \in \mathcal{ T}\) whenever the set of tangent vectors

$$\displaystyle{ \{\lambda _{\alpha }(X),J\lambda _{\alpha }(X),\text{grad}\ell_{\beta }(X)\}_{\alpha \in \sigma,\beta \in \chi } }$$

is a basis of \(T_{X}\mathcal{T}\), where \(\ell_{\gamma }\) is the length parameter in the Fenchel–Nielsen coordinates and \(\lambda _{\alpha }:= \text{grad}\,\ell_{\alpha }^{1/2}\).

Remark 16

The length parameters \(\ell_{\gamma }\) and their square-roots \(\ell_{\gamma }^{1/2}\) are natural for the study of the WP metric: for instance, Wolpert showed that these functions are convex along WP geodesics (see, e.g., Wolpert [Wo08, Wo09a] and Wolf [Wolf12]).

The name combined length basis comes from the fact that we think of (σ, χ) as a combination of a collection \(\sigma \in \mathcal{ C}(S)\) of short curves (indicating the boundary stratum that one is close to), and a collection χ of curves relative to σ allowing one to complete the set \(\{\lambda _{\alpha }\}_{\alpha \in \sigma }\) into a basis of the tangent space to \(\mathcal{T}\) in which one can write nice formulas for the WP metric.

This notion can be “extended” to a stratum \(\mathcal{T}_{\sigma }\) of \(\mathcal{T}\) as follows. We say χ is a relative basis at a point \(X_{\sigma } \in \mathcal{ T}_{\sigma }\) whenever \((\sigma,\chi ) \in \mathcal{ B}\) and the length parameters \(\{\ell_{\beta }\}_{\beta \in \chi }\) form a local system of coordinates for \(\mathcal{T}_{\sigma }\) near \(X_{\sigma }\).

Remark 17

The stratum \(\mathcal{T}_{\sigma }\) is (isomorphic to) a product of the Teichmüller spaces of the pieces of \(X_{\sigma } \in \mathcal{ T}_{\sigma }\). In particular, \(\mathcal{T}_{\sigma }\) carries a “WP metric”, namely, the product of the WP metrics on the Teichmüller spaces of the pieces of \(X_{\sigma }\). In this setting, χ is a relative basis at \(X_{\sigma } \in \mathcal{ T}_{\sigma }\) if and only if \(\{\text{grad}\,\ell_{\beta }\}_{\beta \in \chi }\) is a basis of \(T_{X_{\sigma }}\mathcal{T}_{\sigma }\).

Remark 18

Contrary to the Fenchel–Nielsen coordinates, the length parameters \(\{\ell_{\beta }\}_{\beta \in \chi }\) associated to a relative basis χ might not be a global system of coordinates for \(\mathcal{T}_{\sigma }\). Indeed, this is so because we allow the curves in χ to intersect nontrivially: geometrically, this means that there are points \(X_{0}\) in \(\mathcal{T}_{\sigma }\) where the geodesic representatives of such curves meet orthogonally, and, at such points, the system of coordinates induced by \(\{\ell_{\beta }\}_{\beta \in \chi }\) hits a singularity.

The relevance of the concept of combined length basis to the study of the WP metric is explained by the following theorem of Wolpert [Wo08]:

Theorem 4 (Wolpert)

For any point \(X_{\sigma } \in \mathcal{ T}_{\sigma }\), \(\sigma \in \mathcal{ C}(S)\), there exists a relative length basis χ. Furthermore, the WP metric \(\langle \cdot,\cdot \rangle _{WP}\) can be written as

$$\displaystyle{ \langle \cdot,\cdot \rangle _{WP} \sim \sum \limits _{\alpha \in \sigma }\left ((d\ell_{\alpha }^{1/2})^{2} + (d\ell_{\alpha }^{1/2} \circ J)^{2}\right ) +\sum \limits _{\beta \in \chi }(d\ell_{\beta })^{2} }$$

where the implied comparison constant is uniform in a neighborhood \(U \subset \overline{\mathcal{T}}\) of \(X_{\sigma }\).

In particular, there exists a neighborhood \(V \subset \overline{\mathcal{T}}\) of \(X_{\sigma }\) such that (σ, χ) is a combined length basis at any \(X \in V \cap \mathcal{ T}\).

The statement above is just the beginning of a series of formulas of Wolpert for the WP metric and its sectional curvatures written in terms of the local system of coordinates induced by a combined length basis (σ, χ).

In order to write down the next list of formulas of Wolpert, we need the following notations. Given μ an arbitrary collection of simple closed curves on S, we define

$$\displaystyle{ \underline{\ell}_{\mu }(X):=\min \limits _{\alpha \in \mu }\ell_{\alpha }(X)\quad \text{and}\quad \overline{\ell}_{\mu }(X):=\max \limits _{\alpha \in \mu }\ell_{\alpha }(X) }$$

where \(X \in \mathcal{ T} = Teich(S)\). Also, given a constant c > 1 and a basis \((\sigma,\chi ) \in \mathcal{ B}\), we will consider the following (Bers) region of Teichmüller space:

$$\displaystyle{ \varOmega (\sigma,\chi,c):=\{ X \in \mathcal{ T}: 1/c <\underline{\ell} _{\chi }(X)\text{ and }\overline{\ell}_{\sigma \cup \chi }(X) <c\} }$$

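Wolpert [Wo09] provides several estimates for the WP metric \(\langle \cdot,\cdot \rangle _{WP} =\langle \cdot,\cdot \rangle\) and its sectional curvatures in terms of the basis \(\lambda _{\alpha } = \text{grad}\,\ell_{\alpha }^{1/2}\), α ∈ σ, and \(\text{grad}\,\ell_{\beta }\), β ∈ χ, which are uniform on the regions Ω(σ, χ, c).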

Theorem 5 (Wolpert)

Fix c > 1. Then, for any \((\sigma,\chi ) \in \mathcal{ B}\), and any α, α′ ∈ σ and β, β′ ∈ χ, the following estimates hold uniformly on Ω(σ, χ, c):

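  • \(\langle \lambda _{\alpha },\lambda _{\alpha '}\rangle = \frac{1} {2\pi }\delta _{\alpha,\alpha '} + O((\ell_{\alpha }\ell_{\alpha '})^{3/2}) =\langle J\lambda _{\alpha },J\lambda _{\alpha '}\rangle\), where \(\delta _{\ast,\ast \ast }\) is Kronecker’s delta.

  • \(\langle \lambda _{\alpha },J\lambda _{\alpha '}\rangle =\langle J\lambda _{\alpha },\text{grad}\,\ell_{\beta }\rangle = 0\)

  • \(\langle \text{grad}\,\ell_{\beta },\text{grad}\,\ell_{\beta '}\rangle \sim 1\) and, furthermore, \(\langle \text{grad}\,\ell_{\beta },\text{grad}\,\ell_{\beta '}\rangle\) extends continuously to the boundary stratum \(\mathcal{T}_{\sigma }\).

  • \(\langle \lambda _{\alpha },\text{grad}\,\ell_{\beta }\rangle = O(\ell_{\alpha }^{3/2})\)

  • the distance from X ∈ Ω(σ, χ, c) to the boundary stratum \(\mathcal{T}_{\sigma }\) is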

    $$\displaystyle{ d(X,\mathcal{T}_{\sigma }) = \sqrt{2\pi \sum \limits _{\alpha \in \sigma } \ell_{\alpha }(X)} + O\left (\sum \limits _{\alpha \in \sigma }\ell_{\alpha }^{5/2}(X)\right ) }$$
  • for any vector v ∈ TΩ(σ, χ, c),

    $$\displaystyle{ \left \|\nabla _{v}\lambda _{\alpha } - \frac{3} {2\pi \ell_{\alpha }^{1/2}}\langle v,J\lambda _{\alpha }\rangle J\lambda _{\alpha }\right \|_{WP} = O(\ell_{\alpha }^{3/2}\|v\|_{ WP}) }$$
  • \(\|\nabla _{\lambda _{\alpha }}\mathit{\text{grad}}\ell_{\beta }\|_{WP} = O(\ell_{\alpha }^{1/2})\) and \(\|\nabla _{\lambda _{\alpha }}\mathit{\text{grad}}\ell_{\beta }\|_{WP} = O(\ell_{\alpha }^{1/2})\)

  • \(\nabla _{\mathit{\text{grad}}\ell_{\beta }}\mathit{\text{grad}}\ell_{\beta '}\) extends continuously to the boundary stratum \(\mathcal{T}_{\sigma }\)

  • the sectional curvature of the complex line (real two-plane) \(\{\lambda _{\alpha },J\lambda _{\alpha }\}\) is

    $$\displaystyle{ \langle R(\lambda _{\alpha },J\lambda _{\alpha })J\lambda _{\alpha },\lambda _{\alpha }\rangle = \frac{3} {16\pi ^{2}\ell_{\alpha }} + O(\ell_{\alpha }) }$$
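  • for any quadruple \((v_{1},v_{2},v_{3},v_{4})\), \(v_{i} \in \{\lambda _{\alpha },J\lambda _{\alpha },\text{grad}\,\ell_{\beta }\}_{\alpha \in \sigma,\beta \in \chi }\), distinct from a curvature-preserving permutation of \((\lambda _{\alpha },J\lambda _{\alpha },J\lambda _{\alpha },\lambda _{\alpha })\), one has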

    $$\displaystyle{ \langle R(v_{1},v_{2})v_{3},v_{4}\rangle = O(1), }$$

    and, moreover, each \(v_{i}\) of the form \(\lambda _{\alpha }\) or \(J\lambda _{\alpha }\) introduces a multiplicative factor \(O(\ell_{\alpha })\) in the estimate above.

These estimates of Wolpert give a good understanding of the geometry of the WP metric in terms of combined length bases. For instance, one infers from the last two items above that, as one approaches the boundary stratum \(\mathcal{T}_{\sigma }\), the sectional curvatures of the WP metric along the complex lines \(\{\lambda _{\alpha },J\lambda _{\alpha }\}\) converge to −∞ with speed \(\sim -\ell_{\alpha }^{-1} \sim -d(X,\mathcal{T}_{\sigma })^{-2}\), while the sectional curvatures of the WP metric associated to quadruples of the form \((\lambda _{\alpha },J\lambda _{\alpha },J\lambda _{\alpha '},\lambda _{\alpha '})\) with α, α′ ∈ σ, α ≠ α′, converge to 0 with speed \(O(\ell_{\alpha }^{2}\ell_{\alpha '}^{2}) = O(d(X,\mathcal{T}_{\sigma })^{8})\) at least.
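
The exponents in this discussion can be read off from the distance formula in Theorem 5. As a sketch that keeps only leading-order terms (the shorthand \(d:= d(X,\mathcal{T}_{\sigma })\) is ours), one has

$$\displaystyle{ d^{2} \approx 2\pi \sum \limits _{\alpha \in \sigma }\ell_{\alpha }(X),\quad \text{so that}\quad \ell_{\alpha } \lesssim \frac{d^{2}} {2\pi }\quad \text{and}\quad \ell_{\alpha }^{2}\ell_{\alpha '}^{2} \lesssim \frac{d^{8}} {(2\pi )^{4}} }$$

for all α, α′ ∈ σ. Moreover, when σ = {α} consists of a single curve, the first relation becomes \(\ell_{\alpha }^{-1} \approx 2\pi d^{-2}\).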

In particular, these formulas of Wolpert allow one to show “one third of item (IV) of Theorem 3” for the WP metric, that is,

$$\displaystyle{ \|R_{WP}(x)\|_{WP} \leq Cd(x,\partial \mathcal{T})^{-2} }$$
(6.1)

for all \(x \in \mathcal{ T}\).

Remark 19

Observe that the formulas of Wolpert provide asymmetric information on the sectional curvatures of the WP metric: indeed, while we have precise estimates on how these sectional curvatures can approach −∞, the same is not true for the sectional curvatures approaching zero (where one has lower bounds but no upper bounds for the speed of convergence).

Remark 20

From the discussion above, we see that there are sectional curvatures of the WP metric on Teich(S) approaching zero whenever \(\sigma \in \mathcal{ C}(S)\) contains two distinct curves. In other words, the WP metric has sectional curvatures approaching zero whenever the genus g and the number of punctures n of \(S = S_{g,n}\) satisfy 3g − 3 + n > 1, i.e., except in the cases of once-punctured tori \(S_{1,1}\) and four-times punctured spheres \(S_{0,4}\). This qualitative difference in the geometry of the WP metric on \(Teich_{g,n}\) in the cases 3g − 3 + n > 1 and 3g − 3 + n = 1 (i.e., (g, n) = (0, 4) or (1, 1)) will be important in the last section of this text when we discuss the rates of mixing of the WP geodesic flow.

Remark 21

As it was pointed out by Wolpert [Wo11], these estimates permit one to think of the WP metric on the moduli space \(\mathcal{M}_{1,1} \simeq \mathbb{H}^{2}/PSL(2, \mathbb{Z})\) in an ε-neighborhood of the cusp at infinity as a \(C^{2}\)-perturbation of the metric \(\pi ^{3}(4dr^{2} + r^{6}d\theta ^{2})\) of the surface of revolution of the profile \(\{y = x^{3}\}\) modulo multiplicative factors of the form \(1 + O(r^{4})\).
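
As a consistency check on this model (a sketch that ignores the perturbation terms and the factors \(1 + O(r^{4})\), and reads the model metric as \(\pi ^{3}(4dr^{2} + r^{6}d\theta ^{2})\) with θ an angular coordinate), recall that a metric of the form \(ds^{2} + f(s)^{2}d\theta ^{2}\) has Gaussian curvature \(K = -f''(s)/f(s)\). The substitution \(s = 2\pi ^{3/2}r\), where s is the distance to the cusp r = 0, gives

$$\displaystyle{ \pi ^{3}(4dr^{2} + r^{6}d\theta ^{2}) = ds^{2} + \left ( \frac{s^{3}} {8\pi ^{3}}\right )^{2}d\theta ^{2},\qquad K = -\frac{6} {s^{2}} = - \frac{3} {2\pi ^{3}r^{2}} }$$

In other words, the curvature of the model blows up like minus the inverse square of the distance to the cusp, in agreement with the rate \(\sim -d(X,\mathcal{T}_{\sigma })^{-2}\) obtained from Wolpert’s formulas for the complex lines \(\{\lambda _{\alpha },J\lambda _{\alpha }\}\).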

Now, we will investigate the remaining “two thirds of item (IV) of Theorem 3” for the WP metric, i.e., polynomial bounds for the first two derivatives ∇R and \(\nabla ^{2}R\) of the curvature operator R of the WP metric.

6.3.3.2 Bounds for the First Two Derivatives of WP Metric: Overview

As it was recently pointed out to us by Wolpert (in a private communication), it is possible to deduce good bounds for the derivatives of the WP metric (and its curvature tensor) by refining the formulas for the WP metric in some of his works.

Nevertheless, by the time Burns–Masur–Wilkinson’s paper [BMW] was written, it was not clear at all that Wolpert’s delicate calculations for the WP metric could be extended to provide useful information about the derivatives of this metric.

For this reason, Burns–Masur–Wilkinson decided to implement the following alternative strategy.

At first sight, our task is reminiscent of the setting of Cauchy’s inequality in Complex Analysis, where one estimates the derivatives of a holomorphic function in terms of given bounds for the \(C^{0}\)-norm of this function via the Cauchy integral formula. In fact, our current goal is to estimate the first two derivatives of a “function” (actually, the curvature tensor of the WP metric) defined on the complex-analytic manifold Teich(S) knowing that this “function” already has nice bounds (cf. Eq. (6.1)).

However, one cannot apply the argument described in the previous paragraph directly to the curvature tensor of the WP metric because this metric is only a real-analytic (but not a complex-analytic/holomorphic) object on the complex-analytic manifold Teich(S).

Fortunately, as it was observed by Burns–Masur–Wilkinson, this idea of using the Cauchy inequalities can still be shown to work after one adds some results of McMullen [McM00] into the picture. In a nutshell, McMullen showed that the WP metric is closely related to a holomorphic object: roughly speaking, using the so-called Bers simultaneous uniformization theorem, one can think of the Teichmüller space Teich(S) as a totally real submanifold of the so-called quasi-Fuchsian locus QF(S), and, in this setting, the Weil–Petersson symplectic 2-form \(\omega _{WP}\) is the restriction to Teich(S) of the differential of a holomorphic 1-form \(\theta _{WP}\) globally defined on the quasi-Fuchsian locus QF(S). In particular, it is possible to apply Cauchy’s inequalities to the holomorphic object \(\theta _{WP}\) to get some estimates for the first two derivatives of the WP metric.

Remark 22

A caricature of the previous paragraph is the following. We want to estimate the first two derivatives of a real-analytic function \(f: \mathbb{R} \rightarrow \mathbb{C}\) (“WP metric”) knowing some bounds for the values of f. In principle, we cannot do this by simply applying Cauchy’s estimates to f, but in our context we know (“by the results of McMullen”) that the natural embedding \(\mathbb{R} \subset \mathbb{C} = \mathbb{R} \oplus i\mathbb{R}\) of \(\mathbb{R}\) as a totally real submanifold of \(\mathbb{C}\) allows us to think of \(f: \mathbb{R} \rightarrow \mathbb{C}\) as the restriction of a holomorphic function \(g: \mathbb{C} \rightarrow \mathbb{C}\) and, thus, we can apply Cauchy inequalities to g to get some estimates for f.
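
For reference, the classical estimate invoked in this caricature reads as follows (a standard statement, written with the auxiliary notation r for the radius of a disk on which g is controlled): if g is holomorphic on a neighborhood of the closed disk \(\{z \in \mathbb{C}: \vert z - x\vert \leq r\}\), then

$$\displaystyle{ \vert g^{(k)}(x)\vert \leq \frac{k!} {r^{k}}\sup \limits _{\vert z-x\vert =r}\vert g(z)\vert,\quad k = 1,2 }$$

Thus, for \(x \in \mathbb{R}\), bounds on g over complex disks centered at x yield bounds on f′(x) = g′(x) and f″(x) = g″(x). Note that this requires control of g on the whole circle of radius r around x, and not merely control of f on the real axis.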

In what follows, we will explain the “Cauchy inequality idea” of Burns–Masur–Wilkinson in two steps. Firstly, we will describe the embedding of Teich(S) into the quasi-Fuchsian locus QF(S) and the holomorphic 1-form \(\theta _{WP}\) of McMullen whose differential restricts to the WP symplectic 2-form on Teich(S). After that, we will show how the Cauchy inequalities can be used to give the remaining “two thirds of item (IV) of Theorem 3” for the WP metric.

6.3.3.3 Quasi-Fuchsian Locus QF(S) and McMullen’s 1-Forms \(\theta _{WP}\)

Given a hyperbolic Riemann surface \(S = \mathbb{H}/\varGamma\), \(\varGamma <PSL(2, \mathbb{R})\), the quasi-Fuchsian locus QF(S) is defined as

$$\displaystyle{ QF(S) = Teich(S) \times Teich(\overline{S}) }$$

where \(\overline{S}\) is the conjugate Riemann surface of S, i.e., \(\overline{S}\) is the quotient \(\overline{S} = \mathbb{L}/\varGamma\) of the lower half-plane \(\mathbb{L} =\{ z \in \mathbb{C}: \text{Im}(z) <0\}\) by Γ. The Fuchsian locus F(S) is the image of Teich(S) under the antidiagonal embedding

$$\displaystyle{ \hat{\alpha }: Teich(S) \rightarrow QF(S),\quad \hat{\alpha }(X) = (X,\overline{X}) }$$

Geometrically, we can think of elements (X, Y ) ∈ QF(S) as follows. Recall that X and Y are related to S and \(\overline{S}\) via (extremal) quasiconformal mappings determined by the solutions of Beltrami equations associated to Γ-invariant Beltrami differentials (coefficients) \(\mu _{X}\) and \(\mu _{Y }\) on \(\mathbb{H}\) and \(\mathbb{L}\). Now, we observe that \(\mathbb{H}\) and \(\mathbb{L}\) live naturally on the Riemann sphere \(\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty \}\). Since the real axis/circle at infinity/equator \(\mathbb{R}_{\infty } = \overline{\mathbb{C}} - (\mathbb{H} \cup \mathbb{L})\) has zero Lebesgue measure, we see that \(\mu _{X}\) and \(\mu _{Y }\) induce a Beltrami differential \(\mu _{(X,Y )}\) on \(\overline{\mathbb{C}}\). By solving the corresponding Beltrami equation, we obtain a quasiconformal map \(f_{(X,Y )}\) on \(\overline{\mathbb{C}}\) and, by conjugating, we obtain a quasi-Fuchsian subgroup

$$\displaystyle{ \varGamma (X,Y ) =\{\, f_{(X,Y )} \circ \gamma \circ f_{(X,Y )}^{-1}:\gamma \in \varGamma \}<PSL(2, \mathbb{C}), }$$

i.e., a Kleinian subgroup whose domain of discontinuity \(\varOmega (X,Y ) \subset \overline{\mathbb{C}}\) consists of two connected components A and B such that \(X \simeq A/\varGamma (X,Y )\) and \(Y \simeq B/\varGamma (X,Y )\).

The following picture summarizes the discussion of the previous paragraph:

Remark 23

The Jordan curve given by the image \(f_{(X,Y )}(\mathbb{R}_{\infty })\) of the equator \(\mathbb{R}_{\infty }\) under the quasiconformal map \(f_{(X,Y )}\) is “wild” in general, e.g., it has Hausdorff dimension > 1 (as the picture above tries to represent). In fact, this happens because a typical quasiconformal map is merely Hölder continuous, and, hence, it might send “nice” curves (such as the equator) into curves with “intricate geometries”.

The data of the quasi-Fuchsian subgroup Γ(X, Y ) attached to \((X,Y ) \in QF(S) = Teich(S) \times Teich(\overline{S})\) permits one to assign (marked) projective structures to X and Y. More precisely, by writing \(X \simeq A/\varGamma (X,Y )\) and \(Y \simeq B/\varGamma (X,Y )\) with \(A,B \subset \overline{\mathbb{C}}\) and \(\varGamma (X,Y ) <PSL(2, \mathbb{C})\), we are equipping X and Y with projective structures, that is, atlases of charts to \(\mathbb{C}\) whose changes of coordinates are Möbius transformations (i.e., elements of \(PSL(2, \mathbb{C})\)). Furthermore, by recalling that X and Y come with markings \(f: S \rightarrow X\) and \(g: \overline{S} \rightarrow Y\) (because they are points in Teichmüller spaces), we see that the projective structures above are marked.

In summary, we have a natural quasi-Fuchsian uniformization map

$$\displaystyle{ \sigma: QF(S) \rightarrow Proj(S) \times Proj(\overline{S}) }$$

assigning to (X, Y ) the marked projective structures

$$\displaystyle{ \sigma (X,Y ):= (\sigma _{QF}(X,Y ),\overline{\sigma }_{QF}(X,Y )) }$$

Here, Proj(S) is the “Teichmüller space of projective structures” on S, i.e., the space of “Teichmüller” equivalence classes of marked projective structures \(f: S \rightarrow X\), where two marked projective structures \(f_{1}: S \rightarrow X_{1}\) and \(f_{2}: S \rightarrow X_{2}\) are “Teichmüller” equivalent whenever there is a projective isomorphism \(h: X_{1} \rightarrow X_{2}\) homotopic to \(f_{2} \circ f_{1}^{-1}\).

Remark 24

This procedure due to Bers [Bers] of attaching a quasi-Fuchsian subgroup Γ(X, Y ) to a pair of hyperbolic surfaces X and Y is called Bers simultaneous uniformization because the knowledge of Γ(X, Y ) allows one to equip X and Y at the same time with natural projective structures.

Note that σ is a section of the natural projection

$$\displaystyle{ Proj(S) \times Proj(\overline{S}) \rightarrow QF(S) = Teich(S) \times Teich(\overline{S}) }$$

obtained by sending each pair of (marked) projective structures (X, Y ), X ∈ Proj(S), \(Y \in Proj(\overline{S})\), to the unique pair of (marked) compatible conformal structures \((\pi (X),\overline{\pi }(Y ))\), π(X) ∈ Teich(S), \(\overline{\pi }(Y ) \in Teich(\overline{S})\).

We will now describe how the (affine) structure of the fibers \(Proj_{X}(S) =\pi ^{-1}(X)\) of the projection π: Proj(S) → Teich(S) and the section σ can be used to construct McMullen’s primitives/potentials of the Weil–Petersson symplectic form \(\omega _{WP}\).

Given two projective structures \(p_{1},p_{2} \in Proj_{X}(S)\) in the same fiber of the projection π: Proj(S) → Teich(S), one can measure how far apart \(p_{1}\) and \(p_{2}\) are from each other using the so-called Schwarzian derivative.

More precisely, the fact that \(p_{1}\) and \(p_{2}\) induce the same conformal structure means that the charts of atlases associated to them can be thought of as families of maps \(f_{1}: U \rightarrow \overline{\mathbb{C}}\) and \(f_{2}: U \rightarrow \overline{\mathbb{C}}\) from (small) open subsets U ⊂ X to the Riemann sphere \(\overline{\mathbb{C}}\), and we can measure the “difference” \(p_{2} - p_{1}\) by computing how “far” from a Möbius transformation (in \(PSL(2, \mathbb{C})\)) \(f_{2} \circ f_{1}^{-1}\) is.

Here, given a point z ∈ U, one observes that there exists a unique Möbius transformation \(A \in PSL(2, \mathbb{C})\) such that \(f_{2}\) and \(A \circ f_{1}\) coincide at z up to second order (i.e., \(f_{2}\) and \(A \circ f_{1}\) have the same value and the same first and second derivatives at z). Hence, it is natural to measure how far from a Möbius transformation \(f_{2} \circ f_{1}^{-1}\) is by understanding the difference between the third derivatives of \(f_{2}\) and \(A \circ f_{1}\) at z ∈ U, i.e., \(D^{3}(\,f_{2} - A \circ f_{1})(z)\).

Actually, this is almost the definition of the Schwarzian derivative: since the derivatives of \(f_{2}\) and \(A \circ f_{1}\) map \(T_{z}U\) to \(T_{f_{2}(z)}\overline{\mathbb{C}}\), in order to recover an object from \(T_{z}U\) to itself, it is a better idea to “correct” \(D^{3}(\,f_{2} - A \circ f_{1})(z)\) with \(Df_{2}(z)^{-1}\), i.e., we define the Schwarzian derivative \(S\{\,f_{2},f_{1}\}(z)\) of \(f_{2}\) and \(f_{1}\) at z as

$$\displaystyle{ S\{\,f_{2},f_{1}\}(z):= 6\left (Df_{2}(z)^{-1} \circ D^{3}(\,f_{ 2} - A \circ f_{1})(z)\right ) }$$

Here, the factor 6 shows up for historical reasons.Footnote 9

By definition, the Schwarzian derivative S{ f 2, f 1} is a field of quadratic forms on U (since its definition involves taking third order derivatives). In other terms, S{ f 2, f 1} is a quadratic differential on U, that is, the “difference” p 2 − p 1 between two projective structures p 1, p 2 ∈ Proj X (S) in the same fiber of the projection π: Proj(S) → Teich(S) is given by a quadratic differential p 2 − p 1 = S{p 2, p 1} ∈ Q(X). In particular, the fibers Proj X (S) are affine spaces modeled on the space Q(X) of quadratic differentials on X.
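As a quick sanity check of the idea that the Schwarzian derivative measures the deviation from Möbius transformations, here is a minimal Python sketch. It assumes the classical closed formula f′′′∕f′ − (3∕2)(f′′∕f′)² for the Schwarzian of a map f with respect to the coordinate z (a standard expression that is not derived in the text above); the function names and the sample maps are ours and serve only as an illustration.

```python
def schwarzian(d1, d2, d3):
    """Classical Schwarzian f'''/f' - (3/2) (f''/f')^2, computed from
    the first three derivatives d1, d2, d3 of f at a point."""
    return d3 / d1 - 1.5 * (d2 / d1) ** 2


def mobius_derivatives(a, b, c, d, z):
    """First three derivatives of z -> (a z + b) / (c z + d) at z."""
    det = a * d - b * c
    w = c * z + d
    return det / w**2, -2 * c * det / w**3, 6 * c**2 * det / w**4


z0 = 0.3 + 0.4j

# A Moebius transformation: the Schwarzian vanishes (up to rounding).
s_mobius = schwarzian(*mobius_derivatives(2, 1j, 1, 3, z0))

# f(z) = z^2 has derivatives 2z, 2, 0, so its Schwarzian is -3 / (2 z^2).
s_square = schwarzian(2 * z0, 2, 0)
```

The first value is zero up to rounding errors, while the second one is not: the squaring map is not (even locally) a Möbius transformation.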

Remark 25

The reader will find more explanations about the Schwarzian derivative in Sect. 6.3 of Hubbard’s book [Hu].

Remark 26

The idea of “measuring” the distance between projective structures (inducing the same conformal structure) by computing how far they are from Möbius transformations via the Schwarzian derivative is close in some sense to the idea of measuring the distance between two points in Teichmüller space Teich(S) by computing the eccentricities of quasiconformal maps between these points.

Using this affine structure on Proj X (S) and the fact that \(Q(X) \simeq T_{X}^{{\ast}}Teich(S)\) is the cotangent space of Teich(S) at X, we see that, for each \(Y,Z \in Teich(\overline{S})\), the map

$$\displaystyle{ X \in Teich(S)\mapsto \sigma _{QF}(X,Y ) -\sigma _{QF}(X,Z) \in Q(X) }$$

defines a (holomorphic) 1-form on Teich(S). Note that, by letting \(Y \in Teich(\overline{S})\) vary and by fixing \(Z \in Teich(\overline{S})\), we have a map τ Z = τ given by

$$\displaystyle{ (X,Y ) \in Teich(S) \times Teich(\overline{S})\mapsto \tau (X,Y ):=\sigma _{QF}(X,Y ) -\sigma _{QF}(X,Z) \in Q(X) }$$

Since \(QF(S) = Teich(S) \times Teich(\overline{S})\) (so that \(T^{{\ast}}QF(S) = T^{{\ast}}Teich(S) \oplus T^{{\ast}}Teich(\overline{S})\)) and \(Q(X) \simeq T_{X}^{{\ast}}Teich(S)\), we can think of τ as a (holomorphic) 1-form on QF(S).

For later use, let us note that the 1-form \(\tau: QF(S) \rightarrow T^{{\ast}}Teich(S)\) is bounded with respect to the Teichmüller metric on Teich(S). Indeed, this is a consequence of Nehari’s bound stating that if \(U \subset \overline{\mathbb{C}}\) is a round disk (i.e., the image of the unit disk \(\mathbb{D} \subset \mathbb{C} \subset \overline{\mathbb{C}}\) under a Möbius transformation) equipped with its hyperbolic metric ρ and \(f: U \rightarrow \mathbb{C}\) is an injective complex-analytic map, then

$$\displaystyle{ \|S\{\,f,z\}\|_{L^{\infty }} \leq 3/2. }$$

In this setting, McMullen constructed primitives/potentials for the WP symplectic form ω WP as follows. The Teichmüller space Teich(S) sits in the quasi-Fuchsian locus QF(S) as the Fuchsian locus \(F(S) =\hat{\alpha } (Teich(S))\) where \(\hat{\alpha }\) is the antidiagonal embedding

$$\displaystyle{ \hat{\alpha }: Teich(S) \rightarrow QF(S),\quad \hat{\alpha }(X) = (X,\overline{X}) }$$

By pulling back the 1-form τ under \(\hat{\alpha }\), we obtain a bounded 1-form

$$\displaystyle{ \theta _{WP}(X):=\hat{\alpha } ^{{\ast}}(\tau )(X) =\sigma _{ QF}(X,\overline{X}) -\sigma _{QF}(X,Z) }$$

Remark 27

This form \(\theta _{WP} =\hat{\alpha } ^{{\ast}}(\tau )\) is closely related to a classical object in Teichmüller theory called Bers embedding: in our notation, the Bers embedding is

$$\displaystyle{ \beta _{X}(\overline{Z}) =\sigma _{QF}(X,Z) -\sigma _{QF}(X,\overline{X}) = -\hat{\alpha }^{{\ast}}(\tau )(X) = -\theta _{ WP}(X) }$$

McMullen [McM00] showed that the bounded 1-forms iθ WP are primitives/potentials of the WP symplectic 2-form ω WP , i.e.,

$$\displaystyle{ d(i\theta _{WP}) =\omega _{WP} }$$

See also Sect. 7.7 of Hubbard’s book [Hu] for a nice exposition of this theorem of McMullen. Equivalently, the restriction of the holomorphic 1-form τ to the Fuchsian locus F(S) (a totally real sublocus of QF(S)) allows one to construct (Teichmüller bounded) primitives for the WP symplectic form on F(S).

At this point, we are ready to implement the “Cauchy estimate” idea of Burns–Masur–Wilkinson to deduce bounds for the first two derivatives of the curvature operator of the WP metric.

6.3.3.4 “Cauchy Estimate” of ω WP After Burns–Masur–Wilkinson

Following Burns–Masur–Wilkinson, we will need the following local coordinates in Teich(S):

Proposition 1 (McMullen [McM00])

There exists a universal constant C 0 = C 0(g, n) ≥ 1 such that, for any X 0 ∈ Teich(S) = Teich g, n , one has a holomorphic embedding

$$\displaystyle{ \psi =\psi _{X_{0}}:\varDelta ^{N} \rightarrow Teich(S) }$$

of the Euclidean unit polydisk \(\varDelta ^{N}:=\{ (z_{1},\ldots,z_{N}) \in \mathbb{C}^{N}: \vert z_{j}\vert <1\,\,\,\,\forall \,j = 1,\ldots,N\}\) (where N = 3g − 3 + n = dim(Teich(S))) sending 0 ∈ Δ N to X 0 = ψ(0) and satisfying

$$\displaystyle{ \frac{1} {C_{0}}\|v\| \leq \| D\psi (v)\|_{T} \leq C_{0}\|v\|,\quad \forall v \in T\varDelta ^{N}, }$$

where ∥. ∥ T is the Teichmüller norm and ∥. ∥ is the Euclidean norm on Δ N .

Also, since the statement of Proposition 1 involves the Teichmüller norm ∥. ∥ T and we are interested in the Weil–Petersson norm ∥. ∥ WP , the following comparison (from Lemma 5.4 of Burns–Masur–Wilkinson paper [BMW]) between ∥. ∥ T and ∥. ∥ WP will be helpful:

Lemma 3

There exists a universal constant C = C(g, n) ≥ 1 such that, for any X ∈ Teich(S) and any cotangent vector \(\varphi \in Q(X) \simeq T_{X}^{{\ast}}Teich(S)\), one has

$$\displaystyle{ \|\varphi \|_{WP} \leq C \frac{1} {\underline{\ell}(X)}\|\varphi \|_{T} }$$

where \(\underline{\ell}(X)\) is the systole of X (i.e., the length of the shortest simple closed hyperbolic geodesic on X). In particular, for any X ∈ Teich(S) and any tangent vector μ ∈ T X Teich(S), one has

$$\displaystyle{ \|\mu \|_{T} \leq C \frac{1} {\underline{\ell}(X)}\|\mu \|_{WP} }$$

Proof

Given X ∈ Teich(S), let us write \(X \simeq \mathbb{H}^{2}/\varGamma\) where \(\varGamma <PSL(2, \mathbb{R})\) is “normalized” to contain the element T(z) = λz where \(\lambda =\exp (\underline{\ell}(X))\) (so that the translation length log λ of T is the systole of X).

Fix \(D \subset \mathbb{H}\) a Dirichlet fundamental domain of the action of Γ centered at the point \(i \in \mathbb{H}\). By the collaring theorem,Footnote 10 the union of \(1/\underline{\ell}(X)\) isometric copies of D contains a ball B of fixed (universal) radius c = c(g, n) > 0 around any point zD.

By combining the Cauchy integral formula with the fact stated in the previous paragraph, we see that

$$\displaystyle{ \vert \varphi (z)\vert \leq \frac{1} {2\pi c}\int _{B}\vert \varphi \vert \leq \frac{1} {2\pi c\underline{\ell}(X)}\int _{D}\vert \varphi \vert = \frac{1} {2\pi c\underline{\ell}(X)}\|\varphi \|_{T} }$$

Since the hyperbolic metric ρ is bounded away from 0 on D, we can use the L ∞-norm estimate on φ above to deduce that

$$\displaystyle{ \|\varphi \|_{WP}^{2}:=\int _{ D}\frac{\vert \varphi \vert ^{2}} {\rho ^{2}} \leq \frac{C} {\underline{\ell}(X)^{2}}\|\varphi \|_{T}^{2} }$$

for some constant C = C(g, n) > 0. This completes the proof of the lemma.
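The proof above treats only cotangent vectors. Under the usual convention that the Teichmüller and WP norms on tangent vectors are dual to the corresponding norms on cotangent vectors (an assumption on the normalizations made here for illustration), the second inequality in the statement follows from the first one by duality. A sketch of this step, where ⟨μ, φ⟩ denotes the pairing between tangent and cotangent vectors:

$$\displaystyle{ \|\mu \|_{T} =\sup \limits _{\|\varphi \|_{T}\leq 1}\vert \langle \mu,\varphi \rangle \vert \leq \sup \limits _{\|\varphi \|_{T}\leq 1}\|\mu \|_{WP}\,\|\varphi \|_{WP} \leq C \frac{1} {\underline{\ell}(X)}\|\mu \|_{WP} }$$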

Remark 28

The factor \(1/\underline{\ell}(X)\) in the previous lemma can be replaced by \(1/\sqrt{\underline{\ell}(X)}\) via a refinement of the argument above. However, we will not prove this here because this refined estimate is not needed for the proof of the main results of Burns–Masur–Wilkinson.

Using the local coordinates from Proposition 1 (and the comparison between Teichmüller and Weil–Petersson norms in the previous lemma), we are ready to use Cauchy’s inequalities to estimate the “g ij ’s” of the WP metric. More concretely, denoting by \(\psi =\psi _{X_{0}}\) the local coordinate “centered at some X 0 ∈ Teich(S)” in Proposition 1, let z k = x k + iy k , k = 1, …, N and consider the vector fields

$$\displaystyle{ e_{\ell}:=\left \{\begin{array}{cl} \partial /\partial x_{\ell}, &\text{if }\ell = 1,\ldots,N\\ \partial /\partial y_{\ell -N},&\text{if }\ell = N + 1,\ldots,2N \end{array} \right. }$$

on Δ N. In this setting, we denote by \(G_{ij}(z) =\psi ^{{\ast}}g_{WP}(z)(e_{i},e_{j})\) the “g ij ’s” of the WP metric g WP in the local coordinate ψ and by \(G^{-1}(z) = (G^{ij}(z))_{1\leq i,j\leq 2N}\) the inverse of the matrix \((G_{ij}(z))_{1\leq i,j\leq 2N}\).

Proposition 2

There exists a universal constant C = C(g, n) ≥ 1 such that, for any X 0 ∈ Teich(S), the pullback \(G =\psi ^{{\ast}}g_{WP}\) of the WP metric g WP under the local coordinate \(\psi =\psi _{X_{0}}:\varDelta ^{N} \rightarrow Teich(S)\) “centered at X 0 ” in Proposition 1 verifies the following estimates:

$$\displaystyle{ \|G^{-1}(z)\| \leq C/\underline{\ell}(X_{ 0})^{2}\quad \forall z \in \varDelta ^{N},\,\|z\| <1/2, }$$

and

$$\displaystyle{ \max \limits _{(\xi _{1},\ldots,\xi _{k})\in \{x_{1},\ldots,x_{N},y_{1},\ldots,y_{N}\}^{k}} \frac{1} {k!}\left \vert \frac{\partial ^{k}G_{ij}} {\partial \xi _{1}\ldots \partial \xi _{k}} (z)\right \vert \leq C }$$

for all 1 ≤ i, j ≤ 2N, k ≥ 0 and z ∈ Δ N , ∥z∥ < 1∕2.

Proof

The first inequality

$$\displaystyle{ \|G^{-1}(z)\| \leq C/\underline{\ell}(X_{ 0})^{2} }$$

follows from Proposition 1 and Lemma 3. Indeed, by letting \(v =\sum \limits _{ i=1}^{2N}v_{i}e_{i}\), we see from Proposition 1 and Lemma 3 that

$$\displaystyle{ \|v\|^{2} \leq C_{ 0}^{2}\|D\psi (v)\|_{ T}^{2} \leq \frac{C} {\underline{\ell}(X_{0})^{2}}\|D\psi (v)\|_{WP}^{2} }$$

Since

$$\displaystyle\begin{array}{rcl} \|D\psi (v)\|_{WP}^{2}& =& \langle D\psi (v),D\psi (v)\rangle _{ WP} =\sum v_{i}v_{j}\langle D\psi (e_{i}),D\psi (e_{j})\rangle _{WP} {}\\ & =& \sum v_{i}v_{j}G_{ij} =\langle v,Gv\rangle {}\\ & \leq & \|v\| \cdot \| Gv\|, {}\\ \end{array}$$

we deduce that

$$\displaystyle{ \|v\|^{2} \leq \frac{C} {\underline{\ell}(X_{0})^{2}}\|v\| \cdot \| Gv\|, }$$

i.e., \(\|G^{-1}\| \leq C/\underline{\ell}(X_{0})^{2}\).
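For completeness, here is the short step hidden in the last “i.e.” (the vector w is our notation): dividing the previous inequality by ∥v∥ and substituting v = G −1 w, where w is an arbitrary vector, one gets

$$\displaystyle{ \|v\| \leq \frac{C} {\underline{\ell}(X_{0})^{2}}\|Gv\|\quad \Longrightarrow\quad \|G^{-1}w\| \leq \frac{C} {\underline{\ell}(X_{0})^{2}}\|w\|\quad \forall \,w, }$$

which is precisely the claimed bound on the operator norm of G −1.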

For the proof of the second inequality (estimates of the k-th derivatives of the G ij ’s), we begin by “rephrasing” the construction of McMullen’s θ WP -form in terms of the local coordinate \(\psi =\psi _{X_{0}}\) introduced in Proposition 1.

The composition \(\hat{\alpha }\circ \psi\) of the local coordinate ψ: Δ N → Teich(S) with the antidiagonal embedding \(\hat{\alpha }: Teich(S) \rightarrow QF(S)\) of the Teichmüller space in the quasi-Fuchsian locus can be rewritten as

$$\displaystyle{ \hat{\alpha }\circ \psi =\varPsi \circ \alpha }$$

where α: Δ N → Δ N × Δ N is the antidiagonal embedding

$$\displaystyle{ \alpha (z) = (z,\overline{z}) }$$

and Ψ: Δ N × Δ N → QF(S) is the local coordinate given by

$$\displaystyle{ \varPsi (z,w) = (\psi (z),\overline{\psi (\overline{w})}). }$$
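The identity \(\hat{\alpha }\circ \psi =\varPsi \circ \alpha\) can be checked directly from these formulas: for every z ∈ Δ N,

$$\displaystyle{ \varPsi (\alpha (z)) =\varPsi (z,\overline{z}) = \left (\psi (z),\overline{\psi (\overline{\overline{z}})}\right ) = \left (\psi (z),\overline{\psi (z)}\right ) =\hat{\alpha } (\psi (z)). }$$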

In this setting, the pullback by Ψ of the holomorphic 1-form τ(X, Y ) = σ QF (X, Y ) − σ QF (X, Z) gives a holomorphic 1-form \(\kappa =\varPsi ^{{\ast}}\tau\) on Δ N × Δ N. Moreover, since the Euclidean metric on Δ N × Δ N is comparable to the pullback by Ψ of the Teichmüller metric (cf. Proposition 1), τ is bounded in the Teichmüller metric and d(iθ WP ) = ω WP where \(\theta _{WP} =\hat{\alpha } ^{{\ast}}\tau\), we see that

$$\displaystyle{ \alpha ^{{\ast}}\varOmega =\psi ^{{\ast}}\omega _{ WP} }$$

where Ω := d(iκ) and \(\kappa:=\varPsi ^{{\ast}}\tau\) is a holomorphic bounded (in the Euclidean norm) 1-form on Δ N × Δ N.

Let us write \(\kappa =\sum \limits _{ j=1}^{N}a_{j}dz_{j}\) in complex coordinates (z 1, …, z N , w 1, …, w N ) ∈ Δ N × Δ N, where \(a_{j}:\varDelta ^{N} \times \varDelta ^{N} \rightarrow \mathbb{C}\) are bounded holomorphic functions. Hence,

$$\displaystyle{ \varOmega = d(i\kappa ) = i\left (\sum \limits _{j,k=1} \frac{\partial a_{j}} {\partial z_{k}}dz_{k} \wedge dz_{j} +\sum \limits _{j,k=1} \frac{\partial a_{j}} {\partial w_{k}}dw_{k} \wedge dz_{j}\right ) }$$

and, a fortiori,

$$\displaystyle{ \psi ^{{\ast}}\omega _{ WP} =\alpha ^{{\ast}}\varOmega = i\left (\sum \limits _{ j,k=1} \frac{\partial a_{j}} {\partial z_{k}}dz_{k} \wedge dz_{j} +\sum \limits _{j,k=1} \frac{\partial a_{j}} {\partial \overline{z}_{k}}d\overline{z}_{k} \wedge dz_{j}\right ) }$$

Since \(\psi ^{{\ast}}\omega _{WP}\) is the Kähler form of the metric \(G =\psi ^{{\ast}}g_{WP}\), we see that the coefficients of G are linear combinations of the α-pullbacks of ∂a j ∕∂z k and ∂a j ∕∂w k . Because the a j are (universally) bounded holomorphic functions, we can use Cauchy’s inequalities to see that the derivatives of the a j are (universally) bounded at any (z, w) ∈ Δ N × Δ N with ∥(z, w)∥ < 1∕2. It follows from the boundedness of the (nonholomorphic) antidiagonal embedding α that the k-th derivatives of the G ij ’s satisfy the desired bound.
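For the reader’s convenience, let us recall the form of Cauchy’s inequalities relevant here (a standard fact; the symbols M, p, ζ and ν are our notation). If a is a holomorphic function on the unit polydisk Δ N × Δ N with |a| ≤ M, and p is a point all of whose coordinates have modulus < 1∕2, then the polydisk of polyradius 1∕2 centered at p is contained in Δ N × Δ N. Writing ζ = (z, w) for the coordinates, one has, for every multi-index ν,

$$\displaystyle{ \left \vert \frac{\partial ^{\vert \nu \vert }a} {\partial \zeta ^{\nu }} (p)\right \vert \leq \nu !\,M\,2^{\vert \nu \vert } }$$

In particular, each derivative of fixed order is bounded by a constant depending only on M and on the order of the derivative.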

The estimates in Proposition 2 (controlling the WP metric in the local coordinates constructed in Proposition 1) allow us to deduce the remaining “two thirds of item (IV) of Theorem 3” for the WP metric:

Theorem 6 (Burns–Masur–Wilkinson)

There are constants C > 0 and β > 0 such that, for any \(X_{0} \in \mathcal{ T} = Teich(S)\) , the curvature tensor R WP of the WP metric satisfies

$$\displaystyle{ \max \{\|\nabla R_{WP}(X_{0})\|,\|\nabla ^{2}R_{ WP}(X_{0})\|\} \leq Cd(X_{0},\partial \mathcal{T})^{-\beta } }$$

Proof

Fix X 0 ∈ Teich(S) and consider the local coordinate \(\psi =\psi _{X_{0}}\) provided by Proposition 1. Since ∥Dψ∥ and ∥Dψ −1∥ are uniformly bounded, our task is reduced to estimating the first two derivatives of the curvature tensor R of the metric \(G(z) =\psi ^{{\ast}}g_{WP}(z) = (G_{ij}(z))\) at the origin 0 ∈ Δ N.

Recall that the Christoffel symbols of G ij = G ij (z) are

$$\displaystyle{ \varGamma _{ij}^{m} = \frac{1} {2}\sum \limits _{k}G^{mk}\left (\frac{\partial G_{ki}} {\partial \xi _{j}} + \frac{\partial G_{kj}} {\partial \xi _{i}} -\frac{\partial G_{ij}} {\partial \xi _{k}} \right ) }$$

or

$$\displaystyle{ \varGamma _{ij}^{m} = \frac{1} {2}G^{mk}(G_{ ki,j} + G_{kj,i} - G_{ij,k}) }$$

in Einstein summation convention, and, in terms of the Christoffel symbols, the coefficients of the curvature tensor are

$$\displaystyle{ R_{ijk}^{l} = \frac{\partial \varGamma _{ik}^{l}} {\partial \xi _{j}} -\frac{\partial \varGamma _{ij}^{l}} {\partial \xi _{k}} +\varGamma _{ js}^{l}\varGamma _{ ik}^{s} -\varGamma _{ ks}^{l}\varGamma _{ ij}^{s} }$$

Therefore, we see that each coefficient of the k-th covariant derivative ∇ k R is a polynomial function of the \(G^{ij}\) and of the first k + 2 partial derivatives of the \(G_{ij}\), whose “degree”Footnote 11 in the “variables” \(G^{ij}\) is ≤ k + 2.

By Proposition 2, each \(G^{ij}(0)\) has order \(O(\underline{\ell}(X_{0})^{-2})\) and the first k + 2 partial derivatives of \(G_{ij}\) at 0 are bounded by a constant depending only on k. It follows that

$$\displaystyle\begin{array}{rcl} \|\nabla ^{k}R(0)\|^{2}& \leq & C(k)\sum \limits _{ i_{1},\ldots,i_{k+3},j_{1},\ldots,j_{k+3},l,m}(\nabla ^{k}R)_{ i_{1}\ldots i_{k+3}}^{l}(\nabla ^{k}R)_{ j_{1}\ldots j_{k+3}}^{m}G^{i_{1}j_{1} }\ldots G^{i_{k+3}j_{k+3} }G_{lm} {}\\ & \leq & C(k) \frac{1} {\underline{\ell}(X_{0})^{2(k+2)}} \frac{1} {\underline{\ell}(X_{0})^{2(k+2)}} \frac{1} {\underline{\ell}(X_{0})^{2(k+3)}} = C(k) \frac{1} {\underline{\ell}(X_{0})^{6k+14}}, {}\\ \end{array}$$

and, consequently,

$$\displaystyle{ \max \{\|\nabla R_{WP}(X_{0})\|,\|\nabla ^{2}R_{ WP}(X_{0})\|\} \leq C/\underline{\ell}(X_{0})^{26/2} = C/d(X_{ 0},\partial \mathcal{T})^{26}. }$$

This completes the proof.
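The exponent bookkeeping in this proof is easy to get wrong, so here is a small Python sketch that recomputes it. It assumes, as in the last displayed formula, that the systole is comparable to the square of the WP distance to the boundary; the function names are ours.

```python
def exponent_in_systole(k):
    """Exponent of 1/systole in the bound for ||nabla^k R(0)||^2:
    the three factors contribute 2(k+2), 2(k+2) and 2(k+3)."""
    return 2 * (k + 2) + 2 * (k + 2) + 2 * (k + 3)


def exponent_in_distance(k):
    """Exponent of 1/d in the bound for ||nabla^k R(0)||, assuming
    systole ~ d^2: the square root halves the exponent and the
    substitution systole ~ d^2 doubles it again."""
    return (exponent_in_systole(k) // 2) * 2


# Both derivatives are controlled by the larger exponent when d <= 1.
beta = max(exponent_in_distance(1), exponent_in_distance(2))
```

With these conventions the first derivative is bounded by a multiple of d −20 and the second one by a multiple of d −26, so that β = 26 works for points at WP distance at most 1 from the boundary.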

At this point, Theorem 5 (or, more precisely, its consequence in Eq. (6.1)) and Theorem 6 imply the validity of item (IV) of Theorem 3 for the WP metric.

Remark 29

The estimates for the derivatives of the curvature tensor R WP appearing in the proof of Theorem 6 are not sharp with respect to the exponent β. For instance, the WP metric on the moduli space \(\mathcal{M}_{1,1}\) of once-punctured tori has curvature \(\sim -1/\underline{\ell}\sim -1/d^{2}\) where \(d = d(X_{0},\partial \mathcal{M}_{1,1})\) is the WP distance between X 0 and the boundary \(\partial \mathcal{M}_{1,1} =\{ \infty \}\), so that one expects the k-th derivatives of the curvature to behave like ∼ −1∕d k+2 (i.e., the exponent 6k + 14 above should be k + 2).

In a recent private communication, Wolpert indicated that it is possible to derive the sharp estimates of the form

$$\displaystyle{ \|\nabla ^{k}R_{ WP}(X_{0})\| \leq C(k)/d(X_{0},\partial \mathcal{T})^{k+2} }$$

for the derivatives of the curvature tensor of the WP metric from his works.

6.3.4 Item (V) of Theorem 3 for WP Metric

The main result of this subsection is the following theorem implying item (V) of Theorem 3 for WP metric.

Theorem 7

There exists a constant c > 0 such that for all \(X \in \mathcal{ M}[k]=\mathcal{T}/MCG[k]\), k ≥ 3, one has the following polynomial lower bound

$$\displaystyle{ inj(X) \geq c \cdot d_{WP}(X,\partial \mathcal{M}[k])^{3} }$$

on the injectivity radius of the WP metric at X.

The proof of this result also relies on the work of Wolpert. More precisely, Wolpert [Wo03] showed that there exists a constant c > 0 such that, for any \(\sigma \in \mathcal{ C}(S)\) and \(X \in \mathcal{ T}\) with \(\overline{\ell}_{\sigma }(X) \ll 1\),

$$\displaystyle{ d_{WP}(X,\varGamma (\sigma )(X)) \geq cd(X,\partial \mathcal{T})^{3} }$$

where Γ(σ) ⊂ MCG(S)[k] is the Abelian subgroup of the “level k” mapping-class group MCG(S)[k] generated by the Dehn twists τ α about the curves α ∈ σ.

This reduces the proof of Theorem 7 to the following lemma:

Lemma 4

There exists a universal constant J 0 = J 0(g, n) ≥ 1 with the following property. For each ɛ > 0, there exists δ > 0 such that, for any \(X \in \mathcal{ T}\) with

$$\displaystyle{ d_{WP}(X,\varphi (X)) <\delta }$$

for some nontrivial φ ∈ MCG(S)[k], one can find \(\sigma \in \mathcal{ C}(S)\) so that \(\overline{\ell}_{\sigma }(X) <\varepsilon\) and φ j ∈ Γ(σ) for some 1 ≤ j ≤ J 0 .

Proof

We begin the proof of the lemma by recalling that the mapping-class group MCG(S)[k] acts on \(\mathcal{T}\) in a properly discontinuous way with no fixed points. Therefore, for each ɛ > 0, there exists δ > 0 such that if d WP (X, φ(X)) < δ for some nontrivial φ ∈ MCG(S)[k] (i.e., some nontrivial element of the mapping-class group has an “almost fixed point”), then \(\overline{\ell}_{\sigma }(X) <\varepsilon\) (i.e., the “almost fixed point” is close to the boundary of \(\mathcal{T}\)).

Let us show now that in the setting of the previous paragraph, φ j ∈ Γ(σ) for some 1 ≤ j ≤ J 0.

In this direction, let \(J_{0} = J_{0}(g,n) \in \mathbb{N}\) be the product of (3g − 3 + n)! and the maximal orders of all finite order elements of the mapping-class groups of “lower complexity” surfaces. By contradiction, let us assume that there exist infinite sequences \(X_{m} \in \mathcal{ T}\), φ m ∈ MCG(S)[k], \(m \in \mathbb{N}\), such that \(\overline{\ell}_{\sigma }(X_{m}) \ll 1\) for some \(\sigma \in \mathcal{ C}(S)\) and

$$\displaystyle{ \lim \limits _{m\rightarrow \infty }d(X_{m},\varphi _{m}(X_{m})) = 0 }$$

but φ m j ∉ Γ(σ) for all \(m \in \mathbb{N}\), 1 ≤ j ≤ J 0.

Passing to a subsequence (and applying appropriate elements of φ m Γ(σ)), we can assume that the sequence \(X_{m} \in \mathcal{ T}\) converges to some noded Riemann surface \(X_{\sigma } \in \partial \mathcal{T}_{\sigma }\). Because d(X m , φ m (X m )) → 0 as m → ∞, we see that, for each β ∈ σ,

$$\displaystyle{ \ell_{\varphi _{m}(\beta )}(\varphi _{m}(X_{m})) =\ell _{\beta }(X_{m}) \rightarrow 0. }$$

It follows that, for all m sufficiently large, φ m sends any curve β ∈ σ to another curve φ m (β) ∈ σ. Therefore, for each m sufficiently large, there exists

$$\displaystyle{ 1 \leq j = j(m) \leq \#\sigma ! \leq (3g - 3 + n)! \leq J_{0} }$$

such that φ m j fixes each β ∈ σ (i.e., φ m j is a reducible element of the mapping-class group). By the Nielsen–Thurston classification of elements of the mapping-class groups, the restrictions of φ m j to each piece of X σ are given by compositions of Dehn twists about the boundary curves with either a pseudo-Anosov or a periodic (finite order) element (in a surface of “lower complexity” than S).
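The choice of the exponent j above uses only an elementary fact: φ m permutes the finitely many curves of σ, and a permutation of n symbols has order at most n!. The following Python sketch (a brute-force illustration with our own function names, feasible only for small n) computes the largest order of a permutation of n symbols and compares it with n!.

```python
from itertools import permutations
from math import factorial, gcd


def permutation_order(perm):
    """Order of a permutation given as a tuple (perm[i] is the image of i)."""
    order, seen = 1, set()
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:  # walk along the cycle containing start
            seen.add(i)
            i = perm[i]
            length += 1
        order = order * length // gcd(order, length)  # lcm of cycle lengths
    return order


def max_order(n):
    """Largest order of a permutation of n symbols (brute force)."""
    return max(permutation_order(p) for p in permutations(range(n)))


orders = {n: max_order(n) for n in range(1, 7)}
bounded = all(orders[n] <= factorial(n) for n in orders)
```

In practice the largest order is much smaller than n!, so the factor (3g − 3 + n)! in the definition of J 0 is a convenient rather than an optimal choice.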

It follows that we have only two possibilities for φ m j: either the restrictions of φ m j to all pieces of X σ are compositions of Dehn twists about certain curves in σ and finite order elements, or the restriction of φ m j to some piece of X σ is the composition of Dehn twists about certain curves in σ and a pseudo-Anosov element.

In the first scenario, by the definition of J 0, we can replace φ m j by an adequate power φ m J with 1 ≤ J ≤ J 0 to “kill” the finite order elements and “keep” the Dehn twists. In other terms, φ m J ∈ Γ(σ) (with 1 ≤ J ≤ J 0), a contradiction with our choice of the sequence φ m .

This leaves us with the second scenario. In this case, by definition of J 0, we can replace φ m j by an adequate power φ m J with 1 ≤ J ≤ J 0 such that the restriction of φ m J to some piece of X σ is pseudo-Anosov. However, Daskalopoulos–Wentworth [DaWe] showed that there exists a uniform positive lower bound for

$$\displaystyle{ d_{WP}(X_{\sigma },\varphi _{m}^{J}(X_{\sigma })) }$$

when φ m J is pseudo-Anosov on some piece of X σ . Since 1 ≤ J ≤ J 0 and J 0 is a universal constant, it follows that there exists a uniform positive lower bound for

$$\displaystyle{ d_{WP}(X_{m},\varphi _{m}(X_{m})) }$$

for all m sufficiently large, a contradiction with our choice of the sequences \(X_{m} \in \mathcal{ T}\) and φ m ∈ MCG(S)[k].

These contradictions show that sequences \(X_{m} \in \mathcal{ T}\) and φ m ∈ MCG(S)[k] with the properties described above cannot exist.

This completes the proof of the lemma.

6.3.5 Item (VI) of Theorem 3 for WP Flow

We complete in this subsection our discussion of the proof of Theorem 1 modulo Theorem 3 by verifying the item (VI) of Theorem 3 for the WP geodesic flow φ t . More precisely, we will show the following result:

Theorem 8

There are constants C ≥ 1, β > 0, δ > 0 and ρ 0 > 0 such that

$$\displaystyle{ \|D_{v}\varphi _{\tau }\|_{WP} \leq C/\rho _{\tau }(v)^{\beta } }$$

for any 0 ≤ τ ≤ δ and any \(v \in T^{1}\mathcal{T}\) with

$$\displaystyle{ 0 <\rho _{\tau }(v):=\min \{ d_{WP}(\varphi _{t}(v),\partial \mathcal{T}): t \in [-\tau,\tau ]\} <\rho _{0}. }$$

The proof of this result in [BMW] is naturally divided into two steps.

In the first step, one shows a general result providing an estimate for the first derivative of the geodesic flow φ t on an arbitrary negatively curved manifold:

Theorem 9

Let M be a negatively curved manifold. Consider γ: [−τ, τ] → M a geodesic where 0 ≤ τ ≤ 1 and suppose that for every −τ ≤ t ≤ τ the sectional curvature of any plane containing \(\dot{\gamma }(t) \in T^{1}M\) is greater than −κ(t) 2 for some Lipschitz function \(\kappa: [-\tau,\tau ] \rightarrow \mathbb{R}_{+}\) .

Then,

$$\displaystyle{ \|D_{\dot{\gamma }(0)}\varphi _{\tau }\| \leq 1 + 2(1 + u(0)^{2})(1 + \sqrt{1 + u(\tau )^{2}})\exp \left (\int _{ 0}^{\tau }u(s)ds\right ) }$$

where u: [−τ, τ] → [0, ∞) is the solution of the Riccati equation

$$\displaystyle{ u' + u^{2} =\kappa ^{2} }$$

with initial data u(−τ) = 0.
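To get a feeling for the right-hand side of this estimate, consider the model case of a constant function κ ≡ k 0 (an illustrative choice of ours, not taken from [BMW]). In this case the Riccati equation has the explicit solution u(t) = k 0 tanh(k 0 (t + τ)). The following Python sketch integrates the equation numerically with a classical Runge–Kutta scheme, compares the result with the explicit solution, and evaluates the bound of Theorem 9; all function names are hypothetical.

```python
import math


def solve_riccati(kappa, tau, steps=2000):
    """Integrate u' = kappa(t)^2 - u^2 on [-tau, tau], u(-tau) = 0, by RK4.

    Returns the lists of times and of values of u (steps must be even)."""
    h = 2 * tau / steps
    rhs = lambda t, u: kappa(t) ** 2 - u**2
    ts, us = [-tau], [0.0]
    for i in range(steps):
        t, u = ts[-1], us[-1]
        k1 = rhs(t, u)
        k2 = rhs(t + h / 2, u + h * k1 / 2)
        k3 = rhs(t + h / 2, u + h * k2 / 2)
        k4 = rhs(t + h, u + h * k3)
        ts.append(-tau + (i + 1) * h)
        us.append(u + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return ts, us


def derivative_bound(ts, us):
    """Right-hand side of the estimate in Theorem 9; the integral of u
    over [0, tau] is approximated by the trapezoidal rule."""
    mid = len(ts) // 2  # index of the time t = 0
    h = ts[1] - ts[0]
    integral = sum((us[i] + us[i + 1]) * h / 2 for i in range(mid, len(ts) - 1))
    u0, u_tau = us[mid], us[-1]
    return 1 + 2 * (1 + u0**2) * (1 + math.sqrt(1 + u_tau**2)) * math.exp(integral)


k0, tau = 3.0, 0.5
ts, us = solve_riccati(lambda t: k0, tau)
exact = [k0 * math.tanh(k0 * (t + tau)) for t in ts]
max_error = max(abs(u - e) for u, e in zip(us, exact))
bound = derivative_bound(ts, us)
```

For constant κ the integral of u can also be computed by hand, namely it equals log(cosh(2k 0 τ)∕cosh(k 0 τ)), which gives an independent check of the numerical value of the bound.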

Remark 30

The proof of this theorem involves classical objects in Differential Geometry (e.g., Jacobi fields, matrix Riccati equation, Sasaki metric, etc.), but we will not make more comments on this topic because it is not directly related to the geometry of moduli spaces of Riemann surfaces. Instead, we refer the curious reader to the original article [BMW] of Burns–Masur–Wilkinson (or the paper [Bu] in this volume).

In the second step, one uses the works of Wolpert to exhibit an adequate bound κ(t) for the sectional curvatures of the WP metric along WP geodesics γ(t). More concretely, one has the following theorem:

Theorem 10

There are constants Q, P, L ≥ 2 and 0 < δ < 1 such that for any 0 < δ′ < δ and any geodesic segment \(\gamma: (-\delta ',\delta ') \rightarrow \mathcal{ T}\) there exists a positive Lipschitz function \(\kappa: (-\delta ',\delta ') \rightarrow \mathbb{R}_{+}\) with

  1. (a)

    \(\sup \limits _{v\in T_{\gamma (t)}^{1}\mathcal{T}} -\langle R_{WP}(v,\dot{\gamma }(t))\dot{\gamma }(t),v\rangle _{WP} \leq \kappa ^{2}(t)\) for all t ∈ (−δ′, δ′);

  2. (b)

    κ is Q-controlled in the sense that κ has a right-derivative D + κ satisfying

    $$\displaystyle{ D^{+}\kappa \geq \frac{1 - Q^{2}} {Q} \kappa ^{2} }$$
  3. (c)

    \(\int _{-\delta '}^{\delta '}\kappa (s)ds \leq L\vert \log \rho _{\delta '}(\dot{\gamma }(0))\vert\) ;

  4. (d)

    \(\max \{\kappa (0),\kappa (\delta ')\} \leq P/\rho _{\delta '}(\dot{\gamma }(0))\) .

where \(\rho _{\delta '}(\dot{\gamma }(0))\) is the distance between the geodesic segment γ([−δ′, δ′]) and \(\partial \mathcal{T}\) .

Using Theorems 9 and 10, we can easily complete the proof of Theorem 8 (i.e., the verification of item (VI) of Theorem 3 for the WP metric):

Proof (Proof of Theorem 8)

Denote by κ the “WP curvature bound” function provided by Theorem 10 and let \(u: [-\delta,\delta ] \rightarrow \mathbb{R}_{+}\) be the solution of Riccati’s equation

$$\displaystyle{ u' + u^{2} =\kappa ^{2} }$$

with initial data u(−δ) = 0.

Since κ is Q-controlled (in the sense of item (b) of Theorem 10), it follows that u(t) ≤ Qκ(t) for all t ∈ [−δ, δ]: indeed, this is so because u(−δ) = 0 ≤ Qκ(−δ), and, if u(t 0) = Qκ(t 0) for some t 0 ∈ [−δ, δ], then

$$\displaystyle{ u'(t_{0}) =\kappa (t_{0})^{2} - u(t_{ 0})^{2} = (1 - Q^{2})\kappa (t_{ 0})^{2} \leq Q \cdot D^{+}\kappa (t_{ 0}). }$$

Therefore, by applying Theorem 9 in this setting, we deduce that

$$\displaystyle{ \|D_{\dot{\gamma }(0)}\varphi _{\tau }\|_{WP} \leq C/\rho _{\tau }(\dot{\gamma }(0))^{\beta } }$$

for β = L + 3 and some constant C = C(P, Q) ≥ 1. This completes the proof of Theorem 8.
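The comparison u ≤ Qκ used in this proof can be tested numerically on a model. In the Python sketch below, we take the function κ(t) = max{1, r∕(r|t − t 0| + f)} with illustrative parameters r, f, t 0 chosen by us; a direct computation shows that its right-derivative is either 0, κ 2 or −κ 2, so that it is Q-controlled for Q = 2 (because −1 ≥ (1 − Q 2)∕Q = −3∕2). We then integrate the Riccati equation with a Runge–Kutta scheme and record the largest value of the ratio u∕κ. The names and parameters are hypothetical and serve only as an illustration.

```python
def kappa(t, r=1.0, f=0.02, t0=-0.3):
    """Model curvature bound max{1, r / (r |t - t0| + f)}."""
    return max(1.0, r / (r * abs(t - t0) + f))


def max_ratio(delta=0.9, steps=40000):
    """Integrate u' = kappa^2 - u^2 with u(-delta) = 0 by RK4 on
    [-delta, delta] and return the largest value of u / kappa."""
    h = 2 * delta / steps
    rhs = lambda t, u: kappa(t) ** 2 - u**2
    t, u, worst = -delta, 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, u)
        k2 = rhs(t + h / 2, u + h * k1 / 2)
        k3 = rhs(t + h / 2, u + h * k2 / 2)
        k4 = rhs(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        worst = max(worst, u / kappa(t))
    return worst


Q = 2.0
ratio = max_ratio()
```

In this model the solution u overtakes κ once κ starts decreasing, but the ratio u∕κ stays below Q = 2, in agreement with the argument above.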

Closing this subsection, let us sketch the proof of Theorem 10 while referring to Sect. 4.4 of Burns–Masur–Wilkinson paper [BMW] (especially Proposition 4.22 of this article) for more details.

We start by describing how the function κ is defined. For this sake, we will use Wolpert’s formulas in Theorem 5 above.

More precisely, since the sectional curvatures of the WP metric approach 0 or −∞ only near the boundary, we can assumeFootnote 12 that our geodesic segment \(\gamma: [-\delta ',\delta '] \rightarrow \mathcal{ T}\) in the statement of Theorem 10 is “relatively close” to a boundary stratum \(\mathcal{T}_{\sigma }\), \(\sigma \in \mathcal{ C}(S)\).

In this setting, for each α ∈ σ, we consider the functions \(f_{\alpha }(t):= \sqrt{\ell_{\alpha }(t)}\) and

$$\displaystyle{ r_{\alpha }(t):= \sqrt{\langle \lambda _{\alpha },\dot{\gamma } (t)\rangle ^{2 } +\langle J\lambda _{\alpha },\dot{\gamma } (t)\rangle ^{2}} }$$

(where \(\lambda _{\alpha }:= \text{grad}\,\ell_{\alpha }^{1/2}\)) along our geodesic segment \(\gamma: I \rightarrow \mathcal{ T}\), I = (−δ′, δ′). Note that it is natural to consider these functions in view of the statements in Wolpert’s formulas in Theorem 5.

The WP sectional curvatures of planes containing the tangent vectors to γ(I) are controlled in terms of r α and f α . Indeed, given \(v \in T_{\gamma (t)}^{1}\mathcal{T}\), we can use a combined length basis \((\sigma,\chi ) \in \mathcal{ B}\) to write

$$\displaystyle{ v:=\sum \limits _{\alpha \in \sigma }(a_{\alpha }\lambda _{\alpha } + b_{\alpha }J\lambda _{\alpha }) +\sum \limits _{\beta \in \chi }c_{\beta }\text{grad}\,\ell_{\beta } }$$

Similarly, let us write

$$\displaystyle{ \dot{\gamma }(t) =\dot{\gamma }:=\sum \limits _{\alpha \in \sigma }(A_{\alpha }\lambda _{\alpha } + B_{\alpha }J\lambda _{\alpha }) +\sum \limits _{\beta \in \chi }C_{\beta }\text{grad}\,\ell_{\beta } }$$

By Theorem 5, we obtain the following facts. Firstly, since v and \(\dot{\gamma }\) are WP-unit vectors, the coefficients a α , b α , c α , A α , B α , C α are

$$\displaystyle{ a_{\alpha },b_{\alpha },c_{\alpha },A_{\alpha },B_{\alpha },C_{\alpha } = O(1) }$$

Secondly, by definition of r α , we have that

$$\displaystyle{ r_{\alpha }^{2} = \frac{1} {4\pi ^{2}}(A_{\alpha }^{2} + B_{\alpha }^{2}) + O(\,f_{\alpha }^{3}) }$$

Finally,

$$\displaystyle\begin{array}{rcl} -\langle R_{WP}(v,\dot{\gamma })\dot{\gamma },v\rangle _{WP}& =& \sum \limits _{\alpha \in \sigma }(a_{\alpha }^{2}B_{\alpha }^{2} + A_{\alpha }^{2}b_{\alpha }^{2})\langle R_{ WP}(\lambda _{\alpha },J\lambda _{\alpha })J\lambda _{\alpha },\lambda _{\alpha }\rangle _{WP} + O(1) {}\\ & =& \sum \limits _{\alpha \in \sigma }O\left ( \frac{r_{\alpha }^{2}} {f_{\alpha }^{2}}\right ) + O(1) {}\\ \end{array}$$

In summary, Wolpert’s formulas (Theorem 5) imply that

$$\displaystyle{ \sup \limits _{v\in T_{\gamma (t)}^{1}\mathcal{T}} -\langle R_{WP}(v,\dot{\gamma }(t))\dot{\gamma }(t),v\rangle _{WP} =\sum \limits _{\alpha \in \sigma }O\left ( \frac{r_{\alpha }(t)^{2}} {f_{\alpha }(t)^{2}}\right ) }$$
(6.2)

(cf. Lemma 4.17 in [BMW]).

Now, we want to convert the expressions r α (t)∕f α (t) into a positive Lipschitz function satisfying the properties described in items (b), (c), and (d) of Theorem 10, i.e., a Q-controlled function with appropriately bounded total integral and values at 0 and δ′. We will not give full details on this (and we refer the curious reader to Sect. 4.4 of [BMW]), but, as it turns out, the function

$$\displaystyle{ \kappa (t):= C\max _{\alpha \in \sigma }\left \{1, \frac{r_{\alpha }(t_{\alpha })} {r_{\alpha }(t_{\alpha })\vert t - t_{\alpha }\vert + f_{\alpha }(t_{\alpha })}\right \} }$$

where t α ∈ [−δ′, δ′] is the (unique) time with f α (t) ≥ f α (t α ) for all t ∈ [−δ′, δ′] and C ≥ 1 is a sufficiently large constant, satisfies the conditions in items (a), (b), (c) and (d) of Theorem 10. Here, the basic idea is that these properties are consequences of the features of two ODE’s (cf. Lemmas 4.15 and 4.16 in [BMW]) for r α and f α . For instance, the verification of item (a) (i.e., the fact that κ controls certain WP sectional curvatures along γ) relies on the fact that these two ODE’s allow one to prove that

$$\displaystyle{ \frac{r_{\alpha }(t)} {f_{\alpha }(t)} \leq A\max \left \{1, \frac{r_{\alpha }(t_{\alpha })} {r_{\alpha }(t_{\alpha })\vert t - t_{\alpha }\vert + f_{\alpha }(t_{\alpha })}\right \} }$$

for some sufficiently large constant A ≥ 1. In particular, by plugging this into (6.2), we obtain that

$$\displaystyle{ \sup \limits _{v\in T_{\gamma (t)}^{1}\mathcal{T}} -\langle R_{WP}(v,\dot{\gamma }(t))\dot{\gamma }(t),v\rangle _{WP} \leq \kappa ^{2}(t), }$$

i.e., the estimate required by item (a) of Theorem 10.

Concluding this sketch of proof of Theorem 10, let us indicate the two ODE’s on r α and f α .

Lemma 5 (Lemma 4.15 of [BMW])

\(r_{\alpha }^{{\prime}}(t) = O(\,f_{\alpha }(t)^{3})\).

Proof

By differentiating \(r_{\alpha }(t)^{2} =\langle \lambda _{\alpha },\dot{\gamma }(t)\rangle ^{2} +\langle J\lambda _{\alpha },\dot{\gamma }(t)\rangle ^{2}\), we see that

$$\displaystyle{ 2r_{\alpha }(t)r_{\alpha }^{{\prime}}(t) = 2\langle \lambda _{\alpha },\dot{\gamma }(t)\rangle \langle \nabla _{\dot{\gamma } (t)}\lambda _{\alpha },\dot{\gamma }(t)\rangle + 2\langle J\lambda _{\alpha },\dot{\gamma }(t)\rangle \langle J\nabla _{\dot{\gamma }(t)}\lambda _{\alpha },\dot{\gamma }(t)\rangle. }$$

Here, we used the fact that the WP metric is Kähler, so that J is parallel (“commutes with ∇”).

Now, we observe that, by Wolpert’s formulas (cf. Theorem 5), one can write \(\nabla _{\dot{\gamma }(t)}\lambda _{\alpha }\) and \(J\nabla _{\dot{\gamma }(t)}\lambda _{\alpha }\) as

$$\displaystyle{ \nabla _{\dot{\gamma }(t)}\lambda _{\alpha } = \frac{3\langle \dot{\gamma }(t),J\lambda _{\alpha }\rangle } {2\pi f_{\alpha }(t)} J\lambda _{\alpha } + O(\,f_{\alpha }(t)^{3}) }$$

and

$$\displaystyle{ J\nabla _{\dot{\gamma }(t)}\lambda _{\alpha } = -\frac{3\langle \dot{\gamma }(t),J\lambda _{\alpha }\rangle } {2\pi f_{\alpha }(t)} \lambda _{\alpha } + O(\,f_{\alpha }(t)^{3}). }$$

Since \(\max \{\vert \langle \dot{\gamma }(t),\lambda _{\alpha }\rangle \vert,\vert \langle \dot{\gamma }(t),J\lambda _{\alpha }\rangle \vert \}\leq r_{\alpha }(t)\) (by definition), we conclude from the previous equations that

$$\displaystyle\begin{array}{rcl} 2r_{\alpha }(t)r_{\alpha }^{{\prime}}(t)& =& \frac{3} {\pi f_{\alpha }(t)}(\langle \lambda _{\alpha },\dot{\gamma }(t)\rangle \langle J\lambda _{\alpha },\dot{\gamma }(t)\rangle ^{2} -\langle \lambda _{\alpha },\dot{\gamma }(t)\rangle \langle J\lambda _{\alpha },\dot{\gamma }(t)\rangle ^{2}) {}\\ & & +O(r_{\alpha }(t)f_{\alpha }(t)^{3}) {}\\ & =& 0 + O(r_{\alpha }(t)f_{\alpha }(t)^{3}). {}\\ \end{array}$$

This proves the lemma.

Remark 31

This ODE is an analog for the WP metric of Clairaut’s relation for the “model metric” on the surface of revolution of the profile y = x 3.

Lemma 6 (Lemma 4.16 of [BMW])

$$\displaystyle{ r_{\alpha }(t)^{2} = f_{\alpha }^{{\prime}}(t)^{2} + \frac{2\pi } {3}f_{\alpha }(t)f_{\alpha }^{{\prime\prime}}(t) + O(\,f_{\alpha }(t)^{4}) }$$

Proof

By definition, \(\lambda _{\alpha } = \text{grad}\,\ell_{\alpha }^{1/2}\) and \(f_{\alpha }(t) =\ell_{\alpha }^{1/2}(\gamma (t))\), so that

$$\displaystyle{ f_{\alpha }^{{\prime}}(t) =\langle \lambda _{\alpha },\dot{\gamma }(t)\rangle. }$$

Differentiating this equality and using Wolpert’s formulas (Theorem 5), we see that

$$\displaystyle{ f_{\alpha }^{{\prime\prime}}(t) =\langle \nabla _{\dot{\gamma } (t)}\lambda _{\alpha },\dot{\gamma }(t)\rangle = \frac{3} {2\pi f_{\alpha }(t)}\langle \dot{\gamma }(t),J\lambda _{\alpha }\rangle ^{2} + O(\,f_{\alpha }(t)^{3}) }$$

(Here, we used in the first equality the fact that γ is a geodesic, i.e., \(\ddot{\gamma }(t) = 0\).)

It follows that

$$\displaystyle\begin{array}{rcl} \frac{2\pi } {3}f_{\alpha }(t)f_{\alpha }^{{\prime\prime}}(t) + f_{\alpha }^{{\prime}}(t)^{2}& =& \langle \dot{\gamma }(t),J\lambda _{\alpha }\rangle ^{2} +\langle \dot{\gamma } (t),\lambda _{\alpha }\rangle ^{2} + O(\,f_{\alpha }(t)^{4}) {}\\ & =:& r_{\alpha }(t)^{2} + O(\,f_{\alpha }(t)^{4}). {}\\ \end{array}$$

This proves the lemma.

At this point, the conclusion is that the WP metric (on \(\mathcal{M}(S)[3]\)) satisfies items (I) to (VI) of Theorem 3, so that the desired ergodicity (and mixing) result of Theorem 1 follows.

6.4 Decay of Correlations for the Weil–Petersson Geodesic Flow

Our goal in this section is to discuss the proof of Theorem 2 on the rates of mixing of the Weil–Petersson (WP) geodesic flow on the unit tangent bundle \(T^{1}\mathcal{M}_{g,n}\) of the moduli space \(\mathcal{M}_{g,n}\) of Riemann surfaces of genus g ≥ 0 with n ≥ 0 punctures for 3g − 3 + n ≥ 1.

Let us recall that, by the Burns–Masur–Wilkinson theorem (cf. Theorem 1), the WP flow \(\varphi _{t}\) on \(T^{1}\mathcal{M}_{g,n}\) is mixing with respect to the Liouville measure μ whenever 3g − 3 + n ≥ 1.

By definition of the mixing property, this means that the correlation function \(C_{t}(\,f,g):=\int f \cdot g \circ \varphi _{t}d\mu -\left (\int fd\mu \right )\left (\int gd\mu \right )\) converges to 0 as \(t \rightarrow \infty\) for any given \(L^{2}\)-integrable observables f and g. (See, e.g., Hasselblatt’s text [Ha].)

Given this scenario, it is natural to ask how fast the correlation function C t ( f, g) converges to zero. In general, the correlation function C t ( f, g) can decay to 0 (as a function of t) in a slow way depending on the choice of the observables. Nevertheless, it is often the case (for mixing flows with some hyperbolicity) that the correlation function C t ( f, g) decays to 0 with a definite (e.g., polynomial, exponential, etc.) speed when restricting the observables to appropriate spaces of “reasonably smooth” functions.

In other words, given a mixing flow (with some hyperbolicity), it is usually possible to choose appropriate functional (e.g., Hölder, C r, Sobolev, etc.) spaces X and Y such that

  • \(\vert C_{t}(\,f,g)\vert \leq C\Vert f\Vert _{X}\Vert g\Vert _{Y }t^{-n}\) for some constants C > 0, \(n \in \mathbb{N}\) and for all t ≥ 1 (polynomial decay),

  • or \(\vert C_{t}(\,f,g)\vert \leq C\Vert f\Vert _{X}\Vert g\Vert _{Y }e^{-ct}\) for some constants C > 0, c > 0 and for all t ≥ 1 (exponential decay).
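To make the two regimes above concrete, here is a minimal Python sketch (ours, not taken from [BMMW]) of how one might tell them apart from sampled values of the correlation function: polynomial decay shows up as a straight line of slope −n when log |C_t| is plotted against log t, while exponential decay shows up as a straight line of slope −c when log |C_t| is plotted against t. The function names and the synthetic data are hypothetical and serve only as an illustration.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of ys as a linear function of xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def polynomial_exponent(times, corr):
    """Estimate n in |C_t| ~ C t^(-n) from the slope of log|C_t| vs log t."""
    logs_t = [math.log(t) for t in times]
    logs_c = [math.log(abs(c)) for c in corr]
    return -fit_slope(logs_t, logs_c)

def exponential_rate(times, corr):
    """Estimate c in |C_t| ~ C e^(-c t) from the slope of log|C_t| vs t."""
    logs_c = [math.log(abs(c)) for c in corr]
    return -fit_slope(list(times), logs_c)

# Synthetic (made-up) correlation data, for illustration only.
times = [float(t) for t in range(1, 51)]
poly_data = [5.0 * t ** -3 for t in times]           # decays like t^(-3)
exp_data = [5.0 * math.exp(-0.7 * t) for t in times]  # decays like e^(-0.7 t)

print(polynomial_exponent(times, poly_data))
print(exponential_rate(times, exp_data))
```

On real data neither plot is exactly linear, of course, and the choice of the functional spaces X and Y enters through the constant C rather than through the slopes.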

Evidently, the “precise” rate of mixing of the flow (i.e., the sharp values of the constants C > 0, \(n \in \mathbb{N}\) and/or c > 0 above) depends on the choice of the functional spaces X and Y (e.g., these values might change if we replace \(C^{1}\) observables by \(C^{2}\) observables, say). On the other hand, the qualitative speed of decay of \(C_{t}(\,f,g)\), that is, the fact that \(C_{t}(\,f,g)\) decays polynomially or exponentially as \(t \rightarrow \infty\) whenever f and g are “reasonably smooth”, tends to remain unchanged if we select X and Y from a well-behaved scale of functional spaces (like \(C^{r}\) spaces, \(r \in \mathbb{N}\), or \(H^{s}\) spaces, s > 0). In particular, this partly explains why in the Dynamical Systems literature one simply says that a given mixing flow \(\varphi _{t}\) has “polynomial decay” or “exponential decay”: usually we are interested in the qualitative behavior of the correlation function for reasonably smooth observables, but the particular choice of functional spaces X and Y is normally treated as a “technical detail”.

After this brief description of the notion of rate of mixing (speed of decay of correlation functions), let us restate Theorem 2 as two separate results (for the sake of convenience).

Theorem 11

The rate of mixing of the WP flow φ t on \(T^{1}\mathcal{M}_{g,n}\) is at most polynomial when 3g − 3 + n > 1.

Theorem 12

The rate of mixing of the WP flow φ t on \(T^{1}\mathcal{M}_{g,n}\) is rapid (faster than any polynomial) when 3g − 3 + n = 1.

Remark 32

These results were announced in [BMMW]. Since then, Burns, Masur, Wilkinson and I have found some evidence indicating that the Weil–Petersson geodesic flow on \(T^{1}\mathcal{M}_{g,n}\) is actually exponentially mixing when 3g − 3 + n = 1. The details will hopefully appear in a forthcoming paper (currently still in preparation).

Remark 33

An open problem left by Theorem 11 is to determine the rate of mixing of the WP flow on \(T^{1}\mathcal{M}_{g,n}\) for 3g − 3 + n > 1. Indeed, while this theorem provides a polynomial upper bound for the rate of mixing in this setting, it does not rule out the possibility that the actual rate of mixing of the WP flow is sub-polynomial (even for reasonably smooth observables). Heuristically speaking, we believe that the sectional curvatures of the WP metric control the time spent by WP geodesics near the boundary of \(\overline{\mathcal{M}}_{g,n}\). In particular, it seems that the problem of determining the rate of mixing of the WP flow (when 3g − 3 + n > 1) is somewhat related to the issue of finding suitable (polynomial?) bounds for how close to zero the sectional curvatures of the WP metric can be (in terms of the distance to the boundary of \(\overline{\mathcal{M}}_{g,n}\)). Unfortunately, the best available bounds for the sectional curvatures of the WP metric (due to Wolpert) do not rule out the possibility that some of these quantities get extremely close to zero (see Remark 20 above).

The difference in the rates of mixing of the WP flow on \(T^{1}\mathcal{M}_{g,n}\) when 3g − 3 + n > 1 or 3g − 3 + n = 1 in Theorem 2 reflects the following simple (yet important) feature of the WP metric near the boundary of the Deligne–Mumford compactification of \(\mathcal{M}_{g,n}\).

In the case 3g − 3 + n = 1, e.g., g = 1 = n, the moduli space \(\mathcal{M}_{1,1} \simeq \mathbb{H}/PSL(2, \mathbb{Z})\) equipped with the WP metric looks like the surface of revolution of the profile \(\{v = u^{3}: 0 <u \leq 1\}\) near the cusp at infinity (see Remark 21 above). Thus, even though an ɛ-neighborhood of the cusp is “polynomially large” (with area \(\sim \varepsilon ^{4}\)), the Gaussian curvature approaches \(-\infty\) near the cusp and, as it turns out, this strong negative curvature forces every geodesic not pointing directly towards the cusp to come back to the compact part in bounded (say ≤ 1) time. In other words, the excursions of infinite WP geodesics on \(\mathcal{M}_{1,1}\) near the cusp are so quick that the WP flow on \(T^{1}\mathcal{M}_{1,1}\) is “close” to a classical Anosov geodesic flow on a negatively curved compact surface. In particular, it is not entirely surprising that the WP flow on \(T^{1}\mathcal{M}_{1,1}\) is rapidly mixing.
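The two features of the model surface just mentioned can be checked numerically. The following Python sketch (ours) uses the classical formula \(K = -\phi ''/(\phi (1 +\phi '^{2})^{2})\) for the Gaussian curvature of the surface of revolution of a profile v = ϕ(u), together with the area element \(2\pi \phi \sqrt{1 +\phi '^{2}}\,du\). As a simplifying assumption, the neighborhood of the cusp is described by the coordinate u (which is comparable to the distance to the cusp for small u) rather than by the distance itself. The function names are hypothetical.

```python
import math

def gaussian_curvature(u, r=3.0):
    """Gaussian curvature at parameter u of the surface of revolution of
    v = u^r, via K = -phi'' / (phi * (1 + phi'^2)^2) with phi(u) = u^r."""
    phi = u ** r
    dphi = r * u ** (r - 1)
    d2phi = r * (r - 1) * u ** (r - 2)
    return -d2phi / (phi * (1.0 + dphi ** 2) ** 2)

def cusp_area(eps, r=3.0, steps=10000):
    """Area of the region {0 < u <= eps}, computed with the midpoint rule
    applied to the area element 2*pi*phi(u)*sqrt(1 + phi'(u)^2) du."""
    h = eps / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        dphi = r * u ** (r - 1)
        total += 2.0 * math.pi * u ** r * math.sqrt(1.0 + dphi ** 2) * h
    return total

for u in (0.1, 0.01, 0.001):
    print(u, gaussian_curvature(u))   # curvature tends to -infinity like -6/u^2

# Halving eps divides the area by roughly 2^4 = 16, i.e., area ~ eps^4.
print(cusp_area(0.1) / cusp_area(0.05))
```

For r = 3 the curvature formula reduces to \(-6/(u^{2}(1 + 9u^{4})^{2})\), which makes the blow-up near u = 0 explicit.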

On the other hand, in the case 3g − 3 + n > 1, the WP metric on \(\mathcal{M}_{g,n}\) has some sectional curvatures close to zero near the boundary of the Deligne–Mumford compactification \(\overline{\mathcal{M}}_{g,n}\) of \(\mathcal{M}_{g,n}\) (cf. Remark 20). By exploiting this feature of the WP metric on \(\mathcal{M}_{g,n}\) for 3g − 3 + n > 1 (that has no counterpart for \(\mathcal{M}_{1,1}\) or \(\mathcal{M}_{0,4}\)), we will build a non-negligible set of WP geodesics spending a long time near the boundary of \(\overline{\mathcal{M}}_{g,n}\) before eventually getting into the compact part. In this way, we will deduce that the WP flow on \(\mathcal{M}_{g,n}\) takes a fair (polynomial) amount of time to mix certain parts of the boundary of \(\overline{\mathcal{M}}_{g,n}\) with fixed compact subsets of \(\mathcal{M}_{g,n}\).

In the remainder of this text, we will give some details of the proof of Theorem 2 (or, equivalently, Theorems 11 and 12). In the next subsection, we give a fairly complete proof of the polynomial upper bound on the rate of mixing of the WP flow on \(T^{1}\mathcal{M}_{g,n}\) when 3g − 3 + n > 1. After that, in the final subsection, we provide a sketch of the proof of the rapid mixing property of the WP flow on \(T^{1}\mathcal{M}_{1,1}\). In fact, we decided (for pedagogical reasons) to explain some key points of the rapid mixing property only in the toy model case of a negatively curved surface with one cusp corresponding exactly to a surface of revolution of a profile \(\{v = u^{r}\}\), r > 3. In this way, since the WP metric near the cusp of \(\mathcal{M}_{1,1} \simeq \mathbb{H}/PSL(2, \mathbb{Z})\) can be thought of as a “perturbation” of the surface of revolution of the “borderline profile” \(\{v = u^{3}\}\) with r = 3 (thanks to Wolpert’s asymptotic formulas), the reader hopefully will get a flavor of the main ideas behind the proof of rapid mixing of the WP flow on \(\mathcal{M}_{1,1}\) without getting into the (somewhat boring) technical details needed to check that the arguments used in the toy model case are “sufficiently robust” so that they can be “carried over” to the “perturbative setting” of the WP flow on \(T^{1}\mathcal{M}_{1,1}\).

6.4.1 Rates of Mixing of the WP Flow on \(T^{1}\mathcal{M}_{g,n}\) I: Proof of Theorem 11

In this subsection, our notations are the same as in Sect. 6.3.

Given ɛ > 0, let us consider the portion of \(\mathcal{M}_{g,n}\) consisting of \(X \in \mathcal{ M}_{g,n}\) such that a nonseparating (homotopically nontrivial, nonperipheral) simple closed curve α has hyperbolic length \(\ell_{\alpha }(X) \leq (2\varepsilon )^{2}\). The following picture illustrates this portion of \(\mathcal{M}_{g,n}\) as a \((2\varepsilon )^{2}\)-neighborhood of the stratum \(\mathcal{T}_{\alpha }/MCG_{g,n}\) of the boundary of the Deligne–Mumford compactification \(\overline{\mathcal{M}}_{g,n}\) where α gets pinched (i.e., \(\ell_{\alpha }\) becomes zero).

Note that the stratum \(\mathcal{T}_{\alpha }/MCG_{g,n}\) is nontrivial (that is, not reduced to a single point) when 3g − 3 + n > 1. Indeed, by pinching α as above and by disconnecting the resulting node, we obtain Riemann surfaces of genus g − 1 with n + 2 punctures whose moduli space is isomorphic to \(\mathcal{T}_{\alpha }/MCG_{g,n}\). It follows that \(\mathcal{T}_{\alpha }/MCG_{g,n}\) is a complex orbifold of dimension 3(g − 1) − 3 + (n + 2) = 3g − 3 + n − 1 > 0, and, a fortiori, \(\mathcal{T}_{\alpha }/MCG_{g,n}\) is not trivial. Evidently, this argument breaks down when 3g − 3 + n = 1: for example, by pinching a curve α as above in a once-punctured torus and by removing the resulting node, we obtain thrice punctured spheres (whose moduli space \(\mathcal{M}_{0,3} =\{ \overline{\mathbb{C}} -\{ 0,1,\infty \}\}\) is trivial). In particular, our Fig. 6.5 concerns exclusively the case 3g − 3 + n > 1.

Fig. 6.5 A portion of the boundary of \(\mathcal{M}_{g,n}\) (when 3g − 3 + n > 1)

We want to locate certain regions near \(\mathcal{T}_{\alpha }/MCG_{g,n}\) taking a long time to mix with the compact part of \(\mathcal{M}_{g,n}\). For this sake, we will exploit the geometry of the WP metric near \(\mathcal{T}_{\alpha }/MCG_{g,n}\)—e.g., Wolpert’s formulas in Theorem 5—to build nice sets of unit vectors traveling in an “almost parallel” way to \(\mathcal{T}_{\alpha }/MCG_{g,n}\) for a significant amount of time.

More precisely, we consider the vectors \(\lambda _{\alpha }:= \text{grad}(\ell_{\alpha }^{1/2})\) and \(J\lambda _{\alpha }\) (where J is the complex structure). By definition, they span a complex line \(L = \text{span}\{\lambda _{\alpha },J\lambda _{\alpha }\}\). Intuitively, the complex line L points in the normal direction to a “copy” of \(\mathcal{T}_{\alpha }/MCG_{g,n}\) inside a level set of the function \(\ell_{\alpha }^{1/2}\) as indicated below:

Using the complex line L, we can formalize the notion of “almost parallel” vector to \(\mathcal{T}_{\alpha }/MCG_{g,n}\). Indeed, given \(v \in T^{1}\mathcal{M}_{g,n}\), let us denote by r α (v) the quantity \(r_{\alpha }(v):= \sqrt{\langle v,\lambda _{\alpha } \rangle ^{2 } +\langle v, J\lambda _{\alpha }\rangle ^{2}}\) (where 〈⋅ , ⋅ 〉 is the WP metric). By definition, r α (v) measures the size of the projection of the unit vector v in the complex line L. In particular, we can think of v as “almost parallel” to \(\mathcal{T}_{\alpha }/MCG_{g,n}\) whenever the quantity r α (v) is close to zero.

In this setting, we will show that unit vectors almost parallel to \(\mathcal{T}_{\alpha }/MCG_{g,n}\) whose footprints are close to \(\mathcal{T}_{\alpha }/MCG_{g,n}\) always generate geodesics staying near \(\mathcal{T}_{\alpha }/MCG_{g,n}\) for a long time. More concretely, given ɛ > 0, let us define the set

$$\displaystyle{ V _{\varepsilon }:=\{ v \in T^{1}\mathcal{M}_{ g,n}: f_{\alpha }(v) \leq \varepsilon,\,r_{\alpha }(v) \leq \varepsilon ^{2}\} }$$

where \(f_{\alpha }(v):=\ell_{\alpha }^{1/2}(\,p)\) and \(p \in \mathcal{ M}_{g,n}\) is the footprint of the unit vector \(v \in T^{1}\mathcal{M}_{g,n}\). Equivalently, \(V _{\varepsilon }\) is the disjoint union of the pieces of spheres \(S_{\varepsilon }(\,p):=\{ v \in T_{p}^{1}\mathcal{M}_{g,n}: r_{\alpha }(v) \leq \varepsilon ^{2}\}\) attached to points \(p \in \mathcal{ M}_{g,n}\) with \(\ell_{\alpha }(\,p) \leq \varepsilon ^{2}\). The following figure summarizes the geometry of \(S_{\varepsilon }(\,p)\):

We would like to prove that a geodesic \(\gamma _{v}(t)\) originating at any \(v \in V _{\varepsilon }\) stays in a \((2\varepsilon )^{2}\)-neighborhood of \(\mathcal{T}_{\alpha }/MCG_{g,n}\) for an interval of time [0, T] of size of order 1∕ɛ, so that the WP geodesic flow does not mix \(V _{\varepsilon }\) with any fixed ball U in the compact part of \(\mathcal{M}_{g,n}\) of Riemann surfaces with systole \(> (2\varepsilon )^{2}\):

In this direction, we will need the following estimate from Lemma 5 above: if γ(t) is a WP geodesic as above, and we write \(r_{\alpha }(t) = r_{\alpha }(\dot{\gamma }(t))\) and \(f_{\alpha }(t) =\ell_{\alpha }^{1/2}(\gamma (t))\), then

$$\displaystyle{ r_{\alpha }^{{\prime}}(t) = O(\,f_{\alpha }(t)^{3}) }$$

From this estimate, it is not hard to bound from below the amount of time spent by a geodesic \(\gamma _{v}(t)\) near \(\mathcal{T}_{\alpha }/MCG_{g,n}\) for an arbitrary \(v \in V _{\varepsilon }\):

Lemma 7

There exists a constant C 0 > 0 (depending only on g and n) such that

$$\displaystyle{ \ell_{\alpha }^{1/2}(\gamma _{ v}(t)) = f_{\alpha }(t) \leq 2\varepsilon }$$

for all \(v \in V _{\varepsilon }\) and \(0 \leq t \leq 1/(C_{0}\varepsilon )\).

Proof

By definition, \(v \in V _{\varepsilon }\) implies that \(f_{\alpha }(0) \leq \varepsilon\). Thus, it makes sense to consider the maximal interval [0, T] of time such that \(f_{\alpha }(t) \leq 2\varepsilon\) for all \(0 \leq t \leq T\).

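By Lemma 5, we have that \(r_{\alpha }^{{\prime}}(s) = O(\,f_{\alpha }(s)^{3})\), i.e., \(\vert r_{\alpha }^{{\prime}}(s)\vert \leq Bf_{\alpha }(s)^{3}\) for some constant B > 1∕4 depending only on g and n. In particular, \(\vert r_{\alpha }^{{\prime}}(s)\vert \leq Bf_{\alpha }(s)^{3} \leq B(2\varepsilon )^{3}\) for all \(0 \leq s \leq T\). From this estimate, we deduce that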

$$\displaystyle{ r_{\alpha }(t) = r_{\alpha }(0) +\int _{ 0}^{t}r_{\alpha }^{{\prime}}(s)\,ds \leq r_{\alpha }(0) + B(2\varepsilon )^{3}t = r_{\alpha }(0) + 8B\varepsilon ^{3}t }$$

for all \(0 \leq t \leq T\). Since \(r_{\alpha }(0) \leq \varepsilon ^{2}\) whenever \(v \in V _{\varepsilon }\), the previous inequality tells us that

$$\displaystyle{ r_{\alpha }(t) \leq \varepsilon ^{2} + 8B\varepsilon ^{3}t }$$

for all \(0 \leq t \leq T\).

Next, we observe that, by definition, \(f_{\alpha }^{{\prime}}(t) =\langle \dot{\gamma } (t),\text{grad}\ell_{\alpha }^{1/2}\rangle =\langle \dot{\gamma } (t),\lambda _{\alpha }\rangle\). Hence,

$$\displaystyle{ \vert f_{\alpha }^{{\prime}}(t)\vert = \vert \langle \dot{\gamma }(t),\lambda _{\alpha }\rangle \vert \leq \sqrt{\langle \dot{\gamma }(t),\lambda _{\alpha } \rangle ^{2 } +\langle \dot{\gamma } (t), J\lambda _{\alpha }\rangle ^{2}} = r_{\alpha }(t) }$$

By putting together the previous two inequalities with the fact that \(f_{\alpha }(0) \leq \varepsilon\) (as \(v \in V _{\varepsilon }\)), we conclude that

$$\displaystyle{ f_{\alpha }(T) = f_{\alpha }(0) +\int _{ 0}^{T}f_{\alpha }^{{\prime}}(t)\,dt \leq \varepsilon +\varepsilon ^{2}T + 4B\varepsilon ^{3}T^{2} }$$

Since T > 0 was chosen so that [0, T] is the maximal interval with \(f_{\alpha }(t) \leq 2\varepsilon\) for all \(0 \leq t \leq T\), we have that \(f_{\alpha }(T) = 2\varepsilon\). Therefore, the previous estimate can be rewritten as

$$\displaystyle{ 2\varepsilon \leq \varepsilon +\varepsilon ^{2}T + 4B\varepsilon ^{3}T^{2} }$$

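Because B > 1∕4, it follows from this inequality that \(T \geq 1/(C_{0}\varepsilon )\) where \(C_{0}:= 8B\). Indeed, if we had \(\varepsilon T <1/(8B)\), then the right-hand side would be smaller than \(\varepsilon +\varepsilon \left (\frac{1} {8B} + \frac{1} {16B}\right ) =\varepsilon +\frac{3\varepsilon } {16B} <2\varepsilon\), a contradiction.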

In other words, we showed that \([0,1/(C_{0}\varepsilon )] \subset [0,T]\), and, a fortiori, \(f_{\alpha }(t) \leq 2\varepsilon\) for all \(0 \leq t \leq 1/(C_{0}\varepsilon )\). This completes the proof of the lemma.
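As a sanity check of the bootstrap argument above, one can integrate numerically the extremal comparison system in which the two inequalities used in the proof become equalities, namely f′ = r and r′ = Bf³ with f(0) = ɛ and r(0) = ɛ². The following Python sketch (ours) does this with the explicit Euler scheme; the values of ɛ and B are arbitrary choices made for illustration, and the function name is hypothetical.

```python
def worst_case_f(eps, B, steps=20000):
    """Explicit Euler integration of the comparison system
        f' = r,   r' = B * f^3,   f(0) = eps,   r(0) = eps^2,
    on the time interval [0, 1/(8*B*eps)].
    Returns the largest value of f seen along the way."""
    t_end = 1.0 / (8.0 * B * eps)
    h = t_end / steps
    f, r = eps, eps ** 2
    f_max = f
    for _ in range(steps):
        # Both updates use the values of f and r from the previous step.
        f, r = f + h * r, r + h * B * f ** 3
        f_max = max(f_max, f)
    return f_max

for eps in (0.1, 0.01):
    for B in (0.3, 1.0, 5.0):
        # Lemma 7 predicts that f stays below 2*eps up to time 1/(8*B*eps).
        print(eps, B, worst_case_f(eps, B), 2 * eps)
```

In each of these runs the maximum of f stays below 2ɛ, in agreement with the lemma; this is of course only an illustration on a model ODE and not a substitute for the proof.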

Once we have Lemma 7 in our toolbox, it is not hard to infer some upper bounds on the rate of mixing of the WP flow on \(T^{1}\mathcal{M}_{g,n}\) when 3g − 3 + n > 1.

Proposition 3

Suppose that the WP flow φ t on \(T^{1}\mathcal{M}_{g,n}\) has a rate of mixing of the form

$$\displaystyle{ C_{t}(a,b) = \left \vert \int a \cdot b \circ \varphi _{t} -\left (\int a\right )\left (\int b\right )\right \vert \leq Ct^{-\gamma }\|a\|_{ C^{1}}\|b\|_{C^{1}} }$$

for some constants C > 0, γ > 0, for all t ≥ 1, and for all choices of C 1 -observables a and b.

Then, γ ≤ 10, i.e., the rate of mixing of the WP flow is at most polynomial.

Proof

Let us fix once and for all an open ball U (with respect to the WP metric) contained in the compact part of \(\mathcal{M}_{g,n}\): this means that there exists \(\varepsilon _{0}> 0\) such that the systoles of all Riemann surfaces in U are \(\geq \varepsilon _{0}^{2}\).

Take a C 1 function a supported on the set T 1 U of unit vectors with footprints on U with values 0 ≤ a ≤ 1 such that ∫a ≥ vol(U)∕2 and \(\|a\|_{C^{1}} = O(1)\): such a function a can be easily constructed by smoothing the characteristic function of U with the aid of bump functions.

Next, for each ɛ > 0, take a \(C^{1}\) function \(b_{\varepsilon }\) supported on the set \(V _{\varepsilon }\) with values \(0 \leq b_{\varepsilon } \leq 1\) such that \(\int b_{\varepsilon } \geq \text{vol}(V _{\varepsilon })/2\) and \(\|b_{\varepsilon }\|_{C^{1}} = O(1/\varepsilon ^{2})\): such a function \(b_{\varepsilon }\) can also be constructed by smoothing the characteristic function of \(V _{\varepsilon }\) after taking into account the description of the WP metric near \(\mathcal{T}_{\alpha }/MCG_{g,n}\) given by Theorems 4 and 5 above and the definition of \(V _{\varepsilon }\) (in terms of the conditions \(\ell_{\alpha }^{1/2} \leq \varepsilon\) and \(r_{\alpha } \leq \varepsilon ^{2}\)). Furthermore, this description of the WP metric \(g_{WP}\) near \(\mathcal{T}_{\alpha }/MCG_{g,n}\) combined with the asymptotic expansion \(g_{WP} \sim 4dx_{\alpha }^{2} + x_{\alpha }^{6}d\tau _{\alpha }^{2}\), where \(x_{\alpha }:=\ell_{ \alpha }^{1/2}/\sqrt{2\pi ^{2}}\) and \(\tau _{\alpha }\) is a twist parameter, says that \(\text{vol}(V _{\varepsilon }) \sim \varepsilon ^{8}\) (cf. the proof of Lemma 2 for more details): indeed, the condition \(f_{\alpha } =\ell_{\alpha }^{1/2} \leq \varepsilon\) on footprints of unit tangent vectors in \(V _{\varepsilon }\) provides a set of volume \(\sim \varepsilon ^{4}\) and the condition \(r_{\alpha } \leq \varepsilon ^{2}\) on unit tangent vectors in \(V _{\varepsilon }\) with a fixed footprint provides a set of volume comparable to the Euclidean area \(\pi \varepsilon ^{4}\) of the Euclidean ball \(\{v \in \mathbb{R}^{2}: \vert v\vert \leq \varepsilon ^{2}\}\) (cf. Theorem 4), so that

$$\displaystyle{ \text{vol}(V _{\varepsilon }) =\int _{\{\ell_{\alpha }^{1/2}(\,p)\leq \varepsilon \}}\text{vol}(\{v \in T_{p}^{1}\mathcal{M}_{ g,n}: r_{\alpha }(v) \leq \varepsilon ^{2}\}) \sim (\pi \varepsilon ^{4}) \cdot \varepsilon ^{4} \sim \varepsilon ^{8} }$$
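For the reader's convenience, let us sketch the count behind the footprint factor of order ɛ⁴, working only in the model metric \(4dx_{\alpha }^{2} + x_{\alpha }^{6}d\tau _{\alpha }^{2}\) and under the simplifying assumptions (ours) that the twist parameter ranges over a bounded interval \([0,\tau _{0}]\) and that the remaining directions along the stratum contribute a bounded factor. The area element of the model metric is \(2x_{\alpha }^{3}\,dx_{\alpha }\,d\tau _{\alpha }\), and the condition \(\ell_{\alpha }^{1/2} \leq \varepsilon\) reads \(x_{\alpha } \leq x_{\varepsilon }:=\varepsilon /\sqrt{2\pi ^{2}}\), so that

$$\displaystyle{ \int _{0}^{\tau _{0} }\int _{0}^{x_{\varepsilon } }2x_{\alpha }^{3}\,dx_{\alpha }\,d\tau _{\alpha } = \frac{\tau _{0}} {2}x_{\varepsilon }^{4} = \frac{\tau _{0}} {8\pi ^{4}}\varepsilon ^{4}, }$$

which is indeed of order ɛ⁴.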

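In summary, for each ɛ > 0, we have a \(C^{1}\) function \(b_{\varepsilon }\) supported on \(V _{\varepsilon }\) with \(0 \leq b_{\varepsilon } \leq 1\), \(\|b_{\varepsilon }\|_{C^{1}} = O(1/\varepsilon ^{2})\) and \(\int b_{\varepsilon } \geq c_{0}\varepsilon ^{8}\) for some constant \(c_{0}> 0\) depending only on g and n.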

Our plan is to use the observables a and b ɛ to give some upper bounds on the mixing rate of the WP flow φ t . For this sake, suppose that there are constants C > 0 and γ > 0 such that

$$\displaystyle{ C_{t}(a,b_{\varepsilon }) = \left \vert \int a \cdot b_{\varepsilon } \circ \varphi _{t} -\left (\int a\right )\left (\int b_{\varepsilon }\right )\right \vert \leq Ct^{-\gamma }\|a\|_{ C^{1}}\|b_{\varepsilon }\|_{C^{1}} }$$

for all t ≥ 1 and ɛ > 0.

By Lemma 7, there exists a constant \(C_{0}> 0\) such that \(\varphi _{ - \frac{1} {C_{0}\varepsilon } }(V _{\varepsilon }) \cap T^{1}U = \varnothing\) whenever \(2\varepsilon <\varepsilon _{0}\). Indeed, since \(V _{\varepsilon }\) is a symmetric set (i.e., \(v \in V _{\varepsilon }\) if and only if \(-v \in V _{\varepsilon }\)), it follows from Lemma 7 that all Riemann surfaces in the footprints of vectors in \(\varphi _{ - \frac{1} {C_{0}\varepsilon } }(V _{\varepsilon })\) have a systole \(\leq (2\varepsilon )^{2} <\varepsilon _{0}^{2}\). Because we took U in such a way that all Riemann surfaces in U have systole \(\geq \varepsilon _{0}^{2}\), we obtain \(\varphi _{- \frac{1} {C_{0}\varepsilon } }(V _{\varepsilon }) \cap T^{1}U = \varnothing\), as claimed.

Now, let us observe that the function \(a \cdot b_{\varepsilon } \circ \varphi _{t}\) is supported on \(\varphi _{-t}(V _{\varepsilon }) \cap T^{1}U\) because a is supported on \(T^{1}U\) and \(b_{\varepsilon }\) is supported on \(V _{\varepsilon }\). By putting together this fact and the claim in the previous paragraph (that \(\varphi _{ - \frac{1} {C_{0}\varepsilon } }(V _{\varepsilon }) \cap T^{1}U = \varnothing\) for \(2\varepsilon <\varepsilon _{0}\)), we deduce that \(a \cdot b_{\varepsilon } \circ \varphi _{ \frac{1} {C_{0}\varepsilon } } \equiv 0\) whenever \(2\varepsilon <\varepsilon _{0}\). Thus,

$$\displaystyle{ C_{ \frac{1} {C_{0}\varepsilon } }(a,b_{\varepsilon }):= \left \vert \int a \cdot b_{\varepsilon } \circ \varphi _{ \frac{1} {C_{0}\varepsilon } } -\left (\int a\right )\left (\int b_{\varepsilon }\right )\right \vert = \left (\int a\right )\left (\int b_{\varepsilon }\right ) }$$

By plugging this identity into the polynomial decay of correlations estimate \(C_{t}(a,b_{\varepsilon }) \leq Ct^{-\gamma }\|a\|_{C^{1}}\|b_{\varepsilon }\|_{C^{1}}\), we get

$$\displaystyle{ \left (\int a\right )\left (\int b_{\varepsilon }\right ) = C_{ \frac{1} {C_{0}\varepsilon } }(a,b_{\varepsilon }) \leq CC_{0}^{\gamma }\varepsilon ^{\gamma }\|a\|_{ C^{1}}\|b_{\varepsilon }\|_{C^{1}} }$$

whenever \(2\varepsilon <\varepsilon _{0}\) and \(1/(C_{0}\varepsilon ) \geq 1\).

We claim that the previous estimate implies that γ ≤ 10. In fact, recall that our choices were made so that \(\int a \geq \text{vol}(U)/2\) where U is a fixed ball, \(\|a\|_{C^{1}} = O(1)\), \(\int b_{\varepsilon } \geq c_{0}\varepsilon ^{8}\) for some constant \(c_{0}> 0\) and \(\|b_{\varepsilon }\|_{C^{1}} = O(1/\varepsilon ^{2})\). Hence, by combining these facts and the previous mixing rate estimate,

$$\displaystyle{ \left (\frac{\text{vol}(U)} {2} \right )c_{0}\varepsilon ^{8} \leq \left (\int a\right )\left (\int b_{\varepsilon }\right ) \leq CC_{ 0}^{\gamma }\varepsilon ^{\gamma }\|a\|_{ C^{1}}\|b_{\varepsilon }\|_{C^{1}} = O(\varepsilon ^{\gamma }\frac{1} {\varepsilon ^{2}} ), }$$

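that is, \(\varepsilon ^{10} \leq D\varepsilon ^{\gamma }\), for some constant D > 0 and for all ɛ > 0 sufficiently small (so that \(2\varepsilon <\varepsilon _{0}\) and \(1/(C_{0}\varepsilon ) \geq 1\)). Letting ɛ → 0, it follows that γ ≤ 10, as we claimed. This completes the proof of the proposition.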
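The last step of the proof is elementary arithmetic with exponents, and it can be illustrated by the following small Python sketch (ours). Given hypothetical constants c and D, it returns the threshold below which an inequality of the form \(c\,\varepsilon ^{10} \leq D\varepsilon ^{\gamma }\) must fail when γ > 10. The default exponents 8 (from the volume of \(V _{\varepsilon }\)) and 2 (from the \(C^{1}\)-norm of \(b_{\varepsilon }\)) are the ones appearing in the argument above; the function name and the numerical values are illustrative assumptions.

```python
def breaking_epsilon(c, D, gamma, vol_exp=8.0, norm_exp=2.0):
    """Return eps_star such that c * eps^(vol_exp + norm_exp) <= D * eps^gamma
    fails for every 0 < eps < eps_star, or None when
    gamma <= vol_exp + norm_exp (in which case no contradiction arises
    from letting eps tend to 0)."""
    threshold = vol_exp + norm_exp
    if gamma <= threshold:
        return None
    # c * eps^threshold <= D * eps^gamma  <=>  eps >= (c / D)^(1 / (gamma - threshold))
    return (c / D) ** (1.0 / (gamma - threshold))

print(breaking_epsilon(1.0, 100.0, 10.0))   # None: gamma = 10 is allowed
print(breaking_epsilon(1.0, 100.0, 12.0))   # the inequality fails below this value
```

In particular, any decay exponent γ > 10 is contradicted by taking ɛ small enough, whatever the constants are.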

Remark 34

In the statement of the previous proposition, the choice of \(C^{1}\)-norms to measure the rate of mixing of the WP flow is not very important. Indeed, an inspection of the construction of the functions \(b_{\varepsilon }\) in the argument above reveals that \(\|b_{\varepsilon }\|_{C^{k+\alpha }} = O(1/\varepsilon ^{2(k+\alpha )})\) for any \(k \in \mathbb{N}\), 0 ≤ α < 1 (in agreement with the bound \(O(1/\varepsilon ^{2})\) for the \(C^{1}\)-norm used above). In particular, the proof of the previous proposition is sufficiently robust to show also that a rate of mixing of the form

$$\displaystyle{ C_{t}(a,b) = \left \vert \int a \cdot b \circ \varphi _{t} -\left (\int a\right )\left (\int b\right )\right \vert \leq Ct^{-\gamma }\|a\|_{ C^{k+\alpha }}\|b\|_{C^{k+\alpha }} }$$

for some constants C > 0, γ > 0, for all t ≥ 1, and for all choices of C k+α-observables a and b holds only if γ ≤ 8 + 2(k + α).

In other words, even if we replace C 1-norms by (stronger, smoother) C k+α-norms in our measurements of rates of mixing of the WP flow (on \(T^{1}\mathcal{M}_{g,n}\) for 3g − 3 + n > 1), our discussions so far will always give polynomial upper bounds for the decay of correlations.

At this point, our discussion of the proof of Theorem 11 (i.e., the first item of Theorem 2) is complete thanks to Proposition 3 and Remark 34. So, we will now move on to the next subsection where we give some of the key ideas in the proof of Theorem 12 (i.e., the second item of Theorem 2).

6.4.2 Rates of Mixing of the WP Flow on \(T^{1}\mathcal{M}_{g,n}\) II: Proof of Theorem 12

Let us consider the WP flow on \(T^{1}\mathcal{M}_{g,n}\) when 3g − 3 + n = 1, that is, when (g, n) = (0, 4) or (1, 1).

Actually, we will restrict our attention to the case (g, n) = (1, 1) because the remaining case (g, n) = (0, 4) is similar to (g, n) = (1, 1).

Indeed, the moduli space \(\mathcal{M}_{0,4}\) of four-times punctured spheres is a finite cover of the moduli space \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\): this can be seen by sending each four-punctured sphere \(\overline{\mathbb{C}} -\{ x_{1},\ldots,x_{4}\}\) to the elliptic curve \(y^{2} = (x - x_{1})\ldots (x - x_{4})\), so that \(\mathcal{M}_{0,4}\) becomes naturally isomorphic to \(\mathbb{H}/\varGamma _{0}(2)\) where \(\varGamma _{0}(2)\) is a congruence subgroup of \(SL(2, \mathbb{Z})\) of level 2 with index 3. Since all arguments towards rapid mixing of geodesic flows in this section still work after taking finite covers, it suffices to prove Theorem 12 for the WP flow on \(T^{1}\mathcal{M}_{1,1}\).

The rate of mixing of a geodesic flow on the unit tangent bundle of a negatively curved compact surface is known to be fast: indeed, Chernov [C] used his technique of “Markov approximations” to show stretched exponential decay of correlations, and Dolgopyat [Dol] added a new crucial ingredient (“Dolgopyat’s estimate”) to Chernov’s work to prove exponential decay of correlations.

Evidently, these works of Chernov and Dolgopyat cannot be applied to the WP flow on \(T^{1}\mathcal{M}_{1,1}\) because of the noncompactness of \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\) due to the presence of a (single) cusp (at infinity). Nevertheless, this suggests that we should be able to determine the rate of mixing of the WP flow on \(T^{1}\mathcal{M}_{1,1}\) provided we have enough control of the geometry of the WP metric near the cusp.

Fortunately, as we mentioned in Example 5 above, Wolpert showed that the WP metric \(g_{WP}\) on \(\mathcal{M}_{1,1} \simeq \mathbb{H}/SL(2, \mathbb{Z})\) has an asymptotic expansion \(g_{WP}^{2} \sim \frac{\vert dz\vert ^{2}} {\text{Im}(z)}\) at a point \(z \in \mathbb{H}\). Thus, the WP metric on neighborhoods \(\{z = x + iy \in \mathbb{H}: \vert x\vert \leq 1/2,y> y_{0}\}/SL(2, \mathbb{Z})\) (with \(y_{0}> 1\)) of the cusp at infinity of \(\mathcal{M}_{1,1}\) becomes closer (as \(y_{0} \rightarrow \infty\)) to the metric of the surface of revolution of the profile \(v = u^{3}\) on neighborhoods \(\{v = u^{3}: 0 \leq u <u_{0}\}\) of the cusp at 0 (as \(u_{0} \rightarrow 0\)).

Partly motivated by the scenario of the previous paragraph, from now on we will pretend that the WP metric on \(\mathbb{H}/PSL(2, \mathbb{Z})\) looks exactly like the metric \(\frac{\vert dz\vert ^{2}} {\text{Im}(z)}\) at all points \(\{z \in \mathbb{H}: \text{Im}(z)> y_{0}\}\) for some \(y_{0} \gg 1\). In other words, instead of studying the WP flow on \(T^{1}\mathcal{M}_{1,1}\), we will focus on the rates of mixing of the following toy model: the geodesic flow on a negatively curved surface S with a single cusp possessing a neighborhood where the metric is isometric to the surface of revolution of a profile \(\{v = u^{r}\}\) for a fixed real number r > 3.

Remark 35

The surface of revolution modeling the WP metric on \(T^{1}\mathcal{M}_{1,1}\) is obtained by rotating the profile \(\{v = u^{3}\}\). In other words, the study of rates of mixing of the surface of revolution approximating the WP metric on \(T^{1}\mathcal{M}_{1,1}\) is a “borderline case” in our subsequent discussion.

Here, our main motivations to replace the WP flow φ t on \(T^{1}\mathcal{M}_{1,1}\) by the toy model described above are:

  • all important ideas for the study of rates of mixing of φ t are also present in the case of the toy model, and

  • even though the WP metric on \(\mathcal{M}_{1,1}\) is a perturbation of a surface of revolution, the verification of the fact that the arguments used to estimate the decay of correlations of the geodesic flow on the toy model surfaces are robust enough so that they can be carried over to the WP metric situation is somewhat boring: basically, besides performing a slight modification of the proofs to include the borderline case r = 3, one has to introduce “error terms” in the whole discussion below and, after that, one has to check that these error terms do not change the qualitative nature of all estimates.

In summary, the remainder of this subsection will contain a proof of the following “toy model version” of Theorem 12.

Theorem 13

Let \(\overline{S}\) be a compact surface and fix \(0 \in \overline{S}\) . Suppose that \(S = \overline{S} -\{ 0\}\) is equipped with a negatively curved Riemannian metric g such that the restriction of g to a neighborhood \(\{p \in S: d(\,p,0) <\rho _{0}\}\) of 0 is isometric to a surface of revolution of a profile \(\{v = u^{r}: 0 <u \leq u_{0}\}\) (for some choices of \(\rho _{0}> 0\) and \(u_{0}> 0\)).

Then, the geodesic flow (associated to g) on \(T^{1}S\) is rapidly mixing (faster than any polynomial) in the sense that, for all \(n \in \mathbb{N}\) , one can choose an adequate Banach space \(X_{n}\) of “reasonably smooth” observables and a constant \(C_{n}> 0\) so that

$$\displaystyle{ C_{t}(a,b) = \left \vert \int a \cdot b \circ \varphi _{t} -\left (\int a\right )\left (\int b\right )\right \vert \leq C_{n}t^{-n}\|a\|_{ X_{n}}\|b\|_{X_{n}} }$$

for all t ≥ 1.

Remark 36

The arguments below show that the statement above also holds when \(S = \overline{S} -\{ 0_{1},\ldots,0_{k}\}\) is equipped with a negatively curved metric that is isometric to a surface of revolution \(\{v = u^{r_{i}}\}\), \(r_{i}> 3\), near \(0_{i}\) for each \(i = 1,\ldots,k\).

Remark 37

The Riemannian metric g is incomplete because the surface of revolution of \(\{v = u^{r}\}\) is incomplete when r > 1 (as the reader can check via a simple calculation).
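For the sake of completeness, here is one way to carry out this calculation, under the assumption (consistent with the parametrization used below) that a meridian is the curve \(u\mapsto (u,u^{r},0)\), \(0 <u \leq u_{0}\). Its length is

$$\displaystyle{ \int _{0}^{u_{0} }\sqrt{1 + r^{2 } u^{2(r-1)}}\,du \leq u_{0}\sqrt{1 + r^{2 } u_{0 }^{2(r-1) }} <\infty, }$$

where the inequality uses that the integrand is nondecreasing in u when r > 1. Hence, the unit-speed geodesic running along a meridian reaches the removed point 0 in finite time, so that it cannot be extended for all times.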

Recall that, in the setting of Theorem 13, we want to understand the dynamics of the excursions of the geodesic flow near the cusp 0 (in order to get rapid mixing). For this sake, we describe these excursions by rewriting the geodesic flow (near 0) as a suspension flow.

6.4.2.1 Excursions Near the Cusp and Suspension Flows

Consider a small neighborhood in S of 0 where the metric is isometric to the surface of revolution of the profile \(\{v = u^{r}: 0 <u \leq u_{0}\}\), i.e., 

$$\displaystyle{ \{(x,x^{r}\cos y,x^{r}\sin y) \in \mathbb{R}^{3}: 0 <x \leq u_{ 0},0 \leq y \leq 2\pi \} }$$

Next, take 0 < d 0 < u 0 a small parameter and consider the parallel \(C = C(d_{0}) =\{ (d_{0},d_{0}^{r}\cos y,d_{0}^{r}\sin y) \in \mathbb{R}^{3}: 0 \leq y \leq 2\pi \}\). We parametrize unit tangent vectors to the surface of revolution with footprints in C as follows.

Given \(q = (d_{0},d_{0}^{r}\cos y_{0},d_{0}^{r}\sin y_{0}) \in C\), we denote by \(V = V (q) \in T_{q}^{1}S\) the unique unit tangent vector pointing towards the cusp O at x = 0. Equivalently, V is the unit vector tangent to the meridian \(\{(d_{0} - t,(d_{0} - t)^{r}\cos y_{0},(d_{0} - t)^{r}\sin y_{0}) \in \mathbb{R}^{3}: 0 \leq t <d_{0}\}\) at time t = 0, or, alternatively, V (q) = −∇d(q) where d( p) = dist(O, p) is the distance function from the cusp O to a point p. Also, we let JV = JV (q) be the unit vector obtained by rotating V by π∕2 in the counterclockwise sense (i.e., by applying the natural almost complex structure J).

In this setting, a unit vector \(v \in T_{q}^{1}S\) pointing towards the cusp O is completely determined by a real number β ∈ (−π∕2, π∕2) such that \(\langle v,V \rangle =\cos \beta\) and \(\langle v,JV \rangle =\sin \beta\), i.e., 

$$\displaystyle{ v =\cos \beta \cdot V +\sin \beta \cdot JV:= v(\beta ) }$$

The qualitative behavior of the excursion of a geodesic \(\gamma (t) = (x(t),x(t)^{r}\cos y(t),x(t)^{r}\sin y(t))\) starting at \(\dot{\gamma }(0) = v(\beta ) \in T_{q}^{1}S\) can be easily determined in terms of the parameter β thanks to the classical results in Differential Geometry about surfaces of revolution. Indeed, it is well-known (see, e.g., Do Carmo’s book [DoC]) that such a geodesic γ(t) satisfies

$$\displaystyle{ x(t)^{2r}y'(t) = c }$$

and

$$\displaystyle{ (1 + r^{2}x(t)^{2(r-1)})x'(t)^{2} + \frac{c^{2}} {x(t)^{2r}} = 1 }$$

for a certain constant c, and, furthermore, these relations imply the famous Clairaut’s relation:

$$\displaystyle{ x(t)^{r}\cos \vert \frac{\pi } {2} -\vert \beta (t)\vert \vert = c = constant }$$
(6.3)

where β(t) is the parameter attached to \(\dot{\gamma }(t)\) (i.e., \(\dot{\gamma }(t) = v(\beta (t)) \in T_{\gamma (t)}^{1}C(x(t))\)). In particular, except for the geodesic going directly to the cusp (i.e., the geodesic starting at V (q) associated to β = 0), all geodesics γ(t) (starting at v(β) with β ≠ 0) behave qualitatively in a simple way. In the first part t ∈ [0, T(β)∕2] of its excursion towards the cusp, the angle β(t) increases (resp. decreases) from β > 0 to π∕2 (resp. from β < 0 to −π∕2) while the value of x(t) diminishes in order to keep up with Clairaut’s relation. Then, the geodesic γ(t) reaches its closest position to the cusp at time t = T(β)∕2: here, β(t) = ±π∕2 (i.e., \(\dot{\gamma }(T(\beta )/2)\) is tangent to the parallel C(x(T(β)∕2)) containing γ(T(β)∕2)) and, hence,

$$\displaystyle{ x(T(\beta )/2)^{r} = x(0)^{r}\sin \beta = d_{ 0}^{r}\sin \beta:= x_{\min }(\beta )^{r} }$$

Finally, in the second part t ∈ [T(β)∕2, T(β)], γ(t) does the “opposite” of the first part: the angle β(t) goes from ±π∕2 to ±π − β and x(t) increases from \(x_{\min }(\beta )\) back to \(x(0) = d_{0}\). The following picture summarizes the discussion of this paragraph:

Remark 38

Note that the time T(β) taken by the geodesic γ(t) to go from the parallel \(C = C(d_{0})\) to \(C(x_{\min }(\beta ))\) and then from \(C(x_{\min }(\beta ))\) back to C is independent of the base-point q = γ(0) ∈ C. Indeed, this is a direct consequence of the rotational symmetry of our surface. Alternatively, this can be easily seen from the formula

$$\displaystyle{ \frac{T(\beta )} {2} =\int _{ x_{\min }(\beta )}^{x(0)}x^{r}\sqrt{\frac{1 + (rx^{r-1 } )^{2 } } {x^{2r} - c^{2}}} \,dx =\int _{ d_{0}(\sin \beta )^{1/r}}^{d_{0} }x^{r}\sqrt{\frac{1 + (rx^{r-1 } )^{2 } } {x^{2r} - (d_{0}^{r}\sin \beta )^{2}}} \,dx }$$

deduced by integration of the ODE satisfied by x(t). Observe that this formula also shows that T(β) is uniformly bounded, i.e., \(T(\beta ) = O_{d_{0},r}(1)\) for all β ≠ 0. Geometrically, this means that all geodesics γ(t) starting at C must return to C in bounded time unless they go directly into the cusp.
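As a numerical sanity check of the integral formula above, here is a minimal Python sketch that evaluates T(β)∕2 by a midpoint rule after the substitution \(x = x_{\min }(\beta ) + s^{2}\), which removes the integrable square-root singularity at \(x = x_{\min }(\beta )\). The values r = 4 and d_0 = 0.5, the function name `half_return_time` and the number of nodes are illustrative assumptions of this sketch and are not taken from the text.

```python
import math

def half_return_time(beta, r=4.0, d0=0.5, n=4000):
    """Approximate T(beta)/2 for the surface of revolution with profile x^r.

    r, d0 and n are illustrative choices (assumptions of this sketch).
    """
    c = d0**r * math.sin(beta)                 # Clairaut constant c = d0^r sin(beta)
    x_min = d0 * math.sin(beta) ** (1.0 / r)   # closest position to the cusp
    h = math.sqrt(d0 - x_min) / n              # step in the variable s, x = x_min + s^2
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h                      # midpoint nodes never hit s = 0
        x = x_min + s * s
        w2 = 1.0 + (r * x ** (r - 1)) ** 2     # 1 + (r x^(r-1))^2
        # integrand x^r sqrt(w2 / (x^(2r) - c^2)) times dx/ds = 2 s
        total += x**r * math.sqrt(w2 / (x ** (2 * r) - c * c)) * 2.0 * s
    return total * h
```

Evaluating `half_return_time` on a range of angles β gives values that stay between \(d_{0} - x_{\min }(\beta )\) and a fixed multiple of d_0, in line with the uniform bound \(T(\beta ) = O_{d_{0},r}(1)\).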

This description of the excursions of geodesics near the cusp allows us to build a suspension-flow model of the geodesic flow near O. Indeed, let us consider the cross-section \(N = T_{C}^{1}S = T_{C(d_{0})}^{1}S\). As we saw above, an element of the surface N is parametrized by two angular coordinates y and β: the value of y determines a point \(q = (d_{0},d_{0}^{r}\cos y,d_{0}^{r}\sin y) \in C\) and the value of β determines a unit tangent vector \(v(\beta ) \in T_{q}^{1}S\) making angle β with V(q). The subset M of N consisting of those elements v(β) with angular coordinate −π∕2 < β < π∕2 corresponds to the unit vectors with footprint in C pointing towards the cusp at O. The equation β = 0 determines a circle Σ inside M corresponding to geodesics going straight into the cusp, and, furthermore, we have a natural “first-return map” \(F: M-\varSigma \rightarrow N\) defined by \(F(v(\beta )) =\dot{\gamma } _{v(\beta )}(T(\beta ))\) where \(\gamma _{v(\beta )}\) is the geodesic starting at v(β) at time t = 0.

In this setting, the orbits \(\gamma _{v(\beta )}(t)\), t ∈ [0, T(β)], are modeled by the “suspension flow” \(\varphi _{t}(v(\beta ),s) = (v(\beta ),s + t)\) if 0 ≤ s + t < T(β), \(\varphi _{T(\beta )}(v(\beta ),0) = (F(v(\beta )),0)\), over the base map F with roof function \(T: M-\varSigma \rightarrow \mathbb{R}\), T(v(β)) = T(β).
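The rule defining a suspension flow can be illustrated by the following short Python sketch. It is only a toy model: the function name `suspension_flow` is hypothetical, and the base map and roof function used in the usage note are placeholders (assumptions), since the actual map F and roof function T of the text are only partially defined and not explicit.

```python
def suspension_flow(point, s, t, base_map, roof):
    """Flow the pair (point, s) forward by a time t >= 0.

    The height s increases at unit speed; each time it reaches roof(point),
    it is reset to 0 and the base point is replaced by base_map(point).
    base_map and roof are supplied by the caller (toy placeholders here).
    """
    s += t
    while s >= roof(point):
        s -= roof(point)
        point = base_map(point)
    return point, s
```

For instance, with the toy choices `base_map = lambda x: (x + 0.25) % 1.0` and `roof = lambda x: 1.0 + x`, flowing the point (0.0, 0.0) for time 1.0 lands exactly on (base_map(0.0), 0.0), which mirrors the identity \(\varphi _{T(\beta )}(v(\beta ),0) = (F(v(\beta )),0)\).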

Remark 39

Technically speaking, one needs to “complete” the definition of F and T by including the dynamics of the geodesic flow on the compact part of S in order to properly write the geodesic flow on S as a suspension flow. Nevertheless, since the major technical difficulty in the proof of Theorem 13 comes from the presence of the cusp, we will ignore the excursions of geodesics in the compact part of S and we will pretend that the (partially defined) flow \(\varphi _{t}\) is a “genuine” suspension flow model.

6.4.2.2 Rapid Mixing of Contact Suspension Flows

One advantage of thinking of the geodesic flow on S as a suspension flow is that several authors have previously studied the interplay between the rates of mixing of this class of flows and the features of the base map F and of the roof function T: see, e.g., the papers of Avila–Gouëzel–Yoccoz [AGY] and Melbourne [Melb] for some results in this direction (and also for a precise definition of suspension flows).

For our current purposes, it is worth recalling that Bálint and Melbourne (cf. Theorem 2.1 [and Remarks 2.3 and 2.5] of [BM]) proved the rapid mixing property for contact suspension flows whose base map is modeled by a Young tower with exponential tails and whose roof function is bounded and uniformly piecewise Hölder-continuous on each subset of the basis of the Young tower. In particular, the proof of Theorem 13 is complete once we prove that the base map \(F: M-\varSigma \rightarrow N\) is modeled by Young towers with exponential tails and the roof function \(T: M-\varSigma \rightarrow \mathbb{R}\) is bounded and uniformly piecewise Hölder-continuous on each element of the basis of the Young tower (whatever this means).

As it turns out, the theory of Young towers (introduced by Young [Young98, Young98]) is a double-edged sword. On the one hand, it provides an adequate setup for the study of statistical properties of systems with some hyperbolicity once the so-called Young towers have been built. On the other hand, the construction of Young towers (satisfying all five natural but technical axioms in Young’s definition) is usually a delicate issue: one has to find a countable Markov partition of a positive measure subset (working as the basis of the Young tower) so that the return maps associated to this Markov partition verify several hyperbolicity and distortion controls, and it is not always clear where one could possibly find such a Markov partition for a given dynamical system.

Fortunately, Chernov and Zhang [CZ] gave a list of sufficient geometric properties for a two-dimensional map like \(F: M-\varSigma \rightarrow N\) to be modeled by Young towers with exponential tails: in fact, Theorem 10 in the Chernov–Zhang paper is a sort of “black-box” producing Young towers with exponential tails whenever seven geometrical conditions are fulfilled. For the sake of exposition, we will not attempt to check all seven conditions for \(F: M-\varSigma \rightarrow N\): instead, we will focus on two main conditions called distortion bounds and one-step growth condition.

Before we discuss the distortion bounds and the one-step growth condition, we need to recall the concept of homogeneity strips (originally introduced by Bunimovich–Chernov–Sinai [BCS91]). In our setting, we take \(k_{0} \in \mathbb{N}\) and \(\nu =\nu (r) \in \mathbb{N}\) (to be chosen later) and we make a partition of a neighborhood of the singular set Σ (of geodesics going straight into the cusp) into countably many strips:

$$\displaystyle{ H_{k}:= \left \{(\,y,\beta ) \in M: \frac{1} {(k + 1)^{\nu }} <\vert \beta \vert <\frac{1} {k^{\nu }}\right \} }$$

for all \(k \in \mathbb{N}\), \(k \geq k_{0}\). (Actually, \(H_{k}\) has two connected components, but we will slightly abuse notation by denoting each of these connected components by \(H_{k}\).)

Intuitively, the partition into strips \(H_{k}\) at polynomial scales \(1/k^{\nu }\) in the parameter β is useful in our context because the relevant quantities (such as Gaussian curvature, first and second derivatives, etc.) for the study of the geodesic flow of the surface of revolution blow up with polynomial speed as the excursions of geodesics get closer to the cusp (that is, as β → 0). Thus, the important quantities for the analysis of the geodesic flow near the cusp become “almost constant” when restricted to one of the homogeneity strips \(H_{k}\).

Another advantage of the homogeneity strips is the fact that they give a rough control of the elements of the countable Markov partition at the basis of the Young tower produced by Chernov–Zhang: indeed, the arguments of Chernov–Zhang show that each element of the basis of their Young tower is completely contained in a homogeneity strip. In particular, the verification of the uniform piecewise Hölder-continuity of the roof function \(T: M-\varSigma \rightarrow \mathbb{R}\) follows once we prove that the restriction \(T\vert _{H_{k}}\) of the roof function to each homogeneity strip \(H_{k}\) is uniformly Hölder-continuous (in the sense that, for some 0 < α = α(r) ≤ 1, the Hölder norms \(\|T\vert _{H_{k}}\|_{C^{\alpha }}\) are bounded by a constant independent of k).
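To fix ideas on the definition of the homogeneity strips, the following Python sketch locates the strip \(H_{k}\) containing a given angle β. The function name `strip_index` is hypothetical, and the sketch assumes k_0 ≥ 1; angles lying exactly on the boundary between two strips, on the singular set β = 0, or outside the neighborhood covered by the strips are reported as `None`.

```python
import math

def strip_index(beta, nu, k0):
    """Return k >= k0 with 1/(k+1)^nu < |beta| < 1/k^nu, or None (assumes k0 >= 1)."""
    a = abs(beta)
    if a == 0.0:
        return None                      # beta = 0 is the singular set Sigma
    k = math.ceil(a ** (-1.0 / nu)) - 1  # candidate index from |beta| < 1/k^nu
    if k < k0:
        return None                      # too far from Sigma to lie in a strip
    if not (1.0 / (k + 1) ** nu < a < 1.0 / k ** nu):
        return None                      # beta sits on a common boundary of two strips
    return k
```

For example, with ν = 2 and k_0 = 1, the angle β = 0.02 satisfies 1∕64 < 0.02 < 1∕49 and is therefore assigned to the strip with k = 7.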

Coming back to the one-step growth and distortion bounds, let us content ourselves with formulating simpler versions of them (while referring to Sects. 4 and 5 of the Chernov–Zhang paper for precise definitions): indeed, the actual definitions of these notions involve the properties of the derivative along unstable manifolds, and, in our current setting, we have just a partially defined map \(F: M-\varSigma \rightarrow N\), so that we cannot talk about future iterates and unstable manifolds unless we “complete” the definition of F.

Nevertheless, even if F is only partially defined, we can still give a crude analog of an unstable direction for F by noticing that the vector field \(w^{u}:= \partial _{\beta }\) on M − Σ (whose leaves are {y = constant}) morally works like an unstable direction: in fact, this vector field is transverse to the singular set Σ = {β = 0}, which is a sort of “stable set” because all trajectories of the geodesic flow starting at Σ converge in the future to the same point, namely, the cusp at O. In terms of the “unstable direction” \(w^{u} = \partial _{\beta }\), we define the expansion factor Λ(v) of F at a point v = (y, β) ∈ M − Σ as \(\varLambda (v):=\| DF(v)w^{u}\|/\|w^{u}\|\), that is, the amount of expansion of the “unstable” vector field \(w^{u}\) under DF(v). Note that, from the definitions, the expansion factor Λ(v) depends only on the β-coordinate of v = (y, β). So, from now on, we will think of expansion factors as a function Λ(β) of β.

In terms of expansion factors, the (variant of the) distortion bound condition is

$$\displaystyle{ \frac{d\log \varLambda } {d\beta }(\beta _{0}) = \frac{\varLambda '(\beta _{0})} {\varLambda (\beta _{0})} \leq C\frac{1} {\beta _{0}^{\theta }} }$$
(6.4)

where θ = θ(r) > 0 satisfies νθ < ν + 1, and the (variant of the) one-step growth condition is

$$\displaystyle{ \sum \limits _{k=k_{0}}^{\infty }\varLambda _{ k}^{-1} <1 }$$
(6.5)

where \(\varLambda _{k}:=\min \limits _{v\in H_{k}}\varLambda (v) =\min \limits _{ \frac{1} {(k+1)^{\nu }} \leq \vert \beta \vert \leq \frac{1} {k^{\nu }} }\varLambda (\beta )\).

Remark 40

The one-step growth condition above is close to the original version in the Chernov–Zhang work (compare (6.5) with Eq. (5.5) in [CZ]). On the other hand, the distortion bound condition (6.4) differs slightly from its original version in Eq. (4.1) of the Chernov–Zhang paper. Nevertheless, they can be related as follows. The original distortion condition essentially amounts to giving estimates \(\log \prod \limits _{i=0}^{n}\frac{\varLambda (F^{-i}(v_{ 1}))} {\varLambda (F^{-i}(v_{2}))} \leq \psi (dist(v_{1},v_{2}))\) (where ψ is a smooth function such that ψ(s) → 0 as s → 0) whenever \(v_{1}\) and \(v_{2}\) belong to the same homogeneous unstable manifold W (i.e., a piece W of unstable manifold such that \(F^{-j}(W)\) never intersects the boundaries of the homogeneity strips \(H_{k}\) for all j ≥ 0 and \(k \geq k_{0}\); the existence of homogeneous unstable manifolds through almost every point is guaranteed by a Borel–Cantelli type argument described in Appendix 2 of Bunimovich–Chernov–Sinai’s paper [BCS91]). Here, one sees from the mean value theorem that

$$\displaystyle{ \log \prod \limits _{i=0}^{n}\frac{\varLambda (F^{-i}(v_{ 1}))} {\varLambda (F^{-i}(v_{2}))} \leq \sum \limits _{ i=0}^{n}\frac{\vert \varLambda '(z_{i})\vert } {\varLambda (z_{i})}dist(F^{-i}(v_{1}),F^{-i}(v_{2})) }$$

for some \(z_{i} \in F^{-i}(W)\). Using the facts that \(dist(F^{-i}(v_{1}),F^{-i}(v_{2}))\) decays exponentially fast (as \(v_{1}\) and \(v_{2}\) are in the same unstable manifold W) and \(F^{-i}(W)\) is always contained in a homogeneity strip \(H_{k_{i}}\) (as W is a homogeneous unstable manifold), one can check that the estimate in (6.4) implies the desired uniform bound on the previous expression in terms of a smooth function ψ(s) such that ψ(s) → 0 as s → 0. In other words, the estimate (6.4) can be shown to imply the original version of distortion bounds, so that we can safely concentrate on the proof of (6.4).

At this point, we can summarize the discussion so far as follows. By the Bálint–Melbourne criterion for rapid mixing of contact suspension flows and the Chernov–Zhang criterion for the existence of Young towers with exponential tails for the map \(F: M-\varSigma \rightarrow N\), we have “reduced” the proof of Theorem 13 to the following statements:

Proposition 4

Given ν > 0 and 0 < α < 1∕(ν + 1), one has the following “uniform Hölder estimate”

$$\displaystyle{ \sup \limits _{k\in \mathbb{N}}\|T\vert _{H_{k}}\|_{C^{\alpha }} <\infty }$$

whenever d 0 is sufficiently small (depending on r, ν and α).

Proposition 5

The expansion factor function Λ(β) satisfies:

  • given ν > r∕(r − 1), we can choose \(k_{0} \in \mathbb{N}\) large (and d 0 sufficiently small) so that

    $$\displaystyle{ \sum \limits _{k=k_{0}}^{\infty }\varLambda _{ k}^{-1} <1 }$$

    where \(\varLambda _{k} =\min \limits _{ \frac{1} {(k+1)^{\nu }} \leq \vert \beta \vert \leq \frac{1} {k^{\nu }} }\varLambda (\beta )\) ;

  • given r > 3, we can choose ν > r∕(r − 1) and θ > 1 + 2∕r such that νθ < ν + 1 and

    $$\displaystyle{ \frac{\varLambda '(\beta )} {\varLambda (\beta )} \leq C\frac{1} {\beta ^{\theta }} }$$

    for some (sufficiently large) constant C > 0 and for all β.

The proofs of these two propositions are given in the next two subsections and they are based on the study of perpendicular unstable Jacobi fields related to the variations of geodesics of the form \(\gamma _{v(\beta )}(t)\), 0 < β < π∕2.

6.4.2.3 The Derivative of the Roof Function

From now on, we fix \(q \in C = C(d_{0})\) (e.g., \(q = (d_{0},d_{0}^{r},0)\)) and, for the sake of simplicity, we will denote a geodesic \(\gamma _{v(\beta )}(t)\) corresponding to an initial vector \(v(\beta ) \in T_{q}^{1}S\) by \(\gamma _{\beta }(t)\). Of course, there is no loss of generality here because of the rotational symmetry of the surface S. Also, we will suppose that β > 0 as the case β < 0 is symmetric.

Note that the roof function T(β) is defined by the condition \(\gamma _{\beta }(T(\beta )) \in C = C(d_{0})\), or, equivalently,

$$\displaystyle{ d(\gamma _{\beta }(T(\beta ))) = I(d_{0}):=\int _{ 0}^{d_{0} }\sqrt{1 + (rx^{r-1 } )^{2}}dx }$$

where d(·) denotes the distance from a point to the cusp at O and \(I(d_{0})\) is the distance from \(C(d_{0})\) to O. By taking the derivative with respect to β at \(\beta =\beta _{0}\) and by recalling that −∇d = V, we obtain that

$$\displaystyle{ 0 =\langle \nabla d(c(\beta _{0})),\dot{c}(\beta _{0})\rangle = -\langle V (c(\beta _{0})),\dot{c}(\beta _{0})\rangle }$$

where \(c(\beta ):=\gamma _{\beta }(T(\beta ))\). Since c(β) = C(β, T(β)) where \(C(\beta,t):=\gamma _{\beta }(t)\), we have \(\dot{c}(\beta ) = \frac{D\gamma _{\beta }} {\partial \beta } (T(\beta )) +\dot{\gamma } _{\beta }(T(\beta ))T'(\beta )\), and, a fortiori,

$$\displaystyle{ 0 =\langle V (\gamma _{\beta _{0}}(T(\beta _{0}))), \frac{D\gamma _{\beta }} {\partial \beta } \vert _{\beta =\beta _{0}}(T(\beta _{0}))\rangle +\langle V (\gamma _{\beta _{0}}(T(\beta _{0}))),\dot{\gamma }_{\beta _{0}}(T(\beta _{0}))\rangle T'(\beta _{0}) }$$

Let us compute the two inner products above. By definition of the parameter β and the symmetry of the revolution surface S, we have \(\langle V (\gamma _{\beta }(T(\beta ))),\dot{\gamma }_{\beta }(T(\beta ))\rangle = -\cos \beta = -\langle V (\gamma _{\beta }(0)),\dot{\gamma }_{\beta }(0)\rangle\). Also, if we denote by \(J(t) = \frac{D\gamma _{\beta }} {\partial \beta } (t):= j(t) \cdot J\dot{\gamma }_{\beta }(t)\) the perpendicular (“unstable”) Jacobi field Footnote 13 along the geodesic \(\gamma _{\beta _{0}}(t)\) associated to the variation of C(β, t) = γ β (t) with initial conditions j(0) = 0 and j′(0) = 1, then

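$$\displaystyle\begin{array}{rcl} \langle V (\gamma _{\beta _{0}}(T(\beta _{0}))), \frac{D\gamma _{\beta }} {\partial \beta } \vert _{\beta =\beta _{0}}(T(\beta _{0}))\rangle & =& j(T(\beta _{0}))\langle V (\gamma _{\beta _{0}}(T(\beta _{0}))),J\dot{\gamma }_{\beta _{0}}(T(\beta _{0}))\rangle {}\\ & =& -j(T(\beta _{0}))\langle JV (\gamma _{\beta _{0}}(T(\beta _{0}))),\dot{\gamma }_{\beta _{0}}(T(\beta _{0}))\rangle {}\\ & =& -j(T(\beta _{0}))\langle JV (\gamma _{\beta _{0}}(0)),\dot{\gamma }_{\beta _{0}}(0)\rangle {}\\ & =& -j(T(\beta _{0}))\sin \beta _{0} {}\\ \end{array}$$

Here, the second equality uses that J is an isometry with J² = −Id (so that 〈V, Jw〉 = −〈JV, w〉 for all w), and the third equality uses the relation \(x(t)^{2r}y'(t) = c\), which says that the component of the velocity along the parallels takes the same value each time the geodesic crosses C.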

From the computation of the inner products above and the fact that they add up to zero, we deduce that 0 = −j(T(β 0))sinβ 0 − (cosβ 0)T′(β 0), that is,

$$\displaystyle{ T'(\beta _{0}) = -(\tan \beta _{0})j(T(\beta _{0})) }$$
(6.6)

In other terms, the previous equation says that the derivative \(T'(\beta _{0})\) can be controlled via the quantity \(j(T(\beta _{0}))\) measuring the growth of the perpendicular Jacobi field J(t) at the return time \(T(\beta _{0})\). Here, it is worth recalling that Jacobi fields are driven by Jacobi’s equation:

$$\displaystyle{ j''(t) + K(t)j(t) = 0 }$$

where K(t) < 0 is the Gaussian curvature of the surface of revolution S at the point \(\gamma _{\beta _{0}}(t)\). Also, it is useful to keep in mind that Jacobi’s equation implies that the quantity u = j′∕j satisfies Riccati’s equation

$$\displaystyle{ u'(t) + u(t)^{2} = k(t)^{2} }$$

where − k(t)2:= K(t).

In the context of the surface of revolution S, these equations are important tools because we have the following explicit formula for the Gaussian curvature K(q) at a point q = (x, x rcosy, x rsiny) ∈ S:

$$\displaystyle{ K(q) = \frac{-r(r - 1)} {x^{2}(1 + (rx^{r-1})^{2})^{2}} }$$

In particular, \(k(q):= \sqrt{r(r - 1)}/\left (x(1 + (rx^{r-1})^{2})\right )\) verifies \(-k(q)^{2} = K(q)\).
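The identity (6.6) lends itself to a numerical sanity check. The Python sketch below integrates the time t and Jacobi’s equation \(j'' = k^{2}j\) along a geodesic, using the angle b = β(t) as independent variable. Two conventions here are assumptions of the sketch rather than statements of the text: the angle with V is extended past π∕2, so that it increases monotonically from \(\beta _{0}\) to \(\pi -\beta _{0}\) during the excursion, and Clairaut’s relation is used in the form \(x^{r}\sin b = c\), which gives \(dt/db = x\sqrt{1 + (rx^{r-1 } )^{2}}/(r\sin b)\). The parameters r = 4, d_0 = 0.5 and the names `roof_and_jacobi`, `shift` are illustrative.

```python
import math

def roof_and_jacobi(beta0, r=4.0, d0=0.5, n=4000):
    """Return (T, j(T), j'(T)) for the geodesic with initial angle beta0.

    Classical RK4 in the angle variable b, from beta0 to pi - beta0
    (illustrative parameters; a sketch, not the method of the text).
    """
    c = d0**r * math.sin(beta0)                      # Clairaut constant

    def rhs(b, state):
        _, j, dj = state                             # state = (t, j, j')
        x = (c / math.sin(b)) ** (1.0 / r)           # from x^r sin(b) = c
        w2 = 1.0 + (r * x ** (r - 1)) ** 2
        dt = x * math.sqrt(w2) / (r * math.sin(b))   # dt/db
        k_sq = r * (r - 1) / (x * x * w2 * w2)       # k^2 = -K
        return (dt, dj * dt, k_sq * j * dt)          # Jacobi: j'' = k^2 j

    def shift(state, slope, step):
        return tuple(s + step * v for s, v in zip(state, slope))

    b, state = beta0, (0.0, 0.0, 1.0)                # t = 0, j(0) = 0, j'(0) = 1
    h = (math.pi - 2.0 * beta0) / n
    for _ in range(n):
        a1 = rhs(b, state)
        a2 = rhs(b + h / 2, shift(state, a1, h / 2))
        a3 = rhs(b + h / 2, shift(state, a2, h / 2))
        a4 = rhs(b + h, shift(state, a3, h))
        state = tuple(s + h / 6 * (p1 + 2 * p2 + 2 * p3 + p4)
                      for s, p1, p2, p3, p4 in zip(state, a1, a2, a3, a4))
        b += h
    return state
```

One can then compare a central finite difference of the first output in \(\beta _{0}\) with \(-(\tan \beta _{0})\,j(T(\beta _{0}))\) computed from the second output; under the conventions above, the two quantities should agree up to discretization error.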

Next, we take ɛ > 0 and we consider the following auxiliary function:

$$\displaystyle{ g(q):= \frac{r(1+\varepsilon )} {x} }$$

By definition, k(q) < g(q). Furthermore,

$$\displaystyle{ k(t)^{2} - g(t)^{2} - g'(t) \leq \frac{r(r - 1)} {x(t)^{2}} -\frac{r^{2}(1+\varepsilon )^{2}} {x(t)^{2}} + \frac{r(1+\varepsilon )x'(t)} {x(t)^{2}} }$$

Since the equation \((1 + (rx(t)^{r-1})^{2})x'(t)^{2} = 1 - c^{2}/x(t)^{2r} =\cos ^{2}\beta (t)\) (describing the motion of geodesics on S) implies that |x′(t)| ≤ 1, we deduce from the previous inequality that

$$\displaystyle{ k(t)^{2} - g(t)^{2} - g'(t) \leq \frac{1} {x(t)^{2}}(r(r - 1) - (r(1+\varepsilon ))^{2} + r(1+\varepsilon )) <0 }$$
(6.7)

for all times t ∈ [0, T(β)].
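For convenience, let us spell out why the constant appearing in (6.7) is negative; this is a routine expansion of the bracket, recorded here only as a check:

$$\displaystyle{ r(r - 1) - r^{2}(1+\varepsilon )^{2} + r(1+\varepsilon ) = -2r^{2}\varepsilon - r^{2}\varepsilon ^{2} + r\varepsilon = -r\varepsilon \,(2r - 1 + r\varepsilon ) <0 }$$

for every ɛ > 0, because r > 1 implies 2r − 1 > 0.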

This estimate allows us to control the solution u = j′∕j of Riccati’s equation along the following lines. The initial data of the Jacobi field J(t) is j(0) = 0 and j′(0) = 1. Hence,

$$\displaystyle{ \frac{j'(0)} {j(0)} = \infty> g(0) = \frac{r(1+\varepsilon )} {x(0)} = \frac{r(1+\varepsilon )} {d_{0}} }$$

In particular, there exists a well-defined maximal interval [0, t 0] ⊂ [0, T(β)] where j′(t)∕j(t) ≥ g(t) for all t ∈ [0, t 0]. By plugging this estimate into Jacobi’s equation,

$$\displaystyle{ \frac{j''(t)} {j'(t)} = \frac{k(t)^{2}j(t)} {j'(t)} \leq \frac{k(t)^{2}} {g(t)} \leq g(t) }$$

for each t ∈ [0, t 0].

By integrating this inequality (and using the initial condition j′(0) = 1), we obtain that

$$\displaystyle{ \log j'(t_{0}) =\log \frac{j'(t_{0})} {j'(0)} =\int _{ 0}^{t_{0} } \frac{j''(t)} {j'(t)}dt \leq \int _{0}^{t_{0} }g(t)dt. }$$

Therefore,

$$\displaystyle{ j(t_{0}) \leq \frac{j'(t_{0})} {g(t_{0})} \leq \frac{1} {g(t_{0})}\exp \left (\int _{0}^{t_{0} }g(t)dt\right ) }$$

If t 0 = T(β), we deduce that \(j(T(\beta )) \leq \frac{1} {g(T(\beta ))}\exp \left (\int _{0}^{T(\beta )}g(t)dt\right ) \leq \frac{1} {k(0)}\exp \left (\int _{0}^{T(\beta )}g(t)dt\right )\) (as k(0) = k(T(β)) < g(T(β))). Otherwise, 0 < t 0 < T(β) and u(t 0) = j′(t 0)∕j(t 0) = g(t 0). Since u = j′∕j satisfies Riccati’s equation, we deduce from (6.7) that

$$\displaystyle{ u'(t_{1}) - g'(t_{1}) = k(t_{1})^{2} - u(t_{ 1})^{2} - g'(t_{ 1}) = k(t_{1})^{2} - g(t_{ 1})^{2} - g'(t_{ 1}) <0 }$$

at each time t 1 where u(t 1) = g(t 1). It follows that j′(t)∕j(t):= u(t) ≤ g(t) for all t ∈ [t 0, T(β)]. Hence,

$$\displaystyle{ \log \frac{j(T(\beta ))} {j(t_{0})} =\int _{ t_{0}}^{T(\beta )}\frac{j'(t)} {j(t)}dt \leq \int _{t_{0}}^{T(\beta )}g(t)dt, }$$

and, a fortiori,

$$\displaystyle\begin{array}{rcl} j(T(\beta ))& \leq & j(t_{0})\exp \left (\int _{t_{0}}^{T(\beta )}g(t)dt\right ) {}\\ & \leq & \frac{1} {g(t_{0})}\exp \left (\int _{0}^{t_{0} }g(t)dt\right )\exp \left (\int _{t_{0}}^{T(\beta )}g(t)dt\right ) {}\\ & \leq & \frac{1} {k(0)}\exp \left (\int _{0}^{T(\beta )}g(t)dt\right ). {}\\ \end{array}$$

In other words, we proved that

$$\displaystyle{ j(T(\beta )) \leq \frac{1} {k(0)}\exp \left (\int _{0}^{T(\beta )}g(t)dt\right ) }$$
(6.8)

regardless of whether \(t_{0} = T(\beta )\) or \(0 <t_{0} <T(\beta )\).

Now, the quantity \(\exp \left (\int _{0}^{T(\beta )}g(t)dt\right )\) can be estimated as follows. By differentiating Clairaut’s relation \(x(t)^{r}\sin \beta (t) = c\), we get

$$\displaystyle{ rx(t)^{r-1}x'(t)\sin \beta (t) + x(t)^{r}(\cos \beta (t))\beta '(t) = 0, }$$

that is,

$$\displaystyle{ \frac{1} {x(t)} = -\frac{1} {r} \frac{\cos \beta (t)} {x'(t)} \frac{\beta '(t)} {\sin \beta (t)} }$$
(6.9)

Since sin β(t) ∼ β(t) (as we are interested in small angles \(\vert \beta \vert <k_{0}^{-\nu }\), \(k_{0}\) large) and \(\cos \beta (t) \sim \vert x'(t)\vert\) (thanks to the relation \((1 + (rx(t)^{r-1})^{2})x'(t)^{2} = 1 - c^{2}/x(t)^{2r} = (\cos \beta (t))^{2}\) and the fact that r > 1 and, thus, \(1 \leq 1 + (rx(t)^{r-1})^{2} \leq 1 + (rd_{0}^{r-1})^{2} \sim 1\) for \(d_{0}\) small), we conclude that

$$\displaystyle{ g(t) = \frac{r(1+\varepsilon )} {x(t)} \leq (1 + 2\varepsilon )\frac{\beta '(t)} {\beta (t)} }$$

for t ∈ [0, T(β)∕2]. Here, we used the fact that x′(t) < 0 for t ∈ [0, T(β)∕2]. Therefore,

$$\displaystyle{ \int _{0}^{T(\beta )/2}g(t)dt \leq (1 + 2\varepsilon )\log \frac{\pi /2} {\beta (0)} }$$

since β(T(β)∕2) = π∕2. Also, the symmetry of the surface S implies x(t) = x(T(β) − t) and, hence,

$$\displaystyle{ \int _{0}^{T(\beta )/2}g(t)dt =\int _{ T(\beta )/2}^{T(\beta )}g(t)dt }$$

In summary, we have shown that \(\int _{0}^{T(\beta )}g(t)dt \leq 2(1 + 2\varepsilon )\log (\pi /(2\beta (0)))\), i.e.,

$$\displaystyle{ \exp \left (\int _{0}^{T(\beta )}g(t)dt\right ) \leq (\pi /2)^{2(1+2\varepsilon )} \frac{1} {\beta (0)^{2(1+2\varepsilon )}} }$$
(6.10)

By putting together (6.6), (6.8) and (6.10), we conclude that

$$\displaystyle{ \vert T'(\beta _{0})\vert \leq \frac{\tan \beta _{0}} {k(0)}\exp \left (\int _{0}^{T(\beta _{0})}g(t)dt\right ) \leq C \frac{\beta _{0}} {\beta _{0}^{2(1+2\varepsilon )}} = \frac{C} {\beta _{0}^{1+4\varepsilon }} }$$
(6.11)

for some constant C > 0 depending on r > 1 and ɛ > 0.

At this stage, we are ready to complete the proof of Proposition 4.

Proof

Let us estimate the Hölder constant \(\|T\vert _{H_{k}}\|_{C^{\alpha }}\). To this end, we fix \(\beta _{1},\beta _{2} \in H_{k}\) and we write

$$\displaystyle{ \frac{\vert T(\beta _{1}) - T(\beta _{2})\vert } {\vert \beta _{1} -\beta _{2}\vert ^{\alpha }} = \vert T'(\beta _{3})\vert \cdot \vert \beta _{1} -\beta _{2}\vert ^{1-\alpha } }$$

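for some \(\beta _{3} \in H_{k}\) between \(\beta _{1}\) and \(\beta _{2}\). Since \(\vert \beta _{1} -\beta _{2}\vert \leq k^{-\nu } - (k + 1)^{-\nu } \leq \nu k^{-(\nu +1)}\) and \(\vert \beta _{3}\vert \geq (k + 1)^{-\nu }\), it follows from (6.11) that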

$$\displaystyle{ \frac{\vert T(\beta _{1}) - T(\beta _{2})\vert } {\vert \beta _{1} -\beta _{2}\vert ^{\alpha }} \leq C\nu ^{1-\alpha }\frac{(k + 1)^{\nu (1+4\varepsilon )}} {k^{(\nu +1)(1-\alpha )}} }$$

Because β 1 and β 2 are arbitrary points in H k , we have that

$$\displaystyle{ \|T\vert _{H_{k}}\|_{C^{\alpha }} \leq C\frac{(k + 1)^{\nu (1+4\varepsilon )}} {k^{(\nu +1)(1-\alpha )}} }$$

where C > 0 is an appropriate constant.

Now, our assumption 0 < α < 1∕(ν + 1) implies that we can choose ɛ > 0 sufficiently small so that ν(1 + 4ɛ) ≤ (ν + 1)(1 −α). By doing so, we see from the previous estimate that

$$\displaystyle{ \sup \limits _{k\in \mathbb{N}}\|T\vert _{H_{k}}\|_{C^{\alpha }} <\infty }$$

whenever ɛ > 0, i.e., d 0 > 0, is sufficiently small. This proves Proposition 4.
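The choice of ɛ in the last step of the proof can be made concrete. The Python sketch below takes the largest admissible value, namely the ɛ for which ν(1 + 4ɛ) = (ν + 1)(1 − α), and evaluates the resulting ratio \((k + 1)^{\nu (1+4\varepsilon )}/k^{(\nu +1)(1-\alpha )}\) over a finite range of k. The function name `holder_ratio_bound`, the cut-off `k_max` and this particular choice of ɛ are assumptions of the sketch (the text only requires ɛ to be sufficiently small).

```python
def holder_ratio_bound(nu, alpha, k_max=10_000):
    """Return (eps, max_k ratio) for the ratio (k+1)^(nu(1+4eps)) / k^((nu+1)(1-alpha)).

    eps is chosen so that both exponents coincide (largest admissible value);
    the maximum is taken over 1 <= k <= k_max (illustrative cut-off).
    """
    if not 0.0 < alpha < 1.0 / (nu + 1.0):
        raise ValueError("need 0 < alpha < 1/(nu + 1)")
    eps = (1.0 - alpha * (nu + 1.0)) / (4.0 * nu)   # nu(1 + 4 eps) = (nu + 1)(1 - alpha)
    top = nu * (1.0 + 4.0 * eps)
    bottom = (nu + 1.0) * (1.0 - alpha)
    return eps, max((k + 1) ** top / k ** bottom for k in range(1, k_max + 1))
```

With this choice, the ratio equals \(((k + 1)/k)^{\nu (1+4\varepsilon )}\), which is largest at k = 1 and tends to 1 as k grows, so that the supremum over k is indeed finite.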

6.4.2.4 Some Estimates for the Expansion Factors Λ(β)

Similarly to the previous subsection, the proof of Proposition 5 uses the properties of Jacobi’s and Riccati’s equation to study

$$\displaystyle{ \varLambda (\beta ):= j(T(\beta )) + j'(T(\beta )) }$$
(6.12)

where j(t) = j β (t) is the scalar function (with j(0) = 0 and j′(0) = 1) measuring the size of the perpendicular “unstable” Jacobi field along γ β (t).

We begin by giving a lower bound on Λ(β). Given ɛ > 0, let us choose d 0 = d 0(ɛ, r) > 0 small so that

$$\displaystyle{ \sqrt{1-\varepsilon } <\frac{1} {1 + (rd_{0}^{r-1})^{2}}(\leq 1) }$$

Of course, this choice of d 0 is possible because r > 1. Next, we consider the auxiliary function:

$$\displaystyle{ h(q):= \frac{(r - 1)(1 - 2\varepsilon )} {x}. }$$

By definition, \(h(q) <\sqrt{r(r - 1)}/x(1 + (rd_{0}^{r-1})^{2}) \leq k(q)\). Furthermore,

$$\displaystyle{ h'(t) = -\frac{(r - 1)(1 - 2\varepsilon )} {x(t)^{2}} x'(t) }$$

In particular,

$$\displaystyle{ k(t)^{2} - h(t)^{2} - h'(t)> \frac{r(r - 1)(1-\varepsilon )} {x(t)^{2}} -\frac{(r - 1)^{2}(1 - 2\varepsilon )^{2}} {x(t)^{2}} + \frac{(r - 1)(1 - 2\varepsilon )x'(t)} {x(t)^{2}} }$$

Since | x′(t) | ≤ 1 (cf. the paragraph before (6.7)), we deduce from the previous estimate that

$$\displaystyle{ k(t)^{2} - h(t)^{2} - h'(t)> 0 }$$

This inequality implies that the solution u(t) = j′(t)∕j(t) of Riccati’s equation satisfies u(t) ≥ h(t) for all t ∈ [0, T(β)]. Indeed, the initial condition j′(0) = 1, j(0) = 0 says that u(0) = ∞ > h(0) and the inequality above tells us that

$$\displaystyle{ u'(t_{1}) - h'(t_{1}) = k(t_{1})^{2} - u(t_{ 1})^{2} - h'(t_{ 1}) = k(t_{1})^{2} - h(t_{ 1})^{2} - h'(t_{ 1})> 0 }$$

at any time t 1 where u(t 1) = h(t 1).

By integrating the estimate u(t) = j′(t)∕j(t) ≥ h(t) over the interval [t 0, T(β)], we obtain that

$$\displaystyle{ \log \frac{j(T(\beta ))} {j(t_{0})} =\int _{ t_{0}}^{T(\beta )}\frac{j'(t)} {j(t)}dt \geq \int _{t_{0}}^{T(\beta )}h(t)dt, }$$

i.e., 

$$\displaystyle{ j(T(\beta )) \geq j(t_{0})\exp \left (\int _{t_{0}}^{T(\beta )}h(t)dt\right ) }$$

For the sake of concreteness, let us set \(t_{0}:= d_{0}/10\) and let us restrict our attention to geodesics whose initial angle β = β(0) with the meridians of S is sufficiently small so that \(T(\beta ) \geq d_{0}/2\). In this way, \(j(t_{0}) \geq t_{0} = d_{0}/10\) (thanks to Jacobi’s equation \(j'' = k^{2}j\) and our initial conditions j(0) = 0 and j′(0) = 1). So, the inequality above becomes

$$\displaystyle{ j(T(\beta )) \geq \frac{d_{0}} {10}\exp \left (\int _{t_{0}}^{T(\beta )}h(t)dt\right ) }$$

Next, we observe that \(\exp \left (\int _{t_{0}}^{T(\beta )}h(t)dt\right )\) can be bounded from below in a similar way to our derivation of a bound from above to \(\exp \left (\int _{0}^{T(\beta )}g(t)dt\right )\) in the previous subsection: in fact, by repeating the arguments appearing after (6.9) above, one can show that

$$\displaystyle{ h(t) \geq \frac{(r - 1)(1 - 3\varepsilon )} {r} \frac{\beta '(t)} {\beta (t)} }$$

and

$$\displaystyle{ \exp \left (\int _{t_{0}}^{T(\beta )}h(t)dt\right ) \geq \overline{c} \frac{1} {\beta (0)^{(r-1)(1-3\varepsilon )/r}} }$$

where \(\overline{c}> 0\) is an adequate (small) constant depending on r, d 0 and ɛ.

By putting together the estimates above, we deduce that

$$\displaystyle{ \varLambda (\beta ) \geq j(T(\beta )) \geq c \frac{1} {\beta (0)^{(r-1)(1-3\varepsilon )/r}} }$$

where \(c = d_{0}\overline{c}/10\).

This inequality shows that

$$\displaystyle{ \sum \limits _{k=k_{0}}^{\infty }\varLambda _{ k}^{-1} \leq \frac{1} {c}\sum \limits _{k=k_{0}}^{\infty } \frac{1} {k^{(r-1)\nu (1-3\varepsilon )/r}} }$$

Thus, if ν > r∕(r − 1), then we can choose ɛ > 0 small (with \((r - 1)(1 - 3\varepsilon )\nu /r> 1\)) and \(k_{0} \in \mathbb{N}\) large so that (our variant of) the one-step growth condition (6.5) holds. This proves the first part of Proposition 5.
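The choice of k_0 in this last step can be made explicit for a tail bounded by \(\frac{1} {c}\sum _{k\geq k_{0}}k^{-p}\) with p > 1, by means of the integral comparison \(\sum _{k\geq k_{0}}k^{-p} \leq (k_{0} - 1)^{1-p}/(p - 1)\) for k_0 ≥ 2. The Python sketch below returns the first k_0 for which this comparison certifies that the tail is smaller than 1. The function name `smallest_k0` is hypothetical, and the inputs c and p (standing for the constant c and the exponent (r − 1)ν(1 − 3ɛ)∕r) are to be supplied by the user: no numerical values for them are given in the text.

```python
def smallest_k0(c, p):
    """First k0 >= 2 such that (k0 - 1)^(1 - p) / ((p - 1) c) < 1.

    By the integral comparison, this guarantees (1/c) * sum_{k >= k0} k^(-p) < 1.
    It need not be the smallest k0 for which the series itself is below 1.
    """
    if c <= 0.0 or p <= 1.0:
        raise ValueError("need c > 0 and p > 1")
    k0 = 2
    while (k0 - 1) ** (1.0 - p) / ((p - 1.0) * c) >= 1.0:
        k0 += 1
    return k0
```

Note that the returned value grows like \(((p - 1)c)^{-1/(p-1)}\), so it becomes very large when p is close to 1 or when c is small, which is consistent with the need to take k_0 large in Proposition 5.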

Finally, we give an indication of the proof of the second part of Proposition 5 (i.e., the distortion bound (6.4)). We start by writing

$$\displaystyle{ \frac{\varLambda '(\beta )} {\varLambda (\beta )} = \frac{d} {d\beta }\log \varLambda (\beta ) }$$

and by noticing that

$$\displaystyle{ \log \varLambda (\beta ) =\log (\,j(T(\beta )) + j'(T(\beta ))) =\log j(T(\beta )) +\log (1 + u(T(\beta ))) }$$

Next, we take the derivative with respect to β of the previous expression. Here, we obtain several terms involving some quantities already estimated above via Jacobi’s and Riccati’s equations (such as j(T(β)), T′(β), etc.), but a new quantity also appears, namely, \(u_{\beta }(t)\), i.e., the derivative with respect to β of the family of solutions u(t) = u(t, β) of Riccati’s equation along \(\gamma _{\beta }(t)\). Here, the “trick” to give bounds on \(u_{\beta }(t)\) is to differentiate Riccati’s equation

$$\displaystyle{ u'(t) + u(t)^{2} = k(t)^{2} }$$

with respect to β in order to get an ODE (in the time variable t) satisfied by \(u_{\beta }(t)\). In this way, it is possible to see that one has reasonable bounds on \(u_{\beta }(t)\) as soon as one can estimate the derivative \(k_{\beta }\) of the square root \(k = \sqrt{-K}\) of the absolute value −K of the Gaussian curvature. Here, \(k_{\beta }\) can be bounded by recalling that we have an explicit formula

$$\displaystyle{ K = -r(r - 1)/x^{2}(1 + (rx^{r-1})^{2})^{2} }$$

for the Gaussian curvature. By following these lines, one can prove that, for a given ɛ > 0, the distortion bound

$$\displaystyle{ \frac{\varLambda '(\beta )} {\varLambda (\beta )} \leq C \frac{1} {\beta (0)^{(1+2/r)(1+\varepsilon )}} = \frac{C} {\beta (0)^{\theta }} }$$

holds whenever \(d_{0}> 0\) is taken sufficiently small. In other words, by taking θ = θ(r) = (r + 2)(1 + ɛ)∕r, we have \(\varLambda '(\beta )/\varLambda (\beta ) \leq C/\beta (0)^{\theta }\).
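To make the “trick” mentioned above slightly more explicit, let us record the linear equation obtained by formally differentiating Riccati’s equation \(u' + u^{2} = k^{2}\) with respect to β. This is only a sketch: we write \(k_{\beta }\) for the β-derivative of \(k(t) = k(\gamma _{\beta }(t))\), we assume that the derivatives in t and β can be interchanged, and we work from a positive initial time \(t_{1}> 0\) (a hypothetical choice) because u is infinite at t = 0. Since u = j′∕j, the function \(j^{2}\) is an integrating factor:

$$\displaystyle{ u_{\beta }'(t) + 2u(t)u_{\beta }(t) = 2k(t)k_{\beta }(t),\qquad \text{i.e.,}\qquad \frac{d} {dt}\left (j(t)^{2}u_{\beta }(t)\right ) = 2j(t)^{2}k(t)k_{\beta }(t) }$$

Integrating from \(t_{1}\) to t, one gets \(j(t)^{2}u_{\beta }(t) = j(t_{1})^{2}u_{\beta }(t_{1}) + 2\int _{t_{1}}^{t}j(s)^{2}k(s)k_{\beta }(s)ds\), which indicates how bounds on \(k_{\beta }\) translate into bounds on \(u_{\beta }\).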

Note that the estimate in the previous paragraph gives the desired distortion bounds (6.4) once we show that \(\theta =\theta (r) = \frac{(r+2)} {r} +\) can be selected such that νθ < ν + 1 (here and below, a trailing “+” stands for the addition of an arbitrarily small positive quantity). In order to check this, it suffices to recall that \(\nu - r/(r - 1)> 0\) can be taken arbitrarily small (cf. the proof of the first part of Proposition 5), i.e., \(\nu = \frac{r} {r-1}+\). So,

$$\displaystyle{ \nu \theta = \left ( \frac{r} {r - 1}+\right )\left (\frac{r + 2} {r} +\right ) = \frac{r + 2} {r - 1}+ }$$

and

$$\displaystyle{ \nu +1 = \frac{r} {r - 1} + 1+ = \frac{2r - 1} {r - 1} + }$$

Since r + 2 < 2r − 1 for r > 3, it follows that νθ < ν + 1 for adequate choices of θ and ν. This completes our sketch of proof of the second part of Proposition 5.
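The compatibility of the constraints on ν and θ can also be checked mechanically. The Python sketch below looks for a pair (ν, θ) with ν > r∕(r − 1), θ > 1 + 2∕r and νθ < ν + 1 by adding the same small margin δ to both lower bounds and halving δ until the last inequality holds. The function name `choose_nu_theta`, the common margin and its starting value are assumptions of the sketch. Since νθ < ν + 1 is equivalent to ν(θ − 1) < 1, and ν(θ − 1) > 2∕(r − 1) under the first two constraints, no admissible pair exists when r ≤ 3.

```python
def choose_nu_theta(r, delta=0.1):
    """Return (nu, theta) with nu > r/(r-1), theta > 1 + 2/r and nu*theta < nu + 1.

    delta is an illustrative starting margin, halved until the constraints hold.
    """
    if r <= 3.0:
        raise ValueError("the constraints are compatible only for r > 3")
    while True:
        nu = r / (r - 1.0) + delta
        theta = (r + 2.0) / r + delta
        if nu * theta < nu + 1.0:
            return nu, theta
        delta /= 2.0
```

For instance, for r = 4 the initial margin δ = 0.1 already works, while values of r close to 3 force δ to be much smaller, reflecting the fact that the inequality r + 2 < 2r − 1 degenerates at r = 3.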