1 Introduction

1.1 Outline of main results

Let K be a number field. This paper has two main goals.

Firstly, we will give a new proof of the finiteness of K-rational points on a smooth projective K-curve of genus \(\geqslant 2\). The proof is closely related to Faltings’s proof [14], but is based on a closer study of the variation of p-adic Galois representations in a family; it makes no use of techniques specific to abelian varieties.

Secondly, we give an application of the same methods to a higher-dimensional situation. Consider the family of degree-d hypersurfaces in \({\mathbf {P}}^n\) and let \(F_{n,d}\) be the complement of the discriminant divisor in this family; we regard \(F_{n,d}\) as a smooth \({{\mathbf {Z}}}\)-scheme. For S a finite set of primes, points of \(F_{n,d}({{\mathbf {Z}}}[S^{-1}])\) correspond to proper smooth hypersurfaces of degree d in \({\mathbf {P}}^n_{{{\mathbf {Z}}}[S^{-1}]}\). It is very reasonable to suppose that \(F_{n,d}({{\mathbf {Z}}}[S^{-1}])\) is finite modulo the action of \({{\text {GL}}}_{n+1}({{\mathbf {Z}}}[S^{-1}])\) for \(d \geqslant 3\) and all n. We shall show at least that, if \(n \geqslant n_0\) and \(d \geqslant d_0(n)\), then \(F_{n,d}({{\mathbf {Z}}}[S^{-1}])\) is contained in a proper Zariski closed subset of \(F_{n,d}\) (i.e., there exists a proper \({{\mathbf {Q}}}\)-subvariety of the generic fiber \((F_{n,d})_{{{\mathbf {Q}}}}\) whose rational points contain \(F_{n,d}({{\mathbf {Z}}}[S^{-1}])\)). To prove this higher-dimensional result, we use a very recent theorem of Bakker and Tsimerman, the Ax–Schanuel theorem for period mappings.

We can obtain a still stronger theorem along a subvariety of \(F_{n,d}\) if one has control over monodromy. Namely, if \(F_{n,d}^* \subset (F_{n,d})_{{{\mathbf {Q}}}}\) is the Zariski closure of integral points, our result actually implies that the Zariski closure of monodromy for the universal family of hypersurfaces must drop over each component of \(F_{n,d}^*\). It is possible that this imposes a stronger codimension condition on \(F_{n,d}^*\) than simply “proper” but we do not know for sure.

Note that, without the result of Bakker and Tsimerman, one can still prove that \(F_{n,d}({{\mathbf {Z}}}[S^{-1}])\) lies in a proper \({{\mathbf {Q}}}_p\)-analytic subvariety of \(F_{n,d}({{\mathbf {Q}}}_p)\), but one cannot prove the second statement about \(F_{n,d}^*\).

A simple toy case to illustrate the methods is given by the S-unit equation, which we analyze in Sect. 4.

1.2 Outline of the proof

Consider a smooth projective family \(X \rightarrow Y\) over K, where Y is itself a smooth K-variety; we suppose this extends to a family \(\pi {:}\,{\mathcal {X}} \rightarrow {\mathcal {Y}}\) over the ring \({\mathcal {O}}\) of S-integers of K, for some finite set S of places of K (containing all the archimedean places).

For \(y \in Y(K)\) call \(X_y\) the fiber over y. We want to bound \({\mathcal {Y}}({\mathcal {O}})\), making use of the fact that, if \(y \in Y(K)\) extends to \({\mathcal {Y}}({\mathcal {O}})\), then \(X_y\) admits a smooth proper model over \({\mathcal {O}}\). That one can thus reduce Mordell’s conjecture to finiteness results for varieties with good reduction was observed by Parshin [30] and then used by Faltings in his proof of the Mordell conjecture [14].

Choosing a rational prime p that is unramified in K and not below any prime of S, write \(\rho _y\) for the Galois representation of \(G_K = {{\text {Gal}}}({\overline{K}}/K)\) on the p-adic geometric étale cohomology of \(X_y\), i.e. \(H^*_{\mathrm {et}}(X_y \times _{K} {\bar{K}}, {{\mathbf {Q}}}_p)\). As observed by Faltings, one deduces from Hermite–Minkowski finiteness that, as y varies through \({\mathcal {Y}}({\mathcal {O}})\), there are only finitely many possibilities for the semisimplification of the \(G_K\)-representation \(\rho _y\) (denoted by \(\rho _y^{\mathrm {ss}}\)).

We seek to use the fact that, for v a place of K above p, one can understand the restriction \(\rho _{y,v}\) of \(\rho _y\) to \(G_{K_v}\) via p-adic Hodge theory. In the Mordell case, when Y is a projective curve, our argument proceeds by showing that both of the following statements hold for suitable choice of X and v:

(*) The representation \(\rho _y\) is semisimple for all but finitely many \(y \in Y(K)\), and the map

$$\begin{aligned} y \in Y(K) \longrightarrow \,\text{ isomorphism } \text{ class } \text{ of }\,\rho _{y,v} \end{aligned}$$
(1.1)

has finite fibers.

Faltings proves much stronger statements when X is an abelian scheme over Y, using a remarkable argument with heights: every \(\rho _y\) is semisimple and \(\rho _y\) determines \(X_y\) up to isogeny. Our approach gives less, but it gives results in other cases too, such as the hypersurface family discussed above. However, in that setting, the issue of semisimplicity proves harder to control, and what we prove instead is the following hybrid of the two statements in (*): the map

$$\begin{aligned} y \in Y(K) \longrightarrow \text{ restriction } \text{ of }\,\rho _y^{\mathrm {ss}}\,\text{ to }\,G_{K_v} \end{aligned}$$
(1.2)

considered as a mapping from Y(K) to isomorphism classes of \(G_{K_v}\)-representations, has fibers that are not Zariski dense. (It is crucial, in the above equation, that we semisimplify \(\rho _y\) as a global Galois representation and then restrict to \(G_{K_v}\).)

For the remainder of the current Sect. 1.2, we will explain (1.1) in more detail.

Our analysis uses p-adic Hodge theory. However we make no use of p-adic Hodge theory in families: we need only the statements over a local field. Under the correspondence of p-adic Hodge theory, the restricted representation \(\rho _{y,v}\) corresponds to a filtered \(\phi \)-module, namely the de Rham cohomology of \(X_y\) over \(K_v\) equipped with its Hodge filtration and a semilinear Frobenius map. The variation of this filtration is described by a period mapping; in this setting, this is a \(K_v\)-analytic mapping

$$\begin{aligned} \text{ residue } \text{ disk } \text{ in }\,Y(K_v) \longrightarrow K_v\text{-points } \text{ of } \text{ a } \text{ flag } \text{ variety }. \end{aligned}$$
(1.3)

Therefore, the variation of the p-adic representation \(\rho _{y,v}\) with y is controlled by (1.3). The basic, and very naive, “hope” of the proof is that injectivity of the period map (1.3) should force (1.1) to be injective.

However, (1.1) does not follow directly from injectivity of the period map, that is to say, from Torelli-type theorems.

Different filtrations on the underlying \(\phi \)-module can give filtered \(\phi \)-modules which are abstractly isomorphic, the isomorphism being given by a linear endomorphism commuting with \(\phi \). Hence, one needs to know not only that the period mapping (1.3) is injective, but that its image has finite intersection with an orbit on the period domain of the centralizer \(\mathrm {Z}(\phi )\) of \(\phi \). In other words, we must analyze a question of “exceptional intersections” between the image of a period map and an algebraic subvariety.
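To make the last point concrete, here is a toy illustration; the specific \(\phi \) below is chosen by us for illustration and is not taken from the later sections. Two filtrations \(F, F'\) on the same \(\phi \)-module \((V, \phi )\) give isomorphic filtered \(\phi \)-modules exactly when some invertible element of the centralizer carries one to the other:

$$\begin{aligned} (V, \phi , F) \simeq (V, \phi , F') \iff F' = gF\,\text{ for } \text{ some } \text{ invertible }\,g \in \mathrm {Z}(\phi ). \end{aligned}$$

For instance, take \(V = {{\mathbf {Q}}}_p^2\) and \(\phi = \mathrm {diag}(1,p)\), and let the filtration consist of a single line spanned by \((1,x)\), i.e. a point \([1{:}\,x]\) of \({{\mathbf {P}}}^1\). Since \(\phi \) has distinct eigenvalues, \(\mathrm {Z}(\phi )\) consists of the diagonal matrices, and

$$\begin{aligned} \mathrm {diag}(a,b) \cdot [1{:}\,x] = [a{:}\,bx] = [1{:}\,(b/a)x]. \end{aligned}$$

Thus all lines with \(x \ne 0\) lie in a single orbit, and the isomorphism class of the filtered \(\phi \)-module does not detect x. Any argument for finite fibers must rule out a collapse of this kind.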

To illustrate how this is done, let us restrict to the case when Y is a curve. Assuming that we have shown that the \(\mathrm {Z}(\phi )\)-orbit on the ambient flag variety is a proper subvariety, it will then be sufficient to show that the image of (1.3) is in fact Zariski dense. Then the intersection points between the image of (1.3) and a \(\mathrm {Z}(\phi )\)-orbit amount to zeros of a \(K_v\)-analytic function on a residue disk that is not identically zero, and are therefore finite in number.
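One standard way to make this finiteness quantitative is Strassmann's theorem. We state it over \({{\mathbf {Z}}}_p\) as an illustration only, without claiming that it is the form used later in the paper. Let \(f(z) = \sum _{n \geqslant 0} a_n z^n\) with \(a_n \in {{\mathbf {Q}}}_p\), \(a_n \rightarrow 0\), and not all \(a_n\) equal to zero, so that f converges on \({{\mathbf {Z}}}_p\). Put

$$\begin{aligned} N = \max \{ n{:}\,|a_n|_p = \max _m |a_m|_p \}. \end{aligned}$$

Then f has at most N zeros in \({{\mathbf {Z}}}_p\). For example, \(f(z) = z - pz^2\) has \(|a_1|_p = 1 > |a_2|_p = p^{-1}\), so \(N = 1\); indeed \(z = 0\) is its only zero in \({{\mathbf {Z}}}_p\), the other root \(z = 1/p\) lying outside the disk.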

To check Zariski density, the crucial point is that one can verify the same statement for the complex period map:

$$\begin{aligned} \text{ universal } \text{ cover } \text{ of }\,Y({{\mathbf {C}}}) \longrightarrow {{\mathbf {C}}}\text{-points } \text{ of } \text{ a } \text{ flag } \text{ variety }. \end{aligned}$$
(1.4)

To pass between the p-adic and complex period maps, we use the fact that (in suitable coordinates) they satisfy the same differential equation coming from the Gauss–Manin connection, and so have the same power series. This is a simple but crucial argument, given in Lemma 3.2. Over the complex numbers, however, Zariski density can be verified by topological methods: (1.4) is equivariant for an action of \(\pi _1(Y)\), acting on the right according to the monodromy representation. It is enough to verify that the image of \(\pi _1\) under the monodromy representation is sufficiently large. In the Mordell case, we show that the monodromy action of \(\pi _1(Y)\) extends to a certain mapping class group, and we deduce large monodromy from the same assertion for the mapping class group (where we can use Dehn twists). This monodromy argument is related to computations of Looijenga [27], Grunewald et al. [18], and Salter and Tshishiku [37].
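The comparison of power series mentioned above can be sketched as follows, in hypothetical notation (the symbols t, A, F are ours for this illustration; the precise statement is Lemma 3.2). Suppose that, in a local coordinate t at a point \(y_0 \in Y(K)\) and a basis of de Rham cohomology defined over K, the flat sections of the Gauss–Manin connection are the solutions of \(dF/dt = -A(t)F\), where \(A(t) = \sum _{n \geqslant 0} A_n t^n\) has matrix coefficients in K. Writing \(F(t) = \sum _{n \geqslant 0} F_n t^n\) with \(F_0 = 1\) and comparing coefficients of \(t^n\) gives

$$\begin{aligned} (n+1) F_{n+1} = - \sum _{i+j=n} A_i F_j \quad (n \geqslant 0), \end{aligned}$$

so every \(F_n\) is determined by the \(A_i\) and has entries in K. The same formal series therefore describes parallel transport both \(K_v\)-analytically and complex-analytically, wherever it converges; only the domain of convergence differs.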

If Y were not a curve, the argument above says only that the intersection of the image of (1.3) and a \(\mathrm {Z}(\phi )\)-orbit is a proper \(K_v\)-analytic subvariety of \(Y(K_v)\). One wants to get a proper Zariski-closed subvariety (for example, this permits one, in principle at least, to make an inductive argument on the dimension, although we do not try to do so here). We obtain this only by appealing to a remarkable recent result of Bakker and Tsimerman, the Ax–Schanuel theorem for period mappings: this is a very powerful and general statement about the transcendence of period mappings.

To summarize, we have outlined the strategy of the proof of (1.1). However, we have omitted one crucial ingredient needed in this proof, and also a crucial ingredient needed to get from (1.1) to Mordell:

  (a) Showing that the centralizer \(\mathrm {Z}(\phi )\) of \(\phi \) is not too large, and

  (b) Controlling in some a priori way the extent to which \(\rho _y\) can fail to be semisimple.

We now discuss these issues in turn.

1.3 Problem (a): controlling the centralizer of \(\phi \)

As we have explained, we need a method to ensure the centralizer of the crystalline Frobenius \(\phi \) acting on the cohomology of a fiber \(X_y\) is not too large. For example, if \(K_v = {{\mathbf {Q}}}_p\) so that \(\phi \) is simply a \({{\mathbf {Q}}}_p\)-linear map, we must certainly rule out the possibility that \(\phi \) is a scalar!

This issue, that \(\phi \) might have too large a centralizer and thus the map

$$\begin{aligned} y \in Y(K_v) \longrightarrow \,\text{ isomorphism } \text{ class } \text{ of }\,\rho _{y,v} \end{aligned}$$
(1.5)

might fail to have finite fibers, already occurs in the simplest possible example. When analyzing the S-unit equation, it is natural to take \(Y = {{\mathbf {P}}}^1-\{0,1,\infty \}\) and \(X \rightarrow Y\) to be the Legendre family, so that \(X_t \) is the curve \(y^2 = x(x-1)(x-t)\). Unfortunately (1.5) fails: for \(t \in {{\mathbf {Z}}}_p\), if we write \(\rho _t\) for the representation of the Galois group \(G_{{{\mathbf {Q}}}_p}\) on the (rational) Tate module of \(X_t\), then \(\rho _t\) belongs to only finitely many isomorphism classes so long as the reduction \({\bar{t}} \in {{\mathbf {F}}}_p\) is not equal to 0 or 1.

Again we proceed in two different ways:

  (i) In general, Frobenius is a semilinear operator on a vector space over an unramified extension \(L_w\) of \({{\mathbf {Q}}}_p\); semilinearity alone gives rise to a nontrivial bound (Lemma 2.1) on the size of its centralizer, which, in effect, becomes stronger as \([L_w{:}\,{{\mathbf {Q}}}_p]\) gets larger.

    In the application to Mordell, it turns out that we can always put ourselves in a situation where \([L_w{:}\,{{\mathbf {Q}}}_p]\) is rather large. This forces the Frobenius centralizer to be small. We explain this at more length below.

  (ii) In the case of hypersurfaces, we do not have a way to enlarge the base field as in (i). Our procedure is less satisfactory than in case (i), in that it gives much weaker results:

    We are of course able to choose the prime p, and we choose it (via Chebotarev) so that the crystalline Frobenius at p has centralizer that is as small as possible. To do this, we fix an auxiliary prime \(\ell \), and first use the fact (from counting points over extensions of \({{\mathbf {F}}}_p\)) that crystalline Frobenius at p has the same eigenvalues as Frobenius at p acting on \(\ell \)-adic cohomology; thus it is enough to choose p such that the latter operator has small centralizer. One can do this via Chebotarev, given a lower bound on the image of the global Galois representation, and for this we again use some p-adic Hodge theory (cf. [39]). Another approach, by point-counting, is outlined in Lemma 12.1.

Let us explain point (i) above by example. In our analysis of the S-unit equation in Sect. 4, we instead replace the Legendre family by the family with fiber

$$\begin{aligned} X_{t} = \coprod _{z^{2^k} = t} \{ y^2=x(x-1)(x-z) \}, \end{aligned}$$

for a suitable large integer k. In our situation, the corresponding map \(t \mapsto [\rho _t]\) will now only have finite fibers, at least on residue disks where \({\bar{t}}\) is not a square—an example of the importance of enlarging \(K_v\).

Said differently, we have replaced the Legendre family \(X {\mathop {\rightarrow }\limits ^{\ell }} {{\mathbf {P}}}^1-\{0,1,\infty \}\) with a family with the following composite structure:

$$\begin{aligned} X' {\mathop { \rightarrow }\limits ^{\ell '}} {{\mathbf {P}}}^1-\{0,\mu _{2^k},\infty \} \rightarrow {{\mathbf {P}}}^1 - \{0,1,\infty \} \end{aligned}$$

where the second map is given by \(u \mapsto u^{2^k}\), and \(\ell '\) is simply the restriction of the Legendre family over \({{\mathbf {P}}}^1-\{0,\mu _{2^k}, \infty \}\). The composite defines a family over \({{\mathbf {P}}}^1-\{0,1,\infty \}\) with geometrically disconnected fibers, and this disconnectedness is, as we have just explained, to our advantage.

It turns out that the families introduced by Parshin (see [30, Proposition 9]), in his reduction of Mordell’s conjecture to Shafarevich’s conjecture, automatically have a similar structure. That is to say, if Y is a smooth projective curve, Parshin’s families factorize as

$$\begin{aligned} X \rightarrow Y' \rightarrow Y, \end{aligned}$$

where \(Y' \rightarrow Y\) is finite étale and \(X \rightarrow Y'\) is a relative curve.

There is in fact a lot of flexibility in this construction; in Parshin’s original construction the covering \(Y' \rightarrow Y\) is obtained by pulling back multiplication by 2 on the Jacobian, and as such each geometric fiber is a torsor under \(H^1(Y_{{\bar{K}}},\mu _2)\). We want to ensure that the Galois action on each fiber of \(Y' \rightarrow Y\) has large image—with reference to the discussion above, this is what allows us to ensure that the auxiliary field \(L_w\) is of large degree. We use a variant where each fiber admits a \(G_K\)-equivariant map to \(H^1(Y_{{\bar{K}}}, {{\mathbf {Z}}}/q{{\mathbf {Z}}})\) (for a suitable auxiliary prime q). The Weil pairing alone implies that the Galois action on this is nontrivial, and this (although very weak) is enough to run our argument.

1.4 Problem (b): how to handle the failure of semisimplicity

Let \(y \in Y(K)\). The local Galois representation \(\rho _y|_{G_{K_v}}\) can certainly be very far from semisimple, and thus we cannot hope to use p-adic Hodge theory alone to constrain semisimplicity.

However, the Hodge weights of a global representation are highly constrained by purity (Lemma 2.9). This means, for example, that any global subrepresentation W of \(\rho _y\) corresponds, under p-adic Hodge theory, to a Frobenius-stable subspace \(W_{\mathrm {dR}} \subset H^*_{\mathrm {dR}}(X_y \otimes _{K} K_v)\) whose Hodge filtration is numerically constrained. Now (assuming we have arranged that the Frobenius has small centralizer) there are not too many choices for a Frobenius-stable subspace; on the other hand, the Hodge filtration varies as y varies p-adically. Thus one can at least hope to show that such a “bad” \(W_{\mathrm {dR}}\) exists only for finitely many \(y \in Y(K_v)\). In this way we can hope to show that \(\rho _y\) is simple for all but finitely many y.

The purity argument is also reminiscent of an argument at the torsion level in Faltings’s proof (the use of Raynaud’s results on [14, p. 364]).

We use this argument both for Mordell’s conjecture and for hypersurfaces (although for hypersurfaces we prove a much weaker result, just bounding from above the failure of semisimplicity). The linear algebra involved is fairly straightforward for curves (see Claim 1 and its proof in Sect. 6) but becomes very unwieldy in the higher-dimensional case. To handle it in a reasonably compact way we use some combinatorics related to reductive groups (Sect. 11). However this argument is not very efficient and presumably gives results that are far from optimal.

1.5 Effectivity; comparison with Chabauty–Kim and Faltings

It is of interest to compare our method with that of Chabauty, and the nonabelian generalizations thereof due to Kim [23].

Let Y be a projective smooth curve over K with Jacobian J. Fix a finite place v. The classical method of Chabauty proceeds by considering Y(K) as the intersection of global points J(K) on the Jacobian and local points \(Y(K_v)\) on the curve, inside \(J(K_v)\). If the rank of J(K) is less than the \(K_v\)-dimension of J (i.e. the genus of the curve) it is easy to see this intersection is finite.
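For orientation, here is the standard dimension count in the simplest case \(K = {{\mathbf {Q}}}\), \(K_v = {{\mathbf {Q}}}_p\); this restriction is an assumption made only for the sketch, and the symbols r, g, \(\omega \), \(y_0\) are ours. Write r for the rank of \(J({{\mathbf {Q}}})\) and g for the genus, and embed Y in J using a rational basepoint \(y_0\). The p-adic logarithm maps \(J({{\mathbf {Q}}}_p)\) to a g-dimensional \({{\mathbf {Q}}}_p\)-vector space, and the image of \(J({{\mathbf {Q}}})\) spans a subspace of dimension at most r. If \(r < g\), some nonzero linear functional vanishes on that subspace, i.e. there is a nonzero \(\omega \in H^0(J_{{{\mathbf {Q}}}_p}, \Omega ^1)\) with

$$\begin{aligned} \int _0^{P} \omega = 0 \quad \text{ for } \text{ all }\,P \in J({{\mathbf {Q}}}). \end{aligned}$$

Restricted to \(Y({{\mathbf {Q}}}_p) \subset J({{\mathbf {Q}}}_p)\), the function \(y \mapsto \int _{y_0}^{y} \omega \) is given on each residue disk by a power series that is not identically zero, so it has only finitely many zeros there, and \(Y({{\mathbf {Q}}})\) is contained in this zero set.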

We can reinterpret this cohomologically. Let \(T_p\) be the p-adic Tate module of J, where p is a prime below v. There is a Kummer map \(J(K) \otimes {{\mathbf {Q}}}_p \rightarrow H^1(G_K, T_p)\) and we obtain a mapping

$$\begin{aligned} Y(K) \longrightarrow H^1(G_K, T_p) = \mathrm {Ext}^1(\mathrm {trivial}, T_p), \end{aligned}$$

which, explicitly speaking, sends \(y \in Y(K)\) to the extension between the trivial representation and \(T_p\) realized by cohomology of the punctured curve \(H^1_{\mathrm {et}}(Y-\{y,y_0\})\) for a suitable basepoint \(y_0\). By this discussion, and its local analogue, we get a diagram

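(1.6) [diagram not reproduced: it relates \(Y(K)\) and \(Y(K_v)\) to global and local Galois representations]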

(Here the global and local Galois representations are extensions of \(T_p\) by the trivial representation.) Kim generalizes this picture, replacing \(T_p\) by deeper quotients of \(\pi _1(Y)\). The idea of p-adic period mappings also plays a key role in his work, see [23, p. 360], [24, p. 93], [25, Proposition 1.4]. The key difficulty to be overcome is to obtain control over the size of the space of global Galois representations (e.g. the rank of J(K)).

Our picture is very much the same: we have a map \(y \mapsto \rho _y\) from Y(K) to global Galois representations. In the story just described, \(\rho _y\) arises from the cohomology of an open variety, namely the curve Y punctured at y and an auxiliary point. In the situation of our paper, \(\rho _y\) will arise from the cohomology of a smooth projective variety, namely a covering of Y branched only at y.

What does this gain? Our global Galois representations are now pure and (presumably) semisimple. Therefore our space of global Galois representations should be extremely small. On the other hand, what we lose is that the map \(S_v\) is now no longer obviously injective.

Kim has remarked to one of us (A.V.) that it would be of interest to consider combining these methods in some way, in particular that one might replace the role of the pro-unipotent completion of \(\pi _1(Y)\) in Kim’s analysis by a relative completion.

We expect that our method of proof can be made algorithmic in the same sense as the method of Chabauty. For example, given a curve C as above, one would be able to “compute” a finite subset \(S \subset C(K_v)\) which contains C(K); “compute” means that there is an algorithm that will compute all the elements of S to a specified p-adic precision in a finite time. However, the resulting method is completely impractical, as we now explain.

Firstly, our argument relies on Faltings’s finiteness lemma for Galois representations (Lemma 2.3) to give a finite list of possibilities for \(\rho _y^{\mathrm {ss}}\). We expect that Faltings’s proof can easily be made algorithmic; but there may be very, very many such representations.

Secondly, we would need to explicitly compute the comparisons furnished by p-adic Hodge theory. For a given local Galois representation \(\rho _y^{\mathrm {ss}}\), we need to calculate to some finite precision the filtered \(\phi \)-module associated to it by the crystalline comparison isomorphism of p-adic Hodge theory. We expect that this should be possible, but we are not aware of any known algorithm to achieve this.

To conclude let us compare our method to Faltings’s original proof. That proof gives much more than ours does: it gives the full Shafarevich and Tate conjectures for abelian varieties, as well as semisimplicity of the associated Galois representation. Our proof gives none of these; it gives nothing about the Tate conjecture, and (at least without further effort) it does not give the Shafarevich conjecture but only its restriction to a one-dimensional subfamily of moduli of abelian varieties. Moreover, our proof is also in some sense more elaborate, since it requires the use of tricks and delicate computations to avoid the various complications that we have described. Its only real advantage in the Mordell case seems to be that it is in principle algorithmic in the sense described above. In our view, the real gain of the method is the ability to apply it to families of higher-dimensional varieties. Our results about hypersurfaces are quite modest, but we regard them as a proof of concept for this idea.

1.6 Structure of the paper

Section 2 contains notation and preliminaries.

We suggest the reader start with Sects. 3 and 4 to get a sense of the argument.

Section 3 sets up the general formalism and the structure of the argument. We relate Galois representations to a p-adic period map using crystalline cohomology, and we connect the p-adic period map to a complex period map and monodromy. The section ends with Proposition 3.4, a preliminary form of our main result.

Section 4 gives a first application: a proof of the S-unit theorem, using a variant of the Legendre family. This is much simpler than the proof of Mordell and can be considered a “warm-up.”

Sections 5–8 give the proof of the Mordell conjecture. Section 5 describes the strategy of the proof: we apply a certain refined version of Proposition 3.4, formulated as Proposition 5.3, to a specific family of varieties that we call the Kodaira–Parshin family. Section 6 is the proof of Proposition 5.3. In particular this is where we take advantage of “geometrically disconnected fibers”; the argument also deals with a technical issue relating to semisimplification. In Sect. 7 we introduce the Kodaira–Parshin family and Sect. 8 is purely topological: it computes the monodromy of the Kodaira–Parshin family.

Sections 9–12 study families of varieties of higher dimension. Section 9 introduces a recent transcendence result of Bakker and Tsimerman which is needed to study families over a higher-dimensional base. Section 10 proves the main result, Proposition 10.1, which shows that fibers of good reduction lie in a Zariski-closed subset of the base. The argument however invokes a “general position” result in linear algebra, Proposition 10.6, whose proof takes up Sect. 11. In Sect. 12 we suggest an alternative argument, not used in the rest of the paper, to bound the size of the Frobenius centralizer.

2 Notation and preparatory results

We gather here some notation and some miscellaneous lemmas that we will use in the text. We suggest that the reader refer to this section only as necessary when reading the main text.

The following notation will be fixed throughout the paper.

  • K a number field

  • \({\overline{K}}\) a fixed algebraic closure of K

  • \(G_K = {{\text {Gal}}}({\overline{K}}/K)\) the absolute Galois group

  • S a finite set of places of K containing all the archimedean places

  • \({\mathcal {O}}_S\) the ring of S-integers

  • \({\mathcal {O}} = {\mathcal {O}}_S\) when S is understood

  • p a (rational) prime number such that no place of S lies above p

  • \(K_w\) the completion of K at a prime w of \({\mathcal {O}}\)

  • \({\overline{K}}_w\) a fixed algebraic closure of \(K_w\)

  • \({{\mathbf {F}}}_w\) the residue field at w

  • \(q_w\) the cardinality of \({{\mathbf {F}}}_w\)

  • \({\overline{{{\mathbf {F}}}}}_w\) the residue field of \({\overline{K}}_w\), which is an algebraic closure of \({{\mathbf {F}}}_w\)

  • \({\mathcal {O}}_{(w)}\) the localization of \({\mathcal {O}}\) at w

By a \(G_K\)-set we mean a (discretely topologized) set with a continuous action of \(G_K\).

For a variety X over a field E of characteristic zero, we denote by \(H^*_{\mathrm {dR}}(X/E)\) the de Rham cohomology of \(X \rightarrow {{\text {Spec}}}(E)\). If \(E' \supset E\) is a field extension, we denote by \(H^*_{\mathrm {dR}}(X/E') \) the de Rham cohomology of the base-change \(X_{E'}\), which is identified with \(H^*_{\mathrm {dR}}(X/E) \otimes _{E} E'\).

For any scheme S, a family over S is an (arbitrary) S-scheme \(\pi {:}\,Y \rightarrow S\). A curve over S is a family over S for which \(\pi \) is smooth and proper of relative dimension 1 and each geometric fiber is connected. (Note that we will also make use of “open” curves, for example in Sect. 4, but we will avoid using the word “curve” in that context.)

Let \(E / {{\mathbf {Q}}}_p\) be a finite unramified extension of \({{\mathbf {Q}}}_p\), and \(\sigma \) the unique automorphism of E inducing the pth power map on the residue field. By a \(\phi \)-module (over E) we will mean a pair \((V, \phi )\), with V a finite-dimensional E-vector space and \(\phi {:}\,V \rightarrow V\) a map semilinear over \(\sigma \). A filtered \(\phi \)-module will be a triple \((V, \phi , F^i V)\) such that \((V, \phi )\) is a \(\phi \)-module and \((F^i V)_i\) is a descending filtration on V. We demand that each \(F^i V\) be an E-linear subspace of V but require no compatibility with \(\phi \). Note that the filtered \(\phi \)-modules arising from Galois representations via p-adic Hodge theory satisfy a further condition, admissibility, but we will make no use of it in this paper (see [17, Exposé III, §4.4] and [17, Exposé III, §5.3.3]).

2.1 Linear algebra

Lemma 2.1

Suppose that \(\sigma {:}\,E \rightarrow E\) is a field automorphism of finite order e, with fixed field F. Let V be an E-vector space of dimension d, and \(\phi {:}\,V \rightarrow V\) a \(\sigma \)-semilinear automorphism. Define the centralizer \(\mathrm {Z}(\phi )\) of \(\phi \) in the ring of E-linear endomorphisms of V via

$$\begin{aligned} \mathrm {Z}(\phi ) = \{ f{:}\,V \rightarrow V\,\text{ an }\,E\text{-linear } \text{ map }, \ \ f \phi = \phi f\}; \end{aligned}$$

it is an F-vector space. Then

$$\begin{aligned} \dim _F \mathrm {Z}(\phi ) = \dim _E \mathrm {Z}(\phi ^e), \end{aligned}$$

where \(\phi ^e{:}\,V \rightarrow V\) is now E-linear. In particular, \(\dim _F \mathrm {Z}(\phi ) \leqslant (\dim _E V)^2\).

Proof

Let \({\bar{F}}\) be an algebraic closure of F, and let \(\Sigma \) be the set of F-embeddings \(E \hookrightarrow {\bar{F}}\). Then \({\bar{V}} = V \otimes _{F} {\bar{F}}\) is an \(E \otimes _{F} {\bar{F}} \simeq {\bar{F}}^{\Sigma }\)-module, and splitting by idempotents of \(E \otimes _{F} {\bar{F}}\) we get a decomposition

$$\begin{aligned} {\bar{V}} = \bigoplus _{\tau \in \Sigma } {\bar{V}}^{\tau }, \end{aligned}$$

where \({\bar{V}}^{\tau }\) consists of \({\bar{v}} \in {\bar{V}}\) such that \(e {\bar{v}} = \tau (e) {\bar{v}}\) for all \(e \in E\). (Here the multiplication \(e {\bar{v}}\) is for the E-module structure, and \(\tau (e) {\bar{v}}\) for the \({\bar{F}}\)-module structure, on \({\bar{V}}\).) Moreover, \(\phi \) extends to an \({\bar{F}}\)-linear endomorphism \({\overline{\phi }}\) of \({\bar{V}}\); this endomorphism carries \({\bar{V}}^{\tau }\) to \({\bar{V}}^{\tau \sigma ^{-1}}\).

Fix \(\tau _0 \in \Sigma \); then projection to the \(\tau _0\) factor induces an isomorphism

$$\begin{aligned} \mathrm {Z}({\overline{\phi }}) \simeq \text{ centralizer } \text{ of } {\overline{\phi }}^e \text{ on } {\bar{V}}^{\tau _0}. \end{aligned}$$

Now \(({\bar{V}}^{\tau _0}, {\overline{\phi }}^e)\) is obtained by base extension \(\tau _0{:}\,E \rightarrow {\bar{F}}\) from the E-linear map \(\phi ^e{:}\,V \rightarrow V\); in particular, the dimension of the centralizer on the right is the same as \(\mathrm {Z}(\phi ^e)\), whence the result. \(\square \)
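As a sanity check on the statement, consider the simplest case (an illustration of ours, not part of the proof): \(V = E^d\) with \(\phi \) acting by \(\sigma \) on each coordinate. An E-linear map given by a matrix \(B \in M_d(E)\) commutes with \(\phi \) exactly when \(\sigma (B) = B\), i.e. when B has entries in F, while \(\phi ^e\) is the identity. Hence

$$\begin{aligned} \dim _F \mathrm {Z}(\phi ) = \dim _F M_d(F) = d^2 = \dim _E M_d(E) = \dim _E \mathrm {Z}(\phi ^e), \end{aligned}$$

in agreement with the lemma. The full space of E-linear endomorphisms of V has F-dimension \(e d^2\), so the centralizer occupies at most a fraction 1/e of it; this is the sense in which the bound improves as e grows.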

2.2 Semisimplicity

Lemma 2.2

Let \(H \leqslant G\) be a finite-index inclusion of groups, and let \(\rho {:}\,H \rightarrow {{\text {GL}}}_n(F)\) be a semisimple representation of the group H over the characteristic-zero field F. Then the induction \(\rho ^G = \mathrm {Ind}_H^G \rho \) is also semisimple.

Proof

This follows readily from the fact that a representation \(\rho \) of G is semisimple if and only if its restriction to a finite-index normal subgroup \(G_1 \leqslant G\) is semisimple: take \(G_1\) to be the intersection of conjugates of H.

For “if” one can promote a splitting from \(G_1\) to G by averaging; for “only if” we take an irreducible G-representation V, an irreducible \(G_1\)-subrepresentation \(W \subset V\), and note that G-translates of W must span V, exhibiting \(V|_{G_1}\) as a quotient of a semisimple module. \(\square \)
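The averaging step in the “if” direction can be written out explicitly; the symbols \(\pi _1\) and \(\pi \) below are ours. Let \(W \subset V\) be a G-stable subspace and let \(\pi _1{:}\,V \rightarrow W\) be a \(G_1\)-equivariant projection onto W, which exists because \(V|_{G_1}\) is semisimple. Set

$$\begin{aligned} \pi = \frac{1}{[G{:}\,G_1]} \sum _{g \in G/G_1} g\, \pi _1\, g^{-1}. \end{aligned}$$

Each summand depends only on the coset of g because \(\pi _1\) commutes with \(G_1\). The sum is G-equivariant, maps V into W, and restricts to the identity on W, so \(\ker \pi \) is a G-stable complement to W. Division by \([G{:}\,G_1]\) is the point at which characteristic zero is used.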

2.3 Global Galois representations

Lemma 2.3

(Faltings) Fix integers \(w,d \geqslant 0\), and fix K and S as above. There are, up to conjugation, only finitely many semisimple Galois representations \(\rho {:}\,G_{K} \rightarrow {{\text {GL}}}_d({{\mathbf {Q}}}_p)\) such that

  (a) \(\rho \) is unramified outside S,

  (b) \(\rho \) is pure of weight w, i.e. for every prime \(\wp \notin S\) the characteristic polynomial of Frobenius at \(\wp \) has all roots algebraic, with complex absolute value \(q_{\wp }^{w/2}\), and

  (c) for \(\wp \) as above the characteristic polynomial of Frobenius at \(\wp \) has integer coefficients.

Proof

This is a consequence of Hermite–Minkowski finiteness; see the proof of [14, Satz 5], or [47, V, Proposition 2.7]. \(\square \)
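To see concretely how conditions (b) and (c) constrain \(\rho \), note the following elementary bound (included as an illustration; the proof itself is by reference to [14] and [47]). If the characteristic polynomial of Frobenius at \(\wp \) is \(T^d + a_1 T^{d-1} + \cdots + a_d\), with roots \(\alpha _1, \ldots , \alpha _d\), then \(a_i\) is, up to sign, the ith elementary symmetric polynomial in the \(\alpha _j\), so

$$\begin{aligned} |a_i| \leqslant \left( {\begin{array}{c}d\\ i\end{array}}\right) q_{\wp }^{iw/2}, \qquad a_i \in {{\mathbf {Z}}}. \end{aligned}$$

Thus for each \(\wp \notin S\) there are only finitely many possible characteristic polynomials. For example, with \(d = 2\), \(w = 1\) and \(q_{\wp } = 5\), one gets \(|a_1| \leqslant 2\sqrt{5}\), hence \(|a_1| \leqslant 4\), and \(a_2 = \pm 5\).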

We want to explain how to adapt this proof to a reductive target group. First we recall the notion of “semisimple” with general reductive target, and some allied notions.

Let K be a field of characteristic zero. First of all, recall that if \({\mathbf {G}}\) is a reductive algebraic group over K and \(\rho {:}\,\Gamma \rightarrow {\mathbf {G}}(K)\) is a representation of the group \(\Gamma \), there are natural notions of “irreducible” and “semisimple” adapted to \({\mathbf {G}}\), as described by Serre [43, 3.2]:

the representation \(\rho \) is G-ir, or irreducible relative to G, if the image \(\rho (\Gamma )\) is not contained in a proper parabolic subgroup \(P \leqslant G\) defined over K.

For example, if \({\mathbf {G}}\) is an orthogonal or symplectic group, this assertion amounts to saying that there is no isotropic \(\Gamma \)-invariant subspace. Next

the representation \(\rho \) is G-c.r., or completely reducible relative to G, if for any parabolic subgroup \(P \leqslant G\) defined over K containing the image \(\rho (\Gamma )\), there exists a Levi factor \(L \leqslant P\), defined over K, which also contains this image.

We will also refer to G-c.r. as “semisimple” when the target group is clear. Let \(\rho {:}\,\Gamma \rightarrow {\mathbf {G}}(K)\) be an arbitrary representation. Let P be a K-parabolic subgroup that contains the image of \(\rho \) and which is minimal for this property. Then the projection of \(\rho \) to a Levi factor \(M \subset P\) is independent, up to G-conjugacy, of the choice of M; see [43, Proposition 3.3]. This resulting representation is called the semisimplification of \(\rho \), relative to the ambient group \({\mathbf {G}}\), and will be denoted by \(\rho ^{\mathrm {ss}}\). The Zariski closure of the image of this semisimplification is a reductive group, at least for K of characteristic zero: see [43, Proposition 4.2].

Later on we will use the following observation.

Lemma 2.4

For any \(\gamma \in \Gamma \), \(\rho ^{\mathrm {ss}}(\gamma )\) and \(\rho (\gamma )\) have the same semisimple part up to conjugacy.

Proof

Indeed, let P be as above, and factorize \(P = MU\) into a Levi factor M and U the unipotent radical of P. We must prove that for \(p=mu \in P(K)\), with \(m \in M(K)\) and \(u \in U(K)\), the semisimple parts of p and m are conjugate within P. To prove this take a commuting factorization \(p = p^{ss} p^{u}\), and similarly for m. By functoriality, \(m^{ss}\) is the image of \(p^{ss}\). We are reduced to the case of m and p semisimple:

$$\begin{aligned} \text{ a } \text{ semisimple } \text{ element }\,p=mu \text{ in } P(K)\,\text{ is }\,P(K)\text{-conjugate } \text{ to }\,m, \end{aligned}$$
(2.1)

and clearly it is enough to be able to conjugate p into M.

The element p is contained in some maximal torus T [6, 10.6,11.10], which is contained in a Levi subgroup of P. However, all Levi subgroups are conjugate under U(K) [6, Proposition 20.5]; we may therefore conjugate p into M, as desired. \(\square \)
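As a concrete illustration of (2.1), not taken from the source, one may take \(G = {{\text {GL}}}_2\), P the upper triangular Borel subgroup, and M the diagonal torus. For \(a \ne d\) the element p below is semisimple, and conjugation by the unipotent element n, with \(t = b/(a-d)\), carries it to its Levi projection m:

$$\begin{aligned} p = \begin{pmatrix} a & b \\ 0 & d \end{pmatrix}, \quad n = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \quad n p n^{-1} = \begin{pmatrix} a & b - t(a-d) \\ 0 & d \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} = m. \end{aligned}$$

If instead \(a = d\), then p is semisimple only when \(b = 0\), in which case \(p = m\) already.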

In passing we also record:

Lemma 2.5

Suppose \(P=MU\) is a parabolic subgroup of the reductive K-group G.

Let \(S \leqslant P\) be a K-torus; then S is conjugate under U(K) to its projection to M.

In particular, let \(\chi {:}\,{\mathbf {G}}_m \rightarrow P\) be a character; then \(\chi \) is conjugate, under P(K), to its projection to M.

Proof

We may assume that S is a maximal torus, and then the claim follows from the argument above. \(\square \)

Faltings’ finiteness theorem continues to apply in this context:

Lemma 2.6

Let \({\mathbf {G}} \subset {{\text {GL}}}_n\) be a reductive group, K a number field, S a finite set of places. Consider all representations

$$\begin{aligned} \rho {:}\,G_{K} \longrightarrow {\mathbf {G}}({{\mathbf {Q}}}_{p}) \end{aligned}$$

which, when considered as representations into \({{\text {GL}}}_n({{\mathbf {Q}}}_p)\), satisfy conditions (a), (b), (c) of Lemma 2.3 (i.e. S-unramified, pure of weight w, integral).

Then there are only finitely many possibilities for the \({\mathbf {G}}({{\mathbf {Q}}}_p)\)-conjugacy class of \(\rho ^{\mathrm {ss}}\).

Indeed, there are only finitely many possibilities up to \({\mathbf {G}}({{\mathbf {Q}}}_p)\)-conjugacy for pairs \(({\mathbf {Q}}, \rho {:}\,G_{K} \rightarrow {\mathbf {L}}_Q({{\mathbf {Q}}}_p))\) where \({\mathbf {Q}}\) is a \({{\mathbf {Q}}}_p\)-parabolic subgroup with Levi quotient \({\mathbf {L}}_Q\), the image of \(\rho \) is irreducible in \({\mathbf {L}}_Q\), and \(\rho \) again satisfies the conditions of Lemma 2.3.

Proof

Note first that for such \(\rho \), the \({\mathbf {G}}\)-semisimplification \(\rho ^{\mathrm {ss}}\) is also semisimple considered as a representation with target \({{\text {GL}}}_n\) (since its Zariski closure is reductive, as noted above).

By Lemma 2.3 it is enough to check that, for any fixed such \(\rho _0\), there are only finitely many \({\mathbf {G}}({{\mathbf {Q}}}_p)\)-orbits on the set of \({{\text {GL}}}_n({{\mathbf {Q}}}_p)\)-conjugates of \(\rho _0^{\mathrm {ss}}\) with image in \({\mathbf {G}}\). Let \({\mathbf {L}}\) be the Zariski closure of the image of \(\rho _0^{\mathrm {ss}}\). It is a reductive \({{\mathbf {Q}}}_p\)-subgroup of \({\mathbf {G}}\). Then for \(g \in {{\text {GL}}}_n({{\mathbf {Q}}}_p)\) the image of \(\mathrm {Ad}(g) \rho _0\) belongs to \({\mathbf {G}}\) if, and only if, \(\mathrm {Ad}(g) {\mathbf {L}} \subset {\mathbf {G}}\). In other words, it is enough to verify that the set

$$\begin{aligned} \{g \in \mathrm {GL}_n({{\mathbf {Q}}}_p){:}\,\mathrm {Ad}(g) {\mathbf {L}} \subset {\mathbf {G}} \} \end{aligned}$$

consists of finitely many double cosets under \(({\mathbf {G}}({{\mathbf {Q}}}_p), {\mathbf {L}}({{\mathbf {Q}}}_p))\), or equivalently finitely many \({\mathbf {G}}({{\mathbf {Q}}}_p)\)-orbits.

We may replace \({\mathbf {L}}\) by its connected component, and then it is enough to verify this assertion at the level of Lie algebras, i.e. to prove the same assertion for the set

$$\begin{aligned} \{g \in \mathrm {GL}_n({{\mathbf {Q}}}_p){:}\,\mathrm {Ad}(g) {\mathfrak {l}} \subset {\mathfrak {g}}\} \end{aligned}$$

According to Richardson’s theorem [35, Theorem 7.1] this forms finitely many \({\mathbf {G}}\) orbits over the algebraic closure \(\overline{{{\mathbf {Q}}}_p}\). The result then follows from finiteness of the Galois cohomology \(H^1({{\mathbf {Q}}}_p, {\mathbf {S}})\) for any linear algebraic group \({\mathbf {S}}\) [42, III §4, Theorem 4].

To see the validity of the refinement, note that there are finitely many conjugacy classes of parabolic subgroups \({\mathbf {P}}\) defined over \({{\mathbf {Q}}}_p\), and for each such \({\mathbf {P}}\) there are—by what we just proved, applied to a Levi factor—only finitely many \({\mathbf {P}}({{\mathbf {Q}}}_p)\)-conjugacy classes of (pure of weight w, unramified outside S) irreducible representations \(G_K \rightarrow {\mathbf {L}}_P({{\mathbf {Q}}}_p)\). \(\square \)

2.4 Friendly places

For our later applications it is convenient to have available a class of “friendly” places of a number field K at which the local behavior of homomorphisms \(G_K \rightarrow {{\mathbf {Q}}}_p^*\) is particularly simple. (Actually, in our applications, it would be enough to do this for \(K={{\mathbf {Q}}}\), for which everything is quite straightforward, and to always use Lemma 2.10 with \(K={{\mathbf {Q}}}\). However, it makes our arguments a little easier to write if friendly places are available for a general number field K.)

First we recall some structural theory [41, II.3.3]. Let \({\mathcal {C}} \subset G_{{{\mathbf {Q}}}} = {{\text {Gal}}}({\overline{{{\mathbf {Q}}}}}/{{\mathbf {Q}}})\) be the conjugacy class of complex conjugation, and let \(H^+ = \langle {\mathcal {C}} \rangle \), the normal subgroup generated by \({\mathcal {C}}\); there is a unique nontrivial homomorphism \(H^+ \rightarrow \{ \pm 1\}\) and we let H be its kernel. A subfield \(K \subset {\overline{{{\mathbf {Q}}}}}\) is totally real if and only if it is fixed by \(H^+\). It is CM if and only if it is fixed by H but not \(H^+\).

For an arbitrary number field \(K \subset {\overline{{{\mathbf {Q}}}}}\) let E and \(E^+\) be the subfields of K defined as the fixed fields of \(G_K \cdot H\) and \(G_K \cdot H^+\), respectively (where \(G_K\) is the Galois group of \({\overline{{{\mathbf {Q}}}}}\) over K). Then \(E^+\) is the largest totally real subfield of K, and either \(E^+=E\) is totally real, or E is CM and is the largest CM subfield of K.

Definition 2.7

(Friendly places) Let K be a number field.

  • If K has a CM subfield, then let E be its maximal CM subfield and \(E^+\) the maximal totally real subfield of E. In this case, we say that a place v of K is friendly if it is unramified over \({{\mathbf {Q}}}\), and it lies above a place of \(E^+\) that is inert in E.

  • If K has no CM subfield, any place v of K which is unramified over \({{\mathbf {Q}}}\) will be understood to be friendly.

Clearly, infinitely many friendly places exist; however, if K has a CM subfield, they have Dirichlet density 0.
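For a concrete instance of Definition 2.7, consider the imaginary quadratic field \(K = {{\mathbf {Q}}}(\sqrt{-d})\) with d a squarefree positive integer (an illustrative choice, not from the source). Here \(E = K\) and \(E^+ = {{\mathbf {Q}}}\), so an odd prime p not dividing d lies below a friendly place exactly when p is inert in K, i.e. when \(-d\) is a non-residue modulo p. The following Python sketch lists such primes using Euler's criterion; the function names are hypothetical, and the prime 2 and the primes dividing d are simply skipped.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def friendly_primes(d, bound):
    """Odd primes p < bound, p not dividing d, that are inert in Q(sqrt(-d)).

    Assumes d is a squarefree positive integer, so that Q(sqrt(-d)) is CM
    with maximal totally real subfield Q.
    """
    result = []
    for p in range(3, bound, 2):
        if not is_prime(p) or d % p == 0:
            continue
        # Euler's criterion: -d is a non-residue mod p
        # exactly when (-d)^((p-1)/2) is -1 mod p.
        if pow(-d % p, (p - 1) // 2, p) == p - 1:
            result.append(p)
    return result


print(friendly_primes(1, 30))  # for d = 1: the primes congruent to 3 mod 4
```

For \(d = 1\), i.e. \(K = {{\mathbf {Q}}}(i)\), this recovers the primes congruent to 3 modulo 4.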

Consider, now, a continuous character \(\eta {:}\,{{\text {Gal}}}({\overline{K}}/K) \longrightarrow {{\mathbf {Q}}}_p^*\), ramified at only finitely many places; by class field theory it corresponds to a homomorphism \({\mathbf {A}}_K^*/K^* \rightarrow {{\mathbf {Q}}}_p^*\). In particular, its restriction to places above p gives rise to a homomorphism \(\eta _p{:}\,(K \otimes {{\mathbf {Q}}}_p)^* \longrightarrow {{\mathbf {Q}}}_p^*\). As usual, we say this is locally algebraic if it agrees, in a neighbourhood of the identity, with the \({{\mathbf {Q}}}_p\)-points of an algebraic homomorphism \(\mathrm {Res}_{(K \otimes {{\mathbf {Q}}}_p)/{{\mathbf {Q}}}_p} {\mathbf {G}}_m \longrightarrow {\mathbf {G}}_m\) of \({{\mathbf {Q}}}_p\)-algebraic groups, cf. [41, Chapter III]. This condition is implied by being Hodge–Tate at primes above p, by a theorem of Tate [41, Chapter III, Appendix]. Moreover, since \(\eta \) is finitely ramified, it follows that \(\eta _p\) is trivial on a finite-index subgroup of the units \({\mathcal {O}}_K^*\), embedded into \((K \otimes {{\mathbf {Q}}}_p)^*\).

For such \(\eta \), we say that \(\eta \) is pure of weight w when it satisfies the condition explained in Lemma 2.3.

Lemma 2.8

Let v be any friendly place of K, lying above the prime p of \({{\mathbf {Q}}}\). For any continuous character \(\eta {:}\,{{\text {Gal}}}({\bar{K}}/K) \longrightarrow {{\mathbf {Q}}}_p^*\), ramified at only finitely many places, pure of weight w, and locally algebraic at each prime above p, one has

$$\begin{aligned} \eta ^2|_{K_{v}^*} = \chi \cdot \mathrm {Norm}_{K_{v}/{{\mathbf {Q}}}_p}^{w}, \end{aligned}$$

where \(\chi \) has finite order. In particular, w is even and the Hodge–Tate weight of \(\eta \) at the place v equals w/2.

In other words, the restriction of globally pure characters to friendly places is of a standard form. Note that if the coefficients are enlarged from \({{\mathbf {Q}}}_p^*\) to \({{\mathbf {Q}}}_{p^2}^*\), the statement above is no longer true; an example is given by the idele class character associated to a CM elliptic curve.

The proof of this result is routine. The key point is due to Artin and Weil: an algebraic Hecke character factors through the norm map to the maximal CM subfield.

Proof

Being locally algebraic, \(\eta \) gives rise to an algebraic character of \(\mathrm {Res}_{K/{{\mathbf {Q}}}} {\mathbf {G}}_m\), which is trivial on a finite-index subgroup of \({\mathcal {O}}^*\). Said differently, we obtain a \({{\mathbf {Q}}}_p\)-rational character \( {\mathbf {S}} \longrightarrow {\mathbf {G}}_m\) of the Serre torus \({\mathbf {S}}\); we will denote this also by \(\eta \). (Note that \(\eta \) is forced to be \({{\mathbf {Q}}}_p\)-rational since it carries \({\mathbf {S}}({{\mathbf {Q}}}_p)\) into \({{\mathbf {Q}}}_p^*\)). Here \({\mathbf {S}}\) is the quotient of \(\mathrm {Res}_{K/{{\mathbf {Q}}}} {\mathbf {G}}_m\) by the Zariski closure of (a sufficiently deep finite-index subgroup of) the units. Because of the purity assertion, if \(\lambda \in K^*\) is a unit at all ramified primes for \(\eta \), then \(\eta (\lambda )\) is an algebraic number all of whose conjugates have absolute value \(\mathrm {N}_{K/{{\mathbf {Q}}}}(\lambda )^{w/2}\).

The structure of this torus was in effect computed by Weil [45], and in detail by Serre: If K admits no CM subfield, then the norm map \({\mathbf {S}} \rightarrow {\mathbf {G}}_m\) is in fact an isogeny. So \(\eta \) is (up to finite order) the norm raised to the power w/2. The result follows.

Thus we suppose that K has a CM subfield; now let E be the largest CM subfield of K, and let \(E^+\) be the totally real subfield of E. Then the norm map \( {\mathbf {S}} \rightarrow {\mathbf {S}}_{E}\) is an isogeny; in other words, a suitable power \(\eta ^k\) factors through the norm from K to E. Therefore it is enough to prove the Lemma for \(K=E\), replacing v by the place of E below it. In particular, by definition, v lies above an inert prime of \(E/E^+\).

Now there is a norm map \({\mathbf {S}}_E \rightarrow {\mathbf {G}}_m\). Write \(x \mapsto {\bar{x}}\) for the complex conjugation on E. The map \(x \mapsto x/{\bar{x}}\), from \(E^*\) to \(E^*\), is trivial on a finite-index subgroup of the units, and its image consists entirely of elements whose norm (to \(E^+\)) equals 1. Indeed for any \({{\mathbf {Q}}}\)-algebra R the rule \(x \mapsto x/{\bar{x}}\) defines a map \((E \otimes R)^* \rightarrow (E \otimes R)^*\), corresponding to a unique map of \({{\mathbf {Q}}}\)-algebraic groups

$$\begin{aligned} \theta {:}\,{\mathbf {S}}_E \rightarrow (\mathrm {Res}_{E/{{\mathbf {Q}}}} {\mathbf {G}}_m)^1 \end{aligned}$$

where the superscript 1 denotes the kernel of the norm to \(E^+\). Together with the norm map this gives an isogeny \( {\mathbf {S}}_E \longrightarrow {\mathbf {G}}_m \times (\mathrm {Res}_{E/{{\mathbf {Q}}}} {\mathbf {G}}_m)^1\). Raising the character \(\eta \) to a suitable power we can suppose that it factors through the right-hand side; twisting it by a power of the cyclotomic character, we can arrange that it is trivial on the \({\mathbf {G}}_m\) factor.

In other words, we are reduced to checking the case where \(\eta \) factors through \(\theta \). Now the weights of \(x\mapsto \eta (x)\) and \(x \mapsto \eta ({\bar{x}})\) coincide, but their product is trivial; so the weight of \(\eta \) is zero. Also \(\eta \) is trivial on \(E_v^*\): consider

$$\begin{aligned} E_{v}^* \subset (E \otimes {{\mathbf {Q}}}_p)^* \rightarrow {\mathbf {S}}({{\mathbf {Q}}}_p) {\mathop {\rightarrow }\limits ^{\theta }} (E \otimes {{\mathbf {Q}}}_p)^{1} = \left( \prod _{w|p} E_w^*\right) ^{1}. \end{aligned}$$

The image of \(E_{v}^*\) is contained inside \( \{ y \in E_{v}^*{:}\,y {\bar{y}} = 1\}\); this is contained in a \({{\mathbf {Q}}}_p\)-anisotropic subtorus of \((\mathrm {Res}_{E/{{\mathbf {Q}}}} {\mathbf {G}}_m)^1 \). Therefore, any \({{\mathbf {Q}}}_p\)-rational character of \((\mathrm {Res}_{E/{{\mathbf {Q}}}} {\mathbf {G}}_m)^1\) is trivial upon pullback to \(E_v^*\). This is exactly what we wanted to prove (since, as we just saw, once \(\eta \) factors through \(\theta \) its weight is zero). \(\square \)

2.5 Reducibility of global Galois representations

We now give some lemmas which limit the reducibility of a global pure Galois representation. The mechanism is as follows: purity passes to subrepresentations, and then leads to restrictions on the sub-Hodge structure.

For a decreasing filtration \(F^{\bullet } V\) on a vector space V (with \(F^0 V =V\)) we define the weight of the filtration to be

$$\begin{aligned} \mathrm {weight}_F(V) = \frac{ \sum _{p \geqslant 0} p \dim \mathrm {gr}^p(V) }{\dim V}, \end{aligned}$$
(2.2)

where \(\mathrm {gr}^p(V) = F^p(V)/F^{p+1}(V)\) is the associated graded. (Footnote 2) For the other p-adic Hodge theory terms that appear in the following result, see [7, §6] or [17, Expose III].
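As a quick numerical illustration of (2.2), the following Python sketch computes the weight of a filtration from the dimensions of its graded pieces. The function name is hypothetical. The two sample inputs use the standard Hodge numbers of \(H^1\) of a curve of genus 3 and of \(H^2\) of a K3 surface; in both cases the result is half the cohomological degree, consistent with the value w/2 appearing in Lemma 2.9.

```python
from fractions import Fraction


def filtration_weight(graded_dims):
    """Weight of a decreasing filtration, as in (2.2).

    graded_dims[p] is dim gr^p(V) for p = 0, 1, 2, ...
    """
    total = sum(graded_dims)
    if total == 0:
        raise ValueError("V must be nonzero")
    return Fraction(sum(p * d for p, d in enumerate(graded_dims)), total)


# H^1 of a genus-3 curve: dim gr^0 = dim gr^1 = 3.
print(filtration_weight([3, 3]))      # 1/2
# H^2 of a K3 surface: graded dimensions 1, 20, 1.
print(filtration_weight([1, 20, 1]))  # 1
```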

Lemma 2.9

Let K be a number field and v a friendly place. Let V be a Galois representation of \(G_K\) on a \({{\mathbf {Q}}}_p\)-vector space which is crystalline at all primes above p, and pure of weight w.

Let \(V_{\mathrm {dR}} = (V \otimes _{{{\mathbf {Q}}}_p} B_{\mathrm {cris}})^{G_{K_v}}\) be the filtered \(K_v\)-vector space (Footnote 3) that is associated to \(\rho |_{K_v}\) by the p-adic Hodge theory functor \({\underline{D}}_{\mathrm {cris}}\) of [17, Expose III].

Then the weight of the Hodge filtration on \(V_{\mathrm {dR}}\) equals w/2.

Proof

Apply Lemma 2.8 to \(\det (V)\). \(\square \)

Lemma 2.10

Let K be a number field, and \(L \supset K\) a finite extension. Let \(\rho {:}\,G_L \rightarrow {{\text {GL}}}_n({{\mathbf {Q}}}_p)\) be a representation of \(G_L\) that is crystalline at all primes above p, and pure of weight w; let \(a_u(\rho )\) be the weight of the associated Hodge filtration at each such prime u. Then, for any friendly prime v of K above p,

$$\begin{aligned} \sum _{u | v} [L_u{:}\,K_v] a_u(\rho ) = [L{:}\,K] \frac{w}{2}. \end{aligned}$$

Proof

We apply Lemma 2.9 to \(\mathrm {Ind}_{G_L}^{G_K} \rho \) and to the place v. Applying the functor of p-adic Hodge theory to its restriction to \(K_v\), we obtain

$$\begin{aligned} (\mathrm {Ind}^K_L \rho \otimes _{{{\mathbf {Q}}}_p} B_{\mathrm {dR}})^{G_{K_v}} \simeq \bigoplus _{u|v} ( \rho \otimes _{{{\mathbf {Q}}}_p} B_{\mathrm {dR}})^{G_{L_u}} \end{aligned}$$

(considered now as a filtered \(K_v\)-vector space), and its weight is therefore \(\frac{\sum _{u | v} [L_u{:}\,K_v] a_u(\rho )}{[L{:}\,K]}\). \(\square \)
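To illustrate the identity in the simplest case (an example not in the source), suppose \([L{:}\,K] = 2\). If the friendly prime v splits in L into two primes \(u_1, u_2\), then both local degrees equal 1 and the Lemma reads

$$\begin{aligned} a_{u_1}(\rho ) + a_{u_2}(\rho ) = w, \end{aligned}$$

so only the sum of the two local weights is constrained. If instead there is a single prime u of L above v, then \([L_u{:}\,K_v] = 2\) and the Lemma gives \(2 a_u(\rho ) = w\), i.e. \(a_u(\rho ) = w/2\).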

2.6 The affine group \(\mathrm {Aff}(q)\)

Let \(q \geqslant 3\) be a prime number and let \(\mathrm {Sym}({\mathbf {F}}_q)\) be the symmetric group on the q elements of \({\mathbf {F}}_q\). Let \(\mathrm {Aff}(q) \subseteq \mathrm {Sym}({\mathbf {F}}_q)\) be the subgroup consisting of permutations of \({{\mathbf {F}}}_q\) of the form \( x \mapsto a x+b\) where \(a \in {{\mathbf {F}}}_q^*\) and \(b \in {{\mathbf {F}}}_q\). Thus \(\mathrm {Aff}(q) \cong ({{\mathbf {F}}}_q)^+ \rtimes ({{\mathbf {F}}}_q)^*\) (Footnote 4); this group has important applications in the theory of qualifying examinations. We shall make extensive use of it as a Galois group for certain auxiliary coverings of curves.

Lemma 2.11

For any \(s \geqslant 1\) consider the map \(f{:}\,\mathrm {Aff}(q)^{2s} \longrightarrow {{\mathbf {F}}}_q^+\) given by

$$\begin{aligned} f{:}\,{\mathbf {g}} = (g_1, g_1', \ldots , g_s, g_s') \mapsto [g_1, g_1'] \cdot [g_2, g_2'] \cdot \cdots \cdot [g_s, g_s'] \end{aligned}$$

(here \([x, y]\) is the commutator \(xyx^{-1}y^{-1}\)). The image of the map

$$\begin{aligned} \{ {\mathbf {g}} \in \mathrm {Aff}(q)^{2s}{:}\,f({\mathbf {g}}) \ne 0, {\mathbf {g}} \text{ generates } \mathrm {Aff}(q) \} \rightarrow \left[ {{\mathbf {F}}}_q^* \right] ^{2s} \end{aligned}$$
(2.3)

(sending each \(g_i\) to its image in the abelian quotient \({{\mathbf {F}}}_q^*)\) consists precisely of those (2s)-tuples in \({{\mathbf {F}}}_q^*\) whose entries generate \({{\mathbf {F}}}_q^*\). The fiber above any point in the image has the same size.

Proof

Note that, for such a fiber to be nonempty, the element \({\mathbf {y}} = (y_1, y_1', \dots , y_s, y_s')\) of the target must have the property that the \(y_i\) and \(y_i'\) generate \({{\mathbf {F}}}_q^*\). In this case, any preimage \({\mathbf {g}} \in \mathrm {Aff}(q)^{2s}\) with the property that \(f({\mathbf {g}}) \ne 0\) necessarily generates \(\mathrm {Aff}(q)\). The fiber of \(\mathrm {Aff}(q)^{2s}\) above \({\mathbf {y}}\) is (in obvious coordinates) an affine space over \({{\mathbf {F}}}_q\), and the map f is a nontrivial affine-linear map; each fiber thus has size \(q^{2s-1}(q-1)\). \(\square \)
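The count in Lemma 2.11 can be checked by brute force for small parameters. The following Python sketch, which is only a sanity check and not part of the argument, represents \(x \mapsto ax+b\) as a pair (a, b), enumerates \(\mathrm {Aff}(q)^{2s}\), and tabulates the fibers of (2.3); all function names are hypothetical.

```python
from itertools import product


def fiber_sizes(q, s):
    """Fibers of the map (2.3) for a prime q, found by enumeration.

    Returns a dict sending each tuple of (F_q^*)^{2s} in the image to the
    size of its fiber.
    """
    identity = (1, 0)
    elems = [(a, b) for a in range(1, q) for b in range(q)]  # x -> a*x + b

    def mul(g, h):
        # Composition g after h: x -> g(h(x)).
        return (g[0] * h[0] % q, (g[0] * h[1] + g[1]) % q)

    def inv(g):
        a_inv = pow(g[0], -1, q)  # modular inverse (Python 3.8+)
        return (a_inv, -a_inv * g[1] % q)

    def comm(g, h):
        return mul(mul(g, h), mul(inv(g), inv(h)))

    def generated(gens):
        # In a finite group, closing under right multiplication by the
        # generators already yields the subgroup they generate.
        seen, stack = {identity}, [identity]
        while stack:
            x = stack.pop()
            for g in gens:
                y = mul(x, g)
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    sizes = {}
    for tup in product(elems, repeat=2 * s):
        c = identity
        for i in range(s):
            c = mul(c, comm(tup[2 * i], tup[2 * i + 1]))
        # c is the translation x -> x + f(tup), so c[1] equals f(tup).
        if c[1] != 0 and len(generated(tup)) == q * (q - 1):
            key = tuple(g[0] for g in tup)
            sizes[key] = sizes.get(key, 0) + 1
    return sizes


def generates_units(entries, q):
    """Whether the given elements of F_q^* generate the whole group."""
    seen, stack = {1}, [1]
    while stack:
        x = stack.pop()
        for a in entries:
            y = x * a % q
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == q - 1


def check_lemma(q, s):
    """Compare the enumerated fibers with the statement of Lemma 2.11."""
    sizes = fiber_sizes(q, s)
    expected = {t for t in product(range(1, q), repeat=2 * s)
                if generates_units(t, q)}
    return set(sizes) == expected and all(
        n == q ** (2 * s - 1) * (q - 1) for n in sizes.values())


print(check_lemma(3, 1), check_lemma(5, 1))
```

The enumeration has \((q(q-1))^{2s}\) cases, so this is practical only for very small q and s.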

2.7 Symplectic groups

Let K be a field of characteristic zero. As usual, if V is a symplectic space over K, with nondegenerate alternating bilinear form \(\langle -, - \rangle \), we write \(\mathrm {Sp}(V)\) for the algebraic group of automorphisms of V preserving the bilinear form.

The following statement is an algebraic version of Goursat’s lemma (cf. [34, Lemma 5.2.1]). One uses the fact that the Lie algebra \(\mathfrak {sp}_V\) of \(\mathrm {Sp}(V)\) is simple, and that all the automorphisms of \(\mathfrak {sp}_V\) are inner.

Lemma 2.12

Suppose G is an algebraic subgroup of \(\mathrm {Sp}(V)^N\), satisfying the following conditions.

  • For \(1 \leqslant i \leqslant N\), the projection \(\pi _i{:}\,G \rightarrow \mathrm {Sp}(V)\) onto the ith factor is surjective.

  • For \(1 \leqslant i, j \leqslant N\) with \(i \ne j\), there exists \(g \in G\) such that \(\pi _i(g)\) and \(\pi _j(g)\) are unipotent with fixed spaces of different dimensions.

Then G is all of \(\mathrm {Sp}(V)^N\).

Any unipotent element of \(\mathrm {Sp}(V)\) whose fixed space has codimension 1 is of the form

$$\begin{aligned} T_v^r{:}\,x \mapsto x + r \langle v, x \rangle v \end{aligned}$$
(2.4)

for some \(v \in V, r \in K\). We call \(T_v^r\) a transvection with center v, and write \(T_v\) for \(T_v^1\).
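The defining formula (2.4) is easy to experiment with. The following Python sketch works on \({{\mathbf {Q}}}^2\) with the standard alternating form \(\langle x, y \rangle = x_1 y_2 - x_2 y_1\) (an illustrative choice; the names are hypothetical) and checks on sample vectors that \(T_v^r\) preserves the form, that \(T_v^r T_v^s = T_v^{r+s}\), and that v itself is fixed.

```python
from fractions import Fraction as Q


def form(x, y):
    """Standard alternating form on Q^2."""
    return x[0] * y[1] - x[1] * y[0]


def transvection(v, r):
    """Return the map T_v^r: x -> x + r*<v, x>*v of (2.4)."""
    def T(x):
        c = r * form(v, x)
        return (x[0] + c * v[0], x[1] + c * v[1])
    return T


v = (Q(2), Q(-1))
x, y = (Q(1), Q(3)), (Q(5, 2), Q(-4))
T_r = transvection(v, Q(3))
T_s = transvection(v, Q(1, 2))
T_sum = transvection(v, Q(7, 2))  # exponent 3 + 1/2

print(form(T_r(x), T_r(y)) == form(x, y))  # the form is preserved
print(T_r(T_s(x)) == T_sum(x))             # exponents add
print(T_r(v) == v)                         # v is fixed
```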

Lemma 2.13

Let V be a symplectic space over \({\mathbb {Q}}\). Suppose \(v_1, v_2 \in V\) are linearly independent and satisfy

$$\begin{aligned} \langle v_1, v_2 \rangle \ne 0. \end{aligned}$$

The Zariski closure of the subgroup generated by \(T_{v_1}, T_{v_2}\) also contains \(T_v\) for every \(v \in {\text {Span}} ( v_1, v_2 )\).

Proof

The subgroup in question preserves the splitting \(V = \langle v_1, v_2 \rangle \oplus \langle v_1, v_2 \rangle ^{\perp }\), and so we reduce to the case that V is 2-dimensional. The statement then amounts to the fact that \({{\text {SL}}}(2)\) is generated, as an algebraic group, by upper and lower triangular unipotent matrices. \(\square \)
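Concretely, under the assumption (made only for illustration) that after the reduction \(V = {\mathbb {Q}}^2\) with \(\langle x, y \rangle = x_1 y_2 - x_2 y_1\), \(v_1 = e_1\) and \(v_2 = e_2\), formula (2.4) gives

$$\begin{aligned} T_{e_1}^r = \begin{pmatrix} 1 & r \\ 0 & 1 \end{pmatrix}, \qquad T_{e_2}^r = \begin{pmatrix} 1 & 0 \\ -r & 1 \end{pmatrix}. \end{aligned}$$

The Zariski closure of the cyclic group generated by \(T_{e_i}\) is the full one-parameter group \(\{T_{e_i}^r\}\), since an infinite subgroup of \({\mathbf {G}}_a\) is Zariski dense in characteristic zero.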

Lemma 2.14

Let V be a symplectic space over \({\mathbb {Q}}\). Let S be a set of vectors \(v \in V\). Form the graph whose vertex set is S, with an edge between \(v_1\) and \(v_2\) if and only if \(\langle v_1, v_2 \rangle \ne 0\). If this graph is connected, then the Zariski closure of the group generated by the transvections \(T_v\), for \(v \in S\), contains \(T_w\) for any w in the span of S.

Proof

We can assume S is finite, and then use induction on \(\left| S \right| \), using Lemma 2.13 for the inductive step.

In detail: suppose \(S = S_0 \cup \{v\}\), with the graph on \(S_0\) connected. By inductive hypothesis we obtain all transvections centered at vectors in \(W := \mathrm {span}(S_0)\). It is enough to verify that the Zariski closure in question contains the transvection \(T_x\) for each vector x of the form \(w +v \ (w \in W)\); this is so when \(\langle w, v \rangle \ne 0\) by the prior Lemma. The condition \(\langle w, v \rangle \ne 0\) defines a Zariski-dense subset of W, and so the remaining transvections \(T_x\), those with \(\langle w, v \rangle = 0\), also lie in the Zariski closure. \(\square \)

3 Fibers with good reduction in a family

In this section we give a general criterion (Proposition 3.4) which controls, in a given family of smooth proper varieties, the collection of fibers that have good reduction outside a fixed set of primes. The Proposition simply translates (using p-adic Hodge theory) the finiteness statement of Lemma 2.3 into a restriction on the image of the period map.

3.1 Basic notation

We use notation \(K, {\mathcal {O}}, {\mathcal {O}}_{(w)}, S, G_K, {{\mathbf {F}}}_w\) as in Sect. 2.

Let Y be a smooth K-variety, and \(\pi {:}\,X \rightarrow Y\) a proper smooth morphism.

Suppose that this admits a good model over \({\mathcal {O}}\), i.e. it extends to a proper smooth morphism \(\pi {:}\,{\mathcal {X}} \rightarrow {\mathcal {Y}}\) of smooth \({\mathcal {O}}\)-schemes. Suppose, moreover, that all the cohomology sheaves \({\mathbf {R}}^q \pi _* \Omega ^p_{{\mathcal {X}}/{\mathcal {Y}}}\) are sheaves of locally free \({\mathcal {O}}_Y\)-modules, and that the same is true of the relative de Rham cohomology \({\mathscr {H}}^q = {\mathbf {R}}^q \pi _* \Omega ^{\bullet }_{{\mathcal {X}}/{\mathcal {Y}}}\). There is no harm in these assumptions, because the sheaves in question are coherent \({\mathcal {O}}_Y\)-modules which are free over the generic point of \({\mathcal {O}}\) [10, Theorem 5.5]; so the assumptions can always be achieved by possibly enlarging the set S of primes.

The generic fiber of \({\mathscr {H}}^q\) is equipped with the Gauss–Manin connection (by [22, Theorem 1]) and, again by enlarging S if necessary, we may suppose that this extends to a morphism

$$\begin{aligned} {\mathscr {H}}^q \rightarrow {\mathscr {H}}^q \otimes \Omega ^1_{{\mathcal {Y}}/{\mathcal {O}}}. \end{aligned}$$
(3.1)

For any \(y \in Y(K)\), we shall denote by \(X_y = \pi ^{-1}(y)\) the fiber of \(\pi \) above y; it is a smooth proper variety over K. Our goal in this section is to bound \({\mathcal {Y}}({\mathcal {O}})\). We will do this by studying the p-adic properties of the Galois representation attached to \(X_y\), for \(y \in {\mathcal {Y}}({\mathcal {O}}) \hookrightarrow Y(K)\). Fixing a degree \(q \geqslant 0\), we denote by \(\rho _y\) the representation of the Galois group \(G_K\) on the étale cohomology group of \((X_y)_{{\bar{K}}}\):

$$\begin{aligned} \rho _y{:}\,G_K \rightarrow {{\text {Aut}}}\ H^q_{\mathrm {et}}(X_y \times _{K} {\bar{K}}, {{\mathbf {Q}}}_p). \end{aligned}$$
(3.2)

Fix an archimedean place \(\iota {:}\,K \hookrightarrow {{\mathbf {C}}}\), and fix a finite place \(v{:}\,K \hookrightarrow K_v\) satisfying:

  • if p is the rational prime below v, then \(p > 2\), and

  • \(K_v\) is unramified over \({{\mathbf {Q}}}_p\), and

  • no prime above p lies in S.

Fix \(y_0 \in {\mathcal {Y}}({\mathcal {O}})\). In what follows, we will analyze the set

$$\begin{aligned} U := \{y \in {\mathcal {Y}}({\mathcal {O}}){:}\,y \equiv y_0 \text{ modulo } v \} \end{aligned}$$
(3.3)

and give criteria for the finiteness of U in terms of the associated period map. Clearly if U is finite for each choice of \(y_0\), then \({\mathcal {Y}}({\mathcal {O}})\) is finite too.

Finally, we put

$$\begin{aligned} X_0 = \pi ^{-1}(y_0) \end{aligned}$$

to be the fiber above \(y_0\).

3.2 The cohomology at the basepoint \(y_0\)

For any K-variety Z, we shall denote by \(Z_{{{\mathbf {C}}}}\) its base change to \({{\mathbf {C}}}\) via \(\iota \), and by \(Z_{K_v}\) its base change to \(K_v\) via v.

Let

$$\begin{aligned} V = H^q_{\mathrm {dR}}(X_0/K). \end{aligned}$$
(3.4)

Let \(d = \dim _K V\). We will also denote by \(V_v\) and \(V_{{\mathbf {C}}}\) the \(K_v\)- and \({{\mathbf {C}}}\)-vector spaces obtained by \(\otimes _{K} K_v\) or \(\otimes _{(K,\iota )} {{\mathbf {C}}}\). Then \(V_{{\mathbf {C}}}\) is naturally identified with the de Rham cohomology of the variety \(X_{0,{{\mathbf {C}}}}\), which is also (by the comparison theorem) identified with the singular cohomology of \(X_{0,{{\mathbf {C}}}}\) with complex coefficients:

$$\begin{aligned} V_{{{\mathbf {C}}}} \simeq H^q_{\mathrm {sing}}(X_{0,{{\mathbf {C}}}}, {{\mathbf {C}}}). \end{aligned}$$

In particular, monodromy defines a representation \(\mu {:}\,\pi _1(Y_{{{\mathbf {C}}}}({{\mathbf {C}}}), y_0) \longrightarrow {{\text {GL}}}(V_{{\mathbf {C}}}),\) whose Zariski closure we denote by \(\Gamma \):

$$\begin{aligned} \Gamma = \text{ Zariski } \text{ closure } \text{ of }\,\mathrm {image}(\mu ), \end{aligned}$$
(3.5)

an algebraic subgroup of \({{\text {GL}}}(V_{{\mathbf {C}}})\). Note that both \(V_{{\mathbf {C}}}\) and \(\Gamma \) depend on the choice of archimedean place \(\iota \), although this dependence is suppressed in our notation.

3.3 The Gauss–Manin connection

The connection (3.1) allows us to identify the cohomology of nearby fibers. This is true both for the \(K_v\) and \({{\mathbf {C}}}\) topologies. However, as we now discuss, both identifications can be described as the evaluation of a single power series with K coefficients, which is convergent both for the \(K_v\) and \({{\mathbf {C}}}\) topology.

Specifically, if we fix a local basis \(\{v_1, \dots , v_r\}\) for \({\mathscr {H}}^q\) in a neighborhood of some point of the scheme \({\mathcal {Y}}\), and write \(\nabla v_i = \sum _j A_{ij} v_j\), where \(A_{ij} \) are sections of \( \Omega ^1_{{\mathcal {Y}}}\), then a local section \(\sum f_i v_i\) is flat exactly when it solves the equation

$$\begin{aligned} d(f_i) = - \sum _{j} A_{ji} f_j. \end{aligned}$$
(3.6)

In particular, if \(y_0 \in {\mathcal {Y}}({\mathcal {O}})\) and the place v is as before, let \(\overline{y_0} \in {\mathcal {Y}}({{\mathbf {F}}}_v)\) be the reduction, and choose a system of parameters \(p, z_1, \dots , z_m \in {\mathcal {O}}_{{\mathcal {Y}}, \overline{y_0}}\) for the local ring of \({\mathcal {Y}}\) at \(\overline{y_0}\); we may do this so that \((z_1, \dots , z_m)\) generate the kernel of the morphism \({\mathcal {O}}_{{\mathcal {Y}},\overline{y_0}} \rightarrow {\mathcal {O}}_{(v)}\) corresponding to \(y_0\). The completed local ring \(\widehat{{\mathcal {O}}}_{{\mathcal {Y}}, \overline{y_0}}\) at \(\overline{y_0}\) is therefore identified with \({\mathcal {O}}_v[[z_1, \dots , z_m]]\), and the image of \({\mathcal {O}}_{{\mathcal {Y}},\overline{y_0}}\) in it is contained in \({\mathcal {O}}_{(v)}[[z_1, \dots , z_m]]\).

Fix a basis \(\{{\bar{v}}_1, \dots , {\bar{v}}_r\}\) for \({\mathscr {H}}^q\) at \(\overline{y_0}\), which we assume to be compatible with the Hodge filtration, i.e. each step of the Hodge filtration \(F^i {\mathscr {H}}^q\) at \(\overline{y_0}\) is spanned by a subset of \(\{{\bar{v}}_i\}\). Then by lifting we obtain a similar basis \(\{v_1, \dots , v_r\}\) for \({\mathscr {H}}^q\) over the local ring \({\mathcal {O}}_{{\mathcal {Y}}, \overline{y_0}}\) of \({\mathcal {Y}}\) at \(\overline{y_0}\). With respect to such a basis \(v_i\), the coefficients \(A_{ij}\) of (3.6) are of the form \(A_{ij} =\sum _{k=1}^{m} a_{ij,k} dz_k\), where \(a_{ij,k} \in {\mathcal {O}}_{{\mathcal {Y}}, \overline{y_0}}\). In particular, the coefficients of \(a_{ij,k}\), considered as formal power series in the \(z_i\), lie in \({\mathcal {O}}_{(v)}\).

We may write down a formal solution to (3.6), where the \(f_i\) are given by formal power series in \(K[[z_1, \dots , z_m]]\). By direct computation we see that these are v-adically absolutely convergent for \(|z_i|_v < |p|_v^{1/(p-1)}\) (where p is the residue characteristic of \({\mathcal {O}}_v\)) and \(\iota \)-adically absolutely convergent for sufficiently small \(|z_i|_{{{\mathbf {C}}}}\).
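The radius \(|p|_v^{1/(p-1)}\) is the familiar radius of convergence of the p-adic exponential, which is what one meets in the simplest rank-one case: if \(\nabla v_1 = a\, v_1\, dz\) with a constant \(a \in {\mathcal {O}}_{(v)}\) (a toy case chosen for illustration, not the general computation), then (3.6) reads \(df = -a f\, dz\), with solution \(f = \sum _n (-a z)^n / n!\). The following Python sketch, with hypothetical function names, computes the p-adic valuation of \(n!\) by Legendre's formula and tabulates the valuation of \(p^n/n!\). For \(p > 2\) these valuations grow without bound, while for \(p = 2\) the valuation equals 1 whenever n is a power of 2, so the series does not converge at \(z = 2\).

```python
def vp_factorial(n, p):
    """p-adic valuation of n!, by Legendre's formula."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def term_valuation(n, p):
    """p-adic valuation of p**n / n!, the n-th term of exp(p)."""
    return n - vp_factorial(n, p)


for p in (2, 3, 5):
    print(p, [term_valuation(n, p) for n in (1, 2, 4, 8, 16, 32, 64)])
```

This is one way to see the role of the hypotheses that \(p > 2\) and that v is unramified over p: a parameter \(z_i\) that vanishes modulo v then satisfies \(|z_i|_v \leqslant |p|_v < |p|_v^{1/(p-1)}\).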

By assumption, we have \(p>2\), and v is unramified above p. Thus we obtain an identification

$$\begin{aligned} \mathrm {GM}{:}\,H^q_{\mathrm {dR}} ({\mathcal {X}}_{y_0} / K_v) {\mathop {\rightarrow }\limits ^{\sim }} H^q_{\mathrm {dR}}({\mathcal {X}}_y / K_v) \end{aligned}$$
(3.7)

whenever \(y \in {\mathcal {Y}}({\mathcal {O}}_v)\) satisfies \(y \equiv y_0\) modulo v, and

$$\begin{aligned} \mathrm {GM}{:}\,H^q_{\mathrm {dR}}(X_{y_0,{{\mathbf {C}}}} / {{\mathbf {C}}}) {\mathop {\rightarrow }\limits ^{\sim }} H^q_{\mathrm {dR}}(X_{y, {{\mathbf {C}}}} / {{\mathbf {C}}}), \end{aligned}$$
(3.8)

when \(y \in Y_{{{\mathbf {C}}}}({{\mathbf {C}}})\) is sufficiently close to \(y_0\). In the coordinates of the basis \(v_i\) fixed above, \(\mathrm {GM}\) is given by an \(r \times r\) matrix with entries

$$\begin{aligned} A_{ij}(z_1, \dots , z_m) \in K[[z_1, \dots , z_m]], \end{aligned}$$

convergent in the regions noted above.

The fiber over the \({\mathcal {O}}\)-point \(y_0\) of \({\mathcal {Y}}\) gives a smooth proper \({\mathcal {O}}\)-model \({\mathcal {X}}_0\) for \(X_0\). For \(y \in {\mathcal {Y}}({\mathcal {O}}_v)\) with \(y \equiv y_0\) modulo v, we have a commutative diagram

(3.9)

where \(\mathrm {GM}\) denotes the map induced by the Gauss–Manin connection, \(H^q_{\mathrm {cris}}\) is the crystalline cohomology of \(\overline{{\mathcal {X}}_0}\) (as a reference for crystalline cohomology, see [4, 5]), the diagonal arrows are the canonical identification [5, Corollary 7.4] of crystalline cohomology with the de Rham cohomology of a lift, and the commutativity of the diagram can be deduced from the results of [4, Chapter V] (see Proposition 3.6.4 and prior discussion).

This crystalline cohomology \(V_v=H^q_{\mathrm {dR}}(X_0/K_v)\) is equipped with a Frobenius operator

$$\begin{aligned} \phi _v{:}\,V_v \longrightarrow V_v, \end{aligned}$$

which is semilinear with respect to the Frobenius on the unramified extension \(K_v/{{\mathbf {Q}}}_p\). By the isomorphisms of (3.9), this \(\phi _v\) acts on \(H^q_{\mathrm {dR}}(X_{y}/K_v)\) and \(H^q_{\mathrm {dR}}(X_{y_0}/K_v)\) as well, in a manner compatible with the map \(\mathrm {GM}\).

3.4 The period mappings in a neighbourhood of \(y_0\)

Now \(V=H^q_{\mathrm {dR}}(X_0/K)\) is equipped with a Hodge filtration:

$$\begin{aligned} V=F^0 V \supset F^1 V \supset \cdots \end{aligned}$$
(3.10)

Let \({\mathcal {H}}\) be the K-variety parameterizing flags in V with the same dimensional data as (3.10), and let \(h_0 \in {\mathcal {H}}(K)\) be the point corresponding to the Hodge filtration on V.

Base changing by means of v and \(\iota \), we get a \(K_v\)-variety \({\mathcal {H}}_v\) and a \({{\mathbf {C}}}\)-variety \({\mathcal {H}}_{{\mathbf {C}}}\). We denote by \(h_0^{\iota } \in {\mathcal {H}}_{{\mathbf {C}}}({{\mathbf {C}}})\) the image of \(h_0\).

Let \(\Omega _{{\mathbf {C}}}\) be a contractible analytic neighbourhood of \(y_0 \in Y_{{{\mathbf {C}}}}^\mathrm {an}\). The Gauss–Manin connection defines an isomorphism \(H_{\mathrm {dR}}(X_t/ {{\mathbf {C}}}) \simeq H_{\mathrm {dR}}(X_0/ {{\mathbf {C}}})\) for each \(t \in \Omega _{{\mathbf {C}}}\). In particular, the Hodge structure on the cohomology of \(X_t\) defines a point of \({\mathcal {H}}_{{{\mathbf {C}}}}({{\mathbf {C}}})\); this gives rise to the complex period map

$$\begin{aligned} \Phi _{{{\mathbf {C}}}}{:}\,\Omega _{{\mathbf {C}}} \longrightarrow {\mathcal {H}}_{{{\mathbf {C}}}}({{\mathbf {C}}}). \end{aligned}$$

Indeed, \(\Phi _{{{\mathbf {C}}}}\) extends to a map from the universal cover of \(Y_{{{\mathbf {C}}}}^\mathrm {an}\) to \({\mathcal {H}}_{{{\mathbf {C}}}}({{\mathbf {C}}})\) and this map is equivariant for the monodromy action of \(\pi _1(Y_{{{\mathbf {C}}}}^\mathrm {an}, y_0)\) on \({\mathcal {H}}_{{{\mathbf {C}}}}({{\mathbf {C}}})\). We conclude that the image of the period map can be bounded below by monodromy.

Lemma 3.1

Suppose given a family \(X \rightarrow Y\), and take notation as above; in particular, \(\Gamma \) is the Zariski closure of monodromy, and \(h_0^{\iota } = \Phi _{{{\mathbf {C}}}}(y_0)\). Then we have the containment

$$\begin{aligned} \Gamma \cdot h_0^{\iota } \subset \,\text{ the } \text{ Zariski } \text{ closure } \text{ of } \Phi _{{{\mathbf {C}}}}(\Omega _{{\mathbf {C}}})\,\text{ inside }\,{\mathcal {H}}_{{\mathbf {C}}}. \end{aligned}$$
(3.11)

Proof

The preimage \(\Phi _{{{\mathbf {C}}}}^{-1} Z\) of any algebraic subvariety \(Z \subset {\mathcal {H}}_{{{\mathbf {C}}}}\), with \(Z \supset \Phi _{{{\mathbf {C}}}}(\Omega _{{\mathbf {C}}})\), is a complex-analytic subvariety of \(\widetilde{Y_{{{\mathbf {C}}}}^\mathrm {an}}\) containing \(\Omega _{{\mathbf {C}}}\) and thus all of \(\widetilde{Y_{{{\mathbf {C}}}}^{\mathrm {an}}}\); therefore

$$\begin{aligned} \pi _1(Y_{{{\mathbf {C}}}}, y_0) \cdot h_0^{\iota } \subset Z \end{aligned}$$

and then Z contains the Zariski closure of the left-hand side, which is \(\Gamma \cdot h_0^{\iota }\).

\(\square \)

We need a v-adic analogue. Again, if \(y \in {\mathcal {Y}}({\mathcal {O}}_v)\) satisfies \(y \equiv y_0\) modulo v, the Gauss–Manin connection (3.9) allows one to identify the Hodge filtration on \(H^q_{\mathrm {dR}}(X_{y}/K_v)\) with a filtration on \(V_v\), and thus with a point of \({\mathcal {H}}(K_v)\). This gives rise to a \(K_v\)-analytic function

$$\begin{aligned} \Phi _v{:}\,\Omega _v \longrightarrow {\mathcal {H}}(K_v), \text{ where } \Omega _v = \{ y \in {\mathcal {Y}}({\mathcal {O}}_v){:}\,y \equiv y_0 \text{ modulo } v \}. \end{aligned}$$

The following simple Lemma plays a crucial role. It allows us to analyze the Zariski closure of the p-adic period map in terms of the Zariski closure of the complex period map; for the latter we can use monodromy.

Lemma 3.2

Suppose given power series \(B_0, \dots , B_N \in K[[z_1, \dots , z_m]]\) such that all \(B_i\) are absolutely convergent, with no common zero, both in the v-adic and complex disks

$$\begin{aligned} U_v= \{{\underline{z}}{:}\,|z_i|_v< \epsilon \} \text{ and } U_{{{\mathbf {C}}}} = \{{\underline{z}}{:}\,|z_i|_{{{\mathbf {C}}}} < \epsilon \}. \end{aligned}$$

Write

$$\begin{aligned}&{\underline{B}}_v{:}\,U_v \rightarrow {\mathbf {P}}^N_{K_v}\\&{\underline{B}}_{{{\mathbf {C}}}}{:}\,U_{{{\mathbf {C}}}} \rightarrow {\mathbf {P}}^N_{{{\mathbf {C}}}} \end{aligned}$$

for the corresponding maps.

Then there exists a K-subscheme \({\mathcal {Z}} \subset {\mathbf {P}}^N\) whose base extension to \(K_v\) (respectively \({{\mathbf {C}}}\)) gives the Zariski closure of \({\underline{B}}_v(U_v) \subset {\mathbf {P}}^N_{K_v}\) (respectively \({\underline{B}}_{{{\mathbf {C}}}}(U_{{{\mathbf {C}}}}) \subset {\mathbf {P}}^N_{{{\mathbf {C}}}}\)). In particular, these Zariski closures have the same dimension.

Proof

We take I the ideal of \({\mathcal {Z}}\) to be that generated by all homogeneous polynomials \(Q \in K[x_0, \dots , x_N]\) such that \(Q(B_0, \dots , B_N)\) is identically zero.

To verify the claim (for \(K_v\); the proof for \({{\mathbf {C}}}\) is identical) we just need to verify that if a homogeneous polynomial \(Q_v \in K_v[x_0, \dots , x_N]\) vanishes on \({\underline{B}}_v(U_v)\) then \(Q_v\) lies in the \(K_v\)-span of I. But if \(Q_v\) vanishes on \({\underline{B}}_v(U_v)\) then \(Q_v(B_0, \dots , B_N) \equiv 0\) in \(K_v[[z_1, \dots , z_m]]\). The identical vanishing of \(Q_v(B_0, \dots , B_N)\) is an infinite system of linear equations on the coefficients of \(Q_v\), with coefficients in K. Any \(K_v\)-solution of such a linear system is, of course, a \(K_v\)-linear combination of K-solutions. \(\square \)
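The linear-algebra step in this proof can be made concrete in a toy case. The following minimal sketch assumes \(K = {{\mathbf {Q}}}\), \(m = 1\), \(N = 2\) and the polynomial "power series" \(B_0 = 1\), \(B_1 = z\), \(B_2 = z^2\) (all chosen for illustration, as are the helper names `relation_space` and `nullspace`). It writes the vanishing of \(Q(B_0, B_1, B_2)\) as a linear system with rational coefficients and computes the K-solutions in degree 2.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def poly_mul(a, b):
    """Multiply two polynomials in z given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def nullspace(rows, ncols):
    """Basis of the kernel of a matrix over Q, by Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = 1 / rows[r][c]
        rows[r] = [x * inv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            v[p] = -rows[row_idx][free]
        basis.append(v)
    return basis


def relation_space(series, degree):
    """Monomials of the given degree and a basis of Q with Q(B_0..B_N) = 0."""
    monomials = list(combinations_with_replacement(range(len(series)), degree))
    columns = []
    for mono in monomials:
        prod = [Fraction(1)]
        for idx in mono:
            prod = poly_mul(prod, series[idx])
        columns.append(prod)
    height = max(len(c) for c in columns)
    # Row k expresses the vanishing of the coefficient of z^k.
    rows = [[c[k] if k < len(c) else Fraction(0) for c in columns]
            for k in range(height)]
    return monomials, nullspace(rows, len(monomials))


# B_0 = 1, B_1 = z, B_2 = z^2 (hypothetical example data).
series = [[Fraction(1)], [Fraction(0), Fraction(1)],
          [Fraction(0), Fraction(0), Fraction(1)]]
monomials, basis = relation_space(series, degree=2)
print(monomials)
print(basis)
```

In this example the solution space is one-dimensional, spanned by \(x_1^2 - x_0 x_2\). Because the system has coefficients in \({{\mathbf {Q}}}\), solving it over any larger field gives the span of the same basis, which is the point of the last sentence of the proof.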

By embedding \({\mathcal {H}}\) into a projective space \({\mathbf {P}}^N\), and applying the prior two Lemmas, we deduce:

Lemma 3.3

The dimension of the Zariski closure (in the \(K_v\)-variety \({\mathcal {H}}_{K_v})\) of \(\Phi _v(\Omega _v)\) is at least the (complex) dimension of \(\Gamma \cdot h_0^{\iota }\).

In particular, if \({\mathcal {H}}_v^{\mathrm {bad}} \subset {\mathcal {H}}_v\) is a Zariski-closed subset of dimension less than \(\dim _{{{\mathbf {C}}}}(\Gamma \cdot h_0^{\iota })\), then \(\Phi _v^{-1}({\mathcal {H}}_v^{\mathrm {bad}})\) is contained in a proper \(K_v\)-analytic subset of \(\Omega _v\), by which we mean a subset cut out by v-adic power series converging absolutely on \(\Omega _v\).

One can do better than this using the results of Bakker and Tsimerman, replacing “proper \(K_v\)-analytic” by “Zariski-closed.” See Sect. 9. We do not need this improvement for the applications to Mordell.

3.5 Hodge structures

We use p-adic Hodge theory to relate Galois representations to crystalline cohomology. Good references are [7, 17].

For each \(y\in U\) the representation \(\rho _y\) (see (3.2)) is crystalline upon restriction to \(K_v\), because of the existence of the model \({\mathcal {X}}_y\) for \(X_y\). By p-adic Hodge theory, there is [7, Proposition 9.1.9] a fully faithful embedding of categories:

$$\begin{aligned} \text{ crystalline } \text{ representations } \text{ of }\,{{\text {Gal}}}_{K_v}\,\text{ on } {{\mathbf {Q}}}_p \text{ vector } \text{ spaces } \hookrightarrow \mathcal {FL},\nonumber \\ \end{aligned}$$
(3.12)

where the objects of \(\mathcal {FL}\) are triples \((W, \phi , F)\) of a \(K_v\)-vector space W, a Frobenius-semilinear automorphism \(\phi {:}\,W \rightarrow W\), and a descending filtration F of W. The morphisms in the category \(\mathcal {FL}\) are morphisms of \(K_v\)-vector spaces that respect \(\phi \) and filtrations [17, Expose III, §4.3].

By the crystalline comparison theorem of Faltings [15], the embedding (3.12) carries \(\rho _y\) to the triple \( \left( H_{\mathrm {dR}}^q(X_y/ K_v), \phi _v, \text{ Hodge } \text{ filtration } \text{ for } X_y \right) \). But (3.9) induces an isomorphism in \(\mathcal {FL}\):

$$\begin{aligned} \left( H_{\mathrm {dR}}^q(X_y/ K_v), \phi _v, \text{ Hodge } \text{ filtration } \text{ for }\,X_y \right) \simeq (V_v, \phi _v, \Phi _v(y)). \end{aligned}$$

As a sample result of what we can now show, we give the following. We will use the method of proof again and again, so it seems useful to present it in the current simple context.

Proposition 3.4

Notation as above: in particular \(X \rightarrow Y\) is a smooth proper family over K, V is the degree q de Rham cohomology of a given fiber \(X_0\) above \(y_0 \in Y(K)\), \({\mathcal {H}}\) a space of flags in V,

$$\begin{aligned} \Phi _v{:}\,\{y \in {\mathcal {Y}}({\mathcal {O}}_v){:}\,y \equiv y_0\} \longrightarrow {\mathcal {H}}(K_v) \end{aligned}$$

is the v-adic period mapping, \(\Gamma \subset {{\text {GL}}}(V_{{{\mathbf {C}}}})\) is the Zariski closure of the monodromy group, and \(h_0 = \Phi (y_0)\) is the image of \(y_0\) under the period mapping.

Suppose that

$$\begin{aligned} \dim _{K_v}\left( \mathrm {Z}(\phi _v^{[K_v{:}\,{{\mathbf {Q}}}_p]}) \right) < \dim _{{{\mathbf {C}}}} \ \Gamma \cdot h_0^{\iota } \end{aligned}$$
(3.13)

where the left-hand side \(\mathrm {Z}(\dots )\) denotes the centralizer, in \({{\text {Aut}}}_{K_v}(V_v)\), of the \(K_v\)-linear operator \(\phi _v^{[K_v{:}\,{{\mathbf {Q}}}_p]}\).

Then the set

$$\begin{aligned} \{y \in Y({\mathcal {O}}){:}\,y \equiv y_0\,\text{ modulo }\,v, \rho _y \text{ semisimple } \} \end{aligned}$$
(3.14)

is contained in a proper \(K_v\)-analytic subvariety of the residue disk of \(Y(K_v)\) at \(y_0\).

Proof

For any y as in (3.14) the Galois representation \(\rho _y\) belongs to a finite set of isomorphism classes (Lemma 2.3). By our previous discussion the triple \((V_v, \phi _v, \Phi _v(y))\) also belongs to a finite set of isomorphism classes (now in the category \(\mathcal {FL}\)). Choosing representatives \((V_v, \phi _v, h_i)\) for these isomorphism classes, we must have

$$\begin{aligned} \Phi _v(y) \in \bigcup _{i} \mathrm {Z}(\phi _v) \cdot h_i, \end{aligned}$$

where \(\mathrm {Z}(\phi _v)\) is the subgroup of elements in \({{\text {GL}}}_{K_v}(V_v)\) which commute with \(\phi _v\).

Now certainly \(\mathrm {Z}(\phi _v) \subset \mathrm {Z}(\phi _v^{[K_v{:}\,{{\mathbf {Q}}}_p]})\), and the right-hand side is now the \(K_v\)-points of a \(K_v\)-algebraic subgroup of \({{\text {GL}}}_{K_v}(V_v)\). Therefore, any y as in (3.14) is contained in the preimage, under \(\Phi _v\), of a proper Zariski-closed subset of \({\mathcal {H}}_v\) with dimension at most the left-hand side of (3.13). This is obviously a \(K_v\)-analytic subvariety as asserted. It is proper because of Lemma 3.3. \(\square \)

In conclusion we note that we really have bounded \({\mathcal {Y}}({\mathcal {O}})\) rather than the set of \(y \in Y(K)\) for which the abstract Galois representation \(\rho _y\) has good reduction outside S. To bound the latter set, we would have to deal with the possibility that such y would be nonintegral at S; this would require a more detailed analysis “at infinity” and we have not attempted it.

4 The S-unit equation

As a first application, and a warm-up to the more complicated case of curves of higher genus, we will show finiteness of the set of solutions to the S-unit equation. This argument is not logically necessary for the later proofs but we hope it will serve as a useful introduction to them.

Theorem 4.1

The set

$$\begin{aligned} U= \{ t \in {\mathcal {O}}_S^*{:}\,1-t \in {\mathcal {O}}_S^*\} \end{aligned}$$

is finite.
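To make the statement concrete, here is a minimal sketch in the simplest case, assuming \(K = {{\mathbf {Q}}}\) and \(S = \{2, 3\}\) (choices made only for illustration, as are the helper names and the exponent bound). It searches the box \(t = \pm 2^a 3^b\) with \(|a|, |b| \leqslant 6\) for elements t such that \(1-t\) is again an S-unit. A bounded search of this kind only lists solutions; it proves nothing about finiteness.

```python
from fractions import Fraction


def is_s_unit(x, primes):
    """True if the rational x is nonzero and built only from the given primes."""
    if x == 0:
        return False
    for part in (abs(x.numerator), x.denominator):
        for p in primes:
            while part % p == 0:
                part //= p
        if part != 1:
            return False
    return True


def s_unit_solutions(bound=6):
    """All t = +-2^a 3^b with |a|, |b| <= bound such that 1 - t is a {2,3}-unit."""
    primes = (2, 3)
    found = set()
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            for sign in (1, -1):
                t = sign * Fraction(2) ** a * Fraction(3) ** b
                if is_s_unit(1 - t, primes):
                    found.add(t)
    return found


solutions = s_unit_solutions()
print(len(solutions))
print(sorted(solutions))
```

The search returns 21 values of t. They all arise from the four relations \(1+1=2\), \(1+2=3\), \(1+3=4\) and \(1+8=9\), and this is the complete solution set for this choice of K and S.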

4.1 Reductions

We begin with some elementary reductions.

We may freely enlarge both S and K. Thus, we may suppose that S contains all primes above 2 and that K contains the 8th roots of unity. Let m be the largest power of 2 dividing the order of the group of roots of unity in K. By assumption \(m \geqslant 8\).

First of all, it suffices to prove finiteness of the set

$$\begin{aligned} U_1 = \{ t \in {\mathcal {O}}_S^*{:}\,1-t \in {\mathcal {O}}_S^*, t \notin (K^*)^2\}, \end{aligned}$$

because \(U \subset U_1 \cup U_1^2 \cup U_1^4 \cup \cdots \cup U_1^m.\) To see this, we take \(t \in U\) and try to repeatedly extract its square root; observe that such a square root, if in K, also belongs to U. If we cannot extract an mth root of t, we are done; otherwise, write \(t=t_1^{m}\) and adjust \(t_1\) by a primitive mth root of unity to ensure that \(t_1\) is nonsquare.

Suppose that \(t \in U_1\). Since t is a nonsquare and \(\mu _m \subset K\), the order of t in the group \((K^*)/(K^*)^{m}\) is exactly m. Otherwise there is some proper divisor k of m, and an element \(a \in K^*\), such that \(t^k = a^m\), i.e. \(t \in a^{m/k} \mu _k\). Since \(m/k\) is even and every element of \(\mu _k\) is the square of an element of \(\mu _{2k} \subset \mu _m \subset K\), this would make t a square in K, contradicting the fact that t is nonsquare.

Fixing \(t^{1/m}\) an mth root of t in \({\overline{K}}\), the field \(K(t^{1/m})\) is Galois over K, and Kummer theory guarantees that its Galois group is \({{\mathbf {Z}}}/m{{\mathbf {Z}}}\). There are (Hermite–Minkowski) only finitely many possibilities for \(K(t^{1/m})\). Enumerate them; call them \(L_1, \dots , L_r\), say. Each \(L_i\) is a cyclic degree-m extension of K, and it is sufficient to prove finiteness of the set

$$\begin{aligned} U_{1,L} = \{ t \in U_1, K(t^{1/m}) \simeq L\} \end{aligned}$$
(4.1)

for a fixed field \(L \in \{L_1, \dots , L_r\}\); here we understand \(K(t^{1/m})= K[x]/(x^m-t)\).

Fix an L as above. L is cyclic of degree m over K. Choose a prime v of K such that:

  (i) the class of Frobenius at v generates \(\mathrm {Gal}(L/K)\);

  (ii) the prime p of \({{\mathbf {Q}}}\) below v is unramified in K;

  (iii) no prime of S lies above p.

In particular, v is inert in L/K; thus, if \(t \in U_{1,L}\) then t is not a square in \(K_v\), for otherwise \(L \otimes _{K} K_v \simeq K_v[x]/(x^m-t)\) would not be a field.

In summary, it is enough to prove the following lemma.

Lemma 4.2

Suppose K contains the 8th roots of unity, and S contains all primes above 2. Fix a cyclic field extension L/K, a place \(v \not \in S\) as above, and a basepoint \(t_0 \in {\mathcal {O}}_S\). Let \(U_{1, L}\) be as above. Then the set

$$\begin{aligned} \{t \in U_{1,L}{:}\,t \equiv t_0 \text{ modulo } v\} \end{aligned}$$
(4.2)

is finite.

The proof of this Lemma will occupy the rest of the section. Throughout the proof, p is the prime of \({{\mathbf {Q}}}\) below v, and “Tate module” always refers to p-adic Tate module.

4.2 A variant of the Legendre family

As discussed in Sect. 1, we apply Proposition 3.4 not to the Legendre family, but to a modification of it: let \( {\mathcal {Y}} ={{\mathbf {P}}}^1_{{\mathcal {O}}}-\{0,1, \infty \} \) (where \(0,1, \infty \) denote the corresponding sections over \({\text {Spec}} {\mathcal {O}}\)) and let \({\mathcal {Y}}' = {{\mathbf {P}}}^1_{{\mathcal {O}}} - \{0, \mu _m,\ \infty \}\); let \(\pi {:}\,{\mathcal {Y}}' \rightarrow {\mathcal {Y}}\) be the map \(u \mapsto u^m\).

Let \({\mathcal {X}} \rightarrow {\mathcal {Y}}'\) be the Legendre family, so that its fiber over t is the curve \(y^2=x(x-1)(x-t)\); and consider the composite

$$\begin{aligned} {\mathcal {X}} \longrightarrow {\mathcal {Y}}' {\mathop {\longrightarrow }\limits ^{\pi }} {\mathcal {Y}}. \end{aligned}$$

We will apply our prior results to the family \({\mathcal {X}} \rightarrow {\mathcal {Y}}\); also, as before, we denote by X and Y the fibers of \({\mathcal {X}}\) and \({\mathcal {Y}}\) over \({{\text {Spec}}}(K)\). Thus the geometric fiber \(X_t\) of \(X \rightarrow Y\) over \(t \in Y(K)\) is the disjoint union of the curves \(y^2 =x(x-1)(x-t^{1/m})\) over all mth roots of t.

4.3 Proof of finiteness

Assume for the moment the following two Lemmas; they will be proved in Sect. 4.4.

Lemma 4.3

(Big monodromy) Consider the family of curves over \({{\mathbf {C}}}-\{0,1\}\) whose fiber over \(t \in {{\mathbf {C}}}\) is the union of the elliptic curves \(E_z{:}\,y^2=x(x-1)(x-z)\), over all \(m{\text {th}}\) roots \(z^m=t\). Then the action of monodromy

$$\begin{aligned} \pi _1({{\mathbf {C}}}-\{0,1\}, t_0) \longrightarrow {{\text {Aut}}}\left( \bigoplus _{z^m = t_0} H^1_B(E_z, {{\mathbf {Q}}})\right) \end{aligned}$$
(4.3)

has Zariski closure containing \(\prod _{z} {{\text {SL}}}(H^1_B(E_z, {{\mathbf {Q}}}))\).

Lemma 4.4

(Generic simplicity) Let L be a number field and p a rational prime, larger than 2, and unramified in L. There are only finitely many \(z \in L\) such that \(z, 1-z\) are both p-units, but for which the Galois representation of \(G_L\) on the Tate module \(T_p(E_z) = H^1_{\mathrm {et}}(E_{z, {\bar{L}}}, {{\mathbf {Q}}}_p)\) of the elliptic curve

$$\begin{aligned} E_z{:}\,y^2=x(x-1)(x-z), \end{aligned}$$

fails to be simple.

Of course much stronger results than Lemma 4.4 are known. The point here is that we prove this in a “soft” fashion, using the Torelli theorem as a substitute for more sophisticated arguments; although we use the specific feature of Hodge weights 0 and 1, the argument is robust enough to generalize (although with a little added complexity, see e.g. Lemma 6.3).

Proof of Lemma 4.2 assuming Lemmas 4.3 and 4.4

This argument is similar to the proof of Proposition 3.4, with added complication coming from the interaction of the fields K and L. Recall that we have fixed \(t_0 \in U_{1,L}\) and we must verify the finiteness of the set of \(t \in U_{1,L}\) with \(t \equiv t_0\) modulo v.

By Lemmas 4.4 and 2.3, it is enough to verify the finiteness of the subset of such t where the pair \((K(t^{1/m}), \rho _t | G_{K(t^{1/m})})\) lies in a fixed isomorphism class; in particular \((K_v(t^{1/m}), \rho _t|G_{K_v(t^{1/m})})\) lies in a fixed isomorphism class.

Under the correspondence of p-adic Hodge theory, \(\rho _t\) restricted to \(K_v(t^{1/m})\) corresponds to the filtered \(\phi \)-module

$$\begin{aligned} \left( H^1_{\mathrm {dR}}(X_{t, K_v} / K_v) \text{ as } K_v(t^{1/m})\text{-module }, \text{ Frobenius, } \text{ filtration }\right) , \end{aligned}$$
(4.4)

where we equip \(H^1_{\mathrm {dR}}(X_{t, K_v} / K_v)\) with the structure of 2-dimensional vector space over \(K_v(t^{1/m})\) that arises from the scheme structure of \(X_t\) over \(K(t^{1/m})\).

Let us clarify this vector space structure over \(K_v(t^{1/m})\), which is crucial to our argument. Although a priori a K-scheme, the factorization \(X \rightarrow Y' \rightarrow Y\) induces on \(X_t\) the structure of \(K(t^{1/m})\)-scheme, i.e. arising from the morphism \(X_t \rightarrow (Y')_t \simeq {{\text {Spec}}}K(t^{1/m})\). Now the de Rham cohomology of \(X_t\) is the same whether we consider it as a \(K(t^{1/m})\)-variety or as a K-variety. If we consider it as K-variety, we can recover its structure of \(K(t^{1/m})\)-vector space by means of the natural map

$$\begin{aligned} K(t^{1/m}) = H^0_{\mathrm {dR}}(Y'_t/K) \rightarrow H^0_{\mathrm {dR}}(X_t/K). \end{aligned}$$

The same picture works with K replaced by \(K_v\) everywhere.

(Similarly, there are two natural interpretations for “Frobenius” in (4.4), but they are equivalent: as just explained, we can consider the space \(H^1_{\mathrm {dR}}\) as the de Rham cohomology of either a \(K_v(t^{1/m})\)-scheme, or of the associated \(K_v\)-scheme obtained simply by restricting the scalars. Both of these schemes have evident integral models, over \({\mathcal {O}}_v[x]/(x^m-t)\) and \({\mathcal {O}}_v\) respectively. Accordingly, the de Rham cohomologies can be identified with the crystalline cohomologies of the special fibers; these crystalline cohomologies are identified, in a fashion that respects the semilinear Frobenius endomorphisms.)

The Gauss–Manin connection for the family \(X \rightarrow Y\) induces

$$\begin{aligned} H^1_{\mathrm {dR}}(X_{t, K_v} / K_v) \simeq H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v) \end{aligned}$$
(4.5)

which, by compatibility of Gauss–Manin connection with the cup product, is compatible with their module structures over the corresponding \(H^0\)s. The corresponding identification of \(H^0\)s induces the standard identification \(K_v(t^{1/m}) \simeq K_v(t_0^{1/m})\) and therefore the isomorphism (4.5) is compatible with structures of \(K_v(t^{1/m}) \simeq K_v(t_0^{1/m})\)-modules.

Therefore, under the identification of (4.5), the \(F^1\)-step of the filtration on \(H^1_{\mathrm {dR}}(X_{t, K_v} / K_v)\) is identified with a \(K_v(t_0^{1/m})\)-line inside \(H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v)\). Call this line \(\Phi (t)\). The variation of this line gives a \(K_v\)-analytic period mapping

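$$\begin{aligned} \Phi {:}\,\{ t \in K_v{:}\,t \equiv t_0 \text{ modulo } v\}&\longrightarrow \{ K_v(t_0^{1/m})\text{-lines } \text{ in } H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v)\} \\&\hookrightarrow \{ m\text{-dimensional } K_v\text{-subspaces } \text{ of } H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v)\} = \mathrm {Gr}_{K_v}(2m,m) \end{aligned}$$
(4.6)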

(The period mapping for the family \(X \rightarrow Y\) a priori takes values in the bottom row, but we have just seen that it factors through the top row. See Sect. 3.3 for a more detailed discussion of the radius of convergence; in particular it defines a rigid analytic function on a domain containing \(\{ t \in K_v, t \equiv t_0 \text{ modulo } v\}\) i.e. the \(K_v\)-points in a residue disk.)

Therefore (applying the Gauss–Manin connection to identify (4.4) with similar data over \(t_0\)) the isomorphism class of the quadruple

$$\begin{aligned} \left( K_v(t_0^{1/m}), H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v) \text{ as } K_v(t_0^{1/m})\text{-module }, \Phi (t), \text{ Frob }_v \right) \end{aligned}$$

is determined from (4.4) and therefore the triple

$$\begin{aligned} \left( H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v) \text{ as } K_v(t_0^{1/m})\text{-module }, \Phi (t), \text{ Frob }_v \right) \end{aligned}$$

lies in a finite set of isomorphism classes for filtered \(\phi \)-modules over \(K_v(t_0^{1/m})\) (coming from the finitely many automorphisms of \(K_v(t_0^{1/m})\) over \(K_v\)). Therefore, \(\Phi (t)\) lies in a finite collection of orbits for

$$\begin{aligned} Z&= \text{ centralizer } \text{ of } \mathrm {Frob}_v^{[K_v{:}\,{{\mathbf {Q}}}_p]} \text{ in } K_v(t_0^{1/m})\\&\quad \text{-linear } \text{ automorphisms } \text{ of } H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v). \end{aligned}$$

Now we can apply Lemma 2.1 to the field extension \(K_v(t_0^{1/m})/K_v\) and the \(K_v\)-linear automorphism \(\mathrm {Frob}_v^{[K_v{:}\,{{\mathbf {Q}}}_p]}\) of \(H^1_{\mathrm {dR}}(X_{t_0, K_v} / K_v)\). This gives us that

$$\begin{aligned} \dim _{K_v} Z \leqslant (\dim _{K_v(t_0^{1/m})} H^1_{\mathrm {dR}} )^2 =4. \end{aligned}$$

Our analysis thus far has shown that the set of \(t \in U_{1,L}\) such that \(t \equiv t_0\) modulo v is contained in

$$\begin{aligned} \Phi ^{-1}\left( {\mathcal {Z}}\right) , \end{aligned}$$

where \(\Phi \) is the period map as in (4.6) and \({\mathcal {Z}} \subset \mathrm {Gr}_{K_v}(2m,m)\) has dimension at most 4. By Lemma 3.3, this set is finite so long as we verify an assertion about the complex period map, namely, that the dimension of the orbit of the algebraic monodromy group over \({{\mathbf {C}}}\) is strictly greater than 4. As in Lemma 3.3, we fix an embedding \(K \hookrightarrow {{\mathbf {C}}}\) throughout the following discussion.

As mentioned, the vector space \(V = H^1_{\mathrm {dR}}(X_{t_0}/K)\) has the natural structure of a 2-dimensional vector space over \(K(t_0^{1/m})\). The splitting of \(X_{t_0, {{\mathbf {C}}}}\) into geometric components induces a splitting

$$\begin{aligned} V_{{{\mathbf {C}}}} = \bigoplus _{i=1}^m V_i, \end{aligned}$$
(4.7)

where each \(V_i\) is a 2-dimensional complex vector space; moreover the Hodge filtration on \(H^1_{\mathrm {dR}}(X_{t_0}/K) \otimes {{\mathbf {C}}}\) also splits along this decomposition. Lemma 4.3 shows that the algebraic monodromy group \(\Gamma \) contains \(\prod _{i=1}^m {{\text {SL}}}(V_i)\). The pertinent flag variety \({\mathcal {H}} \simeq \mathrm {Gr}(V, m)\) is the variety of m-dimensional subspaces in V; the splitting (4.7) induces a natural inclusion \(\prod _{i=1}^m {{\mathbf {P}}}V_i \hookrightarrow {\mathcal {H}}_{{{\mathbf {C}}}}\). Therefore the orbit \(\Gamma h_0^{\iota }\) is all of \(\prod _{i=1}^m {{\mathbf {P}}}V_i\) and, in particular, has dimension \(m \geqslant 8\). Lemma 3.3 now gives the desired finiteness.

In conclusion, assuming Lemmas 4.3 and 4.4, we have shown that the set described in (4.2) is finite. \(\square \)

4.4 Big monodromy and generic simplicity

In this section we prove Lemmas 4.3 and 4.4.

Proof of Lemma 4.3

Write \(\Gamma \) for the Zariski closure in question. It preserves the splitting of (4.3), although not the individual summands. Then:

  • \(\Gamma \) transitively permutes the factors on the right-hand side of (4.3), as we see by considering the action of local monodromy near \(t=0\);

  • \(\Gamma \cap {{\text {SL}}}(2)^m\) projects onto \({{\text {SL}}}(2)\) in each factor: indeed, this projection contains a finite-index subgroup of the algebraic monodromy group of the Legendre family;

  • \(\Gamma \) contains an element of the form

    $$\begin{aligned} (1, 1, \dots , 1, u, 1, \dots , 1) \end{aligned}$$

    where \(u \in {{\text {SL}}}(2)\) is a nontrivial unipotent element, as we see by considering the action of local monodromy near \(t=1\).

We now apply a slight variant of Lemma 2.12 to conclude that \(\Gamma \supset {{\text {SL}}}(2)^m\).

\(\square \)

Proof of Lemma 4.4

Fix \(z_0 \in L\) with the quoted p-integrality properties; in particular, \(E_{z_0}\) has good reduction at all primes of L above p.

It is enough to show the same finiteness when we restrict to the set

$$\begin{aligned} V_L = \{z \in L{:}\,z \equiv z_0\,\text{ modulo }\,v\text{, } \text{ for } \text{ all } v|p\}. \end{aligned}$$

If \(T_p(E_z)\) is reducible there exists a one-dimensional subrepresentation \(W_z \subset T_p(E_z)\). By Lemma 2.10 (applied with \(K = {{\mathbf {Q}}}\)) there is a place w of L above p such that \(F^1 (W_z^{\mathrm {dR}}) = W_z^{\mathrm {dR}}\); here \(W_z^{\mathrm {dR}}\) is the filtered \(L_w\)-vector space associated to \(W_z\) by p-adic Hodge theory over the p-adic field \(L_w\).

Because the Newton and Hodge polygons of \(W_z^{\mathrm {dR}}\) have the same endpoint, the slope of semilinear Frobenius acting on \(W_z^{\mathrm {dR}}\) is equal to 1; by the same reasoning for \(H^1_{\mathrm {dR}}(E_z/L_w)\), the sum of slopes for the semilinear Frobenius acting on \(H^1_{\mathrm {dR}}(E_z/L_w)\) is 1, so it has another slope equal to 0.

In particular, the \(L_w\)-linear Frobenius \(\mathrm {Frob}_w^{[L_w{:}\,{{\mathbf {Q}}}_p]}\) has distinct eigenvalues, since its two eigenvalues have different p-adic valuations (one for each of the slopes 0 and 1).

Also, the \(L_w\)-line \(W_z^{\mathrm {dR}}\) must coincide with \(F^1 H^1_{\mathrm {dR}}(E_z/L_w)\), so that the latter space is the slope-1 eigenline for the semilinear Frobenius \(\mathrm {Frob}_w\).

As in the discussion around (3.9), Gauss–Manin induces an identification

$$\begin{aligned} H^1_{\mathrm {dR}}(E_{z_0}/L_w) \simeq H^1_{\mathrm {dR}}(E_z/L_w) \end{aligned}$$
(4.8)

of \(L_w\)-vector spaces with semilinear Frobenius action. But the position of the Hodge line \(F^1 H^1_{\mathrm {dR}}(E_z/L_w)\) varies w-adic analytically inside the disk \(V_L\)—here we use (4.8) to identify this line to a line inside the fixed space \(H^1_{\mathrm {dR}}(E_{z_0}/L_w)\)—and the associated w-adic analytic function is nonconstant (by the—trivial—Torelli theorem for elliptic curves). It follows there are at most finitely many \(z \in V_L\) for which \(F^1 H^1_{\mathrm {dR}}(E_z/L_w)\) is the slope-1 Frobenius eigenline. Taking the union over possible w we still see that the exceptional set is finite. \(\square \)

5 Outline of the argument for Mordell’s conjecture

The proof of the Mordell conjecture is substantially harder than the S-unit equation. To try to assist the reader, we summarize the proof here, and then elaborate on the ingredients over the next three sections.

First of all, we will make crucial use of the type of structure that occurred in Sect. 4.2, to which we give a name:

Definition 5.1

An abelian-by-finite family over Y is a sequence of morphisms

$$\begin{aligned} X \longrightarrow Y' {\mathop {\longrightarrow }\limits ^{\pi }}Y \end{aligned}$$

where \(\pi \) is finite étale, and \(X \rightarrow Y'\) is (equipped with the structure of) a polarized abelian scheme.

A good model for such a family, over an S-integer ring \({\mathcal {O}} \subset K\), is a family \({\mathcal {X}} \rightarrow {\mathcal {Y}}' \rightarrow {\mathcal {Y}}\) of smooth, proper \({\mathcal {O}}\)-schemes, satisfying the same conditions and also the assumptions at the start of Sect. 3.1, and recovering \(X \rightarrow Y' \rightarrow Y\) on base change to K.

Of course the polarization on \(X \rightarrow Y'\) is an additional structure but for brevity we do not explicitly include it in the notation.

For any such abelian-by-finite family \(X \rightarrow Y' \rightarrow Y\) take a complex point \(y_0 \in Y({{\mathbf {C}}})\) and consider the action of the topological fundamental group \(\pi _1(Y({{\mathbf {C}}}), y_0)\) on

$$\begin{aligned} H^1_B(X_{y_0}, {{\mathbf {Q}}}) \simeq \bigoplus _{\pi ({\tilde{y}}) = y_0} H^1_B(X_{{\tilde{y}}}, {{\mathbf {Q}}}), \end{aligned}$$

where the sum is taken over \({\tilde{y}} \in Y'({{\mathbf {C}}})\) lying over \(y_0\). We say that the family has full monodromy if the Zariski closure of \(\pi _1(Y, y_0)\), in its action on the right-hand side, contains the product of symplectic groups:

$$\begin{aligned} \overline{ \left( \text{ image } \text{ of } \pi _1(Y({{\mathbf {C}}}), y_0) \right) } \supset \prod _{\pi ({\tilde{y}})=y_0} \mathrm {Sp}\left( H^1_{B}(X_{{\tilde{y}}}, {{\mathbf {Q}}}), \omega \right) , \end{aligned}$$
(5.1)

where the symplectic group is with reference to the form \(\omega \) defined by the polarization.

The key reason to use abelian-by-finite families is that we can guarantee that the Galois orbits on any fiber of \(Y' \rightarrow Y\), above a K-rational point of Y, are “large.” In fact, what we need (see discussion in Introduction) is that most points in the fiber above \(y_0 \in Y(K)\) cannot be defined over “small” extensions of \(K_v\). To quantify the notions of large and small we introduce the following quantity:

Definition 5.2

Let E be a \(G_K\)-set and v a place of K such that the \(G_K\)-action on E is unramified at v. Let

$$\begin{aligned} \mathrm {size}_v(E) = \frac{ \text{ number } \text{ of } \text{ elements } \text{ of }\,E\,\text{ that } \text{ belong } \text{ to }\,\mathrm {Frob}_v\text{-orbits } \text{ of } \text{ size }\,< 8}{\text{ number } \text{ of } \text{ elements } \text{ of }\,E}\nonumber \\ \end{aligned}$$
(5.2)

If E is a zero-dimensional K-scheme, we will write \(\mathrm {size}_v(E)\) instead of \(\mathrm {size}_v(E({\bar{K}}))\).

Note that if \(E \rightarrow E'\) is a morphism of \(G_K\)-sets, and all fibers have the same cardinality, then

$$\begin{aligned} \mathrm {size}_v(E) \leqslant \mathrm {size}_v(E'). \end{aligned}$$
(5.3)
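As a sanity check on Definition 5.2 and on (5.3), the following minimal sketch models the Frobenius action by a permutation of a finite set. The sets E and \(E'\), the labels, and the helper names are all hypothetical, chosen only so that the map \(E \rightarrow E'\) is equivariant with all fibers of cardinality 3.

```python
from fractions import Fraction


def orbits(perm):
    """Cycle decomposition of a permutation given as a dict x -> Frob(x)."""
    seen, result = set(), []
    for start in perm:
        if start in seen:
            continue
        orbit, x = [], start
        while x not in seen:
            seen.add(x)
            orbit.append(x)
            x = perm[x]
        result.append(orbit)
    return result


def size_v(perm, threshold=8):
    """Fraction of elements lying in Frobenius orbits of size < threshold."""
    small = sum(len(o) for o in orbits(perm) if len(o) < threshold)
    return Fraction(small, len(perm))


def cyclic(label, n):
    """A single Frobenius orbit of length n, with elements (label, i)."""
    return {(label, i): (label, (i + 1) % n) for i in range(n)}


# E' has one orbit of size 4 and one of size 12.
frob_E_prime = {**cyclic("a", 4), **cyclic("b", 12)}
# E lies over E' with fibers of size 3: orbits of size 4 and 8 over "a",
# and one orbit of size 36 over "b".
frob_E = {**cyclic("A4", 4), **cyclic("A8", 8), **cyclic("B36", 36)}
base = {"A4": ("a", 4), "A8": ("a", 4), "B36": ("b", 12)}


def project(e):
    """The map E -> E', reducing the index modulo the length of the base orbit."""
    label, i = e
    target, n = base[label]
    return (target, i % n)


print(size_v(frob_E), size_v(frob_E_prime))
```

Here four of the 48 elements of E lie in small orbits, against four of the 16 elements of \(E'\), so the inequality (5.3) holds with room to spare. An element in a small orbit always maps to an element in a small orbit, which is the reason for (5.3) in general.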

The next result is, in essence, a variant of Proposition 3.4, but it requires some careful indexing. It will be proved in Sect. 6.

Proposition 5.3

Let Y be a curve over K of genus \(g \geqslant 2\).

Let \(X \rightarrow Y' {\mathop {\rightarrow }\limits ^{\pi }} Y\) be an abelian-by-finite family over Y, with full monodromy (see Definition 5.1 and subsequent discussion). Let d be the relative dimension of \(X \rightarrow Y'\). Suppose that \(X \rightarrow Y' {\mathop {\rightarrow }\limits ^{\pi }} Y\) admits a good model over the ring \(\mathcal {O}\) of S-integers of K. Let \(v \notin S\) be a friendly place of K (Definition 2.7).

Let \(\mathrm {size}_v\) be as in (5.2). Then the set

$$\begin{aligned} Y(K)^* := \left\{ y \in Y(K){:}\,\mathrm {size}_v(\pi ^{-1} (y)) < \frac{1}{d+1}\right\} \end{aligned}$$

is finite.

In Sect. 7, we introduce a specific abelian-by-finite family \(X_q \rightarrow Y_q' {\mathop {\rightarrow }\limits ^{\pi }} Y\) for each prime \(q \geqslant 3\), referred to as the “Kodaira–Parshin family for the group \(\mathrm {Aff}(q)\).” Roughly, \(Y_q'\) is a Hurwitz space for \(\mathrm {Aff}(q)\) and \(X_q\) is the Prym of the universal curve. It has the following properties:

  (i) It has full monodromy (Theorem 8.1).

  (ii) The relative dimension \(d_q\) of \(X_q \rightarrow Y_q'\) is given by \(d_q = (q-1)(g-\frac{1}{2})\).

  (iii) For each \(y_0 \in Y(K)\) there is a \(G_K\)-equivariant identification of \(\pi ^{-1}(y_0)\) with the conjugacy classes of surjections \(\pi _1^{\mathrm {geom}}(Y-y_0, *) \twoheadrightarrow \mathrm {Aff}(q)\) that are nontrivial on a loop around \(y_0\).

Note that we can identify \(\pi _1^{\mathrm {geom}}\) with the profinite completion of a free group on 2g generators \(x_1, x_1', \dots , x_g, x_g'\) in such a way that the loop around \(y_0\) corresponds to the conjugacy class of \([x_1, x_1'] [x_2, x_2'] \dots [x_g, x_g']\). Therefore, the set of surjections \(\pi _1^{\mathrm {geom}}(Y-y_0, *) \twoheadrightarrow \mathrm {Aff}(q)\) nontrivial on a loop around \(y_0\) is identified with the left-hand side of (2.3).
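Since a continuous homomorphism from the profinite completion of a free group to a finite group is determined by the images of the generators, the surjections in question can be enumerated by brute force for small parameters. The following minimal sketch does this for the illustrative values \(q = 3\) and \(g = 2\); it models \(\mathrm {Aff}(q)\) as the maps \(x \mapsto ax+b\) on \({{\mathbf {Z}}}/q\), and all function names are hypothetical.

```python
from itertools import product


def aff_group(q):
    """Elements x -> a*x + b of Aff(q), stored as pairs (a, b) with a != 0."""
    return [(a, b) for a in range(1, q) for b in range(q)]


def mul(s, t, q):
    """Composition s after t, i.e. x -> s(t(x))."""
    (a, b), (c, d) = s, t
    return (a * c % q, (a * d + b) % q)


def inv(s, q):
    a, b = s
    a_inv = pow(a, -1, q)  # modular inverse, Python 3.8+
    return (a_inv, (-a_inv * b) % q)


def commutator(s, t, q):
    return mul(mul(s, t, q), mul(inv(s, q), inv(t, q), q), q)


def generated_size(gens, q):
    """Order of the subgroup generated by gens (closure under right multiplication)."""
    seen = {(1, 0)}
    frontier = [(1, 0)]
    while frontier:
        x = frontier.pop()
        for gen in gens:
            y = mul(x, gen, q)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return len(seen)


def count_surjections(q, g):
    """Count 2g-tuples generating Aff(q) whose product of commutators is nontrivial."""
    group = aff_group(q)
    count = 0
    for tup in product(group, repeat=2 * g):
        loop = (1, 0)
        for i in range(g):
            loop = mul(loop, commutator(tup[2 * i], tup[2 * i + 1], q), q)
        if loop != (1, 0) and generated_size(tup, q) == len(group):
            count += 1
    return count


q, g = 3, 2
n = count_surjections(q, g)
print(n, n // (q * (q - 1)))
```

The second printed number is the count of conjugacy classes: since \(\mathrm {Aff}(q)\) has trivial center, conjugation acts freely on the set of surjections, so the total count is divisible by \(q(q-1)\).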

There is probably nothing very special about the use of \(\mathrm {Aff}(q)\), but it is simple enough that we can compute everything explicitly.

Assuming these properties, we can prove the following.

Theorem 5.4

Let Y be a curve over the number field K with genus \(g \geqslant 2\). Then Y(K) is finite.

Proof

We apply Proposition 5.3 to the Kodaira–Parshin family with parameter q. What we will show is that we may choose q and the place v in such a way that v is friendly and

$$\begin{aligned} \mathrm {size}_v(\pi ^{-1} (y)) < \frac{1}{d_q+1} \ \ \text{ for } \text{ all } y \in Y(K). \end{aligned}$$
(5.4)

The key point is to use the mapping (5.5) below and the Weil pairing to give an upper bound on \(\mathrm {size}_v(\pi ^{-1}(y))\).

We choose q with the following properties:

  1. (i)

    \(q-1\) is not divisible by 4 or by any odd primes less than \(8[K{:}\,{{\mathbf {Q}}}]\).

  2. (ii)

    The Galois closure \(K'\) of K is linearly disjoint from \({{\mathbf {Q}}}(\zeta _{q-1})\) over \({{\mathbf {Q}}}\).

  3. (iii)

    \( \frac{8\cdot 2^{g+1}}{(q-1)^{g}} < \frac{1}{ (g-1/2) (q-1) +1}\).

This is possible by Dirichlet’s theorem: we choose q such that q is not congruent to 1 mod \(\ell \) for any prime \(\ell \) that either divides the discriminant of K, or that is less than \(8[K{:}\,{{\mathbf {Q}}}]\), and also q is not congruent to 1 mod 4. Then linear disjointness follows: for ramification reasons \(K' \cap {{\mathbf {Q}}}(\zeta _{q-1}) = {{\mathbf {Q}}}\). Such a q can be chosen arbitrarily large; in particular it can be chosen to satisfy the third condition.
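The numerical part of this choice is easy to explore by machine. The following Python sketch searches for the smallest prime q meeting conditions (i) and (iii) for a given genus g and degree n = [K : Q]. It does not check condition (ii), which depends on the field K itself. The function names `is_prime`, `satisfies_i`, `satisfies_iii` and `smallest_q`, and the search bound `limit`, are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction


def is_prime(n):
    """Trial-division primality test; adequate for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def satisfies_i(q, n):
    """Condition (i): q - 1 is divisible neither by 4 nor by an odd prime < 8n."""
    if (q - 1) % 4 == 0:
        return False
    return all((q - 1) % r != 0 for r in range(3, 8 * n) if is_prime(r))


def satisfies_iii(q, g):
    """Condition (iii), checked in exact rational arithmetic."""
    lhs = Fraction(8 * 2 ** (g + 1), (q - 1) ** g)
    d_q = Fraction(2 * g - 1, 2) * (q - 1)  # d_q = (g - 1/2)(q - 1)
    return lhs < 1 / (d_q + 1)


def smallest_q(g, n, limit=10**6):
    """Smallest prime 3 <= q < limit with (i) and (iii); None if none is found.

    Condition (ii) (linear disjointness) is NOT checked here.
    """
    for q in range(3, limit):
        if is_prime(q) and satisfies_i(q, n) and satisfies_iii(q, g):
            return q
    return None
```

For instance, with g = 2 and n = 1 the search returns q = 107, for which q - 1 = 106 = 2 · 53.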

Now form the Kodaira–Parshin family \(X=X_q \longrightarrow Y'_{q} \longrightarrow Y\) for the group \(\mathrm {Aff}(q)\) and choose a set S such that it has a good model over the ring of S-integers.

Next we show that there exists a place \(v \notin S\) of K such that:

  1. (i)

    v is friendly (in the sense of Definition 2.7)

  2. (ii)

    \((q_v, q-1)=1\) (recall that \(q_v\) was the cardinality of the residue field at v)

  3. (iii)

    For any odd prime factor r of \(q-1\), the class of \(q_v\) in \(({{\mathbf {Z}}}/r)^*\) has order at least \(8\).

Note that the latter two conditions depend only on the residue class of \(q_v\) modulo \(q-1\). We will produce v by the Chebotarev density theorem, applied to \({{\text {Gal}}}(K'(\zeta _{q-1})/{{\mathbf {Q}}})\). By hypothesis, \(K'\) and \({{\mathbf {Q}}}(\zeta _{q-1})\) are linearly disjoint over \({{\mathbf {Q}}}\), so the map

$$\begin{aligned} {{\text {Gal}}}(K'(\zeta _{q-1})/{{\mathbf {Q}}}) \longrightarrow {{\text {Gal}}}(K' / {{\mathbf {Q}}}) \times {{\text {Gal}}}({{\mathbf {Q}}}(\zeta _{q-1}) / {{\mathbf {Q}}}) \end{aligned}$$

is an isomorphism.

If K has no CM subfield, choose \(\sigma \in {{\text {Gal}}}(K'/{{\mathbf {Q}}})\) arbitrarily. Otherwise let E be the maximal CM subfield of K, and let \(E^+\) be its maximal totally real subfield; choose some \(\sigma \in {{\text {Gal}}}(K' / E^+) \subseteq {{\text {Gal}}}(K'/{{\mathbf {Q}}})\) inducing the nontrivial automorphism of E over \(E^+\).

By the Chinese Remainder Theorem, we can choose a residue class \(a \in ({{\mathbf {Z}}}/(q-1))^*\) whose reduction modulo r is a primitive root for \(({{\mathbf {Z}}}/r)^*\) for every prime factor r of \((q-1)\).

By Chebotarev density, there is a place \(\wp \) of \(K'(\zeta _{q-1})\) such that the Frobenius \(\mathrm {Frob}_{\wp }\) is the element \((\sigma , a)\) of \({{\text {Gal}}}(K'(\zeta _{q-1})/{{\mathbf {Q}}}) \simeq {{\text {Gal}}}(K' / {{\mathbf {Q}}}) \times ({{\mathbf {Z}}}/ (q-1))^*\). Let p be the prime of \({{\mathbf {Q}}}\) below \(\wp \); thus \(p \equiv a\) modulo \(q-1\). The place v of K below \(\wp \) has residue field of size \(q_v = p^i\), with \(i \leqslant [K{:}\,{{\mathbf {Q}}}]\); therefore, if r is an odd prime factor of \((q-1)\), the order of \(q_v \text{ mod } r\) is at least \(\left\lceil \frac{r-1}{[K{:}\,{{\mathbf {Q}}}]} \right\rceil \geqslant 8\). For the last inequality we used property (i) of q.

If K admits a CM subfield then the place of \(E^+\) below \(\wp \) is inert in E, by choice of \(\sigma \). This shows that there indeed exists v as desired.
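The arithmetic behind conditions (ii) and (iii) on v can be checked on a small case. The sketch below finds a residue class a modulo q - 1 that is a primitive root modulo every prime factor of q - 1, and then inspects the orders of its powers. The Chebotarev step itself is not simulated, and the helper names `prime_factors`, `mult_order` and `primitive_class` are illustrative assumptions.

```python
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def mult_order(a, r):
    """Multiplicative order of a modulo r (assumes gcd(a, r) == 1)."""
    k, x = 1, a % r
    while x != 1:
        x = (x * a) % r
        k += 1
    return k


def primitive_class(m):
    """Smallest a in (Z/m)^* that is a primitive root mod every prime r | m."""
    primes = prime_factors(m)
    for a in range(1, m):
        if gcd(a, m) == 1 and all(mult_order(a, r) == r - 1 for r in primes):
            return a
    return None
```

With q - 1 = 106 = 2 · 53 and a = `primitive_class(106)`, the power a^i modulo 53 has order 52 / gcd(i, 52), which is at least 8 for every 1 ≤ i ≤ 6. This matches the bound on the order of q_v = p^i obtained above.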

Now consider the Kodaira–Parshin family \(X=X_q \longrightarrow Y'_{q} \longrightarrow Y\) for the group \(\mathrm {Aff}(q)\) and write \(d_q\) for the relative dimension of \(X \rightarrow Y\). For any \(y \in Y(K)\) property (iii) of Kodaira–Parshin covers (page 26), and the surjection \(\mathrm {Aff}(q) \twoheadrightarrow {{\mathbf {F}}}_q^* \simeq {{\mathbf {Z}}}/(q-1)\), gives rise to a map of \(G_K\)-sets

$$\begin{aligned} \pi ^{-1}(y) \longrightarrow \underbrace{ H^1_{\mathrm {et}}(Y_{{\bar{K}}}, {{\mathbf {Z}}}/(q-1)).}_{M} \end{aligned}$$
(5.5)

Let \(\Upsilon \subseteq M\) be the image of the map. In explicit coordinates, the map (5.5) has been studied in Lemma 2.11 [see also the remark after (iii) on p. 26]. By that lemma, all fibers of the map have the same size. Therefore, in view of (5.3), it is enough to show that \(\mathrm {size}_v(\Upsilon ) < \frac{1}{d_q+1}.\)

Now M has the structure of a (2g)-dimensional free module over \({{\mathbf {Z}}}/(q-1)\). On choosing an identification of M with \(({{\mathbf {Z}}}/ (q-1))^{2g}\), the set \(\Upsilon \) consists of those elements \((y_1, y_1', \dots , y_g, y_g')\) such that the elements \(y_1, y_1', \ldots , y_g, y_g'\) generate \(({{\mathbf {Z}}}/ (q-1))\). (This is shown in the proof of Lemma 2.11.)

M is also equipped with a Galois-equivariant Weil pairing

$$\begin{aligned} \langle -, - \rangle {:}\,M \times M \rightarrow \mu _{q-1}^{\vee } := {{\text {Hom}}}(\mu _{q-1}, {{\mathbf {Z}}}/(q-1){{\mathbf {Z}}}). \end{aligned}$$

The Weil pairing is perfect, i.e. the corresponding map \(M \rightarrow {{\text {Hom}}}(M, \mu _{q-1}^{\vee })\) is an isomorphism. The Frobenius at v induces, in particular, an automorphism \(T{:}\,M \rightarrow M\) that satisfies

$$\begin{aligned} \langle T v_1, T v_2 \rangle = q_v^{-1} \langle v_1, v_2 \rangle . \end{aligned}$$

We want to bound the number of elements of M belonging to T-orbits of size less than \(8\). These elements are contained in the union of the submodules \(\ker (T^i-1)\) for \(1 \leqslant i \leqslant 8\). If \(m_1, m_2 \in \ker (T^i-1)\) then \((q_v^{-i} - 1) \langle m_1, m_2 \rangle = 0\). For every odd prime factor r of \(q-1\) we know that \(q_v^i\) is not congruent to 1 modulo r; therefore \((q_v^i-1)\) is relatively prime to r. Thus \(2\langle m_1, m_2 \rangle = 0\) for any \(m_1, m_2 \in \ker (T^i-1)\).

Now if A is a finite abelian group endowed with a nondegenerate pairing \(A \times A \rightarrow {{\mathbf {Q}}}/{{\mathbf {Z}}}\) then any subgroup \(B \subset A\) such that \(\langle B, B \rangle = 0\) has order at most \(\sqrt{|A|}\). Applying this to the subgroup \(2\ker (T^i-1)\) of 2M we find

$$\begin{aligned} \left| 2\ker (T^i-1) \right| \leqslant \left( \frac{q-1}{2} \right) ^g \implies \left| \ker (T^i-1) \right| \leqslant 2^g (q-1)^g. \end{aligned}$$

Hence, the number of elements of M contained in the union of the submodules \(\ker (T^i-1)\) for \(1 \leqslant i \leqslant 8\) is at most \(8\cdot 2^{g} (q-1)^g\).
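The passage from the isotropy statement to the bound on \(\ker (T^i-1)\) can be spelled out as follows. This is one way to fill in the step, not necessarily the authors' own. We write \(q-1 = 2m\) with m odd, which is possible because 4 does not divide \(q-1\), and M[2] for the 2-torsion of M; both notations are introduced only here. Since m is odd, \(M = M[2] \oplus 2M\) is an orthogonal decomposition, so the pairing remains nondegenerate on 2M.

$$\begin{aligned} |2M|&= m^{2g}, \qquad \langle 2a, 2b \rangle = 2 \cdot 2\langle a, b \rangle = 0 \ \text{ for } a, b \in \ker (T^i-1), \\ \left| 2\ker (T^i-1) \right|&\leqslant \sqrt{|2M|} = m^{g} = \left( \frac{q-1}{2} \right) ^g, \\ \left| \ker (T^i-1) \right|&= \left| \ker (T^i-1) \cap M[2] \right| \cdot \left| 2\ker (T^i-1) \right| \leqslant 2^{2g} \left( \frac{q-1}{2} \right) ^g = 2^g (q-1)^g. \end{aligned}$$

The last line uses that multiplication by 2 maps \(\ker (T^i-1)\) onto \(2\ker (T^i-1)\) with kernel \(\ker (T^i-1) \cap M[2]\), and that \(|M[2]| = 2^{2g}\).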

It remains to give an upper bound for the “\(\mathrm {size}_v\)” of \(\Upsilon \), the image of (5.5). The number of generating (2g)-tuples in \({{\mathbf {Z}}}/N\) equals \(\# ({{\mathbf {Z}}}/N)^* \times {\mathbf {P}}^{2g-1}({{\mathbf {Z}}}/N)\), which equals \( N^{2g} \cdot \prod _{p|N} (1-p^{-2g}) \geqslant \frac{1}{2} N^{2g}.\) So \(\Upsilon \) has at least \(\frac{1}{2} (q-1)^{2g}\) elements, of which at most \(8\cdot 2^{g} (q-1)^g\) belong to Frobenius orbits of size \(8\) or smaller. It follows that

$$\begin{aligned} \mathrm {size}_v(\pi ^{-1} (y)) {\mathop {\leqslant }\limits ^{(5.3)}} \mathrm {size}_v(\Upsilon )&\leqslant \frac{8\cdot 2^{g} (q-1)^g}{\frac{1}{2} (q-1)^{2g} } = \frac{8\cdot 2^{g+1}}{(q-1)^{g}} \\&< \frac{1}{\underbrace{(g-1/2) (q-1)}_{d_q} +1}, \end{aligned}$$

the last inequality by property (iii) of the prime q. This concludes the proof of (5.4). \(\square \)
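The count of generating tuples used in the proof above can be sanity-checked by brute force for small parameters. The sketch below compares a direct enumeration with the product formula \(N^{k} \prod _{p|N} (1-p^{-k})\), where k plays the role of 2g. The function names `count_generating_tuples` and `product_formula` are illustrative assumptions, and a k-tuple is taken to generate \({{\mathbf {Z}}}/N\) exactly when the gcd of its entries with N equals 1.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def count_generating_tuples(N, k):
    """Brute-force count of k-tuples in (Z/N)^k whose entries generate Z/N."""
    count = 0
    for t in product(range(N), repeat=k):
        common = N
        for x in t:
            common = gcd(common, x)
        if common == 1:  # the entries generate Z/N iff gcd(x_1, ..., x_k, N) = 1
            count += 1
    return count


def product_formula(N, k):
    """N^k times the product over primes p | N of (1 - p^-k), computed exactly."""
    value = Fraction(N ** k)
    p, n = 2, N
    while n > 1:
        if n % p == 0:
            value *= 1 - Fraction(1, p ** k)
            while n % p == 0:
                n //= p
        p += 1
    return value
```

For N = 6 and k = 2 both functions give 24, which is at least half of 6^2 = 36. In general the lower bound by \(\frac{1}{2} N^{k}\) for \(k \geqslant 2\) follows from \(\prod _p (1-p^{-2}) = 6/\pi ^2 > \frac{1}{2}\).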

6 Rational points on the base of an abelian-by-finite family

In this section we prove Proposition 5.3, which is in essence a variant of Proposition 3.4, and which we restate for the reader’s convenience.

Proposition 5.3

Let Y be a curve over K of genus \(g \geqslant 2\).

Let \(X \rightarrow Y' {\mathop {\rightarrow }\limits ^{\pi }} Y\) be an abelian-by-finite family over Y, with full monodromy (see Definition 5.1 and subsequent discussion). Let d be the relative dimension of \(X \rightarrow Y'\). Suppose that \(X \rightarrow Y' {\mathop {\rightarrow }\limits ^{\pi }} Y\) admits a good model over the ring \(\mathcal {O}\) of S-integers of K. Let \(v \notin S\) be a friendly place of K (Definition 2.7).

Let \(\mathrm {size}_v\) be as in (5.2). Then the set

$$\begin{aligned} Y(K)^* := \left\{ y \in Y(K){:}\,\mathrm {size}_v(\pi ^{-1} (y)) < \frac{1}{d+1}\right\} \end{aligned}$$

is finite.

Here’s what happens in the proof. There are two central lemmas, Lemmas 6.1 and 6.2.

  • The assumption that \(\mathrm {size}_v(\pi ^{-1} (y)) < \frac{1}{d+1}\) guarantees that most points in the fiber \(\pi ^{-1} (y)\) are defined over fields of large degree over \({{\mathbf {Q}}}_p\). As discussed in Sect. 1.3, we will use the fact that an extension \(K_v\) of \({{\mathbf {Q}}}_p\) is of large degree to bound the centralizer of Frobenius for a variety defined over \(K_v\).

    Some care is required with indexing since we only have most points; in particular, we need to identify the fibers over p-adically nearby points y. The discussion of indexing occupies the first part of the proof; the bound on the Frobenius centralizer is in the proof of Lemma 6.2.

  • Lemma 6.1 handles the possible failure of semisimplicity (see discussion in Sect. 1.4). As in Lemma 4.4, we use constraints on Hodge weights coming from global representations (Lemma 2.9) to show that only finitely many fibers can give rise to non-semisimple Galois representations. This requires a general position argument in linear algebra (Lemma 6.4).

Proof

Throughout the proof, we denote by p the prime of \({{\mathbf {Q}}}\) below v; “Tate module” always means “p-adic Tate module,” and “étale cohomology” means geometric étale cohomology taken with \({{\mathbf {Q}}}_p\) coefficients.

Recall also that we have fixed an algebraic closure \({\overline{K}}\) with Galois group \(G_K\). Fix an extension of v to that field; the completion of \({\overline{K}}\) gives an algebraic closure \(\overline{K_v}\) of \(K_v\). In particular, if \(L \subset {\overline{K}}\) is unramified at v, we obtain a Frobenius element \(\mathrm {Frob}_v \in \mathrm {Gal}(L/K)\).

Fix \(y_0 \in Y(K)^*\). It is sufficient to show that there are only finitely many points of \(Y(K)^*\) that lie in the residue disk

$$\begin{aligned} \Omega _v = \{y \in Y(K_v){:}\,y \equiv y_0 \text{ modulo } v\}, \end{aligned}$$

which we are regarding as a \(K_v\)-analytic manifold.

For each \(y \in Y(K)\), let \(E_y\) be the ring of regular functions on the zero-dimensional scheme \(\pi ^{-1}(y)\); this is an étale K-algebra and \({{\text {Hom}}}(E_y, {\bar{K}})\) is identified with the \(G_K\)-set \(\pi ^{-1}(y)_{{\bar{K}}}\) of preimages of y under \(\pi \). By our assumptions, the \(G_K\)-set \(\pi ^{-1}(y)_{{\bar{K}}}\) is unramified at v. Write \(E_0\) for \(E_{y_0}\).

The fiber \(X_y\) of \(X \rightarrow Y\) above \(y \in Y(K)\) is a priori a K-scheme, but the factorization \(X \rightarrow Y' \rightarrow Y\) gives it the structure of an \(E_y\)-scheme; in particular its de Rham cohomology \(H^1_{\mathrm {dR}}(X_y/K)\) has the structure of a free \(E_y\)-module. Moreover, the polarization on X induces an \(E_y\)-bilinear symplectic pairing

$$\begin{aligned} H^1_{\mathrm {dR}}(X_y/K) \times H^1_{\mathrm {dR}}(X_y/K) \longrightarrow E_y. \end{aligned}$$

Write \(E_{0,v} = E_0 \otimes _{K} K_v\), and \(V_v := H^1_{\mathrm {dR}}(X_{y_0}/K_v)\). Then \(V_v\) is a free \(E_{0,v}\)-module equipped with an (\(E_{0,v}\)-bilinear) symplectic form. Denote by \({\mathcal {H}}_v \subset {\mathcal {G}}_v\) the \(K_v\)-schemes defined by Weil restriction:

$$\begin{aligned} {\mathcal {G}}_v&= \mathrm {Res}^{E_{0,v}}_{K_v} \ \mathrm {Gr}(V_v,d)\\ {\mathcal {H}}_v&= \mathrm {Res}^{E_{0,v}}_{K_v} \ \mathrm {LGr}(V_v,\omega ). \end{aligned}$$

Here \(\mathrm {Res}^{E_{0, v}}_{K_v}\) denotes Weil restriction of scalars, \(\mathrm {Gr}(V_v,d)\) classifies free \(E_{0,v}\)-submodules of rank d inside \(V_v\), and \(\mathrm {LGr}\) classifies free rank-d submodules on which the symplectic pairing is trivial.

Then the period map at \(y_0\) gives a \(K_v\)-analytic function

$$\begin{aligned} \Phi _v{:}\,\Omega _v \longrightarrow {\mathcal {H}}_v \end{aligned}$$

(see Sect. 3.3 for a more detailed discussion of the radius of convergence; in particular it defines a rigid analytic function on a domain containing \(\Omega _v\), i.e. the \(K_v\)-points in a residue disk).

A priori, this period mapping is valued in a suitable Lagrangian Grassmannian of \(K_v\)-linear subspaces inside \(V_v\), but, just as in the discussion of Sect. 4.3, each of these Lagrangian subspaces is actually \(E_{0,v}\)-stable, so that the period mapping takes values inside \({\mathcal {H}}_v\). Lemma 3.3, and the assumption of full monodromy, imply that \(\Phi _v(\Omega _v)\) is Zariski-dense in \({\mathcal {H}}_v\).

To proceed further, as we discussed in the proof sketch, we need to carefully index the points above y. Firstly, \(E_y\) decomposes as a product of fields:

$$\begin{aligned} E_y = \prod _{y'} K(y'), \end{aligned}$$

where the product is over points \(y'\) of the scheme \(Y'\) lying above y. For any such \(y'\), the fiber \(X_{y'}\) of \(X \rightarrow Y'\) above \(y'\) is a d-dimensional abelian variety over the field \(K(y')\); write \(\rho _{y'}\) for the corresponding 2d-dimensional p-adic Galois representation of the absolute Galois group of \(K(y')\).

The base change \(E_y \otimes _K K_v\) splits as a product of fields

$$\begin{aligned} E_{y} \otimes _{K} K_v = \prod _{y', w} K(y')_w \end{aligned}$$
(6.1)

indexed by pairs \((y', w)\), where \(y'\) is a closed point of \(\pi ^{-1}(y)\) as above, and w is a place of \(K(y')\) over v. In this situation we will say, for short, that \((y', w)\) is above \((y, v)\).

Write \(X_{y',w}\) for the base change of \(X_{y}\) along \(E_y \rightarrow K(y')_w\), and \(\rho _{y',w}\) for the \(G_{K(y')_w}\)-representation on its étale cohomology. The de Rham cohomology \(V_v = H^1_\mathrm {dR}(X_{y} / K_v)\) over \(K_v\) splits as a product

$$\begin{aligned} V_v = \prod _{y',w} V_{y',w}, \ \ V_{y',w} = H^1_{\mathrm {dR}}(X_{y',w}/K(y')_w) \end{aligned}$$
(6.2)

in a fashion that is compatible with the \(E_{y} \otimes _K K_v\)-module structure and (6.1). The dimension of each \(V_{y',w}\) over \(K(y')_w\) is the same, namely, 2d.

Crystalline cohomology of the reduction modulo v (or, phrased differently, the Gauss–Manin connection for \(Y' \rightarrow Y\)) gives an isomorphism

$$\begin{aligned} E_y \otimes _{K} K_v {\mathop {\longrightarrow }\limits ^{\sim }} E_{0,v} = E_0 \otimes _{K} K_v \end{aligned}$$
(6.3)

whenever y belongs to the residue disk \(\Omega _v\) of \(y_0\). In particular this induces a bijection

$$\begin{aligned} (y', w)\,\text{ above }\,(y,v) {\mathop { \longleftrightarrow }\limits ^{\sim }} (y_0',w_0)\,\text{ above }\,(y_0, v) \end{aligned}$$
(6.4)

since both sides are identified with the spectrum of the common algebra of (6.3). Moreover, the identification (6.3) is compatible with the Gauss–Manin isomorphism

$$\begin{aligned} H^1_{\mathrm {dR}}(X_y/K_v) {\mathop {\longrightarrow }\limits ^{\mathrm {GM}}} H^1_{\mathrm {dR}}(X_{y_0}/K_v). \end{aligned}$$
(6.5)

If \((y', w)\) corresponds to \((y_0', w_0)\) under this identification, then (6.3) and (6.5) induce

$$\begin{aligned} K(y')_w \simeq K(y_0')_{w_0}, \ \ H^1_{\mathrm {dR}}(X_{y', w}/K(y')_w) \simeq H^1_{\mathrm {dR}}(X_{y_0', w_0}/K(y_0')_{w_0}). \end{aligned}$$
(6.6)

Also, (6.2) induces the splitting of the variety \({\mathcal {H}}_v\) as a product

$$\begin{aligned} {\mathcal {H}}_v = \prod _{(y_0', w)} {\mathcal {H}}_{(y_0', w)}, \end{aligned}$$

where the product is taken over \((y_0', w)\) above \((y_0, v)\), and where

$$\begin{aligned} {\mathcal {H}}_{(y_0', w)} = \mathrm {Res}^{K(y_0')_w}_{K_v} \ \mathrm {LGr}(V_{y_0', w},\omega ). \end{aligned}$$

We have a similar decomposition \({\mathcal {G}}_v = \prod _{(y_0', w)} {\mathcal {G}}_{(y_0', w)}.\)

If \(y \in \Omega _v\), and if \((y',w)\) above \((y, v)\) corresponds to \((y_0', w_0)\) under (6.4), then

$$\begin{aligned} \text{ projection } \text{ to } {\mathcal {H}}_{(y_0', w_0)} \text{ of } \Phi _v(y) = F^1 H^1_{\mathrm {dR}}(X_{y',w}), \end{aligned}$$
(6.7)

where we identify \(F^1 H^1_{\mathrm {dR}}(X_{y', w})\) with a Lagrangian in the \(K(y_0')_{w_0}\)-vector space \(V_{y_0', w_0}\) using the Gauss–Manin connection (6.6). This result (6.7) comes down to the fact, already noted, that (6.4) and (6.5) are compatible.

We will establish the following two lemmas.

Lemma 6.1

(Generic Simplicity) There is a finite subset \(F \subset \Omega _v \cap Y(K)^*\) such that, for \(y \in \left( \Omega _v \cap Y(K)^* \right) - F\), there exists \((y',w)\) above \((y, v)\) such that:

  1. (i)

    \([K(y')_w{:}\,K_v] \geqslant 8\)

  2. (ii)

    \(\rho _{y'}\) is simple as a \(G_{K(y')}\)-representation.

Observe that for \(y \in \left( \Omega _v \cap Y(K)^* \right) - F\), and \(y'\) above y, there are only finitely many possibilities for the isomorphism class of the field \(K(y')\). Thus, by Lemma 2.3, there are only finitely many possibilities for the isomorphism class of the pair \((K(y'), \rho _{y'})\), and so also only finitely many possibilities for the isomorphism class of any pair \((K(y')_w, \rho _{y'}|_{K(y')_w})\) arising from \((y', w)\) as in Lemma 6.1. The proof of Proposition 5.3 will then follow from Lemma 6.1 above and Lemma 6.2 below. \(\square \)

Lemma 6.2

(Galois representations really do vary in our family) Fix a finite field extension \(K'_v\) of \(K_v\), with \([K'_v{:}\,K_v] \geqslant 8\), and a Galois representation \(\rho '\) of the absolute Galois group of \(K'_v\).

There are only finitely many \(y \in \Omega _v \cap Y(K)\) for which there exist \((y',w)\) satisfying conditions (i) and (ii) of Lemma 6.1 and moreover the pair

$$\begin{aligned} (K(y')_w, \rho _{y', w})\,\text{ is } \text{ isomorphic } \text{ to }\,(K'_v,\rho ') \end{aligned}$$

i.e. there is an isomorphism \(K(y')_w \rightarrow K'_v\) carrying the isomorphism class of \(\rho '\) to that of \(\rho _{y',w}\).

To prove Lemmas 6.1 and 6.2 we shall analyze the period mapping more carefully.

Proof of Lemma 6.2

Under the correspondence of p-adic Hodge theory, \(\rho _{y',w}\) corresponds to the \(K(y')_w\)-vector space \(H^1_{\mathrm {dR}}(X_{y'}/K(y')_w)\), together with its natural semilinear Frobenius operator \(\phi \), and the (two-step) filtration defined by \(F^1 H^1_{\mathrm {dR}}(X_{y'}/K(y')_w)\).

Suppose that \((y', w)\) corresponds to \((y_0', w_0)\) under (6.4). Using the isomorphism (6.6) the triple just described corresponds to

$$\begin{aligned} \bigl (H^1_{\mathrm {dR}}(X_{y_0'}/K(y_0')_{w_0}), \ \phi _v = \text{semilinear Frobenius}, \ \text{projection of }\Phi _v(y)\text{ to }{\mathcal {H}}_{y_0', w_0}\bigr ). \end{aligned}$$

It is enough to show that the set of y, for which this triple belongs to a fixed isomorphism class, is finite.

Belonging to a fixed isomorphism class means that the projection of \(\Phi _v(y)\) to \({\mathcal {H}}_{y_0', w_0}\) lies inside a single orbit for the action of the Frobenius centralizer \(Z(\phi _v)\) on \({\mathcal {G}}_{y_0', w_0}\), and so also a single orbit of \(Z(\phi _v^{[K_v{:}\,{{\mathbf {Q}}}_p]})\) on \({\mathcal {G}}_{y_0', w_0}\). (In both cases, these centralizers are taken inside \(K(y_0')_{w_0}\)-linear automorphisms of \(V_{y_0', w_0}\).)

Apply Lemma 2.1 to the field extension \(K(y_0')_{w_0}/K_v\) to see that this Frobenius centralizer has \(K_v\)-dimension at most \((\dim _{K(y_0')_{w_0}} V_{y_0', w_0})^2=4d^2\).

As noted earlier, the period map \(\Phi _v\) has Zariski-dense image in the \(K_v\)-variety \({\mathcal {H}}_v\); therefore this remains true when projected to \({\mathcal {H}}_{y_0', w_0}\). Since \(\dim _{K_v} {\mathcal {H}}_{y_0', w_0} = [K(y_0')_{w_0}{:}\,K_v] \cdot \frac{d(d+1)}{2} \geqslant 4 d(d+1) >4d^2\), Lemma 3.3 completes the proof of Lemma 6.2. \(\square \)

Proof of Lemma 6.1

Let us call \(y \in Y(K)^* \cap \Omega _v\) “bad” when, for every \((y',w)\) above \((y, v)\) such that \([K(y')_w{:}\,K_v] \geqslant 8\), the representation \(\rho _{y'}\) fails to be simple. We must show there are only finitely many bad \(y \in Y(K)^* \cap \Omega _v\).

Sublemma: If \(y \in Y(K)^* \cap \Omega _v\) is bad, there exists:

  • \((y', w)\) above \((y, v)\), with \([K(y')_w{:}\,K_v] \geqslant 8\);

  • a nonzero proper Frobenius-stable subspace \(W^{\mathrm {dR}}_{y',w}\) of \(H^1_{\mathrm {dR}}(X_{y'}/K(y')_w)\) such that \(\dim F^1 W^{\mathrm {dR}}_{y',w} \geqslant \dim (W^{\mathrm {dR}}_{y',w})/2\). (Here, and in the discussion below, dimensions are dimensions over \(K(y')_w\).)

Proof of sublemma: Take a bad \(y \in Y(K)^* \cap \Omega _v\). For each \(y'\) above y let \(W_{y'}\) be a nonzero subrepresentation of \(\rho _{y'}\) of minimal positive dimension. (It is therefore possible that \(W_{y'}\) is all of \(\rho _{y'}\)). For each place w of \(K(y')\) we define \(W^{\mathrm {dR}}_{y',w}\) by applying p-adic Hodge theory to \(W_{y'} \leqslant \rho _{y'}\); thus \(W^{\mathrm {dR}}_{y', w}\) is a \(\phi \)-stable submodule of \(H^1_{\mathrm {dR}}(X_{y'}/K(y')_w)\).

Note that

$$\begin{aligned} \dim W^{\mathrm {dR}}_{y',w} \leqslant d\,\text{ whenever }\,[K(y')_w{:}\,K_v] \geqslant 8. \end{aligned}$$
(6.8)

Indeed, because y is bad, the supposition \([K(y')_w{:}\,K_v] \geqslant 8\) forces \(\rho _{y'}\) to be non-simple; because it preserves (up to similitude) a bilinear form, we have \(\dim W_{y'} \leqslant \frac{1}{2} \dim \rho _{y'}\), thus (6.8).

Now assume that, for each \((y',w)\) above \((y, v)\) satisfying \([K(y')_w{:}\,K_v] \geqslant 8\), we have

$$\begin{aligned} \dim F^1 W^{\mathrm {dR}}_{y',w} < \frac{1}{2} \dim W^{\mathrm {dR}}_{y',w}. \end{aligned}$$

We will derive a contradiction, which will conclude the proof.

By Lemma 2.10, applied to \(W_{y'}\) as a Galois representation of \(K(y')\), we have

$$\begin{aligned} \sum _{w|v} [K(y')_w{:}\,K_v] \frac{ \dim F^1 W^{\mathrm {dR}}_{y',w}}{\dim W^{\mathrm {dR}}_{y',w}} = \frac{1}{2} [K(y'){:}\,K] \end{aligned}$$
(6.9)

for any \(y'\) a closed point of \(\pi ^{-1}(y)\). Sum over \(y'\) above y; using (6.8) we get

$$\begin{aligned} \sum _{[K(y')_w{:}\,K_v] \geqslant 8} [K(y')_w{:}\,K_v] \left( \frac{1}{2} - \frac{1}{2 d}\right)&+ \sum _{[K(y')_w{:}\,K_v] < 8} [K(y')_w{:}\,K_v] \\&\geqslant \frac{1}{2} \sum _{(y',w)} [K(y')_w{:}\,K_v]. \end{aligned}$$
(6.10)

Here all summations are over \((y', w)\) above \((y, v)\). Therefore,

$$\begin{aligned} \sum _{[K(y')_w{:}\,K_v] < 8} \frac{1}{2} [K(y')_w{:}\,K_v] \geqslant \frac{1}{2d}\sum _{[K(y')_w{:}\,K_v] \geqslant 8} [K(y')_w{:}\,K_v]. \end{aligned}$$
(6.11)

Let \(e_1, \dots , e_k\) be the cycle structure of \(\mathrm {Frob}_v\) acting on the \({\bar{K}}\) points of \(\pi ^{-1}(y)\). The inequality above means that

$$\begin{aligned} \frac{1}{2} \sum _{i{:}\,e_i < 8} e_i \geqslant \frac{1}{2d} \sum _{i{:}\,e_i \geqslant 8} e_i, \end{aligned}$$

which is to say that \(\mathrm {size}_v(\pi ^{-1}(y)) \geqslant \frac{1}{d+1}\). This contradicts the assumption that \(y \in Y(K)^*\). \(\square \)
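For convenience, the two elementary steps in this argument can be written out. This is a sketch which assumes that \(\mathrm {size}_v\) in (5.2) is the proportion of geometric points of the fiber lying in \(\mathrm {Frob}_v\)-orbits of size less than 8; the symbols S and L are introduced only here. First, for \((y', w)\) with \([K(y')_w{:}\,K_v] \geqslant 8\), the assumed strict inequality together with (6.8) gives, since dimensions are integers,

$$\begin{aligned} \frac{\dim F^1 W^{\mathrm {dR}}_{y',w}}{\dim W^{\mathrm {dR}}_{y',w}} \leqslant \frac{1}{2} - \frac{1}{2 \dim W^{\mathrm {dR}}_{y',w}} \leqslant \frac{1}{2} - \frac{1}{2d}, \end{aligned}$$

which is the coefficient appearing in (6.10). Second, writing \(S = \sum _{e_i < 8} e_i\) and \(L = \sum _{e_i \geqslant 8} e_i\), the final inequality reads \(L \leqslant dS\), so that

$$\begin{aligned} \mathrm {size}_v(\pi ^{-1}(y)) = \frac{S}{S+L} \geqslant \frac{S}{S + dS} = \frac{1}{d+1}. \end{aligned}$$

Here \(S > 0\), because \(S + L > 0\) and \(L \leqslant dS\).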

We now return to the proof of Lemma 6.1. Fix any \((y_0', w)\) above \((y_0, v)\) with \([K(y_0')_w{:}\,K_v] \geqslant 8\). Such a \((y_0', w)\) exists because of the assumption that \(y_0 \in Y(K)^*\). In view of the Sublemma and (6.7), it is enough to show that there are only finitely many \(y \in Y(K) \cap \Omega _v\) such that the projection of \(\Phi _v(y)\) to \({\mathcal {H}}_{(y_0',w)}\) lies in the subvariety

$$\begin{aligned} {\mathcal {H}}_{(y_0', w)}^{\mathrm {bad}} \subset {\mathcal {H}}_{(y_0', w)} \end{aligned}$$

defined as the set of Lagrangian \(K(y_0')_w\)-subspaces \(F \subset V_{y_0',w}\) (recall (6.2) for the definition) for which there exists a nonzero proper Frobenius-stable subspace \(W \subset V_{y_0',w}\) satisfying

$$\begin{aligned} \dim (F \cap W) \geqslant \frac{1}{2} \dim (W). \end{aligned}$$
(6.12)

By the lemmas that follow, \({\mathcal {H}}_{(y_0', w)}^{\mathrm {bad}}\) is contained in a proper closed \(K_v\)-subvariety of \({\mathcal {H}}_{(y_0', w)}\); we conclude as in the proof of Lemma 6.2. \(\square \)

Lemma 6.3

Suppose \(L_w\) is a finite unramified extension of \(K_v\) of degree \(r \geqslant 8\). Let \((V, \omega )\) be a symplectic \(L_w\)-vector space, with \(\dim _{L_w} V = 2d\); let \(\phi {:}\,V \rightarrow V\) be semilinear for the Frobenius automorphism of \(L_w/K_v\) and bijective.

Then there is a Zariski-open

$$\begin{aligned} {\mathcal {A}} \subseteq \mathrm {Res}^{L_w}_{K_v} \ \mathrm {LGr}(V,\omega ) \end{aligned}$$

(where \(\mathrm {LGr}(V, \omega )\) is the Lagrangian Grassmannian, and \(\mathrm {Res}^{L_w}_{K_v}\) denotes Weil restriction of scalars from \(L_w\) to \(K_v)\) with the following property:

If \(F \subset V\) is a Lagrangian \(L_w\)-subspace, corresponding to a point of \({\mathcal {A}}(K_v)\), there is no nonzero proper \(\phi \)-invariant \(L_w\)-subspace W of V satisfying (6.12).

Proof

Just as in Lemma 2.1, \(V \otimes _{K_v} \overline{K_v}\) splits into 2d-dimensional spaces \(V_1, \dots , V_r\) indexed by embeddings \(L_w \hookrightarrow \overline{K_v}\); we can order them so that \(\phi \) induces isomorphisms \(V_i \simeq V_{i+1}\) for \(1 \le i \le r-1\), and thus can identify them all with \(V_1\) (we do not use the “cyclic” isomorphism \(V_r \simeq V_1\)).

The base extension \(W \otimes _{K_v} \overline{K_v}\) of any \(\phi \)-invariant \(L_w\)-subspace yields a subspace \(\bigoplus W_i \leqslant \bigoplus V_i\), where each \(W_i\) corresponds to \(W_1\) under the above identifications. Similarly, the base extension of a Lagrangian \(L_w\)-subspace \(F \leqslant V\) gives a subspace \(\bigoplus F_i \leqslant \bigoplus V_i\), where each \(F_i\) is Lagrangian. If (6.12) is satisfied, then \(\dim (F_i \cap W_i) \geqslant \frac{1}{2} \dim (W_i)\) for each \(1 \leqslant i \leqslant r\).

The next, and final, Lemma shows that the set of \((F_1, \dots , F_r)\) for which such a W exists is contained in a proper, Zariski-closed subset. Thus there is a Zariski-open set inside

$$\begin{aligned} \left( \mathrm {Res}^{L_w}_{K_v} \ \mathrm {LGr}(V,\omega ) \right) \times _{K_v} \overline{K_v} \end{aligned}$$

such that, if F belongs to this Zariski-open, it has the property quoted in the statement. Taking the intersection of Galois conjugates of this set, we get the desired Zariski-open inside \(\mathrm {Res}^{L_w}_{K_v} \ \mathrm {LGr}(V,\omega )\). \(\square \)

Lemma 6.4

Let \((V, \omega )\) be a symplectic vector space over a field of characteristic zero with \(\dim (V) =2d\); write \(\mathrm {LGr}(V, \omega )\) for the Grassmannian of Lagrangian subspaces. Let \(\mathrm {E}\) be the set of r-tuples of Lagrangian subspaces

$$\begin{aligned} (F_1, \dots , F_r) \in \mathrm {LGr}(V, \omega )^r \end{aligned}$$

for which there exists a proper nonzero subspace \(W \subset V\) such that \(\dim (F_j \cap W) \geqslant \frac{1}{2} \dim (W)\) for every j. If \(r \geqslant 8\) then \(\mathrm {E}\) is contained in a proper, Zariski-closed subset of \(\mathrm {LGr}(V, \omega )^r\).

Proof

In fact our argument will show that \(r \geqslant 5\) is enough.

First we argue that \(\mathrm {E}\) is Zariski-closed. Consider the product \(\mathrm {Gr}(V) \times \mathrm {LGr}(V, \omega )^r\) parametrizing tuples \((W, F_1, F_2, \ldots , F_r)\) such that each \(F_i\) is Lagrangian. For each i, the dimension \({\text {dim}} F_i \cap W\) is (Zariski) upper semicontinuous; so the set \(\tilde{\mathrm {E}}\) of tuples satisfying the conditions described is closed. Now \(\mathrm {E}\) is the image of the closed set \(\tilde{\mathrm {E}}\) under a proper map, so it is itself closed.

Since \(\mathrm {E}\) is closed it’s enough to produce a single tuple \((F_1, \ldots , F_r)\) not in \(\mathrm {E}\).

Take \(e_1, \ldots , e_d, e_1', \ldots , e_d'\) a standard symplectic basis for V, so \(\langle e_i, e_i' \rangle = 1\), and \(\langle e_i', e_i \rangle = -1\), and all other pairings between basis vectors are zero. Let

$$\begin{aligned} F_1= & {} {\text {span}} (e_1, e_2, \ldots , e_d)\\ F_2= & {} {\text {span}} (e_1', e_2', \ldots , e_d')\\ F_3= & {} {\text {span}} (e_1 + e_1', e_2 + e_2', \ldots , e_d + e_d')\\ F_4= & {} {\text {span}} (e_1 + 2 e_1', e_2 + 4 e_2', \ldots , e_d + 2d e_d'). \end{aligned}$$

Now each of these four spaces is maximal isotropic, and any two of them have trivial intersection.

Write \(\pi _{12}{:}\,V \rightarrow F_1\) for the projection along the decomposition \(V = F_1 \oplus F_2\), and similarly define \(\pi _{21}{:}\,V \rightarrow F_2\). Both \(\pi _{12}\) and \(\pi _{21}\) are isomorphisms when restricted to either \(F_3\) or \(F_4\). Write \(\Phi _{12;3}{:}\,F_1 \rightarrow F_2\) for the isomorphism

$$\begin{aligned} F_1 {\mathop {\longleftarrow }\limits ^{\pi _{12}^{-1}}} F_3 {\mathop {\longrightarrow }\limits ^{\pi _{21}}} F_2. \end{aligned}$$

In explicit coordinates \(\Phi _{12;3}\) takes \(e_i\) to \(e_i'\), and the similar map \(\Phi _{12;4}\) takes \(e_i\) to \(2i e_i'\).

We claim that only finitely many W can satisfy the condition stated in the Lemma with respect to \(F_1, F_2, F_3, F_4\). Suppose given such a W. Since \(W \cap F_1\) and \(W \cap F_2\) have trivial intersection with each other, and they each have dimension at least \(\frac{1}{2} \dim (W)\), we have a direct sum decomposition

$$\begin{aligned} W = \left( W \cap F_1\right) \oplus \left( W \cap F_2\right) \end{aligned}$$
(6.13)

and an equality \(\dim (W \cap F_1) = \dim (W \cap F_2) = \frac{1}{2} \dim {W}\). Similarly, we find that \(\dim (W \cap F_3) = \dim (W \cap F_4) = \frac{1}{2} \dim {W}\).

Next \(\pi _{12}\) gives an isomorphism \(F_3 \rightarrow F_1\); comparing dimensions, we see the restriction

$$\begin{aligned} \pi _{12}{:}\,W \cap F_3 {\mathop {\longrightarrow }\limits ^{\sim }} W \cap F_1 \end{aligned}$$

is an isomorphism as well. Similarly \(\pi _{21}{:}\,W \cap F_3 {\mathop {\longrightarrow }\limits ^{\sim }} W \cap F_2\).

In particular, \(\Phi _{12;3}\) carries \(W \cap F_1\) isomorphically to \(W \cap F_2\). The same reasoning applies to \(\Phi _{12;4}\). Therefore, \(W \cap F_1\) is stable under \(\Phi _{12;4}^{-1} \Phi _{12;3}\), which shows that \(W \cap F_1 \subseteq F_1\) is stable under the map \(e_i \mapsto 2 i e_i\).

There are then finitely many possibilities for \(W \cap F_1\); then there are also finitely many possibilities for \(W \cap F_2 = \Phi _{12;3} (W \cap F_1)\) and then by (6.13) finitely many possibilities for W; call them \(W_1, \ldots , W_N\).

Now, for each \(W_i\), the condition that \(\dim (F_5 \cap W_i) \geqslant \frac{1}{2} \dim (W_i)\) cuts out a proper Zariski-closed subset of the Lagrangian Grassmannian parametrizing \(F_5\); thus we may choose \(F_5\) so that no W satisfies the dimension bound. \(\square \)
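The explicit subspaces \(F_1, \dots , F_4\) used in this proof can be checked mechanically for a small value of d. The sketch below verifies, in exact rational arithmetic, that each is isotropic of dimension d and that any two of them intersect trivially. The helper names `rank`, `omega` and `lagrangians`, and the coordinate ordering \((e_1, \ldots , e_d, e_1', \ldots , e_d')\), are illustrative assumptions.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) over Q, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def omega(u, v, d):
    """Standard symplectic form: <e_i, e_i'> = 1 and <e_i', e_i> = -1."""
    return sum(u[i] * v[d + i] - u[d + i] * v[i] for i in range(d))


def lagrangians(d):
    """Bases of F_1, ..., F_4 in the coordinates (e_1..e_d, e_1'..e_d')."""
    def vec(i, a, b):
        # the vector a * e_{i+1} + b * e_{i+1}' (i is a 0-based index)
        v = [0] * (2 * d)
        v[i], v[d + i] = a, b
        return v
    return [
        [vec(i, 1, 0) for i in range(d)],            # F_1
        [vec(i, 0, 1) for i in range(d)],            # F_2
        [vec(i, 1, 1) for i in range(d)],            # F_3
        [vec(i, 1, 2 * (i + 1)) for i in range(d)],  # F_4
    ]
```

Two d-dimensional subspaces of V intersect trivially exactly when their bases, stacked together, have rank 2d, which is the criterion used in the check. Since the map \(e_i \mapsto 2i e_i\) has distinct eigenvalues in characteristic zero, its stable subspaces are spanned by subsets of the \(e_i\), which accounts for the finiteness of the candidates for \(W \cap F_1\).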

7 The Kodaira–Parshin family

The argument that we have given for Mordell’s conjecture in Sect. 5 made use of a specific abelian-by-finite family, the Kodaira–Parshin family. In this section we explain how to construct this family, making use (in effect) of an algebraic version of the theory of Hurwitz spaces. We need this theory only in characteristic zero.

7.1 Hurwitz spaces for curves

Proposition 7.1

Let Y be a curve of genus at least 2 over a number field K, and let G be a center-free finite group. Then there is a K-curve \(Y'\) equipped with an étale map \(\pi {:}\,Y' \rightarrow Y\), and a relative curve \(Z \rightarrow Y'\), with the following properties:

  1. (i)

    “\(Y'\) parameterizes G-covers of Y branched at a single point”: for \(y \in Y({\bar{K}})\), there is a bijection between \(\pi ^{-1}(y)\) and the set of G-conjugacy classes of surjections \(\pi _1^{\mathrm {geom}}(Y- y,*) \twoheadrightarrow G\) nontrivial on a loop around y. Moreover, if \(y \in Y(K)\), this identification is \(G_K\)-equivariant.

  2. (ii)

    “Z gives the universal G-cover of Y branched at a single point”: there is a morphism \(Z \rightarrow Y' \times Y\) of relative curves over \(Y'\) (here, we are regarding \(Y' \times Y\) as the trivial family of curves over \(Y'\), with fiber Y everywhere).

    Moreover G acts on Z covering the trivial action on \(Y' \times Y\). This action makes \(Z \rightarrow Y' \times Y\) into a G-covering away from the graph of \(\pi \). If we take the fiber of this morphism of relative curves above \(y' \in Y'({\bar{K}})\), the resulting map \(Z_{y'} \rightarrow Y\) of curves is ramified exactly at \(\pi (y')\). The induced homomorphism

    $$\begin{aligned} \pi _1^{\mathrm {geom}}(Y- \pi (y'),y_0) \rightarrow {{\text {Aut}}}_G(Z_{(y', y_0)}) \cong G \end{aligned}$$

    is exactly (in the conjugacy class of) the surjection from (i) classified by \(y'\).

There are several references on this matter that address much more general settings (e.g. [29, §3.22]) but since none of them give the precise statement we need, we will simply outline a direct proof, descending from the complex analytic analogue, in Sect. 7.3.

Now we apply this to the group \(G =\mathrm {Aff}(q)\):

Definition 7.2

Let Y be a curve of genus at least 2 over a number field K, and let q be a prime number. The Kodaira–Parshin curve family over Y with parameter q will be the sequence of morphisms

$$\begin{aligned} Z_q \longrightarrow Y'_{q} \longrightarrow Y, \end{aligned}$$
(7.1)

obtained from Proposition 7.1 applied to the group \(G = \mathrm {Aff}(q)\).

We now want to form an associated abelian-by-finite family to the Kodaira–Parshin curve family.

7.2 Prym varieties

We first describe the situation fiberwise:

Given a morphism \(C_1 \rightarrow C_2\) of curves over an algebraically closed field, the associated Prym variety is the cokernel of the induced map \(\mathrm {Pic}^0(C_2) \rightarrow \mathrm {Pic}^0(C_1)\) on Jacobians.

Now suppose that the covering \(C_1 \rightarrow C_2\) is Galois, with Galois group \(\mathrm {Aff}(q)\), and ramified over exactly one point of \(C_2\). The degree of this covering is \(q(q-1)\). Rather than take its Prym directly, however, we prefer to use a reduced version. Namely, we can form a smaller degree-q covering \(C_1' \rightarrow C_2\) using the permutation action of \(\mathrm {Aff}(q)\) on \({{\mathbf {Z}}}/q{{\mathbf {Z}}}\), and we are interested in the Prym variety of this associated covering:

$$\begin{aligned} \mathrm {coker}({{\text {Pic}}}^0(C_2) \rightarrow {{\text {Pic}}}^0(C_1')). \end{aligned}$$
(7.2)

We emphasize again that this is not the Prym variety of \(C_1 \rightarrow C_2\) but a “reduced” version of it where the role of \(C_1\) has been replaced by \(C_1'\).

We can reformulate this in terms of \(C_1\), rather than the associated curve \(C_1'\). Both \(\mathrm {Pic}^0(C_2)\) and \(\mathrm {Pic}^0(C_1')\) map to \(\mathrm {Pic}^0(C_1)\), with finite kernel. The image of \(\mathrm {Pic}^0(C_2)\) in \(\mathrm {Pic}^0(C_1)\) is now the connected component of the \(\mathrm {Aff}(q)\)-invariants; similarly the image of \(\mathrm {Pic}^0(C_1')\) in \(\mathrm {Pic}^0(C_1)\) is the connected component of the invariants by the subgroup \(H_{q} = ({{\mathbf {Z}}}/q{{\mathbf {Z}}})^*\), which is a point stabilizer in the permutation action of \(\mathrm {Aff}(q)\) on \({{\mathbf {Z}}}/q{{\mathbf {Z}}}\). In summary, then, the Prym variety of \(C_1' \rightarrow C_2\) is isogenous to the cokernel of the inclusion of the connected component of \({{\text {Pic}}}^0(C_1)^{G_q}\) into the connected component of \({{\text {Pic}}}^0(C_1)^{H_q}\), where we write \(G_q = \mathrm {Aff}(q)\).

This is an abelian variety of dimension \((2g-1) \cdot \frac{q-1}{2}\), isogenous to (7.2).

We may alternately describe this as follows: form the idempotent

$$\begin{aligned} e := \frac{1}{\# H_{q}} \sum _{h \in H_q} h - \frac{1}{\# \mathrm {Aff}(q)} \sum _{g \in \mathrm {Aff}(q)} g \in {{\mathbf {Q}}}[\mathrm {Aff}(q)] \end{aligned}$$

and let \(e' = 1-e\) be the complementary idempotent. Then \(e'' := \# \mathrm {Aff}(q) \cdot e' \in {{\mathbf {Z}}}[\mathrm {Aff}(q)]\) acts on \({{\text {Pic}}}^0(C_1)\), and moreover the connected component of its kernel is isogenous to the Prym variety described above:

$$\begin{aligned} \text{ connected } \text{ component } \text{ of } {{\text {Pic}}}^0(C_1)[e''] {\mathop {\longrightarrow }\limits ^{\mathrm {isog.}}} \left( \text{ Prym } \text{ for } C_1' \rightarrow C_2\right) . \end{aligned}$$
(7.3)
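As a sanity check on the algebra, the following minimal sketch verifies for small primes that e is indeed an idempotent of \({{\mathbf {Q}}}[\mathrm {Aff}(q)]\) and that \(e''\) has integer coefficients. The encoding of group elements as pairs (a, b) for the map \(x \mapsto ax+b\), and all function names, are our own illustrative choices and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def aff_elements(q):
    """Elements of Aff(q): maps x -> a*x + b on F_q, stored as (a, b) with a != 0."""
    return [(a, b) for a in range(1, q) for b in range(q)]

def compose(g, h, q):
    """Composition g after h: x -> a*(c*x + d) + b for g = (a, b), h = (c, d)."""
    (a, b), (c, d) = g, h
    return ((a * c) % q, (a * d + b) % q)

def multiply(u, v, q):
    """Product in the group algebra; elements are dicts {group element: coefficient}."""
    out = {}
    for (g, cg), (h, ch) in product(u.items(), v.items()):
        k = compose(g, h, q)
        out[k] = out.get(k, 0) + cg * ch
    return {k: c for k, c in out.items() if c != 0}

def prym_idempotent(q):
    """e = (average over H_q) - (average over Aff(q)), with H_q = {(a, 0)} the stabilizer of 0."""
    G = aff_elements(q)
    e = {g: -Fraction(1, len(G)) for g in G}
    for a in range(1, q):
        e[(a, 0)] += Fraction(1, q - 1)
    return {g: c for g, c in e.items() if c != 0}

def scaled_complement(q):
    """e'' = #Aff(q) * (1 - e)."""
    n = q * (q - 1)
    out = {g: -n * c for g, c in prym_idempotent(q).items()}
    out[(1, 0)] = out.get((1, 0), 0) + n  # add n times the identity element
    return out
```

For instance, `multiply(prym_idempotent(5), prym_idempotent(5), 5) == prym_idempotent(5)` checks idempotency for q = 5.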

Equation (7.3) gives a way to access the “reduced” Prym variety (at least up to isogeny) that we can conveniently apply in our relative situation. In the situation described in Definition 7.2, \(Z_q \rightarrow Y'_{q}\) is a relative curve over \(Y'_{q}\), and it admits an \(\mathrm {Aff}(q)\)-action, where \(\mathrm {Aff}(q)\) acts trivially on the base. The relative Picard scheme of this curve is an abelian scheme over \(Y'_{q}\) equipped with a symmetric and fiberwise ample line bundle. Thus we may form

$$\begin{aligned} X_q = \text{ relative } \text{ identity } \text{ component } \text{ of }\,{{\text {Pic}}}^0_{Z_q \rightarrow Y'_{q}} [ e''], \end{aligned}$$

where \([e'']\) means the kernel of \(e''\), and for the notion of “relative identity component,” see [12, Proposition 15.6.4]. This \(X_q\) is an abelian scheme over \(Y'_q\), equipped with a symmetric and fiberwise ample line bundle; its fiber over any \(y \in Y'_q({\bar{K}})\) coincides with the construction on the left hand side of (7.3); in particular this fiber is isogenous to the reduced Prym variety of the associated \(\mathrm {Aff}(q)\)-covering \(Z_{y} \rightarrow Y\).

Definition 7.3

Notation as in the prior definition. The Kodaira–Parshin family of Jacobians over Y, associated to the group \(\mathrm {Aff}(q)\), is the sequence of morphisms

$$\begin{aligned} X_{q} \longrightarrow Y'_{q} \rightarrow Y, \end{aligned}$$

where \(X_q\) is, as defined above, the reduced relative Prym of \(Z_q \rightarrow Y_q' \times Y\), considered as a morphism of relative curves over \(Y_q'\).

This is an abelian-by-finite family, in the sense of Definition 5.1.

7.3 Proof of Proposition 7.1

We give the proof of Proposition 7.1. As we have mentioned, this is largely for lack of a good reference stating precisely what we need; certainly much more general statements about Hurwitz schemes exist in the literature.

We start by supposing that Y is a proper smooth curve over \({{\mathbf {C}}}\); while we work over \({{\mathbf {C}}}\) we identify Y with its complex points.

For \(y \in Y\) set S(y) to be the set of conjugacy classes of surjective homomorphisms \(\pi _1(Y-y, *) \twoheadrightarrow G\) with the property that a loop around y has nontrivial image. Equivalently, S(y) is the finite set of isomorphism classes of connected coverings of Y with Galois group G, branched precisely at y.
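To make the finite set S(y) concrete, here is a small illustrative enumeration; it is a sketch and plays no role in the argument. It assumes the standard presentation of \(\pi _1(Y-y)\) as a free group on 2g generators \(a_1, b_1, \dots , a_g, b_g\) in which the loop around y is the product of the commutators \([a_i, b_i]\). The choice of group \(\mathrm {Aff}(q)\), the encoding of its elements as pairs (a, b), and all function names are ours.

```python
from itertools import product

def aff_elements(q):
    """Elements of Aff(q): maps x -> a*x + b on F_q, stored as (a, b) with a != 0."""
    return [(a, b) for a in range(1, q) for b in range(q)]

def compose(g, h, q):
    """Composition g after h: x -> a*(c*x + d) + b."""
    (a, b), (c, d) = g, h
    return ((a * c) % q, (a * d + b) % q)

def inverse(g, q):
    a, b = g
    a_inv = pow(a, -1, q)  # modular inverse (Python 3.8+)
    return (a_inv, (-a_inv * b) % q)

def commutator(g, h, q):
    """g h g^{-1} h^{-1}."""
    return compose(compose(g, h, q), compose(inverse(g, q), inverse(h, q), q), q)

def generated_subgroup(gens, q):
    """Closure of gens under multiplication (a subgroup, since the group is finite)."""
    seen, frontier = {(1, 0)}, [(1, 0)]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g, q)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def count_single_branch_covers(q, genus):
    """Count generating 2g-tuples with nontrivial product of commutators.

    Returns (number of tuples, number of conjugacy classes). Conjugation acts
    freely on generating tuples because Aff(q) is center-free for q >= 3.
    """
    G = aff_elements(q)
    identity = (1, 0)
    good = 0
    for t in product(G, repeat=2 * genus):
        loop = identity
        for i in range(genus):
            loop = compose(loop, commutator(t[2 * i], t[2 * i + 1], q), q)
        if loop != identity and len(generated_subgroup(t, q)) == len(G):
            good += 1
    return good, good // len(G)
```

Calling `count_single_branch_covers(3, 2)` enumerates the case g = 2, G = Aff(3); larger q or g quickly become expensive with this brute-force approach. Note that every commutator in \(\mathrm {Aff}(q)\) is a translation, so nontrivial monodromy around y is automatically of order q.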

For y near \(y^*\) there is a natural identification \(S(y) \cong S(y^*)\) since we can topologically identify \((Y, y)\) and \((Y, y^*)\). Thus the set \(\coprod _{y \in Y({{\mathbf {C}}})} S(y)\) has the structure of a Riemann surface \(Y'\) equipped with a covering map \(e{:}\,Y' \rightarrow Y\). Explicitly, for each \(y' \in Y'\), we have \(y' \in S(e(y'))\), or in words: \(y'\) classifies a connected G-covering of Y branched at \(y = e(y')\).

Moreover, the coverings indexed by the elements of S(y) fit together to a morphism

$$\begin{aligned} f{:}\,Z \rightarrow Y' \times Y \end{aligned}$$

of smooth complex manifolds; here G acts on Z, covering the trivial action on \(Y' \times Y\). More explicitly:

  • f is a covering map and a G-torsor when restricted to the complement of the analytic divisor

    $$\begin{aligned} \Delta := \text{ graph } \text{ of }\,e \subset Y' \times Y; \end{aligned}$$
  • the pullback of the above morphism along \(\{y'\} \times Y \hookrightarrow Y' \times Y\) (for \(y' \in Y'\)) is isomorphic to the covering of Y classified by \(y'\).

Near the preimage of \(\Delta \) on Z the map looks in local coordinates like \((z,w) \mapsto (z, w^n)\) for suitable n.

Now everything can be algebraized, i.e. Z and \(Y'\) have unique structures of complex algebraic variety compatible with their analytic structures, and the G-action on Z as well as the morphisms \(Z \rightarrow Y' \times Y\) and \(Y' \rightarrow Y\) are algebraic. This is clear for \(Y'\); also the structure sheaf of Z defines a coherent analytic sheaf on \(Y' \times Y\) which can be made algebraic by GAGA [40, Theorem 3]; similarly the algebra structure on this coherent analytic sheaf comes from an algebra structure on the algebraic sheaf [40, Theorem 2].

We now switch to using the letters \(Z, Y, \dots \) for the complex algebraic varieties, rather than the associated analytic spaces. So we have defined a sequence of complex algebraic varieties

$$\begin{aligned} Z {\mathop {\longrightarrow }\limits ^{f}} Y'\times Y {\mathop {\longrightarrow }\limits ^{e \times \mathrm {id}}} Y \times Y \end{aligned}$$
(7.4)

where f is étale away from the graph of e, and e is étale; the composite \(Z \rightarrow Y^2\) is therefore étale away from the diagonal \(\Delta \). (Note that étaleness may be checked equivalently in the algebraic or the analytic setting; see [33, XII, §3].)

Now suppose that Y is actually defined over a subfield \(K \subset {{\mathbf {C}}}\); we denote by \(Y_K\) the corresponding K-scheme (similarly \((Y^2)_K\), etc.); we want now to descend everything in sight to K.

Lemma 7.4

Write \(Z^{\circ }\) for the preimage of \(Y^2-{\Delta }\) in Z.

  1. (1)

    The étale cover \(F{:}\,Z^{\circ } \rightarrow Y^2-\Delta \) can be uniquely extended to a cover \(F_K{:}\,Z_K^{\circ } \rightarrow (Y^2-\Delta )_K\). (In both cases, these étale covers are understood to be equipped with G-action.)

  2. (2)

    Let \((y_1, y_0) \in Y({\bar{K}})^2\), with \(y_1 \ne y_0\). The geometric fiber

    $$\begin{aligned} F_K^{-1}(y_1, y_0)/G \end{aligned}$$

    is identified with the set \(S(y_0)\), as defined above, now using étale \(\pi _1^{\mathrm {geom}}(Y-y_0, y_1)\). If \((y_1, y_0) \in Y(K)^2\) this identification is equivariant for \(G_K\).

  3. (3)

    The quotient \(Z_K^{\circ }/G\) (which is étale over \((Y^2-\Delta )_K\)) extends uniquely to an étale cover of \(Y_K^2\). This cover is isomorphic to one of the form \(Y'_K \times Y_K \rightarrow Y_K^2\) for an étale cover \(Y'_K \rightarrow Y_K\), such that \(Y'\) is the base change of \(Y'_K\) to \({{\mathbf {C}}}\).

Assume Lemma 7.4 (the proof, which will be given in a moment, will involve only the theory of étale \(\pi _1\) and group theory). It produces a sequence \(Z_K^{\circ } \rightarrow Y'_K \times Y_K \rightarrow Y_K^2\); we need to extend \(Z_K^{\circ }\) to a K-structure on all of Z, and extend the first map accordingly.

Let \(Z_K \rightarrow Y^2_K\) be the normalization of \(Y_K^2\) inside the fraction field of \(Z_K^{\circ }\). Then \(Z_K\) is normal, and finite over \(Y_K^2\). The base extension \(Z_K \otimes _{K} {{\mathbf {C}}}\) is therefore also normal (the extension of a normal scheme along a field extension in characteristic zero is normal—see [44, Tag 037Z] or [19, Cor. 6.14.2]), and it is finite over \(Y^2\). Consequently, \(Z_K \otimes _{K} {{\mathbf {C}}}\) coincides with the normalization of \(Y^2\) in the function field of \(Z^{\circ }\). This latter normalization is identified with Z, for Z is also normal and finite over \(Y^2\).

The morphism \(Z_K^{\circ } \rightarrow Z_K^{\circ }/G \rightarrow Y'_K \times Y_K\) now extends to \(Z_K \rightarrow Y'_K \times Y_K\), and the other desired properties can be verified since they are true over \({{\mathbf {C}}}\).

Proof of Lemma 7.4

We do this by means of the theory of the étale fundamental group. We first formulate the basic point in purely group theoretic terms.

Let \(\Gamma , G\) be groups, with G finite center-free, and c a conjugacy class of morphisms in \({{\text {Hom}}}(\Lambda ,\Gamma )\) for some other group \(\Lambda \); when we apply this, \(\Gamma \) will be a \(\pi _1\) of a punctured curve, \(\Lambda \) will be the profinite completion of an infinite cyclic group, and c will come from monodromy around the puncture. Consider the set \(S = S(\Gamma , c, G)\) of all surjective homomorphisms \(\varphi {:}\,\Gamma \rightarrow G\), with the property that they are nontrivial when pulled back by c. There are natural commuting actions of \(\Gamma \) and G on S:

$$\begin{aligned} \gamma \cdot \varphi = \varphi \circ \mathrm {Ad}(\gamma )^{-1} \ (\gamma \in \Gamma ), \ \ \varphi \cdot h = \mathrm {Ad}(h^{-1}) \circ \varphi \ \ (h \in G). \end{aligned}$$

Here \(\mathrm {Ad}(x)\) denotes the automorphism \(g \mapsto x g x^{-1}\).

This \(\Gamma \)-action extends uniquely to an action (commuting with G) of any overgroup \({\widetilde{\Gamma }} \supset \Gamma \) in which \(\Gamma \) is normal and whose conjugation action preserves c. Indeed the extension is described by exactly the same formula; uniqueness comes from the fact that the stabilizer of \(\varphi \in S(\Gamma , c, G)\) inside \(\Gamma \times G^{\mathrm {op}}\) is given by

$$\begin{aligned} \{(\gamma \in \Gamma , h \in G){:}\,h^{-1} = \varphi (\gamma )\}, \end{aligned}$$

and so \(\varphi \) is determined by its stabilizer in \(\Gamma \times G^{\mathrm {op}}\). (We used that \(\varphi \) is surjective and that G is center-free.)
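The stabilizer formula can be checked by brute force in a toy case. The sketch below takes \(\Gamma = G = \mathrm {Aff}(5)\) (center-free) and \(\varphi \) an inner automorphism; these choices, the pair encoding (a, b) for \(x \mapsto ax+b\), and the function names are assumptions made purely for illustration, whereas in the application \(\Gamma \) is a profinite fundamental group.

```python
def aff_elements(q):
    """Elements of Aff(q): maps x -> a*x + b on F_q, stored as (a, b) with a != 0."""
    return [(a, b) for a in range(1, q) for b in range(q)]

def compose(g, h, q):
    """Composition g after h: x -> a*(c*x + d) + b."""
    (a, b), (c, d) = g, h
    return ((a * c) % q, (a * d + b) % q)

def inverse(g, q):
    a, b = g
    a_inv = pow(a, -1, q)  # modular inverse (Python 3.8+)
    return (a_inv, (-a_inv * b) % q)

def conj(x, g, q):
    """Ad(x)(g) = x g x^{-1}."""
    return compose(compose(x, g, q), inverse(x, q), q)

def stabilizer(phi, G, q):
    """Pairs (gamma, h) with Ad(h^{-1}) o phi o Ad(gamma)^{-1} == phi."""
    stab = set()
    for gamma in G:
        for h in G:
            def moved(x, gamma=gamma, h=h):
                inner = conj(inverse(gamma, q), x, q)          # Ad(gamma)^{-1}(x)
                return conj(inverse(h, q), phi(inner), q)      # Ad(h^{-1})(phi(...))
            if all(moved(x) == phi(x) for x in G):
                stab.add((gamma, h))
    return stab

q = 5
G = aff_elements(q)
t = (2, 1)                                   # an arbitrary fixed element of Aff(5)
phi = lambda x: conj(t, x, q)                # a surjection Gamma -> G (here an automorphism)
expected = {(gamma, inverse(phi(gamma), q)) for gamma in G}
```

Here `stabilizer(phi, G, q)` should coincide with `expected`, i.e. with the set of pairs satisfying \(h^{-1} = \varphi (\gamma )\).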

We apply this as follows. As above, fix two points \(y_0 \ne y_1 \in Y({{\mathbf {C}}})\); we will use \({\mathbf {y}} = (y_1, y_0) \) as a geometric basepoint for \(Y \times Y\). Consider the sequence of pointed schemes:

$$\begin{aligned} \underbrace{ (Y-\{y_0\}, y_1)}_{\Gamma := \pi _1} {\mathop {\longrightarrow }\limits ^{p \mapsto (p,y_0)}} \underbrace{ (Y^2 - \Delta , {\mathbf {y}}) }_{\tilde{\Gamma }^{\mathrm {geom}} := \pi _1}{\mathop {\longrightarrow }\limits ^{(y, y') \mapsto y'}} (Y, y_0) \end{aligned}$$
(7.5)

and let \(\Gamma , \tilde{\Gamma }^{\mathrm {geom}}\) be defined as the geometric étale \(\pi _1\) of the first and second spaces, at the specified basepoints. Now the long exact sequence for homotopy groups of a fibration gives rise to an exact sequence of topological fundamental groups; in the setting at hand this is short exact because the \(\pi _2\) of the base Y vanishes. The corresponding sequence of geometric étale fundamental groups is obtained by profinite completion; it remains exact by the results of [38]. Therefore the first map above identifies \(\Gamma \) with a normal subgroup of \(\tilde{\Gamma }^{\mathrm {geom}}\). It follows easily that, if we write

$$\begin{aligned} \tilde{\Gamma } = \pi _1((Y^2-\Delta )_K,{\mathbf {y}}), \end{aligned}$$

(arithmetic fundamental group) then the map \(\Gamma \rightarrow \tilde{\Gamma }^{\mathrm {geom}}\) identifies \(\Gamma \) with a normal subgroup of \(\tilde{\Gamma }\).

Now let \(S=S(\Gamma , c, G)\) be as above, where c is the conjugacy class of maps \({\widehat{{{\mathbf {Z}}}}} \rightarrow \pi _1(Y-\{y_0\}, y_1)=\Gamma \) arising from the monodromy around \(y_0\). The commuting \(\Gamma \times G\) actions on S define a cover of \(Y-\{y_0\}\), equipped with an action of G by automorphisms, whose fiber at \(y_1\) is identified with S. This cover may be described as follows: it is the disjoint union of all the connected G-covers of Y branched precisely at \(y_0\). In other words, it is the restriction of \(Z \rightarrow Y^2-\Delta \) to the fiber \(\{y_0\} \times (Y-\{y_0\})\). From the uniqueness just described, the extension of this \(\Gamma \times G\) action on S to an action of \({\widetilde{\Gamma }}^{\mathrm {geom}} \times G\) corresponds to the cover \(Z \rightarrow Y^2-\Delta \). Therefore, the (further) unique extension of the \(\Gamma \times G\)-action on S to \({\widetilde{\Gamma }} \times G\) gives statement (1) of Lemma 7.4.

Statement (2) of Lemma 7.4 (and the \(G_K\)-equivariance if \(y_1, y_0\) are K-rational) follows for the specific \((y_1, y_0)\) chosen above; however, since we showed that the K-structure on \(Z^{\circ }\) is unique, it must also be true for any choice of \((y_1, y_0)\).

For statement (3) we notice that the action of \(\Gamma \) on \(S(\Gamma , c, G)/G\) is in fact trivial. Therefore the resulting action of \({\widetilde{\Gamma }}\) factors through the quotient \(\pi _1(Y_K, y_0)\) arising from the last map of (7.5). This amounts to the third assertion. \(\square \)

8 The monodromy of Kodaira–Parshin families

8.1 Introduction, notation, statement of main theorem

In this section we consider surfaces in the classical topological category: by a “surface” we mean the complement of finitely many interior points inside a connected, orientable, compact two-dimensional manifold with boundary. Thus a surface can have both boundary and punctures. Throughout this section, the letters Y and Z will denote such a surface, and we will use \(y_0\) to denote a base point on Y. For such a surface Y, \(\mathrm {MCG}(Y)\) denotes the mapping class group of Y. To emphasize, Y could have “punctures” or boundary. The book of Farb and Margalit [16] is a reference on this material that contains all the results we will use. When we discuss homology or cohomology, the coefficients are always assumed to be the rational numbers \({{\mathbf {Q}}}\) unless stated otherwise.

We first reformulate the statement to be proven.

8.2 Covers and their homology

Let Y be a surface (possibly with punctures or boundary). An \(\mathrm {Aff}(q)\)-cover of Y is, by definition, a connected surface Z together with a degree q covering map

$$\begin{aligned} \pi {:}\,Z \longrightarrow Y \end{aligned}$$

whose monodromy representation on a general fiber is equivalent to the action of \(\mathrm {Aff}(q)\) on \({\mathbf {F}}_q\) (i.e. we can label points in the fiber by \({\mathbf {F}}_q\) in such a way that the monodromy representation has image \(\mathrm {Aff}(q)\)). We will often abuse notation and refer to this cover simply as Z, i.e., regard the map \(\pi \) as implicit.

After choice of basepoint \(y_0 \in Y\), such a cover determines an \(\mathrm {Aff}(q)\)-conjugacy class of maps

$$\begin{aligned} \pi _1(Y, y_0) \twoheadrightarrow \mathrm {Aff}(q). \end{aligned}$$
(8.1)

We define two \({\text {Aff}}(q)\)-covers \((Z_1, \pi _1)\) and \((Z_2, \pi _2)\) to be isomorphic when there is a homeomorphism \(Z_1 \simeq Z_2\) commuting with the projections to Y; equivalently, when the associated conjugacy classes of \(\pi _1\)-representations (8.1) coincide.

If we have fixed an \({\text {Aff}}(q)\)-cover \(Z \rightarrow Y\), we denote by \({\text {Cov}}{:}\,\pi _1 \rightarrow \mathrm {Aff}(q)\) any homomorphism in the conjugacy class of (8.1). For \(\eta \in \pi _1\) we can unambiguously talk about the cycle decomposition of \({\text {Cov}}(\eta )\) in \(\mathrm {Sym}({\mathbf {F}}_q)\), which we regard as a partition of the positive integer q; this cycle decomposition is conjugation-invariant.

Given any covering map \(\pi {:}\,Z \rightarrow Y\), the pullback and pushforward on homology define a splitting

$$\begin{aligned} H_1(Z, {{\mathbf {Q}}}) = \pi ^* H_1(Y, {{\mathbf {Q}}}) \oplus \underbrace{ H_1^{\mathrm {Pr}}(Z, Y; {{\mathbf {Q}}})}_{\mathrm {ker } \left( \pi _*{:}\,H_1(Z) \rightarrow H_1(Y) \right) .} \end{aligned}$$

Henceforth we will drop the coefficients \({{\mathbf {Q}}}\) from the notation. The symbol \(\mathrm {Pr}\) stands for primitive; alternatively, \(H_1^\mathrm {Pr}(Z, Y)\) is the homology of a Prym variety.

Now \(H_1(Z, {{\mathbf {Q}}})\) and \(H_1(Y, {{\mathbf {Q}}})\) are both equipped with skew-symmetric pairings, the intersection pairings. The map \(\pi ^*\) scales the pairing by the degree q of the covering \(Z \rightarrow Y\). If the pairing on \(H_1(Z, {{\mathbf {Q}}})\) is nondegenerate, we may identify the primitive homology with the orthogonal complement to \(\pi ^* H_1(Y, {{\mathbf {Q}}})\) in \(H_1(Z, {{\mathbf {Q}}})\), and in particular this primitive homology inherits a skew-symmetric pairing. In our case, Z and Y will both be compact surfaces punctured at a single point, and therefore the intersection pairings on \(H_1(Z, {{\mathbf {Q}}})\) and \(H_1(Y, {{\mathbf {Q}}})\) are perfect.

8.2.1 The mapping class group and its action on homology; the map \({\text {Mon}}\)

Clearly the diffeomorphism group of Y acts on the finite set of isomorphism classes of \({\text {Aff}}(q)\)-covers of Y, and this action factors through the mapping class group \(\mathrm {MCG}(Y)\). In algebraic terms, this action is induced from the map \(\mathrm {MCG}(Y) \longrightarrow \mathrm {Out}(\pi _1(Y, y_0))\).

Let \(\mathrm {MCG}(Y)_Z\) denote the stabilizer of \((Z, \pi )\) for this action. Since \(\mathrm {Aff}(q)\) has trivial centralizer in \(\mathrm {Sym}({\mathbf {F}}_q)\), such elements lift uniquely to mapping classes on Z, i.e. there is a homomorphism

$$\begin{aligned} \mathrm {MCG}(Y)_Z \longrightarrow \mathrm {MCG}(Z). \end{aligned}$$

Namely, fixing a representative \(\alpha {:}\,Y \rightarrow Y\), there is a unique \(f{:}\,Z \rightarrow Z\) that renders the diagram

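$$\begin{aligned} \begin{array}{ccc} Z &amp; {\mathop {\longrightarrow }\limits ^{f}} &amp; Z \\ \downarrow {\scriptstyle \pi } &amp; &amp; \downarrow {\scriptstyle \pi } \\ Y &amp; {\mathop {\longrightarrow }\limits ^{\alpha }} &amp; Y \end{array} \end{aligned}$$
(8.2)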

commutative. Sending the mapping class of \(\alpha \) to the mapping class of f defines the desired homomorphism.

This construction gives rise to actions of \(\mathrm {MCG}(Y)_Z\) on \(H_1(Z)\) and \(H_1^\mathrm {Pr}(Z, Y)\). This latter action is the monodromy map

$$\begin{aligned} {\text {Mon}}{:}\,\mathrm {MCG}(Y)_Z \rightarrow \mathrm {Sp}( H_1^\mathrm {Pr}( Z, Y )). \end{aligned}$$

8.2.2 The main theorem

Fix a surface Y of genus \(g \geqslant 2\), a point \(y \in Y\), a prime \(q \geqslant 3\); as before, \({\text {Aff}}(q)\) denotes the group of affine-linear transformations of \({\mathbf {F}}_q\).

We consider \({\text {Aff}}(q)\)-covers \(Z^{\circ }\) of \(Y - \{y\}\) such that the monodromy around y is nontrivial (hence a q-cycle); the compactification of such a cover is a surface Z of genus \(gq - \frac{q-1}{2}\). We call such Z singly ramified \({\text {Aff}}(q)\)-covers of Y. The notation hides the dependence on the point y, which will remain fixed. There are, up to isomorphism, only finitely many such Z; choose a representative for each isomorphism class and call them \(Z_1, Z_2, \ldots , Z_{N}\), and let \({\text {Cov}}_1, {\text {Cov}}_2, \ldots , {\text {Cov}}_{N}{:}\,\pi _1(Y - \{y\}) \rightarrow {\text {Aff}}(q)\) be representatives for the associated monodromy mappings.
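The genus formula can be checked by a short Riemann–Hurwitz computation, which we record as a worked step; the symbol g(Z) for the genus of Z is notation introduced only here. The compactified cover \(Z \rightarrow Y\) has degree q and a single ramification point, of index q, lying over y, so

$$\begin{aligned} 2g(Z) - 2 = q(2g-2) + (q-1), \quad \text{ hence } \quad g(Z) = gq - \frac{q-1}{2}. \end{aligned}$$

In particular \(H_1^\mathrm {Pr}(Z, Y)\) has dimension \(2g(Z) - 2g = (q-1)(2g-1)\), which is twice the dimension \((2g-1) \cdot \frac{q-1}{2}\) recorded for the reduced Prym variety in Sect. 7.2.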

Let \(\mathrm {MCG}(Y - \{y\})_0\) denote the intersection of the groups \(\mathrm {MCG}(Y - \{y\})_{Z_i}\). The individual monodromy maps attached to the covers \(Z_i\) combine to give a map

$$\begin{aligned} {\text {Mon}}{:}\,\mathrm {MCG}(Y - \{y\})_0 \rightarrow \prod _{i=1}^{N} \mathrm {Sp}( H_1^\mathrm {Pr}(Z_i, Y) ). \end{aligned}$$
(8.3)

The mapping class group of a punctured surface fits in the Birman exact sequence [16, Theorem 4.6]

$$\begin{aligned} 0 \rightarrow \pi _1(Y, y) \rightarrow \mathrm {MCG}(Y - \{y\}) \rightarrow \mathrm {MCG}(Y) \rightarrow 0. \end{aligned}$$
(8.4)

Let \(\pi _1(Y, y)_0\) denote the inverse image of \(\mathrm {MCG}(Y - \{y\})_0\) in \(\pi _1(Y, y)\); the inclusion \(\pi _1(Y, y)_0 \subseteq \pi _1(Y, y)\) is of finite index.

The restriction of (8.3) to the subgroup \(\pi _1(Y, y)_0\) describes the monodromy of a Kodaira–Parshin family, as in Definition 7.3. We review this connection in more detail in Sect. 8.2.3. The following statement is equivalent to the large monodromy property of Kodaira–Parshin families, stated without proof as point (i) before Theorem 5.4.

Theorem 8.1

Let notation be as above; in particular, \(Z_1, \dots , Z_{N}\) are a set of representatives for isomorphism classes of singly ramified \({\text {Aff}}(q)\)-covers of Y. Then the map

$$\begin{aligned} {\text {Mon}}{:}\,\pi _1(Y, y)_0 \rightarrow \prod _{i=1}^{N} \mathrm {Sp}( H_1^\mathrm {Pr}(Z_i, Y) ) \end{aligned}$$
(8.5)

has Zariski-dense image.

We briefly outline the proof. We give in Sect. 8.4 a “normal form” for each \({\text {Aff}}(q)\)-cover. Using the sequence (8.4), we reduce to showing a similar assertion with \(\pi _1(Y, y)_0\) replaced by \(\mathrm {MCG}(Y-\{y\})_0\). This allows us to use Dehn twists. Using our normal form for \({\text {Aff}}(q)\)-covers, and constructing a suitable system of curves to Dehn-twist around, we can see that the monodromy surjects onto each factor \(\mathrm {Sp}(H_1^\mathrm {Pr}(Z_i, Y))\). A version of Goursat’s lemma completes the proof.

When considering the general problem (replacing \(\mathrm {Aff}(q)\) or cyclic covers by G-covers) the primitive homology must be further decomposed according to the representation theory of G. Looijenga [27] has proven a similar result for cyclic covers of surfaces without monodromy; in fact, Looijenga determines the exact image of \({\text {Mon}}\) in this situation. See also [18, Theorem 1.6] for an analogous result for unramified covers of a closed surface, and [37] for covers whose covering group is the Heisenberg group.

8.2.3 Application to Theorem 5.4

For clarity we now write out why Theorem 8.1, in the form stated above, implies what is used in Theorem 5.4: namely, the Kodaira–Parshin family \(X_q \rightarrow Y_q' {\mathop {\rightarrow }\limits ^{\pi }} Y\) for the group \(\mathrm {Aff}(q), q \ge 3\), has full monodromy.

In stages:

  • We begin, as in Sect. 7.1, with a family \(Z_q \rightarrow Y_q' \rightarrow Y\), with \(Z_q \rightarrow Y_q'\) a relative curve. (The Kodaira–Parshin family was constructed by applying a Prym construction to this, as explicated in Definition 7.3.)

  • Fix \(y \in Y({{\mathbf {C}}})\). The fiber of \(Y_q' \rightarrow Y\) above y is identified with the set of isomorphism classes of singly ramified \(\mathrm {Aff}(q)\)-covers of \(Y({{\mathbf {C}}})\), branched at y. This follows from property (i) of Proposition 7.1.

  • Fix \(y' \in Y_q'({{\mathbf {C}}})\) above \(y \in Y({{\mathbf {C}}})\). Let \(Z_{q,y'}\) be the fiber of \(Z_q \rightarrow Y_q'\) over \(y'\).

    By construction, \(Z_{q,y'} \rightarrow Y({{\mathbf {C}}})\) is a singly branched \(\mathrm {Aff}(q)\)-cover, and we can form the degree q cover associated to the action of \(\mathrm {Aff}(q)\) on \({{\mathbf {Z}}}/q{{\mathbf {Z}}}\), i.e. \(Z_{q,y'}({{\mathbf {C}}}) \times _{\mathrm {Aff}(q)} {{\mathbf {Z}}}/q{{\mathbf {Z}}}\). By our definitions above, we have

    $$\begin{aligned} Z_{q,y'}({{\mathbf {C}}}) \times _{\mathrm {Aff}(q)} {{\mathbf {Z}}}/q{{\mathbf {Z}}}\simeq Z_i \end{aligned}$$

    (for some unique i in \(\{1, 2, \dots , N\}\)) as Riemann surfaces over \(Y({{\mathbf {C}}})\).

  • The construction of Kodaira–Parshin families then gives rise, as in (7.3), to an isogeny

    $$\begin{aligned} \text{ fiber }\,X_{q,y'}\,\text{ of }\,X_q\,\text{ above }\,y' \longrightarrow \mathrm {Prym}(Z_i \rightarrow Y({{\mathbf {C}}})) \end{aligned}$$

    This isogeny induces an isomorphism of the rational homology groups:

    $$\begin{aligned} H_1^{\Pr }(Z_i, Y; {{\mathbf {Q}}}) \simeq \,\text{ first } \text{ homology } \text{ of } X_{q,y'}({{\mathbf {C}}})\,\text{ with } \text{ rational } \text{ coefficients. } \end{aligned}$$
  • This identification is compatible with monodromy, and so Theorem 8.1 translates to definition (5.1) of full monodromy.

8.3 Dehn twists and liftable curves

We say that e is a simple closed curve in a surface Y if it is the image of a smooth embedding \(S^1 \rightarrow Y\); a simple closed curve has no self-intersection. For us, a simple closed curve will always come with an orientation, namely, the orientation induced from a fixed orientation on \(S^1\). If \(y \in Y\) is a point, then we say \(\eta \in \pi _1(Y, y)\) is represented by a simple closed curve if there is a loop e in Y, based at y and representing the class \(\eta \in \pi _1(Y, y)\), which is a simple closed curve. We may say (somewhat imprecisely) that \(\eta \) “is” a simple closed curve.

If e is a simple closed curve in \(Y - \{y\}\), the Dehn twist \(D_e\) about e acts on \(H_1(Y)\) by the transvection \(T_e\); indeed, we can regard \(D_e\) as an element of \(\mathrm {MCG}(Y - \{y\})\). We want to study how this lifts to an \({\text {Aff}}(q)\)-cover \(Z \rightarrow Y\): let \(n_e\) be the order of the image of e in \({\text {Aff}}(q)\). Then \(D_e^{n_e}\) lifts to an automorphism of the cover Z, as we now describe. Suppose the image of e under \(\pi _1(Y, y_0) \rightarrow {\text {Aff}}(q)\rightarrow \mathrm {Sym}({\mathbf {F}}_q)\) has cycle structure \((d_1, \ldots , d_k)\). The preimage of e under \(Z \rightarrow Y\) is a disjoint union of circles \(e_1, \ldots , e_k\), with the circles \(e_i\) in natural bijection with the cycles in the permutation. Then \(D_e^{n_e}\) lifts to the product of commuting Dehn twists

$$\begin{aligned} \prod _i D_{e_i}^{n_e / d_i} \end{aligned}$$

on Z.

In our cases, the only possibilities for cycle structure are as follows:

  • If e maps to an element of \({\text {Aff}}(q)\) that is not in \({{\mathbf {F}}}_q^+\), i.e. has nontrivial image \(a \in {{\mathbf {F}}}_q^*\), then \((d_1, \dots , d_k) = (1, \mathrm {ord}_q(a), \mathrm {ord}_q(a), \ldots , \mathrm {ord}_q(a))\).

  • If e maps to a nonzero element of \({{\mathbf {F}}}_q^+\) then \((d_1, \dots , d_k) = (q)\).

  • If e maps to the identity element of \({\text {Aff}}(q)\), then \((d_1, \dots , d_k) = (1, \dots , 1)\).
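The three cases can be confirmed by direct enumeration for small primes. The following is a minimal sketch; the function names and the encoding of an element of \({\text {Aff}}(q)\) as a pair (a, b) for \(x \mapsto ax + b\) are our own conventions.

```python
def cycle_type(a, b, q):
    """Sorted cycle lengths of the permutation x -> a*x + b of F_q."""
    seen, lengths = set(), []
    for start in range(q):
        if start in seen:
            continue
        n, x = 0, start
        while x not in seen:          # walk the cycle through `start`
            seen.add(x)
            x = (a * x + b) % q
            n += 1
        lengths.append(n)
    return tuple(sorted(lengths))

def mult_order(a, q):
    """Multiplicative order of a in F_q^*."""
    d, x = 1, a % q
    while x != 1:
        x = (x * a) % q
        d += 1
    return d

def predicted_cycle_type(a, b, q):
    """Cycle type according to the three cases in the text."""
    if a != 1:                        # nontrivial image in F_q^*
        d = mult_order(a, q)
        return tuple([1] + [d] * ((q - 1) // d))
    if b != 0:                        # nonzero translation
        return (q,)
    return (1,) * q                   # identity
```

For example, `cycle_type(2, 3, 7)` can be compared with `predicted_cycle_type(2, 3, 7)`; the number of cycles k is then the length of the returned tuple.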

Now we note that:

Lemma 8.2

Let e be a simple closed curve in \(Y - \{y\}\). Then the classes of the preimages \([e_1], \dots , [e_k]\) in the homology of Z are linearly independent; projected to \(H_1^\mathrm {Pr}(Z, Y)\), their span has dimension \(k - 1\).

Proof

Y admits the structure of a CW complex with one 2-cell such that e belongs to the 1-skeleton. The inclusion of this 1-skeleton into \(Y-y\) is a homotopy equivalence. Correspondingly the inclusion of the preimage (in Z) of this 1-skeleton into the preimage (in Z) of \(Y-y\) is also a homotopy equivalence. Note also that the inclusion of \(Y-y\) into Y induces an isomorphism on \(H_1\), with a similar statement for \(Z - \pi ^{-1}(y)\).

These remarks allow us to reduce the Lemma to corresponding assertions for a covering of a finite graph, which are clear. \(\square \)

The action of \({\text {Mon}}(D_e^{n_e})\) on \(H_1^\mathrm {Pr}(Z, Y)\) is a unipotent transformation u such that the image of \(u-1\) is exactly the span of the classes of the circles \(e_i\). By the Lemma just proven, this has dimension \(k-1\); correspondingly the fixed space of \({\text {Mon}}(D_e^{n_e})\) on \(H_1^\mathrm {Pr}(Z, Y)\) has codimension \(k-1\).

We record the following consequence.

Lemma 8.3

Suppose \(Z \rightarrow Y\) is an \({\text {Aff}}(q)\)-cover. Let e be a simple closed curve in Y, and take \(M\) such that \(D_e^{M} \in \mathrm {MCG}(Y)_Z\). Then the rank of \({\text {Mon}}(D_e^{M})-\mathrm {Id}\) acting on \(H_1^\mathrm {Pr}(Z,Y)\) determines the conjugacy class of \({\text {Cov}}(e)\) in the symmetric group \(\mathrm {Sym}({\mathbf {F}}_q)\).

A particularly important case is when e is a simple closed curve in Y such that \({\text {Cov}}(e)\) maps to a generator for \({\mathbf {F}}_q^*\) under the natural map \({\text {Aff}}(q)\rightarrow {\mathbf {F}}_q^*\). We call such an e a liftable curve (for the \({\text {Aff}}(q)\)-cover \(Z \rightarrow Y\)). Its preimage in Z splits into a union of simple closed curves \(e^+\) of degree 1 over e, and \(e^-\) of degree \(q-1\) over e. For liftable e we write

$$\begin{aligned} {\widetilde{e}} := \text{ projection } \text{ of } \text{ the } \text{ class } \text{ of } e^+ \text{ to } \text{ primitive } \text{ homology. } \end{aligned}$$

According to our discussion above, \(D_e\) induces a transvection on \(H_1^\mathrm {Pr}(Z, Y)\), with center \({\widetilde{e}}\).

Write \(\cdot \) for the intersection pairing on homology. Given liftable curves A and B, we have

$$\begin{aligned} {\widetilde{A}} \cdot {\widetilde{B}} = (A^+ \cdot B^+) - \frac{1}{q} A \cdot B. \end{aligned}$$
(8.6)

Indeed, identifying primitive homology with the kernel of the pushforward, we have \(q{\widetilde{A}} = q A^+ - \pi ^* A\), and so the intersection pairing of \(q {\widetilde{A}}\) with \(q {\widetilde{B}}\) is

$$\begin{aligned} (q A^+ - \pi ^* A) \cdot (q B^+ - \pi ^* B) = q^2 (A^+ \cdot B^+) - 2q (A \cdot B) + q (A \cdot B), \end{aligned}$$

as desired.
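As a quick numerical sanity check of (8.6), the following sketch evaluates the expansion above with exact rational arithmetic. It assumes the two standard facts used implicitly in the computation, namely \(A^+ \cdot \pi ^* B = A \cdot B\) and \(\pi ^* A \cdot \pi ^* B = q (A \cdot B)\); the function name and its arguments are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def primitive_pairing(q, a_plus_dot_b_plus, a_dot_b):
    """Return the pairing of the primitive projections, as in (8.6).

    Hypothetical inputs, supplied by the caller:
      q                  -- degree of the cover
      a_plus_dot_b_plus  -- the intersection number A^+ . B^+ on Z
      a_dot_b            -- the intersection number A . B on Y
    """
    # (q A^+ - pi^* A) . (q B^+ - pi^* B), expanded term by term:
    #   q^2 (A^+ . B^+)  from the leading terms,
    #   -2q (A . B)      from the two cross terms,
    #   +q (A . B)       from pi^* A . pi^* B.
    scaled = q * q * a_plus_dot_b_plus - 2 * q * a_dot_b + q * a_dot_b
    # Divide by q^2 to pass from q A~ . q B~ back to A~ . B~.
    return Fraction(scaled, q * q)
```

For instance, with \(q = 5\), \(A^+ \cdot B^+ = 1\) and \(A \cdot B = 1\) the function returns \(4/5 = 1 - 1/5\), in agreement with (8.6).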

8.4 A normal form for an \(\mathrm {Aff}_q\)-cover

Again, take Z a singly ramified \({\text {Aff}}(q)\)-cover of Y. We will describe the cover \(Z \rightarrow Y\) in a normal form by cutting Y carefully, using essentially the fact that \(\mathrm {Aff}(q)\) is solvable. The end result is roughly that the covering \(Z\rightarrow Y\) can be expressed as the sum of a trivial cover of a genus \(g-1\) surface and a nontrivial cover on a torus.

Choose a basepoint \(y_0 \in Y\). The map \({\text {Cov}}{:}\,\pi _1(Y - \{y\}, y_0) \rightarrow {\text {Aff}}(q)\) specifying the cover \(Z \rightarrow Y\) induces a map

$$\begin{aligned} H_1(Y, {\mathbf {Z}}) \cong H_1(Y - \{y\}, {\mathbf {Z}}) \rightarrow {\mathbf {F}}_q^* \end{aligned}$$

on abelianizations. The group \( {\mathbf {F}}_q^*\) is cyclic and \(H_1(Y, {\mathbf {Z}})\) is free. If we choose a surjection \({\mathbf {Z}} \twoheadrightarrow {\mathbf {F}}_q^*\), then the map on abelianizations lifts to a map

$$\begin{aligned} H_1(Y, {\mathbf {Z}}) \rightarrow {\mathbf {Z}} \rightarrow {\mathbf {F}}_q^*. \end{aligned}$$

We can choose this map so that \(H_1(Y, {\mathbf {Z}}) \rightarrow {\mathbf {Z}}\) is surjective, so it is given by intersecting with a primitive integral homology class \(\alpha _1\).
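The existence of a surjective lift can be made concrete. The sketch below is an illustration under stated assumptions, not the construction used in the text: it fixes a generator of the cyclic group of order \(n = q-1\), records the image of each basis class of \(H_1(Y, {\mathbf {Z}})\) as an exponent modulo n, and searches for integer lifts of these exponents whose greatest common divisor is 1 (which is what surjectivity onto \({\mathbf {Z}}\) means). It assumes the original map is surjective and that the rank is at least 2; the function name is hypothetical.

```python
from functools import reduce
from math import gcd


def lift_to_primitive(exponents, n):
    """Lift a vector in (Z/n)^r with gcd(entries, n) = 1 to a primitive vector in Z^r.

    exponents[i] is the image of the i-th basis class, written as an exponent
    of a fixed generator of the cyclic group of order n (an assumed encoding).
    """
    a = [e % n for e in exponents]
    if len(a) < 2 or reduce(gcd, a, n) != 1:
        raise ValueError("need rank >= 2 and a surjection onto Z/n")
    if all(x == 0 for x in a[1:]):
        # Replace the residue 0 by n so that the tail has a nonzero gcd.
        a[1] = n
    d = reduce(gcd, a[1:], 0)
    # gcd(a[0], d, n) = 1, so some shift of a[0] by a multiple of n is prime to d.
    t = 0
    while gcd(a[0] + t * n, d) != 1:
        t += 1
    a[0] += t * n
    return a
```

The loop terminates because, for each prime dividing d, at most one residue class of t makes that prime divide the shifted entry.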

Choose a simple closed curve, which we also call \(\alpha _1\), representing this class. (Indeed, any primitive integral homology class is represented by a simple closed curve: [16, Proposition 6.2].) In fact, choose two such curves, \(\alpha _1^+\) and \(\alpha _1^-\), which pass “close by” but on either side of the ramification point y, and are parallel to one another (Fig. 1). Note that, since our cover is ramified at y, and the monodromy at y is a nontrivial element of \({\mathbf {F}}_q^+\), \({\text {Cov}}(\alpha _1^+)\) and \({\text {Cov}}(\alpha _1^-)\) cannot both be trivial.

Fig. 1. The curves \(\alpha _1^{\pm }\) and \(\alpha _2\) on Y

Cutting Y along the curves \(\alpha _1^{\pm }\) and discarding the connected component of y, we obtain a surface \(Y^1\) with two boundary components. Let \(Z^1\) be the pullback of our covering to \(Y^1\). The map \({\text {Cov}}{:}\,\pi _1(Y^1, y_0) \rightarrow {\text {Aff}}(q)\) has image contained in \({\mathbf {F}}_q^+ \subseteq {\text {Aff}}(q)\) by our choice of \(\alpha _1\); so it factors through \(H_1(Y^1, {\mathbf {Z}})\).

The boundary components (with orientations defined by an outward normal) define classes \(b_+, b_- \in H_1(Y^1,{{\mathbf {Z}}})\); these classes satisfy \(b_+ + b_- = 0\) because their sum is the boundary of \(Y^1\). We saw above that \(b_+\) and \(b_-\) cannot both have trivial image in \({\mathbf {F}}_q^+\); so \({\text {Cov}}(b_+) = - {\text {Cov}}(b_-)\) must be nontrivial. Conjugating by a suitable element of \({\text {Aff}}(q)\) as necessary, we may as well suppose that \(b_+ \in H_1(Y^1,{{\mathbf {Z}}})\) maps to \(1 \in {\mathbf {F}}_q^+\).

For a surface such as \(Y^1\) with boundary \(\partial Y^1\), Poincaré duality takes the form of a perfect pairing between absolute and relative homology:

$$\begin{aligned} H_1(Y^1, \partial Y^1; {{\mathbf {Z}}}) \times H_1(Y^1; {{\mathbf {Z}}}) \rightarrow {{\mathbf {Z}}}. \end{aligned}$$
(8.7)

The map \(H_1(Y^1,{{\mathbf {Z}}}) \rightarrow {\mathbf {F}}_q^+\) lifts to a map \(H_1(Y^1,{{\mathbf {Z}}}) \rightarrow {\mathbf {Z}}\); since \(b_+\) is a primitive element of \(H_1(Y^1,{{\mathbf {Z}}})\), we can choose such a lift taking \(b_+\) to 1. This lift is of the form \(x \mapsto \langle x, \alpha _2 \rangle \) for a relative homology class \(\alpha _2 \in H_1(Y^1, \partial Y^1; {{\mathbf {Z}}})\). Therefore \(\alpha _2\) intersects the boundary components with multiplicity \(+1\) and \(-1\).

The following lemma readily implies that \(\alpha _2\) can be represented by a simple curve, the image of an immersion \(e{:}\,[0,1] \rightarrow Y^1\) that meets \(\partial Y^1\) only at the endpoints, which we also call \(\alpha _2\). Indeed, it implies that \(\mathrm {MCG}(Y)\) acts transitively on that subset of \(H_1(Y, \partial Y) \simeq {{\text {Hom}}}(H_1(Y,{{\mathbf {Z}}}), {{\mathbf {Z}}})\) consisting of elements whose pairing with a fixed boundary circle is 1. It follows that we can find a mapping class carrying the homology class of e to the homology class \(\alpha _2\), as desired.

Lemma 8.4

Suppose Y is a surface of genus g with 2 boundary components, so \(V = H_1(Y, {\mathbf {Z}})\) is a free \({\mathbf {Z}}\)-module of rank \(2g + 1\). We regard it as equipped with a (degenerate) alternating form via \(H_1(Y) \rightarrow H_1(Y, \partial Y)\) and the duality pairing (8.7); the radical of this form is the rank-1 submodule \(V^0\) generated by b, the class of one of the two boundary components of Y.

Let \(\mathrm {Sp}(V, b)\) denote the group of automorphisms of V preserving the bilinear form and fixing b. Then the natural map \(\mathrm {MCG}(Y) \rightarrow \mathrm {Sp}(V, b)\) is surjective.

Proof

The group \(\mathrm {Sp}(V, b)\) fits into an exact sequence

$$\begin{aligned} 1 \rightarrow {{\text {Hom}}}(V/V^0, V^0) \rightarrow \mathrm {Sp}(V, b) \rightarrow \mathrm {Sp}(V/V^0) \rightarrow 1, \end{aligned}$$

where the left-hand map is given by \(f \mapsto 1+f\).

Now one obtains a closed surface from Y by capping off both boundary components. The mapping class group \(\mathrm {MCG}(Y)\) surjects onto the mapping class group of this closure [16, Prop 3.19]. Therefore (by the surjectivity of the symplectic representation for a closed surface [16, Theorem 6.4]) it surjects onto \(\mathrm {Sp}(V/V^0)\).

Now let \(v \in V\) be a class, not in \(V^0\), which is represented by a simple closed curve in Y; and let b be one of the two boundary components of Y. We can represent \(v + b\) by a simple closed curve as well (possibly after replacing b with \(-b\)). Thus the image of \(\mathrm {MCG}(Y)\) contains the transvections \(T_v\) and \(T_{v+b}\). The composition \(T_{v+b} T_v^{-1}\) is a nontrivial element of \(\mathrm {Sp}(V, b)\), coming from the element

$$\begin{aligned} x \mapsto \langle x, v \rangle b \in {{\text {Hom}}}(V/V^0, V^0). \end{aligned}$$

These generate \({{\text {Hom}}}(V/V^0, V^0)\) so the result follows. \(\square \)
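The identity \(T_{v+b} T_v^{-1} = 1 + \langle \cdot , v \rangle b\) used in the proof can be checked by hand in the smallest case. The sketch below works in a hypothetical rank-3 model of V (genus 1), with basis (a, c, b), \(\langle a, c \rangle = 1\) and b spanning the radical; it assumes the sign convention \(T_v(x) = x + \langle x, v \rangle v\) for transvections, which the text does not spell out.

```python
def pairing(x, y):
    """Degenerate alternating form on Z^3, basis (a, c, b): <a, c> = 1, b in the radical."""
    return x[0] * y[1] - x[1] * y[0]


def transvection(v, x, sign=1):
    """Apply T_v (sign=1) or its inverse (sign=-1): x -> x + sign * <x, v> v."""
    k = sign * pairing(x, v)
    return [xi + k * vi for xi, vi in zip(x, v)]


def shear(v, b, x):
    """Apply the composition T_{v+b} T_v^{-1} to x."""
    v_plus_b = [vi + bi for vi, bi in zip(v, b)]
    return transvection(v_plus_b, transvection(v, x, sign=-1))


def expected(v, b, x):
    """The claimed value x + <x, v> b."""
    k = pairing(x, v)
    return [xi + k * bi for xi, bi in zip(x, b)]
```

Because b lies in the radical and \(\langle v, v \rangle = 0\), the two v-terms cancel and only the multiple of b survives; the functions `shear` and `expected` agree on every input of this model.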

Cut \(Y^1\) along \(\alpha _2\), and let \(Y^2\) be the resulting surface; it is a surface of genus \(g-1\) with one boundary component. The pullback of the cover \(Z \rightarrow Y\) to \(Y^2\) splits, i.e., becomes a disjoint union of q copies of \(Y^2\). We can recover Y from \(Y^2\) by gluing to \(Y^2\) a torus with one boundary component. Thus our discussion has shown that it is possible to put any \({\text {Aff}}(q)\)-cover \(Z \rightarrow Y\) into a normal form: a connected sum of a trivial cover of a genus \(g-1\) surface and a nontrivial cover of a torus (Fig. 2).

Fig. 2. Y as a connected sum

To summarize: let \({\mathsf {S}}_{g-1}\) be a genus-\((g-1)\) surface and let \({\mathsf {T}}\) be a torus. Fix a small open disk D in \({\mathsf {S}}_{g-1}\) and \(D'\) in \({\mathsf {T}}\), and set

$$\begin{aligned} {\mathsf {S}}_{g-1}^{\circ } = {\mathsf {S}}_{g-1} - D,\ \ {\mathsf {T}}^{\circ } = {\mathsf {T}} - D', \end{aligned}$$
(8.8)

so these are, respectively, a surface of genus \(g-1\) with one boundary component and a torus with one boundary component. We identify Y with the genus-g surface obtained by gluing \({\mathsf {S}}_{g-1}^{\circ }\) to \({\mathsf {T}}^{\circ }\) along an identification \(\partial D' \simeq \partial D\). (In relation to the discussion just given, \({\mathsf {S}}_{g-1}^{\circ }\) is homotopy-equivalent to \(Y^2\).)

Proposition 8.5

(Normal form for \({\text {Aff}}(q)\)-covers) Let Z be a singly ramified \({\text {Aff}}(q)\)-cover of Y. Then we may write Y as a connected sum:

$$\begin{aligned} Y = {\mathsf {S}}_{g-1} \# {\mathsf {T}}, \end{aligned}$$

where \({\mathsf {S}}_{g-1}\) is a genus-\((g-1)\) surface and \({\mathsf {T}}\) is a genus-1 surface, satisfying the following properties (with notation as above).

  • The ramification point y belongs to the interior of \({\mathsf {T}}^{\circ }\).

  • The cover \(Z \rightarrow Y\) splits over \({\mathsf {S}}_{g-1}^{\circ }\).

  • The cover \(Z \rightarrow Y\), when restricted to \({\mathsf {T}}^{\circ }\), extends over \({\mathsf {T}}\), i.e. has trivial monodromy around the boundary circle of \({\mathsf {T}}^{\circ }\).

  • With respect to a standard basis for \(\pi _1({\mathsf {T}}-y,*)\), a free group on two generators \(\beta _1, \beta _2\), the monodromy of the cover sends

    • \(\beta _1\) to an element of \({\text {Aff}}(q)\) projecting to a generator for \({{\mathbf {F}}}_q^*\), and

    • \(\beta _2\) to a nonzero element of \({{\mathbf {F}}}_q^+\).

Here \(\beta _1\) is a curve which crosses \(\alpha _1\) once and does not cross \(\alpha _2\); and similarly for \(\beta _2\) (Fig. 3).

Fig. 3. The curves \(\beta _1\) and \(\beta _2\). (The basepoint is the intersection of \(\beta _1\) and \(\beta _2\).)

Thus Z consists of q copies of \({\mathsf {S}}_{g-1}^{\circ }\) glued to a degree-q cover \(\widetilde{{\mathsf {T}}^{\circ }}\) of \({\mathsf {T}}^{\circ }\) along q boundary circles. In the sequel we will use \(\widetilde{{\mathsf {S}}_{g-1}^{\circ }}\) for the cover of \({\mathsf {S}}_{g-1}^{\circ }\) induced by Z.

8.5 Proof of Theorem 8.1

We must show (8.5) has Zariski-dense image. We will perform a series of reductions; the main steps are Lemmas 8.7, 8.9, and 8.10.

Lemma 8.6

The image of \(\pi _1(Y, y)_0\) (see after (8.4) for definition) under the monodromy map

$$\begin{aligned} \pi _1(Y, y)_0 \rightarrow \mathrm {Sp}( H_1^\mathrm {Pr}(Z_i, Y) ) \end{aligned}$$

to any factor \(\mathrm {Sp}(H_1^\mathrm {Pr}(Z_i, Y))\) of the right-hand side of (8.5) is not contained in the center of \(\mathrm {Sp}\).

Proof

We leave the simple topological proof to the reader. [Footnote 6] \(\square \)

Because \(\pi _1(Y, y)_0\) is normal inside \(\mathrm {MCG}(Y-\{y\})_0\) and the symplectic groups are almost simple, Theorem 8.1 follows from the subsequent Lemma:

Lemma 8.7

The monodromy map restricted to \(\mathrm {MCG}(Y-\{y\})_0\),

$$\begin{aligned} {\text {Mon}}{:}\,\mathrm {MCG}(Y-\{y\})_0 \rightarrow \prod _{i=1}^{N} \mathrm {Sp}( H_1^\mathrm {Pr}(Z_i, Y) ), \end{aligned}$$

has Zariski-dense image.

In turn, using Lemma 2.12, this will follow from Lemmas 8.8 and 8.9.

Lemma 8.8

(Distinct covers are distinguished by monodromy around a simple closed curve.) For two non-isomorphic \({\text {Aff}}(q)\)-covers \(Z_1, Z_2\) there exists a simple closed curve \(\eta \) in Y such that the cycle decompositions of the monodromy around \(\eta \) in \(Z_1\) and \(Z_2\) are different.

Proof

Two coverings \(Z_1, Z_2\) define two maps \(\pi _1(Y-y) \rightarrow {\text {Aff}}(q)\). Suppose, first of all, that their projections to \({\mathbf {F}}_q^*\) have different kernels (i.e. are not related by an automorphism of \({\mathbf {F}}_q^*\)). We may find a primitive homology class whose images under the two maps \(f_1, f_2{:}\,H_1(Y, {{\mathbf {Z}}}) \rightarrow {\mathbf {F}}_q^*\) have different orders in \({\mathbf {F}}_q^*\). Indeed, there is a basis \(e_1, \dots , e_r\) for \(H_1(Y, {{\mathbf {Z}}})\) such that the kernel of \(f_1\) equals \((q-1)e_1, e_2, \dots , e_r\); not all of \(e_2, \dots , e_r\) can be in the kernel of \(f_2\), and so at least one of these latter classes suffices. Represent this primitive homology class by a simple closed curve to construct \(\eta \).

Otherwise, the coverings \(Z_1, Z_2\) define maps \(\pi _1(Y-y) \rightarrow {{\mathbf {F}}}_q^*\) having the same kernel. Accordingly, in the algorithm to convert an \({\text {Aff}}(q)\)-cover into a normal form described in Sect. 8.4, we can cut Y along the same curve \(\alpha _1\), as in Sect. 8.4, for both \(Z_1\) and \(Z_2\). We obtain, as before, a surface \(Y^1\) with two boundary components; the covers \(Z_1, Z_2\) define two maps

$$\begin{aligned} g_1, g_2{:}\,H_1(Y^1,{{\mathbf {Z}}}) \longrightarrow {{\mathbf {F}}}_q^+. \end{aligned}$$

If \(g_1\) is not proportional to \(g_2\), we can find a primitive homology class for \(H_1(Y^1,{{\mathbf {Z}}})\) which is in the kernel of one map but not the other. Represent this primitive homology class by a simple closed curve to construct \(\eta \).

Otherwise \(g_1\) and \(g_2\) are proportional, so the two maps \(\pi _1(Y^1) \rightarrow {{\mathbf {F}}}_q^+\) have the same kernel. Therefore, we can cut \(Y^1\) along the same curve \(\alpha _2\) for both \(Z_1\) and \(Z_2\). So we get a decomposition of Y as a connected sum \(Y = {\mathsf {S}}_{g-1} \# {\mathsf {T}}\) as above, such that both \(Z_1\) and \(Z_2\) become trivial on \({\mathsf {S}}_{g-1}\).

Let \(\beta _1\) and \(\beta _2\) be curves on \({\mathsf {T}}\) as in the end of Sect. 8.4. Then both maps \(\pi _1(Y - y) \rightarrow {\text {Aff}}(q)\) send \(\beta _1\) to an element of \({\text {Aff}}(q)\) projecting to a generator for \({{\mathbf {F}}}_q^*\); and they both send \(\beta _2\) to an element of \({{\mathbf {F}}}_q^+\). Each of \({\text {Cov}}_1 (\beta _1)\) and \({\text {Cov}}_2 (\beta _1)\) has a unique fixed point in \({{\mathbf {F}}}_q\); up to conjugation, we may suppose this fixed point is 0. By a further conjugation we may assume that \({\text {Cov}}_1 (\beta _2) = {\text {Cov}}_2 (\beta _2) = 1 \in {{\mathbf {F}}}_q^+\).

So we can write

$$\begin{aligned} {\text {Cov}}_1(\beta _1){:}\,x \mapsto c_1 x \end{aligned}$$

and

$$\begin{aligned} {\text {Cov}}_2(\beta _1){:}\,x \mapsto c_2 x. \end{aligned}$$

If \(Z_1\) and \(Z_2\) are not isomorphic covers, we must have \(c_1 \ne c_2\).

There is a map

$$\begin{aligned} \pi _1(Y-y) \longrightarrow \pi _1({\mathsf {T}}-y) \end{aligned}$$

which is obtained [in the notation of (8.8)] by collapsing \({\mathsf {S}}_{g-1}^{\circ }\) to a point; this gives a map from \(Y-y\) to a surface that is homotopy equivalent to \({\mathsf {T}}-y\).

There exists a simple closed curve \(\eta \in \pi _1(Y - y)\) mapping to \(\beta _1 \beta _2 \beta _1^{-1} \beta _2^{q-c_1}\) under this map \(\pi _1(Y - y) \rightarrow \pi _1({\mathsf {T}} - y)\): see Fig. 4 and its caption.

Then \({\text {Cov}}_1(\eta )\) is trivial but \({\text {Cov}}_2(\eta )\) is not trivial. This concludes the proof. \(\square \)
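The final step can be verified mechanically for small q. The following sketch represents elements of \({\text {Aff}}(q)\) as pairs (a, b) standing for \(x \mapsto ax + b\) on \({{\mathbf {F}}}_q\) (q prime here, for simplicity), with \(\beta _1 \mapsto (x \mapsto cx)\) and \(\beta _2 \mapsto (x \mapsto x+1)\). It assumes that a word \(s_1 s_2 \cdots s_k\) acts as the composition \(s_1 \circ s_2 \circ \cdots \circ s_k\), and that \(c_1\) is identified with its representative in \([1, q-1]\); both conventions are assumptions of this illustration, and the function names are hypothetical.

```python
def compose(f, g, q):
    """Return f after g for affine maps (a, b): x -> a*x + b over F_q."""
    return (f[0] * g[0] % q, (f[0] * g[1] + f[1]) % q)


def inverse(f, q):
    """Inverse of the affine map f over F_q (q prime; needs Python 3.8+ for pow)."""
    a_inv = pow(f[0], -1, q)
    return (a_inv, (-a_inv * f[1]) % q)


def power(f, k, q):
    """k-fold composition of f with itself, for k >= 0."""
    result = (1, 0)
    for _ in range(k):
        result = compose(result, f, q)
    return result


def cov_eta(c, c1, q):
    """Image of beta1 beta2 beta1^{-1} beta2^{q - c1} when beta1 acts by x -> c*x."""
    b1, b2 = (c % q, 0), (1, 1)
    word = [b1, b2, inverse(b1, q), power(b2, q - c1, q)]
    result = (1, 0)
    for s in word:
        result = compose(result, s, q)
    return result
```

Under these conventions the word acts as translation by \(c - c_1\), so it is the identity exactly when \(c = c_1\); for example with \(q = 7\), \(c_1 = 3\), \(c_2 = 5\) the two covers give the identity and translation by 2 respectively.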

Fig. 4. The curve \(\eta \). How to read the picture: follow along the path \(\eta \), starting at the basepoint \(y_0\). Write down a word in the symbols \(\beta _1\) and \(\beta _2\) as follows. Every time \(\eta \) crosses \(\alpha _1\), write \(\beta _1\) or \(\beta _1^{-1}\), depending whether the crossing was in the positive or negative direction. Every time \(\eta \) crosses \(\alpha _2\), write \(\beta _2\) or \(\beta _2^{-1}\). The resulting word is the image of \(\eta \) under the map \(\pi _1(Y - y, y_0) \rightarrow \pi _1({\mathsf {T}} - y, y_0)\), which we readily see is \(\beta _1 \beta _2 \beta _1^{-1} \beta _2^{2}\).

Lemma 8.9

The monodromy map \(\mathrm {Mon}{:}\,\mathrm {MCG}(Y)_{Z_i} \longrightarrow \mathrm {Sp}(H_1^\mathrm {Pr}(Z_i, Y))\) has Zariski-dense image.

We are now reduced to proving Lemma 8.9. Let \(Z=Z_i\) for some fixed i. By the construction of Dehn twists from liftable curves (see discussion at end of Sect. 8.3), as well as Lemma 2.14 on generation by transvections, it is enough to show:

Lemma 8.10

There exists a collection of liftable curves \(A_1, \dots , A_N\) on Y such that:

  (a) the \({\widetilde{A}}_i\) span the primitive homology \(H_1^\mathrm {Pr}(Z, Y);\)

  (b) the graph obtained by connecting \(A_i, A_j\) when \(\widetilde{A_i} \cdot \widetilde{A_j} \ne 0\) is connected.

8.6 Proof of Lemma 8.10

We put the singly ramified \({\text {Aff}}(q)\)-cover \(Z \rightarrow Y\) in a normal form, as explained in Sect. 8.4. Recall notation (D, \(D'\), \({\mathsf {S}}_{g-1}\), \({\mathsf {T}}\), and so forth) from the end of Sect. 8.4. We will produce the curves \(A_i\) by concatenating curves on \({\mathsf {T}}^{\circ }\) and curves on \({\mathsf {S}}_{g-1}^{\circ }\).

Fix a point \(p \in \partial D \cong \partial D'\). Fix a labelling of the points of Z above p by \({{\mathbf {F}}}_q\), compatible with the usual action of \({\text {Aff}}(q)\) for some fixed homomorphism

$$\begin{aligned} {\text {Cov}}{:}\,\pi _1(Y-y, p) \longrightarrow {\text {Aff}}(q). \end{aligned}$$

Recall that the cover \(Z \rightarrow Y\) splits over \({\mathsf {S}}_{g-1}^{\circ }\); the labelling above p therefore permits us also to label the components of the preimage of \({\mathsf {S}}_{g-1}^{\circ }\) in Z by \({{\mathbf {F}}}_q^+\).

Lemma 8.11

There exist \(q+1\) simple closed curves \(\{ \gamma _j{:}\,0 \leqslant j \leqslant q \}\) on \({\mathsf {T}}^{\circ }\), beginning and ending at p, not passing through y, and intersecting \(\partial D'\) only at its endpoints, such that:

  (i) For each j, the monodromy \(\mathrm {Cov}(\gamma _j)\) projects under \({\text {Aff}}(q)\rightarrow {{\mathbf {F}}}_q^*\) to the same fixed generator of \({{\mathbf {F}}}_q^*\).

  (ii) The monodromy of \(\gamma _j\), defining a map \({{\mathbf {F}}}_q \rightarrow {{\mathbf {F}}}_q\), fixes exactly j modulo q.

  (iii) The (unique) lifts \(\gamma _j^+\) to simple closed curves on \(\widetilde{{\mathsf {T}}^{\circ }}\) span the homology of \(\widetilde{{\mathsf {T}}^{\circ }}\) modulo the homology of its boundary.

  (iv) Each \(\gamma _j\) has the same orientation near p, i.e., either the outgoing branch is “above” the incoming branch for all j, or vice versa.

Proof

Take an explicit basis \(\beta _1, \beta _2\) of homology of T, such as was described in Sect. 8.4; conjugating if necessary we can suppose that the monodromy of \(\beta _1\) is \(x \mapsto g x\) (for \(g \in {{\mathbf {F}}}_q^*\) a generator) and the monodromy of \(\beta _2\) is \(x \mapsto x+1\). We can choose this basis in such a way that all powers \(\beta _1 \beta _2^j\) with j non-negative are represented by simple closed curves on \({\mathsf {T}}^{\circ }\), which start and end at p.

The monodromy around \(\beta _1 \beta _2^j\) is given by \(x \mapsto g(x+j)\), which fixes \( \frac{gj}{1-g} \in {{\mathbf {F}}}_q^+\). Write \([\ell ]\) for the unique representative of \(\ell \in {{\mathbf {F}}}_q\) that lies in \([0,q-1]\), and put

$$\begin{aligned} j^* = {\left\{ \begin{array}{ll} [\frac{gj}{1-g}], &{}\quad j \ne q \\ q, &{}\quad j=q. \end{array}\right. } \end{aligned}$$

The map \(j \mapsto j^*\) gives a bijection from [0, q] to itself. Now put

$$\begin{aligned} \gamma _{j^*} = \beta _1 \beta _2^j \ \ (j \in [0,q]). \end{aligned}$$
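As an illustrative check of the indexing (not part of the argument), the sketch below computes \(j^*\) for a prime q and a generator g, and tests two things: that \(x \mapsto g(x+j)\) fixes \(j^*\) modulo q, and that \(j \mapsto j^*\) permutes \([0, q]\). It assumes q is prime with \(g \ne 1\), so that \(1-g\) is invertible; the function names are hypothetical.

```python
def j_star(j, g, q):
    """Index j* attached to beta1 beta2^j, following the displayed definition."""
    if j == q:
        return q
    # The representative in [0, q-1] of g*j / (1 - g) in F_q (Python 3.8+).
    return (g * j * pow((1 - g) % q, -1, q)) % q


def check_indexing(g, q):
    """True if each j* is fixed (mod q) by x -> g(x + j) and j -> j* is a bijection of [0, q]."""
    fixed = all(
        (g * (j_star(j, g, q) + j)) % q == j_star(j, g, q) % q
        for j in range(q + 1)
    )
    bijective = sorted(j_star(j, g, q) for j in range(q + 1)) == list(range(q + 1))
    return fixed and bijective
```

For \(q = 5\) and \(g = 2\) one finds \(j^* = 3j\) modulo 5 for \(j < 5\), so for example \(1^* = 3\), and indeed \(2 \cdot (3 + 1) \equiv 3\) modulo 5.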

Conditions (i) and (ii) are clearly satisfied. To check (iii) we must verify that the associated homology classes span homology of \(\widetilde{{\mathsf {T}}^{\circ }}\) modulo its boundary. One could verify this by computing an explicit CW-complex for \(\widetilde{{\mathsf {T}}^{\circ }}\); we present an alternative group-theoretic proof. It is sufficient to show that

$$\begin{aligned} \text{ the } \text{ homology } \text{ classes } \text{ of } \text{ the } \text{ lifts } \text{ of }\,\gamma _j\,\text{ span }\,H_1(\widetilde{{\mathsf {T}}-y}). \end{aligned}$$
(8.9)

Here and in what follows we make use of the fact that our \({\text {Aff}}(q)\)-cover extends over \({\mathsf {T}}\), and thus write (e.g.) \(\widetilde{{\mathsf {T}}}\). To see that (8.9) indeed implies (iii), consider the diagram

where f is surjective because in fact \(H_1(\widetilde{{\mathsf {T}}-y}) \simeq H_1(\widetilde{{\mathsf {T}}})\): the preimage of y is a single point.

Let \({\tilde{p}}\) be the point above p corresponding to \(0 \in {\mathbf {F}}_q\). Projection to T identifies \(\pi _1(\widetilde{{\mathsf {T}}-y}, {\tilde{p}})\) with the subgroup \({\mathsf {H}} \leqslant \langle \beta _1, \beta _2 \rangle \) defined by

$$\begin{aligned} {\mathsf {H}} = \text{ stabilizer } \text{ of }\,0 \in {{\mathbf {F}}}_q. \end{aligned}$$

Therefore the first homology of \(\widetilde{{\mathsf {T}}-y}\) is the abelianization of \({\mathsf {H}}\). Under this correspondence, the homology class of the lift of \(\gamma _j\) corresponds to the image in \({\mathsf {H}}^{\mathrm {ab}}\) of \((\beta _2^{-j^*}) \beta _1 \beta _2^j (\beta _2^{j^*}) \in {\mathsf {H}}\).

We must therefore show that the elements \( \beta _2^{-j^*} \beta _1 \beta _2^{j+j^*}\) actually generate \({\mathsf {H}}^{\mathrm {ab}}\). Note that among these elements are \(\beta _1\) and (a conjugate of) \(\beta _1 \beta _2^q\), so it is enough to show that

$$\begin{aligned} \beta _2^q\,\text{ and }\,\beta _2^{-j^*} \beta _1 \beta _2^{j+j^*} \ \ (0 \leqslant j \leqslant q-1) \end{aligned}$$
(8.10)

generate \({\mathsf {H}}^{\mathrm {ab}}\). However, a set of left coset representatives for \({\mathsf {H}}\) is given by \(1, \beta _2, \dots , \beta _2^{q-1}\); according to Schreier’s algorithm a generating set for \({\mathsf {H}}\) is given by

$$\begin{aligned} \beta _2^q, \beta _2^{-[g j]} \beta _1 \beta _2^j, j \in [0,q-1]. \end{aligned}$$

Now, considered modulo q, the set of pairs \((-[gj], j) \equiv (-gj, j)\) appearing here coincides with the set of pairs \((-j^*, j+j^*) \equiv (-\frac{gj}{1-g}, \frac{j}{1-g})\) appearing in (8.10). So the elements of (8.10) even generate \({\mathsf {H}}\), not just its abelianization. \(\square \)
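The comparison of exponent pairs in the last step is a finite check, which the following sketch carries out for a prime q. It builds the set of pairs \((-gj, j)\) coming from the Schreier generators and the set of pairs \((-j^*, j + j^*)\) coming from (8.10), both reduced modulo q, and compares them. The function names are hypothetical, and q prime with \(g \ne 1\) is assumed.

```python
def schreier_pairs(g, q):
    """Exponent pairs (-g*j, j) mod q from the Schreier generators."""
    return {((-g * j) % q, j % q) for j in range(q)}


def lemma_pairs(g, q):
    """Exponent pairs (-j*, j + j*) mod q from (8.10), with j* = g*j / (1 - g)."""
    inv = pow((1 - g) % q, -1, q)  # inverse of 1 - g in F_q (Python 3.8+)
    pairs = set()
    for j in range(q):
        js = (g * j * inv) % q
        pairs.add(((-js) % q, (j + js) % q))
    return pairs
```

The two sets agree because \(j + j^* \equiv j/(1-g)\), so the substitution \(j' = j/(1-g)\) carries one family of pairs onto the other.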

We return to the proof of Lemma 8.10. For each primitive homology class in \({\mathsf {S}}_{g-1}^{\circ }\) we fix a representative which is a simple closed curve on \({\mathsf {S}}^{\circ }_{g-1}\) beginning and ending at p. Let W be the resulting collection of simple closed curves. For each \(w \in W\) at least one of the two homotopy classes

$$\begin{aligned} A(w,j) = \gamma _j \cdot w^{\pm 1} \in \pi _1(Y, p) \end{aligned}$$
(8.11)

is representable by a simple closed curve on Y. The choice of sign depends only on w and does not depend on j, in view of property (iv) of the curves \(\gamma _j\). For a picture of the curve \(A(w,j)\), see Fig. 5.

Fig. 5. The curves \(\gamma _j\) and w on Y

The image of this curve in \({\text {Aff}}(q)\) projects to a generator of \({{\mathbf {F}}}_q^*\); therefore it is “liftable” in the sense of Sect. 8.3. Recall also from Sect. 8.3 the notation \(e^+\) for the degree-1 lift of a liftable curve e. The lift of \(A(w,j)\) has homology class given by

$$\begin{aligned}{}[A(w,j)^+] = [ \gamma _j^+] \pm [w_j], \end{aligned}$$

where \(w_j\) means that we lift w to a closed loop on the jth preimage of \({\mathsf {S}}^{\circ }_{g-1}\) inside Z; the sign above is the same as in (8.11).

We have

$$\begin{aligned}{}[A(w,j)^+] - [A(w', j)^+] = \epsilon [w_j] + \epsilon ' [(w')_j] \ (\epsilon , \epsilon ' \in \pm 1) \end{aligned}$$

and we see readily that these classes—as both \(w, w'\) vary through W—span the homology of the jth preimage of \({\mathsf {S}}_{g-1}^{\circ }\) in the cover Z.

The boundary of \(\widetilde{{\mathsf {S}}_{g-1}^{\circ }}\) is a union of q circles. Considering the Mayer–Vietoris sequence

$$\begin{aligned} H_1(S^1)^q \rightarrow H_1(\widetilde{{\mathsf {S}}_{g-1}^{\circ }}) \oplus H_1(\widetilde{{\mathsf {T}}^{\circ }}) \twoheadrightarrow H_1(Z) \end{aligned}$$

and using the fact that the \([\gamma _j^+]\) span \(H_1(\widetilde{{\mathsf {T}}^{\circ }})\) modulo its boundary (Lemma 8.11 (iii)), we see that the \([A(w,j)^+]\) span \(H_1(Z)\).

As before, we define

$$\begin{aligned} \widetilde{A(w,j)} = \text{ projection } \text{ of }\,[A(w,j)^+]\ \text{ to } \text{ primitive } \text{ homology, } \end{aligned}$$

so that the homology classes \(\widetilde{A(w,j)}\) span \(H_1^\mathrm {Pr}(Z, Y)\). This completes the proof of part (a) of Lemma 8.10.

To prove part (b), we need to compute some intersection numbers. We note that the intersection number between any \(\gamma _j^+\) and any \(w_k\) is trivial. Thus

$$\begin{aligned} {[}A(w_1,j)] \cdot [A(w_2,k)]&= [\gamma _j ] \cdot [\gamma _k] \pm [w_1] \cdot [w_2]\\ {[} A(w_1, j)^+] \cdot [A(w_2,k)^+]&= [\gamma _j^+] \cdot [\gamma _k^+] \pm \delta _{jk} [w_1] \cdot [w_2], \end{aligned}$$

where \(\delta _{jk}\) is the Kronecker \(\delta \) symbol, and, in both instances, the sign that appears is the product of the sign for \(w_1\) and the sign for \(w_2\). Upon projecting to primitive homology, (8.6) gives

$$\begin{aligned}&\widetilde{A(w_1, j)} \cdot \widetilde{A(w_2, k)}\\&\quad =\pm ( \delta _{jk} -q^{-1})[w_1] \cdot [w_2] + ([\gamma _j^+] \cdot [\gamma _k^+] - q^{-1} [\gamma _j] \cdot [\gamma _k]). \end{aligned}$$

The connectedness of the “intersection graph” follows from this. It is enough to show that given \((w_1, j)\) and \((w_2, \ell )\) there exists \((w_3, k)\) with both intersection numbers nonzero. For this, we note that the factor \(( \delta _{jk} - q^{-1})\) is never zero, so we simply choose \(w_3\) so that \([w_1] \cdot [w_3]\) and \([w_2] \cdot [w_3]\) are sufficiently large in absolute value: this is possible because \([w_1], [w_2] \ne 0\), the intersection pairing on \(H_1({\mathsf {S}}_{g-1}^{\circ })\) is perfect, and we can choose \(w_3\) such that \([w_3]\) is any given primitive homology class. This proves Lemma 8.10 and Theorem 8.1. \(\square \)
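To make “sufficiently large” concrete, here is a small sketch that evaluates the displayed formula for \(\widetilde{A(w_1, j)} \cdot \widetilde{A(w_2, k)}\) with exact fractions and returns an explicit threshold. It relies only on the bound \(|\delta _{jk} - q^{-1}| \geqslant q^{-1}\); the function names and the way the intersection numbers are passed in are hypothetical, chosen for the illustration.

```python
from fractions import Fraction


def primitive_product(q, j, k, w_dot, gamma_plus_dot, gamma_dot, sign=1):
    """Evaluate sign*(delta_jk - 1/q)*w_dot + (gamma_plus_dot - gamma_dot/q).

    w_dot          -- [w_1] . [w_2]
    gamma_plus_dot -- [gamma_j^+] . [gamma_k^+]
    gamma_dot      -- [gamma_j] . [gamma_k]
    """
    delta = 1 if j == k else 0
    return sign * (delta - Fraction(1, q)) * w_dot + (gamma_plus_dot - Fraction(gamma_dot, q))


def nonvanishing_bound(q, gamma_plus_dot, gamma_dot):
    """Return B such that |w_dot| > B forces primitive_product to be nonzero."""
    # |delta - 1/q| >= 1/q, so |w_dot| / q > |constant term| suffices.
    constant = abs(gamma_plus_dot - Fraction(gamma_dot, q))
    return q * constant
```

With \(q = 5\), \([\gamma _j^+] \cdot [\gamma _k^+] = 2\) and \([\gamma _j] \cdot [\gamma _k] = 3\) the threshold is 7: the product vanishes for \([w_1] \cdot [w_2] = 7\) when \(j \ne k\), and is nonzero as soon as the intersection number exceeds 7 in absolute value.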

9 Transcendence of period mappings; the Bakker–Tsimerman theorem

It is desirable to extend the method to settings where the base Y is higher-dimensional, thus feasibly leading to finiteness results for integral points on Y. We will study the example when \(X \rightarrow Y\) is the moduli space of smooth hypersurfaces in \({{\mathbf {P}}}^m\); then integral points on Y correspond to integral homogeneous polynomials \( P(x_0, \dots , x_{m})\) of degree d whose discriminant \((\mathrm {disc} \ P) \in {\mathcal {O}}^*\).

(A natural family of generalizations of this example is given by considering the integral points on \({\mathbf {P}}^m - Z^{\vee }\), where \(Z \subset {\mathbf {P}}^m\) is a smooth subvariety, and \(Z^{\vee }\) is the dual projective variety to Z: there is a natural smooth projective family over \({\mathbf {P}}^m-Z^{\vee }\), namely, the family of smooth hyperplane sections of Z.)

9.1 The Ax–Schanuel theorem of Bakker and Tsimerman

Suppose that we are given a smooth proper map \(X \rightarrow Y\) of relative dimension d over the complex numbers (we identify complex algebraic varieties with their complex points). The primitive cohomology of each fiber \(H^d(X_y, {{\mathbf {C}}})^{\mathrm {prim}}\) carries a polarized Hodge structure. Let \({\mathfrak {H}}\) be the associated period domain which classifies polarized Hodge structures with the same numerical data as this primitive cohomology, so we have an analytic period map

$$\begin{aligned} \Phi {:}\,{\widetilde{Y}} \longrightarrow {\mathfrak {H}} \end{aligned}$$

where \({\widetilde{Y}}\) is the universal cover of \(Y({{\mathbf {C}}})\). This \({\mathfrak {H}}\) is open (for the analytic topology) in a certain complex flag variety \({\mathfrak {H}}^*\), which parameterizes isotropic flags with given dimensional data inside a certain orthogonal or symplectic complex vector space.

Bakker and Tsimerman [1] have proven the following analogue of the Ax–Schanuel theorem. It is a very strong statement about the transcendence of \(\Phi \).

To simplify the statement, we assume that the monodromy mapping

$$\begin{aligned} \pi _1(Y) \rightarrow \mathrm {Aut}(H^d(X_y, {{\mathbf {C}}})^{\mathrm {prim}}) \end{aligned}$$

has image whose Zariski closure contains the full special orthogonal or symplectic group, stabilizing the intersection form. (This restriction, which guarantees that the image \(\Phi ({\widetilde{Y}})\) is Zariski-dense in \({\mathfrak {H}}^*\), is not important, and in [1] the theorem is formulated for an arbitrary Mumford–Tate domain as target.)

Theorem 9.1

(Theorem of Bakker and Tsimerman.) Suppose that \(V \subset Y \times {\mathfrak {H}}^*\) is algebraic. Write W for the image of \({\tilde{Y}}\) in \(Y \times {\mathfrak {H}}\). Suppose that \(U \subset V \cap W\) is irreducible analytic such that

$$\begin{aligned} \mathrm {codim}_{Y \times {\mathfrak {H}}^*}(U) < \mathrm {codim}_{Y \times {\mathfrak {H}}^*}(V) + \mathrm {codim}_{Y \times {\mathfrak {H}}^*}(W), \end{aligned}$$

where all the codimensions are taken inside \(Y \times {\mathfrak {H}}^*\). Then the projection of U to Y is contained in a proper (“weak Mumford–Tate”) subvariety.

In particular this has the following corollary:

Corollary 9.2

(Transcendence property of period mappings) With notation as above, suppose that \(Z \subset {\mathfrak {H}}^*\) is an algebraic subvariety, and

$$\begin{aligned} \mathrm {codim}_{{\mathfrak {H}}^*}(Z) \geqslant \dim (Y). \end{aligned}$$
(9.1)

Then any irreducible component of \(\Phi ^{-1}(Z)\) is contained inside the preimage, in \({\widetilde{Y}}\), of the complex points of a proper subvariety of Y.

Proof

Let Q be an irreducible component as in the statement of the corollary.

Let \(V = Y \times Z\). The intersection \(W^Z\) of W with \(Y \times (Z \cap {\mathfrak {H}})\), intersection taken in \(Y \times {\mathfrak {H}}\), is an analytic set. Moreover, the image of Q under the analytic map \({\tilde{Y}} \rightarrow Y \times {\mathfrak {H}}\) is contained in \(W^Z\). Therefore, the image of Q is contained in some irreducible component of \(W^Z\), call it U:

$$\begin{aligned} U = \text{ an } \text{ irreducible } \text{ component } \text{ of }\,W \cap \left( Y \times (Z \cap {\mathfrak {H}})\right) . \end{aligned}$$

We apply Theorem 9.1 with this choice of U, V, W. Then

$$\begin{aligned} \mathrm {codim}_{Y \times {\mathfrak {H}}^*} V = \mathrm {codim}_{{\mathfrak {H}}^*} Z \text{ and } \mathrm {codim}_{Y \times {\mathfrak {H}}^*} W = \dim {\mathfrak {H}}^*, \end{aligned}$$

so

$$\begin{aligned} \mathrm {codim}_{Y \times {\mathfrak {H}}^*} W + \mathrm {codim}_{Y \times {\mathfrak {H}}^*} V = \dim {\mathfrak {H}}^* +\mathrm {codim}_{{\mathfrak {H}}^*} Z {\mathop {\geqslant }\limits ^{(9.1)}} \dim {\mathfrak {H}}^* + \dim (Y). \end{aligned}$$

This shows \(\dim U = 0\), unless the projection of U to Y is contained in a proper weak Mumford–Tate subvariety. This implies the same property for Q, as desired. \(\square \)

9.2 Transferring transcendence to a p-adic setting

Theorem 9.1 can be transferred to the p-adic setting, which is where we use it:

With notation as above, suppose additionally that \(X \rightarrow Y\) is defined over \({{\mathbf {Z}}}[S^{-1}]\). Fix \(p \notin S\) and \(y_0 \in Y({{\mathbf {Z}}}_p)\). As before we can form the p-adic period map

$$\begin{aligned} \Phi _p{:}\,\underbrace{ \text{ residue } \text{ disk } \text{ around }\,y_0\,\text{ in }\,Y({{\mathbf {Q}}}_p)}_{U_p} \longrightarrow {\mathfrak {H}}^*_{{{\mathbf {Q}}}_p}, \end{aligned}$$
(9.2)

where \({\mathfrak {H}}^*_{{{\mathbf {Q}}}_p}\) is the base-change of \({\mathfrak {H}}^*\) to \({{\mathbf {Q}}}_p\); the map above is p-adic analytic, i.e., it is given in suitable coordinate charts by power series absolutely convergent on the residue disk.

Now suppose that we give ourselves a \({{\mathbf {Q}}}_p\)-algebraic subvariety \(Z \subset {\mathfrak {H}}^*_{{{\mathbf {Q}}}_p}\) satisfying the dimensional condition (9.1), i.e. the codimension of Z is greater than or equal to the dimension of Y.

Lemma 9.3

Let \(U_p\) be as in (9.2), i.e. \(\{y \in Y({{\mathbf {Z}}}_p){:}\,y \equiv y_0 \text{ modulo } p\}\). The set

$$\begin{aligned} \Phi _p^{-1}(Z) \end{aligned}$$

is not Zariski dense in Y.

Note that \(\Phi _p\) is defined only on the residue disk \(U_p\).

Proof

[Footnote 7] It will be convenient to have the freedom to vary \(y_0\) later in the argument. To that end, note that the statement above depends only on \(U_p\); after all, at the level of points, \(\Phi _p\) is the map sending \(y \in U_p\) to the induced Hodge filtration on the primitive crystalline cohomology of the special fiber of \(X_y\).

By [28, Thm. 7.6] (or by the discussion of Sect. 3.3) the image \(\Phi _p(U_p)\) is contained in a residue disk on \({\mathfrak {H}}_p^*\) containing \(\Phi _p(y_0)\), in particular, in some affine open set \({{\text {Spec}}}A_p\) of \({\mathfrak {H}}_p^*\) containing \(\Phi _p(y_0)\). We may suppose that \(Z \subset {\mathfrak {H}}_{{{\mathbf {Q}}}_p}^*\) is defined locally by equations \(F_i = 0\), where we suppose \(F_i \in A_p\), i.e. the \(F_i\) are regular functions on this affine open set.

Consider now

$$\begin{aligned} G_i = F_i \circ \Phi _p. \end{aligned}$$

These are defined by power series converging absolutely in \(U_p\), i.e. in a suitable choice of local coordinates, \(G_i\) lies in a Tate algebra

$$\begin{aligned} R= {{\mathbf {Q}}}_p \left\langle \frac{x_1}{p}, \dots , \frac{x_N}{p} \right\rangle \end{aligned}$$

of formal power series convergent on a disk of p-adic radius |p|. In these coordinates \(U_p\) corresponds to \((x_1, \dots , x_N) \in (p {{\mathbf {Z}}}_p)^N\). We want to show that the common zero-locus, inside \(U_p\), of the \(G_i\) is contained in (the \({{\mathbf {Q}}}_p\)-points of) an algebraic set. As a preliminary reduction, we will reduce to considering a single “irreducible component” of this common zero locus.
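To make the convergence condition defining R concrete, here is a minimal sketch in the one-variable case \(N = 1\), assuming the standard description of the Tate algebra: a series \(\sum b_k x^k\) lies in \({{\mathbf {Q}}}_p \langle x/p \rangle \) exactly when \(v_p(b_k) + k \rightarrow \infty \). The helper names and the two sample series are illustrative choices, not taken from the text, and a finite range of k can only suggest (not prove) the limiting behaviour.

```python
from fractions import Fraction

def vp(x: Fraction, p: int) -> int:
    """p-adic valuation of a nonzero rational number."""
    if x == 0:
        raise ValueError("valuation of 0 is +infinity")
    v = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def weighted_valuations(coeff, p, degrees):
    """Return v_p(b_k) + k for each k; membership in R needs this to tend to infinity."""
    return [vp(coeff(k), p) + k for k in degrees]

p = 5  # illustrative prime
degrees = range(1, 201)

# log(1 + x) = sum_{k>=1} (-1)^(k+1) x^k / k: here v_p(b_k) + k = k - v_p(k), which grows.
log_vals = weighted_valuations(lambda k: Fraction((-1) ** (k + 1), k), p, degrees)

# sum_{k>=1} x^k / p^k: here v_p(b_k) + k = 0 for every k, so the series is not in R.
geo_vals = weighted_valuations(lambda k: Fraction(1, p ** k), p, degrees)
```

The first series converges on the whole disk \(p{{\mathbf {Z}}}_p\), while the second already fails to converge at \(x = p\).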

Fix a suitable open affine set \({{\text {Spec}}}\ B_p \subset Y_{{{\mathbf {Q}}}_p}\) “containing the residue disc of \(y_0\).” (More precisely, we may fix an open affine neighbourhood in Y, considered now as \({{\mathbf {Z}}}_p\)-scheme, of the image of the \({{\mathbf {Z}}}_p\)-valued point \(y_0{:}\,{{\text {Spec}}}\ {{\mathbf {Z}}}_p \rightarrow Y\), and take its generic fiber.) Then there is a morphism from \(B_p\) to R. Our result will now follow from the

Claim: Let \({\mathfrak {p}}\) be a prime ideal of R, vanishing at some point of \(U_p\). Suppose that \({\mathfrak {p}}\) is minimal among prime ideals containing \(\langle G_1, \dots , G_k \rangle \). Then \({\mathfrak {p}}\) contains (the image in R of) a regular function H, i.e. a function H belonging to \(B_p\) as above.

To see why this implies the statement of the lemma, assume the Claim. There are only finitely many such minimal primes as in the statement. Call them \({\mathfrak {p}}_1, \dots , {\mathfrak {p}}_t\). Let \(H_j \in {\mathfrak {p}}_j\) be the function constructed according to the claim above. Then the vanishing locus of \(\prod _j H_j\) contains the common vanishing locus of the \(G_i\): if y lies in this common zero-locus, it lies in the vanishing locus of some \({\mathfrak {p}}_j\), and then \(H_j(y) = 0\).

We now prove the Claim. The ideal \({\mathfrak {p}}\) vanishes at some point of \(U_p\) by assumption; choose such a point \(y_0\).

We now transfer the question to the complex numbers. We fix an isomorphism \(\sigma {:}\,\overline{{{\mathbf {Q}}}_p} \simeq {{\mathbf {C}}}\), which gives in particular an embedding \(\sigma {:}\,{{\mathbf {Q}}}_p \hookrightarrow {{\mathbf {C}}}\). Then \(y_0\) gives rise to a complex point \(y_0^{\sigma } \in Y({{\mathbf {C}}})\), and the de Rham cohomology of \(X_{y_0}^{\sigma }\) is obtained from that of \(X_{y_0}\) via \(\sigma \):

$$\begin{aligned} H^*_{\mathrm {dR}}(X_{y_0}) \otimes _{({{\mathbf {Q}}}_p, \sigma )} {{\mathbf {C}}}= H^*_{\mathrm {dR}}(X_{y_0^{\sigma }}/{{\mathbf {C}}}). \end{aligned}$$
(9.3)

We may regard the period map \(\Phi _p\) as taking values in the Grassmannian \({\mathfrak {H}}^*_{{{\mathbf {Q}}}_p}\) for the left-hand de Rham cohomology. Also let \(U_{{{\mathbf {C}}}}\) be a small complex neighbourhood of \(y_0^{\sigma }\) and let \(\Phi _{{{\mathbf {C}}}}{:}\,U_{{{\mathbf {C}}}} \rightarrow {\mathfrak {H}}_{{{\mathbf {C}}}}^*\) be the complex period mapping, which we regard as taking values in the associated complex variety \({\mathfrak {H}}^*_{{{\mathbf {C}}}} := \left( {\mathfrak {H}}^*_{{{\mathbf {Q}}}_p}\right) ^{\sigma }\). This complex variety parameterizes certain flags inside the right-hand space of (9.3). Note the identification \(\Phi _{{{\mathbf {C}}}}(y_0^{\sigma }) = \Phi _p(y_0)^{\sigma }\).

Now Z gives rise to an algebraic subvariety \(Z^{\sigma } \subset {\mathfrak {H}}^*_{{{\mathbf {C}}}}\) and this subvariety again satisfies condition (9.1). The functions \(F_i\) are regular on an open affine containing \(\Phi _p(y_0)\); correspondingly we obtain \(F_i^{\sigma }\) on an affine open in \({\mathfrak {H}}_{{{\mathbf {C}}}}^*\) containing \(\Phi _{{{\mathbf {C}}}}(y_0^{\sigma })\), which locally cut out \(Z^{\sigma }\).

Ignoring convergence for a moment, regard the \(G_i\) in the completed local ring of \(Y_{{{\mathbf {Q}}}_p}\) at \(y_0\). This is a formal power series ring over \({{\mathbf {Q}}}_p\), and \(\sigma \) induces an injection from this completed local ring to the corresponding completed local ring of \(Y_{{{\mathbf {C}}}}\) at \(y_0^{\sigma }\); call this map \(G \mapsto G^{\sigma }\). Then we have in fact

$$\begin{aligned} G_i^{\sigma } = \text{ power } \text{ series } \text{ expansion } \text{ of }\,F_i^{\sigma } \circ \Phi _{{{\mathbf {C}}}}\,\text{ at }\,y_0^{\sigma }. \end{aligned}$$
(9.4)

This follows from the same analysis as in Sect. 3.3, or, phrased informally, from the fact that the complex and p-adic period maps satisfy the same differential equation.

It follows from (9.4) that the \(G_i^{\sigma }\), a priori complex formal power series, are in fact convergent in a small complex neighbourhood of \(y_0^{\sigma }\); their common vanishing locus for a sufficiently small such neighbourhood V coincides with \(\Phi _{{{\mathbf {C}}}}^{-1}(Z^{\sigma }) \cap V\).

Corollary 9.2, applied to \(Z^{\sigma } \subset {\mathfrak {H}}^*_{{{\mathbf {C}}}}\), shows that \(\Phi _{{{\mathbf {C}}}}^{-1}(Z^{\sigma }) \cap V \subset Y_{{{\mathbf {C}}}}\) is not Zariski dense in \(Y_{{{\mathbf {C}}}}\). Indeed, after analytically continuing \(\Phi _{{{\mathbf {C}}}}\) from V to a universal cover of \(Y_{{{\mathbf {C}}}}\), there are only finitely many irreducible components of \(\Phi _{{{\mathbf {C}}}}^{-1}(Z^{\sigma })\) which intersect V (by local finiteness of irreducible components of an analytic set). We can apply Corollary 9.2 to each of them to conclude that the common zero-locus of \(G_i^{\sigma }\) on V is contained in the zero locus of some algebraic function G (i.e., G arises from a regular function on a Zariski-open subset of \(Y_{{{\mathbf {C}}}}\) containing V).

Consider the ring \(R_{{{\mathbf {C}}}} = {{\mathbf {C}}}\{x_1, \dots , x_n\}\) of formal power series that are convergent in some neighbourhood of 0. Given an ideal I of this ring, we can associate a germ V(I) of an analytic set at the origin. The locally analytic Nullstellensatz [9, §3.4] asserts that the ideal of functions vanishing along this germ is precisely the radical \(\sqrt{I}\) of I.

We apply this with \(R_{{{\mathbf {C}}}}\) the ring of germs of holomorphic functions near \(y_0^{\sigma } \in Y_{{{\mathbf {C}}}}\), taking I to be the ideal generated by the \(G_i^{\sigma }\). Then \(\sqrt{I}\) is the ideal of functions vanishing on V(I) and in particular contains G. Thus \(G^m \in I\) for some \(m \geqslant 1\).

Therefore the ideal spanned by \(G_i^{\sigma }\) inside the ring of locally convergent power series contains the image of an algebraic function, i.e. a regular function on some Zariski-open subset of \(Y_{{{\mathbf {C}}}}\) containing \(y_0^{\sigma }\). The same is then a fortiori true if we replace “locally convergent” by “formal,” and this latter assertion can be carried back, via \(\sigma ^{-1}\), to \(Y_{\overline{{{\mathbf {Q}}}_p}}\). Thus, there is a regular function H, in a neighbourhood of \(y_0\) on \(Y_{\overline{{{\mathbf {Q}}}_p}}\), belonging to the ideal

$$\begin{aligned} H \in \langle G_1, \dots , G_k \rangle \end{aligned}$$
(9.5)

generated by the \(G_i\) in the completed local ring \(\widehat{{\mathcal {O}}}\) of \(Y_{\overline{{{\mathbf {Q}}}_p}}\) at \(y_0\).

By taking a norm we may suppose that H in fact arises from a regular function in a neighbourhood of \(y_0\) on \(Y_{{{\mathbf {Q}}}_p}\). Without loss of generality (multiplying by a suitable denominator if necessary), we may suppose that H is regular on the chosen open affine around \(y_0\), i.e., \(H \in B_p\). Note that \(B_p \otimes \overline{{{\mathbf {Q}}}_p}\) surjects onto each quotient \(\widehat{{\mathcal {O}}}/{\mathfrak {m}}_{\widehat{{\mathcal {O}}}}^t\) (where \({\mathfrak {m}}_{\widehat{{\mathcal {O}}}}\) is the maximal ideal). Therefore, for each \(t \geqslant 1\), there are \(Z_1, \dots , Z_k \in B_p \otimes \overline{{{\mathbf {Q}}}_p}\) such that

$$\begin{aligned} H \in \sum Z_i G_i + {\mathfrak {m}}_{\widehat{{\mathcal {O}}}}^t. \end{aligned}$$
(9.6)

By linear algebra we see that we can even choose \(Z_i \in B_p\).

The function H then defines a rigid-analytic function on the residue disk of \(y_0\). Thus H and \(G_i\) both lie inside the Tate algebra R previously defined. Recall that we have fixed a prime ideal \({\mathfrak {p}}\) of R, contained in the maximal ideal \({\mathfrak {m}}\) associated to \(y_0\), and containing the ideal J generated by the \(G_i\) inside R.

Now (9.6) implies that

$$\begin{aligned} H \in J + {\mathfrak {m}}^t \end{aligned}$$

for every \(t \geqslant 1\). Then the image of H in \(R/{\mathfrak {p}}\) lies in the intersection \(\bigcap _{t \geqslant 1} {\mathfrak {m}}^t\). Krull’s intersection theorem, applied to the Noetherian integral domain \(R/{\mathfrak {p}}\), implies that the intersection of powers of \({\mathfrak {m}}\) is trivial. Therefore \(H \in {\mathfrak {p}}\), as desired. \(\square \)

10 Bounds on points with good reduction

Let \(\pi {:}\,X \rightarrow Y\) be a smooth proper morphism over \({{\mathbf {Z}}}[S^{-1}]\), whose fibers are geometrically connected of relative dimension d. The goal of this section is to bound \(Y({{\mathbf {Z}}}[S^{-1}])\) by means of the same general techniques we have used elsewhere in the paper, i.e., by studying the variation of p-adic Galois representations of the fibers. We refer the reader to the Introduction (Sect. 1) for a discussion of the methods and how they compare with the curve case; the main difference in this general setting is that the linear algebra arguments required to avoid semisimplicity are much more elaborate, and are discussed in Sect. 11.

10.1 Statement of the result

Fix \(y_0 \in Y({{\mathbf {C}}})\), with fiber \(X_0\) and set \({\mathsf {V}}_0 = H^d(X_{0,{{\mathbf {C}}}}, {{\mathbf {Q}}})^{\mathrm {prim}}\). This is equipped with an intersection form \(\langle -, - \rangle \). Assume that the image of

$$\begin{aligned} \pi _1(Y_{{{\mathbf {C}}}},y_0) \rightarrow {{\text {Aut}}}\left( {\mathsf {V}}_0 \otimes {{\mathbf {C}}}, \langle -, -\rangle \right) \end{aligned}$$
(10.1)

has Zariski closure containing the identity component of the right-hand group.

The Hodge structure on \({\mathsf {V}}_0\) induces a weight-zero Hodge structure on

$$\begin{aligned} \mathrm {Lie} \ \ \mathrm {GAut}\left( {\mathsf {V}}_0 \otimes {{\mathbf {C}}}, \langle -, - \rangle \right) \simeq {{\mathbf {C}}}\oplus \mathrm {Sym}^2 {\mathsf {V}}_0 \text{ or } {{\mathbf {C}}}\oplus \wedge ^2 {\mathsf {V}}_0, \end{aligned}$$
(10.2)

according to the parity of \(\langle -, - \rangle \). We will refer to this as the adjoint Hodge structure to distinguish it from the Hodge structure on \({\mathsf {V}}_0 \otimes {{\mathbf {C}}}\).

Let \(h^p\) be the dimension of the Hodge component \((p,-p)\) in the adjoint Hodge structure.Footnote 8 For any \(E \in {\mathbf {Z}}_{\geqslant 0}\) that is at most the dimension of the adjoint Hodge structure, let

$$\begin{aligned} T(E) = \text{ sum } \text{ of } \text{ the } \text{ topmost } E \text{ Hodge } \text{ numbers. } \end{aligned}$$

Here the Hodge numbers are the list of integers p for which \(h^{p} \ne 0\), each written with multiplicity \(h^p\); thus, for example, if \(p_{\max }\) is the largest p for which \(h^{p} \ne 0\), then \(T(1) = p_{\max }\), and if \(h^{p_{\max }} > 1\) then \(T(2)=2p_{\max }\).

We can extend T to be a continuous piecewise linear function \([0, \sum _{j \in {\mathbf {Z}}} h^j] \rightarrow {\mathbf {R}}_{\geqslant 0}\) such that \(T(0) = 0\), and with derivative specified as

$$\begin{aligned} T'(x) = {\left\{ \begin{array}{ll} p_{\max }&{}\quad \text{ for }\,x \in (0, h^{p_{\max }}),\\ p_{\max }-1 &{}\quad \text{ for }\,x \in (h^{p_{\max }}, h^{p_{\max }}+h^{p_{\max }-1}), \\ \text{ and } \text{ so } \text{ forth. } &{} \\ \end{array}\right. } \end{aligned}$$
(10.3)
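The definition of T and its piecewise-linear extension (10.3) can be sketched as follows. This is a minimal illustration under the assumption that the adjoint Hodge numbers are given as a dictionary \(\{p: h^p\}\); the function names and the sample Hodge numbers are hypothetical and chosen only to exercise the definition.

```python
from fractions import Fraction

def hodge_list(h):
    """Expand {p: h^p} into the list of p's with multiplicity h^p, largest first."""
    out = []
    for p in sorted(h, reverse=True):
        out.extend([p] * h[p])
    return out

def T(x, h):
    """Piecewise-linear T: sum of the topmost x Hodge numbers, x possibly fractional."""
    ps = hodge_list(h)
    x = Fraction(x)
    if not 0 <= x <= len(ps):
        raise ValueError("x must lie in [0, total dimension]")
    whole = x.numerator // x.denominator  # floor(x)
    total = Fraction(sum(ps[:whole]))
    if whole < len(ps):
        # on the interval (whole, whole + 1) the slope is the next Hodge number in the list
        total += (x - whole) * ps[whole]
    return total

# Hypothetical symmetric adjoint Hodge numbers, for illustration only.
h_example = {2: 1, 1: 3, 0: 5, -1: 3, -2: 1}
```

With these sample values the list of Hodge numbers is 2, 1, 1, 1, 0, ..., so the slope of T drops from 2 to 1 after the first unit interval, as in (10.3).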

The transcendence property of period mappings is an essential ingredient in the following theorem. It says that integral points on the base are not Zariski dense whenever the adjoint Hodge structure is quite “spread out,” that is to say, whenever the contribution of large |p| to the total dimension \(\sum h^{p}\) is large.

Theorem 10.1

Let \(\pi {:}\,X \rightarrow Y\) be a smooth proper morphism over \({{\mathbf {Z}}}[S^{-1}]\), whose fibers are geometrically connected of relative dimension d. With notation as above, suppose that the monodromy representation has large image, i.e. that (10.1) is satisfied, and moreover that

$$\begin{aligned} \sum _{p >0} h^p \geqslant h^0 + \dim (Y) \end{aligned}$$
(10.4)

and

$$\begin{aligned} \sum _{p>0} p h^{p} > T\left( h^{0}+\dim (Y)\right) + T\left( \frac{3}{2} h^{0}+\dim (Y)\right) . \end{aligned}$$
(10.5)

Then \(Y({{\mathbf {Z}}}[S^{-1}])\) is not Zariski dense in Y.

If we assume, moreover, that the monodromy representation for any subvariety \(Y' \subset Y\) continues to have large imageFootnote 9 [see (10.1)], then in fact \(Y({{\mathbf {Z}}}[S^{-1}])\) is finite.

Roughly speaking, a condition of type (10.4) is easily seen to be necessary for our method: with reference to the discussion of Sect. 1.2 we want Y to be transverse to all orbits of a certain group \(\mathrm {Z}(\phi )\) on a flag variety; the dimension of the flag variety is \(\sum _{p > 0} h^p\), and in our argument we shall bound the dimension of \(\mathrm {Z}(\phi )\) above by \(h^0\). Condition (10.5) is in practice much more restrictive and is needed to control semisimplification.

The combinatorial machinations that give rise to inequality (10.5) could probably be greatly optimized. We aimed to give a treatment that was fairly short, at some cost to the sharpness of the results. Informally speaking, the condition says that the Hodge diamond of Y is not very concentrated near the middle.
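For a given list of adjoint Hodge numbers, the two numerical hypotheses of Theorem 10.1 can be tested mechanically. The following is a rough sketch, assuming the Hodge numbers are supplied as a dictionary \(\{p: h^p\}\) and using the inequalities exactly as displayed in (10.4) and (10.5); the function names and both sample inputs are hypothetical. It says nothing about the monodromy hypothesis (10.1), which must be checked separately.

```python
from fractions import Fraction

def T(x, h):
    """Sum of the topmost x Hodge numbers (piecewise linear in x), for h = {p: h^p}."""
    ps = [p for p in sorted(h, reverse=True) for _ in range(h[p])]
    x = Fraction(x)
    k = x.numerator // x.denominator  # floor(x)
    return sum(ps[:k]) + ((x - k) * ps[k] if k < len(ps) else 0)

def check_theorem_hypotheses(h, dim_Y):
    """Return (cond_104, cond_105); cond_105 is None when T cannot be evaluated."""
    h0 = h.get(0, 0)
    total = sum(h.values())
    cond_104 = sum(m for p, m in h.items() if p > 0) >= h0 + dim_Y
    lhs_105 = sum(p * m for p, m in h.items() if p > 0)
    a = h0 + dim_Y
    b = Fraction(3, 2) * h0 + dim_Y
    # T is only defined on [0, total]; both arguments are in range whenever (10.4) holds.
    if a > total or b > total:
        return cond_104, None
    return cond_104, lhs_105 > T(a, h) + T(b, h)

# Hypothetical inputs: one "spread out" and one concentrated near p = 0.
spread = {3: 10, 2: 10, 1: 10, 0: 2, -1: 10, -2: 10, -3: 10}
concentrated = {1: 2, 0: 20, -1: 2}
```

In the first sample both inequalities hold; in the second, (10.4) already fails because \(h^0\) dominates.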

10.2 Application to hypersurfaces

We will now outline the proof of the following statementFootnote 10:

Proposition 10.2

There exist an integer \(n_0\) and a function \(D_0(n)\) such that both (10.4) and (10.5) apply to \(X \rightarrow Y\) the universal family of hypersurfaces in \({\mathbf {P}}^n\) of degree d, so long as \(n \geqslant n_0\) and \(d \geqslant D_0(n)\).

Numerical experiments suggest that \(n_0 \approx 60\) will do. Note that this family indeed has large monodromy image by [3].

We must emphasize that, in this case, the dimension of Y is very large, and so the statement that \(Y({{\mathbf {Z}}}[S^{-1}])\) is not Zariski dense is very modest indeed; but it seems to us an interesting first step, and potentially one can then iterate the argument by replacing Y by the Zariski closure of integral points. As suggested by the last line of the Theorem, it becomes relevant to analyze the following question:

What is the smallest possible codimension of a subvariety \(Y' \subset Y\) along which the monodromy drops?

In outline, the proof of Proposition 10.2 is as follows. It can be verified (we will omit the proof) that the middle Hodge numbers \(h^{pq}\) of a degree-d hypersurface inside \({\mathbf {P}}^n\) satisfy

$$\begin{aligned} h^{pq}(d) \sim \frac{d^n}{n!} A(n,p) \end{aligned}$$
(10.6)

where \(p+q=n-1\), and \(A(n,p)\) is the Eulerian number: the number of permutations \(\sigma {:}\,\{1, \dots , n\} \rightarrow \{1, \dots , n\}\) with the property that \(\sigma (i+1) > \sigma (i)\) for precisely p values of i. (Here we fix the dimension n of the ambient projective space, and the meaning of \(\sim \) is that the ratio approaches 1 as \(d \rightarrow \infty \).) Now consider \(\alpha _p := \frac{1}{n!} A(n, p)\), which defines a probability distribution on \(p \in \{0, \dots , n-1\}\). The conclusion will be deduced, in essence, from the fact that \(\alpha _p\) is well approximated by a normal distribution with mean n/2 and variance n/12. We now describe the details.
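The asymptotic (10.6) can be compared with exact values in small cases. The sketch below assumes Griffiths's classical description of the primitive cohomology of a smooth hypersurface via the Jacobian ring, which is not spelled out in the text: \(h^{p,q}_{\mathrm {prim}}\) is the coefficient of \(t^{(q+1)d-(n+1)}\) in \((1 + t + \dots + t^{d-2})^{n+1}\). The function names are illustrative.

```python
from math import factorial

def eulerian_row(n):
    """Eulerian numbers A(n, 0..n-1): permutations of {1..n} with exactly p ascents."""
    row = [1]
    for m in range(2, n + 1):
        prev = row + [0]
        # recurrence A(m, k) = (k + 1) A(m-1, k) + (m - k) A(m-1, k-1)
        row = [(k + 1) * prev[k] + (m - k) * (prev[k - 1] if k > 0 else 0)
               for k in range(m)]
    return row

def primitive_hodge_numbers(n, d):
    """h^{p,q}_prim (p + q = n - 1) of a smooth degree-d hypersurface in P^n, for p = 0..n-1."""
    poly = [1]
    base = [1] * (d - 1)  # 1 + t + ... + t^(d-2); requires d >= 2
    for _ in range(n + 1):
        new = [0] * (len(poly) + len(base) - 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                new[i + j] += a * b
        poly = new
    out = []
    for p in range(n):
        q = n - 1 - p
        deg = (q + 1) * d - (n + 1)
        out.append(poly[deg] if 0 <= deg < len(poly) else 0)
    return out

def asymptotic_ratio(n, d):
    """Ratio of the exact h^{p,q} to the prediction d^n / n! * A(n, p) of (10.6)."""
    A = eulerian_row(n)
    h = primitive_hodge_numbers(n, d)
    return [h[p] * factorial(n) / (d ** n * A[p]) for p in range(n)]
```

For example, `primitive_hodge_numbers(3, 4)` recovers the primitive Hodge numbers 1, 19, 1 of a quartic surface, and the ratios returned by `asymptotic_ratio(3, d)` approach 1 as d grows.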

First, consider the Hodge numbers \(h^p\) for the adjoint Hodge structure. Since the dimension of the symmetric or exterior square of a k-dimensional vector space equals \(\frac{k^2 \pm k}{2}\), we have

$$\begin{aligned} 2 h^{p} = \sum _{p_1 + p_2 = p + (n-1)} h^{p_1, q_1} h^{p_2, q_2} \pm h^{(p+n-1)/2,(-p+n-1)/2} \end{aligned}$$

where in all cases \(p_1+q_1=p_2+q_2=n-1\). In particular, we deduce that

$$\begin{aligned} h^p \sim d^{2n} \frac{1}{2} \underbrace{ \sum _{p_1 -p_2 = p} \alpha _{p_1} \alpha _{p_2}}_{\beta _p}. \end{aligned}$$
(10.7)
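The formula for \(2h^p\) can be evaluated directly from the Hodge numbers of the fiber. The following is a minimal sketch, assuming the input is the list \(h^{0,w}, h^{1,w-1}, \dots , h^{w,0}\) with \(w = n-1\); the sign is \(-1\) for the exterior square (orthogonal case) and \(+1\) for the symmetric square (symplectic case). The function name is illustrative, and the extra summand \({{\mathbf {C}}}\) in (10.2) is not included, so one should add 1 to \(h^0\) when working with \(\mathrm {GAut}\).

```python
def adjoint_hodge_numbers(h, sign):
    """Hodge numbers of Sym^2 (sign=+1) or wedge^2 (sign=-1) of V, shifted to weight 0.

    h[p] is h^{p, w-p} for p = 0..w. Returns {p: h^p} for p = -w..w, using
    2 h^p = sum_{p1 + p2 = p + w} h[p1] h[p2] + sign * (diagonal term).
    """
    w = len(h) - 1
    out = {}
    for p in range(-w, w + 1):
        s = p + w  # the value of p1 + p2
        total = sum(h[p1] * h[s - p1] for p1 in range(w + 1) if 0 <= s - p1 <= w)
        # the diagonal term h^{(p+w)/2, (-p+w)/2} only occurs when p + w is even
        diag = h[s // 2] if s % 2 == 0 else 0
        out[p] = (total + sign * diag) // 2
    return out

# Primitive Hodge numbers of a quartic surface in P^3 (orthogonal case, wedge^2).
adj_k3 = adjoint_hodge_numbers([1, 19, 1], -1)
```

In this example the values sum to \(\frac{21 \cdot 20}{2} = 210\), the dimension of the exterior square of a 21-dimensional space, as expected.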

Next, note that the dimension of the moduli space of degree d hypersurfaces in \({\mathbf {P}}^n\) is given by

$$\begin{aligned} {n+d \atopwithdelims ()d-1} -1 = \frac{d (d+1) \dots (d+n)}{(n+1)!} -1 \sim \frac{d^{n+1}}{(n+1)!} \end{aligned}$$

where the meaning of \(\sim \) is as before. In particular, for any fixed \(n \geqslant 2\),

$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{\dim Y}{h^0} = 0, \end{aligned}$$
(10.8)

where \(h^0\) is the zeroth Hodge number of the adjoint Hodge structure.

Let X(n) (or just X for short) be the random variable which sends a uniformly distributed random permutation \(\sigma \) of \(\{1, \dots , n\}\) to the number of i for which \(\sigma (i+1) > \sigma (i)\), minus \(\frac{n-1}{2}\). Write \(y_i\) (\(1 \le i \le n-1\)) for the random variable, on the same space, with value 1/2 if \(\sigma (i+1)> \sigma (i)\), and \(-1/2\) if \(\sigma (i+1) < \sigma (i)\). Thus \(X = \sum y_i\) and the expectation \({\mathbb {E}}(X)\) is zero. The variance of X is then given by

$$\begin{aligned} \text{ Var }(X) = \sum _{i,j} {\mathbb {E}}(y_i y_j) = \underbrace{ \frac{n-1}{4} }_{i=j} - \underbrace{ 2 \frac{n-2}{12} }_{|i-j|=1} = \frac{n+1}{12}. \end{aligned}$$
(10.9)
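The identity (10.9) can be checked exactly for small n. The sketch below computes the variance of X(n) from the Eulerian numbers using rational arithmetic, and separately brute-forces the correlation \({\mathbb {E}}(y_1 y_2) = -\frac{1}{12}\) that produces the \(|i-j|=1\) term; the function names are illustrative, and the check only covers the finite range of n one chooses to run.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def eulerian_row(n):
    """Eulerian numbers A(n, 0..n-1): permutations of {1..n} with exactly p ascents."""
    row = [1]
    for m in range(2, n + 1):
        prev = row + [0]
        row = [(k + 1) * prev[k] + (m - k) * (prev[k - 1] if k > 0 else 0)
               for k in range(m)]
    return row

def ascent_variance(n):
    """Exact variance of X(n) = (#ascents) - (n-1)/2 for a uniform random permutation."""
    mean = Fraction(n - 1, 2)
    return sum(Fraction(a, factorial(n)) * (p - mean) ** 2
               for p, a in enumerate(eulerian_row(n)))

def adjacent_correlation(n):
    """Brute-force E(y_1 y_2) over all permutations of n symbols (needs n >= 3)."""
    half = Fraction(1, 2)
    total = Fraction(0)
    for s in permutations(range(n)):
        y1 = half if s[1] > s[0] else -half
        y2 = half if s[2] > s[1] else -half
        total += y1 * y2
    return total / factorial(n)
```

Note that the formula \(\frac{n+1}{12}\) requires \(n \geqslant 2\), since the count \(n-2\) of adjacent pairs is only meaningful in that range.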

Now let \(X'(n)\) be the random variable obtained by convolving X(n) with itself, i.e. by adding together two independent copies of X(n). Then

$$\begin{aligned} \text{ Var }(X') =2 \text{ Var }(X) = \frac{n+1}{6},\, \text{ and } \text{ the } \text{ probability } \text{ that }\,(X' =p) = \beta _p,\nonumber \\ \end{aligned}$$
(10.10)

where \(\beta _p\) is as in (10.7). Moreover, it is also known (see [8] for discussion and references to the literature) that as \(n \rightarrow \infty \),

$$\begin{aligned}&X(n)/\sqrt{n}\,\nonumber \\&\quad \text{ converges } \text{ in } \text{ distribution } \text{ to } \text{ a } \text{ normal } \text{ distribution } \text{ with } \text{ variance }\,1/12.\nonumber \\ \end{aligned}$$
(10.11)

and it follows then that \(X'(n)/\sqrt{n}\) converges in distribution to a normal distribution with variance 1/6. It follows in particular that

$$\begin{aligned} \sum _{p> 0} p \beta _p > A \sqrt{n} \end{aligned}$$
(10.12)

for some absolute \(A >0\). We also need:

Lemma 10.3

For sufficiently large n, we have \(\beta _0 < \frac{40}{\sqrt{n}}\).

Proof

The sequence \(\beta _p\) is symmetric and log-concave. The symmetry follows readily from the definition, whereas the second statement follows from the classical fact that the Eulerian numbers are log-concave. (See, for example, [20, Thms 1.4, 3.3].)

Let \(c = \frac{1}{40}\). This number is chosen to be less than the density of the normal distribution with mean zero and variance 1/6, at the point 1.1. From the convergence in distribution of \(X'\), it follows that for all large enough n there exists \(P > \sqrt{n}\) with the property that \(\beta _{P} > \frac{c}{\sqrt{n}}\).

We show that \(\beta _0\leqslant \frac{c^{-1}}{\sqrt{n}}\). Suppose not; then log-concavity means

$$\begin{aligned} \beta _p > \frac{ (1/c)^{1-p/P} c^{p/P} }{\sqrt{n}} \end{aligned}$$

for all \(p \in [0, P]\). In particular, this implies that \(\beta _p > \frac{1}{\sqrt{n}}\) whenever \(|p| \le P/2\). This contradicts \(\sum \beta _p = 1\) for large enough n. \(\square \)
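The quantities entering (10.12) and Lemma 10.3 can be computed exactly for moderate n. The following is a sketch, with illustrative function names, that builds \(\beta _p\) as the convolution in (10.7) and reports \(\beta _0 \sqrt{n}\) and \(\frac{1}{\sqrt{n}} \sum _{p>0} p \beta _p\). It is a numerical sanity check on a finite range, not a substitute for the asymptotic argument.

```python
from fractions import Fraction
from math import factorial, sqrt

def eulerian_row(n):
    """Eulerian numbers A(n, 0..n-1): permutations of {1..n} with exactly p ascents."""
    row = [1]
    for m in range(2, n + 1):
        prev = row + [0]
        row = [(k + 1) * prev[k] + (m - k) * (prev[k - 1] if k > 0 else 0)
               for k in range(m)]
    return row

def beta(n):
    """beta_p = sum over p1 - p2 = p of alpha_{p1} alpha_{p2}, as a dict {p: beta_p}."""
    alpha = [Fraction(a, factorial(n)) for a in eulerian_row(n)]
    out = {}
    for p1, a1 in enumerate(alpha):
        for p2, a2 in enumerate(alpha):
            out[p1 - p2] = out.get(p1 - p2, 0) + a1 * a2
    return out

def beta0_scaled(n):
    """beta_0 * sqrt(n), the quantity bounded by 40 in Lemma 10.3."""
    return float(beta(n)[0]) * sqrt(n)

def positive_mean_scaled(n):
    """(sum_{p > 0} p beta_p) / sqrt(n), the quantity bounded below by A in (10.12)."""
    return float(sum(p * v for p, v in beta(n).items() if p > 0)) / sqrt(n)
```

In the range we tried, \(\beta _0 \sqrt{n}\) stays far below 40, which reflects how much room the constant in Lemma 10.3 leaves.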

Proof of Proposition 10.2

In what follows, write “for big enough n and d” as an abbreviation for “for \(n \geqslant n_0\) and \(d \geqslant D_0(n)\), for some function \(D_0\) of n.”

There are two conditions to be checked, (10.4) and (10.5). That the former condition holds for big enough n and d follows from (10.8), (10.7) and the convergence in distribution of \(X'(n)/\sqrt{n}\). It remains then to verify that (10.5) holds for big enough n and d.

Write T(y) for the sum of the topmost y adjoint Hodge numbers and H for the total dimension of the adjoint Hodge structure. We claim that

$$\begin{aligned} 2 T(2 h^0) < \sum _{p >0} p h^p, \end{aligned}$$

for big enough n and d. That statement readily implies the desired conclusion, in view of (10.8).

By Lemma 10.3, for sufficiently large n we have \(\beta _0 < \frac{40}{\sqrt{n}}\). Therefore, for d sufficiently large (depending on n) we have \(h^0 < 40H/\sqrt{n}\). On the other hand, by (10.12), the right-hand side \(\sum _{p > 0} p h^p\) is bounded below by a constant multiple of \( H \sqrt{n}\), for big enough d and n. Therefore, it is enough to verify that, for fixed positive constants \(c, \delta \), we have the inequality

$$\begin{aligned} T(\frac{c H}{\sqrt{n}}) \leqslant \delta H \sqrt{n} \end{aligned}$$
(10.13)

for big enough n and d.

Let \(\epsilon = \frac{\delta }{2c}\). Separate the contribution of Hodge numbers above and below \(\epsilon n\) to T; we get:

$$\begin{aligned} T(\frac{cH}{\sqrt{n}}) \leqslant (\epsilon n) \frac{cH}{\sqrt{n}} + \sum _{p > \epsilon n} p h^p \end{aligned}$$

Now the first quantity is bounded by \(\frac{1}{2} \delta H \sqrt{n}\). The second quantity equals

$$\begin{aligned} \sum _{p> \epsilon n} p h^p = H \sum _{p> \epsilon n} p \beta _p + H \sum _{p > \epsilon n} p \left( \frac{h^p}{H} -\beta _p \right) . \end{aligned}$$

There is a function \(D_1\) such that, for \(d \geqslant D_1(n)\), the second term is at most H. Also, using the variance bound \(\sum p^2 \beta _p = \frac{n+1}{6}\), the first term is at most \(H \epsilon ^{-1}\). Thus,

$$\begin{aligned} T(\frac{cH}{\sqrt{n}}) \leqslant \frac{1}{2} \delta H \sqrt{n} + H(1 + \epsilon ^{-1}), \end{aligned}$$

and the latter term is certainly bounded above by \(\frac{1}{2} \delta H \sqrt{n}\) for \(n \geqslant n_0\). This concludes the proof of (10.13), so also of our Proposition. \(\square \)
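For the reader's convenience, the elementary estimates used in this proof can be written out as follows. This is only a sketch of the bookkeeping, using \(\epsilon = \frac{\delta }{2c}\), the inequality \(p \leqslant \frac{p^2}{\epsilon n}\) for \(p > \epsilon n\), and the variance identity \(\sum _p p^2 \beta _p = \frac{n+1}{6}\); the explicit threshold on n in the last line is our own rearrangement and is not stated in the text.

```latex
\begin{aligned}
(\epsilon n)\,\frac{cH}{\sqrt{n}}
  &= \epsilon c\, H \sqrt{n}
   = \frac{\delta}{2c}\, c\, H \sqrt{n}
   = \tfrac{1}{2}\, \delta H \sqrt{n}, \\
\sum_{p > \epsilon n} p\, \beta_p
  &\leqslant \sum_{p > \epsilon n} \frac{p^2}{\epsilon n}\, \beta_p
   \leqslant \frac{1}{\epsilon n} \sum_{p} p^2 \beta_p
   = \frac{n+1}{6\, \epsilon n}
   \leqslant \epsilon^{-1}, \\
H\,(1 + \epsilon^{-1}) &\leqslant \tfrac{1}{2}\, \delta H \sqrt{n}
  \quad \Longleftrightarrow \quad
  n \geqslant \frac{4\,(1 + \epsilon^{-1})^2}{\delta^2}.
\end{aligned}
```

Adding the first and third lines gives \(T(\frac{cH}{\sqrt{n}}) \leqslant \delta H \sqrt{n}\) once n exceeds the stated threshold and \(d \geqslant D_1(n)\).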

10.3 Setup for the proof of Theorem 10.1

In what follows, \(\ell \) denotes an arbitrary prime number not belonging to the fixed set S.

Working in the complex analytic category, let \({\mathsf {V}} = {\mathbf {R}}^d \pi _* {{\mathbf {Q}}}\). It is a local system of \({{\mathbf {Q}}}\)-vector spaces on \(Y({{\mathbf {C}}})\) (and it admits an integral structure); the \({\mathsf {V}}_0\) defined in Sect. 10.1 is its fiber above \(y_0\).

Let \({\mathbf {G}}\) be the connected automorphism group of the intersection form on \({\mathsf {V}}_0\), a semisimple \({{\mathbf {Q}}}\)-group; also let

$$\begin{aligned} {\mathbf {G}}' = \mathrm {GAut}({\mathsf {V}}_0, \langle -, - \rangle ), \end{aligned}$$

the corresponding generalized automorphism group, in which the form \(\langle -, - \rangle \) is allowed to be scaled.

Passing to \(\ell \)-adic étale cohomology, there is a monodromy mapping \(\pi _1^{\mathrm {arith}}(Y, y_0) \longrightarrow {\mathbf {G}}'({{\mathbf {Q}}}_{\ell })\) and the section associated to an integral point \(y \in Y({{\mathbf {Z}}}[S^{-1}])\) gives a representation

$$\begin{aligned} \rho _{y,\ell }{:}\,G_{{{\mathbf {Q}}}} \longrightarrow {\mathbf {G}}'({{\mathbf {Q}}}_{\ell }). \end{aligned}$$

This describes the Galois action on the primitive geometric étale cohomology of the fiber \(X_y\) in degree d (after using an isomorphism \({\mathsf {V}}_0 \otimes {{\mathbf {Q}}}_{\ell } \simeq {\mathsf {V}}_y \otimes {{\mathbf {Q}}}_{\ell }\)).

In what follows we will freely use certain results about Galois representations into \({\mathbf {G}}'\) which are parallel to certain known results about \({{\text {GL}}}_n\)-valued representations; we refer to Sect. 2.3 for further discussion of these points.

We denote by \(\rho _{y,\ell }^{\mathrm {ss}}\) the semisimplification of \(\rho _{y,\ell }\) relative to \({\mathbf {G}}'\) (see Sect. 2.3). By Faltings’ finiteness theorem (Lemma 2.6) there are only finitely many possibilities for the \({\mathbf {G}}'({{\mathbf {Q}}}_{\ell })\)-conjugacy class of \(\rho _{y,\ell }^{\mathrm {ss}}\).

We must understand the variation of the representation \(\rho _y\) with y; as usual, we will study this using the period mapping. We begin with the complex Hodge structures.

The Hodge structure on \({\mathsf {V}}_y\), the fiber of \({\mathsf {V}}\) at y, is given by a self-dual filtration

$$\begin{aligned} {\mathsf {V}}_y = F^0 {\mathsf {V}}_y \supset \cdots \supset F^i {\mathsf {V}}_y \supset \cdots \end{aligned}$$
(10.14)

and in this way we can regard the period mapping as

$$\begin{aligned} \text{ universal } \text{ cover } \text{ of } Y_{{{\mathbf {C}}}} \longrightarrow \text{ Mumford--Tate } \text{ domain } \text{ for } {\mathbf {G}}', \end{aligned}$$
(10.15)

where the Mumford–Tate domain in question is understood to be the space of self-dual filtrations on \({\mathsf {V}}_0\) with the same dimensional data as the Hodge filtration on \({\mathsf {V}}_0\).

Also, the Hodge structure on \({\mathsf {V}}_0\) gives rise to a morphism

$$\begin{aligned} \varphi _{0}{:}\,S^1 \longrightarrow {\mathbf {G}}'({{\mathbf {C}}}). \end{aligned}$$

For each \(y \in Y({\mathbf {Z}}[S^{-1}])\), we may reduce modulo \(\ell \) and consider the crystalline Frobenius of the reduction \(X_{y,{\mathbf {F}}_{\ell }} := X_y \times _{{{\mathbf {Z}}}[S^{-1}]} {\mathbf {F}}_{\ell }\). This determines a transformation of the (primitive) crystalline cohomology

$$\begin{aligned} \mathrm {F}_{y}^{\mathrm {crys},\ell } \in {{\text {Aut}}}\ H^d_{\mathrm {crys}}(X_{y,{\mathbf {F}}_{\ell }})^{\mathrm {prim}}. \end{aligned}$$

The characteristic polynomial of this endomorphism is determined by the \(\zeta \)-function of \(X_{y,{\mathbf {F}}_{\ell }}\), and it can be deduced (see [21]) that its eigenvalues coincide with the eigenvalues of \(\ell \)-Frobenius on p-adic geometric étale cohomology for any prime \(p \ne \ell \).

In the coming subsections we will prove the following two lemmas:

Lemma 10.4

(Frobenius centralizer small, for some \(\ell \) below an absolute bound) There exists an integer L with the following property:

For any \(y \in Y({\mathbf {Z}}[S^{-1}])\), there exists a prime \(\ell \leqslant L, \ell \notin S\) such that the semisimplification of \(\mathrm {F}_y^{\mathrm {crys},\ell }\) (and so also the crystalline Frobenius itself) satisfies

$$\begin{aligned} \dim Z\left( \left[ \mathrm {F}_y^{\mathrm {crys},\ell }\right] ^{\mathrm {ss}}\right) \leqslant \dim Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0). \end{aligned}$$
(10.16)

On the left hand side, we take the centralizer inside \(\mathrm {GAut}(H^{d,\mathrm {prim}}_{\mathrm {crys}})\), to which the crystalline Frobenius—and so also its semisimplification—belongs.

Lemma 10.5

(Not Zariski dense. This is where semisimplicity gets taken care of.) Given a prime \(\ell \notin S\) and \(y_0 \in Y({{\mathbf {Z}}}[\frac{1}{S}])\) with the property that the dimension of the centralizer of the crystalline Frobenius \(\mathrm {F}_{y_0}^{\mathrm {crys}, \ell }\) is at most the dimension of \(Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0)\), the set

$$\begin{aligned} \{y \in Y({{\mathbf {Z}}}[\frac{1}{S}]){:}\,y \equiv y_0 \text{ modulo } \ell , \rho _{y,\ell }^{\mathrm {ss}} \simeq \rho _{y_0, \ell }^{\mathrm {ss}}\} \end{aligned}$$
(10.17)

is not Zariski dense. (Here \(\simeq \) means that the representations are \({\mathbf {G}}'\)-conjugate).

Assuming these Lemmas, let us conclude the proof of Theorem 10.1. With L as in Lemma 10.4 let \(N= \prod _{\ell \leqslant L, \ell \notin S} \ell \). Now each \(y \in Y({{\mathbf {Z}}}[\frac{1}{S}])\) gives a collection of representations \(\rho _y^{\mathrm {ss}}{:}\,G_{{{\mathbf {Q}}}} \rightarrow {\mathbf {G}}'({{\mathbf {Q}}}_{\ell })\), one for each \(\ell \) dividing N. For each \(\ell \) dividing N, let \({\mathcal {G}}_{\ell }\) be the set of representations \(G_{{{\mathbf {Q}}}} \rightarrow {\mathbf {G}}'({{\mathbf {Q}}}_{\ell })\) that arise as some \(\rho _{y}^{\mathrm {ss}}\). This is a finite set (modulo conjugacy) by Lemma 2.6 applied to \({\mathbf {G}}' \subset \mathrm {GL}({\mathsf {V}}_0)\); note that it is straightforward to verify that the integrality of the characteristic polynomial of Frobenius passes from the whole cohomology to the primitive cohomology.

Call a pair \((y, \ell )\) as in Lemma 10.4 good if it satisfies (10.16). For each \(\ell \), Lemma 10.5 and the finiteness of \({\mathcal {G}}_{\ell }\) guarantee that the set of y for which \((y, \ell )\) is good is not Zariski dense. Taking the union over \(\ell \le L\) and applying Lemma 10.4, we see that \( Y({\mathbf {Z}}[S^{-1}])\) is itself not Zariski dense.

10.4 Proof of Lemma 10.4

Fix a prime \(p \notin S\) and let \(\rho _{y,p}{:}\,G_{{{\mathbf {Q}}}} \rightarrow {\mathbf {G}}'({{\mathbf {Q}}}_p)\) be the p-adic Galois representation at y, as above. We have observed that there are only finitely many possibilities for \(\rho _{y,p}^{\mathrm {ss}}\) (here, and below, the semisimplification is taken inside \({\mathbf {G}}'\)).

Let \({\mathbf {H}}\) be the Zariski closure of \(\rho _{y,p}^{\mathrm {ss}}(G_{{{\mathbf {Q}}}})\), with identity component \({\mathbf {H}}^{\circ }\). It is a reductive group (because we took the semisimplification, see Sect. 2.3 and references therein). Call an element in \({\mathbf {H}}^{\circ }(\overline{{{\mathbf {Q}}}_p})\) very regular if it is semisimple and:

(*) its centralizer inside \({{\text {Aut}}}({\mathsf {V}}_0 \otimes \overline{ {{\mathbf {Q}}}_p})\) has minimal dimension amongst all semisimple elements of \({\mathbf {H}}^{\circ }(\overline{{{\mathbf {Q}}}_p})\).

Choose a maximal torus \({\mathbf {T}}_0 \subset {\mathbf {H}}^{\circ }\), and let \(\Phi \) be the set of nontrivial characters \({\mathbf {T}}_0 \rightarrow {\mathbf {G}}_m\) arising from the conjugation action of \({\mathbf {T}}_0\) on the Lie algebra of \({{\text {Aut}}}({\mathsf {V}}_0 \otimes \overline{{{\mathbf {Q}}}_p})\). For \(t \in {\mathbf {T}}_0(\overline{{{\mathbf {Q}}}_p})\) the dimension of the centralizer of t, in \({{\text {Aut}}}({\mathsf {V}}_0 \otimes \overline{{{\mathbf {Q}}}_p})\), is the dimension of the centralizer of \({\mathbf {T}}_0\) in \({{\text {Aut}}}({\mathsf {V}}_0 \otimes \overline{{{\mathbf {Q}}}_p})\), plus the number of roots \(\alpha \in \Phi \) with \(\alpha (t) =1\) (counted with multiplicity). The condition (*) for an element \(t \in {\mathbf {T}}_0(\overline{{{\mathbf {Q}}}_p})\) amounts to asking that \(\alpha (t) \ne 1\) for all \(\alpha \in \Phi \). In particular:

  • Any very regular element is regular inside \({\mathbf {H}}^{\circ }\), and

  • Condition (*) implies the same condition with \({{\text {Aut}}}({\mathsf {V}}_0 \otimes \overline{ {{\mathbf {Q}}}_p})\) replaced by \({\mathbf {G}}'\).
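The root-counting description of the centralizer dimension can be illustrated in the simplest toy case. The sketch below assumes, purely for illustration, that \({\mathbf {T}}_0\) is the full diagonal torus of \({{\text {GL}}}_n\), so that the centralizer of \({\mathbf {T}}_0\) has dimension n and the roots are the characters \(t \mapsto t_i/t_j\) with \(i \ne j\); in the text \({\mathbf {T}}_0\) is a maximal torus of \({\mathbf {H}}^{\circ }\), which may be much smaller. The function names are hypothetical.

```python
from collections import Counter

def centralizer_dim_gl(eigs):
    """Dimension of the centralizer in GL_n of diag(eigs): sum of squared multiplicities."""
    return sum(m * m for m in Counter(eigs).values())

def centralizer_dim_via_roots(eigs):
    """Same dimension, computed as n plus the number of roots t_i / t_j (i != j) equal to 1."""
    n = len(eigs)
    trivial_roots = sum(1 for i in range(n) for j in range(n)
                        if i != j and eigs[i] == eigs[j])
    return n + trivial_roots

def is_very_regular_toy(eigs):
    """In this toy setting, (*) holds exactly when no root takes the value 1."""
    return centralizer_dim_via_roots(eigs) == len(eigs)
```

For instance, a diagonal element with eigenvalues 2, 2, 3, 5 has centralizer of dimension 6 by either count, and so is not very regular in this toy setting, whereas one with distinct eigenvalues is.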

The set of very regular elements is a nonempty Zariski-open subset of \({\mathbf {H}}^{\circ }\) (so also of \({\mathbf {H}}\)). Indeed, the function \(f =\prod _{\alpha \in \Phi } (\alpha (t)-1)\) defines a regular function on \({\mathbf {T}}_0\) which is invariant under the Weyl group. Therefore f extends to a regular function on \({\mathbf {H}}^{\circ }\), and the set of very regular elements is the locus where \(f \ne 0\) (this forces semisimplicity).

It follows, then, that the set of very regular elements in \({\mathbf {H}}({{\mathbf {Q}}}_p)\) is the complement of a proper Zariski-closed set. The preimage of the very regular set under \(\rho _{y,p}^{\mathrm {ss}}{:}\,G_{{{\mathbf {Q}}}} \rightarrow {\mathbf {H}}({{\mathbf {Q}}}_p)\) is nonempty, because \(\rho _{y,p}^{\mathrm {ss}}(G_{{{\mathbf {Q}}}})\) is Zariski-dense in \({\mathbf {H}}\). This preimage is also topologically open, since the very regular set is open. By the Chebotarev density theorem, then, we may choose some \(\ell \) such that

$$\begin{aligned} \rho _{y,p}^{\mathrm {ss}}(\mathrm {Frob}_{\ell })\,\text{ is } \text{ a } \text{ very } \text{ regular } \text{ element } \text{ of }\,{\mathbf {H}}^{\circ }. \end{aligned}$$
(10.18)

Because there are only finitely many possibilities for \(\rho _{y,p}^{\mathrm {ss}}\), this \(\ell \) can be taken to be bounded above by a constant L that depends only on S, p and \(\dim ({\mathsf {V}})\).

On the other hand, it is known that:

the Zariski closure of \(\rho _{y,p}(G_{{{\mathbf {Q}}}_p})\) (this is an algebraic subgroup of \(\mathbf {G'}\)) contains a group \({\mathbf {S}}\) defined over \(\overline{{{\mathbf {Q}}}_p}\), with the following property:

with respect to a suitable isomorphism \(\overline{{{\mathbf {Q}}}_p} \simeq {\mathbf {C}}\), the group \({\mathbf {S}}\) becomes isomorphic to the Hodge torus, i.e. to the Zariski closure of the image of \(\varphi _0\) in \({\mathbf {G}}'({{\mathbf {C}}})\).

A result of rather similar nature to the quoted statement was proved by Sen [39] using the Hodge–Tate decomposition (Sen’s result pertains to the target group \({{\text {GL}}}_n\)). It can be deduced using a remarkable result of Wintenberger [46] about functorially splitting the Hodge filtration for Fontaine–Laffaille modules. This is carried out by Pink [32, §2]; this latter method also readily adapts to the target \({\mathbf {G}}'\) (see footnote 11).

Thus

$$\begin{aligned} \dim Z_{{\mathbf {G}}'}({\mathbf {S}}) = \dim Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0). \end{aligned}$$
(10.19)

Moreover, a \({\mathbf {G}}(\overline{{{\mathbf {Q}}}_p})\)-conjugate of \({\mathbf {S}}\)—call it \({\mathbf {S}}'\)—is also contained in the Zariski closure of the image of \(\rho _{y,p}^{\mathrm {ss}}(G_{{{\mathbf {Q}}}})\). Indeed, choose a parabolic \({\mathbf {Q}} \leqslant {\mathbf {G}}'\) containing the image of \(\rho _{y,p}\) and minimal for that property; then \(\rho _{y,p}^{\mathrm {ss}}\) is obtained by projecting \(\rho _{y,p}\) to a Levi factor of \({\mathbf {Q}}\), and in particular the Zariski closure of the image of \(\rho _{y,p}^{\mathrm {ss}}\) certainly contains the projection of the Zariski closure of the image of \(\rho _{y,p}\). Now apply Lemma 2.5.

Now we have

$$\begin{aligned} \left[ \rho _{y,p}(\mathrm {Frob}_{\ell })\right] ^{\mathrm {ss}} {\mathop {\sim }\limits ^{\text {Lemma}~2.4}} \left[ \rho _{y,p}^{\mathrm {ss}}(\mathrm {Frob}_{\ell })\right] ^{\mathrm {ss}} {\mathop {=}\limits ^{(10.18)}} \rho _{y,p}^{\mathrm {ss}}(\mathrm {Frob}_{\ell }), \end{aligned}$$

where \(\sim \) denotes \({\mathbf {G}}'(\overline{{{\mathbf {Q}}}_p})\)-conjugacy. By (10.18), the definition of “very regular” element, and the discussion that follows it, the centralizer of this element inside \({\mathbf {G}}'\) is as small as possible, amongst semisimple elements in \({\mathbf {H}}^{\circ }(\overline{{{\mathbf {Q}}}_p})\). In particular, this centralizer is at most as large as the centralizer of \({\mathbf {S}}'\) in \({\mathbf {G}}'\), and so

$$\begin{aligned} \dim Z_{{\mathbf {G}}'} \left[ \rho _{y,p}(\mathrm {Frob}_{\ell })\right] ^{\mathrm {ss}} \leqslant \dim Z_{{\mathbf {G}}'}({\mathbf {S}}') = \dim Z_{{\mathbf {G}}'}({\mathbf {S}}) = \dim Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0).\nonumber \\ \end{aligned}$$
(10.20)

We now transfer this to the corresponding assertion for the crystalline Frobenius \(\mathrm {Frob}_{\ell }^{\mathrm {crys}}\). We know that the crystalline \(\ell \)-Frobenius on the \(\ell \)-adic vector space \(H^d_{\mathrm {crys}}(X_{y,{\mathbf {F}}_{\ell }})\) and the usual \(\ell \)-Frobenius on the p-adic geometric étale cohomology of \(X_y\) have the same characteristic polynomial. The same is true for primitive parts. Thus \(\rho _{y,p}(\mathrm {Frob}_{\ell })^{\mathrm {ss}}\) and \((\mathrm {Frob}_{\ell }^{\mathrm {crys}})^{\mathrm {ss}}\) both have the same characteristic polynomial; also they both scale the bilinear forms by \(\ell \).

Split \({\mathsf {V}}_0 \otimes {{\mathbf {Q}}}_p = \bigoplus {\mathsf {V}}_{\lambda }\) into eigenspaces for \(\rho _{y,p}(\mathrm {Frob}_{\ell })^{\mathrm {ss}}\). The bilinear form gives a perfect pairing between each \({\mathsf {V}}_{\lambda }\) and \({\mathsf {V}}_{\ell \lambda ^{-1}}\) (interpreted as a self-pairing when \(\lambda ^2=\ell \)); the centralizer of \(\rho _{y,p}(\mathrm {Frob}_{\ell })^{\mathrm {ss}}\) in \({\mathbf {G}}'\) is the set of g stabilizing each \({\mathsf {V}}_\lambda \) and respecting these pairings. In particular the centralizer dimension is determined by the function \(\lambda \mapsto \dim ({\mathsf {V}}_{\lambda })\); the same analysis applies for \((\mathrm {Frob}_{\ell }^{\mathrm {crys}})^{\mathrm {ss}}\). We deduce that
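The dimension count can be made explicit as follows. This is only a sketch, under assumptions not spelled out above: the eigenspaces are taken over an algebraic closure, the group in question consists of all similitudes of a nondegenerate form that is symmetric (\(\varepsilon = -1\)) or alternating (\(\varepsilon = +1\)), and \(n_{\lambda } := \dim {\mathsf {V}}_{\lambda }\) is our own notation. An element of the centralizer acts by an arbitrary automorphism on one member of each pair \(\{{\mathsf {V}}_{\lambda }, {\mathsf {V}}_{\ell \lambda ^{-1}}\}\) with \(\lambda ^2 \ne \ell \) (its action on the other member is then determined by the pairing and the scaling factor), and by an isometry, up to the common scaling factor, on each self-paired \({\mathsf {V}}_{\lambda }\); the scaling factor contributes one further dimension. Hence

$$\begin{aligned} \dim Z = 1 + \sum _{\{\lambda , \ell \lambda ^{-1}\},\ \lambda ^2 \ne \ell } n_{\lambda }^2 + \sum _{\lambda ^2 = \ell } \frac{n_{\lambda }(n_{\lambda } + \varepsilon )}{2}. \end{aligned}$$

The right-hand side visibly depends only on the function \(\lambda \mapsto n_{\lambda }\), which is all that the argument uses.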

$$\begin{aligned} \dim Z_{\mathrm {GAut}}(\left[ \mathrm {Frob}_{\ell }^{\mathrm {crys}}\right] ^{\mathrm {ss}}) = \dim Z_{{\mathbf {G}}'}(\rho _{y,p}(\mathrm {Frob}_{\ell })^{\mathrm {ss}} ) {\mathop {\leqslant }\limits ^{(10.20)}} \dim Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0),\nonumber \\ \end{aligned}$$
(10.21)

concluding the proof of the Lemma. \(\square \)

10.5 Proof of Lemma 10.5

We must analyze the set

$$\begin{aligned} \{y \in Y({{\mathbf {Z}}}[\frac{1}{S}]){:}\,y \equiv y_0 \text{ modulo } p, \rho _{y,p}^{\mathrm {ss}} \simeq \rho _{y_0, p}^{\mathrm {ss}}\} \end{aligned}$$
(10.22)

(we have switched from \(\ell \) to p for typographical simplicity). Here we are assuming that the centralizer of crystalline Frobenius \(\mathrm {Frob}_{y_0}^{\mathrm {crys}, p}\), inside the group \(\mathrm {GAut}\) of generalized automorphisms of the intersection pairing, has dimension at most the dimension of \(Z_{{\mathbf {G}}'({{\mathbf {C}}})}(\varphi _0)\).

Now let us unwind the condition in (10.22), namely, that the semisimplified p-adic Galois representations for y and for \(y_0\) are isomorphic. Recall that semisimplification is taken relative to the ambient group \({\mathbf {G}}'({{\mathbf {Q}}}_p)\). The representation \(\rho _{y,p}\) is realized on \(H^d(X_y, {{\mathbf {Q}}}_p)^{\mathrm {prim}}\), and similarly for \(y_0\). The semisimplification of \(\rho _{y_0,p}\) (in the ambient group \({\mathbf {G}}'\)) is obtained by taking a maximal self-dual flag of \(\rho _{y_0,p}\)-stable subspaces

$$\begin{aligned} 0 \subset {\mathfrak {f}}^1 \subset {\mathfrak {f}}^2 \subset \dots \subset {\mathfrak {f}}^{{\mathsf {m}}} \subset \underbrace{ ({\mathfrak {f}}^{\mathsf {m}})^{\perp } }_{{\mathfrak {f}}^{{\mathsf {m}}+1}}\subset \dots \subset H^d(X_{y_0}, {{\mathbf {Q}}}_p)^{\mathrm {prim}} \end{aligned}$$

with the property that the representation on each graded piece is irreducible. (For the middle graded piece, i.e. the piece \({\mathfrak {f}}^{{\mathsf {m}}+1}/{\mathfrak {f}}^{{\mathsf {m}}}\), we interpret “irreducible” to mean that there is no isotropic invariant subspace, see Sect. 2.3 for explanation. We also permit the possibility that \({\mathfrak {f}}^{{\mathsf {m}}} = {\mathfrak {f}}^{{\mathsf {m}}+1}\).)

Since \(\rho _y^{\mathrm {ss}}\) and \(\rho _{y_0}^{\mathrm {ss}}\) are isomorphic, there exist such flags \({\mathfrak {f}}_{y}\) and \({\mathfrak {f}}_0\), for y and \(y_0\) respectively, such that the \(G_{{{\mathbf {Q}}}}\)-representations on \(\bigoplus _j \mathrm {gr}^{{\mathfrak {f}}}_j \) are isomorphic. In fact, we can even arrange that this is true for every j individually, and that the isomorphism preserves the intersection form for \(j = {\mathsf {m}}\): this follows by using the last sentence of Lemma 2.6 (see footnote 12).

Now the functors of p-adic Hodge theory carry \(H^d(X_{y_0},{{\mathbf {Q}}}_p)\) to \(H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p)\) and similarly for y. Moreover, the intersection form

$$\begin{aligned} H^d(X_{y_0}, {{\mathbf {Q}}}_p) \otimes H^d(X_{y_0}, {{\mathbf {Q}}}_p) \longrightarrow H^{2d}(X_{y_0},{{\mathbf {Q}}}_p) \left( \simeq {{\mathbf {Q}}}_p(-d) \right) \end{aligned}$$

is carried to the intersection form \(H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p) \otimes H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p) \longrightarrow H^{2d}_{\mathrm {dR}}(X_{y_0},{{\mathbf {Q}}}_p).\) These assertions remain valid for the primitive parts of cohomology.

The flags \({\mathfrak {f}}_y\) and \({\mathfrak {f}}_0\) are in particular \(G_{{{\mathbf {Q}}}_p}\)-invariant, and, under the correspondence of p-adic Hodge theory, these flags \({\mathfrak {f}}_y, {\mathfrak {f}}_0\) correspond to self-dual flags \({\mathfrak {f}}^{\mathrm {dR}}_y\) and \({\mathfrak {f}}^{\mathrm {dR}}_{0}\) inside the associated “de Rham” vector spaces:

$$\begin{aligned} {\mathfrak {f}}_y^{\mathrm {dR}}\,\text{ in }\,H^d_{\mathrm {dR}}(X_y)^{\mathrm {prim}}\,\text{ and }\,{\mathfrak {f}}_0^{\mathrm {dR}}\,\text{ in }\,H^d_{\mathrm {dR}}(X_{y_0})^{\mathrm {prim}}. \end{aligned}$$

Moreover, under the correspondence of p-adic Hodge theory, the filtered \(\phi \)-modules

$$\begin{aligned} ({\mathfrak {f}}^{\mathrm {dR}}_y)^{m+1}/({\mathfrak {f}}^{\mathrm {dR}}_y)^{m} \text{ and } ({\mathfrak {f}}^{\mathrm {dR}}_0)^{m+1}/({\mathfrak {f}}^{\mathrm {dR}}_0)^{m} \end{aligned}$$

correspond, respectively, to the Galois representations of \(G_{{{\mathbf {Q}}}_p}\) on \({\mathfrak {f}}_y^{m+1}/{\mathfrak {f}}_y^{m}\) and \({\mathfrak {f}}_0^{m+1}/{\mathfrak {f}}_0^m\). These Galois representations are isomorphic, so the filtered \(\phi \)-modules just mentioned above are also isomorphic. For \(m={\mathsf {m}}\), the middle degree, the isomorphism of Galois representations can be taken to preserve the bilinear form, and so the same is true for the isomorphism of filtered \(\phi \)-modules.

The map sending y to the Hodge filtration on \(X_y\) defines a period map

$$\begin{aligned} \Phi _p{:}\,\text{ residue } \text{ disk } \text{ at }\,y_0\text{, } \text{ modulo }\,p \longrightarrow p\text{-adic } \text{ period } \text{ domain }\,{\mathfrak {H}}_p \end{aligned}$$

where \({\mathfrak {H}}_p\) is now the set of self-dual flags inside \(V:= H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p)^{\mathrm {prim}}\) with the same dimensional data as the Hodge filtration on \(H^d_{\mathrm {dR}}(X_{y_0})^{\mathrm {prim}}\). Write \(\phi \) for the Frobenius map on V. Our analysis above shows that the set \( \{y \in Y({{\mathbf {Z}}}[\frac{1}{S}]){:}\,y \equiv y_0 \text{ modulo } p, \rho _{y,p}^{\mathrm {ss}} \simeq \rho _{y_0, p}^{\mathrm {ss}}\}\) is contained in a finite union of sets of the following type:

$$\begin{aligned} \Phi _p^{-1}({\mathfrak {S}}), \end{aligned}$$

where \({\mathfrak {S}} \subset {\mathfrak {H}}_p\) is the space of filtrations F on \(V = H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p)^{\mathrm {prim}}\) with the property that there exists another self-dual filtration \({\mathfrak {f}}\), the “semisimplification filtration”:

$$\begin{aligned} 0 = {\mathfrak {f}}^0 \subset {\mathfrak {f}}^1 \subset {\mathfrak {f}}^2 \subset \cdots \subset {\mathfrak {f}}^{{\mathsf {m}}} \subset \underbrace{ ({\mathfrak {f}}^{\mathsf {m}})^{\perp } }_{{\mathfrak {f}}^{{\mathsf {m}}+1}}\subset \cdots \subset {\mathfrak {f}}^{2 {\mathsf {m}}+1} = V \end{aligned}$$

with the following properties:

  • \({\mathfrak {f}}\) is \(\phi \)-stable.

  • The filtration induced by F on each graded piece \(\mathrm {gr}^{{\mathfrak {f}}}_j\) has weight equal to d/2. (This follows because it arises from applying p-adic Hodge theory to the restriction of a global Galois representation that is pure of weight d, using Lemma 2.9.)

  • We have an isomorphism of filtered \(\phi \)-modules

    $$\begin{aligned} (\mathrm {gr}^{{\mathfrak {f}}}_j, \text{ filtration } \text{ induced } \text{ by } F) \simeq (\mathrm {gr}^{{\mathfrak {f}}_0}_j, \text{ filtration } \text{ induced } \text{ by } F_0) \end{aligned}$$

    (i.e., an isomorphism of vector spaces respecting Hodge filtration and Frobenius). In particular, the left-hand side of the above equation lies in a fixed isomorphism class.

    On the right hand side \(F_0\) is the filtration at \(y_0\). In the case of the middle graded piece \(j ={\mathsf {m}}\), the isomorphism above may be taken, moreover, to preserve the bilinear forms on both sides.

The following Proposition 10.6 implies that the codimension of the set \({\mathfrak {S}}\) above is at least equal to the dimension of Y. Given this Proposition, Lemma 10.5 now follows from Lemma 9.3 (the p-adic transcendence of period mappings).

Proposition 10.6

Suppose V is a vector space over the field K equipped with a bilinear form \(\langle -, - \rangle \) and a linear automorphism \(\phi \in \mathrm {GAut}(V)\).

Suppose \(A_1, \dots , A_{{\mathsf {m}}}\) is a collection of K-vector spaces, each equipped with a decreasing filtration and a linear automorphism \(\phi _i{:}\,A_i \rightarrow A_i\). We suppose the final space \(A_{{\mathsf {m}}}\) is equipped with a bilinear form \(\langle -, - \rangle \).

Consider all self-dual filtrations

$$\begin{aligned} V = F^0 V \supset F^1 V\supset \cdots \supset F^d V \supset F^{d+1}V = \{0\} \end{aligned}$$

on V, where we fix the dimensions of each \(F^i\).

Call such a filtration F “bad” if there exists another self-dual filtration \({\mathfrak {f}}\) on V

$$\begin{aligned} 0 = {\mathfrak {f}}^0 \subset {\mathfrak {f}}^1 \subset \dots \subset {\mathfrak {f}}^{{\mathsf {m}}} \subset {\mathfrak {f}}^{{\mathsf {m}}+1} \subset \dots \subset {\mathfrak {f}}^{2{\mathsf {m}}+1} = V \end{aligned}$$

such that the following conditions hold.

  (a) \({\mathfrak {f}}\) is \(\phi \)-stable.

  (b) The weight of the filtration induced by F on each graded piece \(\mathrm {gr}^k_{{\mathfrak {f}}}\) equals d/2, i.e. the weight of the filtration F on V.

  (c) There exists an isomorphism of filtered \(\phi \)-modules:

    $$\begin{aligned} \left( \mathrm {gr}_{{\mathfrak {f}}}^j V, \text{ filtration } \text{ induced } \text{ by } F \right) \simeq A_j \end{aligned}$$

    for each \(j \le {\mathsf {m}}\), and in the middle dimension \(j={\mathsf {m}}\) this also preserves bilinear forms.

Define the Hodge numbers \(h^p\) as the dimension of \(\mathrm {gr}^p_F \ \mathrm {Lie} \mathrm {GAut}(V)\); let T(y) be the sum of the topmost y Hodge numbers, extended by linearity as in (10.3).

Put \(z = \dim Z(\phi ^{\mathrm {ss}})\), the dimension of the centralizer of the semisimple part of \(\phi \) in \(\mathrm {GAut}(V)\).

If e is a positive integer such that

$$\begin{aligned} \text{ number } \text{ of } \text{ positive } \text{ Hodge } \text{ numbers } \geqslant z+e \end{aligned}$$
(10.23)

and

$$\begin{aligned} \text{ sum } \text{ of } \text{ all } \text{ positive } \text{ Hodge } \text{ numbers } > T(z+e) + \ T \left( \frac{h^0}{2}+z+e \right) ,\nonumber \\ \end{aligned}$$
(10.24)

then the codimension of the space of bad filtrations is greater than or equal to e.

To be clear, we apply this with:

  • \(K = {{\mathbf {Q}}}_p\) and \(V = H^d_{\mathrm {dR}}(X_{y_0}, {{\mathbf {Q}}}_p)^{\mathrm {prim}}\) for some fiber of the family of Theorem 10.1;

  • The filtration F comes from the Hodge filtration on \(X_y\), where y lies in the residue disk of \(y_0\).

  • \({\mathfrak {f}}\) is another filtration which comes from a potential failure of the global Galois representation at y to be semisimple; the passage to the graded \(\mathrm {gr}_{{\mathfrak {f}}}\) effects semisimplification of the Galois representation.

  • Condition (b) comes eventually from global purity.

  • We have \(z \leqslant h^0\) by assumption (this came from Lemma 10.4) and we take \(e=\dim (Y)\).

The statement of Proposition 10.6 is complicated—it is an analogue, in our current setting, of Lemma 6.3. We’ll offer some vague motivation here. Proposition 10.6 asks: for which filtrations F on V do \(A_1, \ldots , A_{{\mathsf {m}}}\) form a composition series for V as filtered \(\phi \)-modules? Of course, filtered \(\phi \)-modules are in general far from simple, and the choice of F often amounts to a choice of extension class. Based on this, one might expect that the space of bad filtrations is large, perhaps even Zariski dense in the flag variety. This does not happen here because of the condition that the weight of the filtration on each \(A_i\) equal d/2. This equal-weight condition generalizes Eq. (6.12). Requiring the subobjects \({\mathfrak {f}}^j V\) giving the composition series to have large intersection with pieces of the filtration turns out to impose strong conditions on the filtration F. This is the content of Proposition 10.6.

11 Combinatorics related to reductive groups

It remains to prove Proposition 10.6 from the prior section. This is “just” a problem in linear algebra but it is a notational mess. We analyze it using some simple ideas about root systems. Although we work in the generality of an arbitrary reductive group, to help the exposition we will often explicate the discussion in the case of \({{\text {GL}}}_n\). One other reason we chose to work in this generality is that analysis of this type is likely necessary when carrying out a similar analysis for more general monodromy groups.

Since Proposition 10.6 is geometric, concerning the dimensions of certain algebraic sets, we can and will suppose that the base field K is algebraically closed. We will therefore permit ourselves to identify algebraic groups with their K-points; they will be correspondingly denoted by usual letters P, G, etc., rather than boldface letters as we have done previously.

There is a correspondence between filtrations and parabolic subgroups. We have a question about the interaction of two filtrations \({\mathfrak {f}}\) and F; we’re going to convert it to a question about the interaction of two parabolic subgroups P and Q.

One important warning: as defined \({\mathfrak {f}}\) is an increasing filtration, whereas F is decreasing. However, in actual fact, the indexing of \({\mathfrak {f}}\) is irrelevant. All that will matter throughout is the stabilizer of \({\mathfrak {f}}\); we could re-index it to be a decreasing filtration and nothing at all would change. On the other hand, the indexing of F does matter, and thus we will need to keep track of extra data beyond its stabilizer.

Tracing back the origins of Proposition 10.6, \({\mathfrak {f}}\) comes from the semisimplification filtration on a global Galois representation, and F from the Hodge filtration. The following informal dictionary may be helpful, at least in interpreting the material from Sect. 11.3 onward:

  • The parabolic denoted P should be thought of as the stabilizer of the semisimplification filtration \({\mathfrak {f}}\).

  • The Levi quotient M of P corresponds to the associated graded for \({\mathfrak {f}}\); globally, the semisimplification of the Galois representation takes values in M.

  • The parabolic Q should be thought of as the stabilizer of the Hodge filtration F.

The argument can be informally summarized like this:

  • First of all, we bound the number of possibilities for \({\mathfrak {f}}\), using the fact that it is \(\phi \)-stable. This uses the fact that the centralizer of \(\phi ^{\mathrm {ss}}\) is not too large and happens in (11.18). After this point, it is enough to work with a given \({\mathfrak {f}}\) and P.

  • Having fixed \({\mathfrak {f}}\) and P, we break up the space of possible F into P-orbits. The set of F satisfying the weight condition (b) of Proposition 10.6, is a union of P orbits. We need to show that no P orbit of small codimension occurs in this set.

  • To illustrate the idea, we will just explain why the open P orbit doesn’t occur. Suppose F satisfies the weight condition (b) of Proposition 10.6. We show then that PQ/Q is not open in G/Q.

    We find a maximal torus \(T \subset P \cap Q\) and a cocharacter \(\nu {:}\,\mathbf{G }_m \rightarrow T\) which defines the filtration F. In particular, Q consists of non-negative root spaces for \(\nu \). The weight condition will imply that

    $$\begin{aligned} \sum _{\gamma \in \Sigma -\Sigma _P} \langle \nu , \gamma \rangle = 0, \end{aligned}$$
    (11.1)

    the sum being taken over roots \(\Sigma \) for T that correspond to root spaces outside P.

    By using the assumed numerology of Hodge numbers, not too many of these \(\langle \nu , \gamma \rangle \) can be zero. In particular, (11.1) implies that \(\langle \nu , \gamma \rangle < 0\) for at least one \(\gamma \in \Sigma -\Sigma _P\).

    That means there is at least one such root \(\gamma \in \Sigma -\Sigma _P\) whose root space doesn’t belong to the Lie algebra of Q; equivalently,

    $$\begin{aligned} \mathrm {Lie}(Q) + \mathrm {Lie}(P) \ne \mathrm {Lie}(G), \end{aligned}$$

    which implies the desired conclusion.
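As a sanity check on the last step, here is the computation in the smallest interesting case. It is only a sketch, with \(G = {{\text {GL}}}_3\) substituted for the group of the Proposition, and all specific choices below are ours. Let T be the diagonal torus, let P be the stabilizer of the line \({\mathfrak {f}}^1 = \langle e_1 \rangle \), and let \(\nu (x) = \mathrm {diag}(x^{a_1}, x^{a_2}, x^{a_3})\). The root \(e_i - e_j\) has root space spanned by the elementary matrix \(E_{ij}\) and pairs with \(\nu \) to \(a_i - a_j\); the roots outside P are \(e_2 - e_1\) and \(e_3 - e_1\). The weight condition says that the induced weights, \(a_1\) on \({\mathfrak {f}}^1\) and \((a_2+a_3)/2\) on \(V/{\mathfrak {f}}^1\), coincide, whence

$$\begin{aligned} \sum _{\gamma \in \Sigma -\Sigma _P} \langle \nu , \gamma \rangle = (a_2 - a_1) + (a_3 - a_1) = 2 \left( \frac{a_2+a_3}{2} - a_1 \right) = 0. \end{aligned}$$

For \((a_1, a_2, a_3) = (1, 0, 2)\) the two terms are \(-1\) and \(+1\): the root space \(K E_{21}\) lies in neither \(\mathrm {Lie}(Q)\) nor \(\mathrm {Lie}(P)\), so \(\mathrm {Lie}(Q) + \mathrm {Lie}(P) \ne \mathrm {Lie}(G)\). Concretely, the plane \(F^1 = \langle e_1, e_3 \rangle \) contains \({\mathfrak {f}}^1\), which is a closed condition of codimension one on \(F^1\).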

11.1 Filtrations on reductive groups

Let G be a reductive group over an algebraically closed field K.

A (rational) cocharacter \(\lambda {:}\,{\mathbf {G}}_m \dashrightarrow G\) is simply a cocharacter that is allowed to be defined on a finite cover of \({\mathbf {G}}_m\). It determines a parabolic \(P_{\lambda }\), whose Lie algebra is the sum of non-negative weight spaces for \(\lambda \); the centralizer of \(\lambda \) is therefore a Levi factor for this parabolic. A “filtration” for G will be, by definition, an equivalence class of such rational cocharacters \(\lambda \), where \(\lambda \sim \lambda '\) if \(\lambda '\) is conjugate to \(\lambda \) under \(P_{\lambda }\) (or equivalently under the unipotent radical of \(P_{\lambda }\)).

Example 1

Filtrations.

  • A filtration on \(G = {{\text {GL}}}(V)\) is the same as a (decreasing) filtration \(F^{\bullet }V\) on V, where the filtration is indexed by rational numbers. Specifically we set

    $$\begin{aligned} F^p V = \text{ sum } \text{ of } \text{ all } \text{ weight } \text{ spaces } \text{ for } \lambda \text{ on } V \text{ with } \text{ weights } \geqslant p\nonumber \\ \end{aligned}$$
    (11.2)

    The associated parabolic \(P_{\lambda }\) is precisely the stabilizer of this filtration.

    Note that \(F^{\bullet } V\) determines \(\lambda \) up to the equivalence described above: any two rational characters \({\mathbf {G}}_m \rightarrow P\) with the same projection to a Levi quotient are actually P-conjugate by Lemma 2.5.

  • If V is equipped with a bilinear form \(\langle -, - \rangle \), then a filtration on \(\mathrm {GAut}(V, \langle -, - \rangle )\) is the same as a self-dual filtration on V, again via the formula (11.2); more precisely, if the filtration F corresponds to a character \(\chi {:}\,{\mathbf {G}}_m \dashrightarrow \mathrm {GAut}(V)\) for which \(\chi (x)\) scales the form by \(x^{r}\), then

    $$\begin{aligned}&F^p V\,\text{ and }\,F^{r-p+\epsilon } V\,\text{ are } \text{ orthogonal } \text{ complements } \text{ of } \text{ one } \text{ another }\nonumber \\&\quad \text{(for } \text{ sufficiently } \text{ small }\,\epsilon ). \end{aligned}$$
    (11.3)
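For concreteness, here are the two constructions in small cases; the particular cocharacters are our own choices, made only for illustration. On \(V = K^4\) with standard basis \(e_1, \dots , e_4\), take \(\lambda (x) = \mathrm {diag}(x^2, x, x, 1)\). Then (11.2) gives

$$\begin{aligned} F^p V = {\left\{ \begin{array}{ll} V, &{} p \leqslant 0, \\ \langle e_1, e_2, e_3 \rangle , &{} 0< p \leqslant 1, \\ \langle e_1 \rangle , &{} 1 < p \leqslant 2, \\ 0, &{} p > 2, \end{array}\right. } \end{aligned}$$

and \(P_{\lambda }\) is the group of block upper-triangular matrices with diagonal blocks of sizes 1, 2, 1, which is exactly the stabilizer of this flag. For the self-dual case, take \(V = K^2\) with \(\langle e_1, e_2 \rangle = 1\) and \(\langle e_i, e_i \rangle = 0\), and \(\chi (x) = \mathrm {diag}(x^a, x^b)\) with \(a > b\), which scales the form by \(x^{r}\) with \(r = a+b\). Here \(F^p V = \langle e_1 \rangle \) for \(b < p \leqslant a\), and (11.3) at \(p = a\) reads

$$\begin{aligned} (F^a V)^{\perp } = \langle e_1 \rangle ^{\perp } = \langle e_1 \rangle = F^{b+\epsilon } V = F^{r-a+\epsilon } V. \end{aligned}$$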

A map \(G_1 \rightarrow G_2\) of reductive groups induces, obviously, a map from filtrations for \(G_1\) to filtrations for \(G_2\). Thus a filtration on G determines a filtration on the underlying space of any G-representation. If \(G = {{\text {GL}}}_n\), this corresponds to the usual way in which a filtration on V induces (e.g.) a filtration on \(V \otimes V, V^*\), etc.

Indeed, for a general group G, to give a filtration of G is the same as giving a filtration functorially on every representation of G: this is part of the theory of filtered fibered functors, see [36, Section IV.2.1].

For any reductive group S write

$$\begin{aligned} {\mathfrak {a}}_{S} := X_*( Z_{S}) \otimes {{\mathbf {Q}}}\end{aligned}$$

where \(Z_{S}\) is the center. (As usual, we write \(X_*\) for cocharacters and \(X^*\) for characters.) This space is canonically in duality with \(X^*(S) \otimes {{\mathbf {Q}}}\). If F is a filtration on S the projection of the associated cocharacter to the torus quotient of S defines a class in \({\mathfrak {a}}_S\). We call this the weight of F:

$$\begin{aligned} \mathrm {wt}(F) \in {\mathfrak {a}}_S. \end{aligned}$$

Example 2

Weights of filtrations.

  • For \({{\text {GL}}}(V)\), \({\mathfrak {a}}_{{{\text {GL}}}(V)}\) is a one-dimensional \({{\mathbf {Q}}}\)-vector space. We identify it with \({{\mathbf {Q}}}\) by identifying the character \(t \in {\mathbf {G}}_m \mapsto t \mathrm {Id}_V\) with \(1 \in {{\mathbf {Q}}}\). With this identification, the weight of the filtration on \({{\text {GL}}}(V)\), corresponding to \(F^p V\) as in the first item of Example 1, is \(\frac{\sum _{p} p \dim (F^p/F^{p+1})}{\dim V}\); thus this definition coincides with our previous definition (2.2).

  • For \(\mathrm {GAut}(V)\), we can make the same identification of \({\mathfrak {a}}\) with \({{\mathbf {Q}}}\) as for \({{\text {GL}}}\). With this identification, the weight of the filtration described before (11.3) is necessarily equal to r/2, one-half of the integer by which the associated character scales the form.
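Two quick numerical instances, with cocharacters chosen by us only for illustration: for \(\lambda (x) = \mathrm {diag}(x^3, x, x)\) on \(K^3\) the weight is a genuinely rational number, and for \(\chi (x) = \mathrm {diag}(x^2, x, x, 1)\) on \(K^4\), with the form determined by \(\langle e_1, e_4 \rangle = \langle e_2, e_3 \rangle = 1\) and all other pairings of basis vectors equal to zero, the form is scaled by \(x^{r}\) with \(r = 2\):

$$\begin{aligned} \mathrm {wt}(\lambda ) = \frac{3 + 1 + 1}{3} = \frac{5}{3}, \qquad \mathrm {wt}(\chi ) = \frac{2 + 1 + 1 + 0}{4} = 1 = \frac{r}{2}. \end{aligned}$$

The first value is consistent with the identification above: \(\det \circ \lambda \) is \(x \mapsto x^5\), while \(\det (t \, \mathrm {Id}) = t^3\), and \(5/3\) is the ratio of the two exponents.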

We can alternately describe filtrations using parabolics: for \(\lambda {:}\,{\mathbf {G}}_m \rightarrow G\) the projection of \(\lambda \) to the Levi quotient \(M_{\lambda }\) of the parabolic \(P_{\lambda }\) is central in \(M_{\lambda }\); thus we get a class \({\bar{\lambda }}\) in \({\mathfrak {a}}_{M_{\lambda }}\). The pair \((P_{\lambda }, {\bar{\lambda }} \in {\mathfrak {a}}_{M_{\lambda }})\) depends only on the filtration associated to \(\lambda \), and moreover completely determines that filtration, because of Lemma 2.5. In fact, any pair \((P, e \in {\mathfrak {a}}_M)\) of a parabolic and a “strictly positive” element of \({\mathfrak {a}}_M\), i.e. positive on all roots in the unipotent radical of P, arises from a filtration.

11.2 Levi subgroups

Now suppose that N is a Levi subgroup of G. The center of N then contains the center of G. In this way we obtain a map

$$\begin{aligned} {\mathfrak {a}}_G \longrightarrow {\mathfrak {a}}_N \end{aligned}$$
(11.4)

which is naturally split: a character of G, i.e. a homomorphism \(G \rightarrow {\mathbf {G}}_m\), can be pulled back to a character of N. The resulting map

$$\begin{aligned} \underbrace{ X^*(G) \otimes {{\mathbf {Q}}}}_{\simeq {\mathfrak {a}}_G^*} \longrightarrow \underbrace{ X^*(N) \otimes {{\mathbf {Q}}}}_{\simeq {\mathfrak {a}}_N^*} \end{aligned}$$

gives rise to a splitting of (11.4).

Example 3

If \(\dim V_i = n_i\) then \({{\text {GL}}}(V_1) \times {{\text {GL}}}(V_2)\) is a Levi subgroup of \({{\text {GL}}}(V_1 \oplus V_2)\). We identify \({\mathfrak {a}}_N = {{\mathbf {Q}}}^2\) as in the previous example; then \({\mathfrak {a}}_G\) is embedded as the subspace (1, 1) and the complementary subspace is spanned by \((-\dim (V_2), \dim (V_1))\).
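Spelled out in coordinates (a short derivation using only the pairing with the determinant character, with \((x_1, x_2)\) denoting the class of \(t \mapsto (t^{x_1} \mathrm {Id}_{V_1}, t^{x_2} \mathrm {Id}_{V_2})\) in \({\mathfrak {a}}_N\)): the determinant of G restricts to \(\det _{V_1} \cdot \det _{V_2}\) on N, which pairs with \((x_1, x_2)\) to \(n_1 x_1 + n_2 x_2\) and with \(1 \in {\mathfrak {a}}_G\) to \(n_1 + n_2\). The splitting therefore decomposes

$$\begin{aligned} (x_1, x_2) = \frac{n_1 x_1 + n_2 x_2}{n_1 + n_2} \, (1, 1) + \frac{x_2 - x_1}{n_1 + n_2} \, (-n_2, n_1), \end{aligned}$$

where the first summand is the component in \({\mathfrak {a}}_G\) and the second lies in the kernel of the pairing with the determinant. In particular the component in \({\mathfrak {a}}_G\) is the dimension-weighted average of \(x_1\) and \(x_2\).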

11.3 The induced filtration on a Levi subgroup

If V is a vector space equipped with filtrations \(F^{\bullet }\) and \({\mathfrak {f}}^{\bullet }\), then \(F^{\bullet }\) induces a filtration on \(\mathrm {gr}^{{\mathfrak {f}}}_* V\). We need to analyze this induced filtration carefully when \(F^{\bullet }\) is the Hodge filtration and \({\mathfrak {f}}^{\bullet }\) is the semisimplification filtration.

It is convenient to again express this abstractly: for any reductive group G and any parabolic P, a filtration F on G induces a filtration \(F_M\) on the Levi quotient M of P. (With reference to the example above, P corresponds to the filtration \({\mathfrak {f}}^{\bullet }\), and M to the associated graded). To explain this we require the following lemma:

Lemma 11.1

Let \(\chi {:}\,{\mathbf {G}}_m \rightarrow G\) be a character defining the parabolic subgroup Q. Let Q act transitively on an algebraic variety Y. Then all fixed points of \(\chi \) on Y are conjugate under the centralizer N of \(\chi \).

Proof

We have a Levi decomposition \(Q=NV\), with V the unipotent radical. Suppose that \(y_0 \in Y\) is \(\chi \)-fixed. It is enough to verify that \(y_0\) is the unique point in \(Vy_0\) that is \(\chi \)-fixed. Let \(V_0\) be the stabilizer of \(y_0\) inside V. For \(x \in {\mathbf {G}}_m\),

$$\begin{aligned} \chi (x) \cdot (vy_0) = (\chi (x) v \chi (x)^{-1}) y_0, \end{aligned}$$

and thus the \(\chi \)-fixed points on \(Vy_0\) correspond to the fixed points for \(\chi (x)\)-conjugation on \(V/V_0\).

But all the weights of this \({\mathbf {G}}_m\)-action on V are positive, i.e. the limit of \(\chi (x) v \chi (x)^{-1}\) as \(x \rightarrow 0\) is equal to the identity. Therefore the only fixed point on \(V/V_0\) for conjugation by \(\chi ({\mathbf {G}}_m)\) is the identity coset. \(\square \)
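The mechanism of the proof is already visible for \(G = {{\text {GL}}}_2\); the following is an illustration with choices of our own, not part of the argument. Take \(\chi (x) = \mathrm {diag}(x, 1)\), so that Q is the group of upper-triangular matrices, N the diagonal torus and V the group of upper-triangular unipotent matrices, and let \(Y = Q/N\) with base point \(y_0\), so that \(V_0\) is trivial. Then

$$\begin{aligned} \chi (x) \begin{pmatrix} 1 &{} u \\ 0 &{} 1 \end{pmatrix} \chi (x)^{-1} = \begin{pmatrix} 1 &{} xu \\ 0 &{} 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 &{} 0 \\ 0 &{} 1 \end{pmatrix} \quad \text{ as } \ x \rightarrow 0, \end{aligned}$$

and the only u with \(xu = u\) for all x is \(u = 0\). Thus \(y_0\) is the unique \(\chi \)-fixed point of \(Vy_0 = Y\), in agreement with the Lemma, since N fixes \(y_0\).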

Before we formulate the induced filtration in terms of parabolics, we recall some linear algebra associated to two parabolics. Suppose that P, Q are parabolic subgroups of G, where Q is the stabilizer of a filtration F. It is known that P and Q contain a common maximal torus T and that \(P \cap Q\) is connected; this, together with everything else we will use, is contained in [13, Chapter 2]. We will briefly summarize what we need.

Fixing T as above, we get Levi decompositions of P and Q such that both Levi factors contain T:

$$\begin{aligned} P = M U\,\text{ and }\,Q = N V \end{aligned}$$
(11.5)

We have a factorization

$$\begin{aligned} P \cap Q = (M \cap Q) \cdot (U \cap Q). \end{aligned}$$
(11.6)

In particular, this implies that the projection of \(P \cap Q\) to M along \(P \twoheadrightarrow M\) is just \(M \cap Q\).

To verify this factorization, we note that \((M \cap Q)\) normalizes \((U \cap Q)\), and also that it is easy to verify the corresponding splitting at the level of Lie algebras; since \(P \cap Q\) is connected, this factorization also follows.

Lemma 11.2

(Induced filtration on the Levi factor of a parabolic.) Let F be a filtration for the group G. There exists a representative \(\chi _P{:}\,{\mathbf {G}}_m \rightarrow G\) for the filtration F with the property that \(\chi _P\) is valued in P. Moreover, any two such representatives are conjugate under \(P \cap Q\).

For each such representative \(\chi _P\), the projection of \(\chi _P\) to the Levi quotient M of P defines a filtration on M which is independent of the choice of \(\chi _P\).

Proof

Let \(\chi {:}\,{\mathbf {G}}_m \rightarrow G\) represent the filtration, and let Q be the associated parabolic. The intersection \(P \cap Q\) contains a maximal torus T of G and we may certainly conjugate \(\chi \) so it is valued in T, so in P; this proves the existence statement.

For uniqueness fix \(\chi _P\), which we may now suppose to be valued in \(P \cap Q\). Now the image of \(\mathrm {Ad}(q)^{-1} \chi _P\) is in P if and only if

$$\begin{aligned} \chi _P({\mathbf {G}}_m) \subset \mathrm {Ad}(q) P, \end{aligned}$$

i.e. qP lies in the set of fixed points of \(\chi _P({\mathbf {G}}_m)\) on QP/P. These fixed points are all conjugate under N, the centralizer of \(\chi _P\), as we have seen above; thus \(qP \in NP\), so that \(q \in N(Q \cap P)\). Thus the characters \(\mathrm {Ad}(q)^{-1} \chi _P\) lie in a single \((P \cap Q)\)-orbit.

It remains to prove the final statement. Choose \(\chi _P, \chi _P'{:}\,{\mathbf {G}}_m \rightarrow P\), as above, both P-valued representatives for the filtration F. We have \(\chi _P' = \mathrm {Ad}(g) \chi _P\) for some \(g \in P \cap Q\), so \(\overline{\chi _P'} = \mathrm {Ad}({\bar{g}}) \overline{\chi _P}\), where bars denote projection to the Levi quotient of P. To see that these two characters define the same filtration we need to verify that

$$\begin{aligned} {\overline{g}} \in Q_{\overline{\chi _P}}. \end{aligned}$$

This follows from the remark after (11.6): extend the image of \(\chi _P\) to a maximal torus inside \(P \cap Q\); then, with the corresponding choice M of Levi subgroup for P, we have \(\overline{\chi _P} = \chi _P\) and \(Q_{\overline{\chi _P}} = Q \cap M\).

\(\square \)
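A two-dimensional illustration of the Lemma may be helpful; the specific data are our own choices. Let \(G = {{\text {GL}}}_2\), let F be the filtration with \(F^1 = \langle e_1 \rangle \) and \(F^0 = K^2\), represented by \(\chi (x) = \mathrm {diag}(x, 1)\), and let P be the stabilizer of the line \(\langle e_1 + e_2 \rangle \). The representative \(\chi \) is not valued in P, but its conjugate by the unipotent element \(u \in Q\) with \(u e_1 = e_1\), \(u e_2 = e_1 + e_2\) is:

$$\begin{aligned} \chi _P(x) = u \, \chi (x) \, u^{-1} = \begin{pmatrix} x &{} 1 - x \\ 0 &{} 1 \end{pmatrix}, \qquad \chi _P(x) e_1 = x e_1, \qquad \chi _P(x)(e_1 + e_2) = e_1 + e_2. \end{aligned}$$

Projecting to the Levi quotient \({{\text {GL}}}(\langle e_1 + e_2 \rangle ) \times {{\text {GL}}}(K^2/\langle e_1 + e_2 \rangle )\) of P, we see that \(\chi _P\) acts with weight 0 on the line and with weight 1 on the quotient. This matches the filtration induced by F in the naive sense, since \(F^1 \cap \langle e_1 + e_2 \rangle = 0\) while \(F^1\) surjects onto the quotient.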

Example 4

Induced filtration on the associated graded.

  • Consider the case of \(G={{\text {GL}}}(V)\). Suppose given a decreasing filtration \(F^{\bullet } V\) (with associated parabolic Q) and another parabolic P; we fix an increasing filtration \({\mathfrak {f}}^{\bullet } V\) with stabilizer P. We show that the construction above gives precisely the filtration induced by F on the associated graded to \({\mathfrak {f}}\).

    As above, we can represent the character for F by a character \(\chi \) preserving the filtration \({\mathfrak {f}}\). Then, writing \({\bar{F}}\) for the induced filtration:

    $$\begin{aligned} {\bar{F}}^j({\mathfrak {f}}^k/{\mathfrak {f}}^{k-1}) \end{aligned}$$

    is the sum of all eigenspaces with weights \(\geqslant j\); this is the image of the corresponding space in \({\mathfrak {f}}^k\), that is to say,

    $$\begin{aligned} {\bar{F}}^j ({\mathfrak {f}}^k/{\mathfrak {f}}^{k-1}) = \text{ image } \text{ of } F^j \cap {\mathfrak {f}}^k \text{ in } {\mathfrak {f}}^k/{\mathfrak {f}}^{k-1}. \end{aligned}$$
  • We now modify the example above by taking G to be \(\mathrm {GAut}(V, \langle -, - \rangle )\) for some symmetric or skew-symmetric nondegenerate bilinear pairing \(\langle -, - \rangle \). Now suppose that P is a parabolic subgroup of G, stabilizing the self-dual increasing filtration

    $$\begin{aligned} 0 = {\mathfrak {f}}^0 \subset {\mathfrak {f}}^1 \subset \dots \subset {\mathfrak {f}}^{{\mathsf {m}}} \subset {\mathfrak {f}}^{{\mathsf {m}}+1} \subset \dots \subset {\mathfrak {f}}^{2{\mathsf {m}}+1} = V. \end{aligned}$$

    Just as before, F induces a filtration \({\bar{F}}\) on each graded piece \({\mathfrak {g}}^j = {\mathfrak {f}}^j/{\mathfrak {f}}^{j-1}\). The associated Levi subgroup is isomorphic to

    $$\begin{aligned} {{\text {GL}}}({\mathfrak {g}}^1) \times \cdots \times {{\text {GL}}}({\mathfrak {g}}^{{\mathsf {m}}}) \times \mathrm {GAut}({\mathfrak {g}}^{{\mathsf {m}}+1}), \end{aligned}$$

    where we regard the last factor as \({\mathbf {G}}_m\) even if \({\mathfrak {f}}^{{\mathsf {m}}} = {\mathfrak {f}}^{{\mathsf {m}}+1}\), and the corresponding filtration on each factor is the one induced by \({\bar{F}}\).
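The first bullet of the example can be checked numerically in a toy case. The sketch below assumes a basis of V in which the character \(\chi \) is diagonal and each \({\mathfrak {f}}^k\) is spanned by basis vectors; the function names and the encoding (a list of weights and a list of block indices) are illustrative choices, not notation from the text. It computes \(\dim {\bar{F}}^j({\mathfrak {f}}^k/{\mathfrak {f}}^{k-1})\) in two ways: by counting eigenvectors of weight \(\geqslant j\) in the k-th graded piece, and as \(\dim (F^j \cap {\mathfrak {f}}^k) - \dim (F^j \cap {\mathfrak {f}}^{k-1})\).

```python
from collections import defaultdict

def induced_dims_by_eigenspaces(weights, blocks, j):
    """dim of F-bar^j on each graded piece, counted via eigenvectors.

    weights[i] is the weight of chi on the basis vector e_i;
    blocks[i] = k means e_i lies in f^k but not in f^(k-1).
    Both encodings are assumptions made for this sketch.
    """
    dims = defaultdict(int)
    for w, k in zip(weights, blocks):
        if w >= j:
            dims[k] += 1
    return dict(dims)

def induced_dims_by_intersections(weights, blocks, j):
    """Same dimensions, computed as dim(F^j ∩ f^k) - dim(F^j ∩ f^(k-1))."""
    dims = {}
    previous = 0
    for k in sorted(set(blocks)):
        current = sum(1 for w, b in zip(weights, blocks) if w >= j and b <= k)
        if current > previous:
            dims[k] = current - previous
        previous = current
    return dims

# Toy data: dim V = 5, two graded pieces, weights 2, 0, 1, 1, 0.
weights = [2, 0, 1, 1, 0]
blocks = [1, 1, 2, 2, 2]
for j in range(0, 4):
    assert (induced_dims_by_eigenspaces(weights, blocks, j)
            == induced_dims_by_intersections(weights, blocks, j))
```

The agreement is immediate here because \(\chi \) is diagonal in a basis adapted to \({\mathfrak {f}}\); the content of the example is that such a \(\chi \) can always be chosen.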

11.4 Balanced filtrations and parabolic subgroups

As above, let G be a reductive group over a field K, and let:

  • F be a filtration of G associated with the parabolic subgroup Q,

  • P a parabolic subgroup of G, with Levi quotient M.

We say that F is balanced with respect to P if \(\mathrm {wt}(F) \in {\mathfrak {a}}_G\) is carried, under the embedding \({\mathfrak {a}}_G \rightarrow {\mathfrak {a}}_M\), to the weight \(\mathrm {wt}(F_M)\) of the filtration induced on the Levi quotient. Here \({\mathfrak {a}}_G \hookrightarrow {\mathfrak {a}}_M\) is as in Sect. 11.2.

Example 5

Balanced filtrations.

  • If \(G={{\text {GL}}}(V)\), and P is associated to the increasing filtration \({\mathfrak {f}}^q V\), then “balanced” says that, for every q, the filtration that F induces on \({\mathfrak {f}}^q/{\mathfrak {f}}^{q-1}\) has the same weight as the filtration F on V.

  • The same assertion holds for \(\mathrm {GAut}(V)\), where now F and \({\mathfrak {f}}\) are self-dual filtrations.
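For \(G={{\text {GL}}}(V)\) the balanced condition is easy to test on examples. The sketch below assumes, for illustration only, that the weight of a filtration on a vector space W is the average \(\sum _p p \cdot \dim \mathrm {gr}^p W / \dim W\); the normalization actually in force is the one fixed in Sect. 11.2, which is not reproduced here. It also assumes, as in Example 4, a basis in which the character defining F is diagonal and adapted to \({\mathfrak {f}}\). The names `weight` and `is_balanced` are hypothetical.

```python
from fractions import Fraction

def weight(ws):
    """Average weight of a filtration given by a list of eigen-weights.

    Assumed normalization: (sum of weights) / (dimension).
    """
    return Fraction(sum(ws), len(ws))

def is_balanced(weights, blocks):
    """Check that F induces the same weight on every graded piece of f.

    weights[i]: weight of the defining character on the basis vector e_i.
    blocks[i]:  index q of the graded piece f^q/f^(q-1) containing e_i.
    """
    total = weight(weights)
    for q in set(blocks):
        piece = [w for w, b in zip(weights, blocks) if b == q]
        if weight(piece) != total:
            return False
    return True

# Weights (1, 0 | 1, 0): each graded piece has average weight 1/2, as does V.
print(is_balanced([1, 0, 1, 0], [1, 1, 2, 2]))
# Weights (1, 1 | 0, 0): the pieces have average weights 1 and 0.
print(is_balanced([1, 1, 0, 0], [1, 1, 2, 2]))
```

Exact rational arithmetic is used so that equality of weights is tested without rounding.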

Note that if we choose a cocharacter \(\chi _P{:}\,{\mathbf {G}}_m \rightarrow P\) representing F, the condition of being “balanced” implies that, for any character \(\psi \) of P trivial on the center of G,

$$\begin{aligned} \langle \psi , \chi _P \rangle = 0. \end{aligned}$$
(11.7)

Now define

$$\begin{aligned} X(F) = \left\{ G\text{-conjugates } \text{ of } F \text{ that } \text{ are } \text{ balanced } \text{ with } \text{ respect } \text{ to } P\right\} , \end{aligned}$$

so that X(F) is a P-stable subvariety of G/Q and is equipped with a map

$$\begin{aligned} X(F) \longrightarrow \left\{ \text{ filtrations } \text{ of } M \right\} \end{aligned}$$
(11.8)

via the rule \(F \mapsto F_M\). We may regard this, in an evident way, as a “constructible” map between algebraic varieties (i.e. its graph is a constructible set) and thus we can reasonably speak of dimension of fibers.

We will analyze (11.8) by breaking X(F) into P-orbits. Consider for a moment \(F \mapsto F_M\) as a map

$$\begin{aligned} \text{ filtrations } P\text{-conjugate } \text{ to } F \longrightarrow \text{ filtrations } \text{ on } \text{ Levi } \text{ quotient } \text{ of } P \end{aligned}$$
(11.9)

where both sides are P-varieties. The left hand side is identified with \(P/(P \cap Q)\), and—if we choose a maximal torus of \(P \cap Q\) containing the image of a character defining F, and take the corresponding Levi decomposition \(P=MU\)—the image is identified with \(M/(M \cap Q)\). From this and (11.6) we find that each fiber of (11.9) has dimension

$$\begin{aligned} \dim (U) - \dim (Q \cap U) \end{aligned}$$
(11.10)

where U is the unipotent radical of P, and Q the stabilizer of F.

11.5 Double cosets of parabolic subgroups

Fix, as before, F a filtration of G associated with the parabolic subgroup Q, and P a parabolic subgroup of G, with Levi quotient M. Continue with notation X(F) as above. We will be concerned with estimating the size of the fibers of (11.8).

Fix a Borel B contained in P and a maximal torus \(T \subset B\). Since the variety X(F) depends only on the G-orbit of F, we may harmlessly replace F by a G-conjugate; in particular we may suppose that F is defined by a co-character \(\mu {:}\,{\mathbf {G}}_m \rightarrow T\) that is positive with respect to B, i.e. \(B \subset Q\).

Let \(\Sigma \supset \Sigma ^+\) be the sets of roots of T on G and on B, respectively; one therefore gets notions of simple and positive roots. Let \(\Sigma _P, \Sigma _Q\) be the sets of roots of T on P and Q, so that \(\Sigma ^+ \subset \Sigma _P\) and \(\Sigma ^+ \subset \Sigma _Q\). Let \(\Delta _P\) be the subset of simple roots \(\alpha \) for which \(-\alpha \in \Sigma _P\), and similarly define \(\Delta _Q\); thus P and Q correspond to the subsets \(\Delta _P, \Delta _Q\) of the set of simple roots. Note that, since \(\mu \) defines the parabolic subgroup Q, \(\Sigma _Q\) is the set of roots having nonnegative pairing with \(\mu \), and in particular \(\mu \) is orthogonal to all roots in \(\Delta _Q\):

$$\begin{aligned} \langle \mu , \beta \rangle =0, \ \ \beta \in \Delta _Q. \end{aligned}$$

Recall the “adjoint” Hodge numbers associated to \(\mathrm {Lie} \ \ \mathrm {GAut} ( {\mathsf {V}}_0 \otimes {{\mathbf {C}}}, \langle -, - \rangle )\), introduced in Sect. 10. The following proposition uses an abstraction of that notion:

Proposition 11.3

Let the “Hodge numbers” be the multi-set of integers of the form \(\langle \mu , \gamma \rangle \) with \(\gamma \in \Sigma \), adding multiplicity \(\dim (T)\) to the multiplicity of zero. For \(i \ne 0\) let \(a_i\) be the number of roots \(\gamma \in \Sigma \) with \(\langle \mu ,\gamma \rangle = i\), so that \(a_i\) is the multiplicity of i as a Hodge number and \(\sum _{i > 0} a_i = \dim (G/Q)\); we take \(a_0\) to be the dimension of the Levi factor of Q.

Suppose \(e \le \dim (G/Q)\) is a positive integer such that

$$\begin{aligned} \text{ sum } \text{ of } \text{ all } \text{ positive } \text{ Hodge } \text{ numbers }&> \text{ sum } \text{ of } \text{ top } e \text{ Hodge } \text{ numbers } \nonumber \\&\quad +\, \text{ sum } \text{ of } \text{ top } \left( \frac{a_0}{2}+e\right) \text{ Hodge } \text{ numbers }, \nonumber \\ \end{aligned}$$
(11.11)

Then the codimension inside G/Q of any fiber of the mapping (11.8)

$$\begin{aligned} X(F) \rightarrow \text{ filtrations } \text{ of } M \end{aligned}$$

is greater than e.
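To get a feeling for hypothesis (11.11), one can evaluate it in a toy case. The sketch below specializes to \(G = {{\text {GL}}}_n\) with its diagonal torus (an illustrative choice; the application in the text uses \(\mathrm {GAut}\)), so that the roots are \(e_i - e_j\) and \(\langle \mu , e_i - e_j \rangle = \mu _i - \mu _j\). Two conventions are assumptions of this sketch and should be checked against Sect. 10: "the top k Hodge numbers" is read as the k largest elements of the multiset, and \(a_0/2\) is rounded up when \(a_0\) is odd. The function names are hypothetical.

```python
import math

def hodge_numbers(mu):
    """Multiset of Hodge numbers for GL_n and a diagonal cocharacter mu.

    One entry mu[i] - mu[j] per root e_i - e_j, plus dim(T) = n extra zeros.
    Returned in decreasing order.
    """
    n = len(mu)
    nums = [mu[i] - mu[j] for i in range(n) for j in range(n) if i != j]
    nums += [0] * n
    return sorted(nums, reverse=True)

def top_sum(nums, k):
    """Sum of the k largest Hodge numbers (nums is sorted decreasingly)."""
    return sum(nums[:k])

def satisfies_11_11(mu, e):
    """Evaluate inequality (11.11) for the given mu and positive integer e."""
    nums = hodge_numbers(mu)
    a0 = nums.count(0)                      # dimension of the Levi factor of Q
    positive_total = sum(x for x in nums if x > 0)
    k = math.ceil(a0 / 2) + e               # rounding up is an assumption
    return positive_total > top_sum(nums, e) + top_sum(nums, k)

# Regular cocharacter on GL_6: the positive Hodge numbers sum to 35.
mu = (5, 4, 3, 2, 1, 0)
for e in (1, 2, 3):
    print(e, satisfies_11_11(mu, e))
```

In this toy case the inequality holds for e = 1, 2 and fails (with equality) at e = 3, which illustrates how quickly the two "top" sums catch up with the total.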

Proof

We are going to analyze this P-orbit by P-orbit. Note that we have \(G = P W_{PQ} Q\), where \(W_{PQ}\) is the subset of the Weyl group W defined via

$$\begin{aligned} W_{PQ} = \{w \in W{:}\,w^{-1} \Delta _P> 0, w\Delta _Q > 0\}. \end{aligned}$$
(11.12)

Indeed, it is enough to see (by the Bruhat decomposition) that \(W = W_P \cdot W_{PQ} \cdot W_Q\), where \(W_P\) and \(W_Q\) are generated by simple reflections corresponding to \(\Delta _P\) and \(\Delta _Q\). Writing as usual \( \ell (w) = \# \{ \alpha > 0{:}\,w \alpha < 0\}\) for the length of a Weyl element, any minimal-length representative in a fixed double coset \(W_P \cdot w \cdot W_Q\) belongs to \(W_{PQ}\): for \(\alpha \in \Delta _P\), the element \(s_{\alpha } w\) has shorter length than w if \(w^{-1} \alpha < 0\). Similarly, for \(\beta \in \Delta _Q\), we know that \(w s_{\beta }\) has shorter length than w if \(w \beta < 0\).

For each \(w \in W_{PQ}\) we have either \(PwQ/Q \subset X(F)\), or \(PwQ/Q \cap X(F) = \emptyset \). Call w bad in the former case. For each bad \(w \in W_{PQ}\) let \(X(F)_w\) be the corresponding locally closed subvariety of X(F), i.e.

$$\begin{aligned} X(F)_w = X(F) \cap \left( (P wQ)/Q \right) . \end{aligned}$$

Thus \(X(F) = \coprod X(F)_w\), the union taken over bad w. Assume, by way of contradiction, that there exists some bad w such that a fiber of

$$\begin{aligned} X(F)_w \rightarrow \text{ filtrations } \text{ of } M \end{aligned}$$
(11.13)

has codimension inside G/Q that is \(\leqslant e\).

That w is bad means that the filtration defined by the co-character \(w\mu \) is balanced with respect to P. This means in particular that

$$\begin{aligned} \sum _{\gamma \in \Sigma - \Sigma _P} \langle w \mu , \gamma \rangle = 0. \end{aligned}$$
(11.14)

Indeed, \(\sum _{\gamma \in \Sigma - \Sigma _P} \gamma \) computes the modular character of the parabolic subgroup P: it is the negative of the character by which P acts on the determinant of its unipotent radical. Applying (11.7) to this character and to the cocharacter \(w\mu \) gives (11.14).

For this (bad) w, write

$$\begin{aligned} X = \{\beta \in \Sigma - \Sigma _P{:}\,w^{-1} \beta > 0\} = \{\beta \in \Sigma -\Sigma _P{:}\,-w^{-1} \beta \in \Sigma -\Sigma _Q\} \end{aligned}$$

(using Lemma 11.4, see below) and let \(X'\) be the complement of X inside \(\Sigma -\Sigma _P\).

Each fiber of (11.13) has, by (11.10), dimension

$$\begin{aligned} \dim (U) - \dim (\mathrm {Ad}(w) Q \cap U)&= \# \{ \alpha \in \Sigma -\Sigma _P{:}\\&- w^{-1} \alpha \in \Sigma - \Sigma _Q \} = \# X. \end{aligned}$$

(see Lemma 11.4). This is equal to the length \(\ell (w)\), although we won’t make explicit use of it. Therefore our assumption means \(\#X \geqslant \dim (G/Q)-e\). Then, since \(\# X' = \dim (G/P) - \#X\), we have

$$\begin{aligned}&\# X' \leqslant \dim (G/P) - \dim (G/Q) + e \nonumber \\&\quad = \dim (Q)-\dim (P) + e \leqslant \dim (Q/B) + e \leqslant \frac{a_0}{2} +e. \end{aligned}$$
(11.15)

Also, by (11.14),

$$\begin{aligned} \sum _{\beta \in X} \langle \mu , w^{-1} \beta \rangle = \sum _{X'} - \langle \mu , w^{-1} \beta \rangle . \end{aligned}$$
(11.16)

All entries on the left-hand side are strictly positive because \(w^{-1} \beta \) is the negative of an element of \(\Sigma -\Sigma _Q\). All entries on the right-hand side are non-negative (because \(B \subset Q\), the cocharacter \(\mu \) is non-negative on positive roots). Now X has size \(\geqslant \dim (G/Q) -e\), so the image \(-w^{-1}(X)\) omits at most e roots inside \(\Sigma -\Sigma _Q\). Therefore, the left-hand side of (11.16) is at least

$$\begin{aligned} \text{ sum } \text{ of } \text{ all } \text{ positive } \text{ Hodge } \text{ numbers } - \text{ sum } \text{ of } \text{ the } \text{ topmost } e \text{ Hodge } \text{ numbers. } \end{aligned}$$

On the other hand, the right-hand side of (11.16) is at most the sum of the top \((a_0/2+e)\) Hodge numbers. (Here we have used that, since \(e \le \dim (G/Q)\), the top e Hodge numbers are all positive.) So we get a contradiction to (11.16) under the stated hypothesis. \(\square \)

We used the following lemma.

Lemma 11.4

Let \(\Sigma \) be the set of all roots, and take \(w \in W_{PQ}\) (see (11.12)).

  (i) For \(\beta \in \Sigma - \Sigma _Q\), we have \(w \beta > 0 \iff -w\beta \in \Sigma -\Sigma _P\).

  (ii) For \(\alpha \in \Sigma -\Sigma _P\), we have \(w^{-1} \alpha > 0 \iff -w^{-1} \alpha \in \Sigma -\Sigma _Q\).

  (iii) The map \(x \mapsto -w(x)\) induces a bijection of these sets:

    $$\begin{aligned} \{ \beta \in \Sigma - \Sigma _Q{:}\,w \beta> 0\} \longrightarrow \{\alpha \in \Sigma -\Sigma _P{:}\,w^{-1} \alpha >0\} \end{aligned}$$
    (11.17)

    The size of this set is precisely the length \(\ell (w)\).

Proof

Take \(\beta \in \Sigma -\Sigma _Q\) with \(w \beta > 0\). If \(-w(\beta )\) were in \(\Sigma _P\), then \(\beta \) would be a positive linear combination of roots in \(w^{-1} \Delta _P\), contradicting the negativity of \(\beta \).

This shows the \(\implies \) direction of (i), and the \(\implies \) direction of (ii) is similar. The reverse directions for (i) and (ii) are clear. For example, if \(-w\beta \) is in \(\Sigma -\Sigma _P\), then \(w \beta > 0\) because all roots in \(\Sigma -\Sigma _P\) are negative. Now it is clear that the maps \(x \mapsto -w(x)\) and \(x \mapsto -w^{-1}(x)\) give inverse bijections in (11.17). \(\square \)
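Lemma 11.4 can be confirmed by brute force in small cases. The sketch below takes \(G = {{\text {GL}}}_n\), so that W is the symmetric group acting on the roots \(e_i - e_j\) by permuting indices; this specialization, the encoding of roots as index pairs, and all function names are assumptions made for illustration. For each \(w \in W_{PQ}\) it checks that \(x \mapsto -w(x)\) carries the left-hand set of (11.17) onto the right-hand set, and that the common size is \(\ell (w)\).

```python
from itertools import permutations

def all_roots(n):
    """Roots e_i - e_j of GL_n, encoded as pairs (i, j) with i != j."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]

def positive(root):
    return root[0] < root[1]

def act(w, root):
    """Action of the permutation w on the root e_i - e_j."""
    return (w[root[0]], w[root[1]])

def neg(root):
    return (root[1], root[0])

def inverse(w):
    inv = [0] * len(w)
    for i, wi in enumerate(w):
        inv[wi] = i
    return tuple(inv)

def parabolic_roots(n, delta):
    """Sigma_P: all positive roots, plus the negative roots spanned by delta.

    delta is a set of indices k, standing for the simple roots e_k - e_(k+1).
    """
    return {(i, j) for (i, j) in all_roots(n)
            if i < j or all(k in delta for k in range(j, i))}

def check_lemma(n, delta_P, delta_Q):
    """Verify (11.17) and the length statement for every w in W_PQ."""
    sigma = set(all_roots(n))
    outside_P = sigma - parabolic_roots(n, delta_P)
    outside_Q = sigma - parabolic_roots(n, delta_Q)
    checked = 0
    for w in permutations(range(n)):
        w_inv = inverse(w)
        in_W_PQ = (all(positive(act(w_inv, (k, k + 1))) for k in delta_P)
                   and all(positive(act(w, (k, k + 1))) for k in delta_Q))
        if not in_W_PQ:
            continue
        lhs = {b for b in outside_Q if positive(act(w, b))}
        rhs = {a for a in outside_P if positive(act(w_inv, a))}
        length = sum(1 for r in sigma
                     if positive(r) and not positive(act(w, r)))
        assert {neg(act(w, b)) for b in lhs} == rhs
        assert len(lhs) == length
        checked += 1
    return checked  # number of elements of W_PQ that were tested

print(check_lemma(4, {0}, {2}))
```

A finite check of this kind proves nothing in general, but it is a cheap guard against sign and indexing slips in (11.12) and (11.17).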

11.6 Conclusion of the argument

We now return to the situation of Proposition 10.6. Let \(G= \mathrm {GAut}(V, \langle -, - \rangle )\).

We translate the problem into reductive group language. Let \(F_0\) be a fixed self-dual filtration on V; we will consider those filtrations F that are conjugate to \(F_0\) under G. Let Q be the stabilizer of \(F_0\) in G, with Levi quotient N. Reformulating Proposition 10.6 (replacing \({\mathfrak {f}}\) from the Proposition with the parabolic subgroup which is its stabilizer): we must estimate the codimension of \(g \in G/Q\) such that, writing \(F = gF_0\), there exists another parabolic subgroup \(P \leqslant G\) such that:

  (a)’ (from property (a) of Proposition 10.6): \(\phi \in P\);

  (b)’ (from property (b) of Proposition 10.6): F is balanced with respect to P, cf. the example of Sect. 11.4;

  (c)’ (from property (c) of Proposition 10.6): the G-conjugacy class of \((P, \phi _M, F_M)\) is fixed, where \(\phi _M\) is the projection of \(\phi \) to the Levi quotient M of the parabolic P (see Footnote 13).

First, we reduce to the case when \(\phi \) is semisimple. Indeed, \(\phi \in P \implies \phi ^{ss} \in P\) and, supposing that \(\phi \in P\), then also \((\phi ^{ss})_M = (\phi _M)^{ss}\) (the subscript M denotes projection to M). Now if \((P, \phi _M, F_M)\) and \((P', \phi _{M'}, F_{M'})\) are conjugate, so that there is \(g \in G\) with \(\mathrm {Ad}(g) P=P'\) and \(\mathrm {Ad}(g){:}\,M \rightarrow M'\) carries \(\phi _M\) to \(\phi _{M'}\), then \(\mathrm {Ad}(g){:}\,M \rightarrow M'\) also carries \((\phi _M)^{\mathrm {ss}} = (\phi ^{\mathrm {ss}})_M\) to \((\phi _{M'})^{\mathrm {ss}} = (\phi ^{\mathrm {ss}})_{M'}\). In other words, if we replace \(\phi \) by \(\phi ^{\mathrm {ss}}\) then the codimension of the set described above will only decrease. We do this, and can therefore assume that \(\phi \) is semisimple.

We will first show that

$$\begin{aligned} (\text{ dimension } \text{ of } \text{ possible } \text{ pairs } (P, F_M)) \le z= \dim Z(\phi ), \end{aligned}$$
(11.18)

the dimension of the centralizer of \(\phi \) in G. (Note that, because of our reduction above, z corresponds to the dimension of the centralizer of \(\phi ^{ss}\), for the original choice of \(\phi \).)

The set of P containing a given semisimple \(\phi \) is a finite union of orbits of \(Z(\phi )\), as we see by infinitesimal computations. It suffices, therefore, to examine a single \(Z(\phi )\)-orbit on the space of P. Fix \(P_1\) in this orbit. The dimension of \(Z(\phi )\cdot P_1\) equals

$$\begin{aligned} \dim Z(\phi ) - \dim Z_{P_1}(\phi ). \end{aligned}$$
(11.19)

Next, if we fix \(P \in Z(\phi ) \cdot P_1\), the collection of filtrations \({\mathcal {F}}\) on its Levi factor M for which \((P, \phi _M, {\mathcal {F}})\) belongs to a fixed G-isomorphism class corresponds to a finite collection of orbits of \(Z_M(\phi _M)\) on the space of filtrations on M. Now \(\phi \) is P-conjugate to \(\phi _M\) by (2.1) so that \(\dim Z_M(\phi _M) \leqslant \dim Z_P(\phi )\). It follows that the dimension of the space of possible filtrations on M, for P fixed, is at most \(\dim Z_P(\phi ) = \dim Z_{P_1}(\phi )\). Adding this to (11.19) we deduce (11.18).
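For convenience, the bookkeeping behind (11.18) can be summarized in one line; this only combines (11.19) with the bound on filtrations for fixed P obtained above, and introduces nothing new:

$$\begin{aligned} \dim \{(P, F_M)\} \leqslant \underbrace{\dim Z(\phi ) - \dim Z_{P_1}(\phi )}_{\text{ choices } \text{ of } P \text{ in } \text{ one } \text{ orbit }} + \underbrace{\dim Z_{P_1}(\phi )}_{\text{ choices } \text{ of } F_M \text{ for } \text{ fixed } P} = \dim Z(\phi ) = z. \end{aligned}$$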

We may now conclude the proof. Suppose e is as in (10.24), so that both conditions are satisfied:

$$\begin{aligned}&\text{ number } \text{ of } \text{ positive } \text{ Hodge } \text{ numbers } \geqslant z+e\\&\text{ sum } \text{ of } \text{ all } \text{ positive } \text{ Hodge } \text{ numbers } > T(z+e) + \ T(\frac{h^0}{2}+z+e). \end{aligned}$$

Recall that \(X(F) \subset G/Q\) is the set of filtrations that are G-conjugate to F and are balanced with respect to P. We may apply Proposition 11.3, taking the e of that Proposition to be \(z+e\) in the notation of the present discussion. (Note that the first displayed condition above guarantees, in the notation of Proposition 11.3, that \(z+e \leqslant \dim (G/Q)\), as needed to apply it.) Thus, if we fix P, the codimension inside G/Q of any fiber of

$$\begin{aligned} X(F) \rightarrow \text{ filtrations } \text{ on } M \end{aligned}$$

is at least \(z+e\).

However, we saw above that the dimension of possibilities for (P, filtration on M) is at most z. Therefore, the total codimension of the set of \(g \in G/Q\) satisfying (a)’, (b)’, (c)’ is at least e, concluding the proof. \(\square \)

12 Bounding Frobenius via point counts

We remark on an alternative approach to bounding the size of the Frobenius centralizer, i.e. the step that was achieved in the previous argument by Lemma 10.4. It is likely that in some ranges this gives rise to better numerical bounds.

Lemma 12.1

Let Y be a smooth hypersurface of degree d and dimension \(n \geqslant 2\), defined over the finite field k with q elements; let \(b= \dim H^n_{\mathrm {prim}}(Y_{{\bar{k}}}, {{\mathbf {Q}}}_{\ell }).\) Then the centralizer Z of the semisimplified Frobenius, acting on \(H^n_{\mathrm {prim}}(Y_{{\bar{k}}}, {{\mathbf {Q}}}_{\ell })\), has dimension at most \(3b^2/N\), where N is the largest integer for which \(q^{(n/2+1)N} < b/3\).
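To see what the lemma gives numerically, the following sketch computes N and the bound \(3b^2/N\) for given q, n, d. It uses the standard closed formula \(b = ((d-1)^{n+2} + (-1)^n (d-1))/d\) for the primitive middle Betti number of a smooth hypersurface, which is not stated in the text and is imported here as an assumption; the function names are hypothetical. The condition \(q^{(n/2+1)N} < b/3\) is tested in the equivalent integer form \(9 q^{(n+2)N} < b^2\) to avoid floating-point error.

```python
from fractions import Fraction

def primitive_betti(n, d):
    """b = dim H^n_prim for a smooth degree-d hypersurface of dimension n.

    Standard formula, assumed here rather than taken from the text.
    """
    return ((d - 1) ** (n + 2) + (-1) ** n * (d - 1)) // d

def largest_N(q, n, b):
    """Largest integer N >= 0 with q^((n/2+1)N) < b/3, or None if none exists.

    Squaring both (positive) sides gives the integer test 9*q^((n+2)N) < b^2.
    """
    if b <= 3:
        return None
    N = 0
    while 9 * q ** ((n + 2) * (N + 1)) < b * b:
        N += 1
    return N

def centralizer_bound(q, n, d):
    """Return (b, N, 3*b^2/N); the bound is None when N is 0 or undefined."""
    b = primitive_betti(n, d)
    N = largest_N(q, n, b)
    if not N:
        return b, N, None
    return b, N, Fraction(3 * b * b, N)

# Degree-10 surfaces over the field with 2 elements.
print(centralizer_bound(2, 2, 10))
```

Since \(\dim Z \leqslant b^2\) holds trivially, the bound \(3b^2/N\) carries information only when \(N > 3\), that is, when b is large compared with \(q^{(n/2+1)}\).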

Proof

To avoid confusion between \(i = \sqrt{-1}\) and i as an index, we write \(e(\alpha ) := \exp (2 \pi i \alpha )\).

Let the Frobenius eigenvalues on \(H^n_{\mathrm {prim}}(Y_{{\bar{k}}}, {{\mathbf {Q}}}_{\ell })\) be given by

$$\begin{aligned} \lambda _1=q^{n/2} e(\theta _1), \dots , \lambda _b= q^{n/2} e (\theta _b), \end{aligned}$$

and let \(\mu \) be the measure on \(S^1\) given by \(\sum _{i=1}^b \delta _{\theta _i}\). If the multiplicities of the \(\theta _i\) are \(m_1, \dots , m_r\), with \(\sum m_i = b\), then \(\dim Z = \sum m_i^2\).
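The formula \(\dim Z = \sum m_i^2\) is the dimension of the commutant of a semisimple operator with eigenvalue multiplicities \(m_i\). A minimal sketch, with angles represented as exact fractions of a full turn (an encoding chosen for this illustration, as is the function name):

```python
from collections import Counter
from fractions import Fraction

def centralizer_dimension(thetas):
    """Sum of squared multiplicities of the angles theta_1, ..., theta_b.

    Angles are compared exactly, so they should be given as Fractions
    (or other exactly comparable values), reduced modulo 1.
    """
    return sum(m * m for m in Counter(thetas).values())

# b = 5 angles with multiplicities 2, 2, 1: dimension 4 + 4 + 1 = 9.
thetas = [Fraction(0), Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 2)]
print(centralizer_dimension(thetas))
```

The two extremes are b distinct angles, giving \(\dim Z = b\), and a single angle of multiplicity b, giving \(\dim Z = b^2\).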

If g is any non-negative real-valued function on \(S^1\) we have \(\int g(t-\theta ) d\mu (\theta ) = \sum _{s} g(t- \theta _s)\), and so

$$\begin{aligned} \int _{t} dt \left| \int g(t-\theta ) d\mu (\theta ) \right| ^2 \geqslant \mathrm {dim} Z \cdot \Vert g\Vert _{L^2}^2 \end{aligned}$$
(12.1)

which bounds from above the dimension of the centralizer; this estimate is most effective if the support of g is concentrated near 0. Here, and in what follows, the measure is the Haar probability measure on \(S^1\).

If \(k'\) is the field extension of k of degree j, the number of points of \(Y(k')\) is given by

$$\begin{aligned} |Y(k')| = \sum _{\ell =0}^{n} q^{\ell j} + (-1)^n q^{nj/2} \sum _{s=1}^{b} e(j \theta _s). \end{aligned}$$

Since this lies between 0 and the size of \({\mathbf {P}}^{n+1}(k')\), i.e. between 0 and \(\sum _{\ell =0}^{n+1} q^{\ell j}\), we see that

$$\begin{aligned} \left| \sum _{s} e(j \theta _s) \right| \leqslant q^{(n/2+1) j}. \end{aligned}$$
(12.2)

Let

$$\begin{aligned} g_N(t) = \left( \sum _{r = -N}^{N} e(rt) \right) ^2 = \sum _{r = -2N}^{2N} (2N+1 - \left| r \right| ) e(rt), \end{aligned}$$

a function on \(S^1\). Note that \(\Vert g_N\Vert _{L^2}^2 = (2N+1)^2 + 2 \sum _{i=1}^{2N} i^2\). We have

$$\begin{aligned} \int g_N(t - \theta ) d\mu (\theta ) = \sum _{r=-N}^N (2N+1 - \left| r \right| ) \sum _{s=1}^b e(r(t-\theta _s) ). \end{aligned}$$

Using (12.2), we see that this is bounded in absolute value by

$$\begin{aligned} (2N+1) \left[ b + 2 \sum _{r=1}^N q^{(n/2+1)|r|} \right] \leqslant (2N+1) \left( b + 3 q^{(n/2+1) N}\right) \end{aligned}$$

since \(q^{n/2+1} \geqslant 4\). Therefore, by (12.1),

$$\begin{aligned} \dim (Z) \leqslant b^2 (1 + 3 b^{-1} q^{(n/2+1)N})^2 \cdot \underbrace{ \left( \frac{(2N+1)^2}{(2N+1)^2 + 2 \sum _{i=1}^{2N}i^2} \right) }_{\le \frac{3}{4N}}. \end{aligned}$$

Choose N the largest integer with \(q^{(n/2+1)N} < b/3\); we get

$$\begin{aligned} \dim (Z) \leqslant 3b^2/N. \end{aligned}$$

\(\square \)
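The elementary facts about \(g_N\) used in the proof can be checked by exact arithmetic: the expansion of the square, the value of \(\Vert g_N\Vert _{L^2}^2\) (by Parseval, the sum of squared Fourier coefficients for the Haar probability measure), and the inequality under the brace. The sketch below is only such a check, with hypothetical function names; it does not reproduce the rest of the estimate.

```python
from fractions import Fraction

def g_coefficients(N):
    """Fourier coefficients of g_N = (sum_{r=-N}^{N} e(rt))^2.

    Entry number a + b of the result is the coefficient of e((a + b - 2N) t),
    obtained by multiplying out the square term by term.
    """
    coeffs = [0] * (4 * N + 1)
    for a in range(2 * N + 1):
        for b in range(2 * N + 1):
            coeffs[a + b] += 1
    return coeffs

def check_g(N):
    """Verify the three elementary identities and return the norm ratio."""
    coeffs = g_coefficients(N)
    # Expansion: the coefficient of e(rt) is 2N + 1 - |r| for |r| <= 2N.
    assert coeffs == [2 * N + 1 - abs(r) for r in range(-2 * N, 2 * N + 1)]
    # Squared L^2 norm via Parseval.
    l2_squared = sum(c * c for c in coeffs)
    assert l2_squared == (2 * N + 1) ** 2 + 2 * sum(i * i for i in range(1, 2 * N + 1))
    # The factor under the brace is at most 3 / (4N).
    ratio = Fraction((2 * N + 1) ** 2, l2_squared)
    assert ratio <= Fraction(3, 4 * N)
    return ratio

for N in range(1, 51):
    check_g(N)
print(check_g(1))
```

In closed form the ratio equals \(3(2N+1)/(8N^2+8N+3)\), which makes the bound by \(3/(4N)\) visible directly.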